README overhaul #15

Closed
markusressel opened this issue Oct 2, 2019 · 7 comments · Fixed by #17

@markusressel
Owner

markusressel commented Oct 2, 2019

Some things about how py-image-dedup works have changed since v1.0.0, and the README needs guidance on how to use the docker-compose file. A big overhaul of the README is necessary.

Specify

  • since a fork of image-match supporting Elasticsearch v6 as well as v7 is now used, the cumbersome package dependency section for it can be removed
  • how the daemon works
  • how and what statistics are exposed
  • how to use with docker-compose
@markusressel markusressel self-assigned this Oct 2, 2019
@markusressel markusressel added the enhancement New feature or request label Oct 2, 2019
@jasontitus

Any chance for a quick update that just shows what the index creation call should be with v7? For those of us who aren't seasoned ElasticSearch users it is totally not clear how we need to change it.

@markusressel
Owner Author

markusressel commented Feb 10, 2020

For v7, just omit the image node:

curl -X PUT "192.168.2.115:9200/images?pretty" -H "Content-Type: application/json" -d "
    {
      \"mappings\": {
        \"properties\": {
          \"path\": {
            \"type\": \"keyword\",
            \"ignore_above\": 256
          }
        }
      }
    }"

Otherwise, you have to insert a type node between mappings and properties: an image node for v6, or a _doc node otherwise. With _doc it looks like this:

curl -X PUT "192.168.2.115:9200/images?pretty" -H "Content-Type: application/json" -d "
    {
      \"mappings\": {
        \"_doc\": {
          \"properties\": {
            \"path\": {
              \"type\": \"keyword\",
              \"ignore_above\": 256
            }
          }
        }
      }
    }"

The WIP version of py-image-dedup is able to create such an index automatically. I currently just do not have the time to get to it :(
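
For illustration, the version switch described above could be sketched in Python roughly as follows. This is a minimal sketch and not the code py-image-dedup actually uses: the function name build_index_body and its parameters es_major and doc_type are made up for this example, and only the path field from the curl commands is included.

```python
import json


def build_index_body(es_major: int, doc_type: str = "_doc") -> dict:
    """Return an index-creation body for the given Elasticsearch major version.

    Hypothetical helper that mirrors the two curl request bodies above.
    """
    properties = {"path": {"type": "keyword", "ignore_above": 256}}
    if es_major >= 7:
        # v7: "properties" sits directly under "mappings"
        return {"mappings": {"properties": properties}}
    # v6: a type node (e.g. "image" or "_doc") wraps "properties"
    return {"mappings": {doc_type: {"properties": properties}}}


if __name__ == "__main__":
    print(json.dumps(build_index_body(7), indent=2))
    print(json.dumps(build_index_body(6, doc_type="image"), indent=2))
```

The printed JSON can be used as the request body of the PUT call shown in the curl examples.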

@jasontitus

Awesome. That worked. Very helpful for those of us who have never used ElasticSearch.

@jasontitus

jasontitus commented Feb 15, 2020

Actually, while it seemed to create the index, after running the script and processing thousands of images I still seem to have an empty index. It looks like it is fetching against it but not inserting into it. What is the best way to debug why it might not be adding images to the index?

curl 'localhost:9200/images/_stats'
{"_shards":{"total":2,"successful":1,"failed":0},"_all":{"primaries":{"docs":{"count":0,"deleted":0},"store":{"size_in_bytes":283},"indexing":{"index_total":0,"index_time_in_millis":0,"index_current":0,"index_failed":0,"delete_total":0,"delete_time_in_millis":0,"delete_current":0,"noop_update_total":0,"is_throttled":false,"throttle_time_in_millis":0},"get":{"total":0,"time_in_millis":0,"exists_total":0,"exists_time_in_millis":0,"missing_total":0,"missing_time_in_millis":0,"current":0},"search":{"open_contexts":0,"query_total":4874494,"query_time_in_millis":125150,"query_current":0,"fetch_total":4874494,"fetch_time_in_millis":33225,"fetch_current":0,"scroll_total":249803,"scroll_time_in_millis":32582,"scroll_current":0,"suggest_total":0,"suggest_time_in_millis":0,"suggest_current":0},"merges":{"current":0,"current_docs":0,"current_size_in_bytes":0,"total":0,"total_time_in_millis":0,"total_docs":0,"total_size_in_bytes":0,"total_stopped_time_in_millis":0,"total_throttled_time_in_millis":0,"total_auto_throttle_in_bytes":20971520},"refresh":{"total":2,"total_time_in_millis":0,"external_total":2,"external_total_time_in_millis":0,"listeners":0},"flush":{"total":1,"periodic":0,"total_time_in_millis":0},"warmer":{"current":0,"total":1,"total_time_in_millis":0},"query_cache":{"memory_size_in_bytes":0,"total_count":0,"hit_count":0,"miss_count":0,"cache_size":0,"cache_count":0,"evictions":0},"fielddata":{"memory_size_in_bytes":0,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":0,"memory_in_bytes":0,"terms_memory_in_bytes":0,"stored_fields_memory_in_bytes":0,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":0,"points_memory_in_bytes":0,"doc_values_memory_in_bytes":0,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":-1,"file_sizes":{}},"translog":{"operations":0,"size_in_bytes":110,"uncommitted_operations":0,"uncommitted_size_in_bytes":110,"earliest_last_modified_age":0},"req
uest_cache":{"memory_size_in_bytes":1387,"evictions":0,"hit_count":10,"miss_count":2},"recovery":{"current_as_source":0,"current_as_target":0,"throttle_time_in_millis":0}},"total":{"docs":{"count":0,"deleted":0},"store":{"size_in_bytes":283},"indexing":{"index_total":0,"index_time_in_millis":0,"index_current":0,"index_failed":0,"delete_total":0,"delete_time_in_millis":0,"delete_current":0,"noop_update_total":0,"is_throttled":false,"throttle_time_in_millis":0},"get":{"total":0,"time_in_millis":0,"exists_total":0,"exists_time_in_millis":0,"missing_total":0,"missing_time_in_millis":0,"current":0},"search":{"open_contexts":0,"query_total":4874494,"query_time_in_millis":125150,"query_current":0,"fetch_total":4874494,"fetch_time_in_millis":33225,"fetch_current":0,"scroll_total":249803,"scroll_time_in_millis":32582,"scroll_current":0,"suggest_total":0,"suggest_time_in_millis":0,"suggest_current":0},"merges":{"current":0,"current_docs":0,"current_size_in_bytes":0,"total":0,"total_time_in_millis":0,"total_docs":0,"total_size_in_bytes":0,"total_stopped_time_in_millis":0,"total_throttled_time_in_millis":0,"total_auto_throttle_in_bytes":20971520},"refresh":{"total":2,"total_time_in_millis":0,"external_total":2,"external_total_time_in_millis":0,"listeners":0},"flush":{"total":1,"periodic":0,"total_time_in_millis":0},"warmer":{"current":0,"total":1,"total_time_in_millis":0},"query_cache":{"memory_size_in_bytes":0,"total_count":0,"hit_count":0,"miss_count":0,"cache_size":0,"cache_count":0,"evictions":0},"fielddata":{"memory_size_in_bytes":0,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":0,"memory_in_bytes":0,"terms_memory_in_bytes":0,"stored_fields_memory_in_bytes":0,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":0,"points_memory_in_bytes":0,"doc_values_memory_in_bytes":0,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":-1,"file_sizes":{}},"translog":{"operations":0,"siz
e_in_bytes":110,"uncommitted_operations":0,"uncommitted_size_in_bytes":110,"earliest_last_modified_age":0},"request_cache":{"memory_size_in_bytes":1387,"evictions":0,"hit_count":10,"miss_count":2},"recovery":{"current_as_source":0,"current_as_target":0,"throttle_time_in_millis":0}}},"indices":{"images":{"uuid":"h_EB5_h6SoKo1_Ls4zFj3w","primaries":{"docs":{"count":0,"deleted":0},"store":{"size_in_bytes":283},"indexing":{"index_total":0,"index_time_in_millis":0,"index_current":0,"index_failed":0,"delete_total":0,"delete_time_in_millis":0,"delete_current":0,"noop_update_total":0,"is_throttled":false,"throttle_time_in_millis":0},"get":{"total":0,"time_in_millis":0,"exists_total":0,"exists_time_in_millis":0,"missing_total":0,"missing_time_in_millis":0,"current":0},"search":{"open_contexts":0,"query_total":4874494,"query_time_in_millis":125150,"query_current":0,"fetch_total":4874494,"fetch_time_in_millis":33225,"fetch_current":0,"scroll_total":249803,"scroll_time_in_millis":32582,"scroll_current":0,"suggest_total":0,"suggest_time_in_millis":0,"suggest_current":0},"merges":{"current":0,"current_docs":0,"current_size_in_bytes":0,"total":0,"total_time_in_millis":0,"total_docs":0,"total_size_in_bytes":0,"total_stopped_time_in_millis":0,"total_throttled_time_in_millis":0,"total_auto_throttle_in_bytes":20971520},"refresh":{"total":2,"total_time_in_millis":0,"external_total":2,"external_total_time_in_millis":0,"listeners":0},"flush":{"total":1,"periodic":0,"total_time_in_millis":0},"warmer":{"current":0,"total":1,"total_time_in_millis":0},"query_cache":{"memory_size_in_bytes":0,"total_count":0,"hit_count":0,"miss_count":0,"cache_size":0,"cache_count":0,"evictions":0},"fielddata":{"memory_size_in_bytes":0,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":0,"memory_in_bytes":0,"terms_memory_in_bytes":0,"stored_fields_memory_in_bytes":0,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":0,"points_memory_in_bytes":0,"doc_values_memory_in_bytes":0,"index_write
r_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":-1,"file_sizes":{}},"translog":{"operations":0,"size_in_bytes":110,"uncommitted_operations":0,"uncommitted_size_in_bytes":110,"earliest_last_modified_age":0},"request_cache":{"memory_size_in_bytes":1387,"evictions":0,"hit_count":10,"miss_count":2},"recovery":{"current_as_source":0,"current_as_target":0,"throttle_time_in_millis":0}},"total":{"docs":{"count":0,"deleted":0},"store":{"size_in_bytes":283},"indexing":{"index_total":0,"index_time_in_millis":0,"index_current":0,"index_failed":0,"delete_total":0,"delete_time_in_millis":0,"delete_current":0,"noop_update_total":0,"is_throttled":false,"throttle_time_in_millis":0},"get":{"total":0,"time_in_millis":0,"exists_total":0,"exists_time_in_millis":0,"missing_total":0,"missing_time_in_millis":0,"current":0},"search":{"open_contexts":0,"query_total":4874494,"query_time_in_millis":125150,"query_current":0,"fetch_total":4874494,"fetch_time_in_millis":33225,"fetch_current":0,"scroll_total":249803,"scroll_time_in_millis":32582,"scroll_current":0,"suggest_total":0,"suggest_time_in_millis":0,"suggest_current":0},"merges":{"current":0,"current_docs":0,"current_size_in_bytes":0,"total":0,"total_time_in_millis":0,"total_docs":0,"total_size_in_bytes":0,"total_stopped_time_in_millis":0,"total_throttled_time_in_millis":0,"total_auto_throttle_in_bytes":20971520},"refresh":{"total":2,"total_time_in_millis":0,"external_total":2,"external_total_time_in_millis":0,"listeners":0},"flush":{"total":1,"periodic":0,"total_time_in_millis":0},"warmer":{"current":0,"total":1,"total_time_in_millis":0},"query_cache":{"memory_size_in_bytes":0,"total_count":0,"hit_count":0,"miss_count":0,"cache_size":0,"cache_count":0,"evictions":0},"fielddata":{"memory_size_in_bytes":0,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":0,"memory_in_bytes":0,"terms_memory_in_bytes":0,"stored_fields_memory_in_bytes":0,"term_vectors_memo
ry_in_bytes":0,"norms_memory_in_bytes":0,"points_memory_in_bytes":0,"doc_values_memory_in_bytes":0,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":-1,"file_sizes":{}},"translog":{"operations":0,"size_in_bytes":110,"uncommitted_operations":0,"uncommitted_size_in_bytes":110,"earliest_last_modified_age":0},"request_cache":{"memory_size_in_bytes":1387,"evictions":0,"hit_count":10,"miss_count":2},"recovery":{"current_as_source":0,"current_as_target":0,"throttle_time_in_millis":0}}}}}
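
The relevant counters are buried in that response. A small Python sketch for pulling them out, assuming the _stats response has already been parsed into a dict; the function name summarize_index_stats is made up for this example, and the sample is a trimmed copy of the output above.

```python
def summarize_index_stats(stats: dict) -> dict:
    """Extract a few counters from a parsed Elasticsearch `_stats` response."""
    primaries = stats["_all"]["primaries"]
    return {
        "docs": primaries["docs"]["count"],
        "index_total": primaries["indexing"]["index_total"],
        "index_failed": primaries["indexing"]["index_failed"],
        "query_total": primaries["search"]["query_total"],
    }


if __name__ == "__main__":
    # Trimmed copy of the response above, keeping only the fields that are read
    sample = {
        "_all": {
            "primaries": {
                "docs": {"count": 0, "deleted": 0},
                "indexing": {"index_total": 0, "index_failed": 0},
                "search": {"query_total": 4874494},
            }
        }
    }
    print(summarize_index_stats(sample))
```

For the output above this reports 0 documents and 0 indexing operations next to 4874494 queries, which matches the observation that the index is being searched but nothing is being added to it.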

@jasontitus

I suspect it is because there are 400 errors on insert, although it isn't clear why:
POST http://localhost:9200/images/image?refresh=false [status:400 request:0.004s]
POST http://localhost:9200/images/image?refresh=false [status:400 request:0.004s]
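
The status line alone does not say why a request was rejected. Elasticsearch normally returns the cause in the JSON body of the error response, under error.type and error.reason. A minimal Python sketch for turning such a body into one readable line; the function name summarize_es_error is made up, and the sample body is invented for illustration and is not the actual error from this setup.

```python
import json


def summarize_es_error(body: str) -> str:
    """Return 'type: reason' from an Elasticsearch JSON error response body."""
    error = json.loads(body).get("error", {})
    if isinstance(error, str):
        # Some responses carry the error as a plain string
        return error
    return f"{error.get('type', 'unknown')}: {error.get('reason', 'no reason given')}"


if __name__ == "__main__":
    # Invented example body, only to show the shape of an error response
    sample_body = json.dumps(
        {
            "error": {"type": "illegal_argument_exception", "reason": "example reason"},
            "status": 400,
        }
    )
    print(summarize_es_error(sample_body))
```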

@markusressel
Owner Author

py-image-dedup probably doesn't use the correct request format for your version of Elasticsearch. v1.0.0 cannot work around this without changing the code. You can try the latest version from master, which should detect your Elasticsearch version automatically. There is no release for that version yet; it's on my TODO list.
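
As a rough idea of what detecting the version can look like: the root endpoint of Elasticsearch (GET / on port 9200) reports the server version under version.number. The Python sketch below only parses an already-fetched response; the function name es_major_version and the sample version number are made up here, and this is not necessarily how the code on master does it.

```python
def es_major_version(info: dict) -> int:
    """Return the major version from the parsed response of `GET /`."""
    return int(info["version"]["number"].split(".")[0])


if __name__ == "__main__":
    # Trimmed, illustrative root response; the version number is an example
    sample_info = {"version": {"number": "7.5.2"}}
    print(es_major_version(sample_info))
```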

@markusressel
Owner Author

@jasontitus I have invested a couple of hours, updated dependencies and fixed related issues. I have not released a new version yet since it doesn't feel polished enough, but you can try the latest version from master or Docker Hub if you want to give it a try.
