I have a set of 4000 images that I want to group into clusters. The images were taken by various fixed cameras (which might move a tiny bit due to wind), some by day and some at night, and they may contain people, dogs, cats, etc. I am trying to create clusters based on the camera (i.e. clusters of images all taken by the same camera).
I've got image-match running and have made the following modifications to the library to try to get a complete distance matrix:
I have tried setting distance_cutoff of SignatureDatabaseBase() to 1.0 and size of SignatureES() to 4000, but I still seem to get a sparse 4000x4000 matrix.
Is there any easy way to get the full distance matrix?
Also, any hints on when to increase k, N and n_grid for more precise results?
I also noticed some images contain specific textual labels embedded in the image in the same places (like date/time and camera name). Since these labels aren't big, I'm pretty sure they're mostly ignored here - am I right?
For 4000 images, I would not use the database part of the package. Just use the generate_signature method from the ImageSignature class in image_match/goldberg.py on your images, and then use the normalized_distance over all pairs of signatures to generate your distance matrix.
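A minimal sketch of that suggestion, in case it helps. The helper names full_distance_matrix and image_match_distance_matrix are hypothetical (they are not part of image-match); the only library calls used are ImageSignature, generate_signature and normalized_distance, as named above. The image-match import is done lazily so the generic helper can be tried without the package installed.

```python
from itertools import combinations


def full_distance_matrix(signatures, distance):
    """Return a dense, symmetric n x n matrix (list of lists) of pairwise distances.

    `signatures` is any sequence; `distance` is any callable taking two items.
    """
    n = len(signatures)
    matrix = [[0.0] * n for _ in range(n)]
    # Visit each unordered pair once and mirror it, so the diagonal stays 0.0.
    for i, j in combinations(range(n), 2):
        d = float(distance(signatures[i], signatures[j]))
        matrix[i][j] = d
        matrix[j][i] = d
    return matrix


def image_match_distance_matrix(paths):
    """Hypothetical wrapper: signature per image, then compare every pair."""
    # Imported here so full_distance_matrix works without image-match installed.
    from image_match.goldberg import ImageSignature

    gis = ImageSignature()
    signatures = [gis.generate_signature(path) for path in paths]
    return full_distance_matrix(signatures, gis.normalized_distance)
```

For 4000 images this is roughly 8 million pairwise comparisons, so it may take a while in pure Python, but it only has to run once and the result can be saved to disk.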
Roughly speaking, decreasing k and increasing N should give you better results at the expense of lookup speed. Similarly, increasing n_grid should give you more discerning (i.e. longer) signatures. I haven't rigorously tested anything but the defaults, though.
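To give a rough feel for how n_grid affects signature length, here is a small arithmetic sketch. It assumes the signature stores one value per grid point per neighbour, with 8 neighbours when diagonals are included and 4 otherwise; signature_length is a hypothetical helper for illustration, not a function in image-match, so please check it against the actual output of generate_signature.

```python
def signature_length(n_grid=9, diagonal_neighbors=True):
    """Estimated signature length for an n_grid x n_grid sampling grid.

    Assumption: one entry per grid point per neighbour comparison.
    """
    neighbors = 8 if diagonal_neighbors else 4
    return n_grid * n_grid * neighbors


for n in (9, 12, 16):
    print(n, signature_length(n))
```

Under that assumption the length grows quadratically with n_grid, so the cost of each pairwise comparison grows at the same rate.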
You are correct in that the labels shouldn't make much of a difference. If you have a couple examples of images you expect to cluster, could you post them here so I could advise further?
Hi there,
I'm planning on using HDBSCAN for this:
http://hdbscan.readthedocs.io/en/latest/basic_hdbscan.html
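HDBSCAN can take a precomputed distance matrix, so a full pairwise matrix can be passed to it directly. Below is a rough sketch under a few assumptions: check_precomputed and cluster_by_camera are hypothetical helper names, min_cluster_size=10 is an arbitrary starting value that would need tuning, and the matrix is converted to float64 because the precomputed metric expects doubles. numpy and hdbscan are imported lazily inside the clustering function.

```python
def check_precomputed(matrix, tol=1e-9):
    """Return True if matrix is square, symmetric, and zero on the diagonal."""
    n = len(matrix)
    for i, row in enumerate(matrix):
        if len(row) != n or abs(row[i]) > tol:
            return False
        for j in range(i):
            if abs(row[j] - matrix[j][i]) > tol:
                return False
    return True


def cluster_by_camera(matrix, min_cluster_size=10):
    """Cluster images from a full distance matrix; label -1 means noise."""
    import numpy as np
    import hdbscan

    if not check_precomputed(matrix):
        raise ValueError("expected a square, symmetric matrix with a zero diagonal")
    distances = np.asarray(matrix, dtype=np.float64)
    clusterer = hdbscan.HDBSCAN(metric="precomputed",
                                min_cluster_size=min_cluster_size)
    return clusterer.fit_predict(distances)
```

The returned labels line up with the row order of the matrix, so images sharing a label would be the candidates for having come from the same camera.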