
Feature IC: Show how to extract DNN features for a given image(s) #10

Closed
PatrickBue opened this issue Feb 14, 2019 · 4 comments

@PatrickBue
Contributor

Ideally this should use batching to speed up feature computation. It could also include a toy example, e.g. k-means clustering of an image dataset.
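
A rough sketch of what this could look like, using a torchvision ResNet-18 with its classification head removed for batched feature extraction and scikit-learn's KMeans for the toy clustering example. The dataset class, paths, and batch size below are placeholders for illustration, not the repo's actual API:

# Hypothetical sketch: batched DNN feature extraction + k-means clustering
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset
from torchvision import models, transforms
from PIL import Image
from sklearn.cluster import KMeans

class ImageListDataset(Dataset):
    # Loads images from a list of file paths and applies a torchvision transform
    def __init__(self, image_paths, transform):
        self.image_paths = image_paths
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img = Image.open(self.image_paths[idx]).convert("RGB")
        return self.transform(img)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Pretrained ResNet-18 with the final fc layer replaced by identity -> 512-dim features
model = models.resnet18(pretrained=True)
model.fc = torch.nn.Identity()
model.eval()

def extract_features(image_paths, batch_size=32):
    # Run the model over the images in batches and stack the feature vectors
    loader = DataLoader(ImageListDataset(image_paths, transform), batch_size=batch_size)
    feats = []
    with torch.no_grad():
        for batch in loader:
            feats.append(model(batch).numpy())
    return np.vstack(feats)

# Toy example: cluster the image set into 10 groups
# features = extract_features(list_of_image_paths)
# labels = KMeans(n_clusters=10).fit_predict(features)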

@PatrickBue PatrickBue created this issue from a note in IC data science features (To do) Feb 14, 2019
@miguelgfierro miguelgfierro self-assigned this Mar 4, 2019
@PatrickBue PatrickBue moved this from To do to In progress in IC data science features Mar 28, 2019
@miguelgfierro
Member

Question for @PatrickBue @loomlike @jainr @maxkazmsft:

Patrick and I discussed reformatting the metric computation to use sklearn's pairwise distances.

Recently I've been doing a lot of profiling for the reco project, so I did it here as well. It turns out that sklearn is much slower (I haven't tried all the functions, though):

from sklearn.metrics import pairwise_distances

def compute_vector_distance2(vec1, vec2, method="l2"):
    # sklearn-based variant of compute_vector_distance
    dist = pairwise_distances(vec1.reshape(1, -1), vec2.reshape(1, -1), method)
    return dist[0][0]

print(feat1.shape)  # (2048,)

%timeit compute_vector_distance(feat1, feat2, "l2")
# 7.33 µs ± 43.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit compute_vector_distance2(feat1, feat2, "l2")
# 109 µs ± 692 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

This happens because compute_vector_distance uses np.linalg.norm instead of the sklearn equivalent.
It's up to you guys. I'm a fan of prioritizing readability over speed in Python: if you think the original code is not readable, I can refactor it to sklearn; if you think it is readable, the original code is faster.
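
For context, a minimal sketch of an np.linalg.norm-based L2 distance along these lines (the actual compute_vector_distance in the repo may differ in details):

import numpy as np

def compute_vector_distance(vec1, vec2, method="l2"):
    # Plain numpy L2 distance; avoids sklearn's per-call input validation overhead for single vectors
    if method == "l2":
        return np.linalg.norm(vec1 - vec2, 2)
    raise ValueError("Unknown distance method: {}".format(method))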

@PatrickBue
Contributor Author

I am on the same page as you, Miguel: readability over speed, and not (re)implementing our own distance metrics. That's an interesting finding and somewhat surprising.

@miguelgfierro
Member

very nice results at the end: https://github.com/Microsoft/ComputerVision/blob/e011b08cca5eb3c35483cc1b3df8863eb51a5efe/image_similarity/notebooks/image_similarity_introduction.ipynb

There is an interesting mix of several things. First, resnet50 vs resnet18: the smaller one didn't converge. The key to the results, I think, was using a small feature size (512) instead of the 2048 I had initially. Not sure whether having batch normalization also helped (could be, haven't tested). The small feature size probably also helps with the L2 distance; I can imagine that using KL divergence could help if we use a larger feature size. Fine-tuning vs freezing also improved the last computation.
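
As an illustration only (the notebook linked above may implement this differently, e.g. through the fastai head construction), one way to get a 512-dim embedding with batch normalization out of ResNet-50's 2048-dim pooled features is to swap the classifier for a small projection head:

import torch.nn as nn
from torchvision import models

backbone = models.resnet50(pretrained=True)
backbone.fc = nn.Sequential(
    nn.BatchNorm1d(2048),   # batch norm on the pooled 2048-dim features
    nn.Linear(2048, 512),   # project down to a 512-dim embedding
)
# backbone(images) now returns 512-dim vectors that can be compared with an L2 distance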

Alexandra (what is her GitHub username?) and I are planning to improve this; then she will take over.

@ateste
Contributor

ateste commented Apr 18, 2019


Very cool, indeed! Nice work, Miguel! My username is ateste.

@PatrickBue PatrickBue moved this from In progress to Done in IC data science features May 28, 2019