This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Why was faiss not used for DC2? #53

Closed
yutaizhou opened this issue Mar 1, 2021 · 6 comments

Comments

@yutaizhou

Hello,

Absolutely fascinating work! I was looking into how DC2 improves upon the original DC, and noticed that DC2 implementation does not use faiss for clustering. May I know why this choice was made?

Thank you.

@mathildecaron31
Contributor

mathildecaron31 commented Mar 18, 2021

Hello @yutaizhou

I implemented k-means clustering directly in PyTorch for this work because I found it simpler to avoid an additional dependency.

I think you can implement DC2 with faiss as well; it should not make any difference performance-wise :)
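For reference, clustering "directly in PyTorch" can be as simple as a plain Lloyd's-algorithm loop over tensor ops. This is a minimal hypothetical sketch, not the repository's actual implementation:

```python
import torch

def kmeans(x, k, iters=10):
    # Minimal Lloyd's algorithm using only PyTorch ops (works on CPU or GPU).
    # x: (n, d) feature matrix; returns (centroids, assignments).
    centroids = x[torch.randperm(x.size(0))[:k]].clone()
    for _ in range(iters):
        # Assign each point to its nearest centroid (L2 distance).
        dists = torch.cdist(x, centroids)   # (n, k)
        assign = dists.argmin(dim=1)        # (n,)
        # Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            mask = assign == j
            if mask.any():
                centroids[j] = x[mask].mean(dim=0)
    return centroids, assign

x = torch.randn(100, 16)
centroids, assign = kmeans(x, k=5)
```

Because everything stays in torch tensors, there is no CPU/GPU conversion boundary to manage, which is the dependency friction faiss can introduce.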

@yutaizhou
Author

Ah yeah, getting faiss to install and interoperate with torch tensors (on both GPU and CPU) has not been straightforward, but that is the approach I am currently taking. Thanks!

@yutaizhou
Author

One more question: are you aware of any application of DeepClustering, DC2, or SwAV to time series data? I am trying to implement DC2 with a transformer backbone for time series data, and would love to see what others have done :)

@mathildecaron31
Contributor

Mm no sorry I cannot think of any such work :/

@yutaizhou
Author

Sorry I have yet another question that I was hoping you could answer. In section D.1 of the paper, the cross entropy loss between the pseudo label q and the classification score z@c is minimized with respect to z. This means only the backbone architecture, i.e., resnet, is updated, not the classification head itself, which contains the centroids learned from k-means.

Centroids are supposed to be updated once per epoch from k-means, and backbone is supposed to be updated with every mini-batch. In your code, however, you call detach() on the output of the backbone, not the classification head. Wouldn't this flow the gradients only to the centroids, and not the parameters of the backbone?

Or am I not understanding the detach() operation correctly?
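The effect of detach() described here can be checked with a toy example. This is a hypothetical sketch with stand-in tensors, not the repository's code: detaching the backbone-side output blocks gradients on that side, while the prototype (centroid) matrix still receives them.

```python
import torch

# Stand-ins: "features" plays the role of the backbone output,
# "prototypes" plays the role of the classification head's centroids.
features = torch.randn(4, 8, requires_grad=True)
prototypes = torch.randn(8, 3, requires_grad=True)

# Detach the backbone output before computing the classification scores.
scores = features.detach() @ prototypes
loss = scores.sum()
loss.backward()

# features.grad is None: no gradient flows back to the backbone side.
# prototypes.grad is populated: the centroids are the only parameters updated.
```

So, as the question suggests, detach() on the backbone output routes gradients only into the prototypes.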

@BoPang1996

BoPang1996 commented May 13, 2021

> Sorry I have yet another question that I was hoping you could answer. In section D.1 of the paper, the cross entropy loss between the pseudo label q and the classification score z@c is minimized with respect to z. This means only the backbone architecture, i.e., resnet, is updated, not the classification head itself, which contains the centroids learned from k-means.
>
> Centroids are supposed to be updated once per epoch from k-means, and backbone is supposed to be updated with every mini-batch. In your code, however, you call detach() on the output of the backbone, not the classification head. Wouldn't this flow the gradients only to the centroids, and not the parameters of the backbone?
>
> Or am I not understanding the detach() operation correctly?

I have the same question. Why are the prototypes updated every iteration?
