This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Why was faiss not used for DC2? #53

Closed
yutaizhou opened this issue Mar 1, 2021 · 6 comments

Comments

@yutaizhou

Hello,

Absolutely fascinating work! I was looking into how DC2 improves upon the original DC, and noticed that DC2 implementation does not use faiss for clustering. May I know why this choice was made?

Thank you.

@mathildecaron31
Contributor

mathildecaron31 commented Mar 18, 2021

Hello @yutaizhou

I implemented k-means clustering directly in PyTorch for this work because I found it simpler to avoid an additional dependency.

I think you can implement DC2 with faiss as well; it should not make any difference performance-wise :)
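For reference, clustering "directly in PyTorch" can be as simple as a plain Lloyd's-algorithm loop over tensor ops. This is a minimal hypothetical sketch, not the repository's actual implementation:

```python
import torch

def kmeans(x, k, iters=10):
    # Minimal Lloyd's algorithm using only PyTorch ops (works on CPU or GPU).
    # x: (n, d) feature matrix; returns (centroids, assignments).
    centroids = x[torch.randperm(x.size(0))[:k]].clone()
    for _ in range(iters):
        # Assign each point to its nearest centroid (L2 distance).
        dists = torch.cdist(x, centroids)   # (n, k)
        assign = dists.argmin(dim=1)        # (n,)
        # Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            mask = assign == j
            if mask.any():
                centroids[j] = x[mask].mean(dim=0)
    return centroids, assign

x = torch.randn(100, 16)
centroids, assign = kmeans(x, k=5)
```

Because everything stays in torch tensors, there is no CPU/GPU conversion boundary to manage, which is the dependency friction faiss can introduce.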

@yutaizhou
Author

Ah yeah, getting faiss to install and interoperate with torch tensors (on both GPU and CPU) has not been straightforward, but that is the approach I am currently taking. Thanks!

@yutaizhou
Author

One more question: are you aware of any application of DeepClustering, DC2, or SwAV to time series data? I am trying to implement DC2 with a transformer backbone for time series data, and would love to see what others have done :)

@mathildecaron31
Contributor

Mm no sorry I cannot think of any such work :/

@yutaizhou
Author

Sorry I have yet another question that I was hoping you could answer. In section D.1 of the paper, the cross entropy loss between the pseudo label q and the classification score z@c is minimized with respect to z. This means only the backbone architecture, i.e., resnet, is updated, not the classification head itself, which contains the centroids learned from k-means.

Centroids are supposed to be updated once per epoch from k-means, and backbone is supposed to be updated with every mini-batch. In your code, however, you call detach() on the output of the backbone, not the classification head. Wouldn't this flow the gradients only to the centroids, and not the parameters of the backbone?

Or am I not understanding the detach() operation correctly?
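The effect of detach() described here can be checked with a toy example. This is a hypothetical sketch with stand-in tensors, not the repository's code: detaching the backbone-side output blocks gradients on that side, while the prototype (centroid) matrix still receives them.

```python
import torch

# Stand-ins: "features" plays the role of the backbone output,
# "prototypes" plays the role of the classification head's centroids.
features = torch.randn(4, 8, requires_grad=True)
prototypes = torch.randn(8, 3, requires_grad=True)

# Detach the backbone output before computing the classification scores.
scores = features.detach() @ prototypes
loss = scores.sum()
loss.backward()

# features.grad is None: no gradient flows back to the backbone side.
# prototypes.grad is populated: the centroids are the only parameters updated.
```

So, as the question suggests, detach() on the backbone output routes gradients only into the prototypes.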

@BoPang1996

BoPang1996 commented May 13, 2021

> Sorry I have yet another question that I was hoping you could answer. In section D.1 of the paper, the cross entropy loss between the pseudo label q and the classification score z@c is minimized with respect to z. This means only the backbone architecture, i.e., resnet, is updated, not the classification head itself, which contains the centroids learned from k-means.
>
> Centroids are supposed to be updated once per epoch from k-means, and backbone is supposed to be updated with every mini-batch. In your code, however, you call detach() on the output of the backbone, not the classification head. Wouldn't this flow the gradients only to the centroids, and not the parameters of the backbone?
>
> Or am I not understanding the detach() operation correctly?

I have the same question. Why are the prototypes updated every iteration?
