
Paper segment: Comparing scores of Clustering to ClusteringFast #835

Closed
KennethEnevoldsen opened this issue May 28, 2024 · 7 comments

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented May 28, 2024

These segments compare the new bootstrapping method (plus multi-level) for clustering tasks to the older one. The goal is to establish that it produces similar results (model ranks) at a lower cost.

Details

  1. Using e.g. the English benchmark's clustering tasks, obtain v_measures for 3 models.
  2. On the same tasks' v2 (fast) versions and with the same models, obtain v_measures.
  3. The two methods should then yield a Spearman rank correlation of ~1. If we just rank them, I believe it is equivalent to computing the Spearman rank correlation directly on the v_measures (a sketch of this check follows below).
  4. Additionally, run a check for duplicate sentences as in Speed up Reranking tasks #793 (e.g. the duplicate examples in TwentyNewsgroupsClustering #407).

Multi-level will be covered in #849
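
A minimal sketch of the check in step 3, assuming the v_measures for both task variants have already been collected (the model names and scores below are placeholders, not real results):

```python
# Hypothetical sketch: rank agreement between classic and fast clustering tasks.
# Scores are placeholders, not actual benchmark results.
from scipy.stats import spearmanr

v_measures_classic = {"m1": 0.61, "m2": 0.55, "m3": 0.58}
v_measures_fast = {"m1": 0.63, "m2": 0.52, "m3": 0.57}

models = sorted(v_measures_classic)
classic = [v_measures_classic[m] for m in models]
fast = [v_measures_fast[m] for m in models]

# Spearman's rho is computed on ranks internally, so passing the raw
# v_measures is equivalent to ranking the models first.
rho, p_value = spearmanr(classic, fast)
print(f"Spearman's rho: {rho:.3f} (p={p_value:.3f})")
```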

@isaac-chung
Collaborator

I'm happy to collaborate and help out here. I'd imagine the evaluation criteria are quite similar to those of the retrieval speedups?

@KennethEnevoldsen
Contributor Author

Yep, quite similar; feel free to update the first comment. I believe a good case for this is the MTEB(eng) clustering tasks, e.g. arxiv clustering (though it might be too large).

Not sure exactly what the most convincing argument is here, but something like:

| model | old rank | new rank |
|-------|----------|----------|
| m1    | 1        | 1 (-)    |
| m2    | 2        | 3 (↓)    |
| m3    | 3        | 2 (↑)    |

might be simple, but hard to scale to many tasks (ideally we want all ranks to stay the same). Significant rank is probably a better metric.
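
For illustration, such a table could be derived from per-task scores along these lines (a hedged sketch using pandas; the scores are placeholders, not from any PR):

```python
# Illustrative sketch: derive old/new ranks and direction arrows from scores.
import pandas as pd

scores = pd.DataFrame(
    {"old_score": [0.61, 0.58, 0.55], "new_score": [0.63, 0.52, 0.57]},
    index=["m1", "m2", "m3"],
)
# Rank 1 = best, so rank by descending score.
scores["old_rank"] = scores["old_score"].rank(ascending=False).astype(int)
scores["new_rank"] = scores["new_score"].rank(ascending=False).astype(int)
# A larger rank number means the model moved down the leaderboard.
scores["change"] = [
    "-" if o == n else ("↓" if n > o else "↑")
    for o, n in zip(scores["old_rank"], scores["new_rank"])
]
print(scores[["old_rank", "new_rank", "change"]])
```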

@isaac-chung
Collaborator

I would probably go with a small, a medium, and a large model, and then a "close relative" of the large one. We want to be able to differentiate the models: if we compute a Spearman correlation matrix, we should see perfect agreement for small, medium, and large, and close-to-perfect agreement between the large model and its relative.

@KennethEnevoldsen I picked e5-small, e5-base, and e5-large. Any preferences on the "close-relative" model?

@isaac-chung
Collaborator

isaac-chung commented Jun 8, 2024

Looks like we mixed up how much to embed versus how much to sample for bootstrapping. I've added more analyses in the PR above. We should now see even larger speedups and better Spearman scores.
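
For context, the intended split is to embed a (downsampled) corpus once and bootstrap only the cheap clustering-and-scoring step. A minimal sketch of that idea, assuming precomputed embeddings and labels; this is illustrative, not the actual mteb implementation:

```python
# Illustrative sketch, not the actual mteb code: embed once, then bootstrap
# only the (cheap) clustering + scoring step.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import v_measure_score

def bootstrap_v_measures(embeddings, labels, n_bootstraps=10, sample_size=2048, seed=42):
    """Return one v_measure per bootstrap sample of precomputed embeddings (numpy array)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n_clusters = len(set(labels.tolist()))
    scores = []
    for _ in range(n_bootstraps):
        # Resample indices only; the expensive embedding step is never repeated.
        idx = rng.choice(len(embeddings), size=min(sample_size, len(embeddings)), replace=False)
        preds = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings[idx])
        scores.append(v_measure_score(labels[idx], preds))
    return scores
```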

@KennethEnevoldsen
Contributor Author

A few good pairs would be:

If we can meaningfully differentiate these models, I would be quite happy.

@isaac-chung
Collaborator

isaac-chung commented Jun 14, 2024

Adding a comment here for visibility of the progress made in #892:

  1. So far we've found a way forward to compare ranks and significant ranks of models on the Clustering and ClusteringFast tasks in the English benchmark. On average we observe a 15x speedup.
  2. Instead of downsampling all datasets with a single value, we downsample each dataset to 4% of its original size. The only exceptions are RedditClustering and StackExchangeClustering (they use 32768 samples) due to their high category count and short documents; a sketch of this rule follows the list. Under this method, when comparing the classic and fast versions, all tasks exhibit moderate to perfect agreement (see the table in the PR description).
  3. A few more models are to be run, probably on the higher end of the English benchmark, to get a wider spread:
    a. e5-large-v2 for large
    b. paraphrase-multilingual-mpnet-base-v2 for medium
  4. The table in the PR description can be added to the paper, specifically in the section where the speedup is described. Detailed plots are to be added to Appendix B under the ClusteringFast dataset construction section.
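
A minimal sketch of the downsampling rule in point 2; the function name, signature, and constants here are illustrative rather than the actual mteb implementation:

```python
# Illustrative sketch of the downsampling rule, not the actual mteb code.
# Most datasets keep 4% of their documents; two high-category, short-document
# tasks instead use a fixed 32768-sample cap.
FIXED_SAMPLE_TASKS = {"RedditClustering": 32768, "StackExchangeClustering": 32768}

def downsampled_size(task_name: str, original_size: int, fraction: float = 0.04) -> int:
    """Number of documents to keep when building the fast task variant."""
    if task_name in FIXED_SAMPLE_TASKS:
        return min(FIXED_SAMPLE_TASKS[task_name], original_size)
    return max(1, round(original_size * fraction))
```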

@isaac-chung
Collaborator

With the PR merged and the bulk of the content moved to the paper draft, I'm closing the issue. Can be reopened if needed.
