rgeirhos/dataset-pruning-metrics

Metric scores

This repository contains metric scores for "Beyond neural scaling laws: beating power law scaling via data pruning" (NeurIPS 2022 Outstanding Paper Award) by Sorscher*, Geirhos*, Shekhar, Ganguli, & Morcos.

Here is a list of metrics and corresponding papers:

Thanks to the respective authors for their contributions. Please make sure to cite them if you use the scores!

A few notes regarding the scores:

  • While different scores have different ranges, they are all structured such that a high score indicates that keeping an example is beneficial (according to that metric). This is the natural direction for some metrics, but perhaps counter-intuitive for others, which had to be reversed for consistency. For instance, the prototype scores are based on cosine similarity, which in its raw form is high for high similarity (= high redundancy); we reversed these scores so that high scores now mean low cosine similarity, i.e. keeping the example is good because it reduces redundancy.
  • active-learning is essentially a binary score: the method selects examples such that 80% of ImageNet is used. To adapt it for our purposes, we assigned each image a value of 1 if it was included in this subset and 0 otherwise. This means that the active-learning scores here aren't calibrated for pruning more or less than 20% of ImageNet.
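
To make the "high score = keep" convention concrete, here is a minimal sketch on toy data. It is not the code used to produce the scores in this repository: the function names (`reverse_scores`, `keep_top_fraction`, `binarize_selection`) are hypothetical, and reversing by negation is an assumption, since the notes above only say that some scores were reversed, not how.

```python
def reverse_scores(raw_scores):
    """Flip a 'high = redundant' metric so that high = worth keeping.

    Assumption: reversal by negation. Any order-reversing transform
    would give the same ranking.
    """
    return [-s for s in raw_scores]


def keep_top_fraction(scores, keep_fraction):
    """Return the (sorted) indices of the highest-scoring examples."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    n_keep = round(len(scores) * keep_fraction)
    # Rank example indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def binarize_selection(selected_indices, n_examples):
    """Turn a selected subset into a 1/0 score per example."""
    selected = set(selected_indices)
    return [1 if i in selected else 0 for i in range(n_examples)]


# Toy values standing in for raw cosine similarities (not real scores).
cosine_similarity = [0.9, 0.2, 0.5, 0.7, 0.1]

scores = reverse_scores(cosine_similarity)        # high = low similarity
kept = keep_top_fraction(scores, 0.6)             # keep 60%, prune 40%
binary = binarize_selection(kept, len(scores))    # 1 = kept, 0 = pruned

print(kept)    # [1, 2, 4]
print(binary)  # [0, 1, 1, 0, 1]
```

The last step mirrors the shape of the active-learning scores: once a subset is fixed, a 1/0 score only records membership in that subset, so it cannot rank examples for a different pruning fraction.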
