Commit
add 6/5
irenetrampoline committed Jun 5, 2018
1 parent ac6eaf7 commit d3b3940
Showing 3 changed files with 24 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
Expand Up @@ -2,6 +2,8 @@
My goal is to read an academic paper every day. Here I keep myself accountable.

## Papers
**Jun 05, 2018:** [Do CIFAR-10 Classifiers Generalize to CIFAR-10?](writeups/RecEtAl18.md) B. Recht, R. Roelofs, L. Schmidt, V. Shankar. 2018. [[pdf]](https://arxiv.org/pdf/1806.00451.pdf)

**Jun 03, 2018:** [Large-scale Analysis of Counseling Conversations: An Application of Natural Language Processing to Mental Health.](writeups/AltClaLes16.md) T. Althoff, K. Clark, J. Leskovec. 2016. [[pdf]](http://www.aclweb.org/anthology/Q16-1033)

**May 31, 2018:** [Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness.](writeups/KeaEtAl18.md) M. Kearns, S. Neel, A. Roth, Z. Steven Wu. 2018. [[pdf]](https://arxiv.org/pdf/1711.05144.pdf)
Expand Down
Binary file added pdfs/RecEtAl18.pdf.pdf
Binary file not shown.
22 changes: 22 additions & 0 deletions writeups/RecEtAl18.md
@@ -0,0 +1,22 @@
# Do CIFAR-10 Classifiers Generalize to CIFAR-10?

Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar. [Do CIFAR-10 Classifiers Generalize to CIFAR-10?](https://arxiv.org/pdf/1806.00451.pdf) 2018.

## tl;dr
- To measure overfitting to the test set, the authors collected a new (but closely matched) test set for CIFAR-10.
- Every deep learning model tested shows a large drop in accuracy on the new data, but models with higher original accuracy drop less.
- Evidence that current accuracy numbers are brittle.

## CIFAR-10
A standard benchmark in computer vision, CIFAR-10 contains small color images from 10 classes (e.g. dog), each associated with one or more keywords. CIFAR-10 was drawn from a larger dataset called Tiny Images. When collecting and labelling new images from Tiny Images, the authors ensured that the keyword distribution roughly matched the original, with a bias toward more common keywords.
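The keyword-matched collection strategy can be sketched as follows. This is a simplified illustration, not the paper's actual pipeline; `pool` and `target_counts` are hypothetical structures standing in for the Tiny Images candidates and the original test set's keyword distribution.

```python
import random

def sample_by_keyword(pool, target_counts, seed=0):
    """Draw images from a larger pool so that the number of images per
    keyword matches a target distribution.
    `pool`: list of (image_id, keyword) pairs (hypothetical).
    `target_counts`: dict mapping keyword -> desired count (hypothetical)."""
    rng = random.Random(seed)
    by_keyword = {}
    for image_id, keyword in pool:
        by_keyword.setdefault(keyword, []).append(image_id)
    sample = []
    for keyword, n in target_counts.items():
        candidates = by_keyword.get(keyword, [])
        # Take at most n images for this keyword, chosen at random.
        sample.extend(rng.sample(candidates, min(n, len(candidates))))
    return sample

# Toy pool whose keyword distribution is biased toward a common keyword.
pool = [(i, "tabby") for i in range(50)] + [(50 + i, "wolf") for i in range(10)]
new_set = sample_by_keyword(pool, {"tabby": 5, "wolf": 2})
```

The per-keyword matching is what keeps the new test set distributionally close to the original one.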

## Performance Results
The authors tested over 20 deep learning models, spanning conventional architectures (VGG, ResNet) to state-of-the-art ones (ShakeDrop), using publicly available code. Every model sees a drop in accuracy on the new test set (e.g. VGG dropped about 8%).
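The per-model comparison reduces to computing accuracy on each test set and taking the difference. A minimal sketch (toy predictions for illustration, not the paper's numbers):

```python
def accuracy(preds, labels):
    """Fraction of predictions matching the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def accuracy_drop(orig_preds, orig_labels, new_preds, new_labels):
    """Drop in accuracy, in percentage points, between the original
    and the new test set for one model."""
    return 100.0 * (accuracy(orig_preds, orig_labels)
                    - accuracy(new_preds, new_labels))

# Toy example: 75% accuracy on the original set, 50% on the new set.
drop = accuracy_drop([0, 1, 1, 0], [0, 1, 0, 0],
                     [0, 1, 1, 0], [0, 1, 0, 1])
```

Plotting this drop against original accuracy across all models is what reveals the paper's observation that stronger models degrade less.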

## Explanations
The original CIFAR-10 dataset contains near-duplicates between training and test images, which could let models overfit while test accuracy remains high; however, this explains at most a 1% difference. Another possible explanation is that hyperparameter tuning has overfit to CIFAR-10, but even after re-tuning, the results hold: no settings can be found that produce significantly higher accuracy on the new test set.

The implication seems to be that years of CIFAR-10 training and testing have overfit to the CIFAR-10 test data.

## What's next?
Certainly other common datasets (ImageNet, MIMIC-III) could be given new test sets and existing algorithms re-examined. More broadly, we want to ensure that our models are not overfitting and are truly robust.
