A simple synthetic dataset and baseline model for visual counting. The task is to count the even digits in a 100x100 image that contains up to 5 randomly chosen MNIST digits. Rejection sampling ensures that digits are separated by at least 28 pixels. The setup is reproduced, with details, from *Learning to count with deep object features*.
NOTE: This is not a dataset to beat, but a simple place to start for validating ideas in counting models.
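
The generation procedure described above can be sketched roughly as follows. This is a minimal illustration, not the code in `counting_mnist.create_dataset`. It makes several assumptions that this README does not state: the number of digits is drawn uniformly from 0 to 5, "separated by at least 28 pixels" is read as a Chebyshev distance of at least 28 between top-left corners (so 28x28 patches never overlap), and a whole layout is rejected and redrawn if any pair is too close. The names `sample_positions` and `make_example` are hypothetical, and random arrays stand in for real MNIST digits.

```python
from itertools import combinations

import numpy as np

IMAGE_SIZE = 100      # canvas is 100x100
DIGIT_SIZE = 28       # MNIST digits are 28x28
MAX_DIGITS = 5        # up to 5 digits per image
MIN_SEPARATION = 28   # assumed: minimum Chebyshev distance between corners


def sample_positions(num_digits, rng, max_tries=100_000):
    """Rejection-sample top-left corners until every pair is far enough apart."""
    high = IMAGE_SIZE - DIGIT_SIZE + 1  # exclusive upper bound for a corner
    for _ in range(max_tries):
        corners = rng.integers(0, high, size=(num_digits, 2))
        # Accept the layout only if all pairs satisfy the separation rule.
        if all(np.abs(a - b).max() >= MIN_SEPARATION
               for a, b in combinations(corners, 2)):
            return corners
    raise RuntimeError("no valid layout found; raise max_tries")


def make_example(digit_images, digit_labels, rng):
    """Build one (image, even_count) pair from a pool of 28x28 digit images."""
    num_digits = int(rng.integers(0, MAX_DIGITS + 1))  # assumed range: 0..5
    picks = rng.integers(0, len(digit_images), size=num_digits)
    canvas = np.zeros((IMAGE_SIZE, IMAGE_SIZE), dtype=digit_images.dtype)
    for idx, (row, col) in zip(picks, sample_positions(num_digits, rng)):
        canvas[row:row + DIGIT_SIZE, col:col + DIGIT_SIZE] = digit_images[idx]
    even_count = int(np.sum(digit_labels[picks] % 2 == 0))
    return canvas, even_count


# Stand-in data (hypothetical): one random "digit" image per class 0..9.
rng = np.random.default_rng(0)
fake_digits = rng.random((10, DIGIT_SIZE, DIGIT_SIZE)).astype(np.float32)
fake_labels = np.arange(10)

image, even_count = make_example(fake_digits, fake_labels, rng)
print(image.shape, even_count)
```

Rejecting the whole layout, rather than placing digits one at a time, avoids getting stuck when earlier placements leave no room for the next digit. The cost is more retries for images with 5 digits.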
- Generate TFRecords: `python -m counting_mnist.create_dataset`
- Train baseline: `python -m counting_mnist.main`
|Trivial baseline|Accuracy|
|---|---|
|Always Predict Zero Even Digits|33%|
|Uniform Count Predictions|12%|
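
As a sanity check on the "Always Predict Zero Even Digits" figure, the chance that an image contains no even digit can be computed under two assumptions that this README does not confirm: the number of digits is uniform over 0 to 5, and each digit is even with probability 1/2. The function name `p_zero_even` is hypothetical. Under these assumptions the result is about 0.328, close to the 33% listed, but that does not establish how the dataset is actually sampled.

```python
from fractions import Fraction

MAX_DIGITS = 5
P_EVEN = Fraction(1, 2)  # assumption: even and odd digits are equally likely


def p_zero_even(max_digits=MAX_DIGITS, p_even=P_EVEN):
    """P(no even digit), assuming the digit count is uniform on 0..max_digits."""
    counts = range(max_digits + 1)
    # With n digits, all of them are odd with probability (1 - p_even) ** n.
    return sum((1 - p_even) ** n for n in counts) / len(counts)


print(p_zero_even())         # 21/64
print(float(p_zero_even()))  # 0.328125
```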