Does Progress On Object Recognition Benchmarks Improve Real-World Generalization?

This repository is used to replicate the evaluation performed in the paper: Does Progress on Object Recognition Benchmarks Improve Real-World Generalization? It is also intended to serve as a resource for evaluating models on both standard ImageNet benchmarks and the DollarStreet and GeoDE benchmarks. We welcome feedback, questions, and additions!

In this Repository

  • ✅ Download 6 benchmarks with just one line of code

  • ✅ Use our implementations of ~100 models, from ResNet to CLIP and DINOv2

  • ✅ Evaluate your model with a simple, scalable script

  • ✅ Built-in logging with TensorBoard/Lightning, compatible with both local and multi-GPU runs

License

This repository is licensed under the Attribution-NonCommercial 4.0 International license, as found in the LICENSE file.

Getting Started

  1. Clone this repository:

    git clone https://github.com/facebookresearch/Geographic_Generalization.git
    
  2. In a new conda environment, install the required packages:

    pip install -f https://download.pytorch.org/whl/torch_stable.html -e .
    
  3. Download Benchmarks. You can specify which benchmarks to download using the -b flag, as shown in the examples below. Currently supported benchmarks are: imagenet_v2, imagenet_o, imagenet_r, imagenet_sketch, imagenet_a, objectnet, dollarstreet, and geode. To download ImageNet's validation set, follow the instructions here: https://www.image-net.org/download.php.

    To download all benchmarks (default), run:
    python download_data.py
    
    To download specific benchmarks, use the -b flag:
    ### Example 1: downloading only dollarstreet
    python download_data.py -b dollarstreet
    
    ### Example 2: downloading dollarstreet, geode, and imagenet
    python download_data.py -b dollarstreet,geode,imagenet
    
    To reference existing benchmarks stored outside the data folder, change the file paths in each benchmark's dataloader, which are defined in the datasets folder. An example is shown below:
    class ImageNet1kDataModule(ImageDataModule):
       def __init__(
           self,
           data_dir: str = <ALTERNATIVE IMAGENET PATH>,
           batch_size: int = 32,
           num_workers=8,
           image_size=224,
       ):
    
  4. Run an evaluation (see the note after this list for overriding configs from the command line):

    python evaluate.py
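
Because the configuration is composed with Hydra, individual config groups can also be overridden directly on the command line instead of editing evaluate_defaults.yaml. For example, to evaluate a different supported model (resnet18 is shown here because it also appears in the sweep example later in this README):

    python evaluate.py model=resnet18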
    

Adding Your Own Model Weights

  1. Find the model's yaml file in config/model/<model_name>.yaml (example: config/model/resnet50.yaml).
  2. Make a copy of the yaml file with a unique name.
  3. Add the checkpoint_path parameter to specify the path to your new weights (an optional sanity-check sketch follows this list):
    ## config/model/resnet50_myweights.yaml
    
    model_name: resnet50_myweights
    
    model: 
      _target_: models.resnet.resnet.ResNet50ClassifierModule
      checkpoint_path: <INSERT YOUR PATH HERE>                    <- add the relative path to your model weights
    
    
  4. To run an evaluation, change the model specified in config/evaluate_defaults.yaml to your new model's name, and run 'python evaluate.py'.
    ## config/evaluate_defaults.yaml
    
    defaults:
      - base: base
      - mode: local
      - dataset_library: all
      - model: resnet50_myweights   <- add your new model's name
      - measurement_library: all
      - measurement_group: test
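
Before pointing checkpoint_path at a file, it can help to confirm that the checkpoint actually loads into the architecture you chose. The snippet below is a hypothetical, standalone sanity check rather than part of this codebase: it assumes a torchvision ResNet-50 and a Lightning-style checkpoint whose weights sit under a "state_dict" key, and the file path is a placeholder.

    import torch
    from torchvision.models import resnet50

    model = resnet50()
    checkpoint = torch.load("path/to/my_weights.ckpt", map_location="cpu")
    # Lightning-style checkpoints nest weights under "state_dict"; plain state
    # dicts do not, so fall back to the loaded object itself.
    state_dict = checkpoint.get("state_dict", checkpoint)
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print("missing keys:", missing)
    print("unexpected keys:", unexpected)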
    
    

Adding A New Architecture


To add a new model:

  1. Add a config yaml file in config/models/<new_model>.yaml with a 'model_name' and a 'model' key that maps to the model target.

    config/models/<new_model>.yaml
    
       # @package _global_
       model_name: new_model_name
    
       model: 
         _target_: models.<model_architecture>.<file_name>.<class>
         learning_rate: 1e-4
         optimizer: adam
    
    
  2. Add the model name to either evaluate_defaults.yaml or the sweep to include it in your run.

    config/evaluate_defaults.yaml
    
      model: new_model_name
    
  3. Add a python class for the new model in models/<architecture_folder>/<new_model>.py (e.g. models/resnet/resnet.py) that inherits from the ClassifierModule class. You can either keep all the models for a given architecture in one script, or separate them into distinct files if they need more detailed implementations. Just make sure the config target matches the path you use! A fuller example sketch follows the template below.

    models/<architecture_folder>/<new_model>.py
        
        from base_model import ClassifierModule
        
        class NewModelName(ClassifierModule):
            def __init__(
                self,
                timm_name: str = "",
                checkpoint_url: str = "",
            ):
                super().__init__(
                    timm_name=timm_name,
                    checkpoint_url=checkpoint_url
                )
            
            # Optional 
            def load_model(self):
                model = <something>
    
                return model
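
For concreteness, the sketch below shows how such a class might look when the model comes from timm, as the timm_name argument above suggests. It is illustrative only: it assumes the base ClassifierModule stores timm_name and checkpoint_url on self and calls load_model() to build the network, and the timm model name is just an example.

    import timm
    import torch

    from base_model import ClassifierModule


    class NewTimmModel(ClassifierModule):
        def __init__(
            self,
            timm_name: str = "vit_base_patch16_224",
            checkpoint_url: str = "",
        ):
            super().__init__(timm_name=timm_name, checkpoint_url=checkpoint_url)

        def load_model(self):
            # Use timm's pretrained weights unless a custom checkpoint URL is given.
            model = timm.create_model(self.timm_name, pretrained=not self.checkpoint_url)
            if self.checkpoint_url:
                state_dict = torch.hub.load_state_dict_from_url(
                    self.checkpoint_url, map_location="cpu"
                )
                # Lightning-style checkpoints nest weights under "state_dict".
                state_dict = state_dict.get("state_dict", state_dict)
                model.load_state_dict(state_dict, strict=False)
            return model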
    
    

Building on Our Codebase - Guide to Further Customization

By default, the evaluation runs a pretrained ResNet50 on ImageNet's validation set. These choices are encoded in config files; refer to the sections below to learn about customization. The configs have the following structure:
config
├── base                # Hydra specifications, including experiment naming
├── dataset_library     # Library of all datasets compatible with this evaluation           
├── mode                # Hydra / Lightning specification for running locally / on clusters / testing
├── models              # Model specifications
├── measurement_library # Library of all measurements compatible with this evaluation
├── measurement_group   # Groups / lists of properties to use in a given evaluation   
├── evaluate_defaults.yaml 
└── ...
Changing Which Measurements Are Used

To change which measurements are run:

  • Option 1: Alter the list of measurements in config/measurement_group/base.yaml

    config/measurement_group/base.yaml
    
     measurements: [<add_measurement_name>]
    
  • Option 2: Create a new measurement group (make a new config file, e.g. config/measurement_group/new_measurement_group.yaml) and specify it in evaluate_defaults.yaml

    config/measurement_group/new_measurement_group.yaml
    
     measurements: [<measurement_name>]
    
    config/evaluate_defaults.yaml
    
      measurement_group: new_measurement_group
    
Changing Models Used

To change which model(s) are used:

  • For non-sweep experiments, change the model in evaluate_defaults.yaml. You can find supported models in config/models/
    config/evaluate_defaults.yaml
    
      model: chosen_model
    
  • For sweeps: change the models list in your sweep file directly, e.g. in sweeps/basic_interplay_experiment.sh
    sweeps/basic_interplay_experiment.sh
    
      python evaluate.py -m model=resnet101,resnet18,chosen_model \
    
Adding New Measurements

To add a new measurement:

  1. Add a config object to the measurement library found in config/measurement_library/all.yaml under the appropriate subsection. The measurement type is either 'properties' or 'benefits', matching the folder names. Leave the model and experiment_config values blank - they are dynamically passed in during the evaluation, but must be listed in the config for Hydra to identify the object.

    config/measurement_library/all.yaml
      
      new_measurement_name: 
          _target_: measurements.<measurement_type>.<file_name>.<class>
          datamodule_names: [<datamodule_name>] # e.g. imagenet, v2
          model: 
          experiment_config: 
    
  2. Add the measurement name to the desired measurement_group (e.g. change 'measurements' in config/measurement_group/base.yaml to include the new measurement)

    config/measurement_group/base.yaml
    
      measurements: [<new_measurement_name>]
    
  3. Add a python class for the new measurement in measurements/<measurement_type>/<file_name>.py, inheriting from the Measurement class. For a commented and explained example, see the ClassificationAccuracyEvaluation class. Each measurement object is passed a list of dataset names (which you define in the measurement config, as above). This list determines which datasets the measurement accesses. The abstract Measurement class constructs the datasets for you and stores them in self.datamodules, a dictionary of the form {datamodule_name: datamodule object}. To use a dataset in your measurement, access it through this dictionary (see below, and the ClassificationAccuracyEvaluation example). **Logging: the measurement object must return a dict[str, float], with the key identifying the measurement, following the convention <datamodule_name>_<data_split>_<property_name>, all lowercase. Example: imagenet_test_accuracy.** An example sketch of a filled-in measure() follows the skeleton below.

    measurements/<measurement_type>/<file_name>.py
        
      class NewMeasurementName(Measurement):
          """<Describe the measurement>
            Args:
                datamodule_names (list[str]): list of dataset names required for this measurement. E.g. ['imagenet', 'dollarstreet']
                model (ClassifierModule): pytorch model to perform the measurement with
                experiment_config (DictConfig): Hydra config used primarily to instantiate a trainer. Must have key: 'trainer' to be compatible with pytorch lightning.
            Return:
                dict in the form {str: float}, where each key represents the name of the measurement, and each float is the corresponding value.
            """
    
        def __init__(self, datamodule_names: list[str],  model: ClassifierModule, experiment_config: DictConfig,):
            super().__init__(datamodule_names, model, experiment_config)
    
        def measure(self):
    
            # Get datamodule of interest
            datamodule_name, datamodule = next(iter(self.datamodules.items()))
            
            # Access model and trainer like this: self.model, self.trainer
    
            #### Insert Calculation Here #### 
            
            property_name = "example"
            return {f"{datamodule_name}_{split}_{property_name}: 13}
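
For reference, the sketch below shows one way a filled-in measure() could compute a real value by running the model with the PyTorch Lightning trainer mentioned in the docstring. It is an illustration, not the repository's exact API: the "test_accuracy" metric key is hypothetical and depends on what your ClassifierModule logs in its test step (see ClassificationAccuracyEvaluation for the canonical pattern).

    def measure(self) -> dict[str, float]:
        results = {}
        for datamodule_name, datamodule in self.datamodules.items():
            # trainer.test returns a list with one metrics dict per test dataloader.
            metrics = self.trainer.test(self.model, datamodule=datamodule)[0]
            # "test_accuracy" is a hypothetical key; use whatever your module logs.
            results[f"{datamodule_name}_test_accuracy"] = metrics["test_accuracy"]
        return results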
    

Testing

To run tests (excluding slow tests): python -m pytest tests/

To run all tests (including slow tests): python -m pytest --runslow tests/

To launch a run on a few batches locally: python train.py -m mode=local_test
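
The --runslow flag is not built into pytest; it is typically wired up with a small conftest.py hook. The sketch below shows the standard pytest pattern for such a flag (assuming slow tests are marked with @pytest.mark.slow); it is included for orientation and may differ from this repository's actual conftest.py.

    import pytest

    def pytest_addoption(parser):
        parser.addoption("--runslow", action="store_true", default=False, help="run slow tests")

    def pytest_collection_modifyitems(config, items):
        if config.getoption("--runslow"):
            return  # run everything, including tests marked as slow
        skip_slow = pytest.mark.skip(reason="need --runslow option to run")
        for item in items:
            if "slow" in item.keywords:
                item.add_marker(skip_slow)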

Debugging Configs

To debug what configs are used: python evaluate.py --cfg job

Citation

@article{richards2023does,
 title={Does Progress On Object Recognition Benchmarks Improve Real-World Generalization?},
 author={Richards, Megan and Kirichenko, Polina and Bouchacourt, Diane and Ibrahim, Mark},
 journal={arXiv preprint arXiv:2307.13136},
 year={2023}
}
