Code for "MultiGrain: a unified image embedding for classes and instances"


MultiGrain is a neural network architecture that solves both image classification and image retrieval tasks.

The method is described in "MultiGrain: a unified image embedding for classes and instances" (arXiv link).

BibTeX reference:

       author = {Berman, Maxim and J{\'e}gou, Herv{\'e} and Vedaldi, Andrea and
         Kokkinos, Iasonas and Douze, Matthijs},
        title = "{{MultiGrain}: a unified image embedding for classes and instances}",
      journal = {arXiv e-prints},
         year = "2019",
        month = "Feb",

Please cite it if you use it.


The MultiGrain code requires

  • Python 3.5 or higher
  • PyTorch 1.0 or higher

and the packages listed in requirements.txt.

The requirements can be installed:

  • Either by setting up a dedicated conda environment: conda env create -f environment.yml followed by source activate multigrain
  • Or with pip: pip install -r requirements.txt

Using the code

Extracting features with pre-trained networks

We provide pre-trained networks with ResNet-50 trunks for the following settings (top-1 accuracies given at scale 224):

λ    p  augmentation  top-1 (%)  weights
1    1  full          76.8       joint_1B_1.0.pth
1    3  full          76.9       joint_3B_1.0.pth
0.5  1  full          77.0       joint_1B_0.5.pth
0.5  3  full          77.4       joint_3B_0.5.pth
0.5  3  autoaugment   78.2       joint_3BAA_0.5.pth

We also provide networks fine-tuned for scales larger than 224, as described in Supplementary E. Only the pooling exponent is fine-tuned:

network          scale   p    top-1 (%)  weights
NASNet-A-Mobile  350 px  1.7  75.1       joint_1B_1.0.pth
SENet154         400 px  1.6  83.0       joint_3B_1.0.pth
PNASNet-5-Large  500 px  1.7  83.6       joint_1B_0.5.pth

To load a network, use the following PyTorch code:

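import torch
from multigrain.lib import get_multigrain

# Build the MultiGrain model with a ResNet-50 trunk
net = get_multigrain('resnet50')

# Load a downloaded checkpoint from the first table above (here λ=1, p=1)
checkpoint = torch.load('joint_1B_1.0.pth')
# Assumption: the checkpoint stores the weights under the 'model_state' key
net.load_state_dict(checkpoint['model_state'])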


The network accepts images at any resolution. A normalization pre-processing step is applied, with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225].
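The normalization step can be sketched as follows. This is a minimal NumPy illustration rather than the repository's own pre-processing code; the function name normalize_image is hypothetical, and it assumes that pixel values are scaled to [0, 1] before the mean and standard deviation are applied, which is the usual convention for these ImageNet statistics.

```python
import numpy as np

# Per-channel statistics quoted above (RGB order)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_image(image_uint8):
    """Convert an H x W x 3 uint8 RGB image to a normalized 3 x H x W float array."""
    # Assumption: pixel values are scaled to [0, 1] before normalization
    scaled = image_uint8.astype(np.float32) / 255.0
    # MEAN and STD broadcast over the last (channel) axis
    normalized = (scaled - MEAN) / STD
    # Channels-first layout, as PyTorch convolutional networks expect
    return normalized.transpose(2, 0, 1)

# The function does not depend on the resolution of the input
image = np.full((300, 500, 3), 128, dtype=np.uint8)
x = normalize_image(image)
print(x.shape)
```

A batch dimension still has to be added before such an array is converted to a tensor and passed to the network.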

The pretrained weights do not include whitening of the features (important for retrieval), which is specific to each evaluation scale; follow the steps below to compute and apply a whitening.

Evaluation of the networks

scripts/ evaluates the network on standard benchmarks.

Classification results

Evaluating a network on ImageNet-val is straightforward using the options of the evaluation script. For instance, the following command:

IMAGENET_PATH=  # the path that contains the /val and /train image directories

python scripts/ --expdir experiments/joint_3B_0.5/eval_p4_500 \
--imagenet-path $IMAGENET_PATH --input-size 500 --dataset imagenet-val \
--pooling-exponent 4 --resume-from joint_3B_0.5.pth

using the joint_3B_0.5.pth pretrained weights, should reproduce the top-1/top-5 results of 78.6%/94.4% given in Table 2 of the article for ResNet-50 MultiGrain with p=3, λ=0.5, p*=4 and scale s*=500.
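For reference, the top-1 and top-5 figures are standard top-k accuracies: a prediction counts as correct when the true label is among the k highest-scoring classes. The sketch below illustrates the metric in NumPy on made-up scores; it is not the repository's evaluation code, and topk_accuracy is a hypothetical helper name.

```python
import numpy as np

def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    # Indices of the k largest scores in each row, best first
    topk = np.argsort(-scores, axis=1)[:, :k]
    # A sample is a hit if its label appears anywhere in its top-k list
    hits = (topk == labels[:, None]).any(axis=1)
    return 100.0 * hits.mean()

# Toy example: 3 samples, 3 classes (illustrative values only)
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 1, 0])

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(top1, top2)
```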

Retrieval results

The implementation of the evaluation on the retrieval benchmarks is in progress, but the dataloaders implemented in datasets/ can already be used for this purpose.


The training is performed in three steps. See the help (-h flag) for the detailed parameter list of each script. Only the initial joint training script benefits from multi-GPU hardware; the remaining scripts are not parallelized.

Joint training

scripts/ trains a MultiGrain architecture.

Important parameters:

  • --repeated-augmentations: number of repeated augmentations in the batches, N=3 was used in our joint trainings; N=1 is vanilla uniform sampling.
  • --pooling-exponent: pooling exponent in GeM pooling, p=1: vanilla average pooling.
  • --classif-weight: weighting factor between margin loss and classification loss (parameter λ in paper)

Other useful parameters:

  • --expdir: dedicated directory for the experiments
  • --resume-from: takes either an expdir or a model checkpoint file to restore from
  • --pretrained-backbone: initializes the backbone weights from the model zoo

Input size fine-tuning of GeM exponent

scripts/ determines the optimal p* for a given input resolution by fine-tuning (see Supplementary E in the paper for details). Alternatively, one may use cross-validation to determine p*, as done in the main article.
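The exponent tuned in this step is the GeM pooling exponent. As a reminder of what it controls, here is a minimal NumPy sketch of generalized-mean pooling over a C x H x W activation map: p=1 gives average pooling, and larger values move the result towards max pooling. The function name gem_pool and the eps clamp are assumptions made for this illustration, not the repository's implementation.

```python
import numpy as np

def gem_pool(features, p, eps=1e-6):
    """Generalized-mean pooling of a C x H x W activation map into a C-dimensional vector."""
    # Clamp to a small positive value so that fractional powers are well defined
    clamped = np.maximum(features, eps)
    # (mean over spatial positions of x^p) ^ (1/p), computed per channel
    return np.mean(clamped ** p, axis=(1, 2)) ** (1.0 / p)

rng = np.random.default_rng(0)
features = rng.random((4, 7, 7))     # stand-in for non-negative (post-ReLU) activations

avg = gem_pool(features, p=1)        # matches plain average pooling
sharp = gem_pool(features, p=3)      # gives more weight to strong activations
near_max = gem_pool(features, p=50)  # close to, but never above, max pooling
print(avg, sharp, near_max)
```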

Whitening of the retrieval features

scripts/ computes a PCA whitening and modifies the network accordingly, integrating the reversed transformation in the fully-connected classification layer as described in the article. The script takes a list and a directory of whitening images; the list given in data/whiten.txt is relative to the Multimedia Commons file structure.
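As background, PCA whitening can be sketched in a few lines of NumPy: estimate the mean and covariance of a set of descriptors, then rotate and rescale so that the transformed descriptors have identity covariance. This is a generic illustration on synthetic data, not the repository's script; the function names are hypothetical, and the final L2 normalization is an assumption based on common retrieval practice.

```python
import numpy as np

def fit_pca_whitening(descriptors, eps=1e-9):
    """Estimate the mean and whitening matrix from an N x D array of descriptors."""
    mean = descriptors.mean(axis=0)
    centered = descriptors - mean
    cov = centered.T @ centered / (len(descriptors) - 1)
    # eigh is appropriate because the covariance matrix is symmetric
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Column j is eigenvector j scaled by 1 / sqrt(eigenvalue j)
    whiten = eigvecs / np.sqrt(eigvals + eps)
    return mean, whiten

def apply_whitening(descriptors, mean, whiten):
    """Whiten descriptors, then L2-normalize them for cosine-similarity search."""
    projected = (descriptors - mean) @ whiten
    norms = np.linalg.norm(projected, axis=1, keepdims=True)
    return projected / norms

# Synthetic correlated descriptors standing in for features of the whitening images
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.3, 0.0],
                   [0.0, 0.0, 0.5, 0.1],
                   [0.0, 0.0, 0.0, 1.5]])
data = rng.normal(size=(500, 4)) @ mixing

mean, whiten = fit_pca_whitening(data)
whitened = (data - mean) @ whiten          # identity covariance before normalization
normalized = apply_whitening(data, mean, whiten)
print(np.round(np.cov(whitened, rowvar=False), 3))
```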

Example training procedure

For example, the results with p=3 and λ=0.5 at scale s*=500 can be obtained with

# train network
python scripts/ --expdir experiments/joint_3B_0.5 --repeated-augmentations 3 \
--pooling-exponent 3 --classif-weight 0.5 --imagenet-path $IMAGENET_PATH

# fine-tune p*
python scripts/ --expdir experiments/joint_3B_0.5/finetune500 \
--resume-from experiments/joint_3B_0.5 --input-size 500 --imagenet-path $IMAGENET_PATH

# whitening 
python scripts/ --expdir experiments/joint_3B_0.5/finetune500_whitened \
--resume-from experiments/joint_3B_0.5/finetune500 --input-size 500 --whiten-path $WHITEN_PATH

Fine-tuning existing network

In Appendix E, we report fine-tuning results on several pretrained networks. This experiment can be reproduced using the script. For example, in the case of SENet154 at scale s*=450, the following command should yield 83.1 top-1 accuracy with p*=1.6:

python scripts/ --expdir experiments/se154/finetune450 \
--pretrained-backbone --imagenet-path $IMAGENET_PATH --input-size 450 --backbone senet154 \


See the CONTRIBUTING file for how to help out.


MultiGrain is CC BY-NC 4.0 licensed, as found in the LICENSE file.

The AutoAugment implementation is based on an external implementation. The Distance Weighted Sampling and margin loss implementation is based on the authors' implementation.
