
Robotoff ANN

This project is archived: we now use Elasticsearch to perform ANN, so we no longer need an external service to serve index files; everything related to ANN is done directly in Robotoff.


This project helps Robotoff categorize logos. Bug tracking is mostly done on the main Robotoff repository.

Tangible results

  • You can see all the crops generated and up for manual annotation in Hunger Games, our gamified annotation engine.
  • Robotoff posts new crops and annotations in the #robotoff-alerts-annotations Slack channel.

Contributing

To set up the project, you must have recent versions of Docker and docker-compose installed.

Use make dev to get a development environment up and running.

make quality will run linters and tests.

Models used in production are published in releases of openfoodfacts-ai.

See the Makefile for more commands.

Architecture

From images we extract logos (logo detection lives in Robotoff). Those logos are embedded in a metric space using a specific model [1].

(Illustration: an artist's view of logo embeddings)

We then use approximate nearest neighbors [2] in this metric space to classify the logos from known examples (k-nearest-neighbors classification).

Those logos will then help apply labels to Open Food Facts products.

The main entry point is the API to get nearest neighbors, either for a logo ID [3] or an embedding vector [4], or to add a new logo from an image [5].

Note that the approximate nearest neighbors index is only regenerated by a specific command [6].
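
To make the classification step concrete, here is a minimal sketch of a KNN vote over an Annoy index (the backend named in footnote 2). The embedding dimension, labels, and data below are illustrative placeholders, not production values.

```python
from annoy import AnnoyIndex
import numpy as np

DIM = 1280  # embedding dimension; the real value depends on the model [1]

# Toy stand-ins for the stored logo embeddings and their known labels.
rng = np.random.default_rng(0)
known_embeddings = rng.normal(size=(100, DIM)).astype("float32")
known_labels = ["brand-a" if i % 2 == 0 else "brand-b" for i in range(100)]

# Offline step: build the approximate nearest neighbors index.
index = AnnoyIndex(DIM, "euclidean")
for item_id, embedding in enumerate(known_embeddings):
    index.add_item(item_id, embedding)
index.build(10)  # number of trees: more trees, better recall, bigger index

# Query step: label a new logo by majority vote among its k nearest
# known neighbors (KNN).
def classify(embedding: np.ndarray, k: int = 5) -> str:
    neighbor_ids = index.get_nns_by_vector(embedding, k)
    votes = [known_labels[i] for i in neighbor_ids]
    return max(set(votes), key=votes.count)

print(classify(known_embeddings[0]))
```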

Preliminary research

How does it work?

  • The ANN /add endpoint works as follows (a sketch of this flow follows the list):
  • From the raw image and the detected bounding boxes, we crop the image to get all detected logos.
  • Each logo is provided as input to the neural network (here an EfficientNet) to get an embedding for each logo.
  • The embedding is saved locally in an HDF5 file (see save_embeddings).
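
A minimal sketch of that flow, assuming a headless EfficientNet from Keras as the embedding model; the bounding-box convention, input size, and HDF5 dataset layout are assumptions for illustration, not the exact production code.

```python
import h5py
import numpy as np
import tensorflow as tf

# Hypothetical embedding model: a headless EfficientNet with average
# pooling, so each crop maps to a single feature vector.
model = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", pooling="avg"
)

def embed_logos(image: np.ndarray, bounding_boxes) -> np.ndarray:
    # Crop each detected logo, resize it to the model input size, and
    # run one batched forward pass to get one embedding per logo.
    crops = [
        tf.image.resize(image[y0:y1, x0:x1], (224, 224))
        for (y0, x0, y1, x1) in bounding_boxes  # assumed pixel coordinates
    ]
    return model.predict(tf.stack(crops), verbose=0)

def save_embeddings(embeddings: np.ndarray, logo_ids: np.ndarray,
                    path: str = "logos.hdf5") -> None:
    # Append embeddings and their logo IDs to resizable HDF5 datasets,
    # creating them on first use.
    with h5py.File(path, "a") as f:
        if "embedding" not in f:
            f.create_dataset("embedding", data=embeddings,
                             maxshape=(None, embeddings.shape[1]))
            f.create_dataset("external_id", data=logo_ids, maxshape=(None,))
        else:
            emb, ids = f["embedding"], f["external_id"]
            n = emb.shape[0]
            emb.resize(n + len(embeddings), axis=0)
            ids.resize(n + len(logo_ids), axis=0)
            emb[n:] = embeddings
            ids[n:] = logo_ids
```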

Annotation

Annotation guidelines

  • There should be as little space as possible between the bounding box and the object. Conversely, the whole object must be included in the bounding box.
  • If the object is partially hidden, mark it as "occluded" (click on the "profile" icon on the object in question, in the right panel).
  • For best results, similar objects must be annotated in the same way (especially concerning the extent of the object). Several annotation scales may exist (cf. the question of the "recycle" pictograms discussed above); what matters most is that the annotations are consistent.
  • Several very similar images, or images concerning the same product, follow one another in the dataset. For the next campaign, it will be better to shuffle the dataset to get as much diversity as possible.

Colab notebooks

Pipeline on Colab

Roadmap

API routes

ANNResource: get the nearest neighbors of a stored logo, by logo ID

/api/v1/ann/{logo_id:int}

ANNResource: the same resource, mounted without a specific logo ID

/api/v1/ann

ANNBatchResource: get nearest neighbors for a batch of logo IDs

/api/v1/ann/batch

ANNEmbeddingResource: get nearest neighbors from a raw embedding vector

/api/v1/ann/from_embedding

AddLogoResource: add new logos from an image and its bounding boxes

/api/v1/ann/add

ANNCountResource: get the number of logos stored in the index

/api/v1/ann/count

ANNStoredLogoResource: check which logos are stored in the index

/api/v1/ann/stored
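
As a usage sketch, the routes above can be queried over plain HTTP. The host, port, logo ID, and response handling below are assumptions for illustration.

```python
import requests

BASE_URL = "http://localhost:5500"  # host and port are an assumption

# Nearest neighbors of an already-stored logo, by its ID.
resp = requests.get(f"{BASE_URL}/api/v1/ann/42")
resp.raise_for_status()
print(resp.json())

# Number of logos currently stored in the index.
resp = requests.get(f"{BASE_URL}/api/v1/ann/count")
resp.raise_for_status()
print(resp.json())
```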

Datasets

Footnotes

  1. see embeddings.generate_embeddings and settings.DEFAULT_MODEL

  2. see api.ANNIndex which currently relies on Annoy

  3. see api.ANNResource and api.ANNBatchResource

  4. see api.ANNEmbeddingResource

  5. see api.AddLogoResource

  6. see manage.generate_index