# Perform image tagging with pollen-vision

Learn how to perform image tagging with the pollen-vision library, using RAM++.

This notebook shows how to use our wrapper for the RAM++ image tagging model, developed by Xinyu Huang et al. at the OPPO Research Institute.

## A word on RAM++

RAM stands for Recognize Anything Model. RAM is an image tagging model that can recognize any common object category with high accuracy. RAM++, the newest generation of RAM, can also perform zero-shot image tagging: the model can tag images with any object, provided you give it a description of that object.
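To build some intuition for description-based zero-shot tagging, here is a deliberately simplified sketch: each tag is represented by several description embeddings, the image embedding is compared to them with cosine similarity, and the tag is kept if the average similarity reaches a threshold. The toy embeddings, the averaging and the `threshold` value are assumptions for illustration only. RAM++ itself relies on learned image and text encoders and a more elaborate way of combining the descriptions, so this is not its actual architecture.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def tag_scores(image_emb: np.ndarray, description_embs: dict) -> dict:
    """Average cosine similarity between the image and each tag's descriptions.

    image_emb has shape (D,); description_embs maps tag -> array of shape (N, D).
    """
    img = l2_normalize(image_emb)
    return {
        tag: float((l2_normalize(descs) @ img).mean())
        for tag, descs in description_embs.items()
    }


def tag_image(image_emb: np.ndarray, description_embs: dict, threshold: float = 0.5) -> list:
    """Return the sorted tags whose averaged similarity reaches the threshold."""
    scores = tag_scores(image_emb, description_embs)
    return sorted(tag for tag, score in scores.items() if score >= threshold)


# Toy 3-D "embeddings" standing in for real encoder outputs (hypothetical values).
image = np.array([1.0, 0.0, 0.0])
descriptions = {
    "croissant": np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "mug": np.array([[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]),
}
print(tag_image(image, descriptions))  # ['croissant']
```

In this toy setup, supporting a new tag only means adding its descriptions, with no retraining, which is the property the rest of this notebook builds on.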

This is very useful in robotics because applications can adapt the robot's behavior to its environment. For example, if we ask the robot to grasp a mug, it can first check whether there is a mug to grasp and, if not, perform another behavior.
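As a minimal sketch of that idea, the function below maps detected tags to a behavior and falls back to a default one when no known object is present. The behavior names and the `BEHAVIORS` table are hypothetical, and the tags are assumed to arrive as a list of strings.

```python
# Hypothetical mapping from a tag to the behavior the robot should run.
BEHAVIORS = {
    "mug": "grasp_mug",
    "croissant": "serve_croissant",
}


def choose_behavior(detected_tags, behaviors=BEHAVIORS, fallback="explore"):
    """Return the behavior of the first known object that was detected, else the fallback."""
    tags = {tag.strip().lower() for tag in detected_tags}
    for obj, behavior in behaviors.items():  # dicts keep insertion order
        if obj in tags:
            return behavior
    return fallback


print(choose_behavior(["Mug", "chair"]))  # grasp_mug
print(choose_behavior(["chair"]))  # explore
```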

You can find the RAM++ paper [here](https://arxiv.org/abs/2310.15200). Our wrapper for RAM uses [the implementation](https://github.com/xinyu1205/recognize-anything?tab=readme-ov-file) by the authors of the paper; credit to them!

## Setup environment

> Note: If you are working locally on your machine and have already installed the library from source, skip the following steps.

We first need to install the pollen-vision library from source. This might take a couple of minutes, as the dependencies are quite heavy.

> If you are on Colab and a warning window pops up saying "You must restart the runtime in order to use the newly installed versions.", don't worry. Just press "Restart session" and don't run the pip install cell again: pollen-vision will already be installed.

In [None]:
!pip install "pollen-vision[vision] @ git+https://github.com/pollen-robotics/pollen-vision.git@main"
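To avoid reinstalling the library after a runtime restart, one option is to check whether the package can already be found before calling pip. This is a sketch using only the standard library; the helper names are hypothetical, and the requirement string in the usage comment is the one from the cell above.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name: str) -> bool:
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_if_missing(module_name: str, requirement: str) -> bool:
    """Run pip for `requirement` only when `module_name` is missing; return True if pip ran."""
    if is_installed(module_name):
        return False
    subprocess.check_call([sys.executable, "-m", "pip", "install", requirement])
    return True


# Usage in the notebook:
# pip_install_if_missing(
#     "pollen_vision",
#     "pollen-vision[vision] @ git+https://github.com/pollen-robotics/pollen-vision.git@main",
# )
```

Note that this only checks that `pollen_vision` itself is importable, not that the optional `[vision]` dependencies are present.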

To use RAM in Colab, a few more steps are needed:

*   download the model weights
*   download the configuration file needed by the model, which contains descriptions of a few objects to identify

In [None]:
!wget https://huggingface.co/xinyu1205/recognize-anything-plus-model/resolve/main/ram_plus_swin_large_14m.pth

!wget https://raw.githubusercontent.com/pollen-robotics/pollen-vision/develop/pollen_vision/pollen_vision/vision_models/object_detection/recognize_anything/objects_descriptions/example_objects_descriptions.json

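Move the downloaded files to the correct locations and the setup is done! The paths below point to a Python 3.10 installation (`/usr/local/lib/python3.10/dist-packages`); adjust them if your runtime uses another Python version.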

In [None]:
!mkdir -p /usr/local/lib/python3.10/dist-packages/checkpoints

!mv ram_plus_swin_large_14m.pth /usr/local/lib/python3.10/dist-packages/checkpoints

!mkdir -p /usr/local/lib/python3.10/dist-packages/pollen_vision/vision_models/object_detection/recognize_anything/objects_descriptions

!mv example_objects_descriptions.json /usr/local/lib/python3.10/dist-packages/pollen_vision/vision_models/object_detection/recognize_anything/objects_descriptions
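The commands above hard-code the Python 3.10 site-packages directory. Below is a sketch that derives the same folders from wherever `pollen_vision` is actually installed. It assumes, mirroring the paths above, that the checkpoint belongs in a `checkpoints` folder next to the `pollen_vision` package; the helper names are hypothetical.

```python
import importlib.util
import shutil
from pathlib import Path


def package_dir(name: str) -> Path:
    """Directory of an installed package, found without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    if spec.submodule_search_locations:  # regular or namespace package
        return Path(next(iter(spec.submodule_search_locations)))
    return Path(spec.origin).parent  # single-file module


def place_file(src, dest_dir) -> Path:
    """Create dest_dir if needed and move src into it."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / Path(src).name
    shutil.move(str(src), str(target))
    return target


if importlib.util.find_spec("pollen_vision") is not None:
    pv_dir = package_dir("pollen_vision")
    checkpoints_dir = pv_dir.parent / "checkpoints"
    descriptions_dir = (
        pv_dir / "vision_models" / "object_detection" / "recognize_anything" / "objects_descriptions"
    )
    print(checkpoints_dir, descriptions_dir, sep="\n")
    # place_file("ram_plus_swin_large_14m.pth", checkpoints_dir)
    # place_file("example_objects_descriptions.json", descriptions_dir)
```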

## Use the RAM wrapper

In [None]:
import numpy as np
from PIL import Image

from pollen_vision.vision_models.object_detection import RAM_wrapper

To use pollen-vision's RAM wrapper, you just need to provide a description file, referenced by its name without the .json extension. We provide a description file with a few basic objects if you want to try it out.

In [None]:
wrapper = RAM_wrapper(objects_descriptions_filename="example_objects_descriptions")
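If you want to look inside a description file before handing it to the wrapper, a small reader like the one below can help. The layout is an assumption: we suppose the file maps each object name to a list of descriptions, either as a single JSON object or as a list of one-key objects (the layout of the upstream RAM++ repository's LLM tag description files). Check the downloaded example_objects_descriptions.json to confirm before relying on it.

```python
import json
from pathlib import Path


def parse_descriptions(data) -> dict:
    """Normalize either {name: [...]} or [{name: [...]}, ...] into one dict (assumed layouts)."""
    if isinstance(data, dict):
        pairs = data.items()
    else:
        pairs = (pair for item in data for pair in item.items())
    return {name: list(descs) for name, descs in pairs}


def load_descriptions(path) -> dict:
    """Read and normalize a description JSON file."""
    return parse_descriptions(json.loads(Path(path).read_text(encoding="utf-8")))


# Usage with the file moved during setup:
# descs = load_descriptions(
#     "/usr/local/lib/python3.10/dist-packages/pollen_vision/vision_models/object_detection/"
#     "recognize_anything/objects_descriptions/example_objects_descriptions.json"
# )
# for name, texts in descs.items():
#     print(name, len(texts), "descriptions")
```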

## Import example image

Here we will use a test image from the project where Reachy tries to serve a croissant, a crescent-shaped French pastry made from laminated, yeast-leavened dough. Yummy! 🥐

The image comes from the [reachy-doing-things image dataset](https://huggingface.co/datasets/pollen-robotics/reachy-doing-things) available on Hugging Face. In this dataset, we captured images from Reachy's egocentric point of view while it was teleoperated to perform manipulation tasks.

Feel free to try the image tagging with your own image instead!
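To try your own picture, one approach is to open it with PIL and convert it to a 3-channel RGB array before passing it to *infer*, as is done with the dataset image. Converting to RGB is our assumption, so that PNGs with transparency or grayscale images get the same H x W x 3 layout as the dataset images; the file name in the usage comment is hypothetical.

```python
import numpy as np
from PIL import Image


def to_rgb_array(img: Image.Image) -> np.ndarray:
    """Convert a PIL image (RGBA, grayscale, ...) to an H x W x 3 uint8 array."""
    return np.array(img.convert("RGB"))


# Usage (hypothetical file name):
# my_img = Image.open("my_photo.jpg")
# wrapper.infer(to_rgb_array(my_img))
```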

In [None]:
from datasets import load_dataset

dataset = load_dataset("pollen-robotics/reachy-doing-things", split="train")

In [None]:
img = dataset[0]["image"]
img

The object classes that RAM can tag, given the description file provided when instantiating the wrapper, can be checked with the *open_set_categories* attribute.

In [None]:
print(wrapper.open_set_categories)

Let's run RAM to see which objects from its open set it finds in the image.

In [None]:
wrapper.infer(np.array(img))

So here, based on the objects our current RAM wrapper can tag, only two objects are detected in the frame: a humanoid robot and some chairs. But what if we want our wrapper to tell whether a croissant is in the image (to start a croissant grasping task, for example)?

## Generate a new description file

You can easily generate a new description file using pollen-vision with other objects that you want to tag. This file can then be used by the RAM wrapper.

In [None]:
from pollen_vision.vision_models.object_detection import ObjectDescriptionGenerator

💡 Note that you will need an OpenAI API key for this, as the generator uses the GPT-3.5 model to generate the descriptions.
By default, the ObjectDescriptionGenerator object reads the API key from the **OPENAI_API_KEY** environment variable at initialization. If you prefer, you can pass your API key directly to the constructor with the *api_key* argument.

If you're working on Colab, you will need to pass your OpenAI API key.

In [None]:
OPENAI_API_KEY = ""  # Add your OpenAI API key here

if OPENAI_API_KEY:
    # Pass the key explicitly to the constructor
    object_detection_generator = ObjectDescriptionGenerator(api_key=OPENAI_API_KEY)
else:
    # Fall back to the OPENAI_API_KEY environment variable
    object_detection_generator = ObjectDescriptionGenerator()
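Instead of pasting the key into the notebook, you could look it up from several places in order. The sketch below tries an explicit key, then the OPENAI_API_KEY environment variable, then Colab's Secrets panel through `google.colab.userdata`, which only exists on Colab. The helper name and the lookup order are our own choices, not part of pollen-vision.

```python
import os
from typing import Mapping, Optional


def resolve_openai_key(explicit: str = "", env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first API key found, or None if there is none."""
    if explicit:
        return explicit
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return env["OPENAI_API_KEY"]
    try:
        from google.colab import userdata  # only available on Colab

        return userdata.get("OPENAI_API_KEY")
    except Exception:  # not on Colab, or the secret is missing or not shared
        return None


# Usage with the generator from the cell above:
# key = resolve_openai_key(OPENAI_API_KEY)
# object_detection_generator = ObjectDescriptionGenerator(api_key=key) if key else ObjectDescriptionGenerator()
```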

Just call the *generate_descriptions* method to generate descriptions for the objects you want, passing a list of object names as argument. Here we ask for descriptions of our croissant.

In [None]:
new_objects_list = ["croissant"]

objects_descriptions = object_detection_generator.generate_descriptions(objects=new_objects_list, generation_nb_per_object=30)

By default, 10 descriptions are generated per object. This can be changed with the optional *generation_nb_per_object* argument of the *generate_descriptions* method. Because a croissant can be a bit tricky to tag, we ask for 30 descriptions here.

In [None]:
print(objects_descriptions)

You can save the descriptions you just generated to a JSON file with the *save_descriptions* method, so that you can later use them with RAM. The file is saved in the objects_descriptions folder.

In [None]:
object_detection_generator.save_descriptions(descriptions=objects_descriptions, descriptor_file_name="croissant-descriptor")
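If you do not have an OpenAI key, you could also write a few descriptions by hand. The sketch below serializes them as a list of one-key JSON objects; that layout is an assumption (it mirrors the upstream RAM++ description files), so compare it with a file produced by *save_descriptions* before relying on it. The target folder is the objects_descriptions directory used during setup, and the descriptions and file name are hypothetical.

```python
import json
from pathlib import Path


def descriptions_to_json(descriptions: dict) -> str:
    """Serialize {name: [descriptions]} as [{name: [descriptions]}, ...] (assumed layout)."""
    return json.dumps([{name: list(texts)} for name, texts in descriptions.items()], indent=2)


def write_descriptions(descriptions: dict, path) -> Path:
    """Write the JSON file, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(descriptions_to_json(descriptions), encoding="utf-8")
    return path


# Usage (hypothetical descriptions and file name):
# write_descriptions(
#     {"croissant": ["a flaky, crescent-shaped French pastry", "a golden-brown layered bakery item"]},
#     "/usr/local/lib/python3.10/dist-packages/pollen_vision/vision_models/object_detection/"
#     "recognize_anything/objects_descriptions/handmade-croissant-descriptor.json",
# )
# RAM_wrapper(objects_descriptions_filename="handmade-croissant-descriptor")
```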

You can then use the description file you just generated with RAM.

In [None]:
my_new_ram_wrapper = RAM_wrapper(objects_descriptions_filename="croissant-descriptor")

Let's check the objects our new wrapper is able to tag, to make sure it can tag croissants.

In [None]:
my_new_ram_wrapper.open_set_categories

Let's perform the image tagging with the same image as before.

In [None]:
my_new_ram_wrapper.infer(np.array(img))

Yes, apparently there is a croissant in the image!

Let's try tagging another image, one without a croissant, to check that the tag actually works and does not simply flag a croissant in every image.

In [None]:
non_croissant_img = dataset[21]["image"]
non_croissant_img

In [None]:
my_new_ram_wrapper.infer(np.array(non_croissant_img))

Nice, the croissant tagging seems to work!
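Two images are only a first sanity check. Below is a sketch for measuring how often a tag fires over many images. It assumes the wrapper's output is either a list of tag strings or a single string with tags separated by "|" (the convention of the upstream RAM++ inference scripts), so the helper normalizes both; the helper names are hypothetical.

```python
from typing import Callable, Iterable, Sequence, Union

TagOutput = Union[str, Sequence[str]]


def normalize_tags(output: TagOutput) -> set:
    """Turn "a | b" or ["a", "b"] into a lowercase set of tags."""
    parts = output.split("|") if isinstance(output, str) else output
    return {part.strip().lower() for part in parts if part.strip()}


def tag_rate(images: Iterable, tag_fn: Callable[[object], TagOutput], target: str) -> float:
    """Fraction of images for which tag_fn reports the target tag."""
    hits = [target.lower() in normalize_tags(tag_fn(image)) for image in images]
    return sum(hits) / len(hits) if hits else 0.0


# Usage on the first 50 dataset images (one inference per image, so it can be slow):
# rate = tag_rate(
#     (np.array(dataset[i]["image"]) for i in range(50)),
#     my_new_ram_wrapper.infer,
#     "croissant",
# )
```

Without ground-truth labels, this only tells you how often the tag fires. Comparing it with images you know contain (or lack) a croissant, as done above, is what turns it into a test.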