<a href="https://colab.research.google.com/github/PierreLeveau/automl/blob/main/notebooks/image_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image Classification Using AutoML

In this notebook, we show how to create an image classification model with AutoML and use it to pre-annotate a dataset on the [Kili Platform](https://cloud.kili-technology.com/label/).

## Install

We first follow the installation procedure explained in the [README.md](https://github.com/kili-technology/automl/blob/main/README.md).

```python
!git clone https://github.com/kili-technology/automl.git
```

```
Cloning into 'automl'...
remote: Enumerating objects: 2479, done.
remote: Counting objects: 100% (875/875), done.
remote: Compressing objects: 100% (355/355), done.
remote: Total 2479 (delta 586), reused 755 (delta 516), pack-reused 1604
Receiving objects: 100% (2479/2479), 28.78 MiB | 30.01 MiB/s, done.
Resolving deltas: 100% (1211/1211), done.
```


```python
%cd automl
```

```
/content/automl/automl
```


Install the packages. This should take less than a minute. 

```python
%%capture
!git submodule update --init
!pip install -r requirements.txt -r kiliautoml/utils/ultralytics/yolov5/requirements.txt
!pip install -e .
!pip install kili
```

## Imports

```python
import os
from getpass import getpass
from tqdm.autonotebook import tqdm, trange

from kili.client import Kili
```

Set up the Python path so that the `kiliautoml` package can be found.

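```python
KILI_URL = "https://cloud.kili-technology.com/"
# Append the repository root to PYTHONPATH, creating the variable if it is not set yet
existing_pythonpath = os.environ.get("PYTHONPATH")
os.environ["PYTHONPATH"] = f"{existing_pythonpath}:/content/automl/" if existing_pythonpath else "/content/automl/"
```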
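Setting `os.environ["PYTHONPATH"]` does not change the `sys.path` of the kernel that is already running; the variable is read when a new Python process starts, such as the `!kiliautoml` commands used later in this notebook. The sketch below (the helper name `child_sys_path` is ours, for illustration only) launches a fresh interpreter with an extended `PYTHONPATH` and returns the search path that process sees.

```python
import os
import subprocess
import sys


def child_sys_path(extra_dir: str) -> list:
    """Return sys.path as seen by a new Python process whose PYTHONPATH includes extra_dir."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # Append to an existing PYTHONPATH, or create it if it is not set
    env["PYTHONPATH"] = f"{existing}{os.pathsep}{extra_dir}" if existing else extra_dir
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


repo_dir = "/content/automl/"
# Python may normalize the entry (e.g. drop the trailing slash), so compare without it
seen_by_child = any(p.rstrip("/") == repo_dir.rstrip("/") for p in child_sys_path(repo_dir))
print(seen_by_child)
```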

After getting your API key from the Kili platform, enter it below and set the API endpoint.

```python
api_key = getpass("Add your API Key here: ")
api_endpoint = f"{KILI_URL}api/label/v2/graphql"  # If you are not using Kili SaaS, change the endpoint to your configuration
```

```
Add your API Key here: ··········
```
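If you run the notebook outside Colab, or against a Kili instance that is not the SaaS one, it can help to build the endpoint from a base URL and to read the API key from an environment variable before falling back to the prompt. This is a minimal sketch: the helper names and the `KILI_API_KEY` variable name are our own assumptions, not something this notebook or KiliAutoML requires.

```python
import os
from getpass import getpass

GRAPHQL_PATH = "api/label/v2/graphql"  # path used for the SaaS endpoint above


def build_api_endpoint(base_url: str) -> str:
    """Join a Kili base URL and the GraphQL path, with or without a trailing slash."""
    return f"{base_url.rstrip('/')}/{GRAPHQL_PATH}"


def read_api_key(env_var: str = "KILI_API_KEY") -> str:
    """Use the key stored in env_var (hypothetical name) if set, otherwise prompt for it."""
    key = os.environ.get(env_var)
    return key if key else getpass("Add your API Key here: ")


print(build_api_endpoint("https://cloud.kili-technology.com/"))
```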


## Set up a mock Kili project

Set up the Kili connection.

```python
kili = Kili(api_key=api_key, api_endpoint=api_endpoint)
```

### Create the project

```python
json_interface = {
    "jobRendererWidth": 0.2,
    "jobs": {
        "CLASSIFICATION_JOB": {
            "mlTask": "CLASSIFICATION",
            "content": {
                "categories": {
                    "VEHICLE": {
                        "name": "vehicle"
                    },
                    "NON_VEHICLE": {
                        "name": "non vehicle"
                    }
                },
                "input": "radio"
            },
            "required": 0,
            "isChild": False,
            "instruction": "Class of vehicle presence"
        }
    }
}
```
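The keys under `categories` (`VEHICLE`, `NON_VEHICLE`) are the names that a label's `json_response` references, as in the labeling cells of this notebook, while `name` is the display text. A small helper (hypothetical, not part of the Kili SDK) can build the response later passed to `append_to_labels` and fail early on a typo in a category name:

```python
def classification_response(json_interface: dict, job_name: str, category: str) -> dict:
    """Build a single-category json_response, checking the category against the interface."""
    categories = json_interface["jobs"][job_name]["content"]["categories"]
    if category not in categories:
        raise ValueError(
            f"Unknown category {category!r} for job {job_name!r}; "
            f"expected one of {sorted(categories)}"
        )
    return {job_name: {"categories": [{"name": category}]}}


# Trimmed copy of the interface above, so the example runs on its own
example_interface = {
    "jobs": {
        "CLASSIFICATION_JOB": {
            "content": {
                "categories": {
                    "VEHICLE": {"name": "vehicle"},
                    "NON_VEHICLE": {"name": "non vehicle"},
                }
            }
        }
    }
}

print(classification_response(example_interface, "CLASSIFICATION_JOB", "VEHICLE"))
```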

```python
project_id = kili.create_project(
    title="Vehicle Classification",
    description="Classify vehicle presence",
    input_type="IMAGE",
    json_interface=json_interface,
)["id"]
```

### Add assets

```python
vehicle_assets = [
    {
        "externalId": f"{i}",
        "content": f"https://storage.googleapis.com/kili-machine-learning-automl/notebooks/vehicle_classification/vehicles/image_{i}.png",
        "metadata": {}
    }
    for i in range(500)
]
non_vehicle_assets = [
    {
        "externalId": f"{len(vehicle_assets) + i}",
        "content": f"https://storage.googleapis.com/kili-machine-learning-automl/notebooks/vehicle_classification/non-vehicles/image_{i}.png",
        "metadata": {}
    }
    for i in range(500)
]
assets_to_import = vehicle_assets + non_vehicle_assets
```
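Because the vehicle images get external IDs `0` to `499` and the non-vehicle images `500` to `999`, the ground-truth class of every asset can be recovered from its external ID. The helper below is our own illustration; it is handy for checking which labels are deliberately corrupted in the label error section.

```python
N_PER_CLASS = 500  # 500 vehicle images followed by 500 non-vehicle images


def true_category(external_id: str, n_per_class: int = N_PER_CLASS) -> str:
    """Return the ground-truth category implied by the external ID scheme above."""
    index = int(external_id)
    if not 0 <= index < 2 * n_per_class:
        raise ValueError(f"external ID {external_id!r} is outside 0..{2 * n_per_class - 1}")
    return "VEHICLE" if index < n_per_class else "NON_VEHICLE"


print(true_category("0"), true_category("999"))
```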

Now we send the data to our Kili project.

```python
external_id_array = [a.get("externalId") for a in assets_to_import]
content_array = [a.get("content") for a in assets_to_import]
json_metadata_array = [a.get("metadata") for a in assets_to_import]
kili.append_many_to_dataset(
    project_id=project_id,
    content_array=content_array,
    external_id_array=external_id_array,
    json_metadata_array=json_metadata_array,
)
```

```
{'id': 'cl48kw7rx07e70lm9ejgn9fvg'}
```

### Add labels to assets

We label 600 of the 1,000 assets (the first 300 vehicle images and the first 300 non-vehicle images) to simulate a project where only part of the data has been labeled and we want to predict the labels of the rest.

```python
asset_ids = kili.assets(project_id=project_id, fields=["id", "externalId"])
asset_ids[0]
```

```
100%|██████████| 1000/1000 [00:02<00:00, 375.61it/s]

{'externalId': '0', 'id': 'cl48kw9n7189v0lpud8xk08no'}
```

```python
asset_ids = kili.assets(project_id=project_id, fields=["id", "externalId"], first=1000)

for asset_id in tqdm(asset_ids):
    external_id = int(asset_id["externalId"])
    if external_id < 300:
        # Label the first 300 vehicle images (external IDs 0 to 299)
        kili.append_to_labels(label_asset_id=asset_id["id"],
                              json_response={
                                  "CLASSIFICATION_JOB": {
                                      "categories": [{"name": "VEHICLE"}]
                                  }
                              })
    elif 499 < external_id < 800:
        # Label the first 300 non-vehicle images (external IDs 500 to 799)
        kili.append_to_labels(label_asset_id=asset_id["id"],
                              json_response={
                                  "CLASSIFICATION_JOB": {
                                      "categories": [{"name": "NON_VEHICLE"}]
                                  }
                              })
    # The remaining 400 assets stay unlabeled
```

```
100%|██████████| 1000/1000 [00:02<00:00, 412.91it/s]
```
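The selection rule in the loop above can be written as a pure function, which makes it easy to check the split before touching the project: 300 vehicle labels, 300 non-vehicle labels and 400 unlabeled assets, consistent with the 600 assets downloaded for training and the 400 assets predicted in the logs below. The function name is ours, for illustration.

```python
from collections import Counter
from typing import Optional


def seed_label(external_id: int) -> Optional[str]:
    """Category assigned by the labeling loop above, or None if the asset stays unlabeled."""
    if external_id < 300:
        return "VEHICLE"
    if 499 < external_id < 800:
        return "NON_VEHICLE"
    return None


split = Counter(seed_label(i) for i in range(1000))
print(split)
```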

You can now click on the following link to see the assets in your project:

```python
print(f"{KILI_URL}label/projects/{project_id}/menu/queue?currentPage=1&pageSize=20")
```

```
https://cloud.kili-technology.com/label/projects/cl48kw7rx07e70lm9ejgn9fvg/menu/queue?currentPage=1&pageSize=20
```


## Training an image classifier with KiliAutoML

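The following command automatically downloads the labeled data from your Kili project, chooses a model suited to your task, trains it on this data and saves it locally. For this classification job, it defaults to the `efficientnet_b0` model from `torchvision`, as shown in the logs below.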

```python
!kiliautoml train \
    --api-key {api_key} \
    --project-id {project_id} \
    --epochs 30
```

```
Loading KiliAutoML...
100% 1/1 [00:00<00:00,  2.03it/s]
kili: Downloading assets with status in ['LABELED', 'TO_REVIEW', 'REVIEWED'] from Kili project
100% 600/600 [00:01<00:00, 323.55it/s]
kili: Training on job: CLASSIFICATION_JOB
kili: defaulting to model_repository=torchvision
kili: defaulting to model_name=efficientnet_b0
kili: Downloading images to folder /root/.cache/kili/automl/cl48kw7rx07e70lm9ejgn9fvg/CLASSIFICATION_JOB/torchvision/data
Downloading images: 100% 600/600 [04:49<00:00,  2.07it/s]
Initialization of the model with N=2 classes
Downloading: "https://download.pytorch.org/models/efficientnet_b0_rwightman-3dd342df.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b0_rwightman-3dd342df.pth
100% 20.5M/20.5M [00:01<00:00, 18.7MB/s]
Training - Epoch: 100% 30/30 [02:23<00:00,  4.78s/it]
kili:
job_name              training_loss
------------------  ---------------
CLASSIFICATION_JOB         0.
```

### Send predictions

Now we can use the locally trained model to predict the classes of the unlabeled image assets (status `TODO` or `ONGOING`) and send the predictions to the Kili project. Annotators can then validate or correct these pre-annotations.

```python
!kiliautoml predict \
    --api-key {api_key} \
    --project-id {project_id}
```

```
Loading KiliAutoML...
100% 1/1 [00:00<00:00,  2.08it/s]
kili: Downloading assets with status in ['TODO', 'ONGOING'] from Kili project
100% 400/400 [00:01<00:00, 332.85it/s]
kili: Predicting annotations for job: CLASSIFICATION_JOB
kili: defaulting to model_repository=torchvision
kili: defaulting to model_name=efficientnet_b0
kili: Downloading images to folder /root/.cache/kili/automl/cl48kw7rx07e70lm9ejgn9fvg/CLASSIFICATION_JOB/torchvision/data
Downloading images: 100% 400/400 [03:12<00:00,  2.08it/s]
Initialization of the model with N=2 classes
kili: JobPredictions: 400 predictions successfully created for job CLASSIFICATION_JOB.
```


Now you can check that your assets have predictions on [Kili](https://cloud.kili-technology.com/)!

```python
print(f"{KILI_URL}label/projects/{project_id}/menu/queue?currentPage=1&pageSize=20")
```
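Conceptually, each prediction turns the model's scores for the two classes into one category per asset. The sketch below shows that step with a softmax over raw scores. It is not the code that `kiliautoml predict` runs, and the `confidence` field (an integer percentage from 0 to 100) and the helper names are assumptions for illustration, so check the Kili `json_response` documentation before relying on this format.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def to_json_response(scores: List[float], class_names: List[str],
                     job_name: str = "CLASSIFICATION_JOB") -> Dict:
    """Pick the most likely class; the 'confidence' field is an assumed format."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    category = {"name": class_names[best], "confidence": round(100 * probs[best])}
    return {job_name: {"categories": [category]}}


print(to_json_response([2.0, -1.0], ["VEHICLE", "NON_VEHICLE"]))
```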

### Label Errors

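We now add wrong labels to a small portion of the data to simulate a project where some assets are incorrectly labeled: the 10 vehicle images with external IDs 300 to 309 are labeled `NON_VEHICLE`, and the 10 non-vehicle images with external IDs 800 to 809 are labeled `VEHICLE`.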


```python
asset_ids = kili.assets(project_id=project_id, fields=["id", "externalId"])

for asset_id in tqdm(asset_ids):
    external_id = int(asset_id["externalId"])
    if 299 < external_id < 310:
        # Vehicle images 300 to 309 receive the wrong NON_VEHICLE label
        kili.append_to_labels(label_asset_id=asset_id["id"],
                              json_response={
                                  "CLASSIFICATION_JOB": {
                                      "categories": [{"name": "NON_VEHICLE"}]
                                  }
                              })
    elif 799 < external_id < 810:
        # Non-vehicle images 800 to 809 receive the wrong VEHICLE label
        kili.append_to_labels(label_asset_id=asset_id["id"],
                              json_response={
                                  "CLASSIFICATION_JOB": {
                                      "categories": [{"name": "VEHICLE"}]
                                  }
                              })
```

```
100%|██████████| 1000/1000 [00:02<00:00, 365.53it/s]
```
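Given how the external IDs were assigned, the rule in this cell flips the class of 20 previously unlabeled assets. A small check (our own helpers, for illustration) confirms that every injected label disagrees with the ground truth and that the number of labeled assets grows from 600 to 620, the count downloaded by the `label_errors` command below.

```python
from typing import Optional


def ground_truth(index: int) -> str:
    """Ground truth from the external ID scheme: 0-499 vehicles, 500-999 non-vehicles."""
    return "VEHICLE" if index < 500 else "NON_VEHICLE"


def injected_label(index: int) -> Optional[str]:
    """Wrong label added by the cell above, or None if the asset was left untouched."""
    if 299 < index < 310:
        return "NON_VEHICLE"
    if 799 < index < 810:
        return "VEHICLE"
    return None


injected = {i: injected_label(i) for i in range(1000) if injected_label(i) is not None}
all_wrong = all(label != ground_truth(i) for i, label in injected.items())
# Assets labeled in the first labeling cell (external IDs 0-299 and 500-799)
previously_labeled = sum(1 for i in range(1000) if i < 300 or 499 < i < 800)
print(len(injected), all_wrong, previously_labeled + len(injected))
```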

This command analyses the labeled data to detect potential labeling errors and flags the affected assets in Kili. You can then use the `potential_label_error` filter in the project's asset exploration view to find the potentially problematic assets.

```python
!kiliautoml label_errors \
    --api-key {api_key} \
    --project-id {project_id}
```

```
Loading KiliAutoML...
100% 1/1 [00:00<00:00,  2.08it/s]
kili: Detecting errors for job: CLASSIFICATION_JOB
kili: Downloading assets with status in ['LABELED', 'TO_REVIEW', 'REVIEWED'] from Kili project
100% 620/620 [00:01<00:00, 313.20it/s]
kili: defaulting to model_repository=torchvision
kili: defaulting to model_name=efficientnet_b0
kili: Downloading images to folder /root/.cache/kili/automl/cl48kw7rx07e70lm9ejgn9fvg/CLASSIFICATION_JOB/torchvision/data
Downloading images: 100% 620/620 [00:01<00:00, 474.72it/s]
Training and predicting on several folds:   0% 0/4 [00:00<?, ?it/s]
Initialization of the model with N=2 classes
Training - Epoch:   0% 0/50 [00:00<?, ?it/s]
Training - Epoch:   2% 1/50 [00:04<03:48,  4.66s/it]
Training - Epoch:   4% 2/50 [00:08<03:12,  4.01s/it]
Training - Epoch:   6% 3/50 [00:11<02:58,  3.79s/it]
Training - Epoch:   8% 4/50 [00:15<02:50,  3.71s/it]
Training - Epoch:  10% 5/50 [00:
```
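The log shows that `label_errors` trains and predicts on 4 folds, so each labeled asset can be scored by a model that did not see it during training. A common way to use such out-of-fold predictions is to flag an asset when the prediction disagrees with its label; tools such as cleanlab refine this rule with predicted probabilities. The toy sketch below applies the simple disagreement rule with a nearest-centroid classifier on one-dimensional features. It only illustrates the idea and is not KiliAutoML's implementation, which trains `efficientnet_b0` on each fold as the log shows.

```python
import random
from statistics import mean
from typing import Dict, List


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(xs: List[float], ys: List[str]) -> Dict[str, float]:
    """Nearest-centroid 'training': the mean feature value of each label."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in sorted(set(ys))}


def predict(centroids: Dict[str, float], x: float) -> str:
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def flag_label_errors(xs: List[float], ys: List[str], k: int = 4, seed: int = 0) -> List[int]:
    """Return indices whose out-of-fold prediction disagrees with the given label."""
    flagged = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        centroids = fit_centroids(train_x, train_y)
        flagged.extend(i for i in fold if predict(centroids, xs[i]) != ys[i])
    return sorted(flagged)


# Toy data: vehicles have small feature values, non-vehicles large ones
features = [0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15]
labels = ["VEHICLE"] * 6 + ["NON_VEHICLE"] * 6
labels[2] = "NON_VEHICLE"  # injected error
labels[8] = "VEHICLE"      # injected error
print(flag_label_errors(features, labels))
```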

In the project, you can see that several incorrectly labeled images have been flagged.

```python
print(f"{KILI_URL}label/projects/{project_id}/menu/queue?currentPage=1&metadata%5Blabeling_error%5D=true&pageSize=20")
```

```
https://cloud.kili-technology.com/label/projects/cl48kw7rx07e70lm9ejgn9fvg/menu/queue?currentPage=1&metadata%5Blabeling_error%5D=true&pageSize=20
```
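The queue links in this notebook are built by hand with f-strings. A sketch that builds them with `urllib.parse.urlencode` instead, which percent-encodes the brackets of the `metadata[labeling_error]` filter (`%5B`, `%5D`) the same way as in the URL above; the helper name `queue_url` is ours, for illustration.

```python
from typing import Iterable, Tuple
from urllib.parse import urlencode


def queue_url(base_url: str, project_id: str,
              extra_params: Iterable[Tuple[str, str]] = ()) -> str:
    """Build a Kili queue URL; extra_params go between currentPage and pageSize."""
    params = [("currentPage", "1"), *extra_params, ("pageSize", "20")]
    return f"{base_url.rstrip('/')}/label/projects/{project_id}/menu/queue?{urlencode(params)}"


project = "cl48kw7rx07e70lm9ejgn9fvg"  # project ID taken from the outputs above
print(queue_url("https://cloud.kili-technology.com/", project, [("metadata[labeling_error]", "true")]))
```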
