<a href="https://colab.research.google.com/github/joanvlasschaert/joanvlasschaert/blob/main/notebooks/image_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image Classification Using AutoML

In this notebook, we create an image classification model with AutoML and use it to pre-annotate a dataset on the [Kili Platform](https://cloud.kili-technology.com/label/).

## Setup API key

We first set up the API key and the AutoML path.

In [1]:
from getpass import getpass

In [14]:
KILI_URL="https://cloud.kili-technology.com/"  # If you are not using Kili SaaS, change the url to your configuration

api_endpoint = f"{KILI_URL}api/label/v2/graphql"
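The f-string above only works when `KILI_URL` ends with a slash. A minimal sketch of a more forgiving variant, assuming a hypothetical helper named `build_api_endpoint` (it is not part of the Kili client), normalizes the base URL first:

```python
def build_api_endpoint(base_url: str) -> str:
    """Return the GraphQL endpoint for a Kili base URL (hypothetical helper)."""
    # Strip trailing slashes so that exactly one separator is inserted.
    return f"{base_url.rstrip('/')}/api/label/v2/graphql"


# Both spellings of the base URL give the same endpoint.
print(build_api_endpoint("https://cloud.kili-technology.com/"))
print(build_api_endpoint("https://cloud.kili-technology.com"))
```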

You can get your API key from the [Kili platform](https://cloud.kili-technology.com/label/my-account/api-key) and store it in an environment variable. If you are working locally, set `KILI_API_KEY` in a `.env` file. If the notebook runs on Colab, the API key is requested interactively and the AutoML directory is appended to `PYTHONPATH`.

In [4]:
! pip install python-dotenv
%reload_ext dotenv
%dotenv

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting python-dotenv
  Downloading python_dotenv-0.21.0-py3-none-any.whl (18 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-0.21.0
cannot find .env file


In [5]:
from IPython import get_ipython
import os

if "google.colab" in str(get_ipython()):
    os.environ["PYTHONPATH"] += ":/content/automl/"
    api_key = getpass("Add your API Key here: ")
else:
    api_key = os.getenv("KILI_API_KEY")

Add your API Key here: ··········
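Outside Colab, `os.getenv` returns `None` when `KILI_API_KEY` is missing, and the problem only surfaces later when the client is created. A minimal sketch of failing early, assuming a hypothetical helper named `resolve_api_key`:

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from KILI_API_KEY or raise a clear error (hypothetical helper)."""
    env = os.environ if env is None else env
    api_key = env.get("KILI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "KILI_API_KEY is not set; add it to your .env file or environment."
        )
    return api_key


# An explicit mapping is used here so the example does not depend on your environment.
print(resolve_api_key({"KILI_API_KEY": "my-secret-key"}))
```

Calling `resolve_api_key()` without arguments reads from the real process environment.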


## Install

We will follow the install procedure explained in the [README.md](https://github.com/kili-technology/automl/blob/main/README.md). 

In [6]:
!git clone https://github.com/kili-technology/automl.git

Cloning into 'automl'...
remote: Enumerating objects: 4643, done.
remote: Counting objects: 100% (1539/1539), done.
remote: Compressing objects: 100% (571/571), done.
remote: Total 4643 (delta 1154), reused 1216 (delta 957), pack-reused 3104
Receiving objects: 100% (4643/4643), 45.66 MiB | 28.13 MiB/s, done.
Resolving deltas: 100% (2699/2699), done.


In [7]:
%cd automl

/content/automl


Install the packages. This should take less than a minute. 

In [8]:
%%capture
!git submodule update --init
!pip install torch && pip install -e .

## Imports

In [9]:
from tqdm.autonotebook import tqdm

from kili.client import Kili



## Setup a mock Kili project

Set up the Kili connection.

In [15]:
kili = Kili(api_key=api_key, api_endpoint=api_endpoint)

Please install version: "pip install kili==2.125.1"


### Create the project

In [16]:
json_interface = {
    "jobRendererWidth": 0.2,
    "jobs": {
        "CLASSIFICATION_JOB": {
            "mlTask": "CLASSIFICATION",
            "content": {
                "categories": {
                    "VEHICLE": {
                        "name": "vehicle"
                    },
                    "NON_VEHICLE": {
                        "name": "non vehicle"
                    }
                },
                "input": "radio"
            },
            "required": 0,
            "isChild": False,
            "instruction": "Class of vehicle presence"
        }
    }
}
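The labels added later in this notebook refer to categories by key (`VEHICLE`, `NON_VEHICLE`), not by display name (`vehicle`, `non vehicle`). A small sketch, using a hypothetical helper named `category_keys`, lists the valid keys for each job:

```python
from typing import Dict, List


def category_keys(interface: dict) -> Dict[str, List[str]]:
    """Map each job name to the list of its category keys (hypothetical helper)."""
    return {
        job_name: list(job["content"]["categories"].keys())
        for job_name, job in interface["jobs"].items()
    }


# Trimmed copy of the interface defined above, repeated so the sketch runs on its own.
demo_interface = {
    "jobs": {
        "CLASSIFICATION_JOB": {
            "mlTask": "CLASSIFICATION",
            "content": {
                "categories": {
                    "VEHICLE": {"name": "vehicle"},
                    "NON_VEHICLE": {"name": "non vehicle"},
                },
                "input": "radio",
            },
        }
    }
}

print(category_keys(demo_interface))
# → {'CLASSIFICATION_JOB': ['VEHICLE', 'NON_VEHICLE']}
```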

In [17]:
project = kili.create_project(
    title="AutoML demo - Image Classification (Vehicle Classification)",
    description="Classify vehicle presence",
    input_type="IMAGE",
    json_interface=json_interface
)

In [18]:
project_id = project["id"]

### Add assets

In [19]:
vehicle_assets = [
    {
        "externalId": f"{i}",
        "content": f"https://storage.googleapis.com/kili-machine-learning-automl/notebooks/vehicle_classification/vehicles/image_{i}.png",
        "metadata": {}
    }
    for i in range(500)
]
non_vehicle_assets = [
    {
        "externalId": f"{len(vehicle_assets) + i}",
        "content": f"https://storage.googleapis.com/kili-machine-learning-automl/notebooks/vehicle_classification/non-vehicles/image_{i}.png",
        "metadata": {}
    }
    for i in range(500)
]
assets_to_import = vehicle_assets + non_vehicle_assets
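The external IDs encode the class: `0` to `499` are vehicle images and `500` to `999` are non-vehicle images, and later cells rely on this numbering. The same lists could be built with one helper, sketched here under the assumption of a hypothetical function named `build_assets`:

```python
from typing import Dict, List

BASE_URL = (
    "https://storage.googleapis.com/kili-machine-learning-automl"
    "/notebooks/vehicle_classification"
)


def build_assets(folder: str, count: int, id_offset: int = 0) -> List[Dict]:
    """Build asset dicts for one image folder (hypothetical helper)."""
    return [
        {
            "externalId": str(id_offset + i),
            "content": f"{BASE_URL}/{folder}/image_{i}.png",
            "metadata": {},
        }
        for i in range(count)
    ]


demo_assets = build_assets("vehicles", 500) + build_assets(
    "non-vehicles", 500, id_offset=500
)
external_ids = [a["externalId"] for a in demo_assets]

# External IDs must not collide between the two folders.
assert len(set(external_ids)) == len(external_ids)
print(external_ids[0], external_ids[499], external_ids[500], external_ids[-1])
# → 0 499 500 999
```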

Now we send the data to our Kili project.

In [20]:
external_id_array = [a.get("externalId") for a in assets_to_import]
content_array = [a.get("content") for a in assets_to_import]
json_metadata_array = [a.get("metadata") for a in assets_to_import]
kili.append_many_to_dataset(project_id=project_id, 
                            content_array=content_array,
                            external_id_array=external_id_array, 
                            json_metadata_array=json_metadata_array)

100%|██████████| 1000/1000 [01:01<00:00, 16.23it/s]


{'id': 'clba4yl4g008s0lwmhlu89h7e'}

### Add labels to assets

We label 600 of the 1,000 assets (the first 300 of each class) to simulate a project in which only part of the data is labeled and we want to predict labels for the rest.
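The split used in the next cell can be checked in isolation. The sketch below uses the same thresholds on the external IDs, with a hypothetical helper named `category_for`, and counts how many assets fall into each group:

```python
from collections import Counter
from typing import Optional


def category_for(external_id: int) -> Optional[str]:
    """Return the category to assign, or None to leave the asset unlabeled."""
    if external_id < 300:
        return "VEHICLE"
    if 500 <= external_id < 800:
        return "NON_VEHICLE"
    return None


counts = Counter(category_for(i) for i in range(1000))
print(counts["VEHICLE"], counts["NON_VEHICLE"], counts[None])
# → 300 300 400
```

The 600 labeled and 400 unlabeled assets match the asset counts reported later by the training and prediction commands.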

In [21]:
assets = kili.assets(project_id=project_id, fields=["id", "externalId"], first=1000)

asset_ids = []
json_responses = []
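# Each item is a full asset dict (despite the name `asset_id`).
for asset_id in tqdm(assets):
    external_id = int(asset_id["externalId"])

    # External IDs 0-299: the first 300 vehicle images.
    if external_id < 300:
        asset_ids.append(asset_id["id"])
        json_responses.append({
            "CLASSIFICATION_JOB": {
                "categories": [{"name": "VEHICLE"}]
            }
        })

    # External IDs 500-799: the first 300 non-vehicle images.
    elif 499 < external_id < 800:
        asset_ids.append(asset_id["id"])
        json_responses.append({
            "CLASSIFICATION_JOB": {
                "categories": [{"name": "NON_VEHICLE"}]
            }
        })
    # All other assets stay unlabeled.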

kili.append_labels(
    asset_ids,
    json_responses
)

100%|██████████| 1000/1000 [00:02<00:00, 394.76it/s]


  0%|          | 0/1000 [00:00<?, ?it/s]

[{'id': 'clba521rq00e30l03geoh8uhx'},
 {'id': 'clba521rq00e40l039ksyfb7p'},
 {'id': 'clba521rq00e50l03drm0htjy'},
 {'id': 'clba521rq00e60l03h6vkfywp'},
 {'id': 'clba521rq00e70l034gy10lro'},
 {'id': 'clba521rr00e80l03218c6j3b'},
 {'id': 'clba521rr00e90l0328y55fwy'},
 {'id': 'clba521rr00ea0l031x0a0gay'},
 {'id': 'clba521rr00eb0l0301sf61yx'},
 {'id': 'clba521rr00ec0l038h1cem4j'},
 {'id': 'clba521rr00ed0l03baki8z0u'},
 {'id': 'clba521rr00ee0l031bzmdz3n'},
 {'id': 'clba521rr00ef0l0313e8epy6'},
 {'id': 'clba521rr00eg0l0360dlco1v'},
 {'id': 'clba521rr00eh0l03gez73wu2'},
 {'id': 'clba521rr00ei0l037uf3flpm'},
 {'id': 'clba521rr00ej0l0357di4liu'},
 {'id': 'clba521rr00ek0l033rmke80g'},
 {'id': 'clba521rr00el0l03cnqwbkv9'},
 {'id': 'clba521rs00em0l03cdmr3goa'},
 {'id': 'clba521rs00en0l03edre0vf9'},
 {'id': 'clba521rs00eo0l033diw4nr9'},
 {'id': 'clba521rs00ep0l0336u7h17n'},
 {'id': 'clba521rs00eq0l037yxsainc'},
 {'id': 'clba521rs00er0l0355h38z0u'},
 {'id': 'clba521rs00es0l03d03f87qa'},
 {'id': 'clb

You can now click on the following link to see the assets in your project:

In [22]:
print(f"{KILI_URL}label/projects/{project_id}/menu/queue?currentPage=1&pageSize=20")

https://cloud.kili-technology.com/label/projects/clba4yl4g008s0lwmhlu89h7e/menu/queue?currentPage=1&pageSize=20


## Training an image classifier with KiliAutoML

The following command will automatically download the labeled data in your Kili project. Then, it will choose the right model for your task, train it with this data and save it locally.
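For scripted runs outside a notebook, the training command shown in the next cell could be assembled as an argument list. This is a minimal sketch: the helper name `build_train_command` is hypothetical, and only the flags that appear in this notebook are used.

```python
import shlex
from typing import List, Optional


def build_train_command(
    api_key: str, project_id: str, epochs: Optional[int] = None
) -> List[str]:
    """Assemble the `kiliautoml train` call as an argument list (hypothetical helper)."""
    command = [
        "kiliautoml", "train",
        "--api-key", api_key,
        "--project-id", project_id,
    ]
    if epochs is not None:
        command += ["--epochs", str(epochs)]
    return command


demo_command = build_train_command("<your-api-key>", "<your-project-id>", epochs=30)
# shlex.join renders the list as a shell-safe string for logging.
print(shlex.join(demo_command))
```

Passing the list to `subprocess.run(demo_command, check=True)` avoids shell quoting issues with the API key.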

In [23]:
!kiliautoml train \
    --api-key $api_key \
    --project-id $project_id \
    --epochs 30

Loading KiliAutoML...
Please install version: "pip install kili==2.125.1"
100% 1/1 [00:00<00:00,  2.08it/s]
KiliAutoML INFO Training on job: CLASSIFICATION_JOB
KiliAutoML INFO Downloading assets with status in ['LABELED', 'TO_REVIEW', 'REVIEWED'] from Kili project
cache_path /root/.cache/kili/automl/clba4yl4g008s0lwmhlu89h7e/get_asset_memoized
________________________________________________________________________________
[Memory] Calling kiliautoml.utils.helpers.get_asset_memoized...
get_asset_memoized(kili=Kili(), project_id='clba4yl4g008s0lwmhlu89h7e', total=None, skip=0, status_in=['LABELED', 'TO_REVIEW', 'REVIEWED'], asset_filter=None)
100% 600/600 [00:08<00:00, 66.80it/s]
_______________________________________________get_asset_memoized - 9.4s, 0.2min
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 1

### Send predictions

Now we can use our locally trained model to predict the classes of the unlabeled image assets and send the predictions, with their scores, to the project on Kili. Annotators can then validate or correct these pre-annotations.
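KiliAutoML handles this step internally. As an illustration only, and not its actual implementation, the sketch below turns per-class probabilities into a response with the same shape as the labels created earlier in this notebook. The helper name `to_json_response` and the example probabilities are assumptions.

```python
from typing import Dict, Tuple


def to_json_response(
    probabilities: Dict[str, float], job_name: str = "CLASSIFICATION_JOB"
) -> Tuple[dict, float]:
    """Pick the most likely category and return (response, score). Hypothetical helper."""
    best_category = max(probabilities, key=probabilities.get)
    response = {job_name: {"categories": [{"name": best_category}]}}
    return response, probabilities[best_category]


# Made-up probabilities for a single image.
response, score = to_json_response({"VEHICLE": 0.92, "NON_VEHICLE": 0.08})
print(response, score)
```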

In [24]:
!kiliautoml predict \
    --api-key $api_key \
    --project-id $project_id

Loading KiliAutoML...
KiliAutoML INFO Are you sure You want to send the predictions to Kili? Y/N
y
KiliAutoML INFO OK, We will send the predictions to Kili!
Please install version: "pip install kili==2.125.1"
100% 1/1 [00:00<00:00,  2.01it/s]
KiliAutoML INFO Downloading assets with status in ['TODO', 'ONGOING'] from Kili project
cache_path /root/.cache/kili/automl/clba4yl4g008s0lwmhlu89h7e/get_asset_memoized
________________________________________________________________________________
[Memory] Calling kiliautoml.utils.helpers.get_asset_memoized...
get_asset_memoized(kili=Kili(), project_id='clba4yl4g008s0lwmhlu89h7e', total=None, skip=0, status_in=['TODO', 'ONGOING'], asset_filter=None)
100% 400/400 [00:01<00:00, 205.03it/s]
_______________________________________________get_asset_memoized - 2.1s, 0.0min
KiliAutoML INFO Predicting annotations for job: CLASSIFICATION_JOB
KiliAutoML INFO defaulting to mod

Now you can check that your assets have predictions on [Kili](https://cloud.kili-technology.com/)!

In [None]:
print(f"{KILI_URL}label/projects/{project_id}/menu/queue?currentPage=1&pageSize=20")

### Label Errors

You can add wrong labels to a small portion of the data to simulate a project where some data is incorrectly labeled. 


In [25]:
assets = kili.assets(project_id=project_id, fields=["id", "externalId"])

asset_ids = []
json_responses = []
for asset in tqdm(assets):
    external_id = int(asset["externalId"])

    # Vehicle images 300-309: deliberately labeled as NON_VEHICLE.
    if 299 < external_id < 310:
        asset_ids.append(asset["id"])
        json_responses.append({
            "CLASSIFICATION_JOB": {
                "categories": [{"name": "NON_VEHICLE"}]
            }
        })

    # Non-vehicle images 800-809: deliberately labeled as VEHICLE.
    elif 799 < external_id < 810:
        asset_ids.append(asset["id"])
        json_responses.append({
            "CLASSIFICATION_JOB": {
                "categories": [{"name": "VEHICLE"}]
            }
        })

kili.append_labels(
    asset_ids,
    json_responses
)

100%|██████████| 1000/1000 [00:02<00:00, 376.75it/s]


  0%|          | 0/1000 [00:00<?, ?it/s]

[{'id': 'clba5o6ws00tv0lxs2t0labb6'},
 {'id': 'clba5o6ws00tw0lxs19a5f0t5'},
 {'id': 'clba5o6ws00tx0lxsek5a0fz0'},
 {'id': 'clba5o6ws00ty0lxs79wxa72l'},
 {'id': 'clba5o6ws00tz0lxscj4m1vi2'},
 {'id': 'clba5o6ws00u00lxs704yd9hk'},
 {'id': 'clba5o6ws00u10lxshgnl7k8c'},
 {'id': 'clba5o6ws00u20lxs6kve3maa'},
 {'id': 'clba5o6ws00u30lxsba0447nn'},
 {'id': 'clba5o6wt00u40lxs6v1g8lis'},
 {'id': 'clba5o6wt00u50lxs1xp329tn'},
 {'id': 'clba5o6wt00u60lxsa7ve1nxv'},
 {'id': 'clba5o6wt00u70lxscwkbc74y'},
 {'id': 'clba5o6wt00u80lxsbzw276eu'},
 {'id': 'clba5o6wt00u90lxs4c2hf82u'},
 {'id': 'clba5o6wt00ua0lxsglvifbv6'},
 {'id': 'clba5o6wt00ub0lxs4t47f33w'},
 {'id': 'clba5o6wt00uc0lxscfk90owp'},
 {'id': 'clba5o6wt00ud0lxs2a0m1e25'},
 {'id': 'clba5o6wt00ue0lxsb00009m5'}]

The following command analyses the labeled data to detect potential labeling errors and sends warnings to the affected assets. You can then use the `potential_label_error` filter in the project's asset exploration view to find the potentially problematic assets.
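This notebook does not describe the detection method in detail; the command log only mentions training and predicting on several folds. One common approach, sketched here as an assumption and not as KiliAutoML's implementation, is to flag an asset when a model that never saw its label during training confidently predicts a different class. The helper name `flag_label_errors`, the threshold, and the probabilities are made up for illustration.

```python
from typing import Dict, List


def flag_label_errors(
    given_labels: Dict[str, str],
    out_of_fold_probs: Dict[str, Dict[str, float]],
    threshold: float = 0.9,
) -> List[str]:
    """Return IDs whose given label disagrees with a confident out-of-fold prediction."""
    flagged = []
    for asset_id, given in given_labels.items():
        probs = out_of_fold_probs[asset_id]
        predicted = max(probs, key=probs.get)
        # Flag only confident disagreements to limit false alarms.
        if predicted != given and probs[predicted] >= threshold:
            flagged.append(asset_id)
    return flagged


demo_labels = {"a": "VEHICLE", "b": "VEHICLE", "c": "NON_VEHICLE"}
demo_probs = {
    "a": {"VEHICLE": 0.97, "NON_VEHICLE": 0.03},  # agrees with the label
    "b": {"VEHICLE": 0.04, "NON_VEHICLE": 0.96},  # confident disagreement
    "c": {"VEHICLE": 0.60, "NON_VEHICLE": 0.40},  # disagreement below threshold
}
print(flag_label_errors(demo_labels, demo_probs))
# → ['b']
```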

In [26]:
!kiliautoml label_errors \
    --api-key $api_key \
    --project-id $project_id

Loading KiliAutoML...
KiliAutoML INFO Are you sure You want to send the label errors to Kili? Y/N
y
KiliAutoML INFO OK, We will send the label errors to Kili!
Please install version: "pip install kili==2.125.1"
100% 1/1 [00:00<00:00,  2.08it/s]
KiliAutoML INFO Detecting errors for job: CLASSIFICATION_JOB
KiliAutoML INFO Downloading assets with status in ['LABELED', 'TO_REVIEW', 'REVIEWED'] from Kili project
cache_path /root/.cache/kili/automl/clba4yl4g008s0lwmhlu89h7e/get_asset_memoized
KiliAutoML INFO defaulting to model_name=efficientnet_b0
KiliAutoML INFO Downloading images to folder /root/.cache/kili/automl/clba4yl4g008s0lwmhlu89h7e/CLASSIFICATION_JOB/torchvision/data
Downloading images: 100% 600/600 [00:07<00:00, 78.60it/s]
Training and predicting on several folds:   0% 0/4 [00:00<?, ?it/s]
KiliAutoML INFO Initialization of the model with N=2 classes
KiliAutoM

You can see in the project that several incorrectly labeled images have been spotted.

In [27]:
print(f"{KILI_URL}label/projects/{project_id}/menu/queue?currentPage=1&metadata%5Blabeling_error%5D=true&pageSize=20")

https://cloud.kili-technology.com/label/projects/clba4yl4g008s0lwmhlu89h7e/menu/queue?currentPage=1&metadata%5Blabeling_error%5D=true&pageSize=20
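The `%5B` and `%5D` in this URL are the percent-encoded brackets of the `metadata[labeling_error]` query parameter. A small sketch, using a hypothetical helper named `queue_url`, builds both queue links from this notebook with `urllib.parse.urlencode` instead of hand-written escapes:

```python
from urllib.parse import urlencode


def queue_url(base_url: str, project_id: str, only_label_errors: bool = False) -> str:
    """Build the project queue URL, optionally filtered on labeling errors."""
    params = {"currentPage": 1}
    if only_label_errors:
        # urlencode percent-encodes the brackets as %5B and %5D.
        params["metadata[labeling_error]"] = "true"
    params["pageSize"] = 20
    return f"{base_url}label/projects/{project_id}/menu/queue?{urlencode(params)}"


print(queue_url("https://cloud.kili-technology.com/", "<your-project-id>", only_label_errors=True))
```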
