<a href="https://colab.research.google.com/github/spaceml-org/Curator-Unlabeled-Image-Search-Guide/blob/main/notebooks/Active_Labeler.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This demo shows how to use a pretrained self-supervised model with Active Labeler's command-line interface (CLI). It uses [RESISC45](https://www.tensorflow.org/datasets/catalog/resisc45). Although RESISC45 is labeled, we first set it up as an unlabeled dataset to show how unlabeled datasets are used in this pipeline.
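Because RESISC45 file names start with their class name (for example `airplane_001.jpg` or `snowberg_609.jpg` in the outputs below), the ground-truth labels can still be recovered after the images are pooled into one unlabeled folder, which can be handy for checking labeling results afterwards. The sketch below assumes the `<class>_<index>.<ext>` naming seen in this notebook; `label_from_filename` and `count_labels` are hypothetical helpers, not part of Active Labeler.

```python
import os
from collections import Counter
from typing import Iterable


def label_from_filename(path: str) -> str:
    """Return the class prefix of a '<class>_<index>.<ext>' file name (assumed naming)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # rpartition splits on the LAST underscore, so class names that
    # themselves contain underscores are kept intact
    label, _, index = stem.rpartition("_")
    if not label or not index.isdigit():
        raise ValueError(f"unexpected file name: {path}")
    return label


def count_labels(paths: Iterable[str]) -> Counter:
    """Count images per recovered class label."""
    return Counter(label_from_filename(p) for p in paths)


print(label_from_filename("/content/Dataset/Unlabeled/snowberg_609.jpg"))  # snowberg
```

Recovering labels from names only works because this dataset encodes them there; a truly unlabeled dataset would offer no such shortcut.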




# 1. Setup

## 1.1 Installing packages & Active Labeler

In [None]:
%cd "/content"
import os
import shutil

if os.path.exists('/content/Active-Labeler'):
  shutil.rmtree('/content/Active-Labeler')

!git clone https://github.com/spaceml-org/Active-Labeler.git

/content
Cloning into 'Active-Labeler'...
remote: Enumerating objects: 2106, done.[K
remote: Counting objects: 100% (2106/2106), done.[K
remote: Compressing objects: 100% (1522/1522), done.[K
remote: Total 2106 (delta 657), reused 1942 (delta 560), pack-reused 0
Receiving objects: 100% (2106/2106), 24.10 MiB | 26.31 MiB/s, done.
Resolving deltas: 100% (657/657), done.


In [None]:
!pip install -r /content/Active-Labeler/requirements.txt

Collecting split-folders==0.4.3
  Downloading split_folders-0.4.3-py3-none-any.whl (7.4 kB)
Collecting pytorch-lightning-bolts
  Downloading pytorch_lightning_bolts-0.3.2-py3-none-any.whl (253 kB)
     |████████████████████████████████| 253 kB 5.4 MB/s
Collecting pytorch-lightning==1.1.8
  Downloading pytorch_lightning-1.1.8-py3-none-any.whl (696 kB)
     |████████████████████████████████| 696 kB 37.7 MB/s
Collecting scikit-learn==0.23.2
  Downloading scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl (6.8 MB)
     |████████████████████████████████| 6.8 MB 32.6 MB/s
Collecting wandb==0.10.12
  Downloading wandb-0.10.12-py2.py3-none-any.whl (1.8 MB)
     |████████████████████████████████| 1.8 MB 36.2 MB/s
Collecting torch===1.7.1
  Downloading torch-1.7.1-cp37-cp37m-manylinux1_x86_64.whl (776.8 MB)
     |████████████████████████████████| 776.8 MB 17 kB/s
Collecting torchtext==0.6.0
  Downloading torchtext-0.6.0-py3-none-any.whl (64 kB)

In [None]:
!pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100

Looking in indexes: https://pypi.org/simple, https://developer.download.nvidia.com/compute/redist
Collecting nvidia-dali-cuda100
  Downloading https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda100/nvidia_dali_cuda100-1.3.0-2471498-py3-none-manylinux2014_x86_64.whl (391.8 MB)
     |████████████████████████████████| 391.8 MB 15 kB/s
Installing collected packages: nvidia-dali-cuda100
Successfully installed nvidia-dali-cuda100-1.3.0


## 1.2 Setting up a dataset and Self-Supervised Learner (SSL) model

In [None]:
import os
import shutil
import pathlib
from pathlib import Path
from imutils import paths

In [None]:
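if os.path.exists("/content/RESISC45"):
    shutil.rmtree("/content/RESISC45")

# Quote the URL: an unquoted '&' is a shell control operator that would
# cut the URL short and send gdown to the background
!gdown "https://drive.google.com/uc?id=14zEhqi9mczZaLEb33TQuKbhmurn2ClGL&export=download"
!unrar x /content/RESISC45.rar
!mv NWPU-RESISC45/ RESISC45/
!rm -rf /content/RESISC45.rar

# Pool every image into one flat folder so the dataset looks unlabeled
folder = '/content/Dataset/Unlabeled'
if os.path.exists(folder):
    shutil.rmtree(folder)

pathlib.Path(folder).mkdir(parents=True, exist_ok=True)

# File names are class-prefixed (e.g. airplane_001.jpg), so they stay unique once flattened
for img_path in paths.list_images('/content/RESISC45'):
    shutil.copy(img_path, os.path.join(folder, os.path.basename(img_path)))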

[1;30;43mStreaming output truncated to the last 5000 lines.[0m
Extracting  NWPU-RESISC45/snowberg/snowberg_609.jpg                       83%  OK 
Extracting  NWPU-RESISC45/snowberg/snowberg_610.jpg                       83%  OK 
Extracting  NWPU-RESISC45/snowberg/snowberg_611.jpg                       83%  OK 
Extracting  NWPU-RESISC45/snowberg/snowberg_612.jpg                       83%  OK 
Extracting  NWPU-RESISC45/snowberg/snowberg_613.jpg                       83%  OK 
Extracting  NWPU-RESISC45/snowberg/snowberg_614.jpg                       83%  OK 
Extracting  NWPU-RESISC45/snowberg/snowberg_615.jpg                       83%  OK 
Extracting  NWPU-RESISC45/snowberg/snowberg_616.jpg                       83%  OK 
Extracting  NWPU-RESISC45/snowberg/snowberg_617.jpg                       83%  OK 
Extracting  NWPU-RESISC45/snowberg/snowberg_618.jpg                       83%  OK 
Extracting  NWP
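Flattening the class folders into `/content/Dataset/Unlabeled` discards the directory structure, so two images with the same file name would silently overwrite each other. RESISC45's class-prefixed names avoid this, but for other datasets a guard may be worthwhile. The following is a standard-library sketch of the same flattening step with a collision check; `flatten_images` and the `IMAGE_EXTS` extension list are assumptions for illustration, not the `imutils` behavior used above.

```python
import os
import shutil
from typing import List

# Assumed set of image extensions to copy; adjust for your dataset
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def flatten_images(src_root: str, dst_dir: str) -> List[str]:
    """Copy every image under src_root into dst_dir, refusing duplicate names.

    dst_dir is assumed to lie outside src_root and to start empty
    (the notebook deletes it before copying).
    """
    os.makedirs(dst_dir, exist_ok=True)
    seen = set()
    copied = []
    for dirpath, _, filenames in os.walk(src_root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() not in IMAGE_EXTS:
                continue  # skip non-image files
            if name in seen:
                raise FileExistsError(f"duplicate file name would be overwritten: {name}")
            seen.add(name)
            dst = os.path.join(dst_dir, name)
            shutil.copy(os.path.join(dirpath, name), dst)
            copied.append(dst)
    return copied
```

In this notebook it would be called as `flatten_images('/content/RESISC45', folder)`.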

In [None]:
!gdown https://drive.google.com/uc?id=1h2rm2SrcsqBXoxoHqzseuKtQfX3EVj-8

Downloading...
From: https://drive.google.com/uc?id=1h2rm2SrcsqBXoxoHqzseuKtQfX3EVj-8
To: /content/RESISC45-imagenet_resnet18.ckpt
137MB [00:01, 115MB/s]


# 2. Active Labeler CLI Tool

## 2.1 Changing config files

The code cell below edits *model_config.yml* and *pipeline_config.yml* so that the CLI tool runs on Colab. To run it on your local device, edit the config files manually. The main changes are the paths to the SSL model checkpoint and the reference image, plus the model's embedding size.

In [None]:
# Edit the config files so the CLI finds the checkpoint and reference image on Colab
# Note: yaml.dump rewrites each file without its original comments
import yaml

# model_config.yml: point the encoder at the downloaded SSL checkpoint
with open("/content/Active-Labeler/model_config.yml") as f:
    list_doc = yaml.safe_load(f)

list_doc["encoder"]["encoder_path"] = "/content/RESISC45-imagenet_resnet18.ckpt"
list_doc["encoder"]["e_embedding_size"] = 512  # ResNet-18 feature dimension

with open("/content/Active-Labeler/model_config.yml", "w") as f:
    yaml.dump(list_doc, f, default_flow_style=False)


# pipeline_config.yml: same checkpoint, plus the reference image for the seed dataset
with open("/content/Active-Labeler/pipeline_config.yml") as f:
    list_doc = yaml.safe_load(f)

list_doc["model"]["model_path"] = "/content/RESISC45-imagenet_resnet18.ckpt"
list_doc["model"]["embedding_size"] = 512
list_doc["seed_dataset"]["ref_img_path"] = "/content/RESISC45/airplane/airplane_001.jpg"

with open("/content/Active-Labeler/pipeline_config.yml", "w") as f:
    yaml.dump(list_doc, f, default_flow_style=False)
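When running locally, it may be easier to keep these edits as data and apply them to whatever `yaml.safe_load` returns. The sketch below restates the same key paths as dotted strings; `apply_overrides` and the override dictionaries are hypothetical names, and the design choice of indexing parent sections (instead of `setdefault`) is deliberate, so a misspelled section name raises `KeyError` rather than silently creating a new section.

```python
from typing import Any, Dict


def apply_overrides(cfg: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Set nested config values addressed by dotted keys, e.g. 'encoder.encoder_path'."""
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for key in parents:
            # Indexing makes a missing or misspelled section raise KeyError
            node = node[key]
        node[leaf] = value  # the leaf key itself may be new
    return cfg


CKPT = "/content/RESISC45-imagenet_resnet18.ckpt"

# The same edits as the cell above, expressed as data
MODEL_CONFIG_OVERRIDES = {
    "encoder.encoder_path": CKPT,
    "encoder.e_embedding_size": 512,
}
PIPELINE_CONFIG_OVERRIDES = {
    "model.model_path": CKPT,
    "model.embedding_size": 512,
    "seed_dataset.ref_img_path": "/content/RESISC45/airplane/airplane_001.jpg",
}
```

Usage would be `list_doc = apply_overrides(yaml.safe_load(f), PIPELINE_CONFIG_OVERRIDES)`, followed by the same `yaml.dump` call as above.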

## 2.2 Running Active Labeler

Once you run the cell below, you'll get a URL for labeling images. 

**Note**: This link only works on Colab. If you run the CLI tool on your local device, it prints a different address (http://0.0.0.0:5000/) once it starts; open that address instead.
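The two cases in the note can be folded into one helper that picks the right address for the current environment. This is a sketch, assuming the labeling server listens on port 5000 as above; `labeling_url` is a hypothetical name, and it only reuses the `eval_js` call from the next cell.

```python
def labeling_url(port: int = 5000) -> str:
    """Return the browser URL for the labeling UI on Colab or on a local machine."""
    try:
        # Only importable inside a Colab runtime
        from google.colab.output import eval_js
    except ImportError:
        # Local run: the address the CLI reports, per the note above
        return f"http://0.0.0.0:{port}/"
    # On Colab, ask the kernel for the proxied URL of the port
    return eval_js(f"google.colab.kernel.proxyPort({port})")


print(labeling_url())
```

On Colab this prints the same proxied URL as the next cell; elsewhere it prints the local address.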

In [None]:
from google.colab.output import eval_js
print(eval_js("google.colab.kernel.proxyPort(5000)"))

https://yfw07ayh68-496ff2e9c6d22116-5000-colab.googleusercontent.com/


In [None]:
!python3 /content/Active-Labeler/main.py --config_path /content/Active-Labeler/pipeline_config.yml

Initialization
Load Config
HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=31500.0), HTML(value='')))
  im = torch.Tensor(im).unsqueeze(0).cpu()
