# Setup

## Install dependencies

```
pip install ftfy regex tqdm autofaiss tensorflow google-colab
pip install git+https://github.com/openai/CLIP.git
```

Depending on whether you want to GPU-accelerate the nearest-neighbors index, also install one of the FAISS builds:

`pip install faiss-gpu` (GPU) or `pip install faiss-cpu` (CPU only)

## Run setup script

Run `setup.sh`.

It will create the following file structure:

```
laion400m/
  |- train/
      |- metadata
          |- metadata_0.parquet
          |- ...
          |- metadata_99.parquet
      |- npy
          |- img_emb_0.npy
          |- ...
          |- img_emb_99.npy
      |- image.index
  |- eval/
      |- metadata
          |- metadata_100.parquet
          |- ...
          |- metadata_199.parquet
      |- npy
          |- img_emb_100.npy
          |- ...
          |- img_emb_199.npy
  |- al/
      |- metadata
          |- metadata_200.parquet
          |- ...
          |- metadata_299.parquet
      |- npy
          |- img_emb_200.npy
          |- ...
          |- img_emb_299.npy
```
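
As a quick sanity check after running `setup.sh`, the layout above can be enumerated programmatically. The following is a minimal sketch, assuming the shard ranges shown in the tree are contiguous (0-99 for `train`, 100-199 for `eval`, 200-299 for `al`) and that only `train/` contains `image.index`. The helper names `expected_files` and `missing_files` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Shard ranges per split, taken from the directory tree above (assumed contiguous).
SPLIT_SHARDS = {
    "train": range(0, 100),
    "eval": range(100, 200),
    "al": range(200, 300),
}


def expected_files(root="laion400m"):
    """Return every path the tree above lists, in a stable order."""
    root = Path(root)
    paths = []
    for split, shards in SPLIT_SHARDS.items():
        for i in shards:
            paths.append(root / split / "metadata" / f"metadata_{i}.parquet")
            paths.append(root / split / "npy" / f"img_emb_{i}.npy")
    # In the tree, only the train split has a nearest-neighbors index file.
    paths.append(root / "train" / "image.index")
    return paths


def missing_files(root="laion400m"):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_files(root) if not p.exists()]
```

If the setup completed, calling `missing_files()` from the directory where `setup.sh` ran should return an empty list.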

Licensed under the Apache License, Version 2.0.


In [None]:
#@title Dependencies

%load_ext autoreload
%autoreload 2

from annotator import Annotator
import numpy as np
import tensorflow as tf

from clip_knn_service import create
import utils

In [None]:
#@title FAISS setup.
knn_service = create(
    indices_paths="indices_train.json",

    # Enable hdf5 caching for the metadata. This reduces metadata memory
    # use to near zero, but on the first run will take a bit of time to
    # create the memory-mapped hdf5 files.
    enable_hdf5=True,

    # Memory-map the FAISS index, reducing its memory use to near zero.
    enable_faiss_memory_mapping=True
)

In [None]:
#@markdown ## Step 1: User Input
#@markdown What visual concept would you like to classify? Describe it in a short phrase.
concept = 'gourmet tuna' #@param {type:"string"}

#@markdown Great! Next, please enter a few positive and negative queries for your concept
#@markdown
#@markdown Positive queries are short phrases that are synonyms or define a specific subset of the concept.
#@markdown Negative queries are short phrases that define categories that are out-of-scope for your concept,
#@markdown but are related. For example, if your concept is `gourmet tuna`, a positive
#@markdown query could be `seared tuna` and a negative query could be
#@markdown `canned tuna`. You do not have to fill out all the queries, but try to think of at least one positive and negative (the more the better).
#@markdown Our system uses commas to separate different queries. For example, if you want to use `tuna sushi` and `seared tuna` as positive queries, enter them as `tuna sushi, seared tuna`.
#@markdown You do not need to re-enter the concept as a query.

#@markdown Positive queries:
positive_queries = 'tuna sushi, seared tuna, tuna sashimi'  #@param {type:"string"}

#@markdown Negative queries:
negative_queries = 'canned tuna, tuna sandwich, tuna fish'  #@param {type:"string"}

def convert_query_string_to_list(query_str):
  query_list = query_str.split(',')
  # Remove leading and trailing whitespace around each query.
  query_list = [x.strip() for x in query_list]
  # Remove empty strings (e.g. from a trailing comma or an empty field).
  return [x for x in query_list if x]

queries = convert_query_string_to_list(positive_queries)
queries.extend(convert_query_string_to_list(negative_queries))
queries.append(concept)

round_num = 0
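
To make the comma-separated format concrete, here is a small self-contained sketch of how the query strings could be turned into a single list. It mirrors the logic of the cell above but is an illustrative variant: the names `parse_queries` and `build_query_list` are hypothetical, and whitespace is trimmed on both sides of each query.

```python
def parse_queries(query_str):
    """Split a comma-separated string into a list of non-empty queries."""
    # Trim whitespace around each piece, then drop pieces that are empty
    # (for example from a trailing comma or an empty input field).
    pieces = (piece.strip() for piece in query_str.split(","))
    return [piece for piece in pieces if piece]


def build_query_list(concept, positive_queries, negative_queries):
    """Combine positive queries, negative queries, and the concept itself."""
    return (
        parse_queries(positive_queries)
        + parse_queries(negative_queries)
        + [concept]
    )


print(build_query_list("gourmet tuna", "tuna sushi, seared tuna", "canned tuna,"))
# ['tuna sushi', 'seared tuna', 'canned tuna', 'gourmet tuna']
```

The concept is appended last, so it is always searched even when both query fields are left empty.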

In [None]:
#@markdown ## Step 2: Initial User Labeling
#@markdown Using your concept and provided queries, we find relevant images and ask you to rate them. We bring up a rating UI. Here are the instructions:
#@markdown 1. You can click the "Positive", "Negative", or "No Image" buttons to rate the image. We do not train on "No Image" ratings.
#@markdown 2. Alternatively you can press the '1', '2', '3' keys as keyboard shortcuts for positive, negative, and no image respectively. However, to enable keyboard shortcuts, make sure that you have the rating UI focused -- an easy way to ensure this is to simply click on the image with your mouse.
#@markdown 3. If you want to re-rate an image, use the left and right arrow keys to move between images, then press a button or one of the '1', '2', '3' keys to rate it again.

#@markdown Finding images and displaying the rating interface takes a few seconds.

# Randomly (but deterministically) take SAMPLE_SIZE images for initial labeling.
SAMPLE_SIZE = 100
nb_list = []

for query in queries:
  nb_list += knn_service.query(
        text_input=query,
        modality="image",
        num_images=100,
        num_result_ids=100,
        deduplicate=True,
    )
np.random.seed(0)
np.random.shuffle(nb_list)

# Take double the sample size initially because some images won't exist.
nb_list = nb_list[:SAMPLE_SIZE*2]
urls = [n['url'] for n in nb_list]

# Make sure that the urls actually exist.
exists = utils.check_images_exist_parallel(urls)

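# Keep only results whose URL exists, up to SAMPLE_SIZE of them.
# flatnonzero always returns a 1-D index array, even when only one URL exists.
exists_idx = np.flatnonzero(exists)[:SAMPLE_SIZE]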

nb_list = [nb_list[i] for i in exists_idx]
urls = [urls[i] for i in exists_idx]

cache_path = None # Set this if you want to save ratings.
annotator = Annotator(urls, cache_file_name=cache_path)
annotations = annotator.annotate()
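
The shuffle, oversample, and filter steps in the cell above can be sketched in isolation. This is a minimal example, assuming made-up URLs and a plain Python predicate standing in for `utils.check_images_exist_parallel`; the helper name `select_existing` is hypothetical.

```python
import numpy as np


def select_existing(candidates, check_exists, sample_size, seed=0):
    """Deterministically shuffle candidates and keep up to `sample_size`
    of them for which `check_exists` returns True."""
    rng = np.random.default_rng(seed)
    shuffled = [candidates[i] for i in rng.permutation(len(candidates))]
    # Oversample by 2x because some URLs are expected to be dead.
    shortlist = shuffled[: sample_size * 2]
    exists = np.asarray([check_exists(c) for c in shortlist], dtype=bool)
    # flatnonzero yields a 1-D array of indices even for zero or one hit.
    keep = np.flatnonzero(exists)[:sample_size]
    return [shortlist[i] for i in keep]


urls = [f"https://example.com/{i}.jpg" for i in range(10)]


def alive(url):
    # Pretend that only even-numbered images still exist.
    return int(url.rsplit("/", 1)[1].split(".")[0]) % 2 == 0


picked = select_existing(urls, alive, sample_size=3)
print(picked)
```

Because only twice the sample size is checked, the result can contain fewer than `sample_size` items when many URLs are dead.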

In [None]:
#@markdown ## Step 3: Train an Initial Model
#@markdown Using the data you've rated, we train an initial model. This part should take about 2-3 minutes to run. You should see the training progress printing out.

# Filter out the 'no image' ratings.
annotations, nb_list = zip(*((anno, nb) for anno, nb in zip(annotations, nb_list) if anno != 'no image'))

x_labeled = np.asarray([n['embedding'] for n in nb_list])
y_labeled = np.asarray([anno == 'positive' for anno in annotations])

# Load unlabeled embeddings from the `al` split and treat them as negatives.
x_rand = np.load('laion400m/al/npy/img_emb_200.npy')
x_rand = x_rand[:500000]
y_rand = np.zeros((len(x_rand),))

print('Got', len(x_labeled), 'ground-truth labels and', len(x_rand), 'random labels')

# Upsample the known positives and negatives by sampling with replacement.
pos_features = x_labeled[y_labeled]
pos_labels = y_labeled[y_labeled]

ids = np.arange(len(pos_features))
choices = np.random.choice(ids, len(x_rand))

res_pos_features = pos_features[choices]
res_pos_labels = pos_labels[choices]

neg_features = x_labeled[~y_labeled]
neg_labels = y_labeled[~y_labeled]

ids = np.arange(len(neg_features))
choices = np.random.choice(ids, len(x_rand)//2)
res_neg_features = neg_features[choices]
res_neg_labels = neg_labels[choices]

# Combine labeled and random annotations.
x_all = np.concatenate((res_pos_features, res_neg_features, x_rand), axis=0)
y_all = np.concatenate((res_pos_labels, res_neg_labels, y_rand), axis=0)

# Train a small model (layer_dims=[16]) since we have so few labels.
model = utils.create_classifier(layer_dims=[16])
print('The model has', model.count_params(), 'parameters')

utils.train_model(model, x_all, y_all, validation_data=0.0, verbose=1)
round_num = 0
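
The class-balancing step in the cell above can be illustrated on toy data. The sketch below assumes small random arrays in place of real CLIP embeddings and a hypothetical helper `upsample`. As in the cell, labeled positives are resampled to match the number of random examples, labeled negatives to half that number, and the random examples are treated as negatives.

```python
import numpy as np


def upsample(features, labels, target_size, rng):
    """Sample `target_size` rows with replacement from features and labels."""
    if len(features) == 0:
        raise ValueError("cannot upsample an empty class")
    choices = rng.integers(0, len(features), size=target_size)
    return features[choices], labels[choices]


rng = np.random.default_rng(0)
dim = 8                                  # toy embedding size (assumption)
x_labeled = rng.normal(size=(10, dim))   # stand-in for rated image embeddings
y_labeled = np.array([True] * 4 + [False] * 6)
x_rand = rng.normal(size=(100, dim))     # stand-in for unlabeled embeddings
y_rand = np.zeros(len(x_rand))           # unlabeled examples count as negatives

pos_x, pos_y = upsample(
    x_labeled[y_labeled], y_labeled[y_labeled], len(x_rand), rng)
neg_x, neg_y = upsample(
    x_labeled[~y_labeled], y_labeled[~y_labeled], len(x_rand) // 2, rng)

# Combine upsampled labeled data with the random examples.
x_all = np.concatenate((pos_x, neg_x, x_rand), axis=0)
y_all = np.concatenate((pos_y, neg_y, y_rand), axis=0)

print(x_all.shape, y_all.mean())
# (250, 8) 0.4
```

With these ratios, positives make up 40% of the training set no matter how many images were rated, which appears intended to keep the few labeled positives from being swamped by the random negatives. The explicit empty-class check is an addition of this sketch: it surfaces the case where no image was rated positive (or negative) with a clear error.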