# Goal
---
The goal of this notebook is to extract images from Azure Blob storage and obtain the post-CNN feature vectors using the existing `cnn-v1-b3` model.

This notebook is optimized to conduct model inference in parallel on CPU.

## 0. Set-up
To begin, we'll need to import all necessary modules. These should come installed with the virtual environment provided by [`environment.yml`](../environment.yml).

If not, please install the modules with the following commands:

```bash
pip install <module_name>
```

or 

```bash
conda install <module_name>
```
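Before installing anything, it can help to see which modules are actually missing. The sketch below is one possible check, not part of the original workflow: the `REQUIRED` list is an assumption based on the third-party packages imported in the next cell, and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Third-party packages imported by this notebook (assumed from the import cell).
REQUIRED = ["matplotlib", "pandas", "numpy", "azure.storage.blob", "tqdm"]

def missing_modules(names):
    """Return the subset of `names` that cannot be found in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `azure`) is itself absent or malformed.
            found = False
        if not found:
            missing.append(name)
    return missing

print("Missing modules:", missing_modules(REQUIRED))
```

Any name printed here can then be passed to one of the install commands above.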

In [210]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import os

import azure.storage.blob
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

import sys
sys.path.append('../PIVOT/') # point to wherever the ml-workflow directory is

import utils.data_utils as du
import utils.insert_data as idu
import utils.sql_utils as sq

from tqdm.auto import trange, tqdm
import concurrent.futures

from importlib import reload
reload(du)
reload(idu)


<module 'utils.insert_data' from 'C:\\Users\\clair\\HF\\Pivot_App\\PIVOT\\notebooks\\../PIVOT\\utils\\insert_data.py'>

In [211]:
sq.run_sql_query("select * from Images;")



## 1. Load in the training data for the `cnn-v1-b3` model

Let's load the model summary for the 400K training images, found in `../PIVOT/data/model-summary-cnn-v1-b3.csv`.
The dataset contains the following attributes:
* `full_path`: the location of the file on Ali Chase's VM.
* `pred_label`: an integer denoting the class the model predicts for the image.
* `true_label`: an integer denoting the image's actual class.
* `high_group`: the text description of the true class (None/NaN denotes "Unidentifiable").
* `is_correct`: whether or not the model's prediction matches the true label.
* `top_5`: whether or not the true label is among the model's five highest-probability classes.
* `0`: the probability of the image belonging to class 0.
* `1`: the probability of the image belonging to class 1.
* ...
* `9`: the probability of the image belonging to class 9.
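
To make the relationship between these columns concrete, here is a minimal pure-Python sketch of how `pred_label`, `is_correct`, and `top_5` could be derived from the ten class probabilities. It assumes `top_5` means "the true label is among the five most probable classes", as described above; `summarize_prediction` is a hypothetical helper and the probabilities are made-up values, not rows from the CSV.

```python
def summarize_prediction(probs, true_label, k=5):
    """Derive pred_label, is_correct and top-k membership from class probabilities."""
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    pred_label = ranked[0]
    return {
        "pred_label": pred_label,
        "is_correct": int(pred_label == true_label),
        "top_5": int(true_label in ranked[:k]),
    }

# Illustrative probabilities for the ten classes (made-up values that sum to 1).
probs = [0.54, 0.01, 0.0, 0.0, 0.14, 0.01, 0.0, 0.08, 0.22, 0.0]
print(summarize_prediction(probs, true_label=7))
# {'pred_label': 0, 'is_correct': 0, 'top_5': 1}
```

Here the model's top choice (class 0) is wrong, but the true class 7 has the fourth-highest probability, so the image still counts as a top-5 hit.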


In [212]:
train_data = pd.read_csv("../PIVOT/data/model-summary-cnn-v1-b3.csv")
train_data.head()

Unnamed: 0,full_path,high_group,pred_label,true_label,is_correct,0,1,2,3,4,5,6,7,8,9,top_5
0,/home/azureuser/data/NAAMES_ml/D20160513T22082...,Other,7,7,1,1.8e-05,1.1461140000000001e-18,2.892286e-07,1e-05,4.91494e-13,0.003955,1.792351e-06,0.995379,0.000604,3.3e-05,1
1,/home/azureuser/data/NAAMES_ml/D20180402T13445...,Other,0,7,0,0.53955,0.0001456711,4.600495e-09,0.000224,0.1425122,0.006822,3.214117e-11,0.083037,0.226862,0.000847,1
2,/home/azureuser/data/NAAMES_ml/D20160528T22330...,Other,7,7,1,0.00012,5.510693e-15,2.841731e-05,7.5e-05,6.837943e-09,0.000447,6.920837e-07,0.994058,8.6e-05,0.005186,1
3,/home/azureuser/data/NAAMES_ml/D20160531T08382...,Other,7,7,1,0.000241,2.925888e-11,4.873408e-06,0.003934,2.065363e-09,0.273014,0.0001542221,0.718308,0.000597,0.003748,1
4,/home/azureuser/data/NAAMES_ml/D20160526T21432...,Diatom,3,3,1,0.000313,4.06205e-14,6.139785e-05,0.710152,2.181512e-10,0.000211,4.933558e-06,0.287664,0.000222,0.001372,1


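Let's do a quick sanity check to make sure the `pred_label` values line up with the probability columns: for every row, the column with the highest probability should equal `pred_label`.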

In [213]:
np.all(train_data[list(np.arange(10).astype(int).astype(str))].idxmax(axis=1).astype(int) == train_data.pred_label)

True

### Get the order of the class labels and replace the `np.nan` with `Null`.

In [214]:
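# One entry per (true_label, high_group) pair, ordered by the integer label.
class_labels = train_data.drop_duplicates(subset = ['true_label', 'high_group']).sort_values(by="true_label")['high_group'].values
# The last entry is the NaN ("Unidentifiable") class; give it an explicit name.
class_labels[-1] = "Null"
class_labels = list(class_labels)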

class_labels

['Chloro',
 'Cilliate',
 'Crypto',
 'Diatom',
 'Dictyo',
 'Dinoflagellate',
 'Eugleno',
 'Other',
 'Prymnesio',
 'Null']
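
The cell above relies on the missing (`NaN`) description belonging to the last label. As an alternative sketch, the same ordering can be built so that a missing name is replaced wherever it occurs. This is a pure-Python illustration on toy pairs, not the notebook's pandas code; `ordered_class_labels` is a hypothetical helper.

```python
import math

def ordered_class_labels(pairs, missing_name="Null"):
    """Map each integer label to its name, ordered by label; missing names become `missing_name`."""
    label_to_name = {}
    for label, name in pairs:
        is_missing = name is None or (isinstance(name, float) and math.isnan(name))
        # Keep the first name seen for each label (duplicates are ignored).
        label_to_name.setdefault(label, missing_name if is_missing else name)
    return [label_to_name[label] for label in sorted(label_to_name)]

# Toy (true_label, high_group) pairs; NaN stands in for the unidentifiable class.
pairs = [(7, "Other"), (3, "Diatom"), (9, float("nan")), (7, "Other"), (0, "Chloro")]
print(ordered_class_labels(pairs))  # ['Chloro', 'Diatom', 'Other', 'Null']
```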

### Extract the blob storage filepath and rename the 0-9 probability columns to their respective `class_labels`.

In [215]:
train_data["filepath"] = 'ml/' + train_data['full_path'].str.split("NAAMES_ml/", expand=True)[1]

In [216]:
train_data.head()

Unnamed: 0,full_path,high_group,pred_label,true_label,is_correct,0,1,2,3,4,5,6,7,8,9,top_5,filepath
0,/home/azureuser/data/NAAMES_ml/D20160513T22082...,Other,7,7,1,1.8e-05,1.1461140000000001e-18,2.892286e-07,1e-05,4.91494e-13,0.003955,1.792351e-06,0.995379,0.000604,3.3e-05,1,ml/D20160513T220825_IFCB107/IFCB107D20160513T2...
1,/home/azureuser/data/NAAMES_ml/D20180402T13445...,Other,0,7,0,0.53955,0.0001456711,4.600495e-09,0.000224,0.1425122,0.006822,3.214117e-11,0.083037,0.226862,0.000847,1,ml/D20180402T134458_IFCB107/IFCB107D20180402T1...
2,/home/azureuser/data/NAAMES_ml/D20160528T22330...,Other,7,7,1,0.00012,5.510693e-15,2.841731e-05,7.5e-05,6.837943e-09,0.000447,6.920837e-07,0.994058,8.6e-05,0.005186,1,ml/D20160528T223306_IFCB107/IFCB107D20160528T2...
3,/home/azureuser/data/NAAMES_ml/D20160531T08382...,Other,7,7,1,0.000241,2.925888e-11,4.873408e-06,0.003934,2.065363e-09,0.273014,0.0001542221,0.718308,0.000597,0.003748,1,ml/D20160531T083824_IFCB107/IFCB107D20160531T0...
4,/home/azureuser/data/NAAMES_ml/D20160526T21432...,Diatom,3,3,1,0.000313,4.06205e-14,6.139785e-05,0.710152,2.181512e-10,0.000211,4.933558e-06,0.287664,0.000222,0.001372,1,ml/D20160526T214329_IFCB107/IFCB107D20160526T2...
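
The path rewrite keeps everything after `NAAMES_ml/` and prepends the `ml/` blob prefix. For a single path, the same transformation can be sketched in plain Python as follows; `to_blob_path` is a hypothetical helper and the example path is invented for illustration (the real paths are truncated in the output above).

```python
def to_blob_path(full_path, marker="NAAMES_ml/", prefix="ml/"):
    """Return the blob path for a VM path, or None if the marker is absent."""
    _, found, tail = full_path.partition(marker)
    return prefix + tail if found else None

example = "/home/azureuser/data/NAAMES_ml/example_dir/example_image.png"  # hypothetical
print(to_blob_path(example))                      # ml/example_dir/example_image.png
print(to_blob_path("/some/other/location.png"))   # None
```

Returning `None` for paths without the marker makes such rows easy to spot before they are sent to blob storage.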


In [217]:
prob_cols = [str(i) for i in range(10)]  # probability columns "0" through "9"
use_cols = ["filepath", "pred_label"] + prob_cols
# Copy so that later column assignments do not act on a view of train_data.
train_df = train_data[use_cols].copy()
train_df = train_df.rename(columns=dict(zip(prob_cols, class_labels)))
del train_data
train_df.head()

Unnamed: 0,filepath,pred_label,Chloro,Cilliate,Crypto,Diatom,Dictyo,Dinoflagellate,Eugleno,Other,Prymnesio,Null
0,ml/D20160513T220825_IFCB107/IFCB107D20160513T2...,7,1.8e-05,1.1461140000000001e-18,2.892286e-07,1e-05,4.91494e-13,0.003955,1.792351e-06,0.995379,0.000604,3.3e-05
1,ml/D20180402T134458_IFCB107/IFCB107D20180402T1...,0,0.53955,0.0001456711,4.600495e-09,0.000224,0.1425122,0.006822,3.214117e-11,0.083037,0.226862,0.000847
2,ml/D20160528T223306_IFCB107/IFCB107D20160528T2...,7,0.00012,5.510693e-15,2.841731e-05,7.5e-05,6.837943e-09,0.000447,6.920837e-07,0.994058,8.6e-05,0.005186
3,ml/D20160531T083824_IFCB107/IFCB107D20160531T0...,7,0.000241,2.925888e-11,4.873408e-06,0.003934,2.065363e-09,0.273014,0.0001542221,0.718308,0.000597,0.003748
4,ml/D20160526T214329_IFCB107/IFCB107D20160526T2...,3,0.000313,4.06205e-14,6.139785e-05,0.710152,2.181512e-10,0.000211,4.933558e-06,0.287664,0.000222,0.001372


In [218]:
train_df_fp = train_df[['filepath']].to_dict(orient='records')

In [219]:
# convert pred_label to string
pred_labels = train_df.pred_label.apply(lambda x: class_labels[x])
train_df['pred_label'] = pred_labels

In [220]:
train_df.head()

Unnamed: 0,filepath,pred_label,Chloro,Cilliate,Crypto,Diatom,Dictyo,Dinoflagellate,Eugleno,Other,Prymnesio,Null
0,ml/D20160513T220825_IFCB107/IFCB107D20160513T2...,Other,1.8e-05,1.1461140000000001e-18,2.892286e-07,1e-05,4.91494e-13,0.003955,1.792351e-06,0.995379,0.000604,3.3e-05
1,ml/D20180402T134458_IFCB107/IFCB107D20180402T1...,Chloro,0.53955,0.0001456711,4.600495e-09,0.000224,0.1425122,0.006822,3.214117e-11,0.083037,0.226862,0.000847
2,ml/D20160528T223306_IFCB107/IFCB107D20160528T2...,Other,0.00012,5.510693e-15,2.841731e-05,7.5e-05,6.837943e-09,0.000447,6.920837e-07,0.994058,8.6e-05,0.005186
3,ml/D20160531T083824_IFCB107/IFCB107D20160531T0...,Other,0.000241,2.925888e-11,4.873408e-06,0.003934,2.065363e-09,0.273014,0.0001542221,0.718308,0.000597,0.003748
4,ml/D20160526T214329_IFCB107/IFCB107D20160526T2...,Diatom,0.000313,4.06205e-14,6.139785e-05,0.710152,2.181512e-10,0.000211,4.933558e-06,0.287664,0.000222,0.001372


## 2. Insert train data filepaths to SQL

We call the `initial_ingestion()` function, which takes an initial set of blob filepaths, checks whether each filepath exists, and then inserts it into the `IMAGES` table.
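
The output of this call reports batches of 1000 images completing out of order, which is what one would expect from concurrent batch inserts. The sketch below illustrates that general pattern with `concurrent.futures`; it is not the actual implementation in `utils/insert_data.py`, which may differ. `chunked`, `insert_batch`, and `ingest_in_parallel` are hypothetical names, and `insert_batch` only counts rows instead of touching a database.

```python
import concurrent.futures

def chunked(items, size):
    """Split `items` into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def insert_batch(batch_id, filepaths):
    """Hypothetical stand-in for a database insert; returns how many rows it handled."""
    return batch_id, len(filepaths)

def ingest_in_parallel(filepaths, batch_size=1000, max_workers=8):
    """Submit one task per batch and collect results as they finish."""
    batches = chunked(list(filepaths), batch_size)
    inserted = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(insert_batch, i, b) for i, b in enumerate(batches)]
        # as_completed yields futures in completion order, not submission order.
        for future in concurrent.futures.as_completed(futures):
            batch_id, count = future.result()
            inserted[batch_id] = count
    return inserted

paths = [f"ml/example_{i}.png" for i in range(2400)]  # hypothetical paths
result = ingest_in_parallel(paths)
print(sorted(result.items()))  # [(0, 1000), (1, 1000), (2, 400)]
```

Because results are gathered in completion order, any per-batch log messages would interleave in the same way as the output of the real call.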

In [221]:
idu.initial_ingestion(image_filepaths=train_df.filepath.values)

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/400 [00:00<?, ?it/s]

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



Inserting Batch 10:


0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 8:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 4:


0it [00:00, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 14:
	Inserted 1000 images
Inserting Batch 17:


0it [00:00, ?it/s]

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



	Inserted 1000 images
Inserting Batch 2:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 23:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 22:


0it [00:00, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 20:
	Inserted 1000 images
Inserting Batch 1:


0it [00:00, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 16:
	Inserted 1000 images
Inserting Batch 9:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 19:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 13:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 11:


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 5:


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 3:


0it [00:00, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 15:
	Inserted 1000 images
Inserting Batch 18:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 12:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 21:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 7:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 0:


IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



	Inserted 1000 images
Inserting Batch 24:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 38:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 40:


  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 28:


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 37:


  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 39:
	Inserted 1000 images
Inserting Batch 47:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 43:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 42:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 41:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 35:


0it [00:00, ?it/s]

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 30:
	Inserted 1000 images
Inserting Batch 34:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 45:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 33:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 29:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 26:


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 32:


  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 46:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 27:


  0%|          | 0/1000 [00:00<?, ?it/s]

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 31:
	Inserted 1000 images
Inserting Batch 44:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 25:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 55:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 60:


IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



0it [00:00, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 62:
	Inserted 1000 images
Inserting Batch 59:
	Inserted 1000 images
Inserting Batch 57:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 54:


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 67:


0it [00:00, ?it/s]

	Inserted 1000 images


  0%|          | 0/1000 [00:00<?, ?it/s]

Inserting Batch 68:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 50:


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 49:


IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 65:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 64:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 58:


0it [00:00, ?it/s]

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 52:
	Inserted 1000 images
Inserting Batch 63:
	Inserted 1000 images
Inserting Batch 70:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 48:


  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

Inserting Batch 51:


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 53:


  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 61:
	Inserted 1000 images
Inserting Batch 66:


0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 69:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 71:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 77:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 72:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 80:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 73:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 75:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 78:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 87:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 76:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 81:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 86:


0it [00:00, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 88:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 89:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 74:


0it [00:00, ?it/s]

0it [00:00, ?it/s]

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



	Inserted 1000 images
Inserting Batch 84:
	Inserted 1000 images
Inserting Batch 83:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 85:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 82:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 90:


0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 91:


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 93:


0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 92:
	Inserted 1000 images
Inserting Batch 94:


0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 95:


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 103:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 101:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 108:


0it [00:00, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 107:
	Inserted 1000 images
Inserting Batch 105:
	Inserted 1000 images
Inserting Batch 106:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 109:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 100:


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 98:


  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 97:


0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 113:
	Inserted 1000 images
Inserting Batch 99:


0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 112:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 96:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 102:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 104:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 114:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 110:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 111:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 115:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 116:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 117:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 118:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 119:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 120:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 131:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 127:


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 121:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 122:


  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 128:


0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 134:


0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 135:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 124:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 125:


0it [00:00, ?it/s]

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



	Inserted 1000 images
Inserting Batch 126:
	Inserted 1000 images
Inserting Batch 123:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 137:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 138:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 133:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 130:


0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 132:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 129:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 139:


  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 140:


0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 141:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 142:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 143:


IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



	Inserted 1000 images
Inserting Batch 148:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 150:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 144:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 151:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 153:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 155:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 158:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 146:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 159:


  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 157:


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 154:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 152:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 147:


0it [00:00, ?it/s]

0it [00:00, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 160:
	Inserted 1000 images
Inserting Batch 149:
	Inserted 1000 images
Inserting Batch 156:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 145:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 161:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 162:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 163:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 165:


0it [00:00, ?it/s]

0it [00:00, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

	Inserted 1000 images
Inserting Batch 164:
	Inserted 1000 images
Inserting Batch 166:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 167:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 171:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 168:


0it [00:00, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 170:
	Inserted 1000 images
Inserting Batch 169:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 185:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 178:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 183:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 174:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 175:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 172:


	Inserted 1000 images
Inserting Batch 173:


  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 180:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 184:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 176:


  0%|          | 0/1000 [00:00<?, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 181:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 179:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 182:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 177:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 186:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 187:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 188:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 189:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 191:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 190:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 195:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 196:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 198:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 199:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 201:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 205:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 193:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 197:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 204:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 202:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 200:


	Inserted 1000 images
Inserting Batch 209:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 192:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 203:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 208:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 207:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 194:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 206:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 210:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 211:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 213:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 214:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 215:


0it [00:00, ?it/s]

0it [00:00, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 212:
	Inserted 1000 images
Inserting Batch 218:
	Inserted 1000 images
Inserting Batch 232:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 219:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 225:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 220:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 227:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 216:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 229:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 226:


	Inserted 1000 images
Inserting Batch 222:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 217:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 221:


0it [00:00, ?it/s]

0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 223:
	Inserted 1000 images
Inserting Batch 228:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 233:
	Inserted 1000 images
Inserting Batch 224:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 230:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 231:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 234:


	Inserted 1000 images
Inserting Batch 239:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 235:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 236:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 237:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 238:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 244:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 240:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 248:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 254:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 246:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 256:
	Inserted 1000 images
Inserting Batch 255:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 261:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 260:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 259:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 266:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 269:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 267:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 265:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 264:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 278:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 268:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 271:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 270:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 272:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 274:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 280:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 277:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 273:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 275:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 279:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 281:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 276:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 282:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 284:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 285:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 287:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 283:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 286:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 292:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 299:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 294:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 298:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 297:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 304:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 289:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 296:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 300:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 295:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 288:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 301:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 290:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 302:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 303:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 291:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 293:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 305:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 309:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 306:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 307:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 310:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 311:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 308:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 312:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 314:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 329:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 323:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 320:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 325:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 318:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 328:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 324:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 319:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 316:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 313:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 315:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 322:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 317:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 321:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 327:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 326:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 335:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 333:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 332:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 331:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 330:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 334:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 347:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 345:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 340:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 338:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 339:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 349:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 351:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 336:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 342:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 346:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 343:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 341:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 337:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 344:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 348:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 353:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 352:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 350:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 357:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 354:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 359:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 356:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 358:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 355:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 368:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 360:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 363:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 375:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 373:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 367:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 377:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 369:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 364:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 366:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 361:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 362:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 371:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 365:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 370:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 372:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 376:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 374:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 383:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 379:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 380:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 382:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 378:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 381:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 399:


0it [00:00, ?it/s]

	Inserted 780 images
Inserting Batch 389:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 387:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 388:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 391:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 394:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 384:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 392:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 393:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 385:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 397:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 390:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 395:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 398:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 386:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Batch 396:


0it [00:00, ?it/s]

	Inserted 1000 images
Inserting Last Batch (396):


0it [00:00, ?it/s]

IntegrityError: (2627, b"Violation of UNIQUE KEY constraint 'UQ__images__DFE356BE9D779A25'. Cannot insert duplicate key in object 'dbo.images'. The duplicate key value is (ml/D20160529T215202_IFCB107/IFCB107D20160529T215202P01207.png).DB-Lib error message 20018, severity 14:\nGeneral SQL Server error: Check messages from the SQL Server\n")

If any images were not found, we store them and then try to reinsert the images that ran into issues.

In [222]:
images_that_no_exist = []

In [223]:
image_ids = sq.run_sql_query("select * from images;")

remaining_images = list(set(train_df.filepath.values) - set(image_ids.filepath.values) - set(images_that_no_exist))
len(remaining_images)

0
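
The retry list above is a plain set difference. Here is a minimal sketch on toy data; the paths and the names `expected`, `inserted`, and `missing` are made up for illustration and are not part of the PIVOT utilities:

```python
import pandas as pd

# Toy stand-ins for train_df.filepath and the filepaths already in the images table.
expected = pd.Series(["ml/a.png", "ml/b.png", "ml/c.png", "ml/d.png"])
inserted = pd.Series(["ml/a.png", "ml/c.png"])
# Paths known to be absent from blob storage (hypothetical).
missing = ["ml/d.png"]

# Everything we expected, minus what is in the table, minus known-missing files.
remaining = sorted(set(expected) - set(inserted) - set(missing))
print(remaining)  # ['ml/b.png']
```

An empty result, as in the run above, means every training image is already in the table and there is nothing left to reinsert.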

In [224]:
idu.initial_ingestion(image_filepaths=remaining_images)

0it [00:00, ?it/s]

UnboundLocalError: local variable 'local_exists' referenced before assignment

## 3. Insert into PREDICTIONS 

Now, let's gather the I_IDs for each image we inserted. Then we can join them with the class probabilities and insert the predictions into the `PREDICTIONS` table.

The two tables have different numbers of columns, so we merge them on `filepath`; an inner join should return one row per training image (399780 here).

In [225]:
image_ids.shape

(399780, 2)

In [226]:
train_df.shape

(399780, 12)

In [227]:
train_df_merged = train_df.merge(image_ids, on="filepath", how="inner").sort_values(by="i_id")

In [228]:
print(train_df_merged.shape)
train_df_merged.head()

(399780, 13)


Unnamed: 0,filepath,pred_label,Chloro,Cilliate,Crypto,Diatom,Dictyo,Dinoflagellate,Eugleno,Other,Prymnesio,Null,i_id
10000,ml/D20151109T032543_IFCB107/IFCB107D20151109T0...,Other,0.01340362,7.251423e-09,0.0002045121,0.001316,6.023725e-06,0.031816,8.516476e-05,0.900335,0.001642,0.051192,1
10001,ml/D20170913T184403_IFCB107/IFCB107D20170913T1...,Null,8.667803e-08,2.886774e-08,1.139554e-07,0.002341,6.971248e-06,0.000263,9.341269e-07,0.01062,0.000444,0.986324,2
10002,ml/D20151107T163826_IFCB107/IFCB107D20151107T1...,Other,5.006306e-05,2.241651e-16,2.092107e-06,1.6e-05,5.580865e-11,0.001865,1.348206e-06,0.995065,0.002943,5.8e-05,3
10003,ml/D20160602T190205_IFCB107/IFCB107D20160602T1...,Other,0.0005993233,1.755176e-07,1.634909e-05,0.003645,4.334597e-07,0.372081,0.0001838412,0.612976,0.001276,0.009223,4
10004,ml/D20151107T165939_IFCB107/IFCB107D20151107T1...,Other,0.07506377,3.889124e-08,0.0003356065,0.000456,2.155542e-05,0.008954,2.321053e-05,0.866775,0.04694,0.00143,5
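
One way to guard a join like this is to let pandas check that `filepath` is unique on both sides and then compare row counts. The sketch below uses toy frames with hypothetical values; `validate="one_to_one"` is a standard `DataFrame.merge` option, but the notebook itself does not use it:

```python
import pandas as pd

# Toy stand-ins for train_df and the images table (hypothetical values).
train = pd.DataFrame({"filepath": ["ml/a.png", "ml/b.png", "ml/c.png"],
                      "pred_label": ["Other", "Null", "Other"]})
images = pd.DataFrame({"i_id": [1, 2, 3],
                       "filepath": ["ml/b.png", "ml/a.png", "ml/c.png"]})

# validate="one_to_one" raises pandas.errors.MergeError if a filepath repeats on either side.
merged = (train.merge(images, on="filepath", how="inner", validate="one_to_one")
               .sort_values(by="i_id"))

# An inner join that drops nothing keeps every training row.
assert len(merged) == len(train)
print(merged["filepath"].tolist())  # ['ml/b.png', 'ml/a.png', 'ml/c.png']
```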


In [229]:
# combine the class_probs into 1 dictionary per image
prob_dicts = (train_df_merged[class_labels]
              .rename(columns = dict(zip(class_labels, list(np.arange(10).astype(int)))))
              .to_dict(orient = 'records')
             )
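
To see what the rename plus `to_dict(orient='records')` step produces, here is a small sketch with two made-up class columns instead of the ten in `class_labels`:

```python
import pandas as pd

# Two hypothetical class names instead of the ten in class_labels.
labels = ["Chloro", "Diatom"]
df = pd.DataFrame({"Chloro": [0.25, 0.75], "Diatom": [0.75, 0.25]})

# Rename label columns to integer class indices, then emit one dict per row.
records = (df[labels]
           .rename(columns=dict(zip(labels, range(len(labels)))))
           .to_dict(orient="records"))
print(records)
```

Each element of `records` maps a class index to that image's probability, which is the shape stored in the `class_prob` column.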

In [230]:
# Gather the necessary variables in the right order for inserting into the Predictions Table
pred_table_train = train_df_merged[['i_id', 'pred_label']].copy()  # copy to avoid SettingWithCopyWarning
pred_table_train['class_prob'] = np.array(prob_dicts, dtype=str)
# assign model id as 1
pred_table_train['m_id'] = 1
# reorder for insertion
pred_table_train = pred_table_train[['m_id', 'i_id', 'class_prob', 'pred_label']]

pred_table_train.head()

Unnamed: 0,m_id,i_id,class_prob,pred_label
10000,1,1,"{0: 0.013403621, 1: 7.251423e-09, 2: 0.0002045...",Other
10001,1,2,"{0: 8.667803e-08, 1: 2.8867737e-08, 2: 1.13955...",Null
10002,1,3,"{0: 5.0063063e-05, 1: 2.241651e-16, 2: 2.09210...",Other
10003,1,4,"{0: 0.0005993233, 1: 1.755176e-07, 2: 1.634909...",Other
10004,1,5,"{0: 0.07506377, 1: 3.8891244e-08, 2: 0.0003356...",Other


### Insert model into models table first.

We use `data_utils.insert_data()` to insert the model, which is a single observation.

In [208]:
cnn_class_map = dict(zip(list(np.arange(10).astype(int)),class_labels))

du.insert_data(table_name='models', data = {"model_name":"model-cnn-v1-b3",
                                            "model_link": "https://basemodel-endpoint.westus2.inference.ml.azure.com/score",
                                            "class_map": str(cnn_class_map)
})

Decimal('1')

Let's check that it was actually inserted.

In [231]:
sq.run_sql_query("select * from models;")

Unnamed: 0,m_id,model_name,model_link,class_map
0,1,model-cnn-v1-b3,https://basemodel-endpoint.westus2.inference.m...,"{0: 'Chloro', 1: 'Cilliate', 2: 'Crypto', 3: '..."


You can parse the stored `str(dictionary)` back into a Python `dict` with `ast.literal_eval(string)`:
```python
import ast

class_map = ast.literal_eval(string)  # `string` is the text stored in the class_map column
```
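
As a self-contained illustration of that round trip, the sketch below stringifies a two-entry class map (shortened from the real ten-class map) and parses it back:

```python
import ast

# A small class map like the one stored in the models table (two entries for brevity).
class_map = {0: "Chloro", 1: "Cilliate"}

stored = str(class_map)              # what ends up in the varchar column
restored = ast.literal_eval(stored)  # parses Python literals only, never runs code

print(restored == class_map)  # True
```

This relies on the keys being plain Python integers. If the keys are NumPy integers, newer NumPy versions may render them as `np.int64(0)` in the string, which `literal_eval` rejects; casting each key with `int(k)` before calling `str()` should avoid that.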

### Insert the raw data into predictions

Let's format the data into the desired format and bulk insert.

In [232]:
pred_table_train_lst = pred_table_train.to_dict(orient='records')

We use `insert_data.bulk_insert_data()`, which takes either a single dict or a list of dicts to insert.

In [233]:
idu.bulk_insert_data(table_name="predictions", data=pred_table_train_lst)

0it [00:00, ?it/s]
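
The internals of `bulk_insert_data()` are not shown in this notebook. As a rough sketch of the general idea only, a helper that accepts one dict or a list of dicts and splits the records into fixed-size batches could look like the following; `to_batches` and the batch size of 1000 are assumptions for illustration, not the PIVOT implementation:

```python
from typing import Iterator, Union


def to_batches(data: Union[dict, list], batch_size: int = 1000) -> Iterator[list]:
    """Hypothetical helper: accept one dict or a list of dicts and
    yield lists of at most `batch_size` records."""
    records = [data] if isinstance(data, dict) else list(data)
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# 2500 toy prediction rows split into batches of 1000.
rows = [{"m_id": 1, "i_id": i} for i in range(2500)]
sizes = [len(batch) for batch in to_batches(rows)]
print(sizes)  # [1000, 1000, 500]
```

Batching keeps each database round trip bounded in size, so a failure only affects one batch rather than the whole insert.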

In [234]:
sq.run_sql_query("""
SELECT column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name = 'models';
""")

Unnamed: 0,column_name,data_type,character_maximum_length
0,m_id,int,
1,model_name,varchar,255.0
2,model_link,varchar,-1.0
3,class_map,varchar,-1.0


## 4. Checkout Non-Train Data
---
Now, let's repeat the same process with the subsampled set of 1M unvalidated images, found under
`../PIVOT/data/inventory_df_with_probs.parquet.gzip`. Note that this dataset was generated using the code from [`Parallelized_Image_Loading.ipynb`](../notebooks/Parallelized_Image_Loading.ipynb).

Essentially, the dataset contains the following attributes:
* `image_path`: the location of the file on Ali Chase's VM
* `pred_label`: an integer denoting which class the model predicts the image to be.
* `pred_class`: the corresponding text description of each class. (None/NaN denote "Unidentifiable")
* `0`: the probability of the image belonging to class 0
* `1`: the probability of the image belonging to class 1
$$\dots$$

* `9`: the probability of the image belonging to class 9

For our analysis, we'll only use the columns with the class probabilities.
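
As a quick sanity check on a layout like this, one could select just the probability columns and confirm that each row sums to about 1 and that the arg-max column agrees with `pred_label`. The sketch uses a toy frame with three classes and assumes the probability columns are named with the strings `"0"`, `"1"`, and so on:

```python
import pandas as pd

# Toy frame with three classes instead of ten; string column names are an assumption.
df = pd.DataFrame({
    "image_path": ["x/1.png", "x/2.png"],
    "pred_label": [2, 0],
    "0": [0.125, 0.5],
    "1": [0.125, 0.25],
    "2": [0.75, 0.25],
})

prob_cols = [str(i) for i in range(3)]
probs = df[prob_cols]

# Each row should sum to ~1, and the arg-max column should agree with pred_label.
print(probs.sum(axis=1).tolist())                 # [1.0, 1.0]
print(probs.idxmax(axis=1).astype(int).tolist())  # [2, 0]
```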

In [235]:
# load all new data
new_data = pd.read_parquet("../PIVOT/data/inventory_df_with_probs.parquet.gzip")

print(new_data.shape)
new_data.head()

(1056000, 15)


Unnamed: 0,Index,image_path,pred_label,pred_class,index,0,1,2,3,4,5,6,7,8,9
0,0,/Users/alisonchase/Documents/IFCB/NAAMES/NAAME...,7,Other,0,0.00319,2.078881e-11,8.588634e-05,0.000173,8.921839e-08,0.003271,8.093556e-06,0.957533,0.035501,0.000237
1,1,/Users/alisonchase/Documents/IFCB/NAAMES/NAAME...,7,Other,1,0.007551,5.126985e-07,0.0003592776,0.002601,2.53448e-06,0.204228,0.00101794,0.718851,0.061205,0.004183
2,2,/Users/alisonchase/Documents/IFCB/NAAMES/NAAME...,9,,2,0.000135,1.20064e-11,1.257837e-05,0.00022,2.652156e-05,0.000103,1.847491e-07,0.390917,0.000375,0.608209
3,3,/Users/alisonchase/Documents/IFCB/NAAMES/NAAME...,7,Other,3,0.001123,1.05487e-14,4.769512e-05,0.000167,1.587807e-10,0.013132,1.304258e-05,0.98447,0.000259,0.000788
4,4,/Users/alisonchase/Documents/IFCB/NAAMES/NAAME...,7,Other,4,0.0001,7.816149e-09,8.546382e-08,0.008106,8.124642e-09,0.343707,2.029714e-05,0.645359,0.000388,0.002319


In [236]:
# extract the blob storage path
new_data['filepath'] = 'ml/' + new_data.image_path.str.split("NAAMES_ml/", expand=True)[1]
# remove data that was already in train
new_df = new_data.drop(columns = ['image_path']).merge(image_ids, on ='filepath', how ='outer', indicator=True).query('_merge=="left_only"')
# gather only necessary columns
new_df = new_df.drop(columns = ['Index', 'pred_class', 'index', 'i_id', '_merge'])
# convert pred_label into int
new_df['pred_label'] = new_df['pred_label'].astype(int)
# convert pred_label to string that will be written to sql
pred_labels = new_df.pred_label.apply(lambda x: class_labels[x])
new_df['pred_label'] = pred_labels
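
The "remove data that was already in train" step is an anti-join built from `merge(..., indicator=True)`. A minimal sketch on toy frames (all values are made up):

```python
import pandas as pd

# Toy stand-ins for the new data and the rows already in the images table.
new = pd.DataFrame({"filepath": ["ml/a.png", "ml/b.png", "ml/c.png"],
                    "pred_label": [7, 9, 7]})
existing = pd.DataFrame({"i_id": [1], "filepath": ["ml/b.png"]})

# indicator=True adds a _merge column: 'left_only', 'right_only', or 'both'.
joined = new.merge(existing, on="filepath", how="outer", indicator=True)

# Keep rows that appear only in the new data, then drop the helper columns.
only_new = joined.query('_merge == "left_only"').drop(columns=["i_id", "_merge"])

print(sorted(only_new["filepath"]))  # ['ml/a.png', 'ml/c.png']
```

Since only the `left_only` rows are kept, `how="left"` would give the same result here while skipping rows that exist only in the images table.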

In [237]:
new_df.head()

Unnamed: 0,pred_label,0,1,2,3,4,5,6,7,8,9,filepath
0,Null,5.478702e-05,1.248894e-09,5.017794e-06,0.000321,3.204076e-05,0.000771,2.426469e-07,0.338648,0.000835,0.659334,ml/D20151104T062314_IFCB107/IFCB107D20151104T0...
1,Null,0.0009859082,5.909088e-09,9.584037e-05,0.000357,6.094646e-05,0.001303,1.327462e-06,0.410725,0.000547,0.585924,ml/D20151104T062314_IFCB107/IFCB107D20151104T0...
2,Other,0.0007118566,2.966088e-11,0.0002576837,0.00047,7.211523e-06,0.000671,4.228251e-06,0.654213,0.0003,0.343365,ml/D20151104T062314_IFCB107/IFCB107D20151104T0...
3,Null,3.839303e-06,2.288627e-08,2.723434e-06,0.289263,1.254053e-05,0.000283,0.0001665228,0.033909,0.000527,0.675832,ml/D20151104T062314_IFCB107/IFCB107D20151104T0...
4,Null,5.20861e-07,6.950269e-09,4.218971e-09,2.8e-05,9.571343e-07,1.3e-05,1.176711e-10,0.014986,7.1e-05,0.984902,ml/D20151104T062314_IFCB107/IFCB107D20151104T0...


#### Bulk Insert to IMAGES

In [238]:
new_imagepaths = new_df[['filepath']].to_dict(orient="records")

In [239]:
%%time
idu.bulk_insert_data(table_name="images", data=new_imagepaths)

0it [00:00, ?it/s]

CPU times: total: 17.6 s
Wall time: 1min 40s


Let's gather the I_IDs of the images we inserted.

In [240]:
image_ids = sq.run_sql_query("select * from images;")

In [241]:
image_ids

Unnamed: 0,i_id,filepath
0,1,ml/D20151109T032543_IFCB107/IFCB107D20151109T0...
1,2,ml/D20170913T184403_IFCB107/IFCB107D20170913T1...
2,3,ml/D20151107T163826_IFCB107/IFCB107D20151107T1...
3,4,ml/D20160602T190205_IFCB107/IFCB107D20160602T1...
4,5,ml/D20151107T165939_IFCB107/IFCB107D20151107T1...
...,...,...
1382667,1382669,ml/D20180412T012434_IFCB107/IFCB107D20180412T0...
1382668,1382670,ml/D20180412T012434_IFCB107/IFCB107D20180412T0...
1382669,1382671,ml/D20180412T012434_IFCB107/IFCB107D20180412T0...
1382670,1382672,ml/D20180412T012434_IFCB107/IFCB107D20180412T0...


### Gather image_ids and prepare table for inserting into predictions

In [242]:
pred_table_new = new_df.merge(image_ids, on='filepath', how ='inner').sort_values(by="i_id")

In [243]:
pred_table_new.head()

Unnamed: 0,pred_label,0,1,2,3,4,5,6,7,8,9,filepath,i_id
0,Null,5.478702e-05,1.248894e-09,5.017794e-06,0.000321,3.204076e-05,0.000771,2.426469e-07,0.338648,0.000835,0.659334,ml/D20151104T062314_IFCB107/IFCB107D20151104T0...,399782
1,Null,0.0009859082,5.909088e-09,9.584037e-05,0.000357,6.094646e-05,0.001303,1.327462e-06,0.410725,0.000547,0.585924,ml/D20151104T062314_IFCB107/IFCB107D20151104T0...,399783
2,Other,0.0007118566,2.966088e-11,0.0002576837,0.00047,7.211523e-06,0.000671,4.228251e-06,0.654213,0.0003,0.343365,ml/D20151104T062314_IFCB107/IFCB107D20151104T0...,399784
3,Null,3.839303e-06,2.288627e-08,2.723434e-06,0.289263,1.254053e-05,0.000283,0.0001665228,0.033909,0.000527,0.675832,ml/D20151104T062314_IFCB107/IFCB107D20151104T0...,399785
4,Null,5.20861e-07,6.950269e-09,4.218971e-09,2.8e-05,9.571343e-07,1.3e-05,1.176711e-10,0.014986,7.1e-05,0.984902,ml/D20151104T062314_IFCB107/IFCB107D20151104T0...,399786


In [244]:
# combine the class_probs into 1 dictionary per image
prob_cols = [str(i) for i in range(10)]  # probability columns are named '0'..'9'
prob_dicts_new = (pred_table_new[prob_cols]
                  .rename(columns={c: int(c) for c in prob_cols})  # string names -> int keys
                  .to_dict(orient='records')
                 )
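
To make this reshaping concrete, here is a small sketch with three classes instead of ten. The values are made up; the point is only that each row's probability columns become one dictionary keyed by the integer class index.

```python
import pandas as pd

# Toy probabilities for 2 images and 3 classes; column names are strings.
probs = pd.DataFrame({"0": [0.7, 0.1], "1": [0.2, 0.3], "2": [0.1, 0.6]})

prob_cols = [str(i) for i in range(3)]
records = (
    probs[prob_cols]
    .rename(columns={c: int(c) for c in prob_cols})  # "0" -> 0, "1" -> 1, ...
    .to_dict(orient="records")                       # one dict per row
)

print(records)
```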

In [245]:
# Gather the necessary variables in the right order for inserting into the Predictions Table
pred_table_new = pred_table_new[['i_id', 'pred_label']]
pred_table_new['class_prob'] = np.array(prob_dicts_new, dtype=str)
# assign model id as 1
pred_table_new['m_id'] = 1
# reorder for insertion
pred_table_new = pred_table_new[['m_id', 'i_id', 'class_prob', 'pred_label']]

pred_table_new.head()

Unnamed: 0,m_id,i_id,class_prob,pred_label
0,1,399782,"{0: 5.4787022e-05, 1: 1.2488937e-09, 2: 5.0177...",Null
1,1,399783,"{0: 0.0009859082, 1: 5.909088e-09, 2: 9.584037...",Null
2,1,399784,"{0: 0.00071185664, 1: 2.9660885e-11, 2: 0.0002...",Other
3,1,399785,"{0: 3.839303e-06, 1: 2.288627e-08, 2: 2.723434...",Null
4,1,399786,"{0: 5.20861e-07, 1: 6.950269e-09, 2: 4.2189705...",Null


In [246]:
pred_table_new_lst = pred_table_new.to_dict(orient='records')

In [247]:
len(pred_table_new_lst)

982892

In [248]:
idu.bulk_insert_data(table_name="predictions", data=pred_table_new_lst)

0it [00:00, ?it/s]

## 5. Now, Select Random Subset for Test Evaluation

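Insert the placeholder rows into the DISSIMILARITY and MODELS tables first; rows in other tables reference them, so inserting those rows beforehand would raise foreign key errors.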
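
Parent rows have to exist before child rows can reference them, and re-running an insert with a fixed key raises a primary key violation. The sketch below illustrates both points with Python's built-in `sqlite3` as a stand-in for the SQL Server database. The two-table schema is a simplified assumption, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE dissimilarity (d_id INTEGER PRIMARY KEY, name TEXT, formula TEXT)")
conn.execute(
    "CREATE TABLE metrics (i_id INTEGER, d_id INTEGER REFERENCES dissimilarity(d_id), d_value REAL)"
)

# Inserting a child row before its parent exists violates the foreign key.
try:
    conn.execute("INSERT INTO metrics VALUES (1, 0, 0.5)")
    fk_failed = False
except sqlite3.IntegrityError:
    fk_failed = True

# Guarded insert: only adds the parent row if d_id = 0 is not there yet,
# so running it twice does not raise a duplicate-key error.
guarded_insert = """
INSERT INTO dissimilarity (d_id, name, formula)
SELECT 0, 'random_sample', 'none'
WHERE NOT EXISTS (SELECT 1 FROM dissimilarity WHERE d_id = 0)
"""
conn.execute(guarded_insert)
conn.execute(guarded_insert)  # second run is a no-op

conn.execute("INSERT INTO metrics VALUES (1, 0, 0.5)")  # now succeeds
n_parents = conn.execute("SELECT COUNT(*) FROM dissimilarity").fetchone()[0]
print(fk_failed, n_parents)
```

A similar guard could be placed between the `SET IDENTITY_INSERT` statements so that the cell can be re-run safely; that adaptation is untested here.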

In [256]:
# insert 0 representing random test_data
sq.run_sql_query(
"""
SET IDENTITY_INSERT dissimilarity ON;

INSERT INTO dissimilarity (d_id, name, formula)
VALUES (0, 'random_sample', 'none');
SET IDENTITY_INSERT dissimilarity OFF;
"""
)

IntegrityError: (2627, b"Violation of PRIMARY KEY constraint 'PK__dissimil__D95F582B521F4F8B'. Cannot insert duplicate key in object 'dbo.dissimilarity'. The duplicate key value is (0).DB-Lib error message 20018, severity 14:\nGeneral SQL Server error: Check messages from the SQL Server\n")

In [257]:
# insert 0 representing test_data
sq.run_sql_query(
"""
SET IDENTITY_INSERT models ON;


INSERT INTO models (m_id, model_name, model_link, class_map)
VALUES (0, 'random_sample', 'none', 'none');
SET IDENTITY_INSERT models OFF;
"""
)



Create (or update) the stored procedure that gathers a random test set of 100,000 images.

In [258]:
sq.create_alter_stored_procedure('GENERATE_RANDOM_TEST_SET')

Using preset file to create procedure GENERATE_RANDOM_TEST_SET: C:\Users\clair\HF\Pivot_App\PIVOT\PIVOT\utils\stored_procedures\Generate_Random_Test_Set.sql
/*
Name: GENERATE_RANDOM_TEST_SET
Description: This stored procedure generates a random test set of specified size from a
             pool of available images, excluding any image IDs provided in @IMAGE_IDS.
Parameters:
- @TEST_SIZE: Integer value specifying the size of the test set to be generated.
- @IMAGE_IDS: Comma-separated string containing image IDs to be excluded from sampling.
*/

CREATE OR ALTER PROCEDURE GENERATE_RANDOM_TEST_SET
    @TEST_SIZE INT,
    @IMAGE_IDS VARCHAR(MAX) -- other image_ids to be excluded from sampling
AS
BEGIN
    DECLARE @EXCLUDE_IDS TABLE (I_ID INT);

    -- Convert comma-separated string to a table variable
    INSERT INTO @EXCLUDE_IDS (I_ID)
    SELECT CAST(value AS INT)
    FROM STRING_SPLIT(@IMAGE_IDS, ',');

    -- Common Table Expression (CTE) to retrieve existing images
    WITH EXISTING_I

Get list of train_ids that can't be used for metric evaluation.
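
As a rough Python analogue of sampling a test set while excluding the training IDs, consider the sketch below. It is not the stored procedure itself, whose body is only partially shown above. The pool size, sample size, and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)        # fixed seed for reproducibility

all_ids = np.arange(1, 21)                 # toy pool of image IDs 1..20
train_ids = np.arange(1, 11)               # IDs 1..10 were used for training

# Candidates are all IDs that are not training IDs.
candidates = np.setdiff1d(all_ids, train_ids)

# Draw a test set of 5 distinct IDs from the candidates.
test_ids = rng.choice(candidates, size=5, replace=False)
print(sorted(test_ids.tolist()))
```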

In [259]:
train_ids = list(range(1, 399777))  # i_ids 1..399776 belong to the training data

In [260]:
sq.generate_random_evaluation_set(train_ids = train_ids)

In [261]:
sq.run_sql_query("select count(*) from metrics;")

Unnamed: 0,Unnamed: 1
0,100000


### Prepare for insertion into the DISSIMILARITY table

Insert methods into the dissimilarity table.

In [68]:
sq.run_sql_query(
"""
INSERT INTO dissimilarity (name, formula)
VALUES ('entropy', '-x.T @ np.nan_to_num(np.log(x))')
"""
)



In [172]:
sq.run_sql_query("select * from dissimilarity")

Unnamed: 0,d_id,name,formula
0,0,random_sample,none
1,1,entropy,-x.T @ np.nan_to_num(np.log(x))


## 6. Insert into METRICS
---

Let's calculate the dissimilarity scores and insert them into the `metrics` table for all images in `predictions`.
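
For a vector of class probabilities $p = (p_0, \dots, p_9)$, the three scores that appear in this notebook can be written as follows. Here $p_{(1)} \ge p_{(2)}$ denote the two largest probabilities, $\ln$ is the natural logarithm, and a term with $p_k = 0$ contributes $0$ to the entropy. These formulas are transcribed from the Python functions defined in this section.

$$
\begin{aligned}
\text{least\_confident}(p) &= 1 - \max_k p_k \\
\text{least\_margin}(p) &= 1 - \left(p_{(1)} - p_{(2)}\right) \\
\text{entropy}(p) &= -\sum_k p_k \ln p_k
\end{aligned}
$$

All three scores grow as the prediction becomes less certain.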

### train_data

In [262]:
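def least_confident_score(x):
    """1 minus the largest class probability."""
    return 1 - np.max(x)

def least_margin_score(x):
    """1 minus the gap between the two largest class probabilities."""
    sort_x = np.sort(x)
    return 1 - (sort_x[-1] - sort_x[-2])

def entropy_score(x):
    """Shannon entropy (natural log); nan_to_num keeps log(0) finite."""
    return -x.T @ np.nan_to_num(np.log(x))

def get_score(x, f):
    """Apply score function f to x; return -1 as a sentinel if it fails."""
    try:
        s = f(x)
    except Exception:
        print(f"Error computing {f.__name__}")
        s = -1
    return s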
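
As a sanity check, the three scores can be evaluated on a small probability matrix and computed for all rows at once with array operations rather than a row-wise `apply`. The sketch below is an assumed alternative, not the code used to produce the results in this notebook. The function name `uncertainty_scores` is hypothetical.

```python
import numpy as np

def uncertainty_scores(probs):
    """Return (least_confident, least_margin, entropy) for each row of `probs`."""
    probs = np.asarray(probs, dtype=float)
    sorted_p = np.sort(probs, axis=1)                    # ascending within each row
    least_confident = 1 - sorted_p[:, -1]
    least_margin = 1 - (sorted_p[:, -1] - sorted_p[:, -2])
    # Treat 0 * log(0) as 0 by only taking the log where p > 0.
    log_p = np.log(probs, out=np.zeros_like(probs), where=probs > 0)
    entropy = -(probs * log_p).sum(axis=1)
    return least_confident, least_margin, entropy

toy = [[0.7, 0.2, 0.1],   # fairly confident prediction
       [1.0, 0.0, 0.0]]   # fully confident prediction
lc, lm, ent = uncertainty_scores(toy)
print(lc, lm, ent)
```

For the first row the scores are about 0.3, 0.5, and 0.80; for the fully confident second row all three are 0.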

In [263]:
train_df_merged.head()

Unnamed: 0,filepath,pred_label,Chloro,Cilliate,Crypto,Diatom,Dictyo,Dinoflagellate,Eugleno,Other,Prymnesio,Null,i_id
10000,ml/D20151109T032543_IFCB107/IFCB107D20151109T0...,Other,0.01340362,7.251423e-09,0.0002045121,0.001316,6.023725e-06,0.031816,8.516476e-05,0.900335,0.001642,0.051192,1
10001,ml/D20170913T184403_IFCB107/IFCB107D20170913T1...,Null,8.667803e-08,2.886774e-08,1.139554e-07,0.002341,6.971248e-06,0.000263,9.341269e-07,0.01062,0.000444,0.986324,2
10002,ml/D20151107T163826_IFCB107/IFCB107D20151107T1...,Other,5.006306e-05,2.241651e-16,2.092107e-06,1.6e-05,5.580865e-11,0.001865,1.348206e-06,0.995065,0.002943,5.8e-05,3
10003,ml/D20160602T190205_IFCB107/IFCB107D20160602T1...,Other,0.0005993233,1.755176e-07,1.634909e-05,0.003645,4.334597e-07,0.372081,0.0001838412,0.612976,0.001276,0.009223,4
10004,ml/D20151107T165939_IFCB107/IFCB107D20151107T1...,Other,0.07506377,3.889124e-08,0.0003356065,0.000456,2.155542e-05,0.008954,2.321053e-05,0.866775,0.04694,0.00143,5


In [264]:
%%time
# copy the slice so the column assignments below do not trigger SettingWithCopyWarning
entropy_train = train_df_merged[['i_id']].copy()
entropy_train['m_id'] = 1
entropy_train['d_id'] = 1
entropy_train['d_value'] = train_df_merged[class_labels].apply(lambda x: get_score(x, entropy_score), axis=1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  result = getattr(ufunc, method)(*inputs, **kwargs)


CPU times: total: 42.5 s
Wall time: 1min 16s



In [265]:
%%time
idu.bulk_insert_data(table_name="metrics", data=entropy_train.to_dict(orient='records'))

0it [00:00, ?it/s]

CPU times: total: 17.9 s
Wall time: 1min 45s


### New images

In [266]:
new_scores = pd.read_parquet("../PIVOT/data/inventory_df_with_scores.parquet.gzip")
new_scores.shape

(1056000, 8)

In [267]:
new_scores.tail()

Unnamed: 0,Index,image_path,pred_label,pred_class,index,entropy_score,least_confident_score,least_margin_score
1055995,1103035,/Users/alisonchase/Documents/IFCB/NAAMES/NAAME...,2,Crypto,1103035,0.234404,0.057854,0.112909
1055996,1103036,/Users/alisonchase/Documents/IFCB/NAAMES/NAAME...,7,Other,1103036,1.12972,0.399378,0.606505
1055997,1103037,/Users/alisonchase/Documents/IFCB/NAAMES/NAAME...,7,Other,1103037,1.289111,0.38593,0.545256
1055998,1103038,/Users/alisonchase/Documents/IFCB/NAAMES/NAAME...,0,Chloro,1103038,0.526977,0.174938,0.332709
1055999,1103039,/Users/alisonchase/Documents/IFCB/NAAMES/NAAME...,7,Other,1103039,0.795194,0.474722,0.931747


In [268]:
# gather IDs for images
image_ids = sq.run_sql_query("select * from images;")
image_ids.shape

(1382672, 2)

In [269]:
# join image IDs to table and remove train data that might overlap
new_scores['filepath'] = 'ml/' + new_scores['image_path'].str.split("NAAMES_ml/", expand=True)[1]
entropy_new = new_scores[['filepath', 'entropy_score']].merge(image_ids, on='filepath', how='inner').query("i_id>399776")

# assign model and d_id
entropy_new['m_id'] = 1
entropy_new['d_id'] = 1
# reorder to relevant info
entropy_new = entropy_new.rename(columns={"entropy_score":"d_value"})[['i_id','m_id', 'd_id', 'd_value']]
print(entropy_new.shape)
entropy_new.head()

(982892, 4)


Unnamed: 0,i_id,m_id,d_id,d_value
0,909522,1,1,0.201498
1,909823,1,1,0.817881
2,909813,1,1,0.676992
3,909804,1,1,0.089775
4,909361,1,1,0.706994


In [270]:
train_subset = new_scores[['filepath', 'entropy_score']].merge(image_ids, on='filepath', how='inner').query("i_id<=399776")

In [271]:
%%time
idu.bulk_insert_data(table_name="metrics", data=entropy_new.to_dict(orient='records'))

0it [00:00, ?it/s]

CPU times: total: 39.3 s
Wall time: 4min 21s


# Load the Existing Labels for Train Data

In [87]:
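train_data = pd.read_csv("../PIVOT/data/model-summary-cnn-v1-b3.csv")
# the prefix must be 'ml/' (with the slash) to match `filepath` in `images`; with 'ml' alone the merge below finds no rows
train_data["filepath"] = 'ml/' + train_data['full_path'].str.split("NAAMES_ml/", expand=True)[1]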

In [88]:
train_data = train_data[['filepath', 'true_label']].merge(image_ids, on='filepath', how='inner')

In [89]:
train_data.shape

(0, 3)
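
The empty result `(0, 3)` suggests that none of the constructed `filepath` keys matched a row of `image_ids`. One way to diagnose this kind of problem is to count how many keys survive the inner merge for each candidate key format. Below is a minimal sketch on made-up paths; the helper `n_matches` is hypothetical.

```python
import pandas as pd

# Toy stand-ins; all values are made up.
image_ids = pd.DataFrame({"i_id": [1, 2], "filepath": ["ml/binA/img1.png", "ml/binA/img2.png"]})
full_path = pd.Series(["/root/NAAMES_ml/binA/img1.png", "/root/NAAMES_ml/binA/img2.png"])
tail = full_path.str.split("NAAMES_ml/", expand=True)[1]

def n_matches(prefix):
    """Count rows that survive an inner merge when `prefix` is prepended."""
    keys = pd.DataFrame({"filepath": prefix + tail})
    return len(keys.merge(image_ids, on="filepath", how="inner"))

without_slash = n_matches("ml")    # builds "mlbinA/img1.png": no match
with_slash = n_matches("ml/")      # builds "ml/binA/img1.png": matches
print(without_slash, with_slash)
```

Asserting that the merged frame is non-empty before a bulk insert would catch this kind of mismatch early.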

Gather the class label mapping from the `models` table.

In [90]:
class_map = sq.run_sql_query("select * from models where m_id = 1;")

In [91]:
import ast
class_map = ast.literal_eval(class_map.class_map[0])
class_map

{0: 'Chloro',
 1: 'Cilliate',
 2: 'Crypto',
 3: 'Diatom',
 4: 'Dictyo',
 5: 'Dinoflagellate',
 6: 'Eugleno',
 7: 'Other',
 8: 'Prymnesio',
 9: 'Null'}

In [92]:
# use mapper to convert ints to varchar labels.
train_data['label'] = train_data['true_label'].map(class_map)
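
The `class_map` column stores the mapping as text, so it has to be parsed into a dictionary before it can be used with `Series.map`. Here is a minimal sketch with a shortened, made-up mapping string:

```python
import ast
import pandas as pd

# A dictionary stored as text, as it might come back from a VARCHAR column.
class_map_str = "{0: 'Chloro', 7: 'Other', 9: 'Null'}"

# literal_eval parses Python literals only, so it does not execute arbitrary code.
class_map = ast.literal_eval(class_map_str)

true_label = pd.Series([7, 9, 0, 3])   # 3 is absent from the toy mapping
label = true_label.map(class_map)      # unmapped values become NaN
print(label.tolist())
```

Values without an entry in the mapping become `NaN`, so it is worth checking `label.isna().sum()` before inserting.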

Insert an "unknown labeler" into the `users` table so that the training data labels can be inserted with an associated experience level.
The experience is set to 3 on the assumption that graduate students did the original labeling.

In [93]:
sq.run_sql_query("""
Insert into users (email, name, experience, lab)
VALUES ('initial@unknown.com', 'Initial Labelers', 3, 'Labelers who annotated data for model cnn-v1-b3');
""")



In [94]:
sq.run_sql_query("select * from users;")

Unnamed: 0,u_id,email,name,experience,lab
0,1,initial@unknown.com,Initial Labelers,3,Labelers who annotated data for model cnn-v1-b3


Get the schema for `labels`.

In [95]:
sq.run_sql_query("""
SELECT column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name = 'labels';
""")

Unnamed: 0,column_name,data_type,character_maximum_length
0,i_id,int,
1,u_id,int,
2,weight,int,
3,date,datetime,
4,label,varchar,255.0


Now, let's set the `u_id` for each of these labels to 1, the ID of the "Initial Labelers" user.

In [96]:
from datetime import datetime

In [97]:
train_data['u_id'] = 1
# default user experience is 3
train_data['weight'] = 3
# the actual labeling date is unknown, so use 2023-01-01 as a placeholder
train_data['date'] = str(datetime(2023, 1, 1, 0, 0, 0))

# reorder columns
train_data = train_data[['i_id', 'u_id', 'weight', 'date','label']]

train_data.head()

Unnamed: 0,i_id,u_id,weight,date,label


In [98]:
%%time
# insert into labels
idu.bulk_insert_data(table_name="labels", data=train_data.to_dict(orient='records'))

CPU times: total: 15.6 ms
Wall time: 150 ms
