#### Overview 

This notebook is a work in progress. Eventually it will demonstrate an NLP-based drift detection algorithm in action. Until that feature is developed, it shows how to load and use two datasets intended for the examples:

- Civil Comments dataset: online comments for toxicity classification problems
- Amazon Reviews dataset: Amazon reviews for a variety of NLP problems

The data comes from the `wilds` library, which contains several such datasets and wraps them in a common API. The corresponding `get_dataset` calls appear, commented out, in the cells below.

#### Imports

In [1]:
import pandas as pd
# from wilds import get_dataset

from menelaus.experimental.transform import auto_tokenize, extract_embedding, uae_reduce_dimension
from menelaus.experimental.detector import Detector
from menelaus.experimental.alarm import KolmogorovSmirnovAlarm



#### Load Data

Note that the large data files must be downloaded before first use. Later examples may assume the data is already stored on disk.
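One way to avoid repeating the download is to check for the extracted CSV before calling the downloader. The sketch below assumes the file layout used in the next cell (`wilds_datasets/civilcomments_v1.0/all_data_with_identities.csv`); the helper name `dataset_on_disk` is hypothetical and is not part of `wilds` or `menelaus`.

```python
from pathlib import Path

# Path read by the loading cell in this notebook.
CIVIL_CSV = Path("wilds_datasets/civilcomments_v1.0/all_data_with_identities.csv")


def dataset_on_disk(csv_path: Path) -> bool:
    """Return True if the CSV exists and is non-empty (hypothetical helper)."""
    return csv_path.is_file() and csv_path.stat().st_size > 0


if dataset_on_disk(CIVIL_CSV):
    print(f"Found {CIVIL_CSV}; skipping download.")
else:
    print(f"{CIVIL_CSV} not found; download it first.")
```

A size check is included because an interrupted download can leave an empty file behind; it does not verify that the file is complete.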

In [2]:
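# Civil Comments: download once via wilds (uncomment the next line), then read the CSV from disk
# dataset_civil = get_dataset(dataset="civilcomments", download=True, root_dir="./wilds_datasets")
dataset_civil = pd.read_csv('wilds_datasets/civilcomments_v1.0/all_data_with_identities.csv')
# keep only the first five comments as a small working sample
dataset_civil = dataset_civil['comment_text'][:5].tolist()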

In [3]:
# tokens 
tokenizer = auto_tokenize(model_name='bert-base-cased', pad_to_max_length=True, return_tensors='tf')
tokens = tokenizer(data=dataset_civil)

# embedding: negative indices -1..-8, presumably the last 8 hidden layers (TODO abstract this layers line)
layers = [-i for i in range(1, 8 + 1)]
embedder = extract_embedding(model_name='bert-base-cased', embedding_type='hidden_state', layers=layers)

# dimension reduction via Untrained AutoEncoder
uae_reduce = uae_reduce_dimension(enc_dim=32)

# detector: the first step sets the reference batch
ks_alarm = KolmogorovSmirnovAlarm()
detector = Detector(alarm=ks_alarm, transforms=[tokenizer, embedder, uae_reduce])
detector.step(dataset_civil)
assert detector.rep_test is None and detector.rep_reference.shape == (5, 32)

# detector: the second step adds a test batch (here, a copy of the reference)
detector.step(dataset_civil)
assert detector.rep_test.shape == (5, 32)

# TODO - recalibrate and re-evaluate ...

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBertModel: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing TFBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions w

[0.       0.01     0.0025   0.       0.       0.01     0.       0.
 0.025625 0.       0.       0.       0.       0.0025   0.       0.
 0.0025   0.01     0.025625 0.03125  0.025625 0.03125  0.       0.
 0.       0.       0.0025   0.       0.       0.       0.       0.      ]
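For readers unfamiliar with the statistic that `KolmogorovSmirnovAlarm` is named after, the following is a minimal pure-Python sketch of the two-sample Kolmogorov-Smirnov statistic for a single feature: the largest gap between the empirical CDFs of a reference sample and a test sample. It is an illustration only, not the `menelaus` implementation. The function name `ks_statistic` is hypothetical, and how the alarm applies the test across the 32 reduced dimensions is not covered here.

```python
from bisect import bisect_right


def ks_statistic(reference, test):
    """Two-sample KS statistic: max |ECDF_ref(x) - ECDF_test(x)| over observed x."""
    ref_sorted = sorted(reference)
    test_sorted = sorted(test)
    n_ref, n_test = len(ref_sorted), len(test_sorted)
    if n_ref == 0 or n_test == 0:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # The ECDFs only change at observed values, so checking those is sufficient.
    for x in ref_sorted + test_sorted:
        ecdf_ref = bisect_right(ref_sorted, x) / n_ref
        ecdf_test = bisect_right(test_sorted, x) / n_test
        max_gap = max(max_gap, abs(ecdf_ref - ecdf_test))
    return max_gap


reference = [0.1, 0.2, 0.3, 0.4, 0.5]
same = list(reference)
shifted = [x + 1.0 for x in reference]

print(ks_statistic(reference, same))     # identical samples
print(ks_statistic(reference, shifted))  # fully separated samples
```

Under this definition, identical samples give a statistic of 0 and fully separated samples give 1; values in between indicate partial overlap of the two distributions.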
