<a href="https://colab.research.google.com/github/jb-diplom/phd/blob/main/NLM_Trainer4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Effect of Humour in Political Messaging:
An investigation combining fine-tuned neural language models and social network analysis<br>
by<br>
Janice Butler: University of Amsterdam, Master Thesis 2021

## Introduction
This notebook fine-tunes various neural language models (NLMs) on a new corpus of annotated humorous texts.
Two classifications are made:

1.   Degree of humour
2.   Comic styles

## Method 
The training script is modified from [run_glue.py](https://huggingface.co/transformers/examples.html#glue). 
The training is automatically tracked in the Weights & Biases dashboard. 

### Supervised Fine-Tuning

This script fine-tunes NLMs on corpora scraped from several subreddits:
* https://www.reddit.com/r/Jokes/
* https://www.reddit.com/r/satire/
* https://www.reddit.com/r/Showerthoughts/
* https://www.reddit.com/r/SurrealHumor

and from Twitter:
* https://twitter.com/midnight

For non-humorous texts an equal amount of data was taken from these serious news outlets:
* https://twitter.com/AP
* https://twitter.com/BBCworld
* https://twitter.com/ITN
* https://twitter.com/ITVnews
* https://twitter.com/SkyNewsPolitics
* https://twitter.com/TheEconomist

### Annotation
The Reddit data is automatically annotated into 5 grades according to the up-votes per subreddit. All other annotation was done manually: humour degree for the https://twitter.com/midnight tweets, and type of humour for all texts. The categories are:
* Serious
* Fun
* Benevolent humour
* Wit
* Nonsense
* Irony
* Satire
* Sarcasm
* Cynicism
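
The notebook does not state how up-votes are converted into the five grades. As a minimal sketch, assuming the grade is derived from a post's up-vote rank within its own subreddit (quintiles), the automatic annotation could look like this. The function name `grade_by_upvotes` and the record layout are hypothetical and used for illustration only.

```python
from collections import defaultdict

def grade_by_upvotes(posts, n_grades=5):
    """Assign each post a grade 1..n_grades from its up-vote rank within its subreddit.

    `posts` is a list of (subreddit, upvotes) tuples; the result is a list of
    grades in the same order. Rank-based grading is an assumption, not
    necessarily the scheme used for the thesis corpus.
    """
    by_sub = defaultdict(list)
    for idx, (sub, ups) in enumerate(posts):
        by_sub[sub].append((ups, idx))

    grades = [0] * len(posts)
    for items in by_sub.values():
        items.sort()  # ascending by up-votes, ties broken by original position
        n = len(items)
        for rank, (_, idx) in enumerate(items):
            # map rank 0..n-1 onto grades 1..n_grades
            grades[idx] = rank * n_grades // n + 1
    return grades
```

Grading per subreddit rather than globally keeps a small community such as r/SurrealHumor from being pushed into the lowest grades merely because its absolute up-vote counts are lower.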

## Install dependencies

Pre-trained neural language models (NLMs) are taken from [the repository at Huggingface](https://huggingface.co/models). The generic Huggingface [Transformers API](https://huggingface.co/transformers/) is used throughout for fine-tuning, and the [Huggingface Pipeline API](https://huggingface.co/transformers/main_classes/pipelines.html) is used for easy utilisation of the finished models.

### NLM Training-Performance Monitoring
During fine-tuning a multitude of parameters are relayed to a database at https://wandb.ai/site. Additionally, the fine-tuned model and all resulting metadata are recorded in the projects defined [here](https://wandb.ai/jb-diplom), for later cataloguing of results and reuse of the model.
 

```
TODO: add screenshot loaded from GIT
```



Install the Hugging Face transformers and Weights & Biases libraries, and the dataset and training script for humour fine-tuning.

## Installation and Import of Required Packages and Libraries

The dependencies are as follows:


* Huggingface framework for loading and training models, preprocessing of data
* Optionally, the Huggingface datasets package (not needed if own data or project data is used)
* Wandb is used for visualization of results on the project dashboard https://wandb.ai/jb-diplom/janice-demo
* sentencepiece is required for deberta models
* General purpose libraries (os, glob, pandas, numpy)
* GUI and visualization libraries (data_table, ipywidgets, plotly, tqdm, matplotlib)
* sklearn.metrics is used for calculating the accuracy of fine-tuned models and visualizing the results



In [None]:
#@markdown Do imports
!pip install transformers -qq           # huggingface framework for loading and training models, preprocessing of data
# Optional: hf datasets for benchmark tests; comment out the following line if only own data is used
!pip install transformers datasets -qq
!pip install wandb -qq                  # for visualization of results on the project dashboard https://wandb.ai/jb-diplom/janice-demo
!pip install sentencepiece              # required for deberta
!pip install chart_studio
#!pip install evaluate                   # needed for new run_glue script
# this was the basis for the initial implementation
# !wget https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_glue.py -qq

# Weights and Biases logging of training metrics and archiving of training results
import wandb

# General purpose libraries
from   google.colab import drive
import glob
import os
import pandas as pd
import numpy as np
import datetime 

# Visualization libraries
%load_ext google.colab.data_table
from   google.colab import data_table
import ipywidgets as widgets
import plotly.express as px
import plotly.graph_objects as go
from   tqdm.notebook import trange, tqdm
import matplotlib.pyplot as plt # For multi plots

# Hugging face API for loading pre-trained models, fine-tuning and utilization
import transformers
from   transformers import AutoModelForSequenceClassification, AutoConfig, pipeline

# Stuff for calculating accuracy of fine-tuned models
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_curve
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score
from sklearn import metrics

# Stuff for displaying metrics of fine-tuned models
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import RocCurveDisplay
from sklearn.metrics import PrecisionRecallDisplay
from sklearn.metrics import precision_recall_fscore_support

## API Key
The following call logs in to Weights & Biases and registers this run. Because `relogin` is set, an API key is requested even if a session is already active.
Optionally, we can set environment variables to customize W&B logging. See [documentation](https://docs.wandb.com/library/integrations/huggingface).

### Google Drive
The project data is hosted on GDrive to enable an easy interface with [Google Colab](https://colab.research.google.com/). The training data and results are taken from and stored to directories on GDrive, which has to be mounted and requires appropriate credentials.

In [None]:
#@markdown Connect to wandb
os.environ['WANDB_NOTEBOOK_NAME'] = 'JanicesPhD'
wandb.login(relogin=True)  # relogin expects a boolean; True forces a new login prompt

In [None]:
#@markdown Mount GDrive
drive.mount('/content/gdrive', force_remount=True)
file_list = glob.glob("/content/gdrive/MyDrive/ColabNotebooks/Visualization/data/*")

# New Section

In [None]:
#@markdown Set some global values for consistency of output styling
plot_bgcolor='rgb(150,150,160)'
cmap='viridis'
color_palette_r = px.colors.sequential.Viridis_r
color_palette   = px.colors.sequential.Viridis

# Specify Parameters and Train Model

Here you can choose which pre-trained NL model to fine-tune. Further options are:

*   Which training-data to use
*   Which project to save run-time data to
*   The GLUE-Task to use
*   Initial learning rate
*   Number of epochs to train
*   Stepsize for logging
*   Whether to freeze layers
*   Testrun with mini dataset or not









In [None]:
#@title Enter Parameters for Training { vertical-output: true, form-width: "50%", display-mode: "form" }

#@markdown Specify Parameters for Training
#@markdown ---

# Take viable names from https://huggingface.co/transformers/pretrained_models.html
Comment = "policy-noeu-roberta-large3e_mini" #@param {type:"string"}
Model = "roberta-large" #@param ["bert-base-uncased", "distilbert-base-uncased", "gpt2", "distilgpt2", "gpt2-medium", "xlnet-base-cased", "roberta-base", "distilroberta-base", "t5-base", "microsoft/deberta-base", "google/electra-base-discriminator", "google/electra-large-discriminator", "vinai/bertweet-base", "nghuyong/ernie-3.0-base-zh", "nghuyong/ernie-2.0-large-en", "nghuyong/ernie-2.0-en", "distilgpt2", "gpt2-large", "roberta-large"] {allow-input: true}
GLUE_Task = "" #@param ["", "cola", "mnli", "mrpc", "qnli", "qqp", "rte", "sst2", "stsb", "wnli", "GPT2"]
Initial_Learn_Rate = 2e-5 #@param {type: "number"}

NrEpochs =   3#@param {type: "number"}
Do_Train = True #@param {type:"boolean"}
Do_Eval = True #@param {type:"boolean"}
Do_Predict = True #@param {type:"boolean"}
 

#@markdown ---
#@markdown Parameters for Quick Tests
#@markdown ---
do_quick_test = True #@param {type:"boolean"}
Freeze_Layers = False #@param {type:"boolean"}
max_train_samples = 30000 #@param {type:"slider", min:100, max:100000, step:100}
max_val_samples = 3000 #@param {type:"slider", min:10, max:10000, step:10}
max_test_samples = 3000 #@param {type:"slider", min:10, max:10000, step:10}
#Percent_of_Trainingdata_to_use = 10 #@param {type:"slider", min: 5, max:100, step:5}

#@markdown ---
#@markdown Visualization Parameters
#@markdown ---

Do_Visualization = True #@param {type:"boolean"}
WandB_Project = "janice-final" #@param ["thesis", "thesis-test-runs", "humour-type", "humour degree", "binary humour degree", "janice-final"] {allow-input: true}
Logging_Steps = 20 #@param {type:"slider", min:10, max:100, step:10}

#@markdown Choose Files for Training 
#@markdown ---
file_ext = ".tsv" #@param [".tsv", ".csv", ".json"] {allow-input: true}
own_modelid=Comment + datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S').replace(' ','_').replace(':','.')
print ("New Model ID:", own_modelid)

In [None]:
#@markdown Which files would you like to use for training the NLM?

dir_list = glob.glob("/content/gdrive/MyDrive/ColabNotebooks/Visualization/phd_data/*/")
dir_choice = widgets.Dropdown(options=dir_list,value=dir_list[0])
# file_list = glob.glob("/content/gdrive/MyDrive/ColabNotebooks/Visualization/data/*"+ file_ext)
# file_list = glob.glob("/content/gdrive/MyDrive/ColabNotebooks/Visualization/msc_data/*"+ file_ext)
# file_list = glob.glob("/content/gdrive/MyDrive/ColabNotebooks/Visualization/midnight/*"+ file_ext)

file_list.insert(0,"")
train_file = widgets.Dropdown(options=file_list,value="")
validation_file = widgets.Dropdown(options=file_list,value="")
test_file = widgets.Dropdown(options=file_list,value="")

items = [widgets.Label(value="Source Directory"),
         widgets.Label(value= "Training"),
         widgets.Label(value="Validation"),
         widgets.Label(value="Test")]

left_box = widgets.VBox([items[0], items[1], items[2], items[3]],width='10%')
right_box = widgets.VBox([dir_choice,train_file,validation_file,test_file],width='80%')
file_pickers=widgets.HBox([left_box, right_box], width='100%')
# file_pickers.overflow_x = 'auto'
right_box.overflow_x = 'auto'

def updateDoclist(b):
    train_file.options=glob.glob(dir_choice.value + "*train*" + file_ext)
    validation_file.options=glob.glob(dir_choice.value + "*dev*" + file_ext)
    test_file.options=glob.glob(dir_choice.value + "*test*" + file_ext)

dir_choice.observe(updateDoclist, names='value')
display(file_pickers)
updateDoclist(None)
do_restart = False

In [None]:
#@markdown Optionally specify the restart of an aborted run
#@markdown ---

do_restart = False #@param {type:"boolean"}
specify_model_path = "/content/gdrive/MyDrive/ColabNotebooks/SavedModels/" #@param {type:"string"}

#@markdown **Checkpoint Name** (e.g. `checkpoint-12000`)
specify_checkpoint = "checkpoint-20000" #@param {type:"string"}

#@markdown ---
#@markdown **Model Name**<br>
#@markdown Fetch model Name from the W & B workspace e.g. [here](https://wandb.ai/jb-diplom/janice-final/table?workspace=user-jb-diplom) <br>
#@markdown Should be of the form <Comment> + <Timestamp> (e.g. `electra-L-htype_balanced20e2021-05-16_07.13.44`)
original_modelname = 'bert-base-htype_balanced20e2021-05-16_07.15.13' #@param {type:"string"}
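
The checkpoint name above is entered by hand. As a minimal sketch, assuming checkpoints are stored as `checkpoint-<step>` sub-directories of the model output directory (the naming shown in the example above), the most recent one could be located automatically. The helper `latest_checkpoint` is a hypothetical name and not part of the notebook.

```python
import os
import re

def latest_checkpoint(model_dir):
    """Return the name of the highest-numbered `checkpoint-<step>` sub-directory, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    best_step, best_name = -1, None
    for name in os.listdir(model_dir):
        match = pattern.match(name)
        # only real directories with a numeric step suffix count as checkpoints
        if match and os.path.isdir(os.path.join(model_dir, name)):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_name = step, name
    return best_name
```

Comparing the step numerically matters here: a plain string sort would rank `checkpoint-500` above `checkpoint-12000`. The Transformers library offers a comparable helper, `transformers.trainer_utils.get_last_checkpoint`.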


In [None]:
os.environ['TRAINING_FILE']=train_file.value
os.environ['VALID_FILE']=validation_file.value
os.environ['TEST_FILE']=test_file.value
os.environ['WANDB_PROJECT']=WandB_Project 
os.environ['WANDB_TAGS']=train_file.value # comma-separated list; further tags can be appended here
os.environ['WANDB_JOB_TYPE']=("Testrun" if do_quick_test else "Fullrun")
os.environ['WANDB_LOG_MODEL'] = 'true'  # saving the model to wandb
os.environ['WANDB_RUN_ID'] = own_modelid  # set own id to allow reuse in next cell
os.environ['WANDB_WATCH']="all"
os.environ['WANDB_RESUME']="auto"

os.environ['GLUE_TASK_NAME']=GLUE_Task
os.environ['TRAIN_EPOCHS']=str(NrEpochs)
os.environ['MODEL']=Model
os.environ['LR']=str(Initial_Learn_Rate)
os.environ['LS']=str(Logging_Steps)
os.environ['RUNNAME']=Comment
os.environ['REPORT_TO']="wandb"
os.environ['OUTPUT_DIR']="/content/gdrive/MyDrive/ColabNotebooks/SavedModels/"+Model
os.environ['SAVE_STEPS']="50000" # big step to avoid filling disk quota
os.environ['SAVE_LIMIT']="1"    # only one backup (let's live dangerously but save space)
os.environ['BATCH_SIZE']="64"    
os.environ['SEQ_LENGTH']="256"    

if do_restart:
  # to restart from checkpoint use following type of model path
  # os.environ['MODEL']="/content/gdrive/MyDrive/ColabNotebooks/SavedModels/"+Model+"/checkpoint-12000/"
  # https://wandb.ai/jb-diplom/janice-final/runs/electra-L-htype_balanced20e2021-05-16_07.13.44
  own_modelid=original_modelname
  run_id="jb-diplom/"+ WandB_Project + "/" + own_modelid
  run=wandb.init(project=WandB_Project, entity='jb-diplom', id=wandb.Api().run(run_id).id, resume='allow')
  os.environ['MODEL']="/content/gdrive/MyDrive/ColabNotebooks/SavedModels/"+Model+"/" + specify_checkpoint + "/"

if (Do_Visualization):
  os.environ['REPORT_TO']="wandb"

# %env
#  --task_name $GLUE_TASK_NAME \
#  --jb_task_name "t5" \
# --adafactor --lr_scheduler_type cosine --warmup_ratio 0.1 \

if do_quick_test:
  # inside this branch do_quick_test is always True, so no further check is needed
  os.environ['TRAIN_SAMPLES']=str(max_train_samples)
  os.environ['VAL_SAMPLES']=  str(max_val_samples)
  os.environ['TEST_SAMPLES']= str(max_test_samples)
  
  !python '/content/gdrive/MyDrive/ColabNotebooks/Visualization/run_glue3.py' \
    --model_name_or_path $MODEL \
    --max_val_samples $VAL_SAMPLES \
    --max_test_samples $TEST_SAMPLES \
    --max_train_samples $TRAIN_SAMPLES \
    --tokenizer_name $MODEL \
    --do_train \
    --do_eval \
    --do_predict \
    --max_seq_length $SEQ_LENGTH \
    --per_device_train_batch_size $BATCH_SIZE \
    --per_device_eval_batch_size=$BATCH_SIZE \
    --learning_rate $LR \
    --num_train_epochs $TRAIN_EPOCHS \
    --output_dir $OUTPUT_DIR \
    --overwrite_output_dir \
    --logging_steps $LS \
    --pad_to_max_length \
    --run_name $RUNNAME \
    --report_to $REPORT_TO \
    --train_file $TRAINING_FILE \
    --validation_file $VALID_FILE \
    --test_file $TEST_FILE \
    --save_steps $SAVE_STEPS \
    --save_total_limit $SAVE_LIMIT \
    --fp16 \
    --optim adafactor --lr_scheduler_type cosine \
    --warmup_ratio 0.1 \
    --skip_memory_metrics
else:
  !python '/content/gdrive/MyDrive/ColabNotebooks/Visualization/run_glue3.py' \
    --model_name_or_path $MODEL \
    --tokenizer_name $MODEL \
    --do_train \
    --do_eval \
    --do_predict \
    --max_seq_length $SEQ_LENGTH \
    --per_device_train_batch_size $BATCH_SIZE \
    --per_device_eval_batch_size=$BATCH_SIZE \
    --learning_rate $LR \
    --num_train_epochs $TRAIN_EPOCHS \
    --output_dir $OUTPUT_DIR \
    --overwrite_output_dir \
    --logging_steps $LS \
    --run_name $RUNNAME \
    --report_to $REPORT_TO \
    --train_file $TRAINING_FILE \
    --validation_file $VALID_FILE \
    --test_file $TEST_FILE \
    --save_steps $SAVE_STEPS \
    --save_total_limit $SAVE_LIMIT \
    --fp16 \
    --optim adafactor --lr_scheduler_type cosine \
    --warmup_ratio 0.1 \
    --skip_memory_metrics
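
The two invocations above differ only in the `--max_*_samples` options and in `--pad_to_max_length`. As a minimal sketch of how the shared options could be assembled once, the command line can be built as a Python list. The function `build_train_args` is hypothetical, and only a few of the options are shown.

```python
import shlex

def build_train_args(script, params, flags, sample_limits=None):
    """Assemble the command line for the training script as a list of strings.

    `params` maps option names to values, `flags` is a list of value-less
    switches, and `sample_limits` (quick tests only) adds the
    --max_<split>_samples options.
    """
    cmd = ["python", script]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    for flag in flags:
        cmd.append(f"--{flag}")
    if sample_limits:
        for split, limit in sample_limits.items():
            cmd += [f"--max_{split}_samples", str(limit)]
    return cmd

cmd = build_train_args(
    "run_glue3.py",
    {"model_name_or_path": "roberta-large", "learning_rate": 2e-5, "num_train_epochs": 3},
    ["do_train", "do_eval", "do_predict", "fp16"],
    sample_limits={"train": 30000, "val": 3000, "test": 3000},
)
print(shlex.join(cmd))  # shell-quoted form, convenient for logging
```

A list built this way can be passed to `subprocess.run(cmd, check=True)`, which also avoids shell-quoting problems with file paths that contain spaces.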


# Testing of fine-tuned models
Retrieve model from W&B repository

In [None]:
# run = wandb.init()
run= wandb.init(project=WandB_Project, entity='jb-diplom')
# Take this from just finished run or from one of your 
# favorite fine-tuned models at https://wandb.ai/jb-diplom/janice-full/artifacts

#@markdown Enter Model Type and Name to be retrieved from https://wandb.ai
Data_Type = "Policy (no EU)" #@param ["Humour Type", "Humour Degree", "H-Degree (binary)", "Policy Type", "Emotion2", "EU (binary)", "Policy (no EU)"]
use_latest_model = False #@param {type:"boolean"}
if use_latest_model:
  model_id=own_modelid
else:
  model_id = 'policy-noeu-roberta-large5e_all2023-01-05_09.41.04'  #@param {type: "string"}
  own_modelid = model_id

# Set human-readable labels
policy_labels=[]
label_map={}  # stays empty for data types without an explicit mapping below
leng = 1  # 1 for humour stuff, 2 for policy

if Data_Type == "Policy Type" :
  leng = 2  # 1 for humour stuff, 2 for policy
  policy_labels=['10','11','20','30','40','41','50','60','70']
  label_map= {
      "LABEL_0": 'Ext. Rel.',
      "LABEL_1": 'EU',
      "LABEL_2": 'Democracy',
      "LABEL_3": 'Political System',
      "LABEL_4": 'Economy',
      "LABEL_5": 'Growth',
      "LABEL_6": 'Welfare',
      "LABEL_7": 'Society',
      "LABEL_8": 'Social Grps',
      }
elif Data_Type == "Policy (no EU)" :
  leng = 2  # 1 for humour stuff, 2 for policy
  policy_labels=['10','20','30','40','41','50','60','70']
  label_map= {
      "LABEL_0": 'Ext. Rel.',
      "LABEL_1": 'Democracy',
      "LABEL_2": 'Political System',
      "LABEL_3": 'Economy',
      "LABEL_4": 'Growth',
      "LABEL_5": 'Welfare',
      "LABEL_6": 'Society',
      "LABEL_7": 'Social Grps',
      }
elif Data_Type == "Emotion2" :
  policy_labels=['0','1','2','3','4','5','6']
  label_map= {
      "LABEL_0": "anger",
      "LABEL_1": "disgust",
      "LABEL_2": "fear",
      "LABEL_3": "joy",
      "LABEL_4": "neutral",
      "LABEL_5": "sadness",
      "LABEL_6": "surprise"
      }
elif Data_Type == "Humour Type" :
  policy_labels=['0','1','2','3','4','5','6','7','8']
  label_map= {
      "LABEL_0": 'serious',
      "LABEL_1": 'fun',
      "LABEL_2": 'benevolent',
      "LABEL_3": 'wit',
      "LABEL_4": 'nonsense',
      "LABEL_5": 'irony',
      "LABEL_6": 'satire',
      "LABEL_7": 'sarcasm',
      "LABEL_8": 'cynicism'
      }
elif Data_Type == "EU (binary)" :
  policy_labels=['0','1']
  label_map= {
      "LABEL_1": 'EU',
      "LABEL_0": 'non-EU'
      }
else :
  policy_labels=['0','1','2','3','4','5','6','7','8']

names=['serious','fun','benevolent','wit','nonsense','irony','satire','sarcasm','cynicism']
degree_names=['serious','wry smile','smile','grin','very funny','hilarious']
policy_names=['Ext. Rel.','EU','Democracy','Political System','Economy','Growth','Welfare','Society','Social Grps']
emotion_names=['anger','disgust','fear','joy','neutral','sadness','surprise']
binary_degree_names=['serious','funny']
name_lst=list(label_map.values())  # generalized

model_root= 'jb-diplom/' + WandB_Project + '/model-'
model_path= model_root + model_id + ':v0'
print("Retrieving artifact:", model_path)
artifact = run.use_artifact(model_path, type='model')
# artifact = run.use_artifact('jb-diplom/janice-full/model-219xio3e:v0', type='model')
artifact_dir = artifact.download()
print("Model saved locally to:", artifact_dir)
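
The `label_map` dictionaries above translate the generic `LABEL_n` identifiers into readable category names. As a minimal sketch, assuming the classifier returns a list of dictionaries with `label` and `score` keys (the format the loop further below reads via `cls_val[0]['label']`), the translation could be applied as follows. The helper `readable_prediction` and the example score are hypothetical.

```python
def readable_prediction(pipeline_output, label_map):
    """Translate the raw 'LABEL_n' of the top prediction into a readable name."""
    top = pipeline_output[0]
    raw = top["label"]
    # fall back to the raw label if no readable name is defined for it
    return {"label": label_map.get(raw, raw), "score": top["score"]}

example_map = {"LABEL_0": "Ext. Rel.", "LABEL_1": "Democracy"}
example_output = [{"label": "LABEL_1", "score": 0.93}]  # made-up score for illustration
print(readable_prediction(example_output, example_map))
```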

In [None]:
# Load fine-tuned model
from transformers import GPT2TokenizerFast,AutoTokenizer,GPT2ForSequenceClassification
from transformers import AutoModelForSequenceClassification, TextClassificationPipeline

model_path='/content/artifacts/model-' + model_id + ':v0' # The model just downloaded from W&B (wandb.ai)

humour_classif = ''

model = AutoModelForSequenceClassification.from_pretrained(model_path)

tokenizer = AutoTokenizer.from_pretrained(model_path)
humour_classif = TextClassificationPipeline(
    model=model,
    tokenizer=tokenizer,
    framework="pt",
    device=0,
    task='sentiment-analysis',
    # top_k=None
    # return_all_scores = True,
)

multi_humour_classif = TextClassificationPipeline(
    model=model,
    tokenizer=tokenizer,
    framework="pt",
    device=0,
    task='sentiment-analysis',
#    return_all_scores = True,
    # top_k=None
)
  # humour_classif = pipeline('sentiment-analysis',model_path)

# This text classification pipeline can currently be loaded from pipeline() using the following task 
# identifier: "sentiment-analysis"
#@markdown ***Create Pipeline***<br>
#@markdown Enter a test string to check whether the downloaded model is working
test_string = 'i love the european commission'  #@param {type: "string"}

multi_humour_classif(test_string)

In [None]:
#@markdown Load file (specified for testing above) for manual testing
# Take test.tsv and compare expected with actual results
test_df = pd.read_csv(test_file.value, delimiter='\t', header=None, 
                        lineterminator='\n',encoding='utf-8')
cols=['Text','Humour Level']
test_df.columns=cols
# test_df.describe()
data_table.DataTable(test_df, include_index=False, num_rows_per_page=10)

In [None]:
# Do a little test against a chosen test-dataset
#@markdown #### **Attention:** This Operation may take some hours, depending on the quantity of test data and model inference speed!
#@markdown ----
#@markdown ###How many records would you like to test?
#@markdown An entry of -1 implies processing of **ALL** records
#@markdown Which type of model is being analysed?
Data_Type = "Policy (no EU)" #@param ["Humour Type", "Humour Degree", "H-Degree (binary)", "Policy Type", "Emotion2", "EU (binary)", "Policy (no EU)"]
sample_nr =   10000#@param {type: "number"}

# tweet_df.iloc[1:5, 1:1]
content=test_df.iloc[:,0]
labels=test_df.iloc[:,1]  # NB: 2 digit numbers for collapsed policy types
humour=[]
y_true=[]
y_score=[]

# establish humour content of tweet from fine-tuned model
hit=0
miss=0
out_by_one=0

# names=['serious','fun','benevolent','wit','nonsense','irony','satire','sarcasm','cynicism']
# degree_names=['serious','wry smile','smile','grin','very funny','hilarious']
# policy_names=['Ext. Rel.','EU','Democracy','Political System','Economy','Growth','Welfare','Society','Social Grps']
# emotion_names=['anger','disgust','fear','joy','neutral','sadness','surprise']

# -1 means "all records"; head(-1) would silently drop the last row
subset = content if sample_nr == -1 else content.head(sample_nr)
for i, tweet in tqdm(enumerate(subset), total=len(subset)):
  # clip to max_seq_length, extract the number from the label of the result 
  cls_val=humour_classif('{:1.512}'.format(tweet))
  val=cls_val[0]['label'][6:7:1]
  try:
    if (int(policy_labels[int(val)]) == int(labels[i][0:leng:1])):
      y_true.append(1)
      hit +=1
    else:
      miss +=1
      y_true.append(0)
      if abs(int(policy_labels[int(val)]) - int(labels[i][0:leng:1])) <3:
        out_by_one += 1
    humour.append(policy_labels[int(val)])
    y_score.append(cls_val[0]['score'])
  except Exception:  # skip malformed rows, but let KeyboardInterrupt through
    print (".")
    continue

print ("Hits:", hit,"\nMisses:",miss,"\nOut by one:", out_by_one, "\n%-age of hits:", (hit*100)/(hit+miss))
# '{:1.35}'.format('12345678901234567890') use to truncate string
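
The loop above extracts the class index with the slice `[6:7:1]`, which reads exactly one character and therefore covers at most ten classes. As a minimal sketch, assuming labels always have the form `LABEL_<n>`, a parser that also handles multi-digit indices could look like this. `label_index` is a hypothetical helper name.

```python
import re

def label_index(raw_label):
    """Extract the integer class index from a label such as 'LABEL_12'."""
    match = re.fullmatch(r"LABEL_(\d+)", raw_label)
    if match is None:
        raise ValueError(f"unexpected label format: {raw_label!r}")
    return int(match.group(1))
```

Raising an error on an unexpected format makes a misconfigured model (for example one with custom label names) visible immediately instead of producing silently wrong indices.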


In [None]:
#@title Analyse Results of Training

#@markdown Which type of model is being analysed?
#Data_Type = "Policy Type" #@param ["Humour Type", "Humour Degree", "H-Degree (binary)", "Policy Type"]

#@markdown To assess the statistical success of the fine-tuning, 3 analyses are conducted using the test data-set:
#@markdown  **TODO** add option for humour degree (0 to 5) or type (0 to 8)

#@markdown * **Confusion Matrix**<br>
#@markdown  A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known.
#@markdown Here the classifiers are either humour type or degree of humour. The correctly predicted results (as compared against the expected results in the test data) are
#@markdown those on the diagonal. The numbers within the matrix represent the following four cases
#@markdown * true positives (TP): These are cases in which the trained NLM predicted the correct value (of type or degree)
#@markdown * true negatives (TN): These are cases in which the trained NLM predicted correctly that the value (of type or degree) is not fitting for the test text
#@markdown * false positives (FP): These are cases in which the trained NLM predicted incorrectly that the value (of type or degree) would match for the test text
#@markdown * false negatives (FN): These are cases in which the trained NLM predicted incorrectly that the value (of type or degree) is not fitting for the test text
#@markdown To help with interpretation of the results, the first matrix is additionally normalized over the predicted labels (`normalize='pred'`), so that each column sums to 1
#@markdown <br>For more details consult 
#@markdown [scikit Confusion Matrix](https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html)

#@markdown * **ROC Curve**<br>
#@markdown  This is a commonly used graph that summarizes the performance of a classifier over all possible thresholds. It is generated by plotting the True Positive Rate (y-axis) against the False Positive Rate (x-axis) as you vary the threshold for assigning observations to a given class 
#@markdown <br>For more details consult 
#@markdown [scikit ROC](https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html)

#@markdown * **Precision Recall**<br>
#@markdown  Precision-Recall is a useful measure of success of prediction when the classes are very imbalanced. 
#@markdown In information retrieval, precision is a measure of result relevancy, while recall is a measure of how many truly relevant results are returned.
#@markdown The precision-recall curve shows the tradeoff between precision and recall for different thresholds. 
#@markdown A high area under the curve represents both high recall and high precision, where high precision relates to a low false 
#@markdown positive rate, and high recall relates to a low false negative rate. High scores for both show that the classifier is returning accurate results 
#@markdown (high precision), as well as returning a majority of all positive results (high recall).<br>
#@markdown Average precision (**AP**) summarizes such a plot as the weighted mean of precisions achieved at each threshold, 
#@markdown with the increase in recall from the previous threshold used as the weight. <br>For more details consult 
#@markdown [scikit Precision Recall](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html)

clean_h=[]
for val in humour:
  clean_h.append(val.replace('\r',""))
clean_p=[]
pred=test_df['Humour Level'].tolist()[1:len(clean_h)+1:1]
for val in pred:
  clean_p.append(val.replace(' ',"").replace('\r',""))

fig = plt.figure()
cm_n =''
cm = ''
if Data_Type == "Humour Type":
  cm_n=confusion_matrix(clean_p,clean_h, labels=['0','1','2','3','4','5','6','7','8'], normalize= 'pred')
  cm=confusion_matrix(clean_p,clean_h, labels=['0','1','2','3','4','5','6','7','8'])
  cm_n_display = ConfusionMatrixDisplay(cm_n, display_labels=names)
  cm_display = ConfusionMatrixDisplay(cm, display_labels=names)
elif Data_Type == "Humour Degree" :
  cm_n=confusion_matrix(clean_p,clean_h, labels=['0','1','2','3','4','5'], normalize= 'pred')
  cm=confusion_matrix(clean_p,clean_h, labels=['0','1','2','3','4','5'])
  cm_n_display = ConfusionMatrixDisplay(cm_n, display_labels=degree_names)
  cm_display = ConfusionMatrixDisplay(cm, display_labels=degree_names)
elif Data_Type == "Policy Type" :
  cm_n=confusion_matrix(clean_p,clean_h, labels=['10','11','20','30','40','41','50','60','70'], normalize= 'pred')
  cm=confusion_matrix(clean_p,clean_h, labels=['10','11','20','30','40','41','50','60','70'])
  cm_n_display = ConfusionMatrixDisplay(cm_n, display_labels=name_lst)
  cm_display = ConfusionMatrixDisplay(cm, display_labels=name_lst)
elif Data_Type == "Policy (no EU)" :
  cm_n=confusion_matrix(clean_p,clean_h, labels=['10','20','30','40','41','50','60','70'], normalize= 'pred')
  cm=confusion_matrix(clean_p,clean_h, labels=['10','20','30','40','41','50','60','70'])
  cm_n_display = ConfusionMatrixDisplay(cm_n, display_labels=name_lst)
  cm_display = ConfusionMatrixDisplay(cm, display_labels=name_lst)
elif Data_Type == "Emotion2" :
  cm_n=confusion_matrix(clean_p,clean_h, labels=['0','1','2','3','4','5','6'], normalize= 'pred')
  cm=confusion_matrix(clean_p,clean_h, labels=['0','1','2','3','4','5','6'])
  cm_n_display = ConfusionMatrixDisplay(cm_n, display_labels=name_lst)
  cm_display = ConfusionMatrixDisplay(cm, display_labels=name_lst)
else :
  cm_n=confusion_matrix(clean_p,clean_h, labels=['0','1'], normalize= 'pred')
  cm=confusion_matrix(clean_p,clean_h, labels=['0','1'])
  cm_n_display = ConfusionMatrixDisplay(cm_n, display_labels=binary_degree_names)
  cm_display = ConfusionMatrixDisplay(cm, display_labels=binary_degree_names)

fpr, tpr,  _ = roc_curve(y_true, y_score, pos_label=1)
roc_auc = metrics.auc(fpr, tpr)
# fig2, ax2 = plt.subplots(figsize=(10,10))
roc_display = RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc, estimator_name="")

avg_precision = average_precision_score(y_true, y_score)
prec, recall, _ = precision_recall_curve(y_true, y_score, pos_label=1)
pr_display = PrecisionRecallDisplay(precision=prec, recall=recall,average_precision=avg_precision,estimator_name="" )

# Try a multi plot :-)
# fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1, figsize=(20, 20))
fig, axs = plt.subplots(2, 2, figsize=(20, 20))

cm_n_display.plot(ax=axs[0,0],cmap='magma')
axs[0,0].set_title('Normalized Confusion Matrix')
cm_display.plot(ax=axs[0,1],cmap='magma')
axs[0,1].set_title('Raw Confusion Matrix')
roc_display.plot(ax=axs[1,0])
axs[1,0].set_title('ROC Plot')
pr_display.plot(ax=axs[1,1])
axs[1,1].set_title('Precision Recall Plot')
# plt.show()

# create image to save to wandb
plotname= "conf_matrix-" + own_modelid + ".png"
plot_dir = "/content/gdrive/MyDrive/ColabNotebooks/Visualization/plots/"
fig.savefig(plot_dir + plotname)

# TODO : extend this to multi-class precision recall. See example here
# https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html#sphx-glr-auto-examples-model-selection-plot-precision-recall-py
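
For a multi-class matrix, the four cases described above (TP, TN, FP, FN) are defined per class, treating that class as "positive" and all others as "negative". As a minimal sketch, assuming the scikit-learn layout (rows are true classes, columns are predicted classes), the counts for one class can be read off as follows. The function name and the example matrix are made up for illustration.

```python
def one_vs_rest_counts(cm, k):
    """Return (TP, FP, FN, TN) for class index k of a square confusion matrix.

    Rows are true classes and columns are predicted classes, the layout used
    by sklearn.metrics.confusion_matrix.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # true class k, predicted as something else
    fp = sum(row[k] for row in cm) - tp  # predicted k, true class is something else
    tn = total - tp - fn - fp
    return tp, fp, fn, tn

# illustrative 3-class matrix, not taken from any run of this notebook
example_cm = [[5, 1, 0],
              [2, 3, 1],
              [0, 2, 6]]
print(one_vs_rest_counts(example_cm, 1))
```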


In [None]:
#@title Calculate Metrics for Precision, Recall and F1

#@markdown Compute precision, recall, F-measure and support for each class <br>The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives
#@markdown * The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.
#@markdown * The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. <br>The recall is intuitively the ability of the classifier to find all the positive samples.
#@markdown * The F-beta score can be interpreted as a weighted harmonic mean of the precision and recall, where an F-beta score reaches its best value at 1 and worst score at 0.
#@markdown * The F-beta score weights recall more than precision by a factor of beta. beta == 1.0 means recall and precision are equally important.

#@markdown **NB:** The metrics and image of the Confusion Matrix, ROC plot and Precision/Recall plot are all archived to WandB
# print( "F1 none:",f1_score(clean_h,clean_p, average=None))
from sklearn.metrics import precision_recall_fscore_support, matthews_corrcoef
# clean_p holds the expected labels from the test file, clean_h the model predictions;
# scikit-learn expects the arguments in the order (y_true, y_pred)
print( "Prec, Recall, F-Score (macro):",precision_recall_fscore_support(clean_p, clean_h, average='macro'))
print( "Prec, Recall, F-Score (micro):",precision_recall_fscore_support(clean_p, clean_h, average='micro'))
prf_weighted = precision_recall_fscore_support(clean_p, clean_h, average='weighted')
print( "Prec, Recall, F-Score (weighted):",prf_weighted)
print( "Prec, Recall, F-Score (per category):",precision_recall_fscore_support(clean_p, clean_h, average=None))

mcc = matthews_corrcoef(clean_p, clean_h)
print("Matthews Coeff:",mcc)
#@markdown ---
#@markdown If required, specify that plots and metrics be saved together with the model at [W & B](https://wandb.ai)

save_to_WandB = True #@param {type:"boolean"}

if save_to_WandB:
  # relay metrics to WandB
  api = wandb.Api()
  # run = api.run("jb-diplom/"+ WandB_Project + "/" + own_modelid)

  run_id="jb-diplom/"+ WandB_Project + "/runs/" + own_modelid
  run = api.run(run_id)

  run.summary["precision"] = prf_weighted[0]
  run.summary["recall"] = prf_weighted[1]
  run.summary["MCC"] = mcc
  run.summary["F1"] = prf_weighted[2]
  run.summary.update()

  # Send saved image of plots to WandB
  # run_id="jb-diplom/"+ WandB_Project + "/" + own_modelid
  wandb.init(project=WandB_Project, entity='jb-diplom', id=wandb.Api().run(run_id).id, resume='allow')
  im = plt.imread(plot_dir + plotname) # from previous cell
  wandb.log({"img": [wandb.Image(im, caption=plotname)]})
  wandb.finish()
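
The metric definitions above can be checked by hand on a tiny binary example. The following is a minimal sketch in plain Python rather than scikit-learn, so that the arithmetic stays visible; `y_true` and `y_pred` are made-up labels (not taken from the corpus) and `f_beta` is a hypothetical helper name.

```python
from math import sqrt

# Hypothetical gold labels and predictions (1 = humorous, 0 = serious); not real data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)


def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta > 1 favours recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Matthews correlation coefficient for the binary case
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"precision={precision:.3f} recall={recall:.3f}")
print(f"F1={f_beta(precision, recall):.3f} F2={f_beta(precision, recall, beta=2):.3f}")
print(f"MCC={mcc:.3f}")
```

Because recall is higher than precision in this toy example, the F2 score comes out above the F1 score, which illustrates how beta shifts the weighting towards recall.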


In [None]:
# Read in tweets, do humour evaluation and write results back to df and save new csv
#@title Optionally Enter Alternative File Name for Conducting Test{ vertical-output: true, form-width: "50%", display-mode: "form" } (Prototype to replace the equivalent 2 cells down below)

#@markdown #### Specify Source for inference (skip this if you wish to continue with the results obtained from the previous step). You should choose either:
#@markdown - From the raw data directory, a set of tweets from one of various groupings
#@markdown 
#@markdown or
#@markdown - From the results directory, a set of tweets to which results from a previous inference run have already been added
#@markdown 
#@markdown **Tip**: reload the cell to repopulate the selection boxes

Options = "eu_influencer_tweets" #@param ["comedian-tweets250", "journalist-tweets", "ukmp_tweets", "eu_committee_tweets", "eu_influencer_tweets"] {allow-input: true}
basedir = glob.glob("/content/gdrive/MyDrive/ColabNotebooks/Visualization/phd_data/*/")
dir_choice2 = widgets.Dropdown(options=dir_list,value=dir_list[0])
file_list.insert(0,"")
file_names = widgets.Dropdown(options=file_list,value="")
items = [widgets.Label(value="Directory"),
         widgets.Label(value= "Source")]
def updateSourcelist(b):
    file_names.options=glob.glob(dir_choice2.value + Options + "*" + file_ext)

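# Widths are passed through Layout; a bare width= keyword is not a VBox/HBox trait and is ignored
left_box = widgets.VBox([items[0], items[1]], layout=widgets.Layout(width='10%'))
right_box = widgets.VBox([dir_choice2,file_names], layout=widgets.Layout(width='80%'))
file_pickers2=widgets.HBox([left_box, right_box], layout=widgets.Layout(width='100%'))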
right_box.overflow_x = 'auto'

dir_choice2.observe(updateSourcelist, names='value')
display(file_pickers2)
updateSourcelist(None)


In [None]:
#@title Read and Display Sample of Loaded Data for Analysis

tweet_df = pd.read_csv(file_names.value, delimiter='\t',encoding='utf-16')
# tweet_df.pop(tweet_df.columns[0])
# tweet_df.pop(tweet_df.columns[3])
data_table.DataTable(tweet_df, include_index=False, num_rows_per_page=10)

In [None]:
use_all_samples = False #@param {type:"boolean"}
sample_nr = 1000 #@param {type:"slider", min:100, max:250000, step:100}
output_max= sample_nr

if use_all_samples:
  # head(-1) would return all rows except the last, so use the full length instead
  sample_nr = len(tweet_df)
  output_max = sample_nr



In [None]:
#@title Evaluate in Pipeline and Save (sample nr. or all) Results to GDrive
#@markdown The results are added to the test (input) data as a column named after the current trait (e.g. Humour, Policy, Emotion, EU) and are saved, with the model name as suffix, as a `.tsv` file in the adjacent 'results' directory

content=tweet_df.iloc[:,1]  # grab the content column (it should be the 2nd)
trait=[]
score=[]

for tweet in tqdm(content.head(sample_nr)):
  res = humour_classif(tweet)[0]
  trait.append(res['label'])
  score.append(res['score'])

hlen=len(trait)
for i in range (hlen, len(tweet_df)):
  trait.append("not evaluated")
  score.append(0)

import re
Data_Type_name=re.sub(r'\W+', '', Data_Type) # remove any non-alphanumerics

# add trait column to dataframe
tweet_df[Data_Type_name]=trait
tweet_df[Data_Type_name+'_score']=score
for feature in label_map.keys():
  tweet_df[Data_Type_name] = tweet_df[Data_Type_name].replace(feature,label_map.get(feature),regex=True)

# save results
tweet_file_out=dir_choice2.value + '../results/' + Options +'_'+ Data_Type_name +'_' + model_id + ".tsv"
print ("Saving to:", tweet_file_out)

# tweet_df.to_csv(tweet_file_out, sep='\t', index=False, lineterminator='\n',encoding='utf-16')
tweet_df.to_csv(tweet_file_out, sep='\t', index=False,encoding='utf-16')

data_table.DataTable(tweet_df.head(min(sample_nr,20000)), include_index=False, num_rows_per_page=10)
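
The evaluation loop in the cell above can be tried out without a GPU or a fine-tuned model by substituting a stand-in for `humour_classif`. The sketch below is an illustration under assumptions: `evaluate_sample` and `fake_classifier` are hypothetical names, and the two-entry `label_map` is invented for the example. It uses an exact dictionary lookup for the label names instead of the regex replacement used in the cell.

```python
def evaluate_sample(texts, classifier, label_map, sample_nr):
    """Classify the first `sample_nr` texts and pad the remainder as 'not evaluated'."""
    traits, scores = [], []
    for text in texts[:sample_nr]:
        res = classifier(text)[0]  # a pipeline call returns a list with one dict per input
        # exact-match lookup; unknown labels are passed through unchanged
        traits.append(label_map.get(res['label'], res['label']))
        scores.append(res['score'])
    missing = len(texts) - len(traits)
    traits += ["not evaluated"] * missing
    scores += [0] * missing
    return traits, scores


def fake_classifier(text):
    """Stand-in for humour_classif (hypothetical rule: '!' means humorous)."""
    label = 'LABEL_1' if '!' in text else 'LABEL_0'
    return [{'label': label, 'score': 0.9}]


label_map = {'LABEL_0': 'Serious', 'LABEL_1': 'Fun'}  # assumed mapping, for illustration only
texts = ["Budget approved.", "Knock knock!", "Rates unchanged.", "Another joke!"]
traits, scores = evaluate_sample(texts, fake_classifier, label_map, sample_nr=2)
print(traits)
print(scores)
```

Both returned lists always have the same length as the input, so they can be assigned directly as new DataFrame columns.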

In [None]:
# Read in tweets, do humour evaluation and write results back to df and save new csv
#@title (Legacy) Optionally Enter Alternative File Name for Conducting Test{ vertical-output: true, form-width: "50%", display-mode: "form" }

#@markdown #### Specify Source for Humour Tests (skip this if you wish to continue with the results obtained from the previous step)
#@markdown ---

# basedir="/content/gdrive/MyDrive/ColabNotebooks/Visualization/phd_data/raw_tweets/"#@param {type:"string"}
basedir = "/content/gdrive/MyDrive/ColabNotebooks/Visualization/phd_data/raw_tweets/" #@param ["/content/gdrive/MyDrive/ColabNotebooks/Visualization/phd_data/raw_tweets/"] {allow-input: true}
file_name = "eu_influencer_tweets" #@param ["comedian-tweets250", "journalist-tweets", "ukmp_tweets", "eu_committee_tweets", "eu_influencer_tweets"] {allow-input: true}
ext = ".tsv" #@param [".tsv", ".csv"]
tweet_file=basedir + file_name + ext
print (tweet_file)
dtyps={'tweetId':str, 'content':str, 'username':str,'followers':int, 'conversationId':str, 
         'replyCount':int, 'retweetCount':int, 'likeCount':int, 'quoteCount':int}

# tweet_df = pd.read_csv(tweet_file, delimiter='\t', header=None, dtype=dtyps,lineterminator='\n',encoding='utf-16')
# tweet_df = pd.read_csv(tweet_file, delimiter='\t', header=None, dtype=dtyps,encoding='utf-16')
# cols=['tweetId', 'content', 'username','followers', 'conversationId', 'replyCount', 'retweetCount', 'likeCount', 'quoteCount']

tweet_df = pd.read_csv(tweet_file, delimiter='\t',encoding='utf-16')
# tweet_df.pop(tweet_df.columns[0])
# tweet_df.pop(tweet_df.columns[3])

tweet_df.head(1000)

In [None]:
#@title (Legacy) Evaluate in Pipeline and Save (sample nr. or all) Results to GDrive
#@markdown The results are added to the test (input) data as a column named after the current trait (e.g. Humour, Policy, Emotion, EU) and are saved, with the model name as suffix, as a `.tsv` file in the adjacent 'results' directory

content=tweet_df.iloc[:,1]  # grab the content column (it should be the 2nd)
trait=[]
score=[]

for tweet in tqdm(content.head(sample_nr)):
  res = humour_classif(tweet)[0]
  trait.append(res['label'])
  score.append(res['score'])

hlen=len(trait)
for i in range (hlen, len(tweet_df)):
  trait.append("not evaluated")
  score.append(0)

# add trait column to dataframe
tweet_df[Data_Type]=trait
tweet_df[Data_Type+'_score']=score
for feature in label_map.keys():
  tweet_df[Data_Type] = tweet_df[Data_Type].replace(feature,label_map.get(feature),regex=True)

# save results
tweet_file_out=basedir + '../results/' + file_name +'_' + model_id + ".tsv"
print ("Saving to:", tweet_file_out)

# tweet_df.to_csv(tweet_file_out, sep='\t', index=False, lineterminator='\n',encoding='utf-16')
tweet_df.to_csv(tweet_file_out, sep='\t', index=False,encoding='utf-16')

data_table.DataTable(tweet_df.head(min(sample_nr,20000)), include_index=False, num_rows_per_page=10)

In [None]:
#@title Display distribution of data for each type of humour
vc=tweet_df[Data_Type_name].value_counts()
vc

In [None]:
#@title Display resulting data
#@markdown When very large numbers of tweets have been processed (> 20,000), a higher minimum number
#@markdown of likes can be set to filter out the less relevant data.<br><br>
#@markdown Choose the minimum number of likes to reduce the displayed and visualized data
tweet_df = tweet_df.dropna()
min_nr_of_likes = 900 #@param {type:"slider", min:100, max:50000, step:100}

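nr_likes_str = str(min_nr_of_likes)
print(nr_likes_str)
# Backticks mark a column name in DataFrame.query; single quotes would turn the name into a
# string literal, making the first condition always true
query_str = "`" + Data_Type_name + "` != 'not evaluated' & likeCount > " + nr_likes_str
print (query_str)
subdf=tweet_df.query(query_str)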
data_table.DataTable(subdf.head(20000), include_index=False, num_rows_per_page=10)

In [None]:
#@title Choose Sample of Data for Visualization

# limit of absolute_max=20000 due to colab DataTable

import warnings
absolute_max=20000

display_labels=''

if Data_Type == "Humour Type":
  display_labels=names
elif Data_Type == "Humour Degree" :
  display_labels=degree_names
elif Data_Type == "Emotion2" :
  display_labels=emotion_names
else :
  display_labels=binary_degree_names

as_many_as_possible = True #@param {type:"boolean"}
sample = 1300 #@param {type:"slider", min:100, max:20000, step:100}
op_max= sample

if as_many_as_possible:
  op_max = 20000

likes_sorted=np.sort(tweet_df['likeCount'])
num_tweets=len(tweet_df)
divisor = min (absolute_max, op_max)

last_like=0
if divisor < num_tweets:  # more tweets than requested: raise the like threshold
  chunk_lst=np.array_split(likes_sorted,int(num_tweets/divisor))
  last_like=int(chunk_lst[-1][0])

tweet_df = tweet_df.dropna()
subdf=tweet_df.loc[(tweet_df[Data_Type_name] !='not evaluated') & (tweet_df['likeCount'] > last_like)]

# add line breaks to make tooltips readable
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    subdf.content = subdf.content.str.wrap(80)
    subdf.content = subdf.content.apply(lambda x: x.replace('\n', '<br>'))

fig1 = px.scatter(subdf, x="retweetCount", y="username", size="likeCount",
                 color=Data_Type_name, hover_name="content", facet_col=Data_Type_name, size_max=60, log_x=True, 
                 category_orders = {Data_Type_name: display_labels},
                 color_discrete_sequence=color_palette, height=1500,opacity=0.5, facet_col_wrap=3)
fig1.update_layout(
    plot_bgcolor=plot_bgcolor,
    title="Tweet Distribution in usage of "+Data_Type+ " amongst leading Twitterers",
    xaxis_title="Retweets (log-scale)",
    yaxis_title="Twitter Handles",
  )
#fig1.for_each_annotation(lambda a: a.update(text=display_labels[int(a.text.split("=")[-1][6:7])]))

fig1.show()
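
The cell above restricts the plot to the most-liked tweets by deriving a like threshold from the sorted like counts. The following is a simplified sketch of the same idea, not the notebook's `np.array_split` chunking: `like_threshold` is a hypothetical helper and the like counts are invented.

```python
def like_threshold(like_counts, limit):
    """Return a like count such that at most `limit` tweets have strictly more likes."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if len(like_counts) <= limit:
        return 0  # everything fits; mirrors the default last_like=0 in the cell above
    ordered = sorted(like_counts, reverse=True)
    # the element at position `limit` is the first one that must be excluded by ">"
    return ordered[limit]


likes = [5, 120, 42, 42, 900, 7, 3000]  # invented like counts
threshold = like_threshold(likes, limit=3)
kept = [n for n in likes if n > threshold]
print("threshold:", threshold)
print("kept:", sorted(kept))
```

With a strict `>` comparison, ties at the threshold are dropped, so the number of tweets kept can be smaller than the limit but never larger.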

In [None]:
#@title Calculate Statistics 
#@markdown * For each type of Tweet propagation (reply, retweet, like, quote) 
#@markdown * And each trait type

agg_fns = [np.mean,np.std,np.sum, np.median, np.count_nonzero, min, max]
table = pd.pivot_table(tweet_df, values=['replyCount', 'retweetCount', 'likeCount', 'quoteCount'], index=[Data_Type_name],
                    aggfunc={'replyCount': agg_fns,
                             'retweetCount': agg_fns,
                             'likeCount': agg_fns,
                             'quoteCount': agg_fns,
                             })
table=table.loc[(table.index != "not evaluated")]
# table.head(10)
# print(table.columns)

# need to rotate 'replyCount', 'retweetCount', 'likeCount', 'quoteCount' into one column and put the averages in a new column
dic ={Data_Type_name:[], "propagation type":[],"mean value":[],"Sum":[],"SD":[],"Median":[],"Count":[],"Min":[],"Max":[]}
for htype in table.iterrows():
  # print ("Htype:",htype)
  for i in range (0,4) : dic[Data_Type_name].append(htype[0])
  dic["propagation type"].append('likes')
  dic["propagation type"].append('quotes')
  dic["propagation type"].append('replies')
  dic["propagation type"].append('retweets')
  for prop_type in ['likeCount','quoteCount','replyCount','retweetCount']:
    dic["mean value"].append(int(htype[1][prop_type,'mean']))
    dic["SD"].append(int(htype[1][prop_type,'std']))
    dic["Median"].append(int(htype[1][prop_type,'median']))
    dic["Sum"].append(int(htype[1][prop_type,'sum']))
    dic["Count"].append(int(htype[1][prop_type,'count_nonzero']))
    dic["Min"].append(int(htype[1][prop_type,'min']))
    dic["Max"].append(int(htype[1][prop_type,'max']))

df_avg=pd.DataFrame(dic)

df_avg.head(24)
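
The reshaping step above turns one row per trait into one row per (trait, propagation type) pair. The sketch below reproduces that long format with the standard library only, so that the target layout is easy to inspect; the rows are invented and the column order of the tuples is an assumption made for the example.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical rows: (trait, replyCount, retweetCount, likeCount, quoteCount); not real data.
rows = [
    ("Satire", 4, 10, 100, 1),
    ("Satire", 0, 30, 300, 3),
    ("Serious", 2, 5, 20, 0),
    ("not evaluated", 9, 9, 9, 9),
]
columns = {"replies": 1, "retweets": 2, "likes": 3, "quotes": 4}

# Group the evaluated rows by trait
grouped = defaultdict(list)
for row in rows:
    if row[0] != "not evaluated":
        grouped[row[0]].append(row)

# One output row per trait and propagation type
long_rows = []
for trait, members in sorted(grouped.items()):
    for prop_type, idx in columns.items():
        values = [m[idx] for m in members]
        long_rows.append({
            "trait": trait,
            "propagation type": prop_type,
            "mean value": int(mean(values)),
            "Median": int(median(values)),
            "Sum": sum(values),
            "Count": sum(v != 0 for v in values),  # number of non-zero entries
            "Min": min(values),
            "Max": max(values),
        })

for r in long_rows:
    print(r)
```

Each trait contributes four rows, one per propagation type, which is the shape a faceted bar chart with `x='propagation type'` expects.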

In [None]:
#@title Visualize average numbers of Tweet propagations per humour type
with_error_bars = False #@param {type:"boolean"}

from plotly.graph_objs import *

display_labels=''

if Data_Type == "Humour Type":
  display_labels=names
  bar_title = "Average Numbers of Tweet Propagations per Humour-Type"
elif Data_Type == "Humour Degree" :
  display_labels=degree_names
  bar_title = "Average Numbers of Tweet Propagations per Humour-Degree"
elif Data_Type == "Emotion2" :
  display_labels=emotion_names
  bar_title = "Average Numbers of Tweet Propagations per Emotion-Type"
else :
  display_labels=binary_degree_names
  bar_title = "Average Numbers of Tweet Propagations when Non-Humorous/Humorous"

ebars = 'SD' if with_error_bars else None

labels={Data_Type_name: "",'propagation type': ''}
fig = px.bar(df_avg, 
                   x='propagation type', 
                   y='mean value', 
                   facet_col=Data_Type_name,  # df_avg stores the trait in a column named after Data_Type_name
                  #  histfunc ='avg',
                   color_discrete_sequence=color_palette_r,
                   color='propagation type',
                  #  barmode='group',
                   category_orders = {Data_Type_name: display_labels,
                                      'propagation type': ['likes','retweets','replies','quotes']},
                   labels=labels, 
                   error_y = ebars
                   )
fig.update_xaxes(type='category')

fig.update_layout(
    plot_bgcolor=plot_bgcolor,
    title=bar_title,
    xaxis_title="",
    yaxis_title="Mean Number of Tweet Propagations",
    # legend_title="Legend Title"
  )

fig.show()

In [None]:
#@markdown ##Visualize Distribution of Tweets per Twitter-Handle
#@markdown Select/deselect humour types in the legend to analyse deeper

subdf2=tweet_df.query(Data_Type_name+"!='not evaluated'")
s = subdf2.groupby("username")["followers"].sum().rank(ascending=True)
# s = subdf2.groupby("username").size().reset_index().groupby(['replyCount', 'retweetCount', 'likeCount', 'quoteCount']).sum().rank(ascending=True)

fig = px.histogram(subdf2, 
                   x='username',
                   color=Data_Type_name,
                   color_discrete_sequence=color_palette_r,
                   category_orders = {Data_Type_name: display_labels,
                                      'propagation type': ['likes','retweets','replies','quotes'],
                                      "username":s[s < 100000].sort_values().index.to_list()},
                   orientation='v', height=800,
                   labels=display_labels
                      )
fig.update_layout(
    plot_bgcolor=plot_bgcolor,
    title="Numbers of Tweets amongst leading Twitterers",
    )
fig.show()

# New Section

# Utility Code

In [None]:
# Invoke to show what gpu is in use
gpu = !nvidia-smi
gpu = '\n'.join(gpu)
print(gpu)

In [None]:
import datasets
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))
    

In [None]:
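%env TRAINING_FILE
from datasets import load_dataset

# Name the splits explicitly; a plain list of files would be merged into a single "train" split.
# The result is bound to raw_datasets so that the imported `datasets` module is not shadowed.
raw_datasets = load_dataset("csv", delimiter='\t',
                            data_files={"train": train_file.value,
                                        "test": test_file.value,
                                        "validation": validation_file.value})

len(raw_datasets)
raw_datasets["train"].column_names 
raw_datasets["train"]
raw_datasets["train"][0]
show_random_elements(raw_datasets["train"],1)
train_file.value
import pandas as pd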

# Load the dataset into a pandas dataframe.
df = pd.read_csv(train_file.value, delimiter='\t', header=None, names=['sentence', 'label'])

# Report the number of sentences.
print('Number of training sentences: {:,}\n'.format(df.shape[0]))

# Display 10 random rows from the data.
df.sample(10)

In [None]:
%env
!python '/content/gdrive/MyDrive/ColabNotebooks/Visualization/run_glue2.py' --help
train_file.value

## Visualization of results in dashboard
Analyze results (as they happen) on the project dashboard https://wandb.ai/jb-diplom/janice-demo

### To retrieve models and their metadata from wandb

1.   Go to the artifacts area of wandb (e.g. `https://wandb.ai/jb-diplom/janice-full/artifacts`)
2.   Select the API tab, which gives the precise code (below) for downloading the model that you need.
3.   Check in the artifacts folder of Colab for the sub-folder (e.g. `model-12ai5jvy:0`) with the model.<br> Right-click it and take a copy of the path
4.   Use the path in the huggingface pipeline constructor, e.g.<br>
`model_path="/content/artifacts/model-15ai5jvy:v0"`<br>
`humour_classifier = pipeline('sentiment-analysis',model_path)`
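
The manual steps above could also be scripted. The following is a sketch under assumptions, not the procedure used for the thesis: it assumes the model was logged as an artifact named `model-<run id>`, that the artifact directory also contains the tokenizer files, and that `wandb` and `transformers` are installed. `artifact_path` and `load_classifier` are hypothetical helper names.

```python
def artifact_path(entity, project, run_id, version="v0"):
    """Qualified artifact name, assuming the model was logged as 'model-<run id>'."""
    return f"{entity}/{project}/model-{run_id}:{version}"


def load_classifier(entity, project, run_id, version="v0"):
    """Download the model artifact and wrap it in a huggingface pipeline."""
    # Imported lazily so that artifact_path can be used without these packages installed
    import wandb
    from transformers import pipeline

    artifact = wandb.Api().artifact(artifact_path(entity, project, run_id, version))
    model_dir = artifact.download()  # returns the local directory holding the files
    return pipeline('sentiment-analysis', model=model_dir)


# The run id below is a placeholder, not a real run
print(artifact_path("jb-diplom", "janice-full", "RUN_ID"))
```

Calling `load_classifier` requires network access and a W&B login, so only the name construction is executed here.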


In [None]:
!pip install jupyter-dash

import plotly.express as px
from jupyter_dash import JupyterDash
from dash import dcc, html  # Dash >= 2.0; replaces the deprecated dash_core_components / dash_html_components packages
from dash.dependencies import Input, Output
# Load Data
df = px.data.tips()
# Build App
app = JupyterDash(__name__)
app.layout = html.Div([
    html.H1("JupyterDash Demo"),
    dcc.Graph(id='graph'),
    html.Label([
        "colorscale",
        dcc.Dropdown(
            id='colorscale-dropdown', clearable=False,
            value='plasma', options=[
                {'label': c, 'value': c}
                for c in px.colors.named_colorscales()
            ])
    ]),
])
# Define callback to update graph
@app.callback(
    Output('graph', 'figure'),
    [Input("colorscale-dropdown", "value")]
)
def update_figure(colorscale):
    return px.scatter(
        df, x="total_bill", y="tip", color="size",
        color_continuous_scale=colorscale,
        render_mode="webgl", title="Tips"
    )
# Run app and display result inline in the notebook
app.run_server(mode='inline')

[humour-type dashboard](https://wandb.ai/jb-diplom/janice-final/reports/Dashboard-humour-type---Vmlldzo3MjE0NTI?accessToken=7pnh3o16evevab0a8abpt2th6ph0da4xg6575x2hoh3otkp8w1ch4sfbk2l54i0l)

In [None]:
# Display W & B Desktop for the run selected via own_modelid
# own_modelid='electra-L-htype_balanced20e2021-05-16_07.13.44'
run_id="jb-diplom/"+ WandB_Project + "/" + own_modelid
print(run_id)
# run=wandb.init(project=WandB_Project, entity='jb-diplom', id=wandb.Api().run(run_id).id, resume='allow')
run

In [None]:
#@title Some Test Code 
#@markdown This uses the ostensible batching API. But it doesn't seem (yet) to have any speed advantage
##

from datasets import Dataset
from time import time

sample_nr=1000  # in practice this and the other variables are already set above

text_col = content.name  # name of the text column taken from tweet_df above
dataset = Dataset.from_pandas(pd.DataFrame(content.head(sample_nr)))
class_names=list(label_map.keys())

start = time()
batch_size = 512 # larger batch size bc distilled model is more memory efficient
preds = []
for i in tqdm(range(0, len(dataset), batch_size)):
    examples = dataset[i:i+batch_size][text_col]
    outputs = humour_classif(examples)
    preds += [class_names.index(o['label']) for o in outputs]
# The tweet data carries no gold labels, so no accuracy can be computed; only the runtime is reported
print(f"Classified {len(preds)} tweets")
print(f"Runtime: {time() - start : 0.2f} seconds")
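
The batching pattern in the test code above can be isolated and timed independently of any model. The sketch below is illustrative only: `batched`, `classify_in_batches` and `fake_classifier` are hypothetical names, and the stand-in classifier merely imitates the list-of-dicts output of a huggingface pipeline.

```python
from time import perf_counter


def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def classify_in_batches(classifier, texts, batch_size=64):
    """Call `classifier` once per batch and collect the predicted labels in order."""
    labels = []
    for chunk in batched(texts, batch_size):
        labels.extend(out['label'] for out in classifier(chunk))
    return labels


def fake_classifier(texts):
    """Stand-in for humour_classif: one result dict per input text (invented rule)."""
    return [{'label': 'LABEL_1' if '?' in t else 'LABEL_0', 'score': 1.0} for t in texts]


texts = ["Why did the chicken cross the road?", "Parliament votes today."] * 5
start = perf_counter()
labels = classify_in_batches(fake_classifier, texts, batch_size=4)
print(f"{len(labels)} texts classified in {perf_counter() - start:0.4f} seconds")
```

Passing a list to the pipeline does not by itself guarantee that the model processes the inputs as one batch, which may explain why no speed advantage was observed; recent versions of transformers accept a `batch_size` argument on the pipeline call for that purpose.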