# HuggingFace Training Baseline

I wanted to create my own baseline for this competition, and I tried to do so "without peeking" at the kernels published by others. Ideally this can be used for training on a Kaggle kernel. Let's see how good we can get. 

This baseline is based on the following notebook by Sylvain Gugger: https://github.com/huggingface/notebooks/blob/master/examples/token_classification.ipynb

I initially started building with RoBERTa - thanks to Chris Deotte for pointing me to Longformer :) The evaluation code is from Rob Mulla.

The notebook takes a couple of hours to run, so we'll use W&B to monitor it along the way and keep a record of our experiments.

## Setup

In [1]:
SAMPLE = False # set True for debugging

In [2]:
!pip install seqeval -qq # evaluation metrics for training (not the competition metric)
!pip install --upgrade wandb -qq # experiment tracking

In [3]:
!cp -r ../input/nlpaug-from-github/nlpaug-master ./
!pip install nlpaug-master/
!rm -r nlpaug-master

import nlpaug.augmenter.char as nac
import nlpaug.augmenter.word as naw
import nlpaug.augmenter.word.context_word_embs as nawcwe
import nlpaug.augmenter.word.word_embs as nawwe
import nlpaug.augmenter.word.spelling as naws

Processing ./nlpaug-master
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: nlpaug
  Building wheel for nlpaug (setup.py) ... done
  Created wheel for nlpaug: filename=nlpaug-1.1.10-py3-none-any.whl size=406197 sha256=25e2f51101140d62656ce8ec8fcb813373afffb9a72d1276f15df6a3c3275030
  Stored in directory: /root/.cache/pip/wheels/43/64/85/ce1afc6a0b63f139f70ea6945d5deebcebed4a875cb186adc8
Successfully built nlpaug
Installing collected packages: nlpaug
Successfully installed nlpaug-1.1.10


In [4]:
# !conda install -y mpi4py 
# !pip -qq install deepspeed

In [5]:
# setup wandb for experiment tracking
# source: https://www.kaggle.com/debarshichanda/pytorch-w-b-jigsaw-starter

import wandb

try:
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    api_key = user_secrets.get_secret("wandb_api")
    wandb.login(key=api_key)
    wandb.init(project="feedback_prize", entity="darek")
    anony = None
except Exception:  # secret missing or login/init failed: fall back to anonymous mode
    anony = "must"
    print('If you want to use your W&B account, go to Add-ons -> Secrets and provide your W&B access token. Use the Label name as wandb_api. \nGet your W&B access token from here: https://wandb.ai/authorize')

[34m[1mwandb[0m: W&B API key is configured (use `wandb login --relogin` to force relogin)
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mtyaba55[0m (use `wandb login --relogin` to force relogin)
wandb: ERROR Error while calling W&B API: project not found (<Response [404]>)
Thread SenderThread:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/lib/retry.py", line 102, in __call__
    result = self._call_fn(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/internal/internal_api.py", line 146, in execute
    six.reraise(*sys.exc_info())
  File "/opt/conda/lib/python3.7/site-packages/six.py", line 719, in reraise
    raise value
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/internal/internal_api.py", line 140, in execute
    return self.client.execute(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/wandb/vendo

Problem at: /tmp/ipykernel_25/3461110190.py 11 <module>


Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 954, in init
    run = wi.init()
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 614, in init
    backend.cleanup()
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/backend/backend.py", line 248, in cleanup
    self.interface.join()
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/interface/interface_shared.py", line 467, in join
    super().join()
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/interface/interface.py", line 630, in join
    _ = self._communicate_shutdown()
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/interface/interface_shared.py", line 464, in _communicate_shutdown
    _ = self._communicate(record)
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/interface/interface_shared.py", line 222, in _communicate
    return self._communicate_async(rec, local=local).get(timeout=timeout)
  File

If you want to use your W&B account, go to Add-ons -> Secrets and provide your W&B access token. Use the Label name as wandb_api. 
Get your W&B access token from here: https://wandb.ai/authorize


In [6]:
# CONFIG

EXP_NUM = 4
task = "ner"
model_checkpoint = 'google/bigbird-roberta-base'  # alternative: "allenai/longformer-base-4096"
max_length = 1024
stride = 128
min_tokens = 6
model_path = f'{model_checkpoint.split("/")[-1]}-{EXP_NUM}'

# TRAINING HYPERPARAMS
BS = 4
GRAD_ACC = 8
LR = 5e-5
WD = 0.01
WARMUP = 0.1
N_EPOCHS = 5

## Data Preprocessing

In [7]:
import pandas as pd

# read train data
train = pd.read_csv('../input/feedbackprize2021aug1/feedback-prize-2021/train.csv')
# train_aug = pd.read_csv('../input/feedbackprize2021aug1/feedback-prize-2021/train_aug.csv')
# train = pd.concat([train, train_aug], axis=0).reset_index(drop=True)
train

Unnamed: 0,id,discourse_id,discourse_start,discourse_end,discourse_text,discourse_type,discourse_type_num,predictionstring
0,423A1CA112E2,1.622628e+12,8.0,229.0,Modern humans today are always on their phone....,Lead,Lead 1,1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 1...
1,423A1CA112E2,1.622628e+12,230.0,312.0,They are some really bad consequences when stu...,Position,Position 1,45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
2,423A1CA112E2,1.622628e+12,313.0,401.0,Some certain areas in the United States ban ph...,Evidence,Evidence 1,60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
3,423A1CA112E2,1.622628e+12,402.0,758.0,"When people have phones, they know about certa...",Evidence,Evidence 2,76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 9...
4,423A1CA112E2,1.622628e+12,759.0,886.0,Driving is one of the way how to get around. P...,Claim,Claim 1,139 140 141 142 143 144 145 146 147 148 149 15...
...,...,...,...,...,...,...,...,...
144288,4C471936CD75,1.618153e+12,2234.0,3203.0,if I'm not sure what college I want to attend...,Evidence,Evidence 2,386 387 388 389 390 391 392 393 394 395 396 39...
144289,4C471936CD75,1.618153e+12,3221.0,4509.0,seeking multiple opinions before making a har...,Evidence,Evidence 3,576 577 578 579 580 581 582 583 584 585 586 58...
144290,4C471936CD75,1.618025e+12,4510.0,4570.0,it is better to seek multiple opinions instead...,Position,Position 1,828 829 830 831 832 833 834 835 836 837 838
144291,4C471936CD75,1.618025e+12,4570.0,4922.0,The impact of asking people to help you make a...,Evidence,Evidence 4,839 840 841 842 843 844 845 846 847 848 849 85...


In [8]:
# check unique classes
classes = train.discourse_type.unique().tolist()
classes

['Lead',
 'Position',
 'Evidence',
 'Claim',
 'Concluding Statement',
 'Counterclaim',
 'Rebuttal']

In [9]:
# setup label indices

# B- tags take indices 0..6, I- tags 7..13 and 'O' is 14
l2i = {}

for i, c in enumerate(classes):
    l2i[f'B-{c}'] = i
    l2i[f'I-{c}'] = i + len(classes)
l2i['O'] = len(classes) * 2
l2i['Special'] = -100  # index ignored by the loss

# inverse mapping: index -> label
i2l = {v: k for k, v in l2i.items()}

N_LABELS = len(i2l) - 1 # not accounting for -100
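
To make the resulting scheme concrete, here is a minimal standalone sketch of the same BIO indexing. It hard-codes the seven class names printed above and uses illustrative names (`example_classes`, `label_to_id`, `id_to_label`) that are not part of the notebook.

```python
# Standalone sketch of the BIO label indexing (illustrative names, not the notebook's variables).
example_classes = ['Lead', 'Position', 'Evidence', 'Claim',
                   'Concluding Statement', 'Counterclaim', 'Rebuttal']

label_to_id = {}
for i, c in enumerate(example_classes):
    label_to_id[f'B-{c}'] = i                          # beginning tags: 0..6
    label_to_id[f'I-{c}'] = i + len(example_classes)   # inside tags: 7..13
label_to_id['O'] = 2 * len(example_classes)            # outside tag: 14

id_to_label = {v: k for k, v in label_to_id.items()}

for idx in sorted(id_to_label):
    print(idx, id_to_label[idx])
```

With this layout the I- tag of a class is always its B- tag plus 7, which is the relationship `fix_beginnings` relies on.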

In [10]:
# some helper functions

from pathlib import Path

path = Path('../input/feedbackprize2021aug1/feedback-prize-2021/train')

def get_raw_text(ids):
    with open(path/f'{ids}.txt', 'r') as file: data = file.read()
    return data

In [11]:
# group training labels by text file

df1 = train.groupby('id')['discourse_type'].apply(list).reset_index(name='classlist')
df2 = train.groupby('id')['discourse_start'].apply(list).reset_index(name='starts')
df3 = train.groupby('id')['discourse_end'].apply(list).reset_index(name='ends')
df4 = train.groupby('id')['predictionstring'].apply(list).reset_index(name='predictionstrings')

df = pd.merge(df1, df2, how='inner', on='id')
df = pd.merge(df, df3, how='inner', on='id')
df = pd.merge(df, df4, how='inner', on='id')
df['text'] = df['id'].apply(get_raw_text)

df.head()

Unnamed: 0,id,classlist,starts,ends,predictionstrings,text
0,0000D23A521A,"[Position, Evidence, Evidence, Claim, Counterc...","[0.0, 170.0, 358.0, 438.0, 627.0, 722.0, 836.0...","[170.0, 357.0, 438.0, 626.0, 722.0, 836.0, 101...",[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 1...,"Some people belive that the so called ""face"" o..."
1,00066EA9880D,"[Lead, Position, Claim, Evidence, Claim, Evide...","[0.0, 456.0, 638.0, 738.0, 1399.0, 1488.0, 231...","[455.0, 592.0, 738.0, 1398.0, 1487.0, 2219.0, ...",[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 1...,Driverless cars are exaclty what you would exp...
2,000E6DE9E817,"[Position, Counterclaim, Rebuttal, Evidence, C...","[17.0, 64.0, 158.0, 310.0, 438.0, 551.0, 776.0...","[56.0, 157.0, 309.0, 422.0, 551.0, 775.0, 961....","[2 3 4 5 6 7 8, 10 11 12 13 14 15 16 17 18 19 ...",Dear: Principal\n\nI am arguing against the po...
3,001552828BD0,"[Lead, Evidence, Claim, Claim, Evidence, Claim...","[0.0, 161.0, 872.0, 958.0, 1191.0, 1542.0, 161...","[160.0, 872.0, 957.0, 1190.0, 1541.0, 1612.0, ...",[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 1...,Would you be able to give your car up? Having ...
4,0016926B079C,"[Position, Claim, Claim, Claim, Claim, Evidenc...","[0.0, 58.0, 94.0, 206.0, 236.0, 272.0, 542.0, ...","[57.0, 91.0, 150.0, 235.0, 271.0, 542.0, 650.0...","[0 1 2 3 4 5 6 7 8 9, 10 11 12 13 14 15, 16 17...",I think that students would benefit from learn...


In [12]:
# debugging
if SAMPLE: df = df.sample(n=1000).reset_index(drop=True)

In [13]:
# import re
# # m = re.match(r'\s'\s\w\s'\s', s)

In [14]:
# aug.augment(df['text'][0], n=2)

In [15]:
# augmented_texts = aug.augment(df['text'][0], n=2)
# augmented_texts = [x.replace(" ' ", "@") for x in augmented_texts]
# # m = re.match(r'@.*@', augmented_texts[0])
# # print(m.span())
# augmented_texts

In [16]:
# # bert
# aug = nawcwe.ContextualWordEmbsAug(model_path='../input/huggingface-bert-variants/bert-base-cased/bert-base-cased')
# # augmented_texts = aug.augment(sample, n=3)
# # augmented_texts = [x.replace(" ' ", "'") for x in augmented_texts]
# # print_and_highlight_diff(sample, augmented_texts)

# for sample in df.iterrows():
#     sample = sample[1]
#     t = sample['text']
#     augmented_texts = aug.augment(t, n=1)
#     augmented_texts = [x.replace(" ' ", "@") for x in augmented_texts]
#     m = re.match(r'@.*@', augmented_texts[0])
#     print(m.span())
#     break

In [17]:
# we will use HuggingFace datasets
from datasets import Dataset, load_metric

ds = Dataset.from_pandas(df)
datasets = ds.train_test_split(test_size=0.1, shuffle=True, seed=42)
datasets

DatasetDict({
    train: Dataset({
        features: ['id', 'classlist', 'starts', 'ends', 'predictionstrings', 'text', '__index_level_0__'],
        num_rows: 14034
    })
    test: Dataset({
        features: ['id', 'classlist', 'starts', 'ends', 'predictionstrings', 'text', '__index_level_0__'],
        num_rows: 1560
    })
})

In [18]:
# df['id'].apply(lambda x: str(x)[:12])
# df

In [19]:
# use GroupKFold to prevent leakage
# df['id_'] = df['id'].apply(lambda x: str(x)[:12])
# from sklearn.model_selection import GroupKFold
# gkf = GroupKFold(n_splits=10)
# for i, j in gkf.split(df, groups=df['id_']):
#     tr_idx, va_idx = i, j
#     break
# tr_df = Dataset.from_pandas(df.iloc[tr_idx])
# va_df = Dataset.from_pandas(df.iloc[va_idx])
# datasets = {'train': tr_df, 'test': va_df}
# datasets

In [20]:
from transformers import AutoTokenizer
    
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, add_prefix_space=True)

normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.


### Not sure what this is for

In [21]:
# Not sure if this is needed, but in case we create a span with certain class without starting token of that class,
# let's convert the first token to be the starting token.

e = [0,7,7,7,1,1,8,8,8,9,9,9,14,4,4,4]

def fix_beginnings(labels):
    for i in range(1,len(labels)):
        curr_lab = labels[i]
        prev_lab = labels[i-1]
        if curr_lab in range(7,14):
            if prev_lab != curr_lab and prev_lab != curr_lab - 7:
                labels[i] = curr_lab -7
    return labels

fix_beginnings(e)

[0, 7, 7, 7, 1, 1, 8, 8, 8, 2, 9, 9, 14, 4, 4, 4]
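
The hard-coded 7 and 14 come from the seven discourse classes (B- tags 0..6, I- tags 7..13, O = 14). As a hedged sketch, the same repair can be written with the class count as a parameter and without mutating its input. `fix_beginnings_generic` and `n_classes` are names introduced here for illustration only.

```python
def fix_beginnings_generic(labels, n_classes=7):
    """Return a copy of labels where an I- tag that does not continue a span becomes the matching B- tag.

    Assumes B- tags are 0..n_classes-1 and I- tags are n_classes..2*n_classes-1.
    """
    fixed = list(labels)
    for i in range(1, len(fixed)):
        curr, prev = fixed[i], fixed[i - 1]
        is_inside = n_classes <= curr < 2 * n_classes
        continues_span = prev == curr or prev == curr - n_classes
        if is_inside and not continues_span:
            fixed[i] = curr - n_classes  # promote the orphaned I- tag to B-
    return fixed

e = [0, 7, 7, 7, 1, 1, 8, 8, 8, 9, 9, 9, 14, 4, 4, 4]
print(fix_beginnings_generic(e))
```

On the example list this gives the same result as the output above: the first 9 (an I- tag following an unrelated 8) becomes 2.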

### This is where the labels are assigned

In [22]:
# tokenize and add labels
def tokenize_and_align_labels(examples):

    o = tokenizer(examples['text'], truncation=True, padding=True, 
                  return_offsets_mapping=True, max_length=max_length, 
                  stride=stride, return_overflowing_tokens=True)

    # Since one example might give us several features if it has a long context, we need a map from a feature to
    # its corresponding example. This key gives us just that.
    sample_mapping = o["overflow_to_sample_mapping"]
    # The offset mappings give us a map from each token to its character span in the original text. We use them
    # to match tokens against the character-level discourse_start / discourse_end annotations.
    offset_mapping = o["offset_mapping"]
    
    o["labels"] = []

    for i in range(len(offset_mapping)):
                   
        sample_index = sample_mapping[i]

        # every token starts as 'O' (outside any discourse element)
        labels = [l2i['O']] * len(o['input_ids'][i])

        for label_start, label_end, label in \
        zip(examples['starts'][sample_index], examples['ends'][sample_index], examples['classlist'][sample_index]):
            for j in range(len(labels)):
                token_start, token_end = offset_mapping[i][j]
                # a token that starts exactly at the span start opens the span
                if token_start == label_start: 
                    labels[j] = l2i[f'B-{label}']    
                # tokens lying fully inside the span continue it
                if token_start > label_start and token_end <= label_end: 
                    labels[j] = l2i[f'I-{label}']

        # mask special tokens so the loss ignores them; the ids 0, 1, 2 are hard-coded and
        # assume these are the special-token ids of the tokenizer in use
        for k, input_id in enumerate(o['input_ids'][i]):
            if input_id in [0,1,2]:
                labels[k] = -100

        labels = fix_beginnings(labels)
                   
        o["labels"].append(labels)
        
    return o
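
The alignment rule in the function above can be tried on a toy example without loading a tokenizer. The sketch below uses whitespace-separated words as stand-ins for tokens; `whitespace_offsets` and `align` are hypothetical helpers, and a real subword tokenizer would produce different offsets.

```python
import re

def whitespace_offsets(text):
    """Character (start, end) offsets of whitespace-separated words, standing in for offset_mapping."""
    return [(m.start(), m.end()) for m in re.finditer(r'\S+', text)]

def align(offsets, spans):
    """Apply the same B-/I- rule as above; spans are (start, end, label) in characters."""
    tags = ['O'] * len(offsets)
    for span_start, span_end, label in spans:
        for j, (tok_start, tok_end) in enumerate(offsets):
            if tok_start == span_start:
                tags[j] = f'B-{label}'
            if tok_start > span_start and tok_end <= span_end:
                tags[j] = f'I-{label}'
    return tags

text = "Phones are bad. Ban them now."
spans = [(0, 15, 'Position'), (16, 29, 'Claim')]
offsets = whitespace_offsets(text)
tags = align(offsets, spans)
for (start, end), tag in zip(offsets, tags):
    print(f'{text[start:end]!r:10} {tag}')
```

Note that a B- tag is only assigned when a token starts exactly at the annotated span start, which is the case `fix_beginnings` is meant to patch up when it does not happen.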

In [23]:
tokenized_datasets = datasets.map(tokenize_and_align_labels, batched=True, \
                                  batch_size=20000, remove_columns=datasets["train"].column_names)

In [24]:
tokenized_datasets

DatasetDict({
    train: Dataset({
        features: ['attention_mask', 'input_ids', 'labels', 'offset_mapping', 'overflow_to_sample_mapping'],
        num_rows: 14505
    })
    test: Dataset({
        features: ['attention_mask', 'input_ids', 'labels', 'offset_mapping', 'overflow_to_sample_mapping'],
        num_rows: 1616
    })
})
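
The row counts grow from 14034 to 14505 (train) and from 1560 to 1616 (test) because essays longer than `max_length` tokens are split into overlapping windows. The sketch below is a simplified model of that splitting: it ignores special tokens and padding, so the exact boundaries produced by the tokenizer may differ. `window_bounds` is a hypothetical helper, not a tokenizer API.

```python
def window_bounds(n_tokens, max_length, stride):
    """Return (start, end) token ranges of windows of at most max_length tokens that overlap by stride tokens."""
    step = max_length - stride
    if step <= 0:
        raise ValueError('stride must be smaller than max_length')
    bounds = []
    start = 0
    while True:
        end = min(start + max_length, n_tokens)
        bounds.append((start, end))
        if end == n_tokens:
            break
        start += step
    return bounds

print(window_bounds(500, 1024, 128))    # short essay: a single window
print(window_bounds(1500, 1024, 128))   # long essay: two windows sharing 128 tokens
```

Tokens in the overlap appear in two rows, so they are seen twice during training and get two predictions if the same windowing is used at inference time.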

In [25]:
# tokenized_datasets['train']['overflow_to_sample_mapping']

## Model and Training

In [26]:
# we will use auto model for token classification

from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer

model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=N_LABELS)

Some weights of the model checkpoint at google/bigbird-roberta-base were not used when initializing BigBirdForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BigBirdForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BigBirdForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BigBirdForTokenClassification were no

In [27]:
model_name = model_checkpoint.split("/")[-1]
args = TrainingArguments(
    f"{model_name}-finetuned-{task}",
    evaluation_strategy = "epoch",
    logging_strategy = "epoch",
    save_strategy = "epoch",
    learning_rate=LR,
    per_device_train_batch_size=BS,
    per_device_eval_batch_size=BS,
    num_train_epochs=N_EPOCHS,
    weight_decay=WD,
    report_to='wandb', 
    gradient_accumulation_steps=GRAD_ACC,
    warmup_ratio=WARMUP,
    
    fp16 = True,
    
#     #### THE ONLY CHANGE YOU NEED TO MAKE TO USE DEEPSPEED ########
#     deepspeed=ds_config_dict
)
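
With gradient accumulation the optimizer sees a larger batch than `BS` suggests. The arithmetic below is a rough sketch for a single device, assuming one optimizer step per `GRAD_ACC` full batches and a warmup length rounded up; it plugs in the 14505 training rows from the tokenized dataset above.

```python
import math

# Values from the config cell and the tokenized training set above.
n_train_examples = 14505
batch_size, grad_acc, n_epochs, warmup_ratio = 4, 8, 5, 0.1

effective_batch = batch_size * grad_acc                        # samples per optimizer step
batches_per_epoch = math.ceil(n_train_examples / batch_size)   # the last batch may be partial
# Assumption: leftover batches that do not fill an accumulation cycle are not counted as a step.
steps_per_epoch = max(batches_per_epoch // grad_acc, 1)
total_steps = steps_per_epoch * n_epochs
# Assumption: the warmup length is the ratio times the total steps, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the effective batch size is 32 and there are 453 steps per epoch, 2265 in total, which agrees with the figures the Trainer prints when training starts.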

In [28]:
from transformers import DataCollatorForTokenClassification
# collates examples into batches
data_collator = DataCollatorForTokenClassification(tokenizer)

In [29]:
# this is not the competition metric, but for now this will be better than nothing...

metric = load_metric("seqeval")

### This is not the competition's evaluation metric

In [30]:
import numpy as np

def compute_metrics(p):
    predictions, labels = p
    predictions = np.argmax(predictions, axis=2)

    # Remove ignored index (special tokens)
    true_predictions = [
        [i2l[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    true_labels = [
        [i2l[l] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]

    results = metric.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
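
seqeval scores whole spans rather than individual tokens, so token accuracy can be much higher than span-level F1. The following is a simplified pure-Python illustration of span-level precision and recall on BIO tags. It is not seqeval's implementation, and `extract_spans` and `span_f1` are hypothetical helpers.

```python
def extract_spans(tags):
    """Collect (label, start, end) spans from a BIO tag sequence; end is exclusive."""
    spans = set()
    kind, start = None, None
    for i, tag in enumerate(list(tags) + ['O']):  # the trailing 'O' closes the last span
        label = None if tag == 'O' else tag[2:]
        continues = tag.startswith('I-') and label == kind
        if not continues:
            if kind is not None:
                spans.add((kind, start, i))
            kind, start = (label, i) if label is not None else (None, None)
    return spans

def span_f1(true_tags, pred_tags):
    """Precision, recall and F1 where a predicted span counts only if it matches exactly."""
    true_spans, pred_spans = extract_spans(true_tags), extract_spans(pred_tags)
    tp = len(true_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(true_spans) if true_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

true_tags = ['B-Claim', 'I-Claim', 'I-Claim', 'O', 'B-Evidence', 'I-Evidence']
pred_tags = ['B-Claim', 'I-Claim', 'O', 'O', 'B-Evidence', 'I-Evidence']
token_accuracy = sum(t == p for t, p in zip(true_tags, pred_tags)) / len(true_tags)
print(token_accuracy, span_f1(true_tags, pred_tags))
```

In this toy case a single wrong token leaves token accuracy at 5/6 but halves precision, recall and F1, because the shortened Claim span no longer matches exactly.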

### Is this trained with PyTorch Lightning? → No, it's the transformers Trainer

In [31]:
trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics, 
)

Using amp fp16 backend


In [32]:
trainer.train()
wandb.finish()

The following columns in the training set  don't have a corresponding argument in `BigBirdForTokenClassification.forward` and have been ignored: offset_mapping, overflow_to_sample_mapping.
***** Running training *****
  Num examples = 14505
  Num Epochs = 5
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 8
  Total optimization steps = 2265
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /usr/local/src/pytorch/aten/src/ATen/native/BinaryOps.cpp:461.)
  return torch.floor_divide(self, other)


Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy
0,1.0058,0.643853,0.221843,0.367104,0.276559,0.79378
1,0.5897,0.580042,0.25458,0.392653,0.308889,0.813247
2,0.4904,0.576332,0.262826,0.4225,0.324062,0.814134
3,0.4087,0.595378,0.252181,0.425384,0.316645,0.80972
4,0.3495,0.620855,0.25805,0.428857,0.322217,0.808093


  args.max_grad_norm,
The following columns in the evaluation set  don't have a corresponding argument in `BigBirdForTokenClassification.forward` and have been ignored: offset_mapping, overflow_to_sample_mapping.
***** Running Evaluation *****
  Num examples = 1616
  Batch size = 4
Saving model checkpoint to bigbird-roberta-base-finetuned-ner/checkpoint-453
Configuration saved in bigbird-roberta-base-finetuned-ner/checkpoint-453/config.json
Model weights saved in bigbird-roberta-base-finetuned-ner/checkpoint-453/pytorch_model.bin
tokenizer config file saved in bigbird-roberta-base-finetuned-ner/checkpoint-453/tokenizer_config.json
Special tokens file saved in bigbird-roberta-base-finetuned-ner/checkpoint-453/special_tokens_map.json
  args.max_grad_norm,
The following columns in the evaluation set  don't have a corresponding argument in `BigBirdForTokenClassification.forward` and have been ignored: offset_mapping, overflow_to_sample_mapping.
***** Running Evaluation *****
  Num examples




metric,history
eval/accuracy,▁██▆▆
eval/f1,▁▆█▇█
eval/loss,█▁▁▃▆
eval/precision,▁▇█▆▇
eval/recall,▁▄▇██
eval/runtime,▅▁█▅▂
eval/samples_per_second,▄█▁▅▇
eval/steps_per_second,▄█▁▄▇
train/epoch,▁▁▃▃▅▅▆▆███
train/global_step,▁▁▃▃▅▅▆▆███

metric,final value
eval/accuracy,0.80809
eval/f1,0.32222
eval/loss,0.62086
eval/precision,0.25805
eval/recall,0.42886
eval/runtime,138.0635
eval/samples_per_second,11.705
eval/steps_per_second,2.926
train/epoch,5.0
train/global_step,2265.0


### Saving the model

In [33]:
trainer.save_model(model_path)

Saving model checkpoint to bigbird-roberta-base-4
Configuration saved in bigbird-roberta-base-4/config.json
Model weights saved in bigbird-roberta-base-4/pytorch_model.bin
tokenizer config file saved in bigbird-roberta-base-4/tokenizer_config.json
Special tokens file saved in bigbird-roberta-base-4/special_tokens_map.json


## Validation

In [34]:
def tokenize_for_validation(examples):

    # no sliding window here: each essay is tokenized as a single sequence of up to 4096 tokens
    o = tokenizer(examples['text'], truncation=True, return_offsets_mapping=True, max_length=4096)

    # The offset mappings give us a map from each token to its character span in the original text. We use them
    # to match tokens against the character-level discourse_start / discourse_end annotations.
    offset_mapping = o["offset_mapping"]
    
    o["labels"] = []

    for i in range(len(offset_mapping)):
                   
        # every token starts as 'O' (outside any discourse element)
        labels = [l2i['O']] * len(o['input_ids'][i])

        for label_start, label_end, label in \
        zip(examples['starts'][i], examples['ends'][i], examples['classlist'][i]):
            for j in range(len(labels)):
                token_start, token_end = offset_mapping[i][j]
                # a token that starts exactly at the span start opens the span
                if token_start == label_start: 
                    labels[j] = l2i[f'B-{label}']    
                # tokens lying fully inside the span continue it
                if token_start > label_start and token_end <= label_end: 
                    labels[j] = l2i[f'I-{label}']

        # mask special tokens; the ids 0, 1, 2 are hard-coded and assume these are the
        # special-token ids of the tokenizer in use
        for k, input_id in enumerate(o['input_ids'][i]):
            if input_id in [0,1,2]:
                labels[k] = -100

        labels = fix_beginnings(labels)
                   
        o["labels"].append(labels)
        
    return o

In [35]:
tokenized_val = datasets.map(tokenize_for_validation, batched=True)
tokenized_val

DatasetDict({
    train: Dataset({
        features: ['__index_level_0__', 'attention_mask', 'classlist', 'ends', 'id', 'input_ids', 'labels', 'offset_mapping', 'predictionstrings', 'starts', 'text'],
        num_rows: 14034
    })
    test: Dataset({
        features: ['__index_level_0__', 'attention_mask', 'classlist', 'ends', 'id', 'input_ids', 'labels', 'offset_mapping', 'predictionstrings', 'starts', 'text'],
        num_rows: 1560
    })
})

In [36]:
tokenized_val['train'][0]['predictionstrings']

['0 1 2 3 4 5 6 7 8 9 10',
 '11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67',
 '68 69 70 71 72 73 74 75 76 77 78 79 80 81 82',
 '83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100',
 '101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117',
 '118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173',
 '174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192',
 '193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238',
 '239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 

In [37]:
# ground truth for validation

l = []
for example in tokenized_val['test']:
    for c, p in list(zip(example['classlist'], example['predictionstrings'])):
        l.append({
            'id': example['id'],
            'discourse_type': c,
            'predictionstring': p,
        })
    
gt_df = pd.DataFrame(l)
gt_df

Unnamed: 0,id,discourse_type,predictionstring
0,7B5F5B33B566,Lead,0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18...
1,7B5F5B33B566,Position,43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 5...
2,7B5F5B33B566,Evidence,69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 8...
3,7B5F5B33B566,Claim,166 167 168 169 170 171 172 173 174 175 176 17...
4,7B5F5B33B566,Evidence,180 181 182 183 184 185 186 187 188 189 190 19...
...,...,...,...
14461,B3E4B633261B,Claim,94 95 96 97 98 99 100 101 102 103 104 105 106 ...
14462,B3E4B633261B,Evidence,113 114 115 116 117 118 119 120 121 122 123 12...
14463,B3E4B633261B,Counterclaim,126 127 128 129 130 131 132
14464,B3E4B633261B,Rebuttal,133 134 135 136 137 138 139 140 141 142 143 14...


In [38]:
gt_df['id'].nunique()

1560
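
Each `predictionstring` is a space-separated list of word indices, where words come from splitting the essay on whitespace. As a rough sketch (assuming `start` and `end` are character offsets with an exclusive end), a character span can be mapped to that format as follows. `char_span_to_predictionstring` is a name introduced here for illustration.

```python
def char_span_to_predictionstring(text, start, end):
    """Map a character span to the indices of the whitespace-separated words it covers."""
    before = text[:start]
    first_word = len(before.split())
    # If the span starts in the middle of a word, that word was already counted in `before`.
    if before and not before[-1].isspace():
        first_word -= 1
    n_words = len(text[start:end].split())
    return ' '.join(str(i) for i in range(first_word, first_word + n_words))

text = "Phones are bad. Ban them now."
print(char_span_to_predictionstring(text, 0, 15))
print(char_span_to_predictionstring(text, 16, 29))
```

The two spans cover the first three and the last three words of the toy sentence, so they map to the word indices 0 to 2 and 3 to 5.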

In [39]:
# visualization with displacy

import pandas as pd
import os
from pathlib import Path
import spacy
from spacy import displacy
from pylab import cm, matplotlib

In [40]:
path = Path('../input/feedbackprize2021aug1/feedback-prize-2021/train')

colors = {
            'Lead': '#8000ff',
            'Position': '#2b7ff6',
            'Evidence': '#2adddd',
            'Claim': '#80ffb4',
            'Concluding Statement': '#d4dd80',
            'Counterclaim': '#ff8042',
            'Rebuttal': '#ff0000',
            'Other': '#007f00',
         }

def visualize(df, text):
    ents = []
    example = df['id'].loc[0]

    for i, row in df.iterrows():
        ents.append({
                        'start': int(row['discourse_start']), 
                         'end': int(row['discourse_end']), 
                         'label': row['discourse_type']
                    })

    doc2 = {
        "text": text,
        "ents": ents,
        "title": example
    }

    options = {"ents": train.discourse_type.unique().tolist() + ['Other'], "colors": colors}
    displacy.render(doc2, style="ent", options=options, manual=True, jupyter=True)

In [41]:
predictions, labels, _ = trainer.predict(tokenized_val['test'])

The following columns in the test set  don't have a corresponding argument in `BigBirdForTokenClassification.forward` and have been ignored: predictionstrings, id, starts, classlist, offset_mapping, __index_level_0__, text, ends.
***** Running Prediction *****
  Num examples = 1560
  Batch size = 4
Attention type 'block_sparse' is not possible if sequence_length: 637 <= num global tokens: 2 * config.block_size + min. num sliding tokens: 3 * config.block_size + config.num_random_blocks * config.block_size + additional buffer: config.num_random_blocks * config.block_size = 704 with config.block_size = 64, config.num_random_blocks = 3. Changing attention type to 'original_full'...
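
The threshold of 704 in this message follows directly from the terms it lists. The sketch below just reproduces that arithmetic from the log text; it has not been checked against the library source, and `min_block_sparse_length` is a hypothetical helper.

```python
def min_block_sparse_length(block_size, num_random_blocks):
    """Sequence length that must be exceeded for block-sparse attention, per the log message above."""
    global_tokens = 2 * block_size
    sliding_tokens = 3 * block_size
    random_tokens = num_random_blocks * block_size
    buffer_tokens = num_random_blocks * block_size
    return global_tokens + sliding_tokens + random_tokens + buffer_tokens

threshold = min_block_sparse_length(block_size=64, num_random_blocks=3)
for seq_len in (637, 704, 705):
    attention = 'block_sparse' if seq_len > threshold else 'original_full'
    print(seq_len, attention)
```

Because prediction batches are not padded to a fixed length here, a batch whose longest essay has 704 tokens or fewer triggers the fallback to full attention.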


In [42]:
# predictions.shape # (10, 829, 15) => (test_id, max_len, labels); labels is the number of discourse-type tags

In [43]:
# preds = np.argmax(predictions, axis=-1)
# preds.shape # one predicted label per token

In [44]:
# pd.Series(preds[0]).map(i2l) # predicted labels

In [45]:
# code that will convert our predictions into prediction strings, and visualize it at the same time
# this most likely requires some refactoring

# def get_class(c):
#     if c == 14: return 'Other'
#     else: return i2l[c][2:]

# def pred2span(pred, example, viz=False, test=False):
#     example_id = example['id']
#     n_tokens = len(example['input_ids'])
#     classes = []
#     all_span = []
#     for i, c in enumerate(pred.tolist()):
#         if i == n_tokens-1:
#             break
#         if i == 0:
#             cur_span = example['offset_mapping'][i]
#             classes.append(get_class(c))
#         elif i > 0 and (c == pred[i-1] or (c-7) == pred[i-1]):
#             cur_span[1] = example['offset_mapping'][i][1]
#         else:
#             all_span.append(cur_span)
#             cur_span = example['offset_mapping'][i]
#             classes.append(get_class(c))
#     all_span.append(cur_span)
    
#     if test: text = get_test_text(example_id)
#     else: text = get_raw_text(example_id)
    
#     # map token ids to word (whitespace) token ids
#     predstrings = []
#     for span in all_span:
#         span_start = span[0]
#         span_end = span[1]
#         before = text[:span_start]
#         token_start = len(before.split())
#         if len(before) == 0: token_start = 0
#         elif before[-1] != ' ': token_start -= 1
#         num_tkns = len(text[span_start:span_end+1].split())
#         tkns = [str(x) for x in range(token_start, token_start+num_tkns)]
#         predstring = ' '.join(tkns)
#         predstrings.append(predstring)
                    
#     rows = []
#     for c, span, predstring in zip(classes, all_span, predstrings):
#         e = {
#             'id': example_id,
#             'discourse_type': c,
#             'predictionstring': predstring,
#             'discourse_start': span[0],
#             'discourse_end': span[1],
#             'discourse': text[span[0]:span[1]+1]
#         }
#         rows.append(e)


#     df = pd.DataFrame(rows)
#     df['length'] = df['discourse'].apply(lambda t: len(t.split()))
    
#     # short spans are likely to be false positives, we can choose a min number of tokens based on validation
#     df = df[df.length > min_tokens].reset_index(drop=True)
#     if viz: visualize(df, text)

#     return df

In [46]:
proba_thresh = {
    "Lead": 0.687,
    "Position": 0.537,
    "Evidence": 0.637,
    "Claim": 0.537,
    "Concluding Statement": 0.687,
    "Counterclaim": 0.537,
    "Rebuttal": 0.537,
}

min_thresh = {
    "Lead": 9,
    "Position": 5,
    "Evidence": 14,
    "Claim": 3,
    "Concluding Statement": 11,
    "Counterclaim": 6,
    "Rebuttal": 4,
}
# code that will convert our predictions into prediction strings. we'll skip visualization here. 
# this most likely requires some refactoring

def get_class(c):
    if c == 14: return 'Other'
    else: return i2l[c][2:]

def pred2span(predictions, example, proba_thresh, min_thresh, viz=False, test=False): # want predictions to have shape (5, 1304, 15)
    pred = np.argmax(predictions, axis=-1)
    example_id = example['id']
    n_tokens = len(example['input_ids'])
    classes = [] # label of each span
    all_span = [] # [start, end] character offsets of each span
    pred_score = [] # average score of each span (the sum is divided by the span's character length)
    
    for i, c in enumerate(pred.tolist()):
        if i == n_tokens-1:                                    # last token of the text
            break
        if i == 0:                                             # first token of the text
            cur_span = example['offset_mapping'][i]
            classes.append(get_class(c))
            cur_score = predictions[i][c]
        elif i > 0 and (c == pred[i-1] or (c-7) == pred[i-1]): # same label as the previous token, or its I- continuation
            cur_span[1] = example['offset_mapping'][i][1]
            cur_score += predictions[i][c]
        else:                                                  # the label changed from the previous token
            all_span.append(cur_span)
            pred_score.append(cur_score / (cur_span[1]-cur_span[0]+1))
            cur_score = predictions[i][c]
            cur_span = example['offset_mapping'][i]
            classes.append(get_class(c))
    all_span.append(cur_span)
    pred_score.append(cur_score / (cur_span[1]-cur_span[0]+1))
    
    if test: text = get_test_text(example_id)
    else: text = get_raw_text(example_id)
        
    # map token ids to word (whitespace) token ids
    predstrings = []
    lastid = []
    for i, span in enumerate(all_span):
        
        span_start = span[0]
        span_end = span[1]
        before = text[:span_start]
        token_start = len(before.split())
        if len(before) == 0: token_start = 0
        elif before[-1] != ' ': token_start -= 1
        num_tkns = len(text[span_start:span_end+1].split())
            
        tkns = [str(x) for x in range(token_start, token_start+num_tkns)]
        predstring = ' '.join(tkns)
        predstrings.append(predstring)
        
        if classes[i] == 'Other':
            continue
        # post-processing: keep only spans that are long enough and score above the class threshold
        if num_tkns > min_thresh[classes[i]] and pred_score[i] > proba_thresh[classes[i]]:
            lastid.append(i)
        
#     print(len(classes),  len(all_span), len(predstrings), len(lastid), len(pred_score))
                    
    rows = []
    for i in lastid:
        e = {
            'id': example_id,
            'discourse_type': classes[i],
            'predictionstring': predstrings[i],
            'discourse_start': all_span[i][0],
            'discourse_end': all_span[i][1],
            'discourse': text[all_span[i][0]:all_span[i][1]+1]
        }
        rows.append(e)


    df = pd.DataFrame(rows)
#     print(df)
#     df['length'] = df['discourse'].apply(lambda t: len(t.split()))
    
    # short spans are likely to be false positives, we can choose a min number of tokens based on validation
#     df = df[df.length > min_tokens].reset_index(drop=True)
    if viz: visualize(df, text)
    return df
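
The first loop in `pred2span` merges consecutive token labels into spans. Here is a minimal standalone sketch of that grouping step, assuming the tag layout implied by the `c-7` test and `get_class` (ids 0 to 6 are B- tags, 7 to 13 the matching I- tags, 14 is 'Other'). The helper name `group_labels` is made up for illustration, and unlike the notebook code it adds a range guard so that 'Other' is never merged into the preceding span.

```python
def group_labels(label_ids, n_classes=7):
    """Merge consecutive token labels into (first_label, first_token, last_token) runs.

    Assumed layout (not confirmed by the notebook): ids 0..n_classes-1 are B- tags,
    n_classes..2*n_classes-1 are the matching I- tags, 2*n_classes is 'Other'.
    """
    runs = []
    for i, c in enumerate(label_ids):
        prev = label_ids[i - 1] if i > 0 else None
        same_label = i > 0 and c == prev
        # an I- tag directly after its own B- tag continues the run;
        # the range check keeps 'Other' (2*n_classes) out of this branch
        i_after_b = i > 0 and n_classes <= c < 2 * n_classes and c - n_classes == prev
        if same_label or i_after_b:
            runs[-1][2] = i          # extend the current run to this token
        else:
            runs.append([c, i, i])   # start a new run at this token
    return [tuple(r) for r in runs]


labels = [0, 7, 7, 14, 14, 3, 10]
runs = group_labels(labels)
print(runs)
```

Each run keeps the label of its first token, which is also what the notebook does when it calls `get_class(c)` at the start of a span.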

In [47]:
pred2span(predictions[0], tokenized_val['test'][0], proba_thresh, min_thresh, viz=True)

Unnamed: 0,id,discourse_type,predictionstring,discourse_start,discourse_end,discourse
0,7B5F5B33B566,Lead,0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18...,0,228,"When people ask for advice\n\n, they sometimes..."
1,7B5F5B33B566,Position,43 44 45 46 47 48 49 50 51 52 53,228,294,advice from another person can help you make ...
2,7B5F5B33B566,Claim,55 56 57 58 59 60 61 62 63,297,354,will make you understand things more clearly ...
3,7B5F5B33B566,Evidence,69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 8...,384,698,opinions really is a foundation to a job like...
4,7B5F5B33B566,Claim,135 136 137 138 139 140,700,723,"In the world we live in,"
5,7B5F5B33B566,Claim,143 144 145 146 147,732,759,a lot of important choices
6,7B5F5B33B566,Claim,164 165 166 167 168 169 170 171 172 173 174 17...,840,922,". Next, advice from others can make you more w..."
7,7B5F5B33B566,Evidence,180 181 182 183 184 185 186 187 188 189 190 19...,927,2232,Abraham\n\nLincoln never saw how African Ameri...
8,7B5F5B33B566,Concluding Statement,422 423 424 425 426 427 428 429 430 431 432 43...,2234,2587,"In Conclusion, advice is there to help you, op..."
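
The `predictionstring` column holds whitespace-token indices rather than character offsets. Below is a minimal sketch of that conversion, mirroring the logic inside `pred2span`. The helper name `span_to_predictionstring` and the sample text are made up for illustration.

```python
def span_to_predictionstring(text, span_start, span_end):
    """Map an inclusive character span to space-separated word indices."""
    before = text[:span_start]
    token_start = len(before.split())
    # If the character just before the span is not a space, the span starts
    # inside a word that split() already counted, so step back by one.
    if before and before[-1] != ' ':
        token_start -= 1
    num_tkns = len(text[span_start:span_end + 1].split())
    return ' '.join(str(x) for x in range(token_start, token_start + num_tkns))


text = "Hello world this is a test"
whole_words = span_to_predictionstring(text, 12, 18)  # characters of "this is"
mid_word = span_to_predictionstring(text, 14, 18)     # starts inside "this"
print(whole_words, "|", mid_word)
```

Note that the check only looks for a literal space, so a span that starts right after a newline is shifted back by one word as well.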


In [48]:
pred2span(predictions[1], tokenized_val['test'][1], proba_thresh, min_thresh, viz=True)

Unnamed: 0,id,discourse_type,predictionstring,discourse_start,discourse_end,discourse
0,3CF52C3ED074,Lead,0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18...,0,173,All students do is waste their time and i'm ti...
1,3CF52C3ED074,Position,36 37 38 39 40 41 42 43 44 45,188,251,principal that all students should participat...
2,3CF52C3ED074,Claim,49 50 51 52 53 54 55 56 57 58 59 60 61 62,286,375,are too lazy instead of wasting time they sho...
3,3CF52C3ED074,Claim,64 65 66 67 68 69,379,412,"students it could benefit a lot,"
4,3CF52C3ED074,Evidence,89 90 91 92 93 94 95 96 97 98 99 100 101 102 1...,514,1110,extracurricular activity can give an idea of ...
5,3CF52C3ED074,Evidence,202 203 204 205 206 207 208 209 210 211 212 21...,1163,1935,"competitions there is always a reward, and go..."
6,3CF52C3ED074,Evidence,354 355 356 357 358 359 360 361 362 363 364 36...,1998,2260,your friends and family what you did today. s...
7,3CF52C3ED074,Concluding Statement,402 403 404 405 406 407 408 409 410 411 412 41...,2278,2698,strongly agree with the principal decision th...


In [49]:
dfs = []
for i in range(len(tokenized_val['test'])):
    dfs.append(pred2span(predictions[i], tokenized_val['test'][i], proba_thresh, min_thresh))

pred_df = pd.concat(dfs, axis=0)
pred_df['class'] = pred_df['discourse_type']
pred_df

Unnamed: 0,id,discourse_type,predictionstring,discourse_start,discourse_end,discourse,class
0,7B5F5B33B566,Lead,0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18...,0,228,"When people ask for advice\n\n, they sometimes...",Lead
1,7B5F5B33B566,Position,43 44 45 46 47 48 49 50 51 52 53,228,294,advice from another person can help you make ...,Position
2,7B5F5B33B566,Claim,55 56 57 58 59 60 61 62 63,297,354,will make you understand things more clearly ...,Claim
3,7B5F5B33B566,Evidence,69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 8...,384,698,opinions really is a foundation to a job like...,Evidence
4,7B5F5B33B566,Claim,135 136 137 138 139 140,700,723,"In the world we live in,",Claim
...,...,...,...,...,...,...,...
3,B3E4B633261B,Claim,54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 6...,307,395,also think if kid do community service it wil...,Claim
4,B3E4B633261B,Evidence,72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 8...,400,511,will see all the people they helped all the p...,Evidence
5,B3E4B633261B,Evidence,94 95 96 97 98 99 100 101 102 103 104 105 106 ...,513,621,think community service is a great thing for ...,Evidence
6,B3E4B633261B,Rebuttal,134 135 136 137 138 139,735,764,will thank you for it latter.,Rebuttal


In [50]:
# source: https://www.kaggle.com/robikscube/student-writing-competition-twitch#Competition-Metric-Code

def calc_overlap(row):
    """
    Calculates the overlap between prediction and
    ground truth and overlap percentages used for determining
    true positives.
    """
    set_pred = set(row.predictionstring_pred.split(" "))
    set_gt = set(row.predictionstring_gt.split(" "))
    # Length of each and intersection
    len_gt = len(set_gt)
    len_pred = len(set_pred)
    inter = len(set_gt.intersection(set_pred))
    overlap_1 = inter / len_gt
    overlap_2 = inter / len_pred
    return [overlap_1, overlap_2]


def score_feedback_comp_micro(pred_df, gt_df):
    """
    Scores predictions for the Kaggle
        Student Writing Competition

    Uses the steps in the evaluation page here:
        https://www.kaggle.com/c/feedback-prize-2021/overview/evaluation
    """
    gt_df = (
        gt_df[["id", "discourse_type", "predictionstring"]]
        .reset_index(drop=True)
        .copy()
    )
    pred_df = pred_df[["id", "class", "predictionstring"]].reset_index(drop=True).copy()
    pred_df["pred_id"] = pred_df.index
    gt_df["gt_id"] = gt_df.index
    # Step 1. all ground truths and predictions for a given class are compared.
    joined = pred_df.merge(
        gt_df,
        left_on=["id", "class"],
        right_on=["id", "discourse_type"],
        how="outer",
        suffixes=("_pred", "_gt"),
    )
    joined["predictionstring_gt"] = joined["predictionstring_gt"].fillna(" ")
    joined["predictionstring_pred"] = joined["predictionstring_pred"].fillna(" ")

    joined["overlaps"] = joined.apply(calc_overlap, axis=1)

    # 2. If the overlap between the ground truth and prediction is >= 0.5,
    # and the overlap between the prediction and the ground truth >= 0.5,
    # the prediction is a match and considered a true positive.
    # If multiple matches exist, the match with the highest pair of overlaps is taken.
    joined["overlap1"] = joined["overlaps"].apply(lambda x: x[0])
    joined["overlap2"] = joined["overlaps"].apply(lambda x: x[1])

    joined["potential_TP"] = (joined["overlap1"] >= 0.5) & (joined["overlap2"] >= 0.5)
    joined["max_overlap"] = joined[["overlap1", "overlap2"]].max(axis=1)
    tp_pred_ids = (
        joined.query("potential_TP")
        .sort_values("max_overlap", ascending=False)
        .groupby(["id", "predictionstring_gt"])
        .first()["pred_id"]
        .values
    )

    # 3. Any unmatched ground truths are false negatives
    # and any unmatched predictions are false positives.
    fp_pred_ids = [p for p in joined["pred_id"].unique() if p not in tp_pred_ids]

    matched_gt_ids = joined.query("potential_TP")["gt_id"].unique()
    unmatched_gt_ids = [c for c in joined["gt_id"].unique() if c not in matched_gt_ids]

    # Get numbers of each type
    TP = len(tp_pred_ids)
    FP = len(fp_pred_ids)
    FN = len(unmatched_gt_ids)
    # calc microf1
    my_f1_score = TP / (TP + 0.5 * (FP + FN))
    return my_f1_score


def score_feedback_comp(pred_df, gt_df, return_class_scores=False):
    class_scores = {}
    pred_df = pred_df[["id", "class", "predictionstring"]].reset_index(drop=True).copy()
    for discourse_type, gt_subset in gt_df.groupby("discourse_type"):
        pred_subset = (
            pred_df.loc[pred_df["class"] == discourse_type]
            .reset_index(drop=True)
            .copy()
        )
        class_score = score_feedback_comp_micro(pred_subset, gt_subset)
        class_scores[discourse_type] = class_score
    f1 = np.mean([v for v in class_scores.values()])
    if return_class_scores:
        return f1, class_scores
    return f1
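
To make the matching rule concrete, here is a small worked example of the two overlap ratios and the F1 formula used above. It is a standalone sketch: `overlaps` is a simplified stand-in for `calc_overlap` that takes two strings instead of a DataFrame row, and the prediction strings are invented.

```python
def overlaps(pred, gt):
    """Return (share of ground truth covered, share of prediction covered)."""
    set_pred = set(pred.split(" "))
    set_gt = set(gt.split(" "))
    inter = len(set_gt & set_pred)
    return inter / len(set_gt), inter / len(set_pred)


def is_match(pred, gt):
    # a prediction counts as a potential true positive only if BOTH ratios are >= 0.5
    o1, o2 = overlaps(pred, gt)
    return o1 >= 0.5 and o2 >= 0.5


gt = "10 11 12 13 14 15"                                  # 6 words
pred_good = "12 13 14 15 16"                              # 4 shared words out of 5
pred_wide = " ".join(str(i) for i in range(10, 24))       # covers gt, but 14 words long

good = is_match(pred_good, gt)   # ratios 4/6 and 4/5
wide = is_match(pred_wide, gt)   # ratios 6/6 and 6/14

# one true positive, one false positive, no false negative
TP, FP, FN = 1, 1, 0
f1 = TP / (TP + 0.5 * (FP + FN))
print(good, wide, round(f1, 4))
```

The overly wide prediction fully covers the ground truth but is rejected because less than half of it overlaps, which is why trimming long spans matters as much as finding them.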

In [51]:
# import optuna

# def objective(trial):
        
#     proba_thresh = {
#         "Lead": trial.suggest_float('Lead', 0.01, 1),
#         "Position": trial.suggest_float('Position', 0.01, 1),
#         "Evidence": trial.suggest_float('Evidence', 0.01, 1),
#         "Claim": trial.suggest_float('Claim', 0.01, 1),
#         "Concluding Statement": trial.suggest_float('Concluding Statement', 0.01, 1),
#         "Counterclaim": trial.suggest_float('Counterclaim', 0.01, 1),
#         "Rebuttal": trial.suggest_float('Rebuttal', 0.01, 1),
#     }

# #     min_thresh = {
# #         "Lead": trial.suggest_int('Lead', 1, 15),
# #         "Position": trial.suggest_int('Position', 1, 15),
# #         "Evidence": trial.suggest_int('Evidence', 1, 15),
# #         "Claim": trial.suggest_int('Claim', 1, 15),
# #         "Concluding Statement": trial.suggest_int('Concluding Statement', 1, 15),
# #         "Counterclaim": trial.suggest_int('Counterclaim', 1, 15),
# #         "Rebuttal": trial.suggest_int('Rebuttal', 1, 15),
# #     }
#     min_thresh = {
#         "Lead": trial.suggest_int('Lead_', 1, 15),
#         "Position": trial.suggest_int('Position_', 1, 15),
#         "Evidence": trial.suggest_int('Evidence_', 1, 15),
#         "Claim": trial.suggest_int('Claim_', 1, 15),
#         "Concluding Statement": trial.suggest_int('Concluding Statement_', 1, 15),
#         "Counterclaim": trial.suggest_int('Counterclaim_', 1, 15),
#         "Rebuttal": trial.suggest_int('Rebuttal_', 1, 15),
#     }

#     dfs = []
#     for i in range(len(tokenized_val['test'])):
#         dfs.append(pred2span(predictions[i], tokenized_val['test'][i], proba_thresh, min_thresh))

#     pred_df = pd.concat(dfs, axis=0)
#     pred_df['class'] = pred_df['discourse_type']

#     score = score_feedback_comp(pred_df, gt_df, return_class_scores=True)[0]

#     return -1*score

# study = optuna.create_study(direction='minimize')
# study.optimize(objective, n_trials=100)

In [52]:
pred_df

Unnamed: 0,id,discourse_type,predictionstring,discourse_start,discourse_end,discourse,class
0,7B5F5B33B566,Lead,0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18...,0,228,"When people ask for advice\n\n, they sometimes...",Lead
1,7B5F5B33B566,Position,43 44 45 46 47 48 49 50 51 52 53,228,294,advice from another person can help you make ...,Position
2,7B5F5B33B566,Claim,55 56 57 58 59 60 61 62 63,297,354,will make you understand things more clearly ...,Claim
3,7B5F5B33B566,Evidence,69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 8...,384,698,opinions really is a foundation to a job like...,Evidence
4,7B5F5B33B566,Claim,135 136 137 138 139 140,700,723,"In the world we live in,",Claim
...,...,...,...,...,...,...,...
3,B3E4B633261B,Claim,54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 6...,307,395,also think if kid do community service it wil...,Claim
4,B3E4B633261B,Evidence,72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 8...,400,511,will see all the people they helped all the p...,Evidence
5,B3E4B633261B,Evidence,94 95 96 97 98 99 100 101 102 103 104 105 106 ...,513,621,think community service is a great thing for ...,Evidence
6,B3E4B633261B,Rebuttal,134 135 136 137 138 139,735,764,will thank you for it latter.,Rebuttal


In [53]:
# best_proba_thresh = study.best_params
# best_proba_thresh

## CV Score

In [54]:
score_feedback_comp(pred_df, gt_df, return_class_scores=True)

(0.5939719243444349,
 {'Claim': 0.5480999329951182,
  'Concluding Statement': 0.7178545187362234,
  'Counterclaim': 0.45161290322580644,
  'Evidence': 0.6878003885877901,
  'Lead': 0.7685185185185185,
  'Position': 0.6120253164556962,
  'Rebuttal': 0.37189189189189187})
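
The overall score is the unweighted mean of the seven per-class F1 scores, so rare classes such as Rebuttal weigh as much as Evidence. A quick check using the numbers printed above:

```python
class_scores = {
    'Claim': 0.5480999329951182,
    'Concluding Statement': 0.7178545187362234,
    'Counterclaim': 0.45161290322580644,
    'Evidence': 0.6878003885877901,
    'Lead': 0.7685185185185185,
    'Position': 0.6120253164556962,
    'Rebuttal': 0.37189189189189187,
}

# macro F1: plain average over classes, regardless of how many spans each class has
macro_f1 = sum(class_scores.values()) / len(class_scores)
weakest = min(class_scores, key=class_scores.get)
print(macro_f1, weakest)
```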

## End

I'll appreciate every upvote or comment!