# 1. Information about the submission

## 1.1 Name and number of the assignment 

### Text categorization: argument mining. Assignment 2.

## 1.2 Student name

### Nuzhnov Mark

## 1.3 Codalab user ID

### Nuzhnov_Mark

## 1.4 Additional comments

***Enter here** any additional comments which you would like to communicate to a TA who is going to grade this work not related to the content of your submission.*

# 2. Technical Report

## 2.1 Methodology 


BERT (Bidirectional Encoder Representations from Transformers) is a language model introduced by researchers at Google AI Language. It is built from Transformer encoder layers, whose core operation is attention. In practice, we compute the attention function on a set of queries simultaneously, packed together into a matrix $Q$. The keys and values are also packed together into matrices $K$ and $V$. We compute the matrix of outputs as:
                                                                 
$$                                                                         
   \mathrm{Attention}(Q, K, V) = \mathrm{softmax}(\frac{QK^T}{\sqrt{d_k}})V               
$$   

![alt text](https://lena-voita.github.io/resources/lectures/seq2seq/transformer/qkv_explained-min.png)
Source: [Lena Voita's Lecture about Seq2Seq](https://lena-voita.github.io/nlp_course/seq2seq_and_attention.html)
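
To make the formula concrete, here is a minimal NumPy sketch of scaled dot-product attention. The matrix sizes and the helper names `softmax` and `scaled_dot_product_attention` are illustrative assumptions and are not part of the submitted model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Hypothetical sizes: 4 queries, 6 keys/values, d_k = 8, d_v = 16
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 16))

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape, weights.shape)
```

Each output row is a weighted average of the rows of $V$, with weights given by the softmax over the scaled query-key dot products.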

Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions. With a single attention head, averaging inhibits this.                                            
$$
\begin{aligned}
\mathrm{MultiHead}(Q, K, V) &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)W^O \\
\text{where}~\mathrm{head}_i &= \mathrm{Attention}(QW^Q_i, KW^K_i, VW^V_i)
\end{aligned}
$$

Where the projections are parameter matrices $W^Q_i \in \mathbb{R}^{d_{\text{model}} \times d_k}$, $W^K_i \in \mathbb{R}^{d_{\text{model}} \times d_k}$, $W^V_i \in \mathbb{R}^{d_{\text{model}} \times d_v}$ and $W^O \in \mathbb{R}^{hd_v \times d_{\text{model}}}$.

In this work I tried $h=3$, $h=5$ and $h=8$ parallel attention layers, or heads. For each of these I use $d_k=d_v=64$. In the original Transformer, $d_k=d_v=d_{\text{model}}/h$, so the total computational cost is similar to that of single-head attention with full dimensionality. Here the head size is fixed instead, so the cost of the attention layer grows with $h$.

![alt text](https://jalammar.github.io/images/t/transformer_multi-headed_self-attention-recap.png)
Source: [The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/)
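
The multi-head formula can be sketched in NumPy as self-attention over one sequence, with $h=3$ heads of size $d_k=d_v=64$. The value $d_{\text{model}}=768$ (the usual BERT-base hidden size), the sequence length, the random weights and the function names are assumptions made for illustration only. Biases are omitted to match the formula.

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def multi_head_self_attention(X, W_q, W_k, W_v, W_o):
    # X: (seq_len, d_model)
    # W_q, W_k: (h, d_model, d_k); W_v: (h, d_model, d_v); W_o: (h * d_v, d_model)
    heads = []
    for Wq_i, Wk_i, Wv_i in zip(W_q, W_k, W_v):
        Q, K, V = X @ Wq_i, X @ Wk_i, X @ Wv_i
        weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
        heads.append(weights @ V)                 # (seq_len, d_v)
    # Concatenate the heads and project back to d_model
    return np.concatenate(heads, axis=-1) @ W_o

# Assumed sizes for illustration
seq_len, d_model, h, d_k = 5, 768, 3, 64
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))
W_q = rng.normal(scale=0.02, size=(h, d_model, d_k))
W_k = rng.normal(scale=0.02, size=(h, d_model, d_k))
W_v = rng.normal(scale=0.02, size=(h, d_model, d_k))
W_o = rng.normal(scale=0.02, size=(h * d_k, d_model))

out = multi_head_self_attention(X, W_q, W_k, W_v, W_o)
print(out.shape)
```

The output keeps the shape of the input sequence, so the layer can be stacked on top of the encoder's hidden states.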

## 2.2 Discussion of results

When I ran the baseline on Quarantine, the results were poor, so I decided not to train it further and added a multi-head attention (MHA) layer instead. First I tried 6 heads with `key_dim=128`, but the results on dev were poor, probably because of overfitting. Then I switched to 3 heads with `key_dim=64`. With this configuration I achieved f1-stance 0.5044 and f1-premise 0.4405 on the development set, and 0.5384 and 0.4948 on post-eval.

Training process details are available on [wandb](https://wandb.ai/darkdestiny/Argument%20mining?workspace=user-benchm).
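
For readers who want to score the local held-out split in the same spirit as the F1 values above, here is a small pure-Python sketch of a macro-averaged F1. Whether the Codalab scorer uses exactly this averaging is an assumption, and the labels below are made up.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted examples contributes 0
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical gold and predicted stance labels
y_true = [-1, 0, 1, 1, 0, -1]
y_pred = [-1, 0, 1, 0, 0, 1]
print(round(macro_f1(y_true, y_pred), 4))
```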

# 3. Code

*Enter here all code used to produce your results submitted to Codalab. Add some comments and subsections to navigate through your solution.*

*In this part you are expected to develop yourself a solution of the task and provide a reproducible code:*
- *Using Python 3;*
- *Contains code for installation of all dependencies;*
- *Contains code for downloading of all the datasets used*;
- *Contains the code for reproducing your results (in other words, if a tester downloads your notebook she should be able to run cell-by-cell the code and obtain your experimental results as described in the methodology section)*.


*As a result, your code will be graded according to these criteria:*
- ***Readability**: your code should be well-structured preferably with indicated parts of your approach (Preprocessing, Model training, Evaluation, etc.).*
- ***Reproducibility**: your code should be reproduced without any mistakes with “Run all” mode (obtaining experimental part).*


## 3.1 Requirements

In [1]:
! pip install transformers
! pip install gdown
! pip install wandb

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.28.1-py3-none-any.whl (7.0 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.14.1 tokenizers-0.13.3 transformers-4.28.1
Looking in indexes: https://pypi.org/simple, htt

In [2]:
!wandb login  # enter the API key at the prompt instead of hard-coding it in the notebook

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


In [3]:
from transformers import TFBertModel,  BertConfig, BertTokenizerFast
from tensorflow.keras.layers import Input, Dropout, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.initializers import TruncatedNormal
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import CategoricalAccuracy
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import MultiHeadAttention, Flatten, TimeDistributed
from tensorflow.keras.optimizers.experimental import AdamW
import pandas as pd
from sklearn.model_selection import train_test_split
import os
import numpy as np
from sklearn.metrics import classification_report
import wandb
from wandb.keras import WandbCallback
#from google.colab import drive
#drive.mount('/content/drive')

seed_value = 42
os.environ['PYTHONHASHSEED']=str(seed_value)
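
The cell above fixes only `PYTHONHASHSEED`. A fuller seeding helper could look like the sketch below. The name `set_global_seed` is hypothetical, and even with all seeds fixed some GPU operations may still be non-deterministic.

```python
import os
import random

import numpy as np

def set_global_seed(seed):
    # Mirrors the notebook; this variable only affects Python processes started afterwards
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)        # Python's built-in RNG
    np.random.seed(seed)     # NumPy's legacy global RNG
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # TensorFlow's global seed (weight init, shuffling)
    except ImportError:
        # TensorFlow is optional for this sketch
        pass

set_global_seed(42)
first = np.random.rand(3)
set_global_seed(42)
second = np.random.rand(3)
print(np.array_equal(first, second))
```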

In [4]:
# Name of the BERT model to use
model_name = 'DeepPavlov/rubert-base-cased-sentence'

# Load transformers config and set output_hidden_states to False
config = BertConfig.from_pretrained(model_name)
config.output_hidden_states = False

# Load BERT tokenizer
tokenizer = BertTokenizerFast.from_pretrained(pretrained_model_name_or_path = model_name, config=config)

# Load the Transformers BERT model
transformer_model = TFBertModel.from_pretrained(model_name, config=config, from_pt=True)

# Load the MainLayer
bert = transformer_model.layers[0]

All PyTorch model weights were used when initializing TFBertModel.

All the weights of TFBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


## 3.2 Download the data

In [11]:
!gdown 19LzgUlM3417TlG6bmo-eSY5vRAKw2ukR
!gdown 1eu97ydTza6HGyvsD3nz_mp-bFgxN9mCv

Downloading...
From: https://drive.google.com/uc?id=19LzgUlM3417TlG6bmo-eSY5vRAKw2ukR
To: /content/train_all.tsv
100% 1.54M/1.54M [00:00<00:00, 160MB/s]
Downloading...
From: https://drive.google.com/uc?id=1eu97ydTza6HGyvsD3nz_mp-bFgxN9mCv
To: /content/test-no_labels.tsv
100% 298k/298k [00:00<00:00, 130MB/s]


## 3.3 Quarantine

### 3.3.1 Preprocessing

In [None]:
CLASS_NAME = "quarantine"
# Import data from csv
whole_data = pd.read_csv('./train_all.tsv', sep='\t')

# Train_test_split
test_size = 0.2
data, data_test = train_test_split(whole_data, test_size=test_size, random_state = seed_value)


#------------------------------------------------------------------------------------#
## Train
# Select required columns
data = data[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]

# Set your model output as categorical and save in new label col
data['stance_label'] = pd.Categorical(data[f'{CLASS_NAME}_stance'])
data['argument_label'] = pd.Categorical(data[f'{CLASS_NAME}_argument'])

# Transform your output to numeric
data[f'{CLASS_NAME}_stance'] = data['stance_label'].cat.codes
data[f'{CLASS_NAME}_argument'] = data['argument_label'].cat.codes

#------------------------------------------------------------------------------------#
## Test
# Select required columns
data_test = data_test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]

# Set your model output as categorical and save in new label col
data_test['stance_label'] = pd.Categorical(data_test[f'{CLASS_NAME}_stance'])
data_test['argument_label'] = pd.Categorical(data_test[f'{CLASS_NAME}_argument'])

# Transform your output to numeric
data_test[f'{CLASS_NAME}_stance'] = data_test['stance_label'].cat.codes
data_test[f'{CLASS_NAME}_argument'] = data_test['argument_label'].cat.codes
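
The preprocessing above encodes the train and held-out splits separately. The pure-Python sketch below illustrates why sharing one label-to-code mapping matters: if a label is missing from one split, independently derived codes no longer line up. The label values and helper names are hypothetical. Sorting mirrors the way `pd.Categorical` orders its categories.

```python
def build_label_codes(train_labels):
    # Sorted unique labels, analogous to the categories of pd.Categorical
    categories = sorted(set(train_labels))
    to_code = {label: code for code, label in enumerate(categories)}
    return categories, to_code

def one_hot(codes, num_classes):
    # Plain-Python equivalent of a one-hot encoding
    return [[1.0 if code == k else 0.0 for k in range(num_classes)] for code in codes]

train_labels = [-1, 0, 2, 1, 0, -1]   # hypothetical labels
test_labels = [0, 1, 0]               # labels -1 and 2 do not occur in this split

categories, to_code = build_label_codes(train_labels)
test_codes = [to_code[label] for label in test_labels]   # reuse the train mapping
test_y = one_hot(test_codes, len(categories))
decoded = [categories[code] for code in test_codes]      # inverse mapping

print(test_codes, decoded)
```

Encoding `test_labels` on its own would assign codes 0 and 1 to the labels 0 and 1, which would not match the training codes 1 and 2.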

In [None]:
# Ready output data for the model
test_y_stance = to_categorical(data_test[f'{CLASS_NAME}_stance'])
test_y_argument = to_categorical(data_test[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
test_x = tokenizer(
    text=data_test['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
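
The tokenizer call above pads every text to `max_length=256`, while `padding=True` pads only to the longest text in the batch. The toy sketch below contrasts the two behaviours on lists of token ids. It is a simplified stand-in: the real tokenizer also handles special tokens when truncating, and the ids and function names here are made up.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids[:max_length])
    n_pad = max_length - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad   # 1 = real token, 0 = padding
    return ids + [pad_id] * n_pad, attention_mask

def pad_to_longest(batch, pad_id=0):
    # Analogue of padding=True: the target length depends on the batch content
    longest = max(len(ids) for ids in batch)
    return [pad_or_truncate(ids, longest, pad_id) for ids in batch]

batch = [[101, 7, 8, 102], [101, 9, 102]]            # hypothetical token ids
fixed = [pad_or_truncate(ids, 6) for ids in batch]   # analogue of padding='max_length'
dynamic = pad_to_longest(batch)

print([len(ids) for ids, _ in fixed], [len(ids) for ids, _ in dynamic])
```

A model whose inputs are declared with a fixed length needs the first behaviour, since the second one yields a length that varies with the data.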

### 3.3.2 Architecture

In [None]:
# Build your model
input_ids = Input(shape=(256,), name='input_ids', dtype='int32')
attention_mask = Input(shape=(256,), name='attention_mask', dtype='int32')
inputs = {'input_ids': input_ids, 'attention_mask': attention_mask}

# Load the Transformers BERT model as a layer in a Keras model
bert_model = bert(inputs)[0]

# Add multi-head attention layer
attention_output = MultiHeadAttention(num_heads=3, key_dim=64)(bert_model, bert_model)

# Flatten the output from multi-head attention layer
flatten = Flatten()(attention_output)

# Apply dropout layer
# Note: training=False keeps dropout inactive even during fit(), so this layer acts as an identity
dropout = Dropout(config.hidden_dropout_prob, name='pooled_output')
pooled_output = dropout(flatten, training=False)

# Then build your model output
# Note: the heads use ReLU while the loss below is configured with from_logits=True, so the logits are clipped at zero
stance = Dense(units=len(data.stance_label.value_counts()), activation='relu', kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='stance')(pooled_output)
argument = Dense(units=len(data.argument_label.value_counts()), activation='relu', kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='argument')(pooled_output)
outputs = {'stance': stance, 'argument': argument}

# And combine it all in a model object
model = Model(inputs=inputs, outputs=outputs, name='BERT_MultiLabel_MultiClass')

# Take a look at the model
model.summary()

Model: "BERT_MultiLabel_MultiClass"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 attention_mask (InputLayer)    [(None, 256)]        0           []                               
                                                                                                  
 input_ids (InputLayer)         [(None, 256)]        0           []                               
                                                                                                  
 bert (TFBertMainLayer)         TFBaseModelOutputWi  177853440   ['attention_mask[0][0]',         
                                thPoolingAndCrossAt               'input_ids[0][0]']              
                                tentions(last_hidde                                               
                                n_state=(None, 256,                      
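
Since the printed summary is cut off, the sizes of the layers added on top of BERT can be estimated by hand. The sketch below assumes a hidden size of 768 (the usual BERT-base value), Keras' default `use_bias=True` and `value_dim=key_dim` for `MultiHeadAttention`, and a hypothetical number of classes. The helper names are illustrative.

```python
def mha_param_count(d_model, num_heads, key_dim, value_dim=None, use_bias=True):
    """Parameters of query/key/value/output projections of a multi-head attention layer."""
    value_dim = key_dim if value_dim is None else value_dim
    bias = 1 if use_bias else 0
    query = d_model * num_heads * key_dim + bias * num_heads * key_dim
    key = d_model * num_heads * key_dim + bias * num_heads * key_dim
    value = d_model * num_heads * value_dim + bias * num_heads * value_dim
    output = num_heads * value_dim * d_model + bias * d_model
    return query + key + value + output

def dense_param_count(n_inputs, n_units):
    return n_inputs * n_units + n_units   # weights plus biases

seq_len, d_model = 256, 768               # d_model is an assumption
flat_features = seq_len * d_model         # size after Flatten()
num_classes = 3                           # hypothetical, depends on the data

print(mha_param_count(d_model, num_heads=3, key_dim=64))
print(flat_features, dense_param_count(flat_features, num_classes))
```

Under these assumptions each classification head sees 196,608 input features, so the heads hold far more weights per class than a head applied to a single pooled vector would.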

### 3.3.3 Training

In [None]:
# Set an optimizer
optimizer = AdamW(
    learning_rate=1e-6,
    epsilon=1e-08,
    weight_decay=0.01,
    clipnorm=1.0)

# Set loss and metrics
loss = {'stance': CategoricalCrossentropy(from_logits = True), 'argument': CategoricalCrossentropy(from_logits = True)}
metric = {'stance': CategoricalAccuracy('accuracy'), 'argument': CategoricalAccuracy('accuracy')}

# Compile the model
model.compile(
    optimizer = optimizer,
    loss = loss, 
    metrics = metric)

# Ready output data for the model
y_stance = to_categorical(data[f'{CLASS_NAME}_stance'])
y_argument = to_categorical(data[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
x = tokenizer(
    text=data['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length',  # pad to the fixed Input shape (256,) regardless of the longest text
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

wandb.init(project="Argument mining", name = "Bert_attention_quarantine", tags = ["Ruberta_MHA", "RB"])
epochs = 8
# Fit the model
history = model.fit(
    x={'input_ids': x['input_ids'], 'attention_mask': x['attention_mask']},
    y={'stance': y_stance, 'argument': y_argument},
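    # Note: only the first 8 held-out examples are used for validation, so the val_* metrics are very noisy
    validation_data=({'input_ids': test_x['input_ids'][:8], 'attention_mask': test_x['attention_mask'][:8]}, 
                     {'stance': test_y_stance[:8], 'argument': test_y_argument[:8]}),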
    batch_size=8,
    epochs=epochs, callbacks=[WandbCallback()])
wandb.finish()

Epoch 1/8

wandb: Adding directory to artifact (/content/wandb/run-20230425_134736-xmwd6zns/files/model-best)... Done. 17.3s

Epoch 2/8
Epoch 3/8

wandb: Adding directory to artifact (/content/wandb/run-20230425_134736-xmwd6zns/files/model-best)... Done. 18.1s

Epoch 4/8

wandb: Adding directory to artifact (/content/wandb/run-20230425_134736-xmwd6zns/files/model-best)... Done. 17.7s

Epoch 5/8

wandb: Adding directory to artifact (/content/wandb/run-20230425_134736-xmwd6zns/files/model-best)... Done. 17.8s

Epoch 6/8
Epoch 7/8

wandb: Adding directory to artifact (/content/wandb/run-20230425_134736-xmwd6zns/files/model-best)... Done. 18.2s

Epoch 8/8

Run history:

| Metric | Trend over epochs |
|---|---|
| argument_accuracy | ▁▇▇▇▇███ |
| argument_loss | █▃▂▂▂▂▁▁ |
| epoch | ▁▂▃▄▅▆▇█ |
| loss | █▃▃▂▂▂▁▁ |
| stance_accuracy | ▁▅▆▇▇▇██ |
| stance_loss | █▄▃▂▂▂▁▁ |
| val_argument_accuracy | ▁▁▁▁▁▁▁▁ |
| val_argument_loss | ▂▄▁▂▁█▂▃ |
| val_loss | ▅▇▃▃▂█▁▂ |
| val_stance_accuracy | ▁▁▁▁▁▁▁▁ |

Run summary:

| Metric | Value |
|---|---|
| argument_accuracy | 0.95701 |
| argument_loss | 0.13219 |
| best_epoch | 6.0 |
| best_val_loss | 0.05055 |
| epoch | 7.0 |
| loss | 0.31192 |
| stance_accuracy | 0.93039 |
| stance_loss | 0.17973 |
| val_argument_accuracy | 1.0 |
| val_argument_loss | 0.03254 |
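
The loss in this section is built with `from_logits=True`, while the output heads in the architecture use `activation='relu'`. The NumPy sketch below shows what cross-entropy on logits computes and how clipping the logits at zero can change it. The logits and targets are made up, and the function returns per-example losses rather than the batch mean that Keras reports.

```python
import numpy as np

def cross_entropy_from_logits(logits, one_hot_targets):
    # Numerically stable log-softmax followed by the negative log-likelihood
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(one_hot_targets * log_probs).sum(axis=-1)

# Hypothetical logits for two examples and three classes
logits = np.array([[2.0, -1.0, 0.5],
                   [-3.0, -2.0, -1.0]])
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])

raw_loss = cross_entropy_from_logits(logits, targets)
relu_loss = cross_entropy_from_logits(np.maximum(logits, 0.0), targets)
print(raw_loss, relu_loss)
```

In the second example all logits are negative, so after ReLU they become equal and the prediction collapses to a uniform distribution over the classes.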


### 3.3.4 Prediction

In [None]:
test = pd.read_csv("./val_empty.tsv", sep='\t')
test.head()
test_d = test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]
for_pred = tokenizer(
    text=test_d['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
test_results = model.predict(x={'input_ids': for_pred['input_ids'], 'attention_mask': for_pred['attention_mask']})



In [None]:
# Work on an explicit copy so the column assignments below do not raise pandas' SettingWithCopyWarning
test_d = test_d.copy()
# Shift the predicted class index by -1 to map it back to a label value (assumes the labels are consecutive integers starting at -1)
test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1) - 1
test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1) - 1
test_d[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].to_csv(f"./val_predict_MHA_{CLASS_NAME}.tsv", sep='\t', index=None)


In [None]:
test = pd.read_csv("./test-no_labels.tsv", sep='\t')
test.head()
test_d = test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]
for_pred = tokenizer(
    text=test_d['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
test_results = model.predict(x={'input_ids': for_pred['input_ids'], 'attention_mask': for_pred['attention_mask']})



In [None]:
# Work on an explicit copy so the column assignments below do not raise pandas' SettingWithCopyWarning
test_d = test_d.copy()
# Shift the predicted class index by -1 to map it back to a label value (assumes the labels are consecutive integers starting at -1)
test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1) - 1
test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1) - 1
test_d[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].to_csv(f"./test_predict_MHA_{CLASS_NAME}.tsv", sep='\t', index=None)


## 3.4 Masks

### 3.4.1 Preprocessing

In [6]:
CLASS_NAME = "masks"
# Import data from csv
whole_data = pd.read_csv('./train_all.tsv', sep='\t')

# Train_test_split
test_size = 0.2
data, data_test = train_test_split(whole_data, test_size=test_size, random_state = seed_value)


#------------------------------------------------------------------------------------#
## Train
# Select required columns
data = data[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]

# Set your model output as categorical and save in new label col
data['stance_label'] = pd.Categorical(data[f'{CLASS_NAME}_stance'])
data['argument_label'] = pd.Categorical(data[f'{CLASS_NAME}_argument'])

# Transform your output to numeric
data[f'{CLASS_NAME}_stance'] = data['stance_label'].cat.codes
data[f'{CLASS_NAME}_argument'] = data['argument_label'].cat.codes

#------------------------------------------------------------------------------------#
## Test
# Select required columns
data_test = data_test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]

# Set your model output as categorical and save in new label col
data_test['stance_label'] = pd.Categorical(data_test[f'{CLASS_NAME}_stance'])
data_test['argument_label'] = pd.Categorical(data_test[f'{CLASS_NAME}_argument'])

# Transform your output to numeric
data_test[f'{CLASS_NAME}_stance'] = data_test['stance_label'].cat.codes
data_test[f'{CLASS_NAME}_argument'] = data_test['argument_label'].cat.codes

In [7]:
# Ready output data for the model
test_y_stance = to_categorical(data_test[f'{CLASS_NAME}_stance'])
test_y_argument = to_categorical(data_test[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
test_x = tokenizer(
    text=data_test['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

### 3.4.2 Architecture

In [8]:
# Build your model
input_ids = Input(shape=(256,), name='input_ids', dtype='int32')
attention_mask = Input(shape=(256,), name='attention_mask', dtype='int32')
inputs = {'input_ids': input_ids, 'attention_mask': attention_mask}

# Load the Transformers BERT model as a layer in a Keras model
bert_model = bert(inputs)[0]

# Add multi-head attention layer
attention_output = MultiHeadAttention(num_heads=3, key_dim=64)(bert_model, bert_model)

# Flatten the output from multi-head attention layer
flatten = Flatten()(attention_output)

# Apply dropout layer
# Note: training=False keeps dropout inactive even during fit(), so this layer acts as an identity
dropout = Dropout(config.hidden_dropout_prob, name='pooled_output')
pooled_output = dropout(flatten, training=False)

# Then build your model output
# Note: the heads use ReLU while the loss below is configured with from_logits=True, so the logits are clipped at zero
stance = Dense(units=len(data.stance_label.value_counts()), activation='relu', kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='stance')(pooled_output)
argument = Dense(units=len(data.argument_label.value_counts()), activation='relu', kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='argument')(pooled_output)
outputs = {'stance': stance, 'argument': argument}

# And combine it all in a model object
model = Model(inputs=inputs, outputs=outputs, name='BERT_MultiLabel_MultiClass')

# Take a look at the model
model.summary()

Model: "BERT_MultiLabel_MultiClass"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 attention_mask (InputLayer)    [(None, 256)]        0           []                               
                                                                                                  
 input_ids (InputLayer)         [(None, 256)]        0           []                               
                                                                                                  
 bert (TFBertMainLayer)         TFBaseModelOutputWi  177853440   ['attention_mask[0][0]',         
                                thPoolingAndCrossAt               'input_ids[0][0]']              
                                tentions(last_hidde                                               
                                n_state=(None, 256,                      

### 3.4.3 Training

In [9]:
# Set an optimizer
optimizer = AdamW(
    learning_rate=1e-6,
    epsilon=1e-08,
    weight_decay=0.01,
    clipnorm=1.0)

# Set loss and metrics
loss = {'stance': CategoricalCrossentropy(from_logits = True), 'argument': CategoricalCrossentropy(from_logits = True)}
metric = {'stance': CategoricalAccuracy('accuracy'), 'argument': CategoricalAccuracy('accuracy')}

# Compile the model
model.compile(
    optimizer = optimizer,
    loss = loss, 
    metrics = metric)

# Ready output data for the model
y_stance = to_categorical(data[f'{CLASS_NAME}_stance'])
y_argument = to_categorical(data[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
x = tokenizer(
    text=data['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length',  # pad to the fixed Input shape (256,) regardless of the longest text
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

wandb.init(project="Argument mining", name = "Bert_attention_masks", tags = ["Ruberta_MHA", "RB"])
epochs = 8
# Fit the model
history = model.fit(
    x={'input_ids': x['input_ids'], 'attention_mask': x['attention_mask']},
    y={'stance': y_stance, 'argument': y_argument},
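    # Note: only the first 8 held-out examples are used for validation, so the val_* metrics are very noisy
    validation_data=({'input_ids': test_x['input_ids'][:8], 'attention_mask': test_x['attention_mask'][:8]}, 
                     {'stance': test_y_stance[:8], 'argument': test_y_argument[:8]}),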
    batch_size=8,
    epochs=epochs, callbacks=[WandbCallback()])
wandb.finish()

wandb: Currently logged in as: benchm (darkdestiny). Use `wandb login --relogin` to force relogin

Epoch 1/8

wandb: Adding directory to artifact (/content/wandb/run-20230425_175705-g478norl/files/model-best)... Done. 12.3s

Epoch 2/8

wandb: Adding directory to artifact (/content/wandb/run-20230425_175705-g478norl/files/model-best)... Done. 12.1s

Epoch 3/8

wandb: Adding directory to artifact (/content/wandb/run-20230425_175705-g478norl/files/model-best)... Done. 12.7s

Epoch 4/8

wandb: Adding directory to artifact (/content/wandb/run-20230425_175705-g478norl/files/model-best)... Done. 12.1s

Epoch 5/8

wandb: Adding directory to artifact (/content/wandb/run-20230425_175705-g478norl/files/model-best)... Done. 12.4s

Epoch 6/8

wandb: Adding directory to artifact (/content/wandb/run-20230425_175705-g478norl/files/model-best)... Done. 12.1s

Epoch 7/8

wandb: Adding directory to artifact (/content/wandb/run-20230425_175705-g478norl/files/model-best)... Done. 12.2s

Epoch 8/8

wandb: Adding directory to artifact (/content/wandb/run-20230425_175705-g478norl/files/model-best)... Done. 12.4s




Run history:

| Metric | Trend over epochs |
|---|---|
| argument_accuracy | ▁▆▇▇▇███ |
| argument_loss | █▃▂▂▂▁▁▁ |
| epoch | ▁▂▃▄▅▆▇█ |
| loss | █▄▃▃▂▂▁▁ |
| stance_accuracy | ▁▅▅▆▆▇▇█ |
| stance_loss | █▄▄▃▂▂▁▁ |
| val_argument_accuracy | ██▁▁█▁▁█ |
| val_argument_loss | █▇▅▅▂▂▂▁ |
| val_loss | █▆▅▄▂▂▂▁ |
| val_stance_accuracy | ▁▅▆▆▆█▆▆ |

Run summary:

| Metric | Value |
|---|---|
| argument_accuracy | 0.92053 |
| argument_loss | 0.22411 |
| best_epoch | 7.0 |
| best_val_loss | 0.94498 |
| epoch | 7.0 |
| loss | 0.54757 |
| stance_accuracy | 0.87288 |
| stance_loss | 0.32346 |
| val_argument_accuracy | 0.625 |
| val_argument_loss | 0.69148 |


### 3.4.4 Prediction

In [12]:
CLASS_NAME = "masks"
test = pd.read_csv("./val_empty.tsv", sep='\t')
test.head()
test_d = test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]
for_pred = tokenizer(
    text=test_d['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
test_results = model.predict(x={'input_ids': for_pred['input_ids'], 'attention_mask': for_pred['attention_mask']})
# Work on an explicit copy so the column assignments below do not raise pandas' SettingWithCopyWarning
test_d = test_d.copy()
# Shift the predicted class index by -1 to map it back to a label value (assumes the labels are consecutive integers starting at -1)
test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1) - 1
test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1) - 1
test_d[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].to_csv(f"./val_predict_MHA_{CLASS_NAME}.tsv", sep='\t', index=None)


In [13]:
CLASS_NAME = "masks"
test = pd.read_csv("./test-no_labels.tsv", sep='\t')
test.head()
test_d = test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]
for_pred = tokenizer(
    text=test_d['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
test_results = model.predict(x={'input_ids': for_pred['input_ids'], 'attention_mask': for_pred['attention_mask']})
# Work on an explicit copy so the column assignments below do not raise pandas' SettingWithCopyWarning
test_d = test_d.copy()
# Shift the predicted class index by -1 to map it back to a label value (assumes the labels are consecutive integers starting at -1)
test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1) - 1
test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1) - 1
test_d[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].to_csv(f"./test_predict_MHA_{CLASS_NAME}.tsv", sep='\t', index=None)


## 3.5 Vaccines

### 3.5.1 Preprocessing

In [14]:
CLASS_NAME = "vaccines"
# Import data from csv
whole_data = pd.read_csv('./train_all.tsv', sep='\t')

# Train_test_split
test_size = 0.2
data, data_test = train_test_split(whole_data, test_size=test_size, random_state = seed_value)


#------------------------------------------------------------------------------------#
## Train
# Select required columns
data = data[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]

# Set your model output as categorical and save in new label col
data['stance_label'] = pd.Categorical(data[f'{CLASS_NAME}_stance'])
data['argument_label'] = pd.Categorical(data[f'{CLASS_NAME}_argument'])

# Transform your output to numeric
data[f'{CLASS_NAME}_stance'] = data['stance_label'].cat.codes
data[f'{CLASS_NAME}_argument'] = data['argument_label'].cat.codes

#------------------------------------------------------------------------------------#
## Test
# Select required columns
data_test = data_test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]

# Set your model output as categorical and save in new label col
data_test['stance_label'] = pd.Categorical(data_test[f'{CLASS_NAME}_stance'])
data_test['argument_label'] = pd.Categorical(data_test[f'{CLASS_NAME}_argument'])

# Transform your output to numeric
data_test[f'{CLASS_NAME}_stance'] = data_test['stance_label'].cat.codes
data_test[f'{CLASS_NAME}_argument'] = data_test['argument_label'].cat.codes
# Ready output data for the model
test_y_stance = to_categorical(data_test[f'{CLASS_NAME}_stance'])
test_y_argument = to_categorical(data_test[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
test_x = tokenizer(
    text=data_test['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

### 3.5.2 Architecture

In [15]:
# Build your model
input_ids = Input(shape=(256,), name='input_ids', dtype='int32')
attention_mask = Input(shape=(256,), name='attention_mask', dtype='int32')
inputs = {'input_ids': input_ids, 'attention_mask': attention_mask}

# Load the Transformers BERT model as a layer in a Keras model
bert_model = bert(inputs)[0]

# Add multi-head attention layer
attention_output = MultiHeadAttention(num_heads=3, key_dim=64)(bert_model, bert_model)

# Flatten the output from multi-head attention layer
flatten = Flatten()(attention_output)

# Apply dropout layer
# Note: training=False keeps dropout inactive even during fit(), so this layer acts as an identity
dropout = Dropout(config.hidden_dropout_prob, name='pooled_output')
pooled_output = dropout(flatten, training=False)

# Then build your model output
# Note: the heads use ReLU while the loss below is configured with from_logits=True, so the logits are clipped at zero
stance = Dense(units=len(data.stance_label.value_counts()), activation='relu', kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='stance')(pooled_output)
argument = Dense(units=len(data.argument_label.value_counts()), activation='relu', kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='argument')(pooled_output)
outputs = {'stance': stance, 'argument': argument}

# And combine it all in a model object
model = Model(inputs=inputs, outputs=outputs, name='BERT_MultiLabel_MultiClass')

# Take a look at the model
model.summary()

Model: "BERT_MultiLabel_MultiClass"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 attention_mask (InputLayer)    [(None, 256)]        0           []                               
                                                                                                  
 input_ids (InputLayer)         [(None, 256)]        0           []                               
                                                                                                  
 bert (TFBertMainLayer)         TFBaseModelOutputWi  177853440   ['attention_mask[0][0]',         
                                thPoolingAndCrossAt               'input_ids[0][0]']              
                                tentions(last_hidde                                               
                                n_state=(None, 256,                      
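
Flattening the attention output before the dense heads makes each head's weight matrix scale with sequence length times hidden size. The arithmetic can be sketched as below. The hidden size of 768 and the class count of 3 are assumptions for illustration (the summary above is cut off before those shapes appear), and `dense_params` is a hypothetical helper.

```python
def dense_params(input_features: int, units: int) -> int:
    """Parameter count of a fully connected layer: weights plus biases."""
    return input_features * units + units


SEQ_LEN = 256        # max_length used by the tokenizer in this notebook
HIDDEN_SIZE = 768    # assumption: BERT-base hidden size
NUM_CLASSES = 3      # assumption: hypothetical number of labels per task

flattened = SEQ_LEN * HIDDEN_SIZE                # features after Flatten()
per_head = dense_params(flattened, NUM_CLASSES)  # parameters of one output head

print(flattened)  # → 196608
print(per_head)   # → 589827
```

Keras `MultiHeadAttention` projects its output back to the query's last dimension by default, so under these assumptions the flattened width does not depend on `num_heads` or `key_dim`.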

### 3.5.3 Training

In [16]:
# Set an optimizer
optimizer = AdamW(
    learning_rate=1e-6,
    epsilon=1e-08,
    weight_decay=0.01,
    clipnorm=1.0)

# Set loss and metrics
loss = {'stance': CategoricalCrossentropy(from_logits = True), 'argument': CategoricalCrossentropy(from_logits = True)}
metric = {'stance': CategoricalAccuracy('accuracy'), 'argument': CategoricalAccuracy('accuracy')}

# Compile the model
model.compile(
    optimizer = optimizer,
    loss = loss, 
    metrics = metric)

# Ready output data for the model
y_stance = to_categorical(data[f'{CLASS_NAME}_stance'])
y_argument = to_categorical(data[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
x = tokenizer(
    text=data['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length',  # pad to 256 tokens to match the fixed Input shape
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

wandb.init(project="Argument mining", name = "Bert_attention_vaccines", tags = ["Ruberta_with_MHA", "RB"])
epochs = 8
# Fit the model; validation runs on only the first 8 test examples
history = model.fit(
    x={'input_ids': x['input_ids'], 'attention_mask': x['attention_mask']},
    y={'stance': y_stance, 'argument': y_argument},
    validation_data=({'input_ids': test_x['input_ids'][:8], 'attention_mask': test_x['attention_mask'][:8]}, 
                     {'stance': test_y_stance[:8], 'argument': test_y_argument[:8]}),
    batch_size=8,
    epochs=epochs, callbacks=[WandbCallback()])
wandb.finish()

Epoch 1/8
wandb: Adding directory to artifact (/content/wandb/run-20230425_185602-r8qrp22p/files/model-best)... Done. 14.3s
Epoch 2/8
wandb: Adding directory to artifact (/content/wandb/run-20230425_185602-r8qrp22p/files/model-best)... Done. 14.2s
Epoch 3/8
Epoch 4/8
wandb: Adding directory to artifact (/content/wandb/run-20230425_185602-r8qrp22p/files/model-best)... Done. 16.7s
Epoch 5/8
wandb: Adding directory to artifact (/content/wandb/run-20230425_185602-r8qrp22p/files/model-best)... Done. 16.9s
Epoch 6/8
wandb: Adding directory to artifact (/content/wandb/run-20230425_185602-r8qrp22p/files/model-best)... Done. 16.7s
Epoch 7/8
Epoch 8/8
wandb: Adding directory to artifact (/content/wandb/run-20230425_185602-r8qrp22p/files/model-best)... Done. 21.4s

| Metric | History over epochs |
|---|---|
| argument_accuracy | ▁▅▆▆▆▇██ |
| argument_loss | █▃▃▂▂▂▁▁ |
| epoch | ▁▂▃▄▅▆▇█ |
| loss | █▃▃▂▂▂▁▁ |
| stance_accuracy | ▁▄▅▅▆▇▇█ |
| stance_loss | █▃▃▂▂▂▁▁ |
| val_argument_accuracy | ▁▁▁▁▁▁▁▁ |
| val_argument_loss | ▁▂▁▂▃▅█▁ |
| val_loss | █▅▆▃▂▂▃▁ |
| val_stance_accuracy | ▁▁▁█████ |

| Metric | Value |
|---|---|
| argument_accuracy | 0.95552 |
| argument_loss | 0.12224 |
| best_epoch | 7.0 |
| best_val_loss | 0.04273 |
| epoch | 7.0 |
| loss | 0.30313 |
| stance_accuracy | 0.9289 |
| stance_loss | 0.18089 |
| val_argument_accuracy | 1.0 |
| val_argument_loss | 0.01768 |
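
The losses in the training cell are configured with `from_logits=True`, which means the loss applies softmax to the raw output scores itself. The stdlib-only sketch below computes that loss for one example and shows what a `relu` on the output layer does to it. The function names and the sample scores are illustrative assumptions, and this is not the Keras implementation.

```python
import math


def softmax_cross_entropy(logits, one_hot_target):
    """Cross-entropy of softmax(logits) against a one-hot target."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return -sum(t * (z - log_sum) for z, t in zip(logits, one_hot_target))


def relu(values):
    """Clip negative values to zero."""
    return [max(0.0, v) for v in values]


target = [0.0, 1.0, 0.0]
raw = [-4.0, 3.0, -4.0]  # hypothetical confident scores for class 1

loss_raw = softmax_cross_entropy(raw, target)
loss_relu = softmax_cross_entropy(relu(raw), target)
print(loss_raw < loss_relu)  # → True
```

With `relu`, the scores of the wrong classes cannot drop below zero, so the loss on this example stays higher than it would with unclipped scores.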


In [17]:
CLASS_NAME = "vaccines"
test = pd.read_csv("./val_empty.tsv", sep='\t')
# Copy the selected columns so the assignments below write to an independent frame
# (this avoids pandas' SettingWithCopyWarning)
test_d = test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].copy()
for_pred = tokenizer(
    text=test_d['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
test_results = model.predict(x={'input_ids': for_pred['input_ids'], 'attention_mask': for_pred['attention_mask']})
# argmax gives the category code; subtracting 1 maps it back to the label value
# (assumes the labels are consecutive integers starting at -1)
test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1) - 1
test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1) - 1
test_d[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].to_csv(f"./val_predict_MHA_{CLASS_NAME}.tsv", sep='\t', index=None)


In [18]:
CLASS_NAME = "vaccines"
test = pd.read_csv("./test-no_labels.tsv", sep='\t')
# Copy the selected columns so the assignments below write to an independent frame
# (this avoids pandas' SettingWithCopyWarning)
test_d = test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].copy()
for_pred = tokenizer(
    text=test_d['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
test_results = model.predict(x={'input_ids': for_pred['input_ids'], 'attention_mask': for_pred['attention_mask']})
# argmax gives the category code; subtracting 1 maps it back to the label value
# (assumes the labels are consecutive integers starting at -1)
test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1) - 1
test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1) - 1
test_d[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].to_csv(f"./test_predict_MHA_{CLASS_NAME}.tsv", sep='\t', index=None)
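
The prediction cells turn the model's per-class scores back into label values with `argmax(axis=-1) - 1`. A minimal sketch of that decoding follows, assuming the original labels are consecutive integers starting at -1, which the `- 1` offset implies. `decode` and the sample scores are hypothetical.

```python
def decode(scores, offset=-1):
    """Pick the highest-scoring class per row and shift the index to a label value."""
    labels = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        labels.append(best + offset)
    return labels


# Hypothetical output scores for three texts and four classes.
sample_scores = [
    [2.1, 0.3, 0.0, 0.0],
    [0.0, 0.0, 1.7, 0.2],
    [0.0, 0.4, 0.1, 3.0],
]
print(decode(sample_scores))  # → [-1, 1, 2]
```

The offset is only valid while the category codes line up with the sorted label values, so it needs to change if the label set does.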


## 3.6 Concatenate all result files for masks, vaccines, and quarantine

In [19]:
CLASS_NAME = "quarantine"
df1 = pd.read_csv(f"./val_predict_MHA_{CLASS_NAME}.tsv", sep='\t')
CLASS_NAME = "masks"
df2 = pd.read_csv(f"./val_predict_MHA_{CLASS_NAME}.tsv", sep='\t')
CLASS_NAME = "vaccines"
df3 = pd.read_csv(f"./val_predict_MHA_{CLASS_NAME}.tsv", sep='\t')


result = pd.merge(df1, df2, on="text")
result = pd.merge(result, df3, on="text")
result.to_csv("./val_predict_concat_MHA.tsv", sep='\t', index=None)

In [20]:
CLASS_NAME = "quarantine"
df1 = pd.read_csv(f"./test_predict_MHA_{CLASS_NAME}.tsv", sep='\t')
CLASS_NAME = "masks"
df2 = pd.read_csv(f"./test_predict_MHA_{CLASS_NAME}.tsv", sep='\t')
CLASS_NAME = "vaccines"
df3 = pd.read_csv(f"./test_predict_MHA_{CLASS_NAME}.tsv", sep='\t')


result = pd.merge(df1, df2, on="text")
result = pd.merge(result, df3, on="text")
result.to_csv("./test_predict_concat_MHA.tsv", sep='\t', index=None)
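
`pd.merge(..., on="text")` performs an inner join by default, so a text that appears more than once in the per-topic files is paired with every matching row on the other side. The stdlib-only sketch below emulates that join on lists of dicts to show the effect. `inner_join`, the column names, and the rows are illustrative assumptions, not the notebook's data.

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Inner join two lists of dicts on `key`, pairing every matching row."""
    by_key = defaultdict(list)
    for row in right:
        by_key[row[key]].append(row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in by_key.get(left_row[key], [])
    ]


masks = [
    {"text": "a", "masks_stance": 1},
    {"text": "b", "masks_stance": 0},
    {"text": "b", "masks_stance": 0},     # duplicated text
]
vaccines = [
    {"text": "a", "vaccines_stance": -1},
    {"text": "b", "vaccines_stance": 2},
    {"text": "b", "vaccines_stance": 2},  # duplicated text
]

merged = inner_join(masks, vaccines, "text")
print(len(merged))  # → 5 (1 row for "a", 2 x 2 rows for "b")
```

If the submission must contain exactly one row per input text, comparing `len(result)` with the number of rows in the input file is a cheap check. pandas can also raise on duplicate keys when `validate="one_to_one"` is passed to `merge`.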

In [21]:
!zip val_predict_concat.zip val_predict_concat_MHA.tsv

  adding: val_predict_concat_MHA.tsv (deflated 73%)


In [22]:
!zip test_predict_concat.zip test_predict_concat_MHA.tsv

  adding: test_predict_concat_MHA.tsv (deflated 73%)
