# 1. Information about the submission

## 1.1 Name and number of the assignment 

### Text categorization and argument mining task. HW2

## 1.2 Student name

### Nuzhnov Mark

## 1.3 Codalab user ID

### Nuzhnov_Mark

## 1.4 Additional comments

***Enter here** any additional comments which you would like to communicate to a TA who is going to grade this work not related to the content of your submission.*

# 2. Technical Report

*Use Section 2 to describe results of your experiments as you would do writing a paper about your results. DO NOT insert code in this part. Only insert plots and tables summarizing results as needed. Use formulas if needed do described your methodology. The code is provided in Section 3.*

## 2.1 Methodology 


BERT (Bidirectional Encoder Representations from Transformers) is a recent paper published by researchers at Google AI Language. In practice, we compute the attention function on a set of queries simultaneously, packed together into a matrix $Q$.   The keys and values are also packed together into matrices $K$ and $V$.  We compute the matrix of outputs as:                      
                                                                 
$$                                                                         
   \mathrm{Attention}(Q, K, V) = \mathrm{softmax}(\frac{QK^T}{\sqrt{d_k}})V               
$$   

![alt text](https://lena-voita.github.io/resources/lectures/seq2seq/transformer/qkv_explained-min.png)
Source: [Lena Voita's Lecture about Seq2Seq](https://lena-voita.github.io/nlp_course/seq2seq_and_attention.html)

Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions. With a single attention head, averaging inhibits this.                                            
$$    
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head_1}, ..., \mathrm{head_h})W^O    \\                                           
    \text{where}~\mathrm{head_i} = \mathrm{Attention}(QW^Q_i, KW^K_i, VW^V_i)                                
$$                                                                                                                 

Where the projections are parameter matrices $W^Q_i \in \mathbb{R}^{d_{\text{model}} \times d_k}$, $W^K_i \in \mathbb{R}^{d_{\text{model}} \times d_k}$, $W^V_i \in \mathbb{R}^{d_{\text{model}} \times d_v}$ and $W^O \in \mathbb{R}^{hd_v \times d_{\text{model}}}$.                                                                                                                                                                                             In this work I tried employ $h=3$, $h=5$, $h=8$ parallel attention layers, or heads. For each of these I use $d_k=d_v=d_{\text{model}}/h=64$. Due to the reduced dimension of each head, the total computational cost is similar to that of single-head attention with full dimensionality. 

![alt text](https://jalammar.github.io/images/t/transformer_multi-headed_self-attention-recap.png)
Source: [The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/)

## 2.2 Discussion of results

Discussing the results I would say that Baseline model (without multihead attention layer) gives a **0.3924 - F1_Stance_Detection** and **0.4517 - F1_Premise_Classification**. By adding MHA layer I was able to increased the score to **0.5713 - F1_Stance_Detection** and **0.6179 - F1_Premise_Classification**. Regarding to loss values we can see that in my case **val_argument_accuracy** was always higher than **val_stance_accuracy**, and loss was controversial. All graphs and losses can be found by the link to wandb below.

https://wandb.ai/smolenkovaea00/Text_categorization?workspace=user-smolenkovaea00

# 3. Code

*Enter here all code used to produce your results submitted to Codalab. Add some comments and subsections to navigate though your solution.*

*In this part you are expected to develop yourself a solution of the task and provide a reproducible code:*
- *Using Python 3;*
- *Contains code for installation of all dependencies;*
- *Contains code for downloading of all the datasets used*;
- *Contains the code for reproducing your results (in other words, if a tester downloads your notebook she should be able to run cell-by-cell the code and obtain your experimental results as described in the methodology section)*.


*As a result, you code will be graded according to these criteria:*
- ***Readability**: your code should be well-structured preferably with indicated parts of your approach (Preprocessing, Model training, Evaluation, etc.).*
- ***Reproducibility**: your code should be reproduced without any mistakes with “Run all” mode (obtaining experimental part).*


## 3.1 Requirements

In [None]:
! pip install transformers
! pip install gdown
! pip install wandb

[0mCollecting gdown
  Downloading gdown-4.7.1-py3-none-any.whl (15 kB)
Installing collected packages: gdown
Successfully installed gdown-4.7.1
[0m

In [None]:
from transformers import TFBertModel,  BertConfig, BertTokenizerFast
from tensorflow.keras.layers import Input, Dropout, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.initializers import TruncatedNormal
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import CategoricalAccuracy
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import MultiHeadAttention, Flatten, TimeDistributed
from tensorflow.keras.optimizers.experimental import AdamW
import pandas as pd
from sklearn.model_selection import train_test_split
import os
import numpy as np
from sklearn.metrics import classification_report
import wandb
from wandb.keras import WandbCallback
#from google.colab import drive
#drive.mount('/content/drive')

seed_value = 42
os.environ['PYTHONHASHSEED']=str(seed_value)

In [None]:
# Name of the BERT model to use
model_name = 'DeepPavlov/rubert-base-cased-sentence'

# Load transformers config and set output_hidden_states to False
config = BertConfig.from_pretrained(model_name)
config.output_hidden_states = False

# Load BERT tokenizer
tokenizer = BertTokenizerFast.from_pretrained(pretrained_model_name_or_path = model_name, config=config)

# Load the Transformers BERT model
transformer_model = TFBertModel.from_pretrained(model_name, config=config, from_pt=True)

# Load the MainLayer
bert = transformer_model.layers[0]

Downloading (…)lve/main/config.json:   0%|          | 0.00/642 [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/24.0 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/1.65M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/711M [00:00<?, ?B/s]

All PyTorch model weights were used when initializing TFBertModel.

All the weights of TFBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


## 3.2 Download the data

In [None]:
!gdown 19LzgUlM3417TlG6bmo-eSY5vRAKw2ukR

Downloading...
From: https://drive.google.com/uc?id=19LzgUlM3417TlG6bmo-eSY5vRAKw2ukR
To: /kaggle/working/train_all.tsv
100%|███████████████████████████████████████| 1.54M/1.54M [00:00<00:00, 133MB/s]


## 3.3 Quarantine

### 3.3.1 Preprocessing

In [None]:
CLASS_NAME = "quarantine"
# Import data from csv
whole_data = pd.read_csv('/kaggle/working/train_all.tsv', sep='\t')

# Train_test_split
test_size = 0.4
data, data_test = train_test_split(whole_data, test_size=test_size, random_state = seed_value)


#------------------------------------------------------------------------------------#
## Train
# Select required columns
data = data[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]

# Set your model output as categorical and save in new label col
data['stance_label'] = pd.Categorical(data[f'{CLASS_NAME}_stance'])
data['argument_label'] = pd.Categorical(data[f'{CLASS_NAME}_argument'])

# Transform your output to numeric
data[f'{CLASS_NAME}_stance'] = data['stance_label'].cat.codes
data[f'{CLASS_NAME}_argument'] = data['argument_label'].cat.codes

#------------------------------------------------------------------------------------#
## Test
# Select required columns
data_test = data_test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]

# Set your model output as categorical and save in new label col
data_test['stance_label'] = pd.Categorical(data_test[f'{CLASS_NAME}_stance'])
data_test['argument_label'] = pd.Categorical(data_test[f'{CLASS_NAME}_argument'])

# Transform your output to numeric
data_test[f'{CLASS_NAME}_stance'] = data_test['stance_label'].cat.codes
data_test[f'{CLASS_NAME}_argument'] = data_test['argument_label'].cat.codes

In [None]:
#wandb.init(project="Text_categorization", name = "baseline_run", tags = ["Ruberta", "RB"])
# Ready output data for the model
test_y_stance = to_categorical(data_test[f'{CLASS_NAME}_stance'])
test_y_argument = to_categorical(data_test[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
test_x = tokenizer(
    text=data_test['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

### 3.3.2 Training Process

In [None]:
# Build your model input
input_ids = Input(shape=(256,), name='input_ids', dtype='int32')
inputs = {'input_ids': input_ids}

# Load the Transformers BERT model as a layer in a Keras model
bert_model = bert(inputs)[1]
dropout = Dropout(config.hidden_dropout_prob, name='pooled_output')
pooled_output = dropout(bert_model, training=False)

# Then build your model output
stance = Dense(units=len(data.stance_label.value_counts()), kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='stance')(pooled_output)
argument = Dense(units=len(data.argument_label.value_counts()), kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='argument')(pooled_output)
outputs = {'stance': stance, 'argument': argument}

# And combine it all in a model object
model = Model(inputs=inputs, outputs=outputs, name='BERT_MultiLabel_MultiClass')

# Take a look at the model
model.summary()

Model: "BERT_MultiLabel_MultiClass"

__________________________________________________________________________________________________

 Layer (type)                   Output Shape         Param #     Connected to                     


 input_ids (InputLayer)         [(None, 256)]        0           []                               

                                                                                                  

 bert (TFBertMainLayer)         TFBaseModelOutputWi  177853440   ['input_ids[0][0]']              

                                thPoolingAndCrossAt                                               

                                tentions(last_hidde                                               

                                n_state=(None, 256,                                               

                                 768),                                                            

                                 pooler_output=(Non           

In [None]:
! wandb login --relogin Nikita4epuh
# Set an optimizer
optimizer = Adam(
    learning_rate=5e-05,
    epsilon=1e-08,
    weight_decay=0.01,
    clipnorm=1.0)

# Set loss and metrics
loss = {'stance': CategoricalCrossentropy(from_logits = True), 'argument': CategoricalCrossentropy(from_logits = True)}
metric = {'stance': CategoricalAccuracy('accuracy'), 'argument': CategoricalAccuracy('accuracy')}

# Compile the model
model.compile(
    optimizer = optimizer,
    loss = loss, 
    metrics = metric)

# Ready output data for the model
y_stance = to_categorical(data[f'{CLASS_NAME}_stance'])
y_argument = to_categorical(data[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
x = tokenizer(
    text=data['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding=True, 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

wandb.init(project="Text_categorization", name = "baseline_run_quarantine", tags = ["Ruberta", "RB"])
epochs = 20
# Fit the model
history = model.fit(
    # x={'input_ids': x['input_ids'], 'attention_mask': x['attention_mask']},
    x={'input_ids': x['input_ids']},
    y={'stance': y_stance, 'argument': y_argument},
    validation_data=({'input_ids': test_x['input_ids'][:8]}, {'stance': test_y_stance[:8], 'argument': test_y_argument[:8]}),
    batch_size=8,
    epochs=epochs, callbacks=[WandbCallback()])
for epoch in range(epochs): 
    wandb.log({'loss': history.history['loss'][epoch],
               'argument_loss': history.history['argument_loss'][epoch],
               'stance_loss': history.history['stance_loss'][epoch],
               'argument_accuracy': history.history['argument_accuracy'][epoch],
               'stance_accuracy': history.history['stance_accuracy'][epoch],
               'val_loss': history.history['stance_accuracy'][epoch],
               'val_argument_loss': history.history['val_argument_loss'][epoch],
               'val_stance_loss': history.history['val_stance_loss'][epoch],
               'val_argument_accuracy': history.history['val_argument_accuracy'][epoch],
               'val_stance_accuracy': history.history['val_stance_accuracy'][epoch]}) 

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


[34m[1mwandb[0m: Currently logged in as: [33msmolenkovaea00[0m. Use [1m`wandb login --relogin`[0m to force relogin




Epoch 1/20



[34m[1mwandb[0m: Adding directory to artifact (/content/wandb/run-20230418_200519-egdadxhc/files/model-best)... Done. 13.0s



Epoch 2/20



[34m[1mwandb[0m: Adding directory to artifact (/content/wandb/run-20230418_200519-egdadxhc/files/model-best)... Done. 12.4s



Epoch 3/20


Epoch 4/20


Epoch 5/20


Epoch 6/20


Epoch 7/20


Epoch 8/20


Epoch 9/20


Epoch 10/20


Epoch 11/20


Epoch 12/20


Epoch 13/20


Epoch 14/20


Epoch 15/20


Epoch 16/20


Epoch 17/20


Epoch 18/20


Epoch 19/20


Epoch 20/20



### 3.3.3 Inference

In [None]:
val_results = model.predict(x={'input_ids': test_x['input_ids']})



In [None]:
data_test[f'{CLASS_NAME}_stance_predict'] = val_results['stance'].argmax(axis=-1)
data_test[f'{CLASS_NAME}_argument_predict'] = val_results['argument'].argmax(axis=-1)

In [None]:
print(classification_report(data_test[f'{CLASS_NAME}_stance'].values.tolist(), val_results['stance'].argmax(axis=-1), zero_division=0))

              precision    recall  f1-score   support



           0       0.99      1.00      0.99       910

           1       0.00      0.00      0.00        33

           2       0.62      0.98      0.76       267

           3       0.00      0.00      0.00       134



    accuracy                           0.87      1344

   macro avg       0.40      0.49      0.44      1344

weighted avg       0.79      0.87      0.82      1344




In [None]:
print(classification_report(data_test[f'{CLASS_NAME}_argument'].values.tolist(), val_results['argument'].argmax(axis=-1), zero_division=0))

              precision    recall  f1-score   support



           0       0.99      1.00      0.99       910

           1       0.00      0.00      0.00        18

           2       0.85      0.97      0.91       375

           3       0.00      0.00      0.00        41



    accuracy                           0.94      1344

   macro avg       0.46      0.49      0.47      1344

weighted avg       0.91      0.94      0.92      1344




### 3.3.4 Prediction

In [None]:
test = pd.read_csv("/content/drive/MyDrive/HW_2/val_empty.tsv", sep='\t')
test.head()

Unnamed: 0,text_id,text,masks_stance,masks_argument,quarantine_stance,quarantine_argument,vaccines_stance,vaccines_argument
0,17041,> 26 марта его поместили на принудительный кар...,,,,,,
1,17057,И шевкунов вещает из телевизора про необходимо...,,,,,,
2,17058,Это результат его же лобировал до последнего ...,,,,,,
3,17071,При этом нормально обеспечены (к слову о якобы...,,,,,,
4,17079,для опасного врага нужен официальный карантин ...,,,,,,


In [None]:
test_d = test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]

In [None]:
for_pred = tokenizer(
    text=test_d['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

In [None]:
test_results = model.predict(x={'input_ids': for_pred['input_ids']})



In [None]:
test_results

{'stance': array([[-4.089829  , -0.33473855,  2.5395935 ,  1.3415891 ],
        [-4.0897512 , -0.3347472 ,  2.5395615 ,  1.3415637 ],
        [-4.0894866 , -0.33475885,  2.5394356 ,  1.3414531 ],
        ...,
        [ 5.198219  , -1.4057834 , -1.263636  , -1.1913619 ],
        [ 5.200686  , -1.4059892 , -1.2654437 , -1.1920764 ],
        [ 5.1925864 , -1.4060783 , -1.2590035 , -1.1900823 ]],
       dtype=float32),
 'argument': array([[-4.1424227 , -0.21485949,  2.9515023 ,  0.39686468],
        [-4.1423306 , -0.21488512,  2.9514856 ,  0.3968405 ],
        [-4.1420374 , -0.21497214,  2.9513922 ,  0.39675808],
        ...,
        [ 5.410088  , -1.8092142 , -0.41383114, -1.7783811 ],
        [ 5.4125404 , -1.8092424 , -0.41525313, -1.7785566 ],
        [ 5.40485   , -1.8096377 , -0.41075704, -1.7784284 ]],
       dtype=float32)}

In [None]:
test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1)
test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1)


A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead



See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1)


A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead



See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1)


In [None]:
test_d[f'{CLASS_NAME}_stance'] -= 1
test_d[f'{CLASS_NAME}_argument'] -= 1


A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead



See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  test_d[f'{CLASS_NAME}_stance'] -= 1


A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead



See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  test_d[f'{CLASS_NAME}_argument'] -= 1


In [None]:
test_d.head()

Unnamed: 0,text,quarantine_stance,quarantine_argument
0,> 26 марта его поместили на принудительный кар...,1,1
1,И шевкунов вещает из телевизора про необходимо...,1,1
2,Это результат его же лобировал до последнего ...,1,1
3,При этом нормально обеспечены (к слову о якобы...,1,1
4,для опасного врага нужен официальный карантин ...,1,1


In [None]:
test_d[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].to_csv(f"/content/drive/MyDrive/HW_2/val_predict_{CLASS_NAME}.tsv", sep='\t', index=None)

In [None]:
CLASS_NAME = "quarantine"
df1 = pd.read_csv(f"/content/drive/MyDrive/HW_2/val_predict_{CLASS_NAME}.tsv", sep='\t')

### 3.3.5 new acrhitecture MHA

In [None]:
# Build your model
input_ids = Input(shape=(256,), name='input_ids', dtype='int32')
attention_mask = Input(shape=(256,), name='attention_mask', dtype='int32')
inputs = {'input_ids': input_ids, 'attention_mask': attention_mask}

# Load the Transformers BERT model as a layer in a Keras model
bert_model = bert(inputs)[0]

# Add multi-head attention layer
attention_output = MultiHeadAttention(num_heads=4, key_dim=64)(bert_model, bert_model)

# Flatten the output from multi-head attention layer
flatten = Flatten()(attention_output)

# Apply dropout layer
dropout = Dropout(config.hidden_dropout_prob, name='pooled_output')
pooled_output = dropout(flatten, training=False)

# Then build your model output
stance = Dense(units=len(data.stance_label.value_counts()), activation='relu', kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='stance')(pooled_output)
argument = Dense(units=len(data.argument_label.value_counts()), activation='relu', kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='argument')(pooled_output)
outputs = {'stance': stance, 'argument': argument}

# And combine it all in a model object
model = Model(inputs=inputs, outputs=outputs, name='BERT_MultiLabel_MultiClass')

# Take a look at the model
model.summary()

#### 3.3.5.0 Training

In [None]:
! wandb login --relogin Mark4epih
# Set an optimizer
optimizer = AdamW(
    learning_rate=5e-06,
    epsilon=1e-08,
    weight_decay=0.01,
    clipnorm=1.0)

# Set loss and metrics
loss = {'stance': CategoricalCrossentropy(from_logits = True), 'argument': CategoricalCrossentropy(from_logits = True)}
metric = {'stance': CategoricalAccuracy('accuracy'), 'argument': CategoricalAccuracy('accuracy')}

# Compile the model
model.compile(
    optimizer = optimizer,
    loss = loss, 
    metrics = metric)

# Ready output data for the model
y_stance = to_categorical(data[f'{CLASS_NAME}_stance'])
y_argument = to_categorical(data[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
x = tokenizer(
    text=data['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding=True, 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

wandb.init(project="Text_categorization", name = "Bert_attention_quarantine_ts_0.4_AdamW_5e6_heads_4_20epochs", tags = ["Ruberta_with_MHA", "RB"])
epochs = 3
# Fit the model
history = model.fit(
    x={'input_ids': x['input_ids'], 'attention_mask': x['attention_mask']},
    y={'stance': y_stance, 'argument': y_argument},
    validation_data=({'input_ids': test_x['input_ids'][:8], 'attention_mask': test_x['attention_mask'][:8]}, 
                     {'stance': test_y_stance[:8], 'argument': test_y_argument[:8]}),
    batch_size=8,
    epochs=epochs, callbacks=[WandbCallback()])

### 3.3.5.1 Inference

In [None]:
val_results = model.predict(x={'input_ids': test_x['input_ids'], 'attention_mask': test_x['attention_mask']})
data_test[f'{CLASS_NAME}_stance_predict'] = val_results['stance'].argmax(axis=-1)
data_test[f'{CLASS_NAME}_argument_predict'] = val_results['argument'].argmax(axis=-1)
print(classification_report(data_test[f'{CLASS_NAME}_stance'].values.tolist(), val_results['stance'].argmax(axis=-1), zero_division=0), classification_report(data_test[f'{CLASS_NAME}_argument'].values.tolist(), val_results['argument'].argmax(axis=-1), zero_division=0))


              precision    recall  f1-score   support



           0       1.00      0.98      0.99       468

           1       0.00      0.00      0.00        19

           2       0.59      0.99      0.74       126

           3       0.00      0.00      0.00        59



    accuracy                           0.87       672

   macro avg       0.40      0.49      0.43       672

weighted avg       0.80      0.87      0.83       672

               precision    recall  f1-score   support



           0       1.00      0.98      0.99       468

           1       0.00      0.00      0.00        15

           2       0.79      0.99      0.88       170

           3       0.00      0.00      0.00        19



    accuracy                           0.93       672

   macro avg       0.45      0.49      0.47       672

weighted avg       0.90      0.93      0.91       672




In [None]:
val_results = model.predict(x={'input_ids': test_x['input_ids'], 'attention_mask': test_x['attention_mask']})
data_test[f'{CLASS_NAME}_stance_predict'] = val_results['stance'].argmax(axis=-1)
data_test[f'{CLASS_NAME}_argument_predict'] = val_results['argument'].argmax(axis=-1)
print(classification_report(data_test[f'{CLASS_NAME}_stance'].values.tolist(), val_results['stance'].argmax(axis=-1), zero_division=0), classification_report(data_test[f'{CLASS_NAME}_argument'].values.tolist(), val_results['argument'].argmax(axis=-1), zero_division=0))


              precision    recall  f1-score   support



           0       0.97      0.99      0.98      1359

           1       0.00      0.00      0.00        47

           2       0.70      0.74      0.72       408

           3       0.41      0.41      0.41       202



    accuracy                           0.86      2016

   macro avg       0.52      0.53      0.53      2016

weighted avg       0.84      0.86      0.85      2016

               precision    recall  f1-score   support



           0       0.98      0.98      0.98      1359

           1       0.00      0.00      0.00        43

           2       0.82      0.95      0.88       543

           3       0.59      0.14      0.23        71



    accuracy                           0.93      2016

   macro avg       0.60      0.52      0.52      2016

weighted avg       0.90      0.93      0.91      2016




#### 3.3.5.2 Prediction

In [None]:
test = pd.read_csv("/kaggle/working/val_empty.tsv", sep='\t')
test.head()
test_d = test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]
for_pred = tokenizer(
    text=test_d['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
test_results = model.predict(x={'input_ids': for_pred['input_ids'], 'attention_mask': for_pred['attention_mask']})

In [None]:
test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1)
test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1)
test_d[f'{CLASS_NAME}_stance'] -= 1
test_d[f'{CLASS_NAME}_argument'] -= 1
test_d[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].to_csv(f"/kaggle/working/val_predict_MHA_AdamW_lr_5e6_ts_0.4_num_4_20epochs{CLASS_NAME}.tsv", sep='\t', index=None)

In [None]:
CLASS_NAME = "quarantine"
df1 = pd.read_csv(f"/content/drive/MyDrive/HW_2/val_predict_MHA_AdamW_lr_5e6_{CLASS_NAME}.tsv", sep='\t')

## 3.4 Masks

### 3.4.1 Preprocessing

In [None]:
CLASS_NAME = "masks"
# Import data from csv
whole_data = pd.read_csv('/kaggle/working/train_all.tsv', sep='\t')

# Train_test_split
test_size = 0.4
data, data_test = train_test_split(whole_data, test_size=test_size, random_state = seed_value)


#------------------------------------------------------------------------------------#
## Train
# Select required columns
data = data[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]

# Set your model output as categorical and save in new label col
data['stance_label'] = pd.Categorical(data[f'{CLASS_NAME}_stance'])
data['argument_label'] = pd.Categorical(data[f'{CLASS_NAME}_argument'])

# Transform your output to numeric
data[f'{CLASS_NAME}_stance'] = data['stance_label'].cat.codes
data[f'{CLASS_NAME}_argument'] = data['argument_label'].cat.codes

#------------------------------------------------------------------------------------#
## Test
# Select required columns
data_test = data_test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]

# Set your model output as categorical and save in new label col
data_test['stance_label'] = pd.Categorical(data_test[f'{CLASS_NAME}_stance'])
data_test['argument_label'] = pd.Categorical(data_test[f'{CLASS_NAME}_argument'])

# Transform your output to numeric
data_test[f'{CLASS_NAME}_stance'] = data_test['stance_label'].cat.codes
data_test[f'{CLASS_NAME}_argument'] = data_test['argument_label'].cat.codes

In [None]:
#wandb.init(project="Text_categorization", name = "baseline_run", tags = ["Ruberta", "RB"])
# Ready output data for the model
test_y_stance = to_categorical(data_test[f'{CLASS_NAME}_stance'])
test_y_argument = to_categorical(data_test[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
test_x = tokenizer(
    text=data_test['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
# Build your model input
input_ids = Input(shape=(256,), name='input_ids', dtype='int32')
inputs = {'input_ids': input_ids}

# Load the Transformers BERT model as a layer in a Keras model
bert_model = bert(inputs)[1]
dropout = Dropout(config.hidden_dropout_prob, name='pooled_output')
pooled_output = dropout(bert_model, training=False)

# Then build your model output
stance = Dense(units=len(data.stance_label.value_counts()), kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='stance')(pooled_output)
argument = Dense(units=len(data.argument_label.value_counts()), kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='argument')(pooled_output)
outputs = {'stance': stance, 'argument': argument}

# And combine it all in a model object
model = Model(inputs=inputs, outputs=outputs, name='BERT_MultiLabel_MultiClass')

# Take a look at the model
model.summary()

Model: "BERT_MultiLabel_MultiClass"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_ids (InputLayer)         [(None, 256)]        0           []                               
                                                                                                  
 bert (TFBertMainLayer)         TFBaseModelOutputWi  177853440   ['input_ids[0][0]']              
                                thPoolingAndCrossAt                                               
                                tentions(last_hidde                                               
                                n_state=(None, 256,                                               
                                 768),                                                            
                                 pooler_output=(Non                      

### 3.4.2 Training

In [None]:
! wandb login --relogin VasyaBog
# Set an optimizer
optimizer = Adam(
    learning_rate=5e-05,
    epsilon=1e-08,
    weight_decay=0.01,
    clipnorm=1.0)

# Set loss and metrics
loss = {'stance': CategoricalCrossentropy(from_logits = True), 'argument': CategoricalCrossentropy(from_logits = True)}
metric = {'stance': CategoricalAccuracy('accuracy'), 'argument': CategoricalAccuracy('accuracy')}

# Compile the model
model.compile(
    optimizer = optimizer,
    loss = loss, 
    metrics = metric)

# Ready output data for the model
y_stance = to_categorical(data[f'{CLASS_NAME}_stance'])
y_argument = to_categorical(data[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
x = tokenizer(
    text=data['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding=True, 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

wandb.init(project="Text_categorization", name = "baseline_run_masks", tags = ["Ruberta", "RB"])
epochs = 20
# Fit the model
history = model.fit(
    # x={'input_ids': x['input_ids'], 'attention_mask': x['attention_mask']},
    x={'input_ids': x['input_ids']},
    y={'stance': y_stance, 'argument': y_argument},
    validation_data=({'input_ids': test_x['input_ids'][:8]}, {'stance': test_y_stance[:8], 'argument': test_y_argument[:8]}),
    batch_size=8,
    epochs=epochs, callbacks=[WandbCallback()])
for epoch in range(epochs): 
    wandb.log({'loss': history.history['loss'][epoch],
               'argument_loss': history.history['argument_loss'][epoch],
               'stance_loss': history.history['stance_loss'][epoch],
               'argument_accuracy': history.history['argument_accuracy'][epoch],
               'stance_accuracy': history.history['stance_accuracy'][epoch],
               'val_loss': history.history['stance_accuracy'][epoch],
               'val_argument_loss': history.history['val_argument_loss'][epoch],
               'val_stance_loss': history.history['val_stance_loss'][epoch],
               'val_argument_accuracy': history.history['val_argument_accuracy'][epoch],
               'val_stance_accuracy': history.history['val_stance_accuracy'][epoch]}) 

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


ERROR:wandb.sdk.internal.internal_api:500 response executing GraphQL.

ERROR:wandb.sdk.internal.internal_api:{"error":"driver: bad connection"}

[34m[1mwandb[0m: Currently logged in as: [33msmolenkovaea00[0m. Use [1m`wandb login --relogin`[0m to force relogin




Epoch 1/20



[34m[1mwandb[0m: Adding directory to artifact (/content/wandb/run-20230419_070439-ofpqwwmf/files/model-best)... Done. 12.2s



Epoch 2/20


Epoch 3/20



[34m[1mwandb[0m: Adding directory to artifact (/content/wandb/run-20230419_070439-ofpqwwmf/files/model-best)... Done. 12.1s



Epoch 4/20


Epoch 5/20


Epoch 6/20


KeyboardInterrupt: ignored

### 3.4.3 Inference

In [None]:
val_results = model.predict(x={'input_ids': test_x['input_ids']})



In [None]:
data_test[f'{CLASS_NAME}_stance_predict'] = val_results['stance'].argmax(axis=-1)
data_test[f'{CLASS_NAME}_argument_predict'] = val_results['argument'].argmax(axis=-1)
print(classification_report(data_test[f'{CLASS_NAME}_stance'].values.tolist(), val_results['stance'].argmax(axis=-1), zero_division=0))

              precision    recall  f1-score   support



           0       0.99      0.99      0.99       534

           1       0.00      0.00      0.00        89

           2       0.60      1.00      0.75       287

           3       0.00      0.00      0.00        98



    accuracy                           0.81      1008

   macro avg       0.40      0.50      0.43      1008

weighted avg       0.70      0.81      0.74      1008




In [None]:
print(classification_report(data_test[f'{CLASS_NAME}_argument'].values.tolist(), val_results['argument'].argmax(axis=-1), zero_division=0))

              precision    recall  f1-score   support



           0       0.99      0.99      0.99       534

           1       0.00      0.00      0.00        56

           2       0.78      0.99      0.87       373

           3       0.00      0.00      0.00        45



    accuracy                           0.89      1008

   macro avg       0.44      0.49      0.47      1008

weighted avg       0.81      0.89      0.85      1008




In [None]:
test = pd.read_csv("/content/drive/MyDrive/HW_2/val_empty.tsv", sep='\t')
test.head()

Unnamed: 0,text_id,text,masks_stance,masks_argument,quarantine_stance,quarantine_argument,vaccines_stance,vaccines_argument
0,17041,> 26 марта его поместили на принудительный кар...,,,,,,
1,17057,И шевкунов вещает из телевизора про необходимо...,,,,,,
2,17058,Это результат его же лобировал до последнего ...,,,,,,
3,17071,При этом нормально обеспечены (к слову о якобы...,,,,,,
4,17079,для опасного врага нужен официальный карантин ...,,,,,,


In [None]:
test_d = test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]
for_pred = tokenizer(
    text=test_d['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
test_results = model.predict(x={'input_ids': for_pred['input_ids']})



In [None]:
test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1)
test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1)
test_d[f'{CLASS_NAME}_stance'] -= 1
test_d[f'{CLASS_NAME}_argument'] -= 1


A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead



See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1)


A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead



See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1)


A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead



See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  test_d[f'{CLASS_NAME}_stance'] -= 1


A 

In [None]:
test_d[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].to_csv(f"/content/drive/MyDrive/HW_2/val_predict_{CLASS_NAME}.tsv", sep='\t', index=None)
df2 = pd.read_csv(f"/content/drive/MyDrive/HW_2/val_predict_{CLASS_NAME}.tsv", sep='\t')

### 3.4.4 New Arch

In [None]:
# Build your model
input_ids = Input(shape=(256,), name='input_ids', dtype='int32')
attention_mask = Input(shape=(256,), name='attention_mask', dtype='int32')
inputs = {'input_ids': input_ids, 'attention_mask': attention_mask}

# Load the Transformers BERT model as a layer in a Keras model
bert_model = bert(inputs)[0]

# Add multi-head attention layer
attention_output = MultiHeadAttention(num_heads=4, key_dim=64)(bert_model, bert_model)

# Flatten the output from multi-head attention layer
flatten = Flatten()(attention_output)

# Apply dropout layer
dropout = Dropout(config.hidden_dropout_prob, name='pooled_output')
pooled_output = dropout(flatten, training=False)

# Then build your model output
stance = Dense(units=len(data.stance_label.value_counts()), activation='relu', kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='stance')(pooled_output)
argument = Dense(units=len(data.argument_label.value_counts()), activation='relu', kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='argument')(pooled_output)
outputs = {'stance': stance, 'argument': argument}

# And combine it all in a model object
model = Model(inputs=inputs, outputs=outputs, name='BERT_MultiLabel_MultiClass')

# Take a look at the model
model.summary()

Model: "BERT_MultiLabel_MultiClass"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 attention_mask (InputLayer)    [(None, 256)]        0           []                               
                                                                                                  
 input_ids (InputLayer)         [(None, 256)]        0           []                               
                                                                                                  
 bert (TFBertMainLayer)         TFBaseModelOutputWi  177853440   ['attention_mask[0][0]',         
                                thPoolingAndCrossAt               'input_ids[0][0]']              
                                tentions(last_hidde                                               
                                n_state=(None, 256,                      

### 3.4.5 Training

In [None]:
! wandb login --relogin NikitaLoh
# Set an optimizer
optimizer = AdamW(
    learning_rate=5e-06,
    epsilon=1e-08,
    weight_decay=0.01,
    clipnorm=1.0)

# Set loss and metrics
loss = {'stance': CategoricalCrossentropy(from_logits = True), 'argument': CategoricalCrossentropy(from_logits = True)}
metric = {'stance': CategoricalAccuracy('accuracy'), 'argument': CategoricalAccuracy('accuracy')}

# Compile the model
model.compile(
    optimizer = optimizer,
    loss = loss, 
    metrics = metric)

# Ready output data for the model
y_stance = to_categorical(data[f'{CLASS_NAME}_stance'])
y_argument = to_categorical(data[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
x = tokenizer(
    text=data['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding=True, 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

wandb.init(project="Text_categorization", name = "Bert_attention_masks_20epochs_AdamW_lr_5e6_masks_num_4", tags = ["Ruberta_with_MHA", "RB"])
epochs = 20
# Fit the model
history = model.fit(
    x={'input_ids': x['input_ids'], 'attention_mask': x['attention_mask']},
    y={'stance': y_stance, 'argument': y_argument},
    validation_data=({'input_ids': test_x['input_ids'][:8], 'attention_mask': test_x['attention_mask'][:8]}, 
                     {'stance': test_y_stance[:8], 'argument': test_y_argument[:8]}),
    batch_size=8,
    epochs=epochs, callbacks=[WandbCallback()])

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


0,1
argument_accuracy,▁██
argument_loss,█▁▁
epoch,▁▅█
loss,█▁▁
stance_accuracy,▁▇█
stance_loss,█▁▁
val_argument_accuracy,▁▁▁
val_argument_loss,█▂▁
val_loss,█▁▁
val_stance_accuracy,▁██

0,1
argument_accuracy,0.99975
argument_loss,0.00081
best_epoch,2.0
best_val_loss,0.02229
epoch,2.0
loss,0.00193
stance_accuracy,0.99975
stance_loss,0.00112
val_argument_accuracy,1.0
val_argument_loss,0.0


Epoch 1/20

[34m[1mwandb[0m: Adding directory to artifact (/kaggle/working/wandb/run-20230420_113722-b17zbcum/files/model-best)... Done. 13.9s


Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


#### 3.4.5.1 Inference

In [None]:
val_results = model.predict(x={'input_ids': test_x['input_ids'], 'attention_mask': test_x['attention_mask']})
data_test[f'{CLASS_NAME}_stance_predict'] = val_results['stance'].argmax(axis=-1)
data_test[f'{CLASS_NAME}_argument_predict'] = val_results['argument'].argmax(axis=-1)
print(classification_report(data_test[f'{CLASS_NAME}_stance'].values.tolist(), val_results['stance'].argmax(axis=-1), zero_division=0), classification_report(data_test[f'{CLASS_NAME}_argument'].values.tolist(), val_results['argument'].argmax(axis=-1), zero_division=0))


              precision    recall  f1-score   support



           0       1.00      1.00      1.00       534

           1       0.41      0.21      0.28        89

           2       0.75      0.75      0.75       287

           3       0.28      0.41      0.33        98



    accuracy                           0.80      1008

   macro avg       0.61      0.59      0.59      1008

weighted avg       0.81      0.80      0.80      1008

               precision    recall  f1-score   support



           0       1.00      1.00      1.00       534

           1       0.33      0.54      0.41        56

           2       0.90      0.92      0.91       373

           3       0.00      0.00      0.00        45



    accuracy                           0.90      1008

   macro avg       0.55      0.61      0.58      1008

weighted avg       0.88      0.90      0.89      1008




#### 3.4.5.2 Predict

In [None]:
CLASS_NAME = "masks"
test = pd.read_csv("/kaggle/working/val_empty.tsv", sep='\t')
test.head()
test_d = test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]
for_pred = tokenizer(
    text=test_d['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
test_results = model.predict(x={'input_ids': for_pred['input_ids'], 'attention_mask': for_pred['attention_mask']})
test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1)
test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1)
test_d[f'{CLASS_NAME}_stance'] -= 1
test_d[f'{CLASS_NAME}_argument'] -= 1
test_d[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].to_csv(f"/kaggle/working/val_predict_MHA_AdamW_lr_5e6_ts_0.4_num_4_20epochs_{CLASS_NAME}.tsv", sep='\t', index=None)



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  app.launch_new_instance()
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pyd

## 3.5 vaccines

### 3.5.1 Preprocessing

In [None]:
CLASS_NAME = "vaccines"
# Import data from csv
whole_data = pd.read_csv('/kaggle/working/train_all.tsv', sep='\t')

# Train_test_split
test_size = 0.2
data, data_test = train_test_split(whole_data, test_size=test_size)


#------------------------------------------------------------------------------------#
## Train
# Select required columns
data = data[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]

# Set your model output as categorical and save in new label col
data['stance_label'] = pd.Categorical(data[f'{CLASS_NAME}_stance'])
data['argument_label'] = pd.Categorical(data[f'{CLASS_NAME}_argument'])

# Transform your output to numeric
data[f'{CLASS_NAME}_stance'] = data['stance_label'].cat.codes
data[f'{CLASS_NAME}_argument'] = data['argument_label'].cat.codes

#------------------------------------------------------------------------------------#
## Test
# Select required columns
data_test = data_test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]

# Set your model output as categorical and save in new label col
data_test['stance_label'] = pd.Categorical(data_test[f'{CLASS_NAME}_stance'])
data_test['argument_label'] = pd.Categorical(data_test[f'{CLASS_NAME}_argument'])

# Transform your output to numeric
data_test[f'{CLASS_NAME}_stance'] = data_test['stance_label'].cat.codes
data_test[f'{CLASS_NAME}_argument'] = data_test['argument_label'].cat.codes
#wandb.init(project="Text_categorization", name = "baseline_run", tags = ["Ruberta", "RB"])
# Ready output data for the model
test_y_stance = to_categorical(data_test[f'{CLASS_NAME}_stance'])
test_y_argument = to_categorical(data_test[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
test_x = tokenizer(
    text=data_test['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
# Build your model input
input_ids = Input(shape=(256,), name='input_ids', dtype='int32')
inputs = {'input_ids': input_ids}

# Load the Transformers BERT model as a layer in a Keras model
bert_model = bert(inputs)[1]
dropout = Dropout(config.hidden_dropout_prob, name='pooled_output')
pooled_output = dropout(bert_model, training=False)

# Then build your model output
stance = Dense(units=len(data.stance_label.value_counts()), kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='stance')(pooled_output)
argument = Dense(units=len(data.argument_label.value_counts()), kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='argument')(pooled_output)
outputs = {'stance': stance, 'argument': argument}

# And combine it all in a model object
model = Model(inputs=inputs, outputs=outputs, name='BERT_MultiLabel_MultiClass')

# Take a look at the model
model.summary()

Model: "BERT_MultiLabel_MultiClass"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_ids (InputLayer)         [(None, 256)]        0           []                               
                                                                                                  
 bert (TFBertMainLayer)         TFBaseModelOutputWi  177853440   ['input_ids[0][0]']              
                                thPoolingAndCrossAt                                               
                                tentions(last_hidde                                               
                                n_state=(None, 256,                                               
                                 768),                                                            
                                 pooler_output=(Non                      

### 3.5.2 Training

In [None]:
! wandb login --relogin Markkrasav4ik
# Set an optimizer
optimizer = Adam(
    learning_rate=5e-05,
    epsilon=1e-08,
    weight_decay=0.01,
    clipnorm=1.0)

# Set loss and metrics
loss = {'stance': CategoricalCrossentropy(from_logits = True), 'argument': CategoricalCrossentropy(from_logits = True)}
metric = {'stance': CategoricalAccuracy('accuracy'), 'argument': CategoricalAccuracy('accuracy')}

# Compile the model
model.compile(
    optimizer = optimizer,
    loss = loss, 
    metrics = metric)

# Ready output data for the model
y_stance = to_categorical(data[f'{CLASS_NAME}_stance'])
y_argument = to_categorical(data[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
x = tokenizer(
    text=data['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding=True, 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

wandb.init(project="Text_categorization", name = "baseline_run_vaccines", tags = ["Ruberta", "RB"])
epochs = 20
# Fit the model
history = model.fit(
    # x={'input_ids': x['input_ids'], 'attention_mask': x['attention_mask']},
    x={'input_ids': x['input_ids']},
    y={'stance': y_stance, 'argument': y_argument},
    validation_data=({'input_ids': test_x['input_ids'][:8]}, {'stance': test_y_stance[:8], 'argument': test_y_argument[:8]}),
    batch_size=8,
    epochs=epochs, callbacks=[WandbCallback()])
for epoch in range(epochs): 
    wandb.log({'loss': history.history['loss'][epoch],
               'argument_loss': history.history['argument_loss'][epoch],
               'stance_loss': history.history['stance_loss'][epoch],
               'argument_accuracy': history.history['argument_accuracy'][epoch],
               'stance_accuracy': history.history['stance_accuracy'][epoch],
               'val_loss': history.history['stance_accuracy'][epoch],
               'val_argument_loss': history.history['val_argument_loss'][epoch],
               'val_stance_loss': history.history['val_stance_loss'][epoch],
               'val_argument_accuracy': history.history['val_argument_accuracy'][epoch],
               'val_stance_accuracy': history.history['val_stance_accuracy'][epoch]}) 

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


[34m[1mwandb[0m: Currently logged in as: [33msmolenkovaea00[0m. Use [1m`wandb login --relogin`[0m to force relogin




Epoch 1/20



[34m[1mwandb[0m: Adding directory to artifact (/content/wandb/run-20230419_080129-aqk0c2j9/files/model-best)... Done. 12.9s



Epoch 2/20



[34m[1mwandb[0m: Adding directory to artifact (/content/wandb/run-20230419_080129-aqk0c2j9/files/model-best)... Done. 12.9s



Epoch 3/20


Epoch 4/20


Epoch 5/20


Epoch 6/20


Epoch 7/20


Epoch 8/20


Epoch 9/20


Epoch 10/20


Epoch 11/20


Epoch 12/20


Epoch 13/20

 51/672 [=>............................] - ETA: 5:11 - loss: 1.6768 - argument_loss: 0.8013 - stance_loss: 0.8755 - argument_accuracy: 0.7255 - stance_accuracy: 0.7255

KeyboardInterrupt: ignored

In [None]:
! wandb login --relogin VasyaBog
# Set an optimizer
optimizer = Adam(
    learning_rate=5e-05,
    epsilon=1e-08,
    weight_decay=0.01,
    clipnorm=1.0)

# Set loss and metrics
loss = {'stance': CategoricalCrossentropy(from_logits = True), 'argument': CategoricalCrossentropy(from_logits = True)}
metric = {'stance': CategoricalAccuracy('accuracy'), 'argument': CategoricalAccuracy('accuracy')}

# Compile the model
model.compile(
    optimizer = optimizer,
    loss = loss, 
    metrics = metric)

# Ready output data for the model
y_stance = to_categorical(data[f'{CLASS_NAME}_stance'])
y_argument = to_categorical(data[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
x = tokenizer(
    text=data['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding=True, 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

wandb.init(project="Text_categorization", name = "baseline_run_vaccines_2_epochs", tags = ["Ruberta", "RB"])
epochs = 2
# Fit the model
history = model.fit(
    # x={'input_ids': x['input_ids'], 'attention_mask': x['attention_mask']},
    x={'input_ids': x['input_ids']},
    y={'stance': y_stance, 'argument': y_argument},
    validation_data=({'input_ids': test_x['input_ids'][:8]}, {'stance': test_y_stance[:8], 'argument': test_y_argument[:8]}),
    batch_size=8,
    epochs=epochs, callbacks=[WandbCallback()])
for epoch in range(epochs): 
    wandb.log({'loss': history.history['loss'][epoch],
               'argument_loss': history.history['argument_loss'][epoch],
               'stance_loss': history.history['stance_loss'][epoch],
               'argument_accuracy': history.history['argument_accuracy'][epoch],
               'stance_accuracy': history.history['stance_accuracy'][epoch],
               'val_loss': history.history['stance_accuracy'][epoch],
               'val_argument_loss': history.history['val_argument_loss'][epoch],
               'val_stance_loss': history.history['val_stance_loss'][epoch],
               'val_argument_accuracy': history.history['val_argument_accuracy'][epoch],
               'val_stance_accuracy': history.history['val_stance_accuracy'][epoch]}) 

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


[34m[1mwandb[0m: Currently logged in as: [33msmolenkovaea00[0m. Use [1m`wandb login --relogin`[0m to force relogin




Epoch 1/2



[34m[1mwandb[0m: Adding directory to artifact (/content/wandb/run-20230419_092630-z1wiapq3/files/model-best)... Done. 12.6s



Epoch 2/2



### 3.5.3 Inference

In [None]:
val_results = model.predict(x={'input_ids': test_x['input_ids']})
data_test[f'{CLASS_NAME}_stance_predict'] = val_results['stance'].argmax(axis=-1)
data_test[f'{CLASS_NAME}_argument_predict'] = val_results['argument'].argmax(axis=-1)
print(classification_report(data_test[f'{CLASS_NAME}_stance'].values.tolist(), val_results['stance'].argmax(axis=-1), zero_division=0))


              precision    recall  f1-score   support



           0       1.00      1.00      1.00      1009

           1       0.00      0.00      0.00        89

           2       0.51      0.98      0.67       173

           3       0.00      0.00      0.00        73



    accuracy                           0.88      1344

   macro avg       0.38      0.50      0.42      1344

weighted avg       0.81      0.88      0.84      1344




In [None]:
print(classification_report(data_test[f'{CLASS_NAME}_argument'].values.tolist(), val_results['argument'].argmax(axis=-1), zero_division=0))

              precision    recall  f1-score   support



           0       1.00      1.00      1.00      1009

           1       0.00      0.00      0.00        59

           2       0.72      0.98      0.83       244

           3       0.00      0.00      0.00        32



    accuracy                           0.93      1344

   macro avg       0.43      0.50      0.46      1344

weighted avg       0.88      0.93      0.90      1344




In [None]:
test = pd.read_csv("/content/drive/MyDrive/HW_2/val_empty.tsv", sep='\t')
test_d = test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]
for_pred = tokenizer(
    text=test_d['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
test_results = model.predict(x={'input_ids': for_pred['input_ids']})



In [None]:
test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1)
test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1)
test_d[f'{CLASS_NAME}_stance'] -= 1
test_d[f'{CLASS_NAME}_argument'] -= 1


A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead



See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1)


A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead



See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1)


A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead



See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  test_d[f'{CLASS_NAME}_stance'] -= 1


A 

In [None]:
test_d[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].to_csv(f"/content/drive/MyDrive/HW_2/val_predict_{CLASS_NAME}.tsv", sep='\t', index=None)

In [None]:
CLASS_NAME = "vaccines"фы
df3 = pd.read_csv(f"/content/drive/MyDrive/HW_2/val_predict_{CLASS_NAME}.tsv", sep='\t')

### 3.5.4 New ARCH

In [None]:
# Build your model
input_ids = Input(shape=(256,), name='input_ids', dtype='int32')
attention_mask = Input(shape=(256,), name='attention_mask', dtype='int32')
inputs = {'input_ids': input_ids, 'attention_mask': attention_mask}

# Load the Transformers BERT model as a layer in a Keras model
bert_model = bert(inputs)[0]

# Add multi-head attention layer
attention_output = MultiHeadAttention(num_heads=3, key_dim=64)(bert_model, bert_model)

# Flatten the output from multi-head attention layer
flatten = Flatten()(attention_output)

# Apply dropout layer
dropout = Dropout(config.hidden_dropout_prob, name='pooled_output')
pooled_output = dropout(flatten, training=False)

# Then build your model output
stance = Dense(units=len(data.stance_label.value_counts()), activation='relu', kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='stance')(pooled_output)
argument = Dense(units=len(data.argument_label.value_counts()), activation='relu', kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='argument')(pooled_output)
outputs = {'stance': stance, 'argument': argument}

# And combine it all in a model object
model = Model(inputs=inputs, outputs=outputs, name='BERT_MultiLabel_MultiClass')

# Take a look at the model
model.summary()

Model: "BERT_MultiLabel_MultiClass"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 attention_mask (InputLayer)    [(None, 256)]        0           []                               
                                                                                                  
 input_ids (InputLayer)         [(None, 256)]        0           []                               
                                                                                                  
 bert (TFBertMainLayer)         TFBaseModelOutputWi  177853440   ['attention_mask[0][0]',         
                                thPoolingAndCrossAt               'input_ids[0][0]']              
                                tentions(last_hidde                                               
                                n_state=(None, 256,                      

#### 3.5.5 Training

In [None]:
#! wandb login --relogin Nikita4epuh
# Set an optimizer
optimizer = AdamW(
    learning_rate=5e-06,
    epsilon=1e-08,
    weight_decay=0.01,
    clipnorm=1.0)

# Set loss and metrics
loss = {'stance': CategoricalCrossentropy(from_logits = True), 'argument': CategoricalCrossentropy(from_logits = True)}
metric = {'stance': CategoricalAccuracy('accuracy'), 'argument': CategoricalAccuracy('accuracy')}

# Compile the model
model.compile(
    optimizer = optimizer,
    loss = loss, 
    metrics = metric)

# Ready output data for the model
y_stance = to_categorical(data[f'{CLASS_NAME}_stance'])
y_argument = to_categorical(data[f'{CLASS_NAME}_argument'])

# Tokenize the input (takes some time)
x = tokenizer(
    text=data['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding=True, 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

#wandb.init(project="Text_categorization", name = "Bert_attention_vaccines_4epochs", tags = ["Ruberta_with_MHA", "RB"])
epochs = 20
# Fit the model
history = model.fit(
    x={'input_ids': x['input_ids'], 'attention_mask': x['attention_mask']},
    y={'stance': y_stance, 'argument': y_argument},
    validation_data=({'input_ids': test_x['input_ids'][:8], 'attention_mask': test_x['attention_mask'][:8]}, 
                     {'stance': test_y_stance[:8], 'argument': test_y_argument[:8]}),
    batch_size=8,
    epochs=epochs) #callbacks=[WandbCallback()])

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
! wandb login --login Nikitathebest
wandb.init(project="Text_categorization", name = "Bert_attention_vaccines_20epochs_Adam_5e-6", tags = ["Ruberta_with_MHA", "RB"])

for epoch in range(epochs): 
    wandb.log({'loss': history.history['loss'][epoch],
               'argument_loss': history.history['argument_loss'][epoch],
               'stance_loss': history.history['stance_loss'][epoch],
               'argument_accuracy': history.history['argument_accuracy'][epoch],
               'stance_accuracy': history.history['stance_accuracy'][epoch],
               'val_loss': history.history['stance_accuracy'][epoch],
               'val_argument_loss': history.history['val_argument_loss'][epoch],
               'val_stance_loss': history.history['val_stance_loss'][epoch],
               'val_argument_accuracy': history.history['val_argument_accuracy'][epoch],
               'val_stance_accuracy': history.history['val_stance_accuracy'][epoch]}) 

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Usage: wandb login [OPTIONS] [KEY]...
Try 'wandb login --help' for help.

Error: No such option: --login Did you mean --relogin?
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

  ········································


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


VBox(children=(Label(value='Waiting for wandb.init()...\r'), FloatProgress(value=0.016670499766663245, max=1.0…

#### 3.5.6 Inference

In [None]:
val_results = model.predict(x={'input_ids': test_x['input_ids'], 'attention_mask': test_x['attention_mask']})
data_test[f'{CLASS_NAME}_stance_predict'] = val_results['stance'].argmax(axis=-1)
data_test[f'{CLASS_NAME}_argument_predict'] = val_results['argument'].argmax(axis=-1)
print(classification_report(data_test[f'{CLASS_NAME}_stance'].values.tolist(), val_results['stance'].argmax(axis=-1), zero_division=0), classification_report(data_test[f'{CLASS_NAME}_argument'].values.tolist(), val_results['argument'].argmax(axis=-1), zero_division=0))


              precision    recall  f1-score   support



           0       0.75      1.00      0.86      1010

           1       0.00      0.00      0.00        78

           2       0.00      0.00      0.00       174

           3       0.00      0.00      0.00        82



    accuracy                           0.75      1344

   macro avg       0.19      0.25      0.21      1344

weighted avg       0.56      0.75      0.64      1344

               precision    recall  f1-score   support



           0       1.00      1.00      1.00      1010

           1       0.00      0.00      0.00        50

           2       0.74      1.00      0.85       248

           3       0.00      0.00      0.00        36



    accuracy                           0.93      1344

   macro avg       0.43      0.50      0.46      1344

weighted avg       0.89      0.93      0.91      1344




#### 3.5.7 Predict

In [None]:
!gdown 1lUerv_gpQvo_e8Fl-flxEDtz77z_ZsLx

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Downloading...
From: https://drive.google.com/uc?id=1lUerv_gpQvo_e8Fl-flxEDtz77z_ZsLx
To: /kaggle/working/val_empty.tsv
100%|█████████████████████████████████████████| 316k/316k [00:00<00:00, 102MB/s]


In [None]:
CLASS_NAME = "vaccines"
test = pd.read_csv("/kaggle/working/val_empty.tsv", sep='\t')
test.head()
test_d = test[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']]
for_pred = tokenizer(
    text=test_d['text'].to_list(),
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length', 
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
test_results = model.predict(x={'input_ids': for_pred['input_ids'], 'attention_mask': for_pred['attention_mask']})
test_d[f'{CLASS_NAME}_stance'] = test_results['stance'].argmax(axis=-1)
test_d[f'{CLASS_NAME}_argument'] = test_results['argument'].argmax(axis=-1)
test_d[f'{CLASS_NAME}_stance'] -= 1
test_d[f'{CLASS_NAME}_argument'] -= 1
test_d[['text', f'{CLASS_NAME}_stance', f'{CLASS_NAME}_argument']].to_csv(f"/kaggle/working/val_predict_MHA_Adam_5e_6_{CLASS_NAME}.tsv", sep='\t', index=None)



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  app.launch_new_instance()
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pyd

## 3.6 Concatinate all files with results for masks, vaccines, quarantine

In [None]:
CLASS_NAME = "quarantine"
df1 = pd.read_csv(f"/content/drive/MyDrive/HW_2/val_predict_MHA_{CLASS_NAME}.tsv", sep='\t')
CLASS_NAME = "masks"
df2 = pd.read_csv(f"/content/drive/MyDrive/HW_2/val_predict_MHA_{CLASS_NAME}.tsv", sep='\t')
CLASS_NAME = "vaccines"
df3 = pd.read_csv(f"/content/drive/MyDrive/HW_2/val_predict_MHA_{CLASS_NAME}.tsv", sep='\t')


result = pd.merge(df1, df2, on="text")
result = pd.merge(result, df3, on="text")
result.to_csv("/content/drive/MyDrive/HW_2/val_predict_concat_MHA.tsv", sep='\t', index=None)

In [None]:
!zip /content/drive/MyDrive/HW_2/val_predict_concat.zip /content/drive/MyDrive/HW_2/val_predict_concat.tsv

updating: content/drive/MyDrive/HW_2/val_predict_concat.tsv (deflated 73%)
