[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1s3WPHPpG3fkZjvNhEYTdvgBbwhVPTgp8?authuser=1#scrollTo=ILb1FeTDcnmC)

#### Fine-tune BART for the argument-keypoint matching pipeline
#### Input: Arguments
#### Output: Intermediary text

Install the simpletransformers, tqdm, and pandas packages.

In [None]:
!pip install simpletransformers tqdm pandas

Import the train, dev and test sets

In [None]:
import pandas as pd
train_df_prev = pd.read_csv("train.csv")
dev_df_prev = pd.read_csv("dev.csv")
test_df_prev = pd.read_csv("test.csv")

In [None]:
train_df_prev.head(5)

In [None]:
train_df_prev.columns

We only need the 'argument', 'key_point' and 'label' columns.

In [None]:
train_df = train_df_prev.filter(['argument', 'key_point', 'label'], axis=1)
dev_df = dev_df_prev.filter(['argument', 'key_point', 'label'], axis=1)
test_df = test_df_prev.filter(['argument', 'key_point', 'label'], axis=1)

In [None]:
train_df.head(5)

There are two prompt engineering templates for the task.
Template 1: [X]. This means [Z].
Template 2: What are the keypoints for the following argument? [X] [Z]
where, X is the argument as input and Z is the intermediary text as output.

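Select either Template 1 or Template 2 for fine-tuning BART, and run only the corresponding cell below. Each cell rewrites the 'argument' column in place, so running both would stack the two templates on top of each other.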
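
Both templates can also be expressed as one small function, which makes it harder to apply them inconsistently across the train, dev and test splits. The following is a minimal sketch rather than the notebook's own code: `apply_template`, its `template` parameter and the `PROMPT` constant are hypothetical names, and stripping trailing whitespace before checking for the final period is an added assumption.

```python
# Hypothetical helper, not part of the original notebook.
PROMPT = "What are the keypoints for the following argument? "


def apply_template(argument: str, template: int) -> str:
    """Wrap an argument [X] in Template 1 or Template 2."""
    arg = argument.rstrip()  # assumption: ignore trailing whitespace
    if not arg.endswith("."):
        arg += "."  # make sure the argument ends with a period
    if template == 1:
        return arg + " This means "
    if template == 2:
        return PROMPT + arg
    raise ValueError("template must be 1 or 2")


print(repr(apply_template("Marriage provides stability", 1)))
print(repr(apply_template("Marriage provides stability.", 2)))
```

With pandas, the helper could be applied to a whole column, for example `train_df['argument'].apply(apply_template, template=1)`.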

This is Template 1

In [None]:
# Template 1
# For Train set

for i in train_df.index:
  arg = train_df['argument'][i]
  if arg[-1] != '.':
    modified_arg = arg + '. This means '
    train_df.at[i, 'argument'] = modified_arg
  else:
    modified_arg = arg + ' This means '
    train_df.at[i, 'argument'] = modified_arg

# For Dev set

for i in dev_df.index:
  arg = dev_df['argument'][i]
  if arg[-1] != '.':
    modified_arg = arg + '. This means '
    dev_df.at[i, 'argument'] = modified_arg
  else:
    modified_arg = arg + ' This means '
    dev_df.at[i, 'argument'] = modified_arg

This is Template 2

In [None]:
# Template 2
# For Train set

for i in train_df.index:
  arg = train_df['argument'][i]
  if arg[-1] != '.':
    modified_arg = 'What are the keypoints for the following argument? ' + arg + '.'
    train_df.at[i, 'argument'] = modified_arg
  else:
    modified_arg = 'What are the keypoints for the following argument? ' + arg
    train_df.at[i, 'argument'] = modified_arg

# For Dev set

for i in dev_df.index:
  arg = dev_df['argument'][i]
  if arg[-1] != '.':
    modified_arg = 'What are the keypoints for the following argument? ' + arg + '.'
    dev_df.at[i, 'argument'] = modified_arg
  else:
    modified_arg = 'What are the keypoints for the following argument? ' + arg
    dev_df.at[i, 'argument'] = modified_arg

Inspect one sample to check that the template was applied correctly.

In [None]:
train_df.at[110, 'argument']
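
Looking at a single row only catches obvious mistakes. A check over every row can be sketched as follows, assuming the template strings used in this notebook. `follows_template` and `PROMPT` are hypothetical names introduced for illustration, and the `samples` list is made-up toy data standing in for the real column.

```python
# Hypothetical sanity check, not part of the original notebook.
PROMPT = "What are the keypoints for the following argument? "


def follows_template(text: str, template: int) -> bool:
    """Return True if `text` has the shape produced by the given template."""
    if template == 1:
        # Template 1 always ends with a period followed by ' This means '.
        return text.endswith(". This means ")
    if template == 2:
        # Template 2 starts with the question and ends with a period.
        return text.startswith(PROMPT) and text.endswith(".")
    raise ValueError("template must be 1 or 2")


# Toy inputs: the second one was never templated.
samples = [
    "Marriage provides stability. This means ",
    "Marriage provides stability",
]
bad = [text for text in samples if not follows_template(text, 1)]
print(f"{len(bad)} of {len(samples)} samples do not follow Template 1")
```

On the real data, `samples` would be something like `train_df['argument'].tolist()`.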

Rename the columns for compatibility with Simple Transformers

In [None]:
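# Rename the columns by name rather than by position, so the cell also works
# when the frame still carries the 'label' column (assigning two names to a
# three-column frame raises a length-mismatch error).
column_names = {"argument": "input_text", "key_point": "target_text"}
train_df = train_df.rename(columns=column_names)
dev_df = dev_df.rename(columns=column_names)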

In [None]:
train_data, dev_data = train_df, dev_df

Data pre-processing is complete. Let's fine-tune BART now!

In [None]:
import logging

import pandas as pd
from simpletransformers.seq2seq import (
    Seq2SeqModel,
    Seq2SeqArgs,
)


logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

We are using BART-large here. You can use BART-base (`facebook/bart-base`) as well.

In [None]:
model_args = Seq2SeqArgs()
model_args.num_train_epochs = 10
model_args.no_save = True
model_args.evaluate_generated_text = True
model_args.evaluate_during_training = True
model_args.evaluate_during_training_verbose = True

# Initialize model
model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-large",
    args=model_args,
    use_cuda=True,
)


In [None]:
def count_matches(labels, preds):
    print(labels)
    print(preds)
    return sum(
        [
            1 if label == pred else 0
            for label, pred in zip(labels, preds)
        ]
    )
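
Exact string equality counts a prediction as wrong even if it differs from the label only in casing or spacing. If that turns out to be too strict, a more lenient metric could be written in the same `(labels, preds)` form. The sketch below is an assumption and not part of the original pipeline: `normalize` and `count_normalized_matches` are hypothetical names, and the example strings are made up.

```python
# Hypothetical, more lenient variant of count_matches.
def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def count_normalized_matches(labels, preds):
    """Count label/prediction pairs that are equal after normalization."""
    return sum(
        normalize(label) == normalize(pred)
        for label, pred in zip(labels, preds)
    )


# Toy example with made-up strings.
labels = ["Marriage is important", "Zoos harm animals"]
preds = ["marriage  is important", "Zoos educate people"]
print(count_normalized_matches(labels, preds))
```

Such a function would be passed to `train_model` as an additional keyword argument, in the same way `matches=count_matches` is passed in the training cell.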

Training starts from here.

In [None]:
# Train the model
model.train_model(
    train_df, eval_data=dev_df, matches=count_matches
)


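Don't forget to zip the model files and upload them to your Google Drive. Note that with `model_args.no_save = True`, as set above, Simple Transformers does not write model files to the output directory; set it to `False` before training if you want to keep the model.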

In [None]:
!zip -r /content/t5_base_no_prompt.zip /content/outputs

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Evaluate the model now. You can pass a single input to the model manually and inspect the output, or pass the entire test set and get a list of outputs.

In [None]:
# results = model.eval_model(dev_df)

# Use the model for prediction
# The prompt must match the template used for fine-tuning (Template 2 here).
print(
    model.predict(
        [
            "What are the keypoints for the following argument? marriage provides stability and a commitment between people which strengthens relationships and to abandon it would be to turn out backs on something good."
        ]
    )
)

In [None]:
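# Predict the full test set

# test_df still has its original column names and has not been templated yet,
# so apply the same template that was used for fine-tuning (Template 2 shown here).
prompt = 'What are the keypoints for the following argument? '
ref = []
to_predict = []
for i in test_df.index:
  arg = test_df.at[i, 'argument']
  if arg[-1] != '.':
    arg = arg + '.'
  ref.append(test_df.at[i, 'key_point'])
  to_predict.append(prompt + arg)

# Use the fine-tuned model defined above
predictions = model.predict(to_predict)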

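Finally, look at the accuracy, F1-score, precision and recall. `classification_report` treats every distinct text as its own class, so these scores reflect exact string matches between the references and the predictions.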

In [None]:
from sklearn.metrics import classification_report

print(classification_report(ref, predictions))