#### Lab1

The first part is Ernests' code, kept here in case it is not saved anywhere else. To skip it, jump to 'Change up code - Hannah'.
##### Problem:
Given a dataset with questions-answers, create a classifier that determines whether a question can be answered.

##### Approach:
1.
- Separate the dataset into `training`/`validation`
- Extract all questions for a given `Language X`
- Save each `question`-`answer` pair in a data structure with a corresponding label (`ANS`/`UNANS`)
- For each of the questions, check whether it contains an `answer`
    - If `ans` is not empty, add an `ANS` label
    - Else add an `UNANS` label
- At this point, all necessary information is extracted from the dataset and saved as a python data structure

2. 
- Convert the data structure to a feature vector (e.g. `bag-of-words`)
- Train the model (e.g. logistic regression)
- Evaluate by passing the validation data into the model
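
The two steps above can be sketched with the standard library alone. This is a minimal illustration, not the implementation used later in this notebook: the toy `pairs` list, the `label_pair` and `bag_of_words` helpers, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter

# Toy question-answer pairs (hypothetical data, not taken from the dataset).
pairs = [
    ("When was the bridge built?", "1932"),
    ("Who painted the ceiling?", ""),
]

def label_pair(answer):
    """Return 'ANS' if the answer text is non-empty, else 'UNANS'."""
    return "ANS" if answer.strip() else "UNANS"

def bag_of_words(text, vocab):
    """Count how often each vocabulary word occurs in a whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

# Step 1: attach a label to every question-answer pair.
labelled = [(q, a, label_pair(a)) for q, a in pairs]

# Step 2: build a vocabulary over the questions and turn each question into a count vector.
vocab = sorted({w for q, _, _ in labelled for w in q.lower().split()})
features = [bag_of_words(q, vocab) for q, _, _ in labelled]

print([lbl for _, _, lbl in labelled])
print(vocab)
print(features)
```

The resulting count vectors and labels are the kind of input a classifier would be trained on in step 2.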

In [None]:
!python --version

Python 3.7.14


#### Install dependencies
If installing with `conda install`, add the `-y` flag to avoid getting stuck at the Y/N confirmation prompt.

In [None]:
!pip install nltk
!pip install spacy
!pip install transformers
!pip install pyyaml
!pip install datasets
!pip install bpemb
!pip install gensim
!pip install fugashi
!pip install ipadic
!pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
!pip install googletrans
!python -m spacy download fi_core_news_sm
!python -m spacy download ja_core_news_sm


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.23.1-py3-none-any.whl (5.3 MB)
[K     |████████████████████████████████| 5.3 MB 5.2 MB/s 
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
[K     |████████████████████████████████| 7.6 MB 39.0 MB/s 
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.10.1-py3-none-any.whl (163 kB)
[K     |████████████████████████████████| 163 kB 50.9 MB/s 
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.10.1 tokenizers-0.13.1 transformers-4.23.1
Looking in indexes: https://pypi.

#### Sys Location
If the dependencies below cannot be found, the Jupyter kernel is probably not using your Conda environment, so the environment's packages are not on `sys.path`.

To fix this:
1. Activate your Conda environment in a terminal
2. Run `python -m ipykernel install --user --name NLP_Labs --display-name "Python (NLP_Labs)"`
3. Reopen this notebook and select the NLP_Labs kernel
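
To check which environment the kernel is actually running in, it can help to print the interpreter it uses. This is a minimal sketch using only the standard library; the helper name `describe_interpreter` is made up for this example.

```python
import sys

def describe_interpreter():
    """Collect the interpreter path, environment prefix, and module search path."""
    return {
        "executable": sys.executable,  # path of the Python binary behind this kernel
        "prefix": sys.prefix,          # root directory of the active environment
        "path": list(sys.path),        # directories searched on import
    }

info = describe_interpreter()
print(info["executable"])
print(info["prefix"])
```

If the printed prefix is not the directory of the Conda environment, the kernel is most likely not using that environment.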

In [None]:
import spacy
import nltk
from bpemb import BPEmb
from transformers import AutoTokenizer
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
from dataclasses import dataclass
from datasets import load_dataset
import pandas as pd
from sklearn import svm
from sklearn import tree
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report
import re
import string
from nltk.corpus import stopwords
nltk.download('stopwords')
nltk.download('punkt')
from nltk.tokenize import word_tokenize
import fugashi
import matplotlib.pyplot as plt
import time
from googletrans import Translator

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


### Define all functions
Functions for loading the dataset, tokenization, utilities,
constants, etc.


In [None]:
LABEL_TO_IDX = {"ANSWERABLE" : 0, "UNANSWERABLE" : 1} # label to indices
IDX_TO_LABEL = {0 : "ANSWERABLE", 1 : "UNANSWERABLE"}

@dataclass
class DataPoint():
    """ Utility class that represents a datapoint """
    qst : str
    qstTokenized : list
    ans : str
    ansTokenized : list
    lbl : int  # index from LABEL_TO_IDX (0 = answerable, 1 = unanswerable)

def loadTokenizer(language):
  """ Returns a pretrained BERT tokenizer for a given language """
  print(f"Loading a tokenizer: {language}")
  if language == 'finnish':
    return AutoTokenizer.from_pretrained("TurkuNLP/bert-base-finnish-uncased-v1")
  elif language == 'english':
    return AutoTokenizer.from_pretrained("bert-base-uncased")
  elif language == 'japanese':
    return AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
  else:
    print(f"Trying to load a tokenizer for an unsupported language: {language}")        

def loadDataset():
  """ Returns the raw TyDiQa dataset """
  return load_dataset("copenlu/answerable_tydiqa")

def getDataPoints(dataSet, language, dsType):
  """ Returns a list of DataPoint structure representing the important 
      information from dataset.
      
      @param dataSet: Raw TyDiQa dataset -> DatasetDict
      @param language: Language for which to extract information ('english', 'finnish', 'japanese') -> Str
      @param dsType: Split of the dataset ('train', 'validation') -> Str
  """

  # Convert ds to a panda and extract nice things from it
  ds = dataSet[dsType].to_pandas()  # Convert to pandas
  ds = ds.loc[ds['language'] == language]  # filter the language
  dsQ = ds['question_text'].values  # get the question text
  dsAns = ds['annotations'].values  # get the annotations
  dsAns = [i["answer_text"][0] for i in dsAns]  # get the answer
  assert len(dsQ) == len(dsAns), "Number of questions and answers is not the same"

  # Put all the goodies in a datastructure and save it in an array
  data = np.array([], dtype=DataPoint)
  tokenizer = loadTokenizer(language)
  for i, qst in enumerate(dsQ):
      ans = dsAns[i]
      qstTkn = tokenizer.tokenize(qst)
      ansTkn = tokenizer.tokenize(ans)
      
      # Label answerable/unanswerable
      if ans:
          entry = DataPoint(qst, qstTkn, ans, ansTkn, LABEL_TO_IDX["ANSWERABLE"])
          data = np.append(data, entry)
      else:
          entry = DataPoint(qst, qstTkn, ans, ansTkn, LABEL_TO_IDX["UNANSWERABLE"])
          data = np.append(data, entry)
  output = data.ravel()
  return output


def getVocab(trainSet, valSet):
  """ Get the vocabulary of the train and val dataset """
  qst_to_ix = {}
  allData = np.concatenate((trainSet, valSet))
  for datapoint in allData:
      for word in datapoint.qstTokenized + datapoint.ansTokenized:
          if word not in qst_to_ix:
              i = len(qst_to_ix)
              qst_to_ix[word] = i
  return qst_to_ix


def getBowFeat(ds, vocab):
  """ Create bag-of-words features from the dataset and vocab
      @param ds: Parsed dataset -> list[DataPoint]
      @param vocab: Vocabulary of the dataset -> dict{str:int}
  """
  featureVec = np.empty((len(ds), len(vocab)), dtype='u1')
  for i, dp in enumerate(ds):
    vec = np.zeros(len(vocab), dtype='u1')
    questionAnswerTokenized = np.append(dp.qstTokenized, dp.ansTokenized)
    for word in questionAnswerTokenized:
        vec[vocab[word]] += 1
    featureVec[i] = vec

  # Extract labels, assure they are the same length as features and return the outputs
  labels = np.array([i.lbl for i in ds])
  assert len(featureVec) == len(labels), f"Feature vector size: {len(featureVec)}, does not match label size: {len(labels)}"
  assert len(ds) == len(featureVec), f"Dataset length: {len(ds)} != featureVec length: {len(featureVec)}"
  return featureVec, labels


def getBPEmbFeat(ds, language):
  """ Get a BPEmb feature vector
    https://github.com/bheinzerling/bpemb
  
  """
  # Get the embeddings
  bpemb = None
  if language == 'english':
    bpemb = BPEmb(lang='en', dim=100, vs=25000)
  elif language == 'finnish':
    bpemb = BPEmb(lang='fi', dim=100, vs=25000)
  elif language == 'japanese':
    bpemb = BPEmb(lang='ja', dim=100, vs=25000)
  else:
    print(f"Trying to get BPEmb embeddings for an unsupported language: {language}")

  features = []
  labels = []
  for i, dp in enumerate(ds):
      tokens = dp.qstTokenized
      if(dp.ansTokenized is not None):
          tokens = np.append(dp.qstTokenized, dp.ansTokenized)
      labels.append(dp.lbl)
      feature = np.vstack([bpemb.embed(x) for x in tokens])
      features.append(feature.mean(0))
      
  labels = np.array(labels)
  features = np.stack(features)
  assert len(features) == len(labels), f"len features {len(features)} != len labels {len(labels)}"
  return features, labels      

    
def getFeatureVec(featureType, ds, language, vocab):
  """ Returns a feature vector """
  if featureType == 'bow':
    return getBowFeat(ds, vocab)
  elif featureType == 'bpemb':
    return getBPEmbFeat(ds, language)
  else:
    print(f"Trying to fetch an unsupported feature vector {featureType}")

def runClassifier(classifier, trainFeat, trainLabels, valFeat, valLabels):
  print(f"Using classifier {classifier.__class__.__name__}")
  # Train the classifier
  classifier.fit(trainFeat, trainLabels)

  # Run the predictions
  print("Running predictions..")
  preds = classifier.predict(valFeat)
  return preds, classifier

def analysePredictions(preds, labels, testSet):
  # Sanity check prediction label with dataset label, print accuracy report
  # for i, dp in enumerate(testSet):
  #     print(f"Qst: {dp.qst}; \n Ans: {dp.ans}; \n Label: {IDX_TO_LABEL[dp.lbl]}; Prediction: {IDX_TO_LABEL[preds[i]]}")
  print(classification_report(labels, preds))


### Pull everything together

In [None]:
def doClassification(language, featureType, classifier):
    """ Performs the actual classification and prints out results
        language : str -> 'english', 'finnish', 'japanese'
        featureType : str -> 'bow', 'bpemb'
        classifier : str -> 'lr', 'dt', 'svm'
    """
    print(f"Running for language: {language}; With featureType {featureType}")
    # Fetch the dataset and vocabulary
    ds = loadDataset()
    trainSet = getDataPoints(ds, language, 'train')
    valSet = getDataPoints(ds, language, 'validation')
    vocab = getVocab(trainSet, valSet)

    # Prepare the tools
    tokenizer = loadTokenizer(language) # Isn't that done in getDataPoints()?
    trainFeat, trainLbl = getFeatureVec(featureType, trainSet, language, vocab)
    valFeat, valLbl = getFeatureVec(featureType, valSet, language, vocab)
    print(f"TrainFeat size {len(trainFeat)}; valFeat size {len(valFeat)}")

    # Train and evaluate the selected classifier
    if classifier == 'lr':  # logistic regression
      preds, model = runClassifier(LogisticRegression(penalty='l2', max_iter=1000), trainFeat, trainLbl, valFeat, valLbl)
      analysePredictions(preds, valLbl, valSet)
    elif classifier == 'dt':  # decision tree
      preds, model = runClassifier(tree.DecisionTreeClassifier(), trainFeat, trainLbl, valFeat, valLbl)
      analysePredictions(preds, valLbl, valSet)
    elif classifier == 'svm':  # support-vector machine
      preds, model = runClassifier(svm.SVC(), trainFeat, trainLbl, valFeat, valLbl)
      analysePredictions(preds, valLbl, valSet)



In [None]:
ds = loadDataset()
language = 'english'
tokenizer = loadTokenizer(language)
trainSet = getDataPoints(ds, language, 'train')
valSet = getDataPoints(ds, language, 'validation')
vocab = getVocab(trainSet, valSet)

Downloading metadata:   0%|          | 0.00/2.47k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/4.94k [00:00<?, ?B/s]



Downloading and preparing dataset None/None (download: 75.43 MiB, generated: 131.78 MiB, post-processed: Unknown size, total: 207.21 MiB) to /root/.cache/huggingface/datasets/copenlu___parquet/copenlu--nlp_course_tydiqa-9ffd3d37cf2899c6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec...


Downloading data files:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/71.6M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/7.49M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/2 [00:00<?, ?it/s]

0 tables [00:00, ? tables/s]

0 tables [00:00, ? tables/s]

Dataset parquet downloaded and prepared to /root/.cache/huggingface/datasets/copenlu___parquet/copenlu--nlp_course_tydiqa-9ffd3d37cf2899c6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec. Subsequent calls will reuse this data.


  0%|          | 0/2 [00:00<?, ?it/s]

Loading a tokenizer: english


Downloading:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/570 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/466k [00:00<?, ?B/s]

Loading a tokenizer: english
Loading a tokenizer: english


In [None]:
# Prepare the tools
trainFeat, trainLbl = getFeatureVec('bow', trainSet, language, vocab)
#trainFeat2, trainLbl2 = getFeatureVec('bow2', trainSet, language, vocab)

### Run the actual classification


In [None]:
print("English; BOW; LR")
doClassification('english', 'bow', 'lr')
print("English; BPEmb; LR")
doClassification('english', 'bpemb', 'lr')


print("Finnish; BPEmb; LR")
doClassification('finnish', 'bpemb', 'lr')
print("Finnish; BOW; LR")
doClassification('finnish', 'bow', 'lr')

print("Japanese; BPEmb; LR")
doClassification('japanese', 'bpemb', 'lr')
print("Japanese; BOW; LR")
doClassification('japanese', 'bow', 'lr')

English; BOW; LR
Running for language: english; With featureType bow




  0%|          | 0/2 [00:00<?, ?it/s]

Loading a tokenizer: english
Loading a tokenizer: english
Loading a tokenizer: english
TrainFeat size 7389; valFeat size 990
Using classifier LogisticRegression
Running predictions..
              precision    recall  f1-score   support

           0       0.90      0.71      0.80       495
           1       0.76      0.92      0.84       495

    accuracy                           0.82       990
   macro avg       0.83      0.82      0.82       990
weighted avg       0.83      0.82      0.82       990

English; BPEmb; LR
Running for language: english; With featureType bpemb




  0%|          | 0/2 [00:00<?, ?it/s]

Loading a tokenizer: english
Loading a tokenizer: english
Loading a tokenizer: english
downloading https://nlp.h-its.org/bpemb/en/en.wiki.bpe.vs25000.model


100%|██████████| 661443/661443 [00:00<00:00, 683987.85B/s]


downloading https://nlp.h-its.org/bpemb/en/en.wiki.bpe.vs25000.d100.w2v.bin.tar.gz


100%|██████████| 9477142/9477142 [00:02<00:00, 4712529.25B/s]


TrainFeat size 7389; valFeat size 990
Using classifier LogisticRegression
Running predictions..
              precision    recall  f1-score   support

           0       0.78      0.76      0.77       495
           1       0.76      0.79      0.78       495

    accuracy                           0.77       990
   macro avg       0.77      0.77      0.77       990
weighted avg       0.77      0.77      0.77       990

Finnish; BPEmb; LR
Running for language: finnish; With featureType bpemb




  0%|          | 0/2 [00:00<?, ?it/s]

Loading a tokenizer: finnish


Downloading:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/433 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/427k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/819k [00:00<?, ?B/s]

Loading a tokenizer: finnish
Loading a tokenizer: finnish
downloading https://nlp.h-its.org/bpemb/fi/fi.wiki.bpe.vs25000.model


100%|██████████| 664551/664551 [00:00<00:00, 688551.03B/s]


downloading https://nlp.h-its.org/bpemb/fi/fi.wiki.bpe.vs25000.d100.w2v.bin.tar.gz


100%|██████████| 9479113/9479113 [00:01<00:00, 4867489.43B/s]


TrainFeat size 13701; valFeat size 1686
Using classifier LogisticRegression
Running predictions..
              precision    recall  f1-score   support

           0       0.81      0.80      0.80       843
           1       0.80      0.81      0.81       843

    accuracy                           0.81      1686
   macro avg       0.81      0.81      0.81      1686
weighted avg       0.81      0.81      0.81      1686

Finnish; BOW; LR
Running for language: finnish; With featureType bow




  0%|          | 0/2 [00:00<?, ?it/s]

Loading a tokenizer: finnish
Loading a tokenizer: finnish
Loading a tokenizer: finnish
TrainFeat size 13701; valFeat size 1686
Using classifier LogisticRegression
Running predictions..
              precision    recall  f1-score   support

           0       0.91      0.75      0.83       843
           1       0.79      0.93      0.85       843

    accuracy                           0.84      1686
   macro avg       0.85      0.84      0.84      1686
weighted avg       0.85      0.84      0.84      1686

Japanese; BPEmb; LR
Running for language: japanese; With featureType bpemb




  0%|          | 0/2 [00:00<?, ?it/s]

Loading a tokenizer: japanese


Downloading:   0%|          | 0.00/104 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/479 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/258k [00:00<?, ?B/s]

Loading a tokenizer: japanese
Loading a tokenizer: japanese
downloading https://nlp.h-its.org/bpemb/ja/ja.wiki.bpe.vs25000.model


100%|██████████| 647083/647083 [00:00<00:00, 673388.86B/s]


downloading https://nlp.h-its.org/bpemb/ja/ja.wiki.bpe.vs25000.d100.w2v.bin.tar.gz


100%|██████████| 9473123/9473123 [00:02<00:00, 4519299.58B/s]


TrainFeat size 8778; valFeat size 1036
Using classifier LogisticRegression
Running predictions..
              precision    recall  f1-score   support

           0       0.85      0.81      0.82       518
           1       0.81      0.85      0.83       518

    accuracy                           0.83      1036
   macro avg       0.83      0.83      0.83      1036
weighted avg       0.83      0.83      0.83      1036

Japanese; BOW; LR
Running for language: japanese; With featureType bow




  0%|          | 0/2 [00:00<?, ?it/s]

Loading a tokenizer: japanese
Loading a tokenizer: japanese
Loading a tokenizer: japanese
TrainFeat size 8778; valFeat size 1036
Using classifier LogisticRegression
Running predictions..
              precision    recall  f1-score   support

           0       0.89      0.76      0.82       518
           1       0.79      0.90      0.84       518

    accuracy                           0.83      1036
   macro avg       0.84      0.83      0.83      1036
weighted avg       0.84      0.83      0.83      1036



In [None]:
doClassification('english', 'bow', 'lr')

Running for language: english; With featureType bow




  0%|          | 0/2 [00:00<?, ?it/s]

Loading a tokenizer: english
Loading a tokenizer: english
Loading a tokenizer: english
TrainFeat size 7389; valFeat size 990
Using classifier LogisticRegression
Running predictions..
              precision    recall  f1-score   support

           0       0.90      0.71      0.80       495
           1       0.76      0.92      0.84       495

    accuracy                           0.82       990
   macro avg       0.83      0.82      0.82       990
weighted avg       0.83      0.82      0.82       990



### Task B)
Investigate how the
questions in the training set normally begin and end. Make an overview
of the most common first and last tokens in a question
for each of the three languages (English, Finnish, Japanese). Look at both
question words and non-question words that frequently appear as the first
or last token in a question.
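
One way to build such an overview is to count the first and last token of every tokenized question. This is a minimal sketch, assuming the questions are already tokenized into lists of strings; the `first_last_counts` helper and the toy `questions` list are illustrative only and not part of the notebook's code.

```python
from collections import Counter

def first_last_counts(tokenized_questions):
    """Count how often each token appears as the first and as the last token."""
    first, last = Counter(), Counter()
    for tokens in tokenized_questions:
        if not tokens:  # skip questions that produced no tokens
            continue
        first[tokens[0]] += 1
        last[tokens[-1]] += 1
    return first, last

# Toy tokenized questions (hypothetical data).
questions = [
    ["When", "was", "it", "built", "?"],
    ["Who", "built", "it", "?"],
    ["When", "did", "it", "open", "?"],
    [],
]

first, last = first_last_counts(questions)
print(first.most_common(2))
print(last.most_common(1))
```

`Counter.most_common(n)` returns the `n` most frequent tokens with their counts, which is enough for a table or a bar plot per language.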

In [None]:
ds = loadDataset()
language = 'english'
tokenizer = loadTokenizer(language)
trainSet = getDataPoints(ds, language, 'train')
valSet = getDataPoints(ds, language, 'validation')




  0%|          | 0/2 [00:00<?, ?it/s]

Loading a tokenizer: english
Loading a tokenizer: english
Loading a tokenizer: english


# Change up code - Hannah

In [None]:
def loadDataset():
  """ Returns the TyDiQa training and validation data (English, Finnish, Japanese) as dataframes """
  dataset = load_dataset("copenlu/answerable_tydiqa")
  dataset = dataset.filter(lambda x: x["language"] == "english" or x["language"] == "finnish" or x["language"] == "japanese")
  train_set = dataset["train"]
  validation_set = dataset["validation"]
  training_data = answer_available(pd.DataFrame.from_dict(train_set))
  validation_data = answer_available(pd.DataFrame.from_dict(validation_set))
  training_data['question_context'] = training_data['question_text'] + ' ' + training_data['document_plaintext']
  validation_data['question_context'] = validation_data['question_text'] + ' ' + validation_data['document_plaintext']
  return training_data, validation_data

def answer_available(df):
    """Add a column indicating whether an answer is available in the context or not
    0: no answer available, 1: answer available"""
    df['answer_available'] = [0 if df['annotations'][i]['answer_text'] == [''] else 1 for i in range(len(df))]
    return df

def split_data(train_df, val_df, language):
    """Get training and validation data for one language"""
    train = train_df[train_df["language"] == language].reset_index(drop = True)
    val = val_df[val_df["language"] == language].reset_index(drop = True)
    return train, val

def preprocessing(text):
    """Prepare text: remove bracketed reference numbers such as [12], ASCII punctuation,
    repeated whitespace, and the special quotation mark ” (found in the Japanese text).
    Lowercasing is currently disabled."""
    # text = text.lower()
    text = re.sub(r'\[[0-9]+\]', '', text)  # bracketed reference numbers
    text = text.translate(str.maketrans('', '', string.punctuation))
    text = re.sub(r'\s+', ' ', text)  # collapse runs of whitespace
    text = re.sub('”', '', text)
    return text

def do_nothing(tokens):
    return tokens

def tokenizeData(data, tokenizer, language, preprocessor = False):
    """Takes a pandas Series of strings and returns a Series of token lists"""
    if preprocessor:
        data = data.apply(preprocessing)
    if tokenizer == 'bert':
        # Subword tokenization with a pretrained BERT tokenizer
        if language == 'english':
            tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        elif language == 'finnish':
            tokenizer = AutoTokenizer.from_pretrained("TurkuNLP/bert-base-finnish-uncased-v1")
        elif language == 'japanese':
            # Not recommended
            # https://github.com/polm/ipadic-py
            tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
        output_data = data.apply(lambda x: tokenizer.tokenize(x))
    elif tokenizer == 'spacy':
        if language == 'english':
            tokenizer = spacy.load('en_core_web_sm')
        elif language == 'finnish':
            tokenizer = spacy.load("fi_core_news_sm")
        elif language == 'japanese':
            tokenizer = spacy.load('ja_core_news_sm')
        output_data = data.apply(lambda x: [token.text for token in tokenizer(x)])
    elif tokenizer == 'nltk':
        if language == 'english':
            output_data = data.apply(lambda x: word_tokenize(x, language = 'english'))
        elif language == 'finnish':
            output_data = data.apply(lambda x: word_tokenize(x, language = 'finnish'))
        elif language == 'japanese':
            print('NLTK does not support tokenization in japanese.')
            output_data = None
    elif tokenizer == 'fugashi':
        if language == 'japanese':
            tagger = fugashi.Tagger()
            output_data = data.apply(lambda x: [token.surface for token in tagger(x)])
        else:
            print('Fugashi only supports the language Japanese.')
            output_data = None
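    else:
        # Fail early instead of returning an undefined variable
        raise ValueError(f"Unsupported tokenizer: {tokenizer!r}")
    return output_data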

def most_common_tokens(data, language, save = False, plot = True):
    """Takes an iterable of token lists and counts the first and last token of each.
    Returns two dataframes (plot = False) or draws bar plots of the ten most common tokens (plot = True)"""
    first_tokens = dict()
    last_tokens = dict()
    for tokens in data:
        try: 
            if tokens[0] in first_tokens.keys():
                first_tokens[tokens[0]] += 1
            else:
                first_tokens[tokens[0]] = 1
            if tokens[-1] in last_tokens.keys():
                last_tokens[tokens[-1]] += 1
            else:
                last_tokens[tokens[-1]] = 1
        except IndexError:
            pass
     # Sort dict
    first_tokens = dict(sorted(first_tokens.items(), key=lambda item: item[1], reverse = True))
    last_tokens = dict(sorted(last_tokens.items(), key=lambda item: item[1], reverse = True))

    if language == 'finnish' or language == 'japanese':
        first_translated = dict()
        last_translated = dict()
        translator = Translator()
        for key in first_tokens.keys():
            try: 
                first_translated[f'{key}_{translator.translate(key).text}'] = first_tokens[key]
            except Exception:
                first_translated[f'{key}_[No translation available]'] = first_tokens[key]
        for key in last_tokens.keys():
            try:
                last_translated[f'{key}_{translator.translate(key).text}'] = last_tokens[key]
            except Exception:
                last_translated[f'{key}_[No translation available]'] = last_tokens[key]
        # Sanity check: every token should have exactly one translated key
        print(len(first_tokens.keys()), len(first_translated.keys()))
        print(len(last_tokens.keys()), len(last_translated.keys()))
    
    if plot == False:
        if language == 'english':
            # Create dataframes
            df_first = pd.DataFrame({'token': first_tokens.keys(),
                      'count': first_tokens.values()})
            df_last = pd.DataFrame({'token': last_tokens.keys(),
                      'count': last_tokens.values()})
            return df_first, df_last
        else:
            # Create dataframes
            df_first = pd.DataFrame({'token': first_tokens.keys(), 
                                       'translated_token': first_translated.keys(),
                                       'count': first_tokens.values()})
            df_last = pd.DataFrame({'token': last_tokens.keys(),
                                      'translated_token': last_translated.keys(),
                                      'count': last_tokens.values()})
        return df_first, df_last
    
    else:
        if language == 'english':
            # Plot in barplot
            figure, axis = plt.subplots(1, 2, figsize=(12,5))

            axis[0].bar(list(first_tokens.keys())[:10], list(first_tokens.values())[:10])
            axis[0].set_title(f"Most common first tokens - {language}")
            axis[0].set_xticklabels(list(first_tokens.keys())[:10], rotation=45)
            # Last tokens
            axis[1].bar(list(last_tokens.keys())[:10], list(last_tokens.values())[:10])
            axis[1].set_title(f"Most common last tokens - {language}")
            axis[1].set_xticklabels(list(last_tokens.keys())[:10], rotation=45)
        else:
            figure, axis = plt.subplots(2, 2, figsize=(12,15))

            axis[0,0].bar(list(first_tokens.keys())[:10], list(first_tokens.values())[:10])
            axis[0,0].set_title(f"Most common first tokens - {language}")
            axis[0,0].set_xticklabels(list(first_tokens.keys())[:10], rotation=45)
                        # Last tokens
            axis[0,1].bar(list(last_tokens.keys())[:10], list(last_tokens.values())[:10])
            axis[0,1].set_title(f"Most common last tokens - {language}")
            axis[0,1].set_xticklabels(list(last_tokens.keys())[:10], rotation=45)

            axis[1,0].bar(list(first_translated.keys())[:10], list(first_translated.values())[:10])
            axis[1,0].set_title(f"Most common first tokens - translated")
            axis[1,0].set_xticklabels(list(first_translated.keys())[:10], rotation=45)
                        # Last tokens
            axis[1,1].bar(list(last_translated.keys())[:10], list(last_translated.values())[:10])
            axis[1,1].set_title(f"Most common last tokens - translated")
            axis[1,1].set_xticklabels(list(last_translated.keys())[:10], rotation=45)

        if save == False:
            plt.show()

        else:
            plt.savefig(f'token_overview_{"_".join(time.ctime().split())}.png')

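def lr_classifier(training_tokens, validation_tokens, y_train, y_val, tfidf = False):
    """Vectorize pre-tokenized text (counts or tf-idf), fit a logistic regression,
    and return F1 and accuracy on the validation data"""
    # Count vs. Tfidf vectorizer
    if not tfidf: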
        vectorizer = CountVectorizer(tokenizer = do_nothing, 
                                    preprocessor = do_nothing)
    else:
        vectorizer = TfidfVectorizer(tokenizer = do_nothing, 
                                    preprocessor = do_nothing)
    
    # Creating the document-feature matrix (dfm)
    vectorizer = vectorizer.fit(training_tokens)
    X_train = vectorizer.transform(training_tokens)
    X_val = vectorizer.transform(validation_tokens)
    
    # Training LogisticRegression, random_state set to 42 
    log_mod = LogisticRegression(random_state = 42, max_iter = 1000).fit(X_train, y_train)
    
    # Evaluating model
    preds = log_mod.predict(X_val)
    f1 = f1_score(y_val, preds)
    acc = accuracy_score(y_val, preds)
    print(f'F1 = {f1}')
    print(f'Accuracy = {acc}')
    print(classification_report(y_val, preds))
    return f1, acc

def test_classifier(train_data, val_data, language, tokenizer, inp = 'question', save = False, common_tokens = False):
    """Putting it all together"""

    if tokenizer == 'nltk' and language == 'japanese':
        print("NLTK does not support the language Japanese.")
    
    elif tokenizer == 'fugashi' and language != 'japanese':
        print("fugashi only supports the language japanese.")
      
    else:
        y_train = np.array(train_data['answer_available'])
        y_val = np.array(val_data['answer_available'])

        if inp == 'question':
            train = train_data['question_text']
            val = val_data['question_text']
        elif inp == 'context':
            train = train_data['document_plaintext']
            val = val_data['document_plaintext']
        elif inp == 'question_context':
            train = train_data['question_context']
            val = val_data['question_context']

        
        # No preprocessing, CountVectorizer
        print(f"Tokenizer: {tokenizer}, No preprocessing, no weighting")
        train_tok = tokenizeData(train, tokenizer = tokenizer, language = language)
        val_tok = tokenizeData(val, tokenizer = tokenizer, language = language)
        if common_tokens == True:
            most_common_tokens(train_tok, language = language, save = save)
        f1_np, acc_np = lr_classifier(train_tok, val_tok, y_train, y_val)

        # No preprocessing, TfidfVectorizer
        print(f"Tokenizer: {tokenizer}, No preprocessing, tfidf weighting")
        f1_nptfidf, acc_nptfidf = lr_classifier(train_tok, val_tok, y_train, y_val, tfidf = True)

        # Preprocessing, CountVectorizer
        print(f"Tokenizer: {tokenizer}, Preprocessing, no weighting")
        train_tok = tokenizeData(train, tokenizer = tokenizer, language = language, preprocessor = True)
        val_tok = tokenizeData(val, tokenizer = tokenizer, language = language, preprocessor = True)
        if common_tokens == True:
            most_common_tokens(train_tok, language = language, save = save)
        f1_p, acc_p = lr_classifier(train_tok, val_tok, y_train, y_val)

        # Preprocessing, TfidfVectorizer
        print(f"Tokenizer: {tokenizer}, Preprocessing, tfidf weighting")
        f1_ptfidf, acc_ptfidf = lr_classifier(train_tok, val_tok, y_train, y_val, tfidf = True)
        return f1_np, acc_np, f1_nptfidf, acc_nptfidf, f1_p, acc_p, f1_ptfidf, acc_ptfidf
    

In [None]:
training_data, validation_data = loadDataset()
# Split by language
train_en, val_en = split_data(training_data, validation_data, 'english')
train_fi, val_fi = split_data(training_data, validation_data, 'finnish')
train_ja, val_ja = split_data(training_data, validation_data, 'japanese')

Downloading metadata:   0%|          | 0.00/2.47k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/4.94k [00:00<?, ?B/s]



Downloading and preparing dataset None/None (download: 75.43 MiB, generated: 131.78 MiB, post-processed: Unknown size, total: 207.21 MiB) to /root/.cache/huggingface/datasets/copenlu___parquet/copenlu--nlp_course_tydiqa-9ffd3d37cf2899c6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec...


Dataset parquet downloaded and prepared to /root/.cache/huggingface/datasets/copenlu___parquet/copenlu--nlp_course_tydiqa-9ffd3d37cf2899c6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec. Subsequent calls will reuse this data.


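
The next cell collects one row per (language, input, tokenizer) combination into a `results` table with four F1 and four accuracy columns. As a rough sketch of how a table of that shape could be summarized afterwards, the snippet below reshapes the F1 columns into long format and picks the best setting per language. The names `results_demo`, `long_f1` and `best_per_language` are hypothetical, and the three rows are rounded copies of values printed in the output of the evaluation run, not a complete result set.

```python
import pandas as pd

columns = ['language', 'input', 'tokenizer',
           'f1_noprep_count', 'acc_noprep_count',
           'f1_noprep_tfidf', 'acc_noprep_tfidf',
           'f1_prep_count', 'acc_prep_count',
           'f1_prep_tfidf', 'acc_prep_tfidf']

# Three rows copied (rounded to 4 decimals) from the printed evaluation output
rows = [
    ['english', 'question', 'bert', 0.5317, 0.5000, 0.5245, 0.5000, 0.5438, 0.5000, 0.5075, 0.5000],
    ['english', 'context', 'nltk', 0.7393, 0.7414, 0.7682, 0.7556, 0.7112, 0.7162, 0.7403, 0.7293],
    ['finnish', 'context', 'nltk', 0.7385, 0.7467, 0.7346, 0.7218, 0.7167, 0.7266, 0.7179, 0.7064],
]
results_demo = pd.DataFrame(rows, columns=columns)

# Reshape: one row per (language, input, tokenizer, setting) with its F1 score
f1_cols = [c for c in columns if c.startswith('f1_')]
long_f1 = results_demo.melt(
    id_vars=['language', 'input', 'tokenizer'],
    value_vars=f1_cols,
    var_name='setting',
    value_name='f1',
)

# Keep the highest-scoring configuration for each language
best_idx = long_f1.groupby('language')['f1'].idxmax()
best_per_language = long_f1.loc[best_idx].reset_index(drop=True)
print(best_per_language)
```

The same reshaping could be applied to the accuracy columns by filtering on the `acc_` prefix instead.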

In [None]:
results = pd.DataFrame(columns = ['language', 'input', 'tokenizer', 'f1_noprep_count','acc_noprep_count', 
                                  'f1_noprep_tfidf', 'acc_noprep_tfidf', 'f1_prep_count', 'acc_prep_count', 'f1_prep_tfidf', 'acc_prep_tfidf'])

# Reuse the per-language splits computed above instead of calling split_data again
splits = {
    'english': (train_en, val_en),
    'finnish': (train_fi, val_fi),
    'japanese': (train_ja, val_ja),
}

for lan, (train, val) in splits.items():
    for feature in ['question', 'context', 'question_context']:
        print(f"Language: {lan}, Input: {feature}")
        for tok in ['bert', 'spacy', 'nltk']: # use fugashi in jupyter notebook
            # The nltk tokenizer is skipped for Japanese
            if lan == 'japanese' and tok == 'nltk':
                continue
            scores = test_classifier(train, val, tokenizer = tok, language = lan, inp = feature)
            # scores holds the eight values in the same order as the results columns
            results.loc[len(results)] = [lan, feature, tok, *scores]

results.to_csv('results_lab1.csv')

Language: english, Input: question
Tokenizer: bert, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5316934720908232
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.43      0.46       495
           1       0.50      0.57      0.53       495

    accuracy                           0.50       990
   macro avg       0.50      0.50      0.50       990
weighted avg       0.50      0.50      0.50       990

Tokenizer: bert, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5244956772334294
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.45      0.47       495
           1       0.50      0.55      0.52       495

    accuracy                           0.50       990
   macro avg       0.50      0.50      0.50       990
weighted avg       0.50      0.50      0.50       990

Tokenizer: bert, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.543778801843318
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.40      0.45       495
           1       0.50      0.60      0.54       495

    accuracy                           0.50       990
   macro avg       0.50      0.50      0.50       990
weighted avg       0.50      0.50      0.50       990

Tokenizer: bert, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5074626865671642
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.48      0.49       495
           1       0.50      0.52      0.51       495

    accuracy                           0.50       990
   macro avg       0.50      0.50      0.50       990
weighted avg       0.50      0.50      0.50       990

Tokenizer: spacy, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5352112676056339
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.42      0.46       495
           1       0.50      0.58      0.54       495

    accuracy                           0.50       990
   macro avg       0.50      0.50      0.50       990
weighted avg       0.50      0.50      0.50       990

Tokenizer: spacy, No preprocessing, tfidf weighting
F1 = 0.5316934720908232
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.43      0.46       495
           1       0.50      0.57      0.53       495

    accuracy                           0.50       990
   macro avg       0.50      0.50      0.50       990
weighted avg       0.50      0.50      0.50       990

Tokenizer: spacy, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"
  "The parameter 'token_pattern' will not be used"


F1 = 0.5446182152713892
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.40      0.45       495
           1       0.50      0.60      0.54       495

    accuracy                           0.50       990
   macro avg       0.50      0.50      0.50       990
weighted avg       0.50      0.50      0.50       990

Tokenizer: spacy, Preprocessing, tfidf weighting
F1 = 0.4943820224719101
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.51      0.51       495
           1       0.50      0.49      0.49       495

    accuracy                           0.50       990
   macro avg       0.50      0.50      0.50       990
weighted avg       0.50      0.50      0.50       990

Tokenizer: nltk, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"
  "The parameter 'token_pattern' will not be used"


F1 = 0.5403899721448469
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.41      0.45       495
           1       0.50      0.59      0.54       495

    accuracy                           0.50       990
   macro avg       0.50      0.50      0.50       990
weighted avg       0.50      0.50      0.50       990

Tokenizer: nltk, No preprocessing, tfidf weighting
F1 = 0.5299145299145299
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.44      0.47       495
           1       0.50      0.56      0.53       495

    accuracy                           0.50       990
   macro avg       0.50      0.50      0.50       990
weighted avg       0.50      0.50      0.50       990

Tokenizer: nltk, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"
  "The parameter 'token_pattern' will not be used"


F1 = 0.5217391304347826
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.45      0.48       495
           1       0.50      0.55      0.52       495

    accuracy                           0.50       990
   macro avg       0.50      0.50      0.50       990
weighted avg       0.50      0.50      0.50       990

Tokenizer: nltk, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.4728434504792332
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.55      0.52       495
           1       0.50      0.45      0.47       495

    accuracy                           0.50       990
   macro avg       0.50      0.50      0.50       990
weighted avg       0.50      0.50      0.50       990

Language: english, Input: context
Tokenizer: bert, No preprocessing, no weighting


Token indices sequence length is longer than the specified maximum sequence length for this model (1004 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (645 > 512). Running this sequence through the model will result in indexing errors
  "The parameter 'token_pattern' will not be used"


F1 = 0.7170984455958549
Accuracy = 0.7242424242424242
              precision    recall  f1-score   support

           0       0.71      0.75      0.73       495
           1       0.74      0.70      0.72       495

    accuracy                           0.72       990
   macro avg       0.72      0.72      0.72       990
weighted avg       0.72      0.72      0.72       990

Tokenizer: bert, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7397540983606558
Accuracy = 0.7434343434343434
              precision    recall  f1-score   support

           0       0.74      0.76      0.75       495
           1       0.75      0.73      0.74       495

    accuracy                           0.74       990
   macro avg       0.74      0.74      0.74       990
weighted avg       0.74      0.74      0.74       990

Tokenizer: bert, Preprocessing, no weighting


Token indices sequence length is longer than the specified maximum sequence length for this model (895 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (680 > 512). Running this sequence through the model will result in indexing errors
  "The parameter 'token_pattern' will not be used"


F1 = 0.7001044932079415
Accuracy = 0.7101010101010101
              precision    recall  f1-score   support

           0       0.70      0.74      0.72       495
           1       0.73      0.68      0.70       495

    accuracy                           0.71       990
   macro avg       0.71      0.71      0.71       990
weighted avg       0.71      0.71      0.71       990

Tokenizer: bert, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7199170124481327
Accuracy = 0.7272727272727273
              precision    recall  f1-score   support

           0       0.72      0.75      0.73       495
           1       0.74      0.70      0.72       495

    accuracy                           0.73       990
   macro avg       0.73      0.73      0.73       990
weighted avg       0.73      0.73      0.73       990

Tokenizer: spacy, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7272727272727272
Accuracy = 0.7303030303030303
              precision    recall  f1-score   support

           0       0.73      0.74      0.73       495
           1       0.74      0.72      0.73       495

    accuracy                           0.73       990
   macro avg       0.73      0.73      0.73       990
weighted avg       0.73      0.73      0.73       990

Tokenizer: spacy, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7471042471042473
Accuracy = 0.7353535353535353
              precision    recall  f1-score   support

           0       0.76      0.69      0.72       495
           1       0.72      0.78      0.75       495

    accuracy                           0.74       990
   macro avg       0.74      0.74      0.73       990
weighted avg       0.74      0.74      0.73       990

Tokenizer: spacy, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7218813905930471
Accuracy = 0.7252525252525253
              precision    recall  f1-score   support

           0       0.72      0.74      0.73       495
           1       0.73      0.71      0.72       495

    accuracy                           0.73       990
   macro avg       0.73      0.73      0.73       990
weighted avg       0.73      0.73      0.73       990

Tokenizer: spacy, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.74540174249758
Accuracy = 0.7343434343434343
              precision    recall  f1-score   support

           0       0.76      0.69      0.72       495
           1       0.72      0.78      0.75       495

    accuracy                           0.73       990
   macro avg       0.74      0.73      0.73       990
weighted avg       0.74      0.73      0.73       990

Tokenizer: nltk, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7393075356415478
Accuracy = 0.7414141414141414
              precision    recall  f1-score   support

           0       0.74      0.75      0.74       495
           1       0.75      0.73      0.74       495

    accuracy                           0.74       990
   macro avg       0.74      0.74      0.74       990
weighted avg       0.74      0.74      0.74       990

Tokenizer: nltk, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.768199233716475
Accuracy = 0.7555555555555555
              precision    recall  f1-score   support

           0       0.79      0.70      0.74       495
           1       0.73      0.81      0.77       495

    accuracy                           0.76       990
   macro avg       0.76      0.76      0.75       990
weighted avg       0.76      0.76      0.75       990

Tokenizer: nltk, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.71120246659815
Accuracy = 0.7161616161616161
              precision    recall  f1-score   support

           0       0.71      0.73      0.72       495
           1       0.72      0.70      0.71       495

    accuracy                           0.72       990
   macro avg       0.72      0.72      0.72       990
weighted avg       0.72      0.72      0.72       990

Tokenizer: nltk, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7403100775193799
Accuracy = 0.7292929292929293
              precision    recall  f1-score   support

           0       0.75      0.69      0.72       495
           1       0.71      0.77      0.74       495

    accuracy                           0.73       990
   macro avg       0.73      0.73      0.73       990
weighted avg       0.73      0.73      0.73       990

Language: english, Input: question_context
Tokenizer: bert, No preprocessing, no weighting


Token indices sequence length is longer than the specified maximum sequence length for this model (1014 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (655 > 512). Running this sequence through the model will result in indexing errors
  "The parameter 'token_pattern' will not be used"


F1 = 0.721243523316062
Accuracy = 0.7282828282828283
              precision    recall  f1-score   support

           0       0.72      0.75      0.73       495
           1       0.74      0.70      0.72       495

    accuracy                           0.73       990
   macro avg       0.73      0.73      0.73       990
weighted avg       0.73      0.73      0.73       990

Tokenizer: bert, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7166324435318275
Accuracy = 0.7212121212121212
              precision    recall  f1-score   support

           0       0.71      0.74      0.73       495
           1       0.73      0.71      0.72       495

    accuracy                           0.72       990
   macro avg       0.72      0.72      0.72       990
weighted avg       0.72      0.72      0.72       990

Tokenizer: bert, Preprocessing, no weighting


Token indices sequence length is longer than the specified maximum sequence length for this model (904 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (518 > 512). Running this sequence through the model will result in indexing errors
  "The parameter 'token_pattern' will not be used"


F1 = 0.6833333333333333
Accuracy = 0.692929292929293
              precision    recall  f1-score   support

           0       0.68      0.72      0.70       495
           1       0.71      0.66      0.68       495

    accuracy                           0.69       990
   macro avg       0.69      0.69      0.69       990
weighted avg       0.69      0.69      0.69       990

Tokenizer: bert, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.6989690721649484
Accuracy = 0.705050505050505
              precision    recall  f1-score   support

           0       0.70      0.73      0.71       495
           1       0.71      0.68      0.70       495

    accuracy                           0.71       990
   macro avg       0.71      0.71      0.70       990
weighted avg       0.71      0.71      0.70       990

Tokenizer: spacy, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7325933400605449
Accuracy = 0.7323232323232324
              precision    recall  f1-score   support

           0       0.73      0.73      0.73       495
           1       0.73      0.73      0.73       495

    accuracy                           0.73       990
   macro avg       0.73      0.73      0.73       990
weighted avg       0.73      0.73      0.73       990

Tokenizer: spacy, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7392996108949416
Accuracy = 0.7292929292929293
              precision    recall  f1-score   support

           0       0.75      0.69      0.72       495
           1       0.71      0.77      0.74       495

    accuracy                           0.73       990
   macro avg       0.73      0.73      0.73       990
weighted avg       0.73      0.73      0.73       990

Tokenizer: spacy, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7082906857727738
Accuracy = 0.7121212121212122
              precision    recall  f1-score   support

           0       0.71      0.73      0.72       495
           1       0.72      0.70      0.71       495

    accuracy                           0.71       990
   macro avg       0.71      0.71      0.71       990
weighted avg       0.71      0.71      0.71       990

Tokenizer: spacy, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7246093750000001
Accuracy = 0.7151515151515152
              precision    recall  f1-score   support

           0       0.73      0.68      0.71       495
           1       0.70      0.75      0.72       495

    accuracy                           0.72       990
   macro avg       0.72      0.72      0.71       990
weighted avg       0.72      0.72      0.71       990

Tokenizer: nltk, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7349027635619243
Accuracy = 0.7383838383838384
              precision    recall  f1-score   support

           0       0.73      0.75      0.74       495
           1       0.74      0.73      0.73       495

    accuracy                           0.74       990
   macro avg       0.74      0.74      0.74       990
weighted avg       0.74      0.74      0.74       990

Tokenizer: nltk, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7591522157996147
Accuracy = 0.7474747474747475
              precision    recall  f1-score   support

           0       0.77      0.70      0.73       495
           1       0.73      0.80      0.76       495

    accuracy                           0.75       990
   macro avg       0.75      0.75      0.75       990
weighted avg       0.75      0.75      0.75       990

Tokenizer: nltk, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7037037037037037
Accuracy = 0.7090909090909091
              precision    recall  f1-score   support

           0       0.70      0.73      0.71       495
           1       0.72      0.69      0.70       495

    accuracy                           0.71       990
   macro avg       0.71      0.71      0.71       990
weighted avg       0.71      0.71      0.71       990

Tokenizer: nltk, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7247796278158668
Accuracy = 0.7161616161616161
              precision    recall  f1-score   support

           0       0.73      0.68      0.71       495
           1       0.70      0.75      0.72       495

    accuracy                           0.72       990
   macro avg       0.72      0.72      0.72       990
weighted avg       0.72      0.72      0.72       990

Language: finnish, Input: question
Tokenizer: bert, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5218377765173
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.45      0.48       843
           1       0.50      0.55      0.52       843

    accuracy                           0.50      1686
   macro avg       0.50      0.50      0.50      1686
weighted avg       0.50      0.50      0.50      1686

Tokenizer: bert, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5107370864770748
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.48      0.49       843
           1       0.50      0.52      0.51       843

    accuracy                           0.50      1686
   macro avg       0.50      0.50      0.50      1686
weighted avg       0.50      0.50      0.50      1686

Tokenizer: bert, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5334809075816269
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.43      0.46       843
           1       0.50      0.57      0.53       843

    accuracy                           0.50      1686
   macro avg       0.50      0.50      0.50      1686
weighted avg       0.50      0.50      0.50      1686

Tokenizer: bert, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.49970326409495547
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.50      0.50       843
           1       0.50      0.50      0.50       843

    accuracy                           0.50      1686
   macro avg       0.50      0.50      0.50      1686
weighted avg       0.50      0.50      0.50      1686

Tokenizer: spacy, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5765946760421898
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.32      0.39       843
           1       0.50      0.68      0.58       843

    accuracy                           0.50      1686
   macro avg       0.50      0.50      0.48      1686
weighted avg       0.50      0.50      0.48      1686

Tokenizer: spacy, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5731645569620254
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.33      0.40       843
           1       0.50      0.67      0.57       843

    accuracy                           0.50      1686
   macro avg       0.50      0.50      0.48      1686
weighted avg       0.50      0.50      0.48      1686

Tokenizer: spacy, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5857493857493858
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.29      0.37       843
           1       0.50      0.71      0.59       843

    accuracy                           0.50      1686
   macro avg       0.50      0.50      0.48      1686
weighted avg       0.50      0.50      0.48      1686

Tokenizer: spacy, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5415986949429037
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.41      0.45       843
           1       0.50      0.59      0.54       843

    accuracy                           0.50      1686
   macro avg       0.50      0.50      0.50      1686
weighted avg       0.50      0.50      0.50      1686

Tokenizer: nltk, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5782891445722862
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.31      0.39       843
           1       0.50      0.69      0.58       843

    accuracy                           0.50      1686
   macro avg       0.50      0.50      0.48      1686
weighted avg       0.50      0.50      0.48      1686

Tokenizer: nltk, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5841144548593981
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.30      0.37       843
           1       0.50      0.70      0.58       843

    accuracy                           0.50      1686
   macro avg       0.50      0.50      0.48      1686
weighted avg       0.50      0.50      0.48      1686

Tokenizer: nltk, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5824665676077266
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.30      0.38       843
           1       0.50      0.70      0.58       843

    accuracy                           0.50      1686
   macro avg       0.50      0.50      0.48      1686
weighted avg       0.50      0.50      0.48      1686

Tokenizer: nltk, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5410996189439303
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.41      0.45       843
           1       0.50      0.59      0.54       843

    accuracy                           0.50      1686
   macro avg       0.50      0.50      0.50      1686
weighted avg       0.50      0.50      0.50      1686

Language: finnish, Input: context
Tokenizer: bert, No preprocessing, no weighting


Token indices sequence length is longer than the specified maximum sequence length for this model (721 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1582 > 512). Running this sequence through the model will result in indexing errors
  "The parameter 'token_pattern' will not be used"


F1 = 0.70412432755529
Accuracy = 0.7064056939501779
              precision    recall  f1-score   support

           0       0.70      0.71      0.71       843
           1       0.71      0.70      0.70       843

    accuracy                           0.71      1686
   macro avg       0.71      0.71      0.71      1686
weighted avg       0.71      0.71      0.71      1686

Tokenizer: bert, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7157894736842105
Accuracy = 0.7277580071174378
              precision    recall  f1-score   support

           0       0.71      0.77      0.74       843
           1       0.75      0.69      0.72       843

    accuracy                           0.73      1686
   macro avg       0.73      0.73      0.73      1686
weighted avg       0.73      0.73      0.73      1686

Tokenizer: bert, Preprocessing, no weighting


Token indices sequence length is longer than the specified maximum sequence length for this model (711 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1184 > 512). Running this sequence through the model will result in indexing errors
  "The parameter 'token_pattern' will not be used"


F1 = 0.6923540036122817
Accuracy = 0.6969157769869514
              precision    recall  f1-score   support

           0       0.69      0.71      0.70       843
           1       0.70      0.68      0.69       843

    accuracy                           0.70      1686
   macro avg       0.70      0.70      0.70      1686
weighted avg       0.70      0.70      0.70      1686

Tokenizer: bert, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7025355596784167
Accuracy = 0.7147093712930012
              precision    recall  f1-score   support

           0       0.70      0.76      0.73       843
           1       0.73      0.67      0.70       843

    accuracy                           0.71      1686
   macro avg       0.72      0.71      0.71      1686
weighted avg       0.72      0.71      0.71      1686

Tokenizer: spacy, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7084100675260896
Accuracy = 0.7182680901542111
              precision    recall  f1-score   support

           0       0.70      0.75      0.73       843
           1       0.73      0.68      0.71       843

    accuracy                           0.72      1686
   macro avg       0.72      0.72      0.72      1686
weighted avg       0.72      0.72      0.72      1686

Tokenizer: spacy, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7271714922048997
Accuracy = 0.7093712930011863
              precision    recall  f1-score   support

           0       0.74      0.64      0.69       843
           1       0.69      0.77      0.73       843

    accuracy                           0.71      1686
   macro avg       0.71      0.71      0.71      1686
weighted avg       0.71      0.71      0.71      1686

Tokenizer: spacy, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7152480097979179
Accuracy = 0.7241992882562278
              precision    recall  f1-score   support

           0       0.71      0.76      0.73       843
           1       0.74      0.69      0.72       843

    accuracy                           0.72      1686
   macro avg       0.73      0.72      0.72      1686
weighted avg       0.73      0.72      0.72      1686

Tokenizer: spacy, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7258156840297654
Accuracy = 0.7158956109134045
              precision    recall  f1-score   support

           0       0.73      0.68      0.71       843
           1       0.70      0.75      0.73       843

    accuracy                           0.72      1686
   macro avg       0.72      0.72      0.72      1686
weighted avg       0.72      0.72      0.72      1686

Tokenizer: nltk, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7385180649112063
Accuracy = 0.7467378410438908
              precision    recall  f1-score   support

           0       0.73      0.78      0.75       843
           1       0.76      0.72      0.74       843

    accuracy                           0.75      1686
   macro avg       0.75      0.75      0.75      1686
weighted avg       0.75      0.75      0.75      1686

Tokenizer: nltk, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7345783814374647
Accuracy = 0.7218268090154211
              precision    recall  f1-score   support

           0       0.75      0.67      0.71       843
           1       0.70      0.77      0.73       843

    accuracy                           0.72      1686
   macro avg       0.72      0.72      0.72      1686
weighted avg       0.72      0.72      0.72      1686

Tokenizer: nltk, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7166564228641672
Accuracy = 0.7265717674970344
              precision    recall  f1-score   support

           0       0.71      0.76      0.74       843
           1       0.74      0.69      0.72       843

    accuracy                           0.73      1686
   macro avg       0.73      0.73      0.73      1686
weighted avg       0.73      0.73      0.73      1686

Tokenizer: nltk, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7179487179487178
Accuracy = 0.7064056939501779
              precision    recall  f1-score   support

           0       0.72      0.67      0.69       843
           1       0.69      0.75      0.72       843

    accuracy                           0.71      1686
   macro avg       0.71      0.71      0.71      1686
weighted avg       0.71      0.71      0.71      1686

Language: finnish, Input: question_context
Tokenizer: bert, No preprocessing, no weighting


Token indices sequence length is longer than the specified maximum sequence length for this model (730 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1588 > 512). Running this sequence through the model will result in indexing errors
  "The parameter 'token_pattern' will not be used"


F1 = 0.703547805171377
Accuracy = 0.7075919335705813
              precision    recall  f1-score   support

           0       0.70      0.72      0.71       843
           1       0.71      0.69      0.70       843

    accuracy                           0.71      1686
   macro avg       0.71      0.71      0.71      1686
weighted avg       0.71      0.71      0.71      1686

Tokenizer: bert, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7136725935009198
Accuracy = 0.7230130486358244
              precision    recall  f1-score   support

           0       0.71      0.76      0.73       843
           1       0.74      0.69      0.71       843

    accuracy                           0.72      1686
   macro avg       0.72      0.72      0.72      1686
weighted avg       0.72      0.72      0.72      1686

Tokenizer: bert, Preprocessing, no weighting


Token indices sequence length is longer than the specified maximum sequence length for this model (719 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1189 > 512). Running this sequence through the model will result in indexing errors
  "The parameter 'token_pattern' will not be used"


F1 = 0.692493946731235
Accuracy = 0.6986951364175563
              precision    recall  f1-score   support

           0       0.69      0.72      0.70       843
           1       0.71      0.68      0.69       843

    accuracy                           0.70      1686
   macro avg       0.70      0.70      0.70      1686
weighted avg       0.70      0.70      0.70      1686

Tokenizer: bert, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.702075702075702
Accuracy = 0.7105575326215896
              precision    recall  f1-score   support

           0       0.70      0.74      0.72       843
           1       0.72      0.68      0.70       843

    accuracy                           0.71      1686
   macro avg       0.71      0.71      0.71      1686
weighted avg       0.71      0.71      0.71      1686

Tokenizer: spacy, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7120743034055727
Accuracy = 0.7241992882562278
              precision    recall  f1-score   support

           0       0.71      0.77      0.74       843
           1       0.74      0.68      0.71       843

    accuracy                           0.72      1686
   macro avg       0.73      0.72      0.72      1686
weighted avg       0.73      0.72      0.72      1686

Tokenizer: spacy, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.718961625282167
Accuracy = 0.7046263345195729
              precision    recall  f1-score   support

           0       0.73      0.65      0.69       843
           1       0.69      0.76      0.72       843

    accuracy                           0.70      1686
   macro avg       0.71      0.70      0.70      1686
weighted avg       0.71      0.70      0.70      1686

Tokenizer: spacy, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7118012422360249
Accuracy = 0.7247924080664294
              precision    recall  f1-score   support

           0       0.71      0.77      0.74       843
           1       0.75      0.68      0.71       843

    accuracy                           0.72      1686
   macro avg       0.73      0.72      0.72      1686
weighted avg       0.73      0.72      0.72      1686

Tokenizer: spacy, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7135446685878963
Accuracy = 0.7052194543297746
              precision    recall  f1-score   support

           0       0.72      0.68      0.70       843
           1       0.69      0.73      0.71       843

    accuracy                           0.71      1686
   macro avg       0.71      0.71      0.70      1686
weighted avg       0.71      0.71      0.70      1686

Tokenizer: nltk, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7193528313627878
Accuracy = 0.732502965599051
              precision    recall  f1-score   support

           0       0.71      0.78      0.74       843
           1       0.76      0.69      0.72       843

    accuracy                           0.73      1686
   macro avg       0.73      0.73      0.73      1686
weighted avg       0.73      0.73      0.73      1686

Tokenizer: nltk, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7329902801600915
Accuracy = 0.7230130486358244
              precision    recall  f1-score   support

           0       0.74      0.69      0.71       843
           1       0.71      0.76      0.73       843

    accuracy                           0.72      1686
   macro avg       0.72      0.72      0.72      1686
weighted avg       0.72      0.72      0.72      1686

Tokenizer: nltk, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7080745341614907
Accuracy = 0.7212336892052195
              precision    recall  f1-score   support

           0       0.70      0.77      0.73       843
           1       0.74      0.68      0.71       843

    accuracy                           0.72      1686
   macro avg       0.72      0.72      0.72      1686
weighted avg       0.72      0.72      0.72      1686

Tokenizer: nltk, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7135446685878963
Accuracy = 0.7052194543297746
              precision    recall  f1-score   support

           0       0.72      0.68      0.70       843
           1       0.69      0.73      0.71       843

    accuracy                           0.71      1686
   macro avg       0.71      0.71      0.70      1686
weighted avg       0.71      0.71      0.70      1686

Language: japanese, Input: question
Tokenizer: bert, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5256410256410257
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.45      0.47       518
           1       0.50      0.55      0.53       518

    accuracy                           0.50      1036
   macro avg       0.50      0.50      0.50      1036
weighted avg       0.50      0.50      0.50      1036
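
The `F1` line printed above each report matches the f1-score of class `1`, which is why it can differ from `Accuracy` even on a balanced test set. A rough worked example for the run directly above: the confusion-matrix counts below are not printed in the log; they are inferred from the reported support (518 per class), recalls, and F1, so treat them as an assumption.

```python
# Counts inferred from the report above (assumption, not printed in the log):
# class 0 support = tn + fp = 518, class 1 support = fn + tp = 518.
tn, fp, fn, tp = 231, 287, 231, 287

total = tn + fp + fn + tp
accuracy = (tp + tn) / total

# Metrics for the positive class (label 1).
precision_pos = tp / (tp + fp)
recall_pos = tp / (tp + fn)
f1_pos = 2 * precision_pos * recall_pos / (precision_pos + recall_pos)

# Recall for the negative class (label 0), for comparison with the report.
recall_neg = tn / (tn + fp)

print(f"Accuracy = {accuracy}")
print(f"F1 (class 1) = {f1_pos}")
print(f"recall 0 = {recall_neg:.2f}, recall 1 = {recall_pos:.2f}")
```

With these counts the classifier predicts class `1` more often than class `0`, which raises recall and F1 for class `1` while accuracy stays at 0.5.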

Tokenizer: bert, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5595238095238095
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.36      0.42       518
           1       0.50      0.64      0.56       518

    accuracy                           0.50      1036
   macro avg       0.50      0.50      0.49      1036
weighted avg       0.50      0.50      0.49      1036

Tokenizer: bert, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5167910447761194
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.47      0.48       518
           1       0.50      0.53      0.52       518

    accuracy                           0.50      1036
   macro avg       0.50      0.50      0.50      1036
weighted avg       0.50      0.50      0.50      1036

Tokenizer: bert, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5632377740303541
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.36      0.42       518
           1       0.50      0.64      0.56       518

    accuracy                           0.50      1036
   macro avg       0.50      0.50      0.49      1036
weighted avg       0.50      0.50      0.49      1036

Tokenizer: spacy, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.49512670565302147
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.51      0.50       518
           1       0.50      0.49      0.50       518

    accuracy                           0.50      1036
   macro avg       0.50      0.50      0.50      1036
weighted avg       0.50      0.50      0.50      1036

Tokenizer: spacy, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5038314176245211
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.49      0.50       518
           1       0.50      0.51      0.50       518

    accuracy                           0.50      1036
   macro avg       0.50      0.50      0.50      1036
weighted avg       0.50      0.50      0.50      1036

Tokenizer: spacy, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.494140625
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.51      0.51       518
           1       0.50      0.49      0.49       518

    accuracy                           0.50      1036
   macro avg       0.50      0.50      0.50      1036
weighted avg       0.50      0.50      0.50      1036

Tokenizer: spacy, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.5066666666666666
Accuracy = 0.5
              precision    recall  f1-score   support

           0       0.50      0.49      0.49       518
           1       0.50      0.51      0.51       518

    accuracy                           0.50      1036
   macro avg       0.50      0.50      0.50      1036
weighted avg       0.50      0.50      0.50      1036

Language: japanese, Input: context
Tokenizer: bert, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.6805555555555556
Accuracy = 0.6891891891891891
              precision    recall  f1-score   support

           0       0.68      0.72      0.70       518
           1       0.70      0.66      0.68       518

    accuracy                           0.69      1036
   macro avg       0.69      0.69      0.69      1036
weighted avg       0.69      0.69      0.69      1036

Tokenizer: bert, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7131474103585657
Accuracy = 0.722007722007722
              precision    recall  f1-score   support

           0       0.71      0.75      0.73       518
           1       0.74      0.69      0.71       518

    accuracy                           0.72      1036
   macro avg       0.72      0.72      0.72      1036
weighted avg       0.72      0.72      0.72      1036

Tokenizer: bert, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.6901960784313725
Accuracy = 0.694980694980695
              precision    recall  f1-score   support

           0       0.69      0.71      0.70       518
           1       0.70      0.68      0.69       518

    accuracy                           0.69      1036
   macro avg       0.70      0.69      0.69      1036
weighted avg       0.70      0.69      0.69      1036

Tokenizer: bert, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7012987012987013
Accuracy = 0.7113899613899614
              precision    recall  f1-score   support

           0       0.70      0.75      0.72       518
           1       0.73      0.68      0.70       518

    accuracy                           0.71      1036
   macro avg       0.71      0.71      0.71      1036
weighted avg       0.71      0.71      0.71      1036

Tokenizer: spacy, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.698443579766537
Accuracy = 0.7007722007722008
              precision    recall  f1-score   support

           0       0.70      0.71      0.70       518
           1       0.70      0.69      0.70       518

    accuracy                           0.70      1036
   macro avg       0.70      0.70      0.70      1036
weighted avg       0.70      0.70      0.70      1036

Tokenizer: spacy, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7154929577464788
Accuracy = 0.7075289575289575
              precision    recall  f1-score   support

           0       0.72      0.68      0.70       518
           1       0.70      0.74      0.72       518

    accuracy                           0.71      1036
   macro avg       0.71      0.71      0.71      1036
weighted avg       0.71      0.71      0.71      1036

Tokenizer: spacy, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7007722007722008
Accuracy = 0.7007722007722008
              precision    recall  f1-score   support

           0       0.70      0.70      0.70       518
           1       0.70      0.70      0.70       518

    accuracy                           0.70      1036
   macro avg       0.70      0.70      0.70      1036
weighted avg       0.70      0.70      0.70      1036

Tokenizer: spacy, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.713472485768501
Accuracy = 0.7084942084942085
              precision    recall  f1-score   support

           0       0.72      0.69      0.70       518
           1       0.70      0.73      0.71       518

    accuracy                           0.71      1036
   macro avg       0.71      0.71      0.71      1036
weighted avg       0.71      0.71      0.71      1036

Language: japanese, Input: question_context
Tokenizer: bert, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.6746268656716419
Accuracy = 0.6843629343629344
              precision    recall  f1-score   support

           0       0.67      0.71      0.69       518
           1       0.70      0.65      0.67       518

    accuracy                           0.68      1036
   macro avg       0.69      0.68      0.68      1036
weighted avg       0.69      0.68      0.68      1036

Tokenizer: bert, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.703187250996016
Accuracy = 0.7123552123552124
              precision    recall  f1-score   support

           0       0.70      0.74      0.72       518
           1       0.73      0.68      0.70       518

    accuracy                           0.71      1036
   macro avg       0.71      0.71      0.71      1036
weighted avg       0.71      0.71      0.71      1036

Tokenizer: bert, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.6803519061583577
Accuracy = 0.6843629343629344
              precision    recall  f1-score   support

           0       0.68      0.70      0.69       518
           1       0.69      0.67      0.68       518

    accuracy                           0.68      1036
   macro avg       0.68      0.68      0.68      1036
weighted avg       0.68      0.68      0.68      1036

Tokenizer: bert, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.6875000000000001
Accuracy = 0.7007722007722008
              precision    recall  f1-score   support

           0       0.69      0.74      0.71       518
           1       0.72      0.66      0.69       518

    accuracy                           0.70      1036
   macro avg       0.70      0.70      0.70      1036
weighted avg       0.70      0.70      0.70      1036

Tokenizer: spacy, No preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.6867119301648885
Accuracy = 0.6882239382239382
              precision    recall  f1-score   support

           0       0.69      0.69      0.69       518
           1       0.69      0.68      0.69       518

    accuracy                           0.69      1036
   macro avg       0.69      0.69      0.69      1036
weighted avg       0.69      0.69      0.69      1036

Tokenizer: spacy, No preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7092469018112487
Accuracy = 0.7055984555984556
              precision    recall  f1-score   support

           0       0.71      0.69      0.70       518
           1       0.70      0.72      0.71       518

    accuracy                           0.71      1036
   macro avg       0.71      0.71      0.71      1036
weighted avg       0.71      0.71      0.71      1036

Tokenizer: spacy, Preprocessing, no weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.6797257590597453
Accuracy = 0.6843629343629344
              precision    recall  f1-score   support

           0       0.68      0.70      0.69       518
           1       0.69      0.67      0.68       518

    accuracy                           0.68      1036
   macro avg       0.68      0.68      0.68      1036
weighted avg       0.68      0.68      0.68      1036

Tokenizer: spacy, Preprocessing, tfidf weighting


  "The parameter 'token_pattern' will not be used"


F1 = 0.7127761767531221
Accuracy = 0.7113899613899614
              precision    recall  f1-score   support

           0       0.71      0.71      0.71       518
           1       0.71      0.72      0.71       518

    accuracy                           0.71      1036
   macro avg       0.71      0.71      0.71      1036
weighted avg       0.71      0.71      0.71      1036



In [None]:
# Mount Google Drive so files can be read from and written to it (Colab only)
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Write the collected results to a CSV file in the current working directory
results.to_csv('results_lab1.csv')
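
Once the results are in a table, the configurations can be ranked instead of being compared by eye in the log. The sketch below assumes `results` is a pandas DataFrame; its real column names are not visible in this section, so the columns used here (`language`, `input`, `tokenizer`, `preprocessing`, `weighting`, `f1`, `accuracy`) are hypothetical. The four rows are copied from the Japanese `context` runs printed above.

```python
import pandas as pd

# Hypothetical layout; the values are taken from the Japanese/context output above.
rows = [
    ("japanese", "context", "bert", False, "none", 0.6805555555555556, 0.6891891891891891),
    ("japanese", "context", "bert", False, "tfidf", 0.7131474103585657, 0.722007722007722),
    ("japanese", "context", "spacy", False, "tfidf", 0.7154929577464788, 0.7075289575289575),
    ("japanese", "context", "spacy", True, "tfidf", 0.713472485768501, 0.7084942084942085),
]
columns = ["language", "input", "tokenizer", "preprocessing", "weighting", "f1", "accuracy"]
sample_results = pd.DataFrame(rows, columns=columns)

# Rank configurations by F1, highest first.
ranked = sample_results.sort_values("f1", ascending=False).reset_index(drop=True)
best = ranked.iloc[0]

print(ranked[["tokenizer", "preprocessing", "weighting", "f1", "accuracy"]])
```

In this subset the highest F1 and the highest accuracy belong to different configurations, so the ranking depends on which metric is used.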

In [None]:
# Trigger a browser download of the saved CSV (Colab only)
from google.colab import files
files.download('results_lab1.csv')
