## How to run the experiments

Run the code blocks below in sequence. Read the accompanying descriptions to understand what each block does.


The dependencies can be found in https://github.com/eduardogc8/simple-qc

Before running the experiments, set the variable ``path_wordembedding`` in the code block below to the directory that contains the word embeddings. Make sure that the embedding files in that directory follow the naming template `wiki.multi.*.vec`.
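
A quick check before running anything can save a failed run later. The sketch below lists the files in a directory that match the `wiki.multi.*.vec` template. The helper name `find_embedding_files` is hypothetical and not part of the repository.

```python
from pathlib import Path


def find_embedding_files(directory):
    """Return the sorted names of files matching wiki.multi.*.vec in `directory`."""
    return sorted(p.name for p in Path(directory).glob('wiki.multi.*.vec'))


if __name__ == '__main__':
    # Placeholder path: replace it with your own embedding directory.
    found = find_embedding_files('/home/eduardo/word_embedding/')
    print(found if found else 'No wiki.multi.*.vec files found')
```

If the list comes back empty, the loops further down would fail when they try to load `wiki.multi.<language>.vec`.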

In [1]:
import nltk
import numpy as np
import pandas as pd
from keras.preprocessing.sequence import pad_sequences
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import normalize

from benchmarking_methods import run_benchmark
from building_classifiers import lstm_default, svm_linear, random_forest, cnn
from download_word_embeddings import muse_embeddings_path, download_if_not_existing
from loading_data import load_embedding, load_uiuc

path_wordembedding = '/home/eduardo/word_embedding/'
download_if_not_existing()
from benchmarking_methods import run_benchmark_cv
from feature_creation import create_feature
from loading_data import load_disequa

Using TensorFlow backend.


### Extract features

The function *create_feature* transforms the questions into numerical vectors that a classifier model can consume.<br>It writes the output into the *df_2* dataframe passed as a parameter (in the column *df_2.feature_type*, named after the chosen *feature_type*).<br><br>
**feature_type:** type of feature (bow, tfidf, embedding, embedding_sum, vocab_index, pos_index, pos_hotencode, ner_index, ner_hotencode).<br>
**df:** the dataframe used to fit the transformer models (df.questions).<br>
**df_2:** the dataframe whose data will be transformed (df_2.questions).<br>
**embedding:** embedding model used by the word-embedding feature types.<br>
**max_features:** used in bag-of-words and TF-IDF.
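
The split between *df* (used to fit) and *df_2* (the data that gets transformed) can be illustrated with a minimal bag-of-words sketch in plain Python. This is not the repository's implementation: `fit_vocabulary` and `to_bow` are hypothetical helpers, and whitespace tokenization is an assumption made to keep the example short.

```python
from collections import Counter


def fit_vocabulary(questions, max_features):
    """Keep the `max_features` most frequent tokens of the fitting questions."""
    counts = Counter(token for q in questions for token in q.lower().split())
    return [token for token, _ in counts.most_common(max_features)]


def to_bow(questions, vocabulary):
    """Turn each question into a count vector over the fitted vocabulary."""
    index = {token: i for i, token in enumerate(vocabulary)}
    vectors = []
    for q in questions:
        vector = [0] * len(vocabulary)
        for token in q.lower().split():
            if token in index:  # tokens unseen during fitting are ignored
                vector[index[token]] += 1
        vectors.append(vector)
    return vectors


train_questions = ['who wrote hamlet', 'who painted guernica', 'where is lisbon']
test_questions = ['who discovered brazil']

# Fit on the training questions only, then transform both sets with the same vocabulary.
vocabulary = fit_vocabulary(train_questions, max_features=2)
train_vectors = to_bow(train_questions, vocabulary)
test_vectors = to_bow(test_questions, vocabulary)
```

This mirrors the pairs of calls in the experiments, where the training dataframe is passed as *df* in both calls and only *df_2* changes between the training and test data.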


### Create classifier models

The models are created through functions that return them. These functions will be used to create a new model in each experiment. Therefore, an instance of a model is created by the benchmark function and not explicitly in a code block.
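
Passing a function instead of an instance means that every experiment starts from a freshly initialized model. A minimal sketch of this pattern, using a hypothetical `MajorityClassifier` in place of the real classifiers:

```python
from collections import Counter


class MajorityClassifier:
    """Toy classifier that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def majority():
    """Factory function: returns a new, untrained model on every call."""
    return MajorityClassifier()


# Same shape as the dictionaries used in the experiments: a name plus a factory.
model = {'name': 'majority', 'model': majority}

first = model['model']().fit([[0], [1], [2]], ['LOC', 'LOC', 'NUM'])
second = model['model']()  # independent instance, nothing carried over from `first`
```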


### UTILS



#### Load UIUC dataset

#### Load DISEQuA dataset

## Benchmark UIUC - Normal

**Normal:** uses the default fixed UIUC split into a training set (up to 5500 instances) and a test set (500 instances). It therefore does not use cross-validation.

When the *run_benchmark* function is executed, it will save each result in the *save* path.

**model:** a dictionary with the classifier name and the function to create and return the model (not an instance of the model). <br> Example: *model = {'name': 'SVM', 'model': svm_linear}*<br>
**X:** the full training set.<br>
**y:** the labels of the training set.<br>
**x_test:** test set.<br>
**y_test:** labels of the test set.<br>
**sizes_train:** training-set sizes. One experiment is executed for each size.<br>
**runs:** number of times each experiment is executed (useful for models whose parameters are initialized randomly, such as the weights of an ANN).<br>
**save:** CSV path where the results will be saved.<br>
**metric_average:** averaging mode used in the F1, recall and precision metrics.<br>
**onehot:** one-hot model to transform labels.<br>
**out_dim:** the number of classes for ANN models.<br>
**epochs:** number of epochs for ANN models.<br>
**batch_size:** batch size for ANN models.<br>
**vocabulary_size:** vocabulary size (used in CNN model).
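
How *sizes_train* and *runs* interact can be sketched as two nested loops: each training size is run *runs* times, always against the same fixed test set. The function below is a simplified illustration under that assumption, not the actual *run_benchmark*. The names `run_size_benchmark` and `FirstWordClassifier`, and the accuracy-only result rows, are hypothetical.

```python
class FirstWordClassifier:
    """Toy model: predicts the label last seen for a question's first word."""

    def fit(self, X, y):
        self.table_ = {x.split()[0]: label for x, label in zip(X, y)}
        self.default_ = y[0]  # fallback for first words never seen in training
        return self

    def predict(self, X):
        return [self.table_.get(x.split()[0], self.default_) for x in X]


def run_size_benchmark(model, X, y, x_test, y_test, sizes_train, runs=1):
    """Train a fresh model per (size, run) pair and collect one result row each."""
    rows = []
    for size in sizes_train:
        for run in range(1, runs + 1):
            clf = model['model']()       # new instance for every experiment
            clf.fit(X[:size], y[:size])  # use only the first `size` training examples
            predicted = clf.predict(x_test)
            correct = sum(p == t for p, t in zip(predicted, y_test))
            rows.append({'model': model['name'], 'train_size': size,
                         'run': run, 'accuracy': correct / len(y_test)})
    return rows


X = ['who wrote hamlet', 'where is lisbon', 'when was rome founded', 'who painted guernica']
y = ['HUM', 'LOC', 'NUM', 'HUM']
x_test = ['where is porto', 'when did it end']
y_test = ['LOC', 'NUM']

rows = run_size_benchmark({'name': 'first_word', 'model': FirstWordClassifier},
                          X, y, x_test, y_test, sizes_train=[1, 3], runs=2)
```

Each row corresponds to one experiment, which is the kind of record that would be written to the *save* CSV.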



## Benchmark UIUC and DISEQuA - Cross-validation

**Cross-validation:** instead of the default fixed split, it uses the whole dataset with 10-fold cross-validation.

When the *run_benchmark_cv* function is executed, it will save each result in the *save* path.

**model:** a dictionary with the classifier name and the function to create and return the model (not an instance of the model). <br> Example: *model = {'name': 'SVM', 'model': svm_linear}*<br>
**X:** input features.<br>
**y:** input labels.<br>
**sizes_train:** training-set sizes. One experiment is executed for each size.<br>
**folds:** number of folds for cross-validation.<br>
**save:** CSV path where the results will be saved.<br>
**metric_average:** averaging mode used in the F1, recall and precision metrics.<br>
**onehot:** one-hot model to transform labels.<br>
**epochs:** number of epochs for ANN models.<br>
**batch_size:** batch size for ANN models.<br>
**vocabulary_size:** vocabulary size (used in CNN model).
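
For readers unfamiliar with k-fold cross-validation, the splitting step can be sketched as follows: the dataset is divided into *folds* parts, and each part serves once as the test set while the remaining parts form the training set. `kfold_indices` is a hypothetical helper written for illustration. Whether the benchmark shuffles or stratifies the data is not covered by this sketch.

```python
def kfold_indices(n_samples, folds=10):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Distribute the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, folds)
    start = 0
    for fold in range(folds):
        stop = start + base + (1 if fold < extra else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop


splits = list(kfold_indices(25, folds=10))
```

With 25 samples and 10 folds, every sample lands in exactly one test fold, so each instance is used for evaluation once.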



## Run UIUC Benchmark - Normal

Different classifier models are tested with different levels of dependency on external linguistic resources (Low, Medium and High).

#### SVM + TF-IDF

In [2]:
for language in ['en', 'es', 'pt']:
    print('\n\nLanguage: ', language)
    dataset_train, dataset_test = load_uiuc(language)
    create_feature('tfidf', dataset_train, dataset_train, max_features=2000)
    create_feature('tfidf', dataset_train, dataset_test, max_features=2000)
    
    model = {'name': 'svm', 'model': svm_linear}
    
    tfidf_train = np.array([list(r) for r in dataset_train['tfidf'].values])
    tfidf_test = np.array([list(r) for r in dataset_test['tfidf'].values])
    tfidf_train = normalize(tfidf_train, norm='max')
    tfidf_test = normalize(tfidf_test, norm='max')
    
    # Use the max-normalized TF-IDF vectors computed above as input features
    # (previously the raw 'tfidf' columns were re-read, leaving the normalization unused).
    X_train = tfidf_train
    X_test = tfidf_test
    y_train = dataset_train['class'].values
    y_test = dataset_test['class'].values
    
    run_benchmark(model, X_train, y_train, X_test, y_test, sizes_train=[1000, 2000, 3000, 4000, 5500],
                  save='results/UIUC_svm_tfidf_' + language + '.csv', runs=1)



Language:  en

1000|.
2000|.
3000|.
4000|.
5500|.Run time benchmark: 0.5764660835266113


Language:  es

1000|.
2000|.
3000|.
4000|.
5500|.
Run time benchmark: 0.6209449768066406


Language:  pt

1000|.
2000|.
3000|.
4000|.
5500|.Run time benchmark: 0.5239431858062744


#### SVM + TF-IDF + WB

In [5]:
for language in ['en', 'es', 'pt']:
    print('\n\nLanguage: ', language)
    embedding = load_embedding(path_wordembedding + 'wiki.multi.' + language + '.vec')
    dataset_train, dataset_test = load_uiuc(language)
    create_feature('tfidf', dataset_train, dataset_train, max_features=2000)
    create_feature('tfidf', dataset_train, dataset_test, max_features=2000)
    create_feature('embedding_sum', None, dataset_train, embedding)
    create_feature('embedding_sum', None, dataset_test, embedding)
    
    model = {'name': 'svm', 'model': svm_linear}
    
    tfidf_train = np.array([list(r) for r in dataset_train['tfidf'].values])
    tfidf_test = np.array([list(r) for r in dataset_test['tfidf'].values])
    tfidf_train = normalize(tfidf_train, norm='max')
    tfidf_test = normalize(tfidf_test, norm='max')
    
    embedding_train = np.array([list(r) for r in dataset_train['embedding_sum'].values])
    embedding_test = np.array([list(r) for r in dataset_test['embedding_sum'].values])
    embedding_train = normalize(embedding_train, norm='max')
    embedding_test = normalize(embedding_test, norm='max')
    
    X_train = np.array([list(x) + list(xx) for x, xx in zip(tfidf_train, embedding_train)])
    X_test = np.array([list(x) + list(xx) for x, xx in zip(tfidf_test, embedding_test)])
    y_train = dataset_train['class'].values
    y_test = dataset_test['class'].values
    
    run_benchmark(model, X_train, y_train, X_test, y_test, sizes_train=[1000, 2000, 3000, 4000, 5500], 
                  runs=1, save='results/UIUC_svm_cortes_' + language + '.csv')



Language:  en

1000|.
2000|.
3000|.
4000|.
5500|.Run time benchmark: 11.371490478515625


Language:  es

1000|.
2000|.
3000|.
4000|.
5500|.
Run time benchmark: 14.12940001487732


Language:  pt

1000|.
2000|.
3000|.
4000|.
5500|.Run time benchmark: 14.28162956237793




#### SVM + TF-IDF + WB + POS + NER

In [6]:
for language in ['en', 'es', 'pt']:
    print('\n\nLanguage: ', language)
    embedding = load_embedding(path_wordembedding + 'wiki.multi.' + language + '.vec')
    dataset_train, dataset_test = load_uiuc(language)
    create_feature('tfidf', dataset_train, dataset_train, max_features=2000)
    create_feature('tfidf', dataset_train, dataset_test, max_features=2000)
    create_feature('embedding_sum', dataset_train, dataset_train, embedding)
    create_feature('embedding_sum', dataset_train, dataset_test, embedding)
    create_feature('pos_hotencode', dataset_train, dataset_train)
    create_feature('pos_hotencode', dataset_train, dataset_test)
    create_feature('ner_hotencode', dataset_train, dataset_train)
    create_feature('ner_hotencode', dataset_train, dataset_test)
    model = {'name': 'svm', 'model': svm_linear}
    
    tfidf_train = np.array([list(r) for r in dataset_train['tfidf'].values])
    tfidf_test = np.array([list(r) for r in dataset_test['tfidf'].values])
    tfidf_train = normalize(tfidf_train, norm='max')
    tfidf_test = normalize(tfidf_test, norm='max')
    
    embedding_train = np.array([list(r) for r in dataset_train['embedding_sum'].values])
    embedding_test = np.array([list(r) for r in dataset_test['embedding_sum'].values])
    embedding_train = normalize(embedding_train, norm='max')
    embedding_test = normalize(embedding_test, norm='max')
    
    pos_train = np.array([list(r) for r in dataset_train['pos_hotencode'].values])
    pos_test = np.array([list(r) for r in dataset_test['pos_hotencode'].values])
    
    ner_train = np.array([list(r) for r in dataset_train['ner_hotencode'].values])
    ner_test = np.array([list(r) for r in dataset_test['ner_hotencode'].values])
    
    X_train = np.array([list(x) + list(xx) + list(xxx) + list(xxxx) for x, xx, xxx, xxxx in zip(tfidf_train, embedding_train, pos_train, ner_train)])
    X_test = np.array([list(x) + list(xx) + list(xxx) + list(xxxx) for x, xx, xxx, xxxx in zip(tfidf_test, embedding_test, pos_test, ner_test)])
    
    y_train = dataset_train['class'].values
    y_test = dataset_test['class'].values
    
    classes = list(dataset_train['class'].unique())
    y_train_ = [classes.index(c) for c in y_train]
    
    run_benchmark(model, X_train, y_train, X_test, y_test, sizes_train=[1000, 2000, 3000, 4000, 5500],
                  runs=1, save='results/UIUC_svm_high_' + language + '.csv')



Language:  en

1000|.
2000|.
3000|.
4000|.
5500|.
Run time benchmark: 12.750246524810791


Language:  es

1000|.
2000|.
3000|.
4000|.
5500|.
Run time benchmark: 15.330715417861938


Language:  pt

1000|.
2000|.
3000|.
4000|.
5500|.Run time benchmark: 13.996777296066284




#### BERT + CNN

In [31]:
from typing import List
from flair_cnn_doc_embedding import DocumentCNNEmbeddings
from torch.utils.data import Dataset
import torch
from flair.data import Sentence, Corpus
from flair.embeddings import DocumentRNNEmbeddings, BertEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer
import time
import datetime
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, matthews_corrcoef, confusion_matrix


def build_flair_sentences(text_label_tuples):
    sentences = [Sentence(text, labels=[label], use_tokenizer=True) for text,label in text_label_tuples]
    return [s for s in sentences if len(s.tokens) > 0]

def get_labels(sentences:List[Sentence]):
    return [[l.value for l in s.labels] for s in sentences]


def calc_metrics_with_sklearn(clf:TextClassifier,sentences:List[Sentence],train_size=0,
                              run=0,train_time=0,metric_average='macro',
                              classes=['ABBR', 'DESC', 'ENTY', 'HUM', 'LOC', 'NUM']):
    # Read the gold labels first: this code relies on clf.predict() replacing
    # the labels stored on the sentences with the predicted ones.
    targets = get_labels(sentences)
    start_time = time.time()
    clf.predict(sentences)
    test_time = time.time() - start_time
    prediction = get_labels(sentences)
    # scikit-learn metrics expect (y_true, y_pred), in that order.
    data = {'datetime': datetime.datetime.now(),
            'model': 'cnn_bert',
            'accuracy': accuracy_score(targets, prediction),
            'precision': precision_score(targets, prediction, average=metric_average),
            'recall': recall_score(targets, prediction, average=metric_average),
            'f1': f1_score(targets, prediction, average=metric_average),
            'mcc': matthews_corrcoef(targets, prediction),
            'confusion': confusion_matrix(targets, prediction, labels=classes),
            'run': run,
            'train_size': train_size,  # use the argument, not the global loop variable
            'execution_time': train_time,
            'test_time': test_time}
    return data


for language in ['en', 'es', 'pt']:
    results = pd.DataFrame()
    
    save = 'results/UIUC_cnn_bert_'+language+'.csv'
    for size_train in [1000, 2000, 3000, 4000, 5500]:
        for run in range(1,3):
            dataset_train, dataset_test = load_uiuc(language)
            if size_train < 5500:
                dataset_train = dataset_train[:size_train]

            sentences_train:Dataset = build_flair_sentences([(text, label) for text, label in zip(dataset_train['question'], dataset_train['class'])])
            sentences_dev:Dataset = sentences_train
            sentences_test:Dataset = build_flair_sentences([(text, label) for text, label in zip(dataset_test['question'], dataset_test['class'])])

            corpus:Corpus = Corpus(sentences_train, sentences_dev, sentences_test)
            label_dict = corpus.make_label_dictionary()
            word_embeddings = [
                # WordEmbeddings('glove'),
                BertEmbeddings('bert-base-multilingual-cased', layers='-1')
            ]
            document_embeddings = DocumentCNNEmbeddings(word_embeddings,
                                                        dropout=0.0,
                                                        hidden_size=64,
                                                        )

            clf = TextClassifier(document_embeddings, label_dictionary=label_dict, multi_label=False)
            trainer = ModelTrainer(clf, corpus,torch.optim.RMSprop)
            base_path = 'flair_resources/qc_en_uiuc'
            start_time = time.time()
            trainer.train(base_path,
                          learning_rate=0.001,
                          mini_batch_size=32,
                          anneal_factor=0.5,
                          patience=2,
                          max_epochs=4)
            train_time = time.time() - start_time
            # Evaluate on the held-out test sentences rather than on the training data.
            data = calc_metrics_with_sklearn(clf, sentences_test, train_size=size_train, train_time=train_time, run=run)
            results = results.append([data])
            results.to_csv(save)

2019-08-17 08:46:23,258 {'LOC', 'NUM', 'DESC', 'ENTY', 'ABBR', 'HUM'}
2019-08-17 08:46:23,300 The pre-trained model you are loading is a cased model but you have not set `do_lower_case` to False. We are setting `do_lower_case=False` for you but you may want to check this behavior.
2019-08-17 08:46:43,412 ----------------------------------------------------------------------------------------------------
2019-08-17 08:46:43,413 Evaluation method: MICRO_F1_SCORE
2019-08-17 08:46:43,796 ----------------------------------------------------------------------------------------------------
2019-08-17 08:46:45,682 epoch 1 - iter 0/32 - loss 1.79350603
2019-08-17 08:46:48,613 epoch 1 - iter 3/32 - loss 10.62281498
2019-08-17 08:46:50,756 epoch 1 - iter 6/32 - loss 8.10027502
2019-08-17 08:46:52,628 epoch 1 - iter 9/32 - loss 7.24572681
2019-08-17 08:46:55,135 epoch 1 - iter 12/32 - loss 6.18870946
2019-08-17 08:46:57,147 epoch 1 - iter 15/32 - loss 5.40461296
2019-08-17 08:46:59,435 epoch 1 - i

2019-08-17 08:53:00,028 epoch 2 - iter 6/32 - loss 1.10534452
2019-08-17 08:53:02,064 epoch 2 - iter 9/32 - loss 1.05754998
2019-08-17 08:53:04,948 epoch 2 - iter 12/32 - loss 1.03652987
2019-08-17 08:53:07,004 epoch 2 - iter 15/32 - loss 0.99312482
2019-08-17 08:53:09,066 epoch 2 - iter 18/32 - loss 0.97060237
2019-08-17 08:53:11,406 epoch 2 - iter 21/32 - loss 0.93687834
2019-08-17 08:53:13,365 epoch 2 - iter 24/32 - loss 0.88149774
2019-08-17 08:53:15,487 epoch 2 - iter 27/32 - loss 0.84316277
2019-08-17 08:53:18,023 epoch 2 - iter 30/32 - loss 0.81487109
2019-08-17 08:53:18,480 ----------------------------------------------------------------------------------------------------
2019-08-17 08:53:18,481 EPOCH 2 done: loss 0.7926 - lr 0.0010 - bad epochs 0
2019-08-17 08:53:41,693 DEV : loss 0.46403899788856506 - score 0.815
2019-08-17 08:53:50,034 TEST : loss 0.5217366814613342 - score 0.802
2019-08-17 08:53:56,507 -----------------------------------------------------------------------

2019-08-17 09:02:03,889 DEV : loss 0.2357037216424942 - score 0.916
2019-08-17 09:02:12,377 TEST : loss 0.3198506832122803 - score 0.888
2019-08-17 09:02:22,128 ----------------------------------------------------------------------------------------------------
2019-08-17 09:02:23,673 epoch 4 - iter 0/63 - loss 0.18794799
2019-08-17 09:02:28,971 epoch 4 - iter 6/63 - loss 0.19233679
2019-08-17 09:02:33,335 epoch 4 - iter 12/63 - loss 0.23846299
2019-08-17 09:02:37,672 epoch 4 - iter 18/63 - loss 0.22381369
2019-08-17 09:02:41,672 epoch 4 - iter 24/63 - loss 0.21182877
2019-08-17 09:02:46,250 epoch 4 - iter 30/63 - loss 0.21003715
2019-08-17 09:02:50,844 epoch 4 - iter 36/63 - loss 0.19693211
2019-08-17 09:02:55,315 epoch 4 - iter 42/63 - loss 0.19483417
2019-08-17 09:02:59,481 epoch 4 - iter 48/63 - loss 0.20569481
2019-08-17 09:03:04,109 epoch 4 - iter 54/63 - loss 0.20341834
2019-08-17 09:03:08,687 epoch 4 - iter 60/63 - loss 0.20423686
2019-08-17 09:03:10,282 -----------------------

2019-08-17 09:12:47,128 ----------------------------------------------------------------------------------------------------
2019-08-17 09:13:31,106 {'LOC', 'NUM', 'DESC', 'ENTY', 'ABBR', 'HUM'}
2019-08-17 09:13:31,108 The pre-trained model you are loading is a cased model but you have not set `do_lower_case` to False. We are setting `do_lower_case=False` for you but you may want to check this behavior.
2019-08-17 09:13:49,550 ----------------------------------------------------------------------------------------------------
2019-08-17 09:13:49,551 Evaluation method: MICRO_F1_SCORE
2019-08-17 09:13:49,956 ----------------------------------------------------------------------------------------------------
2019-08-17 09:13:51,473 epoch 1 - iter 0/94 - loss 1.81323349
2019-08-17 09:13:58,089 epoch 1 - iter 9/94 - loss 7.05433103
2019-08-17 09:14:04,433 epoch 1 - iter 18/94 - loss 4.81122203
2019-08-17 09:14:11,394 epoch 1 - iter 27/94 - loss 3.72901098
2019-08-17 09:14:17,848 epoch 1 - i

2019-08-17 09:28:23,293 epoch 2 - iter 0/94 - loss 0.36321509
2019-08-17 09:28:30,102 epoch 2 - iter 9/94 - loss 0.37367314
2019-08-17 09:28:36,695 epoch 2 - iter 18/94 - loss 0.34471758
2019-08-17 09:28:43,518 epoch 2 - iter 27/94 - loss 0.35984362
2019-08-17 09:28:49,813 epoch 2 - iter 36/94 - loss 0.37111155
2019-08-17 09:28:57,099 epoch 2 - iter 45/94 - loss 0.37063751
2019-08-17 09:29:03,814 epoch 2 - iter 54/94 - loss 0.37399277
2019-08-17 09:29:11,272 epoch 2 - iter 63/94 - loss 0.37359442
2019-08-17 09:29:17,911 epoch 2 - iter 72/94 - loss 0.37465983
2019-08-17 09:29:24,899 epoch 2 - iter 81/94 - loss 0.37485730
2019-08-17 09:29:31,303 epoch 2 - iter 90/94 - loss 0.36811808
2019-08-17 09:29:33,383 ----------------------------------------------------------------------------------------------------
2019-08-17 09:29:33,385 EPOCH 2 done: loss 0.3760 - lr 0.0010 - bad epochs 0
2019-08-17 09:30:41,759 DEV : loss 0.2107694149017334 - score 0.939
2019-08-17 09:30:50,111 TEST : loss 0.2

2019-08-17 09:46:48,044 ----------------------------------------------------------------------------------------------------
2019-08-17 09:46:48,045 EPOCH 3 done: loss 0.2926 - lr 0.0010 - bad epochs 0
2019-08-17 09:48:20,042 DEV : loss 0.1856355220079422 - score 0.9353
2019-08-17 09:48:28,231 TEST : loss 0.2246459424495697 - score 0.922
2019-08-17 09:48:33,762 ----------------------------------------------------------------------------------------------------
2019-08-17 09:48:35,252 epoch 4 - iter 0/125 - loss 0.17332019
2019-08-17 09:48:45,187 epoch 4 - iter 12/125 - loss 0.17830779
2019-08-17 09:48:54,064 epoch 4 - iter 24/125 - loss 0.18016282
2019-08-17 09:49:03,899 epoch 4 - iter 36/125 - loss 0.21307701
2019-08-17 09:49:12,490 epoch 4 - iter 48/125 - loss 0.20307417
2019-08-17 09:49:21,304 epoch 4 - iter 60/125 - loss 0.21196498
2019-08-17 09:49:30,500 epoch 4 - iter 72/125 - loss 0.20856720
2019-08-17 09:49:39,222 epoch 4 - iter 84/125 - loss 0.21077075
2019-08-17 09:49:49,143 

2019-08-17 10:07:48,837 
MICRO_AVG: acc 0.8762 - f1-score 0.934
MACRO_AVG: acc 0.8825 - f1-score 0.9367166666666668
ABBR       tp: 9 - fp: 1 - fn: 0 - tn: 490 - precision: 0.9000 - recall: 1.0000 - accuracy: 0.9000 - f1-score: 0.9474
DESC       tp: 126 - fp: 3 - fn: 12 - tn: 359 - precision: 0.9767 - recall: 0.9130 - accuracy: 0.8936 - f1-score: 0.9438
ENTY       tp: 87 - fp: 19 - fn: 7 - tn: 387 - precision: 0.8208 - recall: 0.9255 - accuracy: 0.7699 - f1-score: 0.8700
HUM        tp: 62 - fp: 2 - fn: 3 - tn: 433 - precision: 0.9688 - recall: 0.9538 - accuracy: 0.9254 - f1-score: 0.9612
LOC        tp: 76 - fp: 5 - fn: 5 - tn: 414 - precision: 0.9383 - recall: 0.9383 - accuracy: 0.8837 - f1-score: 0.9383
NUM        tp: 107 - fp: 3 - fn: 6 - tn: 384 - precision: 0.9727 - recall: 0.9469 - accuracy: 0.9224 - f1-score: 0.9596
2019-08-17 10:07:48,838 ----------------------------------------------------------------------------------------------------
2019-08-17 10:09:19,024 {'LOC', 'NUM', 'DE

2019-08-17 10:31:13,236 epoch 1 - iter 68/171 - loss 1.90518217
2019-08-17 10:31:30,350 epoch 1 - iter 85/171 - loss 1.63942695
2019-08-17 10:31:42,878 epoch 1 - iter 102/171 - loss 1.44468410
2019-08-17 10:31:55,321 epoch 1 - iter 119/171 - loss 1.30967549
2019-08-17 10:32:08,693 epoch 1 - iter 136/171 - loss 1.19669949
2019-08-17 10:32:20,760 epoch 1 - iter 153/171 - loss 1.11269406
2019-08-17 10:32:32,956 epoch 1 - iter 170/171 - loss 1.04411441
2019-08-17 10:32:33,729 ----------------------------------------------------------------------------------------------------
2019-08-17 10:32:33,731 EPOCH 1 done: loss 1.0441 - lr 0.0010 - bad epochs 0
2019-08-17 10:34:38,355 DEV : loss 0.2695559561252594 - score 0.9118
2019-08-17 10:34:46,985 TEST : loss 0.23475687205791473 - score 0.93
2019-08-17 10:34:52,856 ----------------------------------------------------------------------------------------------------
2019-08-17 10:34:55,116 epoch 2 - iter 0/171 - loss 0.30169415
2019-08-17 10:35:07

2019-08-17 10:53:22,635 DEV : loss 0.9155165553092957 - score 0.642
2019-08-17 10:53:54,409 TEST : loss 0.9028897881507874 - score 0.6901
2019-08-17 10:54:01,032 ----------------------------------------------------------------------------------------------------
2019-08-17 10:54:02,836 epoch 3 - iter 0/32 - loss 0.91624880
2019-08-17 10:54:05,431 epoch 3 - iter 3/32 - loss 0.85368271
2019-08-17 10:54:08,300 epoch 3 - iter 6/32 - loss 0.82884765
2019-08-17 10:54:11,671 epoch 3 - iter 9/32 - loss 0.82783107
2019-08-17 10:54:14,095 epoch 3 - iter 12/32 - loss 0.83458025
2019-08-17 10:54:17,030 epoch 3 - iter 15/32 - loss 0.81725820
2019-08-17 10:54:19,853 epoch 3 - iter 18/32 - loss 0.80729662
2019-08-17 10:54:22,408 epoch 3 - iter 21/32 - loss 0.81678014
2019-08-17 10:54:25,545 epoch 3 - iter 24/32 - loss 0.80426239
2019-08-17 10:54:28,501 epoch 3 - iter 27/32 - loss 0.80894031
2019-08-17 10:54:31,533 epoch 3 - iter 30/32 - loss 0.82052761
2019-08-17 10:54:32,114 ------------------------

2019-08-17 11:03:50,812 epoch 4 - iter 21/32 - loss 0.40521612
2019-08-17 11:03:53,862 epoch 4 - iter 24/32 - loss 0.39665372
2019-08-17 11:03:57,415 epoch 4 - iter 27/32 - loss 0.39045922
2019-08-17 11:04:00,321 epoch 4 - iter 30/32 - loss 0.38527540
2019-08-17 11:04:00,906 ----------------------------------------------------------------------------------------------------
2019-08-17 11:04:00,907 EPOCH 4 done: loss 0.3867 - lr 0.0010 - bad epochs 0
2019-08-17 11:04:29,483 DEV : loss 0.45042726397514343 - score 0.823
2019-08-17 11:05:01,306 TEST : loss 1.1431770324707031 - score 0.6768
2019-08-17 11:05:14,160 ----------------------------------------------------------------------------------------------------
2019-08-17 11:05:14,162 Testing using best model ...
2019-08-17 11:05:14,164 loading file flair_resources/qc_en_uiuc/best-model.pt
2019-08-17 11:05:46,519 0.6768	0.6768	0.6768
2019-08-17 11:05:46,521 
MICRO_AVG: acc 0.5115 - f1-score 0.6768
MACRO_AVG: acc 0.4054 - f1-score 0.522583

2019-08-17 11:18:26,757 {'LOC', 'NUM', 'DESC', 'ENTY', 'ABBR', 'HUM'}
2019-08-17 11:18:26,759 The pre-trained model you are loading is a cased model but you have not set `do_lower_case` to False. We are setting `do_lower_case=False` for you but you may want to check this behavior.
2019-08-17 11:18:45,928 ----------------------------------------------------------------------------------------------------
2019-08-17 11:18:45,930 Evaluation method: MICRO_F1_SCORE
2019-08-17 11:18:46,519 ----------------------------------------------------------------------------------------------------
2019-08-17 11:18:48,162 epoch 1 - iter 0/63 - loss 2.25217152
2019-08-17 11:18:54,924 epoch 1 - iter 6/63 - loss 9.76867175
2019-08-17 11:19:00,804 epoch 1 - iter 12/63 - loss 7.11318125
2019-08-17 11:19:06,406 epoch 1 - iter 18/63 - loss 5.66402275
2019-08-17 11:19:11,758 epoch 1 - iter 24/63 - loss 4.81037269
2019-08-17 11:19:17,368 epoch 1 - iter 30/63 - loss 4.15887899
2019-08-17 11:19:23,947 epoch 1 - 

2019-08-17 11:35:01,655 epoch 2 - iter 18/94 - loss 0.74365022
2019-08-17 11:35:10,080 epoch 2 - iter 27/94 - loss 0.76530844
2019-08-17 11:35:18,607 epoch 2 - iter 36/94 - loss 0.73943855
2019-08-17 11:35:27,561 epoch 2 - iter 45/94 - loss 0.73229873
2019-08-17 11:35:35,207 epoch 2 - iter 54/94 - loss 0.70104388
2019-08-17 11:35:43,264 epoch 2 - iter 63/94 - loss 0.67599460
2019-08-17 11:35:51,742 epoch 2 - iter 72/94 - loss 0.66816185
2019-08-17 11:36:00,660 epoch 2 - iter 81/94 - loss 0.65971904
2019-08-17 11:36:08,170 epoch 2 - iter 90/94 - loss 0.65569107
2019-08-17 11:36:11,033 ----------------------------------------------------------------------------------------------------
2019-08-17 11:36:11,034 EPOCH 2 done: loss 0.6562 - lr 0.0010 - bad epochs 0
2019-08-17 11:37:35,919 DEV : loss 0.47348660230636597 - score 0.8453
2019-08-17 11:38:07,588 TEST : loss 0.882342517375946 - score 0.7554
2019-08-17 11:38:13,697 --------------------------------------------------------------------

2019-08-17 11:57:45,194 EPOCH 3 done: loss 0.4171 - lr 0.0010 - bad epochs 0
2019-08-17 11:59:11,300 DEV : loss 0.4108215868473053 - score 0.8413
2019-08-17 11:59:43,427 TEST : loss 1.0726624727249146 - score 0.7413
2019-08-17 11:59:50,227 ----------------------------------------------------------------------------------------------------
2019-08-17 11:59:52,233 epoch 4 - iter 0/94 - loss 0.40739280
2019-08-17 12:00:00,342 epoch 4 - iter 9/94 - loss 0.21649240
2019-08-17 12:00:08,975 epoch 4 - iter 18/94 - loss 0.28413985
2019-08-17 12:00:17,760 epoch 4 - iter 27/94 - loss 0.29646342
2019-08-17 12:00:26,695 epoch 4 - iter 36/94 - loss 0.29993403
2019-08-17 12:00:35,113 epoch 4 - iter 45/94 - loss 0.30052616
2019-08-17 12:00:42,969 epoch 4 - iter 54/94 - loss 0.28417528
2019-08-17 12:00:53,041 epoch 4 - iter 63/94 - loss 0.29020311
2019-08-17 12:01:01,384 epoch 4 - iter 72/94 - loss 0.29630827
2019-08-17 12:01:09,752 epoch 4 - iter 81/94 - loss 0.30193674
2019-08-17 12:01:26,678 epoch 4

2019-08-17 12:27:47,555 ----------------------------------------------------------------------------------------------------
2019-08-17 12:29:45,723 {'LOC', 'NUM', 'DESC', 'ENTY', 'ABBR', 'HUM'}
2019-08-17 12:29:45,726 The pre-trained model you are loading is a cased model but you have not set `do_lower_case` to False. We are setting `do_lower_case=False` for you but you may want to check this behavior.
2019-08-17 12:30:07,850 ----------------------------------------------------------------------------------------------------
2019-08-17 12:30:07,851 Evaluation method: MICRO_F1_SCORE
2019-08-17 12:30:08,570 ----------------------------------------------------------------------------------------------------
2019-08-17 12:30:10,618 epoch 1 - iter 0/125 - loss 2.00352716
2019-08-17 12:30:22,354 epoch 1 - iter 12/125 - loss 8.09406948
2019-08-17 12:30:33,513 epoch 1 - iter 24/125 - loss 5.17808596
2019-08-17 12:30:44,228 epoch 1 - iter 36/125 - loss 3.97723382
2019-08-17 12:30:54,415 epoch 

2019-08-17 13:00:05,203 epoch 2 - iter 0/169 - loss 0.65735203
2019-08-17 13:00:20,827 epoch 2 - iter 16/169 - loss 0.42668125
2019-08-17 13:00:35,781 epoch 2 - iter 32/169 - loss 0.46866686
2019-08-17 13:00:50,638 epoch 2 - iter 48/169 - loss 0.47109203
2019-08-17 13:01:06,108 epoch 2 - iter 64/169 - loss 0.49198029
2019-08-17 13:01:20,295 epoch 2 - iter 80/169 - loss 0.48497822
2019-08-17 13:01:35,141 epoch 2 - iter 96/169 - loss 0.49076826
2019-08-17 13:01:50,992 epoch 2 - iter 112/169 - loss 0.48806929
2019-08-17 13:02:04,402 epoch 2 - iter 128/169 - loss 0.49193886
2019-08-17 13:02:18,987 epoch 2 - iter 144/169 - loss 0.49288729
2019-08-17 13:02:33,855 epoch 2 - iter 160/169 - loss 0.48771508
2019-08-17 13:02:43,141 ----------------------------------------------------------------------------------------------------
2019-08-17 13:02:43,143 EPOCH 2 done: loss 0.4822 - lr 0.0010 - bad epochs 0
2019-08-17 13:05:14,837 DEV : loss 0.3818710148334503 - score 0.8653
2019-08-17 13:05:48,08

2019-08-17 13:45:27,242 epoch 3 - iter 160/169 - loss 0.37776678
2019-08-17 13:46:01,151 ----------------------------------------------------------------------------------------------------
2019-08-17 13:46:02,212 EPOCH 3 done: loss 0.3808 - lr 0.0010 - bad epochs 0
2019-08-17 13:48:51,038 DEV : loss 0.24817469716072083 - score 0.9168
2019-08-17 13:49:23,327 TEST : loss 0.9335619807243347 - score 0.788
2019-08-17 13:49:37,981 ----------------------------------------------------------------------------------------------------
2019-08-17 13:49:40,385 epoch 4 - iter 0/169 - loss 0.13038202
2019-08-17 13:49:55,540 epoch 4 - iter 16/169 - loss 0.25369955
2019-08-17 13:50:09,605 epoch 4 - iter 32/169 - loss 0.30129015
2019-08-17 13:50:24,649 epoch 4 - iter 48/169 - loss 0.29098105
2019-08-17 13:50:41,010 epoch 4 - iter 64/169 - loss 0.29712043
2019-08-17 13:50:56,187 epoch 4 - iter 80/169 - loss 0.28285869
2019-08-17 13:55:14,325 epoch 4 - iter 96/169 - loss 0.29421579
2019-08-17 13:55:35,42

2019-08-17 14:11:17,034 
MICRO_AVG: acc 0.5674 - f1-score 0.724
MACRO_AVG: acc 0.4593 - f1-score 0.5640833333333334
ABBR       tp: 0 - fp: 0 - fn: 9 - tn: 491 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
DESC       tp: 135 - fp: 58 - fn: 3 - tn: 304 - precision: 0.6995 - recall: 0.9783 - accuracy: 0.6888 - f1-score: 0.8157
ENTY       tp: 13 - fp: 5 - fn: 81 - tn: 401 - precision: 0.7222 - recall: 0.1383 - accuracy: 0.1313 - f1-score: 0.2321
HUM        tp: 65 - fp: 62 - fn: 0 - tn: 373 - precision: 0.5118 - recall: 1.0000 - accuracy: 0.5118 - f1-score: 0.6771
LOC        tp: 59 - fp: 11 - fn: 22 - tn: 408 - precision: 0.8429 - recall: 0.7284 - accuracy: 0.6413 - f1-score: 0.7815
NUM        tp: 90 - fp: 2 - fn: 23 - tn: 385 - precision: 0.9783 - recall: 0.7965 - accuracy: 0.7826 - f1-score: 0.8781
2019-08-17 14:11:17,035 ----------------------------------------------------------------------------------------------------
2019-08-17 14:11:43,155 {'LOC', 'NUM', 

2019-08-17 14:20:52,351 epoch 1 - iter 30/63 - loss 3.13793791
2019-08-17 14:20:58,187 epoch 1 - iter 36/63 - loss 2.87647134
2019-08-17 14:21:03,691 epoch 1 - iter 42/63 - loss 2.67898679
2019-08-17 14:21:34,745 epoch 1 - iter 48/63 - loss 2.52067423
2019-08-17 14:21:41,014 epoch 1 - iter 54/63 - loss 2.41049785
2019-08-17 14:21:45,910 epoch 1 - iter 60/63 - loss 2.30902179
2019-08-17 14:22:05,803 ----------------------------------------------------------------------------------------------------
2019-08-17 14:22:07,384 EPOCH 1 done: loss 2.2739 - lr 0.0010 - bad epochs 0
2019-08-17 14:22:59,691 DEV : loss 0.9113616347312927 - score 0.641
2019-08-17 14:23:10,002 TEST : loss 1.051438808441162 - score 0.61
2019-08-17 14:23:16,807 ----------------------------------------------------------------------------------------------------
2019-08-17 14:23:18,990 epoch 2 - iter 0/63 - loss 1.05307782
2019-08-17 14:23:24,850 epoch 2 - iter 6/63 - loss 1.02450063
2019-08-17 14:23:29,627 epoch 2 - it

2019-08-17 14:36:58,319 ----------------------------------------------------------------------------------------------------
2019-08-17 14:37:00,195 epoch 3 - iter 0/63 - loss 0.60367072
2019-08-17 14:37:06,431 epoch 3 - iter 6/63 - loss 0.59765634
2019-08-17 14:37:11,528 epoch 3 - iter 12/63 - loss 0.51860662
2019-08-17 14:37:16,520 epoch 3 - iter 18/63 - loss 0.46668510
2019-08-17 14:37:21,499 epoch 3 - iter 24/63 - loss 0.47681315
2019-08-17 14:37:26,618 epoch 3 - iter 30/63 - loss 0.49923400
2019-08-17 14:37:31,537 epoch 3 - iter 36/63 - loss 0.49364949
2019-08-17 14:37:37,897 epoch 3 - iter 42/63 - loss 0.47639702
2019-08-17 14:37:42,950 epoch 3 - iter 48/63 - loss 0.48325067
2019-08-17 14:37:47,763 epoch 3 - iter 54/63 - loss 0.49018544
2019-08-17 14:37:52,530 epoch 3 - iter 60/63 - loss 0.49157024
2019-08-17 14:38:01,057 ----------------------------------------------------------------------------------------------------
2019-08-17 14:38:01,075 EPOCH 3 done: loss 0.4933 - lr 0.00

2019-08-17 15:01:57,088 epoch 4 - iter 90/94 - loss 0.29633234
2019-08-17 15:02:03,321 ----------------------------------------------------------------------------------------------------
2019-08-17 15:02:03,322 EPOCH 4 done: loss 0.2928 - lr 0.0010 - bad epochs 0
2019-08-17 15:03:33,931 DEV : loss 0.176649808883667 - score 0.9457
2019-08-17 15:03:45,210 TEST : loss 0.3398318588733673 - score 0.896
2019-08-17 15:04:07,309 ----------------------------------------------------------------------------------------------------
2019-08-17 15:04:07,311 Testing using best model ...
2019-08-17 15:04:07,650 loading file flair_resources/qc_en_uiuc/best-model.pt
2019-08-17 15:04:25,693 0.896	0.896	0.896
2019-08-17 15:04:25,695 
MICRO_AVG: acc 0.8116 - f1-score 0.896
MACRO_AVG: acc 0.719 - f1-score 0.8090333333333334
ABBR       tp: 2 - fp: 0 - fn: 7 - tn: 491 - precision: 1.0000 - recall: 0.2222 - accuracy: 0.2222 - f1-score: 0.3636
DESC       tp: 132 - fp: 21 - fn: 6 - tn: 341 - precision: 0.8627 -

2019-08-17 15:31:05,420 ----------------------------------------------------------------------------------------------------
2019-08-17 15:31:05,421 Evaluation method: MICRO_F1_SCORE
2019-08-17 15:31:06,314 ----------------------------------------------------------------------------------------------------
2019-08-17 15:31:08,573 epoch 1 - iter 0/125 - loss 1.77298152
2019-08-17 15:31:21,466 epoch 1 - iter 12/125 - loss 5.29245231
2019-08-17 15:31:38,425 epoch 1 - iter 24/125 - loss 3.74960764
2019-08-17 15:33:50,193 epoch 1 - iter 36/125 - loss 2.99429106
2019-08-17 15:34:31,440 epoch 1 - iter 48/125 - loss 2.56805717
2019-08-17 15:35:09,811 epoch 1 - iter 60/125 - loss 2.25816679
2019-08-17 15:35:42,957 epoch 1 - iter 72/125 - loss 2.03811731
2019-08-17 15:35:54,159 epoch 1 - iter 84/125 - loss 1.85668404
2019-08-17 15:36:05,920 epoch 1 - iter 96/125 - loss 1.71667317
2019-08-17 15:36:15,751 epoch 1 - iter 108/125 - loss 1.58955795
2019-08-17 15:36:25,783 epoch 1 - iter 120/125 - los

2019-08-17 16:10:59,836 epoch 2 - iter 72/125 - loss 0.44173829
2019-08-17 16:11:18,525 epoch 2 - iter 84/125 - loss 0.43576060
2019-08-17 16:11:32,803 epoch 2 - iter 96/125 - loss 0.44104904
2019-08-17 16:11:47,277 epoch 2 - iter 108/125 - loss 0.43823144
2019-08-17 16:12:01,390 epoch 2 - iter 120/125 - loss 0.44227053
2019-08-17 16:12:06,725 ----------------------------------------------------------------------------------------------------
2019-08-17 16:12:06,726 EPOCH 2 done: loss 0.4370 - lr 0.0010 - bad epochs 0
2019-08-17 16:14:08,311 DEV : loss 0.2628241777420044 - score 0.914
2019-08-17 16:14:20,639 TEST : loss 0.31308233737945557 - score 0.892
2019-08-17 16:14:33,333 ----------------------------------------------------------------------------------------------------
2019-08-17 16:14:35,102 epoch 3 - iter 0/125 - loss 0.17005579
2019-08-17 16:14:47,959 epoch 3 - iter 12/125 - loss 0.30663000
2019-08-17 16:14:58,983 epoch 3 - iter 24/125 - loss 0.29315326
2019-08-17 16:15:10,25

2019-08-17 16:54:50,622 ----------------------------------------------------------------------------------------------------
2019-08-17 16:54:52,580 epoch 4 - iter 0/171 - loss 0.80846679
2019-08-17 16:55:07,283 epoch 4 - iter 17/171 - loss 0.32176158
2019-08-17 16:55:23,722 epoch 4 - iter 34/171 - loss 0.28201224
2019-08-17 16:55:38,879 epoch 4 - iter 51/171 - loss 0.28759964
2019-08-17 16:57:56,979 epoch 4 - iter 68/171 - loss 0.27889259
2019-08-17 16:58:33,964 epoch 4 - iter 85/171 - loss 0.27670959
2019-08-17 16:58:54,997 epoch 4 - iter 102/171 - loss 0.27159309
2019-08-17 16:59:14,531 epoch 4 - iter 119/171 - loss 0.27484824
2019-08-17 16:59:36,406 epoch 4 - iter 136/171 - loss 0.28712292
2019-08-17 17:00:10,810 epoch 4 - iter 153/171 - loss 0.28253593
2019-08-17 17:00:37,093 epoch 4 - iter 170/171 - loss 0.28549894
2019-08-17 17:00:37,946 ----------------------------------------------------------------------------------------------------
2019-08-17 17:00:37,947 EPOCH 4 done: loss

2019-08-17 17:38:25,576 ----------------------------------------------------------------------------------------------------
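
The per-class rows in the log above can be checked by hand from their `tp`, `fp` and `fn` counts. The sketch below recomputes the DESC row and the averages of the first full report shown (the one with `MICRO_AVG: acc 0.5674`). For these rows the value the log labels `accuracy` matches tp / (tp + fp + fn), not the more common (tp + tn) / total. The helper name `prf` is made up for this illustration and is not part of the training library.

```python
def prf(tp, fp, fn):
    """Precision, recall, F1 and tp/(tp+fp+fn) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, accuracy


# (tp, fp, fn) copied from the per-class rows of the report.
counts = {
    'ABBR': (0, 0, 9),
    'DESC': (135, 58, 3),
    'ENTY': (13, 5, 81),
    'HUM': (65, 62, 0),
    'LOC': (59, 11, 22),
    'NUM': (90, 2, 23),
}

# DESC row: precision 0.6995, recall 0.9783, f1-score 0.8157, accuracy 0.6888
desc = prf(*counts['DESC'])

# Micro average: pool the counts over all classes, then compute once.
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro = prf(tp, fp, fn)  # f1-score 0.724, accuracy 0.5674

# Macro average: compute F1 per class, then take the unweighted mean.
macro_f1 = sum(prf(*c)[2] for c in counts.values()) / len(counts)

print([round(v, 4) for v in desc])
print(round(micro[2], 4), round(micro[3], 4), round(macro_f1, 3))
```

The gap between the micro F1 (0.724) and the macro F1 (about 0.564) comes from the small classes: ABBR has no true positives and ENTY has a recall of 0.1383, and the macro average weights them the same as DESC or NUM.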


## Run UIUC Benchmark - Cross-validation

Different classifier models are tested at three levels of dependency on external linguistic resources (Low, Medium and High).

#### SVM + TF-IDF

In [3]:
for language in ['en', 'es', 'pt']:
    print('\n\nLanguage: ', language)
    dataset_train, dataset_test = load_uiuc(language)
    dataset = pd.concat([dataset_train, dataset_test])
    create_feature('tfidf', dataset, dataset, max_features=2000)
    
    model = {'name': 'svm', 'model': svm_linear}
    
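    # Row-wise max normalization of the TF-IDF vectors.
    tfidf = np.array([list(r) for r in dataset['tfidf'].values])
    tfidf = normalize(tfidf, norm='max')
    
    # NOTE: X is built from the raw 'tfidf' column, so the normalized `tfidf` above is not used here.
    X = np.array([list(x) for x in dataset['tfidf'].values])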
    y = dataset['class'].values
    
    
    # run_benchmark_cv(model, X, y, [50, 100] + list(range(500, 5501, 500)),
    run_benchmark_cv(model, X, y, [1000, 2000, 3000, 4000, 5500],
                     save='results/UIUC_cv_svm_tfidf_' + language + '.csv')



Language:  en

1000|..........
2000|..........
3000|..........
4000|..........
5500|..........
Run time benchmark: 8.106821775436401


Language:  es

1000|..........
2000|..........
3000|..........
4000|..........
5500|..........
Run time benchmark: 9.061235904693604


Language:  pt

1000|..........
2000|..........
3000|..........
4000|..........
5500|..........
Run time benchmark: 8.001790523529053
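
`run_benchmark_cv` lives in the repository's `benchmarking_methods` module and its body is not shown in this notebook. Judging only from the call and from the ten dots printed per training size, one plausible reading is a 10-fold cross-validation repeated for several training-set sizes. The sketch below illustrates that idea with scikit-learn on synthetic data. The function `learning_curve_cv`, the fold count, the macro-F1 metric and the subsampling strategy are all assumptions made for illustration, not the repository's implementation.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC


def learning_curve_cv(X, y, sizes_train, folds=10, seed=0):
    """Return {train_size: [macro-F1 per fold]} (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    scores = {size: [] for size in sizes_train}
    for train_idx, test_idx in skf.split(X, y):
        for size in sizes_train:
            # Cap the requested size at what the training fold can provide.
            n = min(size, len(train_idx))
            subset = rng.choice(train_idx, size=n, replace=False)
            clf = LinearSVC().fit(X[subset], y[subset])
            pred = clf.predict(X[test_idx])
            scores[size].append(f1_score(y[test_idx], pred, average='macro'))
    return scores


# Synthetic stand-in for the feature matrix: two well-separated classes.
data_rng = np.random.RandomState(0)
X = np.vstack([data_rng.normal(0, 1, (100, 5)), data_rng.normal(4, 1, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

scores = learning_curve_cv(X, y, sizes_train=[40, 100])
for size, fold_scores in scores.items():
    print(size, round(float(np.mean(fold_scores)), 3))
```

Evaluating every training size on the same held-out fold keeps the scores comparable across sizes, which is what a learning curve needs.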


#### SVM + TF-IDF + WB

In [4]:
for language in ['en', 'es', 'pt']:
    print('\n\nLanguage: ', language)
    embedding = load_embedding(path_wordembedding + 'wiki.multi.' + language + '.vec')
    dataset_train, dataset_test = load_uiuc(language)
    dataset = pd.concat([dataset_train, dataset_test])
    create_feature('tfidf', dataset, dataset, max_features=2000)
    create_feature('embedding_sum', None, dataset, embedding)
    
    model = {'name': 'svm', 'model': svm_linear}
    
    tfidf = np.array([list(r) for r in dataset['tfidf'].values])
    tfidf = normalize(tfidf, norm='max')
    
    embedding = np.array([list(r) for r in dataset['embedding_sum'].values])
    embedding = normalize(embedding, norm='max')
    
    X = np.array([list(x) + list(xx) for x, xx in zip(tfidf, embedding)])
    y = dataset['class'].values
    
    # run_benchmark_cv(model, X, y, [50, 100] + list(range(500, 5501, 500)),
    run_benchmark_cv(model, X, y, [1000, 2000, 3000, 4000, 5500],
                     save='results/UIUC_cv_svm_cortes_' + language + '.csv')



Language:  en

1000|..........
2000|..........
3000|..........
4000|..........
5500|..........
Run time benchmark: 125.29054236412048


Language:  es

1000|..........
2000|..........
3000|..........
4000|..........
5500|..........
Run time benchmark: 151.8110692501068


Language:  pt

1000|..........
2000|..........
3000|..........
4000|..........
5500|..........
Run time benchmark: 143.85443115234375
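
The `embedding_sum` feature used in this experiment is produced by `create_feature`, whose code is not part of this notebook. Going by the name alone, a reasonable guess is that each question is represented by the sum of the vectors of its words. The sketch below shows that idea with a toy dictionary in place of the loaded `wiki.multi.*.vec` model. The function name, the dict-like embedding and the choice to skip out-of-vocabulary words are assumptions.

```python
import numpy as np


def embedding_sum(tokens, embedding, dim):
    """Sum the vectors of the known tokens (hypothetical helper)."""
    vec = np.zeros(dim)
    for token in tokens:
        # Assumption: words missing from the embedding are skipped.
        if token in embedding:
            vec += embedding[token]
    return vec


# Toy 2-dimensional "embedding"; real MUSE vectors have many more dimensions.
toy_embedding = {
    'who': np.array([1.0, 0.0]),
    'wrote': np.array([0.0, 2.0]),
}

question = ['who', 'wrote', 'hamlet']  # 'hamlet' is out of vocabulary here
vector = embedding_sum(question, toy_embedding, dim=2)
print(vector)  # [1. 2.]
```

A sum gives every question a vector of the same length regardless of how many words it has, which is what a linear SVM needs, at the cost of discarding word order.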




#### SVM + TF-IDF + WB + POS + NER

In [5]:
for language in ['en', 'es', 'pt']:
    print('\n\nLanguage: ', language)
    embedding = load_embedding(path_wordembedding + 'wiki.multi.' + language + '.vec')
    dataset_train, dataset_test = load_uiuc(language)
    dataset = pd.concat([dataset_train, dataset_test])
    create_feature('tfidf', dataset, dataset, max_features=2000)
    create_feature('embedding_sum', dataset, dataset, embedding)
    create_feature('pos_hotencode', dataset, dataset)
    create_feature('ner_hotencode', dataset, dataset)
    model = {'name': 'svm', 'model': svm_linear}
    
    tfidf = np.array([list(r) for r in dataset['tfidf'].values])
    tfidf = normalize(tfidf, norm='max')
    
    embedding = np.array([list(r) for r in dataset['embedding_sum'].values])
    embedding = normalize(embedding, norm='max')
    
    pos = np.array([list(r) for r in dataset['pos_hotencode'].values])
    
    ner = np.array([list(r) for r in dataset['ner_hotencode'].values])
    
    X = np.array([list(x) + list(xx) + list(xxx) + list(xxxx) for x, xx, xxx, xxxx in zip(tfidf, embedding, pos, ner)])
    
    y = dataset['class'].values
    
    # run_benchmark_cv(model, X, y, [50, 100] + list(range(500, 5501, 500)),
    run_benchmark_cv(model, X, y, [1000, 2000, 3000, 4000, 5500],
                     save='results/UIUC_cv_svm_high_' + language + '.csv')



Language:  en

1000|..........
2000|..........
3000|..........
4000|..........
5500|..........
Run time benchmark: 129.40373587608337


Language:  es

1000|..........
2000|..........
3000|..........
4000|..........
5500|..........
Run time benchmark: 163.61614727973938


Language:  pt

1000|..........
2000|..........
3000|..........
4000|..........
5500|..........
Run time benchmark: 144.99252271652222
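
The cell above scales the TF-IDF and embedding blocks with `normalize(..., norm='max')` and then joins four feature blocks row by row. A small sketch of both steps on made-up numbers follows. The toy values are non-negative and chosen only to make the result easy to read; `np.hstack` is shown as an equivalent to the list concatenation used in the notebook, not as the author's code.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Two questions, four feature blocks (all values invented for illustration).
tfidf = np.array([[0.0, 0.5, 0.25],
                  [0.2, 0.0, 0.0]])
embedding_sum = np.array([[2.0, 4.0],
                          [1.0, 0.5]])
pos = np.array([[1, 0],
                [0, 1]])
ner = np.array([[0],
                [1]])

# norm='max' divides each row by its largest value, so each row peaks at 1.
tfidf_n = normalize(tfidf, norm='max')
embedding_n = normalize(embedding_sum, norm='max')

# Column-wise stacking keeps one row per question.
X = np.hstack([tfidf_n, embedding_n, pos, ner])

# Same result as the per-row list concatenation used in the notebook cell.
X_lists = np.array([list(a) + list(b) + list(c) + list(d)
                    for a, b, c, d in zip(tfidf_n, embedding_n, pos, ner)])

print(X.shape)  # (2, 8)
print(X[0])
```

Scaling each block before concatenation keeps a block with large raw values, such as summed embeddings, from dominating the linear SVM's decision function.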




## Run DISEQuA Benchmark - Cross-validation

Different classifier models are tested at three levels of dependency on external linguistic resources (Low, Medium and High).

#### SVM + <font color=#007700>TF-IDF</font>

In [152]:
for language in ['en', 'es', 'it', 'nl']:
    print('\n\nLanguage: ', language)
    dataset = load_disequa(language)
    create_feature('tfidf', dataset, dataset, max_features=2000)
    
    model = {'name': 'svm', 'model': svm_linear}
    
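    # Row-wise max normalization of the TF-IDF vectors.
    tfidf = np.array([list(r) for r in dataset['tfidf'].values])
    tfidf = normalize(tfidf, norm='max')
    
    # NOTE: X is built from the raw 'tfidf' column, so the normalized `tfidf` above is not used here.
    X = np.array([list(x) for x in dataset['tfidf'].values])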
    y = dataset['class'].values
    
    run_benchmark_cv(model, X, y, sizes_train=[100,200,300,400],
                     save='results/DISEQuA_svm_tfidf_' + language + '.csv')



Language:  en

100|..........
200|..........
300|..........
400|..........
Run time benchmark: 1.027012586593628


Language:  es

100|..........
200|..........
300|..........
400|..........
Run time benchmark: 1.0114972591400146


Language:  it

100|..........
200|..........
300|..........
400|..........
Run time benchmark: 1.1434721946716309


Language:  nl

100|..........
200|..........
300|..........
400|..........
Run time benchmark: 1.1250619888305664


#### SVM + <font color=#007700>TF-IDF</font> + <font color=#0055CC>WB</font>

In [163]:
for language in ['en', 'es', 'it', 'nl']:
    print('\n\nLanguage: ', language)
    embedding = load_embedding(path_wordembedding + 'wiki.multi.' + language + '.vec')
    dataset = load_disequa(language)
    create_feature('tfidf', dataset, dataset, max_features=2000)
    create_feature('embedding_sum', None, dataset, embedding)
    
    model = {'name': 'svm', 'model': svm_linear}
    
    tfidf = np.array([list(r) for r in dataset['tfidf'].values])
    tfidf = normalize(tfidf, norm='max')
    
    embedding = np.array([list(r) for r in dataset['embedding_sum'].values])
    embedding = normalize(embedding, norm='max')
    
    X = np.array([list(x) + list(xx) for x, xx in zip(tfidf, embedding)])
    y = dataset['class'].values
    
    run_benchmark_cv(model, X, y, sizes_train=[100,200,300,400],
                     save='results/DISEQuA_svm_cortes_' + language + '.csv')



Language:  en

100|..........
200|..........
300|..........
400|..........
Run time benchmark: 6.358882427215576


Language:  es

100|..........
200|..........
300|..........
400|..........
Run time benchmark: 7.197380065917969


Language:  it

100|..........
200|..........
300|..........
400|..........
Run time benchmark: 5.5334153175354


Language:  nl

100|..........
200|..........
300|..........
400|..........
Run time benchmark: 6.624628782272339


#### SVM + <font color=#007700>TF-IDF</font> + <font color=#0055CC>WB</font> + <font color=#CC6600>POS</font> + <font color=#CC6600>NER</font>

In [164]:


for language in ['en', 'es', 'it', 'nl']:
    print('\n\nLanguage: ', language)
    embedding = load_embedding(path_wordembedding + 'wiki.multi.' + language + '.vec')
    dataset = load_disequa(language)
    create_feature('tfidf', dataset, dataset, max_features=2000)
    create_feature('embedding_sum', dataset, dataset, embedding)
    create_feature('pos_hotencode', dataset, dataset)
    create_feature('ner_hotencode', dataset, dataset)
    model = {'name': 'svm', 'model': svm_linear}
    
    tfidf = np.array([list(r) for r in dataset['tfidf'].values])
    tfidf = normalize(tfidf, norm='max')
    
    embedding = np.array([list(r) for r in dataset['embedding_sum'].values])
    embedding = normalize(embedding, norm='max')
    
    pos = np.array([list(r) for r in dataset['pos_hotencode'].values])
    
    ner = np.array([list(r) for r in dataset['ner_hotencode'].values])
    
    X = np.array([list(x) + list(xx) + list(xxx) + list(xxxx) for x, xx, xxx, xxxx in zip(tfidf, embedding, pos, ner)])
    
    y = dataset['class'].values
    
    run_benchmark_cv(model, X, y, sizes_train=[100,200,300,400],
                     save='results/DISEQuA_svm_high_' + language + '.csv')



Language:  en

100|..........
200|..........
300|..........
400|..........
Run time benchmark: 6.811999559402466


Language:  es

100|..........
200|..........
300|..........
400|..........
Run time benchmark: 8.384974479675293


Language:  it

100|..........
200|..........
300|..........
400|..........
Run time benchmark: 6.426969528198242


Language:  nl

100|..........
200|..........
300|..........
400|..........
Run time benchmark: 6.852076053619385


## Old material below

#### CNN

In [None]:
# 'en', 'es'
for language in ['es']:
    print('\n\nLanguage: ', language)
    #embedding = load_embedding(path_wordembedding + 'wiki.multi.' + language + '.vec')
    dataset_train, dataset_test = load_uiuc(language)
    text_representation = 'vocab_index'
    vocabulary_inv = create_feature(text_representation, dataset_train, dataset_train)
    create_feature(text_representation, dataset_train, dataset_test)
    model = {'name': 'cnn', 'model': cnn}
    X_train = np.array([list(x) for x in dataset_train[text_representation].values])
    X_test = np.array([list(x) for x in dataset_test[text_representation].values])
    #X_train = pad_sequences(X_train, maxlen=12, dtype='float', padding='post', truncating='post', value=0.0)
    #X_test = pad_sequences(X_test, maxlen=12, dtype='float', padding='post', truncating='post', value=0.0)
    y_train = dataset_train['class'].values
    y_test = dataset_test['class'].values
    ohe = OneHotEncoder()
    y_train = ohe.fit_transform([[y_] for y_ in y_train]).toarray()
    y_test = ohe.transform([[y_] for y_ in y_test]).toarray()
    # , 
    run_benchmark(model, X_train, y_train, X_test, y_test, sizes_train=[1000, 2000, 3000, 4000, 5500],
                  runs=30, save='results/UIUC_cnn_' + language + '.csv', epochs=100, onehot=ohe,
                  vocabulary_size=len(vocabulary_inv))
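
The neural models in this part of the notebook receive one-hot encoded labels, and the fitted encoder is passed to `run_benchmark` as `onehot=ohe`, presumably so predictions can be mapped back to class names. The sketch below shows the round trip with scikit-learn's `OneHotEncoder` on a few UIUC-style labels. The label values are examples, and how `run_benchmark` actually uses the encoder is not shown in this notebook.

```python
from sklearn.preprocessing import OneHotEncoder

y_train = ['HUM', 'LOC', 'HUM', 'DESC']
y_test = ['LOC', 'DESC']

ohe = OneHotEncoder()
# The encoder expects a 2D input, hence one single-element row per label.
y_train_1h = ohe.fit_transform([[label] for label in y_train]).toarray()
# Reuse the fitted encoder so train and test share the same column order.
y_test_1h = ohe.transform([[label] for label in y_test]).toarray()

# Columns follow the sorted categories learned during fit.
categories = list(ohe.categories_[0])

# inverse_transform maps one-hot rows back to the original labels.
y_back = list(ohe.inverse_transform(y_train_1h).ravel())

print(categories)
print(y_train_1h)
```

With the default settings, `transform` raises an error for a label that was not seen during `fit`, so the test set must not contain classes absent from the training set.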

#### LSTM + WordEmbedding

In [73]:
for language in ['es']:
    print('\n\nLanguage: ', language)
    embedding = load_embedding(path_wordembedding + 'wiki.multi.' + language + '.vec')
    dataset_train, dataset_test = load_uiuc(language)
    dataset_train = dataset_train[:100]
    #dataset_test = dataset_test[:10]
    create_feature('embedding', dataset_train, dataset_train, embedding)
    create_feature('embedding', dataset_train, dataset_test, embedding)
    model = {'name': 'lstm', 'model': lstm_default}
    #print(dataset_train['embedding'].values.shape)
    #print(dataset_train['embedding'].values.dtype)
    #print(dataset_test['embedding'].values.shape)
    X_train = np.array([list(x) for x in dataset_train['embedding'].values])
    X_test = np.array([list(x) for x in dataset_test['embedding'].values])
    X_train = pad_sequences(X_train, maxlen=12, dtype='float', padding='post', truncating='post', value=0.0)
    X_test = pad_sequences(X_test, maxlen=12, dtype='float', padding='post', truncating='post', value=0.0)
    y_train = dataset_train['class'].values
    y_test = dataset_test['class'].values
#     y_train_sub = dataset_train['sub_class'].values
#     sub_classes = set()
#     for sc in y_train_sub:
#         sub_classes.add(sc)
#     y_test_sub = dataset_test['sub_class'].values
#     X_test_sub_ = []
#     y_test_sub_ = []
#     for i in range(len(X_test)):
#         if y_train_sub[i] in sub_classes:
#             X_test_sub_.append(X_test[i])
#             y_test_sub_.append(y_train_sub[i])
#     X_test_sub_ = np.array(X_test_sub_)
#     y_test_sub_ = np.array(y_test_sub_)
    ohe = OneHotEncoder()
    y_train = ohe.fit_transform([[y_] for y_ in y_train]).toarray()
    y_test = ohe.transform([[y_] for y_ in y_test]).toarray() 
    run_benchmark(model, X_train, y_train, X_test, y_test, sizes_train=[1000, 2000, 3000, 4000, 5500],
                  runs=30, save='results/UIUC_lstm_embedding_' + language + '_2.csv', epochs=100, onehot=ohe)
    #run_benchmark(model, X_train, y_train_sub, X_test_sub_, y_test_sub_, sizes_train=[1000, 2000, 3000, 4000, 5500],
    #              save='results/UIUCsub_svm_tfidf_' + language + '.csv')



Language:  es
(100,)
object
(1349,)

1000|...
2000|...
3000|...
4000|...
5500|...
Run time benchmark: 228.79835891723633
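
The cell above calls Keras' `pad_sequences` with `maxlen=12`, `padding='post'` and `truncating='post'`, which pads short questions at the end and cuts long ones from the end so that every question has the same number of time steps. The NumPy sketch below reproduces that behavior for sequences of word vectors. It is a re-implementation for illustration with a hypothetical name (`pad_post`), not Keras' own code.

```python
import numpy as np


def pad_post(sequences, maxlen, dim, value=0.0):
    """Pad or truncate each sequence at the end to exactly maxlen steps."""
    out = np.full((len(sequences), maxlen, dim), value, dtype=float)
    for i, seq in enumerate(sequences):
        # Keep only the first maxlen steps (truncating='post').
        seq = np.asarray(seq, dtype=float)[:maxlen]
        if len(seq):
            # Copy to the front; the tail keeps the fill value (padding='post').
            out[i, :len(seq)] = seq
    return out


# Two "questions" with 2 and 4 word vectors of dimension 2.
batch = [
    [[1.0, 1.0], [2.0, 2.0]],
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]],
]
padded = pad_post(batch, maxlen=3, dim=2)
print(padded.shape)  # (2, 3, 2)
```

The result is a dense array of shape (questions, time steps, embedding dimension), which is the input layout a recurrent layer expects.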


#### LSTM + BERT

In [None]:
for language in ['en']:
    print('\n\nLanguage: ', language)
    #embedding = load_embedding(path_wordembedding + 'wiki.multi.' + language + '.vec')
    dataset_train, dataset_test = load_uiuc(language)
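    # debug: the subsetting below is commented out, so the full training set is used
    # and the warning printed here is stale.
    print('WARNING: use subset (first 1000 entries) of training data')
    #dataset_train = dataset_train[:5500].copy()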
    
    create_feature('bert', dataset_train, dataset_train)
    create_feature('bert', dataset_train, dataset_test)
    model = {'name': 'lstm', 'model': lstm_default}
    X_train = dataset_train['bert'].values
    X_test = dataset_test['bert'].values
    
    X_train = np.array([x for x in X_train])
    X_test = np.array([x for x in X_test])
    
    #X_train = pad_sequences(X_train, maxlen=12, dtype='float', padding='post', truncating='post', value=0.0)
    #X_test = pad_sequences(X_test, maxlen=12, dtype='float', padding='post', truncating='post', value=0.0)
    y_train = dataset_train['class'].values
    y_test = dataset_test['class'].values
    ohe = OneHotEncoder()
    y_train = ohe.fit_transform([[y_] for y_ in y_train]).toarray()
    y_test = ohe.transform([[y_] for y_ in y_test]).toarray() 
    run_benchmark(model, X_train, y_train, X_test, y_test, sizes_train=[5500], # 1000, 2000, 3000, 4000, 5500
                  runs=1, save='results/UIUC_lstm_bert_' + language + '.csv', 
                  epochs=100, onehot=ohe, in_dim=768)
    #run_benchmark(model, X_train, y_train_sub, X_test_sub_, y_test_sub_, sizes_train=[1000, 2000, 3000, 4000, 5500],
    #              save='results/UIUCsub_svm_tfidf_' + language + '.csv')

## DISEQuA Benchmark

### RUN DISEQuA Benchmark

##### SVM + TFIDF

In [None]:
for language in ['DUT', 'ENG', 'ITA', 'SPA']:
    print('\n\nLanguage: ', language)
    dataset = load_disequa(language)
    create_feature('tfidf', dataset, dataset, embedding)
    model = {'name': 'svm', 'model': svm_linear}
    X = np.array([list(x) for x in dataset['tfidf'].values])
    y = dataset['class'].values
    run_benchmark(model, X, y, sizes_train=[100,200,300,400,405],
                  save='results/DISEQuA_svm_tfidf_' + language + '.csv')

##### RFC + TFIDF

In [None]:
for language in ['DUT', 'ENG', 'ITA', 'SPA']:
    print('\n\nLanguage: ', language)
    dataset = load_disequa(language)
    create_feature('tfidf', dataset, dataset, embedding)
    model = {'name': 'rfc', 'model': random_forest}
    X = np.array([list(x) for x in dataset['tfidf'].values])
    y = dataset['class'].values
    run_benchmark(model, X, y, sizes_train=[100,200,300,400],
                  save='results/DISEQuA_rfc_tfidf_' + language + '.csv')

##### SVM + TFIDF_3gram + SKB

In [None]:
for language in ['DUT', 'ENG', 'ITA', 'SPA']:
    print('\n\nLanguage: ', language)
    dataset = load_disequa(language)
    create_feature('tfidf_3gram', dataset, dataset)
    model = {'name': 'svm', 'model': svm_linear}
    X = np.array([list(x) for x in dataset['tfidf'].values])
    y = dataset['class'].values
    skb = SelectKBest(chi2, k=2000).fit(X, y)
    X = skb.transform(X)
    run_benchmark(model, X, y, sizes_train=[100,200,300,400],
                  save='results/DISEQuA_svm_tfidf_3gram_' + language + '.csv')
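
The cell above keeps the 2000 features with the highest chi-squared score using `SelectKBest(chi2, k=2000)`. The sketch below runs the same selector on a tiny count matrix so the effect is visible. The data is invented, and `k=2` replaces the notebook's `k=2000` only to fit the toy size.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Rows are documents, columns are non-negative term counts.
# Column 0 appears only in class 0, column 1 only in class 1,
# column 2 is constant and carries no class information.
X = np.array([[3, 0, 1],
              [4, 0, 1],
              [0, 5, 1],
              [0, 6, 1]])
y = np.array([0, 0, 1, 1])

skb = SelectKBest(chi2, k=2).fit(X, y)
X_selected = skb.transform(X)

print(skb.get_support())   # boolean mask of the kept columns
print(X_selected.shape)    # (4, 2)
```

One design point worth noting: the notebook cell fits the selector on all of `X` and `y` before benchmarking, so the selection has seen the labels of examples that are later used for testing. Fitting the selector inside each training split would avoid that leakage.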

##### LSTM + Embedding

In [None]:
for language, embd_l in zip(['SPA'], ['es']):
    print('\n\nLanguage: ', language)
    embedding = load_embedding(path_wordembedding + 'wiki.multi.' + embd_l + '.vec')
    dataset = load_disequa(language)
    create_feature('embedding', dataset, dataset, embedding)
    model = {'name': 'lstm', 'model': lstm_default}
    X = np.array([list(x) for x in dataset['embedding'].values])
    y = dataset['class'].values
    X = pad_sequences(X, maxlen=12, dtype='float', padding='post', truncating='post', value=0.0)
    ohe = OneHotEncoder()
    y = ohe.fit_transform([[y_] for y_ in y]).toarray()
    run_benchmark(model, X, y, sizes_train=[100,200,300,400,405], onehot=ohe,
                  save='results/DISEQuA_lstm_embedding_' + language + '.csv')

##### CNN

In [None]:
for language, embd_l in zip(['DUT', 'ENG', 'ITA', 'SPA'], ['nl', 'en', 'it', 'es']):
    print('\n\nLanguage: ', language)
    #embedding = load_embedding(path_wordembedding + 'wiki.multi.' + embd_l + '.vec')
    dataset = load_disequa(language)
    text_representation = 'vocab_index'
    vocabulary_inv = create_feature(text_representation, dataset, dataset)
    model = {'name': 'cnn', 'model': cnn}
    X = np.array([list(x) for x in dataset[text_representation].values])
    y = dataset['class'].values
    #X = pad_sequences(X, maxlen=12, dtype='float', padding='post', truncating='post', value=0.0)
    ohe = OneHotEncoder()
    y = ohe.fit_transform([[y_] for y_ in y]).toarray()
    run_benchmark(model, X, y, sizes_train=[100,200,300,400], onehot=ohe, vocabulary_size=len(vocabulary_inv),
                  save='results/DISEQuA_cnn_' + language + '.csv', epochs=100)