<a href="https://colab.research.google.com/github/marco-siino/DA-BT/blob/main/code/evaluation/hss/CNN_HSS_augmented_JA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Investigating text data augmentation using back translation for author profiling
- - - 
CNN ON HSS DS EXPERIMENTS NOTEBOOK
- - -
Convolutional Neural Network on the Hate Speech Spreaders (HSS) dataset augmented with Japanese (JA) back translation.
Code by M. Siino.

From the paper: "Investigating text data augmentation using back translation for author profiling" by M. Siino et al.



## Importing modules.

In [6]:
import matplotlib.pyplot as plt
import os
import random
import re
import shutil
import string
import tensorflow as tf
from urllib import request


import numpy as np

from tensorflow.keras import layers
from tensorflow.keras import losses
from tensorflow.keras import preprocessing
from tensorflow.keras.models import Model
from tensorflow.keras.layers import TextVectorization

# Base URL of the project's helper modules on GitHub.
BASE_URL = "https://raw.githubusercontent.com/marco-siino/DA-BT/main/code/"

def fetch_module(module_name):
  """Download a helper module from the repository into the working directory."""
  module_url = BASE_URL + module_name
  print(f'Fetching {module_url}')
  with request.urlopen(module_url) as f, open(module_name, 'w', encoding='utf-8') as outf:
    outf.write(f.read().decode('utf-8'))

# Download the modules that define Dataset, Vectorizer and Simulator, then import the classes.
for module_name in ('dataset.py', 'vectorizer.py', 'simulator.py'):
  fetch_module(module_name)

from dataset import Dataset
from vectorizer import Vectorizer
from simulator import Simulator

Fetching https://raw.githubusercontent.com/marco-siino/DA-BT/main/code/dataset.py
Fetching https://raw.githubusercontent.com/marco-siino/DA-BT/main/code/vectorizer.py
Fetching https://raw.githubusercontent.com/marco-siino/DA-BT/main/code/simulator.py


## Fetch the dataset zips from GitHub and build a Keras dataset.

In [7]:
# Dataset takes the name of the dataset ('hss') and the augmentation language code ('ja').
ds = Dataset('hss','ja')
ds.build_ds(1)

Downloading data from https://github.com/marco-siino/DA-BT/raw/main/data/hss/hss-train-ja.zip
Downloading data from https://github.com/marco-siino/DA-BT/raw/main/data/hss/hss-test-ja.zip
Found 200 files belonging to 2 classes.
Found 100 files belonging to 2 classes.
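The "Found N files belonging to M classes." messages suggest a directory layout with one subfolder per class, as used by Keras directory loaders. The internals of `Dataset` are not shown in this notebook, so the following is only a plain-Python sketch of that layout convention; `load_text_directory` and the demo files are hypothetical and not part of the DA-BT code.

```python
import os
import tempfile


def load_text_directory(root):
    """Read `root/<class_name>/<file>` into parallel lists of texts and integer labels."""
    # Class names are the sorted subfolder names; their position is the label.
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    texts, labels = [], []
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(root, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            with open(os.path.join(class_dir, file_name), encoding="utf-8") as fh:
                texts.append(fh.read())
            labels.append(label)
    print(f"Found {len(texts)} files belonging to {len(class_names)} classes.")
    return texts, labels, class_names


# Tiny hypothetical dataset: two classes with two author files each.
demo_root = tempfile.mkdtemp()
for class_name in ("0", "1"):
    os.makedirs(os.path.join(demo_root, class_name))
    for i in range(2):
        path = os.path.join(demo_root, class_name, f"author_{i}.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(f"sample text {i} for class {class_name}")

texts, labels, class_names = load_text_directory(demo_root)
```

With this layout, adding a class only requires adding a subfolder, which is probably why the train and test zips can be loaded with the same call.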


## Vectorize text according to the training set.

In [8]:
vct_layer_obj = Vectorizer(ds.train_set)

2023-04-04 11:41:28.749696: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.


Length of the longest sample is: 8310



Vocabulary size is: 78454
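To illustrate what fitting a vectorizer on the training set means, here is a minimal plain-Python sketch. It is not the `Vectorizer` class used above, whose code is not shown here. It assumes whitespace-like word tokenization, reserves index 0 for padding and index 1 for unknown tokens (as Keras `TextVectorization` does in integer mode), and measures sample length in tokens. The notebook may measure it differently. Indices are assigned in order of first appearance rather than by frequency.

```python
import re


def fit_vocabulary(train_texts):
    """Map each distinct training token to an integer index (0 = padding, 1 = unknown)."""
    vocab = {"": 0, "[UNK]": 1}
    for text in train_texts:
        for token in re.findall(r"\w+", text.lower()):
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(text, vocab, sequence_length):
    """Turn text into a fixed-length list of indices, truncating or zero-padding."""
    ids = [vocab.get(tok, 1) for tok in re.findall(r"\w+", text.lower())]
    ids = ids[:sequence_length]
    return ids + [0] * (sequence_length - len(ids))


train_texts = ["the cat sat", "the dog sat down"]  # hypothetical samples
vocab = fit_vocabulary(train_texts)
max_len = max(len(re.findall(r"\w+", t.lower())) for t in train_texts)

print("Length of the longest sample is:", max_len)
print("Vocabulary size is:", len(vocab))
# "bird" never appeared in training, so it maps to the unknown index 1.
print(vectorize("the bird sat", vocab, max_len))
```

Building the vocabulary from the training texts only means that words seen solely in the test set fall back to the unknown index, so no information from the test set leaks into the model input.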


## Run the simulation.

In [9]:
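# 'cnn' selects the model. The output below shows 5 runs, so 5 is presumably
# the number of runs and 20 the number of epochs per run.
simulator = Simulator("cnn", 5, 20, ds.train_set, ds.test_set, vct_layer_obj.vectorize_layer)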
simulator.run()


Setup for shallow model completed.




Run:  1 / Accuracy on test set at epoch  0  is:  0.5 

Run:  1 / Accuracy on test set at epoch  1  is:  0.5600000023841858 

Run:  1 / Accuracy on test set at epoch  2  is:  0.5600000023841858 

Run:  1 / Accuracy on test set at epoch  3  is:  0.6299999952316284 

Run:  1 / Accuracy on test set at epoch  4  is:  0.5799999833106995 

Run:  1 / Accuracy on test set at epoch  5  is:  0.6100000143051147 

Run:  1 / Accuracy on test set at epoch  6  is:  0.5799999833106995 

Run:  1 / Accuracy on test set at epoch  7  is:  0.5899999737739563 

Run:  1 / Accuracy on test set at epoch  8  is:  0.5799999833106995 

Run:  1 / Accuracy on test set at epoch  9  is:  0.6399999856948853 

Run:  1 / Accuracy on test set at epoch  10  is:  0.5899999737739563 

Run:  1 / Accuracy on test set at epoch  11  is:  0.699999988079071 

Run:  1 / Accuracy on test set at epoch  12  is:  0.6899999976158142 

Run:  1 / Accuracy on test set at epoch  13  is:  0.5699999928474426 

Run:  1 / Accuracy on test set a



Run:  2 / Accuracy on test set at epoch  0  is:  0.5 

Run:  2 / Accuracy on test set at epoch  1  is:  0.5600000023841858 

Run:  2 / Accuracy on test set at epoch  2  is:  0.5399999618530273 

Run:  2 / Accuracy on test set at epoch  3  is:  0.5699999928474426 

Run:  2 / Accuracy on test set at epoch  4  is:  0.5600000023841858 

Run:  2 / Accuracy on test set at epoch  5  is:  0.5999999642372131 

Run:  2 / Accuracy on test set at epoch  6  is:  0.5399999618530273 

Run:  2 / Accuracy on test set at epoch  7  is:  0.5600000023841858 

Run:  2 / Accuracy on test set at epoch  8  is:  0.5799999833106995 

Run:  2 / Accuracy on test set at epoch  9  is:  0.5699999928474426 

Run:  2 / Accuracy on test set at epoch  10  is:  0.5699999928474426 

Run:  2 / Accuracy on test set at epoch  11  is:  0.6399999856948853 

Run:  2 / Accuracy on test set at epoch  12  is:  0.6800000071525574 

Run:  2 / Accuracy on test set at epoch  13  is:  0.550000011920929 

Run:  2 / Accuracy on test set a



Run:  3 / Accuracy on test set at epoch  0  is:  0.5 

Run:  3 / Accuracy on test set at epoch  1  is:  0.5399999618530273 

Run:  3 / Accuracy on test set at epoch  2  is:  0.6399999856948853 

Run:  3 / Accuracy on test set at epoch  3  is:  0.6699999570846558 

Run:  3 / Accuracy on test set at epoch  4  is:  0.6399999856948853 

Run:  3 / Accuracy on test set at epoch  5  is:  0.6399999856948853 

Run:  3 / Accuracy on test set at epoch  6  is:  0.6499999761581421 

Run:  3 / Accuracy on test set at epoch  7  is:  0.5299999713897705 

Run:  3 / Accuracy on test set at epoch  8  is:  0.5899999737739563 

Run:  3 / Accuracy on test set at epoch  9  is:  0.6899999976158142 

Run:  3 / Accuracy on test set at epoch  10  is:  0.6100000143051147 

Run:  3 / Accuracy on test set at epoch  11  is:  0.6200000047683716 

Run:  3 / Accuracy on test set at epoch  12  is:  0.6200000047683716 

Run:  3 / Accuracy on test set at epoch  13  is:  0.6200000047683716 

Run:  3 / Accuracy on test set 



Run:  4 / Accuracy on test set at epoch  0  is:  0.5799999833106995 

Run:  4 / Accuracy on test set at epoch  1  is:  0.5799999833106995 

Run:  4 / Accuracy on test set at epoch  2  is:  0.5600000023841858 

Run:  4 / Accuracy on test set at epoch  3  is:  0.5699999928474426 

Run:  4 / Accuracy on test set at epoch  4  is:  0.7199999690055847 

Run:  4 / Accuracy on test set at epoch  5  is:  0.699999988079071 

Run:  4 / Accuracy on test set at epoch  6  is:  0.6399999856948853 

Run:  4 / Accuracy on test set at epoch  7  is:  0.6200000047683716 

Run:  4 / Accuracy on test set at epoch  8  is:  0.6299999952316284 

Run:  4 / Accuracy on test set at epoch  9  is:  0.5899999737739563 

Run:  4 / Accuracy on test set at epoch  10  is:  0.6200000047683716 

Run:  4 / Accuracy on test set at epoch  11  is:  0.6399999856948853 

Run:  4 / Accuracy on test set at epoch  12  is:  0.6299999952316284 

Run:  4 / Accuracy on test set at epoch  13  is:  0.550000011920929 

Run:  4 / Accuracy



Run:  5 / Accuracy on test set at epoch  0  is:  0.5699999928474426 

Run:  5 / Accuracy on test set at epoch  1  is:  0.5299999713897705 

Run:  5 / Accuracy on test set at epoch  2  is:  0.5299999713897705 

Run:  5 / Accuracy on test set at epoch  3  is:  0.6100000143051147 

Run:  5 / Accuracy on test set at epoch  4  is:  0.550000011920929 

Run:  5 / Accuracy on test set at epoch  5  is:  0.6100000143051147 

Run:  5 / Accuracy on test set at epoch  6  is:  0.6899999976158142 

Run:  5 / Accuracy on test set at epoch  7  is:  0.6800000071525574 

Run:  5 / Accuracy on test set at epoch  8  is:  0.7099999785423279 

Run:  5 / Accuracy on test set at epoch  9  is:  0.6200000047683716 

Run:  5 / Accuracy on test set at epoch  10  is:  0.6200000047683716 

Run:  5 / Accuracy on test set at epoch  11  is:  0.6200000047683716 

Run:  5 / Accuracy on test set at epoch  12  is:  0.7099999785423279 

Run:  5 / Accuracy on test set at epoch  13  is:  0.6699999570846558 

Run:  5 / Accurac
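The per-epoch lines above are easier to compare once parsed into numbers. The following is a minimal sketch that assumes the log format shown in this output; `parse_accuracies` and `best_epoch_per_run` are illustrative helpers and not part of the DA-BT code. The sample lines are copied from the output above, and the truncated line is skipped because it does not match the pattern.

```python
import re
from collections import defaultdict

# Matches complete lines such as:
# "Run:  1 / Accuracy on test set at epoch  11  is:  0.699999988079071"
LOG_LINE = re.compile(
    r"Run:\s*(\d+)\s*/\s*Accuracy on test set at epoch\s*(\d+)\s*is:\s*([0-9.]+)"
)


def parse_accuracies(log_text):
    """Return {run: {epoch: accuracy}} for every complete log line."""
    results = defaultdict(dict)
    for match in LOG_LINE.finditer(log_text):
        run, epoch, acc = match.groups()
        results[int(run)][int(epoch)] = float(acc)
    return dict(results)


def best_epoch_per_run(results):
    """Return {run: (epoch, accuracy)} for the highest-accuracy epoch of each run."""
    return {
        run: max(epochs.items(), key=lambda item: item[1])
        for run, epochs in results.items()
    }


sample_log = """
Run:  1 / Accuracy on test set at epoch  11  is:  0.699999988079071
Run:  1 / Accuracy on test set at epoch  12  is:  0.6899999976158142
Run:  1 / Accuracy on test set a
Run:  4 / Accuracy on test set at epoch  3  is:  0.5699999928474426
Run:  4 / Accuracy on test set at epoch  4  is:  0.7199999690055847
"""

results = parse_accuracies(sample_log)
for run, (epoch, acc) in sorted(best_epoch_per_run(results).items()):
    print(f"Run {run}: best accuracy {acc:.2f} at epoch {epoch}")
```

Because the notebook output is cut off partway through each run, a summary computed this way covers only the epochs that are visible.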