<a href="https://colab.research.google.com/github/marco-siino/DA-BT/blob/main/code/evaluation/fns/CNN_FNS_augmented_IT_DE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Investigating text data augmentation using back translation for author profiling
- - - 
CNN ON FNS DS EXPERIMENTS NOTEBOOK 
- - -
A convolutional neural network (CNN) trained on the Fake News Spreaders (FNS) dataset, augmented with Italian (IT) and German (DE) back translation.
Code by M. Siino.

From the paper: "Investigating text data augmentation using back translation for author profiling" by M. Siino et al.



## Importing modules.

In [1]:
# Standard library
import os
import random
import re
import shutil
import string
from urllib import request

# Third-party
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

from tensorflow.keras import layers
from tensorflow.keras import losses
from tensorflow.keras import preprocessing
from tensorflow.keras.models import Model
# TextVectorization is exposed directly under tensorflow.keras.layers since TF 2.6.
from tensorflow.keras.layers import TextVectorization

def fetch_module(module_url):
    """Download a helper module from the DA-BT repository into the working directory."""
    module_name = module_url.split('/')[-1]
    print(f'Fetching {module_url}')
    with request.urlopen(module_url) as f, open(module_name, 'w') as outf:
        outf.write(f.read().decode('utf-8'))

base_url = "https://raw.githubusercontent.com/marco-siino/DA-BT/main/code"

# Import class Dataset
fetch_module(f"{base_url}/dataset.py")
from dataset import Dataset

# Import class Vectorizer
fetch_module(f"{base_url}/vectorizer.py")
from vectorizer import Vectorizer

# Import class Simulator
fetch_module(f"{base_url}/simulator.py")
from simulator import Simulator

Fetching https://raw.githubusercontent.com/marco-siino/DA-BT/main/code/dataset.py
Fetching https://raw.githubusercontent.com/marco-siino/DA-BT/main/code/vectorizer.py
Fetching https://raw.githubusercontent.com/marco-siino/DA-BT/main/code/simulator.py


## Fetch the dataset zips from GitHub and build up a Keras DS.

In [3]:
# Dataset takes the name of the dataset ('fns') and the augmentation languages used ('mix-it-de').
ds = Dataset('fns','mix-it-de')
ds.build_ds(1)

Downloading data from https://github.com/marco-siino/DA-BT/raw/main/data/fns/fns-train-mix-it-de.zip
Downloading data from https://github.com/marco-siino/DA-BT/raw/main/data/fns/fns-test-mix-it-de.zip
Found 300 files belonging to 2 classes.
Found 200 files belonging to 2 classes.
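
The `Found N files belonging to 2 classes.` lines match the message printed by Keras directory loaders such as `text_dataset_from_directory`, which infer one class per subdirectory. Since `dataset.py` is not shown here, the following is only a sketch of that convention in plain Python. The folder names `0` and `1`, the file names, and both helper functions are assumptions made for illustration, not the actual layout of the FNS archives.

```python
import os
import tempfile

def count_files_per_class(root):
    """Count files in each immediate subdirectory of root (one subdirectory per class)."""
    counts = {}
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if os.path.isdir(class_dir):
            counts[class_name] = len([
                name for name in os.listdir(class_dir)
                if os.path.isfile(os.path.join(class_dir, name))
            ])
    return counts

def build_toy_dataset(root, samples_per_class):
    """Create a hypothetical layout: root/<label>/author_<i>.txt."""
    for label, n in samples_per_class.items():
        os.makedirs(os.path.join(root, label), exist_ok=True)
        for i in range(n):
            with open(os.path.join(root, label, f"author_{i}.txt"), "w") as f:
                f.write("toy feed text")

with tempfile.TemporaryDirectory() as tmp:
    build_toy_dataset(tmp, {"0": 3, "1": 2})
    toy_counts = count_files_per_class(tmp)
    total = sum(toy_counts.values())
    print(f"Found {total} files belonging to {len(toy_counts)} classes.")
```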




## Vectorize text according to the training set.

In [4]:
vct_layer_obj = Vectorizer(ds.train_set)



Length of the longest sample is: 7166





Vocabulary size is: 98031
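
The two numbers above (longest sample length and vocabulary size) are both derived from the training set only. As `vectorizer.py` is not shown here, the following is a plain-Python sketch of the general idea rather than the `Vectorizer` class itself: build a vocabulary from the training texts, map each token to an integer id, and pad every sequence to the length of the longest training sample. Whitespace tokenization, lowercasing, and the helper names are assumptions. Reserving id 0 for padding and id 1 for out-of-vocabulary tokens mirrors the default integer mode of Keras `TextVectorization`.

```python
from collections import Counter

PAD_ID = 0  # reserved for padding
OOV_ID = 1  # reserved for tokens not seen in the training set

def tokenize(text):
    """Assumed preprocessing: lowercase and split on whitespace."""
    return text.lower().split()

def build_vocabulary(train_texts):
    """Map each training token to an id, most frequent tokens first."""
    counts = Counter(tok for text in train_texts for tok in tokenize(text))
    # Ties are broken alphabetically so the mapping is deterministic.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx + 2 for idx, tok in enumerate(ordered)}

def vectorize(text, vocab, max_len):
    """Convert text to a fixed-length list of ids, truncating or padding as needed."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokenize(text)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))

train_texts = ["fake news spreads fast", "real news spreads"]
vocab = build_vocabulary(train_texts)
max_len = max(len(tokenize(t)) for t in train_texts)

print("Length of the longest sample is:", max_len)
print("Vocabulary size is:", len(vocab) + 2)  # +2 for the padding and OOV ids
print(vectorize("real news travels", vocab, max_len))
```

Because the vocabulary is fitted on the training set alone, a test-set word such as `travels` falls back to the out-of-vocabulary id instead of leaking test information into the model input.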


## Run the simulation.

In [5]:
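# Arguments: model type, then 5 and 20 (presumably the number of runs and the
# epochs per run; the log below shows runs 1 to 5), the two sets, and the vectorization layer.
simulator = Simulator("cnn", 5, 20, ds.train_set, ds.test_set, vct_layer_obj.vectorize_layer)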
simulator.run()


Setup for shallow model completed.




Run:  1 / Accuracy on test set at epoch  0  is:  0.6049999594688416 

Run:  1 / Accuracy on test set at epoch  1  is:  0.6549999713897705 

Run:  1 / Accuracy on test set at epoch  2  is:  0.675000011920929 

Run:  1 / Accuracy on test set at epoch  3  is:  0.5600000023841858 

Run:  1 / Accuracy on test set at epoch  4  is:  0.6499999761581421 

Run:  1 / Accuracy on test set at epoch  5  is:  0.7099999785423279 

Run:  1 / Accuracy on test set at epoch  6  is:  0.6949999928474426 

Run:  1 / Accuracy on test set at epoch  7  is:  0.7099999785423279 

Run:  1 / Accuracy on test set at epoch  8  is:  0.7199999690055847 

Run:  1 / Accuracy on test set at epoch  9  is:  0.7199999690055847 

Run:  1 / Accuracy on test set at epoch  10  is:  0.7099999785423279 

Run:  1 / Accuracy on test set at epoch  11  is:  0.7199999690055847 

Run:  1 / Accuracy on test set at epoch  12  is:  0.7249999642372131 

Run:  1 / Accuracy on test set at epoch  13  is:  0.7199999690055847 

Run:  1 / Accurac



Run:  2 / Accuracy on test set at epoch  0  is:  0.5799999833106995 

Run:  2 / Accuracy on test set at epoch  1  is:  0.675000011920929 

Run:  2 / Accuracy on test set at epoch  2  is:  0.6399999856948853 

Run:  2 / Accuracy on test set at epoch  3  is:  0.5450000166893005 

Run:  2 / Accuracy on test set at epoch  4  is:  0.6800000071525574 

Run:  2 / Accuracy on test set at epoch  5  is:  0.675000011920929 

Run:  2 / Accuracy on test set at epoch  6  is:  0.6899999976158142 

Run:  2 / Accuracy on test set at epoch  7  is:  0.6899999976158142 

Run:  2 / Accuracy on test set at epoch  8  is:  0.6899999976158142 

Run:  2 / Accuracy on test set at epoch  9  is:  0.7199999690055847 

Run:  2 / Accuracy on test set at epoch  10  is:  0.7299999594688416 

Run:  2 / Accuracy on test set at epoch  11  is:  0.7299999594688416 

Run:  2 / Accuracy on test set at epoch  12  is:  0.7249999642372131 

Run:  2 / Accuracy on test set at epoch  13  is:  0.7299999594688416 

Run:  2 / Accuracy



Run:  3 / Accuracy on test set at epoch  0  is:  0.5799999833106995 

Run:  3 / Accuracy on test set at epoch  1  is:  0.675000011920929 

Run:  3 / Accuracy on test set at epoch  2  is:  0.574999988079071 

Run:  3 / Accuracy on test set at epoch  3  is:  0.6599999666213989 

Run:  3 / Accuracy on test set at epoch  4  is:  0.7099999785423279 

Run:  3 / Accuracy on test set at epoch  5  is:  0.7149999737739563 

Run:  3 / Accuracy on test set at epoch  6  is:  0.7149999737739563 

Run:  3 / Accuracy on test set at epoch  7  is:  0.7149999737739563 

Run:  3 / Accuracy on test set at epoch  8  is:  0.7149999737739563 

Run:  3 / Accuracy on test set at epoch  9  is:  0.7149999737739563 

Run:  3 / Accuracy on test set at epoch  10  is:  0.7099999785423279 

Run:  3 / Accuracy on test set at epoch  11  is:  0.699999988079071 

Run:  3 / Accuracy on test set at epoch  12  is:  0.7099999785423279 

Run:  3 / Accuracy on test set at epoch  13  is:  0.7149999737739563 

Run:  3 / Accuracy 



Run:  4 / Accuracy on test set at epoch  0  is:  0.6049999594688416 

Run:  4 / Accuracy on test set at epoch  1  is:  0.7199999690055847 

Run:  4 / Accuracy on test set at epoch  2  is:  0.7049999833106995 

Run:  4 / Accuracy on test set at epoch  3  is:  0.6649999618530273 

Run:  4 / Accuracy on test set at epoch  4  is:  0.6850000023841858 

Run:  4 / Accuracy on test set at epoch  5  is:  0.6850000023841858 

Run:  4 / Accuracy on test set at epoch  6  is:  0.6850000023841858 

Run:  4 / Accuracy on test set at epoch  7  is:  0.7049999833106995 

Run:  4 / Accuracy on test set at epoch  8  is:  0.7049999833106995 

Run:  4 / Accuracy on test set at epoch  9  is:  0.7099999785423279 

Run:  4 / Accuracy on test set at epoch  10  is:  0.7149999737739563 

Run:  4 / Accuracy on test set at epoch  11  is:  0.7199999690055847 

Run:  4 / Accuracy on test set at epoch  12  is:  0.7199999690055847 

Run:  4 / Accuracy on test set at epoch  13  is:  0.7149999737739563 

Run:  4 / Accura



Run:  5 / Accuracy on test set at epoch  0  is:  0.5899999737739563 

Run:  5 / Accuracy on test set at epoch  1  is:  0.7099999785423279 

Run:  5 / Accuracy on test set at epoch  2  is:  0.5949999690055847 

Run:  5 / Accuracy on test set at epoch  3  is:  0.5949999690055847 

Run:  5 / Accuracy on test set at epoch  4  is:  0.6949999928474426 

Run:  5 / Accuracy on test set at epoch  5  is:  0.7199999690055847 

Run:  5 / Accuracy on test set at epoch  6  is:  0.7149999737739563 

Run:  5 / Accuracy on test set at epoch  7  is:  0.7249999642372131 

Run:  5 / Accuracy on test set at epoch  8  is:  0.7149999737739563 

Run:  5 / Accuracy on test set at epoch  9  is:  0.7099999785423279 

Run:  5 / Accuracy on test set at epoch  10  is:  0.7299999594688416 

Run:  5 / Accuracy on test set at epoch  11  is:  0.7249999642372131 

Run:  5 / Accuracy on test set at epoch  12  is:  0.699999988079071 

Run:  5 / Accuracy on test set at epoch  13  is:  0.7249999642372131 

Run:  5 / Accurac
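
The log above prints one test accuracy per run and epoch, and the output of each run is cut off partway through. A small parser makes it easier to compare runs. The sketch below is not part of the original notebook: the regular expression and the function name are assumptions based on the line format shown, and the excerpt reuses the first three epochs of runs 1 and 2 together with one truncated line, which is skipped because it does not match the full pattern.

```python
import re
from collections import defaultdict

LOG_LINE = re.compile(
    r"Run:\s+(\d+)\s+/ Accuracy on test set at epoch\s+(\d+)\s+is:\s+([0-9.]+)"
)

def parse_accuracies(log_text):
    """Return {run: {epoch: accuracy}} for every complete log line."""
    runs = defaultdict(dict)
    for match in LOG_LINE.finditer(log_text):
        run, epoch, acc = match.groups()
        runs[int(run)][int(epoch)] = float(acc)
    return dict(runs)

excerpt = """
Run:  1 / Accuracy on test set at epoch  0  is:  0.6049999594688416
Run:  1 / Accuracy on test set at epoch  1  is:  0.6549999713897705
Run:  1 / Accuracy on test set at epoch  2  is:  0.675000011920929
Run:  1 / Accurac
Run:  2 / Accuracy on test set at epoch  0  is:  0.5799999833106995
Run:  2 / Accuracy on test set at epoch  1  is:  0.675000011920929
Run:  2 / Accuracy on test set at epoch  2  is:  0.6399999856948853
"""

accuracies = parse_accuracies(excerpt)
for run, by_epoch in sorted(accuracies.items()):
    best_epoch = max(by_epoch, key=by_epoch.get)
    print(f"Run {run}: best accuracy {by_epoch[best_epoch]:.3f} at epoch {best_epoch}")
```

Picking the best epoch by test accuracy gives an optimistic figure, so a summary built this way is best read as a peak value rather than an unbiased estimate.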