# <center>PROJECT SANDBOX</center>

## Documentation
The aim of this notebook is to provide a simple sandbox for testing different NN architectures for the project. The functions imported from the `scripts` folder are documented below:

- **`prepare_dataset(device,ratio=0.5,shuffle_ctx=False)`** :
    - **Input**:
        - device : a torch.device object
        - ratio : a float between 0 and 1 that sets the average proportion of modern English verses in the data loader
        - shuffle_ctx : if `True`, shuffle the contexts within a batch so that half of the `x_1` elements have a wrong context `ctx_1`. Useful for training the context recognizer model.
    - **Return** :
        - a torch Dataset (class `Shakespeare`, which inherits from `torch.utils.data.Dataset`)
        - a Python word dictionary, aka tokenizer (class `dict`)
    - **Tensors returned when loaded in the dataloader**:
        - x_1 : input verse (modern / shakespearian)
        - x_2 : output verse (modern / shakespearian)

        - ctx_1 : context of the input verse
        - ctx_2 : context of the output verse

        - len_x : length of the input verse
        - len_y : length of the output verse

        - len_ctx_x : length of the input verse context
        - len_ctx_y : length of the output verse context

        - label : label of the input verse (0 : modern, 1 : shakespearian)
        - label_ctx : label of the context (0 : wrong context, 1 : right context)
- **`string2code(string,dict)`** : 
    - **Input**:
        - string : a sentence
        - dict : a tokenizer
    - **Return** :
        - a torch LongTensor (the tokenized sentence)
- **`code2string(torch.LongTensor,dict)`** : 
    - **Input**:
        - torch.LongTensor : a tokenized sentence
        - dict : the inverse tokenizer, mapping codes back to words (`dict_token` in this notebook)
    - **Return** :
        - a string sentence
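
As a rough illustration of the round trip these two helpers provide, the sketch below uses hypothetical stand-ins (`string2code_sketch`, `code2string_sketch`) and a toy vocabulary. The real implementations in `scripts` may differ, for instance in how unknown words and padding are handled, and plain Python lists stand in here for the `torch.LongTensor`. Decoding needs the inverse mapping (code to word), which is why the notebook builds `dict_token` from `dict_words`.

```python
# Hypothetical stand-ins for string2code / code2string; not the project's code.
# Toy vocabulary: word -> code (codes 0 and 1 are assumed to be padding / unknown).
toy_dict_words = {"<pad>": 0, "<unk>": 1, "RIDE": 2, ",": 3, "MESSALA": 4, ".": 5}
# Inverse mapping code -> word, built the same way as dict_token in the notebook.
toy_dict_token = {idx: word for word, idx in toy_dict_words.items()}

def string2code_sketch(string, dict_words):
    """Map a whitespace-tokenized sentence to a list of codes (unknown words -> <unk>)."""
    unk = dict_words["<unk>"]
    return [dict_words.get(word, unk) for word in string.split()]

def code2string_sketch(codes, dict_token):
    """Map a list of codes back to a sentence, skipping padding."""
    return " ".join(dict_token[c] for c in codes if dict_token[c] != "<pad>")

codes = string2code_sketch("RIDE , RIDE , MESSALA .", toy_dict_words)
decoded = code2string_sketch(codes + [0, 0], toy_dict_token)  # two padding codes appended
print(codes)    # [2, 3, 2, 3, 4, 5]
print(decoded)  # RIDE , RIDE , MESSALA .
```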

## Importing packages

In [15]:
# `assemble` is not exported by scripts.data_builders.prepare_dataset, so it is not imported here
from scripts.data_builders.prepare_dataset import prepare_dataset,string2code,code2string

import torch
import torchvision.datasets as datasets
import torch.nn.functional as F
from torch import nn
from torch import optim
from torch.utils.tensorboard import SummaryWriter
import ipdb
import pickle
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("device = ",device)

## Preprocessing data

In [11]:
train_data, dict_words = prepare_dataset(device,ratio=0.5,shuffle_ctx=True) #check with shift+tab to look at the data structure
batch_size = 20
dict_token = {b:a for a,b in dict_words.items()} #dict for code2string

train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           shuffle=True,collate_fn=train_data.collate)

Loading ...
- Shakespeare dataset length :  20316
- Corrupted samples (ignored) :  763
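
With `shuffle_ctx=True`, about half of the verses in a batch are paired with a wrong context, and `label_ctx` records which ones. A minimal sketch of one way to do this is shown below. `shuffle_contexts_sketch` is a hypothetical helper, not the code in `scripts`, and it works on plain Python lists rather than tensors.

```python
import random

def shuffle_contexts_sketch(contexts, rng):
    """Give roughly half of the items a context taken from another item.

    Returns (new_contexts, label_ctx) where label_ctx[i] is 1 if item i kept
    its own context and 0 if it received the context of another item.
    """
    n = len(contexts)
    # Pick half of the positions to corrupt.
    corrupted = rng.sample(range(n), n // 2)
    new_contexts = list(contexts)
    label_ctx = [1] * n
    if len(corrupted) >= 2:
        # Rotate the chosen contexts by one position so none stays in place.
        rotated = corrupted[1:] + corrupted[:1]
        for dst, src in zip(corrupted, rotated):
            new_contexts[dst] = contexts[src]
            label_ctx[dst] = 0
    return new_contexts, label_ctx

contexts = ["ctx_a", "ctx_b", "ctx_c", "ctx_d", "ctx_e", "ctx_f"]
new_contexts, label_ctx = shuffle_contexts_sketch(contexts, random.Random(0))
for original, new, lab in zip(contexts, new_contexts, label_ctx):
    print(original, "->", new, "| label_ctx =", lab)
```

Rotating the selected contexts, rather than shuffling them freely, guarantees that every item labelled 0 really has a context that is not its own.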


## Designing NN model

### Language Model 

In [26]:
dict_size=len(dict_words) #19089
d_embedding=300 #cf. paper Y.Kim 2014 Convolutional Neural Networks for Sentence Classification


In [12]:
class CoherenceClassifier(torch.nn.Module):
    def __init__(self,dict_size=dict_size,d_embedding=300,max_length=100):
        super().__init__()
        self.embed_layer=torch.nn.Embedding(dict_size,d_embedding)
        self.conv_1 = torch.nn.Conv1d(d_embedding,3,kernel_size = 3, stride = 1)
        self.max_pool = torch.nn.MaxPool1d(3,2)
        self.relu = torch.nn.ReLU()
        self.linear = torch.nn.Linear(3,1)
        # A softmax over a single output is always 1, so a sigmoid is used instead;
        # it yields the probability in [0,1] that BCELoss expects.
        self.sigmoid=torch.nn.Sigmoid()
    
    def forward(self,x,ctx):
        # Assumption: the original line was truncated (`input_=asse`). Here the verse
        # and its context are simply concatenated along the sequence dimension.
        input_ = torch.cat([x,ctx],dim=1)        # (batch, len_x+len_ctx)
        h = self.embed_layer(input_)             # (batch, seq, d_embedding)
        h = self.conv_1(h.transpose(1,2))        # (batch, 3, seq-2)
        h = self.max_pool(h)
        h = self.relu(h)
        h = torch.max(h,2)[0]                    # max over time: (batch, 3)
        return self.sigmoid(self.linear(h)).squeeze(1)   # (batch,)
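
To see how the sequence length changes through the convolution and pooling layers, the standard output-length formula for unpadded 1-D convolution and pooling, $L_{out} = \lfloor (L_{in} - k)/s \rfloor + 1$, can be applied to the kernel sizes and strides used in the class. The helper name `out_length` is only illustrative.

```python
def out_length(l_in, kernel_size, stride):
    """Output length of an unpadded, undilated Conv1d / MaxPool1d layer."""
    if l_in < kernel_size:
        raise ValueError("sequence shorter than the kernel")
    return (l_in - kernel_size) // stride + 1

for seq_len in (5, 20, 100):
    after_conv = out_length(seq_len, kernel_size=3, stride=1)     # Conv1d(d_embedding, 3, 3, 1)
    after_pool = out_length(after_conv, kernel_size=3, stride=2)  # MaxPool1d(3, 2)
    print(seq_len, "->", after_conv, "->", after_pool)
```

Sequences shorter than 5 tokens cannot pass through both layers. The final `torch.max(x,2)[0]` then collapses whatever time steps remain, leaving one value per channel.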

        
    



## Running model

In [1]:
import ipdb

In [9]:
for x,y , ctx_x,ctx_y , len_x,len_y , len_ctx_x,len_ctx_y, label,label_ctx in train_loader:
    
    for i in range(x.shape[0]):
        print("\n- x :")
        print(code2string(x[i],dict_token))
        print("- context of x :")
        print(code2string(ctx_x[i],dict_token))
        print("- context label :",label_ctx[i].item())
#         ipdb.set_trace()
    break


- x :
RIDE , RIDE , MESSALA .
- context of x :
TRANIO , THAT FACED AND BRAVED ME IN THIS MATTER SO ? WHY , TELL ME , IS NOT THIS MY CAMBIO ? LOVE WROUGHT THESE MIRACLES . BIANCA’S LOVE MADE ME EXCHANGE MY STATE WITH TRANIO , WHILE HE DID BEAR MY COUNTENANCE IN THE TOWN , AND HAPPILY I HAVE ARRIVÈD AT THE LAST UNTO THE WISHÈD
- context label : 0

- x :
HELENA , WE’LL TELL YOU ABOUT OUR SECRET PLAN .
- context of x :
DON’T WORRY ABOUT ME . — YOUR SISTER AND THE DUKE ARE HERE . NOBLE WORDS , SIR . THEN LET’S MEET WITH OUR SENIOR COMMAND AND DISCUSS WHAT TO DO NEXT . I’LL MEET YOU AT YOUR TENT .
- context label : 0

- x :
THEY ONLY LIKE THINGS AS BAD AS THEMSELVES .
- context of x :
IF THE WITCHES TELL THE TRUTH — WHICH THEY DID ABOUT YOU — MAYBE WHAT THEY SAID ABOUT ME WILL COME TRUE TOO . BUT SHHH ! WHATEVER YOUR HIGHNESS COMMANDS ME TO DO , IT IS ALWAYS MY DUTY TO DO IT . ARE YOU GOING RIDING THIS AFTERNOON ?
- context label : 0

- x :
LET HIM COME FORWARD .
- context of x :
THE MOST E

In [7]:
from torch.optim import Adagrad
from torch.nn import BCELoss
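
`BCELoss` expects predicted probabilities in $[0,1]$ as its first argument and the targets as its second, and averages $-[y\log p + (1-y)\log(1-p)]$ over the batch. The pure-Python sketch below reproduces that formula on made-up numbers (the function name `bce` and the values are illustrative only). PyTorch additionally clamps the log terms to avoid infinities, which is omitted here.

```python
import math

def bce(probs, targets):
    """Mean binary cross-entropy for probabilities strictly inside (0, 1)."""
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

targets = [1, 0, 1, 0]             # e.g. label_ctx: 1 = right context, 0 = wrong
confident = [0.9, 0.1, 0.8, 0.2]   # predictions that agree with the targets
uninformed = [0.5, 0.5, 0.5, 0.5]  # a model that always answers 0.5

print(round(bce(confident, targets), 4))
print(round(bce(uninformed, targets), 4))   # equals ln 2
```

A loss that stays near $\ln 2 \approx 0.693$ during training therefore means the classifier does no better than chance on a balanced batch.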

In [6]:
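epochs=100
model=CoherenceClassifier().to(device)  # the model must be instantiated before building the optimizer
optimizer=Adagrad(params=model.parameters(),lr=0.01)
loss_func=BCELoss()

for epoch in range(epochs):
    for x,y , ctx_x,ctx_y , len_x,len_y , len_ctx_x,len_ctx_y, label,label_ctx in train_loader:
        optimizer.zero_grad()  # clear the gradients left by the previous step
        label_pred=model(x,ctx_x).view(-1)  # predicted probability that ctx_x is the right context
        # BCELoss takes (input, target); the target is assumed to be label_ctx
        # (right/wrong context), since the model scores verse/context coherence
        loss=loss_func(label_pred,label_ctx.float().view(-1))
        loss.backward()
        optimizer.step()
    if epoch %10==0:
        print("Epoch %d, loss %f"%(epoch,loss.item()))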
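
The order of operations in a training step (reset gradients, forward pass, loss, backward pass, parameter update) can be checked in isolation on synthetic data. The sketch below is a stand-in under stated assumptions: `toy_model`, the random features, and the learning rate are made up for illustration and are unrelated to the Shakespeare dataset or to `CoherenceClassifier`.

```python
import torch
from torch import nn
from torch.optim import Adagrad

torch.manual_seed(0)

# Synthetic stand-in data: 64 samples with 3 features and a binary target.
features = torch.randn(64, 3)
targets = (features.sum(dim=1) > 0).float()    # shape (64,)

# Tiny stand-in model ending in a sigmoid, so outputs are valid probabilities.
toy_model = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
optimizer = Adagrad(toy_model.parameters(), lr=0.1)
loss_func = nn.BCELoss()

losses = []
for step in range(200):
    optimizer.zero_grad()                      # clear gradients from the previous step
    preds = toy_model(features).squeeze(1)     # shape (64,), values in (0, 1)
    loss = loss_func(preds, targets)           # BCELoss takes (input, target)
    loss.backward()                            # compute gradients
    optimizer.step()                           # update parameters
    losses.append(loss.item())

print("first loss %.4f, last loss %.4f" % (losses[0], losses[-1]))
```

Without the `zero_grad()` call, gradients from successive batches would accumulate and each update would mix in stale gradient information.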