
# Instruction Generation

## Basic includes

In [None]:
!pip install ipython-autotime
%load_ext autotime
!pip install word2vec
!pip install "ray[tune]"

Collecting ipython-autotime
  Downloading ipython_autotime-0.3.1-py2.py3-none-any.whl (6.8 kB)
Installing collected packages: ipython-autotime
Successfully installed ipython-autotime-0.3.1
time: 1.67 ms (started: 2022-03-22 21:36:28 +00:00)


## Connect Colab and set paths

In [None]:
from google.colab import drive
drive.mount('/content/drive/')

rootDir = '/content/drive/MyDrive/'

dataPath = rootDir + 'TP2/Datasets/Recipe1M/'
tarPath = rootDir + 'Colab Notebooks/recime/data/'

TIMESTAMP = '2022_03_19'


Mounted at /content/drive/
time: 3min 15s (started: 2022-03-22 21:36:53 +00:00)


## Imports for Learning
Based on https://pytorch.org/tutorials/beginner/introyt/trainingyt.html

In [None]:
import pandas as pd
import numpy as np

import torch

# Model
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable  # deprecated since PyTorch 0.4; plain tensors track gradients directly
from torchsummary import summary

# Optimizer
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, random_split

# Tokenizer
# torch padding only supports constant padding for 1D (nn.ConstantPad1d) or non-constant padding for >1D (torch.nn.functional.pad)
from tensorflow.keras.preprocessing.sequence import pad_sequences
# the Keras tokenizer offers more options than the torch/torchtext one
from tensorflow.keras.preprocessing.text import Tokenizer, text_to_word_sequence
from torchtext.data import get_tokenizer # https://pytorch.org/text/stable/data_utils.html

# PyTorch TensorBoard support
from torch.utils.tensorboard import SummaryWriter
from datetime import datetime

# hyperparameter tuning
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler

time: 6.99 s (started: 2022-03-22 21:40:57 +00:00)
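The comment in the imports cell motivates using Keras' `pad_sequences`. By default it pads and truncates at the front of each sequence (`padding='pre'`, `truncating='pre'`) with the value 0, so the real tokens end up right next to the last LSTM step. A minimal pure-Python sketch of that behaviour; `pad_to_length` is a hypothetical helper, not part of Keras or PyTorch.

```python
from typing import List, Optional


def pad_to_length(seqs: List[List[int]], maxlen: Optional[int] = None,
                  padding: str = "pre", truncating: str = "pre",
                  value: int = 0) -> List[List[int]]:
    """Pad/truncate integer sequences to one length (Keras-style defaults)."""
    if maxlen is None:
        # Default to the longest sequence, as pad_sequences does
        maxlen = max((len(s) for s in seqs), default=0)
    out = []
    for s in seqs:
        if len(s) > maxlen:
            # 'pre' drops tokens from the front, 'post' from the back
            s = s[len(s) - maxlen:] if truncating == "pre" else s[:maxlen]
        fill = [value] * (maxlen - len(s))
        # 'pre' puts the fill value before the tokens, 'post' after them
        out.append(fill + list(s) if padding == "pre" else list(s) + fill)
    return out
```

For example, `pad_to_length([[1, 2, 3], [4]])` gives `[[1, 2, 3], [0, 0, 4]]`, while `padding="post"` gives `[[1, 2, 3], [4, 0, 0]]`.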


## Seed

In [None]:
torch.manual_seed(0)
np.random.seed(0)

<torch._C.Generator at 0x7fc0b82629b0>

time: 3.41 ms (started: 2022-03-22 21:41:04 +00:00)
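The cell above seeds PyTorch and NumPy but not Python's built-in `random` module, which other code in the pipeline may rely on. A minimal sketch of a helper that seeds all three; `seed_everything` is a hypothetical name, and the NumPy/PyTorch imports are guarded only so the snippet also runs where those packages are missing.

```python
import random


def seed_everything(seed: int = 0) -> None:
    """Seed Python's `random`, plus NumPy and PyTorch if they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the RNG on all devices, including CUDA
    except ImportError:
        pass
```

Seeding alone does not make every cuDNN kernel deterministic, so GPU runs can still differ slightly between executions.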


## Training execution
Adapted from a mixture of:
* https://pytorch.org/tutorials/beginner/introyt/trainingyt.html
* https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html
* https://stackoverflow.com/questions/67295494/correct-validation-loss-in-pytorch
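A related detail when aggregating loss over an epoch: if the last batch is smaller than `batchSize`, averaging the per-batch mean losses over-weights that batch, whereas weighting each batch mean by its size gives the true per-sample average. A minimal pure-Python sketch; `epoch_loss` is a hypothetical helper, and in a PyTorch loop its inputs would be `loss.item()` and the size of each batch.

```python
from typing import Sequence


def epoch_loss(batch_means: Sequence[float], batch_sizes: Sequence[int]) -> float:
    """Per-sample mean loss computed from per-batch mean losses."""
    # Undo each batch's averaging by multiplying with its size
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)
```

With batch means `[1.0, 3.0]` and sizes `[32, 8]`, the weighted result is `1.4`, while a plain mean of the two batch losses would report `2.0`.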


### Includes

In [None]:
from src.instGen import HyperParams, InstructionSet, Model3, train, predict, sample
from src.preProc import getPreProcData

### Set device

In [None]:
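device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# get_device_name() fails without a GPU, so only query it when CUDA is in use
print(torch.cuda.get_device_name(0) if device.type == "cuda" else "CPU")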

### Set hyperparams

In [None]:
hyperParams = HyperParams(epochs=50, batchSize=32)
print(hyperParams)

epochs 50
batchSize 32
lr 0.001
ratio train|val|test[0.7, 0.2, 0.1]
hidden_dim 256
num_layers 1
embedding_dim 200

time: 1.36 ms (started: 2022-03-22 21:41:13 +00:00)
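`HyperParams` comes from the project's `src.instGen` module, which is not shown here. Judging only from the printed values, an equivalent container could be a dataclass like the sketch below. `HyperParamsSketch` is hypothetical: the field names follow the printout and the defaults are copied from it, but the real class may be implemented differently and may use other defaults.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HyperParamsSketch:
    """Hypothetical stand-in for src.instGen.HyperParams (values from the printout)."""
    epochs: int = 50
    batchSize: int = 32
    lr: float = 0.001
    # train | val | test fractions of the dataset
    ratio: List[float] = field(default_factory=lambda: [0.7, 0.2, 0.1])
    hidden_dim: int = 256
    num_layers: int = 1
    embedding_dim: int = 200

    def __post_init__(self) -> None:
        # The split ratios should cover the whole dataset
        if abs(sum(self.ratio) - 1.0) > 1e-9:
            raise ValueError(f"ratio must sum to 1, got {self.ratio}")
```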


### Set dataset

In [None]:
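instSet = InstructionSet(tarPath)

# split data set; random_split needs lengths that sum to len(instSet),
# so the test split takes whatever int() truncation leaves over
trainNum = int(hyperParams.ratio[0] * len(instSet))
valNum = int(hyperParams.ratio[1] * len(instSet))
testNum = len(instSet) - trainNum - valNum

splitSet = random_split(
    instSet, [trainNum, valNum, testNum], generator=torch.Generator().manual_seed(0))

splitSet = {'train': splitSet[0], 'val': splitSet[1], 'test': splitSet[2]}

testSet = getPreProcData(tarPath, range(-1,0))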

AttributeError: ignored

time: 1min 2s (started: 2022-03-22 21:41:13 +00:00)
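`random_split` expects integer lengths that add up to `len(instSet)` (recent PyTorch versions also accept fractions summing to 1). Because `int()` truncates, computing every split from its ratio can leave samples unassigned, so a common pattern is to give the remainder to the last split. A minimal sketch; `split_sizes` is a hypothetical helper.

```python
from typing import List, Sequence


def split_sizes(n: int, ratios: Sequence[float]) -> List[int]:
    """Integer split sizes for n samples that always sum to n."""
    # Truncate every split except the last, like int(ratio * len(instSet))
    sizes = [int(r * n) for r in ratios[:-1]]
    # The last split absorbs the samples lost to truncation
    sizes.append(n - sum(sizes))
    return sizes
```

For example, `split_sizes(10, [0.7, 0.2, 0.1])` returns `[7, 2, 1]`, and the result can be passed directly as the `lengths` argument of `random_split`.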


### Set model

In [None]:
model = Model3(hyperParams, instSet, device)
model.to(device)
print(model)


In [None]:
train(splitSet, model, hyperParams, device)

Epoch: 1, loss: 3.47934
Epoch: 2, loss: 2.30626
Epoch: 3, loss: 1.85036
Epoch: 4, loss: 1.61080
Epoch: 5, loss: 1.46837
Epoch: 6, loss: 1.37494
Epoch: 7, loss: 1.31024
Epoch: 8, loss: 1.26205
Epoch: 9, loss: 1.22551
Epoch: 10, loss: 1.19719
Epoch: 11, loss: 1.17344
Epoch: 12, loss: 1.15464
Epoch: 13, loss: 1.13828
Epoch: 14, loss: 1.12577
Epoch: 15, loss: 1.11415
Epoch: 16, loss: 1.10420
Epoch: 17, loss: 1.09501
Epoch: 18, loss: 1.08751
Epoch: 19, loss: 1.07986
Epoch: 20, loss: 1.07383
Epoch: 21, loss: 1.06840
Epoch: 22, loss: 1.06360
Epoch: 23, loss: 1.05879
Epoch: 24, loss: 1.05476
Epoch: 25, loss: 1.05162
Epoch: 26, loss: 1.04824
Epoch: 27, loss: 1.04435
Epoch: 28, loss: 1.04166
Epoch: 29, loss: 1.03866
Epoch: 30, loss: 1.03588
Epoch: 31, loss: 1.03428
Epoch: 32, loss: 1.03249
Epoch: 33, loss: 1.03064
Epoch: 34, loss: 1.02836
Epoch: 35, loss: 1.02763
Epoch: 36, loss: 1.02467
Epoch: 37, loss: 1.02484
Epoch: 38, loss: 1.02381
Epoch: 39, loss: 1.02162
Epoch: 40, loss: 1.02058
Epoch: 41

In [None]:
# sample(model, titleSet, 6, device, initial=['dry', 'penne', 'pasta', 'broccoli', 'sun', 'dried', 'tomatoes', 'packed', 'in', 'oil', 'garlic', 'cloves', 'cheddar', 'cheese', 'salt', 'black', 'pepper'])
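# pick a random encoded sequence from the test split as the seed text
seq = splitSet['test'][np.random.randint(0, len(splitSet['test']))][0].tolist()

def remove_values_from_list(the_list, val):
    """Drop every `val` (the padding index) and map the remaining ids to words."""
    return [instSet.tokenizer.index_word[value] for value in the_list if value != val]

seq = remove_values_from_list(seq, 0)
print(seq)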

sample(model, instSet, 300, device, initial=seq)

['olive', 'oil', 'onions', 'pepper', 'courgettes', 'aubergines', 'cherry', 'tomatoes', 'pasta', 'sauce', "goat's", 'cheese', 'basil', 'leaves', 'pasta', 'bake', 'with', "goats'", 'cheese']


'melt the butter over a medium heat to medium and simmer over medium heat without stirring once mixture has softened add pepper and shrimp to brown stir for about 5 7 to 4 minutes drain and rinse the lentils with cold water in small bowl stir the mayonnaise lemon zest oregano and salt bring back into the sauce add salt water and to simmer gently stirring until thickened stirring constantly until the sugar melts stir in all purpose orange rind orange sugar salt heat oven 1 1 2 1 4 cup unsweetened chocolate sugar orange juice orange juice lemon juice and sugar beat in eggs one tablespoon of butter over each side using the bottom add your fingers or the top of each pan and add it into a spoon to extract the custard so to add the potatoes to make the bowl and add a small pot over the skillet then add the chicken and cook until it begins to boil pour this mixture over the crust and then place the whole in baking pan place the potatoes and tomatoes on top of the meat with some of it place in

time: 1.96 s (started: 2022-03-25 02:10:59 +00:00)
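The notebook does not show how `sample` picks each next word. One common choice for LSTM text generation is to draw from the softmax of the output logits divided by a temperature: lower temperatures give more conservative text, higher ones more varied text. The sketch below assumes that approach and works on a plain list of logits; `sample_next` is hypothetical and not the project's implementation.

```python
import math
import random
from typing import Optional, Sequence


def sample_next(logits: Sequence[float], temperature: float = 1.0,
                rng: Optional[random.Random] = None) -> int:
    """Draw a token index from softmax(logits / temperature)."""
    if temperature <= 0:
        # Temperature 0 degenerates to greedy decoding (argmax)
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    # random.choices normalizes the weights itself
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

In PyTorch the equivalent draw would be `torch.multinomial(torch.softmax(logits / temperature, dim=-1), 1)`.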


In [None]:
pd.DataFrame.from_dict(pd.Series(instSet.tokenizer.word_index))

| word | index |
|---|---|
| the | 1 |
| and | 2 |
| a | 3 |
| in | 4 |
| to | 5 |
| ... | ... |
| kaga | 62904 |
| thn | 62905 |
| gym | 62906 |
| uncontaminated | 62907 |


time: 27.6 ms (started: 2022-03-25 02:11:01 +00:00)
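The table shows Keras' `Tokenizer.word_index`: indices start at 1 (0 is reserved, which is why 0 can serve as the padding value) and are assigned in order of descending word frequency, so `the` gets 1. A simplified pure-Python sketch of that ranking; `build_word_index` is hypothetical and skips Keras' punctuation filtering and other options.

```python
from collections import Counter
from typing import Dict, Iterable


def build_word_index(texts: Iterable[str]) -> Dict[str, int]:
    """Map words to 1-based indices ranked by descending frequency."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # sorted() is stable, so equally frequent words keep first-occurrence order
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {word: i for i, (word, _) in enumerate(ranked, start=1)}
```

For `["the cat and the dog", "The dog"]` this yields `{'the': 1, 'dog': 2, 'cat': 3, 'and': 4}`.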


## Save model

In [None]:
torch.save(model.state_dict(),
           rootDir + 'weights/instGenerator_model.pt')


## TensorBoard visualization

In [None]:
%load_ext tensorboard
%tensorboard --logdir=/content/drive/MyDrive/runs/instTrainer

Reusing TensorBoard on port 6006 (pid 1117), started 2:20:56 ago. (Use '!kill 1117' to kill it.)


time: 9.8 ms (started: 2022-03-25 02:11:01 +00:00)
