# LSTM on Recipe Data

## INITIALIZATION

### Imports

In [1]:
import json
import regex as re
import string
import tensorflow as tf
from tensorflow.keras import (
    layers,
    models,
    losses,
    callbacks
)
import datetime
import numpy as np


### Functions

In [50]:
def pad_punctuation(s):
    # Build a character class of all punctuation symbols and surround each matched symbol (captured as \1) with single spaces
    s=re.sub(f'([{string.punctuation}])', r' \1 ', s)
    # Collapse runs of two or more spaces into a single space
    s=re.sub(" +", " ", s)
    return s

def prepare_inputs(text):
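    '''
    Split each batch of recipes into input/target pairs so the model learns to predict the next token from the tokens before it.
    Returns:
        x,y: x holds every token except the last; y holds the same tokens shifted one position left, so y[:, t] is the token that follows x[:, t]
    '''
    # Add a trailing axis so the batch of strings has shape [batch_size, 1]
    text=tf.expand_dims(text,-1)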
    tokenized_sentences=vectorize_layer(text)

    # Create input tensor=x which contains all tokens of each sentence except last token in each sentence
    x=tokenized_sentences[:,:-1]
    # Create target tensor=y which contains all tokens but starting from the 2nd
    y=tokenized_sentences[:,1:]
    
    return x,y

def print_probs(info, vocab, top_k=5):
    '''
    Print the words with the highest probabilities predicted by the model at each generation step.

    Args:
        info (list): Data generated by the TextGenerator.generate function
                     e.g.   [{'prompt'    : 'recipe for chockolate ice cream |',
                              'word_probs': array([9.1210980e-13, 3.9852460e-14, 1.2957362e-18, ..., 0.0000000e+00,
                                     2.9555584e-37, 1.7050197e-32], dtype=float32)}]
        vocab (list): Model vocabulary
        top_k (int): Number of words with the highest probability to print
    Prints:
        PROMPT: text | generated text
                predicted_word_1:   	predicted_probability_%
                predicted_word_2:   	predicted_probability_%
                predicted_word_3:   	predicted_probability_%
                predicted_word_4:   	predicted_probability_%
                predicted_word_5:   	predicted_probability_%
                ----------
    '''
    for item in info:
        print(f"\nPROMPT: {item['prompt']}")
        word_probs=item['word_probs']
        # Indices of the top_k most probable words, highest probability first
        top_indices=np.argsort(word_probs)[::-1][:top_k]

        for idx in top_indices:
            # Print each probability as a percentage
            print(f'{vocab[idx]}:   \t{np.round(100*word_probs[idx],2)}%')

        print('----------\n')


## PREPARE DATA

### Load and explore

In [3]:
with open('./datasets/epirecipes/full_format_recipes.json') as json_data:
    recipe_data=json.load(json_data)

In [4]:
# Keep only recipes that have both a title and directions, formatted as 'Recipe for <title> | <directions>'
filtered_data=[
    'Recipe for ' + x['title'] + ' | ' + " ".join(x['directions'])
    for x in recipe_data
    if 'title' in x
    and x['title'] is not None
    and 'directions' in x
    and x['directions'] is not None
]

print(f'Num of objects in the list: {len(filtered_data)}')

Num of objects in the list: 20111


In [5]:
filtered_data[1]

'Recipe for Boudin Blanc Terrine with Red Onion Confit  | Combine first 9 ingredients in heavy medium saucepan. Add 3 shallots. Bring to simmer. Remove from heat, cover and let stand 30 minutes. Chill overnight. Preheat oven to 325°F. Line 7-cup pâté or bread pan with plastic wrap. Melt butter in heavy small skillet over low heat. Add remaining 5 shallots. Cover and cook until very soft, stirring occasionally, about 15 minutes. Transfer to processor. Add pork, eggs, flour and Port and puree. Strain cream mixture, pressing on solids to extract as much liquid as possible. With processor running, add cream through feed tube and process just until combined with pork. Transfer to large bowl. Mix in currants. Spoon mixture into prepared pan. Cover with foil. Place pan in large pan. Add boiling water to larger pan to within 1/2 inch of top of terrine. Bake until terrine begins to shrink from sides of pan and knife inserted into center comes out clean, about 1 1/2 hours. Uncover and cool on ra

**INTERIM CONCLUSION**

We have successfully loaded the JSON data, filtered it and pre-processed it. After all transformations we have a list named `filtered_data` containing 20111 recipe strings, each beginning with `Recipe for`, followed by the title, with the directions separated from the title by `|`.
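A minimal sketch of the same filter applied to three made-up records (hypothetical data, not from the dataset) shows the resulting string format and which records are dropped:

```python
# Hypothetical toy records mimicking the Epicurious JSON structure
recipe_data = [
    {'title': 'Pea Soup', 'directions': ['Boil peas.', 'Blend.']},
    {'title': None, 'directions': ['Orphan step.']},  # dropped: title is None
    {'title': 'Toast'},                                # dropped: no 'directions' key
]

# Same filter and formatting as the notebook cell above
filtered_data = [
    'Recipe for ' + x['title'] + ' | ' + " ".join(x['directions'])
    for x in recipe_data
    if 'title' in x
    and x['title'] is not None
    and 'directions' in x
    and x['directions'] is not None
]

print(filtered_data)
# ['Recipe for Pea Soup | Boil peas. Blend.']
```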

## TOKENIZATION

Tokenization is the process of breaking text up into individual units, such as words, pieces of words, individual characters, or other common character combinations.

How you tokenize your text depends on your final goals, but always keep in mind that the key decision is how you want your model to process the text. For example, if you lowercase all words, `Bill` and `bill` point to the same index. That is fine when the text is addressed to someone named Bill, but if the text is about Bill's financial operations and a bill he has to pay, your model loses this distinction, which in some cases may be important. So `know your corpus`.  
Main questions to answer:
- Lowercase or not
- Size of the vocabulary
- Stem or not
- Tokenize punctuation or not
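A small illustration of the first question, using a made-up sentence: lowercasing merges `Bill` and `bill` into one vocabulary entry, which shrinks the vocabulary but erases the distinction between the name and the noun.

```python
sentence = "Bill paid the bill"

# Case-sensitive vocabulary: 'Bill' (the name) and 'bill' (the invoice) stay distinct
cased_vocab = sorted(set(sentence.split()))
# Lowercased vocabulary: both collapse into the single entry 'bill'
lowered_vocab = sorted(set(sentence.lower().split()))

print(cased_vocab)    # ['Bill', 'bill', 'paid', 'the']
print(lowered_vocab)  # ['bill', 'paid', 'the']
```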

### Pad the punctuation

Pad punctuation symbols with spaces so that each one is treated as a separate word.

```python
def pad_punctuation(s):
    # Build a character class of all punctuation symbols and surround each matched symbol (captured as \1) with single spaces
    s=re.sub(f'([{string.punctuation}])', r' \1 ', s)
    # Collapse runs of two or more spaces into a single space
    s=re.sub(" +", " ", s)
    return s
```
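A quick self-contained check of what this does to recipe-like text. This sketch assumes the standard-library `re` module instead of `regex`, and uses `re.escape` to build the character class; both are choices of the sketch rather than the notebook's exact code, but the set of matched characters (all of `string.punctuation`) is the same.

```python
import re
import string

def pad_punctuation(s):
    # Character class containing every ASCII punctuation symbol, safely escaped
    pattern = f'([{re.escape(string.punctuation)}])'
    # Surround each punctuation symbol with spaces, then collapse repeated spaces
    s = re.sub(pattern, r' \1 ', s)
    return re.sub(' +', ' ', s)

print(repr(pad_punctuation('Add 3 shallots. Line 7-cup pan.')))
# 'Add 3 shallots . Line 7 - cup pan . '
```

Note that non-ASCII symbols such as `°` are not in `string.punctuation`, which is why `325°F` stays a single token in the padded output below.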


In [52]:
filtered_data[1]

'Recipe for Boudin Blanc Terrine with Red Onion Confit  | Combine first 9 ingredients in heavy medium saucepan. Add 3 shallots. Bring to simmer. Remove from heat, cover and let stand 30 minutes. Chill overnight. Preheat oven to 325°F. Line 7-cup pâté or bread pan with plastic wrap. Melt butter in heavy small skillet over low heat. Add remaining 5 shallots. Cover and cook until very soft, stirring occasionally, about 15 minutes. Transfer to processor. Add pork, eggs, flour and Port and puree. Strain cream mixture, pressing on solids to extract as much liquid as possible. With processor running, add cream through feed tube and process just until combined with pork. Transfer to large bowl. Mix in currants. Spoon mixture into prepared pan. Cover with foil. Place pan in large pan. Add boiling water to larger pan to within 1/2 inch of top of terrine. Bake until terrine begins to shrink from sides of pan and knife inserted into center comes out clean, about 1 1/2 hours. Uncover and cool on ra

In [6]:
text_data=[pad_punctuation(x) for x in filtered_data]

In [7]:
text_data[1]

'Recipe for Boudin Blanc Terrine with Red Onion Confit | Combine first 9 ingredients in heavy medium saucepan . Add 3 shallots . Bring to simmer . Remove from heat , cover and let stand 30 minutes . Chill overnight . Preheat oven to 325°F . Line 7 - cup pâté or bread pan with plastic wrap . Melt butter in heavy small skillet over low heat . Add remaining 5 shallots . Cover and cook until very soft , stirring occasionally , about 15 minutes . Transfer to processor . Add pork , eggs , flour and Port and puree . Strain cream mixture , pressing on solids to extract as much liquid as possible . With processor running , add cream through feed tube and process just until combined with pork . Transfer to large bowl . Mix in currants . Spoon mixture into prepared pan . Cover with foil . Place pan in large pan . Add boiling water to larger pan to within 1 / 2 inch of top of terrine . Bake until terrine begins to shrink from sides of pan and knife inserted into center comes out clean , about 1 1 

**INTERIM CONCLUSION**

We have successfully padded punctuation symbols.

### Convert to TF

In [55]:
# Convert the data to a TensorFlow dataset of batches of 32 recipes; shuffle(1000) then shuffles the order of the batches (the recipes within a batch stay together)
text_ds=(
    tf.data.Dataset.from_tensor_slices(text_data)
    .batch(32)
    .shuffle(1000)
)

#for batch in text_ds.take(1):
#    print(batch)
    

### Vectorization

#### Create vectorization layer

Create a Keras TextVectorization layer that:
- converts text to lowercase
- gives the most frequent words a corresponding integer token (a 10k-entry vocabulary)
- pads or truncates each sequence to 201 tokens

In [9]:
VOCAB_SIZE=10000
MAX_LEN=200

vectorize_layer=layers.TextVectorization(
    standardize='lower',
    max_tokens=VOCAB_SIZE,
    output_mode='int',
    output_sequence_length=MAX_LEN + 1
)

#### Calculate text statistics

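That is:  
- Apply TextVectorization to the training data (`adapt`)
- Get a vocabulary of VOCAB_SIZE = 10,000 entries: the padding token, `[UNK]`, and the 9,998 most frequent tokens.  
  NOTE:  
  - all words outside this vocabulary are coded as 1 (i.e. `[UNK]`)
  - if a recipe has fewer than 201 tokens, the remaining positions are coded as 0 (the padding token; the text generator also treats it as a stop token, meaning the text has come to an end)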
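The bookkeeping that `adapt` and the layer perform can be approximated in plain Python. This is a simplified sketch for intuition, not Keras's implementation: it reserves index 0 for padding and 1 for `[UNK]`, ranks words by frequency, and pads or truncates to a fixed length. `build_vocab` and `vectorize` are hypothetical names, and tie-breaking between equally frequent words may differ from Keras.

```python
from collections import Counter

def build_vocab(texts, max_tokens):
    # Count lowercased, whitespace-separated tokens across the corpus
    counts = Counter(word for text in texts for word in text.lower().split())
    # Two slots are reserved: 0 = padding, 1 = [UNK]; the rest go to the most frequent words
    most_frequent = [word for word, _ in counts.most_common(max_tokens - 2)]
    return ['', '[UNK]'] + most_frequent

def vectorize(text, vocab, seq_len):
    index = {word: i for i, word in enumerate(vocab)}
    # Out-of-vocabulary words map to 1 ([UNK])
    ids = [index.get(word, 1) for word in text.lower().split()]
    # Truncate long sequences, right-pad short ones with 0
    ids = ids[:seq_len]
    return ids + [0] * (seq_len - len(ids))

corpus = ['Add salt . Add pepper .', 'Stir .']
vocab = build_vocab(corpus, max_tokens=5)
print(vocab)                                # ['', '[UNK]', '.', 'add', 'salt']
print(vectorize('Add pepper .', vocab, 6))  # [3, 1, 2, 0, 0, 0]
```

With `max_tokens=5`, only three real words fit, so `pepper` falls outside the vocabulary and becomes 1, exactly the effect described in the note above.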

In [10]:
%%time
vectorize_layer.adapt(text_ds)
vocab=vectorize_layer.get_vocabulary()

CPU times: user 4.41 s, sys: 802 ms, total: 5.21 s
Wall time: 5.38 s




In [11]:
# Display token-word mapping
for i, word in enumerate(vocab[:10]):
    print(f'{i}: {word}')

0: 
1: [UNK]
2: .
3: ,
4: and
5: to
6: in
7: the
8: with
9: a


**INTERIM CONCLUSION**

We see a subset of tokens mapped to their respective indices. The layer reserves index 0 for padding and 1 for unknown words (`[UNK]`).  
NOTE:
- The other words are assigned indices in order of frequency, so the most common tokens (`.`, `,`, `and`, ...) get the lowest indices

In [44]:
text_data[2]

'Recipe for Potato and Fennel Soup Hodge | In a large heavy saucepan cook diced fennel and onion in butter over moderate heat , stirring , until softened , about 10 minutes . Peel and cube potatoes . Add potatoes and broth to fennel mixture and simmer , covered , until potatoes are very tender , about 20 minutes . In a blender or food processor purée mixture in batches until smooth and return to saucepan . Stir in milk and salt and pepper to taste and simmer soup , stirring occasionally , 10 minutes , or until heated through . Garnish soup with reserved fennel leaves . '

In [45]:
# display same as above but as converted to int word mappings
example_tokenised=vectorize_layer(text_data[2])
print(example_tokenised.numpy())

[  26   16  335    4  354  244    1   27    6    9   30   78   80   43
 1282  354    4  115    6   50   20  269   17    3   48    3   10  361
    3   19   82   12    2  175    4 1915  150    2   18  150    4  171
    5  354   31    4   70    3  121    3   10  150   79  218   85    3
   19  170   12    2    6    9  281   41  291  188  324   31    6  303
   10  141    4  246    5   80    2   42    6  211    4   24    4   33
    5  132    4   70  244    3   48   90    3   82   12    3   41   10
  396  102    2  304  244    8  285  354  262    2    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0 

**INTERIM CONCLUSION**

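From the example above we can see how the recipe has been tokenized:
- the first two tokens, 26 and 16, correspond to `recipe for`
- there are 1's in the sequence, meaning that some words (e.g. `hodge`) are outside the vocabulary and mapped to `[UNK]`, so their meaning is lost
- only about half of the vector holds word tokens (108 of the 201 positions in this example); the rest is padding filled with 0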
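One way to check the padding share and the number of unknown words for a single tokenized recipe is to count entries. A minimal NumPy sketch on a short hypothetical sequence; in the notebook, `example_tokenised.numpy()` would play the role of `tokens`:

```python
import numpy as np

# Hypothetical vectorized recipe: 6 real tokens followed by padding, length 10
tokens = np.array([26, 16, 335, 4, 1, 2, 0, 0, 0, 0])

n_words = np.count_nonzero(tokens)          # real tokens (padding is 0)
n_unknown = np.count_nonzero(tokens == 1)   # tokens mapped to [UNK]
print(n_words, n_unknown, n_words / tokens.size)  # 6 1 0.6
```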

### Create Training Dataset

`prepare_inputs` converts the dataset into a `MapDataset` in which each recipe is split into two sequences:
- x: all tokens of the recipe except the last
- y: the same tokens shifted one position to the left, i.e. starting from the 2nd token

Each element is therefore a pair `(x, y)`. During training the model learns, at every position, to predict the token in `y` from the tokens of `x` up to that position. For example, for the sentence `The cloud is white` the model learns that `cloud` follows `The`, that `is` follows `The cloud`, and so on, and adjusts its weights accordingly.
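The shift performed by `prepare_inputs` can be reproduced with plain NumPy slicing. A minimal sketch on the `The cloud is white` example, with a made-up four-word vocabulary (indices 0 and 1 reserved as in the real layer). The last pair shows that the model is also trained to predict the padding token 0 after the final word, which `TextGenerator` later treats as a stop signal.

```python
import numpy as np

# Hypothetical vocabulary: 0 = padding, 1 = [UNK], then four words
vocab = ['', '[UNK]', 'the', 'cloud', 'is', 'white']
tokenized = np.array([[2, 3, 4, 5, 0]])  # "the cloud is white" + one padding slot

x = tokenized[:, :-1]  # all tokens except the last
y = tokenized[:, 1:]   # same sequence shifted one step left

for t in range(x.shape[1]):
    context = ' '.join(vocab[i] for i in x[0, :t + 1])
    print(f'{context!r} -> {vocab[y[0, t]]!r}')
# 'the' -> 'cloud'
# 'the cloud' -> 'is'
# 'the cloud is' -> 'white'
# 'the cloud is white' -> ''
```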

In [13]:
train_ds=text_ds.map(prepare_inputs)

In [47]:
type(train_ds)

tensorflow.python.data.ops.map_op._MapDataset

In [58]:
for x,y in train_ds.take(1):
    print(f'x={x}\n y={y}')

x=[[  26   16 1759 ...   82   12    2]
 [  26   16  187 ... 1023   35    4]
 [  26   16 5649 ...    3   46  255]
 ...
 [  26   16 2130 ...    0    0    0]
 [  26   16  418 ...    0    0    0]
 [  26   16  617 ...    6  303    3]]
 y=[[  16 1759   13 ...   12    2   97]
 [  16  187   13 ...   35    4   72]
 [  16 5649   27 ...   46  255 1546]
 ...
 [  16 2130  525 ...    0    0    0]
 [  16  418  298 ...    0    0    0]
 [  16  617 1481 ...  303    3   18]]




## BUILD LSTM

The Embedding matrix visualisation  
<img src="./images/3blue1brown-embeddings.png" width="600" height="400">

(c) [3Blue1Brown](https://www.youtube.com/watch?v=wjZofJX0v4M)

<img src="./images/lstm-embeddings.png" width="600" height="400">

(c) [Generative Deep Learning, 2nd Edition](https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/): David Foster's book, which inspired this notebook.

In [15]:
EMBEDDING_DIM=100
# Number of LSTM units, i.e. the size of the hidden state vector
N_UNITS=128

# Input of any length (shape=(None,)) because recipe lengths may vary
# Role: pass a tensor of integer sequences of shape [batch_size,seq_length] to the embedding layer
inputs=layers.Input(
    shape=(None,),
    dtype='int32'
)

# An embedding layer is essentially a lookup table that converts each integer token into a vector of length EMBEDDING_DIM
# Role: Pass a tensor shape [batch_size,seq_length,EMBEDDING_DIM] to the LSTM
x=layers.Embedding(
    VOCAB_SIZE,
    EMBEDDING_DIM
)(inputs)

x=layers.LSTM(
    N_UNITS,
    return_sequences=True
)(x)

outputs=layers.Dense(
    VOCAB_SIZE,
    activation='softmax'
)(x)

lstm=models.Model(inputs,outputs)
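As the comment in the cell says, an embedding lookup is just row indexing into a weight matrix. A NumPy sketch with the notebook's sizes (VOCAB_SIZE=10000, EMBEDDING_DIM=100), where a random matrix stands in for the trained weights (an assumption of the sketch), shows the output shape and that equal token IDs always receive identical vectors:

```python
import numpy as np

VOCAB_SIZE, EMBEDDING_DIM = 10000, 100
rng = np.random.default_rng(0)

# Random stand-in for the trainable embedding matrix: one row per token ID
embedding_matrix = rng.normal(size=(VOCAB_SIZE, EMBEDDING_DIM)).astype(np.float32)

# A batch of 32 sequences of 200 token IDs (the shape of x after prepare_inputs)
token_ids = rng.integers(0, VOCAB_SIZE, size=(32, 200))

# Lookup = fancy indexing: [32, 200] -> [32, 200, 100]
embedded = embedding_matrix[token_ids]
print(embedded.shape)  # (32, 200, 100)

# The same ID maps to the same vector wherever it appears
token_ids[0, 5] = token_ids[7, 150] = 4  # token 4 is 'and' in the notebook's vocabulary
embedded = embedding_matrix[token_ids]
print(np.array_equal(embedded[0, 5], embedded[7, 150]))  # True
```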

**INTERIM CONCLUSION**

The underlying mechanics are as follows:  
1. **Input Layer**: This layer accepts a tensor of shape `[batch_size, seq_length] = [32, 200]`, where each element is a token (word) and each row is one recipe of 200 tokens (the 201 vectorized tokens minus the last one, which `prepare_inputs` drops). The input tensor is then passed to the Embedding layer.
   
2. **Embedding Layer**: The Embedding layer converts each token in the sequence to a vector of length EMBEDDING_DIM. In our example, each token is mapped to a vector of size 100, so the layer outputs a tensor with shape `[batch_size, seq_length, EMBEDDING_DIM] = [32, 200, 100]`. This transformation allows the model to work with dense, meaningful vector representations of words (i.e. vectors that capture essential information or patterns about the words they represent, in a way that is useful for the task at hand) rather than raw integer tokens.

*Note*: Tokens with the same ID (i.e., the same word) will be mapped to identical vectors, regardless of where they appear—whether in the same sequence, across different batches, or in the entire corpus. This ensures that each word has a consistent embedding representation.

3. **LSTM Layer**: The LSTM (Long Short-Term Memory) layer processes the tensor from the Embedding layer over 200 time steps for each sequence in the batch, handling one embedding vector at a time. This means:
- each vector in the sequence (one token's embedding) is fed to the LSTM as 𝑥𝑡, representing the token at the current time step 𝑡.
- the LSTM outputs a hidden state ℎ𝑡 at each step, representing the model's updated summary of the sequence up to that point. The hidden state is not itself a probability; the Dense layer turns it into a probability distribution over the next token.
- because `return_sequences=True` is set, the LSTM outputs the hidden state ℎ𝑡 for every time step instead of only the last one. This is what a sequence-to-sequence task like ours needs: the model predicts the next word at every position in the sentence, and each prediction is compared with the actual next token stored in 𝑦 when computing the loss. Backpropagation runs through all time steps either way; `return_sequences=True` only determines whether all time-step outputs appear in the output tensor.
- after processing all tokens, the LSTM outputs a tensor of shape `[batch_size, seq_length, N_UNITS] = [32, 200, 128]`, where each position in the sequence now has a 128-dimensional vector representing the hidden state at that position.

   <img src="./images/lstm.png" width="600" height="400">

4. **Dense Layer**: The Dense layer receives the tensor of shape `[32, 200, 128]` from the LSTM layer and calculates probabilities for each word in the vocabulary by:
- multiplying each 128-dimensional hidden state vector by a weight matrix of shape `[128, VOCAB_SIZE] = [128, 10000]`, adding a bias vector of shape `[10000,]`, and applying a softmax.
- this results in an output of shape `[32, 200, 10000]`, where each 10,000-dimensional vector is a probability distribution over the vocabulary for the next word, i.e. how likely each word is to follow the sequence up to that position.
The Dense layer's main role is to map each LSTM output to a vocabulary-sized probability distribution. Its own parameters are only this weight matrix and bias; the Embedding and LSTM weights belong to their own layers, but all three are trained together because the loss is computed on the Dense output.
The weights are initialized at model creation and adjusted during training by backpropagating the loss. Each batch provides feedback that aligns the predictions with the true next tokens (i.e. the data saved in `y`).
  
5. **Learning Process**: During training, the Dense layer's output probabilities for each word are compared to the actual next words in the sequence, forming a "loss" that tells the model how far its predictions were from reality. This difference is then backpropagated through the model to adjust the weights of all trainable layers (Embedding, LSTM and Dense). These adjusted weights allow the model to improve its predictions for the next word over time, capturing patterns in word sequences.
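To make step 3 concrete, here is a NumPy sketch of an LSTM forward pass over one sequence, using the standard gate equations (input, forget, candidate, output). It is a simplified textbook formulation for intuition: Keras's `LSTM` layer (linked in the references) computes the same kind of gated update but differs in implementation details such as batching, weight initialisers and a cuDNN fast path, and the random weights here are placeholders. With `return_sequences=True` it returns every ℎ𝑡 stacked into `[seq_length, N_UNITS]`; with `False`, only the last one.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x_seq, W, U, b, return_sequences=True):
    """Run one sequence through an LSTM.

    x_seq: [seq_length, input_dim]; W: [input_dim, 4*units];
    U: [units, 4*units]; b: [4*units]
    """
    units = U.shape[0]
    h = np.zeros(units)  # hidden state h_t
    c = np.zeros(units)  # cell state c_t
    hidden_states = []
    for x_t in x_seq:                        # one time step per token embedding
        z = x_t @ W + h @ U + b              # all four gate pre-activations at once
        i = sigmoid(z[:units])               # input gate
        f = sigmoid(z[units:2 * units])      # forget gate
        g = np.tanh(z[2 * units:3 * units])  # candidate cell state
        o = sigmoid(z[3 * units:])           # output gate
        c = f * c + i * g                    # update long-term memory
        h = o * np.tanh(c)                   # new hidden state
        hidden_states.append(h)
    return np.stack(hidden_states) if return_sequences else h

# Sizes from the notebook: 200 time steps, EMBEDDING_DIM=100, N_UNITS=128
rng = np.random.default_rng(0)
seq_length, embedding_dim, n_units = 200, 100, 128
x_seq = rng.normal(size=(seq_length, embedding_dim))
W = rng.normal(scale=0.1, size=(embedding_dim, 4 * n_units))
U = rng.normal(scale=0.1, size=(n_units, 4 * n_units))
b = np.zeros(4 * n_units)

all_h = lstm_forward(x_seq, W, U, b, return_sequences=True)
last_h = lstm_forward(x_seq, W, U, b, return_sequences=False)
print(all_h.shape, last_h.shape)       # (200, 128) (128,)
print(np.allclose(all_h[-1], last_h))  # True
```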

In [16]:
lstm.summary()

In [17]:
LOAD_MODEL=False

if LOAD_MODEL:
    lstm=models.load_model('./models/lstm/lstm.keras', compile=False)

## TRAIN LSTM

### Configure Loss

In [18]:
loss_fn=losses.SparseCategoricalCrossentropy()
lstm.compile('adam',loss_fn)
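`SparseCategoricalCrossentropy` takes integer targets (the token IDs in `y`) and averages, over all positions, the negative log of the probability the model assigned to the true next token. A NumPy sketch of the Dense + softmax step followed by that loss, assuming an all-zero (untrained) weight matrix: uniform predictions give a loss of ln(10000) ≈ 9.21, close to the 9.2076 reported at step 2/629 of epoch 1 in the training log below. Since the Embedding layer does not set `mask_zero`, padding positions (target 0) also count towards this average in the notebook's setup.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    # Probability assigned to the true next token at every (batch, position)
    p_true = np.take_along_axis(probs, y_true[..., None], axis=-1)[..., 0]
    # Mean negative log-likelihood over all positions
    return -np.mean(np.log(np.clip(p_true, eps, 1.0)))

rng = np.random.default_rng(0)
batch_size, seq_length, n_units, vocab_size = 2, 5, 128, 10000

h = rng.normal(size=(batch_size, seq_length, n_units))  # stand-in LSTM outputs
W = np.zeros((n_units, vocab_size))                     # untrained Dense kernel (assumption)
b = np.zeros(vocab_size)                                # Dense bias

probs = softmax(h @ W + b)  # [2, 5, 10000], each row sums to 1
y = rng.integers(0, vocab_size, size=(batch_size, seq_length))  # random "next tokens"

loss = sparse_categorical_crossentropy(y, probs)
print(probs.shape, round(float(loss), 4), round(float(np.log(vocab_size)), 4))
# (2, 5, 10000) 9.2103 9.2103
```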

### Text Generator

In [19]:
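class TextGenerator(callbacks.Callback):
    '''Keras callback that generates sample text with the model at the end of every epoch.'''
    def __init__(self,index_to_word):
        super().__init__()
        self.index_to_word=index_to_word
        # Reverse lookup: word -> token index
        self.word_to_index={
            word: index for index,word in enumerate(index_to_word)
        }

    def sample_from(self, probs, temperature):
        # Rescale the distribution with the temperature (p ** (1/T)) and renormalise so it sums to 1
        probs=probs ** (1 / temperature)
        probs=probs / np.sum(probs)
        # Sample one token index from the rescaled distribution
        return np.random.choice(len(probs),p=probs), probs

    def generate(self, start_prompt, max_tokens, temperature):
        # Map prompt words to token indices; unknown words become 1 ([UNK])
        start_tokens=[
            self.word_to_index.get(x, 1) for x in start_prompt.split()
        ]
        sample_token=None
        info=[]

        # Keep sampling until max_tokens is reached or the padding/stop token 0 is generated
        while len(start_tokens) < max_tokens and sample_token !=0:
            x=np.array([start_tokens])
            y=self.model.predict(x,verbose=0)
            # y[0][-1] is the predicted distribution for the token after the last one so far
            sample_token, probs= self.sample_from(y[0][-1], temperature)
            info.append({'prompt':start_prompt, 'word_probs': probs})
            start_tokens.append(sample_token)
            start_prompt=start_prompt + ' ' + self.index_to_word[sample_token]

        print(f'\ngenerated text:\n{start_prompt}\n')
        return info

    def on_epoch_end(self, epoch, logs=None):
        self.generate('recipe for', max_tokens=100, temperature=1.0)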

### Model

In [20]:
log_dir='./logs/fit/lstm/recipe/' + datetime.datetime.now().strftime('%Y%m%d-%H%M%S')

model_checkpoint_callback=callbacks.ModelCheckpoint(
    filepath='./checkpoints/lstm.weights.h5',
    save_weights_only=True,
    save_freq='epoch',
    verbose=0
)

tensorboard_callback=callbacks.TensorBoard(
    log_dir=log_dir,
    histogram_freq=1,
    write_graph=True,
    write_images=False,
    update_freq='epoch',
    profile_batch=2,
    embeddings_freq=1
)


In [21]:
text_generator=TextGenerator(vocab)

### Fit

In [22]:
%%time

EPOCHS=25

lstm.fit(
    train_ds,
    epochs=EPOCHS,
    callbacks=[model_checkpoint_callback,
              tensorboard_callback,
              text_generator]
)


Epoch 1/25




  2/629 ━━━━━━━━━━━━━━━━━━━━ 1:25 136ms/step - loss: 9.2076


[1m629/629[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 124ms/step - loss: 4.9996
generated text:
recipe for pea moroccan blackberry 

[1m629/629[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m80s[0m 126ms/step - loss: 4.9982
Epoch 2/25
[1m629/629[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 356ms/step - loss: 3.0475
generated text:
recipe for coat tart - kale wontons with ginger | preheat oven over soaking surface . . bring in a processor or until small heavy broiler ) partially rack until finely stock is crosswise thickens still oil , about 12 minutes . bake occasionally in same 425° and then pour into dusted tortillas . with vanilla in medium bowl . cover rack and generously lay racks additions . strudels cooking onions , and bring to small large bowl . reduce low heat until guinness ahead , being oils , adding chile pancetta to medium - high cucumber , 6 garnished

[1m629/629[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m230s[0m 365ms/step - loss: 3.0472
Ep


**INTERIM CONCLUSION**

<img src="./images/lstm-loss.png" width="300" height="400">

The graph shows that the model's training loss decreases consistently, which indicates that the LSTM model is learning. After about 20 epochs the loss flattens out at around 1.5, suggesting that training has converged or reached a plateau.

In [26]:
lstm.save('./models/lstm/lstm.keras')

## TEXT GENERATION

In [29]:
info=text_generator.generate(
    'recepie for roasted vegetables | chop 1/', max_tokens=10, temperature=1.0
)


generated text:
recepie for roasted vegetables | chop 1/ pea purée in



In [33]:
info=text_generator.generate(
    'recepie for roasted vegetables | chop 1 /', max_tokens=100, temperature=0.2
)


generated text:
recepie for roasted vegetables | chop 1 / 2 teaspoon salt and 1 / 4 teaspoon pepper , and 1 / 4 teaspoon salt , and 1 / 4 teaspoon salt , and 1 / 2 teaspoon of salt and 1 / 2 teaspoon pepper in a food processor until it is smooth . add the butter and stir until it is smooth . add the remaining 1 / 2 cup of the remaining 1 tablespoon of the reserved juice , the lemon juice , and the salt and simmer the mixture , stirring , for 1 minute . stir
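These two runs differ in more than the temperature. `TextGenerator.generate` splits the prompt on spaces and looks each word up in the vocabulary, so `1/` in the first prompt is a single unknown token (mapped to 1, `[UNK]`), whereas `1 /` in the second matches the way the training text was padded. Misspellings such as `recepie` are also most likely mapped to `[UNK]`. A minimal sketch of a hypothetical `normalize_prompt` helper that applies the same preprocessing as the training data (padding punctuation, then lowercasing like the `TextVectorization` layer) before the prompt is passed to `generate`:

```python
import re
import string

def normalize_prompt(prompt):
    """Hypothetical helper: preprocess a prompt the same way as the training text."""
    # Pad punctuation with spaces (as pad_punctuation does) and collapse repeated spaces
    padded = re.sub(f'([{re.escape(string.punctuation)}])', r' \1 ', prompt)
    padded = re.sub(' +', ' ', padded)
    # Lowercase (TextVectorization used standardize='lower') and drop edge spaces
    return padded.lower().strip()

print(normalize_prompt('Recipe for Roasted Vegetables | chop 1/'))
# recipe for roasted vegetables | chop 1 /
```

In the notebook this would be used as `text_generator.generate(normalize_prompt(...), max_tokens=100, temperature=0.2)`.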



In [36]:
info=text_generator.generate(
    'recipe for chockolate ice cream |', max_tokens=7, temperature=1.0
)


generated text:
recipe for chockolate ice cream | in



In [37]:
print_probs(info,vocab)


PROMPT: recipe for chockolate ice cream |
combine:   	19.51%
in:   	12.01%
stir:   	7.83%
1:   	6.83%
bring:   	6.52%
----------



In [38]:
info=text_generator.generate('recipe for chockolate ice cream |', max_tokens=7, temperature=0.2)


generated text:
recipe for chockolate ice cream | combine



In [39]:
print_probs(info,vocab)


PROMPT: recipe for chockolate ice cream |
combine:   	90.01%
in:   	7.94%
stir:   	0.94%
1:   	0.47%
bring:   	0.37%
----------
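The two `print_probs` outputs are consistent with the rule in `sample_from`, which reweights the distribution as p^(1/T) and renormalises. Renormalising does not change ratios, so lowering T from 1.0 to 0.2 raises every probability ratio to the 5th power: (19.51 / 12.01)^5 ≈ 11.3 for `combine` versus `in`, and the T=0.2 output gives 90.01 / 7.94 ≈ 11.3. The sketch below applies the same rule to the five probabilities printed at T=1.0 only; since they are a slice of the full 10,000-word distribution, the absolute values differ from the notebook's, but the ratios do not.

```python
import numpy as np

def apply_temperature(probs, temperature):
    # Same reweighting as TextGenerator.sample_from: p ** (1/T), then renormalise
    probs = np.asarray(probs, dtype=np.float64) ** (1.0 / temperature)
    return probs / probs.sum()

# Top-5 probabilities printed at T=1.0: combine, in, stir, 1, bring
p = np.array([0.1951, 0.1201, 0.0783, 0.0683, 0.0652])

for t in (1.0, 0.5, 0.2):
    q = apply_temperature(p, t)
    print(f'T={t}: {np.round(q, 3)}  combine/in ratio = {q[0] / q[1]:.2f}')
```

Low temperatures therefore sharpen the distribution towards the most likely word (closer to greedy decoding), while T=1.0 samples from the model's distribution unchanged.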



**References**

1. [Generative Deep Learning, 2nd Edition](https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/): David Foster's book, which inspired this notebook.
2. [David Foster](https://github.com/davidADSP): GitHub page.
3. [David Foster (Keynote) - Generative Deep Learning -Key To Unlocking Artificial General Intelligence?](https://www.youtube.com/watch?v=rHLf78CmNmQ): David Foster's YouTube talk on some of the key concepts covered in his book.
4. [Understanding LSTM](https://colah.github.io/posts/2015-08-Understanding-LSTMs/): Christopher Olah's blog post explaining LSTM networks.
5. [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/): Andrej Karpathy's blog post demonstrating some remarkable capabilities of RNNs.
6. [Long Short-Term Memory (LSTM), Clearly Explained](https://www.youtube.com/watch?v=YCzL96nL7j0): Josh Starmer's StatQuest YouTube video, with a clear explanation and visualisation of the key LSTM concepts.
7. [Keras LSTM source code](https://github.com/keras-team/keras/blob/master/keras/src/layers/rnn/lstm.py): Link to Keras LSTM layer source code.