<img src="img.png" width="700">

In this activity we trace the evolution of Natural Language Processing:

Non-neural-network methods -> NN methods/models -> LLMs

In [None]:
sentences = [
    "I love machine learning",
    "I love deep learning",
    "I hate bugs in code",
    "Machine learning is powerful"
]

# 1. Non-NN Methods/Traditional NLP

 - These methods rely entirely on
    - frequency
    - probability

## 1.0 Text Preprocessing (Foundation layer of NLP)
 - Tokenization
 - Stopwords
 - Stemming
 - Lemmatization
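
The four preprocessing steps can be sketched in plain Python. This is a toy illustration, not the NLTK implementation: the stopword list, the suffix rules in `naive_stem`, and the `LEMMAS` lookup table are all made-up assumptions chosen to keep the example self-contained.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard list)
STOPWORDS = {"i", "am", "with", "this", "the", "is", "in"}
# Toy lemma lookup; real lemmatizers rely on a dictionary and part-of-speech tags
LEMMAS = {"bugs": "bug", "better": "good", "was": "be"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stopwords(tokens):
    """Drop very common words that carry little content."""
    return [t for t in tokens if t not in STOPWORDS]

def naive_stem(token):
    """Strip a few common suffixes; crude compared with a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def lemmatize(token):
    """Map a token to its dictionary form when the toy table knows it."""
    return LEMMAS.get(token, token)

tokens = tokenize("I hate bugs in code")
content = remove_stopwords(tokens)
stems = [naive_stem(t) for t in content]
lemmas = [lemmatize(t) for t in content]

print(tokens)
print(content)
print(stems)
print(lemmas)
```

Stemming chops suffixes by rule, so it can produce non-words, while lemmatization looks up a valid base form.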

## 1.1 CountVectorizer (Bag of Words)

 - What It Does
    - Counts word frequency.
    - Ignores word order & context.

 - Significance
    - First practical NLP representation
    - Simple, fast
    - Works well for basic classification
 - Cons
    - No word order 
    - No meaning
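
The "no word order" limitation is easy to see with a hand-rolled bag of words. The sketch below uses a hypothetical `bag_of_words` helper (not part of scikit-learn) and two made-up sentences that contain the same words in a different order.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

a = "dog bites man"
b = "man bites dog"

# Shared vocabulary built from both sentences
vocabulary = sorted(set((a + " " + b).split()))

vec_a = bag_of_words(a, vocabulary)
vec_b = bag_of_words(b, vocabulary)

print(vocabulary)
print(vec_a, vec_b)
print("Identical vectors:", vec_a == vec_b)
```

Both sentences map to the same count vector even though they mean opposite things.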



In [9]:
from sklearn.feature_extraction.text import CountVectorizer

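# Use a separate variable so the `sentences` corpus defined above stays available to later cells
reviews = [
    "I am happy with this product",
    "I am unhappy with this service",
    "I am satisfied with this purchase",
    "I am disappointed with this experience"
]

# Note: the default tokenizer drops single-character tokens such as "I"
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)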

print("Vocabulary:", vectorizer.get_feature_names_out())
print("Bag of Words Matrix:\n", X.toarray())

Vocabulary: ['am' 'disappointed' 'experience' 'happy' 'product' 'purchase' 'satisfied'
 'service' 'this' 'unhappy' 'with']
Bag of Words Matrix:
 [[1 0 0 1 1 0 0 0 1 0 1]
 [1 0 0 0 0 0 0 1 1 1 1]
 [1 0 0 0 0 1 1 0 1 0 1]
 [1 1 1 0 0 0 0 0 1 0 1]]


## 1.2 TF-IDF (Term Frequency – Inverse Document Frequency)
 - What It Does
    - Weights words based on importance:
        - Common words → lower weight
        - Rare but important words → higher weight

 - Significance
    - Improves on BoW
    - Reduces dominance of frequent words
 - Cons 
    - Still no context understanding
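
To make the weighting concrete, here is a minimal by-hand sketch of the computation. It assumes scikit-learn's default settings for `TfidfVectorizer` (smoothed IDF, `idf = ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation of each row) and a regex that approximates its default tokenizer. The `tokenize` and `tfidf_row` helpers are illustrative names, not library functions.

```python
import math
import re

corpus = [
    "I love machine learning",
    "I love deep learning",
    "I hate bugs in code",
    "Machine learning is powerful",
]

def tokenize(text):
    """Lowercase and keep tokens of two or more word characters."""
    return re.findall(r"\b\w\w+\b", text.lower())

docs = [tokenize(doc) for doc in corpus]
n_docs = len(docs)
vocab = sorted({tok for doc in docs for tok in doc})

# Document frequency: in how many documents does each term appear?
df = {term: sum(term in doc for doc in docs) for term in vocab}

# Smoothed inverse document frequency: rarer terms get a larger weight
idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}

def tfidf_row(doc):
    """Term count times IDF, then scale the row to unit L2 length."""
    raw = [doc.count(term) * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

row = dict(zip(vocab, tfidf_row(docs[0])))
print({term: round(weight, 4) for term, weight in row.items() if weight > 0})
```

In the first sentence, "learning" appears in three of the four documents, so it ends up with a lower weight than "love" and "machine", which appear in two.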

In [2]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(sentences)

print("TF-IDF Matrix:\n", X_tfidf.toarray())

TF-IDF Matrix:
 [[0.         0.         0.         0.         0.         0.
  0.49681612 0.61366674 0.61366674 0.        ]
 [0.         0.         0.70203482 0.         0.         0.
  0.44809973 0.55349232 0.         0.        ]
 [0.5        0.5        0.         0.5        0.5        0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.57457953
  0.36674667 0.         0.4530051  0.57457953]]


## 1.* Other methods
 - N-grams
 - Statistical & probabilistic models
 - Topic modelling
 - Matrix methods
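
As a quick illustration of the first item, n-grams are just sliding windows of `n` consecutive tokens, which bring back a little of the word order that bag of words discards. The `ngrams` helper below is a hypothetical name used only for this sketch.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "i love machine learning".split()

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)
print(trigrams)
```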

# 2. Word Representation Learning - Word Embeddings (Word2Vec / GloVe)
 * Word2Vec - Shallow NN (not a DNN)
 * GloVe - Matrix factorization of a word co-occurrence matrix


 - What It Does
    - Represents words as dense vectors.
    - Captures semantic similarity.
    - Example: king - man + woman ≈ queen

 - Significance
    - Captures meaning/semantic similarity
    - Words close in vector space = similar meaning
 - Cons
    - Static word meaning (one vector per word, regardless of context)
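
"Close in vector space" is usually measured with cosine similarity. The sketch below uses hand-made 3-dimensional vectors (purely hypothetical values, not taken from any trained model) to show how the analogy arithmetic and the nearest-neighbour lookup fit together.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings; dimensions loosely read as (royalty, male, female)
toy = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

# king - man + woman, computed component by component
analogy = [k - m + w for k, m, w in zip(toy["king"], toy["man"], toy["woman"])]

# Rank every word by its similarity to the analogy vector
scores = {word: cosine(analogy, vec) for word, vec in toy.items()}
best = max(scores, key=scores.get)

print(best)
```

Real embeddings have hundreds of dimensions with no such readable meaning, but the lookup works the same way.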

In [3]:
!pip install gensim



In [5]:
import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/gourasundarmohanty/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [None]:
# Word2Vec using gensim
# Toy corpus: with only four sentences the learned similarities are essentially noise
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize

tokenized_sentences = [word_tokenize(sentence.lower()) for sentence in sentences]

model = Word2Vec(sentences=tokenized_sentences, vector_size=50, window=2, min_count=1)

print("Vector for 'learning':\n", model.wv['learning'])
print("Most similar to 'machine':\n", model.wv.most_similar('machine'))

Vector for 'learning':
 [-1.0724545e-03  4.7286271e-04  1.0206699e-02  1.8018546e-02
 -1.8605899e-02 -1.4233618e-02  1.2917745e-02  1.7945977e-02
 -1.0030856e-02 -7.5267432e-03  1.4761009e-02 -3.0669428e-03
 -9.0732267e-03  1.3108104e-02 -9.7203208e-03 -3.6320353e-03
  5.7531595e-03  1.9837476e-03 -1.6570430e-02 -1.8897636e-02
  1.4623532e-02  1.0140524e-02  1.3515387e-02  1.5257311e-03
  1.2701781e-02 -6.8107317e-03 -1.8928028e-03  1.1537147e-02
 -1.5043275e-02 -7.8722071e-03 -1.5023164e-02 -1.8600845e-03
  1.9076237e-02 -1.4638334e-02 -4.6675373e-03 -3.8754821e-03
  1.6154874e-02 -1.1861792e-02  9.0324880e-05 -9.5074680e-03
 -1.9207101e-02  1.0014586e-02 -1.7519170e-02 -8.7836506e-03
 -7.0199967e-05 -5.9236289e-04 -1.5322480e-02  1.9229487e-02
  9.9641159e-03  1.8466286e-02]
Most similar to 'machine':
 [('powerful', 0.16563552618026733), ('bugs', 0.13940520584583282), ('learning', 0.1267007291316986), ('deep', 0.1211962029337883), ('code', 0.08872982859611511), ('i', 0.01107198838144


In [24]:
# king - man + woman ≈ queen
# Load pretrained model (this may take time first time)
import gensim.downloader as api
model = api.load("word2vec-google-news-300")



In [25]:
# Perform vector arithmetic 
result = model.most_similar(
    positive=["king", "woman"],
    negative=["man"],
    topn=5
)

print(result)


[('queen', 0.7118192911148071), ('monarch', 0.6189674735069275), ('princess', 0.5902431011199951), ('crown_prince', 0.5499460697174072), ('prince', 0.5377321243286133)]


In [26]:
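# Manually compute the analogy vector and look up its nearest neighbours.
# Unlike most_similar(), similar_by_vector() does not exclude the query words,
# so 'king' itself ranks first.
vector = model["king"] - model["man"] + model["woman"]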

similarities = model.similar_by_vector(vector, topn=5)
print(similarities)

[('king', 0.8449392318725586), ('queen', 0.7300516366958618), ('monarch', 0.6454660296440125), ('princess', 0.6156251430511475), ('crown_prince', 0.5818676948547363)]


# 3. NN Methods - Sequence learning

## 3.1 RNN (Recurrent Neural Network)
 - What It Does
    - Processes sequences word-by-word.
    - Remembers previous context (short-term memory).
 - Significance
    - Understands word order
    - Better for sequences
    - Struggles with long dependencies (vanishing gradient)

 - Applications 
   - Time-series prediction
   - NLP
   - Speech recognition 
   - Image and video processing
 - Cons 
   - Vanishing gradient/exploding gradient 
   - Poor long term memory
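
The word-by-word processing can be sketched as a single recurrence, `h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)`. The NumPy code below is a minimal forward pass only, with random untrained weights and arbitrary sizes chosen for illustration. It uses the textbook `tanh` activation rather than the `relu` used in the Keras cell that follows.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3  # arbitrary sizes for the sketch

# Random, untrained parameters
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))  # hidden -> hidden
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """Combine the current input with the previous hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Three one-hot inputs, processed one step at a time
sequence = np.eye(input_size)[[0, 2, 1]]
h = np.zeros(hidden_size)
for x_t in sequence:
    h = rnn_step(x_t, h)  # h carries information from earlier steps

print(h.shape)
```

Because the same `W_hh` is applied at every step, gradients flowing back through many steps are multiplied repeatedly, which is the source of the vanishing/exploding gradient problem.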


In [2]:
!pip install tensorflow


In [3]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

In [4]:
# Define the input text and collect its unique characters, which we'll encode for the model.
text = "Welcome to IITM Pravartak program, focused on learning Agentic AI, Generative AI systems and its practical implications"
chars = sorted(list(set(text)))
char_to_index = {char: i for i, char in enumerate(chars)}
index_to_char = {i: char for i, char in enumerate(chars)}

In [5]:
# To train the RNN, we need sequences of fixed length (seq_length) and the character following each sequence as the label.
seq_length = 3
sequences = []
labels = []

for i in range(len(text) - seq_length):
    seq = text[i:i + seq_length]
    label = text[i + seq_length]
    sequences.append([char_to_index[char] for char in seq])
    labels.append(char_to_index[label])

X = np.array(sequences)
y = np.array(labels)

In [6]:
# Converting Sequences and Labels to One-Hot Encoding
X_one_hot = tf.one_hot(X, len(chars))
y_one_hot = tf.one_hot(y, len(chars))

In [7]:
# Create a simple RNN model with a hidden layer of 50 units and a Dense output layer with softmax activation.
# An explicit Input layer replaces the input_shape argument, which recent Keras versions warn about.
model = Sequential([
    tf.keras.Input(shape=(seq_length, len(chars))),
    SimpleRNN(50, activation='relu'),
    Dense(len(chars), activation='softmax'),
])


In [8]:
# Compile the model using the categorical_crossentropy loss and train it for 100 epochs
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_one_hot, y_one_hot, epochs=100)

Epoch 1/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.0172 - loss: 3.3457
Epoch 2/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.0345 - loss: 3.3090
Epoch 3/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.1121 - loss: 3.2777
Epoch 4/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.1207 - loss: 3.2457
Epoch 5/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.1724 - loss: 3.2133
Epoch 6/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.1983 - loss: 3.1806
Epoch 7/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.2155 - loss: 3.1445
Epoch 8/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.2155 - loss: 3.1068
Epoch 9/100
4/4 ━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x1463f1550>

In [9]:
# Testing - After training we use a starting sequence to generate new text character by character
start_seq = "focused on"
generated_text = start_seq

for i in range(50):
    x = np.array([[char_to_index[char] for char in generated_text[-seq_length:]]])
    x_one_hot = tf.one_hot(x, len(chars))
    prediction = model.predict(x_one_hot, verbose=0)  # verbose=0 suppresses the per-call progress bar
    next_index = np.argmax(prediction)
    next_char = index_to_char[next_index]
    generated_text += next_char

print("Generated Text:")
print(generated_text)

## 3.2 LSTM (Long Short-Term Memory)
 - What It Does
    - Improved RNN with memory gates:
        - Forget gate
        - Input gate
        - Output gate

 - Significance
    - Handles long-term dependencies better
    - Used in translation, speech recognition (pre-transformer era)

 - Cons 
   - Slow inference
   - Limited memory
   - Context fades over very long sequences
   - Hard to scale
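
The three gates can be sketched as one step of the standard LSTM cell equations. This is a forward pass only, with random untrained weights and arbitrary sizes. The dictionary layout for `W` and `b` is an illustrative choice, not how Keras stores its parameters.

```python
import numpy as np

def sigmoid(z):
    """Squash values into (0, 1) so they can act as soft on/off gates."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3  # arbitrary sizes for the sketch
concat_size = hidden_size + input_size

# One weight matrix and bias per gate, plus the candidate values "g"
W = {name: rng.normal(scale=0.5, size=(hidden_size, concat_size))
     for name in ("f", "i", "o", "g")}
b = {name: np.zeros(hidden_size) for name in W}

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])   # forget gate: how much old memory to keep
    i = sigmoid(W["i"] @ z + b["i"])   # input gate: how much new information to write
    o = sigmoid(W["o"] @ z + b["o"])   # output gate: how much memory to expose
    g = np.tanh(W["g"] @ z + b["g"])   # candidate values to write
    c = f * c_prev + i * g             # updated cell state (long-term memory)
    h = o * np.tanh(c)                 # updated hidden state (output)
    return h, c

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in np.eye(input_size)[[0, 2, 1]]:
    h, c = lstm_step(x_t, h, c)

print(h.shape, c.shape)
```

The cell state `c` is updated additively rather than being squashed through an activation at every step, which is what lets information survive over longer spans than in a plain RNN.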

In [None]:
pip install pandas scikit-learn matplotlib


In [14]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

In [17]:
data = pd.read_csv('monthly_milk_production.csv')
data['Date'] = pd.to_datetime(data['Date'])
data.set_index('Date', inplace=True)
production = data['Production'].astype(float).values.reshape(-1, 1)

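# Scale to [0, 1]. Note: fitting the scaler on the full series lets test-period values
# influence the scaling; for a stricter evaluation, fit it on the training portion only.
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(production)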

In [18]:
# Creating Sequences and Train-Test Split
window_size = 12
X = []
y = []
target_dates = data.index[window_size:]

for i in range(window_size, len(scaled_data)):
    X.append(scaled_data[i - window_size:i, 0])
    y.append(scaled_data[i, 0])

X = np.array(X)
y = np.array(y)

X_train, X_test, y_train, y_test, dates_train, dates_test = train_test_split(
    X, y, target_dates, test_size=0.2, shuffle=False
)

X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))

In [19]:
# Build LSTM model (two stacked LSTM layers with dropout).
# An explicit Input layer replaces the input_shape argument, which recent Keras versions warn about.
from tensorflow.keras import Input

model = Sequential([
    Input(shape=(X_train.shape[1], 1)),
    LSTM(units=128, return_sequences=True),
    Dropout(0.2),
    LSTM(units=128),
    Dropout(0.2),
    Dense(1),
])

model.compile(optimizer='adam', loss='mean_squared_error')


In [20]:
# Training and Evaluation 
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.1)

predictions = model.predict(X_test)
predictions = scaler.inverse_transform(predictions).flatten()
y_test = scaler.inverse_transform(y_test.reshape(-1,1)).flatten()

rmse = np.sqrt(np.mean((y_test - predictions)**2))
print(f'RMSE: {rmse:.2f}')

Epoch 1/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 2s 99ms/step - loss: 0.1205 - val_loss: 0.0268
Epoch 2/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step - loss: 0.0509 - val_loss: 0.0369
Epoch 3/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 32ms/step - loss: 0.0281 - val_loss: 0.0708
Epoch 4/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 27ms/step - loss: 0.0424 - val_loss: 0.0692
Epoch 5/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 25ms/step - loss: 0.0318 - val_loss: 0.0284
Epoch 6/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 25ms/step - loss: 0.0245 - val_loss: 0.0290
Epoch 7/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 34ms/step - loss: 0.0298 - val_loss: 0.0259
Epoch 8/100
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step - loss: 0.0262 - val_loss: 0.0390
Epoch 9/100
4/4 ━━━━━━━━━━━━━━━━━━━━

<img src="img2.png" width="500">

## 3.3 A tiny NN language model 

- What is a Language Model? 
    - A language model predicts: P(next_word∣previous_words)
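
Before bringing in a neural network, the same probability can be estimated by simple counting. The sketch below is a count-based bigram model that conditions on only one previous word. The `next_word_probs` helper is a hypothetical name used for illustration.

```python
from collections import Counter, defaultdict

text = "i love machine learning and i love deep learning"
words = text.split()

# For every word, count which words follow it
follow_counts = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    follow_counts[prev][nxt] += 1

def next_word_probs(prev):
    """Estimate P(next_word | prev) as a relative frequency."""
    counts = follow_counts[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

probs_after_i = next_word_probs("i")
probs_after_love = next_word_probs("love")

print(probs_after_i)
print(probs_after_love)
```

A neural language model learns these conditional probabilities through trainable parameters instead of a lookup table, which lets it generalise to contexts it has not seen verbatim.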

In [28]:
!pip install torch


In [29]:
import torch
import torch.nn as nn
import torch.optim as optim

# Sample dataset
text = "i love machine learning and i love deep learning"
words = text.split()

# Vocabulary (sorted so the word-to-index mapping is the same on every run)
vocab = sorted(set(words))
word_to_ix = {word: i for i, word in enumerate(vocab)}
ix_to_word = {i: word for word, i in word_to_ix.items()}

# Prepare training data (input -> next word)
inputs = []
targets = []

for i in range(len(words)-1):
    inputs.append(word_to_ix[words[i]])
    targets.append(word_to_ix[words[i+1]])

inputs = torch.tensor(inputs)
targets = torch.tensor(targets)

# Model: embedding + linear layer; it conditions on a single previous word (a neural bigram model)
class TinyLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.linear = nn.Linear(embed_size, vocab_size)

    def forward(self, x):
        x = self.embedding(x)
        x = self.linear(x)
        return x

model = TinyLanguageModel(len(vocab), 10)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
for epoch in range(200):
    optimizer.zero_grad()
    output = model(inputs)
    loss = criterion(output, targets)
    loss.backward()
    optimizer.step()

# Test prediction
with torch.no_grad():
    test_word = "i"
    test_input = torch.tensor([word_to_ix[test_word]])
    output = model(test_input)
    predicted = torch.argmax(output).item()

print("After 'i' predicted word:", ix_to_word[predicted])

After 'i' predicted word: love


In [30]:
# Test prediction
with torch.no_grad():
    test_word = "deep"
    test_input = torch.tensor([word_to_ix[test_word]])
    output = model(test_input)
    predicted = torch.argmax(output).item()

print("After 'deep' predicted word:", ix_to_word[predicted])

After 'deep' predicted word: learning
