## In this Colab we:
1. import a list of names in Latin script (e.g. emma, justin)
2. prepare & encode the data (context of **WINDOW** = 3 letters -> next letter)
    - 'emma' : [e  m  m] -> [a]
    - tensor : [5 13 13] -> [1]
3. initialize a trainable embedding matrix C with C.shape = [vocab_size (27), **EMB_SIZE** (2)]
    - map each of the 27 letter encodings to its embedding, e.g. [0] -> [0.1488, 2.1415]
4. initialize the trainable architecture:
    - h = tanh(emb @ hidden_W + hidden_b)
    - logits = h @ out_W + out_b
    - F.cross_entropy(logits, Y_true)
5. train with mini-batch SGD for 10 * **N_EPOCHS** steps

## Result:
- 1
- 2

In [1]:
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

In [21]:
WINDOW = 3          # how many previous elements to consider

# architecture
EMB_SIZE = 2        # dimension of the embedding space
HIDDEN_SIZE = 100   # size of hidden layer
# training
LR = 0.1
N_EPOCHS = 100
BATCH_SIZE = 64

In [16]:
words = open('../docs/names.txt', 'r').read().splitlines()
print(f'{len(words)=}')
words[:7]

len(words)=32033


['emma', 'olivia', 'ava', 'isabella', 'sophia', 'charlotte', 'mia']

In [17]:
# build vocab + lookup tables (char -> index and index -> char)
chars = sorted(list(set(''.join(words))))
stoi = {s:i+1 for i,s in enumerate(chars)}
stoi['.'] = 0
itos = {i:s for s,i in stoi.items()}

vocab_size = len(chars + ['.'])
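A quick way to sanity-check the two lookup tables is an encode/decode round trip. The sketch below is a minimal illustration on a three-word toy list (an assumption, standing in for `names.txt`); `encode` and `decode` are hypothetical helper names that do not appear in the notebook. Because the toy list contains only five distinct letters, the indices differ from the full-alphabet ones used elsewhere (where 'e' is 5 and 'm' is 13).

```python
# Toy stand-in for the real names list (assumption: the notebook loads names.txt).
words = ['emma', 'ava', 'mia']

# Same construction as in the cell above: letters get 1..n, '.' gets 0.
chars = sorted(set(''.join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0
itos = {i: s for s, i in stoi.items()}


def encode(word):
    """Hypothetical helper: map each character to its integer index."""
    return [stoi[ch] for ch in word]


def decode(ids):
    """Hypothetical helper: map integer indices back to characters."""
    return ''.join(itos[i] for i in ids)


encoded = encode('emma.')
decoded = decode(encoded)
print(encoded, '->', decoded)
```

If the round trip does not return the original string, the two dictionaries are out of sync.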

In [18]:
# BUILD DATA given WINDOW size

# example 'emma', WINDOW == 3
# context -> output  |  torch.tensor:
#  [...]  -> 'e'     |  [ 0, 0, 0] -> [5]
#  [..e]  -> 'm'     |  [ 0, 0, 5] -> [13]
#  [.em]  -> 'm'     |  [ 0, 5,13] -> [13]
#  [emm]  -> 'a'     |  [ 5,13,13] -> [1]
#  [mma]  -> '.'     |  [13,13, 1] -> [0]


X, Y = [], []

for w in words:

    context = [0] * WINDOW
    for ch in w + '.' :
        ix = stoi[ch]
        X.append(context)
        Y.append(ix)

        context = context[1:] + [ix] # update context

X = torch.tensor(X)
Y = torch.tensor(Y)

print(f'Example X[0]: \n{X[0]} -> {Y[0]}\n')
print(f'{X.shape=} \n{Y.shape=}')

Example X[0]: 
tensor([0, 0, 0]) -> 5

X.shape=torch.Size([228146, 3]) 
Y.shape=torch.Size([228146])
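The sliding-window construction above can be checked on a single word. The sketch below wraps the same loop in a hypothetical `build_dataset` function (not part of the notebook) and assumes the vocabulary is exactly the 26 lowercase letters plus '.', which is consistent with the vocab_size of 27 stated earlier. Plain Python lists are used instead of tensors so it runs without PyTorch.

```python
import string

# Assumption: the vocabulary is 'a'..'z' (indices 1..26) plus '.' (index 0).
stoi = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase)}
stoi['.'] = 0


def build_dataset(words, window):
    """Hypothetical helper: return (contexts, targets) as lists of ints."""
    X, Y = [], []
    for w in words:
        context = [0] * window            # start with an all-'.' context
        for ch in w + '.':                # '.' marks the end of the word
            ix = stoi[ch]
            X.append(context)
            Y.append(ix)
            context = context[1:] + [ix]  # slide the window by one
    return X, Y


X_demo, Y_demo = build_dataset(['emma'], window=3)
for ctx, target in zip(X_demo, Y_demo):
    print(ctx, '->', target)
```

Each word of length n contributes n + 1 training examples: one per letter plus one for the terminating '.'.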


### Create simple Embeddings & Neural Network

In [19]:
g = torch.Generator().manual_seed(2147483647)  # seeded generator for reproducible init

# BUILD EMBEDDINGS (a lookup table: one row per character)
C = torch.randn((vocab_size, EMB_SIZE), generator=g)                 # untrained

# BUILD HIDDEN LAYER
hidden_W = torch.randn((WINDOW*EMB_SIZE, HIDDEN_SIZE), generator=g)  # untrained
hidden_b = torch.randn(HIDDEN_SIZE, generator=g)                     # untrained

# BUILD OUT LAYER
out_W = torch.randn((HIDDEN_SIZE, vocab_size), generator=g)          # untrained
out_b = torch.randn(vocab_size, generator=g)                         # untrained

parameters = [C, hidden_W, hidden_b, out_W, out_b]
for p in parameters:
    p.requires_grad = True
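Indexing the embedding matrix with integer codes gives the same result as multiplying one-hot vectors by that matrix, which is why the embedding can be read as a lookup table. The sketch below illustrates this on made-up data; `C_demo` and `X_demo` are illustrative names, and the sizes (27, 2, 3) are copied from the constants above. It also shows the reshape that concatenates the embeddings of one context into a single row.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for this illustration
vocab_size, emb_size, window = 27, 2, 3

C_demo = torch.randn(vocab_size, emb_size)        # stand-in embedding matrix
X_demo = torch.tensor([[0, 0, 5], [5, 13, 13]])   # two contexts of 3 indices

# Option 1: direct integer indexing (what the notebook does).
emb_lookup = C_demo[X_demo]                       # shape (2, 3, 2)

# Option 2: one-hot encode, then matrix-multiply.
one_hot = F.one_hot(X_demo, num_classes=vocab_size).float()  # (2, 3, 27)
emb_matmul = one_hot @ C_demo                     # shape (2, 3, 2)

# Concatenate the 3 embeddings of each context into one row.
flat = emb_lookup.view(-1, window * emb_size)     # shape (2, 6)

print(torch.allclose(emb_lookup, emb_matmul), flat.shape)
```

Indexing is preferred in practice because it avoids materialising the one-hot tensor.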

In [26]:
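for epoch in range(10*N_EPOCHS):   # each iteration is one mini-batch step, not a full pass over the data

    # -------- Forward pass ---------------

    # sample a mini-batch of row indices
    ixs = torch.randint(0, X.shape[0], (BATCH_SIZE,), generator=g)

    # embedding lookup
    emb = C[X[ixs]]                          # (N, WINDOW, EMB_SIZE) = (N, 3, 2)
    emb = emb.view(-1, WINDOW*EMB_SIZE)      # concatenate the token embeddings -> (N, 6)

    # hidden layer
    h = torch.tanh(emb @ hidden_W + hidden_b)
    # output layer
    logits = h @ out_W + out_b

    # manual equivalent of F.cross_entropy on this mini-batch:
    # counts = logits.exp()
    # prob = counts / counts.sum(1, keepdim=True)
    # loss = - prob[torch.arange(BATCH_SIZE), Y[ixs]].log().mean()
    loss = F.cross_entropy(logits, Y[ixs])

    # -------- Backward pass --------------
    for p in parameters:
        p.grad = None        # reset gradients from the previous step
    loss.backward()

    # SGD update
    for p in parameters:
        p.data += -LR * p.grad

print(f'{loss.data=}')   # loss of the last mini-batch only, so it is a noisy estimate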

loss.data=tensor(2.5180)


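**prob**:
for every context of size WINDOW, prob is a probability distribution over the vocabulary (for the next element).

The idea is to train the model so that this distribution puts high probability on the true next element Y.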
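To make the link between `prob` and the loss concrete, the sketch below computes the distribution by hand (exponentiate, normalise, pick the probability of the true class, average the negative logs) and compares it with `F.cross_entropy`. The logits and targets are random stand-ins, not outputs of the trained model, and the names `logits_demo` and `targets_demo` are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for this illustration

logits_demo = torch.randn(4, 27)               # 4 contexts, 27 vocabulary entries
targets_demo = torch.tensor([5, 13, 13, 1])    # true next-character indices

# Softmax by hand: each row of prob is a distribution over the vocabulary.
counts = logits_demo.exp()
prob = counts / counts.sum(1, keepdim=True)

# Negative log-likelihood of the true next characters, averaged over the batch.
rows = torch.arange(len(targets_demo))
manual_loss = -prob[rows, targets_demo].log().mean()

# Built-in version, which fuses the same steps in a numerically safer way.
builtin_loss = F.cross_entropy(logits_demo, targets_demo)

print(manual_loss.item(), builtin_loss.item())
```

The two values should agree up to floating-point error; the built-in is the safer choice when logits are large, because the manual `exp()` can overflow.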

In [28]:
sum(p.nelement() for p in [C, hidden_W, hidden_b, out_W, out_b])

3481
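The total of 3481 can be reproduced from the tensor shapes alone. The sketch below redoes the arithmetic in plain Python using the constants defined at the top of the notebook; the dictionary `counts` is an illustrative name.

```python
# Constants copied from the notebook's configuration cell.
vocab_size, emb_size, window, hidden_size = 27, 2, 3, 100

counts = {
    'C':        vocab_size * emb_size,            # 27 * 2   = 54
    'hidden_W': window * emb_size * hidden_size,  # 6 * 100  = 600
    'hidden_b': hidden_size,                      # 100
    'out_W':    hidden_size * vocab_size,         # 100 * 27 = 2700
    'out_b':    vocab_size,                       # 27
}

total = sum(counts.values())
print(counts, total)
```

Most of the parameters sit in `out_W`, so growing the hidden layer grows the model roughly linearly through that matrix.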