## Makemore Part 2: MLP
Paper: Bengio et al. (2003), "A Neural Probabilistic Language Model": https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf

Idea from the paper: C is a lookup table (a matrix) that holds one embedding vector for each word in the vocabulary V, e.g. |V| = 17k
- C: (17000x30)
- one-hot encoding of a word: (1x17000)
- enc @ C -> (1x30) embedding vector

We can look up these vectors for multiple input words (the words that came before).

Each context word contributes a 30-dimensional embedding (n_in=30 per word, so e.g. 3 context words give 90 inputs). These inputs are fully connected to a hidden layer with tanh, followed by an output layer with softmax to get probabilities.

Final output: a 17k-dimensional vector of probabilities, one for each of the 17k words in the vocab.
- Train by taking the predicted prob at the actual next word's index, using -log(prob) as the loss, and backpropagating.
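
The forward pass described above can be sketched roughly as follows. This is only an illustration under assumptions, not the paper's exact setup: the context length of 3, the hidden size of 100, the weight scaling, and the random inputs and targets are all made up for the example.

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(0)

# Sizes: vocab and embedding size follow the notes above;
# context=3 and hidden=100 are assumptions for illustration.
vocab_size, emb_dim, context, hidden = 17000, 30, 3, 100

C = torch.randn((vocab_size, emb_dim), generator=g)              # lookup table
W1 = torch.randn((context * emb_dim, hidden), generator=g) * 0.1
b1 = torch.zeros(hidden)
W2 = torch.randn((hidden, vocab_size), generator=g) * 0.1
b2 = torch.zeros(vocab_size)

# A made-up batch: 4 contexts of 3 word indices, plus the index of each next word
x = torch.randint(0, vocab_size, (4, context), generator=g)
y = torch.randint(0, vocab_size, (4,), generator=g)

emb = C[x]                                 # (4, 3, 30)
h = torch.tanh(emb.view(4, -1) @ W1 + b1)  # (4, 90) @ (90, 100) -> (4, 100)
logits = h @ W2 + b2                       # (4, 17000)
probs = F.softmax(logits, dim=1)           # each row sums to 1

# -log(prob) of the actual next word, averaged over the batch
loss = -probs[torch.arange(4), y].log().mean()
print(loss.item())
```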
---

We can use this idea for a character-level model as well.

## Building dataset
Hyperparameter: BLOCK_SIZE = 3
- BLOCK_SIZE is the number of previous chars we use as context when predicting the next one

For each word, we slide a window of BLOCK_SIZE chars over it: X gets the window, Y gets the char that comes right after it
- e.g. 'emma' (padded with '.' at the start and terminated with '.'): (..., e), (..e, m), (.em, m), (emm, a), (mma, .)
- So each word contributes n+1 examples as before, where n = len(word)
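
As a quick check of the n+1 claim, here is a small sketch of the windowing. The helper name `build_dataset` and the two-word vocabulary are hypothetical, chosen only to keep the example self-contained.

```python
def build_dataset(words, stoi, block_size=3):
    """Return (X, Y): context windows and the index of the char that follows each."""
    X, Y = [], []
    for word in words:
        context = [0] * block_size          # start with '...' (index 0 is '.')
        for ch in word + '.':               # '.' marks the end of the word
            ix = stoi[ch]
            X.append(context)
            Y.append(ix)
            context = context[1:] + [ix]    # slide the window by one char
    return X, Y

# Tiny made-up vocabulary built from two names
words = ['emma', 'ava']
chars = sorted(set(''.join(words)))
stoi = {'.': 0, **{ch: i + 1 for i, ch in enumerate(chars)}}

X, Y = build_dataset(words, stoi)
print(len(X), sum(len(w) + 1 for w in words))
```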

## Lookup table
Lookup table: C has shape (27,2), randomly initialized
- In the paper, they embed the 17k words of the vocab into R^n with n=30
- We do something similar here: 27 unique chars -> embeddings of size 2

Previously, we used a one-hot encoding to do the lookup with enc @ W
- But this is the same as indexing W[idx]: all the zeroes in enc cancel every row of W except row idx
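
A minimal sketch of that equivalence, assuming a random (27,2) table and arbitrarily chosen indices:

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(0)
W = torch.randn((27, 2), generator=g)

# Single index: one-hot row vector times W picks out row idx of W
idx = 5
enc = F.one_hot(torch.tensor(idx), num_classes=27).float()  # shape (27,)
via_matmul = enc @ W                                        # shape (2,)
via_index = W[idx]                                          # shape (2,)
print(torch.allclose(via_matmul, via_index))

# Same thing for several indices at once
idxs = torch.tensor([0, 5, 13])
batch_matmul = F.one_hot(idxs, num_classes=27).float() @ W  # (3, 27) @ (27, 2) -> (3, 2)
batch_index = W[idxs]                                       # (3, 2)
print(torch.allclose(batch_matmul, batch_index))
```

Indexing skips building the one-hot vectors and the multiplication by zeroes, which is why the notebook uses it directly.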

In PyTorch, we can just index with the whole tensor, C[X], and it works
- For X of shape (32,3) (the 32 examples built from the first 5 names) this produces (32,3,2): one 2D vector for each encoded char
- Another way to think about it: one (3x2) matrix for each row in X. Each row has 3 elements because BLOCK_SIZE=3, and each of those chars gets one 2D vector, its embedding




In [4]:
from collections import Counter
import torch
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt

In [12]:
names = []
with open('names.txt', 'r') as names_file:
  names = names_file.read().splitlines()

print(names[:10])
print(len(names))

# build lookups
uniq = ['.'] + sorted(list(set(''.join(names))))
stoi = { char: idx for idx, char in enumerate(uniq)}
itos = { idx: char for char,idx in stoi.items() }

print(stoi)
itos

['emma', 'olivia', 'ava', 'isabella', 'sophia', 'charlotte', 'mia', 'amelia', 'harper', 'evelyn']
32033
{'.': 0, 'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}


{0: '.',
 1: 'a',
 2: 'b',
 3: 'c',
 4: 'd',
 5: 'e',
 6: 'f',
 7: 'g',
 8: 'h',
 9: 'i',
 10: 'j',
 11: 'k',
 12: 'l',
 13: 'm',
 14: 'n',
 15: 'o',
 16: 'p',
 17: 'q',
 18: 'r',
 19: 's',
 20: 't',
 21: 'u',
 22: 'v',
 23: 'w',
 24: 'x',
 25: 'y',
 26: 'z'}

## Build the dataset

In [18]:
BLOCK_SIZE = 3
X = []
Y = []

for word in names[:5]:
  print(word)
  word = word + '.'
  block = [0] * BLOCK_SIZE # ... (empty context at start)

  for char in word:
    char_idx = stoi[char]
    X.append(block)
    Y.append(char_idx)

    block_str = ''.join(itos[i] for i in block)
    print(f'{block_str} -> {char}')

    # update block: roll over sliding window
    block = block[1:] + [char_idx]


X = torch.tensor(X)
Y = torch.tensor(Y)

emma
... -> e
..e -> m
.em -> m
emm -> a
mma -> .
olivia
... -> o
..o -> l
.ol -> i
oli -> v
liv -> i
ivi -> a
via -> .
ava
... -> a
..a -> v
.av -> a
ava -> .
isabella
... -> i
..i -> s
.is -> a
isa -> b
sab -> e
abe -> l
bel -> l
ell -> a
lla -> .
sophia
... -> s
..s -> o
.so -> p
sop -> h
oph -> i
phi -> a
hia -> .


In [20]:
X.shape, Y.shape, X.dtype, Y.dtype
print(X)
print("---")
print(Y)

tensor([[ 0,  0,  0],
        [ 0,  0,  5],
        [ 0,  5, 13],
        [ 5, 13, 13],
        [13, 13,  1],
        [ 0,  0,  0],
        [ 0,  0, 15],
        [ 0, 15, 12],
        [15, 12,  9],
        [12,  9, 22],
        [ 9, 22,  9],
        [22,  9,  1],
        [ 0,  0,  0],
        [ 0,  0,  1],
        [ 0,  1, 22],
        [ 1, 22,  1],
        [ 0,  0,  0],
        [ 0,  0,  9],
        [ 0,  9, 19],
        [ 9, 19,  1],
        [19,  1,  2],
        [ 1,  2,  5],
        [ 2,  5, 12],
        [ 5, 12, 12],
        [12, 12,  1],
        [ 0,  0,  0],
        [ 0,  0, 19],
        [ 0, 19, 15],
        [19, 15, 16],
        [15, 16,  8],
        [16,  8,  9],
        [ 8,  9,  1]])
---
tensor([ 5, 13, 13,  1,  0, 15, 12,  9, 22,  9,  1,  0,  1, 22,  1,  0,  9, 19,
         1,  2,  5, 12, 12,  1,  0, 19, 15, 16,  8,  9,  1,  0])


## Lookup table C, embeddings matrix

In [22]:
# Lookup table
C = torch.randn(27,2)
print(C, "\n------\n")

# Embedding
emb = C[X]

# Each row of X is a block of 3 char indices, so emb[i] is a (3x2) matrix:
# the 2D embeddings of the 3 chars in block i, stacked vertically
# e.g. X[2] = [0,5,13], so emb[2] is [C[0], C[5], C[13]] stacked vertically

print(emb[2])
example = torch.stack((C[0], C[5], C[13]))  # (3,2), same as emb[2]
print(example)


print("Embeddings:")
print(emb.shape)
emb

tensor([[ 1.6309,  0.2452],
        [-0.2122, -0.1653],
        [ 0.1938, -2.3019],
        [ 0.4438,  0.6229],
        [-1.7263,  0.6499],
        [ 0.6808,  0.8638],
        [-0.3586,  1.2320],
        [-1.4481,  1.0437],
        [-0.7733,  0.3859],
        [ 0.8189,  1.4886],
        [ 0.5716,  1.4620],
        [ 0.4114, -0.7313],
        [ 1.0447, -0.7916],
        [-1.4320, -0.8993],
        [-1.9004, -0.6209],
        [ 0.6658, -1.4881],
        [-0.3032, -0.0601],
        [-0.1473,  2.5582],
        [ 0.0379, -0.9372],
        [ 3.0048,  0.5230],
        [ 0.2714,  0.3456],
        [-1.3311, -0.4320],
        [ 0.3081,  1.5513],
        [ 2.2591, -1.9282],
        [ 2.1477,  1.6317],
        [ 0.0038,  0.7593],
        [-0.5328, -0.1106]]) 
------

tensor([[ 1.6309,  0.2452],
        [ 0.6808,  0.8638],
        [-1.4320, -0.8993]])
tensor([[ 1.6309,  0.2452],
        [ 0.6808,  0.8638],
        [-1.4320, -0.8993]])
Embeddings:
torch.Size([32, 3, 2])


tensor([[[ 1.6309,  0.2452],
         [ 1.6309,  0.2452],
         [ 1.6309,  0.2452]],

        [[ 1.6309,  0.2452],
         [ 1.6309,  0.2452],
         [ 0.6808,  0.8638]],

        [[ 1.6309,  0.2452],
         [ 0.6808,  0.8638],
         [-1.4320, -0.8993]],

        [[ 0.6808,  0.8638],
         [-1.4320, -0.8993],
         [-1.4320, -0.8993]],

        [[-1.4320, -0.8993],
         [-1.4320, -0.8993],
         [-0.2122, -0.1653]],

        [[ 1.6309,  0.2452],
         [ 1.6309,  0.2452],
         [ 1.6309,  0.2452]],

        [[ 1.6309,  0.2452],
         [ 1.6309,  0.2452],
         [ 0.6658, -1.4881]],

        [[ 1.6309,  0.2452],
         [ 0.6658, -1.4881],
         [ 1.0447, -0.7916]],

        [[ 0.6658, -1.4881],
         [ 1.0447, -0.7916],
         [ 0.8189,  1.4886]],

        [[ 1.0447, -0.7916],
         [ 0.8189,  1.4886],
         [ 0.3081,  1.5513]],

        [[ 0.8189,  1.4886],
         [ 0.3081,  1.5513],
         [ 0.8189,  1.4886]],

        [[ 0.3081,  1

## Different ways to flatten

In [58]:
# emb is (32,3,2). To multiply by a weight matrix W of shape (6, n_out),
# we first flatten each example's 3 embeddings into one row: (32,6)

# Option 1: unbind + cat
#   unbind: removes a dimension and returns a tuple of the slices along it
#   with dim=1 we get 3 tensors of shape (32,2): emb[:, 0, :], emb[:, 1, :], emb[:, 2, :]
#   cat: concatenates them along dim=1 -> (32,6)
sliced = torch.unbind(emb, dim=1)
print(sliced)
cat_ver = torch.cat(sliced, dim=1)
print(cat_ver)


# Option 2: flatten dims 1 and 2, (32,3,2) -> (32,6)
trf = torch.flatten(emb, 1, 2)
# trf

(tensor([[ 0.8411,  0.9071],
        [ 0.8411,  0.9071],
        [ 0.8411,  0.9071],
        [-0.0032, -0.0444],
        [-1.1311,  0.3155],
        [ 0.8411,  0.9071],
        [ 0.8411,  0.9071],
        [ 0.8411,  0.9071],
        [ 0.2686,  0.6108],
        [-0.0574,  0.2049],
        [ 2.2140, -0.5827],
        [-1.5016, -0.9905],
        [ 0.8411,  0.9071],
        [ 0.8411,  0.9071],
        [ 0.8411,  0.9071],
        [ 0.2061,  0.1933],
        [ 0.8411,  0.9071],
        [ 0.8411,  0.9071],
        [ 0.8411,  0.9071],
        [ 2.2140, -0.5827],
        [-1.1089,  0.6892],
        [ 0.2061,  0.1933],
        [ 0.5028, -1.1211],
        [-0.0032, -0.0444],
        [-0.0574,  0.2049],
        [ 0.8411,  0.9071],
        [ 0.8411,  0.9071],
        [ 0.8411,  0.9071],
        [-1.1089,  0.6892],
        [ 0.2686,  0.6108],
        [-0.0785, -0.0394],
        [-0.3244, -1.1376]]), tensor([[ 0.8411,  0.9071],
        [ 0.8411,  0.9071],
        [-0.0032, -0.0444],
        [-1.1311,

In [39]:
# Option 3: view, which gives the same (32,6) result as flatten
emb.view(32,6) == trf

tensor([[True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, T
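
The three approaches above can be compared side by side. This is a sketch with a random stand-in for X (the real contexts are not needed to compare shapes and values). It also checks one practical difference: `view` reuses the memory of `emb`, while `cat` allocates a new tensor.

```python
import torch

g = torch.Generator().manual_seed(0)
C = torch.randn((27, 2), generator=g)
X = torch.randint(0, 27, (32, 3), generator=g)  # stand-in for the real contexts
emb = C[X]                                      # (32, 3, 2)

via_cat = torch.cat(torch.unbind(emb, dim=1), dim=1)  # (32, 6)
via_flatten = torch.flatten(emb, 1, 2)                # (32, 6)
via_view = emb.view(emb.shape[0], -1)                 # (32, 6), -1 infers the 6

print(torch.equal(via_cat, via_flatten), torch.equal(via_flatten, via_view))

# Row 0 is the 3 embeddings of example 0 laid out end to end
first_row = torch.cat((C[X[0, 0]], C[X[0, 1]], C[X[0, 2]]))
print(torch.equal(via_view[0], first_row))

# view shares storage with emb; cat builds a new tensor
view_shares_memory = via_view.data_ptr() == emb.data_ptr()
cat_allocates = via_cat.data_ptr() != emb.data_ptr()
print(view_shares_memory, cat_allocates)
```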

In [23]:
W = torch.randn((6,2))
trf @ W  # (32,6) @ (6,2) -> (32,2)

tensor([[ 2.2529,  0.0614],
        [ 3.5035, -0.3159],
        [ 3.6840,  0.7607],
        [ 0.4116,  2.3404],
        [-6.9042,  2.6555],
        [ 2.2529,  0.0614],
        [ 2.3969,  0.0221],
        [ 2.2525, -0.3542],
        [ 4.8385,  0.6692],
        [-0.7144,  0.2851],
        [ 0.9197,  0.3874],
        [-1.5129,  0.2543],
        [ 2.2529,  0.0614],
        [ 1.3862,  0.3225],
        [ 1.6051, -0.4765],
        [ 3.2999,  0.1873],
        [ 2.2529,  0.0614],
        [ 2.0957,  0.1067],
        [ 2.5501,  0.1032],
        [ 0.4288,  0.8723],
        [-2.4951, -0.0287],
        [ 5.0560, -1.1624],
        [ 2.2715,  0.2917],
        [-0.9599,  1.2554],
        [-1.1757,  0.7717],
        [ 2.2529,  0.0614],
        [ 2.5738, -0.0386],
        [ 2.5847,  0.6766],
        [-1.5513,  0.0100],
        [ 3.0989, -0.1220],
        [ 4.8852, -0.0825],
        [-2.6018, -0.5155]])

In [67]:
t = torch.arange(6).view(3,2)
# keepdim=True keeps the summed dimension with size 1: (3,2) -> (1,2) instead of (2,)
t.sum(dim=0, keepdim=True).shape

torch.Size([1, 2])
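
One place `keepdim=True` matters is when normalizing rows into probabilities, as a softmax does after exponentiating. Here is a small sketch using a float version of the tensor above; applying it to row normalization is an assumption about where this cell was heading.

```python
import torch

t = torch.arange(6, dtype=torch.float32).view(3, 2)  # [[0, 1], [2, 3], [4, 5]]

col_sums = t.sum(dim=0, keepdim=True)  # shape (1, 2), as in the cell above
row_sums = t.sum(dim=1, keepdim=True)  # shape (3, 1)
row_sums_flat = t.sum(dim=1)           # shape (3,): the summed dim is dropped

# (3, 2) / (3, 1) broadcasts across columns, so each row is divided by its own sum
probs = t / row_sums
print(col_sums.shape, row_sums.shape, row_sums_flat.shape)
print(probs.sum(dim=1))
```

With the (3, 1) shape, broadcasting copies each row's sum across that row's columns. The flat (3,) version does not line up with (3, 2) under the broadcasting rules, so keeping the dimension is the safer habit.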