# Character-Level MLP Model

![Status](https://img.shields.io/static/v1.svg?label=Status&message=Finished&color=brightgreen)
[![Source](https://img.shields.io/static/v1.svg?label=GitHub&message=Source&color=181717&logo=GitHub)](https://github.com/particle1331/inefficient-networks/blob/master/docs/notebooks/tensorflow/05-tensorflow-cnn.ipynb)
[![Stars](https://img.shields.io/github/stars/particle1331/inefficient-networks?style=social)](https://github.com/particle1331/inefficient-networks)

---

## Introduction

This notebook is based on [this tutorial](https://www.youtube.com/watch?v=TCH_1BHY58I) by [Andrej Karpathy](https://karpathy.ai/) on language modeling. Here we implement a multilayer perceptron (MLP) character-level language model. Along the way, the notebook introduces many basics of machine learning (e.g. model training, learning rate tuning, hyperparameters, evaluation, train/dev/test splits, under/overfitting, etc.).


**Readings**

* Bengio et al. (2003), *A Neural Probabilistic Language Model*

**Exercises** (from the video)

- E01: Tune the hyperparameters of the training to beat my best validation loss of 2.2.
- E02: I was not careful with the initialization of the network in this video. (1) What is the loss you'd get if the predicted probabilities at initialization were perfectly uniform? What loss do we achieve? (2) Can you tune the initialization to get a starting loss that is much more similar to (1)?
- E03: Read the Bengio et al. 2003 paper (see Readings above), then implement and try any idea from the paper. Did it work?



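With a context of length 1, the generated names are pretty bad. A longer context should help, but if we use `k` characters as context, a count table needs $27^k$ rows, one for each possible context. There is too little data to estimate that many rows reliably.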
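
To make this growth concrete, the sketch below counts how many distinct contexts a count-based model would need a row for at each context length. It assumes the 27-symbol vocabulary used in this notebook (26 letters plus `'.'`); the names `vocab_size` and `rows_needed` are illustrative only.

```python
# Number of possible contexts (table rows) for a count-based model
# with a vocabulary of 27 symbols and context length k.
vocab_size = 27

rows_needed = {k: vocab_size ** k for k in range(1, 6)}

for k, n_rows in rows_needed.items():
    print(f"context length {k}: {n_rows:,} possible contexts")
```

Each extra character of context multiplies the number of rows by 27, so the table quickly outgrows any dataset of names.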

We therefore move to a new model: an MLP that predicts the next character, following the Bengio et al. 2003 paper.



In [1]:
import math
import warnings
import matplotlib.pyplot as plt
from pathlib import Path
from matplotlib_inline import backend_inline

DATASET_DIR = Path("./data").absolute()
RANDOM_SEED = 42

warnings.simplefilter(action="ignore")
backend_inline.set_matplotlib_formats('svg')

In [2]:
# Read one name per line; the context manager closes the file afterwards.
with open(DATASET_DIR / 'names.txt', 'r') as f:
    names = f.read().splitlines()
print(len(names))
names[:10]

32033


['emma',
 'olivia',
 'ava',
 'isabella',
 'sophia',
 'charlotte',
 'mia',
 'amelia',
 'harper',
 'evelyn']

In [3]:
from itertools import product
import functools

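# Vocabulary: '.' (index 0) marks the start and end of a name, followed by the sorted letters.
chars = ['.'] + sorted(set(''.join(names)))
itos = dict(enumerate(chars))               # index -> character
stoi = {c: i for i, c in itos.items()}      # character -> index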

# @functools.lru_cache()
# def itox(context_size):
#     return dict(enumerate(map(lambda z: ''.join(z), product(stoi.keys(), repeat=context_size))))

# @functools.lru_cache()
# def xtoi(context_size):
#     return {x: i for i, x in itox(context_size).items()}

In [4]:
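def build_dataset(context_size=1):
    """Build (context, next character) pairs from all names, encoded as indices."""
    xs = []
    ys = []
    for name in names:
        # Every name starts from a context made entirely of '.' padding.
        context = ['.'] * context_size
        for ch in name + '.':  # the trailing '.' is the end-of-name target
            xs.append([stoi[c] for c in context])
            ys.append(stoi[ch])
            context = context[1:] + [ch]  # slide the window one character
    return xs, ys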
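
The sliding-window logic is easier to follow on a single name. The following is a minimal, self-contained sketch that mirrors the loop above but works on raw characters instead of indices; `sliding_windows` is a hypothetical helper written for illustration and is not part of the notebook.

```python
def sliding_windows(word, context_size=3, pad='.'):
    """Return (context, target) string pairs for a single word."""
    context = pad * context_size
    pairs = []
    for ch in word + pad:             # trailing pad is the end-of-word target
        pairs.append((context, ch))
        context = context[1:] + ch    # drop the oldest character, append the new one
    return pairs


pairs = sliding_windows('emma')
for context, target in pairs:
    print(f"{context} -> {target}")
```

A word with `n` characters yields `n + 1` examples, because the end-of-word marker is also a prediction target.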

In [5]:
xs, ys = build_dataset(3)

In [8]:
import pandas as pd

xs, ys = build_dataset(context_size=3)

df = pd.DataFrame({'xs': xs, 'ys': ys})
df['seq'] = df['xs'].apply(lambda x: ''.join(itos[c] for c in x))
df['target'] = df['ys'].map(itos)
df.head(12)

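|    | xs          | ys | seq   | target |
|---:|:------------|---:|:------|:-------|
|  0 | [0, 0, 0]   |  5 | `...` | `e`    |
|  1 | [0, 0, 5]   | 13 | `..e` | `m`    |
|  2 | [0, 5, 13]  | 13 | `.em` | `m`    |
|  3 | [5, 13, 13] |  1 | `emm` | `a`    |
|  4 | [13, 13, 1] |  0 | `mma` | `.`    |
|  5 | [0, 0, 0]   | 15 | `...` | `o`    |
|  6 | [0, 0, 15]  | 12 | `..o` | `l`    |
|  7 | [0, 15, 12] |  9 | `.ol` | `i`    |
|  8 | [15, 12, 9] | 22 | `oli` | `v`    |
|  9 | [12, 9, 22] |  9 | `liv` | `i`    |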


Next, we embed each of the 27 characters as a 2-dimensional vector. The embedding table `C` has one row per character.

In [9]:
import torch
C = torch.randn(27, 2)
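
As a minimal sketch of how such a table can be used, the block below looks up embeddings for a small batch of contexts by indexing `C` with an integer tensor, and checks that this matches multiplying one-hot vectors by `C`. The batch `X` (the first three contexts of `emma` from the table above), the seeded generator `g`, and the variable names are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(42)      # seeded only to make the sketch reproducible
C = torch.randn(27, 2, generator=g)        # embedding table: one 2-d row per character

# Three contexts of length 3, encoded as character indices ('...', '..e', '.em').
X = torch.tensor([[0, 0, 0],
                  [0, 0, 5],
                  [0, 5, 13]])

# Indexing with an integer tensor gathers rows of C: result has shape (3, 3, 2).
emb = C[X]

# Equivalent view: one-hot encode each index, then multiply by C.
onehot = F.one_hot(X, num_classes=27).float()   # shape (3, 3, 27)
emb_matmul = onehot @ C                          # shape (3, 3, 2)

print(emb.shape)
print(torch.allclose(emb, emb_matmul))
```

Indexing avoids materializing the one-hot tensor, which is why it is the more convenient form of the lookup.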