# Makemore - Part 1

As mentioned previously, Makemore makes "more" of whatever you give it. Simple as that. This Jupyter notebook is my first step on the journey of building **Makemore**. Let's do this!

Under the hood, Andrej said, Makemore is a **character-level language model**. That means it models sequences of characters. In other words, Makemore tries to predict the next character based on the previous characters. Another way to put this is that Makemore tries to answer the following question:

> Based on the previous characters, what character is likely to come **next**?

To provide contrast, ChatGPT is a *token-level language model*. It attempts to predict the next token (a word or a piece of a word) based on the previous tokens.

Without further ado, let's start building by loading the dataset `names.txt`.

## Loading the dataset

In this section of the notebook, I read the contents of `names.txt` into a string, then split it on line breaks to get a Python list of individual words.

In [2]:
words = open('names.txt', 'r').read().splitlines()

And we can go ahead and display the first 10 elements of the list. Just to see...

In [3]:
words[:10]

['emma',
 'olivia',
 'ava',
 'isabella',
 'sophia',
 'charlotte',
 'mia',
 'amelia',
 'harper',
 'evelyn']

## Exploring the dataset

We would like to learn more about this dataset, so let's go ahead and print out the total number of words in the dataset.

In [4]:
len(words)

32033

Let's print out the lengths of the shortest and longest words in our dataset.

In [5]:
min(len(w) for w in words)

2

In [6]:
max(len(w) for w in words)

15
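The two cells above return lengths, not words. If you want to see the words themselves, here is a small sketch using the `key` argument of `min` and `max`. To keep it self-contained, it runs on the ten names displayed earlier; the variable `sample_words` is just an illustrative stand-in for the full `words` list.

```python
# Stand-in for the full `words` list: the first ten names shown earlier.
sample_words = ['emma', 'olivia', 'ava', 'isabella', 'sophia',
                'charlotte', 'mia', 'amelia', 'harper', 'evelyn']

# `key=len` compares words by length instead of alphabetically.
# On ties, min/max return the first match they encounter.
shortest = min(sample_words, key=len)
longest = max(sample_words, key=len)

print(shortest, len(shortest))
print(longest, len(longest))
```

Swapping `sample_words` for `words` should give the shortest and longest names in the whole dataset.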

Let's think through our character-level language model for a bit. Remember, its job is to predict the **next character**, given the sequence of characters that came before it. So, the existence of a single word in the dataset like `isabella`, Andrej said, tells us that:

- The character `i` is very likely to come first in a name 🤔

- The character `s` is likely to follow the character `i`

- The character `a` is likely to follow the sequence `is`

- The character `b` is likely to follow the sequence `isa`

... And so on

- There is also one last bit of information in the word `isabella`: after all those letters have been predicted, the word is likely to be **finished** 🤷🏾‍♂️.

This is an example of the statistical structure, in terms of what is likely to follow what, that can be extracted from the character sequence `isabella`. And `isabella` is not our only example! We have over 32,000 of them 😎.
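To make that concrete, here is a minimal sketch that lists every (context, next character) pair hidden in a single word. The helper name `context_target_pairs` and the `<END>` marker are my own illustrative choices, not something taken from Andrej's code.

```python
def context_target_pairs(word, end_marker="<END>"):
    """Return (context, next_char) pairs for one word.

    `end_marker` is a hypothetical stand-in for "the word is finished".
    """
    pairs = []
    for i, ch in enumerate(word):
        # Everything before position i is the context; ch is what follows it.
        pairs.append((word[:i], ch))
    # After the full word, the "next thing" is the end of the word.
    pairs.append((word, end_marker))
    return pairs

pairs = context_target_pairs("isabella")
for context, target in pairs:
    print(f"{context!r:12} -> {target}")
```

The first pair has an empty context (the word is just starting), and the last pair says the word is done. An eight-letter word like `isabella` gives nine such examples.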

So, our goal in writing this program is to capture the statistical structure in those 32,000 training examples.

And in this notebook, we are going to implement a **bigram** model to achieve that goal.

## A Bigram model

See, in a Bigram model we are only looking at **two characters at a time**. That's it, just ✌🏾
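As a rough sketch of what "two characters at a time" looks like in code, the snippet below pairs each character with the one right after it and counts how often each pair shows up. The helper `bigrams` and the small `sample_words` list (the ten names displayed earlier) are assumptions made for illustration. For simplicity, this sketch also ignores the start and end of each word.

```python
from collections import Counter

def bigrams(word):
    """Return the consecutive character pairs of `word`."""
    # zip stops at the shorter sequence, so the last character has no partner.
    return list(zip(word, word[1:]))

example_bigrams = bigrams("emma")
print(example_bigrams)  # [('e', 'm'), ('m', 'm'), ('m', 'a')]

# Stand-in for the full `words` list: the first ten names shown earlier.
sample_words = ['emma', 'olivia', 'ava', 'isabella', 'sophia',
                'charlotte', 'mia', 'amelia', 'harper', 'evelyn']

# Count how often each pair of adjacent characters occurs.
bigram_counts = Counter()
for w in sample_words:
    bigram_counts.update(bigrams(w))

print(bigram_counts.most_common(3))
```

Even on ten names, some pairs already appear more often than others, and that is exactly the kind of statistical structure a bigram model is meant to capture.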