# Karpathy Notes

Notes from Andrej Karpathy's AI playlist.

## Intro to LLMs

[Video](https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ)

### What are LLMs?

LLMs essentially consist of 2 files:

```
llama-2-70b
- parameters (140GB)
- run.c (~500 lines of C code)
```

Let's break it down...

* **llama-2-70b**: Llama model, version 2, 70B params
* **parameters**: The weights of the neural net. At 2 bytes per param, the file is around 140GB. Fewer params mean a faster model, but a less accurate one.
* **run.c**: Runs the neural net using the above params
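
The 140GB figure is just params times bytes per param. A quick sanity check (the function name is made up for illustration, and 2 bytes per param is assumed to mean a 16-bit number format):

```python
# Back-of-the-envelope size of the parameter file.
NUM_PARAMS = 70e9      # 70B parameters, from the model name
BYTES_PER_PARAM = 2    # 2 bytes per parameter (assumed: a 16-bit format)


def param_file_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the parameter file size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


size_gb = param_file_size_gb(NUM_PARAMS, BYTES_PER_PARAM)
print(f"{size_gb:.0f} GB")  # 140 GB
```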

_Note: The latest models likely have around 10x as many params._

### Stage 1: Pre-training

To get the parameters, we kind of "compress" the internet. It's almost like creating a zip file of the internet.

```
10TB chunk of internet > 6000 GPUs, $2M, 12 days > 140GB parameter file
```


This would only be done maybe once per year.
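
To get a feel for the scale, the GPU count, duration, and cost above can be turned into GPU-hours and a rough price per GPU-hour. This is only arithmetic on the numbers in these notes, not a real pricing figure:

```python
# Rough compute budget for one pre-training run, using the numbers above.
NUM_GPUS = 6000
DAYS = 12
TOTAL_COST_USD = 2_000_000

gpu_hours = NUM_GPUS * DAYS * 24                 # total GPU-hours used
cost_per_gpu_hour = TOTAL_COST_USD / gpu_hours   # implied price per GPU-hour

print(f"{gpu_hours:,} GPU-hours, ~${cost_per_gpu_hour:.2f} per GPU-hour")
# 1,728,000 GPU-hours, ~$1.16 per GPU-hour
```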

Now you have a neural network that can do next-word prediction. Word prediction actually forces the model to learn a lot about the world. E.g., given a person's birth date, we could predict their death date based on the internet crawl data.

![nn.png](./imgs/nn.jpg)

#### Dreaming

Now we have a model that can "dream" internet documents. We feed it a word, it predicts the next word, and we repeat. For example, we could "dream" an Amazon review, a wiki entry, etc.
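
The predict-and-repeat loop can be sketched with a toy model. This is not a neural net: it's a tiny bigram table (an assumption made to keep the example small) that only remembers which word followed which, but the generation loop has the same shape. `train_bigram` and `dream` are made-up names.

```python
import random
from collections import defaultdict


def train_bigram(text: str) -> dict[str, list[str]]:
    """Record, for each word, the words that followed it in the training text."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return dict(followers)


def dream(model: dict[str, list[str]], start: str,
          max_words: int = 10, seed: int = 0) -> str:
    """Sample a next word, append it, and repeat, starting from `start`."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(max_words - 1):
        candidates = model.get(out[-1])
        if not candidates:  # no known continuation, so stop early
            break
        out.append(rng.choice(candidates))
    return " ".join(out)


corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram(corpus)
print(dream(model, "the"))
```

A real LLM replaces the lookup table with the neural net and predicts tokens rather than whole words, but the outer loop is the same idea.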

#### Transformer Architecture

What does the neural net look like?
It uses the transformer architecture.

![nn.png](./imgs/transformer.jpg)

* Billions of params dispersed throughout the network
* We know how to adjust them so the model gets better at prediction
* But we don't really know how they work together to do it

### Stage 2: Fine Tuning (building an assistant)

Instead of just creating dream docs, we want to build an assistant (i.e., be able to ask questions and get answers).

To do this, we use fine tuning.

We have a list of prompts or questions, and have humans fill in the answers. It's kind of like labeling questions with correct answers.

We then use the training process again on this curated, high-quality data set.

Now, when we ask the model questions, it replies in this question-answer format, but can still draw on knowledge from pre-training.

Answer generation may be done collaboratively between humans and machines, where machines suggest answers or parts of answers.

If the model makes mistakes, the corrected answers can be fed back into this process.
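
A minimal sketch of what the curated data set might look like before training. The `<user>` / `<assistant>` template and the `QAExample` class are hypothetical; real systems each define their own chat format.

```python
from dataclasses import dataclass


@dataclass
class QAExample:
    prompt: str  # question collected for labeling
    answer: str  # ideal answer written by a human labeler


# Hypothetical template, used here only to show the idea.
TEMPLATE = "<user>\n{prompt}\n<assistant>\n{answer}"


def to_training_text(example: QAExample) -> str:
    """Turn one labeled Q&A pair into a single training document."""
    return TEMPLATE.format(prompt=example.prompt.strip(),
                           answer=example.answer.strip())


examples = [
    QAExample("What is 2 + 2?", "2 + 2 equals 4."),
    QAExample("Name a primary color.", "Red is a primary color."),
]
dataset = [to_training_text(e) for e in examples]
print(dataset[0])
```

The model is then trained on these documents the same way as in pre-training, just on far less data.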

### Stage 3: RLHF (Optional)

_RLHF: Reinforcement Learning from Human Feedback_

To further refine the model, we can show people a list of possible responses and have them pick the best one. This is useful in areas where it's hard for humans to write an answer but easy to compare answers. E.g., write a haiku about X.
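
One way to picture the comparison data: each "pick the best one" choice can be expanded into (chosen, rejected) pairs. This is only a sketch of the data shape, with a made-up function name; how those pairs are then used for training is not covered in these notes.

```python
def to_preference_pairs(prompt: str, responses: list[str],
                        best_index: int) -> list[tuple[str, str, str]]:
    """Expand one 'pick the best' choice into (prompt, chosen, rejected) triples."""
    chosen = responses[best_index]
    return [
        (prompt, chosen, rejected)
        for i, rejected in enumerate(responses)
        if i != best_index
    ]


candidates = ["haiku A", "haiku B", "haiku C"]
pairs = to_preference_pairs("Write a haiku about X", candidates, best_index=1)
for pair in pairs:
    print(pair)
```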

### Summary

_Every Year..._

__Pre-training__

1. Download internet crawl
2. Get a huge cluster of GPUs
3. Compress into parameters, pay $2M, wait 12 days

Obtained base model.

_Every Week..._

1. Write labeling instructions (for humans)
2. Hire people to write 100k ideal Q&A responses / rank comparisons
3. Finetune on this data

Obtained assistant model.

4. Run evaluations
5. Deploy
6. Collect misbehaviors, correct, feed into step 1 and repeat


### Multi-modality

LLMs can interpret not just text, but also images, video, audio, etc.

E.g., you can sketch a website and have GPT generate the actual working code.

### Custom GPTs

Create your own GPT. This is not fine-tuning, but rather you just provide it with:

1. Predefined instructions
2. Which tools it can access
3. Files that can be searched with RAG
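
The three pieces above can be pictured as a small config object. Everything here is hypothetical (the class, its fields, and the word-overlap scoring, which is a toy stand-in for real RAG retrieval), but it shows that nothing in the model's params changes: the instructions and retrieved file text are just placed in the prompt.

```python
from dataclasses import dataclass, field


@dataclass
class CustomGPT:
    instruction: str                                     # 1. predefined instructions
    tools: list[str] = field(default_factory=list)       # 2. tools it may call
    files: dict[str, str] = field(default_factory=dict)  # 3. filename -> text

    def retrieve(self, question: str, top_k: int = 1) -> list[str]:
        """Rank files by shared words with the question (toy retrieval)."""
        q_words = set(question.lower().split())
        scored = sorted(
            self.files.items(),
            key=lambda item: len(q_words & set(item[1].lower().split())),
            reverse=True,
        )
        return [name for name, _ in scored[:top_k]]

    def build_prompt(self, question: str) -> str:
        """Combine instructions, retrieved file text, and the question."""
        context = "\n".join(self.files[name] for name in self.retrieve(question))
        return f"{self.instruction}\n\nContext:\n{context}\n\nQuestion: {question}"


bot = CustomGPT(
    instruction="Answer using only the provided context.",
    tools=["web_search"],
    files={
        "refunds.txt": "Refunds are issued within 14 days of purchase",
        "shipping.txt": "Shipping takes 3 to 5 business days",
    },
)
print(bot.build_prompt("How long does shipping take?"))
```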

### Future Direction

* LLM performance so far is a predictable function of
    * N: Num of params
    * D: Amount of text trained on
* So we can expect performance to improve as N and D grow
* Models may gain more _system 2_ ("slow") thinking
* Self-improvement, similar to RL
* The LLM acts like a central kernel for problem solving -- it reads text, browses the internet, calls the appropriate tools to problem solve, etc.
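
To make the first bullet concrete, here is a sketch of what "a predictable function of N and D" could look like. Both the functional form (a constant plus one power-law term each for N and D) and the constants are assumptions for illustration, not values from the video; the point is only that the predicted loss drops smoothly as either N or D grows.

```python
def predicted_loss(n_params: float, n_tokens: float,
                   e: float = 1.69, a: float = 406.4, alpha: float = 0.34,
                   b: float = 410.7, beta: float = 0.28) -> float:
    """Illustrative loss curve: lower is better.

    The form and all default constants are placeholders for illustration,
    not authoritative fitted values.
    """
    return e + a / n_params**alpha + b / n_tokens**beta


for n in (7e9, 70e9, 700e9):
    print(f"N={n:.0e}, D=2e12 -> loss {predicted_loss(n, 2e12):.3f}")
```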