<h1> Introduction to minGPT - A PyTorch re-implementation of GPT </h1>

minGPT is a PyTorch re-implementation of GPT, a highly successful language modeling framework developed by OpenAI. Created by Andrej Karpathy, minGPT is a lightweight and efficient implementation of GPT, designed to be easy to use and highly customizable. With its modular architecture and flexible design, minGPT is a powerful tool for researchers and practitioners working in natural language processing and related fields. And don't worry, despite its name, minGPT is not small-minded! ;)
<br/> This assignment will introduce you to the basics of minGPT and how to use it for language modeling tasks.




OpenAI's generative pre-trained transformer (GPT) was first introduced in "Improving Language
Understanding by Generative Pre-Training" [1] and has since become one of the most widely
discussed models in ML, as well as a common topic of conversation worldwide.
Because of the model's significant societal impact, many questions have been raised since its
public release. This project aims to create a homework assignment on implementing the model
architecture and word embeddings, with an additional twist: we want to prompt students to
reflect on the societal impact of the model.

<i> [1] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language
understanding by generative pre-training. 2018. </i>

<h3> GPT (current) Gold Standard </h3>

Before we dive into minGPT, let's look at the current gold standard of GPT models, GPT-3.5. While GPT has several use cases, let's look at one of the most commonly used ones, ChatGPT! ChatGPT is a large language model based on the GPT-3.5 architecture, trained by OpenAI to generate human-like text. It is designed to respond to human prompts with natural language responses, making it useful for a variety of applications such as chatbots, automated content generation, and more. Under the hood, ChatGPT uses a deep neural network to learn from massive amounts of data, allowing it to generate text that is coherent and contextually relevant.
<br/> You can interact with ChatGPT here: http://chat.openai.com (sign up using Google and your Berkeley account).
<br/>
Have a short conversation with ChatGPT about Transformers, GPT models, and the basic workings of ChatGPT. Limit your conversation to three questions. Answer the following questions: 
1. What questions did you ask?
2. In 2-3 sentences, what did you learn from the conversation?
3. Were you satisfied with the responses? On a scale of 1-5, rate the conversation you had with ChatGPT.

Now that you have interacted with ChatGPT, let's work on a much simpler, scaled-down version, minGPT! In this assignment, we will train minGPT as a character-level language model on some arbitrary text input.
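
To make "character-level" concrete, here is a minimal sketch of how text can be turned into integer tokens and next-character training pairs. The toy string, the helper names `encode` and `decode`, and the `block_size` value are illustrative assumptions, not the assignment's actual data pipeline.

```python
# Hypothetical toy corpus; the assignment's real text data is not shown here.
text = "hello world"

# The vocabulary is simply the sorted set of distinct characters in the corpus.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character

def encode(s):
    """Map a string to a list of integer token ids, one per character."""
    return [stoi[ch] for ch in s]

def decode(ids):
    """Map a list of integer token ids back to a string."""
    return "".join(itos[i] for i in ids)

# A language model is trained to predict the next character, so the target
# sequence is the input sequence shifted one position to the left.
block_size = 4  # assumed context length, chosen only for illustration
ids = encode(text)
x = ids[:block_size]       # model input
y = ids[1:block_size + 1]  # next-character targets
```

With this setup, the model sees `x[:t + 1]` and is asked to predict `y[t]` at every position `t`, which is why causal masking matters in the attention layer.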

<h4> Train a character-level GPT on some text data </h4>

In [None]:
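# First, install the dependencies and import them!
import sys
!{sys.executable} -m pip install flax
!{sys.executable} -m pip install optax
!{sys.executable} -m pip install jax
# `haiku` is imported below; it is distributed on PyPI as `dm-haiku`.
!{sys.executable} -m pip install dm-haiku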

import jax
import jax.numpy as jnp
import haiku as hk
from functools import partial
import torch
from torch.utils.data import Dataset
import numpy as np
np.random.seed(182)

from train import trainer, train_config
import model

<h3> 1. Attention is all we need! </h3>

In this section, you are going to implement causal self-attention for (min) GPT! You will implement the code in the `model/model.py` file. Read the instructions in the docstring and then fill in the code wherever it says `#YOUR CODE HERE`.
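
As a reference for the idea (not the solution expected in `model/model.py`), here is a minimal single-head sketch of causal self-attention in plain NumPy. The function name, the weight matrices `w_q`, `w_k`, `w_v`, and the shapes are assumptions made for illustration; the assignment's version may be multi-head, batched, and written with different modules.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention for an input x of shape (T, C)."""
    T = x.shape[0]
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    # Scaled dot-product scores between every pair of positions: shape (T, T).
    scores = (q @ k.T) / np.sqrt(k.shape[-1])

    # Causal mask: position t may only attend to positions <= t.
    mask = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(mask, scores, -np.inf)

    # Numerically stable softmax over the key dimension.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights

# Hypothetical sizes and random weights, used only to exercise the function.
rng = np.random.default_rng(0)
T, C = 5, 8
x = rng.normal(size=(T, C))
w_q, w_k, w_v = (rng.normal(size=(C, C)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
```

A useful sanity check for any implementation is that changing a later token never changes the output at earlier positions, since the mask zeroes out all attention to the future.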

In [2]:
# some tests here

<h3> 2. MLP </h3>

The multi-layer perceptron (MLP) in GPT, also known as the feedforward network, is an essential component that helps to improve the model's ability to learn from sequential data. While the self-attention mechanism in GPT allows the model to attend to different parts of the input sequence, the MLP is responsible for processing and transforming the attended features before they are fed into the next layer. This additional non-linearity helps to capture more complex patterns and dependencies between the input tokens, leading to better performance on a wide range of language modeling tasks.
<br/> In this section, you are going to implement the MLP for (min) GPT! You will implement the code in the `model/model.py` file. Read the instructions in the docstring and then fill in the code wherever it says `#YOUR CODE HERE`.
<br/> HINT: Read the documentation [here](https://flax.readthedocs.io/en/latest/api_reference/_autosummary/flax.linen.Dense.html)
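
To illustrate what the MLP block computes, here is a minimal NumPy sketch: a dense layer that expands the feature dimension, a GELU non-linearity, and a dense layer that projects back. The 4x expansion factor, the tanh approximation of GELU, and all names below are assumptions based on common GPT implementations, not the required solution for `model/model.py`.

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w_fc, b_fc, w_proj, b_proj):
    """Position-wise feedforward network applied to x of shape (T, C)."""
    h = gelu(x @ w_fc + b_fc)   # expand: (T, C) -> (T, 4C), then non-linearity
    return h @ w_proj + b_proj  # project back: (T, 4C) -> (T, C)

# Hypothetical sizes and random parameters, used only to exercise the function.
rng = np.random.default_rng(0)
T, C = 5, 8
x = rng.normal(size=(T, C))
w_fc = rng.normal(size=(C, 4 * C)) * 0.02
b_fc = np.zeros(4 * C)
w_proj = rng.normal(size=(4 * C, C)) * 0.02
b_proj = np.zeros(C)
y = mlp(x, w_fc, b_fc, w_proj, b_proj)
```

Each `x @ w + b` step is the affine map that a dense layer applies over the last dimension, so in the assignment these would typically become dense layers rather than explicit matrices. Note that the MLP treats every position independently: unlike attention, it never mixes information across tokens.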