# Intro To Full Stack LLM Development

## Workshop Goals
Our goal is to give you an understanding of:
- The basic components that go into building your own ChatGPT-like app
- The basic workflow of prototyping an AI app in JupyterLab
- How to convert a JupyterLab prototype into an app that can run on a web server
- How to run your web server app on your computer

In addition, this is a good place to ask more general questions about AI development.

## If You Get Stuck
Just get someone's attention for help.

## Task 1: Run A Shell Command
Lines of code that start with a `%` in a Jupyter Notebook execute a [magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html). A single `%` applies to one line, and a double `%%` applies to the whole cell. In the next cell, the `%%bash` magic command runs a bash command that shows which operating system you are running on.

Run the code cell below by selecting it and using one of the following methods:
* `shift + enter`: Run and move to the next cell
* `ctrl + enter`: Run the cell without moving to the next cell
* Press the run button in the toolbar above

In [None]:
%%bash
cat /etc/os-release

As seen above, Google Colab executes your code on Ubuntu Linux, one of the most widely used operating systems for servers and cloud computing.
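The same information can be read from Python without a shell. Here is a minimal sketch using the standard-library `platform` module; the helper name `describe_system` is made up for this example.

```python
import platform

def describe_system() -> str:
    """Return a one-line summary of the operating system and Python version."""
    return f"{platform.system()} {platform.release()} (Python {platform.python_version()})"

print(describe_system())
```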

## Task 2: Install Libraries
### Context
When you use a Jupyter Notebook for the first time, you will need to install the third-party libraries required to develop your software. Google Colab discards your computing environment after about 90 minutes of inactivity. So, in Colab, you must rerun your entire notebook, including library installation, whenever you reopen it.

We are installing the following libraries:
- `huggingface_hub`: The API we will use to access a hosted version of Falcon LLM
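- `python-dotenv`: Loads settings such as your Hugging Face token from a `.env` file, so the token stays out of your code
- `langchain`: A popular library for making LLMs easier to use
- `chainlit`: A web UI library we will use later to build a ChatGPT-like interface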

### How To Install The Libraries
1. Option 1: Click the cell and press `shift + enter` to execute and move to the next cell
2. Option 2: Click the run button in the toolbar above

In [None]:
%pip install langchain huggingface_hub python-dotenv chainlit
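If you want to confirm that the installation worked, one option is to check whether each library can be found by Python. This is a small sketch, assuming the import names below; note that a package's import name can differ from its pip name (`python-dotenv` is imported as `dotenv`). The helper `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

def missing_modules(module_names):
    """Return the names that cannot be found in this Python environment."""
    return [name for name in module_names if find_spec(name) is None]

# Import names for the libraries installed above
required = ["langchain", "huggingface_hub", "dotenv", "chainlit"]
print("Missing:", missing_modules(required) or "none")
```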

## Task 3: Get Hugging Face Access Token
### Context
To use an API, you typically need to generate a "password" for your code to log in to your account. This "password" is called an "Access Token".

The canonical way to manage this password is to store it in a file called `.env`. 

Why not simply paste the password into your code? Primarily because it's easy to forget that the password is there and then accidentally share the code on GitHub or elsewhere. Attackers run automated scrapers that search public code for leaked tokens, so a leaked token can be abused quickly.

### Instructions: Hugging Face Access Token
First, run the `bash` command below to create a `.env` file.

In [None]:
%%bash
touch .env

Next, create an access token on your Hugging Face account:
1. Create an account at https://huggingface.co
2. Go to https://huggingface.co/settings/tokens
3. Click on the "New Token" button
4. For **Name** put intro-to-full-stack-llm-token. For **Role**, put Read.
5. Click "Generate Token"
6. Copy the token to your clipboard
7. Paste the token into the cell below, replacing the `hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` placeholder
8. Run the cell

In [None]:
%%bash
echo "HUGGINGFACE_API_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" > .env

cat .env

If you were successful, you should see something like `HUGGINGFACE_API_TOKEN=hf_xxxxxxxxxxxxx` in the output above.
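For intuition, a `.env` file is just text with one `KEY=value` pair per line. The sketch below shows roughly how such a file can be parsed. It is a simplified illustration, not how `python-dotenv` is implemented, and the function name `parse_env_text` is made up.

```python
def parse_env_text(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

sample = "# comment\nHUGGINGFACE_API_TOKEN=hf_xxxxxxxx\n"
print(parse_env_text(sample))
```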

## Task 4: Load Hugging Face Token Into Memory

To use the access token, we need to load it from our `.env` file on disk into memory. Run the cell below:

In [None]:
import os
from dotenv import load_dotenv

load_dotenv()

token = os.getenv('HUGGINGFACE_API_TOKEN')
print(f"Your access token: {token}")  # Prints None if the token was not found

The output from the cell above should look like `Your access token: hf_xxxxxxxxxxxxx`.
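Printing the whole token is fine for a quick check in a private notebook, but it leaves the secret in the saved cell output. As an optional alternative, here is a sketch of a helper that shows only the first and last few characters. The name `mask_token` and the placeholder token are made up for this example.

```python
def mask_token(token, visible=4):
    """Show only the first and last few characters of a secret."""
    if not token:
        return "<missing>"
    if len(token) <= 2 * visible:
        return "*" * len(token)
    return token[:visible] + "..." + token[-visible:]

# Placeholder value for illustration; pass your real token variable instead
print("Your access token:", mask_token("hf_exampleexampleexample"))
```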

## Task 5: Setup Hosted LLM
### Context
The model we will use today is [falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct).

**Falcon** is the name chosen by the model's creator, the Technology Innovation Institute (TII) in the United Arab Emirates. It is a fitting name, since the falcon is a national symbol of the UAE. Falcon was at one point the highest-performing LLM on the [Hugging Face LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). It has since been surpassed by Meta's Llama 2 model and others.

**7b** refers to the number of parameters that the model has: about 7 billion. A more resource-intensive but more powerful model called `falcon-40b` has 40 billion parameters. Because it is so small, the model we will use today is not nearly as good as ChatGPT. Why use it, then? Because this smaller model can be hosted on Hugging Face's servers for free. We'd need to pay for server time if we used a more powerful model like `falcon-40b` or Llama 2. It's not very expensive, but it would have added more steps to this tutorial.

**instruct** refers to the fact that the LLM is "[instruction fine-tuned](https://arxiv.org/pdf/2109.01652.pdf)," which means that the model has been further trained on examples of instructions and good responses so that it behaves like a helpful assistant. LLMs without instruction fine-tuning are more challenging to use because they tend to simply continue the text you give them.

### Instructions
Run the code cell below.

Read the comments explaining what each line does. Ask someone if something doesn't make sense.

In [None]:
from langchain import HuggingFaceHub  # Allows us to use LLMs from Hugging Face Hub

# We set up an LLM for use in the next task
llm = HuggingFaceHub(
    huggingfacehub_api_token=token,      # Your "password"
    repo_id="tiiuae/falcon-7b-instruct", # The LLM we chose
    model_kwargs={
        "temperature":0.7,   # Adjusts how "creative" the model will be
        "max_new_tokens":200 # Maximum number of tokens the model will output
    }
)

print(llm)
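For intuition about `temperature`: a common way sampling temperature works is to divide the model's raw scores for each candidate token by the temperature before turning them into probabilities. The toy sketch below uses made-up scores for three candidate tokens. It illustrates the general idea only and is not Falcon's or Hugging Face's actual code.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after scaling by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # Subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5]  # Made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate the probability on the top-scoring token, which makes the output more predictable. Higher temperatures spread it out, which makes the output more varied.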

## Task 6: Prompt Falcon-7B
### Context
Now, you can prompt Falcon-7B from the Python interpreter, similar to how you prompt ChatGPT from its web interface.

### Instructions
1. Run the code below
2. Understand the comments explaining the code

In [None]:
# PromptTemplate is a feature of LangChain for defining reusable prompts
#
# LLMChain is called "chain" because LangChain started as a
# library for "chaining together" multiple successive LLM calls
from langchain import LLMChain, PromptTemplate

template = """
You are a helpful assistant.

{question}
"""

prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

question = """
Give me step by step instructions to cook pasta
"""

response = llm_chain.run(question) # Ask falcon-7b our question
print(response)

Hopefully, the output above looks reasonable. One quirk we have noticed is that `falcon-7b-instruct` tends to leave an extra token, "User", at the end of its responses. The extraneous token may be an artifact of its instruction fine-tuning data, which may have contained many prompts in the format:  
```
User: {question the user asks}
Assistant: {what falcon-7b-instruct responds with}
User:
```

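This quirk hints at how LLMs work under the hood. They repeatedly predict which token is most likely to come next, so the model keeps following the pattern it saw during training even after its answer is complete.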
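If the trailing "User" bothers you, one simple workaround is to trim it in Python after the response comes back. This is a rough sketch with a made-up helper name, `strip_trailing_user`. It would also trim a response that legitimately ends with the word "User", so treat it as a quick fix rather than a robust solution.

```python
def strip_trailing_user(text):
    """Remove a dangling 'User' or 'User:' left at the end of a response."""
    cleaned = text.rstrip()
    for suffix in ("User:", "User"):
        if cleaned.endswith(suffix):
            return cleaned[: -len(suffix)].rstrip()
    return cleaned

print(strip_trailing_user("Boil water, then add the pasta.\nUser"))
```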

## Task 7: Make Another Prompt
### Instructions
1. Run the code below
2. Modify the code to make any other queries
3. Compare the performance against ChatGPT in a separate window

In [None]:
question = """
What is the capital of British Columbia?
"""

completion = llm_chain.run(question)
print(completion)
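To see the text that a template like ours produces for a given question, you can fill it in with plain Python string formatting. This sketch approximates what the prompt template does with the `{question}` placeholder. The helper `build_prompt` is made up, and LangChain's own formatting may differ in details.

```python
template = """
You are a helpful assistant.

{question}
"""

def build_prompt(question):
    """Fill the {question} placeholder with the user's question."""
    return template.format(question=question)

print(build_prompt("What is the capital of British Columbia?"))
```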

## Task 9: Add Hugging Face Access Token To Your Personal Computer

In the next task, you will run the Python code from your personal computer. So, you will need to create a `.env` file with your access token there too.

### Instructions
Do the following on your personal computer. Use the instructions for your operating system.

#### Windows
1. Open a terminal by searching for "cmd" in the Start Menu
2. Change directory to the `workshops` directory
```
cd %USERPROFILE%\src\workshops
```
3. Create a `.env` file
```
echo.> .env
```
4. Open `.env` in Notepad
```
notepad .env
```
5. Add the line `HUGGINGFACE_API_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` to the file, replacing the x's with your Hugging Face access token from earlier in the notebook. Save the file and exit.

#### Mac and Ubuntu Linux
1. Open a terminal
2. Change directory to the `workshops` directory
```
cd ~/src/workshops
```
3. Create a `.env` file
```
touch .env
```
4. Save your Hugging Face access token from earlier in the notebook in the file. Replace the x's with your real token.
```
echo "HUGGINGFACE_API_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" > .env
```
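If you prefer one method that works the same way on Windows, Mac, and Linux, the file can also be written from Python. This is a minimal sketch with a made-up helper, `write_env_file`. The demo writes to a temporary directory so it cannot overwrite a real `.env` file. To use it for real, call it from your `workshops` directory with your own token.

```python
import tempfile
from pathlib import Path

def write_env_file(token, directory="."):
    """Write the token to a .env file in KEY=value format and return its path."""
    env_path = Path(directory) / ".env"
    env_path.write_text(f"HUGGINGFACE_API_TOKEN={token}\n", encoding="utf-8")
    return env_path

# Demo in a throwaway directory with a placeholder token
with tempfile.TemporaryDirectory() as tmp:
    path = write_env_file("hf_placeholder", tmp)
    print(path.read_text(encoding="utf-8"))
```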

## Task 10: Open The Chainlit UI Server File

We will add a GUI to `falcon-7b-instruct` so that we can use it like ChatGPT's website.

We will use a web UI library called [Chainlit](https://docs.chainlit.io/overview) to accomplish this. Chainlit describes itself as follows:
> Chainlit is an open-source Python package that makes it incredibly fast to build and share LLM apps. Integrate the Chainlit API in your existing code to spawn a ChatGPT-like interface in minutes!

That sounds like exactly what we want!

We can't run the Chainlit web server from Colab, so we will walk through the whole file first, and then we will run it on your personal computer.

### Instructions
1. Navigate to your `~/src/workshops` folder using your file browser.
2. Open `chainlit-falcon-7b-server.py` in your text editor.

### Code Walkthrough
This is the shebang. It tells your operating system which interpreter to use when the file is run directly as a script.
```
#!/usr/bin/env python3
```

Next are the imports. These usually go at the top of your program files in Python and most other languages. The standard library imports are for libraries included with Python itself, aka "first party". The other imports are libraries provided by third-party organizations like Hugging Face.
```
# Import standard Python library
import os

# Import third party libraries
from dotenv import load_dotenv
from langchain import HuggingFaceHub, LLMChain, PromptTemplate
import chainlit as cl
```

This loads your Hugging Face access token. This is the exact same code you used before in this Jupyter Notebook.
```
# Load the environment variable for the Hugging Face token
load_dotenv()
token = os.getenv('HUGGINGFACE_API_TOKEN')
```
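One optional hardening step, which is not part of the workshop file, is to stop with a clear error message when the token is missing instead of failing later with a confusing one. Here is a sketch with a hypothetical helper, `require_env`:

```python
import os

def require_env(name):
    """Return the environment variable's value, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value

# Possible usage after load_dotenv():
# token = require_env("HUGGINGFACE_API_TOKEN")
```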

This sets up the LLM hosted on Hugging Face's servers and defines the prompt template. This is the same code you used before in this Jupyter Notebook.
```
# Set up the Hugging Face LLM with Falcon-7B-Instruct
llm = HuggingFaceHub(
    huggingfacehub_api_token=token,
    repo_id="tiiuae/falcon-7b-instruct",
    model_kwargs={
        "temperature": 0.7,
        "max_new_tokens": 200
    }
)

# Define the prompt template for the Chainlit UI
template = """
You are a helpful assistant.

{question}
"""
```

This is the first piece of new code, and it uses Chainlit. `on_chat_start` marks the code that runs when the user first loads the chat window. If the code looks cryptic, it's because it is. Usually you'll get some basic code like this from the library's documentation and modify it, so it's not important to memorize how to write code like this.
```
# Define the chat start event for Chainlit UI
@cl.on_chat_start
def on_chat_start():
    prompt = PromptTemplate(template=template, input_variables=["question"])
    llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=True)
    cl.user_session.set("llm_chain", llm_chain)
```
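The `@cl.on_chat_start` line is a Python decorator. One common use of decorators is to register a function so that a framework can call it later. The toy sketch below illustrates that general pattern. The `registered` dictionary and this `on_chat_start` function are invented for illustration and are not Chainlit's actual implementation.

```python
registered = {}  # Hypothetical registry: event name -> function

def on_chat_start(func):
    """Record the function so the 'framework' can call it later."""
    registered["chat_start"] = func
    return func

@on_chat_start
def setup():
    return "chat started"

# Later, the "framework" looks up the registered function and calls it
print(registered["chat_start"]())
```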

This code defines what Chainlit does when a user sends a message through its UI. It does the following:
1. Awaits the user sending it a message like "What is the capital of British Columbia?"
2. Sends that message to the Hugging Face server hosting `falcon-7b-instruct`
3. Awaits the Hugging Face server's response
4. Sends that response back to the user in the chat window
```
# Define the message received event for Chainlit UI
@cl.on_message
async def on_message(message: str):
    llm_chain = cl.user_session.get("llm_chain")  # Retrieve the chain from the session
    res = await llm_chain.acall(message, callbacks=[cl.AsyncLangchainCallbackHandler()])
    
    # The response is sent back to the user
    await cl.Message(content=res["text"]).send()
```
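If `async` and `await` are new to you, the sketch below shows the idea using only the standard library. `fake_llm_call` is a made-up stand-in for the slow network request to the hosted model. While it waits, Python is free to handle other work, such as messages from other users.

```python
import asyncio

async def fake_llm_call(message):
    """Stand-in for a slow network request to a hosted model."""
    await asyncio.sleep(0.1)  # Pretend to wait for the server
    return f"Echo: {message}"

async def on_message(message):
    reply = await fake_llm_call(message)  # Other tasks can run while we wait
    return reply

# Run as a script; inside a notebook, use `await on_message(...)` instead
print(asyncio.run(on_message("What is the capital of British Columbia?")))
```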

Finally, this is a standard snippet of code included in many Python scripts. It tells the script what to do when you run it directly from the command line. You don't need to worry about this snippet for this workshop, but if you're interested, see [the official Python documentation](https://docs.python.org/3/library/__main__.html#idiomatic-usage).
```
if __name__ == "__main__":
    # Chainlit does not require an explicit call to start the server, but you can add other initialization logic here if needed
    pass
```

## Task 11: Run The Web Server
All that's left is to run the web server and try out the app!

### Instructions
#### Windows
1. Open a terminal by searching for "cmd" in the Start Menu
2. Change directory to the `workshops` directory
```
cd %USERPROFILE%\src\workshops
```
3. Install third-party libraries using `requirements.txt`
```
pip install -r requirements.txt
```
4. Run `chainlit-falcon-7b-server.py`
```
chainlit run ./chainlit-falcon-7b-server.py
```

#### Mac
1. Open a terminal
2. Change directory to the `workshops` directory
```
cd ~/src/workshops
```
3. Install third-party libraries using `requirements.txt`
```
pip3 install -r requirements.txt
```
4. Run `chainlit-falcon-7b-server.py`
```
chainlit run ./chainlit-falcon-7b-server.py
```

#### Ubuntu
1. Open a terminal
2. Change directory to the `workshops` directory
```
cd ~/src/workshops
```
3. Create and activate a `venv`
```
python3 -m venv venv
source venv/bin/activate
```
4. Install third-party libraries using `requirements.txt`
```
pip install -r requirements.txt
```
5. Run `chainlit-falcon-7b-server.py`
```
chainlit run ./chainlit-falcon-7b-server.py
```