# Using LLMs in Google Colab

This notebook demonstrates how you can use LLMs programmatically within a Jupyter notebook environment on Google Colab.

While you can easily use LLMs via their apps or website chat interfaces, you can also pass prompts to these models programmatically and capture their responses. You can even download the model itself to your environment and pass prompts to it directly (i.e. you are not passing information to a model hosted by Google, OpenAI, or Meta). There are some advantages to doing this:

* Access LLMs directly within your coding environment
* Run LLMs on your own hardware or within a personal environment, which is useful when you are working with private data that you cannot share or upload to hosted LLMs
* Build custom apps and workflows on top of LLM technologies

To use LLMs within Jupyter notebooks on Google Colab, there are two initial setup steps:

1. Create a <a href="" target="_blank">Hugging Face</a> account and add your account token to the Google Colab Secrets. Then restart your Google Colab session.
2. Ensure that you are using the GPU Google Colab runtime.

## Hugging Face

<a href="https://huggingface.co/huggingface" target="_blank">Hugging Face</a> is a machine learning and AI technology company that hosts trained models and develops software tools for working with AI models.

### Hugging Face accounts and tokens

To use many of the models on Hugging Face, you need to create an account and store your account's access token in Google Colab's Secrets. This lets you make authenticated calls to Hugging Face.

### Hugging Face pipelines

A <a href="https://huggingface.co/docs/transformers/en/pipeline_tutorial" target="_blank">pipeline</a> is a Python object that provides a wrapper around the code required to use LLMs and other AI tools. In short, pipelines take away a lot of the work of setting up AI models within your Python environment and the associated data processing. You simply create a `pipeline` object, pass your prompt in, and `print()` the response.

## Setup

### Setup Hugging Face token

1. Create a Hugging Face account:

<img src="https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/hf-1-annotated.png" alt="HF account" width="75%">

Once you have created a Hugging Face account, you can gain access to models. Many LLMs require you to accept terms and conditions. For this exercise we will be working with Google's Gemma 2 2b it model (`2b` stands for 2 billion parameters and `it` stands for instruction tuned). Agree to the model's terms and conditions and usage policy <a href="https://huggingface.co/google/gemma-2-2b-it" target="_blank">here</a>.

2. Create an *Access Token*:

<img src="https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/hf-2-annotated.png" alt="HF account" width="75%">

3. Click on *Create new token*

<img src="https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/hf-3-annotated.png" alt="HF account" width="75%">

4. Set the token permissions

You can initially set the token permissions to *Read*, which grants read access to all your resources. This is a good option for getting started.

<img src="https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/hf-4-read-annotated.png" alt="HF account" width="75%">

However, as you start developing resources such as models and datasets and using Hugging Face in different environments, it's a good idea to create fine-grained access tokens that carry just enough permissions to complete the tasks associated with each token.

<img src="https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/hf-4-annotated.png" alt="HF account" width="75%">

5. If you have selected fine-grained permissions, add the repositories (models) that you want the token to grant access to.

<img src="https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/hf-5-annotated.png" alt="HF account" width="75%">

6. Click *Create token* to generate the access token.

<img src="https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/hf-6-annotated.png" alt="HF account" width="75%">

7. Copy the access token. **This is your only opportunity to do this - keep a record of the token (in a secure location)**. If you lose your token, it's easy to generate a new one.

<img src="https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/hf-7-annotated.png" alt="HF account" width="75%">

8. In Google Colab, click on the *key* icon in the left-hand sidebar.

<img src="https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/hf-8-annotated.png" alt="HF account" width="75%">

9. Add your Hugging Face access token with the name `HF_TOKEN` and make sure that *Notebook access* is checked. **Restart your Google Colab session to load the token into your environment**.

<img src="https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/hf-9-annotated.png" alt="HF account" width="50%">
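After restarting the session, you may want to confirm that the token is visible to the notebook. The sketch below is one way to do that, assuming the secret is named `HF_TOKEN` as in step 9. The helper `get_hf_token` is a hypothetical name introduced here for illustration; it reads the secret through Colab's `google.colab.userdata` module and falls back to an environment variable when that module is unavailable (for example, outside Colab). It reports only whether a token was found and never prints the token itself.

```python
import os


def get_hf_token(name="HF_TOKEN"):
    """Return the token from Colab Secrets, else from an environment variable."""
    try:
        # google.colab is only importable inside a Colab session.
        from google.colab import userdata

        token = userdata.get(name)
        if token:
            return token
    except Exception:
        # Covers a missing module (outside Colab) and secret-access errors
        # (for example, notebook access not granted).
        pass
    return os.environ.get(name)


token = get_hf_token()
print("Token found" if token else "Token not found: check the Secrets panel")
```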

### Setup runtime

**This lab will only work (quickly) using Google Colab with a *T4 GPU* runtime type.**

**Before running any code, set the runtime type to *T4 GPU*.**

<img src="https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/week-1-colab-runtime-1.jpg" alt="colab runtime menu" width="50%">

<p></p>

<img src="https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/week-1-colab-runtime-2.jpg" alt="colab runtime menu" width="50%">
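To check that the runtime change has taken effect, you could query the GPU from a code cell. The sketch below is a minimal example that uses only the Python standard library and the `nvidia-smi` command-line tool, which is assumed to be present on GPU runtimes. The function name `gpu_summary` is introduced here for illustration. Calling `torch.cuda.is_available()` after importing `torch` is an alternative check.

```python
import shutil
import subprocess


def gpu_summary():
    """Return a text description of the visible NVIDIA GPUs, or None if there are none."""
    # nvidia-smi is not installed on CPU-only machines.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


summary = gpu_summary()
print(summary if summary else "No GPU detected: check the runtime type.")
```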

### Import packages

In [1]:
import torch
from transformers import pipeline

### Using `pipeline` objects

To create a `pipeline` object, call the `pipeline()` function and pass in the task (`"text-generation"` here), the model (`"google/gemma-2-2b-it"`), and some additional arguments. Set the `device_map` argument to `"auto"` to run the model on Colab's GPU when it is available.

This will download the model to your Colab environment, which will take a moment or two.
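The `pipeline()` call in this notebook also requests `torch.bfloat16` through `model_kwargs`. One likely reason is memory: a `bfloat16` value occupies 2 bytes, whereas the default `float32` occupies 4. The rough arithmetic below is a sketch only. It uses the nominal 2 billion parameter count from the model name and ignores activations and other overheads, so real usage will be higher. The function name `weight_memory_gb` is introduced here for illustration.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Approximate memory needed to hold the model weights, in gigabytes."""
    return n_params * bytes_per_param / 1e9


n_params = 2e9  # nominal "2b" figure taken from the model name (an approximation)

print("float32 :", weight_memory_gb(n_params, 4), "GB")
print("bfloat16:", weight_memory_gb(n_params, 2), "GB")
```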

In [None]:
pipe = pipeline(
    "text-generation",
    model="google/gemma-2-2b-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

You can pass in a list of prompts to `pipe` and specify the number of tokens you want generated (`max_new_tokens`).

Each prompt is a dictionary object with keys for `"role"` and `"content"`. The text value associated with the `"content"` key is the prompt provided to the model.

`pipe` returns a list of dictionary objects storing the conversation between the user and the AI assistant.

In [None]:
prompts = [
    {"role": "user", "content": "Can you generate a short Python program that computes the distance between two points?"},
]

outputs = pipe(prompts, max_new_tokens=512)
print(outputs)

It's a little fiddly to extract the generated response, but it's a good exercise to practice extracting information from lists and dictionary objects:

1. subset the first element in the list object `outputs[0]` (remember Python indexes from 0)
2. get the data in the dict under the `"generated_text"` key
3. get the last element in that list (which stores the user-assistant conversation); `-1` means index the last element
4. subset the text stored against the `"content"` key

Work through this to make sure you understand the different ways of subsetting data from a list (subsetting by index position) and a dict (by key).
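The four steps can be traced one at a time on a small hand-written stand-in that has the same nesting as the pipeline output. The values below are invented for illustration and are not real model output; the intermediate variable names are also just for this sketch.

```python
# Hand-written stand-in with the same list/dict nesting as the pipeline output.
outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "Say hello."},
            {"role": "assistant", "content": "  Hello!\n"},
        ]
    }
]

first_result = outputs[0]                      # step 1: list, index by position
conversation = first_result["generated_text"]  # step 2: dict, index by key
last_message = conversation[-1]                # step 3: list, last element
content = last_message["content"]              # step 4: dict, index by key

print(type(first_result), type(conversation), type(last_message))
print(repr(content))          # shows the surrounding whitespace
print(repr(content.strip()))  # whitespace removed
```

Chaining the four index operations on one line gives the same value as working through them one variable at a time.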

Calling `strip()` on the string text data removes trailing and leading whitespace.

Finally, `print()` the response from the AI assistant.

In [None]:
response = outputs[0]["generated_text"][-1]["content"].strip()
print(response)

#### Recap quiz

<b>To check that you understand the Python program, can you identify:
    <ul>
        <li>What is the purpose of the program?</li>
        <li>What input data does the program require?</li>
        <li>What data type is the input data?</li>
        <li>What user defined functions does the program use?</li>
        <li>What built in operations does the program use?</li>
        <li>What data type is the data returned by the function?</li>
    </ul>
</b>

We can copy and paste the Python code out of the LLM's printed response and execute it (the Python code is contained within ``` marks).
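As an alternative to copying by hand, the fenced code could be pulled out of the response text with a regular expression. This is a minimal sketch, assuming the response marks code with triple backticks followed by an optional language name. The function name `extract_code_blocks` and the sample response string are hypothetical. Always read the extracted code before running it.

```python
import re

# Build the fence marker programmatically so this example contains no literal fence.
FENCE = "`" * 3


def extract_code_blocks(response):
    """Return the contents of every fenced block in a Markdown-formatted response."""
    pattern = re.compile(FENCE + r"[a-zA-Z]*\n(.*?)" + FENCE, re.DOTALL)
    # Strip only newlines so that any indentation inside the block is preserved.
    return [block.strip("\n") for block in pattern.findall(response)]


# Hypothetical response text standing in for real model output.
response = (
    "Here is a program:\n"
    + FENCE + "python\nprint(1 + 1)\n" + FENCE
    + "\nHope that helps!"
)

blocks = extract_code_blocks(response)
print(blocks[0])
```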

In [None]:
# NOTE: if you rerun the code above, different Python code may be generated because LLM output is probabilistic.

import math

def distance(x1, y1, x2, y2):
  """Calculates the distance between two points.

  Args:
    x1: The x-coordinate of the first point.
    y1: The y-coordinate of the first point.
    x2: The x-coordinate of the second point.
    y2: The y-coordinate of the second point.

  Returns:
    The distance between the two points.
  """
  return math.sqrt((x2 - x1)**2 + (y2 - y1)**2)

# Example usage
x1 = 1
y1 = 2
x2 = 4
y2 = 6

distance_between_points = distance(x1, y1, x2, y2)
print(f"The distance between the points ({x1}, {y1}) and ({x2}, {y2}) is: {distance_between_points}")

LLMs are helpful as assistants when trying to solve more complex coding or spatial analysis tasks.

#### Activities

<b>
Create prompts and generate responses for each of the following tasks. Use the code snippets above as an example of how to use <code>pipeline</code> objects.

* Computing the distance between two points on a sphere.
* Computing the shortest line that joins three points.
* Computing the length of all the edges of a polygon.
</b>

<details>
    <summary><b>How can you verify the response from the LLM is correct?</b></summary>
<ul>
<li>Ask the LLM to generate test examples (check the tests are correct).</li>
<li>Look at how the LLM has solved the problem and compare these solutions with what you find from independent research.</li>
<li>Generate a suite of cases where you know the correct answer and compare these ground truth values with what is returned when executing the LLM generated solution.</li>
</ul>
</details>
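The third suggestion can be sketched for the planar `distance()` function shown earlier. The cases below are chosen so that the correct answer is known in advance (for example, a 3-4-5 right triangle). The function body is repeated so that the block runs on its own; in practice you would paste in whatever code the LLM generated.

```python
import math


def distance(x1, y1, x2, y2):
    """LLM-generated function under test (copied from the earlier cell)."""
    return math.sqrt((x2 - x1)**2 + (y2 - y1)**2)


# Each case pairs the input arguments with a distance worked out by hand.
cases = [
    ((0, 0, 0, 0), 0.0),    # identical points
    ((0, 0, 3, 4), 5.0),    # 3-4-5 right triangle
    ((1, 2, 4, 6), 5.0),    # same triangle, shifted
    ((-1, -1, -1, 4), 5.0), # vertical line with negative coordinates
]

for args, expected in cases:
    result = distance(*args)
    # isclose avoids false failures caused by floating point rounding.
    assert math.isclose(result, expected), f"{args}: got {result}, expected {expected}"

print("All cases passed")
```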

<details>
    <summary><b>Example prompts with code snippets (note, you might need to change the number of tokens you want generated to capture the full response).</b></summary>

```python
prompts = [
    {"role": "user", "content": "Can you generate a short Python program that computes the distance between two points on a sphere? The points are represented as latitude and longitude pairs."},
]

outputs = pipe(prompts, max_new_tokens=512)

response = outputs[0]["generated_text"][-1]["content"].strip()
print(response)
```

```python
prompts = [
    {"role": "user", "content": "Can you generate a short Python program that computes the shortest line between three points?"},
]

outputs = pipe(prompts, max_new_tokens=1024)

response = outputs[0]["generated_text"][-1]["content"].strip()
print(response)
```

```python
prompts = [
    {"role": "user", "content": "Can you generate a short Python program that computes the length of all the edges of a Polygon?"},
]

outputs = pipe(prompts, max_new_tokens=1024)

response = outputs[0]["generated_text"][-1]["content"].strip()
print(response)
```

</details>
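The example prompts repeat the same call-and-extract pattern, so it could be wrapped in a small helper. This is a sketch only: `ask` is a hypothetical name, and `fake_pipe` is a stand-in that mimics the structure of the pipeline output so that the example runs without downloading a model. In the notebook you would pass the real `pipe` object instead.

```python
def ask(pipe, prompt, max_new_tokens=512):
    """Send a single user prompt to a pipeline-like callable and return the reply text."""
    messages = [{"role": "user", "content": prompt}]
    outputs = pipe(messages, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"][-1]["content"].strip()


def fake_pipe(messages, max_new_tokens=512):
    """Stand-in for the real pipeline: appends a canned assistant message."""
    reply = {"role": "assistant", "content": " (model reply would appear here) "}
    return [{"generated_text": messages + [reply]}]


print(ask(fake_pipe, "Can you generate a short Python program?", max_new_tokens=1024))
```

If a response appears to be cut off partway through, increasing `max_new_tokens` is the first thing to try.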