## Homework: Open-Source LLMs

### Q1: Running Ollama with Docker
Let's run Ollama with Docker. We need to execute the same command as in the lectures:

```
docker run -it \
    --rm \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```

**Q1. What's the version of ollama client?**

To find out, enter the container and run `ollama` with the `-v` flag.

**Answer: ollama version is 0.1.48**

In [1]:
!ollama -v

ollama version is 0.1.48
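The same check can be scripted from the host instead of typing it inside the container. The following is a minimal sketch, assuming the container is named `ollama` as in the `docker run` command above and that `docker` is on the `PATH`; the helper names `parse_ollama_version` and `ollama_version_in_container` are hypothetical.

```python
import re
import subprocess


def parse_ollama_version(output: str) -> str:
    """Extract the version string from the output of `ollama -v`."""
    match = re.search(r"version is (\S+)", output)
    if match is None:
        raise ValueError(f"unexpected output: {output!r}")
    return match.group(1)


def ollama_version_in_container(container: str = "ollama") -> str:
    """Run `ollama -v` inside a running container and return the version.

    Assumes the container name matches the `--name` used in `docker run`.
    """
    result = subprocess.run(
        ["docker", "exec", container, "ollama", "-v"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ollama_version(result.stdout)
```

With the container running, `ollama_version_in_container()` should return the same version string that `ollama -v` prints.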


### Q2. Downloading an LLM

We will download a smaller LLM: `gemma:2b`.

Again, let's enter the container and pull the model:

```bash
docker exec -it ollama bash
ollama pull gemma:2b
```

Inside the container, Ollama saves the downloaded files to `/root/.ollama`.

We're interested in the metadata about this model. You can find
it in `models/manifests/registry.ollama.ai/library`

**Q2: What's the content of the file related to gemma?**

```
root@ca4c6d3a6f76:~/.ollama/models/manifests/registry.ollama.ai/library/gemma# cat 2b 
{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"mediaType":"application/vnd.docker.container.image.v1+json","digest":"sha256:887433b89a901c156f7e6944442f3c9e57f3c55d6ed52042cbb7303aea994290","size":483},"layers":[{"mediaType":"application/vnd.ollama.image.model","digest":"sha256:c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12","size":1678447520},{"mediaType":"application/vnd.ollama.image.license","digest":"sha256:097a36493f718248845233af1d3fefe7a303f864fae13bc31a3a9704229378ca","size":8433},{"mediaType":"application/vnd.ollama.image.template","digest":"sha256:109037bec39c0becc8221222ae23557559bc594290945a2c4221ab4f303b8871","size":136},{"mediaType":"application/vnd.ollama.image.params","digest":"sha256:22a838ceb7fb22755a3b0ae9b4eadde629d19be1f651f73efb8c6b4e2cd0eea0","size":84}]}
```
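The manifest is a single line of JSON, so it is easier to read once parsed. The sketch below is illustrative only: it copies just the layer media types and sizes from the output above (digests and the `config` entry are omitted for brevity) and sums them. The helper name `summarize_layers` is hypothetical.

```python
import json

# Layer media types and sizes copied from the manifest shown above.
# Digests and the "config" entry are left out to keep the example short.
MANIFEST_JSON = """
{"schemaVersion": 2,
 "layers": [
  {"mediaType": "application/vnd.ollama.image.model", "size": 1678447520},
  {"mediaType": "application/vnd.ollama.image.license", "size": 8433},
  {"mediaType": "application/vnd.ollama.image.template", "size": 136},
  {"mediaType": "application/vnd.ollama.image.params", "size": 84}
 ]}
"""


def summarize_layers(manifest: dict) -> list[tuple[str, int]]:
    """Return (layer kind, size in bytes) for every layer in the manifest."""
    return [
        (layer["mediaType"].rsplit(".", 1)[-1], layer["size"])
        for layer in manifest["layers"]
    ]


manifest = json.loads(MANIFEST_JSON)
layers = summarize_layers(manifest)
total_bytes = sum(size for _, size in layers)

for kind, size in layers:
    print(f"{kind:10s} {size:>13,d} bytes")
print(f"{'total':10s} {total_bytes:>13,d} bytes ({total_bytes / 2**30:.2f} GiB)")
```

Almost all of the size comes from the `model` layer, which holds the weights; the layers add up to about 1.56 GiB.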

### Q3. Running the LLM

Test the following prompt: "10 * 10". What's the answer?

**Answer:**
```
The model is requesting a calculation of 10 multiplied by 10.

Sure, here is the calculation:

10 * 10 = 100

Therefore, the answer is 100.
```

In [2]:
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)

In [3]:
def llm(prompt):
    response = client.chat.completions.create(
        # Set model to gemma:2b running on local ollama
        model='gemma:2b',
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content

In [6]:
llm("10 * 10")

'The answer is 100.\n\nThe problem is 10 * 10'

In [7]:
print(_)

The answer is 100.

The problem is 10 * 10


### Q4. Downloading the weights

We don't want to pull the weights every time we run
a Docker container. Let's do it once and have them available
every time we start a container.

First, we will need to change how we run the container.

Instead of mapping the `/root/.ollama` folder to a named volume,
let's map it to a local directory:

```bash
mkdir ollama_files

docker run -it \
    --rm \
    -v ./ollama_files:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```

Now pull the model:

```bash
docker exec -it ollama ollama pull gemma:2b
```

What's the size of the `ollama_files/models` folder?

* 0.6G
* 1.2G
* 1.7G
* 2.2G

Hint: on Linux, you can use `du -h` for that.

**Answer: 1.6G as reported by `du -h`; the closest option is 1.7G**
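Where `du` is not available, the folder size can be approximated in Python. This is a rough sketch, not a replacement for `du`: it sums apparent file sizes, whereas `du` reports allocated disk blocks, so the two can differ slightly. The helper names `dir_size_bytes` and `human_readable` are hypothetical.

```python
from pathlib import Path


def dir_size_bytes(root) -> int:
    """Sum the sizes of all regular files under `root`, recursively."""
    return sum(
        path.stat().st_size
        for path in Path(root).rglob("*")
        if path.is_file() and not path.is_symlink()
    )


def human_readable(num_bytes: int) -> str:
    """Format a byte count with 1024-based units, similar to `du -h`."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        # Stop at the first unit that fits, or at the largest unit we know.
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}"
        size /= 1024
```

For example, `human_readable(dir_size_bytes("ollama_files/models"))` gives a figure comparable to the one from `du -h`.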

### Q5. Adding the weights

Let's now stop the container and add the weights
to a new image.

For that, let's create a `Dockerfile`:

```dockerfile
FROM ollama/ollama

COPY ...
```

What do you put after `COPY`?

**Answer: `COPY ollama_files /root/.ollama`**

### Q6. Serving it

Let's build it:

```bash
docker build -t ollama-gemma2b .
```

And run it:

```bash
docker run -it --rm -p 11434:11434 ollama-gemma2b
```

We can connect to it using the OpenAI client.

Let's test it with the following prompt:

```python
prompt = "What's the formula for energy?"
```

Also, to make results reproducible, set the `temperature` parameter to 0:

```python
response = client.chat.completions.create(
    #...
    temperature=0.0
)
```

How many completion tokens did you get in response?

* 304
* 604
* 904
* 1204

**Answer: 304**

In [12]:
prompt = "What's the formula for energy?"

In [13]:
def llm(prompt):
    response = client.chat.completions.create(
        # Set model to gemma:2b running on local ollama
        model='gemma:2b',
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0
    )
    
    return response.choices[0].message.content

In [14]:
answer = llm(prompt)

In [16]:
answer

"Sure, here's the formula for energy:\n\n**E = K + U**\n\nWhere:\n\n* **E** is the energy in joules (J)\n* **K** is the kinetic energy in joules (J)\n* **U** is the potential energy in joules (J)\n\n**Kinetic energy (K)** is the energy an object possesses when it moves or is in motion. It is calculated as half the product of an object's mass (m) and its velocity (v) squared:\n\n**K = 1/2 * m * v^2**\n\n**Potential energy (U)** is the energy an object possesses when it is in a position or has a specific configuration. It is calculated as the product of an object's mass and the gravitational constant (g) multiplied by the height or distance of the object from a reference point.\n\n**Gravitational potential energy (U)** is given by the formula:\n\n**U = mgh**\n\nWhere:\n\n* **m** is the mass of the object in kilograms (kg)\n* **g** is the acceleration due to gravity in meters per second squared (m/s^2)\n* **h** is the height or distance of the object in meters (m)\n\nThe formula for energ

In [17]:
import os
from huggingface_hub import login


In [None]:
# login(token=os.environ['HF_TOKEN'])

In [18]:
from transformers import AutoTokenizer

# Load only the tokenizer; the model weights are not needed to count tokens.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

# Note: encode() adds special tokens by default, so this count
# can differ slightly from the one reported by the server.
tokens = tokenizer.encode(answer)

# Print the number of tokens
num_out_tokens = len(tokens)
print(num_out_tokens)

304
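The count above comes from re-tokenizing the answer text. Responses from the OpenAI client also carry a `usage` object with a `completion_tokens` field, which may be a more direct way to answer the question. The sketch below assumes the `client` created earlier and that the local Ollama server fills in `usage`, which is worth verifying on your setup; `llm_with_usage` is a hypothetical helper name.

```python
def llm_with_usage(client, prompt, model="gemma:2b"):
    """Return the answer text and the completion token count from the server.

    `client` is expected to be an OpenAI client pointed at the local
    Ollama endpoint, as configured earlier in this notebook.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,  # keep results reproducible, as in the question
    )
    text = response.choices[0].message.content
    # `usage` is part of the chat completion response; whether the
    # server populates it is an assumption here.
    return text, response.usage.completion_tokens
```

Usage would look like `answer, n_tokens = llm_with_usage(client, prompt)`, after which `n_tokens` can be compared with the tokenizer-based count.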
