# Save & load LLMs locally

Use the Ollama platform to run LLMs locally. Start Ollama with this command:

```bash
docker run -it \
    --rm \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```
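
Before moving on, it can help to confirm from the host that the server is reachable. The sketch below assumes the default port mapping from the command above and that the server answers a plain GET on its root URL with status 200; `ollama_is_up` is a hypothetical helper name, not part of any Ollama client.

```python
import urllib.error
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on base_url succeeds with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, non-2xx response, etc.
        return False


print("Ollama reachable:", ollama_is_up())
```

If this prints `False`, check that the container is still running and that the `-p 11434:11434` mapping is in place.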

Use this command to open a bash shell inside the running container:

```bash
docker exec -it ollama bash
```

To get the Ollama version, run this command:

```bash
ollama -v
```

For Q.1, the answer is:

```bash
$ ollama -v
ollama version is 0.1.48
```
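
If a script needs to compare versions, the string can be turned into a tuple of integers. This is a minimal sketch that assumes the output keeps the format shown above; `parse_ollama_version` is a hypothetical helper.

```python
import re


def parse_ollama_version(output: str) -> tuple[int, ...]:
    """Extract the dotted version number from `ollama -v` output as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", output)
    if match is None:
        raise ValueError(f"no version number found in: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


version = parse_ollama_version("ollama version is 0.1.48")
print(version)  # (0, 1, 48)
```

Tuples compare element by element, so `version >= (0, 1, 40)` works as a simple minimum-version check.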

To pull a model from the model repository, use the following command:

```bash
ollama pull [model-name]
```

For Q.2, the answer is:

```bash
$ ollama pull gemma:2b
pulling manifest 
pulling c1864a5eb193... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.7 GB                         
pulling 097a36493f71... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 8.4 KB                         
pulling 109037bec39c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  136 B                         
pulling 22a838ceb7fb... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏   84 B                         
pulling 887433b89a90... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  483 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 
```

The manifest written during the pull lists each downloaded layer with its digest and size:

```bash
$ cat /root/.ollama/models/manifests/registry.ollama.ai/library/gemma/2b 
{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"mediaType":"application/vnd.docker.container.image.v1+json","digest":"sha256:887433b89a901c156f7e6944442f3c9e57f3c55d6ed52042cbb7303aea994290","size":483},"layers":[{"mediaType":"application/vnd.ollama.image.model","digest":"sha256:c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12","size":1678447520},{"mediaType":"application/vnd.ollama.image.license","digest":"sha256:097a36493f718248845233af1d3fefe7a303f864fae13bc31a3a9704229378ca","size":8433},{"mediaType":"application/vnd.ollama.image.template","digest":"sha256:109037bec39c0becc8221222ae23557559bc594290945a2c4221ab4f303b8871","size":136},{"mediaType":"application/vnd.ollama.image.params","digest":"sha256:22a838ceb7fb22755a3b0ae9b4eadde629d19be1f651f73efb8c6b4e2cd0eea0","size":84}]}
```
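
The layer sizes in the manifest add up to the model's download size. The sketch below works on an abbreviated copy of the manifest above (digests dropped, sizes unchanged); `manifest_total_bytes` is a hypothetical helper.

```python
import json

# Abbreviated copy of the manifest above: only the fields used here are kept.
manifest_json = """
{
  "config": {"mediaType": "application/vnd.docker.container.image.v1+json", "size": 483},
  "layers": [
    {"mediaType": "application/vnd.ollama.image.model", "size": 1678447520},
    {"mediaType": "application/vnd.ollama.image.license", "size": 8433},
    {"mediaType": "application/vnd.ollama.image.template", "size": 136},
    {"mediaType": "application/vnd.ollama.image.params", "size": 84}
  ]
}
"""


def manifest_total_bytes(manifest: dict) -> int:
    """Sum the sizes of the config blob and every layer listed in the manifest."""
    return manifest["config"]["size"] + sum(layer["size"] for layer in manifest["layers"])


manifest = json.loads(manifest_json)
total = manifest_total_bytes(manifest)
print(total)                        # 1678456656
print(f"{total / 1e9:.2f} GB")      # 1.68 GB
print(f"{total / 2**30:.2f} GiB")   # 1.56 GiB
```

The decimal figure is in line with the 1.7 GB shown in the pull output; the smaller binary figure is the same number of bytes expressed in 1024-based units.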

To run a model, use this command:

```bash
ollama run [model-name]
```

For Q.3, the answer is:

```bash
$ ollama run gemma:2b
>>> 10*10
Sure. Here is the answer to the question:

10 * 10 = 100



>>> Send a message (/? for help)
```
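
The same prompt can also be sent from the host over HTTP instead of the interactive shell. This is a minimal sketch, assuming the container publishes port 11434 and that Ollama's `/api/generate` endpoint is used with streaming turned off; `build_generate_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str,
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the container running and the model pulled, `print(generate("gemma:2b", "10*10"))` should return an answer similar to the one above.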

To keep the weights on the local filesystem, mount a local directory at `/root/.ollama`. In this example we run the Ollama container with `/root/.ollama` mapped to a local `ollama_files` directory, so when we pull a model its weights are written there:

```bash
mkdir ollama_files

docker run -it \
    --rm \
    -v ./ollama_files:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```

```bash
$ docker exec -it ollama ollama pull gemma:2b 
pulling manifest 
pulling c1864a5eb193... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.7 GB                         
pulling 097a36493f71... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 8.4 KB                         
pulling 109037bec39c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  136 B                         
pulling 22a838ceb7fb... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏   84 B                         
pulling 887433b89a90... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  483 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 
```

For Q.4, the size of `ollama_files/models/` is 1.6G:

```bash
$ du -h ollama_files/models/
1.6G    ollama_files/models/blobs
8.0K    ollama_files/models/manifests/registry.ollama.ai/library/gemma
12K     ollama_files/models/manifests/registry.ollama.ai/library
16K     ollama_files/models/manifests/registry.ollama.ai
20K     ollama_files/models/manifests
1.6G    ollama_files/models/
```
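
On systems without `du`, the same check can be approximated in Python. The sketch below sums file sizes under a directory; it reports apparent file sizes rather than disk blocks, so the result may differ slightly from `du`. `dir_size_bytes` is a hypothetical helper.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):  # skip symlinks to avoid double counting
                total += os.path.getsize(file_path)
    return total


# Only runs when the directory from the steps above exists.
if os.path.isdir("ollama_files/models"):
    print(f"{dir_size_bytes('ollama_files/models') / 2**30:.1f} GiB")
```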

Now that we have the model weights, we can build a new Docker image that includes them and run a container from that image for our app.

For Q.5, we can copy the local weights into a new Docker image using this Dockerfile:

```dockerfile
FROM ollama/ollama

# Copy the weights from your local machine to the Docker image
COPY ollama_files/ /root/.ollama
```

Build the image with this command:

```bash
docker build -t my-ollama-image .
```

And run it with this command:

```bash
docker run -it --rm \
    -p 11434:11434 \
    --name ollama-container \
    my-ollama-image
```

*PS: Make sure to publish port 11434 (`-p 11434:11434`) so you can interact with Ollama from the host.*
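
To confirm that the copied weights are visible inside the new container, the server can be asked which models it has locally. This is a minimal sketch, assuming Ollama's `/api/tags` endpoint returns a JSON object with a `models` list whose entries carry a `name` field; `model_names` and `list_local_models` are hypothetical helper names.

```python
import json
import urllib.request


def model_names(tags_response: dict) -> list[str]:
    """Pull the model names out of a parsed /api/tags response."""
    return [model["name"] for model in tags_response.get("models", [])]


def list_local_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask the running server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.loads(resp.read()))
```

With the container from the previous step running, `list_local_models()` should include `gemma:2b` without any pull.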

For Q.6, the number of completion tokens for the response can be derived using this code:

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)

prompt = "What's the formula for energy?"

response = client.chat.completions.create(
    model='gemma:2b',
    messages=[{"role": "user", "content": prompt}],
    temperature=0.0
)

response.choices[0].message.content
```

The response text (cut off in the notebook output):

```
"Sure, here's the formula for energy:\n\n**E = K + U**\n\nWhere:\n\n* **E** is the energy in joules (J)\n* **K** is the kinetic energy in joules (J)\n* **U** is the potential energy in joules (J)\n\n**Kinetic energy (K)** is the energy an object possesses when it moves or is in motion. It is calculated as half the product of an object's mass (m) and its velocity (v) squared:\n\n**K = 1/2mv^2**\n\n**Potential energy (U)** is the energy an object possesses due to its position or configuration. It is calculated as the product of an object's mass, gravitational constant (g), and height or position above a reference point.\n\n**U = mgh**\n\nWhere:\n\n* **m** is the mass in kilograms (kg)\n* **g** is the gravitational constant (9.8 m/s^2)\n* **h** is the height or position in meters (m)\n\nThe formula shows that energy can be expressed as the sum of kinetic and potential energy. The kinetic energy is a measure of the object's ability to do work, while the potential energy is a measure of the
```

The number of completion tokens:

```python
response.usage.completion_tokens
```

```
281
```
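
The `usage` object also carries the prompt and total token counts. The sketch below gathers all three into a dictionary; it assumes the response exposes `prompt_tokens`, `completion_tokens`, and `total_tokens` on `usage`, as the OpenAI Python client does, and `usage_summary` is a hypothetical helper.

```python
def usage_summary(response) -> dict:
    """Collect the token counts from a chat completion response."""
    usage = response.usage
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }
```

Calling `usage_summary(response)["completion_tokens"]` on the response above should give the same value as `response.usage.completion_tokens`.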