# Setting up a local LLM server
In this notebook, we will use [basaran](https://github.com/hyperonym/basaran) as a local LLM server that exposes an OpenAI-compatible text completion API.

## Code download and installation
Clone the `windows_modern_openai` branch from https://github.com/haesleinhuepf/basaran. Navigate to the cloned folder and run `pip install -e .` there.
Also run `pip install openai==1.5.0`.

## Model download
To have a model to play with, we download a small one, e.g. [TinyLlama](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0).
The following code downloads the model on first use and caches it in the Hugging Face `.cache` folder:

In [1]:
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
As humans, we can eat a reasonable amount of food in one sitting, depending on our body weight and appetite. However, eating too much food at once can lead to stomach cramps, bloating, and other digestive issues. A reasonable amount for a human to eat in one sitting would be 4-5 servings, which is about 4-5 cups of food.
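
The output above shows the prompt layout that the chat template produced for this model: each message starts with a `<|role|>` line, ends with `</s>`, and the prompt closes with `<|assistant|>`. The following is a minimal sketch that reproduces this layout by hand. It is inferred only from the printed output, `format_chat_prompt` is a hypothetical helper, and `apply_chat_template` remains the reliable way to build the prompt.

```python
def format_chat_prompt(messages, eos_token="</s>"):
    """Mimic the prompt layout visible in the output above (assumed, not the official template)."""
    parts = []
    for message in messages:
        # Each message becomes "<|role|>\ncontent</s>\n"
        parts.append(f"<|{message['role']}|>\n{message['content']}{eos_token}\n")
    # The trailing assistant tag asks the model to write the next turn
    parts.append("<|assistant|>\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
print(format_chat_prompt(example_messages))
```

A string in this layout is plain text, so it could also be sent as the `prompt` of a text completion request.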


## Starting the server
Replace `haase` in the following command with your username, and check whether the snapshot folder has a different name on your machine. Then run the command from a Windows Command Prompt in a separate window:

```
SET MODEL=C:\Users\haase\.cache\huggingface\hub\models--TinyLlama--TinyLlama-1.1B-Chat-v1.0\snapshots\77e23968eed12d195bd46c519aa679cc22a27ddc&& set PORT=80 && python -m basaran
```
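
To look up the snapshot folder name instead of guessing it, the cache directory can be listed from Python. This is a minimal sketch: `find_snapshots` is a hypothetical helper, and it assumes the default cache location under the user's home folder, which differs if environment variables such as `HF_HOME` are set.

```python
from pathlib import Path


def find_snapshots(repo_id, hub_dir=None):
    """Return the cached snapshot folders for repo_id, most recently modified first."""
    # Assumption: default cache location; it may be overridden by environment variables
    if hub_dir is None:
        hub_dir = Path.home() / ".cache" / "huggingface" / "hub"
    # "TinyLlama/TinyLlama-1.1B-Chat-v1.0" -> "models--TinyLlama--TinyLlama-1.1B-Chat-v1.0"
    repo_folder = "models--" + repo_id.replace("/", "--")
    snapshots_dir = Path(hub_dir) / repo_folder / "snapshots"
    if not snapshots_dir.is_dir():
        return []
    folders = [p for p in snapshots_dir.iterdir() if p.is_dir()]
    return sorted(folders, key=lambda p: p.stat().st_mtime, reverse=True)


for folder in find_snapshots("TinyLlama/TinyLlama-1.1B-Chat-v1.0"):
    print(folder)
```

Each printed path is a candidate value for the `MODEL` variable in the command above.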


The server should print something like:
```
start listening on 127.0.0.1:80
```
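
If the terminal window is not at hand, a quick way to see whether anything is listening on that address is to try opening a TCP connection. The sketch below uses only the standard library; `is_port_open` is a hypothetical helper, and it only tells us that some process accepts connections on the port, not that it is basaran.

```python
import socket


def is_port_open(host="127.0.0.1", port=80, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts
        return False


print("server reachable:", is_port_open("127.0.0.1", 80))
```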

## Testing the server
You can test the server using curl.

In [2]:
!curl http://127.0.0.1/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{ "prompt": "once upon a time,", "echo": true }'

{"id":"cmpl-b0954aecfa6b351b807baa2b","object":"text_completion","created":1705783287,"model":"C:\\Users\\haase\\.cache\\huggingface\\hub\\models--TinyLlama--TinyLlama-1.1B-Chat-v1.0\\snapshots\\77e23968eed12d195bd46c519aa679cc22a27ddc","choices":[{"text":"<|system|>\n","index":0,"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":1,"completion_tokens":7,"total_tokens":8}}


curl: (6) Could not resolve host: application
curl:
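
The `Could not resolve host: application` message and the one-token prompt in the response suggest that the single-quoted arguments were split by the Windows shell, so the prompt probably did not reach the server as written. As an alternative that avoids shell quoting, the same request can be sent from Python. This is a minimal sketch using only the standard library; `build_completion_request` and `complete` are hypothetical helpers, and the URL and payload are taken from the curl call above.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://127.0.0.1/v1", echo=True):
    """Build a POST request for the completions endpoint used in the curl call above."""
    payload = json.dumps({"prompt": prompt, "echo": echo}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        result = complete("once upon a time,")
        print(result["choices"][0]["text"])
    except OSError as error:
        # URLError is a subclass of OSError; raised e.g. when the server is not running
        print("request failed:", error)
```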

## Accessing the LLM using openai's API
Next, we will use the `openai` Python package to access our local LLM server.

In [3]:
import openai
openai.__version__

'1.5.0'

In [4]:
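# OpenAI() reads the API key from the OPENAI_API_KEY environment variable
client = openai.OpenAI()
# Send requests to the local server instead of the default OpenAI endpoint
client.base_url = 'http://127.0.0.1/v1'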

In [5]:
response = client.completions.create(
    model="xyz",  # placeholder name; the local server answers with the model it was started with
    prompt="Max and Moritz go in the forest to",
    max_tokens=200
)
print(response.choices[0].text)

 meet the animals.

Act 2
As Max and Moritz are exploring the forest, Max meets a rabbit that needs help to escape a pack of wolves. Max goes to the forest's owner to help; and a tie to the owner appears in the form of a book. Max reads the story and learns the right way to help the rabbit to escape.

Act 3
Max and Moritz are heading back to their village when, in an attempt to pass an obstacle, they accidentally take a long shortcut through a vibrant meadow. When they arrive, they find that an entire family has been wiped out because they all thought Max was going to eat them! Max and Moritz use their friendship to sweet-talk the family, and they make peace with them.

Epilogue
A happy ending ensues where Max and Moritz are reunited, and the forest seems peaceful once again
