# Setting up a local LLM server
In this notebook, we use [basaran](https://github.com/hyperonym/basaran) as a local LLM server and access it through its OpenAI-compatible API.

## Code download and installation
Clone the `windows_modern_openai` branch from https://github.com/haesleinhuepf/basaran. Navigate to the cloned folder and run `pip install -e .` from there.
Also install version 1.5.0 of the `openai` package by running `pip install openai==1.5.0`.

## Starting the server
Run this from the Windows CMD terminal (in a separate window):

```
SET MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0&& SET PORT=80&& python -m basaran
```

The first time you run this, the [TinyLlama](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) model is downloaded from the Hugging Face Hub and stored in a local cache. On subsequent runs, the cached model is used. Once the server is ready, it should print a message like:
```
start listening on 127.0.0.1:80
```
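
Before sending requests from the notebook, it can be handy to check that something is actually listening on the port. The following is a minimal sketch using only the Python standard library; the function name `is_server_listening` is made up for this example, and the defaults assume the host and port shown in the message above.

```python
import socket

def is_server_listening(host="127.0.0.1", port=80, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError if nothing accepts the connection
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_server_listening()` only tells you that the port accepts connections, not that the model has finished loading.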

## Testing the server
You can test the server using curl.

In [1]:
!curl http://127.0.0.1/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{ "prompt": "once upon a time,", "echo": true }' -s

{"id":"cmpl-bef9f82d195629ca0f7209b8","object":"text_completion","created":1705829561,"model":"TinyLlama/TinyLlama-1.1B-Chat-v1.0","choices":[{"text":"arches in the glitter ball night in. Can you outline the steps for","index":0,"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":1,"completion_tokens":16,"total_tokens":17}}
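
The response is plain JSON, so it can be inspected with the standard library. The sketch below parses the response body shown above and pulls out the fields that are most useful for debugging; the helper name `summarize_completion` is an assumption made for this example.

```python
import json

# Response body copied from the curl call above
raw = '{"id":"cmpl-bef9f82d195629ca0f7209b8","object":"text_completion","created":1705829561,"model":"TinyLlama/TinyLlama-1.1B-Chat-v1.0","choices":[{"text":"arches in the glitter ball night in. Can you outline the steps for","index":0,"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":1,"completion_tokens":16,"total_tokens":17}}'

def summarize_completion(body):
    """Extract the generated text, stop reason and token counts from a response."""
    data = json.loads(body)
    choice = data["choices"][0]
    return {
        "text": choice["text"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": data["usage"]["prompt_tokens"],
        "completion_tokens": data["usage"]["completion_tokens"],
    }

summary = summarize_completion(raw)
print(summary["finish_reason"], summary["completion_tokens"])  # length 16
```

In this response, `prompt_tokens` is 1 and the text does not start with the prompt even though `echo` was set. This may indicate that the request body did not reach the server as intended; shell quoting on Windows is one possible cause.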


## Accessing the LLM using the OpenAI Python API
Next, we use the `openai` Python package to access our local LLM server.

In [2]:
import openai
openai.__version__

'1.5.0'

We define a small helper function that sends a prompt to the model and returns the generated text.

In [3]:
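def ask(prompt, max_tokens=20, base_url='http://127.0.0.1/v1'):
    """Send a completion request to the local server and return the generated text."""
    # The client refuses to start without an API key. A placeholder is passed
    # here, assuming that the local server does not check it.
    client = openai.OpenAI(base_url=base_url, api_key="placeholder")

    response = client.completions.create(
        model="xyz",  # placeholder name; presumably the server uses the model it was started with
        prompt=prompt,
        max_tokens=max_tokens
    )
    return response.choices[0].text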
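
Because `max_tokens` caps the length of the completion, answers can stop mid-sentence. A possible variant of the helper, sketched below, also returns the `finish_reason` field so that truncated answers can be detected. The names `ask_with_reason` and `is_truncated` are hypothetical, and the client is passed in as an argument so that the function can also be tried with a stand-in object.

```python
def ask_with_reason(client, prompt, max_tokens=20):
    """Return the generated text and the reason why generation stopped."""
    response = client.completions.create(
        model="xyz",  # same placeholder model name as in the helper above
        prompt=prompt,
        max_tokens=max_tokens,
    )
    choice = response.choices[0]
    return choice.text, choice.finish_reason

def is_truncated(finish_reason):
    """A finish reason of "length" means the token limit was reached."""
    return finish_reason == "length"
```

Here `client` would be an `openai.OpenAI` instance pointing at the local server, for example one created with `base_url='http://127.0.0.1/v1'`.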

### Story writing
We test if the model can complete a story.

In [4]:
print(ask("Max and Moritz go in the forest to"))

 retrieve an ancient magic object that belonged to a daunting guardian of the forest.
The


### Code generation
We test if the model can produce code.

In [5]:
print(ask("def print_hello_world():"))


    print("Hello, world!")

if __name__ == "__main__


### Knowledge retrieval
We test if the model knows about computer science,

In [6]:
print(ask("A software system passes the Turing test if"))

 it can simulate human intelligence, with all of the identified cognitive processes, such as problem-sol


biology,

In [7]:
print(ask("Animals and mushrooms are differentiated by"))

 their: Physical structure. Fascism—ana inversa). Discuss the fashionable


physics,

In [8]:
print(ask("Distance divided by time is "))


Distance/Time
Distance/(Time)


cooking/baking, 

In [9]:
print(ask("To make bread, you need flour, salt and "))

2 and 1/2 cups warm water or milk. You can also add more bread fl


geometry,

In [10]:
print(ask("The Euclidean distance is defined as"))

 the straight line distance between two given points in a vector space.


geopolitics,

In [11]:
print(ask("The number of states in the US is"))

 estimated to be 50. A country without a single-state? How is technology driving the


... and jokes

In [None]:
print(ask("A really funny joke is this one:", max_tokens=200))

### Math
We test if the model can do math.

In [None]:
print(ask("1 + 2 / 2 ="))
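
For comparison with whatever the model answers, the reference value can be computed directly in Python. This is just the arithmetic from the prompt, evaluated with the usual operator precedence.

```python
# Division binds more tightly than addition, so this is 1 + (2 / 2)
expected = 1 + 2 / 2
print(expected)  # 2.0
```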