FastAPI API wrapper for quantized LLMs.
Designed to allow simpler interactions with local LLMs and to provide features such as response caching.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Create an SSH key and clone this repository:

```
git clone git@github.kcl.ac.uk:hi/openai-server.git
```

(Alternative) Clone this repository using HTTPS, supplying a username and password:

```
git clone https://github.kcl.ac.uk/hi/openai-server.git
```
- Create a virtual environment, activate it, and install packages using the provided requirements file:

```
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
```
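On Windows, the activation script lives under `Scripts` rather than `bin`:

```
.venv\Scripts\activate
```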
- Install `tox`:

```
pip install tox
```
- Create an empty SQLite database:

```
mkdir output
touch output/cache.db
```
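Optionally, if the `sqlite3` CLI is installed, the empty file can be sanity-checked (SQLite treats an empty file as a valid, empty database):

```
# Prints nothing for an empty database; an error indicates a problem.
sqlite3 output/cache.db ".tables"
```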
Models currently supported by the server:
| Name | Hugging Face (HF) repository | HF filename | Notes |
|---|---|---|---|
| Llama 3.1 | SanctumAI/Meta-Llama-3.1-8B-Instruct-GGUF | meta-llama-3.1-8b-instruct.Q4_K_M.gguf | Quantized version of the model, not distributed by Meta directly. Required for this server. |
| MedLlama 3 | johnsnowlabs/JSL-MedLlama-3-8B-v2.0 | N/A | Requires quantization via llama.cpp before use (see the sketch below). Not required for this server. |
| BioMistral 7B | skfrost19/BioMistralMerged | biomistral-merged-v0.1.gguf | Quantized version of the model, not distributed by BioMistral directly. Not required for this server. |
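For MedLlama 3, a minimal quantization sketch using llama.cpp is shown below. Script and binary names have changed across llama.cpp releases (e.g. `quantize` was renamed `llama-quantize`), and the local paths here are hypothetical, so treat this as a starting point rather than this project's prescribed workflow:

```
# Fetch llama.cpp and its conversion dependencies.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Convert the downloaded HF checkpoint to GGUF (path is hypothetical),
# then quantize to Q4_K_M. llama-quantize must be built first (see the
# llama.cpp build instructions).
python convert_hf_to_gguf.py /path/to/JSL-MedLlama-3-8B-v2.0 --outfile medllama-3-8b.gguf
./llama-quantize medllama-3-8b.gguf medllama-3-8b.Q4_K_M.gguf Q4_K_M
```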
To download Llama 3.1 (required) and any of the other models:
- Install `huggingface-hub` (with the virtual environment activated):

```
pip install huggingface-hub
```
- Download the model using the information in the table above, noting the download location for configuration (a concrete example follows below):

```
huggingface-cli download <HF repository> <HF filename> --local-dir .
```
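For example, to download the required Llama 3.1 model into the current directory, using the repository and filename from the table above:

```
huggingface-cli download SanctumAI/Meta-Llama-3.1-8B-Instruct-GGUF meta-llama-3.1-8b-instruct.Q4_K_M.gguf --local-dir .
```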
- Set environment variables in `.env`. Current variables, also shown in `.env.template`, are:
| Variable | Details |
|---|---|
| Llama__3_1__8B_Quant_Instruct | Location of the downloaded Llama model (required) |
| MedLlama__3__8B_Quant | Location of the downloaded MedLlama model (optional) |
| Biomistral__7B_Quant | Location of the downloaded BioMistral model (optional) |
| MODEL_FOLDER | Parent folder of all models (e.g. /home/user/models, if /home/user/models/[ModelA] and /home/user/models/[ModelB] exist) (optional; required for Docker) |
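A hypothetical `.env`, assuming the required model was downloaded to /home/user/models (the filename follows the table above; adjust paths to your machine):

```
Llama__3_1__8B_Quant_Instruct=/home/user/models/meta-llama-3.1-8b-instruct.Q4_K_M.gguf
MODEL_FOLDER=/home/user/models
```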
- Determine suitable configuration options (`config/config.ini`):
| Option | Details | Default |
|---|---|---|
| CACHE > ACTIVE | Whether to store prompt answers and return them for identical future prompts, serving the answer from the database rather than the LLM directly. | False |
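A minimal sketch of `config/config.ini` with caching enabled, assuming the `CACHE > ACTIVE` notation above maps to a standard INI section and option:

```ini
[CACHE]
; Serve repeated prompts from the SQLite cache instead of the LLM.
ACTIVE = True
```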
tox is used as a test orchestrator, creating environments for linting (flake8), type checks (mypy) and, finally, unit tests (pytest). It can be run using `tox`.
A Makefile has been included that packages common commands for convenience. `make test` runs a loop that fails if any of the environments fail, providing easier-to-read output.
Install and run locally as a Python package (e.g. for integration tests) as follows:

```
pip install .
openaiserver
```
Run through Docker as follows (either locally or remotely):

```
docker compose build
docker compose up -d
```
The app can then be interacted with in the same manner as if running as a Python package.
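One way to verify the server is reachable is to query the OpenAI-style model listing endpoint. Whether this server exposes `/v1/models` is an assumption here; the port is taken from the usage example below:

```
curl http://localhost:8080/v1/models
```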
- Install the OpenAI client:

```
pip install openai
```
- Create and run a Python file containing the following code:

```python
from openai import OpenAI

# The api_key is a placeholder; any non-empty value appears to work
# for this local server.
client = OpenAI(
    base_url='http://localhost:8080/v1/',
    api_key='foo',
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
    model='Llama__3_1__8B_Quant_Instruct',
    max_tokens=1024,
    temperature=0.7,
)

# Print the model's reply.
print(chat_completion.choices[0].message.content)
```
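For long generations, the OpenAI client can also stream tokens as they are produced. Whether this server implements the streaming protocol is an assumption, so treat this as a sketch:

```python
from openai import OpenAI

client = OpenAI(base_url='http://localhost:8080/v1/', api_key='foo')

# stream=True yields chunks, each carrying an incremental text delta.
stream = client.chat.completions.create(
    messages=[{"role": "user", "content": "Say this is a test"}],
    model='Llama__3_1__8B_Quant_Instruct',
    max_tokens=1024,
    temperature=0.7,
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
```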
As this is a Python package, most of the logic is contained within the `src` folder. General recommendations for editing are:
- Use `make prettier`, another command made available within the Makefile for convenience, to automatically format code.
- Always run tests (`make test`) before committing.
- Commits can be made as follows:

```
git add .
git commit -m "[details of changes]"
git push
```
This project is licensed under the MIT License - see the LICENSE.md file for details.