# Text-To-Text LLM Server

**Important: select the venv Python interpreter before you start.**

This repository is designed to be used with Visual Studio Code and a Docker Dev Container.

![dev-container](../img/dev-container.png)

## 1. Setup

**Instructions:**

a) Download the model

```bash
huggingface-cli download bartowski/Mistral-7B-Instruct-v0.3-GGUF \
    Mistral-7B-Instruct-v0.3-Q4_K_M.gguf \
    --revision 61fd4167fff3ab01ee1cfe0da183fa27a944db48 \
    --local-dir ~/.gai/models/llamacpp-mistral7b \
    --local-dir-use-symlinks False
```
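
After the download finishes, it can be worth confirming that the file on disk is really a GGUF model and not, say, an error page. The sketch below relies on GGUF files starting with the four bytes `GGUF`. `looks_like_gguf` is a hypothetical helper, not part of `gai`, and the path is the one implied by the download command above.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def looks_like_gguf(path: Path) -> bool:
    """Return True if the file exists and starts with the GGUF magic bytes."""
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


if __name__ == "__main__":
    # Path derived from --local-dir and the file name in the download command.
    model = Path(
        "~/.gai/models/llamacpp-mistral7b/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf"
    ).expanduser()
    status = "OK" if looks_like_gguf(model) else "missing or not a GGUF file"
    print(model, status)
```

This only checks the header. It does not verify that the download is complete or uncorrupted.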


---
## 2. Smoke Test

In [None]:
# check .gairc
import json
import os

with open(os.path.expanduser("~/.gairc"), "r") as f:
    gairc = f.read()
print(gairc)

# check ~/.gairc (if docker created .gairc)
jsoned = json.loads(gairc)
assert os.path.expanduser(jsoned["app_dir"]) == "/home/kakkoii1337/.gai"

# check ~/.gai (if docker created the mount point)
assert os.path.exists(os.path.expanduser("~/.gai"))
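
The check above compares `app_dir` against one specific home directory, so it only passes for that user. A minimal sketch of a user-independent variant follows. It assumes, as the cell above implies, that `~/.gairc` is JSON with an `app_dir` key. `resolve_app_dir` is a hypothetical helper, not part of `gai`.

```python
import json
import os
from pathlib import Path


def resolve_app_dir(gairc_path: str = "~/.gairc") -> Path:
    """Read the JSON file at gairc_path and return its app_dir with ~ expanded."""
    with open(os.path.expanduser(gairc_path), "r") as f:
        settings = json.load(f)
    return Path(os.path.expanduser(settings["app_dir"]))
```

In the notebook this could be used as `assert resolve_app_dir() == Path.home() / ".gai"`, assuming `app_dir` is meant to point at `~/.gai`.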

In [None]:
# Confirm llama_cpp is importable
from llama_cpp import Llama, LlamaGrammar

In [1]:
# Configure
from gai.lib.server.singleton_host import SingletonHost
from gai.lib.common.utils import free_mem
from rich.console import Console
console=Console()

config = {
    "type": "ttt",
    "generator_name": "llamacpp-mistral7b",
    "model_filepath": "models/llamacpp-mistral7b/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
    "max_seq_len": 8192,
    "prompt_format": "mistral",
    "hyperparameters": {
        "temperature": 1.31,
        "top_p": 0.14,
        "top_k": 49,
        "max_new_tokens": 1000,
    },
    "tool_choice": "auto",
    "max_retries": 5,
    "stop": ["<|eot_id|>"],
    "module_name": "gai.ttt.server.gai_llamacpp",
    "class_name": "GaiLlamaCpp",
    "init_args": [],
    "init_kwargs": {}
}
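
The `module_name`, `class_name`, `init_args`, and `init_kwargs` keys suggest that the host looks up the generator class by name at load time. How `SingletonHost` actually does this is not shown in this notebook, so the following is only a sketch of the general pattern. `instantiate_from_config` is a hypothetical helper, and a standard-library class stands in for `GaiLlamaCpp`.

```python
import importlib
from typing import Any


def instantiate_from_config(config: dict) -> Any:
    """Import config["module_name"], look up config["class_name"], and call it."""
    module = importlib.import_module(config["module_name"])
    cls = getattr(module, config["class_name"])
    return cls(*config.get("init_args", []), **config.get("init_kwargs", {}))


# Demonstration with a standard-library class instead of GaiLlamaCpp.
demo = instantiate_from_config({
    "module_name": "collections",
    "class_name": "Counter",
    "init_args": ["abca"],
    "init_kwargs": {},
})
print(demo["a"])  # 2
```

Keeping the class name in the config lets one host serve different generator types without importing all of them up front.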



In [None]:
# Load
## before loading
free_mem()
try:
    with SingletonHost.GetInstanceFromConfig(config) as host:
        ## after loading
        free_mem()
finally:
    ## after disposal
    free_mem()
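
The before/after/disposal readings can be wrapped in a small context manager so they stay together. This is a sketch under assumptions: `track` is a hypothetical helper, and `probe` is any zero-argument callable returning a number. `free_mem` appears to fit, since the notebook shows it returning a float. A fake probe is used here so the example runs without a GPU.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def track(probe: Callable[[], float], label: str = "load") -> Iterator[Dict[str, float]]:
    """Record probe() before the block starts and again after it exits."""
    readings = {"before": probe()}
    try:
        yield readings
    finally:
        # Runs even if the block raises, mirroring the `finally` above.
        readings["after"] = probe()
        print(f"{label}: {readings}")


# Demonstration with a fake probe; in the notebook, free_mem could be passed instead.
samples = iter([8.0, 3.5, 8.0])
with track(lambda: next(samples)) as readings:
    readings["during"] = next(samples)  # reading taken while the model is loaded
```

If the "after" reading returns to the "before" value, the model was released on disposal.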

---
## 3. Integration Test

### Startup

In [2]:
from gai.lib.server.singleton_host import SingletonHost
host = SingletonHost.GetInstanceFromConfig(config, verbose=False)
host.load()
generator = host.generator
free_mem()

4.399696350097656

### a) Test Streaming

In [None]:
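response = host.generator.create(
    messages=[{"role":"user","content":"Tell me a one paragraph story"},
              {"role":"assistant","content":""}],
    stream=True)
for chunk in response:
    # Skip empty chunks and chunks whose delta carries no text,
    # otherwise the literal string "None" would be printed.
    if chunk and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)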
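
When the streamed text is needed as a single string rather than printed piece by piece, the chunks can be collected instead. The sketch below assumes the chunk layout used in the loop above (`chunk.choices[0].delta.content`). `collect_stream` and `fake_chunk` are hypothetical helpers, and stand-in objects replace the real stream so the example runs without a model.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text of all chunks, skipping empty chunks and deltas without text."""
    parts = []
    for chunk in chunks:
        if not chunk or not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk with the attribute layout used in the loop above."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


story = collect_stream(
    [fake_chunk("Once "), None, fake_chunk("upon a time."), fake_chunk(None)]
)
print(story)  # Once upon a time.
```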


### b) Test Generation

In [None]:
response = host.generator.create(
    messages=[{"role":"user","content":"Tell me a one paragraph story"},
                {"role":"assistant","content":""}],
    stream=False)
print(response.choices[0].message.content)


### c) Test Tool Calling

In [3]:
messages = [
    {"role":"user","content":"What is the current time in Singapore?"},
    {"role":"assistant","content":""}
]
tool_choice="required"
tools = [
    {
        "type": "function",
        "function": {
            "name": "google",
            "description": "The 'google' function is a powerful tool that allows the AI to gather external information from the internet using Google search. It can be invoked when the AI needs to answer a question or provide information that requires up-to-date, comprehensive, and diverse sources which are not inherently known by the AI. For instance, it can be used to find current date, current news, weather updates, latest sports scores, trending topics, specific facts, or even the current date and time. The usage of this tool should be considered when the user's query implies or explicitly requests recent or wide-ranging data, or when the AI's inherent knowledge base may not have the required or most current information. The 'search_query' parameter should be a concise and accurate representation of the information needed.",
            "parameters": {
                "type": "object",
                "properties": {
                    "search_query": {
                        "type": "string",
                        "description": "The search query to search google with. For example, to find the current date or time, use 'current date' or 'current time' respectively."
                    }
                },
                "required": ["search_query"]
            }
        }
    }
]
response = host.generator.create(
    messages=messages,
    tools=tools,
    tool_choice=tool_choice,
    stream=False)
print(response)


(Output truncated in the source: a llama.cpp grammar dump, beginning with the rules below.)

array ::= [[] space array_6 []] space
space ::= space_94
array_2 ::= value array_5
value ::= object | array | string | number | boolean | null
array_4 ::= [,] space value
array_5 ::= array_4 array_5 |
array_6 ::= array_2 |
boolean ::= boolean_8 space
boolean_8 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e]
char ::= [^"\] | [\] char_10
char_10 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]
decimal-part ::= [0-9] decimal-part_41
...
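
Once a tool call comes back, the caller still has to run the named function. The sketch below assumes the response follows the OpenAI-style layout (`message.tool_calls[i].function.name`, with `arguments` as a JSON string). The notebook does not show the actual response, so treat that layout as an assumption. `dispatch_tool_calls`, the `TOOLS` registry, and the placeholder `google` function are all hypothetical.

```python
import json
from types import SimpleNamespace


def google(search_query: str) -> str:
    """Placeholder for a real search backend (hypothetical)."""
    return f"results for: {search_query}"


# Maps the tool names declared in `tools` to local Python callables.
TOOLS = {"google": google}


def dispatch_tool_calls(response) -> list:
    """Run every tool call found on the first choice and return the results."""
    results = []
    for call in response.choices[0].message.tool_calls or []:
        handler = TOOLS[call.function.name]
        kwargs = json.loads(call.function.arguments)
        results.append(handler(**kwargs))
    return results


# Stand-in response; the object returned by host.generator.create may differ.
fake = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(tool_calls=[
    SimpleNamespace(function=SimpleNamespace(
        name="google",
        arguments=json.dumps({"search_query": "current time Singapore"}),
    ))
]))])
print(dispatch_tool_calls(fake))  # ['results for: current time Singapore']
```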

### d) Test Structured Output

In [4]:
# Define Schema
from pydantic import BaseModel
class Book(BaseModel):
    title: str
    summary: str
    author: str
    published_year: int

text = """Foundation is a science fiction novel by American writer
Isaac Asimov. It is the first published in his Foundation Trilogy (later
expanded into the Foundation series). Foundation is a cycle of five
interrelated short stories, first published as a single book by Gnome Press
in 1951. Collectively they tell the early story of the Foundation,
an institute founded by psychohistorian Hari Seldon to preserve the best
of galactic civilization after the collapse of the Galactic Empire.
"""
response = host.generator.create(
    messages=[{"role":"user","content":text},
              {"role":"assistant","content":""}],
    json_schema=Book.schema(),
    stream=False)
print(response)


(Output truncated in the source: a llama.cpp grammar dump, beginning with the rules below.)

author-kv ::= ["] [a] [u] [t] [h] [o] [r] ["] space [:] space string
space ::= space_43
string ::= ["] string_44 ["] space
char ::= [^"\] | [\] char_4
char_4 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]
integer ::= integer_6 space
integer_6 ::= integer_7 integral-part
integer_7 ::= [-] |
integral-part ::= [0-9] | [1-9] integral-part_38
...
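
A grammar-constrained response should already be valid JSON with the right fields, but checking it before use is cheap. The sketch below uses only the standard library so it runs anywhere. `BOOK_FIELDS` is a hand-written stand-in for the fields of `Book`, `parse_book` is a hypothetical helper, and the sample string is written by hand, not taken from model output. It also assumes the JSON text arrives in `response.choices[0].message.content`, as in the generation test.

```python
import json

# Hand-written stand-in for the fields of Book; not the output of Book.schema().
BOOK_FIELDS = {"title": str, "summary": str, "author": str, "published_year": int}


def parse_book(content: str) -> dict:
    """Parse a JSON string and check it has the expected fields and types."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = BOOK_FIELDS.keys() - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for name, expected in BOOK_FIELDS.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(data[name], bool) or not isinstance(data[name], expected):
            raise ValueError(f"{name} should be {expected.__name__}")
    return data


sample = (
    '{"title": "Foundation", "summary": "Five interrelated short stories.", '
    '"author": "Isaac Asimov", "published_year": 1951}'
)
book = parse_book(sample)
print(book["author"], book["published_year"])  # Isaac Asimov 1951
```

In the notebook itself, `Book(**json.loads(content))` would perform a similar check through Pydantic.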


---
## 4. API Test