# Gai/Gen: Text-to-Text Generation using Gai with LlamaCPP

This is useful for running an LLM on the CPU without relying on a graphics card.

## Setting Up

1. Create a conda environment called `TTT`, if not already created, and install the dependencies:

    ```bash
    sudo apt update -y && sudo apt install ffmpeg git git-lfs -y
    conda create -n TTT python=3.10.10 -y
    conda activate TTT
    cd ../../gai-gen
    pip install -e ".[TTT]"
    ```

2. Download the Llama 3 8B GGUF model (the `Q4_K_M` quantization of `LLaMA3-iterative-DPO-final`) into the `~/gai/models` directory.

    ```bash
    huggingface-cli download bartowski/LLaMA3-iterative-DPO-final-GGUF \
                    LLaMA3-iterative-DPO-final-Q4_K_M.gguf  \
                    --local-dir ~/gai/models/LLaMA3-iterative-DPO-final-GGUF \
                    --local-dir-use-symlinks False
    ```
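Before loading the model, it can help to confirm that the download landed where the loader expects it. The sketch below is a hypothetical helper (`check_gguf` is not part of Gai). It checks that the file exists and that its first four bytes are the `GGUF` magic that GGUF files start with. The path is assumed from the download command above.

```python
import os

GGUF_MAGIC = b"GGUF"  # GGUF files begin with these four bytes

def check_gguf(path: str) -> bool:
    """Return True if `path` exists and starts with the GGUF magic bytes."""
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC

# Assumed location, taken from the huggingface-cli command above
model_path = "~/gai/models/LLaMA3-iterative-DPO-final-GGUF/LLaMA3-iterative-DPO-final-Q4_K_M.gguf"
print(check_gguf(model_path))
```

A `False` result usually means the `--local-dir` path or the file name differs from what the loader is configured to use.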

---

## Chat Completion


#### Streaming

```python
from gai.gen import Gaigen
gen = Gaigen.GetInstance().load('llama3-llamacpp')

response = gen.create(
    messages=[
        {'role':'USER','content':'Tell me a one paragraph short story.'},
        {'role':'ASSISTANT','content':''}],
    max_tokens=1000,
    stream=True)
for chunk in response:
    # Print each new piece of text as it arrives, skipping empty deltas
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)
```


```text
2024-06-04 22:02:06 INFO gai.gen.Gaigen:Gaigen: Loading generator llama3-llamacpp...
2024-06-04 22:02:06 INFO gai.gen.ttt.TTT:Using engine LlamaCpp_TTT...
2024-06-04 22:02:06 INFO gai.gen.ttt.TTT:Loading model from models/LLaMA3-iterative-DPO-final-GGUF
2024-06-04 22:02:06 INFO gai.gen.ttt.LlamaCpp_TTT:exllama_engine.load: Loading model from /home/roylai/gai/models/LLaMA3-iterative-DPO-final-GGUF/LLaMA3-iterative-DPO-final-Q4_K_M.gguf
```


In the heart of an ancient forest, there lived a wise old owl named Hooten. One day, as he perched on his favorite branch, he noticed a tiny acorn nestled in the moss below. Curious, Hooten decided to watch over it and see what would happen. Days turned into weeks, and the acorn began to sprout. A small sapling emerged, growing taller with each passing day under Hooten's watchful eyes. As the seasons changed, the young tree flourished, eventually becoming a majestic oak, its branches reaching high for the sky. Hooten, now an old friend of the tree, would sit atop it and share stories with the forest creatures, forever grateful for the chance to witness life's cycle from acorn to oak. And so, the tale of Hooten and the Oak became a cherished legend among the woodland dwellers, reminding them all that even the smallest beginnings can grow into something extraordinary.
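In streaming mode each chunk carries only the newly generated text in `choices[0].delta.content`, which may be empty or `None`; that is why the loop above checks it before printing. To keep the full reply as well as echoing it, the deltas can be accumulated. The sketch below fakes the chunks with `SimpleNamespace` objects that have the same attribute path the loop reads; that shape is an assumption, and `collect_stream` is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Iterable

def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Concatenate the delta text of streamed chunks, optionally printing it as it arrives."""
    parts = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content:  # skip empty or None deltas
            parts.append(content)
            if echo:
                print(content, end="", flush=True)
    if echo:
        print()
    return "".join(parts)

def fake_chunk(text):
    # Mimics chunk.choices[0].delta.content as used in the streaming loop
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

chunks = [fake_chunk("Once"), fake_chunk(" upon"), fake_chunk(None), fake_chunk(" a time.")]
story = collect_stream(chunks)
```

With the real generator, the same call would be `story = collect_stream(response)` after `gen.create(..., stream=True)`.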

#### Generating

```python
from gai.gen import Gaigen
gen = Gaigen.GetInstance().load('llama3-llamacpp')

response = gen.create(
    messages=[
        {'role':'USER','content':'Tell me a one paragraph short story.'},
        {'role':'ASSISTANT','content':''}],
    max_tokens=1000,
    stream=False)
# The complete reply is returned in a single message
print(response.choices[0].message.content)
```

```text
2024-06-04 20:10:01 INFO gai.gen.Gaigen:Gaigen: Loading generator llama3-llamacpp...
2024-06-04 20:10:01 INFO gai.gen.ttt.TTT:Using engine LlamaCpp_TTT...
2024-06-04 20:10:01 INFO gai.gen.ttt.TTT:Loading model from models/LLaMA3-iterative-DPO-final-GGUF
2024-06-04 20:10:01 INFO gai.gen.ttt.LlamaCpp_TTT:exllama_engine.load: Loading model from /home/roylai/gai/models/LLaMA3-iterative-DPO-final-GGUF/LLaMA3-iterative-DPO-final-Q4_K_M.gguf
```



In the heart of an ancient forest, there lived a wise old owl named Hooten. One day, as he perched on his favorite branch, he noticed a tiny acorn nestled in the moss below. Curious, Hooten decided to watch over it and see what would happen. Days turned into weeks, and the acorn began to sprout. A small sapling emerged, growing taller with each passing day under Hooten's watchful eyes. As the seasons changed, the young tree flourished, becoming a beacon of life in the forest. Hooten felt proud, knowing that he had played a part in nurturing this new addition to their home. And so, the tale of Hooten and the acorn became a cherished legend among the woodland creatures, reminding them all of the beauty that can bloom from even the smallest beginnings.


---

## Gai API Service



1. Build the Docker image for the Gai API service.

```bash
docker build --build-arg CATEGORY=ttt -f ../../gai-gen/Dockerfile.TTT -t gai-ttt:latest ../../gai-gen
```

2. Start the container.

```bash
# Remove any existing container with the same name
docker container rm -f gai-ttt

# Map the model directory from the host into the container.
# Note: --gpus all requires the NVIDIA Container Toolkit on the host.
docker run -d \
    -e DEFAULT_GENERATOR=llama3-llamacpp \
    -e OPENAI_API_KEY=${OPENAI_API_KEY} \
    --gpus all \
    -v ~/gai/models:/app/models \
    -p 12031:12031 \
    --name gai-ttt \
    gai-ttt:latest
```

3. Send a POST request to the API service

```bash
echo STREAMING
# -s hides curl's progress meter; -N disables output buffering so streamed chunks appear as they arrive
curl -X POST \
    http://localhost:12031/gen/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -s \
    -N \
    -d '{"model":"llama3-llamacpp",
        "messages": [
            {"role": "user","content": "Tell me a story"},
            {"role": "assistant","content": ""}
        ], "stream":true, "max_new_tokens":1000 }' | python ../../gai-gen/tests/integration_tests/print_delta.py
```
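The same endpoint can also be called from Python with only the standard library. The sketch below builds a non-streaming request whose payload fields are copied from the curl command (`model`, `messages`, `stream`, `max_new_tokens`). It assumes that, with `stream` set to false, the service returns an OpenAI-style JSON body with the reply in `choices[0].message.content`; that response shape is not shown above. `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "http://localhost:12031/gen/v1/chat/completions"

def build_chat_request(prompt: str, url: str = API_URL, max_new_tokens: int = 1000) -> urllib.request.Request:
    """Build a POST request mirroring the curl example, with streaming disabled."""
    payload = {
        "model": "llama3-llamacpp",
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": ""},
        ],
        "stream": False,
        "max_new_tokens": max_new_tokens,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt: str) -> str:
    """Send the request and return the reply text (assumes an OpenAI-style response body)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the container running, `print(chat("Tell me a story"))` would send the request; building and sending are kept separate so the payload can be inspected without a live server.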


---

## JSON Mode

1. Install PyLLMCore

  ```bash
  pip install py-llm-core
  ```

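2. Generate a JSON schema from a dataclass, compile it into a llama.cpp grammar, and use the grammar to constrain the model's output.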


```python
import json
from dataclasses import dataclass

from pydantic import TypeAdapter
from llama_cpp import LlamaGrammar
from gai.gen import Gaigen

# Define the structure the output must follow
@dataclass
class Book:
    title: str
    summary: str
    author: str
    published_year: int

# Derive a JSON schema from the dataclass and compile it into a llama.cpp grammar
type_adapter = TypeAdapter(Book)
schema = type_adapter.json_schema()
grammar = LlamaGrammar.from_json_schema(json.dumps(schema))

gen = Gaigen.GetInstance().load('llama3-llamacpp')
text = """Foundation is a science fiction novel by American writer
Isaac Asimov. It is the first published in his Foundation Trilogy (later
expanded into the Foundation series). Foundation is a cycle of five
interrelated short stories, first published as a single book by Gnome Press
in 1951. Collectively they tell the early story of the Foundation,
an institute founded by psychohistorian Hari Seldon to preserve the best
of galactic civilization after the collapse of the Galactic Empire.
"""
response = gen.create(
    messages=[{'role':'USER','content':text},{'role':'ASSISTANT','content':''}],
    grammar=grammar,
    max_tokens=1000,
    stream=False)
print(response.choices[0].message.content)
```

3. Extract structured information from a long speech (the transcript `pm_long_speech_2023.txt` must be in the working directory).

```python
import json
from dataclasses import dataclass

from pydantic import TypeAdapter
from llama_cpp import LlamaGrammar
from gai.gen import Gaigen

gen = Gaigen.GetInstance().load('mistral7b-llamacpp')

# Reuse the same output structure as the previous example
@dataclass
class Book:
    title: str
    summary: str
    author: str
    published_year: int

type_adapter = TypeAdapter(Book)
schema = type_adapter.json_schema()
grammar = LlamaGrammar.from_json_schema(json.dumps(schema))

with open("pm_long_speech_2023.txt") as f:
    text = f.read()

# In llama-cpp-python, max_tokens <= 0 means generation is limited only by the context size
response = gen.create(
    messages=[{'role':'USER','content':text},{'role':'ASSISTANT','content':''}],
    grammar=grammar,
    max_tokens=-1,
    stream=False)
print(response.choices[0].message.content)
```
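The grammar only guarantees that the generated text is JSON matching the schema; `message.content` is still a string. With pydantic, turning it back into a `Book` is a one-liner (`TypeAdapter(Book).validate_json(...)`). The sketch below does the same with only the standard library, which also makes the individual checks visible. `parse_book` is a hypothetical helper, and the sample JSON stands in for real model output.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Book:
    title: str
    summary: str
    author: str
    published_year: int

def parse_book(raw: str) -> Book:
    """Parse JSON text into a Book, checking field names and basic types."""
    data = json.loads(raw)
    expected = {f.name: f.type for f in fields(Book)}
    missing = expected.keys() - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise TypeError(f"{name} should be {typ.__name__}")
    # Ignore any extra keys that may be present
    return Book(**{name: data[name] for name in expected})

# Sample output, not produced by a real model run
raw = ('{"title": "Foundation", "summary": "Hari Seldon founds an institute to preserve civilization.", '
       '"author": "Isaac Asimov", "published_year": 1951}')
book = parse_book(raw)
print(book.title, book.published_year)
```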

### An alternate approach using PyLLMCore

```python
import os
from dataclasses import dataclass

from llm_core.parsers import LLaMACPPParser

@dataclass
class Book:
    title: str
    summary: str
    author: str
    published_year: int

# Path to the local GGUF model file
model = os.path.expanduser("~/gai/models/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf")

text = """Foundation is a science fiction novel by American writer
Isaac Asimov. It is the first published in his Foundation Trilogy (later
expanded into the Foundation series). Foundation is a cycle of five
interrelated short stories, first published as a single book by Gnome Press
in 1951. Collectively they tell the early story of the Foundation,
an institute founded by psychohistorian Hari Seldon to preserve the best
of galactic civilization after the collapse of the Galactic Empire.
"""

# The parser takes the dataclass directly, so no schema or grammar is built by hand
with LLaMACPPParser(Book, model=model, llama_cpp_kwargs={"n_ctx": 32000, "verbose": False}) as parser:
    book = parser.parse(text)
    print(book)
```


---

## Function Calling

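Function calling lets the LLM ask for external help when a request goes beyond what it can generate from its own knowledge (for example, the current time). Instead of answering directly, the model returns a string that emulates a function call, based on the function descriptions provided by the user. The model does not run the function itself; the caller is responsible for executing it.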

We will create a set of tools that can be made available to the models below.

```python
from gai.gen import Gaigen
from gai.common.notebook import highlight

# Alternatively: gen = Gaigen.GetInstance().load('mistral7b-llamacpp')
gen = Gaigen.GetInstance().load('llama3-llamacpp')

# Tool descriptions in the OpenAI function-calling format
tools = [
    {
        "type": "function",
        "function": {
            "name": "google",
            "description": "The 'google' function is a powerful tool that allows the AI to gather external information from the internet using Google search. It can be invoked when the AI needs to answer a question or provide information that requires up-to-date, comprehensive, and diverse sources which are not inherently known by the AI. For instance, it can be used to find current date, current news, weather updates, latest sports scores, trending topics, specific facts, or even the current date and time. The usage of this tool should be considered when the user's query implies or explicitly requests recent or wide-ranging data, or when the AI's inherent knowledge base may not have the required or most current information. The 'search_query' parameter should be a concise and accurate representation of the information needed.",
            "parameters": {
                "type": "object",
                "properties": {
                    "search_query": {
                        "type": "string",
                        "description": "The search query to search google with. For example, to find the current date or time, use 'current date' or 'current time' respectively."
                    }
                },
                "required": ["search_query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "scrape",
            "description": "Scrape the content of the provided url",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The url to scrape the content from"
                    }
                },
                "required": ["url"]
            }
        }
    }
]

# The model cannot know the current time, so it should request a tool
highlight("Model decided to use tool: ")
user_prompt = "What time is it in Singapore right now?"
response = gen.create(
    messages=[
        {'role':'user','content':user_prompt},
        {'role':'assistant','content':''}],
    tools=tools,
    stream=False,
    max_new_tokens=200)
print(response.choices[0].message)

# A story needs no external data, so the model should answer directly
highlight("Model decided not to use tool: ")
user_prompt = "Tell me a one paragraph story."
response = gen.create(
    messages=[
        {'role':'user','content':user_prompt},
        {'role':'assistant','content':''}],
    tools=tools,
    stream=False,
    max_new_tokens=200)
print(response.choices[0].message)
```
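Since the model only describes the call, the caller has to route it to real code. The sketch below dispatches a returned tool call to a local Python function. It assumes the message follows the OpenAI shape (`message.tool_calls[i].function.name`, with the arguments as a JSON string in `.function.arguments`); the printed messages above do not confirm this, so inspect the actual `response.choices[0].message` first. The `google` and `scrape` implementations here are hypothetical stubs, and `dispatch` is a hypothetical helper.

```python
import json
from types import SimpleNamespace

# Hypothetical local implementations of the tools described to the model
def google(search_query: str) -> str:
    return f"(stub) results for {search_query!r}"

def scrape(url: str) -> str:
    return f"(stub) contents of {url}"

TOOL_REGISTRY = {"google": google, "scrape": scrape}

def dispatch(message) -> list:
    """Run every tool call in an OpenAI-style message and return the results in order."""
    results = []
    for call in getattr(message, "tool_calls", None) or []:
        func = TOOL_REGISTRY.get(call.function.name)
        if func is None:
            raise KeyError(f"unknown tool: {call.function.name}")
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        results.append(func(**args))
    return results

# A fake message standing in for response.choices[0].message
fake = SimpleNamespace(tool_calls=[SimpleNamespace(function=SimpleNamespace(
    name="google", arguments='{"search_query": "current time in Singapore"}'))])
print(dispatch(fake))
```

When the model answers directly, `tool_calls` is empty or absent and `dispatch` returns an empty list, so the same code path handles both prompts above.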
