# Example with Ollama
## A toy example for understanding the ChatOllama interface

First, install Ollama:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Alternatively, run it with Docker:
https://hub.docker.com/r/ollama/ollama

Then use Ollama to download the required model (or supply your own in GGUF format):

```bash
ollama run llama3
```

Instructions for supplying your own model are available on [GitHub](https://github.com/ollama/ollama).

In [8]:
!sudo apt install pciutils

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libpci3 pci.ids
The following NEW packages will be installed:
  libpci3 pci.ids pciutils
0 upgraded, 3 newly installed, 0 to remove and 45 not upgraded.
Need to get 343 kB of archives.
After this operation, 1,581 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 pci.ids all 0.0~2022.01.22-1 [251 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 libpci3 amd64 1:3.7.0-6 [28.9 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/main amd64 pciutils amd64 1:3.7.0-6 [63.6 kB]
Fetched 343 kB in 2s (197 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 78, <> line 3.)
debconf: falling back to frontend: Readline
debconf: unable to initializ

In [9]:
!curl -fsSL https://ollama.com/install.sh | sh

>>> Downloading ollama...
############################################################################################# 100.0%
>>> Installing ollama to /usr/local/bin...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> NVIDIA GPU installed.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [None]:
!ollama serve

2024/06/05 09:34:27 routes.go:1007: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS: OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-06-05T09:34:27.052Z level=INFO source=images.go:729 msg="total blobs: 0"
time=2024-06-05T09:34:27.053Z level=INFO source=images.go:736 msg="total unused blobs removed: 0"
time=2024-06-05T09:34:27.053Z level=INFO source=routes.go:1053 msg="Listening on 127.0.0.1:11434 (version 0.1.41)"
time=2024-06-05T09:34:27.053Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama4287817558/runne
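The log above reports that the server listens on 127.0.0.1:11434. Before building a chain on top of it, it can help to confirm from Python that the server is reachable. The following is a minimal sketch using only the standard library; `ollama_is_up` is a hypothetical helper name, and the check assumes the server answers a plain GET on its root URL with HTTP 200.

```python
import urllib.error
import urllib.request


def ollama_is_up(base_url="http://127.0.0.1:11434", timeout=2.0):
    """Return True if an HTTP GET on the server root answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: treat as "not up".
        return False


print(ollama_is_up())
```

If this prints `False`, the `ollama serve` cell is most likely not running, or it is bound to a different host or port.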

### Simple inference from a model in Ollama with argument substitution:

In [None]:
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Local Llama3
llm = ChatOllama(
    model="llama3",
    keep_alive=-1,  # The model will not be unloaded
    temperature=0,
    num_predict=512,  # ChatOllama's parameter for the maximum number of generated tokens
)

prompt = ChatPromptTemplate.from_template(
    "Write me a 500 word article on {topic} from the perspective of a {profession}. "
)

chain = prompt | llm | StrOutputParser()

# print(chain.invoke({"topic": "LLMs", "profession": "shipping magnate"}))

for chunk in chain.stream({"topic": "LLMs", "profession": "shipping magnate"}):
    print(chunk, end="", flush=True)

**"LLMs: The Game-Changers for Shipping Magnates Like Me"**

As a shipping magnate, I've spent my fair share of time navigating the complexities of global trade and logistics. From optimizing routes to managing inventory, every decision counts when it comes to keeping my fleet of vessels running smoothly and efficiently. But in recent years, I've noticed a significant shift in the way I do business – and it's all thanks to Large Language Models (LLMs).

For those who may not be familiar, LLMs are artificial intelligence systems that can process and generate human-like language. They're essentially super-smart chatbots that can learn from vast amounts of data and respond accordingly. And let me tell you, they've revolutionized the way I operate my shipping empire.

First and foremost, LLMs have streamlined our communication channels with customers and partners. Gone are the days of tedious phone calls and lengthy emails; now, we can simply converse with these AI-powered language models 

### A short example of working with OllamaFunctions and passing tools to it

In [None]:
from langchain_experimental.llms.ollama_functions import OllamaFunctions

model = OllamaFunctions(model="llama3", format="json")

model = model.bind_tools(
    tools=[
        {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["location"],
            },
        }
    ],
    function_call={
        "name": "get_current_weather"
    },  # This parameter FORCES the model to call the function
)

response = model.invoke("what is the weather in Singapore?")

print(response)

content='' additional_kwargs={'function_call': {'name': 'get_current_weather', 'arguments': '{"location": "Singapore", "unit": "celsius"}'}} id='run-4cea5090-8a87-410a-9a40-257903e68985-0'


### Example of an agent interacting with tools that might exist in stairs

We define hypothetical tools for the agent:
1. A module for modeling dependencies between tasks and resources
2. A module for recovering the links between tasks and the hierarchy of tasks and objects
3. A module for automatically generating optimal plans
4. A database query for retrieving context using RAG

In [None]:
dependency_modeling_tool = {
    "name": "dependency_modeling",
    "description": "Model dependencies between tasks and resources",
    "parameters": {
        "type": "object",
        "properties": {
            "tasks": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of tasks to model dependencies for",
            },
            "resources": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of resources involved in the tasks",
            },
        },
        "required": ["tasks", "resources"],
    },
}

relationship_recovery_tool = {
    "name": "relationship_recovery",
    "description": "Recover relationships between tasks, task hierarchies, and objects",
    "parameters": {
        "type": "object",
        "properties": {
            "task_list": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of tasks to analyze",
            },
            "hierarchy_levels": {
                "type": "integer",
                "description": "Number of hierarchy levels to consider",
            },
            "object_types": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Types of objects to include in the recovery",
            },
        },
        "required": ["task_list", "hierarchy_levels", "object_types"],
    },
}

optimal_planning_tool = {
    "name": "optimal_planning",
    "description": "Automatically generate optimal plans",
    "parameters": {
        "type": "object",
        "properties": {
            "constraints": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of constraints to consider in planning",
            },
            "objectives": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of objectives to achieve in the plan",
            },
        },
        "required": ["constraints", "objectives"],
    },
}

database_query_tool = {
    "name": "query_database_rag",
    "description": "Query the database to retrieve context using Retrieval-Augmented Generation (RAG)",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The query to search the database",
            },
            "context_length": {
                "type": "integer",
                "description": "The length of context to retrieve",
                "default": 5,
            },
        },
        "required": ["query"],
    },
}

We pass the tools to the model and ask leading questions to elicit a tool call. The call arrives in the `additional_kwargs` field of the response; from there it can be pulled out of the dictionary and executed.

In [None]:
from langchain_experimental.llms.ollama_functions import OllamaFunctions

model = OllamaFunctions(model="llama3", format="json")

model = model.bind_tools(
    tools=[
        dependency_modeling_tool,
        relationship_recovery_tool,
        optimal_planning_tool,
        database_query_tool,
    ],
)

response_dependencies = model.invoke(
    "Can you model the dependencies between tasks 'Excavation', 'Foundation', and 'Framing' using resources 'Crane' and 'Crew A'?"
)

print(response_dependencies)

content='' additional_kwargs={'function_call': {'name': 'dependency_modeling', 'arguments': '{"tasks": ["Excavation", "Foundation", "Framing"], "resources": ["Crane", "Crew A"]}'}} id='run-db307d2e-9269-4793-baae-bf087e91ed3a-0'


In [None]:
response_recovery = model.invoke(
    "Recover the relationships between tasks 'Excavation', 'Foundation', and 'Framing', considering 3 levels of hierarchy and including objects 'Site', 'Materials'."
)
response_recovery

AIMessage(content='', additional_kwargs={'function_call': {'name': 'relationship_recovery', 'arguments': '{"task_list": ["Excavation", "Foundation", "Framing"], "hierarchy_levels": 3, "object_types": ["Site", "Materials"]}'}}, id='run-65255b18-2042-4dd7-868c-e570d6e4ab97-0')

In [None]:
response_plan = model.invoke(
    "Generate an optimal plan considering constraints 'budget' is 5 mil rubles, 'time' 10 days and objective 'minimum cost'."
)
response_plan

AIMessage(content='', additional_kwargs={'function_call': {'name': 'optimal_planning', 'arguments': '{"type": "object", "properties": {"constraints": [{"name": "budget", "value": 5000000, "unit": "rubbles"}, {"name": "time", "value": 10, "unit": "days"}], "objectives": [{"name": "minimum cost", "weight": 1}]}}'}}, id='run-a44570fb-d183-4fa5-9b80-ad523a3afb9d-0')
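Note that the arguments in the output above do not follow the `optimal_planning` schema: the model wrapped them in `type` and `properties` keys and returned objects where the schema asks for strings. A minimal sketch of a sanity check that would catch the first problem is shown below. `missing_required` is a hypothetical helper, and `planning_schema` is a trimmed copy of `optimal_planning_tool` that keeps only the field this check reads.

```python
import json


def missing_required(tool_schema, arguments_json):
    """Return the required parameter names absent from the top level of the arguments."""
    arguments = json.loads(arguments_json)
    required = tool_schema["parameters"].get("required", [])
    return [name for name in required if name not in arguments]


# Trimmed copy of optimal_planning_tool (assumption: only "required" matters here).
planning_schema = {
    "name": "optimal_planning",
    "parameters": {"required": ["constraints", "objectives"]},
}

# The arguments string exactly as it appears in the output above.
returned = (
    '{"type": "object", "properties": {"constraints": '
    '[{"name": "budget", "value": 5000000, "unit": "rubbles"}, '
    '{"name": "time", "value": 10, "unit": "days"}], '
    '"objectives": [{"name": "minimum cost", "weight": 1}]}}'
)

print(missing_required(planning_schema, returned))  # ['constraints', 'objectives']
```

A non-empty result is a signal to retry the request or to reject the call before executing anything.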

In [None]:
response_rag = model.invoke(
    "Query the database for the latest project reports using the query 'how to build a dormitory' and retrieve 5 context entries."
)
response_rag

AIMessage(content='', additional_kwargs={'function_call': {'name': 'query_database_rag', 'arguments': '{"query": "how to build a dormitory", "context_length": 5}'}}, id='run-a38cb31f-a8a5-49bf-8311-200307f15b9f-0')
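Extracting the call from `additional_kwargs` and executing it could look roughly like this. It is a sketch, not the project's implementation: the `query_database_rag` function is a hypothetical stub, and `TOOL_REGISTRY` and `execute_function_call` are illustrative names. The dictionary mirrors the shape of the output above.

```python
import json


def query_database_rag(query, context_length=5):
    # Hypothetical stub standing in for the real RAG lookup.
    return f"{context_length} entries for {query!r}"


# Maps tool names from the schema to Python callables.
TOOL_REGISTRY = {"query_database_rag": query_database_rag}


def execute_function_call(additional_kwargs, registry):
    """Look up the requested tool and call it with the decoded JSON arguments."""
    call = additional_kwargs.get("function_call")
    if call is None:
        return None  # The model answered without requesting a tool.
    handler = registry[call["name"]]
    arguments = json.loads(call["arguments"])  # Arguments arrive as a JSON string.
    return handler(**arguments)


# Same shape as response_rag.additional_kwargs in the output above.
additional_kwargs = {
    "function_call": {
        "name": "query_database_rag",
        "arguments": '{"query": "how to build a dormitory", "context_length": 5}',
    }
}

result = execute_function_call(additional_kwargs, TOOL_REGISTRY)
print(result)
```

In the notebook, the same function would be called as `execute_function_call(response_rag.additional_kwargs, TOOL_REGISTRY)`.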

### Structured output with Ollama
This may also be useful for retrieving information from the database and bringing it into a specific format.

In [None]:
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_experimental.llms.ollama_functions import OllamaFunctions


# Pydantic Schema for structured response
class Person(BaseModel):
    name: str = Field(description="The person's name", required=True)
    height: float = Field(description="The person's height", required=True)
    hair_color: str = Field(description="The person's hair color")


context = """Alex is 5 feet tall.
Claudia is 1 foot taller than Alex and jumps higher than him.
Claudia is a brunette and Alex is blonde."""

# Prompt template llama3
prompt = PromptTemplate.from_template(
    """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
    You are a smart assistant. Take the following context and question below and return your answer in JSON.
    <|eot_id|><|start_header_id|>user<|end_header_id|>
QUESTION: {question} \n
CONTEXT: {context} \n
JSON:
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
 """
)

# Chain
llm = OllamaFunctions(model="llama3", format="json", temperature=0)

structured_llm = llm.with_structured_output(Person)
chain = prompt | structured_llm

response = chain.invoke({"question": "Who is taller?", "context": context})

print(response)

## Variations without Ollama

### Simple examples:

The first option works with any model, but one can only hope that the model returns what is needed in some usable form.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer


class ToolCallingModel:
    def __init__(self, model_name):
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.tools = {}

    def add_tool(self, tool_name, tool_function):
        self.tools[tool_name] = tool_function

    def call_model(self, input_text):
        inputs = self.tokenizer(input_text, return_tensors="pt")
        outputs = self.model.generate(**inputs)
        decoded_output = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        print("do: ", decoded_output)
        return self._check_for_tool_use(decoded_output)

    def _check_for_tool_use(self, model_output):
        for tool_name, tool_function in self.tools.items():
            if tool_name in model_output:
                # Extract relevant data and call the tool function
                return tool_function()
        return model_output


# Example tool function
def example_tool():
    return "Tool called successfully!"


if __name__ == "__main__":
    model = ToolCallingModel("mzbac/llama-3-8B-Instruct-function-calling")
    model.add_tool("example_tool", example_tool)
    response = model.call_model("Use example_tool to solve this.")
    print(response)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


do:  Use example_tool to solve this. Example tool is a tool that can be used to solve a
Tool called successfully!
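In the output above, the decoded text begins with the prompt itself, so the substring check matches the tool name that the user typed rather than anything the model generated. A minimal sketch of the difference, using the strings from that output; `strip_prompt` and `find_tool` are hypothetical helpers, not part of the class above.

```python
def strip_prompt(decoded_output, prompt):
    """Drop the echoed prompt from the start of the decoded text, if present."""
    if decoded_output.startswith(prompt):
        return decoded_output[len(prompt):]
    return decoded_output


def find_tool(text, tool_names):
    """Return the first tool name that occurs in the text, or None."""
    for name in tool_names:
        if name in text:
            return name
    return None


prompt = "Use example_tool to solve this."
decoded = (
    "Use example_tool to solve this. "
    "Example tool is a tool that can be used to solve a"
)

print(find_tool(decoded, ["example_tool"]))  # example_tool
print(find_tool(strip_prompt(decoded, prompt), ["example_tool"]))  # None
```

With the prompt removed, the generated continuation no longer mentions `example_tool`, which shows that the tool was triggered by the input rather than by the model.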


The second option is more interesting: the model mzbac/llama-3-8B-Instruct-function-calling has already been fine-tuned on glaive-function-calling-v2, and there is another dataset for this kind of training, https://huggingface.co/datasets/danilopeixoto/pandora-tool-calling. Such a model emits function calls in the same form, after the \<functioncall\> token, which on the whole also suits us.

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "mzbac/llama-3-8B-Instruct-function-calling"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

tool = {
    "name": "search_web",
    "description": "Perform a web search for a given search terms.",
    "parameter": {
        "type": "object",
        "properties": {
            "search_terms": {
                "type": "array",
                "items": {"type": "string"},
                "description": "The search queries for which the search is performed.",
                "required": True,
            }
        },
    },
}

messages = [
    {
        "role": "system",
        "content": f"You are a helpful assistant with access to the following functions. Use them if required - {str(tool)}",
    },
    {
        "role": "user",
        "content": "Today's news in Melbourne, just for your information, today is April 27, 2014.",
    },
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.1,
)
response = outputs[0]
print(tokenizer.decode(response))

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant with access to the following functions. Use them if required - {'name':'search_web', 'description': 'Perform a web search for a given search terms.', 'parameter': {'type': 'object', 'properties': {'search_terms': {'type': 'array', 'items': {'type':'string'}, 'description': 'The search queries for which the search is performed.','required': True}}}}<|eot_id|><|start_header_id|>user<|end_header_id|>

Today's news in Melbourne, just for your information, today is April 27, 2014.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

<functioncall> {"name": "search_web", "arguments": '{"search_terms": ["Melbourne news", "April 27, 2014"]}'}<|eot_id|>
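The call in the output above is not valid JSON as a whole, because the `arguments` value is wrapped in single quotes. One way to extract it is a regular expression that captures the name and the quoted arguments separately. This is a sketch that assumes the model always emits exactly the layout shown above; `parse_functioncall` is a hypothetical helper.

```python
import json
import re

# Matches: <functioncall> {"name": "...", "arguments": '...'}
FUNCTIONCALL_RE = re.compile(
    r"""<functioncall>\s*\{\s*"name":\s*"(?P<name>[^"]+)",\s*"arguments":\s*'(?P<args>.*)'\s*\}""",
    re.DOTALL,
)


def parse_functioncall(text):
    """Return (name, arguments dict) for the function call in text, or None."""
    match = FUNCTIONCALL_RE.search(text)
    if match is None:
        return None
    return match.group("name"), json.loads(match.group("args"))


# Tail of the decoded generation from the output above.
decoded_tail = (
    '<functioncall> {"name": "search_web", "arguments": '
    '\'{"search_terms": ["Melbourne news", "April 27, 2014"]}\'}<|eot_id|>'
)

name, arguments = parse_functioncall(decoded_tail)
print(name, arguments)
```

Text without a `<functioncall>` marker yields `None`, which can be treated as an ordinary chat answer.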


### Example with the same stairs functions

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "mzbac/llama-3-8B-Instruct-function-calling"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

tools = [
    dependency_modeling_tool,
    relationship_recovery_tool,
    optimal_planning_tool,
    database_query_tool,
]

messages = [
    {
        "role": "system",
        "content": f"You are a helpful assistant with access to the following functions. Use them if required - {str(tools)}",
    },
    {
        "role": "user",
        "content": "Can you model the dependencies between tasks 'Excavation', 'Foundation', and 'Framing' using resources 'Crane' and 'Crew A'?",
    },
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.1,
)
response = outputs[0]
print(tokenizer.decode(response))

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant with access to the following functions. Use them if required - [{'name': 'dependency_modeling', 'description': 'Model dependencies between tasks and resources', 'parameters': {'type': 'object', 'properties': {'tasks': {'type': 'array', 'items': {'type':'string'}, 'description': 'List of tasks to model dependencies for'},'resources': {'type': 'array', 'items': {'type':'string'}, 'description': 'List of resources involved in the tasks'}},'required': ['tasks','resources']}}, {'name':'relationship_recovery', 'description': 'Recover relationships between tasks, task hierarchies, and objects', 'parameters': {'type': 'object', 'properties': {'task_list': {'type': 'array', 'items': {'type':'string'}, 'description': 'List of tasks to analyze'}, 'hierarchy_levels': {'type': 'integer', 'description': 'Number of hierarchy levels to consider'}, 'object_types': {'type': 'array', 'items': {'type':'string'}, 'desc