First, download and install the [Okareo CLI](https://docs.okareo.com/docs/sdk/cli#proxy).

Then run `okareo proxy`. Completion requests routed through this proxy are recorded in Okareo as datapoints automatically.

When your datapoints are uploaded to Okareo, they will be tagged with a 'task' label. The currently supported tasks are:

- Question Answering
- Roleplaying 
- Code Generation
- Tool Calling
- Data Extraction
- Classification
- Summarization

The task tags allow you to sort through your data via segments, which are built from user-configurable filters on datapoints.
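
To make the idea of a segment concrete, here is a minimal sketch in plain Python. It is not the Okareo API: segments are configured in Okareo itself, and the `segment` helper, the sample datapoints, and the `task` and `model` field names are assumptions made up for illustration.

```python
# Illustrative only: plain Python, not the Okareo API.
# The "task" and "model" field names are assumptions for this sketch.
datapoints = [
    {"id": 1, "task": "Tool Calling", "model": "gpt-4o"},
    {"id": 2, "task": "Summarization", "model": "gpt-4o"},
    {"id": 3, "task": "Tool Calling", "model": "gpt-4o-mini"},
]


def segment(points, **filters):
    """Return the datapoints whose fields match every filter."""
    return [p for p in points if all(p.get(k) == v for k, v in filters.items())]


# A "segment" here is just the subset of datapoints that pass the filters.
tool_calling = segment(datapoints, task="Tool Calling")
print([p["id"] for p in tool_calling])  # [1, 3]
```

Combining filters narrows the segment further, for example `segment(datapoints, task="Tool Calling", model="gpt-4o-mini")`.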

To illustrate datapoints and segments in Okareo, let's set up a model that is tasked with generating tool calls. We expect these datapoints to be labeled with the "tool calling" task.

In [18]:
import os
import openai # openai v1.0.0+

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
client = openai.OpenAI(
    api_key=OPENAI_API_KEY,
    base_url="http://0.0.0.0:4000", # URL for Okareo proxy
)

In [15]:
import json

file_path = "data/gorilla_tools.jsonl"

# Load one tool definition per line of the JSONL file, skipping blank lines
with open(file_path, "r") as f:
    gorilla_tools = [json.loads(line) for line in f if line.strip()]
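
Because the loaded list is passed directly as the `tools` argument later on, each entry presumably follows the OpenAI tool format (`{"type": "function", "function": {"name": ...}}`). The following is a rough sanity check under that assumption; `check_tool_definitions` and the sample entries are hypothetical and not taken from `gorilla_tools.jsonl`.

```python
def check_tool_definitions(tools):
    """Return a list of problems found; an empty list means every entry looks usable."""
    problems = []
    for i, tool in enumerate(tools):
        if tool.get("type") != "function":
            problems.append(f"tool {i}: 'type' should be 'function'")
        function = tool.get("function")
        if not isinstance(function, dict) or not function.get("name"):
            problems.append(f"tool {i}: missing 'function.name'")
    return problems


# Hypothetical sample entries, made up for this sketch
sample_tools = [
    {"type": "function", "function": {"name": "find", "parameters": {"type": "object", "properties": {}}}},
    {"function": {}},  # deliberately malformed: no type, no name
]
for problem in check_tool_definitions(sample_tools):
    print(problem)
```

Running `check_tool_definitions(gorilla_tools)` before the first request can surface a malformed line early instead of as an API error.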


In [22]:
system_prompt_template = """Given the following initial state, invoke one or more appropriate functions.

Initial state:
{
    'GorillaFileSystem': {
        'root': {
            'workspace': {
                'type': 'directory',
                'contents': {
                    'document': {
                        'type': 'directory',
                        'contents': {
                            'final_report.pdf': {
                                'type': 'file',
                                'content': 'Year2024 This is the final report content including budget analysis and other sections.'
                            },
                            'previous_report.pdf': {
                                'type': 'file',
                                'content': 'Year203 This is the previous report content with different budget analysis.'
                            }
                        }
                    }
                }
            }
        }
    }
}"""

messages = [
    {"role": "system", "content": system_prompt_template},
    {"role": "user", "content": "Recursively find file names containing 'report'"}
]


response = client.chat.completions.create(
    model="gpt-4o", 
    messages=messages,
    tools=gorilla_tools
)

print(response)

ChatCompletion(id='chatcmpl-AfDQSH2SXMqKs9NdS3lx0US4tz0Ya', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_z8qRWW3RkamW4pZIbaIUCH33', function=Function(arguments='{"name":"report"}', name='find'), type='function')]))], created=1734386420, model='gpt-4o-2024-08-06', object='chat.completion', service_tier=None, system_fingerprint='fp_a79d8dac1f', usage=CompletionUsage(completion_tokens=13, prompt_tokens=2049, total_tokens=2062, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=1920)))
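
The printed response is dense, so it can help to pull out just the tool calls. The sketch below reads `choices[0].message.tool_calls` and parses each call's JSON `arguments` string. The `extract_tool_calls` helper is a name introduced here, and the stand-in object only mirrors the fields of the response printed above so the example runs without calling the API.

```python
import json
from types import SimpleNamespace


def extract_tool_calls(response):
    """Return (function name, parsed arguments) for each tool call in the first choice."""
    message = response.choices[0].message
    calls = []
    for call in message.tool_calls or []:  # tool_calls is None when no tool was invoked
        calls.append((call.function.name, json.loads(call.function.arguments)))
    return calls


# Stand-in object with the same attribute shape as the response printed above
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(tool_calls=[
        SimpleNamespace(function=SimpleNamespace(name="find", arguments='{"name":"report"}'))
    ]))]
)
print(extract_tool_calls(fake_response))  # [('find', {'name': 'report'})]
```

With the real client, `extract_tool_calls(response)` should work the same way on the `ChatCompletion` object.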


In [23]:
# generate another datapoint with a different user message
messages = [
    {"role": "system", "content": system_prompt_template},
    {"role": "user", "content": "List the size of the files/directories in the current directory."}
]


response = client.chat.completions.create(
    model="gpt-4o", 
    messages=messages,
    tools=gorilla_tools
)

print(response)

ChatCompletion(id='chatcmpl-AfDSS1ZCilkXmQA83pu2WqbXbY9RD', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_dLJZLG6fJUwnmZsywaKlSmdc', function=Function(arguments='{}', name='du'), type='function')]))], created=1734386544, model='gpt-4o-2024-08-06', object='chat.completion', service_tier=None, system_fingerprint='fp_a79d8dac1f', usage=CompletionUsage(completion_tokens=9, prompt_tokens=2054, total_tokens=2063, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=1920)))
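
The two cells above differ only in the user message, so generating more datapoints can be wrapped in a small loop. This is one possible sketch, not part of Okareo or the OpenAI SDK: `generate_datapoints` is a hypothetical helper that assumes a client exposing `chat.completions.create`, as the `openai.OpenAI` client configured earlier does.

```python
def generate_datapoints(client, system_prompt, user_messages, tools, model="gpt-4o"):
    """Send one chat completion per user message and return the responses in order."""
    responses = []
    for user_message in user_messages:
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ]
        responses.append(
            client.chat.completions.create(model=model, messages=messages, tools=tools)
        )
    return responses


# Example with the proxy-backed client and variables defined earlier:
# responses = generate_datapoints(
#     client,
#     system_prompt_template,
#     ["Recursively find file names containing 'report'"],
#     gorilla_tools,
# )
```

Since every request goes through the proxy-backed client, each call in the loop should produce one datapoint in Okareo.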
