# Checking whether the API is up
The `/ping` endpoint (`/help` is equivalent) gives an overview of whether the API is up, which version it is running, and what state it is in.

In [1]:
# Example use of the /ping endpoint
import requests

base_url = 'http://localhost:8502/api/chatbot' # Change the URL to vader5 if you are running it against vader5.
url = base_url + '/ping'
response = requests.get(url)
print(response.text)

{
  "endpoints": [
    {
      "methods": [
        "get"
      ],
      "name": "ping",
      "params": {},
      "return_type": "json{version:string,streamvariants:list{string},endpoints:list{name:string,methods:string,params:list{json},returntype:json}}"
    },
    {
      "methods": [
        "get"
      ],
      "name": "docs",
      "params": {},
      "return_type": "string"
    },
    {
      "methods": [
        "get"
      ],
      "name": "getthread",
      "params": {
        "auth_key": "string",
        "thread_id": "string"
      },
      "return_type": "json{list{variant:streamvariant=string,content:string}}"
    },
    {
      "methods": [
        "get"
      ],
      "name": "streamresponse",
      "params": {
        "auth_key": "string",
        "input": "string",
        "thread_id": "optional{string}"
      },
      "return_type": "stream{json{variant:streamvariant=string,content:string}}"
    },
    {
      "methods": [
        "get",
        "post"
      ],
    

## The ping response
The ping response is mainly intended for manual checking, but it can also be used to verify that the server's interface is as expected.
A ping response is a JSON object that contains the version, a list of Stream variants, and all endpoints.
The version follows [semver](https://semver.org/).

As can be seen in the evaluated code above, the ping response describes itself:
it reports its own version as a string and lists the endpoints, where each endpoint gives its name as a string, its methods as a list of strings, its return_type as a string, and its params as a JSON map.

It also gives a list of all StreamVariants the server supports. 
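
The structure described above can be read with a few lines of Python. This is a minimal sketch: the key names `version` and `streamvariants` are assumptions taken from the `return_type` string of the `ping` endpoint, and the sample payload is hand-written rather than real server output.

```python
import json

# Hand-written sample shaped like the ping response described above.
# The "version" and "streamvariants" keys are assumed from the return_type string.
sample_ping = """
{
  "version": "1.4.1",
  "streamvariants": ["User", "Assistant", "StreamEnd"],
  "endpoints": [
    {"methods": ["get"], "name": "ping", "params": {}, "return_type": "json"},
    {"methods": ["get"], "name": "getthread",
     "params": {"auth_key": "string", "thread_id": "string"},
     "return_type": "json"}
  ]
}
"""


def summarize_ping(text):
    """Return (version, variants, {endpoint name: sorted parameter names})."""
    data = json.loads(text)
    endpoints = {
        endpoint["name"]: sorted(endpoint.get("params", {}))
        for endpoint in data.get("endpoints", [])
    }
    return data.get("version"), data.get("streamvariants", []), endpoints


version, variants, endpoints = summarize_ping(sample_ping)
print(version)
print(variants)
print(endpoints)
```

A client could compare the resulting endpoint map against the endpoints and parameters it was written for before sending any real requests.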

## Stream variants
To give the client as much information about the conversation with the LLM as necessary, separate Stream variants are used to mark different events or participants of the conversation.

As of version 1.4.1, there are 11 Stream variants: Prompt, User, Assistant, Code, CodeOutput, Image, ServerError, OpenAIError, CodeError, StreamEnd, and ServerHint. The same list can be read by calling `/docs`.
In accordance with semver, changing the behaviour of a variant requires a major version bump, and adding a variant requires a minor version bump.
This means that as long as the major version of the backend matches the major version the client was written for, the client can expect its handling of the Stream variants to be correct (or at least sufficient, as it can ignore new variants).
All unknown Stream variants can be safely ignored by the client. 
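
The compatibility rule above can be sketched as a small check on the version string. The function names here are hypothetical and not part of the API; the sketch only assumes that the version has the `MAJOR.MINOR.PATCH` form defined by semver.

```python
def parse_semver(version):
    """Split 'MAJOR.MINOR.PATCH' into three ints, dropping any pre-release or build suffix."""
    core = version.split("+", 1)[0].split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def is_compatible(server_version, client_major):
    """A client written for client_major only needs the server's major version to match."""
    return parse_semver(server_version)[0] == client_major


print(is_compatible("1.4.1", 1))  # same major version
print(is_compatible("1.5.1", 1))  # newer minor version, new variants can be ignored
print(is_compatible("2.0.0", 1))  # major bump, variant behaviour may have changed
```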

These are the current Stream variants in Version 1.4.*:
- Prompt: The prompt the LLM got when it started the conversation. Mainly for debugging; should not be shown to the user outside of debugging. 
- User: The input of the user, as a String
- Assistant: The output of the Assistant, as a String. Often Markdown, because the LLM can output Markdown.
Multiple messages of this variant after each other belong to the same message, but are broken up due to the stream.
- Code: The code that the Assistant generated, as a String. It will be executed on the backend.
Currently, only Python is supported. The content is not formatted.
- CodeOutput: The output of the code that was executed, as a String. Also not formatted.
- Image: An image that was generated during the conversation, as a String. The image is Base64 encoded.
An example of this would be a matplotlib plot.
- ServerError: An error that occurred on the server (backend) side, as a String. Contains the error message.
The client should recognize that this error occurred and handle it accordingly; most ServerErrors are immediately followed by a StreamEnd.
- OpenAIError: An error that occurred on the OpenAI side, as a String. Contains the error message.
These are often caused by rate limits, but can also have other causes, e.g. the OpenAI API being down.
- CodeError: The Code from the LLM could not be executed or there was some other error while setting up the code execution.
- StreamEnd: The Stream ended. Contains a reason as a String. This is always the last message of a stream.
If the last message is not a StreamEnd but the stream ended, it's an error from the server side and needs to be fixed.
- ServerHint: The Server hints something to the client. This is primarily used for giving the thread_id.
The Content is in JSON format, with the key being the hint and the value being the content. Currently, only the keys "thread_id" and "warning" are used.
An example for a ServerHint packet would be `{"variant": "ServerHint", "content": "{\"thread_id\":\"1234\"}"}`.
That means that the content needs to be parsed as JSON to get the actual content.
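
The handling rules in the list above can be sketched as follows. This is an illustrative reduction, not the reference client: the function name is hypothetical, the packets are hand-written, and only ServerHint, Assistant, and StreamEnd are handled while every other variant (including unknown ones) is skipped.

```python
import json


def collapse_packets(packets):
    """Reduce decoded packets to (thread_id, assistant_messages, end_reason)."""
    thread_id = None
    messages = []    # one entry per run of consecutive Assistant packets
    previous = None  # variant of the packet handled before the current one
    for packet in packets:
        variant = packet.get("variant")
        content = packet.get("content")
        if variant == "ServerHint":
            # The content of a ServerHint is itself a JSON string.
            hint = json.loads(content)
            thread_id = hint.get("thread_id", thread_id)
        elif variant == "Assistant":
            if previous == "Assistant":
                messages[-1] += content  # continuation of the same message
            else:
                messages.append(content)
        elif variant == "StreamEnd":
            return thread_id, messages, content
        # Any other variant is ignored in this sketch.
        previous = variant
    return thread_id, messages, None  # no StreamEnd seen: a server-side error


sample_packets = [
    {"variant": "ServerHint", "content": "{\"thread_id\":\"1234\"}"},
    {"variant": "Assistant", "content": "Hel"},
    {"variant": "Assistant", "content": "lo"},
    {"variant": "SomeFutureVariant", "content": "ignored"},
    {"variant": "StreamEnd", "content": "Generation complete"},
]
print(collapse_packets(sample_packets))
```

Returning `None` as the end reason gives the caller a simple way to detect a stream that ended without a StreamEnd.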

## Authentication
To make sure that not everyone on the network can simply use the API, a basic authentication scheme is in place: the client must send, with every request, an auth_key that matches the one configured on the server.

Because this does little for overall security, a move to OAuth2 is being worked on. The frontend will then simply authenticate with its freva token, and the username will be retrieved automatically.

# API Endpoints

## /ping
`/ping` and `/help` allow the client to check whether the server is up and to get the version as well as a quick overview of the API.

## /docs
`/docs` is intended only for manual use, to look up how to use the API.

In [2]:
url = base_url + '/docs'

response = requests.get(url)
print(response.text)

Version: 1.5.1
# Stream Variants

The different variants of the stream or Thread that can be sent to the client.
They are always sent as JSON strings in the format `{"variant": "variant_name", "content": "content"}`.

User: The input of the user, as a String.

Assistant: The output of the Assistant, as a String. Often Markdown, because the LLM can output Markdown.
Multiple messages of this variant after each other belong to the same message, but are broken up due to the stream.

Code: The code that the Assistant generated, as a String. It will be executed on the backend.
Currently, only Python is supported. The content is not formatted.
Due to how the LLM calls the code_interpreter, it will be contained within a json object in the following format:
`{"variant": "Code", "content": "{\"code\":\"LLM Code here\"}"`

CodeOutput: The output of the code that was executed, as a String. Also not formatted.

Image: An image that was generated during the conversation, as a String. The image is Ba

## /getthread
`/getthread` retrieves a past thread by its thread_id. It requires both the thread_id and the auth_key as query arguments.

In [None]:
# Try to get a thread
# Read the AUTH_KEY from the environment variable AUTH_KEY

import os
from dotenv import load_dotenv
load_dotenv()
auth_key = os.getenv('AUTH_KEY')

auth_string = "&auth_key=" + auth_key
url = base_url + '/getthread?thread_id=1' + auth_string
response = requests.get(url)

print(response.text) # Thread ID was not found, because it wasn't created yet. The real thread_ids are strings, not integers. See below.

Thread not found.


## /streamresponse
The `/streamresponse` endpoint is the main part of the API. It takes the user's input as a string, the auth_key, and optionally the thread_id.

If the thread_id is given, the conversation is continued where that thread last left off. If it isn't given, a new thread is started. 
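
The optional thread_id can be handled when building the request URL. The helper below is a hypothetical sketch, not part of the API; it uses the parameter names `input`, `auth_key`, and `thread_id` from the ping output and percent-encodes the values so that spaces, quotes, or `&` in the user input cannot break the query string.

```python
from urllib.parse import quote, urlencode


def stream_url(base_url, user_input, auth_key, thread_id=None):
    """Build a /streamresponse URL; leaving out thread_id starts a new thread."""
    params = {"input": user_input, "auth_key": auth_key}
    if thread_id is not None:
        params["thread_id"] = thread_id  # continue an existing thread
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return base_url + "/streamresponse?" + urlencode(params, quote_via=quote)


base = "http://localhost:8502/api/chatbot"
print(stream_url(base, "Hi there", "secret"))
print(stream_url(base, "And again?", "secret", thread_id="abc"))
```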

In [4]:
# user_input = "This is a test. Please respond with \"200 OK\" and exit."
user_input = "This is a test. Use the code_interpreter function to calculate the product of 1286732 and 29843244 and return the result."
url = base_url + '/streamresponse?input=' + user_input + auth_string # leaving out the thread_id spawns a new thread

response = requests.get(url, stream=True) # The response can be streamed or gotten all at once.
complete_response = [] # The stream is consumed as it is read, so we store it here. Note that iterating over the response yields raw byte chunks, so a long packet might be split across several chunks.
for delta in response:
    print(delta)
    complete_response.append(delta.decode("utf-8"))

b'{"variant":"ServerHint","content":"{\\"thread_id\\": \\"IwbAgfDGwK7kx6rSN3jC6UeVzTsVnRdW\\"}"}'
b'{"variant":"Code","content":["","call_9b8D4q39OK0K0B0bKRvpheDd"]}'
b'{"variant":"Code","content":["{\\"","call_9b8D4q39OK0K0B0bKRvpheDd"]}'
b'{"variant":"Code","content":["code","call_9b8D4q39OK0K0B0bKRvpheDd"]}'
b'{"variant":"Code","content":["\\":\\"","call_9b8D4q39OK0K0B0bKRvpheDd"]}'
b'{"variant":"Code","content":["result","call_9b8D4q39OK0K0B0bKRvpheDd"]}'
b'{"variant":"Code","content":[" =","call_9b8D4q39OK0K0B0bKRvpheDd"]}'
b'{"variant":"Code","content":[" ","call_9b8D4q39OK0K0B0bKRvpheDd"]}'
b'{"variant":"Code","content":["128","call_9b8D4q39OK0K0B0bKRvpheDd"]}'
b'{"variant":"Code","content":["673","call_9b8D4q39OK0K0B0bKRvpheDd"]}'
b'{"variant":"Code","content":["2","call_9b8D4q39OK0K0B0bKRvpheDd"]}'
b'{"variant":"Code","content":[" *","call_9b8D4q39OK0K0B0bKRvpheDd"]}'
b'{"variant":"Code","content":[" ","call_9b8D4q39OK0K0B0bKRvpheDd"]}'
b'{"variant":"Code","content":["298","ca

In [5]:
import json
responses = []
for delta in complete_response:
    try:
        response = json.loads(delta)
        responses.append(response)
    except json.JSONDecodeError:
        print("Error decoding JSON: " + delta)

# Now we can simply access the content of the response.

# We saw that in this case the first delta was a ServerHint containing the thread_id in JSON format.
thread_id_content = responses[0]["content"] 
print(thread_id_content)
# Deserialize the JSON content of the ServerHint
thread_id = json.loads(thread_id_content)["thread_id"]
print(thread_id)

{"thread_id": "IwbAgfDGwK7kx6rSN3jC6UeVzTsVnRdW"}
IwbAgfDGwK7kx6rSN3jC6UeVzTsVnRdW
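
The decoding loop above relies on every chunk holding exactly one complete JSON object. A more defensive sketch buffers the chunks and extracts objects as soon as they are complete, using `json.JSONDecoder.raw_decode` from the standard library. It assumes that packets arrive back to back, optionally separated by whitespace; that is an assumption about the wire format, not something the server documents. Malformed data would stay in the buffer indefinitely in this sketch.

```python
import codecs
import json


def iter_json_objects(chunks):
    """Yield every complete JSON value found in an iterable of byte chunks."""
    decoder = json.JSONDecoder()
    utf8 = codecs.getincrementaldecoder("utf-8")()  # tolerates split multi-byte characters
    buffer = ""
    for chunk in chunks:
        buffer += utf8.decode(chunk)
        while buffer:
            buffer = buffer.lstrip()
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # incomplete object, wait for the next chunk
            yield obj
            buffer = buffer[end:]


# Hand-written chunks that split two packets at arbitrary positions.
sample_chunks = [
    b'{"variant":"Code","con',
    b'tent":"a"}{"variant":"Stream',
    b'End","content":"done"}',
]
for packet in iter_json_objects(sample_chunks):
    print(packet)
```

The same generator can be fed directly with the streamed `response` object from the cell above, since iterating over it yields byte chunks.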


In [6]:
# Now that we have the thread_id, we can also test the /getthread endpoint
url = base_url + '/getthread?thread_id=' + thread_id + auth_string
response = requests.get(url)

print(response.text) # This time we should get the thread content.
response_json = json.loads(response.text)

# Example: extract everything the Assistant said
assistant_messages = [message["content"] for message in response_json if message["variant"] == "Assistant"]
print(assistant_messages)

[{"variant":"ServerHint","content":"{\"thread_id\": \"IwbAgfDGwK7kx6rSN3jC6UeVzTsVnRdW\"}"},{"variant":"User","content":"This is a test. Use the code_interpreter function to calculate the product of 1286732 and 29843244 and return the result."},{"variant":"Code","content":["{\"code\":\"result = 1286732 * 29843244\\nresult\"}","call_9b8D4q39OK0K0B0bKRvpheDd"]},{"variant":"CodeOutput","content":["38400257038608","call_9b8D4q39OK0K0B0bKRvpheDd"]},{"variant":"Assistant","content":"The product of 1,286,732 and 29,843,244 is 38,400,257,038,608. If you have any more calculations or questions, feel free to ask!"},{"variant":"StreamEnd","content":"Generation complete"}]
['The product of 1,286,732 and 29,843,244 is 38,400,257,038,608. If you have any more calculations or questions, feel free to ask!']
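
In the output above, the content of a Code packet is a two-element list: a JSON string (or, while streaming, a fragment of one) and a call id. The sketch below joins the fragments per call id and then extracts the `code` field. This layout is inferred from the output shown in this notebook and may differ in other versions; the function name and the sample packets are made up for illustration.

```python
import json


def extract_code(packets):
    """Return {call_id: code} for all Code packets, joining streamed fragments."""
    fragments = {}
    for packet in packets:
        if packet.get("variant") != "Code":
            continue
        fragment, call_id = packet["content"]  # assumed layout: [text, call id]
        fragments[call_id] = fragments.get(call_id, "") + fragment
    # Each joined string is a JSON object of the form {"code": "..."}.
    return {call_id: json.loads(text)["code"] for call_id, text in fragments.items()}


# Hand-written packets mimicking streamed fragments of a single call.
sample_code_packets = [
    {"variant": "Code", "content": ["", "call_1"]},
    {"variant": "Code", "content": ['{"', "call_1"]},
    {"variant": "Code", "content": ["code", "call_1"]},
    {"variant": "Code", "content": ['":"', "call_1"]},
    {"variant": "Code", "content": ["x = 6 * 7", "call_1"]},
    {"variant": "Code", "content": ['"}', "call_1"]},
    {"variant": "StreamEnd", "content": "Generation complete"},
]
print(extract_code(sample_code_packets))
```

Because a thread returned by `/getthread` already contains the joined JSON string, the same function should also work on `response_json` from the cell above.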


## /stop
`/stop` is a simple endpoint that stops the generation of a running thread as soon as the server receives the request.

It takes the thread_id as well as the auth_key. It accepts both the `get` and the `post` method; they work identically.

Also note that because this is a test environment, the maximum length of the assistant's answer is comparatively short, so stopping a thread cannot be demonstrated well here.

In [7]:
# Short example of the /stop endpoint
url = base_url + '/stop?thread_id=' + thread_id + auth_string

response = requests.get(url) # could also be post
print(response.text) # The conversation was not found, because it already stopped. 

Conversation not found.
