# Checking whether the API is up
The `/ping` endpoint (`/help` is equivalent) gives an overview of whether the API is up, which version it is running, and what state it is in.

In [48]:
# Example use of the /ping endpoint
import requests

base_url = 'http://localhost:8502' # Change the URL to vader5 if you are running it against vader5.
url = base_url + '/ping'
response = requests.get(url)
print(response.text)

Version: 1.1.1
Streamvariants=User,Assistant,Code,CodeOutput,Image,ServerError,OpenAIError,CodeError,StreamEnd,ServerHint
ping:get,,String
docs:get,,String
getthread:get,thread_id=String&auth_key=String,Json{List{Variant:Streamvariant=String,Content:String}}
streamresponse:get,thread_id=Optional{String}&input=String&auth_key=String,Stream{Json{Variant:Streamvariant=String,Content:String}}
stop:post+get,thread_id=String&auth_key=String,



## Understanding the ping response
The ping response is largely for manual checking, but can also be used to verify that the interface to the server is as expected. 
A ping response starts with one line stating the version, which follows [semver](https://semver.org/),
followed by one line listing all Stream variants that the server can respond with. See the Stream variants section below.

All subsequent lines describe the API endpoints and how to use them. After the endpoint name, each line contains three values, separated by commas. First, the request methods that are accepted, e.g. "get", or "post+get" in the case of the `stop` endpoint.
Second, the input parameters as query parameters. The format is "key=type", where the type can for example be "String" or "Optional{String}"; "Optional" marks a parameter that can be omitted.
Third, the return type of that endpoint. The return types are explained in the separate endpoint explanations below.
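
The line format described above can be parsed mechanically. The following is a minimal sketch, not part of the API itself: the names `Endpoint` and `parse_ping` are hypothetical, and it assumes the layout matches the version 1.1.1 output shown above (version line, variants line, then one line per endpoint).

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str      # e.g. "stop"
    methods: list  # e.g. ["post", "get"]
    params: dict   # parameter name -> type string
    returns: str   # return type string, empty if nothing is returned


def parse_ping(text):
    """Parse a /ping response into (version, variants, endpoints)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    version = lines[0].split(":", 1)[1].strip()
    variants = lines[1].split("=", 1)[1].split(",")
    endpoints = []
    for line in lines[2:]:
        name, rest = line.split(":", 1)
        # Only the first two commas separate fields; the return type may contain commas.
        methods, params, returns = rest.split(",", 2)
        param_types = {}
        for pair in filter(None, params.split("&")):
            key, type_name = pair.split("=", 1)
            param_types[key] = type_name
        endpoints.append(Endpoint(name, methods.split("+"), param_types, returns))
    return version, variants, endpoints


# Sample taken from the /ping output of version 1.1.1.
sample = """Version: 1.1.1
Streamvariants=User,Assistant,Code,CodeOutput,Image,ServerError,OpenAIError,CodeError,StreamEnd,ServerHint
ping:get,,String
docs:get,,String
getthread:get,thread_id=String&auth_key=String,Json{List{Variant:Streamvariant=String,Content:String}}
streamresponse:get,thread_id=Optional{String}&input=String&auth_key=String,Stream{Json{Variant:Streamvariant=String,Content:String}}
stop:post+get,thread_id=String&auth_key=String,
"""

version, variants, endpoints = parse_ping(sample)
print(version, len(variants), [endpoint.name for endpoint in endpoints])
```

A client could compare the parsed endpoints against the ones it was written for and refuse to start if something it depends on is missing.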

## Stream variants
To give the client as much information about the conversation with the LLM as necessary, separate Stream variants are used to mark different events or participants of the conversation.

As of version 1.1.1, there are 10 Stream variants: User, Assistant, Code, CodeOutput, Image, ServerError, OpenAIError, CodeError, StreamEnd, ServerHint; as can also be read when running `/docs`.
Changing the behaviour of variants requires a major version bump and adding variants requires a minor version bump, in accordance with semver.
This means that as long as the major version of the backend matches the major version the client was written for, the client can expect its handling of the stream variants to be correct (or sufficient, as it might ignore new variants).
All unknown Stream variants can be safely ignored by the client. 
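
The compatibility rule above boils down to comparing major versions. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` version strings without pre-release suffixes; the function names are hypothetical:

```python
def parse_version(version):
    """Split a "MAJOR.MINOR.PATCH" string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(server_version, client_version):
    """True if the server has the major version the client was written for."""
    return parse_version(server_version)[0] == parse_version(client_version)[0]


def may_send_unknown_variants(server_version, client_version):
    """True if the server has a newer minor version and may therefore
    send variants the client does not know (and can safely ignore)."""
    server, client = parse_version(server_version), parse_version(client_version)
    return server[0] == client[0] and server[1] > client[1]


print(is_compatible("1.1.1", "1.0.0"))
print(is_compatible("2.0.0", "1.1.1"))
print(may_send_unknown_variants("1.2.0", "1.1.1"))
```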

These are the current Stream variants in Version 1.1.*:
- User: The input of the user, as a String
- Assistant: The output of the Assistant, as a String. Often Markdown, because the LLM can output Markdown.
Multiple messages of this variant after each other belong to the same message, but are broken up due to the stream.
- Code: The code that the Assistant generated, as a String. It will be executed on the backend.
Currently, only Python is supported. The content is not formatted.
- CodeOutput: The output of the code that was executed, as a String. Also not formatted.
- Image: An image that was generated during the conversation, as a String. The image is Base64 encoded.
An example of this would be a matplotlib plot.
- ServerError: An error that occurred on the server (backend) side, as a String. Contains the error message.
The client should notice that this error occurred and handle it accordingly; most ServerErrors are immediately followed by a StreamEnd.
- OpenAIError: An error that occurred on the OpenAI side, as a String. Contains the error message.
These are often caused by rate limits, but can also have other causes, e.g. if the API is down.
- CodeError: The Code from the LLM could not be executed or there was some other error while setting up the code execution.
- StreamEnd: The Stream ended. Contains a reason as a String. This is always the last message of a stream.
If the last message is not a StreamEnd but the stream ended, it's an error from the server side and needs to be fixed.
- ServerHint: The Server hints something to the client. This is primarily used for giving the thread_id.
The content is in the format `<key>:<value>`; for now, the only key is "thread_id" and the value is the thread_id.
ServerHint might be used for other things in the future. If the client receives a ServerHint with an unknown key, it should log a warning, but not crash.
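
The rules in this list translate into a small dispatch function on the client. The following is only a sketch of one possible client, not the reference implementation: `handle_message`, `parse_server_hint` and the layout of `state` are hypothetical, and the messages are assumed to be JSON objects with the lowercase keys `variant` and `content`, as in the `/docs` output.

```python
import json
import logging

logger = logging.getLogger("stream_client")


def parse_server_hint(content):
    """Split a ServerHint content string of the form "<key>:<value>"."""
    key, _, value = content.partition(":")
    return key, value


def handle_message(raw, state):
    """Apply one JSON-encoded stream message to `state`.

    Returns True once a StreamEnd was received, otherwise False.
    """
    message = json.loads(raw)
    variant, content = message["variant"], message["content"]
    if variant == "Assistant":
        # Consecutive Assistant messages belong to the same answer.
        state["assistant_text"] += content
    elif variant == "ServerHint":
        key, value = parse_server_hint(content)
        if key == "thread_id":
            state["thread_id"] = value
        else:
            logger.warning("Unknown ServerHint key: %s", key)
    elif variant in ("ServerError", "OpenAIError", "CodeError"):
        state["errors"].append((variant, content))
    elif variant == "StreamEnd":
        state["end_reason"] = content
        return True
    # User, Code, CodeOutput, Image and any unknown variants are ignored in this sketch.
    return False


state = {"assistant_text": "", "thread_id": None, "errors": [], "end_reason": None}
messages = [
    '{"variant":"ServerHint","content":"thread_id:3MNpwfe9rr9zFyDDJgAbgD8jAtaO8C23"}',
    '{"variant":"Assistant","content":"200"}',
    '{"variant":"FutureVariant","content":"ignored"}',
    '{"variant":"Assistant","content":" OK"}',
    '{"variant":"StreamEnd","content":"Generation complete"}',
]
for raw in messages:
    if handle_message(raw, state):
        break
print(state)
```

`FutureVariant` is a made-up name that stands in for a variant added in a later minor version; the sketch skips it without failing.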

## Authentication
To make sure that not everyone on the network can simply use the API, a simple authentication scheme is in place: the client has to send an auth_key with every request, and it has to match the one configured on the server.

Currently (version 1.1.1), for testing purposes, the auth_key on the server is set to "qA94VhroHMHFN55inWgfAAkt1WEmzQ4J". 
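
Because the auth_key travels as a query parameter on every request, it can be convenient to build all URLs through one helper that also URL-encodes the values. A minimal sketch using only the standard library; `build_url` is a hypothetical helper, not something the API provides:

```python
from urllib.parse import quote, urlencode


def build_url(base_url, endpoint, auth_key, **params):
    """Return base_url + endpoint with URL-encoded query parameters.

    Parameters set to None (e.g. an omitted thread_id) are left out.
    """
    query = {key: value for key, value in params.items() if value is not None}
    query["auth_key"] = auth_key
    # quote_via=quote encodes spaces as %20 instead of "+".
    return base_url + endpoint + "?" + urlencode(query, quote_via=quote)


url = build_url(
    "http://localhost:8502",
    "/getthread",
    "qA94VhroHMHFN55inWgfAAkt1WEmzQ4J",
    thread_id="1",
)
print(url)
```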

# API Endpoints

## /ping
`/ping` and `/help` allow the client to check whether the server is up and to get the version as well as a quick overview of the API.

## /docs
`/docs` is supposed to only be used manually to check how to use the API.

In [49]:
url = base_url + '/docs'

response = requests.get(url)
print(response.text)

Version: 1.1.1

# Stream Variants

The different variants of the stream or Thread that can be sent to the client.
They are always sent as JSON strings in the format `{"variant": "variant_name", "content": "content"}`.

User: The input of the user, as a String.

Assistant: The output of the Assistant, as a String. Often Markdown, because the LLM can output Markdown.
Multiple messages of this variant after each other belong to the same message, but are broken up due to the stream.

Code: The code that the Assistant generated, as a String. It will be executed on the backend.
Currently, only Python is supported. The content is not formatted.

CodeOutput: The output of the code that was executed, as a String. Also not formatted.

Image: An image that was generated during the conversation, as a String. The image is Base64 encoded.
An example of this would be a matplotlib plot.

ServerError: An error that occured on the server(backend) side, as a String. Contains the error message.
The client

## /getthread
`/getthread` retrieves a past thread by thread_id. It requires both the thread_id and the auth_key as query parameters.

In [50]:
# Try to get a thread
auth_key = "qA94VhroHMHFN55inWgfAAkt1WEmzQ4J"
auth_string = "&auth_key=" + auth_key
url = base_url + '/getthread?thread_id=1' + auth_string
response = requests.get(url)

print(response.text) # Thread ID was not found, because it wasn't created yet. The real thread_ids are strings, not integers. See below.

Thread not found.


## /streamresponse
The `/streamresponse` endpoint is the main part of the API. It takes the user's input as a string, the auth_key and optionally the thread_id.

If the thread_id is given, the conversation is continued where that thread last left off. If it isn't given, a new thread is started. 

In [51]:
user_input = "This is a test. Please respond with \"200 OK\" and exit."
params = {"input": user_input, "auth_key": auth_key} # leaving out the thread_id spawns a new thread
url = base_url + '/streamresponse'

response = requests.get(url, params=params, stream=True) # requests URL-encodes the parameters. The response can be streamed or fetched all at once.
complete_response = [] # The stream gets consumed when streamed, we'll store it here.
for delta in response:
    print(delta)
    complete_response.append(delta.decode("utf-8"))

b'{"variant":"ServerHint","content":"thread_id:3MNpwfe9rr9zFyDDJgAbgD8jAtaO8C23"}'
b'{"variant":"Assistant","content":""}'
b'{"variant":"Assistant","content":"200"}'
b'{"variant":"Assistant","content":" OK"}'
b'{"variant":"StreamEnd","content":"Generation complete"}'
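
In the run above, every chunk happened to contain exactly one JSON object. Iterating over a `requests` response yields raw byte chunks, which are not guaranteed to line up with message boundaries, so a long message could arrive in several pieces. The following sketch buffers the chunks and extracts complete objects as they become available. It assumes that the server sends the messages as back-to-back JSON objects without a delimiter, which the output above suggests but this document does not state; `iter_json_objects` is a hypothetical helper.

```python
import codecs
import json


def iter_json_objects(chunks):
    """Yield JSON objects from an iterable of byte chunks.

    A chunk may contain several objects, or only part of one.
    """
    decoder = json.JSONDecoder()
    # The incremental decoder copes with multi-byte characters split across chunks.
    utf8 = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        buffer += utf8.decode(chunk)
        while True:
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                # Incomplete object so far; wait for more data.
                # (Permanently malformed data would stay in the buffer.)
                break
            yield obj
            buffer = buffer[end:]


# Simulated chunks that do not align with message boundaries.
chunks = [
    b'{"variant":"Assistant","content":"200"}{"vari',
    b'ant":"Assistant","content":" OK"}',
    b'{"variant":"StreamEnd","content":"Generation complete"}',
]
parsed = list(iter_json_objects(chunks))
for message in parsed:
    print(message["variant"], repr(message["content"]))
```

With a live server, `iter_json_objects(response)` could be used in place of iterating over `response` directly.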


In [52]:
import json
responses = []
for delta in complete_response:
    try:
        response = json.loads(delta)
        responses.append(response)
    except json.JSONDecodeError:
        print("Error decoding JSON: " + delta)

# Now we can simply access the content of the response.

# We saw that in this case the first delta was a ServerHint containing the thread_id in the format "thread_id:THREAD_ID"
thread_id_content = responses[0]["content"] 
thread_id = thread_id_content.split(":", 1)[1].strip() # split only at the first colon
print("Thread ID: " + thread_id)

Thread ID: 3MNpwfe9rr9zFyDDJgAbgD8jAtaO8C23


In [53]:
# Now that we have the thread_id, we can also test the /getthread endpoint
url = base_url + '/getthread?thread_id=' + thread_id + auth_string
response = requests.get(url)

print(response.text) # This time we should get the thread content.
response_json = json.loads(response.text)

# Example: extract everything the Assistant said
assistant_messages = [message["content"] for message in response_json if message["variant"] == "Assistant"]
print(assistant_messages)

[{"variant":"ServerHint","content":"thread_id:3MNpwfe9rr9zFyDDJgAbgD8jAtaO8C23"},{"variant":"User","content":"This is a test. Please respond with \\\"200 OK\\\" and exit."},{"variant":"Assistant","content":"200 OK"},{"variant":"StreamEnd","content":"Generation complete"}]
['200 OK']
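
As the two `/getthread` calls show, the body is either a JSON list of messages or a plain-text message such as "Thread not found.". A client may want to tell these apart before indexing into the result. A minimal sketch with hypothetical helper names; it assumes that any body that is not a JSON list signals a failure:

```python
import json


def parse_thread(body):
    """Return the thread as a list of messages, or None if the body is not a JSON list."""
    try:
        messages = json.loads(body)
    except json.JSONDecodeError:
        return None  # e.g. the plain-text "Thread not found." reply
    return messages if isinstance(messages, list) else None


def assistant_text(messages):
    """Join the content of all Assistant messages of a thread."""
    return "".join(m["content"] for m in messages if m["variant"] == "Assistant")


# Shortened sample body in the same shape as the output above.
found = parse_thread(
    '[{"variant":"User","content":"Hi"},'
    '{"variant":"Assistant","content":"200 OK"},'
    '{"variant":"StreamEnd","content":"Generation complete"}]'
)
missing = parse_thread("Thread not found.")

print(assistant_text(found))
print(missing)
```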


## /stop
`/stop` is a simple endpoint that stops the generation of a running thread as soon as the server receives the request.

It takes the thread_id and the auth_key. It accepts both the `get` and the `post` method, which work identically.

Also note that because this is a test environment, the maximum length of the assistant's answer is comparatively short, so stopping a thread cannot be demonstrated well here.

In [54]:
# Short example of the /stop endpoint
url = base_url + '/stop?thread_id=' + thread_id + auth_string

response = requests.get(url) # could also be post
print(response.text) # The conversation was not found, because it already stopped. 

Conversation not found.
