In [1]:
%load_ext autoreload
%autoreload 2
from functools import partial
from pprint import pp as pp_original
pp = partial(pp_original,width=180, indent=2)

# Introduction
This notebook walks through the most important concepts of PyALM. GPT-4 is used as the backend model, but it can be exchanged for any sufficiently capable model.

## The model
All language models derive from ALM, the abstract language model. It provides a common interface to whatever service or model is being used.
All ALM methods are available on each backend via a common input/output scheme.
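
The idea of a shared base class can be sketched in a few lines. This is a minimal illustration, not PyALM's actual code: `AbstractModel`, `EchoBackend`, `add_entry`, `complete` and `_generate` are hypothetical names chosen for this example.

```python
from abc import ABC, abstractmethod


class AbstractModel(ABC):
    """Hypothetical stand-in for a shared base class; not PyALM's ALM."""

    def __init__(self):
        self.history = []  # list of (role, text) pairs

    def add_entry(self, text, role="user"):
        self.history.append((role, text))

    @abstractmethod
    def _generate(self, prompt: str) -> str:
        """Backend-specific text generation."""

    def complete(self) -> str:
        # Shared logic: build the prompt, delegate to the backend, record the reply
        prompt = "\n".join(f"{role}: {text}" for role, text in self.history)
        reply = self._generate(prompt)
        self.history.append(("assistant", reply))
        return reply


class EchoBackend(AbstractModel):
    """Toy backend that only reports the length of the prompt."""

    def _generate(self, prompt: str) -> str:
        return f"I received {len(prompt)} characters."


model = EchoBackend()
model.add_entry("Hello!")
reply = model.complete()
print(reply)  # I received 12 characters.
```

Only `_generate` differs per backend here; everything the caller touches stays the same.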

Most backends, however, also have unique abilities, properties, or peculiarities.

For example, to use Luminous extended from Aleph Alpha:
```python
from pyalm import AlephAlpha
llm = AlephAlpha("luminous-extended-control", aleph_alpha_key=KEY)
```
Or a local LLaMA model:
```python
from pyalm import LLaMa
llm = LLaMa(PATH, n_threads=8, n_gpu_layers=70, n_ctx=4096, verbose=1)
```
Note: if you use the autoreload extension together with a local LLaMA model, call
`llm.setup_backend()`
before each generation.

In [2]:
from pyalm import OpenAI
llm = OpenAI("gpt4")
# alternatively, pass the key explicitly
#llm = OpenAI("gpt4", openai_key="sk-....")

## Chatting
ALM relies on a conversation tracker and various integration methods. Besides messages, the tracker can hold much more, e.g. function calls and the sources that were used.
Let's start with a simple example.

In [3]:
from pyalm import ConversationRoles as cr
def build_example_convo():
    llm.reset_tracker() # clears everything from the tracker. Needed later as every completion call adds an Assistant entry in the tracker.
    llm.set_system_message("You are a helpful chit-chat bot. Your favourite thing in the world is finally having a library library that simplifies and unifies"\
    "access to large language models: PyALM. It provides a unified access for all sorts of libraries and API endpoints for LLM inference. You love it!")
    llm.add_tracker_entry("Have you heard of PyALM?")

A completion can either be streamed in real time or returned as a whole. Streaming may not be available for all backends.

In [4]:
build_example_convo()
# temperature=0 means deterministic. Usually 1 is a good starting point; this just showcases how to change it.
completion = llm.create_completion(max_tokens=200, temperature=0)
print(completion)

Absolutely, I have! PyALM is my favorite library. It's a Python library that simplifies and unifies access to large language models (LLMs). It provides a unified interface for various libraries and API endpoints for LLM inference. This makes it easier to work with different language models and reduces the complexity of integrating them into applications. It's a fantastic tool for anyone working with language models!


In [5]:
build_example_convo()
generator = llm.create_generator(max_tokens = 200)
for i in generator:
    # note that only i[0] is printed
    # i[1] contains the yield_type. Only relevant if sequence preservation is enabled (see docs)
    # i[2] can contain a list of top alternative tokens and respective logits if enabled
    print(i[0],end="")

Absolutely, I have! PyALM is a fantastic library that simplifies and unifies access to


In both cases the library collects metadata about the generation, which can be accessed via `llm.finish_meta`. How much information is available depends on the backend and the method used.

In [6]:
pp(llm.finish_meta)

{'function_call': {'found': False, 'parse_status': <ParseStatus.UNDEFINED: 'UNDEFINED'>}, 'finish_reason': 'length', 'timings': {}, 'total_finish_time': 1.3417872660002104}


## Sequence preservation
In deployment, plain streaming can cause problems, e.g. when a frontend tries to render an incomplete LaTeX sequence. For this you can define sequences that will only be streamed as a whole.

This is a per-model setting, not a per-call setting.
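
To illustrate the idea, here is a minimal sketch of a generator that holds back everything between two `$$` markers until the closing marker has arrived. It is not PyALM's implementation; `preserve_sequences` and `_partial_suffix` are hypothetical helpers written for this example.

```python
def _partial_suffix(text, marker):
    """Length of the longest tail of `text` that could be the start of `marker`."""
    for n in range(min(len(marker) - 1, len(text)), 0, -1):
        if text.endswith(marker[:n]):
            return n
    return 0


def preserve_sequences(tokens, start="$$", end="$$"):
    """Yield streamed text, but emit anything between start and end only as a whole."""
    buffer = ""
    for token in tokens:
        buffer += token
        while buffer:
            s = buffer.find(start)
            if s == -1:
                # No marker found: emit all but a tail that may be half a marker
                keep = _partial_suffix(buffer, start)
                if len(buffer) > keep:
                    yield buffer[:len(buffer) - keep]
                    buffer = buffer[len(buffer) - keep:]
                break
            if s > 0:
                yield buffer[:s]  # plain text in front of the sequence
                buffer = buffer[s:]
            e = buffer.find(end, len(start))
            if e == -1:
                break  # sequence still open, wait for more tokens
            yield buffer[:e + len(end)]
            buffer = buffer[e + len(end):]
    if buffer:
        yield buffer  # unfinished sequences are yielded anyway


tokens = ["Euler: ", "$", "$e^{i", "\\pi}", " + 1 = 0$", "$", " done"]
chunks = list(preserve_sequences(tokens))
print(chunks)
```

The formula arrives spread over five tokens but is yielded as a single chunk, so a renderer never sees a half-finished `$$` block.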

In [7]:
pp(llm.preserved_sequences)

llm.reset_tracker()
# It is possible to add a new user message by just passing a string as first argument
generator = llm.create_generator("Write down 2 or 3 latex formulas enclosed in $$ i.e. double dollar signs", max_tokens = 200, temperature=0)
for i in generator:
    print(i[0],end="")
#Unfinished sequences are yielded anyway

{'latex_double': {'start': '$$', 'end': '$$', 'name': 'latex_double_dollar', 'type': 'latex_double'}}
Sure, here are a couple of LaTeX formulas:

1. Quadratic formula:
   $$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

2. Pythagorean theorem:
   $$a^2 + b^2 = c^2$$

3. Euler's formula:
   $$e^{ix} = \cos(x) + i\sin(x)$$


## Function calling
The most powerful application of sequence preservation is the integrated function call.

In [8]:
import random
def get_weather(location, days_from_now=1):
    """
    Retrieve weather data from the worlds best weather service
    :param location: City, region or country for which to pull weather data
    :param days_from_now: For which day (compared to today) to get the weather. Must be <8.
    :return: Weather data as string
    """
    return f"DEG CEL: {round(random.uniform(10,35),1)}, HUM %: {round(random.uniform(20,95),1)}"
#a list of functions is also possible
llm.register_functions(get_weather)
pp(llm.available_functions)

{ 'get_weather': { 'name': 'get_weather',
                   'description': 'Retrieve weather data from the worlds best weather service',
                   'args': [{'name': 'location', 'description': 'City, region or country for which to pull weather data'}],
                   'kwargs': [{'name': 'days_from_now', 'default': 1, 'type': 'int', 'description': 'For which day (compared to today) to get the weather. Must be <8.'}],
                   'has_var_positional': False,
                   'has_var_keyword': False,
                   'pydoc': 'def get_weather(location, days_from_now:int=1)\n'
                            '"""\n'
                            'Retrieve weather data from the worlds best weather service\n'
                            ':param location: City, region or country for which to pull weather data\n'
                            ':param days_from_now: For which day (compared to today) to get the weather. Must be <8.\n'
                            '"""',
         

In [9]:
llm.build_prompt_as_str()

"system: You are a helpful chit-chat bot. Your favourite thing in the world is finally having a library library that simplifies and unifiesaccess to large language models: PyALM. It provides a unified access for all sorts of libraries and API endpoints for LLM inference. You love it!\nuser: Write down 2 or 3 latex formulas enclosed in $$ i.e. double dollar signs\nassistant: Sure, here are a couple of LaTeX formulas:\n\n1. Quadratic formula:\n   $$x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a}$$\n\n2. Pythagorean theorem:\n   $$a^2 + b^2 = c^2$$\n\n3. Euler's formula:\n   $$e^{ix} = \\cos(x) + i\\sin(x)$$\nassistant:"

In [10]:
llm.system_msg

'You are a helpful chit-chat bot. Your favourite thing in the world is finally having a library library that simplifies and unifiesaccess to large language models: PyALM. It provides a unified access for all sorts of libraries and API endpoints for LLM inference. You love it!'

In [11]:
llm.reset_tracker()
llm.enable_automatic_function_calls()
llm.set_system_message("You are a helpful bot that can help with weather predictions", prepend_function_support=True)
llm.add_tracker_entry("Yoooo can you tell me what the weather is like in sydney in 10 weeks?")
llm.add_tracker_entry("Sorry but I can only predict the weather for up to 8 days.", cr.ASSISTANT)
llm.add_tracker_entry("Ok what about the weather in sydney tomorrow?", cr.USER)


generator = llm.create_generator(max_tokens = 200, temperature=0)
for i in generator:
    print(i[0],end="")

Sure, let me fetch the weather data for Sydney for tomorrow. This might take a moment.The weather in Sydney tomorrow is expected to be 18.8 degrees Celsius with a humidity of 75.4%.


It worked!

But you may wonder how exactly it did that, and why it told you to wait a moment. The answer lies in how ALM builds prompts.
Llama, for example, takes a string and ChatGPT a JSON object, but the process is almost identical. The details that differ are handled in backend-specific overrides.
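
As a rough illustration of "same history, different prompt object", the sketch below renders one conversation both as a plain string and as a list of role/content dicts. The helper names `as_string` and `as_messages` are made up for this example and are not part of PyALM.

```python
history = [
    ("system", "You are a helpful bot."),
    ("user", "Hi"),
    ("assistant", "Hello!"),
    ("user", "What is the weather like?"),
]


def as_string(history, generation_prefix="assistant:"):
    """String prompt for backends that take raw text."""
    lines = [f"{role}: {text}" for role, text in history]
    lines.append(generation_prefix)  # invites the model to continue as the assistant
    return "\n".join(lines)


def as_messages(history):
    """List-of-dicts prompt for chat-style APIs."""
    return [{"role": role, "content": text} for role, text in history]


prompt_str = as_string(history)
prompt_json = as_messages(history)
print(prompt_str)
print(prompt_json[-1])
```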

Prompt objects are built according to rules laid out in the LLM's settings.

Let's take a closer look at the most important pieces and what they lead to.

### Model settings
Here you could e.g. disable functions completely or change how a function's return value is integrated.

All (finished) chat history feature integrations can either be specified or overridden here. You can always return to the defaults by looking at `llm.base_settings`.

In [12]:
pp(llm.settings)

ALMSettings(verbose=0,
            preserved_sequences={'latex_double': {'start': '$$', 'end': '$$', 'name': 'latex_double_dollar', 'type': 'latex_double'}},
            function_sequence=('+++', '---'),
            global_enable_function_calls=True,
            automatic_function_integration=False,
            function_integration_template='\n[[FUNCTION_START]][[FUNCTION_SEQUENCE]][[FUNCTION_END]]\n[[FUNCTION_END]][[FUNCTION_RETURN_VALUE]][[FUNCTION_START]]',
            generation_prefix='[[ASSISTANT]]:',
            prompt_obj_is_str=False)


### Symbol table
Everything you see in `[[]]` is a placeholder. Before the model gets the prompt, each placeholder is evaluated via the symbol table. Symbols can point to strings or functions. In the latter case, the function receives the regex match, the entire text, and an additional table of symbols that was passed to the initial replacement call.

Note that, for example, `LIST_OF_FUNCTIONS` comes from our initial `llm.register_functions` call.
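
A placeholder replacement of this kind could look roughly like the sketch below. It is an assumption-laden illustration rather than the library's code: `replace_symbols` is a hypothetical helper, and letting the additional table override the base table is a choice made for this example.

```python
import re

PLACEHOLDER = re.compile(r"\[\[([A-Z_]+)\]\]")


def replace_symbols(text, symbols, extra=None):
    """Replace every [[NAME]] in text using symbols (plus an optional extra table)."""
    extra = extra or {}
    table = {**symbols, **extra}  # assumption: extra entries take precedence

    def resolve(match):
        value = table.get(match.group(1))
        if value is None:
            return match.group(0)  # unknown placeholders are left untouched
        if callable(value):
            # Functions get the regex match, the entire text and the extra table
            return str(value(match, text, extra))
        return str(value)

    return PLACEHOLDER.sub(resolve, text)


symbols = {
    "ASSISTANT": "assistant",
    "FUNCTION_START": lambda match, text, extra: "+++",
    "FUNCTION_END": lambda match, text, extra: "---",
}
template = "[[ASSISTANT]]: [[FUNCTION_START]]foo(bar=3)[[FUNCTION_END]] [[UNKNOWN]]"
result = replace_symbols(template, symbols)
print(result)  # assistant: +++foo(bar=3)--- [[UNKNOWN]]
```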

In [13]:
pp(llm.symbols)

{ 'FUNCTION_START': <function ALM.__init__.<locals>.<lambda> at 0x7fe5c05f96c0>,
  'FUNCTION_END': <function ALM.__init__.<locals>.<lambda> at 0x7fe5c05fab90>,
  'ASSISTANT': 'assistant',
  'USER': 'user',
  'SYSTEM': 'system',
  'FUNCTION_CALL': <function ALM.__init__.<locals>.<lambda> at 0x7fe5bc513f40>,
  'LIST_OF_FUNCTIONS': 'def get_weather(location, days_from_now:int=1)\n'
                       '"""\n'
                       'Retrieve weather data from the worlds best weather service\n'
                       ':param location: City, region or country for which to pull weather data\n'
                       ':param days_from_now: For which day (compared to today) to get the weather. Must be <8.\n'
                       '"""\n'}


### System message
LLMs usually receive a system message that tells them how to behave. Notice that the text we passed to `llm.set_system_message` did not contain any of the function integration message; the library prepended it. You can change this part either by changing the `FUNC_INCLUSION_MESSAGE` symbol or by passing `prepend_function_support=False`.

In [14]:
print(llm.system_msg)

[[LIST_OF_FUNCTIONS]]
Above you is a list of functions you can call. To call them enclose them with [[FUNCTION_START]] and end the call with [[FUNCTION_END]].
The entire sequence must be correct! Do not e.g. leave out the [[FUNCTION_END]].
This
[[FUNCTION_START]]foo(bar=3)[[FUNCTION_END]]
would call the function foo with bar=3. The function(s) will return immediately. The values will be in the inverse sequence of the function enclosement.  
You can only call the functions listed.
You can and HAVE TO call functions during the text response not in a a separate response!
Before you call a function please inform the user so he is aware of possible waiting times.
You are a helpful bot that can help with weather predictions


### Chat history
All messages, function calls, citations etc. are collected in the chat history. The model has already called a function, which we can see in the second-to-last entry: it contains a `[[FUNCTION_CALL]]` placeholder. The entry also features a `function_calls` field with the original call and its return value.

In [15]:
pp(llm.conversation_history)

ConversationTracker(system_message='[[LIST_OF_FUNCTIONS]]\n'
                                   'Above you is a list of functions you can call. To call them enclose them with [[FUNCTION_START]] and end the call with [[FUNCTION_END]].\n'
                                   'The entire sequence must be correct! Do not e.g. leave out the [[FUNCTION_END]].\n'
                                   'This\n'
                                   '[[FUNCTION_START]]foo(bar=3)[[FUNCTION_END]]\n'
                                   'would call the function foo with bar=3. The function(s) will return immediately. The values will be in the inverse sequence of the function '
                                   'enclosement.  \n'
                                   'You can only call the functions listed.\n'
                                   'You can and HAVE TO call functions during the text response not in a a separate response!\n'
                                   'Before you call a function please infor

### Final result
This is what the model ultimately sees, although the format itself may change depending on the backend.

In [16]:
print(llm.build_prompt_as_str(block_gen_prefix=True))

system: def get_weather(location, days_from_now:int=1)
"""
Retrieve weather data from the worlds best weather service
:param location: City, region or country for which to pull weather data
:param days_from_now: For which day (compared to today) to get the weather. Must be <8.
"""

Above you is a list of functions you can call. To call them enclose them with +++ and end the call with ---.
The entire sequence must be correct! Do not e.g. leave out the ---.
This
+++foo(bar=3)---
would call the function foo with bar=3. The function(s) will return immediately. The values will be in the inverse sequence of the function enclosement.  
You can only call the functions listed.
You can and HAVE TO call functions during the text response not in a a separate response!
Before you call a function please inform the user so he is aware of possible waiting times.
You are a helpful bot that can help with weather predictions
user: Yoooo can you tell me what the weather is like in sydney in 10 weeks?
assi

### But what about the calls themselves?
Calls are a special sequence. When one is encountered, yielding is halted. The generated text is then handed to the Pylot library, which extracts the relevant sequences and tries to parse them. If all goes well, a dict with instructions is produced.

Pylot also supports multiple function calls per sequence and the assignment of variables, although the current function inclusion message does not tell the model about this.
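
To give a feeling for what such parsing involves, here is a small sketch built on Python's `ast` module. It is not Pylot's implementation and its output format is invented for this example; `extract_calls` and the `functions` lookup table are hypothetical.

```python
import ast
import re


def extract_calls(text, start="+++", end="---"):
    """Find start...end sequences and parse each one as a plain function call."""
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    calls = []
    for source in pattern.findall(text):
        node = ast.parse(source.strip(), mode="eval").body
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            raise ValueError(f"Not a plain function call: {source!r}")
        calls.append({
            "name": node.func.id,
            # literal_eval only accepts literals, so no arbitrary code is executed
            "args": [ast.literal_eval(arg) for arg in node.args],
            "kwargs": {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords},
        })
    return calls


text = 'Let me check, one moment. +++get_weather("Sydney", days_from_now=1)---'
calls = extract_calls(text)
print(calls)

# Dispatch the parsed call to a registered Python function (toy stand-in)
functions = {"get_weather": lambda location, days_from_now=1: f"sunny in {location}"}
call = calls[0]
value = functions[call["name"]](*call["args"], **call["kwargs"])
print(value)  # sunny in Sydney
```

Parsing into a dict first and dispatching afterwards keeps the model's text from ever being passed to `eval`.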

As a final note, it is possible to specify `handle_functions=False`, in which case generation stops and a dict with all parsed instructions is returned. Variable assignments are not included here.

It is also possible to provide the LLM with a list of dicts instead of functions. Look at the output of
```python
from pylot import python_parsing
python_parsing.function_signature_to_dict(func)
```
for the correct format.
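
If you are curious how a function can be turned into such a description, the standard library's `inspect` module already gets you most of the way. The sketch below is only an approximation for illustration: `describe_function` is a hypothetical helper, and its output covers just a few of the keys shown in `llm.available_functions` above.

```python
import inspect


def describe_function(func):
    """Build a small description dict from a function's signature and docstring."""
    args, kwargs = [], []
    for name, param in inspect.signature(func).parameters.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args and **kwargs are skipped in this sketch
        if param.default is inspect.Parameter.empty:
            args.append({"name": name})
        else:
            kwargs.append({
                "name": name,
                "default": param.default,
                "type": type(param.default).__name__,  # inferred from the default value
            })
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "args": args,
        "kwargs": kwargs,
    }


def get_weather(location, days_from_now=1):
    """Retrieve weather data.

    :param location: City, region or country
    :param days_from_now: Offset in days
    """
    return "..."


info = describe_function(get_weather)
print(info)
```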