# promptx

A framework for building AI systems.

```bash
pip install pxx
```

```python
from promptx import prompt

character = 'Batman'
prompt(f'Write a character profile for {character}')
```

By default, this returns a plain string response, but to generate complex data you can pass in the expected schema along with the prompt input.

*Note: `Entity` is a thin layer on top of `pydantic.BaseModel` that allows the object to be stored as an embedding. You can use `pydantic.BaseModel` directly if you don't need to store the object as an embedding and just want to use it as the prompt output schema.*

```python
from pydantic import Field
from promptx.collection import Entity

class Character(Entity):
    # No trailing comma here: it would turn the default into a tuple.
    name: str = Field(..., embed=False)
    description: str = Field(..., description='Describe the character in a few sentences')
    age: int = Field(..., ge=0, le=120)

batman = prompt('Generate a character profile for Batman', output=Character)
batman
```

This returns an instance of the specified schema, populated from the generated response. Let's generate a list of characters instead.

```python
characters = prompt(
    'Generate some characters from the Batman universe',
    output=[Character],
)

characters
```

If the output is a list, `prompt` returns a `Collection`, which extends `pd.DataFrame`. To extract the `Entity` representations, use the `objects` property.
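
As a rough picture of how a structured response becomes schema instances, here is a minimal sketch using only the standard library. It is not promptx's implementation (which builds on pydantic); `CharacterSketch` and `parse_response` are hypothetical names, and the response string is hand-written rather than generated.

```python
import json
from dataclasses import dataclass


@dataclass
class CharacterSketch:
    """Hypothetical stand-in for the `Character` schema (stdlib only)."""
    name: str
    description: str
    age: int

    def __post_init__(self):
        # Mirror the `ge=0, le=120` constraint from the schema above.
        if not 0 <= self.age <= 120:
            raise ValueError(f'age out of range: {self.age}')


def parse_response(raw, schema):
    """Turn a JSON response (an object or a list of objects) into schema instances."""
    data = json.loads(raw)
    if isinstance(data, list):
        return [schema(**item) for item in data]
    return schema(**data)


# A hand-written string standing in for a model response.
raw = '[{"name": "Alfred", "description": "Loyal butler.", "age": 70}]'
characters_sketch = parse_response(raw, CharacterSketch)
print(characters_sketch[0].name)
```

A response that violates the schema (for example an `age` of 500) raises an error instead of producing a half-valid object, which is the main benefit of passing a schema rather than parsing strings by hand.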

We can now store these generated objects as embeddings in a collection.

```python
from promptx import store

store(*characters.objects)
```

This stores each object as an embedding, along with some metadata, in a vector database (ChromaDB by default). The process is simple: it embeds the whole object as a JSON string, and also embeds each field individually. This allows us to query the database using any field in the object.

```python
from promptx import query

query()
```
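
To make the storage step described above more concrete, the sketch below lists the pieces of text that would be embedded for one object: the whole object as a JSON string, plus one entry per field. This is an illustration under assumptions, not promptx's code. `embedding_documents` and the `skip` argument (standing in for `embed=False`) are hypothetical, and no actual embedding model or vector database is involved.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CharacterSketch:
    """Hypothetical stand-in for the `Character` entity."""
    name: str
    description: str
    age: int


def embedding_documents(obj, skip=()):
    """Return (key, text) pairs: one for the whole object, one per field."""
    data = asdict(obj)
    # The full object is serialized once as a JSON string.
    docs = [('__object__', json.dumps(data, sort_keys=True))]
    # Each field is also kept as its own document, unless it is skipped.
    for field_name, value in data.items():
        if field_name in skip:
            continue
        docs.append((field_name, str(value)))
    return docs


batman_sketch = CharacterSketch('Batman', 'A vigilante from Gotham.', 35)
docs = embedding_documents(batman_sketch, skip=('name',))
for key, text in docs:
    print(key, '->', text)
```

Keeping a separate document per field is what lets a query match on, say, the description alone, at the cost of storing several vectors per object.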

Now let's generate some more characters and add them to the collection. We'll first get any existing characters and extract their names, which we pass to the prompt to avoid generating duplicates. Any characters generated are added to the list during iteration. Finally, we'll store all the generated characters in the collection.

```python
n = 3
characters = query().objects

for _ in range(n):
    characters += prompt(
        '''
        Generate a list of new characters from the Batman universe.
        Don't use any of the existing characters.
        ''',
        input={
            'existing_characters': [c.name for c in characters],
        },
        output=[Character],
    ).objects

store(*characters)
query()
```

Now that the characters are embedded, we can query the collection.

```python
villains = query('they are a villain')
villains
```

This compares the query text with the stored objects, returning results that are closest in vector space.

*Note: the effectiveness of embedding queries will depend on what data has been embedded. In this case, ChatGPT will know some details about the generated characters and so does a decent job on this data. For other data, you may find generating synthetic intermediary data to be helpful. E.g. generating `thoughts` and/or `quotes` about a set of documents.*
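
The idea of "closest in vector space" can be sketched with a few toy vectors. The example below ranks stored items by cosine similarity to a query vector. The three-dimensional vectors are made up for illustration, and cosine similarity is an assumption here: the metric actually used depends on how the vector database is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vector, stored, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Made-up embeddings; real ones have hundreds or thousands of dimensions.
stored = {
    'Joker': [0.9, 0.1, 0.0],
    'Alfred': [0.1, 0.9, 0.2],
    'Bane': [0.8, 0.2, 0.1],
}
query_vector = [1.0, 0.0, 0.0]  # stands in for the embedding of the query text

top = nearest(query_vector, stored, k=2)
print(top)  # ['Joker', 'Bane']
```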

Because `Collection` extends `pd.DataFrame`, we can use all the usual Pandas methods to filter and sort the results.

```python
villains[villains.age < 30]
```

Relationships can be defined by setting a field's type to a subclass of `Entity` (or a list of that type). Internally, the relationship is stored as a query, which is run to load the related objects from the database when the field is accessed.

```python
class StoryIdea(Entity):
    title: str
    description: str = None
    characters: list[Character] = None

characters = query('they are a villain').sample(3).objects

ideas = prompt(
    'Generate some story ideas',
    input={
        'characters': characters,
    },
    output=[StoryIdea],
).objects

for idea in ideas:
    idea.characters = characters

store(*ideas, collection='story-ideas')
query(collection='story-ideas')
```

Note that the output is stored in a collection called `story-ideas`, which is created if it doesn't exist. Until now, everything we've stored has gone into the `default` collection.

*Collections are widely used internally to represent stored models, templates, prompt history, etc. This provides a consistent interface for accessing and manipulating data.*
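
The relationship loading described earlier, where a related field is stored as a query and resolved on access, can be pictured with a lazily evaluated property. This is only a sketch of the general pattern, not promptx's internals: `FakeDatabase`, `StoryIdeaSketch`, and the shape of the stored reference are all hypothetical.

```python
class FakeDatabase:
    """Hypothetical in-memory store keyed by collection name, then by id."""

    def __init__(self):
        self.collections = {}
        self.reads = 0  # counts lookups so the laziness is observable

    def put(self, collection, id_, obj):
        self.collections.setdefault(collection, {})[id_] = obj

    def get_many(self, collection, ids):
        self.reads += 1
        return [self.collections[collection][i] for i in ids]


class StoryIdeaSketch:
    """Holds a reference to related characters instead of the objects themselves."""

    def __init__(self, db, title, character_ids):
        self._db = db
        self.title = title
        # Only a lightweight "query" is kept: where to look and which ids.
        self._character_ref = {'collection': 'default', 'ids': list(character_ids)}
        self._characters = None

    @property
    def characters(self):
        # Resolve the reference on first access, then reuse the result.
        if self._characters is None:
            ref = self._character_ref
            self._characters = self._db.get_many(ref['collection'], ref['ids'])
        return self._characters


db = FakeDatabase()
db.put('default', 'c1', {'name': 'Joker'})
db.put('default', 'c2', {'name': 'Bane'})

idea_sketch = StoryIdeaSketch(db, 'Gotham Blackout', ['c1', 'c2'])
reads_before = db.reads          # nothing has been loaded yet
names = [c['name'] for c in idea_sketch.characters]
print(names, db.reads)
```

Storing a reference rather than a copy means the related objects are not duplicated in every collection that points at them, and they are only fetched when something actually reads the field.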

So far we've used the default model (GPT-3.5) when generating data, but you can specify a custom model using the `llm=` parameter.

```python
from promptx.models.openai import ChatGPT

gpt4 = ChatGPT(id='gpt4', model='gpt4')

characters = prompt(
    'Generate some characters from the Batman universe',
    output=[Character],
    llm=gpt4,
)
```

You can define commonly used models, templates, and other settings by creating a `config.py` file in the root of the project (i.e. adjacent to the `.px/` directory). This file is loaded when the project is initialized, and it is expected to define a `setup` function. Here's a simple example that defines a custom model and stores it in the `models` collection.

```python
# ./config.py

from promptx.models.openai import ChatGPT

gpt4 = ChatGPT(id='gpt4', model='gpt4')

def setup(session):
    session.store(gpt4, collection='models')
```
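
For readers curious how a file like this might be picked up at initialization, the sketch below imports a `config.py` by path and calls its `setup` function. It illustrates the general mechanism with the standard library only and does not claim to match promptx's loader. `load_config` and `FakeSession` are hypothetical, and the config file is written to a temporary directory so the example runs on its own.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_config(path, session):
    """Import the file at `path` as a module and call its `setup(session)`."""
    spec = importlib.util.spec_from_file_location('project_config', path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    setup = getattr(module, 'setup', None)
    if not callable(setup):
        raise AttributeError(f'{path} must define a setup(session) function')
    setup(session)
    return module


class FakeSession:
    """Hypothetical session that just records what was stored."""

    def __init__(self):
        self.stored = []

    def store(self, *items, collection='default'):
        self.stored.extend((collection, item) for item in items)


with tempfile.TemporaryDirectory() as root:
    config_path = Path(root) / 'config.py'
    # A placeholder string stands in for a real model object.
    config_path.write_text(
        "def setup(session):\n"
        "    session.store('gpt4-placeholder', collection='models')\n"
    )
    session = FakeSession()
    load_config(config_path, session)

print(session.stored)  # [('models', 'gpt4-placeholder')]
```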
