19 changes: 9 additions & 10 deletions docs/README.md
# Typeagent Docs

## Basics

- [Getting Started](getting-started.md)
- [High-level API](high-level-api.md)
- [Environment Variables](env-vars.md)

## Advanced

- [conversation.query](query-method.md)

### Other

- [Architecture Design](typeagent-architecture.md)
- [Reproducing the Demos](demos.md)
- [Downloading GMail Messages](gmail.md)
- [Developing and Contributing](developing.md)
3 changes: 3 additions & 0 deletions docs/demos.md
# How to Reproduce the Demos

This will be revealed after [PyBay 2025](https://pybay.org/).
12 changes: 12 additions & 0 deletions docs/developing.md
# Developing and Contributing

**Always follow the [Code of Conduct](../CODE_OF_CONDUCT.md).**

To contribute, submit issues or PRs to
[our repo](https://github.com/microsoft/typeagent-py).

To develop, for now you're on your own.
We use [uv](https://docs.astral.sh/uv/) for some things.
Check out the [Makefile](../Makefile) for some recipes.

More TBD.
44 changes: 44 additions & 0 deletions docs/env-vars.md
# Environment Variables

Virtually no LLM-using application works without API tokens or other
authentication secrets, which are almost always passed via environment
variables.

Typeagent currently supports two families of environment variables:

- Those for (public) OpenAI servers.
- Those for the Azure OpenAI service.

## OPENAI environment variables

The (public) OpenAI environment variables include:

- `OPENAI_API_KEY`: Your secret API key that you get from the
[OpenAI dashboard](https://platform.openai.com/api-keys).
- `OPENAI_MODEL`: An environment variable introduced by
[TypeChat](https://microsoft.github.io/TypeChat/docs/examples/)
indicating the model to use (e.g. `gpt-4o`).

## Azure OpenAI environment variables

If you are using the OpenAI service hosted by Azure, you need different
environment variables, starting with:

- `AZURE_OPENAI_API_KEY`: Your Azure OpenAI API key.
- `AZURE_OPENAI_ENDPOINT`: The full URL of the Azure OpenAI REST API
  (e.g. `https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2023-05-15`).
- If you use Azure OpenAI, you will know where to get these
  (or ask your sysadmin).

## Conflicts

If you set both `OPENAI_API_KEY` and `AZURE_OPENAI_API_KEY`,
the plain `OPENAI` variables take precedence.

## Other ways to specify environment variables

It is recommended to put your environment variables in a file named
`.env` in the current or parent directory.
To pick up these variables, call `typeagent.aitools.utils.load_dotenv()`
at the start of your program (before calling any typeagent functions).
(For simplicity this is not shown in [Getting Started](getting-started.md).)
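For example, a minimal `.env` file might look like this (the values are
placeholders):

```txt
OPENAI_API_KEY=your-very-secret-openai-api-key
OPENAI_MODEL=gpt-4o
```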
136 changes: 136 additions & 0 deletions docs/getting-started.md
# Getting Started

## Installation

```sh
$ pip install typeagent
```

You might also want to use a
[virtual environment](https://docs.python.org/3/library/venv.html)
or another tool like [poetry](https://python-poetry.org/)
or [uv](https://docs.astral.sh/uv/), as long as your tool can
install wheels from [PyPI](https://pypi.org).

## "Hello world" ingestion program

### 1. Create a text file named `transcript.txt`

```txt
STEVE We should really make a Python library for Structured RAG.
UMESH Who would be a good person to do the Python library?
GUIDO I volunteer to do the Python library. Give me a few months.
```

### 2. Create a Python file named `demo.py`

```py
from typeagent import create_conversation
from typeagent.transcripts.transcript import (
    TranscriptMessage,
    TranscriptMessageMeta,
)


def read_messages(filename) -> list[TranscriptMessage]:
    messages: list[TranscriptMessage] = []
    with open(filename, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # Skip blank lines
            # Parse each line into a TranscriptMessage
            speaker, text_chunk = line.split(None, 1)
            message = TranscriptMessage(
                text_chunks=[text_chunk],
                metadata=TranscriptMessageMeta(speaker=speaker),
            )
            messages.append(message)
    return messages


async def main():
    conversation = await create_conversation("demo.db", TranscriptMessage)
    messages = read_messages("transcript.txt")
    print(f"Indexing {len(messages)} messages...")
    results = await conversation.add_messages_with_indexing(messages)
    print(f"Indexed {results.messages_added} messages.")
    print(f"Got {results.semrefs_added} semantic refs.")


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
```
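The parsing in `read_messages()` hinges on `str.split(None, 1)`, which
splits on the first run of whitespace; a quick stdlib-only illustration:

```python
line = "STEVE We should really make a Python library for Structured RAG."
# Split on the first whitespace run: speaker first, rest of the line second.
speaker, text_chunk = line.split(None, 1)
print(speaker)     # STEVE
print(text_chunk)  # We should really make a Python library for Structured RAG.
```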

### 3. Set up your environment for using OpenAI

The minimal set of environment variables is:

```sh
export OPENAI_API_KEY=your-very-secret-openai-api-key
export OPENAI_MODEL=gpt-4o
```

Some OpenAI setups require additional environment variables.
See [Environment Variables](env-vars.md) for more information.
You will also find information there on how to use
Azure-hosted OpenAI models.

### 4. Run your program

```sh
$ python demo.py
```

Expected output looks like:

```txt
0.027s -- Using OpenAI
Indexing 3 messages...
Indexed 3 messages.
Got 26 semantic refs.
```

## "Hello world" query program

### 1. Write this small program

```py
from typeagent import create_conversation
from typeagent.transcripts.transcript import TranscriptMessage


async def main():
    conversation = await create_conversation("demo.db", TranscriptMessage)
    question = "Who volunteered to do the python library?"
    print("Q:", question)
    answer = await conversation.query(question)
    print("A:", answer)


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
```

### 2. Set up your environment as above

### 3. Run your program

```sh
$ python query.py
```

Expected output looks like:

```txt
0.019s -- Using OpenAI
Q: Who volunteered to do the python library?
A: Guido volunteered to do the Python library.
```

## Next steps

You can study the full documentation for `create_conversation()`
and `conversation.query()` in [High-level API](high-level-api.md).

You can also study the source code at the
[typeagent-py repo](https://github.com/microsoft/typeagent-py).
8 changes: 8 additions & 0 deletions docs/gmail.md
# Extracting GMail Messages

There's a helper script in the repo under `gmail/`.
It requires creating and configuring a Google API project.
Until we have time to write this up, your best bet is to
ask your favorite search engine or LLM-based chat bot for help.

More TBD.
111 changes: 111 additions & 0 deletions docs/high-level-api.md
# High-level API

NOTE: When an argument's default is given as `[]`, this is shorthand
for a fresh empty list assigned on each call. It is not the literal
Python meaning of this notation, which would make all calls share a
single empty list object as their default.

## Classes

### Message classes

#### `ConversationMessage`

`typeagent.knowpro.universal_message.ConversationMessage`

Constructor and fields:

```py
class ConversationMessage(
    text_chunks: list[str],             # Text of the message, 1 or more chunks
    tags: list[str] = [],               # Optional tags
    timestamp: str | None = None,       # ISO timestamp in UTC with 'z' suffix
    metadata: ConversationMessageMeta,  # See below
)
```

- Only `text_chunks` is required.
- Tags are arbitrary pieces of information attached to a message;
  they will be indexed (e.g. `["sketch", "pet shop"]`).
- If present, the timestamp must be of the form `2025-10-14T09:03:21z`.

#### `ConversationMessageMeta`

`typeagent.knowpro.universal_message.ConversationMessageMeta`

Constructor and fields:

```py
class ConversationMessageMeta(
    speaker: str | None = None,  # Optional entity who sent the message
    recipients: list[str] = [],  # Optional entities to whom the message was sent
)
```

This class represents the metadata for a given `ConversationMessage`.

#### `TranscriptMessage` and `TranscriptMessageMeta`

`typeagent.transcripts.transcript.TranscriptMessage`
`typeagent.transcripts.transcript.TranscriptMessageMeta`

These are simple aliases for `ConversationMessage` and
`ConversationMessageMeta`, respectively.

### Conversation classes

#### `ConversationBase`

`typeagent.knowpro.factory.ConversationBase`

Represents a conversation, which holds ingested messages and the
extracted and indexed knowledge thereof.

It is constructed by calling the factory function
`typeagent.create_conversation` described below.

It has one public method:

- `query`
```py
async def query(
    question: str,
    # Other parameters are not public
) -> str
```

Tries to answer the question using (only) the indexed messages.
If no answer is found, the returned string starts with
`"No answer found:"`.

## Functions

There is currently only one public function.

### Factory function

- `create_conversation`
```py
async def create_conversation(
    dbname: str | None,
    message_type: type,
    name: str = "",
    tags: list[str] | None = None,
    settings: ConversationSettings | None = None,
) -> ConversationBase
```

- Constructs a conversation object.
- The required `dbname` argument specifies the SQLite3 database
name (e.g. `test.db`). If explicitly set to `None` the data is
stored in RAM and will not persist when the process exits.
- The required `message_type` is normally `TranscriptMessage`
or `ConversationMessage` (there are other possibilities too,
as yet left undocumented).
- The optional `name` specifies the conversation name, which
may be used in diagnostics.
- `tags` gives tags (like `ConversationMessage.tags`) for the whole
conversation.
- `settings` provides overrides for various aspects of the knowledge
extraction and indexing process. Its exact usage is currently left
as an exercise for the reader.