Skip to content

Commit de74695

Browse files
authored
pgml-chat support for Discord (#886)
1 parent a47ba1b commit de74695

File tree

8 files changed

+1520
-0
lines changed

8 files changed

+1520
-0
lines changed

pgml-apps/pgml-chat/.env.template

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
OPENAI_API_KEY=<OPENAI_API_KEY>
2+
DATABASE_URL=<POSTGRES_DATABASE_URL starts with postgres://>
3+
MODEL=hkunlp/instructor-xl
4+
MODEL_PARAMS={"instruction": "Represent the Wikipedia document for retrieval: "}
5+
QUERY_PARAMS={"instruction": "Represent the Wikipedia question for retrieving supporting documents: "}
6+
SYSTEM_PROMPT="You are an assistant to answer questions about an open source software named PostgresML. Your name is PgBot. You are based out of San Francisco, California."
7+
BASE_PROMPT="Given relevant parts of a document and a question, create a final answer.\
8+
Include a SQL query in the answer wherever possible. \
9+
Use the following portion of a long document to see if any of the text is relevant to answer the question.\
10+
\nReturn any relevant text verbatim.\n{context}\nQuestion: {question}\n \
11+
If the context is empty then ask for clarification and suggest user to send an email to team@postgresml.org or join PostgresML [Discord](https://discord.gg/DmyJP3qJ7U)."
12+
SLACK_BOT_TOKEN=<SLACK_BOT_TOKEN>
13+
SLACK_APP_TOKEN=<SLACK_APP_TOKEN>
14+
DISCORD_BOT_TOKEN=<DISCORD_BOT_TOKEN>

pgml-apps/pgml-chat/.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
.env

pgml-apps/pgml-chat/README.md

Lines changed: 170 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,170 @@
1+
# PostgresML Chatbot Builder
2+
A command line tool to build and deploy a **_knowledge based_** chatbot using PostgresML and OpenAI API.
3+
4+
There are two stages in building a knowledge based chatbot:
5+
- Build a knowledge base by ingesting documents, chunking documents, generating embeddings and indexing these embeddings for fast query
6+
- Generate responses to user queries by retrieving relevant documents and generating responses using OpenAI API
7+
8+
This tool automates the above two stages and provides a command line interface to build and deploy a knowledge based chatbot.
9+
10+
# Prerequisites
11+
Before you begin, make sure you have the following:
12+
13+
- PostgresML Database: Spin up a for a free [GPU-powered database](https://postgresml.org/signup)
14+
- Python version >=3.8
15+
- OpenAI API key
16+
- Python 3.8+
17+
- Poetry
18+
19+
# Getting started
20+
1. Clone this repository, start a poetry shell and install dependencies
21+
```bash
22+
git clone https://github.com/postgresml/postgresml
23+
cd postgresml/pgml-apps/pgml-chat
24+
poetry shell
25+
poetry install
26+
pip install .
27+
```
28+
29+
2. Update environment variables in `.env` file
30+
```bash
31+
cp .env.template .env
32+
```
33+
34+
Update environment variables with your OpenAI API key and PostgresML database credentials.
35+
```bash
36+
OPENAI_API_KEY=<OPENAI_API_KEY>
37+
DATABASE_URL=<POSTGRES_DATABASE_URL starts with postgres://>
38+
MODEL=hkunlp/instructor-xl
39+
MODEL_PARAMS={"instruction": "Represent the Wikipedia document for retrieval: "}
40+
QUERY_PARAMS={"instruction": "Represent the Wikipedia question for retrieving supporting documents: "}
41+
SYSTEM_PROMPT="You are an assistant to answer questions about an open source software named PostgresML. Your name is PgBot. You are based out of San Francisco, California."
42+
BASE_PROMPT="Given relevant parts of a document and a question, create a final answer.\
43+
Include a SQL query in the answer wherever possible. \
44+
Use the following portion of a long document to see if any of the text is relevant to answer the question.\
45+
\nReturn any relevant text verbatim.\n{context}\nQuestion: {question}\n \
46+
If the context is empty then ask for clarification and suggest user to send an email to team@postgresml.org or join PostgresML [Discord](https://discord.gg/DmyJP3qJ7U)."
47+
```
48+
49+
# Usage
50+
You can get help on the command line interface by running:
51+
52+
```bash
53+
(pgml-bot-builder-py3.9) pgml-chat % pgml-chat --help
54+
usage: pgml-chat [-h] --collection_name COLLECTION_NAME [--root_dir ROOT_DIR] [--stage {ingest,chat}] [--chat_interface {cli,slack}]
55+
56+
PostgresML Chatbot Builder
57+
58+
optional arguments:
59+
-h, --help show this help message and exit
60+
--collection_name COLLECTION_NAME
61+
Name of the collection (schema) to store the data in PostgresML database (default: None)
62+
--root_dir ROOT_DIR Input folder to scan for markdown files. Required for ingest stage. Not required for chat stage (default: None)
63+
--stage {ingest,chat}
64+
Stage to run (default: chat)
65+
--chat_interface {cli, slack, discord}
66+
Chat interface to use (default: cli)
67+
```
68+
## Ingest
69+
In this step, we ingest documents, chunk documents, generate embeddings and index these embeddings for fast query.
70+
71+
```bash
72+
LOG_LEVEL=DEBUG pgml-chat --root_dir <directory> --collection_name <collection_name> --stage ingest
73+
```
74+
75+
You will see the following output:
76+
```bash
77+
[15:39:12] DEBUG [15:39:12] - Using selector: KqueueSelector
78+
INFO [15:39:12] - Starting pgml_chatbot
79+
INFO [15:39:12] - Scanning <root directory> for markdown files
80+
[15:39:13] INFO [15:39:13] - Found 85 markdown files
81+
Extracting text from markdown ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
82+
INFO [15:39:13] - Upserting documents into database
83+
[15:39:32] INFO [15:39:32] - Generating chunks
84+
[15:39:33] INFO [15:39:33] - Starting chunk count: 0
85+
[15:39:35] INFO [15:39:35] - Ending chunk count: 576
86+
INFO [15:39:35] - Total documents: 85 Total chunks: 576
87+
INFO [15:39:35] - Generating embeddings
88+
[15:39:36] INFO [15:39:36] - Splitter ID: 2
89+
[15:40:47] INFO [15:40:47] - Embeddings generated in 71.073 seconds
90+
```
91+
## Chat
92+
You can interact with the bot using the command line interface or Slack.
93+
94+
### Command Line Interface
95+
In this step, we start chatting with the chatbot at the command line. You can increase the log level to ERROR to suppress the logs. CLI is the default chat interface.
96+
97+
```bash
98+
LOG_LEVEL=ERROR pgml-chat --collection_name <collection_name> --stage chat --chat_interface cli
99+
```
100+
101+
You should be able to interact with the bot as shown below. Control-C to exit.
102+
```bash
103+
User (Ctrl-C to exit): Who are you?
104+
PgBot: I am PgBot, an AI assistant here to answer your questions about PostgresML, an open source software. How can I assist you today?
105+
User (Ctrl-C to exit): What is PostgresML?
106+
Found relevant documentation....
107+
PgBot: PostgresML is an open source software that allows you to unlock the full potential of your data and drive more sophisticated insights and decision-making processes. It provides a dashboard with analytical views of the training data and
108+
model performance, as well as integrated notebooks for rapid iteration. PostgresML is primarily written in Rust using Rocket as a lightweight web framework and SQLx to interact with the database.
109+
110+
If you have any further questions or need more information, please feel free to send an email to team@postgresml.org or join the PostgresML Discord community at https://discord.gg/DmyJP3qJ7U.
111+
```
112+
113+
114+
### Slack
115+
116+
**Setup**
117+
You need SLACK_BOT_TOKEN and SLACK_APP_TOKEN to run the chatbot on Slack. You can get these tokens by creating a Slack app. Follow the instructions [here](https://slack.dev/bolt-python/tutorial/getting-started) to create a Slack app.Include the following environment variables in your .env file:
118+
119+
```bash
120+
SLACK_BOT_TOKEN=<SLACK_BOT_TOKEN>
121+
SLACK_APP_TOKEN=<SLACK_APP_TOKEN>
122+
```
123+
In this step, we start chatting with the chatbot on Slack. You can increase the log level to ERROR to suppress the logs.
124+
```bash
125+
LOG_LEVEL=ERROR pgml-chat --collection_name <collection_name> --stage chat --chat_interface slack
126+
```
127+
If you have set up the Slack app correctly, you should see the following output:
128+
129+
```
130+
⚡️ Bolt app is running!
131+
```
132+
133+
Once the slack app is running, you can interact with the chatbot on Slack as shown below. In the example here, name of the bot is `PgBot`. This app responds only to direct messages to the bot.
134+
135+
![Slack Chatbot](./images/slack_screenshot.png)
136+
137+
138+
### Discord
139+
140+
**Setup**
141+
You need DISCORD_BOT_TOKEN to run the chatbot on Discord. You can get this token by creating a Discord app. Follow the instructions [here](https://discordpy.readthedocs.io/en/stable/discord.html) to create a Discord app. Include the following environment variables in your .env file:
142+
143+
```bash
144+
DISCORD_BOT_TOKEN=<DISCORD_BOT_TOKEN>
145+
```
146+
147+
In this step, we start chatting with the chatbot on Discord. You can increase the log level to ERROR to suppress the logs.
148+
```bash
149+
pgml-chat --collection_name <collection_name> --stage chat --chat_interface discord
150+
```
151+
If you have set up the Discord app correctly, you should see the following output:
152+
153+
```bash
154+
2023-08-02 16:09:57 INFO discord.client logging in using static token
155+
```
156+
Once the discord app is running, you can interact with the chatbot on Discord as shown below. In the example here, name of the bot is `pgchat`. This app responds only to direct messages to the bot.
157+
158+
![Discord Chatbot](./images/discord_screenshot.png)
159+
160+
## Options
161+
You can control the behavior of the chatbot by setting the following environment variables:
162+
- `SYSTEM_PROMPT`: This is the prompt that is used to initialize the chatbot. You can customize this prompt to change the behavior of the chatbot. For example, you can change the name of the chatbot or the location of the chatbot.
163+
- `BASE_PROMPT`: This is the prompt that is used to generate responses to user queries. You can customize this prompt to change the behavior of the chatbot.
164+
- `MODEL`: This is the open source embedding model used to generate embeddings for the documents. You can change this to use a different model.
165+
166+
## Roadmap
167+
- ~~`hyerbot --chat_interface {cli, slack, discord}` that supports Slack, and Discord.~~
168+
- Support for file formats like rst, html, pdf, docx, etc.
169+
- Support for open source models in addition to OpenAI for chat completion.
170+
- Support for multi-turn converstaions using converstaion buffer. Use a collection for chat history that can be retrieved and used to generate responses.
378 KB
Loading
341 KB
Loading

0 commit comments

Comments
 (0)