Merged
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,7 +1,7 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.11.7
rev: v0.12.2
hooks:
# Run the linter.
- id: ruff
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)

## [0.7.0]
### Added
- Prompts now stored as text file and can be overwritten via command line [#20](https://github.com/jbencina/vecsync/pull/20)
### Fixed
- `.env` file is correctly loaded, if present

## [0.6.1]
### Added
- Test cases for most CLI commands [#18](https://github.com/jbencina/vecsync/issues/18)
66 changes: 22 additions & 44 deletions README.md
@@ -5,24 +5,34 @@
[![image](https://img.shields.io/pypi/pyversions/vecsync.svg)](https://pypi.python.org/pypi/vecsync)
[![Actions status](https://github.com/jbencina/vecsync/actions/workflows/ci.yaml/badge.svg)](https://github.com/jbencina/vecsync/actions)

A simple command-line utility for synchronizing documents to vector storage for LLM interaction. Vecsync helps you
quickly chat with papers, journals, and other documents with minimal overhead.
A fast command-line utility for synchronizing journals and papers to OpenAI vector storage for chat interaction. Vecsync helps you research topics by simplifying your workflow.

- 📄 Upload a local collection of PDFs to a remote vector store
- ✅ Automatically add and remove remote files to match local documents
- ☺️ Simplify platform specific complexities
- 👀 Synchronize with a Zotero collection
- 💬 Chat with documents from command line or local Gradio UI
- 📄 Synchronize a local collection of PDFs to a remote vector store
- ✅ Automatically manage OpenAI files, vector store, and assistant
- 💬 Quickly chat with documents from command line or local Gradio UI
- 👀 Connect to a local [Zotero](https://www.zotero.org/) collection


**Sync and chat**
```bash
vs sync && vs chat
```
![demo](docs/images/demo.gif)

Local [Gradio](https://www.gradio.app) instance available for assistant interaction. Chat history across sessions is saved.
**Chat with [Gradio](https://www.gradio.app)**
```bash
vs chat --ui
```
![chat](docs/images/demo_chat.png)

## Getting Started
> **OpenAI API Requirements**
>
> Currently vecsync only supports OpenAI for remote operations and requires a valid OpenAI key with credits. Visit https://openai.com/api/ for more information. Future improvements will allow more platform options and self-hosted models.
> Currently vecsync only supports OpenAI for remote operations and requires a valid OpenAI key with credits. Visit https://openai.com/api/ for more information.

> **Costs**
>
> Vecsync uses OpenAI gpt-4o-mini, which costs $0.15 per million input tokens and $0.60 per million output tokens. These costs are billed to your OpenAI API account. See [pricing](https://platform.openai.com/docs/pricing) for details.

### Installation
Install vecsync from PyPI.
@@ -34,24 +44,14 @@ Set your OpenAI API key environment variable.
```
export OPENAI_API_KEY=...
```
You can also define the key via `.env` file using [dotenv](https://pypi.org/project/python-dotenv/)
You can also define the key via a `.env` file in the working directory.
```
echo "OPENAI_API_KEY=…" > .env
```

### Development
This project is still in early alpha, and users should expect frequent updates. Breaking changes will be avoided where possible.
To use the latest code, clone the repository and install locally. In-progress work uses the branch naming convention
of `dev-0.0.1` and will have an accompanying open PR.
```bash
git clone -b dev-0.0.1 git@github.com:jbencina/vecsync.git
cd vecsync
uv sync && source .venv/bin/activate
```

### Usage

#### Synching Collections
#### Syncing Collections
Use the `vs sync` command for all syncing operations.

Sync from local file path.
@@ -88,16 +88,9 @@ Remote count: 15
Duration: 57.99 seconds
```

#### Settings

Settings are persisted in a local json file which can be purged.
```bash
vs settings clear
```

#### Chat Interactions
Use `vs chat` to chat with uploaded documents via the command line. The responding assistant is automatically linked to your
vector store. Alternatively, you can use `vs chat -u` to spawn a local Gradio instance.
vector store. Alternatively, you can use `vs chat --ui` to spawn a local Gradio instance.

```bash
vs chat
@@ -120,18 +113,3 @@ Type "exit" to quit at any time.
> What was my last question to you?
Your last question to me was asking for a one sentence summary of the contents of my vector store collection.
```

Threads can be cleared using the `-n` flag.
```bash
vs chat -n
✅ Assistant found: asst_123456789
Type "exit" to quit at any time.

> What was my last question to you?
💬 Conversation started: thread_987654321

Your last question was about searching for relevant information from a large number of journals and papers, emphasizing the importance of citing information from the provided sources without making up any content.

# Assistant response is in reference to the system prompt
```

Binary file modified docs/images/demo.gif
Binary file modified docs/images/demo_chat.png
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "vecsync"
version = "0.6.0"
version = "0.7.0"
description = "A simple command-line utility for synchronizing documents to vector storage for LLM interaction."
readme = "README.md"
authors = [
Expand Down
41 changes: 32 additions & 9 deletions src/vecsync/chat/clients/openai.py
@@ -1,5 +1,7 @@
from importlib import resources
from queue import Empty, Queue

from dotenv import load_dotenv
from openai import AssistantEventHandler, OpenAI
from termcolor import cprint

@@ -106,14 +108,42 @@ class OpenAIClient:
settings_path : str | None
The path to the settings file. If None, the default settings file will be used.
This is used to store the thread ID for the current conversation.
prompt_source : str | None
The path to the prompt source file. If None, the default prompt will be used.
"""

def __init__(self, store_name: str, settings_path: str | None = None):
def __init__(self, store_name: str, settings_path: str | None = None, prompt_source: str | None = None):
load_dotenv(override=True)

self.client = OpenAI()
self.store_name = store_name
self.assistant_name = f"vecsync-{store_name}"
self.connected = False
self.settings_path = settings_path
self.prompt = self._get_prompt(prompt_source)

def _get_prompt(self, prompt_source: str | None = None) -> str:
"""Get the prompt from the prompt source.

If a prompt source is provided, it will be used to load the prompt. Otherwise, the default
prompt will be used from the resources.

Parameters
----------
prompt_source : str | None
The path to the prompt source file. If None, the default prompt will be used.

Returns
-------
str
The prompt to use for the assistant.
"""
if prompt_source is not None:
with open(prompt_source) as f:
return f.read()
else:
with resources.files("vecsync.prompts").joinpath("default_prompt.txt").open("r") as f:
return f.read()

def connect(self):
"""Connect to the OpenAI API and load the assistant and thread.
@@ -217,16 +247,9 @@ def _create_assistant(self) -> str:
The assistant ID for the current conversation.
"""

instructions = """You are a helpful research assistant that can search through a large number
of journals and papers to help answer the user questions. You have been given a file store which contains
the relevant documents the user is referencing. These documents should be your primary source of information.
You may only use external knowledge if it is helpful in clarifying questions. It is very important that you
remain factual and cite information from the sources provided to you in the file store. You are not allowed
to make up information."""

assistant = self.client.beta.assistants.create(
name=self.assistant_name,
instructions=instructions,
instructions=self.prompt,
tools=[{"type": "file_search"}],
tool_resources={
"file_search": {
20 changes: 13 additions & 7 deletions src/vecsync/cli/chat.py
@@ -5,8 +5,8 @@
from vecsync.constants import DEFAULT_STORE_NAME


def start_console_chat(store_name: str):
client = OpenAIClient(store_name=store_name)
def start_console_chat(store_name: str, prompt_source: str | None = None):
client = OpenAIClient(store_name=store_name, prompt_source=prompt_source)
client.connect()

ui = ConsoleInterface(client)
@@ -20,8 +20,8 @@ def start_console_chat(store_name: str):
ui.prompt(prompt)


def start_ui_chat(store_name: str):
client = OpenAIClient(store_name=store_name)
def start_ui_chat(store_name: str, prompt_source: str | None = None):
client = OpenAIClient(store_name=store_name, prompt_source=prompt_source)
client.connect()

ui = GradioInterface(client)
@@ -35,10 +35,16 @@ def start_ui_chat(store_name: str):
is_flag=True,
help="Spawn an interactive UI instead of a console interface.",
)
def chat(ui: bool):
@click.option(
"--prompt",
"-p",
type=str,
help="The path to the prompt source file used when creating a new assistant.",
)
def chat(ui: bool, prompt: str | None):
"""Chat with the assistant."""

if ui:
start_ui_chat(DEFAULT_STORE_NAME)
start_ui_chat(DEFAULT_STORE_NAME, prompt)
else:
start_console_chat(DEFAULT_STORE_NAME)
start_console_chat(DEFAULT_STORE_NAME, prompt)
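With the new `--prompt`/`-p` option wired through `chat`, the custom prompt is passed when the assistant is created. Assuming a prompt file named `my_prompt.txt` (hypothetical path), usage would look like:

```shell
# Chat in the console using the bundled default prompt
vs chat

# Create the assistant with a custom prompt and open the Gradio UI
vs chat --ui --prompt my_prompt.txt
```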
22 changes: 22 additions & 0 deletions src/vecsync/prompts/default_prompt.txt
@@ -0,0 +1,22 @@
# Role
You are an AI researcher and educator with a deep understanding of state-of-the-art machine-learning theory and
practice. Your role is to help summarize and dive into research papers.

# Task
You are given a collection of academic papers by the user. Using only the user-provided paper collection:
1. Answer the user's question in your own words.
2. Support answers with paper citations.
3. If the question is ambiguous, ask a clarifying follow-up before answering.
4. If the user asks about fundamental concepts or requests an example, you may use your own knowledge to answer.

# Style
- Insightful, friendly, professional.
- Use clear analogies and examples to explain complex ideas.
- Include citations when referencing the document collection.
- Refer to the paper title rather than the file name.
- Be transparent about any uncertainties or if the information is missing from the provided documents.
- Adapt responses to user's knowledge level.

# Constraints
- Do not deviate from the user-provided documents unless you are explaining fundamental concepts
- Do not simply repeat text verbatim from references
2 changes: 2 additions & 0 deletions src/vecsync/store/openai.py
@@ -1,6 +1,7 @@
from pathlib import Path
from time import perf_counter

from dotenv import load_dotenv
from openai import OpenAI
from pydantic import BaseModel
from termcolor import cprint
@@ -19,6 +20,7 @@ class SyncOperationResult(BaseModel):

class OpenAiVectorStore:
def __init__(self, name: str):
load_dotenv(override=True)
self.client = OpenAI()
self.name = name
self.store = None
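The `load_dotenv(override=True)` call added above makes values in a working-directory `.env` file win over variables already exported in the shell. A dependency-free sketch of that precedence (the tiny parser here is a stand-in for python-dotenv, and the key value is illustrative only):

```python
import os
import tempfile

def load_env_file(path: str, override: bool = True) -> None:
    """Tiny stand-in for dotenv.load_dotenv: parse KEY=VALUE lines into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if override or key not in os.environ:
                os.environ[key] = value

# A value already exported in the shell...
os.environ["OPENAI_API_KEY"] = "from-shell"

# ...is overridden by the .env file when override=True
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("OPENAI_API_KEY=from-dotenv\n")
    env_path = f.name

load_env_file(env_path, override=True)
os.unlink(env_path)
```

With `override=False` the shell value would be kept instead, which is why the choice of `override=True` in the diff matters for users who keep stale keys in their environment.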