Merged
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,7 +1,7 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.11.7
rev: v0.12.2
hooks:
# Run the linter.
- id: ruff
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)

## [0.7.0]
### Added
- Prompts now stored as text file and can be overwritten via command line [#20](https://github.com/jbencina/vecsync/pull/20)
### Fixed
- `.env` file is correctly loaded, if present

## [0.6.1]
### Added
- Test cases for most CLI commands [#18](https://github.com/jbencina/vecsync/issues/18)
66 changes: 22 additions & 44 deletions README.md
@@ -5,24 +5,34 @@
[![image](https://img.shields.io/pypi/pyversions/vecsync.svg)](https://pypi.python.org/pypi/vecsync)
[![Actions status](https://github.com/jbencina/vecsync/actions/workflows/ci.yaml/badge.svg)](https://github.com/jbencina/vecsync/actions)

A simple command-line utility for synchronizing documents to vector storage for LLM interaction. Vecsync helps you
quickly chat with papers, journals, and other documents with minimal overhead.
A fast command-line utility for synchronizing journals and papers to OpenAI vector storage for chat interaction. Vecsync helps you research topics by simplifying your workflow.

- 📄 Upload a local collection of PDFs to a remote vector store
- ✅ Automatically add and remove remote files to match local documents
- ☺️ Simplify platform specific complexities
- 👀 Synchronize with a Zotero collection
- 💬 Chat with documents from command line or local Gradio UI
- 📄 Synchronize a local collection of PDFs to a remote vector store
- ✅ Automatically manage OpenAI files, vector store, and assistant
- 💬 Quickly chat with documents from command line or local Gradio UI
- 👀 Connect to a local [Zotero](https://www.zotero.org/) collection


**Sync and chat**
```bash
vs sync && vs chat
```
![demo](docs/images/demo.gif)

Local [Gradio](https://www.gradio.app) instance available for assistant interaction. Chat history across sessions is saved.
**Chat with [Gradio](https://www.gradio.app)**
```bash
vs chat --ui
```
![chat](docs/images/demo_chat.png)

## Getting Started
> **OpenAI API Requirements**
>
> Currently vecsync only supports OpenAI for remote operations and requires a valid OpenAI key with credits. Visit https://openai.com/api/ for more information. Future improvements will allow more platform options and self-hosted models.
> Currently vecsync only supports OpenAI for remote operations and requires a valid OpenAI key with credits. Visit https://openai.com/api/ for more information.

> **Costs**
>
> Vecsync uses OpenAI gpt-4o-mini, which costs $0.15 per million input tokens and $0.60 per million output tokens. These costs are billed to your OpenAI API account. See [pricing](https://platform.openai.com/docs/pricing) for details.

### Installation
Install vecsync from PyPI.
@@ -34,24 +44,14 @@ Set your OpenAI API key environment variable.
```
export OPENAI_API_KEY=...
```
You can also define the key via `.env` file using [dotenv](https://pypi.org/project/python-dotenv/)
You can also define the key via a `.env` file in the working directory.
```
echo "OPENAI_API_KEY=…" > .env
```

### Development
This project is still in early alpha, and users should expect frequent updates. Breaking changes will be avoided where possible.
To use the latest code, clone the repository and install locally. In-progress work uses the branch naming convention
of `dev-0.0.1` and will have an accompanying open PR.
```bash
git clone -b dev-0.0.1 git@github.com:jbencina/vecsync.git
cd vecsync
uv sync && source .venv/bin/activate
```

### Usage

#### Synching Collections
#### Syncing Collections
Use the `vs sync` command for all syncing operations.

Sync from local file path.
@@ -88,16 +88,9 @@ Remote count: 15
Duration: 57.99 seconds
```

#### Settings

Settings are persisted in a local json file which can be purged.
```bash
vs settings clear
```

#### Chat Interactions
Use `vs chat` to chat with uploaded documents via the command line. The responding assistant is automatically linked to your
vector store. Alternatively, you can use `vs chat -u` to spawn a local Gradio instance.
vector store. Alternatively, you can use `vs chat --ui` to spawn a local Gradio instance.

```bash
vs chat
@@ -120,18 +113,3 @@ Type "exit" to quit at any time.
> What was my last question to you?
Your last question to me was asking for a one sentence summary of the contents of my vector store collection.
```

Threads can be cleared using the `-n` flag.
```bash
vs chat -n
✅ Assistant found: asst_123456789
Type "exit" to quit at any time.

> What was my last question to you?
💬 Conversation started: thread_987654321

Your last question was about searching for relevant information from a large number of journals and papers, emphasizing the importance of citing information from the provided sources without making up any content.

# Assistant response is in reference to the system prompt
```

Binary file modified docs/images/demo.gif
Binary file modified docs/images/demo_chat.png
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "vecsync"
version = "0.6.0"
version = "0.7.0"
description = "A simple command-line utility for synchronizing documents to vector storage for LLM interaction."
readme = "README.md"
authors = [
Expand Down
41 changes: 32 additions & 9 deletions src/vecsync/chat/clients/openai.py
@@ -1,5 +1,7 @@
from importlib import resources
from queue import Empty, Queue

from dotenv import load_dotenv
from openai import AssistantEventHandler, OpenAI
from termcolor import cprint

@@ -106,14 +108,42 @@ class OpenAIClient:
settings_path : str | None
The path to the settings file. If None, the default settings file will be used.
This is used to store the thread ID for the current conversation.
prompt_source : str | None
The path to the prompt source file. If None, the default prompt will be used.
"""

def __init__(self, store_name: str, settings_path: str | None = None):
def __init__(self, store_name: str, settings_path: str | None = None, prompt_source: str | None = None):
load_dotenv(override=True)

self.client = OpenAI()
self.store_name = store_name
self.assistant_name = f"vecsync-{store_name}"
self.connected = False
self.settings_path = settings_path
self.prompt = self._get_prompt(prompt_source)

def _get_prompt(self, prompt_source: str | None = None) -> str:
"""Get the prompt from the prompt source.

If a prompt source is provided, it will be used to load the prompt. Otherwise, the default
prompt will be used from the resources.

Parameters
----------
prompt_source : str | None
The path to the prompt source file. If None, the default prompt will be used.

Returns
-------
str
The prompt to use for the assistant.
"""
if prompt_source is not None:
with open(prompt_source) as f:
return f.read()
else:
with resources.files("vecsync.prompts").joinpath("default_prompt.txt").open("r") as f:
return f.read()

def connect(self):
"""Connect to the OpenAI API and load the assistant and thread.
@@ -217,16 +247,9 @@ def _create_assistant(self) -> str:
The assistant ID for the current conversation.
"""

instructions = """You are a helpful research assistant that can search through a large number
of journals and papers to help answer the user questions. You have been given a file store which contains
the relevant documents the user is referencing. These documents should be your primary source of information.
You may only use external knowledge if it is helpful in clarifying questions. It is very important that you
remain factual and cite information from the sources provided to you in the file store. You are not allowed
to make up information."""

assistant = self.client.beta.assistants.create(
name=self.assistant_name,
instructions=instructions,
instructions=self.prompt,
tools=[{"type": "file_search"}],
tool_resources={
"file_search": {
20 changes: 13 additions & 7 deletions src/vecsync/cli/chat.py
@@ -5,8 +5,8 @@
from vecsync.constants import DEFAULT_STORE_NAME


def start_console_chat(store_name: str):
client = OpenAIClient(store_name=store_name)
def start_console_chat(store_name: str, prompt_source: str | None = None):
client = OpenAIClient(store_name=store_name, prompt_source=prompt_source)
client.connect()

ui = ConsoleInterface(client)
@@ -20,8 +20,8 @@ def start_console_chat(store_name: str):
ui.prompt(prompt)


def start_ui_chat(store_name: str):
client = OpenAIClient(store_name=store_name)
def start_ui_chat(store_name: str, prompt_source: str | None = None):
client = OpenAIClient(store_name=store_name, prompt_source=prompt_source)
client.connect()

ui = GradioInterface(client)
@@ -35,10 +35,16 @@ def start_ui_chat(store_name: str):
is_flag=True,
help="Spawn an interactive UI instead of a console interface.",
)
def chat(ui: bool):
@click.option(
"--prompt",
"-p",
type=str,
help="The path to the prompt source file used when creating a new assistant.",
)
def chat(ui: bool, prompt: str | None):
"""Chat with the assistant."""

if ui:
start_ui_chat(DEFAULT_STORE_NAME)
start_ui_chat(DEFAULT_STORE_NAME, prompt)
else:
start_console_chat(DEFAULT_STORE_NAME)
start_console_chat(DEFAULT_STORE_NAME, prompt)
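With the new `--prompt`/`-p` option wired through `chat`, the custom prompt is passed when the assistant is created. Assuming a prompt file named `my_prompt.txt` (hypothetical path), usage would look like:

```shell
# Chat in the console using the bundled default prompt
vs chat

# Create the assistant with a custom prompt and open the Gradio UI
vs chat --ui --prompt my_prompt.txt
```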
22 changes: 22 additions & 0 deletions src/vecsync/prompts/default_prompt.txt
@@ -0,0 +1,22 @@
# Role
You are an AI researcher and educator with a deep understanding of state-of-the-art machine-learning theory and
practice. Your role is to help summarize and dive into research papers.

# Task
You are given a collection of academic papers by the user. Using only the user-provided paper collection:
1. Answer the user's question in your own words.
2. Support answers with paper citations.
3. If the question is ambiguous, ask a clarifying follow-up before answering.
4. If the user asks about fundamental concepts or requests an example, you may use your own knowledge to answer.

# Style
- Insightful, friendly, professional.
- Use clear analogies and examples to explain complex ideas.
- Include citations when referencing the document collection.
- Refer to the paper title rather than the file name.
- Be transparent about any uncertainties or if the information is missing from the provided documents.
- Adapt responses to user's knowledge level.

# Constraints
- Do not deviate from the user-provided documents unless you are explaining fundamental concepts
- Do not simply repeat text verbatim from references
2 changes: 2 additions & 0 deletions src/vecsync/store/openai.py
@@ -1,6 +1,7 @@
from pathlib import Path
from time import perf_counter

from dotenv import load_dotenv
from openai import OpenAI
from pydantic import BaseModel
from termcolor import cprint
@@ -19,6 +20,7 @@ class SyncOperationResult(BaseModel):

class OpenAiVectorStore:
def __init__(self, name: str):
load_dotenv(override=True)
self.client = OpenAI()
self.name = name
self.store = None
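The `load_dotenv(override=True)` call added above makes values in a working-directory `.env` file win over variables already exported in the shell. A dependency-free sketch of that precedence (the tiny parser here is a stand-in for python-dotenv, and the key value is illustrative only):

```python
import os
import tempfile

def load_env_file(path: str, override: bool = True) -> None:
    """Tiny stand-in for dotenv.load_dotenv: parse KEY=VALUE lines into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if override or key not in os.environ:
                os.environ[key] = value

# A value already exported in the shell...
os.environ["OPENAI_API_KEY"] = "from-shell"

# ...is overridden by the .env file when override=True
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("OPENAI_API_KEY=from-dotenv\n")
    env_path = f.name

load_env_file(env_path, override=True)
os.unlink(env_path)
```

With `override=False` the shell value would be kept instead, which is why the choice of `override=True` in the diff matters for users who keep stale keys in their environment.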