Merged
32 changes: 26 additions & 6 deletions AGENTS.md
@@ -3,16 +3,36 @@
 **NEVER use TEST_MODEL_NAME or "test" embedding model outside of test files**
 
 Never run git commands that make any changes. (`git status` and `git diff` are fine)
+Exceptions: `git push`, `git worktree`, `git branch` (for tracking setup), as instructed below.
 
-**NEVER COMMIT CODE. Do not run `git commit` or any other git commands
-that make changes to the repository. Not even `git add`**
+**NEVER COMMIT CODE.** Do not run `git commit` or any other git commands
+that make changes to the repository. Exception: Worktrees/Branches below.
+`git add` is fine.
 
 When moving, copying or deleting files, use the git commands: `git mv`, `git cp`, `git rm`
 
-When I ask to update AGENTS.md (even if maybe) extract a general rule from what I said
-before and update AGENTS.md (unless it's already in there -- maybe reformulate since
-it apparently didn't work). Also, when it looks like I state a general rule, add it to
-AGENTS.md. In all cases show what you added to AGENTS.md.
+## Worktrees and Branches
+
+- Each session uses its own worktree with a feature branch
+- Create worktrees with: `git worktree add ../<repo>-<branch-name> -b <branch-name>`
+- Push the branch to the `me` remote: `git push me <branch-name>`
+- Set upstream to `me/<branch-name>`: `git branch --set-upstream-to me/<branch-name>`
+- **Never** upstream to `me/main` — that must stay identical to `origin/main`
+- The worktree directory name should be `<repo>-<branch-name>` (sibling of the main checkout)
+- **Work in the worktree directory**, not the main checkout — edit files there, run tests there
+- VS Code may show buffers from the main checkout; ignore those when working in a worktree.
+  When in doubt, verify edits landed on disk with `cat` or `grep` in the terminal.
+
+## Debugging discipline
+
+- When a bug seems impossible, suspect stale files or wrong working directory — not exotic causes.
+- If you're tempted to blame installed package versions, `__pycache__`, or similar,
+  **stop and ask the user** before investigating further. You're probably on the wrong track.
+
+**Whenever the user tells you how to do something, states a preference, or corrects you,
+extract a general rule and add it to AGENTS.md** (unless it's already covered -- maybe
+reformulate since it apparently didn't work). This applies even without being asked.
+In all cases show what you added to AGENTS.md.
 
 - Don't use '!' on the command line, it's some bash magic (even inside single quotes)
 - When running 'make' commands, do not use the venv (the Makefile uses 'uv run')
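The worktree rules in the AGENTS.md hunk amount to three commands per session. A minimal Python sketch that only builds those command lines and does not run git; `worktree_commands` is a hypothetical helper, `git -C <dir>` is used instead of changing directories, and passing the branch name explicitly to `--set-upstream-to` is an added safeguard rather than part of the rules:

```python
from pathlib import Path


def worktree_commands(repo_dir: str, branch: str, remote: str = "me") -> list[list[str]]:
    """Return (without running) the git commands for a per-session worktree.

    Hypothetical helper following the AGENTS.md rules: the worktree is a
    sibling of the main checkout, named `<repo>-<branch-name>`.
    """
    if branch == "main":
        # me/main must stay identical to origin/main, so never work on or track it.
        raise ValueError("use a feature branch, not main")
    repo = Path(repo_dir)
    worktree = repo.parent / f"{repo.name}-{branch}"
    return [
        # 1. Create the sibling worktree together with its new feature branch.
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree)],
        # 2. Push the branch to the personal remote, from inside the worktree.
        ["git", "-C", str(worktree), "push", remote, branch],
        # 3. Track <remote>/<branch>; naming the branch explicitly is a safeguard.
        ["git", "-C", str(worktree), "branch", f"--set-upstream-to={remote}/{branch}", branch],
    ]
```

Each returned list could then be passed, in order, to `subprocess.run(cmd, check=True)`.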
8 changes: 4 additions & 4 deletions src/typeagent/knowpro/conversation_base.py
@@ -96,10 +96,10 @@ async def create(
             tags if tags is not None else [],
         )
         instance.storage_provider = storage_provider
-        instance.messages = await storage_provider.get_message_collection()
-        instance.semantic_refs = await storage_provider.get_semantic_ref_collection()
-        instance.semantic_ref_index = await storage_provider.get_semantic_ref_index()
-        instance.secondary_indexes = await secindex.ConversationSecondaryIndexes.create(
+        instance.messages = storage_provider.messages
+        instance.semantic_refs = storage_provider.semantic_refs
+        instance.semantic_ref_index = storage_provider.semantic_ref_index
+        instance.secondary_indexes = secindex.ConversationSecondaryIndexes(
             storage_provider, settings.related_term_index_settings
         )
         return instance
8 changes: 4 additions & 4 deletions src/typeagent/knowpro/factory.py
@@ -60,10 +60,10 @@ async def create_conversation[TMessage: IMessage](
         tags=tags if tags is not None else [],
     )
     conversation.storage_provider = storage_provider
-    conversation.messages = await storage_provider.get_message_collection()
-    conversation.semantic_refs = await storage_provider.get_semantic_ref_collection()
-    conversation.semantic_ref_index = await storage_provider.get_semantic_ref_index()
-    conversation.secondary_indexes = await secindex.ConversationSecondaryIndexes.create(
+    conversation.messages = storage_provider.messages
+    conversation.semantic_refs = storage_provider.semantic_refs
+    conversation.semantic_ref_index = storage_provider.semantic_ref_index
+    conversation.secondary_indexes = secindex.ConversationSecondaryIndexes(
         storage_provider, settings.related_term_index_settings
     )
     return conversation
26 changes: 17 additions & 9 deletions src/typeagent/knowpro/interfaces_storage.py
@@ -126,23 +126,31 @@ async def get_metadata_multiple(
 class IStorageProvider[TMessage: IMessage](Protocol):
     """API spec for storage providers -- maybe in-memory or persistent."""
 
-    async def get_message_collection(self) -> IMessageCollection[TMessage]: ...
+    @property
+    def messages(self) -> IMessageCollection[TMessage]: ...
 
-    async def get_semantic_ref_collection(self) -> ISemanticRefCollection: ...
+    @property
+    def semantic_refs(self) -> ISemanticRefCollection: ...
 
-    # Index getters - ALL 6 index types for this conversation
+    # Index properties - ALL 6 index types for this conversation
 
-    async def get_semantic_ref_index(self) -> ITermToSemanticRefIndex: ...
+    @property
+    def semantic_ref_index(self) -> ITermToSemanticRefIndex: ...
 
-    async def get_property_index(self) -> IPropertyToSemanticRefIndex: ...
+    @property
+    def property_index(self) -> IPropertyToSemanticRefIndex: ...
 
-    async def get_timestamp_index(self) -> ITimestampToTextRangeIndex: ...
+    @property
+    def timestamp_index(self) -> ITimestampToTextRangeIndex: ...
 
-    async def get_message_text_index(self) -> IMessageTextIndex[TMessage]: ...
+    @property
+    def message_text_index(self) -> IMessageTextIndex[TMessage]: ...
 
-    async def get_related_terms_index(self) -> ITermToRelatedTermsIndex: ...
+    @property
+    def related_terms_index(self) -> ITermToRelatedTermsIndex: ...
 
-    async def get_conversation_threads(self) -> IConversationThreads: ...
+    @property
+    def conversation_threads(self) -> IConversationThreads: ...
 
     # Metadata management
 
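The interface change above turns each `get_*()` coroutine into a read-only property, so fetching a collection or index no longer needs `await`, while the collection's own methods such as `size()` stay async. A minimal sketch of that shape with simplified, non-generic stand-ins; `ToyProvider`, `ListMessageCollection`, and the trimmed protocols are hypothetical, not the typeagent types:

```python
import asyncio
from typing import Protocol


class IMessageCollection(Protocol):
    # Simplified stand-in: only the async methods this sketch needs.
    async def size(self) -> int: ...

    async def append(self, item: str) -> None: ...


class IStorageProvider(Protocol):
    # Read-only property: obtaining the collection is synchronous.
    @property
    def messages(self) -> IMessageCollection: ...


class ListMessageCollection:
    """Hypothetical list-backed collection; its methods stay async."""

    def __init__(self) -> None:
        self._items: list[str] = []

    async def size(self) -> int:
        return len(self._items)

    async def append(self, item: str) -> None:
        self._items.append(item)


class ToyProvider:
    """Hypothetical provider that creates its collection once, up front."""

    def __init__(self) -> None:
        self._messages = ListMessageCollection()

    @property
    def messages(self) -> ListMessageCollection:
        return self._messages


async def is_empty(provider: IStorageProvider) -> bool:
    msgs = provider.messages  # plain attribute access, no await
    return await msgs.size() == 0  # operations on the collection are awaited


if __name__ == "__main__":
    print(asyncio.run(is_empty(ToyProvider())))
```

Type checkers such as mypy treat read-only property members of a protocol covariantly, which is what lets an implementation declare a narrower return type, as the in-memory provider's `messages` property does with `MemoryMessageCollection`.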
36 changes: 8 additions & 28 deletions src/typeagent/knowpro/secindex.py
@@ -22,32 +22,12 @@ def __init__(
         settings: RelatedTermIndexSettings,
     ):
         self._storage_provider = storage_provider
         # Initialize all indexes through storage provider immediately
-        self.property_to_semantic_ref_index = None
-        self.timestamp_index = None
-        self.term_to_related_terms_index = None
-        self.threads = None
-        self.message_index = None
-
-    @classmethod
-    async def create(
-        cls,
-        storage_provider: IStorageProvider,
-        settings: RelatedTermIndexSettings,
-    ) -> "ConversationSecondaryIndexes":
-        """Create and initialize a ConversationSecondaryIndexes with all indexes."""
-        self = cls(storage_provider, settings)
-        # Initialize all indexes from storage provider
-        self.property_to_semantic_ref_index = (
-            await storage_provider.get_property_index()
-        )
-        self.timestamp_index = await storage_provider.get_timestamp_index()
-        self.term_to_related_terms_index = (
-            await storage_provider.get_related_terms_index()
-        )
-        self.threads = await storage_provider.get_conversation_threads()
-        self.message_index = await storage_provider.get_message_text_index()
-        return self
+        self.property_to_semantic_ref_index = storage_provider.property_index
+        self.timestamp_index = storage_provider.timestamp_index
+        self.term_to_related_terms_index = storage_provider.related_terms_index
+        self.threads = storage_provider.conversation_threads
+        self.message_index = storage_provider.message_text_index
 
 
 async def build_secondary_indexes[
@@ -59,7 +39,7 @@ async def build_secondary_indexes[
 ) -> None:
     if conversation.secondary_indexes is None:
         storage_provider = await conversation_settings.get_storage_provider()
-        conversation.secondary_indexes = await ConversationSecondaryIndexes.create(
+        conversation.secondary_indexes = ConversationSecondaryIndexes(
             storage_provider, conversation_settings.related_term_index_settings
         )
     else:
@@ -82,9 +62,9 @@ async def build_transient_secondary_indexes[
     settings: ConversationSettings,
 ) -> None:
     if conversation.secondary_indexes is None:
-        conversation.secondary_indexes = await ConversationSecondaryIndexes.create(
+        conversation.secondary_indexes = ConversationSecondaryIndexes(
             await settings.get_storage_provider(),
-            (settings.related_term_index_settings),
+            settings.related_term_index_settings,
         )
     await build_property_index(conversation)
     await build_timestamp_index(conversation)
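With synchronous properties on the provider, `ConversationSecondaryIndexes` no longer needs the two-step `cls(...)` plus `async create()` construction: `__init__` can read every index directly, and the attributes are never left as `None` between construction and initialization. A reduced sketch of that pattern; `IndexSource`, `DictProvider`, `SecondaryIndexes`, and the dict-based indexes are hypothetical stand-ins:

```python
from typing import Protocol


class IndexSource(Protocol):
    # Hypothetical, trimmed-down provider exposing two index properties.
    @property
    def property_index(self) -> dict[str, list[int]]: ...

    @property
    def timestamp_index(self) -> dict[str, list[int]]: ...


class SecondaryIndexes:
    """Sketch of the new pattern: a plain synchronous constructor."""

    def __init__(self, provider: IndexSource) -> None:
        # No awaits needed: each index is fetched via a property, so the
        # object is fully initialized as soon as __init__ returns.
        self.property_to_semantic_ref_index = provider.property_index
        self.timestamp_index = provider.timestamp_index


class DictProvider:
    """Hypothetical provider that owns its index objects."""

    def __init__(self) -> None:
        self._props: dict[str, list[int]] = {}
        self._times: dict[str, list[int]] = {}

    @property
    def property_index(self) -> dict[str, list[int]]:
        return self._props

    @property
    def timestamp_index(self) -> dict[str, list[int]]:
        return self._times
```

In this sketch the secondary-index object holds references to the provider's own index objects, so an update made through either one is visible through the other.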
4 changes: 2 additions & 2 deletions src/typeagent/podcasts/podcast.py
@@ -187,8 +187,8 @@ async def read_from_file(
         data = Podcast._read_conversation_data_from_file(filename_prefix)
 
         provider = await settings.get_storage_provider()
-        msgs = await provider.get_message_collection()
-        semrefs = await provider.get_semantic_ref_collection()
+        msgs = provider.messages
+        semrefs = provider.semantic_refs
         if await msgs.size() or await semrefs.size():
             raise RuntimeError(
                 f"Database {dbname!r} already has messages or semantic refs."
2 changes: 1 addition & 1 deletion src/typeagent/podcasts/podcast_ingest.py
@@ -114,7 +114,7 @@ async def ingest_podcast(
         PodcastMessage,
     )
     settings.storage_provider = provider
-    msg_coll = await provider.get_message_collection()
+    msg_coll = provider.messages
     if (msg_size := await msg_coll.size()) > start_message:
         raise RuntimeError(
             f"{dbname!r} has {msg_size} messages; start_message ({start_message}) should be at least that."
2 changes: 1 addition & 1 deletion src/typeagent/storage/memory/messageindex.py
@@ -30,7 +30,7 @@ async def build_message_index[
     if csi is None:
         return
     if csi.message_index is None:
-        csi.message_index = await storage_provider.get_message_text_index()
+        csi.message_index = storage_provider.message_text_index
     messages = conversation.messages
     # Convert collection to list for add_messages
     messages_list = await messages.get_slice(0, await messages.size())
26 changes: 16 additions & 10 deletions src/typeagent/storage/memory/provider.py
@@ -77,30 +77,36 @@ async def __aexit__(
         """Exit transaction context. No-op for in-memory storage."""
         pass
 
-    async def get_semantic_ref_index(self) -> ITermToSemanticRefIndex:
+    @property
+    def semantic_ref_index(self) -> ITermToSemanticRefIndex:
         return self._conversation_index
 
-    async def get_property_index(self) -> IPropertyToSemanticRefIndex:
+    @property
+    def property_index(self) -> IPropertyToSemanticRefIndex:
         return self._property_index
 
-    async def get_timestamp_index(self) -> ITimestampToTextRangeIndex:
+    @property
+    def timestamp_index(self) -> ITimestampToTextRangeIndex:
         return self._timestamp_index
 
-    async def get_message_text_index(self) -> IMessageTextIndex[TMessage]:
+    @property
+    def message_text_index(self) -> IMessageTextIndex[TMessage]:
         return self._message_text_index
 
-    async def get_related_terms_index(self) -> ITermToRelatedTermsIndex:
+    @property
+    def related_terms_index(self) -> ITermToRelatedTermsIndex:
         return self._related_terms_index
 
-    async def get_conversation_threads(self) -> IConversationThreads:
+    @property
+    def conversation_threads(self) -> IConversationThreads:
         return self._conversation_threads
 
-    async def get_message_collection(
-        self, message_type: type[TMessage] | None = None
-    ) -> MemoryMessageCollection[TMessage]:
+    @property
+    def messages(self) -> MemoryMessageCollection[TMessage]:
         return self._message_collection
 
-    async def get_semantic_ref_collection(self) -> MemorySemanticRefCollection:
+    @property
+    def semantic_refs(self) -> MemorySemanticRefCollection:
         return self._semantic_ref_collection
 
     async def close(self) -> None:
51 changes: 10 additions & 41 deletions src/typeagent/storage/sqlite/provider.py
@@ -12,6 +12,7 @@
 from ...knowpro.convsettings import MessageTextIndexSettings, RelatedTermIndexSettings
 from ...knowpro.interfaces import ConversationMetadata, STATUS_INGESTED
 from ...knowpro.interfaces_storage import ChunkFailure
+from ..memory.convthreads import ConversationThreads
 from .collections import SqliteMessageCollection, SqliteSemanticRefCollection
 from .messageindex import SqliteMessageTextIndex
 from .propindex import SqlitePropertyIndex
@@ -100,6 +101,11 @@ def __init__(
             self.db, self.related_term_index_settings.embedding_index_settings
         )
 
+        # Initialize conversation threads
+        self._conversation_threads = ConversationThreads(
+            self.message_text_index_settings.embedding_index_settings
+        )
+
         # Connect message collection to message text index for automatic indexing
         self._message_collection.set_message_text_index(self._message_text_index)
 
@@ -325,7 +331,7 @@ def semantic_refs(self) -> SqliteSemanticRefCollection:
         return self._semantic_ref_collection
 
     @property
-    def term_to_semantic_ref_index(self) -> SqliteTermToSemanticRefIndex:
+    def semantic_ref_index(self) -> SqliteTermToSemanticRefIndex:
         return self._term_to_semantic_ref_index
 
     @property
@@ -344,46 +350,9 @@ def message_text_index(self) -> SqliteMessageTextIndex:
     def related_terms_index(self) -> SqliteRelatedTermsIndex:
         return self._related_terms_index
 
-    # Async getters required by base class
-    async def get_message_collection(
-        self, message_type: type[TMessage] | None = None
-    ) -> interfaces.IMessageCollection[TMessage]:
-        """Get the message collection."""
-        return self._message_collection
-
-    async def get_semantic_ref_collection(self) -> interfaces.ISemanticRefCollection:
-        """Get the semantic reference collection."""
-        return self._semantic_ref_collection
-
-    async def get_semantic_ref_index(self) -> interfaces.ITermToSemanticRefIndex:
-        """Get the semantic reference index."""
-        return self._term_to_semantic_ref_index
-
-    async def get_property_index(self) -> interfaces.IPropertyToSemanticRefIndex:
-        """Get the property index."""
-        return self._property_index
-
-    async def get_timestamp_index(self) -> interfaces.ITimestampToTextRangeIndex:
-        """Get the timestamp index."""
-        return self._timestamp_index
-
-    async def get_message_text_index(self) -> interfaces.IMessageTextIndex[TMessage]:
-        """Get the message text index."""
-        return self._message_text_index
-
-    async def get_related_terms_index(self) -> interfaces.ITermToRelatedTermsIndex:
-        """Get the related terms index."""
-        return self._related_terms_index
-
-    async def get_conversation_threads(self) -> interfaces.IConversationThreads:
-        """Get the conversation threads."""
-        # For now, return a simple implementation
-        # In a full implementation, this would be stored/retrieved from SQLite
-        from ...storage.memory.convthreads import ConversationThreads
-
-        return ConversationThreads(
-            self.message_text_index_settings.embedding_index_settings
-        )
+    @property
+    def conversation_threads(self) -> ConversationThreads:
+        return self._conversation_threads
 
     async def clear(self) -> None:
         """Clear all data from the storage provider."""
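One behavioral difference in the SQLite provider: the removed `get_conversation_threads()` constructed a new `ConversationThreads` on every call, whereas the new `conversation_threads` property returns the single instance created in `__init__` (still the in-memory class, not persisted to SQLite). The difference in a stripped-down sketch; `Threads`, `FreshEachCall`, and `BuiltOnce` are hypothetical stand-ins, and the old getter is shown as a synchronous method for brevity:

```python
class Threads:
    """Stand-in for ConversationThreads: an object holding mutable state."""

    def __init__(self) -> None:
        self.threads: list[str] = []


class FreshEachCall:
    # Old pattern: every call constructs a new, empty object,
    # so state added through one result is lost on the next call.
    def get_conversation_threads(self) -> Threads:
        return Threads()


class BuiltOnce:
    # New pattern: construct once in __init__ and always return that instance.
    def __init__(self) -> None:
        self._conversation_threads = Threads()

    @property
    def conversation_threads(self) -> Threads:
        return self._conversation_threads
```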
4 changes: 2 additions & 2 deletions src/typeagent/transcripts/transcript.py
@@ -187,8 +187,8 @@ async def read_from_file(
         data = Transcript._read_conversation_data_from_file(filename_prefix)
 
         provider = await settings.get_storage_provider()
-        msgs = await provider.get_message_collection()
-        semrefs = await provider.get_semantic_ref_collection()
+        msgs = provider.messages
+        semrefs = provider.semantic_refs
         if await msgs.size() or await semrefs.size():
             raise RuntimeError(
                 f"Database {dbname!r} already has messages or semantic refs."
2 changes: 1 addition & 1 deletion tests/conftest.py
@@ -328,7 +328,7 @@ async def ensure_initialized(self):
         storage_provider = await self.settings.get_storage_provider()
         self._storage_provider = storage_provider
         if self.semantic_ref_index is None:
-            self.semantic_ref_index = await storage_provider.get_semantic_ref_index() # type: ignore
+            self.semantic_ref_index = storage_provider.semantic_ref_index # type: ignore
 
         if self._has_secondary_indexes:
             # Set up secondary indexes
6 changes: 3 additions & 3 deletions tests/test_message_text_index_population.py
@@ -59,7 +59,7 @@ async def test_message_text_index_population_from_database():
         ),
     ]
 
-    msg_collection = await storage1.get_message_collection()
+    msg_collection = storage1.messages
     await msg_collection.extend(test_messages)
     assert await msg_collection.size() == len(test_messages)
 
@@ -74,15 +74,15 @@ async def test_message_text_index_population_from_database():
     )
 
     # Check message collection size
-    msg_collection2 = await storage2.get_message_collection()
+    msg_collection2 = storage2.messages
     msg_count = await msg_collection2.size()
     print(f"Message collection size: {msg_count}")
     assert msg_count == len(
         test_messages
     ), f"Expected {len(test_messages)} messages, got {msg_count}"
 
     # Check message text index
-    msg_text_index = await storage2.get_message_text_index()
+    msg_text_index = storage2.message_text_index
     # Check that it implements the interface correctly
     from typeagent.knowpro.interfaces import IMessageTextIndex
 
4 changes: 2 additions & 2 deletions tests/test_property_index_population.py
@@ -79,7 +79,7 @@ async def test_property_index_population_from_database(really_needs_auth):
         ),
     ]
 
-    sem_ref_collection = await storage1.get_semantic_ref_collection()
+    sem_ref_collection = storage1.semantic_refs
     for sem_ref in test_data:
         await sem_ref_collection.append(sem_ref)
 
@@ -111,7 +111,7 @@ async def test_property_index_population_from_database(really_needs_auth):
     # Build property index from the semantic refs
     await build_property_index(conversation)
 
-    prop_index = await storage2.get_property_index()
+    prop_index = storage2.property_index
     from typeagent.knowpro.interfaces import IPropertyToSemanticRefIndex
 
     assert isinstance(prop_index, IPropertyToSemanticRefIndex)