Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core[minor]: Adds an in-memory implementation of RecordManager #13200

Merged
merged 41 commits into from
Jun 20, 2024

Conversation

pprados
Copy link
Contributor

@pprados pprados commented Nov 10, 2023

Description:
langchain offers three technologies to save data:

If you want to combine these technologies in a sample persistence stategy you need a common implementation for each. DocStore propose InMemoryDocstore.

We propose the class MemoryRecordManager to complete the system.

This is the prelude to another full-request, which needs a consistent combination of persistence components.

Tag maintainer:
@baskaryan

Twitter handle:
@pprados

Copy link

vercel bot commented Nov 10, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Jun 19, 2024 5:46am

@dosubot dosubot bot added Ɑ: memory Related to memory module 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features labels Nov 10, 2023
pprados added a commit to pprados/langchain-rag that referenced this pull request Nov 10, 2023
- [Add a Wrapper vectorstore, compatible with SelfQueryRetriever](langchain-ai/langchain#13190)
- [Adds an in-memory implementation of RecordStore](langchain-ai/langchain#13200)
- [Add SQLDocStore](langchain-ai/langchain#13181)
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 23, 2023
@pprados pprados changed the title Adds an in-memory implementation of RecordStore Adds an in-memory implementation of RecordManager Nov 27, 2023
Copy link
Contributor

@hwchase17 hwchase17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is good, but the async methods aren't actually async right?

@pprados
Copy link
Contributor Author

pprados commented Dec 6, 2023

No. All is async, because all method works in memories. Without IO.

@pprados
Copy link
Contributor Author

pprados commented Dec 15, 2023

@hwchase17
Some news ?

@pprados pprados force-pushed the pprados/memory_recordmanager branch from b069a09 to f667410 Compare January 23, 2024 13:39
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jan 23, 2024
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Jan 23, 2024
@pprados pprados marked this pull request as ready for review June 13, 2024 08:56
@pprados
Copy link
Contributor Author

pprados commented Jun 13, 2024

@eyurtsev
I can result this bug.

with

git checkout master
cd /libs/core
poetry lock --no-update
Resolving dependencies... (0.0s)

Because langchain-core depends on langchain-text-splitters (0.2.1) @ file:///home/pprados/workspace.bda/langchain/libs/text-splitters which depends on langchain-core (^0.2.0), langchain-core is required.
So, because langchain-core is 0.0.0a64, version solving failed.

@pprados
Copy link
Contributor Author

pprados commented Jun 14, 2024

@eyurtsev
In the test CI /cd libs/core / make extended_tests, I receive an error:

INTERNALERROR>     raise Failed(msg=reason, pytrace=pytrace)
INTERNALERROR> Failed: Package `aiosqlite` is not installed but is required for extended tests. Please install the given package and try again.

It has nothing to do with my patch

@pprados pprados marked this pull request as draft June 14, 2024 15:47
@pprados pprados marked this pull request as ready for review June 14, 2024 15:47
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jun 14, 2024
@eyurtsev
Copy link
Collaborator

@pprados please review

I reused the existing implementation of the in memory record manager from unit tests.

  1. data is scoped to the instance not to the class (we generally don't want to scope data to the class namespace)
  2. has slightly more documentation

It's moved to base.py together with the abstract interface. We can move it into a separate module as well (doesn't really matter), we just need it to be a non private module for it to appear in the API Reference documentation.

The typical name for memory implementations is "InMemory"

@pprados
Copy link
Contributor Author

pprados commented Jun 17, 2024

@eyurtsev
It's perfect for me.

pprados added a commit to pprados/langchain-rag that referenced this pull request Jun 17, 2024
@pprados pprados marked this pull request as draft June 19, 2024 05:46
@pprados pprados marked this pull request as ready for review June 19, 2024 14:07
@ccurme ccurme added the Ɑ: core Related to langchain-core label Jun 19, 2024
Copy link
Contributor Author

@pprados pprados left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Jun 20, 2024
@eyurtsev eyurtsev merged commit 8711c61 into langchain-ai:master Jun 20, 2024
134 checks passed
hinthornw pushed a commit that referenced this pull request Jun 20, 2024
…22065)

# package community: Fix SQLChatMessageHistory

## Description
Here is a rewrite of `SQLChatMessageHistory` to properly implement the
asynchronous approach. The code circumvents [issue
22021](#22021) by
accepting a synchronous call to `def add_messages()` in an asynchronous
scenario. This bypasses the bug.

For the same reasons as in [PR
22](langchain-ai/langchain-postgres#32) of
`langchain-postgres`, we use a lazy strategy for table creation. Indeed,
the promise of the constructor cannot be fulfilled without this. It is
not possible to invoke a synchronous call in a constructor. We
compensate for this by waiting for the next asynchronous method call to
create the table.

The goal of the `PostgresChatMessageHistory` class (in
`langchain-postgres`) is, among other things, to be able to recycle
database connections. The implementation of the class is problematic, as
we have demonstrated in [issue
22021](#22021).

Our new implementation of `SQLChatMessageHistory` achieves this by using
a singleton of type (`Async`)`Engine` for the database connection. The
connection pool is managed by this singleton, and the code is then
reentrant.

We also accept the type `str` (optionally complemented by `async_mode`.
I know you don't like this much, but it's the only way to allow an
asynchronous connection string).

In order to unify the different classes handling database connections,
we have renamed `connection_string` to `connection`, and `Session` to
`session_maker`.

Now, a single transaction is used to add a list of messages. Thus, a
crash during this write operation will not leave the database in an
unstable state with a partially added message list. This makes the code
resilient.

We believe that the `PostgresChatMessageHistory` class is no longer
necessary and can be replaced by:
```
PostgresChatMessageHistory = SQLChatMessageHistory
```
This also fixes the bug.


## Issue
- [issue 22021](#22021)
  - Bug in _exit_history()
  - Bugs in PostgresChatMessageHistory and sync usage
  - Bugs in PostgresChatMessageHistory and async usage
- [issue
36](langchain-ai/langchain-postgres#36)
 ## Twitter handle:
pprados

## Tests
- libs/community/tests/unit_tests/chat_message_histories/test_sql.py
(add async test)

@baskaryan, @eyurtsev or @hwchase17 can you check this PR ?
And, I've been waiting a long time for validation from other PRs. Can
you take a look?
- [PR 32](langchain-ai/langchain-postgres#32)
- [PR 15575](#15575)
- [PR 13200](#13200)

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
hinthornw pushed a commit that referenced this pull request Jun 20, 2024
**Description:**
langchain offers three technologies to save data:
-
[vectorstore](https://python.langchain.com/docs/modules/data_connection/vectorstores/)
- [docstore](https://js.langchain.com/docs/api/schema/classes/Docstore)
- [record
manager](https://python.langchain.com/docs/modules/data_connection/indexing)

If you want to combine these technologies in a sample persistence
stategy you need a common implementation for each. `DocStore` propose
`InMemoryDocstore`.

We propose the class `MemoryRecordManager` to complete the system.

This is the prelude to another full-request, which needs a consistent
combination of persistence components.

**Tag maintainer:**
@baskaryan

**Twitter handle:**
@pprados

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
@pprados pprados deleted the pprados/memory_recordmanager branch June 21, 2024 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ɑ: core Related to langchain-core 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features 🤖:improvement Medium size change to existing code to handle new use-cases lgtm PR looks good. Use to confirm that a PR is ready for merging. Ɑ: memory Related to memory module size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants