Make document serializable, create utility to create a docstore #9674

Merged
eyurtsev merged 8 commits into master from eugene/add_document_store on Aug 30, 2023

Collaborator

@eyurtsev eyurtsev commented Aug 24, 2023

This PR makes the following changes:

  1. Documents become serializable using langchain serialization
  2. Add a utility to create a docstore from a key-value (kv) store

Will help to address issue here: #9345
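
As a rough illustration of the two changes, here is a minimal standard-library sketch. It does not use langchain's serialization; `Doc`, `dump_doc`, `load_doc`, and the dict used as a bytes store are hypothetical stand-ins. It shows why a serializable document lets a plain bytes key-value store hold documents.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus metadata."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def dump_doc(doc: Doc) -> bytes:
    """Serialize a Doc to UTF-8 encoded JSON."""
    return json.dumps(asdict(doc)).encode("utf-8")


def load_doc(raw: bytes) -> Doc:
    """Rebuild a Doc from the bytes produced by dump_doc."""
    return Doc(**json.loads(raw.decode("utf-8")))


# A plain dict plays the role of the underlying bytes key-value store.
byte_store: Dict[str, bytes] = {}
byte_store["doc-1"] = dump_doc(Doc("hello", {"source": "example"}))
restored = load_doc(byte_store["doc-1"])
```

The store itself only ever sees bytes; all knowledge of the document type lives in the dump and load functions.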

@eyurtsev
Collaborator Author

@baskaryan / @hwchase17 let me know if you agree with namespacing, and if code looks correct.

My largest concern is that a key-value wrapper that only allows writing and fetching documents is not a doc-store in terms of the features one would obviously expect.

@eyurtsev
Collaborator Author

I'm going to change this a bit to make it work with any langchain serializable object.

)


def create_kv_docstore(
Collaborator Author

Another possibility is sketched below, but it seems like over-abstraction.

Objects we'll want to serialize are:

  1. everything (then no need to type check)
  2. prompts (can do special purpose)
  3. documents (can do special purpose)

The other possibility, if anyone has time, is to introduce a more general class that proxies the embedding class but gets the type signatures right.

The tricky part is getting the type signatures right :)

The user passes the types, and the class does runtime type checking.



class LCStore(BaseStore[str, T]):
    def __init__(
        self,
        store: BaseStore[str, bytes],
        types: Sequence[Type],
        *,
        key_encoder: Optional[Callable[[str], str]] = None,
    ) -> None:
        """Create a store for langchain serializable objects from a bytes store."""
        # Types the caller expects back; intended for run time type checking.
        self.types = types
        self.store = EncoderBackedStore(
            store,
            key_encoder or _identity,
            _dump_document_as_bytes,
            _load_document_from_bytes,
        )

    def mget(self, keys: Sequence[str]) -> List[Optional[T]]:
        """Get multiple keys."""
        return self.store.mget(keys)

    def mdelete(self, keys: Sequence[str]) -> None:
        """Delete multiple keys."""
        self.store.mdelete(keys)

    def yield_keys(self, *, prefix: Optional[str] = None) -> Iterator[str]:
        """Yield all keys in the store."""
        yield from self.store.yield_keys(prefix=prefix)

    def mset(self, items: Sequence[Tuple[str, T]]) -> None:
        """Set multiple key-value pairs."""
        self.store.mset(items)
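
The "user passes types and the class does run time type checking" idea could look roughly like the following self-contained sketch. It assumes a plain dict as the bytes store and caller-supplied `dump`/`load` callables; `TypeCheckedStore` and all of its parameters are hypothetical names, not part of langchain.

```python
import json
from typing import (
    Callable, Dict, Generic, Iterator, List, Optional, Sequence, Tuple, Type, TypeVar,
)

T = TypeVar("T")


class TypeCheckedStore(Generic[T]):
    """Hypothetical wrapper: stores values as bytes and checks types at run time."""

    def __init__(
        self,
        store: Dict[str, bytes],
        types: Sequence[Type],
        dump: Callable[[T], bytes],
        load: Callable[[bytes], T],
    ) -> None:
        self._store = store
        self._types = tuple(types)
        self._dump = dump
        self._load = load

    def _check(self, value: object) -> None:
        if not isinstance(value, self._types):
            raise TypeError(f"Expected one of {self._types}, got {type(value)}")

    def mset(self, items: Sequence[Tuple[str, T]]) -> None:
        # Validate everything first so a bad item does not cause a partial write.
        for _, value in items:
            self._check(value)
        for key, value in items:
            self._store[key] = self._dump(value)

    def mget(self, keys: Sequence[str]) -> List[Optional[T]]:
        results: List[Optional[T]] = []
        for key in keys:
            raw = self._store.get(key)
            if raw is None:
                results.append(None)
                continue
            value = self._load(raw)
            self._check(value)  # guard against unexpected stored types
            results.append(value)
        return results

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            self._store.pop(key, None)

    def yield_keys(self, *, prefix: Optional[str] = None) -> Iterator[str]:
        for key in list(self._store):
            if prefix is None or key.startswith(prefix):
                yield key


# Usage: a store restricted to str values, encoded as JSON bytes.
store: TypeCheckedStore[str] = TypeCheckedStore(
    {},
    [str],
    dump=lambda v: json.dumps(v).encode("utf-8"),
    load=lambda b: json.loads(b.decode("utf-8")),
)
store.mset([("a", "alpha"), ("b", "beta")])
```

The static type parameter `T` and the run time `types` argument have to be kept in sync by the caller, which is one way to read the "tricky thing" mentioned above.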

def _load_document_from_bytes(serialized: bytes) -> Document:
    """Return a document from a bytes representation."""
    obj = loads(serialized.decode("utf-8"))
    if not isinstance(obj, Document):
Collaborator

So all that is different about the specialised dump/load functions is the isinstance check?

Collaborator Author

Yes, the only differences are the runtime check and the type annotation.
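
To make that difference concrete, here is a small standard-library sketch of a generic loader next to a specialized one. All names (`Record`, `Doc`, `Prompt`, `_REGISTRY`, and the three functions) are hypothetical and do not mirror langchain's implementation; the point is only that the specialized loader reuses the generic decoding and adds an `isinstance` check plus a narrower return annotation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Type


@dataclass
class Record:
    """Hypothetical base class for serializable objects."""

    text: str


@dataclass
class Doc(Record):
    """Hypothetical document type."""


@dataclass
class Prompt(Record):
    """Hypothetical prompt type."""


# Maps the stored type name back to a class when loading.
_REGISTRY: Dict[str, Type[Record]] = {"Doc": Doc, "Prompt": Prompt}


def dump_as_bytes(obj: Record) -> bytes:
    """Serialize any registered object, recording its type name."""
    payload = {"type": type(obj).__name__, "fields": asdict(obj)}
    return json.dumps(payload).encode("utf-8")


def load_from_bytes(raw: bytes) -> Record:
    """Generic loader: returns whatever registered type was stored."""
    payload = json.loads(raw.decode("utf-8"))
    return _REGISTRY[payload["type"]](**payload["fields"])


def load_doc_from_bytes(raw: bytes) -> Doc:
    """Specialized loader: same decoding, plus a runtime check and a narrower type."""
    obj = load_from_bytes(raw)
    if not isinstance(obj, Doc):
        raise TypeError(f"Expected a Doc instance, got {type(obj)}")
    return obj
```

With this shape, a static type checker sees `Doc` at call sites of the specialized loader, and a stored `Prompt` is rejected at run time instead of being returned silently.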

@eyurtsev eyurtsev merged commit 588237e into master Aug 30, 2023
27 checks passed
@eyurtsev eyurtsev deleted the eugene/add_document_store branch August 30, 2023 13:45