Make document serializable, create utility to create a docstore #9674
@baskaryan / @hwchase17 let me know if you agree with the namespacing and whether the code looks correct. My largest concern is that a key-value wrapper that allows writing and fetching documents is not a doc-store in terms of obvious features.
Going to change this a bit to make it work with any lc serializable.
)


def create_kv_docstore(
Another possibility is below, but it seems like over-abstraction.
Objects we'll want to serialize are:
- everything (then no need to type check)
- prompts (can do special purpose)
- documents (can do special purpose)
The other possibility, if anyone has time, is to try to introduce a more general class that proxies the embedding class but gets the type signatures right.
The tricky thing is getting the type signatures right :)
The user passes the types, and the class does run-time type checking.
class LCStore(BaseStore[str, T]):
    def __init__(self, store: BaseStore[str, bytes], types: Sequence[Type]):
        """Create a store for langchain serializable objects from a bytes store."""
        self.types = types
        self.store = EncoderBackedStore(
            store,
            _identity,  # no key_encoder parameter in this sketch, so keys pass through
            _dump_document_as_bytes,
            _load_document_from_bytes,
        )

    def mget(self, keys: Sequence[str]) -> List[Optional[T]]:
        """Get multiple keys."""
        return self.store.mget(keys)

    def mdelete(self, keys: Sequence[str]) -> None:
        """Delete multiple keys."""
        return self.store.mdelete(keys)

    def yield_keys(self, *, prefix: Optional[str] = None) -> Iterator[str]:
        """Yield all keys in the store."""
        yield from self.store.yield_keys(prefix=prefix)

    def mset(self, items: Sequence[Tuple[str, T]]) -> None:
        """Set multiple key-value pairs."""
        return self.store.mset(items)
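For anyone who wants to try the "user passes types, class checks at run time" idea outside langchain, here is a minimal stdlib-only sketch. `TypedStore`, `Doc`, and `Prompt` are hypothetical stand-ins, a plain dict stands in for the bytes store, and JSON over dataclasses stands in for the real serialization; none of this is the langchain API.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Iterator, List, Optional, Sequence, Tuple, Type


@dataclass
class Doc:
    """Hypothetical stand-in for a serializable document."""
    page_content: str


@dataclass
class Prompt:
    """Hypothetical stand-in for a serializable prompt."""
    template: str


class TypedStore:
    """Hypothetical wrapper that stores allowed dataclass instances as bytes."""

    def __init__(self, store: Dict[str, bytes], types: Sequence[Type[Any]]) -> None:
        self.store = store
        # Map class name -> class so values can be rebuilt on load.
        self.types = {t.__name__: t for t in types}

    def _dump(self, value: Any) -> bytes:
        # Run-time check on write: only the exact types passed in are allowed.
        if type(value) not in self.types.values():
            raise TypeError(f"Type {type(value).__name__} is not allowed in this store")
        payload = {"type": type(value).__name__, "data": asdict(value)}
        return json.dumps(payload).encode("utf-8")

    def _load(self, raw: bytes) -> Any:
        payload = json.loads(raw.decode("utf-8"))
        cls = self.types.get(payload["type"])
        # Run-time check on read: the stored type must be one the user passed.
        if cls is None:
            raise TypeError(f"Stored type {payload['type']} is not allowed in this store")
        return cls(**payload["data"])

    def mset(self, items: Sequence[Tuple[str, Any]]) -> None:
        for key, value in items:
            self.store[key] = self._dump(value)

    def mget(self, keys: Sequence[str]) -> List[Optional[Any]]:
        return [self._load(self.store[k]) if k in self.store else None for k in keys]

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            self.store.pop(key, None)

    def yield_keys(self, *, prefix: Optional[str] = None) -> Iterator[str]:
        for key in list(self.store):
            if prefix is None or key.startswith(prefix):
                yield key
```

The check happens at run time only; a static type checker still sees `Any` on the way out, which is the part that is hard to get right in the signatures.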
def _load_document_from_bytes(serialized: bytes) -> Document:
    """Return a document from a bytes representation."""
    obj = loads(serialized.decode("utf-8"))
    if not isinstance(obj, Document):
So all that is different about the specialised dump/load functions is the isinstance check?
Yeah, the run-time check and the type annotation.
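To make that difference concrete, here is a small stdlib-only sketch of a generic loader next to a specialised one. `loads_any`, `load_doc_from_bytes`, the `_REGISTRY` dict, and the `Doc` / `Prompt` dataclasses are all hypothetical names for illustration and do not mirror langchain's `loads`.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Doc:
    """Hypothetical stand-in for a document."""
    page_content: str


@dataclass
class Prompt:
    """Hypothetical stand-in for a prompt."""
    template: str


# Hypothetical registry of types the generic loader knows how to rebuild.
_REGISTRY = {"Doc": Doc, "Prompt": Prompt}


def loads_any(serialized: bytes) -> Any:
    """Generic loader: returns whichever registered type the payload names."""
    payload = json.loads(serialized.decode("utf-8"))
    return _REGISTRY[payload["type"]](**payload["data"])


def load_doc_from_bytes(serialized: bytes) -> Doc:
    """Specialised loader: same decoding, plus a run-time check and a narrower return type."""
    obj = loads_any(serialized)
    if not isinstance(obj, Doc):
        raise TypeError(f"Expected a Doc, got {type(obj).__name__}")
    return obj
```

The specialised function adds nothing to the decoding itself; it only narrows the return annotation and fails loudly when the stored bytes hold some other type.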
This PR makes the following changes:
Will help to address issue here: #9345