-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: BigtableLoader and BigtableSaver implementation #5
Conversation
docs/document_loader.ipynb
Outdated
"## Pre-reqs" | ||
"## Setting up\n", | ||
"\n", | ||
"To run this notebook, you will need a Google Cloud Project, a Bigtable instance, and Google credentials." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should include link to instructions at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
docs/document_loader.ipynb
Outdated
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Load from table" | ||
"You can fetch the documents by calling the `load` method of the loader. It will return a list with all the documents. If you want to avoid this blocking call, you can call `lazy_load` method that returns an Iterator." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should encourage lazy_load -- load is only around for legacy reasons IIUC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
docs/document_loader.ipynb
Outdated
"docs_iterator = loader.lazy_load()\n", | ||
"for doc in docs_iterator:\n", | ||
" print(doc)\n", | ||
" break" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should probably just be: for doc in loader.lazy_load()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
class MetadataMapping: | ||
def __init__( | ||
self, | ||
column_family: str, | ||
column_name: str, | ||
metadata_key: str, | ||
encoding: Encoding, | ||
custom_encoding_func: Callable[[Any], bytes] = _not_implemented, | ||
custom_decoding_func: Callable[[bytes], Any] = _not_implemented, | ||
): | ||
self.column_family = column_family | ||
self.column_name = column_name | ||
self.metadata_key = metadata_key | ||
self.encoding = encoding | ||
self.custom_encoding_func = custom_encoding_func | ||
self.custom_decoding_func = custom_decoding_func |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could we just use a dataclass here? https://docs.python.org/3/library/dataclasses.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
from langchain_google_bigtable.document_loader import BigtableLoader, BigtableSaver | ||
|
||
__all__ = ["BigtableLoader", "BigtableSaver"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to expose Encoding + Metadata mapping here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
metadata_mappings: Optional. The array of mappings that maps from Bigtable columns to keys on the metadata dictionary, including the encoding to use when mapping from Bigtable bytes to a python type. | ||
""" | ||
self.client = ( | ||
(client or bigtable.Client(admin=True)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be using the default client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
.table(table_id) | ||
) | ||
if content_encoding not in SUPPORTED_DOC_ENCODING: | ||
raise ValueError(f"{content_encoding} not in {SUPPORTED_DOC_ENCODING}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
raise ValueError(f"{content_encoding} not in {SUPPORTED_DOC_ENCODING}") | |
raise ValueError(f"content_encoding '{content_encoding}' not supported for content (must be {SUPPORTED_DOC_ENCODING}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in follow up PR
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 馃