Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: BigtableLoader and BigtableSaver implementation #5

Merged
merged 18 commits into from
Feb 7, 2024
Merged

Conversation

ron-gal
Copy link
Contributor

@ron-gal ron-gal commented Jan 23, 2024

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 馃

@ron-gal ron-gal requested a review from a team as a code owner January 23, 2024 18:38
@ron-gal ron-gal changed the title Loader feat: BigtableLoader and BigtableSaver implementation Jan 23, 2024
@product-auto-label product-auto-label bot added the api: bigtable Issues related to the googleapis/langchain-google-bigtable-python API. label Feb 6, 2024
"## Pre-reqs"
"## Setting up\n",
"\n",
"To run this notebook, you will need a Google Cloud Project, a Bigtable instance, and Google credentials."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should include link to instructions at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load from table"
"You can fetch the documents by calling the `load` method of the loader. It will return a list with all the documents. If you want to avoid this blocking call, you can call `lazy_load` method that returns an Iterator."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should encourage lazy_load -- load is only around for legacy reasons IIUC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 107 to 110
"docs_iterator = loader.lazy_load()\n",
"for doc in docs_iterator:\n",
" print(doc)\n",
" break"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should probably just be: for doc in loader.lazy_load()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@kurtisvg kurtisvg self-assigned this Feb 7, 2024
@kurtisvg kurtisvg self-requested a review February 7, 2024 16:09
Comment on lines 48 to 63
class MetadataMapping:
def __init__(
self,
column_family: str,
column_name: str,
metadata_key: str,
encoding: Encoding,
custom_encoding_func: Callable[[Any], bytes] = _not_implemented,
custom_decoding_func: Callable[[bytes], Any] = _not_implemented,
):
self.column_family = column_family
self.column_name = column_name
self.metadata_key = metadata_key
self.encoding = encoding
self.custom_encoding_func = custom_encoding_func
self.custom_decoding_func = custom_decoding_func
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we just use a dataclass here? https://docs.python.org/3/library/dataclasses.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


from langchain_google_bigtable.document_loader import BigtableLoader, BigtableSaver

__all__ = ["BigtableLoader", "BigtableSaver"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to expose Encoding + Metadata mapping here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

metadata_mappings: Optional. The array of mappings that maps from Bigtable columns to keys on the metadata dictionary, including the encoding to use when mapping from Bigtable bytes to a python type.
"""
self.client = (
(client or bigtable.Client(admin=True))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be using the default client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ron-gal ron-gal requested a review from kurtisvg February 7, 2024 16:31
.table(table_id)
)
if content_encoding not in SUPPORTED_DOC_ENCODING:
raise ValueError(f"{content_encoding} not in {SUPPORTED_DOC_ENCODING}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
raise ValueError(f"{content_encoding} not in {SUPPORTED_DOC_ENCODING}")
raise ValueError(f"content_encoding '{content_encoding}' not supported for content (must be {SUPPORTED_DOC_ENCODING}")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in follow up PR

@ron-gal ron-gal merged commit c0f4244 into main Feb 7, 2024
8 checks passed
@ron-gal ron-gal deleted the loader branch February 7, 2024 16:48
@release-please release-please bot mentioned this pull request Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigtable Issues related to the googleapis/langchain-google-bigtable-python API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants