feat: BigtableLoader and BigtableSaver implementation (#5)
* fix repo

* feat: BigtableLoader and BigtableSaver implementation

* add test

* add some tests

* add load test

* fix load test

* fix merge

* Add tests

* update doc pip install command

* move tests to tests folder

* Update docs

* fix docs

* memoize client

* fixed PR comments
ron-gal authored Feb 7, 2024
1 parent 9519778 commit c0f4244
Showing 4 changed files with 839 additions and 41 deletions.
198 changes: 157 additions & 41 deletions docs/document_loader.ipynb
@@ -4,72 +4,103 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google DATABASE\n",
"\n",
"[Google DATABASE](https://cloud.google.com/DATABASE).\n",
"\n",
"Load documents from `DATABASE`."
"# Bigtable\n",
"[Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pre-reqs"
"## Setting up\n",
"\n",
"To run this notebook, you will need a [Google Cloud Project](https://developers.google.com/workspace/guides/create-project), a [Bigtable instance](https://cloud.google.com/bigtable/docs/creating-instance), and [Google credentials](https://developers.google.com/workspace/guides/create-credentials)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"metadata": {},
"outputs": [],
"source": [
"%pip install PACKAGE_NAME"
"%pip install langchain-google-bigtable"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Querying for Documents from Bigtable\n",
"For more details on connecting to a Bigtable table, please check the [Python SDK documentation](https://cloud.google.com/python/docs/reference/bigtable/latest/client)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": []
},
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from PACKAGE import LOADER"
"from langchain_google_bigtable import BigtableLoader\n",
"\n",
"instance_id = \"my_instance\"\n",
"table_id = \"my_table\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
"## Create the Loader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = BigtableLoader(\n",
" instance_id,\n",
" table_id,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load from table"
"### Load from table\n",
"\n",
"You can fetch the documents by calling the `lazy_load` method that returns an Iterator of documents."
]
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='address: 8301 Hollister Ave\\nalias: None\\ncheckin: 12PM\\ncheckout: 4PM\\ncity: Santa Barbara\\ncountry: United States\\ndescription: Located on 78 acres of oceanfront property, this resort is an upscale experience that caters to luxury travelers. There are 354 guest rooms in 19 separate villas, each in a Spanish style. Property amenities include saline infinity pools, a private beach, clay tennis courts, a 42,000 foot spa and fitness center, and nature trails through the adjoining wetland and forest. The onsite Miro restaurant provides great views of the coast with excellent food and service. With all that said, you pay for the experience, and this resort is not for the budget traveler. In addition to quoted rates there is a $25 per day resort fee that includes a bottle of wine in your room, two bottles of water, access to fitness center and spa, and internet access.\\ndirections: None\\nemail: None\\nfax: None\\nfree_breakfast: True\\nfree_internet: False\\nfree_parking: False\\ngeo: {\\'accuracy\\': \\'ROOFTOP\\', \\'lat\\': 34.43429, \\'lon\\': -119.92137}\\nid: 10180\\nname: Bacara Resort & Spa\\npets_ok: False\\nphone: None\\nprice: $300-$1000+\\npublic_likes: [\\'Arnoldo Towne\\', \\'Olaf Turcotte\\', \\'Ruben Volkman\\', \\'Adella Aufderhar\\', \\'Elwyn Franecki\\']\\nreviews: [{\\'author\\': \\'Delmer Cole\\', \\'content\\': \"Jane and Joyce make every effort to see to your personal needs and comfort. The rooms take one back in time to the original styles and designs of the 1800\\'s. A real connection to local residents, the 905 is a regular tour stop and the oldest hotel in the French Quarter. My wife and I prefer to stay in the first floor rooms where there is a sitting room with TV, bedroom, bath and kitchen. The kitchen has a stove and refrigerator, sink, coffeemaker, etc. Plus there is a streetside private entrance (very good security system) and a covered balcony area with seating so you can watch passersby. 
Quaint, cozy, and most of all: ORIGINAL. No plastic remods. Feels like my great Grandmother\\'s place. While there are more luxurious places to stay, if you want the real flavor and eclectic style of N.O. you have to stay here. It just FEELS like New Orleans. The location is one block towards the river from Bourbon Street and smack dab in the middle of everything. Royal street is one of the nicest residential streets in the Quarter and you can walk back to your room and get some peace and quiet whenever you like. The French Quarter is always busy so we bring a small fan to turn on to make some white noise so we can sleep more soundly. Works great. You might not need it at the 905 but it\\'s a necessity it if you stay on or near Bourbon Street, which is very loud all the time. Parking tips: You can park right in front to unload and it\\'s only a couple blocks to the secure riverfront parking area. Plus there are several public parking lots nearby. My strategy is to get there early, unload, and drive around for a while near the hotel. It\\'s not too hard to find a parking place but be careful about where it is. Stay away from corner spots since streets are narrow and delivery trucks don\\'t have the room to turn and they will hit your car. Take note of the signs. Tuesday and Thursday they clean the streets and you can\\'t park in many areas when they do or they will tow your car. Once you find a spot don\\'t move it since everything is walking distance. If you find a good spot and get a ticket it will cost $20, which is cheaper than the daily rate at most parking garages. Even if you don\\'t get a ticket make sure to go online to N.O. traffic ticket site to check your license number for violations. Some local kids think it\\'s funny to take your ticket and throw it away since the fine doubles every month it\\'s not paid. You don\\'t know you got a ticket but your fine is getting bigger. 
We\\'ve been coming to the French Quarter for years and have stayed at many of the local hotels. The 905 Royal is our favorite.\", \\'date\\': \\'2013-12-05 09:27:07 +0300\\', \\'ratings\\': {\\'Cleanliness\\': 5, \\'Location\\': 5, \\'Overall\\': 5, \\'Rooms\\': 5, \\'Service\\': 5, \\'Sleep Quality\\': 5, \\'Value\\': 5}}, {\\'author\\': \\'Orval Lebsack\\', \\'content\\': \\'I stayed there with a friend for a girls trip around St. Patricks Day. This was my third time to NOLA, my first at Chateau Lemoyne. The location is excellent....very easy walking distance to everything, without the chaos of staying right on Bourbon Street. Even though its a Holiday Inn, it still has the historical feel and look of NOLA. The pool looked nice too, even though we never used it. The staff was friendly and helpful. Chateau Lemoyne would be hard to top, considering the price.\\', \\'date\\': \\'2013-10-26 15:01:39 +0300\\', \\'ratings\\': {\\'Cleanliness\\': 5, \\'Location\\': 5, \\'Overall\\': 4, \\'Rooms\\': 4, \\'Service\\': 4, \\'Sleep Quality\\': 5, \\'Value\\': 4}}, {\\'author\\': \\'Hildegard Larkin\\', \\'content\\': \\'This hotel is a safe bet for a value stay in French Quarter. Close enough to all sites and action but just out of the real loud & noisy streets. Check in is quick and friendly and room ( king side balcony) while dated was good size and clean. Small balcony with table & chairs is a nice option for evening drink & passing sites below. Down side is no mimi bar fridge ( they are available upon request on a first come basis apparently, so book one when you make initial reservation if necessary) Bathroom is adequate with ok shower pressure and housekeeping is quick and efficient. TIP; forget paying high price for conducted local tours, just take the red trams to end of line and back and then next day the green tram to cross town garden district and zoo and museums. cost for each ride $2.00 each way!! fantastic. 
Tip: If you stay during hot weather make sure you top up on ice early as later guests can \"run the machine dry\" for short time. Overall experience met expectations and would recommend for value stay.\\', \\'date\\': \\'2012-01-01 18:48:30 +0300\\', \\'ratings\\': {\\'Cleanliness\\': 4, \\'Location\\': 4, \\'Overall\\': 4, \\'Rooms\\': 3, \\'Service\\': 4, \\'Sleep Quality\\': 3, \\'Value\\': 4}}, {\\'author\\': \\'Uriah Rohan\\', \\'content\\': \\'The Chateau Le Moyne Holiday Inn is in a perfect location in the French Quarter, a block away from the craziness on Bourbon St. We got a fantastic deal on Priceline and were expecting a standard room for the price. The pleasant hotel clerk upgraded our room much to our delight, without us asking and the concierge also went above an beyond to assist us with information and suggestions for places to dine and possessed an \"can do\" attitude. Nice pool area to cool off in during the midday NOLA heat. It is definitely a three star establishment, not super luxurious but the beds were comfy and the location superb! If you can get a deal on Priceline, etc, it\\\\\\'s a great value.\\', \\'date\\': \\'2014-08-04 15:17:49 +0300\\', \\'ratings\\': {\\'Cleanliness\\': 4, \\'Location\\': 5, \\'Overall\\': 4, \\'Rooms\\': 3, \\'Service\\': 5, \\'Sleep Quality\\': 4, \\'Value\\': 4}}]\\nstate: California\\ntitle: Goleta\\ntollfree: None\\ntype: hotel\\nurl: http://www.bacararesort.com/\\nvacancy: True'\n"
]
}
],
"source": [
"loader = LOADER()\n",
"\n",
"data = loader.load()"
"for doc in loader.lazy_load():\n",
" print(doc)\n",
" break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load from query"
"## Limiting the returned rows\n",
"There are two ways to limit the returned rows:\n",
"1. Using a [filter](https://cloud.google.com/python/docs/reference/bigtable/latest/row-filters)\n",
"2. Using a [row_set](https://cloud.google.com/python/docs/reference/bigtable/latest/row-set#google.cloud.bigtable.row_set.RowSet)"
]
},
{
@@ -78,41 +109,81 @@
"metadata": {},
"outputs": [],
"source": [
"loader = LOADER()\n",
"import google.cloud.bigtable.row_filters as row_filters\n",
"\n",
"filter_loader = BigtableLoader(\n",
" instance_id, table_id, filter=row_filters.ColumnQualifierRegexFilter(b\"os_build\")\n",
")\n",
"\n",
"\n",
"from google.cloud.bigtable.row_set import RowSet\n",
"\n",
"data = loader.load()"
"row_set = RowSet()\n",
"row_set.add_row_range_from_keys(\n",
" start_key=\"phone#4c410523#20190501\", end_key=\"phone#4c410523#201906201\"\n",
")\n",
"\n",
"row_set_loader = BigtableLoader(\n",
" instance_id,\n",
" table_id,\n",
" row_set=row_set,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Customize Document Page Content & Metadata"
"## Custom client\n",
"By default, the loader creates a client with only the `admin=True` option set. To use a different configuration, pass a [custom client](https://cloud.google.com/python/docs/reference/bigtable/latest/client#class-googlecloudbigtableclientclientprojectnone-credentialsnone-readonlyfalse-adminfalse-clientinfonone-clientoptionsnone-adminclientoptionsnone-channelnone) to the constructor."
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = LOADER()\n",
"from google.cloud import bigtable\n",
"\n",
"data = loader.load()"
"custom_client_loader = BigtableLoader(\n",
" instance_id,\n",
" table_id,\n",
" client=bigtable.Client(...),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Customize Page Content Format"
"## Custom content\n",
"The `BigtableLoader` assumes a column family called `langchain` containing a column called `content`, whose values are encoded in UTF-8. These defaults can be changed as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_bigtable import Encoding\n",
"\n",
"custom_content_loader = BigtableLoader(\n",
" instance_id,\n",
" table_id,\n",
" content_encoding=Encoding.ASCII,\n",
" content_column_family=\"my_content_family\",\n",
" content_column_name=\"my_content_column_name\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save Documents to table"
"## Metadata mapping\n",
"By default, the `metadata` map on the `Document` object contains a single key, `rowkey`, whose value is the row's key. To add more items to that map, use `metadata_mappings`."
]
},
{
@@ -121,15 +192,37 @@
"metadata": {},
"outputs": [],
"source": [
"saver = SAVER()\n",
"saver.add_documents(docs)"
"from langchain_google_bigtable import MetadataMapping\n",
"import json\n",
"\n",
"metadata_mapping_loader = BigtableLoader(\n",
" instance_id,\n",
" table_id,\n",
" metadata_mappings=[\n",
" MetadataMapping(\n",
" column_family=\"my_int_family\",\n",
" column_name=\"my_int_column\",\n",
" metadata_key=\"key_in_metadata_map\",\n",
" encoding=Encoding.INT_BIG_ENDIAN,\n",
" ),\n",
" MetadataMapping(\n",
" column_family=\"my_custom_family\",\n",
" column_name=\"my_custom_column\",\n",
" metadata_key=\"custom_key\",\n",
" encoding=Encoding.CUSTOM,\n",
" custom_decoding_func=lambda input: json.loads(input.decode()),\n",
" custom_encoding_func=lambda input: str.encode(json.dumps(input)),\n",
" ),\n",
" ],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Customize Connection & Authentication"
"## Using the saver\n",
"Documents can be saved into Bigtable using the `BigtableSaver`. Its constructor takes much the same arguments as the `BigtableLoader` constructor."
]
},
{
@@ -138,13 +231,36 @@
"metadata": {},
"outputs": [],
"source": [
"from google.cloud.DATABASE import Client\n",
"from langchain_google_bigtable import BigtableSaver\n",
"from langchain_core.documents import Document\n",
"\n",
"creds = \"\"\n",
"client = Client(creds=creds)\n",
"loader = LOADER(\n",
" client=client,\n",
")"
"saver = BigtableSaver(\n",
" instance_id,\n",
" table_id,\n",
" client=bigtable.Client(...),\n",
" content_encoding=Encoding.ASCII,\n",
" content_column_family=\"my_content_family\",\n",
" content_column_name=\"my_content_column_name\",\n",
" metadata_mappings=[\n",
" MetadataMapping(\n",
" column_family=\"my_int_family\",\n",
" column_name=\"my_int_column\",\n",
" metadata_key=\"key_in_metadata_map\",\n",
" encoding=Encoding.INT_BIG_ENDIAN,\n",
" ),\n",
" MetadataMapping(\n",
" column_family=\"my_custom_family\",\n",
" column_name=\"my_custom_column\",\n",
" metadata_key=\"custom_key\",\n",
" encoding=Encoding.CUSTOM,\n",
" custom_decoding_func=lambda input: json.loads(input.decode()),\n",
" custom_encoding_func=lambda input: str.encode(json.dumps(input)),\n",
" ),\n",
" ],\n",
")\n",
"\n",
"saver.add_documents([Document(), Document()])\n",
"saver.delete([Document(), Document()])"
]
}
],
@@ -169,4 +285,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
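
A note on the `row_set` cell in the notebook diff: Bigtable orders row keys as byte strings, and, as far as I know, `RowSet.add_row_range_from_keys` treats the start key as inclusive and the end key as exclusive by default. The sketch below imitates that selection in plain Python with no Bigtable connection. The helper `in_row_range` and the sample keys are hypothetical and only illustrate the comparison; they are not part of this commit or of the client library.

```python
def in_row_range(key: bytes, start_key: bytes, end_key: bytes) -> bool:
    """Return True if key lies in [start_key, end_key), compared bytewise."""
    return start_key <= key < end_key


# Same bounds as the notebook cell, written as bytes.
start = b"phone#4c410523#20190501"
end = b"phone#4c410523#201906201"

# Hypothetical row keys, for illustration only.
sample_keys = [
    b"phone#4c410523#20190430",   # sorts before the start key
    b"phone#4c410523#20190501",   # equal to the start key (inclusive)
    b"phone#4c410523#20190502",
    b"phone#4c410523#20190620",   # a prefix of the end key, so it sorts before it
    b"phone#4c410523#201906201",  # equal to the end key (exclusive)
    b"phone#5c10102#20190501",    # a different device, sorts after the end key
]

selected = [k for k in sample_keys if in_row_range(k, start, end)]
for k in selected:
    print(k.decode())
```

Because the comparison is bytewise rather than numeric, a key such as `...#20190620` falls inside the range even though the end key has one more digit.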
9 changes: 9 additions & 0 deletions src/langchain_google_bigtable/__init__.py
@@ -11,3 +11,12 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from langchain_google_bigtable.document_loader import (
BigtableLoader,
BigtableSaver,
Encoding,
MetadataMapping,
)

__all__ = ["BigtableLoader", "BigtableSaver", "MetadataMapping", "Encoding"]
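
The `Encoding` and `MetadataMapping` names exported here are used in the notebook with a pair of custom functions for `Encoding.CUSTOM`. The following is a minimal sketch of what such a pair has to satisfy, namely that decoding undoes encoding. The JSON functions mirror the lambdas in the notebook. The integer helpers are an assumption: they use an 8-byte signed big-endian layout as a plausible reading of `Encoding.INT_BIG_ENDIAN`, which this excerpt does not confirm because `document_loader.py` is not shown.

```python
import json


def encode_json(value) -> bytes:
    """Serialize a JSON-compatible value to bytes (mirrors the notebook's encoding lambda)."""
    return str.encode(json.dumps(value))


def decode_json(raw: bytes):
    """Parse bytes back into a Python value (mirrors the notebook's decoding lambda)."""
    return json.loads(raw.decode())


def encode_int_big_endian(value: int) -> bytes:
    """ASSUMPTION: 8-byte signed big-endian; the library's actual width may differ."""
    return value.to_bytes(8, byteorder="big", signed=True)


def decode_int_big_endian(raw: bytes) -> int:
    """Inverse of encode_int_big_endian, under the same assumption."""
    return int.from_bytes(raw, byteorder="big", signed=True)


# Round-trip checks: a value written through the encoder should read back unchanged.
payload = {"tags": ["a", "b"], "score": 3}
assert decode_json(encode_json(payload)) == payload
assert decode_int_big_endian(encode_int_big_endian(-7)) == -7
```

A round-trip check like this is cheap to run before passing a function pair to `MetadataMapping`, since a mismatched pair would only show up later as a metadata value that differs from what was saved.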