diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb
new file mode 100644
index 000000000..46c495528
--- /dev/null
+++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb
@@ -0,0 +1,690 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Copyright 2023 Google LLC\n",
+ "#\n",
+ "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Use BigQuery DataFrames to cluster and characterize complaints\n",
+ "\n",
+ "
\n",
+ "\n",
+ " \n",
+ " \n",
+ " Run in Colab\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ " \n",
+ " View on GitHub\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ " \n",
+ " Open in Vertex AI Workbench\n",
+ " \n",
+ " | \n",
+ "
"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Overview\n",
+ "\n",
+ "The goal of this notebook is to demonstrate a comment characterization algorithm for an online business. We will accomplish this using [Google's PaLM 2](https://ai.google/discover/palm2/) and [KMeans clustering](https://en.wikipedia.org/wiki/K-means_clustering) in three steps:\n",
+ "\n",
+ "1. Use PaLM2TextEmbeddingGenerator to [generate text embeddings](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings) for each of 10000 complaints sent to an online bank. If you're not familiar with what a text embedding is, it's a list of numbers that are like coordinates in an imaginary \"meaning space\" for sentences. (It's like [word embeddings](https://en.wikipedia.org/wiki/Word_embedding), but for more general text.) The important point for our purposes is that similar sentences are close to each other in this imaginary space.\n",
+ "2. Use KMeans clustering to group together complaints whose text embeddings are near to eachother. This will give us sets of similar complaints, but we don't yet know _why_ these complaints are similar.\n",
+ "3. Prompt PaLM2TextGenerator in English asking what the difference is between the groups of complaints that we got. Thanks to the power of modern LLMs, the response might give us a very good idea of what these complaints are all about, but remember to [\"understand the limits of your dataset and model.\"](https://ai.google/responsibility/responsible-ai-practices/#:~:text=Understand%20the%20limitations%20of%20your%20dataset%20and%20model)\n",
+ "\n",
+ "We will tie these pieces together in Python using BigQuery DataFrames. [Click here](https://cloud.google.com/bigquery/docs/dataframes-quickstart) to learn more about BigQuery DataFrames!"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Dataset\n",
+ "\n",
+ "This notebook uses the [CFPB Consumer Complaint Database](https://console.cloud.google.com/marketplace/product/cfpb/complaint-database)."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Costs\n",
+ "\n",
+ "This tutorial uses billable components of Google Cloud:\n",
+ "\n",
+ "* BigQuery (compute)\n",
+ "* BigQuery ML\n",
+ "* Generative AI support on Vertex AI\n",
+ "\n",
+ "Learn about [BigQuery compute pricing](https://cloud.google.com/bigquery/pricing#analysis_pricing_models), [Generative AI support on Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing#generative_ai_models),\n",
+ "and [BigQuery ML pricing](https://cloud.google.com/bigquery/pricing#bqml),\n",
+ "and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)\n",
+ "to generate a cost estimate based on your projected usage."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Before you begin\n",
+ "\n",
+ "Complete the tasks in this section to set up your environment."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Set up your Google Cloud project\n",
+ "\n",
+ "**The following steps are required, regardless of your notebook environment.**\n",
+ "\n",
+ "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 credit towards your compute/storage costs.\n",
+ "\n",
+ "2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n",
+ "\n",
+ "3. [Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,run.googleapis.com,artifactregistry.googleapis.com,cloudbuild.googleapis.com,cloudresourcemanager.googleapis.com) to enable the following APIs:\n",
+ "\n",
+ " * BigQuery API\n",
+ " * BigQuery Connection API\n",
+ " * Cloud Run API\n",
+ " * Artifact Registry API\n",
+ " * Cloud Build API\n",
+ " * Cloud Resource Manager API\n",
+ " * Vertex AI API\n",
+ "\n",
+ "4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk)."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Set your project ID\n",
+ "\n",
+ "**If you don't know your project ID**, see the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# set your project ID below\n",
+ "PROJECT_ID = \"\" # @param {type:\"string\"}\n",
+ "\n",
+ "# Set the project id in gcloud\n",
+ "! gcloud config set project {PROJECT_ID}"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Set the region\n",
+ "\n",
+ "You can also change the `REGION` variable used by BigQuery. Learn more about [BigQuery regions](https://cloud.google.com/bigquery/docs/locations#supported_locations)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "REGION = \"US\" # @param {type: \"string\"}"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Authenticate your Google Cloud account\n",
+ "\n",
+ "Depending on your Jupyter environment, you might have to manually authenticate. Follow the relevant instructions below."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Vertex AI Workbench**\n",
+ "\n",
+ "Do nothing, you are already authenticated."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Local JupyterLab instance**\n",
+ "\n",
+ "Uncomment and run the following cell:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# ! gcloud auth login"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Colab**\n",
+ "\n",
+ "Uncomment and run the following cell:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# from google.colab import auth\n",
+ "# auth.authenticate_user()"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you want to reset the location of the created DataFrame or Series objects, reset the session by executing `bf.close_session()`. After that, you can reuse `bf.options.bigquery.location` to specify another location."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Connect to Vertex AI\n",
+ "\n",
+ "In order to use PaLM2TextGenerator, we will need to set up a [cloud resource connection](https://cloud.google.com/bigquery/docs/create-cloud-resource-connection)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from google.cloud import bigquery_connection_v1 as bq_connection\n",
+ "\n",
+ "CONN_NAME = \"bqdf-llm\"\n",
+ "\n",
+ "client = bq_connection.ConnectionServiceClient()\n",
+ "new_conn_parent = f\"projects/{PROJECT_ID}/locations/{REGION}\"\n",
+ "exists_conn_parent = f\"projects/{PROJECT_ID}/locations/{REGION}/connections/{CONN_NAME}\"\n",
+ "cloud_resource_properties = bq_connection.CloudResourceProperties({})\n",
+ "\n",
+ "try:\n",
+ " request = client.get_connection(\n",
+ " request=bq_connection.GetConnectionRequest(name=exists_conn_parent)\n",
+ " )\n",
+ " CONN_SERVICE_ACCOUNT = f\"serviceAccount:{request.cloud_resource.service_account_id}\"\n",
+ "except Exception:\n",
+ " connection = bq_connection.types.Connection(\n",
+ " {\"friendly_name\": CONN_NAME, \"cloud_resource\": cloud_resource_properties}\n",
+ " )\n",
+ " request = bq_connection.CreateConnectionRequest(\n",
+ " {\n",
+ " \"parent\": new_conn_parent,\n",
+ " \"connection_id\": CONN_NAME,\n",
+ " \"connection\": connection,\n",
+ " }\n",
+ " )\n",
+ " response = client.create_connection(request)\n",
+ " CONN_SERVICE_ACCOUNT = (\n",
+ " f\"serviceAccount:{response.cloud_resource.service_account_id}\"\n",
+ " )\n",
+ "print(CONN_SERVICE_ACCOUNT)"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Set permissions for the service account\n",
+ "\n",
+ "The resource connection service account requires certain project-level permissions:\n",
+ " - `roles/aiplatform.user` and `roles/bigquery.connectionUser`: These roles are required for the connection to create a model definition using the LLM model in Vertex AI ([documentation](https://cloud.google.com/bigquery/docs/generate-text#give_the_service_account_access)).\n",
+ " - `roles/run.invoker`: This role is required for the connection to have read-only access to Cloud Run services that back custom/remote functions ([documentation](https://cloud.google.com/bigquery/docs/remote-functions#grant_permission_on_function)).\n",
+ "\n",
+ "Set these permissions by running the following `gcloud` commands:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/bigquery.connectionUser'\n",
+ "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/aiplatform.user'\n",
+ "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now we are ready to use BigQuery DataFrames!"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xckgWno6ouHY"
+ },
+ "source": [
+ "## Step 1: Text embedding "
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Project Setup"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "R7STCS8xB5d2"
+ },
+ "outputs": [],
+ "source": [
+ "import bigframes.pandas as bf\n",
+ "\n",
+ "bf.options.bigquery.project = PROJECT_ID\n",
+ "bf.options.bigquery.location = REGION"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "v6FGschEowht"
+ },
+ "source": [
+ "Data Input - read the data from a publicly available BigQuery dataset"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "zDSwoBo1CU3G"
+ },
+ "outputs": [],
+ "source": [
+ "input_df = bf.read_gbq(\"bigquery-public-data.cfpb_complaints.complaint_database\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "tYDoaKgJChiq"
+ },
+ "outputs": [],
+ "source": [
+ "issues_df = input_df[[\"consumer_complaint_narrative\"]].dropna()\n",
+ "issues_df.head(n=5) # View the first five complaints"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Download 10000 complaints to use with PaLM2TextEmbeddingGenerator"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "OltYSUEcsSOW"
+ },
+ "outputs": [],
+ "source": [
+ "# Choose 10,000 complaints randomly and store them in a column in a DataFrame\n",
+ "downsampled_issues_df = issues_df.sample(n=10000)"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Wl2o-NYMoygb"
+ },
+ "source": [
+ "Generate the text embeddings"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "li38q8FzDDMu"
+ },
+ "outputs": [],
+ "source": [
+ "from bigframes.ml.llm import PaLM2TextEmbeddingGenerator\n",
+ "\n",
+ "model = PaLM2TextEmbeddingGenerator() # No connection id needed"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "cOuSOQ5FDewD"
+ },
+ "outputs": [],
+ "source": [
+ "# Will take ~3 minutes to compute the embeddings\n",
+ "predicted_embeddings = model.predict(downsampled_issues_df)\n",
+ "# Notice the lists of numbers that are our text embeddings for each complaint\n",
+ "predicted_embeddings.head() "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "4H_etYfsEOFP"
+ },
+ "outputs": [],
+ "source": [
+ "# Join the complaints with their embeddings in the same DataFrame\n",
+ "combined_df = downsampled_issues_df.join(predicted_embeddings)"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We now have the complaints and their text embeddings as two columns in our combined_df. Recall that complaints with numerically similar text embeddings should have similar meanings semantically. We will now group similar complaints together."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "OUZ3NNbzo1Tb"
+ },
+ "source": [
+ "## Step 2: KMeans clustering"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "AhNTnEC5FRz2"
+ },
+ "outputs": [],
+ "source": [
+ "from bigframes.ml.cluster import KMeans\n",
+ "\n",
+ "cluster_model = KMeans(n_clusters=10) # We will divide our complaints into 10 groups"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Perform KMeans clustering"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "6poSxh-fGJF7"
+ },
+ "outputs": [],
+ "source": [
+ "# Use KMeans clustering to calculate our groups. Will take ~3 minutes.\n",
+ "cluster_model.fit(combined_df[[\"text_embedding\"]])\n",
+ "clustered_result = cluster_model.predict(combined_df[[\"text_embedding\"]])\n",
+ "# Notice the CENTROID_ID column, which is the ID number of the group that\n",
+ "# each complaint belongs to.\n",
+ "clustered_result.head(n=5)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Join the group number to the complaints and their text embeddings\n",
+ "combined_clustered_result = combined_df.join(clustered_result)"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Our dataframe combined_clustered_result now has three columns: the complaints, their text embeddings, and an ID from 1-10 (inclusive) indicating which semantically similar group they belong to."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "21rNsFMHo8hO"
+ },
+ "source": [
+ "## Step 3: Summarize the complaints"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Build prompts - we will choose just two of our categories and prompt PaLM2TextGenerator to identify their salient characteristics. The prompt is natural language in a python string."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "2E7wXM_jGqo6"
+ },
+ "outputs": [],
+ "source": [
+ "# Using bigframes, with syntax identical to pandas,\n",
+ "# filter out the first and second groups\n",
+ "cluster_1_result = combined_clustered_result[\n",
+ " combined_clustered_result[\"CENTROID_ID\"] == 1\n",
+ "][[\"consumer_complaint_narrative\"]]\n",
+ "cluster_1_result_pandas = cluster_1_result.head(5).to_pandas()\n",
+ "\n",
+ "cluster_2_result = combined_clustered_result[\n",
+ " combined_clustered_result[\"CENTROID_ID\"] == 2\n",
+ "][[\"consumer_complaint_narrative\"]]\n",
+ "cluster_2_result_pandas = cluster_2_result.head(5).to_pandas()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ZNDiueI9IP5e"
+ },
+ "outputs": [],
+ "source": [
+ "# Build plain-text prompts to send to PaLM 2. Use only 5 complaints from each group.\n",
+ "prompt1 = 'comment list 1:\\n'\n",
+ "for i in range(5):\n",
+ " prompt1 += str(i + 1) + '. ' + \\\n",
+ " cluster_1_result_pandas[\"consumer_complaint_narrative\"].iloc[i] + '\\n'\n",
+ "\n",
+ "prompt2 = 'comment list 2:\\n'\n",
+ "for i in range(5):\n",
+ " prompt2 += str(i + 1) + '. ' + \\\n",
+ " cluster_2_result_pandas[\"consumer_complaint_narrative\"].iloc[i] + '\\n'\n",
+ "\n",
+ "print(prompt1)\n",
+ "print(prompt2)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "BfHGJLirzSvH"
+ },
+ "outputs": [],
+ "source": [
+ "# The plain English request we will make of PaLM 2\n",
+ "prompt = (\n",
+ " \"Please highlight the most obvious difference between\"\n",
+ " \"the two lists of comments:\\n\" + prompt1 + prompt2\n",
+ ")\n",
+ "print(prompt)"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Get a response from PaLM 2 LLM by making a call to Vertex AI using our connection."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "mL5P0_3X04dE"
+ },
+ "outputs": [],
+ "source": [
+ "from bigframes.ml.llm import PaLM2TextGenerator\n",
+ "\n",
+ "session = bf.get_global_session()\n",
+ "connection = f\"{PROJECT_ID}.{REGION}.{CONN_NAME}\"\n",
+ "q_a_model = PaLM2TextGenerator(session=session, connection_name=connection)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ICWHsqAW1FNk"
+ },
+ "outputs": [],
+ "source": [
+ "# Make a DataFrame containing only a single row with our prompt for PaLM 2\n",
+ "df = bf.DataFrame({\"prompt\": [prompt]})"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "gB7e1LXU1pst"
+ },
+ "outputs": [],
+ "source": [
+ "# Send the request for PaLM 2 to generate a response to our prompt\n",
+ "major_difference = q_a_model.predict(df)\n",
+ "# PaLM 2's response is the only row in the dataframe result \n",
+ "major_difference[\"ml_generate_text_llm_result\"].iloc[0]"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We now see PaLM2TextGenerator's characterization of the different comment groups. Thanks for using BigQuery DataFrames!"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.16"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/noxfile.py b/noxfile.py
index 34b055de4..3dd23ba04 100644
--- a/noxfile.py
+++ b/noxfile.py
@@ -609,6 +609,7 @@ def notebook(session):
# our test infrastructure.
"notebooks/getting_started/getting_started_bq_dataframes.ipynb",
"notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb",
+ "notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb",
"notebooks/regression/bq_dataframes_ml_linear_regression.ipynb",
"notebooks/generative_ai/bq_dataframes_ml_drug_name_generation.ipynb",
"notebooks/vertex_sdk/sdk2_bigframes_pytorch.ipynb",