# BigQueryVectorSearch
>[BigQueryVectorSearch](https://drive.google.com/file/d/1lIoTV_ytBWZRODairaAasyzf6ZXDaX9U/view?resourcekey=0-6C4l_L3aWAJXfgvHh8dfCg):
BigQuery vector search lets you use GoogleSQL to do semantic search, using vector indexes for fast but approximate results, or using brute force for exact results.


This tutorial illustrates how to work with an end-to-end data and embedding management system in LangChain and how to provide scalable semantic search in BigQuery.
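
To make the difference between exact and approximate search concrete, here is a minimal pure-Python sketch of brute-force search: the query is compared with every stored vector, so the result is exact but the cost grows with the table size. The function names and the toy vectors are hypothetical and are not part of BigQuery or LangChain; BigQuery performs the equivalent work in GoogleSQL.

```python
import math
from typing import List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Compare the query with every stored vector and return the k closest.

    Returns (row_index, distance) pairs sorted by ascending distance.
    """
    scored = [(i, euclidean_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical toy embeddings; real embeddings have many more dimensions.
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
nearest = brute_force_search([0.9, 1.2], toy_vectors, k=2)
print(nearest)
```

A vector index avoids scanning every row, which is why it is faster but may miss some of the true nearest neighbors.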

## Getting started


### Install the library

In [None]:
! pip install google-cloud-aiplatform langchain==0.0.316 google-cloud-bigquery pydantic==1.10.8 typing-inspect==0.8.0 typing_extensions==4.5.0 pandas openai==0.28.1 tiktoken datasets google-api-python-client pypdf faiss-cpu transformers config --upgrade --user

**Colab only:** Run the following cell to restart the kernel, or use the restart button. On Vertex AI Workbench, you can restart the kernel using the button at the top.

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

## Before you begin

### Set your project ID

If you don't know your project ID, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113).

In [None]:
PROJECT_ID = ""  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}
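
Because `PROJECT_ID` defaults to an empty string, it can help to check the value before running the remaining cells. The sketch below is an assumption-laden convenience, not part of the original notebook: `is_plausible_project_id` is a hypothetical helper, and the pattern encodes the documented rule for standard project IDs (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; does not end with a hyphen). Legacy domain-scoped IDs would not match.

```python
import re

# Standard Google Cloud project ID shape: 6-30 characters, lowercase letters,
# digits, and hyphens; must start with a letter and must not end with a hyphen.
_PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_plausible_project_id(project_id: str) -> bool:
    """Return True if the string has the shape of a standard project ID.

    This checks the format only; it does not confirm that the project exists.
    """
    return bool(_PROJECT_ID_PATTERN.fullmatch(project_id))


# Example values are hypothetical.
for candidate in ["", "my-project-123", "My-Project"]:
    print(repr(candidate), is_plausible_project_id(candidate))
```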

### Set the region

You can also change the `REGION` variable used by BigQuery. Learn more about [BigQuery regions](https://cloud.google.com/bigquery/docs/locations#supported_locations).

In [None]:
REGION = "US"  # @param {type: "string"}

### Authenticating your notebook environment

- If you are using **Colab** to run this notebook, run the cell below and continue.
- If you are using **Vertex AI Workbench**, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env).

In [None]:
from google.colab import auth as google_auth
google_auth.authenticate_user()
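
If the same notebook is run both in Colab and elsewhere, the authentication call can be guarded so that it only runs in Colab. This is a sketch of one common pattern, checking whether the `google.colab` module is already loaded; `running_in_colab` is a hypothetical helper name, and its `modules` parameter exists only to make the check easy to test.

```python
import sys


def running_in_colab(modules=None) -> bool:
    """Return True if the google.colab module is present in the given module table."""
    modules = sys.modules if modules is None else modules
    return "google.colab" in modules


if running_in_colab():
    # Only import the Colab helper when it is actually available.
    from google.colab import auth as google_auth

    google_auth.authenticate_user()
else:
    print("Not running in Colab; skipping Colab authentication.")
```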

## Demo: BigQueryVectorSearch

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
import os
import getpass

# We want to use OpenAIEmbeddings so we have to get the OpenAI API Key.
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [None]:
embeddings = OpenAIEmbeddings()

In [None]:
from langchain.vectorstores.bigquery_vector_search import BigQueryVectorSearch
from langchain.vectorstores.utils import DistanceStrategy

DEFAULT_DISTANCE_STRATEGY = DistanceStrategy.EUCLIDEAN_DISTANCE

bq_vector_search = BigQueryVectorSearch(
    project_id=PROJECT_ID,
    dataset_name="your_dataset",
    table_name="your_table",
    content_field="text_content_column",
    vector_field="embedding_column",
    embedding=embeddings,
    distance_strategy=DEFAULT_DISTANCE_STRATEGY,
    location=REGION,
)

In [None]:
publication_number_to_add = "WO-03025453-B1"
# add_texts expects a list of strings; a bare string would be split into characters.
bq_vector_search.add_texts([publication_number_to_add])
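
When several texts are added at once, LangChain's generic `add_texts` interface also accepts an optional `metadatas` list that runs parallel to the texts. The sketch below only prepares such a batch; `build_batch` and the placeholder strings are hypothetical, and the final call is left commented out because it needs the `bq_vector_search` store and a real BigQuery table.

```python
from typing import Dict, List, Tuple


def build_batch(records: Dict[str, str]) -> Tuple[List[str], List[dict]]:
    """Split {publication_number: text} into parallel text and metadata lists."""
    texts: List[str] = []
    metadatas: List[dict] = []
    for publication_number, text in records.items():
        texts.append(text)
        metadatas.append({"publication_number": publication_number})
    return texts, metadatas


# Placeholder strings for illustration; replace them with the real text to embed.
records = {
    "WO-03025453-B1": "placeholder text for the first publication",
    "PL-346047-A1": "placeholder text for the second publication",
}
texts, metadatas = build_batch(records)

# Assumes the store created earlier in this notebook:
# bq_vector_search.add_texts(texts, metadatas=metadatas)
```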

In [None]:
publication_number_as_query = "PL-346047-A1"
bq_vector_search.similarity_search_with_score(publication_number_as_query, k=2)
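
In LangChain's vector store interface, `similarity_search_with_score` returns a list of `(Document, score)` tuples, where each document exposes `page_content` and `metadata`. The sketch below shows one way to print such results. `FakeDocument` and `summarize_results` are hypothetical stand-ins so that the example runs without BigQuery, and the scores are invented values. How to interpret the score (for example, as a distance where smaller means closer) depends on the configured distance strategy and is an assumption to verify against real output.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence, Tuple


@dataclass
class FakeDocument:
    """Hypothetical stand-in with the same two attributes as a LangChain Document."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def summarize_results(
    results: Sequence[Tuple[Any, float]], max_chars: int = 40
) -> List[str]:
    """Format (document, score) pairs as one ranked line per result."""
    lines = []
    for rank, (doc, score) in enumerate(results, start=1):
        snippet = doc.page_content[:max_chars]
        lines.append(f"{rank}. score={score:.4f} text={snippet!r}")
    return lines


# Invented results for illustration only.
fake_results = [
    (FakeDocument("first matching passage"), 0.1234),
    (FakeDocument("second matching passage"), 0.5678),
]
for line in summarize_results(fake_results):
    print(line)
```

With the real store, the same helper could be called as `summarize_results(bq_vector_search.similarity_search_with_score(publication_number_as_query, k=2))`.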

In [None]:
bq_vector_search.similarity_search(publication_number_as_query, k=2)