# BigFrames + LangChain Integration

## Overview

This notebook walks through example use cases of the BigQuery DataFrames (BigFrames) LangChain integration, including sample code generation, to simplify the development of applications that use large language models (LLMs). BigFramesChain supports batch prediction over prompts stored in a BigFrames DataFrame.

Learn more about [BigQuery DataFrames](https://cloud.google.com/python/docs/reference/bigframes/latest).

## Installation

Install the following packages, which are required to run this notebook:


In [1]:
!pip install bigframes --upgrade --quiet
!pip install langchain --upgrade --quiet
!pip install openai --upgrade --quiet

## Before you begin

### Set up your Google Cloud project

**Set your project ID**

If you don't know your project ID, try the following:

- Run ``gcloud config list``.

- Run ``gcloud projects list``.

See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113).
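
If you prefer not to hard-code the project ID, one option is to read it from the environment. The sketch below is only an illustration: the helper name ``resolve_project_id`` is hypothetical, and whether ``GOOGLE_CLOUD_PROJECT`` or ``GCLOUD_PROJECT`` is set depends on how your environment was configured.

```python
import os


def resolve_project_id(default=None):
    """Return a project ID from common environment variables, else `default`.

    Hypothetical helper: it only inspects environment variables and does not
    call gcloud or any Google Cloud API.
    """
    for var in ("GOOGLE_CLOUD_PROJECT", "GCLOUD_PROJECT"):
        value = os.environ.get(var, "").strip()
        if value:
            return value
    return default


# Falls back to the given default when neither variable is set.
print(resolve_project_id(default="my-project-id"))
```

The returned value could then be assigned to ``PROJECT_ID`` in the next cell.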

In [2]:
PROJECT_ID = "bigframes-dev"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

Updated property [core/project].


**Set the region**

Optionally, change the ``REGION`` variable, which sets the location that BigQuery uses.

Learn more about [BigQuery regions](https://cloud.google.com/bigquery/docs/locations#supported_locations).

In [3]:
REGION = "US"  # @param {type: "string"}

### Authenticate your Google Cloud account

Uncomment and run the following cell:

In [4]:
# ! gcloud auth login

### Import libraries

In [5]:
import bigframes.pandas as bf
from typing import List, Dict

### Set BigQuery DataFrames options

In [6]:
bf.options.bigquery.project = PROJECT_ID
bf.options.bigquery.location = REGION

If you want to reset the location of the created DataFrame or Series objects, reset the session by executing ``bf.reset_session()``. After that, you can reuse ``bf.options.bigquery.location`` to specify another location.

## Define the LLM model

BigQuery DataFrames integrates with the ``text-bison`` [model of the PaLM API](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text) via Vertex AI.


This section walks through the steps required to use the model in your notebook.

### Create a BigQuery Cloud resource connection

You need to create a [Cloud resource connection](https://cloud.google.com/bigquery/docs/create-cloud-resource-connection) to enable BigQuery DataFrames to interact with Vertex AI services.

In [7]:
CONN_NAME = "bigframes-ml"

In [8]:
session = bf.get_global_session()
connection = f"{PROJECT_ID}.{REGION}.{CONN_NAME}"
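
The connection identifier built above has three dot-separated parts: project, location, and connection name. As a minimal sketch (the helper ``build_connection_name`` is hypothetical and not part of BigFrames), the same string can be assembled with a check that no part is empty, which catches an unset variable before any job is submitted:

```python
def build_connection_name(project_id: str, location: str, connection_id: str) -> str:
    """Join the three parts as "project.location.connection".

    Hypothetical helper for illustration: it only validates that each part is
    a non-empty string and does not verify that the connection exists.
    """
    parts = {
        "project_id": project_id,
        "location": location,
        "connection_id": connection_id,
    }
    cleaned = []
    for name, value in parts.items():
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")
        cleaned.append(value.strip())
    return ".".join(cleaned)


print(build_connection_name("bigframes-dev", "US", "bigframes-ml"))
# bigframes-dev.US.bigframes-ml
```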

### Using BigFramesLLM

**Case 1: Input is a String**

In [9]:
from bigframesllm import BigFramesLLM

llm = BigFramesLLM(session=session, 
                  connection=connection,
                  model="PaLM2TextGenerator",
                  max_new_tokens=128,
                  top_k=10,
                  top_p=0.95,
                  temperature=0.8,
)

# The output is a BigFrames DataFrame
llm("What is the capital of France ?")

[INFO][2023-10-11 21:40:35,289][bigframes.clients] Connector bigframes-dev.US.bigframes-ml already exists



Unnamed: 0,ml_generate_text_llm_result
0,The capital of France is Paris


**Case 2: Input is a BigFrames DataFrame**

In [10]:
# The input is a BigFrames DataFrame
df = bf.DataFrame(
        {
            "prompt": [
                "What is BigQuery?",
                "What is BQML?",
                "What is BigQuery DataFrame?",
            ],
        }
    )

# The output is a BigFrames DataFrame
llm(df)


Unnamed: 0,ml_generate_text_llm_result
0,"BigQuery is Google's fully managed, petabyte-..."
1,BigQuery ML (BQML) is a fully managed end-to-...
2,BigQuery DataFrame is a Python object that pr...


### Integrate the model in a BigFramesChain

**Case 1: Single Prompt**

- String as input
- BigFrames DataFrame as output

In [11]:
from langchain.prompts import PromptTemplate
from bigframesllm import BigFramesLLM
from bigframeschain import BigFramesChain

llm = BigFramesLLM(session=session, 
                  connection=connection,
                  model="PaLM2TextGenerator",
                  max_new_tokens=128,
                  top_k=10,
                  top_p=0.95,
                  temperature=0.8,
)

template = """Question: {question}
Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = BigFramesChain(prompt=prompt, llm=llm)

# answer is a BigFrames DataFrame
answer = llm_chain.run("What is BigFrames?")
answer

[INFO][2023-10-11 21:41:06,172][bigframes.clients] Connector bigframes-dev.US.bigframes-ml already exists



Unnamed: 0,ml_generate_text_llm_result
0,BigFrames is a Python package designed specif...


**Case 2: Batch Prompts**

- BigFrames DataFrame as input
- BigFrames DataFrame as output

In [12]:
from langchain.prompts import PromptTemplate
from bigframesllm import BigFramesLLM
from bigframeschain import BigFramesChain

llm = BigFramesLLM(session=session, 
                  connection=connection,
                  model="PaLM2TextGenerator",
                  max_new_tokens=128,
                  top_k=10,
                  top_p=0.95,
                  temperature=0.8,
)

template = """Generate Pandas sample code for DataFrame.{api_name}"""

prompt = PromptTemplate(template=template, input_variables=["api_name"])

[INFO][2023-10-11 21:41:22,659][bigframes.clients] Connector bigframes-dev.US.bigframes-ml already exists



**Read data from a GCS bucket**

In [13]:
# Read the input from GCS
df_api = bf.read_csv("gs://cloud-samples-data/vertex-ai/bigframe/df.csv")

df_api.head(4)


Unnamed: 0,API
0,values
1,dtypes
2,ndim
3,shape
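
Conceptually, a batch run fills the prompt template once for each value in the ``API`` column. The sketch below imitates that expansion with plain ``str.format`` on the four values shown above. It is an illustration only: it does not call ``BigFramesChain``, whose internals are not shown in this notebook, and it runs locally rather than in BigQuery.

```python
# Same template text as in the notebook cell above.
template = "Generate Pandas sample code for DataFrame.{api_name}"

# The first four rows of the API column, copied from the output above.
api_names = ["values", "dtypes", "ndim", "shape"]

# One rendered prompt per input row.
prompts = [template.format(api_name=name) for name in api_names]

for prompt_text in prompts:
    print(prompt_text)
# First line printed: Generate Pandas sample code for DataFrame.values
```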


In [14]:
llm_chain = BigFramesChain(prompt=prompt, llm=llm)

In [15]:
# answer is a BigFrames DataFrame
answer = llm_chain.run(df_api["API"])


In [16]:
answer


Unnamed: 0,ml_generate_text_llm_result
0,```python import pandas as pd # Create a Dat...
1,```python import pandas as pd # Create a Dat...
2,```python import pandas as pd # Create a Dat...
3,```python import pandas as pd # Create a Dat...
4,```python import pandas as pd # Create a Dat...
5,```python import pandas as pd # Create a Dat...
6,```python import pandas as pd # Create a Dat...
7,```python import pandas as pd # Create a Dat...
8,```python import pandas as pd # Create a Dat...
9,```python import pandas as pd # Create a Dat...


In [17]:
# Print the first generated result
print(answer["ml_generate_text_llm_result"][0])


 ```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Get the values as a NumPy array
values = df.values

# Print the values
print(values)

# Output:
# [[1 4]
#  [2 5]
#  [3 6]]
```
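
The generated text above arrives wrapped in a Markdown code fence, which gets in the way if you want to save or run the snippet. A minimal sketch for stripping the fence follows. The helper ``extract_code`` is hypothetical, and it assumes the response is a plain Python string containing at most one fenced block of interest.

```python
import re

# Build the fence marker programmatically so this example does not nest fences.
FENCE = "`" * 3

# Opening fence, optional language tag, newline, then the body up to the next fence.
_CODE_BLOCK = re.compile(
    re.escape(FENCE) + r"[\w+-]*[ \t]*\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def extract_code(response: str) -> str:
    """Return the body of the first fenced block, or the stripped text if none."""
    match = _CODE_BLOCK.search(response)
    if match is None:
        return response.strip()
    return match.group(1).strip()


sample = f" {FENCE}python\nimport pandas as pd\nvalues = [1, 2, 3]\n{FENCE}"
print(extract_code(sample))
```

Assuming the cell value is a string, this could be applied as ``extract_code(answer["ml_generate_text_llm_result"][0])``. Review any generated code before executing it.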


## Performance Comparison

### Compare with OpenAI

In [18]:
# Prepare a 300-row DataFrame by repeating df_api and sampling 300 rows
df_api_100 = bf.concat([df_api, df_api, df_api, df_api, df_api])
df_api_300 = bf.concat([df_api_100, df_api_100]).sample(300).reset_index()

In [19]:
df_api_300.shape


(300, 2)

In [20]:
llm_chain = BigFramesChain(prompt=prompt, llm=llm)

In [21]:
df_api.head(5)


Unnamed: 0,API
0,values
1,dtypes
2,ndim
3,shape
4,size


### Batch prompts using BigFramesLLM

In [22]:
# answer is a BigFrames DataFrame
answer = llm_chain.run(df_api_300["API"])


In [23]:
answer


Unnamed: 0,ml_generate_text_llm_result
0,```python import pandas as pd # Create a Dat...
1,```python import pandas as pd # Create a Dat...
2,```python import pandas as pd # Create a Dat...
3,```python import pandas as pd import numpy as...
4,```python import pandas as pd # Create a Dat...
5,```python import pandas as pd # Create two D...
6,```python import pandas as pd # Create a Dat...
7,```python import pandas as pd # Create a Dat...
8,```python import pandas as pd import numpy as...
9,```python import pandas as pd # Create two D...
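
To put a number on a batch run like the one above, a small wall-clock timer is usually enough. The sketch below is one possible approach, not a benchmark harness: ``timed`` is a hypothetical helper, and the example times a stand-in function rather than a real ``llm_chain.run`` call.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; replace with the call you want to measure.
result, seconds = timed(sum, range(1_000_000))
print(f"result={result}, elapsed={seconds:.3f}s")
```

For the batch case, the timed function could wrap ``llm_chain.run(df_api_300["API"])``. If evaluation turns out to be deferred, materialize the result inside the timed function (for example with ``to_pandas()``) so that the query time is included.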


## Agent

Install the required library:

In [None]:
!pip install numexpr --quiet

In [24]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType


llm = BigFramesLLM(session=session, 
                  connection=connection,
                  model="PaLM2TextGenerator",
                  max_new_tokens=128,
                  top_k=10,
                  top_p=0.95,
                  temperature=0.8,
)

tools = load_tools(["llm-math"], llm=llm)

[INFO][2023-10-11 21:42:41,670][bigframes.clients] Connector bigframes-dev.US.bigframes-ml already exists



In [27]:
agent_executor = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent_executor.run("My father is 20 years older than me. I am 30 years old. How old is my father?")

llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['input', 'agent_scratchpad'], template='Answer the following questions as best you can. You have access to the following tools:\n\nCalculator: Useful for when you need to answer questions about math.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [Calculator]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: {input}\nThought:{agent_scratchpad}'), llm=BigFramesLLM(session=<bigframes.session.Session object at 0x7fc8c6ee2f50>, connection='bigframes-dev.US.bigframes-ml', temperature=0.8, top_k=10, client=PaLM2TextGenerator(connection_name='bigframes-dev.US.bigframes-ml',
                  

KeyError: 'agent'