# Explore How to Interact with AI on Logical/Physical Plans
Logical and physical plans are the source of truth for understanding how a query is executed and optimized. Exposing them to an AI model gives it the information it needs to evaluate query performance. This notebook explores how to take advantage of that.
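
As a minimal sketch of where these plans come from: PySpark's `DataFrame.explain(mode="extended")` prints the parsed, analyzed, and optimized logical plans followed by the physical plan, each under a `== ... ==` header. The helpers below capture that printed text and split it by header, so a single plan can be passed to a model on its own. `capture_explain` and `split_plan_sections` are hypothetical names introduced here for illustration; they are not part of PySpark or LangChain. The sketch also assumes that `explain` writes through Python's `print`, which appears to be the case in PySpark 3.x.

```python
import contextlib
import io
import re

# Matches section headers such as "== Optimized Logical Plan ==".
_SECTION_HEADER = re.compile(r"^== (.+?) ==[ \t]*$", re.MULTILINE)


def capture_explain(df, mode="extended"):
    """Return the text that df.explain(mode=...) prints to stdout.

    Assumption: explain() prints via Python's print(), so redirecting
    sys.stdout is enough to capture it.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain(mode=mode)
    return buf.getvalue()


def split_plan_sections(text):
    """Split explain output into a dict of {section title: section body}."""
    matches = list(_SECTION_HEADER.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section body runs until the next header or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    return sections
```

With the `titanic` table created below, `split_plan_sections(capture_explain(spark.table("titanic").groupBy("Pclass").count()))["Optimized Logical Plan"]` would return only the optimized logical plan text.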

In [1]:
import shutil
from pathlib import Path

# Remove any warehouse directory left over from a previous run.
db_address = Path("./spark-warehouse/")
if db_address.exists():
    shutil.rmtree(db_address)

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
schema = "langchain_example"
spark.sql(f"CREATE DATABASE IF NOT EXISTS {schema}")
spark.sql(f"USE {schema}")
csv_file_path = "../titanic.csv"
table = "titanic"
spark.read.csv(csv_file_path, header=True, inferSchema=True).write.saveAsTable(table, mode="overwrite")
spark.table(table).show()

23/05/14 11:41:08 WARN Utils: Your hostname, Mikes-Mac-mini.local resolves to a loopback address: 127.0.0.1; using 192.168.68.115 instead (on interface en1)
23/05/14 11:41:08 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/05/14 11:41:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/05/14 11:41:09 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.

+-----------+--------+------+--------------------+------+----+-----+-----+----------------+-------+-----+--------+
|PassengerId|Survived|Pclass|                Name|   Sex| Age|SibSp|Parch|          Ticket|   Fare|Cabin|Embarked|
+-----------+--------+------+--------------------+------+----+-----+-----+----------------+-------+-----+--------+
|          1|       0|     3|Braund, Mr. Owen ...|  male|22.0|    1|    0|       A/5 21171|   7.25| null|       S|
|          2|       1|     1|Cumings, Mrs. Joh...|female|38.0|    1|    0|        PC 17599|71.2833|  C85|       C|
|          3|       1|     3|Heikkinen, Miss. ...|female|26.0|    0|    0|STON/O2. 3101282|  7.925| null|       S|
|          4|       1|     1|Futrelle, Mrs. Ja...|female|35.0|    1|    0|          113803|   53.1| C123|       S|
|          5|       0|     3|Allen, Mr. Willia...|  male|35.0|    0|    0|          373450|   8.05| null|       S|
|          6|       0|     3|    Moran, Mr. James|  male|null|    0|    0|      

In [2]:
import os
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"  # placeholder: never commit a real key
# os.environ["PATH"]=os.environ["PATH"]+":"+"/GitHub/langchain/"

In [12]:
from langchain.agents import create_spark_sql_agent
from langchain.chat_models import ChatOpenAI
from langchain.spark_sql import SparkSQL

spark_sql = SparkSQL(schema=schema)
llm = ChatOpenAI(temperature=0, model_name="gpt-4")
agent_executor = create_spark_sql_agent(
    db=spark_sql,
    llm=llm,
    enable_memory=True,
    enable_freestyle=True,
    verbose=True,
)

import textwrap

def ask(q: str) -> None:
    """Run the agent on a question and print the answer wrapped to 80 columns."""
    ans = agent_executor.run(q)
    print("PRINTED ANSWER: \n", textwrap.fill(ans, width=80))



In [13]:
ask("how many classes are there?")



> Entering new AgentExecutor chain...
Action: list_tables_sql_db
Action Input:
Observation: titanic
Thought: I need to check the schema of the titanic table to see if there is a column related to classes.
Action: schema_sql_db
Action Input: titanic
Observation: CREATE TABLE langchain_example.titanic (
  PassengerId INT,
  Survived INT,
  Pclass INT,
  Name STRING,
  Sex STRING,
  Age DOUBLE,
  SibSp INT,
  Parch INT,
  Ticket STRING,
  Fare DOUBLE,
  Cabin STRING,
  Embarked STRING)
;

/*
3 rows from titanic table:
PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.25	None	S
2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Thayer)	female	38.0	1	0	PC 17599	71.2833	C85	C
3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.925	None	S
*/
Thought: The titanic table has a column named Pclass which seems

In [14]:
ask("how's it optimized in logic plan? Explain the optimized logical plan to me")



> Entering new AgentExecutor chain...
Action: list_tables_sql_db
Action Input: ""
Observation: titanic
Thought: There is only one table called "titanic". I need to check its schema to understand the columns and their data types.
Action: schema_sql_db
Action Input: "titanic"
Observation: CREATE TABLE langchain_example.titanic (
  PassengerId INT,
  Survived INT,
  Pclass INT,
  Name STRING,
  Sex STRING,
  Age DOUBLE,
  SibSp INT,
  Parch INT,
  Ticket STRING,
  Fare DOUBLE,
  Cabin STRING,
  Embarked STRING)
;

/*
3 rows from titanic table:
PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.25	None	S
2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Thayer)	female	38.0	1	0	PC 17599	71.2833	C85	C
3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.925	None	S
*/
Thought: I need to write a query to cou