# Gemini-Powered SQL Query Assistant

This project is an intelligent SQL query assistant powered by Google's **Gemini 1.5 Pro** LLM and LangChain. It connects to a MySQL database, generates SQL queries based on natural language input, and retrieves accurate results. The project also includes semantic similarity-based example selection for better query generation.

## Features

- **Gemini LLM Integration**: Leverages Google's Gemini 1.5 Pro for natural language processing.
- **Semantic Example Selector**: Enhances prompt quality using Chroma and HuggingFace embeddings.
- **Customizable Prompt Templates**: Dynamically selects few-shot examples based on user input.
- **Database Connectivity**: Connects to a MySQL database and generates SQL queries.
- **LangChain SQL Support**: Uses LangChain's `SQLDatabaseChain` for query execution.
- **Caching**: Includes a customizable caching mechanism for query optimization.




## Technologies Used

- **LangChain**: Framework for building LLM-powered applications.
- **Google Gemini 1.5 Pro**: LLM for query understanding and generation.
- **HuggingFace Embeddings**: Model `sentence-transformers/all-MiniLM-L6-v2` for semantic similarity.
- **Chroma**: Vector database for storing and retrieving example embeddings.
- **Pydantic**: Data validation and management for LangChain components.
- **MySQL**: Backend database for executing SQL queries.


In [5]:
few_shots = [
    {'Question' : "What is the total number of shares for Apple Inc.?", 
     'SQLQuery' : "SELECT total_shares FROM stocks WHERE company_name = 'Apple Inc.'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "1000000"},
    
    {'Question' : "What is the current price of Tesla Inc. stock?", 
     'SQLQuery' : "SELECT current_price FROM stocks WHERE company_name = 'Tesla Inc.'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "450.75"},
    
    {'Question' : "What is the total sales volume for Microsoft Corp. on January 1st, 2024?", 
     'SQLQuery' : "SELECT volume FROM price_history JOIN stocks ON price_history.stock_id = stocks.stock_id WHERE company_name = 'Microsoft Corp.' AND date = '2024-01-01'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "2000000"},
    
    {'Question' : "How many shares of Amazon.com Inc. were sold on January 2nd, 2024?", 
     'SQLQuery' : "SELECT SUM(quantity) FROM stock_transactions JOIN stocks ON stock_transactions.stock_id = stocks.stock_id WHERE company_name = 'Amazon.com Inc.' AND transaction_date = '2024-01-02' AND transaction_type = 'Sell'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "500"},
    
    {'Question' : "What was the opening price of the stock of NVIDIA Corp. on January 1st, 2024?", 
     'SQLQuery' : "SELECT open_price FROM price_history JOIN stocks ON price_history.stock_id = stocks.stock_id WHERE company_name = 'NVIDIA Corp.' AND date = '2024-01-01'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "120.00"},
    
    {'Question' : "What was the closing price of Oracle Corp. stock on January 1st, 2024?", 
     'SQLQuery' : "SELECT close_price FROM price_history JOIN stocks ON price_history.stock_id = stocks.stock_id WHERE company_name = 'Oracle Corp.' AND date = '2024-01-01'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "145.50"},
    
    {'Question' : "How much was the total amount spent on buying stocks of Meta Platforms Inc. on January 2nd, 2024?", 
     'SQLQuery' : "SELECT SUM(price_per_share * quantity) FROM stock_transactions JOIN stocks ON stock_transactions.stock_id = stocks.stock_id WHERE company_name = 'Meta Platforms Inc.' AND transaction_date = '2024-01-02' AND transaction_type = 'Buy'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "15600.00"},
    
    {'Question' : "What is the high price of Tesla Inc. stock on January 1st, 2024?", 
     'SQLQuery' : "SELECT high_price FROM price_history JOIN stocks ON price_history.stock_id = stocks.stock_id WHERE company_name = 'Tesla Inc.' AND date = '2024-01-01'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "465.00"},
    
    {'Question' : "How many shares of Meta Platforms Inc. are available?", 
     'SQLQuery' : "SELECT total_shares FROM stocks WHERE company_name = 'Meta Platforms Inc.'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "1500000"}
]

In [5]:
import sqlite3
import os
from dotenv import load_dotenv
load_dotenv()  # take environment variables from .env (especially openai api key)


__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

from langchain_google_genai import GoogleGenerativeAI
from langchain.utilities import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain
from langchain.prompts import SemanticSimilarityExampleSelector
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.prompts import FewShotPromptTemplate
from langchain.chains.sql_database.prompt import PROMPT_SUFFIX, _mysql_prompt
from langchain.prompts.prompt import PromptTemplate


def get_few_shot_db_chain():
    db_user = "root"
    db_password = "root"
    db_host = "localhost"
    db_name = "stock_market"


    few_shots = [
    {'Question' : "What is the total number of shares for Apple Inc.?", 
     'SQLQuery' : "SELECT total_shares FROM stocks WHERE company_name = 'Apple Inc.'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "1000000"},
    
    {'Question' : "What is the current price of Tesla Inc. stock?", 
     'SQLQuery' : "SELECT current_price FROM stocks WHERE company_name = 'Tesla Inc.'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "450.75"},
    
    {'Question' : "What is the total sales volume for Microsoft Corp. on January 1st, 2024?", 
     'SQLQuery' : "SELECT volume FROM price_history JOIN stocks ON price_history.stock_id = stocks.stock_id WHERE company_name = 'Microsoft Corp.' AND date = '2024-01-01'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "2000000"},
    
    {'Question' : "How many shares of Amazon.com Inc. were sold on January 2nd, 2024?", 
     'SQLQuery' : "SELECT SUM(quantity) FROM stock_transactions JOIN stocks ON stock_transactions.stock_id = stocks.stock_id WHERE company_name = 'Amazon.com Inc.' AND transaction_date = '2024-01-02' AND transaction_type = 'Sell'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "500"},
    
    {'Question' : "What was the opening price of the stock of NVIDIA Corp. on January 1st, 2024?", 
     'SQLQuery' : "SELECT open_price FROM price_history JOIN stocks ON price_history.stock_id = stocks.stock_id WHERE company_name = 'NVIDIA Corp.' AND date = '2024-01-01'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "120.00"},
    
    {'Question' : "What was the closing price of Oracle Corp. stock on January 1st, 2024?", 
     'SQLQuery' : "SELECT close_price FROM price_history JOIN stocks ON price_history.stock_id = stocks.stock_id WHERE company_name = 'Oracle Corp.' AND date = '2024-01-01'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "145.50"},
    
    {'Question' : "How much was the total amount spent on buying stocks of Meta Platforms Inc. on January 2nd, 2024?", 
     'SQLQuery' : "SELECT SUM(price_per_share * quantity) FROM stock_transactions JOIN stocks ON stock_transactions.stock_id = stocks.stock_id WHERE company_name = 'Meta Platforms Inc.' AND transaction_date = '2024-01-02' AND transaction_type = 'Buy'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "15600.00"},
    
    {'Question' : "What is the high price of Tesla Inc. stock on January 1st, 2024?", 
     'SQLQuery' : "SELECT high_price FROM price_history JOIN stocks ON price_history.stock_id = stocks.stock_id WHERE company_name = 'Tesla Inc.' AND date = '2024-01-01'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "465.00"},
    
    {'Question' : "How many shares of Meta Platforms Inc. are available?", 
     'SQLQuery' : "SELECT total_shares FROM stocks WHERE company_name = 'Meta Platforms Inc.'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "1500000"}
    ]

    # Connect to MySQL database
    db = SQLDatabase.from_uri(f"mysql+pymysql://{db_user}:{db_password}@{db_host}/{db_name}",
                              sample_rows_in_table_info=3)
    
    print(db.table_info)

    # Initialize Gemini LLM via Google Vertex AI
    llm = GoogleGenerativeAI(
        model="gemini-1.5-pro",  # Specify the Gemini model
        temperature=0.1,
        max_output_tokens=1024,  # Limit the output token length
        top_p=0.95,  # Optional: Adjust the probability sampling
    )

    # Set up embeddings
    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
    to_vectorize = [" ".join(example.values()) for example in few_shots]
    vectorstore = Chroma.from_texts(to_vectorize, embeddings, metadatas=few_shots)

    # Example selector for few-shot prompting
    example_selector = SemanticSimilarityExampleSelector(
        vectorstore=vectorstore,
        k=2,
    )

    # Prompt template for MySQL queries
    mysql_prompt = """You are a MySQL expert. Given an input question, first create a syntactically correct MySQL query to run, then look at the results of the query and return the answer to the input question.
    Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per MySQL. You can order the results to return the most informative data in the database.
    Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in backticks (`) to denote them as delimited identifiers.
    Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
    Pay attention to use CURDATE() function to get the current date, if the question involves "today".
    
    Use the following format:
    
    Question: Question here
    SQLQuery: Query to run with no pre-amble
    SQLResult: Result of the SQLQuery
    Answer: Final answer here
    
    No pre-amble.
    """

    # Few-shot example prompt
    example_prompt = PromptTemplate(
        input_variables=["Question", "SQLQuery", "SQLResult", "Answer"],
        template="\nQuestion: {Question}\nSQLQuery: {SQLQuery}\nSQLResult: {SQLResult}\nAnswer: {Answer}",
    )

    # Full few-shot prompt template
    few_shot_prompt = FewShotPromptTemplate(
        example_selector=example_selector,
        example_prompt=example_prompt,
        prefix=mysql_prompt,
        suffix=PROMPT_SUFFIX,
        input_variables=["input", "table_info", "top_k"],  # These variables are used in the prefix and suffix
    )

    # Create the SQLDatabaseChain
    chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, prompt=few_shot_prompt)
    return chain



# Call the function to get the chain
chain = get_few_shot_db_chain()

# Run the query
response = chain.run("How many shares of Meta Platforms Inc. are available?")
print(response)


CREATE TABLE price_history (
	price_id INTEGER NOT NULL AUTO_INCREMENT, 
	stock_id INTEGER NOT NULL, 
	date DATE NOT NULL, 
	open_price DECIMAL(10, 2), 
	close_price DECIMAL(10, 2), 
	high_price DECIMAL(10, 2), 
	low_price DECIMAL(10, 2), 
	volume BIGINT, 
	PRIMARY KEY (price_id), 
	CONSTRAINT price_history_ibfk_1 FOREIGN KEY(stock_id) REFERENCES stocks (stock_id), 
	CONSTRAINT price_history_chk_1 CHECK ((`open_price` > 0)), 
	CONSTRAINT price_history_chk_2 CHECK ((`close_price` > 0)), 
	CONSTRAINT price_history_chk_3 CHECK ((`high_price` > 0)), 
	CONSTRAINT price_history_chk_4 CHECK ((`low_price` > 0)), 
	CONSTRAINT price_history_chk_5 CHECK ((`volume` > 0))
)ENGINE=InnoDB COLLATE utf8mb4_0900_ai_ci DEFAULT CHARSET=utf8mb4

/*
3 rows from price_history table:
price_id	stock_id	date	open_price	close_price	high_price	low_price	volume
1	1	2024-01-01	100.00	105.00	110.00	95.00	1000000
2	2	2024-01-01	200.00	210.00	220.00	190.00	2000000
3	3	2024-01-01	300.00	295.00	310.00	280.00	1500000
*/

In [6]:
import sqlite3
import os
from dotenv import load_dotenv
load_dotenv()  # take environment variables from .env (especially openai api key)


__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

from langchain_google_genai import GoogleGenerativeAI
from langchain.utilities import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain
from langchain.prompts import SemanticSimilarityExampleSelector
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.prompts import FewShotPromptTemplate
from langchain.chains.sql_database.prompt import PROMPT_SUFFIX, _mysql_prompt
from langchain.prompts.prompt import PromptTemplate


def get_few_shot_db_chain():
    db_user = "root"
    db_password = "root"
    db_host = "localhost"
    db_name = "stock_market"


    few_shots = [
    {'Question' : "What is the total number of shares for Apple Inc.?", 
     'SQLQuery' : "SELECT total_shares FROM stocks WHERE company_name = 'Apple Inc.'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "1000000"},
    
    {'Question' : "What is the current price of Tesla Inc. stock?", 
     'SQLQuery' : "SELECT current_price FROM stocks WHERE company_name = 'Tesla Inc.'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "450.75"},
    
    {'Question' : "What is the total sales volume for Microsoft Corp. on January 1st, 2024?", 
     'SQLQuery' : "SELECT volume FROM price_history JOIN stocks ON price_history.stock_id = stocks.stock_id WHERE company_name = 'Microsoft Corp.' AND date = '2024-01-01'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "2000000"},
    
    {'Question' : "How many shares of Amazon.com Inc. were sold on January 2nd, 2024?", 
     'SQLQuery' : "SELECT SUM(quantity) FROM stock_transactions JOIN stocks ON stock_transactions.stock_id = stocks.stock_id WHERE company_name = 'Amazon.com Inc.' AND transaction_date = '2024-01-02' AND transaction_type = 'Sell'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "500"},
    
    {'Question' : "What was the opening price of the stock of NVIDIA Corp. on January 1st, 2024?", 
     'SQLQuery' : "SELECT open_price FROM price_history JOIN stocks ON price_history.stock_id = stocks.stock_id WHERE company_name = 'NVIDIA Corp.' AND date = '2024-01-01'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "120.00"},
    
    {'Question' : "What was the closing price of Oracle Corp. stock on January 1st, 2024?", 
     'SQLQuery' : "SELECT close_price FROM price_history JOIN stocks ON price_history.stock_id = stocks.stock_id WHERE company_name = 'Oracle Corp.' AND date = '2024-01-01'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "145.50"},
    
    {'Question' : "How much was the total amount spent on buying stocks of Meta Platforms Inc. on January 2nd, 2024?", 
     'SQLQuery' : "SELECT SUM(price_per_share * quantity) FROM stock_transactions JOIN stocks ON stock_transactions.stock_id = stocks.stock_id WHERE company_name = 'Meta Platforms Inc.' AND transaction_date = '2024-01-02' AND transaction_type = 'Buy'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "15600.00"},
    
    {'Question' : "What is the high price of Tesla Inc. stock on January 1st, 2024?", 
     'SQLQuery' : "SELECT high_price FROM price_history JOIN stocks ON price_history.stock_id = stocks.stock_id WHERE company_name = 'Tesla Inc.' AND date = '2024-01-01'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "465.00"},
    
    {'Question' : "How many shares of Meta Platforms Inc. are available?", 
     'SQLQuery' : "SELECT total_shares FROM stocks WHERE company_name = 'Meta Platforms Inc.'", 
     'SQLResult': "Result of the SQL query", 
     'Answer' : "1500000"}
    ]

    # Connect to MySQL database
    db = SQLDatabase.from_uri(f"mysql+pymysql://{db_user}:{db_password}@{db_host}/{db_name}",
                              sample_rows_in_table_info=3)
    
    

    # Initialize Gemini LLM via Google Vertex AI
    llm = GoogleGenerativeAI(
        model="gemini-1.5-pro",  # Specify the Gemini model
        temperature=0.1,
        max_output_tokens=1024,  # Limit the output token length
        top_p=0.95,  # Optional: Adjust the probability sampling
    )

    # Set up embeddings
    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
    to_vectorize = [" ".join(example.values()) for example in few_shots]
    vectorstore = Chroma.from_texts(to_vectorize, embeddings, metadatas=few_shots)

    # Example selector for few-shot prompting
    example_selector = SemanticSimilarityExampleSelector(
        vectorstore=vectorstore,
        k=2,
    )

    # Prompt template for MySQL queries
    mysql_prompt = """You are a MySQL expert. Given an input question, first create a syntactically correct MySQL query to run, then look at the results of the query and return the answer to the input question.
    Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per MySQL. You can order the results to return the most informative data in the database.
    Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in backticks (`) to denote them as delimited identifiers.
    Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
    Pay attention to use CURDATE() function to get the current date, if the question involves "today".
    
    Use the following format:
    
    Question: Question here
    SQLQuery: Query to run with no pre-amble
    SQLResult: Result of the SQLQuery
    Answer: Final answer here
    
    No pre-amble.
    """

    # Few-shot example prompt
    example_prompt = PromptTemplate(
        input_variables=["Question", "SQLQuery", "SQLResult", "Answer"],
        template="\nQuestion: {Question}\nSQLQuery: {SQLQuery}\nSQLResult: {SQLResult}\nAnswer: {Answer}",
    )

    # Full few-shot prompt template
    few_shot_prompt = FewShotPromptTemplate(
        example_selector=example_selector,
        example_prompt=example_prompt,
        prefix=mysql_prompt,
        suffix=PROMPT_SUFFIX,
        input_variables=["input", "table_info", "top_k"],  # These variables are used in the prefix and suffix
    )

    # Create the SQLDatabaseChain
    chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, prompt=few_shot_prompt)
    return chain



# Call the function to get the chain
chain = get_few_shot_db_chain()

# Run the query
response = chain.run("How many shares of Volkswagen AG. are available?")
print(response)



[1m> Entering new SQLDatabaseChain chain...[0m
How many shares of Volkswagen AG. are available?
SQLQuery:[32;1m[1;3mSELECT `total_shares` FROM `stocks` WHERE `company_name` = 'Volkswagen AG.'[0m
SQLResult: [33;1m[1;3m[0m
Answer:[32;1m[1;3mQuestion: How many shares of Volkswagen AG. are available?
SQLQuery:SELECT `total_shares` FROM `stocks` WHERE `company_name` = 'Volkswagen AG.'[0m
[1m> Finished chain.[0m
Question: How many shares of Volkswagen AG. are available?
SQLQuery:SELECT `total_shares` FROM `stocks` WHERE `company_name` = 'Volkswagen AG.'


In [7]:
# Call the function to get the chain
chain = get_few_shot_db_chain()

# Run the query
response = chain.run("How many shares of Microsoft Corp are available?")
print(response)



[1m> Entering new SQLDatabaseChain chain...[0m
How many shares of Microsoft Corp are available?
SQLQuery:[32;1m[1;3mSELECT `total_shares` FROM `stocks` WHERE `company_name` = 'Microsoft Corp.' LIMIT 5[0m
SQLResult: [33;1m[1;3m[(6585482,)][0m
Answer:[32;1m[1;3m6585482[0m
[1m> Finished chain.[0m
6585482


In [9]:
response = chain.run("How many shares of Volkswagen AG are available?")
print(response)



[1m> Entering new SQLDatabaseChain chain...[0m
How many shares of Volkswagen AG are available?
SQLQuery:[32;1m[1;3mSELECT `total_shares` FROM `stocks` WHERE `company_name` = 'Volkswagen AG'[0m
SQLResult: [33;1m[1;3m[(6152750,)][0m
Answer:[32;1m[1;3m6152750[0m
[1m> Finished chain.[0m
6152750


In [17]:
!pip show pydantic


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Name: pydantic
Version: 2.9.2
Summary: Data validation using Python type hints
Home-page: https://github.com/pydantic/pydantic
Author: 
Author-email: Samuel Colvin <s@muelcolvin.com>, Eric Jolibois <em.jolibois@gmail.com>, Hasan Ramezani <hasan.r67@gmail.com>, Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>, Terrence Dorsey <terry@pydantic.dev>, David Montague <david@pydantic.dev>, Serge Matveenko <lig@countzero.co>, Marcelo Trylesinski <marcelotryle@gmail.com>, Sydney Runkle <sydneymarierunkle@gmail.com>, David Hewitt <mail@davidhewitt.io>, Alex Hall <alex.mojaki@gmail.com>
License: 
Location: /home/codespace/.python/current/lib/python3.12/site-packages
Requires: annotated-types, pydantic-core, typing-extensions
Required-by: chromadb, fastapi, google-generativeai, langchain, langchain-core, langchain-google-genai, langsmith, pydantic-settings


In [23]:
!pip install --upgrade langchain




In [31]:
!pip install chromadb==0.4.0

Collecting chromadb==0.4.0
  Downloading chromadb-0.4.0-py3-none-any.whl.metadata (6.9 kB)
Collecting pydantic<2.0,>=1.9 (from chromadb==0.4.0)
  Downloading pydantic-1.10.19-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (152 kB)
Collecting chroma-hnswlib==0.7.1 (from chromadb==0.4.0)
  Downloading chroma-hnswlib-0.7.1.tar.gz (30 kB)
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
[?25hCollecting fastapi<0.100.0,>=0.95.2 (from chromadb==0.4.0)
  Downloading fastapi-0.99.1-py3-none-any.whl.metadata (23 kB)
Collecting starlette<0.28.0,>=0.27.0 (from fastapi<0.100.0,>=0.95.2->chromadb==0.4.0)
  Downloading starlette-0.27.0-py3-none-any.whl.metadata (5.8 kB)
Downloading chromadb-0.4.0-py3-none-any.whl (398 kB)
Downloading fastapi-0.99.1-py3-none-any.whl (58 kB)
Downloading pydantic-1.10.19-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.8

In [3]:
!pip install --upgrade pydantic

Collecting pydantic
  Using cached pydantic-2.10.3-py3-none-any.whl.metadata (172 kB)
Using cached pydantic-2.10.3-py3-none-any.whl (456 kB)
Installing collected packages: pydantic
  Attempting uninstall: pydantic
    Found existing installation: pydantic 1.9.1
    Uninstalling pydantic-1.9.1:
      Successfully uninstalled pydantic-1.9.1
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fastapi 0.99.1 requires pydantic!=1.8,!=1.8.1,<2.0.0,>=1.7.4, but you have pydantic 2.10.3 which is incompatible.
chromadb 0.4.0 requires pydantic<2.0,>=1.9, but you have pydantic 2.10.3 which is incompatible.[0m[31m
[0mSuccessfully installed pydantic-2.10.3


In [7]:
!pip install pipdeptree

Collecting pipdeptree
  Downloading pipdeptree-2.24.0-py3-none-any.whl.metadata (16 kB)
Downloading pipdeptree-2.24.0-py3-none-any.whl (32 kB)
Installing collected packages: pipdeptree
Successfully installed pipdeptree-2.24.0


In [1]:
!pip install -U ipykernel



In [9]:

!pip install langchain_google_genai

Collecting langchain_google_genai
  Using cached langchain_google_genai-2.0.7-py3-none-any.whl.metadata (3.6 kB)
Using cached langchain_google_genai-2.0.7-py3-none-any.whl (41 kB)
Installing collected packages: langchain_google_genai
Successfully installed langchain_google_genai-2.0.7


In [12]:
!pip install pydantic==2.9.2

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting pydantic==2.9.2
  Downloading pydantic-2.9.2-py3-none-any.whl.metadata (149 kB)
Collecting pydantic-core==2.23.4 (from pydantic==2.9.2)
  Downloading pydantic_core-2.23.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Downloading pydantic-2.9.2-py3-none-any.whl (434 kB)
Downloading pydantic_core-2.23.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.1/2.1 MB[0m [31m14.4 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pydantic-core, pydantic
  Attempting uninstall: pydantic-core
    Found existing installation: pydantic_core 2.27.1
    Uninstalling pydantic_core-2.27.1:
      Successfully uninstalled pydantic_core-2.27.1
  Attempting uninstall: pydantic
    Found existing installation: pydantic 1.10.19
    Uninstalling pydantic-1.10.19:
      Successfully uninstalled pydantic-1.10.19
[31mERROR: pip's dependency resolver does not currently tak

In [18]:
!pip install langchain_core

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)




In [3]:
!pip install sentence-transformers


Collecting tokenizers<0.22,>=0.21 (from transformers<5.0.0,>=4.41.0->sentence-transformers)
  Using cached tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Using cached tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
Installing collected packages: tokenizers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.20.3
    Uninstalling tokenizers-0.20.3:
      Successfully uninstalled tokenizers-0.20.3
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
chromadb 0.5.23 requires tokenizers<=0.20.3,>=0.13.2, but you have tokenizers 0.21.0 which is incompatible.[0m[31m
[0mSuccessfully installed tokenizers-0.21.0


In [10]:
!pip install streamlit

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting streamlit
  Downloading streamlit-1.41.1-py2.py3-none-any.whl.metadata (8.5 kB)
Collecting altair<6,>=4.0 (from streamlit)
  Downloading altair-5.5.0-py3-none-any.whl.metadata (11 kB)
Collecting blinker<2,>=1.0.0 (from streamlit)
  Downloading blinker-1.9.0-py3-none-any.whl.metadata (1.6 kB)
Collecting pyarrow>=7.0 (from streamlit)
  Downloading pyarrow-18.1.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting toml<2,>=0.10.1 (from streamlit)
  Downloading toml-0.10.2-py2.py3-none-any.whl.metadata (7.1 kB)
Collecting watchdog<7,>=2.1.5 (from streamlit)
  Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl.metadata (44 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Collecting narwhals>=1.14.2 (from altair<6,>=4.0->streamlit)
  Downloading narwhals-1.18.3-py3-none-any.whl.metadata (8.3 kB)
Downloading streamlit-1.41.1-py2.py3-none-any.whl (9.1 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━