# Gemini Example in LangChain
This is an LLM example using Gemini from Google AI Studio.  
https://github.com/google/generative-ai-docs/blob/main/examples/gemini/python/langchain/Gemini_LangChain_QA_Chroma_WebLoad.ipynb

Google AI Studio  
https://makersuite.google.com/  

**Note**: Fill in `GOOGLE_API_KEY` in the `.env` file before you start.  
You can get the API key from https://makersuite.google.com/  

In [1]:
!pip install --quiet langchain-google-genai # This is Google AI Studio
# !pip install --quiet langchain-google-vertexai # This is Vertex AI (GCP)

In [2]:
# Load GOOGLE_API_KEY from the .env file
import os
from dotenv import load_dotenv

# Alternative: uncomment these lines to paste the API key at a prompt instead
# import getpass
# os.environ['GOOGLE_API_KEY'] = getpass.getpass('Gemini API Key:')
load_dotenv()

True
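
A `True` return from `load_dotenv()` does not by itself confirm that `GOOGLE_API_KEY` is defined, so an explicit check can surface a missing key earlier than the first model call. The sketch below is one way to do that; `require_env` is a hypothetical helper introduced here for illustration, not part of python-dotenv or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for this notebook; not a library function.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env before running the notebook")
    return value


# In the notebook this would follow load_dotenv():
# api_key = require_env("GOOGLE_API_KEY")
```

Failing at this point keeps the error message close to its cause instead of deep inside a later request.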

API:
https://api.python.langchain.com/en/latest/chat_models/langchain_google_genai.chat_models.ChatGoogleGenerativeAI.html

Set `convert_system_message_to_human=True` if the prompt contains a system message. The API reference states:
>SystemMessages are not yet supported!
>
>To automatically convert the leading SystemMessage to a HumanMessage,
>set  `convert_system_message_to_human` to True. Example:
>
>llm = ChatGoogleGenerativeAI(model="gemini-pro", convert_system_message_to_human=True)
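
To picture what converting a leading system message might involve, here is a rough pure-Python illustration that folds the system text into the first human message. It is only a sketch of the idea under that assumption; `fold_system_into_human` is a hypothetical function, and the library's actual conversion may differ in its details.

```python
def fold_system_into_human(messages):
    """Merge a leading ("system", text) message into the first human message.

    Messages are plain (role, text) tuples; this is an illustration only,
    not the implementation used by langchain-google-genai.
    """
    if not messages or messages[0][0] != "system":
        return list(messages)
    system_text = messages[0][1]
    rest = list(messages[1:])
    if rest and rest[0][0] == "human":
        rest[0] = ("human", f"{system_text}\n{rest[0][1]}")
        return rest
    # No human message follows, so the system text becomes a human message itself.
    return [("human", system_text)] + rest


converted = fold_system_into_human([
    ("system", "Answer briefly."),
    ("human", "What is LCEL?"),
])
print(converted)
```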

In [3]:
from langchain_google_genai import ChatGoogleGenerativeAI
from google.generativeai.types.safety_types import HarmBlockThreshold, HarmCategory

safety_settings = {
    # Gemini blocks everything
    # HarmCategory.HARM_CATEGORY_UNSPECIFIED: HarmBlockThreshold.BLOCK_NONE, 
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE, 
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE, 
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE, 
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
}

llm = ChatGoogleGenerativeAI(
    model="gemini-pro",
    temperature=0.7,
    top_p=0.85,
    safety_settings=safety_settings,
    convert_system_message_to_human=True,  # SystemMessages are not yet supported!
)

## Prompt (with model)

In [4]:
# llm.invoke("how can langsmith help with testing?") <-- without streaming

# streaming
async for chunk in llm.astream("how can langsmith help with testing?"):
    print(chunk.content, end="", flush=True)

**Langsmith** is a powerful AI-powered code testing platform that can significantly enhance the testing process for developers. Here's how Langsmith can help with testing:

**1. Automated Test Generation:**
* Generates comprehensive test cases based on code coverage, reducing manual effort and ensuring thorough testing.
* Supports various programming languages and frameworks, including Python, Java, JavaScript, and C++.

**2. Smart Test Execution:**
* Executes tests in parallel, optimizing testing time and resources.
* Provides detailed test results with logs, screenshots, and videos for easy debugging and analysis.

**3. Test Maintenance and Optimization:**
* Automates regression testing by detecting changes in code and updating tests accordingly.
* Identifies redundant or unnecessary tests, helping to streamline the testing process.

**4. Cross-Platform Testing:**
* Supports testing on multiple platforms, including web, mobile, and desktop, ensuring compatibility and seamless user ex

## Prompt (with LCEL)

Note that we pass a `dict` keyed by the prompt's input variables into `invoke`/`stream`/`astream`, not a `str`.
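
As a mental model for why the input is a `dict`, the toy sketch below mimics the `|` composition with a small class: the first step fills a template from named variables, and each later step receives the previous step's output. `Step` is a hypothetical stand-in written for this illustration; it is not how LangChain implements its runnables.

```python
class Step:
    """Toy stand-in for a chain step (hypothetical; not a LangChain class)."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.fn(value)


# The template step needs named variables, hence the dict input.
template = Step(lambda variables: "Question: {input}".format(**variables))
shout = Step(str.upper)

toy_chain = template | shout
result = toy_chain.invoke({"input": "how can langsmith help with testing?"})
print(result)
```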

In [5]:
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world-class technical documentation writer. Please keep the answer short."),
    ("user", "{input}")
])

In [6]:
chain = prompt | llm 
# print(chain)

# chain.invoke({"input": "how can langsmith help with testing?"}) <-- without streaming
async for chunk in chain.astream({"input": "how can langsmith help with testing?"}):
    print(chunk.content, end="", flush=True)

Langsmith can help with testing by:
- Generating test cases
- Automating test execution
- Analyzing test results

In [7]:
from langchain_core.output_parsers import StrOutputParser
chain = prompt | llm | StrOutputParser()
# After StrOutputParser, each chunk is a plain str rather than an AIMessageChunk

# chain.invoke({"input": "how can langsmith help with testing?"}) <-- without streaming
async for chunk in chain.astream({"input": "how can langsmith help with testing?"}):
    print(chunk, end="", flush=True)

Langsmith can help with testing by:

- Generating test cases
- Automating test execution
- Reporting test results
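
The difference between the last two cells is only in what each streamed chunk looks like: a message object with a `.content` attribute versus a plain string. A minimal sketch with stand-in objects, where `FakeChunk`, `fake_model_stream`, and `to_text` are hypothetical names used purely for illustration:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeChunk:
    """Stand-in for a streamed message chunk (hypothetical)."""
    content: str


def fake_model_stream() -> Iterator[FakeChunk]:
    # Pretend the model emits its answer in three pieces.
    for piece in ["Generating ", "test ", "cases"]:
        yield FakeChunk(content=piece)


def to_text(chunks: Iterable[FakeChunk]) -> Iterator[str]:
    # Plays the role of the output parser: unwrap each chunk to a str.
    for chunk in chunks:
        yield chunk.content


raw = "".join(chunk.content for chunk in fake_model_stream())  # caller unwraps
parsed = "".join(to_text(fake_model_stream()))                 # parser unwraps
```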

In [8]:
# Synchronous variant: use stream() instead of astream()
for chunk in chain.stream({"input": "how can langsmith help with testing?"}):
    print(chunk, end="", flush=True)

- Automating testing processes
- Generating test cases
- Creating test plans
- Executing tests
- Analyzing results
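
The cells that use `async for` at the top level rely on the notebook's running event loop. In a plain Python script the same loop has to live inside a coroutine and be started explicitly. The sketch below shows that pattern with a stand-in async generator; `fake_astream` and `collect` are hypothetical names, and no model is called.

```python
import asyncio


async def fake_astream(text: str):
    """Stand-in for an async token stream (hypothetical)."""
    for word in text.split():
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield word + " "


async def collect(text: str) -> str:
    pieces = []
    async for piece in fake_astream(text):
        pieces.append(piece)
    return "".join(pieces).strip()


# In a script, start the event loop explicitly.
# Inside a notebook, use `await collect(...)` instead of asyncio.run().
result = asyncio.run(collect("streamed one word at a time"))
```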

## Data Loader and Summary

In [9]:
from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://blog.google/technology/ai/google-gemini-ai/#sundar-note")
docs = loader.load()

print(docs[0].__dict__.keys())
print(docs[0].metadata)

dict_keys(['page_content', 'metadata', 'type'])
{'source': 'https://blog.google/technology/ai/google-gemini-ai/#sundar-note', 'title': 'Introducing Gemini: Google’s most capable AI model yet', 'description': 'Gemini is our most capable and general model, built to be multimodal and optimized for three different sizes: Ultra, Pro and Nano.', 'language': 'en-us'}


In [10]:
from langchain_core.prompts import PromptTemplate, format_document


# To query Gemini
llm_prompt = PromptTemplate.from_template(
    """Write a concise summary of the following:
    "{text}"
    CONCISE SUMMARY:""")


stuff_chain = (
    # Extract data from the documents and add to the key `text`.
    {
        "text": lambda docs: "\n\n".join(
                # format_document fills the template from each doc's page_content and metadata
                format_document(doc, PromptTemplate.from_template("{page_content}")) for doc in docs 
            )
    }
    | llm_prompt         # Prompt for Gemini
    | llm                # Gemini function
    | StrOutputParser()  # output parser
)

In [11]:
for chunk in stuff_chain.stream(docs):
    print(chunk, end="", flush=True)

Google introduces Gemini, its most capable AI model yet, with state-of-the-art performance across benchmarks, next-generation capabilities, and optimized efficiency. Gemini 1.0, available in Ultra, Pro, and Nano sizes, excels in multimodal understanding, sophisticated reasoning, advanced coding, and reliability. Built with responsibility and safety in mind, Gemini incorporates comprehensive evaluations, safety classifiers, and external expert collaborations. Google products like Bard, Pixel, and Search will integrate Gemini, while developers and enterprise customers can access Gemini Pro via APIs. Gemini Ultra will undergo further trust and safety checks before broader availability, with Bard Advanced offering access to cutting-edge capabilities. Gemini represents a milestone in AI development, unlocking a future of innovation and transformative applications across various industries and domains.
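
The summary chain follows a "stuff" pattern: all document text is joined and placed into a single prompt. The sketch below reproduces only that prompt-building step in plain Python, assuming a simplified template; `Doc` and `build_stuff_prompt` are hypothetical stand-ins, and no model is called.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a loaded document (hypothetical)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


SUMMARY_TEMPLATE = 'Write a concise summary of the following:\n"{text}"\nCONCISE SUMMARY:'


def build_stuff_prompt(docs):
    # Join every document's text with blank lines, then fill the template once.
    text = "\n\n".join(doc.page_content for doc in docs)
    return SUMMARY_TEMPLATE.format(text=text)


prompt_text = build_stuff_prompt([Doc("First page."), Doc("Second page.")])
print(prompt_text)
```

Because everything lands in one prompt, this approach only works while the combined text fits within the model's context window.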

## RAG

In [12]:
from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://blog.google/outreach-initiatives/sustainability/report-ai-sustainability-google-cop28/")
docs = loader.load()
print(docs[0].__dict__.keys())
print(docs[0].metadata)

dict_keys(['page_content', 'metadata', 'type'])
{'source': 'https://blog.google/outreach-initiatives/sustainability/report-ai-sustainability-google-cop28/', 'title': 'New report from Google: AI and climate action', 'description': 'AI has the potential to mitigate 5-10% of global greenhouse gas emissions according to our new report with Boston Consulting Group.', 'language': 'en-us'}


In [13]:
from langchain.docstore.document import Document

# Keep only the text between "Later this month at COP28" and "POSTED IN:"
text_content = docs[0].page_content.split("Later this month at COP28", 1)[1].split("POSTED IN:", 1)[0]
# Build the Document manually and set a custom source
all_text_splits = [Document(page_content=text_content, metadata={"source": "local"})]

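# Alternative: split the loaded page into overlapping chunks instead
# from langchain.text_splitter import RecursiveCharacterTextSplitter
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
# all_text_splits = text_splitter.split_documents(docs)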
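
To show what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified fixed-width splitter. It is an assumption-laden sketch: `split_with_overlap` is a hypothetical function that cuts purely by character position, whereas `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into windows of chunk_size that share chunk_overlap characters.

    Simplified illustration only; not the algorithm used by LangChain's splitters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each chunk repeats the last character of the previous one, which is the role the overlap plays at larger sizes: context at a boundary is not lost to either neighbour.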

In [14]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
gemini_embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

In [15]:
from langchain.vectorstores import FAISS
vectorstore = FAISS.from_documents(all_text_splits, gemini_embeddings)

In [16]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})  # return only the top match (default k=4)
search = retriever.get_relevant_documents("google AI")
# Equivalent to vectorstore.similarity_search("google AI", k=1)
print(search)
print(search)

[Document(page_content=", global leaders will gather in Dubai to build momentum for climate action. The United Nations’ Intergovernmental Panel on Climate Change (IPCC) forecasts that the world needs to reduce emissions by 43% by 2030. We believe that artificial intelligence (AI) and collective action can help achieve this goal and create a sustainable future for everyone.Today, we released a report with Boston Consulting Group (BCG), which shows that AI has the potential to help mitigate 5-10% of global greenhouse gas (GHG) emissions by 2030 — the equivalent of the total annual emissions of the European Union. Here’s a look at how we’re building AI that can drive climate progress, while at the same time working to mitigate AI’s environmental impact.\n\n\n\n\nAccelerating climate action with AIAI can have a transformative effect on climate progress. Already, it is starting to address climate challenges in three key areas: providing people and organizations with better information to ma
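
Conceptually, the retriever embeds the query and returns the `k` stored items whose vectors are closest to it. The toy sketch below ranks hand-written 2-D vectors by cosine similarity. The vectors, names, and the choice of cosine similarity are assumptions for illustration; the actual index may use a different distance metric, and real embeddings have far more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=4):
    """Return the names of the k entries most similar to the query vector."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up vectors standing in for document embeddings.
toy_index = [
    ("climate", [0.9, 0.1]),
    ("sports", [0.1, 0.9]),
    ("weather", [0.7, 0.7]),
]
best = top_k([1.0, 0.0], toy_index, k=1)
print(best)
```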

In [17]:
from langchain.schema.runnable import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

llm_prompt = PromptTemplate.from_template(
"""You are an assistant for question-answering tasks.
Use the following context to answer the question.
If you don't know the answer, just say that you don't know.
Use five sentences maximum and keep the answer concise.

Question: {question} 
Context: {context} 
Answer:""")


rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | llm_prompt
    | llm
    | StrOutputParser()
)

In [18]:
# rag_chain.invoke("How can AI address climate challenges?")
for chunk in rag_chain.stream("How can AI address climate challenges?"):
    print(chunk, end="", flush=True)

AI can address climate challenges by providing better information for sustainable choices, delivering improved predictions for adaptation, finding recommendations for high-impact climate action optimization, and managing the environmental impact associated with AI.

**Note**: If you want to pass a `dict`, make sure the first step of the chain can accept it, for example by extracting the needed key with `itemgetter`.
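
The dict at the head of `rag_chain` sends the same input to every value and collects the results under the matching keys. The plain-Python sketch below imitates that fan-out for both a string input and a dict input; `fake_retrieve` and `run_mapping` are hypothetical stand-ins, and nothing here calls LangChain.

```python
from operator import itemgetter


def fake_retrieve(question: str) -> str:
    """Stand-in for the retriever plus formatting step (hypothetical)."""
    return f"<context for: {question}>"


def run_mapping(mapping, chain_input):
    # Call every value with the same input, keeping the keys.
    return {key: fn(chain_input) for key, fn in mapping.items()}


# String input: the question can be passed through unchanged.
str_mapping = {"context": fake_retrieve, "question": lambda x: x}
from_str = run_mapping(str_mapping, "How can AI help?")

# Dict input: each branch must first pick out the key it needs.
get_q = itemgetter("q")
dict_mapping = {"context": lambda x: fake_retrieve(get_q(x)), "question": get_q}
from_dict = run_mapping(dict_mapping, {"q": "How can AI help?"})
```

Both calls produce the same `context` and `question` values; only the extraction step differs.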

In [19]:
%%time
from operator import itemgetter
rag_chain2 = (
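    {
        # The retriever output is passed through unformatted here;
        # append `| format_docs` to join the page text as in rag_chain.
        "context": itemgetter("q") | retriever,
        "question": itemgetter("q"),
    }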
    | llm_prompt
    | llm
    | StrOutputParser()
)

for chunk in rag_chain2.stream({"q":"How can AI address climate challenges?"}):
    print(chunk, end="", flush=True)
print("\n")

AI can address climate challenges by providing better information for sustainable choices, delivering improved predictions for climate change adaptation, and finding recommendations to optimize climate action. It can also help reduce greenhouse gas emissions by optimizing routing, predicting floods, and minimizing contrails. Additionally, AI can help make AI computing more efficient and reduce the carbon footprint of AI workloads.

CPU times: user 30.7 ms, sys: 412 µs, total: 31.1 ms
Wall time: 4.62 s
