# Meetup Goiania - Question Answering using Gemini, LangChain & Elasticsearch

This tutorial demonstrates how to use the [Gemini API](https://ai.google.dev/docs) to create [embeddings](https://ai.google.dev/docs/embeddings_guide) and store them in Elasticsearch. We will learn how to connect Gemini to private data stored in Elasticsearch and build question-answering capabilities over it using [LangChain](https://python.langchain.com/docs/get_started/introduction).

## Setup

* Elastic Credentials - Create an [Elastic Cloud deployment](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud) to get all Elastic credentials (`ELASTIC_CLOUD_ID`, `ELASTIC_API_KEY`).

* `GOOGLE_API_KEY` - To use the Gemini API, you need to [create an API key in Google AI Studio](https://ai.google.dev/tutorials/setup).
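
Before running the notebook, it can help to fail fast when one of these credentials is missing. The sketch below assumes the credentials are exposed as environment variables named after the bullets above (`ELASTIC_CLOUD_ID`, `ELASTIC_API_KEY`, `GOOGLE_API_KEY`); the helper name `require_env` is hypothetical and not part of any library used here.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def require_env(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the requested variables, or raise an error listing every missing one."""
    source = os.environ if env is None else env
    names = list(names)
    # Treat unset and empty values alike: both mean the credential is missing
    missing = [name for name in names if not source.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: source[name] for name in names}


# Hypothetical usage with the credential names listed above:
# creds = require_env(["ELASTIC_CLOUD_ID", "ELASTIC_API_KEY", "GOOGLE_API_KEY"])
```

Reporting all missing names at once avoids fixing one credential, rerunning, and hitting the next error.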

## Install packages

In [None]:
!pip install -q -U google-generativeai langchain-elasticsearch langchain langchain_google_genai

## Import packages and credentials

In [None]:
import json
import os
from getpass import getpass
from urllib.request import urlopen

from langchain_elasticsearch import ElasticsearchStore
from langchain.text_splitter import CharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

## Get Credentials

In [None]:
!pip install python-dotenv



In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import os
from dotenv import load_dotenv

# Replace 'path/to/your/.env' with the correct path to your .env file on Google Drive
env_path = 'path/to/your/.env'
load_dotenv(env_path)

# Google API
google_api_key = os.getenv('google_api_key')


# Elastic cloud credentials
es_cloud_id = os.getenv('cloud_id')
es_user = os.getenv('cloud_user')
es_pass = os.getenv('cloud_pass')

# Placeholder: only needed if you authenticate with an API key instead of user/password
ELASTIC_API_KEY = "your_elastic_api_key"
ELASTIC_CLOUD_ID = es_cloud_id


In [None]:
#os.environ["GOOGLE_API_KEY"] = getpass("Google API Key :")
#ELASTIC_API_KEY = getpass("Elastic API Key :")
#ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID :")
elastic_index_name = "gemini-gyn-qa-json"
os.environ["GOOGLE_API_KEY"] = google_api_key
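
The commented-out `getpass` lines in this cell are the interactive alternative to the `.env` file. One way to combine the two is sketched below, under the assumption that a variable already set in the environment should take precedence over prompting. The helper name `get_secret` is hypothetical.

```python
import os
from getpass import getpass


def get_secret(name: str, prompt: str) -> str:
    """Return an environment variable, prompting (without echo) only if it is unset."""
    value = os.environ.get(name)
    if not value:
        # getpass does not echo the typed value, so the key is not shown in the cell
        value = getpass(prompt)
        # Cache it so later calls (and libraries reading the environment) see it
        os.environ[name] = value
    return value


# Hypothetical usage mirroring the commented-out lines above:
# google_api_key = get_secret("GOOGLE_API_KEY", "Google API Key :")
```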

## Add documents

Read the PDF

In [None]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-4.1.0-py3-none-any.whl (286 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 286.1/286.1 kB 2.4 MB/s eta 0:00:00
Installing collected packages: pypdf
Successfully installed pypdf-4.1.0


In [None]:
from langchain.document_loaders import PyPDFLoader

filename = "/content/drive/MyDrive/@MyPresentations/MeetupGoiania2024/lava_pt.pdf"
loader = PyPDFLoader(filename)
pages = loader.load()

In [None]:
len(pages)

60

In [None]:
page = pages[0]
page.metadata
page

Document(page_content='Lavadora de roupas\nManual do usuário\nWD10M4***** / WD85M4*****\nWD10M44530W(127V)_03786S-00_BPT.indd   1 2017/5/24   9:43:37', metadata={'source': '/content/drive/MyDrive/@MyPresentations/MeetupGoiania2024/lava_pt.pdf', 'page': 0})

In [None]:
print(page.page_content)

Lavadora de roupas
Manual do usuário
WD10M4***** / WD85M4*****
WD10M44530W(127V)_03786S-00_BPT.indd   1 2017/5/24   9:43:37


### Load the documents to index (sample JSON dataset, replaced here by the PDF pages)

In [None]:
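# Original sample dataset from the Elastic tutorial (a list of JSON records)
url = "https://raw.githubusercontent.com/ashishtiwari1993/langchain-elasticsearch-RAG/main/data.json"

response = urlopen(url)

workplace_docs = json.loads(response.read())
# This notebook indexes the washing-machine manual instead: overwrite the JSON
# records with the PDF pages loaded above (LangChain Document objects)
workplace_docs = pages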
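
Each record in the sample `data.json` is a dictionary, and the original splitting code reads its `content`, `name`, `summary` and `rolePermissions` keys. The sketch below shows that mapping on a made-up record (the values are hypothetical placeholders, not taken from the dataset). This is the shape you would work with if you kept the JSON sample instead of overwriting `workplace_docs` with the PDF pages.

```python
import json

# Hypothetical record with the same keys the notebook reads from data.json
raw = """[{"name": "Example document", "summary": "Short summary",
           "rolePermissions": ["demo"], "content": "Full text of the document."}]"""

records = json.loads(raw)

# Text to embed, one entry per record
content = [record["content"] for record in records]
# Metadata stored alongside each chunk in Elasticsearch
metadata = [
    {
        "name": record["name"],
        "summary": record["summary"],
        "rolePermissions": record["rolePermissions"],
    }
    for record in records
]
```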


In [None]:
workplace_docs[0]

Document(page_content='Lavadora de roupas\nManual do usuário\nWD10M4***** / WD85M4*****\nWD10M44530W(127V)_03786S-00_BPT.indd   1 2017/5/24   9:43:37', metadata={'source': '/content/drive/MyDrive/@MyPresentations/MeetupGoiania2024/lava_pt.pdf', 'page': 0})

### Split Documents into Passages

In [None]:
metadata = []
content = []

for doc in workplace_docs:
    # workplace_docs holds LangChain Document objects (the PDF pages), so read
    # their attributes; the JSON sample would instead use doc["content"] and the
    # "name", "summary" and "rolePermissions" keys
    content.append(doc.page_content)
    metadata.append(doc.metadata)

text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
docs = text_splitter.create_documents(content, metadatas=metadata)
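
With `CharacterTextSplitter`, text is first cut on a separator (by default, as far as we know, `"\n\n"`). The pieces are then merged back into chunks no longer than `chunk_size` characters. The function below is a simplified, overlap-free sketch of that idea, not LangChain's actual implementation. The function name `split_by_separator` is hypothetical.

```python
from typing import List


def split_by_separator(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size chars.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces (e.g. from repeated separators)
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep growing the current chunk
        else:
            if current:
                chunks.append(current)
            current = piece  # start a new chunk with the piece that did not fit
    if current:
        chunks.append(current)
    return chunks
```

One consequence is worth checking on the PDF: a page that contains no blank lines is a single piece, so it would stay as one chunk longer than 200 characters.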




In [None]:
docs

## Index Documents into Elasticsearch using Gemini Embeddings

In [None]:
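# Embeddings for the stored passages use task_type="retrieval_document"
doc_embedding = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_document"
)

es = ElasticsearchStore.from_documents(
    docs,
    es_cloud_id=ELASTIC_CLOUD_ID,
    # Same user/password authentication as the retriever below; pass
    # es_api_key=ELASTIC_API_KEY instead if you use an API key
    es_user=es_user,
    es_password=es_pass,
    index_name=elastic_index_name,
    embedding=doc_embedding,
)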

## Create a retriever using Elasticsearch

In [None]:
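# Queries use task_type="retrieval_query", the counterpart of the
# "retrieval_document" embeddings used at indexing time
query_embedding = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_query"
)

# Alternative: authenticate with an API key instead of user/password
# es = ElasticsearchStore(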
#     es_cloud_id=es_cloud_id,
#     es_api_key=ELASTIC_API_KEY,
#     embedding=query_embedding,
#     index_name=elastic_index_name,
# )

es = ElasticsearchStore(
    es_cloud_id=es_cloud_id,
    index_name=elastic_index_name,
    embedding=query_embedding,
    es_user=es_user,
    es_password=es_pass
)

retriever = es.as_retriever(search_kwargs={"k": 3})
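
`as_retriever(search_kwargs={"k": 3})` asks the store for the three passages whose embeddings are closest to the query embedding. As we understand it, Elasticsearch does this with approximate kNN search over a vector field. The exact, brute-force version below only illustrates the ranking idea. It assumes cosine similarity as the metric and uses toy two-dimensional vectors instead of real Gemini embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) pairs of the k vectors most similar to the query, best first."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Brute force compares the query with every stored vector. Approximate kNN trades a little recall for much faster search on large indices.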

## Format Docs

In [None]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
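
`format_docs` concatenates the retrieved passages into one context string. The PDF pages carry a `page` entry in their metadata (see the `Document` output above), so a variant can label each passage with its page. This makes answers easier to check against the manual. The sketch uses a hypothetical `FakeDocument` stand-in so it runs without LangChain; with real retrieved documents you would pass them in directly. The function name `format_docs_with_pages` is also hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document: only the two attributes used here."""
    page_content: str
    metadata: Dict = field(default_factory=dict)


def format_docs_with_pages(docs: List[FakeDocument]) -> str:
    """Like format_docs, but prefix each passage with its PDF page, if known."""
    parts = []
    for doc in docs:
        page = doc.metadata.get("page")
        # PyPDFLoader pages are 0-based, so add 1 for a human-readable page number
        label = f"[page {page + 1}]" if isinstance(page, int) else "[page ?]"
        parts.append(f"{label}\n{doc.page_content}")
    return "\n\n".join(parts)
```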

## Create a Chain using Prompt Template + `gemini-pro` model

In [None]:
template = """Answer the question based only on the following context:\n

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.7)
    | StrOutputParser()
)

chain.invoke("what is our sales goals?")

'Increase revenue by 20% compared to fiscal year 2023.\nExpand market share in key segments by 15%.\nRetain 95% of existing customers and increase customer satisfaction ratings.\nLaunch at least two new products or services in high-demand market segments.'
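
The chain above is a pipeline of runnables. The input question is fed both to the retriever, whose documents `format_docs` turns into `{context}`, and, unchanged via `RunnablePassthrough`, to `{question}`. The filled prompt goes to `gemini-pro`, and `StrOutputParser` returns plain text. The same flow can be written in plain Python with hypothetical `retrieve` and `generate` callables standing in for Elasticsearch and Gemini, which is handy for testing the prompt without network calls. The template here is slightly simplified from the one above.

```python
from typing import Callable, List

TEMPLATE = """Answer the question based only on the following context:

{context}

Question: {question}
"""


def run_rag(question: str,
            retrieve: Callable[[str], List[str]],
            generate: Callable[[str], str]) -> str:
    """Plain-Python outline of the chain: retrieve -> format -> fill prompt -> generate."""
    passages = retrieve(question)                                  # retriever
    context = "\n\n".join(passages)                                # format_docs
    prompt = TEMPLATE.format(context=context, question=question)   # prompt template
    return generate(prompt)                                        # model + StrOutputParser
```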

In [None]:
# Increase revenue, expand market share,
# and strengthen customer relationships
# in our target markets.

# Tests

In [None]:
# PDF

In [None]:
from langchain.document_loaders import PyPDFLoader

# Load PDF
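loaders = [
    # Reuse the Google Drive path defined earlier; a bare "lava_pt.pdf" only
    # works if the file sits in the current working directory
    PyPDFLoader(filename)
]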
docs = []
for loader in loaders:
    docs.extend(loader.load())

In [None]:
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=150
)
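
`chunk_overlap=150` makes each chunk repeat the last 150 characters of the previous one, so text near a chunk boundary appears in both chunks. The sketch below uses plain fixed-size character windows, a deliberate simplification. As we understand it, `RecursiveCharacterTextSplitter` prefers to cut at paragraph, line and word boundaries (default separators `"\n\n"`, `"\n"`, `" "`, `""`). Its chunk count (83 here) will therefore not match this formula exactly. The function names are hypothetical.

```python
import math
from typing import List


def sliding_windows(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into chunk_size windows, each sharing chunk_overlap chars with the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


def window_count(length: int, chunk_size: int, chunk_overlap: int) -> int:
    """Number of windows sliding_windows produces for a text of the given length."""
    if length <= chunk_size:
        return 1
    return math.ceil((length - chunk_overlap) / (chunk_size - chunk_overlap))
```

For example, a 3,000-character text with the settings above gives `window_count(3000, 1500, 150) == 3`: windows start at 0, 1350 and 2700.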

In [None]:
splits = text_splitter.split_documents(docs)

In [None]:
len(splits)

83

In [None]:
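# Embeddings for the stored passages use task_type="retrieval_document"
doc_embedding = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_document"
)

# Note: elastic_index_name is the same index used above, so any chunks
# indexed earlier stay in it alongside these splits
es = ElasticsearchStore.from_documents(
    splits,
    es_cloud_id=ELASTIC_CLOUD_ID,
    # User/password authentication, as in the retriever below; pass
    # es_api_key=ELASTIC_API_KEY instead if you use an API key
    es_user=es_user,
    es_password=es_pass,
    index_name=elastic_index_name,
    embedding=doc_embedding,
)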

In [None]:
query_embedding = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_query"
)

# es = ElasticsearchStore(
#     es_cloud_id=es_cloud_id,
#     es_api_key=ELASTIC_API_KEY,
#     embedding=query_embedding,
#     index_name=elastic_index_name,
# )

es = ElasticsearchStore(
    es_cloud_id=es_cloud_id,
    index_name=elastic_index_name,
    embedding=query_embedding,
    es_user=es_user,
    es_password=es_pass
)

retriever = es.as_retriever(search_kwargs={"k": 3})

In [None]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [None]:
template = """Answer the question in Portuguese based only on the following context:\n

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.7)
    | StrOutputParser()
)

# "How does the machine work?"
chain.invoke("como funciona a maquina?")

' Para usar a máquina, gire o seletor de ciclos para selecionar um ciclo. O visor mostrará informações sobre o ciclo atual e o tempo estimado restante, ou ainda, um código de informação quando ocorrer algum problema. Você pode pressionar a tecla Temp. para mudar a temperatura da água do ciclo atual. Também é possível pressionar a tecla Centrifugar para mudar a velocidade de centrifugação do ciclo atual.\n\nPara selecionar a opção de secagem adequada, pressione a tecla Nível de Secagem. Todas as opções de secagem, exceto a opção Tempo de Secagem, detectam o peso das roupas para exibir um tempo de secagem mais preciso e secá-las mais completamente. Consulte a tabela na página 40 para selecionar a opção de secagem apropriada de acordo com o tipo e a quantidade de peças e a umidade que você deseja deixar.'

In [None]:
# "What do you need to know about the safety instructions?"
chain.invoke("O que você precisa saber sobre as instruções de segurança?")

' O que você precisa saber sobre as instruções de segurança\nLeia este manual cuidadosamente para que você saiba como operar de forma segura e eficiente os recursos e as funções abrangentes do seu novo eletrodoméstico. Mantenha-o em um lugar seguro próximo ao eletrodoméstico para consultas futuras. Utilize esse eletrodoméstico somente para os fins pretendidos, conforme descrito neste manual de instruções. As Advertências e Instruções importantes de segurança deste manual não abrangem todas as condições e situações que podem vir a ocorrer. É sua responsabilidade ter bom senso, cuidado e precaução ao instalar, cuidar e operar sua lavadora de roupas.Como as instruções de operação a seguir servem para vários modelos, as características da sua lavadora de roupas podem ser levemente diferentes daquelas descritas neste manual e nem todos os sinais de advertência serão aplicáveis. Caso tenha alguma dúvida ou comentário, entre em contato com a central de atendimento mais próxima ou encontre aju