<a href="https://colab.research.google.com/github/nikojim/Ollama-Chatbot/blob/main/chat_json_gemini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Chat with JSON: RAG with LangChain, Gemini, and FAISS Vector Store

In [2]:
!pip install langchain-community faiss-cpu langchain-huggingface  google-generativeai langchain-google-genai gradio jq


In [170]:
import os
import warnings
import pandas as pd
import json

os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'
warnings.filterwarnings("ignore")



### Document Loader JSON

In [171]:
from langchain_community.document_loaders import JSONLoader

jsons = []
for root, dirs, files in os.walk('/content/sample_data/json-dataset'):
    # print(root, dirs, files)
    for file in files:
        if file.endswith('.json'):
            jsons.append(os.path.join(root, file))

In [172]:
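docs = []
for json_path in jsons:  # renamed from `json` so the loop does not shadow the json module
    # Each file is already a JSON list, so ".[]" simply takes each element.
    # text_content=False because the elements are objects rather than plain strings.
    loader = JSONLoader(json_path, jq_schema=".[]", text_content=False)
    pages = loader.load()

    docs.extend(pages)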
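
As a rough illustration of what `jq_schema=".[]"` with `text_content=False` produces, the sketch below splits a top-level JSON array into one serialized string per element using only the standard library. It approximates the loader's observable behavior and is not its implementation; `split_json_array` and the sample records are hypothetical.

```python
import json


def split_json_array(raw_text):
    """Return one JSON string per element of a top-level JSON array."""
    records = json.loads(raw_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps escapes non-ASCII characters by default, which is consistent
    # with sequences such as \u00f3 showing up in the retrieved chunks.
    return [json.dumps(record) for record in records]


# Hypothetical sample shaped like the incident records used in this notebook.
sample = '[{"id_incidente": 101, "creador": "Eduardo"}, {"id_incidente": 102, "creador": "Pablo"}]'
pages = split_json_array(sample)
print(len(pages))
print(pages[0])
```

Each element becomes its own document, so a file holding a list of incidents yields one document per incident.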

### Document Chunking

In [173]:
from langchain_text_splitters import RecursiveCharacterTextSplitter


text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)

chunks = text_splitter.split_documents(docs)

In [174]:
len(docs), len(chunks)

(100, 1243)

In [175]:
len(docs[0].page_content), len(chunks[0].page_content)

(978, 90)
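
To make `chunk_size=100` and `chunk_overlap=20` concrete, here is a simplified fixed-window chunker. It is only a sketch: `RecursiveCharacterTextSplitter` prefers to break on separators, so its chunks vary in length (the first chunk above has 90 characters), whereas this hypothetical `chunk_text` helper cuts at exact offsets.

```python
def chunk_text(text, chunk_size=100, chunk_overlap=20):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Synthetic 978-character record, the same length as docs[0] above.
record = "".join(chr(97 + i % 26) for i in range(978))
pieces = chunk_text(record)
print(len(pieces), len(pieces[0]), len(pieces[-1]))
```

With windows this small, a record of roughly 1,000 characters is spread over about a dozen chunks, and a field such as `id_incidente` probably lands in only the first one. That may be worth keeping in mind when a query targets a specific incident number.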

### Document Vector Embedding

In [176]:
import google.generativeai as genai
from google.colab import userdata
import faiss
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.embeddings import OpenAIEmbeddings

In [212]:
# Configure the genai library with the API key
api_key = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=api_key)

model = genai.GenerativeModel('gemini-2.0-flash')

# Quick test of the model (the prompt asks, in Spanish, "who is the pope currently?")
response = model.generate_content("quién es el papa actualmente?")
print(response.text)

El Papa actual es el Papa Francisco.



In [223]:
# Google embeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004", google_api_key=api_key)
# embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=api_key)

# OpenAI embeddings (alternative)
# embeddings = OpenAIEmbeddings(openai_api_key=userdata.get('OPENAI_API_KEY'))

single_vector = embeddings.embed_query("this is some text data")
print(single_vector)


[-0.03618612140417099, 0.013321533799171448, -0.11232965439558029, 0.02080659568309784, 0.020828289911150932, ...]

In [224]:
len(single_vector)

768

In [225]:
index = faiss.IndexFlatL2(len(single_vector))
index.ntotal, index.d

(0, 768)
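
`IndexFlatL2` performs an exhaustive search: it compares the query against every stored vector and ranks by squared Euclidean distance. The pure-Python sketch below shows that idea on toy 2-dimensional vectors; `flat_l2_search` is a hypothetical helper and says nothing about how FAISS implements the search internally.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(stored, query, k):
    """Return the k closest stored vectors as (distance, index) pairs."""
    scored = sorted((squared_l2(vec, query), idx) for idx, vec in enumerate(stored))
    return scored[:k]


# Toy data; the real index above holds 768-dimensional embeddings.
stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
nearest = flat_l2_search(stored, [0.9, 0.1], k=2)
print([idx for _, idx in nearest])
```

A flat index is exact but scales linearly with the number of vectors, which is unproblematic for the 1,243 chunks indexed here.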

In [226]:
vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={}
)

In [227]:
len(chunks)

1243

In [228]:
# help(vector_store)

In [229]:
ids = vector_store.add_documents(documents=chunks)

In [230]:
vector_store.index_to_docstore_id
len(ids)

1243

In [231]:
# Store the vector database
db_name = "incidentes-facturas-db"
vector_store.save_local(db_name)

# Load the vector database
# new_vector_store = FAISS.load_local(db_name, embeddings=embeddings, allow_dangerous_deserialization=True)
# len(new_vector_store.index_to_docstore_id)

### Retrieval

In [232]:
question = "incidente 108"
parts = vector_store.search(query=question, search_type='similarity')

for doc in parts:
    print(doc.page_content)
    print("\n\n")

{"id_incidente": 103, "asunto": "Fallo en la validaci\u00f3n del RUC emisor", "creador": "Eduardo",



{"id_incidente": 113, "asunto": "Fallo en la validaci\u00f3n del RUC emisor", "creador": "Eduardo",



{"id_incidente": 106, "asunto": "Fallo en la comunicaci\u00f3n con el servidor de DGI", "creador":



{"id_incidente": 102, "asunto": "Rechazo de e-Factura por parte de DGI", "creador": "Eduardo",





In [233]:
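# With search_type="similarity", only 'k' changes the results here.
# 'lambda_mult' is an MMR parameter, and 'fetch_k' matters only for MMR or filtered search.
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 10, 'fetch_k': 50, 'lambda_mult': 1},
)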
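
`fetch_k` and `lambda_mult` belong to maximal marginal relevance (MMR) search: `fetch_k` candidates are fetched first, then re-ranked to balance relevance against redundancy. The sketch below is a textbook version of that re-ranking on toy vectors, not LangChain's code; the `mmr` and `cosine` helpers and the sample data are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query, candidates, k, lambda_mult):
    """Greedily pick k candidate indices, trading relevance for diversity."""
    selected = []
    remaining = list(range(len(candidates)))

    def score(i):
        relevance = cosine(query, candidates[i])
        # Similarity to the closest already-selected candidate.
        redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.2]
candidates = [[1.0, 0.0], [0.99, 0.01], [0.6, 0.8]]  # the first two are near-duplicates
print(mmr(query, candidates, k=2, lambda_mult=1.0))
print(mmr(query, candidates, k=2, lambda_mult=0.5))
```

With `lambda_mult=1` the redundancy term vanishes and the ranking reduces to plain similarity, so that value only makes a difference relative to lower settings when `search_type="mmr"` is used.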

In [234]:
partes_retriever = retriever.invoke(question)

for doc in partes_retriever:
     print(doc.page_content)
     print("\n\n")


{"id_incidente": 103, "asunto": "Fallo en la validaci\u00f3n del RUC emisor", "creador": "Eduardo",



{"id_incidente": 113, "asunto": "Fallo en la validaci\u00f3n del RUC emisor", "creador": "Eduardo",



{"id_incidente": 106, "asunto": "Fallo en la comunicaci\u00f3n con el servidor de DGI", "creador":



{"id_incidente": 102, "asunto": "Rechazo de e-Factura por parte de DGI", "creador": "Eduardo",



{"id_incidente": 116, "asunto": "Fallo en la comunicaci\u00f3n con el servidor de DGI", "creador":



{"id_incidente": 189, "asunto": "Comprobante no enviado correctamente a DGI", "creador":



{"id_incidente": 183, "asunto": "Fallo en la validaci\u00f3n del RUC emisor", "creador": "Pablo",



{"id_incidente": 132, "asunto": "Rechazo de e-Factura por parte de DGI", "creador": "Pablo",



{"id_incidente": 133, "asunto": "Fallo en la validaci\u00f3n del RUC emisor", "creador": "Eduardo",



{"id_incidente": 129, "asunto": "Comprobante no enviado correctamente a DGI", "creador":





In [235]:


question = "Incidente 108"



### RAG with Gemini 2.0 Flash

In [236]:
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate


In [237]:
prompt = hub.pull("rlm/rag-prompt")

In [238]:
prompt_template = """
    You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question.
    If you don't know the answer, just say that you don't know.
    Answer in bullet points. Make sure your answer is relevant to the question and is based only on the context.
    Answer in Spanish.
    Question: {question}
    Context: {context}
    Answer:
"""
prompt = ChatPromptTemplate.from_template(prompt_template)

In [239]:
def format_docs(docs):
    return "\n\n".join([doc.page_content for doc in docs])

# print(format_docs(docs))

In [240]:
from langchain_google_genai import ChatGoogleGenerativeAI

model_rag = ChatGoogleGenerativeAI(model="gemini-2.0-flash", google_api_key=api_key)

rag_chain = (
    {"context": retriever|format_docs, "question": RunnablePassthrough()}
    | prompt
    | model_rag
    | StrOutputParser()
)
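
The chain above pipes four stages together: retrieve and format the context, fill the prompt, call the model, and parse the output to a string. The plain-Python sketch below mirrors that data flow with stand-ins so it runs without any API key. `run_rag`, `fake_retrieve`, and `echo_model` are hypothetical names, and the prompt is shortened.

```python
def join_context(texts):
    """Join retrieved chunk texts the same way format_docs joins page_content."""
    return "\n\n".join(texts)


def build_prompt(question, context):
    """Fill a shortened version of the question/context template."""
    return f"Question: {question}\nContext: {context}\nAnswer:"


def run_rag(question, retrieve, generate):
    """Retrieve context, build the prompt, and return the model's text answer."""
    context = join_context(retrieve(question))
    return generate(build_prompt(question, context))


def fake_retrieve(question):
    # Stand-in for the retriever: always returns two tiny chunks.
    return ['{"id_incidente": 103}', '{"id_incidente": 113}']


def echo_model(prompt_text):
    # Stand-in for the LLM: returns the prompt so the data flow is visible.
    return prompt_text


answer = run_rag("incidente 103", fake_retrieve, echo_model)
print(answer)
```

The question reaches the prompt twice: once directly (the role of `RunnablePassthrough()`) and once indirectly through the retrieved context.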

In [242]:
question = "qué incidentes tienen asunto validación RUC"

output = rag_chain.invoke(question)
print(output)


Los incidentes que tienen como asunto "Fallo en la validación del RUC emisor" son:

*   Incidente 163, creado por Gustavo.
*   Incidente 133, creado por Eduardo.
*   Incidente 193, creado por Pablo.
*   Incidente 183, creado por Pablo.
*   Incidente 173, creado por Gabriel.
*   Incidente 113, creado por Eduardo.
*   Incidente 103, creado por Eduardo.
*   Incidente 123, creado por Eduardo.
*   Incidente 153
*   Incidente 143


In [63]:
import gradio as gr # oh yeah!

In [64]:
# Answer a question by running the RAG chain defined above

def chatbot(question):
    return rag_chain.invoke(question)
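
Gradio can display a response as it arrives when the handler is a generator that yields progressively longer strings. The sketch below shows that accumulation pattern with a fake token list; `accumulate_stream` is a hypothetical helper, and the sample pieces are made up.

```python
def accumulate_stream(pieces):
    """Yield the text received so far after each incoming piece."""
    result = ""
    for piece in pieces:
        result += piece or ""  # tolerate empty or None pieces
        yield result


# Fake token stream standing in for a model response.
fake_pieces = ["Los ", "incidentes ", None, "son..."]
for partial in accumulate_stream(fake_pieces):
    print(partial)
```

LangChain runnables expose a `.stream()` method, so something like `yield from accumulate_stream(rag_chain.stream(question))` inside the handler should give a streaming variant of `chatbot`; that wiring is an untested assumption here.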


In [77]:
view = gr.Interface(
    fn=chatbot,
    inputs=[gr.Textbox(label="Your message:", lines=6)],
    outputs=[gr.Textbox(label="Response:", lines=20)],
    flagging_mode="never",
    theme=gr.themes.Ocean(),  # a bare gr.themes.Ocean() call has no effect; pass it here
)

view.launch()

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://f5ab2f8924dde4123a.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


