<a href="https://colab.research.google.com/github/irisalmeida/oficina-rag/blob/main/Agents4good_oficina_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Smart Reader for PDFs with Images**

An application that answers questions about PDFs by combining a multimodal model with semantic search and retrieval-augmented generation (RAG).

#### **Environment Setup**


##### **Installing the Required Packages**

In [None]:
!pip install -U -q langgraph langchain langchain-google-genai langchain_community faiss-cpu pdf2image PyMuPDF sentence-transformers

##### **Setting the Google AI API Key**
Generate an API key in [Google AI Studio](https://aistudio.google.com/prompts/new_chat).

In [None]:
import os, getpass

# Add your key below:
# os.environ["GOOGLE_API_KEY"] = "your-key-here"

if not os.getenv("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google API key: ")

#### **Uploading the PDF**

In [None]:
os.makedirs("pdf_images", exist_ok=True)

In [None]:
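from google.colab import files

# Opens Colab's file picker; the result is a dict keyed by filename.
uploaded = files.upload()
# Use the first uploaded file as the PDF to process.
pdf_path = list(uploaded.keys())[0]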
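The cell above takes whichever file comes first in the upload. If more than one file is selected, a small guard can pick the PDF explicitly. This is a minimal sketch: `pick_pdf` and the example filenames are hypothetical and are not part of the notebook.

```python
from pathlib import Path


def pick_pdf(filenames):
    """Return the first filename whose extension is .pdf (case-insensitive)."""
    for name in filenames:
        if Path(name).suffix.lower() == ".pdf":
            return name
    raise ValueError("No PDF file found among the uploaded files.")


# Hypothetical upload result: a dict keyed by filename, like the one above.
example_upload = {"notes.txt": b"...", "Report.PDF": b"%PDF-1.7"}
print(pick_pdf(example_upload))
```

In the notebook this would be used as `pdf_path = pick_pdf(uploaded)`.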

#### **Initializing the LLM**

In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

#### **Extracting Text and Images from the PDF**

In [None]:
import base64

import fitz  # PyMuPDF
from langchain_core.documents import Document
from langchain_core.messages import HumanMessage

text_chunks = []
image_descriptions = []
doc = fitz.open(pdf_path)

for i, page in enumerate(doc):
    # Keep the page text only if something remains after trimming whitespace.
    text = page.get_text("text").strip()
    if text:
        # "page" is the 0-based page index.
        text_chunks.append(Document(page_content=text, metadata={"page": i}))

    image_list = page.get_images(full=True)
    for j, img in enumerate(image_list):
        xref = img[0]  # cross-reference number of the embedded image
        # Embedded images are often JPEG or another format, so re-encode each one
        # as PNG to match the .png filename and the image/png data URL below.
        pix = fitz.Pixmap(doc, xref)
        if pix.n - pix.alpha > 3:  # CMYK or similar: convert to RGB before PNG export
            pix = fitz.Pixmap(fitz.csRGB, pix)
        image_bytes = pix.tobytes("png")
        image_path = f"pdf_images/page_{i}_img_{j}.png"
        with open(image_path, "wb") as f:
            f.write(image_bytes)

        # Ask the multimodal model for a description so the image can be indexed as text.
        image_b64 = base64.b64encode(image_bytes).decode()
        vision_response = llm.invoke([
            HumanMessage(
                content=[
                    {"type": "text", "text": "Describe the content of the image in detail."},
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}}
                ]
            )
        ])
        image_desc = vision_response.content.strip()
        image_descriptions.append(Document(
            page_content=image_desc,
            metadata={
                "image_path": image_path,
                "image_base64": image_b64,
                "page": i,
                "image_index": j
            }
        ))

# Page texts and image descriptions are indexed together.
all_documents = text_chunks + image_descriptions
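The loop above stores the text of each page as a single `Document`. For dense pages, splitting the text into smaller overlapping pieces before indexing may give more focused retrieval results. The sketch below is one simple way to do that in plain Python; `split_text`, `chunk_size`, and `overlap` are hypothetical names and values, not something the notebook defines. LangChain's text splitters are an alternative.

```python
def split_text(text, chunk_size=500, overlap=50):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("Require chunk_size > 0 and 0 <= overlap < chunk_size.")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size].strip()
        if piece:
            chunks.append(piece)
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))
```

Each returned piece could then be wrapped in its own `Document` with the same `page` metadata.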

#### **Generating Embeddings and Indexing the Content**

Embedding model: [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(all_documents, embedding=embeddings)
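The vector store turns every document into an embedding vector, and a search returns the documents whose vectors lie closest to the query vector. The toy sketch below illustrates that idea with hand-written 2-D vectors and Euclidean distance. It is an illustration only: `top_k` and the vectors are hypothetical, real embeddings have hundreds of dimensions, and the metric the actual index uses depends on its configuration.

```python
import math


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors closest to query_vec (Euclidean distance)."""
    scored = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort()  # smallest distance first
    return [idx for _, idx in scored[:k]]


# Made-up vectors standing in for document embeddings.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(top_k([1.0, 0.05], toy_vectors, k=2))
```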

#### **Building the Graph**

In [None]:
from langchain import hub
from langgraph.graph import START, END, StateGraph
from typing_extensions import TypedDict, List

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

prompt = hub.pull("rlm/rag-prompt")

def retrieve(state: State):
    docs = vectorstore.similarity_search(state["question"], k=7)
    return {"context": docs}

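def generate(state: State):
    context_parts = []
    for doc in state["context"]:
        text = doc.page_content
        image_path = doc.metadata.get("image_path")
        if image_path:
            # Refer to the image by path and page index. Pasting the base64 payload
            # into the prompt as plain text would bloat it without showing the image.
            page_index = doc.metadata.get("page")
            image_tag = f"\n[Related image: {image_path}, page index {page_index}]\n"
            context_parts.append(text + image_tag)
        else:
            context_parts.append(text)

    context_text = "\n\n".join(context_parts)

    # Fill the RAG prompt template with the question and the retrieved context.
    messages = prompt.invoke({
        "question": state["question"],
        "context": context_text
    })
    result = llm.invoke(messages)
    return {"answer": result.content}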

graph_builder = StateGraph(State)
graph_builder.add_node("retrieve", retrieve)
graph_builder.add_node("generate", generate)
graph_builder.add_edge(START, "retrieve")
graph_builder.add_edge("retrieve", "generate")
graph_builder.add_edge("generate", END)
graph = graph_builder.compile()
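Each node in the graph receives the current state and returns only the keys it wants to update; those updates are merged into the state before the next node runs. The sketch below imitates that flow with plain dictionaries to make the data movement visible. It is a simplified stand-in, not how LangGraph is implemented, and `run_pipeline`, `fake_retrieve`, and `fake_generate` are hypothetical names.

```python
def run_pipeline(state, nodes):
    """Run nodes in order, merging each node's partial update into the state."""
    state = dict(state)  # copy so the caller's dict is left untouched
    for node in nodes:
        state.update(node(state))
    return state


def fake_retrieve(state):
    # Stand-in for the retrieval step: returns only the "context" key.
    return {"context": [f"passage about {state['question']}"]}


def fake_generate(state):
    # Stand-in for the generation step: reads "context", returns only "answer".
    return {"answer": f"Based on {len(state['context'])} passage(s)."}


final_state = run_pipeline({"question": "invoices"}, [fake_retrieve, fake_generate])
print(final_state)
```

The final state holds all three keys (`question`, `context`, `answer`), which is why the result of `graph.invoke` can be read for both the answer and the retrieved documents.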


#### **Visualizing the Graph**

In [None]:
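from IPython.display import Image, display

try:
    display(Image(graph.get_graph().draw_mermaid_png()))
except Exception as exc:
    # Rendering is optional and may need network access; the graph works without it.
    print(f"Could not render the graph: {exc}")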

In [None]:
question = input("Enter your question about the PDF: ")
result = graph.invoke({"question": question})

print("\nAnswer:\n")
print(result["answer"])

print("\nRelevant images:\n")
for doc in result["context"]:
    if "image_path" in doc.metadata:
        display(Image(doc.metadata["image_path"]))
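Because every indexed document carries a `page` entry in its metadata, the pages behind an answer can also be listed. The helper below is a small sketch of that; `cited_pages` is a hypothetical name, and it adds 1 because the stored page indices start at 0.

```python
def cited_pages(metadatas):
    """Return the sorted, de-duplicated 1-based page numbers found in the metadata dicts."""
    return sorted({meta["page"] + 1 for meta in metadatas if "page" in meta})


# Made-up metadata resembling what the retrieved documents carry.
example_metadata = [
    {"page": 0},
    {"page": 2, "image_path": "pdf_images/page_2_img_0.png"},
    {"page": 0},
    {},
]
print(cited_pages(example_metadata))
```

With the result above, this would be called as `cited_pages(doc.metadata for doc in result["context"])`.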