# **GraphRAG Indexing & Querying**

This notebook demonstrates the refactored GraphRAG pipeline. We will use two separate scripts:

1.  **`scripts/run_indexing.py`**: Reads a PDF, builds a knowledge graph, detects communities, and saves pre-computed community summaries.
2.  **`scripts/run_query.py`**: Loads the pre-computed summaries and answers a user's "global query" about the corpus.

This separation of *indexing* (heavy, done once) from *querying* (light, done many times) is a standard practice for efficient RAG systems.

## 1. Setup

First, let's install all the required Python packages from our `requirements.txt` file.

In [None]:
!pip install -r requirements.txt

## 2. Run the Indexing Pipeline

This is the most compute-intensive step. The script will:

1.  Download the GraphRAG paper PDF (if not present).
2.  Load and split the PDF into text chunks.
3.  Use the LLM to extract entities and relationships for **each** chunk.
4.  Parse all extractions to build a single knowledge graph.
5.  Run the Leiden algorithm to detect communities (clusters) in the graph.
6.  Use the LLM again to create a textual summary for **each** community.
7.  Save the final graph (`knowledge_graph.gml`) and summaries (`community_summaries.json`) to the `output/` directory.
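Steps 2 and 4 above can be illustrated with a minimal, standard-library sketch. It is not the code in `scripts/run_indexing.py`: the function names, the character-based chunking, the chunk sizes, and the `(source, relation, target)` triple format are all assumptions made for illustration. The sketch stops before community detection, since Leiden needs a third-party graph library.

```python
from collections import defaultdict


def split_into_chunks(text, chunk_size=600, overlap=100):
    """Split text into overlapping character windows (step 2).

    chunk_size and overlap are illustrative defaults, not the script's values.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def build_graph(extractions):
    """Merge per-chunk triples into one undirected, weighted graph (step 4).

    `extractions` is assumed to be an iterable of
    (source, relation, target) string triples parsed from LLM output.
    """
    graph = defaultdict(dict)
    for source, relation, target in extractions:
        # Normalise names so "GraphRAG" and "graphrag " become one node.
        u, v = source.strip().lower(), target.strip().lower()
        if u == v:
            continue  # ignore self-loops
        edge = graph[u].setdefault(v, {"weight": 0, "relations": set()})
        edge["weight"] += 1          # repeated mentions strengthen the edge
        edge["relations"].add(relation)
        graph[v][u] = edge           # both directions share one edge record
    return dict(graph)
```

Counting repeated mentions as edge weight is one common choice here, because it gives a community detection algorithm a signal about which entities are most tightly connected.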

**Note:** This step can take a long time, depending on your hardware (GPU) and the size of the document.

In [None]:
!python scripts/run_indexing.py \
    --pdf_url "https://arxiv.org/pdf/2404.16130v1" \
    --pdf_path "data/2404.16130v1.pdf" \
    --output_dir "output"
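For readers who want to see how the three flags above and the "download if not present" behavior could fit together, here is a hedged sketch. The flag names come from the command above. Everything else (the function names, which flags are required, the `output` default) is an assumption and may differ from the real script.

```python
import argparse
import urllib.request
from pathlib import Path


def parse_args(argv=None):
    """Parse the flags used in the indexing command (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Build a GraphRAG index from a PDF.")
    parser.add_argument("--pdf_url", required=True, help="Where to fetch the PDF from.")
    parser.add_argument("--pdf_path", required=True, help="Local path for the PDF.")
    parser.add_argument("--output_dir", default="output", help="Where artifacts are saved.")
    return parser.parse_args(argv)


def ensure_pdf(pdf_url, pdf_path):
    """Download the PDF only if pdf_path does not exist yet.

    Returns True if a download happened, False if the file was already there.
    """
    path = Path(pdf_path)
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(pdf_url, str(path))
    return True
```

Skipping the download when the file already exists means re-running the indexing cell does not fetch the paper again.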

## 3. Run the Query Pipeline

Now that the heavy indexing is done, we can ask questions!

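This step is much lighter than indexing, although it still makes one LLM call per community summary plus a final synthesis call. It will: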

1.  Load the `community_summaries.json` file from the `output/` directory.
2.  Take our `--query`.
3.  Use the LLM to generate an "intermediate answer" from each community summary.
4.  Use the LLM a final time to synthesize all intermediate answers into one global, comprehensive response.
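Steps 3 and 4 form a map-reduce pattern, sketched below. This is an illustration rather than the code in `scripts/run_query.py`: the function name, the prompt wording, and the assumption that the summaries form a mapping from community ID to summary text are all hypothetical. The `llm` argument stands in for whatever callable sends a prompt to the model and returns its text.

```python
from typing import Callable, Dict


def answer_global_query(
    query: str,
    summaries: Dict[str, str],
    llm: Callable[[str], str],
) -> str:
    """Answer a global query with one LLM call per community plus one final call."""
    # Map: ask the model for an intermediate answer from each community summary.
    intermediate_answers = []
    for community_id, summary in summaries.items():
        prompt = (
            f"Community {community_id} summary:\n{summary}\n\n"
            f"Using only this summary, answer: {query}"
        )
        intermediate_answers.append(llm(prompt))

    # Reduce: synthesize all intermediate answers into one global response.
    combined = "\n\n".join(intermediate_answers)
    final_prompt = (
        f"Question: {query}\n\n"
        f"Intermediate answers:\n{combined}\n\n"
        "Combine these into a single comprehensive answer."
    )
    return llm(final_prompt)
```

With `N` communities, this sketch makes `N + 1` LLM calls, so query latency grows with the number of communities found during indexing.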

In [None]:
!python scripts/run_query.py --query "What problem does Graph RAG aim to solve that traditional RAG and QFS methods cannot?"

### Try another query!

Feel free to ask another global question. For example:

In [None]:
!python scripts/run_query.py --query "How does GraphRAG use community detection in its pipeline?"

## 4. Conclusion

This notebook walked through the two-stage GraphRAG pipeline. Because indexing and querying are separate scripts, the expensive graph construction and community summarization run once, and each subsequent query only needs to load the saved summaries.

All artifacts (logs, graph files, summaries) are stored in the `logs/` and `output/` directories, and the source code lives in `src/`.