# 0 - Introduction

## Overview

In this workshop you will learn how to build a graph-enhanced GenAI question-answering solution using the open-source AWS [GraphRAG Toolkit](https://github.com/awslabs/graphrag-toolkit/tree/main/lexical-graph) library.

The workshop assumes familiarity with graph concepts, vector similarity search, and retrieval augmented generation (RAG) techniques.

<div class="alert alert-success">
⏰ <b style="color:black;">Completing the exercises</b>

The exercises are best completed in order, although each exercise is a self-contained unit. Exercise 3 is optional; complete it if you have time at the end of the session.
</div>

### Exercise 1 - Indexing (20 mins)

In this exercise you learn how to build a graph and vector index from source documents using the GraphRAG Toolkit. 

During the exercise you will:

 - Learn about the Extract and Build stages of the indexing process
 - Inspect the extracted data
 - Learn about the specific graph model used by the toolkit
 - Visualise the resulting graph

### Exercise 2 - Querying (20 mins)

In this exercise you learn how to query data in the toolkit's graph and vector stores. The queries in this exercise show how the graph and vector stores can help answer complex questions that require both:

  - Information that is similar to the question being asked
  - Information that is structurally relevant, but _dissimilar_ to the question being asked
  
During the exercise you will:

 - Compare the results produced by traditional similarity search with those produced by graph-enhanced search
 - Visualise and inspect the results produced by the graph-enhanced search
 - Learn about some of the techniques employed by the graph-enhanced search to improve responses
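
To make the distinction between similar and structurally relevant content concrete, here is a minimal, self-contained sketch in plain Python. It does not use the GraphRAG Toolkit and is not the toolkit's retrieval algorithm: the statements, the three-dimensional "embeddings", the `EDGES` adjacency list, and the functions `similarity_search` and `graph_enhanced_search` are all invented for illustration.

```python
import math

# Toy data, invented for illustration: statement id -> (text, hand-made embedding).
STATEMENTS = {
    "s1": ("Example Corp sells widgets in the UK", [0.9, 0.1, 0.0]),
    "s2": ("Example Corp sources widgets from Acme Ltd", [0.6, 0.4, 0.1]),
    "s3": ("Acme Ltd ships goods through Harbour X", [0.1, 0.5, 0.6]),
    "s4": ("Harbour X is closed by strikes", [0.0, 0.1, 0.9]),
    "s5": ("Pandas eat bamboo", [0.2, 0.9, 0.1]),
}

# Hypothetical graph: statements are linked when they mention the same entity.
EDGES = {
    "s1": ["s2"],
    "s2": ["s1", "s3"],
    "s3": ["s2", "s4"],
    "s4": ["s3"],
    "s5": [],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def similarity_search(query_vec, k=2):
    """Return the ids of the k statements most similar to the query vector."""
    ranked = sorted(
        STATEMENTS,
        key=lambda sid: cosine(query_vec, STATEMENTS[sid][1]),
        reverse=True,
    )
    return ranked[:k]


def graph_enhanced_search(query_vec, k=2, hops=2):
    """Start from the similarity-search hits, then follow graph edges for `hops` steps."""
    results = similarity_search(query_vec, k)
    frontier = list(results)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbour in EDGES[node]:
                if neighbour not in results:
                    results.append(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return results


# Stand-in embedding for a question such as
# "What could disrupt Example Corp's UK widget sales?"
QUERY = [1.0, 0.1, 0.0]

print("Similarity only:", similarity_search(QUERY))
print("Graph-enhanced :", graph_enhanced_search(QUERY))
```

With `k=2`, similarity search alone returns only the two statements about Example Corp (`s1` and `s2`). Following the graph for two hops also reaches `s3` and `s4`, which score low on similarity to the question but are connected to it through the supplier and the harbour. The unrelated statement `s5` is never reached.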
  
### Exercise 3 - Agentic Use Cases (Extra - 10 mins)

This optional exercise shows how the indexing and querying capabilities in the previous exercises can be composed into higher-level agentic solutions, whereby an AI agent orchestrates multiple question-answering interactions to help answer more complex questions.

During the exercise you will:

  - Inspect the graph schemas for the two different datasets used in the exercise
  - Build an MCP server and client
  - Inspect the tool descriptions created by the toolkit
  - Create an agent that utilises the tools to answer some complex questions
  
## Activities

  - 🎯 Code that you run to extract, build, query and visualise your data.
  - 🔍 Inspect the configuration, code and source data used throughout the solution.

## GraphRAG Toolkit

The [GraphRAG Toolkit](https://github.com/awslabs/graphrag-toolkit/tree/main/lexical-graph) is an open-source Python library designed to enhance retrieval-augmented generation (RAG) capabilities using graph-based approaches. 

Here's a summary of its key features:

  - **Hybrid RAG:** The GraphRAG Toolkit uses both a graph store and a vector store to fetch relevant content.
  - **Support for multiple backends:** The toolkit provides built-in support for multiple graph and vector stores, including Amazon Neptune Database, Amazon Neptune Analytics, Amazon OpenSearch Serverless, and PostgreSQL with the pgvector extension.
  - **Lexical Graph Model:** The toolkit utilizes a three-tiered lexical graph model.
  
The toolkit includes two main components:

  - **Indexing:** Builds a graph store and vector store from source documents
  - **Querying:** Finds relevant content and generates answers in response to user questions 

![GraphRAG Toolkit](./images/graphrag-toolkit.png)

### LlamaIndex

Parts of the GraphRAG Toolkit are built using [LlamaIndex](https://www.llamaindex.ai/), another open-source framework for building AI applications. Using LlamaIndex components such as connectors, chunking strategies and LLM wrappers, you can customize the GraphRAG Toolkit for additional data sources and backends.

## AWS Services and Foundation Models

As mentioned above, the GraphRAG Toolkit supports a number of different backends and foundation model providers, including Amazon Neptune Database, Amazon Neptune Analytics, Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL, and Amazon Bedrock.

In this workshop you'll be using Amazon Neptune Database (graph store), Amazon OpenSearch Serverless (vector store), and Amazon Bedrock (foundation models provider).

### 🔍 0.1 Review endpoints and models

To see the backend endpoints and the specific models being used, run the following cell:

```python
%reload_ext dotenv
%dotenv

import os

print(f"""
GRAPH_STORE      : {os.environ['GRAPH_STORE']}
VECTOR_STORE     : {os.environ['VECTOR_STORE']}
EXTRACTION_MODEL : {os.environ['EXTRACTION_MODEL']}
RESPONSE_MODEL   : {os.environ['RESPONSE_MODEL']}
""")
```

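The cell above raises a `KeyError` if any of the four variables is not set. If that happens, a more forgiving check can help you see which setting is missing. The following is a minimal sketch, not part of the workshop code: the helper name `describe_settings` and the `<not set>` placeholder are hypothetical, and it assumes the same four variable names used above.

```python
import os

# The four settings printed by the workshop cell above.
REQUIRED = ("GRAPH_STORE", "VECTOR_STORE", "EXTRACTION_MODEL", "RESPONSE_MODEL")


def describe_settings(env=os.environ):
    """Return (report, missing) for the required settings without raising KeyError.

    `describe_settings` is a hypothetical helper written for this example.
    """
    missing = [name for name in REQUIRED if not env.get(name)]
    lines = [f"{name:<17}: {env.get(name) or '<not set>'}" for name in REQUIRED]
    return "\n".join(lines), missing


report, missing = describe_settings()
print(report)
if missing:
    print(f"Missing settings: {', '.join(missing)}")
```

If any settings are reported as missing, check that the environment file loaded by `%dotenv` is present and contains those entries, then re-run the cell.
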
## Next Exercise

Go to <a href="../../../nbclassic/notebooks/graphrag-toolkit/1-Indexing.ipynb"><b>Exercise 1 - Indexing</b></a> to begin the workshop exercises.