# Appendix 1 - Lexical Graph

The graph built by the GraphRAG Toolkit is based on a three-tiered lexical graph structure. This model is designed to efficiently represent and organize information extracted from unstructured sources. It consists of three main tiers:

![Lexical Graph](./images/lexical-graph.png)

#### Lineage 

(Top tier, blue) This tier represents sources, chunks, and the relations between them, providing a way to track the origin of information within the graph. Source nodes contain metadata describing source documents; the specific metadata varies from source to source.
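The lineage relationship can be sketched with two plain Python data classes. This is a minimal illustration only: the `Source` and `Chunk` types, the `source_id` field, and the `origin_of` helper are hypothetical and are not the GraphRAG Toolkit's schema or API. It shows how following a chunk's lineage edge recovers the originating source together with its source-specific metadata.

```python
from dataclasses import dataclass, field


# Hypothetical lineage-tier node types; field names are illustrative assumptions.
@dataclass
class Source:
    source_id: str
    metadata: dict = field(default_factory=dict)  # contents vary by source


@dataclass
class Chunk:
    chunk_id: str
    source_id: str  # lineage edge: each chunk points back to its source
    text: str


def origin_of(chunk: Chunk, sources: dict[str, Source]) -> Source:
    """Follow a chunk's lineage edge back to the source it was extracted from."""
    return sources[chunk.source_id]


# Two sources carrying different kinds of metadata.
sources = {
    "s1": Source("s1", {"url": "https://example.com/report.pdf"}),
    "s2": Source("s2", {"title": "Release notes", "version": "1.2"}),
}
chunk = Chunk("c1", "s2", "Example chunk text.")
print(origin_of(chunk, sources).metadata)  # {'title': 'Release notes', 'version': '1.2'}
```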

#### Summarisation 

(Middle tier, green) The summarisation tier contains lexical units at different levels of granularity:

  - **Topics** group thematically linked statements belonging to the same source. A single chunk may contain statements belonging to multiple topics; multiple chunks may contain statements belonging to the same topic. Topics provide *local connectivity* – they allow a search to traverse thematically linked content belonging to an individual source document (see below).
  - **Statements** are the primary unit of context supplied to the LLM at question-answering time. Statements are grouped by topic and source, and supported by facts. Statements are not unique: if the same statement occurs in different sources, there will be multiple statement nodes in the graph.
  - **Facts** link statements derived from different sources, allowing for transitive connections between statements via facts and topics. Facts provide *global connectivity* – they allow a search to traverse linked content from across the corpus (see below). Unlike statements, facts are unique: the same fact can support multiple statements.
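The different uniqueness rules for statements and facts can be illustrated with a small in-memory model. In this sketch, `Statement`, `Fact`, `SummaryTier`, and the string `fact_key` used to deduplicate facts are all hypothetical; the toolkit's own extraction and storage work differently. Statements are always added as new nodes, and topics are scoped to a single source, while facts are looked up by key and reused, so a single fact can end up supporting statements from several sources.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Statement:
    statement_id: str
    source_id: str
    topic: str
    text: str


@dataclass
class Fact:
    fact_key: str  # hypothetical dedup key, e.g. a normalised triple
    supports: set[str] = field(default_factory=set)  # ids of supported statements


class SummaryTier:
    def __init__(self) -> None:
        self.statements: dict[str, Statement] = {}
        self.facts: dict[str, Fact] = {}  # one node per distinct fact_key
        # (source_id, topic) -> statement ids: topics never span sources
        self.topics: dict[tuple[str, str], list[str]] = defaultdict(list)

    def add_statement(self, stmt: Statement, fact_keys: list[str]) -> None:
        # Statements are not merged: identical text from two sources
        # produces two separate statement nodes.
        self.statements[stmt.statement_id] = stmt
        self.topics[(stmt.source_id, stmt.topic)].append(stmt.statement_id)
        for key in fact_keys:
            # Facts are unique: reuse the existing node when the key is known.
            fact = self.facts.setdefault(key, Fact(key))
            fact.supports.add(stmt.statement_id)


tier = SummaryTier()
key = "acme|founded_in|1999"
tier.add_statement(Statement("st1", "s1", "history", "Acme was founded in 1999."), [key])
tier.add_statement(Statement("st2", "s2", "company", "Acme was founded in 1999."), [key])
print(len(tier.statements), len(tier.facts))  # 2 1
```

The same sentence in two sources yields two statement nodes, but both are supported by one shared fact node, which is what later lets a search hop between the two sources.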

#### Entity-Relationship

(Bottom tier, orange) This tier forms the foundation of the graph and contains individual entities and relations extracted from the underlying sources. Entities in this tier act as low-level entry points into the graph. Entity classifications and the typed relations between them capture the semantics of the domain.
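The following sketch models entities with a classification, the typed relations between them, and a naive way of picking entities as entry points for a query. The `Entity` and `Relation` types, the example classifications and relation type, and the exact-name lookup in `entry_points` are illustrative assumptions, not the toolkit's implementation; a real system would typically use more robust entity matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    classification: str  # e.g. "Company", "Product"; labels are illustrative


@dataclass(frozen=True)
class Relation:
    subject: Entity
    relation_type: str  # typed relation, e.g. "MANUFACTURES"
    obj: Entity


def entry_points(query_terms: list[str], entities: list[Entity]) -> list[Entity]:
    """Return entities whose names exactly match a query term (case-insensitive)."""
    terms = {t.lower() for t in query_terms}
    return [e for e in entities if e.name.lower() in terms]


acme = Entity("Acme", "Company")
widget = Entity("Widget", "Product")
relations = [Relation(acme, "MANUFACTURES", widget)]

# Start from entities mentioned in the query, then follow their typed relations.
hits = entry_points(["acme", "pricing"], [acme, widget])
outgoing = [r for r in relations if r.subject in hits]
print([(r.subject.name, r.relation_type, r.obj.name) for r in outgoing])
# [('Acme', 'MANUFACTURES', 'Widget')]
```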

### Local and global connectivity

#### Local connectivity

Topics allow a search to traverse thematically linked content belonging to an individual source document:

![Local Connectivity](./images/local-connectivity.png)
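Local traversal can be sketched as a lookup keyed on both source and topic. This minimal example assumes each statement belongs to exactly one topic and represents statements as hypothetical `(statement_id, source_id, topic)` tuples; `build_topic_index` and `local_neighbours` are illustrative helpers, not toolkit functions.

```python
from collections import defaultdict


def build_topic_index(statements):
    """Map (source_id, topic) -> statement ids, so topics stay within one source."""
    index = defaultdict(list)
    for stmt_id, source_id, topic in statements:
        index[(source_id, topic)].append(stmt_id)
    return index


def local_neighbours(stmt_id, statements, index):
    """Statements reachable from stmt_id through its topic, in the same source only."""
    lookup = {s[0]: s for s in statements}
    _, source_id, topic = lookup[stmt_id]
    return [s for s in index[(source_id, topic)] if s != stmt_id]


statements = [
    ("st1", "doc-a", "pricing"),
    ("st2", "doc-a", "pricing"),
    ("st3", "doc-a", "support"),
    ("st4", "doc-b", "pricing"),  # same topic label, different source
]
index = build_topic_index(statements)
print(local_neighbours("st1", statements, index))  # ['st2']
```

Because the index key includes the source, `st4` is not reached even though its topic label matches: crossing between documents is left to facts.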

#### Global connectivity

Facts allow a search to traverse linked content from across the corpus:

![Global Connectivity](./images/global-connectivity.png)
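Combining the two kinds of hop gives the transitive, corpus-wide traversal. The sketch below is a plain breadth-first search with a hop limit, under the assumptions that facts are given as a hypothetical `statement_facts` mapping and topics as `(source_id, topic)` keys in `statement_topic`; `global_reach` is an illustrative name, not part of the toolkit.

```python
from collections import defaultdict, deque


def global_reach(start, statement_facts, statement_topic, max_hops=2):
    """Breadth-first search from `start`, returning {statement_id: hop count}.

    Hops follow shared facts (which can cross sources) and shared
    topics (which stay within one source).
    """
    # Invert the inputs to find all statements sharing a fact or a topic.
    by_fact = defaultdict(set)
    for stmt, facts in statement_facts.items():
        for fact in facts:
            by_fact[fact].add(stmt)
    by_topic = defaultdict(set)
    for stmt, topic_key in statement_topic.items():
        by_topic[topic_key].add(stmt)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        stmt = queue.popleft()
        if hops[stmt] >= max_hops:
            continue  # do not expand beyond the hop budget
        neighbours = set()
        for fact in statement_facts.get(stmt, ()):
            neighbours |= by_fact[fact]
        topic_key = statement_topic.get(stmt)
        if topic_key is not None:
            neighbours |= by_topic[topic_key]
        for n in neighbours:
            if n not in hops:
                hops[n] = hops[stmt] + 1
                queue.append(n)
    return hops


# Topic keys are (source_id, topic) pairs, so topics stay local to a source.
statement_facts = {"st1": {"f1"}, "st4": {"f1"}}
statement_topic = {
    "st1": ("doc-a", "pricing"),
    "st2": ("doc-a", "pricing"),
    "st4": ("doc-b", "billing"),
    "st5": ("doc-b", "billing"),
}
print(global_reach("st1", statement_facts, statement_topic))
```

Here `st5` in `doc-b` is reached in two hops from `st1` in `doc-a`: first across sources via the shared fact `f1` to `st4`, then within `doc-b` via its topic.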