---------------------------
#### The SummaryIndex
----------------------------

```python
from llama_index.core import SimpleDirectoryReader, SummaryIndex

# Load every file in the local "files" directory as Document objects.
documents = SimpleDirectoryReader("files").load_data()

# Build a SummaryIndex: the documents are split into nodes and stored as a list.
index = SummaryIndex.from_documents(documents)

# Wrap the index in a query engine; answers are generated by the configured LLM.
query_engine = index.as_query_engine()

response = query_engine.query("How many documents have you loaded?")
print(response)
```

Output: Two documents have been loaded.


#### Understanding the Inner Workings of the SummaryIndex

The **SummaryIndex** stores the nodes of each document in a **list-like structure**, in the order they were added, and walks through that list at query time.

#### Key Mechanism:
- **Node Storage**: Documents are split into nodes (text chunks) that are stored sequentially, like a list.
- **Query Execution**: By default, a query retrieves every stored node and passes them, in order, to the LLM; the embedding- and LLM-based retrievers below can narrow this to the relevant nodes.
  - This avoids the similarity search that embedding-based indexes such as **VectorStoreIndex** rely on, and for many simpler applications it is still effective.
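The list-based storage and default retrieval described above can be sketched in plain Python. This is a minimal illustration, not LlamaIndex's implementation: `Node`, `ToySummaryIndex`, and the word-count chunking are hypothetical stand-ins, assuming nodes are simple text chunks kept in insertion order.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a LlamaIndex node: a text chunk plus its source."""
    text: str
    doc_id: str


@dataclass
class ToySummaryIndex:
    """Hypothetical list-backed index: nodes are kept in insertion order."""
    nodes: List[Node] = field(default_factory=list)

    @classmethod
    def from_texts(cls, docs: Dict[str, str], words_per_node: int = 5) -> "ToySummaryIndex":
        index = cls()
        for doc_id, text in docs.items():
            words = text.split()
            # Naive fixed-size chunking by word count (an assumption; real
            # node parsers split on sentences and token budgets).
            for start in range(0, len(words), words_per_node):
                chunk = " ".join(words[start:start + words_per_node])
                index.nodes.append(Node(chunk, doc_id))
        return index

    def retrieve(self, query: str) -> List[Node]:
        # Default-mode retrieval: the query is ignored and every node is
        # returned, in the order it was stored.
        return list(self.nodes)


docs = {
    "policy.txt": "Logs are kept for 30 days. Backups are kept for one year.",
    "faq.txt": "Contact support to request deletion.",
}
index = ToySummaryIndex.from_texts(docs)
retrieved = index.retrieve("What is the data retention policy?")
for node in retrieved:
    print(node.doc_id, "|", node.text)
```

Because `retrieve` ignores the query, every node reaches the LLM. That keeps retrieval trivial but makes the cost of each query grow with the size of the corpus.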

#### Retrievers:
The **SummaryIndex** supports several retrievers, selected with `index.as_retriever(retriever_mode=...)`:
1. **SummaryIndexRetriever** (`"default"`): Returns all nodes, without scoring them.
2. **SummaryIndexEmbeddingRetriever** (`"embedding"`): Returns the top-k nodes whose embeddings are most similar to the query.
3. **SummaryIndexLLMRetriever** (`"llm"`): Asks the LLM to pick the relevant nodes, presenting them in batches.
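The three retrieval strategies can be compared side by side in a small sketch. This only mimics their behavior under stated assumptions: `embed` is a hypothetical bag-of-words stand-in for an embedding model, and `keyword_chooser` is a hypothetical stand-in for the LLM call that selects nodes. None of these functions are LlamaIndex APIs.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_default(query: str, nodes: List[str]) -> List[str]:
    # Mimics the default mode: every node, unscored, in stored order.
    return list(nodes)


def retrieve_embedding(query: str, nodes: List[str], top_k: int = 1) -> List[str]:
    # Mimics the embedding mode: rank nodes by similarity to the query.
    q = embed(query)
    ranked = sorted(nodes, key=lambda n: cosine(q, embed(n)), reverse=True)
    return ranked[:top_k]


def retrieve_llm(query: str, nodes: List[str],
                 choose: Callable[[str, List[str]], List[int]],
                 batch_size: int = 2) -> List[str]:
    # Mimics the LLM mode: show nodes to a chooser in batches and keep the
    # nodes whose in-batch indices it returns.
    selected = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        selected.extend(batch[i] for i in choose(query, batch))
    return selected


def keyword_chooser(query: str, batch: List[str]) -> List[int]:
    # Toy "LLM": select nodes that share at least one word with the query.
    q_words = set(embed(query))
    return [i for i, node in enumerate(batch) if q_words & set(embed(node))]


nodes = [
    "Logs are retained for 30 days.",
    "The office opens at nine.",
    "Backups are retained for one year.",
]
query = "How long are backups retained?"
print(retrieve_default(query, nodes))
print(retrieve_embedding(query, nodes, top_k=1))
print(retrieve_llm(query, nodes, keyword_chooser, batch_size=2))
```

The default mode never drops anything, the embedding mode keeps only the closest match, and the LLM mode keeps whatever the chooser judges relevant, which here includes the log-retention node because it shares words with the query.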

#### Create & Refine Approach:
When a query is made, the retrieved nodes are passed to a response synthesizer that builds the answer with a **create and refine** approach:

1. **Initial Response**: 
   - The system generates a preliminary answer based on the first chunk of text.
   
2. **Refinement**:
   - The initial answer is refined by incorporating additional text chunks as context.
   - The process may involve:
     - **Maintaining** the original answer.
     - **Slightly modifying** the initial response to better suit the query.
     - **Rephrasing** or significantly changing the initial response based on additional information.
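The create-and-refine loop can be sketched with a stubbed LLM. This is an illustration under assumptions, not LlamaIndex's response synthesizer: the prompt templates, `create_and_refine`, and `fake_llm` are hypothetical, and a real synthesizer may pack several chunks into a single prompt.

```python
from typing import Callable, List

# Illustrative prompt templates (assumptions, not LlamaIndex's actual templates).
QA_TEMPLATE = (
    "Context:\n{context}\n"
    "Answer the question using the context.\nQuestion: {query}\nAnswer:"
)
REFINE_TEMPLATE = (
    "Question: {query}\nExisting answer: {answer}\n"
    "New context:\n{context}\n"
    "Refine the existing answer if the new context helps; otherwise repeat it.\nAnswer:"
)


def create_and_refine(query: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """One LLM call per chunk: create from the first chunk, then refine with the rest."""
    if not chunks:
        raise ValueError("need at least one chunk")
    # 1. Initial response, based on the first chunk only.
    answer = llm(QA_TEMPLATE.format(context=chunks[0], query=query))
    # 2. Each later chunk may keep, adjust, or rewrite the current answer.
    for chunk in chunks[1:]:
        answer = llm(REFINE_TEMPLATE.format(query=query, answer=answer, context=chunk))
    return answer


prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    # Records each prompt and returns a numbered label so the call order is visible.
    prompts.append(prompt)
    return f"draft {len(prompts)}"


result = create_and_refine("What is the retention policy?", ["c1", "c2", "c3"], fake_llm)
print(result)
print(len(prompts))
```

The sketch makes the cost of the design explicit: one LLM call per chunk, made sequentially, because each refinement prompt embeds the previous answer.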

#### Benefits of this Approach:
- **Simple retrieval**: Building the index needs no embeddings, and the default retriever needs no similarity search, which suits queries (such as summaries) that should consider every node.
- **Contextual Refinement**: Each chunk gets a chance to update the answer, so information spread across several sections can end up in the final response.

#### Example Workflow:
1. A query is submitted: **"What is the data retention policy?"**
2. The **SummaryIndex** retrieves its nodes, and the LLM drafts a rough answer from the first chunk.
3. Additional nodes provide context (e.g., different sections of a document discussing retention policy specifics).
4. The response is refined based on the new chunks, resulting in a more polished and informative output.
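The workflow above can be traced step by step. In this sketch a rule-based function, `toy_answer` (hypothetical), stands in for the LLM: it keeps only sentences that mention retention, so the trace shows the answer being kept when a chunk is irrelevant and extended when a chunk adds detail. The chunk texts are invented for illustration.

```python
from typing import List, Optional


def toy_answer(answer: Optional[str], chunk: str) -> str:
    """Rule-based stand-in for the LLM (hypothetical): keep sentences about
    retention and ignore everything else."""
    facts = [s.strip() for s in chunk.split(".") if "retain" in s.lower()]
    if not facts:
        # Nothing relevant in this chunk: keep the current answer unchanged.
        return answer or "No retention details found yet."
    addition = ". ".join(facts) + "."
    if answer is None or answer.startswith("No retention"):
        # First relevant chunk: create the initial answer.
        return addition
    # Later relevant chunk: refine by extending the existing answer.
    return answer + " " + addition


def run_workflow(chunks: List[str]) -> List[str]:
    """Return the answer after each chunk, so every refinement step is visible."""
    trace = []
    answer = None
    for chunk in chunks:
        answer = toy_answer(answer, chunk)
        trace.append(answer)
    return trace


chunks = [
    "Section 1. Customer records are retained for five years.",
    "Section 2. Our offices are in Berlin.",
    "Section 3. Deleted accounts are retained for 30 days before purging.",
]
for step, answer in enumerate(run_workflow(chunks), start=1):
    print(step, answer)
```

A real LLM can also rephrase or correct the draft rather than only append to it; the rule-based stand-in covers just the "keep" and "extend" cases.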

This method balances simplicity with adaptability, making the **SummaryIndex** useful for applications where **embedding-based retrieval** is **not necessary**, such as answering questions that draw on a small set of documents as a whole.
