---------------------------
#### DocumentSummaryIndex
-------------------------

The **DocumentSummaryIndex** is a specialized index in **LlamaIndex** that generates a summary for each document and links that summary to the document's nodes, so retrieval can start from the summaries instead of the full text. Here's an overview of how it operates:

#### Key Features:
- **Summarization**: The core functionality of the **DocumentSummaryIndex** revolves around creating concise summaries for each document. These summaries help in identifying and retrieving relevant information quickly.
  
- **Node Mapping**: Each document summary is mapped to the nodes (text chunks) of its document. Once a summary matches a query, the index can go straight to that document's nodes.
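
The summary-to-node mapping described above can be pictured as a few dictionaries. This is a minimal sketch for intuition only: `ToySummaryStore` and its field names are hypothetical and are not LlamaIndex's internal data structure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToySummaryStore:
    """Hypothetical stand-in for the bookkeeping a summary index needs."""

    summaries: dict[str, str] = field(default_factory=dict)           # doc_id -> summary text
    doc_to_nodes: dict[str, list[str]] = field(default_factory=dict)  # doc_id -> node ids
    node_to_doc: dict[str, str] = field(default_factory=dict)         # node_id -> doc_id

    def add(self, doc_id: str, summary: str, node_ids: list[str]) -> None:
        """Register one document's summary and link it to its nodes."""
        self.summaries[doc_id] = summary
        self.doc_to_nodes[doc_id] = list(node_ids)
        for node_id in node_ids:
            self.node_to_doc[node_id] = doc_id

    def nodes_for(self, doc_id: str) -> list[str]:
        """Return the node ids linked to a document, or an empty list."""
        return self.doc_to_nodes.get(doc_id, [])


store = ToySummaryStore()
store.add("rome", "Overview of ancient Rome.", ["rome-0", "rome-1"])
print(store.nodes_for("rome"))      # ['rome-0', 'rome-1']
print(store.node_to_doc["rome-1"])  # rome
```

Keeping the mapping in both directions means a matched summary leads to its nodes, and any node can be traced back to the document it came from.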

#### Workflow:
1. **Document Ingestion**:
   - When a document is added, it is broken down into nodes (text chunks).
   - A summary is generated for each document (by an LLM, as the sample output later in this section shows), giving a high-level overview of the content.

2. **Indexing**:
   - The summaries are then stored within the index and linked to the relevant nodes.
   - This creates a structure where each summary points to the deeper content, allowing for efficient retrieval without scanning the entire document.

3. **Querying**:
   - During a query, the system first compares the query against the stored summaries instead of the full text, which narrows the search to the most likely relevant documents.
   - Once a relevant summary is identified, the nodes linked to it can be accessed for more detailed information.
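
The three steps above can be sketched in plain Python. This is a toy illustration under loud assumptions: `summarize` keeps only the first sentence as a stand-in for an LLM summary, and `retrieve` ranks by word overlap, which is not how LlamaIndex scores summaries. All function names here are hypothetical.

```python
import re


def chunk(text, size=12):
    """Ingestion: split text into nodes of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def summarize(text):
    """Stand-in for an LLM summary: keep the first sentence only."""
    return re.split(r"(?<=[.!?])\s+", text.strip())[0]


def tokens(text):
    """Lowercased set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_index(docs):
    """Indexing: map each doc_id to (summary, nodes)."""
    return {doc_id: (summarize(text), chunk(text)) for doc_id, text in docs.items()}


def retrieve(toy_index, query, top_k=1):
    """Querying: rank documents by query/summary word overlap, return their nodes."""
    q = tokens(query)
    ranked = sorted(toy_index, key=lambda d: len(q & tokens(toy_index[d][0])), reverse=True)
    return [(d, toy_index[d][1]) for d in ranked[:top_k]]


docs = {
    "rome": "Ancient Rome built roads and aqueducts. The Senate governed the Republic for centuries.",
    "tea": "Green tea is made from unoxidized leaves. It is brewed at a lower temperature than black tea.",
}
toy_index = build_index(docs)
best_id, best_nodes = retrieve(toy_index, "Who built aqueducts in Rome?")[0]
print(best_id)          # rome
print(len(best_nodes))  # 2
```

Only the summaries are compared with the query; the nodes are looked up afterwards through the mapping built at indexing time.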

#### Benefits:
- **Efficient Retrieval**: By relying on summaries, the **DocumentSummaryIndex** reduces the need for extensive text scanning, optimizing performance.
- **Contextual Insights**: The summaries provide a high-level understanding of the document, which aids in quickly pinpointing where the relevant information might be located.
- **Scalability**: Particularly useful when managing large repositories of documents, where full-text search may be too resource-intensive.
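
A rough back-of-the-envelope count makes the retrieval saving concrete. The function below is a hypothetical sketch: the repository size is invented, and it ignores the upfront cost of generating one summary per document.

```python
def texts_touched(num_docs, chunks_per_doc, top_k):
    """Return (full_scan, summary_first) counts of texts touched per query."""
    # Full scan: every chunk of every document is compared with the query.
    full_scan = num_docs * chunks_per_doc
    # Summary first: one summary per document, then the chunks of the top_k matches.
    summary_first = num_docs + top_k * chunks_per_doc
    return full_scan, summary_first


full, summary_first = texts_touched(num_docs=5_000, chunks_per_doc=40, top_k=3)
print(full, summary_first)  # 200000 5120
```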

#### Use Case Example:
Imagine you are managing a legal database containing thousands of contracts. Using the **DocumentSummaryIndex**, each contract can be summarized (e.g., terms, parties involved, key clauses). When a user queries "Termination clause in Vendor Agreement," the system will quickly scan the summaries to find relevant contracts, and the corresponding sections (nodes) can be retrieved for a deeper dive into the content.
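
The contract scenario can be mimicked with a toy ranking over summaries. This sketch assumes three invented contract summaries and uses cosine similarity over word counts as a simple stand-in for embedding similarity; the helper names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercased alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical contract summaries, invented for illustration.
contract_summaries = {
    "vendor-agreement": "Vendor agreement covering supply terms, payment schedule, and termination clause.",
    "office-lease": "Office lease covering rent, maintenance duties, and renewal options.",
    "nda": "Non-disclosure agreement covering confidential information and its permitted use.",
}

query = bag_of_words("Termination clause in Vendor Agreement")
ranked = sorted(
    contract_summaries,
    key=lambda cid: cosine(query, bag_of_words(contract_summaries[cid])),
    reverse=True,
)
print(ranked[0])  # vendor-agreement
```

In a real index, the top-ranked contract's nodes would then be fetched to locate the termination clause itself.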


![image.png](attachment:32b42c5f-bd23-47cf-80c1-70eea7f81100.png)

#### A simple usage model for the DocumentSummaryIndex

In [1]:
from llama_index.core import DocumentSummaryIndex, SimpleDirectoryReader


NumPy compatibility message printed during import (traceback trimmed):
A module that was compiled using NumPy 1.x cannot be run in NumPy 2.1.2 as it may crash.
The easiest solution is to downgrade to 'numpy<2' or to upgrade the affected module.





In [2]:
# Load every file in the local "files" directory as Document objects.
documents = SimpleDirectoryReader("files").load_data()

In [3]:
# Build the index: parse nodes, summarize each document, then embed the summaries.
index = DocumentSummaryIndex.from_documents(
    documents,
    show_progress=True,
)

Parsing nodes:   0%|          | 0/2 [00:00<?, ?it/s]

Summarizing documents:   0%|          | 0/2 [00:00<?, ?it/s]

current doc id: f1e728c7-95ce-4422-be3c-0d0c03eb2201
current doc id: a64688b0-80b0-4c0e-81eb-5d29e231edab


Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

In [4]:
summary1 = index.get_document_summary(documents[0].doc_id)

In [5]:
summary1

"The provided text discusses the significance of ancient Rome within the vast Roman Empire, highlighting its grand architecture, engineering feats, governance structure, military prowess, and cultural influence. It mentions iconic structures like the Colosseum and the Pantheon, the Roman Republic with its Senate and elected officials, the Roman legions' conquests, and the lasting impact of Roman civilization on art, law, and governance in modern societies.\n\nSome questions that this text can answer include:\n- What were some of the notable architectural achievements of ancient Rome?\n- How did the Roman Republic function with its Senate and elected officials?\n- What role did the Roman legions play in expanding Roman territories?\n- In what ways has Roman civilization influenced modern art, law, and governance?"

In [6]:
summary2 = index.get_document_summary(documents[1].doc_id)
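
Once the summaries are in place, a natural next step is to ask the index a question. The helper below is a minimal sketch, not part of the original notebook: `ask` is a hypothetical name, and it assumes an LLM is configured in the same way as when the summaries were generated. It relies only on `as_query_engine()` and `query()`, which LlamaIndex indices expose.

```python
def ask(index, question, response_mode="tree_summarize"):
    """Query an index like the one built above and return the answer as text."""
    # as_query_engine() wires a retriever and a response synthesizer together;
    # response_mode controls how the retrieved nodes are combined into an answer.
    query_engine = index.as_query_engine(response_mode=response_mode)
    return str(query_engine.query(question))
```

For example, `ask(index, "What were some of the notable architectural achievements of ancient Rome?")` reuses one of the questions suggested in the first document's summary.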