## Metadata Extraction with AWS Bedrock and LlamaIndex

This guide shows how to use [LlamaIndex metadata extractors](https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules/#metadata-extractors) with [AWS Bedrock](https://aws.amazon.com/bedrock/) LLMs to improve retrieval in Retrieval-Augmented Generation (RAG) systems. The extractors attach structured metadata to document chunks, which gives the retriever more context to match against and can lead to more accurate, contextually relevant responses.

### Available Metadata Extractors

LlamaIndex provides several metadata extraction modules that can be used with AWS Bedrock LLMs:

- **`SummaryExtractor`** - Automatically generates concise summaries for document nodes/chunks, providing quick content overviews for better retrieval relevance
- **`QuestionsAnsweredExtractor`** - Identifies and extracts potential questions that each node/chunk can answer, enabling [question-based retrieval strategies](https://docs.llamaindex.ai/en/stable/understanding/querying/querying/)
- **`TitleExtractor`** - Creates descriptive titles for document chunks based on their content context, improving document organization and navigation
- **`EntityExtractor`** - Identifies and extracts [named entities](https://en.wikipedia.org/wiki/Named-entity_recognition) (people, places, organizations, concepts) from document content, enabling entity-based search and knowledge graph construction
- **`PydanticProgramExtractor`** - Uses [structured Pydantic models](https://ojitha.github.io/python/ai/langgraph/langchain/chatgpt/2025/09/07/Python_Type_Annotation.html) to extract custom metadata schemas, providing flexible and type-safe metadata extraction

### Benefits of Metadata Extraction

- **Enhanced Retrieval**: Rich metadata improves semantic matching and retrieval accuracy by providing additional context clues
- **Better Organization**: Automated titles and summaries help structure large document collections for easier navigation
- **Query Understanding**: Question extraction enables more intuitive natural language queries and better question-answering capabilities
- **Knowledge Discovery**: Entity extraction reveals relationships and key concepts across documents, enabling semantic search
- **Structured Data**: Pydantic models ensure consistent, validated metadata that can be easily processed and filtered

### Technical Implementation

This notebook demonstrates:
1. **Basic Extractors**: Using built-in `QuestionsAnsweredExtractor` to generate potential questions from document chunks
2. **Custom Pydantic Extractors**: Creating structured metadata extraction using custom schemas and AWS Bedrock LLMs
3. **Performance Comparison**: Comparing retrieval quality between original nodes and metadata-enhanced nodes
4. **Index Integration**: Building vector indices with enhanced metadata for improved semantic search

This approach is particularly valuable for scientific literature, research papers, and knowledge-intensive domains where precise information retrieval is critical for generating accurate, contextually relevant responses.

In [1]:
import json

from llama_index.core import Settings, get_response_synthesizer
from llama_index.embeddings.bedrock import BedrockEmbedding
from llama_index.llms.bedrock_converse import BedrockConverse

# List the embedding models supported by the Bedrock integration
supported_models = BedrockEmbedding.list_supported_models()
print(json.dumps(supported_models, indent=2))

# LLM used for metadata extraction and response synthesis
llm = BedrockConverse(
    model="amazon.nova-pro-v1:0",
    temperature=0,
    max_tokens=3000,
)
Settings.llm = llm

# Basic embedding model setup
embed_model = BedrockEmbedding(model_name="amazon.titan-embed-text-v2:0")
Settings.embed_model = embed_model

{
  "amazon": [
    "amazon.titan-embed-text-v1",
    "amazon.titan-embed-text-v2:0",
    "amazon.titan-embed-g1-text-02"
  ],
  "cohere": [
    "cohere.embed-english-v3",
    "cohere.embed-multilingual-v3"
  ]
}


Helper function:

In [59]:
from IPython.display import Markdown, display

def in_md(md_txt):
    """Render a response as Markdown between simple delimiters."""
    md_formatted_txt = f"""--- Response ---   
    {md_txt}     
    ----
    """
    display(Markdown(md_formatted_txt))

### Node Parser and Metadata Extractors

The `TokenTextSplitter` divides documents into 256-token chunks with 128-token overlap to maintain semantic continuity across boundaries. The `QuestionsAnsweredExtractor` automatically generates 3 potential questions each chunk can answer, enriching the metadata for improved retrieval accuracy.
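
To make the overlap concrete, here is a minimal sketch of a sliding-window splitter. It is not the `TokenTextSplitter` implementation: the function name `chunk_tokens` is hypothetical, and the "tokens" are plain list items rather than tokenizer output. It only illustrates how a 256-token window with a 128-token overlap advances 128 tokens at a time.

```python
def chunk_tokens(tokens, chunk_size=256, chunk_overlap=128):
    """Split a token list into overlapping windows (illustrative only)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the current window already reaches the end of the input
    return chunks


# Toy input: 600 placeholder tokens
tokens = [f"t{i}" for i in range(600)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
print(chunks[1][0])
```

With these settings, the second half of each chunk is repeated as the first half of the next one, so a sentence that straddles a boundary still appears whole in at least one chunk.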

In [2]:
from llama_index.core.node_parser import TokenTextSplitter
from llama_index.core.extractors import QuestionsAnsweredExtractor
from llama_index.core.schema import MetadataMode

# Split into 256-token chunks with a 128-token overlap
node_parse = TokenTextSplitter(
    separator=" ", chunk_size=256, chunk_overlap=128
)

# Generate 3 questions per chunk with the Bedrock LLM
question_extractor = QuestionsAnsweredExtractor(
    questions=3, llm=llm, metadata_mode=MetadataMode.EMBED
)


In [49]:
from llama_index.readers.web import SimpleWebPageReader
reader = SimpleWebPageReader(html_to_text=True)
docs = reader.load_data(urls=["https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/"])

### Nodes

In [50]:
import pprint

orig_nodes = node_parse.get_nodes_from_documents(docs)
# Show the fourth node of the 20:28 slice (index 23) with all metadata
pprint.pprint(orig_nodes[20:28][3].get_content(metadata_mode='all'))

('url: https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/\n'
 '\n'
 '_\n'
 '\n'
 'This autobiography/biography was written at the time of the award and first\n'
 'published in the book series [_Les Prix\n'
 'Nobel_](/nobel_organizations/nobelfoundation/publications/lesprix.html). It\n'
 'was later edited and republished in _[Nobel\n'
 'Lectures](/nobel_organizations/nobelfoundation/publications/lectures/index.html)_.\n'
 'To cite this document, always state the source as shown above.\n'
 '\n'
 '* * *\n'
 '\n'
 '[](/nobel_prizes/physics/laureates/1921/)* Albert Einstein was formally\n'
 'associated with the Institute for Advanced Study located in Princeton, New\n'
 'Jersey.\n'
 '\n'
 'Copyright (C) The Nobel Foundation 1922\n'
 '\n'
 'To cite this section  \n'
 'MLA style: Albert Einstein - Biographical. NobelPrize.org. Nobel Prize\n'
 'Outreach 2025. Mon. 15 Sep 2025.\n'
 '<https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/>\n'
 '\n'
 'Back to top 

In [51]:
nodes_20_28 = orig_nodes[20:28]
nodes_20_28_q = question_extractor(nodes_20_28)

100%|██████████| 8/8 [00:06<00:00,  1.24it/s]


### Question extraction

The code above selects 8 document nodes (the slice `orig_nodes[20:28]`, that is, indices 20 through 27) and applies the question extractor to each chunk. The extractor uses the configured LLM to analyze each node's content and generate 3 questions that the text can answer.

> The output contains `questions_this_excerpt_can_answer` metadata that improves semantic matching during retrieval, helping the vector search find more relevant matches when users ask questions.

In [52]:
pprint.pprint(nodes_20_28_q[3].get_content(metadata_mode='all'), indent=2)

('[Excerpt from document]\n'
 'url: https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/\n'
 'questions_this_excerpt_can_answer: Certainly! Here are three specific '
 'questions that this context can provide unique answers to, along with '
 'higher-level summaries to frame them:\n'
 '\n'
 '### Higher-Level Summaries:\n'
 '1. **Historical Context of the Nobel Prize Announcement**:\n'
 '   The context provides specific details about the announcement and '
 "documentation of Albert Einstein's Nobel Prize in Physics in 1921. It "
 'highlights the original publication in "Les Prix Nobel" and its later '
 'inclusion in "Nobel Lectures."\n'
 '\n'
 '2. **Associations and Affiliations**:\n'
 "   The biographical information specifies Einstein's formal association with "
 'the Institute for Advanced Study in Princeton, New Jersey, at the time of '
 'the award.\n'
 '\n'
 '3. **Citation and Copyright Information**:\n'
 '   The document includes precise citation guidelines in MLA s

## Build indices

Two vector store indices are created for performance comparison:

- `index_0` - Built from original nodes without metadata enhancement
- `index_1` - Metadata-enhanced index combining:
  - `orig_nodes[:20]` - First 20 original nodes (unchanged)
  - `nodes_20_28_q` - Nodes 20 through 27, enhanced with question metadata from `QuestionsAnsweredExtractor`
  - `orig_nodes[28:]` - Remaining original nodes (unchanged)

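Only the nodes in `nodes_20_28_q` carry the `questions_this_excerpt_can_answer` metadata, so any difference between the two indices should show up for queries that match those chunks. Note that the recorded `Response_0` output for the second query below also lists this metadata, which suggests the extractor updated the original nodes in place; in that case `index_0` and `index_1` hold the same enriched nodes and retrieve identical results.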
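
As a rough illustration of why question metadata can help matching, the sketch below prepends a metadata line to a chunk and scores it against a query. Everything here is a stand-in: the real pipeline embeds text with the Titan model, whereas this toy uses bag-of-words cosine similarity, and the helper names (`bag_of_words`, `cosine`, `embed_text`) and sample strings are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Toy 'embedding': lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def embed_text(content, metadata):
    """Prepend 'key: value' metadata lines to the chunk text before embedding."""
    header = "\n".join(f"{k}: {v}" for k, v in metadata.items())
    return f"{header}\n\n{content}" if header else content


# Made-up chunk and question metadata, for illustration only
content = "First published in the book series Les Prix Nobel."
metadata = {"questions_this_excerpt_can_answer": "Where was the biography originally published?"}
query = bag_of_words("Where was the biography originally published?")

plain_score = cosine(query, bag_of_words(embed_text(content, {})))
enriched_score = cosine(query, bag_of_words(embed_text(content, metadata)))
print(f"plain={plain_score:.3f} enriched={enriched_score:.3f}")
```

The enriched chunk scores higher here because the query shares wording with the generated question. With a real embedding model the effect depends on the model and the data, so it is worth measuring rather than assuming.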

In [54]:
from llama_index.core import VectorStoreIndex
from llama_index.core.response.notebook_utils import (
    display_source_node, display_response
)

index_0 = VectorStoreIndex(orig_nodes)
index_1 = VectorStoreIndex(orig_nodes[:20]+nodes_20_28_q+orig_nodes[28:])
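
If an extractor mutates the nodes it receives, an index built afterwards from the original list sees the same enriched nodes, and a before/after comparison loses its baseline. The following is a minimal sketch of one way to guard against that, using a hypothetical `ToyNode` class and a toy `add_questions` function rather than LlamaIndex objects: take a deep copy of the nodes before running the extractor.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a document node: text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def add_questions(nodes):
    """Toy extractor that mutates the nodes it receives, then returns them."""
    for node in nodes:
        node.metadata["questions_this_excerpt_can_answer"] = f"What does '{node.text}' say?"
    return nodes


orig = [ToyNode("chunk A"), ToyNode("chunk B")]

# Snapshot the baseline before extraction so it stays free of extracted metadata
baseline = copy.deepcopy(orig)
enriched = add_questions(orig)

print(enriched[0] is orig[0])   # same object: the original list was modified
print(baseline[0].metadata)     # the snapshot is unaffected
```

Applied to this notebook, the same idea would mean building `index_0` from a snapshot taken before `question_extractor` runs, so that the baseline index never sees the extracted questions.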

Create query engines

In [57]:
# Query engines
query_engine_0 = index_0.as_query_engine(similarity_top_k=1)
query_engine_1 = index_1.as_query_engine(similarity_top_k=1)

### A question the document cannot answer

In [None]:
query_1 = "What is the photoelectric effect?"

In [75]:
response_1_0 = query_engine_0.query(query_1)
response_1_1 = query_engine_1.query(query_1)

--------- Response_0: --------

In [76]:

display_response(response_1_0 
    , source_length=1000
    , show_source=True
    , show_metadata=True)


**`Final Response:`** The context provided does not explicitly mention the photoelectric effect. Therefore, based on the given information, I cannot provide an answer to the query about the photoelectric effect.

---

**`Source Node 1/1`**

**Node ID:** 9af54332-06a2-4152-b896-397cdcea4e6c<br>**Similarity:** 0.1352490042404992<br>**Text:** the Brownian movement of molecules. He investigated the
thermal properties of light with a low radiation density and his observations
laid the foundation of the photon theory of light.

In his early days in Berlin, Einstein postulated that the correct
interpretation of the special theory of relativity must also furnish a theory
of gravitation and in 1916 he published his paper on the general theory of
relativity. During this time he also contributed to the problems of the theory
of radiation and statistical mechanics.

In the 1920s, Einstein embarked on the construction of unified field theories,
although he continued to work on the probabilistic interpretation of quantum
theory, and he persevered with this work in America. He contributed to
statistical mechanics by his development of the quantum theory of a monatomic
gas and he has also accomplished valuable work in connection with atomic
transition probabilities and relativistic cosmology.

After his retirement he continued to wor...<br>

{'9af54332-06a2-4152-b896-397cdcea4e6c': {'url': 'https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/'}}

--------- response_1: --------

In [77]:
display_response(response_1_1 
    , source_length=1000
    , show_source=True
    , show_metadata=True)


**`Final Response:`** The photoelectric effect is not explicitly mentioned in the provided context. However, it can be inferred that Einstein's investigations into the thermal properties of light with a low radiation density and his observations that laid the foundation of the photon theory of light are related to the principles underlying the photoelectric effect. The photoelectric effect demonstrates that light can be thought of as consisting of particles, or photons, which can eject electrons from a material when they strike it. This concept is foundational to the photon theory of light.

---

**`Source Node 1/1`**

**Node ID:** 9af54332-06a2-4152-b896-397cdcea4e6c<br>**Similarity:** 0.1352490042404992<br>**Text:** the Brownian movement of molecules. He investigated the
thermal properties of light with a low radiation density and his observations
laid the foundation of the photon theory of light.

In his early days in Berlin, Einstein postulated that the correct
interpretation of the special theory of relativity must also furnish a theory
of gravitation and in 1916 he published his paper on the general theory of
relativity. During this time he also contributed to the problems of the theory
of radiation and statistical mechanics.

In the 1920s, Einstein embarked on the construction of unified field theories,
although he continued to work on the probabilistic interpretation of quantum
theory, and he persevered with this work in America. He contributed to
statistical mechanics by his development of the quantum theory of a monatomic
gas and he has also accomplished valuable work in connection with atomic
transition probabilities and relativistic cosmology.

After his retirement he continued to wor...<br>

{'9af54332-06a2-4152-b896-397cdcea4e6c': {'url': 'https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/'}}

------------------------------

In [78]:
query_2= "What was the original publication in which Albert Einstein's 1921 Nobel?"

In [79]:
response_2_0 = query_engine_0.query(query_2)
response_2_1 = query_engine_1.query(query_2)

--------- Response_0: --------

In [80]:

display_response(response_2_0 
    , source_length=1000
    , show_source=True
    , show_metadata=True)


**`Final Response:`** Albert Einstein's 1921 Nobel Prize autobiography was first published in the book series "Les Prix Nobel."

---

**`Source Node 1/1`**

**Node ID:** 19fdaaa8-b8c7-444c-bb4b-cb141b590de9<br>**Similarity:** 0.6599053823598307<br>**Text:** _

This autobiography/biography was written at the time of the award and first
published in the book series [_Les Prix
Nobel_](/nobel_organizations/nobelfoundation/publications/lesprix.html). It
was later edited and republished in _[Nobel
Lectures](/nobel_organizations/nobelfoundation/publications/lectures/index.html)_.
To cite this document, always state the source as shown above.

* * *

[](/nobel_prizes/physics/laureates/1921/)* Albert Einstein was formally
associated with the Institute for Advanced Study located in Princeton, New
Jersey.

Copyright (C) The Nobel Foundation 1922

To cite this section  
MLA style: Albert Einstein - Biographical. NobelPrize.org. Nobel Prize
Outreach 2025. Mon. 15 Sep 2025.
<https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/>

Back to top  Back To Top Takes users back to the top of the page

### Nobel Prize announcements<br>

{'19fdaaa8-b8c7-444c-bb4b-cb141b590de9': {'url': 'https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/',
  'questions_this_excerpt_can_answer': 'Certainly! Here are three specific questions that this context can provide unique answers to, along with higher-level summaries to frame them:\n\n### Higher-Level Summaries:\n1. **Historical Context of the Nobel Prize Announcement**:\n   The context provides specific details about the announcement and documentation of Albert Einstein\'s Nobel Prize in Physics in 1921. It highlights the original publication in "Les Prix Nobel" and its later inclusion in "Nobel Lectures."\n\n2. **Associations and Affiliations**:\n   The biographical information specifies Einstein\'s formal association with the Institute for Advanced Study in Princeton, New Jersey, at the time of the award.\n\n3. **Citation and Copyright Information**:\n   The document includes precise citation guidelines in MLA style and specifies the copyright held by The Nobel

--------- response_1: --------

In [81]:
display_response(response_2_1 
    , source_length=1000
    , show_source=True
    , show_metadata=True)


**`Final Response:`** Albert Einstein's 1921 Nobel Prize autobiography was first published in the book series "Les Prix Nobel."

---

**`Source Node 1/1`**

**Node ID:** 19fdaaa8-b8c7-444c-bb4b-cb141b590de9<br>**Similarity:** 0.6599053823598307<br>**Text:** _

This autobiography/biography was written at the time of the award and first
published in the book series [_Les Prix
Nobel_](/nobel_organizations/nobelfoundation/publications/lesprix.html). It
was later edited and republished in _[Nobel
Lectures](/nobel_organizations/nobelfoundation/publications/lectures/index.html)_.
To cite this document, always state the source as shown above.

* * *

[](/nobel_prizes/physics/laureates/1921/)* Albert Einstein was formally
associated with the Institute for Advanced Study located in Princeton, New
Jersey.

Copyright (C) The Nobel Foundation 1922

To cite this section  
MLA style: Albert Einstein - Biographical. NobelPrize.org. Nobel Prize
Outreach 2025. Mon. 15 Sep 2025.
<https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/>

Back to top  Back To Top Takes users back to the top of the page

### Nobel Prize announcements<br>

{'19fdaaa8-b8c7-444c-bb4b-cb141b590de9': {'url': 'https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/',
  'questions_this_excerpt_can_answer': 'Certainly! Here are three specific questions that this context can provide unique answers to, along with higher-level summaries to frame them:\n\n### Higher-Level Summaries:\n1. **Historical Context of the Nobel Prize Announcement**:\n   The context provides specific details about the announcement and documentation of Albert Einstein\'s Nobel Prize in Physics in 1921. It highlights the original publication in "Les Prix Nobel" and its later inclusion in "Nobel Lectures."\n\n2. **Associations and Affiliations**:\n   The biographical information specifies Einstein\'s formal association with the Institute for Advanced Study in Princeton, New Jersey, at the time of the award.\n\n3. **Citation and Copyright Information**:\n   The document includes precise citation guidelines in MLA style and specifies the copyright held by The Nobel

------------------------------

### Question already extracted

In [None]:
query_4 = "Why was Albert Einstein awarded the Nobel Prize in Physics?"

In [None]:
response_4_0 = query_engine_0.query(query_4)
response_4_1 = query_engine_1.query(query_4)

--------- Response_0: --------

In [None]:

display_response(response_4_0 
    , source_length=1000
    , show_source=True
    , show_metadata=True)

--------- response_1: --------

In [None]:
display_response(response_4_1 
    , source_length=1000
    , show_source=True
    , show_metadata=True)

------------------------------

### New Question

In [91]:
query_3 = "In his early days in Berlin, What was Einstein postulated?"

In [92]:
response_3_0 = query_engine_0.query(query_3)
response_3_1 = query_engine_1.query(query_3)

--------- Response_0: --------

In [93]:

display_response(response_3_0 
    , source_length=1000
    , show_source=True
    , show_metadata=True)


**`Final Response:`** In his early days in Berlin, Einstein postulated that the correct interpretation of the special theory of relativity must also furnish a theory of gravitation.

---

**`Source Node 1/1`**

**Node ID:** 7d309b43-b600-4ab7-8c00-46df30c92f58<br>**Similarity:** 0.5993445308060565<br>**Text:** stages on the way to his goal. He regarded his major
achievements as mere stepping-stones for the next advance.

At the start of his scientific work, Einstein realized the inadequacies of
Newtonian mechanics and his special theory of relativity stemmed from an
attempt to reconcile the laws of mechanics with the laws of the
electromagnetic field. He dealt with classical problems of statistical
mechanics and problems in which they were merged with quantum theory: this led
to an explanation of the Brownian movement of molecules. He investigated the
thermal properties of light with a low radiation density and his observations
laid the foundation of the photon theory of light.

In his early days in Berlin, Einstein postulated that the correct
interpretation of the special theory of relativity must also furnish a theory
of gravitation and in 1916 he published his paper on the general theory of
relativity. During this time he also contributed to the problems of the theory
of radiation and ...<br>

{'7d309b43-b600-4ab7-8c00-46df30c92f58': {'url': 'https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/'}}

--------- response_1: --------

In [94]:
display_response(response_3_1 
    , source_length=1000
    , show_source=True
    , show_metadata=True)


**`Final Response:`** In his early days in Berlin, Einstein postulated that the correct interpretation of the special theory of relativity must also furnish a theory of gravitation.

---

**`Source Node 1/1`**

**Node ID:** 7d309b43-b600-4ab7-8c00-46df30c92f58<br>**Similarity:** 0.5993445308060565<br>**Text:** stages on the way to his goal. He regarded his major
achievements as mere stepping-stones for the next advance.

At the start of his scientific work, Einstein realized the inadequacies of
Newtonian mechanics and his special theory of relativity stemmed from an
attempt to reconcile the laws of mechanics with the laws of the
electromagnetic field. He dealt with classical problems of statistical
mechanics and problems in which they were merged with quantum theory: this led
to an explanation of the Brownian movement of molecules. He investigated the
thermal properties of light with a low radiation density and his observations
laid the foundation of the photon theory of light.

In his early days in Berlin, Einstein postulated that the correct
interpretation of the special theory of relativity must also furnish a theory
of gravitation and in 1916 he published his paper on the general theory of
relativity. During this time he also contributed to the problems of the theory
of radiation and ...<br>

{'7d309b43-b600-4ab7-8c00-46df30c92f58': {'url': 'https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/'}}

------------------------------

## Use of Pydantic Model
Define a basic [structured schema](https://ojitha.github.io/python/ai/langgraph/langchain/chatgpt/2025/09/07/Python_Type_Annotation.html) that we want to extract:

In [95]:
from pydantic import BaseModel, Field
class NodeMetadata(BaseModel):
    entities: list[str] = Field(
        ..., description="Unique entities in this text chunk."
    )
    summary: str = Field(
        ..., description="A concise summary of this text chunk"
    )

### Extractor

The extractor combines a Pydantic schema with LlamaIndex's `LLMTextCompletionProgram` to create structured metadata extraction. It uses the Bedrock LLM to analyze document nodes and extract entities and summaries according to the defined `NodeMetadata` model. 

In [101]:
extractor_template = """\
Here is the content of the section:
-------------
{context_str}
-------------
Given the contextual information, extract out a {class_name} object.\
"""

The `extractor_template` above provides the prompt structure: it tells the LLM how to read the document content and which schema object to extract from it.
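
To see what the LLM would receive, the template's two placeholders can be filled in with plain `str.format`. This is only a sketch of the resulting prompt text; how LlamaIndex substitutes the values internally is not shown here, and the sample `context_str` is a sentence taken from the page purely for illustration.

```python
extractor_template = """\
Here is the content of the section:
-------------
{context_str}
-------------
Given the contextual information, extract out a {class_name} object.\
"""

# Fill the placeholders the way a single node would be rendered
prompt = extractor_template.format(
    context_str=(
        "Albert Einstein was formally associated with the Institute for "
        "Advanced Study located in Princeton, New Jersey."
    ),
    class_name="NodeMetadata",
)
print(prompt)
```

The trailing backslashes inside the triple-quoted string suppress the leading and trailing newlines, so the prompt starts directly with "Here is" and ends with the instruction sentence.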

In [118]:
# Pydantic program backed by the Bedrock LLM
from llama_index.core.program import LLMTextCompletionProgram

# Create the program with the Bedrock LLM
bedrock_program = LLMTextCompletionProgram.from_defaults(
    output_cls=NodeMetadata,  # output schema defined above
    prompt_template_str="{input}",
    llm=llm,
    extractor_template=extractor_template,
)


In [119]:
from llama_index.core.extractors import PydanticProgramExtractor

metadata_extractor = PydanticProgramExtractor(
    program=bedrock_program, input_key="input", show_progress=True
)

In [None]:
extract_metadata = metadata_extractor.extract(orig_nodes[0:1])

100%|██████████| 1/1 [00:02<00:00,  2.04s/it]


In [125]:
extract_metadata

[{'entities': ['https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/',
   'Nobel Prizes & laureates',
   'All Nobel Prizes',
   'All Nobel Prizes 2024',
   'Physics prize',
   'Chemistry prize',
   'Medicine prize',
   'Literature prize',
   'Peace prize',
   'Prize in economic sciences',
   'About',
   'About the prize',
   'Alfred'],
  'summary': 'The content is a section from the Nobel Prize website, specifically focusing on the biographical information of Albert Einstein, who won the Nobel Prize in Physics in 1921. It includes navigation links to various Nobel Prizes categories and information about the Nobel Prizes.'}]

### Metadata Nodes

The `metadata_nodes` represent the original document nodes enhanced with structured metadata extracted using the Pydantic model. Unlike the raw extraction dictionary, these nodes retain their original text content while adding the extracted entities and summary to their metadata fields, making them ready for indexing and retrieval with enriched semantic information.
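
The relationship between the raw extraction output and the enriched nodes can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the library's code: `merge_metadata` is a hypothetical helper, nodes are modelled as dicts with `text` and `metadata` keys, and the sample values are made up.

```python
import copy


def merge_metadata(nodes, extracted):
    """Return copies of the nodes with each extracted dict merged into metadata."""
    if len(nodes) != len(extracted):
        raise ValueError("expected one metadata dict per node")
    merged = []
    for node, extra in zip(nodes, extracted):
        new_node = copy.deepcopy(node)      # leave the input node untouched
        new_node["metadata"].update(extra)  # add entities/summary next to existing keys
        merged.append(new_node)
    return merged


# Toy node and toy extraction result (values are illustrative)
nodes = [{
    "text": "Albert Einstein - Biographical",
    "metadata": {"url": "https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/"},
}]
extracted = [{"entities": ["Albert Einstein"], "summary": "Biographical page."}]

toy_metadata_nodes = merge_metadata(nodes, extracted)
print(sorted(toy_metadata_nodes[0]["metadata"]))
```

The node text is unchanged; only the metadata dictionary grows, which is the property that makes the enriched nodes usable for indexing without reprocessing the content.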

In [126]:
metadata_nodes = metadata_extractor.process_nodes(orig_nodes[0:1])

100%|██████████| 1/1 [00:00<00:00,  1.03it/s]


In [133]:
metadata_nodes

[TextNode(id_='cab40088-3371-4bbe-9fde-9e0cb549df07', embedding=None, metadata={'url': 'https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/', 'entities': ['https://www.nobelprize.org/prizes/physics/1921/einstein/biographical/', 'Nobel Prizes & laureates', 'All Nobel Prizes', 'All Nobel Prizes 2024', 'Physics prize', 'Chemistry prize', 'Medicine prize', 'Literature prize', 'Peace prize', 'Prize in economic sciences', 'About', 'About the prize', 'Alfred'], 'summary': 'The content is a section from the Nobel Prize website, specifically focusing on the biographical information of Albert Einstein, who won the 1921 Nobel Prize in Physics. It includes navigation links to various Nobel Prizes categories and information about the prizes.'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='35ceaca2-31bf-471a-92ae-7754022ca516', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'url': 'https://w