```python
from llama_index.core import Document

text = "The quick brown fox jumps over the lazy dog."
doc = Document(
    text=text,
    metadata={"author": "John Doe", "category": "others"},
    id_="1",
)
print(doc)
```

Output:

```
Doc ID: 1
Text: The quick brown fox jumps over the lazy dog.
```


In LlamaIndex, a **Document** is a generic container for data from any source, such as text extracted from a PDF, an API response, or the result of a database query. A Document stores that data together with its metadata and its relationships to other objects, which makes it the basic unit from which indices are built and queried.

### Key Features of a Document in LlamaIndex

1. **Generic Container**:
   - A Document can hold data from diverse sources, such as PDFs, APIs, or databases. This flexibility allows for a wide range of data ingestion scenarios[1][4].

2. **Attributes**:
   - **Text**: The primary content of the Document.
   - **Metadata**: A dictionary of annotations that can be appended to the text, such as file names, timestamps, or other relevant information.
   - **Relationships**: A dictionary containing relationships to other Documents or Nodes, which helps in building a more structured and relational index[4][6].

3. **Multimodal Capabilities**:
   - Documents are primarily designed for text, but they have beta support for storing images, and multimodal support is under active development[4].

4. **Integration with Nodes**:
   - Documents can be parsed into smaller, more granular data entities called Nodes. Nodes represent "chunks" of the source Document and carry metadata and relationship information, which is crucial for precise retrieval operations[4][6].
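
To make the Document-to-Node relationship concrete, here is a minimal, library-independent sketch. It is not how LlamaIndex implements parsing: `ToyDocument`, `ToyNode`, and `split_into_nodes` are hypothetical names, and the fixed-size word chunking is an assumption chosen for brevity. It only illustrates the idea that each chunk inherits the source metadata and records which Document it came from.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a Document: text plus metadata and an ID."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)
    id_: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class ToyNode:
    """Hypothetical stand-in for a Node: one chunk of a source document."""
    text: str
    metadata: Dict[str, str]
    relationships: Dict[str, str]
    id_: str = field(default_factory=lambda: str(uuid.uuid4()))


def split_into_nodes(doc: ToyDocument, words_per_chunk: int = 4) -> List[ToyNode]:
    """Split a document into fixed-size word chunks that inherit its metadata."""
    words = doc.text.split()
    nodes: List[ToyNode] = []
    for start in range(0, len(words), words_per_chunk):
        chunk = " ".join(words[start:start + words_per_chunk])
        nodes.append(
            ToyNode(
                text=chunk,
                metadata=dict(doc.metadata),          # copy, so nodes stay independent
                relationships={"source": doc.id_},    # pointer back to the parent
            )
        )
    # Link neighbouring chunks so adjacent context can be found from any node.
    for prev, nxt in zip(nodes, nodes[1:]):
        prev.relationships["next"] = nxt.id_
        nxt.relationships["previous"] = prev.id_
    return nodes


doc = ToyDocument(
    text="The quick brown fox jumps over the lazy dog.",
    metadata={"author": "John Doe", "category": "others"},
    id_="1",
)
nodes = split_into_nodes(doc, words_per_chunk=4)
for node in nodes:
    print(repr(node.text), node.metadata, node.relationships["source"])
```

Because every chunk carries the parent's metadata and a `source` link, a retrieval step that returns a single chunk can still report which document, author, or category it belongs to.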

### Example Usage

Here is a simple example of how to create and use a Document in LlamaIndex:

```python
from llama_index.core import Document, VectorStoreIndex

# Create a list of text data
text_list = ["This is the first document.", "This is the second document."]

# Create Document objects from the text data
documents = [Document(text=t) for t in text_list]

# Build an index from the Document objects
# (with default settings this calls OpenAI for embeddings, so OPENAI_API_KEY must be set)
index = VectorStoreIndex.from_documents(documents)

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("first document")
print(response)
```
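
For intuition about what the query step does, the following toy sketch ranks the same two texts against the same query using word-count vectors and cosine similarity. This is an assumption-laden simplification, not what `VectorStoreIndex` does internally: the real index uses learned embeddings and passes the retrieved text to an LLM. The helper names `bag_of_words`, `cosine`, and `rank` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Represent a text as lowercase word counts (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, texts: List[str]) -> List[Tuple[float, str]]:
    """Return (score, text) pairs sorted from most to least similar to the query."""
    query_vec = bag_of_words(query)
    scored = [(cosine(query_vec, bag_of_words(t)), t) for t in texts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


text_list = ["This is the first document.", "This is the second document."]
ranked = rank("first document", text_list)
best_score, best_text = ranked[0]
print(best_text)
```

The first text ranks highest because it shares both query words, while the second shares only one.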

### Advanced Usage

For more advanced scenarios, such as handling large datasets or integrating with various data sources, LlamaIndex provides data connectors (Readers) that can automatically convert data into Document objects. For example, using the `SimpleDirectoryReader` to load documents from a directory:

```python
from llama_index.core import SimpleDirectoryReader

# Initialize the reader for the directory
reader = SimpleDirectoryReader(input_dir="data/")

# Load the documents
docs = reader.load_data()

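# Extract each document's text and ID into separate lists.
# Note: id_ is an identifier, not a file name; the reader typically
# stores the file name in doc.metadata["file_name"].
texts = [doc.text for doc in docs]
document_ids = [doc.id_ for doc in docs]

print("Texts:", texts)
print("Document IDs:", document_ids)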
```
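
To show what a data connector does conceptually, here is a simplified standard-library sketch that turns each `.txt` file in a directory into a record with text, metadata, and an ID. It is only a model under stated assumptions: `load_text_files` is a hypothetical helper, the record layout is invented for illustration, and the real `SimpleDirectoryReader` supports many more file types and options.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_text_files(input_dir: str) -> List[Dict[str, Any]]:
    """Read every .txt file in a directory into a dict with text, metadata and an ID."""
    records: List[Dict[str, Any]] = []
    for path in sorted(Path(input_dir).glob("*.txt")):
        records.append(
            {
                "id_": path.name,  # assumption: use the file name as the ID
                "text": path.read_text(encoding="utf-8"),
                "metadata": {
                    "file_name": path.name,
                    "file_size": path.stat().st_size,
                },
            }
        )
    return records


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("First file.", encoding="utf-8")
    Path(tmp, "b.txt").write_text("Second file.", encoding="utf-8")
    loaded = load_text_files(tmp)

names = [record["metadata"]["file_name"] for record in loaded]
print(names)
```

Keeping the file name in metadata, rather than relying on the ID alone, means the origin of each record remains available even if IDs are later regenerated.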

### Conclusion

In summary, a Document in LlamaIndex encapsulates data from any source together with its metadata and relationships. Documents can be created directly or by a Reader, parsed into Nodes, and used to build indices that are then queried, which makes them a basic building block for LLM-based applications[1][4][6].

Citations:
* [1] https://medium.aiplanet.com/advanced-rag-using-llama-index-e06b00dc0ed8?gi=a9d1f4470c17
* [2] https://github.com/run-llama/llama_index/discussions/11970
* [3] https://nanonets.com/blog/llamaindex/
* [4] https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/
* [5] https://www.youtube.com/watch?v=j6dJcODLd_c
* [6] https://ts.llamaindex.ai/api/classes/Document
* [7] https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/schema.py
* [8] https://llamahub.ai/l/readers/llama-index-readers-docstring-walker?from=readers