# Hands-on 1: Ingestion and Chunking

## Problem

Create a function that meets these requirements:

- Input: the name of the Wikipedia page to ingest, e.g. "Glossary of artificial intelligence".

- Output: a list of name-description concepts.

## Code

In [None]:
# If running on Colab, uncomment the next lines.
# The version specifiers are quoted so the shell does not treat ">" as a redirect.
# %pip install "llama-index>=0.11.20"
# %pip install "llama-index-readers-wikipedia>=0.2.0"
# %pip install "wikipedia>=1.4.0"

In [11]:
# imports
from rich import print as rprint
from llama_index.core.schema import BaseNode, MetadataMode
from llama_index.readers.wikipedia import WikipediaReader
from llama_index.core.node_parser import SentenceSplitter

In [12]:

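def ingest_and_chunk(name_page: str) -> list[BaseNode]:
    """Load a Wikipedia page and split it into small text chunks (nodes)."""
    # Fetch the page content as llama-index Document objects.
    reader = WikipediaReader()
    documents = reader.load_data(pages=[name_page])
    # Target chunks of up to 128 tokens with a 20-token overlap,
    # using blank lines as separators.
    node_parser = SentenceSplitter(
        chunk_size=128,
        chunk_overlap=20,
        separator="\n\n",
        paragraph_separator="\n\n",
    )
    nodes = node_parser.get_nodes_from_documents(documents)
    return nodes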
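
To build intuition for what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified sketch that needs no network access or extra packages. It is not the algorithm `SentenceSplitter` uses: the real splitter counts tokens with a tokenizer and tries to respect sentence and paragraph boundaries. The helper name `sliding_window_chunks` and the use of whitespace-separated words as "tokens" are assumptions made for this illustration only.

```python
def sliding_window_chunks(
    tokens: list[str], chunk_size: int, chunk_overlap: int
) -> list[list[str]]:
    """Split tokens into windows of at most chunk_size items.

    Hypothetical helper for illustration: consecutive windows share
    chunk_overlap items, mimicking the idea (not the implementation)
    of overlapping chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Ten placeholder "tokens", windows of 4 with 1 shared item.
words = [f"w{i}" for i in range(10)]
demo = sliding_window_chunks(words, chunk_size=4, chunk_overlap=1)
for window in demo:
    print(window)
```

Each window starts `chunk_size - chunk_overlap` items after the previous one, so the last item of one window reappears as the first item of the next. A larger overlap preserves more context across chunk boundaries at the cost of producing more chunks.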

In [13]:
name_page = "Glossary of artificial intelligence"

chunks = ingest_and_chunk(name_page)

# Print each chunk's text together with the metadata that is visible to an LLM.
for chunk in chunks:
    rprint(chunk.get_content(
        metadata_mode=MetadataMode.LLM
    ))
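
The problem statement asks for name-description pairs, while the loop above prints raw chunk text. One possible post-processing step is sketched below. It rests on an assumption that may not hold for the real page text: that each glossary entry is a block separated by blank lines, with the term on the first line and its description on the following lines. The helper name `split_entries` and the sample string are hypothetical and exist only for this illustration; inspect the printed chunks before relying on this layout.

```python
def split_entries(text: str) -> list[tuple[str, str]]:
    """Turn blank-line-separated blocks into (name, description) pairs.

    Assumed layout (hypothetical): first non-empty line of a block is
    the term, the remaining lines form the description.
    """
    entries = []
    for block in text.split("\n\n"):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 2:
            continue  # skip headings or blocks that have no description
        name = lines[0]
        description = " ".join(lines[1:])
        entries.append((name, description))
    return entries


# Made-up placeholder text, not taken from the Wikipedia page.
sample = (
    "term one\nDescription of term one.\n\n"
    "Heading only\n\n"
    "term two\nFirst line.\nSecond line."
)
pairs = split_entries(sample)
for name, description in pairs:
    print(f"{name}: {description}")
```

With real data, the text passed in could come from `chunk.get_content(metadata_mode=MetadataMode.NONE)` so that metadata lines are not mistaken for glossary terms.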
