# Paged PDF Splitter

This notebook shows how to load a PDF file with the `PagedPDFSplitter`, which
uses the [pypdf](https://github.com/mstamy2/PyPDF2) library to read the file.
**Note that the loader both reads the PDF and splits its text into chunks in a single step.**

## Comparison with other PDF readers
Compared with the `unstructured` PDF reader, this one runs locally
and does not require a model; it just extracts the text embedded in the PDF.
This means it will not work for scanned documents or PDFs containing images of text.
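
One rough way to tell in advance whether a PDF is a good fit is to check how much text each page yields. The sketch below is an illustration and not part of `PagedPDFSplitter`: the helper names `pages_without_text` and `extract_page_texts` and the `min_chars` threshold are assumptions, and the second helper assumes a recent `pypdf` release that exposes `PdfReader`.

```python
from typing import List


def pages_without_text(page_texts: List[str], min_chars: int = 20) -> List[int]:
    """Return 1-based numbers of pages whose extracted text is (nearly) empty.

    `min_chars` is an arbitrary illustrative threshold, not a pypdf setting.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]


def extract_page_texts(path: str) -> List[str]:
    """Extract the text layer of every page (assumes `pypdf` is installed)."""
    from pypdf import PdfReader  # imported lazily so the check above has no dependency

    reader = PdfReader(path)
    # extract_text() can come back empty for image-only pages
    return [page.extract_text() or "" for page in reader.pages]
```

If `pages_without_text(extract_page_texts(path))` lists most of the pages, the PDF likely consists of scanned images, and this loader will return little or no text for it.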

In [None]:
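from langchain.document_loaders import PagedPDFSplitter

# chunk_size sets the target size of each chunk
loader = PagedPDFSplitter(chunk_size=250)
# load_and_split returns the text chunks plus one metadata dict per chunk
splits, metadatas = loader.load_and_split(
    "examples/example_data/layout-parser-paper.pdf"
)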
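
To make the relationship between `splits` and `metadatas` concrete, here is a minimal sketch of page-aware chunking. It is not the `PagedPDFSplitter` implementation: the function name `split_with_pages`, the fixed-width slicing, and the `"start-end"` format of the `pages` value are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def split_with_pages(
    page_texts: List[str], chunk_size: int = 250
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Cut the concatenated pages into fixed-size chunks and record page ranges."""
    full_text = "".join(page_texts)

    # page_ends[i] is the offset in full_text just past the end of page i + 1
    page_ends: List[int] = []
    offset = 0
    for text in page_texts:
        offset += len(text)
        page_ends.append(offset)

    def page_of(position: int) -> int:
        """Return the 1-based page number containing the character at position."""
        for number, end in enumerate(page_ends, start=1):
            if position < end:
                return number
        return len(page_ends)

    splits: List[str] = []
    metadatas: List[Dict[str, str]] = []
    for start in range(0, len(full_text), chunk_size):
        chunk = full_text[start : start + chunk_size]
        first = page_of(start)
        last = page_of(start + len(chunk) - 1)
        # Hypothetical format: "3" for one page, "3-4" when a chunk spans pages
        pages = str(first) if first == last else f"{first}-{last}"
        splits.append(chunk)
        metadatas.append({"pages": pages})
    return splits, metadatas


# Two fake pages of 300 and 100 characters, cut into 250-character chunks
demo_splits, demo_metadatas = split_with_pages(["a" * 300, "b" * 100], chunk_size=250)
print([m["pages"] for m in demo_metadatas])  # ['1', '1-2']
```

The second chunk starts on page 1 and ends on page 2, so its metadata records both pages. Keeping one metadata dict per chunk, in the same order as the chunks, is what lets a vector store hand the page information back at query time.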

## Using with document retrieval

An advantage of this approach is that each chunk carries the page numbers it came from, so retrieved documents can be traced back to their pages.

In [None]:
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

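# Index the chunks together with their page metadata
faiss_index = FAISS.from_texts(splits, OpenAIEmbeddings(), metadatas=metadatas)
docs = faiss_index.similarity_search("How will the community be engaged?", k=2)
for doc in docs:
    # Format the page value explicitly so this works whether it is a str or an int
    print(f'{doc.metadata["pages"]}:', doc.page_content)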