langchain-ai · lnhsingh · Oct 6, 2025 · Oct 1, 2025 · Oct 6, 2025
@@ -67,6 +67,7 @@ The below document loaders allow you to load PDF documents.
 | [Upstage Document Parse Loader](/oss/integrations/document_loaders/upstage) | Load PDF files using UpstageDocumentParseLoader | Package |
 | [Docling](/oss/integrations/document_loaders/docling) | Load PDF files using Docling | Package |
 | [UnDatasIO](/oss/integrations/document_loaders/undatasio) | Load PDF files using UnDatasIO | Package |
+| [OpenDataLoader PDF](/oss/integrations/document_loaders/opendataloader_pdf) | Load PDF files using OpenDataLoader PDF | Package |
 
 
 ### Cloud Providers
@@ -258,6 +259,7 @@ The below document loaders allow you to load data from common data formats.
 <Card title="Notion DB" icon="link" href="/oss/integrations/document_loaders/notion" arrow="true" cta="View guide" />
 <Card title="Nuclia" icon="link" href="/oss/integrations/document_loaders/nuclia" arrow="true" cta="View guide" />
 <Card title="Obsidian" icon="link" href="/oss/integrations/document_loaders/obsidian" arrow="true" cta="View guide" />
+<Card title="OpenDataLoader PDF" icon="link" href="/oss/integrations/document_loaders/opendataloader_pdf" arrow="true" cta="View guide" />
 <Card title="Open Document Format (ODT)" icon="link" href="/oss/integrations/document_loaders/odt" arrow="true" cta="View guide" />
 <Card title="Open City Data" icon="link" href="/oss/integrations/document_loaders/open_city_data" arrow="true" cta="View guide" />
 <Card title="Oracle Autonomous Database" icon="link" href="/oss/integrations/document_loaders/oracleadb_loader" arrow="true" cta="View guide" />

@@ -0,0 +1,67 @@
+---
+title: OpenDataLoader PDF
+---
+
+**Safe, Open, High-Performance — PDF for AI**
+
+[OpenDataLoader PDF](https://github.com/opendataloader-project/opendataloader-pdf) converts PDFs into JSON, Markdown, or HTML, ready to feed into modern AI stacks (LLMs, vector search, and RAG).
+
+It reconstructs document layout (headings, lists, tables, and reading order) so the content is easier to chunk, index, and query.
+Powered by fast, heuristic, rule-based inference, it runs entirely on your local machine and delivers high-throughput processing for large document sets.
+AI-safety is enabled by default and automatically filters likely prompt-injection content embedded in PDFs to reduce downstream risk.
+
+## Overview
+
+### Integration details
+
+| Class | Package | Local | Serializable | JS support |
+| :--- | :--- | :---: | :---: |  :---: |
+| [OpenDataLoader PDF](https://github.com/opendataloader-project/opendataloader-pdf) | [langchain-opendataloader-pdf](https://pypi.org/project/langchain-opendataloader-pdf/) | ✅ | ❌ | ❌ |
+
+### Loader features
+
+| Source | Document Lazy Loading | Native Async Support |
+| :---: | :---: | :---: |
+| OpenDataLoaderPDFLoader | ✅ | ❌ |
+
+The `OpenDataLoaderPDFLoader` component enables you to parse PDFs into structured `Document` objects.
+
+## Requirements
+- Python >= 3.9
+- Java 11 or newer available on the system `PATH`
+- opendataloader-pdf >= 1.1.1
+
+## Installation
+```bash
+pip install -U langchain-opendataloader-pdf
+```
+
+## Quick start
+```python
+from langchain_opendataloader_pdf import OpenDataLoaderPDFLoader
+
+# Each entry in file_path may be a PDF file or a directory of PDFs.
+loader = OpenDataLoaderPDFLoader(
+    file_path=["path/to/document.pdf", "path/to/folder"],
+    format="text",
+)
+documents = loader.load()
+
+for doc in documents:
+    print(doc.metadata, doc.page_content[:80])
+```
+
+## Parameters
+
+| Parameter                | Type                  | Required   | Default      | Description                                                                                                        |
+|--------------------------|-----------------------| ---------- |--------------|--------------------------------------------------------------------------------------------------------------------|
+| `file_path`              | `List[str]`           | ✅ Yes     | —            | One or more PDF file paths or directories to process.                                                              |
+| `format`                 | `str`                 | No         | `None`       | Output format (e.g. `"json"`, `"html"`, `"markdown"`, `"text"`).                                                   |
+| `quiet`                  | `bool`                | No         | `False`      | Suppresses CLI logging output when `True`.                                                                         |
+| `content_safety_off`     | `Optional[List[str]]` | No         | `None`       | List of content safety filters to disable (e.g. `"all"`, `"hidden-text"`, `"off-page"`, `"tiny"`, `"hidden-ocg"`). |
+
+## Additional resources
+
+- [LangChain OpenDataLoader PDF integration GitHub](https://github.com/opendataloader-project/langchain-opendataloader-pdf)
+- [LangChain OpenDataLoader PDF integration PyPI package](https://pypi.org/project/langchain-opendataloader-pdf/)
+- [OpenDataLoader PDF GitHub](https://github.com/opendataloader-project/opendataloader-pdf)
+- [OpenDataLoader PDF Homepage](https://opendataloader.org/)
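
The Requirements section of the new loader page asks for Python >= 3.9 and Java 11 or newer on the system `PATH`. A pre-flight check along the following lines could surface a missing or outdated runtime before the loader is constructed. This is only a sketch: `parse_java_major` and `check_requirements` are hypothetical helper names, not part of `langchain-opendataloader-pdf`, and the version parsing assumes the usual `java -version` output shape.

```python
import re
import shutil
import subprocess
import sys
from typing import List, Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output (hypothetical helper)."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: Java 8 and older report strings such as "1.8.0_281".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def check_requirements(min_python=(3, 9), min_java: int = 11) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    java = shutil.which("java")
    if java is None:
        problems.append("no `java` executable found on PATH")
        return problems
    # `java -version` usually writes to stderr, so inspect both streams.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    major = parse_java_major(result.stderr + result.stdout)
    if major is None:
        problems.append("could not determine the Java version")
    elif major < min_java:
        problems.append(f"Java {min_java}+ is required, found Java {major}")
    return problems
```

Calling `check_requirements()` before building the loader gives a readable list of problems instead of a failure deep inside the PDF conversion step.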
@@ -1990,6 +1990,14 @@ Browse the complete collection of integrations available for Python. LangChain P
   >
     GPT models and comprehensive AI platform.
   </Card>
+
+  <Card
+    title="OpenDataLoader PDF"
+    href="/oss/integrations/providers/opendataloader_pdf"
+    icon="link"
+  >
+    Safe, Open, High-Performance — PDF for AI
+  </Card>
 
   <Card
     title="OpenGradient"

@@ -0,0 +1,51 @@
+---
+title: OpenDataLoader PDF
+---
+
+> **Safe, Open, High-Performance — PDF for AI**
+
+> [OpenDataLoader PDF](https://github.com/opendataloader-project/opendataloader-pdf) converts PDFs into JSON, Markdown, or HTML, ready to feed into modern AI stacks (LLMs, vector search, and RAG).
+> 
+> It reconstructs document layout (headings, lists, tables, and reading order) so the content is easier to chunk, index, and query.
+> Powered by fast, heuristic, rule-based inference, it runs entirely on your local machine and delivers high-throughput processing for large document sets.
+> AI-safety is enabled by default and automatically filters likely prompt-injection content embedded in PDFs to reduce downstream risk.
+
+## Requirements
+- Python >= 3.9
+- Java 11 or newer available on the system `PATH`
+- opendataloader-pdf >= 1.1.1
+
+## Installation
+```bash
+pip install -U langchain-opendataloader-pdf
+```
+
+## Quick start
+```python
+from langchain_opendataloader_pdf import OpenDataLoaderPDFLoader
+
+# Each entry in file_path may be a PDF file or a directory of PDFs.
+loader = OpenDataLoaderPDFLoader(
+    file_path=["path/to/document.pdf", "path/to/folder"],
+    format="text",
+)
+documents = loader.load()
+
+for doc in documents:
+    print(doc.metadata, doc.page_content[:80])
+```
+
+## Parameters
+
+| Parameter                | Type                  | Required   | Default      | Description                                                                                                        |
+|--------------------------|-----------------------| ---------- |--------------|--------------------------------------------------------------------------------------------------------------------|
+| `file_path`              | `List[str]`           | ✅ Yes     | —            | One or more PDF file paths or directories to process.                                                              |
+| `format`                 | `str`                 | No         | `None`       | Output format (e.g. `"json"`, `"html"`, `"markdown"`, `"text"`).                                                   |
+| `quiet`                  | `bool`                | No         | `False`      | Suppresses CLI logging output when `True`.                                                                         |
+| `content_safety_off`     | `Optional[List[str]]` | No         | `None`       | List of content safety filters to disable (e.g. `"all"`, `"hidden-text"`, `"off-page"`, `"tiny"`, `"hidden-ocg"`). |
+
+## Additional resources
+
+- [LangChain OpenDataLoader PDF integration GitHub](https://github.com/opendataloader-project/langchain-opendataloader-pdf)
+- [LangChain OpenDataLoader PDF integration PyPI package](https://pypi.org/project/langchain-opendataloader-pdf/)
+- [OpenDataLoader PDF GitHub](https://github.com/opendataloader-project/opendataloader-pdf)
+- [OpenDataLoader PDF Homepage](https://opendataloader.org/)
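
The quick start in both new pages calls `loader.load()`, which builds the full list of documents up front, while the loader features table marks lazy loading as supported. For large folders, iterating over `lazy_load()` (the standard LangChain loader method) may keep memory use lower. The sketch below assumes only that the loader exposes `lazy_load()`; `take_documents` and `batched_documents` are hypothetical helper names introduced for illustration.

```python
from itertools import islice
from typing import Any, Iterator, List


def take_documents(loader: Any, limit: int) -> List[Any]:
    """Return at most `limit` documents without consuming the rest."""
    return list(islice(loader.lazy_load(), limit))


def batched_documents(loader: Any, batch_size: int) -> Iterator[List[Any]]:
    """Yield documents from `loader.lazy_load()` in lists of up to `batch_size`."""
    iterator = iter(loader.lazy_load())
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With a loader built as in the quick start, `for batch in batched_documents(loader, 32): ...` would hand each batch to an embedding or indexing step as it becomes available, rather than waiting for every PDF to be parsed.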