In [1]:
from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # PDF path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # output: "### Docling Technical Report[...]"

2025-10-11 14:19:23,186 - INFO - detected formats: [<InputFormat.PDF: 'pdf'>]
2025-10-11 14:19:23,206 - INFO - Going to convert document batch...
2025-10-11 14:19:23,206 - INFO - Initializing pipeline for StandardPdfPipeline with options hash e647edf348883bed75367b22fbe60347
2025-10-11 14:19:23,215 - INFO - Loading plugin 'docling_defaults'
2025-10-11 14:19:23,221 - INFO - Registered picture descriptions: ['vlm', 'api']
2025-10-11 14:19:23,228 - INFO - Loading plugin 'docling_defaults'
2025-10-11 14:19:23,238 - INFO - Registered ocr engines: ['easyocr', 'ocrmac', 'rapidocr', 'tesserocr', 'tesseract']
2025-10-11 14:19:23,811 - INFO - Accelerator device: 'cpu'
2025-10-11 14:19:25,134 - INFO - Accelerator device: 'cpu'
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install 

<!-- image -->

## Docling Technical Report

## Version 1.0

Christoph Auer Maksym Lysak Ahmed Nassar Michele Dolfi Nikolaos Livathinos Panos Vagenas Cesar Berrospi Ramis Matteo Omenetti Fabian Lindlbauer Kasper Dinkla Lokesh Mishra Yusik Kim Shubham Gupta Rafael Teixeira de Lima Valery Weber Lucas Morin Ingmar Meijer Viktor Kuropiatnyk Peter W. J. Staar

AI4K Group, IBM Research Rüschlikon, Switzerland

## Abstract

This technical report introduces Docling, an easy to use, self-contained, MIT-licensed open-source package for PDF document conversion. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and runs efficiently on commodity hardware in a small resource budget. The code interface allows for easy extensibility and addition of new features and models.

## 1 Introduction

Converting PDF documents back into a machine-processable format has been a major challenge for decades due to their huge vari

In [2]:
result = converter.convert('../test.docx')
print(result.document.export_to_markdown())  

2025-10-11 14:20:18,444 - INFO - detected formats: [<InputFormat.DOCX: 'docx'>]
2025-10-11 14:20:18,451 - INFO - Going to convert document batch...
2025-10-11 14:20:18,452 - INFO - Initializing pipeline for SimplePipeline with options hash 995a146ad601044538e6a923bea22f4e
2025-10-11 14:20:18,453 - INFO - Processing document test.docx
2025-10-11 14:20:18,561 - INFO - Finished converting document test.docx in 0.12 sec.


# Steven VanOmmeren

# February 27, 2025

# ADEC7500 Final Paper

# Modern Child Labor Abuse in the United States: is it Right to Work?

# 

## 1 Introduction

In the United States, child workers are granted additional protection under both federal and state legislation. Companies generally may not employ anyone under the age of 14 years old. Children aged 14 and 15 are severely limited in the hours they may work per week while school is in session. 14- through 17-year-olds may not work in certain industries considered dangerous by the Department of Labor. Recent controversial events have sparked debate about the current state of US child labor laws. I review the economic analysis and ethical language surrounding two recent domestic child labor abuse stories. Pro-regulation arguments are more likely to invoke consequentialist ethical frameworks, while those seeking to loosen child labor regulations employ family values and deontological negative rights. An economic formulation of Mill’

In [3]:
result = converter.convert('../test2.pdf')
print(result.document.export_to_markdown())  

2025-10-11 14:21:27,720 - INFO - detected formats: [<InputFormat.PDF: 'pdf'>]
2025-10-11 14:21:27,723 - INFO - Going to convert document batch...
2025-10-11 14:21:27,724 - INFO - Processing document test2.pdf
2025-10-11 14:21:41,428 - INFO - Finished converting document test2.pdf in 13.70 sec.


## Modern Child Labor Abuse in the United States

Steven VanOmmeren February 27, 2025 ADEC7500 Final Presentation Prof. Richard McGowan

## Overview

- The Fair Labor Standards Act
- Why is Child Labor Controversial?
- Recent Domestic Child Labor Violations
- Analysis of Recent Child Labor Violations
- Proposed Solution

## The Fair Labor Standards Act (FLSA)

- Child labor saw a large decrease in the early 20th century:
- 1900: 20% of children aged 10 to 15 were employed
- 1930: 5% of children aged 10 to 15 were employed
- Decrease is attributable to:
- Enforcement of compulsory education laws
- Industrialization reduces need for unskilled labor
- Mass immigration
- Improved standards of living

## The Fair Labor Standards Act (FLSA) (cont.)

- FLSA signed in 1938
- New federal laws:
- Under 14: generally illegal to work
- Under 18: no hazardous employment
- 14-15: extra restrictions to prevent loss of education/health
- Federal minimum wage
- Required overtime payments for hourly emp

In [11]:
import logging
import os
import time
from pathlib import Path
from typing import Iterable
from docling.datamodel.base_models import ConversionStatus
from docling.datamodel.document import ConversionResult
from docling.datamodel.settings import settings
from docling.document_converter import DocumentConverter

In [12]:
_log = logging.getLogger(__name__)

os.chdir(r"C:\Users\Steven\Documents\Python\super-search")

In [17]:
def export_documents(
    conv_results: Iterable[ConversionResult],
    output_dir: Path,
):
    output_dir.mkdir(parents=True, exist_ok=True)

    success_count = 0
    failure_count = 0
    partial_success_count = 0

    for conv_res in conv_results:
        if conv_res.status == ConversionStatus.SUCCESS:
            success_count += 1
            doc_filename = conv_res.input.file.stem

            # Export Docling document format to text:
            with (output_dir / f"{doc_filename}.txt").open("w", encoding='utf-8') as fp:
                fp.write(conv_res.document.export_to_markdown(strict_text=True))

        elif conv_res.status == ConversionStatus.PARTIAL_SUCCESS:
            _log.info(
                f"Document {conv_res.input.file} was partially converted with the following errors:"
            )
            for item in conv_res.errors:
                _log.info(f"\t{item.error_message}")
            partial_success_count += 1
        else:
            _log.info(f"Document {conv_res.input.file} failed to convert.")
            failure_count += 1

    _log.info(
        f"Processed {success_count + partial_success_count + failure_count} docs, "
        f"of which {failure_count} failed "
        f"and {partial_success_count} were partially converted."
    )
    return success_count, partial_success_count, failure_count
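
`export_documents` only writes a `.txt` file for full successes; partially converted documents are counted and their errors logged, but they produce no output file, and any other status falls through to the failure branch. That branching can be checked without loading any models. A minimal sketch, using a hypothetical `Status` enum and `FakeResult` dataclass as stand-ins for docling's `ConversionStatus` and `ConversionResult` (they are not the library's types):

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Status(Enum):
    # Hypothetical stand-in for docling's ConversionStatus (not the real enum).
    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    FAILURE = "failure"
    SKIPPED = "skipped"


@dataclass
class FakeResult:
    # Hypothetical stand-in for ConversionResult: only a file stem and a status.
    stem: str
    status: Status


def tally(results: Iterable[FakeResult]) -> tuple[int, int, int, list[str]]:
    """Mirror the branches of export_documents and report which stems get a .txt file."""
    success = partial = failure = 0
    written: list[str] = []
    for r in results:
        if r.status is Status.SUCCESS:
            success += 1
            written.append(r.stem)  # only full successes are written to disk
        elif r.status is Status.PARTIAL_SUCCESS:
            partial += 1  # errors are logged, but no output file is produced
        else:
            failure += 1  # every other status lands in the failure branch
    return success, partial, failure, written


results = [
    FakeResult("a", Status.SUCCESS),
    FakeResult("b", Status.PARTIAL_SUCCESS),
    FakeResult("c", Status.FAILURE),
    FakeResult("d", Status.SKIPPED),
]
print(tally(results))  # (1, 1, 2, ['a'])
```

If partially converted documents should still yield a text file, the export call could be shared by both branches; whether partial output is acceptable depends on what consumes the files.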

In [18]:
def main():
    logging.basicConfig(level=logging.INFO)

    input_doc_paths = [
        Path("./data/tests/32165.pdf"),
        Path("./data/tests/32189.pdf"),
        Path("./data/tests/32267.pdf"),
        Path("./data/tests/32286.pdf"),
    ]

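    # To convert in-memory bytes instead of files, wrap them in a DocumentStream
    # (needs: from io import BytesIO; from docling.datamodel.base_models import DocumentStream):
    # buf = BytesIO(Path("./test/data/2206.01062.pdf").read_bytes())
    # input_doc_paths = [DocumentStream(name="my_doc.pdf", stream=buf)]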

    # # Turn on inline debug visualizations:
    # settings.debug.visualize_layout = True
    # settings.debug.visualize_ocr = True
    # settings.debug.visualize_tables = True
    # settings.debug.visualize_cells = True

    doc_converter = DocumentConverter()

    start_time = time.time()

    conv_results = doc_converter.convert_all(
        input_doc_paths,
        raises_on_error=False,  # to let conversion run through all and examine results at the end
    )
    success_count, partial_success_count, failure_count = export_documents(
        conv_results, output_dir=Path("scratch")
    )

    elapsed = time.time() - start_time

    _log.info(f"Document conversion complete in {elapsed:.2f} seconds.")

    if failure_count > 0:
        raise RuntimeError(
            f"Failed to convert {failure_count} of {len(input_doc_paths)} documents."
        )
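
The run in the next cell logs between 42 and 144 seconds per PDF, so restarting a large batch from scratch after an interruption is expensive. A minimal sketch of discovering inputs by glob and skipping documents that already have output, assuming all inputs sit in one directory and that a document counts as done once `export_documents` has written `<stem>.txt` to the output folder; `pending_inputs` is a hypothetical helper, not part of docling:

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def pending_inputs(input_dir: Path, output_dir: Path, pattern: str = "*.pdf") -> list[Path]:
    """Return files matching pattern that have no <stem>.txt in output_dir yet."""
    # sorted() keeps the processing order stable between runs
    candidates = sorted(input_dir.glob(pattern))
    return [p for p in candidates if not (output_dir / f"{p.stem}.txt").exists()]


# Demo on a throwaway directory tree that mimics ./data/tests and ./scratch.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp, "data")
    out_dir = Path(tmp, "scratch")
    data_dir.mkdir()
    out_dir.mkdir()
    for name in ("32165.pdf", "32189.pdf", "notes.txt"):
        (data_dir / name).touch()
    (out_dir / "32165.txt").write_text("already converted", encoding="utf-8")
    print([p.name for p in pending_inputs(data_dir, out_dir)])  # ['32189.pdf']
```

In `main()`, the hard-coded list could then become `pending_inputs(Path("./data/tests"), Path("scratch"))`. Because the check keys on the file stem alone, two inputs with the same stem in different folders would still collide, just as they do in `export_documents`.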

In [19]:
main()

INFO:docling.document_converter:Going to convert document batch...
INFO:docling.utils.accelerator_utils:Accelerator device: 'cpu'
INFO:docling.utils.accelerator_utils:Accelerator device: 'cpu'
INFO:docling.utils.accelerator_utils:Accelerator device: 'cpu'
INFO:docling.pipeline.base_pipeline:Processing document 32165.pdf
INFO:docling.document_converter:Finished converting document 32165.pdf in 42.09 sec.
INFO:docling.pipeline.base_pipeline:Processing document 32189.pdf
INFO:docling.document_converter:Finished converting document 32189.pdf in 66.14 sec.
INFO:docling.document_converter:Going to convert document batch...
INFO:docling.pipeline.base_pipeline:Processing document 32267.pdf
INFO:docling.document_converter:Finished converting document 32267.pdf in 97.86 sec.
INFO:docling.pipeline.base_pipeline:Processing document 32286.pdf
INFO:docling.document_converter:Finished converting document 32286.pdf in 144.11 sec.
INFO:__main__:Processed 4 docs, of which 0 failed and 0 were partially c
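
The per-document durations in this notebook exist only as log messages. A minimal sketch that collects them from captured log text, assuming the "Finished converting document <name> in <seconds> sec." wording printed above stays unchanged (it is log output, not a stable API); `conversion_times` is a hypothetical helper, and if a file name appears twice only the last value is kept:

```python
from __future__ import annotations

import re

# Assumed message shape, copied from the log lines printed above.
FINISHED = re.compile(
    r"Finished converting document (?P<name>.+?) in (?P<secs>\d+(?:\.\d+)?) sec\."
)


def conversion_times(log_text: str) -> dict[str, float]:
    """Map each document name to the seconds reported on its 'Finished converting' line."""
    # A repeated name overwrites the earlier entry in the dict.
    return {m["name"]: float(m["secs"]) for m in FINISHED.finditer(log_text)}


sample = """\
INFO:docling.document_converter:Finished converting document 32165.pdf in 42.09 sec.
INFO:docling.document_converter:Finished converting document 32189.pdf in 66.14 sec.
2025-10-11 14:21:41,428 - INFO - Finished converting document test2.pdf in 13.70 sec.
"""
times = conversion_times(sample)
print(times)  # {'32165.pdf': 42.09, '32189.pdf': 66.14, 'test2.pdf': 13.7}
print(max(times, key=times.get))  # 32189.pdf
```

The non-greedy name group lets file names containing spaces match, and the pattern accepts both the timestamped format from the first cells and the `INFO:logger:` format produced by `logging.basicConfig`.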