## Step 1: List all PDF files in the current directory

We start by listing the PDF files in the current working directory (the folder named `NLP`). This confirms which files will be passed to `docling` for conversion. Only the top level of the folder is searched; PDFs in subfolders are not picked up.

```python
from pathlib import Path

# Define the working directory
base_path = Path('.')  # current folder (assumed to be 'NLP')

# List all PDF files, sorted by name so the order is stable between runs
pdf_files = sorted(base_path.glob('*.pdf'))

# Display the found PDF files
pdf_files
```

```
[PosixPath('02_Neural Networks for NLP.pdf'),
 PosixPath('03_Large Language Models.pdf')]
```
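On POSIX systems (Linux, macOS), `Path.glob('*.pdf')` matches case-sensitively, so a file saved as `Slides.PDF` would not be listed. The following is a minimal sketch of a more tolerant listing, assuming some files might carry an upper-case extension; `list_pdfs` is a hypothetical helper, not part of the original notebook.

```python
from pathlib import Path


def list_pdfs(folder: Path) -> list[Path]:
    """Return regular files directly inside `folder` whose extension is .pdf
    in any letter case, sorted case-insensitively by file name."""
    return sorted(
        (p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"),
        key=lambda p: p.name.lower(),
    )


# Hypothetical usage: list the PDFs in the current folder
pdf_files = list_pdfs(Path("."))
for p in pdf_files:
    print(p.name)
```

Like the original `glob('*.pdf')`, this only looks at the top level of the folder; directories whose names end in `.pdf` are skipped by the `is_file()` check.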

## Step 2: Convert each PDF to Markdown using Docling

For each PDF found in the previous step, we call the `docling` CLI to convert the file to Markdown. The resulting `.md` files are written to a folder named `markdown/` in the current directory, one per PDF and named after the source file (e.g. `02_Neural Networks for NLP.md`).

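```python
import subprocess

# Create the 'markdown' output folder if it doesn't already exist
output_dir = base_path / 'markdown'
output_dir.mkdir(exist_ok=True)

# Convert each PDF with the docling CLI.
# --output expects a directory; docling names the Markdown file after
# the input file's stem, so each result lands at markdown/<stem>.md.
for pdf_path in pdf_files:
    output_path = output_dir / (pdf_path.stem + '.md')
    print(f"Converting: {pdf_path.name} → {output_path.name}")

    subprocess.run(
        ["docling", str(pdf_path), "--to", "md", "--output", str(output_dir)],
        check=True,  # raise CalledProcessError if docling exits with an error
    )
```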


```
Converting: 02_Neural Networks for NLP.pdf → 02_Neural Networks for NLP.md
Converting: 03_Large Language Models.pdf → 03_Large Language Models.md
```
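The loop prints each "Converting" line before `docling` runs, so the printed output alone does not prove that a Markdown file was written. A minimal follow-up check might look like the sketch below, assuming docling saves each result as `<stem>.md` directly inside `markdown/` (its naming when `--output` points at a directory); `missing_markdown` is a hypothetical helper.

```python
from pathlib import Path


def missing_markdown(pdf_files: list[Path], output_dir: Path) -> list[Path]:
    """Return the PDFs whose expected Markdown file (<stem>.md inside
    output_dir) does not exist as a regular file."""
    return [
        pdf for pdf in pdf_files
        if not (output_dir / (pdf.stem + ".md")).is_file()
    ]


# Hypothetical usage, mirroring the paths used in the notebook
base_path = Path(".")
output_dir = base_path / "markdown"
pdf_files = sorted(base_path.glob("*.pdf"))

missing = missing_markdown(pdf_files, output_dir)
if missing:
    print("No Markdown output for:", ", ".join(p.name for p in missing))
else:
    print("Every PDF has a matching .md file in", output_dir)
```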
