# PDF to image conversion utility

This notebook provides a utility function that converts each page of a PDF file into a PNG image using the `pdf2image` library.

## Installation instructions

`pdf2image` is a wrapper around the Poppler command-line tools (`pdfinfo`, `pdftoppm`, and `pdftocairo`), so Poppler must be installed before you run this notebook. Run the command for your operating system in a terminal:

### For Linux (Ubuntu/Debian):
```bash
sudo apt-get update
sudo apt-get install poppler-utils
```

### For macOS (Homebrew):
```bash
brew install poppler
```

In [1]:
%pip install pdf2image
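Because `pdf2image` calls Poppler's `pdfinfo` (to count pages) and `pdftoppm` (to render them, unless `use_pdftocairo=True` is passed), a missing Poppler install usually surfaces only when `convert_from_path` is first called. A quick check like the one below can catch that earlier. It is a minimal sketch that assumes the Poppler binaries are on your `PATH` rather than supplied through the `poppler_path` argument; `check_dependencies` is a hypothetical helper name.

```python
import importlib.util
import shutil


def check_dependencies():
    """Report whether pdf2image and the Poppler tools it calls are available.

    Assumes Poppler is found via PATH; if you pass poppler_path to
    pdf2image instead, the pdfinfo/pdftoppm results here do not apply.
    """
    return {
        # pdf2image is importable from the current kernel's environment
        "pdf2image": importlib.util.find_spec("pdf2image") is not None,
        # pdfinfo is used to read the page count
        "pdfinfo": shutil.which("pdfinfo") is not None,
        # pdftoppm is the default Poppler renderer
        "pdftoppm": shutil.which("pdftoppm") is not None,
    }


print(check_dependencies())
```

Any `False` value points to the step above that still needs to be run.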

In [2]:
import os
import multiprocessing
from pdf2image import convert_from_path

In [3]:
def convert_pdf_to_images(pdf_path, output_folder, dpi=300, thread_count=None):
    """
    Convert each page of a PDF file to a PNG image.

    :param pdf_path: Path to the PDF file
    :param output_folder: Folder in which to save the images (created if missing)
    :param dpi: Rendering resolution in dots per inch (default: 300)
    :param thread_count: Number of threads pdf2image may spawn for rendering
        (default: number of CPU cores)
    """
    # Create the output folder if it doesn't exist
    os.makedirs(output_folder, exist_ok=True)

    # Default to one thread per CPU core
    if thread_count is None:
        thread_count = multiprocessing.cpu_count()

    # Render every page; pdf2image returns a list of PIL images held in memory
    images = convert_from_path(
        pdf_path,
        dpi=dpi,
        thread_count=thread_count
    )

    # Zero-pad page numbers (e.g. page_001.png) so file names sort in page order
    width = len(str(len(images)))
    for i, image in enumerate(images, start=1):
        image_path = os.path.join(output_folder, f'page_{i:0{width}d}.png')
        image.save(image_path, 'PNG')

    print(f"Converted {len(images)} pages to images in {output_folder}")
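`convert_from_path` as called above renders every page into memory before anything is saved. At 300 DPI a US Letter page (8.5 × 11 in) becomes a 2550 × 3300 pixel image, roughly 25 MB as uncompressed RGB, so a long document can need several gigabytes. One way to bound memory is to convert a page range at a time with `convert_from_path`'s `first_page` and `last_page` parameters (another is passing `output_folder` so pdf2image writes pages to disk instead of keeping them in memory). The sketch below shows the batching pattern; `page_batches`, `convert_in_batches`, and the default batch size of 20 are hypothetical choices, not part of pdf2image. Rendering and saving are passed in as callables so the batching logic can be exercised without Poppler.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def page_batches(page_count: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (first_page, last_page) ranges covering every page."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, page_count + 1, batch_size):
        yield first, min(first + batch_size - 1, page_count)


def convert_in_batches(
    page_count: int,
    render: Callable[[int, int], Iterable[Any]],
    save: Callable[[Any, int], None],
    batch_size: int = 20,
) -> int:
    """Render and save pages one batch at a time; return the number of pages saved.

    render(first, last) returns the images for that inclusive page range;
    save(image, page_number) writes a single image. Only about one batch of
    images is alive at a time, so peak memory scales with batch_size.
    """
    saved = 0
    for first, last in page_batches(page_count, batch_size):
        for offset, image in enumerate(render(first, last)):
            save(image, first + offset)
            saved += 1
    return saved


# Possible wiring to pdf2image (an assumption, not executed here):
# from pdf2image import convert_from_path, pdfinfo_from_path
# page_count = pdfinfo_from_path(pdf_path)["Pages"]
# render = lambda first, last: convert_from_path(
#     pdf_path, dpi=300, first_page=first, last_page=last)
# save = lambda image, n: image.save(
#     os.path.join(output_folder, f"page_{n}.png"), "PNG")
# convert_in_batches(page_count, render, save, batch_size=20)
```

A smaller batch size lowers peak memory at the cost of starting Poppler more often; a batch size equal to the page count reproduces the single-call behaviour.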

In [6]:
# Adjust these paths to your input PDF and desired output folder
pdf_path = 'input.pdf'
output_folder = 'results'

In [7]:
convert_pdf_to_images(pdf_path, output_folder)

Converted 218 pages to images in results
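To confirm that the pages landed on disk and to iterate over them in order, it helps to sort by the parsed page number rather than by file name, since a plain string sort puts `page_10.png` before `page_2.png` when numbers are not zero-padded. The helper below is a hypothetical sketch that assumes the `page_<n>.png` naming used by `convert_pdf_to_images`.

```python
import os
import re
from typing import List

# Matches names like page_7.png or page_007.png and captures the page number
_PAGE_RE = re.compile(r"^page_(\d+)\.png$")


def list_page_images(folder: str) -> List[str]:
    """Return paths of page_<n>.png files in folder, ordered by page number.

    Sorting on the parsed integer keeps page_2 before page_10 whether or
    not the numbers are zero-padded; other files are ignored.
    """
    pages = []
    for name in os.listdir(folder):
        match = _PAGE_RE.match(name)
        if match:
            pages.append((int(match.group(1)), os.path.join(folder, name)))
    return [path for _, path in sorted(pages)]
```

For example, `len(list_page_images(output_folder))` should match the page count printed by the conversion cell.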
