# PDF Resolution & Size Converter

Convert a PDF to a smaller, lower-resolution version by rendering each page as a JPEG at your chosen DPI and quality.
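As a rough guide to what a given DPI means in practice, the sketch below estimates the pixel dimensions of a rendered page. It assumes the standard PDF unit of 1/72 inch per point. The helper name `rendered_size` is hypothetical and not part of this notebook, and the renderer may round slightly differently.

```python
def rendered_size(width_pt, height_pt, dpi):
    """Approximate pixel size of a page rendered at `dpi`.

    PDF page dimensions are expressed in points (1 point = 1/72 inch),
    so the scale factor from points to pixels is dpi / 72.
    """
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


# A US Letter page (612 x 792 pt) at this notebook's default of 120 DPI
print(rendered_size(612, 792, 120))  # (1020, 1320)
```

Halving the DPI cuts the pixel count to about a quarter, which is why small DPI changes have a large effect on file size.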

**Steps:**
1. Run the setup cell below.
2. Set your input/output paths and options in the main cell.
3. Run the main cell to create your optimized PDF.


In [None]:
# Package Installation

# Install the dependencies into the environment the notebook kernel runs in,
# so the imports in the main cell can find them.
%pip install --upgrade PyMuPDF Pillow tqdm

print("Setup complete! Ready to convert PDFs.")

In [6]:
import os
import io
import fitz  # PyMuPDF
from tqdm import tqdm
import multiprocessing
from PIL import Image

# Input PDF path
pdf_path = "/home/bacon/PDF/dio.pdf"

# Output path, target resolution (DPI), and JPEG quality
output_pdf_path = os.path.splitext(pdf_path)[0] + "_dwn.pdf"
dpi = 120  # Target DPI; lower values give lower resolution and a smaller file.
jpeg_quality = 60  # Pillow JPEG quality; lower values give a smaller file (values above 95 are discouraged).

# Get original file stats
original_file_size = os.path.getsize(pdf_path)
original_doc = fitz.open(pdf_path)
num_pages = original_doc.page_count
original_doc.close()  # Each worker process reopens the file itself

# Use one worker per CPU core by default
num_procs = os.cpu_count() or 1  # Lower this for large files to reduce memory use
print(f"Detected {num_procs} CPU cores for multiprocessing.")
print(f"Input PDF: {pdf_path}")
print(f"Total pages to process: {num_pages}")
print(f"Target DPI for downsampling: {dpi}")
print(f"JPEG quality: {jpeg_quality}")
print(f"Output PDF will be saved as: {output_pdf_path}\n")

def process_chunk_mp(args):
    start, end, pdf_path, dpi, jpeg_quality = args
    print(f"Process handling pages {start} to {end-1}")
    doc = fitz.open(pdf_path)
    chunk_results = []
    for page_num in range(start, end):
        page = doc.load_page(page_num)
        # Render the page at the target DPI (RGB without alpha by default)
        downsampled_pix = page.get_pixmap(dpi=dpi)
        # Convert the pixmap to a PIL Image; the size is passed as a (width, height) tuple
        img = Image.frombytes("RGB", (downsampled_pix.width, downsampled_pix.height), downsampled_pix.samples)
        # Save as JPEG to bytes
        img_bytes_io = io.BytesIO()
        img.save(img_bytes_io, format="JPEG", quality=jpeg_quality)
        img_bytes = img_bytes_io.getvalue()
        chunk_results.append((page_num, img_bytes, downsampled_pix.width, downsampled_pix.height))
    doc.close()
    print(f"Process finished pages {start} to {end-1}")
    return chunk_results

# Split pages into chunks based on number of processes
chunk_size = (num_pages + num_procs - 1) // num_procs
chunks = [(i*chunk_size, min((i+1)*chunk_size, num_pages), pdf_path, dpi, jpeg_quality) for i in range(num_procs) if i*chunk_size < num_pages]

print(f"Splitting {num_pages} pages into {len(chunks)} chunks for multiprocessing.\n")

with multiprocessing.Pool(processes=num_procs) as pool:
    results = []
    for chunk_result in tqdm(pool.imap_unordered(process_chunk_mp, chunks), total=len(chunks), desc="Process chunks"):
        results.extend(chunk_result)

# Sort results by page number to preserve order
results.sort(key=lambda x: x[0])

print("\nAssembling downsampled pages into new PDF (multiprocessing, JPEG)...")
new_doc = fitz.open()
for idx, (page_num, img_bytes, width, height) in enumerate(tqdm(results, desc="Writing pages")):
    img_stream = io.BytesIO(img_bytes)
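    # Page sizes are in points (1/72 inch), so convert from pixels to keep the original physical page size
    new_page_rect = fitz.Rect(0, 0, width * 72 / dpi, height * 72 / dpi)
    new_page = new_doc.new_page(width=new_page_rect.width, height=new_page_rect.height)
    new_page.insert_image(new_page_rect, stream=img_stream)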
    if (idx+1) % 10 == 0 or (idx+1) == len(results):
        print(f"  Written {idx+1}/{len(results)} pages...")

new_doc.save(output_pdf_path)
new_doc.close()

new_file_size = os.path.getsize(output_pdf_path)

print("\nPDF processing complete (multiprocessing, JPEG).")
print(f"Original file size: {original_file_size/1024/1024:.2f} MB")
print(f"Downsampled file size: {new_file_size/1024/1024:.2f} MB")
print(f"Downsampled file saved at: {output_pdf_path}")


Detected 28 CPU cores for multiprocessing.
Input PDF: /home/bacon/PDF/dio.pdf
Total pages to process: 432
Target DPI for downsampling: 120
JPEG quality: 60
Output PDF will be saved as: /home/bacon/PDF/dio_dwn.pdf

Splitting 432 pages into 27 chunks for multiprocessing.

Process handling pages 0 to 15

Process chunks:   0%|          | 0/27 [00:00<?, ?it/s]

Process handling pages 16 to 31
Process handling pages 32 to 47
Process handling pages 64 to 79
Process handling pages 96 to 111
Process handling pages 48 to 63
Process handling pages 80 to 95
Process handling pages 160 to 175
Process handling pages 128 to 143
Process handling pages 176 to 191
Process handling pages 144 to 159
Process handling pages 224 to 239
Process handling pages 208 to 223
Process handling pages 240 to 255
Process handling pages 112 to 127
Process handling pages 256 to 271
Process handling pages 192 to 207
Process handling pages 336 to 351
Process handling pages 272 to 287
Process handling pages 304 to 319
Process handling pages 288 to 303
Process handling pages 320 to 335
Process handling pages 352 to 367
Process handling pages 368 to 383
Process handling pages 416 to 431
Process handling pages 384 to 399
Process handling pages 400 to 415

Process finished pages 352 to 367


Process chunks:   4%|▎         | 1/27 [00:01<00:48,  1.88s/it]

Process finished pages 256 to 271
Process finished pages 400 to 415


Process chunks:   7%|▋         | 2/27 [00:02<00:21,  1.16it/s]

Process finished pages 288 to 303
Process finished pages 304 to 319
Process finished pages 240 to 255
Process finished pages 272 to 287
Process finished pages 384 to 399


Process chunks:  22%|██▏       | 6/27 [00:02<00:04,  4.50it/s]

Process finished pages 416 to 431


Process chunks:  33%|███▎      | 9/27 [00:02<00:02,  7.12it/s]

Process finished pages 160 to 175
Process finished pages 208 to 223
Process finished pages 144 to 159
Process finished pages 224 to 239


Process chunks:  44%|████▍     | 12/27 [00:02<00:01, 10.03it/s]

Process finished pages 192 to 207
Process finished pages 176 to 191
Process finished pages 336 to 351


Process chunks:  59%|█████▉    | 16/27 [00:02<00:00, 13.95it/s]

Process finished pages 320 to 335
Process finished pages 80 to 95
Process finished pages 64 to 79
Process finished pages 48 to 63
Process finished pages 96 to 111


Process chunks:  70%|███████   | 19/27 [00:02<00:00, 12.48it/s]

Process finished pages 128 to 143
Process finished pages 32 to 47
Process finished pages 112 to 127
Process finished pages 0 to 15


Process chunks:  85%|████████▌ | 23/27 [00:02<00:00, 16.36it/s]

Process finished pages 368 to 383
Process finished pages 16 to 31


Process chunks: 100%|██████████| 27/27 [00:03<00:00,  8.63it/s]



Assembling downsampled pages into new PDF (multiprocessing, JPEG)...


Writing pages:  22%|██▏       | 93/432 [00:00<00:00, 920.20it/s]

  Written 10/432 pages...
  Written 20/432 pages...
  Written 30/432 pages...
  Written 40/432 pages...
  Written 50/432 pages...
  Written 60/432 pages...
  Written 70/432 pages...
  Written 80/432 pages...
  Written 90/432 pages...
  Written 100/432 pages...
  Written 110/432 pages...
  Written 120/432 pages...
  Written 130/432 pages...
  Written 140/432 pages...
  Written 150/432 pages...
  Written 160/432 pages...
  Written 170/432 pages...
  Written 180/432 pages...
  Written 190/432 pages...


Writing pages:  44%|████▍     | 190/432 [00:00<00:00, 948.87it/s]

  Written 200/432 pages...


Writing pages:  66%|██████▌   | 285/432 [00:00<00:00, 848.43it/s]

  Written 210/432 pages...
  Written 220/432 pages...
  Written 230/432 pages...
  Written 240/432 pages...
  Written 250/432 pages...
  Written 260/432 pages...
  Written 270/432 pages...
  Written 280/432 pages...
  Written 290/432 pages...
  Written 300/432 pages...
  Written 310/432 pages...
  Written 320/432 pages...
  Written 330/432 pages...
  Written 340/432 pages...
  Written 350/432 pages...
  Written 360/432 pages...
  Written 370/432 pages...


Writing pages: 100%|██████████| 432/432 [00:00<00:00, 837.14it/s]


  Written 380/432 pages...
  Written 390/432 pages...
  Written 400/432 pages...
  Written 410/432 pages...
  Written 420/432 pages...
  Written 430/432 pages...
  Written 432/432 pages...

PDF processing complete (multiprocessing, JPEG).
Original file size: 260.30 MB
Downsampled file size: 65.93 MB
Downsampled file saved at: /home/bacon/PDF/dio_dwn.pdf
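
The figures in the run above can be cross-checked with a little arithmetic. The sketch below reproduces the ceiling-division chunking from the main cell and shows why 28 cores yield only 27 chunks for 432 pages, then computes the size reduction from the two reported file sizes. The helper names `split_pages` and `reduction_percent` are hypothetical and introduced here for illustration only.

```python
def split_pages(num_pages, num_procs):
    """Split page indices into (start, end) half-open ranges.

    Uses the same ceiling division as the main cell, so the chunk
    count can be lower than the number of processes.
    """
    if num_pages <= 0 or num_procs <= 0:
        return []
    chunk_size = (num_pages + num_procs - 1) // num_procs  # ceil(num_pages / num_procs)
    return [
        (start, min(start + chunk_size, num_pages))
        for start in range(0, num_pages, chunk_size)
    ]


def reduction_percent(original_mb, new_mb):
    """Percentage by which the file shrank."""
    return (1 - new_mb / original_mb) * 100


chunks = split_pages(432, 28)
print(len(chunks), chunks[0], chunks[-1])  # 27 (0, 16) (416, 432)
print(f"{reduction_percent(260.30, 65.93):.1f}% smaller")  # 74.7% smaller
```

With 432 pages and 28 processes the chunk size rounds up to 16 pages, and 432 / 16 is exactly 27, so one worker in the pool stays idle.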
