# PDF Conversion in Jupyter Notebook

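Welcome to this Jupyter Notebook! This notebook demonstrates how to build a PDF converter with Python libraries. The converter renders each page of a PDF to an image with PyMuPDF, places that image on a new page with ReportLab, and assembles the pages into the output file with PyPDF2. A short final section runs the conversion on a given input path.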

- Dev by Kao Panboonyuen

In [1]:
import fitz  # PyMuPDF
from reportlab.pdfgen import canvas
from PyPDF2 import PdfReader, PdfWriter
import io
import tempfile
import os

## PDF Conversion Function

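This section contains the code for converting PDF files. PyMuPDF (`fitz`) renders each page to a PNG at the requested DPI, ReportLab draws that image onto a single-page PDF, and `PyPDF2` collects the pages and writes the output file. Because every page is rasterized, text in the output is no longer selectable or searchable.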

In [None]:
def optimize_pdf(input_pdf_path, output_pdf_path, dpi=300):
    """Rasterize every page of input_pdf_path at `dpi` and save the result as a new PDF."""
    zoom = dpi / 72  # PDF page sizes are in points, 72 per inch

    # Create a new PDF with optimized content
    output = PdfWriter()

    # Open the original PDF; the context manager closes it when done
    with fitz.open(input_pdf_path) as pdf_document:
        for page in pdf_document:
            # Render the page at the requested resolution
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))

            # Reserve a temporary file name; the handle is closed before
            # writing so the save also works on Windows
            with tempfile.NamedTemporaryFile(delete=False, suffix='.png') as temp_img_file:
                img_path = temp_img_file.name

            try:
                pix.save(img_path)

                # Convert image to PDF page using ReportLab. The page keeps the
                # original size in points; the image supplies the extra pixels.
                width, height = page.rect.width, page.rect.height
                packet = io.BytesIO()
                can = canvas.Canvas(packet, pagesize=(width, height))
                can.drawImage(img_path, 0, 0, width=width, height=height)
                can.save()
            finally:
                # Remove the temporary image file, even if rendering failed
                os.remove(img_path)

            # Merge image PDF into the new PDF
            packet.seek(0)
            new_pdf = PdfReader(packet)
            output.add_page(new_pdf.pages[0])

    # Write the optimized PDF to output
    with open(output_pdf_path, "wb") as f:
        output.write(f)

    print(f"PDF optimized and saved to {output_pdf_path}")

def main(input_pdf_path, output_pdf_path):
    optimize_pdf(input_pdf_path, output_pdf_path)
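
The `dpi / 72` factor in `optimize_pdf` comes from PDF page sizes being measured in points, 72 per inch. As a rough sketch of what that implies for image size, the helper below estimates the pixel dimensions of a rendered page. Note that `pixmap_size` is a hypothetical name introduced here for illustration; it is not part of the notebook or of PyMuPDF, and the renderer's own rounding may differ by a pixel.

```python
def pixmap_size(width_pt, height_pt, dpi=300):
    """Estimate rendered pixel dimensions for a page measured in points."""
    # 72 points per inch, so pixels = points * dpi / 72
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)

# US Letter is 612 x 792 points (8.5 x 11 inches)
print(pixmap_size(612, 792, dpi=300))  # (2550, 3300)
print(pixmap_size(612, 792, dpi=150))  # (1275, 1650)
```

The pixel count grows with the square of the DPI, so halving `dpi` cuts the amount of image data per page to roughly a quarter.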


## Interactive User Interface

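In this section, we set the input and output file paths and trigger the PDF conversion. The paths are plain variables in the cell below; they could be wired to IPython widgets for a graphical interface, but this cell simply calls `main` directly. Edit the two paths to point at your own files.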

In [None]:
input_pdf_path = 'paper/Panboonyuen_REG_Refined_Generalized_Focal_Loss.pdf'
output_pdf_path = 'Panboonyuen_REG_Refined_Generalized_Focal_Loss_toArxiv.pdf'

main(input_pdf_path, output_pdf_path)
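
Because the paths above are hard-coded, a small pre-flight check can make a wrong path easier to diagnose than a traceback from inside the conversion. The following is only a sketch under that assumption: `check_paths` is a hypothetical helper, not part of the notebook, and the file names in the example call are placeholders.

```python
import os

def check_paths(input_pdf_path, output_pdf_path):
    """Return a list of problems found with the given paths (empty if none)."""
    problems = []
    if not os.path.isfile(input_pdf_path):
        problems.append(f"input file not found: {input_pdf_path}")
    # An output path without a directory part refers to the current directory
    out_dir = os.path.dirname(output_pdf_path) or "."
    if not os.path.isdir(out_dir):
        problems.append(f"output directory does not exist: {out_dir}")
    if os.path.abspath(input_pdf_path) == os.path.abspath(output_pdf_path):
        problems.append("output path would overwrite the input file")
    return problems

# Placeholder paths for illustration only
for problem in check_paths("no_such_input_file.pdf", "out.pdf"):
    print(problem)
```

One way to use it would be to call `main` only when the returned list is empty.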