# PDF Power Hacks: Everything You Didn’t Know You Could Do with R and Your PDF Files
### 🛡️ DIY in AI: Protect your data. Don’t upload it anywhere.

# 📊 R 🗑️ | Delete Unwanted Pages from PDFs 📄🗑️

## ❓Ever needed to share a PDF, but certain pages should *never* leave your computer? 🙃  
With R, you can filter and clean your PDFs locally — no third-party uploads required.

## 👉 Solution  
🧹 Remove specific pages from a PDF right from your R environment.  
💡 Perfect for protecting sensitive content or trimming down large documents.

## 🔧 How does it work?  
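📄 We generate a sample PDF with numbered pages using `grid` and `pdf()`.  
📖 Read the text of each page with `pdf_text()` from `pdftools`.  
✂️ Define which pages to delete by page number (R counts from 1).  
📥 Re-render the text of the remaining pages into a new, clean PDF. This keeps the text only, not the original layout, fonts, or images.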
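
The page-selection step can be sketched in base R. The helper name `pages_to_keep_idx` is hypothetical and not part of the original code. It uses `setdiff()` rather than negative indexing because `x[-integer(0)]` returns an empty vector in R, so an empty removal list would otherwise drop every page.

```r
# Hypothetical helper (not from the original post): compute which page
# numbers survive after removing `remove` from a document of `total` pages.
pages_to_keep_idx <- function(total, remove) {
  remove <- as.integer(remove)
  # Fail early on page numbers that do not exist in the document
  bad <- remove[remove < 1L | remove > total]
  if (length(bad) > 0) {
    stop("Page(s) out of range: ", paste(bad, collapse = ", "))
  }
  # setdiff() keeps the original order and handles an empty `remove` safely
  setdiff(seq_len(total), remove)
}

pages_to_keep_idx(5, c(2, 4))     # pages 1, 3 and 5 remain
pages_to_keep_idx(5, integer(0))  # nothing removed: all five pages remain
```

Validating the page numbers up front is a design choice for this sketch: a typo such as page 6 in a 5-page file stops with an error instead of being silently ignored.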

## 🔎 Why does it matter?  
🛡️ Avoid leaking sensitive or irrelevant information.  
📉 Shrink document size for easier sharing.  
🚀 Automate repetitive cleanup tasks for better workflows.

## ✨ Real-world example:  
📑 Imagine preparing a report, but pages 2 and 4 contain internal notes.  
🔒 Before sending it out, you clean it with R — safely and efficiently.

## ⚙️ Business impact:  
💼 Protects business data  
📬 Shares only what matters  
⏱️ Saves time in document handling

## 📊 Code summary  
📝 Creates a 5-page numbered PDF  
🗑️ Drops the text of pages 2 and 4  
📄 Writes the remaining pages (1, 3, and 5) to a clean PDF without uploading anything to the internet

🔗[GitHub](https://github.com/jcombari/AI-For-Unstructured-Data/tree/main/PDF%20Power%20Hacks)

## 💭 Thought:  
How do you clean your PDFs before sharing? What repetitive PDF tasks would you automate?

🔑 #RStats #DataScience #Automation #PDFprocessing #DataPrivacy #TechCareers #CareerGrowth #TechForGood

🔁 If you found this post useful, feel free to share it with your network.  
⚠️ Please don’t copy or repost it as your own. Respect original work.

---

# PDF Power Hacks: Everything You Didn’t Know You Could Do with R and Your PDF Files
### 🛡️ DIY in AI: Take care of your data. Don’t upload it anywhere.

# 📊 R 🗑️ | Remove Unnecessary Pages from Your PDFs 📄🗑️

## ❓Have you ever wanted to share a PDF that had pages that should *never* get out? 🙃  
With R you can clean those files locally, without depending on external services.

## 👉 Solution  
🧹 Remove specific pages from a PDF directly from your R environment.  
💡 Ideal for protecting confidential content or reducing document size.

## 🔧 How does it work?  
📄 We generate a sample PDF with numbered pages using `pdf()` and `grid`.  
✂️ We define which pages to remove (by page number).  
📥 We save a new, clean PDF with only the pages you need.

## 🔎 Why does it matter?  
🛡️ Avoids sharing sensitive or irrelevant data.  
📉 Reduces file size so documents are easier to send.  
🚀 Automates repetitive tasks for more efficient workflows.

## ✨ Practical case:  
📑 Imagine you prepared a report and pages 2 and 4 contain internal notes.  
🔒 Before sending it, you clean it with R, quickly and safely.

## ⚙️ Business impact:  
💼 Protects the organization's information  
📬 Shares only what is important  
⏱️ Saves time by automating tasks

## 📊 Code summary  
📝 Creates a PDF with 5 numbered pages  
🗑️ Removes pages 2 and 4  
📄 Saves a clean PDF without uploading it to the cloud

🔗[GitHub](https://github.com/jcombari/AI-For-Unstructured-Data/tree/main/PDF%20Power%20Hacks)

## 💭 Reflection:  
How do you manage your PDFs before sharing them? Which repetitive tasks would you automate to improve your workflow?

🔑 #RStats #DataScience #Automatización #PDFprocessing #CienciaDeDatos #IA #PrivacidadDeDatos #TechCareers #DesarrolloProfesional #TechForGood

🔁 If you found this useful, feel free to share it with your network.  
⚠️ Please do not copy or publish this content as your own. Respect original work.


# Suppress startup messages when loading the pdftools package
suppressPackageStartupMessages(library(pdftools))

# Load grid for drawing text on PDF pages
library(grid)

# Function to create a sample PDF with numbered pages
create_sample_pdf <- function(path, total_pages = 5) {
  suppressWarnings({
    # Open a new PDF device with standard US letter size (8.5 x 11 inches)
    pdf(path, width = 8.5, height = 11)
    
    # Loop through the number of pages and add page number text to each
    # (seq_len() yields an empty loop when total_pages is 0, unlike 1:0)
    for (i in seq_len(total_pages)) {
      grid.newpage()  # Start a new page
      grid.text(paste("Page", i), x = 0.2, y = 0.95, gp = gpar(fontsize = 14))  # Draw page number near top-left
    }
    
    dev.off()  # Close the PDF device to save the file
  })
}

# File path for the sample PDF to be created
sample_pdf_path <- "20250618_sample.pdf"

# Create the sample PDF with default 5 pages
create_sample_pdf(sample_pdf_path)

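# Read the sample PDF and extract the text of each page (one string per page).
# Note: pdf_text() returns text only; layout, fonts, and images are not captured.
suppressWarnings({
  pdf_text_list <- pdf_text(sample_pdf_path)
})

# Define which pages to remove (page numbers start at 1)
pages_to_remove <- c(2, 4)

# Keep the text of the pages not in the removal list.
# setdiff() is used instead of negative indexing so that an empty
# removal list keeps every page rather than dropping them all.
pages_to_keep <- pdf_text_list[setdiff(seq_along(pdf_text_list), pages_to_remove)]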

# Create a new cleaned PDF by re-rendering the text of the pages we kept
suppressWarnings({
  # Open a new PDF device for the cleaned output
  pdf("20250618_cleaned_sample.pdf", width = 8.5, height = 11)
  
  # For each page's text that we want to keep
  for (page_text in pages_to_keep) {
    grid.newpage()  # Start a new page
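    # Write the page text anchored at its top-left corner so multi-line text
    # flows down the page instead of being vertically centered on y = 0.9
    grid.text(trimws(page_text, which = "right"), x = 0.1, y = 0.9, just = c("left", "top"), gp = gpar(fontsize = 12))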
  }
  
  dev.off()  # Close the PDF device to save the cleaned PDF
})
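
The loop above rebuilds each page from extracted text, so the fonts, layout, and images of a real document would not carry over. As a hedged alternative, the sketch below copies the original pages instead, using `pdf_subset()` and `pdf_length()` from the `qpdf` package, which is assumed to be installed. The function name `remove_pdf_pages` and the output file name are illustrative and not part of the original notebook.

```r
# Sketch: drop pages while keeping the original page content intact.
# Assumes the qpdf package is installed; remove_pdf_pages() is a hypothetical name.
remove_pdf_pages <- function(input, output, pages_to_remove) {
  total <- qpdf::pdf_length(input)                  # number of pages in the source
  keep <- setdiff(seq_len(total), pages_to_remove)  # page numbers that survive
  if (length(keep) == 0) stop("Refusing to write a PDF with no pages")
  qpdf::pdf_subset(input, pages = keep, output = output)
}

# Usage against the sample file created earlier (skipped if qpdf or the file is missing)
if (requireNamespace("qpdf", quietly = TRUE) && file.exists("20250618_sample.pdf")) {
  remove_pdf_pages("20250618_sample.pdf", "20250618_subset_sample.pdf", c(2, 4))
  qpdf::pdf_length("20250618_subset_sample.pdf")  # expected: 3 for the 5-page sample
}
```

Everything still runs locally, so the privacy argument of the post is unchanged; the difference is only that pages are copied rather than retyped.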
