<a href="https://colab.research.google.com/github/kilos11/AUTOMATING-STUFF-WITH-PYTHON/blob/main/Project_Combining_Select_Pages_from_Many_PDFs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Combining Select Pages from Many PDFs**#
##Say you have the boring job of merging several dozen PDF documents into a single PDF file. Each of them has a cover sheet as the first page, but you don’t want the cover sheet repeated in the final result. Even though there are lots of free programs for combining PDFs, many of them simply merge entire files together. Let’s write a Python program to customize which pages you want in the combined PDF.

##At a high level, here’s what the program will do:

##- Find all PDF files in the current working directory.
##- Sort the filenames so the PDFs are added in order.
##- Write each page, excluding the first page, of each PDF to the output file.

##In terms of implementation, your code will need to do the following:

##- Call os.listdir() to find all the files in the working directory and remove any non-PDF files.
##- Call Python’s sort() list method to alphabetize the filenames.
##- Create a PdfWriter object for the output PDF.
##- Loop over each PDF file, creating a PdfReader object for it.
##- Loop over each page (except the first) in each PDF file.
##- Add the pages to the output PDF.
##- Write the output PDF to a file named allminutes.pdf.

#**Step 1: Find All PDF Files**#

In [None]:
!pip install PyPDF2

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [None]:
# combinePdfs.py - Combines all the PDFs in the current working directory
# into a single PDF.
import PyPDF2, os

# Get all the PDF filenames.
pdfFiles = []

for filename in os.listdir('.'):
    if filename.endswith('.pdf'):
        pdfFiles.append(filename)
pdfFiles.sort(key = str.lower)
pdfWriter = PyPDF2.PdfWriter()


##After the descriptive comment about what the program does, this code imports the os and PyPDF2 modules. The os.listdir('.') call returns a list of the names of every entry in the current working directory. The code loops over this list and adds only the names that end with the .pdf extension to pdfFiles. Afterward, the list is sorted alphabetically, ignoring case, by passing the key = str.lower keyword argument to sort().

##A PdfWriter object (called PdfFileWriter in older PyPDF2 releases) is created to hold the combined PDF pages.
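
##The file-finding step can also be wrapped in a small function, which makes it easier to try on a test folder. This is only a sketch: the name list_pdfs is hypothetical, and lowercasing the filename before checking the extension is an addition that the code above does not make.

```python
import os


def list_pdfs(directory='.'):
    """Return the PDF filenames in `directory`, sorted case-insensitively."""
    pdf_files = []
    for filename in os.listdir(directory):
        # Lowercase the name before testing so 'REPORT.PDF' also matches
        # (an assumption beyond the original, which checks '.pdf' only).
        if filename.lower().endswith('.pdf'):
            pdf_files.append(filename)
    # Without key=str.lower, names starting with an uppercase letter would
    # sort before all lowercase ones ('B.pdf' before 'a.pdf').
    pdf_files.sort(key=str.lower)
    return pdf_files
```

##Calling list_pdfs('.') should give the same kind of list as pdfFiles above, apart from also matching uppercase extensions.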

#**Step 2: Open Each PDF**#
##For each PDF, the loop opens a filename in read-binary mode by calling open() with 'rb' as the second argument. The open() call returns a file object, which gets passed to PyPDF2.PdfReader() to create a PdfReader object for that PDF file.
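
##To see what read-binary mode provides, the sketch below reads the first few raw bytes of a file and compares them with the `%PDF-` marker that a well-formed PDF normally starts with. The helper name looks_like_pdf is hypothetical, and this check is not part of the original program, which relies on the file extension alone.

```python
def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF header bytes."""
    # 'rb' returns raw bytes; text mode would try to decode them as characters.
    with open(path, 'rb') as pdf_file:
        return pdf_file.read(5) == b'%PDF-'
```

##A check like this could be used to skip files that carry a .pdf extension but are not PDFs before handing them to PyPDF2.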


In [None]:
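import PyPDF2, os

# Loop through all the PDF files found in Step 1.
# (pdfFiles must not be reset here, or the loop would never run.)
for filename in pdfFiles:
    # Open the file in read-binary mode and hand it to a PdfReader.
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfReader(pdfFileObj)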


#**Step 3: Add Each Page**#
##The code inside the for loop copies each Page object individually to the PdfWriter object. Remember, you want to skip the first page. Since PyPDF2 considers 0 to be the first page, your loop should start at 1 and then go up to, but not include, the integer returned by len(pdfReader.pages).

In [None]:
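import PyPDF2, os

# Loop through all the PDF files.
for filename in pdfFiles:
    # Create a reader for this file so each PDF's own pages are copied.
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfReader(pdfFileObj)
    # Loop through all the pages (except the first) and add them.
    for pageNum in range(1, len(pdfReader.pages)):
        pageObj = pdfReader.pages[pageNum]  # PyPDF2 3.x replaces getPage()
        pdfWriter.add_page(pageObj)         # PyPDF2 3.x replaces addPage()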
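
##The page-skipping logic can be isolated in a helper and checked without any real PDF. The sketch below assumes only that the reader exposes a pages sequence and the writer an add_page() method, which is how PdfReader and PdfWriter behave in PyPDF2 3.x. The function name copy_pages_skipping_cover and its skip parameter are hypothetical.

```python
def copy_pages_skipping_cover(reader, writer, skip=1):
    """Copy every page after the first `skip` pages from reader to writer.

    Assumes `reader.pages` is a sequence and `writer.add_page(page)`
    appends one page, as PyPDF2 3.x PdfReader and PdfWriter objects do.
    Returns the number of pages copied.
    """
    copied = 0
    # Page indexes are zero-based, so starting at `skip` leaves out
    # pages 0 through skip - 1 (just the cover sheet when skip is 1).
    for page_num in range(skip, len(reader.pages)):
        writer.add_page(reader.pages[page_num])
        copied += 1
    return copied
```

##Inside the file loop, a call such as copy_pages_skipping_cover(pdfReader, pdfWriter) would stand in for the inner for loop. A PDF that contains only a cover sheet contributes no pages, because range(1, 1) is empty.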

#**Step 4: Save the Results**#
##After these nested for loops are done looping, the pdfWriter variable will contain a PdfWriter object with the pages for all the PDFs combined. The last step is to write this content to a file on the hard drive.

In [None]:
# Save the resulting PDF to a file. This runs once, after both loops
# have finished, so it is not indented inside them.
pdfOutput = open('allminutes.pdf', 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()
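
##Two small refinements to the save step are sketched below. Both helper names, save_pdf and without_output, are hypothetical. The first uses a with statement so the output file is closed even if writing fails. The second addresses a side effect of saving into the working directory: allminutes.pdf also ends in .pdf, so a second run would pick it up as an input unless it is filtered out.

```python
def save_pdf(writer, output_path):
    """Write the writer's pages to `output_path`, closing the file afterward.

    Assumes `writer.write(stream)` accepts a binary file object, as a
    PyPDF2 PdfWriter does.
    """
    # The with statement closes the file even if write() raises an error.
    with open(output_path, 'wb') as pdf_output:
        writer.write(pdf_output)


def without_output(filenames, output_name='allminutes.pdf'):
    """Drop the output file's own name from a list of input filenames."""
    return [name for name in filenames if name != output_name]
```

##With these in place, the file loop could iterate over without_output(pdfFiles), and the final step would become save_pdf(pdfWriter, 'allminutes.pdf').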