<a href="https://colab.research.google.com/github/olga-terekhova/pdf-utilities/blob/main/MergePDFs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Merge PDFs

## How to use

To **merge** PDF files:  
1) Prepare the PDF files that you want to merge. The utility merges them in alphabetical order of their file names, so rename files if needed.  
2) Upload all PDF files that you want to merge into the root directory of the Files area.  
3) In the [Set parameters](#scrollTo=jAQ3-5A8kCPi&line=2&uniqifier=1) section, set the preferred name for the merged PDF.  
4) In the [Set parameters](#scrollTo=jAQ3-5A8kCPi&line=2&uniqifier=1) section, set the choice between merge and delete to 'merge' (you may use the dropdown field).  
5) Run all cells in the notebook (Runtime > Run all, or Ctrl+F9).  
6) Download the output PDF from the Files area (click Refresh to see the newly created merged file).
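
Step 1 suggests renaming because the utility sorts file names with Python's default string ordering, which compares names character by character rather than numerically. A minimal sketch of that behavior, using made-up file names that are not part of the notebook:

```python
# Hypothetical file names, used only to illustrate the sort order.
names = ["2_chapter.pdf", "10_appendix.pdf", "1_intro.pdf", "Cover.pdf"]

# sorted() and list.sort() compare strings character by character,
# so "10_..." lands before "1_..." because '0' sorts before '_'.
default_order = sorted(names)
print(default_order)

# Zero-padding the numeric prefix makes the string order match the intended order.
padded_names = ["10_appendix.pdf", "02_chapter.pdf", "01_intro.pdf"]
padded_order = sorted(padded_names)
print(padded_order)
```

If the files carry numeric prefixes, padding them to the same width (01, 02, ..., 10) is probably the simplest way to get the intended page order.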

If you need to merge another batch of files, first **delete** the current PDF files (this also removes the previously merged file):  
1) In the [Set parameters](#scrollTo=jAQ3-5A8kCPi&line=2&uniqifier=1) section, set the choice between merge and delete to 'delete' (you may use the dropdown field).  
2) Run all cells in the notebook (Runtime > Run all, or Ctrl+F9).

## Parameterize  

In [138]:
# @title Set parameters

merged_pdf_path = 'merged.pdf' # @param {type:"string"}
merge_or_delete = "merge" # @param ["merge", "delete"]

print(merged_pdf_path)
print(merge_or_delete)

merged.pdf
merge


## Code

### Install, import, initialize  

In [139]:
!pip install -q PyPDF2

In [140]:
import os
import PyPDF2

### Merge all PDF files in Files

In [141]:
def merge_pdfs():
  # create a list of all PDF files in the current directory

  pdf_files = []
  for filename in os.listdir():
    if filename.endswith('.pdf'):
      pdf_files.append(filename)

  print(pdf_files)

  if merged_pdf_path in pdf_files:
    return "File " + merged_pdf_path + " already exists. No action taken. Do you want to delete PDF files first?"

  if len(pdf_files) == 0:
    return "No PDF files found. No action taken."

  # sort pdf_files in alphabetical order (case-sensitive string comparison)

  pdf_files.sort()
  print(pdf_files)

  # merge all the files in the pdf_files list into a single pdf

  merger = PyPDF2.PdfMerger()

  for filename in pdf_files:
    merger.append(filename)

  merger.write(merged_pdf_path)
  merger.close()  # release the file handles held by the merger

  # get the string of filenames in pdf_files
  pdf_files_str = ', \n'.join(pdf_files)

  return "Files merged:\n" + pdf_files_str + " \n\nRefresh the Files area and locate " + merged_pdf_path + "."
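
The listing step in `merge_pdfs()` matches only a lowercase `.pdf` extension and stops when the output file is already present. As a rough sketch of an alternative, the helper below accepts any letter case in the extension and skips a named output file. The name `list_pdfs` and its parameters are hypothetical and not part of this notebook; the demo runs in a temporary directory with made-up file names.

```python
import os
import tempfile

def list_pdfs(directory=".", exclude=None):
    """Return the PDF file names in `directory`, sorted, skipping `exclude`.

    Hypothetical helper, not part of the notebook.
    """
    names = [
        name for name in os.listdir(directory)
        if name.lower().endswith(".pdf") and name != exclude
    ]
    # Sort ignoring case so that 'b.pdf' does not come after 'C.pdf'.
    return sorted(names, key=str.lower)

# Demonstrate on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.pdf", "A.PDF", "notes.txt", "merged.pdf"]:
        open(os.path.join(tmp, name), "wb").close()
    found = list_pdfs(tmp, exclude="merged.pdf")

print(found)
```

Whether skipping the existing output is preferable to the notebook's "no action taken" message is a design choice: skipping lets a rerun proceed, but the rerun then overwrites the earlier merged file.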

### Delete all PDF files from Files

In [142]:
def delete_pdfs():

  # create a list of all PDF files in the current directory

  pdf_files = []
  for filename in os.listdir():
    if filename.endswith('.pdf'):
      pdf_files.append(filename)

  print(pdf_files)

  if len(pdf_files) == 0:
    return "No PDF files found. No action taken."

  # delete all files in the pdf_files

  for filename in pdf_files:
    os.remove(filename)

  pdf_files_str = ', \n'.join(pdf_files)

  return "Files deleted:\n" + pdf_files_str + "\n\nRefresh the Files area and check that it has no PDF files."
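
Because `delete_pdfs()` removes every `.pdf` file in the current directory, including a previously merged output, it can help to preview the list first. The sketch below is one possible way to do that; `delete_pdfs_in` and its `dry_run` flag are hypothetical names, not part of this notebook, and the demo works in a temporary directory with made-up files.

```python
import os
import tempfile

def delete_pdfs_in(directory, dry_run=True):
    """Hypothetical variant of delete_pdfs() with a preview mode.

    Returns the sorted names of the matching files; removes them
    only when dry_run is False.
    """
    targets = sorted(
        name for name in os.listdir(directory) if name.endswith(".pdf")
    )
    if not dry_run:
        for name in targets:
            os.remove(os.path.join(directory, name))
    return targets

with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.pdf", "b.pdf", "keep.txt"]:
        open(os.path.join(tmp, name), "wb").close()

    preview = delete_pdfs_in(tmp)                 # lists the files, removes nothing
    after_preview = sorted(os.listdir(tmp))
    deleted = delete_pdfs_in(tmp, dry_run=False)  # actually removes the PDFs
    remaining = os.listdir(tmp)

print(preview, remaining)
```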

### Run the chosen option

In [143]:
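if merge_or_delete == "merge":
  result = merge_pdfs()
elif merge_or_delete == "delete":
  result = delete_pdfs()
else:
  # guard against an unexpected value so that `result` is always defined
  result = "Unknown option '" + merge_or_delete + "'. No action taken."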

['1GMWC_Instructor-562-coins.pdf', '2GMWC_Instructor-562-notes-1.pdf', '2GMWC_Instructor-562-notes-2.pdf', 'merged.pdf']


## Result

In [144]:
print(result)

File merged.pdf already exists. No action taken. Do you want to delete PDF files first?
