<a href="https://colab.research.google.com/github/olga-terekhova/pdf-utilities/blob/main/CopyPDF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Copy PDF

## How to use

To **copy** selected pages from a PDF file into a new PDF file:  
1) Prepare the PDF file that you want to copy from.  
2) Upload that PDF file into the root directory of the Files area. Upload one file only, e.g. *input.pdf*.  
3) In the [Set parameters](#scrollTo=XaMoALpy6JHx&line=7&uniqifier=1) section, set the name for the output PDF, e.g. *output.pdf*.  
4) In the [Set parameters](#scrollTo=XaMoALpy6JHx&line=7&uniqifier=1) section, set *'copy'* for the choice between *copy* and *delete* (you may use the dropdown field).  
5) In the [Set parameters](#scrollTo=XaMoALpy6JHx&line=7&uniqifier=1) section, specify the pages to be copied. Use a comma (,) to separate individual pages and a dash (-) for ranges. Spaces are allowed but not needed. E.g. *'1,3'*, *'1,5-7,9'*, or *'4, 2,5-10'* each create one output file that contains only the listed pages, in the listed order, so pages can be dropped or reordered.  
6) Run all cells in the notebook (Runtime - Run all or Ctrl-F9).  
7) Download the output PDF from the Files area (refresh it to see the newly created file).
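
The page-range syntax from step 5 can be illustrated with a small standalone sketch. The helper name `expand_page_range` is hypothetical and is not part of this notebook; it returns 1-based page numbers exactly as the user would read them, whereas the notebook's own `parse_page_range` returns 0-based indexes.

```python
def expand_page_range(page_range):
    """Expand a string such as '1,5-7,9' into a list of 1-based page numbers.

    Hypothetical helper for illustration only; order and duplicates are kept.
    """
    pages = []
    # Spaces are ignored, commas separate the entries
    for part in page_range.replace(' ', '').split(','):
        if '-' in part:
            start, end = map(int, part.split('-'))
            pages.extend(range(start, end + 1))  # both ends are included
        else:
            pages.append(int(part))
    return pages


print(expand_page_range('1,3'))        # [1, 3]
print(expand_page_range('1,5-7,9'))    # [1, 5, 6, 7, 9]
print(expand_page_range('4, 2,5-10'))  # [4, 2, 5, 6, 7, 8, 9, 10]
print(expand_page_range('5,5,1-3'))    # [5, 5, 1, 2, 3]
```

The last call mirrors the default `page_range` value in the Set parameters cell: a page listed twice is copied twice.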


To copy pages from another file, first **delete** the current PDF files, including the previous output:  
1) In the [Set parameters](#scrollTo=XaMoALpy6JHx&line=7&uniqifier=1) section, set 'delete' for the choice between copy or delete (you may use the dropdown field).  
2) Run all cells in the notebook (Runtime - Run all or Ctrl-F9).  

In [15]:
# @title Set parameters

copied_pdf_path = 'output.pdf' # @param {type:"string"}
copy_or_delete = "copy" # @param ["copy", "delete"]
page_range = "5,5,1-3" # @param {type:"string"}

print(copied_pdf_path)
print(copy_or_delete)
print(page_range)

output.pdf
copy
5,5,1-3


## Code (you can collapse this section)

### Install, import, initialize  

In [1]:
!pip install -q PyPDF2


In [3]:
import os
import PyPDF2

### Copy pages from the PDF file

In [5]:
def get_file():
  """
  Get the first PDF file in the current directory.
  Return the file name and a message.
  """
  pdf_files = []
  for filename in os.listdir():
      if filename.endswith('.pdf'):
          pdf_files.append(filename)

  if copied_pdf_path in pdf_files:
    return "", "File " + copied_pdf_path + " already exists. No action taken. Do you want to delete PDF files first?"

  if len(pdf_files) == 0:
    return "", "No PDF files found. No action taken."

  # sort pdf_files in the alphabetical order
  pdf_files.sort()

  # take the first PDF file
  pdf_file = pdf_files[0]
  print(pdf_file)

  return pdf_file, "OK"

In [6]:
def parse_page_range(page_range):
    """
    Parse a string like '2, 5-7,9' into a list of page numbers.
    This list preserves the order for copying.

    :param page_range: String representing the page range. Input by the user.
    Return a list of page numbers (0-based).
    """

    pages = []

    # Remove all spaces from the input string
    page_range = page_range.replace(' ', '')

    # Split the string by commas
    ranges = page_range.split(',')

    for r in ranges:
        if '-' in r:
            start, end = map(int, r.split('-'))
            pages.extend(range(start - 1, end))  # Convert to 0-based index
        else:
            pages.append(int(r) - 1)  # Convert to 0-based index


    return pages
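
`parse_page_range` raises a bare `ValueError` on input such as `'3-'` or `'1,,2'`, turns page `0` into index `-1`, and silently yields nothing for a reversed range such as `'7-5'`. A stricter variant could look like the sketch below. The name `parse_page_range_strict` and its error messages are assumptions for illustration; the rest of the notebook does not call it.

```python
def parse_page_range_strict(page_range):
    """Parse '2, 5-7,9' into 0-based page indexes, rejecting malformed input.

    Hypothetical alternative to parse_page_range; raises ValueError with a
    message that names the offending entry.
    """
    pages = []
    for part in page_range.replace(' ', '').split(','):
        if not part:
            raise ValueError("Empty entry in page range")
        if '-' in part:
            start_text, _, end_text = part.partition('-')
            if not (start_text.isdecimal() and end_text.isdecimal()):
                raise ValueError(f"Malformed range: {part!r}")
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"Invalid range: {part!r}")
            pages.extend(range(start - 1, end))  # convert to 0-based indexes
        else:
            if not part.isdecimal() or int(part) < 1:
                raise ValueError(f"Invalid page number: {part!r}")
            pages.append(int(part) - 1)  # convert to 0-based index
    return pages


print(parse_page_range_strict('5,5,1-3'))  # [4, 4, 0, 1, 2]
```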

In [8]:
def copy_selected_pages(output_pdf, page_range):
    """
    Copy selected pages in the PDF in the user-defined order.

    :param output_pdf: Path to output PDF. Input by the user.
    :param page_range: Pages to copy in string format (e.g., '2,5-7,9'). Input by the user.

    Return a message.
    """

    # Get the input PDF file (the alphabetically first PDF file in the current directory)
    input_pdf, input_pdf_response = get_file()
    if input_pdf == "":  # no file to process
        print(input_pdf_response)
        return input_pdf_response

    # Parse the page range string
    pages_to_copy = parse_page_range(page_range)  # Returns a list of pages to copy in the output PDF

    # Open the PDF file
    with open(input_pdf, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        writer = PyPDF2.PdfWriter()

        # Loop through the list of user-selected pages and copy them
        for page_num in pages_to_copy:
            # Skip pages outside the document; the lower bound keeps page 0
            # (index -1) from being treated as a valid index
            if 0 <= page_num < len(reader.pages):
                writer.add_page(reader.pages[page_num])

        # Write the copied pages to a new PDF
        with open(output_pdf, 'wb') as output_file:
            writer.write(output_file)

    message = f"Copied pages saved as:\n{output_pdf}\n\nRefresh the Files area and locate {output_pdf}."
    print(message)

    return message
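
`copy_selected_pages` skips page numbers that fall outside the document without telling the user. One way to surface them is to split the parsed list before copying, as in the following sketch. `split_valid_pages` is a hypothetical helper, and the page count of 6 in the example is made up for illustration.

```python
def split_valid_pages(page_indexes, page_count):
    """Separate 0-based page indexes into those inside and outside the document.

    Returns (valid_indexes, skipped_page_numbers). Skipped pages are reported
    as 1-based numbers so they match what the user typed.
    """
    valid = [p for p in page_indexes if 0 <= p < page_count]
    skipped = [p + 1 for p in page_indexes if not 0 <= p < page_count]
    return valid, skipped


# Indexes for the input '5,5,1-3,12' against an assumed 6-page document
valid, skipped = split_valid_pages([4, 4, 0, 1, 2, 11], page_count=6)
print(valid)    # [4, 4, 0, 1, 2]
print(skipped)  # [12]
```

The `skipped` list could then be appended to the returned message so the user knows which requested pages were not copied.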

### Delete all PDF files from the Files area

In [9]:
def delete_pdfs():
  """
  Delete all PDF files in the current directory.
  Return a message.
  """

  # Create a list of all PDF files in the current directory
  pdf_files = []
  for filename in os.listdir():
      if filename.endswith('.pdf'):
          pdf_files.append(filename)

  print(pdf_files)

  if len(pdf_files) == 0:
    return "No PDF files found. No action taken."

  # Delete all files in the pdf_files

  for filename in pdf_files:
      os.remove(filename)

  pdf_files_str = ', \n'.join(pdf_files)

  return "Files deleted:\n" + pdf_files_str + "\n\nRefresh the Files area and check that it has no PDF files."
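
Both `get_file` and `delete_pdfs` match files with `filename.endswith('.pdf')`, which misses names such as `SCAN.PDF`. A case-insensitive listing could be sketched as follows; `list_pdfs` is a hypothetical helper, not something the notebook defines, and the demonstration runs in a temporary directory so that no real files are touched.

```python
import os
import tempfile


def list_pdfs(directory='.'):
    """Return the PDF file names in `directory`, sorted alphabetically.

    The extension is matched case-insensitively and directories are ignored.
    """
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith('.pdf')
        and os.path.isfile(os.path.join(directory, name))
    )


# Demonstration with empty placeholder files in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ('b.pdf', 'A.PDF', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    print(list_pdfs(tmp))  # ['A.PDF', 'b.pdf']
```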

### Run the chosen option

In [16]:
# Run the copy or delete process depending on the user choice
if copy_or_delete == "copy":
  result = copy_selected_pages(copied_pdf_path, page_range)
elif copy_or_delete == "delete":
  result = delete_pdfs()



Medical Forms - Filled.pdf
Copied pages saved as:
output.pdf

Refresh the Files area and locate output.pdf.


## Result

In [17]:
print(result)

Copied pages saved as:
output.pdf

Refresh the Files area and locate output.pdf.
