<a href="https://colab.research.google.com/github/olga-terekhova/pdf-utilities/blob/main/Pdf_to_Spreads.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# User Guide

This is a notebook that converts a PDF file with one page per spread (optimized for printing) into a PDF file with two pages per spread (optimized for reading on screen).  

Steps:

1. Upload a pdf file into Google Colab:  
≡ Menu on the left - 📁 Files - ⬆️ Upload to session storage.  
Put the file right in the root, not in any subfolders.

2. Set the name of the uploaded file here (including the '.pdf' extension):


In [12]:
book_name = 'MapsR.pdf' 

3. Should there be two pages per spread?  
If Yes: change double_page to True.  
If No: change double_page to False.

In [13]:
double_page = True 

4. If there are two pages per spread, should double-paged spreads start from the first page?  
If Yes: change double_start to 1  
If No: change double_start to 2  
If there is only one page per spread, this parameter is ignored and can be any value.  

In [14]:
double_start = 1
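
To see what `double_start` changes, here is a minimal sketch that lists which page numbers land on each spread. The helper name `layout_spreads` is hypothetical and not part of this notebook; it only mirrors the pairing rule described in step 4, using a 5-page book as an example.

```python
def layout_spreads(total_pages, double_page=True, double_start=1):
    """Return 1-based page numbers grouped by spread (illustrative helper)."""
    pages = list(range(1, total_pages + 1))
    if not double_page:
        # One page per spread: double_start is ignored
        return [[p] for p in pages]
    if double_start not in (1, 2):
        raise ValueError("double_start must be 1 or 2")
    spreads = []
    if double_start == 2 and pages:
        # The first page (typically a cover) stands alone
        spreads.append([pages.pop(0)])
    # Pair the remaining pages as left/right halves
    for k in range(0, len(pages), 2):
        spreads.append(pages[k:k + 2])
    return spreads

print(layout_spreads(5, double_start=1))  # [[1, 2], [3, 4], [5]]
print(layout_spreads(5, double_start=2))  # [[1], [2, 3], [4, 5]]
```

Use `double_start = 2` when the book opens with a single cover page, so that later left and right pages stay paired the way they were designed.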

5. Set the width of one page in pixels. A value around 2000 gives good quality and leaves room for zooming in.

In [15]:
page_width = 2000

6. You may limit the number of source pages included in the new PDF, for testing purposes. If the book ends abruptly, check that this parameter is not smaller than the number of pages in the source file. 

In [16]:
last_page_pdf = 100 

7. Choose Runtime - Run all or press Ctrl-F9.  
You can collapse the Code body section (press the triangle on the left of the heading) and watch for progress in the Result section.   
In the Result section you should see the message with the number of spreads in the final book and the name of the new file.  
Go to Files in the menu on the left to download the new file. 

8. To process a new file, just start from Step 1. 

# Code body

## Constants

In [17]:
# Constants

result_ok = "Finished"
output_folder = 'output_pages'

## Set up installations using Shell

In [18]:
# Create output folder for pics

!mkdir -p "$output_folder"


In [19]:
# Install poppler-utils - needed for the pdf2image library

!apt-get install poppler-utils

Reading package lists... Done
Building dependency tree       
Reading state information... Done
poppler-utils is already the newest version (0.86.1-0ubuntu1.1).
0 upgraded, 0 newly installed, 0 to remove and 24 not upgraded.


In [20]:
# Install libraries

!pip install pdf2image

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


## Import libraries

In [21]:
# Import libraries

from pdf2image import convert_from_path, convert_from_bytes
from PIL import Image
from io import BytesIO
import math
import os


## Convert from PDF into images and back to PDF

In [22]:
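# Convert PDF into images and return the list of the paths
# size=(page_width, None) makes each image page_width pixels wide and keeps the aspect ratio
# (a bare integer would instead fit the page into a page_width x page_width box)

image_list = convert_from_path(book_name, size=(page_width, None), output_folder=output_folder, output_file='page', paths_only=True)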
print(image_list)

['output_pages/page0001-001.ppm', 'output_pages/page0001-002.ppm', 'output_pages/page0001-003.ppm', 'output_pages/page0001-004.ppm', 'output_pages/page0001-005.ppm', 'output_pages/page0001-006.ppm', 'output_pages/page0001-007.ppm', 'output_pages/page0001-008.ppm', 'output_pages/page0001-009.ppm', 'output_pages/page0001-010.ppm', 'output_pages/page0001-011.ppm', 'output_pages/page0001-012.ppm', 'output_pages/page0001-013.ppm', 'output_pages/page0001-014.ppm', 'output_pages/page0001-015.ppm', 'output_pages/page0001-016.ppm', 'output_pages/page0001-017.ppm', 'output_pages/page0001-018.ppm', 'output_pages/page0001-019.ppm', 'output_pages/page0001-020.ppm', 'output_pages/page0001-021.ppm', 'output_pages/page0001-022.ppm', 'output_pages/page0001-023.ppm', 'output_pages/page0001-024.ppm', 'output_pages/page0001-025.ppm', 'output_pages/page0001-026.ppm', 'output_pages/page0001-027.ppm', 'output_pages/page0001-028.ppm', 'output_pages/page0001-029.ppm', 'output_pages/page0001-030.ppm', 'output_p

In [23]:
# Calculate the number of pages

total_pages_cnt = len(image_list)

print(total_pages_cnt)

# If there's a limit for pages generated, cap the number of pages

total_pages = min(last_page_pdf, total_pages_cnt)  
print (total_pages)

182
100


In [24]:
# Calculate the number of spreads

if double_page:
    if double_start == 1:
        spread_number = math.ceil(total_pages / 2)
    elif double_start == 2:
        spread_number = math.ceil((total_pages + 1) / 2)
    else:
        # Stop here: spread_number would otherwise be undefined below
        raise ValueError('Invalid value for double_start: expected 1 or 2')
else:
    spread_number = total_pages

print(f'Number of spreads: {spread_number}')

Number of spreads: 50
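
As a quick check of the two formulas above, the following sketch wraps them in a function and evaluates them for an even and an odd page count. The function name `count_spreads` is an assumption made for illustration; the arithmetic is the same as in the cell above.

```python
import math

def count_spreads(total_pages, double_page=True, double_start=1):
    """Number of spreads for a given page count (illustrative helper)."""
    if not double_page:
        return total_pages
    if double_start == 1:
        # Pages are paired from page 1: (1,2), (3,4), ...
        return math.ceil(total_pages / 2)
    if double_start == 2:
        # Page 1 stands alone, then (2,3), (4,5), ...
        return math.ceil((total_pages + 1) / 2)
    raise ValueError("double_start must be 1 or 2")

for n in (100, 101):
    print(n, count_spreads(n, double_start=1), count_spreads(n, double_start=2))
# 100 50 51
# 101 51 51
```

With 100 pages, starting double spreads from the second page costs one extra spread, because both the first and the last page end up alone.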


In [25]:
# Assign page paths to spreads

spread_list = []
for i in range(1, spread_number + 1):
  if not double_page:
    # One page per spread
    spread_list.append([image_list[i - 1]])
    continue

  # 1-based page numbers of the left and right halves of spread i.
  # With double_start == 2 everything shifts by one page,
  # so the first spread has no left page (page_num_left == 0).
  shift = 0 if double_start == 1 else 1
  page_num_left = (i * 2) - 1 - shift
  page_num_right = (i * 2) - shift

  if page_num_left == 0:
    # The first page stands alone
    spread_list.append([image_list[page_num_right - 1]])
  elif page_num_right > total_pages:
    # The last page stands alone
    spread_list.append([image_list[page_num_left - 1]])
  else:
    spread_list.append([image_list[page_num_left - 1], image_list[page_num_right - 1]])

print (spread_list)

[['output_pages/page0001-001.ppm', 'output_pages/page0001-002.ppm'], ['output_pages/page0001-003.ppm', 'output_pages/page0001-004.ppm'], ['output_pages/page0001-005.ppm', 'output_pages/page0001-006.ppm'], ['output_pages/page0001-007.ppm', 'output_pages/page0001-008.ppm'], ['output_pages/page0001-009.ppm', 'output_pages/page0001-010.ppm'], ['output_pages/page0001-011.ppm', 'output_pages/page0001-012.ppm'], ['output_pages/page0001-013.ppm', 'output_pages/page0001-014.ppm'], ['output_pages/page0001-015.ppm', 'output_pages/page0001-016.ppm'], ['output_pages/page0001-017.ppm', 'output_pages/page0001-018.ppm'], ['output_pages/page0001-019.ppm', 'output_pages/page0001-020.ppm'], ['output_pages/page0001-021.ppm', 'output_pages/page0001-022.ppm'], ['output_pages/page0001-023.ppm', 'output_pages/page0001-024.ppm'], ['output_pages/page0001-025.ppm', 'output_pages/page0001-026.ppm'], ['output_pages/page0001-027.ppm', 'output_pages/page0001-028.ppm'], ['output_pages/page0001-029.ppm', 'output_pages

In [26]:
# Define function to generate name for the new PDF file 

def convert_book_name(filename):
    base_name, extension = os.path.splitext(filename)
    new_filename = base_name + "_spreads" + extension
    return new_filename
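
The sketch below repeats the function so that it runs on its own and shows what it returns for a few names. `MapsR.pdf` is the name used in this notebook; the other two file names are made up for illustration.

```python
import os

def convert_book_name(filename):
    # splitext splits at the last dot only, so earlier dots stay in the base name
    base_name, extension = os.path.splitext(filename)
    return base_name + "_spreads" + extension

print(convert_book_name("MapsR.pdf"))     # MapsR_spreads.pdf
print(convert_book_name("atlas.v2.pdf"))  # atlas.v2_spreads.pdf
print(convert_book_name("scan"))          # scan_spreads
```

Because the suffix goes before the extension, the source file is never overwritten by the generated one.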

In [27]:
# Save pictures as PDF

book_name_spreads = convert_book_name(book_name)

for index, spread_link in enumerate(spread_list):
  
  print(index + 1)

  # If one page per spread, open one page:

  if len(spread_link) == 1:
    img = Image.open(spread_link[0])
        
  # If two pages per spread, open two pages and merge them:

  elif len(spread_link) == 2:
    
    # Open pictures
    img1 = Image.open(spread_link[0])
    img2 = Image.open(spread_link[1])
    
    # Resize the images to have the same height
    height = min(img1.height, img2.height)
    img1 = img1.resize((int(img1.width * height / img1.height), height))
    img2 = img2.resize((int(img2.width * height / img2.height), height))

    # Create a new image to hold the merged images
    img = Image.new('RGB', (img1.width + img2.width, height))

    # Paste the images side by side
    img.paste(img1, (0, 0))
    img.paste(img2, (img1.width, 0))

    # Close intermediary pictures
    img1.close()
    img2.close()

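  else:
    # Stop with an error so that 'Finished' is not printed for an incomplete file
    raise ValueError(f"Unexpected number of pages in spread {index + 1}: {len(spread_link)}")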

  # Save into PDF (new for the first spread, append for subsequent ones)

  if index == 0:
    img.save(book_name_spreads, "PDF", resolution=100.0, append=False)
  else:
    img.save(book_name_spreads, "PDF", resolution=100.0, save_all=True, append=True)
  img.close()

print('Finished')

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
Finished
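
The resizing arithmetic in the merge step can be checked without opening any images. The sketch below is only an illustration: the helper names `merged_spread_size` and `pixels_to_points` and the example page sizes are assumptions, not part of this notebook. The first helper mirrors the `min(height)` and `int(...)` logic of the cell above. The second assumes that Pillow's PDF writer sizes a page as pixels × 72 / `resolution`, which is my understanding of how the `resolution=100.0` argument is applied.

```python
def merged_spread_size(left, right):
    """Pixel size of a spread built from two (width, height) pages."""
    (w1, h1), (w2, h2) = left, right
    # Both pages are scaled to the smaller of the two heights
    height = min(h1, h2)
    new_w1 = int(w1 * height / h1)
    new_w2 = int(w2 * height / h2)
    return new_w1 + new_w2, height

def pixels_to_points(pixels, resolution=100.0):
    # Assumed conversion: 72 points per inch, `resolution` pixels per inch
    return pixels * 72.0 / resolution

# Two pages of equal width but different heights (made-up sizes)
width, height = merged_spread_size((2000, 2800), (2000, 3000))
print(width, height)  # 3866 2800
print(pixels_to_points(width), pixels_to_points(height))
```

The taller page is shrunk to match the shorter one, so its width drops below 2000 pixels and the spread is slightly narrower than twice `page_width`.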


## Remove temp files using Shell

In [28]:
# Removing temp files

!rm -r "$output_folder"

# Result

In [29]:
# End messages

print(result_ok)
print('Spreads: ', spread_number)
print('File generated: ', book_name_spreads)

Finished
Spreads:  50
File generated:  MapsR_spreads.pdf
