# Save Web Page as PDF

We will use [pdfkit](https://github.com/JazzCore/python-pdfkit) to save web pages as PDF files. The pdfkit package can also convert a local file to PDF. See the [GitHub page for pdfkit](https://github.com/JazzCore/python-pdfkit) for more details.

This notebook helps you set up the required installers the first time; after that, you will mostly run just **Step-3**. Remember to **update the URL**.
<hr>

### It is as easy as a **3-step process: <font color = 'red'>1-2-3</font>**

## Step-1

### Check which environment is active using `conda`

In [71]:
!conda env list

# conda environments:
#
base                  *  C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64
astra_env                C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\envs\astra_env
                         C:\Users\raysu\Anaconda3
                         C:\Users\raysu\Anaconda3\envs\pyforge
test_env                 C:\Users\raysu\AppData\Local\conda\conda\envs\test_env



### Install `pdfkit` with `pip`

In [72]:
!pip install -U --user pdfkit

Requirement already up-to-date: pdfkit in c:\users\raysu\appdata\roaming\python\python36\site-packages (0.6.1)


### Download and Install `wkhtmltopdf` (Just the first time)
You will also need to download and install [wkhtmltopdf](https://github.com/JazzCore/python-pdfkit/wiki/Installing-wkhtmltopdf) as per the instructions.

### Windows Users:
Download the installer from the [wkhtmltopdf downloads list](http://wkhtmltopdf.org/downloads.html) and add the folder containing the wkhtmltopdf binary to your PATH.

For Windows 10 the typical path for the installation directory will be:  
> **`C:\Program Files\wkhtmltopdf\bin`**

You will need to add this folder to the `Path` system environment variable.
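
Before moving on, it can help to confirm that the binary is actually reachable. The sketch below is one way to check, not part of pdfkit itself: `find_wkhtmltopdf` is a hypothetical helper name, and the fallback path assumes the typical Windows location mentioned above with the executable named `wkhtmltopdf.exe`.

```python
import os
import shutil

# Assumed location: the typical Windows install folder plus the executable name.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"


def find_wkhtmltopdf(fallback=DEFAULT_WINDOWS_PATH):
    """Return the path to wkhtmltopdf, or None if it cannot be found."""
    found = shutil.which("wkhtmltopdf")  # searches the folders listed in PATH
    if found:
        return found
    # Not on PATH: fall back to the assumed install location, if it exists.
    return fallback if os.path.isfile(fallback) else None


binary = find_wkhtmltopdf()
if binary is None:
    print("wkhtmltopdf not found; check the installation and the PATH variable.")
else:
    print("Using wkhtmltopdf at:", binary)
    # If the binary is not on PATH, pdfkit can be pointed at it explicitly:
    #   config = pdfkit.configuration(wkhtmltopdf=binary)
    #   pdfkit.from_url(url, file_name, configuration=config)
```

If the check prints the "not found" message right after editing the environment variable, restarting the notebook server usually lets the new PATH take effect.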

## Step-2

### Define Printing Options

+ Do not change the parameters inside function: **`get_default_options()`**  
+ If you need to update options, change the variable: **`options`** instead.

In [75]:
# Set Printing Options
def get_default_options():
    default_options = {
        'page-size': 'Letter',
        'margin-top': '0.75in',
        'margin-right': '0.75in',
        'margin-bottom': '0.75in',
        'margin-left': '0.75in',
        'encoding': "UTF-8",
        'custom-header' : [
            ('Accept-Encoding', 'gzip')
        ],
        'cookie': [
            ('cookie-name1', 'cookie-value1'),
            ('cookie-name2', 'cookie-value2'),
        ],
        'no-outline': None
    }
    
    return default_options

default_options = get_default_options()
options = default_options
options

{'cookie': [('cookie-name1', 'cookie-value1'),
  ('cookie-name2', 'cookie-value2')],
 'custom-header': [('Accept-Encoding', 'gzip')],
 'encoding': 'UTF-8',
 'margin-bottom': '0.75in',
 'margin-left': '0.75in',
 'margin-right': '0.75in',
 'margin-top': '0.75in',
 'no-outline': None,
 'page-size': 'Letter'}

### Define Custom Printing Options (dict)

In [76]:
options = {
    'page-size': 'Letter',
    'margin-top': '1.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'custom-header' : [
        ('Accept-Encoding', 'gzip')
    ],
    'cookie': [
        ('cookie-name1', 'cookie-value1'),
        ('cookie-name2', 'cookie-value2'),
    ],
    'no-outline': None
}
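
Since the custom dictionary differs from the defaults only in `margin-top`, it may be less error-prone to start from the defaults and override just the keys that change. The following is a minimal sketch of that idea; `make_options` is a hypothetical helper, and the `get_default_options()` shown here is a shortened stand-in for the one defined in Step-2 so that the block runs on its own.

```python
def get_default_options():
    # Shortened stand-in for the Step-2 function, kept small for illustration.
    return {
        'page-size': 'Letter',
        'margin-top': '0.75in',
        'encoding': 'UTF-8',
        'no-outline': None,
    }


def make_options(overrides=None):
    """Return a fresh options dict with the given overrides applied."""
    options = get_default_options()   # a new dict on every call
    options.update(overrides or {})   # only the listed keys are changed
    return options


options = make_options({'margin-top': '1.75in'})
print(options['margin-top'])                # 1.75in
print(get_default_options()['margin-top'])  # 0.75in (defaults are untouched)
```

Because every call builds a new dictionary, changing `options` never alters the defaults, which matches the advice above to leave `get_default_options()` alone.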

## Step-3

### Generate PDF from URL(s)

You can pass a list of URLs instead of a single URL; all of them are printed into a single PDF file.
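
As a rough illustration of the list form, the sketch below wraps `pdfkit.from_url` in a small function that accepts either one URL or several. The names `normalize_urls` and `urls_to_pdf` are hypothetical, and the URLs in the usage comment are placeholders.

```python
def normalize_urls(urls):
    """Accept a single URL string or an iterable of URLs; return a list."""
    if isinstance(urls, str):
        return [urls]
    url_list = list(urls)
    if not url_list:
        raise ValueError("At least one URL is required.")
    return url_list


def urls_to_pdf(urls, output_path, options=None):
    """Print one URL or a list of URLs into a single PDF file."""
    url_list = normalize_urls(urls)
    import pdfkit  # imported here so the helpers can be defined without pdfkit
    return pdfkit.from_url(url_list, output_path, options=options)


# Example usage (placeholder URLs; needs pdfkit and wkhtmltopdf installed):
#   urls_to_pdf(["https://example.com/page-1", "https://example.com/page-2"],
#               "./combined.pdf")
```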

In [69]:
import pdfkit
url = r"https://medium.com/@jonathan_hui/gan-why-it-is-so-hard-to-train-generative-advisory-networks-819a86b3750b"
file_name = './gen_pdf_out_3.pdf'
use_default_options = True # Note: At present custom format is not working (2018-09-12)
generate_pdf = True

# Passing file_name = False makes pdfkit return the PDF as bytes
# instead of writing a file to disk. This happens when
#        generate_pdf = False
# OR,
#        file_name = False
# Custom printing options (variable: options) are used when
#        use_default_options = False
if file_name is False or not generate_pdf:
    file_name = False
if use_default_options:
    pdf_file = pdfkit.from_url(url, file_name)
else:
    pdf_file = pdfkit.from_url(url, file_name, options = options)

Loading pages (1/6)
Printing pages (6/6)
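
The flag handling in the cell above could also be collected into a single function, which makes it easier to reuse for several pages. This is only a sketch: `save_page_as_pdf` and its `renderer` parameter are hypothetical, the latter added so the logic can be exercised without wkhtmltopdf installed. By default it calls `pdfkit.from_url`.

```python
def save_page_as_pdf(url, file_name, generate_pdf=True, options=None, renderer=None):
    """Render url to file_name, or return the PDF bytes if no file is wanted."""
    if renderer is None:
        import pdfkit  # imported lazily; only needed for real rendering
        renderer = pdfkit.from_url

    # An output path of False asks pdfkit to return the PDF instead of saving it.
    output_path = file_name if (generate_pdf and file_name) else False

    if options is None:
        return renderer(url, output_path)               # default printing options
    return renderer(url, output_path, options=options)  # custom printing options


# Example usage (needs pdfkit and wkhtmltopdf installed):
#   save_page_as_pdf(url, './gen_pdf_out_3.pdf')         # writes the file
#   pdf_bytes = save_page_as_pdf(url, False)             # returns the PDF bytes
```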


### Open The PDF File

In [70]:
from IPython.display import IFrame, display
if file_name is not False:
    print('Opening File: {}\n'.format(file_name))
    display(IFrame(file_name, width = 800, height = 300))

Opening File: ./gen_pdf_out_3.pdf

