# Working with Images

The Python Imaging Library (PIL) is a third-party Python package that adds image processing capabilities to your Python interpreter. It lets you process photos and perform many common image file manipulations. The current version of this software is Pillow, a fork of the original PIL that adds Python 3 support. Several other Python packages, such as wxPython and ReportLab, use Pillow to load many different image file types. You can use Pillow for several use cases, including the following:

- Image processing
- Image archiving
- Batch processing
- Image display via Tkinter

## Let's get started by installing Pillow

```python3 -m pip install pillow```

Run this command in your terminal, not in a notebook cell. A bare shell command typed into a cell will most likely produce an error.

However, if you want to install from inside a Jupyter notebook, do this:

```import sys```
```!{sys.executable} -m pip install pillow```

In [13]:

import sys

# Each "!" command runs in its own subshell, so activating a virtual
# environment with "source" would not carry over to the next line.
# Install directly into the interpreter that runs this notebook instead.
!{sys.executable} -m pip install pillow
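After installing, it can help to confirm that the package is visible to the interpreter that runs the notebook. The following is a minimal sketch that uses only the standard library; the helper name `is_installed` is made up for this example. Note that Pillow is imported under the name `PIL`, not `pillow`.

```python
import importlib.util
import sys

def is_installed(module_name):
    """Return True if the current interpreter can find module_name."""
    return importlib.util.find_spec(module_name) is not None

# Show which interpreter the notebook is using, then check for Pillow
print(sys.executable)
print("PIL available:", is_installed("PIL"))
```

If the second line prints `False`, the install most likely went to a different Python than the one behind the notebook kernel.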


## Opening Images


In [10]:
from PIL import Image

open_image = Image.open('xyz.jpeg')

open_image.show()

ImportError: dlopen(/Users/ojo/Library/Python/3.8/lib/python/site-packages/PIL/_imaging.cpython-38-darwin.so, 2): Symbol not found: _xcb_connect
  Referenced from: /Users/ojo/Library/Python/3.8/lib/python/site-packages/PIL/_imaging.cpython-38-darwin.so
  Expected in: flat namespace
 in /Users/ojo/Library/Python/3.8/lib/python/site-packages/PIL/_imaging.cpython-38-darwin.so

In [14]:
# get_image_info.py

from PIL import Image

def get_image_info(path):
    image = Image.open(path)
    print(f'This image is {image.width} x {image.height}')
    exif = image._getexif()
    print(exif)

get_image_info('ducks.jpg')


Here you get the width and height of the image from the image object. Then you use the _getexif() method to get metadata about your image. (The leading underscore marks it as a private method; Pillow also offers a public getexif() method.) Exif stands for "Exchangeable image file format" and is a standard that specifies the formats for images, sound, and ancillary tags used by digital cameras. The output is pretty verbose, but you can learn from that data that this particular photo was taken with a Sony 6300 camera using the following lens: "E 18-200mm F3.5-6.3 OSS LE". The timestamp for the photo is also in the Exif information.
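The raw Exif data is keyed by numeric tag ids, which is part of why the output is hard to read. As a rough sketch, the ids can be translated into names with the `TAGS` mapping from `PIL.ExifTags`. The helper name `readable_exif` and the "ExampleCam" value are made up for illustration, and the image is built in memory so the example does not depend on a particular photo. It assumes a recent Pillow version that provides `getexif()` and `Image.Exif`.

```python
from io import BytesIO

from PIL import Image
from PIL.ExifTags import TAGS

def readable_exif(image):
    """Map numeric Exif tag ids to names where Pillow knows them."""
    exif = image.getexif()
    return {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}

# Build a tiny JPEG in memory with a single Exif field
source = Image.new("RGB", (8, 8), "white")
exif = Image.Exif()
exif[0x010F] = "ExampleCam"  # 0x010F is the id of the "Make" tag
buffer = BytesIO()
source.save(buffer, format="JPEG", exif=exif.tobytes())
buffer.seek(0)

info = readable_exif(Image.open(buffer))
print(info)
```

With a real photo you would call `readable_exif(Image.open('ducks.jpg'))` instead of building the image by hand.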

## Cropping Images

When you take photographs, all too often the subject moves or you don't zoom in far enough. The result is a photo where the subject isn't really front and center. To fix this, you can crop the image down to the part that you want to highlight.

Pillow has this functionality built in:

In [None]:
# cropping.py

from PIL import Image

def crop_image(path, cropped_path):
    image = Image.open(path)
    # The crop box is (left, upper, right, lower) in pixels
    cropped = image.crop((40, 590, 979, 1500))
    cropped.save(cropped_path)

crop_image('ducks.jpg', 'ducks_cropped.jpg')

# Open the file that was just saved to check the result
cropped_image = Image.open('ducks_cropped.jpg')
cropped_image.show()
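The tuple passed to `crop()` is a box of the form (left, upper, right, lower), measured in pixels from the top-left corner of the image. Hard-coded numbers like the ones above only suit one photo, so here is a small sketch of computing a centered box from the image size. The helper name `center_crop_box` is an invention for this example; it is plain Python and does not call Pillow itself.

```python
def center_crop_box(width, height, crop_width, crop_height):
    """Return a (left, upper, right, lower) box centered in a width x height image."""
    if crop_width > width or crop_height > height:
        raise ValueError("crop size must fit inside the image")
    left = (width - crop_width) // 2
    upper = (height - crop_height) // 2
    return (left, upper, left + crop_width, upper + crop_height)

box = center_crop_box(1000, 800, 400, 200)
print(box)  # (300, 300, 700, 500)
```

The result can be passed straight to `image.crop(box)`, with `image.width` and `image.height` supplying the first two arguments.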

## Merging two images together

In [None]:
from PIL import Image

# Read the two images
image1 = Image.open('abc.jpg')
image1.show()
image2 = Image.open('pqr.jpg')
image2.show()

# Resize both images to the same size so they fit side by side
image1 = image1.resize((426, 240))
image2 = image2.resize((426, 240))
image1_size = image1.size

# Create a canvas twice as wide as one image, then paste the images next to each other
new_image = Image.new('RGB', (2 * image1_size[0], image1_size[1]), (250, 250, 250))
new_image.paste(image1, (0, 0))
new_image.paste(image2, (image1_size[0], 0))
new_image.save("merged_image.jpg")
new_image.show()
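The canvas in the example above is twice the width of the first image. If you would rather keep each image at its own size, the canvas can be sized from both images instead. The sketch below shows one way to do that; the function name `side_by_side` is hypothetical, and solid-color images stand in for real photos so the example runs without any files.

```python
from PIL import Image

def side_by_side(left_image, right_image, background=(250, 250, 250)):
    """Paste two images next to each other on a canvas large enough for both."""
    width = left_image.width + right_image.width
    height = max(left_image.height, right_image.height)
    canvas = Image.new("RGB", (width, height), background)
    canvas.paste(left_image, (0, 0))
    canvas.paste(right_image, (left_image.width, 0))
    return canvas

# Solid-color stand-ins for real photos
red = Image.new("RGB", (426, 240), "red")
blue = Image.new("RGB", (300, 200), "blue")

merged = side_by_side(red, blue)
print(merged.size)  # (726, 240)
```

Any area the shorter image does not cover keeps the background color.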

## Blur an Image

Blurring smooths an image by applying a filter that averages each pixel with its neighbors, which softens fine detail and reduces noise. It is a common step in image processing.

The ImageFilter module in the Pillow library provides several standard image filters. You apply one by calling the filter() method of an Image object and passing the required filter from ImageFilter.



In [None]:
# Import the required modules
from PIL import Image, ImageFilter

# Open an existing image
original_image = Image.open('boy.jpg')

# Display the original image
original_image.show()

# Apply the blur filter and display the result
blur_image = original_image.filter(ImageFilter.BLUR)
blur_image.show()

# Save the blurred image
blur_image.save('blur_boy.jpg')
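The BLUR filter used above has a fixed strength. ImageFilter also provides GaussianBlur, which takes a radius so you can control how strong the effect is. The sketch below is only an illustration: it uses a small generated grayscale image, a white square on black, instead of a photo, so the effect can be checked by reading a single pixel.

```python
from PIL import Image, ImageFilter

# A small black image with a white square in the middle
image = Image.new("L", (40, 40), 0)
image.paste(255, (15, 15, 25, 25))

soft = image.filter(ImageFilter.GaussianBlur(radius=1))
softer = image.filter(ImageFilter.GaussianBlur(radius=4))

# Blurring spreads the white square out, so its center gets darker
# as the radius grows
print(image.getpixel((20, 20)), soft.getpixel((20, 20)), softer.getpixel((20, 20)))
```

On a real photo the call is the same: `original_image.filter(ImageFilter.GaussianBlur(radius=4))`.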

## Working with PDF with Python

The Portable Document Format, or PDF, is a file format that can be used to present and exchange documents reliably across operating systems. While the PDF was originally invented by Adobe, it is now an open standard that is maintained by the International Organization for Standardization (ISO). You can work with a preexisting PDF in Python by using the PyPDF2 package.

PyPDF2 is a Python library built as a PDF toolkit. It is capable of:
- Extracting document information (title, author, …)
- Splitting documents page by page
- Merging documents page by page
- Cropping pages
- Merging multiple pages into a single page
- Encrypting and decrypting PDF files
- and more!

## Installing PyPDF2

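```pip install "pypdf2<3"```

The code in this section uses the PyPDF2 1.x and 2.x names such as PdfFileReader and getPage(). PyPDF2 3.0 removed them, so pin an older release to run the examples as written.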

In [None]:
from PyPDF2 import PdfFileReader

def extract_information(pdf_path):
    with open(pdf_path, 'rb') as f:
        pdf = PdfFileReader(f)
        information = pdf.getDocumentInfo()
        number_of_pages = pdf.getNumPages()

    txt = f"""
    Information about {pdf_path}: 

    Author: {information.author}
    Creator: {information.creator}
    Producer: {information.producer}
    Subject: {information.subject}
    Title: {information.title}
    Number of pages: {number_of_pages}
    """

    print(txt)
    return information


extract_information('xyz.pdf')

## How to Merge PDFs
There are many situations where you will want to take two or more PDFs and merge them into a single PDF. For example, you might have a standard cover page that needs to go onto many types of reports. You can use Python to help you do that sort of thing.

For this example, you can open up a PDF and print a page out as a separate PDF. Then do that again, but with a different page. That will give you a couple of inputs to use for example purposes.

In [None]:
from PyPDF2 import PdfFileReader, PdfFileWriter

def merge_pdfs(paths, output):
    pdf_writer = PdfFileWriter()

    for path in paths:
        pdf_reader = PdfFileReader(path)
        for page in range(pdf_reader.getNumPages()):
            # Add each page to the writer object
            pdf_writer.addPage(pdf_reader.getPage(page))

    # Write out the merged PDF
    with open(output, 'wb') as out:
        pdf_writer.write(out)


paths = ['document1.pdf', 'document2.pdf']
merge_pdfs(paths, 'merged.pdf')

## How to Split PDFs

There are times when you need to split a PDF into multiple PDFs. This is especially true of PDFs that contain a lot of scanned-in content, but there are plenty of other good reasons for wanting to split a PDF.

Here’s how you can use PyPDF2 to split your PDF into multiple files:

In [None]:
from PyPDF2 import PdfFileReader, PdfFileWriter

def split(path, name_of_split):
    pdf = PdfFileReader(path)
    for page in range(pdf.getNumPages()):
        pdf_writer = PdfFileWriter()
        pdf_writer.addPage(pdf.getPage(page))

        output = f'{name_of_split}{page}.pdf'
        with open(output, 'wb') as output_pdf:
            pdf_writer.write(output_pdf)


path = 'Jupyter_Notebook_An_Introduction.pdf'
split(path, 'jupyter_page')
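The function above names its output files jupyter_page0.pdf, jupyter_page1.pdf, and so on. Once a document has ten or more pages, those names no longer sort in page order in a file browser (jupyter_page10.pdf comes before jupyter_page2.pdf). One possible variation, sketched here in plain Python, is to zero-pad the number and count from 1. The helper name `split_filenames` is hypothetical and is not part of PyPDF2.

```python
def split_filenames(prefix, number_of_pages):
    """Return one zero-padded output name per page, numbered from 1."""
    digits = len(str(number_of_pages))
    return [
        f"{prefix}{page:0{digits}d}.pdf"
        for page in range(1, number_of_pages + 1)
    ]

names = split_filenames("jupyter_page", 12)
print(names[0], names[-1])  # jupyter_page01.pdf jupyter_page12.pdf
```

Inside `split()`, the same format expression could replace the line that builds `output`.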

## How to Add Watermarks

Watermarks are identifying images or patterns on printed and digital documents. Some watermarks can only be seen in special lighting conditions. The reason watermarking is important is that it allows you to protect your intellectual property, such as your images or PDFs. Another term for watermark is overlay.

You can use Python and PyPDF2 to watermark your documents. You need to have a PDF that only contains your watermark image or text.

Let’s learn how to add a watermark now:

In [None]:
from PyPDF2 import PdfFileWriter, PdfFileReader

def create_watermark(input_pdf, output, watermark):
    watermark_obj = PdfFileReader(watermark)
    watermark_page = watermark_obj.getPage(0)

    pdf_reader = PdfFileReader(input_pdf)
    pdf_writer = PdfFileWriter()

    # Watermark all the pages
    for page_number in range(pdf_reader.getNumPages()):
        page = pdf_reader.getPage(page_number)
        page.mergePage(watermark_page)
        pdf_writer.addPage(page)

    with open(output, 'wb') as out:
        pdf_writer.write(out)

create_watermark(
        input_pdf='Jupyter_Notebook_An_Introduction.pdf', 
        output='watermarked_notebook.pdf',
        watermark='watermark.pdf')

```create_watermark()``` accepts three arguments:
- input_pdf: the PDF file path to be watermarked
- output: the path you want to save the watermarked version of the PDF
- watermark: a PDF that contains your watermark image or text

In the code, you open up the watermark PDF and grab just the first page from the document as that is where your watermark should reside. Then you create a PDF reader object using the input_pdf and a generic pdf_writer object for writing out the watermarked PDF.

The next step is to iterate over the pages in the input_pdf. This is where the magic happens. You will need to call .mergePage() and pass it the watermark_page. When you do that, it will overlay the watermark_page on top of the current page. Then you add that newly merged page to your pdf_writer object.

Finally, you write the newly watermarked PDF out to disk, and you’re done!

## How to Encrypt a PDF

PyPDF2 currently only supports adding a user password and an owner password to a preexisting PDF. In PDF land, an owner password will basically give you administrator privileges over the PDF and allow you to set permissions on the document. On the other hand, the user password just allows you to open the document.

As far as I can tell, PyPDF2 doesn’t actually allow you to set any permissions on the document even though it does allow you to set the owner password.

Regardless, this is how you can add a password, which will also inherently encrypt the PDF:

In [None]:
from PyPDF2 import PdfFileWriter, PdfFileReader

def add_encryption(input_pdf, output_pdf, password):
    pdf_writer = PdfFileWriter()
    pdf_reader = PdfFileReader(input_pdf)

    for page in range(pdf_reader.getNumPages()):
        pdf_writer.addPage(pdf_reader.getPage(page))

    pdf_writer.encrypt(user_pwd=password, owner_pwd=None, 
                       use_128bit=True)

    with open(output_pdf, 'wb') as fh:
        pdf_writer.write(fh)

add_encryption(input_pdf='reportlab-sample.pdf',
                   output_pdf='reportlab-encrypted.pdf',
                   password='dao@unionbankng')

```add_encryption()``` takes in the input and output PDF paths as well as the password that you want to add to the PDF. It then opens a PDF writer and a reader object, as before. Since you will want to encrypt the entire input PDF, you will need to loop over all of its pages and add them to the writer.

The final step is to call ```.encrypt()```, which takes the user password, the owner password, and whether or not 128-bit encryption should be used. The default is for 128-bit encryption to be turned on. If you set it to False, then 40-bit encryption will be applied instead.

## Working with CSV in Python

While we could use the built-in ```open()``` function to work with CSV files in Python, there is a dedicated csv module that makes working with CSV files much easier.

Before we can use the functions of the csv module, we need to import it:

```import csv```
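To see why the dedicated module is easier than reading lines by hand, consider a field that contains a comma. The short sketch below uses a made-up sample line held in memory and compares a plain `split(",")` with `csv.reader()`.

```python
import csv
import io

line = '1,"Lord of the Rings, The",Frodo Baggins\n'

# Splitting on commas breaks the quoted title into two pieces
naive = line.strip().split(",")

# csv.reader understands the quoting rules and keeps the title intact
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```

The naive split yields four pieces and leaves the quote characters in place, while `csv.reader()` returns the three intended fields.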

## Reading CSV files Using csv.reader()

To read a CSV file in Python, we can use the ```csv.reader()``` function:


In [None]:
import csv
with open('xyz.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

In the above example, we use the csv.reader() function in its default mode, which expects a comma as the delimiter.

However, the function is much more customizable.

Suppose our CSV file was using tab as a delimiter. To read such files, we can pass optional parameters to the csv.reader() function. Let's take an example.

In [None]:
import csv
with open('people.csv', 'r') as file:
    reader = csv.reader(file, delimiter='\t')
    for row in reader:
        print(row)
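The delimiter is only one of the optional parameters. As a further sketch, the example below reads semicolon-separated data that has a space after each separator, using `skipinitialspace=True` to drop that space. The sample data is made up and is read from memory through `io.StringIO` rather than from a file.

```python
import csv
import io

# Hypothetical sample data: semicolon-separated, with a space after each separator
data = "name; city\nAda; London\nGrace; New York\n"

reader = csv.reader(io.StringIO(data), delimiter=";", skipinitialspace=True)
rows = list(reader)
print(rows)
```

Without `skipinitialspace=True`, the second field of each row would keep its leading space.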

## Writing CSV files Using csv.writer()

To write to a CSV file in Python, we can use the csv.writer() function.

The csv.writer() function returns a writer object that converts the user's data into delimited strings. Each call to the writer's writerow() method formats one row and writes it to the file. Let's take an example.

In [None]:
import csv
with open('protagonist.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["SN", "Movie", "Protagonist"])
    writer.writerow([1, "Lord of the Rings", "Frodo Baggins"])
    writer.writerow([2, "Harry Potter", "Harry Potter"])
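When the rows are already in a list, the writer object's `writerows()` method writes them all in one call. The sketch below writes to an in-memory buffer instead of a file so the result can be inspected directly, then reads the text back to show that every value returns as a string.

```python
import csv
import io

rows = [
    ["SN", "Movie", "Protagonist"],
    [1, "Lord of the Rings", "Frodo Baggins"],
    [2, "Harry Potter", "Harry Potter"],
]

# newline="" leaves line endings to the csv module, as with open(..., newline='')
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerows(rows)

text = buffer.getvalue()
print(text)

# Reading the text back: the numbers 1 and 2 come back as the strings '1' and '2'
read_back = list(csv.reader(io.StringIO(text, newline="")))
print(read_back)
```

To write to disk instead, replace the buffer with the `open('protagonist.csv', 'w', newline='')` call from the example above.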