**By Peter A. Stokes, École Pratique des Hautes Études – Université PSL**

_These are brief notes and exercises on working with TEI XML files using Python. They are intended as teaching aids for the course on 'Image Processing with Python' which is part of the Atelier de formation annuel du Consortium Cahier on the topic of 'Exploiter les corpus d'auteurs' in Poitiers, 18–20 June 2019. For more details see https://cahier.hypotheses.org/4662_

These notes assume a good knowledge of TEI XML but assume **no experience or knowledge at all** in programming. This notebook also assumes that the [Python Imaging Library (PIL)](https://pythonware.com/products/pil/) has already been installed in your Python system.

_If you are viewing this in Jupyter then you can edit the code simply by typing in the boxes. You can also execute the code in any box by clicking on the box and typing SHIFT + ENTER or using the 'Run' button in the menubar above._

# Setting the Scene

Although TEI XML is normally focussed on text, more and more Digital Scholarly Editions incorporate text and image together, often with close links between the two. There are many examples of this, including the published version of Montaigne's _Essais_ that we worked with in the last worksheet and the [_Proust Prototype_](http://research.cch.kcl.ac.uk/proust_prototype/) edition by Elena Pierazzo and Julie André. This combination of text and image is an area where Python gives significant advantages, as there are many things that we can do here which are impossible with XSLT (and, indeed, difficult with many other programming languages).

To give some examples, we've already seen our TEI XML document for _Les Essais_ de Michel de Montaigne, and we remember that each `<pb>` element has a `@facs` attribute which points to a digital image of that page. We can use Python to automatically harvest these images and manipulate them in different ways. For instance, we might want to recreate something like the [BVH text-image view](http://xtf.bvh.univ-tours.fr/xtf/view?docId=tei/B330636101_S1238/B330636101_S1238_tei.xml), with thumbnail images for each page. Alternatively, we might want to do some sort of analysis of the images, for instance looking at the ['manuscript average'](http://jessehurlbut.net/wp/mssart/?page_id=2097) of the page layout in the way that Jesse Hurlbut has done. In any case, we can use Python to automatically find, download, save and manipulate the images as well as the XML-encoded text that goes with them.

# The Python Imaging Library (PIL)

The PIL gives us a lot of basic features for manipulating images. Here are some of the key ones. Remember that you must import the PIL library before you use the commands. We don't need the whole library, and it's relatively inefficient to import everything, so let's just import some parts. We will also need a couple of libraries to help us download images from the Internet, so let's import them as well.

In [None]:
from PIL import Image
from PIL import ImageOps
from PIL import ImageChops

import requests
import io

## Opening and saving image files; showing images

To load and save images we need to tell Python where the image can be found. It's relatively easy to open a file which is already saved on our disk, but it's more interesting to download it directly from the internet.

**Be careful when saving images as you can easily overwrite your data by mistake**. If you try to save an image and a file exists already at that path and with that file name then **Python will simply overwrite your existing file without even asking**.

In [None]:
# Just use the first address for now
im_addr = "http://gallica.bnf.fr/ark:/12148/bpt6k11718168/f8.highres"

# Now we read the image. Don't worry how this works (it's complicated!), just remember that you can
# copy and paste this to download the image stored at im_addr and keep it in the variable im for use with PIL.

res = requests.get(im_addr)
res.raise_for_status()  # Stop with a clear error message if the download failed
im = Image.open(io.BytesIO(res.content))

# Now we show the image
im.show()

# Let's save a copy of the image on your computer
im.save('newimage.jpg')
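
Because `save()` overwrites existing files without asking, one possible safeguard is to choose an unused file name before saving. The following is only a sketch: `unused_path` is a hypothetical helper name (it is not part of PIL), and it relies on Python's standard `pathlib` module.

```python
from pathlib import Path

def unused_path(path):
    """Return `path` if nothing exists there yet, otherwise a numbered variant.

    Hypothetical helper, not part of PIL: 'newimage.jpg' becomes
    'newimage_1.jpg', then 'newimage_2.jpg', and so on until the name is free.
    """
    path = Path(path)
    candidate = path
    counter = 1
    while candidate.exists():
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
        counter += 1
    return candidate

# With the image loaded above, this could be used as:
#     im.save(unused_path('newimage.jpg'))
```

The helper only chooses a name; the actual saving is still done by `im.save()`.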

## Getting basic image information

Now let's try getting some basic information about the image. You should be able to figure out how this works. Just remember that the image itself is stored in `im`, which is a PIL image, and this allows us to do certain things with it like the following:

In [None]:
print('Image size:', im.size)
print('Image height:', im.height)
print('Image width:', im.width)
print('Image format:', im.format)
print('Image mode:', im.mode)
print('Image info:', im.info)
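
To see what these properties mean without depending on a download, here is a small sketch using a blank image created in memory. The name `demo` and the dimensions are arbitrary choices for illustration.

```python
from PIL import Image

# A blank 300 x 200 pixel image, used only as a stand-in for a real page image
demo = Image.new('RGB', (300, 200), 'white')

width, height = demo.size      # size is always given as (width, height)
print('Width and height:', width, height)
print('Mode:', demo.mode)      # 'RGB' means three colour channels; 'L' would be greyscale
print('Format:', demo.format)  # None here, because this image was not read from a file
```

For an image read from a JPEG file or download, `format` would instead report the file format that PIL detected.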

## Rotation and reflection

**Note that each of these operations creates a new image**. We store the result in a new variable. This is (usually) what we want as that means we keep the original image; otherwise we could store the result in the original variable, overwriting the original image.

In [None]:
# Flip - i.e. mirror vertically
image_flipped = ImageOps.flip(im)
image_flipped.show('Flipped image')

# Mirror - i.e. mirror horizontally
image_mirror = ImageOps.mirror(im)
image_mirror.show('Mirrored image')

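# Rotate. NB: positive angles rotate anticlockwise, negative angles clockwise.
# By default the result keeps the original size, so the corners are cropped;
# expand=True enlarges the result so that the whole rotated image fits.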
image_rotate1 = im.rotate(45)
image_rotate2 = im.rotate(-45)
image_rotate_nocrop = im.rotate(135, expand=True)

image_rotate1.show()
image_rotate2.show()
image_rotate_nocrop.show()
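
The effect of `expand=True` is easiest to see in the image sizes. This is a minimal sketch using a blank image created in memory (the name `demo` and its dimensions are arbitrary), rotated by 90 degrees so that the numbers are easy to check.

```python
from PIL import Image

demo = Image.new('RGB', (300, 200), 'white')  # stand-in for a real page image

cropped = demo.rotate(90)                # size stays 300 x 200, so part of the image is cut off
expanded = demo.rotate(90, expand=True)  # size becomes 200 x 300 to fit the rotated image

print('Original:', demo.size)
print('Rotated without expand:', cropped.size)
print('Rotated with expand:', expanded.size)
```

In both cases `demo` itself is left untouched, since `rotate()` returns a new image.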

## Resizing Images

This works much the same way, except that we have to specify the size of the new image. Note that this can badly distort the image unless we calculate the new size so as to preserve the proportions (the aspect ratio) of the original.

In [None]:
size = 400, 200

image_resized = im.resize(size)

image_resized.show()

# Let's do this properly, calculating the width that preserves the image ratio
new_height = 400

new_width = int((new_height / im.height) * im.width)

size2 = (new_width, new_height)
image_resized2 = im.resize(size2)
image_resized2.show()
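
The ratio calculation can also be wrapped in a small function so that it can be reused for any target height. This is a sketch only: `scaled_size` is a hypothetical name, and it rounds to the nearest pixel with `round()` rather than truncating with `int()` as in the cell above.

```python
def scaled_size(width, height, new_height):
    """Return (new_width, new_height) with the same proportions as (width, height)."""
    new_width = round(width * new_height / height)
    return (new_width, new_height)

# A page of 3000 x 4000 pixels scaled to a height of 400 pixels
print(scaled_size(3000, 4000, new_height=400))  # (300, 400)
```

With the image loaded earlier, the result could be passed straight to `resize()`, for example `im.resize(scaled_size(im.width, im.height, 400))`.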

### Creating thumbnails

As well as `resize()` above, PIL provides a special way for creating thumbnails. If we want to create thumbnail images we first need to specify the size of the thumbnail. This is the maximum size in each dimension. In other words, the longest side of the image will be set to this maximum value, and the shorter side of the image will be scaled to whatever size is appropriate to keep the proportions right.

**Note that the thumbnail operation does not create a new image but overwrites the old image**. For this reason we may well want to create a copy of the original image so we can use it again later.

In [None]:
size = 128, 128               # Set size to 128 pixels in both directions
im_thumbnail = im.copy()      # Create a copy of the image
im_thumbnail.thumbnail(size)  # Turn the image into a thumbnail

im_thumbnail.show()
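
The difference between `thumbnail()` and the operations seen so far can be checked by looking at the sizes before and after. The sketch below uses a blank portrait-shaped image created in memory; the names and dimensions are arbitrary choices for illustration.

```python
from PIL import Image

original = Image.new('RGB', (600, 800), 'white')  # stand-in for a portrait page
thumb = original.copy()

result = thumb.thumbnail((128, 128))  # changes thumb itself and returns nothing

print('Original:', original.size)   # unchanged, because we worked on a copy
print('Thumbnail:', thumb.size)     # longest side is 128, the other side is scaled to match
print('Returned value:', result)    # None
```

Because `thumbnail()` returns `None`, writing `im = im.thumbnail(size)` would lose the image altogether.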

## Basic image enhancement

The PIL gives us some basic functions for image enhancement. These include the examples given below. Note that this time we are not creating a new variable for each image but we are re-using one variable.

In [None]:
# Automatically enhance the contrast
im_enhanced = ImageOps.autocontrast(im)
im_enhanced.show()

# Automatically equalise the histogram (spread the brightness values more evenly)
im_enhanced = ImageOps.equalize(im)
im_enhanced.show()
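
To get a feeling for what `autocontrast()` does, it can help to try it on a tiny artificial image where the numbers are easy to follow. This sketch builds a one-row greyscale image (mode `'L'`) whose pixel values are all squeezed between 100 and 151; the name `dull` and the values are made up for illustration.

```python
from PIL import Image, ImageOps

# A tiny, low-contrast greyscale image: pixel values run from 100 to 151
dull = Image.new('L', (52, 1))
for x in range(52):
    dull.putpixel((x, 0), 100 + x)

stretched = ImageOps.autocontrast(dull)

# getextrema() gives the darkest and lightest pixel values in the image
print('Before:', dull.getextrema())
print('After:', stretched.getextrema())
```

The darkest pixel becomes black (0) and the lightest becomes white (255), with everything else stretched in between.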

# Going Further

This gives you some idea of the possibilities that Python allows. Some things that you can now do include:

* Find all the images in a TEI XML file, make copies of the images, and convert the copies into black and white.
* Find the addresses of all images tagged with a given attribute in a TEI file and automatically adjust the image contrast and histogram of those images, saving the results in a new directory.
* Go through a directory and automatically convert all JPG files in the directory to thumbnails. (For instructions on how to process all the files in a directory, see ['Automatic Batch Processing of a Set of Images'](https://github.com/pastokes/MS-images/blob/master/1.%20Image%20Analysis%20with%20PIL.ipynb) in the worksheet that I prepared for a different course.)

And so on. We will see more possibilities in the following worksheets, but in the meantime, use your imagination and don't be afraid to play and see how things work.
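
As a possible starting point for the first two ideas in the list above, here is a sketch of how the image addresses might be collected from the `@facs` attributes of `<pb>` elements. It makes several assumptions: it uses Python's standard `xml.etree.ElementTree` module (which may differ from the approach in the previous worksheet), the TEI fragment and its addresses are invented for illustration, and `facs_addresses` is a hypothetical function name.

```python
import xml.etree.ElementTree as ET

# The TEI namespace, needed to find elements in a TEI document
TEI_NS = {'tei': 'http://www.tei-c.org/ns/1.0'}

# An invented TEI fragment; the facs addresses are placeholders, not real images
sample_tei = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <pb n="1" facs="https://example.org/images/page1.jpg"/>
    <p>Some text</p>
    <pb n="2" facs="https://example.org/images/page2.jpg"/>
    <pb n="3"/>
  </body></text>
</TEI>"""

def facs_addresses(tei_xml):
    """Return the @facs value of every <pb> element that has one, in document order."""
    root = ET.fromstring(tei_xml)
    page_breaks = root.iterfind('.//tei:pb', TEI_NS)
    return [pb.get('facs') for pb in page_breaks if pb.get('facs')]

for address in facs_addresses(sample_tei):
    print(address)
```

Each address found this way could then be downloaded and opened in the same way as `im_addr` earlier in this worksheet.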


---
![Licence Creative Commons](https://i.creativecommons.org/l/by/4.0/88x31.png)
This work (the contents of this Jupyter Python notebook) is licensed under a [Creative Commons Attribution 4.0 International](http://creativecommons.org/licenses/by/4.0/) licence.