**By Peter A. Stokes, École Pratique des Hautes Études – Université PSL**

_These are brief notes and exercises on working with TEI XML files using Python. They are intended as teaching aids for the course on 'Image Processing with Python' which is part of the Atelier de formation annuel du Consortium Cahier on the topic of 'Exploiter les corpus d'auteurs' in Poitiers, 18--20 June 2019. For more details see https://cahier.hypotheses.org/4662_

These notes assume a good knowledge of TEI XML but assume **no experience or knowledge at all** in programming.

_If you are viewing this in Jupyter then you can edit the code simply by typing in the boxes. You can also execute the code in any box by clicking on the box and typing SHIFT + ENTER or using the 'Run' button in the menubar above._

# Python basics

This section gives a very brief summary of the basics of the Python 3 programming language. If you already know Python 3 then you can skip this section. If you know an earlier version such as Python 2.7 then you can also skip it, but be aware that `print 'hello'` is no longer valid: Python 3 always requires the parentheses, as in `print('hello')`.

## Variables

To create a new variable for storing data in memory, simply provide a unique name for that variable and use `=` to assign the content. Note that you can re-assign different content to an existing variable, in which case the new content will simply replace the old (hence the name 'variable'). You can have as many different variables as you like, as long as your computer doesn't run out of RAM. This is unlikely with modern computers, but it is possible if you have very large images.

Notice also the `#` symbol. This is to signal a 'comment': i.e. everything after `#` on that line is a comment for us humans to read and so will be ignored by Python. It is very good practice to add comments as a reminder to you and a message to others of what your code does. You will be grateful when you come back to your code in a year's time!

In [None]:
a = 1      # Stores the integer (whole number) value 1 in the variable a
b = 2.0    # Stores the decimal value 2.0 in the variable b

c = a + b  # Stores the decimal value 3.0 (1 + 2.0) in the variable c
c = c + 1  # Stores the decimal value 4.0 (3.0 + 1) in the variable c

d = c / b  # Stores the decimal value of c divided by b into the variable d
e = b * c  # Stores the decimal value of b multiplied by c into the variable e

print(c)   # Prints the value currently stored in c (i.e. 4.0)
print(d)
print(e)
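
As a small illustration of re-assignment, the sketch below shows that a variable keeps only its most recent value, and that combining an integer with a decimal number gives a decimal result. The names `count`, `price` and `total` are made up for this example; `type()` is a built-in Python function that reports what kind of data a variable currently holds.

```python
count = 3               # An integer (whole number)
price = 2.5             # A decimal number (Python calls this a 'float')

total = count * price   # Integer times decimal gives the decimal value 7.5
print(total)
print(type(count))      # Reports that count holds an int
print(type(total))      # Reports that total holds a float

count = count + 1       # Re-assignment: count now holds 4 and the old value is gone
print(count)
```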


Variables do not have to contain numbers: they can hold many kinds of data, including images (as we will see soon). Another common type of data is a string, namely a series of characters:

In [None]:
s1 = 'This is a string. It must be enclosed in quotes (single or double).'
s2 = 'The quotes tell Python that it is a string.'
s3 = 'Otherwise, Python might think that it is the name of a variable.'

print(s1)
print(s2)
print(s3)

print('You can also print a string directly without storing it first.')
print(s1, s2, s3) # Notice what happens here
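
Python accepts both single and double quotes around a string, as long as the opening and closing quotes match. The short sketch below (with made-up variable names) shows this, along with two other things that are often useful: joining strings with `+` and counting characters with the built-in `len()` function.

```python
greeting = "Double quotes work too."                        # Same kind of string as with single quotes
quoted = "It's handy when the text contains an apostrophe."
joined = greeting + ' ' + quoted                            # + joins (concatenates) strings

print(joined)
print(len(greeting))     # The number of characters in the string
print('a', 'b', 'c')     # Several values are printed on one line, separated by spaces
```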

## Libraries

If you want to use a library of existing code then you **must** first tell Python to load it into your system **before** you use the library code. You can import an entire library, but normally you only import specific parts from that library. For this you use the `import` or `from ... import` command. You can also add `as` to give the library a short name if you want, as we do in the example below: `matplotlib.pyplot` is a pain to type so we give it the name `plt` for short. These are the libraries that we will be using today:

In [None]:
from PIL import Image
from PIL import ImageOps
from PIL import ImageChops
import matplotlib.pyplot as plt
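
To see the three forms of `import` side by side without needing any extra installation, here is a minimal sketch using two libraries that come with Python itself (`math` and `os.path`). The short name `osp` and the file path are arbitrary choices for this example.

```python
import math                  # Import the whole library; its contents are used as math.<name>
from math import sqrt        # Import one function only; it can then be used directly
import os.path as osp        # Import under a short name of our choosing

print(math.pi)                             # The constant pi from the math library
print(sqrt(16))                            # 4.0
print(osp.basename('images/page1.jpg'))    # page1.jpg (the file name without the folder)
```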

## Lists

Lists are a more complex type of data, but one that is very useful as it allows us to store a list of things in a single variable. To do this we use square brackets, with the contents of the list in the brackets separated by commas. We can have lists of anything we want: integers, decimal numbers, strings, images, ...

In [None]:
list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
list2 = [1.0, 2.6, 3.3, 4.7, 5.1, 6.7]
list3 = ['a', 'b', 'c', 'd', 'e']
list4 = ['bonjour', 'au revoir', 'ça va ?', 'très bien']

print(list1)
print(list2)
print(list3)
print(list4)

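At times we may want to access specific items in the list. To do this we write the position of the item in square brackets after the name of the list. Note that Python counts from 0, so the first item is at position 0. Negative positions count back from the end of the list, and `start:end` gives every item from position `start` up to, but not including, position `end`: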

In [None]:
list1[0]      # Gives us the first item in the list.
list2[-1]     # Gives us the last item in the list
list3[0:3]    # Gives us the first three items in the list
list3[:3]     # Also gives us the first three items in the list
list4[-2:]    # Gives us the last two items in the list
list1[2:5]    # Gives us the third (!) through fifth items in the list

# Let's test it:
print(list1[0])      # Gives us the first item in the list.
print(list2[-1])     # Gives us the last item in the list
print(list3[0:3])    # Gives us the first three items in the list
print(list3[:3])     # Also gives us the first three items in the list
print(list4[-2:])    # Gives us the last two items in the list
print(list1[2:5])    # Gives us the third (!) through fifth items in the list
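
The same rules can be checked on a short list where the positions are easy to follow. This is only an illustrative sketch; the list `letters` and the other variable names are made up for the example.

```python
letters = ['a', 'b', 'c', 'd', 'e']

first = letters[0]        # 'a': counting starts at 0, not 1
last = letters[-1]        # 'e': negative positions count from the end
middle = letters[1:4]     # ['b', 'c', 'd']: the item at position 4 is not included

print(first, last, middle)
print(len(letters))       # 5: the number of items in the list
```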



## Loops

One of programming's biggest strengths is being able to do things again and again, automatically and very quickly. To do this, we need to use loops: that is, we tell Python:

For every item i in a list l:
   * Do this
   * Do that
   * Etc.

Let's demonstrate this with a simple list. Let's say we want to do the following:

1. Create a list of five numbers and save it in a variable.
1. `For` each number `in` the list:
   1. Add ten to it
   1. Print it out
1. After we've gone through the whole list then print a message ("Finished!")
   
This is what the code looks like:

In [None]:
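numbers = [2, 4, 6, 8, 10]   # Not named 'list', which would hide Python's built-in list

for n in numbers:
    temp = n + 10            # Add ten to the current number
    print(temp)              # Print the result
print("Finished!")           # Not indented, so this runs once, after the loop ends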
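
Often we want to keep the results of a loop rather than just print them. One common way, sketched here with made-up variable names, is to start with an empty list and add each new value to it with `append`, a standard operation on Python lists. Notice that the indentation decides which lines belong to the loop.

```python
numbers = [2, 4, 6, 8, 10]
results = []                  # An empty list to collect the new values

for n in numbers:
    results.append(n + 10)    # Indented, so it runs once for each number

print(results)                # Not indented, so it runs once, after the loop
```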

---
![Licence Creative Commons](https://i.creativecommons.org/l/by/4.0/88x31.png)
This work (the contents of this Jupyter Python notebook) is licensed under a [Creative Commons Attribution 4.0 International](http://creativecommons.org/licenses/by/4.0/) licence.