<a href="https://colab.research.google.com/github/esohman/EADH/blob/main/EADH_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Python for Digital Humanities

Welcome to the introduction to Python for Digital Humanities notebook.
This notebook has been created by Emily Öhman, Assistant Professor of Digital Humanities at Waseda University, Japan, and funded by the European Association for Digital Humanities. This is the pre-release version. Changes are possible.

This is the first step to learning Python. The target audience for these tutorials is digital humanities, and other cross-disciplinary scholars and students who are not aiming to become programmers as such, but who wish to gain code literacy and the tools to do most of the computational side of their research themselves.

If you already know the basics of Python, you can move on to the next notebook (Beginner, Intermediate, or Advanced). If you know another programming language, you can use this notebook as a quick guide to Python syntax.

There are many online tutorials and courses out there, and some of them are even aimed at digital humanities scholars. This notebook differs from those in at least three respects: 1) there is a clear progression from total beginner up to machine learning, 2) it is interactive and requires no installations, and 3) existing tools are linked to in order to minimize unnecessary overlap.

This notebook gives a cursory overview of Python concepts with links to more in-depth sources of information. For this first part, most of the links will be to the [w3schools Python page](https://www.w3schools.com/python/default.asp). This is a great free resource that lists all the basic functionalities of Python and allows you to run example code directly in your browser.

This first notebook covers the topics of:

1. Introduction to different data types and the basics of programming.
2. Introduction to loops.
3. Using packages, opening, modifying, and saving files.

You can run Python code using a terminal-based application or an interpreter, or a notebook such as this. Most things work exactly the same whether you run them here or somewhere else. The code is identical.

**To run code** in this notebook, save a copy to your own Drive (File -> Save a copy in Drive). If you don't have a Google account, you can open this notebook in Jupyter Notebook instead. Once you've done that you can start playing with it. We recommend saving this notebook in your Drive and using it with Google Colab for simplicity and ease.

Click on the "play" button on the left side of the next 'box'. These boxes are referred to as cells. (The button appears when you hover your cursor over the [    ] brackets.) Alternatively, open the Runtime menu and choose the appropriate option.

There's a lot of information in the beginning. These are concepts we will come back to over and over again and you do not need to remember everything from the get go.

## The first code

The cell below this one is a code cell.
```
# the hashtag indicates a comment. 
#This will not be interpreted as code and can be used to clarify code
```
The only code here is 
````
1+2
````
As a side note, if you run your code as a script in a terminal, you will need to specify that you want to print to the console. In notebooks such as this one, the result of the last line of a cell is displayed automatically. We will return to the print function later, but for starters, to print the result of 1+2 in a terminal environment, you would type:
```
print(1+2)
```


In [None]:
#You can change these numbers to whatever you want. Click the "play" button again to get the answer.
#btw, the hashtag indicates a comment that is to be ignored by the interpreter 
#comments make code easier to understand as you can explain in plain language what your code does

#here we are simply adding 1 and 4, click play for the result
print(1+4)

5


Python syntax is minimalistic compared to many other programming languages. You can read more about the basics of Python syntax here: https://www.w3schools.com/python/python_syntax.asp

One important thing to note is that Python uses indentation (four spaces, or one tab) to show what belongs with what. This is very important for code functionality, and in most cases your code will not even run if you are using the wrong indentation. We will discuss this more in later sections, but for now, all our code will be on the same "level" and therefore no indentation is necessary.

### Variables and the print function
To do the same thing using variables, we can store the values 1 and 2 in two variables. Let's call these variables a and b. In most programming languages, including Python, values are assigned using a single 'equals sign': =

You can read more about variables here: https://www.w3schools.com/python/python_variables.asp

In [None]:
a = 1  #a and b are called variables. Here we assign them values. 
b = 2 #the value of a is now '1' and the value of b is now '2'
print(a+b) #In most environments in order for our program to print any results to the screen, we need to use the print function.
print('The result of ' + str(a) + '+' + str(b) + ' is: ' + str(a+b)) #You can print it like this
print(f'The result of {a}+{b} is: {a+b}')  #But it's usually much easier to do it like this:

#Let's change the value of one of the variables
a = 5
print(f'The result of {a}+{b} is: {a+b}')  #Let's print it again.

3
The result of 1+2 is: 3
The result of 1+2 is: 3
The result of 5+2 is: 7


The print command prints whatever you tell it to print. You use it with parentheses. As you can see, we used two slightly different print commands:
```
print()
```
and
```
print(f'')
```
The benefit of using print(f'') is that you can easily combine different datatypes and even operations when printing. When using print(f'') you can use curly brackets to print variables within the string. In this tutorial we will mainly be using print(f'').

If you want to print actual text (i.e. not variables) you need to use quotation marks, single or double both work.

In [None]:
print('Hello!')

Hello!


You have probably noticed that different words appear in different colors. This is called syntax highlighting, and it is a feature of most code editors and notebooks. Literals, variables, functions, comments, etc. all appear in different colors to make code more legible to humans.

# Datatypes
Almost all programming languages have different data types. You can read more about data types in Python here: https://www.w3schools.com/python/python_datatypes.asp

The most basic types are:


## Integers
Whole numbers, no decimals. (By clicking 'play' you are assigning the given values to the variables. If you do not click 'play', the variables int_a etc will be undefined and the program will not know what they mean and will return an error.)

In [None]:
int_a = 1
int_b = 5
int_c = -5


## Floats
Numbers with decimals. Note that the decimal point is a period, not a comma.

In [None]:
fl_a = 0.1234
fl_b = -5.78899
fl_c = 1234.6543

Floats, integers and other numerical types support all basic mathematical operations. Note that when mixing numerical types like integers and floats, the result will be a float.
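
The rule about mixing numerical types can be checked with the built-in type() function. The following is a minimal sketch; the variable names are made up for this example.

```python
whole = 3        # an integer
decimal = 0.5    # a float

mixed_sum = whole + decimal   # integer + float gives a float
division = 6 / 3              # the / operator gives a float, even for two integers

print(mixed_sum, type(mixed_sum))
print(division, type(division))
```

Note that ordinary division with / returns a float even when both numbers are integers.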

In [None]:
print('Addition: ', int_a + fl_a)
print('Subtraction: ', fl_b - fl_c)
print('Multiplication: ', int_c * fl_c)
print('Division: ', fl_c / int_c)
print('Exponentiation: ', int_b ** int_a)

Addition:  1.1234
Subtraction:  -1240.44329
Multiplication:  -6173.2715
Division:  -246.93086
Exponentiation:  5


In [None]:
# Be careful not to divide by zero!
int_a / 0

ZeroDivisionError: division by zero

Notice how the error message tells you what kind of error this is and gives you extra details about it as well (although in this case it is quite straightforward). It also shows where the error happened in the code under "Traceback" and draws a red squiggly line under the error in the code cell. Sometimes the error occurs in code that you have imported from another library. This typically means that there is something off about the data type you passed to it.

Whether you are running this code on Colab or your own computer (in the terminal, using PyCharm or VSCode), the error message will still look pretty much the same.

## Strings
Strings are basically text. The text can include numbers.
Strings are defined using either single or double quotation marks.


In [None]:
string_a = '' # this is an empty string
string_b = 'Awesomeness!' #this is a non-empty string

In [None]:
'This is a string. It can be long or short. It can include numbers like 10. It can also include "quotation marks", they just have to be different from the main quotation marks.'

'This is a string. It can be long or short. It can include numbers like 10. It can also include "quotation marks", they just have to be different from the main quotation marks.'

Strings can be added and multiplied, much like numbers:

In [None]:
a = 'kitty'
print(a + 'cat')
print(a*10)

kittycat
kittykittykittykittykittykittykittykittykittykitty


All elements of a string have indexes. These start from 0. You can access elements of a string using positive or negative numbers within square brackets. Negative numbers count from the end of the string. You can also slice a string using square brackets and colons.

**All indexes in Python start at 0. The first element of a list or string always has index 0. In a list of 10 items, the indexes run from 0-9.**

You can read more about strings and "slicing" (using square brackets to access different parts of a string) here: https://www.w3schools.com/python/python_strings.asp
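
A slice is written as [start:stop:step]. The start index is included, the stop index is not, and any of the three parts can be left out. The following sketch shows a few common combinations; the example word is arbitrary.

```python
word = 'humanities'   # an example string, 10 characters long

first_five = word[:5]     # from the start up to, but not including, index 5
from_five = word[5:]      # from index 5 to the end
middle = word[2:6]        # indexes 2, 3, 4 and 5
every_second = word[::2]  # every second character, starting from index 0
backwards = word[::-1]    # a step of -1 walks through the string in reverse

print(first_five, from_five, middle, every_second, backwards)
```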

In [None]:
string = 'kittycat'
print('First letter: ', string[0])
print('Last letter: ', string[-1])
print('Length of the string: ', len(string))
print('Some slices: ', string[:5], string[::2])

First letter:  k
Last letter:  t
Length of the string:  8
Some slices:  kitty ktya


**NB!** Notice that you do not have to explicitly state the type of a variable in Python. Python understands that you mean a string when you use quotes, integers if the number has no decimal points, floats if there are decimal points, list if you use square brackets, dictionaries when you use curly brackets etc.
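
You can check which type Python has chosen with the built-in type() function. A small sketch with made-up variable names:

```python
text_type = type('text')            # quotes: a string
integer_type = type(3)              # no decimal point: an integer
float_type = type(3.0)              # decimal point: a float
list_type = type([1, 2, 3])         # square brackets: a list
dict_type = type({'key': 'value'})  # curly brackets with key: value pairs: a dictionary

print(text_type, integer_type, float_type, list_type, dict_type)
```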

In [None]:
# This will automatically be interpreted as a string:
s = 'a'

# This will automatically be interpreted as an integer:
i = 3

Sometimes you can change the variable's type:

https://www.w3schools.com/python/python_casting.asp
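
Changing a type is done with the built-in functions int(), float() and str(). The following sketch shows conversions that work; the variable names are made up for illustration.

```python
number_as_text = '42'   # a string that happens to contain only digits

as_integer = int(number_as_text)     # string to integer
as_float = float(number_as_text)     # string to float
back_to_text = str(as_integer + 1)   # integer to string

print(as_integer + 1)       # arithmetic works now that the value is an integer
print(as_float)
print(back_to_text + '!')   # and string operations work on the string
```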

In [None]:
# Turn an integer into a string
str(i)
type(str(i))

str

In [None]:
# Sometimes, however, it won't work:
int(s)
#Can you figure out why?

ValueError: invalid literal for int() with base 10: 'a'

## Booleans  
Statements that can be True or False. Note that you have to capitalize the first letter. You cannot give a variable a name such as True or False. These are known as reserved keywords: https://www.w3schools.com/python/python_ref_keywords.asp

You should also avoid using the names of built-in functions and types, such as 'int' or 'str', as variable names. Python will not stop you, but the new variable hides the built-in, which leads to confusing errors later on.

You can read more about Booleans and how to use them here: https://www.w3schools.com/python/python_booleans.asp
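
If you are unsure whether a word is a reserved keyword, the standard library's keyword module can tell you. This is a small sketch and not something you need to memorize.

```python
import keyword

print(keyword.iskeyword('True'))   # True is a reserved keyword
print(keyword.iskeyword('for'))    # so is for
print(keyword.iskeyword('int'))    # int is a built-in name, not a keyword
print(len(keyword.kwlist))         # the number of keywords depends on the Python version
```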


In [None]:
t = True
f = False

In [None]:
# Remember: string = 'kittycat'
hello = string[-1] != string[6] #the last character ('t') is not the same as the seventh character ('a'), so this is True
print(hello)

True


In [None]:
bool_false = len(string) == 7
print(bool_false)
print(len(string))
#How "long" (how many characters are there) is string? How would you check that?

False
8


You may have noticed that we used single and double equals signs on the same lines. In Python, **a single "="** is used to assign a value to a variable, whereas **a double "=="** is used to compare values. The double equals sign can be read as an "is" statement. The opposite of "==", an "is not" statement, is **"!="**.
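
Python also has the keywords is and is not, which appear in some of the cells below. They are stricter than == and !=. The keyword is checks whether two names refer to the very same object, not just whether their values are equal. A small sketch with made-up variable names:

```python
list_a = [1, 2, 3]
list_b = [1, 2, 3]   # a separate list with the same contents
list_c = list_a      # a second name for the very same list

same_value = list_a == list_b    # compares the contents
same_object = list_a is list_b   # compares the objects themselves
same_name = list_a is list_c     # both names point to one object

print(same_value, same_object, same_name)
```

For comparing values such as numbers and strings, == and != are usually the right choice.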

In [None]:
var = 'variable'

In [None]:
var == 'variable' #This is obviously True since we defined it as such in the previous cell

True

In [None]:
# you can use the type() function in Python to check the variable's type. The "var" variable here is a string.
type(var) == int

False

In [None]:
type(var) is str

True

In [None]:
type(var) is not float

True

In [None]:
type(var) != str

False

## Printing output
print() is the basic output function in Python. However, it is rather cumbersome to print more than single variables using just print(). Therefore we'll now learn to use print(f'') instead.

print(f'') allows us to seamlessly include variables and even functions into our output.

In [None]:
g = 'words'
n = 2
m = 4
print('I want to print my variables' + g + str(n) + str(m)) #How can you fix this error?
print('I want to print my variables\n' + 'This is g: '+ g + ' This is n: '+ str(n) + 'This is m: '+ str(m)) #How can you fix this error?
print(f'I want to print my variables. This is g: {g}, this is n: {n} and this is m: {m} This is n+m: {n+m} and this n*g: {n*g}')

I want to print my variableswords24
I want to print my variables
This is g: words This is n: 2This is m: 4
I want to print my variables. This is g: words, this is n: 2 and this is m: 4 This is n+m: 6 and this n*g: wordswords



## Lists
This data type is used to group values together. You can store practically any objects in lists. Lists are defined using square brackets, and elements are separated using a comma. List indexes start from 0, so the first element of a list has index 0 and not 1, the second element has index 1, etc.

Lists are also good when you don't know how big your output will be, as you can always add more elements to your list. A list can consist of any type of data. You can have a list of integers, strings or a combination of both. You can also have a list of lists or a list of dictionaries etc.

You can read more about lists and how you can manipulate them here: 
https://www.w3schools.com/python/python_lists_add.asp
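
The following sketch shows how a list can grow and change after it has been created. The list contents are made up for illustration.

```python
words = ['digital', 'humanities']   # start with two elements

words.append('python')     # add one element to the end
words.insert(0, 'hello')   # add one element at index 0
words[1] = 'Digital'       # replace the element at index 1

print(words)
print(len(words))
```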

In [None]:
# This is a list of integers
l = [1, 2, 3]

# This is a list of lists (that was unintentionally made to look like a matrix)
ll = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(len(ll))
# And this is a list of all kinds of things!
misc = [0, 1 == 1.0, type('cat')]
print(misc)

3
[0, True, <class 'str'>]


Lists also support indexing, just like strings, and you can access elements of a list by number using square brackets.

In [None]:
print(l[0])
print(ll[-1])

# In the following statement [0, 0, 0] is a list and [0] is an index 
print([0, 0, 0][0])

1
[7, 8, 9]
0
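
Indexes can also be chained to reach elements of lists inside lists, and slices work on lists just as they do on strings. A minimal sketch, using a list of lists with the same shape as the one above:

```python
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

middle_row = grid[1]        # the second inner list
middle_value = grid[1][1]   # the second element of the second inner list
first_two_rows = grid[:2]   # a slice of a list is a new list

print(middle_row, middle_value, first_two_rows)
```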


## The Basics of Loops
[Here is a good interactive tutorial on loops in Python.](https://www.learnpython.org/en/Loops) Feel free to explore.

The classic loops are *for* and *while*.
Basically, a for loop iterates over every item in a given sequence, such as the characters in a string or the items in a list. We will mainly be using the *for* loop in this course, but you should be aware of the *while* loop too.


```
for item in sequence:
  do something
```
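
As a concrete sketch of the for loop pattern, the same syntax works for a string and for a list. The example data is made up.

```python
for letter in 'cat':   # a string is a sequence of characters
    print(letter)

titles = ['Emma', 'Persuasion', 'Sanditon']   # example data
for title in titles:   # a list is a sequence of elements
    print(f'{title} has {len(title)} characters')
```

The names letter and title are ordinary variables. You can call them anything you like, and on each round of the loop they hold the next item of the sequence.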


A while loop keeps looping for as long as its Boolean condition is True.



```
while condition: #e.g. counter < 10
  do something
```

Closely related to loops are *if-statements*. These are conditional statements where certain code is executed only if the conditions are met.



```
if condition: #e.g. a==b
  do something
elif another_condition:
  do something else
else:
  do something else or do nothing
```
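
A runnable version of the if-statement pattern might look like this. The variable word_count and the thresholds are made up for illustration.

```python
word_count = 120   # hypothetical number of words in a text

if word_count < 100:
    label = 'short'
elif word_count < 1000:   # only checked if the first condition was False
    label = 'medium'
else:                     # runs if none of the conditions above were True
    label = 'long'

print(f'A text with {word_count} words is {label}.')
```

Only one of the three branches is ever executed. Python checks the conditions from top to bottom and stops at the first one that is True.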




In [None]:
counter = 0
while counter < 10:
  print(counter)
  counter += 1

0
1
2
3
4
5
6
7
8
9


In [None]:
for i in range(10):
  print(i)

0
1
2
3
4
5
6
7
8
9


In [None]:
sequence = [4,5,10,16,15,7,25,21,45235467,23344545,234,235]  #this sequence is a random list of numbers
for item in sequence: #for every item in the sequence
  if item % 5: #if dividing by 5 leaves a remainder, the item is not divisible by 5
    print(item) #print the item
  else:
    print(f'{item} is divisible by 5')

4
5 is divisible by 5
10 is divisible by 5
16
15 is divisible by 5
7
25 is divisible by 5
21
45235467
23344545 is divisible by 5
234
235 is divisible by 5
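
The condition `if item % 5:` works because % gives the remainder of a division, and Python treats 0 as False and any other number as True. An equivalent and arguably more readable version compares the remainder to 0 explicitly. A sketch with a shorter, made-up list:

```python
numbers = [4, 5, 10, 16]   # a shorter example list
divisible = []             # collects the numbers that are divisible by 5

for number in numbers:
    if number % 5 == 0:    # a remainder of 0 means the number is divisible by 5
        divisible.append(number)

print(divisible)
```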


## Using Packages, Opening, Modifying and Saving Files

For actual projects we almost always need to use our own data. Here we will look at how to import that data, how to modify it, and how to save it again.
We will also look briefly at how we can add extra functionality to Python.
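
Extra functionality is added with the import statement. The sketch below uses two modules from Python's standard library, which need no installation. Third-party packages such as pandas are imported in the same way, but have to be installed first if your environment does not already provide them.

```python
import math                       # import a whole module
from collections import Counter   # import a single name from a module

root = math.sqrt(16)              # functions are accessed as module.function
print(root)

letters = Counter('kittycat')     # counts how often each character occurs
print(letters.most_common(1))     # the most frequent character and its count
```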


## Reading in a csv file as a dataframe
This will be shown in detail in the intermediate notebook.

In [None]:
#Read in csv file as dataframe
import pandas as pd
df = pd.read_csv("data.csv")

## Reading in a text file

In [None]:
#The basic approach. NOT RECOMMENDED: if an error occurs before close(), the file stays open
file = open('example.txt')
#do something
file.close()

In [None]:
reader = open('example.txt')
try:
  # Further file processing goes here
  pass
finally:
  reader.close()

In [None]:
#the same as above, but the file is closed automatically at the end of the with block
with open('example.txt') as reader:
    # Further file processing goes here
    pass

In [None]:
#get all the contents
with open('example.txt') as f:
    contents = f.read()
    print(contents) #contents is already stored, so this line would also work without indentation, after the with block

FileNotFoundError: [Errno 2] No such file or directory: 'example.txt'
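
The cells in this section assume that a file called example.txt exists in the folder you are working in. If it does not, Python raises a FileNotFoundError. The sketch below shows one way to check for a file and to create a small one for practice. It uses a temporary folder so that nothing on your drive is overwritten, and the file contents are made up.

```python
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())   # a fresh, empty temporary folder
path = folder / 'example.txt'

existed_before = path.exists()      # nothing has been written yet

path.write_text('Beautiful is better than ugly.\n', encoding='utf8')
existed_after = path.exists()       # the file is there now

with open(path, encoding='utf8') as f:
    contents = f.read()

print(existed_before, existed_after, contents)
```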

In [None]:
#to get a list of strings
lines = []
with open('example.txt') as f:
    lines = f.readlines()

count = 0
for line in lines:
    count += 1
    print(f'line {count}: {line}')    #Any other way to get the number of lines?

line 1: The Zen of Python, by Tim Peters

line 2: 

line 3: Beautiful is better than ugly.

line 4: Explicit is better than implicit.

line 5: Simple is better than complex.

line 6: Complex is better than complicated.

line 7: Flat is better than nested.

line 8: Sparse is better than dense.

line 9: Readability counts.

line 10: Special cases aren't special enough to break the rules.

line 11: Although practicality beats purity.

line 12: Errors should never pass silently.

line 13: Unless explicitly silenced.

line 14: In the face of ambiguity, refuse the temptation to guess.

line 15: There should be one-- and preferably only one --obvious way to do it.

line 16: Although that way may not be obvious at first unless you're Dutch.

line 17: Now is better than never.

line 18: Although never is often better than *right* now.

line 19: If the implementation is hard to explain, it's a bad idea.

line 20: If the implementation is easy to explain, it may be a good idea.

line 21: Namespac
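
The empty lines in the output above appear because every line returned by readlines() still ends with a newline character, and print() adds another one. As for other ways of counting lines, len() and enumerate() are two options. A minimal sketch, using a short stand-in list instead of a real file:

```python
# A short stand-in for the list that readlines() returns
lines = ['The Zen of Python, by Tim Peters\n', '\n', 'Beautiful is better than ugly.\n']

total = len(lines)   # len() counts the elements of the list
print(f'Number of lines: {total}')

# enumerate() supplies the counter, so no separate count variable is needed
for number, line in enumerate(lines, start=1):
    print(f'line {number}: {line.rstrip()}')   # rstrip() removes the trailing newline
```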

In [None]:
#state the file's encoding explicitly
with open('example.txt', encoding='utf8') as f:
    for line in f:
        print(line.strip())

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
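
The encoding matters as soon as a text contains characters outside the basic English alphabet. If you leave the encoding out, Python falls back on a default that can depend on your operating system, so it is safer to state it. The sketch below writes and reads a made-up line of text in a temporary file.

```python
import os
import tempfile

text = 'Öhman, café, 東京\n'   # made-up text with non-ASCII characters

# Write to a temporary file so that no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), 'encoding_demo.txt')
with open(path, 'w', encoding='utf8') as f:
    f.write(text)

# Read the file back using the same encoding
with open(path, encoding='utf8') as f:
    read_back = f.read()

print(read_back == text)
```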


## Writing files

In [None]:
with open('example.txt', 'r') as reader:
    # Note: readlines doesn't trim the line endings
    example = reader.readlines()

with open('example_reversed.txt', 'w') as writer:
    # Alternatively you could use
    # writer.writelines(reversed(example))

    # Write the lines to the file in reversed order
    for line in reversed(example):
        writer.write(line)
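
If you do not have an example.txt at hand, the same idea can be tried in a self-contained way. The sketch below first creates a small source file in a temporary folder, then writes its lines to a second file in reverse order. The file contents are made up.

```python
import os
import tempfile

folder = tempfile.mkdtemp()   # temporary folder, so nothing on your drive is overwritten
source = os.path.join(folder, 'example.txt')
target = os.path.join(folder, 'example_reversed.txt')

with open(source, 'w', encoding='utf8') as writer:
    writer.write('first line\nsecond line\nthird line\n')

with open(source, encoding='utf8') as reader:
    lines = reader.readlines()

with open(target, 'w', encoding='utf8') as writer:
    writer.writelines(reversed(lines))   # writes the lines in reverse order

with open(target, encoding='utf8') as reader:
    reversed_text = reader.read()

print(reversed_text)
```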

In [None]:
with open("example.txt", "a") as myfile:
    myfile.write("appended text")
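
The second argument of open() is the mode: 'r' reads, 'w' writes and 'a' appends. The difference between 'w' and 'a' is easy to miss, so here is a small sketch using a temporary file with a made-up name.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')   # temporary file

with open(path, 'w') as f:   # 'w' creates the file, or empties it if it exists
    f.write('first\n')

with open(path, 'a') as f:   # 'a' keeps the contents and adds to the end
    f.write('appended text\n')

with open(path) as f:
    after_append = f.read()

with open(path, 'w') as f:   # opening with 'w' again discards the old contents
    f.write('replaced\n')

with open(path) as f:
    after_write = f.read()

print(after_append)
print(after_write)
```

Be careful with 'w' when working with your own data, because it overwrites an existing file without asking.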