# -1. Disclaimer
Many of the materials are gently stolen from the following courses:
- **["A Python Course for the Humanities"](https://github.com/fbkarsdorp/python-course)** a course designed by Folgert Karsdorp and Maarten van Gompel
- and later modified by Mike Kestemont and Lars Wieneke for the course **["Programming for Linguistics and Literature"](https://github.com/mikekestemont/prog1617)**
- **["Python for text analysis"](https://github.com/cltl/python-for-text-analysis)** designed by H.D. van der Vliet and taught at the Vrije Universiteit
- **["How to Think Like a Computer Scientist"](http://www.greenteapress.com/thinkpython/thinkCSpy.pdf)** by Allen Downey, Jeffrey Elkner, Chris Meyers
- **["The Programming Historian"](https://programminghistorian.org/en/lessons)**: ["Fetch and Parse Data with OpenRefine"](https://programminghistorian.org/en/lessons/fetch-and-parse-data-with-openrefine) (by Evan P. Williamson) and ["Manipulating Strings in Python"](https://programminghistorian.org/en/lessons/manipulating-strings-in-python) by William J. Turkel and Adam Crymble

# 0. Before we kick off: Installing Jupyter Notebook

- Download Anaconda: https://www.anaconda.com/download
        Select the Python 3.6 Version
        Follow the installation instructions
- Download the Notebook and data [here](https://github.com/kasparvonbeelen/Python-Slow-Learning)
        Open Anaconda Navigator
        Launch Jupyter Notebook
        This should open a tab in your browser
        Go to the location where you cloned/unzipped the material downloaded from Github

# 1. Philosophy of the Course

- **We have time...** (if we don't get everything done, we just add an extra session)
- Coding is **not** difficult, but obtaining basic programming skills requires a **sustained effort**.
- With only a few basic skills you can go a long way (writing scripts vs. developing tools).
- Learning Python is not a linear, incremental process. Sometimes I will **fast-forward** and skip some of the details but hope you get a feeling for what is possible, and why coding could be useful for your research.
- The full course, with all the details, is available [here](https://github.com/kasparvonbeelen/Coding-the-Humanities) (but still under construction).
- It takes a while before you can do some more fancy stuff (you have to go through kindergarten again before you become a rocket scientist).

### 1.1 The Language of Choice: Python

#### **What** is Python?

[From Wikipedia](https://en.wikipedia.org/wiki/Python_(programming_language)): Python is a widely used **high-level** programming language for **general-purpose** programming.
- **high-level programming language**: In computer science, a high-level programming language is a programming language with **strong abstraction from the details of the computer**. In comparison to low-level programming languages, it may use **natural language elements**, be easier to use, or may **automate** (or even **hide** entirely) significant areas of computing systems (e.g. memory management), making the process of developing a program simpler and more understandable relative to a lower-level language. The amount of **abstraction** provided defines how "high-level" a programming language is.


#### **Why** Python?

In general, Python is **easier to learn and to read**. Let's look at a very simple example. 

In [None]:
print('Hello, World.')

Compare this to the C++ version of "Hello, World.", which looks like this:

```cpp
#include <iostream>

int main()
{
    std::cout << "Hello, world." << std::endl;
    return 0;
}
```


So, in general, the reasons why I teach **Python** are:

- Software **Quality**: Python code is designed to be **readable**, and hence reusable and maintainable. 
- Developer **Productivity**: Python code is typically one-third to one-fifth the size of C++ or Java code. 
- **Portability**: Python code runs unchanged on all major computer platforms (Windows, Linux, MacOS). 
- **General-purpose**: data analysis, web development etc.
- **Support Libraries**: Standard, homegrown and third-party libraries.
- **Widely used by the academic and scientific community!**

# 2. Goal of Today's Lecture

Today we cover a few basic **Python data types and objects**:
- Strings
- Lists
- Dictionaries

And survey some tools for manipulating these objects:
- String formatting
- Appending items to a list
- Exploring dictionaries and JSON Objects


In general, the course shows how to **collect and save data from the Web**. In the following courses, we turn to analysing the retrieved data. 

At the end of this session, you should be able to **understand** most of the following code. For now, just try to run it by simultaneously pressing `ctrl` and `enter` or the **play** button at the top of your Notebook.

### 2.1. Leading Example
#### A real-world application of the elements we discuss today.

In [None]:
import requests # Import modules, here a set of tools that helps you download data
'''
Script that retrieves data from "Chronicling America" and stores information in a list.
'''

data = """Idaho,1865\nMontana,1865\nOregon,1865\nWashington,1865""" # Variable Assignment & Strings
print('Input data is a csv file.')
print(data)
url = "http://chroniclingamerica.loc.gov/search/pages/results/?state={0}&date1={1}&date2={1}&dateFilterType=yearRange&sequence=1&sort=date&rows=5&format=json"
# String formatting, Getting data from APIs

all_data = [] # Empty Lists & Variable Assignment
print('\n')
lines = data.split('\n') # Split string (convert string to list)
print('Split the csv file by newline characters "\\n"')
print(lines) # Printing the list
print('\n')
for line in lines: # For loop 
    state,year = line.split(',') # Split string, multiple assignment
    formatted_url = url.format(state,year)
    print('Downloading data for state={} year={}'.format(state,year))
    print(formatted_url)
    response = requests.get(formatted_url).json() # Calling the API & download data
    all_data.append(response) # Storing data in a variable

print('\nDownloaded {} items.\nDone!\n'.format(len(all_data)))

In [None]:
import json # Import JSON tools
'''
Inspect the downloaded data.
'''
idaho = all_data[0] # Indexing and slicing
print(idaho.keys()) # Inspecting JSON objects & Python dictionaries
print(json.dumps(idaho)) # Copy-Paste the print output to http://jsonviewer.stack.hu/

In [None]:
# print the title
print(idaho['items'][0]['alt_title'])

In [None]:
# print ocr text
print(idaho['items'][0]['ocr_eng'])

In [None]:
# how many words does this text contain?
idaho_text = idaho['items'][0]['ocr_eng']
print(len(idaho_text.split()))

In [None]:
with open('./chrom_america.json','w') as out_file: # Store the data on your disk as a JSON file
    json.dump(all_data,out_file)

The difficulty with learning how to program is to obtain a proper understanding of all the individual building blocks that constitute the language. Things only start to make sense when you start combining various components.

Nonetheless, we have to explain these elements separately. This can be tedious, but please bear with me for a few hours: the rewards are plenty!

# 3. Baby Python

For practising your coding skills, you can use the many **'code blocks'** in this Notebook, such as the grey cell below. Place your cursor inside the cell and press ``ctrl+enter`` to "run" or execute the code. Let's begin right away: run your first little program!

In [None]:
print('Hello, World!')

You've just executed your first program!

### --Exercise--
- Can you describe what the programme just did?
- Can you adapt it to print your name (with a greeting, i.e. "Hello, ...")?

Use the code block **below**.

In [None]:
# Insert your own code here!
# Print your own name ... or whatever you want, and press ctrl + enter
print('Hello, Kaspar')

Besides printing words to your screen, you can use Python as a **calculator**. 

In [None]:
print(10)
print(5+9)
print(3*8)

Please note that a string is always enclosed in **quotation** marks *`'`* or *`"`*, while a number (integers or floats) is not.

### --Exercise--
Use the code block below to calculate (and print) how many minutes there are in one week.

**HINT**: use the multiplication operator **`*`**

In [None]:
# Write your code here
print(60*24*7)

# 4. Variables: Presents for Everyone

One of the most powerful features of a programming language is the ability to **store and manipulate variables**. A variable is a **name** that refers to a value. The **assignment statement** creates new variables and relates them to concrete values. Instead of passing these elements as an argument to the `print()` function, we can **store** them, by creating a variable that refers to the "Hello, World!" string.

In [None]:
# declare a variable
x = 'Hello World.'
# print what is in the box
print(x)

In [None]:
# declare a variable
y = 22
# print what is in the box
print(y)

If you vaguely remember your math classes in school, this should look familiar. It is basically the same notation with the name of **the variable on the left, the value on the right**, and the = sign in the middle. 

In the code block above, two things happen. **First**, we fill `y` with a value, in our case `22`. This variable `y` behaves pretty much like a **box** on which we write a `y` with a thick, black marker so we can find it again later. **Second**, we print the contents of this box, using the `print()` function. ![box](./images/box.png)

You can inspect the type of the variable with the `type()` **function**. You can use this function by putting the object between parentheses.

In [None]:
text = 'Hello, Worlds!'
print(type(text))
number = 10
print(type(number))
number_string = '10'
print(type(number_string))
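
The difference between `10` and `'10'` starts to matter as soon as you combine values. The following is only a small sketch (it reuses the variable names from the cell above) of how the same `+` sign behaves for numbers and for strings, and how `int()` and `str()` convert between the two types:

```python
number = 10
number_string = '10'

# Adding two integers performs arithmetic
print(number + number)                # 20

# "Adding" two strings glues them together
print(number_string + number_string)  # 1010

# int() turns a string of digits into a number, str() does the reverse
print(int(number_string) + number)    # 20
print(str(number) + number_string)    # 1010
```

Mixing the two types directly (`number + number_string`) raises a `TypeError`, which is why checking with `type()` is a useful habit.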

### --Exercise--
Create and print two values: your name (string) and year of birth (integer)

In [None]:
# write your code here
name = 'Kaspar'
year_of_birth = 1984
print(name,year_of_birth)

### --Exercise--
Find the variable assignments in the leading example code.

# 5. Strings: How Python Understands Text

In the preceding sections, we learned how to define string variables.

In [None]:
x = 'Yo, Kaspar'
print(x)
print(type(x))

Let's have a closer look at the ``'str'`` type (str stands for string)

Similar to numbers, strings can also be added together. What do you think the operation below will produce? (pause a moment before running the code.)

In [None]:
first_name = "Kaspar"
last_name = "Beelen"
print(first_name+last_name)

This last operation is called string **concatenation**. We added one string to another using the `+` operator.

In [None]:
book = "The Lord of the Flies"
print(first_name + " likes " + book + "?")

### --Exercise--
Declare two variables `first_name` and `last_name`. Print them neatly using concatenation.

In [None]:
# write your code here
first_name = "Kaspar"
last_name = "Beelen"
print(first_name + ' ' + last_name)

Another option would be the `format()` method. 

To see what `format()` does, we can simply turn to Python's help functionality.

In [None]:
help(str.format)

`.format()` inserts a variable (either a string or a number) between braces. Try it out below!

In [None]:
name = # enter a name
print('{} is great!'.format(name))

In [None]:
# create variable
name = 'My first name is {0}.\nMy second name is {1}'.format(first_name,last_name)
print(name)

Please note the `\n` sign here, which denotes a hard return (newline character).
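
The numbers between the braces refer to the position of the arguments, so the same argument can be inserted more than once. As a small illustration (the `template` string is made up for this sketch, loosely following the URL in the leading example):

```python
# {0} is replaced by the first argument, {1} by the second
template = 'state={0}\ndate1={1}\ndate2={1}'
filled = template.format('Idaho', 1865)

# The second argument appears twice, each \n starts a new line
print(filled)
```

This is the trick the leading example relies on to use the same year as both start and end date.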

### --Exercise--
What would the following expression return?

`'My first name is {1}.\nMy second name is {0}'.format(first_name,last_name)`

In [None]:
# try it here
'My first name is {1}.\nMy second name is {0}'.format(first_name,last_name)

A lot is actually happening here--and may be confusing at first. Let's inspect the syntax of this line a bit closer.

## 5.1 String Methods

The expression below follows the Python dot notation:

    - `'My first name is {0}.\nMy second name is {1}'.format(first_name,last_name)`

Which, in a more abstract form, looks like:

    - `object.method(arguments)`
    
In this example, we applied the `.format` **method** to a string **object** with `first_name,last_name` as **arguments**.

We also could have applied the method to a variable:

In [None]:
name_string = 'My first name is {0}.\nMy second name is {1}'
name_string.format(first_name,last_name)
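
The same `object.method(arguments)` pattern works for every string method. Here is a short sketch with a few methods that exist on all Python strings (the variable names are just for illustration):

```python
title = 'Pride and Prejudice'

print(title.upper())                     # PRIDE AND PREJUDICE
print(title.lower())                     # pride and prejudice
print(title.replace('Pride', 'Sense'))   # Sense and Prejudice

# Methods can be chained: the result of lower() is itself a string
lowered_words = title.lower().split()
print(lowered_words)                     # ['pride', 'and', 'prejudice']

# A method returns a new value; the original string is left untouched
print(title)                             # Pride and Prejudice
```

If you want to keep the result of a method, you have to assign it to a variable.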

Python comes with many useful **tools for text processing**. You can list and inspect them with `dir()` or `help()` functions (again, the syntax here is slightly different than the dot notation).

In [None]:
book = 'Pride and Prejudice' # Let's pretend we stored a whole book in this variable

`dir()` shows all the methods you can apply to the string variable `book`. Please scroll down. You can ignore the elements starting with double underscores.

In [None]:
dir(book)

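All these methods allow you to do things with strings. Some of the most useful tools are
- `split()`
- `lower()`
- `find()`
- `len()` (strictly speaking a built-in function rather than a string method: you write `len(book)`, not `book.len()`)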

## .split()

### --Exercise--

Go back to the initial example, and figure out how the `split()` method works.

### --Exercise--
Print the Python **documentation** on the `.split()` method using the `help` function.

In [None]:
# search for help here
name = "Kaspar"
help(name.split)
# or
help(str.split)

### --Exercise--

Inspect the following examples:

In [None]:
print('Split on white space: ',book.split())
print('Split on character "e": ',book.split('e'))
print('Split on newline: ',book.split('\n'))

#### Important

`split()` converts a string of characters to a **list of words** (approximately, we come back to this later).
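
To see why the result is only approximately a list of words, consider this small sketch (the example strings are made up):

```python
# Splitting on whitespace leaves punctuation attached to the words
greeting = 'Hello, World!'
print(greeting.split())        # ['Hello,', 'World!']

# Splitting on a comma is handy for rows of a csv file
row = 'Idaho,1865'
cells = row.split(',')
print(cells)                   # ['Idaho', '1865']

# Every element is still a string, even if it looks like a number
print(type(cells[1]))          # <class 'str'>

# join() is the reverse operation: it glues a list back into one string
print(','.join(cells))         # Idaho,1865
```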

## .lower()

### --Exercise--

Experiment with the `lower()` method. 
- Create a string variable;
- Assign the lowercased string to a new variable;
- Print the lowercased and the original variable.

In [None]:
# Experiment with lower
# Declare a string variable

variable = "KaspaR"

# Look for documentation on `lower`

help(variable.lower)

# Apply lower to the variable AND assign the lowercased string to a new variable

var_lower = variable.lower()

# print the variables before and after applying the lower method
print(variable)
print(var_lower)


## .find()

### --Exercise--

Find the position of the first 'e' in the title "Naturkatastrophenkonzert".

In [None]:
title = 'Naturkatastrophenkonzert'
# use the find() method here
title.find('e')

### --Exercise--

In [None]:
# download Romeo and Juliet from Gutenberg
import requests
randj = requests.get('http://www.gutenberg.org/cache/epub/1777/pg1777.txt').text

Find the **first** occurrence of the word **`love`** in Shakespeare's Romeo and Juliet. 

**HINT**: Do not forget to first lowercase all words!

In [None]:
first_love = randj.lower().find('love')
print(first_love)

You can print the context around `first_love` using the [slice](https://www.oreilly.com/learning/how-do-i-use-the-slice-notation-in-python) notation. (Please follow the link for more information.)

In [None]:
context_size = 50 # the number of characters around the word
start_at = first_love-context_size # indicate the starting position
stop_at = first_love+context_size+len('love') # indicate where to stop
print('Start printing at character with position=',start_at)
print('Stop printing at character with position=',stop_at)
print('\n')
print(randj[start_at:stop_at]) # print with context

### --Exercise--

Can you find the **second** occurrence of **"love"** in this play? And print the context?

HINT: Inspect the documentation of `find()` with the `help()` function. Reuse information from the above code cells (`first_love`).

In [None]:
# add and copy-paste your code here
second_love = randj.lower().find('love',first_love+len('love')) # search the lowercased text, starting just after the first hit
print(second_love)

In [None]:
context_size = 50 # the number of characters around the word
start_at = second_love-context_size # indicate the starting position
stop_at = second_love+context_size+len('love') # indicate where to stop
print('Start printing at character with position=',start_at)
print('Stop printing at character with position=',stop_at)
print('\n')
print(randj[start_at:stop_at]) # print with context
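
The two context cells above repeat the same steps, and they quietly assume that the word is found at least `context_size` characters into the text. If the hit is closer to the start, `start_at` becomes negative, and a negative index counts from the end of the string. The sketch below is one possible way to handle this; the function `show_context` and the `sample` sentence are made up for illustration and are not part of the course code:

```python
def show_context(text, position, word, context_size=50):
    """Return `word` at `position` in `text` plus the characters around it."""
    start_at = max(0, position - context_size)  # never start before position 0
    stop_at = position + len(word) + context_size
    return text[start_at:stop_at]

sample = 'I love Python, and I love notebooks.'

# First hit, then search again starting just after the first hit
first = sample.lower().find('love')
second = sample.lower().find('love', first + len('love'))

print(first, second)
print(show_context(sample, first, 'love', context_size=5))
print(show_context(sample, second, 'love', context_size=5))
```

The same function should work on `randj` with `first_love` and `second_love` as positions.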

# Intermezzo: Counting Words

In [None]:
from collections import Counter
wf = Counter(randj.lower().split())
wf.most_common(20)
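
If the cell above went a bit fast: `Counter` takes a list and counts how often each element occurs. A toy version on a made-up sentence might make this clearer:

```python
from collections import Counter

text = 'the cat and the dog and the bird'
counts = Counter(text.split())   # count every word in the list

print(counts['the'])             # 3
print(counts.most_common(2))     # [('the', 3), ('and', 2)]

# A word that never occurs simply has a count of 0
print(counts['elephant'])        # 0
```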

# Recap

- Variables are boxes in which you can store information.
- Variables can be of a different type: Text (strings) or Numbers (Integers).
- Methods/Functions allow you to manipulate the content of these boxes (e.g. `.lower()`)

In [None]:
# Experiment a bit here

## len()

`len()` counts the number of elements the argument contains. If you pass a string as an argument, it counts the number of characters.

Note: the syntax is slightly different here (for reasons that fall outside the scope of this course).

In [None]:
word = 'supercalifragilisticexpialidocious'
print(len(word))
#print(word.__len__())

In [None]:
# How many characters does your full name contain?

### --Exercise--

How many words does Romeo and Juliet contain (approximately)? Use `split()` and `len()` in combination.

In [None]:
# add your code here

### --Exercise--

Can you find other useful string methods?

In [None]:
# if yes, play with them here

# Intermezzo: Returning to the main example

Let's inspect more closely some lines in the leading example.

#### `\n` indicates the end of a line

In [None]:
data = """Idaho,1865\nMontana,1865\nOregon,1865\nWashington,1865"""
print(data)

### --Exercise--

Splitting by the `newline` character returns the rows as a list (see below).

In [None]:
# Exercise split by the newline character

### --Exercise--

- We can save the rows in a new variable `lines`.
- Count the number of lines with `len()`.

In [None]:
# Exercise

### --Additional--

#### The .format() method manipulates the url by inserting substrings at specific locations marked by braces '{}'.

In [None]:
url = "http://chroniclingamerica.loc.gov/search/pages/results/?state={0}&date1={1}&date2={1}&dateFilterType=yearRange&sequence=1&sort=date&rows=5&format=json"
query = url.format('Idaho',1865)
print(query)

Please follow the link produced by the `print` operation.

This may look very complicated, but actually we are doing nothing more than generating a query that we use to retrieve data from "Chronicling America". Let's have a closer look at what we are actually doing.

The basic components of this URL are:
- the base URL, http://chroniclingamerica.loc.gov/
- the search service location for individual newspaper pages, search/pages/results
- a query string, starting with `?` and made up of **value pairs** (fieldname=value) separated by `&`.
    - e.g. value pairs are: state=Idaho; date1=1865;
    - only the front pages (sequence=1)
    - sorting by date (sort=date)
    - returning a maximum of five (rows=5)
    - in JSON (format=json)
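
If you want Python to take the URL apart for you, the standard library module `urllib.parse` can do so. This is only a sketch to illustrate the components listed above; it is not needed for the rest of the lecture, and the variable names are my own:

```python
from urllib.parse import urlsplit, parse_qs

query = "http://chroniclingamerica.loc.gov/search/pages/results/?state=Idaho&date1=1865&date2=1865&dateFilterType=yearRange&sequence=1&sort=date&rows=5&format=json"

parts = urlsplit(query)
print(parts.netloc)   # chroniclingamerica.loc.gov
print(parts.path)     # /search/pages/results/

# parse_qs turns the query string into a dictionary of fieldname -> list of values
fields = parse_qs(parts.query)
print(fields['state'])   # ['Idaho']
print(fields['rows'])    # ['5']
```

Note that all values come back as strings inside a list, because a field name may occur more than once in a query string.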

Now imagine we would like to retrieve data for multiple years for the state Idaho. In Python this is very simple, but we have to extend our syntax to properly understand how.

In [None]:
queries = [] # define a variable where you will store all your queries
for year in [1865,1885,1905]:  # loop over these years
    query = url.format('Idaho',year) # formulate query
    queries.append(query) # store it in the queries variable using .append()
print(queries) # done! print! copy paste one of the elements to see if this worked...

So let's turn to list objects!

# 6. Lists

Lists resemble strings: both are a **sequence** of values. But whereas a string was a sequence of characters, a list can contain values of any type. These values we call **elements** or **items**.

In [None]:
this_is_a_string = 'Hello Newman'
this_is_a_list = ['Hello','Jerry',42,3.1415]

Consider the first sentence (represented as a string) from Franz Kafka's book 'The Trial'. Imagine for a moment we had assigned the whole book to the `trial` variable.

In [None]:
trial = "Someone must have slandered Josef K., for one morning, without having done anything truly wrong, he was arrested. "

**A string is a sequence of characters.**

How can we select specific words from this book? It might seem  natural for us to describe the sentence as a series of words, rather than a series of characters. Say, we want to access the first word in our sentence. If we enter:

In [None]:
first_word = trial[0]
print(first_word)

Here we used index notation: the variable name followed by square brackets that contain a number. 

The notation `variable_name[n]` can be read as: give me the element at position `n` of the variable called `variable_name`. Python starts counting at 0, so `trial[0]` is the first character.
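
Counting from zero takes some getting used to. A small sketch with a short, made-up string (so as not to spoil the exercise below):

```python
word = 'Python'

print(word[0])     # P  (the first character has position 0)
print(word[1])     # y
print(word[-1])    # n  (negative positions count from the end)
print(word[0:3])   # Pyt (a slice: from position 0 up to, but not including, 3)
print(len(word))   # 6  (so the last valid position is 5)
```

Asking for `word[6]` raises an `IndexError`, because position 6 does not exist.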

### --Exercise--

Print the second and last character of the `trial` string.

TIP: the last character has position `-1` in Python syntax.

In [None]:
# Exercise

### --Exercise--

Can you print the penultimate character?

In [None]:
# Exercise

**`.split()` converts this string to a list of words.**

Python only prints the first character of our sentence, because a string is a sequence of characters. We can, however, transform our sentence into **a list of words** using the `.split()` method as follows:

In [None]:
words = trial.split()
print(words)

The variable `words` now holds the first line of Kafka's Trial as a **list**. Each element in this list is now (approximately) a **word**. Run the code below to see the difference.

In [None]:
first_word = words[0]
print(first_word)

### --Exercise--

- Count the number of words in the `trial` string.
- Print the second and the last word.

## Creating a list: the basic rules 

`.split()` transforms a string to a list. But we can also create lists manually.

To store an empty list in variable `x`, simply assign ``[]`` (square brackets) to `x`.

In [None]:
# create an empty list
x = []
print(x)

Defining an empty list may seem useless at first, but it's not. Actually we are defining here a variable in which we want to collect information--save it for later. 

We will do this often later on in this course.

We can also create lists with some content: enclose the individual items within square brackets, separated by a comma.

In [None]:
my_grades = [8,9,6,7]
print(my_grades)
my_garbage = ['Potatoe',[1,2,3],9.03434,'frogs']
print(my_garbage)

### General rules:
* Lists are surrounded by square brackets and the elements in the list are separated by commas
* A list element can be **any Python object**, even another list (e.g. a list can be a collection of numbers, strings, floats, or a combination thereof)
* A list can store values with different types
* A list can be empty
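
These rules can be checked with a small experiment. The list below is made up for illustration:

```python
# A list with a string, an integer, a float and another list
mixed = ['Kafka', 42, 9.5, ['Josef', 'K.']]

print(len(mixed))       # 4 (the inner list counts as one element)
print(type(mixed[1]))   # <class 'int'>
print(mixed[3])         # ['Josef', 'K.']
print(mixed[3][0])      # Josef (first element of the inner list)

# An empty list is a perfectly valid list with zero elements
empty = []
print(len(empty))       # 0
```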

### --Exercise--

Create a list manually, select your three favorite artists/composers, whatever, and put them in one list.

In [None]:
# put your code here

## Adding items to a list: concatenation and the `.append()` method

Similar to strings, Python comes with specific operations (``*`` and ``+``) that you can apply to a list.

The ``+`` operator **concatenates** lists. 

Can you guess what the variable `c` will look like?

In [None]:
a = [1, 2, 3]
b = [4, 5, 6]
c = a + b

In [None]:
# print variable c here
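
The ``*`` operator was mentioned above but not yet shown. A minimal sketch with made-up lists:

```python
vowels = ['a', 'e']
consonants = ['b', 'c']

# + glues two lists together into a new list
letters = vowels + consonants
print(letters)     # ['a', 'e', 'b', 'c']

# * repeats a list a number of times
repeated = vowels * 3
print(repeated)    # ['a', 'e', 'a', 'e', 'a', 'e']

# Both operators create a new list; the originals stay as they were
print(vowels)      # ['a', 'e']
```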

Most of the crucial list functionalities are provided by the built-in list **methods** (**functions attached to the list object**). We first check the type of a list variable and then ask for an overview of the available methods (scroll down; for this course you can ignore the methods starting and ending with double underscores).

In [None]:
writers_list = []
print(type(writers_list))

We learn, unsurprisingly, that the variable `writers_list` is of type `list`. Let's inspect the functionalities Python provides for working with lists.

In [None]:
help(list)

**``append()`` adds other values to the list**

The first method we encounter is ``append``. To see what this method does use the same `help` function as before

In [None]:
help(list.append)

`.append()` **adds new items** to the right end of a list. It has one argument and **returns `None`** (we come back to this a few blocks below).

In [None]:
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
print(composer_list)
composer_list.append('L. van Beethoven')
print(composer_list)

### --Exercise--
Add another composer to the `composer_list`.

In [None]:
# add your code here

Functions in Python are generally divided into **fruitful** and **void** functions. `append` is a **void** function: similar to `print`, it performs an operation (adds one element to the list) but **returns nothing** (technically, it returns `None`). Understanding this distinction may help you trace bugs in future code.

In [None]:
a = composer_list.append('J. des Prez')
print(composer_list)
print(a)
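
A typical bug that follows from this distinction is assigning the result of `append()` back to a variable. The sketch below (with made-up variable names) contrasts a void method with two fruitful ones:

```python
grades = [8, 9]

# append() changes the list in place and returns None
result = grades.append(6)
print(result)    # None
print(grades)    # [8, 9, 6]

# len() and upper() are fruitful: they return a value you can store
number_of_grades = len(grades)
shout = 'hello'.upper()
print(number_of_grades)   # 3
print(shout)              # HELLO
```

So writing `grades = grades.append(6)` would replace your list with `None`, which is rarely what you want.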

The `append()` method is especially powerful in **a `for` loop**.

# 7. For-loops

The code below shows a context in which the `append()` method is often applied. For example, we have structured data which lists song titles since the interwar years. Imagine we want to study all songs about "love". 

But let's start with a simple example and return to Kafka.

In [None]:
trial = "Someone must have slandered Josef K., for one morning, without having done anything truly wrong, he was arrested. "

In [None]:
# split the string by white spaces
words = trial.split()
print(words)

### Membership operators

An easy way to check if a word appears in a sentence is the membership operator `in`.

In [None]:
'must' in words

In [None]:
print('"must" in words? ','must' in words)
print('"lalalala" in words? ','lalalala' in words)
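
Be aware that `in` behaves differently for lists and for strings: on a list it looks for a complete element, on a string it looks for a substring. A small sketch with a made-up title:

```python
title = 'The Glove Maker'
title_words = title.lower().split()   # ['the', 'glove', 'maker']

# On a list: is there an element that equals 'love'?
print('love' in title_words)          # False
print('glove' in title_words)         # True

# On a string: does 'love' occur anywhere inside it?
print('love' in title.lower())        # True
```

This is why it usually pays to split a text into words before searching for a specific word.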

Ok, now we have a list with the individual words. We can iterate over this list with a `for` loop. Let's loop over the words and print each of them individually in upper case and with exclamation marks!!!!

In [None]:
for word in words:
    print(word.upper()+'!!!')

### --Exercise--

The conditional expression `if` allows us to manipulate the behaviour of the `for` loop.

For example, we can uppercase only the words that start with 'a'.

In [None]:
for w in trial.lower().split():
    if w.startswith('a'):
        print(w.upper()+ '!!!') 
    else:
        print(w)

Now adapt the previous code and uppercase all words ending with "ly".

HINT: you can use the `.endswith()` method.

In [None]:
# Exercise

### --Exercise--

Print the length of each word in the `words` list.

In [None]:
for  in :
   print(...)

For sure, we could have done this manually, and obtained the same result (as shown below).

In [None]:
print(len(words[0]))
print(len(words[1]))
print(len(words[2]))
print(len(words[3]))
print('...')
print('etc.  till the end.')
print('...')
print(len(words[-4]))
print(len(words[-3]))
print(len(words[-2]))
print(len(words[-1]))

But you have to agree that the `for` loop is more elegant and concise. Also, applying the manual approach to a list of 100,000 items or more would be very time-consuming. What is the benefit of having a fast computer if you have to enter everything manually?

Python provides the so-called `for`-statements that allow us to **loop** through any **(iterable) object** and perform actions on each element. 

The basic syntax of a `for`-statement is: 

    for x in iterable:
        ...do something with x...

That reads almost like English! 

The `for` loop might still be confusing at first. Let's have a closer look at a simple example: 

In [None]:
names = ['John', 'Anna', 'Bert']
for name in names:
    print(name)

The `name` variable is not explicitly assigned in advance. It acts somewhat as a **placeholder**, and is assigned to each element in the list in turn (as the `print()` statement suggests). 

You are **free to choose the name** of this variable, but it has to be consistent in the indented block below.

In [None]:
names = ['John', 'Anna', 'Bert']
for LALALALALA in names:
    print(LALALALALA)

... this works just fine but is less readable.

### Indentation

Note the tab (or white space) after the colon. This is called **[indentation](http://www.diveintopython.net/getting_to_know_python/indenting_code.html)** and is part of Python syntax. Try removing it, and see what happens...
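
Indentation decides which lines belong to the loop. In this sketch (the `greeted` list is made up for illustration), the indented lines run once for every name, while the last line runs only once, after the loop has finished:

```python
names = ['John', 'Anna', 'Bert']
greeted = []

for name in names:
    # Indented: repeated for every element in the list
    greeting = 'Hello, ' + name
    greeted.append(greeting)
    print(greeting)

# Not indented: executed once, when the loop is done
print('Done greeting', len(greeted), 'people')
```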

Now, we can make a simple program that stores the word length of each word in a `word_lengths` list.

In [None]:
# Initialize an empty list, in which we will store all word lengths
word_lengths = []
# now we iterate over the iterable (i.e. list) called words
for word in words:
    # get the length of the word
    var = len(word)
    # append it to the list
    word_lengths.append(var)

print(word_lengths)

We could make the previous code a bit more concise:

In [None]:
# Initialize an empty list, in which we will store all word lengths
word_lengths = []
# now we iterate over the iterable (i.e. list) called words
for word in words:
    word_lengths.append(len(word))

print(word_lengths)
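
Combining the loop with an `if` statement lets you keep only the elements you are interested in. The following sketch (the threshold of five characters is an arbitrary choice) collects the longer words of a shortened version of our sentence:

```python
sentence = 'he was arrested without having done anything truly wrong'

long_words = []                  # start with an empty list
for word in sentence.split():    # loop over the words
    if len(word) > 5:            # keep only words longer than five characters
        long_words.append(word)

print(long_words)   # ['arrested', 'without', 'having', 'anything']
```

This pattern of empty list, loop, condition and `append()` returns in the exercise below.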

### --Exercise--

Now we can put everything together and make a simple programme that collects all songs about 'love'. 
We use the [Million Song database](https://labrosa.ee.columbia.edu/millionsong/sites/default/files/AdditionalFiles/tracks_per_year.txt).

Below follows a step-by-step guide, but I left out some code. Please complete where necessary.

**A.** Retrieve the data with `requests` (this can take a while).

In [None]:
import requests
url = 'https://labrosa.ee.columbia.edu/millionsong/sites/default/files/AdditionalFiles/tracks_per_year.txt'
#small data set for those with a slower laptop/computer
#url = 'https://raw.githubusercontent.com/kasparvonbeelen/Coding-the-Humanities/master/lecture2/subsample.txt'
data = requests.get(url).text.strip() # download the song titles

**B.** Create an empty list and define your query.

In [None]:
search = # define the query string
love_song = # create an empty list

**C.** split the data by row. There should be 515576 rows.

In [None]:
rows = data.split('\n')
print(len(rows) == 515576)

In [None]:
for row in rows:
    cells = # split the row into a list called cells, split on the <SEP> sequence
    title = cells[]  # the fifth element in the list is the title
    title_lower = # convert capitals in the string to lowercase characters
    words =  # split the title string into words
    if search in words: # store the title if it contains the search term
        love_song.append(title)

There should be 13844 titles in the `love_song` variable.

In [None]:
print(len(love_song)==13844)

Let's print the first 100.

In [None]:
print(love_song[:100])

### --Exercise--

- Put all the code together in one code block. 
- Can you find all the songs on "hate" in the song title database?
- Is "love" more popular a topic than "hate"? 

In [None]:
# add your code here

# 8. Dictionaries

Dictionaries are a **mapping from keys to values**. In this way a dictionary resembles a "real" dictionary that associates lemmas with definitions. 

In Python this looks as follows:

In [None]:
dictionary = {
    'bird':'a warm-blooded egg-laying vertebrate animal...',
    'feather':"Any of the flat appendages growing from a bird's skin and forming its plumage",
    'plumage':"a bird's feathers collectively"
}

... and we can easily find a definition by the lemma

In [None]:
print(dictionary['bird'])

Note the **square brackets**! (looks similar to the index notation for lists)

*Dictionaries* provide you with the data structure that makes looking up values by keys exceptionally easy.
With *lists* you can only look up elements by position. 

Let's give another example: a mapping from names to numbers is saved in the variable `telephone_numbers`.

In [None]:
telephone_numbers = {'Frank': 4334030, 'Susan': 400230, 'Guido': 487239}
print(telephone_numbers)

What is Susan's phone number?

In Python you can easily look up a value (the element after the `":"`) by entering a key (the element before the `":"`) in a dictionary.

... and now print Susan's telephone number:

In [None]:
print(telephone_numbers['Susan'])
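
Looking up a key that does not exist raises a `KeyError`. The sketch below shows two ways around this: checking membership with `in`, or using the dictionary method `.get()`, which returns `None` (or a default of your choice) when the key is missing:

```python
telephone_numbers = {'Frank': 4334030, 'Susan': 400230, 'Guido': 487239}

# `in` checks the keys of a dictionary, not the values
print('Susan' in telephone_numbers)    # True
print('Kaspar' in telephone_numbers)   # False
print(400230 in telephone_numbers)     # False

# .get() does not raise an error for a missing key
print(telephone_numbers.get('Kaspar'))              # None
print(telephone_numbers.get('Kaspar', 'unknown'))   # unknown
```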

### --Exercise--

Add your own name to `telephone_numbers` and print it.

In [None]:
# add your name to the telephone_numbers dictionary

In [None]:
# print your number by entering your name as a key

## Creating a dictionary

* a dictionary is surrounded by **curly brackets** 

* a dictionary consists of one or more **key:value pairs**, the key is the 'identifier' or "name" that is used to describe the value.
* the **keys** in a dictionary are **unique**
* the syntax for a key/value pair is: `key : value`
* and the **key/value** pairs (i.e. **items**) are separated by **commas**.
* the keys (e.g. 'Frank') in a dictionary have to be **immutable**
* the values (e.g. 4334030) in a dictionary can be **any Python object**
* a dictionary can be empty
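
Two of these rules are easy to try out: keys are unique, and keys have to be immutable. The dictionaries below are made up for this sketch:

```python
# Keys are unique: if a key appears twice, the last value wins
translations = {'one': 'eins', 'one': 'uno'}
print(translations)    # {'one': 'uno'}

# Strings and numbers are immutable, so they are fine as keys
mixed_keys = {1: 'an integer key', 'two': 'a string key', 3.5: 'a float key'}
print(len(mixed_keys))   # 3

# A list can change, so Python refuses to use it as a key
try:
    broken = {['a', 'list']: 'as key'}
except TypeError as error:
    message = str(error)
    print('Not allowed:', message)
```

The `try` / `except` construction is only there so that the cell keeps running after the error; you do not need to understand it at this point.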


### Some examples:

An empty dictionary:

In [None]:
x = {}

A mapping between English and German words:

In [None]:
english2deutsch = {'ambulance':'Krankenwagen',
                  'clever':'klug',
                  'concrete':'Beton'}

### -- Optional Exercise--

Make a dictionary which maps three cities to the size of their population. Call it `city2population`.

In [None]:
city2population = #add your code here

## Optional: Adding items to a dictionary

There is a very simple way to add a **key:value** pair to a dictionary. Please look at the following code snippet:

In [None]:
english2deutsch = dict()
#or try english2deutsch = {}
print(english2deutsch)

In [None]:
english2deutsch['one'] = 'eins'
english2deutsch['two'] = 'zwei'
english2deutsch['three'] = 'drei'
print(english2deutsch)

### --Exercise--

The previous notation is useful in combination with a `for` loop. We can for example map words to their position in a text.

Let's return to Kafka.

In [None]:
trial = "Someone must have slandered Josef K., for one morning, without having done anything truly wrong, he was arrested. "

In [None]:
# split the sentence into words

To simplify things, we can use the `enumerate()` function, which loops over an object and keeps track of the location of items in a list.

In [None]:
x = ['a','b','c','d']
for count,item in enumerate(x):
    print(count,item)
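
As the output shows, `enumerate()` starts counting at 0, just like index notation. If you prefer to count from 1, you can pass the optional `start` argument. A small sketch with a made-up list:

```python
composers = ['Bach', 'Mozart', 'Mendelssohn']

# Default: positions start at 0
for position, composer in enumerate(composers):
    print(position, composer)

# With start=1 the counting begins at 1 instead
numbered = list(enumerate(composers, start=1))
print(numbered)   # [(1, 'Bach'), (2, 'Mozart'), (3, 'Mendelssohn')]
```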

We can loop over the sentence, and map each index to a word. Complete the code by replacing the question marks.

In [None]:
position2word # create empty dictionary
for ??,?? in ??:
    position2word[??] = ??
    
print(position2word)
# get word at position nine

## Iterating over dictionaries

Since dictionaries are iterable objects, we can iterate through our good reads collection as well. This will iterate over the *keys* of a dictionary:

In [None]:
good_reads = {"The Magic Mountain":9,
             "The Idiot":7,
             "Don Quixote": 9.5}

for book in good_reads:
    print(book)

To iterate over the key-value pairs use the `.items()` method.

In [None]:
for book,score in good_reads.items():
    print(book,score)
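
Besides `.items()`, dictionaries also have `.keys()` and `.values()`. The sketch below uses them together with a loop to compute the average score of the collection (the variable names `total` and `average` are my own):

```python
good_reads = {"The Magic Mountain": 9,
              "The Idiot": 7,
              "Don Quixote": 9.5}

print(list(good_reads.keys()))     # ['The Magic Mountain', 'The Idiot', 'Don Quixote']
print(list(good_reads.values()))   # [9, 7, 9.5]

# Add up all the scores, then divide by the number of books
total = 0
for book, score in good_reads.items():
    total = total + score

average = total / len(good_reads)
print(average)   # 8.5
```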

#### Exercise

Print the English words and their German translation by iterating over the items of the english2deutsch dictionary.

In [None]:
# add your code here

# 9. JSON

The data retrieved from the Chronicling America API is a [JSON](https://en.wikipedia.org/wiki/JSON) file in which each item contains a few newspapers from a different state. Copy-paste the printout below and go to the [JSON viewer](http://jsonviewer.stack.hu/) to inspect the document.

As you'll see, the JSON object combines Python lists and dictionaries. As it is a very common data type, Python has some libraries to process and read JSON data.

In [None]:
idaho = json.load(open('./idaho_example.json'))

In [None]:
print(json.dumps(idaho))
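
If the printout is hard to read, a toy example may help to see how lists and dictionaries are nested. The `raw` string below is invented for this sketch: it only mimics the keys `items`, `alt_title` and `ocr_eng` used earlier and is much simpler than the real API response.

```python
import json

# A made-up JSON document: a dictionary with a list of dictionaries inside
raw = '''
{
  "items": [
    {"alt_title": "Toy Gazette", "ocr_eng": "first page of the toy gazette"},
    {"alt_title": "Sample Courier", "ocr_eng": "another page"}
  ]
}
'''

record = json.loads(raw)            # loads (with an s) reads JSON from a string
print(type(record))                 # <class 'dict'>
print(type(record['items']))        # <class 'list'>

# Combine dictionary keys and list positions to reach a value
print(record['items'][0]['alt_title'])             # Toy Gazette
print(len(record['items'][1]['ocr_eng'].split()))  # 2

# indent=2 prints the structure in a more readable layout
print(json.dumps(record, indent=2))
```

The `indent` argument of `json.dumps()` can also be tried on the `idaho` variable as an alternative to the online viewer.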

### --Exercise--

Explore the JSON file.

### --Exercise--

Can you print the first title?

In [None]:
# Add your code here

### --Exercise--

Can you print the number of words (approximately) in the fourth article (hidden under key 'ocr_eng')?

In [None]:
# Add your code here

## We are DONE for today. Congratulations!