Intro
====

This Jupyter notebook will introduce you to

* the basic operation of a Jupyter notebook
* some nice features of the Python programming language
* a helpful mindset for approaching programming tasks (in Python)

If you're completely unfamiliar with programming languages, the contents below will probably overwhelm you. If this is the case, try the *Extended Intro* notebook and/or search the web for a general introduction to Python.

Jupyter basics
---------------------

A Jupyter notebook is composed of cells. For our purposes, there are two relevant types of cells:

* [Python](https://en.wikipedia.org/wiki/Python_(programming_language))
* [Markdown](https://en.wikipedia.org/wiki/Markdown)

This text is in a Markdown cell. If you double-click into the cell, it will reveal the Markdown source and become editable.

<!-- By the way: you can use HTML in Markdown. This is an HTML comment and therefore not visible when rendered. -->

In [None]:
# This is a Python cell. (This line is a Python comment)

# If you click into this cell, you can run/execute it using Ctrl+Enter.
# This will result in the cell's output being printed.

1 + 2

In [None]:
# <- Each execution of a Python cell is counted. The number of a cell's last
#    execution is shown on the top left.

# Try running this cell several times.

x = 1 + 2

The cell above has no output because assigning a value to a variable does not produce any output.  
If you add another line to the cell above that just says `x`, the value of `x` will be output.  
Alternatively, you can explicitly print the value of `x` with the line `print(x)`.

**Question:** What is the difference between the output of the `x` line and the `print(x)` line?

In [None]:
# The values of variables carry over to subsequently executed cells.
# In other words, x still carries the value assigned in the Python cell above.

# Run this cell multiple times and see what happens.

print(x)
x = x * 2
print(x)

Python goodies
-----------------------

Compared to some other programming languages,
Python is [relatively comfortable](https://xkcd.com/353/) to work with.  
Below are just a few examples of the nifty things that are possible.

In [None]:
# Printing, as you've seen, is dead easy.
print('Hello World!')

In [None]:
# Using variables in print statements is straightforward, too.
y = 3
print('The value of x is {} and the value of y is {}.'.format(x, y))

# or, even shorter:
print(f'The value of x is {x} and the value of y is {y}.')  # (notice the `f` preceding the string)
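
Beyond plain substitution, the braces in f-strings (and in `str.format`) accept an optional format specification after a colon. A few illustrative examples follow; the variable names are made up for this sketch, and the `=` form needs Python 3.8 or newer:

```python
price = 3.14159
count = 42
name = 'Neptune'

print(f'{price:.2f}')   # two decimal places: 3.14
print(f'{count:5d}|')   # right-aligned in a field of width 5
print(f'{count:<5d}|')  # left-aligned in a field of width 5
print(f'{name!r}')      # the repr() of the value, i.e. with quotes: 'Neptune'
print(f'{count = }')    # self-documenting expression: count = 42
```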

In [None]:
# Python has built-in documentation that can be retrieved with the function `help`.
# This can be handy to, for example, quickly look up the parameters a function expects.

help(print)

**Task:** Using the information given by `help(print)`, write a print statement that outputs the words *foo*, *bar*, and *baz* separated by semicolons.

In [None]:
# replace me with a print statement

In [None]:
# Lists ("arrays") have some nice features.

l = [1, 2, 3, 'a', 'b', 'c']
print(l)
print(l[0])  # indexing
print(l[-1])  # negative indexing
print(l[1:])  # slicing
print(l[2:3])
print(l[:-4])
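
Slices also accept a third value, the step (`start:stop:step`). A short sketch using the same list as above:

```python
l = [1, 2, 3, 'a', 'b', 'c']

print(l[::2])    # every second element: [1, 3, 'b']
print(l[::-1])   # a reversed copy: ['c', 'b', 'a', 3, 2, 1]
print(l[1:5:2])  # start at index 1, stop before index 5, step 2: [2, 'a']
print(l[10:])    # out-of-range slices give an empty list instead of an error: []
```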

In [None]:
# Strings are sequences of characters and support the same indexing and slicing as lists

print('Hello World!'[:5])

**Task:** Use slicing to make the print statement above say *World* (without the exclamation mark) instead of *Hello*.
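
Strings are not literally lists, though: unlike lists, they are immutable, so changing a character means building a new string. A small sketch (the string `'Python'` is just an example):

```python
s = 'Python'

print(s[0], s[-1])  # indexing works as with lists: P n
print(list(s))      # explicit conversion to a list of characters

try:
    s[0] = 'J'      # unlike lists, strings cannot be modified in place
except TypeError as error:
    print(f'TypeError: {error}')

s = 'J' + s[1:]     # instead, build a new string from pieces
print(s)            # Jython
```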

In [None]:
# More fun with lists

l = [1, 2, 3]

# unpacking
a, b, c = l
print(f'a is {a}\nb is {b}\nc is {c}')
x, *y = l
print(f'x is {x}\ny is {y}')
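
Unpacking is also handy for swapping values, for grabbing the first and last elements of a longer list, and in `for` loops. A few illustrative examples:

```python
a, b = 1, 2
a, b = b, a  # swap two values without a temporary variable
print(a, b)  # 2 1

first, *middle, last = [10, 20, 30, 40, 50]
print(first, middle, last)  # 10 [20, 30, 40] 50

for index, letter in enumerate('abc'):  # unpacking also works in loop headers
    print(index, letter)
```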

In [None]:
# list comprehensions
l_doubled = [i * 2 for i in l]
print(l_doubled)
l_stringed = ['€ {};-'.format(i) for i in l]
print(l_stringed)
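
List comprehensions can also filter elements with a trailing `if`, or pick a value per element with a conditional expression. Two small examples on made-up data:

```python
values = range(10)

# keep only the elements that satisfy the condition
evens = [n for n in values if n % 2 == 0]
print(evens)  # [0, 2, 4, 6, 8]

# choose a result per element with a conditional expression
labels = ['even' if n % 2 == 0 else 'odd' for n in range(4)]
print(labels)  # ['even', 'odd', 'even', 'odd']
```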

In [None]:
# Dictionaries

number_of_moons = {'Venus': 0, 'Earth': 1, 'Mars': 2, 'Jupiter': 79, 'Neptune': 14}
print(number_of_moons)
print(number_of_moons['Neptune'])

In [None]:
# dictionary comprehension
not_num_moons = {
    'not {}'.format(key) : value + 3
    for key, value
    in number_of_moons.items()
}
print(not_num_moons)

In [None]:
# easy use in loops
for key, val in number_of_moons.items():
    print(f'{key} has {val} moons.')
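
Indexing a dictionary with a key that doesn't exist raises a `KeyError`; the `get` method returns a default instead. Dictionaries can also be sorted by their values with `sorted` and a `key` function. A sketch using a hypothetical `stock` dictionary:

```python
stock = {'apples': 3, 'pears': 0, 'plums': 7}  # hypothetical example data

print(stock.get('plums'))     # 7
print(stock.get('kiwis', 0))  # missing key: the default 0 instead of a KeyError

# sort the (key, value) pairs by value, largest first
by_amount = sorted(stock.items(), key=lambda item: item[1], reverse=True)
print(by_amount)  # [('plums', 7), ('apples', 3), ('pears', 0)]

print('pears' in stock)  # membership tests check the keys: True
```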

**Task:** In the cell below, write a list comprehension over the list `planets` that looks up each planet's number of moons, so that a list of these numbers is printed.

In [None]:
planets = ['Venus', 'Earth', 'Mars', 'Jupiter', 'Neptune']
numbers =  # complete me
print(numbers)

Python libraries
-----------------------

Python offers a lot of useful extra functionality in its [standard library](https://docs.python.org/3/library/)
as well as [third party libraries](https://pypi.org/).

In [None]:
# Libraries can be imported as follows

import math  # imports the whole library
from collections import defaultdict  # imports just part of a library
from unicodedata import name as unicode_name  # imports under a specified name

# And then used as shown below

print(math.sqrt(25))
print(math.exp(1))

In [None]:
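player_score = defaultdict(int)  # missing keys get the default value int(), i.e. 0
player_score['Alex'] = 3
print(player_score['Alex'])
print(player_score['Bob'])  # 'Bob' was never set, so this prints 0 instead of raising a KeyError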
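
`defaultdict` is particularly useful for grouping, and its sibling `Counter` (also in `collections`) counts things. A small sketch with made-up words:

```python
from collections import Counter, defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# defaultdict(list): a missing key starts out as an empty list
by_first_letter = defaultdict(list)
for word in words:
    by_first_letter[word[0]].append(word)
print(dict(by_first_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Counter: counts how often each (hashable) item occurs
letter_counts = Counter('mississippi')
print(letter_counts.most_common(2))  # [('i', 4), ('s', 4)]
```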

In [None]:
print(unicode_name(' '))
print(unicode_name('ß'))  # as in the German word “Straße”
print(unicode_name('あ'))  # as in the Japanese word かかあ天下
print(unicode_name('👍'))  # as in “👍 was approved as part of Unicode 6.0 under the name ‘Thumbs Up Sign’”
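
The `unicodedata` module also works in the other direction: `lookup` turns a name back into a character, while the built-ins `ord` and `chr` convert between characters and their numeric code points. For illustration:

```python
import unicodedata

char = 'ß'
char_name = unicodedata.name(char)
print(char_name)                      # LATIN SMALL LETTER SHARP S
print(unicodedata.lookup(char_name))  # the reverse direction, name -> character: ß
print(ord(char), hex(ord(char)))      # code point as decimal and hex: 223 0xdf
print(chr(0x1F44D))                   # code point -> character: 👍
```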

Third party libraries have to be installed before they can be used.
This can be done, for example, using [pip](https://pip.pypa.io/en/stable/).

To install third party libraries using pip (ideally in a [virtual environment](https://docs.python.org/3/tutorial/venv.html)), execute `pip install <package_name>`.  
For example `pip install matplotlib`.

The following libraries will most likely be used frequently throughout the exercise sessions:

* pandas
* matplotlib
* numpy
* scipy
* scikit-learn

In [None]:
# A quick showcase

import pandas as pd  # frequently used packages often have conventional short aliases they're imported as
import numpy as np

If executing the cell above resulted in an error, you first have to install *pandas* and *numpy*.
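
To check which of the libraries listed above are already available without importing them, `importlib.util.find_spec` can be used; it returns `None` for modules it cannot find. A minimal sketch, assuming the mapping from pip package names to import names written out by hand below (note that scikit-learn is imported as `sklearn`):

```python
from importlib.util import find_spec

# pip package name -> name used in `import` statements
import_names = {
    'pandas': 'pandas',
    'matplotlib': 'matplotlib',
    'numpy': 'numpy',
    'scipy': 'scipy',
    'scikit-learn': 'sklearn',
}

# find_spec only locates the module; it does not import it
missing = [
    package for package, module in import_names.items()
    if find_spec(module) is None
]

if missing:
    print('Not installed yet:', ', '.join(missing))
    print('Install them with: pip install ' + ' '.join(missing))
else:
    print('All listed libraries are installed.')
```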

In [None]:
random_values = np.random.randn(100, 1)  # a 100×1 "matrix" (basically a list) of normally distributed values
df_showcase = pd.DataFrame(random_values, columns=['values'])
df_showcase.hist()

**Task:**
* Increase the number of random values and see if you get a nicer bell curve than before.
* Use `help` to learn about the parameters of the DataFrame's `hist` method and increase the number of bins used for the histogram.

You're not the first to try this
-----------------------------------------

Trying to ...

* parse a CSV file?
* extract a piece of text from an HTML document?
* use configuration files with your program?
* display a progress bar for a long-running process?
* validate a URL?
* detect the language a text is written in?

You're not the first. If you're not doing it as a programming exercise, and *especially* if you're creating something that is supposed to be used by others, it is often advisable to look for a mature library that does the job.

* CSV parsing is available in the [standard library](https://docs.python.org/3/library/csv.html); pandas also has a [builtin function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)
* HTML can be parsed with [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/)
* Parsing configuration files is also part of the [standard library](https://docs.python.org/3/library/configparser.html)
* A nice progress bar can be created using [tqdm](https://tqdm.github.io/)
* URL validation, while a bit of an iffy topic, can be done with [urllib](https://docs.python.org/3/library/urllib.parse.html)
* For language detection there is, for example, [langdetect](https://github.com/Mimino666/langdetect)
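
To give an impression of the standard library options mentioned above, here's a small sketch that parses CSV data and an INI-style configuration from strings (standing in for files) and does a very basic URL plausibility check. The helper `looks_like_url` is hypothetical and deliberately simple: it only checks for an `http`/`https` scheme and a host, which is far from full validation.

```python
import configparser
import csv
import io
from urllib.parse import urlparse

# CSV: io.StringIO stands in for a file opened with open(..., newline='')
csv_text = 'planet,moons\nMars,2\nEarth,1\n'
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows)  # each row is a dict; note that all values are strings

# INI-style configuration, read from a string instead of a file
config = configparser.ConfigParser()
config.read_string('[server]\nhost = example.org\nport = 8080\n')
print(config['server']['host'], config.getint('server', 'port'))

# hypothetical helper: a rough plausibility check, not real validation
def looks_like_url(text):
    parts = urlparse(text)
    return parts.scheme in ('http', 'https') and bool(parts.netloc)

print(looks_like_url('https://www.kit.edu/english/'))  # True
print(looks_like_url('not a url'))                     # False
```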

**Bottom line:** the Python community is your friend. :)

---

As a minimal demonstration, here are effectively four lines of Python that retrieve the latest KIT news:  
(The libraries used are `requests` and `beautifulsoup4`. You'll have to install them for the code to run.)

In [None]:
import requests
from bs4 import BeautifulSoup

http_response = requests.get('https://www.kit.edu/english/', timeout=10)  # retrieve website content (give up after 10 s)
parsed_html = BeautifulSoup(http_response.text, 'html.parser')            # parse HTML with Python's built-in parser
headline = parsed_html.find('span', class_='headline')                     # find headline
snippet = headline.parent.find('p')                                        # find associated text snippet

print(
    'KIT News:\n'
    '=========\n'
    f'{headline.text}\n'
    f'{snippet.text}'
)

**NOTE:** before you get all enthusiastic now and start scraping websites left and right, *be aware* that there are laws and a general etiquette to be observed when accessing other people's stuff.

A web search for "ethical web scraping" or "responsible web scraping" will lead you to further information. Key points:

* Use an API if there is one
* Identify yourself and your intent using the `User-Agent` header
* Respect robots.txt
* Respect delays, crawl rates, etc. as indicated in response headers (e.g. `Retry-After`) or robots.txt
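
The robots.txt part doesn't even need a third party library: `urllib.robotparser` from the standard library can interpret it. A minimal sketch that parses a made-up robots.txt from a string so it runs offline; the bot name and URLs are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; for a real site you would call
# parser.set_url('https://example.org/robots.txt') and parser.read() instead
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

user_agent = 'MyCourseBot/0.1 (contact: student@example.org)'  # hypothetical identifier
print(parser.can_fetch(user_agent, 'https://example.org/news/'))      # True
print(parser.can_fetch(user_agent, 'https://example.org/private/x'))  # False
print(parser.crawl_delay(user_agent))  # seconds to wait between requests: 5
```

When actually fetching pages, the same identifier can be sent along with `requests.get(url, headers={'User-Agent': user_agent})`.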