
# Chapter 1: Structure of a Program

This notebook serves as an overview. Many terms referenced here will be covered in detail in following notebooks.

## TL;DR

- program := sequence of instructions that specify how to perform a computation (= "black box")

- input := e.g., data from a CSV file, text entered on a command line, data from a database

- output := result of the computation, optional

- variables := storage of intermediate "state"

- operators / statements := operations like addition or multiplication, reading / saving a file, commands that "do" something or change the state of the variables

- functions := a named sequence of instructions, often a small part in a larger program

- flow control
  - conditional execution ("if statements")
  - repetitions (for or while loops)

## Example: Calculate the Average of all even Numbers in a List

In [1]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [2]:
numbers

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [3]:
count = 0  # initialize variables to keep track of the count
total = 0  # and the running sum of the even numbers

for number in numbers:
    if number % 2 == 0:  # only look at even numbers
        count = count + 1
        total = total + number

average = total / count

In [4]:
average

6.0
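The same computation can also be packaged as a function, one of the terms listed in the TL;DR. The following is a minimal sketch. The name `average_of_evens` is chosen for illustration only and does not appear elsewhere in this notebook.

```python
def average_of_evens(numbers):
    """Return the average of all even numbers in a list."""
    count = 0
    total = 0
    for number in numbers:
        if number % 2 == 0:  # only look at even numbers
            count = count + 1
            total = total + number
    return total / count  # fails with a ZeroDivisionError if there are no even numbers


result = average_of_evens([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(result)  # 6.0
```

Once defined, the function can be re-used with any other list of numbers without copying the loop.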

## Generating Cell Output in a Jupyter Notebook

Note that only two of the previous four cells generate an output while two remained "silent" (i.e., there is no "**Out[...]**").

By default, Jupyter notebooks show the value of a cell's last so-called **expression**. This behavior can be suppressed by ending the last line with a semicolon.

To visualize something before the end of the cell, use the [print()](https://docs.python.org/3/library/functions.html#print) built-in function.

In [5]:
"Hello, World!"
"I am feeling great :-)"

'I am feeling great :-)'

In [6]:
"I am invisible!";

In [7]:
print("Hello, World!")
print("I am feeling great :-)")

Hello, World!
I am feeling great :-)


Note that Python **begins counting at 0**. This is not the case for many other languages, e.g., MATLAB, R, or Stata. To understand why this makes sense, see this short [note](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html) by one of the all-time greats in computer science, the late [Edsger Dijkstra](https://en.wikipedia.org/wiki/Edsger_W._Dijkstra).

In [8]:
for i in range(5):
    print(i)

0
1
2
3
4
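Counting from 0 also applies when we look up elements in a list. As a small sketch, and assuming the square bracket indexing syntax that is not yet introduced at this point, the first element sits at position 0 and the last one at position "length minus one".

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

first = numbers[0]  # the first element has index 0
last = numbers[len(numbers) - 1]  # the last element has index len(numbers) - 1

print(first, last)  # 1 10
```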


## (Arithmetic) Operators

Python comes with basic mathematical operators built in. **Operators** are **tokens** that have a special meaning and either take the value after them (= **unary** operators) or the two values around them (= **binary** operators) to "produce" a new value. By convention, they have **no side effects** in the sense that this new value is discarded unless explicitly stored in a **variable**.

Let's see some examples of operators. We start with the binary `+` and the `-` operators for addition and subtraction. Binary operators resemble what mathematicians call [infix notation](https://en.wikipedia.org/wiki/Infix_notation) and have the expected meaning.

In [9]:
77 + 13

90

In [10]:
101 - 93

8

The minus operator can be used as a unary operator as well. Then it just flips the "sign" of a number.

In [11]:
-1

-1

When we compare the output of the `*` and `/` operators for multiplication and division, we note the subtle difference between the $42$ and the $42.0$. This is a first illustration of the concept of a **data type**.

In [12]:
2 * 21

42

In [13]:
84 / 2

42.0

The so-called **floor division operator** `//` always rounds down to the nearest integer that is less than or equal to the exact result and is thus also called the **integer division operator**. This is a first example of an operator that we commonly do not know from high school mathematics.

In [14]:
84 // 2

42

In [15]:
85 // 2

42
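"Rounding down" matters for negative numbers, where the result moves away from zero instead of being cut off after the decimal point. The following sketch illustrates this. The variable names are chosen for illustration only.

```python
positive = 85 // 2   # 42.5 is rounded down to 42
negative = -85 // 2  # -42.5 is rounded down to -43, not cut off to -42

print(positive, negative)  # 42 -43
```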

To obtain the remainder of a division, we can use the **modulo operator** `%`.

In [16]:
85 % 2

1

Note that the remainder is $0$ if a number is divisible by another.

In [17]:
49 % 7

0

Modulo division can be useful if we, for example, need to get the last couple of digits of a large integer.

In [18]:
123 % 10

3

In [19]:
123 % 100

23

The [divmod()](https://docs.python.org/3/library/functions.html#divmod) built-in function combines the integer and modulo divisions into one operation. However, note that this is not an operator any more (but a function). Also observe that [divmod()](https://docs.python.org/3/library/functions.html#divmod) returns a "pair" of integers.

In [20]:
divmod(42, 10)

(4, 2)
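As a quick check of how the two parts of the pair relate, the sketch below recombines them into the original number. The variable names are assumptions made for this example.

```python
quotient, remainder = divmod(42, 10)

# The pair is the same as using the two operators separately.
same_as_operators = (quotient, remainder) == (42 // 10, 42 % 10)

# The two parts recombine to the original number.
recombined = quotient * 10 + remainder

print(quotient, remainder, recombined)  # 4 2 42
```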

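Raising a number to a power is performed with the **exponentiation operator** `**`. Note that this is different from the `^` operator that many other programming languages might use for this purpose. In Python, `^` is a valid operator too, but it performs a bitwise "exclusive or" and not an exponentiation.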

In [21]:
2 ** 3

8
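Mixing up `**` and `^` does not raise an error in Python, so the mistake is easy to miss. The short sketch below contrasts the two operators on the same numbers.

```python
power = 2 ** 3  # exponentiation: 2 * 2 * 2
xor = 2 ^ 3     # bitwise exclusive or: 0b10 ^ 0b11 gives 0b01

print(power, xor)  # 8 1
```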

The normal order of precedence from mathematics applies ("PEMDAS" rule), but parentheses help avoid confusion.

In [22]:
3 ** 2 * 2 

18

In [23]:
(3 ** 2) * 2  # same result as before but much clearer code

18

In [24]:
3 ** (2 * 2)  # different result

81

Some programmers also use "style" conventions. For example, we can play with the **whitespace**, which is an umbrella term that refers to any character that is not visible when printed, like spaces, tabs, or the like. However, parentheses convey a much clearer picture.

In [25]:
3**2 * 2  # bad style; it is better to use parentheses here

18

There are plenty more mathematical and non-mathematical operators that are introduced throughout this tutorial together with the concepts they implement or support. Some of these are already shown in the next section.

## Values vs. Types

Python is a so-called **object-oriented** language. Object orientation is a paradigm of organizing a program's memory.

An **object** can be viewed as a "bag" of $0$s and $1$s in the same memory location that not only symbolizes a certain value but also has some associated rules as to how this value is treated and may be worked with.

An object always has **three** main characteristics: an identity, a (data) type, and a value.

Let's look at the following example.

In [26]:
a = 789
b = 42.0
c = "Python rocks"

### Identity / "Memory Location"

The [id()](https://docs.python.org/3/library/functions.html#id) built-in function shows an object's "address" in the computer's memory.

In [27]:
id(a)

139662722068432

In [28]:
id(b)

139662722077752

In [29]:
id(c)

139662713231792

The same value might be stored at several (= different) memory locations.

In [30]:
d = 789

`a` and `d` have the same value as can be checked with the **equality operator** `==`. Note that the resulting `True` (and the `False` below) is yet another data type, a so-called **boolean**.

In [31]:
a == d

True

On the contrary, `a` and `d` are different objects as can be seen with the **identity operator** `is`: they are stored at separate addresses in the memory.

In [32]:
a is d

False
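The difference between equality and identity can also be sketched with lists. The names `x`, `y`, and `z` are chosen for illustration. Note that assigning an existing object to another name does not create a copy.

```python
x = [1, 2, 3]
y = [1, 2, 3]  # a second list object with the same value
z = x          # a second name for the very same object

print(x == y)  # True: equal values
print(x is y)  # False: two distinct objects
print(x is z)  # True: one object with two names
print(id(x) == id(z))  # True: same identity
```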

### (Data) Type / "behavior"

The [type()](https://docs.python.org/3/library/functions.html#type) built-in function shows an object's type. For example, `a` is an integer (`int`) while `b` is a so-called [floating-point number](https://en.wikipedia.org/wiki/Floating-point_arithmetic) (`float`).

In [33]:
type(a)

int

In [34]:
type(b)

float

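Different types imply different behaviors for the objects. For a float type, for example, we can test if we could also interpret it as an integer. For an integer type, this does not make sense as we know it is an integer to begin with. This is why we get an `AttributeError` below with the Python version used for this notebook. Note that Python 3.12 and later define `is_integer()` for the `int` type as well, so the second cell below runs without an error there.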

Formally, we call such type-specific behaviors **methods** and we will eventually fully introduce them when we talk about object orientation. For now, it suffices to know that we can execute them using the **dot operator** `.`.

In [35]:
b.is_integer()

True

In [36]:
a.is_integer()

AttributeError: 'int' object has no attribute 'is_integer'

`c` is a so-called **string** type (`str`), which we can view as Python's way of representing "text". Strings also come with their own behaviors, for example, to convert a text to lower or upper case.

In [37]:
type(c)

str

In [38]:
c.lower()

'python rocks'

In [39]:
c.upper()

'PYTHON ROCKS'

In [40]:
c.title()

'Python Rocks'
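Operators are type-specific as well. As a small sketch, the same `+` and `*` tokens from the arithmetic section behave differently when used with strings. The variable names are assumptions made for this example.

```python
number_sum = 1 + 2    # integers are added
text_sum = "1" + "2"  # strings are concatenated
repeated = "ab" * 3   # a string is repeated

print(number_sum, text_sum, repeated)  # 3 12 ababab
```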

### Value

Trivially, every object also has a value to which it "evaluates" when referenced.

In [41]:
a

789

In [42]:
b

42.0

In [43]:
c

'Python rocks'

## Formal vs. Natural Languages

Just like the language of mathematics is good at expressing relationships among numbers and symbols, any programming language is just a formal language that is good at expressing computations.

Formal languages come with grammatical rules called **syntax**.

### Syntax Errors

If we do not follow the rules, the code cannot be **parsed** correctly, i.e., the program does not even start to run but raises a **syntax error**. Computers are very dumb in the sense that the slightest syntax error leads to the machine not understanding the code.

For example, if we wanted to write an accounting program that adds up currencies, we have to model dollar prices as floats as the dollar symbol cannot be read by Python.

In [44]:
3.99 $ + 10.40 $

SyntaxError: invalid syntax (<ipython-input-44-cafa82e54b9c>, line 1)
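A version of the calculation that Python can parse drops the currency symbol and works with plain floats, as sketched below. Using `math.isclose()` and `round()` here is a choice made for this example, as floats are stored with limited precision.

```python
import math

total = 3.99 + 10.40  # prices as plain floats, without a currency symbol

# Compare floats with a tolerance instead of exact equality.
print(math.isclose(total, 14.39))  # True
print(round(total, 2))  # 14.39
```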

Python requires certain symbols at certain places (a `:` is missing here) ...

In [45]:
for i in range(10)
    print(i)

SyntaxError: invalid syntax (<ipython-input-45-7a8a49ad5eea>, line 1)

... and interprets whitespace / indentation unlike many other programming languages.

In [46]:
for i in range(10):
print(i)

IndentationError: expected an indented block (<ipython-input-46-0c8aafc23d7e>, line 2)

### Runtime Errors

Syntax errors as above are easy to find as the code will not even run in the first place.

However, there are also so-called **runtime errors** or **exceptions**. These occur in code that is syntactically valid and would run given correct values (e.g., user input) but fails while it is executed.

This example does not work because just like in the "real" world, Python does not know how to divide by $0$.

In [47]:
1 / 0

ZeroDivisionError: division by zero
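Unlike syntax errors, exceptions can be anticipated and handled while the program runs. The following is a minimal sketch using Python's `try` and `except` statements. The function `safe_divide` and its choice to return `None` are assumptions made for illustration, not a recommended general design.

```python
def safe_divide(numerator, denominator):
    """Divide two numbers; return None if the denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None  # signal to the caller that no result exists


print(safe_divide(1, 2))  # 0.5
print(safe_divide(1, 0))  # None
```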

### Semantic Errors

So-called **semantic errors**, on the contrary, can be very hard to spot as they do not crash the program at all. The only way to find such errors is to run the program with test data for which we know the answer already and can then verify it. However, testing software is a whole discipline on its own and often very hard to do in practice.

In [48]:
count = 0
total = 0

for number in numbers:
    if number % 2 == 0:
        count = count + 1
        total = total + count  # count is wrong here, it should be number

average = total / count

In [49]:
average

3.0
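Running the code with test data for which we know the answer can be sketched as follows. The function name `buggy_average_of_evens` is hypothetical and wraps the faulty loop from above. The test data is chosen so that the correct answer is easy to compute by hand.

```python
def buggy_average_of_evens(numbers):
    """Hypothetical function containing the semantic error from above."""
    count = 0
    total = 0
    for number in numbers:
        if number % 2 == 0:
            count = count + 1
            total = total + count  # bug: should add number
    return total / count


# Test data with a known answer: the average of 2 and 4 is 3.
expected = 3.0
actual = buggy_average_of_evens([1, 2, 3, 4])

print(actual == expected)  # False, which reveals the bug
```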

Finding and fixing errors is called **debugging**. For the history of the term, check this [link](https://en.wikipedia.org/wiki/Debugging).

## Best Practices

Adhering to just syntax rules is not enough. Over time, best practices and common **style guides** were created to make it easier for a programmer to get going with an established code base (often called **legacy code**). These rules are not enforced by Python itself, i.e., badly styled and unreadable code will still run. At the very least, Python programs should be styled according to [PEP 8](https://www.python.org/dev/peps/pep-0008/) and documented "inline" according to [PEP 257](https://www.python.org/dev/peps/pep-0257/).

An easier to read version of PEP 8 can be found [here](https://pep8.org/). The video below features a well known "Pythonista" talking about the importance of code style.

In [50]:
from IPython.display import YouTubeVideo
YouTubeVideo("Hwckt4J96dI", width="60%")

For example, while the above code to calculate the average of the even numbers from 1 through 10 is correct, a Pythonista would rewrite it in a more "Pythonic" way and use the [sum()](https://docs.python.org/3/library/functions.html#sum) and [len()](https://docs.python.org/3/library/functions.html#len) (= "length") built-in functions. Pythonic code runs faster in many cases and is less error-prone.

In [51]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [52]:
evens = [n for n in numbers if n % 2 == 0]  # example for a so-called list comprehension

In [53]:
evens

[2, 4, 6, 8, 10]

In [54]:
average = sum(evens) / len(evens)  # built-in functions are typically faster than an explicit for-loop

In [55]:
average

6.0
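Going one step further, the standard library already ships a function for averages. The sketch below uses `statistics.mean()` as one possible alternative. Whether it is preferable to `sum()` and `len()` is a matter of taste and not a claim made by this notebook.

```python
import statistics

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens = [n for n in numbers if n % 2 == 0]

average = statistics.mean(evens)  # delegates the summing and counting
print(average == 6.0)  # True
```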

To get a rough overview of the mindset of a typical Python programmer, check these rules by an early Python core developer that are deemed so important that they are actually included in every Python installation.

In [56]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### Jupyter Notebook Aspects to keep in Mind

#### Cell Order

Observe that you can run the cells in a Jupyter notebook in arbitrary order.

That means, for example, that a variable defined towards the bottom could accidentally be referenced at the top of the notebook. This happens easily if we iteratively build a program and go back and forth between cells.

As a good practice, it is recommended to click on "Kernel" > "Restart & Run All" in the navigation bar once a notebook is finished. That restarts the Python process in the background (forgetting any state) and ensures that the notebook runs top to bottom without any errors.

#### Notebooks are linear

While this tutorial uses Jupyter notebooks, it is to be noted that "real" applications are almost never just a "linear" (= top to bottom) program but instead can take many different flows of execution.

However, for a beginner's course it is often easier to just code in a linear fashion.

In real data science projects, one would probably put re-usable functions into so-called Python modules (= \*.py files) and then use notebooks to build up a report or story line for a business argument to be made. Jupyter notebooks can contain images, videos, interactive buttons, plots, previews of tabular data, and much more. Also, they can be exported as simple PDFs and sent to managers and co-workers who do not know how to code.