# Structure of a Program 

- program := sequence of instructions that specify how to perform a computation (= "black box")

- input := e.g., data from a CSV file, text entered on a command line

- output := result of the computation

- variables := storage of intermediate "state"

- operators / statements := operations like addition or multiplication, reading / saving a file, commands that "do" something or change the state of the variables

- flow control
  - conditional execution ("if statements")
  - repetitions (for or while loops)
 

## Example: Calculate the Average of all even Numbers in a List

In [1]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [2]:
numbers

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [3]:
count = 0
total = 0

for number in numbers:
    if number % 2 == 0:
        count = count + 1
        total = total + number

average = total / count

In [4]:
average

6.0

## Generating Cell Output in a Jupyter Notebook

Note that only two of the previous four cells generated an output while the other two remained "silent".

By default, Jupyter notebooks show the value of a cell's last **expression**.

To visualize something before the end of the cell, use the **print() function**.

In [5]:
print("Hello, World!")

Hello, World!


In [6]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


Note that Python begins counting at 0 (this is not the case for several other languages, e.g., Matlab, R, or Stata).

Also observe that there is no "Out[6]" for the previous cell as the cell does not end with an expression.
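The zero-based counting noted above can be sketched with `range()` directly. This is a minimal illustration; the variable names `zero_to_nine` and `one_to_ten` are introduced here only for the example.

```python
# range(10) yields the ten integers 0 through 9; the stop value 10 is excluded.
zero_to_nine = list(range(10))

# To count from 1 through 10 instead, pass an explicit start value.
one_to_ten = list(range(1, 11))

print(zero_to_nine)
print(one_to_ten)
```

Both lists contain ten elements; only the starting point differs.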

## Arithmetic Operators

Python comes with basic mathematical operators built in.

In [7]:
77 + 13

90

In [8]:
101 - 93

8

In [9]:
2 * 21

42

Note the difference between 42 and 42.0 in the next cell: the division operator always returns a float, which illustrates the concept of a **type**.

In [10]:
84 / 2

42.0

The so-called **floor / integer division operator** always rounds down, i.e., toward negative infinity.

In [11]:
84 // 2

42

In [12]:
85 // 2

42
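The two division operators differ both in the type they return and in how they round. The following sketch (variable names are chosen for illustration only) makes both points visible, including the case of a negative operand, where "rounding down" is not the same as cutting off the decimals.

```python
# True division returns a float; floor division of two ints returns an int.
true_quotient = 84 / 2
floor_quotient = 84 // 2
print(type(true_quotient).__name__, true_quotient)    # float 42.0
print(type(floor_quotient).__name__, floor_quotient)  # int 42

# "Rounding down" means toward negative infinity, not toward zero.
negative_floor = -85 // 2   # -43
truncated = int(-85 / 2)    # -42, as int() cuts off the fractional part
print(negative_floor, truncated)
```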

To obtain the remainder of a division, we can use the **modulo operator**.

In [13]:
85 % 2

1

Note that the remainder is $0$ if a number is divisible by another.

In [14]:
49 % 7

0

Modulo division can also be useful if we, for example, need to get the last one or two digits of a large integer.

In [15]:
123 % 10

3

In [16]:
123 % 100

23
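Floor division and modulo belong together: one gives the quotient, the other the remainder. As a small sketch, the built-in `divmod()` function returns both at once; the variable names below are illustrative.

```python
# divmod(a, b) returns the pair (a // b, a % b).
quotient, remainder = divmod(85, 2)
print(quotient, remainder)  # 42 1

# Quotient and remainder always recombine to the original number.
print(quotient * 2 + remainder)  # 85

# Splitting off the last digit of an integer works the same way.
leading_digits, last_digit = divmod(123, 10)
print(leading_digits, last_digit)  # 12 3
```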

Raising a number to a power is performed with the **exponentiation operator** `**`. This is different from the `^` operator that many other languages use; in Python, `^` is the bitwise XOR operator.

In [17]:
2 ** 3

8
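Because `^` is valid Python, using it by mistake does not raise an error but silently computes something else. A short sketch of the difference:

```python
power = 2 ** 3  # exponentiation: 2 * 2 * 2
xor = 2 ^ 3     # bitwise XOR of 0b10 and 0b11, which is 0b01

print(power)  # 8
print(xor)    # 1
```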

The normal order of precedence from mathematics applies (= "PEMDAS" rule) but parentheses help avoid confusion.

In [18]:
3 ** 2 * 2 

18

In [19]:
(3 ** 2) * 2

18

In [20]:
3 ** (2 * 2)

81
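Two cases where the precedence rules tend to surprise are a leading minus sign and chained exponents. The sketch below shows how Python groups them; in both cases, explicit parentheses make the intent clear.

```python
# Exponentiation binds tighter than a leading minus sign.
print(-3 ** 2)    # -9, read as -(3 ** 2)
print((-3) ** 2)  # 9

# Chained exponents are evaluated from right to left.
print(2 ** 3 ** 2)    # 512, read as 2 ** (3 ** 2)
print((2 ** 3) ** 2)  # 64
```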

Some programmers also use whitespace as a "style" convention, leaving out the spaces around the operator that binds more tightly.

In [21]:
3**2 * 2

18

There are many more non-mathematical operators that are introduced throughout this tutorial together with the concepts they implement or support.

## Values vs. Types

Python is a so-called **object-oriented** language. As such, it stores not only a value but also its type in the same memory location.

Each object has **three** characteristics.

In [22]:
a = 789
b = 42.0
c = "Python rocks"

a) **Identity / Memory Location** (this is important when we deal with "big" data)

In [23]:
id(a)

140486905764528

In [24]:
id(b)

140486906024704

In [25]:
id(c)

140486905073776

The same value might be stored at several memory locations.

In [26]:
d = 789

`a` and `d` have the same value, as can be checked with the **equality operator**.

In [27]:
a == d

True

That `a` and `d` are nevertheless two different objects can be seen with the **identity operator**.

In [28]:
a is d

False
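Whether two equal integers end up as one object or two is an implementation detail of the Python interpreter (CPython, for example, reuses objects for small integers). The sketch below therefore uses lists, for which the distinction between equality and identity is reliable. The names `x`, `y`, and `z` are illustrative.

```python
x = [1, 2, 3]
y = [1, 2, 3]  # a second list with the same contents
z = x          # a second name for the very same object

print(x == y)  # True: same value
print(x is y)  # False: two separate objects
print(x is z)  # True: one object with two names
print(id(x) == id(z))  # True: same identity
```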

b) **Type** (different types of objects come with different functionalities)

In [29]:
type(a)

int

In [30]:
type(b)

float

A *float*, for example, can be tested for whether it holds a whole number, a check that is redundant for *integers*. In Python versions before 3.12, integers do not offer this method at all.

In [31]:
b.is_integer()

True

In [32]:
a.is_integer()

AttributeError: 'int' object has no attribute 'is_integer'

A *string* also comes with some utilities. Such type-specific functionalities are called **methods** (we will eventually fully introduce them when we talk about object orientation).

In [33]:
type(c)

str

In [34]:
c.lower()

'python rocks'

In [35]:
c.upper()

'PYTHON ROCKS'

In [36]:
c.title()

'Python Rocks'
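Which methods are available depends on the type of an object. As a rough sketch, the built-in `hasattr()` function can check this without triggering an `AttributeError`. The two method names tested here, `lower` and `bit_length`, are picked only as examples of a string method and an integer method.

```python
a = 789
b = 42.0
c = "Python rocks"

# For each object, print its type and which of the two methods it offers.
for obj in (a, b, c):
    print(type(obj).__name__, hasattr(obj, "lower"), hasattr(obj, "bit_length"))
```

Only the string offers `lower()`, and only the integer offers `bit_length()`.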

c) **Value**

In [37]:
a

789

In [38]:
b

42.0

In [39]:
c

'Python rocks'

## Formal vs. Natural Languages

Just like the language of mathematics is good at expressing relationships among numbers and symbols, any programming language is just a formal language that is good at expressing computations.

Formal languages come with grammatical rules called **syntax**.

If we do not follow the rules, the code cannot be **parsed** correctly, i.e., the program does not even start to run but raises a **syntax error**. Computers are very dumb in the sense that the slightest syntax error leads to the machine not understanding the code.

For example, if we wanted to write an accounting program that adds up currencies, we would have to model dollar prices as floats as the dollar symbol cannot be read by Python.

In [40]:
3.99 $ + 10.40 $

SyntaxError: invalid syntax (<ipython-input-40-cafa82e54b9c>, line 1)

Python requires certain symbols at certain places ...

In [41]:
for i in range(10)
    print(i)

SyntaxError: invalid syntax (<ipython-input-41-7a8a49ad5eea>, line 1)

... and interprets *whitespace* / indentation unlike many other programming languages.

In [42]:
for i in range(10):
print(i)

IndentationError: expected an indented block (<ipython-input-42-0c8aafc23d7e>, line 2)

Syntax errors as above are easy to find as the code will not even run in the first place.
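That parsing happens before any code runs can be sketched with the standard library's `ast.parse()` function, which only parses source code and never executes it. The helper `is_valid_syntax()` is a hypothetical name introduced for this illustration.

```python
import ast


def is_valid_syntax(source):
    """Return True if the source code can be parsed, False otherwise."""
    try:
        ast.parse(source)  # parses only, nothing is executed
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(is_valid_syntax("3.99 $ + 10.40 $"))                  # False
print(is_valid_syntax("for i in range(10)\n    print(i)"))  # False: missing colon
print(is_valid_syntax("for i in range(10):\nprint(i)"))     # False: missing indentation
print(is_valid_syntax("for i in range(10):\n    print(i)")) # True
```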

However, there are also so-called **runtime errors** or **exceptions**. They occur only while syntactically valid code is running, i.e., the same code would run without problems given different values.

In [43]:
1 / 0

ZeroDivisionError: division by zero
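That the same line of code may or may not fail depending on the values involved can be sketched with a `try` / `except` block, which catches an exception instead of letting it stop the program. This is only a minimal illustration; the list `results` is an assumed name.

```python
results = []

# The same division is attempted with a valid and an invalid denominator.
for denominator in (2, 0):
    try:
        results.append(1 / denominator)
    except ZeroDivisionError as error:
        # Record the name of the exception type instead of crashing.
        results.append(type(error).__name__)

print(results)  # [0.5, 'ZeroDivisionError']
```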

So-called **semantic errors**, in contrast, can be very hard to spot as they do not crash the program at all. The only way to find such errors is to run the program with test data for which we already know the answer and then compare the result against it.

In [44]:
count = 0
total = 0

for number in numbers:
    if number % 2 == 0:
        count = count + 1
        total = total + count  # count is wrong here, it should be number

average = total / count

In [45]:
average

3.0
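Checking against known answers can be sketched by wrapping both versions of the calculation in functions and comparing their results with values worked out by hand. The function names `average_of_evens()` and `buggy_average_of_evens()` are hypothetical and used for illustration only.

```python
def average_of_evens(numbers):
    """Return the average of all even numbers in the given list."""
    count = 0
    total = 0
    for number in numbers:
        if number % 2 == 0:
            count = count + 1
            total = total + number
    return total / count


def buggy_average_of_evens(numbers):
    """Same as above but with the semantic error: count is added to total."""
    count = 0
    total = 0
    for number in numbers:
        if number % 2 == 0:
            count = count + 1
            total = total + count
    return total / count


# Test data for which the answer is known: the evens 2 and 4 average to 3.0.
test_data = [1, 2, 3, 4]
expected = 3.0

print(average_of_evens(test_data) == expected)        # True
print(buggy_average_of_evens(test_data) == expected)  # False, the bug is exposed
```

Neither version raises an error; only the comparison with the expected value reveals the mistake.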

Finding errors is called **debugging**. For the history of the term, check this [link](https://en.wikipedia.org/wiki/Debugging).

## Best Practices

Adhering to just the syntax rules is not enough. Over time, best practices and common **style guides** were created to make it easier for a programmer to get going with an established code base (often called **legacy code**). These rules are not enforced by Python itself, i.e., badly styled and unreadable code will run. At the very least, Python programs should be styled according to [PEP 8](https://www.python.org/dev/peps/pep-0008/) and documented "inline" according to [PEP 257](https://www.python.org/dev/peps/pep-0257/).

For example, while the above code to calculate the average of the even numbers from 1 through 10 is correct, a "Pythonista" would rewrite it in a more "Pythonic" way.

In [46]:
evens = [n for n in numbers if n % 2 == 0]  # example for a so-called list comprehension

In [47]:
evens

[2, 4, 6, 8, 10]

In [48]:
average = sum(evens) / len(evens)  # built-in functions are usually faster than an explicit for-loop

In [49]:
average

6.0
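As one further possible variation, not part of the rewrite shown above, the standard library's `statistics` module provides a `mean()` function that replaces the manual division.

```python
from statistics import mean

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Keep only the even numbers, then let mean() do the averaging.
evens = [n for n in numbers if n % 2 == 0]
average = mean(evens)

print(average)
```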

To get a rough overview of the mindset of a typical Python programmer, check these rules by an early Python core developer.

In [50]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### Jupyter Notebook Aspects to keep in Mind

#### Cell Order

Observe that you can run the cells in a Jupyter notebook in arbitrary order.

That means, for example, that a variable defined towards the bottom could accidentally be referenced at the top of the notebook. This happens easily if we iteratively build a program and go back and forth between cells.

As a good practice, it is recommended to click on "Kernel" > "Restart & Run All" in the navigation bar once a notebook is finished. That restarts the Python process in the background (forgetting any state) and ensures that the notebook runs top to bottom without any errors.

#### Notebooks are Linear

While this tutorial uses Jupyter notebooks, note that "real" applications are almost never just a "linear" (= top to bottom) program but instead can take many different flows of execution.

However, for a beginner's course it is often easier to just code in a linear fashion.

In real data science projects one would probably put re-usable functions into so-called Python modules (= \*.py files) and then use notebooks to build up a report or story line for a business argument to be made. Jupyter notebooks can contain images, videos, interactive buttons, plots, previews of tabular data, and much more. Also, they can be exported as simple PDFs and sent to managers and co-workers who do not know how to code.