# <center><font color='magenta'>**Python for DA1**</font></center>
### <center>Central European University, 2024-2025</center>

### Acknowledgments

I would like to say thanks to [Roberta Sinatra](https://www.robertasinatra.com/) and [Johannes Wachs](https://johanneswachs.com/), whose work inspired this course and on which it is partly built.

# <center>Class 1</center>

## Jupyter notebooks

This file, a Jupyter notebook, does not follow the standard pattern of Python code stored in a plain text (`.py`) file. Instead, a Jupyter notebook is stored as a file in the [JSON](http://en.wikipedia.org/wiki/JSON) format. The advantage is that we can mix formatted text, Python code and code output. The drawback is that it needs a running Jupyter server to execute, so it is not a stand-alone Python program like a `.py` script. Other than that, there is no difference between the Python code that goes into a program file and the code in a Jupyter notebook.
We will return to JSON later, when we work with dictionaries and other data structures.

## Variables and types

### Symbol names 

Variable names in Python can contain alphanumerical characters `a-z`, `A-Z`, `0-9` and the underscore `_`. Variable names must start with a letter or an underscore; they cannot start with a digit. 

By convention, variable names start with a lower-case letter, and class names start with a capital letter. 

In addition, there are a number of Python keywords **that cannot be used as variable names**. In Python 3 these keywords are:

    False, None, True, and, as, assert, async, await, break, class,
    continue, def, del, elif, else, except, finally, for, from, global,
    if, import, in, is, lambda, nonlocal, not, or, pass, raise, return,
    try, while, with, yield

Note: Be aware of the keyword `lambda`, which could easily be a natural variable name in a scientific program. But being a keyword, it cannot be used as a variable name.
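Rather than memorizing the list, you can ask Python itself. A minimal sketch using the standard-library `keyword` module (the example names `my_var` and `2nd_var` are just illustrations):

```python
import keyword

# The full list of reserved keywords for the running Python version
print(keyword.kwlist)

# Check individual names
print(keyword.iskeyword('lambda'))   # True: cannot be used as a variable name
print(keyword.iskeyword('lambda_'))  # False: a trailing underscore is a common workaround

# str.isidentifier() checks the naming rules (it does not check keywords)
print('my_var'.isidentifier())   # True
print('2nd_var'.isidentifier())  # False: starts with a digit
```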

### Assignment



The assignment operator in Python is `=`. Python is a dynamically typed language, so we do not need to specify the type of a variable when we create one.

Assigning a value to a new variable creates the variable:

In [None]:
a = 3

In [None]:
type(a)

In [None]:
b = 1.2

In [None]:
type(b)

In [None]:
c = 'a'

In [None]:
type(c)

In [None]:
d = 'abc'

In [None]:
type(d)

Variables can easily be cast into other types.

In [None]:
float(a)

In [None]:
str(a)

In [None]:
int(b)
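Note that `int()` cuts off the decimal part rather than rounding. A short sketch of this and of converting strings, which only works when the text is a valid number (the values are arbitrary examples):

```python
# int() truncates toward zero; it does not round
print(int(1.2))    # 1
print(int(-1.7))   # -1
print(round(1.7))  # 2

# Strings holding numbers can be converted too
print(int('42') + 1)     # 43
print(float('3.5') * 2)  # 7.0

# ...but only if the text is a valid number
try:
    int('abc')
except ValueError as err:
    print('ValueError:', err)
```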

When a variable is reassigned, its type can change. 

In [None]:
a = 1.3
type(a)

Every character has a numeric code (its Unicode code point). `chr()` returns the character that belongs to a given code:

In [None]:
chr(42)
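The reverse direction is handled by the built-in `ord()`, so the two functions undo each other. A small sketch:

```python
print(ord('*'))            # 42: the code point of '*'
print(chr(ord('A')))       # A: chr() and ord() are inverses
print(ord('A'), ord('a'))  # 65 97: upper- and lower-case letters have different codes
```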

### Fundamental types

In [None]:
# integers
x = 1
type(x)

In [None]:
# float
x = 1.0
type(x)

In [None]:
# boolean
b1 = True
b2 = False

type(b1)

In [None]:
# complex numbers: note the use of `j` to specify the imaginary part
x = 1.0 - 1.0j
type(x)

In [None]:
b1

In [None]:
print(x)

In [None]:
print(x.real, x.imag)

### Operators

arithmetic

In [None]:
1 + 2, 1 - 2, 1 * 2, 1 / 2

In [None]:
# Floor division also works on floats (and returns a float)
3.0 // 2.0

In [None]:
# Power is ** not ^
2**3

In [None]:
# / always results in floats
2 / 1

In [None]:
# if you need an integer result, use floor division instead
2 // 1

In [None]:
# Modulo
7%3
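Floor division and modulo belong together: the quotient times the divisor plus the remainder gives back the original number. A sketch with illustrative names `n` and `d`, including the behaviour with a negative operand:

```python
n, d = 7, 3
q, r = n // d, n % d
print(q, r)            # 2 1
print(q * d + r == n)  # True
print(divmod(n, d))    # (2, 1): both results in one call

# With a negative operand, // rounds down (toward minus infinity),
# and % takes the sign of the divisor
print(-7 // 3, -7 % 3)  # -3 2
```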

boolean

In [None]:
True and False

In [None]:
not False

In [None]:
True or False

comparison

In [None]:
a = 2
b = 3
c = 3

In [None]:
a > b

In [None]:
b == c

In [None]:
b is c
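`==` compares values, while `is` checks whether two names refer to the very same object. `b is c` returns `True` above only because CPython reuses the same object for small integers, an implementation detail you should not rely on. A sketch with lists (illustrative names `x`, `y`, `z`) makes the difference visible:

```python
x = [1, 2, 3]
y = [1, 2, 3]
z = x            # z refers to the same list object as x

print(x == y)    # True: equal contents
print(x is y)    # False: two separate list objects
print(x is z)    # True: same object

# The idiomatic use of `is` is comparing with None
value = None
print(value is None)  # True
```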

In [None]:
True == 1
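`True == 1` holds because `bool` is a subclass of `int`: `True` behaves like 1 and `False` like 0 in arithmetic. A short sketch:

```python
print(isinstance(True, int))     # True: bool is a subclass of int
print(True + True)               # 2
print(sum([True, False, True]))  # 2: a handy way to count True values
```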

### Strings

**Intro**

In [None]:
s = 'Hello Monty!'

In [None]:
s

In [None]:
print(s)

Double quotes work as well and create exactly the same kind of string. The choice only matters when the text itself contains quotation marks (see later).

In [None]:
s2 = "Hello Monty!"
s2

In [None]:
len(s) # get length

In [None]:
print(s.replace('Monty', 'Python'))

*Indexing* starts at 0 in Python, and negative indices count from the end. A slice `s[start:stop]` includes `start` but not `stop`. 

In [None]:
s[0:5]

In [None]:
s[-6:]

In [None]:
s[1:9:2] # start, stop, step

In [None]:
s[::2] # start and stop are missing so we are stepping along the whole string

**Manipulate and print**

In [None]:
s = 'Hello Monty Python!'

`split()` splits the text into a `list` of substrings. (Lists are covered in detail below.)

In [None]:
s.split(' ')

In [None]:
s.split('M')

In [None]:
s.split(' ')[1]

Methods can also be called directly on a string literal, and their result can be indexed right away. 

In [None]:
'Hello Monty Python!'.split(' ')[1]

To print quotation marks, delimit the string with the other kind of quote (`"` or `'`). 

In [None]:
s = "Hello Monty 'Holy Grail' Python!"
print(s)

In [None]:
# this will throw an error
s = 'Hello Monty 'Holy Grail' Python!'

Special characters, such as `\n` (newline) and `\t` (tab), are written with a backslash. 

In [None]:
s = 'Hello \n Monty!'
print(s)

Use a second backslash (`\\`) to escape the escape character itself.

In [None]:
s = 'Hello \\n Monty!'
print(s)

Start strings with the letter <font color= 'magenta'>**r**</font> to define *raw strings*. In a raw string, backslashes are kept as ordinary characters instead of starting escape sequences. 

In [None]:
print('Hello \t Monty!') # plain
print(r'Hello \t Monty!') # raw
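Counting characters shows what the `r` prefix changes. A small sketch (the names `plain` and `raw` are illustrative):

```python
plain = 'a\tb'
raw = r'a\tb'

print(len(plain))      # 3: '\t' is a single tab character
print(len(raw))        # 4: the backslash and the 't' are kept as two characters
print(raw == 'a\\tb')  # True: a raw string equals the double-backslash form
```

One caveat: a raw string cannot end in a single backslash, so `r'C:\'` is a syntax error.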

Raw strings are especially useful when defining Windows paths, which use backslashes. (Linux and macOS paths use forward slashes, so this is not an issue there.)

In [None]:
# this will throw an error
print('C:\Users')

In [None]:
print(r'C:\Users')

The backslash (`\`) is the *escape character*, which is why it creates a mess in strings. You can also escape the escape character with another backslash.

In [None]:
print('C:\\Users') # Not an optimal solution in case of long paths. 

Concatenating

In [None]:
'Monty' + 'Python'

In [None]:
''.join(['Monty', 'Python'])

In [None]:
' '.join(['Monty', 'Python']) # Note the space between the quotes. 

**Some more tweaking and formatting**

Use `%`-style placeholders to include variables in text:
- %s: strings
- %f: floats
- %d: integers

In [None]:
print('Hello, I am %s, I have been working here for %f years and this is Python Class %d.'% ('Jenny', 2.5, 1))

In [None]:
# What if we replace %f with %s or %d?
print('Hello, I am %s, I have been working here for %s years and this is Python Class %d.'% ('Jenny', 2.5, 1))
print('Hello, I am %s, I have been working here for %d years and this is Python Class %f.'% ('Jenny', 2.5, 1))

We can even format the numbers.

In [None]:
print('Hello, I am %s, I have been working here for %.2f years and this is Python Class %d.'% ('Jenny', 2.5, 1))

Another way is to use *f-strings* (formatted string literals, available since Python 3.6).

In [None]:
name = 'Jenny'
classnumber = 1
print(f'Hello, I am {name} and this is Python class {classnumber}.')

*f-strings* are handy for writing queries in your scripts. (A quick intro to SQL.)

In [None]:
database = 'SALES'
table = 'WEBSHOP_SALES'
month = '8'
day = 20 # both strings and integers will work

# using three consecutive double quotes you can write multiline strings

query = f"""
SELECT *
FROM {database}.{table}
WHERE month = {month}
AND day = {day}
"""

print(query)

Why do you think the following query will *fail* when run against the database?

In [None]:
database = 'SALES'
table = 'WEBSHOP_SALES'
month = 'April' # this will make the query fail!
day = 20 # both strings and integers will work

query = f"""
SELECT *
FROM {database}.{table}
WHERE month = {month}
AND day = {day}
"""

print(query)

In [None]:
database = 'SALES'
table = 'WEBSHOP_SALES'
month = 'April'
day = 20 # both strings and integers will work

query = f"""
SELECT *
FROM {database}.{table}
WHERE month = '{month}'
AND day = {day}
"""

print(query)

And of course there are even more ways to format text.

In [None]:
print('{} divided by {} is {}.'.format(2000,1500,2000/1500))

Some fancier formatting. 

In [None]:
print('{:,d} divided by {:,.0f} is {:.2f}.'.format(2000,1500,2000/1500))

In [None]:
print('{:8.2f}'.format(10/3))
print('{:8.2f}'.format(100/3))
print('{:8.2f}'.format(1000/3))
print('{:8.2f}'.format(10000/3))
print('{:8.2f}'.format(100000/3))
print('{:8.2f}'.format(1000000/3))

In [None]:
print('{:8,.2f}'.format(100000/3)) # Add a thousands separator.
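The same format specifications work inside f-strings, written after a colon. A short sketch reusing numbers from the cells above (the variable names are illustrative):

```python
ratio = 2000 / 1500
amount = 100000 / 3

print(f'{ratio:.2f}')       # 1.33
print(f'{amount:,.2f}')     # 33,333.33
print(f'{amount:12,.2f}|')  # right-aligned in a 12-character field
print(f'{0.256:.1%}')       # 25.6%: multiplies by 100 and adds a percent sign
```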

**Logical operations with strings**

In [None]:
s = 'Java, C++, COBOL '
'Python' in s

In [None]:
s = 'Monty Python'
'Python' in s

## Basic Data Structures

Python has four built-in general purpose containers: `lists`, `tuples`, `dictionaries` and `sets`. 

### Lists

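Python lists are one of the most used data types. They can contain elements of various types, which makes them very flexible and popular; the price is that each element is stored as a separate Python object, so large lists of numbers take up much more memory than specialized containers such as NumPy arrays. 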

In [None]:
l = [1, 2, 3, 's', 1.2]

In [None]:
print(type(l))

Lists can also contain other lists so lists can be *nested*. 

In [None]:
b = 'Monty Python'
l = [1, 2, ['a', b]]

In [None]:
l

In [None]:
l[2][1]

The list method `reverse()` reverses the list **in place**, that is, it modifies the list itself (and returns `None`) instead of creating a new one.

In [None]:
l.reverse()
l

When the elements are of mutually comparable types (e.g. all strings), lists can also be sorted, again in place.

In [None]:
l = ['f', 'a', 'z', 't']
l.sort()
l
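`sort()` changes the list and returns `None`, while the built-in `sorted()` returns a new list and leaves the original untouched. A minimal sketch (illustrative names), including what happens with non-comparable types:

```python
letters = ['f', 'a', 'z', 't']

result = letters.sort()  # sorts in place...
print(result)            # None: ...and returns nothing
print(letters)           # ['a', 'f', 't', 'z']

# sorted() leaves the original alone and returns a new sorted list
numbers = [3, 1, 2]
print(sorted(numbers), numbers)  # [1, 2, 3] [3, 1, 2]

# Mixed, non-comparable types cannot be sorted
try:
    [1, 'a'].sort()
except TypeError as err:
    print('TypeError:', err)
```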

Loops and control flow are the subjects of later classes, but here is how you can iterate over a list:

In [None]:
for mickey in l:
    print(mickey)

Lists play a very important role in Python; for example, they are used in loops and other flow control structures (discussed later). There are a number of convenient functions for generating sequences, for example `range`, whose result can be turned into a list with `list()`:

In [None]:
start = 10
stop = 20
step = 2
list(range(start, stop, step)) # 'start' is included in the list but 'stop' is not! 

In [None]:
list(range(10)) # with a single argument you get the integers from 0 up to (but not including) 'stop', with a step of 1

In [None]:
range(10)

In [None]:
type(range(10))
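A `range` object does not store its elements; it computes them on demand, yet it still supports `len()`, indexing and membership tests. A short sketch (illustrative names):

```python
evens = range(0, 20, 2)

print(len(evens))   # 10 elements, computed without building a list
print(evens[3])     # 6: ranges support indexing
print(14 in evens)  # True
print(15 in evens)  # False

# A huge range costs almost no memory because values are produced on demand
big = range(10**12)
print(big[-1])      # 999999999999
```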

Lists can be modified, so they are *mutable*.

In [None]:
l1 = [] # instantiate an empty list

In [None]:
l1.append(1)
l1.append('a')
l1.append(1.2)
l1
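One consequence of mutability is worth seeing early: assigning a list to a new name does not copy it. Both names point to the same list, so a change made through one is visible through the other. A minimal sketch (illustrative names):

```python
original = [1, 2, 3]
alias = original             # no copy: same list object
duplicate = original.copy()  # a new, independent (shallow) copy

alias.append(4)

print(original)   # [1, 2, 3, 4]: changed through the alias
print(duplicate)  # [1, 2, 3]: the copy is unaffected
```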

In [None]:
import numpy as np
l2 = ['Python', 15, np.random.rand(2,2)] # np.random.rand(2,2) creates a 2x2 NumPy array of random numbers drawn uniformly from [0, 1)
l2

In [None]:
l = l1 + l2 # concatenate the two lists
l

In [None]:
len(l) # length

List slicing: `l[start:stop]`, where 
- indexing starts at 0
- `'start'` is included while `'stop'` is not

In [None]:
l[1:4]

In [None]:
l[-1] # the last element is indexed as -1

In [None]:
l.remove('a') # remove the first occurrence of a particular value
l

In [None]:
del l[2] # delete an item at a given position
print(l)

The string method `join()` concatenates the (string) items of a list into a single string, placing the string it is called on between them. 

In [None]:
l = ['Monty', 'Python']
print(''.join(l))
print(' '.join(l))
print(', '.join(l))

It comes in handy when writing automated SQL scripts.

In [None]:
database = 'SALES'
table = 'WEBSHOP_SALES'
month = 'April'
day = ['20', '21', '22'] # join() only accepts strings, so the list elements must be strings! 

query = f"""
SELECT *
FROM {database}.{table}
WHERE month = '{month}'
AND day IN ({", ".join(day)})
"""

# note the parentheses (needed for proper SQL syntax) and the curly braces (for the f-string) after the IN clause

print(query)
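If the days are stored as integers instead, `join()` raises a `TypeError`. A minimal sketch of converting every element with `map(str, ...)` first (the list `days` is illustrative):

```python
days = [20, 21, 22]  # integers this time

# map() applies str() to every element, so join() receives strings
day_list = ', '.join(map(str, days))
print(day_list)  # 20, 21, 22
```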

**Sorting**  
We can sort lists with built-in or custom key functions.

In [None]:
ls_cities = ['New York', 'Rio', 'Tokyo']

In [None]:
# sort by length
sorted(ls_cities, key=len) 

In [None]:
# sort by length, reversed
sorted(ls_cities, key=len, reverse=True) 

In [None]:
# sorting by a custom function: sorted by the third character of each word
sorted(ls_cities, key=lambda x: x[2]) # remember: indexing starts at 0

We will revisit `lambda functions` later in the course. 

### Tuples

Tuples are like lists, except that they cannot be modified once created, that is they are *immutable*. 

In Python, tuples are created using the syntax `(..., ..., ...)`, or even `..., ...`:

In [None]:
height_and_weight = (165, 60)
print(height_and_weight)
print(type(height_and_weight))
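The parentheses are optional; it is the comma that makes a tuple. This matters for one-element tuples. A short sketch (illustrative names):

```python
point = 3, 4         # parentheses are optional: this is a tuple
print(type(point))   # <class 'tuple'>

not_a_tuple = (5)    # just the integer 5 in parentheses
single = (5,)        # the trailing comma makes a one-element tuple
print(type(not_a_tuple), type(single))  # <class 'int'> <class 'tuple'>

empty = ()
print(len(empty))    # 0
```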

You can iterate on tuples just like on lists:

In [None]:
for parameter in height_and_weight:
    print(parameter)

Access like lists:

In [None]:
height_and_weight[0]

Remember, tuples are *immutable*

In [None]:
height_and_weight[0] = 170 # this assignment will throw an error

We can *unpack* a tuple by assigning it to a comma-separated list of variables:

In [None]:
height, weight = height_and_weight
print(height)
print(weight)
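Unpacking also gives a neat way to swap two variables, and a starred name can collect the remaining elements. A minimal sketch (illustrative names):

```python
left, right = 1, 2
left, right = right, left  # swap without a temporary variable
print(left, right)         # 2 1

first, *rest = (10, 20, 30, 40)
print(first, rest)         # 10 [20, 30, 40]: the starred name collects the remainder as a list
```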

### Dictionaries

Dictionaries are also a little like lists, except that each element is a *key-value pair*. Dictionaries are written with curly brackets. A dictionary is a *mutable* collection that does not allow duplicate keys. The syntax is `{key1: value1, key2: value2, ...}`. Think of the keys as labels of their values. 

An *'item'* in a dictionary is a tuple of (*'key'*, *'value'*). 

Another similarity to lists is that these *values* can be any kind of objects: integers, strings, lists, functions, even other dictionaries. 

The difference from lists is that list elements are retrieved by their position (this is also the order in which we iterate over them), while dictionary values are retrieved by their labels, the *keys*. (Since Python 3.7, dictionaries do preserve insertion order, but values are never looked up by position.) Keys are stored in a hash table, which makes looking up a key very fast, on average independent of the size of the dictionary. (See [hash functions](https://en.wikipedia.org/wiki/Hash_function).)

In [None]:
class_size = {
    'Java': 20,
    'C++': 23, 
    'Python': 29
}

In [None]:
print(type(class_size))
print(class_size)

Keys must be *hashable*: strings, numbers and tuples (of hashable elements) work as keys because they are immutable, while mutable types such as lists cannot be used as keys. Values can be of any type. 
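A short sketch of both rules, hashable keys and unique keys (the dictionaries here are illustrative):

```python
coords = {(1, 2): 'a point'}  # a tuple works as a key
print(coords[(1, 2)])         # a point

try:
    bad = {[1, 2]: 'a point'}  # a list is mutable, hence unhashable
except TypeError as err:
    print('TypeError:', err)   # unhashable type: 'list'

# Keys are unique: repeating a key keeps only the last value
sizes = {'Java': 20, 'Java': 25}
print(sizes)                   # {'Java': 25}

# Assigning to an existing key overwrites it; a new key adds an item
sizes['Python'] = 29
print(sizes)                   # {'Java': 25, 'Python': 29}
```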

Looking up a key which is not in the dict raises a `KeyError`. Use `in` to check whether the key is in the dict, or use `dict.get(key)`, which returns the value, or `None` if the key is not present (`dict.get(key, default)` lets you specify what to return in the not-found case).

In [None]:
class_size['Java']

In [None]:
class_size['JavaScript']

In [None]:
class_size.get('JavaScript', 0)

In [None]:
'JavaScript' in class_size.keys()

To loop over all key-value pairs:

In [None]:
for key, value in class_size.items():
    print('The number of students in the', key, 'class is', value)

In [None]:
for key in class_size.keys():
    print(key)

### Sets

A Python `set` is an unordered and unindexed **collection**. Sets are mutable (elements can be added or removed), but their elements must be immutable (hashable) objects. The most important property of sets is that they _can't contain two items of the same value_.

You can create a set by listing its elements in curly braces, or by passing a list (or any other iterable) to the built-in `set()` function. In the latter case duplicates are removed from the input.

In [None]:
ls_a = ['a', 'b', 'a', np.pi, 36] 

In [None]:
st_a = set(ls_a)
st_a

In [None]:
st_b = {'d', 36, 'Holy Grail', np.pi}
st_b

We can perform standard _set operations_ with sets.

In [None]:
# union
st_a | st_b

In [None]:
# intersection
st_a & st_b

In [None]:
# difference
st_a - st_b

In [None]:
st_b - st_a
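A few more set operations, sketched on small illustrative sets. Since sets are unordered, the sketch sorts before printing where the exact order matters:

```python
st_x = {1, 2, 3}
st_y = {3, 4}

print(st_x ^ st_y)       # symmetric difference: elements in exactly one set (1, 2 and 4)
print(st_x.union(st_y))  # method form of |
print(2 in st_x)         # True: fast membership test

st_x.add(10)             # sets are mutable: elements can be added...
st_x.discard(1)          # ...or removed (discard ignores missing elements)
print(sorted(st_x))      # [2, 3, 10]

empty_set = set()        # {} would create an empty dict, not a set
print(type(empty_set), type({}))
```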

### JSON

`JSON` (*JavaScript Object Notation*) is a text-based syntax for storing and exchanging data. To work with JSON in Python, import the standard-library `json` module. JSON is also one of the most common formats for configuration (aka 'config') files.

In [None]:
import json

In [None]:
JSON_string = '{"class 1" : "coding basics", "class 2": "navigating the file system", "class 3" : "data manipulation"}'

In [None]:
type(JSON_string)

The `loads()` function of the `json` module parses a JSON string into a dictionary. 

In [None]:
dc = json.loads(JSON_string)

In [None]:
type(dc)

In [None]:
dc

In [None]:
dc.keys()

In [None]:
dc.values()

We can serialize a dictionary into a **valid** JSON string with the `dumps()` function.

In [None]:
dc_new = {
    'alpha': 0,
    'beta': 'car',
    'gamma': 1.4,
    'delta': None,
    'epsilon': True
}

In [None]:
print(dc_new)

In [None]:
new_json_string = json.dumps(dc_new)

In [None]:
print(new_json_string) # Note the double quotes required by JSON. The boolean True in Python becomes true and None becomes null in JSON. 
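A round trip through JSON does not always give back exactly the same Python object, because JSON has fewer types than Python. A minimal sketch (the dictionary `record` is illustrative) using the `indent` parameter of `json.dumps()` for readable output:

```python
import json

record = {'name': 'Monty', 'scores': (1, 2), 'active': True, 'note': None}

text = json.dumps(record, indent=2)  # pretty-printed JSON text
print(text)

restored = json.loads(text)
print(restored)
# JSON has no tuples: the tuple comes back as a list
print(restored['scores'])   # [1, 2]
print(restored == record)   # False, because of the tuple -> list change
```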

The conversion patterns are given in the [JSON documentation](https://docs.python.org/3/library/json.html#py-to-json-table).

## Modules

Most of the functionality in Python is provided by *modules*. The Python Standard Library is a large collection of modules that provides *cross-platform* implementations of common facilities such as access to the operating system, file I/O, string management, network communication, and much more. We will use some of these modules throughout the course.

Formally, a module is a Python file with the `.py` extension which defines classes, functions, variables, or even runnable code. We can also define our own modules, which is a great way to produce reusable code and keep our workflow organized. 

Basic functionality, such as the built-in functions `print()` and `len()`, is available automatically when Python starts, but most functions, methods, object types, etc. can only be used after *importing* the module that defines them with the `import` statement. 

In [None]:
import math

This loads the whole module and makes it available for use later in the program. To use one of the module's functions, we prefix it with the module's name:

In [None]:
math.cos(0)

Alternatively, we can choose to import all symbols (functions and variables) in a module to the current namespace, so that we don't need to use the prefix `math.` every time we use something from the math module:

In [None]:
from math import *

cos(100)

This pattern can be very convenient, but in *large programs that use many modules* it is often a good idea to keep the symbols from each module in their own ***namespaces*** by using the `import math` pattern. This avoids confusing *namespace collisions*, which happen when several modules define functions with the same name that perform different tasks.
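A concrete collision, sketched with two standard-library modules that both define `sqrt`: `math` works with real numbers, `cmath` with complex numbers. Importing the bare name from both silently keeps only the last one:

```python
import math
import cmath

print(math.sqrt(4))     # 2.0
print(cmath.sqrt(-1))   # 1j: cmath works with complex numbers

from math import sqrt
from cmath import sqrt  # silently replaces the sqrt imported from math

print(sqrt(-1))  # 1j: math.sqrt(-1) would have raised a ValueError
```

With `math.sqrt` and `cmath.sqrt` written out in full, there is never any doubt which function runs.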

A third way is to import the necessary functions only.

In [None]:
from numpy import ceil, floor

In [None]:
print(ceil(5.5))
print(floor(5.5))

When importing modules we often use ***aliases***: short abbreviations of the module names that serve as the namespace prefix. An alias can be any valid, non-keyword name (you could alias your module as *mickeymouse*), but it makes sense to follow the general conventions: when we find solutions to our coding problems on the web, it is easier to reuse them if we don't need to rename the aliases.

In [None]:
import pandas as pd # 'pd' is the conventional alias for the 'pandas' module

In [None]:
data = [1,2,3,4]

series = pd.Series(data)
print(series)

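As mentioned, we can also import our own modules. `import utils` works if a file called `utils.py` is in the same folder as the notebook (or elsewhere on Python's module search path). 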

In [None]:
import utils

In [None]:
utils.print_hello('Monty Python')
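The contents of `utils.py` are not shown in this notebook. A minimal hypothetical version that would make the cell above work could look like this (the course's actual file may differ):

```python
# utils.py  (hypothetical sketch; save it next to the notebook)

def print_hello(name):
    """Print a short greeting for the given name."""
    print(f'Hello, {name}!')


if __name__ == '__main__':
    # Runs only when the file is executed directly, not when it is imported
    print_hello('Monty Python')
```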

## Conditional Statements <a class="anchor" id="conditionals"></a>

Conditional statements use the `if`-`elif`-`else` structure. The program performs one or more operations if certain conditions are met and, optionally, performs others if they are not.

In [None]:
import random

Conditional statements are structured by ***indentation***: each nested block is shifted one level to the right (by convention, four spaces). (Other languages, like Java or JavaScript, use curly braces.)

In [None]:
r = random.randint(20,34)
print(r)
if r < 25:
    print('A small number!')
elif r < 30:
    print('A moderately high number.')
else:
    print('A large number!')

Question: How are random numbers generated?
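As a starting point: the `random` module produces *pseudo-random* numbers with a deterministic algorithm (the Mersenne Twister) that starts from a *seed*. Fixing the seed makes a "random" sequence reproducible. A minimal sketch (the seed value 42 is arbitrary):

```python
import random

random.seed(42)
first_run = [random.randint(20, 34) for _ in range(5)]

random.seed(42)
second_run = [random.randint(20, 34) for _ in range(5)]

print(first_run)
print(first_run == second_run)  # True: same seed, same sequence
```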

Conditional statements do not need an `else` branch. If the condition is not met, the program simply skips the block. 

In [None]:
a = random.randint(1,12)
b = random.randint(1,16)

print('a:', a)
print('b:', b)

if a > 6:
    print("'a' is large")
    if b > a:
        print('Both numbers are large.')
        print('Result: b is larger than a.')

## Control Flows

### The _'for'_ Loop <a class="anchor" id="for"></a>

In [None]:
for i in range(20): # Remember: 20 is not included in the range! 
    if i%2 == 0: # The modulo operator % returns the remainder of an integer division.
        print(f'Number {i} is even.')
    else:
        print(f'Number {i} is odd.')

Note: a `range` object is *iterable*.

In [None]:
for word in ['Business', 'analytics', 'with', 'Python']:
    print(word, len(word)) # print() accepts several arguments, including results of function calls

In [None]:
list_capitals = []
for i in range(65,91):
    list_capitals.append(chr(i))
print(list_capitals)
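The standard library already provides these letters. A short sketch comparing the loop's result, rebuilt here with `map()` so the block runs on its own, to the `string` module's constant:

```python
import string

# Same letters as the loop above: apply chr() to each code from 65 to 90
capitals = list(map(chr, range(65, 91)))

print(string.ascii_uppercase)                    # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(list(string.ascii_uppercase) == capitals)  # True
```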

In [None]:
range(60,91)

In [None]:
chr(42)

The `enumerate` function helps you get a counter. 

In [None]:
for k, v in enumerate(list_capitals):
    print(k, v)

Add some simple formatting by right-adjusting `k`, the counter. This is what the `.rjust()` method does. It is, however, a *string method*, so we first need to *cast* our `k` variable, which is an integer, into a string with the `str()` function.

In [None]:
for k, v in enumerate(list_capitals):
    print(str(k).rjust(2)+': ', v)
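Two alternatives, sketched on a short illustrative list: `enumerate()` accepts a `start` argument, and an f-string format spec such as `{k:>2}` right-aligns without an explicit `str()` cast:

```python
letters = ['A', 'B', 'C']

# start=1 makes the counter begin at 1 instead of 0;
# {k:>2} right-aligns k in a 2-character field, like str(k).rjust(2)
for k, v in enumerate(letters, start=1):
    print(f'{k:>2}: {v}')
```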

### The *'while'* Loop <a class="anchor" id="while"></a>

In [None]:
i = 0 # the counter
while i < 20:
    if i%2 == 0:
        print('Number %d is even.'% i)
    else:
        print('Number %d is odd.'% i)
    i += 1 # increment in Python (same as i++ in Java)
print('\nDone.') # Not indented, so it runs only once, after the loop has finished.

<font color = 'red'>**Caution!!!**</font> If you don't increment the counter, the loop will never stop!

If you use '*True*' in the `while` condition the script runs until manual interruption. 

In [None]:
from IPython.display import clear_output
import time

i = 1
while True: # This condition makes it run forever, or until manual interruption. 
    print(i)
    i += 1
    time.sleep(1)
    clear_output()

To interrupt the script in a code cell click in the cell and then click &#9632; (the black rectangle icon) on the notebook's menu bar. 
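In practice, a `while True` loop is usually paired with a `break` statement that leaves the loop once a condition is met, so no manual interruption is needed. A minimal sketch:

```python
i = 1
while True:
    if i > 5:
        break  # leave the loop immediately
    print(i)
    i += 1
print('Stopped at', i)  # Stopped at 6
```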

### List Comprehension <a class="anchor" id="comprehension"></a>

A list comprehension is a compact syntax for creating a new list from another list or from any iterable. It always builds a new list rather than modifying an existing one in place.

A list comprehension mimics the mathematical notation for defining sets. For example:
$$ L=\lbrace x^2 : x \in \lbrace 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10\rbrace \rbrace.$$
This translates into:

In [None]:
L = [x**2 for x in range(0,11)]
L

You can also combine it with conditional statements. For example:

In [None]:
[x for x in L if x%2 == 1]

You can also use a conditional (`if`-`else`) expression. Note that it goes before the `for` clause:

In [None]:
['even' if x%2 == 0 else 'odd' for x in L]
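For comparison, the same lists built with ordinary `for` loops. A sketch showing that the comprehensions above are shorthand for this append pattern (the names are illustrative):

```python
squares = []
for x in range(0, 11):
    squares.append(x**2)

odd_squares = []
for x in squares:
    if x % 2 == 1:
        odd_squares.append(x)

print(squares == [x**2 for x in range(0, 11)])            # True
print(odd_squares == [x for x in squares if x % 2 == 1])  # True
```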

## To Do

In [None]:
ls_actors_and_actresses = [
    'Blake Lively', 'Halle Berry', 'Mark Wahlberg', 'Michelle Monaghan', 'Elliot Page', 'Ryan Reynolds', 'Lily Collins', 'Amy Adams', 'Kristen Wiig', 'Channing Tatum', 
    'Adam Sandler', 'Russell Crowe', 'Henry Cavill', 'Teresa Palmer', 'Ian Holm', 'Hugh Jackman', 'Nicolas Cage', 'Tom Hardy', 'Tom Cruise', 'Alexandra Daddario'
]

- Exercise 1: Sort alphabetically by their last names.
- Exercise 2: Sort by any twisted logic of your choice.

## Homework

Create a list comprehension based on the following formula:
$$ L=\lbrace x^2 : x \in \lbrace -1, 0, \pi, e, \cos(\pi), 1.5, 8\rbrace \text{ and } x \in \mathbb{W} \rbrace.$$
Hint: You need to find a Python way of checking whether an element belongs to the mathematical set of whole numbers, denoted $\mathbb{W}$. You also need to obtain the values of $\pi$, $e$ and $\cos(\pi)$ in Python, e.g. from a module.