#  INGENIO, July 2023: Python Files and Modules

The objective of this notebook is both to present the basics of Python syntax and to refresh some concepts. It is therefore meant for beginners and intermediate users alike. If you find some concepts unclear or in need of further explanation, do not hesitate to turn to Stack Overflow or other online resources, including the documentation of the relevant packages.

## Why Python?
- Python works on all major platforms (Windows, Mac, Linux, etc.)
- Python handles large datasets and complex computational problems well
- Python's syntax is close to human language and thus relatively intuitive
- Python's vast library of packages allows straightforward, off-the-shelf implementation of otherwise cumbersome programs
- Python is a general-purpose language that supports procedural, object-oriented, and functional programming styles

## Early-on Python hacks
- The current major version of Python is Python 3; Python 2 reached its end of life in January 2020 and no longer receives updates. The syntax is not identical, so focus on learning Python 3
- It is possible to write Python in many different environments, from the Windows command prompt to high-level integrated development environments. Anaconda bundles a selection of these
- [Anaconda](https://www.anaconda.com/) is a free and open-source distribution of Python and R programming languages that greatly streamlines package management and deployment. The use of Anaconda is highly recommended, if not indispensable
- This class and a significant number of researchers use Jupyter Notebook as their IDE (Integrated Development Environment) of choice. Getting acquainted early on with all its [shortcuts and commands](https://www.edureka.co/blog/wp-content/uploads/2018/10/Jupyter_Notebook_CheatSheet_Edureka.pdf) is highly recommended
- It may be the case that you need to quickly shift between Jupyter Notebooks saved in different folders. A nice trick is to use Mozilla Firefox's [Python Notebook Viewer](https://addons.mozilla.org/en-US/firefox/addon/python-notebook-viewer/)

## Python Files

__Structure of the Code file__
    
* Code files end with "`.py`":

        myprogram.py
        
* Jupyter Notebooks end with "`.ipynb`":

        myprogram.ipynb

* Each line holds a Python statement (or part of one).

* Comment lines start with `#`.

__Run Python Program from command line__

* Execute with python interpreter

        $ python myprogram.py        

* Direct use on UNIX systems
    - define the path to the interpreter on the first line of the program as a __comment__ (the so-called "shebang" line)

        #!/usr/bin/env python3
        

    - after setting the executable flag of the file (`chmod +x myprogram.py`), we can run the program directly in the shell:

        $ ./myprogram.py
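To make this concrete, here is a minimal sketch of what an executable `myprogram.py` might contain. The `main` function and its greeting logic are hypothetical illustrations; the `if __name__ == "__main__":` guard is a common idiom that runs `main` only when the file is executed directly, not when it is imported as a module.

```python
#!/usr/bin/env python3
"""myprogram.py: hypothetical example of a small executable script."""

import sys


def main(argv):
    """Build and print one greeting per command-line argument (default: 'world')."""
    names = argv[1:] or ["world"]
    greetings = ["Hello, " + name for name in names]
    for line in greetings:
        print(line)
    return greetings


if __name__ == "__main__":
    # Only runs when the file is executed, e.g. `python myprogram.py Ada`
    main(sys.argv)
```

Saved as `myprogram.py`, running `$ python myprogram.py Ada` would print `Hello, Ada`.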


## Python Modules

### Overview
 * **Modules** group related Python functions, variables and classes
 * __Modules are files__ containing Python code (e.g. myprogram.py)
 * **Using modules** requires importing them first with the __`import` statement__
 * **Scope:** Each module has its own namespace; its contents are accessed as `module.function`, unless selected names are imported into the current namespace
 
The Python Standard Library is a large collection of modules that provides *cross-platform* implementations of common facilities such as access to the operating system, file I/O, string management, network communication, and much more.

### References
 
 * The Python Language Reference: https://docs.python.org/3/reference/index.html
 * The Python Standard Library: https://docs.python.org/3/library/

### Module import

For example, to import the module `math`, which contains many standard mathematical functions, we can do:

In [135]:
import math

This imports the **whole module** and makes it available under **the module's namespace**. For example, we can do:

In [136]:
import math

x = math.cos(2 * math.pi)

print(x)

1.0


### Import a Module in current Namespace

* Goal: avoid writing the prefix `module.` by __importing all symbols (functions and variables)__ of one module into the current namespace 

In [137]:
from math import *

x = cos(2 * pi)

print(x)

1.0


* **Caveat:** a star import can silently overwrite names that already exist in the current namespace, which leads to hard-to-find conflicts in large programs
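A small sketch of such a conflict, using only the standard library: `math` happens to define its own `pow`, so a star import replaces the built-in `pow` in the current namespace, and the same call suddenly returns a float.

```python
# Before the star import, `pow` is Python's built-in function
result_builtin = pow(2, 3)

from math import *  # also brings in math.pow, shadowing the built-in pow

# `pow` now refers to math.pow, which always returns a float
result_math = pow(2, 3)

print(result_builtin, type(result_builtin))  # 8 <class 'int'>
print(result_math, type(result_math))        # 8.0 <class 'float'>
```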

### Import selected Functions into the current Namespace 

Import only selected functions from a module:

In [138]:
from math import cos, pi

x = cos(2 * pi)

print(x)

1.0


### Changing the Name of Functions/Modules at Import
Names of variables and functions, as well as the namespace of modules, can be changed at import using the **`as`** keyword

In [139]:
from math import cos as c
from math import pi as p
import math as m
x = c(2 * p)
y = m.cos(2 * m.pi)
print(x, y)

1.0 1.0


**Caveat:** Readability may suffer.

### Looking at what a module contains

Once a module is imported, we can list the symbols it provides using the `dir` function:

In [140]:
import math

print(dir(math))

['__doc__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'comb', 'copysign', 'cos', 'cosh', 'degrees', 'dist', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'isqrt', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'perm', 'pi', 'pow', 'prod', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']


### Getting Help on a Function

Using the function `help`, we can get a description of each function. This works for almost all functions: not every function has a docstring, as this documentation is technically called, but the vast majority are documented this way.

In [141]:
help(math.log)

Help on built-in function log in module math:

log(...)
    log(x, [base=math.e])
    Return the logarithm of x to the given base.
    
    If the base not specified, returns the natural logarithm (base e) of x.



In [142]:
math.log(10)

2.302585092994046

In [143]:
math.log(10, 2)

3.3219280948873626

### Getting Help on a Module

Use the `help` function directly on modules: 

    help(math) 

Some very useful modules from the Python standard library are 

 * `os`
 * `sys`
 * `math`
 * `shutil`
 * `re`
 * `subprocess`
 * `multiprocessing`
 * `threading`

Complete lists of the standard modules for Python 2 and Python 3 are available at http://docs.python.org/2/library/ and http://docs.python.org/3/library/, respectively.
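A brief, illustrative taste of three of these modules. The path `data/config.txt` and the sample sentence are made up for the example, and the printed working directory will depend on your machine.

```python
import os
import re
import sys

# os: operating-system services such as paths and the working directory
print(os.getcwd())
config_path = os.path.join("data", "config.txt")  # separator depends on the OS

# sys: information about the running interpreter
print(sys.version_info.major)                     # 3 on any Python 3 interpreter

# re: regular expressions, here extracting all numbers from a string
numbers = re.findall(r"\d+", "In 2023, 12 students took 3 courses")
print(numbers)                                    # ['2023', '12', '3']
```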

### Exercise: Explore the math package

**Setup**

 * Open Jupyter Notebook using the command `jupyter notebook` (formerly `ipython notebook`) from the directory where you want to store your exercises
 * Create a new notebook; use a meaningful naming scheme, for example `'Durante-coursename-semester'`
 * Create a first markdown block and create a header with title, name, course and semester
 * Work through the IPython documentation at https://ipython.readthedocs.io/en/stable/index.html 
   - Learn the principles behind the notebook, how it is started, how you edit code and text, and the IPython magic commands
 * Create a first code block and start with the following exercise
 
**Exercise**

Explore the math package and perform the following calculations:

* `cos(0.7) + sin(0.3)`
* factorial of 20
* round down the following numbers: `1.4`, `3.5`, `4.8`
* check the run-time behaviour of different functions in the `math` package 

# Variables and Types in Python

## Syntax and Naming Convention

 * __Variable names__ may contain 
    * alphanumerical characters `a-z`, `A-Z`, `0-9` 
    * the underscore `_`
    * but they cannot start with a digit
   

 * __Convention__ 
     * Variable and function names start with a lower-case letter (e.g. `my_variable`)
     * Class names start with a capital letter (e.g. `MyClass`)
     * A single leading underscore (`_internal`) marks a name as internal by convention; nothing prevents access to it
     * A double leading underscore in a class attribute (`__hidden`) triggers name mangling, which makes accidental access from outside the class harder
     

 * __Python keywords__ cannot be used as variable names (Python 3.7+):
 
 
    False, None, True, and, as, assert, async, await, break, class, continue,
    def, del, elif, else, except, finally, for, from, global, if, import, in,
    is, lambda, nonlocal, not, or, pass, raise, return, try, while, with, yield
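Rather than memorizing the list, you can query the standard `keyword` module of the interpreter you are running, and `str.isidentifier()` checks the naming rules above. A short sketch:

```python
import keyword

# The authoritative keyword list for the running interpreter
print(keyword.kwlist)

# keyword.iskeyword tests a single name; str.isidentifier checks the naming rules
print(keyword.iskeyword("lambda"))     # True
print(keyword.iskeyword("print"))      # False: print is a function in Python 3
print("my_variable".isidentifier())    # True
print("2nd_value".isidentifier())      # False: names cannot start with a digit
```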
    

## Assignment


The assignment operator in Python is **`=`**. 

The type is determined **dynamically** on assignment; there is no explicit type declaration.

In [144]:
# variable assignments
x = 1.0
my_variable = 12.2

The type is derived from the value it was assigned (dynamic typing):

In [145]:
type(x)

float

If we assign a new value to a variable, its type can change.

In [146]:
x = 1

In [147]:
type(x)

int

If we try to use a variable that has not yet been defined, we get a `NameError`:

In [148]:
print(undefined_variable)

NameError: name 'undefined_variable' is not defined


## Basic types

In [149]:
# integers
x = 1
type(x)

int

In [150]:
# float
x = 1.0
type(x)

float

In [151]:
# boolean
b1 = True
b2 = False

type(b1)

bool

In [152]:
# complex numbers: note the use of `j` to specify the imaginary part
x = 1.0 - 1.0j
type(x)

complex

In [153]:
print(x)

(1-1j)


In [154]:
print(x.real, x.imag)

1.0 -1.0


## Type utility functions


The module `types` contains a number of type name definitions that can be used to test if variables are of certain types:

In [155]:
import types

# print all types defined in the `types` module
print(dir(types))

['AsyncGeneratorType', 'BuiltinFunctionType', 'BuiltinMethodType', 'CellType', 'ClassMethodDescriptorType', 'CodeType', 'CoroutineType', 'DynamicClassAttribute', 'FrameType', 'FunctionType', 'GeneratorType', 'GetSetDescriptorType', 'LambdaType', 'MappingProxyType', 'MemberDescriptorType', 'MethodDescriptorType', 'MethodType', 'MethodWrapperType', 'ModuleType', 'SimpleNamespace', 'TracebackType', 'WrapperDescriptorType', '_GeneratorWrapper', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_calculate_meta', '_cell_factory', 'coroutine', 'new_class', 'prepare_class', 'resolve_bases']


In [156]:
x = 1.0

# check if the variable x is a float
type(x) is float

True

In [157]:
# check if the variable x is an int
type(x) is int

False

We can also use the built-in `isinstance` function for testing types of variables:

In [158]:
isinstance(x, float)

True
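`isinstance` also accepts a tuple of types and respects inheritance, which `type(x) is ...` does not. A small sketch; the `bool`/`int` case works because `bool` is a subclass of `int`:

```python
x = 1.0

# isinstance accepts a tuple of types
print(isinstance(x, (int, float)))   # True

# bool is a subclass of int, so the two checks disagree for True
print(type(True) is int)             # False
print(isinstance(True, int))         # True
```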

## Type casting

In [159]:
x = 1.5

print(x, type(x))

1.5 <class 'float'>


In [160]:
x = int(x)

print(x, type(x))

1 <class 'int'>


In [161]:
z = complex(x)

print(z, type(z))

(1+0j) <class 'complex'>
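Note that `int()` truncates toward zero rather than rounding. A short sketch contrasting it with the built-in `round()` and showing conversions from strings:

```python
# int() truncates toward zero; it does not round
print(int(1.9), int(-1.9))       # 1 -1

# round() rounds to the nearest integer (ties go to the even neighbour)
print(round(1.5), round(2.5))    # 2 2

# Strings holding numbers can be converted as well
print(int("42") + float("0.5"))  # 42.5
print(str(3.14) + "!")           # 3.14!
```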


#### Complex Numbers

Complex variables cannot be cast to floats or integers. We need to use `z.real` or `z.imag` to extract the part of the complex number we want:

In [162]:
y = bool(z.real)

print(z.real, " -> ", y, type(y))

y = bool(z.imag)

print(z.imag, " -> ", y, type(y))

1.0  ->  True <class 'bool'>
0.0  ->  False <class 'bool'>
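A quick check of the claim above: casting a complex number directly to `float` raises a `TypeError`, even when the imaginary part is zero. A minimal sketch; the exact error message varies between Python versions, so it is not reproduced here:

```python
z = complex(1)   # (1+0j), as above

# Casting a complex number directly to float fails
try:
    float(z)
    converted = True
except TypeError as err:
    converted = False
    print("TypeError:", err)

# Extract the parts explicitly instead, or use abs() for the magnitude
print(float(z.real), abs(3 + 4j))   # 1.0 5.0
```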


# Operators and Comparisons

Most operators and comparisons in Python work as one would expect. We will briefly outline the following operators

 * Arithmetic Operators
 * Boolean Operators 
 * Comparison Operators


## Arithmetic Operators


* Arithmetic operators `+`, `-`, `*`, `/`, `//` (floor division), `%` (modulo), `**` (power)


In [163]:
1 + 2, 1 - 2, 1 * 2, 1 / 2

(3, -1, 2, 0.5)

In [164]:
1.0 + 2.0, 1.0 - 2.0, 1.0 * 2.0, 1.0 / 2.0

(3.0, -1.0, 2.0, 0.5)

In [165]:
# Floor division of float numbers
3.0 // 2.0

1.0

In [166]:
# Note! The power operator in Python is ** (^ is bitwise XOR)
2 ** 2

4
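Floor division and the modulo operator `%` round toward minus infinity, which matters for negative operands, and `^` is not exponentiation but bitwise XOR. A short sketch:

```python
# Floor division rounds down (toward minus infinity), also for negative numbers
print(7 // 2, -7 // 2)      # 3 -4

# % gives the remainder, consistent with //: (a // b) * b + a % b == a
print(7 % 2, -7 % 2)        # 1 1
print(divmod(-7, 2))        # (-4, 1)

# ^ is bitwise XOR on integers, not exponentiation
print(2 ^ 3, 2 ** 3)        # 1 8
```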

## Boolean Operators 



* The boolean operators are spelled out as words `and`, `not`, `or`. 

In [167]:
True and False

False

In [168]:
not False

True

In [169]:
True or False

True
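`and` and `or` short-circuit: the second operand is evaluated only if needed, and the result is one of the operands rather than necessarily a `bool`. In this sketch, `noisy` is a hypothetical helper that only makes the evaluation order visible:

```python
def noisy(value):
    """Hypothetical helper: announce the evaluation, then return value."""
    print("evaluated", value)
    return value

# `and` stops at the first falsy operand, so noisy(True) is never called
result_and = noisy(False) and noisy(True)

# `or` stops at the first truthy operand
result_or = noisy(True) or noisy(False)

# Both operators return one of their operands, not necessarily a bool
name = "" or "default"
print(result_and, result_or, name)   # False True default
```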

## Comparison Operators



* Comparison operators `>`, `<`, `>=` (greater or equal), `<=` (less or equal), `==` (equality), `!=` (inequality), `is` (identity, i.e. the same object).

In [170]:
2 > 1, 2 < 1

(True, False)

In [171]:
2 > 2, 2 < 2

(False, False)

In [172]:
2 >= 2, 2 <= 2

(True, True)

In [173]:
# equality
[1,2] == [1,2]

True

In [174]:
# objects identical?
l1 = l2 = [1,2]

l1 is l2

True

In [175]:
[1,2] is [1,2]

False
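The difference between `==` and `is` also matters for copies: a copied list is equal to, but not identical with, the original. `is` is also the idiomatic way to test for `None`. A short sketch:

```python
a = [1, 2]
b = list(a)             # a new list with the same elements

print(a == b, a != b)   # True False: same contents
print(a is b)           # False: two distinct objects

# `is` is the idiomatic test for the singleton None
value = None
print(value is None)    # True
```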

# Basic Data Structures 



Python provides a large set of fast, easy-to-use data structures, in particular:

 * Strings
 * Lists (mutable arrays)
 * Tuples (immutable lists)
 * Dictionaries (maps)
 
In CPython, the standard interpreter, the built-in data structures are implemented in C.


## Strings


Strings (`str`) are the variable type used for storing text, as an immutable sequence of Unicode characters. 

In [176]:
s = "Hello world"
type(s)

str

In [177]:
# length of the string: the number of characters
len(s)

11

In [178]:
# replace a substring in a string with something else
s2 = s.replace("world", "test")
print(s2)

Hello test


We can index a character in a string using `[]`: 

**Note:** Indexing starts at 0!

In [179]:
s[0]

'H'
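Negative indices count from the end of the string. Strings are also immutable, so assigning to an index fails and a modified copy has to be built instead. A minimal sketch:

```python
s = "Hello world"

# Negative indices count from the end
print(s[-1], s[-5:])    # d world

# Strings are immutable: item assignment raises a TypeError
try:
    s[0] = "h"
except TypeError as err:
    print("TypeError:", err)

# Build a new string instead
s_lower = "h" + s[1:]
print(s_lower)          # hello world
```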

### Basic Indexing

We can extract a part of a string using the syntax `[start:stop]`, which extracts the characters from index `start` up to, but not including, index `stop`:

In [180]:
s[0:5]

'Hello'

If we omit either (or both) of `start` or `stop` from `[start:stop]`, the default is the beginning and the end of the string, respectively:

In [181]:
s[:5]

'Hello'

In [182]:
s[6:]

'world'

In [183]:
s[:]

'Hello world'

### Advanced Slicing with a Step
Define the step size using the syntax **`[start:stop:step]`** (the default value for `step` is 1, as we saw above):

In [184]:
s[::1]

'Hello world'

In [185]:
s[::2]

'Hlowrd'
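The step may also be negative, which walks the string backwards; a common idiom uses this to reverse a string. A short sketch:

```python
s = "Hello world"

# A negative step walks backwards; with start and stop omitted it reverses the string
print(s[::-1])     # dlrow olleH

# start and stop combine with a step as well: indices 0, 2 and 4
print(s[0:5:2])    # Hlo
```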

This technique is called *slicing*. Read more about the syntax here: http://docs.python.org/release/2.7.3/library/functions.html?highlight=slice#slice

Python has a very rich set of functions for text processing. See for example http://docs.python.org/2/library/string.html for more information.

### String formatting examples

In [186]:
print("str1", "str2", "str3")  # print() separates its arguments with a space

str1 str2 str3


In [187]:
print("str1", 1.0, False, -1j)  # print() converts all arguments to strings

str1 1.0 False (-0-1j)


In [188]:
print("str1" + "str2" + "str3") # strings added with + are concatenated without space

str1str2str3


In [189]:
print("value = %f" % 1.0)       # we can use C-style string formatting

value = 1.000000


In [190]:
# this formatting creates a string
s2 = "value1 = %.2f. value2 = %d" % (3.1415, 1.5)

print(s2)

value1 = 3.14. value2 = 1


In [191]:
# alternative, more intuitive way of formatting a string 
s3 = 'value1 = {0}, value2 = {1}'.format(3.1415, 1.5)

print(s3)

value1 = 3.1415, value2 = 1.5
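Since Python 3.6, f-strings offer a third, usually the most readable, option: expressions are written directly inside the braces, with an optional format specification after a colon. A short sketch reusing the values above:

```python
value1 = 3.1415
value2 = 1.5

# f-strings embed expressions directly; the format spec follows ':'
s4 = f"value1 = {value1:.2f}, value2 = {value2}"
print(s4)                                     # value1 = 3.14, value2 = 1.5

# Arbitrary expressions are allowed inside the braces
print(f"{value2} squared is {value2 ** 2}")   # 1.5 squared is 2.25
```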


## List



Lists are very similar to strings, except that __each element can be of any type.__

The syntax for creating lists in Python is `[...]`:

In [192]:
l = [1,2,3,4]

print(type(l))
print(l)

<class 'list'>
[1, 2, 3, 4]


See `help(list)` for more details, or read the online documentation 

### List Indexing 
We can use the __same slicing techniques__ to manipulate lists as we could use on __strings__:

In [193]:
print(l)

print(l[1:3])

print(l[::2])

[1, 2, 3, 4]
[2, 3]
[1, 3]


**Note:** Indexing starts at 0!

In [194]:
l[0]

1

### Heterogeneous Types and Nesting
Elements in a list do not all have to be of the same type:

In [195]:
l = [1, 'a', 1.0, 1-1j]

print(l)

[1, 'a', 1.0, (1-1j)]


Python lists can be inhomogeneous and arbitrarily nested:

In [196]:
nested_list = [1, [2, [3, [4, [5]]]]]

nested_list

[1, [2, [3, [4, [5]]]]]

### Lists and flow control

Lists play a very important role in Python and are used, for example, in loops and other flow control structures (discussed below). There are a number of convenient functions for generating sequences of values, for example the `range` function:

In [197]:
start = 10
stop = 30
step = 2

range(start, stop, step)

range(10, 30, 2)

In [198]:
# in Python 3, range returns a lazy range object, which can be converted to a list using 'list(...)'
list(range(start, stop, step))

[10, 12, 14, 16, 18, 20, 22, 24, 26, 28]

In [199]:
list(range(-10, 10))

[-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [200]:
s

'Hello world'

In [201]:
# convert a string to a list by type casting:

s2 = list(s)

s2

['H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']

In [202]:
# sorting lists
s2.sort()

print(s2)

[' ', 'H', 'd', 'e', 'l', 'l', 'l', 'o', 'o', 'r', 'w']
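Note that `'H'` sorts before `'d'` because strings are compared by Unicode code point, so uppercase letters precede lowercase ones. The built-in `sorted()` returns a new list instead of sorting in place, and both accept a `key` function. A short sketch (the word list is made up):

```python
letters = list("Hello world")

# sorted() returns a new, sorted list and leaves the original untouched
ordered = sorted(letters)

# list.sort() sorts in place and returns None
result = letters.sort()
print(result)                          # None

# key= changes the sort criterion, e.g. case-insensitive ordering
words = ["banana", "apple", "Cherry"]
print(sorted(words))                   # ['Cherry', 'apple', 'banana']
print(sorted(words, key=str.lower))    # ['apple', 'banana', 'Cherry']
```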


### Adding, inserting, modifying, and removing elements from lists

In [203]:
# create a new empty list
l = []

# add elements using `append`
l.append("A")
l.append("d")
l.append("d")

print(l)

['A', 'd', 'd']


We can modify lists by assigning new values to elements in the list. In technical jargon, lists are *mutable*.

In [204]:
l[1] = "p"
l[2] = "p"

print(l)

['A', 'p', 'p']


In [205]:
l[1:3] = ["d", "d"]

print(l)

['A', 'd', 'd']


#### Insert

Insert an element at a specific index using `insert`:

In [206]:
l.insert(0, "i")
l.insert(1, "n")
l.insert(2, "s")
l.insert(3, "e")
l.insert(4, "r")
l.insert(5, "t")

print(l)

['i', 'n', 's', 'e', 'r', 't', 'A', 'd', 'd']


#### Remove
Remove the first element with a specific value using `remove`:

In [207]:
l.remove("A")

print(l)

['i', 'n', 's', 'e', 'r', 't', 'd', 'd']


Remove an element at a specific location using `del`:

In [208]:
del l[7]
del l[6]

print(l)

['i', 'n', 's', 'e', 'r', 't']
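Two related list operations, sketched briefly: `pop` removes an element by index and returns it, and `remove` raises a `ValueError` when the value is absent.

```python
l = ['i', 'n', 's', 'e', 'r', 't']

# pop() removes an element by index and returns it (default: the last one)
last = l.pop()
first = l.pop(0)
print(last, first, l)    # t i ['n', 's', 'e', 'r']

# remove() raises a ValueError if the value is not in the list
try:
    l.remove("z")
except ValueError:
    print("'z' is not in the list")
```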


## Tuples



Tuples are like lists, except that they cannot be modified once created, that is they are *immutable*. 

In Python, tuples are created using the syntax `(..., ..., ...)`, or even `..., ...`:

In [209]:
point = (10, 20)

print(point, type(point))

(10, 20) <class 'tuple'>


In [210]:
point = 10, 20

print(point, type(point))

(10, 20) <class 'tuple'>


### Unpacking tuples

We can unpack a tuple by assigning it to a comma-separated list of variables:

In [211]:
x, y = point

print("x =", x)
print("y =", y)

x = 10
y = 20


#### Tuples are Immutable 

If we try to assign a new value to an element in a tuple we get an error:

In [212]:
point[0] = 20

TypeError: 'tuple' object does not support item assignment
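Two further tuple details, sketched briefly: a one-element tuple needs a trailing comma, because parentheses alone only group an expression, and a starred name collects the leftover elements (as a list) when unpacking.

```python
# Parentheses alone do not make a tuple; the comma does
not_a_tuple = (10)
one_tuple = (10,)
print(type(not_a_tuple), type(one_tuple))   # <class 'int'> <class 'tuple'>

# A starred name collects the remaining elements into a list when unpacking
first, *rest = (1, 2, 3, 4)
print(first, rest)                          # 1 [2, 3, 4]
```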

## Dictionaries



Dictionaries are also like lists, except that each element is a key-value pair. The syntax for dictionaries is `{key1 : value1, ...}`:

In [213]:
params = {"parameter1" : 1.0,
          "parameter2" : 2.0,
          "parameter3" : 3.0,}

print(type(params))
print(params)

<class 'dict'>
{'parameter1': 1.0, 'parameter2': 2.0, 'parameter3': 3.0}


In [214]:
print("parameter1 = " + str(params["parameter1"]))
print("parameter2 = " + str(params["parameter2"]))
print("parameter3 = " + str(params["parameter3"]))

parameter1 = 1.0
parameter2 = 2.0
parameter3 = 3.0


In [215]:
params["parameter1"] = "A"
params["parameter2"] = "B"

# add a new entry
params["parameter4"] = "D"

print("parameter1 = " + str(params["parameter1"]))
print("parameter2 = " + str(params["parameter2"]))
print("parameter3 = " + str(params["parameter3"]))
print("parameter4 = " + str(params["parameter4"]))

parameter1 = A
parameter2 = B
parameter3 = 3.0
parameter4 = D
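Reading a key that does not exist raises a `KeyError`. The `get` method with a default value and the `in` operator avoid this. A short sketch (the key `parameter5` is deliberately missing):

```python
params = {"parameter1": 1.0, "parameter2": 2.0}

# Looking up a missing key with [] raises a KeyError
try:
    params["parameter5"]
except KeyError as err:
    print("KeyError:", err)

# get() returns a default instead, and `in` tests whether a key exists
print(params.get("parameter5", "not set"))   # not set
print("parameter1" in params)                # True
```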


# Control Flow

## Conditional statements: if, elif, else



Python's syntax for conditional execution of code uses the keywords `if`, `elif` (else if), and `else`:

In [216]:
statement1 = True
statement2 = False

if statement1:
    print("statement1 is True")

elif statement2:
    print("statement2 is True")
    
else:
    print("statement1 and statement2 are False")

statement1 is True


### Blocks are defined by indentation

Here we encounter, for the first time, a peculiar and unusual aspect of the Python programming language: program blocks are defined by their indentation level. 

Compare to the equivalent C code:

    if (statement1)
    {
        printf("statement1 is True\n");
    }
    else if (statement2)
    {
        printf("statement2 is True\n");
    }
    else
    {
        printf("statement1 and statement2 are False\n");
    }

In C, blocks are defined by the enclosing curly brackets `{` and `}`, and the level of indentation (white space before the code statements) does not matter; it is purely optional. 

In Python, however, the extent of a code block is defined by its indentation level (the PEP 8 style guide recommends four spaces per level). This means that we have to be careful to indent our code correctly, or else we will get an `IndentationError`, which is a kind of syntax error. 
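The error can be reproduced without breaking a notebook cell by compiling a source string with the built-in `compile()`. A minimal sketch; the error message text differs slightly between Python versions, so only the exception type is relied upon:

```python
# Source code with the body of the `if` not indented
source = "if True:\nprint('not indented')\n"

caught = None
try:
    compile(source, "<example>", "exec")
except IndentationError as err:
    caught = err
    print(type(err).__name__, "on line", err.lineno)

# IndentationError is a subclass of SyntaxError
print(issubclass(IndentationError, SyntaxError))   # True
```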



### IF Example - Blocks by indentation

In [217]:
statement1 = statement2 = True

if statement1:
    if statement2:
        print("both statement1 and statement2 are True")


both statement1 and statement2 are True


In [218]:
# Bad indentation!
if statement1:
    if statement2:      
    print("both statement1 and statement2 are True")  # this line is not properly indented

IndentationError: expected an indented block (4166911315.py, line 4)

In [219]:
statement1 = False 

if statement1:
    print("printed if statement1 is True")
    
    print("still inside the if block")

In [220]:
if statement1:
    print("printed if statement1 is True")
    
print("now outside the if block")

now outside the if block


## Loops



In Python, loops can be programmed in a number of different ways. The most common is the `for` loop, which is used together with iterable objects, such as lists. The basic syntax is:


### `for` loops:

In [221]:
for x in [1,2,3]:
    print(x)
    
y=[1,2,3]
print(y)

1
2
3
[1, 2, 3]


### Counting `for` loops

The `for` loop iterates over the elements of the supplied list and executes the contained block once for each element. Any iterable object, not only a list, can be used in a `for` loop, for example a `range`:

In [222]:
for x in range(6): # by default range starts at 0
    print(x)

0
1
2
3
4
5


Note: `range(6)` stops before 6; the stop value is never included!

In [223]:
for x in range(-3,3):
    print(x)

-3
-2
-1
0
1
2


In [224]:
for word in ["data", "science", "with", "python"]:
    print(word)

data
science
with
python


### Iterating over Dictionaries

To iterate over key-value pairs of a dictionary:

In [225]:
for key, value in params.items():
    print(key + " = " + str(value))

parameter1 = A
parameter2 = B
parameter3 = 3.0
parameter4 = D


Sometimes it is useful to have access to the indices of the values when iterating over a list. We can use the `enumerate` function for this:

In [226]:
for idx, x in enumerate(range(-3,3)):
    print(idx, x)

0 -3
1 -2
2 -1
3 0
4 1
5 2


### List comprehensions: Creating lists using `for` loops

A convenient and compact way to initialize lists:

In [227]:
l1 = [x**2 for x in range(0,5)]

print(l1)

[0, 1, 4, 9, 16]


List comprehensions can be conditional (and hence very powerful especially when working with data structures!)

In [228]:
l1 = [x**2 for x in range(0,5) if x<3]

print(l1)

[0, 1, 4]
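To see what a comprehension does, it can help to write out the equivalent explicit loop. A short sketch confirming that both build the same list:

```python
# The conditional comprehension above ...
l1 = [x**2 for x in range(0, 5) if x < 3]

# ... builds the same list as this explicit loop
l2 = []
for x in range(0, 5):
    if x < 3:
        l2.append(x**2)

print(l1 == l2)   # True
```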


### `while` loops

In [229]:
i = 0

while i < 5:
    print(i)
    
    i = i + 1
    
print("done")

0
1
2
3
4
done


Note that the `print("done")` statement is not part of the `while` loop body because of the difference in indentation.

# Functions



A function in Python is defined using the keyword `def`, followed by a function name, a signature within parentheses `()`, and a colon `:`. The following code, with one additional level of indentation, is the function body.

In [230]:
def func0():   
    print("test")

In [231]:
func0()

test


## Documenting a Function


Optionally, but highly recommended, we can define a so-called "docstring", which is a description of the function's purpose and behavior. The docstring should follow directly after the function definition, before the code in the function body.

In [232]:
def func1(s):
    """
    Print a string 's' and tell how many characters it has 
    """
    
    print(s + " has " + str(len(s)) + " characters")

In [233]:
help(func1)

Help on function func1 in module __main__:

func1(s)
    Print a string 's' and tell how many characters it has



In [234]:
func1("test")

test has 4 characters


##  Returning values



Functions that return a value use the `return` keyword; the returned value can be __any object, including another function__:

In [235]:
def square(x):
    """
    Return the square of x.
    """
    return x ** 2

In [236]:
square(4)

16
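Two related cases, sketched with hypothetical function names: a function without a `return` statement returns `None`, and since functions are objects, a function can build and return another function.

```python
def no_return(x):
    """A function without a return statement."""
    x ** 2          # computed, but the result is discarded

print(no_return(4))  # None

def make_power(p):
    """Return a new function that raises its argument to the power p."""
    def power(x):
        return x ** p
    return power

cube = make_power(3)
print(cube(2))       # 8
```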

### Returning Multiple Values
We can return multiple values from a function using tuples (see above):

In [237]:
def powers(x):
    """
    Return a few powers of x.
    """
    return x ** 2, x ** 3, x ** 4

In [238]:
h=powers(3)
print(type(h)) 
print(h)
print(h[0])

<class 'tuple'>
(9, 27, 81)
9


In [239]:
x2, x3, x4 = powers(3)

print(x3)

27


## Default argument and keyword arguments



In a definition of a function, we can give default values to the arguments the function takes:

In [240]:
def myfunc(x, p=2, debug=False):
    if debug:
        print("evaluating myfunc for x = " + str(x) + " using exponent p = " + str(p))
    return x

If we don't provide a value for the `debug` argument when calling the function `myfunc`, it defaults to the value provided in the function definition:

In [241]:
myfunc(5)

5

In [242]:
myfunc(5, debug=True, p="jadfs")

evaluating myfunc for x = 5 using exponent p = jadfs


5

### Keyword Argument
If we explicitly list the name of the arguments in the function calls, they do not need to come in the same order as in the function definition.

These are called *keyword* arguments, and they are often very useful in functions that take a lot of optional arguments.

In [243]:
myfunc(p=3, debug=True, x=7)

evaluating myfunc for x = 7 using exponent p = 3


7
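One caveat about default values: they are evaluated once, when the function is defined, so a mutable default such as a list is shared between calls. A sketch of the pitfall and the usual `None`-sentinel fix (both functions are hypothetical examples):

```python
def append_bad(item, items=[]):
    # The default list is created once, when the function is defined,
    # so every call without `items` shares (and mutates) the same list
    items.append(item)
    return items

def append_good(item, items=None):
    # Use None as a sentinel and create a fresh list inside the function
    if items is None:
        items = []
    items.append(item)
    return items

print(append_bad(1), append_bad(2))    # [1, 2] [1, 2]
print(append_good(1), append_good(2))  # [1] [2]
```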

## Unnamed functions (lambda function)



In Python we can also create unnamed functions, using the `lambda` keyword:

In [244]:
f1 = lambda x: x**2
    
# is equivalent to 

def f2(x):
    return x**2

In [245]:
f1(2), f2(2)

(4, 4)

### (Lambda) Functions as Argument

This technique is useful, for example, when we want to pass a simple function as an argument to another function, like this:

In [246]:
# map is a built-in python function
map(lambda x: x**2, range(-3,4))

<map at 0x267b8f87bb0>

In [247]:
# in python 3 we can use `list(...)` to convert the iterator to an explicit list
list(map(lambda x: x**2, range(-3,4)))

[9, 4, 1, 0, 1, 4, 9]
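Lambdas are also handy as the `key` argument of `sorted()` or as the predicate of the built-in `filter()`. A short sketch with made-up data:

```python
points = [(3, "c"), (1, "b"), (2, "a")]

# sorted() accepts a key function; the lambda selects the second tuple element
by_label = sorted(points, key=lambda p: p[1])
print(by_label)    # [(2, 'a'), (1, 'b'), (3, 'c')]

# filter() keeps the elements for which the lambda returns True
evens = list(filter(lambda x: x % 2 == 0, range(10)))
print(evens)       # [0, 2, 4, 6, 8]
```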

# Classes



Classes are a key feature of object-oriented programming.

A class is a structure for representing an object and the operations that can be performed on the object. 

A class can contain *attributes* (variables) and *methods* (functions).

A class is defined using the `class` keyword followed by a number of method definitions (functions defined inside the class).

* Each method should have **an argument `self`** as its first argument. It refers to the instance the method is called on (a self-reference).

* Some method names have special meaning, for example:

    * `__init__`: The name of the method that is invoked when the object is first created.
    * `__str__` : A method that is invoked when a simple string representation of the object is needed, as for example when printed.
    * There are many more, see http://docs.python.org/2/reference/datamodel.html#special-method-names

In [248]:
class Point:
    """
    Simple class for representing a point in a Cartesian coordinate system.
    """
    
    def __init__(self, x, y):
        """
        Create a new Point at x, y.
        """
        self.x = x
        self.y = y
        
    def translate(self, dx, dy):
        """
        Translate the point by dx and dy in the x and y direction.
        """
        self.x += dx
        self.y += dy
        
    def __str__(self):
        return("Point at [%f, %f]" % (self.x, self.y))

## Instance Creation



To create a new instance of a class:

In [249]:
p1 = Point(0, 0) # this will invoke the __init__ method in the Point class

print(p1)         # this will invoke the __str__ method

Point at [0.000000, 0.000000]


To invoke a method on a class instance, such as `p1`:

In [250]:
# An identical call to the one above
p2 = Point(1, 1)

# Call the translate method on p1, originally at (0, 0)
p1.translate(0.25, 1.5)

print(p1)
print(p2)

Point at [0.250000, 1.500000]
Point at [1.000000, 1.000000]


Note that calling `translate` modifies the state of that particular instance (`p1`), but does not affect other instances (`p2`) or any global variables.

That is one of the nice things about object-oriented design: code such as functions and related variables are grouped in separate and independent entities. 

# Creating Modules


One of the most important concepts in good programming is to reuse code and avoid repetitions.

The idea is to write functions and classes with a well-defined purpose and scope, and reuse these instead of repeating similar code in different parts of a program (modular programming). The result is usually that the readability and maintainability of a program are greatly improved. In practice, this means that our programs tend to have fewer bugs and are easier to extend and debug.

Python supports modular programming at different levels. Functions and classes are examples of tools for low-level modular programming. Python modules are a higher-level modular programming construct, where we can collect related variables, functions and classes in a module. A python module is defined in a python file (with file-ending `.py`), and it can be made accessible to other Python modules and programs using the `import` statement. 

Consider the following example: the file `mymodule.py` contains simple example implementations of a variable, function and a class:

In [251]:
%%file mymodule.py
"""
Example of a python module. Contains a variable called my_variable,
a function called my_function, and a class called MyClass.
This code is written to mymodule.py by the %%file magic
"""

my_variable = 0

def my_function():
    """
    Example function
    """
    return my_variable
    
class MyClass:
    """
    Example class.
    """

    def __init__(self):
        self.variable = my_variable
        
    def set_variable(self, new_value):
        """
        Set self.variable to a new value
        """
        self.variable = new_value
        
    def get_variable(self):
        return self.variable

Overwriting mymodule.py
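The module can then be imported like any other and its contents accessed through its namespace. In the notebook, where `%%file` has already written `mymodule.py` to the working directory, `import mymodule` is all that is needed; the temporary-directory setup at the top of this sketch is only an assumption to make it runnable on its own, and it writes a trimmed copy of the module without docstrings.

```python
import sys
import tempfile
from pathlib import Path

# Setup for running outside the notebook: write a trimmed mymodule.py
# into a fresh temporary folder and make that folder importable
module_dir = Path(tempfile.mkdtemp())
(module_dir / "mymodule.py").write_text(
    "my_variable = 0\n"
    "\n"
    "def my_function():\n"
    "    return my_variable\n"
    "\n"
    "class MyClass:\n"
    "    def __init__(self):\n"
    "        self.variable = my_variable\n"
    "    def set_variable(self, new_value):\n"
    "        self.variable = new_value\n"
    "    def get_variable(self):\n"
    "        return self.variable\n"
)
sys.path.insert(0, str(module_dir))

# The actual usage: import the module and access its contents via its namespace
import mymodule

print(mymodule.my_variable)       # 0
print(mymodule.my_function())     # 0

obj = mymodule.MyClass()
obj.set_variable(10)
print(obj.get_variable())         # 10
```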


# Exceptions



In Python, errors are managed with a special language construct called "exceptions". When an error occurs, an exception is raised, which interrupts the normal program flow and transfers control to the closest enclosing `try`-`except` statement that handles it.


##  Generating Exceptions



To generate an exception we can use the `raise` statement, which takes an argument that must be an instance of the class `BaseException` or of a class derived from it (usually `Exception` or one of its subclasses). 

In [252]:
raise Exception("description of the error")

Exception: description of the error

A typical use of exceptions is to abort functions when some error condition occurs, for example:

    def my_function(arguments):
    
        if not verify(arguments):
            raise Exception("Invalid arguments")
        
        # rest of the code goes here
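A runnable variant of this pattern, with a concrete check in place of `verify`. `safe_sqrt` is a hypothetical function, and `ValueError` (a built-in subclass of `Exception`) is used because it signals an inappropriate argument value more precisely than a generic `Exception`:

```python
import math

def safe_sqrt(x):
    """Hypothetical example: square root that rejects negative input."""
    if x < 0:
        raise ValueError("safe_sqrt expects a non-negative number, got " + str(x))
    return math.sqrt(x)

print(safe_sqrt(9))   # 3.0

try:
    safe_sqrt(-1)
except ValueError as err:
    print("Caught:", err)   # Caught: safe_sqrt expects a non-negative number, got -1
```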

## Catching Exception

To gracefully catch errors that are generated by functions and class methods, or by the Python interpreter itself, use the `try` and `except` statements:

    try:
        # normal code goes here
    except:
        # code for error handling goes here
        # this code is not executed unless the code
        # above generated an error

For example:

In [253]:
try:
    print("test")
    
    # generate an error: the variable test is not defined
    print(test)
    
except:
    print("Caught an exception")

test
Caught an exception
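A bare `except:` catches every exception, including `KeyboardInterrupt`, which can hide bugs. It is usually better to name the exceptions you expect; `else` runs only if no exception occurred, and `finally` always runs. A sketch with a hypothetical `divide` helper:

```python
def divide(a, b):
    """Hypothetical helper showing specific except clauses, else and finally."""
    try:
        result = a / b
    except ZeroDivisionError:
        print("cannot divide by zero")
        result = None
    except TypeError as err:
        print("wrong argument types:", err)
        result = None
    else:
        print("division succeeded")   # runs only if no exception occurred
    finally:
        print("done")                 # always runs
    return result

print(divide(6, 3))   # prints: division succeeded / done / 2.0
print(divide(6, 0))   # prints: cannot divide by zero / done / None
```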


## Message of an Exception




To get information about the error, we can access the `Exception` class instance that describes the exception by using for example:

    except Exception as e:

In [254]:
try:
    print("test")
    # generate an error: the variable test is not defined
    print(test)
except Exception as e:
    print("Caught an exception:" + str(e))

test
Caught an exception:name 'test' is not defined


To print the stack trace of an exception, we can use the `traceback` module:

In [255]:
import traceback
def tb_test():
    try:
        print("test")
        # generate an error: the variable test is not defined
        print(test)
    except Exception as e:
        print("Caught an exception:" + str(e))
        print("And here comes the Traceback:")
        traceback.print_exc()

tb_test()

test
Caught an exception:name 'test' is not defined
And here comes the Traceback:


Traceback (most recent call last):
  File "C:\Users\David\AppData\Local\Temp\ipykernel_27432\2505802241.py", line 6, in tb_test
    print(test)
NameError: name 'test' is not defined


## NumPy (from Stanford's CS224N)
NumPy is a Python library that adds support for large, multi-dimensional arrays and matrices, along with a large collection of optimized, high-level mathematical functions to operate on these arrays.

You may need to install NumPy before importing it in the next cell.

There are many ways to manage your packages, but the workflow we suggest for this class is to use Anaconda.
 - Download and install Anaconda, and create a new conda environment for each new project.
 - Activate your conda environment and install libraries with conda, falling back to pip for packages that are not available in conda.
 - If you run scripts from the command line, run them inside your activated conda environment.
 - If you are using a Jupyter notebook, add your conda environment to Jupyter as a kernel: https://towardsdatascience.com/get-your-conda-environment-to-show-in-jupyter-notebooks-the-easy-way-17010b76e874. Then create your notebook and verify that it is running on your conda environment's kernel (the top right of the notebook should display its name). If it is not, open the Kernel tab on the top left and click Change kernel to switch to your conda environment's kernel.

In [258]:
# Import numpy
import numpy as np

In [259]:
# Create numpy arrays from lists
x = np.array([1,2,3])
a = np.array([[1,2,3]])


y = np.array([[3,4,5]])
z = np.array([[6,7],[8,9]])

# Let's take a look at their shapes.
# When working with numpy arrays, .shape will be a very useful debugging tool
print(x.shape)
print(y.shape)
print()
print(z)
print(z.shape)

(3,)
(1, 3)

[[6 7]
 [8 9]]
(2, 2)


Vectors can be represented as 1-D arrays of shape (N,) or as 2-D arrays of shape (N, 1) or (1, N). Note, however, that the shapes (N,), (N, 1), and (1, N) are not the same and can result in different behavior (we'll see some examples below involving matrix multiplication and broadcasting).

Matrices are generally represented as 2-D arrays of shape (M, N).

The best way to ensure your code gives you the behavior you expect is to keep track of your array shapes and try out small test cases or refer back to documentation when you are unsure.
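
As a small test case in that spirit, the sketch below builds the same three numbers as a 1-D vector, a column vector and a row vector, and shows that matrix multiplication and broadcasting treat them differently. The variable names are arbitrary.

```python
import numpy as np

v1 = np.array([1, 2, 3])  # shape (3,):   1-D vector
col = v1.reshape((3, 1))  # shape (3, 1): column vector
row = v1.reshape((1, 3))  # shape (1, 3): row vector

# Same numbers, different products
print((v1 @ v1).shape)    # (): inner product, a scalar
print((row @ col).shape)  # (1, 1): still a 2-D array
print((col @ row).shape)  # (3, 3): outer product

# Broadcasting a (3,) array against a (3, 1) array gives a 3x3 result
print((v1 + col).shape)   # (3, 3)
```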

In [260]:
a = np.arange(10)
b = a.reshape((5, 2))
print(a)
print()
print(b)

[0 1 2 3 4 5 6 7 8 9]

[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
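
`reshape` requires the total number of elements to stay the same. The short sketch below shows two related behaviors: passing `-1` for one dimension lets NumPy infer it from the others, and an incompatible target shape raises a `ValueError`.

```python
import numpy as np

a = np.arange(10)

# -1 asks NumPy to infer that dimension from the total size (10 / 2 = 5)
b = a.reshape((-1, 2))
print(b.shape)  # (5, 2)

# 3 * 4 = 12 elements cannot hold the 10 elements of a
try:
    a.reshape((3, 4))
except ValueError as e:
    print("Cannot reshape:", e)
```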


### Array Operations

Many NumPy operations can reduce an array along a given axis.

Let's look at the np.max operation (documentation: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.max.html).

In [261]:
x = np.array([[1,2],[3,4], [5, 6]])
print(x)
print()
print(x.shape)

[[1 2]
 [3 4]
 [5 6]]

(3, 2)


In [262]:
print(np.max(x, axis = 1))

[2 4 6]


In [263]:
print(np.max(x, axis = 1).shape)

(3,)


In [264]:
print(np.max(x, axis = 1, keepdims = True))

[[2]
 [4]
 [6]]


In [265]:
print(np.max(x, axis = 1, keepdims = True).shape)

(3, 1)
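
To round out the reduction examples, the sketch below reduces along `axis=0` instead (one result per column) and shows why `keepdims=True` is handy: the (3, 1) result broadcasts back against `x`, whereas the (3,) result does not. Broadcasting itself is covered in more detail below.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])

# axis=0 reduces over the rows, leaving one value per column
print(np.max(x, axis=0))  # [5 6]
print(np.sum(x, axis=0))  # [ 9 12]

# keepdims=True keeps shape (3, 1), which broadcasts against x's (3, 2)
row_max = np.max(x, axis=1, keepdims=True)
print(x - row_max)
# [[-1  0]
#  [-1  0]
#  [-1  0]]

# Without keepdims the (3,) result is aligned with x's last axis (2): error
try:
    x - np.max(x, axis=1)
except ValueError as e:
    print("Shape mismatch:", e)
```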


Next, let's look at some matrix operations, starting with the element-wise product (Hadamard product), written `A * B`.

In [266]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[3, 3], [3, 3]])
print(A)
print(B)
print("---")
print(A * B)

[[1 2]
 [3 4]]
[[3 3]
 [3 3]]
---
[[ 3  6]
 [ 9 12]]


We can do matrix multiplication with np.matmul or @.

In [267]:
# One way to do matrix multiplication
print(np.matmul(A, B))

# Another way to do matrix multiplication
print(A @ B)

[[ 9  9]
 [21 21]]
[[ 9  9]
 [21 21]]
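
To make the difference between `*` and `@` concrete, the sketch below computes the matrix product with explicit loops, using the definition C[i, j] = sum over k of A[i, k] * B[k, j], and checks the result against `@`. The helper `matmul_loops` is hypothetical and only meant for illustration; as the efficiency section below shows, loops like these are much slower than `@`.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[3, 3], [3, 3]])


def matmul_loops(A, B):
    """Naive matrix product: C[i, j] = sum over k of A[i, k] * B[k, j]."""
    n, m = A.shape
    m2, p = B.shape
    if m != m2:
        raise ValueError("inner dimensions must match")
    C = np.zeros((n, p), dtype=np.result_type(A, B))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i, j] += A[i, k] * B[k, j]
    return C


print(matmul_loops(A, B))
# [[ 9  9]
#  [21 21]]
print(np.array_equal(matmul_loops(A, B), A @ B))  # True
print(np.array_equal(A * B, A @ B))               # False: different operations
```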


We can take the dot product or a matrix-vector product with `np.dot`.

In [268]:
u = np.array([1, 2, 3])
v = np.array([1, 10, 100])

print(np.dot(u, v))

# Many NumPy functions are also available as array methods, which is useful for chaining operations
print(u.dot(v))

321
321
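
The dot product of two 1-D arrays is the sum of their element-wise products, here 1*1 + 2*10 + 3*100 = 321. A quick check of that arithmetic:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([1, 10, 100])

# Sum of element-wise products: 1*1 + 2*10 + 3*100 = 321
print(np.sum(u * v))                   # 321
print(np.dot(u, v) == np.sum(u * v))   # True

# For 1-D arrays, the @ operator also computes the inner product
print(u @ v)                           # 321
```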


In [269]:
W = np.array([[1, 2], [3, 4], [5, 6]])
print(v.shape)
print(W.shape)

# This works.
print(np.dot(v, W))
print(np.dot(v, W).shape)

(3,)
(3, 2)
[531 642]
(2,)


In [270]:
# This does not. Why?
print(np.dot(W, v))

ValueError: shapes (3,2) and (3,) not aligned: 2 (dim 1) != 3 (dim 0)

In [271]:
# We can fix the above issue by transposing W.
print(np.dot(W.T, v))
print(np.dot(W.T, v).shape)

[531 642]
(2,)
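
The rule behind the error: `np.dot(v, W)` sums over the only axis of `v` (length 3) and the first axis of `W` (length 3), while `np.dot(W, v)` sums over the last axis of `W` (length 2) and `v` (length 3), which do not match. The sketch below checks that transposing gives the same result, and that a length-2 vector (a hypothetical `w`) can be multiplied on the right of `W`.

```python
import numpy as np

v = np.array([1, 10, 100])
W = np.array([[1, 2], [3, 4], [5, 6]])

# v's axis (3) pairs with W's first axis (3); W.T @ v pairs the same axes
print(np.array_equal(np.dot(v, W), np.dot(W.T, v)))  # True

# W's last axis has length 2, so a length-2 vector works on the right
w = np.array([1, 1])
print(np.dot(W, w))  # [ 3  7 11]
```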


###  Indexing

Slicing and indexing NumPy arrays is an extension of the Python concept of slicing (lists) to N dimensions.

In [272]:
x = np.random.random((3, 4))

# Selects all of x
print(x[:])

[[0.45715601 0.31469443 0.58602923 0.95297734]
 [0.64142775 0.21401485 0.41842489 0.42772252]
 [0.58312784 0.34786316 0.94355481 0.45647013]]


In [273]:
# Selects the 0th and 2nd rows
print(x[np.array([0, 2]), :])

print("---")

# Selects row 1 as a 1-D vector, keeping only columns 1 and 2 (the slice 1:3 excludes 3)
print(x[1, 1:3])

[[0.45715601 0.31469443 0.58602923 0.95297734]
 [0.58312784 0.34786316 0.94355481 0.45647013]]
---
[0.21401485 0.41842489]


In [274]:
# Boolean indexing
print(x[x > 0.5])

[0.58602923 0.95297734 0.64142775 0.58312784 0.94355481]


In [275]:
# np.newaxis adds a trailing axis of length 1: 3-D array of shape (3, 4, 1)
print(x[:, :, np.newaxis])

[[[0.45715601]
  [0.31469443]
  [0.58602923]
  [0.95297734]]

 [[0.64142775]
  [0.21401485]
  [0.41842489]
  [0.42772252]]

 [[0.58312784]
  [0.34786316]
  [0.94355481]
  [0.45647013]]]
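
Because the examples above use random values, the sketch below repeats them on a small array with predictable contents. It also shows a distinction worth knowing: basic slices return views that share memory with the original array, whereas integer-array ("fancy") and boolean indexing return copies.

```python
import numpy as np

# A small array with predictable values instead of random ones
x = np.arange(12).reshape((3, 4))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(x[[0, 2], :])               # rows 0 and 2
print(x[1, 1:3])                  # [5 6]
print(x[x > 6])                   # [ 7  8  9 10 11]
print(x[:, :, np.newaxis].shape)  # (3, 4, 1)
print(x[:, :, None].shape)        # (3, 4, 1): None is an alias for np.newaxis

# Basic slices are views: writing to s also changes x
s = x[1, 1:3]
s[0] = -1
print(x[1, 1])                    # -1

# Fancy indexing returns a copy: writing to f leaves x unchanged
f = x[[0, 2], :]
f[0, 0] = 100
print(x[0, 0])                    # 0
```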


### Broadcasting

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations.

**General Broadcasting Rules**

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left; if one array has fewer dimensions, its missing leading dimensions are treated as 1. Two dimensions are compatible when:
- they are equal, or
- one of them is 1 (in which case the elements along that axis are repeated to match the other dimension)

More details: https://numpy.org/doc/stable/user/basics.broadcasting.html
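
The rules above can be written as a short function. `broadcast_shape` below is a hypothetical helper for illustration, cross-checked against NumPy's own `np.broadcast_shapes` (available in NumPy 1.20 and later).

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper applying the broadcasting rules to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Missing leading dimensions are treated as 1
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    # Each position is checked independently, so walking left to right
    # gives the same answer as starting from the trailing dimension
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"incompatible dimensions {dim_a} and {dim_b}")
    return tuple(result)


print(broadcast_shape((3, 4), (3, 1)))  # (3, 4)
print(broadcast_shape((3, 1), (1, 3)))  # (3, 3)
print(broadcast_shape((3, 1), (3,)))    # (3, 3)

# Cross-check against NumPy's implementation
print(broadcast_shape((3, 1), (1, 4)) == np.broadcast_shapes((3, 1), (1, 4)))  # True
```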

In [276]:
x = np.random.random((3, 4))

y = np.random.random((3, 1))
z = np.random.random((1, 4))

# In this example, y and z are broadcast to match the shape of x.
# y, of shape (3, 1), is broadcast along dim 1.
s = x + y
# z, of shape (1, 4), is broadcast along dim 0.
p = x * z

In [277]:
print(x.shape)
print()
print(y.shape)
print(s.shape)

(3, 4)

(3, 1)
(3, 4)


In [278]:
print(x.shape)
print()
print(s.shape)
print(p.shape)

(3, 4)

(3, 4)
(3, 4)


In [279]:
a = np.zeros((3, 3))
b = np.array([[1, 2, 3]])
print(a)
print()
print(a+b)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

[[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]
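
Broadcasting gives the same result as explicitly repeating `b` three times along axis 0 (for example with `np.tile`), but NumPy does not actually copy the data. The sketch below checks the equivalence and uses `np.broadcast_to` to show that the repeated rows are a read-only view whose stride along axis 0 is 0.

```python
import numpy as np

a = np.zeros((3, 3))
b = np.array([[1, 2, 3]])

# Broadcasting behaves as if b were repeated 3 times along axis 0
tiled = np.tile(b, (3, 1))
print(np.array_equal(a + b, a + tiled))  # True

# np.broadcast_to returns a read-only view instead of copying the data
view = np.broadcast_to(b, (3, 3))
print(view.shape)           # (3, 3)
print(view.strides[0])      # 0: every row refers to the same memory
print(view.flags.writeable)  # False
```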


Let's look at a more complex example.

In [280]:
a = np.random.random((3, 4))
b = np.random.random((3, 1))
c = np.random.random((3, ))

What is the expected broadcasting behavior for these operations? What do the following operations give us? What are the resulting shapes?

In [281]:
result1 = b + b.T

print(b.shape)
print(b.T.shape)
print(result1.shape)
print(result1)

(3, 1)
(1, 3)
(3, 3)
[[0.40745787 1.11478997 0.21054352]
 [1.11478997 1.82212207 0.91787562]
 [0.21054352 0.91787562 0.01362918]]


In [282]:
result2 = a + c

print(a.shape)
print(c.shape)
print(result2.shape)
print(result2)

ValueError: operands could not be broadcast together with shapes (3,4) (3,) 
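
The error arises because the only axis of `c` (length 3) is aligned with the last axis of `a` (length 4). One way to add `c[i]` to every element of row `i` is to give `c` a trailing axis of length 1, as sketched below.

```python
import numpy as np

a = np.random.random((3, 4))
c = np.random.random((3,))

# c[:, np.newaxis] has shape (3, 1), which aligns with a's first axis
result2_fixed = a + c[:, np.newaxis]
print(result2_fixed.shape)  # (3, 4)

# Every element of row i has c[i] added to it
print(np.allclose(result2_fixed[1], a[1] + c[1]))  # True
```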

In [283]:
result3 = b + c

print(b.shape)
print(c.shape)
print(result3.shape)
print(result3)

(3, 1)
(3,)
(3, 3)
[[0.97572639 1.04263807 0.36947619]
 [1.68305849 1.74997017 1.07680829]
 [0.77881204 0.84572372 0.17256185]]


### Efficient NumPy Code

When working with NumPy arrays, avoid explicit Python for-loops over indices or axes whenever an equivalent vectorized NumPy operation exists. Python-level loops are typically much slower, often by a factor of roughly 10-100x.

We can time a cell with the `%%timeit` magic. Let's compare an explicit for-loop with the equivalent NumPy operation.

In [284]:
%%timeit
x = np.random.rand(1000, 1000)
for i in range(100, 1000):
    for j in range(x.shape[1]):
        x[i, j] += 5

319 ms ± 32.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [285]:
%%timeit
x = np.random.rand(1000, 1000)
x[np.arange(100,1000), :] += 5

13.8 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
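
`%%timeit` is an IPython magic and does not work in a plain Python script. A rough script equivalent using the standard-library `timeit` module is sketched below. The function names are hypothetical. The vectorized version uses the basic slice `x[100:, :]`, which selects the same rows as `np.arange(100, 1000)` for a 1000-row array without building an index array. Timings will vary by machine.

```python
import timeit

import numpy as np


def add_with_loops(x):
    # Explicit Python loops over rows 100..999 and all columns
    for i in range(100, x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += 5
    return x


def add_vectorized(x):
    # One NumPy operation on the same rows
    x[100:, :] += 5
    return x


base = np.random.rand(1000, 1000)

# Both versions should produce identical results
assert np.array_equal(add_with_loops(base.copy()), add_vectorized(base.copy()))

runs = 3
loop_time = timeit.timeit(lambda: add_with_loops(base.copy()), number=runs) / runs
vec_time = timeit.timeit(lambda: add_vectorized(base.copy()), number=runs) / runs
print(f"loops: {loop_time * 1e3:.1f} ms, vectorized: {vec_time * 1e3:.1f} ms")
```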


# Further reading

* http://www.python.org - The official web page of the Python programming language.
* http://www.python.org/dev/peps/pep-0008 - Style guide for Python programming. Highly recommended. 
* https://greenteapress.com/wp/think-python-2e/ - A free book on Python programming.

### Versions

In [256]:
import sys
import IPython

In [257]:
print("This notebook was evaluated with: Python %s and IPython %s." % (sys.version, IPython.__version__))

This notebook was evaluated with: Python 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)] and IPython 8.5.0.
