# General Introduction to Important Python Features

### FTAG algo tutorial on good code practices, 14.04.2022

### Manuel Guth

with Material from GRK python workshop and RODEM good practices mini-workshop

# Overview

* [New Features in Python 3](#New-Features-in-Python-3)
* [Generators](#Generators)
* [Type hinting / Type declaration](#Type-hinting-/-Type-declaration)
* [Logging](#Logging)
* [argparse](#Command-line-options----argparse)
* [What not to do](#What-NOT-to-do)
* [Debugger](#Debugger-PDB)
* [Code formatting & Linting](#Code-formatting-&-Linting)

Auxiliary Material (for which we don't have enough time)
* [Object-Oriented Programming](#Object-Oriented-Programming-(OOP))


reusing some material from https://indico.cern.ch/event/846501


# New Features in Python 3

* Python 2 has been end-of-life since the beginning of 2020
* Python 3 has already reached minor release 3.10 (3.x)
* For all changes have a look at [What's New in Python](https://docs.python.org/3/whatsnew/index.html)
* [Cheat Sheet: Writing Python 2-3 compatible code](http://python-future.org/compatible_idioms.html)

Are your libraries ready for Python 3.10? [Have a look](https://pyreadiness.org/3.10/)

## f-Strings

#### String formatting before Python 3.6

In [1]:
import math
ftag = 2_022
where = "online"

In [4]:
message = "Welcome to the FTAG Algo {} Good practice tutorial {}!\nWe can round pi to {:.2f}".format(ftag, where, math.pi)

In [5]:
print(message)

Welcome to the FTAG Algo 2022 Good practice tutorial online!
We can round pi to 3.14


#### String formatting with f-String

In [6]:
message_f = f"Welcome to the FTAG Algo {ftag} Python mini workshop {where}!\nWe can round pi to {math.pi:.2f}."

In [7]:
print(message_f)

Welcome to the FTAG Algo 2022 Python mini workshop online!
We can round pi to 3.14.


## True Division

#### Python 2

3/4 returned 0


#### Python 3

In [None]:
3/4

In Python 3 the `/` operator does not lose the fractional part.

Integer (floor) division has its own operator, `//`:

In [None]:
3//4
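
A short sketch of how the two operators behave, including the rounding direction of `//` for negative numbers. The values are purely illustrative:

```python
# True division always returns a float, even for integer operands.
true_div = 3 / 4

# Floor division rounds towards negative infinity, not towards zero.
floor_pos = 3 // 4
floor_neg = -3 // 4

# divmod() returns the floor-division quotient and the remainder together.
quotient, remainder = divmod(7, 2)

print(true_div, floor_pos, floor_neg, quotient, remainder)
```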

# Dictionary operators

New Merge (|) and update (|=) operators for dictionaries

In [8]:
dict1 = {"key1": "CERN", "key2": "DESY"}
dict2 = {"key2": "CH", "key3": "DE"}

In [9]:
dict1 | dict2

{'key1': 'CERN', 'key2': 'CH', 'key3': 'DE'}

In [10]:
dict2 | dict1

{'key2': 'DESY', 'key3': 'DE', 'key1': 'CERN'}

In [11]:
dict2 |= dict1

In [12]:
dict2

{'key2': 'DESY', 'key3': 'DE', 'key1': 'CERN'}

- available since python 3.9

# Parenthesised context managers

In [None]:
with (
    CtxManager1() as example1,
    CtxManager2() as example2,
    CtxManager3() as example3,
):
    ...

- available since python 3.10

# Structural Pattern Matching

In [None]:
def http_error(status):
    match status:
        case 400:
            return "Bad request"
        case 404:
            return "Not found"
        case 418:
            return "I'm a teapot"
        case _:
            return "Something's wrong with the internet"

In [None]:
http_error(418)

- available since python 3.10

# Pairwise function itertools

In [None]:
from itertools import pairwise
words = ["good", "morning", "routine"]
for w1, w2 in pairwise(words):
    print(w1, w2)

- available since python 3.10
- useful e.g. when looping over indices for batches
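
The batch use case could look roughly like the sketch below. The data, the batch size and the fallback for interpreters older than 3.10 are assumptions made for illustration:

```python
import sys

if sys.version_info >= (3, 10):
    from itertools import pairwise
else:
    # Fallback for older interpreters, equivalent for this purpose.
    from itertools import tee

    def pairwise(iterable):
        first, second = tee(iterable)
        next(second, None)
        return zip(first, second)

data = list(range(10))  # illustrative data
batch_size = 4          # illustrative value

# Batch boundaries 0, 4, 8, ... plus the end of the data.
edges = list(range(0, len(data), batch_size)) + [len(data)]

# Each consecutive pair of edges delimits one batch.
batches = [data[start:stop] for start, stop in pairwise(edges)]
print(batches)
```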

# Generators

In [None]:
def squares(end):
    """
    Returns the squares of 0 up to (not including) the given end.
    >>> squares(3)
    [0, 1, 4]
    """
    out = []
    for i in range(end):
        out.append(i * i)
    return out

In [None]:
squares(3)

This is a typical pattern:

 1. Create empty list
 2. Append items in loop
 3. Return final list

## Problematic when dealing with huge lists

In [None]:
small_list = squares(10)  # Returns list of 10 items
sum(small_list)

In [None]:
large_list = squares(1_000_000)  # Returns a list with 1 million items
                                 # Calling it with 1 billion exhausts my computer's memory
sum(large_list)

In this example
 - Don't need random access to items: `large_list[100]`
 - Need only to iterate over list once

# Solution: Generators

In [None]:
def squares(end):
    """
    Yields the squares of 0 up to (not including) the given end.
    >>> list(squares(3))
    [0, 1, 4]
    """
    # Old implementation:
    # out = []
    # for i in range(end):
    #    out.append(i * i)
    # return out
    for i in range(end):
        yield i * i  # yield one item at a time

In [None]:
squares(3)

In [None]:
list(squares(3))

In [None]:
sum(squares(1_000_000))  # Computes one item at a time
# Works even with 1 billion, takes ~2min
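
A small sketch of the lazy behaviour, re-declaring `squares` so that the block is self-contained. `itertools.islice` takes only the first few items, so the huge `end` value is never fully iterated:

```python
from itertools import islice


def squares(end):
    """Yield the squares of 0 up to (not including) the given end."""
    for i in range(end):
        yield i * i


# Only the first five items are ever computed, although end is huge.
first_five = list(islice(squares(10**12), 5))
print(first_five)

# A generator expression is an inline alternative to a generator function.
total = sum(i * i for i in range(1_000))
print(total)
```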

## Exercise: Write a generator for a binary sequence

The generator function should take a `limit` parameter. Each item in the sequence is the previous value multiplied by `2`: $a_n = 2 \cdot a_{n-1}$. The sequence starts at 1 and should stop when the `limit` is reached.

In [None]:
from solutions import exp_seq
list(exp_seq(10))

In [None]:
sum(exp_seq(10))

In [None]:
sum(exp_seq(1000_1000))
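
The `solutions` module is not shown here. One possible implementation is sketched below, under the assumption that `limit` is an upper bound on the values and that the sequence stops before exceeding it. The actual solution may interpret `limit` differently.

```python
def exp_seq(limit):
    """Yield 1, 2, 4, ... while the value does not exceed limit (assumed meaning)."""
    value = 1
    while value <= limit:
        yield value
        value *= 2


values = list(exp_seq(10))
print(values)
print(sum(exp_seq(10)))
```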

# Type hinting / Type declaration

In [None]:
def multiply_values(val1, val2):
    """Multiplies two floats and returns result."""
    return f"Result: {val1 * val2}"

In [None]:
multiply_values(5.4, 1.2)

In [None]:
multiply_values(5, 2)

In [None]:
multiply_values(True, False)

Common case!
- Function intended to be used with floats
- Python doesn't forbid other types

### How to avoid that

Type hinting helps to remind yourself and other developers about your intentions

- Hinted types of arguments
- Hinted return type

In [None]:
def multiply_values(val1: float, val2: float) -> str:
    """Multiplies two floats and returns result."""
    return f"Result: {val1 * val2}"

The type hints can be queried at run time:

In [None]:
from typing import get_type_hints

In [None]:
get_type_hints(multiply_values)

### A few reminders

Type hints are just _hints_; they are not enforced at run time. This still works:

In [None]:
multiply_values(True, False)

In [None]:
get_type_hints(multiply_values)
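
A minimal sketch showing that the hints are stored as metadata and are not enforced. The explicit `isinstance` check in `multiply_values_checked` is a hypothetical addition for illustration and not part of the tutorial code. In practice a static checker such as `mypy` is the more common way to verify hints before running the program.

```python
from typing import get_type_hints


def multiply_values(val1: float, val2: float) -> str:
    """Multiplies two floats and returns result."""
    return f"Result: {val1 * val2}"


# The hints are plain metadata attached to the function.
hints = get_type_hints(multiply_values)
print(hints)

# Nothing is enforced: booleans are accepted without complaint.
unchecked = multiply_values(True, False)
print(unchecked)


def multiply_values_checked(val1: float, val2: float) -> str:
    """Hypothetical variant that rejects non-float arguments at run time."""
    for value in (val1, val2):
        if not isinstance(value, float):
            raise TypeError(f"expected float, got {type(value).__name__}")
    return multiply_values(val1, val2)
```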

> Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention. (PEP 484)

# Logging

The `logging` module defines functions and classes which implement a flexible event logging system for applications and libraries.
- Track the status of software at runtime 
- Can be output, stored to a file, etc.
- Can have different severity/importance levels
- Can have custom output format

### Logging levels


- `DEBUG` – detailed information, only for problem diagnosis
- `INFO` – confirmation that things are working as expected
- `WARNING` – something unexpected happened, or a problem may occur in the near future, but the software is still working as expected
- `ERROR` – more serious problem, some operation not executed
- `CRITICAL` – serious error, program itself might be compromised
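
The levels are plain integers, and a logger drops every record below its threshold. A minimal sketch, where the logger name `level_demo` is an arbitrary choice for this example:

```python
import logging

# The standard levels in increasing order of severity.
levels = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL]
print(levels)

demo_logger = logging.getLogger("level_demo")  # hypothetical logger name
demo_logger.setLevel(logging.WARNING)

# isEnabledFor() tells whether a record of that level would be processed.
info_enabled = demo_logger.isEnabledFor(logging.INFO)
error_enabled = demo_logger.isEnabledFor(logging.ERROR)
print(info_enabled, error_enabled)
```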

### Loggers, Handlers, Formatters

- Loggers: to expose the interface that applications use
- Handlers: to send the logs to the appropriate destination
- Formatters: to specify the log layout in the final output

In [None]:
import logging

In [None]:
logger = logging.getLogger()
logger.setLevel("INFO")

In [None]:
handler = logging.StreamHandler()

In [None]:
formatter = logging.Formatter(
    "%(asctime)s  %(funcName)s()  %(levelname)7s  %(message)s",
    "%H:%M:%S",  # date format, only used by %(asctime)s
)
handler.setFormatter(formatter)

In [None]:
logger.addHandler(handler)

### Simple example

In [None]:
import math
from typing import Optional


def floor(var: float) -> Optional[int]:
    """Floors a float; returns None for non-numeric input."""
    logger.info(f"called with argument var={var}.")
    if type(var) not in [float, int]:
        logger.error(
            f"called with var={var} which is neither float nor int."
            " Returned 'None' as I don't know what to do here."
        )
        return None
    if type(var) is not float:
        logger.warning(f"called with var={var} which is not a float.")

    # math.floor also rounds down for negative values; int() would truncate towards zero
    return math.floor(var)

In [None]:
floor(3.7)

In [None]:
floor(3)

In [None]:
floor("3")

### More things to be done with loggers – some ideas

- Multiple handlers, e.g. to:
  - send warning/error/critical to standard output
  - send info/warning/error/critical to a log file
  - ...
- Same or different formats for multiple handlers
- Make use of a command-line argument `--debug` to:
  - print everything down to debug level to standard output
  - use a different formatter that prints more info (e.g. module name + line number)
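
These ideas could be combined roughly as in the sketch below. The logger name `handler_demo` and the format strings are assumptions made for this example. An in-memory buffer stands in for the log file to keep the block self-contained; for a real file, `logging.FileHandler` with a file name of your choice would take its place.

```python
import argparse
import io
import logging

parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true", help="print debug messages")
args = parser.parse_args(["--debug"])  # explicit list instead of sys.argv, for the demo

logger = logging.getLogger("handler_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)  # let the handlers decide what to keep
logger.propagate = False        # avoid duplicate output via the root logger

# Handler 1: console, verbose only with --debug, with module name and line number.
console = logging.StreamHandler()
console.setLevel(logging.DEBUG if args.debug else logging.WARNING)
console.setFormatter(
    logging.Formatter("%(module)s:%(lineno)d  %(levelname)7s  %(message)s")
)

# Handler 2: stand-in for a log file, keeps INFO and above.
buffer = io.StringIO()
file_like = logging.StreamHandler(buffer)
file_like.setLevel(logging.INFO)
file_like.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger.addHandler(console)
logger.addHandler(file_like)

logger.debug("only on the console")
logger.info("console and buffer")
logger.error("console and buffer")

print(buffer.getvalue())
```

Setting the logger itself to `DEBUG` and filtering per handler keeps the two destinations independent of each other.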

# Command-line options -  [`argparse`](https://docs.python.org/3/library/argparse.html#module-argparse)

Command-line parsing module in the Python standard library

All sorts of configurations possible:
- Positional / Keyword
- Default values
- Keywords can be mandatory or optional
- Help messages

In [None]:
from argparse import ArgumentParser

In [None]:
parser = ArgumentParser()

### How to add arguments

In [None]:
parser.add_argument("number", type=float) # positional argument with type float

In [None]:
parser.add_argument(
    '-e',           # short-hand
    '--exponent',   # full name
    default=2,      # default value
    type=int,       # int type
)

In [None]:
parser.add_argument(
    "-v",                              # short-hand
    "--verbose",                       # full name
    help="increase output verbosity",  # help message
    action="store_true",               # true/false
)
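
Once the arguments are defined, `parse_args()` turns the command line into a namespace object. A minimal sketch, rebuilding the parser from above so that the block is self-contained. The `description` text and the argument values are illustrative:

```python
from argparse import ArgumentParser

parser = ArgumentParser(description="Raise a number to a power.")
parser.add_argument("number", type=float)
parser.add_argument("-e", "--exponent", default=2, type=int)
parser.add_argument("-v", "--verbose", action="store_true", help="increase output verbosity")

# Passing a list avoids reading sys.argv, which is convenient in a notebook.
args = parser.parse_args(["3", "-e", "3", "--verbose"])

result = args.number**args.exponent
if args.verbose:
    print(f"{args.number} to the power of {args.exponent} is {result}")
else:
    print(result)

# Omitted options fall back to their defaults.
defaults = parser.parse_args(["1.5"])
print(defaults.exponent, defaults.verbose)
```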

# What NOT to do


### Things you should avoid in Python

### Misusing default arguments in functions
You can define default values for function arguments:

In [17]:
def ftag_append(ftag_list=[]):  # ftag_list is optional with the default value []
    ftag_list.append("algo") # this line can cause problems!
    return ftag_list

In [29]:
ftag_append() 

['algo']
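
The problem only shows up on repeated calls: the default list is created once, when the function is defined, and is then shared between all calls. A minimal sketch:

```python
def ftag_append(ftag_list=[]):  # the default list is created once, at definition time
    ftag_list.append("algo")
    return ftag_list


first = ftag_append()
second = ftag_append()

# Both names refer to the same list object, which has been appended to twice.
print(second)
print(first is second)

# The shared default is visible on the function object itself.
print(ftag_append.__defaults__)
```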

Possible way out of it

In [26]:
def ftag_append(ftag_list=None):  # setting default value to None
    if ftag_list is None:
        ftag_list = []
    ftag_list.append("algo")
    return ftag_list

In [41]:
ftag_append()

['algo']

### Import Mistakes

#### Wildcard Import

In [None]:
from numpy import *

* Can cause name clashing
* Unnecessary import of unneeded functionalities

With Python 3, ROOT for example no longer allows wildcard imports:
```
from ROOT import *
```

#### Name conflicts with other libraries

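`email` is a module in the Python standard library:
```
from email.message import EmailMessage
```

A local file with the same name can shadow it:
```
%%writefile email.py
def GetMail():
    return "grk@physik.uni-freiburg.de"
```

`import email` may then pick up the local file instead of the standard-library module:
```
import email
email.GetMail()
```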

### Opening files

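A pattern often used to open files:
```
file = open("test.txt", "w")
# ... work with the file ...
file.close()
```
This syntax can cause issues, e.g. if an exception is raised before `file.close()`: the file is then never closed explicitly.

A safer way to open files:
```
with open("test.txt", "w") as file:
    ...  # work with the file; it is closed automatically when the block exits
```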
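
A minimal sketch of why the `with` form is safer: the file is closed even when the block raises. The temporary directory and the simulated `RuntimeError` are only there to make the example self-contained:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # temporary location for the demo

try:
    with open(path, "w") as file:
        file.write("first line\n")
        raise RuntimeError("something went wrong")  # simulated failure
except RuntimeError:
    pass

# The context manager closed the file although the block raised.
closed_after_error = file.closed
print(closed_after_error)

# What was written before the exception has been flushed to disk.
with open(path) as file:
    content = file.read()
print(content)
```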

## Mutable assignment errors - Dictionaries

We have a dictionary `a`:

In [42]:
a = {'1': "one", '2': 'two'}

Now we want a second dict with the same content while leaving the original one intact:

In [43]:
b = a

In [44]:
b

{'1': 'one', '2': 'two'}

In [45]:
b['3'] = "three"

In [46]:
a

{'1': 'one', '2': 'two', '3': 'three'}

#### What happened?
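Here `b = a` does not create a copy: `b` is just a second reference to the same dictionary object as `a`, so changes made through `b` are also visible through `a`.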

The same thing is happening for lists.

Possible way out:

In [None]:
# for dicts
b = a.copy()
# for lists
l = list(a.keys())
cp = l[:]
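
A short sketch of the difference between a second reference, a shallow copy and a deep copy. The nested `taggers` dictionary is hypothetical example data. Note that `.copy()` and `[:]` are shallow, so nested mutable values are still shared:

```python
import copy

a = {"1": "one", "2": "two"}

alias = a           # second name for the same object
shallow = a.copy()  # new dict holding the same values

alias["3"] = "three"
print(alias is a, shallow is a)
print(a)
print(shallow)

# A shallow copy still shares nested mutable values.
nested = {"taggers": ["tagger_a"]}  # hypothetical example data
shallow_nested = nested.copy()
deep_nested = copy.deepcopy(nested)

nested["taggers"].append("tagger_b")
print(shallow_nested["taggers"])  # sees the change
print(deep_nested["taggers"])     # independent of the original
```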

# Debugger PDB
Your program crashes or doesn't do what it should?

Debugging can be challenging
<img src="material/debugging.png" style="width: 80%; height: auto" />

## Example

In [None]:
from myproject import read_config, compute_all_results

config = read_config()
# ...
results = compute_all_results(config)  # lengthy computation
# ...
for result in results:
    if result == "tt":
        print("We have the answer!")
        break
else:
    print("This should not happen.")

### Debugging with `print()`
Add single print, rerun **whole** program

In [None]:
config = read_config()
# ...
results = compute_all_results(config)  # lengthy computation
# ...
print(results)  # Inspect the list of results
for result in results:
    if result == "tt":
        print("We have the answer!")
        break
else:
    print("This should not happen.")

- `tt` in results
- Why not detected in loop?

Add another print, rerun **whole** program **again**

In [None]:
config = read_config()
# ...
results = compute_all_results(config)  # lengthy computation
# ...
print(results)  # Inspect the list of results
for result in results:
    print(result)
    if result == "tt":
        print("We have the answer!")
        break
else:
    print("This should not happen.")

### Better: Using debugger
Insert `breakpoint()` (or `import pdb; pdb.set_trace()` before Python 3.7) and rerun whole program

In [None]:
config = read_config()
# ...
results = compute_all_results(config)  # lengthy computation
# ...
import pdb; pdb.set_trace()  # This works also before 3.7
for result in results:
    if result == "tt":
        print("We have the answer!")
        break
else:
    print("This should not happen.")

 - Trigger debugger
   - Add `breakpoint()` or `import pdb; pdb.set_trace()`
   - Run `python -m pdb your_program.py`
 - Command summary
   - `b [FILE:]LINE` adds a new **b**reakpoint
   - `c` **c**ontinue to next breakpoint
   - `n` run **n**ext statement
   - `s` **s**tep into method call
   - `u` move one level **u**p in the call stack (`d` moves back **d**own)
   - `cl [N]` clear breakpoints or breakpoint `N`
   - `q` **q**uit
   - `h` **h**elp

## Exercise:
Investigate the example below:

In [None]:
cities = set(["London", "Paris", "Bern"])  # Unordered collection

def get_new_cities():
    new_cities = []
    new_cities.append("Oslo")
    new_cities.append("Prague")
    return set(new_cities)

cities.union(get_new_cities())

print(cities)  # Does not include Oslo, Prague!

# Code formatting & Linting

Code formatter = runs over your code and applies styling changes

Linter = scans the code to flag:
 - Programming errors / invalid syntax
 - Suspicious constructs ("code that smells")
 - Stylistic errors (enforces common style within a team)
 
The combination of the two is extremely powerful!

## Linter example

Take the slightly modified example of the cities:
```python
# debug_exercise.py
cities = ["London", "Paris", "Bern"]

def get_nordic_cities():
    cities = []
    cities.append("Oslo")
    cities.append("Stockholm")
    return cities

nordic_cities = get_nordic_cities()

print(cities)  # Still contains London, Paris, Bern
```


```console
$ python -m pylint debug_exercise.py
example.py:1:0: C0114: Missing module docstring (missing-module-docstring)
example.py:6:4: W0621: Redefining name 'cities' from outer scope (line 3) (redefined-outer-name)
example.py:5:0: C0116: Missing function or method docstring (missing-function-docstring)

------------------------------------------------------------------
Your code has been rated at 6.25/10 (previous run: 6.25/10, +0.00)
```
 - `cities` redefined within the function
 - In this example the redefinition might be obvious and not a problem
 - But what if the code is much more complex? Shadowing is dangerous!
 - Linter would have given a hint of the problem already


## What to take away

- Code formatter, e.g. `black`, to have uniform code style
- Linters and style checkers, e.g. `pylint` or `flake8`, to cross-check syntax, constructs, etc.

The best is the combination of both! Ideal for pre-commit hooks & CI/CD:
- Pre-commit hooks – no "broken" commits:
  - code formatter 
  - style checker / linter
  - other safety nets, e.g. yaml syntax checker
- Continuous integration: linter + all actual code tests

# Auxiliary Material

Several Python concepts we could not cover in the tutorial

## Print Function

In [None]:
print("Hello world!")

In [None]:
print("Hello", "world", sep="-")

In [None]:
print('home', 'user', 'documents', sep='/')

In [None]:
print('', 'home', 'user', 'documents', sep='/')


In [None]:
print('Mercury', 'Venus', 'Earth', sep=', ', end=", ")
print('Mars', 'Jupiter', 'Saturn', sep=', ', end=', ')
print('Uranus', 'Neptune', 'Pluto', sep=', ')

### Writing to file

In [None]:
with open('file.txt', mode='w') as file_object:
    print('hello world', file=file_object)

In [None]:
!cat file.txt

## What is Object-Oriented Programming (OOP)
<img style="float: right; width: 30%" src="material/oop.svg" />


 - You've used it already:
 
     ```python
     "Hello World".lower()
     ```

     The string `"Hello World"` is an object of `str` class.
 - Class is a *blueprint* to create instances, called *objects*
 - Combines data and functions
 - Example: Particles in an experiment

In [None]:
class Particle:
    def __init__(self, mass, charge):
        self.mass = mass
        self.charge = charge

In [None]:
bert = Particle(125, 0)
bert.mass

In [None]:
class Particle:
    def __init__(self, mass, charge):
        # __init__() is called when new object is created.
        # First argument (self) is the new object
        self.mass = mass
        self.charge = charge
        
    def anti(self):
        # First argument is the object on which anti() is called
        
        # Create new particle with same mass and
        # opposite charge
        return Particle(self.mass, -self.charge)

In [None]:
bert = Particle(1.777, -1)
ernie = bert.anti()
ernie.charge

In [None]:
ernie.mass

In [None]:
bert.charge  # Original particle not changed

In [None]:
class Particle:
    def __init__(self, mass, charge):
        # __init__() is called when new object is created.
        # First argument (self) is the new object
        self.mass = mass
        self.charge = charge
        
    def anti(self):
        # First argument is the object on which anti() is called
        
        # Create new particle with same mass and
        # opposite charge
        return Particle(self.mass, -self.charge)
        
    def flip_charge(self):
        # Change the charge of the particle itself (instead of creating a new one)
        
        self.charge *= -1

In [None]:
bert = Particle(1.777, -1)
bert.charge

In [None]:
bert.flip_charge()  # Changes the original particle
bert.charge

## Exercise: Implement a 2D Vector

Implement a `Vector2D` class such that the following lines work

In [None]:
from solutions import Vector2D
a = Vector2D(4, 3)
a.x

In [None]:
a.y

In [None]:
a.length()

In [None]:
a.scale(3)
a.length()
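
The `solutions` module is not shown here. One possible implementation is sketched below, assuming that `scale()` modifies the vector in place, as the calls above suggest. The actual solution may differ.

```python
import math


class Vector2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def length(self):
        # Euclidean norm of the vector
        return math.hypot(self.x, self.y)

    def scale(self, factor):
        # Assumed behaviour: change this vector instead of returning a new one.
        self.x *= factor
        self.y *= factor


a = Vector2D(4, 3)
length_before = a.length()
a.scale(3)
length_after = a.length()
print(length_before, length_after)
```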

## Inheritance

 - Sub-classes extend parent classes
 - Share functionality implemented in parent classes
 - Terminology: parent class = "base class"; sub-class = "derived class"
 - Inheritance models "**is a**"-type relationships
   - A `Fermion` **is a** `Particle`
   - A `Particle` is not necessarily a `Fermion`
 - Example: Include sub-classes `Fermion` and `Boson`

In [None]:
class Boson(Particle):
    def interact_with_higgs(self, factor=1.5):
        # Bosons can increase their mass by interacting with the Higgs field (NEW PHYSICS!)
        self.mass *= factor

class Fermion(Particle):
    def __init__(self, mass, charge, generation):
        super().__init__(mass, charge)  # Create a regular particle
        self.generation = generation

In [None]:
tau = Fermion(1.777, -1, 3)
tau.generation

In [None]:
Z = Boson(60.78, 0)
Z.mass

In [None]:
Z.interact_with_higgs()
Z.mass

In [None]:
Z.generation  # AttributeError: Z is a Boson, and bosons do not come in generations

In [None]:
tau.interact_with_higgs()  # AttributeError: only Boson defines interact_with_higgs()

## Other interesting things about OOP

 - Methods `__str__` and `__repr__` can be overridden
   - Reminder: `__repr__` = unambiguous representation of an object
   - Reminder: `__str__` = "pretty" printable representation (defaults to `__repr__`)
 - Operators can be overridden: `ernie + bert`
 - Polymorphism: methods with different implementations in sub-classes, e.g.
   - `Fermion.susy()` returns a Boson
   - `Boson.susy()` returns a Fermion
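
The three points above can be sketched as follows. This is a simplified version of the classes: `Fermion` takes no `generation` argument here, and the meaning of `+` (summing masses and charges) is an arbitrary choice made for illustration.

```python
class Particle:
    def __init__(self, mass, charge):
        self.mass = mass
        self.charge = charge

    def __repr__(self):
        # Unambiguous representation; print() falls back to it as __str__ is not defined.
        return f"{type(self).__name__}(mass={self.mass}, charge={self.charge})"

    def __add__(self, other):
        # Illustrative choice: adding two particles sums masses and charges.
        return Particle(self.mass + other.mass, self.charge + other.charge)


class Boson(Particle):
    def susy(self):
        return Fermion(self.mass, self.charge)


class Fermion(Particle):
    def susy(self):
        return Boson(self.mass, self.charge)


bert = Particle(1.777, -1)
ernie = Particle(1.777, 1)
combined = ernie + bert
print(combined)

# Same method name, different behaviour depending on the class.
partners = [p.susy() for p in (Fermion(0.511, -1), Boson(0, 0))]
print(partners)
```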