# Python manual

<a name="contents"></a>
# Contents

- [Configuration](#conf)
- [Os](#os)
- [Sys](#sys)
- [Pdb](#pdb)
- [String](#string)
- [Regex](#regex)
- [Random](#rnd)
- [Time](#time)
- [Datetime](#datetime)
    - [time objects](#timeobj)
    - [date objects](#dateobj)
    - [datetime objects](#datetimeobj)
- [Dateutil](#dateutil)
- [Zipfile](#zipfile)
- [Argparse](#argparse)
- [Subprocess](#subproc)
- [Multiprocessing](#multiproc)
- [Unittest](#unittest)
- [A/B Tests](#ab)
- [Errors](#error)
- [Data types](#data)
    - [set](#set)
    - [list](#list)
    - [tuple](#tuple)
    - [dictionary](#dict)
- [Functions](#fnc)
- [Objects](#obj)
- [I/O](#io)
    - [create](#create)
    - [write](#write)
    - [read](#read)

---
<a name="conf"></a>
# Configuration

[Return to Contents](#contents)

<a name="imports"></a>
### imports

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.  
Imports should usually be on separate lines.  
Imports should be grouped in the following order:
- standard library imports
- related third party imports
- local application/library specific imports

You should put a blank line between each group of imports.

**References**
- python standard library https://docs.python.org/3/library/

In [None]:
# first import Python standard library modules
import os
import re
import csv
import sys
import pdb
import json
import time
import string
import inspect
import zipfile
import unittest
import subprocess
import random as rnd
import datetime as dt
import multiprocessing
from operator import itemgetter
from itertools import groupby, chain

# then import the third-party packages you have installed
import pytz
import numpy as np
import pandas as pd
from joblib import dump, load
from pympler import asizeof
from IPython.display import display, HTML, Image
from dateutil.relativedelta import relativedelta

# finally import local application/library specific modules

<a name="env"></a>
### environment configuration

In [None]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

display(HTML('<style>.container { width:90% !important; }</style>'))

### conda environment

In [None]:
print(f'Running notebook with Conda Env {os.environ.get("CONDA_DEFAULT_ENV", "<not set>")}')

<a name="global"></a>
### global variables

In [None]:
PATH = 'C:\\Users\\pgreselin\\OneDrive\\Python'

<a name="install"></a>
### dependencies installation

In [None]:
# ! pip install xlsxwriter
# ! pip install pympler

### print format

In [None]:
num = 23.097613327304835
print(f'Format {num*100}%')
print(f'Format {num*100:.1f}%')
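
A few more format specifiers that are often handy inside f-strings. This is a small sketch; the variable names and values are made up for illustration.

```python
ratio = 0.23097613327304835       # hypothetical value, expressed as a fraction
count = 1234567                   # hypothetical integer

as_percent = f'{ratio:.1%}'       # multiplies by 100 and appends '%'
with_commas = f'{count:,}'        # thousands separator
padded = f'{count:>12,}'          # right-aligned in a 12-character field
scientific = f'{count:.2e}'       # scientific notation with 2 decimals

print(as_percent, with_commas, padded, scientific)
```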

---
<a name="os"></a>
# Os

[Return to Contents](#contents)

This module provides a portable way of using operating system dependent functionality.

When running a Python script you can use the following to get the directory that contains the script:  
`os.path.dirname(os.path.abspath(__file__))`

In [None]:
os.getcwd()                                                      # get current path
os.listdir()                                                     # list files and folders
os.system('cp test.txt ../')                                     # run shell command (returns the exit status: 0 means success, non-zero means failure)
os.mkdir('test_folder')                                          # create directory
os.rmdir('test_folder')                                          # delete an empty directory
os.path.abspath('.')                                             # get full path to given location
os.path.dirname('.')                                             # get the directory part of a path (everything before the last separator)
os.path.join('folder', 'file')                                   # create path by joining subpaths
os.path.realpath('example_ml.ipynb')                             # get the canonical path of a file, resolving symbolic links
os.path.getsize('example_ml.ipynb')                              # get the size in bytes of a file

---

<a name="sys"></a>
# Sys

[Return to Contents](#contents)

This module provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter.

In [None]:
sys.getsizeof('example_python.ipynb')                            # returns byte size of obj

**Note** that `sys.getsizeof` returns the size in bytes of a Python object, whereas `os.path.getsize` returns the size in bytes of a file on disk.\
Furthermore, `sys.getsizeof` only measures the size of the object itself, without the size of its internal or nested elements.\
To get the full size of a Python object you can use `asizeof` from the `pympler` library.

In [None]:
# file
print(f'OS:      {os.path.getsize("example_ml.ipynb")}')         # get the size in bytes of the file
print(f'SYS:     {sys.getsizeof("example_ml.ipynb")}')           # get the size in bytes of the string object
print(f'Pympler: {asizeof.asizeof("example_ml.ipynb")}')         # get the size in bytes of the string object

In [None]:
# object
dd = {'a':1, 'b':2, 'c':3}
print(f'SYS:     {sys.getsizeof(dd)}')                           # only measures the memory consumed by the dictionary itself
print(f'Pympler: {asizeof.asizeof(dd)}')                         # asizeof includes the sizes of nested objects in its calculations
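
To see why the two numbers differ, here is a minimal sketch of a recursive size function built only on `sys.getsizeof`. The helper `deep_getsizeof` is hypothetical and only handles the built-in containers, so it is an illustration of the idea rather than a replacement for `pympler`.

```python
import sys


def deep_getsizeof(obj, seen=None):
    """Sum sys.getsizeof over an object and the built-in containers nested in it."""
    seen = set() if seen is None else seen
    if id(obj) in seen:                      # do not count a shared object twice
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size


nested = {'a': [1, 2, 3], 'b': {'c': 'text'}}
shallow = sys.getsizeof(nested)              # only the outer dictionary
deep = deep_getsizeof(nested)                # outer dictionary plus keys, values and nested items
print(f'shallow: {shallow} bytes, deep: {deep} bytes')
```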

Once you have chosen the size method that best suits your use case, you can use the following function to print the size in a human-readable format.

In [None]:
def format_size(num, suffix='B'):
    for unit in ('', 'K', 'M', 'G', 'T', 'P', 'E', 'Z'):
        if abs(num) < 1024.0:
            return f'{num:3.1f}{unit}{suffix}'
        num /= 1024.0

    return f'{num:.1f}Y{suffix}'
    
format_size(os.path.getsize('example_ml.ipynb'))

---
<a name="pdb"></a>
# Pdb

[Return to Contents](#contents)

The module pdb defines an interactive source code debugger for Python programs.
It supports setting (conditional) breakpoints and single stepping at the source line level, inspection of stack frames, source code listing, and evaluation of arbitrary Python code in the context of any stack frame.

In [None]:
pdb.set_trace()

---
<a name="string"></a>
# String

[Return to Contents](#contents)

In [None]:
# remove punctuation
ss = 'Hy there, WHAT was- that? Sad.nEss'
' '.join(ss.translate(str.maketrans(string.punctuation, ' ' * len(string.punctuation))).title().split())

---
<a name="regex"></a>
# Regex

[Return to Contents](#contents)

- character classes
    - `.` matches any character except a newline  
    - `\w`, `\d`, `\s` match respectively a word character, a digit, a whitespace
    - `\W`, `\D`, `\S` match respectively a non-word character, a non-digit, a non-whitespace
    - `[abc]` match any of a, b, or c
    - `[^abc]` match not a, b, or c
    - `[a-g]` match any character between a & g
- anchors
    - `^abc`, `abc$` match respectively the start / end of the string
    - `\b`, `\B` match respectively word, not-word boundary
- escape characters
    - `\.`, `\*`, `\\` are escaped special characters
    - `\t`, `\n`, `\r` refer to tab, linefeed and carriage return
- groups & lookaround
    - `(abc)` is a capture group, meaning that it returns the string that matches the pattern inside the parentheses
    - `\1` is a backreference to group #1
    - `(?:abc)` is a non-capturing group
    - `(?=abc)`, `(?!abc)` are positive and negative lookahead
    - `(?<=a)b`, `(?<!a)b` are positive and negative lookbehind
- quantifiers & alternation
    - `a*`, `a+`, `a?` match respectively 0 or more, 1 or more, 0 or 1 of the specified pattern
    - `a{5}`, `a{2,}` match respectively exactly five, two or more of the specified pattern
    - `a{1,3}` match between one & three occurrences of the pattern
    - `a+?`, `a{2,}?` match as few occurrences as possible (lazy quantifiers)
    - `ab|cd` match either of the two patterns

**Note** that in Python's `re` module lookbehind requires a fixed-width pattern, whereas lookahead does not.
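
The syntax elements listed above can be tried out quickly with `re.findall` and `re.search`. This is a small sketch on a made-up string; the sample text and variable names are only for illustration.

```python
import re

text = 'price: 100 USD, 250 EUR, hello hello world'   # hypothetical sample text

digits = re.findall(r'\d+', text)                      # character class + quantifier
before_eur = re.findall(r'\d+(?= EUR)', text)          # positive lookahead
after_price = re.findall(r'(?<=price: )\d+', text)     # positive lookbehind (fixed width)
repeated = re.search(r'\b(\w+) \1\b', text).group()    # backreference to group #1
lazy = re.findall(r'<.+?>', '<a><b>')                  # lazy quantifier: shortest matches
greedy = re.findall(r'<.+>', '<a><b>')                 # greedy quantifier: longest match

print(digits, before_eur, after_price, repeated, lazy, greedy)
```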

**References**
- https://regexr.com/
- https://regex101.com/
- https://www.w3schools.com/python/python_regex.asp

**Splitting** can be done with the `re.split(pattern, string)` method.  

In [None]:
# split function
ss = 'one two 3.4 5,6 seven.eight nine,ten'
re.split(r'\s|(?<!\d)[,.](?!\d)', ss)                            # split on whitespace and on , or . not surrounded by digits

**Searching** can be done with the `re.search(pattern, string)` function.  
It scans the input string and returns a match object for the first location where the pattern matches, or `None` if there is no match.

You can use the `group()` method of the match object to retrieve the whole match or a specific capture group.

In [None]:
# search function
path = 'gs://my-bucket/common/feeds/maps/year=2021/month=5/day=19/'
re.search('gs://([^/]+)/(.+)/', path)                            # return the match object
re.search('gs://([^/]+)/(.+)/', path).group(2)                   # return the string captured by the second group
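
Since `re.search` stops at the first match, `re.findall` is the function to reach for when every occurrence is needed. A short sketch on the same path (the patterns are just examples):

```python
import re

path = 'gs://my-bucket/common/feeds/maps/year=2021/month=5/day=19/'

first = re.search(r'(\w+)=(\d+)', path)          # match object for the first key=value pair only
whole_match = first.group(0)                     # group(0) is the whole match
key, value = first.group(1), first.group(2)      # capture groups are numbered from 1

all_pairs = re.findall(r'(\w+)=(\d+)', path)     # every match, as a list of group tuples
no_match = re.search(r'hour=(\d+)', path)        # None when nothing matches

print(whole_match, key, value, all_pairs, no_match)
```

Remember to check for `None` before calling `group()` when the pattern may not match.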

**Replacing** and **removing** can be done with the `re.sub(pattern, replace, string)` method.

**Note** that the `replace` argument of `re.sub()` is a plain string (or a function), not a pattern: character classes such as `\s` are not valid there, so write the literal characters instead (i.e. `' '` instead of `'\s'` or `','` instead of `'\,'`).

In [None]:
# replace function
path = 'gs://my-bucket/common/feeds/maps/year=2021/month=5/day=19/'
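re.sub(r'/year=[0-9]{4}/.*$', '', path)                          # strip the date partitions (the original pattern ended with $ right after the year, so it never matched this path)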
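
The replacement can also reuse captured groups or be computed by a function. A minimal sketch on the same path; the two transformations are arbitrary examples.

```python
import re

path = 'gs://my-bucket/common/feeds/maps/year=2021/month=5/day=19/'

# \g<1> and \g<2> in the replacement refer to the captured groups
swapped = re.sub(r'month=(\d+)/day=(\d+)', r'day=\g<2>/month=\g<1>', path)

# a function receives the match object and returns the replacement text
zero_padded = re.sub(r'=(\d+)', lambda m: '=' + m.group(1).zfill(2), path)

print(swapped)
print(zero_padded)
```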

---
<a name="rnd"></a>
# Random

[Return to Contents](#contents)

This module implements **pseudo-random** number generators for various distributions.

In [None]:
rnd.seed()                                                       # initialize the random number generator (pass a value for reproducible results)
rnd.random()                                                     # generate random float in [0, 1)
rnd.randint(0,5)                                                 # generate random integer between given numbers (both included)
rnd.choice([1,3,5,7,9])                                          # pick a random element from a sequence
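
Because the generator is pseudo-random, seeding it with the same value reproduces the same sequence. A small sketch using separate `random.Random` instances so that the global generator is left untouched; the seed value and names are arbitrary.

```python
import random

rng_a = random.Random(42)                        # independent generator with a fixed seed
rng_b = random.Random(42)                        # same seed, so same sequence

seq_a = [rng_a.randint(0, 5) for _ in range(5)]
seq_b = [rng_b.randint(0, 5) for _ in range(5)]
print(seq_a == seq_b)                            # the two sequences are identical

names = ['pietro', 'luigi', 'mario']
sample = rng_a.sample(names, k=2)                # two distinct elements, without replacement
shuffled = names[:]                              # copy, because shuffle works in place
rng_a.shuffle(shuffled)
```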

---
<a name="time"></a>
# Time

[Return to Contents](#contents)

This module provides various time-related functions.

In [None]:
# current time 
tz = pytz.timezone('Europe/Rome')
now = dt.datetime.now(tz)
now.strftime('%H:%M:%S')

In [None]:
start_time = time.time()

# execution time in seconds
print(f'Execution time: {time.time() - start_time} seconds')

# execution time in readable format
print(f'Execution time: {dt.timedelta(seconds=time.time() - start_time)}')

---
<a name="datetime"></a>
# Datetime

[Return to Contents](#contents)

The datetime module supplies classes for manipulating dates and times.

<a name="timeobj"></a>
## time objects

[Return to Contents](#contents)

A time object represents a (local) time of day, independent of any particular day, and subject to adjustment via a tzinfo object.

In [None]:
tm = dt.time(3, 45, 12)                                          # create custom time variable (not named `time`, which would shadow the time module)
hour, minute, second = tm.hour, tm.minute, tm.second             # access time elements
time_repl = tm.replace(hour=5, second=30)                        # replace values

<a name="dateobj"></a>
## date objects

[Return to Contents](#contents)

A date object represents a date (year, month and day) in an idealized calendar, the current Gregorian calendar indefinitely extended in both directions.

In [None]:
today = dt.date.today()                                          # get today's date variable
date = dt.date(2021, 1, 1)                                       # create custom date
year, month, day = date.year, date.month, date.day               # access elements
date.weekday()                                                   # get day of the week as integer where Monday is 0 and Sunday is 6
today.strftime('%Y%m%d')                                         # strftime (string format time): formats a date as a string
today.isoformat()                                                # isoformat: converts a date to the ISO 8601 format, that is YYYY-MM-DD
date + dt.timedelta(days=180)                                    # add days to a date (use - to subtract them)

<a name="datetimeobj"></a>
## datetime objects

[Return to Contents](#contents)

A datetime object is a single object containing all the information from a date object and a time object.

In [None]:
now = dt.datetime.now()                                          # get current datetime variable
ds = dt.datetime(2021, 1, 1)                                     # create custom datetime
hour, minute, second = ds.hour, ds.minute, ds.second             # access elements
weekday = ds.weekday()                                           # get day of the week as integer where Monday is 0 and Sunday is 6
date = ds.date()                                                 # convert datetime to date object
ds.replace(month=3, minute=16)                                   # replace values
dt.datetime.strptime('2019-08-09 01:01:01', '%Y-%m-%d %H:%M:%S') # strptime (string parse time): converts a string to a datetime
ds.strftime('%Y-%m-%d %H:%M:%S')                                 # strftime (string format time): formats a datetime as a string, e.g. to print it in a readable format
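
Subtracting two datetime objects gives a `timedelta`, which is handy for durations. A short sketch with made-up timestamps:

```python
import datetime as dt

fmt = '%Y-%m-%d %H:%M:%S'
start = dt.datetime.strptime('2021-01-01 08:30:00', fmt)   # hypothetical timestamps
end = dt.datetime.strptime('2021-01-03 10:00:00', fmt)

elapsed = end - start                                      # timedelta object
days, seconds = elapsed.days, elapsed.seconds              # whole days + remaining seconds
hours = elapsed.total_seconds() / 3600                     # full duration expressed in hours

later = (start + dt.timedelta(hours=36)).strftime('%d/%m/%Y %H:%M')
print(elapsed, hours, later)
```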

---
<a name="dateutil"></a>
# Dateutil

[Return to Contents](#contents)

Dateutil provides `relativedelta`, a useful way to add or subtract calendar units (months, years, ...) to dates.

In [None]:
# note the different behavior using month or months
date = dt.date(2021, 1, 1)                                       # create custom date
date - relativedelta(months=2)                                   # subtracts 2 months from the date (relative argument)
date - relativedelta(month=2)                                    # sets the month of the date to 2, i.e. February (absolute argument)

---
<a name="zipfile"></a>
# Zipfile

[Return to Contents](#contents)

The ZIP file format is a common archive and compression standard. This module provides tools to create, read, write, append, and list a ZIP file.

In [None]:
with zipfile.ZipFile('sampleDir.zip', 'r') as zip_file:
    zip_file.extractall()                                        # extract all the contents of zip file in current directory
    zip_file.extractall('target_folder')                         # extract all the contents of zip file in different directory

In [None]:
# only extract files with specified pattern from zip folder to target folder
with zipfile.ZipFile('sampleDir.zip', 'r') as zip_file:
    filenames_list = zip_file.namelist()
    for filename in filenames_list:
        if filename.endswith('.csv'):
            zip_file.extract(filename, 'target_folder')
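
Creating an archive works the same way with mode `'w'`. The sketch below writes a tiny archive into a temporary folder and reads it back; file names and contents are invented for the example.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp_dir:
    archive_path = os.path.join(tmp_dir, 'sample.zip')

    # 'w' creates a new archive; writestr stores a member from an in-memory string
    with zipfile.ZipFile(archive_path, 'w', compression=zipfile.ZIP_DEFLATED) as zip_file:
        zip_file.writestr('data.csv', 'a,b\n1,2\n')
        zip_file.writestr('notes.txt', 'hello')

    # 'r' opens the archive for reading; read returns the member content as bytes
    with zipfile.ZipFile(archive_path, 'r') as zip_file:
        members = zip_file.namelist()
        csv_text = zip_file.read('data.csv').decode('utf-8')

print(members)
```

Use `zip_file.write(path)` instead of `writestr` to add a file that already exists on disk.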

---
<a name="argparse"></a>
# Argparse

[Return to Contents](#contents)

The argparse module makes it easy to write user-friendly command-line interfaces.\
The program defines what arguments it requires, and argparse will figure out how to parse those out of sys.argv. The argparse module also automatically generates help and usage messages. The module will also issue errors when users give the program invalid arguments.

In [None]:
import argparse

parser = argparse.ArgumentParser(description='Revenue estimation for GNLC')
# positional argument (always required, so the `required` keyword must not be passed)
parser.add_argument('month', help='GNLC date in YYYYMM format')
# option argument that takes a value
parser.add_argument('-c', '--campaign', help='GNLC campaign', required=False, choices=['SME', 'SOHO', 'OneNet'], default='SME')
parser.add_argument('--force-train', action='store_true', help='Force model train')

# parse some argument list
args = parser.parse_args()
gnlc_date = args.month
campaign = args.campaign
force_train = args.force_train
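
Inside a notebook `sys.argv` holds the arguments of the kernel, so it is usually easier to pass an explicit list to `parse_args`. A minimal sketch with a parser similar to the one above (the description text is a placeholder):

```python
import argparse

parser = argparse.ArgumentParser(description='Demo parser')
parser.add_argument('month', help='date in YYYYMM format')
parser.add_argument('-c', '--campaign', choices=['SME', 'SOHO', 'OneNet'], default='SME')
parser.add_argument('--force-train', action='store_true')

# parse an explicit list instead of sys.argv
args = parser.parse_args(['202209', '--campaign', 'SOHO', '--force-train'])
defaults = parser.parse_args(['202209'])

print(args)       # dashes in option names become underscores: args.force_train
print(defaults)
```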

In [None]:
parser = argparse.ArgumentParser(description='Revenue estimation for GNLC parser')

# create subparsers
subparsers = parser.add_subparsers(help='Sub-parsers help')

# create the parser for the "full" command
parser_full = subparsers.add_parser('full', help='Full command help')
parser_full.add_argument('month', type=str, help='GNLC date in YYYYMM format')

# create the parser for the "incr" command
parser_incr = subparsers.add_parser('incr', help='Incremental command help')
parser_incr.add_argument('-c', help='GNLC campaign')

# parse some argument list
parser.parse_args(['full', '202209'])
parser.parse_args(['incr', '-c', '2023q3'])

---
<a name="subproc"></a>
# Subprocess

[Return to Contents](#contents)

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes

In [None]:
cmd = f'gsutil ls -r gs://{bucket}/{prefix}'

In [None]:
# run should be used as the default method
p = subprocess.run(
    cmd,
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT)

# Popen can be used for edge cases when more control is needed
p = subprocess.Popen(
    cmd,
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT)

In [None]:
out = p.stdout.read()                                            # p is the Popen object here; with subprocess.run, p.stdout already holds the output bytes

In [None]:
out.decode('utf-8')
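
When the command does not need shell features, passing it as a list avoids quoting issues, and `text=True` returns already decoded strings. A self-contained sketch that runs the current Python interpreter as the child process (the printed message is arbitrary):

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, '-c', 'print("hello from child")'],   # command as a list, no shell needed
    capture_output=True,                                     # collect stdout and stderr
    text=True,                                               # decode bytes to str
    check=True,                                              # raise CalledProcessError on non-zero exit
)
output = result.stdout.strip()
print(result.returncode, output)
```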

---
<a name="multiproc"></a>
# Multiprocessing

[Return to Contents](#contents)

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both POSIX and Windows.

In [None]:
multiprocessing.cpu_count()

To execute code in parallel, you first need to wrap the code you want to run in parallel in a function and then set up a pool of cores that can run the code with the `Pool` class.  
The `Pool` class can be initialized with the number of available cores and then it coordinates them to run the same code in parallel with the `map` method

In [None]:
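# wrap the epub reader in a function that can be mapped
# (epub, ebooklib, BeautifulSoup, epubs_path, special_chars and isbn_list are assumed to be defined elsewhere)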
def read_epub(filename_):
    epub_content = epub.read_epub(os.path.join(epubs_path, filename_))
    content = ' '.join([BeautifulSoup(item.content, 'html.parser').get_text() for item in epub_content.get_items_of_type(ebooklib.ITEM_DOCUMENT)])
    for key, val in special_chars.items():
        content = content.replace(key, val)
    content = re.sub(r"([A-Z])([a-zA-Z]+)", lambda m: m.group(1) + m.group(2).lower(), content)
    content = re.sub('\n+', '. ', content)
    return pd.DataFrame(data=[[filename_.split('.')[0], content]], columns=['isbn', 'text'])


data = []
schema = StructType([
    StructField('isbn', StringType(), False),
    StructField('text', StringType(), False)
])


# get a list of file names
available_books_list = os.listdir(epubs_path)
books_list = [isbn for isbn in isbn_list if isbn in available_books_list] 

# set up your pool
with multiprocessing.Pool(processes=8) as pool:
    # have your pool map the file names to dataframes
    df_list = pool.map(read_epub, books_list)
    # reduce the list of dataframes to a single dataframe
    combined_df = pd.concat(df_list, ignore_index=True)

---
<a name="unittest"></a>
# Unittest

[Return to Contents](#contents)

The unittest unit testing framework was originally inspired by JUnit and has a similar flavor to major unit testing frameworks in other languages. It supports test automation, sharing of setup and shutdown code for tests, aggregation of tests into collections, and independence of the tests from the reporting framework.

**References**
- https://towardsdatascience.com/how-to-easily-and-confidently-implement-unit-tests-in-python-cad48d91ab74
- https://www.datacamp.com/community/tutorials/unit-testing-python

In [None]:
# first define a custom function
def cuboid_volume(l):
    if type(l) not in [int, float]:
        raise TypeError('ERROR! Input length is not valid, please specify an integer or float value')
    return (l*l*l)

In [None]:
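# then test the custom function with different kinds of input
lengths = [2, 1.1, -2.5, 2j, 'two', False]

for length in lengths:
    try:
        print(f'The volume of cuboid: {cuboid_volume(length)}')
    except TypeError as err:                                     # invalid inputs raise TypeError, keep going with the next one
        print(f'Invalid input {length!r}: {err}')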

When writing Python scripts, tests conventionally live in a Python file called `test_<module_to_be_tested>.py` placed in the `tests` folder.
The file must contain a test suite similar to the `TestCuboid` class below, plus the following lines of code so that the tests run when the file is executed directly.

```
if __name__ == '__main__':
    unittest.main()
```

Finally, the test runner can be executed from your working environment as follows:

`python -m unittest tests/test_<module_to_be_tested>.py`

In [None]:
class TestCuboid(unittest.TestCase):
    def setUp(self):
        # setUp() method is run before each test method
        # if you need to define or load some data you can do it inside the setUp() method
        pass
        
    def tearDown(self):
        # tearDown() will be run if setUp() succeeded, regardless of whether the test method succeeded or not
        # if your test writes some output then you can use tearDown() to delete it
        pass
        
    def test_volume(self):
        self.assertAlmostEqual(cuboid_volume(2), 8)
        self.assertAlmostEqual(cuboid_volume(1), 1)
        self.assertAlmostEqual(cuboid_volume(0), 0)
        self.assertAlmostEqual(cuboid_volume(5.5), 166.375)
        
    def test_input_values(self):
        self.assertRaises(TypeError, cuboid_volume, False)
        self.assertRaises(TypeError, cuboid_volume, 'String')
        self.assertRaises(TypeError, cuboid_volume, ['el1', 'el2'])

In [None]:
unittest.main(argv=[''], verbosity=2, exit=False)
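
If you only want to run one test class and inspect the outcome in code, you can load it explicitly. The sketch below is self-contained: `cube_volume` is a hypothetical stand-in for the function under test, and `subTest` reports each input separately.

```python
import unittest


def cube_volume(length):
    # hypothetical stand-in for the function under test
    if type(length) not in (int, float):
        raise TypeError('length must be an int or a float')
    return length ** 3


class TestCubeVolume(unittest.TestCase):
    def test_values(self):
        for length, expected in [(2, 8), (0, 0), (5.5, 166.375)]:
            with self.subTest(length=length):            # each input is reported on its own
                self.assertAlmostEqual(cube_volume(length), expected)

    def test_invalid_input(self):
        with self.assertRaises(TypeError):               # context-manager form of assertRaises
            cube_volume('two')


suite = unittest.TestLoader().loadTestsFromTestCase(TestCubeVolume)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())
```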

---
<a name="ab"></a>
# A/B Tests

[Return to Contents](#contents)

An A/B Test is a randomised experiment containing two groups, A and B that receive different experiences. Within an A/B Test, we look to understand and measure the response of each group.

Assume you are an online commerce business and wish to make changes to the current product page. The current conversion rate is 13% on average and you would be happy with an increase of 2 percentage points (from 13% to 15%).

First you must formulate a hypothesis before starting to test. This will make sure that the interpretation of the result is correct as well as rigorous. The **null hypothesis** is usually the one stating that there is no statistical difference when comparing the test result with the current situation ($H_{0}: p=p_{0}$). Then you can pick a one-tailed ($H_{a}: p>p_{0}$) or a two-tailed test ($H_{a}: p\neq p_{0}$) where
- $H_{0}$ is the null hypothesis
- $H_{a}$ is the alternative hypothesis
- $p_{0}$ is the conversion rate of the old design
- $p$ is the conversion rate of the new design

Then you have to set a **confidence level**, for example 95% which leads to considering a **significance level** $\alpha = 0.05$.

Next, you have to choose a **sample size**. The number of people (or user session) we decide to capture in each group will have an effect on the precision of our estimated conversion rates: the larger the sample size, the more precise our estimates. The sample size can be estimated through the **power analysis** that depends on:
- significance level ($\alpha$)
- **power of the test** ($1-\beta$): this represents the probability of finding a statistical difference between the groups in a test when a difference is actually present. It is usually set at 0.8 by convention
- **effect size** or **detectable effect**: how big of a difference we expect there to be between the conversion rates

Effect size and sample size can be estimated in Python with the support of `statsmodels` library.  
**Note** that the returned sample size is the number of observations needed in each group for the test to reach the chosen power at the chosen significance level.
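
To make the power analysis less of a black box, the textbook normal-approximation formula $n = 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{h}\right)^2$ with Cohen's effect size $h = 2\arcsin\sqrt{p} - 2\arcsin\sqrt{p_{0}}$ can be sketched with the standard library only. The helper name `required_sample_size` is made up, and the result should be close to, but is not guaranteed to be identical to, what `statsmodels` returns.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def required_sample_size(p_old, p_new, alpha=0.05, power=0.8):
    """Observations per group for a two-sided test on two proportions (normal approximation)."""
    effect_size = 2 * asin(sqrt(p_new)) - 2 * asin(sqrt(p_old))   # Cohen's h
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)                 # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)                          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


n_per_group = required_sample_size(0.13, 0.15)
print(f'{n_per_group} observations needed in each group')
```

A larger detectable effect needs fewer observations, which is easy to check by calling the function with `0.13` and `0.18`.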

Finally, you can draw conclusions from your test. Since the sample size is large enough, you can use the normal approximation to calculate the $p$-value, that is, a z-test. Again, this can easily be performed in Python through the `statsmodels` library.

**Note**
- $\alpha$: significance level, false positive rate, type I error (in how many cases we reject null when we should not)
- $\beta$: false negative rate, type II error (in how many cases we fail to reject null when we should)
- power: true positive rate, $1-\beta$
- $1-\alpha$: true negative rate

**References**
- https://towardsdatascience.com/ab-testing-with-python-e5964dd66143
- https://towardsdatascience.com/the-power-of-a-b-testing-3387c04a14e3

In [None]:
from math import ceil

from statsmodels.stats.api import proportion_effectsize, NormalIndPower
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

In [None]:
# estimate required sample size
effect_size = proportion_effectsize(0.13, 0.15)
sample_size = NormalIndPower().solve_power(
    effect_size, 
    power=0.8, 
    alpha=0.05, 
    ratio=1
)
sample_size = ceil(sample_size)

In [None]:
# import data
root_path = 'C:\\Users\\pgreselin\\OneDrive\\Python'
data = pd.read_csv(os.path.join(root_path, 'Dati', 'ab_test.csv'))
print(f'{len(data)} test results')
data.head(2)

In [None]:
# make sure there are no users that have been recorded multiple times
session_counts = data['user_id'].value_counts(ascending=False)
multi_users = session_counts[session_counts > 1].count()
print(f'There are {multi_users} users that appear multiple times in the dataset')

users_to_drop = session_counts[session_counts > 1].index
data = data[~data['user_id'].isin(users_to_drop)]
print(f'The updated dataset now has {len(data)} observations')

In [None]:
# sampling data
sample_size = 4720
control_sample = data[data['group'] == 'control'].sample(n=sample_size, random_state=22)
treatment_sample = data[data['group'] == 'treatment'].sample(n=sample_size, random_state=22)

ab_test = pd.concat([control_sample, treatment_sample], axis=0)
ab_test.reset_index(drop=True, inplace=True)
print(f'Considering {len(ab_test)} observations for A/B test')

In [None]:
# testing the hypothesis
control_results = ab_test[ab_test['group'] == 'control']['converted']
treatment_results = ab_test[ab_test['group'] == 'treatment']['converted']

n_con = control_results.count()
n_treat = treatment_results.count()
successes = [control_results.sum(), treatment_results.sum()]
nobs = [n_con, n_treat]

z_stat, pval = proportions_ztest(successes, nobs=nobs)
(lower_con, lower_treat), (upper_con, upper_treat) = proportion_confint(successes, nobs=nobs, alpha=0.05)

print(f'z statistic: {z_stat:.2f}')
print(f'p-value: {pval:.3f}')
print(f'ci 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]')
print(f'ci 95% for treatment group: [{lower_treat:.3f}, {upper_treat:.3f}]')
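
For reference, the two-sided z-test on two proportions with a pooled variance can be written by hand in a few lines. This is a sketch under the usual large-sample assumption; the function name and the conversion counts are hypothetical, not taken from the dataset above.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions (pooled variance)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)            # conversion rate under H0
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))                  # two-sided p-value
    return z, p_value


# hypothetical counts: 600 and 650 conversions out of 4720 sessions per group
z_stat, p_value = two_proportion_ztest(600, 4720, 650, 4720)
print(f'z statistic: {z_stat:.2f}, p-value: {p_value:.3f}')
```

If the p-value is above the significance level $\alpha$, you cannot reject the null hypothesis.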

---
<a name="error"></a>
# Errors

[Return to Contents](#contents)

In [None]:
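try:
    # code that may raise an exception
    value = int('not a number')
except ValueError as err:
    # runs only if the matching exception was raised in the try block
    print(f'Handled: {err}')
else:
    # runs only if no exception was raised
    print(f'Parsed value: {value}')
finally:
    # always runs, whether or not an exception was raised
    print('Done')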
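
It is usually better to catch specific exception classes rather than using a bare `except`, so that unexpected errors are not silently hidden. A minimal sketch, where `safe_divide` is a made-up helper:

```python
def safe_divide(numerator, denominator):
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        return None                                   # handle the expected failure
    except TypeError as err:
        # re-raise as a different exception, keeping the original one as the cause
        raise ValueError(f'both arguments must be numbers: {err}') from err
    else:
        return result                                 # only reached when no exception occurred


print(safe_divide(6, 3))
print(safe_divide(1, 0))
```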

---
<a name="data"></a>
# Data types

[Return to Contents](#contents)

<a name="set"></a>
### set

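A set is a collection of elements which is **unordered**, **unindexed**, and **does not allow duplicate values**. Elements must be hashable (for example strings, numbers or tuples), but the set itself can be modified by adding or removing elements.

Note that, because a set is unordered, set objects do not support indexing.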

In [None]:
s = {'pietro', 'luigi', 'mario', 'luigi'}
s2 = {'mario', 'bowser', 'sissy'}


len(s)                                                           # length of set
s.add('giulio')                                                  # add: add an element to a set
s.remove('mario')                                                # remove: remove a specific element from a set (KeyError if missing)
s.pop()                                                          # pop: remove and return an arbitrary element from a set
s.intersection(s2)                                               # set intersection
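
The other set operations are also available as operators. A short sketch with made-up sets:

```python
s = {'pietro', 'luigi', 'mario'}
s2 = {'mario', 'bowser'}

union = s | s2                         # elements in either set
difference = s - s2                    # elements in s but not in s2
symmetric = s ^ s2                     # elements in exactly one of the two sets
is_member = 'mario' in s               # membership test

s.discard('nobody')                    # like remove, but no error if the element is missing
unique = set(['a', 'b', 'a'])          # duplicates are dropped when building a set

print(union, difference, symmetric, is_member, unique)
```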

<a name="list"></a>
### list

A list is a collection of elements which is **ordered**, **changeable**, and **allows duplicate values**.

**Ordered** means that the items have a defined order, and that order will not change.
Elements of ordered collections can be accessed by their indexes in the following manner:
```
l = ['pietro', 'luigi', 'mario']
l[2]
>>> 'mario'
```

**Changeable** means that you can change, add, and remove items in a list after it has been created.
```
l += ['domenico']
l[1] = 'filippo'
>>> ['pietro', 'filippo', 'mario', 'domenico']
```

In [None]:
l = ['pietro', 'luigi', 'mario']
l2 = ['mario', 'sissy', 'bowser']

len(l)                                                           # length of list
l[1]                                                             # accessing elements
l[:2], l[2:]                                                     # slice of list

# sorting of list
sorted(l, reverse=False)                                         # return a sorted copy of the list (ascending; use reverse=True for descending)
l.sort(), l.sort(reverse=True)                                   # inplace sorting of list in ascending / descending order
l.reverse()                                                      # reverse list
l[::-1]                                                          # this is an alternative way to reverse a list, basically it lists elements from the last to the first

# find elements in list
min(l), max(l)                                                   # find min / max
l.index('pietro')                                                # find index of element
np.max(l), np.min(l)                                             # find min / max in numpy
np.argmax(l), np.argmin(l)                                       # find argmin / argmax

# adding / removing elements to list
l.append('franco')                                               # append a single element at the end of the list
l.insert(2, 'peppe')                                             # insert a single element at a given position in the list (index first, then element)
l.remove('luigi')                                                # remove a specific element from a list (if multiple occurrences, only the first one is removed)
l.pop(), l.pop(1)                                                # remove and return an element from any position in the list (by default the last one)
l += ['jhon', 'silvia']                                          # concatenate two lists into one

# filtering list
list(filter(lambda x: x%5 == 0 and x%3 == 0, range(50)))         # filter returns an iterator; a numeric range is used here because l contains strings

# list intersection
# note that there is no method to perform list intersection directly, you have to use sets
set(l).intersection(l2)

#### list comprehension

List comprehension is a compact and elegant way to create a list from an existing iterable.  
Its short inline syntax avoids an explicit loop with repeated `append` calls, and for this reason it usually runs faster.

In [None]:
number_list = [num for num in range(20) if num % 2 == 0]
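
For comparison, here is the same list built with an explicit loop, together with two other common comprehension patterns (a sketch; the names are arbitrary):

```python
# explicit loop equivalent to the comprehension with a filter
evens_loop = []
for num in range(20):
    if num % 2 == 0:
        evens_loop.append(num)

evens = [num for num in range(20) if num % 2 == 0]

# conditional expression: transforms every element instead of filtering
labels = ['even' if num % 2 == 0 else 'odd' for num in range(4)]

# nested loops: the leftmost for is the outer loop
pairs = [(x, y) for x in range(2) for y in 'ab']

print(evens == evens_loop, labels, pairs)
```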

<a name="tuple"></a>
### tuple

A tuple is a collection of elements which is **ordered**, **unchangeable**, and **allows duplicate values**.

If you try to change a tuple's value, Python raises a `TypeError`:
```
t = ('pietro', 'greselin', 23)
t[2] = 26
>>> TypeError: 'tuple' object does not support item assignment
```

In [None]:
t = ('pietro', 'greselin', 23)

len(t)                                                           # length of tuple
t[1]                                                             # accessing elements

In [None]:
# list of tuples
tup_list = [('pietro', 23), ('gino', 27), ('edo', 15), ('pietro', 31)]

# flatten list of tuples
list_flat = [tag for sublist in tup_list for tag in sublist]

# sort inplace list of tuples by given element
# by default ascending sort, you can change it using reverse=True
tup_list.sort(key=lambda tup: tup[1], reverse=True)

# find min / max in list of tuples
# itemgetter(n) makes min / max compare the n-th element of each tuple (no sorting is needed)
max(tup_list, key=itemgetter(1))

# aggregate list of tuples by given element
# REMEMBER groupby only groups consecutive items, so the list must first be sorted by the element you group on
tup_list.sort(key=lambda tup: tup[0])
[max(v, key=itemgetter(1)) for k, v in groupby(tup_list, itemgetter(0))]
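
The effect of sorting before `groupby` is easy to see on a small list. A sketch with invented records:

```python
from itertools import groupby
from operator import itemgetter

records = [('pietro', 23), ('gino', 27), ('pietro', 31)]

# without sorting, the two 'pietro' items end up in separate groups
unsorted_keys = [key for key, _ in groupby(records, key=itemgetter(0))]

# after sorting by the grouping key, each key appears exactly once
sorted_records = sorted(records, key=itemgetter(0))
max_per_name = {
    key: max(age for _, age in group)
    for key, group in groupby(sorted_records, key=itemgetter(0))
}

print(unsorted_keys, max_per_name)
```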

<a name="dict"></a>
### dictionary

A dictionary is a collection of key:value pairs which is **ordered** (insertion order, since Python 3.7), **changeable**, and **does not allow duplicate keys**.

In [None]:
d = {'name': 'pietro', 'surname': 'greselin', 'age': 23}

# accessing elements
d.keys(), d.values(), d.items()

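# sort dictionary in ascending/descending order by value (use itemgetter(0) to sort by key)
# values must be comparable with each other, so a dictionary with only numeric values is used here
ages = {'pietro': 23, 'gino': 27, 'edo': 15}
dict(sorted(ages.items(), key=itemgetter(1), reverse=True))
{k: v for k, v in sorted(ages.items(), key=lambda item: item[1])}

# get maximum value from a dictionary
max(ages.values())
# get key corresponding to maximum value from a dictionary
max(ages.items(), key=itemgetter(1))[0]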
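
Two properties of dictionaries can be checked directly: keys keep their insertion order, and assigning to an existing key overwrites its value instead of adding a duplicate. A minimal sketch (the `city` and `phone` keys are illustrative):

```python
d = {'name': 'pietro', 'surname': 'greselin', 'age': 23}

# assigning to an existing key replaces the value, the key is not duplicated
d['age'] = 26

# a new key is appended at the end of the insertion order
d['city'] = 'genova'

print(list(d.keys()))
print(d['age'])

# .get returns a default instead of raising KeyError for a missing key
print(d.get('phone', 'n/a'))
```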

#### dictionary comprehension

A dictionary comprehension works like a list comprehension, but it produces `key: value` pairs inside curly braces.

In [None]:
number_dict = {n: n**2 for n in range(5)}

# it works also with nested dictionaries
d = {
    'LIGURIA': {
        'IMPERIA': {
            'popolazione': 1234,
            'superficie': 2345,
            'densita': 0.5,
        },
        'GENOVA': {
            'popolazione': 63256,
            'superficie': 1432,
            'densita': 2.3,
        }
    },
    'UMBRIA': {
        'PERUGIA': {
            'popolazione': 5728,
            'superficie': 1346,
            'densita': 4,
        },
        'TERNI': {
            'popolazione': 125,
            'superficie': 8548,
            'densita': 0.1,
        }
    },
}
{
    f'{k0} - {k1}': v2
    for k0, v0 in d.items()
    for k1, v1 in v0.items()
    for k2, v2 in v1.items()
    if k2 == 'popolazione'
}

---

<a name="fnc"></a>
# Functions

[Return to Contents](#contents)

In [None]:
# decorators

class Mathematics:

    @staticmethod
    def addNumbers(x, y):
        return x + y

# the decorator above is equivalent to defining a plain function and then running
# Mathematics.addNumbers = staticmethod(Mathematics.addNumbers)

print('The sum is:', Mathematics.addNumbers(5, 10))

In [None]:
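# the same method without the decorator

class Maths:

    def addNumbers(x, y):
        return x + y

# in Python 3 a plain function still works when called on the class,
# but Maths().addNumbers(5, 10) raises TypeError because the instance is passed as x
print('The sum is:', Maths.addNumbers(5, 10))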
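
`@staticmethod` is a built-in decorator. For context, here is a minimal sketch of a user-defined decorator. `log_call` and `add_numbers` are hypothetical names used only for illustration.

```python
import functools

def log_call(func):
    """Return a wrapper that prints the function name before calling it."""
    @functools.wraps(func)  # keep the original name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        print(f'calling {func.__name__}')
        return func(*args, **kwargs)
    return wrapper

@log_call
def add_numbers(x, y):
    return x + y

# the decorated definition above is equivalent to: add_numbers = log_call(add_numbers)
print('The sum is:', add_numbers(5, 10))
```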

---

<a name="obj"></a>
# Objects

[Return to Contents](#contents)

In [None]:
class Employee():
    'Common base class for all employees'
    
    # class variables
    # they are variables whose value is shared among all instances of this class
    # by convention class variables are defined before any method
    empCount = 0

    # init method
    # it's a class constructor or initialization method that Python calls when you create a new instance of this class
    # it is used to define object attributes
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary
        Employee.empCount += 1
    
    # instance methods
    # other methods are defined as normal functions inside the class
    # their first parameter is self, which points to the instance and gives access to attributes and other methods of the same object
    def displayCount(self):
        print(f'Total Employee {Employee.empCount}')

    def displayEmployee(self):
        print(f'Name : {self.name}, Salary: {self.salary}')
        
# object instance
emp1 = Employee('Zara', 2000)

In [None]:
# inheritance
# TODO: write me...
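
Until this cell is written, here is a minimal sketch of inheritance. It assumes a simplified `Employee` without the class counter, and `Manager`, `team` and `describe` are hypothetical names used only for illustration.

```python
class Employee:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def describe(self):
        return f'Name : {self.name}, Salary: {self.salary}'

# Manager is a hypothetical subclass: it inherits everything defined in Employee
class Manager(Employee):
    def __init__(self, name, salary, team):
        super().__init__(name, salary)  # reuse the parent initializer
        self.team = team

    # override the parent method and extend its result
    def describe(self):
        return f'{super().describe()}, Team: {self.team}'

mgr = Manager('Zara', 2000, 'data')
print(mgr.describe())
print(isinstance(mgr, Employee))
```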

In [None]:
# classmethod and staticmethod
class MyClass:
    def method(self):
        return f'instance method called {self}'

    # a classmethod does not accept a self parameter; instead it takes a cls parameter that points to the class (not the object instance) when the method is called
    # a classmethod cannot modify object instance state, it can only modify class state that applies across all instances of the class
    @classmethod
    def classmethod(cls):
        return f'class method called {cls}'

    # staticmethod takes neither a self nor a cls parameter (but of course it's free to accept an arbitrary number of other parameters)
    # it can neither modify object state nor class state and it's primarily a way to namespace your methods
    @staticmethod
    def staticmethod():
        return 'static method called'
    
myclass = MyClass()
print(myclass.method())
print(myclass.classmethod())  # note classmethod doesn’t have access to the <MyClass instance> object, but only to the <class MyClass> object
print(myclass.staticmethod())
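
A common use of `@classmethod` is an alternative constructor, while `@staticmethod` suits helpers that need neither object nor class state. A sketch under those assumptions, where `Person`, `from_string` and `is_adult` are illustrative names:

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    # alternative constructor: cls is the class itself, so a subclass gets its own type back
    @classmethod
    def from_string(cls, text):
        name, age = text.split(',')
        return cls(name.strip(), int(age))

    # helper that needs neither self nor cls
    @staticmethod
    def is_adult(age):
        return age >= 18

p = Person.from_string('pietro, 23')
print(p.name, p.age)
print(Person.is_adult(p.age))
```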

In [None]:
# get list of object's methods and properties
# myclass is the MyClass instance created above, any other object can be passed instead
inspect.getmembers(myclass)                                      # retrieve list of the members of an object
inspect.getmembers(myclass, predicate=inspect.ismethod)          # get list of methods of an object
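
`inspect.getmembers` returns `(name, value)` pairs sorted by name, so the names alone can be extracted with a comprehension. A small sketch on a hypothetical `Point` class, alongside the built-in `vars` for instance attributes:

```python
import inspect

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)

p = Point(1, -2)

# keep only the names of the bound methods
method_names = [name for name, _ in inspect.getmembers(p, predicate=inspect.ismethod)]

# vars() returns only the attributes stored on the instance
attributes = vars(p)

print(method_names)
print(attributes)
```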

---

<a name="io"></a>
# I/O

[Return to Contents](#contents)

<a name="create"></a>
## create

[Return to Contents](#contents)

In [None]:
string_ = 'pietro'
list_ = ['pietro', 'luigi', 'mario']
set_ = {'pietro', 'luigi', 'mario', 'luigi'}
tuple_ = ('pietro', 'greselin', 23)
dict_ = {'name': 'pietro', 'surname': 'greselin', 'age': 23}

<a name="write"></a>
## write

[Return to Contents](#contents)

In [None]:
# write string to txt file
# the with statement closes the file automatically, no explicit close() is needed
with open('string.txt', 'w') as txt_file:
    txt_file.write(string_)

# write list to txt file
with open('list.txt', 'w') as txt_file:
    for el in list_:
        txt_file.write(el + '\n')

# write dictionary to csv file
# newline='' prevents blank lines between rows on Windows
with open('dict.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in dict_.items():
        writer.writerow([key, value])
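
A dictionary can also be saved as JSON, which keeps value types such as integers (a CSV round trip returns every value as a string). A sketch that writes to a temporary directory so no file is left behind; the file name `dict.json` is illustrative.

```python
import json
import os
import tempfile

dict_ = {'name': 'pietro', 'surname': 'greselin', 'age': 23}

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, 'dict.json')

    # write dictionary to json file
    with open(json_path, 'w', encoding='utf-8') as json_file:
        json.dump(dict_, json_file, indent=4)

    # read it back to check the round trip
    with open(json_path, 'r', encoding='utf-8') as json_file:
        restored = json.load(json_file)

print(restored == dict_)
```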

<a name="read"></a>
## read

[Return to Contents](#contents)

In [None]:
# read list from txt
with open('list.txt', 'r') as txt_file:
    content = txt_file.read().splitlines()

# read dictionary from csv
# each row is read as [key, value] and every value comes back as a string
import csv
with open('dict.csv', 'r', newline='') as csv_file:
    reader = csv.reader(csv_file)
    mydict = {rows[0]: rows[1] for rows in reader}

# read dictionary from yaml
import yaml
with open('dict.yaml', 'r') as yaml_file:
    mydict = yaml.safe_load(yaml_file)

# read json
# input_path is a placeholder for the path of the file to read
with open(input_path, 'r') as f:
    json_file = json.load(f)

# read an object previously saved with joblib.dump
# joblib.load takes the file path directly
from joblib import load
file = load(input_path)

In [None]:
# special characters
# special characters can be decoded incorrectly when the file encoding differs from the platform default, so pass the encoding explicitly
# each line is expected to have the form 'key ; value'
dizio_gen = {}
with open('list.txt', encoding='utf-8') as txt_file:
    for line in txt_file:
        split_line = line.strip().split(' ; ')
        dizio_gen[split_line[0]] = split_line[1]

# when the file is not utf-8 encoded, reading it could raise:   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0
# in that case open the txt file with the encoding it was saved with, for example ISO-8859-1
with open('list.txt', 'r', encoding='ISO-8859-1') as txt_file:
    content = txt_file.read().split('\n')
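
The decoding error can be reproduced with a short sketch: a file saved as ISO-8859-1 stores `à` as the single byte `0xe0`, which is not valid UTF-8 on its own. The file name and sample text below are illustrative, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

text = 'citt\u00e0'  # 'città', written with an escape so the example does not depend on the source encoding

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'latin.txt')

    # save the text with the ISO-8859-1 encoding
    with open(path, 'w', encoding='ISO-8859-1') as txt_file:
        txt_file.write(text)

    # reading it back as utf-8 fails
    try:
        with open(path, 'r', encoding='utf-8') as txt_file:
            txt_file.read()
        decode_failed = False
    except UnicodeDecodeError:
        decode_failed = True

    # reading it back with the encoding used to write it works
    with open(path, 'r', encoding='ISO-8859-1') as txt_file:
        content = txt_file.read()

print(decode_failed, content == text)
```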