<span>
<img src="http://www.sobigdata.eu/sites/default/files/logo-SoBigData-DEFINITIVO.png" width="180px" align="right"/>
</span>
<span>
<b>Author:</b> <a href="http://about.giuliorossetti.net">Giulio Rossetti</a><br/>
<b>Python version:</b>  3.6<br/>
<b>Last update:</b> 22/01/2018
</span>

<a id='top'></a>
# *Python Basics*

This notebook contains an overview of basic Python functionality that you might come across using Python for Social Science Research.

**Note:** this notebook is purposely not comprehensive; it only covers the basics you need to get started.

## Table of Contents

1. [Hello World!](#display)
2. [Variables](#variables)
3. [Numeric Operations](#numeric)
4. [Strings](#strings)
5. [Data Structures](#ds)
6. [Slicing](#slicing)
7. [Functions](#functions)
8. [Code Blocks](#blocks)
9. [Flow Control](#flow)
10. [Comprehensions](#comp)
11. [Exceptions](#exceptions)
12. [File Input/Output](#io)
13. [Importing Libraries](#libs)
14. [Zen of Python](#zen)

<a id='display'></a>
## 1. Hello World!  ([to top](#top))

Your first - one line - program: **"Hello World"**

In [1]:
print("Hello world!")

Hello world!


<a id='variables'></a>
## 2. Variables and Types  ([to top](#top))

In programming languages **variables** are named entities used to store information.

A **variable** can reference several types of contents, for instance:
- Basic numeric **types** in Python are ``int`` for integers and ``float`` for floating point numbers.
- Strings are represented by ``str``; in Python 3.x a string is a sequence of Unicode characters.


In [2]:
a = 1
b = 0.5
c = 'Giulio'

In [3]:
type(a), type(b), type(c) # check variable type

(int, float, str)

In [4]:
isinstance(a, float) # test variable type:  a is a float? --> true or false

False

In [5]:
int(2.5), str(2), float(3) # type conversion 

(2, '2', 3.0)
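
Type conversion also works from strings, which is handy when numbers arrive as text. A minimal sketch with made-up values (the variable names are illustrative, not part of the original notebook):

```python
raw_age = '33'              # a number stored as text (illustrative value)
age_as_int = int(raw_age)   # convert the string to an integer
print(age_as_int + 1)       # 34

print(int(2.9))             # 2: int() truncates, it does not round
print(round(2.9))           # 3: use round() to get the nearest integer
```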

In [6]:
print("Have a nice class,", c)

Have a nice class, Giulio


<a id='numeric'></a>
## 3. Numeric operations  ([to top](#top))

The Python interpreter can be used to perform simple math:

In [7]:
5+2 # addition (subtraction works alike)

7

In [8]:
5/2 # division

2.5

In [9]:
5%2 # modulo: remainder of the division

1

In [10]:
5//2 # integer division

2

In [11]:
5**2 # exponentiation

25
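
Integer division and modulo are two halves of the same operation. The sketch below, with illustrative values, checks that they recombine into the original number:

```python
# Illustrative values (not from the original notebook)
dividend, divisor = 17, 5

quotient = dividend // divisor    # integer division -> 3
remainder = dividend % divisor    # remainder -> 2

# The two results always recombine into the original number
print(quotient * divisor + remainder == dividend)  # True

# divmod returns both values at once as a tuple
print(divmod(dividend, divisor))  # (3, 2)
```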

<a id='strings'></a>
## 4. String basics  ([to top](#top))

Strings are used to describe basic text objects: as such they can be manipulated and transformed.

**Note:** strings are *immutable*: any transformation generates a modified *copy* of the original string.

In [12]:
s = "This is a string"

In [13]:
l = s.split(" ")  # split the string at each occurrence of the given character and return the pieces, in order, as a list
l

['This', 'is', 'a', 'string']

In [14]:
r = s.replace("a", "THE")  # replace every occurrence of the first argument with the second
r

'This is THE string'

In [15]:
s.lower()  # all letters in lowercase

'this is a string'

In [16]:
s.upper()  # all letters in uppercase

'THIS IS A STRING'
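
The immutability note can be verified directly: string methods return a new string and leave the original untouched. A small sketch (the name `shouted` is illustrative):

```python
s = "This is a string"
shouted = s.upper()   # returns a new string

print(shouted)  # THIS IS A STRING
print(s)        # This is a string (the original is unchanged)

# Methods return strings, so they can be chained
print(s.lower().replace(" ", "_"))  # this_is_a_string
```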

In [17]:
name = 'Giulio'
age = 33

'Hi, my name is {}: I am {}'.format(name, age)  # each {} is a placeholder that is replaced, in order,
                                                # by the values passed to format


'Hi, my name is Giulio: I am 33'

In [18]:
name = 'Giulio'
age = 33

'Hi, my name is %s: I am %d' % (name, age)  # same result as above, but here the placeholder depends on
                                            # the data type:  %s --> string   %d --> integer

'Hi, my name is Giulio: I am 33'
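
Since Python 3.6 the same result can also be obtained with formatted string literals (f-strings). This is a third option, shown here as a sketch next to the two styles above:

```python
name = 'Giulio'
age = 33

# f-string: expressions inside {} are evaluated in place
greeting = f'Hi, my name is {name}: I am {age}'
print(greeting)  # Hi, my name is Giulio: I am 33

# format specifiers work too, e.g. two decimal places
print(f'{age / 2:.2f}')  # 16.50
```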

<a id='ds'></a>
## 5. Data structures  ([to top](#top))

**Data structures** are conceptual models that are used to *organize* and *structure* data. 

There are 4 basic data structures: lists (list), tuples (tuple), dictionaries (dict), and sets (set)

### Lists

Lists are **mutable** collections of objects (their contents can change as the program executes).

In Python a list is defined using square brackets.

Order matters in a list: elements are appended at the end or inserted at the position you choose (never in an arbitrary order), and they can be repeated.
This makes lists suitable for representing vectors and matrices of numbers.

In [8]:
fruits = ['apple', 'banana', 'orange'] 
fruits

['apple', 'banana', 'orange']

In [9]:
fruits.append('pineapple') # lists are mutable: for instance, we can add elements with  .append
fruits

['apple', 'banana', 'orange', 'pineapple']

In [10]:
fruits.append('banana') # elements already in the list can be added again: the duplicate is kept
                        # (e.g., a vector may contain the same number twice)
fruits

['apple', 'banana', 'orange', 'pineapple', 'banana']
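
Besides appending, a list can be modified at any position, and lists can be nested to represent a matrix. A minimal sketch with illustrative values:

```python
numbers = [3, 1, 2]           # illustrative values

numbers.insert(0, 10)         # insert 10 at position 0
print(numbers)                # [10, 3, 1, 2]

numbers.remove(1)             # remove the first occurrence of the value 1
print(numbers)                # [10, 3, 2]

numbers[0] = 5                # replace an element in place
print(numbers, len(numbers))  # [5, 3, 2] 3

# A matrix as a list of rows (each row is itself a list)
matrix = [[1, 2], [3, 4]]
print(matrix[1][0])           # 3: row 1, column 0
```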

### Tuples

Tuples are **immutable** collections of objects.

In Python tuples are enclosed in parentheses<br/>
**Note**: you cannot add or remove elements from a tuple, but tuples are generally faster and consume less memory than lists

In [21]:
fruits = ('apple', 'banana', 'orange')
fruits

('apple', 'banana', 'orange')
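
What immutability means in practice can be sketched as follows. The example uses `try`/`except`, which is covered in the Exceptions section, only to show the error without stopping execution:

```python
fruits = ('apple', 'banana', 'orange')

print(fruits[0])  # apple: reading by index works as with lists

try:
    fruits[0] = 'kiwi'   # tuples do not support item assignment
except TypeError as error:
    print('Cannot modify a tuple:', error)

# To "change" a tuple, build a new one
more_fruits = fruits + ('kiwi',)
print(more_fruits)  # ('apple', 'banana', 'orange', 'kiwi')
```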

### Dictionaries

Dictionaries are **mutable** key-indexable objects.

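In Python dictionaries are built using curly brackets<br/>
**Note**: dictionaries store key-value pairs and are accessed by key, not by position. Insertion order is preserved as an implementation detail in CPython 3.6 and is guaranteed by the language from Python 3.7 onwards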

In [22]:
student = {'name': 'Albert', 
           'age': 21, 
           'department': 'Computer Science' }
print(student['name'], student['age'], student['department'])

Albert 21 Computer Science


In [23]:
student['income'] = 50  # add a new entry (key + value): no method is needed, just assign to a new key
del student['age']  # remove the entry with the given key
student

{'name': 'Albert', 'department': 'Computer Science', 'income': 50}

In [25]:
'name' in student, 'age' in student  # is the key in the dictionary? True or False

(True, False)

In [24]:
student.keys()  # view of the keys (wrap it in list() to get a list)

dict_keys(['name', 'department', 'income'])

In [26]:
student.values()  # view of the values

dict_values(['Albert', 'Computer Science', 50])

In [27]:
student.items()  # view of the (key, value) pairs

dict_items([('name', 'Albert'), ('department', 'Computer Science'), ('income', 50)])

In [None]:
''' These three methods are fundamental and will be reused often, especially the third one:  .items()
It exposes the content of the dictionary as (key, value) tuples, which is the form needed
to process keys and values together (e.g., in a loop).
'''
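
A short sketch of how these methods are typically used, together with `.get`, which avoids an error when a key is missing. The dictionary mirrors the `student` example above:

```python
student = {'name': 'Albert', 'department': 'Computer Science', 'income': 50}

# .get returns a default instead of raising KeyError for a missing key
print(student.get('age', 'unknown'))  # unknown

# views can be turned into plain lists when needed
print(list(student.keys()))  # ['name', 'department', 'income']

# .items() yields (key, value) tuples, ideal for loops
for key, value in student.items():
    print(key, '->', value)
```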

### Sets

A set is like a list, but it is unordered and can only hold unique values: duplicate elements are not allowed.

In [16]:
fruits1 = {'apple', 'banana', 'orange'}
fruits2 = set(['kiwi', 'banana', 'melon'])

fruits1, fruits2

({'apple', 'banana', 'orange'}, {'banana', 'kiwi', 'melon'})

In [17]:
fruits1.add('apple')  # add a single element to the set
fruits2.update(['apple', 'banana'])  # add several elements to the set at once
fruits1, fruits2
# sets are unordered: as the output shows, elements are not kept in insertion order.
# moreover, adding elements that are already present does not duplicate them: the set is left unchanged.

({'apple', 'banana', 'orange'}, {'apple', 'banana', 'kiwi', 'melon'})

Many operations can be efficiently performed using sets.
This is why they are used in place of lists: performing the same operations with lists would take much more code.

In [18]:
it = fruits1 & fruits2 # intersection
it

{'apple', 'banana'}

In [19]:
un = fruits1 | fruits2 # union
un

{'apple', 'banana', 'kiwi', 'melon', 'orange'}

In [20]:
diff = fruits1 - fruits2 # difference
diff

{'orange'}
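
To see why sets are convenient, here is the same intersection computed with lists and with sets. The data and variable names are illustrative:

```python
answers_a = ['apple', 'banana', 'orange', 'banana']  # illustrative data
answers_b = ['kiwi', 'banana', 'melon']

# With lists: an explicit loop and a duplicate check
common_with_lists = []
for item in answers_a:
    if item in answers_b and item not in common_with_lists:
        common_with_lists.append(item)
print(common_with_lists)  # ['banana']

# With sets: one expression
common_with_sets = set(answers_a) & set(answers_b)
print(common_with_sets)  # {'banana'}

# Converting to a set is also a quick way to count unique values
print(len(set(answers_a)))  # 3
```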

### Combination

Data structures can be combined and nested

In [10]:
combo = ('apple', 'orange')
mix = {'fruit' : ['banana', 'pear'], combo: ('melon', 'kiwi')}  # dictionary containing a list and tuples
mix

{'fruit': ['banana', 'pear'], ('apple', 'orange'): ('melon', 'kiwi')}
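
Nested structures are accessed by chaining the lookups, one level at a time. The sketch below reuses `mix` and also shows that a list, being mutable, is not accepted as a dictionary key:

```python
combo = ('apple', 'orange')
mix = {'fruit': ['banana', 'pear'], combo: ('melon', 'kiwi')}

print(mix['fruit'][1])  # pear: second element of the list stored under 'fruit'
print(mix[combo][0])    # melon: first element of the tuple stored under the tuple key

# Lists are mutable and therefore cannot be used as dictionary keys
try:
    mix[['apple', 'orange']] = 'not allowed'
except TypeError as error:
    print('Invalid key:', error)
```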

<a id='slicing'></a>
## 6. Slicing  ([to top](#top))

If an object is ordered (such as a list or tuple) you can select elements by index

In [12]:
fruits = ['apple', 'banana', 'orange', 'pineapple', 'pear']

In [13]:
first_fruit = fruits[0]
first_fruit

'apple'

In [14]:
last_fruit = fruits[-1]
last_fruit

'pear'

In [15]:
subset = fruits[1:4] # all elements from index 1 to 3 --> by convention: left index included, right excluded
subset

['banana', 'orange', 'pineapple']

In [38]:
subset = fruits[:4]  # all elements from index 0 to 3
subset

['apple', 'banana', 'orange', 'pineapple']

**Note**: slicing also works on strings, because strings are ordered collections of characters too!

In [39]:
s = "Hello world!"
s[6:]

'world!'
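
Slices also accept negative indexes and an optional step, following the pattern `[start:stop:step]`. A brief sketch on the same list of fruits:

```python
fruits = ['apple', 'banana', 'orange', 'pineapple', 'pear']

print(fruits[-2:])   # ['pineapple', 'pear']: the last two elements
print(fruits[::2])   # ['apple', 'orange', 'pear']: every second element
print(fruits[::-1])  # reversed copy of the list

# Slicing returns a new object: the original list is unchanged
print(len(fruits))   # 5
```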

<a id='functions'></a>
## 7. Functions  ([to top](#top))

A function is a **named** and **reusable** snippet of code.

A function takes *arguments* as input and defines logic to process these inputs (and possibly returns something).

In [40]:
def multiply(a, b):
    return a*b

The action expressed by **multiply** will only execute once you call it:

In [41]:
multiply(2, 3)

6

Functions can also define default values for their arguments:

In [42]:
def multiply(a, b=5):
    return a*b

multiply(2)  # when calling it, you only need to pass the argument without a default (here, a)

10

In [43]:
multiply(2, b=3)  # to override the default value of b, pass it explicitly

6
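
A function may return several values at once, or nothing at all. The sketch below uses two hypothetical helpers, `describe` and `greet`, to show both cases:

```python
def describe(values):
    """Return the minimum and maximum of a non-empty list (hypothetical helper)."""
    return min(values), max(values)    # two values are returned as a tuple


def greet(name):
    print('Hello', name)               # no return statement


lowest, highest = describe([4, 1, 7])  # tuple unpacking
print(lowest, highest)                 # 1 7

result = greet('Irene')                # prints: Hello Irene
print(result)                          # None: a function without return gives back None
```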

<a id='blocks'></a>
## 8. Code Blocks  ([to top](#top))

In Python blocks of code can be nested.<br/>
Variables in the outer blocks can be seen by the inner ones; the opposite does not apply.

Indentation is required by Python to define blocks of code. By convention, each indentation level is 4 spaces (what most editors insert when you press Tab).<br/>
**Note**: nested functions have their own local *scope* (notice variable *a*):

In [35]:
def layer_1():
    a = 'Layer 1'
    print(a)
    
    def layer_2():
        a = 'Layer 2'
        print(a)
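
The visibility rule for nested functions can be tried out with a small sketch. The names `outer`, `inner` and `local_only` are illustrative:

```python
def outer():
    message = 'outer value'

    def inner():
        # inner can read variables defined in the enclosing function
        print('inner sees:', message)
        local_only = 'inner value'   # exists only inside inner
        return local_only

    returned = inner()
    print('outer received:', returned)
    # print(local_only) here would raise NameError: outer cannot see it
    return returned


outer_result = outer()
```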

In [31]:
a = 3
if a > 0:
    print('b')
    if a == 3: 
        print('c')
else:
    print('d')

b
c


<a id='flow'></a>
## 9. Flow Control  ([to top](#top))

Python, like most programming languages, defines primitives for flow control: **conditionals** and **cycles** (loops)

### Conditional: If-Elif-Else

A conditional statement allows you to check **logic** conditions. 

Conditions can be:
- mathematical (==, !=, <, >, <=, >=)
- logical (and, or, not, in, is)

In [47]:
age = 25
if age == 20:
    print('A')
elif age < 20:
    print('B')
elif age >= 25:
    print('C')
else:
    print('D')

C


In [37]:
sex = "M"
age = 20
if age in [20, 21, 25] and sex == 'M':
    print("Hello")
    
if age is not None:
    print(age + 6)

Hello
26


#### None: a special "value"
``None`` is a value commonly used to signify 'empty', or 'no value here'.

A variable can be tested to be ``None`` with the keyword ``is``.

It is used, for instance, to handle missing values in datasets.

In [50]:
name = "Irene"
if name is not None:
    print('Hi {}'.format(name))

Hi Irene
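
As a sketch of the missing-values use case, the made-up list below marks unknown ages with ``None`` and skips them before computing an average (the loop syntax is explained in the next subsection):

```python
ages = [25, None, 31, None, 40]   # illustrative data: None marks a missing value

# keep only the known values
known_ages = []
for age in ages:
    if age is not None:
        known_ages.append(age)

print(known_ages)                         # [25, 31, 40]
print(len(ages) - len(known_ages))        # 2 missing values
print(sum(known_ages) / len(known_ages))  # 32.0
```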


### Cycles: For loops

``for`` cycles are used to iterate over a data structure (e.g., a list). <br/>
Since the size of a given data structure is *finite*, a ``for`` cycle *necessarily* ends.

In [51]:
for num in range(0, 6, 2): # range(from, to, step)
    print(num)

0
2
4


In [39]:
array_1 = list(range(0, 6, 2))  # range(0, 6, 2) -->  integers from 0 (included) to 6 (excluded), in steps of 2

for num in array_1:
    print(num)

0
2
4


In [45]:
list_fruit = ['Apple', 'Banana', 'Orange']

for i, fruit in enumerate(list_fruit):
    print(i, fruit, list_fruit[i])
    
# enumerate(list) takes a list as input and yields pairs of elements: (index, list element).
# e.g.  [(0, 'Apple'), (1, 'Banana'), (2, 'Orange')]
# the pairs are tuples (immutable). See also the next code block, which unpacks tuples in the same way.

0 Apple Apple
1 Banana Banana
2 Orange Orange


Looping over a list of tuples

In [53]:
tuple_in_list = [(1, 2), (3, 4)]
for a, b in tuple_in_list:
    print(a + b)

3
7


Looping over a dictionary

In [54]:
dictionary = {'one' : 1, 'two' : 2, 'three' : 3}

for k, v in dictionary.items():
    print(k, v)

# as you can see, to loop over the keys and values of a dictionary together, use the  .items()  method:
# it exposes the content as a sequence of tuples:  (key, value), (key, value), ...

one 1
two 2
three 3


### Cycles: While loops

``while`` cycles allow you to loop as long as a specific condition (called *guard*) is satisfied. <br/>

Unlike ``for`` cycles, a ``while`` cycle can describe an infinite loop: it never ends if the guard never becomes false.

In [55]:
count = 0
while count < 4:
    print(count)
    count += 1

0
1
2
3
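
A common pattern is a loop whose guard is always true and that is left explicitly with ``break``. The threshold in this sketch is arbitrary:

```python
count = 0
while True:          # the guard is always true: potentially infinite
    count += 1
    if count * count > 50:
        break        # leave the loop explicitly
print(count)         # 8
```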


<a id='comp'></a>
## 10. Comprehensions  ([to top](#top))

Comprehension makes it easier to generate a list or dictionary using a loop.

### List comprehension

In [56]:
lst = [x + 2 for x in range(0,6)]
lst

[2, 3, 4, 5, 6, 7]

It is an alternative to:

In [57]:
lst = []
for x in range(0,6):
    lst.append(x + 2)
lst

[2, 3, 4, 5, 6, 7]

### Dict comprehension

In [58]:
ndt = {'num_{}'.format(x) : x + 5 for x in range(0,6)}
ndt

{'num_0': 5, 'num_1': 6, 'num_2': 7, 'num_3': 8, 'num_4': 9, 'num_5': 10}

It is an alternative to:

In [59]:
ndt = {}
for x in range(0,6):
    ndt['num_{}'.format(x)] = x + 5
ndt

{'num_0': 5, 'num_1': 6, 'num_2': 7, 'num_3': 8, 'num_4': 9, 'num_5': 10}

### Comprehension with conditions

In [60]:
lst = [x for x in range(0,6) if x%2==0]
lst

[0, 2, 4]
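
Conditions work in dict comprehensions too, and a conditional expression can be used to transform every element instead of filtering. A sketch with illustrative data:

```python
words = ['apple', 'kiwi', 'banana', 'fig']   # illustrative data

# dict comprehension with a condition: length of the words longer than 3 letters
lengths = {word: len(word) for word in words if len(word) > 3}
print(lengths)  # {'apple': 5, 'kiwi': 4, 'banana': 6}

# conditional expression inside a list comprehension: label every number
labels = ['even' if x % 2 == 0 else 'odd' for x in range(0, 4)]
print(labels)   # ['even', 'odd', 'even', 'odd']
```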

<a id='exceptions'></a>
## 11. Exceptions  ([to top](#top))

In Python when something *wrong* happens an exception is raised:

In [46]:
num_list = [1, 2, 3]
num_list.remove(4)

ValueError: list.remove(x): x not in list

You can catch exceptions using try and except:

In [47]:
try:
    num_list.remove(4)
except:
    print('ERROR!')

ERROR!


It is usually best practice to specify the exception type to catch:

In [48]:
try:
    num_list.remove(4)
except ValueError as e:
    print('Number not in the list')
except Exception as e:
    print('Generic error')
finally:
    print('Done')

Number not in the list
Done
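
Exception handling is often wrapped in a small function so that callers get a simple result. `safe_remove` below is a hypothetical helper, shown as one possible way to do it:

```python
def safe_remove(values, item):
    """Remove item from values if present (hypothetical helper)."""
    try:
        values.remove(item)
    except ValueError as error:
        # the exception object carries the original error message
        print('Could not remove {}: {}'.format(item, error))
        return False
    return True


num_list = [1, 2, 3]
print(safe_remove(num_list, 2))  # True
print(safe_remove(num_list, 4))  # prints the error message, then False
print(num_list)                  # [1, 3]
```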


<a id='io'></a>
## 12. File Input/Output  ([to top](#top))

You can open a file with different file modes: <br/>
w -> write only <br/>
r -> read only <br/>
w+ -> read and write + completely overwrite file <br/>
a+ -> read and write + append at the bottom <br/>

In [None]:
with open('new_file.txt', 'w') as file:
    file.write('Content of new file. \nHi there!')

# open('...', 'w')  opens the text file and completely overwrites its content

In [None]:
with open('new_file.txt', 'r') as file:
    file_content = file.read()
    
file_content

# read-only access to the file

In [None]:
print(file_content)

In [None]:
with open('new_file.txt', 'a+') as file:
    file.write('\n' + 'New line')

# open the file and add new text at the end (the existing text is not deleted)

In [None]:
with open('new_file.txt', 'r') as file:
    for line in file:
        print(line)

# print every line of the text file (lines keep their trailing newline, so print adds an empty line in between)
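
When reading line by line, the trailing newline is usually stripped. The sketch below writes to a temporary folder so that no existing file is touched; the file name `example.txt` is hypothetical:

```python
import os
import tempfile

# write to a temporary folder so that no existing file is touched
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'example.txt')   # hypothetical file name

with open(path, 'w') as file:
    file.write('first line\nsecond line\n')

with open(path, 'r') as file:
    # strip() removes the trailing newline of each line
    lines = [line.strip() for line in file]

print(lines)  # ['first line', 'second line']
```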

<a id='libs'></a>
## 13. Importing Libraries  ([to top](#top))

In Python some functionality is left outside the core language: in order to access it, it is necessary to ``import`` dedicated packages.

A ``package`` is a coherent collection of ``functions``. 

There exist hundreds of thousands of different packages: if you need something, it is likely that someone already coded and packaged it!

In [41]:
import math
math.sin(1)

0.8414709848078965

In [42]:
import math as mt
mt.sin(1)

0.8414709848078965

In [43]:
from math import sin
sin(1)

0.8414709848078965
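
Several names can be imported in one statement, and the standard library offers many modules beyond ``math``. A short sketch using ``statistics``, which ships with Python:

```python
from math import sqrt, pi        # import several names at once
from statistics import mean      # statistics is part of the standard library

print(sqrt(16))                  # 4.0
print(round(pi, 2))              # 3.14
print(mean([1, 2, 3, 4]))        # 2.5
```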

To install a new package you can use a command line tool: ``pip``.

Just open a terminal (from Anaconda) and type

    pip install <package name>
  

<a id='zen'></a>
## 14. Zen of Python ([to top](#top))

When you have doubts just remember the "Zen of Python"

In [40]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
