# Basi di Dati Mod. 2 - Python crash course

### Stefano Calzavara, Università Ca' Foscari Venezia
### Part of this material was adapted from the Web Intelligence course by Prof. Claudio Lucchese



## What is Python?

Python is a powerful scripting language:
 - object-oriented language with support for functional programming
 - very easy syntax, similar to pseudo-code, and tons of libraries
 - dynamically typed: be careful!

## How can I run my Python code?

Python is an **interpreted** language. It can be run in two modes:
 - interactive mode: launch with no argument and write your code in the prompt
 - non-interactive mode: launch passing a .py file and let it run

In these lectures we use **Jupyter notebooks** for teaching purposes:
 - multiple ways to install Jupyter, pick one!
 - Anaconda: https://www.anaconda.com/distribution/
 - Colab: https://colab.research.google.com
 - we use Python 3, not Python 2.7: there are significant differences between the two

Jupyter notebooks allow one to:
 - write complex documents interleaving text with programs
 - run the programs through an interactive interpreter accessed via a web browser
 
Great for teaching and fast prototyping!
 
Additional tools:
 - PyCharm by JetBrains https://www.jetbrains.com/pycharm/
 - any other text editor or IDE, really

## Your best friends in learning Python

1. The Python website:
    - plenty of links to books and tutorials!
        - e.g., https://docs.python.org/3/tutorial/
0. The official Python documentation:
    - https://docs.python.org/3/library/index.html
0. Google & StackOverflow:
    - try googling for `TypeError: can't multiply sequence by non-int of type 'float'`
0. Python Tutor
    - visualizes the execution of python code
    - http://pythontutor.com/

## Who uses Python

 - The popular *YouTube* video sharing service is largely written in Python
 - The *Dropbox* storage service codes both its server and desktop client primarily in Python
 - The widespread *BitTorrent* peer-to-peer file sharing system began its life as a Python program
 - *Netflix* and *Yelp* have both documented the role of Python in their software infrastructures
 - *JPMorgan Chase, UBS, Getco, and Citadel* apply Python to financial market forecasting
 - *NASA, Los Alamos, Fermilab, JPL*, and others use Python for scientific programming tasks
 - In "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (1998), Google's founders describe crawlers written in Python
 - All the users of *Flask*, the popular web development framework which we will study

# Python types

Python provides the following types:

| Object type | Examples |
|:-:|:-:|
| Integers | `1234`, `5678`, ... |
| Floats | `3.1415`, `7.0`, ...|
| Strings | `'spam'`, `"Bob's"`, ... |
| Lists   | `[1, [2, 'three'], 4.5]`, `list(range(10))`, ... |
| Dictionaries | `{'food': 'spam', 'taste': 'yum'}`, `dict(hours=10)`, ... |
| Tuples |  `(1, 'spam', 4, 'U')`, `tuple('spam')`, ...|
| Files |   `open('eggs.txt')`, `open(r'C:\ham.bin', 'wb')`, ... |
| Sets  | `set('abc')`, `{'a', 'b', 'c'}`, ... |
| Other core types | Booleans (`True`, `False`), `None`, ... |

 - The type of a variable is determined at run time by the value assigned to it: no need for type declarations
 - You can use the function `type` to ask Python the type of an expression
 - The type determines the set of valid operators

In [1]:
a = 2.0
print (type(a))
a = "Hello!"
print (type(a))
print (type(3.1 * 5))

<class 'float'>
<class 'str'>
<class 'float'>


# Numbers

Compare floating-point division (`/`) with integer division (`//`): `/` always returns a float, while `//` rounds down and returns an integer when both operands are integers.

In [2]:
print ("What is the output of 11/2:", 11/2)
print ("What is the output of 11%2:", 11%2)
print ("What is the output of  2**10:", 2**10)

What is the output of 11/2: 5.5
What is the output of 11%2: 1
What is the output of  2**10: 1024


In [3]:
print ("What is the output of 11//2:", 11//2)

What is the output of 11//2: 5
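A minimal sketch of how the result type depends on the division operator and on the operands. The variable names are only illustrative.

```python
# True division always returns a float, even when the result is a whole number
exact = 10 / 5
print(exact, type(exact))        # 2.0 <class 'float'>

# Floor division returns an int when both operands are ints...
floored = 11 // 2
print(floored, type(floored))    # 5 <class 'int'>

# ...but a float as soon as one operand is a float
mixed = 11.0 // 2
print(mixed, type(mixed))        # 5.0 <class 'float'>

# Floor division rounds towards minus infinity, not towards zero
negative = -11 // 2
print(negative)                  # -6

# divmod returns quotient and remainder in one call
quotient, remainder = divmod(11, 2)
print(quotient, remainder)       # 5 1
```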


# Strings

Strings support concatenation (`+`), comparison, and repetition (`*`), but not division. Check the `*` operator.

In [4]:
print ("What is the output of 'a'+'b':",  'a'+'b'   )
print ("What is the output of 'a'=='b':", 'a'=='b'  )
print ("What is the output of 'a'<='b':", 'a'<='b'  )
print ("What is the output of 'a'<='A':", 'a'<='A'  )

What is the output of 'a'+'b': ab
What is the output of 'a'=='b': False
What is the output of 'a'<='b': True
What is the output of 'a'<='A': False


In [5]:
print ("What is the output of 'a'*5:",    'a'*5 )
print ("What is the output of 'a'/5:", 'a'/5 )

What is the output of 'a'*5: aaaaa


TypeError: unsupported operand type(s) for /: 'str' and 'int'

In [None]:
print ("What is the output of int('10')/5:", int('10')/5)

In [None]:
print ( str(9) * 4 )

# Conditional Statements

Indentation is used to delimit the body of `if`-`else` and of other constructs such as `for`, `while`, and function definitions.

Check if a variable x is within the interval $[0,10]$.

In [None]:
x = 33
if x >= 0 and x <= 10:
    print ("x is in the interval [0,10]")
else:
    print ("x is not in the interval [0,10]")

In [None]:
x = 33
# Chained comparison: a compact form equivalent to 0 <= x and x <= 10
if 0 <= x <= 10:
    print ("x is in the interval [0,10]")
else:
    print ("x is not in the interval [0,10]")

# While Loops

Nothing new: `while`, `break`, `continue`. Don't forget about good programming practice: use `break` and `continue` sparingly!


In [None]:
i = 0
while i < 10:
    if i == 8: break
    i += 1
    
    if i == 5: continue
    print ("Completed Iteration N.", i)

print ("I'm out of the loop")

# For Loops

A `range` creates a sequence of numbers, given start, end, and step parameters. The start is included, the end is excluded.

In [None]:
for i in range(5):
    print ("This is Iteration N.", i)

In [None]:
for i in range(0,10,2):
    print ("This is Iteration N.", i)

In [10]:
for i in range(10,0,-2):
    print ("This is Iteration N.", i)

This is Iteration N. 10
This is Iteration N. 8
This is Iteration N. 6
This is Iteration N. 4
This is Iteration N. 2


In [7]:
print (range(5))

range(0, 5)


This is called an **iterable**! Its numbers are not stored in a list: they are produced on demand while you iterate through it ...

In [6]:
print (type(range(5)))

<class 'range'>
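A short sketch of what can be done with a `range` object besides looping over it. The name `numbers` is only illustrative.

```python
numbers = range(0, 10, 2)

# Printing a range shows its parameters, not the numbers it produces
print(numbers)            # range(0, 10, 2)

# list() materializes the whole sequence
as_list = list(numbers)
print(as_list)            # [0, 2, 4, 6, 8]

# A range also supports len(), indexing and membership tests
print(len(numbers))       # 5
print(numbers[1])         # 2
print(6 in numbers)       # True
print(10 in numbers)      # False: the end is excluded
```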


# Lists

Lists are very frequently used. They are **heterogeneous** and **mutable**.

In [None]:
for i in [0,1,2,3,4]:
    print ("This is Iteration N.", i)

In [None]:
my_list = [1,2,3] + [4,5]
print (my_list)

In [None]:
my_list = [1,2,3]
my_list += [4,5]
print (my_list)

In [None]:
my_list = [1,2,3] + ["donald duck", 42.0]
print (my_list)

In [None]:
my_list = [1,2,3] + ["donald duck", ["this", "is", 1, "nested", "list"] ]
print (my_list)

In [None]:
print ( len([1,2,3,4,5]) )
print ( len([1,2,[3,4,5]]) )

In [None]:
my_list = [1,2,3,4,5]
print ( my_list[0] )
print ( my_list[4] )
print ( my_list[5] ) # this raises an IndexError

In [None]:
my_list = [1,2,3,4,5]
print ( my_list[-1] )
print ( my_list[-2] )
print ( my_list[-100] ) # this raises an IndexError

In [None]:
my_list = [1,2,3,4,5,4,3,2,1]

print ( 3 in my_list )

print ( my_list.count(3) )

print ( my_list.index(1) )
print ( my_list.index(33) ) # this raises an error

# Slicing

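Slicing gives access to a sublist through the syntax `my_list[start:end:step]`: the start index is included, the end index is excluded, and each of the three parts can be omitted.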

In [None]:
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']

print ( my_list[1:3] ) 

In [None]:
print ( my_list[3:-1] ) 

In [None]:
print ( my_list[3:] )

In [None]:
print ( my_list[0:7:2] )

In [None]:
print ( my_list[0::2] )

In [None]:
print ( my_list[::2] )

In [None]:
print ( my_list[::-1] )
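One way to read a slice, sketched below: for non-negative indices within the list, `my_list[start:end:step]` selects the positions produced by `range(start, end, step)`. The names `colors`, `by_slice` and `by_loop` are illustrative.

```python
colors = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']

start, end, step = 1, 6, 2
by_slice = colors[start:end:step]

# The same selection written as an explicit loop over the indices
by_loop = []
for i in range(start, end, step):
    by_loop.append(colors[i])

print(by_slice)               # ['orange', 'green', 'indigo']
print(by_slice == by_loop)    # True

# Out-of-range bounds are clipped instead of raising an IndexError
clipped = colors[5:100]
print(clipped)                # ['indigo', 'violet']

# A slice is a new list: changing it does not affect the original
first_two = colors[:2]
first_two[0] = 'crimson'
print(colors[0])              # red
```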

# Lists are mutable

Elements of a list can be replaced. Sublists can be replaced with other sublists.

In [None]:
# original list
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
print (my_list)

# modify one element
my_list[-2] = 'ultramarine'

# the new list
print (my_list)

In [None]:
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
my_list[4] = ['light blue', 'blue', 'dark blue']
print (my_list)

In [None]:
# here we replace one slice with another slice
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
my_list[4:5] = ['light blue', 'dark blue', 'darker blue']
print (my_list)

In [None]:
# A special case of replacement when start and end index are the same
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
my_list[5:5] = ['light blue', 'dark blue', 'darker blue'] 
print (my_list)

In [None]:
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
my_list[2] = []
print (my_list)

In [None]:
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
my_list[2:3] = []
print (my_list)

In [None]:
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']

print ("Is orange in the rainbow?", 'orange' in my_list )

print ("Is brown in the rainbow?", 'brown' in my_list )

print ("Is it true that cobalt is not in the rainbow?", 'cobalt' not in my_list )

# Tuples

Tuples are largely similar to lists, but they are **immutable**.

In [None]:
my_tuple = (1,2,3,4, "five")

print (my_tuple)
print (my_tuple[2])

In [None]:
my_tuple[2] = 3 # this raises a TypeError: tuples are immutable

In [None]:
my_tuple = (1,2,3) 
my_tuple += (4, "five")

print (my_tuple)
print (my_tuple[2])

# Unpacking

Multiple assignment in a single statement, typically used with functions returning multiple values.

In [None]:
my_tuple = (1,2,3)
a,b,c = my_tuple
print (a,b,c)

In [None]:
my_list = [1,2,3]
a,b,c = my_list
print (a,b,c)
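A minimal sketch of unpacking the result of a function that returns several values. The function `min_and_max` is a hypothetical helper written for this example.

```python
def min_and_max(values):
    # The two comma-separated results are packed into a single tuple
    return min(values), max(values)

result = min_and_max([4, 8, 15, 16, 23, 42])
print(result, type(result))    # (4, 42) <class 'tuple'>

# Unpacking gives a name to each component of the returned tuple
lowest, highest = min_and_max([4, 8, 15, 16, 23, 42])
print(lowest, highest)         # 4 42

# The same mechanism swaps two variables without a temporary one
a, b = 1, 2
a, b = b, a
print(a, b)                    # 2 1
```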

# Sorting

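In-place vs. returning a new list: `my_list.sort()` reorders the list itself, while `sorted(my_list)` leaves it untouched and returns a new sorted list.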

In [None]:
my_list = [2,3,1]

my_list.sort()

print (my_list)

In [None]:
my_list = [2,3,1]

new_list = sorted( my_list )

print (my_list)
print (new_list)
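A common pitfall with the two approaches, sketched here: `sort()` returns `None`, so its result should not be assigned. The example also shows the `reverse` parameter accepted by both.

```python
my_list = [2, 3, 1]

# sort() works in place and returns None
result = my_list.sort()
print(result)       # None
print(my_list)      # [1, 2, 3]

# Both sort() and sorted() accept reverse=True
descending = sorted(my_list, reverse=True)
print(descending)   # [3, 2, 1]

# sorted() accepts any iterable and always returns a list
letters = sorted("python")
print(letters)      # ['h', 'n', 'o', 'p', 't', 'y']
```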

# Be careful!

Assignment does not copy an object: it gives a new name to the same object. Check in Python Tutor when you are in doubt! http://pythontutor.com/

In [None]:
a = 11
b = a
a = 22
print (a,b)

In [None]:
a = [11]
b = a
a[0] = 22
print (a,b)

In [None]:
a = b = [1,2]
c = d = [1,2]
a += [3]
c = c + [3]
print(b)
print(d)

In [None]:
my_list = [1,2,3]
new_list = my_list
my_list += [77]
print ( new_list + my_list)

In [None]:
my_list = [1,2,3] * 2
print (my_list)

In [None]:
my_list = [ [1,2,3] ] * 2
print (my_list)

In [None]:
my_list[0] += [4]
print ( my_list )

In [None]:
my_tuple = (1,2,3)
new_tuple = my_tuple
my_tuple += tuple([77])
print ( new_tuple + my_tuple)

In [None]:
# if you want to actually copy a list
a = [11]
b = a.copy()
a[0] = 22
print (a,b)

In [None]:
a = [11]
b = list(a)
a[0] = 22
print (a,b)

In [None]:
a = [11]
b = a[:]
a[0] = 22
print (a,b)
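To tell an alias from a copy, the `is` operator can be used, as in the sketch below. It also shows that the three copying techniques above are shallow: nested lists are still shared unless `copy.deepcopy` from the standard library is used.

```python
import copy

a = [11]
b = a            # alias: a second name for the same list
c = a.copy()     # copy: a different list with the same content

print(a is b)    # True
print(a is c)    # False
print(a == c)    # True

# A shallow copy duplicates only the outer list
nested = [[1, 2], [3, 4]]
shallow = nested.copy()
deep = copy.deepcopy(nested)

nested[0].append(99)
print(shallow[0])   # [1, 2, 99]: the inner list is shared
print(deep[0])      # [1, 2]: the deep copy is fully independent
```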

# Iterating through lists

Or through multiple lists.

In [None]:
my_list = [2,3,1]
for x in my_list:
    print (x)

In [None]:
my_list = [2,3,1]
for i,x in enumerate(my_list):
    print (i,x)

In [None]:
my_list = [2,3,1]
for z in enumerate(my_list):
    print (z, type(z))

In [None]:
A = [2,3,1]
B = ["two", "three", "one"]
for a,b in zip(A,B):
    print (a,b)
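Two details of `zip` and `enumerate` worth knowing, as a small sketch: `zip` stops at the shortest input, and `enumerate` accepts a `start` value for the counter.

```python
A = [2, 3, 1]
B = ["two", "three"]

# zip silently stops when the shortest list is exhausted
pairs = list(zip(A, B))
print(pairs)        # [(2, 'two'), (3, 'three')]

# enumerate can start counting from a value other than 0
numbered = list(enumerate(B, start=1))
print(numbered)     # [(1, 'two'), (2, 'three')]
```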

# More about strings

Strings are like lists of characters, but they are **immutable**.

In [None]:
msg = "I like programming with python!"

In [None]:
print (msg[2])

In [None]:
print (msg[2:6])

In [None]:
msg[3] = "x" # this raises a TypeError: strings are immutable

In [None]:
for c in msg:
    print (c)

In [None]:
print (msg.split())

In [None]:
print (msg.split("i"))

In [None]:
# Remove leading and trailing whitespace

my_string = "     A Bit of Python \n"

print ( "---", my_string, "---" )
print ( "---", my_string.strip(), "---" )

In [None]:
# Remove leading and trailing characters of choice

my_string = "###!#!#!##!#A Bit of Python?!!???##"

print ( "---", my_string.strip("#"), "---" )
print ( "---", my_string.strip("#?"), "---" )
print ( "---", my_string.strip("!?#"), "---" )
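Since strings are immutable, "changing" a string means building a new one. The sketch below shows three ways to do so: slicing with concatenation, `replace`, and `join`, which reverses a `split`.

```python
msg = "I like programming with python!"

# Build a new string instead of assigning to msg[3]
changed = msg[:3] + "x" + msg[4:]
print(changed)     # I lxke programming with python!
print(msg)         # I like programming with python!

# replace() also returns a new string
capitalized = msg.replace("python", "Python")
print(capitalized) # I like programming with Python!

# join() glues a list of strings back together
words = msg.split()
rebuilt = " ".join(words)
print(rebuilt == msg)   # True
```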

# Sets

The mathematical notion of a set: an unordered collection without duplicates.

In [None]:
my_set = set([1,2,3,4,5,4,3,2,1])

print (my_set)

In [None]:
A = set([1,2,3])
B = set([4,5])
C = A | B

print (C)

In [None]:
A = set([1,2,3])
B = set([3,4,5])
C = A & B

print (C)

In [None]:
A = set([1,2,3])
B = set([3,4,5])
C = A - B

print (C)

In [None]:
A = set([1,2,3])
B = set([3,4,5])

print (1 in A)
print (7 not in A)
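A few more set operations, as a sketch: symmetric difference, subset tests, and the common idiom of removing duplicates from a list. Results are passed through `sorted` because a set has no guaranteed printing order. The sample list `items` is made up for this example.

```python
A = {1, 2, 3}
B = {3, 4, 5}

# Symmetric difference: elements in exactly one of the two sets
sym_diff = sorted(A ^ B)
print(sym_diff)       # [1, 2, 4, 5]

# Subset tests
print({1, 2} <= A)    # True
print(A <= B)         # False

# Removing duplicates from a list
items = ["b", "a", "b", "c", "a"]
unique_items = sorted(set(items))
print(unique_items)   # ['a', 'b', 'c']
```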

# Dictionaries

A dictionary is a map from keys to values.

In [None]:
my_dict = {1:"Jan", 2:"Feb", 3:"Mar", 4:"Apr", 5:"May", 6:"Jun",
           7:"Jul", 8:"Aug", 9:"Sep", 10:"Oct", 11:"Nov", 12:"Dec"}

print (my_dict[0]) # this raises a KeyError: there is no key 0

In [None]:
my_dict = {1:"Jan", 2:"Feb", 3:"Mar", 4:"Apr", 5:"May", 6:"Jun",
           7:"Jul", 8:"Aug", 9:"Sep", 10:"Oct", 11:"Nov", 12:"Dec"}

print (my_dict[1])
print (my_dict[12])

In [None]:
my_dict = {1:"Jan", 2:"Feb", 3:"Mar", 4:"Apr", 5:"May", 6:"Jun",
           7:"Jul", 8:"Aug", 9:"Sep", 10:"Oct", 11:"Nov", 12:"Dec"}

my_dict[1] = 777
del my_dict[12]
print (my_dict)

In [None]:
my_dict[84] = "spam"
print (my_dict)

In [None]:
print (my_dict.keys())

In [None]:
print (my_dict.values())

In [None]:
my_dict = {1:"Jan", 2:"Feb", 3:"Mar", 4:"Apr", 5:"May", 6:"Jun",
           7:"Jul", 8:"Aug", 9:"Sep", 10:"Oct", 11:"Nov", 12:"Dec"}

for k in my_dict:
    print (k)

In [None]:
my_dict = {1:"Jan", 2:"Feb", 3:"Mar", 4:"Apr", 5:"May", 6:"Jun",
           7:"Jul", 8:"Aug", 9:"Sep", 10:"Oct", 11:"Nov", 12:"Dec"}

for k,v in my_dict.items():
    print (k,v)
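Accessing a missing key with `[]` raises an error, so it helps to test for membership or to use `get` with a default value. The sketch below also uses `get` to count occurrences; the sample list `letters` is made up for this example.

```python
my_dict = {1: "Jan", 2: "Feb", 3: "Mar"}

# Membership tests look at the keys
print(13 in my_dict)             # False

# get() returns None, or the given default, instead of raising a KeyError
print(my_dict.get(13))           # None
print(my_dict.get(13, "???"))    # ???

# Counting occurrences with a dictionary
letters = ["a", "b", "a", "c", "a"]
counts = {}
for letter in letters:
    counts[letter] = counts.get(letter, 0) + 1
print(counts)                    # {'a': 3, 'b': 1, 'c': 1}

# The key with the largest associated value
most_frequent = max(counts, key=counts.get)
print(most_frequent, counts[most_frequent])   # a 3
```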

# Comprehensions

Creating lists (or dictionaries) by iterating through other lists or, more generally, other iterables.

In [None]:
my_list = [x**2 for x in range(10)]
print (my_list)

In [None]:
my_list = [x**2 for x in range(10) if x%2==0]
print (my_list)

In [1]:
my_dict = {x:x**2 for x in range(10) if x%2==0}
print (my_dict)

{0: 0, 2: 4, 4: 16, 6: 36, 8: 64}
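A comprehension is a compact way of writing a loop that fills a collection. The sketch below spells out the equivalent loop and shows that the same syntax with curly braces builds a set.

```python
squares = [x**2 for x in range(10) if x % 2 == 0]

# The equivalent explicit loop
squares_loop = []
for x in range(10):
    if x % 2 == 0:
        squares_loop.append(x**2)

print(squares)                   # [0, 4, 16, 36, 64]
print(squares == squares_loop)   # True

# Set comprehension: duplicates are discarded
remainders = {x % 3 for x in range(10)}
print(sorted(remainders))        # [0, 1, 2]
```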


# Functions

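Do not write code outside functions! Be careful when passing lists as parameters: the function works on the very same list as the caller, so in-place changes are visible outside the function.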

You can return lists, tuples, sets, dictionaries, etc.

In [None]:
def square(x):
    return x**2

print ( square(3) )

In [None]:
def powers(x,n):
    return [ x**i for i in range(n) ]

print ( powers(2,5) )

In [None]:
copy_f = powers

print ( copy_f(2,5) )

In [None]:
powers_3 = lambda x:powers(x,3)

print (powers_3(5))

In [None]:
a = [1,-2,3,-4,5,-6]

print (sorted(a))

print (sorted(a, key=lambda x:abs(x)))

In [None]:
def add1(x):
    x+=1
    return x

y = 10
z = add1(y)
print( y,z )

In [None]:
def add1(x):
    for i in range(len(x)):
        x[i] = x[i]+1
    return x


y = [1,2,3,4,5]
z = add1(y)
print( y,z )

In [None]:
def myfun (a, b=3, c=77):
    print (a,b,c)
    
myfun(10)
myfun(10,20)
myfun(10, c=99)
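When a function should not modify the list it receives, one option is to build and return a new list. The sketch below contrasts the two behaviours; `add1_in_place` and `add1_copy` are hypothetical names chosen for this example.

```python
def add1_in_place(x):
    # Modifies the caller's list through the shared reference
    for i in range(len(x)):
        x[i] = x[i] + 1
    return x

def add1_copy(x):
    # Builds a new list: the argument is left untouched
    return [v + 1 for v in x]

y = [1, 2, 3]
z = add1_copy(y)
print(y, z)        # [1, 2, 3] [2, 3, 4]
print(z is y)      # False

w = add1_in_place(y)
print(y, w)        # [2, 3, 4] [2, 3, 4]
print(w is y)      # True: the very same list is returned
```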

# Classes and objects

In [None]:
import math

class Point:
    
    def __init__(self, x, y):    # three parameters: self is the object being initialized
        self.x = x
        self.y = y
        
    def distance(self, p):
        return math.sqrt((self.x - p.x)**2 + (self.y - p.y)**2)

In [None]:
p = Point(2.0, 6.0)
q = Point(2.0, 2.0)
print(p.distance(q))

In [None]:
print(p.x)

In [None]:
class Dog:

    tricks = []             # shared by all instances of the class

    def __init__(self, name):
        self.name = name

    def add_trick(self, trick):
        self.tricks.append(trick)
        
    def bark(self):
        return "Bau bau bau!"

d = Dog('Fido')
e = Dog('Buddy')
d.add_trick('roll over')
e.add_trick('play dead')
print(d.tricks)             # shared by all dogs

In [None]:
class Dog:

    def __init__(self, name):
        self.name = name
        self.tricks = []    # creates a new empty list for each dog

    def add_trick(self, trick):
        self.tricks.append(trick)
        
    def bark(self):
        return "Bau bau bau!"

d = Dog('Fido')
e = Dog('Buddy')
d.add_trick('roll over')
e.add_trick('play dead')
print(d.tricks)
print(e.tricks)

In [None]:
class DogWithPedigree(Dog):
    
    def __init__(self, name, parent):
        Dog.__init__(self, name)
        self.parent = parent
        
    def good_dog(self):
        return set(self.parent.tricks).issubset(self.tricks)
    
    def bark(self):
        return "BAU BAU BAU!!!"
    
f = DogWithPedigree('Bolt', d)
print(f.good_dog())
f.add_trick('roll over')
print(f.good_dog())

In [None]:
print(d.bark())
print(f.bark())
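As an alternative to naming the parent class explicitly, Python 3 offers `super()`. The following self-contained sketch redefines a minimal `Dog` and adds a hypothetical subclass `Puppy` (not part of the examples above) to show it, together with `isinstance`.

```python
class Dog:

    def __init__(self, name):
        self.name = name
        self.tricks = []

    def bark(self):
        return "Bau bau bau!"

class Puppy(Dog):    # hypothetical subclass, for illustration only

    def __init__(self, name, age_months):
        super().__init__(name)    # same effect as Dog.__init__(self, name)
        self.age_months = age_months

    def bark(self):
        # Reuse the parent's method and adapt its result
        return super().bark().lower()

p = Puppy('Rex', 3)
print(p.name, p.tricks)       # Rex []
print(p.bark())               # bau bau bau!

# An instance of the subclass is also an instance of the parent class
print(isinstance(p, Puppy))   # True
print(isinstance(p, Dog))     # True
```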

# Let's play with data!

I used Excel to convert the data file from http://tennis-data.co.uk/alldata.php into a CSV file.

In [2]:
!ls

2020.csv  Python.ipynb


In [3]:
!head 2020.csv

ATP,Location,Tournament,Date,Series,Court,Surface,Round,Best of,Winner,Loser,WRank,LRank,WPts,LPts,W1,L1,W2,L2,W3,L3,W4,L4,W5,L5,Wsets,Lsets,Comment,B365W,B365L,PSW,PSL,MaxW,MaxL,AvgW,AvgL
1,Doha,Qatar Exxon Mobil Open,06/01/2020,ATP250,Outdoor,Hard,1st Round,3,Bublik A.,Mannarino A.,55,43,919,1111,6,3,6,4,,,,,,,2,0,Completed,2,"1,72","2,21","1,74","2,25","1,8","2,11","1,72"
1,Doha,Qatar Exxon Mobil Open,06/01/2020,ATP250,Outdoor,Hard,1st Round,3,Moutet C.,Sandgren T.,81,68,638,803,7,6,6,4,,,,,,,2,0,Completed,"1,57","2,25","1,6","2,47","1,65","2,47","1,59","2,34"
1,Doha,Qatar Exxon Mobil Open,06/01/2020,ATP250,Outdoor,Hard,1st Round,3,Verdasco F.,Andujar P.,49,64,1025,867,6,4,6,3,,,,,,,2,0,Completed,"1,25","3,75","1,31","3,74","1,33","3,85","1,29","3,53"
1,Doha,Qatar Exxon Mobil Open,06/01/2020,ATP250,Outdoor,Hard,1st Round,3,Bedene A.,Ymer M.,58,76,905,681,3,6,6,4,6,3,,,,,2,1,Completed,"1,83","1,83","1,97","1,92",2,"2,07","1,87","1,92"
1,Doha,Qatar Exxon Mobil Open,06/01/2020,ATP250,O

In [1]:
import csv

def load_data(data_file):
    # read the rows with the csv module: a plain split(",") would break
    # quoted fields containing a comma, such as "1,72"
    with open(data_file, newline="") as f:
        rows = list(csv.reader(f))
    
    # extract header
    fields = rows[0]
    
    # put data into a "transposed" dictionary: one list of values per column
    data = { c:[] for c in fields }
    for values in rows[1:]:
        for c,v in zip(fields, values):
            data[c].append(v)
    
    return data

In [2]:
dataset = "2020.csv"

data = load_data(dataset)
print ( data.keys() )

dict_keys(['ATP', 'Location', 'Tournament', 'Date', 'Series', 'Court', 'Surface', 'Round', 'Best of', 'Winner', 'Loser', 'WRank', 'LRank', 'WPts', 'LPts', 'W1', 'L1', 'W2', 'L2', 'W3', 'L3', 'W4', 'L4', 'W5', 'L5', 'Wsets', 'Lsets', 'Comment', 'B365W', 'B365L', 'PSW', 'PSL', 'MaxW', 'MaxL', 'AvgW', 'AvgL'])


In [None]:
print ( data["Location"] )

## Answer the following questions:

 - What is the number of matches?
 - List the tournament names (without duplicates)
 - List the player names (without duplicates)
 - Find the player with most wins and the corresponding number of wins

## What is the number of matches?

## List the tournament names

## List the player names

## Find the player with most wins

## Moving to databases!

In [None]:
import sqlite3

con = sqlite3.connect("tennis.db")

con.execute('''CREATE TABLE IF NOT EXISTS event(atp integer,
                                                location text, 
                                                tournament text, 
                                                date text, 
                                                series text, 
                                                court text, 
                                                surface text, 
                                                PRIMARY KEY (atp))''')

con.execute("INSERT INTO event VALUES (?, ?, ?, ?, ?, ?, ?)",
            (1, "Doha", "Qatar Exxon Mobil Open", "06/01/2020", "ATP250", "Outdoor", "Hard"))

con.execute('''CREATE TABLE IF NOT EXISTS match(id integer,
                                                tournament integer,
                                                round text, 
                                                bestof integer,
                                                winner text,
                                                loser text,  
                                                wrank integer, 
                                                lrank integer, 
                                                PRIMARY KEY (id),
                                                FOREIGN KEY (tournament) REFERENCES event(atp))''')

con.execute("INSERT INTO match VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (0, 1, "1st Round", 3, "Bublik A.", "Mannarino A.", 55, 43))

cur = con.cursor()
cur.execute("SELECT * FROM match")
for r in cur.fetchall():
    print (r)

# delete from match first: its rows reference event through the foreign key
con.execute("DELETE FROM match")
con.execute("DELETE FROM event")
    
con.commit()
con.close()
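Two more features of the `sqlite3` module, in a sketch under stated assumptions: the database lives in memory (`":memory:"`), and the table `player` is a hypothetical one created only for this example, filled with three names and rankings taken from the rows of the CSV file shown earlier. `executemany` inserts a list of rows with one call, and the `?` placeholders also work in `SELECT` queries.

```python
import sqlite3

# An in-memory database disappears when the connection is closed
con = sqlite3.connect(":memory:")

# Hypothetical table, used only in this example
con.execute("CREATE TABLE player(name text PRIMARY KEY, rank integer)")

# executemany runs the same statement once for each tuple in the list
rows = [("Bublik A.", 55), ("Mannarino A.", 43), ("Verdasco F.", 49)]
con.executemany("INSERT INTO player VALUES (?, ?)", rows)
con.commit()

# Placeholders keep values separate from the SQL text
cur = con.execute("SELECT name FROM player WHERE rank < ? ORDER BY rank", (50,))
top = [r[0] for r in cur.fetchall()]
print(top)    # ['Mannarino A.', 'Verdasco F.']

con.close()
```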

## Exercise
Write a Python function which saves all the information from our CSV file into a SQLite database, then use SQL to directly find the player with the most wins and the corresponding number of wins.