# Fundamentals of Information Systems

## Python Programming (for Data Science)

### Master's Degree in Data Science

#### Giorgio Maria Di Nunzio
#### (Courtesy of Gabriele Tolomei FIS 2018-2019)
<a href="mailto:giorgiomaria.dinunzio@unipd.it">giorgiomaria.dinunzio@unipd.it</a><br/>
University of Padua, Italy<br/>
2019/2020<br/>

# Lecture 4: Python's Built-in Data Types (2)

## Data Type Hierarchy

-  Python's built-in data types can be grouped into several classes. 

-  We use the same hierarchy scheme used in the [official Python documentation](https://docs.python.org/3/library/stdtypes.html), which defines the following classes:

    -  **numeric**, **sequences**, **sets** and **mappings** (and a few more not discussed further here).

-  A special mention goes to two particular data types: **<code>bool</code>** and **<code>NoneType</code>**.

# In the previous lecture...

-  Built-in data types:
    -  <code>**bool**</code> and <code>**NoneType**</code> (<code>**None**</code>)
    -  <u>numeric</u>: <code>**int**</code>, <code>**float**</code>, <code>**complex**</code> (*immutable*)
    -  <u>sequences</u>: <code>**str**</code>, <code>**bytes**</code> (*immutable*), <code>**bytearray**</code> (*mutable*)

## In _this_ lecture

-  We finalize the discussion on Python's built-in data types:
    -  <u>sequences</u>: <code>**list**</code> (*mutable*) and <code>**tuple**</code> (*immutable*)
    -  <u>sets</u>: <code>**set**</code> (*mutable*)
    -  <u>mappings</u>: <code>**dict**</code> (*mutable*)

# Lists: Type <code>list</code> (_mutable_)

## Properties

-  An object of type <code>**list**</code> represents a sequence of possibly heterogeneous Python objects.

-  Lists are one of the basic data structures used to build more complex data types.

-  Being mutable, any list object can be modified *in place*.

-  The operations defined on lists include those we have already seen for any other sequence type (e.g., <code>**str**</code>).

In [None]:
# Define a reference to an empty list object
a_list = []

# Rebind the above reference to another list object
a_list = [1, 2, 'foo', None]

# Print the length of the list
print(len(a_list))

# Access the i-th element of the list (remember, the first element is indexed by 0)
print(a_list[2])

# a_list[-i] stands for a_list[n-i], where n = len(a_list)
# In the example below, therefore, we are accessing the very last element of the list
print(a_list[-1]) # same as a_list[len(a_list)-1]

# Change an element of the list (in place)
a_list[1] = 'bar'
print(a_list)

# Trying to access an element outside of the index range
print(a_list[7])

In [None]:
# Insert a new element at the end of an existing list using the 'append' method
a_list.append(42)
print(a_list)

# Insert a new element at a specific position of an existing list
# a_list.insert(pos, element) where pos is the position where element should be inserted
a_list.insert(2, 'red')
print(a_list)

# Is insert robust?
a_list.insert(100, 'blue')
print(a_list)

# A negative pos counts from the end, i.e., -pos stands for len(a_list)-pos; therefore, with -1
# the new element takes the position of the current last one, which is shifted to the right
a_list.insert(-1, 73)
print(a_list)

# a_list.insert(len(a_list), element) is equivalent to a_list.append(element)
a_list.insert(len(a_list), 'cyan')
print(a_list)

## Checkpoint Quiz

How would you insert the list <code>**[4, 5]**</code> as the first element of our original list?

In [None]:
a_list.insert(0, [4,5])
print(a_list)

## Notes on <code>insert</code> _vs._ <code>append</code>

-  <code>**insert(pos, element)**</code> is computationally expensive compared to <code>**append(element)**</code>.

-  This is because references to elements at positions <code>**pos**, **pos+1**, ..., **n-1**</code> (where <code>**n**</code> is the total number of elements currently in the list) have to be shifted internally to make room for the new element. 

-  If you need to insert elements at both the beginning and end of a sequence, you may explore <code>**collections.deque**</code>, a double-ended queue, for this purpose.
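
As a minimal sketch of the last point, the snippet below uses <code>**collections.deque**</code> from the standard library, whose <code>**appendleft**</code> and <code>**append**</code> methods add elements at either end. The variable name `queue` and the sample values are only illustrative.

```python
from collections import deque

# Build a double-ended queue from an existing list (illustrative values)
queue = deque([2, 3, 4])

# Add elements at both ends without shifting the other elements
queue.appendleft(1)  # insert at the beginning
queue.append(5)      # insert at the end
print(queue)

# Elements can also be removed from both ends
first = queue.popleft()
last = queue.pop()
print(first, last, list(queue))
```

A plain list remains the natural choice when elements are only added or removed at the end.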

In [None]:
# The inverse operation of 'insert' is 'pop',
# which removes and returns an element at a particular index.
# If no index is passed in, the last element is popped out by default
elem = a_list.pop()
print(elem)
print(a_list)

# Otherwise you can pass the index as argument
# Here we pop out the fourth element without storing it into a variable
a_list.pop(3)
print(a_list)

## Checkpoint Quiz

What happens if we try to do the following <code>**a_list.pop(123)**</code>, namely if we try to pop out an element using an out-of-range index?

In [None]:
# Trying to pop out an element using an out-of-range index
a_list.pop(123)

In [None]:
# Elements can be removed by index with 'del' or by value using 'remove'.
# 'remove' locates the first occurrence of the value passed as input and removes it from the list.
# Let's append another 'foo' at the end of the list
a_list.append('foo')
print(a_list)

# Then, remove the 2nd element (index=1)
del a_list[1]
print(a_list)

# Now, remove the first occurrence of 'foo'
a_list.remove('foo')
print(a_list)

# What if we try to remove an element which is not in the list?
a_list.remove('baz')
print(a_list)

In [None]:
# Although not very efficient, we can also check if a list contains an element
'red' in a_list

## Notes on the usage of <code>in</code> with lists

-  Checking whether a list contains an element is a lot **slower** than doing the same with a <code>**dict**</code> or a <code>**set**</code> (to be introduced shortly).

-  With a <code>**list**</code>, Python has to make a **linear scan** across the values of the list (time complexity is $O(n)$, where $n$ is the number of elements of the list).

-  A <code>**dict**</code> or a <code>**set**</code> - both based on _hash tables_ - can perform the check in constant time on average, i.e., $O(1)$.
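
The difference can be observed with a rough timing experiment such as the one sketched below. The names `big_list`, `big_set`, `t_list` and `t_set` are introduced here only for illustration, and the measured times depend on the machine, so no specific figures are claimed.

```python
import timeit

# Store the same n integers both in a list and in a set
n = 100_000
big_list = list(range(n))
big_set = set(big_list)

# Look up the last element, i.e., the worst case for the linear scan of the list
t_list = timeit.timeit(lambda: (n - 1) in big_list, number=100)
t_set = timeit.timeit(lambda: (n - 1) in big_set, number=100)

print("100 lookups in the list took {:.6f} seconds".format(t_list))
print("100 lookups in the set took {:.6f} seconds".format(t_set))
```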

# List Concatenation

In [None]:
# Lists can be added together using '+'
[42, True, 'bar'] + [False, None, 'foo'] + ['baz', 48, '']

In [None]:
# If you have a list already defined, 
# you can append multiple elements to it using 'extend' in place.
a_list = [42, True, 'bar']
print(a_list)

a_list.extend([False, None, 'foo'])
print(a_list)

a_list.extend(['baz', 48, ''])
print(a_list)

## Notes on list concatenation

-  Concatenating lists using <code>**+**</code> is generally an expensive operation:
    -  A new list must be created and the objects copied over for each concatenation (similar to string concatenation). 
    
-  Using <code>**extend**</code> to append elements to an existing list is the preferred way to go, especially if you are building up a large list.

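```python
# This approach uses the '+' operator (slower): a new list is created at each iteration
# (note that 'result_list += a_list' would instead extend the list in place)
list_of_list = [[1, 2], [3, 4], [5, 6]]  # example input
result_list = []
for a_list in list_of_list:
    result_list = result_list + a_list
```
```python
# This approach uses the 'extend' method (faster): elements are appended in place
list_of_list = [[1, 2], [3, 4], [5, 6]]  # example input
result_list = []
for a_list in list_of_list:
    result_list.extend(a_list)
```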

# List Sorting

In [None]:
# Lists can be sorted in-place (without creating a new object) by calling `sort`
a_list = [7, 2, 5, 1, 3]
print(a_list)

a_list.sort()
print(a_list)

# sort has a few options that will occasionally come in handy. 
# For example, you can pass a sort key, 
# i.e., a function that produces a value to use for sorting the objects. 
# The following example shows how to sort a list of strings by their lengths:
str_list = ['saw', 'small', 'He', 'foxes', 'six']
print(str_list)

str_list.sort(key=len)
print(str_list)

## Notes on sorting

-  Whenever you sort a list using <code>**sort()**</code>, remember that this happens in-place (i.e., you cannot recover the original order). 

-  If you want to display a list in sorted order, but preserve the original order, you can use the <code>**sorted()**</code> function instead, which returns a new sorted list. 

-  Both <code>**sort()**</code> and <code>**sorted()**</code> also accept the optional <code>**reverse=True**</code> argument.



In [None]:
students = ['bob', 'alice', 'carl']

# Display students in alphabetical order, but keep the original order.
print("Here is the list in alphabetical order:")
print(sorted(students))

# Display students in reverse alphabetical order, but keep the original order.
print("Here is the list in reverse alphabetical order:")
print(sorted(students, reverse=True))

print("Here is the list in its original order:")
# Show that the list is still in its original order.
print(students)

# List Slicing

In [None]:
# As for any other sequence types (tuples, NumPy arrays, pandas Series),
# you can select sections of lists using [start:stop] indexing notation
a_list = [7, 2, 3, 7, 8, 6, 0, 1]
print(a_list[1:5])

## Notes on slicing

-  Element at the <code>**start**</code> index is included, whilst the <code>**stop**</code> index is not.

-  Therefore, the total number of elements in the result is <code>**stop - start**</code>.

-  Either the <code>**start**</code> or <code>**stop**</code> can be omitted.
    -  if so, <code>**start**</code> will default to <code>**0**</code> and <code>**stop**</code> to <code>**n**</code> (where <code>**n**</code> is the length of the list).
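
The rules above can be checked with a short sketch like the following. The list `numbers` and the chosen bounds are illustrative, and the length identity is assumed to hold for bounds satisfying `0 <= start <= stop <= len(numbers)`.

```python
numbers = [7, 2, 3, 7, 8, 6, 0, 1]
start, stop = 1, 5

# The slice includes numbers[start] but not numbers[stop]
chunk = numbers[start:stop]
print(chunk)                       # [2, 3, 7, 8]
print(len(chunk) == stop - start)  # True

# Omitted bounds default to 0 and len(numbers), respectively
print(numbers[:stop] == numbers[0:stop])
print(numbers[start:] == numbers[start:len(numbers)])
```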

In [None]:
# Slicing without specifying the start index
# From the 1-st (index=0) to the 5-th (index=4) element
print(a_list[:5])

# Slicing without specifying the stop index
# From the 5-th (index=4) to the last (index=len(a_list)-1) element
print(a_list[4:])

# Negative indices slice the sequence relative to the end
# Slice the last three elements
print(a_list[-3:])

# Slice the 7-th element from last (included) up to the 2-nd from last (excluded)
print(a_list[-7:-2])

In [None]:
a_list

In [None]:
# print list
print(a_list)

# A step can also be used after a second colon to, say, take every other element
print(a_list[::2])

# A clever use of this is to pass -1, which has the useful effect of reversing the list
print(a_list[::-1])

# Looping over a List

## Accessing all the elements in a list

-  One of the most important concepts related to lists.

-  We use a **loop** (more on this later) to access _all_ the elements in a list. 

-  A loop is a block of code that repeats itself until it runs out of items to work with, or until a certain condition is met. 

-  In this case, our loop will run once for every item in our list (e.g., if a list has three items, our loop will run three times).

In [None]:
# Define a list containing dog breeds
dogs = ['border collie', 'golden retriever', 'german shepherd']

# Print each dog breed contained in the list above
for dog in dogs:
    print(dog)

## How does looping work?

-  The keyword <code>**for**</code> tells Python to "get ready" to use a loop.

-  The variable <code>**dog**</code> is a _temporary placeholder_ variable where Python will place each item of the list, one at a time, at each loop iteration:
    -  At the first iteration, <code>**dog**</code> references the string '<code>**border collie**</code>'.
    -  At the second iteration, <code>**dog**</code> references '<code>**golden retriever**</code>'.
    -  At the third iteration, <code>**dog**</code> references '<code>**german shepherd**</code>'.
    -  Finally, after this there are no more items in the list, and the loop terminates.

In [None]:
# Once we hold a reference to a list item, we are not limited to printing it!
# In fact, we can perform any supported operation on it
# For example, we can print it with the first letter of each word capitalized
for dog in dogs:
    # We can call the string method title() on the current referenced string item
    # and use it within a predefined string pattern using the format method
    print('My favourite dog breed is: {0:s}'.format(dog.title()))

# Note that this statement is NOT indented as the previous one 
# (i.e., it is outside the loop!)
# Therefore, it is printed ONLY once after the loop terminates
print('That\'s all I have to say about dogs!')

In [None]:
'this_is_my_\'string'

## Enumerating a List

-  When looping over a list, it might be useful to know the **index** of the current item. 

-  This can be achieved using the <code>**list.index(value)**</code> method (which returns the index of the first occurrence of <code>**value**</code>), but there is a simpler way.

-  The <code>**enumerate()**</code> function tracks the index of each item for you, as it loops through the list.

In [None]:
# enumerate takes the sequence (list) as input
# and returns the index and the reference to the current item in the list
for index, dog in enumerate(dogs):
    print('My n.{0:d} favourite dog breed is: {1:s}'.format(index + 1, dog.title()))

# Note that this statement is NOT indented as the previous one 
# (i.e., it is outside the loop!)
# Therefore, it is printed ONLY once after the loop terminates
print('That\'s all I have to say about dogs!')

In [None]:
for idx, d in enumerate(dogs):
    #print(idx)
    print(idx)

# List Comprehension

In [None]:
"""
Consider the following code snippet that, given a list of words, produces a new list
containing only those words containing at least 2 'a'
"""
# Input list of words
words = ['banana', 'kiwi', 'apple', 'melon', 'pineapple', 'papaya', 'strawberry', 'mango']

# Prepare the list containing the result
result = []
# Loop through all the words
for word in words:
    # Check if the current word contains at least 2 'a'
    if word.count('a') >= 2:
        # If so, just append it to the result list
        result.append(word)

# Finally, print the result (should be ['banana', 'papaya'])
print(result)

In [None]:
# List comprehension allows us to write the same thing yet in a more compact way
# Let's start from scratch with an empty list (this step is not really needed)
result = []

# Using list comprehension you can do it in just a single line!
result = [word for word in words if word.count('a') >= 2]

# Finally, print the result
print(result)

In [None]:
# Note that list comprehension works also when you have nested lists
# For example, consider the following list of lists
data = [['banana', 'kiwi', 'apple', 'melon'],['pineapple', 'papaya', 'strawberry', 'mango']]

# If you want to obtain a list of words starting with the letter 'm'
words_starting_with_m = [word for word_list in data for word in word_list 
                         if word.startswith('m')]

# Finally, print the final list
print(words_starting_with_m)
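
The nested comprehension above reads left to right in the same order as the equivalent nested `for` loops, sketched below on the same `data`. The name `with_loops` is introduced here only for illustration.

```python
data = [['banana', 'kiwi', 'apple', 'melon'],
        ['pineapple', 'papaya', 'strawberry', 'mango']]

with_loops = []
for word_list in data:        # outer loop: first 'for' of the comprehension
    for word in word_list:    # inner loop: second 'for' of the comprehension
        if word.startswith('m'):
            with_loops.append(word)

print(with_loops)  # ['melon', 'mango']
```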

# Tuples: Type <code>tuple</code> (_immutable_)

## Properties

-  Tuples are basically _**immutable**_ lists.

-  Lists are great for containing highly dynamic information, as you can append/insert/remove/modify items in a list. 

-  However, sometimes we may want to ensure that no user or part of a program can change a list. That's exactly what tuples are for!

-  Allowed operations are the same as those of any other sequence type (e.g., <code>**list**</code>, <code>**str**</code>, <code>**bytes**</code>), except those that modify the object in place.

In [None]:
# Defining a tuple is like defining a list, except you use parentheses 
# instead of square brackets
colors = ('red', 'green', 'blue')

# Once you have a tuple, you can access individual elements just like you can with a list...
print('The second color is: ' + colors[1])

# ... and you can loop through the tuple with a for loop:
print('\nHere is the list of primary colors:')
for color in colors:
    print(color.title())

# What happens if we try to add an item to the tuple?
colors.append('black')

## Checkpoint Quiz

What happens if we try to do the following: <code>**colors.sort()**</code>, namely if we try to sort the tuple in-place?<br/>
And what would you expect to get if we did something like <code>**sorted(colors)**</code>?

In [None]:
colors.sort()

In [None]:
sorted(colors)

# Hash Tables

-  So far, we have seen data types which are able to store Python objects which are indexed by **integers**, such as <code>**str**</code> (*immutable*) or <code>**list**</code> (*mutable*).

-  However, Python objects can also be collected into **hash tables**.

-  Python provides two built-in types which correspond to hash tables: <code>**set**</code> and <code>**dict**</code>.

## What is _hashing_?

-  Hashing is the application of a **hash function**.

-  A hash function maps a set of objects to a set of integers satisfying _some_ properties.

-  Any hash function $h$ must be an _actual_ function, that is, if two objects $x$ and $x'$ are the same (i.e., $x = x'$), then their hashes must also be the same, namely $h(x) = h(x')$.

-  Also, a hash function should be easy to compute; _cryptographic_ hash functions must additionally be hard to invert.

## Domain _vs._ Codomain of a Hash Function

-  Usually, the set of integers that the hash function maps to (i.e., the **codomain**) is **much smaller** than the set of objects (i.e., the **domain**).

-  As a consequence, there will be multiple objects that hash to the same value (**hash collision**). 

-  In practice, hash functions operate on large enough codomains, and the function is designed so that if two objects hash to the same value, then they are _very likely_ equal.
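
To make collisions concrete, here is a minimal sketch based on a deliberately naive hash function. `toy_hash` and its parameter `n_values` are hypothetical names, not part of Python, and the codomain is kept tiny on purpose so that collisions are easy to find.

```python
def toy_hash(text, n_values=8):
    """Hypothetical hash: sum of the character codes, reduced to n_values integers."""
    return sum(ord(ch) for ch in text) % n_values

# Equal objects always get equal hashes (toy_hash is an actual function)
print(toy_hash('listen') == toy_hash('listen'))

# Different objects may collide: anagrams have the same sum of character codes
print(toy_hash('listen'), toy_hash('silent'))
```

With such a small codomain, equal hash values say very little about equality, which is why practical hash functions use much larger codomains.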

## What are Hash Functions used for?

-  We can leverage hash functions to organize a collection of objects into a new data structure, called **hash table**.

-  Example: Suppose we have a collection of objects, and given _any_ object, we want to be able to compute **very quickly** if that object belongs to our collection. 

-  **First solution**: Store objects in a list. But then to determine if an element is in the list, we might need to scan the _whole_ list (time complexity $O(n)$, where $n$ is the number of elements in the list).

-  **Better solution**: Use hashing!

## Hash Table

-  Instead of storing the objects in a list, we create a list of "**buckets**", each one _indexed_ by some hash value.

-  We then compute the hash of each object, and store the object in the bucket corresponding to its hash value. 

-  If there are more hash values than buckets (as is usually the case), we distribute them using a second hash function, which can be as simple as taking the modulus with respect to the number of buckets.
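
The bucket assignment described above might be sketched as follows, using Python's built-in `hash()` followed by a modulus. The names `n_buckets` and `buckets` and the sample objects are assumptions made for this example, and the printed layout can differ between runs because string hashes are randomized.

```python
n_buckets = 4
buckets = [[] for _ in range(n_buckets)]  # one (initially empty) list per bucket

for obj in ['apple', 'kiwi', 'banana', 42, (1, 2)]:
    # Reduce the (large) hash value to a valid bucket index
    index = hash(obj) % n_buckets
    buckets[index].append(obj)

print(buckets)
```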

## Lookup

-  To determine if an object is in a hash table, we only have to hash the object, and look in the bucket corresponding to that hash. 

-  This is an $O(1)$ (i.e., constant time) operation, which **does not** depend on the number of objects stored in the table.

-  Of course, this assumes that the hash function distributes objects **evenly** across the available buckets, so that each bucket holds only a few objects (i.e., few collisions).
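
Putting buckets and lookup together, a simplified hash table could look like the sketch below. `ToyHashSet` is a hypothetical class written for illustration only; Python's actual `set` and `dict` are considerably more sophisticated (for instance, they grow as elements are added).

```python
class ToyHashSet:
    """Hypothetical, simplified hash table with a fixed number of buckets."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, obj):
        # Locate the bucket from the hash of the object
        return self.buckets[hash(obj) % len(self.buckets)]

    def add(self, obj):
        bucket = self._bucket(obj)
        if obj not in bucket:  # keep the stored elements unique
            bucket.append(obj)

    def __contains__(self, obj):
        # Only one bucket is inspected, however many objects are stored overall
        return obj in self._bucket(obj)


table = ToyHashSet()
for word in ['apple', 'kiwi', 'banana']:
    table.add(word)

print('kiwi' in table)
print('pear' in table)
```

Because the number of buckets is fixed here, lookups slow down as more objects are added; keeping buckets short is what makes the constant-time estimate realistic.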

## Hashing in Python

-  Python has a built-in function that performs a hash called <code>**hash()**</code>. 

-  For many objects, the hash is not very surprising. 

-  Python hashing depends on the architecture of the machine you are running on, and, in newer versions of Python, hashes are randomized for security purposes.

In [None]:
# Hashing an integer (immutable)
print("Hash of 42 is: {}".format(hash(42)))

# Hashing a string (immutable)
print("Hash of \"Aloha\" is: {}".format(hash("Aloha")))

# Hashing an empty tuple (immutable)
print("Hash of empty tuple () is: {}".format(hash(())))

In [None]:
# Not every Python object is hashable!
# Hashing a list (mutable)
print("Hash of list [1, 3, 5] is: {}".format(hash([1, 3, 5])))

## _Hashability_ of Python Objects

-  For an object of a built-in type to be hashable, it must be **immutable** (as must all of its nested objects)!

-  This guarantees that the hash of an object remains the same across the object's lifetime. 

-  If the object is mutable and changes, then its hash will also have to change accordingly.

-  This (design) restriction simplifies hash tables (i.e., <code>**set**</code> and <code>**dict**</code> introduced below), which would otherwise have to change, at runtime, the bucket where an object is stored.
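
A quick way to explore which objects are hashable is to try hashing them and catch the resulting error. The helper `is_hashable` below is a hypothetical convenience function, not a built-in; `frozenset` is the immutable counterpart of `set` provided by Python.

```python
def is_hashable(obj):
    """Hypothetical helper: True if hash(obj) succeeds, False otherwise."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2, 3)))          # tuple of integers
print(is_hashable((1, 2, [3, 4])))     # tuple containing a (mutable) list
print(is_hashable(frozenset({1, 2})))  # immutable set
print(is_hashable({1, 2}))             # mutable set
```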

# Sets: Type <code>set</code> (_mutable_)

## Properties

-  A _set_ is an **unordered** collection of **unique** elements.

-  Internally, its elements are stored in a hash table.

-  A set can be created in two ways: using a _set literal_ with curly braces or via the <code>**set**</code> function.

In [None]:
# Defining a set using curly braces
s = {3,5,6,5,5,2,1,4,3}
print(s)

# Defining a set using the 'set' built-in function
s = set([3,5,6,5,5,2,1,4,3])
print(s)

# Note that this means that we can also transform a list into a set
a_list = ['apple', 'kiwi', 'banana', 'apple', 'ananas', 'kiwi', 'pear', 'apple']
s = set(a_list)
print(s)

In [None]:
"""
Note that the following is legitimate because the objects used to create the set s
can be stored in an 'iterable' (like the mutable list below), 
provided that each individual element in the iterable is hashable
(i.e., immutable) like the integers below.
"""
s = set([3,5,6,5,5,2,1,4,3])
print(s)
"""
But what if I do the following?
"""
s = set([[3,5,6,5],[5,2,1,4,3]])
print(s)

## Operations

Sets support mathematical set operations like **union**, **intersection**, **difference**, and **symmetric difference**. See Table below for a list of commonly used set methods.

In [None]:
# Defining two sets: A and B
A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7, 8}

# Set Union (A or B)
print(r"Set Union: A \/ B = {}".format(A | B))

# Alternatively, you can invoke the 'union' method
print(r"Set Union: A \/ B = {}".format(A.union(B)))

# Set Intersection (A and B)
print(r"Set Intersection: A /\ B = {}".format(A & B)) 

# Alternatively, you can invoke the 'intersection' method
print(r"Set Intersection: A /\ B = {}".format(A.intersection(B)))

# Set Difference (A - B)
print("Set Difference: A - B = {}".format(A - B))

# Alternatively, you can invoke the 'difference' method
print("Set Difference: A - B = {}".format(A.difference(B)))

# Set Symmetric Difference (A xor B)
print("Set Symmetric Difference: A ^ B = {}".format(A ^ B))

# Alternatively, you can invoke the 'symmetric_difference' method
print("Set Symmetric Difference: A ^ B = {}".format(A.symmetric_difference(B)))

## Note on set operations

-  Each of the set operations above has an **in-place** counterpart.

-  They can be invoked either using <code>**op=**</code> (where <code>**op**</code> is one of <code>**|, &, -, ^**</code>) or by calling the corresponding method: <code>**update**</code> (for union), <code>**intersection_update**</code>, <code>**difference_update**</code> and <code>**symmetric_difference_update**</code>.

-  These replace the contents of the set on the left side of the operation with the result. 

-  For very large sets, this can be more efficient, as no new set object has to be created.
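
As a brief sketch of the method-based variants, the snippet below applies each in-place method to a copy of the same set. Note that the in-place union method is called `update`; the names `C`, `D`, `E` and `F` are only illustrative.

```python
A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7, 8}

C = A.copy()
C.update(B)                       # same effect as C |= B

D = A.copy()
D.intersection_update(B)          # same effect as D &= B

E = A.copy()
E.difference_update(B)            # same effect as E -= B

F = A.copy()
F.symmetric_difference_update(B)  # same effect as F ^= B

print(C, D, E, F)
```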


In [None]:
# Make a copy of set A
C = A.copy()

# In-place Set Union (C or B)
C |= B
print(r"Set Union: C \/ B = {}".format(C))

# Make another copy of set A
D = A.copy()

# In-place Set Intersection (D and B)
D &= B
print(r"Set Intersection: D /\ B = {}".format(D)) 

# ... similarly for the other operations
# Eventually, the original set A is unchanged
print("Original set A = {}".format(A))

In [None]:
# You can also check if a set is a subset of (is contained in) 
# or a superset of (contains all elements of) another set
X = {1, 2, 3, 4, 5}
Y = {1, 3, 5}
print("'Y is subset of X' = {}".format(Y.issubset(X)))
print("'X is superset of Y' = {}".format(X.issuperset(Y)))

Y = {1, 3, 5, 7}
print("'Y is subset of X' = {}".format(Y.issubset(X)))
print("'X is superset of Y' = {}".format(X.issuperset(Y)))

# Finally, two sets are equal iff they have exactly the same content
Z = {5, 1, 7, 3}
print("'Y is equal to Z' = {}".format(Y == Z))

# Mappings: Type <code>dict</code> (_mutable_)

## Properties

-  Very likely, <code>**dict**</code> is the most important built-in Python data structure.

-  A more common name for it is **hash map** or **associative array**. 

-  It is a hash table where each element of the hash table (**key**) points to another object (**value**); the object representing the value itself is not hashed.

-  Keys and Values are of course Python objects! :)

-  The easiest way to create one is by using curly braces <code>**{}**</code>, with colons separating keys from values.

In [None]:
# Create an empty dictionary
d = {}

# Define a dictionary containing some elements
d = {'a': 1, 'b': 2, 'c': [3, 4]}

# Values can be accessed/added/updated using the same list notation []
# Instead of accessing values by index (int), dictionary's values are accessed by key
# Retrieve the value associated with the key 'b' in the dictionary above
print("Retrieve the value associated with the key 'b' = {}".format(d['b']))

# Add a new value associated with a new key
d['z'] = 'some string'
print("After adding a new entry, the dictionary is: {}".format(d))

# Update the value associated with an existing key
d['a'] = (5, 42)
print("After updating the value of an existing entry, the dictionary is: {}".format(d))

# You can check if a dict contains a key using the same syntax 
# as with checking whether a list or tuple contains a value
print("Q: The key 'c' is in the dictionary? A: {}".format('c' in d))

In [None]:
# Values can be deleted either using the 'del' keyword or the 'pop' method 
# (the latter simultaneously returns the value and deletes the key)
del d['b']
print("After deleting an existing entry, the dictionary is: {}".format(d))

val = d.pop('z')
print("After popping out an existing entry, the dictionary is: {}".format(d))
print("The value popped out is: '{}'".format(val))

## Useful methods: <code>keys</code> and <code>values</code>

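-  The <code>**keys**</code> and <code>**values**</code> methods return _view objects_ over the dictionary's keys and values, respectively; the keys view behaves like a set (keys are unique), whereas values may contain duplicates. 

-  Since Python 3.7, dictionaries preserve the insertion order of their key-value pairs, and these methods output the keys and values in the same, mutually consistent order.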
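
The correspondence between the two outputs can be illustrated with a small sketch. The dictionary `ages` and its contents are made up for this example, and `zip` simply pairs the i-th key with the i-th value.

```python
ages = {'alice': 31, 'bob': 27, 'carl': 45}  # illustrative dictionary
keys = ages.keys()
values = ages.values()

print(list(keys))
print(list(values))

# The i-th key corresponds to the i-th value
paired = all(ages[key] == value for key, value in zip(keys, values))
print(paired)

# Views reflect later changes to the dictionary
ages['dana'] = 38
print('dana' in keys)
```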

In [None]:
d

In [None]:
# One dictionary can be merged into another using the update method (in-place)
d.update({'b' : 'foo', 'c' : 12})

In [None]:
d

In [None]:
# Print the dictionary's keys
print("The dictionary's keys are: {}".format(d.keys()))

# Print the dictionary's values
print("The dictionary's values are: {}".format(d.values()))

# One dictionary can be merged into another using the update method (in-place)
d.update({'b' : 'foo', 'c' : 12})
print("After updating, the dictionary is: {}".format(d))

# Creating Dictionaries from Sequences

In [None]:
# It’s common to end up with two sequences that you want to pair up element-wise in a dict. 
# As a first cut, you might write code like this
# The list of keys
key_list = ['foo', 'bar', 'baz']

# The list of values
value_list = [15, 73, 42]

# Prepare the dictionary (at the beginning this is empty)
mapping = {}

# Populate the dictionary using the 'zip' function
# The 'zip' function takes two lists X = [x_1, ..., x_m] and Y = [y_1, ..., y_n]
# and returns an iterator over the tuples (x_1, y_1), (x_2, y_2), ..., (x_k, y_k), 
# where k = min(m, n)
for key, value in zip(key_list, value_list):
    mapping[key] = value
print("The mapping dictionary is: {}".format(mapping))

## Checkpoint Quiz

What happens if the list of keys contains duplicates, i.e., if we change the definition of <code>**key_list**</code> as follows:
```python
key_list = ['foo', 'bar', 'bar']
```

In [None]:
# A dictionary is essentially a collection of 2-tuples of the form (key, value);
# you can create one by passing a sequence of 2-tuples to the 'dict' type function
key_list = ('foo', 'bar', 'bar')

value_list = (15, 73, 42)

mapping = dict(zip(key_list, value_list))

print("The mapping dictionary is: {}".format(mapping))

# Default Values

In [None]:
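# It’s quite common to have logic as follows
# (some_dict, key and default_value are placeholder names defined here for illustration):
some_dict = {'foo': 15, 'bar': 73}
key = 'baz'
default_value = 0

if key in some_dict:
    value = some_dict[key]
else:
    value = default_value

# Luckily, the dict methods 'get' and 'pop' can take a default value to be returned, 
# so that the above if-else block can be written simply as:
value = some_dict.get(key, default_value)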

In [None]:
# Try to get the value associated with the key 'let', 
# if this is not present fall back to 0
# 1. Using 'get'
value = mapping.get('let', 0)
print("Value returned = {}".format(value))

# 2. Using 'pop'
value = mapping.pop('let', 0)
print("Value returned = {}".format(value))

# If no default value is specified 'get' and 'pop' have 2 different behaviors:
# The 'get' method by default will return None if the key is not present
value = mapping.get('let')
print("Value returned = {}".format(value))

# ... whilst 'pop' will raise an exception.
value = mapping.pop('let')
print("Value returned = {}".format(value))

In [None]:
# Another typical situation happens when trying to set values in a dictionary.
# Sometimes those values are other collections, like lists. 
# Suppose you want to categorize a list of words by their first letters as a dict of lists.
# List of words
words = ['apple', 'bat', 'bar', 'atom', 'book', 'car', 'charlie', 'zoo']

# Initializing your empty dictionary
index = {}

# Loop through all the words in the list
for word in words:
    first_letter = word[0] # extract the first letter from the current word
    if first_letter not in index: # if the key (first_letter) is not in the dictionary 
        index[first_letter] = [word] # just create a new entry, i.e., a list with one word
    else:
        # otherwise, append the current word to the list associated with the existing key
        index[first_letter].append(word)
        
print("The index dictionary is: {}".format(index))

In [None]:
# The if-else code block above can be easily rewritten using the 'setdefault' dict method.
# List of words
words = ['apple', 'bat', 'bar', 'atom', 'book', 'car', 'charlie', 'zoo']

# Initializing your empty dictionary
index = {}

# Loop through all the words in the list
for word in words:
    first_letter = word[0] # extract the first letter from the current word
    # 'setdefault' returns the list stored under first_letter, inserting an empty
    # list ([]) first if the key is missing; the current word is then appended to it
    index.setdefault(first_letter, []).append(word)

print("The index dictionary is: {}".format(index))

## Valid Types for Dictionary Keys

-  Although the **values** of a <code>**dict**</code> can be **_any_** Python object, the **keys** have to be **hashable**.

-  Therefore **keys** must be **_immutable_** objects like scalar types (<code>**int**</code>, <code>**float**</code>, <code>**str**</code>) or <code>**tuple**</code> (**note:** all the objects in the tuple need to be immutable, too!).

-  Again, you can check whether an object is hashable (i.e., can be used as a key in a dictionary) with the <code>**hash()**</code> function.

In [None]:
# Check if an object of type str is 'hashable'
print(hash('string key'))

# Check if an object of type tuple is 'hashable'
print(hash((1, 2, (2, 3))))

# Check if a composite object of type tuple is 'hashable'
print(hash((1, 2, [2, 3]))) # fails because lists are 'unhashable'

In [None]:
# To use a list as a key, one option is to convert it to a tuple, 
# which can be hashed as long as its elements also can.
d = {}
d[tuple([1, 2, 3])] = 'foo'
print(d)

d[tuple([1, 2, 1])] = 'bar'
print(d)
d[tuple([1, 2, [42, 73]])] = 'baz' # fails as the third element of the list is itself a list

## Summary

-  In the last two lectures we have completed our (non-exhaustive) overview of Python's built-in data types.

-  Python allows the programmer to easily define **new** data types (i.e., **classes**), supporting the object-oriented paradigm.

-  Still, the data types we have seen so far (or optimized variants of those) are the main building blocks, which most of the time are enough to develop your data science applications.

-  Therefore, you are strongly encouraged to familiarize yourself with them, so that you know which are the most appropriate to use for your specific task (https://wiki.python.org/moin/TimeComplexity).