**Introductory and intermediate computing for Data Science [Barcelona School of Economics]**

`Instructor:` Maxim Fedotov  
`Program:` M.Sc. in Data Science Methodology

# Class 3

## Sequence types: lists, tuples, ranges
Let's dive into several very important built-in sequence types: `list` and `tuple` (and touch on `range` a bit). Their nature follows from the name: these data structures contain ordered collections of objects. Their *displays* work as follows:

In [1]:
typical_cup_volumes = [0.33, 0.5]
clients_addresses = ("Ramon Trias Fargas, 25-27", "Roc Boronat, 138", "Carrer de la Mercè, 12",
                     "Doctor Aiguader, 80", "Passeig Pujades, 1", "Balmes, 132-134")
clients_ids = range(len(clients_addresses))

The above cell shows expressions that create such objects. How do you make Python understand that an object is a tuple of one element? Try it out here:

In [None]:
# create and print (or just call) a variable which is supposed to be a tuple of one element


As you have seen already, some arithmetic operators (such as `+` for concatenation and `*` for repetition) can be used with lists and tuples.

In [2]:
typical_cup_volumes = [0] + typical_cup_volumes
print(f"A healthy option would be to have {typical_cup_volumes[0]} liters of a sweet sparkling drink.")

A healthy option would be to have 0 liters of a sweet sparkling drink.


As you can see, we can access a value by specifying its index in square brackets: `typical_cup_volumes[0]` (by convention, there is no space between the name of a variable and the left square bracket). A negative integer also works as an index: its absolute value specifies the position of an element counting from the end of the list. We can also access several elements of these data structures at once by *slicing*. The interface for slicing uses the same square brackets as for selecting one element, but with specific contents inside: `[start:end:step]`. Note that it is not necessary to specify all three of them.

In [3]:
print("But I still sometimes drink {:.3f}".format(typical_cup_volumes[-1]),
      "of this sweet drink \U0001F605")

clients_addresses_reverted = clients_addresses[::-1]
print("These are addresses in reverse order:", "; ".join(clients_addresses_reverted))

print("IDs of the first two clients:", clients_ids[:2])  # note that slicing a range
                                                         # returns another range object

But I still sometimes drink 0.500 of this sweet drink 😅
These are addresses in reverse order: Balmes, 132-134; Passeig Pujades, 1; Doctor Aiguader, 80; Carrer de la Mercè, 12; Roc Boronat, 138; Ramon Trias Fargas, 25-27
IDs of the first two clients: range(0, 2)
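
To make the `[start:end:step]` notation more concrete, here is a small sketch on a made-up list (`letters` is a hypothetical name, not part of the class data). Omitted parts fall back to their defaults: the beginning of the sequence, its end, and a step of 1.

```python
# hypothetical example data, used only to illustrate slicing
letters = ["a", "b", "c", "d", "e", "f"]

first_three = letters[:3]         # start omitted: begin at index 0
from_third = letters[2:]          # end omitted: go to the end of the list
every_second = letters[::2]       # only the step is given
middle_part = letters[1:-1]       # negative indices count from the end
reversed_letters = letters[::-1]  # a negative step walks backwards

print(first_three, from_third, every_second, middle_part, reversed_letters, sep="\n")
```

Note that slicing a list always returns a new list, so `letters` itself stays unchanged.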


We can change values of lists by using an assignment statement of the following form:  
`list_identifier[index | slice] = new_value`

Note that if you provide an index that is out of range, you get an `IndexError`.
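
A minimal sketch of both forms of assignment, using a hypothetical list `prices` (the name and values are made up for illustration). It also shows the `IndexError` raised when the index does not exist.

```python
# hypothetical list used only for illustration
prices = [1.0, 2.0, 3.0, 4.0]

prices[0] = 1.5            # replace a single element by index
prices[1:3] = [2.5, 3.5]   # replace a slice with values from another iterable

try:
    prices[10] = 9.9       # there is no element with index 10
except IndexError as error:
    message = str(error)

print(prices)
print("IndexError:", message)
```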

The main difference between a list and a tuple is that the former is *mutable* and the latter is *immutable*. It means that we can freely change values of elements in a list, but not in a tuple.

In [None]:
# try to change the first element of typical_cup_volumes to 0.180


# now try to change any of the entries of clients_addresses


Note that Python treats several expressions separated by commas as a tuple. This allows us to use an elegant expression when we work with several variables at the same time. For example, the basic computer science problem of swapping the values of two variables can be solved simply like this:

In [4]:
value_1 = 1
value_2 = 2

value_1, value_2 = value_2, value_1

print(value_1, value_2)

2 1


In addition, tuples are useful because they support *destructuring* (also known as unpacking). This is what you typically use when you need to retrieve values from tuples to create new variables or to reassign the values of existing ones.

Suppose that the following user information comes to you in a tuple: username, user id, balance. You can use tuple destructuring to retrieve the elements separately.

In [5]:
user_info = ('mfedotov', '123', 50.1)
user_name, user_id, user_balance = user_info
print(f"User {user_name} with id {user_id} has {user_balance} on their balance.")

# note that you can use a "plug" if you do not need some of the elements, like this

_, user_id, user_balance = user_info
print(f"User {_} with id {user_id} has {user_balance} on their balance.")

# note that this "plug" is still a variable, "_" is also a valid identifier of a variable.

User mfedotov with id 123 has 50.1 on their balance.
User mfedotov with id 123 has 50.1 on their balance.


In addition, tuple destructuring is used implicitly when you write a `for` loop that iterates through a collection whose elements contain several values.
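
A short sketch of this implicit destructuring is given below. The list `users_info` and its records are hypothetical, and `for` loops themselves are covered in more detail later in this class.

```python
# hypothetical user records: (username, user id, balance)
users_info = [("mfedotov", "123", 50.1), ("jdoe", "456", 12.0)]

summaries = []
for user_name, user_id, user_balance in users_info:
    # each tuple is unpacked into three variables at every iteration
    summaries.append(f"{user_name} ({user_id}): {user_balance}")

print(summaries)
```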

Note that `list`, `tuple` and `range` are data types, data structures, and classes in Python. So, they have some *methods* that are associated with them.

For example, this is how you can implement a queue in Python using list methods. (*comment: to fully implement a "queue" data structure, you would write a specific class Queue with methods enqueue and dequeue; the cell below shows you a simplified example*)

In [6]:
queue = ["Bob", "Janine", "John", "Anastasia"]
queue.append("Alice")  # enqueuing
queue.pop(0)  # dequeuing
queue

['Janine', 'John', 'Anastasia', 'Alice']
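
The comment above mentions a dedicated `Queue` class. A minimal sketch of what such a class could look like is given below. The class and method names follow the comment, while the use of `collections.deque` as internal storage is an assumption of this sketch: removing an element from the left end of a `deque` does not require shifting the remaining elements, unlike `list.pop(0)`.

```python
from collections import deque


class Queue:
    """A simplified first-in, first-out queue (illustrative sketch)."""

    def __init__(self, items=()):
        self._items = deque(items)

    def enqueue(self, item):
        # add a new element to the end of the queue
        self._items.append(item)

    def dequeue(self):
        # remove and return the element at the front of the queue
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


clients_queue = Queue(["Bob", "Janine", "John", "Anastasia"])
clients_queue.enqueue("Alice")
served = clients_queue.dequeue()
print(served, len(clients_queue))
```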

## For loops

To really make use of the sequence data structures, we have *loops* at our disposal. There are two types of loops in Python: `for` and `while`. We start with the former, as it is used in list comprehensions (a critically useful tool).

Below you can find an example of a simple for loop:

In [7]:
kilocalories_drink = 37

kilocalories_portions = []
typical_cup_volumes = [0, 0.18, 0.33, 0.5, 1]

for volume in typical_cup_volumes:
    calories = volume * 10 * kilocalories_drink
    kilocalories_portions.append(calories)
    print(f"There are {calories:.1f} calories in {volume * 1000:n} ml. of the drink")

There are 0.0 calories in 0 ml. of the drink
There are 66.6 calories in 180 ml. of the drink
There are 122.1 calories in 330 ml. of the drink
There are 185.0 calories in 500 ml. of the drink
There are 370.0 calories in 1000 ml. of the drink


The basic contents of a for loop are:
* A keyword `for`
* An arbitrary identifier for a single element at each iteration (here it's `volume`)
* A keyword `in`, which indicates that at each iteration we take one element of the iterable object that we specify right after it.
* An identifier of the iterable which we want to loop through (here it's `typical_cup_volumes`), followed by `:`.
* Then comes the body of the loop. Do not forget about correct indentation.

### Break and continue

There are two keywords that will help you to work with for loops: `break` and `continue`.

* `break` stops the loop right at the place where it is reached.
* `continue` makes the loop stop the current iteration without executing any further code and move on to the next iteration.

Suppose that you want to compute calories for positive portions only. In addition, you do not want to calculate calories for portions greater than 330 ml. Then you would use the following construction:

In [8]:
kilocalories_portions = []

# typically, for this kind of problem you would ensure that the list is sorted
typical_cup_volumes = sorted(typical_cup_volumes)                                                   
                                                   
for volume in typical_cup_volumes:
    if 0 < volume <= 0.33:
        calories = volume * 10 * kilocalories_drink
        kilocalories_portions.append(calories)
        print(f"There are {calories:.1f} calories in {volume * 1000:n} ml. of the drink")
    elif volume > 0.33:
        break
    else:
        continue

There are 66.6 calories in 180 ml. of the drink
There are 122.1 calories in 330 ml. of the drink


## While loops

`while` loops are another way to implement a repeated sequence of actions. They are mostly used when the loop should keep running until some stopping criterion is satisfied.

Let's try to implement a bisection root-finding algorithm using a while loop.

In [9]:
from types import FunctionType
from typing import Union

def root_finding_bisection(f: FunctionType, 
                           left_endpoint: Union[int, float], 
                           right_endpoint: Union[int, float], 
                           maxit: int = 1000) -> float:
    if left_endpoint > right_endpoint:
        raise ValueError('The left endpoint must not be greater than the right endpoint.')
    f_left = f(left_endpoint) 
    f_right = f(right_endpoint)
    if f_left * f_right >= 0:
        raise ValueError('The function values evaluated at the left and right endpoints '
                         'must be of different signs.')
    iteration = 0 
    while iteration < maxit and (iteration == 0 or f_middle != 0):
        middle = (left_endpoint + right_endpoint) / 2
        f_middle = f(middle)
        if f_left * f_middle > 0:
            left_endpoint = middle
            f_left = f(left_endpoint)
        else:
            right_endpoint = middle
            f_right = f(right_endpoint)
        iteration += 1
    return middle
    
        
def f(x: Union[int, float]) -> float:
    return (x - 5) * (x + 3)

root_finding_bisection(f, -1, 10)

5.0

Note that it is still recommended to prefer `for` loops over `while` loops whenever possible.
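
As an illustration of this recommendation, the iteration cap of the bisection example can be expressed with a `for` loop over `range(maxit)` combined with `break`. The following is only a sketch of one possible rewrite (the name `root_finding_bisection_for` is hypothetical), not a replacement for the function above.

```python
from typing import Callable


def root_finding_bisection_for(f: Callable[[float], float],
                               left_endpoint: float,
                               right_endpoint: float,
                               maxit: int = 1000) -> float:
    if left_endpoint > right_endpoint:
        raise ValueError("The left endpoint must not be greater than the right endpoint.")
    f_left = f(left_endpoint)
    if f_left * f(right_endpoint) >= 0:
        raise ValueError("The function values at the endpoints must be of different signs.")
    middle = (left_endpoint + right_endpoint) / 2
    for _ in range(maxit):  # the loop itself limits the number of iterations
        middle = (left_endpoint + right_endpoint) / 2
        f_middle = f(middle)
        if f_middle == 0:
            break  # an exact root has been found
        if f_left * f_middle > 0:
            # the root lies in the right half of the interval
            left_endpoint, f_left = middle, f_middle
        else:
            # the root lies in the left half of the interval
            right_endpoint = middle
    return middle


root = root_finding_bisection_for(lambda x: (x - 5) * (x + 3), -1, 10)
print(root)
```

With this form there is no counter to update by hand, so the loop cannot run forever by mistake.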

## List comprehensions

There is also the concept of *list comprehensions*, which allow you to use `for` loop functionality in a concise way by embedding it in a list display. Of course, `for` loops and list comprehensions do not serve exactly the same purposes. However, for the calorie example from the section on `for` loops, we could do the same thing using a list comprehension.

### Mapping

In [10]:
kilocalories_drink = 37  # typical value per 100 ml of a sweet sparkling drink
kilocalories_portions = [volume * 10 * kilocalories_drink for volume in typical_cup_volumes]
print(*kilocalories_portions, sep=' | ')

0 | 66.6 | 122.10000000000001 | 185.0 | 370


This list comprehension implements *mapping*, i.e. we apply a specific action to each element of the list.

We could define a function and do it the same way.

In [11]:
def kcal_portion(volume: int | float, kcal_drink: int | float) -> float:
    """
        Computes kilocalories per portion of drink.
        
        arguments: 
            - volume:     a volume of the drink in liters.
            - kcal_drink: kcal in 100 ml. of the drink
        
        returns:
            A value of kcal. per specified portion.
    """
    return volume * 10 * kcal_drink


kilocalories_portions = [kcal_portion(volume, kilocalories_drink) for volume in typical_cup_volumes]
print(*kilocalories_portions, sep=' | ')

0 | 66.6 | 122.10000000000001 | 185.0 | 370


In [12]:
help(kcal_portion)

Help on function kcal_portion in module __main__:

kcal_portion(volume: int | float, kcal_drink: int | float) -> float
    Computes kilocalories per portion of drink.
    
    arguments: 
        - volume:     a volume of the drink in liters.
        - kcal_drink: kcal in 100 ml. of the drink
    
    returns:
        A value of kcal. per specified portion.



Note that the types that you specify in a function definition are not enforced. Using `|` in annotations is a relatively new feature (available since Python 3.10), so it is unavailable in older versions of Python. For older versions, consider using the `typing` package. Note that the code of many sophisticated packages still uses the `typing` package for this purpose.

In [13]:
from typing import Union

def kcal_portion(volume: Union[int, float], kcal_drink: Union[int, float]) -> float:
    """
        Computes kilocalories per portion of drink.
        
        arguments: 
            - volume:     a volume of the drink in liters.
            - kcal_drink: kcal in 100 ml. of the drink
        
        returns:
            A value of kcal. per specified portion.
    """
    return volume * 10 * kcal_drink

help(kcal_portion)

Help on function kcal_portion in module __main__:

kcal_portion(volume: Union[int, float], kcal_drink: Union[int, float]) -> float
    Computes kilocalories per portion of drink.
    
    arguments: 
        - volume:     a volume of the drink in liters.
        - kcal_drink: kcal in 100 ml. of the drink
    
    returns:
        A value of kcal. per specified portion.
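
To see that annotations are indeed not enforced at run time, consider the following sketch (the function `double` is a hypothetical example). Python accepts a string where an `int` is annotated; only an external static type checker (for example mypy) would flag such a call.

```python
def double(value: int) -> int:
    # the annotation says `int`, but Python does not check it when the function is called
    return value * 2


result_number = double(21)
result_text = double("ab")  # no error: `*` repeats the string instead

print(result_number, result_text)
print(double.__annotations__)  # annotations are simply stored as metadata
```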



### Filtering

There is also a concept of *filtering* which can be implemented with a list comprehension.

Suppose that we want to select only positive volumes from the list of volumes. Then we can use the following comprehension:

In [14]:
typical_cup_volumes[0] = 0
print(typical_cup_volumes)

volumes_positive = [volume for volume in typical_cup_volumes if volume > 0]
print("Positive volumes are:", *volumes_positive)

[0, 0.18, 0.33, 0.5, 1]
Positive volumes are: 0.18 0.33 0.5 1


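We can also combine filtering with mapping. It is also possible to use a conditional expression with `else` in the output part of a comprehension. In that case no element is dropped: elements that fail the condition are replaced by the `else` value, as in the cell below.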

In [15]:
kilocalories_portions = [kcal_portion(volume, kilocalories_drink) if volume > 0 else None 
                         for volume in typical_cup_volumes]
print(*kilocalories_portions, sep=' | ')

kilocalories_portions_replacena = [value if value is not None else 0 for value in kilocalories_portions]
print(*kilocalories_portions_replacena, sep=' | ')

None | 66.6 | 122.10000000000001 | 185.0 | 370
0 | 66.6 | 122.10000000000001 | 185.0 | 370
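
The difference between a trailing `if` clause (filtering) and a conditional expression with `else` (mapping every element) can be sketched side by side. The names `volumes` and `kcal_per_100ml` and their values are hypothetical and only mirror the data used above.

```python
# hypothetical values mirroring the example above
volumes = [0, 0.25, 0.5, 1]   # liters
kcal_per_100ml = 40

# filtering and mapping: zero volumes are dropped from the result
kcal_positive = [volume * 10 * kcal_per_100ml for volume in volumes if volume > 0]

# mapping with a conditional expression: every element is kept, zeros become None
kcal_or_none = [volume * 10 * kcal_per_100ml if volume > 0 else None for volume in volumes]

print(kcal_positive)
print(kcal_or_none)
```

Note that the first result is shorter than the input list, while the second always has the same length as the input.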


### Reducing

Another helpful concept is *reducing*. That is, we can retrieve some useful information (e.g. some statistic) from a list, i.e. we reduce it to one particular number.

Let's see how we can take a maximum element from the list above.

In [16]:
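# For a moment, suppose that the first element of the typical_cup_volumes list is None
typical_cup_volumes[0] = None

def get_max_safe(_list: list):
    # keep only real numbers: drop None, other non-numeric values and booleans
    # (bool is a subclass of int, so it has to be excluded explicitly with `not`)
    _list_dropna = [elem for elem in _list 
                    if isinstance(elem, (int, float)) and not isinstance(elem, bool)]
    return max(_list_dropna)

# kilocalories_portions from the previous cell also starts with None
get_max_safe(kilocalories_portions)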

370
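
The idea of reducing can also be written out explicitly with `functools.reduce`, which folds a sequence into a single value by repeatedly applying a two-argument function. A minimal sketch on hypothetical numbers (`values` is a made-up list):

```python
from functools import reduce

# hypothetical values used only for illustration
values = [66.6, 122.1, 185.0, 370]

# keep the larger of the running result and the next element
maximum = reduce(lambda largest, value: value if value > largest else largest, values)

# sum the elements, starting the accumulation from 0
total = reduce(lambda accumulated, value: accumulated + value, values, 0)

print(maximum, total)
```

In practice, built-in functions such as `max`, `min` and `sum` cover the most common reductions.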

## Dictionaries

Dictionaries can be considered as a collection of *key* / *value* pairs. Note that only a *hashable* object can be a key. For example, lists are unhashable, so they cannot be used as keys.
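
A quick sketch of what hashability means for keys (the dictionary `opening_hours` and its contents are hypothetical): a tuple of strings works as a key, while a list raises a `TypeError`.

```python
# hypothetical data: (street, number) tuples used as keys
opening_hours = {("Roc Boronat", "138"): "9-18"}
opening_hours[("Balmes", "132-134")] = "10-20"   # tuples of strings are hashable

try:
    opening_hours[["Doctor Aiguader", "80"]] = "8-16"   # a list is not hashable
except TypeError as error:
    message = str(error)

print(opening_hours)
print("TypeError:", message)
```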

In [17]:
user = {'name': 'Foo', 'score': 55}

print("The keys in the dictionary can be accessed through `dict.keys(...)` method:", user.keys())
print("The values in the dictionary can be accessed through `dict.values(...)` method:", user.values(), "\n")

# You can also access the key / value pairs through the `dict.items(...)` method
for key, value in user.items():
    print("The %s of the user is %s." % (key, value))

The keys in the dictionary can be accessed through `dict.keys(...)` method: dict_keys(['name', 'score'])
The values in the dictionary can be accessed through `dict.values(...)` method: dict_values(['Foo', 55]) 

The name of the user is Foo.
The score of the user is 55.


You can access a value by its key in the following way:

In [18]:
print("Specify the key in the square brackets:", user['name'])
print("You can also use the `get` method:", user.get('name'))

Specify the key in the square brackets: Foo
You can also use the `get` method: Foo
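
The two ways of access differ when the key is missing. A small sketch with a hypothetical dictionary `player`: square brackets raise a `KeyError`, while `get` returns `None` or a default value that you provide.

```python
# hypothetical record used only for illustration
player = {"name": "Foo", "score": 55}

try:
    player["time_spent"]            # square brackets raise KeyError for a missing key
except KeyError as error:
    missing_key = error.args[0]

time_spent = player.get("time_spent")              # returns None instead of raising
time_spent_default = player.get("time_spent", 0)   # or a default value of your choice

print(missing_key, time_spent, time_spent_default)
```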


You can also change a value or create a new entry of a dictionary.

In [19]:
user['name'] = 'Gru'
user['time_spent'] = 20
user

{'name': 'Gru', 'score': 55, 'time_spent': 20}

You can feel the importance of dictionaries when working with data. For example, some data can be specified as a list of dictionaries like this:

In [20]:
users = [{'name': 'Foo', 'score': 55}, {'name': 'Lu', 'score': 56}]

Data can also come in a nested dictionary format. You typically encounter this type of data when parsing websites.
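
A minimal sketch of working with nested data is shown below. The structure and contents of `response` are hypothetical and only resemble what a parsed JSON document might look like.

```python
# hypothetical nested record, similar to what a parsed JSON response might look like
response = {
    "user": {"name": "Foo", "contacts": {"email": "foo@example.com"}},
    "scores": [55, 60],
}

email = response["user"]["contacts"]["email"]   # chain keys to go deeper
last_score = response["scores"][-1]             # values can be lists as well

# `get` with a default avoids a KeyError when a key may be absent
phone = response["user"].get("contacts", {}).get("phone", "unknown")

print(email, last_score, phone)
```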

## Set types: sets, frozensets

There are also set types at your disposal. They are efficient data structures that store unordered collections of unique elements. Sets are typically used when checking whether a value of a variable belongs to a specific set of values, or when set operations (like intersection, union, set difference and so on) are needed.

In [23]:
filtering_parameters = {"id", "account", "balance"}

user_data = {"id": 1, "name": "foo", "account": 1337, "date_open": "21/12/21", "balance": 550}
user_data_filtered = {}  # Note that this line creates an empty DICTIONARY, not a set. 
                         # To create an empty set use `set()`

for parameter in user_data:  # note that by default such a loop iterates over the dictionary keys.
    if parameter in filtering_parameters:
        user_data_filtered[parameter] = user_data.get(parameter)
        
user_data_filtered

{'id': 1, 'account': 1337, 'balance': 550}

The logical expression `parameter in filtering_parameters` represents a containment test. The same test works with other collections, such as `list`, `tuple` and `range` objects, and even with strings.
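
A few containment tests on other types, as a sketch with made-up values:

```python
# hypothetical values used only for illustration
in_list = 0.33 in [0.33, 0.5]
in_tuple = "Balmes" in ("Balmes", "Roc Boronat")
in_range = 7 in range(0, 10, 2)        # False: the range contains 0, 2, 4, 6, 8
in_string = "data" in "data science"   # for strings, `in` tests for a substring
not_in_set = "name" not in {"id", "account", "balance"}

print(in_list, in_tuple, in_range, in_string, not_in_set)
```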

Note that sets contain only unique values, i.e. if you convert your `list` using the `set(...)` function, all duplicates are removed in the resulting set.

In [24]:
# Suppose that we have some data on occupations of individuals
data = [{"name": "Foo", "occupation": "data analyst"}, {"name": "Lu", "occupation": "software engineer"}, 
        {"name": "Bro", "occupation": "data analyst"}, {"name": "Gru", "occupation": "electrical engineer"}]

n_individuals = len(data)
occupations = [None] * len(data)

for i in range(n_individuals):
    occupations[i] = data[i]["occupation"]
    
unique_occupations = set(occupations)

print("Unique occupations are:", ', '.join(unique_occupations)) 
print("Number of unique occupations in the data is:", len(unique_occupations))

Unique occupations are: software engineer, data analyst, electrical engineer
Number of unique occupations in the data is: 3
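
The index-based loop above can also be expressed with comprehensions. A possible sketch, using a shortened hypothetical copy of the data (`records`):

```python
# hypothetical records in the same format as `data` above
records = [{"name": "Foo", "occupation": "data analyst"},
           {"name": "Lu", "occupation": "software engineer"},
           {"name": "Bro", "occupation": "data analyst"}]

# a list comprehension collects all occupations, duplicates included
occupations_all = [record["occupation"] for record in records]

# a set comprehension builds the set of unique occupations directly
occupations_unique = {record["occupation"] for record in records}

# sets are unordered, so sort before printing to get a stable result
print(sorted(occupations_unique), len(occupations_unique))
```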


Examples of set operations are:

In [25]:
users_donating = ["Foo", "Lu", "Gru"]  # suppose that these are regularly donating users of your application
users_active = ["Bro", "Lu", "Gru"]  # these are the most active users

print("Difference:", set(users_donating) - set(users_active))  # note that order matters
print("Intersection:", set(users_donating) & set(users_active))
print("Union:", set(users_donating) | set(users_active)) 
print("Symmetric difference:", set(users_donating) ^ set(users_active)) 

Difference: {'Foo'}
Intersection: {'Lu', 'Gru'}
Union: {'Lu', 'Bro', 'Gru', 'Foo'}
Symmetric difference: {'Bro', 'Foo'}


Note that you can modify a set.

In [26]:
names = {'Gru', 'Foo', 'Lu'}
names.add('Bro')
names

{'Bro', 'Foo', 'Gru', 'Lu'}

That is why sets are unhashable in Python (if you call the built-in function `hash(...)` on a set, you get a `TypeError`). You can use a `frozenset`, which is an *immutable* and *hashable* version of a set.

In [27]:
names = frozenset(['Gru', 'Foo', 'Lu'])
names

frozenset({'Foo', 'Gru', 'Lu'})
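
Because a `frozenset` is hashable, it can serve as a dictionary key or as an element of another set. A short sketch with hypothetical data (`team_scores` is a made-up dictionary):

```python
# hypothetical example: frozensets as dictionary keys
team_scores = {frozenset(["Gru", "Lu"]): 10}
team_scores[frozenset(["Lu", "Gru"])] += 5   # same elements, same key (order is irrelevant)

try:
    hash({"Gru", "Lu"})            # a regular set is mutable, hence unhashable
except TypeError as error:
    message = str(error)

print(team_scores)
print("TypeError:", message)
```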