 # Collections

 Collections are used to store multiple objects. This chapter covers the four main built-in collection types in Python: lists, tuples, dictionaries, and sets.

 ## Lists

 ### Creating lists

 Lists are used to store multiple objects, for example:

In [1]:
characters = ["Luke Skywalker", "Darth Vader", "Yoda", "Obi-Wan Kenobi"]
print(characters)
print(type(characters))

['Luke Skywalker', 'Darth Vader', 'Yoda', 'Obi-Wan Kenobi']
<class 'list'>


 ### Accessing individual list items

 Individual list items are accessed by their index. Indexing starts at 0, for example:

In [2]:
print(characters[0])  # first element
print(characters[1])  # second element
print(characters[2])  # third element

Luke Skywalker
Darth Vader
Yoda


In [3]:
# negative indices refer to the position relative to the end
print(characters[-1])  # last element
print(characters[-2])  # second to last element

Obi-Wan Kenobi
Yoda


 ### Accessing multiple list items (slicing)

 Accessing multiple items in a list at once is called slicing. Slicing uses the colon (:) to separate the start index (included) from the stop index (excluded), for example:

In [4]:
print(characters[0:2])
print(characters[:2])
print(characters[2:])
print(characters[-2:])
print(characters[:-2])

['Luke Skywalker', 'Darth Vader']
['Luke Skywalker', 'Darth Vader']
['Yoda', 'Obi-Wan Kenobi']
['Yoda', 'Obi-Wan Kenobi']
['Luke Skywalker', 'Darth Vader']
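
 The following minimal sketch illustrates how the start and stop positions of a slice behave. It uses a hypothetical list `planets` that is not part of the examples above, and it also shows that slices running past the end of a list do not raise an error, whereas single-item access does:

```python
# hypothetical example list, used only for illustration
planets = ["Tatooine", "Hoth", "Endor", "Naboo"]

# start is included, stop is excluded: this returns the items at index 1 and 2
print(planets[1:3])

# the length of a slice is stop - start (as long as both lie within the list)
print(len(planets[1:3]))

# slices are forgiving: an out-of-range stop is simply cut off at the end
print(planets[2:100])

# single-item access is not forgiving: planets[100] would raise an IndexError
```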



 As with strings, a second colon can be used to specify a step value. For example, `characters[::2]` means "every second item in the list".

In [5]:
print(characters[::2])  # every second item in the list
print(characters[1::2])  # every second item in the list starting from the second item

['Luke Skywalker', 'Yoda']
['Darth Vader', 'Obi-Wan Kenobi']


 ### Modifying individual list items

In [6]:
characters[0] = "Boba Fett"
print(characters)

['Boba Fett', 'Darth Vader', 'Yoda', 'Obi-Wan Kenobi']


 ### Adding and removing individual items:

In [7]:
# Adding an item
characters.append("Jar Jar Binks")
print(characters)

['Boba Fett', 'Darth Vader', 'Yoda', 'Obi-Wan Kenobi', 'Jar Jar Binks']


In [8]:
# Removing an item
characters.remove("Jar Jar Binks")
print(characters)

['Boba Fett', 'Darth Vader', 'Yoda', 'Obi-Wan Kenobi']
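
 Besides `append()` and `remove()`, lists offer a few other ways to add and remove single items. The sketch below uses a hypothetical list `ships` (not part of the examples above) to show `insert()`, `pop()` and `del`, and what happens when `remove()` is called with a value that is not in the list:

```python
# hypothetical example list, used only for illustration
ships = ["X-wing", "TIE fighter", "Millennium Falcon"]

# insert() adds an item at a given index instead of at the end
ships.insert(1, "Slave I")
print(ships)

# pop() removes the item at an index (default: the last one) and returns it
last_ship = ships.pop()
print(last_ship)

# del removes an item by index without returning it
del ships[0]
print(ships)

# remove() raises a ValueError if the value is not in the list
try:
    ships.remove("Death Star")
except ValueError as error:
    print(f"ValueError: {error}")
```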


 ### Adding another list (i.e. concatenating lists):

In [9]:
# using the + operator
characters = characters + ["Palpatine", "Kylo"]
print(characters)
# using the extend() method
characters.extend(["Ackbar", "Organa"])
print(characters)
# careful with the append() method: a list passed to it is added as a single (nested) element
characters.append(["Palpatine", "Kylo"])
print(characters)

['Boba Fett', 'Darth Vader', 'Yoda', 'Obi-Wan Kenobi', 'Palpatine', 'Kylo']
['Boba Fett', 'Darth Vader', 'Yoda', 'Obi-Wan Kenobi', 'Palpatine', 'Kylo', 'Ackbar', 'Organa']
['Boba Fett', 'Darth Vader', 'Yoda', 'Obi-Wan Kenobi', 'Palpatine', 'Kylo', 'Ackbar', 'Organa', ['Palpatine', 'Kylo']]
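
 One difference between the `+` operator and `extend()` is worth knowing: `+` builds a new list, while `extend()` changes the existing one. The following sketch makes this visible with two hypothetical names, `crew` and `crew_alias`, that initially refer to the same list:

```python
# hypothetical example list, used only for illustration
crew = ["Han", "Leia"]
crew_alias = crew  # both names refer to the same list object

# extend() changes the existing list, so the change is visible through both names
crew.extend(["Chewbacca"])
print(crew_alias)

# + creates a new list; crew now refers to it, crew_alias still refers to the old one
crew = crew + ["Lando"]
print(crew)
print(crew_alias)
```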


 ### Conveniently generating lists

 #### Repeated elements

 Lists of repeated elements can be easily created using the multiplication operator (*), for example:

In [10]:
zeros = [0] * 5
print(zeros)
product_categories = ["Product A"] * 3 + ["Product B"] * 2
print(product_categories)

[0, 0, 0, 0, 0]
['Product A', 'Product A', 'Product A', 'Product B', 'Product B']
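
 A word of caution when repeating lists that themselves contain lists: the multiplication operator repeats references to the same inner list rather than copying it. The sketch below uses the hypothetical names `grid` and `safe_grid`; the second variant relies on a list comprehension, which may not have been introduced yet at this point:

```python
# repeating a list of lists copies references, not the inner lists
grid = [[0] * 3] * 2
grid[0][0] = 1
print(grid)  # both rows change, because they are the same list object

# a list comprehension creates an independent inner list for every row
safe_grid = [[0] * 3 for _ in range(2)]
safe_grid[0][0] = 1
print(safe_grid)  # only the first row changes
```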


 #### Numeric ranges

 Lists of numeric ranges can be easily created using the `range()` function, for example:

In [11]:
numbers = list(range(1, 6))
print(numbers)

[1, 2, 3, 4, 5]
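
 As with slicing, the stop value of `range()` is excluded, and an optional third argument sets the step. A short sketch (the variable names are chosen for illustration only):

```python
# an optional third argument sets the step
even_numbers = list(range(0, 10, 2))
print(even_numbers)

# a negative step counts downwards; the stop value (0) is still excluded
countdown = list(range(5, 0, -1))
print(countdown)
```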


 ### Lists of mixed types

 Lists can contain objects of different types, for example:

In [12]:
another_list = [1, "hello", 3.0]
print(another_list)
print(type(another_list))

[1, 'hello', 3.0]
<class 'list'>


 ### Strings behave like lists

 Strings are sequences of characters and can therefore be indexed and sliced in the same way as lists (unlike lists, however, they cannot be modified in place), for example:

In [13]:
s = "hello world"
print(s[0])
print(s[0:5])
print(s[:5])

h
hello
hello
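
 There is one important difference between strings and lists: strings are immutable. The following sketch, using a hypothetical variable `greeting`, shows the error raised by item assignment and one possible way to build a modified string instead:

```python
greeting = "hello world"

# strings are immutable: item assignment raises a TypeError
try:
    greeting[0] = "H"
except TypeError as error:
    print(f"TypeError: {error}")

# to "change" a string, build a new one from slices instead
capitalized = "H" + greeting[1:]
print(capitalized)

# list() turns a string into an actual list of characters
print(list("abc"))
```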


 ### Nested lists

In [14]:
nested_list = [[1, 2, 3], [4, 5, 6]]
print(nested_list)
print(nested_list[0])
print(nested_list[1])
print(nested_list[0][0])
print(nested_list[0][1])

[[1, 2, 3], [4, 5, 6]]
[1, 2, 3]
[4, 5, 6]
1
2


 ### Sorting lists

 Lists can be sorted or reversed in place, for example:

In [15]:
numbers = [5, 7, 2, 1, 3, 4, 6]
numbers.sort()
print(numbers)
numbers.reverse()
print(numbers)

[1, 2, 3, 4, 5, 6, 7]
[7, 6, 5, 4, 3, 2, 1]
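
 The `sort()` and `reverse()` methods change the list itself and return `None`. If the original order should be kept, the built-in `sorted()` function can be used instead. A minimal sketch with a hypothetical list `scores`:

```python
# hypothetical example list, used only for illustration
scores = [5, 7, 2, 1]

# sorted() returns a new sorted list and leaves the original untouched
sorted_scores = sorted(scores)
print(sorted_scores)
print(scores)

# reverse=True sorts in descending order
descending_scores = sorted(scores, reverse=True)
print(descending_scores)

# sort() works in place and returns None, so its result should not be assigned
result = scores.sort()
print(result)
print(scores)
```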


 ### The len() function

 The `len()` function returns the number of items in a list, tuple, string, dict, or set, for example:

In [16]:
print(len([1, 2, 3]))
print(len("hello"))

3
5


 <blockquote><b>&#x1F517; Data Preparation &amp; Analysis</b>

 * lists are omnipresent in data preparation and analysis: they are used to store data or results of data analysis, etc.

 * in fact, columns of *pandas* DataFrames are (special kinds of) lists of numbers, strings etc.

 * when reading data from a directory, lists are used to collect and select the file paths

 * in machine learning, lists are used to store the results of cross-validation, hyperparameter tuning, etc.

 </blockquote>

 ### Exercise: Lists

 #### Tasks:

 1. create a variable called *numbers* with the value [1,2,3,4,5] and print it on the screen

 2. add the values [6,7,8] to *numbers* and print it on the screen

 3. add the value 9 to *numbers* and print it on the screen

 4. add the values [10,11,12] to *numbers* and print it on the screen

 5. remove the value 1 from *numbers* and print it on the screen

 6. sort *numbers* and print it on the screen

 7. BONUS: use the reverse() method to reverse *numbers* and print it on the screen

 8. BONUS: use the index() method to find the index of the value 5 in *numbers* and print it on the screen

 9. BONUS: use the count() method to count the number of times the value 5 appears in *numbers* and print it on the screen

 10. print the number of words in "May the Force be with you" on the screen

 #### Solution:

In [17]:
# 1. create a variable called *numbers* with the value [1,2,3,4,5] and print it on the screen
numbers = [1, 2, 3, 4, 5]
print(numbers)

# 2. add the values [6,7,8] to *numbers* and print it on the screen
numbers = numbers + [6, 7, 8]
print(numbers)

# 3. add the value 9 to *numbers* and print it on the screen
numbers.append(9)
print(numbers)

# 4. add the values [10,11,12] to *numbers* and print it on the screen
numbers.extend([10, 11, 12])
print(numbers)

# 5. remove the value 1 from *numbers* and print it on the screen
numbers.remove(1)
print(numbers)

# 6. sort *numbers* and print it on the screen
numbers.sort()
print(numbers)

# 7. BONUS: reverse *numbers* and print it on the screen
numbers.reverse()
print(numbers)

# 8. BONUS: find the index of the value 5 in *numbers* and print it on the screen
print(numbers.index(5))

# 9. BONUS: count the number of times the value 5 appears in *numbers* and print it on the screen
print(numbers.count(5))

# 10. print the number of words in "May the Force be with you" on the screen
print(len("May the Force be with you".split()))

[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, 6, 7, 8]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2]
7
1
6


 ## Tuples

 Tuples are similar to lists but are immutable. This means that once a tuple is created, it cannot be modified.



 ### Creating tuples

 Tuples are created using parentheses (), for example:

In [18]:
t = (1, 2, 3)
print(t)
print(type(t))

(1, 2, 3)
<class 'tuple'>


 ### Accessing and slicing tuples

 Tuples can be accessed and sliced in the same way as lists, for example:

In [19]:
print(t[0])
print(t[0:2])

1
(1, 2)


 ### Immutability of tuples

 But contrary to lists, tuples cannot be modified, for example:

In [20]:
# t[0] = 4 # this will raise an error
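
 To see the error without stopping the notebook, the assignment can be wrapped in `try`/`except`. The sketch below uses a hypothetical tuple `point` and also shows tuple unpacking, a common way to read the items of a tuple into separate variables:

```python
# hypothetical example tuple, used only for illustration
point = (1, 2, 3)

# item assignment on a tuple raises a TypeError
try:
    point[0] = 4
except TypeError as error:
    print(f"TypeError: {error}")

# the tuple is unchanged
print(point)

# unpacking assigns each item of the tuple to its own variable
x, y, z = point
print(x, y, z)
```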

 <blockquote><b>&#x1F517; Data Preparation &amp; Analysis</b>

 * in most cases, you will not require tuples for data preparation and analysis and will use lists instead

 * however, many common packages in Python return tuples, e.g. the *os.path.split()* function returns a tuple

 * furthermore, some libraries expect tuples as input or return them as output, e.g. scikit-learn, a popular machine learning library, uses tuples for parameters such as `ngram_range=(1, 2)`

 </blockquote>
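
 As a small illustration of a standard-library function that returns a tuple, the following sketch calls `os.path.split()` on a hypothetical path (the file does not need to exist) and unpacks the result:

```python
import os

# "data/raw/measurements.csv" is a hypothetical path used only for illustration
parts = os.path.split("data/raw/measurements.csv")
print(parts)
print(type(parts))

# the returned tuple can be unpacked directly into two variables
directory, filename = parts
print(directory)
print(filename)
```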

 ## Dictionaries

 Dictionaries are used to store key-value pairs. Each key is unique within a dictionary and is used to look up its value.


 ### Creating dictionaries

 Dictionaries are created using curly braces {}, for example:

In [21]:
droid = {"name": "R2-D2", "type": "Astromech", "creator": "Anakin Skywalker"}
print(droid)
print(type(droid))

{'name': 'R2-D2', 'type': 'Astromech', 'creator': 'Anakin Skywalker'}
<class 'dict'>


 ### Accessing dictionary items

 Dictionary values can be accessed using their key, either with square brackets or with the `get()` method, for example:

In [22]:
print(droid["name"])
print(droid.get("type"))
# get also allows a default value for missing keys
print(droid.get("model", "Unknown"))

R2-D2
Astromech
Unknown
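
 The two access styles differ in how they handle missing keys. The sketch below uses a hypothetical dictionary `ship` (not part of the examples above) to show the difference, together with the `in` operator for checking whether a key exists:

```python
# hypothetical example dictionary, used only for illustration
ship = {"name": "Millennium Falcon", "class": "Light freighter"}

# square brackets raise a KeyError for missing keys
try:
    print(ship["pilot"])
except KeyError as error:
    print(f"KeyError: {error}")

# get() returns None (or the given default) instead of raising an error
print(ship.get("pilot"))

# the in operator checks whether a key exists
print("name" in ship)
print("pilot" in ship)
```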


 ### Returning all keys and values

In [23]:
print(droid.keys())  # get all keys as a dynamic view (updates if dict changes, set operations work)
print(list(droid))  # get a static list of keys
print(droid.values())  # get all values as a dynamic view
print(list(droid.values()))  # get a static list of values
print(droid.items())  # get all key-value pairs as a dynamic view of tuples

dict_keys(['name', 'type', 'creator'])
['name', 'type', 'creator']
dict_values(['R2-D2', 'Astromech', 'Anakin Skywalker'])
['R2-D2', 'Astromech', 'Anakin Skywalker']
dict_items([('name', 'R2-D2'), ('type', 'Astromech'), ('creator', 'Anakin Skywalker')])


 The following example demonstrates the difference between dynamic views and static lists: views reflect later changes to the dictionary, lists do not.

In [24]:
values_view = droid.values()
keys_view = droid.keys()
values_list = list(droid.values())
keys_list = list(droid)
droid["new_key"] = "new_value"
print(f"Dynamic view (keys): {keys_view}")
print(f"Static list (keys): {keys_list}")
print(f"Dynamic view (values): {values_view}")
print(f"Static list (values): {values_list}")

Dynamic view (keys): dict_keys(['name', 'type', 'creator', 'new_key'])
Static list (keys): ['name', 'type', 'creator']
Dynamic view (values): dict_values(['R2-D2', 'Astromech', 'Anakin Skywalker', 'new_value'])
Static list (values): ['R2-D2', 'Astromech', 'Anakin Skywalker']


 ### Adding/Updating items

In [25]:
droid["companion"] = "C3PO"  # adding a new key-value pair
print(droid)
droid["companion"] = "C-3PO"  # updating an existing key-value pair
print(droid)

{'name': 'R2-D2', 'type': 'Astromech', 'creator': 'Anakin Skywalker', 'new_key': 'new_value', 'companion': 'C3PO'}
{'name': 'R2-D2', 'type': 'Astromech', 'creator': 'Anakin Skywalker', 'new_key': 'new_value', 'companion': 'C-3PO'}


 ### Removing items

In [26]:
del droid["creator"]
print(droid)

{'name': 'R2-D2', 'type': 'Astromech', 'new_key': 'new_value', 'companion': 'C-3PO'}
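
 An alternative to `del` is the `pop()` method, which removes a key and returns its value. The following sketch, using a hypothetical dictionary `ship`, also shows how a default value avoids an error for missing keys:

```python
# hypothetical example dictionary, used only for illustration
ship = {"name": "Millennium Falcon", "class": "Light freighter", "pilot": "Han Solo"}

# pop() removes a key and returns its value
pilot = ship.pop("pilot")
print(pilot)

# with a default value, pop() does not fail for missing keys
owner = ship.pop("owner", "Unknown")
print(owner)

# del raises a KeyError for missing keys
try:
    del ship["owner"]
except KeyError as error:
    print(f"KeyError: {error}")

print(ship)
```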


 ### Combining ('merging') dictionaries

 Combining dictionaries is sometimes referred to as 'merging'. This can be done using the update() method, for example:

In [27]:
droid_extras = {"job": "navigator", "special_skill": "slicing"}
print(droid_extras)
droid.update(droid_extras)
print(droid)

{'job': 'navigator', 'special_skill': 'slicing'}
{'name': 'R2-D2', 'type': 'Astromech', 'new_key': 'new_value', 'companion': 'C-3PO', 'job': 'navigator', 'special_skill': 'slicing'}


 Dictionaries can also be merged using the `|` operator (Python 3.9 or later) and the `**` unpacking operator. Unlike `update()`, both create a new dictionary and leave the originals unchanged:

In [28]:
print(droid | droid_extras)  # using the | operator
print({**droid, **droid_extras})  # using the ** unpacking operator

{'name': 'R2-D2', 'type': 'Astromech', 'new_key': 'new_value', 'companion': 'C-3PO', 'job': 'navigator', 'special_skill': 'slicing'}
{'name': 'R2-D2', 'type': 'Astromech', 'new_key': 'new_value', 'companion': 'C-3PO', 'job': 'navigator', 'special_skill': 'slicing'}
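
 When both dictionaries contain the same key, the value from the right-hand dictionary wins. A minimal sketch with two hypothetical dictionaries, `defaults` and `overrides` (the `|` operator assumes Python 3.9 or later):

```python
# hypothetical example dictionaries, used only for illustration
defaults = {"color": "white", "language": "binary"}
overrides = {"color": "blue"}

# for keys present in both dictionaries, the right-hand value is kept
merged = defaults | overrides  # assumes Python 3.9 or later
print(merged)

# the ** unpacking variant follows the same rule
merged_unpacked = {**defaults, **overrides}
print(merged_unpacked)

# neither variant modifies the original dictionaries
print(defaults)
```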


 ### Nested and mixed dictionaries

 Dicts can be nested and contain different types of objects, for example:

In [29]:
mixed_dict = {
    "a": 1,
    "b": ["x", 2, "z"],
    "c": {"d": 4, "e": 5},
    "d": True,
    "e": 3.0,
    "f": "abc",
}
print(mixed_dict)
print(mixed_dict["a"])
print(type(mixed_dict["a"]))
print(mixed_dict["b"])
print(type(mixed_dict["b"]))
print(mixed_dict["c"])
print(type(mixed_dict["c"]))
print(mixed_dict["d"])
print(type(mixed_dict["d"]))
print(mixed_dict["e"])
print(type(mixed_dict["e"]))
print(mixed_dict["f"])
print(type(mixed_dict["f"]))

{'a': 1, 'b': ['x', 2, 'z'], 'c': {'d': 4, 'e': 5}, 'd': True, 'e': 3.0, 'f': 'abc'}
1
<class 'int'>
['x', 2, 'z']
<class 'list'>
{'d': 4, 'e': 5}
<class 'dict'>
True
<class 'bool'>
3.0
<class 'float'>
abc
<class 'str'>
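
 Values inside nested collections are reached by chaining square brackets, one pair per level. The sketch below uses a hypothetical dictionary `record` to illustrate this for a nested list and a nested dictionary:

```python
# hypothetical example dictionary, used only for illustration
record = {"id": 7, "tags": ["x", "y"], "position": {"row": 2, "column": 5}}

# first the key of the outer dictionary, then the index within the list
print(record["tags"][0])

# first the key of the outer dictionary, then the key of the inner dictionary
print(record["position"]["row"])

# nested values can be updated in the same way
record["position"]["row"] = 3
print(record["position"])
```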


 <blockquote><b>&#x1F517; Data Preparation &amp; Analysis</b>

 * in *pandas*, dictionaries are commonly used to create *DataFrames* and to add, rename, or replace columns

 * *DataFrames* behave in some ways like dictionaries: the keys are the column names and the values are the columns themselves (which resemble lists)

 * dictionaries are also used to store *named* results of data analysis (e.g. results = {"accuracy": 0.95, "precision": 0.85})

 </blockquote>

 ## Sets

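 Sets are used to store unique values. They are unordered, so the order in which items are printed may differ from the order in which they were added.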


 ### Creating sets

 Sets are created using curly braces {} containing at least one item (empty braces create a dictionary; use `set()` for an empty set), for example:

In [30]:
jedi = {"Anakin Skywalker", "Luke Skywalker", "Yoda"}
print(jedi)
print(type(jedi))

{'Anakin Skywalker', 'Luke Skywalker', 'Yoda'}
<class 'set'>
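
 Because empty curly braces are already taken by dictionaries, an empty set has to be created with `set()`. The same function also converts other collections into sets. A short sketch (all variable names are chosen for illustration only):

```python
# empty curly braces create a dictionary, not a set
empty_dict = {}
print(type(empty_dict))

# an empty set is created with set()
empty_set = set()
print(type(empty_set))

# set() also converts other collections, such as lists, into sets
droids = set(["R2-D2", "C-3PO", "R2-D2"])
print(len(droids))
```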


 A key advantage of sets is that they automatically discard duplicates:

In [31]:
jedi_with_duplicates = {"Anakin Skywalker", "Luke Skywalker", "Yoda", "Anakin Skywalker"}
print(jedi_with_duplicates)

{'Anakin Skywalker', 'Luke Skywalker', 'Yoda'}
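
 This property is often used to remove duplicates from a list. The sketch below shows two possible approaches on a hypothetical list `visits`: converting to a set (which loses the original order, so the result is sorted here) and `dict.fromkeys()`, which keeps the order of first appearance:

```python
# hypothetical example list, used only for illustration
visits = ["Hoth", "Endor", "Hoth", "Naboo", "Endor"]

# a set removes duplicates but has no defined order, so the result is sorted
unique_sorted = sorted(set(visits))
print(unique_sorted)

# dictionary keys are unique and keep their insertion order
unique_in_order = list(dict.fromkeys(visits))
print(unique_in_order)
```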


 ### Set operations

 Sets are particularly useful for membership testing and for identifying differences and overlap between collections:

In [32]:
sith = {"Palpatine", "Anakin Skywalker", "Ben Solo"}
print("Palpatine" in jedi)  # False
print("Anakin Skywalker" in sith)  # True
print(jedi.intersection(sith))  # members in both sets
print(jedi & sith)  # same as intersection, but using the & operator
print(jedi.difference(sith))  # jedi that never turned into sith
print(jedi - sith)  # same as difference, but using the - operator
print(sith - jedi)  # sith that were never jedi
print(jedi.union(sith))  # all members of both sets without duplication
print(jedi | sith)  # same as union, but using the | operator

False
True
{'Anakin Skywalker'}
{'Anakin Skywalker'}
{'Luke Skywalker', 'Yoda'}
{'Luke Skywalker', 'Yoda'}
{'Palpatine', 'Ben Solo'}
{'Ben Solo', 'Yoda', 'Anakin Skywalker', 'Palpatine', 'Luke Skywalker'}
{'Ben Solo', 'Yoda', 'Anakin Skywalker', 'Palpatine', 'Luke Skywalker'}


 <blockquote><b>&#x1F517; Data Preparation &amp; Analysis</b>

 When working with several data sources, set operations can be used to identify unique and/or overlapping column names.
 </blockquote>
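
 A minimal sketch of this idea, assuming two hypothetical lists of column names, `columns_a` and `columns_b`, that stand in for the columns of two data sources:

```python
# hypothetical column names of two data sources
columns_a = ["id", "name", "price"]
columns_b = ["id", "price", "stock"]

# columns that appear in both sources
shared = set(columns_a) & set(columns_b)
print(sorted(shared))

# columns that appear only in the first source
only_in_a = set(columns_a) - set(columns_b)
print(sorted(only_in_a))

# columns that appear in exactly one of the two sources (symmetric difference)
in_one_only = set(columns_a) ^ set(columns_b)
print(sorted(in_one_only))
```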

 ### Adding and removing elements

 You can add elements to a set using the `add()` method and remove elements using the `remove()` method:

In [33]:
jedi.add("Obi-Wan Kenobi")
print(jedi)

jedi.remove("Yoda")
print(jedi)

{'Anakin Skywalker', 'Luke Skywalker', 'Yoda', 'Obi-Wan Kenobi'}
{'Anakin Skywalker', 'Luke Skywalker', 'Obi-Wan Kenobi'}
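
 Note that `remove()` raises an error if the element is not in the set. The `discard()` method does the same job but silently ignores missing elements, as the following sketch with a hypothetical set `pilots` shows:

```python
# hypothetical example set, used only for illustration
pilots = {"Han Solo", "Poe Dameron"}

# discard() does nothing if the element is missing
pilots.discard("Wedge Antilles")
print(len(pilots))

# remove() raises a KeyError if the element is missing
try:
    pilots.remove("Wedge Antilles")
except KeyError as error:
    print(f"KeyError: {error}")

# both methods remove elements that do exist
pilots.discard("Han Solo")
print(pilots)
```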


 ## Exercise: Dictionaries and Sets

 ### Tasks:

 1. create a variable called *d* with the value {"a":1,"b":2,"c":3} and print it on the screen

 2. modify the value of "a" in *d* to 10 and print it on the screen

 3. add the key-value pair "d":4 to *d* and print it on the screen

 4. create a variable called *d2* with the value {"e":5,"f":6} and print it on the screen

 5. use the update() method to add the key-value pairs from *d2* to *d* and print it on the screen

 6. print the keys that exist in both *d* and *d2*

 7. print the keys that exist only in *d2* but not in *d*

 8. Bonus: use the pop() method to remove the key-value pair "b":2 from *d* and print it on the screen

 9. Bonus: solve task 5 using the ** operator instead of the update() method

 10. Bonus: use the clear() method to remove all key-value pairs from *d* and print it on the screen

 ### Solution:

In [34]:
# 1. create a variable called *d* with the value {"a":1,"b":2,"c":3} and print it on the screen
d = {"a": 1, "b": 2, "c": 3}
print(d)

# 2. modify the value of "a" in *d* to 10 and print it on the screen
d["a"] = 10
print(d)

# 3. add the key-value pair "d":4 to *d* and print it on the screen
d["d"] = 4
print(d)

# 4. create a variable called *d2* with the value {"e":5,"f":6} and print it on the screen
d2 = {"e": 5, "f": 6}
print(d2)

# 5. use the update() method to add the key-value pairs from *d2* to *d* and print it on the screen
d.update(d2)
print(d)

# 6. print the keys that exist in both *d* and *d2*
print(d.keys() & d2.keys())

# 7. print the keys that exist only in *d2* but not in *d*
print(d2.keys() - d.keys())

# 8. Bonus: use the pop() method to remove the key-value pair "b":2 from *d* and print it on the screen
print(d.pop("b"))
print(d)

# 9. Bonus: solve task 5 using the ** operator instead of the update() method
d3 = {**d, **d2}
print(d3)

# 10. Bonus: use the clear() method to remove all key-value pairs from *d* and print it on the screen
d.clear()
print(d)

{'a': 1, 'b': 2, 'c': 3}
{'a': 10, 'b': 2, 'c': 3}
{'a': 10, 'b': 2, 'c': 3, 'd': 4}
{'e': 5, 'f': 6}
{'a': 10, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
{'e', 'f'}
set()
2
{'a': 10, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
{'a': 10, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
{}
