# Session 3: Setting up Python and Data Types

## Learning Outcomes
By the end of this session, you will be able to:

*   Choose between lists, sets, dictionaries and tuples as the most appropriate data structure
*   Add, access and remove data from these structures





## 3.3 Lists - What are lists? Create, add to, remove from, access by index


To begin, we can declare a variable just as we learned in the previous session, but this time we assign it a pair of square brackets instead of a single value.

In [None]:
x = []

This creates a new, empty list: as we can see, there are no values between the brackets. To demonstrate this, we can print our list variable to see what it contains.
If we now press Ctrl+Enter to run the cell, the output shows just a pair of empty brackets.


In [None]:
print(x)

[]


To populate our list, we can place a few items between the brackets, separated by commas. Printing our list now shows that it contains the numbers 1 to 5.

In [2]:
x = [1, 2, 3, 4, 5]
print(x)

[1, 2, 3, 4, 5]


An interesting feature of lists in Python is that a single list can hold mixed data types. Our list currently contains only integers, but we can simply replace one of them with a word, otherwise known as a string.
Printing the list shows that it now contains not only integers, but a string as well.


In [None]:
x = [1,2, "three", 4, 5]
print(x)

[1, 2, 'three', 4, 5]


Lists are a great way to manage quantities of data that would be unmanageable if we had to declare a new variable for each value, but we still need to be able to access each value individually. To do this, we use 'indexing'. Each value in our list is known as an element, and the elements are kept in order, which allows us to refer to each one by number. To try this, we modify our print command so that it prints a specific element rather than the entire list: we place square brackets just after the variable name and put a number inside them to reference the element we want.


In [None]:
print(x[1])

2


Now you can see that we have just a single value instead of all five. You might have noticed that although we used the index '1', Python printed '2'. This is because indexing in Python, as in many other languages, starts at '0', so to get the first element in the list we must use index '0', like so:



In [4]:
print(x[0])

1
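Because counting starts at zero, a five-element list has valid indexes 0 to 4. The short sketch below (the list name 'numbers' is introduced only for illustration) shows the first and last valid indexes, and the IndexError Python raises if we step past the end:

```python
numbers = [1, 2, 3, 4, 5]

# Valid indexes run from 0 up to len(numbers) - 1
first = numbers[0]
last = numbers[len(numbers) - 1]
print(first, last)  # prints: 1 5

# Index 5 does not exist in a five-element list, so Python raises an IndexError
try:
    numbers[5]
except IndexError as error:
    print("IndexError:", error)  # prints: IndexError: list index out of range
```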


Another useful feature of Python is 'negative indexing', which lets us count from the end of the list instead. If we change our index to -1, we get the last element in our list; if we change it to -2, we get the second-to-last element.


In [None]:
print(x[-1])

5


In [None]:
print(x[-2])

4
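One way to think about negative indexes: index -k refers to the same element as index len(list) - k. A small sketch, again on an illustrative list named 'numbers':

```python
numbers = [1, 2, 3, 4, 5]
n = len(numbers)  # 5

# A negative index -k refers to the same element as index n - k
print(numbers[-1], numbers[n - 1])  # prints: 5 5
print(numbers[-2], numbers[n - 2])  # prints: 4 4
```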


These are great ways to access individual elements, but sometimes we want to access more than one value at a time. For this, we can use 'slices', which extract a subset of our list using a colon-separated pair of integers as the index. If we use index '0', followed by a colon and index '2', Python returns the first two elements of our list.


In [None]:
print(x[0:2])

[1, 2]


It is important to note that a slice includes the element at the first index but excludes the element at the second. That is, index '2' holds the value 'three', but it is not returned to us. We can also leave out the first index to return all the items up to (but not including) a given index, like so:

In [None]:
print(x[:2])

[1, 2]



Or, all the items from an index to the end of the list, like so: 

In [None]:
print(x[2:])

['three', 4, 5]
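To check our understanding of the start and stop rules, here is a small sketch on a copy of our mixed list (named 'values' here, purely for illustration). It confirms that a slice holds stop minus start elements, that [:k] and [k:] split a list cleanly, and that a slice is a new list:

```python
values = [1, 2, "three", 4, 5]

# The start index is included and the stop index is excluded,
# so values[0:2] holds 2 - 0 = 2 elements
first_two = values[0:2]
print(first_two, len(first_two))  # prints: [1, 2] 2

# values[:k] and values[k:] split the list into two parts with no overlap
k = 2
print(values[:k] + values[k:] == values)  # prints: True

# A slice is a new list, so changing it leaves the original untouched
first_two[0] = 100
print(values)  # prints: [1, 2, 'three', 4, 5]
```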


For something a little more advanced, such as printing every second element, we can add a "step" to our slice. To do this, we include a second colon followed by the size of the step. For example, to print every second element, starting at index 0 and stopping just before the last element (index -1), we could do this:

In [6]:
print(x[0:-1:2])

[1, 'three']


If we want to start at the very first element and include the last one, we simply leave out the start and stop values of the slice. This looks a little odd, but it makes our code very concise!

In [13]:
print(x[::2])

[1, 'three', 5]
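Steps can also be negative, which walks the list backwards. A brief sketch on the same mixed values (the name 'values' is illustrative):

```python
values = [1, 2, "three", 4, 5]

# A negative step walks backwards; leaving out start and stop
# gives a reversed copy of the whole list
print(values[::-1])  # prints: [5, 4, 'three', 2, 1]

# A step combines with a start index as usual: every second element from index 1
print(values[1::2])  # prints: [2, 4]
```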


In many cases, we would like to iterate through our list and do something with each element. For this, let's consider a 'for' loop. In each iteration, the loop variable takes the value of the next element in the list, and we can use that value to perform calculations.

In [11]:
x = [1, 2, 3, 4, 5]  # reset x to whole numbers so the arithmetic below makes sense
for this_value in x:
    print("The current value is:" + str(this_value))
    print("10 times the current value is:" + str(this_value*10))

The current value is:1
10 times the current value is:10
The current value is:2
10 times the current value is:20
The current value is:3
10 times the current value is:30
The current value is:4
10 times the current value is:40
The current value is:5
10 times the current value is:50
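The loop above gives us each value directly. When we also need each element's position, Python's built-in 'enumerate' function yields the index and the value together. A minimal sketch on the same five numbers (the names 'numbers' and 'labels' are introduced only for illustration):

```python
numbers = [1, 2, 3, 4, 5]

# enumerate() yields an (index, value) pair for each element, starting at 0
labels = []
for index, this_value in enumerate(numbers):
    label = "Index " + str(index) + " holds " + str(this_value)
    labels.append(label)
    print(label)
```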


Now that we’ve looked at how to index our lists, we must also consider how we might go about finding an index of a certain value. Let’s start by creating a new list containing some capital cities. 

In [14]:
capital_list = ["Beijing", "Kinshasa", "Moscow","Paris","London"]

Our list is small enough for us to see that 'Paris' is at index '3'. But what if we didn't know where 'Paris' was in our list? This is a common scenario, because we do not want to have to inspect our data manually to find the location of interesting data points. If we can do this programmatically instead, we can perform much more complex operations later on. To find the index at which 'Paris' is stored, we can use the list method 'index'. We type the name of our list, followed by a dot, the word 'index', and a pair of round brackets containing the value we are searching for. If we run this, it returns '3', the index at which 'Paris' is located.


In [15]:
print(capital_list.index("Paris"))

3
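If the value we search for is not in the list, 'index' raises a ValueError. A minimal sketch of guarding against that by checking with 'in' first (the names 'cities', 'city' and 'position' are illustrative, and "Rome" is chosen simply because it is absent):

```python
cities = ["Beijing", "Kinshasa", "Moscow", "Paris", "London"]

# index() raises a ValueError when the value is absent,
# so we check membership with 'in' first
city = "Rome"
if city in cities:
    position = cities.index(city)
else:
    position = None
print(position)  # prints: None

print(cities.index("Paris"))  # prints: 3
```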


Now that we know how to declare lists containing different items, and how to retrieve specific elements, we also need to know how to add and remove them. Let's first look at adding to our list: if we wish to add a new capital to our list, let's say "Lima", we can simply append it to our list like so:

In [16]:
capital_list.append("Lima")
print(capital_list)

['Beijing', 'Kinshasa', 'Moscow', 'Paris', 'London', 'Lima']


We are able to do this because lists in Python are 'mutable', meaning we can add, change and delete their values. We can also insert items at specific indexes: if we wish to add yet another city to our list, we can place it at whatever index we like.

In [17]:
capital_list.insert(3,"Edinburgh")
print(capital_list)

['Beijing', 'Kinshasa', 'Moscow', 'Edinburgh', 'Paris', 'London', 'Lima']


We can now see that we've added "Edinburgh" at index 3, between "Moscow" and "Paris". In some cases, we may wish to add a sequence of new values to our list. We could do it like this:

In [None]:
capital_list.append("Cairo")
capital_list.append("Kampala")
capital_list.append("Jakarta")
capital_list.append("Tokyo")

print(capital_list)

['Beijing', 'Kinshasa', 'Moscow', 'Edinburgh', 'Paris', 'London', 'Lima', 'Cairo', 'Kampala', 'Jakarta', 'Tokyo']


Or we could simply use 'extend' to add a whole list of new capitals in a single call.

In [19]:
capital_list.extend(["New Delhi", "Harare", "Sucre", "Berlin"])

print(capital_list)
print(len(capital_list))

['Beijing', 'Kinshasa', 'Moscow', 'Edinburgh', 'Paris', 'London', 'Lima', 'Cairo', 'Kampala', 'Jakarta', 'Tokyo', 'New Delhi', 'Harare', 'Sucre', 'Berlin']
15


Now that we have quite an extensive list of capitals, we can look at ways of removing elements from it. If we wish to remove an element at a certain position, we can simply 'pop' it off.

In [20]:
capital_list.pop(0)
print(capital_list)
print(len(capital_list))

['Kinshasa', 'Moscow', 'Edinburgh', 'Paris', 'London', 'Lima', 'Cairo', 'Kampala', 'Jakarta', 'Tokyo', 'New Delhi', 'Harare', 'Sucre', 'Berlin']
14


We can see that the first element has now been removed. Remember that we can also use negative indexing, as before, to remove elements from the end of the list. Another common operation is removing a specific value, rather than the value at a certain index. So if we wanted to remove "Berlin" from our list, we would do this:

In [21]:
capital_list.remove("Berlin")
print(capital_list)
print(len(capital_list))

['Kinshasa', 'Moscow', 'Edinburgh', 'Paris', 'London', 'Lima', 'Cairo', 'Kampala', 'Jakarta', 'Tokyo', 'New Delhi', 'Harare', 'Sucre']
13


We can also remove a slice using the "del" statement, with the same slicing syntax we saw earlier. Here, the elements at indexes 2, 3 and 4 are removed, since the stop index, 5, is excluded as usual.

In [29]:
del capital_list[2:5]
print(capital_list)
print(len(capital_list))

# Lists can also be created from another sequence (here, a tuple) with list()
country_list = list(("USA", "UK"))
print(country_list)
print(type(country_list))

['Kinshasa', 'Moscow', 'Lima', 'Cairo', 'Kampala', 'Jakarta', 'Tokyo', 'New Delhi', 'Harare', 'Sucre']
10
['USA', 'UK']
<class 'list'>
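To see a few of these list methods behave in isolation, here is a short sketch on small, hypothetical lists (the names 'appended', 'extended', 'cities', 'removed_last' and 'removed_first' are introduced only for illustration). It contrasts 'append' with 'extend', shows that 'remove' deletes only the first match, and shows that 'pop' returns the element it removes:

```python
# append() adds its argument as ONE element, so a list argument ends up nested
appended = ["Paris", "London"]
appended.append(["Rome", "Oslo"])
print(appended)  # prints: ['Paris', 'London', ['Rome', 'Oslo']]

# extend() adds each element of its argument individually
extended = ["Paris", "London"]
extended.extend(["Rome", "Oslo"])
print(extended)  # prints: ['Paris', 'London', 'Rome', 'Oslo']

cities = ["Lima", "Sucre", "Lima", "Harare"]

# remove() deletes only the FIRST matching value
cities.remove("Lima")
print(cities)  # prints: ['Sucre', 'Lima', 'Harare']

# pop() returns the element it removes; with no argument it takes the last one
removed_last = cities.pop()
removed_first = cities.pop(0)
print(removed_last, removed_first)  # prints: Harare Sucre
print(cities)  # prints: ['Lima']

# remove() raises a ValueError if the value is not in the list
try:
    cities.remove("Berlin")
except ValueError:
    print("Berlin is not in the list")
```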


To summarise the key characteristics of lists that we have uncovered in this screencast: lists are both ordered and mutable. Mutable means that we can add, remove or even change the values in our list, and because it is ordered, we can perform these additions, removals and changes at specific indexes.

That's all for this screencast, next we'll look at putting some of these tasks into practice.

## 3.4 Tuples

To begin, let's declare and initialise a tuple. To do this, we declare a variable as before, assign it a comma-separated sequence of values, and enclose that sequence in brackets. A key difference to note at this point is that the lists we looked at earlier were enclosed in square brackets, whereas tuples are enclosed in round brackets, known as parentheses.

In [None]:
first_tuple = (12, 34, 1.44, 56, 1, "tree")
print(first_tuple)

(12, 34, 1.44, 56, 1, 'tree')

As expected, we now have our first tuple, and, just like a list, it can hold mixed data types. In our example, we have integers, floats and a string.
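One detail worth knowing when declaring tuples: apart from the empty tuple '()', it is the commas, not the parentheses, that make a tuple. A minimal sketch (all names are illustrative):

```python
not_a_tuple = (5)    # just the integer 5 inside parentheses
one_item = (5,)      # the trailing comma makes a one-element tuple
no_parens = 12, 34   # commas alone also build a tuple
empty = ()           # the empty tuple is the one case that needs only parentheses

print(type(not_a_tuple))  # prints: <class 'int'>
print(type(one_item))     # prints: <class 'tuple'>
print(no_parens)          # prints: (12, 34)
```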


As tuples are ordered, just like lists, we can use the same techniques we learned in the previous screencast to access the values within them. For example, to find the third value, we would index it like so:

In [None]:
print(first_tuple[2])

1.44

You may have noticed that we used index 2 to address the third position, and if you recall, this is because python begins its indexing at zero.

Much like lists, we can also slice a tuple to get a smaller tuple, like this:

In [None]:
print(first_tuple[1:3])

(34, 1.44)

And negative indexing also lets us access the last items in a tuple.

In [None]:
print(first_tuple[-1])

tree

As mentioned before, the key difference between tuples and lists is that lists are mutable and tuples are immutable. This means that after we create our tuple, we can't change the values in it as we did with our lists. For example, if we try to change the first item in our tuple, we are met with the following error:

In [None]:
first_tuple[0] = 3.14

TypeError: 'tuple' object does not support item assignment

As you can see, the tuple object does not support item assignment. 

And if we try to append another value to the end, like we did with our lists, we get another error.



In [None]:
first_tuple.append("Oak")

AttributeError: 'tuple' object has no attribute 'append'

In summary, tuples are another data structure that can be used to store values or objects in order, but unlike lists, they are of fixed length, and immutable.
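Because a tuple cannot be changed in place, "adding" to one really means building a new tuple. A minimal sketch using concatenation with '+', plus a round trip through a list when several edits are needed (the names 'bigger_tuple', 'as_list' and 'changed_tuple' are illustrative):

```python
first_tuple = (12, 34, 1.44, 56, 1, "tree")

# + creates a brand-new tuple; first_tuple itself is unchanged
bigger_tuple = first_tuple + ("Oak",)
print(bigger_tuple)  # prints: (12, 34, 1.44, 56, 1, 'tree', 'Oak')
print(first_tuple)   # prints: (12, 34, 1.44, 56, 1, 'tree')

# For several edits, convert to a list, modify it, then convert back
as_list = list(first_tuple)
as_list[0] = 3.14
changed_tuple = tuple(as_list)
print(changed_tuple)  # prints: (3.14, 34, 1.44, 56, 1, 'tree')
```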

If we have a group of values that we will not need to change, tuples can be a good choice for two reasons: first, they make our code safer by preventing alterations to our data, and second, they are slightly faster. This speed-up is very difficult to notice in small pieces of code, but there are many cases in which we find ourselves running the same lines of code a very large number of times, and it is in these cases that we should consider using tuples where we can.

That's all for this screencast. Next, we'll look at another data structure: sets.

## 3.5 Sets

There are two different syntaxes for declaring a set, and both perform the same task. The first looks very much like the syntax for lists and tuples: a comma-separated sequence of values, enclosed this time in 'curly brackets', also known as 'curly braces'.

In [None]:
first_set = {"one", 2, 3, 4.0, 5}
print(first_set)

{2, 3, 4.0, 5, 'one'}


The second is a little more explicit: we call the built-in 'set' function and pass it a sequence of values, here a tuple, which is why we see double parentheses.

In [None]:
first_set = set(("one", 2, 3, 4.0, 5))
print(first_set)

{2, 3, 4.0, 5, 'one'}


While both of these methods create the same set, the difference between sets and the previous data structures becomes apparent as soon as we look at what was printed. The string was the first element in our declaration, but in the printed set it appears at the end. This demonstrates that sets are unordered.

This has further implications for how we access the items in our data structures. Until now, we have relied on indexing: when we wanted the first item in a list or tuple, it was trivial to access it. As sets are unordered, we can no longer use this technique. This may seem like a disadvantage, but in exchange, sets give us a highly optimised way of checking whether a value is already in the set. If we simply ask Python whether a value is 'in' our set, it returns True or False, like this:




In [None]:
"one" in first_set

True

In [None]:
1 in first_set

False

Another key difference is that sets do not contain duplicate values. When we try to create a set that contains 2 or more instances of the same value, all but one are discarded. For example, in this set, we can see that all the animals are recorded twice.

In [None]:
second_set = { "cat", "cat", "dog", "goat", "dog", "goat" }
print(second_set)


{'goat', 'cat', 'dog'}


After creating it, we can see that only unique values remain.
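Because duplicates are discarded, converting a list to a set is a common way to find its unique values. A short sketch (the list 'animals' is illustrative); since sets have no fixed order, we sort the result when we want a predictable display:

```python
animals = ["cat", "cat", "dog", "goat", "dog", "goat"]

# Converting to a set keeps one copy of each value
unique_animals = set(animals)
print(len(animals), len(unique_animals))  # prints: 6 3

# sorted() returns a list, giving a predictable order for display
print(sorted(unique_animals))  # prints: ['cat', 'dog', 'goat']
```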

Sets are also mutable, so we can add a new value using the 'add' method:

In [None]:
second_set.add("horse")
print(second_set)

{'horse', 'goat', 'cat', 'dog'}


and remove a value using the 'remove' method:

In [None]:
second_set.remove("dog")
print(second_set)

{'horse', 'goat', 'cat'}
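Note that 'remove' raises a KeyError if the value is not in the set; the related 'discard' method removes the value if it is present and silently does nothing otherwise. A minimal sketch on an illustrative set named 'pets':

```python
pets = {"horse", "goat", "cat"}

# remove() raises a KeyError when the value is missing
try:
    pets.remove("dog")
except KeyError:
    print("dog was not in the set")

# discard() removes a value if present and does nothing otherwise
pets.discard("dog")
pets.discard("cat")
print(sorted(pets))  # prints: ['goat', 'horse']
```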


So, like lists, sets can be changed. If we want an immutable version, we use a 'frozenset'. To create one, we call the built-in 'frozenset' function, much as we did to create our first set.

In [None]:
third_set = frozenset(("one", 2, 3, 4.0, 5))
print(third_set)

frozenset({2, 3, 4.0, 5, 'one'})


Now we can see that, in addition to the curly braces, it also says 'frozenset'. What this means, is that, if we try to add another value, like this:

In [30]:
third_set.add("ANOTHER_VALUE")
print(third_set)

AttributeError: 'frozenset' object has no attribute 'add'

we get an error, meaning that this set cannot be altered. This is very much like the tuples we looked at in the previous screencast, except that a frozenset is unordered: we cannot access it by index, but we can quickly check for the presence of a value without iterating through it and checking each entry.

Sets also have two very useful built-in methods: union and intersection. Each set can only consist of unique values, but sometimes we want to understand the relationship between two sets. If we create two new sets like so:

In [None]:
s1 = { 1 , 2, 1, 2, 3, 4, 3, 3 }
s2 = { 2, 7, 4, 4 , 8, 9, 10 }

print(s1)
print(s2)

{1, 2, 3, 4}
{2, 4, 7, 8, 9, 10}


We can see that all duplicate values are removed. If we want to know which values are contained in both sets, we can use intersection. We reference our first set, s1, and call its built-in method by typing a dot followed by 'intersection'. We then pass our second set, s2, inside the round brackets.

In [None]:
print(s1.intersection(s2))

{2, 4}


And as you can see, it prints 2 and 4, as these are the two values contained in both sets.

If we want to find all the unique values across both sets, we can use union. Once again, we reference our first set, s1, and call its built-in method, this time 'union', passing our second set, s2, inside the round brackets. We then get all the values that appear in either of our sets.

In [None]:
print(s1.union(s2))

{1, 2, 3, 4, 7, 8, 9, 10}
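Python also offers operator forms of these methods: '&' for intersection and '|' for union. The sketch below checks that they agree with the method calls, and adds the closely related 'difference' method, which keeps the values of one set that are not in the other:

```python
s1 = {1, 2, 3, 4}
s2 = {2, 4, 7, 8, 9, 10}

# & and | are operator forms of intersection() and union()
print(s1 & s2 == s1.intersection(s2))  # prints: True
print(s1 | s2 == s1.union(s2))         # prints: True

# difference() keeps the values of s1 that are NOT in s2;
# sorted() returns them as a list in a predictable order
print(sorted(s1.difference(s2)))  # prints: [1, 3]
```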


That's it for this video, next we look at our final data structure, Dictionaries!

## 3.6 Dictionaries 

Dictionaries look quite different from the previous structures, as they consist of key-value pairs. In this piece of code, we have some module codes paired with the names of those modules. Each module code, for example "ITNPBD1", is paired with its module name, in this case "Mathematics for Big Data", using a colon, and the entries in the dictionary are separated by commas, just like in the previous data structures.



In [None]:
modules = {'ITNPBD1':'Mathematics for Big Data',
        'ITNPBD2':'Representing and Manipulating Data',
        'ITNPBD3':'Relational and Non-Relational Databases'}

print(modules)

{'ITNPBD1': 'Mathematics for Big Data', 'ITNPBD2': 'Representing and Manipulating Data', 'ITNPBD3': 'Relational and Non-Relational Databases'}


If we created a list with these details, we might store a module code in the first index, followed by its name in the second index, but things would become difficult when we have more than one module on a course!

Now that we have our dictionary containing the modules for a degree course, we would like to be able to access its entries. For example, if we were asked, "what is the name of the module with the code 'ITNPBD2'?", we would use the following code:



In [None]:
print(modules['ITNPBD2'])

Representing and Manipulating Data


Here we can see that we use the same syntax as we would to access an index in a list, but instead of a number representing the index, we use the key, which in this case is the module code.
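If we use square brackets with a key that does not exist, Python raises a KeyError. The dictionary method 'get' returns None, or a default we supply, instead. A minimal sketch on a smaller illustrative dictionary named 'course', where the code "ITNPBD9" is deliberately made up so that it is missing:

```python
course = {"ITNPBD1": "Mathematics for Big Data",
          "ITNPBD2": "Representing and Manipulating Data"}

# Square brackets raise a KeyError when the key is missing
try:
    course["ITNPBD9"]
except KeyError:
    print("No module with code ITNPBD9")

# get() returns None, or a default we supply, instead of raising an error
print(course.get("ITNPBD1"))             # prints: Mathematics for Big Data
print(course.get("ITNPBD9"))             # prints: None
print(course.get("ITNPBD9", "Unknown"))  # prints: Unknown
```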

 Dictionaries are also mutable, so changing the values assigned to keys is done quite easily. If we want to rename one of our modules, we simply address the entry using its key, and assign its new name, as we see here:

In [None]:
modules['ITNPBD2'] = "Big Data Analytics"
print(modules['ITNPBD2'])

Big Data Analytics


Adding a new entry to our dictionary is also quite easy, and uses the same syntax as we just used to access the module name. Here, we will add a new module to our course.

In [None]:
modules['ITNPBD4'] = "Machine Learning"
print(modules['ITNPBD4'])

Machine Learning


And now if we print our dictionary, 

In [None]:
print(modules)

{'ITNPBD1': 'Mathematics for Big Data', 'ITNPBD2': 'Big Data Analytics', 'ITNPBD3': 'Relational and Non-Relational Databases', 'ITNPBD4': 'Machine Learning'}


we can see that our full course now contains a new module. It is at this point that we need to consider how this simplicity could also cause problems for us. If we wish to change the value of one of our modules, we would use the following:

In [None]:
modules['ITNPBD4'] = "Research Project"
print(modules)

{'ITNPBD1': 'Mathematics for Big Data', 'ITNPBD2': 'Big Data Analytics', 'ITNPBD3': 'Relational and Non-Relational Databases', 'ITNPBD4': 'Research Project'}


And we can see that the name of our fourth module has changed. But if we had accidentally mistyped the module code, we would have created a new module instead of changing the value!

In [None]:
modules['TNPBD4'] = "Placement"
print(modules)

{'ITNPBD1': 'Mathematics for Big Data', 'ITNPBD2': 'Big Data Analytics', 'ITNPBD3': 'Relational and Non-Relational Databases', 'ITNPBD4': 'Research Project', 'TNPBD4': 'Placement'}


As we can see, instead of changing the value of our 4th module, we now have a 5th!
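One way to guard against this mistake is to check whether a key already exists before assigning to it. A minimal sketch; the helper 'rename_module' and the dictionary 'course' are hypothetical, written for this example, and not part of Python:

```python
def rename_module(course_modules, code, new_name):
    """Rename an existing module, refusing to create a new key by accident."""
    if code not in course_modules:
        raise KeyError("Unknown module code: " + code)
    course_modules[code] = new_name


course = {"ITNPBD4": "Machine Learning"}

rename_module(course, "ITNPBD4", "Research Project")
print(course)  # prints: {'ITNPBD4': 'Research Project'}

# A mistyped code now raises an error instead of silently adding a key
try:
    rename_module(course, "TNPBD4", "Placement")
except KeyError:
    print("Refused: TNPBD4 is not an existing module code")

print(len(course))  # prints: 1
```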

When we have key-value pairs that we no longer want in our dictionary, we can remove them using the 'del' statement, like so:

In [None]:
del modules['TNPBD4']
print(modules)

{'ITNPBD1': 'Mathematics for Big Data', 'ITNPBD2': 'Big Data Analytics', 'ITNPBD3': 'Relational and Non-Relational Databases', 'ITNPBD4': 'Research Project'}


And we can see that the key, and its corresponding value have been removed from our dictionary.

Dictionaries also have a number of built-in methods that can be very useful. For example, to view all the keys in our dictionary, we simply use dot keys, like so:

In [None]:
print(modules.keys())

dict_keys(['ITNPBD1', 'ITNPBD2', 'ITNPBD3', 'ITNPBD4'])


Or if we want to view all the values, we can use dot values.

In [None]:
print(modules.values())

dict_values(['Mathematics for Big Data', 'Big Data Analytics', 'Relational and Non-Relational Databases', 'Research Project'])


Note that when we use one of these built-in methods, we write the name of our dictionary, followed by a dot, the name of the method, and then brackets, to indicate to Python that we are asking it to perform an action for us.
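A third method, 'items', gives us the key and value together, which is convenient in a 'for' loop. A minimal sketch on an illustrative dictionary named 'course' (the list 'lines' only collects the output):

```python
course = {"ITNPBD1": "Mathematics for Big Data",
          "ITNPBD2": "Big Data Analytics"}

# items() yields (key, value) pairs that a for loop can unpack into two names
lines = []
for code, name in course.items():
    lines.append(code + ": " + name)

print(lines)  # prints: ['ITNPBD1: Mathematics for Big Data', 'ITNPBD2: Big Data Analytics']
```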

In summary, dictionaries give us a data structure of key-value pairs that we access by key, rather than by index (since Python 3.7, they also remember the order in which entries were inserted). This gives us the ability to retrieve information from our data structure based on its meaning, rather than its position in the structure.

## 3.7 When to use each structure

N/A