# Data Types

Data structures are exactly what the name suggests: structures which can hold some *data* together. In other words, they are used to store a collection of related data.

There are four built-in data structures in Python - `list`, `tuple`, `dict` and `set`. We will see how to use each of them and how they make life easier for us.

## Lists

A `list` is a data structure that holds an ordered collection of items i.e. you can store a *sequence* of items in a list. This is easy to imagine if you can think of a shopping list where you have a list of items to buy, except that you probably have each item on a separate line in your shopping list whereas in Python you put commas in between them.

The list of items should be enclosed in square brackets so that Python understands that you are specifying a list. Once you have created a list, you can add, remove or search for items in the list. Since we can add and remove items, we say that a list is a *mutable* data type i.e. this type can be altered.

In [2]:
# This is my shopping list
shoplist = ['apple', 'mango', 'carrot', 'banana']

The variable `shoplist` is a shopping list for someone who is going to the market. In shoplist, we only store strings of the names of the items to buy but you can add any kind of object to a list including numbers and even other lists.
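
As a small illustration of that last point, the sketch below builds a list that holds a string, two numbers and another list. The name `mixed` is made up for this example and is not used elsewhere in this notebook.

```python
# A list can mix strings, numbers and even other lists
mixed = ['apple', 3, 2.5, ['mango', 'carrot']]

print(len(mixed))      # the nested list counts as a single item
print(mixed[3])        # the nested list itself
print(mixed[3][0])     # first item of the nested list
```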

In [3]:
print(f'I have {len(shoplist)} items to purchase.')

I have 4 items to purchase.


In [3]:
# Note that end=' ' specifies that we want a space after the output, not a new line
print('These items are:', end=' ') 
for item in shoplist:
    print(item, end=' ')

These items are: apple mango carrot banana 

We have used a `for` loop to iterate through the items of the list. By now, you must have realised that a list is also a sequence. Notice the use of the `end` keyword argument to the `print` function to indicate that we want to end the output with a space instead of the usual line break.

In [4]:
print('I also have to buy rice.')
shoplist.append('rice')
print('My shopping list is now:', shoplist)

I also have to buy rice.
My shopping list is now: ['apple', 'mango', 'carrot', 'banana', 'rice']


We add an item to the list using the `append` method of the list object. Then, we check that the item has indeed been added by passing the list to the `print` function, which prints its contents neatly.

In [5]:
print('I will sort my list now')
shoplist.sort()
print('Sorted shopping list is:', shoplist)

I will sort my list now
Sorted shopping list is: ['apple', 'banana', 'carrot', 'mango', 'rice']


We sort the list by using its `sort` method. It is important to understand that this method affects the list itself and does not return a modified list - this is different from the way strings work. This is what we mean by saying that lists are mutable and that strings are *immutable*.
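
The difference can be checked directly. In this sketch (the variable names are only illustrative), `sort` changes the list in place and returns `None`, the built-in `sorted` function leaves the original untouched and returns a new list, and a string method such as `upper` returns a new string rather than changing the original.

```python
fruits = ['mango', 'apple', 'banana']

# sorted() returns a new list and leaves the original alone
ordered = sorted(fruits)
print(fruits)     # still in the original order
print(ordered)

# list.sort() modifies the list itself and returns None
result = fruits.sort()
print(result)
print(fruits)

# Strings are immutable: upper() returns a new string
word = 'apple'
shout = word.upper()
print(word, shout)
```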

In [6]:
print('The first item I will buy is', shoplist[0])
olditem = shoplist[0]
del shoplist[0]
print('I bought the', olditem)
print('My shopping list is now:', shoplist)

The first item I will buy is apple
I bought the apple
My shopping list is now: ['banana', 'carrot', 'mango', 'rice']


When we finish buying an item in the market, we want to remove it from
the list. We achieve this by using the `del` statement. Here, we mention which
item of the list we want to remove and the `del` statement removes it from the
list for us. We specify that we want to remove the first item from the list and
hence we use `del shoplist[0]` (remember that Python starts counting from 0).

If you want to know all the methods defined by the list object, see `help(list)`
for details.
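
A few of those methods are sketched below on a fresh list, so that the example stands on its own. The list name `basket` and its contents are made up for illustration.

```python
basket = ['banana', 'carrot', 'mango', 'rice']

basket.insert(1, 'bread')      # insert before position 1
print(basket)

basket.remove('carrot')        # remove the first matching value
print(basket)

last = basket.pop()            # remove and return the last item
print(last, basket)

print(basket.index('mango'))   # position of a value in the list
```

Unlike `del`, which removes by position, `remove` looks for a value and `pop` hands the removed item back to you.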

### List comprehension

When programming, frequently we want to transform one type of data into another. As a simple example, consider the following code that computes square numbers:

In [7]:
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print(squares)

[0, 1, 4, 9, 16]


You can make this code simpler using a list comprehension:

In [8]:
nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print(squares)

[0, 1, 4, 9, 16]


List comprehensions can also contain conditions:

In [9]:
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)

[0, 4, 16]


## Milestone: Mean & Variance

Write a program to calculate the mean and variance of 1000 randomly generated numbers in the range [0, 1). The mean $\mu$ and variance $\sigma^2$ of a series of $n$ numbers are:

$$
\begin{eqnarray}
\mu & = & \frac{1}{n}\sum^n_{i=1}{x_i} \\
\sigma^2 & = & \frac{1}{n}\sum^n_{i=1}{(x_i-\mu)^2}
\end{eqnarray}
$$

To implement the program above we need to know how to generate random numbers. For now we'll use a module from the `numpy` package. We'll be looking at `numpy` in more detail later on.

In [4]:
import numpy as np

# Generate 1000 random numbers between 0 and 1
numbers = np.random.random(1000)

# Calculate mean
mean = 0
for number in numbers:
    mean += number
mean /= len(numbers)

# The line below calculates the same mean as the loop above,
# but uses the built-in function sum
mean = sum(numbers) / len(numbers)

# Calculate variance
variance = 0
for number in numbers:
    variance += (number - mean)**2
variance /= len(numbers)

# Print result (mean should be ~0.5 and variance should be ~0.08)
print(f"Mean is {mean:.3f} and variance is {variance:.3f}")

Mean is 0.510 and variance is 0.085


In [5]:
# This implementation performs the same calculation as the one above
# (on a fresh random sample), but uses a list comprehension for the variance
import numpy as np

numbers = np.random.random(1000)

mean = sum(numbers) / len(numbers)
variance = sum([(number - mean)**2 for number in numbers]) / len(numbers)

print(f"Mean is {mean:.3f} and variance is {variance:.3f}")

Mean is 0.494 and variance is 0.084


In [6]:
# This implementation uses numpy to calculate the mean and variance, just to
# get a glimpse at the power of numpy
import numpy as np

numbers = np.random.random(1000)

print(f"Mean is {numbers.mean():.3f} and variance is {numbers.var():.3f}")

Mean is 0.492 and variance is 0.085
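
For comparison, the same calculation can be sketched using only the standard library. This is an alternative to the `numpy` cells above and not part of the original exercise: `random.random()` draws numbers from [0, 1), and `statistics.pvariance` computes the population variance (division by $n$), which matches the formula given for this milestone. The seed value is an arbitrary choice, fixed only so that the run is repeatable.

```python
import random
import statistics

random.seed(0)  # arbitrary fixed seed so the run is repeatable

# Generate 1000 random numbers in [0, 1)
numbers = [random.random() for _ in range(1000)]

mean = statistics.fmean(numbers)
# pvariance divides by n, like the formula for this milestone
variance = statistics.pvariance(numbers, mu=mean)

print(f"Mean is {mean:.3f} and variance is {variance:.3f}")
```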


## Milestone: Matrix Multiplication

Write a program which multiplies two 3x3 matrices $A$ and $B$ and stores the result in a separate matrix $C$. Matrix multiplication is given by:

$$ C_{ij} = \sum^n_{k=1}{A_{ik}B_{kj}} $$

where $i=1,\ldots,m$ and $j=1,\ldots,m$. The product $C$ is an array in two dimensions in which $C_{ij}$ is the scalar product of row $i$ of matrix $A$ and column $j$ of matrix $B$. The scalar product is in turn given by:

$$s=\sum^n_{i=1}{a_ib_i} $$

A matrix is a 2D entity, so ideally we represent it in something which can express this "2D-ness". Remember that a list can store any sequence of objects in it, even other lists. So we can store a matrix as a list of lists.

In [3]:
# Create matrices
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
C = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# Perform the multiplication
for i in range(3):
    for j in range(3):
        for k in range(3):
            C[i][j] += A[i][k] * B[k][j]
            
print(C)

[[30, 36, 42], [66, 81, 96], [102, 126, 150]]
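
The same idea can be written so that it follows the description above more literally: each entry of $C$ is the scalar product of a row of $A$ and a column of $B$. The helper names `scalar_product` and `matmul` are made up for this sketch, which assumes the matrices are lists of lists with compatible shapes.

```python
def scalar_product(a, b):
    """Sum of element-wise products of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def matmul(A, B):
    """Multiply two matrices stored as lists of lists."""
    columns_of_B = list(zip(*B))  # transpose B to get its columns
    return [[scalar_product(row, col) for col in columns_of_B] for row in A]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

C = matmul(A, B)
print(C)
```

Because nothing here is tied to the number 3, the same function also works for matrices of other sizes.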


In [4]:
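# This implementation uses numpy to perform the matrix multiplication.
# Plain numpy arrays with the @ operator are used here, since the numpy
# documentation no longer recommends the numpy.matrix class.
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(A @ B)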

[[ 30  36  42]
 [ 66  81  96]
 [102 126 150]]


## Tuples

Tuples are used to hold together multiple objects. Think of them as similar to
lists, but without the extensive functionality that the list class gives you. One major feature of tuples is that they are **immutable** like strings i.e. you cannot modify tuples.

Tuples are defined by specifying items separated by commas, conventionally within a pair of parentheses.

Tuples are usually used in cases where a statement or a user-defined function
can safely assume that the collection of values i.e. the tuple of values used will not change.
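
To see what immutability means in practice, here is a minimal sketch. The names `point` and `moved` are only illustrative. Assigning to an item of a tuple raises a `TypeError`, so the usual way to "change" a tuple is to build a new one.

```python
point = (1, 2, 3)

try:
    point[0] = 10          # tuples do not support item assignment
except TypeError as error:
    print('Cannot modify a tuple:', error)

# To "change" a tuple, build a new one instead
moved = (10,) + point[1:]
print(moved)
```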

In [None]:
zoo = ('python', 'elephant', 'penguin')
print('Number of animals in the zoo is:', len(zoo))

new_zoo = ('monkey', 'camel', zoo)
print('Number of cages in the new zoo is:', len(new_zoo))
print('All animals in new zoo are:', new_zoo)
print('Animals brought from old zoo are:', new_zoo[2])

print('Last animal brought from old zoo is:', new_zoo[2][2])
print('Number of animals in the new zoo is:', len(new_zoo) - 1 + len(new_zoo[2]))

The variable `zoo` refers to a tuple of items. We see that the `len` function can be used to get the length of the tuple. This also indicates that a tuple is a sequence as well.

We are now shifting these animals to a new zoo since the old zoo is being closed. Therefore, the `new_zoo` tuple contains some animals which are already there along with the animals brought over from the old zoo. Back to reality, note that a tuple within a tuple does not lose its identity.

We can access the items in the tuple by specifying the item’s position within a pair of square brackets just like we did for lists. This is called the indexing operator. We access the third item in `new_zoo` by specifying `new_zoo[2]` and we access the third item within the third item in the `new_zoo` tuple by specifying `new_zoo[2][2]`.

An empty tuple is constructed by an empty pair of parentheses such as `myempty = ()`. However, a tuple with a single item is not so simple. You have to specify it using a comma following the first (and only) item so that Python can differentiate between a tuple and a pair of parentheses surrounding the object in an expression i.e. you have to specify `mytuple = (2,)` if you want a tuple containing the item `2`.

In [None]:
# Empty tuple
mytuple = ()
print("Tuple size:", len(mytuple))

# Tuple with one item (notice the comma)
mytuple = ('one', )
print("Tuple size:", len(mytuple))
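
The reason for the comma becomes clearer when the two forms are compared side by side. In this sketch (the variable names are made up), parentheses alone just group an expression, while the trailing comma is what creates the tuple.

```python
not_a_tuple = (2)     # just the number 2 in parentheses
a_tuple = (2,)        # the trailing comma makes it a tuple

print(type(not_a_tuple))
print(type(a_tuple))
print(len(a_tuple))
```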

## Dictionaries

A dictionary is like an address-book where you can find the address or contact
details of a person by knowing only their name i.e. we associate `keys` (name)
with `values` (details). Note that the key must be unique, just like you cannot
find out the correct information if you have two people with the exact same
name.

Note that you can use only immutable objects (like strings) for the keys of a
dictionary but you can use either immutable or mutable objects for the values of
the dictionary. In practice, this means that you should use only simple
objects, such as strings, numbers or tuples of these, for keys.
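
A short sketch of this rule follows. The dictionaries `grid` and `groups` are made up for illustration. A tuple works as a key, a list is rejected with a `TypeError`, and a list is perfectly fine as a value.

```python
# Strings, numbers and tuples can be used as keys
grid = {(0, 0): 'origin', (1, 0): 'east'}
print(grid[(0, 0)])

# A list is mutable, so it is rejected as a key
try:
    bad = {[0, 0]: 'origin'}
except TypeError as error:
    print('Invalid key:', error)

# Values, on the other hand, can be mutable
groups = {'fruit': ['apple', 'mango']}
groups['fruit'].append('banana')
print(groups)
```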

Pairs of keys and values are specified in a dictionary by using the notation
`d = {key1 : value1, key2 : value2 }`. Notice that each key is separated from
its value by a colon, the pairs are separated from each other by commas, and all
this is enclosed in a pair of curly braces.
Remember that key-value pairs in a dictionary are not sorted in any manner:
since Python 3.7, a dictionary simply keeps its pairs in the order in which they
were inserted. If you want a particular order, then you will have to sort them
yourself before using them.
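
As a sketch of what sorting them yourself can look like, the built-in `sorted` function can be applied either to the keys or to the key-value pairs. The dictionary `prices` is made up for this example.

```python
prices = {'mango': 3, 'apple': 1, 'carrot': 2}

# Iterate over the keys in alphabetical order
for name in sorted(prices):
    print(name, prices[name])

# Sort the key-value pairs by value instead
by_price = sorted(prices.items(), key=lambda pair: pair[1])
print(by_price)
```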

The dictionaries that you will be using are instances/objects of the dict class.

In [8]:
# 'ab' is short for 'a'ddress 'b'ook
ab = { 'Alessio' : 'alessio.magro@um.edu.mt',
       'Zeppi'   : 'zeppi_hafi@golhajt.com.mt',
       'Natalie' : 'natalie.portman@inyourdreams.org',
       'Tom'     : 'tom.tailor@hotmail.com'
}

print("Alessio's address is", ab['Alessio'])

Alessio's address is alessio.magro@um.edu.mt


We create the dictionary `ab` using the notation already discussed. We then access a value by specifying its key using the indexing operator as discussed in the context of lists and tuples. Observe the simple syntax.

In [9]:
# Deleting a key-value pair
del ab['Zeppi']
print(f'\nThere are {len(ab)} contacts in the address-book\n')


There are 3 contacts in the address-book



We can delete key-value pairs using our old friend - the `del` statement. We simply specify the dictionary and the indexing operator for the key to be removed and pass it to the `del` statement. There is no need to know the value corresponding to the key for this operation.

In [10]:
for name, address in ab.items():
    print(f'Contact {name} at {address}')

Contact Alessio at alessio.magro@um.edu.mt
Contact Natalie at natalie.portman@inyourdreams.org
Contact Tom at tom.tailor@hotmail.com


We can access each key-value pair of the dictionary using the `items` method of
the dictionary, which returns a view of tuples where each tuple contains a pair of items - the key followed by the value. We retrieve this pair and assign it to the variables `name` and `address` respectively for each pair using the `for..in`
loop and then print these values in the for-block.

In [None]:
# Adding a key-value pair
ab['Guido'] = 'guido@python.org'

We can add new key-value pairs by simply using the indexing operator to specify
a key and assigning a value to it, as we have done for Guido in the above case.

In [None]:
if 'Guido' in ab:
    print("\nGuido's address is", ab['Guido'])

We can check if a key exists in the dictionary using the `in` operator.
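
The check matters because indexing with a key that is missing raises a `KeyError`. As an alternative sketch, the `get` method of a dictionary returns a default value instead of raising. The dictionary `contacts` and the name `'Larry'` are made up for this example.

```python
contacts = {'Guido': 'guido@python.org'}

# get() returns the value, or the given default when the key is missing
print(contacts.get('Guido', 'unknown'))
print(contacts.get('Larry', 'unknown'))

# Plain indexing raises KeyError for a missing key
try:
    contacts['Larry']
except KeyError as error:
    print('No such contact:', error)
```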

You can also use a dictionary comprehension to easily construct dictionaries:

In [None]:
nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
print(even_num_to_square)

For the list of methods of the `dict` class, see `help(dict)`.

## Sequences

Lists, tuples and strings are examples of sequences, but what are sequences and
what is so special about them?

The major features are *membership tests* (i.e. the `in` and `not in` expressions)
and *indexing operations*, which allow us to fetch a particular item in the
sequence directly.
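
Membership tests look the same for all three kinds of sequence, as this small sketch shows. The variable names are only illustrative. Note that for strings, `in` also matches substrings rather than just single characters.

```python
basket = ['apple', 'mango']
animals = ('python', 'elephant')
word = 'alessio'

in_list = 'apple' in basket             # list
not_in_tuple = 'penguin' not in animals # tuple
in_string = 'less' in word              # strings also match substrings

print(in_list, not_in_tuple, in_string)
```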

The three types of sequences mentioned above - lists, tuples and strings, also
have a *slicing* operation which allows us to retrieve a slice of the sequence i.e. a part of the sequence.

In [13]:
shoplist = ['apple', 'mango', 'carrot', 'banana']
name = 'alessio'

# Indexing or 'subscription' operation
print('Item 0 is', shoplist[0])
print('Item 1 is', shoplist[1])
print('Item 2 is', shoplist[2])
print('Item 3 is', shoplist[3])
print('Item -1 is', shoplist[-1])
print('Item -2 is', shoplist[-2])
print('Character 0 is', name[0])

# Print an empty line
print()

# Slicing on a list
print('Item 1 to 3 is', shoplist[1:3])
print('Item 2 to end is', shoplist[2:])
print('Item 1 to -1 is', shoplist[1:-1])
print('Item start to end is', shoplist[:])

print()

# Slicing on a string
print('characters 1 to 3 is', name[1:3])
print('characters 2 to end is', name[2:])
print('characters 1 to -1 is', name[1:-1])
print('characters start to end is', name[:])

First, we see how to use indexes to get individual items of a sequence. This is
also referred to as the subscription operation. Whenever you specify a number
to a sequence within square brackets as shown above, Python will fetch you the
item corresponding to that position in the sequence. Remember that Python
starts counting numbers from 0. Hence, `shoplist[0]` fetches the first item and
`shoplist[3]` fetches the fourth item in the `shoplist` sequence.

The index can also be a negative number, in which case, the position is calculated from the end of the sequence. Therefore, `shoplist[-1]` refers to the last item in the sequence and `shoplist[-2]` fetches the second last item in the sequence.

The slicing operation is used by specifying the name of the sequence followed by
an optional pair of numbers separated by a colon within square brackets. Note
that this is very similar to the indexing operation you have been using till now. Remember the numbers are optional but the colon isn’t.

The first number (before the colon) in the slicing operation refers to the position from where the slice starts and the second number (after the colon) indicates where the slice will stop at. If the first number is not specified, Python will start at the beginning of the sequence. If the second number is left out, Python will stop at the end of the sequence. Note that the slice returned starts at the start position and will end just before the end position i.e. the start position is included but the end position is excluded from the sequence slice.

Thus, `shoplist[1:3]` returns a slice of the sequence starting at position 1,
includes position 2 but stops at position 3 and therefore a slice of two items is returned. Similarly, `shoplist[:]` returns a copy of the whole sequence.
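
The fact that `[:]` gives a copy, and not just another name for the same list, can be checked with a short sketch. The names `original`, `copy_of_list` and `alias` are made up for this example.

```python
original = ['apple', 'mango', 'carrot', 'banana']

copy_of_list = original[:]     # a new list with the same items
alias = original               # just another name for the same list

original.append('rice')

print(copy_of_list)            # unaffected by the append
print(alias)                   # sees the new item
print(len(original[1:3]))      # positions 1 and 2 only
```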

You can also do slicing with negative positions. Negative numbers are used for
positions from the end of the sequence. For example, `shoplist[:-1]` will return
a slice of the sequence which excludes the last item of the sequence but contains everything else.

You can also provide a third argument for the slice, which is the step for the
slicing (by default, the step size is 1):

In [None]:
shoplist = ['apple', 'mango', 'carrot', 'banana']
print(shoplist[::1])
print(shoplist[::2])
print(shoplist[::3])
print(shoplist[::-1])

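Notice that when the step is 2, we get the items at positions 0, 2, and so on. When
the step size is 3, we get the items at positions 0, 3, and so on. A step of -1 walks through the sequence backwards, so `shoplist[::-1]` returns the items in reverse order.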

Try various combinations of such slice specifications using the Python interpreter interactively i.e. the prompt so that you can see the results immediately. The great thing about sequences is that you can access tuples, lists and strings all in the same way!

## Sets

Sets are *unordered* collections of simple objects. These are used when the
existence of an object in a collection is more important than the order or how
many times it occurs.
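
One common consequence, sketched below with made-up data, is that converting a list to a set discards duplicates, since a set only records whether an item is present.

```python
visits = ['apple', 'mango', 'apple', 'carrot', 'mango']

unique = set(visits)           # duplicates collapse to a single entry
print(len(visits), len(unique))
print('apple' in unique)
```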

Using sets, you can test for membership, check whether one set is a subset of
another, find the intersection between two sets, and so on.

In [15]:
numbers = {1, 2, 3, 4, 5, 6}

# The above is the same as
numbers = set([1, 2, 3, 4, 5, 6])
print(numbers)

In [16]:
# Check for membership
print(1 in numbers)
print(10 in numbers)

In [17]:
# Adding to a set
numbers.add(7)
print(numbers)

In [18]:
# Removing from a set
numbers.remove(7)
print(numbers)

In [19]:
set_1 = {1, 2, 3}
set_2 = {2, 3, 4}

# Set intersection (same as 'set_1 & set_2')
print(set_1.intersection(set_2))

In [20]:
# Set union (same as 'set_1 | set_2')
print(set_1.union(set_2))

In [21]:
# Set difference (same as 'set_1 - set_2')
print(set_1.difference(set_2))

In [22]:
# Check superset
print(set_1.issuperset({1, 2}))

In [24]:
# Check subset
set_3 = {1, 2}
print(set_3.issubset(set_1))
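
The operator forms mentioned in the comments above can be checked against the method calls with a short sketch. It also uses `<=` and `>=`, which are the operator forms of `issubset` and `issuperset`. The result names are made up for this example.

```python
set_1 = {1, 2, 3}
set_2 = {2, 3, 4}

same_intersection = (set_1 & set_2) == set_1.intersection(set_2)
same_union = (set_1 | set_2) == set_1.union(set_2)
same_difference = (set_1 - set_2) == set_1.difference(set_2)
print(same_intersection, same_union, same_difference)

# <= tests for subset, >= tests for superset
print({1, 2} <= set_1)
print(set_1 >= {1, 2})
```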

***
##### You can now work out worksheet 3
