<small><small><i>
All of these python notebooks are available at [https://gitlab.erc.monash.edu.au/andrease/Python4Maths.git]
</i></small></small>

# Data Structures

In simple terms, a data structure is a collection or group of data organised in a particular way.

## Lists

Lists are the most commonly used data structure. Think of a list as a sequence of data enclosed in square brackets, with the items separated by commas. Each item can be accessed by its index value.

A list is declared by assigning '[ ]' or list() to a variable.

In [109]:
a = []

In [110]:
type(a)

list

One can directly assign the sequence of data to a list x as shown.

In [111]:
x = ['apple', 'orange']

### Indexing

In Python, indexing starts from 0, as already seen for strings. Thus the list x, which has two elements, has 'apple' at index 0 and 'orange' at index 1.

In [112]:
x[0]

'apple'

Indexing can also be done in reverse order, so that the last element is accessed first. Here, indexing starts from -1. Thus index -1 is 'orange' and index -2 is 'apple'.

In [113]:
x[-1]

'orange'

As you might have already guessed, x[0] == x[-2] and x[1] == x[-1]. The same idea extends to lists with many more elements.

In [114]:
y = ['carrot','potato']

Here we have declared two lists x and y, each containing its own data. These two lists can in turn be put into another list, say z, whose data is the two lists. A list inside a list is called a nested list, and this is how an array would be declared, as we will see later.

In [115]:
z  = [x,y]
print( z )

[['apple', 'orange'], ['carrot', 'potato']]


Indexing in nested lists can be quite confusing if you do not understand how indexing works in python. So let us break it down and then arrive at a conclusion.

Let us access the data 'orange' in the above nested list.
First, at index 0 there is the list ['apple','orange'] and at index 1 there is another list ['carrot','potato']. Hence z[0] gives us the first list, which contains 'apple' and 'orange'. From this list we take the second element (index 1) to get 'orange'.

In [116]:
print(z[0][1])

orange
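The two indexing steps can also be written out separately. The following is a small sketch that rebuilds the lists from the text; the names `inner` and `fruit` are only illustrative and do not appear elsewhere in the notebook.

```python
# Rebuild the nested list from the text so the example is self-contained.
x = ['apple', 'orange']
y = ['carrot', 'potato']
z = [x, y]

inner = z[0]      # first step: select the inner list at index 0
fruit = inner[1]  # second step: select index 1 of that inner list

print(inner)      # ['apple', 'orange']
print(fruit)      # orange
print(z[1][0])    # carrot
print(z[-1][-1])  # potato (negative indices work at every level)
```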


Lists do not have to be homogeneous. Each element can be of a different type:

In [117]:
["this is a valid list",2,3.6,(1+2j),["a","sublist"]]

['this is a valid list', 2, 3.6, (1+2j), ['a', 'sublist']]

### Slicing

Indexing is limited to accessing a single element. Slicing, on the other hand, accesses a sequence of data inside the list; in other words, it "slices" the list.

Slicing is done by giving the index of the first element required and the index one past the last element required. It is written as parentlist[ a : b ], where a and b are index values from the parent list; the element at index a is included but the element at index b is not. If a is not given the slice starts at the first element, and if b is not given it runs to the end of the list.

In [118]:
num = [0,1,2,3,4,5,6,7,8,9]
print(num[0:4])
print(num[4:])

[0, 1, 2, 3]
[4, 5, 6, 7, 8, 9]


You can also slice a parent list with a fixed step length, written as parentlist[ a : b : step ].

In [119]:
num[:9:3]

[0, 3, 6]
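A few more slices may help to fix the rules. This is a minimal sketch using the same `num` list; the variable names on the left are only illustrative.

```python
num = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

first_four = num[0:4]      # [0, 1, 2, 3]  (index 4 is not included)
every_second = num[2:8:2]  # [2, 4, 6]     (start 2, stop before 8, step 2)
last_three = num[-3:]      # [7, 8, 9]     (negative start counts from the end)
backwards = num[::-1]      # [9, 8, ..., 0] (a negative step walks backwards)

print(first_four, every_second, last_three)
print(backwards)
```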

### Built in List Functions

To find the length of the list or the number of elements in a list, **len( )** is used.

In [120]:
len(num)

10

If the list consists entirely of numbers then **min( )** and **max( )** give the minimum and maximum values in the list. Similarly **sum( )** gives the sum of the elements.

In [121]:
print("min =",min(num),"  max =",max(num),"  total =",sum(num))

min = 0   max = 9   total = 45


In [122]:
max(num)

9

Lists can be concatenated by adding them with '+'. The resultant list contains all the elements of the lists that were added, and it is not a nested list. Multiplying a list by an integer with '*' repeats its elements that many times.

In [123]:
[1,2,3] + [5,4,7]

[1, 2, 3, 5, 4, 7]

In [124]:
[1,2,3]*2

[1, 2, 3, 1, 2, 3]

You may need to check whether a particular element is present in a list. Consider the list below.

In [125]:
names = ['Earth','Air','Fire','Water']

Suppose we want to check whether 'Fire' and 'Space' are present in the list names. A conventional approach would be to use a for loop to iterate over the list with an if condition. But in Python you can use the 'a in b' construct, which returns 'True' if a is present in b and 'False' if not.

In [126]:
'Fire' in names

True

In [127]:
'Space' in names

False
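For comparison, the conventional loop mentioned above could look like the following sketch. The helper name `contains` is hypothetical and is only there to show that `in` gives the same answer with less code.

```python
names = ['Earth', 'Air', 'Fire', 'Water']

def contains(items, target):
    """Loop-based membership test, equivalent to `target in items`."""
    for item in items:
        if item == target:
            return True
    return False

print(contains(names, 'Fire'), 'Fire' in names)    # True True
print(contains(names, 'Space'), 'Space' in names)  # False False
```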

In a list with string elements, **max( )** and **min( )** are still applicable and return the last and first element in lexicographical order, respectively.

In [128]:
mlist = ['bzaa','ds','nc','az','z','klm']
print("max =",max(mlist))
print("min =",min(mlist))

max = z
min = az


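Strings are compared character by character, starting from the first character. 'z' starts with the character that has the highest ASCII value, so it is returned as the maximum, and 'az' is the only element starting with 'a', the lowest, so it is the minimum. But what if numbers are declared as strings?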

In [129]:
nlist = ['1','94','93','1000']
print("max =",max(nlist))
print('min =',min(nlist))

max = 94
min = 1


Even if numbers are declared as strings, they are compared character by character and not by their numeric value. That is why '94' is the maximum ('9' is the largest first character, and '4' beats '3' in the second position) and '1' is the minimum.

But if you want to find the **max( )** string element based on the length of the string, then another parameter `key` can be used to specify the function that generates the value on which to compare. Hence finding the longest and shortest string in `mlist` can be done using the `len` function:

In [130]:
print('longest =',max(mlist, key=len))
print('shortest =',min(mlist, key=len))

longest = bzaa
shortest = z


Any other built-in or user defined function can be used.
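As an illustration, here is a sketch with one user-defined key and one built-in key. The function `last_char` is a hypothetical helper, and using `int` as the key is just one way to compare numeric strings by value.

```python
mlist = ['bzaa', 'ds', 'nc', 'az', 'z', 'klm']
nlist = ['1', '94', '93', '1000']

def last_char(s):
    """User-defined key: compare strings by their final character."""
    return s[-1]

# 'bzaa' ends in 'a', the smallest final character in mlist
print('min by last character =', min(mlist, key=last_char))

# key=int compares the strings by numeric value instead of character by character
print('max as numbers =', max(nlist, key=int))   # 1000
print('min as numbers =', min(nlist, key=int))   # 1
```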

A string can be converted into a list by using the **list()** function, or more usefully using the **split()** method, which breaks strings up based on spaces.

In [131]:
print(list('hello world !'),'Hello   World !!'.split())

s="hola adios"
s.split()
# ? s.split   # IPython/Jupyter only: displays the help text for split

['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', ' ', '!'] ['Hello', 'World', '!!']


**append( )** is used to add a single element at the end of the list.

In [132]:
lst = [1,1,4,8,7]
lst.append(1)
print(lst)

[1, 1, 4, 8, 7, 1]


Appending a list to a list would create a sublist. If a nested list is not what is desired then the **extend( )** function can be used.

In [133]:
lst.extend([10,11,12])
print(lst)

[1, 1, 4, 8, 7, 1, 10, 11, 12]
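The difference between the two methods is easiest to see side by side. A minimal sketch, with list names chosen only for this example:

```python
appended = [1, 2]
appended.append([3, 4])   # the whole list becomes ONE new element
print(appended)           # [1, 2, [3, 4]]
print(len(appended))      # 3

extended = [1, 2]
extended.extend([3, 4])   # each item is added separately
print(extended)           # [1, 2, 3, 4]
print(len(extended))      # 4
```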


**count( )** is used to count the number of times a particular element occurs in the list.

In [134]:
lst.count(1)

3

**index( )** is used to find the index value of a particular element. Note that if there are multiple elements of the same value then the first index value of that element is returned.

In [135]:
lst.index(1)

0

**insert(x,y)** is used to insert an element y at a specified index value x, whereas **append( )** can only add an element at the end.

In [136]:
lst.insert(5, 'name')
print(lst)

[1, 1, 4, 8, 7, 'name', 1, 10, 11, 12]


**insert(x,y)** inserts an element but does not replace the existing one. If you want to replace an element with another element, simply assign the new value to that particular index.

In [137]:
lst[5] = 'Python'
print(lst)

[1, 1, 4, 8, 7, 'Python', 1, 10, 11, 12]


**pop( )** removes the last element from the list and returns it. This is similar to the operation of a stack, so lists can be used as a stack.

In [138]:
lst.pop()

12

An index value can be specified to pop the element at that index.

In [139]:
lst.pop(0)

1

**pop( )** removes an element based on its index value, and the removed element can be assigned to a variable. One can also remove an element by specifying the element itself using the **remove( )** function.

In [140]:
lst.remove('Python')
print(lst)

[1, 4, 8, 7, 1, 10, 11]


An alternative to the **remove( )** function that uses the index value instead is **del**.

In [141]:
del lst[1]
print(lst)

[1, 8, 7, 1, 10, 11]
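The three ways of removing elements behave slightly differently, which the following sketch tries to summarise. It starts from a fresh list; the name `removed_ok` is only illustrative.

```python
lst = [1, 4, 8, 7, 1, 10, 11]

last = lst.pop()   # removes by position (default: last) and returns the value
print(last, lst)   # 11 [1, 4, 8, 7, 1, 10]

lst.remove(1)      # removes the FIRST element equal to 1, returns nothing
print(lst)         # [4, 8, 7, 1, 10]

del lst[0]         # removes by position without returning the value
print(lst)         # [8, 7, 1, 10]

# remove() raises ValueError if the element is not in the list
try:
    lst.remove(99)
    removed_ok = True
except ValueError:
    removed_ok = False
print(removed_ok)  # False
```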


The order of all the elements in the list can be reversed by using the **reverse()** function.

In [142]:
lst.reverse()
print(lst)

[11, 10, 1, 7, 8, 1]


Python offers the built-in operation **sort( )** to arrange the elements in ascending order. Alternatively **sorted()** can be used to construct a copy of the list in sorted order.

In [143]:
lst.sort()
print(lst)
print(sorted([3,2,1])) # another way to sort

[1, 1, 7, 8, 10, 11]
[1, 2, 3]


By default the `reverse` argument of **sort( )** is False. Changing it to True arranges the elements in descending order.

In [144]:
lst.sort(reverse=True)
print(lst)

[11, 10, 8, 7, 1, 1]


Similarly for lists containing string elements, **sort( )** sorts the elements based on their ASCII values in ascending order, and in descending order when reverse=True is specified.

In [145]:
names.sort()
print(names)
names.sort(reverse=True)
print(names)

['Air', 'Earth', 'Fire', 'Water']
['Water', 'Fire', 'Earth', 'Air']


To sort based on length, key=len should be specified as shown.

In [146]:
names.sort(key=len)
print(names)
print(sorted(names,key=len,reverse=True))

['Air', 'Fire', 'Water', 'Earth']
['Water', 'Earth', 'Fire', 'Air']
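One detail that is easy to miss is that **sort( )** changes the list in place and returns `None`, while **sorted()** leaves the original untouched and returns a new list. A small sketch (the names `ordered` and `result` are illustrative):

```python
names = ['Earth', 'Air', 'Fire', 'Water']

ordered = sorted(names)   # new list; names is unchanged
print(names)              # ['Earth', 'Air', 'Fire', 'Water']
print(ordered)            # ['Air', 'Earth', 'Fire', 'Water']

result = names.sort()     # sorts in place; the return value is None
print(result)             # None
print(names)              # ['Air', 'Earth', 'Fire', 'Water']
```

So writing `names = names.sort()` would replace the list with `None`, which is rarely what is intended.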


### Copying a list

Assignment of a list does not imply copying. It simply creates a second reference to the same list. Most new Python programmers get caught out by this initially. Consider the following:

In [147]:
lista= [2,1,4,3]
listb = lista
print(listb)

[2, 1, 4, 3]


Here, we have declared a list, lista = [2,1,4,3]. Assigning it to listb appears to copy the list, as seen above. Now we perform some random operations on lista.

In [148]:
lista.sort()
lista.pop()
lista.append(9)
print("A =",lista)
print("B =",listb)

A = [1, 2, 3, 9]
B = [1, 2, 3, 9]


listb has also changed even though no operation was performed on it. This is because both names refer to the same list in memory. So how do we fix this?

If you recall, in slicing we saw that parentlist[a:b] returns a new list taken from the parent list, starting at index a and stopping before index b, and if a and b are not given it covers the whole list from the first to the last element. We use the same concept here: lista[:] builds a new list holding the same data, and it is this new list that is assigned to listb.

In [149]:
lista = [2,1,4,3]
listb = lista[:] # make a copy by taking a slice from beginning to end
print("Starting with:")
print("A =",lista)
print("B =",listb)
lista.sort()
lista.pop()
lista.append(9)
print("Finished with:")
print("A =",lista)
print("B =",listb)

Starting with:
A = [2, 1, 4, 3]
B = [2, 1, 4, 3]
Finished with:
A = [1, 2, 3, 9]
B = [2, 1, 4, 3]
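Note that a slice copies only the outer list. For nested lists the inner lists are still shared, and the standard library function `copy.deepcopy` can be used when a fully independent copy is needed. The sketch below is an addition to the original example; the names `nested`, `shallow` and `deep` are illustrative.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested[:]            # new outer list, but the inner lists are shared
deep = copy.deepcopy(nested)   # inner lists are copied as well

nested[0].append(99)           # modifies an inner list shared with `shallow`
nested.append([5])             # modifies only the outer list of `nested`

print("nested  =", nested)     # [[1, 2, 99], [3, 4], [5]]
print("shallow =", shallow)    # [[1, 2, 99], [3, 4]]
print("deep    =", deep)       # [[1, 2], [3, 4]]
```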


## List comprehension
A very powerful concept in Python (that also applies to tuples, sets and dictionaries, as we will see below) is the ability to define lists using a list comprehension (looping) expression. For example:

In [150]:
[i**2 for i in [1,2,3]]

[i**2 for i in range(1000)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, ...]

(output truncated; the full list contains 1000 squares)

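The first expression constructs a new list by taking each element of the original `[1,2,3]` and squaring it, giving `[1, 4, 9]`. Only the value of the last expression in a cell is displayed, which is why the output shows the squares of 0 to 999 from the second expression. We can have multiple such implied loops to get for example: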

In [151]:
[10*i+j for i in [1,2,3] for j in [5,7]]

[15, 17, 25, 27, 35, 37]

Finally the looping can be filtered using an **if** expression with the **for** - **in** construct.

In [152]:
[10*i+j for i in [1,2,3] if i%2==1 for j in [4,5,7] if j >= i+4] # keep odd i and  j larger than i+3 only

[15, 17, 37]
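It can help to read such a comprehension as ordinary nested loops, with the clauses applied from left to right. The following sketch is an equivalent expansion; the name `result` is only illustrative.

```python
# Equivalent of:
# [10*i+j for i in [1,2,3] if i%2==1 for j in [4,5,7] if j >= i+4]
result = []
for i in [1, 2, 3]:
    if i % 2 == 1:              # keep odd i only
        for j in [4, 5, 7]:
            if j >= i + 4:      # keep j larger than i+3 only
                result.append(10*i + j)

print(result)   # [15, 17, 37]
```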

## Tuples

Tuples are similar to lists, with one big difference: the elements inside a list can be changed, but the elements of a tuple cannot. Think of a tuple as a fixed group of values that belong together and should not be altered individually. For better understanding, recall the **divmod()** function.

In [153]:
xyz = divmod(10,3)
print(xyz)
print(type(xyz))

(3, 1)
<class 'tuple'>


Here the quotient has to be 3 and the remainder has to be 1. These values cannot be changed whatsoever when 10 is divided by 3. Hence divmod returns these values in a tuple.

To define a tuple, a variable is assigned to parentheses ( ) or tuple( ).

In [154]:
tup = ()
tup2 = tuple()
tup3 = (1,2,"hola")
tup4 = tuple((4,5,"adios"))
print(tup,tup2,tup3,tup4)

() () (1, 2, 'hola') (4, 5, 'adios')


A tuple with a single element can be declared directly by putting a comma after the data.

In [155]:
27,

(27,)

27 when multiplied by 2 yields 54, but when a tuple is multiplied by 2 its data is repeated twice.

In [156]:
2*(27,)

(27, 27)

In [157]:
a,b,c,d = 1,2,3,4
b

2

Values can be assigned while declaring a tuple. **tuple( )** takes a list as input and converts it into a tuple, or it takes a string and converts it into a tuple of characters.

In [158]:
tup3 = tuple([1,2,3])
print(tup3)
tup4 = tuple('Hello')
print(tup4)

(1, 2, 3)
('H', 'e', 'l', 'l', 'o')


It follows the same indexing and slicing as Lists.

In [159]:
print(tup3[1])
tup5 = tup4[:3]
print(tup5)

2
('H', 'e', 'l')
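Reading from a tuple works as for lists, but assigning to an index does not. The sketch below shows the error that Python raises and one possible way of building a modified tuple instead; the names `message` and `new_tup` are illustrative.

```python
tup = (1, 2, 3)

# Tuples are immutable: item assignment raises a TypeError
try:
    tup[0] = 10
except TypeError as err:
    message = str(err)
print("Error:", message)

# A changed version has to be built as a NEW tuple, e.g. from slices
new_tup = (10,) + tup[1:]
print(tup)       # (1, 2, 3)  (the original is unchanged)
print(new_tup)   # (10, 2, 3)
```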


### Mapping one tuple to another
Tuples can be used as the left hand side of assignments and are matched to the correct right hand side elements, assuming they have the right length.

In [160]:
(a,b,c)= ('alpha','beta','gamma') # parentheses are optional
a,b,c= 'alpha','beta','gamma' # The same as the above
print(a,b,c)
a,b,c = ['Alpha','Beta','Gamma'] # can assign lists
print(a,b,c)
[a,b,c]=('this','is','ok') # even this is OK
print(a,b,c)

alpha beta gamma
Alpha Beta Gamma
this is ok


More complex nested unpackings of values are also possible.

In [161]:
(w,(x,y),z)=(1,(2,3),4)
print(w,x,y,z)
(w,xy,z)=(1,(2,3),4)
print(w,xy,z) # notice that xy is now a tuple

1 2 3 4
1 (2, 3) 4
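Two common uses of this kind of assignment are swapping values without a temporary variable and, going slightly beyond what is shown above, collecting the remaining items with a starred name. A minimal sketch with illustrative variable names:

```python
a, b = 1, 2
a, b = b, a                 # the right hand side tuple is built first, then unpacked
print(a, b)                 # 2 1

first, *rest = [1, 2, 3, 4] # the starred name collects the leftover items in a list
print(first, rest)          # 1 [2, 3, 4]

# A length mismatch raises a ValueError
try:
    p, q = (1, 2, 3)
    unpacked = True
except ValueError:
    unpacked = False
print(unpacked)             # False
```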


### Built In Tuple functions

**count()** function counts the number of times the specified element occurs in the tuple.

In [162]:
d=tuple('a string with many "a"s')
d.count('a')

3

**index()** function returns the index of the specified element. If the element occurs more than once, the index of its first occurrence is returned.

In [163]:
d.index('a')

0

## Sets

Sets are mainly used to eliminate repeated elements in a sequence/list. They are also used to perform some standard set operations.

Sets are declared as set(), which initializes an empty set. Also `set([sequence])` can be executed to declare a set with elements.

In [164]:
set1 = set()
print(type(set1))

<class 'set'>


In [165]:
set0 = set([1,2,2,3,3,4])
# set0 = {1,2,2,3,3,4} # equivalent to the above
print(set0)

{1, 2, 3, 4}


The elements 2 and 3, which appear twice in the input, are seen only once. Thus in a set each element is distinct.

However be warned that **{}** is **NOT** a set, but a dictionary (see the Dictionaries section below).

In [166]:
type({})

dict

### Built-in Functions

In [167]:
set1 = set([1,2,3])

In [168]:
set2 = set([2,3,4,5])

**union( )** function returns a set which contains all the elements of both the sets without repetition.

In [169]:
set1.union(set2)

{1, 2, 3, 4, 5}

**add( )** will add a particular element to the set. Note that sets are unordered, so the newly added element has no fixed position and will not necessarily appear at the end.

In [170]:
set1.add(0)
set1

{0, 1, 2, 3}

**intersection( )** function outputs a set which contains all the elements that are in both sets.

In [171]:
set1.intersection(set2)

{2, 3}

**difference( )** function outputs a set which contains the elements that are in set1 and not in set2.

In [172]:
set1.difference(set2)

{0, 1}

**symmetric_difference( )** function outputs a set which contains the elements that are in exactly one of the two sets.

In [173]:
set2.symmetric_difference(set1)

{0, 1, 4, 5}
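The same four operations are also available as operators, which some readers find easier to read. A short sketch, using the values that set1 and set2 have at this point in the text:

```python
set1 = {0, 1, 2, 3}
set2 = {2, 3, 4, 5}

# Each operator gives the same result as the corresponding method
print((set1 | set2) == set1.union(set2))                  # True
print((set1 & set2) == set1.intersection(set2))           # True
print((set1 - set2) == set1.difference(set2))             # True
print((set1 ^ set2) == set1.symmetric_difference(set2))   # True
```

One difference worth knowing is that the methods accept any iterable as their argument (for example a list), whereas the operators require both sides to be sets.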

**issubset( )**, **isdisjoint( )** and **issuperset( )** are used to check whether one set is a subset of, disjoint from, or a superset of another set, respectively.

In [174]:
set1.issubset(set2)

False

In [175]:
set2.isdisjoint(set1)

False

In [176]:
set2.issuperset(set1)

False

**pop( )** is used to remove an arbitrary element in the set

In [177]:
set1.pop()
print(set1)

{1, 2, 3}


**remove( )** function deletes the specified element from the set.

In [178]:
set1.remove(2)
set1

{1, 3}

**clear( )** is used to clear all the elements and make that set an empty set.

In [179]:
set1.clear()
set1


set()

## Strings as tuples

Strings have already been discussed in Chapter 02, but can also be treated as collections similar to lists and tuples.
For example

In [180]:
S = 'Taj Mahal is beautiful'
print([x for x in S if x.islower()]) # list of lower case characters
words=S.split() # list of words
print("Words are:",words)
print("--".join(words)) # hyphenated 
" ".join(w.capitalize() for w in words) # capitalise words

['a', 'j', 'a', 'h', 'a', 'l', 'i', 's', 'b', 'e', 'a', 'u', 't', 'i', 'f', 'u', 'l']
Words are: ['Taj', 'Mahal', 'is', 'beautiful']
Taj--Mahal--is--beautiful


'Taj Mahal Is Beautiful'

String indexing and slicing work in the same way as for lists, which was explained in detail earlier.

In [181]:
print(S[4])
print(S[4:])

M
Mahal is beautiful


## Dictionaries

Dictionaries are mappings between keys and the items stored against them. Alternatively one can think of dictionaries as sets in which something is stored against every element of the set. They can be defined as follows:

To define a dictionary, equate a variable to { } or dict()

In [182]:
d = dict() # or equivalently d={}
print(type(d))
d['abc'] = 3
d[4] = "A string"
print(d)

<class 'dict'>
{'abc': 3, 4: 'A string'}


As can be guessed from the output above, dictionaries can be defined by using the `{ key : value }` syntax. The following *Spanish/Italian* dictionary has four elements:

In [183]:
d = { 'uno': 'uno', 'dos' : 'due', 'tres' : 'tre', 'cien' : 'cento'}
len(d)

4

Now you are able to access the *value* 'due' by the *key* 'dos'.

In [184]:
print(d['dos'])

due


There are a number of alternative ways of specifying a dictionary, including as a list of `(key,value)` tuples.
To illustrate this we will start with two related lists and form a list of tuples from them using the **zip()** function.
These pairs can then be merged to form a dictionary.

In [185]:
names_es = ['Uno', 'Dos', 'Tres', 'Cuatro', 'Cinco']
names_it = ['Uno', 'Due', 'Tre', 'Quattro', 'Cinque']
[ (es,it) for es,it in zip(names_es,names_it)] # create (es,it) pairs


[('Uno', 'Uno'),
 ('Dos', 'Due'),
 ('Tres', 'Tre'),
 ('Cuatro', 'Quattro'),
 ('Cinco', 'Cinque')]

Now we can create a dictionary that maps each Spanish name to the corresponding Italian name as follows.

In [186]:
dict_es_it = dict( (es,it) for es,it in zip(names_es,names_it) )
dict_es_it

{'Cinco': 'Cinque',
 'Cuatro': 'Quattro',
 'Dos': 'Due',
 'Tres': 'Tre',
 'Uno': 'Uno'}

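Note that in the outputs shown here the ordering of the dictionary is not based on the order in which elements were added. In versions of Python before 3.7 the iteration order of a dictionary was an implementation detail (based on hash index ordering) and should not be relied upon. Since Python 3.7 dictionaries preserve insertion order, so the order you see when running these cells may differ from the outputs shown.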

By using tuples as indexes we can make a dictionary behave like a sparse matrix:

In [187]:
matrix={ (0,1): 3.5, (2,17): 0.1}
matrix[2,2] = matrix[0,1] + matrix[2,17]
print(matrix)

{(0, 1): 3.5, (2, 17): 0.1, (2, 2): 3.6}
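In a sparse matrix most entries are absent, and looking up a missing key with square brackets raises a `KeyError`. One possible approach, sketched below, is to use the dictionary method `get` with a default of 0.0; the names `present` and `absent` are illustrative.

```python
matrix = {(0, 1): 3.5, (2, 17): 0.1}

present = matrix.get((0, 1), 0.0)   # stored entry
absent = matrix.get((5, 5), 0.0)    # missing entry falls back to the default
print(present, absent)              # 3.5 0.0

# Square brackets on a missing key raise KeyError instead
try:
    matrix[5, 5]
    found = True
except KeyError:
    found = False
print(found)                        # False
```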


Dictionaries can also be built using the loop style (comprehension) definition.

In [188]:
number_of_chars = { it : len(it) for it in names_it}
print(number_of_chars)

{'Tre': 3, 'Quattro': 7, 'Uno': 3, 'Cinque': 6, 'Due': 3}


### Built-in Functions

The **len()** function and **in** operator have the obvious meaning:

In [189]:
print("Dictionary has",len(dict_es_it), "elements.")

print("'Dos' is in dictionary?", 
      'Dos' in dict_es_it, 
      "    But 'Cien' is in dictionary?", 
      'Cien' in dict_es_it)

Dictionary has 5 elements.
'Dos' is in dictionary? True     But 'Cien' is in dictionary? False


**clear( )** function is used to erase all elements.

In [190]:
number_of_chars.clear()
print(number_of_chars)

{}


**values( )** function returns all the assigned values in the dictionary. (Actually it is not quite a list, but something that we can iterate over just like a list to construct a list, tuple or any other collection):

In [191]:
[ v for v in dict_es_it.values() ]

['Quattro', 'Cinque', 'Uno', 'Tre', 'Due']

**keys( )** function returns all the keys (indexes) against which the values are stored.

In [192]:
{ k for k in dict_es_it.keys() }

{'Cinco', 'Cuatro', 'Dos', 'Tres', 'Uno'}

**items( )** returns both the keys and the values, with each element of the dictionary given as a `(key, value)` tuple. This is the same as the result that was obtained when the zip function was used, except that in the output shown here the ordering has been 'shuffled' by the dictionary.

In [193]:
",  ".join( "%s = %s" % (es,it) for es, it in dict_es_it.items())

'Cuatro = Quattro,  Cinco = Cinque,  Uno = Uno,  Tres = Tre,  Dos = Due'

**pop( )** function removes the element with the specified key and returns its value, which can be assigned to a new variable. But remember that only the value is returned and not the key, because the key is just an index value.

In [194]:
val = dict_es_it.pop('Cuatro')
print(dict_es_it)
print("Removed %s!" % val)

{'Cinco': 'Cinque', 'Uno': 'Uno', 'Tres': 'Tre', 'Dos': 'Due'}
Removed Quattro!
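If the key might not be present, **pop( )** accepts a second argument that is returned instead of raising a `KeyError`. A small sketch using a shortened, illustrative dictionary `d`:

```python
d = {'Uno': 'Uno', 'Dos': 'Due'}

val = d.pop('Dos')             # key exists: removed and its value returned
print(val, d)                  # Due {'Uno': 'Uno'}

missing = d.pop('Cien', None)  # key absent: the default (None) is returned
print(missing)                 # None

# Without a default, popping a missing key raises KeyError
try:
    d.pop('Cien')
    popped = True
except KeyError:
    popped = False
print(popped)                  # False
```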


In [195]:
dict_es_it

{'Cinco': 'Cinque', 'Dos': 'Due', 'Tres': 'Tre', 'Uno': 'Uno'}

In [197]:
dict_it_es = { dict_es_it[es] : es for es in dict_es_it.keys()}
dict_it_es["Tre"]

'Tres'
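The cell above inverts the dictionary, so that the Italian names become the keys. The same inversion can be sketched with **items( )**, which avoids a second lookup per key. This assumes the values are unique, otherwise later pairs overwrite earlier ones; the dictionary literal below simply restates the contents shown above.

```python
dict_es_it = {'Cinco': 'Cinque', 'Dos': 'Due', 'Tres': 'Tre', 'Uno': 'Uno'}

# Swap each (key, value) pair to build the reverse mapping
dict_it_es = {it: es for es, it in dict_es_it.items()}

print(dict_it_es['Tre'])      # Tres
print(dict_it_es['Cinque'])   # Cinco
```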

In [205]:
lista1=[1,2,3,4]
lista2=['a','b','c','d']
[elemento for elemento in zip(lista1,lista2)]
dict(zip(lista1,lista2))

{1: 'a', 2: 'b', 3: 'c', 4: 'd'}