### (Collection) Data Types

So far we have learned about simple data types like integers or strings. There is another class of data types that allows you to store multiple values of varying data types.

We will first introduce the different types and then discuss their advantages and disadvantages.


#### List

A list is an **ordered**, **changeable** collection data type that allows you to store **duplicate** values.


In [None]:
# All values of the same data type
thislist = ['1', '2', '3'] 
print(thislist)
print('--------')

# Different data types
thislist = ['1', 2.0, 3] 
print(thislist)
print('--------')

# Change the list after creation    --> changeable
thislist.append('last element') 
print(thislist)
print('--------')

# Append the same value to the list         --> duplication allowed 
thislist.append('last element') 
print(thislist)


List elements can be accessed by their position (index). Counting starts at 0.

In [None]:
thislist = ['1', '2', '3', 'last element', 'last element'] 
# Get the first and fourth element
print(thislist[0])
print(thislist[3])
print('--------')

# Get the last element
print(thislist[-1])
print('--------')

# Slicing: get the elements at index 1 and 2 (the end index 3 is excluded)
print(thislist[1:3])
print('--------')

# Reverse the list --> Not very intuitive but powerful

print(thislist[::-1])
print('--------')

# Get every second element in the list 

print(thislist[::2])
print('--------')

# length of the list

print(len(thislist))
print('--------')

# Remove element

thislist.remove('last element') # --> Only removes the first element that matches
print(thislist)
print('--------')

# Sort a list (sorted() returns a new list, otherlist itself is unchanged)
otherlist = [2,4,3,1]
print(sorted(otherlist))
print('--------')

# Join two lists

print(thislist + otherlist)


#### Tuples
Tuples are similar to lists in the sense that they can store multiple values of different types.

A tuple is an **ordered**, **unchangeable** collection data type that allows you to store **duplicate** values.

#### Tuple - Exercise:

- Try the same functions as for the list. What works, what doesn't?

In [None]:
thistuple = ('1', '3', 4) # Same as for lists but with round brackets

#### Sets
Sets are similar to lists in the sense that they can store multiple values of different types.

A set is an **unordered**, **changeable** collection data type that does **not** allow you to store **duplicate** values.

In [None]:
thisset = {'1', '3', 4} # Compare brackets to tuples and lists
print(thisset) # Order might change as a set is unordered
print('--------')

# add (not append) a new value to the set. append is for ordered entities
thisset.add(5)
print(thisset)
print('--------')

# Add the same value again
thisset.add(5)
print(thisset) # --> Nothing changed. No duplicates allowed
print('--------')

# Get first element
# thisset[0]    
# Just joking, the set is unordered, so there is no first or last element :)
# Access to sets will be explained once we get to loops.


#check if a variable is part of the set
print('1'  in thisset)  
print('-1'  in thisset)
print('--------')

# Fancy methods (difference, union, intersection,...) --> Very fast
thisset2 = {'1', 2, 3}
print(thisset.intersection(thisset2))
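
The cell above only shows `intersection`. As a minimal sketch, the other set methods mentioned in the comment could look like this. The sets `a` and `b` are example values chosen for illustration.

```python
# Two small example sets (values chosen only for illustration)
a = {'1', '3', 4, 5}
b = {'1', 2, 3}

# Elements that are in a or in b (or in both)
print(a.union(b))
print('--------')

# Elements that are in a but not in b
print(a.difference(b))
print('--------')

# Elements that are in exactly one of the two sets
print(a.symmetric_difference(b))
print('--------')

# The same operations are also available as operators
print((a | b) == a.union(b))
print((a - b) == a.difference(b))
```

Note that the string `'3'` and the integer `3` are different values, so both can live in the same set.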


#### Dictionaries


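A dictionary is a **changeable** collection data type that does **not** allow you to store **duplicate** keys. The main difference to sets is that you can store `key:value` pairs: every key is unique, while values may repeat. Since Python 3.7 a dictionary keeps the order in which keys were inserted, but entries are accessed by key, not by position. In a way this is a simple `table`.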

In [None]:
import pprint # Just a module which can be used to do pretty printing

thisdict = {
  'Hans': 'Germany',
  'Carine': 'Brazil',
  'Fabienne': 'Switzerland'
}
pprint.pprint(thisdict) # each key (e.g. Hans) has a value e.g. Germany
print('--------')


# Values can be anything
thisdict['Viola'] = ['Germany', 'Hungary']
pprint.pprint(thisdict)
print('--------')


# When Hans changes his passport he loses the German citizenship
thisdict['Hans'] = 'Sweden' # Keys are unique, so the old value is overwritten
pprint.pprint(thisdict)
print('--------')


# Accessing a dictionary usually means that you want the value (you know the key already)
pprint.pprint(thisdict['Fabienne'])
print('--------')

# You can also get all keys or all values
pprint.pprint(thisdict.keys())
pprint.pprint(thisdict.values())
print('--------')


# Is Dominic also in the dictionary?
print('Dominic' in thisdict)
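
What happens when you ask for a key that is not there? The following sketch (it rebuilds a small `thisdict` so that it runs on its own) shows `get()` as a safe way to look up a key, and `del` to remove an entry.

```python
thisdict = {'Hans': 'Sweden', 'Carine': 'Brazil', 'Fabienne': 'Switzerland'}

# Square brackets raise a KeyError when the key does not exist
# thisdict['Dominic']

# get() returns None instead of raising an error
print(thisdict.get('Dominic'))
print('--------')

# You can also choose your own default value
print(thisdict.get('Dominic', 'unknown'))
print('--------')

# Remove an entry by its key
del thisdict['Carine']
print(thisdict)
```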

#### Why so many data types?

Every collection data type has advantages and disadvantages:

- A list is indexed. You have very fast access to any value when you access it by index.
- A set does deduplication for you. Checking whether a value is in a set is also really fast.
- A dictionary is basically a set of keys where each key has a value assigned to it. Looking up a value by its key is really fast as well.
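
A small sketch of the deduplication point: converting a list to a set removes repeated values. The list `countries` is example data made up for this illustration.

```python
# A list with repeated values (example data)
countries = ['Germany', 'Brazil', 'Germany', 'Switzerland', 'Brazil']

# Converting to a set drops the duplicates (the original order is not kept)
unique_countries = set(countries)
print(unique_countries)
print(len(countries), len(unique_countries))
print('--------')

# sorted() turns the set back into an ordered list
print(sorted(unique_countries))
```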

Let's check the performance.

In [None]:
# Check the performance of the in operator (list versus set)
import timeit
import random

print(timeit.timeit('0 in biglist', 'import random; biglist = [random.randint(0,1000) for x in range(1000)]', number=1000000))
print(timeit.timeit('0 in bigset', 'import random; bigset = set([random.randint(0,1000) for x in range(1000)])', number=1000000))



In [None]:
# Check the performance of index access on a list versus the in operator on a set
import timeit
import random

print(timeit.timeit('biglist[4]', 'import random; biglist = [random.randint(0,1000) for x in range(1000)]', number=1000000))
print(timeit.timeit('0 in bigset', 'import random; bigset = set([random.randint(0,1000) for x in range(1000)])', number=1000000))




- Lists and tuples (ordered, accessed by position) are very fast with positional access but slow when searching for values.
- Dictionaries and sets (accessed by value or key) don't have positional access but are very fast when searching for a value or key.

#### Advanced

These data types will be enough for about 99% of the "non-table" work you will do in Python. Some extensions of these data types are implemented in the `collections` module (https://pymotw.com/3/collections/index.html).
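
To give a first impression of the `collections` module, here is a short sketch with two of its types, `Counter` and `namedtuple`. The names `Person`, `name` and `country` are examples chosen for this illustration.

```python
from collections import Counter, namedtuple

# Counter is a dictionary that counts how often each value occurs
counts = Counter(['Germany', 'Brazil', 'Germany'])
print(counts['Germany'])
print(counts.most_common(1))
print('--------')

# namedtuple is a tuple whose positions also have names
Person = namedtuple('Person', ['name', 'country'])
hans = Person('Hans', 'Germany')
print(hans.name)  # access by name
print(hans[1])    # access by position still works
```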