# B06: Basic Data Structures Part 2

We spent quite a bit of time looking at lists and methods. You'll be pleased to hear that a lot of the principles that you learned for lists are applicable to other data structures in Python, including the more advanced analytical data structures we'll meet later on.

## Dictionaries

Next we're going to look at Dictionaries (also called Dicts). They are enclosed within curly brackets {}, are a type of 'mapping', and are made up of key and value pairs as follows:

In [None]:
# Creating a dictionary
example = {
    'key1':'value1',
    'key2':'value2',
    'key3':'value3'
}
example

If you've got some experience in web development, in particular with [JSON](http://www.w3schools.com/json) then the format of dictionaries should look familiar to you. Let's see what else we can do with dictionaries...

In [2]:
# Basic dictionary
dict1 = {
    'Alex':27,
    'Tom':36,
    'Samantha':44,
    'Gill':51,
    'Barry':53
}     
dict1

{'Alex': 27, 'Tom': 36, 'Samantha': 44, 'Gill': 51, 'Barry': 53}

We can use the key as an index to call values:

In [3]:
dict1['Tom']

36
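What if we ask for a key that isn't there? Indexing with a missing key raises a KeyError. Here's a minimal sketch of two common ways to deal with that, using a small stand-in dictionary (the name `ages` and the key 'Zoe' are made up for this example):

```python
# Stand-in dictionary for illustration (a shortened copy of dict1)
ages = {'Alex': 27, 'Tom': 36, 'Samantha': 44}

# Option 1: check membership with `in` before indexing
has_tom = 'Tom' in ages
has_zoe = 'Zoe' in ages

# Option 2: .get() returns None (or a default you supply) instead of raising
tom_age = ages.get('Tom')
zoe_age = ages.get('Zoe')
zoe_default = ages.get('Zoe', 0)

# Direct indexing with a missing key raises KeyError
try:
    ages['Zoe']
except KeyError as err:
    missing_key = err.args[0]

print(has_tom, has_zoe, tom_age, zoe_age, zoe_default, missing_key)
```

Which option you pick is a matter of taste: .get() is handy when a sensible default exists, while letting the KeyError surface is often better when a missing key means something has gone wrong.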

We can also add values to our dictionary as follows:

In [4]:
dict1['Ronnie'] = 15
dict1

{'Alex': 27, 'Tom': 36, 'Samantha': 44, 'Gill': 51, 'Barry': 53, 'Ronnie': 15}
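The same assignment syntax also overwrites a value if the key already exists, and entries can be removed again. A short sketch on a stand-in dictionary (`ages` is a name chosen for this example):

```python
# Stand-in dictionary for illustration
ages = {'Alex': 27, 'Tom': 36}

ages['Ronnie'] = 15          # New key: adds an entry
ages['Tom'] = 37             # Existing key: overwrites the old value

removed = ages.pop('Alex')   # Removes the key and returns its value
del ages['Ronnie']           # Removes the key without returning anything

print(ages, removed)
```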

Dictionaries also have a set of pre-defined methods for you to use. As before simply type the reference of a dictionary followed by a . and then press tab to bring up a list of available methods.

In [5]:
dict1.    # Press tab after the dot to list the methods; running the cell as-is raises a SyntaxError

Some useful methods are as follows:

In [6]:
dict1.keys()  # Returns all the keys

dict_keys(['Alex', 'Tom', 'Samantha', 'Gill', 'Barry', 'Ronnie'])

In [7]:
dict1.values()  # Returns all the values


dict_values([27, 36, 44, 51, 53, 15])

In [8]:
dict1.items() # Returns a view of (key, value) tuples

dict_items([('Alex', 27), ('Tom', 36), ('Samantha', 44), ('Gill', 51), ('Barry', 53), ('Ronnie', 15)])

In [9]:
dict2 = {'Reggie':15}
dict1.update(dict2)     # Updates the dict with the contents of another dict 
dict1

{'Alex': 27,
 'Tom': 36,
 'Samantha': 44,
 'Gill': 51,
 'Barry': 53,
 'Ronnie': 15,
 'Reggie': 15}
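The objects returned by keys(), values() and items() are 'views' rather than lists, but they can be looped over directly or converted with list(). The sketch below also shows that update() overwrites keys that already exist. It uses a stand-in dictionary (`ages`) so it can be run on its own:

```python
# Stand-in dictionary for illustration
ages = {'Alex': 27, 'Tom': 36}

# A view can be converted to a regular list when you need one
key_list = list(ages.keys())

# items() yields (key, value) pairs, which unpack neatly in a for loop
lines = []
for name, age in ages.items():
    lines.append(f'{name} is {age}')

# update() adds new keys and overwrites keys that already exist
ages.update({'Tom': 40, 'Reggie': 15})

print(key_list)
print(lines)
print(ages)
```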

## Nesting and Data Structures

Previously we mentioned that data structures can be used to store other data structures. This is called 'nesting' and can appear a bit daunting at first. However, you'll soon see that this is a very powerful way to store data, and coupled with our newfound skills in slicing and indexing, we'll have no problem at all dealing with it! Let's imagine we're opening a restaurant and we want to store our menu items in some Python data structures...

In [10]:
steak = ['Rump','Sirloin','Fillet']                                    # Steak Menu
pizza = ['Margarita','Napoli',['Single Pepperoni','Double Pepperoni']] # Pizza Menu
burger = ['Chicken',['Regular','Cheese','Special'],'Vegetarian']       # Burger Menu
salad = [None]                                                         # Salad Menu

We can access the various nested levels of these lists by specifying multiple indexes:

In [11]:
pizza[0]     # First index in the pizza menu

'Margarita'

In [12]:
pizza[2]     # Third index in the pizza menu

['Single Pepperoni', 'Double Pepperoni']

In [None]:
pizza[2][0] # Third index in the pizza menu, First index of the nested list (pepperoni types)

In [13]:
pizza[2][1] # Third index in the pizza menu, Second index of the nested list (pepperoni types)

'Double Pepperoni'

If we wanted to store and access all this data from a single data item, we could use a dictionary:

In [14]:
menu = {
    'Steak':steak,
    'Pizza':pizza,
    'Burger':burger,
    'Salad':salad
}
menu

{'Steak': ['Rump', 'Sirloin', 'Fillet'],
 'Pizza': ['Margarita', 'Napoli', ['Single Pepperoni', 'Double Pepperoni']],
 'Burger': ['Chicken', ['Regular', 'Cheese', 'Special'], 'Vegetarian'],
 'Salad': [None]}

Similar to lists, we can access various levels of the index as follows:

In [15]:
menu['Burger']      # Return the whole burger menu

['Chicken', ['Regular', 'Cheese', 'Special'], 'Vegetarian']

In [16]:
menu['Burger'][0]   # Return the first item of the burger menu

'Chicken'

In [17]:
menu['Burger'][1]   # Return the second item of the burger menu

['Regular', 'Cheese', 'Special']

In [18]:
menu['Burger'][1][1] # Return the second item of the burger menu, and the second item in nested data structure

'Cheese'

In [19]:
menu['Pizza'][2][0] # Return the third item of the pizza menu, and the first item in nested data structure

'Single Pepperoni'
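The same chained indexing works for changing nested data, not just reading it. A minimal sketch on a cut-down copy of the menu (the added items 'Triple Pepperoni', 'Caesar' and the 'Dessert' entry are invented for this example):

```python
# Cut-down copy of the menu for illustration
menu = {
    'Pizza': ['Margarita', 'Napoli', ['Single Pepperoni', 'Double Pepperoni']],
    'Salad': [None],
}

# Each pair of square brackets steps one level deeper
menu['Pizza'][2].append('Triple Pepperoni')   # Add to the nested pepperoni list
menu['Salad'][0] = 'Caesar'                   # Replace the placeholder None

# Dictionaries can be nested inside dictionaries too
menu['Dessert'] = {'Ice Cream': ['Vanilla', 'Chocolate']}

pepperoni_options = menu['Pizza'][2]
first_flavour = menu['Dessert']['Ice Cream'][0]

print(pepperoni_options)
print(first_flavour)
```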

This is a relatively simple example but demonstrates the power and versatility of Python's basic data structures. Let's now meet Tuples!

## Tuples

A tuple is similar to a list except that it is enclosed within round brackets (parentheses):

In [20]:
blank = ()                        # Creating a blank tuple
mytuple = (1,2,3,4,5,6,7,8,9,10)  # Creating a tuple with integers
print(type(mytuple),mytuple)

<class 'tuple'> (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)


There is one other important difference: tuples are immutable. This means that they cannot be changed once created, which makes them useful in situations where you can be certain that data will not change (e.g. date of birth). They are also considered more efficient to process than lists, but this is unlikely to be noticeable until you are working with very large amounts of data!
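To see immutability in action, here's a small sketch. The tuple `dob` holds a made-up date of birth as (year, month, day):

```python
# Hypothetical date of birth stored as (year, month, day)
dob = (1990, 5, 17)

try:
    dob[0] = 1991          # Item assignment is not allowed on a tuple
except TypeError as err:
    message = str(err)

# To "change" a tuple you build a new one instead
new_dob = (1991,) + dob[1:]

# Note the trailing comma: (5,) is a tuple, but (5) is just the integer 5
single = (5,)
not_a_tuple = (5)

print(message)
print(new_dob, type(single), type(not_a_tuple))
```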

Again, we can bring up a list of available methods by specifying our tuple followed by a . and then pressing tab:

In [None]:
mytuple.

You'll notice that there are only two methods available for tuples (count and index). Because tuples are immutable, there are no methods that change the data or structure of a tuple (e.g. append, sort etc.). For this reason you're less likely to use tuples than dictionaries or lists.
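The two tuple methods, count() and index(), only read from the tuple. A quick sketch with a made-up tuple of scores, which also shows that indexing, slicing and len() work just as they do for lists:

```python
# Made-up data for illustration
scores = (3, 7, 3, 9, 3)

threes = scores.count(3)        # How many times does 3 appear?
first_seven = scores.index(7)   # Position of the first 7

# Reading from a tuple works exactly like reading from a list
first_two = scores[:2]
length = len(scores)

print(threes, first_seven, first_two, length)
```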

## Sets

A Set is an unordered collection of unique elements and they can be created using the set() function as follows:

In [36]:
myset1 = set()    # Creating a blank set
myset1.add(1)     # Adding a value to our set
print(type(myset1),myset1)

<class 'set'> {1}


We can see that the set is displayed in curly brackets, so don't confuse it with a dictionary! Sets are their own type of object.
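Because both use curly brackets, it's worth seeing exactly where sets and dictionaries differ. A short sketch (the variable names are chosen for this example):

```python
empty_braces = {}            # Empty curly brackets create a dict, not a set
empty_set = set()            # An empty set has to be created with set()
literal_set = {1, 2, 2, 3}   # Values without keys create a set; duplicates are dropped

print(type(empty_braces), type(empty_set), literal_set)
```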

As with the other data structures, we can see a list of available methods by typing myset1 followed by a . and pressing tab. For reference, here is our set so far:

In [37]:
myset1

{1}

You'll see that a lot of the methods are to do with comparing sets (union, difference, intersection etc.) and this is because one of the primary uses of sets is comparing two unique 'sets' of data to see which values intersect, differ etc.

For now, we'll add some more values to our set:

In [22]:
myset1.add(2)
myset1.add(4)
myset1.add(6)
myset1

{1, 2, 4, 6}

What happens when we try to add a value that's already in our set?

In [23]:
myset1.add(2)
myset1

{1, 2, 4, 6}

Nothing! That's because sets can only contain unique elements. Let's create a second set to do some comparisons with:

In [24]:
myset2 = set()
myset2.add(2)
myset2.add(4)
myset2.add(5)
myset2.add(6)
myset2

{2, 4, 5, 6}

In [25]:
myset1.difference(myset2)     # Returning the values that are in myset1 but not in myset2

{1}

In [26]:
myset1.intersection(myset2)   # Returning the values that appear in both sets

{2, 4, 6}

In [27]:
myset1.union(myset2)         # Returning the union (values that are in either set) of the two sets

{1, 2, 4, 5, 6}
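These comparison methods also have operator shorthands. The sketch below rebuilds the same two sets under the stand-in names `a` and `b`, and shows that difference depends on which set comes first:

```python
# Same values as myset1 and myset2, rebuilt here so the example stands alone
a = {1, 2, 4, 6}
b = {2, 4, 5, 6}

only_a = a - b      # Same as a.difference(b): in a but not in b
only_b = b - a      # Difference is one-directional, so the order matters
both = a & b        # Same as a.intersection(b)
either = a | b      # Same as a.union(b)
one_side = a ^ b    # Same as a.symmetric_difference(b): in exactly one set

print(only_a, only_b, both, either, one_side)
```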

There is more to sets but they are probably the least used of the data structures in Python so we'll not be devoting more time to them here. However feel free to explore them further on your own!

## Converting Between Data Structures

When using Python for data analysis, you'll find that you spend a lot of time converting data between various data structures. This is because you'll meet a lot of functions in Python Extensions/Modules/Libraries etc. that require data to be supplied in a specific format.

This means that being able to convert between data structures is a <b>VERY</b> important skill! Let's start by going back to lists and looking at the ways in which we can convert them to other data structures.

## Converting Lists

In [44]:
beatles = ['John','Paul','George','Ringo', "Brian"]
beatles_tuple = tuple(beatles)            # Converting a list to a tuple
print(type(beatles_tuple),beatles_tuple)

<class 'tuple'> ('John', 'Paul', 'George', 'Ringo', 'Brian')


In [29]:
beatles_set = set(beatles)               # Converting a list to a set
print(type(beatles_set),beatles_set)

<class 'set'> {'Ringo', 'John', 'Paul', 'George', 'Brian'}
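Notice that the set doesn't keep the order of the original list. A common use of this conversion is removing duplicates, and the sketch below shows two ways to get a predictable order back afterwards (the list `names` is made up for this example):

```python
# Made-up list containing duplicates
names = ['Paul', 'John', 'Paul', 'Ringo', 'John']

# set() removes the duplicates; sorted() returns a list in alphabetical order
unique_sorted = sorted(set(names))

# dict.fromkeys() keeps the first occurrence of each item in its original position
unique_in_order = list(dict.fromkeys(names))

print(unique_sorted)
print(unique_in_order)
```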


You'll remember that Dicts require key/value pairs so we'll create some keys as follows:

In [45]:
keys = [1,2,3,4]

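To build a dict from two separate lists we pair them up with a built-in function called zip(). zip() takes the first item from each list, then the second, and so on, and dict() turns each pair into a key and a value. We'll provide some more info on this really useful function later, but for now just rest easy in the knowledge that zip() is your friend. Note that zip() stops at the end of the shortest list: we only supplied four keys, so the fifth name ('Brian') is left out of the dict, while the original list is unchanged: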

In [47]:
beatles_dict = dict(zip(keys,beatles))
print(beatles_dict)
print(beatles)


{1: 'John', 2: 'Paul', 3: 'George', 4: 'Ringo'}
['John', 'Paul', 'George', 'Ringo', 'Brian']
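To look at what zip() actually produces, we can wrap it in list(). This sketch repeats the data from above so it runs on its own, and shows two possible ways (among others) to make sure every item gets a key:

```python
keys = [1, 2, 3, 4]
beatles = ['John', 'Paul', 'George', 'Ringo', 'Brian']

# zip() pairs items up position by position and stops at the shortest input
pairs = list(zip(keys, beatles))

# Option 1: generate as many keys as there are items
full_dict = dict(zip(range(1, len(beatles) + 1), beatles))

# Option 2: enumerate() numbers the items for us
numbered = dict(enumerate(beatles, start=1))

print(pairs)
print(full_dict)
```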


In [43]:
help(zip)    # help() is a function, so the object you want help on goes inside the brackets

## Converting Tuples

In [32]:
turtles = ('Leonardo','Raphael','Donatello','Michaelangelo')
turtles_list = list(turtles)              # Converting a tuple to a list
print(type(turtles_list),turtles_list)

<class 'list'> ['Leonardo', 'Raphael', 'Donatello', 'Michaelangelo']


In [33]:
turtles_set = set(turtles)              # Converting a tuple to a set
print(type(turtles_set),turtles_set)

<class 'set'> {'Donatello', 'Michaelangelo', 'Leonardo', 'Raphael'}


In [34]:
keys = [1,2,3,4]
turtles_dict = dict(zip(keys,turtles))
turtles_dict
print(type(turtles_dict),turtles_dict)

<class 'dict'> {1: 'Leonardo', 2: 'Raphael', 3: 'Donatello', 4: 'Michaelangelo'}


I'm sure by now you've noticed a pattern emerging! With a few simple functions we can easily convert between data structures:

```
list()
tuple()
set()
dict(zip(keys,data))
```
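The same functions work in the other direction when the starting point is a dict. One thing to watch for is that converting a dict directly only gives you its keys. A sketch using a shortened copy of the turtles dict:

```python
# Shortened copy of turtles_dict for illustration
turtles_dict = {1: 'Leonardo', 2: 'Raphael'}

keys_only = list(turtles_dict)                # Converting a dict directly gives its keys
values_only = list(turtles_dict.values())     # Ask for the values explicitly
pairs = tuple(turtles_dict.items())           # A tuple of (key, value) tuples
key_set = set(turtles_dict)                   # A set of the keys

print(keys_only, values_only, pairs, key_set)
```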

You can also use these functions to create data structures from individual variables as follows:

In [35]:
a = 123
b = 456
c = 'Hello'
d = 'World'


e = list([a,b,c,d])
print(type(e),e)

<class 'list'> [123, 456, 'Hello', 'World']


Remember that list(), tuple() and set() each accept a single argument (the collection to convert), so make sure you gather multiple data items in brackets first. Calling list(a,b,c,d) would raise a TypeError!
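Here's a short sketch of what goes wrong without the brackets, and a few equivalent ways of doing it correctly (the variable names are chosen for this example):

```python
a = 123
b = 456

# Passing the items as separate arguments fails: list() takes at most one argument
try:
    list(a, b)
except TypeError as err:
    error_type = type(err).__name__

wrapped = list([a, b])    # Works: one argument, which is itself a list
literal = [a, b]          # The square brackets alone already create the list
as_tuple = tuple([a, b])  # The same pattern works for tuple()...
as_set = set([a, b])      # ...and for set()

print(error_type, wrapped, literal, as_tuple, as_set)
```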

## Further Reading

[More on Sets](http://www.dotnetperls.com/set-python)

In [12]:
import pandas as pd
STPs = pd.read_csv(filepath_or_buffer='STP.csv')
dict_stp = { stp : 1/len(STPs.index) for stp in STPs['stp_id'].tolist() }

print(dict_stp)
print(1/42)

{'E54000005': 0.023809523809523808, 'E54000006': 0.023809523809523808, 'E54000007': 0.023809523809523808, 'E54000008': 0.023809523809523808, 'E54000009': 0.023809523809523808, 'E54000010': 0.023809523809523808, 'E54000011': 0.023809523809523808, 'E54000012': 0.023809523809523808, 'E54000013': 0.023809523809523808, 'E54000014': 0.023809523809523808, 'E54000015': 0.023809523809523808, 'E54000016': 0.023809523809523808, 'E54000017': 0.023809523809523808, 'E54000018': 0.023809523809523808, 'E54000019': 0.023809523809523808, 'E54000020': 0.023809523809523808, 'E54000021': 0.023809523809523808, 'E54000022': 0.023809523809523808, 'E54000023': 0.023809523809523808, 'E54000024': 0.023809523809523808, 'E54000025': 0.023809523809523808, 'E54000026': 0.023809523809523808, 'E54000027': 0.023809523809523808, 'E54000028': 0.023809523809523808, 'E54000029': 0.023809523809523808, 'E54000030': 0.023809523809523808, 'E54000031': 0.023809523809523808, 'E54000032': 0.023809523809523808, 'E54000033': 0.0238
