# Overview of Data Structures

In this tutorial we will cover the most common Python data structures, using data from the FIFA 2018 Players dataset. We will also look at examples of using these structures and at *when* each should be used.

<img src="http://res.cloudinary.com/dyd911kmh/image/upload/c_scale,f_auto,q_auto:best,w_700/v1512740202/Template_2_oxrskq.png">

## What is a Data Structure?

A data structure is a way of organizing and storing data so that it can be accessed and worked with efficiently. It defines the relationship between the data as well as the operations that may be performed on the data. There are various kinds of data structures that make it easier for data scientists and computer scientists to focus on the bigger picture of solving problems, not getting lost in the details of data description and access.

## Overview of the Types of Data Structures 

Data structures can be grouped into *Primitive Data Structures* and *Non-Primitive Data Structures*. 

## Primitive Data Structures

These are the most basic data structures. They are the building blocks for data manipulation and contain pure, simple values of data. Python has 4 primitive data types:

* Integer
* Float
* String
* Boolean

### Integer

An integer represents a whole number: positive, negative, or zero. Python integers have no fixed upper or lower bound; they can grow as large as memory allows.

In [1]:
# Example of an integer:
x = 94
print(x)
type(x)

94


int
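As a minimal sketch of how wide the integer range is, the example below builds a value far beyond the 64-bit limit found in many other languages. The variable names are illustrative only.

```python
# Python integers are not capped at 32 or 64 bits.
big = 2 ** 100
print(big)                 # 1267650600228229401496703205376
print(type(big).__name__)  # int

# Negative whole numbers are integers too.
negative = -big
print(type(negative).__name__)  # int
```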

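### Float

"Float" stands for "floating point number". You can use it for numbers with a fractional part, written with a decimal point. Floats are stored in binary with limited precision, so some decimal values are only approximated.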

In [2]:
# Example of a float:
x = 91.5
print(x)
type(x)

91.5


float

#### Working with integers and floats

Mathematical operations can be performed on integers and floats. Let's look at a few examples:

In [3]:
# Addition
print('94 + 89.325 = ', 94 + 89.325)

# Subtraction
print('93.15 - 92 = ', 93.15 - 92)

# Multiplication
print('92 x 89 = ', 92 * 89)

# Division
print('94 / 93 = ', 94 / 93)

# Power
print('92 ** 2 = ', 92 ** 2)

# Modulo (remainder after division)
print('92 % 89 = ', 92 % 89)

94 + 89.325 =  183.325
93.15 - 92 =  1.1500000000000057
92 x 89 =  8188
94 / 93 =  1.010752688172043
92 ** 2 =  8464
92 % 89 =  3


You might have noticed that when an operation mixes floats and integers, the result is always a float. Division with `/` also returns a float, even when both operands are integers.
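To make the result types explicit, here is a small sketch that checks the type of each result. The variable names are made up for illustration.

```python
int_sum = 94 + 89        # int + int
mixed_sum = 94 + 89.0    # int + float
true_div = 94 / 2        # `/` returns a float, even for two ints
floor_div = 94 // 2      # `//` (floor division) keeps two ints as an int

print(type(int_sum).__name__, int_sum)      # int 183
print(type(mixed_sum).__name__, mixed_sum)  # float 183.0
print(type(true_div).__name__, true_div)    # float 47.0
print(type(floor_div).__name__, floor_div)  # int 47
```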

We can also call some built-in Python functions on a combination of integers and floats:

In [4]:
# Minimum
min(94, 93, 93.15, 89)

89

In [5]:
# Maximum
max(94, 93, 93.15, 89)

94

In [6]:
# Rounding 
round(93.15, 1)

93.2
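The subtraction earlier printed `1.1500000000000057` rather than `1.15`. This happens because floats are binary approximations of decimal values. The sketch below shows two common ways to deal with it: comparing with a tolerance via `math.isclose()`, or rounding before display.

```python
import math

difference = 93.15 - 92

# Exact equality fails because of the tiny representation error.
print(difference == 1.15)              # False

# Compare with a tolerance instead.
print(math.isclose(difference, 1.15))  # True

# Or round to the number of decimals you care about.
print(round(difference, 2))            # 1.15
```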

### String

Strings are collections of letters, words or other characters. In Python, you can create them by enclosing a sequence of characters within a pair of single or double quotes. 

In [3]:
# Example of a string:
string_1 = 'Neymar'
print(string_1)
type(string_1)

Neymar


str

#### Working with strings

Strings can be sliced, i.e. substrings can be selected. To slice a string, we use square brackets containing the index of the first character we want and the index at which to stop (not included), separated by a colon. If we only want one character, we simply put its index in square brackets.

NB: Python starts indexing from 0, therefore the first character starts at 0. Let's look at a few examples:

In [8]:
# Select 3rd character 
string_1[2]

'y'

In [9]:
# Select first 5 characters 
string_1[0:5]

'Neyma'

In [10]:
# Select characters 2 to 4 
string_1[1:4]

'eym'

In [11]:
# Select last character
string_1[-1]

'r'

In [12]:
# Select last 3 characters
# We input the negative of the number of characters from the end as the start index and input no end index
string_1[-3:]

'mar'

Here we will look at a few functions we can call on a string object.

In [13]:
# Determine length of a string - use built-in `len()` function
len(string_1)

6

We can use the `str.upper()` and `str.lower()` methods to convert a string to all uppercase or all lowercase. This makes it easier to compare strings, because the case becomes consistent. For example, if a user types their name in all lowercase, we can still check whether it is in our database by lowercasing both the input and our stored version, which likely contains uppercase letters.

In [14]:
# Uppercase
string_1.upper()

'NEYMAR'

In [15]:
# Lowercase
string_1.lower()

'neymar'
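The case-insensitive name check described above could be sketched as follows. The stored names and the user input are hypothetical values chosen for illustration.

```python
# Hypothetical stored names and user input, for illustration only
stored_names = ['Neymar', 'Messi', 'Ronaldo']
user_input = 'neymar'

# A direct comparison is case-sensitive and misses the match.
exact_match = user_input in stored_names
print(exact_match)  # False

# Lowercasing both sides makes the comparison case-insensitive.
lowered_names = [name.lower() for name in stored_names]
is_known = user_input.lower() in lowered_names
print(is_known)     # True
```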

We can also split a string into parts with the `str.split()` method. By default it splits on whitespace, but we can pass in another separator to split by. The method returns a list of the resulting substrings.

In [16]:
string_2 = 'Cristiano Ronaldo'

# Split by space
string_2.split()

['Cristiano', 'Ronaldo']
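As a brief sketch of splitting on a different separator, the example below splits a comma-separated record and then joins the pieces back together with `str.join()`. The record itself is a made-up value.

```python
# Hypothetical comma-separated record, for illustration only
csv_line = 'Neymar,Brazil,92'

# Split on commas instead of whitespace.
fields = csv_line.split(',')
print(fields)    # ['Neymar', 'Brazil', '92']

# str.join() is the inverse: it glues a list of strings together.
rejoined = ' - '.join(fields)
print(rejoined)  # Neymar - Brazil - 92
```

Note that `'92'` is still a string after splitting; use `int('92')` if a number is needed.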

### Boolean

This built-in data type can take on the values `True` and `False`. In Python, `bool` is a subclass of `int`, so `True` and `False` behave like the integers 1 and 0 in arithmetic. Booleans are useful in conditional and comparison expressions.

In [17]:
# Example of a boolean:
boo = 5 > 2
print(boo)
type(boo)

True


bool
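The sketch below illustrates both uses: a boolean driving a conditional, and booleans counted as 1 and 0. The ratings list and the threshold of 90 are arbitrary choices for the example.

```python
ratings = [94, 93, 92, 89]

# Each comparison produces a boolean.
is_top = [rating > 90 for rating in ratings]
print(is_top)       # [True, True, True, False]

# True counts as 1 and False as 0, so sum() counts the True values.
top_count = sum(is_top)
print(top_count)    # 3

# Booleans decide which branch of a conditional runs.
if ratings[0] > 90:
    label = 'top rated'
else:
    label = 'other'
print(label)        # top rated
```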

## Non-Primitive Data Structures

Non-primitive types are the sophisticated members of the data structure family. They don't just store a value, but rather a collection of values in various formats. They can be divided into:

* Lists
* Tuples
* Dictionaries
* Sets
* Arrays 

### Lists

Lists in Python store an ordered collection of items. They are mutable, which means that you can change their content without changing their identity. You can recognize lists by their square brackets [ ], which hold elements separated by commas. Lists are built into Python, so you do not need to import anything to use them.

In [18]:
# Empty list
empty = []

# list of integers
list_1 = [93, 92, 48, 90, 95, 95, 96, 77, 89, 97]

# nested list
list_2 = ['Argentina', 30, [93, 92, 48, 90, 95, 95, 96, 77, 89, 97]]

#### Working with lists

Lists can be accessed by using square brackets containing the index of the first item we want and the index at which to stop (not included), separated by a colon. If we only want one item, we simply put its index in square brackets. The indices must be integers. In summary:

* list[index]     - item at the given index
* list[start:end] - items from start up to end (end is not included!)
* list[start:]    - items from start through the rest of the list
* list[:end]      - items from the beginning up to end (end is not included!)

NB: Python starts indexing from 0, therefore the first item is at index 0, the second at index 1, etc. Let's look at a few examples:

In [19]:
# Select 3rd item 
list_1[2]

48

In [20]:
# Select first 5 items 
list_1[0:5]

[93, 92, 48, 90, 95]

In [21]:
# Select items 2 to 4 
list_1[1:4]

[92, 48, 90]

In [22]:
# Select last item
# Negative indices count from the end: -1 is the last item, -2 the second to last
list_1[-1]

97
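The open-ended slice forms from the summary, `list[start:]` and `list[:end]`, can be sketched in the same way. The list below repeats the ratings used earlier so that the example runs on its own.

```python
ratings = [93, 92, 48, 90, 95, 95, 96, 77, 89, 97]

# From index 7 through the rest of the list
print(ratings[7:])   # [77, 89, 97]

# From the beginning up to (not including) index 3
print(ratings[:3])   # [93, 92, 48]

# A negative start counts from the end: the last 3 items
print(ratings[-3:])  # [77, 89, 97]
```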

To access items in a nested list we have to use multiple indices. For example:

In [23]:
# Select 3rd item in outside list and 5th item of inside list
list_2[2][4]

95

A common operation is adding one item to the end of a list. This is done with the `list.append()` method.

In [24]:
# Add one element to the end of a list
list_1.append(95)
list_1

[93, 92, 48, 90, 95, 95, 96, 77, 89, 97, 95]
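To illustrate what "mutable" means for lists, the sketch below changes a list in several ways and uses the built-in `id()` function to check that it is still the same object. The values are arbitrary.

```python
ratings = [93, 92, 48]
identity_before = id(ratings)

ratings.append(95)   # add an item at the end
ratings[2] = 84      # replace the item at index 2
ratings.remove(92)   # delete the first occurrence of 92

print(ratings)                         # [93, 84, 95]
# The content changed, but it is still the same list object.
print(id(ratings) == identity_before)  # True
```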

### Tuples

Tuples are another standard sequence data type. The difference between tuples and lists is that tuples are immutable, which means that once a tuple is defined you cannot add, delete, or change its items.

This could be useful in situations where you want others to see the data, or perform operations separately on a copy of the data, but you do not want them to manipulate the actual data in your collection. You can recognize tuples by their round brackets ( ) which contain the elements, each separated by a comma. 

In [25]:
# Example of a tuple:
tuple_1 = (92, 75, 'Brazil', 85)
print(tuple_1)
type(tuple_1)

(92, 75, 'Brazil', 85)


tuple

#### Working with tuples

Tuples can be indexed and sliced the same way as lists. Let's look at one example:

In [26]:
# Select first 3 items 
tuple_1[0:3]

(92, 75, 'Brazil')
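Immutability can be seen directly by trying to assign to a tuple item, which raises a `TypeError`. A minimal sketch, with illustrative variable names:

```python
player = (92, 75, 'Brazil', 85)

# Item assignment is not allowed on a tuple.
try:
    player[0] = 93
    assignment_failed = False
except TypeError as error:
    assignment_failed = True
    print('Cannot modify a tuple:', error)

# To "change" a value, build a new tuple instead.
updated = (93,) + player[1:]
print(updated)  # (93, 75, 'Brazil', 85)
print(player)   # (92, 75, 'Brazil', 85), the original is untouched
```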

### Dictionaries

Dictionaries are associative arrays, implemented in Python as hash tables. Dictionaries are made up of key-value pairs. Each key in the dictionary is associated with (or mapped to) a value. Keys are separated from their respective values by a colon, the different items are separated by commas, and the whole dictionary is enclosed by curly braces. An empty dictionary without any items is written as just two curly braces { }.

Keys are unique within a dictionary while values might not be. The values of a dictionary can be of any type, but the keys must be of an immutable data type such as strings, numbers, or tuples.

In [27]:
# Dictionary mapping names to ratings
dict_1 = {'Ronaldo': 94, 'Messi': 93, 'Neymar': 92, 'Bale': 89}
type(dict_1)

dict
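The rules about keys can be sketched as follows: a tuple works as a key, a list does not, and repeating a key keeps only the last value. The player data is made up for illustration.

```python
# A tuple is immutable, so it can be used as a key.
ratings = {('Neymar', 'Brazil'): 92}
print(ratings[('Neymar', 'Brazil')])  # 92

# A list is mutable, so using it as a key raises a TypeError.
try:
    ratings[['Messi', 'Argentina']] = 93
    list_key_rejected = False
except TypeError as error:
    list_key_rejected = True
    print('Invalid key:', error)

# Keys are unique: a repeated key keeps only the last value.
duplicates = {'Bale': 88, 'Bale': 89}
print(duplicates)  # {'Bale': 89}
```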

#### Working with dictionaries

We can retrieve the values of a dictionary by referencing the associated keys. You can think of the key as a type of index. Just as we use numeric indices to retrieve elements of a string, list, or tuple, we use a dictionary's keys to retrieve its values.
The retrieval is one-way. You can get a value via its key, but you cannot get a key directly from its value.

##### Accessing values using keys

Because dictionaries store data as key-value pairs, looking up a value by its key is the most common dictionary operation. If we want to retrieve Ronaldo's rating from dict_1, we can do so with dict_1['Ronaldo']. Let's look at an example:

In [28]:
dict_1['Ronaldo']

94
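Since lookup only works from key to value, finding the key for a given value means searching through the items. The sketch below shows one way to do that. It also uses `dict.get()`, which returns a default instead of raising a `KeyError` when a key is missing. The name `'Kane'` is just an example of a missing key.

```python
ratings = {'Ronaldo': 94, 'Messi': 93, 'Neymar': 92, 'Bale': 89}

# Reverse lookup: collect every key whose value is 92.
names_rated_92 = [name for name, rating in ratings.items() if rating == 92]
print(names_rated_92)  # ['Neymar']

# ratings['Kane'] would raise a KeyError; get() returns a default instead.
missing = ratings.get('Kane')
print(missing)                 # None
print(ratings.get('Kane', 0))  # 0
```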

##### Using functions to access elements

In addition to using keys to access values, we can also work with some dictionary methods:

* `dict.keys()` returns the keys
* `dict.values()` returns the values
* `dict.items()` returns the items as (key, value) tuple pairs

In [29]:
dict_1.keys()

dict_keys(['Ronaldo', 'Messi', 'Neymar', 'Bale'])

In [30]:
dict_1.values()

dict_values([94, 93, 92, 89])

In [31]:
dict_1.items()

dict_items([('Ronaldo', 94), ('Messi', 93), ('Neymar', 92), ('Bale', 89)])

Each method returns an iterable view object (`dict_keys`, `dict_values`, or `dict_items`) rather than a list. The view is displayed in a list-like format and reflects later changes to the dictionary.
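A short sketch of working with these view objects: looping over them, converting one to a list when indexing is needed, and observing that a view follows later changes to the dictionary. The added player is a made-up entry.

```python
ratings = {'Ronaldo': 94, 'Messi': 93, 'Neymar': 92, 'Bale': 89}

# Views can be looped over directly.
for name, rating in ratings.items():
    print(f'{name}: {rating}')

# Views cannot be indexed, so convert to a list first if you need that.
names = list(ratings.keys())
print(names[0])        # Ronaldo

# A view stays linked to the dictionary it came from.
keys_view = ratings.keys()
ratings['Kane'] = 90   # hypothetical new entry
print(len(keys_view))  # 5
```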

### Sets

A set is a collection of distinct (unique) objects. Sets are useful for keeping only the unique values in a dataset. A set is unordered but mutable, and checking whether an item is in a set is fast, which is very helpful when going through a very large dataset.

In [32]:
# Example of a set:
set_1 = {95, 'England', 81}
print(set_1)
type(set_1)

{81, 'England', 95}


set

#### Working with sets

We can add single items using the `add()` method:

In [33]:
# Add one item
set_1.add(96)
set_1

{81, 95, 96, 'England'}

What happens when we try to add an existing element to our current set? Let's find out!

In [34]:
# Add duplicate item
set_1.add(95)
set_1

{81, 95, 96, 'England'}

Nothing changes! This is because the item already exists in that particular set.
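The same behaviour makes sets a convenient way to remove duplicates from a list. A minimal sketch, using a made-up list of nationalities:

```python
# Hypothetical list with repeated values, for illustration only
nationalities = ['Brazil', 'Argentina', 'Brazil', 'England', 'Argentina']

# Converting to a set drops the duplicates.
unique = set(nationalities)
print(len(unique))         # 3

# Membership checks use the `in` operator.
print('Brazil' in unique)  # True

# Sets are unordered, so sort them when a stable order is needed.
print(sorted(unique))      # ['Argentina', 'Brazil', 'England']
```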

### Arrays

An array is a data structure that stores values of the same data type. In Python, this is the main difference between arrays and lists. While Python lists can contain values corresponding to different data types, arrays in Python can only contain values corresponding to the same data type.
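The single-type restriction can be sketched with the `array` module from Python's standard library. Note that this is a different array type from the NumPy arrays used in the rest of this section; it is used here only because it needs no extra installation. The heights are illustrative values.

```python
from array import array

# A list happily mixes types.
mixed_list = [1.85, 'tall', 72]

# An array is created with a type code; 'd' means double-precision float.
heights = array('d', [1.85, 1.68])
heights.append(1.75)   # another float is fine

# A value of a different type is rejected with a TypeError.
try:
    heights.append('tall')
    rejected = False
except TypeError as error:
    rejected = True
    print('Wrong type for this array:', error)

print(list(heights))   # [1.85, 1.68, 1.75]
```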

We will mainly look at NumPy arrays and pandas DataFrames since they are very efficient for working with multi-dimensional data.

In [35]:
# Example of a numpy array

# first we need to import the numpy library
import numpy as np 

x = np.array([[85, 1.85], [72, 1.68]])
print(x)
type(x)

[[85.    1.85]
 [72.    1.68]]


numpy.ndarray

In [36]:
# Example of a pandas dataframe

# first we need to import the pandas library
import pandas as pd

x = pd.DataFrame(data=np.array([[85, 1.85], [72, 1.68]]), columns=['weight', 'height'])
print(x)
type(x)

   weight  height
0    85.0    1.85
1    72.0    1.68


pandas.core.frame.DataFrame

## Exercises

In [6]:
var = 'Hello World'

#### Split `var` up into a list containing two strings

In [7]:
var.split()

['Hello', 'World']

In [10]:
tup = ('43', 9, 'Forty', 0)

#### Select the middle two entries of `tup`

In [11]:
tup[1:3]

(9, 'Forty')

In [13]:
dictionary = {'Dave': 100, 'Tristan': 84, 'James': 48, 'Sam': 66}

#### How can we retrieve Tristan's score?

In [14]:
dictionary['Tristan']

84

#### Replicate the Pandas dataframe below: 

In [24]:
import pandas as pd
import numpy as np

x = pd.DataFrame(data=np.array([[95, 23], [72, 27], [54, 29]]), columns=['Score', 'Age'])
print(x)
type(x)

   Score  Age
0     95   23
1     72   27
2     54   29


pandas.core.frame.DataFrame

We will cover how to work with arrays (NumPy arrays and pandas DataFrames) in a following tutorial.

That is all for this tutorial. In the next tutorial we will look at writing functions using our FIFA Players Data.