# Dictionaries

Dictionaries are one of the most important data types in Python, and they are used constantly behind the scenes, even when we don't notice it.

One way to think about dictionaries is through the difference between numpy arrays and pandas dataframes: the benefit of a dataframe over a numpy array is that each column has a name rather than just a number. Specifically, a dictionary is a collection of key-value pairs, meaning that it links each key with a value. Consider a list of phone numbers:

In [1]:
# These numbers are randomly generated
# Question for you, why are these numbers stored as strings not integers?
phone_numbers_list = ["0944 995 040", "0385 093 346", "0771 930 844", "0298 188 800"]

In [2]:
phone_numbers_list[0]

'0944 995 040'

As with numpy arrays, the problem with storing them as a list is that we don't know what the 0-index, the 1-index, or any other index is meant to represent. Instead, we can attach a name to each number via a dictionary:

In [3]:
phone_numbers_dict = {"Andrei": "0944 995 040", "Pierre": "0385 093 346", 
                      "Natasha": "0771 930 844", "Maria": "0298 188 800"}

So rather than asking for the person at index 3, we can instead ask for Maria:

In [21]:
phone_numbers_list[3]

'0298 188 800'

In [22]:
phone_numbers_dict["Maria"]

'0298 188 800'

Essentially, a dictionary is a generalised array: in an array the key must be an integer, but in a dictionary the key can be 'anything' (so long as it is hashable).
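
To make "hashable" a little more concrete, here is a minimal sketch. The dictionary `mixed_keys` and its entries are made up for illustration: strings, integers and tuples all work as keys, while a list (which is mutable) is rejected with a `TypeError`.

```python
# Keys can be any hashable object: strings, integers, tuples, ...
# (mixed_keys is a made-up example, not part of the phone book above)
mixed_keys = {
    "Andrei": "0944 995 040",              # string key
    7: "seven",                            # integer key
    ("Pierre", "mobile"): "0385 093 346",  # tuple key
}

tuple_lookup = mixed_keys[("Pierre", "mobile")]
print(tuple_lookup)

# A list is mutable, so it is not hashable and cannot be used as a key
try:
    mixed_keys[["Pierre", "mobile"]] = "0385 093 346"
    list_key_error = None
except TypeError as err:
    list_key_error = str(err)
    print("TypeError:", list_key_error)
```

If you need a key made of several parts, a tuple is usually the simplest hashable choice.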

Now let's say that Andrei changed his number to `'0488 989 246'`. We can update the dictionary in two ways:

In [26]:
phone_numbers_dict = {"Andrei": "0944 995 040", "Pierre": "0385 093 346", 
                      "Natasha": "0771 930 844", "Maria": "0298 188 800"}

print("Before Change:", phone_numbers_dict["Andrei"])

phone_numbers_dict["Andrei"] = "0488 989 246"
print("After Change:", phone_numbers_dict["Andrei"])

Before Change: 0944 995 040
After Change: 0488 989 246


In [27]:
phone_numbers_dict = {"Andrei": "0944 995 040", "Pierre": "0385 093 346", 
                      "Natasha": "0771 930 844", "Maria": "0298 188 800"}

print("Before Change:", phone_numbers_dict["Andrei"])

# Create a dictionary that contains the new values
update_dict = {"Andrei": "0488 989 246"}
phone_numbers_dict.update(update_dict)
print("After Change:", phone_numbers_dict["Andrei"])

Before Change: 0944 995 040
After Change: 0488 989 246
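
One reason to reach for `.update` rather than plain assignment is that it can handle several entries in one call. The sketch below assumes a new contact, "Nikolai", whose name and number are placeholders invented for this example; a key that is not already in the dictionary is simply added.

```python
phone_numbers_dict = {"Andrei": "0944 995 040", "Pierre": "0385 093 346",
                      "Natasha": "0771 930 844", "Maria": "0298 188 800"}

# One call changes an existing entry and adds a new one.
# "Nikolai" and his number are placeholders made up for this sketch.
phone_numbers_dict.update({"Andrei": "0488 989 246", "Nikolai": "0000 000 000"})

print(phone_numbers_dict["Andrei"])
print(phone_numbers_dict["Nikolai"])
print(len(phone_numbers_dict))
```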


### A Couple More Things

Let's say that you want only the keys, only the values, or both. How do you do this?

In [28]:
phone_numbers_dict = {"Andrei": "0944 995 040", "Pierre": "0385 093 346", 
                      "Natasha": "0771 930 844", "Maria": "0298 188 800"}

In [29]:
for key in phone_numbers_dict.keys():
    print(key)

Andrei
Pierre
Natasha
Maria


In [30]:
for val in phone_numbers_dict.values():
    print(val)

0944 995 040
0385 093 346
0771 930 844
0298 188 800


In [31]:
for key, val in phone_numbers_dict.items():
    print(key, val)

Andrei 0944 995 040
Pierre 0385 093 346
Natasha 0771 930 844
Maria 0298 188 800
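
The three loops above can also be collected into ordinary lists when you need to index or reuse the results. This is a small sketch using the same dictionary; the variable names `names`, `numbers` and `pairs` are just illustrative. Note that `.items()` yields `(key, value)` tuples, and that looping over the dictionary itself gives the keys.

```python
phone_numbers_dict = {"Andrei": "0944 995 040", "Pierre": "0385 093 346",
                      "Natasha": "0771 930 844", "Maria": "0298 188 800"}

# Turn each view into a list (insertion order is preserved)
names = list(phone_numbers_dict.keys())
numbers = list(phone_numbers_dict.values())
pairs = list(phone_numbers_dict.items())

print(names)
print(pairs[0])

# Iterating over the dictionary directly is the same as iterating over .keys()
keys_from_plain_loop = [key for key in phone_numbers_dict]
print(keys_from_plain_loop == names)
```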


# Application of Dictionaries

The most important use of dictionaries, for us, is as keyword arguments, because dictionaries ARE keyword arguments: a dictionary can be unpacked straight into a function call with `**`. Pay attention, because this becomes very important when we automate co-ordinate descent.

In [40]:
import pandas as pd

from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

In [41]:
perth = pd.read_csv('perth_clean.csv', index_col=0)
simplified_perth = perth.loc[:, ["latitude", "longitude", "nearest_sch_rank", "log10_price"]].copy()

train_indices, test_indices = train_test_split(simplified_perth.index, test_size=0.2, random_state=0)

train_data = simplified_perth.loc[train_indices, :]
test_data = simplified_perth.loc[test_indices, :]

x_train = train_data.drop("log10_price", axis=1)
y_train = train_data["log10_price"]

x_test = test_data.drop("log10_price", axis=1)
y_test = test_data["log10_price"]

In [44]:
model = DecisionTreeRegressor(max_depth=7, min_samples_leaf=9, random_state=0)
model.fit(x_train, y_train)

model.score(x_test, y_test)

0.4692533744369717

In [45]:
# kwargs - keyword arguments
kwargs = {"max_depth": 7, "min_samples_leaf": 9, "random_state": 0}

# Notice the double star (**)
model = DecisionTreeRegressor(**kwargs)
model.fit(x_train, y_train)

model.score(x_test, y_test)

0.4692533744369717
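
To see why this matters for automation, here is a minimal sketch that does not need scikit-learn or the data file. The function `describe_tree` is a hypothetical stand-in for a model constructor; it only reports the arguments it received. The point is that once the settings live in a dictionary, a loop can change one entry at a time while the others stay fixed.

```python
def describe_tree(max_depth=None, min_samples_leaf=1, random_state=None):
    """Hypothetical stand-in for a model constructor; it only reports its arguments."""
    return (f"max_depth={max_depth}, min_samples_leaf={min_samples_leaf}, "
            f"random_state={random_state}")

kwargs = {"max_depth": 7, "min_samples_leaf": 9, "random_state": 0}

# Writing the arguments out and unpacking the dictionary are the same call
explicit = describe_tree(max_depth=7, min_samples_leaf=9, random_state=0)
unpacked = describe_tree(**kwargs)
print(explicit == unpacked)

# Vary one setting in a loop while the others stay fixed
results = []
for depth in [3, 5, 7]:
    trial_kwargs = {**kwargs, "max_depth": depth}  # copy with one entry replaced
    results.append(describe_tree(**trial_kwargs))

print(results[0])
```

Building `trial_kwargs` as a new dictionary leaves the original `kwargs` untouched, so each pass through the loop starts from the same baseline settings.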