# Dictionaries

Another built-in Python data structure is the dictionary. Dictionaries create mappings between keys and values, like so:  
`dict1 = {keyX : valueX, keyC : valueC, keyQ : valueQ}`. 

You could also create the same dictionary by passing a list of key-value pairs to `dict()`:  
`dict1 = dict([(keyX, valueX), (keyC, valueC), (keyQ, valueQ)])`. 

Unlike strings or lists, dictionaries are not indexed by position. You can iterate through a dictionary, and since Python 3.7 the items come back in insertion order, but there is no way to ask directly for "the third item". Instead, dictionaries are good for finding the value associated with a specific key rather than the item in a specific place.  
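As a minimal sketch of these ideas, the snippet below builds one dictionary two ways and looks values up by key. The part names and prices are invented for illustration and do not come from the dataset used later.

```python
# Hypothetical part prices, made up for this example
prices = {'brick': 0.10, 'plate': 0.07, 'minifigure': 3.99}

# The same mapping built from a list of (key, value) pairs
same_prices = dict([('brick', 0.10), ('plate', 0.07), ('minifigure', 3.99)])

# Look up a value by its key rather than by position
plate_price = prices['plate']

# .get() returns a fallback instead of raising KeyError for a missing key
missing = prices.get('wheel', 0.0)

# Iterate over key-value pairs
for part, price in prices.items():
    print(part, price)
```

Using `prices['wheel']` here would raise a `KeyError`, which is why `.get()` with a default is often handy when a key may be absent.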

Let's look at a couple of examples in practice:

# Importing Packages and Data
As usual, let's start by importing pandas and some data to work with.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('lego_sets.csv')
df.head()

Unnamed: 0,ages,list_price,num_reviews,piece_count,play_star_rating,prod_desc,prod_id,prod_long_desc,review_difficulty,set_name,star_rating,theme_name,val_star_rating,country
0,6-12,29.99,2.0,277.0,4.0,Catapult into action and take back the eggs fr...,75823.0,Use the staircase catapult to launch Red into ...,Average,Bird Island Egg Heist,4.5,Angry Birds™,4.0,US
1,6-12,19.99,2.0,168.0,4.0,Launch a flying attack and rescue the eggs fro...,75822.0,Pilot Pig has taken off from Bird Island with ...,Easy,Piggy Plane Attack,5.0,Angry Birds™,4.0,US
2,6-12,12.99,11.0,74.0,4.3,Chase the piggy with lightning-fast Chuck and ...,75821.0,Pitch speedy bird Chuck against the Piggy Car....,Easy,Piggy Car Escape,4.3,Angry Birds™,4.1,US
3,12+,99.99,23.0,1032.0,3.6,Explore the architecture of the United States ...,21030.0,Discover the architectural secrets of the icon...,Average,United States Capitol Building,4.6,Architecture,4.3,US
4,12+,79.99,14.0,744.0,3.2,Recreate the Solomon R. Guggenheim Museum® wit...,21035.0,Discover the architectural secrets of Frank Ll...,Challenging,Solomon R. Guggenheim Museum®,4.6,Architecture,4.1,US


# Using Dictionaries to Rename Column Values
A common use case for dictionaries is recoding the values in a column. For example, let's say that we wanted to convert the *review_difficulty* labels to a quantitative scale.

In [3]:
# Get previous values
df.review_difficulty.unique()

array(['Average', 'Easy', 'Challenging', 'Very Easy', nan,
       'Very Challenging'], dtype=object)

Notice the `nan` value above which represents *null* or blank values. We could potentially translate these difficulty ratings into a quantitative scale like this:

In [4]:
diff_dict = {'Very Easy' : 1, 'Easy' : 2, 'Average' : 3, 'Challenging' : 4, 'Very Challenging' : 5}

We could then create a new column (or update the current column) using that dictionary:

In [8]:
df['Difficulty_Rating'] = df.review_difficulty.map(diff_dict)
df.head() #Preview changes

Unnamed: 0,ages,list_price,num_reviews,piece_count,play_star_rating,prod_desc,prod_id,prod_long_desc,review_difficulty,set_name,star_rating,theme_name,val_star_rating,country,Difficulty_Rating
0,6-12,29.99,2.0,277.0,4.0,Catapult into action and take back the eggs fr...,75823.0,Use the staircase catapult to launch Red into ...,Average,Bird Island Egg Heist,4.5,Angry Birds™,4.0,US,3.0
1,6-12,19.99,2.0,168.0,4.0,Launch a flying attack and rescue the eggs fro...,75822.0,Pilot Pig has taken off from Bird Island with ...,Easy,Piggy Plane Attack,5.0,Angry Birds™,4.0,US,2.0
2,6-12,12.99,11.0,74.0,4.3,Chase the piggy with lightning-fast Chuck and ...,75821.0,Pitch speedy bird Chuck against the Piggy Car....,Easy,Piggy Car Escape,4.3,Angry Birds™,4.1,US,2.0
3,12+,99.99,23.0,1032.0,3.6,Explore the architecture of the United States ...,21030.0,Discover the architectural secrets of the icon...,Average,United States Capitol Building,4.6,Architecture,4.3,US,3.0
4,12+,79.99,14.0,744.0,3.2,Recreate the Solomon R. Guggenheim Museum® wit...,21035.0,Discover the architectural secrets of Frank Ll...,Challenging,Solomon R. Guggenheim Museum®,4.6,Architecture,4.1,US,4.0
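One detail worth knowing: when `.map()` is given a dictionary, any value that is not one of its keys (including missing values) becomes `NaN`, which is also why the new column shows floats such as `3.0`. The sketch below illustrates this on a small toy DataFrame; the rows and the `'Unrated'` label are invented for illustration.

```python
import pandas as pd

# Toy data, invented for illustration; 'Unrated' is deliberately not in the mapping
toy = pd.DataFrame({'review_difficulty': ['Easy', 'Average', None, 'Unrated']})

diff_dict = {'Very Easy': 1, 'Easy': 2, 'Average': 3,
             'Challenging': 4, 'Very Challenging': 5}

# Values found in the dictionary are replaced; everything else becomes NaN
toy['Difficulty_Rating'] = toy['review_difficulty'].map(diff_dict)

# Count how many rows could not be mapped
n_unmapped = toy['Difficulty_Rating'].isna().sum()
print(toy)
print('Unmapped rows:', n_unmapped)
```

Checking the number of unmapped rows after a `.map()` call is a quick way to catch labels that are misspelled or missing from the dictionary.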


# Creating Dictionaries from DataFrames
You can also quickly create dictionaries from another dataset. Let's say we want the full name of countries listed under the country column.

In [11]:
df.country.value_counts()[:5]

US    817
CA    815
GB    576
NL    576
DN    575
Name: country, dtype: int64

In [12]:
#Pull in a new dataset from online
countries = pd.read_csv('Country_Codes.csv')
countries.head()

Unnamed: 0,COUNTRY,A2 (ISO),A3 (UN),NUM (UN),DIALING CODE
0,Afghanistan,AF,AFG,4,93
1,Albania,AL,ALB,8,355
2,Algeria,DZ,DZA,12,213
3,American Samoa,AS,ASM,16,01/01/84
4,Andorra,AD,AND,20,376


In [13]:
#Create a dictionary
#The built-in zip() function is a neat little tool for pairing each entry from two columns together (like a zipper!)
#We then just wrap that in the dict() function.
country_dict = dict(zip(countries['A2 (ISO)'], countries['COUNTRY'])) 
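To see what `zip()` and `dict()` are doing in that cell, here is a small sketch using plain lists. The three code/name pairs are typed in by hand for illustration rather than read from `Country_Codes.csv`.

```python
# Hand-typed sample values, for illustration only
codes = ['US', 'CA', 'GB']
names = ['United States', 'Canada', 'United Kingdom']

# zip() pairs the first items together, then the second items, and so on
pairs = list(zip(codes, names))
print(pairs)

# dict() turns each (key, value) pair into one dictionary entry
lookup = dict(pairs)
print(lookup['CA'])
```

If the same code appeared more than once in `codes`, the later pair would overwrite the earlier one, since dictionary keys are unique.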

In [16]:
#Map it to our original dataset (you can also do this with pd.merge() for joining multiple fields)
df['Country_Full_Name'] = df.country.map(country_dict)
df.head(2)

Unnamed: 0,ages,list_price,num_reviews,piece_count,play_star_rating,prod_desc,prod_id,prod_long_desc,review_difficulty,set_name,star_rating,theme_name,val_star_rating,country,Difficulty_Rating,Country_Full_Name
0,6-12,29.99,2.0,277.0,4.0,Catapult into action and take back the eggs fr...,75823.0,Use the staircase catapult to launch Red into ...,Average,Bird Island Egg Heist,4.5,Angry Birds™,4.0,US,3.0,United States
1,6-12,19.99,2.0,168.0,4.0,Launch a flying attack and rescue the eggs fro...,75822.0,Pilot Pig has taken off from Bird Island with ...,Easy,Piggy Plane Attack,5.0,Angry Birds™,4.0,US,2.0,United States


In [17]:
df.Country_Full_Name.value_counts()[:5]

United States     817
Canada            815
Netherlands       576
United Kingdom    576
Austria           575
Name: Country_Full_Name, dtype: int64
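The comment in the mapping cell mentions `pd.merge()` as an alternative when you want to bring over several fields at once. A rough sketch of that approach on toy data follows; both small DataFrames and the `'XX'` code are invented for illustration, with column names chosen to mirror the country codes file.

```python
import pandas as pd

# Toy stand-ins for the two datasets, invented for illustration
sets = pd.DataFrame({'set_name': ['A', 'B', 'C'],
                     'country': ['US', 'CA', 'XX']})
codes = pd.DataFrame({'A2 (ISO)': ['US', 'CA'],
                      'COUNTRY': ['United States', 'Canada'],
                      'A3 (UN)': ['USA', 'CAN']})

# A left join keeps every row of `sets`; unmatched codes get NaN in the new columns
merged = sets.merge(codes, how='left', left_on='country', right_on='A2 (ISO)')
print(merged)
```

Compared with `.map()`, the merge adds every column from the lookup table in one step, at the cost of also carrying along the duplicate key column (`A2 (ISO)`).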

# Custom Agg Functions
We can also use dictionaries along with the pandas groupby method to apply different aggregations to different columns:

In [1]:
import numpy as np

In [2]:
# Map each column to the aggregation(s) to apply within each theme
agg_dict = {'ages' : 'max',
            'Difficulty_Rating' : [np.mean, np.std],
            'Country_Full_Name' : lambda x: x.value_counts().index[0],  # most common country
            'num_reviews' : ['mean', 'max']}
df.groupby('theme_name').agg(agg_dict)

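NameError: name 'df' is not defined

Note: this error appears because the cell was run in a fresh session (the cell numbers restart at 1). Re-run the earlier cells that load `df` and add the `Difficulty_Rating` and `Country_Full_Name` columns before running this one.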
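For a version of this pattern that runs on its own, here is a sketch on a small toy DataFrame. The themes, prices, and review counts are invented for illustration, and the aggregations are passed as strings.

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    'theme_name': ['City', 'City', 'Friends'],
    'list_price': [10.0, 30.0, 20.0],
    'num_reviews': [5, 15, 8],
})

# One aggregation for list_price, two for num_reviews
summary = toy.groupby('theme_name').agg({
    'list_price': 'mean',
    'num_reviews': ['mean', 'max'],
})
print(summary)
```

Because at least one column gets a list of aggregations, the result has two-level column labels such as `('num_reviews', 'max')`.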

# Data Transformation

Create a dictionary that rebins the *ages* column into the following age ranges:
Under 5, 5-8, 8-12, 12+

Note: if there is a conflict in age bin, default to the higher age bin.

In [None]:
# Your code here

# Data Visualization
Create a bar graph depicting the number of LEGO sets for each value in the original *ages* column. Then create a second bar graph for the new age column you created. How do they compare?

In [None]:
# Your code here