<font size = "4">

- To get familiar with how operations on Pandas DataFrames work, it can be easier to test things out on "toy" DataFrames.

- The easiest way to create a small test DataFrame is to define it using a **dictionary**.

- Dictionaries are another Python structure that can hold multiple objects, like lists or NumPy arrays.

<font size = "4">

Consider the following example:

- We are keeping track of the course grades for a small class of 5 students.
- We decide to use Python lists, which are indexed by the integers 0, 1, 2, ...
- We need two lists: one for the student names, one for their grades.

In [9]:
student_roster = ["Nancy", "Bill", "Elaine", "Boris", "Assata"]
student_grades = [92.8, 87.3, 95.4, 82.3, 98.7]

<font size = "4">

- Suppose "Boris" does some extra credit and we want to increase his grade to 85.
- There are only 5 students, so we can quickly see that we need to change ``student_grades[3]``.
- But with a larger class, we would want to do this with Python commands, not manually.

So, if we want to change the grade of "Boris" to 85, we could do:

In [11]:
b_index = student_roster.index("Boris") # find index

student_grades[b_index] = 85 # update corresponding element

print(student_grades)

[92.8, 87.3, 95.4, 85, 98.7]
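
The find-then-update steps above can be wrapped in a small function. This is only a sketch: the function name ``update_grade`` and the student "Kramer" are hypothetical and are not part of the class data. It also guards against a name that is not on the roster, since ``list.index`` raises a ``ValueError`` in that case.

```python
student_roster = ["Nancy", "Bill", "Elaine", "Boris", "Assata"]
student_grades = [92.8, 87.3, 95.4, 82.3, 98.7]

def update_grade(roster, grades, name, new_grade):
    """Update the grade for `name`, assuming roster and grades are parallel lists."""
    if name not in roster:
        # list.index would raise ValueError for a missing name, so check first
        print(name, "is not on the roster")
        return False
    grades[roster.index(name)] = new_grade  # find index, then update
    return True

update_grade(student_roster, student_grades, "Boris", 85)
update_grade(student_roster, student_grades, "Kramer", 90)  # hypothetical name, not in the class

print(student_grades)
```

Note that this approach only works as long as the two lists stay in the same order.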


<font size = "4">

A **dictionary**, by contrast, is indexed by labels that we choose ourselves, such as integers or strings. This lets us use the students' names directly.

In [8]:
grades = {"Nancy" : 92.8, "Bill" : 87.3, "Elaine" : 95.4, "Boris": 82.3, "Assata" : 98.7}

print(grades["Boris"])

82.3


<font size = "4">

We can change Boris' grade easily:

In [12]:
grades["Boris"] = 85

In [13]:
print(grades)

{'Nancy': 92.8, 'Bill': 87.3, 'Elaine': 95.4, 'Boris': 85, 'Assata': 98.7}
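
A few related operations are worth trying on the same dictionary. The sketch below assumes the ``grades`` dictionary from above; the students "Kramer" and "Newman" are hypothetical and only used for illustration.

```python
grades = {"Nancy": 92.8, "Bill": 87.3, "Elaine": 95.4, "Boris": 85, "Assata": 98.7}

# Assigning to a key that does not exist yet adds a new entry
grades["Kramer"] = 79.5  # hypothetical student

# `in` checks whether a key is present
print("Kramer" in grades)
print("Newman" in grades)

# grades["Newman"] would raise a KeyError because the key is missing.
# .get returns a default value instead of raising an error.
print(grades.get("Newman", "no grade recorded"))
```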


<font size = "4">

The indices for a dictionary are called the **keys**, and the elements of the dictionary are called the **values**.

In [21]:
print(grades.keys())
print(grades.values())

dict_keys(['Nancy', 'Bill', 'Elaine', 'Boris', 'Assata'])
dict_values([92.8, 87.3, 95.4, 85, 98.7])


In [22]:
for key in grades.keys():
    name = key             # each key is a student's name
    score = grades[key]    # look up the value stored under that key
    print("Grade for", name, ":", score)

Grade for Nancy : 92.8
Grade for Bill : 87.3
Grade for Elaine : 95.4
Grade for Boris : 85
Grade for Assata : 98.7
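
As an alternative to looking up each value inside the loop, the ``.items()`` method yields each key together with its value. The sketch below assumes the ``grades`` dictionary from above and also shows two built-in functions applied to it; the variable names ``best`` and ``average`` are just for illustration.

```python
grades = {"Nancy": 92.8, "Bill": 87.3, "Elaine": 95.4, "Boris": 85, "Assata": 98.7}

# .items() gives (key, value) pairs, so no separate lookup is needed
for name, score in grades.items():
    print("Grade for", name, ":", score)

# key with the largest value, and the mean of all values
best = max(grades, key=grades.get)
average = sum(grades.values()) / len(grades)

print("Highest grade:", best)
print("Class average:", round(average, 2))
```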


<font size = "4">

Dictionaries can have strings and integers as keys (more generally, any hashable object, such as a tuple). The values can be any object, including lists or other dictionaries.


In [25]:

example_dict = {0: "QTM-151", 
                "favorite movie" : "Evil Dead (2013)",
                12 : 35.2,
                "my_list" : [-8, 0.2, 9.1],
                "roster_from_above" : student_roster, 
                "dict_from_above": grades}

# keys
print(example_dict.keys())
print()

# values
print(example_dict[0])
print(example_dict["favorite movie"])
print(example_dict[12])
print(example_dict["my_list"])
print(example_dict["roster_from_above"])
print(example_dict["dict_from_above"])

dict_keys([0, 'favorite movie', 12, 'my_list', 'roster_from_above', 'dict_from_above'])

QTM-151
Evil Dead (2013)
35.2
[-8, 0.2, 9.1]
['Nancy', 'Bill', 'Elaine', 'Boris', 'Assata']
{'Nancy': 92.8, 'Bill': 87.3, 'Elaine': 95.4, 'Boris': 85, 'Assata': 98.7}
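
Two points from the example above can be checked directly: values stored inside a dictionary can be indexed further, and a list cannot be used as a key. The sketch below uses small made-up dictionaries (``nested`` and ``seat_chart`` are hypothetical names) rather than the ones defined earlier.

```python
nested = {"my_list": [-8, 0.2, 9.1],
          "grades": {"Boris": 85, "Nancy": 92.8}}

# Chain the brackets to reach inside a stored list or dictionary
print(nested["my_list"][0])        # first element of the stored list
print(nested["grades"]["Boris"])   # value for a key of the inner dictionary

# A tuple is allowed as a key
seat_chart = {("row 1", "seat 2"): "Nancy"}  # hypothetical data
print(seat_chart[("row 1", "seat 2")])

# A list is not allowed as a key: Python raises a TypeError
list_key_failed = False
try:
    bad = {["row 1", "seat 2"]: "Bill"}
except TypeError as err:
    list_key_failed = True
    print("TypeError:", err)
```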


<font size = "4">

You can easily make small DataFrames using dictionaries.

- The keys will be the column names of the DataFrame.
- For each key, define the value as a list, which will be the entries of each column.
- **Each list must have the same length.** Otherwise, pandas raises a ``ValueError`` when you try to convert the dictionary to a DataFrame.

Below, we'll make a small DataFrame containing info for 5 U.S. states:

In [31]:
import pandas as pd

states = ["Wisconsin", "Georgia", "Delaware", "Illinois", "Oregon"]

# Wisconsin was the 30th state admitted to the U.S., Georgia was the 4th, etc.
state_id = [30, 4, 1, 21, 33]

capitals = ["Madison", "Atlanta", "Dover", "Springfield", "Salem"]

# year admitted to the U.S.
admit_date = ["1848", "1788", "1787", "1818", "1859"]

# number of electoral votes for each state
electoral_votes = [10, 16, 3, 19, 8]

state_dict = {"state_id": state_id, "state_name" : states, "capital_city" : capitals, 
                "year_admitted" : admit_date, "electoral_college_votes" : electoral_votes}


df_states = pd.DataFrame(state_dict)

display(df_states)

   state_id  state_name  capital_city  year_admitted  electoral_college_votes
0        30   Wisconsin       Madison           1848                       10
1         4     Georgia       Atlanta           1788                       16
2         1    Delaware         Dover           1787                        3
3        21    Illinois   Springfield           1818                       19
4        33      Oregon         Salem           1859                        8
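
The equal-length requirement mentioned above is easy to see with a deliberately broken dictionary. The sketch below uses a made-up dictionary, ``bad_dict``, whose second list is one entry short, and catches the resulting error so the rest of the notebook keeps running.

```python
import pandas as pd

# Hypothetical example: "capital_city" has 2 entries, "state_name" has 3
bad_dict = {"state_name": ["Wisconsin", "Georgia", "Delaware"],
            "capital_city": ["Madison", "Atlanta"]}

conversion_failed = False
try:
    pd.DataFrame(bad_dict)
except ValueError as err:
    conversion_failed = True
    print("ValueError:", err)

# Comparing the list lengths shows which column is the problem
lengths = {key: len(value) for key, value in bad_dict.items()}
print(lengths)
```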


<font size = "4">

We'll make another small DataFrame with different info for the same 5 states.

In [33]:
big_city = ["Milwaukee", "Atlanta", "Wilmington", "Chicago", "Portland"]

state_dict_2 = {"state_id" : state_id, "largest_city" : big_city, "year_admitted" : admit_date}

df_cities = pd.DataFrame(state_dict_2)

display(df_cities)

   state_id  largest_city  year_admitted
0        30     Milwaukee           1848
1         4       Atlanta           1788
2         1    Wilmington           1787
3        21       Chicago           1818
4        33      Portland           1859


<font size = "4">

- You can now experiment using ``pd.merge`` with the two DataFrames, ``df_states`` and ``df_cities``.

- It's easier to experiment with these small DataFrames than with large DataFrames created from .csv files of large datasets.

- You can make up your own small datasets (even nonsensical ones) using dictionaries. This can be helpful when experimenting with many operations on Pandas DataFrames. 
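
As one possible starting point for such experiments, the sketch below rebuilds shortened versions of ``df_states`` and ``df_cities`` (three states, fewer columns) so that it runs on its own, and then merges them in two ways. The names ``merged`` and ``merged_both`` are just for illustration.

```python
import pandas as pd

df_states = pd.DataFrame({"state_id": [30, 4, 1],
                          "state_name": ["Wisconsin", "Georgia", "Delaware"],
                          "year_admitted": ["1848", "1788", "1787"]})

df_cities = pd.DataFrame({"state_id": [30, 4, 1],
                          "largest_city": ["Milwaukee", "Atlanta", "Wilmington"],
                          "year_admitted": ["1848", "1788", "1787"]})

# Merge on state_id only: year_admitted is in both DataFrames,
# so pandas keeps both copies and adds the suffixes _x and _y
merged = pd.merge(df_states, df_cities, on="state_id")
print(merged.columns.tolist())

# Merge on both shared columns: year_admitted now appears only once
merged_both = pd.merge(df_states, df_cities, on=["state_id", "year_admitted"])
print(merged_both)
```

Comparing the two results is a quick way to see how the choice of ``on`` affects the merged columns.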