# Module 6.1 Activity: DataFrames

Now that we have had more exposure to the types of operations we can do with DataFrames, let's dive right in! This notebook primarily focuses on DataFrames and how to interact with them.

In [40]:
# Run this cell to load the dependencies and the sample data
import pandas as pd

candy_dict = {'Candy': ["Sour Patch Kids", "Skittles", "Snickers", "Candy Corn", "Starburst", "M&M’s"],
              'Quantity': [14, 18, 22, 32, 6, 43],
              'Price': ["Expensive", "Cheap", "Cheap", "Expensive", "Expensive", "Cheap"]}

ranking_dict = {'Candy': ["Sour Patch Kids", "Skittles", "Snickers", "Candy Corn", "Starburst", "M&M’s"],
                'Ranking': [3, 4, 2, 1, 6, 5]}

## Sample DataFrame

Consider the following DataFrame called `candy_jar`. It has three columns: `Candy`, `Quantity`, and `Price`.

In [7]:
candy_jar = pd.DataFrame(candy_dict)
candy_jar

|   | Candy | Quantity | Price |
| --- | --- | --- | --- |
| 0 | Sour Patch Kids | 14 | Expensive |
| 1 | Skittles | 18 | Cheap |
| 2 | Snickers | 22 | Cheap |
| 3 | Candy Corn | 32 | Expensive |
| 4 | Starburst | 6 | Expensive |
| 5 | M&M’s | 43 | Cheap |


**Question One:** How can we return a list of all candy names in our DataFrame? 

In [10]:
names_of_candy = candy_jar['Candy']
list(names_of_candy)

['Sour Patch Kids', 'Skittles', 'Snickers', 'Candy Corn', 'Starburst', 'M&M’s']
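
As a small variation on the answer above, a column (a pandas `Series`) can also be converted with its own `tolist()` method instead of the built-in `list()`. The sketch below rebuilds the sample data so it runs on its own; `names_list` is just an illustrative name.

```python
import pandas as pd

# Rebuild the sample data so this cell is self-contained
candy_jar = pd.DataFrame({
    "Candy": ["Sour Patch Kids", "Skittles", "Snickers", "Candy Corn", "Starburst", "M&M’s"],
    "Quantity": [14, 18, 22, 32, 6, 43],
    "Price": ["Expensive", "Cheap", "Cheap", "Expensive", "Expensive", "Cheap"],
})

# Selecting one column gives a Series; tolist() turns it into a plain Python list
names_list = candy_jar["Candy"].tolist()
print(names_list)
```

Both approaches give the same list here, so which one to use is mostly a matter of taste.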

**Question Two:** How can we return the information from the rows that would be indexed as 3 and 5? 

In [19]:
reduced = candy_jar.iloc[[3,5], :]
reduced

|   | Candy | Quantity | Price |
| --- | --- | --- | --- |
| 3 | Candy Corn | 32 | Expensive |
| 5 | M&M’s | 43 | Cheap |
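
To see why `iloc` is used here, it may help to compare it with `loc`: `iloc` selects by integer position, while `loc` selects by index label. With the default 0, 1, 2, ... index the two look the same, so the sketch below first moves `Candy` into the index to make the difference visible. The names `by_name`, `by_position`, and `by_label` are illustrative only.

```python
import pandas as pd

candy_jar = pd.DataFrame({
    "Candy": ["Sour Patch Kids", "Skittles", "Snickers", "Candy Corn", "Starburst", "M&M’s"],
    "Quantity": [14, 18, 22, 32, 6, 43],
    "Price": ["Expensive", "Cheap", "Cheap", "Expensive", "Expensive", "Cheap"],
})

# Use the candy names as row labels instead of 0..5
by_name = candy_jar.set_index("Candy")

# iloc: rows at integer positions 3 and 5, regardless of their labels
by_position = by_name.iloc[[3, 5], :]

# loc: rows whose labels are "Candy Corn" and "M&M’s"
by_label = by_name.loc[["Candy Corn", "M&M’s"], :]

print(by_position.equals(by_label))
```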


**Question Three:** A thief broke into your house and stole 10 M&M’s from your candy jar! Write code to calculate the number of M&M’s remaining.

In [33]:
amount_left = candy_jar.loc[5,"Quantity"] - 10
amount_left

33

By a stroke of luck, your friend Oski recovered your missing M&M’s! Now you are back to the original amount (i.e., the current DataFrame should look identical to how it looked before the theft).
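
Note that the subtraction in Question Three only read a value out of the DataFrame, so `candy_jar` itself was never modified. The sketch below illustrates the difference between reading a cell and writing to one; `after_theft` is a hypothetical copy introduced only for this illustration.

```python
import pandas as pd

candy_jar = pd.DataFrame({
    "Candy": ["Sour Patch Kids", "Skittles", "Snickers", "Candy Corn", "Starburst", "M&M’s"],
    "Quantity": [14, 18, 22, 32, 6, 43],
    "Price": ["Expensive", "Cheap", "Cheap", "Expensive", "Expensive", "Cheap"],
})

# Reading a cell and doing arithmetic leaves the DataFrame untouched
amount_left = candy_jar.loc[5, "Quantity"] - 10

# Writing back requires an explicit assignment; do it on a copy
# so the original candy_jar stays as it was
after_theft = candy_jar.copy()
after_theft.loc[5, "Quantity"] -= 10

print(candy_jar.loc[5, "Quantity"], after_theft.loc[5, "Quantity"])
```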

**Question Four:** Your mom wants to know how many expensive candies there are without counting them manually. How can you create a table that gives the count of both expensive and cheap candies?

In [35]:
price_df = candy_jar.groupby("Price").count()
price_df

| Price | Candy | Quantity |
| --- | --- | --- |
| Cheap | 3 | 3 |
| Expensive | 3 | 3 |
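
The grouped count repeats the same number in every remaining column. If only one count per price category is needed, `Series.value_counts()` is a possible alternative; the sketch below is one way to do it, with `price_counts` as an illustrative name.

```python
import pandas as pd

candy_jar = pd.DataFrame({
    "Candy": ["Sour Patch Kids", "Skittles", "Snickers", "Candy Corn", "Starburst", "M&M’s"],
    "Quantity": [14, 18, 22, 32, 6, 43],
    "Price": ["Expensive", "Cheap", "Cheap", "Expensive", "Expensive", "Cheap"],
})

# Count how many rows fall into each distinct value of the Price column
price_counts = candy_jar["Price"].value_counts()
print(price_counts)
```

The result is a Series indexed by price category rather than a DataFrame, which is often enough for a quick tally.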


Now, instead of the number of candies in each price category, she wants to know the **average quantity** for each price category. A GroupBy aggregation will be essential here; see the pandas GroupBy reference at [https://pandas.pydata.org/pandas-docs/stable/reference/groupby.html?highlight=groupby#computations-descriptive-stats].

In [39]:
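# numeric_only=True skips the text column `Candy`; without it, pandas 2.x raises a TypeError
average_quantity_df = candy_jar.groupby("Price").mean(numeric_only=True)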
average_quantity_df

| Price | Quantity |
| --- | --- |
| Cheap | 27.666667 |
| Expensive | 17.333333 |
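
Another way to approach the same question is to select the `Quantity` column before aggregating, which also makes it easy to compute several statistics at once with `agg`. This is a sketch of one option, not the only valid answer; `quantity_summary` is an illustrative name.

```python
import pandas as pd

candy_jar = pd.DataFrame({
    "Candy": ["Sour Patch Kids", "Skittles", "Snickers", "Candy Corn", "Starburst", "M&M’s"],
    "Quantity": [14, 18, 22, 32, 6, 43],
    "Price": ["Expensive", "Cheap", "Cheap", "Expensive", "Expensive", "Cheap"],
})

# Group by price category, keep only Quantity, then compute mean and sum per group
quantity_summary = candy_jar.groupby("Price")["Quantity"].agg(["mean", "sum"])
print(quantity_summary)
```

The `sum` column is a handy sanity check: dividing it by the group size (3 in both groups here) should reproduce the `mean` column.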


**Question Five:** Suppose we have another table named `preference_rank` that contains a ranking of all the types of candy from our `candy_jar` table. How can we combine `candy_jar` and `preference_rank` into one, combined DataFrame called `combined`?

In [41]:
preference_rank = pd.DataFrame(ranking_dict)
preference_rank

|   | Candy | Ranking |
| --- | --- | --- |
| 0 | Sour Patch Kids | 3 |
| 1 | Skittles | 4 |
| 2 | Snickers | 2 |
| 3 | Candy Corn | 1 |
| 4 | Starburst | 6 |
| 5 | M&M’s | 5 |


In [49]:
combined = candy_jar.join(preference_rank.set_index('Candy'), on='Candy')
combined

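|   | Candy | Quantity | Price | Ranking |
| --- | --- | --- | --- | --- |
| 0 | Sour Patch Kids | 14 | Expensive | 3 |
| 1 | Skittles | 18 | Cheap | 4 |
| 2 | Snickers | 22 | Cheap | 2 |
| 3 | Candy Corn | 32 | Expensive | 1 |
| 4 | Starburst | 6 | Expensive | 6 |
| 5 | M&M’s | 43 | Cheap | 5 |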
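
The same combination can also be written with `DataFrame.merge`, which matches rows on a shared column directly and so avoids the `set_index` step. The sketch below is an alternative to the `join` answer, not a replacement for it; `how="left"` is chosen here to keep every row of `candy_jar`.

```python
import pandas as pd

candy_jar = pd.DataFrame({
    "Candy": ["Sour Patch Kids", "Skittles", "Snickers", "Candy Corn", "Starburst", "M&M’s"],
    "Quantity": [14, 18, 22, 32, 6, 43],
    "Price": ["Expensive", "Cheap", "Cheap", "Expensive", "Expensive", "Cheap"],
})
preference_rank = pd.DataFrame({
    "Candy": ["Sour Patch Kids", "Skittles", "Snickers", "Candy Corn", "Starburst", "M&M’s"],
    "Ranking": [3, 4, 2, 1, 6, 5],
})

# Match rows on the shared Candy column; a left merge keeps all rows of candy_jar
combined = candy_jar.merge(preference_rank, on="Candy", how="left")
print(combined)
```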


**Challenge Question:** How can you sort the `combined` DataFrame by `Ranking`? Can you get a list of the candy names in order of their ranking? Feel free to consult [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html?highlight=sort_values#pandas.DataFrame.sort_values] if you are stuck!

In [52]:
sorted_df = combined.sort_values('Ranking')
sorted_names = sorted_df['Candy']
list(sorted_names)

['Candy Corn', 'Snickers', 'Sour Patch Kids', 'Skittles', 'M&M’s', 'Starburst']
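
`sort_values` sorts in ascending order by default, so rank 1 comes first. To list the candies from lowest-ranked to highest-ranked instead, the `ascending=False` argument can be passed. The sketch below rebuilds a `combined` table by hand so it runs on its own; `worst_first` and `worst_first_names` are illustrative names.

```python
import pandas as pd

# A hand-built copy of the combined table, so this cell is self-contained
combined = pd.DataFrame({
    "Candy": ["Sour Patch Kids", "Skittles", "Snickers", "Candy Corn", "Starburst", "M&M’s"],
    "Quantity": [14, 18, 22, 32, 6, 43],
    "Price": ["Expensive", "Cheap", "Cheap", "Expensive", "Expensive", "Cheap"],
    "Ranking": [3, 4, 2, 1, 6, 5],
})

# ascending=False puts the largest Ranking value (the least preferred candy) first
worst_first = combined.sort_values("Ranking", ascending=False)
worst_first_names = worst_first["Candy"].tolist()
print(worst_first_names)
```

Neither call changes `combined` itself, because `sort_values` returns a new DataFrame unless `inplace=True` is passed.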