## **Ex1 - Filtering and Sorting Data**

### **Step 1. Import the necessary libraries**

In [1]:
import pandas as pd
import numpy as np

### **Step 3. Assign it to a variable called chipo**

In [2]:
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url, sep = '\t')
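`sep='\t'` is what makes `read_csv` split on tabs instead of commas. The sketch below shows this offline on a tiny in-memory sample. The sample is hypothetical: it keeps only four of the columns and copies a few values from the outputs shown further down. `io.StringIO` stands in for the URL.

```python
import io

import pandas as pd

# Hypothetical toy sample with a subset of the chipotle.tsv columns.
# Separators are written as \t so they are explicit.
raw = (
    "order_id\tquantity\titem_name\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t$2.39\n"
    "1\t1\tIzze\t$3.39\n"
    "2\t2\tChicken Bowl\t$16.98\n"
)

# sep='\t' splits each line on tab characters.
sample = pd.read_csv(io.StringIO(raw), sep='\t')
print(sample.shape)  # (3, 4)
print(sample.dtypes)  # item_price is still text because of the '$'
```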

### **Step 5. What is the price of each item?**

In [3]:
chipo.loc[:,['item_name', 'item_price']]

| | item_name | item_price |
|---|---|---|
| 0 | Chips and Fresh Tomato Salsa | $2.39 |
| 1 | Izze | $3.39 |
| 2 | Nantucket Nectar | $3.39 |
| 3 | Chips and Tomatillo-Green Chili Salsa | $2.39 |
| 4 | Chicken Bowl | $16.98 |
| ... | ... | ... |
| 4617 | Steak Burrito | $11.75 |
| 4618 | Steak Burrito | $11.75 |
| 4619 | Chicken Salad Bowl | $11.25 |
| 4620 | Chicken Salad Bowl | $8.75 |


### **Step 6. Sort by the name of the item**

In [4]:
chipo.loc[:,['item_name', 'item_price']].sort_values('item_name')

| | item_name | item_price |
|---|---|---|
| 3389 | 6 Pack Soft Drink | $12.98 |
| 341 | 6 Pack Soft Drink | $6.49 |
| 1849 | 6 Pack Soft Drink | $6.49 |
| 1860 | 6 Pack Soft Drink | $6.49 |
| 2713 | 6 Pack Soft Drink | $6.49 |
| ... | ... | ... |
| 2384 | Veggie Soft Tacos | $8.75 |
| 781 | Veggie Soft Tacos | $8.75 |
| 2851 | Veggie Soft Tacos | $8.49 |
| 1699 | Veggie Soft Tacos | $11.25 |
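Within one item name the row order above is not by price, and at this point `item_price` is still text, so even an explicit secondary sort key would compare character by character. A minimal sketch on hypothetical toy rows shows the difference. `price_num` is a made-up helper column, not part of the dataset.

```python
import pandas as pd

# Hypothetical toy rows: prices still stored as strings with a leading '$'.
df = pd.DataFrame({
    'item_name': ['Veggie Soft Tacos', '6 Pack Soft Drink',
                  'Veggie Soft Tacos', '6 Pack Soft Drink'],
    'item_price': ['$8.75', '$6.49', '$11.25', '$12.98'],
})

# As strings, '$11.25' sorts before '$8.75' because '1' < '8'.
by_string = df.sort_values(['item_name', 'item_price'])
print(by_string['item_price'].tolist())  # ['$12.98', '$6.49', '$11.25', '$8.75']

# A numeric helper column gives a true numeric tie-break within each name.
df['price_num'] = df['item_price'].str.lstrip('$').astype(float)
by_number = df.sort_values(['item_name', 'price_num'])
print(by_number['item_price'].tolist())  # ['$6.49', '$12.98', '$8.75', '$11.25']
```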


### **Step 7. What was the quantity of the most expensive item ordered?**

In [5]:
# item_price is stored as text such as '$2.39', so first strip the leading '$'
chipo['item_price'] = [x[1:] for x in chipo['item_price']]
# .str.replace() or .str.strip() (or a lambda passed to .apply) would also work
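The comment above mentions `.replace()`/`.strip()` or a lambda as alternatives to the list comprehension. A minimal sketch of both on a hypothetical `Series`; the trailing spaces are included only to show that `.strip()` handles them if the raw values carry any.

```python
import pandas as pd

# Hypothetical raw prices; trailing spaces added on purpose.
prices = pd.Series(['$2.39 ', '$3.39 ', '$16.98 '])

# Option 1: vectorized string methods, no lambda needed.
via_str = prices.str.strip().str.replace('$', '', regex=False).astype(float)

# Option 2: .apply with a lambda, as the comment suggests.
via_lambda = prices.apply(lambda s: float(s.strip().lstrip('$')))

print(via_str.tolist())  # [2.39, 3.39, 16.98]
```

The vectorized version is usually preferred because it avoids a Python-level loop and makes `regex=False` explicit, so `$` is treated as a literal character.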

In [6]:
#changing the data type of item_price to numeric
chipo['item_price'] = pd.to_numeric(chipo['item_price'])

In [7]:
chipo[chipo['item_price'] == chipo['item_price'].max()]

| | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 3598 | 1443 | 15 | Chips and Fresh Tomato Salsa | | 44.25 |


In [14]:
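# per item: highest line price and total quantity summed over ALL orders of that item,
# sorted so the most expensive item comes first
item_name_price_quantity_groupby = (chipo.groupby(['item_name'])
    .agg({'item_price': 'max', 'quantity': 'sum'})
    .sort_values('item_price', ascending=False))
item_name_price_quantity_groupby.head(1)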

| item_name | item_price | quantity |
|---|---|---|
| Chips and Fresh Tomato Salsa | 44.25 | 130 |
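The two cells above answer slightly different questions: 15 is the quantity on the single most expensive line, while 130 is the total quantity of that item across all orders. A minimal sketch of the first reading using `idxmax`, plus a per-unit variant. The per-unit part assumes that `item_price` is the line total (quantity times unit price); the rows above (44.25 for 15 units, 16.98 for a quantity-2 Chicken Bowl) suggest this, but it is an assumption here. The toy rows copy values shown elsewhere in this notebook.

```python
import pandas as pd

# Toy rows taken from outputs in this notebook.
# ASSUMPTION: item_price is the line total (quantity x unit price).
orders = pd.DataFrame({
    'order_id': [1443, 2, 9],
    'quantity': [15, 2, 2],
    'item_name': ['Chips and Fresh Tomato Salsa', 'Chicken Bowl', 'Canned Soda'],
    'item_price': [44.25, 16.98, 2.18],
})

# idxmax returns the index label of the largest line total; .loc fetches that row.
top_line = orders.loc[orders['item_price'].idxmax()]
print(top_line['item_name'], top_line['quantity'])  # Chips and Fresh Tomato Salsa 15

# Dividing by quantity gives a per-unit price, which can rank items differently.
orders['unit_price'] = orders['item_price'] / orders['quantity']
top_unit = orders.loc[orders['unit_price'].idxmax()]
print(top_unit['item_name'])  # Chicken Bowl
```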


### **Step 8. How many times was a Veggie Salad Bowl ordered?**

In [15]:
item_name_price_quantity_groupby.loc['Veggie Salad Bowl']

item_price    11.25
quantity      18.00
Name: Veggie Salad Bowl, dtype: float64

Veggie Salad Bowl was ordered 18 times (summing the quantity column).
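"How many times" can be counted as units (sum of `quantity`), as order lines (rows), or as distinct orders. On the real data these may or may not coincide; the sketch below uses hypothetical toy rows where they differ, just to show the three calls.

```python
import pandas as pd

# Hypothetical toy data: one Veggie Salad Bowl line has quantity 2.
df = pd.DataFrame({
    'order_id': [10, 11, 12],
    'item_name': ['Veggie Salad Bowl', 'Veggie Salad Bowl', 'Chips'],
    'quantity': [1, 2, 1],
})

is_vsb = df['item_name'] == 'Veggie Salad Bowl'
line_items = int(is_vsb.sum())                 # rows (order lines)
units = int(df.loc[is_vsb, 'quantity'].sum())  # total bowls
n_orders = df.loc[is_vsb, 'order_id'].nunique()  # distinct orders
print(line_items, units, n_orders)  # 2 3 2
```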

### **Step 9. How many times did someone order more than one Canned Soda?**

In [26]:
ord_id_item_name_groupby = chipo.groupby(['order_id', 'item_name']).agg({'quantity' : 'sum'})
ord_id_item_name_groupby

| order_id | item_name | quantity |
|---|---|---|
| 1 | Chips and Fresh Tomato Salsa | 1 |
| 1 | Chips and Tomatillo-Green Chili Salsa | 1 |
| 1 | Izze | 1 |
| 1 | Nantucket Nectar | 1 |
| 2 | Chicken Bowl | 2 |
| ... | ... | ... |
| 1831 | Chips | 1 |
| 1832 | Chicken Soft Tacos | 1 |
| 1832 | Chips and Guacamole | 1 |
| 1833 | Steak Burrito | 2 |


In [51]:
ord_id_item_name_groupby[ord_id_item_name_groupby.index.isin(['Canned Soda'], level='item_name') & (ord_id_item_name_groupby['quantity'] > 1)].count()
# item_name is part of the index, so filter on that index level first, then on the quantity column;
# .count() then returns how many (order_id, Canned Soda) groups have a total quantity above 1

quantity    24
dtype: int64

The number of times someone ordered more than one Canned Soda was 24.
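Since `item_name` is an index level, `DataFrame.xs` is another way to select it: it picks one label from a level and drops that level, leaving a frame indexed by `order_id` alone. A minimal sketch on a hypothetical toy frame shaped like `ord_id_item_name_groupby` (order 82 is made up).

```python
import pandas as pd

# Hypothetical toy frame with the same (order_id, item_name) MultiIndex shape.
idx = pd.MultiIndex.from_tuples(
    [(9, 'Canned Soda'), (9, 'Chicken Bowl'), (81, 'Canned Soda'), (82, 'Canned Soda')],
    names=['order_id', 'item_name'],
)
grouped = pd.DataFrame({'quantity': [2, 1, 2, 1]}, index=idx)

# Select the 'Canned Soda' label on the item_name level; that level is dropped.
soda = grouped.xs('Canned Soda', level='item_name')

# Count orders whose summed Canned Soda quantity exceeds 1.
n_orders = int((soda['quantity'] > 1).sum())
print(n_orders)  # 2
```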

<div class="alert alert-block alert-warning">
    Exploring why the sample solution differs from mine for Step 9

In [46]:
#this is the sample solution
chipo_drink_steak_bowl = chipo[(chipo.item_name == "Canned Soda") & (chipo.quantity > 1)]
chipo_drink_steak_bowl

| | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 18 | 9 | 2 | Canned Soda | [Sprite] | 2.18 |
| 51 | 23 | 2 | Canned Soda | [Mountain Dew] | 2.18 |
| 162 | 73 | 2 | Canned Soda | [Diet Coke] | 2.18 |
| 171 | 76 | 2 | Canned Soda | [Diet Dr. Pepper] | 2.18 |
| 350 | 150 | 2 | Canned Soda | [Diet Coke] | 2.18 |
| 352 | 151 | 2 | Canned Soda | [Coca Cola] | 2.18 |
| 698 | 287 | 2 | Canned Soda | [Coca Cola] | 2.18 |
| 700 | 288 | 2 | Canned Soda | [Coca Cola] | 2.18 |
| 909 | 376 | 2 | Canned Soda | [Mountain Dew] | 2.18 |
| 1091 | 450 | 2 | Canned Soda | [Dr. Pepper] | 2.18 |


In [62]:
df = ord_id_item_name_groupby[ord_id_item_name_groupby.index.isin(['Canned Soda'], level = 'item_name') & (ord_id_item_name_groupby['quantity'] > 1)].reset_index()
df.head()

| | order_id | item_name | quantity |
|---|---|---|---|
| 0 | 9 | Canned Soda | 2 |
| 1 | 23 | Canned Soda | 2 |
| 2 | 73 | Canned Soda | 2 |
| 3 | 76 | Canned Soda | 2 |
| 4 | 81 | Canned Soda | 2 |


### So

The sample solution returns only 20 rows, whereas I get 24. To see which orders appear in my result but not in the sample solution, I left-merged my result with the sample solution on order_id.

In [65]:
df.merge(chipo_drink_steak_bowl, how = 'left', on = 'order_id', suffixes = ('_df','_sol'))

| | order_id | item_name_df | quantity_df | quantity_sol | item_name_sol | choice_description | item_price |
|---|---|---|---|---|---|---|---|
| 0 | 9 | Canned Soda | 2 | 2.0 | Canned Soda | [Sprite] | 2.18 |
| 1 | 23 | Canned Soda | 2 | 2.0 | Canned Soda | [Mountain Dew] | 2.18 |
| 2 | 73 | Canned Soda | 2 | 2.0 | Canned Soda | [Diet Coke] | 2.18 |
| 3 | 76 | Canned Soda | 2 | 2.0 | Canned Soda | [Diet Dr. Pepper] | 2.18 |
| 4 | 81 | Canned Soda | 2 | | | | |
| 5 | 108 | Canned Soda | 3 | | | | |
| 6 | 150 | Canned Soda | 2 | 2.0 | Canned Soda | [Diet Coke] | 2.18 |
| 7 | 151 | Canned Soda | 2 | 2.0 | Canned Soda | [Coca Cola] | 2.18 |
| 8 | 287 | Canned Soda | 2 | 2.0 | Canned Soda | [Coca Cola] | 2.18 |
| 9 | 288 | Canned Soda | 2 | 2.0 | Canned Soda | [Coca Cola] | 2.18 |
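Instead of scanning the merged table for empty columns, `merge(..., indicator=True)` adds a `_merge` column that labels each row `both`, `left_only` or `right_only`. A minimal sketch with hypothetical stand-ins `mine` and `sample` (the order_ids and quantities are copied from the tables above). `drop_duplicates` guards against one order appearing on several rows of the row-wise result, which would otherwise duplicate rows in the merge.

```python
import pandas as pd

# Hypothetical stand-ins: 'mine' is grouped per order, 'sample' is row-wise.
mine = pd.DataFrame({'order_id': [9, 23, 81, 108], 'quantity': [2, 2, 2, 3]})
sample = pd.DataFrame({'order_id': [9, 23], 'quantity': [2, 2]})

# indicator=True adds a '_merge' column describing where each key was found.
cmp = mine.merge(sample[['order_id']].drop_duplicates(),
                 how='left', on='order_id', indicator=True)
missing = cmp.loc[cmp['_merge'] == 'left_only', 'order_id'].tolist()
print(missing)  # [81, 108]
```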


In [67]:
chipo[chipo['order_id'] == 81]

| | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 178 | 81 | 1 | Chicken Burrito | [Tomatillo-Green Chili Salsa (Medium), [Rice, ... | 8.49 |
| 179 | 81 | 1 | Canned Soda | [Coca Cola] | 1.09 |
| 180 | 81 | 1 | Canned Soda | [Dr. Pepper] | 1.09 |


**So I believe the sample solution is wrong**

For order_id 81, the chipo dataset contains two separate Canned Soda rows (Coca Cola and Dr. Pepper), each with quantity 1, so the order includes 2 canned sodas in total. Yet this order_id is not present in the sample solution.

The difference is that the sample solution applies the quantity > 1 filter to each Canned Soda row individually (row-wise), whereas I applied it after summing the quantities per order_id. I grouped by order_id because a single order is the most natural reading of "someone" in the question.
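The two filters can be contrasted on a tiny frame built from rows shown above: order 9 has one Canned Soda row with quantity 2, and order 81 has two Canned Soda rows with quantity 1 each. This is a sketch of the two interpretations, not the full Step 9 computation.

```python
import pandas as pd

# Rows copied from the outputs above (order 9, row 18; order 81, rows 179-180).
df = pd.DataFrame({
    'order_id': [9, 81, 81],
    'item_name': ['Canned Soda', 'Canned Soda', 'Canned Soda'],
    'quantity': [2, 1, 1],
})
soda = df[df['item_name'] == 'Canned Soda']

# Row-wise (sample solution): each row must have quantity > 1 on its own.
row_wise = int((soda['quantity'] > 1).sum())

# Group-wise (my approach): sum quantities per order first, then filter.
per_order = soda.groupby('order_id')['quantity'].sum()
group_wise = int((per_order > 1).sum())

print(row_wise, group_wise)  # 1 2
```

Order 81 is exactly the kind of case that the row-wise filter drops and the group-wise filter keeps.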


Ultimately, the question is worded ambiguously, because "someone" **CANNOT** be rigidly defined with this dataset.

Is someone a single order_id, or a single row? That is where the ambiguity lies. It makes more logical sense for "someone" to be a single order_id, which is why I grouped by order_id.