# Exploring a database

Let's practice with pandas. Download the dataset `chipotle.csv` from JULIE and upload it to your workspace.

1. Import a library that can read data from a CSV file

In [2]:
import pandas as pd

2. Import the dataset

In [3]:
df = pd.read_csv("chipotle.csv")

In [4]:
# Same file, but with its first column used as the index instead of being kept as "Unnamed: 0"
dataset = pd.read_csv("chipotle.csv", index_col=0)
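The two calls above differ only in `index_col`. A minimal sketch of that difference, assuming a small in-memory sample (the three rows and the `csv_text` name are hypothetical, not the real file):

```python
import io

import pandas as pd

# Hypothetical three-row sample laid out like chipotle.csv: the first,
# unnamed column holds the row numbers that were saved with the file.
csv_text = """,order_id,quantity,item_name,item_price
0,1,1,Izze,$3.39
1,1,1,Nantucket Nectar,$3.39
2,2,2,Chicken Bowl,$16.98
"""

# Default read: the unnamed column is kept as a regular column named "Unnamed: 0".
with_extra_column = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column becomes the index instead.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra_column.columns))
print(list(with_index.columns))
```

The rest of this notebook works with `df`, which is why `Unnamed: 0` shows up among its columns.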

3. Look at the first 10 rows of the dataset

In [5]:
df.head(10)

,Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,1,Izze,[Clementine],$3.39
2,2,1,1,Nantucket Nectar,[Apple],$3.39
3,3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98
5,5,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...",$10.98
6,6,3,1,Side of Chips,,$1.69
7,7,4,1,Steak Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",$11.75
8,8,4,1,Steak Soft Tacos,"[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...",$9.25
9,9,5,1,Steak Burrito,"[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...",$9.25


4. What is the shape of the dataset?

In [6]:
df.shape

(4622, 6)

5. Display all the columns of our dataset

In [7]:
df.columns

Index(['Unnamed: 0', 'order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')

6. What is the most ordered item?

     *Hint: use `groupby()` and `sort_values()`*

In [8]:
# Sum the numeric columns per item, then sort by total quantity (largest first)
df.groupby("item_name").sum(numeric_only=True).sort_values("quantity", ascending=False).head(10)

item_name,Unnamed: 0,order_id,quantity
Chicken Bowl,1779909,713926,761
Chicken Burrito,1238217,497303,591
Chips and Guacamole,1121773,449959,506
Steak Burrito,817795,328437,386
Canned Soft Drink,759990,304753,351
Chips,518704,208004,230
Steak Bowl,482721,193752,221
Bottled Water,439354,175944,211
Chips and Fresh Tomato Salsa,250144,100419,130
Canned Soda,189915,76396,126
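Summing every numeric column also adds up `Unnamed: 0` and `order_id`, which carry no meaning as totals. A minimal sketch of a narrower variant that selects `quantity` before aggregating, assuming a hypothetical miniature table (`orders` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical miniature order table, not the real dataset.
orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3],
    "item_name": ["Chicken Bowl", "Chips", "Chicken Bowl", "Steak Burrito", "Chips"],
    "quantity": [2, 1, 1, 1, 3],
})

# Select the column first so that only quantities are summed per item.
quantity_per_item = (
    orders.groupby("item_name")["quantity"]
    .sum()
    .sort_values(ascending=False)
)

# The index label of the largest total is the most ordered item.
most_ordered = quantity_per_item.idxmax()
print(quantity_per_item)
print(most_ordered)
```

Applied to `df`, the same pattern should reproduce the `quantity` column of the table above without the extra columns.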


7. How many items were ordered from Chipotle in total?

In [9]:
df["quantity"].sum()


4972

8. How much revenue has Chipotle made?

    A. Convert `item_price` to a decimal number.
        1. Look at the `item_price` column of the dataset. What do you see?
        2. Let's find a way to get that $ out of our way. We can do that by using [str](https://bit.ly/2ClcdtN).
        3. Now convert the Series from string to float.

    B. Multiply the quantity sold by the price of the item.

    C. Add it all up. How much do you find?

    D. Round to two decimal places. We can use the [Series.round()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.round.html#pandas.Series.round) function.

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Unnamed: 0          4622 non-null   int64 
 1   order_id            4622 non-null   int64 
 2   quantity            4622 non-null   int64 
 3   item_name           4622 non-null   object
 4   choice_description  3376 non-null   object
 5   item_price          4622 non-null   object
dtypes: int64(3), object(3)
memory usage: 216.8+ KB


In [11]:
df["item_price"] = df["item_price"].astype("string")

In [12]:
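# regex=False treats "$" as a literal character, not as the regex end-of-string anchor
df["item_price"] = df["item_price"].str.replace("$", "", regex=False)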

In [13]:
df["item_price"] = df["item_price"].astype("float")

In [14]:
df["revenue"] = df["quantity"] * df["item_price"]
df["revenue"].sum().round(2)

39237.02
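The cleaning and revenue steps above can be sketched end to end on a tiny example. The prices and quantities below are hypothetical values chosen for illustration, and `clean_prices` and `total_revenue` are names introduced only for this sketch:

```python
import pandas as pd

# Hypothetical prices stored as text with a leading dollar sign.
prices = pd.Series(["$2.39", "$16.98", "$10.98"])
quantities = pd.Series([1, 2, 1])

# Strip the literal "$" and convert the remaining text to floats.
clean_prices = prices.str.replace("$", "", regex=False).astype(float)

# Multiply row by row, add everything up, then round to two decimals.
total_revenue = round((quantities * clean_prices).sum(), 2)
print(total_revenue)
```

The rounding happens once, on the final total, so per-row rounding errors do not accumulate.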

9. What is the average revenue per order?

In [15]:
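# Sum the numeric columns within each order; the "revenue" row of the result answers the question
sum_revenue_per_order = df.groupby("order_id").sum(numeric_only=True)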
average_revenue = sum_revenue_per_order.mean()
average_revenue

Unnamed: 0    5822.863141
quantity         2.711014
item_price      18.811429
revenue         21.394231
dtype: float64
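Only the `revenue` row of this output answers the question; the other rows are averages of per-order sums that have no useful interpretation. A minimal sketch that computes just that figure, assuming a hypothetical four-row table (`orders` and its values are illustrative only):

```python
import pandas as pd

# Hypothetical line items: order 1 has two lines, orders 2 and 3 have one each.
orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "revenue": [2.50, 3.50, 10.00, 8.00],
})

# Total revenue per order, then the mean of those totals.
revenue_per_order = orders.groupby("order_id")["revenue"].sum()
average_revenue = revenue_per_order.mean()
print(average_revenue)
```

On `df`, the same two steps should give the value shown in the `revenue` row above as a single number.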