# Getting and Knowing Your Data

**Author : Manish Shishodia**

*Import the dataset as raw from [here](https://github.com/maanshishodia/DATASETS/blob/main/Chipotle.csv)*

### Import the necessary libraries

In [1]:
import numpy as np
import pandas as pd

### Assign it to a variable called chipo

In [2]:
url = "https://raw.githubusercontent.com/maanshishodia/DATASETS/main/Chipotle.csv"
chipo = pd.read_csv(url)

In [3]:
chipo

,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,2.39
1,1,1,Izze,[Clementine],3.39
2,1,1,Nantucket Nectar,[Apple],3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",16.98
...,...,...,...,...,...
4617,1833,1,Steak Burrito,"[Fresh Tomato Salsa, [Rice, Black Beans, Sour ...",11.75
4618,1833,1,Steak Burrito,"[Fresh Tomato Salsa, [Rice, Sour Cream, Cheese...",11.75
4619,1834,1,Chicken Salad Bowl,"[Fresh Tomato Salsa, [Fajita Vegetables, Pinto...",11.25
4620,1834,1,Chicken Salad Bowl,"[Fresh Tomato Salsa, [Fajita Vegetables, Lettu...",8.75


### What is the number of observations in the dataset?

In [4]:
chipo.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   order_id            4622 non-null   int64  
 1   quantity            4622 non-null   int64  
 2   item_name           4622 non-null   object 
 3   choice_description  3376 non-null   object 
 4   item_price          4622 non-null   float64
dtypes: float64(1), int64(2), object(2)
memory usage: 180.7+ KB
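
`info()` reports the row count as part of a larger summary. If only the number is needed, `shape[0]` or `len()` returns it directly. A minimal sketch on a small made-up frame (the `toy` data is hypothetical, not taken from the Chipotle file):

```python
import pandas as pd

# Hypothetical three-row frame standing in for chipo
toy = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 1, 2],
    "item_name": ["Izze", "Chips", "Chicken Bowl"],
})

n_rows = toy.shape[0]   # number of observations (rows)
n_rows_alt = len(toy)   # len() of a DataFrame is also its row count
print(n_rows, n_rows_alt)
```

On `chipo`, the same calls should agree with the `RangeIndex` entry count reported by `info()`.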


### What is the number of columns in the dataset?

In [5]:
chipo.shape[1]

5

### What are the names of the columns?

In [6]:
chipo.columns

Index(['order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')

### How is the dataset indexed?

In [7]:
chipo.index

RangeIndex(start=0, stop=4622, step=1)

### Which was the most ordered item?

In [8]:
item = chipo.groupby("item_name")
# numeric_only=True keeps the text columns out of the sum
most_ord_item = item.sum(numeric_only=True)
most_ord_item = most_ord_item.sort_values("quantity", ascending=False)
most_ord_item.head()

item_name,order_id,quantity,item_price
Chicken Bowl,713926,761,7342.73
Chicken Burrito,497303,591,5575.82
Chips and Guacamole,449959,506,2201.04
Steak Burrito,328437,386,3851.43
Canned Soft Drink,304753,351,438.75
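
The table answers the question by sorting, but the summed `order_id` column has no meaning. When only the top item is wanted, one option is to sum `quantity` alone and take `idxmax()`. A minimal sketch, assuming a small hypothetical frame (`toy` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical order lines; not rows from the Chipotle file
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Izze", "Chips"],
    "quantity": [2, 1, 1, 1, 1],
})

# Total quantity per item, summing only the column of interest
qty_by_item = toy.groupby("item_name")["quantity"].sum()

top_item = qty_by_item.idxmax()  # label of the largest total
top_qty = qty_by_item.max()      # the largest total itself
print(top_item, top_qty)
```

Selecting the column before aggregating also avoids summing the text columns at all.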


### What was the most ordered item in the choice_description column?

In [9]:
c = chipo.groupby("choice_description")
# numeric_only=True keeps the text columns out of the sum
c = c.sum(numeric_only=True)
most_ord_choice = c.sort_values("quantity", ascending=False)
most_ord_choice.head()

choice_description,order_id,quantity,item_price
[Diet Coke],123455,159,326.71
[Coke],122752,143,288.79
[Sprite],80426,89,133.93
"[Fresh Tomato Salsa, [Rice, Black Beans, Cheese, Sour Cream, Lettuce]]",43088,49,432.25
"[Fresh Tomato Salsa, [Rice, Black Beans, Cheese, Sour Cream]]",36041,42,372.64


### How many items were ordered in total?

In [10]:
total_item_ord = chipo["quantity"].sum()
total_item_ord

4972

### How much was the revenue for the period in the dataset?

In [11]:
revenue = (chipo["item_price"] * chipo["quantity"]).sum()
print(f"Revenue was: ${revenue:.2f}")

Revenue was: $39237.02


### How many orders were made in the period?

In [12]:
orders = chipo["order_id"].nunique()
orders

1834
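
The number of orders is the number of distinct `order_id` values. `nunique()` returns that directly, and counting the entries of `value_counts()` gives the same number. A minimal sketch on a hypothetical series (the ids are made up):

```python
import pandas as pd

# Hypothetical order ids: three distinct orders across six order lines
order_ids = pd.Series([1, 1, 2, 3, 3, 3], name="order_id")

via_value_counts = order_ids.value_counts().count()  # one entry per distinct id
via_nunique = order_ids.nunique()                     # distinct ids, counted directly
print(via_value_counts, via_nunique)
```

Both approaches ignore missing values by default, so they agree as long as that default is left unchanged.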

### What is the average revenue amount per order?

In [13]:
chipo["revenue"] = chipo["item_price"] * chipo["quantity"]
# Sum revenue within each order, then average across orders
ord_revenue = chipo.groupby("order_id")["revenue"].sum()
ord_revenue.mean()

21.394231188658654
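
The average per order is computed in two steps: total the revenue inside each order, then take the mean of those totals. A minimal sketch with hypothetical numbers (the `toy` frame and its `revenue` values are made up so the arithmetic is easy to follow):

```python
import pandas as pd

# Hypothetical order lines with a precomputed revenue column
toy = pd.DataFrame({
    "order_id": [1, 1, 2],
    "revenue": [2.0, 3.0, 10.0],
})

per_order = toy.groupby("order_id")["revenue"].sum()  # order 1 -> 5.0, order 2 -> 10.0
avg_per_order = per_order.mean()                      # (5.0 + 10.0) / 2
print(avg_per_order)
```

Averaging the raw `revenue` column instead would give the mean per order line, not per order.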

### How many different items are sold?

In [14]:
chipo["item_name"].nunique()

50