# Getting and Knowing your Data

This time we are going to pull data directly from the internet.
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

### Step 1. Import the necessary libraries

In [2]:
import pandas as pd
import numpy as np

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). 

In [3]:
chipo = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv', sep='\t')
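If the URL is not reachable, the same `read_csv` call can be tried on an in-memory string. This is a minimal sketch, assuming a three-row sample in the same tab-separated layout; the names `sample_tsv` and `sample` are illustrative and not part of the exercise.

```python
import io

import pandas as pd

# Hypothetical sample in the same tab-separated layout as chipotle.tsv.
# The second data column of the first row is left empty on purpose.
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
    "1\t1\tNantucket Nectar\t[Apple]\t$3.39\n"
)

# sep="\t" tells pandas the fields are separated by tabs, not commas.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
print(sample)
```

The empty `choice_description` field is read as a missing value (`NaN`), which matches the blank cells in the output of the real dataset.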

### Step 3. Assign it to a variable called chipo.

In [4]:
chipo.head()

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98


### Step 4. See the first 10 entries

In [5]:
chipo.head(10)

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98
5,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...",$10.98
6,3,1,Side of Chips,,$1.69
7,4,1,Steak Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",$11.75
8,4,1,Steak Soft Tacos,"[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...",$9.25
9,5,1,Steak Burrito,"[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...",$9.25


### Step 5. What is the number of observations in the dataset?

In [6]:
chipo.shape[0]

4622

### Step 6. What is the number of columns in the dataset?

In [15]:
chipo.shape[1]

5
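Steps 5 and 6 both read from `shape`, which is a `(rows, columns)` tuple. A small sketch on a made-up frame (the `toy` data is an assumption for illustration only) shows how the two numbers relate to `len`:

```python
import pandas as pd

# Made-up toy frame: 3 rows, 2 columns.
toy = pd.DataFrame({"order_id": [1, 1, 2], "quantity": [1, 2, 1]})

# shape is a (number_of_rows, number_of_columns) tuple.
n_rows, n_cols = toy.shape

print(n_rows, len(toy))          # both give the number of observations
print(n_cols, len(toy.columns))  # both give the number of columns
```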

### Step 7. Print the name of all the columns.

In [16]:
for x in chipo.columns:
    print(x)

order_id
quantity
item_name
choice_description
item_price


### Step 8. How is the dataset indexed?

In [18]:
chipo.index

RangeIndex(start=0, stop=4622, step=1)
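A `RangeIndex` is what pandas assigns when no index column is specified, as in the `read_csv` call above. The sketch below, on a made-up frame, contrasts the default index with one set from a column; `toy` and `labelled` are illustrative names.

```python
import pandas as pd

# Made-up toy frame with no explicit index.
toy = pd.DataFrame({"item_name": ["Izze", "Chips", "Izze"], "quantity": [1, 2, 1]})

# Without an explicit index, rows are labelled 0, 1, 2, ... by a RangeIndex.
default_index = toy.index
print(default_index)

# set_index returns a new frame whose row labels come from a column.
labelled = toy.set_index("item_name")
print(labelled.index)
```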

### Step 9. Which was the most ordered item?

In [29]:
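# Sum only the numeric columns per item, then sort by total quantity.
# numeric_only=True keeps text columns such as item_price out of the sum.
df = chipo.groupby('item_name').sum(numeric_only=True).sort_values('quantity', ascending=False)
print(df)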

                                       order_id  quantity
item_name                                                
Chicken Bowl                             713926       761
Chicken Burrito                          497303       591
Chips and Guacamole                      449959       506
Steak Burrito                            328437       386
Canned Soft Drink                        304753       351
Chips                                    208004       230
Steak Bowl                               193752       221
Bottled Water                            175944       211
Chips and Fresh Tomato Salsa             100419       130
Canned Soda                               76396       126
Chicken Salad Bowl                       117104       123
Chicken Soft Tacos                        98395       120
Side of Chips                             84769       110
Veggie Burrito                            80962        97
Barbacoa Burrito                          74718        91
Veggie Bowl   

### Step 10. For the most-ordered item, how many items were ordered?

In [32]:
df.loc['Chicken Bowl']

order_id    713926
quantity       761
Name: Chicken Bowl, dtype: int64
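Steps 9 and 10 can also be answered without reading the top row off a printed table. A minimal sketch on made-up data, assuming only the `item_name` and `quantity` columns matter here:

```python
import pandas as pd

# Made-up toy orders; the real dataset has the same two columns.
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Chips"],
    "quantity": [2, 1, 1, 1],
})

# Total quantity per item, as a Series indexed by item_name.
totals = toy.groupby("item_name")["quantity"].sum()

top_item = totals.idxmax()    # label of the largest total
top_quantity = totals.max()   # the largest total itself
print(top_item, top_quantity)
```

Selecting the `quantity` column before summing also avoids adding up `order_id`, whose sum has no meaning.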

### Step 11. What was the most ordered item in the choice_description column?

In [40]:
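# Drop rows without a choice_description, sort by it, and print every value.
# Identical descriptions end up next to each other, so repeats are easy to spot.
chipo_cd = chipo.dropna(subset=['choice_description']).sort_values('choice_description')
for x in chipo_cd['choice_description']:
    print(x)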

[Adobo-Marinated and Grilled Chicken, Pinto Beans, [Sour Cream, Salsa, Cheese, Cilantro-Lime Rice, Guacamole]]
[Adobo-Marinated and Grilled Chicken, [Sour Cream, Cheese, Cilantro-Lime Rice]]
[Adobo-Marinated and Grilled Chicken]
[Adobo-Marinated and Grilled Steak, [Sour Cream, Salsa, Cheese, Cilantro-Lime Rice, Guacamole]]
[Adobo-Marinated and Grilled Steak]
[Apple]
[Apple]
[Apple]
[Apple]
[Apple]
[Apple]
[Blackberry]
[Blackberry]
[Blackberry]
[Blackberry]
[Blackberry]
[Blackberry]
[Blackberry]
[Blackberry]
[Braised Barbacoa, Pinto Beans, [Sour Cream, Salsa, Cheese, Cilantro-Lime Rice, Guacamole]]
[Braised Barbacoa, Vegetarian Black Beans, [Sour Cream, Salsa, Cheese, Cilantro-Lime Rice]]
[Braised Carnitas, Pinto Beans, [Sour Cream, Cheese, Cilantro-Lime Rice]]
[Brown Rice, Adobo-Marinated and Grilled Chicken, Vegetarian Black Beans]
[Brown Rice]
[Clementine]
[Clementine]
[Clementine]
[Clementine]
[Clementine]
[Clementine]
[Clementine]
[Clementine]
[Coca Cola]
[Coca Cola]
[Coca Cola]
[C

[Fresh Tomato Salsa, [Rice, Sour Cream, Cheese]]
[Fresh Tomato Salsa, [Rice, Sour Cream, Cheese]]
[Fresh Tomato Salsa, [Rice, Sour Cream, Guacamole, Lettuce]]
[Fresh Tomato Salsa, [Rice, Sour Cream, Guacamole, Lettuce]]
[Fresh Tomato Salsa, [Rice, Sour Cream, Guacamole, Lettuce]]
[Fresh Tomato Salsa, [Rice, Sour Cream, Guacamole, Lettuce]]
[Fresh Tomato Salsa, [Rice, Sour Cream, Guacamole, Lettuce]]
[Fresh Tomato Salsa, [Rice, Sour Cream, Guacamole, Lettuce]]
[Fresh Tomato Salsa, [Rice, Sour Cream, Guacamole]]
[Fresh Tomato Salsa, [Rice, Sour Cream, Guacamole]]
[Fresh Tomato Salsa, [Rice, Sour Cream, Guacamole]]
[Fresh Tomato Salsa, [Rice, Sour Cream, Lettuce, Guacamole]]
[Fresh Tomato Salsa, [Rice, Sour Cream, Lettuce]]
[Fresh Tomato Salsa, [Sour Cream, Cheese, Guacamole, Rice]]
[Fresh Tomato Salsa, [Sour Cream, Cheese, Guacamole]]
[Fresh Tomato Salsa, [Sour Cream, Cheese, Guacamole]]
[Fresh Tomato Salsa, [Sour Cream, Cheese, Guacamole]]
[Fresh Tomato Salsa, [Sour Cream, Cheese, Lettu
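Scanning a sorted printout works, but the counting can be left to pandas. The sketch below uses made-up data and shows two readings of "most ordered": the description that appears on the most rows, and the one with the largest summed `quantity`. Which reading the exercise intends is an assumption left open here.

```python
import pandas as pd

# Made-up toy data; None stands for an item with no choice description.
toy = pd.DataFrame({
    "choice_description": ["[Apple]", None, "[Clementine]", "[Apple]", "[Apple]"],
    "quantity": [1, 1, 4, 1, 1],
})

# Reading 1: number of rows per description (missing values are ignored).
row_counts = toy["choice_description"].value_counts()
most_rows = row_counts.idxmax()

# Reading 2: number of units per description (groupby also skips missing keys).
unit_totals = toy.groupby("choice_description")["quantity"].sum()
most_units = unit_totals.idxmax()

print(most_rows, most_units)
```

In this toy frame the two readings disagree, which is why it helps to state which one is meant.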

### Step 12. How many items were ordered in total?

In [42]:
chipo['quantity'].sum()

4972

### Step 13. Turn the item price into a float

In [68]:
# Strip the leading '$' and convert the price strings to floats.
chipo['item_price_float'] = chipo['item_price'].str.lstrip('$').astype('float64')
chipo.head()

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price,item_price_float
0,1,1,Chips and Fresh Tomato Salsa,,$2.39,2.39
1,1,1,Izze,[Clementine],$3.39,3.39
2,1,1,Nantucket Nectar,[Apple],$3.39,3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39,2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98,16.98
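The conversion in Step 13 can be checked in isolation on a few price strings taken from the output above. A minimal sketch, where `prices` and `as_float` are illustrative names:

```python
import pandas as pd

# Price strings as they appear in the item_price column.
prices = pd.Series(["$2.39", "$3.39", "$16.98"], name="item_price")

# .str.lstrip("$") removes the leading dollar sign from every element;
# .astype(float) then parses the remaining text as floating-point numbers.
as_float = prices.str.lstrip("$").astype(float)

print(as_float.dtype)
print(as_float.sum())
```

Once the column is numeric, arithmetic such as multiplying by `quantity` works element-wise.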


### Step 14. How much was the revenue for the period in the dataset?

In [70]:
chipo['revenue'] = chipo['quantity'] * chipo['item_price_float']
chipo['revenue'].sum()

39237.02

### Step 15. How many orders were made in the period?

In [69]:
# Each order spans one or more rows, so count distinct order ids, not rows.
chipo['order_id'].nunique()

1834
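One order can occupy several rows, one per item line, so counting rows and counting orders are different operations. A small sketch on made-up order ids shows the gap:

```python
import pandas as pd

# Made-up toy data: three orders spread over six rows.
toy = pd.DataFrame({"order_id": [1, 1, 1, 2, 3, 3]})

n_lines = toy["order_id"].count()     # non-missing rows: one per item line
n_orders = toy["order_id"].nunique()  # distinct order ids: one per order

print(n_lines, n_orders)
```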

### Step 16. What is the average amount per order?

In [72]:
# Total revenue divided by the number of distinct orders (not by items sold).
avg_amt = chipo['revenue'].sum() / chipo['order_id'].nunique()
round(avg_amt, 2)

21.39
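An average per order needs one total per order first. The sketch below, on made-up numbers, sums revenue within each `order_id` and then averages those totals; the column names follow the ones used above, while the values are assumptions for illustration.

```python
import pandas as pd

# Made-up toy data: order 1 has two item lines, order 2 has one.
toy = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_price_float": [2.0, 3.0, 4.0],
})

# Revenue per row, following the Step 14 formula.
toy["revenue"] = toy["quantity"] * toy["item_price_float"]

# One revenue total per order, then the mean of those totals.
per_order = toy.groupby("order_id")["revenue"].sum()
avg_per_order = per_order.mean()

print(per_order)
print(avg_per_order)
```

Dividing total revenue by the number of distinct orders gives the same figure; dividing by the summed `quantity` would instead give the average revenue per item.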

### Step 17. How many different items are sold?

In [57]:
items = chipo['item_name'].unique()
len(items)


50