# 1b - Getting and Knowing your Data with Pandas

This time we are going to pull data directly from the internet.
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv).

In [2]:
data = pd.read_csv("https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv", delimiter = "\t")
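`delimiter` is an alias for `sep` in `pd.read_csv`, and the same call works on any file-like object. The sketch below reads a small tab-separated sample from memory so it runs offline; the three rows in `sample_tsv` are made up for illustration and only mimic the column layout of the real file.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample that mimics the layout of chipotle.tsv
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tIzze\t[Clementine]\t$3.39 \n"
    "1\t1\tSide of Chips\t\t$1.69 \n"
    "2\t2\tChicken Bowl\t[Black Beans]\t$16.98 \n"
)

# sep="\t" is equivalent to delimiter="\t"
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# An empty field between two tabs is read as a missing value (NaN)
print(sample.shape)
print(sample["choice_description"].isna().sum())
```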

### Step 3. Assign it to a variable called chipo.

In [4]:
chipo = data

### Step 4. See the first 10 entries

In [5]:
chipo.head(10)

|  | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa |  | $2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa |  | $2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |
| 5 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | $10.98 |
| 6 | 3 | 1 | Side of Chips |  | $1.69 |
| 7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | $11.75 |
| 8 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | $9.25 |
| 9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | $9.25 |


### Step 5. What is the number of observations in the dataset?

In [6]:
print("length of the data / number of rows: ", chipo.shape[0])

length of the data / number of rows:  4622


### Step 6. What is the number of columns in the dataset?

In [7]:
print("number of columns: ", chipo.shape[1])

number of columns:  5
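Since `shape` is a `(rows, columns)` tuple, both counts can be read in one step. A minimal sketch on a hypothetical two-row frame (`df` is not the Chipotle data):

```python
import pandas as pd

# Hypothetical two-row frame, for illustration only
df = pd.DataFrame({"order_id": [1, 2], "quantity": [1, 2]})

n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple
print(n_rows, n_cols)

print(len(df))          # len() of a DataFrame is its row count
print(len(df.columns))  # number of columns
```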


### Step 7. Print the name of all the columns.

In [8]:
chipo.columns

Index(['order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')

### Step 8. Which was the most-ordered item?

In [14]:
# Group chipo by 'item_name' and sum the 'quantity' of each item
grouped_chipo = chipo.groupby('item_name').agg({'quantity': 'sum'}).sort_values(by = "quantity", ascending=False)
# print overview of new df
print(grouped_chipo)

# print first row (= most ordered item)
print("\n\033[1mThe most-ordered item is:\033[0m", grouped_chipo.iloc[0])

                                       quantity
item_name                                      
Chicken Bowl                                761
Chicken Burrito                             591
Chips and Guacamole                         506
Steak Burrito                               386
Canned Soft Drink                           351
Chips                                       230
Steak Bowl                                  221
Bottled Water                               211
Chips and Fresh Tomato Salsa                130
Canned Soda                                 126
Chicken Salad Bowl                          123
Chicken Soft Tacos                          120
Side of Chips                               110
Veggie Burrito                               97
Barbacoa Burrito                             91
Veggie Bowl                                  87
Carnitas Bowl                                71
Barbacoa Bowl                                66
Carnitas Burrito                        
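Sorting the whole table is not strictly needed to find the top item. As an alternative sketch, `idxmax` returns the label of the largest total directly; the `orders` frame below is a hypothetical sample, not the real data.

```python
import pandas as pd

# Hypothetical sample rows, not taken from the dataset
orders = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Chips"],
    "quantity": [2, 1, 1, 2],
})

# Total quantity per item as a Series indexed by item_name
totals = orders.groupby("item_name")["quantity"].sum()

top_item = totals.idxmax()   # label of the largest total
top_quantity = totals.max()  # the total itself
print(top_item, top_quantity)
```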

### Step 9. For the most-ordered item, how many items were ordered?

In [15]:
print("The most-ordered item was ordered", grouped_chipo.iloc[0, 0], "times")

The most-ordered item was ordered 761 times


### Step 10. What was the most ordered item in the choice_description column?

In [19]:
# Group chipo by 'choice_description' and sum the 'quantity' of each choice
grouped_chipo2 = chipo.groupby('choice_description').agg({'quantity': 'sum'}).sort_values(by = "quantity", ascending=False)
# print overview of new df
print(grouped_chipo2)

# print first row (= most ordered item)
print("\n\033[1mThe most-ordered item is:\033[0m", grouped_chipo2.iloc[0])


                                                    quantity
choice_description                                          
[Diet Coke]                                              159
[Coke]                                                   143
[Sprite]                                                  89
[Fresh Tomato Salsa, [Rice, Black Beans, Cheese...        49
[Fresh Tomato Salsa, [Rice, Black Beans, Cheese...        42
...                                                      ...
[Roasted Chili Corn Salsa, [Fajita Vegetables, ...         1
[Roasted Chili Corn Salsa, [Fajita Vegetables, ...         1
[Roasted Chili Corn Salsa, [Fajita Vegetables, ...         1
[Roasted Chili Corn Salsa, [Guacamole, Sour Cre...         1
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Gr...         1

[1043 rows x 1 columns]

The most-ordered item is: quantity    159
Name: [Diet Coke], dtype: int64


### Step 11. How many items were ordered in total?

In [20]:
grouped_chipo["quantity"].sum()

4972

### Step 12. Turn the item price into a float (nothing to do under this cell, check the subtasks below)

#### Step 12.a. Check the item price type

In [21]:
chipo["item_price"].dtype

dtype('O')

In [22]:
price = chipo["item_price"]
print(price)

0        $2.39 
1        $3.39 
2        $3.39 
3        $2.39 
4       $16.98 
         ...   
4617    $11.75 
4618    $11.75 
4619    $11.25 
4620     $8.75 
4621     $8.75 
Name: item_price, Length: 4622, dtype: object


#### Step 12.b. Create a function that will change the type of item price to `float`

In [23]:
def price_to_float(value):
    value = value.replace("$", "")
    return float(value)

#### Step 12.c. Apply the function from the previous question to the item price using `map` or `apply` functions

In [28]:
chipo["item_price_float"] = chipo["item_price"].apply(price_to_float)
chipo["item_price_float"]

0        2.39
1        3.39
2        3.39
3        2.39
4       16.98
        ...  
4617    11.75
4618    11.75
4619    11.25
4620     8.75
4621     8.75
Name: item_price_float, Length: 4622, dtype: float64

In [29]:
chipo["item_price_float"].dtype

dtype('float64')
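The exercise asks for `map` or `apply`, but the same conversion can also be written with vectorized string methods. This is an alternative sketch, not the approach used above; `prices` is a hypothetical sample in the same `$x.xx ` format, including the trailing space visible in the printed column.

```python
import pandas as pd

# Hypothetical sample in the same "$x.xx " format as item_price
prices = pd.Series(["$2.39 ", "$16.98 ", "$8.75 "])

as_float = (
    prices
    .str.replace("$", "", regex=False)  # regex=False: "$" is a literal, not a regex anchor
    .str.strip()                        # drop the trailing space
    .astype(float)
)
print(as_float.dtype)
```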

### Step 13. How much was the revenue for the period in the dataset?

In [42]:
# Assumption: Revenue = Price -> the price column shows the price per order line (quantity already included)
revenue = chipo["item_price_float"].sum()
print("The total revenue is: ", revenue)

The total revenue is:  34500.16
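The result depends on how `item_price` is read. If it were a unit price instead of a line total, each row would have to be multiplied by `quantity` first. The sketch below computes both readings on a hypothetical three-row sample so the difference is visible; it does not settle which reading is right for the real data.

```python
import pandas as pd

# Hypothetical sample rows; prices are already floats
lines = pd.DataFrame({
    "quantity": [1, 2, 1],
    "item_price_float": [2.39, 16.98, 8.75],
})

# Reading 1: item_price is the total for the order line (the assumption used above)
revenue_line_total = lines["item_price_float"].sum()

# Reading 2: item_price is a unit price and must be multiplied by quantity
revenue_unit_price = (lines["quantity"] * lines["item_price_float"]).sum()

print(round(revenue_line_total, 2), round(revenue_unit_price, 2))
```

The two totals only differ on rows where `quantity` is greater than 1, so those rows are the ones worth inspecting when checking the assumption.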


### Step 14. How many orders were made in the period?

In [43]:
order_count = chipo["order_id"].nunique()
print("The number of distinct orders is: ", order_count)

The number of distinct orders is:  1834


### Step 15. What is the average revenue amount per order?

In [45]:
print("The average revenue amount per order is: ", revenue / order_count)

The average revenue amount per order is:  18.811428571428575
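The same average can be reached by first summing the price per `order_id` and then taking the mean of those per-order totals, which also gives access to the distribution of order values. A minimal sketch on a hypothetical sample (three order lines across two orders):

```python
import pandas as pd

# Hypothetical sample: three order lines across two orders
lines = pd.DataFrame({
    "order_id": [1, 1, 2],
    "item_price_float": [2.39, 3.39, 16.98],
})

# Revenue of each order, then the mean over orders
per_order = lines.groupby("order_id")["item_price_float"].sum()
mean_direct = per_order.mean()

# Equivalent ratio: total revenue divided by the number of distinct orders
mean_ratio = lines["item_price_float"].sum() / lines["order_id"].nunique()

print(per_order.describe())
```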


### Step 16. How many different items are sold?

In [46]:
items_count = chipo["item_name"].nunique()
print("The number of distinct items sold is: ", items_count)

The number of distinct items sold is:  50
