# Ex2 - Getting and Knowing your Data

This time we are going to pull data directly from the internet.
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

### Step 1. Import the necessary libraries

In [18]:
import pandas as pd
import numpy as np

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). 

In [19]:
chipotle = pd.read_csv("https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv", delimiter='\t')
print(chipotle.iloc[1])

order_id                         1
quantity                         1
item_name                     Izze
choice_description    [Clementine]
item_price                  $3.39 
Name: 1, dtype: object
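
To try the same call without network access, here is a minimal sketch that parses a small in-memory sample. The three rows and the name `sample_tsv` are hypothetical stand-ins that only imitate the column layout shown above; `sep="\t"` is equivalent to the `delimiter='\t'` argument used in the cell.

```python
import io

import pandas as pd

# Hypothetical three-row sample imitating the tab-separated layout of the dataset
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\tNULL\t$2.39 \n"
    "1\t1\tIzze\t[Clementine]\t$3.39 \n"
    "2\t2\tChicken Bowl\t[Rice, Black Beans]\t$16.98 \n"
)

# read_csv accepts any file-like object; sep="\t" switches it to tab-separated parsing
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
print(sample.iloc[1])
```

Note that `read_csv` treats the string `NULL` as a missing value by default, and that the trailing space after each price is kept as part of the string.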


### Step 3. Assign it to a variable called chipo.

In [20]:
chipo = chipotle

### Step 4. See the first 10 entries

In [21]:
print(chipo.head(10))

   order_id  quantity                              item_name  \
0         1         1           Chips and Fresh Tomato Salsa   
1         1         1                                   Izze   
2         1         1                       Nantucket Nectar   
3         1         1  Chips and Tomatillo-Green Chili Salsa   
4         2         2                           Chicken Bowl   
5         3         1                           Chicken Bowl   
6         3         1                          Side of Chips   
7         4         1                          Steak Burrito   
8         4         1                       Steak Soft Tacos   
9         5         1                          Steak Burrito   

                                  choice_description item_price  
0                                                NaN     $2.39   
1                                       [Clementine]     $3.39   
2                                            [Apple]     $3.39   
3                              

### Step 5. What is the number of observations in the dataset?

In [22]:
# Solution 1

print(len(chipo))

4622


In [23]:
# Solution 2
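
One possible second solution is the `shape` attribute, which returns a `(rows, columns)` tuple. The sketch below uses a hypothetical three-row frame named `chipo_demo` rather than the real data; on the real frame, `chipo.shape[0]` should agree with `len(chipo)`.

```python
import pandas as pd

# Hypothetical stand-in for chipo with three rows and three columns
chipo_demo = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 1, 2],
    "item_name": ["Chips and Fresh Tomato Salsa", "Izze", "Chicken Bowl"],
})

# shape is a (number of rows, number of columns) tuple
n_rows, n_cols = chipo_demo.shape
print(n_rows)
```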



### Step 6. What is the number of columns in the dataset?

In [24]:
print(len(chipo.columns))

5


### Step 7. Print the name of all the columns.

In [25]:
print(list(chipo.columns))

['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']


### Step 8. How is the dataset indexed?

In [26]:
#The dataset uses the default integer index (0 to 4621); the rows happen to be sorted by order_id
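
The index itself can be inspected through the `.index` attribute. A minimal sketch on a hypothetical frame called `demo`: when no index column is specified, pandas assigns a `RangeIndex` that starts at 0.

```python
import pandas as pd

# Hypothetical frame built without an explicit index
demo = pd.DataFrame({"order_id": [1, 1, 2], "quantity": [1, 1, 2]})

# Without an explicit index, pandas numbers the rows 0, 1, 2, ...
print(demo.index)
index_is_default = isinstance(demo.index, pd.RangeIndex)
```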

### Step 9. Which was the most-ordered item? 

In [27]:
items = chipo.groupby('item_name').item_name.count()
items = items.sort_values(ascending=False)
print(items)
#Chicken Bowl

item_name
Chicken Bowl                             726
Chicken Burrito                          553
Chips and Guacamole                      479
Steak Burrito                            368
Canned Soft Drink                        301
Steak Bowl                               211
Chips                                    211
Bottled Water                            162
Chicken Soft Tacos                       115
Chicken Salad Bowl                       110
Chips and Fresh Tomato Salsa             110
Canned Soda                              104
Side of Chips                            101
Veggie Burrito                            95
Barbacoa Burrito                          91
Veggie Bowl                               85
Carnitas Bowl                             68
Barbacoa Bowl                             66
Carnitas Burrito                          59
Steak Soft Tacos                          55
6 Pack Soft Drink                         54
Chips and Tomatillo Red Chili Salsa       48
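
The count above is the number of rows per item, not the number of units. Here both agree on Chicken Bowl, but they can diverge when single rows carry a large quantity. A small sketch on hypothetical data (the frame `demo` is made up for illustration) shows the difference:

```python
import pandas as pd

# Hypothetical order lines: Izze appears on more rows, Chicken Bowl sells more units
demo = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Izze", "Izze"],
    "quantity": [3, 1, 2, 1, 1],
})

# Number of rows on which each item appears
rows_per_item = demo.item_name.value_counts()

# Number of units of each item, taking quantity into account
units_per_item = demo.groupby("item_name").quantity.sum()

print(rows_per_item.idxmax(), units_per_item.idxmax())
```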


### Step 10. For the most-ordered item, how many items were ordered?

In [28]:
print(chipo[chipo.item_name == items.index[0]].quantity.sum())
#761

761


### Step 11. What was the most ordered item in the choice_description column?

In [29]:
choices = chipo.choice_description

def format_func(choice):
    # Remove every square bracket and split into individual choices; NaN entries return None
    if isinstance(choice, str):
        return choice.translate({ord(c): None for c in '[]'}).split(', ')

newChoices = choices.map(format_func)
items = []
for choice in newChoices:
    # Skip the rows that had no choice_description
    if isinstance(choice, list):
        items.extend(choice)
itemsSeries = pd.Series(items)
print(itemsSeries.value_counts())
#Rice

Rice                                    2389
Cheese                                  2281
Lettuce                                 1742
Sour Cream                              1711
Black Beans                             1342
Fresh Tomato Salsa                      1046
Guacamole                               1037
Fajita Vegetables                        722
Pinto Beans                              582
Roasted Chili Corn Salsa                 457
Fresh Tomato Salsa (Mild)                351
Tomatillo Red Chili Salsa                325
Fajita Veggies                           302
Roasted Chili Corn Salsa (Medium)        270
Tomatillo-Red Chili Salsa (Hot)          259
Tomatillo Green Chili Salsa              230
Diet Coke                                134
Tomatillo-Green Chili Salsa (Medium)     128
Coke                                     123
Sprite                                    77
Lemonade                                  33
Fresh Tomato (Mild)                       31
Coca Cola 
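
The same flattening can be sketched with pandas string methods and `explode` (available from pandas 0.25 on) instead of an explicit loop. The series `choices_demo` is a hypothetical sample, and this is only one alternative way to arrive at the counts.

```python
import pandas as pd

# Hypothetical sample of choice_description values, including a missing one
choices_demo = pd.Series(["[Clementine]", None, "[Rice, Cheese]", "[Rice, Black Beans]"])

flat = (
    choices_demo.dropna()                        # drop rows without a description
    .str.replace(r"[\[\]]", "", regex=True)      # remove every square bracket
    .str.split(", ")                             # one list of choices per row
    .explode()                                   # one row per individual choice
)
counts = flat.value_counts()
print(counts)
```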

### Step 12. How many items were ordered in total?

In [30]:
print(itemsSeries.value_counts().sum())
#15754 (this counts individual entries in choice_description, not the quantity column)

15754
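
If "items ordered" is read as units sold rather than entries in `choice_description`, summing the `quantity` column is the more direct route. A minimal sketch on a hypothetical frame `demo`; on the real data the equivalent call would be `chipo.quantity.sum()`.

```python
import pandas as pd

# Hypothetical order lines
demo = pd.DataFrame({
    "item_name": ["Chips and Fresh Tomato Salsa", "Izze", "Chicken Bowl"],
    "quantity": [1, 1, 2],
})

# Total number of units across all order lines
total_units = demo.quantity.sum()
print(total_units)
```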


### Step 13. Turn the item price into a float

#### Step 13.a. Check the item price type

In [31]:
prices = chipo.item_price
print(prices)
print(type(prices[0]))

0        $2.39 
1        $3.39 
2        $3.39 
3        $2.39 
4       $16.98 
         ...   
4617    $11.75 
4618    $11.75 
4619    $11.25 
4620     $8.75 
4621     $8.75 
Name: item_price, Length: 4622, dtype: object
<class 'str'>


#### Step 13.b. Create a lambda function and change the type of item price

In [32]:
float_prices = chipo.item_price.map(lambda price: float(price[1:]))
print(float_prices)

0        2.39
1        3.39
2        3.39
3        2.39
4       16.98
        ...  
4617    11.75
4618    11.75
4619    11.25
4620     8.75
4621     8.75
Name: item_price, Length: 4622, dtype: float64
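
As an alternative to the lambda, the conversion can be sketched with vectorized string methods. The series `prices_demo` is a hypothetical sample that mimics the `$` prefix and trailing space seen in the output of Step 13.a.

```python
import pandas as pd

# Hypothetical price strings with the same "$" prefix and trailing space
prices_demo = pd.Series(["$2.39 ", "$3.39 ", "$16.98 "], name="item_price")

# Strip surrounding whitespace, drop the leading "$", then cast to float
float_prices_demo = prices_demo.str.strip().str.lstrip("$").astype(float)
print(float_prices_demo)
```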


#### Step 13.c. Check the item price type

In [33]:
print(type(float_prices[0]))

<class 'numpy.float64'>


### Step 14. How much was the revenue for the period in the dataset?

In [34]:
print(sum(float_prices))

34500.16000000046
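
The trailing digits in the result are floating-point accumulation noise rather than real fractions of a cent. A small sketch on hypothetical prices (`demo_prices`) shows two common ways to handle this: rounding the total to cents, or summing with `math.fsum`, which tracks the lost low-order bits.

```python
import math

import pandas as pd

# Hypothetical prices, already converted to floats
demo_prices = pd.Series([2.39, 3.39, 3.39, 2.39, 16.98])

# Built-in sum adds left to right and can accumulate small rounding errors
naive_total = sum(demo_prices)

# math.fsum compensates for intermediate rounding errors
exact_total = math.fsum(demo_prices)

# Rounding to two decimals is usually enough when reporting currency
print(round(naive_total, 2), round(exact_total, 2))
```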


### Step 15. How many orders were made in the period?

In [35]:
print(chipo.order_id.nunique())
#The max value of order_id gives the same answer only if the IDs run consecutively from 1

1834
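
Why `nunique()` is the safer choice can be sketched with hypothetical order IDs that contain a gap: the maximum then overstates the number of orders.

```python
import pandas as pd

# Hypothetical order IDs with a gap: there is no order 3
order_ids = pd.Series([1, 1, 2, 4, 4])

n_orders = order_ids.nunique()  # counts the distinct IDs actually present
max_id = order_ids.max()        # only equals n_orders when IDs are 1..n without gaps
print(n_orders, max_id)
```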


### Step 16. What is the average revenue amount per order?

In [36]:
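# Solution 1

def price_float(row):
    # Drop the leading '$' and convert this row's price string to a float
    row.item_price = float(row.item_price[1:])
    return row

# Convert row by row, then sum the prices within each order and average the order totals
newChipo = chipo.apply(price_float, axis='columns')
meanRevenue = newChipo.groupby('order_id').item_price.sum().mean()
print(meanRevenue)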


18.81142857142869


In [37]:
# Solution 2
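
One possible second solution avoids the row-wise `apply` by converting the price column in a single vectorized step and grouping the resulting series by `order_id`. The frame `demo` below is a hypothetical three-row sample, so the printed mean is not the dataset's value.

```python
import pandas as pd

# Hypothetical order lines with string prices, as in the raw data
demo = pd.DataFrame({
    "order_id": [1, 1, 2],
    "item_price": ["$2.39 ", "$3.39 ", "$16.98 "],
})

# Vectorized conversion of the price strings to floats
price = demo.item_price.str.strip().str.lstrip("$").astype(float)

# Sum the prices within each order, then average the order totals
revenue_per_order = price.groupby(demo.order_id).sum()
mean_revenue = revenue_per_order.mean()
print(mean_revenue)
```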



### Step 17. How many different items are sold?

In [38]:
print(chipo.item_name.nunique())
#Items in item_name column

50


In [39]:
print(itemsSeries.nunique())
#Items in choice_description column

46
