# Ex2 - Getting and Knowing your Data

This time we are going to pull data directly from the internet.
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

### Step 1. Import the necessary libraries

In [0]:
import pandas as pd

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). 

### Step 3. Assign it to a variable called chipo.

In [0]:
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
    
chipo = pd.read_csv(url, sep = '\t')
chipo.head(10)

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98
5,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...",$10.98
6,3,1,Side of Chips,,$1.69
7,4,1,Steak Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",$11.75
8,4,1,Steak Soft Tacos,"[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...",$9.25
9,5,1,Steak Burrito,"[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...",$9.25
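
The same call can be tried without network access. The sketch below parses tab-separated text from an in-memory buffer; the three rows are made up for illustration, and `sample_tsv` and `sample` are hypothetical names rather than part of the exercise.

```python
import io

import pandas as pd

# Made-up three-row sample in a tab-separated layout (not the real data)
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
    "1\t1\tSide of Chips\tNULL\t$1.69\n"
    "2\t2\tChicken Bowl\t[Rice]\t$16.98\n"
)

# sep="\t" tells read_csv to split fields on tabs instead of commas.
# "NULL" is one of the strings read_csv treats as missing by default.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
print(sample.shape)
print(sample["choice_description"].isna().sum())
```

Because `read_csv` accepts any file-like object, the same arguments apply whether the source is a URL, a local path, or a buffer like this one.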


### Step 4. See the first 10 entries

In [0]:
first_ten = chipo.iloc[:10, :]
print(first_ten)

   order_id  ...  item_price
0         1  ...      $2.39 
1         1  ...      $3.39 
2         1  ...      $3.39 
3         1  ...      $2.39 
4         2  ...     $16.98 
5         3  ...     $10.98 
6         3  ...      $1.69 
7         4  ...     $11.75 
8         4  ...      $9.25 
9         5  ...      $9.25 

[10 rows x 5 columns]


### Step 5. What is the number of observations in the dataset?

In [0]:
# Solution 1
# info() prints its report itself and returns None, so wrapping it in print() adds a stray "None"
chipo.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
order_id              4622 non-null int64
quantity              4622 non-null int64
item_name             4622 non-null object
choice_description    3376 non-null object
item_price            4622 non-null object
dtypes: int64(2), object(3)
memory usage: 180.6+ KB


In [0]:
# Solution 2
print(chipo.shape)

(4622, 5)


### Step 6. What is the number of columns in the dataset?

In [0]:
print("columns : {}".format(chipo.shape[1]))

columns : 5


### Step 7. Print the name of all the columns.

In [0]:
print(chipo.columns.tolist())

['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']
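
Steps 5 to 7 all read from the same few attributes. As a minimal sketch on a made-up frame (`toy` is a hypothetical name), the row count, column count, and column names can be pulled out like this:

```python
import pandas as pd

# Made-up frame (not the real data)
toy = pd.DataFrame({"order_id": [1, 1, 2], "quantity": [1, 2, 1]})

n_rows, n_cols = toy.shape  # shape is a (rows, columns) tuple
print(n_rows, n_cols)       # 3 2
print(len(toy))             # 3, len() counts rows
print(list(toy.columns))    # ['order_id', 'quantity']
```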


### Step 8. How is the dataset indexed?

In [0]:
print(chipo.index)
print(chipo.index.values)

RangeIndex(start=0, stop=4622, step=1)
[   0    1    2 ... 4619 4620 4621]


### Step 9. Which was the most-ordered item? 

In [0]:
# Sum only the quantity column per item, then take the item with the largest total
x = chipo.groupby("item_name")["quantity"].sum()
print(x.idxmax())

Chicken Bowl
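
To see the group-then-sum pattern in isolation, here is a small sketch on made-up order lines. The frame `toy` and the variable names are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Made-up order lines (not the real data)
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Izze", "Chips"],
    "quantity": [2, 1, 1, 1, 1],
})

# Total quantity per item, largest first
totals = toy.groupby("item_name")["quantity"].sum().sort_values(ascending=False)

top_item = totals.idxmax()   # label of the largest total
top_quantity = totals.max()  # the total itself
print(top_item, top_quantity)  # Chicken Bowl 3
```

Selecting `"quantity"` before calling `sum()` keeps the aggregation away from text columns, which do not have a meaningful sum.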

### Step 10. For the most-ordered item, how many items were ordered?

In [0]:
condition = chipo["item_name"] == "Chicken Bowl"
print(chipo[condition].loc[:, "quantity"].sum())

761


### Step 11. What was the most ordered item in the choice_description column?

In [0]:
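# numeric_only=True restricts the sum to order_id and quantity;
# the summed order_id is only a by-product and carries no meaning
y = chipo.groupby("choice_description").sum(numeric_only=True)
y = y.sort_values(["quantity"], ascending=False)
print(y)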

                                                    order_id  quantity
choice_description                                                    
[Diet Coke]                                           123455       159
[Coke]                                                122752       143
[Sprite]                                               80426        89
[Fresh Tomato Salsa, [Rice, Black Beans, Cheese...     43088        49
[Fresh Tomato Salsa, [Rice, Black Beans, Cheese...     36041        42
[Fresh Tomato Salsa, [Rice, Black Beans, Cheese...     37550        40
[Lemonade]                                             31892        36
[Fresh Tomato Salsa (Mild), [Pinto Beans, Rice,...     24432        36
[Coca Cola]                                            19282        32
[Fresh Tomato Salsa, [Rice, Cheese, Sour Cream,...     29614        30
[Fresh Tomato Salsa, [Rice, Black Beans, Cheese]]      27583        30
[Fresh Tomato Salsa, [Rice, Cheese, Lettuce]]          23326        26
[Fresh

### Step 12. How many items were ordered in total?

In [0]:
print(chipo.loc[:, "quantity"].sum())

4972


### Step 13. Turn the item price into a float

#### Step 13.a. Check the item price type

In [0]:
chipo.loc[:, "item_price"].dtype

dtype('O')

#### Step 13.b. Create a lambda function and change the type of item price

In [0]:
%%time
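def convert(x):
    # Drop the leading "$" and parse the rest as a float
    return float(x[1:])

# Slow element-wise loop, kept for the timing comparison. It works on a copy,
# so chipo still holds the original strings when the next cell runs.
slow = chipo.copy()
for i in range(len(slow)):
    slow.loc[i, "item_price"] = convert(slow.loc[i, "item_price"])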

CPU times: user 3.68 s, sys: 56.1 ms, total: 3.73 s
Wall time: 3.66 s


In [0]:
%%time
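convert_ = lambda x: float(x[1:])
# Assign the column directly; writing through .loc[:, ...] may keep the old object dtype in recent pandas versions
chipo["item_price"] = chipo["item_price"].apply(convert_)
print(chipo.item_price)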

0        2.39
1        3.39
2        3.39
3        2.39
4       16.98
5       10.98
6        1.69
7       11.75
8        9.25
9        9.25
10       4.45
11       8.75
12       8.75
13      11.25
14       4.45
15       2.39
16       8.49
17       8.49
18       2.18
19       8.75
20       4.45
21       8.99
22       3.39
23      10.98
24       3.39
25       2.39
26       8.49
27       8.99
28       1.09
29       8.49
        ...  
4592    11.75
4593    11.75
4594    11.75
4595     8.75
4596     4.45
4597     1.25
4598     1.50
4599     8.75
4600     4.45
4601     1.25
4602     9.25
4603     9.25
4604     8.75
4605     4.45
4606     1.25
4607    11.75
4608    11.25
4609     1.25
4610    11.75
4611    11.25
4612     9.25
4613     2.15
4614     1.50
4615     8.75
4616     4.45
4617    11.75
4618    11.75
4619    11.25
4620     8.75
4621     8.75
Name: item_price, Length: 4622, dtype: float64
CPU times: user 8.35 ms, sys: 1.96 ms, total: 10.3 ms
Wall time: 12.3 ms
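
A third option, not used in the cells above, is to let pandas string methods handle the whole column at once. The sketch below assumes prices in the same `$x.xx` form, possibly with stray whitespace; the sample values are made up.

```python
import pandas as pd

# Made-up prices in "$x.xx" string form (not the real data)
prices = pd.Series(["$2.39", "$16.98 ", "$1.69"], name="item_price")

# Strip surrounding whitespace, drop the leading "$", then cast the whole column at once
as_float = prices.str.strip().str.lstrip("$").astype(float)
print(as_float.tolist())  # [2.39, 16.98, 1.69]
print(as_float.dtype)     # float64
```

This avoids both the explicit loop and the per-element Python function call.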


#### Step 13.c. Check the item price type

In [0]:
chipo.loc[:, "item_price"].dtype

dtype('float64')

### Step 14. How much was the revenue for the period in the dataset?

In [0]:
revenue = chipo.loc[:, "quantity"] * chipo.loc[:, "item_price"]
revenue = revenue.sum()
revenue

39237.02
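
One caveat worth checking: in the preview under Step 3, row 4 lists a Chicken Bowl with quantity 2 at $16.98, which may mean `item_price` already holds the line total rather than a unit price. That is an inference from the preview, not something the dataset documents. Under that assumption, multiplying by `quantity` would count multi-item rows more than once. The sketch below uses made-up rows to show how the two readings differ.

```python
import pandas as pd

# Made-up rows (not the real data); item_price is already numeric here
toy = pd.DataFrame({
    "quantity": [1, 2, 1],
    "item_price": [2.5, 17.0, 1.5],
})

# Reading 1: item_price is a unit price, so each line is worth quantity * price
revenue_unit = (toy["quantity"] * toy["item_price"]).sum()

# Reading 2 (assumption): item_price is already the line total
revenue_line = toy["item_price"].sum()

print(revenue_unit)  # 38.0
print(revenue_line)  # 21.0
```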

### Step 15. How many orders were made in the period?

In [0]:
# Count distinct order ids; taking the last id only works if ids run 1..N without gaps
chipo.loc[:, "order_id"].nunique()

1834
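
Taking the last distinct `order_id` equals the number of orders only when the ids run upward from 1 without gaps, whereas counting distinct ids does not rely on that. A small sketch with made-up ids shows the difference:

```python
import pandas as pd

# Made-up order ids with a gap (no order 3) and repeated ids (not the real data)
order_ids = pd.Series([1, 1, 2, 4, 4, 5], name="order_id")

last_id = order_ids.unique().tolist()[-1]  # last distinct id seen
n_orders = order_ids.nunique()             # number of distinct ids

print(last_id)   # 5
print(n_orders)  # 4
```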

### Step 16. What is the average revenue amount per order?

In [0]:
# Solution 1



In [0]:
# Solution 2
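
One possible approach to this step, sketched on made-up rows and following the `quantity * item_price` definition of revenue from Step 14, is to compute revenue per line, sum it within each order, and average those sums. The column name `revenue` and the frame `toy` are introduced here for illustration only.

```python
import pandas as pd

# Made-up order lines (not the real data); item_price is already a float
toy = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "quantity": [1, 2, 1, 1],
    "item_price": [2.0, 3.0, 10.0, 6.0],
})

# Revenue per line, following the Step 14 definition
toy["revenue"] = toy["quantity"] * toy["item_price"]

# Sum revenue within each order, then average across orders
per_order = toy.groupby("order_id")["revenue"].sum()
average_per_order = per_order.mean()
print(average_per_order)  # 8.0

# Equivalent shortcut: total revenue divided by the number of distinct orders
shortcut = toy["revenue"].sum() / toy["order_id"].nunique()
print(shortcut)  # 8.0
```

Applied to `chipo` after Step 13, the same two lines of aggregation would give the average for the real data.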



### Step 17. How many different items are sold?

In [0]:
chipo.loc[:, "item_name"].nunique()

50