# Day 6 –– Pandas, Conditionals, and a Few Good Functions

Today we're going to cover Pandas, review conditionals, and meet a few helpful functions: map, filter, and apply. We'll also keep looking at different tools we can use with our functions.

### Libraries

In [51]:
import sys                             # system module 
import pandas as pd                    # data package
import datetime as dt                  # date and time module
import numpy as np                     # foundation for pandas 

### Today's Data

This data is separated by TABS, not by COMMAS as in most CSVs. That's the reason for the **sep = '\t'** argument.


In [317]:
url = 'https://raw.githubusercontent.com/TheUpshot/chipotle/master/orders.tsv'
chipotle = pd.read_csv(url, sep = '\t')   # tab (\t) separated values 
print('Variable dtypes:\n', chipotle.dtypes, sep = '')
chipotle.head()

Variable dtypes:
order_id               int64
quantity               int64
item_name             object
choice_description    object
item_price            object
dtype: object


Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98


# Conditional Logic Review

## Fireround –– Comparison and Boolean Operators

These **comparison operators** return True when the proposed relationship holds and False when it does not.

In [172]:
print(4 == 4)

True


In [249]:
print(4 == 4)
print(4 == 5)

True
False


In [250]:
print(4 != 4)
print(4 != 5)

False
True


In [251]:
print('Jets' == 'Jets')
print('Yankees' !=  'Mets')
print('Devils' ==  ' Devils')


True
True
False


In [252]:
print(2 < 4.5)
print(4 > 4)

True
False


In [253]:
print(2 <= 5)
print(6 >= 6)

True
True


Here's a good one:

In [254]:
print('Daenerys' > 'Ned')

False


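**Strings** are compared character by character, so a string that comes first alphabetically is considered **less than** one that comes later. Uppercase letters sort before lowercase ones.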
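
Here is a small sketch of what "alphabetically" means in practice. Python compares strings by Unicode code point, one character at a time. The example strings and variable names are made up for illustration.

```python
# Strings are compared character by character, by Unicode code point.
first_letter = 'Daenerys' < 'Ned'            # 'D' comes before 'N'
case_sensitive = 'apple' < 'Banana'          # uppercase 'B' (66) sorts before lowercase 'a' (97)
case_insensitive = 'apple'.lower() < 'Banana'.lower()   # compare on lower-cased copies
prefix = 'Ned' < 'Neds'                      # a prefix is smaller than the longer string

print(first_letter)        # True
print(case_sensitive)      # False
print(ord('B'), ord('a'))  # 66 97
print(case_insensitive)    # True
print(prefix)              # True
```

Lower-casing both sides is one simple way to get a case-insensitive comparison when that is what you actually want.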

In [175]:
3 > 4 and 7 > 5

False

**and** returns True when two expressions are *both* True and False otherwise. **or** returns True when **at least one** of the two expressions is True and False when neither is.

In [255]:
3 > 5 or 5 < 9

True

**not** acts as a *negation*

In [256]:
not(True)

False

In [257]:
not(not(10 < 1))

False

In [259]:
25 > 13 and not('Robb' > 'Bran' or 3 < 10)

False

## Conditionals Review

In [177]:
chipotle.head(2)

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39


In [181]:
for quantity in chipotle['quantity']:    
    if quantity >= 3:
        print('Large sale')
    else:
        print('Small sale')

Small sale
Small sale
Small sale
Small sale
Small sale
...
(output truncated: one line per row; nearly all are 'Small sale', with an occasional 'Large sale')

In [182]:
for quantity in chipotle['quantity']:    
    if quantity >= 3:
        print('Large sale')
    elif quantity >= 2:
        print('Medium sale')
    else:
        print('Small sale')

Small sale
Small sale
Small sale
Small sale
Medium sale
...
(output truncated: one line per row; mostly 'Small sale', with some 'Medium sale' and an occasional 'Large sale')

The else statement is optional, and at most one else can follow an if.

The elif statement lets you check multiple expressions and execute a block of code as soon as one of the conditions evaluates to True.

Like else, the elif statement is optional. Unlike else, though, there can be an arbitrary number of elif statements following an if. Order matters: the conditions are checked from top to bottom, and only the first one that is True runs. In the version below the first check is quantity >= 1, which holds for every row, so everything printed is 'Small sale'.

In [191]:
for quantity in chipotle['quantity']:    
    if quantity >= 1:
        print('Small sale')
    elif quantity >= 3:
        print('Large sale')
    else:
        print('Medium sale')

Small sale
Small sale
Small sale
Small sale
Small sale
...
(output truncated: every row prints 'Small sale')


The difference between ELIF and ELSE is that ELSE is our final, "CATCH-ALL" condition whereas ELIF can be one condition of MANY.
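
To make the ordering point concrete, here is a minimal sketch that wraps the same thresholds in two small functions. The function names `label_sale` and `label_sale_misordered` are hypothetical and only used for this illustration.

```python
def label_sale(quantity):
    """Label a sale by size, checking the most restrictive condition first."""
    if quantity >= 3:
        return 'Large sale'
    elif quantity >= 2:
        return 'Medium sale'
    else:
        return 'Small sale'


def label_sale_misordered(quantity):
    """Same labels, but the broadest condition comes first and shadows the rest."""
    if quantity >= 1:
        return 'Small sale'
    elif quantity >= 3:      # never reached when quantity >= 1
        return 'Large sale'
    else:
        return 'Medium sale'


quantities = [1, 2, 4]
print([label_sale(q) for q in quantities])             # ['Small sale', 'Medium sale', 'Large sale']
print([label_sale_misordered(q) for q in quantities])  # ['Small sale', 'Small sale', 'Small sale']
```

Putting the narrowest condition first is what lets the later branches stay simple.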


Having gone over conditionals and functions (on Tuesday), I wanted to clarify a comment I made the other day. Check the ordering of the conditionals and how, as a result, I'm able to write my function:

In [162]:
randcols = pd.DataFrame(np.random.randint(0, 10, size = 10), columns = ['first'])
randcols.head()

Unnamed: 0,first
0,6
1,0
2,4
3,2
4,1


In [264]:
def fizzbuzz1(data, col):
    """If the number is divisible by 3 only, print 'Fizz';
    if it's divisible by 5 only, print 'Buzz'; if it's
    divisible by both 3 and 5, print 'FizzBuzz'; and
    if it's divisible by neither 3 nor 5, print a blank line."""
    
    col = data[col]
    
    for fig in col:
        if fig % 3 == 0 and fig % 5 != 0:
            print("Fizz")
        elif fig % 3 != 0 and fig % 5 == 0:
            print("Buzz")
        elif fig % 3 == 0 and fig % 5 == 0:
            print("FizzBuzz")
        else:
            print("")

In [265]:
fizzbuzz1 = fizzbuzz1(randcols, 'first')

Fizz
FizzBuzz



Buzz

Fizz
Fizz



In [266]:
def fizzbuzz2(data, col):
    """If the number is divisible by both 3 and 5, print 'FizzBuzz';
    otherwise, if it's divisible by 3, print 'Fizz';
    otherwise, if it's divisible by 5, print 'Buzz'; and
    if it's divisible by neither 3 nor 5, print a blank line."""
    
    col = data[col]
    
    for fig in col:
        if fig % 3 == 0 and fig % 5 == 0:
            print("FizzBuzz")
        elif fig % 3 == 0:
            print("Fizz")
        elif fig % 5 == 0:
            print("Buzz")
        else:
            print("")

In [267]:
fizzbuzz2 = fizzbuzz2(randcols, 'first')

Fizz
FizzBuzz



Buzz

Fizz
Fizz



In [268]:
print(fizzbuzz1 == fizzbuzz2)

True
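
One caveat about the comparison above: both functions print their labels and return nothing, so each name ends up bound to None and the check is really None == None, which is True no matter what was printed. Below is a sketch of a variant that returns its labels so the two orderings can actually be compared. The names `fizzbuzz_exclusive` and `fizzbuzz_ordered` and the sample values are made up for this example.

```python
def fizzbuzz_exclusive(values):
    """Mutually exclusive conditions, so the order of the branches does not matter."""
    labels = []
    for fig in values:
        if fig % 3 == 0 and fig % 5 != 0:
            labels.append('Fizz')
        elif fig % 3 != 0 and fig % 5 == 0:
            labels.append('Buzz')
        elif fig % 3 == 0 and fig % 5 == 0:
            labels.append('FizzBuzz')
        else:
            labels.append('')
    return labels


def fizzbuzz_ordered(values):
    """Most specific condition first, so the later branches can be simpler."""
    labels = []
    for fig in values:
        if fig % 3 == 0 and fig % 5 == 0:
            labels.append('FizzBuzz')
        elif fig % 3 == 0:
            labels.append('Fizz')
        elif fig % 5 == 0:
            labels.append('Buzz')
        else:
            labels.append('')
    return labels


sample = [6, 0, 4, 2, 1, 5, 7, 3, 9, 8]   # made-up values in the same 0-9 range
same_labels = fizzbuzz_exclusive(sample) == fizzbuzz_ordered(sample)
print(same_labels)   # True
```

Because the functions now return lists, the equality check compares the labels themselves.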


# Data Cleaning and Pandas Review

# Pandas Review

### .iloc[ ] and .loc[ ]

.iloc[] is used for **integer-location** based indexing: it selects rows and columns BY POSITION (number). .loc[], on the other hand, selects by LABEL and has two main uses:

* Selecting rows (and columns) by label/index

* Selecting rows with a boolean / conditional lookup
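
The difference is easiest to see on a frame whose index labels do not match the row positions. This is a minimal sketch on a tiny made-up DataFrame (`orders` is not part of today's data).

```python
import pandas as pd

# A tiny made-up frame whose index labels differ from the row positions.
orders = pd.DataFrame(
    {'item_name': ['Izze', 'Chicken Bowl', 'Side of Chips'],
     'quantity': [1, 2, 1]},
    index=[10, 20, 30],
)

by_position = orders.iloc[0]     # first row, whatever its label is
by_label = orders.loc[10]        # the row whose index label is 10
by_condition = orders.loc[orders['quantity'] > 1, 'item_name']   # boolean mask plus a column label

print(by_position['item_name'], by_label['item_name'])   # Izze Izze
print(by_condition.tolist())                             # ['Chicken Bowl']
```

With the default 0, 1, 2, ... index the labels and positions coincide, which is why the two indexers can look interchangeable on the chipotle data.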

**.iloc[]** can be written as a ***row*** selector and a ***column*** selector. As a ***row*** selector, we can look at it like this:

In [57]:
chipotle.iloc[0] # Gives you the first row 

order_id                                         1
quantity                                         1
item_name             Chips and Fresh Tomato Salsa
choice_description                             NaN
item_price                                  $2.39 
Name: 0, dtype: object

In [270]:
chipotle.iloc[0:3] # Gives you the first through third rows

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,2.39
1,1,1,Izze,[Clementine],3.39
2,1,1,Nantucket Nectar,[Apple],3.39


Now, let's look at .iloc[] as a column selector:

In [61]:
chipotle.iloc[:, 0] # Gets the first column

0          1
1          1
2          1
3          1
4          2
5          3
6          3
7          4
8          4
9          5
10         5
11         6
12         6
13         7
14         7
15         8
16         8
17         9
18         9
19        10
20        10
21        11
22        11
23        12
24        12
25        13
26        13
27        14
28        14
29        15
        ... 
4592    1825
4593    1825
4594    1825
4595    1826
4596    1826
4597    1826
4598    1826
4599    1827
4600    1827
4601    1827
4602    1827
4603    1827
4604    1828
4605    1828
4606    1828
4607    1829
4608    1829
4609    1829
4610    1830
4611    1830
4612    1831
4613    1831
4614    1831
4615    1832
4616    1832
4617    1833
4618    1833
4619    1834
4620    1834
4621    1834
Name: order_id, Length: 4622, dtype: int64

In [274]:
chipotle.iloc[:, 0:3].head() # Gets the first through third columns

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,2.39
1,1,1,Izze,[Clementine],3.39
2,1,1,Nantucket Nectar,[Apple],3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",16.98


## Quick Question

Now, how would I get the 50th through 60th rows and the second through fourth columns?

In [278]:
chipotle.iloc[49:60, 1:4]

Unnamed: 0,quantity,item_name,choice_description
49,1,Chips and Guacamole,
50,1,Steak Burrito,"[Roasted Chili Corn Salsa (Medium), [Rice, Faj..."
51,2,Canned Soda,[Mountain Dew]
52,1,Chicken Burrito,"[Roasted Chili Corn Salsa (Medium), [Black Bea..."
53,1,Canned Soda,[Sprite]
54,1,Steak Bowl,"[Fresh Tomato Salsa (Mild), [Black Beans, Rice..."
55,1,Chips and Fresh Tomato Salsa,
56,1,Barbacoa Soft Tacos,"[Fresh Tomato Salsa, [Fajita Vegetables, Black..."
57,1,Veggie Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables..."
58,1,Barbacoa Bowl,"[Roasted Chili Corn Salsa, [Fajita Vegetables,..."


In [64]:
chipotle.iloc[49:60, 2:5] 

Unnamed: 0,item_name,choice_description,item_price
49,Chips and Guacamole,,$3.99
50,Steak Burrito,"[Roasted Chili Corn Salsa (Medium), [Rice, Faj...",$8.99
51,Canned Soda,[Mountain Dew],$2.18
52,Chicken Burrito,"[Roasted Chili Corn Salsa (Medium), [Black Bea...",$10.98
53,Canned Soda,[Sprite],$1.09
54,Steak Bowl,"[Fresh Tomato Salsa (Mild), [Black Beans, Rice...",$8.99
55,Chips and Fresh Tomato Salsa,,$2.39
56,Barbacoa Soft Tacos,"[Fresh Tomato Salsa, [Fajita Vegetables, Black...",$9.25
57,Veggie Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",$11.25
58,Barbacoa Bowl,"[Roasted Chili Corn Salsa, [Fajita Vegetables,...",$11.75


***Note*** that the row labeled 60 is missing: like a Python list slice, an .iloc[] slice excludes its end position.

In [279]:
chipotle.iloc[60] 

order_id                               28
quantity                                1
item_name             Chips and Guacamole
choice_description                    NaN
item_price                           4.45
Name: 60, dtype: object
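
A related detail worth knowing: slices behave differently in the two indexers. The sketch below uses a small made-up frame (`numbers`) to show that an .iloc[] slice leaves out its end position while a .loc[] slice includes its end label.

```python
import pandas as pd

numbers = pd.DataFrame({'value': range(100, 110)})   # default index 0..9

by_position = numbers.iloc[2:5]    # positions 2, 3, 4 (end excluded, like a list slice)
by_label = numbers.loc[2:5]        # labels 2, 3, 4, 5 (end included)

print(len(by_position), len(by_label))   # 3 4
print(by_label.index.tolist())           # [2, 3, 4, 5]
```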

## Question

What if I was in a really weird mood and I wanted the 10th, the 20th, and the 100th rows of the 1st and 3rd columns? This relates to a question someone asked Tuesday.

In [280]:
chipotle.iloc[[9, 19, 99], [0, 2]]

Unnamed: 0,order_id,item_name
9,5,Steak Burrito
19,10,Chicken Bowl
99,44,Chicken Bowl


In [95]:
chipotle.iloc[[10, 20, 100], [0, 2]] # Off by one: positions 10, 20, and 100 are the 11th, 21st, and 101st rows


Unnamed: 0,order_id,item_name
10,5,Chips and Guacamole
20,10,Chips and Guacamole
100,44,Chips and Guacamole


#### Selecting rows by label/index

Selections using **.loc[]** are based on the **index** of the DataFrame: .loc[] selects rows directly by their index values.

In [88]:
chipotle1 = chipotle.set_index('item_name') #.loc['Chicken Bowl', 'choice_description':'item_price'] 
chipotle1.head()


Unnamed: 0_level_0,order_id,quantity,choice_description,item_price
item_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Chips and Fresh Tomato Salsa,1,1,,$2.39
Izze,1,1,[Clementine],$3.39
Nantucket Nectar,1,1,[Apple],$3.39
Chips and Tomatillo-Green Chili Salsa,1,1,,$2.39
Chicken Bowl,2,2,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98


Now with the index set, we can directly select rows for different 'item_name' values using .loc[]. 


In [282]:
chipotle1.loc['Chicken Bowl', 'quantity':'item_price'].head()


Unnamed: 0_level_0,quantity,choice_description,item_price
item_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Chicken Bowl,2,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98
Chicken Bowl,1,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...",$10.98
Chicken Bowl,1,"[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...",$11.25
Chicken Bowl,1,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",$8.75
Chicken Bowl,1,"[Roasted Chili Corn Salsa (Medium), [Pinto Bea...",$8.49


## Quick Question

We've got just 'Chicken Bowl' transactions here. How could we select both 'Chicken Bowl' ***and*** 'Side of Chips' transactions?

In [291]:
chipotle1.loc[['Chicken Bowl', 'Side of Chips'], 'quantity':'item_price']

Unnamed: 0_level_0,quantity,choice_description,item_price
item_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Chicken Bowl,2,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98
Chicken Bowl,1,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...",$10.98
Chicken Bowl,1,"[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...",$11.25
Chicken Bowl,1,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",$8.75
Chicken Bowl,1,"[Roasted Chili Corn Salsa (Medium), [Pinto Bea...",$8.49
Chicken Bowl,1,"[Roasted Chili Corn Salsa, [Rice, Black Beans,...",$11.25
Chicken Bowl,1,"[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...",$8.75
Chicken Bowl,1,"[Fresh Tomato Salsa, [Rice, Black Beans, Chees...",$8.75
Chicken Bowl,1,"[Tomatillo Red Chili Salsa, [Rice, Fajita Vege...",$8.75
Chicken Bowl,1,"[Tomatillo Red Chili Salsa, [Rice, Black Beans...",$8.75


In [78]:
chipotle1.loc[['Chicken Bowl', 'Side of Chips'], 'quantity':'item_price'].head()


Unnamed: 0_level_0,quantity,choice_description,item_price
item_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Chicken Bowl,2,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98
Chicken Bowl,1,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...",$10.98
Chicken Bowl,1,"[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...",$11.25
Chicken Bowl,1,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",$8.75
Chicken Bowl,1,"[Roasted Chili Corn Salsa (Medium), [Pinto Bea...",$8.49


## Another Quick Question

Somebody asked this the other day, but how would we make it so that we only select quantity and item_price (excluding choice_description) for all Chicken Bowl and Side of Chips orders?

In [290]:
chipotle1.loc[['Chicken Bowl', 'Side of Chips'], ['quantity', 'item_price']]

Unnamed: 0_level_0,quantity,item_price
item_name,Unnamed: 1_level_1,Unnamed: 2_level_1
Chicken Bowl,2,$16.98
Chicken Bowl,1,$10.98
Chicken Bowl,1,$11.25
Chicken Bowl,1,$8.75
Chicken Bowl,1,$8.49
Chicken Bowl,1,$11.25
Chicken Bowl,1,$8.75
Chicken Bowl,1,$8.75
Chicken Bowl,1,$8.75
Chicken Bowl,1,$8.75


In [77]:
chipotle1.loc[['Chicken Bowl', 'Side of Chips'], ['quantity','item_price']].head()


Unnamed: 0_level_0,quantity,item_price
item_name,Unnamed: 1_level_1,Unnamed: 2_level_1
Chicken Bowl,2,$16.98
Chicken Bowl,1,$10.98
Chicken Bowl,1,$11.25
Chicken Bowl,1,$8.75
Chicken Bowl,1,$8.49


#### Boolean / Logical indexing using .loc[ ]

In [292]:
type(chipotle1.loc[chipotle1['quantity'] == 4, 'item_price'])

pandas.core.series.Series

## Quick Question

What we see above is a **Series**. A **Series** is similar to a **list** or an **array** in Python. It represents a ***series of values*** (numeric or otherwise) such as a **column of data**. Think of it as a Python list on steroids. It **provides additional functionality, methods, and operators**, which make it a more powerful version of a list.

How, though, could we convert this series into a DataFrame?

In [None]:
dataframe = pd.DataFrame()

In [92]:
chipotle1.loc[chipotle1['quantity'] == 4, ['item_price']].head()

Unnamed: 0_level_0,item_price
item_name,Unnamed: 1_level_1
Chicken Burrito,$35.00
Chips and Fresh Tomato Salsa,$11.80
Bottled Water,$6.00
Bottled Water,$6.00
Canned Soda,$4.36


If selections of a **single column** are made as a string, a series is returned from .loc[]. We need to pass a list to get a DataFrame back.
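
A minimal sketch of that difference on a few made-up rows (`prices` is not the real dataset). It also shows `Series.to_frame()`, another way to turn a Series into a one-column DataFrame.

```python
import pandas as pd

prices = pd.DataFrame({'quantity': [4, 1, 4],
                       'item_price': ['$35.00', '$2.39', '$6.00']})   # made-up rows

as_series = prices.loc[prices['quantity'] == 4, 'item_price']     # column given as a string
as_frame = prices.loc[prices['quantity'] == 4, ['item_price']]    # column given as a list

print(type(as_series).__name__)    # Series
print(type(as_frame).__name__)     # DataFrame
print(as_frame.shape)              # (2, 1)

converted = as_series.to_frame()   # Series -> one-column DataFrame
print(list(converted.columns))     # ['item_price']
```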

So why did I write pd.DataFrame() above? Mind the capitalization: the constructor is DataFrame, not dataframe. If we didn't know that handy dandy list trick, we could also pass the Series to it:

In [94]:
series = chipotle1.loc[chipotle1['quantity'] == 4, 'item_price'].head()

price_frame = pd.DataFrame(series) # Note how pd.DataFrame is capitalized
price_frame.head()

Unnamed: 0_level_0,item_price
item_name,Unnamed: 1_level_1
Chicken Burrito,$35.00
Chips and Fresh Tomato Salsa,$11.80
Bottled Water,$6.00
Bottled Water,$6.00
Canned Soda,$4.36


Quickly, let's take a look at the original dataset:

In [113]:
chipotle.head()

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98


# Fun things you can do with .loc[]

What if we wanted to select rows where the item_name column starts with 'Chips', including all columns?

In [297]:
chipotle.loc[chipotle['item_name'].str.startswith("Chips")].head()   


Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,2.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,2.39
10,5,1,Chips and Guacamole,,4.45
14,7,1,Chips and Guacamole,,4.45
15,8,1,Chips and Tomatillo-Green Chili Salsa,,2.39


The .str accessor provides vectorized string methods for a Series or Index.
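
One thing to watch for with string methods is missing data, since a column such as choice_description contains NaN. Depending on the pandas version, a missing value can come back as NaN in the mask rather than True or False. The sketch below, on made-up values, passes `na=False` so that missing entries simply count as non-matches.

```python
import numpy as np
import pandas as pd

items = pd.Series(['Chips and Guacamole', 'Izze', np.nan, 'Chips'])   # made-up values

# na=False treats missing values as "does not start with 'Chips'".
mask = items.str.startswith('Chips', na=False)
matches = items[mask]

print(mask.tolist())       # [True, False, False, True]
print(matches.tolist())    # ['Chips and Guacamole', 'Chips']
```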

## Quick Question

How would we get only the **item_name** and **item_price** columns?


In [106]:
chipotle.loc[chipotle['item_name'].str.startswith("Chips"), ['item_name', 'item_price']].head()


Unnamed: 0,item_name,item_price
0,Chips and Fresh Tomato Salsa,$2.39
3,Chips and Tomatillo-Green Chili Salsa,$2.39
10,Chips and Guacamole,$4.45
14,Chips and Guacamole,$4.45
15,Chips and Tomatillo-Green Chili Salsa,$2.39


What if we wanted to select all columns but only some rows, with choice_description being equal to some value?

In [None]:
# chipotle.loc[chipotle['choice_description'].str. ...]   # unfinished attempt; .isin() handles this case

In [298]:
chipotle.loc[chipotle['choice_description'].isin(['[Apple]', '[Clementine]', '[Blackberry]'])].head()


Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
1,1,1,Izze,[Clementine],3.39
2,1,1,Nantucket Nectar,[Apple],3.39
47,21,1,Izze,[Blackberry],3.39
66,30,1,Izze,[Blackberry],3.39
173,77,1,Nantucket Nectar,[Apple],3.39
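
As a sketch of what .isin() is doing, the example below compares it with the equivalent chain of == tests joined by |. The rows are made up in the style of the dataset, and `drinks` and `wanted` are illustrative names.

```python
import pandas as pd

drinks = pd.DataFrame({
    'item_name': ['Izze', 'Nantucket Nectar', 'Izze', 'Canned Soda'],
    'choice_description': ['[Clementine]', '[Apple]', '[Blackberry]', '[Sprite]'],
})   # made-up rows in the style of the dataset

wanted = ['[Apple]', '[Clementine]']

with_isin = drinks.loc[drinks['choice_description'].isin(wanted)]
with_or = drinks.loc[(drinks['choice_description'] == '[Apple]')
                     | (drinks['choice_description'] == '[Clementine]')]

print(with_isin.index.tolist() == with_or.index.tolist())   # True
print(with_isin['item_name'].tolist())                      # ['Izze', 'Nantucket Nectar']
```

The .isin() form stays readable as the list of accepted values grows.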


### Using the ampersand (&) symbol

Let's try and select rows where item_name ends in 'Nectar' and quantity is greater than 1. 

In [116]:
chipotle.loc[chipotle['item_name'].str.endswith("Nectar") & (chipotle['quantity'] > 1)] 

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
601,247,2,Nantucket Nectar,[Pineapple Orange Banana],$6.78
2379,947,2,Nantucket Nectar,[Peach Orange],$6.78


Nantucket Nectar doesn't seem to be a big seller.

## Quick Question

How could we get this same result, but only with the quantity, item_name, and item_price columns included?

In [301]:
chipotle.loc[chipotle['item_name'].str.endswith("Nectar") & (chipotle['quantity'] > 1), ['quantity', 'item_name', 'item_price']] 

Unnamed: 0,quantity,item_name,item_price
601,2,Nantucket Nectar,6.78
2379,2,Nantucket Nectar,6.78


In [304]:
chipotle.loc[chipotle['item_name'].str.endswith("Nectar") & (chipotle['quantity'] > 1) | (chipotle['quantity'] == 1), ['item_name', 'quantity', 'item_price']] 


Unnamed: 0,item_name,quantity,item_price
0,Chips and Fresh Tomato Salsa,1,2.39
1,Izze,1,3.39
2,Nantucket Nectar,1,3.39
3,Chips and Tomatillo-Green Chili Salsa,1,2.39
5,Chicken Bowl,1,10.98
6,Side of Chips,1,1.69
7,Steak Burrito,1,11.75
8,Steak Soft Tacos,1,9.25
9,Steak Burrito,1,9.25
10,Chips and Guacamole,1,4.45


## Return of the Lambda Function

.apply() is used to (you guessed it) apply custom functions to either rows or columns. You'll find it really helpful in Pandas. Now what if you wanted to look at observations with two or fewer words in the item_name? Here we use .apply() in conjunction with our very own custom lambda function.


In [310]:
# A lambda function that yields True/False values can also be used.
# Select rows where the item name has two words or fewer in it.

chipotle.loc[chipotle['item_name'].apply(lambda x: len(x.split(' ')) <= 2)].head()

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
1,1,1,Izze,[Clementine],3.39
2,1,1,Nantucket Nectar,[Apple],3.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",16.98
5,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...",10.98
7,4,1,Steak Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",11.75


## Clean Data

In [318]:
original_series = chipotle['item_price']

In [320]:
chipotle['original_series'] = original_series

In [321]:
#chipotle['item_price'] = chipotle['item_price'].apply(lambda x: x.replace('$', ''))
chipotle.head()

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price,original_series
0,1,1,Chips and Fresh Tomato Salsa,,2.39,2.39
1,1,1,Izze,[Clementine],3.39,3.39
2,1,1,Nantucket Nectar,[Apple],3.39,3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,2.39,2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",16.98,16.98


**replace()** returns a copy of the string in which occurrences of the old substring have been replaced with the new one.
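
A quick sketch of how replace() behaves on a plain Python string. The sample value `price` is made up to resemble one raw price entry:

```python
price = '$2.39 '

# replace(old, new) returns a new string; the original is left untouched
no_dollar = price.replace('$', '')

# An optional third argument caps the number of replacements
one_swap = 'a-b-c'.replace('-', '+', 1)

print(repr(price))      # '$2.39 '
print(repr(no_dollar))  # '2.39 '
print(one_swap)         # a+b-c
```

Because strings are immutable, the result has to be assigned back to a variable or column for the change to stick.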

## Quick Question

In [322]:
chipotle['item_price'][0]

'2.39 '

Remember this problem? Let's try and fix it. Write your own code similar to the line above.

In [200]:
chipotle['item_price'] = chipotle['item_price'].apply(lambda x: x.replace(' ', ''))
chipotle['item_price'][0]

'2.39'
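
replace(' ', '') removes every space in the string, which is harmless here because prices contain no internal spaces. An alternative sketch uses strip(), which only trims the ends, and then converts the text to numbers so arithmetic works (`prices` is a hypothetical stand-in for the column):

```python
import pandas as pd

# Hypothetical stand-in for the price column after the '$' has been removed
prices = pd.Series(['2.39 ', '3.39 ', '16.98 '])

cleaned = prices.str.strip()       # trims whitespace at both ends only
as_float = cleaned.astype(float)   # text -> floating-point numbers

print(cleaned[0])
print(as_float.dtype)
```

Until the column is numeric, operations like sum() or mean() will not behave like arithmetic on prices.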

## map() and filter() functions

Since we just finished reviewing lambda functions, now is a good time to go over the map() and filter() functions. The best time to use a **lambda** function is when you want simple functionality embedded anonymously in a larger expression (i.e., you don't explicitly define your function with def, as we did in the last lecture).

### The map() function

The **map()** function can be used to apply our **lambda** function to *every* element of an iterable, such as a list. See below:

In [351]:
runs_scored = [1, 0, 2, 0, 1, 4, 3, 1, 2]

In [352]:
runs_scored = list(map(lambda x: x**2, runs_scored))
runs_scored

[1, 0, 4, 0, 1, 16, 9, 1, 4]

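Called on its own, map() returns a lazy map object rather than a list. Wrapping the call in list(), as we did above, converts it. Calling list() on something that is already a list just makes a copy: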

In [340]:
list(runs_scored)

[1, 0, 4, 0, 1, 16, 9, 1, 4]

In [346]:
runs_scored2 = list(runs_scored)

In [354]:
runs_scored

[1, 0, 4, 0, 1, 16, 9, 1, 4]
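
To see the intermediate step that list() hides, here is a small sketch that keeps the map object in its own variable before converting it (`runs`, `squared`, and `as_list` are illustrative names):

```python
runs = [1, 0, 2, 0, 1, 4, 3, 1, 2]

squared = map(lambda x: x**2, runs)   # nothing is computed yet
print(type(squared).__name__)         # map

as_list = list(squared)               # forces evaluation of every element
print(as_list)                        # [1, 0, 4, 0, 1, 16, 9, 1, 4]
```

Keeping the original list under its own name, rather than overwriting it, also means the cell can be re-run without squaring the values a second time.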

## Paired Programming

Take the following list, add " still lives." to each name in it (saving the result as survivors), and print your result.

In [360]:
survivors = ['Bran', 'Arya', 'Cersei', 'Tyrion', 'Jamie', 'Jon']

In [362]:
living = map(lambda survivor: survivor + ' still lives.', survivors)
list(living)

['Bran still lives.',
 'Arya still lives.',
 'Cersei still lives.',
 'Tyrion still lives.',
 'Jamie still lives.',
 'Jon still lives.']

In [221]:
survivors = map(lambda x: x + ' still lives.', survivors)
list(survivors)

['Bran still lives.',
 'Arya still lives.',
 'Cersei still lives.',
 'Tyrion still lives.',
 'Jamie still lives.',
 'Jon still lives.']

### The filter() function

The **filter()** function gives us a way to **filter out** elements that don't meet a certain criterion (as defined by our **lambda** function). E.g.,

In [366]:
survivors = ['Bran', 'Arya', 'Cersei', 'Tyrion', 'Jamie', 'Jon']

In [364]:
survivors = filter(lambda x: len(x) <= 4, survivors)
list(survivors)

['Bran', 'Arya', 'Jon']

In [365]:
list(survivors)

[]
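
The empty list from the second call is expected. A filter object, like a map object, can only be walked through once, and the first list() call used it up. A minimal sketch that keeps the original list under its own name (`short_names` is an illustrative variable):

```python
survivors = ['Bran', 'Arya', 'Cersei', 'Tyrion', 'Jamie', 'Jon']

short_names = filter(lambda x: len(x) <= 4, survivors)
first_pass = list(short_names)    # consumes the filter object
second_pass = list(short_names)   # nothing left to hand back

print(first_pass)    # ['Bran', 'Arya', 'Jon']
print(second_pass)   # []
print(survivors)     # the source list itself is unchanged
```

Storing the result of list() once, or not reusing the name survivors for the filter object, avoids the surprise.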

## Paired Programming

How would I make it so that only the names that **start with** 'J' are returned? This relates to a question asked last week.

In [367]:
survivors = filter(lambda x: x.startswith('J'), survivors)

In [368]:
survivors

<filter at 0x1161fd860>

In [370]:
list(survivors)

[]

In [237]:
survivors = ['Bran', 'Arya', 'Cersei', 'Tyrion', 'Jamie', 'Jon']
survivors = filter(lambda x: x.startswith('J'), survivors)
list(survivors)

['Jamie', 'Jon']
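
For comparison, both paired-programming exercises can also be written as list comprehensions, which return a list directly. This is only an alternative sketch, not a replacement for the map() and filter() versions above (`living` and `j_names` are illustrative names):

```python
survivors = ['Bran', 'Arya', 'Cersei', 'Tyrion', 'Jamie', 'Jon']

# Equivalent of map(): transform every element
living = [name + ' still lives.' for name in survivors]

# Equivalent of filter(): keep only elements that pass the test
j_names = [name for name in survivors if name.startswith('J')]

print(living[0])   # Bran still lives.
print(j_names)     # ['Jamie', 'Jon']
```

Since a comprehension builds a real list, it can be printed or reused any number of times.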

# Visualizations

## Paired Programming

Taking the original dataset, I'd like you to first clean the price variable and then create a histogram with price on the x-axis and quantity on the y-axis.

In [247]:
import plotly.offline as p
import plotly.graph_objs as go
p.init_notebook_mode(connected=True)


url = 'https://raw.githubusercontent.com/TheUpshot/chipotle/master/orders.tsv'
chipotle = pd.read_csv(url, sep = '\t')   # tab (\t) separated values 
print('Variable dtypes:\n', chipotle.dtypes, sep = '')

In [None]:
chipotle.loc[chipotle['item_name'].apply(lambda x: len(x.split(' ')) <= 2)].head()

In [None]:
# Strip the dollar sign and convert to float so the prices plot as numbers, not text
chipotle['item_price'] = chipotle['item_price'].apply(lambda x: float(x.replace('$', '')))

In [375]:
graph = go.Histogram(x = chipotle.item_price, y = chipotle.quantity)

layout = go.Layout(title = 'Price by Quantity',
                   autosize = False,
                   xaxis = dict(
                       #type = 'log',
                       title = 'Item Price',
                       titlefont = dict(
                           family = 'Courier New, monospace',
                           size = 18,
                           color = '#7f7f7f')),
                   yaxis = dict(
                       title = 'Quantity',
                       titlefont = dict(
                           family = 'Courier New, monospace',
                           size = 18,
                           color = '#7f7f7f')))

graph = [graph]

fig = go.Figure(data = graph, layout = layout)
p.iplot(fig)


## Function Work –– the enumerate() function

The **enumerate() function** takes an iterable (e.g., a list or a tuple) and returns an ***enumerate object***. Iterating over that object yields (index, value) pairs, with the counter starting at 0 by default.
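
A short sketch of what enumerate() hands back, using a made-up tuple of team names (`teams` is illustrative):

```python
teams = ('Jets', 'Yankees', 'Devils')

# Converting to a list reveals the (index, value) pairs
pairs = list(enumerate(teams))
print(pairs)   # [(0, 'Jets'), (1, 'Yankees'), (2, 'Devils')]

# The optional start argument changes where the counter begins
for position, team in enumerate(teams, start=1):
    print(position, team)
```

Unpacking each pair into two loop variables, as in the for loop, is the usual way to get at the index and the value together.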

Print all pairwise **combinations** of a given list.
For example, given the list [1,2,3,4], the possible combinations are: [1,2] [1,3] [1,4] [2,3] [2,4] [3,4]

(Order doesn't matter in combinations.)



In [21]:
listtype = [1, 2, 3, 4]

In [150]:
def combinations(items):
    combos = []
    for n, item1 in enumerate(items): # Gives us first the index, then the value
        for item2 in items[n+1:]:
            combos.append([item1, item2])
    return combos

In [151]:
combinations(listtype)

[[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
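
The standard library offers the same thing in itertools.combinations, which is handy for checking a hand-written version. A sketch follows; the alias `it_combinations` is used only to avoid clashing with the function defined above:

```python
from itertools import combinations as it_combinations

listtype = [1, 2, 3, 4]

# itertools yields tuples; convert each to a list to match the output above
pairs = [list(pair) for pair in it_combinations(listtype, 2)]

print(pairs)   # [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
```

The second argument is the size of each combination, so changing 2 to 3 would give the triples instead.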

## Bonus Action (the 'yield' keyword)

In [146]:
def all_perms(elements):
    if len(elements) <= 1:
        yield elements  # Base case: zero or one element has exactly one ordering
    else:
        # Iterate over each candidate for the first element of the permutation:
        for (index, first_elmt) in enumerate(elements):
            other_elmts = elements[:index] + elements[index+1:]
            for permutation in all_perms(other_elmts): 
                yield [first_elmt] + permutation

'yield' turns the function into a generator function: calling it returns a generator that produces values one at a time rather than building a complete list.

In [147]:
permutation = all_perms(listtype)
permutation

<generator object all_perms at 0x113d3cd00>

The next() function iterates through the generator one permutation at a time. 

In [148]:
print(next(permutation))
print(next(permutation))
print(next(permutation))

[1, 2, 3, 4]
[1, 2, 4, 3]
[1, 3, 2, 4]


We call the function again to get a fresh generator, so the loop below returns all of the results rather than all minus the first three.

In [144]:
permutation = all_perms(listtype)
permutation

<generator object all_perms at 0x10acf31a8>

In [145]:
for nums in permutation:
    print(nums)

[1, 2, 3, 4]
[1, 2, 4, 3]
[1, 3, 2, 4]
[1, 3, 4, 2]
[1, 4, 2, 3]
[1, 4, 3, 2]
[2, 1, 3, 4]
[2, 1, 4, 3]
[2, 3, 1, 4]
[2, 3, 4, 1]
[2, 4, 1, 3]
[2, 4, 3, 1]
[3, 1, 2, 4]
[3, 1, 4, 2]
[3, 2, 1, 4]
[3, 2, 4, 1]
[3, 4, 1, 2]
[3, 4, 2, 1]
[4, 1, 2, 3]
[4, 1, 3, 2]
[4, 2, 1, 3]
[4, 2, 3, 1]
[4, 3, 1, 2]
[4, 3, 2, 1]
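
As a cross-check, itertools.permutations from the standard library is also lazy and produces the same 24 orderings, though as tuples rather than lists. A sketch, with an alias so the name does not clash with anything above:

```python
from itertools import permutations as it_permutations
from math import factorial

listtype = [1, 2, 3, 4]

perms = it_permutations(listtype)   # lazy iterator, nothing computed yet
first = next(perms)                 # pull a single permutation
rest = list(perms)                  # everything after the first one

print(first)                                        # (1, 2, 3, 4)
print(len(rest) + 1 == factorial(len(listtype)))    # True: 4! = 24
```

Note that `rest` holds 23 items, not 24, for the same reason as above: the call to next() already consumed the first one.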


A generator object is helpful in that it saves memory: values are produced one at a time instead of being stored all at once. We can create a generator the same way we create a list comprehension, but using parentheses rather than brackets. E.g., say we want to report back the square of each one of our list's elements. For the list comprehension, we'd get this:

In [36]:
listtype_sq = [x*x for x in listtype]
listtype_sq

[1, 4, 9, 16]

In [47]:
listtype_sq_gen = (x*x for x in listtype)
listtype_sq_gen

<generator object <genexpr> at 0x104a14200>

And so in order to reveal the contents of this variable we'll have to iterate through –– like so: 

In [48]:
for nums in listtype_sq_gen:
    print(nums)

1
4
9
16


What if I try and run it a second time?

In [49]:
for nums in listtype_sq_gen:
    print(nums)

Nothing prints because I've **exhausted** my generator object.
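
To make the memory point concrete, here is a rough sketch comparing a list with the equivalent generator expression using sys.getsizeof. The size `n` is arbitrary, and getsizeof only measures the container itself, so treat the numbers as an illustration rather than a precise benchmark:

```python
import sys

n = 100_000

squares_list = [x*x for x in range(n)]   # all values stored up front
squares_gen = (x*x for x in range(n))    # values produced on demand

# The list's container grows with n; the generator stays small
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))   # True

total = sum(squares_gen)       # consumes the generator
leftover = list(squares_gen)   # [] because it is now exhausted

# To iterate again, build a fresh generator
squares_gen = (x*x for x in range(n))
```

The trade-off is the one seen above: a generator is cheap to hold but can only be walked through once.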