# Lunch & Learn
---------------------

### Control Structures

We have spent a ton of time on the pandas-verse and are now moving on to control structures. To computer science folks, control structures are old news; for analysts and business users, however, these concepts are likely new. 

__Control Structures__ -- blocks of code that dictate the flow of control. Said differently, they are containers for a series of function calls, instructions and statements. 

-------------------
### Today's Discussion

Basic:
* Boolean Comparisons and Operators 
* if, elif, else
* for loops

Advanced:
* control structures with pandas
* range function

Putting it All Together
* Challenges using all prior concepts discussed

Further Reading
* Additional Resources
--------------------


In [1]:

# Import pandas and numpy, then load the mtcars dataset from an online CSV.
import pandas as pd
import numpy as np
df = pd.read_csv("https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv")


---------------------------
#### Boolean Comparisons and Operators

In Python, booleans take the value True or False and are typically created by comparing values. Many other control structures utilize this boolean logic to establish the rules under which they perform their operations. 

__Basic Boolean__

In [2]:
# comparing equality between objects
1 == 2

False

In [3]:
# comparing which object is greater
1 > 0

True

In [4]:
# comparing which object is less than or equal to
1 <= 0

False

In [5]:
# comparing inequality between objects
1 != 0

True

In [6]:
# comparing strings: equality
'dog' == 'cat'

False

In [7]:
# comparing strings: equality
'fast' == 'fast'

True

In [8]:
# comparing strings: inequality
'dog' != 'cat'

True

In [9]:
# Are dogs actually better than cats? (FYI, strings are compared letter by letter, and d comes later in the alphabet than c)
'dog' > 'cat'

True
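
To make the string comparison above less mysterious, here is a minimal sketch of what Python is doing: it compares strings character by character using each character's Unicode code point. The variable names are ours, chosen only for illustration.

```python
# Each character has a numeric code point; ord() reveals it.
code_c = ord("c")   # 99
code_d = ord("d")   # 100

# "dog" > "cat" is decided by the first letters: 100 > 99.
dog_vs_cat = "dog" > "cat"

# Uppercase letters have smaller code points than lowercase ones,
# so capitalization can flip the result: ord("Z") is 90, ord("a") is 97.
zebra_vs_apple = "Zebra" > "apple"

# Lowercasing both sides first gives the alphabetical answer most people expect.
zebra_vs_apple_lower = "Zebra".lower() > "apple".lower()

print(dog_vs_cat, zebra_vs_apple, zebra_vs_apple_lower)  # True False True
```

The practical takeaway is to normalize case (for example with `.lower()`) before comparing or sorting free-text values.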

__Multiple Booleans__

In [10]:
# if we need to ask multiple questions that evaluate to boolean with AND
(1.1 > 1.0) & ('dog' == 'dog')

True

In [11]:
# if we need to ask multiple questions that evaluate to boolean with AND
(1.1 > 1.0) & ('dog' == 'cat')

False

In [12]:
# if we need to ask multiple questions that evaluate to boolean with OR
(1.1 > 1.0) | ('dog' == 'cat')

True
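
The cells above combine conditions with `&` and `|`. Python also has the keywords `and` and `or`, which are the more common choice when each side is a single True/False value. The sketch below compares the two styles; the variable names are illustrative only.

```python
a = 1.1 > 1.0        # True
b = "dog" == "cat"   # False

# `and` / `or` are the logical operators for single True/False values.
both = a and b       # False
either = a or b      # True

# `&` / `|` bind more tightly than comparisons, so each comparison
# needs its own parentheses, exactly as in the cells above.
with_parens = (1.1 > 1.0) & ("dog" == "dog")   # True

# `and` / `or` short-circuit: the right side is skipped when the left side
# already decides the answer, so the division by zero below never runs.
safe = False and (1 / 0 > 1)   # False, no ZeroDivisionError

print(both, either, with_parens, safe)  # False True True False
```

As a rule of thumb, `and` / `or` suit plain if statements, while `&` / `|` are what pandas expects when combining whole columns of booleans.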

---------------------------------
#### if / elif / else Statements


> __if__ -- use an if statement to run code only when a particular condition holds. If the condition is True, the indented code beneath it runs; otherwise it is skipped.


In [13]:
if 1 == 1.0:
    print("They are the same")

They are the same


In [14]:
# 1 > 200 is False, so the print below never runs and this cell produces no output
if 1 > 200:
    print("1 is greater than 200")

> __elif__ -- short for "else if". An elif statement runs its code only if its condition holds, the same as an if statement, but it is checked only when every preceding if / elif condition in the chain was False.

In [15]:
x = 1

if x < 0:
    print("x is less than 0")
elif x > 1:
    print("x is greater than 1")
elif x == 1:
    print("x is 1")

x is 1


It is important to note that the order in which we ask the boolean questions matters: once a value satisfies one condition, the later elif statements in the chain are never evaluated. 

For example, the cell below shows that if x = 1 and the first condition is x != 0, that condition is satisfied and the control structure stops evaluating the remaining branches.

In [16]:
x = 1

if x != 0:
    print("x is not 0")
elif x > 1:
    print("x is greater than 1")
elif x == 1:
    print("x is 1")

x is not 0
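
One way to avoid the surprise above is to order the conditions from most specific to most general. The sketch below wraps the chain in a small helper function, `describe`, which is a hypothetical name introduced here only so the same chain can be tried on several values.

```python
def describe(x):
    """Return a description of x, testing the most specific condition first."""
    if x == 1:
        return "x is 1"
    elif x > 1:
        return "x is greater than 1"
    elif x != 0:
        # Only reached for values that are neither 1, greater than 1, nor 0.
        return "x is not 0"
    else:
        return "x is 0"

for value in [1, 5, -2, 0]:
    print(value, "->", describe(value))
```

With this ordering, x = 1 reaches the `x == 1` branch instead of being caught by the broader `x != 0` test.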


> __else__ -- follows an if statement (and any elif statements) and contains code that runs when all previous conditions evaluate to False. An else does not include a conditional statement.

In [17]:
x = np.nan

if x < 0:
    print("x is less than 0")
elif x > 1:
    print("x is greater than 1")
elif x == 1:
    print("x is 1")
else:
    print("I don't know what x is!")

I don't know what x is!


The above series of control structures highlights an important thing to note in healthcare data science: NULL values. The comparisons <, > and == against np.nan all evaluate to False, which is why the code fell through to the else branch. With shoddy data running amok, you will become incredibly familiar with missing data, and it will often break your most basic control structure assumptions. 

However, do not fret: as you familiarize yourself with live data through a Pythonic lens, you will become accustomed to dealing with NULL values.
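
A common way to handle this is to test for missing values explicitly before any other comparison. The sketch below uses `pd.isna`, which returns True for `np.nan` and `None`; the `result` variable and the messages are illustrative.

```python
import numpy as np
import pandas as pd

x = np.nan

# Ordered and equality comparisons involving NaN are all False,
# and NaN is not even equal to itself.
print(x < 0, x > 1, x == 1, x == x)   # False False False False

# So check for missing values first, then ask the other questions.
if pd.isna(x):
    result = "x is missing"
elif x < 0:
    result = "x is less than 0"
else:
    result = "x is 0 or greater"

print(result)   # x is missing
```

Putting the missing-value check first keeps NaN from silently falling into a catch-all else branch.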

-------------------------------
#### for Loops

> __for loops__ -- a control structure used for iterating over a sequence (list, tuple, dictionary, set, string, numpy array, pandas dataframes)


__for Loop with list__

In [18]:
animals = ["dog", "cat", "hamster", "snake", "newt"]

In [19]:
for i in animals:
    print(i)

dog
cat
hamster
snake
newt


__for Loop with dictionary__

In [20]:
animals = {"animal1": "dog", 
           "animal2": "cat", 
           "animal3": "hamster", 
           "animal4": "snake", 
           "animal5": "newt"}

In [21]:
for key in animals.keys():
    print(key)

animal1
animal2
animal3
animal4
animal5


In [22]:
for value in animals.values():
    print(value)

dog
cat
hamster
snake
newt


In [23]:
for key, value in animals.items():
    print(key, value)

animal1 dog
animal2 cat
animal3 hamster
animal4 snake
animal5 newt


__for Loop with string__

In [24]:
dog_name = "Scout"

In [25]:
for letter in dog_name:
    print(letter)

S
c
o
u
t


__for Loop with pandas dataframe__

In [26]:
df.head()

Unnamed: 0,model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
0,Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
1,Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
2,Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
3,Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
4,Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2


In [27]:
for i in df.model:
    print(i.upper())

MAZDA RX4
MAZDA RX4 WAG
DATSUN 710
HORNET 4 DRIVE
HORNET SPORTABOUT
VALIANT
DUSTER 360
MERC 240D
MERC 230
MERC 280
MERC 280C
MERC 450SE
MERC 450SL
MERC 450SLC
CADILLAC FLEETWOOD
LINCOLN CONTINENTAL
CHRYSLER IMPERIAL
FIAT 128
HONDA CIVIC
TOYOTA COROLLA
TOYOTA CORONA
DODGE CHALLENGER
AMC JAVELIN
CAMARO Z28
PONTIAC FIREBIRD
FIAT X1-9
PORSCHE 914-2
LOTUS EUROPA
FORD PANTERA L
FERRARI DINO
MASERATI BORA
VOLVO 142E


__for Loop with pandas dataframe using index range__

In [28]:
for i in range(0, len(df)):
    print(i)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31


In [29]:
for i in range(0, len(df)):
    print(f"Miles per 100 gallons: {df.loc[i,'mpg']*100}") 

Miles per 100 gallons: 2100.0
Miles per 100 gallons: 2100.0
Miles per 100 gallons: 2280.0
Miles per 100 gallons: 2140.0
Miles per 100 gallons: 1870.0
Miles per 100 gallons: 1810.0000000000002
Miles per 100 gallons: 1430.0
Miles per 100 gallons: 2440.0
Miles per 100 gallons: 2280.0
Miles per 100 gallons: 1920.0
Miles per 100 gallons: 1780.0
Miles per 100 gallons: 1639.9999999999998
Miles per 100 gallons: 1730.0
Miles per 100 gallons: 1520.0
Miles per 100 gallons: 1040.0
Miles per 100 gallons: 1040.0
Miles per 100 gallons: 1470.0
Miles per 100 gallons: 3240.0
Miles per 100 gallons: 3040.0
Miles per 100 gallons: 3390.0
Miles per 100 gallons: 2150.0
Miles per 100 gallons: 1550.0
Miles per 100 gallons: 1520.0
Miles per 100 gallons: 1330.0
Miles per 100 gallons: 1920.0
Miles per 100 gallons: 2730.0
Miles per 100 gallons: 2600.0
Miles per 100 gallons: 3040.0
Miles per 100 gallons: 1580.0
Miles per 100 gallons: 1970.0
Miles per 100 gallons: 1500.0
Miles per 100 gallons: 2140.0


#### Putting it all together

We've learned boolean operators, if/elif/else statements and for loops. Now, let's put it all together.

> __Challenge:__ Given our mtcars dataset, can we:
>> 1. Uppercase each car name that starts with M and lowercase the rest
>> 2. Calculate horsepower / weight for all 6 or 8 cyl vehicles, horsepower * weight for all 5 cyl vehicles and np.nan for all other cyls. Save the new column as 'nonsense'

In [30]:
df.head()

Unnamed: 0,model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
0,Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
1,Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
2,Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
3,Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
4,Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2


In [31]:
# Challenge 1:

for i in range(0,len(df)):
    if df.loc[i,'model'].startswith("M"):
        df.loc[i,'model'] = df.loc[i,'model'].upper()
    else:
        df.loc[i,'model'] = df.loc[i,'model'].lower()

In [32]:
df

Unnamed: 0,model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
0,MAZDA RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
1,MAZDA RX4 WAG,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
2,datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
3,hornet 4 drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
4,hornet sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
5,valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
6,duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
7,MERC 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
8,MERC 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
9,MERC 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4


In [33]:
# Challenge 2:

df['nonsense'] = np.nan # instantiate a float column by broadcasting NaN, so the float results below fit without a dtype change

for i in range(0,len(df)):
    if (df.loc[i,'cyl'] == 6) | (df.loc[i,'cyl'] == 8):
        df.loc[i,'nonsense'] = df.loc[i,'hp'] / df.loc[i,'wt']
    elif df.loc[i,'cyl'] == 5:
        df.loc[i,'nonsense'] = df.loc[i,'hp'] * df.loc[i,'wt']
    else:
        df.loc[i,'nonsense'] = np.nan

In [34]:
df

Unnamed: 0,model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb,nonsense
0,MAZDA RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4,41.984733
1,MAZDA RX4 WAG,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4,38.26087
2,datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1,
3,hornet 4 drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1,34.214619
4,hornet sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2,50.872093
5,valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1,30.346821
6,duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4,68.627451
7,MERC 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2,
8,MERC 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2,
9,MERC 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4,35.755814


--------------------
### Further Resources

Control Structure Tutorials
* conditional statements -- https://realpython.com/python-conditional-statements/
* if/elif/else -- https://www.datacamp.com/community/tutorials/python-if-elif-else
* loops -- https://www.datacamp.com/community/tutorials/loops-python-tutorial

Control Structure CheatSheets
* https://intellipaat.com/blog/tutorial/python-tutorial/data-structures-with-python-cheat-sheet/

