# Conditionals in Python
### by [Jason DeBacker](https://jasondebacker.com), September 2019

This Jupyter Notebook outlines the use of conditional statements in Python.

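It covers:

* `if`, `else`, and `elif` statements
* `and` and `or` conditions
* the `any()` and `all()` methods for NumPy arrays
* indicator (dummy) variables and masking
* multiple conditions, including for creating pandas columns
* the `try` statement
* `is` versus `==`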

## If statements

These are simple - let's just see some examples to understand the syntax in Python:

In [9]:
number = 4
if number == 4:  # end the condition with a colon; use == (not =) to test equality
    print('Number equals 4')  # indent the body under the conditional statement

Number equals 4


In [2]:
# try an else condition
if number > 4:
    print('Number greater than 4')
else:
    print('Number not greater than 4')

Number not greater than 4


In [3]:
# try an else-if condition
if number > 4:
    print('Number greater than 4')
elif number == 4:  # notice the keyword is elif rather than elseif
    print('Number equals 4')
else:
    print('Number is less than 4')

Number equals 4


In [10]:
# test multiple conditions (with &, each comparison needs its own parentheses)
if (number > 3) & (number < 10):
    print('Number is in (3, 10)')

Number is in (3, 10)


In [7]:
# test with or condition
number = 11
if (number < 3) | (number > 10):
    print('Number is not in (3, 10)')

Number is not in (3, 10)


In [8]:
# can also use the keywords "and" and "or" for single values
# (but not for element-wise tests on NumPy arrays or pandas Series)
if (number < 3) or (number > 10):
    print('Number is not in (3, 10)')

Number is not in (3, 10)
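The parentheses in the `&` and `|` examples above matter because `&` and `|` bind more tightly than comparison operators. The following is a small sketch (the value 12 is chosen only for illustration) that compares the parenthesized form, the keyword form, a chained comparison, and the form without parentheses.

```python
number = 12  # illustrative value outside the interval (3, 10)

# Three equivalent ways to test 3 < number < 10 for a single value
with_parens = (number > 3) & (number < 10)   # True & False -> False
with_keywords = number > 3 and number < 10   # False
chained = 3 < number < 10                    # False

# Without parentheses, & is evaluated first:
# number > (3 & number) < 10  ->  12 > 0 < 10  ->  True
without_parens = number > 3 & number < 10

print(with_parens, with_keywords, chained, without_parens)
# prints: False False False True
```

For single values, the keyword or chained forms are usually the easiest to read and avoid this precedence trap.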


In [33]:
# with booleans
test_run = False
if test_run:
    print('this is a test run')
else:
    print('this is not a test run')

this is not a test run


## Conditionals with Numpy arrays

Conditional statements can be very useful for testing properties of Numpy arrays or for creating arrays of indicator values.  Let's see how conditionals work on Numpy arrays.

In [17]:
import numpy as np
a = np.random.randn(5, 5)  # generate a 5x5 array of draws from the standard normal distribution
a

array([[ 0.46590533, -0.46929017,  1.08241086, -0.06899815,  0.1724926 ],
       [-1.3765027 , -1.3819083 ,  1.3671285 ,  0.18686497,  1.08951608],
       [ 1.16748088, -0.80493869, -0.98400941, -0.59034945,  0.70256022],
       [ 0.89179762,  1.15780931, -0.99985068,  1.10524979,  0.63942274],
       [ 0.53457973,  1.13923449,  0.49819905,  1.0892756 ,  0.75325164]])

In [19]:
# create a boolean array holding the result of testing each element of a
b = a > 1
b

array([[False, False,  True, False, False],
       [False, False,  True, False,  True],
       [ True, False, False, False, False],
       [False,  True, False,  True, False],
       [False,  True, False,  True, False]], dtype=bool)

In [20]:
# multiplying by the boolean array zeros out the elements that fail the test
a * b

array([[ 0.        , -0.        ,  1.08241086, -0.        ,  0.        ],
       [-0.        , -0.        ,  1.3671285 ,  0.        ,  1.08951608],
       [ 1.16748088, -0.        , -0.        , -0.        ,  0.        ],
       [ 0.        ,  1.15780931, -0.        ,  1.10524979,  0.        ],
       [ 0.        ,  1.13923449,  0.        ,  1.0892756 ,  0.        ]])
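Multiplying by a boolean array is one way to mask values. A few alternatives are sketched below on a small, made-up array: boolean indexing, converting the mask to a 0/1 indicator, and `np.where`. The array values are hypothetical and chosen so the results are easy to check by eye.

```python
import numpy as np

a = np.array([[0.5, 1.5],
              [-2.0, 3.0]])  # small hypothetical array
mask = a > 1                 # boolean array, True where the test passes

# boolean indexing returns a 1-D array of the elements that pass the test
selected = a[mask]           # array([1.5, 3.0])

# convert the mask to a 0/1 indicator (dummy) array
indicator = mask.astype(int)  # [[0, 1], [0, 1]]

# keep values where the mask is True and use 0.0 elsewhere
masked = np.where(mask, a, 0.0)  # [[0.0, 1.5], [0.0, 3.0]]

print(selected)
print(indicator)
print(masked)
```

Unlike `a * b`, `np.where` fills the failing positions with exactly the value you supply, so negative entries do not show up as `-0.`.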

In [21]:
# an if statement on a whole NumPy array raises an error
if a > 1:
    print("a is greater than one")

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [22]:
if (a > 1).any():
    print("some element of a is greater than one")

some element of a is greater than one


In [23]:
if (a > 1).all():
    print("all elements of a are greater than one")
else:
    print("not all elements of a are greater than one")

not all elements of a are greater than one
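The same ambiguity explains why the keywords `and` and `or` cannot combine conditions on arrays, while `&` and `|` can. Here is a minimal sketch on a made-up one-dimensional array; the variable names are only for illustration.

```python
import numpy as np

a = np.array([-0.5, 0.2, 0.8, 1.4])  # hypothetical values

# element-wise "and" / "or": parentheses around each comparison are required
between_0_and_1 = (a > 0) & (a < 1)   # [False, True, True, False]
outside_0_to_1 = (a < 0) | (a > 1)    # [True, False, False, True]

# the keyword form asks for the truth value of a whole array and fails
try:
    (a > 0) and (a < 1)
    keyword_worked = True
except ValueError:
    keyword_worked = False

if between_0_and_1.any():
    print("some element of a is strictly between 0 and 1")
print("the keyword form worked:", keyword_worked)
```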


## Conditionals in Pandas

Conditional statements can be very useful in Pandas for creating indicator variables or for creating subsets of your data.  

Let's create an indicator variable.

In [26]:
import pandas as pd

# create a dictionary with some data
data = {'school': ['Texas', 'Texas', 'Texas', 'UGA', 'UGA'],
        'year': [2014, 2015, 2016, 2015, 2016],
        'wins': [6, 5, 5, 10, 8]}

# create a DataFrame from the dictionary 
frame = pd.DataFrame(data)

# create a new indicator variable
frame['winning season'] = frame['wins'] > 6
frame

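|   | school | wins | year | winning season |
|---|--------|------|------|----------------|
| 0 | Texas | 6 | 2014 | False |
| 1 | Texas | 5 | 2015 | False |
| 2 | Texas | 5 | 2016 | False |
| 3 | UGA | 10 | 2015 | True |
| 4 | UGA | 8 | 2016 | True |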


In [29]:
# an indicator variable with multiple conditions
frame['winning 2016'] = (frame['wins'] > 6) & (frame['year'] == 2016)  # note the parentheses around each condition
frame

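|   | school | wins | year | winning season | winning 2016 |
|---|--------|------|------|----------------|--------------|
| 0 | Texas | 6 | 2014 | False | False |
| 1 | Texas | 5 | 2015 | False | False |
| 2 | Texas | 5 | 2016 | False | False |
| 3 | UGA | 10 | 2015 | True | False |
| 4 | UGA | 8 | 2016 | True | True |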
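Indicator columns do not have to be boolean. The sketch below rebuilds the same data and shows two variations: a 0/1 dummy via `astype(int)`, and a labeled column built from several conditions with `np.select`. The column names and the "strong"/"average"/"weak" cutoffs are made up for this example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'school': ['Texas', 'Texas', 'Texas', 'UGA', 'UGA'],
                      'year': [2014, 2015, 2016, 2015, 2016],
                      'wins': [6, 5, 5, 10, 8]})

# a 0/1 dummy variable instead of True/False
frame['winning dummy'] = (frame['wins'] > 6).astype(int)

# several conditions mapped to labels; the first condition that holds wins,
# and rows matching no condition get the default
conditions = [frame['wins'] >= 8, frame['wins'] >= 6]
labels = ['strong', 'average']  # hypothetical categories
frame['season type'] = np.select(conditions, labels, default='weak')

print(frame)
```

Because `np.select` checks the conditions in order, the broader test (`wins >= 6`) goes after the narrower one (`wins >= 8`).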


Now let's use conditions to slice our data.

In [30]:
texas = frame[frame['school'] == 'Texas']
texas

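|   | school | wins | year | winning season | winning 2016 |
|---|--------|------|------|----------------|--------------|
| 0 | Texas | 6 | 2014 | False | False |
| 1 | Texas | 5 | 2015 | False | False |
| 2 | Texas | 5 | 2016 | False | False |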


In [32]:
# slice with multiple conditions
texas_last2 = frame[(frame['school'] == 'Texas') & (frame['year'] > 2014)]
texas_last2

|   | school | wins | year | winning season | winning 2016 |
|---|--------|------|------|----------------|--------------|
| 1 | Texas | 5 | 2015 | False | False |
| 2 | Texas | 5 | 2016 | False | False |
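When the conditions get long, it can help to name each one before combining them. The following sketch rebuilds the same data and shows named masks with `.loc`, negation with `~`, and membership tests with `isin`. The variable names are illustrative only.

```python
import pandas as pd

frame = pd.DataFrame({'school': ['Texas', 'Texas', 'Texas', 'UGA', 'UGA'],
                      'year': [2014, 2015, 2016, 2015, 2016],
                      'wins': [6, 5, 5, 10, 8]})

# name each condition, then combine them
is_texas = frame['school'] == 'Texas'
after_2014 = frame['year'] > 2014

# .loc selects rows by condition and columns by label in one step
texas_recent = frame.loc[is_texas & after_2014, ['year', 'wins']]

# ~ negates a condition
not_texas = frame[~is_texas]

# isin tests membership in a list of values
selected_years = frame[frame['year'].isin([2014, 2016])]

print(texas_recent)
print(not_texas)
print(selected_years)
```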


## The try statement

The try statement works as follows.

* First, the try clause (the statement(s) between the try and except keywords) is executed.
* If no exception occurs, the except clause is skipped and execution of the try statement is finished.
* If an exception occurs during execution of the try clause, the rest of the clause is skipped. Then if its type matches the exception named after the except keyword, the except clause is executed, and then execution continues after the try statement.
* If an exception occurs which does not match the exception named in the except clause, it is passed on to outer try statements; if no handler is found, it is an unhandled exception and execution stops with a message as shown above.
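The steps above can be traced with a small sketch. The function `safe_divide` is a made-up helper for illustration: it handles `ZeroDivisionError` itself, while any other exception type passes through to an outer `try` statement.

```python
def safe_divide(numerator, denominator):
    """Hypothetical helper: divide, returning None on division by zero."""
    try:
        result = numerator / denominator
        print('division succeeded')  # skipped if the line above raises
    except ZeroDivisionError:
        # runs only when the exception type matches
        print('cannot divide by zero')
        result = None
    return result


ok = safe_divide(1, 2)       # no exception: the except clause is skipped
handled = safe_divide(1, 0)  # ZeroDivisionError: handled inside the function

# a TypeError does not match the inner handler, so it is passed outward
propagated = False
try:
    safe_divide('1', 2)
except TypeError:
    print('TypeError reached the outer try statement')
    propagated = True
```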

This statement can be useful in cases such as:
* You might have a pre-computed object saved to disk: `try` lets you load it if it exists and recreate it if it does not.
* A function in your code might return something unexpected (e.g., `NaN` values) and you want to alert yourself or other users to this.

And more. 

Let's see this in action by trying to load a csv file that isn't there.

In [38]:
try:
    df = pd.read_csv('mydata.csv')
except FileNotFoundError:  # catch only the missing-file error, not every possible exception
    print("Can't find the data file")

Can't find the data file
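The "load it if it exists, otherwise recreate it" use case mentioned earlier might look like the sketch below. The function name `load_or_create`, the file name, and the stand-in computation are all hypothetical. The example writes to a temporary directory so that it can be run anywhere.

```python
import os
import pickle
import tempfile


def load_or_create(path):
    """Hypothetical helper: load a pickled object, or build and save it."""
    try:
        with open(path, 'rb') as f:
            return pickle.load(f), 'loaded'
    except FileNotFoundError:
        # stand-in for an expensive computation
        result = [x ** 2 for x in range(5)]
        with open(path, 'wb') as f:
            pickle.dump(result, f)
        return result, 'created'


with tempfile.TemporaryDirectory() as tmpdir:
    cache_path = os.path.join(tmpdir, 'squares.pkl')
    first, first_status = load_or_create(cache_path)    # file is missing
    second, second_status = load_or_create(cache_path)  # file now exists

print(first_status, second_status)
# prints: created loaded
```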


## Is vs ==

One needs to be mindful of a subtlety of Python conditionals. At first, it looks like there are two ways to test for the same thing: using `==` and using `is`.  For instance:

In [34]:
number = 4
# test a condition
if number == 4:
    print('Number equals 4')

# another way to test the condition
# (Python 3.8+ emits a SyntaxWarning when `is` is used with a literal)
if number is 4:
    print('Number is 4')

Number equals 4
Number is 4


But `==` and `is` actually test two different things.  `==` tests for equality.  `is` tests for *identity*.  That is, `is` tests whether two names (i.e., variables) point to the same object.  In the above, `number` is equal to 4, and `number` also happens to point to the very object that represents 4 (CPython caches small integers, so this is an implementation detail rather than a guarantee).  Let's see this break down.

In [1]:
a = [1, 2, 3]
b = a
c = [1, 2, 3]

# test that a and c are equal
if a == c:
    print('a equals c')


# test that a and c point to the same object
if a is c:
    print('a is c')
else:
    print('a is not c')
# test that a and b point to the same object
if a is b:
    print('a is b')

a equals c
a is not c
a is b


So although `a`, `b`, and `c` all hold the same list of numbers, only `a` and `b` point to the same object.  `c` points to an equal, but distinct, object.
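One practical consequence of identity is that changing a list through one name changes it for every name that points to the same object. The sketch below checks this with the built-in `id()` function and also shows the most common legitimate use of `is`, which is testing for `None`. The variable `result` is just a placeholder for illustration.

```python
a = [1, 2, 3]
b = a          # b points to the same list object as a
c = [1, 2, 3]  # c is an equal but separate list

# id() returns an object's identity; `x is y` is the same as id(x) == id(y)
same_id_ab = id(a) == id(b)  # True
same_id_ac = id(a) == id(c)  # False

# modifying the list through b also changes what a sees; c is unaffected
b.append(4)
print(a)  # [1, 2, 3, 4]
print(c)  # [1, 2, 3]

# `is` is the idiomatic way to test for None, since there is only one None object
result = None  # placeholder value
if result is None:
    print('no result yet')
```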