In [1]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

# Lecture 13 Notebook

In this lecture we introduce more complex boolean expressions, conditionals, and for loops.


## Boolean expressions

We have already seen basic boolean expressions:

In [2]:
3 > 1

True

In [3]:
type(3 > 1)

bool

Recall that a single `=` is **assignment**. Thus the following is an error:

```python
3 = 3.0
```

Equality:

In [5]:
3 == 3.0

True

Inequality: 

In [6]:
10 != 2

True

Using variables in boolean expressions:

In [7]:
x = 14
y = 3

In [8]:
x > 15

False

In [9]:
12 < x

True

In [10]:
x < 20

True

Compound boolean expressions:
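
As a minimal sketch, assuming the values `x = 14` and `y = 3` assigned above, comparisons can be combined with Python's `and`, `or`, and `not` operators. The variable names below are chosen only for illustration.

```python
x = 14
y = 3

# `and` is True only when both comparisons are True
both = x > 10 and y > 10           # False, because y > 10 is False

# `or` is True when at least one comparison is True
either = x > 10 or y > 10          # True, because x > 10 is True

# `not` flips a boolean value
neither = not (x > 10 or y > 10)   # False

# Comparisons can also be chained
in_range = 12 < x < 20             # True

print(both, either, neither, in_range)
```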

<br><br>

---

<center> return to slides </center>

---

<br>

## Boolean Expressions with Arrays

Just as arrays can be used in mathematical expressions, we can also apply boolean operations to arrays. The operations are applied element-wise.

In [None]:
pets = make_array('cat', 'cat', 'dog', 'cat', 'dog', 'rabbit')
pets

In [None]:
pets == 'cat'

How many cats?

Math with booleans
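
Both prompts above rely on the fact that `True` counts as 1 and `False` counts as 0 in arithmetic. The sketch below uses `np.array` in place of `make_array` (an assumption made so that it needs only NumPy) and hypothetical names such as `is_cat`.

```python
import numpy as np

# np.array stands in for make_array so the sketch needs only NumPy
pets = np.array(['cat', 'cat', 'dog', 'cat', 'dog', 'rabbit'])

is_cat = pets == 'cat'        # element-wise comparison gives a boolean array
num_cats = np.sum(is_cat)     # True counts as 1, False as 0
frac_cats = np.mean(is_cat)   # proportion of True values

print(is_cat)
print(num_cats, frac_cats)
print(True + True)            # plain booleans add like integers
```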

<br><br>

---

<center> return to slides </center>

---

<br>

<br><br>

---
## Rows & Apply

Just as we can access individual columns in a table, we can also access individual rows.

In [21]:
survey = Table.read_table('welcome_survey_sp23.csv')
survey.show(3)

| Year | Extraversion | Number of textees | Hours of sleep | Handedness | Pant leg | Sleep position | Pets | Piercings |
|---|---|---|---|---|---|---|---|---|
| Second Year | 2 | 5 | 9 | Right-handed | Right leg in first | On your right side | Cat, Dog, Fish, Snake, Lizard | -3 |
| First Year | 2 | 3 | 8 | Right-handed | I don't know | On your back | | -1 |
| First Year | 5 | 5 | 8 | Right-handed | Right leg in first | On your left side | Bearded dragon | 0 |


Getting a field from a row

Getting an item from a row

### Math On Rows

Suppose we get a row that contains only numbers:

In [None]:
r2 = survey.select("Extraversion", "Number of textees", "Hours of sleep").row(2)
r2

We can apply aggregation functions like sum to that row
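
A row can be treated as a sequence of its values, so functions that work on sequences work on it too. The sketch below uses a plain tuple as a stand-in for `r2` (an assumption made to keep it self-contained), with the values from the third survey row shown earlier.

```python
import numpy as np

# Plain tuple standing in for r2:
# (Extraversion, Number of textees, Hours of sleep)
r2_values = (5, 5, 8)

total = sum(r2_values)         # built-in sum adds the items
largest = max(r2_values)       # largest item in the row
average = np.mean(r2_values)   # NumPy aggregations also accept sequences

print(total, largest, average)
```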

Recall that to **apply** a function to every row of a table, we use `apply`:

In [None]:
(
    survey
    .select("Extraversion", "Number of textees", "Hours of sleep")
    .apply(sum)
)


Let's use this insight to improve our pivot table:

In [None]:
p = survey.pivot("Sleep position", "Hours of sleep")
p.show()

<br><br><br>
**Exercise:** Add the row totals to the table:

<details><summary>Click for Solution</summary>

```python
p.with_column("Total", p.drop("Hours of sleep").apply(np.sum)).show()
```

</details>

<br><br>

---

<center> return to slides </center>

---

<br><br>

<br><br><br><br>

---

## Conditional Statements

Conditional statements in Python allow us to do different things based on the values in our data.

In [None]:
x = 20

If the value of x is greater than or equal to 18 then print 'You can legally vote.'

In [14]:
if x >= 18:
    print('You can legally vote.')

Conditionals consist of two main parts, a boolean expression and an indented body:

```python

if boolean expression here :
    # body of the if statement goes here and must be indented
```

Notice that if the boolean expression is `False`, then the body of the if statement is not executed. The output below comes from a run with `x = 14`:

In [15]:
print("Can you vote?")

if x >= 18:
    print('You can legally vote.')

print("This is run")
print("The value of x is", x)

Can you vote?
This is run
The value of x is 14


Sometimes you want to do something else if the first statement wasn't true:

In [None]:
if x >= 18:
    print('You can legally vote and drive.')
elif x >= 17:
    print('You can legally drive.')
else:
    print('You can legally drink milk.')

Implementing a function with conditionals (this version prints a message in each branch rather than returning a value):

In [16]:
def age(x):
    if x >= 18:
        print('You can legally vote and drive.')
    elif x >= 17:
        print('You can legally drive.')
    else:
        print('You can legally drink milk.')

In [17]:
age(3)

You can legally drink milk.


In [18]:
age(20)

You can legally vote and drive.


In [19]:
age(23)

You can legally vote and drive.
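
The `age` function above prints its message, so the result cannot be stored or passed to `apply`. A hedged sketch of a variant with one `return` per branch follows. The name `age_message` is hypothetical and not part of the original notebook.

```python
def age_message(x):
    """Hypothetical variant of age() that returns the message instead of printing it."""
    if x >= 18:
        return 'You can legally vote and drive.'
    elif x >= 17:
        return 'You can legally drive.'
    else:
        return 'You can legally drink milk.'

# The returned string can be stored in a variable and reused
message = age_message(17)
print(message)
```

Only one `return` runs per call: the first branch whose condition is `True` ends the function.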


<br><br><br>
### Putting the pieces together

Here we will build a function that returns whether a trip was one way or a round trip:

In [None]:
trip = Table.read_table('trip.csv')
trip.show(3)

In [None]:
def trip_kind(start, end):
    if start == end:
        return 'round trip'
    else:
        return 'one way'

In [None]:
kinds = trip.with_column('Trip Kind', 
                         trip.apply(trip_kind, 'Start Station', 'End Station'))
kinds.show(3)

Pivoting on Trip Kind:

In [None]:
kinds_pivot = (
    kinds
    .where('Duration', are.below(600))
    .pivot('Trip Kind', 'Start Station')
    .sort("round trip", descending=True)
    .take(np.arange(10))
)
kinds_pivot

<br><br>

---

<center> return to slides </center>

---

<br><br>

<br><br><br><br>

---

## Simulation

We will use simulation heavily in this class. A key element of simulation is leveraging randomness. The NumPy library has many functions for generating random events. Today we will use the `np.random.choice` function:

In [None]:
mornings = make_array('wake up', 'sleep in')

In [None]:
np.random.choice(mornings)

In [None]:
np.random.choice(mornings)

In [None]:
np.random.choice(mornings)

We can also pass an argument that specifies how many times to make a random choice:

In [None]:
np.random.choice(mornings, 7)

In [None]:
np.random.choice(mornings, 7)

In [None]:
morning_week = np.random.choice(mornings, 7)
morning_week

In [None]:
sum(morning_week == 'wake up')

In [None]:
sum(morning_week == 'sleep in')

In [None]:
np.mean(morning_week == 'sleep in')

<br><br> 
### Playing a Game of Chance

Let's play a game: we each roll a die. 

If my number is bigger: you pay me a dollar.

If they're the same: we do nothing.

If your number is bigger: I pay you a dollar.

Steps:
1. Find a way to simulate two dice rolls.
2. Compute how much money we win/lose based on the result.
3. Do steps 1 and 2 10,000 times.

### Simulating the roll of a die

In [None]:
die_faces = np.arange(1, 7)
die_faces

In [None]:
np.random.choice(die_faces)

<br><br><br><br>
**Exercise:** Implement a function that simulates a single round of play and returns the result.

<br>
<details><summary>Click for Solution</summary><br>
    
```python
def simulate_one_round():
    my_roll = np.random.choice(die_faces)
    your_roll = np.random.choice(die_faces)

    if my_roll > your_roll:
        return 1
    elif my_roll < your_roll:
        return -1
    else:
        return 0
```
    <br>
</details>

In [None]:
simulate_one_round()

<br><br>

---

<center> return to slides </center>

---

<br><br>

## `For` Statements

The for statement is another way to apply code to each element in a list or an array.

In [None]:
for pet in make_array('cat', 'dog', 'rabbit'):
    print('I love my ' + pet)

**Exercise:** What is the output of this for loop?

In [None]:
x = 0
for i in np.arange(1, 4):
    x = x + i
    print(x)

print("The final value of x is:", x)
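
<details><summary>Click for Solution</summary>

One way to check the answer is to record the running total at each step. The list `running_totals` below is a hypothetical helper added for illustration and is not part of the original loop.

```python
import numpy as np

x = 0
running_totals = []           # hypothetical helper list, not in the original loop

for i in np.arange(1, 4):     # i takes the values 1, 2, 3
    x = x + i                 # x becomes 1, then 3, then 6
    running_totals.append(x)

print(running_totals)
print("The final value of x is:", x)
```

</details>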

<br><br>
**Exercise:** Use a for loop to simulate 10,000 plays of our game of chance and calculate the total winnings:

<details><summary> Click for Solution</summary>
    
```python
N = 10_000
winnings = 0

for i in np.arange(N):
    winnings = winnings + simulate_one_round()
    
print("I win", winnings, "dollars.")
```
</details>

<br><br><br>
**Bonus Exercise:** Use table functions to simulate 10,000 rounds of play:

In [None]:
N = 10_000
rolls = Table().<func_name>(
    "my roll", <>,
    "your roll", <>
)

my_roll = rolls.column("my roll")
your_roll = rolls.column("your roll")
outcome = ...

# Put outcome back as a column in the rolls table


<br><details><summary> Click for Solution</summary><br>
    
```python
N = 10_000
rolls = Table().with_columns(
    "my roll", np.random.choice(die_faces, N),
    "your roll", np.random.choice(die_faces, N)
)

my_roll = rolls.column("my roll")
your_roll = rolls.column("your roll")
outcome = 1*(my_roll > your_roll) + -1*(my_roll < your_roll)

rolls = rolls.with_column("outcome", outcome)
rolls
```
</details>

In [None]:
print("My total winnings:", rolls.column("outcome").sum())
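
The `outcome` expression in the solution works because comparing two arrays gives a boolean array, and multiplying by `1` or `-1` turns each `True` into a payoff. The sketch below uses fixed rolls chosen for illustration rather than random ones. The `np.sign` line is an assumed alternative, not the notebook's method.

```python
import numpy as np

# Fixed rolls chosen for illustration (not random)
my_roll = np.array([6, 2, 4, 1])
your_roll = np.array([3, 5, 4, 1])

# +1 where I win, -1 where you win, 0 for ties
outcome = 1 * (my_roll > your_roll) + -1 * (my_roll < your_roll)

# Assumed alternative: the sign of the difference gives the same payoffs
outcome_sign = np.sign(my_roll - your_roll)

print(outcome)
print(outcome.sum())
```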

<br><br>

---

<center> return to slides </center>

---

<br><br>

<br><br><br>

---

## Appending Arrays

Sometimes we will want to collect the outcomes of our simulations into a single array. We can do this by appending the result of each experiment to the end of an array using NumPy's `np.append` function.

In [None]:
first = np.arange(4)
second = np.arange(10, 17)

In [None]:
np.append(first, 6)

In [None]:
first

In [None]:
np.append(first, second)
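
A point worth noting from the cells above is that `np.append` returns a new array and leaves its arguments unchanged, which is why the simulation loops reassign the result. A small self-contained sketch, with `longer` and `combined` as hypothetical names:

```python
import numpy as np

first = np.arange(4)                  # 0, 1, 2, 3
second = np.arange(10, 17)            # 10, 11, ..., 16

longer = np.append(first, 6)          # new array with 6 added at the end
combined = np.append(first, second)   # new array: first followed by second

print(longer)
print(first)                          # unchanged by the calls above
print(len(combined))
```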

**Exercise:** Use append to record the outcomes of all the games rather than just the total.

<details><summary>Click for Solution</summary>
    
```python
N = 10_000

game_outcomes = make_array()

for i in np.arange(N):
    game_outcomes = np.append(game_outcomes, simulate_one_round())
    
game_outcomes
```

</details>

<br><br><br><br>

### Another example: simulating heads in 100 coin tosses

Suppose we simulate 100 coin tosses. What fraction will be heads? What if we simulate 100 coin tosses thousands of times? How will the number of heads vary from one simulation to the next?

In [None]:
coin = make_array('heads', 'tails')

In [None]:
sum(np.random.choice(coin, 100) == 'heads')

In [None]:
# Simulate one outcome

def num_heads():
    return sum(np.random.choice(coin, 100) == 'heads')

In [None]:
# Decide how many times you want to repeat the experiment

repetitions = 10000

In [None]:
# Simulate that many outcomes

outcomes = make_array()

for i in np.arange(repetitions):
    outcomes = np.append(outcomes, num_heads())

In [None]:
heads = Table().with_column('Heads', outcomes)
heads.hist(bins = np.arange(29.5, 70.6))

<br><br><br><br>

--- 
## Optional: Advanced `where`

Sometimes the `are.above_or_equal_to` style of predicate is cumbersome to use. We can instead construct an array of booleans to select rows from our table. This allows us to select rows based on complex boolean expressions spanning multiple columns.

In [None]:
ages = make_array(16, 22, 18, 15, 19, 39, 27, 21)
patients = Table().with_columns("Patient Id", np.arange(len(ages)) + 1000, "Age", ages)
patients

**Exercise:** Find all the patients who are older than 21 or have a Patient Id that is even:

To compute the even patient ids, we can use the `%` modulus operator:
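
A minimal sketch of one possible approach, working directly on NumPy arrays with the same values as the `patients` table. The names `is_even`, `is_older`, and `keep` are hypothetical.

```python
import numpy as np

ages = np.array([16, 22, 18, 15, 19, 39, 27, 21])
patient_ids = np.arange(len(ages)) + 1000

is_even = patient_ids % 2 == 0   # remainder 0 after dividing by 2
is_older = ages > 21

# `|` is the element-wise "or" for boolean arrays (`&` is element-wise "and")
keep = is_older | is_even

print(patient_ids[keep])
print(ages[keep])
# Assumption: with the table above, patients.where(keep) should select the same rows
```

Note that `or` and `and` do not work element-wise on arrays, which is why the sketch uses `|`.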