<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Review Python iteration, control flow, and functions

_Author: Kiefer Katovich (SF) and Dave Yerrington (SF)_

---




### Learning Objectives
 
- Explore `Python` control flow and conditional programming.
- Implement `for` and `while` loops to iterate through data structures.
- Apply `if`/`else` conditional statements.
- Create functions to perform repetitive actions.
- Demonstrate error handling using `try`/`except` statements.
- Combine control flow and conditional statements to solve the classic "FizzBuzz" coding challenge.
- Use `Python` control flow and functions to parse, clean, edit, and analyze the Coffee Preferences dataset.

---
### Lesson Guide

- [If, Else Statements](#if_else_statements)
- [Iterating With For Loops](#for_loops)
- [FizzBuzz](#fizz_buzz)
- [Functions](#functions)
- [While Loops](#while_loops)
- [Practice control flow on Coffee Preference dataset](#coffee_preference)


In [1]:
import numpy as np

<a id='if_else_statements'></a>

# If, Else Statements

---

### 1. Write an if-else statement to check whether the suitcase weighs over 50 lb.

Print a message indicating whether or not the suitcase is over 50 lb.

In [2]:
weight = float(input("How many pounds does your suitcase weigh? "))

How many pounds does your suitcase weigh? 70


In [1]:
# A:
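One way to write it. To keep the cell runnable without typing into `input()`, this sketch hard-codes the value entered above (70); in the notebook you would use the `weight` variable directly.

```python
# Assumption: stand-in for the value read with input() in the previous cell
weight = 70.0

if weight > 50:
    message = "Your suitcase is over 50 lb."
else:
    message = "Your suitcase is 50 lb or less."

print(message)
```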

---

### 2. Write an if/elif/else statement to handle multiple conditions.

Print out these recommendations based on the weather conditions:

1. The temperature is higher than 60 degrees and it is raining: Bring an umbrella.
2. The temperature is lower than or equal to 60 degrees and it is raining: Bring an umbrella and a jacket.
3. The temperature is higher than 60 degrees and the sun is shining: Wear a T-shirt.
4. The temperature is lower than or equal to 60 degrees and the sun is shining: Bring a jacket.

In [4]:
temperature = float(input('What is the temperature? '))
weather = input('What is the weather? (rain or shine) ')  # Python 3; use raw_input() in Python 2

What is the temperature? 100
What is the weather? (rain or shine) shine


In [2]:
# A:
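A possible answer, wrapped in a function so each of the four cases can be checked without re-running `input()`. The final `else` branch for answers other than `'rain'` or `'shine'` is an addition that the exercise does not ask for.

```python
def recommend(temperature, weather):
    """Return advice for a temperature in degrees and weather of 'rain' or 'shine'."""
    if weather == 'rain' and temperature > 60:
        return 'Bring an umbrella.'
    elif weather == 'rain':
        # Reaching this branch means temperature <= 60
        return 'Bring an umbrella and a jacket.'
    elif weather == 'shine' and temperature > 60:
        return 'Wear a T-shirt.'
    elif weather == 'shine':
        return 'Bring a jacket.'
    else:
        # Hypothetical fallback for any other answer
        return "Please answer 'rain' or 'shine'."

# The values entered above were 100 and 'shine'
print(recommend(100.0, 'shine'))
```

Ordering the `elif` branches this way means each `temperature <= 60` case only needs to test the weather, because the `> 60` case for that weather was already handled.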

---
<a id='for_loops'></a>
# For Loops

---
### 3. Write a `for`-loop that iterates from the number 1 to the number 15.

On each iteration, print out the number.

In [3]:
# A:
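A minimal sketch. `range()` stops *before* its second argument, so reaching 15 requires `range(1, 16)`.

```python
# range(1, 16) yields 1, 2, ..., 15
for number in range(1, 16):
    print(number)
```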

---

### 4. Iterate from 1 to 15, printing whether the number is odd or even.

Hint: The modulus operator, `%`, can be used to take the remainder. For example:

```python
9 % 5 == 4
```

In other words, the remainder of dividing 9 by 5 is 4.

In [4]:
# A:
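One approach using the modulus hint: a number is even exactly when dividing it by 2 leaves no remainder.

```python
for number in range(1, 16):
    # number % 2 is 0 for even numbers and 1 for odd numbers
    if number % 2 == 0:
        parity = 'even'
    else:
        parity = 'odd'
    print(number, 'is', parity)
```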

---
<a id='fizz_buzz'></a>
### 5. Iterate from 1 to 30 with the following instructions:

1. If a number is divisible by 3, print 'fizz'.
2. If a number is divisible by 5, print 'buzz'.
3. If a number is divisible by both 3 and 5, print 'fizzbuzz'.
4. Otherwise, print just the number.

In [5]:
# A:
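A sketch of one common solution. The order of the checks matters: a number divisible by both 3 and 5 is also divisible by 3, so the combined case (divisible by 15) has to be tested first. The `results` list is only kept so the output can be inspected afterwards.

```python
results = []
for number in range(1, 31):
    # Test the combined case first, otherwise 15 and 30 would print 'fizz'
    if number % 15 == 0:
        output = 'fizzbuzz'
    elif number % 3 == 0:
        output = 'fizz'
    elif number % 5 == 0:
        output = 'buzz'
    else:
        output = number
    results.append(output)
    print(output)
```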

---

### 6. Iterate through the following list of animals, and print each one in all caps.

In [9]:
animals = ['duck', 'rat', 'boar', 'slug', 'mammoth', 'gazelle']

In [6]:
# A:
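A minimal version: the string method `.upper()` returns an all-caps copy of each name without changing the list itself.

```python
animals = ['duck', 'rat', 'boar', 'slug', 'mammoth', 'gazelle']

for animal in animals:
    print(animal.upper())
```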

---

### 7. Iterate through the animals list. Capitalize the first letter and append the modified animals to a new list.

In [7]:
# A:
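One way to do it, using `str.capitalize()`, which uppercases the first character and lowercases the rest:

```python
animals = ['duck', 'rat', 'boar', 'slug', 'mammoth', 'gazelle']

capitalized_animals = []
for animal in animals:
    capitalized_animals.append(animal.capitalize())

print(capitalized_animals)
```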

---

### 8. Iterate through the animals. Print out the animal name and the number of vowels in the name.
Hint: You may need to create a variable holding the vowels to compare each letter against.

In [8]:
# A:
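A sketch with a nested loop: the outer loop walks the animals, the inner loop walks the letters of each name. It assumes only a, e, i, o, u count as vowels (not y), and stores the counts in a dictionary so they can be reused.

```python
animals = ['duck', 'rat', 'boar', 'slug', 'mammoth', 'gazelle']
vowels = 'aeiou'  # assumption: 'y' is not treated as a vowel

vowel_counts = {}
for animal in animals:
    count = 0
    for letter in animal:
        if letter in vowels:
            count += 1
    vowel_counts[animal] = count
    print(animal, count)
```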

---
<a id='functions'></a>
# Functions
---

### 9. Write a function that takes a word as an argument and returns the number of vowels in the word.

Try it out on three words.

In [9]:
# A:
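A possible function, tried on three example words chosen here for illustration. Lowercasing the word first means capital vowels are counted too; 'y' is again assumed not to be a vowel.

```python
def count_vowels(word):
    """Return the number of vowels (a, e, i, o, u) in word, ignoring case."""
    count = 0
    for letter in word.lower():
        if letter in 'aeiou':
            count += 1
    return count

for word in ['python', 'banana', 'Queue']:
    print(word, count_vowels(word))
```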

---

### 10. Write a function to calculate the area of a triangle using its base (width) and height.

Test it out.

In [14]:
# A:
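The area of a triangle is half the base times the height, which translates directly into a one-line function:

```python
def triangle_area(base, height):
    """Return the area of a triangle: half of base times height."""
    return 0.5 * base * height

print(triangle_area(4, 3))
print(triangle_area(10, 5))
```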

---
<a id='while_loops'></a>
# While Loops
---

### 11. While loops and strings.

Iterate over the following sentence repeatedly, counting the number of vowels in the sentence until you have tallied one million. Print out the number of iterations it took to reach that amount.

In [2]:
sentence = "A MAN KNOCKED ON MY DOOR AND ASKED FOR A SMALL DONATION TOWARDS THE LOCAL SWIMMING POOL SO I GAVE HIM A GLASS OF WATER"

In [10]:
# A:
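A sketch of one reading of the exercise: each iteration is one full pass over the sentence, and the `while` condition is checked after every pass, so the loop stops on the first pass that brings the tally to one million or more. It assumes 'Y' is not counted as a vowel.

```python
sentence = "A MAN KNOCKED ON MY DOOR AND ASKED FOR A SMALL DONATION TOWARDS THE LOCAL SWIMMING POOL SO I GAVE HIM A GLASS OF WATER"
vowels = 'AEIOU'  # the sentence is all caps; 'Y' is not counted

total_vowels = 0
iterations = 0
while total_vowels < 1000000:
    # One full pass over the sentence
    for letter in sentence:
        if letter in vowels:
            total_vowels += 1
    iterations += 1

print(iterations, 'passes,', total_vowels, 'vowels')
```

Counting the vowels once and dividing would be much faster; the loop is kept here because the exercise is about `while`.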

---

### 12. Use `try`/`except` to convert elements in a list to floats.

Create a new list with the converted numbers. If something cannot be converted, skip it and append nothing to the new list.

In [18]:
corrupted = ['!1', '23.1', '23.4.5', '??12', '.12', '12-12', '-11.1', '0-1', '*12.1', '1000']

In [11]:
# A:
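One approach: `float()` raises a `ValueError` when a string is not a valid number, so wrapping the conversion in `try`/`except` lets the loop skip bad entries and keep going.

```python
corrupted = ['!1', '23.1', '23.4.5', '??12', '.12', '12-12', '-11.1', '0-1', '*12.1', '1000']

converted = []
for item in corrupted:
    try:
        converted.append(float(item))
    except ValueError:
        # Not a valid number (e.g. '!1' or '23.4.5'): append nothing
        pass

print(converted)
```

Catching `ValueError` specifically, rather than using a bare `except:`, avoids silently hiding unrelated bugs.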

---
<a id='coffee_preference'></a>

# Practice control flow on Coffee Preference dataset

### 13. Load the coffee preference data from a file and print it

The code to load in the data is provided below. 

The `with open(..., 'r') as f:` opens up a file in "read" mode (rather than "write"), and assigns this opened file to `f`. 

We can then use the file object's `.readlines()` method, which reads the CSV file and returns a list with one string per line (each string still ending in its newline character), and assign that list to the variable `lines`.

In [3]:
with open('datasets/coffee-preferences.csv','r') as f:
    lines = f.readlines()

#### Iterate through lines and print them out

In [12]:
# A:
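Because `.readlines()` keeps the `'\n'` at the end of each string, a plain `print(line)` adds a second line break and double-spaces the output. A small sketch that avoids this with `end=''`; the two sample strings stand in for the real `lines`, and their exact formatting is an assumption based on the columns shown later in this notebook.

```python
# Hypothetical stand-in for `lines`; in the notebook these come from the CSV file
lines = [
    'Timestamp,Name,Starbucks,PhilzCoffee,BlueBottleCoffee,PeetsTea,CaffeTrieste,GrandCoffee,RitualCoffee,FourBarrel,WorkshopCafe\n',
    '3/17/2015 18:37:58,Alison,3,5,4,3,,,5,5,\n',
]

for line in lines:
    # end='' stops print() from adding its own newline on top of the one in `line`
    print(line, end='')
```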

#### Display the lines object by typing `lines` in a cell and running it.

In [13]:
# A:

---

### 14. Remove the remaining newline `'\n'` characters with a for-loop.

Iterate through the lines of the data and remove the unwanted newline characters.

**.replace('\n', '')** is a built-in string method that takes the substring you want to replace as its first argument and the replacement string as its second.

In [14]:
# A:
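A sketch of the loop. Strings are immutable, so `.replace()` returns a new string; the cleaned strings are collected in a new list. The sample `lines` below is a hypothetical stand-in transcribed from the `df.set_index('Name')` output further down, with the raw file's formatting assumed.

```python
# Hypothetical stand-in for `lines` as read from the CSV file
lines = [
    'Timestamp,Name,Starbucks,PhilzCoffee,BlueBottleCoffee,PeetsTea,CaffeTrieste,GrandCoffee,RitualCoffee,FourBarrel,WorkshopCafe\n',
    '3/17/2015 18:37:58,Alison,3,5,4,3,,,5,5,\n',
    '3/17/2015 18:38:09,April,4,5,5,3,,,3,,5\n',
]

clean_lines = []
for line in lines:
    # Replace every '\n' in the string with nothing
    clean_lines.append(line.replace('\n', ''))
lines = clean_lines

print(lines)
```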

---

### 15. Split the lines into "header" and "data" variables.

The header is the first string in the list of strings. It contains the column names of our data.

In [15]:
# A:
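List indexing and slicing do the split: index `0` is the header, and the slice `[1:]` is every remaining line. The sample `lines` is a hypothetical stand-in for the newline-free list from step 14.

```python
# Hypothetical stand-in for the cleaned `lines`
lines = [
    'Timestamp,Name,Starbucks,PhilzCoffee,BlueBottleCoffee,PeetsTea,CaffeTrieste,GrandCoffee,RitualCoffee,FourBarrel,WorkshopCafe',
    '3/17/2015 18:37:58,Alison,3,5,4,3,,,5,5,',
    '3/17/2015 18:38:09,April,4,5,5,3,,,3,,5',
]

header = lines[0]   # the column names
data = lines[1:]    # one string per survey response

print(header)
print(len(data), 'rows of data')
```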

---

### 16. Split the header and the data strings on commas.

To split a string on the comma character, use the built-in string method **`.split(',')`**.

Split the header on commas, then print it. You can see that the original string is now a list containing items that were originally separated by commas.

In [16]:
# A:
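A sketch, again on a small hypothetical stand-in for `header` and `data`. Note that two commas in a row, or a trailing comma, produce an empty string `''` in the result; those empty strings are the missing ratings handled in step 18.

```python
# Hypothetical stand-ins for the header and data strings from step 15
header = 'Timestamp,Name,Starbucks,PhilzCoffee,BlueBottleCoffee,PeetsTea,CaffeTrieste,GrandCoffee,RitualCoffee,FourBarrel,WorkshopCafe'
data = [
    '3/17/2015 18:37:58,Alison,3,5,4,3,,,5,5,',
    '3/17/2015 18:38:09,April,4,5,5,3,,,3,,5',
]

header = header.split(',')

split_data = []
for row in data:
    # ',,' and a trailing ',' become empty strings ''
    split_data.append(row.split(','))
data = split_data

print(header)
print(data[0])
```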

---

### 17. Remove the "Timestamp" column.

We aren't interested in the "Timestamp" column in our data, so remove it from the header and the data list.

Removing the Timestamp from the header can be done with list methods or with slicing. To remove the Timestamp field from each row of the data, use a for-loop.

Print out the new data object with the timestamps removed.

In [17]:
# A:
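One way, using slicing for both. Timestamp is the first item of the header and of every row, so `[1:]` drops it. The sample lists are hypothetical stand-ins for the split `header` and `data` from step 16.

```python
# Hypothetical stand-ins for the split header and data
header = ['Timestamp', 'Name', 'Starbucks', 'PhilzCoffee', 'BlueBottleCoffee', 'PeetsTea',
          'CaffeTrieste', 'GrandCoffee', 'RitualCoffee', 'FourBarrel', 'WorkshopCafe']
data = [
    ['3/17/2015 18:37:58', 'Alison', '3', '5', '4', '3', '', '', '5', '5', ''],
    ['3/17/2015 18:38:09', 'April', '4', '5', '5', '3', '', '', '3', '', '5'],
]

# Drop 'Timestamp' from the header
header = header[1:]

# Drop the timestamp (first field) from every row
trimmed_data = []
for row in data:
    trimmed_data.append(row[1:])
data = trimmed_data

print(header)
print(data)
```

The list-method alternative for the header is `header.remove('Timestamp')`, which changes the list in place.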

---

### 18. Convert numeric columns to floats and empty fields to `None`.

Iterate through the data, and construct a new data list of lists that contains the numeric ratings converted from strings into floats and the empty fields (which are empty strings '') replaced with the None object.

Use a nested for loop (a for loop within another for loop) to get the job done. You will likely need to use if-else conditional statements as well.

Print out the new data object to make sure you've succeeded.

In [28]:
# A:
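A sketch of the nested loop: the outer loop visits each person's row, the inner loop visits each rating in that row. The name (first field) is copied over unchanged. The sample `data` is a hypothetical stand-in, transcribed from the `df.set_index('Name')` output further down.

```python
# Hypothetical stand-in for `data` after removing timestamps
data = [
    ['Alison', '3', '5', '4', '3', '', '', '5', '5', ''],
    ['April', '4', '5', '5', '3', '', '', '3', '', '5'],
    ['Vijay', '3', '5', '5', '5', '3', '2', '1', '1', '1'],
]

converted_data = []
for row in data:
    new_row = [row[0]]  # keep the name as a string
    for value in row[1:]:
        if value == '':
            new_row.append(None)         # empty field: no rating
        else:
            new_row.append(float(value))
    converted_data.append(new_row)
data = converted_data

print(data)
```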

---

### 19. Count the `None` values per person, and put counts in a dictionary.

Use a for loop to count the number of `None` values per person. Create a dictionary with the names of the people as keys, and the counts of `None` as values.

Who rated the most coffee brands? Who rated the least?

In [18]:
# A:
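A sketch on three converted rows (a hypothetical stand-in for the full `data`). Fewer `None` values means the person rated more brands, so the person with the *lowest* count rated the most. `min()` and `max()` with `key=none_counts.get` compare the dictionary's values; on a tie they return the first name encountered.

```python
# Hypothetical stand-in for the converted data from step 18
data = [
    ['Alison', 3.0, 5.0, 4.0, 3.0, None, None, 5.0, 5.0, None],
    ['April', 4.0, 5.0, 5.0, 3.0, None, None, 3.0, None, 5.0],
    ['Vijay', 3.0, 5.0, 5.0, 5.0, 3.0, 2.0, 1.0, 1.0, 1.0],
]

none_counts = {}
for row in data:
    count = 0
    for rating in row[1:]:
        if rating is None:
            count += 1
    none_counts[row[0]] = count

print(none_counts)

rated_most = min(none_counts, key=none_counts.get)   # fewest None values
rated_least = max(none_counts, key=none_counts.get)  # most None values
print('Rated the most brands:', rated_most)
print('Rated the fewest brands:', rated_least)
```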

---

### 20. Calculate average rating per coffee brand.

**Excluding `None` values**, calculate the average rating per brand of coffee.

The final output should be a dictionary with keys as the coffee brand names, and their average rating as the values.

Remember that average can be calculated as the sum of the ratings over the number of ratings:

```python
average_rating = float(sum(ratings_list))/len(ratings_list)
```

Print your dictionary to see the average brand ratings.

In [19]:
# A:
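A sketch that walks the data one column at a time. `enumerate(header[1:], start=1)` pairs each brand with its position in the rows (position 0 is the name). The `None` guard for a brand nobody rated is an addition, since dividing by `len([])` would raise `ZeroDivisionError`. The `header` and `data` are hypothetical stand-ins for the notebook's variables.

```python
# Hypothetical stand-ins for the header and converted data
header = ['Name', 'Starbucks', 'PhilzCoffee', 'BlueBottleCoffee', 'PeetsTea',
          'CaffeTrieste', 'GrandCoffee', 'RitualCoffee', 'FourBarrel', 'WorkshopCafe']
data = [
    ['Alison', 3.0, 5.0, 4.0, 3.0, None, None, 5.0, 5.0, None],
    ['April', 4.0, 5.0, 5.0, 3.0, None, None, 3.0, None, 5.0],
    ['Vijay', 3.0, 5.0, 5.0, 5.0, 3.0, 2.0, 1.0, 1.0, 1.0],
]

average_ratings = {}
for column, brand in enumerate(header[1:], start=1):
    ratings_list = []
    for row in data:
        if row[column] is not None:  # exclude missing ratings
            ratings_list.append(row[column])
    if ratings_list:
        average_ratings[brand] = float(sum(ratings_list)) / len(ratings_list)
    else:
        average_ratings[brand] = None  # nobody rated this brand

print(average_ratings)
```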

---

### 21. Create a list containing only the people's names.

In [20]:
# A:
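The name is the first field of every row, so one loop collects them. The sample `data` is a hypothetical stand-in for the notebook's converted data.

```python
# Hypothetical stand-in for the converted data
data = [
    ['Alison', 3.0, 5.0, 4.0, 3.0, None, None, 5.0, 5.0, None],
    ['April', 4.0, 5.0, 5.0, 3.0, None, None, 3.0, None, 5.0],
    ['Vijay', 3.0, 5.0, 5.0, 5.0, 3.0, 2.0, 1.0, 1.0, 1.0],
]

people = []
for row in data:
    people.append(row[0])

print(people)
```

The same list can be built in one line with a list comprehension: `[row[0] for row in data]`.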

---

### 22. Picking a name at random. What are the odds of choosing the same name three times in a row?

Now we'll use a while-loop to "brute force" the odds of choosing the same name 3 times in a row randomly from the list of names.

Below I've imported the **`random`** module, which provides the key function for this code: **`random.choice()`**.
The function takes a list as an argument, and returns one of the elements of that list at random.

In [33]:
import random
# Choose a random person from the list of people:
# random.choice(people)

Write a function that randomly chooses a person from the list three times and checks whether all three choices are the same.

Define a function that has the following properties:

1. Takes a list (your list of names) as an argument.
2. Selects a name using `random.choice(people)` three separate times.
3. Returns `True` if the name was the same all three times. Otherwise returns `False`.

In [21]:
# A:
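A minimal version of the function described above. The name `same_three_times` is chosen here for illustration. Python's chained comparison `a == b == c` is true only when all three values are equal.

```python
import random

def same_three_times(people):
    """Draw a name three times with random.choice; return True if all three match."""
    first = random.choice(people)
    second = random.choice(people)
    third = random.choice(people)
    return first == second == third

print(same_three_times(['Alison', 'April', 'Vijay']))
```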

---

### 23. Construct a while loop to run the choosing function until it returns True.

Run the function until you draw the same person three times using a while-loop. Keep track of how many tries it took and print out the number of tries after it runs.

In [22]:
# A:
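A sketch of the while loop, repeating the hypothetical `same_three_times` function so the cell runs on its own and using the 20 names that `df.Name.unique()` lists later in this notebook. The counter starts at 1 because the call that finally returns `True` is also a try. For intuition: with n names, all three draws match with probability n · (1/n)³ = 1/n², so with 20 names that is 1 in 400, and the loop needs 400 tries on average, though any single run can vary widely.

```python
import random

def same_three_times(people):
    """Return True if three random draws from people are all the same name."""
    return random.choice(people) == random.choice(people) == random.choice(people)

people = ['Alison', 'April', 'Vijay', 'Vanessa', 'Isabel', 'India', 'Dave H',
          'Deepthi', 'Ramesh', 'Hugh Jass', 'Alex', 'Ajay Anand',
          'David Feng', 'Zach', 'Matt', 'Markus', 'Otto', 'Alessandro',
          'Rocky', 'cheong-tseng eng']

tries = 1
while not same_three_times(people):
    tries += 1

print('It took', tries, 'tries to draw the same name three times in a row.')
```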

In [None]:
##### Class 2

In [9]:
import os
import numpy as np
import pandas as pd

pd.set_option('display.max_rows', 10)
pd.set_option('display.notebook_repr_html', True)
pd.set_option('display.max_columns', 10)

In [10]:
df = pd.read_csv('classes/02/datasets/dataset-02-bistro.csv')

In [12]:
df = pd.read_csv("datasets/coffee-preferences.csv")

In [13]:
df.columns

Index([u'Timestamp', u'Name', u'Starbucks', u'PhilzCoffee',
       u'BlueBottleCoffee', u'PeetsTea', u'CaffeTrieste', u'GrandCoffee',
       u'RitualCoffee', u'FourBarrel', u'WorkshopCafe'],
      dtype='object')

In [7]:
df

,Timestamp,Name,Starbucks,PhilzCoffee,BlueBottleCoffee,...,CaffeTrieste,GrandCoffee,RitualCoffee,FourBarrel,WorkshopCafe
0,3/17/2015 18:37:58,Alison,3,5,4.0,...,,,5.0,5.0,
1,3/17/2015 18:38:09,April,4,5,5.0,...,,,3.0,,5.0
2,3/17/2015 18:38:25,Vijay,3,5,5.0,...,3.0,2.0,1.0,1.0,1.0
3,3/17/2015 18:38:28,Vanessa,1,5,5.0,...,,,3.0,2.0,3.0
4,3/17/2015 18:38:46,Isabel,1,4,4.0,...,4.0,,4.0,4.0,
...,...,...,...,...,...,...,...,...,...,...,...
15,3/17/2015 18:40:49,Markus,3,5,,...,,,4.0,,
16,3/17/2015 18:41:18,Otto,4,2,2.0,...,,,3.0,3.0,3.0
17,3/17/2015 18:41:23,Alessandro,1,5,3.0,...,,,4.0,3.0,
18,3/17/2015 18:41:35,Rocky,3,5,4.0,...,3.0,3.0,4.0,4.0,3.0


In [12]:
type(df)

pandas.core.frame.DataFrame

In [13]:
df.head()


,Timestamp,Name,Starbucks,PhilzCoffee,BlueBottleCoffee,...,CaffeTrieste,GrandCoffee,RitualCoffee,FourBarrel,WorkshopCafe
0,3/17/2015 18:37:58,Alison,3,5,4.0,...,,,5.0,5.0,
1,3/17/2015 18:38:09,April,4,5,5.0,...,,,3.0,,5.0
2,3/17/2015 18:38:25,Vijay,3,5,5.0,...,3.0,2.0,1.0,1.0,1.0
3,3/17/2015 18:38:28,Vanessa,1,5,5.0,...,,,3.0,2.0,3.0
4,3/17/2015 18:38:46,Isabel,1,4,4.0,...,4.0,,4.0,4.0,


In [14]:
df.tail()

,Timestamp,Name,Starbucks,PhilzCoffee,BlueBottleCoffee,...,CaffeTrieste,GrandCoffee,RitualCoffee,FourBarrel,WorkshopCafe
15,3/17/2015 18:40:49,Markus,3,5,,...,,,4.0,,
16,3/17/2015 18:41:18,Otto,4,2,2.0,...,,,3.0,3.0,3.0
17,3/17/2015 18:41:23,Alessandro,1,5,3.0,...,,,4.0,3.0,
18,3/17/2015 18:41:35,Rocky,3,5,4.0,...,3.0,3.0,4.0,4.0,3.0
19,3/17/2015 18:42:01,cheong-tseng eng,3,1,,...,,,4.0,,


In [19]:
df.set_index('Name')

Name,Timestamp,Starbucks,PhilzCoffee,BlueBottleCoffee,PeetsTea,CaffeTrieste,GrandCoffee,RitualCoffee,FourBarrel,WorkshopCafe
Alison,3/17/2015 18:37:58,3,5,4.0,3.0,,,5.0,5.0,
April,3/17/2015 18:38:09,4,5,5.0,3.0,,,3.0,,5.0
Vijay,3/17/2015 18:38:25,3,5,5.0,5.0,3.0,2.0,1.0,1.0,1.0
Vanessa,3/17/2015 18:38:28,1,5,5.0,2.0,,,3.0,2.0,3.0
Isabel,3/17/2015 18:38:46,1,4,4.0,2.0,4.0,,4.0,4.0,
...,...,...,...,...,...,...,...,...,...,...
Markus,3/17/2015 18:40:49,3,5,,3.0,,,4.0,,
Otto,3/17/2015 18:41:18,4,2,2.0,5.0,,,3.0,3.0,3.0
Alessandro,3/17/2015 18:41:23,1,5,3.0,2.0,,,4.0,3.0,
Rocky,3/17/2015 18:41:35,3,5,4.0,3.0,3.0,3.0,4.0,4.0,3.0


In [23]:
# Replace NaN with 0 (fillna returns a new DataFrame; df itself is unchanged)

df.fillna(0)


,Timestamp,Name,Starbucks,PhilzCoffee,BlueBottleCoffee,...,CaffeTrieste,GrandCoffee,RitualCoffee,FourBarrel,WorkshopCafe
0,3/17/2015 18:37:58,Alison,3,5,4.0,...,0.0,0.0,5.0,5.0,0.0
1,3/17/2015 18:38:09,April,4,5,5.0,...,0.0,0.0,3.0,0.0,5.0
2,3/17/2015 18:38:25,Vijay,3,5,5.0,...,3.0,2.0,1.0,1.0,1.0
3,3/17/2015 18:38:28,Vanessa,1,5,5.0,...,0.0,0.0,3.0,2.0,3.0
4,3/17/2015 18:38:46,Isabel,1,4,4.0,...,4.0,0.0,4.0,4.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...
15,3/17/2015 18:40:49,Markus,3,5,0.0,...,0.0,0.0,4.0,0.0,0.0
16,3/17/2015 18:41:18,Otto,4,2,2.0,...,0.0,0.0,3.0,3.0,3.0
17,3/17/2015 18:41:23,Alessandro,1,5,3.0,...,0.0,0.0,4.0,3.0,0.0
18,3/17/2015 18:41:35,Rocky,3,5,4.0,...,3.0,3.0,4.0,4.0,3.0


In [19]:
# Sum each column (axis=0); the text columns Timestamp and Name get concatenated, not added

df_columns_sum = df.sum(axis=0)
df_columns_sum

Timestamp           3/17/2015 18:37:583/17/2015 18:38:093/17/2015 ...
Name                AlisonAprilVijayVanessaIsabelIndiaDave HDeepth...
Starbucks                                                          57
PhilzCoffee                                                        85
BlueBottleCoffee                                                   56
                                          ...                        
CaffeTrieste                                                       27
GrandCoffee                                                        10
RitualCoffee                                                       49
FourBarrel                                                         33
WorkshopCafe                                                       35
Length: 11, dtype: object
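The output above shows Timestamp and Name glued together as long strings. Passing `numeric_only=True` to `DataFrame.sum` restricts the sum to numeric columns, and missing values are skipped by default. A sketch on a small stand-in DataFrame with three rows and a few columns copied from the outputs above:

```python
import pandas as pd

# Small stand-in for df, using values shown in the notebook's output
df = pd.DataFrame({
    'Timestamp': ['3/17/2015 18:37:58', '3/17/2015 18:38:09', '3/17/2015 18:38:25'],
    'Name': ['Alison', 'April', 'Vijay'],
    'Starbucks': [3, 4, 3],
    'PhilzCoffee': [5, 5, 5],
    'CaffeTrieste': [float('nan'), float('nan'), 3.0],
})

# Only numeric columns are summed; NaN is skipped (skipna=True by default)
column_sums = df.sum(numeric_only=True)
print(column_sums)
```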

In [15]:
#Print column names

df.columns

Index([u'Timestamp', u'Name', u'Starbucks', u'PhilzCoffee',
       u'BlueBottleCoffee', u'PeetsTea', u'CaffeTrieste', u'GrandCoffee',
       u'RitualCoffee', u'FourBarrel', u'WorkshopCafe'],
      dtype='object')

In [14]:
# Print the Timestamp column

df['Timestamp']

0     3/17/2015 18:37:58
1     3/17/2015 18:38:09
2     3/17/2015 18:38:25
3     3/17/2015 18:38:28
4     3/17/2015 18:38:46
             ...        
15    3/17/2015 18:40:49
16    3/17/2015 18:41:18
17    3/17/2015 18:41:23
18    3/17/2015 18:41:35
19    3/17/2015 18:42:01
Name: Timestamp, Length: 20, dtype: object

In [17]:
# Who is in the dataset?

df.Name.unique()

array(['Alison', 'April', 'Vijay', 'Vanessa', 'Isabel', 'India', 'Dave H',
       'Deepthi', 'Ramesh', 'Hugh Jass', 'Alex', 'Ajay Anand',
       'David Feng', 'Zach', 'Matt', 'Markus', 'Otto', 'Alessandro',
       'Rocky', 'cheong-tseng eng'], dtype=object)

In [20]:
# When were the survey responses submitted?
df.Timestamp.unique()

array(['3/17/2015 18:37:58', '3/17/2015 18:38:09', '3/17/2015 18:38:25',
       '3/17/2015 18:38:28', '3/17/2015 18:38:46', '3/17/2015 18:39:01',
       '3/17/2015 18:39:05', '3/17/2015 18:39:14', '3/17/2015 18:39:23',
       '3/17/2015 18:39:30', '3/17/2015 18:39:35', '3/17/2015 18:39:42',
       '3/17/2015 18:40:44', '3/17/2015 18:40:49', '3/17/2015 18:41:18',
       '3/17/2015 18:41:23', '3/17/2015 18:41:35', '3/17/2015 18:42:01'], dtype=object)

In [38]:
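# How many total rating points did each person (e.g., Alison) give?

name_index = df.set_index('Name')

# sum() skips NaN by default; numeric_only=True leaves out the Timestamp text column
name_totals = name_index.sum(axis=1, numeric_only=True)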
name_totals

Name
Alison              25.0
April               25.0
Vijay               26.0
Vanessa             21.0
Isabel              23.0
                    ... 
Markus              15.0
Otto                22.0
Alessandro          18.0
Rocky               32.0
cheong-tseng eng     8.0
Length: 20, dtype: float64

In [44]:
# What was the average score for each person? groupby() alone only builds a GroupBy object

df.groupby(['Name'])

<pandas.core.groupby.DataFrameGroupBy object at 0x0000000009B3DA20>
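The cell above returns only a `GroupBy` object: pandas computes nothing until an aggregation such as `.mean()` is called on it. Since each name appears once, per-person averages can also be computed row-wise without grouping. A sketch on a small stand-in DataFrame (three rows and a few columns copied from the outputs above), showing the grouped means, the average rating per person, and the average over every individual rating:

```python
import pandas as pd

# Small stand-in for df, using values shown in the notebook's output
df = pd.DataFrame({
    'Timestamp': ['3/17/2015 18:37:58', '3/17/2015 18:38:09', '3/17/2015 18:38:25'],
    'Name': ['Alison', 'April', 'Vijay'],
    'Starbucks': [3, 4, 3],
    'PhilzCoffee': [5, 5, 5],
    'CaffeTrieste': [float('nan'), float('nan'), 3.0],
})

# The aggregation is what triggers the computation on the GroupBy object
per_person_by_brand = df.groupby('Name').mean(numeric_only=True)

# Average rating each person gave across brands (NaN skipped)
average_per_person = df.set_index('Name').mean(axis=1, numeric_only=True)

# Average over every individual rating: total points / number of non-missing ratings
ratings = df[['Starbucks', 'PhilzCoffee', 'CaffeTrieste']]
overall_average = ratings.sum().sum() / ratings.count().sum()

print(average_per_person)
print(overall_average)
```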