---
<a id='coffee_preference'></a>

# Practice Control Flow on the Coffee Preference Data Set

### 1) Load Coffee Preference data from file and print.

The code to load in the data is provided below. 

The `with open(..., 'r') as f:` statement opens a file in "read" mode (rather than "write") and assigns the opened file object to `f`.

We can then use the file object's `.readlines()` method to read the CSV file into a list of strings, one per line, and assign it to the variable `lines`. Each string keeps its trailing newline character (`'\n'`).

In [None]:
import random


In [None]:
# Open and read in the coffee preference data set. Note we are opening the file for reading only.
with open('../datasets/coffee-preferences.csv','r') as f:
    lines = f.readlines()
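
To see what `.readlines()` returns without touching the real file, the sketch below reads from an in-memory file instead. The sample text and the names `sample_text` and `sample_lines` are made up for this illustration and are not taken from the coffee data.

```python
import io

# Hypothetical two-row CSV text; not the real coffee data.
sample_text = "Name,BrandA\nAnn,3\n"

# io.StringIO behaves like a text file opened in read mode.
with io.StringIO(sample_text) as f:
    sample_lines = f.readlines()

# Each element is one line of the file, newline character included.
print(sample_lines)
```

Because every element still ends in `'\n'`, calling `print()` on each line shows a blank line between rows.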

#### Iterate through `lines` and print them out.

In [None]:
for line in lines:
    print(line)

#### Print out just the `lines` object by typing "lines" in a cell and running it with `Shift` + `Enter`.


In [None]:
lines

---

### 2) Remove the remaining newline `'\n'` characters with a `for` loop.

Iterate through the lines of the data and remove the unwanted newline characters.

**`.replace('\n', '')`** is a built-in string method that takes the substring you want to replace as its first argument and the string you want to replace it with as its second.

In [None]:
# create an empty list to hold our cleaned data.
cleaned_lines = []
for line in lines:
    cleaned_lines.append(line.replace('\n',''))

cleaned_lines

In [None]:
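# The same cleanup as a list comprehension; .strip('\n') removes newlines from both ends of each line.
cleaned_lines = [line.strip('\n') for line in lines]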
cleaned_lines
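
The two cells above use `.replace()` and `.strip()` interchangeably. As a rough comparison, the sketch below shows where they agree and where they differ. The strings `one_line` and `two_lines` are hypothetical and not taken from the data set.

```python
# Hypothetical strings, not taken from the data set.
one_line = "Ann,3,,4\n"
two_lines = "Ann,3\nBen,4\n"

# On a single line, all three approaches give the same result.
same = one_line.replace('\n', '') == one_line.strip('\n') == one_line.rstrip('\n')
print(same)

# With an interior newline, the results differ.
print(repr(two_lines.replace('\n', '')))  # every newline removed
print(repr(two_lines.rstrip('\n')))       # only the trailing newline removed
```

For lines produced by `.readlines()`, the newline only ever appears at the end, so either method works here.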

---

### 3) Split the lines into "header" and "data" variables.

The header is the first string in the list of strings. It contains our data's column names.

In [None]:
header = cleaned_lines[0]
data = cleaned_lines[1:]

In [None]:
header

---

### 4) Split the header and data strings on commas.

To split a string on the comma character, use the built-in string method **`.split(',')`**.

Split the header on commas, then print it. You can see that the original string is now a list containing items that were originally separated by commas.

In [None]:
# Split on commas:
header = header.split(',')
print(header)

split_data = []
for d in data:
    split_data.append(d.split(','))

In [None]:
split_data[:3]
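
One detail worth seeing in isolation is how `.split(',')` treats empty fields. The sketch below uses a made-up row (`row_text` is hypothetical, including its placeholder timestamp value) to show that two adjacent commas produce an empty string in the result.

```python
# Hypothetical row; "t1" stands in for a timestamp value.
row_text = "t1,Ann,3,,4"

fields = row_text.split(',')

# The missing rating between the two commas becomes an empty string.
print(fields)
print(len(fields))
```

A plain `.split(',')` assumes that no field contains a comma of its own. For files with quoted fields, the standard library's `csv` module is the safer choice.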

---

### 5) Remove the "Timestamp" column.

We aren't interested in the "Timestamp" column in our data, so remove it from the header and data list.

Removing "Timestamp" from the header can be done with list functions or with slicing. To remove the timestamp column from the data, use a `for` loop.

Print out the new data object with the timestamps removed.

In [None]:
# Remove Timestamp - just exclude the 1st element in the header and each row of data.
header = header[1:]

data_nots = []
for row in split_data:
    data_nots.append(row[1:])
    
data_nots
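
The solution above uses slicing. For comparison, here is a small sketch of both options mentioned in the prompt, slicing and a list method. The list `sample_header` and its brand name are invented for the example.

```python
# Hypothetical header; "BrandA" is a placeholder column name.
sample_header = ['Timestamp', 'Name', 'BrandA']

# Slicing builds a new list and leaves the original untouched.
sliced = sample_header[1:]

# list.pop(0) removes the first item in place and returns it,
# so we work on a copy to keep sample_header intact.
popped_from = list(sample_header)
removed = popped_from.pop(0)

print(sliced)
print(popped_from, removed)
```

Slicing is usually the safer choice in a notebook: rerunning a cell that calls `.pop(0)` on the same list removes another column each time.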

---

### 6) Convert numeric columns to floats and empty fields to `None`.

Iterate through the data and construct a new data list of lists that contains the numeric ratings converted from strings to floats and the empty fields (which are empty strings, '') replaced with the `None` object.

Use a nested `for` loop (a `for` loop within another `for` loop) to get the job done. You will likely need to use `if… else` conditional statements as well.

Print out the new data object to make sure you've succeeded.

In [None]:
data_num = []
for row in data_nots:
    new_row = []
    for i, col in enumerate(row):
        if i == 0:
            new_row.append(col)
        else:
            if col == '':
                new_row.append(None)
            else:
                new_row.append(float(col))
    
    data_num.append(new_row)
    
data_num
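
The nested `if… else` above can also be pulled out into a small helper, which some readers find easier to follow. This is only a sketch: the function name `to_rating` and the row `sample_row` are hypothetical.

```python
def to_rating(text):
    """Return None for an empty string, otherwise the value as a float."""
    if text == '':
        return None
    return float(text)

# Hypothetical row: a name followed by rating strings.
sample_row = ['Ann', '3', '', '4.5']

# Keep the name as-is and convert every remaining cell.
converted = [sample_row[0]] + [to_rating(cell) for cell in sample_row[1:]]
print(converted)
```

Note that `float()` raises a `ValueError` for any non-empty string that is not a number, so this assumes the rating columns contain only numbers or blanks.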

---

### 7) Count the `None` values per person and put the counts in a dictionary.

Use a `for` loop to count the number of `None` values per person. Create a dictionary with the names of the people as keys and the counts of `None` as values.

Who rated the most coffee brands? Who rated the fewest?

In [None]:
# rated least

def ratings_amount(data, least=True):
    """Return the names of the people who rated the fewest (or the most) brands."""

    # Count the None values (unrated brands) in each person's row.
    user_nones = {}
    for row in data:
        nones = 0
        for cell in row:
            if cell is None:
                nones += 1

        user_nones[row[0]] = nones

    # The most None values means the fewest ratings, and vice versa.
    if least:
        target = max(user_nones.values())
    else:
        target = min(user_nones.values())

    return [name for name, count in user_nones.items() if count == target]

ratings_amount(data_num, least=True)

# Least: Alex, Dave H, cheong-tseng
# Most: Hugh Jass, Matt, Rocky, Vijay
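
The prompt also asks for the dictionary of counts itself. A minimal sketch of that step, on invented rows rather than the real ratings (`sample_data` and the names in it are hypothetical), might look like this:

```python
# Hypothetical rows: a name followed by ratings, with None for unrated brands.
sample_data = [
    ['Ann', 3.0, None, None],
    ['Ben', 4.0, 2.0, 5.0],
    ['Cy', None, 1.0, 2.0],
]

# Map each person's name to the number of None values in their row.
none_counts = {}
for row in sample_data:
    none_counts[row[0]] = sum(1 for cell in row[1:] if cell is None)

print(none_counts)

# max/min with key=... return a single name, the first one found on ties.
fewest_ratings = max(none_counts, key=none_counts.get)
most_ratings = min(none_counts, key=none_counts.get)
print(fewest_ratings, most_ratings)
```

Because `max()` and `min()` return only one key, the function above is the better fit when several people tie.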

---

### 8) Calculate average rating per coffee brand.

**Excluding `None` values**, calculate the average rating per brand of coffee.

The final output should be a dictionary with the coffee brand names as keys and their average rating as the values.

Remember that the average can be calculated as the sum of the ratings over the number of ratings:

```python
average_rating = float(sum(ratings_list))/len(ratings_list)
```

Print your dictionary to see the average brand ratings.

In [None]:
brand_ratings = {}
for brand in header[1:]:
    brand_ratings[brand] = []

for row in data_num:
    for i, cell in enumerate(row):
        if i > 0 and cell is not None:
            brand_ratings[header[i]].append(cell)

brand_avg_ratings = {}
for brand, ratings in brand_ratings.items():
    #print('{} {}'.format(brand, ratings))
    brand_avg_ratings[brand] = round(sum(ratings)/len(ratings),2)
    
brand_avg_ratings
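
The solution above divides by `len(ratings)`, which fails with a `ZeroDivisionError` if a brand has no ratings at all. The sketch below shows one way to guard against that, assuming it is acceptable to store `None` as the average for such a brand. The header, rows, and brand names are invented for the example.

```python
# Hypothetical header and rows; "BrandA" and "BrandB" are placeholders.
sample_header = ['Name', 'BrandA', 'BrandB']
sample_data = [
    ['Ann', 3.0, None],
    ['Ben', 4.0, 2.0],
    ['Cy', None, 4.0],
]

sample_avgs = {}
for i, brand in enumerate(sample_header[1:], start=1):
    # Collect this brand's column, skipping missing ratings.
    ratings = [row[i] for row in sample_data if row[i] is not None]
    # Guard against a brand nobody rated, which would divide by zero.
    sample_avgs[brand] = sum(ratings) / len(ratings) if ratings else None

print(sample_avgs)
```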

---

### 9) Create a list containing only the people's names.

In [None]:
people = []
for row in data_num:
    people.append(row[0])
    
print(len(people))
people

---

### 11) Picking a name at random. How many attempts to choose the same name three times in a row?

Now, we'll use a `while` loop to "brute force" the odds of choosing the same name three times in a row randomly from the list of names.

"Brute force" is a term used quite frequently in programming to refer to a computationally inefficient way of solving a problem. It's brute force in this situation because we can use statistics to solve this much more efficiently than if we actually played out an entire scenario.

Below, we import the **`random`** module, which provides the essential function for this code: **`random.choice()`**.
The function takes a non-empty sequence, such as a list, as an argument and returns one of its elements at random.

In [None]:
import random



Write a function to choose a person from the list randomly three times and check if they are all the same.

Define a function that has the following properties:

1) Takes a list (your list of names) as an argument.
2) Selects a name using `random.choice(people)` three separate times.
3) Returns `True` if the name was the same all three times; otherwise returns `False`.

In [None]:
def choose_three(names):
    """Return True if three random draws from names are all the same."""
    
    person1 = random.choice(names)
    person2 = random.choice(names)
    person3 = random.choice(names)
    
    # The chained comparison already evaluates to True or False.
    return person1 == person2 == person3
    
choose_three(people)

---

### 12) Construct a `while` loop to run the choosing function until it returns `True`.

Run the function until you draw the same person three times using a `while` loop. Keep track of how many tries it took and print out the number of tries after it runs.

In [None]:
tries = 0
chose_same_person = False

while not chose_same_person:
    tries += 1
    
    same_person = choose_three(people)
    if same_person:
        chose_same_person = True

tries
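
A single run of the loop gives one noisy number. As noted earlier, the same question can be answered with statistics: if the names are distinct and each draw is uniform, the first draw can be anyone and the next two must each match it, so the chance of a match is 1/n² and the expected number of tries is n². The sketch below compares that figure with an average over many simulated runs. The names, the seed, the run count, and the helper `tries_until_match` are all assumptions made for this illustration.

```python
import random
from fractions import Fraction

def tries_until_match(names, rng):
    """Count attempts until three random draws from names are identical."""
    tries = 0
    while True:
        tries += 1
        draws = [rng.choice(names) for _ in range(3)]
        if draws[0] == draws[1] == draws[2]:
            return tries

sample_names = ['Ann', 'Ben', 'Cy', 'Dee', 'Eve']  # hypothetical names
n = len(sample_names)

# The first draw can be anyone; the second and third must each match it.
p_match = Fraction(1, n) ** 2
expected_tries = 1 / p_match  # mean of a geometric distribution

rng = random.Random(0)  # seeded so the run is repeatable
runs = 2000
average_tries = sum(tries_until_match(sample_names, rng) for _ in range(runs)) / runs

print(p_match, expected_tries)
print(average_tries)
```

The simulated average should land close to n², though individual runs vary widely because the number of tries follows a geometric distribution.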


<a name="conclusion"></a>
## Lesson Summary

Let's review what we learned in this lab. We:

- Discussed why Python is popular for data science.
- Demonstrated basic data science tasks using Python control flow and functions. These functions helped us parse, clean, edit, and analyze the Coffee Preferences data set.


### Additional Resources

- [Learn Python on Codecademy](https://www.codecademy.com/learn/python)
- [Learn Python the Hard Way](https://learnpythonthehardway.org)
- [Python Data Types and Variables](http://www.python-course.eu/variables.php)
- [Python IF… ELIF… ELSE Statements](https://www.tutorialspoint.com/python/python_if_else.htm)
- [Python Loops](https://www.tutorialspoint.com/python/python_loops.htm)
- [Python Control Flow](https://python.swaroopch.com/control_flow.html)