<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Introduction to Python

_Authors: Kiefer Katovich (San Francisco), Dave Yerrington (San Francisco), Joseph Nelson (Washington, D.C.), Sam Stack (Washington, D.C.)_

---


### Learning Objectives
 
#### Part 1: Python Datatypes
**After this lesson, you will be able to:**
- Discuss Python as a programming language.
- Define integers, strings, tuples, lists, and dictionaries.
- Demonstrate arithmetic operations and string operations.
- Demonstrate variable assignment.

#### Part 2: Python Iterations, Control Flow, and Functions
**After this lesson, you will be able to:**
- Understand `Python` control flow and conditional programming.  
- Implement `for` and `while` loops to iterate through data structures.
- Apply `if…else` conditional statements.
- Create functions to perform repetitive actions.
- Demonstrate error handling using `try`/`except` statements.
- Combine control flow and conditional statements to solve the classic "FizzBuzz" code challenge.
- Use `Python` control flow and functions to help us parse, clean, edit, and analyze the Coffee Preferences data set.
---

### Lesson Guide

#### [Part 1: Python Datatypes](#why_py)
- [Why Python?](#why_py)
- [Introduction to Data Types](#intro)
- [Jupyter Notebook](#jupyter_nb)
- [Python Variables](#variables)
- [Operators](#operators)
- [Integers and Floats](#numbers)
- [Strings](#strings)
  - [String Indexing](#slicing)
- [Printing Strings](#print)
- [Lists](#lists)
- [Tuples](#tuples)
- [Dictionaries](#dictionary)
- [Importing Packages and Documentation](#import)
- [Practice With a Partner](#ind-practice)


#### [Part 2: Python Iterations, Control Flow, and Functions](#py_i)
- [`if…else` Statement](#if_else_statements)
- [Iterating With `for` Loops](#for_loops)
- [FizzBuzz](#fizz_buzz)
- [Functions](#functions)
- [`while` Loops](#while_loops)
- [Practice Control Flow on Coffee Preference Data Set](#coffee_preference)
- [Conclusion](#conclusion)
----

<a id='why_py'></a>

## Why Python?

Python was created by Guido van Rossum and first released in 1991. Since then, it has grown into a widely used high-level, general-purpose programming language supported by a huge open-source community. The language was designed to emphasize code readability, notably through its use of indentation (significant whitespace) and its clean syntax. "The Zen of Python," a short poem by Tim Peters, summarizes the principles behind the language's design.


#### _The Zen of Python_  
_Beautiful is better than ugly.  
Explicit is better than implicit.  
Simple is better than complex.  
Complex is better than complicated.  
Flat is better than nested.  
Sparse is better than dense.  
Readability counts.  
Special cases aren't special enough to break the rules.  
Although practicality beats purity.  
Errors should never pass silently.  
Unless explicitly silenced.  
In the face of ambiguity, refuse the temptation to guess.  
There should be one-- and preferably only one --obvious way to do it.  
Although that way may not be obvious at first unless you're Dutch.  
Now is better than never.  
Although never is often better than right now.  
If the implementation is hard to explain, it's a bad idea.  
If the implementation is easy to explain, it may be a good idea.  
Namespaces are one honking great idea -- let's do more of those!_

---

## Why Python for Data Science?

##### General Purpose, Open Source, and Readability

These are some of the more prominent reasons Python has been so widely adopted for data science.

**General purpose:** Python is not limited to one domain, such as software or website development. It provides general building blocks that you can use to build almost any kind of application.

**Open Source:** Building on those basic building blocks, a large open-source community has created thousands of libraries that combine them into more specialized toolsets. Here are a few examples:
- Requests: Making HTTP requests to websites and web APIs
- Django: Python web framework
- Pandas: Tabular data analysis (a data scientist's best friend)
- Pyglet: Windowing and multimedia, for games and other visual applications
- TensorFlow: Open-source machine learning library developed by Google


**Readability:** Programming languages are called languages because learning one is similar to learning a written language, except that instead of learning to communicate with a person, you are learning to communicate with a computer. A foreign language that resembles your native language is much easier to pick up, and the same holds for Python: its syntax reads much like plain English, which makes code easier for humans to read and interpret.


---

<a id='intro'></a>
## Introduction: Python Data Types

There are several _standard_ data types within Python, the six most common being:

**Integers:** Whole numbers, positive, negative, or zero, such as 1, 0, and -5. Python integers have no fixed maximum size; they are limited only by available memory.

**Float:** Short for "floating point number," a number with a decimal point, such as 2.8 or 3.14159. Floats are stored with limited precision, so they can only approximate most real numbers.

**Strings:** A sequence of characters (letters, numbers, or other symbols) enclosed in quotes, e.g., "Frank Underwood, I am your father."

**Tuples:** An ordered sequence that cannot be changed after it is created, e.g., in `x = (1, 2, 3)`, the parentheses make it a tuple. Another example: `x = ("Kirk", "Picard", "Spock")`

**Lists:** An ordered sequence that can grow, shrink, and change, e.g., in `x = [1, 2, 3]`, the square brackets make it a list. Another example: `x = ["Lord", "of", "the", "Rings"]`

**Dictionaries:** A collection of key-value pairs, where each value is looked up by its key rather than by position, e.g., `x = {'key1': 'value1', 'key2': 'value2'}`. Keys can also be numbers or other immutable types, as in `x = {1: 'a', 'b': 2, 3: 3}`.

Throughout the lesson, we will review each data type more in depth and discuss common ways of interacting with each of them.

[Python Basic data types](https://en.wikiversity.org/wiki/Python/Basic_data_types)
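As a quick check, the built-in `type()` function reports the data type of each example above. This is a small sketch; the `examples` list is just an illustrative container for the values already shown.

```python
# One example value of each of the six types listed above
examples = [
    1,
    2.8,
    "Frank Underwood, I am your father.",
    ("Kirk", "Picard", "Spock"),
    ["Lord", "of", "the", "Rings"],
    {'key1': 'value1', 'key2': 'value2'},
]

for value in examples:
    # type(value).__name__ gives the short type name, e.g. 'int' or 'dict'
    print(type(value).__name__, '->', value)
```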

---

<a id='jupyter_nb'></a>
## Jupyter Notebook

Before we get started, let's go over how to interact with IPython in a Jupyter Notebook.

Run a code cell by pressing Shift + Enter or by clicking the Run (play) button in the toolbar. If the last line of a cell is an expression, Jupyter displays its value below the cell.

In [None]:
# This is a cell.

In [None]:
# assigning a variable
v = 1

In [None]:
# Assign another.
dsi_ga = 'DSI is awesome!'

In [None]:
# Run this!
dsi_ga

In [None]:
# Print this.
print(v)

You can also do basic arithmetic with integers directly in a notebook cell.

In [None]:
45-19

<a id='variables'></a>
## Variables

Variables are names that have been assigned to specific values or data. These names can be almost anything you want, but there are some restrictions and best practices.

**Restrictions**
- Variable names can contain only letters, digits, and underscores, so they cannot have spaces in them.
- Variable names cannot begin with a digit (e.g., `2`, `10000`, and `2nd_place` are not valid names).
- Variable names cannot be Python keywords, such as `for`, `if`, or `class`.

**Best Practices**
- Variable names should be lowercase.
- A variable's name should be representative of the value(s) it has been assigned.
- If you must use multiple words in your variable name, use an underscore to separate them (e.g., `total_sales`).
- Avoid the names of built-in functions such as `type` or `print`. Python allows it, but the new variable hides (shadows) the built-in function for the rest of the session.
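The naming rules can be checked in code. The sketch below uses the string method `str.isidentifier()` and the standard-library `keyword` module; `is_valid_name` is a hypothetical helper written for this lesson, not a built-in.

```python
import keyword

def is_valid_name(name):
    """Hypothetical helper: True if `name` can be used as a variable name."""
    # isidentifier() enforces the character rules (letters, digits, and
    # underscores, not starting with a digit). Keywords such as `for` pass
    # that check too, so they are excluded separately.
    return name.isidentifier() and not keyword.iskeyword(name)

for name in ['total_sales', '2nd_place', 'first name', 'for', 'type']:
    print(repr(name), is_valid_name(name))
# 'type' is reported as valid: Python allows it, but it would shadow the
# built-in type() function, so it is still best avoided.
```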

In [None]:
# assigning a float
x = 1.0
type(x)

In [None]:
# assigning an int
y = 1
type(y)

In [None]:
# assigning a string
z = '1'
type(z)

In [None]:
# assigning a tuple
t = (2, 3, 4)
type(t)


In [None]:
# assigning a dictionary
d = {'fe': 2}
type(d)

**It is critical to remember that when assigning variables, we are not stating that "_x equals 1_", we are stating that "_x has been assigned the value of 1_".**
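For example, the statement below would be false as a mathematical equation, but it is perfectly valid as an assignment: Python evaluates the right-hand side first, then assigns the result to the name on the left.

```python
x = 1
x = x + 1  # evaluate x + 1 (which is 2), then assign that result to x
print(x)   # 2
```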

<a id='operators'></a>
## Operators

"Operators are the constructs which can manipulate the value of operands." — [Tutorials Point: Python](https://www.tutorialspoint.com/python/python_basic_operators.htm)

Arithmetic operators calculate the sum, difference, product, or quotient of values or variables.

In [None]:
# addition
print(1 + 2)
# subtraction
print(1 - 2)
# multiplication
print(1 * 2)
# division
print(1 / 2)

In Python 3, `/` always performs true division, so `1 / 2` returns `0.5`. In Python 2, however, `/` between two integers rounded the result down (`1 / 2` returned `0`) to keep the output an integer, which is why older code often writes one operand as a float, as in `1. / 2`.

Mixing an integer and a float also produces a float:

In [None]:
# division of float numbers
1 / 2.0

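There is also "`//`" (floor division), which rounds the result down to a whole number. If either operand is a float, the result is still a float, just a whole-number one.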

In [None]:
# still a float, but a whole-number float
9.0//2

The equals sign in Python is known as the assignment operator. It is the means by which we can assign values to variables.

In [None]:
number = 2.0
type(number)

In [None]:
# exponent power operator
2 ** 2

In [None]:
# modulo can be used to get the remainder
5%2
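Floor division and modulo work as a pair: `//` gives the quotient rounded down and `%` gives the matching remainder, and the built-in `divmod()` returns both at once. Note that "rounded down" means toward negative infinity, which matters for negative numbers.

```python
print(7 // 2)        # 3
print(7 % 2)         # 1
print(divmod(7, 2))  # (3, 1): quotient and remainder together
print(-7 // 2)       # -4: rounds toward negative infinity, not toward zero
```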

**Booleans and Boolean operators.** A Boolean has one of two values, `True` or `False`. Booleans are generally used to evaluate conditions, and the operators `and`, `or`, and `not` combine or invert them.

In [None]:
True and False

In [None]:
not False

In [None]:
True or False

Comparison Operators
- Less than: **`<`**
- Greater than: **`>`**
- Less than or equal to: **`<=`**
- Greater than or equal to: **`>=`**
- Equals: **`==`**
- Does not equal: **`!=`**


In [None]:
2 > 1, 2 < 1, 2 > 2, 2 < 2, 2 >= 2, 2 <= 2

In [None]:
# equality
[1,2] == [1,2], [1,2] != [2,1]
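Comparison operators can also be chained, and they work on more than numbers. A short sketch:

```python
x = 5
print(1 < x < 10)          # True: same as (1 < x) and (x < 10)
print(x == 5.0)            # True: an int and a float are compared by value
print('apple' < 'banana')  # True: strings compare character by character
```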

<a id='numbers'></a>
## Numbers in Python

Python 3 has three built-in numeric types: integers, floats, and [complex numbers](https://docs.python.org/2/library/functions.html#complex). (Python 2 also had a separate [long](https://docs.python.org/2/library/functions.html#long) integer type, which Python 3 merged into `int`.) Integers and floats are very common, while complex numbers are relatively rare. Today we will review integers and floats, as there is a good chance these will be the only ones you ever use.

Integers are whole numbers. 
- 1
- 200
- 100009 

Floats are numbers with decimals. The name "float" comes from "floating point," because the decimal point can _float_ to any position in the number.
- 1.11
- 26.006
- 3.0

In [None]:
x_int = 1
x_float = 1.0

type(x_int), type(x_float)

Integers and floats can be converted to each other: `float()` turns an integer into a float, and `int()` turns a float into an integer by dropping its fractional part.

In [None]:
float(x_int)

In [None]:
type(int(x_float))
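Converting a float to an integer with `int()` truncates: it drops the fractional part instead of rounding. If you want the nearest whole number, `round()` is the usual choice. A few cases:

```python
print(int(3.99))    # 3: the fractional part is dropped
print(int(-3.99))   # -3: truncation moves toward zero
print(round(3.99))  # 4: round() goes to the nearest integer
print(float(7))     # 7.0
```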

<a id='strings'></a>

## Strings

Strings are sequences of characters enclosed in quotes (single or double). They are most often used to store text.

In [None]:
s = "Hello world"
s.split(' ')

In [None]:
# replace() is case-sensitive and returns a new string
s.replace('Hello', 'Hi')

Strings have a lot of methods and attributes associated with them, which allow us to better understand and manipulate them.

In [None]:
# length of the string
len(s)

In [None]:
# Replace a substring. Strings are immutable, so replace() returns a new string.
s2 = s.replace("world", "test")
print(s2)

<a id='slicing'></a>


**String Indexing**  

We can extract characters at specific index locations in a string using indexing.

In [None]:
# indexing the first (index 0) character in the string
s[0]

The number in square brackets after the variable (the `0` in `s[0]`) is called an index; the plural is indices.

_Counting in Python and many other programming languages begins at 0, as opposed to 1._  

In [None]:
# characters at indices 0, 1, 2, 3, and 4

s[0:5]

Slices and ranges in Python exclude their upper end. So a slice of `[0:5]` starts at `0` and stops before `5`.

In [None]:
# from index 6 up to the end of the string
s[6:]

In [None]:
# no start or end specified
s[:]

In addition to specifying a range, you can add a step size, which sets how many characters to advance each time.

In [None]:
# Define step size of 2, every other character.
s[::2]
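Indices can also be negative, counting backward from the end of the string, and a negative step walks through the string in reverse. A small sketch using the same `s`:

```python
s = "Hello world"
print(s[-1])    # d: index -1 is the last character
print(s[-5:])   # world: the last five characters
print(s[::-1])  # dlrow olleH: a step of -1 reverses the string
```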

#### Concatenating
To join two strings together, place the `+` operator between them.

In [None]:
print('Hello' + 'world')

You can do the same with variables that refer to strings:


In [None]:
x = 'Hello'
y = 'world'

x + y

There is also "C-style" formatting, which allows us to create a string with placeholder values that we can populate.

In [None]:
# C-style formatting
print("value = %f" % 1.0) 
# "%f" is the placeholder for a float.

In [None]:
# alternative, more intuitive way of formatting a string 
s3 = 'value1 = {0}, value2 = {1}'.format(3.1415, 1.5)
print(s3)
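Since Python 3.6, f-strings are a third option: prefix the string with `f` and put expressions directly inside the braces. The names `value1` and `value2` below are only for illustration.

```python
value1 = 3.1415
value2 = 1.5
s4 = f'value1 = {value1}, value2 = {value2}'
print(s4)                        # value1 = 3.1415, value2 = 1.5
# A format spec after the colon controls presentation, here two decimal places
print(f'value1 = {value1:.2f}')  # value1 = 3.14
```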

Multiplying a string by an integer repeats the string that many times.

In [None]:
x = 'Hello '
x * 5

<a id='lists'></a>


## Lists

Lists are a way of storing ordered data.

Lists can be composed of ints, floats, strings, or other lists, as well as other data types we have not covered yet.

In [None]:
l = [1,2,3,4]

print(type(l))
print(l)

In [None]:
# Assigning a list to another variable does not copy it:
# a and l now refer to the same list object.
a = l

In [None]:
print(a)
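Because `a = l` does not copy the list, a change made through one name is visible through the other. To get an independent list, use the `.copy()` method. The names `nums`, `alias`, and `nums_copy` are illustrative, chosen so they don't overwrite `l`.

```python
nums = [1, 2, 3, 4]
alias = nums             # alias and nums are two names for one list
alias.append(5)
print(nums)              # [1, 2, 3, 4, 5]: the change shows up through nums too
print(alias is nums)     # True: both names refer to the same object

nums_copy = nums.copy()  # copy() builds a new, independent list
nums_copy.append(6)
print(nums)              # [1, 2, 3, 4, 5]: nums is unchanged
print(nums_copy)         # [1, 2, 3, 4, 5, 6]
```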

In [None]:
# list of strings
names = ['Joseph', 'Bob', 'Rick']
print(names)

Lists also have several methods that allow us to alter them, such as the `.append()` method, which allows us to add another element on to the end of a list.

In [None]:
names.append('John')

In [None]:
names

In [None]:
# Lists can be indexed in the same method as strings.
print(l[1:3])
print(l[::2])

In [None]:
# Index into the list, then slice the string stored at that position.
names[1][1:]

In the example above, the index `[1]` retrieves the string "`Bob`", and the slice `[1:]` then returns the characters of "`Bob`" from index 1 to the end, giving "`ob`".

In [None]:
# Lists don't have to be the same type.
l = [1, 'a', 1.0, 1-1j]
print(l)

In [None]:
# We can create a sequence of values in a range using the "range" function.
start = 10
stop = 30
step = 2
print(range(start, stop, step))

# In Python 3, range() returns a lazy range object; list() collects its values.
list(range(start, stop, step))

Here's how we create a list from scratch.

In [None]:
# Create a new empty list.
l = []

# Add an element using `append`.
l.append("A")
l.append("d")
l.append("d")

print(l)

In [None]:
# Reassign a range of values with another list.
l[1:3] = ["b", "c"]
print(l)

Use the `.insert()` method to add values at specific indexes.

In [None]:
l.insert(0, "i")
l.insert(1, "n")
l.insert(2, "s")
l.insert(3, "e")
l.insert(4, "r")
l.insert(5, "t")

print(l)

In [None]:
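# dir() lists the attributes and methods available on an object.
# (A new name, nums, avoids overwriting the string s from earlier.)
nums = [2, 3, 4]
nums.insert(0, 1)
dir(nums)

When you insert a value at an index that is already occupied, the existing value and every value after it shift one position to the right.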

---
The `.remove()` method deletes the first occurrence of a specific value in a list. It raises a `ValueError` if the value is not in the list.

In [None]:
l.remove("A")
print(l)

The `del` statement, on the other hand, deletes values by index.

In [None]:

del l[7]
del l[6]

print(l)

<a id='tuples'></a>


## Tuples

Tuples are similar to lists in that they store an ordered sequence of values. However, tuples are immutable: once they are created, the values in them cannot be changed.

In [None]:
point = (10, 20)
print(point, type(point))

In [None]:
# They can be indexed and sliced just like lists and strings.
point[0]
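Trying to change a tuple raises a `TypeError`. The sketch below catches the error with `try`/`except` (covered in Part 2) so the cell keeps running.

```python
point = (10, 20)
try:
    point[0] = 99  # tuples do not support item assignment
except TypeError as err:
    print('TypeError:', err)
print(point)       # (10, 20): the tuple is unchanged
```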

Unpacking assigns the items of a sequence, such as a tuple or list, to several variables at once. It is a common practice when iterating through Python data types.

In [None]:
# unpacking
x, y = point

print("x =", x)
print("y =", y)

<a id='dictionary'></a>


## Dictionaries

Dictionaries store data as key-value pairs. Instead of using a numeric position to access data stored in a dictionary, we look up each value by its key. (Since Python 3.7, dictionaries preserve the order in which keys were inserted, but you still access values by key, not by position.)

A key is similar to a variable name and is typically a string, though any immutable type, such as a number or tuple, can serve as a key. A value is similar to the value assigned to a variable and can be any data type.

Curly braces `{ }` enclose dictionaries. The first item in each pair is the "key," and the second is the "value," separated by a colon. The general format looks like this:

In [None]:
params = {"key1" : 1.0,
          "key2" : 2.0,
          "key3" : 3.0,}

print(type(params))
print(params)

Each key can appear only once in a dictionary, while values can be repeated and can be changed at any time.

In [None]:
# value for key2 in the params dictionary
params["key2"]

In [None]:
# adding a new key value pair to a dictionary
params["key4"] = "D"

In [None]:
# Print the entirety of the dictionary.
print(params)

In [None]:
# Reassigning the value of a key-value pair in the dictionary.
params["key1"] = "A"
params["key2"] = "B"

In [None]:
print("Key 1 = " + str(params["key1"]))
print("Key 2 = " + str(params["key2"]))
print("Key 3 = " + str(params["key3"]))
print("Key 4 = " + str(params["key4"]))
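A few other common dictionary operations: the `in` operator checks for a key, `.get()` looks up a key without raising a `KeyError` when it is missing, and `.items()` yields key-value pairs for looping. The dictionary below mirrors `params` after the reassignments above.

```python
params = {"key1": "A", "key2": "B", "key3": 3.0, "key4": "D"}

print("key1" in params)       # True: `in` checks the keys, not the values
print(params.get("key5"))     # None: a missing key returns None instead of raising KeyError
print(params.get("key5", 0))  # 0: or the default you supply

for key, value in params.items():  # items() yields (key, value) pairs
    print(key, '=', value)
```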

In [None]:
lst = []
key = 'key'
value = 'value'
my_dict = {key: value}
lst.append(my_dict)    # a list can hold a dictionary...
lst[0] = (key, value)  # ...and that element can be replaced by a tuple

lst

In [None]:
my_list = list(range(1, 20, 3))         # [1, 4, 7, 10, 13, 16, 19]
my_list = my_list[len(my_list) - 1]     # the last element: 19
my_list = str(my_list) + str(180 // 2)  # '19' + '90'
my_list

<a id='import'></a>

## Importing Packages and Documentation

Not everything we will use is built into Python's core. Sometimes we'll need to import packages or modules, which bundle additional functions and data types.

In [None]:
import math

x = math.cos(2 * math.pi)
print(x)

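Alternatively, you can import every name from a module directly into the current namespace. This saves typing, but it is generally discouraged because it can silently overwrite names you have already defined.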

In [None]:
from math import *
x = cos(2 * pi)
print(x)
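A middle ground between the two styles is to import only the names you need, or to give the module a short alias. A brief sketch:

```python
from math import cos, pi  # import only the names you need
import math as m          # or give the module a short alias

print(cos(2 * pi))  # 1.0
print(m.sqrt(16))   # 4.0
```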

There are several ways to view the documentation for a module or function. In a Jupyter notebook, you can call the `help()` function, or place your cursor inside a function call and press "`Shift + Tab`".

In [None]:
help(math.cos)

<a name="ind-practice"></a>
## Independent Practice
Using strings, lists, indexing, concatenation, as well as other Python elements discussed in this lesson, make up your own statements and share them as a reply in our Slack channel. We will then work through a few to see if we can guess the output.



----


<a id='py_i'></a>
## Part 2: Python Iterations, Control Flow, and Functions

We've gone over how data can exist within the Python language. Now let's look at the core ways of interacting with it.

- `if…elif…else` statements
- `for` and `while` loops
- Error handling with `try` and `except`
- Functions


First, let's bring in one of the many libraries Python has available to help us with some of the statements we'll be creating.

In [None]:
import numpy as np

NumPy is one of the core data science libraries you will use. It provides many ready-made mathematical functions, so we don't have to write them ourselves.

All you _need_ to do to import a library is execute `import <library name>`. Here, we import NumPy under the alias `np`, which lets us use `np` as a shorthand.

_Why would we do this?_
Without the alias, we would have to type the full module name every time we call one of its functions, as in `numpy.mean(x)`. With the alias, we can write `np.mean(x)` instead.

<a id='if_else_statements'></a>

# `if…else` Statements

---

### 1. Write an `if…else` statement to check whether the suitcase is over 50 pounds.

Print a message indicating whether or not the suitcase is over 50 pounds.

In [None]:
weight = float(input("How many pounds does your suitcase weigh?"))

In [None]:
# A:
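One possible sketch of an answer. It wraps the check in a function so it can be tried with fixed values instead of `input()`; the name `check_suitcase` and the message wording are hypothetical.

```python
def check_suitcase(weight):
    """Return a message saying whether `weight` (in pounds) is over 50."""
    if weight > 50:
        return 'Your suitcase is over 50 pounds.'
    else:
        return 'Your suitcase is 50 pounds or less.'

print(check_suitcase(62.5))
print(check_suitcase(50))  # exactly 50 is not "over" 50
```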

---

### 2. Write an `if…else` statement for multiple conditions.

Print out these recommendations based on the weather conditions:

1. The temperature is higher than 60 degrees and it is raining: Bring an umbrella.
2. The temperature is lower than or equal to 60 degrees and it is raining: Bring an umbrella and a jacket.
3. The temperature is higher than 60 degrees and the sun is shining: Wear a T-shirt.
4. The temperature is lower than or equal to 60 degrees and the sun is shining: Bring a jacket.

In [3]:
temperature = float(input("What is the temperature? "))
# strip() and lower() make the answer tolerant of extra spaces and capital letters
weather = input("What is the weather (rain or shine)? ").strip().lower()


In [8]:
if temperature > 60 and weather == 'rain':
    print('Bring an umbrella')
elif temperature <= 60 and weather == 'rain':
    print('Bring an umbrella and a jacket')
elif temperature > 60 and weather == 'shine':
    print('Wear a T-shirt')
elif temperature <= 60 and weather == 'shine':
    print('Bring a jacket')
else:
    print('Please answer "rain" or "shine".')

Bring an umbrella
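The same four rules can be written with nested conditions so that each weather type is checked only once. This is a sketch; `recommend` is a hypothetical helper, and it returns the advice instead of printing it so it can be tested.

```python
def recommend(temperature, weather):
    """Return clothing advice for a temperature and 'rain' or 'shine'."""
    weather = weather.strip().lower()  # tolerate input like ' Rain '
    if weather == 'rain':
        if temperature > 60:
            return 'Bring an umbrella'
        return 'Bring an umbrella and a jacket'
    elif weather == 'shine':
        if temperature > 60:
            return 'Wear a T-shirt'
        return 'Bring a jacket'
    return 'Please answer "rain" or "shine".'

print(recommend(61, 'rain'))  # Bring an umbrella
```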



---
<a id='for_loops'></a>
# `for` Loops


One of the main uses of a programming language is to automate repetitive tasks. One way to do this in Python is the `for` loop.

The `for` loop allows you to perform a repetitive task on every element within an object, such as every name in a list.


Here is how it works in pseudocode:

```python
# For each individual object in the list
    # perform task_A on said object.
    # Once task_A has been completed, move to next object in the list.
```

Let's say we want to print each name in the list, followed by "Is Awesome!"

In [None]:
names = ['Alex','Brian', 'Catherine']

for name in names:
    print(name + ' Is Awesome!')

This process of cycling through a list item by item is known as "iteration". 
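Two common variations: `enumerate()` yields each item together with its position, and a `for` loop over a string visits one character at a time.

```python
names = ['Alex', 'Brian', 'Catherine']

# enumerate() yields (position, item) pairs, counting from 0 by default
for position, name in enumerate(names):
    print(position, name)

# Looping over a string visits each character in turn
for letter in 'abc':
    print(letter)
```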

---

### 3. Write a `for` loop that iterates from number 1 to number 15.

On each iteration, print out the number.  



In [15]:
# A:
for n in range(1,16):
    print(n)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15


---

### 4. Iterate from 1 to 15, printing whether the number is odd or even.

Hint: The modulus operator, `%`, can be used to take the remainder. For example:

```python
9 % 5 == 4
```

Or, in other words, the remainder of dividing 9 by 5 is 4.

In [19]:
for n in range(1,16):
    if n % 2 == 1:
        print(str(n) + ' odd')
    else:
        print(str(n) + ' even')

1 odd
2 even
3 odd
4 even
5 odd
6 even
7 odd
8 even
9 odd
10 even
11 odd
12 even
13 odd
14 even
15 odd


---
<a id='fizz_buzz'></a>
### 5. Iterate from 1 to 30 with the following instructions:

1. If a number is divisible by 3, print "fizz". 
2. If a number is divisible by 5, print "buzz". 
3. If a number is divisible by both 3 and 5, print "fizzbuzz".
4. Otherwise, print just the number.

In [25]:
# A:
for num in range(1, 31):
    response = ''
    if num % 3 == 0:
        response += 'fizz'
    if num % 5 == 0:
        response += 'buzz'    # 15 and 30 collect both words
    if response:
        print(num, response)  # the number is printed alongside the word for readability
    else:
        print(num)

1
2
3 fizz
4
5 buzz
6 fizz
7
8
9 fizz
10 buzz
11
12 fizz
13
14
15 fizzbuzz
16
17
18 fizz
19
20 buzz
21 fizz
22
23
24 fizz
25 buzz
26
27 fizz
28
29
30 fizzbuzz


Remember this example. FizzBuzz is a common coding-interview challenge. It is relatively easy to solve, but interviewers often look for clean, creative, or optimized solutions.
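As one example of a more compact approach, the sketch below builds each word from two small conditional expressions and returns the whole sequence as a list, following the exercise's rules exactly (the word alone, or the number). The function name `fizzbuzz` is our own.

```python
def fizzbuzz(n):
    """Return the FizzBuzz sequence from 1 to n as a list."""
    results = []
    for num in range(1, n + 1):
        # Build the word piece by piece; multiples of 15 get both parts.
        word = ('fizz' if num % 3 == 0 else '') + ('buzz' if num % 5 == 0 else '')
        # An empty string is falsy, so `or` falls back to the number itself.
        results.append(word or num)
    return results

print(fizzbuzz(15))
```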

---

### 6. Iterate through the following list of animals, and print each one in all caps.

In [26]:
animals = ['duck', 'rat', 'boar', 'slug', 'mammoth', 'gazelle']

In [31]:
for animal in animals:
    print(animal.upper())

DUCK
RAT
BOAR
SLUG
MAMMOTH
GAZELLE


---

### 7. Iterate through the animals list. Capitalize the first letter and append the modified animals to a new list.

In [33]:
# A:
new_list = []
for animal in animals:
    new_list.append(animal.capitalize())
new_list    

['Duck', 'Rat', 'Boar', 'Slug', 'Mammoth', 'Gazelle']
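The same transformation is often written as a list comprehension, which builds the new list in a single expression:

```python
animals = ['duck', 'rat', 'boar', 'slug', 'mammoth', 'gazelle']
# [expression for item in iterable] builds a new list in one step
new_list = [animal.capitalize() for animal in animals]
print(new_list)  # ['Duck', 'Rat', 'Boar', 'Slug', 'Mammoth', 'Gazelle']
```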

---

### 8. Iterate through the animals. Print out the animal name and the number of vowels in the name.
Hint: You may need to create a variable of vowels for comparison.

In [37]:
# A
vowels = ['a', 'e', 'i', 'o', 'u']

for animal in animals:
    count = 0
    for letter in animal:
        if letter in vowels:
            count += 1
    print(animal + ' ' + str(count))

duck 1
rat 1
boar 2
slug 1
mammoth 2
gazelle 3


---
<a id='functions'></a>
# Functions
---

Just as `for` loops let us repeat a task across a series of objects, functions let us package a block of code under a name and reuse it. We write the block once, then call the function whenever we want to use it, passing in any inputs it needs as arguments.


Let's make some pseudocode.
```python
# Define the function name and the parameters (inputs) it needs.
    # Perform actions.
    # Optional: Return output.
```

Let's create a function that takes two numbers as arguments and prints their sum, difference, and product.

In [None]:
def arithmetic(num1, num2):
    print(num1 + num2)
    print(num1 - num2)
    print(num1 * num2)
    
arithmetic(3,5)

Once we define the function, it exists until we restart the kernel, close the notebook, or redefine it.

In [None]:
arithmetic(4,10)
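Printing shows the results but doesn't let you reuse them. A function can instead `return` values; returning several values separated by commas produces a tuple, which can be unpacked. `arithmetic_results` is a hypothetical variant of the function above.

```python
def arithmetic_results(num1, num2):
    """Hypothetical variant of arithmetic() that returns instead of printing."""
    return num1 + num2, num1 - num2, num1 * num2  # a tuple of three values

total, difference, product = arithmetic_results(3, 5)  # unpack the tuple
print(total, difference, product)  # 8 -2 15
```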

### 9. Write a function that takes a word as an argument and returns the number of vowels in the word.

Try it out on three words.

In [39]:
# A:
def count_vowels(word):
    word = word.lower()  # so uppercase vowels are counted too
    vowels = ['a', 'e', 'i', 'o', 'u']
    count = 0
    for letter in word:
        if letter in vowels:
            count += 1
    return count


In [42]:
count_vowels('animal')

3

---

### 10. Write a function to calculate the area of a triangle using a height and width.

Test it out.

In [43]:
def triangle(h, w):
    # area = height * width (the base) / 2; "/" always returns a float in Python 3
    return h * w / 2

In [44]:
# A:
triangle(3,4)

6.0

---
<a id='while_loops'></a>
# `while` Loops
---


`while` loops are another way to perform repetitive tasks (iteration). A `for` loop performs a task for each item in a _finite list_. A `while` loop repeats a task until a _specific threshold or criterion is met_. Be careful: if the criterion is never met, the loop runs forever.

_We say "list", but we are not just talking about a Python list datatype. We're including any datatype where information can be iterated through._

Let's look at some pseudocode.

```python
# A threshold or criteria is set.
    # As long as the threshold or criteria isn't met,
    # perform a task.
    # Check threshold/criteria.
        # If threshold/criteria is met or exceeded,
            # break loop.
        # If not, repeat.
    
```

Example of an infinite `while` loop:

```python
x = 0
while x < 10:
    print(x)
```

Because the value assigned to `x` never changes and always remains below 10, this loop prints `0` forever, until you interrupt or restart the kernel.
We can fix this infinite loop by incrementing `x` inside the loop.

```python
x = 0
while x < 10:
    print(x)
    x = x + 1
```
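The pseudocode above also mentions breaking out of the loop. A `break` statement ends a loop immediately, which is a common pattern with `while True`. In this sketch, the loop adds 1, 2, 3, ... to a running total and stops once the total passes 20.

```python
total = 0
count = 0
while True:
    count += 1
    total += count
    if total > 20:  # stop as soon as the running total passes 20
        break
print(count, total)  # 6 21
```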



### 11. Use `while` loops and strings.

Iterate over the following sentence repeatedly, counting the number of vowels in the sentence until you have tallied 1 million. Print out the number of iterations it took to reach that amount.

In [None]:
sentence = "A MAN KNOCKED ON MY DOOR AND ASKED FOR A SMALL DONATION TOWARDS THE LOCAL SWIMMING POOL SO I GAVE HIM A GLASS OF WATER"

In [None]:
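# A:
vowel_count = 0
iteration_count = 0
while vowel_count < 1000000:
    for letter in sentence:
        # the sentence is uppercase, so compare against uppercase vowels
        if letter in ['A', 'E', 'I', 'O', 'U']:
            vowel_count += 1
    iteration_count += 1
print(iteration_count)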
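Because each pass over the sentence adds the same number of vowels, the loop's answer can also be found with arithmetic: count the vowels once, then use ceiling division. This is a sketch for checking the loop version, not a replacement for the exercise.

```python
import math

sentence = "A MAN KNOCKED ON MY DOOR AND ASKED FOR A SMALL DONATION TOWARDS THE LOCAL SWIMMING POOL SO I GAVE HIM A GLASS OF WATER"

# Count the vowels in a single pass over the sentence
vowels_per_pass = sum(1 for letter in sentence.upper() if letter in 'AEIOU')

# Every pass adds the same amount, so the number of full passes needed
# to reach at least 1,000,000 vowels is a ceiling division
passes = math.ceil(1_000_000 / vowels_per_pass)
print(vowels_per_pass, passes)
```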
    

---

### 12. Try to convert elements in a list to floats.

Create a new list with the converted numbers. If something cannot be converted, skip it and append nothing to the new list.

_Hint: Use error-handling methods._

In [None]:
corrupted = ['!1', '23.1', '23.4.5', '??12', '.12', '12-12', '-11.1', '0-1', '*12.1', '1000']

In [None]:
# A:
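One possible sketch of an answer uses `try`/`except`: `float()` raises a `ValueError` for strings it cannot parse, and the `except` branch simply skips them.

```python
corrupted = ['!1', '23.1', '23.4.5', '??12', '.12', '12-12', '-11.1', '0-1', '*12.1', '1000']

converted = []
for item in corrupted:
    try:
        converted.append(float(item))  # succeeds only for valid numeric strings
    except ValueError:
        pass                           # skip anything float() cannot parse
print(converted)  # [23.1, 0.12, -11.1, 1000.0]
```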

---
<a id='coffee_preference'></a>

# Practice Control Flow on Coffee Preference Data Set

### 13. Load coffee preference data from file and print.

The code to load in the data is provided below. 

The `with open(..., 'r') as f:` opens up a file in "read" mode (rather than "write"), and assigns this opened file to `f`. 

We can then use the file's `.readlines()` method, which returns a list with one string per line of the CSV file (each still ending in a newline character), and assign it to the variable `lines`.

In [5]:
# relative path, resolved from the notebook's current working directory
with open('assets/datasets/coffee-preferences.csv', 'r') as f:
    lines = f.readlines()

In [2]:
import os
# relative file paths are resolved from the current working directory
os.getcwd()

'C:\\Users\\Tung\\python-foundations-master'

#### Iterate through lines and print them out.

In [6]:
# A:
for line in lines:
    print(line)

#### Print out just the lines object by typing `lines` in a cell and running it (Shift + Enter).

In [None]:
# A:


---

### 14. Remove the remaining newline `'\n'` characters with a `for` loop.

Iterate through the lines of the data and remove the unwanted newline characters.

**`.replace('\n', '')`** is a built-in string method that takes the substring you want to replace as its first argument and the string to replace it with as its second.

In [10]:
# A:
line_2 = []
for line in lines:
    line_2.append(line.replace('\n',''))
print(line_2)

['Timestamp,Name,Starbucks,PhilzCoffee,BlueBottleCoffee,PeetsTea,CaffeTrieste,GrandCoffee,RitualCoffee,FourBarrel,WorkshopCafe', '3/17/2015 18:37:58,Alison,3,5,4,3,,,5,5,', '3/17/2015 18:38:09,April,4,5,5,3,,,3,,5', '3/17/2015 18:38:25,Vijay,3,5,5,5,3,2,1,1,1', '3/17/2015 18:38:28,Vanessa,1,5,5,2,,,3,2,3', '3/17/2015 18:38:46,Isabel,1,4,4,2,4,,4,4,', '3/17/2015 18:39:01,India,5,3,3,3,3,1,,,3', '3/17/2015 18:39:01,Dave H,4,5,,5,,,,,', '3/17/2015 18:39:05,Deepthi,3,5,,2,,,,,2', '3/17/2015 18:39:14,Ramesh,3,4,,3,,,,,4', '3/17/2015 18:39:23,Hugh Jass,1,5,5,4,5,2,5,4,1', '3/17/2015 18:39:23,Alex,4,5,,3,,,,,', '3/17/2015 18:39:30,Ajay Anand,3,4,4,3,5,,,,', '3/17/2015 18:39:35,David Feng,2,3,4,2,2,,5,4,3', '3/17/2015 18:39:42,Zach,3,4,4,3,,,,,5', '3/17/2015 18:40:44,Matt,3,5,4,3,2,2,4,3,2', '3/17/2015 18:40:49,Markus,3,5,,3,,,4,,', '3/17/2015 18:41:18,Otto,4,2,2,5,,,3,3,3', '3/17/2015 18:41:23,Alessandro,1,5,3,2,,,4,3,', '3/17/2015 18:41:35,Rocky,3,5,4,3,3,3,4,4,3', '3/17/2015 18:42:01,cheong

---

### 15. Split the lines into "header" and "data" variables.

The header is the first string in the list of strings. It contains the column names of our data.

In [21]:
# A:
header = line_2[0]
data = line_2[1:]
header
data

['3/17/2015 18:37:58,Alison,3,5,4,3,,,5,5,',
 '3/17/2015 18:38:09,April,4,5,5,3,,,3,,5',
 '3/17/2015 18:38:25,Vijay,3,5,5,5,3,2,1,1,1',
 '3/17/2015 18:38:28,Vanessa,1,5,5,2,,,3,2,3',
 '3/17/2015 18:38:46,Isabel,1,4,4,2,4,,4,4,',
 '3/17/2015 18:39:01,India,5,3,3,3,3,1,,,3',
 '3/17/2015 18:39:01,Dave H,4,5,,5,,,,,',
 '3/17/2015 18:39:05,Deepthi,3,5,,2,,,,,2',
 '3/17/2015 18:39:14,Ramesh,3,4,,3,,,,,4',
 '3/17/2015 18:39:23,Hugh Jass,1,5,5,4,5,2,5,4,1',
 '3/17/2015 18:39:23,Alex,4,5,,3,,,,,',
 '3/17/2015 18:39:30,Ajay Anand,3,4,4,3,5,,,,',
 '3/17/2015 18:39:35,David Feng,2,3,4,2,2,,5,4,3',
 '3/17/2015 18:39:42,Zach,3,4,4,3,,,,,5',
 '3/17/2015 18:40:44,Matt,3,5,4,3,2,2,4,3,2',
 '3/17/2015 18:40:49,Markus,3,5,,3,,,4,,',
 '3/17/2015 18:41:18,Otto,4,2,2,5,,,3,3,3',
 '3/17/2015 18:41:23,Alessandro,1,5,3,2,,,4,3,',
 '3/17/2015 18:41:35,Rocky,3,5,4,3,3,3,4,4,3',
 '3/17/2015 18:42:01,cheong-tseng eng,3,1,,,,,4,,']

---

### 16. Split the header and the data strings on commas.

To split a string on the comma character, use the built-in string method **`.split(',')`**.

Split the header on commas, then print it. You can see that the original string is now a list containing items that were originally separated by commas.

In [25]:
header_list = header.split(',')
print(header_list)
data_list = []
for line in data:
    data_list.append(line.split(','))
data_list[0][1]

['Timestamp', 'Name', 'Starbucks', 'PhilzCoffee', 'BlueBottleCoffee', 'PeetsTea', 'CaffeTrieste', 'GrandCoffee', 'RitualCoffee', 'FourBarrel', 'WorkshopCafe']


'Alison'
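Splitting on commas works for this file, but it breaks when a field itself contains a comma inside quotes. The standard-library `csv` module handles that case. The two sample lines below are hypothetical, not taken from the coffee data.

```python
import csv

sample = ['Name,Comment', 'Alison,"Strong, but smooth"']

# str.split(',') also splits inside the quoted field
print([line.split(',') for line in sample])

# csv.reader respects the quotes; it accepts any iterable of strings
rows = list(csv.reader(sample))
print(rows)  # [['Name', 'Comment'], ['Alison', 'Strong, but smooth']]
```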

---

### 17. Remove the "Timestamp" column.

We aren't interested in the "Timestamp" column in our data, so remove it from the header and the data list.

Removing the Timestamp from the header can be done with list functions or with slicing. To remove the timestamp from each row of the data, use a `for` loop.

Print out the new data object with the timestamps removed.
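Here is a small illustration of the two approaches mentioned above, using a shortened, made-up stand-in for `header_list`. Slicing returns a new list, while `list.pop()` and `del` remove the item from the existing list in place.

```python
# A shortened stand-in for header_list (the real one has 11 entries).
header = ['Timestamp', 'Name', 'Starbucks', 'PhilzCoffee']

# Option 1: slicing returns a NEW list without index 0; `header` is unchanged.
sliced = header[1:]
print(sliced)    # ['Name', 'Starbucks', 'PhilzCoffee']

# Option 2: list.pop(0) removes index 0 IN PLACE and returns the removed item.
in_place = list(header)   # copy, so this demo does not modify `header`
removed = in_place.pop(0)
print(removed)   # Timestamp
print(in_place)  # ['Name', 'Starbucks', 'PhilzCoffee']

# Option 3: `del` also removes by index in place, but returns nothing.
another = list(header)
del another[0]
print(another)   # ['Name', 'Starbucks', 'PhilzCoffee']
```

Slicing is the safer choice when other code still needs the original list, since it leaves that list untouched.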

In [32]:
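# A:
header_list = header_list[1:]  # slicing drops "Timestamp" (index 0)
for row in data_list:
    del row[0]                 # drop each row's timestamp in place
print(header_list)
print(data_list)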

['Name', 'Starbucks', 'PhilzCoffee', 'BlueBottleCoffee', 'PeetsTea', 'CaffeTrieste', 'GrandCoffee', 'RitualCoffee', 'FourBarrel', 'WorkshopCafe']
[['Alison', '3', '5', '4', '3', '', '', '5', '5', ''], ['April', '4', '5', '5', '3', '', '', '3', '', '5'], ['Vijay', '3', '5', '5', '5', '3', '2', '1', '1', '1'], ['Vanessa', '1', '5', '5', '2', '', '', '3', '2', '3'], ['Isabel', '1', '4', '4', '2', '4', '', '4', '4', ''], ['India', '5', '3', '3', '3', '3', '1', '', '', '3'], ['Dave H', '4', '5', '', '5', '', '', '', '', ''], ['Deepthi', '3', '5', '', '2', '', '', '', '', '2'], ['Ramesh', '3', '4', '', '3', '', '', '', '', '4'], ['Hugh Jass', '1', '5', '5', '4', '5', '2', '5', '4', '1'], ['Alex', '4', '5', '', '3', '', '', '', '', ''], ['Ajay Anand', '3', '4', '4', '3', '5', '', '', '', ''], ['David Feng', '2', '3', '4', '2', '2', '', '5', '4', '3'], ['Zach', '3', '4', '4', '3', '', '', '', '', '5'], ['Matt', '3', '5', '4', '3', '2', '2', '4', '3', '2'], ['Markus', '3', '5', '', '3', '', '',

---

### 18. Convert numeric columns to floats and empty fields to `None`.

Iterate through the data and construct a new list of lists in which the numeric ratings are converted from strings to floats and the empty fields (empty strings, `''`) are replaced with the `None` object.

Use a nested `for` loop (a `for` loop within another `for` loop) to get the job done. You will likely need to use `if…else` conditional statements as well.

Print out the new data object to make sure you've succeeded.
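One alternative to the `if…else` test is the `try, except` pattern from Part 2: `float('')` raises a `ValueError`, so the exception can be caught and turned into `None`. The helper name `to_rating` below is hypothetical, and it is applied only to the rating fields (index 1 onward) so that names are left as strings.

```python
def to_rating(field):
    """Hypothetical helper: convert a rating string to float, or '' to None."""
    try:
        return float(field)
    except ValueError:
        # float('') (or any other non-numeric text) raises ValueError
        return None

# One row after the timestamp was removed (copied from the step 17 output).
row = ['Dave H', '4', '5', '', '5', '', '', '', '', '']

# Keep the name as-is and convert only the rating fields.
converted = [row[0]]
for field in row[1:]:
    converted.append(to_rating(field))
print(converted)
# ['Dave H', 4.0, 5.0, None, 5.0, None, None, None, None, None]
```

The trade-off is that any unexpected non-numeric text is also silently turned into `None`, whereas an explicit `== ''` check followed by `float()` would raise an error and make the bad value visible.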

In [57]:
# A:
data_list_2 = []
for row in data_list:
    new_row = [row[0]]            # keep the name as a string
    for value in row[1:]:         # the remaining fields are ratings
        if value == '':
            new_row.append(None)  # empty field -> None
        else:
            new_row.append(float(value))
    data_list_2.append(new_row)
data_list_2

[['Alison', 3.0, 5.0, 4.0, 3.0, None, None, 5.0, 5.0, None],
 ['April', 4.0, 5.0, 5.0, 3.0, None, None, 3.0, None, 5.0],
 ['Vijay', 3.0, 5.0, 5.0, 5.0, 3.0, 2.0, 1.0, 1.0, 1.0],
 ['Vanessa', 1.0, 5.0, 5.0, 2.0, None, None, 3.0, 2.0, 3.0],
 ['Isabel', 1.0, 4.0, 4.0, 2.0, 4.0, None, 4.0, 4.0, None],
 ['India', 5.0, 3.0, 3.0, 3.0, 3.0, 1.0, None, None, 3.0],
 ['Dave H', 4.0, 5.0, None, 5.0, None, None, None, None, None],
 ['Deepthi', 3.0, 5.0, None, 2.0, None, None, None, None, 2.0],
 ['Ramesh', 3.0, 4.0, None, 3.0, None, None, None, None, 4.0],
 ['Hugh Jass', 1.0, 5.0, 5.0, 4.0, 5.0, 2.0, 5.0, 4.0, 1.0],
 ['Alex', 4.0, 5.0, None, 3.0, None, None, None, None, None],
 ['Ajay Anand', 3.0, 4.0, 4.0, 3.0, 5.0, None, None, None, None],
 ['David Feng', 2.0, 3.0, 4.0, 2.0, 2.0, None, 5.0, 4.0, 3.0],
 ['Zach', 3.0, 4.0, 4.0, 3.0, None, None, None, None, 5.0],
 ['Matt', 3.0, 5.0, 4.0, 3.0, 2.0, 2.0, 4.0, 3.0, 2.0],
 ['Markus', 3.0, 5.0, None, 3.0, None, None, 4.0, None, None],
 ['Otto', 4.0, 2.0, 

---

### 19. Count the `None` values per person, and put counts in a dictionary.

Use a `for` loop to count the number of `None` values per person. Create a dictionary with the names of the people as keys, and the counts of `None` as values.

Who rated the most coffee brands? Who rated the least?

In [55]:
# A:
none_cnts = {}
for line in data_list_2:
    none_cnt = 0
    for item in line:
        if item is None:
            none_cnt += 1
    none_cnts[line[0]] = none_cnt
print(none_cnts)

{'Dave H': 6, 'Ramesh': 5, 'Alex': 6, 'Rocky': 0, 'Zach': 4, 'Vanessa': 2, 'Deepthi': 5, 'India': 2, 'Matt': 0, 'Isabel': 2, 'April': 3, 'Alessandro': 3, 'Ajay Anand': 4, 'Otto': 2, 'Markus': 5, 'Hugh Jass': 0, 'David Feng': 1, 'Vijay': 0, 'cheong-tseng eng': 6, 'Alison': 3}
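To answer the questions above, look for the smallest and largest `None` counts: fewer `None` values means more brands rated. Calling `max()` or `min()` with a `key` would return only one name, but several people tie here, so this sketch collects every name that matches the extreme count. It uses the counts printed above, copied into a literal, so it runs on its own.

```python
# The counts printed above, copied into a dict literal.
none_cnts = {'Dave H': 6, 'Ramesh': 5, 'Alex': 6, 'Rocky': 0, 'Zach': 4,
             'Vanessa': 2, 'Deepthi': 5, 'India': 2, 'Matt': 0, 'Isabel': 2,
             'April': 3, 'Alessandro': 3, 'Ajay Anand': 4, 'Otto': 2,
             'Markus': 5, 'Hugh Jass': 0, 'David Feng': 1, 'Vijay': 0,
             'cheong-tseng eng': 6, 'Alison': 3}

# Fewest None values -> rated the most brands; most None -> rated the fewest.
min_none = min(none_cnts.values())
max_none = max(none_cnts.values())

rated_most = [name for name, cnt in none_cnts.items() if cnt == min_none]
rated_least = [name for name, cnt in none_cnts.items() if cnt == max_none]
print(rated_most)   # ['Rocky', 'Matt', 'Hugh Jass', 'Vijay']
print(rated_least)  # ['Dave H', 'Alex', 'cheong-tseng eng']
```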


---

### 20. Calculate average rating per coffee brand.

**Excluding `None` values**, calculate the average rating per brand of coffee.

The final output should be a dictionary with keys as the coffee brand names, and their average rating as the values.

Remember that the average can be calculated as the sum of the ratings over the number of ratings:

```python
average_rating = float(sum(ratings_list))/len(ratings_list)
```

Print your dictionary to see the average brand ratings.
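A minimal sketch of the formula above with the `None` values filtered out first. The helper name `average_excluding_none` and the ratings are made up for illustration. In Python 3, `/` already performs true division, so the `float()` call in the formula is only required under Python 2, where `/` between two integers truncates.

```python
def average_excluding_none(ratings):
    """Hypothetical helper: mean of the non-None values, or None if there are none."""
    valid = [r for r in ratings if r is not None]
    if not valid:
        return None  # avoid ZeroDivisionError when nobody rated the brand
    return sum(valid) / len(valid)

# Made-up ratings for one brand: two people left it blank.
print(average_excluding_none([4.0, 5.0, None, 3.0, None]))  # 4.0
print(average_excluding_none([None, None]))                 # None
```

Dividing by `len(valid)` rather than by the total number of people is what keeps the blanks from dragging the average down.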

In [64]:
# A:
brand_avgs = {}
# header_list[1:] holds the brand names; in each row they start at index 1.
for col, brand in enumerate(header_list[1:], start=1):
    ratings = []
    for row in data_list_2:
        if row[col] is not None:  # exclude missing ratings
            ratings.append(row[col])
    brand_avgs[brand] = sum(ratings) / len(ratings)
print(brand_avgs)


---

### 21. Create a list containing only the people's names.

In [66]:
# A:
name_list = []
for line in data_list_2:
    name = line[0]
    name_list.append(name)
name_list   

['Alison',
 'April',
 'Vijay',
 'Vanessa',
 'Isabel',
 'India',
 'Dave H',
 'Deepthi',
 'Ramesh',
 'Hugh Jass',
 'Alex',
 'Ajay Anand',
 'David Feng',
 'Zach',
 'Matt',
 'Markus',
 'Otto',
 'Alessandro',
 'Rocky',
 'cheong-tseng eng']
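The same list can also be built in a single line with a list comprehension. The rows below are a made-up sample in the same shape as `data_list_2` (name first, then ratings), shortened to three ratings each.

```python
# A few rows in the same shape as data_list_2 (name first, then ratings).
rows = [
    ['Alison', 3.0, 5.0, 4.0],
    ['April', 4.0, 5.0, 5.0],
    ['Vijay', 3.0, 5.0, 5.0],
]

# Same result as the for loop above: take index 0 from every row.
names = [row[0] for row in rows]
print(names)  # ['Alison', 'April', 'Vijay']
```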

---

### 22. Picking a name at random. What are the odds of choosing the same name three times in a row?

Now we'll use a `while` loop to "brute force" the odds of randomly choosing the same name three times in a row from the list of names.

"Brute force" is a common programming term for solving a problem by exhaustive trial rather than by a more efficient method. It's brute force here because we could calculate the odds directly with probability instead of simulating many random draws.

Below, I've imported the **`random`** module, which provides the key function for this exercise: **`random.choice()`**.
The function takes a list as an argument and returns one of its elements at random.
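A small sketch of `random.choice()` on its own. The names are a short sample from the list above, and `random.seed()` is not used in the original notebook; it is added here only so the picks repeat from run to run, which makes results easier to check.

```python
import random

people = ['Alison', 'April', 'Vijay', 'Vanessa']  # short sample of the names

random.seed(42)  # fix the generator so the picks are reproducible
first_run = [random.choice(people) for _ in range(5)]

random.seed(42)  # same seed -> same sequence of picks
second_run = [random.choice(people) for _ in range(5)]

print(first_run)
print(first_run == second_run)  # True
```

Note that `random.choice()` raises an `IndexError` if the list is empty.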

In [68]:
import random
# Choose a random person from the list of people:
# random.choice(people)
def choose3times(people):
    """Return True if three random draws from `people` all pick the same name."""
    first = random.choice(people)
    second = random.choice(people)
    third = random.choice(people)
    return first == second == third

Write a function to choose a person from the list randomly three times and check if they are all the same.

Define a function that has the following properties:

1. Takes a list (your list of names) as an argument.
2. Selects a name using `random.choice(people)` three separate times.
3. Returns `True` if the name was the same all three times. Otherwise returns `False`.
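Two quick sanity checks against the spec above, as a sketch. `choose3times` is redefined here with the same logic as the `In [68]` cell so the block runs on its own, and the one- and two-name lists are made up for illustration. With one name every draw must match; with two names, the chance that three draws all match is 2 × (1/2)³ = 0.25.

```python
import random

def choose3times(people):
    """Return True if three random draws from `people` all pick the same name."""
    first = random.choice(people)
    second = random.choice(people)
    third = random.choice(people)
    return first == second == third

# With only one name, every draw matches, so the result must always be True.
print(all(choose3times(['Alison']) for _ in range(1000)))  # True

# With two names, P(all three match) = 2 * (1/2)**3 = 0.25.
random.seed(0)
trials = 10_000
hits = sum(choose3times(['Alison', 'April']) for _ in range(trials))
print(hits / trials)  # should land near 0.25
```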

In [91]:
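# A:
# Repeat the experiment 100 times, recording how many tries each run took.
tries_per_run = []
for run in range(100):
    tries = 1  # count the successful try as well
    while not choose3times(name_list):
        tries += 1
    tries_per_run.append(tries)
# Estimated probability = 1 / (average number of tries per success)
1 / (sum(tries_per_run) / 100.0)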

0.0027171697959405485
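For comparison with the simulated estimate above, the exact odds follow directly from probability. Each draw is uniform over the n = 20 distinct names, so a particular name comes up three times with probability (1/n)³; summing over the n possible names gives n · (1/n)³ = 1/n². The sketch below uses `fractions.Fraction` to keep the arithmetic exact.

```python
from fractions import Fraction

n = 20  # number of distinct names in name_list

# P(all three draws equal) = sum over names of (1/n)^3 = n * (1/n)^3 = 1/n^2
p_exact = n * Fraction(1, n) ** 3
print(p_exact)         # 1/400
print(float(p_exact))  # 0.0025

# Expected number of tries until the first success (geometric distribution): 1/p
print(1 / p_exact)     # 400
```

The simulated value of about 0.0027 is close to 0.0025; it varies from run to run because only 100 experiments were averaged.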

---

### 23. Construct a `while` loop to run the choosing function until it returns `True`.

Run the function until you draw the same person three times using a `while` loop. Keep track of how many tries it took and print out the number of tries after it runs.
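A minimal sketch of one possible answer, assuming the `choose3times` function from step 22. It is redefined here, and a shortened, hypothetical list of names stands in for `name_list`, so the block runs on its own; in the notebook you would pass `name_list` instead.

```python
import random

# Mirrors choose3times from step 22 so this block runs on its own.
def choose3times(people):
    first = random.choice(people)
    second = random.choice(people)
    third = random.choice(people)
    return first == second == third

# Hypothetical shortened stand-in for name_list.
names = ['Alison', 'April', 'Vijay', 'Vanessa', 'Isabel']

tries = 1  # count the successful draw as a try too
while not choose3times(names):
    tries += 1
print('Same person drawn three times after', tries, 'tries')
```

Starting the counter at 1 means the printed number includes the final, successful try.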

In [None]:
# A:


<a name="conclusion"></a>
## Lesson Summary


Let's review what we learned today:

- Discussed why Python is popular for data science.
- Demonstrated variable assignment.
- Defined integers, strings, tuples, lists, and dictionaries.
- Demonstrated arithmetic operations and string operations.
- Reviewed `Python` control flow and conditional programming. 
- Implemented `for` and `while` loops to iterate through data structures.
- Applied `if…else` conditional statements.
- Created functions to perform repetitive actions.
- Demonstrated error-handling using `try, except` statements.
- Combined control flow and conditional statements to solve the classic "FizzBuzz" code challenge.
- Used `Python` control flow and functions to help us parse, clean, edit, and analyze the Coffee Preferences data set.



### Additional Questions?


....

### Additional Resources

- [Learn Python on Codecademy](https://www.codecademy.com/learn/python)
- [Learn Python the Hard Way](https://learnpythonthehardway.org)
- [Python Datatypes and Variables](http://www.python-course.eu/variables.php)
- [Python IF…ELIF…ELSE Statements](https://www.tutorialspoint.com/python/python_if_else.htm)
- [Python Loops](https://www.tutorialspoint.com/python/python_loops.htm)
- [Python Control Flow](https://python.swaroopch.com/control_flow.html)
