# Objectives
After going through this notebook, students will be able to write functions that:

- manipulate data
- apply conditional logic
- iterate through list-type structures
- leverage Python libraries

In [3]:
"Yeah!"

'Yeah!'

# [10] Data Types

Everything we do in code boils down to working with data, so let's get familiar with Python's data types and start building our comfort level with the language.

Let's also get comfortable with using the Jupyter Notebook environment!


**Let's start with integers and floats.**

In [4]:
2 + 2

4

In [5]:
7.25 / 5

1.45

**Try any mathematical operation and check that it does what we want.**

In [6]:
((3 - 5) / 42) + 53

52.95238095238095

**What about strings? What are strings?**

In [7]:
'Hello'

'Hello'

In [8]:
"Hello"

'Hello'

In [9]:
"Tom's Coffee"

"Tom's Coffee"

In [10]:
'Tom\'s coffee'

"Tom's coffee"

**There is another built-in, primitive data type: the Boolean.**

In [11]:
True

True

Let's try `False`

In [12]:
False

False

There's also `None`, which represents the absence of a value. A cell that evaluates to `None` displays no output.

In [13]:
None

**Casting**

- Let's play with `int()`, `bool()`, `str()`
- With `bool()`, let's reverse-engineer what it does by trying it on different inputs and seeing what the outputs are.

In [14]:
bool("Hello")

True

In [15]:
bool("")

False

In [16]:
bool(1280)

True

In [18]:
bool(0)

False

In [19]:
str(2.0)

'2.0'
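The cells above exercise `bool()` and `str()`. As a minimal sketch of the remaining cast, `int()`, the snippet below tries a few inputs chosen for illustration (they are not from the notebook).

```python
# int() truncates a float toward zero rather than rounding it.
whole = int(7.9)

# A string of digits can be cast to an integer.
parsed = int("42")

# Booleans cast to 1 and 0.
from_true = int(True)

# A string that is not a whole number raises ValueError.
try:
    int("2.5")
    failed = False
except ValueError:
    failed = True

print(whole, parsed, from_true, failed)  # 7 42 1 True
```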

# [15] String Manipulation
A significant share of ML work involves unstructured data such as text.

Give some examples and ask for more examples from the class:
- Analyzing restaurant reviews
- Detecting spam
- Classifying sentiment

How are these related to manipulating string data?
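One way to see the connection: each of these tasks starts by normalizing and inspecting raw text. The sketch below is a toy illustration, not a real spam detector. The `SPAM_PHRASES` list, the `looks_like_spam` helper, and the sample messages are all made up for this example.

```python
# Hypothetical phrases treated as spam signals (an assumption for this example).
SPAM_PHRASES = ["free money", "click here", "act now"]

def looks_like_spam(message):
    # Normalize: trim whitespace and lowercase so matching ignores case.
    cleaned = message.strip().lower()
    # Flag the message if any phrase appears somewhere in it.
    return any(phrase in cleaned for phrase in SPAM_PHRASES)

print(looks_like_spam("  CLICK HERE for FREE MONEY!  "))  # True
print(looks_like_spam("Lunch at noon?"))                  # False
```

Real systems rely on far richer features, but the first step is usually string cleanup of this kind.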

**There are many different things we can do with strings. Let's look at the docs:**

### https://docs.python.org/3/library/stdtypes.html#string-methods

**Let's see how we define variables.**

In [20]:
description = "Python is a great language to learn!"

In [21]:
description.lower()

'python is a great language to learn!'

**Now try uppercasing**

In [22]:
description.upper()

'PYTHON IS A GREAT LANGUAGE TO LEARN!'

In [23]:
description.capitalize()

'Python is a great language to learn!'

**Some other methods to try**

- `.capitalize()`
- `.endswith()`

In [25]:
description.endswith('learn!')

True

In [27]:
messy_string = '    Hello how are you? There are a lot of $pace$ ' 

## Exercise
Strip the extraneous whitespace from the above variable.

In [28]:
messy_string.strip()

'Hello how are you? There are a lot of $pace$'

## Exercise
Replace each dollar sign `$` in the string above with an `s`.

In [29]:
messy_string.strip().replace("$", "s")

'Hello how are you? There are a lot of spaces'

# [20] If / Else
Now that we've built a foundation in data, the next layer that we add is Boolean logic.

Ultimately, even the most complex program can usually be phrased as a series or tree of Boolean statements.


Suppose we are dealing with an e-Commerce site.


```
If the user has items that they have added to their shopping cart and they haven't taken any action to checkout in the last hour, send them a reminder email.
```
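The rule above can be sketched as a Python `if` statement. The variable names and values here (`items_in_cart`, `minutes_since_last_action`) are hypothetical stand-ins for whatever the site would actually track, and the `action` string stands in for sending an email.

```python
# Hypothetical state for one user (assumed values for illustration).
items_in_cart = 3
minutes_since_last_action = 75

# Both conditions must hold: a non-empty cart AND more than an hour of inactivity.
if items_in_cart > 0 and minutes_since_last_action > 60:
    action = "send reminder email"
else:
    action = "do nothing"

print(action)  # send reminder email
```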

Let's look at the syntax. Execute the following code:

In [37]:
x = 15

if x > 10:
    print("Wow! X is greater than 10!")
else:
    print("x is not greater than 10 :(")

Wow! X is greater than 10!


## Exercise
Create an `if/else` that implements the following logic:

`text = "This is an interesting sentence."`

If `text` is longer than 10 characters, print that it's a long string; otherwise, print that it's not a long string.

In [38]:
text = "This is an interesting sentence."

In [39]:
if len(text) > 10:
    print("Long string!")
else:
    print("Not a long string")

Long string!


# [20] Lists
Lists are a fundamental data structure in Python. Let's learn the syntax.

In [41]:
restaurants = ['Chipotle', 'TGIF', 'Nonna Maria', 'AA Sushi']

Let's go through the common operations together:

- add an element
- remove an element
- check if an element is in the list
- check the length of a list

In [42]:
restaurants.append("Burger King")

In [45]:
restaurants

['Chipotle', 'TGIF', 'Nonna Maria', 'AA Sushi', 'Burger King']

In [46]:
restaurants.pop()

'Burger King'

In [48]:
restaurants

['Chipotle', 'TGIF', 'Nonna Maria', 'AA Sushi']

In [49]:
'McDonalds' in restaurants

False

In [50]:
'Chipotle' in restaurants

True
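The last operation on the list above, checking the length, uses the built-in `len()`. The sketch below redefines `restaurants` so that it runs on its own. It also shows `.remove()`, which deletes an element by value, whereas `.pop()` with no argument removes the last element.

```python
restaurants = ['Chipotle', 'TGIF', 'Nonna Maria', 'AA Sushi']

# len() returns the number of elements in the list.
count_before = len(restaurants)

# .remove() deletes the first element equal to the given value.
restaurants.remove('TGIF')
count_after = len(restaurants)

print(count_before, count_after)  # 4 3
print(restaurants)                # ['Chipotle', 'Nonna Maria', 'AA Sushi']
```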

## Exercise
Suppose we have the following list

In [56]:
artists = ['Van Gogh', 'Michelangelo', 'Jackson Pollack']

Do the following operations:

- Add "Andy Warhol" to the list
- Remove "Jackson Pollack" from the list
- Check if "Andy Warhol" is in the list using the `in` syntax

In [53]:
artists.append("Andy Warhol")

In [55]:
artists

['Van Gogh', 'Michelangelo', 'Jackson Pollack', 'Andy Warhol']
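The cells above complete only the first step. One possible way to finish the exercise is sketched below. It redefines `artists` so that it runs on its own, and the name `has_warhol` is just an illustrative choice.

```python
artists = ['Van Gogh', 'Michelangelo', 'Jackson Pollack']

# Step 1: add an element to the end of the list.
artists.append("Andy Warhol")

# Step 2: remove an element by value.
artists.remove("Jackson Pollack")

# Step 3: check membership with `in`.
has_warhol = "Andy Warhol" in artists

print(artists)     # ['Van Gogh', 'Michelangelo', 'Andy Warhol']
print(has_warhol)  # True
```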

# [30] For Loops

A for loop, a form of iteration, is a powerful tool that lets us step through lists and list-like structures so that we can run logic on each element.

In [57]:
books = ["Harry Potter", "Catcher in the Rye", "The Great Gatsby", "Just Mercy", "Das Kapital"]

for item_name in books:
    print(item_name)

Harry Potter
Catcher in the Rye
The Great Gatsby
Just Mercy
Das Kapital


## Let's layer in logic

In [58]:
books = ["Harry Potter", "Catcher in the Rye", "The Great Gatsby", "Just Mercy", "Das Kapital"]

for item_name in books:
    if item_name == "Harry Potter":
        print("Expelliarmus!")
    else:
        print(item_name)

Expelliarmus!
Catcher in the Rye
The Great Gatsby
Just Mercy
Das Kapital


## Let's add some more complexity

In [59]:
books_with_ratings = [
    ["Harry Potter", 7.8],
    ["Catcher in the Rye", 9.2],
    ["The Great Gatsby", 8.7],
    ["Just Mercy", 7.1],
    ["Das Kapital", 7.0],
    ["The Hardy Boys", 6.5],
]

best_books = []

for item in books_with_ratings:
    rating = item[1]
    if rating > 7.5:
        best_books.append(item)

print(best_books)

[['Harry Potter', 7.8], ['Catcher in the Rye', 9.2], ['The Great Gatsby', 8.7]]
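An alternative sketch of the same filter: because every inner list has exactly two elements, the loop can unpack each pair into two names instead of indexing with `item[1]`. The names `title`, `rating`, and `best_titles`, and the shortened list, are choices made for this example.

```python
books_with_ratings = [
    ["Harry Potter", 7.8],
    ["Catcher in the Rye", 9.2],
    ["Just Mercy", 7.1],
]

best_titles = []

# Unpack each [title, rating] pair directly in the for statement.
for title, rating in books_with_ratings:
    if rating > 7.5:
        best_titles.append(title)

print(best_titles)  # ['Harry Potter', 'Catcher in the Rye']
```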


## Exercise
Given a list of courses, return only the courses that the student hasn't taken.

In [61]:
courses_already_taken = ['Mathematics 101', 'Astronomy 203', 'English 405']

In [62]:
all_courses = [
    ['Mathematics 101', 5.1],
    ['Mathematics 102', 6.7],
    ['Mathematics 103', 8.8],
    ['Mathematics 201', 7.5],
    ['Mathematics 202', 6.1],
    ['Spanish 407', 7.2],
    ['Astronomy 203', 6.7],
    ['Astronomy 204', 8.8],
    ['English 101', 5.5],
    ['English 102', 6.5],
    ['English 405', 6.6],
    ['Physics 101', 7.2],
    ['Physics 102', 7.8],
]
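
One possible solution to the exercise as stated (untaken courses only, ignoring ratings) is sketched below. It uses a shortened copy of the course list so that it runs on its own, and the name `not_taken` is an illustrative choice.

```python
courses_already_taken = ['Mathematics 101', 'Astronomy 203', 'English 405']

# Shortened copy of all_courses, for illustration.
all_courses = [
    ['Mathematics 101', 5.1],
    ['Mathematics 102', 6.7],
    ['Astronomy 203', 6.7],
    ['Physics 101', 7.2],
]

not_taken = []
for item in all_courses:
    # item[0] is the course name; keep the course only if it hasn't been taken.
    if item[0] not in courses_already_taken:
        not_taken.append(item)

print(not_taken)  # [['Mathematics 102', 6.7], ['Physics 101', 7.2]]
```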

In [65]:
results = []
for item in all_courses:
    if (item[0] not in courses_already_taken) and (item[1] > 7.0):
        results.append(item)

In [67]:
results

[['Mathematics 103', 8.8],
 ['Mathematics 201', 7.5],
 ['Spanish 407', 7.2],
 ['Astronomy 204', 8.8],
 ['Physics 101', 7.2],
 ['Physics 102', 7.8]]

In [68]:
[item for item in all_courses if item[0] not in courses_already_taken and item[1] > 7.0]

[['Mathematics 103', 8.8],
 ['Mathematics 201', 7.5],
 ['Spanish 407', 7.2],
 ['Astronomy 204', 8.8],
 ['Physics 101', 7.2],
 ['Physics 102', 7.8]]

## Exercise
Now return only the courses that the student hasn't taken AND whose rating is higher than 7.0.

# [40] Functions
Functions are the heart and soul of programming. Much of what we will do in data munging and preparation is write functions that we can apply to columns in our dataset to transform them into what we need.

**Let's look at the syntax.**

In [72]:
FAVORITES = ['Chipotle', 'California Pizza Kitchen', 'Paris Bistro']

def is_in_favorites(restaurant):
    return restaurant in FAVORITES

**Key things to notice**

- colon
- indentation
- return statement
- name of variables
- `def`
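
As an illustration of those points, here is the same function again with each piece annotated. The comments and the two `print` calls are explanatory additions; the function itself is unchanged.

```python
FAVORITES = ['Chipotle', 'California Pizza Kitchen', 'Paris Bistro']

# `def` starts the definition, `restaurant` is the parameter name,
# and the colon ends the header line.
def is_in_favorites(restaurant):
    # The indented lines form the function body.
    # `return` hands the result back to the caller.
    return restaurant in FAVORITES

print(is_in_favorites("Chipotle"))     # True
print(is_in_favorites("Burger King"))  # False
```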

**Let's call the function.**

In [75]:
is_in_favorites("Burger King")

False

## Exercise
Create a function to transform Celsius to Fahrenheit.
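
One possible solution, using the standard conversion `F = C * 9 / 5 + 32`. The function name `celsius_to_fahrenheit` is an illustrative choice.

```python
def celsius_to_fahrenheit(celsius):
    # Scale by 9/5, then shift by 32.
    return celsius * 9 / 5 + 32

print(celsius_to_fahrenheit(0))    # 32.0
print(celsius_to_fahrenheit(100))  # 212.0
```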

## Exercise
A lot of the time in data munging, we are cleaning up dirty data. Data can be dirty for many reasons:

- outliers
- typos
- wrong data types
- etc

Create a function that does the following:

```
Turns $12,305,200.22 into 12305200.22
```

**BONUS** Gracefully handle bad data by returning `None`.

In [81]:
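def clean_amount(val):
    try:
        # Strip the currency symbol and thousands separators, then parse.
        return float(val.replace('$', '').replace(',', ''))
    except (AttributeError, ValueError):
        # AttributeError: val is not a string. ValueError: val is not a number.
        return None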

In [82]:
clean_amount("$12,305,200.22")

12305200.22

In [84]:
clean_amount("HELLO!")

**Let's take a look at the `random` library in the Python docs.**

## https://docs.python.org/3/library/random.html

## Exercise
Go ahead and use the `random` library to generate a random number.

In [85]:
import random

In [95]:
random.randint(1, 10)

3
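
Beyond `randint`, two other functions from the `random` module are handy here: `random.random()` and `random.choice()`. A short sketch follows. The `options` list is made up for this example, and the printed values change on every run, so none are shown.

```python
import random

# A float in the half-open interval [0.0, 1.0).
fraction = random.random()

# An integer between 1 and 10, inclusive on both ends.
roll = random.randint(1, 10)

# One element picked uniformly at random from a non-empty sequence.
options = ['rock', 'paper', 'scissors']
pick = random.choice(options)

print(fraction, roll, pick)
```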

## Lab
Create a function that randomly returns one of your favorite restaurants, given that it has a rating of 6.5 or higher.

**Bonus** This function should not return the same restaurant twice in a row.

In [99]:
FAVORITE_RESTAURANTS = [
    ['Chipotle', 7.6],
    ['Subway', 6.2],
    ['Boston Market', 6.4],
    ['Pizza Hut', 7.1],
    ['Don Ramon Cuban Eatery', 8.3],
    ['Paris Bistro', 7.7],
    ['AA Sushi', 8.5],
    ['Thai Palace', 8.3]
]

In [100]:
ELIGIBLE_RESTAURANTS = [rec for rec in FAVORITE_RESTAURANTS if rec[1] >= 6.5]

In [126]:
data = {'latest_rec': None}

def recommend():
    restaurants = [r for r in ELIGIBLE_RESTAURANTS if r != data['latest_rec']]
    latest_rec = random.choice(restaurants)
    data['latest_rec'] = latest_rec
    return latest_rec

In [125]:
recommend()

['Pizza Hut', 7.1]
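
To check the bonus requirement, the sketch below calls a copy of `recommend()` repeatedly and verifies that no two consecutive results match. It uses a shortened eligible list so that it runs on its own, and the loop count of 20 is arbitrary.

```python
import random

# Shortened copy of the eligible list, for illustration.
ELIGIBLE_RESTAURANTS = [['Chipotle', 7.6], ['Pizza Hut', 7.1], ['AA Sushi', 8.5]]

data = {'latest_rec': None}

def recommend():
    # Exclude the previous recommendation, then pick from what remains.
    restaurants = [r for r in ELIGIBLE_RESTAURANTS if r != data['latest_rec']]
    latest_rec = random.choice(restaurants)
    data['latest_rec'] = latest_rec
    return latest_rec

picks = [recommend() for _ in range(20)]

# Compare each pick with the one that follows it.
no_repeats = all(a != b for a, b in zip(picks, picks[1:]))
print(no_repeats)  # True
```

Keeping the last pick in the module-level `data` dictionary lets the function remember state between calls without needing the `global` keyword.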