## Lambda functions
- When to use, what's the upside?

## Data Cleaning
- How to even

## Dataframes (working with/on DFs)
- When to use a dataframe
- How to clean up dataframes
- How to replace stuff, change data types
- What does drop_duplicates do?
- Syntax: Between dataframes and series, where do the square brackets go?

## Lambdas 
- A function definition isn't the same thing as the function being run
- Think of a function definition/body as a spell in your spellbook or a spell in a spell slot
- Running the function is casting that spell
- A lambda is a function definition/body with no name

In [1]:
def add(a, b):
    return a + b

type(add)

function

In [2]:
# A function definition is a blueprint, a recipe
# A car horn honks; the blueprint is the recipe for building a horn that can honk
# We call/run/execute/invoke a function when we type its name followed by parentheses
add(2, 34)

36

In [3]:
# No parentheses, no function execution
# A function name on its own refers to the function object (the blueprint) without running it
add

<function __main__.add(a, b)>

In [4]:
# Assigning a lambda to a name works, but PEP 8 recommends a def statement for named functions
add = lambda x, y: x + y
type(add)

function

## So what do we know so far?
- Lambdas let us define a small function in one line, as a single expression, without giving it a name
- The "return" is implicit: a lambda returns the value of its expression
- On their own, lambdas add little over a regular def
- Practitioners use lambdas when a function or method takes another function as its input
- You'll see lambdas used with:
    - sorting a list of dictionaries by a specific dictionary key (sorted, .sort, min, max)
    - .apply
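
A minimal sketch of the .apply case, assuming pandas is installed: the same price-cleaning logic written first as a named function (strip_dollar, a hypothetical helper, not part of the original notebook) and then as a lambda. Both are passed to Series.apply without parentheses, and both produce the same result.

```python
import pandas as pd

prices = pd.Series(["$2.99", "$0.99", "$40.00"])


# Named version: define it once, then pass it by name (no parentheses)
def strip_dollar(price):
    return float(price.replace("$", ""))


named_result = prices.apply(strip_dollar)

# Lambda version: the same logic inline, with no name and an implicit return
lambda_result = prices.apply(lambda price: float(price.replace("$", "")))

print(lambda_result.tolist())  # [2.99, 0.99, 40.0]
```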

In [5]:
numbers = [1, 20, 99, -5]
min(numbers), max(numbers)

(-5, 99)

In [6]:
books = [
    {
        "title": "The Giving Tree",
        "author": "Shel Silverstein",
        "price": "$2.99",
        "locations": ["Half-Price Books", "Target", "Library"]
    },
    {
        "title": "How to Win Friends and Influence People",
        "author": "Dale Carnegie",
        "price": "$0.99",
        "locations": ["Chewie Bookshelf", "Target", "Library"]
    },
    {
        "title": "Visual Display of Quantitative Information",
        "author": "Edward Tufte",
        "price": "$40.00",
        "locations": ["https://tufte.com"]
    },
    {
        "title": "Black Swan",
        "author": "Nassim Taleb",
        "price": "$12.50",
        "locations": ["BN.com", "Amazon.com", "Library"]
    },
]

In [7]:
# This is a defined function we could use elsewhere
def by_price(book):
    return book["price"]

In [8]:
# Pay special attention to by_price being passed in as an input...
# Notice that by_price has no parentheses here: max calls it for us on each book
max(books, key=by_price)

{'title': 'Visual Display of Quantitative Information',
 'author': 'Edward Tufte',
 'price': '$40.00',
 'locations': ['https://tufte.com']}

In [9]:
max(books, key=lambda book: book["price"])

{'title': 'Visual Display of Quantitative Information',
 'author': 'Edward Tufte',
 'price': '$40.00',
 'locations': ['https://tufte.com']}

In [10]:
min(books, key=lambda book: book["author"])

{'title': 'How to Win Friends and Influence People',
 'author': 'Dale Carnegie',
 'price': '$0.99',
 'locations': ['Chewie Bookshelf', 'Target', 'Library']}

In [11]:
sorted(books, key=lambda book: book["title"])

[{'title': 'Black Swan',
  'author': 'Nassim Taleb',
  'price': '$12.50',
  'locations': ['BN.com', 'Amazon.com', 'Library']},
 {'title': 'How to Win Friends and Influence People',
  'author': 'Dale Carnegie',
  'price': '$0.99',
  'locations': ['Chewie Bookshelf', 'Target', 'Library']},
 {'title': 'The Giving Tree',
  'author': 'Shel Silverstein',
  'price': '$2.99',
  'locations': ['Half-Price Books', 'Target', 'Library']},
 {'title': 'Visual Display of Quantitative Information',
  'author': 'Edward Tufte',
  'price': '$40.00',
  'locations': ['https://tufte.com']}]

In [12]:
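# Careful: prices are still strings like "$12.50", so they sort character by character
# That's why "$12.50" lands before "$2.99" in the output below ("1" comes before "2")
sorted(books, key=lambda book: book["price"])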

[{'title': 'How to Win Friends and Influence People',
  'author': 'Dale Carnegie',
  'price': '$0.99',
  'locations': ['Chewie Bookshelf', 'Target', 'Library']},
 {'title': 'Black Swan',
  'author': 'Nassim Taleb',
  'price': '$12.50',
  'locations': ['BN.com', 'Amazon.com', 'Library']},
 {'title': 'The Giving Tree',
  'author': 'Shel Silverstein',
  'price': '$2.99',
  'locations': ['Half-Price Books', 'Target', 'Library']},
 {'title': 'Visual Display of Quantitative Information',
  'author': 'Edward Tufte',
  'price': '$40.00',
  'locations': ['https://tufte.com']}]
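
To sort by the actual numeric price, the key function can convert each price string to a float before comparing. This sketch uses a trimmed-down copy of books (title and price only) so it runs on its own; the conversion inside the lambda is one reasonable approach, not the only one.

```python
books = [
    {"title": "The Giving Tree", "price": "$2.99"},
    {"title": "How to Win Friends and Influence People", "price": "$0.99"},
    {"title": "Visual Display of Quantitative Information", "price": "$40.00"},
    {"title": "Black Swan", "price": "$12.50"},
]

# String key: compares "$12.50" and "$2.99" character by character
by_string_price = sorted(books, key=lambda book: book["price"])

# Numeric key: strip the "$" and convert to float inside the lambda
by_numeric_price = sorted(books, key=lambda book: float(book["price"].replace("$", "")))

print([book["price"] for book in by_string_price])   # ['$0.99', '$12.50', '$2.99', '$40.00']
print([book["price"] for book in by_numeric_price])  # ['$0.99', '$2.99', '$12.50', '$40.00']
```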

In [13]:
# Books list of dictionaries
# Almost every key maps to a single scalar value (one number or one string)
# The "locations" key points to a list
books

[{'title': 'The Giving Tree',
  'author': 'Shel Silverstein',
  'price': '$2.99',
  'locations': ['Half-Price Books', 'Target', 'Library']},
 {'title': 'How to Win Friends and Influence People',
  'author': 'Dale Carnegie',
  'price': '$0.99',
  'locations': ['Chewie Bookshelf', 'Target', 'Library']},
 {'title': 'Visual Display of Quantitative Information',
  'author': 'Edward Tufte',
  'price': '$40.00',
  'locations': ['https://tufte.com']},
 {'title': 'Black Swan',
  'author': 'Nassim Taleb',
  'price': '$12.50',
  'locations': ['BN.com', 'Amazon.com', 'Library']}]

In [14]:
# How do we access the first item on a list?
books[0]

{'title': 'The Giving Tree',
 'author': 'Shel Silverstein',
 'price': '$2.99',
 'locations': ['Half-Price Books', 'Target', 'Library']}

In [15]:
# How do we access the author of the first item on the list?
# We need to specify the "author" key
# If some line of code gives you a dictionary, treat that expression like a dictionary
books[0]["author"]

'Shel Silverstein'

In [16]:
# How do we access the first character of the author of the first item on the list?
books[0]["author"][0]

'S'

In [17]:
# How to access the last location of the last book?
books[-1]["locations"][-1]

'Library'

In [18]:
# Slicing syntax: [start:stop], where an empty stop means "go to the end"
"pineapple"[0:]

'pineapple'

In [19]:
# Slicing syntax
# start at index 4
"pineapple"[4:]

'apple'
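
A few more slices of the same string, as a quick sketch of how start, stop, and step combine. The stop index is exclusive, negative indexes count from the end, and a negative step walks backwards.

```python
word = "pineapple"

print(word[:4])    # pine  (start defaults to 0; stop at index 4, not included)
print(word[4:])    # apple (start at index 4, run to the end)
print(word[-5:])   # apple (the last five characters)
print(word[::2])   # pnape (every second character)
print(word[::-1])  # elppaenip (a step of -1 reverses the string)
```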

In [20]:
# How do we produce a list of only the first two authors?
[
    books[0]["author"],
    books[1]["author"]
]

['Shel Silverstein', 'Dale Carnegie']
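
Lists slice the same way strings do, so the first two authors can also be collected without writing out each index. A sketch using a trimmed copy of books (title and author only) so it runs standalone:

```python
books = [
    {"title": "The Giving Tree", "author": "Shel Silverstein"},
    {"title": "How to Win Friends and Influence People", "author": "Dale Carnegie"},
    {"title": "Visual Display of Quantitative Information", "author": "Edward Tufte"},
    {"title": "Black Swan", "author": "Nassim Taleb"},
]

# books[:2] slices the list, then the comprehension pulls "author" out of each dictionary
first_two_authors = [book["author"] for book in books[:2]]
print(first_two_authors)  # ['Shel Silverstein', 'Dale Carnegie']
```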

In [21]:
# If we need to produce a list of only the authors
authors = []
for book in books:
    author_name = book["author"]
    authors.append(author_name)

authors

['Shel Silverstein', 'Dale Carnegie', 'Edward Tufte', 'Nassim Taleb']
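
The same loop-and-append pattern is another place a lambda can be handed to a function that takes a function as input: map() calls it once per book. A sketch on a trimmed copy of books, with the equivalent list comprehension alongside for comparison:

```python
books = [
    {"title": "The Giving Tree", "author": "Shel Silverstein"},
    {"title": "How to Win Friends and Influence People", "author": "Dale Carnegie"},
    {"title": "Visual Display of Quantitative Information", "author": "Edward Tufte"},
    {"title": "Black Swan", "author": "Nassim Taleb"},
]

# map() calls the lambda once per book; list() collects the results
authors = list(map(lambda book: book["author"], books))
print(authors)  # ['Shel Silverstein', 'Dale Carnegie', 'Edward Tufte', 'Nassim Taleb']

# The equivalent list comprehension, which many Python programmers find more readable
authors_comprehension = [book["author"] for book in books]
```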

In [22]:
books

[{'title': 'The Giving Tree',
  'author': 'Shel Silverstein',
  'price': '$2.99',
  'locations': ['Half-Price Books', 'Target', 'Library']},
 {'title': 'How to Win Friends and Influence People',
  'author': 'Dale Carnegie',
  'price': '$0.99',
  'locations': ['Chewie Bookshelf', 'Target', 'Library']},
 {'title': 'Visual Display of Quantitative Information',
  'author': 'Edward Tufte',
  'price': '$40.00',
  'locations': ['https://tufte.com']},
 {'title': 'Black Swan',
  'author': 'Nassim Taleb',
  'price': '$12.50',
  'locations': ['BN.com', 'Amazon.com', 'Library']}]

## Data Cleaning
- We can't do math on strings, even when they hold numbers like "$2.99"

In [23]:
# "Adding" strings is not a TypeError: + concatenates them
# So if the strings hold numbers, we get the strings glued together, not the numbers summed
"mango" + "icee"

'mangoicee'
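
A short sketch of the difference between strings that look like numbers and actual numbers: convert first, then do the math. float() will not accept the "$", so it has to be removed before converting.

```python
a = "2"
b = "3"

print(a + b)            # 23 (string concatenation)
print(int(a) + int(b))  # 5 (convert to numbers first, then add)

# float("$2.99") would raise ValueError, so strip the "$" before converting
price = float("$2.99".replace("$", ""))
print(price)  # 2.99
```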

In [24]:
# Multiplying a string by a number repeats the string...
"mango" * 3

'mangomangomango'

In [25]:
def get_price(book):
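    # Reassigning book["price"] overwrites the existing value with the cleaned float
    # Note: this modifies the dictionary passed in (it does not make a copy),
    # so calling it on each item of books also changes books itself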
    book["price"] = float(book["price"].replace("$", ""))
    return book

get_price({"price": "$2.99"})

{'price': 2.99}
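
If the original dictionaries should stay untouched, the function can clean a copy instead. A sketch of that alternative; get_clean_price is a hypothetical name, not part of the original notebook. dict() makes a shallow copy, so nested lists such as locations are still shared with the original.

```python
def get_clean_price(book):
    # dict(book) makes a shallow copy, so the caller's dictionary is not modified
    # (nested values like the "locations" list are still shared with the original)
    clean_book = dict(book)
    clean_book["price"] = float(clean_book["price"].replace("$", ""))
    return clean_book


raw = {"title": "The Giving Tree", "price": "$2.99"}
clean = get_clean_price(raw)

print(raw["price"])    # $2.99 (original left alone)
print(clean["price"])  # 2.99
```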

In [26]:
# Base python data cleaning of our books list
clean_books = []

# for singular in plural
for book in books:
    clean_book = get_price(book)
    clean_books.append(clean_book)
    
clean_books

[{'title': 'The Giving Tree',
  'author': 'Shel Silverstein',
  'price': 2.99,
  'locations': ['Half-Price Books', 'Target', 'Library']},
 {'title': 'How to Win Friends and Influence People',
  'author': 'Dale Carnegie',
  'price': 0.99,
  'locations': ['Chewie Bookshelf', 'Target', 'Library']},
 {'title': 'Visual Display of Quantitative Information',
  'author': 'Edward Tufte',
  'price': 40.0,
  'locations': ['https://tufte.com']},
 {'title': 'Black Swan',
  'author': 'Nassim Taleb',
  'price': 12.5,
  'locations': ['BN.com', 'Amazon.com', 'Library']}]

In [27]:
total = 0
# Loop over the cleaned books, whose prices are now floats
for book in clean_books:
    # += is short for total = total + something
    total += book["price"]

total

56.480000000000004
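
The trailing ...0004 comes from floats being stored in binary, so many decimal values can't be represented exactly. A sketch of common ways to handle it: round for display, math.isclose for comparisons, or decimal.Decimal built from strings when exact cents matter.

```python
import math
from decimal import Decimal

prices = [2.99, 0.99, 40.0, 12.5]
total = sum(prices)

# Round for display, and compare with a tolerance instead of ==
print(round(total, 2))             # 56.48
print(math.isclose(total, 56.48))  # True

# Decimal built from strings keeps exact cents
exact_total = sum(Decimal(p) for p in ["2.99", "0.99", "40.00", "12.50"])
print(exact_total)  # 56.48
```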

In [28]:
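# get_price changed the original dictionaries, so redefine books as the messy data again
# (this time with two duplicate "Black Swan" entries)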
books = [
    {
        "title": "The Giving Tree",
        "author": "Shel Silverstein",
        "price": "$2.99",
        "locations": ["Half-Price Books", "Target", "Library"]
    },
    {
        "title": "How to Win Friends and Influence People",
        "author": "Dale Carnegie",
        "price": "$0.99",
        "locations": ["Chewie Bookshelf", "Target", "Library"]
    },
    {
        "title": "Visual Display of Quantitative Information",
        "author": "Edward Tufte",
        "price": "$40.00",
        "locations": ["https://tufte.com"]
    },
    {
        "title": "Black Swan",
        "author": "Nassim Taleb",
        "price": "$12.50",
        "locations": ["BN.com", "Amazon.com", "Library"]
    },
    {
        "title": "Black Swan",
        "author": "Nassim Taleb",
        "price": "$12.50",
        "locations": ["BN.com", "Amazon.com", "Library"]
    },
    {
        "title": "Black Swan",
        "author": "Nassim Taleb",
        "price": "$12.50",
        "locations": ["BN.com", "Amazon.com", "Library"]
    }
]

In [29]:
import pandas as pd

In [30]:
df = pd.DataFrame(books)
df

|   | title | author | price | locations |
|---|---|---|---|---|
| 0 | The Giving Tree | Shel Silverstein | $2.99 | [Half-Price Books, Target, Library] |
| 1 | How to Win Friends and Influence People | Dale Carnegie | $0.99 | [Chewie Bookshelf, Target, Library] |
| 2 | Visual Display of Quantitative Information | Edward Tufte | $40.00 | [https://tufte.com] |
| 3 | Black Swan | Nassim Taleb | $12.50 | [BN.com, Amazon.com, Library] |
| 4 | Black Swan | Nassim Taleb | $12.50 | [BN.com, Amazon.com, Library] |
| 5 | Black Swan | Nassim Taleb | $12.50 | [BN.com, Amazon.com, Library] |


In [31]:
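# How do we clean the price column?
# .str.replace works on every value in the column; regex=False treats "$" as a literal character
# .astype(float) then converts the column from strings to floats
df["price"] = df["price"].str.replace("$", "", regex=False).astype(float)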
df



|   | title | author | price | locations |
|---|---|---|---|---|
| 0 | The Giving Tree | Shel Silverstein | 2.99 | [Half-Price Books, Target, Library] |
| 1 | How to Win Friends and Influence People | Dale Carnegie | 0.99 | [Chewie Bookshelf, Target, Library] |
| 2 | Visual Display of Quantitative Information | Edward Tufte | 40.0 | [https://tufte.com] |
| 3 | Black Swan | Nassim Taleb | 12.5 | [BN.com, Amazon.com, Library] |
| 4 | Black Swan | Nassim Taleb | 12.5 | [BN.com, Amazon.com, Library] |
| 5 | Black Swan | Nassim Taleb | 12.5 | [BN.com, Amazon.com, Library] |
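
When a column might contain values that can't be parsed, .astype(float) raises a ValueError. A sketch of a more forgiving conversion with pd.to_numeric and errors="coerce", which turns unparseable values into NaN. The "N/A" entry is a made-up messy value for illustration, not from the books data.

```python
import pandas as pd

# "N/A" is a made-up messy value for illustration
raw_prices = pd.Series(["$2.99", "$0.99", "N/A"])

# regex=False: treat "$" as a literal character, not a regex end-of-string anchor
stripped = raw_prices.str.replace("$", "", regex=False)

# stripped.astype(float) would raise ValueError on "N/A";
# errors="coerce" turns anything unparseable into NaN instead
prices = pd.to_numeric(stripped, errors="coerce")
print(prices)
```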


In [32]:
# Strings are immutable: .replace returns a new string instead of changing the old one
# So we reassign the variable to point at the new, cleaned value
price = "$1.99"
price = float(price.replace("$", ""))
price

1.99
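
A sketch showing why the reassignment matters: calling .replace without keeping its result leaves the original string exactly as it was.

```python
price = "$1.99"

# str.replace returns a NEW string; the result here is thrown away
price.replace("$", "")
print(price)  # $1.99 (unchanged)

# Reassigning the name keeps the cleaned value
price = price.replace("$", "")
print(price)  # 1.99
```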

In [33]:
# What's the sum of all the prices?
df.price.sum()

81.48

In [34]:
# What's the average price?
df.price.mean()

13.58
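
The sum and mean above count the three identical "Black Swan" rows. A sketch of removing them with drop_duplicates, using a rebuilt copy of the cleaned frame. Calling drop_duplicates() with no subset on this frame will typically raise a TypeError, because it would need to hash the lists in locations, so the comparison is limited to hashable columns here.

```python
import pandas as pd

black_swan = {"title": "Black Swan", "author": "Nassim Taleb", "price": 12.5,
              "locations": ["BN.com", "Amazon.com", "Library"]}
rows = [
    {"title": "The Giving Tree", "author": "Shel Silverstein", "price": 2.99,
     "locations": ["Half-Price Books", "Target", "Library"]},
    {"title": "How to Win Friends and Influence People", "author": "Dale Carnegie", "price": 0.99,
     "locations": ["Chewie Bookshelf", "Target", "Library"]},
    {"title": "Visual Display of Quantitative Information", "author": "Edward Tufte", "price": 40.0,
     "locations": ["https://tufte.com"]},
] + [black_swan] * 3
df = pd.DataFrame(rows)

# Compare rows on the hashable columns only; keep="first" (the default) keeps the first copy
deduped = df.drop_duplicates(subset=["title", "author", "price"])

print(len(df), len(deduped))              # 6 4
print(round(deduped["price"].mean(), 2))  # 14.12
```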

In [35]:
# Use .apply to reach into dataframe columns whose values are collections (like lists)
# Exercise: create a simple list of all of the locations represented
# If you ever feel like you NEED a for loop over a dataframe, consider .apply instead
# (Series.apply for one column's values, DataFrame.apply(axis=1) for whole rows)

In [36]:
all_locations = []

In [37]:
def append_locations(locations):
    for location in locations:
        all_locations.append(location)

append_locations(["SAMS club", "Costco", "The Twig"])

all_locations

['SAMS club', 'Costco', 'The Twig']

In [38]:
append_locations(["ryanorsinger.com"])

all_locations

['SAMS club', 'Costco', 'The Twig', 'ryanorsinger.com']

In [39]:
df.head()

|   | title | author | price | locations |
|---|---|---|---|---|
| 0 | The Giving Tree | Shel Silverstein | 2.99 | [Half-Price Books, Target, Library] |
| 1 | How to Win Friends and Influence People | Dale Carnegie | 0.99 | [Chewie Bookshelf, Target, Library] |
| 2 | Visual Display of Quantitative Information | Edward Tufte | 40.0 | [https://tufte.com] |
| 3 | Black Swan | Nassim Taleb | 12.5 | [BN.com, Amazon.com, Library] |
| 4 | Black Swan | Nassim Taleb | 12.5 | [BN.com, Amazon.com, Library] |


In [40]:
all_locations = []

In [41]:
# Series.apply calls the provided function once for each value in the column (here, each list of locations)
# (axis=1 is only for DataFrame.apply, where it passes whole rows to the function)
# append_locations appends each location to all_locations as a side effect
# It has no return statement, so it returns None for every row, which is what the output shows
df.locations.apply(append_locations)

0    None
1    None
2    None
3    None
4    None
5    None
Name: locations, dtype: object
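
A sketch contrasting the two forms of .apply on a small stand-in frame (two books, title and author only). Series.apply hands the function one value at a time; DataFrame.apply with axis=1 hands it one whole row, so a lambda can combine several columns. Here the functions return values, so the result is a useful Series instead of a column of None.

```python
import pandas as pd

df = pd.DataFrame([
    {"title": "The Giving Tree", "author": "Shel Silverstein"},
    {"title": "Black Swan", "author": "Nassim Taleb"},
])

# Series.apply: the function receives one value at a time (one title string)
title_lengths = df["title"].apply(len)
print(title_lengths.tolist())  # [15, 10]

# DataFrame.apply with axis=1: the function receives one row at a time
labels = df.apply(lambda row: f"{row['title']} by {row['author']}", axis=1)
print(labels.tolist())  # ['The Giving Tree by Shel Silverstein', 'Black Swan by Nassim Taleb']
```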

In [42]:
df.head()

|   | title | author | price | locations |
|---|---|---|---|---|
| 0 | The Giving Tree | Shel Silverstein | 2.99 | [Half-Price Books, Target, Library] |
| 1 | How to Win Friends and Influence People | Dale Carnegie | 0.99 | [Chewie Bookshelf, Target, Library] |
| 2 | Visual Display of Quantitative Information | Edward Tufte | 40.0 | [https://tufte.com] |
| 3 | Black Swan | Nassim Taleb | 12.5 | [BN.com, Amazon.com, Library] |
| 4 | Black Swan | Nassim Taleb | 12.5 | [BN.com, Amazon.com, Library] |


In [43]:
# A set keeps only unique values, dropping the repeats from duplicate rows and shared stores
all_locations = set(all_locations)
all_locations

{'Amazon.com',
 'BN.com',
 'Chewie Bookshelf',
 'Half-Price Books',
 'Library',
 'Target',
 'https://tufte.com'}
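
pandas can also flatten a column of lists without a helper function or a global list: Series.explode gives each list element its own row, and unique() keeps one copy of each value. A sketch on a rebuilt locations column (including the duplicate Black Swan rows):

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["The Giving Tree", "How to Win Friends and Influence People",
              "Visual Display of Quantitative Information",
              "Black Swan", "Black Swan", "Black Swan"],
    "locations": [["Half-Price Books", "Target", "Library"],
                  ["Chewie Bookshelf", "Target", "Library"],
                  ["https://tufte.com"],
                  ["BN.com", "Amazon.com", "Library"],
                  ["BN.com", "Amazon.com", "Library"],
                  ["BN.com", "Amazon.com", "Library"]],
})

# explode() puts each list element on its own row (repeating the index);
# unique() keeps the first occurrence of each location
unique_locations = df["locations"].explode().unique()
print(sorted(unique_locations))
```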