# Day 5 –– Functions and Variables

### Introductions / What does everyone want out of this course?

Just wanted to kind of go around real quick and see where everyone is at.

### Functions

I think Jake touched on this the other day, but one thing that we're trying to avoid when coding is repetition. We never want to repeat ourselves. You'll find that writing a function can be frustrating and laborious. Sometimes you'll skip it entirely and muscle your way through whatever it is you're trying to accomplish. Sometimes that's doable, but a lot of the time it's not. The larger your datasets get and the more complicated your process becomes, the more vital function writing is.

Functions as a concept shouldn't be foreign to you guys. You've used a lot of them up to this point. Writing your own is the challenge.

If you feel like you're bad at first, don't worry. It takes time, and sometimes you'll see that the people that are worst to start become the best as time goes on. At the beginning, it can come down to overthinking it.

### Importing Libraries

In [11]:
import pandas as pd
import numpy as np

### Writing your own Simple Function

In [12]:
def function(argument):
    """Description of what your function does"""
    local_variable = argument + ' is our argument' # Local, local, local
    print(local_variable)
    

In [3]:
function('egg')

egg is our argument


### Scope and what it means

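We see that our **local_variable** is, as it sounds, local. It cannot be recalled outside the function because it exists only in our LOCAL environment (i.e., inside the body of our function). Once we define a variable with the same name at the top level, outside any function, it can be recalled like any other.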

In [4]:
local_variable

NameError: name 'local_variable' is not defined

In [197]:
local_variable = 'egg is our argument'

In [200]:
local_variable

'egg is our argument'
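
To make the local/global distinction a bit more concrete, here's a minimal sketch. The names `label` and `relabel` are made up for illustration and aren't part of the class code. It shows that assigning to a name inside a function creates a local variable and leaves a global of the same name alone.

```python
label = 'global egg'  # a global (top-level) variable

def relabel(argument):
    """Build a string locally; the global `label` is left untouched."""
    label = argument + ' is our argument'  # new LOCAL name that shadows the global
    return label

result = relabel('egg')
print(result)  # the value built inside the function
print(label)   # the global still holds its original value
```

The only way the local value gets out of the function is through `return`.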

From here on I'm going to try and work with functions that I think are relevant. They may increase quickly in complexity, but you'll find that you could actually use most of these in your work.

## Today's Datasets

Today we're going to be working with the data set I've simulated below. Let's assume that this is survey data recorded from 100 different towns. In this survey we asked people "If you had in front of you pizza, a bowl of fruit, fried chicken, and a lamb leg, which type of foods would you eat?" They were allowed to choose as many or as few as they liked. The numbers are reported as percentages. Because the data are randomly generated without a fixed seed, the values change every time the cell is rerun, which is why the outputs shown in different cells below don't match one another.

In [4]:
favfoods = pd.DataFrame(np.random.randint(0, 100, size = (100, 4)), columns=['pizza', 'fruit', 'chicken', 'lamb'])

favfoods.head()


   pizza  fruit  chicken  lamb
0     46     89       47    60
1     25     92       28    30
2     64      8       78     3
3     85     35       52    46
4     10     10       95    37


We're also going to look at the 'Let It Be' lyrics you guys worked with last class.

In [4]:
lyrics = """When I find myself in times of trouble, Mother Mary comes to me
Speaking words of wisdom, let it be
And in my hour of darkness she is standing right in front of me
Speaking words of wisdom, let it be
Let it be, let it be, let it be, let it be
Whisper words of wisdom, let it be
And when the broken hearted people living in the world agree
There will be an answer, let it be
For though they may be parted, there is still a chance that they will see
There will be an answer, let it be
Let it be, let it be, let it be, let it be
There will be an answer, let it be
Let it be, let it be, let it be, let it be
Whisper words of wisdom, let it be
Let it be, let it be, let it be, let it be
Whisper words of wisdom, let it be
And when the night is cloudy there is still a light that shines on me
Shine until tomorrow, let it be
I wake up to the sound of music, Mother Mary comes to me
Speaking words of wisdom, let it be
Let it be, let it be, let it be, yeah, let it be
There will be an answer, let it be
Let it be, let it be, let it be, yeah, let it be
Whisper words of wisdom, let it be
"""

### Dummy Variables and the Return Keyword

**return** is going to (wait for it) return the function's value. Why should we use **return**? The **return** keyword hands a value back to the caller, so you can store it in a variable and work with it later. If we instead were to use **print()**, the value would be displayed but not handed back, and whatever variable we assigned the function call to would be of the 'NoneType'. 'NoneType,' as it sounds, is the type of the 'None' object, which indicates **no value**. 'None' is the value returned from functions that don't explicitly return ***anything***. So we'd like to avoid ending up with 'None' when what we actually want is a result we can use.
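
Here's a small sketch of the difference, using two made-up helper functions (`shout_print` and `shout_return` are hypothetical names, just for illustration). One prints its result and the other returns it.

```python
def shout_print(word):
    """Prints the upper-cased word; implicitly returns None."""
    print(word.upper())

def shout_return(word):
    """Returns the upper-cased word so the caller can store it."""
    return word.upper()

printed = shout_print('egg')    # displays EGG, but nothing is handed back
returned = shout_return('egg')  # nothing displayed, but the value is stored

print(type(printed).__name__)   # NoneType
print(type(returned).__name__)  # str
```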

In [138]:
def dummy_conv(data, column, thresh):
    """Returns a dummy variable for a specified column
    and threshold"""
    
    conv_dummy = [] # Empty list operating as our dummy variable
    dummy_name = column + '_dummy'
    
    for perc in data[column]: # Iterating over the column's values
        if perc > thresh:
            conv_dummy.append(1) # Adding one to the list
        else:
            conv_dummy.append(0) # Adding zero to the list
    conv_dummy = pd.DataFrame({dummy_name:conv_dummy}) # One-column dataframe
    data = pd.concat([data, conv_dummy], axis = 1) # Appending the dummy
                                                   # as a new column
    return data

A few things here. First, this function takes multiple arguments. That's good. It makes our function more versatile and more dynamic. Second, this function uses an **if-else** statement. From the looks of the class content, you guys have already covered conditional statements, and if you've ever taken a course in logic or real analysis you're probably already familiar with **if-then** statements. That's the heart of our function: **if** the percentage is above our given threshold, **then** this town is pizza happy (1); **else** this town is *not* pizza happy (0). Third, check out the final part of our function. Our list is converted to a one-column dataframe, named using 'dummy_name', and concatenated onto the original data. What we get back is a new dataframe with the dummy column added.

***Note*** that Pandas' get_dummies function comes in handy when dealing with categorical variables.
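
As a rough sketch of what that looks like, here is `pd.get_dummies` applied to a tiny made-up categorical column (the `towns` dataframe and its `favorite` column are invented for illustration). It creates one indicator column per category. The exact dtype of those columns depends on your pandas version, so no printed output is shown here.

```python
import pandas as pd

# Made-up categorical data: each town's single favorite food
towns = pd.DataFrame({'favorite': ['pizza', 'lamb', 'pizza', 'fruit']})

# One indicator column per distinct category
dummies = pd.get_dummies(towns['favorite'])
print(dummies)
```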

In [139]:
pizza_happy = dummy_conv(favfoods, 'pizza', 50)
pizza_happy.head()

   pizza  fruit  chicken  lamb  pizza_dummy
0     21     10       54    92            0
1      9     73       60    26            0
2     69      9       83    85            1
3     11     47       66    61            0
4     88     20       26    53            1


Here we see that our object is a **dataframe**

In [42]:
type(pizza_happy) 

pandas.core.frame.DataFrame

***Note***, though, what happens when we use the **print** function rather than the **return** keyword:

In [113]:
def dummy_conv(data, column, thresh = 50):
    
    conv_dummy = [] # Empty list operating as our dummy variable
    dummy_name = column + '_dummy'
    
    for perc in data[column]: # Iterating over the column's values
        if perc > thresh:
            conv_dummy.append(1) # Adding one to the list
        else:
            conv_dummy.append(0) # Adding zero to the list
    conv_dummy = pd.DataFrame({dummy_name:conv_dummy}) # One-column dataframe
    data = pd.concat([data, conv_dummy], axis = 1) # Appending the dummy
                                                   # as a new column
    print(data)

In [114]:
pizza_happy = dummy_conv(favfoods, 'pizza')

    pizza  fruit  chicken  lamb  pizza_dummy
0      63      6       61    15            1
1      76     87       68    34            1
2      43     31       37    82            0
3      99     93       26     2            1
4      90     45       91    48            1
5      72     41       48    85            1
6      84      9       94    82            1
7      45     89       43    74            0
8      40     47       70    32            0
9      87     62       52    73            1
10     64     56       67    89            1
11     38     42       37     9            0
12     19     51       67    85            0
13     58     99       34    59            1
14     47     49       25    46            0
15     51      1       30    59            1
16     77     54       40    66            1
17     67     26       98    14            1
18     19     12        8     7            0
19     31     17        3    98            0
20     91     32        5    15            1
21     11 

In [105]:
type(pizza_happy)

NoneType

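Behold the 'NoneType', a frightful beast. Because this version of the function only prints, it hands back 'None', and that's what ends up stored in pizza_happy. Also note the use of a ***default*** argument. Rather than my having to input 50 as the 'thresh' argument above, the function fell back on the default set in its definition. You'll find that most functions you use have default arguments.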
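
Here's a minimal sketch of default arguments on their own. `count_above` is a hypothetical helper, not part of the class code, and the numbers are the first five 'pizza' values from the printed output above.

```python
def count_above(values, thresh=50):
    """Count how many values are strictly greater than thresh."""
    return sum(1 for v in values if v > thresh)

scores = [63, 76, 43, 99, 90]

default_count = count_above(scores)            # thresh falls back to 50
strict_count = count_above(scores, thresh=80)  # default overridden by the caller

print(default_count)  # 4
print(strict_count)   # 2
```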

### Practice

Real quick, let's create a FizzBuzz function using the current dataframe. We'll iterate over the 'pizza' column. 

If you remember the idea from the last class: if the number is divisible by 3, 'Fizz'; if it's divisible by 5, 'Buzz'; if it's divisible by both 3 and 5, 'FizzBuzz'; and if it's divisible by neither 3 nor 5, print nothing (just a blank line).

In [205]:
def fizzbuzz(data, col):
    """If the number is divisible by 3, print 'Fizz'; 
    if it's divisible by 5, print 'Buzz'; if it's 
    divisible by both 3 and 5, print 'FizzBuzz'; and 
    if it's divisible by neither 3 nor 5, print a blank line."""
    
    col = data[col]
    
    for fig in col:
        if fig % 3 == 0 and fig % 5 != 0:
            print("Fizz")
        elif fig % 3 != 0 and fig % 5 == 0:
            print("Buzz")
        elif fig % 3 == 0 and fig % 5 == 0:
            print("FizzBuzz")
        else:
            print("")
            
    

In [206]:
fizzbuzz(favfoods, 'pizza')

Fizz
Fizz
Fizz




Fizz
Buzz
Fizz
Fizz
Fizz
FizzBuzz
FizzBuzz
Buzz
FizzBuzz
FizzBuzz
Fizz




Buzz

Fizz
Buzz



Fizz


Fizz
Buzz
Buzz

Fizz
Fizz
Fizz






Buzz
FizzBuzz

Fizz





Fizz



Fizz
Fizz
Fizz




Fizz




Buzz

Fizz
Fizz
Buzz
Fizz

Fizz


FizzBuzz



Fizz
FizzBuzz

Buzz

Fizz


Fizz


Buzz



Buzz
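
Since the function above only prints, there's nothing to store afterwards. As a sketch of the alternative, here's a variant that returns a list of labels instead. `fizzbuzz_labels` is a made-up name, and it takes a plain list of numbers rather than a dataframe and column to keep the example self-contained.

```python
def fizzbuzz_labels(values):
    """Return a list with 'Fizz', 'Buzz', 'FizzBuzz' or '' for each number."""
    labels = []
    for fig in values:
        if fig % 15 == 0:      # divisible by both 3 and 5
            labels.append('FizzBuzz')
        elif fig % 3 == 0:
            labels.append('Fizz')
        elif fig % 5 == 0:
            labels.append('Buzz')
        else:
            labels.append('')
    return labels

labels = fizzbuzz_labels([3, 10, 15, 7])
print(labels)  # ['Fizz', 'Buzz', 'FizzBuzz', '']
```

Checking the "both" case first means the later branches don't need the extra `!= 0` conditions.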


### Counting the Lyrics of 'Let It Be'

I think you guys did something similar in the last class, but here I just figured we could write it as a function.

In [123]:
import string



In [7]:
lyrics = lyrics.lower().replace('\n', ' ').replace(',', '').split(' ')
lyrics.pop()
lyrics

['when',
 'i',
 'find',
 'myself',
 'in',
 'times',
 'of',
 'trouble',
 'mother',
 'mary',
 'comes',
 'to',
 'me',
 'speaking',
 'words',
 'of',
 'wisdom',
 'let',
 'it',
 'be',
 'and',
 'in',
 'my',
 'hour',
 'of',
 'darkness',
 'she',
 'is',
 'standing',
 'right',
 'in',
 'front',
 'of',
 'me',
 'speaking',
 'words',
 'of',
 'wisdom',
 'let',
 'it',
 'be',
 'let',
 'it',
 'be',
 'let',
 'it',
 'be',
 'let',
 'it',
 'be',
 'let',
 'it',
 'be',
 'whisper',
 'words',
 'of',
 'wisdom',
 'let',
 'it',
 'be',
 'and',
 'when',
 'the',
 'broken',
 'hearted',
 'people',
 'living',
 'in',
 'the',
 'world',
 'agree',
 'there',
 'will',
 'be',
 'an',
 'answer',
 'let',
 'it',
 'be',
 'for',
 'though',
 'they',
 'may',
 'be',
 'parted',
 'there',
 'is',
 'still',
 'a',
 'chance',
 'that',
 'they',
 'will',
 'see',
 'there',
 'will',
 'be',
 'an',
 'answer',
 'let',
 'it',
 'be',
 'let',
 'it',
 'be',
 'let',
 'it',
 'be',
 'let',
 'it',
 'be',
 'let',
 'it',
 'be',
 'there',
 'will',
 'be',
 'an'

'\n' is used within a string to start a ***new line***. Here we **replace** each new line with a space, or ' '. We also, as you can see, replace each ',' with nothing, or ''. pop() removes (and returns) the item at a given position in the list. Because we don't specify a position, it goes with the default, that being the final element in the list. Here, that is just the empty string '' left over from the trailing new line.
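
As an aside, here's a sketch of a slightly different way to get the same kind of word list. Calling `split()` with no argument splits on any run of whitespace (spaces and new lines alike) and doesn't leave an empty string at the end, so neither the `'\n'` replacement nor the `pop()` is needed. The `sample` string is just two lines of the lyrics, used to keep the example short.

```python
sample = "Speaking words of wisdom, let it be\nLet it be, let it be\n"

# Lower-case, drop commas, then split on any whitespace
words = sample.lower().replace(',', '').split()

print(words)
print(len(words))  # 13
```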

In [78]:
def lyric_count(song):    
    """Prints a record of song lyrics by count""" # Describe what our function does
    
    lyric_record = {} # Initialize our empty dictionary

    for word in song: # Iterating over every word in our list 

        if word in lyric_record.keys(): 
            lyric_record[word] += 1 # Add one to the count if the lyric is already recorded

        else:
            lyric_record[word] = 1 # Record the lyric as existing if there is no record already

    # Print the populated dictionary
    print(lyric_record)


In [127]:
letitbe_count = lyric_count(lyrics)

{'when': 3,
 'i': 2,
 'find': 1,
 'myself': 1,
 'in': 4,
 'times': 1,
 'of': 11,
 'trouble': 1,
 'mother': 2,
 'mary': 2,
 'comes': 2,
 'to': 3,
 'me': 4,
 'speaking': 3,
 'words': 7,
 'wisdom': 7,
 'let': 36,
 'it': 36,
 'be': 41,
 'and': 3,
 'my': 1,
 'hour': 1,
 'darkness': 1,
 'she': 1,
 'is': 4,
 'standing': 1,
 'right': 1,
 'front': 1,
 'whisper': 4,
 'the': 4,
 'broken': 1,
 'hearted': 1,
 'people': 1,
 'living': 1,
 'world': 1,
 'agree': 1,
 'there': 6,
 'will': 5,
 'an': 4,
 'answer': 4,
 'for': 1,
 'though': 1,
 'they': 2,
 'may': 1,
 'parted': 1,
 'still': 2,
 'a': 2,
 'chance': 1,
 'that': 2,
 'see': 1,
 'night': 1,
 'cloudy': 1,
 'light': 1,
 'shines': 1,
 'on': 1,
 'shine': 1,
 'until': 1,
 'tomorrow': 1,
 'wake': 1,
 'up': 1,
 'sound': 1,
 'music': 1,
 'yeah': 2}

In [84]:
type(letitbe_count)

NoneType

### Quick Group Task

Now what if rather than a printed report of the dictionary, I wanted my function to return a value that I could store in my letitbe_count variable? Something that I could work with later. What would I do? 

In [5]:
def lyric_count(song):    
    """Returns a record of song lyrics by count""" # Describe what our function does
    
    lyric_record = {} # Initialize our empty dictionary

    for word in song: # Iterating over every word in our list 

        if word in lyric_record.keys(): 
            lyric_record[word] += 1 # Add one to the count if the lyric is already recorded

        else:
            lyric_record[word] = 1 # Record the lyric as existing if there is no record already

    # Return the populated dictionary
    return lyric_record


letitbe_count = lyric_count(lyrics)

# Printing the result
print(letitbe_count)

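{'when': 3, 'i': 2, 'find': 1, 'myself': 1, 'in': 4, 'times': 1, 'of': 11, 'trouble': 1, 'mother': 2, 'mary': 2, 'comes': 2, 'to': 3, 'me': 4, 'speaking': 3, 'words': 7, 'wisdom': 7, 'let': 36, 'it': 36, 'be': 41, 'and': 3, 'my': 1, 'hour': 1, 'darkness': 1, 'she': 1, 'is': 4, 'standing': 1, 'right': 1, 'front': 1, 'whisper': 4, 'the': 4, 'broken': 1, 'hearted': 1, 'people': 1, 'living': 1, 'world': 1, 'agree': 1, 'there': 6, 'will': 5, 'an': 4, 'answer': 4, 'for': 1, 'though': 1, 'they': 2, 'may': 1, 'parted': 1, 'still': 2, 'a': 2, 'chance': 1, 'that': 2, 'see': 1, 'night': 1, 'cloudy': 1, 'light': 1, 'shines': 1, 'on': 1, 'shine': 1, 'until': 1, 'tomorrow': 1, 'wake': 1, 'up': 1, 'sound': 1, 'music': 1, 'yeah': 2}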

In [74]:
type(letitbe_count)

dict
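
Worth knowing: Python's standard library already has a tool for this kind of tallying, `collections.Counter`. This is a different technique from the hand-written loop above, shown here only as a sketch on a short made-up word list.

```python
from collections import Counter

words = ['let', 'it', 'be', 'let', 'it', 'be', 'whisper']

counts = Counter(words)   # a dict subclass mapping each word to its count

print(counts['let'])      # 2
print(counts['whisper'])  # 1
print(counts['missing'])  # 0 (absent keys count as zero instead of raising KeyError)
```

Writing the loop yourself is still good practice, since it's the same if-else pattern you'll use in plenty of other functions.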

## Lambda Functions

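**Lambda functions** are a quick and dirty way of writing short, simple functions in a single line, without a **def** block. They're often passed to functions like **filter()**, which keeps only the items of a list that satisfy some criterion. In this example, though, we simply use a lambda to flag which towns satisfy our 'pizza happy' criterion, giving us True or False for each row.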

In [112]:
quick_fn = (lambda data, col: data[col] >= 50)

pizza_happy = quick_fn(favfoods, 'pizza')
pizza_happy

0      True
1      True
2     False
3      True
4      True
5      True
6      True
7     False
8     False
9      True
10     True
11    False
12    False
13     True
14    False
15     True
16     True
17     True
18    False
19    False
20     True
21    False
22    False
23    False
24     True
25     True
26     True
27    False
28     True
29    False
      ...  
70    False
71     True
72     True
73     True
74    False
75    False
76    False
77     True
78    False
79     True
80     True
81    False
82     True
83    False
84     True
85     True
86    False
87    False
88     True
89     True
90     True
91     True
92    False
93     True
94    False
95    False
96    False
97     True
98    False
99     True
Name: pizza, Length: 100, dtype: bool
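
For comparison, here's a sketch of a lambda used with the built-in **filter()**. `filter()` takes a function and an iterable and keeps only the items for which the function returns True. The two small lists are made up for illustration.

```python
pizza = [63, 76, 43, 99, 90]

# Keep only the percentages that meet the 'pizza happy' threshold
happy = list(filter(lambda perc: perc >= 50, pizza))
print(happy)  # [63, 76, 99, 90]

# The same idea on words: keep only words longer than five letters
long_words = list(filter(lambda w: len(w) > 5, ['whisper', 'words', 'of', 'wisdom']))
print(long_words)  # ['whisper', 'wisdom']
```

Note that in Python 3 `filter()` returns a lazy iterator, which is why it's wrapped in `list()` here.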

### Practice

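Let's try and write a variation of our dummy variable function from before with a lambda function. Same idea (three arguments), but now we'll be able to do it all in one line. Note that this version gives us True/False values rather than 1/0, and it uses >= rather than > for the comparison.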

In [None]:
quick_dummy = (lambda data, col, thresh: data[col] >= thresh)

pizza_happy = quick_dummy(favfoods, 'pizza', 50)
pizza_happy



## Try and Except and Error Handling

In [9]:
# Define lyric_count() with exception handling
def lyric_count(song = lyrics):    
    """Returns a record of song lyrics by count""" # Describe what our function does
    
    lyric_record = {} # Initialize our empty dictionary
    
    # Wrap the loop in a try-except block to handle EXCEPTIONS
    try:
        for word in song: # Iterating over every word in our list 

            if word in lyric_record.keys(): 
                lyric_record[word] += 1 # Add one to the count if the lyric is already recorded

            else:
                lyric_record[word] = 1 # Record the lyric as existing if there is no record already

        # Return the populated dictionary
        return lyric_record
    # Looping over a non-iterable (e.g. an integer) raises a TypeError
    except TypeError:
        print('Cannot compute –– variable is not iterable.')

In [129]:
letitbe_count1 = lyric_count(lyrics)

# Printing the result
print(letitbe_count1)

{'when': 3, 'i': 2, 'find': 1, 'myself': 1, 'in': 4, 'times': 1, 'of': 11, 'trouble': 1, 'mother': 2, 'mary': 2, 'comes': 2, 'to': 3, 'me': 4, 'speaking': 3, 'words': 7, 'wisdom': 7, 'let': 36, 'it': 36, 'be': 41, 'and': 3, 'my': 1, 'hour': 1, 'darkness': 1, 'she': 1, 'is': 4, 'standing': 1, 'right': 1, 'front': 1, 'whisper': 4, 'the': 4, 'broken': 1, 'hearted': 1, 'people': 1, 'living': 1, 'world': 1, 'agree': 1, 'there': 6, 'will': 5, 'an': 4, 'answer': 4, 'for': 1, 'though': 1, 'they': 2, 'may': 1, 'parted': 1, 'still': 2, 'a': 2, 'chance': 1, 'that': 2, 'see': 1, 'night': 1, 'cloudy': 1, 'light': 1, 'shines': 1, 'on': 1, 'shine': 1, 'until': 1, 'tomorrow': 1, 'wake': 1, 'up': 1, 'sound': 1, 'music': 1, 'yeah': 2}


In [10]:
lyric_count()

{'when': 3,
 'i': 2,
 'find': 1,
 'myself': 1,
 'in': 4,
 'times': 1,
 'of': 11,
 'trouble': 1,
 'mother': 2,
 'mary': 2,
 'comes': 2,
 'to': 3,
 'me': 4,
 'speaking': 3,
 'words': 7,
 'wisdom': 7,
 'let': 36,
 'it': 36,
 'be': 41,
 'and': 3,
 'my': 1,
 'hour': 1,
 'darkness': 1,
 'she': 1,
 'is': 4,
 'standing': 1,
 'right': 1,
 'front': 1,
 'whisper': 4,
 'the': 4,
 'broken': 1,
 'hearted': 1,
 'people': 1,
 'living': 1,
 'world': 1,
 'agree': 1,
 'there': 6,
 'will': 5,
 'an': 4,
 'answer': 4,
 'for': 1,
 'though': 1,
 'they': 2,
 'may': 1,
 'parted': 1,
 'still': 2,
 'a': 2,
 'chance': 1,
 'that': 2,
 'see': 1,
 'night': 1,
 'cloudy': 1,
 'light': 1,
 'shines': 1,
 'on': 1,
 'shine': 1,
 'until': 1,
 'tomorrow': 1,
 'wake': 1,
 'up': 1,
 'sound': 1,
 'music': 1,
 'yeah': 2}

#### Triggering the Except Block

In [11]:
flatironzip = 10010

lyric_count(flatironzip)

Cannot compute –– variable is not iterable.
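
Here's a stripped-down sketch of the same pattern, catching the specific error rather than every possible one. `count_items` is a hypothetical helper, not part of the class code. Trying to iterate over an integer raises a `TypeError`, and naming that exception in the `except` line means other, unexpected errors aren't silently swallowed.

```python
def count_items(items):
    """Return the number of items, or None if `items` can't be iterated over."""
    try:
        iterator = iter(items)      # raises TypeError for non-iterables
    except TypeError as err:
        print('Cannot compute:', err)
        return None
    return sum(1 for _ in iterator)

good = count_items(['let', 'it', 'be'])
bad = count_items(10010)

print(good)  # 3
print(bad)   # None
```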


## The *args and **kwargs flexible arguments

### *args flexible argument

*args allows you to pass a variable number of positional arguments to a function. Inside the function, whatever you pass in is collected into a tuple. This allows for flexibility: you can loop over this tuple, or index and slice it to access its different values.

Here we write a function that acts as a product, i.e., it multiplies all of its arguments together.

***Note***: in class I called the product a "Cartesian product." That's my bad. A Cartesian product is the product of two sets, whereas this is just the product of numbers. The thought remains the same, though: a summation is to addition as a product is to multiplication.

In [9]:
def product(*args):
    """Multiply all *args values by one another."""
    
    prod = 1 # Initializing our product
    
    # Iterating over all of our arguments
    
    for num in args:
        prod *= num
        
    return prod

In [10]:
product(2, 3, 5)

30

***Note***:

    a *= b is equivalent to a = a * b
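
To see the tuple for yourself, here's a small sketch. `describe` is a made-up function that just reports what `args` looks like inside the function. It also shows the reverse trick: putting a `*` in front of a list when calling a function unpacks the list into separate arguments.

```python
def describe(*args):
    """Return the type name of args and how many values it holds."""
    return type(args).__name__, len(args)

kind, n = describe(2, 3, 5)
print(kind, n)  # tuple 3

nums = [2, 3, 5]
kind2, n2 = describe(*nums)  # same as describe(2, 3, 5)
print(kind2, n2)  # tuple 3

kind3, n3 = describe(nums)   # without the *, the whole list is ONE argument
print(kind3, n3)  # tuple 1
```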

### Practice

Same idea, except this time write the function as a summation.

In [13]:
def summation(*args):
    """Add all *args values together."""
    
    sum1 = 0 # Initializing our sum
    
    # Iterating over all of our arguments
    
    for num in args:
        sum1 += num
        
    return sum1

In [14]:
summation(2, 3, 5)

10

### **kwargs flexible argument

**kwargs allows us to handle NAMED (keyword) arguments that have not been defined in advance. It collects the name-value pairs into a dictionary within the function body.

In [161]:
def daily_task(**kwargs):
    """Use the key-value pairs we pass through using **kwargs and print results."""
    
    for dictkey, dictval in kwargs.items():
        print(dictkey + ': ' + dictval)

The .items() method returns a view of the dictionary's (key, value) tuple pairs, which we can loop over.

In [162]:
daily_task(day2 = 'Data Science Basics', day3 = 'Conditional Logic, Visualization', day4 = 'Visualization, Functions')

day2: Data Science Basics
day3: Conditional Logic, Visualization
day4: Visualization, Functions
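
As a final sketch, here's a made-up function (`collect` is a hypothetical name) that simply hands back its `kwargs`, so we can confirm it really is a dictionary. It also shows that `**` works in the other direction: putting it in front of a dictionary when calling a function unpacks the dictionary into keyword arguments.

```python
def collect(**kwargs):
    """Return the keyword arguments exactly as the function received them."""
    return kwargs

schedule = collect(day2='Data Science Basics', day3='Conditional Logic')
print(type(schedule).__name__)  # dict

extra = {'day4': 'Visualization, Functions'}
merged = collect(**schedule, **extra)  # unpack two dicts into keyword arguments
print(list(merged))  # ['day2', 'day3', 'day4']
```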
