# <span style="color:darkblue"> Lecture 8b: Map </span>

<font size = "5">

In the previous lecture we ...

- Worked through the definition of functions
- Illustrated some examples

In this lecture, we will ...

- Discuss the syntax of functions (local/global)
- Apply functions to multiple elements in a data frame
- Introduce ".py" files

## <span style="color:darkblue"> I. Import Libraries </span>

In [1]:
# The "pandas" library is used for manipulating datasets

import pandas as pd
import numpy as np


## <span style="color:darkblue"> II. Operations over data frames (map) </span>


<font size = "5">

Create an empty data frame

In [2]:
data  = pd.DataFrame()

In [3]:
print(data)

Empty DataFrame
Columns: []
Index: []


<font size = "5">

Add variables

In [4]:
# The following are lists with values for different individuals
# "age" is the number of years
# "num_underage_siblings" is the total number of underage siblings
# "num_adult_siblings" is the total number of adult siblings

data["age"] = [18,29,15,32,6]
data["num_underage_siblings"] = [0,0,1,1,0]
data["num_adult_siblings"] = [1,0,0,1,0]


In [5]:
data

| | age | num_underage_siblings | num_adult_siblings |
|---|---|---|---|
| 0 | 18 | 0 | 1 |
| 1 | 29 | 0 | 0 |
| 2 | 15 | 1 | 0 |
| 3 | 32 | 1 | 1 |
| 4 | 6 | 0 | 0 |


<font size = "5">

Define functions

In [6]:
# The first two functions return True/False depending on age constraints
# The third function returns the sum of two numbers
# The fourth function returns a string with the age bracket

fn_iseligible_vote = lambda age: age >= 18

fn_istwenties = lambda age: (age >= 20) & (age < 30)

fn_sum = lambda x,y: x + y

def fn_agebracket(age):
    if (age >= 18):
        status = "Adult"
    elif (age >= 10) & (age < 18):
        status = "Adolescent"
    else:
        status = "Child"
    return(status)
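A lambda is shorthand for a one-line function, so each lambda above could also be written with `def`. The sketch below compares the two forms for the voting rule. The name `fn_iseligible_vote_def` and the list `ages` are hypothetical and only serve the illustration.

```python
# Lambda form, as used in the lecture
fn_iseligible_vote = lambda age: age >= 18

# Equivalent "def" form (the name fn_iseligible_vote_def is hypothetical)
def fn_iseligible_vote_def(age):
    return age >= 18

# Illustrative ages, copied from the "age" column
ages = [18, 29, 15, 32, 6]

results_lambda = [fn_iseligible_vote(age) for age in ages]
results_def = [fn_iseligible_vote_def(age) for age in ages]

# Both forms give the same answer for every age
print(results_lambda == results_def)
```

The `def` form is usually easier to read once a function needs more than one expression, as in `fn_agebracket`.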


<font size = "5">
Applying functions with one argument: <br>

```python
 apply(myfunction)
 ```
 - Takes a dataframe series (a column vector) as an input
 - Computes the function separately for each element of the column
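As a rough sketch of what `apply()` does, the example below computes the same result twice: once with `apply()` and once with an explicit for-loop. The series `ages` and the names `via_apply` and `via_loop` are made up for this illustration.

```python
import pandas as pd

# A small series of ages (illustrative values)
ages = pd.Series([18, 29, 15, 32, 6])

fn_iseligible_vote = lambda age: age >= 18

# Option 1: apply() calls the function once per element
via_apply = ages.apply(fn_iseligible_vote)

# Option 2: the same computation written as a for-loop
via_loop = []
for age in ages:
    via_loop.append(fn_iseligible_vote(age))

# The two approaches agree element by element
print(list(via_apply) == via_loop)
```

The `apply()` version is shorter and returns a series that can be assigned directly to a new column.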


In [7]:
# The function "apply" will extract each element and return the function value
# It is similar to running a "for-loop" over each element

data["can_vote"]    = data["age"].apply(fn_iseligible_vote)
data["in_twenties"] = data["age"].apply(fn_istwenties)
data["age_bracket"] = data["age"].apply(fn_agebracket)


# NOTE: The following code also works:
# data["can_vote"]    = data["age"].apply(lambda age: age >= 18)
# data["in_twenties"] = data["age"].apply(lambda age: (age >= 20) & (age < 30))

display(data)


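| | age | num_underage_siblings | num_adult_siblings | can_vote | in_twenties | age_bracket |
|---|---|---|---|---|---|---|
| 0 | 18 | 0 | 1 | True | False | Adult |
| 1 | 29 | 0 | 0 | True | True | Adult |
| 2 | 15 | 1 | 0 | False | False | Adolescent |
| 3 | 32 | 1 | 1 | True | False | Adult |
| 4 | 6 | 0 | 0 | False | False | Child |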


<font size = "5">

Creating a new variable

In [8]:
data['new_var'] = data['age'].apply(lambda age: age >= 18)
data

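| | age | num_underage_siblings | num_adult_siblings | can_vote | in_twenties | age_bracket | new_var |
|---|---|---|---|---|---|---|---|
| 0 | 18 | 0 | 1 | True | False | Adult | True |
| 1 | 29 | 0 | 0 | True | True | Adult | True |
| 2 | 15 | 1 | 0 | False | False | Adolescent | False |
| 3 | 32 | 1 | 1 | True | False | Adult | True |
| 4 | 6 | 0 | 0 | False | False | Child | False |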


<font size = "5">

Dropping an existing variable

In [9]:
data = data.drop(columns=['new_var'])
data

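| | age | num_underage_siblings | num_adult_siblings | can_vote | in_twenties | age_bracket |
|---|---|---|---|---|---|---|
| 0 | 18 | 0 | 1 | True | False | Adult |
| 1 | 29 | 0 | 0 | True | True | Adult |
| 2 | 15 | 1 | 0 | False | False | Adolescent |
| 3 | 32 | 1 | 1 | True | False | Adult |
| 4 | 6 | 0 | 0 | False | False | Child |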


<font size = "5">

Mapping functions with one or more arguments <br>

**Definition:** The ```map()``` function executes a <br>
specified function for each item in an iterable <br>
(such as a list or an array). <br>
 The item is sent to the function as a parameter.

```python
list(map(myfunction, list1,list2, ....))
```
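The following minimal sketch shows two details of `map()` that the definition implies. First, `map()` returns a lazy object, which is why it is wrapped in `list()`. Second, with several iterables the function receives one element from each, position by position. The lists `list_a` and `list_b` are illustrative.

```python
fn_sum = lambda x, y: x + y

list_a = [1, 2, 3]
list_b = [10, 20, 30]

# map() alone returns a lazy "map object", not the values
lazy_result = map(fn_sum, list_a, list_b)
print(lazy_result)

# list() forces the computation and collects the values
totals = list(lazy_result)
print(totals)

# The same result written with zip() and a list comprehension
totals_zip = [fn_sum(x, y) for x, y in zip(list_a, list_b)]

# If the iterables differ in length, map() stops at the shortest one
short_totals = list(map(fn_sum, list_a, [10, 20]))
print(short_totals)
```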

In [10]:
list(map(fn_iseligible_vote, data["age"]))

[True, True, False, True, False]

In [11]:
# Repeat the above example with map
# We use list() to convert the output to a list
# The first argument of map() is a function
# The remaining arguments are the iterables whose elements are passed to the function

data["can_vote_map"] = list(map(fn_iseligible_vote, data["age"]))

In [12]:
# In this example, the function takes two arguments

data["num_siblings"] = list(map(fn_sum,
                                data["num_underage_siblings"],
                                data["num_adult_siblings"]))

In [13]:
list(map(fn_sum, data["num_underage_siblings"],
         data["num_adult_siblings"]))

[1, 0, 1, 2, 0]

In [14]:
data["num_underage_siblings"]

0    0
1    0
2    1
3    1
4    0
Name: num_underage_siblings, dtype: int64

In [15]:
data["num_adult_siblings"]

0    1
1    0
2    0
3    1
4    0
Name: num_adult_siblings, dtype: int64

<font size = "5">

<span style="color:darkgreen"> Recommended! </span>

- Arguments can be split into multiple lines!
- Start a separate line after a comma
- The PEP 8 style guide recommends at most 79 characters per line

In [90]:
data["num_siblings"] = list(map(fn_sum,
                                data["num_underage_siblings"],
                                data["num_adult_siblings"]))

<font size = "5">

Try it yourself!

- Write a function checking whether num_siblings $\ge$ 1
- Add a variable to the dataset called "has_siblings"
- Assign True/False to this variable using "apply()"

In [16]:
def has_siblings(num_siblings):
    return num_siblings >= 1

data["has_siblings"] = data["num_siblings"].apply(has_siblings)
data


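| | age | num_underage_siblings | num_adult_siblings | can_vote | in_twenties | age_bracket | can_vote_map | num_siblings | has_siblings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18 | 0 | 1 | True | False | Adult | True | 1 | True |
| 1 | 29 | 0 | 0 | True | True | Adult | True | 0 | False |
| 2 | 15 | 1 | 0 | False | False | Adolescent | False | 1 | True |
| 3 | 32 | 1 | 1 | True | False | Adult | True | 2 | True |
| 4 | 6 | 0 | 0 | False | False | Child | False | 0 | False |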


<font size = "5">

Try it yourself!

- Read the car dataset "data_raw/features.csv"
- Create a function that tests whether mpg $\ge$ 29
- Add a variable "mpg_above_29" that is True if mpg $\ge$ 29 and False otherwise
- Store the new dataset to "data_clean/features.csv"


In [19]:
# Write your own code

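# Read the raw dataset
car_data = pd.read_csv("data_raw/features.csv")

# Test whether a single mpg value is at least 29
def mpg_above_29(mpg):
    return mpg >= 29

# Apply the test to every row and save the result to the clean folder
car_data["mpg_above_29"] = car_data["mpg"].apply(mpg_above_29)
car_data.to_csv("data_clean/features.csv", index=False)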




<font size = "5">

Try it yourself!

- Map can also be applied to simple lists!
- Create a lambda function with arguments {fruit,color}.
- The function returns the string <br>
"A {fruit} is {color}"
- Create the following two lists:

``` list_fruits  = ["banana","strawberry","kiwi"] ```

``` list_colors  = ["yellow","red","green"] ```
- Use the list(map()) function to output a list of these strings, one per fruit

In [20]:
# Write your own code

list_fruits = ["banana", "strawberry", "kiwi"]
list_colors = ["yellow", "red", "green"]

result = list(map(lambda fruit, color: f"A {fruit} is {color}",
                  list_fruits,
                  list_colors))
result




['A banana is yellow', 'A strawberry is red', 'A kiwi is green']

## <span style="color:darkblue"> III. External Scripts </span>

<font size = "5">

".ipynb" files ...

- Markdown + python code
- Great for interactive output!

".py" files ...

- Python (only) script
- Used for specific tasks
- Why? Split code into smaller, more manageable files



<font size = "5">

<table><tr>
<td style = "border:0px"> <img src="figures/screenshot_py_functions.png" alt="drawing" width="300"/>  </td>
<td style = "border:0px">

File with functions

 </td>
</tr></table>


<font size = "5">


You can reference Python scripts "as if" you were running them <br>
from the Jupyter notebook

- Can help break down big projects into smaller chunks
- Keep things organized in subfolders
- Interact with variables in the current environment

In [21]:
# Check the script in the subfolder

message = "hello"
exec(open("./scripts/example_script.py").read())
message_output


'This is a message: hello'
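The contents of `example_script.py` are not shown here. As a minimal sketch, assume the script is a single line that builds `message_output` from `message`, which is consistent with the output above. The example writes such a hypothetical script to a temporary folder and runs it with `exec()`. Unlike the cell above, it passes an explicit dictionary (`namespace`, a name chosen for this illustration) to make visible which variables the script reads and writes.

```python
import tempfile
from pathlib import Path

# Hypothetical one-line script, assumed for illustration only
script_code = 'message_output = "This is a message: " + message\n'

with tempfile.TemporaryDirectory() as folder:
    script_path = Path(folder) / "example_script.py"
    script_path.write_text(script_code)

    # The script can read "message" because it is in the namespace
    namespace = {"message": "hello"}
    with open(script_path) as script_file:
        exec(script_file.read(), namespace)

# The script created "message_output" in the same namespace
message_output = namespace["message_output"]
print(message_output)
```

Calling `exec()` without a dictionary, as in the notebook cell, uses the notebook's own variables instead.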

<font size = "5">

**A module is simply a Python file with the <br>
.py extension, and a folder that contains <br>
 modules is called a package!**

<font size = "5">

We can import functions into the working <br>
environment from a file

- When you import code this way, it won't interact with variables <br>
in the current environment
- Best reserved for functions or parameter values
- Not for code that needs to interact with things you <br>
defined in the current environment
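The sketch below illustrates this separation under stated assumptions. It creates a hypothetical module `demo_functions.py` in a temporary folder, imports it, and shows that an imported function works on its arguments but cannot see a variable defined in the importing environment. The module name and the function `read_message` are invented for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents, written to disk for the demonstration
module_code = (
    "def fn_quadratic(x):\n"
    "    return x ** 2\n"
    "\n"
    "def read_message():\n"
    "    return message  # 'message' is not defined inside the module\n"
)

with tempfile.TemporaryDirectory() as folder:
    Path(folder, "demo_functions.py").write_text(module_code)
    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        demo = importlib.import_module("demo_functions")
    finally:
        sys.path.remove(folder)

# Defined in the current environment, invisible to the module
message = "hello"

# Functions that only use their arguments work as expected
squared = demo.fn_quadratic(5)
print(squared)

# A function that relies on "message" fails with a NameError
lookup_failed = False
try:
    demo.read_message()
except NameError:
    lookup_failed = True
print(lookup_failed)
```

Passing values as arguments, as `fn_quadratic` does, avoids the problem.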

In [22]:
import scripts.example_functions as ef

<font size = "5">

We reference them using the alias

In [23]:
x = 1
print(ef.fn_quadratic(1))
print(ef.fn_quadratic(5))

ef.message_hello("Juan")

1
25


'hi Juan'


<font size = "5">

<table><tr>
<td style = "border:0px"> <img src="figures/screenshot_py_variables.png" alt="drawing" width="300"/>  </td>
<td style = "border:0px">

File with variables

- Storing values/settings
- Variables are global <br>
(can be referenced later)

</td>
</tr></table>

<font size = "5">

We can also import and reference variables

In [24]:
import scripts.example_variables as ev

In [25]:
# Importing does not overwrite the local variable "alpha":
# the imported value is stored separately as "ev.alpha"

alpha = 1
print(alpha)
print(ev.alpha)

1
5
