<center><img src="http://i.imgur.com/sSaOozN.png" width="500"></center>

## Course: Computational Thinking for Governance Analytics

### Prof. José Manuel Magallanes, PhD 
* Visiting Professor of Computational Policy at Evans School of Public Policy and Governance, and eScience Institute Senior Data Science Fellow, University of Washington.
* Professor of Government and Political Methodology, Pontificia Universidad Católica del Perú. 

_____

# Session 1:  Programming Fundamentals
## Part C: Building Functions in Python

We build functions to make the code more readable. Functions, plus the data structures and control of execution capabilities you saw before, will give you the basic tools you need to develop programs.

A function is a three-step process: Input, Transformation, Output. For example, if you need to convert a numeric value from Fahrenheit into Celsius, the input is the value in Fahrenheit, the transformation is the formula, and the output is the result of the formula (a value in Celsius).

In [1]:
def converterToCelsius(valueInFarenheit): #input
    #transformation
    resultInCelsius= (valueInFarenheit-32)*5/9
    #output
    return resultInCelsius

As shown above, creating a function in Python requires **def** followed by the name of the function; the function arguments go between parentheses. The process comes after the _colon_; notice that _indentation_ is needed. The command **return** gives the output. Python now has a new function available:

In [2]:
converterToCelsius(100)

37.77777777777778
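
As a complementary sketch of the same input, transformation, output pattern, the inverse conversion could be written in the same way. The name `converterToFahrenheit` is hypothetical and is not part of the course material:

```python
def converterToFahrenheit(valueInCelsius):  # input
    # transformation (the Celsius formula, solved for Fahrenheit)
    resultInFahrenheit = valueInCelsius * 9 / 5 + 32
    # output
    return resultInFahrenheit

print(converterToFahrenheit(100))  # 212.0
```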

<a id='beginning'></a>
This session is organized around the following topics:

1. [Input components.](#part1) 
2. [Output organization.](#part2) 
3. Applying functions
    + [to simple structures.](#part3) 
    + [to composite structures.](#part4) 



____

<a id='part1'></a>

## The function input
We control how many inputs a function takes:

In [3]:
# this function requires TWO inputs:
def XsumY(valueX,valueY):
    ###
    resultSum=valueX+valueY
    ###
    return resultSum

The code above receives two values and outputs their sum. You can see how it works this way:

In [4]:
XsumY(3,10)

13

The next function uses two inputs and one of them has a *default* value:

In [8]:
def riseToPower(base,exponent=2): # two argument names; 'exponent' has a default value of 2
    ###
    result=1
    if exponent > 0:
        for times in range(1,exponent+1): # use 'exponent + 1'...!
            result=result*base
    ###
    return(result)

Since one of the input arguments has a default value, you decide whether to give that input or not. Let’s see how it works:

In [10]:
riseToPower(9)

81

In [11]:
riseToPower(9,3)

729

In [12]:
riseToPower(9,0)

1

In [13]:
# of course, you can use the argument names:
riseToPower(base=9,exponent=0)

1

In [14]:
# when you use argument names, the order does not matter:
riseToPower(exponent=0,base=9)

1
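
The calls above can be summarized in a small sketch: an argument with a default may be omitted, while one without a default must always be given. The function `scale` is a hypothetical example used only for illustration:

```python
def scale(value, factor=10):  # 'factor' has a default, 'value' does not
    return value * factor

print(scale(3))            # 30: the default factor is used
print(scale(3, factor=2))  # 6: positional argument first, then a named one

try:
    scale(factor=2)        # 'value' has no default, so it cannot be omitted
except TypeError as error:
    print('TypeError:', error)
```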

### Homework:  

Change the above function to create the function **riseToPowerPlus**, which gives the correct answer even when the exponent is negative.

____

In [56]:
def riseToPowerPlus(base,exponent): # two argument names
    ###
    result=1
    if exponent > 0:
        for times in range(1,exponent+1): # use 'exponent + 1'...!
            result=result*base
    elif exponent < 0:
        for times in range(exponent,0): # repeats abs(exponent) times
            result=result*(1/base) # divide by 'base' instead of multiplying
    ###
    return(result)

In [58]:
#
riseToPowerPlus(base=2,exponent=-2)

0.25

Functions need argument names in the input definition, and if you pass many arguments by position, you need to keep their order. However, Python offers two additional ways to input **several arguments**. First, let's see what happens when we divide by zero:

In [59]:
3/0

ZeroDivisionError: division by zero

In [60]:
# Then
def divRounded(numerator,denominator,precision=2):
    try:
        result = numerator/denominator
        return round(result, precision)
    except ZeroDivisionError:
        print('You can not use 0 as the denominator')       

In [61]:
# testing:
n=13
d=12
p=5
divRounded(n,d,p)

1.08333
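
It may help to see what the function gives back when the denominator is zero. In the sketch below (the function is repeated so the example runs on its own), the `except` branch only prints a message, so no `return` is reached and Python returns `None`:

```python
def divRounded(numerator, denominator, precision=2):
    try:
        result = numerator / denominator
        return round(result, precision)
    except ZeroDivisionError:
        print('You can not use 0 as the denominator')

answer = divRounded(13, 0)  # prints the message
print(answer is None)       # True: the function ended without a 'return'
```

A caller that needs a number should therefore check for `None` before using the result.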

A different approach is to put the arguments in a list or tuple; the function call then requires ONE '*':

In [62]:
inputArgs=[13,12,5] # order matters, keep it.
divRounded(*inputArgs)

1.08333

A dict can be very useful too; in that case the function call requires TWO '*':

In [63]:
inputArgs={'numerator':13, 'precision':5,'denominator':12} # order does not matter
divRounded(**inputArgs)

1.08333
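
Both forms can also be combined in one call. The following is a minimal sketch, assuming a simplified copy of `divRounded` without the error handling; the names `positional` and `named` are hypothetical:

```python
def divRounded(numerator, denominator, precision=2):
    return round(numerator / denominator, precision)

positional = (13, 12)     # numerator and denominator, order matters
named = {'precision': 3}  # the remaining argument, given by name

print(divRounded(*positional, **named))  # 1.083
```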

[Go to page beginning](#beginning)
____

<a id='part2'></a>

## The function output

Our output has been a single value, but it can be several; however, you need the right structure to hold them.

In [64]:
# one input, and several outputs in a simple data structure:
def factors(number):
    factorsList=[] # empty list that will collect output
    
    for i in range(1, number + 1):
        #if the remainder of 'number'/'i' equals zero...
        if number % i == 0:
            # ...add 'i' to the list of factors!
            factorsList.append(i)

    return factorsList # returning  values in a list.

In [65]:
factors(20) 

[1, 2, 4, 5, 10, 20]
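
A list is not the only structure that can carry several outputs. As a rough sketch, the hypothetical functions `minMax` and `minMaxDict` below return two results, first packed in a tuple and then labelled in a dict:

```python
def minMax(values):
    # two results packed into a tuple
    return min(values), max(values)

def minMaxDict(values):
    # the same results, but labelled in a dict
    return {'min': min(values), 'max': max(values)}

lowest, highest = minMax([1, 2, 4, 5, 10, 20])  # tuple unpacking
print(lowest, highest)                           # 1 20
print(minMaxDict([1, 2, 4, 5, 10, 20]))          # {'min': 1, 'max': 20}
```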

### Homework:  

Change the function 'factors' to reduce the number of iterations in the for loop and still get the factors shown above.

In [90]:
# one input, and several outputs in a simple data structure:
def factors(number):
    from math import isqrt
    factorsList=[] # empty list that will collect output
    # factors come in pairs, so stop at the square root: for 20, 5*5=25>20
    for i in range(1, isqrt(number)+1):

        if number % i == 0:
            # ...add 'i' and its partner 'number//i' to the list of factors!
            factorsList.append(i)
            if i != number//i: # do not repeat the root of a perfect square
                factorsList.append(number//i)

    return factorsList # returning values in a list.

In [91]:
factors(20) 

[1, 20, 2, 10, 4, 5]

In this next case, you can have several inputs, and get an output organized in a more complex structure (a data frame):

In [101]:
# several inputs, a composite data structure:
def powerDF(aList,power=2):
    import pandas as pd
    # list comprehension
    powerList=[val**power for val in aList]
    # both lists into a dict:
    answerAsDicts={'number':aList,'power'+str(power):powerList}
    # data frame is created, and that is returned:
    return pd.DataFrame(answerAsDicts)

In [99]:
powerDF(factors(10),3)

|  | number | power3 |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 10 | 1000 |
| 2 | 2 | 8 |
| 3 | 5 | 125 |

In [100]:
# of course, this works:
valsDict={'aList':factors(10), 'power':3}
powerDF(**valsDict)

|  | number | power3 |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 10 | 1000 |
| 2 | 2 | 8 |
| 3 | 5 | 125 |

### Homework:
Make a function that reads two lists and returns a data frame with those lists and extra columns with their sum, difference, multiplication and division.

In [136]:
# powerDF extended with the sum, difference, multiplication and division
# of the input list and its powers:
def powerDF(aList,power=2):
    import pandas as pd
    import numpy as np
    # list comprehension
    powerList=[val**power for val in aList]
    # arrays allow element-wise arithmetic
    numbers=np.array(aList)
    powers=np.array(powerList)
    # all the columns into a dict:
    answerAsDicts={'number':aList,'power'+str(power):powerList,
                   'theirsum':powers+numbers,
                   'theirdifference':powers-numbers,
                   'theirmultiplication':powers*numbers,
                   'theirdivision':powers/numbers}
    # data frame is created, and that is returned:
    return pd.DataFrame(answerAsDicts)

In [137]:
powerDF(factors(10),3)

|  | number | power3 | theirsum | theirdifference | theirmultiplication | theirdivision |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 2 | 0 | 1 | 1.0 |
| 1 | 10 | 1000 | 1010 | 990 | 10000 | 100.0 |
| 2 | 2 | 8 | 10 | 6 | 16 | 4.0 |
| 3 | 5 | 125 | 130 | 120 | 625 | 25.0 |
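
The homework statement asks for a function that receives two lists. One possible sketch of that version follows; the function name `listsDF` and the column names are hypothetical, and it assumes both lists have the same length:

```python
import pandas as pd

def listsDF(listA, listB):
    # both lists become columns of a data frame
    frame = pd.DataFrame({'listA': listA, 'listB': listB})
    # column arithmetic in pandas is element-wise
    frame['sum'] = frame['listA'] + frame['listB']
    frame['difference'] = frame['listA'] - frame['listB']
    frame['multiplication'] = frame['listA'] * frame['listB']
    frame['division'] = frame['listA'] / frame['listB']
    return frame

result = listsDF([10, 20, 30], [2, 4, 5])
print(result)
```

Building the data frame first and then adding columns avoids converting the lists to arrays by hand.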


[Go to page beginning](#beginning)

____
<a id='part3'></a>

## Applying functions to simple structures

Imagine you have created a function that converts a value, like this one:

In [138]:
def double(x):
    return 2*x

and you have this list:

In [139]:
myList=[1,2,3]

What do you get here?

In [142]:
double(myList)

[1, 2, 3, 1, 2, 3]

I bet you wanted something like this:

In [143]:
map(double,myList)

<map at 0x28c18f5e320>

You just see a strange result! Python did do what you need, but you can't see it because **map** returned an **iterator**. Do this then:

In [144]:
list(map(double,myList))

[2, 4, 6]

With **map** you can apply the function to every element of the list.
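
A list comprehension, the same tool used earlier inside _powerDF_, gives the same result as **map**. This is a small sketch for comparison; the names `viaMap` and `viaComprehension` are only illustrative:

```python
def double(x):
    return 2 * x

myList = [1, 2, 3]

viaMap = list(map(double, myList))
viaComprehension = [double(value) for value in myList]

print(viaComprehension)            # [2, 4, 6]
print(viaMap == viaComprehension)  # True
```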

Simple functions can be written using **lambda** notation:

In [145]:
double2=lambda x: 2*x
list(map(double2,myList))

[2, 4, 6]

You can use these functions to create filters:

In [146]:
drinkingAge= lambda x: x >= 21

In [147]:
agesList=[12,34,56,19,24,13]
list(filter(drinkingAge,agesList))

[34, 56, 24]

In the last line above, you filtered the original list _agesList_ by combining **filter** and _drinkingAge_; the filtering works by keeping the values for which _drinkingAge_ returns True.
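
To make the selection visible, the sketch below first collects the True/False answers of _drinkingAge_ and then keeps the matching ages by hand. The names `flags` and `kept` are hypothetical:

```python
drinkingAge = lambda x: x >= 21
agesList = [12, 34, 56, 19, 24, 13]

# one True/False answer per age
flags = list(map(drinkingAge, agesList))
print(flags)  # [False, True, True, False, True, False]

# keeping only the ages whose answer is True, as filter does
kept = [age for age, flag in zip(agesList, flags) if flag]
print(kept)   # [34, 56, 24]
```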

[Go to page beginning](#beginning)

____
<a id='part4'></a>

## Applying functions to composite structures

We will be using data frames often. This is a particular structure that has its **own** mechanism to apply functions:

In [148]:
#Creating data frame
import pandas as pd
data={'numberA':[10,20,30,4,5],'numberB':[6,7,8,9,10]}
dataDF=pd.DataFrame(data)
dataDF

|  | numberA | numberB |
|---|---|---|
| 0 | 10 | 6 |
| 1 | 20 | 7 |
| 2 | 30 | 8 |
| 3 | 4 | 9 |
| 4 | 5 | 10 |


Now applying function _double_ to it:

In [149]:
double(dataDF)

|  | numberA | numberB |
|---|---|---|
| 0 | 20 | 12 |
| 1 | 40 | 14 |
| 2 | 60 | 16 |
| 3 | 8 | 18 |
| 4 | 10 | 20 |


The function worked element by element because each column of the data frame (which came from a list) is now a pandas _Series_ backed by an array, and multiplying an array by 2 multiplies every element.
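
The contrast with the plain list used in the previous section can be checked directly. This is a minimal sketch, assuming pandas is installed; `mySeries` is a hypothetical name:

```python
import pandas as pd

myList = [1, 2, 3]
print(2 * myList)               # [1, 2, 3, 1, 2, 3]: the list is repeated

mySeries = pd.Series(myList)
print((2 * mySeries).tolist())  # [2, 4, 6]: every element is doubled
```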

However, you often need more effort to make functions work in pandas. The method **apply** is the main way to use a function on a data frame:

In [150]:
# this will double each element column-wise
dataDF.apply(double,axis=0)

|  | numberA | numberB |
|---|---|---|
| 0 | 20 | 12 |
| 1 | 40 | 14 |
| 2 | 60 | 16 |
| 3 | 8 | 18 |
| 4 | 10 | 20 |


In [151]:
# this will double each element row-wise
dataDF.apply(double,axis=1)

|  | numberA | numberB |
|---|---|---|
| 0 | 20 | 12 |
| 1 | 40 | 14 |
| 2 | 60 | 16 |
| 3 | 8 | 18 |
| 4 | 10 | 20 |


The _axis_ argument tells in which direction the function should be applied. _double_ works at the level of cells, so the axis made no difference. 

The axis did not matter for _double_, but it does for _sum_:

In [152]:
# the sum of the columns
dataDF.apply(sum,axis=0)

numberA    69
numberB    40
dtype: int64

In [153]:
# the sum of the rows
dataDF.apply(sum,axis=1)

0    16
1    27
2    38
3    13
4    15
dtype: int64

Compare for min:

In [154]:
dataDF.apply(min) # axis=0 is the default, I can omit it.

numberA    4
numberB    6
dtype: int64

In [155]:
dataDF.apply(min,axis=1)

0    6
1    7
2    8
3    4
4    5
dtype: int64
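
The same idea extends to functions you write yourself. In this sketch, the hypothetical function `valueRange` receives a whole column or a whole row (a _Series_), depending on the axis:

```python
import pandas as pd

dataDF = pd.DataFrame({'numberA': [10, 20, 30, 4, 5],
                       'numberB': [6, 7, 8, 9, 10]})

def valueRange(series):
    # distance between the largest and the smallest value
    return series.max() - series.min()

byColumn = dataDF.apply(valueRange, axis=0)  # one result per column
byRow = dataDF.apply(valueRange, axis=1)     # one result per row

print(byColumn.tolist())  # [26, 4]
print(byRow.tolist())     # [4, 13, 22, 5, 5]
```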

Pandas has the method **applymap** to specifically apply a function to every cell of the data frame (in pandas 2.1 and later it is deprecated in favor of **DataFrame.map**):

In [156]:
dataDF.applymap(double)

|  | numberA | numberB |
|---|---|---|
| 0 | 20 | 12 |
| 1 | 40 | 14 |
| 2 | 60 | 16 |
| 3 | 8 | 18 |
| 4 | 10 | 20 |


You can have functions that operate at the cell level or at the column (_Series_) level. _apply_ works at both levels along the axis of interest, while _applymap_ works at the cell level for a whole data frame, but is not available for a _Series_. Sometimes the difference is not obvious.
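
As a small sketch of that difference, a _Series_ has its own **apply**, which works value by value, and it has no _applymap_ at all. The name `columnA` is hypothetical:

```python
import pandas as pd

dataDF = pd.DataFrame({'numberA': [10, 20, 30, 4, 5],
                       'numberB': [6, 7, 8, 9, 10]})

def double(x):
    return 2 * x

columnA = dataDF['numberA']            # a Series, not a data frame
print(columnA.apply(double).tolist())  # [20, 40, 60, 8, 10]
print(hasattr(columnA, 'applymap'))    # False
```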

Just make sure you know what you have:

In [157]:
# This is a Series
dataDF.numberA

0    10
1    20
2    30
3     4
4     5
Name: numberA, dtype: int64

In [158]:
# This is a Series
dataDF['numberA']

0    10
1    20
2    30
3     4
4     5
Name: numberA, dtype: int64

In [159]:
# This is a data frame:
dataDF[['numberA']]

|  | numberA |
|---|---|
| 0 | 10 |
| 1 | 20 |
| 2 | 30 |
| 3 | 4 |
| 4 | 5 |


In [160]:
# This is a Series
dataDF.loc[:,'numberA']

0    10
1    20
2    30
3     4
4     5
Name: numberA, dtype: int64

In [161]:
# This is a data frame:
dataDF.loc[:,['numberA']]

|  | numberA |
|---|---|
| 0 | 10 |
| 1 | 20 |
| 2 | 30 |
| 3 | 4 |
| 4 | 5 |


In [162]:
# This is a Series
dataDF.iloc[:,0]

0    10
1    20
2    30
3     4
4     5
Name: numberA, dtype: int64

In [163]:
# This is a data frame:
dataDF.iloc[:,[0]]

|  | numberA |
|---|---|
| 0 | 10 |
| 1 | 20 |
| 2 | 30 |
| 3 | 4 |
| 4 | 5 |
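
Instead of reading the printed output, you can ask Python for the type directly. A short sketch, assuming the same data frame as above:

```python
import pandas as pd

dataDF = pd.DataFrame({'numberA': [10, 20, 30, 4, 5],
                       'numberB': [6, 7, 8, 9, 10]})

# a single label gives a Series; a list of labels gives a data frame
print(type(dataDF['numberA']).__name__)      # Series
print(type(dataDF[['numberA']]).__name__)    # DataFrame
print(type(dataDF.iloc[:, 0]).__name__)      # Series
print(type(dataDF.iloc[:, [0]]).__name__)    # DataFrame
```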


____

Solve the homework in a new Jupyter notebook, and then upload it to GitHub. Name the notebook 'hw_functions'.

_____

* [Go to page beginning](#beginning)
* [Go to REPO in GitHub](https://github.com/EvansDataScience/ComputationalThinking_Gov_1)
* [Go to Course schedule](https://evansdatascience.github.io/GovernanceAnalytics/)