# Apply methods with more than one parameter

Previously we used the apply method to determine the proper buffer size for raptor nests based on the value of the *recentspec* column. This approach works well when a new column value depends on the value of a single column, but what if we need to use the values of two or more columns?

This is possible, but I will be the first to admit that the syntax is a bit confusing. Nevertheless it is important and I will do my best to explain it.

In [1]:
%matplotlib inline
import geopandas as gpd

raptor = gpd.read_file("data/Raptor_Nests.shp")

Let's take a look at the function we created in a previous lecture.

In [2]:
def calc_raptor_buffer(spec):
    if spec=='Swainsons Hawk':
        return 333
    elif spec=='Northern Harrier':
        return 500
    else:
        return 667

Now let's adapt it so that it also considers the *recentstat* column as a factor in determining the buffer size. Writing the function is easy:

In [3]:
def calc_raptor_buffer(spec, stat):
    if spec == 'Swainsons Hawk':
        if stat == 'ACTIVE NEST':
            return 500
        else:
            return 333
    elif spec == 'Red-tail Hawk':
        if stat == 'ACTIVE NEST':
            return 800
        else:
            return 667
    else:
        return 500
        

But how do we apply a function that takes more than one parameter, when calling apply on a single column passes only one value at a time?

Pandas is a rich and full-featured library, and it turns out that there are multiple ways to accomplish this. Check out this [Stack Overflow question](https://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe) for a complete discussion. The syntax that I prefer is below.

In [4]:
raptor['buffer_dist']=raptor.apply(lambda raptor: calc_raptor_buffer(raptor['recentspec'], raptor['recentstat']), axis=1)
raptor.head()

| | postgis_fi | lat_y_dd | long_x_dd | lastsurvey | recentspec | recentstat | Nest_ID | geometry | buffer_dist |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 361.0 | 40.267502 | -104.870872 | 2012-03-16 | Swainsons Hawk | INACTIVE NEST | 361 | POINT (-104.79595 40.29891) | 333 |
| 1 | 362.0 | 40.264321 | -104.860255 | 2012-03-16 | Swainsons Hawk | INACTIVE NEST | 362 | POINT (-104.78897 40.22089) | 333 |
| 2 | 1.0 | 38.650081 | -105.494251 | 2014-07-28 | Swainsons Hawk | INACTIVE NEST | 1 | POINT (-105.50223 38.68694) | 333 |
| 3 | 2.0 | 40.309574 | -104.932604 | 2011-01-06 | Swainsons Hawk | INACTIVE NEST | 2 | POINT (-104.84889 40.35215) | 333 |
| 4 | 3.0 | 40.219343 | -104.729246 | 2014-07-03 | Swainsons Hawk | ACTIVE NEST | 3 | POINT (-104.74466 40.18571) | 500 |


Again, I will be the first to admit that this syntax is confusing and unfortunate.  Hopefully someone will come up with a more straightforward way in the future.

But let's walk through this one step at a time, and hopefully it will make more sense.

First of all notice that in this example we are calling apply on an entire dataframe and not a single column as in the past.  This is critical to understanding how this works.

Because apply is called on the entire dataframe with axis=1, the lambda function is called once per row, and each call receives that entire row (a pandas Series indexed by column name).

Recall that a lambda function takes an input value and returns an output (*lambda x: x\*\*2* takes x and returns x squared). In this case the input value is an entire row, so we can use the column names to pick out individual fields.

**NOTE:** when apply is used on an entire dataframe rather than a single column, it is necessary to specify axis=1. With the default axis=0 the function is applied to each column in turn; with axis=1 it is applied to each row, which is what we need in order to look up values by column name.
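To make the effect of the axis argument concrete, here is a minimal sketch on a small made-up dataframe. The name `toy` and its two rows are assumptions for illustration only; they are not the raptor data.

```python
import pandas as pd

# Hypothetical toy data standing in for the raptor nests table.
toy = pd.DataFrame({
    "recentspec": ["Swainsons Hawk", "Red-tail Hawk"],
    "recentstat": ["ACTIVE NEST", "INACTIVE NEST"],
})

# axis=1: the function is called once per row, so the input can be
# indexed by column name. The result has one value per row.
per_row = toy.apply(lambda row: row["recentspec"] + " / " + row["recentstat"], axis=1)

# axis=0 (the default): the function is called once per column, so the
# input is a whole column. The result has one value per column.
per_col = toy.apply(lambda col: len(col), axis=0)

print(per_row)
print(per_col)
```

With axis=1 the result is indexed like the rows of the dataframe, while with axis=0 it is indexed by the column names.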

Perhaps it will be clearer if we simplify the lambda expression a bit by referring to the input value as x rather than raptor. The parameter name is arbitrary; in the first version it merely shadowed the name of the dataframe.

In [5]:
raptor['buffer_dist2']=raptor.apply(lambda x: calc_raptor_buffer(x['recentspec'], x['recentstat']), axis=1)
raptor.head()

| | postgis_fi | lat_y_dd | long_x_dd | lastsurvey | recentspec | recentstat | Nest_ID | geometry | buffer_dist | buffer_dist2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 361.0 | 40.267502 | -104.870872 | 2012-03-16 | Swainsons Hawk | INACTIVE NEST | 361 | POINT (-104.79595 40.29891) | 333 | 333 |
| 1 | 362.0 | 40.264321 | -104.860255 | 2012-03-16 | Swainsons Hawk | INACTIVE NEST | 362 | POINT (-104.78897 40.22089) | 333 | 333 |
| 2 | 1.0 | 38.650081 | -105.494251 | 2014-07-28 | Swainsons Hawk | INACTIVE NEST | 1 | POINT (-105.50223 38.68694) | 333 | 333 |
| 3 | 2.0 | 40.309574 | -104.932604 | 2011-01-06 | Swainsons Hawk | INACTIVE NEST | 2 | POINT (-104.84889 40.35215) | 333 | 333 |
| 4 | 3.0 | 40.219343 | -104.729246 | 2014-07-03 | Swainsons Hawk | ACTIVE NEST | 3 | POINT (-104.74466 40.18571) | 500 | 500 |
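For comparison, one alternative that avoids the lambda entirely is a plain Python list comprehension over the two columns, paired up with `zip`. This is only a sketch on made-up data (the `toy` dataframe is hypothetical), not the approach used in this lecture.

```python
import pandas as pd

def calc_raptor_buffer(spec, stat):
    # Same two-parameter function as defined earlier in the lecture.
    if spec == 'Swainsons Hawk':
        if stat == 'ACTIVE NEST':
            return 500
        else:
            return 333
    elif spec == 'Red-tail Hawk':
        if stat == 'ACTIVE NEST':
            return 800
        else:
            return 667
    else:
        return 500

# Hypothetical toy data standing in for the raptor nests table.
toy = pd.DataFrame({
    "recentspec": ["Swainsons Hawk", "Swainsons Hawk", "Red-tail Hawk", "Northern Harrier"],
    "recentstat": ["ACTIVE NEST", "INACTIVE NEST", "ACTIVE NEST", "INACTIVE NEST"],
})

# zip pairs the values of the two columns row by row, and the function
# is called once for each pair.
toy["buffer_dist"] = [
    calc_raptor_buffer(spec, stat)
    for spec, stat in zip(toy["recentspec"], toy["recentstat"])
]
print(toy)
```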


Another possibility is to rewrite the function so that it takes an entire row as a single input parameter and then picks out the individual columns as needed inside the function.

In [6]:
def calc_raptor_buffer(x):
    if x['recentspec'] == 'Swainsons Hawk':
        if x['recentstat'] == 'ACTIVE NEST':
            return 500
        else:
            return 333
    elif x['recentspec'] == 'Red-tail Hawk':
        if x['recentstat'] == 'ACTIVE NEST':
            return 800
        else:
            return 667
    else:
        return 500
    
raptor['buffer_dist3']=raptor.apply(calc_raptor_buffer, axis=1)
raptor.head()

| | postgis_fi | lat_y_dd | long_x_dd | lastsurvey | recentspec | recentstat | Nest_ID | geometry | buffer_dist | buffer_dist2 | buffer_dist3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 361.0 | 40.267502 | -104.870872 | 2012-03-16 | Swainsons Hawk | INACTIVE NEST | 361 | POINT (-104.79595 40.29891) | 333 | 333 | 333 |
| 1 | 362.0 | 40.264321 | -104.860255 | 2012-03-16 | Swainsons Hawk | INACTIVE NEST | 362 | POINT (-104.78897 40.22089) | 333 | 333 | 333 |
| 2 | 1.0 | 38.650081 | -105.494251 | 2014-07-28 | Swainsons Hawk | INACTIVE NEST | 1 | POINT (-105.50223 38.68694) | 333 | 333 | 333 |
| 3 | 2.0 | 40.309574 | -104.932604 | 2011-01-06 | Swainsons Hawk | INACTIVE NEST | 2 | POINT (-104.84889 40.35215) | 333 | 333 | 333 |
| 4 | 3.0 | 40.219343 | -104.729246 | 2014-07-03 | Swainsons Hawk | ACTIVE NEST | 3 | POINT (-104.74466 40.18571) | 500 | 500 | 500 |


In retrospect this seems like the most straightforward way to approach this problem. There is a reason, however, that I did not start with this approach. Although it works, significant performance gains can be achieved by wrapping a function with NumPy's vectorize, and to the best of my knowledge a function written in this way cannot be used with it directly, because vectorize passes individual values to the function rather than whole rows.

## Vectorizing the apply function

To understand how this works, let's return to the previous example. This time we are interested in performance, so we will time it over 50 repetitions.

In [7]:
import time

def calc_raptor_buffer(x):
    if x['recentspec'] == 'Swainsons Hawk':
        if x['recentstat'] == 'ACTIVE NEST':
            return 500
        else:
            return 333
    elif x['recentspec'] == 'Red-tail Hawk':
        if x['recentstat'] == 'ACTIVE NEST':
            return 800
        else:
            return 667
    else:
        return 500
    
ts = time.time()
for i in range(50):  #Repeat 50 times to get a better time value
    raptor['buffer_dist3']=raptor.apply(calc_raptor_buffer, axis=1)
te = time.time()

print("Elapsed time: {:5.5f}".format(te-ts))

raptor.head()

Elapsed time: 0.58395


| | postgis_fi | lat_y_dd | long_x_dd | lastsurvey | recentspec | recentstat | Nest_ID | geometry | buffer_dist | buffer_dist2 | buffer_dist3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 361.0 | 40.267502 | -104.870872 | 2012-03-16 | Swainsons Hawk | INACTIVE NEST | 361 | POINT (-104.79595 40.29891) | 333 | 333 | 333 |
| 1 | 362.0 | 40.264321 | -104.860255 | 2012-03-16 | Swainsons Hawk | INACTIVE NEST | 362 | POINT (-104.78897 40.22089) | 333 | 333 | 333 |
| 2 | 1.0 | 38.650081 | -105.494251 | 2014-07-28 | Swainsons Hawk | INACTIVE NEST | 1 | POINT (-105.50223 38.68694) | 333 | 333 | 333 |
| 3 | 2.0 | 40.309574 | -104.932604 | 2011-01-06 | Swainsons Hawk | INACTIVE NEST | 2 | POINT (-104.84889 40.35215) | 333 | 333 | 333 |
| 4 | 3.0 | 40.219343 | -104.729246 | 2014-07-03 | Swainsons Hawk | ACTIVE NEST | 3 | POINT (-104.74466 40.18571) | 500 | 500 | 500 |


We can see that running this apply method 50 times took less than 0.6 seconds. This is quite fast on this dataset and maybe not worth vectorizing at this point. If it were a large dataset and/or a very complex function, however, performance might be an issue.
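As a side note, the standard library's `timeit` module can do the repetition for us instead of a hand-written loop. The sketch below uses a made-up dataframe (`toy` is hypothetical, 200 rows of repeated values) rather than the raptor data, so the timing it prints is not comparable to the one above.

```python
import timeit
import pandas as pd

def calc_raptor_buffer(x):
    # Row-based version of the function: x is a single row.
    if x['recentspec'] == 'Swainsons Hawk':
        if x['recentstat'] == 'ACTIVE NEST':
            return 500
        else:
            return 333
    elif x['recentspec'] == 'Red-tail Hawk':
        if x['recentstat'] == 'ACTIVE NEST':
            return 800
        else:
            return 667
    else:
        return 500

# Hypothetical toy data standing in for the raptor nests table.
toy = pd.DataFrame({
    "recentspec": ["Swainsons Hawk", "Red-tail Hawk"] * 100,
    "recentstat": ["ACTIVE NEST", "INACTIVE NEST"] * 100,
})

# timeit.timeit calls the function `number` times and returns the
# total elapsed time in seconds.
elapsed = timeit.timeit(lambda: toy.apply(calc_raptor_buffer, axis=1), number=50)
print("Elapsed time: {:5.5f}".format(elapsed))
```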

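NumPy's vectorize takes a function and returns a new callable that accepts NumPy arrays (or pandas columns) as inputs and calls the original function once for each set of corresponding elements. The NumPy documentation notes that vectorize is provided primarily for convenience and is essentially a for loop, but it avoids the overhead of building a pandas Series for every row, which is what makes apply with axis=1 slow. In this case the process runs more than 20x faster than the simple apply method (about 0.026 seconds versus 0.584 seconds for 50 repetitions).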
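Before the full example, the two-step shape of the call can be sketched on made-up values. The function `add_if_positive` and its inputs are hypothetical and exist only to show that np.vectorize first wraps a function and that the wrapped function is then called with arrays.

```python
import numpy as np

def add_if_positive(a, b):
    # Hypothetical scalar function: works on one pair of values at a time.
    return a + b if a > 0 else b

# Step 1: wrap the scalar function. This returns a new callable.
vec_add_if_positive = np.vectorize(add_if_positive)

# Step 2: call the wrapped function with arrays of equal length. The
# original function runs once per pair of corresponding elements.
result = vec_add_if_positive(np.array([1, -2, 3]), np.array([10, 20, 30]))
print(result)
```

Writing `np.vectorize(f)(a, b)` on a single line, as in the cell below, performs both steps at once.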

In [8]:
import numpy as np

def calc_raptor_buffer(spec, stat):
    if spec == 'Swainsons Hawk':
        if stat == 'ACTIVE NEST':
            return 500
        else:
            return 333
    elif spec == 'Red-tail Hawk':
        if stat == 'ACTIVE NEST':
            return 800
        else:
            return 667
    else:
        return 500
    
ts = time.time()
for i in range(50):  #Repeat 50 times to get a better time value
    raptor['buffer_dist4']=np.vectorize(calc_raptor_buffer)(raptor['recentspec'], raptor['recentstat'])
te = time.time()

print("Elapsed time: {:5.5f}".format(te-ts))

raptor.head()

Elapsed time: 0.02567


| | postgis_fi | lat_y_dd | long_x_dd | lastsurvey | recentspec | recentstat | Nest_ID | geometry | buffer_dist | buffer_dist2 | buffer_dist3 | buffer_dist4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 361.0 | 40.267502 | -104.870872 | 2012-03-16 | Swainsons Hawk | INACTIVE NEST | 361 | POINT (-104.79595 40.29891) | 333 | 333 | 333 | 333 |
| 1 | 362.0 | 40.264321 | -104.860255 | 2012-03-16 | Swainsons Hawk | INACTIVE NEST | 362 | POINT (-104.78897 40.22089) | 333 | 333 | 333 | 333 |
| 2 | 1.0 | 38.650081 | -105.494251 | 2014-07-28 | Swainsons Hawk | INACTIVE NEST | 1 | POINT (-105.50223 38.68694) | 333 | 333 | 333 | 333 |
| 3 | 2.0 | 40.309574 | -104.932604 | 2011-01-06 | Swainsons Hawk | INACTIVE NEST | 2 | POINT (-104.84889 40.35215) | 333 | 333 | 333 | 333 |
| 4 | 3.0 | 40.219343 | -104.729246 | 2014-07-03 | Swainsons Hawk | ACTIVE NEST | 3 | POINT (-104.74466 40.18571) | 500 | 500 | 500 | 500 |
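If even more speed were needed, the if/elif logic could in principle be expressed with whole-column comparisons instead of a per-element function call. The sketch below uses `np.select`, which is a different technique from the np.vectorize approach shown above, and runs on a made-up dataframe (`toy` is hypothetical). It has not been timed against the raptor data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the raptor nests table.
toy = pd.DataFrame({
    "recentspec": ["Swainsons Hawk", "Swainsons Hawk", "Red-tail Hawk",
                   "Red-tail Hawk", "Northern Harrier"],
    "recentstat": ["ACTIVE NEST", "INACTIVE NEST", "ACTIVE NEST",
                   "INACTIVE NEST", "ACTIVE NEST"],
})

# Each comparison is evaluated on the whole column at once and yields
# a boolean Series.
is_swainsons = toy["recentspec"] == "Swainsons Hawk"
is_redtail = toy["recentspec"] == "Red-tail Hawk"
is_active = toy["recentstat"] == "ACTIVE NEST"

# np.select picks, for each row, the choice that belongs to the first
# condition that is True, and falls back to `default` otherwise.
conditions = [is_swainsons & is_active, is_swainsons,
              is_redtail & is_active, is_redtail]
choices = [500, 333, 800, 667]
toy["buffer_dist"] = np.select(conditions, choices, default=500)
print(toy)
```

The order of the conditions matters here: the more specific "active" cases come before the general species cases, mirroring the nesting of the original if statements.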
