In the earlier chapter we dealt with Series and DataFrame creation and access techniques.<br/>
In this chapter we will discuss pandas functions.

# Agenda: 
* Introduction to pandas functions
* Examples
* Understanding functions using the Titanic dataset

# Introduction:
   * map
   * apply
   * applymap
   * groupby
   * rolling

The DataFrame below is used in the examples that follow.

In [2]:
import pandas as pd
import numpy as np
np.random.seed(1)
marks = pd.DataFrame(np.random.randint(10,20,size=(5,5)), 
                     columns=['maths','science','hindi','english','sst'], 
                     index=['a','b','c','d','e'])
marks

Unnamed: 0,maths,science,hindi,english,sst
a,15,18,19,15,10
b,10,11,17,16,19
c,12,14,15,12,14
d,12,14,17,17,19
e,11,17,10,16,19


## pd.Series.map():
* Parameters: function, dict, or Series
* Result: Series.

pd.Series.map() maps the values of a Series using an input correspondence (which can be a function, dict, or Series).<br/>
The example below shows how to pass a function as the argument to pd.Series.map().

**Example: map using function name as argument**

In [3]:
def fuction_addnum(x):
    return x+5
marks.english = marks.english.map(fuction_addnum)
marks

Unnamed: 0,maths,science,hindi,english,sst
a,15,18,19,20,10
b,10,11,17,21,19
c,12,14,15,17,14
d,12,14,17,22,19
e,11,17,10,21,19


In this example, fuction_addnum is the name of the function that we defined.<br/>
The next example shows how to use map with a lambda.

**Example: map using lambda function as argument**

In [4]:
marks.hindi = marks.hindi.map(lambda x: x+10)
marks

Unnamed: 0,maths,science,hindi,english,sst
a,15,18,29,20,10
b,10,11,27,21,19
c,12,14,25,17,14
d,12,14,27,22,19
e,11,17,20,21,19
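
The parameter list above also names dict and Series as valid inputs, while the examples so far only pass functions. The following is a minimal sketch of those two forms; the `grades` Series and the point values are made up for illustration and are not part of the marks data.

```python
import pandas as pd

# Hypothetical data, used only for this illustration
grades = pd.Series(['A', 'B', 'C', 'A'], index=['a', 'b', 'c', 'd'])

# dict: each value is looked up as a key; values without a key become NaN
points_from_dict = grades.map({'A': 10, 'B': 8})

# Series: each value is looked up in the index of the mapping Series
lookup = pd.Series({'A': 10, 'B': 8, 'C': 6})
points_from_series = grades.map(lookup)

print(points_from_dict)
print(points_from_series)
```

Note that the dict version leaves 'C' unmapped, so that entry turns into NaN and the result becomes a float Series.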


Missing values handling:

The data used in the examples above is complete, so we did not face any problems.<br/>
But what if the data contains missing values (NaN)?<br/>
To deal with such incomplete data, map provides the na_action parameter.<br/>
With na_action='ignore', map propagates NaN values without passing them to the mapping function.

See the example below for a better understanding.

**Example: missing values handling using na_action**

Below, I add a new entry that has missing values to the DataFrame 'marks'.

In [5]:
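new_entry = pd.DataFrame({'maths':13,'science':np.nan,'hindi':np.nan,'english':32,'sst':15},index=['f'])
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
# The result is only displayed, not assigned, so 'marks' itself is unchanged.
pd.concat([marks, new_entry])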

Unnamed: 0,english,hindi,maths,science,sst
a,20,29.0,15,18.0,10
b,21,27.0,10,11.0,19
c,17,25.0,12,14.0,14
d,22,27.0,12,14.0,19
e,21,20.0,11,17.0,19
f,32,,13,,15


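The appended frame above contains missing values. Note that the result was not assigned back, so 'marks' itself still holds only the five complete rows.<br/>
The code below passes na_action='ignore'; any NaN in the Series would be propagated as-is instead of being passed to the lambda.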

In [6]:
marks.hindi = marks.hindi.map(lambda x: x+10,na_action='ignore')
marks

Unnamed: 0,maths,science,hindi,english,sst
a,15,18,39,20,10
b,10,11,37,21,19
c,12,14,35,17,14
d,12,14,37,22,19
e,11,17,30,21,19
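
Because 'marks' has no missing values at this point, the output above cannot show na_action at work. The sketch below uses a small stand-alone Series (hypothetical values, with a row 'f' that is NaN) and a helper `add_ten` that deliberately raises if it ever receives a missing value, to make the skipping visible.

```python
import numpy as np
import pandas as pd

# Stand-alone Series with one missing value (illustrative data)
hindi = pd.Series([29.0, 27.0, np.nan], index=['a', 'b', 'f'])

def add_ten(x):
    # Raises if map ever hands us a missing value
    if pd.isna(x):
        raise ValueError('received a missing value')
    return x + 10

# na_action='ignore' leaves NaN in place and never calls add_ten on it
result = hindi.map(add_ten, na_action='ignore')
print(result)
```

Without na_action='ignore', the same call would pass NaN to `add_ten` and the ValueError would be raised.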


## apply:

Parameters: function.<br/> 
Result: Modified Series/DataFrame.

Note that apply can be used on both Series and DataFrame objects.<br/>
First, see how it works with Series data.

**apply on series:**

In [7]:
marks.maths

a    15
b    10
c    12
d    12
e    11
Name: maths, dtype: int32

The output above is the data in the Series 'maths'.<br/>
In the next example, I increase every value in the Series 'maths' by 5 using the apply method.

**Example: apply function using function name as argument**

In [8]:
marks.maths = marks.maths.apply(func=fuction_addnum)
marks.maths

a    20
b    15
c    17
d    17
e    16
Name: maths, dtype: int64

Let's see another example, where the Series contains dicts as data.

**Example: apply function on a Series holding dictionary data**

In [9]:
df = pd.DataFrame({'a':[{'c':10,'d':20}, {'c':33,'d':44}]})
df

Unnamed: 0,a
0,"{'c': 10, 'd': 20}"
1,"{'c': 33, 'd': 44}"


The DataFrame df contains the dict elements {'c': 10, 'd': 20} and {'c': 33, 'd': 44} in column 'a'.<br/>
Now we add two more columns to df using the data stored in those dicts.

In [10]:
df['c'] = df.a.apply(lambda x:x['c'])
df['d'] = df.a.apply(lambda x:x['d'])
df

Unnamed: 0,a,c,d
0,"{'c': 10, 'd': 20}",10,20
1,"{'c': 33, 'd': 44}",33,44


Let's go through the above example carefully.<br/>
The code takes the value mapped to key 'c' in each dict in column 'a' and creates a new column named 'c' from them; the same is done for key 'd'.<br/>
In the next example, we create new columns from list-of-lists data.

**Example: apply method on a Series containing list-of-lists data**

In [11]:
def f(c):
    return np.max(c)
df = pd.DataFrame({'a':[[12,3,6],[5,6],[8,9]], 'b':[[6,16],[18,8],[29,9]]})
df['new_a'] = df.a.apply(f)
df['new_b'] = df.b.apply(f)
df

Unnamed: 0,a,b,new_a,new_b
0,"[12, 3, 6]","[6, 16]",12,16
1,"[5, 6]","[18, 8]",6,18
2,"[8, 9]","[29, 9]",9,29


**apply on dataframe:**

* Parameters: function.
* Result: Series or DataFrame<br/>

apply on a DataFrame applies the function along the given axis of the DataFrame.<br/>
The objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or its columns (axis=1).<br/>
The example below shows how to use the apply function on a DataFrame.

**Example: apply function on dataframe when function name passed as argument**

In [12]:
marks.apply(func=fuction_addnum)

Unnamed: 0,maths,science,hindi,english,sst
a,25,23,44,25,15
b,20,16,42,26,24
c,22,19,40,22,19
d,22,19,42,27,24
e,21,22,35,26,24


We can see that the above code added 5 to all elements, working column by column along axis=0 (the default axis).<br/> 
The example below finds the average of each column (axis=0) using np.mean(series).

**Example: apply on dataframe object when lambda function passed as argument**

In [13]:
marks.apply(lambda x: np.mean(x))

maths      17.0
science    14.8
hindi      35.6
english    20.2
sst        16.2
dtype: float64

The example below finds the mean of each row (axis=1) using np.mean.

**Example: Passing axis information to apply method**

In [14]:
marks.apply(lambda x: np.mean(x), axis=1)

a    21.4
b    20.6
c    19.4
d    21.8
e    20.6
dtype: float64
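
To make the axis argument more concrete, here is a small sketch in which the function inspects the labels of the Series it receives. The `scores` frame and the helper `best_subject` are hypothetical and only loosely modelled on the marks data.

```python
import pandas as pd

# Small illustrative frame (not the 'marks' data itself)
scores = pd.DataFrame({'maths': [15, 10], 'science': [18, 11], 'hindi': [19, 17]},
                      index=['a', 'b'])

def best_subject(row):
    # With axis=1, 'row' is a Series indexed by the column names
    return row.idxmax()

# One result per row: the subject with the highest mark
best = scores.apply(best_subject, axis=1)

# With the default axis=0, each 'col' is a Series indexed by the row labels
spread = scores.apply(lambda col: col.max() - col.min())

print(best)
print(spread)
```

With axis=1 the result has one entry per row label; with axis=0 it has one entry per column name.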

## pd.DataFrame.applymap():
* Parameters: function
* Result: DataFrame

Applies a function to a DataFrame elementwise, i.e. like doing map(func, series) for each Series in the DataFrame.

**Example: applymap method on dataframe**

In [15]:
df = pd.DataFrame(np.random.randint(1,10,size=(5,5)))
df

Unnamed: 0,0,1,2,3,4
0,8,7,2,1,2
1,9,9,4,9,8
2,4,7,6,2,4
3,5,9,2,5,1
4,4,3,1,5,3


The code above creates a DataFrame with some random data to help understand applymap.<br/>
In the example below, we set an element to 0 if its value is less than 5.

In [16]:
df.applymap(lambda x: 0 if x < 5 else x)

Unnamed: 0,0,1,2,3,4
0,8,7,0,0,0
1,9,9,0,9,8
2,0,7,6,0,0
3,5,9,0,5,0
4,0,0,0,5,0


The above example took every element of the DataFrame and applied the function to each one individually (irrespective of axis).
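
One version note: in pandas 2.1 and later, applymap is deprecated in favour of the equivalent DataFrame.map. The sketch below picks whichever method is available, so it should run on both older and newer installations; the small frame and the name `elementwise` are just illustrative choices.

```python
import pandas as pd

df_small = pd.DataFrame({'x': [1, 6], 'y': [7, 3]})

# DataFrame.map exists from pandas 2.1; fall back to applymap on older versions
elementwise = df_small.map if hasattr(df_small, 'map') else df_small.applymap

# Same operation as in the example above: values below 5 become 0
result = elementwise(lambda v: 0 if v < 5 else v)
print(result)
```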

## groupby:

* Parameters: mapping, function, str, or iterable
* Result: GroupBy object

By “group by” we are referring to a process involving one or more of the following steps:<br>

* **Splitting:** the data into groups based on some criteria<br/>
* **Applying:** a function to each group independently<br/>
* **Combining:** the results into a data structure<br/>

Of these, the split step is the most straightforward. In fact, in many situations you may wish to split the data set into groups and do something with those groups yourself. In the apply step, we might wish to do one of the following:

* **Aggregation:** computing a summary statistic (or statistics) about each group. Some examples:
   * Compute group sums or means
   * Compute group sizes / counts
* **Transformation:** perform some group-specific computations and return a like-indexed object. Some examples:
   * Standardizing data (zscore) within group
   * Filling NAs within groups with a value derived from each group
* **Filtration:** discard some groups, according to a group-wise computation that evaluates True or False. Some examples:
   * Discarding data that belongs to groups with only a few members
   * Filtering out data based on the group sum or mean
* Some combination of the above: GroupBy will examine the results of the apply step and try to return a sensibly combined result if it doesn’t fit into either of the above two categories.<br/>

For more info check this link:<a href='https://pandas.pydata.org/pandas-docs/stable/groupby.html'>Group By: split-apply-combine</a>
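
To see what the three steps mean in practice, the sketch below performs split, apply, and combine by hand on a tiny made-up frame and compares the outcome with the built-in groupby. The names `pieces`, `sums`, and `manual` are hypothetical helpers for this illustration only.

```python
import pandas as pd

df_demo = pd.DataFrame({'Year': [2014, 2017, 2014, 2017],
                        'Raid_Points': [300, 368, 45, 258]})

# Split: one sub-frame per distinct Year
pieces = {year: df_demo[df_demo['Year'] == year]
          for year in sorted(df_demo['Year'].unique())}

# Apply: compute a summary for each piece independently
sums = {year: part['Raid_Points'].sum() for year, part in pieces.items()}

# Combine: put the per-group results into one Series
manual = pd.Series(sums, name='Raid_Points')
manual.index.name = 'Year'

# The built-in version does all three steps in one expression
builtin = df_demo.groupby('Year')['Raid_Points'].sum()

print(manual)
print(builtin)
```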

### Splitting:

In this section, we look at examples of grouping data and accessing data from the group object, using the DataFrame below (example_df).

In [33]:
example_df = pd.DataFrame({'Player':['Rahul','Pradeep','Ajay','Sandeep','Nithin','Anup','Surendar','Manjith'],
                                   'Raid_Points':[300,368,258,200,45,200,45,60],
                                   'Tackle_Points':[45,1,20,50,30,25,100,40],
                                   'Profile':['All Rounder','Raider','Raider','Defender','Raider','Raider','Defender','Defender'],
                                   'Year':[2014,2017,2017,2015,2014,2015,2017,2014]})
example_df

Unnamed: 0,Player,Profile,Raid_Points,Tackle_Points,Year
0,Rahul,All Rounder,300,45,2014
1,Pradeep,Raider,368,1,2017
2,Ajay,Raider,258,20,2017
3,Sandeep,Defender,200,50,2015
4,Nithin,Raider,45,30,2014
5,Anup,Raider,200,25,2015
6,Surendar,Defender,45,100,2017
7,Manjith,Defender,60,40,2014


**Example: grouping data by a column**

In [18]:
example_df.groupby('Year')

<pandas.core.groupby.DataFrameGroupBy object at 0x000001DA30E67B38>

Note: the groupby function returns a DataFrameGroupBy object, not a DataFrame.<br/>
See the examples below to learn how to access elements from a DataFrameGroupBy object.

**Example: accessing groups from DataFrameGroupBy object**

In [21]:
grb_year = example_df.groupby('Year')
grb_year.groups

{2014: Int64Index([0, 4, 7], dtype='int64'),
 2015: Int64Index([3, 5], dtype='int64'),
 2017: Int64Index([1, 2, 6], dtype='int64')}

**Example: accessing specified group using get_group method**

In [317]:
grb_year.get_group(2014)

Unnamed: 0,Player,Profile,Raid_Points,Tackle_Points,Year
0,Rahul,All Rounder,300,45,2014
4,Nithin,Raider,45,30,2014
7,Manjith,Defender,60,40,2014


**Example: printing data in group object using for loop**

In [22]:
for name,group in grb_year:
    print(name)
    print(group)

2014
    Player      Profile  Raid_Points  Tackle_Points  Year
0    Rahul  All Rounder          300             45  2014
4   Nithin       Raider           45             30  2014
7  Manjith     Defender           60             40  2014
2015
    Player   Profile  Raid_Points  Tackle_Points  Year
3  Sandeep  Defender          200             50  2015
5     Anup    Raider          200             25  2015
2017
     Player   Profile  Raid_Points  Tackle_Points  Year
1   Pradeep    Raider          368              1  2017
2      Ajay    Raider          258             20  2017
6  Surendar  Defender           45            100  2017


**Example: finding summary of group object**

This example gives you an idea of how to describe data based on year.

In [23]:
example_df.groupby('Year').describe()

Unnamed: 0_level_0,Raid_Points,Raid_Points,Raid_Points,Raid_Points,Raid_Points,Raid_Points,Raid_Points,Raid_Points,Tackle_Points,Tackle_Points,Tackle_Points,Tackle_Points,Tackle_Points,Tackle_Points,Tackle_Points,Tackle_Points
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
Year,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2
2014,3.0,135.0,143.09088,45.0,52.5,60.0,180.0,300.0,3.0,38.333333,7.637626,30.0,35.0,40.0,42.5,45.0
2015,2.0,200.0,0.0,200.0,200.0,200.0,200.0,200.0,2.0,37.5,17.67767,25.0,31.25,37.5,43.75,50.0
2017,3.0,223.666667,164.214291,45.0,151.5,258.0,313.0,368.0,3.0,40.333333,52.538874,1.0,10.5,20.0,60.0,100.0


**Example: applying the sum function on grouped data**

In [24]:
example_df.groupby('Year')['Raid_Points'].sum()

Year
2014    405
2015    400
2017    671
Name: Raid_Points, dtype: int64

**Example: grouping data using lists of columns**

This example shows you how to group data by multiple columns.

In [26]:
grb_profile_year = example_df.groupby(['Profile','Year'])
grb_profile_year.groups

{('All Rounder', 2014): Int64Index([0], dtype='int64'),
 ('Defender', 2014): Int64Index([7], dtype='int64'),
 ('Defender', 2015): Int64Index([3], dtype='int64'),
 ('Defender', 2017): Int64Index([6], dtype='int64'),
 ('Raider', 2014): Int64Index([4], dtype='int64'),
 ('Raider', 2015): Int64Index([5], dtype='int64'),
 ('Raider', 2017): Int64Index([1, 2], dtype='int64')}

**Example: accessing a particular group**

A group is accessed by passing its key, a tuple here (one of the keys shown in the group object above), to get_group.

In [27]:
grb_profile_year.get_group(('Raider', 2017))

Unnamed: 0,Player,Profile,Raid_Points,Tackle_Points,Year
1,Pradeep,Raider,368,1,2017
2,Ajay,Raider,258,20,2017


### Aggregation:

**Example: agg method on group object**

This example illustrates how to perform aggregate operation on group object.

In [28]:
example_df.groupby('Year').agg('sum')

Unnamed: 0_level_0,Raid_Points,Tackle_Points
Year,Unnamed: 1_level_1,Unnamed: 2_level_1
2014,405,115
2015,400,75
2017,671,121


Note: we can also pass np.sum to agg instead of 'sum'.

**Example: applying list of aggregate functions on group**

This example shows you how to perform multiple agg operations on grouped object.

In [29]:
example_df.groupby('Year').agg(['sum','mean','std'])

Unnamed: 0_level_0,Raid_Points,Raid_Points,Raid_Points,Tackle_Points,Tackle_Points,Tackle_Points
Unnamed: 0_level_1,sum,mean,std,sum,mean,std
Year,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
2014,405,135.0,143.09088,115,38.333333,7.637626
2015,400,200.0,0.0,75,37.5,17.67767
2017,671,223.666667,164.214291,121,40.333333,52.538874
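
Passing a list applies every function to every column. When different columns need different functions, named aggregation (available in pandas 0.25 and later) is one option. The following sketch uses a cut-down, made-up version of the data; the output column names `total_raid` and `mean_tackle` are arbitrary choices.

```python
import pandas as pd

df_demo = pd.DataFrame({'Year': [2014, 2014, 2015],
                        'Raid_Points': [300, 45, 200],
                        'Tackle_Points': [45, 30, 50]})

# Each keyword is an output column: (input column, aggregation function)
summary = df_demo.groupby('Year').agg(
    total_raid=('Raid_Points', 'sum'),
    mean_tackle=('Tackle_Points', 'mean'),
)
print(summary)
```

The result has flat column names instead of the two-level header produced by the list form.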


### Transformation:

**Example: transform on grouped object**

In [30]:
example_df.groupby('Year').transform(lambda x: (x - x.mean()) / x.std()*10)

Unnamed: 0,Raid_Points,Tackle_Points
0,11.531133,8.728716
1,8.789328,-7.48652
2,2.090764,-3.87015
3,,7.071068
4,-6.289709,-10.910895
5,,-7.071068
6,-10.880092,11.35667
7,-5.241424,2.182179
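
Two things in the output above are worth spelling out: the result keeps the original row index (it is "like-indexed"), and Raid_Points is NaN for rows 3 and 5. Both 2015 players have 200 raid points, so the group's standard deviation is 0 and the expression becomes 0/0. The sketch below reproduces this on the Raid_Points values for 2014 and 2015 only; the helper name `zscore` is an assumption for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({'Year': [2014, 2014, 2014, 2015, 2015],
                        'Raid_Points': [300, 45, 60, 200, 200]})

def zscore(x):
    # x holds the values of one group; std() is 0 when all values are equal
    return (x - x.mean()) / x.std()

# transform returns one value per original row, aligned with df_demo.index
z = df_demo.groupby('Year')['Raid_Points'].transform(zscore)
print(z)
```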


### Filtration:

**Example: filtering data in grouped object on given condition**

In [57]:
example_df.groupby('Year').filter(lambda x: x['Raid_Points'].sum() <= 500)

Unnamed: 0,Player,Profile,Raid_Points,Tackle_Points,Year
0,Rahul,All Rounder,300,45,2014
3,Sandeep,Defender,200,50,2015
4,Nithin,Raider,45,30,2014
5,Anup,Raider,200,25,2015
7,Manjith,Defender,60,40,2014
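
The other filtration case mentioned earlier, discarding groups with only a few members, follows the same pattern. Here is a minimal sketch using only the Year column from example_df; the threshold of 3 rows is an arbitrary choice for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({'Year': [2014, 2017, 2017, 2015, 2014, 2015, 2017, 2014]})

# Keep only the rows whose group (year) has at least 3 members
kept = df_demo.groupby('Year').filter(lambda g: len(g) >= 3)
print(kept)
```

The function is called once per group and must return a single True or False; rows of the surviving groups come back in their original order.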


## pd.DataFrame.rolling():
* Parameters: window: int, or offset()
* Result: a Window or Rolling sub-classed for the particular operation

Provides rolling window calculations.

See the example below to understand rolling.

**Example: rolling on dataframe**

In [31]:
marks.rolling(2).sum()

Unnamed: 0,maths,science,hindi,english,sst
a,,,,,
b,35.0,29.0,76.0,41.0,29.0
c,32.0,25.0,72.0,38.0,33.0
d,34.0,28.0,72.0,39.0,33.0
e,33.0,31.0,67.0,43.0,38.0


Let's understand the above rolling method.<br/>
Think of rolling as a pointer placed at row 1 when execution begins.<br/>
Since we have given window size 2, it looks for 2 values, but at this point it only knows one value.<br/>
So it cannot perform the job we specified, i.e. sum, and sets all elements in row 1 to NaN.<br/>
It then moves down the table and reaches row 2.<br/>
From row 2 onwards it knows the values of the current row as well as the previous row. Using<br/>
that data, it computes the sum and returns the rolling sum for window 2.
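
The walk-through above can be checked with a short sketch: for a window of 2, the rolling sum is the current value plus the previous one, which can be written with shift. The values are those of the 'maths' column as it stands at this point in the chapter.

```python
import pandas as pd

maths = pd.Series([20, 15, 17, 17, 16], index=['a', 'b', 'c', 'd', 'e'])

# Rolling sum over a window of 2 rows
rolled = maths.rolling(2).sum()

# The same thing by hand: current row plus previous row
# (shift(1) has no previous value for the first row, hence NaN there)
by_hand = maths + maths.shift(1)

print(rolled)
print(by_hand)
```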

To eliminate NaN values in the result, use the min_periods argument with an appropriate value.<br/>
See the example below.

**Example: rolling method with argument min_periods**

In [32]:
marks.rolling(2,min_periods=1).sum()

Unnamed: 0,maths,science,hindi,english,sst
a,20.0,18.0,39.0,20.0,10.0
b,35.0,29.0,76.0,41.0,29.0
c,32.0,25.0,72.0,38.0,33.0
d,34.0,28.0,72.0,39.0,33.0
e,33.0,31.0,67.0,43.0,38.0


Rolling is a very useful method for analyzing time series data.
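
As a small illustration of the time series case, the window can also be given as an offset such as '2D' when the index holds dates. The sketch below uses made-up daily values; the name `sales` is hypothetical.

```python
import pandas as pd

# Five consecutive days of illustrative values
idx = pd.date_range('2024-01-01', periods=5, freq='D')
sales = pd.Series([10, 20, 30, 40, 50], index=idx)

# '2D' means: all observations in the two days up to and including each timestamp
two_day = sales.rolling('2D').sum()
print(two_day)
```

With an offset window, min_periods defaults to 1, so the first row is a number rather than NaN.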

Check this link <a href='http://localhost:8888/notebooks/myexp/data/Pandas/Understanding%20pandas%20functions%20using%20Titanic%20Dataset.ipynb#apply:'>Understanding pandas functions using Titanic Dataset</a> for more examples.

link for apply vs transform:<a href='https://stackoverflow.com/questions/27517425/apply-vs-transform-on-a-group-object'>apply vs transform</a>
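
As a quick, simplified illustration of the difference on a single column (made-up data): with a reducing function, apply gives one value per group, while transform broadcasts the group result back to every original row.

```python
import pandas as pd

df_demo = pd.DataFrame({'Year': [2014, 2014, 2015],
                        'Raid_Points': [300, 45, 200]})
grouped = df_demo.groupby('Year')['Raid_Points']

# apply with a reducing function: result is indexed by group key
per_group = grouped.apply(lambda s: s.sum())

# transform: result is indexed like the original rows
per_row = grouped.transform('sum')

print(per_group)
print(per_row)
```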

# Summary:
<......>

Next: Pandas merge, joins, filter, concatenation