# Agenda: 
* Introduction to Pandas Functions.
* Pandas Series Functions with Examples.
* Pandas DataFrame Functions with Examples.

## Introduction:
Pandas functions can be categorized into two groups.
* Pandas Series Functions.
   * map
   * apply
   * groupby
* Pandas DataFrame Functions.
   * apply
   * applymap
   * groupby
   * rolling

* We will use the DataFrame below to illustrate the topics listed in the agenda.

In [29]:
import pandas as pd
import numpy as np

In [91]:
np.random.seed(1)
marks = pd.DataFrame(np.random.randint(10,20,size=(5,5)), 
                     columns=['maths','science','hindi','english','sst'], 
                     index=['a','b','c','d','e'])

In [92]:
marks

Unnamed: 0,maths,science,hindi,english,sst
a,15,18,19,15,10
b,10,11,17,16,19
c,12,14,15,12,14
d,12,14,17,17,19
e,11,17,10,16,19


## Pandas Series Functions:

### map:
* The pandas map function maps the values of a Series according to an input correspondence (which can be a function, dict, or Series).
* It takes two parameters:<br/>
   1. arg: function, dict, or Series<br/>
   2. na_action: {None, 'ignore'}<br/>
     If na_action is set to 'ignore', NaN values are propagated without being passed to the mapping function.<br/>
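
Besides a function, the correspondence can be a dict or another Series. The sketch below uses a hypothetical `grades` Series and made-up lookup tables (none of these names come from the marks data) to show both forms; values with no matching key come back as NaN.

```python
import pandas as pd

# Hypothetical grades, used only for illustration
grades = pd.Series(['A', 'B', 'C', 'A'], index=['a', 'b', 'c', 'd'])

# dict as the correspondence: keys missing from the dict map to NaN
points_by_grade = {'A': 10, 'B': 8}
points_from_dict = grades.map(points_by_grade)

# Series as the correspondence: its index plays the role of the dict keys
lookup = pd.Series({'A': 'excellent', 'B': 'good', 'C': 'fair'})
remarks = grades.map(lookup)

print(points_from_dict)
print(remarks)
```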

* The example below shows how to pass a function as the argument to pd.Series.map().
* First define a function that performs the task, then pass that function to map.
* In this example, I want to add 5 extra marks to each student's english score.<br/> So I have written the function below, which adds 5 to the given element.

In [112]:
def fuction_addnum(x):
    return x+5

* In the next step, I pass the function above as the argument to map on the Series marks.english.

In [93]:
marks.english = marks.english.map(fuction_addnum)

In [94]:
marks

Unnamed: 0,maths,science,hindi,english,sst
a,15,18,19,20,10
b,10,11,17,21,19
c,12,14,15,17,14
d,12,14,17,22,19
e,11,17,10,21,19


* Now we can see that each student's english marks increased by 5.
* Let's see another map example, this time using a lambda function.

In [95]:
marks.hindi = marks.hindi.map(lambda x: x+10)

In [96]:
marks

Unnamed: 0,maths,science,hindi,english,sst
a,15,18,29,20,10
b,10,11,27,21,19
c,12,14,25,17,14
d,12,14,27,22,19
e,11,17,20,21,19


 * As in the example above, each student's hindi marks increased by 10.
 * So far we have seen how to pass the arg parameter. The next example shows what na_action does when it is set to 'ignore'.
 * To set up that example, I am going to add some NaN data to the marks DataFrame above.

In [116]:
new_entry = pd.DataFrame({'maths':13,'science':np.nan,'hindi':np.nan,'english':32,'sst':15},index=['f'])

In [117]:
new_entry

Unnamed: 0,english,hindi,maths,science,sst
f,32,,13,,15


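* I am going to add this new entry to the marks DataFrame above. Note that the result is not assigned back, so marks itself is unchanged.

In [118]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
pd.concat([marks, new_entry])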

Unnamed: 0,english,hindi,maths,science,sst
a,20,69.0,15,18.0,10
b,21,67.0,10,11.0,19
c,17,65.0,12,14.0,14
d,22,67.0,12,14.0,19
e,21,60.0,11,17.0,19
f,32,,13,,15


* Once the data contains NaN values, call map on the Series and pass both the function and na_action='ignore'.
* With na_action set to 'ignore', map skips the NaN entries (they stay NaN) and applies the function to the remaining values.

In [100]:
marks.hindi = marks.hindi.map(lambda x: x+10,na_action='ignore')

In [102]:
marks

Unnamed: 0,maths,science,hindi,english,sst
a,15,18,49,20,10
b,10,11,47,21,19
c,12,14,45,17,14
d,12,14,47,22,19
e,11,17,40,21,19
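
The effect of na_action is easiest to see on a Series that really contains NaN and a function that cannot handle it. This is a minimal sketch with a hypothetical `names` Series (not part of the marks data), where the mapped function calls a string method.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a missing entry; dtype=object keeps NaN as a float
names = pd.Series(['maths', np.nan, 'hindi'], dtype=object)

# With na_action='ignore' the NaN entry is never passed to the lambda
upper = names.map(lambda x: x.upper(), na_action='ignore')

# Without it, the lambda receives the float NaN, which has no .upper()
try:
    names.map(lambda x: x.upper())
    failed = False
except AttributeError:
    failed = True

print(upper)
print('plain map failed on NaN:', failed)
```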


### apply:

* The pd.Series.apply() method applies a function to each element in the Series.
* It takes 3 parameters:<br/>
  func: function<br/>
  convert_dtype: boolean, default True<br/>
  Try to find a better dtype for the elementwise function results. If False, leave as dtype=object.<br/>
  args: tuple<br/>
  Positional arguments to pass to the function in addition to the value.<br/>
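
The args parameter is what distinguishes apply from a plain map call. As a rough sketch, the hypothetical helper `add_bonus` below takes two extra positional arguments, which apply forwards after the element itself.

```python
import pandas as pd

def add_bonus(mark, bonus, cap):
    """Add `bonus` to a single mark, never exceeding `cap` (hypothetical helper)."""
    return min(mark + bonus, cap)

maths = pd.Series([15, 10, 12], index=['a', 'b', 'c'])

# Each element is passed as the first argument; args supplies the rest
boosted = maths.apply(add_bonus, args=(5, 18))
print(boosted)
```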

In [120]:
marks.maths = marks.maths.apply(func=fuction_addnum)

In [122]:
marks

Unnamed: 0,maths,science,hindi,english,sst
a,20,18,69,20,10
b,15,11,67,21,19
c,17,14,65,17,14
d,17,14,67,22,19
e,16,17,60,21,19


Let's see an example where the Series contains dict elements.

In [143]:
df = pd.DataFrame({'a':[{'c':10,'d':20}, {'c':33,'d':44}]})

In [144]:
df

Unnamed: 0,a
0,"{'c': 10, 'd': 20}"
1,"{'c': 33, 'd': 44}"


The DataFrame df contains a column 'a' that holds the dict elements {'c': 10, 'd': 20} and {'c': 33, 'd': 44}.<br/>
Now we are going to add two more columns to df, built from the data in column 'a', using the apply function.

In [147]:
df['c'] = df.a.apply(lambda x:x['c'])

In [148]:
df

Unnamed: 0,a,c
0,"{'c': 10, 'd': 20}",10
1,"{'c': 33, 'd': 44}",33


The code above took the value mapped to key 'c' in each dict in column a and created a new column named c.

In [150]:
df['d'] = df.a.apply(lambda x:x['d'])

The code above took the value mapped to key 'd' in each dict in column a and created a new column named d.<br/>
Check below how df looks after adding the new columns (c, d).

In [132]:
df

Unnamed: 0,a,c,d
0,"{'c': 10, 'd': 20}",10,20
1,"{'c': 33, 'd': 44}",33,44
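
When every dict shares the same keys, the per-key apply calls can be collapsed into one step. This is one possible alternative rather than the approach used above: it builds a DataFrame from the list of dicts and joins it back on the original index.

```python
import pandas as pd

df = pd.DataFrame({'a': [{'c': 10, 'd': 20}, {'c': 33, 'd': 44}]})

# Build one column per dict key, keeping the original row labels
expanded = pd.DataFrame(df['a'].tolist(), index=df.index)

# Attach the new columns next to the original column 'a'
result = df.join(expanded)
print(result)
```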


### groupby:
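
A minimal sketch of Series.groupby, assuming a small hypothetical Series of raid points and a list of profile labels of the same length. The labels act as the grouping keys, and sum() reduces each group to one value.

```python
import pandas as pd

# Hypothetical data for illustration
raid_points = pd.Series([300, 368, 258, 200],
                        index=['Rahul', 'Pradeep', 'Ajay', 'Sandeep'])
profiles = ['All Rounder', 'Raider', 'Raider', 'Defender']

# One group per distinct label; the result is indexed by the labels
totals = raid_points.groupby(profiles).sum()
print(totals)
```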

## Pandas DataFrame Functions:

### apply:
* Applies a function along the given axis of the DataFrame.
* The objects passed to the function are Series whose index is either the DataFrame's index (axis=0) or its columns (axis=1).
* The return type depends on whether the passed function aggregates, or on the reduce argument if the DataFrame is empty.

* Parameters: function.
* Result: Series or DataFrame

In [154]:
marks.apply(func=fuction_addnum)

Unnamed: 0,maths,science,hindi,english,sst
a,25,23,74,25,15
b,20,16,72,26,24
c,22,19,70,22,19
d,22,19,72,27,24
e,21,22,65,26,24


We can see that the code above added 5 to every Series (column) along axis=0.

The example below finds the mean of each column (axis=0) using a lambda function.

In [158]:
marks.apply(lambda x: np.mean(x))

maths      17.0
science    14.8
hindi      65.6
english    20.2
sst        16.2
dtype: float64

The example below finds the mean of each row (axis=1) using a lambda function.

In [284]:
marks.apply(lambda x: np.mean(x), axis=1)

a    27.4
b    26.6
c    25.4
d    27.8
e    26.6
dtype: float64
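
How the return type follows from the function can be sketched on a cut-down, three-row stand-in for the marks DataFrame (the values here are illustrative). A function that reduces each column or row to one value yields a Series, while a function that returns a same-length Series yields a DataFrame.

```python
import pandas as pd

marks_small = pd.DataFrame({'maths': [15, 10, 12], 'science': [18, 11, 14]},
                           index=['a', 'b', 'c'])

# Aggregating function, axis=0: one value per column -> Series
col_range = marks_small.apply(lambda col: col.max() - col.min())

# Aggregating function, axis=1: one value per row -> Series
row_range = marks_small.apply(lambda row: row.max() - row.min(), axis=1)

# Non-aggregating function: same-length output per column -> DataFrame
centered = marks_small.apply(lambda col: col - col.mean())

print(type(col_range).__name__, type(row_range).__name__, type(centered).__name__)
```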

The next example shows how to create new columns from list-of-lists data.

In [293]:
def f(c):
    return np.max(c)
df = pd.DataFrame({'a':[[12,3,6],[5,6],[8,9]], 'b':[[6,16],[18,8],[29,9]]})
df['new_a'] = df.a.apply(f)
df['new_b'] = df.b.apply(f)
df

Unnamed: 0,a,b,new_a,new_b
0,"[12, 3, 6]","[6, 16]",12,16
1,"[5, 6]","[18, 8]",6,18
2,"[8, 9]","[29, 9]",9,29


### applymap:
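* Applies a function to a DataFrame elementwise, i.e. like doing map(func, series) for each Series in the DataFrame.
* In pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map, which does the same thing.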
* Parameters: function
* Result: DataFrame

In [294]:
df = pd.DataFrame(np.random.randint(1,10,size=(5,5)))
df

Unnamed: 0,0,1,2,3,4
0,9,3,4,2,3
1,8,3,7,1,3
2,7,7,3,8,8
3,1,7,6,2,5
4,7,1,7,6,2


In the cell above we created a DataFrame with some random data. We will use it to understand applymap.<br/>

In the example below, we set an element's value to 0 if it is less than 5.

In [297]:
df.applymap(lambda x: 0 if x < 5 else x)

Unnamed: 0,0,1,2,3,4
0,9,0,0,0,0
1,8,0,7,0,0
2,7,7,0,8,8
3,0,7,6,0,5
4,7,0,7,6,0


The example above applied the function to every element of the DataFrame individually (irrespective of axis).
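
The following sketch repeats the same thresholding on a small hand-written DataFrame (the values are illustrative). It assumes only that DataFrame.map is present on pandas 2.1 or later and falls back to applymap otherwise; it also shows a vectorised equivalent using where, which avoids calling a Python function per element.

```python
import pandas as pd

df_small = pd.DataFrame({'x': [9, 3, 7], 'y': [1, 8, 4]})

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap
elementwise = df_small.map if hasattr(df_small, 'map') else df_small.applymap
clipped = elementwise(lambda v: 0 if v < 5 else v)

# Same result without a per-element Python call:
# keep values where the condition holds, replace the rest with 0
vectorised = df_small.where(df_small >= 5, 0)

print(clipped)
```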

### groupby:

* Groups rows using a mapper (dict or key function) or by a series of columns.
* Parameters: mapping, function, str, or iterable
* Result: GroupBy object
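
The dict and key-function forms of the mapper both operate on the index labels. The sketch below uses a small stand-in DataFrame and a hypothetical `section` mapping to illustrate each.

```python
import pandas as pd

marks_small = pd.DataFrame({'maths': [15, 10, 12, 12],
                            'science': [18, 11, 14, 14]},
                           index=['a', 'b', 'c', 'd'])

# dict mapper: index label -> group name (hypothetical sections)
section = {'a': 'sec1', 'b': 'sec1', 'c': 'sec2', 'd': 'sec2'}
by_dict = marks_small.groupby(section).sum()

# key function: called once per index label, its return value is the group key
by_func = marks_small.groupby(lambda label: 'early' if label < 'c' else 'late').sum()

print(by_dict)
print(by_func)
```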

Example:

In [308]:
groupby_example_df = pd.DataFrame({'Player':['Rahul','Pradeep','Ajay','Sandeep','Nithin','Anup','Surendar','Manjith'],
                                   'Raid_Points':[300,368,258,200,45,200,45,60],
                                   'Tackle_Points':[45,1,20,50,30,25,100,40],
                                   'Profile':['All Rounder','Raider','Raider','Defender','Raider','Raider','Defender','Defender'],
                                   'Year':[2014,2017,2017,2015,2014,2015,2017,2014]})
groupby_example_df

Unnamed: 0,Player,Profile,Raid_Points,Tackle_Points,Year
0,Rahul,All Rounder,300,45,2014
1,Pradeep,Raider,368,1,2017
2,Ajay,Raider,258,20,2017
3,Sandeep,Defender,200,50,2015
4,Nithin,Raider,45,30,2014
5,Anup,Raider,200,25,2015
6,Surendar,Defender,45,100,2017
7,Manjith,Defender,60,40,2014


The groupby function returns a DataFrameGroupBy object.

In [232]:
groupby_example_df.groupby('Year')

<pandas.core.groupby.DataFrameGroupBy object at 0x0000021E444A9B00>

In [233]:
grb_year = groupby_example_df.groupby('Year')
grb_year.groups

{2014: Int64Index([0, 4, 7], dtype='int64'),
 2015: Int64Index([3, 5], dtype='int64'),
 2017: Int64Index([1, 2, 6], dtype='int64')}

In [234]:
grb_year.get_group(2014)

Unnamed: 0,Player,Profile,Raid_Points,Tackle_Points,Year
0,Rahul,All Rounder,300,45,2014
4,Nithin,Raider,45,30,2014
7,Manjith,Defender,60,40,2014


In [236]:
for name,group in grb_year:
    print(name)
    print(group)

2014
    Player      Profile  Raid_Points  Tackle_Points  Year
0    Rahul  All Rounder          300             45  2014
4   Nithin       Raider           45             30  2014
7  Manjith     Defender           60             40  2014
2015
    Player   Profile  Raid_Points  Tackle_Points  Year
3  Sandeep  Defender          200             50  2015
5     Anup    Raider          200             25  2015
2017
     Player   Profile  Raid_Points  Tackle_Points  Year
1   Pradeep    Raider          368              1  2017
2      Ajay    Raider          258             20  2017
6  Surendar  Defender           45            100  2017


In [254]:
groupby_example_df.groupby('Year').describe()

Unnamed: 0_level_0,Raid_Points,Raid_Points,Raid_Points,Raid_Points,Raid_Points,Raid_Points,Raid_Points,Raid_Points,Tackle_Points,Tackle_Points,Tackle_Points,Tackle_Points,Tackle_Points,Tackle_Points,Tackle_Points,Tackle_Points
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
Year,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2
2014,3.0,135.0,143.09088,45.0,52.5,60.0,180.0,300.0,3.0,38.333333,7.637626,30.0,35.0,40.0,42.5,45.0
2015,2.0,200.0,0.0,200.0,200.0,200.0,200.0,200.0,2.0,37.5,17.67767,25.0,31.25,37.5,43.75,50.0
2017,3.0,223.666667,164.214291,45.0,151.5,258.0,313.0,368.0,3.0,40.333333,52.538874,1.0,10.5,20.0,60.0,100.0


In [255]:
groupby_example_df.groupby('Year')['Raid_Points'].sum()

Year
2014    405
2015    400
2017    671
Name: Raid_Points, dtype: int64

In [246]:
grb_profile_year = groupby_example_df.groupby(['Profile','Year'])
grb_profile_year.groups

{('All Rounder', 2014): Int64Index([0], dtype='int64'),
 ('Defender', 2014): Int64Index([7], dtype='int64'),
 ('Defender', 2015): Int64Index([3], dtype='int64'),
 ('Defender', 2017): Int64Index([6], dtype='int64'),
 ('Raider', 2014): Int64Index([4], dtype='int64'),
 ('Raider', 2015): Int64Index([5], dtype='int64'),
 ('Raider', 2017): Int64Index([1, 2], dtype='int64')}

In [253]:
grb_profile_year.get_group(('Raider', 2017))

Unnamed: 0,Player,Profile,Raid_Points,Tackle_Points,Year
1,Pradeep,Raider,368,1,2017
2,Ajay,Raider,258,20,2017


In [257]:
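# Select the numeric columns explicitly; pandas 2.0+ no longer drops string columns when summing
groupby_example_df.groupby('Year')[['Raid_Points','Tackle_Points']].agg('sum')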

Unnamed: 0_level_0,Raid_Points,Tackle_Points
Year,Unnamed: 1_level_1,Unnamed: 2_level_1
2014,405,115
2015,400,75
2017,671,121


In [262]:
# mean and std are only defined for the numeric columns
groupby_example_df.groupby('Year')[['Raid_Points','Tackle_Points']].agg(['sum','mean','std'])

Unnamed: 0_level_0,Raid_Points,Raid_Points,Raid_Points,Tackle_Points,Tackle_Points,Tackle_Points
Unnamed: 0_level_1,sum,mean,std,sum,mean,std
Year,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
2014,405,135.0,143.09088,115,38.333333,7.637626
2015,400,200.0,0.0,75,37.5,17.67767
2017,671,223.666667,164.214291,121,40.333333,52.538874
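
Different columns often need different aggregations. One way to express that is named aggregation, sketched here on the Year, Raid_Points and Tackle_Points columns from the example above; the output column names (`total_raid` and so on) are made up for this illustration.

```python
import pandas as pd

df_points = pd.DataFrame({
    'Year': [2014, 2017, 2017, 2015, 2014, 2015, 2017, 2014],
    'Raid_Points': [300, 368, 258, 200, 45, 200, 45, 60],
    'Tackle_Points': [45, 1, 20, 50, 30, 25, 100, 40],
})

# Each keyword is an output column: (input column, aggregation)
summary = df_points.groupby('Year').agg(
    total_raid=('Raid_Points', 'sum'),
    mean_tackle=('Tackle_Points', 'mean'),
    best_tackle=('Tackle_Points', 'max'),
)
print(summary)
```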


In [264]:
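# Standardise within each year; restricted to numeric columns so the arithmetic is valid
groupby_example_df.groupby('Year')[['Raid_Points','Tackle_Points']].transform(lambda x: (x - x.mean()) / x.std()*10)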

Unnamed: 0,Raid_Points,Tackle_Points
0,11.531133,8.728716
1,8.789328,-7.48652
2,2.090764,-3.87015
3,,7.071068
4,-6.289709,-10.910895
5,,-7.071068
6,-10.880092,11.35667
7,-5.241424,2.182179
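
The NaN entries in rows 3 and 5 come from the 2015 group, where both players have 200 raid points: the group's standard deviation is 0, so the standardisation divides 0 by 0. The sketch below reproduces this on a four-row subset and shows one possible guard (returning 0 for constant groups), which is an illustrative choice rather than the only sensible one.

```python
import pandas as pd

df_sub = pd.DataFrame({'Year': [2014, 2014, 2015, 2015],
                       'Raid_Points': [300, 45, 200, 200]})
grouped = df_sub.groupby('Year')['Raid_Points']

# Plain z-score: the constant 2015 group gives 0 / 0 -> NaN
z = grouped.transform(lambda x: (x - x.mean()) / x.std())

# Guarded version: constant groups are mapped to 0.0 instead
def safe_z(x):
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x * 0.0

z_safe = grouped.transform(safe_z)
print(z)
print(z_safe)
```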


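The same groupby operations are repeated below on the Titanic passenger data, held in a DataFrame named titanic_data. Its first five rows are:

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked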
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [195]:
titanic_data.groupby('Pclass')

<pandas.core.groupby.DataFrameGroupBy object at 0x0000021E44539128>

In [198]:
grb_pclass = titanic_data.groupby('Pclass')
grb_pclass.groups

{1: Int64Index([  1,   3,   6,  11,  23,  27,  30,  31,  34,  35,
             ...
             853, 856, 857, 862, 867, 871, 872, 879, 887, 889],
            dtype='int64', length=216),
 2: Int64Index([  9,  15,  17,  20,  21,  33,  41,  43,  53,  56,
             ...
             848, 854, 861, 864, 865, 866, 874, 880, 883, 886],
            dtype='int64', length=184),
 3: Int64Index([  0,   2,   4,   5,   7,   8,  10,  12,  13,  14,
             ...
             875, 876, 877, 878, 881, 882, 884, 885, 888, 890],
            dtype='int64', length=491)}

In [203]:
for name,group in grb_pclass:
    print(name)
    print(group)

1
     PassengerId  Survived  Pclass  \
1              2         1       1   
3              4         1       1   
6              7         0       1   
11            12         1       1   
23            24         1       1   
27            28         0       1   
30            31         0       1   
31            32         1       1   
34            35         0       1   
35            36         0       1   
52            53         1       1   
54            55         0       1   
55            56         1       1   
61            62         1       1   
62            63         0       1   
64            65         0       1   
83            84         0       1   
88            89         1       1   
92            93         0       1   
96            97         0       1   
97            98         1       1   
102          103         0       1   
110          111         0       1   
118          119         0       1   
124          125         0       1   
136       

In [194]:
grb_pclass.get_group(1).head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.0,0,0,113783,26.55,C103,S
23,24,1,1,"Sloper, Mr. William Thompson",male,28.0,0,0,113788,35.5,A6,S


### Rolling:
* Provides rolling window calculations.
* Parameters: window: int, or offset()
* Result: a Window or Rolling sub-classed for the particular operation

See the example below to understand rolling.

In [301]:
marks.rolling(2).sum()

Unnamed: 0,maths,science,hindi,english,sst
a,,,,,
b,35.0,29.0,136.0,41.0,29.0
c,32.0,25.0,132.0,38.0,33.0
d,34.0,28.0,132.0,39.0,33.0
e,33.0,31.0,127.0,43.0,38.0


Let's walk through the rolling call above.<br/>
Think of rolling as a pointer placed at row 1 when execution begins.<br/>
Since we specified a window size of 2, it looks for 2 values, but at this point it has only one.<br/>
So it cannot perform the requested operation (sum), and it sets all elements in row 1 to NaN.<br/>
It then moves down the table and reaches row 2.<br/>
From row 2 onwards it has both the current row's values and the previous row's, and using<br/>
that data it computes the sum and returns the rolling sum for window 2.
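
The walk-through above can be checked by hand: for a window of 2, the rolling sum is each value plus the value in the previous row. This sketch uses the maths column as it stands at this point in the notebook and compares the two computations.

```python
import pandas as pd

maths = pd.Series([20, 15, 17, 17, 16], index=list('abcde'))

# Rolling sum over a window of 2 rows
rolled = maths.rolling(2).sum()

# Manual equivalent: current row plus previous row (row 'a' has no previous row)
manual = maths + maths.shift(1)

print(rolled)
```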

To eliminate NaN values from the result, use the min_periods argument with an appropriate value.<br/>
See the example below.

In [307]:
marks.rolling(2,min_periods=1).sum()

Unnamed: 0,maths,science,hindi,english,sst
a,20.0,18.0,69.0,20.0,10.0
b,35.0,29.0,136.0,41.0,29.0
c,32.0,25.0,132.0,38.0,33.0
d,34.0,28.0,132.0,39.0,33.0
e,33.0,31.0,127.0,43.0,38.0


Rolling is very useful when you are dealing with time series data.
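
For time series, the window can be a time offset instead of a row count. The sketch below uses a hypothetical three-point series with a gap in the dates: a row-based window of 2 always combines adjacent rows, while a '2D' window only includes observations from the last two days.

```python
import pandas as pd

# Hypothetical daily readings with a gap between 2 Jan and 5 Jan
idx = pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-05'])
points = pd.Series([1, 2, 4], index=idx)

# Row-based window: the last two rows, regardless of how far apart they are
by_rows = points.rolling(2, min_periods=1).sum()

# Time-based window: only rows within the 2 days ending at each timestamp
by_time = points.rolling('2D').sum()

print(by_rows)
print(by_time)
```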