
# Agenda: 
* Introducing Pandas Functions
* Pandas Functions with Examples

## Introducing Pandas Functions

   * pipe
   * map
   * apply
   * applymap
   * groupby
   * rolling
   * str

* Before you apply any function to a pandas object, you need to be clear about the type of operation
you want to perform on the data. Basically, we can categorize operations on pandas objects into 3 types:
   * Table-wise operations (pipe)
   * Row- or column-wise operations (map, apply, groupby)
   * Element-wise operations (applymap)


* Let's see how to choose among them based on the scenario.

* We have a dataset that contains the marks students secured in various subjects at school 'XYZ'.<br/>

**Marks of students from school 'XYZ'**

In [60]:
import pandas as pd
import numpy as np
from IPython.display import display

np.random.seed(1)
# 5 students x 5 subjects, random integer marks in [10, 20)
marks = pd.DataFrame(np.random.randint(10, 20, size=(5, 5)),
                     columns=['Maths', 'Science', 'Hindi', 'English', 'SST'],
                     index=['Virat', 'Dhoni', 'Ronald', 'David', 'sunil chhetri'])
# Ajay only has Hindi and SST marks; the other subjects are missing (NaN)
new_entry = pd.DataFrame({'Maths': np.nan, 'Science': np.nan, 'Hindi': 30.0, 'English': np.nan, 'SST': 25.0},
                         index=['Ajay'], columns=['Maths', 'Science', 'Hindi', 'English', 'SST'])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
marks = pd.concat([marks, new_entry])
display(marks)

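| | Maths | Science | Hindi | English | SST |
|---|---|---|---|---|---|
| Virat | 15.0 | 18.0 | 19.0 | 15.0 | 10.0 |
| Dhoni | 10.0 | 11.0 | 17.0 | 16.0 | 19.0 |
| Ronald | 12.0 | 14.0 | 15.0 | 12.0 | 14.0 |
| David | 12.0 | 14.0 | 17.0 | 17.0 | 19.0 |
| sunil chhetri | 11.0 | 17.0 | 10.0 | 16.0 | 19.0 |
| Ajay | NaN | NaN | 30.0 | NaN | 25.0 |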


## Pandas Functions with Example

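### pipe [ pd.Series.pipe() & pd.DataFrame.pipe() ]
* Parameters: function (plus any positional or keyword arguments to pass on to it)
* Result: whatever the function returns


* Calls user-defined functions on an object in a method chain. Instead of nesting calls such as f(g(h(df), arg1=a), arg2=b, arg3=c), you can write df.pipe(h).pipe(g, arg1=a).pipe(f, arg2=b, arg3=c).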
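A minimal sketch of such a chain, using the English marks from above. add_bonus and count_above are hypothetical helpers written only for this illustration; keyword arguments given to pipe are forwarded to the function.

```python
import pandas as pd

# Hypothetical helpers, written only for this illustration
def add_bonus(s, bonus):
    """Return the Series with `bonus` marks added to every value."""
    return s + bonus

def count_above(s, threshold):
    """Count how many values are strictly greater than `threshold`."""
    return int((s > threshold).sum())

english = pd.Series([15.0, 16.0, 12.0, 17.0, 16.0],
                    index=['Virat', 'Dhoni', 'Ronald', 'David', 'sunil chhetri'])

# Nested-call style: read from the inside out
nested = count_above(add_bonus(english, bonus=5), threshold=20)

# The same computation as a pipe chain: read from left to right
chained = english.pipe(add_bonus, bonus=5).pipe(count_above, threshold=20)

print(nested, chained)  # 3 3
```

Both forms compute the same value; the chained form simply keeps the steps in the order they happen.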
 

**Case**<br/>

The English teacher wants to find how many students got more than 15 marks.

**Solution**

**Pipe of length 1**

In [65]:
marks.English.pipe(lambda x: x > 15)

Virat            False
Dhoni             True
Ronald           False
David             True
sunil chhetri     True
Ajay             False
Name: English, dtype: bool

**pipe of length 2**

In [66]:
marks.English.pipe(lambda x: x > 15).pipe(lambda x: sum(x))

3

### map [ pd.Series.map() ]

* Parameters: function, dict, or Series
* Result: Series.


* pd.Series.map() maps the values of a Series using an input correspondence (a function, dict, or Series).<br/>

Let's understand this with a real scenario.

**Case:**<br/>
After the marks above were announced, the English teacher noticed a wrong question in the English paper.
To make up for the mistake, she decided to give 5 extra marks to every student who took the English test.
So she has to add 5 to each student's mark in the English column.
(Assume that the school uses the pandas library to analyze student records.)

**Solution:**<br/>
We can solve the above case using the pandas map function.<br/>
Check the code below:

**Map using custom function as an argument**

In [2]:
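def fuction_addnum(x):
    # add the 5 bonus marks to a single value
    return x+5

# map calls fuction_addnum once for every value of the English column
marks.English = marks.English.map(fuction_addnum)
display(marks)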

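| | Maths | Science | Hindi | English | SST |
|---|---|---|---|---|---|
| Virat | 15.0 | 18.0 | 19.0 | 20.0 | 10.0 |
| Dhoni | 10.0 | 11.0 | 17.0 | 21.0 | 19.0 |
| Ronald | 12.0 | 14.0 | 15.0 | 17.0 | 14.0 |
| David | 12.0 | 14.0 | 17.0 | 22.0 | 19.0 |
| sunil chhetri | 11.0 | 17.0 | 10.0 | 21.0 | 19.0 |
| Ajay | NaN | NaN | 30.0 | NaN | 25.0 |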


* Here, map took each student's English mark and mapped it to a new value (mark + 5) using the custom function fuction_addnum.

We can also solve the above case without creating a custom function, by using a lambda.<br/>
Check the example below.

**Map using lambda function as argument**

In [90]:
# same +5 bonus with a lambda (shown on the original marks; re-running the cell adds another 5)
marks.English = marks.English.map(lambda x: x+5)
display(marks)


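| | Maths | Science | Hindi | English | SST |
|---|---|---|---|---|---|
| Virat | 15.0 | 18.0 | 19.0 | 20.0 | 10.0 |
| Dhoni | 10.0 | 11.0 | 17.0 | 21.0 | 19.0 |
| Ronald | 12.0 | 14.0 | 15.0 | 17.0 | 14.0 |
| David | 12.0 | 14.0 | 17.0 | 22.0 | 19.0 |
| sunil chhetri | 11.0 | 17.0 | 10.0 | 21.0 | 19.0 |
| Ajay | NaN | NaN | 30.0 | NaN | 25.0 |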


* Optionally, we can also pass the na_action argument to map.<br/>
* If na_action is set to 'ignore', NaN values are propagated to the result without being passed to the mapping function.

**Missing values handling using na_action**

In [92]:
# Ajay's missing English mark (NaN) is passed through without calling the lambda
marks.English = marks.English.map(lambda x: x+5, na_action='ignore')
display(marks)

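| | Maths | Science | Hindi | English | SST |
|---|---|---|---|---|---|
| Virat | 15.0 | 18.0 | 19.0 | 20.0 | 10.0 |
| Dhoni | 10.0 | 11.0 | 17.0 | 21.0 | 19.0 |
| Ronald | 12.0 | 14.0 | 15.0 | 17.0 | 14.0 |
| David | 12.0 | 14.0 | 17.0 | 22.0 | 19.0 |
| sunil chhetri | 11.0 | 17.0 | 10.0 | 21.0 | 19.0 |
| Ajay | NaN | NaN | 30.0 | NaN | 25.0 |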
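Besides functions, map also accepts a dict or a Series as the correspondence (see the parameter list above). The sketch below uses a hypothetical mark-to-grade lookup and a hypothetical bonus table on a few made-up marks; values that have no entry in the dict or Series come back as NaN.

```python
import numpy as np
import pandas as pd

english = pd.Series([15.0, 16.0, 12.0, np.nan],
                    index=['Virat', 'Dhoni', 'Ronald', 'Ajay'])

# dict correspondence (hypothetical grade lookup); unmatched values become NaN
grade_lookup = {15.0: 'B', 16.0: 'B+', 17.0: 'A'}
grades = english.map(grade_lookup)

# Series correspondence: each value is looked up in the Series' index
bonus_by_mark = pd.Series({12.0: 3.0, 15.0: 1.0, 16.0: 0.0})
bonuses = english.map(bonus_by_mark)

print(grades)
print(bonuses)
```

Ronald's 12.0 has no grade in the dict, so his grade is NaN, while Ajay's NaN mark stays NaN in both results.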


<br/>
### apply [ pd.Series.apply() & pd.DataFrame.apply() ]

* Parameters: function
* Result: modified Series/DataFrame


* Note that apply can be used on both Series and DataFrame objects.<br/>
* First, let's see how it works with Series data.

**apply on series:**

* Invokes the function on the values of the Series.
* The function can be a NumPy ufunc, which is applied to the entire Series at once, or a Python function that works on single values.
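To illustrate both kinds of function, here is a small sketch on made-up marks (chosen so the square roots are whole numbers): np.sqrt is a NumPy ufunc, while label is a hypothetical Python function that receives one value at a time.

```python
import numpy as np
import pandas as pd

# made-up marks, chosen so the square roots are whole numbers
english = pd.Series([16.0, 9.0, 25.0, np.nan],
                    index=['Dhoni', 'Ronald', 'David', 'Ajay'])

# NumPy ufunc: applied to the whole Series at once; NaN stays NaN
roots = english.apply(np.sqrt)

def label(mark):
    """Hypothetical per-value function: classify a single mark."""
    if pd.isna(mark):  # NaN compares False with everything, so test it first
        return 'absent'
    return 'high' if mark >= 16 else 'low'

# Python function: called once for each value
labels = english.apply(label)

print(roots)
print(labels)
```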

In [98]:
marks.English

Virat            15.0
Dhoni            16.0
Ronald           12.0
David            17.0
sunil chhetri    16.0
Ajay              NaN
Name: English, dtype: float64

* Above is the data in the Series 'English'.<br/>
* The case we discussed earlier can also be solved with the apply method on the English column; check the code below.

**Apply function using the function name as argument**

In [95]:
marks.English = marks.English.apply(func=fuction_addnum)
marks.English

Virat            20.0
Dhoni            21.0
Ronald           17.0
David            22.0
sunil chhetri    21.0
Ajay              NaN
Name: English, dtype: float64

**case:**<br/>
A newly joined teacher entered the students' marks in the format below, which is neither convenient nor readable.<br/>
How can we solve this problem?

In [170]:
marks = pd.DataFrame({'student_marks': [{'Maths': 15.0, 'Science': 18.0, 'Hindi': 19.0, 'English': 15.0, 'SST': 10.0},
                                        {'Maths': 10.0, 'Science': 11.0, 'Hindi': 17.0, 'English': 16.0, 'SST': 19.0},
                                        {'Maths': 12.0, 'Science': 14.0, 'Hindi': 15.0, 'English': 12.0, 'SST': 14.0},
                                        {'Maths': 12.0, 'Science': 14.0, 'Hindi': 17.0, 'English': 17.0, 'SST': 19.0},
                                        {'Maths': 11.0, 'Science': 17.0, 'Hindi': 10.0, 'English': 16.0, 'SST': 19.0},
                                        {'Maths': np.nan, 'Science': np.nan, 'Hindi': 30.0, 'English': np.nan, 'SST': 25.0}]},
                     index=['Virat', 'Dhoni', 'Ronald', 'David', 'Sunl Chhetri', 'Ajay'])
display(marks)

| | student_marks |
|---|---|
| Virat | {'Maths': 15.0, 'Science': 18.0, 'Hindi': 19.0... |
| Dhoni | {'Maths': 10.0, 'Science': 11.0, 'Hindi': 17.0... |
| Ronald | {'Maths': 12.0, 'Science': 14.0, 'Hindi': 15.0... |
| David | {'Maths': 12.0, 'Science': 14.0, 'Hindi': 17.0... |
| Sunl Chhetri | {'Maths': 11.0, 'Science': 17.0, 'Hindi': 10.0... |
| Ajay | {'Maths': nan, 'Science': nan, 'Hindi': 30.0, ... |


**solution:**

**Apply function on a Series holding dictionary data**

In [111]:
marks['Maths'] = marks.student_marks.apply(lambda x:x['Maths'])
marks['Science'] = marks.student_marks.apply(lambda x:x['Science'])
marks['Hindi'] = marks.student_marks.apply(lambda x:x['Hindi'])
marks['English'] = marks.student_marks.apply(lambda x:x['English'])
marks['SST'] = marks.student_marks.apply(lambda x:x['SST'])
marks.drop('student_marks',axis=1,inplace=True)
display(marks)

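| | Maths | Science | Hindi | English | SST |
|---|---|---|---|---|---|
| Virat | 15.0 | 18.0 | 19.0 | 15.0 | 10.0 |
| Dhoni | 10.0 | 11.0 | 17.0 | 16.0 | 19.0 |
| Ronald | 12.0 | 14.0 | 15.0 | 12.0 | 14.0 |
| David | 12.0 | 14.0 | 17.0 | 17.0 | 19.0 |
| Sunl Chhetri | 11.0 | 17.0 | 10.0 | 16.0 | 19.0 |
| Ajay | NaN | NaN | 30.0 | NaN | 25.0 |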


* The first line takes the value stored under the key 'Maths' in each dict of the student_marks column and stores the results in a new column named "Maths".<br/>
* Similarly, the other lines create the remaining subject columns.

* The same logic in fewer lines:

In [171]:
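# expects marks to still hold the student_marks column (re-run the cell that builds it first)
keys = marks.student_marks.iloc[0].keys()   # subject names taken from the first student's dict
for key in keys:
    # apply runs immediately inside the loop, so each lambda sees the current key
    marks[key] = marks.student_marks.apply(lambda x: x[key])

marks.drop('student_marks', axis=1, inplace=True)
display(marks)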

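| | Maths | Science | Hindi | English | SST |
|---|---|---|---|---|---|
| Virat | 15.0 | 18.0 | 19.0 | 15.0 | 10.0 |
| Dhoni | 10.0 | 11.0 | 17.0 | 16.0 | 19.0 |
| Ronald | 12.0 | 14.0 | 15.0 | 12.0 | 14.0 |
| David | 12.0 | 14.0 | 17.0 | 17.0 | 19.0 |
| Sunl Chhetri | 11.0 | 17.0 | 10.0 | 16.0 | 19.0 |
| Ajay | NaN | NaN | 30.0 | NaN | 25.0 |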
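An alternative that the original does not use, shown as a sketch: when every cell holds a dict, pd.DataFrame can build the subject columns directly from the list of dicts, so no per-key apply is needed. The small Series below is made up for the illustration.

```python
import numpy as np
import pandas as pd

# made-up Series in the same shape as student_marks: one dict per student
student_marks = pd.Series(
    [{'Maths': 15.0, 'English': 15.0},
     {'Maths': 10.0, 'English': 16.0},
     {'Maths': np.nan, 'English': np.nan}],
    index=['Virat', 'Dhoni', 'Ajay'],
)

# each dict becomes a row and each key becomes a column; keep the student index
expanded = pd.DataFrame(student_marks.tolist(), index=student_marks.index)
print(expanded)
```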


**case:**<br/>
Students' internal marks are stored in the format below. How can every student easily find their highest internal score in each subject?

In [125]:
internal_marks = pd.DataFrame({'Maths':   [[5, 6, 4], [5, 2, 3], [6, 4, 2], [5, 5, 2], [5, 5, 1], [np.nan, np.nan]],
                               'Science': [[6, 6, 6], [5, 3, 3], [7, 4, 3], [7, 4, 3], [8, 5, 4], [np.nan, np.nan, np.nan]],
                               'Hindi':   [[5, 5, 9], [5, 2, 10], [5, 5, 5], [6, 6, 5], [3, 3, 4], [10, 10, 10]],
                               'English': [[5, 5, 5], [3, 7, 6], [4, 2, 6], [3, 7, 7], [4, 8, 4], [np.nan, np.nan, np.nan]],
                               'SST':     [[2, 3, 5], [10, 5, 4], [7, 4, 3], [5, 5, 9], [5, 6, 8], [10, 10, 5]]},
                              index=['Virat', 'Dhoni', 'Ronald', 'David', 'Sunl Chhetri', 'Ajay'],
                              columns=['Maths', 'Science', 'Hindi', 'English', 'SST'])
display(internal_marks)

| | Maths | Science | Hindi | English | SST |
|---|---|---|---|---|---|
| Virat | [5, 6, 4] | [6, 6, 6] | [5, 5, 9] | [5, 5, 5] | [2, 3, 5] |
| Dhoni | [5, 2, 3] | [5, 3, 3] | [5, 2, 10] | [3, 7, 6] | [10, 5, 4] |
| Ronald | [6, 4, 2] | [7, 4, 3] | [5, 5, 5] | [4, 2, 6] | [7, 4, 3] |
| David | [5, 5, 2] | [7, 4, 3] | [6, 6, 5] | [3, 7, 7] | [5, 5, 9] |
| Sunl Chhetri | [5, 5, 1] | [8, 5, 4] | [3, 3, 4] | [4, 8, 4] | [5, 6, 8] |
| Ajay | [nan, nan] | [nan, nan, nan] | [10, 10, 10] | [nan, nan, nan] | [10, 10, 5] |


**solution:**

**Apply method on a Series containing lists**

In [123]:
def maxmarks(subject_marks):
    return np.max(subject_marks)

internal_marks['Max_Maths'] = internal_marks.Maths.apply(maxmarks)
internal_marks['Max_Science'] = internal_marks.Science.apply(maxmarks)
internal_marks['Max_Hindi'] = internal_marks.Hindi.apply(maxmarks)
internal_marks['Max_English'] = internal_marks.English.apply(maxmarks)
internal_marks['Max_SST'] = internal_marks.SST.apply(maxmarks)
display(internal_marks)

| | Maths | Science | Hindi | English | SST | Max_Maths | Max_Science | Max_Hindi | Max_English | Max_SST |
|---|---|---|---|---|---|---|---|---|---|---|
| Virat | [5, 6, 4] | [6, 6, 6] | [5, 5, 9] | [5, 5, 5] | [2, 3, 5] | 6.0 | 6.0 | 9 | 5.0 | 5 |
| Dhoni | [5, 2, 3] | [5, 3, 3] | [5, 2, 10] | [3, 7, 6] | [10, 5, 4] | 5.0 | 5.0 | 10 | 7.0 | 10 |
| Ronald | [6, 4, 2] | [7, 4, 3] | [5, 5, 5] | [4, 2, 6] | [7, 4, 3] | 6.0 | 7.0 | 5 | 6.0 | 7 |
| David | [5, 5, 2] | [7, 4, 3] | [6, 6, 5] | [3, 7, 7] | [5, 5, 9] | 5.0 | 7.0 | 6 | 7.0 | 9 |
| Sunl Chhetri | [5, 5, 1] | [8, 5, 4] | [3, 3, 4] | [4, 8, 4] | [5, 6, 8] | 5.0 | 8.0 | 4 | 8.0 | 8 |
| Ajay | [nan, nan] | [nan, nan, nan] | [10, 10, 10] | [nan, nan, nan] | [10, 10, 5] | NaN | NaN | 10 | NaN | 10 |


* The case above can also be solved with the shorter loop below (a reduced form of the logic above).

In [145]:
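# use only the original subject columns, so re-running the cell does not create Max_Max_... columns
cols = [c for c in internal_marks.columns if not c.startswith('Max_')]
for label in cols:
    internal_marks['Max_' + label] = internal_marks[label].apply(maxmarks)
display(internal_marks)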

| | Maths | Science | Hindi | English | SST | Max_Maths | Max_Science | Max_Hindi | Max_English | Max_SST |
|---|---|---|---|---|---|---|---|---|---|---|
| Virat | [5, 6, 4] | [6, 6, 6] | [5, 5, 9] | [5, 5, 5] | [2, 3, 5] | 6.0 | 6.0 | 9 | 5.0 | 5 |
| Dhoni | [5, 2, 3] | [5, 3, 3] | [5, 2, 10] | [3, 7, 6] | [10, 5, 4] | 5.0 | 5.0 | 10 | 7.0 | 10 |
| Ronald | [6, 4, 2] | [7, 4, 3] | [5, 5, 5] | [4, 2, 6] | [7, 4, 3] | 6.0 | 7.0 | 5 | 6.0 | 7 |
| David | [5, 5, 2] | [7, 4, 3] | [6, 6, 5] | [3, 7, 7] | [5, 5, 9] | 5.0 | 7.0 | 6 | 7.0 | 9 |
| Sunl Chhetri | [5, 5, 1] | [8, 5, 4] | [3, 3, 4] | [4, 8, 4] | [5, 6, 8] | 5.0 | 8.0 | 4 | 8.0 | 8 |
| Ajay | [nan, nan] | [nan, nan, nan] | [10, 10, 10] | [nan, nan, nan] | [10, 10, 5] | NaN | NaN | 10 | NaN | 10 |


<br/>
**apply on dataframe:**

* Parameters: function.
* Result: Series or DataFrame<br/>


* apply on a DataFrame applies the function along an axis of the DataFrame.<br/>
* The objects passed to the function are Series whose index is either the DataFrame's index (axis=0, one call per column) or the DataFrame's columns (axis=1, one call per row).<br/>
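A small sketch, on a made-up two-student frame, that makes the axis rule visible: the name of the Series each call receives is the column label for axis=0 and the row label for axis=1.

```python
import pandas as pd

df = pd.DataFrame({'Maths': [15, 10], 'Science': [18, 11]},
                  index=['Virat', 'Dhoni'])

# axis=0 (default): one call per column; s.name is the column label
col_names = df.apply(lambda s: s.name)
col_totals = df.apply(lambda s: s.sum())

# axis=1: one call per row; s.name is the row label
row_names = df.apply(lambda s: s.name, axis=1)
row_totals = df.apply(lambda s: s.sum(), axis=1)

print(col_names.tolist(), col_totals.tolist())  # ['Maths', 'Science'] [25, 29]
print(row_names.tolist(), row_totals.tolist())  # ['Virat', 'Dhoni'] [33, 21]
```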

**case**<br/>
The headmaster asked to add 5 marks in every subject to each student who appeared for the exam.

**solution:**

**Apply function on dataframe when function name passed as argument**

In [173]:
marks.apply(func=fuction_addnum)

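| | Maths | Science | Hindi | English | SST |
|---|---|---|---|---|---|
| Virat | 20.0 | 23.0 | 24.0 | 20.0 | 15.0 |
| Dhoni | 15.0 | 16.0 | 22.0 | 21.0 | 24.0 |
| Ronald | 17.0 | 19.0 | 20.0 | 17.0 | 19.0 |
| David | 17.0 | 19.0 | 22.0 | 22.0 | 24.0 |
| sunil chhetri | 16.0 | 22.0 | 15.0 | 21.0 | 24.0 |
| Ajay | NaN | NaN | 35.0 | NaN | 30.0 |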


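* We can see that the code above added 5 to every element: with the default axis=0, apply passed each column to fuction_addnum as a Series, the addition happened element-wise, and missing marks stayed NaN.<br/>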

**case**<br/>
The headmaster asked to find the average marks of each subject.

**solution**

**Apply on dataframe object when lambda function passed as argument**

In [53]:
marks.apply(lambda x: np.mean(x))

Maths      12.000000
Science    14.800000
Hindi      18.000000
English    15.200000
SST        17.666667
dtype: float64

**case**<br/>
The teachers want to find the total score and the average score of each student.

**solution**

The examples below pass axis=1, so the function receives one row (one student) at a time: np.sum gives the total and np.mean gives the average.

**Passing axis information to apply method**

In [3]:
marks.apply(lambda x: np.sum(x), axis=1)

Virat            77.0
Dhoni            73.0
Ronald           67.0
David            79.0
sunil chhetri    73.0
Ajay             55.0
dtype: float64

In [58]:
marks.apply(lambda x: np.mean(x), axis=1)

Virat            15.4
Dhoni            14.6
Ronald           13.4
David            15.8
sunil chhetri    14.6
Ajay             27.5
dtype: float64

<br/>
### applymap [ pd.DataFrame.applymap() ]
* Parameters: function
* Result: DataFrame


* Applies a function to a DataFrame element-wise, i.e. like doing map(func, series) for each Series in the DataFrame.

**Applymap method on dataframe**

**case**<br/>

The teacher has to generate a status report (Pass/Fail) for each student in each subject.

**solution**

In [177]:
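# na_action='ignore' keeps missing marks as NaN; without it, NaN < 15 is False and absent subjects would show 'Pass'
marks.applymap(lambda x: 'Fail' if x < 15 else 'Pass', na_action='ignore')

| | Maths | Science | Hindi | English | SST |
|---|---|---|---|---|---|
| Virat | Pass | Pass | Pass | Pass | Fail |
| Dhoni | Fail | Fail | Pass | Pass | Pass |
| Ronald | Fail | Fail | Pass | Fail | Fail |
| David | Fail | Fail | Pass | Pass | Pass |
| sunil chhetri | Fail | Pass | Fail | Pass | Pass |
| Ajay | NaN | NaN | Pass | NaN | Pass |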


* Note: the example above applied the function to each element of the DataFrame individually, regardless of axis.
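A version note, as a sketch: pandas 2.1 deprecated applymap in favor of the equivalent DataFrame.map. The snippet below picks whichever name the installed pandas provides, and passes na_action='ignore' (available since pandas 1.2) so that missing marks stay NaN instead of being compared with 15, which would otherwise label them 'Pass' because NaN < 15 is False. The two-student frame is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Maths': [15.0, np.nan], 'Hindi': [19.0, 30.0]},
                  index=['Virat', 'Ajay'])

def status(mark):
    return 'Fail' if mark < 15 else 'Pass'

# DataFrame.map exists from pandas 2.1; older versions only have applymap
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

# na_action='ignore' leaves NaN cells untouched instead of calling status() on them
report = elementwise(status, na_action='ignore')
print(report)
```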

<br/>
### groupby [ pd.Series.groupby() & pd.DataFrame.groupby() ]:

* Parameters: mapping, function, str, or iterable
* Result: GroupBy object

By “group by” we are referring to a process involving one or more of the following steps:<br>

* **Splitting:** the data into groups based on some criteria<br/>
* **Applying:** a function to each group independently<br/>
* **Combining:** the results into a data structure<br/>

Of these, the split step is the most straightforward. In fact, in many situations you may wish to split the data set into groups and do something with those groups yourself. In the apply step, we might wish to do one of the following:

* **Aggregation:** computing a summary statistic (or statistics) for each group. Some examples:
   * Compute group sums or means
   * Compute group sizes / counts
* **Transformation:** performing some group-specific computations and returning a like-indexed object. Some examples:
   * Standardizing data (z-score) within a group
   * Filling NAs within groups with a value derived from each group
* **Filtration:** discarding some groups, according to a group-wise computation that evaluates to True or False. Some examples:
   * Discarding data that belongs to groups with only a few members
   * Filtering out data based on the group sum or mean
* Some combination of the above: GroupBy will examine the results of the apply step and try to return a sensibly combined result if it doesn’t fit into any of the three categories above.<br/>

For more details, see <a href='https://pandas.pydata.org/pandas-docs/stable/groupby.html'>Group By: split-apply-combine</a>.
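To make the three steps concrete, here is a sketch that performs split, apply and combine by hand on four rows of the kabaddi data used in the next examples (Rahul, Pradeep, Nithin and Ajay), and compares the result with a single groupby call.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2014, 2017, 2014, 2017],
                   'Raid_Points': [300, 368, 45, 258]})

# Split: one sub-DataFrame per distinct Year
pieces = {year: df[df['Year'] == year] for year in df['Year'].unique()}

# Apply: reduce each piece to a single number
sums = {year: piece['Raid_Points'].sum() for year, piece in pieces.items()}

# Combine: gather the per-group results into one Series, ordered by key
manual = pd.Series(sums).sort_index()

# groupby performs all three steps in one call (and also sorts the keys)
grouped = df.groupby('Year')['Raid_Points'].sum()

print(manual.tolist(), grouped.tolist())  # [345, 626] [345, 626]
```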

<br/>
<br/>


So far we have used academic data; from here on we will use a sample kabaddi dataset to understand the functions.

**sample kabaddi dataset**

In [5]:
# column order matches the table shown below
example_df = pd.DataFrame({'Player': ['Rahul', 'Pradeep', 'Ajay', 'Sandeep', 'Nithin', 'Anup', 'Surendar', 'Manjith'],
                           'Profile': ['All Rounder', 'Raider', 'Raider', 'Defender', 'Raider', 'Raider', 'Defender', 'Defender'],
                           'Raid_Points': [300, 368, 258, 200, 45, 200, 45, 60],
                           'Tackle_Points': [45, 1, 20, 50, 30, 25, 100, 40],
                           'Year': [2014, 2017, 2017, 2015, 2014, 2015, 2017, 2014]})
display(example_df)

| | Player | Profile | Raid_Points | Tackle_Points | Year |
|---|---|---|---|---|---|
| 0 | Rahul | All Rounder | 300 | 45 | 2014 |
| 1 | Pradeep | Raider | 368 | 1 | 2017 |
| 2 | Ajay | Raider | 258 | 20 | 2017 |
| 3 | Sandeep | Defender | 200 | 50 | 2015 |
| 4 | Nithin | Raider | 45 | 30 | 2014 |
| 5 | Anup | Raider | 200 | 25 | 2015 |
| 6 | Surendar | Defender | 45 | 100 | 2017 |
| 7 | Manjith | Defender | 60 | 40 | 2014 |


#### Groupby

**Grouping data by a series**

In [62]:
example_df.groupby('Year')

<pandas.core.groupby.DataFrameGroupBy object at 0x00000203B05E8E80>

* Note: the groupby function returns a DataFrameGroupBy object, not a DataFrame.<br/>
* See the examples below to learn how to access elements of a DataFrameGroupBy object.

**case**<br/>
We want to see the performance of players by year.

**solution**

**Accessing groups from DataFrameGroupBy object**

In [63]:
grb_year = example_df.groupby('Year')
grb_year.groups

{2014: Int64Index([0, 4, 7], dtype='int64'),
 2015: Int64Index([3, 5], dtype='int64'),
 2017: Int64Index([1, 2, 6], dtype='int64')}

* Here we get the groups, but only as row labels, which is not a readable view of the data. To see the data, either access a group with the get_group method or print the contents in a for loop.

**Accessing specified group using get_group method**

In [64]:
grb_year.get_group(2014)

| | Player | Profile | Raid_Points | Tackle_Points | Year |
|---|---|---|---|---|---|
| 0 | Rahul | All Rounder | 300 | 45 | 2014 |
| 4 | Nithin | Raider | 45 | 30 | 2014 |
| 7 | Manjith | Defender | 60 | 40 | 2014 |


**Printing data in a group object using a for loop**

In [66]:
for name,group in grb_year:
    print(name)
    print(group)

2014
    Player      Profile  Raid_Points  Tackle_Points  Year
0    Rahul  All Rounder          300             45  2014
4   Nithin       Raider           45             30  2014
7  Manjith     Defender           60             40  2014
2015
    Player   Profile  Raid_Points  Tackle_Points  Year
3  Sandeep  Defender          200             50  2015
5     Anup    Raider          200             25  2015
2017
     Player   Profile  Raid_Points  Tackle_Points  Year
1   Pradeep    Raider          368              1  2017
2      Ajay    Raider          258             20  2017
6  Surendar  Defender           45            100  2017


**case**<br/>
Sponsors want to see the summary of data over the years.

**solution**

**Finding a summary of a group object**

This example gives you an idea of how to describe the data by year.

In [67]:
example_df.groupby('Year').describe()

| Year | Raid_Points count | Raid_Points mean | Raid_Points std | Raid_Points min | Raid_Points 25% | Raid_Points 50% | Raid_Points 75% | Raid_Points max | Tackle_Points count | Tackle_Points mean | Tackle_Points std | Tackle_Points min | Tackle_Points 25% | Tackle_Points 50% | Tackle_Points 75% | Tackle_Points max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2014 | 3.0 | 135.0 | 143.09088 | 45.0 | 52.5 | 60.0 | 180.0 | 300.0 | 3.0 | 38.333333 | 7.637626 | 30.0 | 35.0 | 40.0 | 42.5 | 45.0 |
| 2015 | 2.0 | 200.0 | 0.0 | 200.0 | 200.0 | 200.0 | 200.0 | 200.0 | 2.0 | 37.5 | 17.67767 | 25.0 | 31.25 | 37.5 | 43.75 | 50.0 |
| 2017 | 3.0 | 223.666667 | 164.214291 | 45.0 | 151.5 | 258.0 | 313.0 | 368.0 | 3.0 | 40.333333 | 52.538874 | 1.0 | 10.5 | 20.0 | 60.0 | 100.0 |


**case**<br/>
Asked to find the total number of raid points scored by year.

**solution**

**Applying the sum function on grouped data**

In [68]:
example_df.groupby('Year')['Raid_Points'].sum()

Year
2014    405
2015    400
2017    671
Name: Raid_Points, dtype: int64

**case**<br/>
Asked to find the total raid points and tackle points of players by year and by category (profile).
In general terms, if a sponsor gives profile and year information, they should get the records of the players with that profile and year.

**Grouping data using lists of columns**

This example shows you how to group data by multiple columns.

In [69]:
grb_profile_year = example_df.groupby(['Profile','Year'])
grb_profile_year.groups

{('All Rounder', 2014): Int64Index([0], dtype='int64'),
 ('Defender', 2014): Int64Index([7], dtype='int64'),
 ('Defender', 2015): Int64Index([3], dtype='int64'),
 ('Defender', 2017): Int64Index([6], dtype='int64'),
 ('Raider', 2014): Int64Index([4], dtype='int64'),
 ('Raider', 2015): Int64Index([5], dtype='int64'),
 ('Raider', 2017): Int64Index([1, 2], dtype='int64')}

**Accessing a particular group**

A group is accessed by passing its key, here a (Profile, Year) tuple taken from the groups shown above, to get_group.

In [70]:
grb_profile_year.get_group(('Raider', 2017))

| | Player | Profile | Raid_Points | Tackle_Points | Year |
|---|---|---|---|---|---|
| 1 | Pradeep | Raider | 368 | 1 | 2017 |
| 2 | Ajay | Raider | 258 | 20 | 2017 |
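The case above asks for totals, not only for the group members. One way to get them, sketched here on the same kabaddi data, is to select the numeric columns after grouping and aggregate with sum; the result is indexed by (Profile, Year) pairs.

```python
import pandas as pd

example_df = pd.DataFrame({
    'Player': ['Rahul', 'Pradeep', 'Ajay', 'Sandeep', 'Nithin', 'Anup', 'Surendar', 'Manjith'],
    'Profile': ['All Rounder', 'Raider', 'Raider', 'Defender', 'Raider', 'Raider', 'Defender', 'Defender'],
    'Raid_Points': [300, 368, 258, 200, 45, 200, 45, 60],
    'Tackle_Points': [45, 1, 20, 50, 30, 25, 100, 40],
    'Year': [2014, 2017, 2017, 2015, 2014, 2015, 2017, 2014],
})

# group by both keys, keep only the numeric columns, then add them up per group
totals = example_df.groupby(['Profile', 'Year'])[['Raid_Points', 'Tackle_Points']].sum()
print(totals)

# one (Profile, Year) combination, e.g. the totals for raiders in 2017
print(totals.loc[('Raider', 2017)])
```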


<br/>
#### Aggregation:

**case**<br/>
Asked to find the total number of raid points and tackle points secured by year.


**solution**

**agg method on group object**

This example illustrates how to perform aggregate operation on group object.

In [71]:
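# select the numeric columns; otherwise recent pandas versions also apply 'sum' to the text columns
example_df.groupby('Year')[['Raid_Points', 'Tackle_Points']].agg('sum')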

| Year | Raid_Points | Tackle_Points |
|---|---|---|
| 2014 | 405 | 115 |
| 2015 | 400 | 75 |
| 2017 | 671 | 121 |


Note: we can also pass np.sum to agg instead of the string 'sum'.

**case**<br/>
Asked to see the main statistical summary of the data by year.

**solution**

**Applying list of aggregate functions on group**

This example shows you how to perform multiple agg operations on grouped object.

In [72]:
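# select the numeric columns; 'mean' and 'std' cannot be computed for the text columns
example_df.groupby('Year')[['Raid_Points', 'Tackle_Points']].agg(['sum', 'mean', 'std'])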

| Year | Raid_Points sum | Raid_Points mean | Raid_Points std | Tackle_Points sum | Tackle_Points mean | Tackle_Points std |
|---|---|---|---|---|---|---|
| 2014 | 405 | 135.0 | 143.09088 | 115 | 38.333333 | 7.637626 |
| 2015 | 400 | 200.0 | 0.0 | 75 | 37.5 | 17.67767 |
| 2017 | 671 | 223.666667 | 164.214291 | 121 | 40.333333 | 52.538874 |
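If each column needs its own aggregation, or the output columns need custom names, agg also supports named aggregation (pandas 0.25 and later). The output names total_raid and max_tackle below are illustrative choices, and only the columns needed from the kabaddi data are included.

```python
import pandas as pd

example_df = pd.DataFrame({
    'Raid_Points': [300, 368, 258, 200, 45, 200, 45, 60],
    'Tackle_Points': [45, 1, 20, 50, 30, 25, 100, 40],
    'Year': [2014, 2017, 2017, 2015, 2014, 2015, 2017, 2014],
})

# named aggregation: new_column=(source_column, aggregation)
summary = example_df.groupby('Year').agg(
    total_raid=('Raid_Points', 'sum'),
    max_tackle=('Tackle_Points', 'max'),
)
print(summary)
```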


<br/>
#### Transformation:

* In the book <a href=https://www.amazon.com/Python-Data-Science-Handbook-Essential/dp/1491912057/ref=as_li_ss_tl?ie=UTF8&qid=1491155961&sr=8-1&keywords=python+data+science+handbook&linkCode=sl1&tag=pbpython-20&linkId=5b7ccd9b952061a57f7c88236e6ac784 >Python Data Science Handbook</a>, transformation is described as follows:<br/>
While aggregation must return a reduced version of the data, transformation can return some transformed version of the full data to recombine. For such a transformation, the output is the same shape as the input. A common example is to center the data by subtracting the group-wise mean.

![](https://raw.githubusercontent.com/suresrividya/data-science-notes/master/transformgroupby.PNG)

In [42]:
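# select the numeric columns; a mean cannot be computed for the text columns
example_df.groupby('Year')[['Raid_Points', 'Tackle_Points']].transform(lambda x: np.mean(x))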

| | Raid_Points | Tackle_Points |
|---|---|---|
| 0 | 135.0 | 38.333333 |
| 1 | 223.666667 | 40.333333 |
| 2 | 223.666667 | 40.333333 |
| 3 | 200.0 | 37.5 |
| 4 | 135.0 | 38.333333 |
| 5 | 200.0 | 37.5 |
| 6 | 223.666667 | 40.333333 |
| 7 | 135.0 | 38.333333 |


**Transform on grouped object**

In [178]:
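# z-score within each Year, scaled by 10; numeric columns only, as above
example_df.groupby('Year')[['Raid_Points', 'Tackle_Points']].transform(lambda x: (x - x.mean()) / x.std() * 10)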

| | Raid_Points | Tackle_Points |
|---|---|---|
| 0 | 11.531133 | 8.728716 |
| 1 | 8.789328 | -7.48652 |
| 2 | 2.090764 | -3.87015 |
| 3 | NaN | 7.071068 |
| 4 | -6.289709 | -10.910895 |
| 5 | NaN | -7.071068 |
| 6 | -10.880092 | 11.35667 |
| 7 | -5.241424 | 2.182179 |
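The centering example from the quote above (subtracting the group-wise mean) can be sketched as follows, on a small made-up frame: transform('mean') returns a value for every original row, so the result lines up with the input and can be subtracted directly.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2014, 2014, 2015, 2015],
                   'Raid_Points': [300, 60, 200, 100]})

# group-wise mean, broadcast back to every row of its group
group_mean = df.groupby('Year')['Raid_Points'].transform('mean')

# centered value = original value - mean of its own group
df['Centered'] = df['Raid_Points'] - group_mean
print(df)
```

The 2014 mean is 180 and the 2015 mean is 150, so the centered values are 120, -120, 50 and -50.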


<br/>
#### Filtration:
* Returns a copy of a DataFrame, excluding the rows of groups that do not satisfy the boolean criterion specified by the function.


**Filtering data in a grouped object on a given condition**

In [74]:
example_df.groupby('Year').filter(lambda x: x['Raid_Points'].sum() <= 500)

| | Player | Profile | Raid_Points | Tackle_Points | Year |
|---|---|---|---|---|---|
| 0 | Rahul | All Rounder | 300 | 45 | 2014 |
| 3 | Sandeep | Defender | 200 | 50 | 2015 |
| 4 | Nithin | Raider | 45 | 30 | 2014 |
| 5 | Anup | Raider | 200 | 25 | 2015 |
| 7 | Manjith | Defender | 60 | 40 | 2014 |
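The other filtration example listed earlier, discarding groups with only a few members, can be sketched in the same way on the kabaddi players and years; the threshold of 3 rows is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['Rahul', 'Pradeep', 'Ajay', 'Sandeep', 'Nithin', 'Anup', 'Surendar', 'Manjith'],
    'Year': [2014, 2017, 2017, 2015, 2014, 2015, 2017, 2014],
})

# keep only the rows whose Year group has at least 3 players (2015 has just 2)
big_years = df.groupby('Year').filter(lambda g: len(g) >= 3)
print(big_years)
```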


<br/>
### rolling [ pd.DataFrame.rolling() ]:
* Parameters: window: int, or offset (for example '2D' on a datetime index)
* Result: a Window or Rolling object; calling an aggregation such as sum() on it returns the computed Series/DataFrame


* Provides rolling window calculations.
* See the example below to understand rolling.

**Rolling on dataframe**

In [43]:
# rolling sums only make sense for numeric columns, so select them explicitly
example_df[['Raid_Points', 'Tackle_Points']].rolling(window=2).sum()

| | Raid_Points | Tackle_Points |
|---|---|---|
| 0 | NaN | NaN |
| 1 | 668.0 | 46.0 |
| 2 | 626.0 | 21.0 |
| 3 | 458.0 | 70.0 |
| 4 | 245.0 | 80.0 |
| 5 | 245.0 | 55.0 |
| 6 | 245.0 | 125.0 |
| 7 | 105.0 | 140.0 |


Let's understand the rolling method above.<br/>
* Think of rolling as a pointer placed at the first row (index 0) when execution begins.
Since we gave a window size of 2, it needs 2 values, but at this point it knows only one value in each column.
So it cannot perform the sum we asked for, and the numeric results in the first row are NaN.
It then moves down the table and reaches the second row.
From the second row onwards it knows the current row's values as well as the previous row's values, so it
sums them and returns the rolling sum.
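The same behaviour on a tiny made-up Series, where the arithmetic is easy to check by hand:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# window=2: each result is the sum of the current value and the one before it;
# the first position has only one value, so its result is NaN
window_sum = s.rolling(window=2).sum()
print(window_sum.tolist())  # [nan, 3.0, 5.0, 7.0]
```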

* To avoid NaN values in the result, use the min_periods argument (the minimum number of values a window needs to produce a result) with an appropriate value.<br/>
* See the example below to learn how to eliminate NaN values.

**Rolling method with the min_periods argument**

In [79]:
# numeric columns only; min_periods=1 lets the first window produce a result from a single value
example_df[['Raid_Points', 'Tackle_Points']].rolling(2, min_periods=1).sum()

| | Raid_Points | Tackle_Points |
|---|---|---|
| 0 | 300.0 | 45.0 |
| 1 | 668.0 | 46.0 |
| 2 | 626.0 | 21.0 |
| 3 | 458.0 | 70.0 |
| 4 | 245.0 | 80.0 |
| 5 | 245.0 | 55.0 |
| 6 | 245.0 | 125.0 |
| 7 | 105.0 | 140.0 |


* Note: rolling is a very useful method for analyzing time series data.
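As a sketch of the time-series use: when the index is a DatetimeIndex, window can be an offset string such as '2D', and the window then covers a span of time rather than a fixed number of rows. The dates and values below are made up; note the gap between 2 and 5 January.

```python
import pandas as pd

# made-up daily values with a gap between 2 and 5 January
idx = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-05', '2024-01-06'])
s = pd.Series([1, 2, 3, 4], index=idx)

# offset window: for each timestamp t, sum the values in (t - 2 days, t]
time_window = s.rolling('2D').sum()

# integer window: always the current row and the previous row, whatever the dates
row_window = s.rolling(2).sum()

print(time_window.tolist())  # [1.0, 3.0, 3.0, 7.0]
print(row_window.tolist())   # [nan, 3.0, 5.0, 7.0]
```

On 5 January the two-day window contains only that day, so its sum is 3, whereas the row-based window still adds the value from 2 January.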

### Str [ pd.Series.str ]

* Parameters: none; str is an accessor available on a Series of strings (object or string dtype)
* Result: depends on the string method called through the str accessor


* Designed to work with text Series. It provides a set of string processing methods that make it easy
to operate on each element of the Series.<br/>
* An important thing to note is that these methods handle missing values automatically: a missing value stays missing in the result.
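A small sketch of the missing-value behaviour, on a made-up Series with one missing name. The na=False argument of startswith fills missing entries with False, which is convenient when the result is used as a boolean filter.

```python
import pandas as pd

names = pd.Series(['Rahul', None, 'Surendar'])

# missing entry in, missing entry out: the length for index 1 is NaN
lengths = names.str.len()

# na=False turns the missing entry into False, so the mask can be used for filtering
starts_with_s = names.str.startswith('S', na=False)

print(lengths.tolist())
print(names[starts_with_s].tolist())  # ['Surendar']
```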

**Finding the length of each player's name in example_df**

In [85]:
example_df.Player.str.len()

0    5
1    7
2    4
3    7
4    6
5    4
6    8
7    7
Name: Player, dtype: int64

**Finding the players whose names start with 'S'**

In [45]:
example_df.Player.str.startswith('S')

0    False
1    False
2    False
3     True
4    False
5    False
6     True
7    False
Name: Player, dtype: bool

In [46]:
example_df[example_df.Player.str.startswith('S')]

| | Player | Profile | Raid_Points | Tackle_Points | Year |
|---|---|---|---|---|---|
| 3 | Sandeep | Defender | 200 | 50 | 2015 |
| 6 | Surendar | Defender | 45 | 100 | 2017 |


## Conclusion
In this chapter we have seen the pandas functions pipe, map, apply, applymap, groupby, rolling and str, along with their applications. In the next chapter we will learn another important concept: pandas DataFrame join, merge and concatenation techniques.