# *Lambda* functions with `.groupby()`
*Lambda* functions can be applied to the results of the *groupby* function.
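As a minimal sketch of the idea, the snippet below groups a small hand-made frame and applies a lambda to each group. The `sales` frame and its column names are hypothetical and serve only as illustration.

```python
import pandas as pd

# Hypothetical data, written by hand so the result is reproducible.
sales = pd.DataFrame({
    'shop':   ['a', 'a', 'b', 'b'],
    'amount': [10, 30, 5, 40],
})

# The lambda receives the 'amount' values of one shop at a time
# and returns a single number, so the result has one row per shop.
spread = sales.groupby('shop')['amount'].apply(lambda s: s.max() - s.min())
print(spread)
```

Here the lambda returns a scalar per group; a lambda that returns a Series per group yields one value per original row instead.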

In [82]:
import pandas as pd
from numpy.random import random as rn

In [84]:
df = pd.DataFrame({'A': rn(10), 'B': rn(10), 'C': rn(10)})
df

,A,B,C
0,0.462081,0.394312,0.304069
1,0.858358,0.356605,0.670749
2,0.44782,0.498101,0.132036
3,0.856451,0.469844,0.281512
4,0.64319,0.704687,0.947187
5,0.22,0.719393,0.983945
6,0.347326,0.142806,0.801363
7,0.328384,0.986184,0.526027
8,0.345199,0.485782,0.443971
9,0.882585,0.600516,0.980692


Besides the four arithmetic operations (+, -, *, /) on *DataFrames* or *Series*, lambda functions can also perform other kinds of computation, such as excluding certain elements from the results.

*Lambda* functions can be applied with the `.apply()` method.

In [86]:
df.apply(lambda x: x*10)

,A,B,C
0,4.620813,3.943123,3.04069
1,8.583576,3.566047,6.707494
2,4.478197,4.98101,1.320358
3,8.564508,4.69844,2.815118
4,6.431902,7.04687,9.471873
5,2.200001,7.193928,9.839451
6,3.473263,1.428062,8.013626
7,3.283835,9.861836,5.260268
8,3.451987,4.857819,4.439707
9,8.825849,6.005164,9.806922
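The exclusion of elements mentioned earlier can be sketched with `.apply()` and a row-wise lambda. This is one possible approach, not the only one; the `df_demo` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data, fixed by hand so the result is reproducible.
df_demo = pd.DataFrame({
    'A': [0.2, 0.9, 0.5, 0.7],
    'B': [0.8, 0.1, 0.6, 0.3],
})

# axis=1 passes each row to the lambda; the result is a boolean Series
# that shares df_demo's index, so it can be used directly as a mask.
keep = df_demo.apply(lambda row: row['A'] > row['B'], axis=1)

# Keeps only the rows with index 1 and 3, where A exceeds B.
filtered = df_demo[keep]
print(filtered)
```

For a simple comparison like this one, the vectorized form `df_demo[df_demo['A'] > df_demo['B']]` gives the same rows; the lambda form is mainly useful when the per-row logic is harder to express with column operations.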


In [88]:
df = pd.read_csv('https://shanelynnwebsite-mid9n9g1q9y8tt.netdna-ssl.com/wp-content/uploads/2015/06/phone_data.csv')
df

,index,date,duration,item,month,network,network_type
0,0,15/10/14 06:58,34.429,data,2014-11,data,data
1,1,15/10/14 06:58,13.000,call,2014-11,Vodafone,mobile
2,2,15/10/14 14:46,23.000,call,2014-11,Meteor,mobile
3,3,15/10/14 14:48,4.000,call,2014-11,Tesco,mobile
4,4,15/10/14 17:27,4.000,call,2014-11,Tesco,mobile
...,...,...,...,...,...,...,...
825,825,13/03/15 00:38,1.000,sms,2015-03,world,world
826,826,13/03/15 00:39,1.000,sms,2015-03,Vodafone,mobile
827,827,13/03/15 06:58,34.429,data,2015-03,data,data
828,828,14/03/15 00:13,1.000,sms,2015-03,world,world


In [165]:
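# Boolean Series per month: True where duration >= 1000. Its values are used as a
# positional mask, which lines up with df only if its rows are already in month order.
gb = df.groupby('month').apply(lambda x: x['duration'] >= 1000)
df[gb.values]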

,index,date,duration,item,month,network,network_type
8,8,16/10/14 15:12,1050.0,call,2014-11,Three,mobile
10,10,16/10/14 16:21,1183.0,call,2014-11,Three,mobile
31,31,18/10/14 13:10,1714.0,call,2014-11,Three,mobile
59,59,23/10/14 08:34,1940.0,call,2014-11,landline,landline
105,105,31/10/14 13:27,1234.0,call,2014-11,Tesco,mobile
116,116,02/11/14 15:44,1023.0,call,2014-11,Three,mobile
117,117,02/11/14 19:16,1025.0,call,2014-11,Three,mobile
171,171,07/11/14 09:33,1205.0,call,2014-11,Vodafone,mobile
223,223,12/11/14 17:59,1001.0,call,2014-11,Three,mobile
252,252,19/11/14 18:56,2120.0,call,2014-12,Three,mobile
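Grouping becomes necessary when the threshold depends on the group itself, for example "longer than the average of its month". A minimal sketch using `.transform()`, which returns a result aligned with the original index; the `calls` frame is hypothetical and its rows are deliberately not sorted by month.

```python
import pandas as pd

# Hypothetical call log; rows are deliberately NOT sorted by month.
calls = pd.DataFrame({
    'month':    ['2014-12', '2014-11', '2014-12', '2014-11', '2014-11'],
    'duration': [2000.0, 10.0, 100.0, 1500.0, 20.0],
})

# transform() returns a result with the same index as `calls`,
# so the mask lines up row by row regardless of row order.
# astype(bool) just makes the mask's dtype explicit.
above_month_mean = calls.groupby('month')['duration'].transform(
    lambda s: s > s.mean()
).astype(bool)

# Keeps rows 0 and 3: the calls longer than their own month's mean.
long_calls = calls[above_month_mean]
print(long_calls)
```

Because the mask carries the index of `calls`, it selects by label rather than by position and does not rely on the rows being ordered by group.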


In [184]:
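operations = {
  'duration': ['mean', 'min', 'max', 'sum']
}

# NOTE: gb is ordered by (month, item) group, while df's rows interleave the items,
# so this positional mask does not select the rows with duration >= 500
# (the result below still has minimums of 4.0 and 1.0).
# An index-aligned mask such as df['duration'] >= 500 avoids the mismatch.
gb = df.groupby(['month', 'item']).apply(lambda x: x['duration'] >= 500)
res = df[gb.values].groupby(['month', 'item']).agg(operations)

res.round(2)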

month,item,duration mean,duration min,duration max,duration sum
2014-11,call,813.67,4.0,1714.0,4882.0
2014-11,data,34.43,34.43,34.43,172.14
2014-11,sms,1.0,1.0,1.0,5.0
2014-12,call,81.75,4.0,208.0,327.0
2014-12,data,34.43,34.43,34.43,68.86
2014-12,sms,1.0,1.0,1.0,3.0
2015-01,call,150.71,17.0,449.0,1055.0
2015-01,sms,1.0,1.0,1.0,4.0
2015-02,call,104.83,7.0,478.0,629.0
2015-02,data,34.43,34.43,34.43,68.86
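A row-wise condition such as `duration >= 500` does not depend on the group, so one way to keep the filter aligned with the rows is to apply it before grouping. The sketch below assumes a hypothetical `log` frame with interleaved items, loosely mimicking the layout of the phone data; passing a lambda to `.loc` is one option among several.

```python
import pandas as pd

# Hypothetical log with interleaved items (not the real phone_data.csv).
log = pd.DataFrame({
    'month':    ['2014-11', '2014-11', '2014-11', '2014-12', '2014-12', '2014-12'],
    'item':     ['data', 'call', 'call', 'sms', 'call', 'call'],
    'duration': [34.429, 1714.0, 4.0, 1.0, 2120.0, 600.0],
})

operations = {'duration': ['mean', 'min', 'max', 'sum']}

# .loc accepts a callable: the lambda receives the frame and returns a
# boolean Series with the same index, so exactly the matching rows are kept.
long_rows = log.loc[lambda d: d['duration'] >= 500]

# Aggregate only the surviving rows, per (month, item) pair.
summary = long_rows.groupby(['month', 'item']).agg(operations)
print(summary.round(2))
```

With the filter applied this way, every row that reaches the aggregation satisfies the condition, so no group's `min` can be below 500, and groups with no qualifying rows (here `data` and `sms`) drop out of the summary.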
