## <h4 align="center"><font color="maroon">Pandas Group By</font></h4>

**Analyzing weather data from various cities to see how group by can be used to run data analytics.**

In [1]:
import pandas as pd
df = pd.read_csv("weather_by_cities.csv")
df

,day,city,temperature,windspeed,event
0,01-01-2025,new york,32,6,Rain
1,01-02-2025,new york,36,7,Sunny
2,01-03-2025,new york,28,12,Snow
3,01-04-2025,new york,33,7,Sunny
4,01-01-2025,mumbai,90,5,Sunny
5,01-02-2025,mumbai,85,12,Fog
6,01-03-2025,mumbai,87,15,Fog
7,01-04-2025,mumbai,92,5,Rain
8,01-01-2025,paris,45,20,Sunny
9,01-02-2025,paris,50,13,Cloudy
10,01-03-2025,paris,54,8,Cloudy
11,01-04-2025,paris,42,10,Cloudy


In [2]:
g = df.groupby("city")
g

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000021DA14035F0>

**Iterating over the DataFrameGroupBy object yields each group key together with the rows that belong to it:**

In [3]:
for city, data in g:
    print("city:",city)
    print("data:",data)    

city: mumbai
data:           day    city  temperature  windspeed  event
4  01-01-2025  mumbai           90          5  Sunny
5  01-02-2025  mumbai           85         12    Fog
6  01-03-2025  mumbai           87         15    Fog
7  01-04-2025  mumbai           92          5   Rain
city: new york
data:           day      city  temperature  windspeed  event
0  01-01-2025  new york           32          6   Rain
1  01-02-2025  new york           36          7  Sunny
2  01-03-2025  new york           28         12   Snow
3  01-04-2025  new york           33          7  Sunny
city: paris
data:            day   city  temperature  windspeed   event
8   01-01-2025  paris           45         20   Sunny
9   01-02-2025  paris           50         13  Cloudy
10  01-03-2025  paris           54          8  Cloudy
11  01-04-2025  paris           42         10  Cloudy


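**This is similar to grouping rows in SQL, conceptually:**

**SELECT * FROM weather_data GROUP BY city**

**(In standard SQL, the non-grouped columns would also need an aggregate function such as MAX or MIN.)**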
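
As a rough illustration of the SQL analogy, the sketch below pairs a grouped aggregate query with its pandas counterpart. It builds a small DataFrame inline from a few rows of the weather table instead of reading the CSV, and the `max_temp` column name is an assumption chosen for this example.

```python
import pandas as pd

# A few rows copied from the weather table (built inline so the sketch is self-contained)
df = pd.DataFrame({
    "city": ["new york", "new york", "mumbai", "mumbai", "paris", "paris"],
    "temperature": [32, 36, 90, 85, 45, 50],
})

# Rough pandas counterpart of:
#   SELECT city, MAX(temperature) AS max_temp FROM weather_data GROUP BY city
max_temp = df.groupby("city")["temperature"].max().reset_index(name="max_temp")
print(max_temp)
```

Selecting a column before aggregating plays the role of the SQL select list, and `reset_index` turns the group key back into an ordinary column.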

In [5]:
g.get_group('mumbai')

,day,city,temperature,windspeed,event
4,01-01-2025,mumbai,90,5,Sunny
5,01-02-2025,mumbai,85,12,Fog
6,01-03-2025,mumbai,87,15,Fog
7,01-04-2025,mumbai,92,5,Rain


In [6]:
g.max()

city,day,temperature,windspeed,event
mumbai,01-04-2025,92,15,Sunny
new york,01-04-2025,36,12,Sunny
paris,01-04-2025,54,20,Sunny


**This method of splitting your dataset into smaller groups and then applying an operation (such as min or max) to each group to get an aggregate result is called Split-Apply-Combine. It is illustrated in the diagram below.**

<img src="split_apply_combine.png">
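
To make the three stages concrete, here is a minimal sketch that performs split, apply, and combine by hand and compares the result with the one-line `groupby` call. The inline DataFrame reuses a few rows from the weather table, and the variable names (`pieces`, `applied`, `manual`, `direct`) are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["new york", "new york", "mumbai", "mumbai", "paris", "paris"],
    "temperature": [32, 36, 90, 85, 45, 50],
    "windspeed": [6, 7, 5, 12, 20, 13],
})

# Split: one sub-DataFrame per city
pieces = {city: data for city, data in df.groupby("city")}

# Apply: column-wise maximum of each piece
applied = {
    city: data[["temperature", "windspeed"]].max()
    for city, data in pieces.items()
}

# Combine: stack the per-city results back into a single DataFrame
manual = pd.DataFrame(applied).T
manual.index.name = "city"

# The one-line groupby call does all three steps at once
direct = df.groupby("city")[["temperature", "windspeed"]].max()

print(manual)
print(direct)
```

In practice the one-line form is preferable; the manual version is only meant to show what happens at each stage.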

In [7]:
g.min()

city,day,temperature,windspeed,event
mumbai,01-01-2025,85,5,Fog
new york,01-01-2025,28,6,Rain
paris,01-01-2025,42,8,Cloudy


In [None]:
g.describe()

In [8]:
g.size()

city
mumbai      4
new york    4
paris       4
dtype: int64

In [9]:
g.count()

city,day,temperature,windspeed,event
mumbai,4,4,4,4
new york,4,4,4,4
paris,4,4,4,4
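
On this dataset `size()` and `count()` report the same numbers because no values are missing. The sketch below shows where they differ, using a hypothetical missing temperature reading that is not part of the original data: `size()` counts rows per group, while `count()` counts non-null values per column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["mumbai", "mumbai", "paris", "paris"],
    "temperature": [90, np.nan, 45, 50],  # hypothetical missing reading
})

g = df.groupby("city")
sizes = g.size()    # rows per group, missing values included
counts = g.count()  # non-null values per column in each group

print(sizes)
print(counts)
```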


In [None]:
%matplotlib inline
g.plot()

<h4>Group data using a custom function: Let's say you want to group your data using a custom function. Here the requirement is to create three groups:</h4>
<ol>
    <li>Days when temperature was between 80 and 90</li>
    <li>Days when it was between 50 and 60</li>
    <li>Days when it was anything else</li>
</ol>

For this, you need to write a custom grouping function and pass it to groupby.

In [None]:
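def grouper(df, idx, col):
    # Return the group label for the row at index label idx,
    # based on the value stored in column col
    value = df.loc[idx, col]
    if 80 <= value <= 90:
        return '80-90'
    elif 50 <= value <= 60:
        return '50-60'
    else:
        return 'others'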

In [None]:
# groupby calls the function once for each index label x
g = df.groupby(lambda x: grouper(df, x, 'temperature'))
g

In [None]:
for key, d in g:
    print("Group by Key: {}\n".format(key))
    print(d)
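
An alternative sketch of the same three-way grouping: instead of passing a function of the index label, compute a Series of labels from the temperature column and pass that Series to `groupby`. The helper name `temperature_band` and the Series name `band` are assumptions made for this example, and the inline DataFrame holds only a few rows from the weather table.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["new york", "mumbai", "mumbai", "paris", "paris"],
    "temperature": [32, 90, 85, 45, 50],
})

def temperature_band(t):
    # Same three groups as above, decided from the value itself
    if 80 <= t <= 90:
        return "80-90"
    elif 50 <= t <= 60:
        return "50-60"
    return "others"

# One label per row, aligned with df by index
bands = df["temperature"].map(temperature_band).rename("band")

g = df.groupby(bands)
group_sizes = g.size()
print(group_sizes)
```

This version avoids looking up each row by index label inside the grouping function, which may be easier to read when the label depends on a single column.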