# Pandas GroupBy: Your Guide to Grouping Data in Python
    
by Brad Solomon, Nov 18, 2019. [Original article](https://realpython.com/pandas-groupby/)



## Table of Contents

1. Housekeeping
2. Example 1: U.S. Congress Dataset
    - 2.1 The “Hello, World!” of Pandas GroupBy
    - 2.2 Pandas GroupBy vs SQL
    - 2.3 How Pandas GroupBy Works
        - 2.3.a DataFrameGroupBy.```__iter__()```
        - 2.3.b DataFrameGroupBy.```groups```
        - 2.3.c DataFrameGroupBy.```get_group()```
        - 2.3.d next()
3. Example 2: Air Quality Dataset
    - 3.1 Grouping on Derived Arrays
    - 3.2 Resampling
4. Example 3: News Aggregator Dataset
    - 4.1 Using Lambda Functions in .groupby()
    - 4.2 Improving the Performance of .groupby()
5. Pandas GroupBy: Putting It All Together
6. Conclusion
7. More Resources on Pandas GroupBy

In this tutorial, you’ll cover:

- How to use `Pandas GroupBy` operations on real-world data
- How the `split-apply-combine` chain of operations works
- How to `decompose the split-apply-combine` chain into steps
- How methods of a Pandas GroupBy object can be placed into different categories based on their intent and result

### 1. Housekeeping

- [The U.S. Congress dataset](https://github.com/unitedstates/congress-legislators)
- [The air quality dataset](http://archive.ics.uci.edu/ml/datasets/Air+Quality)
- [The news aggregator dataset](http://archive.ics.uci.edu/ml/datasets/News+Aggregator)


In [1]:
import pandas as pd

# Use 3 decimal places in output display
pd.set_option("display.precision", 3)

# Don't wrap repr(DataFrame) across additional lines
pd.set_option("display.expand_frame_repr", False)

# Set max rows displayed in output to 25
pd.set_option("display.max_rows", 25)

### 2. Example 1: U.S. Congress Dataset

In [2]:
import pandas as pd

dtypes = {
    "first_name": "category",
    "gender": "category",
    "type": "category",
    "state": "category",
    "party": "category",
}
df = pd.read_csv(
    "..\\..\\..\\data\\legislators.historical.data.csv",
    dtype=dtypes,
    usecols=list(dtypes) + ["birthday", "last_name"],
    parse_dates=["birthday"]
)

In [3]:
df.head()
#df.tail()

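|  | last_name | first_name | birthday | gender | type | state | party |
|---|---|---|---|---|---|---|---|
| 0 | Bassett | Richard | 1745-04-02 | M | sen | DE | Anti-Administration |
| 1 | Bland | Theodorick | 1742-03-21 | M | rep | VA | NaN |
| 2 | Burke | Aedanus | 1743-06-16 | M | rep | SC | NaN |
| 3 | Carroll | Daniel | 1730-07-22 | M | rep | MD | NaN |
| 4 | Clymer | George | 1739-03-16 | M | rep | PA | NaN |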


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11982 entries, 0 to 11981
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   last_name   11982 non-null  object        
 1   first_name  11982 non-null  category      
 2   birthday    11431 non-null  datetime64[ns]
 3   gender      11982 non-null  category      
 4   type        11982 non-null  category      
 5   state       11982 non-null  category      
 6   party       11750 non-null  category      
dtypes: category(5), datetime64[ns](1), object(1)
memory usage: 315.7+ KB


#### 2.1 The “Hello, World!” of Pandas GroupBy

__What is the count of Congressional members, on a state-by-state basis, over the entire history of the dataset?__ 

In SQL, you could find this answer with a SELECT statement:

```
SELECT state, count(last_name)
FROM df
GROUP BY state
ORDER BY state;
```

The near-equivalent in Pandas:

In [5]:
n_by_state = df.groupby("state")["last_name"].count()
n_by_state.head(10)

state
AK     16
AL    206
AR    117
AS      2
AZ     48
CA    363
CO     90
CT    240
DC      2
DE     97
Name: last_name, dtype: int64

You call .groupby() and pass the name of the column you want to group on, which is "state". Then, you use ["last_name"] to specify the columns on which you want to perform the actual aggregation.

You can pass a lot more than just a single column name to .groupby() as the first argument. You can also specify any of the following:

- A [list](https://realpython.com/python-lists-tuples/) of multiple column names
- A [dict](https://realpython.com/python-dicts/) or Pandas Series
- A [NumPy array](https://realpython.com/numpy-array-programming/) or Pandas Index, or an array-like iterable of these
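
As a rough illustration of those key types, here is a minimal sketch on a made-up four-row frame. The frame `toy`, the surnames, and the group labels are invented for this example and are not taken from the legislators dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the legislators file
toy = pd.DataFrame(
    {
        "state": ["PA", "PA", "VA", "VA"],
        "gender": ["M", "F", "M", "M"],
        "last_name": ["Adams", "Baker", "Clark", "Davis"],
    }
)

# 1. A list of column names: one group per (state, gender) pair
by_cols = toy.groupby(["state", "gender"])["last_name"].count()

# 2. A dict that maps index labels (0..3 here) to group labels
halves = {0: "first_half", 1: "first_half", 2: "second_half", 3: "second_half"}
by_dict = toy.groupby(halves)["last_name"].count()

# 3. A NumPy array holding one group label per row
parity = np.array(["even", "odd", "even", "odd"])
by_array = toy.groupby(parity)["last_name"].count()

print(by_cols, by_dict, by_array, sep="\n\n")
```

In all three cases the key only has to supply one label per row; it does not have to be a column of the frame.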

Here’s an example of grouping `jointly on two columns`, __which finds the count of Congressional members broken out by state and then by gender__. The analogous SQL query would look like this:
```
SELECT state, gender, count(last_name)
FROM df
GROUP BY state, gender
ORDER BY state, gender;
```

In [6]:
df2c = df.groupby(["state", "gender"])["last_name"].count()
df2c

state  gender
AK     F           0
       M          16
AL     F           3
       M         203
AR     F           5
                ... 
WI     M         197
WV     F           1
       M         119
WY     F           2
       M          38
Name: last_name, Length: 116, dtype: int64
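
The zero count for ('AK', 'F') shows up because both keys are categorical columns, so every state/gender combination is listed even when no rows match it. The sketch below, on invented data, compares `observed=False` with `observed=True`. The frame and its values are assumptions made for illustration, and the keyword is passed explicitly because its default differs between pandas versions.

```python
import pandas as pd

# Hypothetical toy data with categorical key columns
toy = pd.DataFrame(
    {
        "state": pd.Categorical(["AK", "AL", "AL"]),
        "gender": pd.Categorical(["M", "F", "M"]),
        "last_name": ["Adams", "Baker", "Clark"],
    }
)
keys = ["state", "gender"]

# Every combination of categories, including ones with no rows
all_combos = toy.groupby(keys, observed=False)["last_name"].count()

# Only the combinations that actually occur in the data
seen_only = toy.groupby(keys, observed=True)["last_name"].count()

print(all_combos, seen_only, sep="\n\n")
```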

#### 2.2 Pandas GroupBy vs SQL

This is a good time to introduce one prominent difference between the Pandas GroupBy operation and the SQL query above. The result set of the SQL query contains three columns:

1. state
2. gender
3. count

In the Pandas version, the grouped-on columns are pushed into the __MultiIndex__ of the resulting Series by default:

In [7]:
n_by_state_gender = df.groupby(["state", "gender"])["last_name"].count()
type(n_by_state_gender)

pandas.core.series.Series

In [8]:
n_by_state_gender.index[:5]

MultiIndex([('AK', 'F'),
            ('AK', 'M'),
            ('AL', 'F'),
            ('AL', 'M'),
            ('AR', 'F')],
           names=['state', 'gender'])

To more closely emulate the SQL result and push the grouped-on columns back into columns in the result, you can use as_index=False. This __produces a DataFrame with three columns and a RangeIndex__, rather than a Series with a MultiIndex. In short, `using as_index=False will make your result more closely mimic the default SQL output for a similar operation`:

In [9]:
df.groupby(["state", "gender"], as_index=False)["last_name"].count()

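|  | state | gender | last_name |
|---|---|---|---|
| 0 | AK | F | NaN |
| 1 | AK | M | 16.0 |
| 2 | AL | F | 3.0 |
| 3 | AL | M | 203.0 |
| 4 | AR | F | 5.0 |
| ... | ... | ... | ... |
| 111 | WI | M | 197.0 |
| 112 | WV | F | 1.0 |
| 113 | WV | M | 119.0 |
| 114 | WY | F | 2.0 |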
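
A minimal sketch of the same idea on invented data: `as_index=False` and a later `.reset_index()` both move the group keys out of the index and into ordinary columns. The frame `toy` and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame(
    {
        "state": ["PA", "PA", "VA", "VA"],
        "gender": ["M", "F", "M", "M"],
        "last_name": ["Adams", "Baker", "Clark", "Davis"],
    }
)
keys = ["state", "gender"]

# Default: a Series whose MultiIndex holds the group keys
as_series = toy.groupby(keys)["last_name"].count()

# as_index=False: a flat DataFrame with the keys as columns
flat = toy.groupby(keys, as_index=False)["last_name"].count()

# Resetting the index of the default result gives the same layout
via_reset = as_series.reset_index()

print(flat)
```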


Also note that the SQL queries above explicitly use __ORDER BY__, whereas .groupby() does not. That’s because `.groupby() does this by default through its parameter sort, which is True` unless you tell it otherwise:

In [10]:
# Don't sort results by the sort keys
df.groupby("state", sort=False)["last_name"].count()

state
DE      97
VA     432
SC     251
MD     305
PA    1053
      ... 
AK      16
PI      13
VI       4
GU       4
AS       2
Name: last_name, Length: 58, dtype: int64
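
The effect of `sort` is easier to see on a few rows. In this sketch (invented data), the default result is ordered by key, while `sort=False` keeps the groups in order of first appearance.

```python
import pandas as pd

# Hypothetical toy data; VA appears first, AK last
toy = pd.DataFrame(
    {
        "state": ["VA", "DE", "VA", "AK"],
        "last_name": ["Adams", "Baker", "Clark", "Davis"],
    }
)

sorted_counts = toy.groupby("state")["last_name"].count()
unsorted_counts = toy.groupby("state", sort=False)["last_name"].count()

print(list(sorted_counts.index))    # keys in sorted order
print(list(unsorted_counts.index))  # keys in order of first appearance
```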

#### 2.3 How Pandas GroupBy Works

In [11]:
by_state = df.groupby("state")
print(by_state)

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002CFEB865948>


What is that DataFrameGroupBy thing? Its **`.__str__()`** doesn’t give you much information about what it actually is or how it works.

One term that’s frequently used alongside .groupby() is __split-apply-combine__. This refers to a chain of three steps:

- Split a table into groups
- Apply some operations to each of those smaller tables
- Combine the results
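
The three steps above can be spelled out by hand. This is only a sketch on invented data to make the stages visible; it is not how Pandas implements the operation internally.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame(
    {
        "state": ["PA", "PA", "VA"],
        "last_name": ["Adams", "Baker", "Clark"],
    }
)

# Split: one sub-table per state
pieces = {state: frame for state, frame in toy.groupby("state")}

# Apply: run the same operation on each sub-table
counts = {state: frame["last_name"].count() for state, frame in pieces.items()}

# Combine: gather the per-group results into one Series
manual = pd.Series(counts, name="last_name").rename_axis("state")

# The one-liner does all three steps at once
builtin = toy.groupby("state")["last_name"].count()

print(manual)
```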

So, `how can you mentally separate the split, apply, and combine stages if you can’t see any of them happening in isolation?` One useful way to inspect a Pandas GroupBy object and see the splitting in action is to iterate over it. This is implemented in **`DataFrameGroupBy.__iter__()`** and produces an iterator of (group, DataFrame) pairs for DataFrames:

#####  2.3.a DataFrameGroupBy.``` __iter__()```

In [12]:
for state, frame in by_state:
    print(f"First 2 entries for {state!r}")
    print("------------------------")

    print(frame.head(2), end="\n\n")

First 2 entries for 'AK'
------------------------
     last_name first_name   birthday gender type state        party
6618    Waskey      Frank 1875-04-20      M  rep    AK     Democrat
6646      Cale     Thomas 1848-09-17      M  rep    AK  Independent

First 2 entries for 'AL'
------------------------
    last_name first_name   birthday gender type state       party
911   Crowell       John 1780-09-18      M  rep    AL  Republican
990    Walker       John 1783-08-12      M  sen    AL  Republican

First 2 entries for 'AR'
------------------------
     last_name first_name   birthday gender type state party
1000     Bates      James 1788-08-25      M  rep    AR   NaN
1278    Conway      Henry 1793-03-18      M  rep    AR   NaN

First 2 entries for 'AS'
------------------------
          last_name first_name   birthday gender type state     party
10796         Sunia       Fofó 1937-03-13      M  rep    AS  Democrat
11753  Faleomavaega        Eni 1943-08-15      M  rep    AS  Democrat

...

5461     Moody     Gideon 1832-10-16      M  sen    SD  Republican

First 2 entries for 'TN'
------------------------
    last_name first_name   birthday gender type state       party
141     White      James 1749-06-16      M  rep    TN         NaN
142    Blount    William 1749-03-26      M  sen    TN  Republican

First 2 entries for 'TX'
------------------------
     last_name first_name   birthday gender type state     party
2567  Pilsbury    Timothy 1789-04-12      M  rep    TX  Democrat
2670   Kaufman      David 1813-12-18      M  rep    TX  Democrat

First 2 entries for 'UT'
------------------------
      last_name first_name   birthday gender type state     party
3483  Bernhisel       John 1799-07-23      M  rep    UT      Whig
3646     Kinney       John 1816-04-02      M  rep    UT  Democrat

First 2 entries for 'VA'
------------------------
   last_name  first_name   birthday gender type state                party
1      Bland  Theodorick 1742-03-21      M  rep    VA          

#####  2.3.b DataFrameGroupBy.```groups```

The __.groups__ attribute will give you a `dictionary of {group name: group label} pairs`. For example, by_state.groups is a dict with states as keys and the row labels of each state's members as values. Here’s the value for the "PA" key:

In [13]:
by_state.groups["PA"]

Int64Index([    4,    19,    21,    27,    38,    57,    69,    76,    84,
               88,
            ...
            11840, 11864, 11873, 11875, 11885, 11889, 11930, 11943, 11957,
            11971],
           dtype='int64', length=1053)

In [14]:
by_state.groups['AK']

Int64Index([ 6618,  6646,  7441,  7500,  8038,  8235,  8876,  9818,  9950,
             9984, 10081, 10107, 10324, 11260, 11384, 11732],
           dtype='int64')
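
Because the values in `.groups` are row labels, they can be fed straight back into `.loc[]`. A small sketch on invented data:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame(
    {
        "state": ["PA", "PA", "VA"],
        "last_name": ["Adams", "Baker", "Clark"],
    }
)
by_state = toy.groupby("state")

# Row labels that belong to the "PA" group
labels = by_state.groups["PA"]

# Use those labels to pull the matching rows from the original frame
rows = toy.loc[labels]

print(list(labels))
print(rows)
```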

##### 2.3.c DataFrameGroupBy.```get_group()```

You can also use __.get_group()__ as a way `to drill down to the sub-table` from a single group:

In [15]:
by_state.get_group('PA')

|  | last_name | first_name | birthday | gender | type | state | party |
|---|---|---|---|---|---|---|---|
| 4 | Clymer | George | 1739-03-16 | M | rep | PA | NaN |
| 19 | Maclay | William | 1737-07-20 | M | sen | PA | Anti-Administration |
| 21 | Morris | Robert | 1734-01-20 | M | sen | PA | Pro-Administration |
| 27 | Wynkoop | Henry | 1737-03-02 | M | rep | PA | NaN |
| 38 | Jacobs | Israel | 1726-06-09 | M | rep | PA | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 11889 | Brady | Robert | 1945-04-07 | M | rep | PA | Democrat |
| 11930 | Shuster | Bill | 1961-01-10 | M | rep | PA | Republican |
| 11943 | Rothfus | Keith | 1962-04-25 | M | rep | PA | Republican |
| 11957 | Costello | Ryan | 1976-09-07 | M | rep | PA | Republican |


In [16]:
by_state.get_group('AK').head(5)

|  | last_name | first_name | birthday | gender | type | state | party |
|---|---|---|---|---|---|---|---|
| 6618 | Waskey | Frank | 1875-04-20 | M | rep | AK | Democrat |
| 6646 | Cale | Thomas | 1848-09-17 | M | rep | AK | Independent |
| 7441 | Grigsby | George | 1874-12-02 | M | rep | AK | NaN |
| 7500 | Sulzer | Charles | 1879-02-24 | M | rep | AK | NaN |
| 8038 | Sutherland | Daniel | 1869-04-17 | M | rep | AK | Republican |


This is virtually equivalent to using __.loc[]__. You could get the same output with something like `df.loc[df["state"] == "PA"]`.

In [17]:
df.loc[df["state"] == "PA"].head(5)
df.loc[df["state"] == "PA"]

|  | last_name | first_name | birthday | gender | type | state | party |
|---|---|---|---|---|---|---|---|
| 4 | Clymer | George | 1739-03-16 | M | rep | PA | NaN |
| 19 | Maclay | William | 1737-07-20 | M | sen | PA | Anti-Administration |
| 21 | Morris | Robert | 1734-01-20 | M | sen | PA | Pro-Administration |
| 27 | Wynkoop | Henry | 1737-03-02 | M | rep | PA | NaN |
| 38 | Jacobs | Israel | 1726-06-09 | M | rep | PA | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 11889 | Brady | Robert | 1945-04-07 | M | rep | PA | Democrat |
| 11930 | Shuster | Bill | 1961-01-10 | M | rep | PA | Republican |
| 11943 | Rothfus | Keith | 1962-04-25 | M | rep | PA | Republican |
| 11957 | Costello | Ryan | 1976-09-07 | M | rep | PA | Republican |
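
A quick way to check that both routes select the same rows, sketched on invented data (the frame `toy` is hypothetical):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame(
    {
        "state": ["PA", "VA", "PA"],
        "last_name": ["Adams", "Baker", "Clark"],
    }
)

# Route 1: ask the GroupBy object for one group's sub-table
via_get_group = toy.groupby("state").get_group("PA")

# Route 2: filter the original frame with a boolean mask
via_mask = toy.loc[toy["state"] == "PA"]

# Both select the same row labels and the same surnames
print(via_get_group.index.tolist(), via_mask.index.tolist())
```

The boolean mask is usually the simpler choice for a single group; `.get_group()` is convenient when the GroupBy object already exists.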


##### 2.3.d  next()

What about the apply part?

From the Pandas GroupBy object by_state, you can grab the first U.S. state and its DataFrame with next(). When you iterate over a Pandas GroupBy object, you’ll get pairs that you can unpack into two variables:

In [18]:
# First tuple from iterator
state, frame = next(iter(by_state))
state

'AK'

In [19]:
frame.head(3)

|  | last_name | first_name | birthday | gender | type | state | party |
|---|---|---|---|---|---|---|---|
| 6618 | Waskey | Frank | 1875-04-20 | M | rep | AK | Democrat |
| 6646 | Cale | Thomas | 1848-09-17 | M | rep | AK | Independent |
| 7441 | Grigsby | George | 1874-12-02 | M | rep | AK | NaN |


In [20]:
# Count for state == 'AK'
print('Rows:',len(frame))
frame["last_name"].count()  

Rows: 16


16

### 3. Example 2: Air Quality Dataset
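The air quality dataset contains hourly readings from a gas sensor device in Italy. `Missing values are denoted with -200` in the CSV file. In the outputs shown in this section those -200 values were not converted to NaN (df.info() reports 9357 non-null values in every column), so they are included in the aggregates and pull the group means and minimums down.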

In [21]:
import pandas as pd

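# The CSV uses a comma as the decimal separator: "2,6" -> "2.6"
f = lambda x : (x.replace(',', '.'))
# The time field separates hours, minutes, seconds with dots: "18.00.00" -> "18:00:00"
ft = lambda x : (x.replace('.', ':'))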

df = pd.read_csv(
    "..\\..\\..\\data\\AirQualityUCI.data.csv",
    sep=';',
    parse_dates=[["Date", "Time"]],
    squeeze=True,
    na_values=[-200],
    converters = {'Time': ft,'CO(GT)': f, 'T': f, 'RH': f, 'AH': f},
    #
    # When using pandas.read_csv, you must provide a converter or a dtype, not both.
    # dtype={"CO(GT)": "float64", "T": "float64", "RH": "float64", "AH": "float64"},
    #
    usecols=["Date", "Time", "CO(GT)", "T", "RH", "AH"]    
).rename(
    columns={
        "CO(GT)": "co",
        "Date_Time": "tstamp",
        "T": "temp_c",
        "RH": "rel_hum",
        "AH": "abs_hum",
    }
#).set_index("tstamp")
)

df.set_index(pd.DatetimeIndex(df['tstamp']), inplace =True)
df = df.astype({'co': 'float64', 'temp_c': 'float64', 'rel_hum': 'float64', 'abs_hum': 'float64'})

# remove duplicate column
df.drop('tstamp', axis=1, inplace =True)

print(df.head(5))
print('\n')
print(df.info())

                      co  temp_c  rel_hum  abs_hum
tstamp                                            
2004-10-03 18:00:00  2.6    13.6     48.9    0.758
2004-10-03 19:00:00  2.0    13.3     47.7    0.726
2004-10-03 20:00:00  2.2    11.9     54.0    0.750
2004-10-03 21:00:00  2.2    11.0     60.0    0.787
2004-10-03 22:00:00  1.6    11.2     59.6    0.789


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 9357 entries, 2004-10-03 18:00:00 to 2005-04-04 14:00:00
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   co       9357 non-null   float64
 1   temp_c   9357 non-null   float64
 2   rel_hum  9357 non-null   float64
 3   abs_hum  9357 non-null   float64
dtypes: float64(4)
memory usage: 365.5 KB
None


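Here, co is that hour’s average carbon monoxide reading, while temp_c, rel_hum, and abs_hum are the average temperature in Celsius, relative humidity, and absolute humidity over that hour, respectively. The observations in the source file run from March 2004 through April 2005. The timestamps printed below start in January 2004 and end in December 2005 because the file stores dates day-first (dd/mm/yyyy) and, for days up to 12, the day was read as the month here: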

In [22]:
print(df.index.min())
print(df.index.max())

2004-01-04 00:00:00
2005-12-03 23:00:00


In [23]:
df.index.min(), df.index.max()

(Timestamp('2004-01-04 00:00:00'), Timestamp('2005-12-03 23:00:00'))
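
The first rows printed earlier read 2004-10-03, which is 10 March 2004 parsed month-first, and the -200 sentinel is still present in the data. One possible way to load the file so that both are handled is sketched below. It runs on a few made-up rows in what is assumed to be the file's layout (semicolon separator, comma decimals, dd/mm/yyyy dates, dotted times); the names `sample`, `raw`, and `air` are introduced only for this example. It uses `decimal=","` and an explicit datetime format instead of converters.

```python
import io

import numpy as np
import pandas as pd

# Made-up rows in the assumed layout of the UCI file
sample = io.StringIO(
    "Date;Time;CO(GT);T;RH;AH\n"
    "10/03/2004;18.00.00;2,6;13,6;48,9;0,7578\n"
    "10/03/2004;19.00.00;-200;13,3;47,7;0,7255\n"
    "04/04/2005;14.00.00;2,2;-200;-200;-200\n"
)

# decimal="," lets the parser read "2,6" as 2.6 directly
raw = pd.read_csv(sample, sep=";", decimal=",")

# Parse day-first dates and dotted times with an explicit format
tstamp = pd.to_datetime(
    raw["Date"] + " " + raw["Time"], format="%d/%m/%Y %H.%M.%S"
)

air = (
    raw.drop(columns=["Date", "Time"])
    .set_index(tstamp.rename("tstamp"))
    .rename(
        columns={"CO(GT)": "co", "T": "temp_c", "RH": "rel_hum", "AH": "abs_hum"}
    )
    .replace(-200, np.nan)  # turn the -200 sentinel into NaN
)

print(air)
print(air.index.min(), air.index.max())
```

With the sentinel replaced by NaN, aggregations such as `.mean()` skip the missing readings instead of averaging in -200.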

#### 3.1 Grouping on Derived Arrays
Earlier you saw that the first argument to .groupby() can be a NumPy array or Pandas Index, or an array-like iterable of these.

You can take advantage of that option in order to __group by the day of the week__. You can use the index’s __.day_name()__ to produce a Pandas Index of strings. Here are the first ten observations:

In [24]:
day_names = df.index.day_name()
print(type(day_names))
day_names[:10]

<class 'pandas.core.indexes.base.Index'>


Index(['Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Wednesday',
       'Wednesday', 'Wednesday', 'Wednesday'],
      dtype='object', name='tstamp')

You can then take this object and use it as the .groupby() key. In Pandas-speak, day_names is array-like. It’s a one-dimensional sequence of labels.

Now, pass that object to .groupby() to find the average carbon monoxide (co) reading by day of the week:

In [25]:
df.groupby(day_names)["co"].mean()

tstamp
Friday      -23.622
Monday      -33.320
Saturday    -32.798
Sunday      -28.420
Thursday    -39.114
Tuesday     -45.802
Wednesday   -36.717
Name: co, dtype: float64

In [26]:
df.groupby(day_names)["temp_c"].mean()   

tstamp
Friday        1.977
Monday       14.356
Saturday     11.300
Sunday        9.293
Thursday     11.162
Tuesday      11.256
Wednesday     9.133
Name: temp_c, dtype: float64
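
Note that the day names come back in alphabetical order, since the keys are plain strings. The sketch below uses an invented two-day hourly series to show the grouping and one way to put the result back into calendar order with `.reindex()`. The frame `toy` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 48 hourly readings starting on Wednesday 2004-03-10
idx = pd.date_range("2004-03-10", periods=48, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"co": np.arange(48, dtype="float64")}, index=idx)

# Derived key: one day name per row, not stored as a column
day_names = toy.index.day_name()
by_day = toy.groupby(day_names)["co"].mean()

# String keys sort alphabetically; reindex to restore calendar order
by_day_ordered = by_day.reindex(["Wednesday", "Thursday"])

print(by_day)
print(by_day_ordered)
```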

What if you wanted to group not just by day of the week, but by hour of the day? That result should have 7 * 24 = 168 observations. To accomplish that, you can pass a list of array-like objects. In this case, you’ll pass Pandas Int64Index objects:

In [27]:
hr = df.index.hour
hr

Int64Index([18, 19, 20, 21, 22, 23,  0,  1,  2,  3,
            ...
             5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
           dtype='int64', name='tstamp', length=9357)

In [28]:
df.groupby([day_names, hr])["co"].mean().rename_axis(["dow", "hr"])
#df

dow        hr
Friday     0    -19.802
           1    -23.698
           2    -24.011
           3    -24.227
           4    -78.105
                  ...  
Wednesday  19   -21.602
           20   -25.716
           21   -26.713
           22   -27.298
           23   -27.433
Name: co, Length: 168, dtype: float64
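
The 7 * 24 = 168 figure can be checked on synthetic data. This sketch builds two full weeks of invented hourly readings, so every (day, hour) pair should occur exactly twice.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 14 days of hourly readings
idx = pd.date_range("2004-03-08", periods=14 * 24, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"co": np.ones(len(idx))}, index=idx)

# A list of two array-like keys, each with one label per row
keys = [toy.index.day_name(), toy.index.hour]
by_dow_hr = toy.groupby(keys)["co"].count().rename_axis(["dow", "hr"])

print(len(by_dow_hr))  # number of (day, hour) groups
print(by_dow_hr.head())
```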

In [29]:
yrs = df.index.year
df.groupby(yrs)['co'].mean()

tstamp
2004   -42.854
2005    -6.847
Name: co, dtype: float64

In [30]:
df.groupby(yrs)['temp_c'].mean()

tstamp
2004    13.779
2005    -2.881
Name: temp_c, dtype: float64

Here’s one more similar case that uses __.cut()__ to bin the temperature values into discrete intervals:

In [31]:
bins = pd.cut(df["temp_c"], bins=4, labels=('super-cool', "cool", "warm", "hot"))
print(type(bins))
bins.head()

<class 'pandas.core.series.Series'>


tstamp
2004-10-03 18:00:00    hot
2004-10-03 19:00:00    hot
2004-10-03 20:00:00    hot
2004-10-03 21:00:00    hot
2004-10-03 22:00:00    hot
Name: temp_c, dtype: category
Categories (4, object): [super-cool < cool < warm < hot]

In [32]:
df[["rel_hum", "abs_hum"]].groupby(bins).agg(["mean", "median"])

| temp_c | rel_hum mean | rel_hum median | abs_hum mean | abs_hum median |
|---|---|---|---|---|
| super-cool | -200.0 | -200.0 | -200.0 | -200.0 |
| cool | NaN | NaN | NaN | NaN |
| warm | NaN | NaN | NaN | NaN |
| hot | 49.234 | 49.6 | 1.026 | 0.995 |
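
The empty "cool" and "warm" rows follow from how `pd.cut()` builds equal-width bins between the minimum and maximum value: a -200 sentinel stretches the range so far that every real temperature lands in the top bin. A minimal sketch with invented temperatures:

```python
import numpy as np
import pandas as pd

labels = ("super-cool", "cool", "warm", "hot")

# Hypothetical readings; -200 stands in for a missing value
temps = pd.Series([-200.0, 5.0, 15.0, 25.0, 35.0])

# Bins span -200..35, so all real readings fall into the last bin
with_sentinel = pd.cut(temps, bins=4, labels=labels)

# After replacing the sentinel, bins span 5..35 and spread out evenly
cleaned = pd.cut(temps.replace(-200, np.nan), bins=4, labels=labels)

print(with_sentinel.tolist())
print(cleaned.tolist())
```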


In [33]:
bins.values

[hot, hot, hot, hot, hot, ..., hot, hot, hot, hot, hot]
Length: 9357
Categories (4, object): [super-cool < cool < warm < hot]

In [34]:
bins.tolist()[:10]

['hot', 'hot', 'hot', 'hot', 'hot', 'hot', 'hot', 'hot', 'hot', 'hot']

In [35]:
df.groupby(bins.tolist()).groups['hot']

DatetimeIndex(['2004-10-03 18:00:00', '2004-10-03 19:00:00',
               '2004-10-03 20:00:00', '2004-10-03 21:00:00',
               '2004-10-03 22:00:00', '2004-10-03 23:00:00',
               '2004-11-03 00:00:00', '2004-11-03 01:00:00',
               '2004-11-03 02:00:00', '2004-11-03 03:00:00',
               ...
               '2005-04-04 05:00:00', '2005-04-04 06:00:00',
               '2005-04-04 07:00:00', '2005-04-04 08:00:00',
               '2005-04-04 09:00:00', '2005-04-04 10:00:00',
               '2005-04-04 11:00:00', '2005-04-04 12:00:00',
               '2005-04-04 13:00:00', '2005-04-04 14:00:00'],
              dtype='datetime64[ns]', name='tstamp', length=8991, freq=None)

In [36]:
df.groupby(bins.values).get_group('super-cool')

| tstamp | co | temp_c | rel_hum | abs_hum |
|---|---|---|---|---|
| 2004-01-04 14:00:00 | 1.7 | -200.0 | -200.0 | -200.0 |
| 2004-01-04 15:00:00 | 1.9 | -200.0 | -200.0 | -200.0 |
| 2004-01-04 16:00:00 | 2.3 | -200.0 | -200.0 | -200.0 |
| 2004-08-04 23:00:00 | 2.0 | -200.0 | -200.0 | -200.0 |
| 2004-09-04 00:00:00 | 2.4 | -200.0 | -200.0 | -200.0 |
| ... | ... | ... | ... | ... |
| 2005-11-02 17:00:00 | 6.6 | -200.0 | -200.0 | -200.0 |
| 2005-11-02 18:00:00 | 6.5 | -200.0 | -200.0 | -200.0 |
| 2005-11-02 19:00:00 | 7.1 | -200.0 | -200.0 | -200.0 |
| 2005-11-02 20:00:00 | 4.9 | -200.0 | -200.0 | -200.0 |


#### 3.2 Resampling
What if you wanted to group by an observation’s year and quarter?

In [37]:
# See an easier alternative below
df.groupby([df.index.year, df.index.quarter])["co"].agg(
    ["max", "min"]
).rename_axis(["year", "quarter"])

| year | quarter | max | min |
|---|---|---|---|
| 2004 | 1 | 9.4 | -200.0 |
| 2004 | 2 | 8.7 | -200.0 |
| 2004 | 3 | 7.5 | -200.0 |
| 2004 | 4 | 11.9 | -200.0 |
| 2005 | 1 | 8.7 | -200.0 |
| 2005 | 2 | 6.1 | -200.0 |
| 2005 | 3 | 6.0 | -200.0 |
| 2005 | 4 | 8.4 | -200.0 |


This whole operation can, alternatively, be expressed through resampling. One of the uses of resampling is as a [time-based groupby](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling). All that you need to do is pass a frequency string, such as "Q" for "quarterly", and Pandas will do the rest:

In [38]:
df.resample("Q")["co"].agg(["max", "min"])

| tstamp | max | min |
|---|---|---|
| 2004-03-31 | 9.4 | -200.0 |
| 2004-06-30 | 8.7 | -200.0 |
| 2004-09-30 | 7.5 | -200.0 |
| 2004-12-31 | 11.9 | -200.0 |
| 2005-03-31 | 8.7 | -200.0 |
| 2005-06-30 | 6.1 | -200.0 |
| 2005-09-30 | 6.0 | -200.0 |
| 2005-12-31 | 8.4 | -200.0 |


In [39]:
df.resample("Y")["co"].agg(["max", "min"])

| tstamp | max | min |
|---|---|---|
| 2004-12-31 | 11.9 | -200.0 |
| 2005-12-31 | 8.7 | -200.0 |


In [40]:
df.resample("M")["co"].agg(["max", "min"])

| tstamp | max | min |
|---|---|---|
| 2004-01-31 | 7.5 | -200.0 |
| 2004-02-29 | 9.4 | -200.0 |
| 2004-03-31 | 8.1 | -200.0 |
| 2004-04-30 | 8.7 | -200.0 |
| 2004-05-31 | 8.1 | -200.0 |
| 2004-06-30 | 6.2 | -200.0 |
| 2004-07-31 | 7.0 | -200.0 |
| 2004-08-31 | 6.6 | -200.0 |
| 2004-09-30 | 7.5 | -200.0 |
| 2004-10-31 | 9.5 | -200.0 |


Often, when you use .resample() you can express time-based grouping operations in a much more succinct manner. The result may be a tiny bit different than the more verbose .groupby() equivalent, but you’ll often find that .resample() gives you exactly what you’re looking for.
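
The "tiny bit different" part is mostly about the index: grouping on year and quarter yields a MultiIndex of integers, while resampling yields a DatetimeIndex with one timestamp per period. The sketch below checks this on invented daily data. It uses the "QS" (quarter start) alias as an assumption of convenience, because that spelling is the same across pandas versions, whereas newer releases prefer "QE" over "Q" for quarter-end labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one reading per day over two years
idx = pd.date_range("2004-01-01", "2005-12-31", freq="D")
toy = pd.DataFrame({"co": np.arange(len(idx), dtype="float64")}, index=idx)

# Verbose version: group on two derived integer arrays
by_keys = toy.groupby([toy.index.year, toy.index.quarter])["co"].max()

# Resampled version: one label per quarter, taken at the quarter start
by_resample = toy.resample("QS")["co"].max()

# Same values, different index types
print(by_keys.index[:2].tolist())
print(by_resample.index[:2].tolist())
```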