Status: Closed
Labels: enhancement (New feature or request)
Description
Feature Type
- Adding new functionality to datar
- Changing existing functionality in datar
- Removing existing functionality in datar
Problem Description
Hi Mr. Pwwang, I hope everything is going well with you.
Recently I discovered another small issue with the group_by() function from your library, so I am raising another request; I hope you will consider it.
The problem is that the group_by() function does not work with pandas.Grouper() for handling more complicated grouping keys. Below, I use the air quality data for illustration. The dataset is available at https://github.com/pandas-dev/pandas/blob/main/doc/data/air_quality_no2_long.csv
#----------------------------------------------#
#--------------- Data preparation -------------#
#----------------------------------------------#
import datar.all as dr
from datar import f
import pandas as pd
from pipda import register_verb
dr.filter = register_verb(func = dr.filter_)
# Suppress all warnings
import warnings
warnings.filterwarnings("ignore")
df_aq = (
pd.read_csv("05_Pandas_DataR_dataframe/data/air_quality_no2_long.csv")
.rename(columns={"date.utc": "date"})
.assign(date = lambda df: pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S%z"))
)
print(df_aq.head(3))
# city country date location parameter value unit
# <object> <object> <datetime64[ns, UTC]> <object> <object> <float64> <object>
# 0 Paris FR 2019-06-21 00:00:00+00:00 FR04014 no2 20.0 µg/m³
# 1 Paris FR 2019-06-20 23:00:00+00:00 FR04014 no2 21.8 µg/m³
# 2 Paris FR 2019-06-20 22:00:00+00:00 FR04014 no2 26.5 µg/m³
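As a minimal check of the parsing step above (on a couple of hypothetical sample strings, since the CSV is not bundled here): the "%z" directive keeps the UTC offset, which is why the column comes out timezone-aware as datetime64[ns, UTC].

```python
import pandas as pd

# Two sample strings in the same format as the "date" column of df_aq.
s = pd.Series(["2019-06-21 00:00:00+00:00", "2019-06-20 23:00:00+00:00"])
parsed = pd.to_datetime(s, format="%Y-%m-%d %H:%M:%S%z")
print(parsed.dtype)  # datetime64[ns, UTC]
```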
#-----------------------------------------------#
#-------------- Try df.groupby() ---------------#
#-----------------------------------------------#
print(
df_aq
.groupby(pd.Grouper(key="date", freq="5D"))
.agg(value_mean = ("value", "mean")) # Calculate the mean of the "value" column every 5 days
.reset_index()
)
# date value_mean
# <datetime64[ns, UTC]> <float64>
# 0 2019-05-07 00:00:00+00:00 30.286017
# 1 2019-05-12 00:00:00+00:00 24.975304
# 2 2019-05-17 00:00:00+00:00 30.772917
# 3 2019-05-22 00:00:00+00:00 32.298340
# 4 2019-05-27 00:00:00+00:00 20.337705
# 5 2019-06-01 00:00:00+00:00 25.743933
# 6 2019-06-06 00:00:00+00:00 19.717273
# 7 2019-06-11 00:00:00+00:00 25.300855
# 8 2019-06-16 00:00:00+00:00 25.027119
# 9 2019-06-21 00:00:00+00:00 20.000000
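For reference, grouping a column with pd.Grouper is equivalent to resample() on a datetime index, so either pandas spelling produces the bins shown above. A small sketch on hypothetical synthetic data (hourly readings over ten days, standing in for df_aq):

```python
import pandas as pd

# Hypothetical miniature stand-in for df_aq: hourly values over ten days.
idx = pd.date_range("2019-05-07", periods=240, freq=pd.Timedelta(hours=1), tz="UTC")
df = pd.DataFrame({"date": idx, "value": range(240)})

# groupby(pd.Grouper(key=...)) on a column ...
by_grouper = (
    df.groupby(pd.Grouper(key="date", freq="5D"))
      .agg(value_mean=("value", "mean"))
)
# ... matches resample() on the same column set as the index.
by_resample = (
    df.set_index("date")
      .resample("5D")
      .agg(value_mean=("value", "mean"))
)
print(by_grouper.equals(by_resample))  # True
```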
#-----------------------------------------------#
#----------- Try with dr.group_by() ------------#
#-----------------------------------------------#
print(
df_aq
>> dr.group_by(pd.Grouper(key="date", freq="5D"))
>> dr.summarize(value_mean = f.value.mean()) # Calculate the mean of "value" column every 5 days
)
# ... value_mean
# <object> <float64>
# 0 TimeGrouper(key='date', freq=<5 * Days>, axis=... 26.261847
'''Something goes wrong here: the Grouper object is treated as a literal group key, so all rows collapse into a single group (26.26 is the overall mean) instead of being binned every 5 days.'''
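Until group_by() understands pd.Grouper, one possible workaround is to materialize the 5-day bin as an ordinary column first, so that any group_by that only accepts column references can use it. A sketch on hypothetical synthetic data (the column name date_bin is my own choice, not part of datar):

```python
import pandas as pd

# Hypothetical stand-in for df_aq: hourly readings spanning ten days.
idx = pd.date_range("2019-05-07", periods=240, freq=pd.Timedelta(hours=1), tz="UTC")
df = pd.DataFrame({"date": idx, "value": range(240)})

# Workaround: compute the 5-day bin label as a plain column,
# anchored at the first day of the data (mimicking Grouper's bins here).
start = df["date"].min().normalize()
df["date_bin"] = start + pd.to_timedelta((df["date"] - start).dt.days // 5 * 5, unit="D")

out = (
    df.groupby("date_bin")
      .agg(value_mean=("value", "mean"))
      .reset_index()
)
print(out)
```

With the bin precomputed like this, the rest of the pipeline could in principle stay in datar, e.g. df >> dr.group_by(f.date_bin) >> dr.summarize(value_mean=f.value.mean()).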
Feature Description
print(
df_aq
>> dr.group_by(pd.Grouper(key="date", freq="5D"))
>> dr.summarize(value_mean = f.value.mean()) # Calculate the mean of "value" column every 5 days
)
# date value_mean
# <datetime64[ns, UTC]> <float64>
# 0 2019-05-07 00:00:00+00:00 30.286017
# 1 2019-05-12 00:00:00+00:00 24.975304
# 2 2019-05-17 00:00:00+00:00 30.772917
# 3 2019-05-22 00:00:00+00:00 32.298340
# 4 2019-05-27 00:00:00+00:00 20.337705
# 5 2019-06-01 00:00:00+00:00 25.743933
# 6 2019-06-06 00:00:00+00:00 19.717273
# 7 2019-06-11 00:00:00+00:00 25.300855
# 8 2019-06-16 00:00:00+00:00 25.027119
# 9 2019-06-21 00:00:00+00:00 20.000000
Additional Context
No response