Selectbydate: Code Extension #53

Open
Tanvi-Jain01 opened this issue Jul 9, 2023 · 0 comments
@nipunbatra , @patel-zeel

This issue proposes extending the selectByDate function with:

Additional Time Periods: The modified function can compute the average value for each month or year in addition to daily averages. This gives coarser summaries alongside the daily view.

Grouping Support: The modified function optionally groups the data by specified columns, so averages can be computed per group (for example, per station) and compared.

Resampling Flexibility: The modified function maps the selected time period to a frequency passed to the resample method, so the same code path handles daily, monthly, and yearly averages instead of hardcoding a single resampling period.
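To make the "dynamic frequency" idea concrete, here is a minimal sketch, not the project's implementation. The names FREQ_BY_PERIOD, mean_by_period, and the pm25 column are hypothetical, and the data is synthetic. It uses the start-anchored aliases "MS" and "YS", which label each bin by its first day and are accepted by both older and newer pandas releases.

```python
import pandas as pd

# Hypothetical lookup table: user-facing period name -> pandas offset alias.
# "MS"/"YS" label each bin by its start date.
FREQ_BY_PERIOD = {"day": "D", "month": "MS", "year": "YS"}


def mean_by_period(df, time_period="day"):
    """Average every numeric column of df over the requested period."""
    if time_period not in FREQ_BY_PERIOD:
        raise ValueError(f"time_period must be one of {sorted(FREQ_BY_PERIOD)}")
    freq = FREQ_BY_PERIOD[time_period]
    return df.resample(freq, on="date").mean(numeric_only=True)


# Synthetic daily readings for January and February 2022 (illustration only).
demo = pd.DataFrame({
    "date": pd.date_range("2022-01-01", "2022-02-28", freq="D"),
    "pm25": range(59),
})

monthly = mean_by_period(demo, "month")  # one row per month
yearly = mean_by_period(demo, "year")    # a single row for 2022
```

An unknown period name fails fast with a ValueError rather than being passed to pandas as an unexpected frequency string.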

Original Code:

df.index = pd.to_datetime(df.date)
df = df.drop("date", axis=1)
df_n = df[year].resample("1D").mean()
df_n = df_n.fillna(method="ffill")
df_n["month"] = df_n.index.month
df_n.index.dayofweek
print(df_n)

Improved Code:

The improved version adds group and time_period as parameters.

def selectByDate(df, year, group=None, time_period='day'):
    """
    Utility function to cut a given dataframe by year and find the average value
    of each day, month, or year. Optionally, data can be grouped by specified columns.

    Parameters
    ----------
    df: data frame
        A data frame containing a date field and optional grouping columns.
    year: str or int
        The year to select and filter the data.
    group: list, optional
        A list of columns to group the data by. Default is None (no grouping).
    time_period: {'day', 'month', 'year'}, optional
        The time period to compute the average value. Default is 'day'.

    Returns
    -------
    data frame
        A data frame with the average value of each day, month, or year.
        If group is specified, the data will be grouped accordingly.
    """
    import pandas as pd

    # Map the period name to a pandas offset alias instead of relying on
    # time_period[0] (pandas >= 2.2 prefers 'ME' and 'YE' over 'M' and 'Y').
    freq_by_period = {'day': 'D', 'month': 'M', 'year': 'Y'}
    if time_period not in freq_by_period:
        raise ValueError("time_period must be 'day', 'month', or 'year'")
    freq = freq_by_period[time_period]

    # Work on a copy so the caller's data frame is not modified.
    df = df.copy()
    df['date'] = pd.to_datetime(df['date'])
    df_year = df[df['date'].dt.year == int(year)]

    if group:
        # Resample separately within each group.
        return df_year.groupby(group).resample(freq, on='date').mean(numeric_only=True)

    return df_year.resample(freq, on='date').mean(numeric_only=True)

selectByDate(df1,'2022',group=['latitude','longitude','station'], time_period='month')

Here we can group by any column present in the data frame and choose the time period over which to average: day, month, or year.

Output:

[Image: selectby]
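For readers without the original data, the grouped monthly average can also be reproduced on a small synthetic frame. This sketch swaps in a different technique from the function above: it uses pd.Grouper inside a single groupby call rather than groupby(...).resample(...). The station names, the pm25 column, and all values are made up for illustration.

```python
import pandas as pd

# Synthetic two-station data set; names and values are invented.
dates = pd.date_range("2022-01-01", "2022-02-28", freq="D")
readings = pd.concat(
    [
        pd.DataFrame({"date": dates, "station": "A", "pm25": 10.0}),
        pd.DataFrame({"date": dates, "station": "B", "pm25": 30.0}),
    ],
    ignore_index=True,
)

# Group by station and by calendar month ("MS" = month start) in one step.
monthly = (
    readings
    .groupby(["station", pd.Grouper(key="date", freq="MS")])
    .mean(numeric_only=True)
)
```

The result is indexed by (station, date), with one row per station per month, which is convenient for comparing stations side by side.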