# Aggregate events at a fixed interval

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/temporian/blob/last-release/docs/src/recipes/aggregate_interval.ipynb)

This recipe aggregates events that may be non-uniformly sampled into fixed-length intervals (e.g., seconds, hours, days, or weeks). In other words, it converts irregular event features into a regularly sampled time series.

For example, suppose we have the sales log from a store, where each sold item is represented by an event. Each sale event has a timestamp, a sale price, and the unit cost of the product. We want to calculate total daily sales, with a single event at `00:00` each day.

## Example data

Let's create some sale events with **non-uniform sampling** and the features described above.

In [None]:
import pandas as pd
import temporian as tp

sales_data = pd.DataFrame(
    data=[
        # sale timestamp,   price, cost
        ["2020-01-01 13:04", 3.0,  1.0],
        ["2020-01-01 13:04", 5.0,  2.0],  # duplicated timestamp
        ["2020-01-02 15:24", 7.0,  3.0],
        ["2020-01-03 13:45", 3.0,  1.0],
        ["2020-01-03 16:10", 7.0,  3.0],
        ["2020-01-03 17:30", 10.0, 5.0],
        ["2020-01-06 10:10", 4.0,  2.0],
        ["2020-01-06 19:35", 3.0,  1.0],
    ],
    columns=[
        "timestamp",
        "unit_price",
        "unit_cost",
    ],
)

sales_evset = tp.from_pandas(sales_data)
sales_evset.plot()

## Solution

We want to calculate total daily sales. We can do this in two steps:

1. Create a uniform sampling with one tick per day (it could be any other interval), at time `00:00:00`.
2. Add up all sales in the 24 hours leading up to each tick: after `00:00:00` of the previous day, up to and including the tick at `00:00:00`. The tick at `00:00` on January 2 therefore carries the total for January 1.
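The two steps above can be sketched in plain Python to make the expected result concrete. This is only an illustration, not how Temporian implements the operators: it assumes each daily window is half-open, `(tick - interval, tick]`, and it uses only the timestamps and prices from the example data.

```python
from datetime import datetime, timedelta

# (timestamp, unit_price) pairs copied from the example data.
sales = [
    ("2020-01-01 13:04", 3.0),
    ("2020-01-01 13:04", 5.0),
    ("2020-01-02 15:24", 7.0),
    ("2020-01-03 13:45", 3.0),
    ("2020-01-03 16:10", 7.0),
    ("2020-01-03 17:30", 10.0),
    ("2020-01-06 10:10", 4.0),
    ("2020-01-06 19:35", 3.0),
]
events = [(datetime.strptime(ts, "%Y-%m-%d %H:%M"), price) for ts, price in sales]

# Step 1: one tick per day at 00:00, from January 1 to January 7 inclusive.
interval = timedelta(days=1)
ticks = []
t = datetime(2020, 1, 1)
while t <= datetime(2020, 1, 7):
    ticks.append(t)
    t += interval

# Step 2: for each tick, sum the prices of the events that fall in
# (tick - interval, tick]. This window boundary is an assumption of the sketch.
daily_revenue = [
    sum(price for ts, price in events if tick - interval < ts <= tick)
    for tick in ticks
]

for tick, revenue in zip(ticks, daily_revenue):
    print(tick.date(), revenue)
```

In this sketch, the tick on `2020-01-02` reports the two sales made on January 1, and ticks whose window contains no sales report `0`.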

### 1. Create uniform sampling

In [None]:
# Define the time span to cover: the first and last daily tick
time_span = tp.event_set(timestamps=["2020-01-01 00:00", "2020-01-07 00:00"])

# Create daily ticks at 00:00
interval = tp.duration.days(1)
ticks = time_span.tick(interval)

ticks

### 2. Aggregate the events

Now we can aggregate the events between ticks. Here we compute a moving sum evaluated at the ticks (`sampling=ticks`), with `window_length` equal to the interval between ticks, so that consecutive windows neither overlap nor leave gaps.

Note that all moving window operators support the `sampling` argument, so any other aggregation can be used depending on the use case (e.g., `simple_moving_average`, `moving_max`, or `moving_min`).
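To illustrate how the choice of aggregation changes the result over the same daily windows, here is a minimal plain-Python sketch. The helper `aggregate` and its `empty` parameter are hypothetical names introduced for this illustration and are not part of Temporian; the sketch also assumes half-open windows `(tick - window, tick]` and returns `None` for empty windows, which may differ from what a given Temporian operator returns.

```python
from datetime import datetime, timedelta
from statistics import mean


def aggregate(events, ticks, window, fn, empty=None):
    """Apply `fn` to the values falling in (tick - window, tick] for each tick.

    Hypothetical helper for illustration only. Windows without any event
    yield `empty`.
    """
    results = []
    for tick in ticks:
        values = [value for ts, value in events if tick - window < ts <= tick]
        results.append(fn(values) if values else empty)
    return results


# A subset of the example sale prices (January 3 and January 6).
events = [
    (datetime(2020, 1, 3, 13, 45), 3.0),
    (datetime(2020, 1, 3, 16, 10), 7.0),
    (datetime(2020, 1, 3, 17, 30), 10.0),
    (datetime(2020, 1, 6, 10, 10), 4.0),
    (datetime(2020, 1, 6, 19, 35), 3.0),
]
ticks = [datetime(2020, 1, day) for day in (4, 5, 6, 7)]
one_day = timedelta(days=1)

daily_max = aggregate(events, ticks, one_day, max)
daily_min = aggregate(events, ticks, one_day, min)
daily_mean = aggregate(events, ticks, one_day, mean)

print(daily_max)   # [10.0, None, None, 4.0]
print(daily_min)   # [3.0, None, None, 3.0]
print(daily_mean)
```

Only the aggregation function changes between the three calls; the ticks and the window length stay the same, which mirrors how the `sampling` and `window_length` arguments are reused across moving window operators.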

In [None]:
# Provide uniform ticks as sampling
moving_sum = sales_evset.moving_sum(window_length=interval, sampling=ticks)

moving_sum

## (Optional) Rename and plot

Finally, we can rename the features to reflect what they mean after aggregation: the summed `unit_price` is the daily revenue, and the summed `unit_cost` is the daily cost.

We also calculate and plot the daily profit.

In [None]:
# Rename aggregated features
daily_sales = moving_sum.rename({"unit_price": "daily_revenue", "unit_cost": "daily_cost"})

# Profit = revenue - cost
daily_profit = (daily_sales["daily_revenue"] - daily_sales["daily_cost"]).rename("daily_profit")

daily_profit.plot()
