In [None]:
import temporian as tp

In [None]:
sales = tp.from_parquet(
    "../../data/ecommerce_sales.parquet",
    timestamps="InvoiceDate",
)
sales

In [None]:
sales["TotalPrice"].plot()

A common operation on temporal data is the moving sum. Let's compute and plot, for each transaction, the sum of sales over the previous seven days, using the moving_sum operator.

In [None]:
weekly_sales = sales["TotalPrice"].moving_sum(tp.duration.days(7))
weekly_sales.plot(interactive=True)
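
To make the window semantics concrete, here is a minimal pure-Python sketch of a trailing seven-day sum over irregularly spaced events. It does not use Temporian: the helper `trailing_sum` and the toy data are hypothetical, and the sketch assumes the window at time t covers (t - 7 days, t], which is worth checking against the moving_sum documentation.

```python
from datetime import datetime, timedelta


def trailing_sum(events, window):
    """For each event time t, sum the values whose timestamp s
    satisfies t - window < s <= t (assumed window convention)."""
    return [
        (t, sum(v for s, v in events if t - window < s <= t))
        for t, _ in events
    ]


# Hypothetical transactions: (timestamp, total price).
toy_sales = [
    (datetime(2011, 1, 1, 9), 10.0),
    (datetime(2011, 1, 3, 12), 5.0),
    (datetime(2011, 1, 8, 9), 2.5),
    (datetime(2011, 1, 9, 9), 4.0),
]

toy_weekly = trailing_sum(toy_sales, timedelta(days=7))
for t, total in toy_weekly:
    print(t, total)
```

The output has one value per input transaction, so it is as irregularly spaced as the transactions themselves.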

## Sales per product

In [None]:
sales_per_product = sales.add_index("StockCode")
weekly_sales_per_product = sales_per_product["TotalPrice"].moving_sum(
    tp.duration.days(7)
)
weekly_sales_per_product.plot()
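
Conceptually, indexing by StockCode splits the events into one independent sequence per product, and the moving sum is then computed within each sequence. The following pure-Python sketch illustrates that idea only. The function `trailing_sum_per_key`, the product codes, and the (t - window, t] window convention are assumptions made for illustration, not Temporian internals.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def trailing_sum_per_key(rows, window):
    """rows: iterable of (key, timestamp, value).
    Returns {key: [(timestamp, trailing sum within that key)]}."""
    groups = defaultdict(list)
    for key, t, v in rows:
        groups[key].append((t, v))

    result = {}
    for key, events in groups.items():
        events.sort()  # order each group's events by timestamp
        result[key] = [
            (t, sum(v for s, v in events if t - window < s <= t))
            for t, _ in events
        ]
    return result


# Hypothetical transactions for two products "A" and "B".
toy_rows = [
    ("A", datetime(2011, 1, 1), 10.0),
    ("B", datetime(2011, 1, 2), 3.0),
    ("A", datetime(2011, 1, 4), 5.0),
    ("B", datetime(2011, 1, 12), 1.0),
]

per_product = trailing_sum_per_key(toy_rows, timedelta(days=7))
```

Sales of product "A" never contribute to the sums of product "B", and vice versa.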

## Aggregate transactions into time series
Our dataset contains individual client transactions. To use this data with a machine learning model, it is often useful to aggregate it into time series, where the data is sampled uniformly over time. For example, we could aggregate the sales weekly, or calculate the total sales in the last week for each day.

However, aggregating transactions into time series loses information: the individual transaction timestamps and values are no longer available, because the aggregated series only represents the total sales for each time period.

Let's compute, for each day and for each product individually, the total sales over the last week.

In [None]:
daily_sampling = sales_per_product.tick(tp.duration.days(1))
weekly_sales_daily = sales_per_product["TotalPrice"].moving_sum(
    tp.duration.days(7), sampling=daily_sampling
)
weekly_sales_daily.plot()
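
The difference from the earlier moving sum is where the result is evaluated: at regular daily ticks instead of at each transaction. Here is a minimal pure-Python sketch of that idea. The helpers `regular_ticks` and `trailing_sum_at` are hypothetical, the ticks simply start at the first event, and the window is assumed to be (t - window, t]. Temporian's tick operator may place its timestamps differently.

```python
from datetime import datetime, timedelta


def regular_ticks(events, step):
    """Evenly spaced timestamps from the first event up to the last one."""
    t, last = events[0][0], events[-1][0]
    ticks = []
    while t <= last:
        ticks.append(t)
        t += step
    return ticks


def trailing_sum_at(events, ticks, window):
    """Evaluate the trailing sum at each tick rather than at each event."""
    return [
        (t, sum(v for s, v in events if t - window < s <= t))
        for t in ticks
    ]


# Hypothetical transactions for one product, sorted by timestamp.
toy_events = [
    (datetime(2011, 1, 1, 0), 10.0),
    (datetime(2011, 1, 2, 12), 5.0),
    (datetime(2011, 1, 4, 0), 1.0),
]

toy_ticks = regular_ticks(toy_events, timedelta(days=1))
toy_series = trailing_sum_at(toy_events, toy_ticks, timedelta(days=2))
```

Three irregular transactions become four uniformly spaced values, which is the shape a tabular model expects.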

In [None]:
tp.to_pandas(weekly_sales_daily)

## Train a forecasting model with TensorFlow

A key application of Temporian is to clean data and perform feature engineering for machine learning models. It is well suited for forecasting, anomaly detection, fraud detection, and other tasks where data comes continuously.

In this example, we show how to train a TensorFlow model to predict the next day's sales from past sales, for each product individually. We will feed the model several levels of sales aggregation as well as calendar information.

Let's first augment our dataset and convert it to a dataset compatible with a tabular ML model.

In [None]:
sales_per_product = sales.add_index("StockCode")
daily_sampling = sales_per_product.tick(tp.duration.days(1))

Compute moving sums with various window lengths.
Machine learning models are able to select the ones that matter.

In [None]:
features = [
    sales_per_product["TotalPrice"]
    .moving_sum(tp.duration.days(w), sampling=daily_sampling)
    .rename(f"moving_sum_{w}")
    for w in [3, 7, 14, 28]
]

Calendar information, such as the day of the week, is highly informative about human activity.

In [None]:
features.append(daily_sampling.calendar_day_of_week())
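
As a rough illustration of what a day-of-week feature looks like, the sketch below derives it with Python's standard library, where Monday is 0 and Sunday is 6. The sample dates are hypothetical, and whether calendar_day_of_week uses the same numbering is an assumption to verify in the Temporian documentation.

```python
from datetime import datetime

# Hypothetical daily timestamps.
toy_days = [
    datetime(2011, 1, 1),  # a Saturday
    datetime(2011, 1, 2),  # a Sunday
    datetime(2011, 1, 3),  # a Monday
]

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
toy_day_of_week = [d.weekday() for d in toy_days]
```

Encoded this way, a model can learn, for example, that weekend days behave differently from weekdays.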

The label is the next day's sales: the daily sales, shifted (leaked) one day back in time so that each day carries the following day's total.

In [None]:
label = (
    sales_per_product["TotalPrice"]
    .leak(tp.duration.days(1))
    .moving_sum(tp.duration.days(1), sampling=daily_sampling)
    .rename("label")
)
dataset = tp.glue(*features, label)
dataset
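
The reason leaking produces a "next day" label can be checked on toy data. In this pure-Python sketch, `leak` is a hypothetical stand-in that subtracts a duration from every timestamp, and the trailing window is assumed to be (t - window, t]. Under those assumptions, the one-day sum of the leaked events at tick t equals the sales that actually happen in (t, t + 1 day].

```python
from datetime import datetime, timedelta

ONE_DAY = timedelta(days=1)


def leak(events, duration):
    """Move every event earlier in time by `duration` (hypothetical helper)."""
    return [(t - duration, v) for t, v in events]


def trailing_sum_at(events, ticks, window):
    """Sum of values with timestamp in (t - window, t], for each tick t."""
    return [sum(v for s, v in events if t - window < s <= t) for t in ticks]


# Hypothetical sales at noon on three consecutive days.
toy_events = [
    (datetime(2011, 1, 1, 12), 10.0),
    (datetime(2011, 1, 2, 12), 5.0),
    (datetime(2011, 1, 3, 12), 2.0),
]
toy_ticks = [datetime(2011, 1, d) for d in (1, 2, 3)]  # midnight ticks

past_day = trailing_sum_at(toy_events, toy_ticks, ONE_DAY)
next_day = trailing_sum_at(leak(toy_events, ONE_DAY), toy_ticks, ONE_DAY)
```

Here `next_day` at each tick matches `past_day` at the following tick, which is why the leaked signal must only ever be used as a label and never as an input feature.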

We can then convert the dataset from an EventSet to a TensorFlow Dataset, and train a Random Forest model.

In [None]:
import tensorflow_decision_forests as tfdf


def extract_label(example):
    example.pop("timestamp")
    label = example.pop("label")
    return example, label


tf_dataset = tp.to_tensorflow_dataset(dataset).map(extract_label).batch(100)

In [None]:
model = tfdf.keras.RandomForestModel(task=tfdf.keras.Task.REGRESSION, verbose=2)
model.fit(tf_dataset)

In [None]:
model.summary()

In [None]:
tfdf.model_plotter.plot_model_in_colab(model, tree_idx=0, max_depth=2)

In [None]:
tfdf.model_plotter.plot_model(model, tree_idx=0, max_depth=2)