# Time Series Forecasting with BigFrames

This notebook demonstrates time series forecasting using BigFrames with TimesFM and ARIMAPlus models on San Francisco bikeshare data.

In [None]:
import bigframes.pandas as bpd
bpd.options.display.repr_mode = "anywidget"

In [None]:
# Load bikeshare data, filtering for subscriber trips from 2018 onwards.
df = bpd.read_gbq("bigquery-public-data.san_francisco_bikeshare.bikeshare_trips")
df = df[df["start_date"] >= "2018-01-01"]
df = df[df["subscriber_type"] == "Subscriber"]

# Aggregate trips by hour.
df["trip_hour"] = df["start_date"].dt.floor("h")
df_grouped = df[["trip_hour", "trip_id"]].groupby("trip_hour").count().reset_index()
df_grouped = df_grouped.rename(columns={"trip_id": "num_trips"})
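
To make the hourly aggregation concrete without a BigQuery connection, here is a minimal plain-Python sketch of the same idea: floor each timestamp to the hour, then count rows per bucket. The timestamps are hypothetical and are not taken from the bikeshare table.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip start times (illustrative only).
start_times = [
    datetime(2018, 1, 1, 8, 5),
    datetime(2018, 1, 1, 8, 40),
    datetime(2018, 1, 1, 9, 15),
    datetime(2018, 1, 1, 11, 59),
]


def floor_to_hour(ts):
    """Drop minutes, seconds, and microseconds, similar to .dt.floor("h")."""
    return ts.replace(minute=0, second=0, microsecond=0)


# Count trips per hour bucket and sort the buckets chronologically.
counts = Counter(floor_to_hour(ts) for ts in start_times)
hourly = sorted(counts.items())

for trip_hour, num_trips in hourly:
    print(trip_hour.isoformat(), num_trips)
```

In this sketch the 10:00 hour produces no row at all, because a count over groups only reports groups that exist. If the real data contains such gaps, the last 168 rows of the grouped table would span more than one calendar week.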

## Forecasting with TimesFM

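Use TimesFM to forecast the hourly number of bikeshare trips for the last week of the dataset. The model is given every row except the final 168 hours, which are held out so the forecast can be compared with the actual counts.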

In [None]:
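# Forecast the last 168 hours (one week). 2842 is assumed to be the total
# number of hourly rows in df_grouped, so head() keeps everything before that week.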
result = df_grouped.head(2842-168).ai.forecast(
    timestamp_column="trip_hour",
    data_column="num_trips",
    horizon=168
)
result
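
The holdout pattern used above (train on everything except the last `horizon` points, then forecast that many steps) can be sketched in plain Python. This is only an illustration under stated assumptions: the series values are hypothetical, `HORIZON = 3` stands in for the notebook's 168, and a naive "repeat the last value" forecast stands in for TimesFM.

```python
HORIZON = 3  # stands in for the notebook's 168-hour horizon

# Hypothetical hourly trip counts, already in chronological order.
series = [12, 15, 14, 18, 20, 19, 22, 25]

# Split: everything except the last HORIZON points is training data.
train = series[: len(series) - HORIZON]
holdout = series[-HORIZON:]

# Placeholder model (NOT TimesFM): repeat the last training value.
forecast = [train[-1]] * HORIZON

# Mean absolute error of the forecast against the held-out actuals.
mae = sum(abs(a - f) for a, f in zip(holdout, forecast)) / HORIZON
print(train, holdout, forecast, round(mae, 3))
```

Because the holdout values never reach the model, the error measured on them is an honest estimate of how the forecast would behave on unseen data.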

## Forecasting with ARIMAPlus

Forecast the same period using the ARIMAPlus model.

In [None]:
from bigframes.ml import forecasting

# Create and configure an ARIMAPlus model for hourly data.
model = forecasting.ARIMAPlus(
    auto_arima_max_order=5,  # Reduce runtime for large datasets
    data_frequency="hourly",
    horizon=168
)

# Use the same training data as the TimesFM model.
train = df_grouped.head(2842-168)
X = train[["trip_hour"]]
y = train[["num_trips"]]

model.fit(X, y)
predictions = model.predict(horizon=168, confidence_level=0.95)
predictions
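
As a rough intuition for what `confidence_level=0.95` means, the sketch below builds a symmetric interval around a point forecast from a standard error, using a normal approximation. This is an assumption made for illustration; the notebook does not state how ARIMAPlus derives its bounds, and the helper `prediction_interval` and the input numbers are hypothetical.

```python
from statistics import NormalDist


def prediction_interval(forecast_value, standard_error, confidence_level=0.95):
    """Symmetric interval under a normal approximation (illustrative only)."""
    # Two-sided interval: put (1 - confidence_level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    margin = z * standard_error
    return forecast_value - margin, forecast_value + margin


# Hypothetical forecast of 100 trips with a standard error of 10.
lower, upper = prediction_interval(100.0, 10.0, confidence_level=0.95)
print(round(lower, 2), round(upper, 2))
```

A higher confidence level widens the interval, and a lower one narrows it.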


## Multiple Time Series Forecasting

Use ARIMAPlus to forecast multiple time series simultaneously. The `id_col` parameter differentiates each series.

In [None]:
# Filter for specific stations to create distinct time series.
df_multi = bpd.read_gbq("bigquery-public-data.san_francisco_bikeshare.bikeshare_trips")
df_multi = df_multi[df_multi["start_station_name"].str.contains("Market|Powell|Embarcadero")]

# Group data by station and date. "num_trips" starts out as a copy of
# start_date only so that count() has a non-key column to count.
features = bpd.DataFrame({
    "start_station_name": df_multi["start_station_name"],
    "num_trips": df_multi["start_date"],
    "date": df_multi["start_date"].dt.date,
})
num_trips = features.groupby(
    ["start_station_name", "date"], as_index=False
).count()

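# Use a separate model here: the one above is configured for hourly data,
# while this series is daily. data_frequency is left at its default.
multi_model = forecasting.ARIMAPlus(auto_arima_max_order=5)

# Fit the model, identifying each series by 'start_station_name'.
multi_model.fit(
    num_trips[["date"]],
    num_trips[["num_trips"]],
    id_col=num_trips[["start_station_name"]]
)
multi_model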
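
Conceptually, an identifier column splits one long table into independent series, one per identifier value. The plain-Python sketch below shows that partitioning on hypothetical rows. The station names and counts are made up, and a per-series mean stands in for a real forecasting model.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (station, date, num_trips) rows in arbitrary order.
rows = [
    ("Market St", date(2018, 1, 1), 40),
    ("Powell St", date(2018, 1, 1), 25),
    ("Market St", date(2018, 1, 2), 42),
    ("Powell St", date(2018, 1, 2), 27),
    ("Market St", date(2018, 1, 3), 47),
]

# Partition the rows into one series per station identifier.
series_by_id = defaultdict(list)
for station, day, num_trips in rows:
    series_by_id[station].append((day, num_trips))

# Each series is ordered by date and handled independently of the others.
for points in series_by_id.values():
    points.sort()

# Placeholder "model" (NOT ARIMA): the mean of each series.
placeholder_forecast = {
    station: sum(n for _, n in points) / len(points)
    for station, points in series_by_id.items()
}
print(placeholder_forecast)
```

The series can have different lengths, as they do here, since each one is treated separately.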

## Visualize Forecasting Results

Plot the TimesFM forecast results against the actual data to visually assess model performance.

In [None]:
# Prepare forecast data for plotting.
result = result.sort_values("forecast_timestamp")
result = result[["forecast_timestamp", "forecast_value"]]
result = result.rename(columns={
    "forecast_timestamp": "trip_hour",
    "forecast_value": "num_trips_forecast"
})

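# Stack actual and forecast rows, then keep the last 672 rows:
# 504 hours (3 weeks) of actuals followed by the 168 forecast rows.
df_all = bpd.concat([df_grouped, result])
df_all = df_all.tail(672)

# Plot actual vs. forecasted trips against time.
df_all.plot.line(x="trip_hour", y=["num_trips", "num_trips_forecast"])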
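
Concatenating two tables with different value columns stacks their rows and leaves the missing column empty in each half. The plain-Python sketch below mimics that behaviour on hypothetical records, with `None` playing the role of a missing value. The integer hours and counts are made up for illustration.

```python
# Hypothetical actuals and forecasts keyed by an integer hour.
actual = [
    {"trip_hour": 0, "num_trips": 10},
    {"trip_hour": 1, "num_trips": 12},
    {"trip_hour": 2, "num_trips": 11},
    {"trip_hour": 3, "num_trips": 14},
]
forecast = [
    {"trip_hour": 2, "num_trips_forecast": 11.5},
    {"trip_hour": 3, "num_trips_forecast": 13.0},
]

columns = ["trip_hour", "num_trips", "num_trips_forecast"]

# Stack the rows; a column absent from a row becomes None.
combined = [{c: row.get(c) for c in columns} for row in actual + forecast]

# Keep the last 4 rows: 2 actual rows followed by the 2 forecast rows.
tail = combined[-4:]
for row in tail:
    print(row)
```

The overlapping hours appear twice, once with an actual value and once with a forecast value, which is what lets the two lines be drawn over the same time range.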