# Data Vis: Plotting Time Series Data
* Notebook 1: Smoothing and Trends

## Setup

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

# Data

In this notebook, we will use a private dataset about (solar) power generation and consumption of a single-family house in Germany. The dataset contains the following columns:
- `timestamp`: The date and time of the measurement. The data is recorded hourly.
- `total_consumption_kw`: The amount of power consumed per hour in kilowatts.
- `from_grid_kw`: The amount of power provided from the grid per hour in kilowatts.
- `from_pv_kw`: The amount of power generated by the solar panels per hour in kilowatts.
- `from_battery_kw`: The amount of power provided by the battery per hour in kilowatts.
- `to_grid_kw`: The amount of power provided to the grid per hour in kilowatts.
- `to_battery_kw`: The amount of power provided to the battery per hour in kilowatts.
- `battery_percent`: The average percentage of battery charge at the time of measurement.
- `battery_kwh`: The average amount of energy stored in the battery at the time of measurement in kilowatt hours.
- various weather data, including temperature, humidity, precipitation, wind speed, and solar radiation (ghi, dni, dhi).

In [None]:
data = pd.read_csv("solar.csv")

In [None]:
data["timestamp"] = pd.to_datetime(data["timestamp"])

In [None]:
data["year"] = data["timestamp"].dt.year
data["month"] = data["timestamp"].dt.month_name()
data["day"] = data["timestamp"].dt.day
data["hour"] = data["timestamp"].dt.hour
data["weekday"] = data["timestamp"].dt.day_name()
data["is_weekend"] = np.where(data["weekday"].isin(["Saturday", "Sunday"]), 1, 0)   

In [None]:
data.set_index("timestamp", inplace=True)

In [None]:
data.head()
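
An integer rolling `window` counts rows, not hours, so the smoothing below implicitly assumes that no hourly timestamp is missing. The following is a minimal sketch of how one might check this; it uses a small made-up index (`toy` and its values are illustrative, not part of the solar dataset), and the same idea should carry over to `data.index`.

```python
import pandas as pd

# Toy hourly index with one timestamp (03:00) removed, made up for illustration
idx = pd.date_range("2024-07-01", periods=6, freq="h").delete(3)
toy = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Differences between consecutive timestamps; anything other than 1 hour is a gap
steps = toy.index.to_series().diff().dropna()
gaps = steps[steps != pd.Timedelta(hours=1)]
print(gaps)
```

An empty `gaps` series would suggest that the index is regularly spaced.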

# Simple Moving Average

Let's first calculate a simple moving average (SMA) for the `from_pv_kw` column. The SMA is calculated by taking the average of the last `n` values in the series. Because the data is hourly, a `window` of 3 covers three hours. We will use different `window` sizes and illustrate the effects.

In [None]:
data["from_pv_kw_rolling_mean_3"] = data["from_pv_kw"].rolling(window=3).mean()
data["from_pv_kw_rolling_mean_6"] = data["from_pv_kw"].rolling(window=6).mean()
data["from_pv_kw_rolling_mean_12"] = data["from_pv_kw"].rolling(window=12).mean()

In [None]:
data.head(20)
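
The first rows of the rolling columns are `NaN` because a window of size `n` needs `n` observations before it can produce a value. The sketch below illustrates this on a small made-up series (not the solar data) and shows two optional parameters of `rolling`, `min_periods` and `center`, that change this behaviour. Whether they are appropriate depends on the analysis.

```python
import pandas as pd

# Toy series, made up for illustration only
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Trailing window: mean of the current and the two previous values
trailing = s.rolling(window=3).mean()

# min_periods=1 averages whatever is available, so there is no leading NaN
partial = s.rolling(window=3, min_periods=1).mean()

# center=True places the window around each value, which removes the lag
centered = s.rolling(window=3, center=True).mean()

print(pd.DataFrame({"value": s, "trailing": trailing,
                    "partial": partial, "centered": centered}))
```

A trailing window lags behind the raw series, while a centered window does not, at the cost of using future values.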

In [None]:
data_1week = data.loc["2024-07-01":"2024-07-07"]

plt.figure(figsize=(12, 6))
sns.lineplot(data=data_1week, x=data_1week.index, y="from_pv_kw", color="black", label="Hourly values")
sns.lineplot(data=data_1week, x=data_1week.index, y="from_pv_kw_rolling_mean_3", color="red", label="3-hour rolling mean")
sns.lineplot(data=data_1week, x=data_1week.index, y="from_pv_kw_rolling_mean_6", color="green", label="6-hour rolling mean")
sns.lineplot(data=data_1week, x=data_1week.index, y="from_pv_kw_rolling_mean_12", color="blue", label="12-hour rolling mean")
plt.title("Solar Power Generation Over Time")
plt.xlabel("Date")
plt.ylabel("Power Output (kW)")
plt.show()

# Trends with Functional Form

We can use `regplot` to add a linear or polynomial regression line to a line chart. Unfortunately, `regplot` does not support time indexes. So, we have to reset the index and use observation numbers on the x-axis.

In [None]:
# plot from_pv_kw over time with linear, quadratic, and cubic trend lines
plt.figure(figsize=(12, 6))
sns.lineplot(data=data, x=data.reset_index().index, y="from_pv_kw", color="black", alpha=0.2)
sns.regplot(data=data, x=data.reset_index().index, y="from_pv_kw", scatter=False, color="red", label="Linear Trend")
sns.regplot(data=data, x=data.reset_index().index, y="from_pv_kw", order=2, scatter=False, color="blue", label="Quadratic Trend")
sns.regplot(data=data, x=data.reset_index().index, y="from_pv_kw", order=3, scatter=False, color="green", label="Cubic Trend")
plt.title("Solar Power Generation Over Time with Polynomial Trends")
plt.xlabel("Observation Number (Hours Since Start)")
plt.ylabel("Power Output (kW)")
plt.legend()
plt.show()
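
If dates on the x-axis are preferred, one possible alternative to `regplot` is to fit the polynomial yourself on the observation numbers and store the fitted values as a column, which can then be plotted against the time index. The following is a minimal sketch using `numpy.polyfit` on synthetic data; the frame `toy`, the column names, and the simulated values are assumptions made for illustration, not part of the solar dataset.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series, made up for illustration (not the solar data)
idx = pd.date_range("2024-01-01", periods=200, freq="h")
t = np.arange(len(idx))  # observation numbers 0, 1, 2, ...
rng = np.random.default_rng(0)
toy = pd.DataFrame({"value": 0.05 * t + rng.normal(0, 1, len(idx))}, index=idx)

# Fit polynomials of degree 1 and 2 on the observation numbers
# and evaluate them at every observation
for degree in (1, 2):
    coefs = np.polyfit(t, toy["value"], deg=degree)
    toy[f"trend_{degree}"] = np.polyval(coefs, t)

print(toy.head())
```

The `trend_1` and `trend_2` columns share the time index with the raw values, so they could be drawn with `sns.lineplot` in the same way as the rolling means above. Unlike `regplot`, this sketch does not produce confidence bands.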

# Your Turn

Now it's your turn to play with the data. You can use other columns of the dataset and try different window sizes for the SMA. You can also try different functional forms for the trend line, such as polynomials of a different order.

In [None]:
# YOUR CODE HERE