# Data Vis: Plotting Time Series Data
* Notebook 1: Plotting Time Series

## Setup

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

# Data

In this notebook, we use a private dataset on the solar power generation and electricity consumption of a single-family house in Germany. The dataset contains the following columns:
- `timestamp`: The date and time of the measurement. Measurements are recorded hourly.
- `total_consumption_kw`: The power consumed by the house during each hour, in kilowatts (kW).
- `from_grid_kw`: The power drawn from the grid during each hour, in kW.
- `from_pv_kw`: The power generated by the solar (PV) panels during each hour, in kW.
- `from_battery_kw`: The power supplied by the battery during each hour, in kW.
- `to_grid_kw`: The power fed into the grid during each hour, in kW.
- `to_battery_kw`: The power used to charge the battery during each hour, in kW.
- `battery_percent`: The average battery state of charge at the time of measurement, in percent.
- `battery_kwh`: The average amount of energy stored in the battery at the time of measurement, in kilowatt-hours (kWh).
- Various weather variables, including temperature, humidity, precipitation, wind speed, and solar radiation (global horizontal irradiance GHI, direct normal irradiance DNI, and diffuse horizontal irradiance DHI).
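Because `solar.csv` is private, readers without it may want a stand-in to run the cells below. The following is a minimal sketch of a hypothetical synthetic dataset with the same hourly `timestamp` structure and two of the documented columns (`total_consumption_kw`, `from_pv_kw`). The values are random and the daylight curve is a crude assumption; they are not real measurements.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the private solar.csv: random hourly values,
# NOT real measurements, and only two of the documented data columns.
rng = np.random.default_rng(seed=0)
timestamps = pd.date_range("2024-01-01 00:00", "2024-12-31 23:00", freq="h")
hour = timestamps.hour.to_numpy()

# Crude daylight curve: positive between 06:00 and 18:00, clipped to 0 at night
daylight = np.clip(np.sin((hour - 6) / 12 * np.pi), 0, None)

synthetic = pd.DataFrame({
    "timestamp": timestamps,
    "total_consumption_kw": rng.uniform(0.2, 1.5, len(timestamps)),
    "from_pv_kw": 5.0 * daylight * rng.uniform(0.3, 1.0, len(timestamps)),
})
print(synthetic.head())
```

To use it, save it under a separate name, e.g. `synthetic.to_csv("solar_synthetic.csv", index=False)`, and read that file instead of `solar.csv`.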

In [None]:
data = pd.read_csv("solar.csv")

In [None]:
data["timestamp"] = pd.to_datetime(data["timestamp"])

In [None]:
data["year"] = data["timestamp"].dt.year
data["month"] = data["timestamp"].dt.month_name()
data["day"] = data["timestamp"].dt.day
data["hour"] = data["timestamp"].dt.hour
data["weekday"] = data["timestamp"].dt.day_name()
data["is_weekend"] = np.where(data["weekday"].isin(["Saturday", "Sunday"]), 1, 0)   
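The cell above derives `is_weekend` by matching English day names. A sketch of an alternative that does not depend on the name strings: pandas' `dt.dayofweek` numbers the days from Monday=0 to Sunday=6, so weekend days are those with a value of 5 or more. The timestamps below are hand-picked examples (1 July 2024 was a Monday).

```python
import pandas as pd

# Hand-picked example timestamps around a weekend in July 2024
ts = pd.Series(pd.to_datetime([
    "2024-07-05 12:00",  # Friday
    "2024-07-06 08:00",  # Saturday
    "2024-07-07 18:00",  # Sunday
    "2024-07-08 09:00",  # Monday
]))

features = pd.DataFrame({
    "weekday": ts.dt.day_name(),
    "hour": ts.dt.hour,
    # dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend
    "is_weekend": (ts.dt.dayofweek >= 5).astype(int),
})
print(features)
```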

In [None]:
data.set_index("timestamp", inplace=True)

In [None]:
data.head()

# Line Charts

Line charts are a great way to visualize time series data. They allow us to see trends and patterns over time.

In [None]:
plt.figure(figsize=(12, 6))
sns.lineplot(data=data, x=data.index, y="from_pv_kw")
plt.title("Solar Power Generation Over Time")
plt.xlabel("Date")
plt.ylabel("Power Output (kW)")
plt.show()

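The hourly frequency of the data makes the plot very noisy. To reduce this noise, we can resample the data to a daily frequency by summing the hourly values of each day. If each hourly value is the average power over that hour, the daily sum equals the energy per day in kWh. This gives us a clearer picture of the long-term trends in the data.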
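A minimal sketch of what daily resampling computes, assuming each hourly value is the average power (kW) over that hour. The series is hypothetical: a constant 0.5 kW for 48 hours. `sum()` turns 24 hourly values into a daily energy total, while `mean()` gives the average power over the day.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series: a constant 0.5 kW for two full days
index = pd.date_range("2024-07-01", periods=48, freq="h")
hourly = pd.DataFrame({"from_pv_kw": np.full(48, 0.5)}, index=index)

# Each value is mean power over 1 h, so value * 1 h = energy in kWh;
# summing the 24 values of a day gives the daily energy in kWh
daily_kwh = hourly.resample("D").sum()

# mean() instead gives the average power over the day, still in kW
daily_mean_kw = hourly.resample("D").mean()

print(daily_kwh)
print(daily_mean_kw)
```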

In [None]:
data_daily = data.resample("D").sum(numeric_only=True)  # daily totals of the numeric columns

plt.figure(figsize=(12, 6))
sns.lineplot(data=data_daily, x=data_daily.index, y="from_pv_kw")
plt.title("Daily Solar Energy Generation")
plt.xlabel("Date")
plt.ylabel("Energy (kWh)")
plt.show()

The daily patterns become more visible when we focus on one week of data.
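Slicing a `DatetimeIndex` with date strings via `.loc` includes the entire end date, not just its first hour. The sketch below uses a hypothetical month of hourly timestamps to show that `"2024-07-01":"2024-07-07"` selects 7 × 24 = 168 rows, ending at 23:00 on 7 July.

```python
import pandas as pd

# Hypothetical hourly index covering all of July 2024
index = pd.date_range("2024-07-01 00:00", "2024-07-31 23:00", freq="h")
july = pd.DataFrame({"from_pv_kw": 0.0}, index=index)

# Partial-string slicing: "2024-07-07" as the end bound matches every
# hour of 7 July, from 00:00 through 23:00
week = july.loc["2024-07-01":"2024-07-07"]
print(len(week), week.index.min(), week.index.max())
```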

In [None]:
data_1week = data.loc["2024-07-01":"2024-07-07"]  # all hours from 1 July 00:00 to 7 July 23:00

plt.figure(figsize=(12, 6))
sns.lineplot(data=data_1week, x=data_1week.index, y="from_pv_kw")
plt.title("Solar Power Generation, 1 to 7 July 2024")
plt.xlabel("Date")
plt.ylabel("Power Output (kW)")
plt.show()

Now it's your turn! Create line charts for other time series in the dataset. You can use the same approach as above to resample the data to a daily frequency and plot the data.

In [None]:
# YOUR CODE HERE

# Seasonal Plot

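A seasonal plot overlays several periods (for example, months) on a shared axis that spans one seasonal cycle (for example, the days of the week). This lets us compare how the pattern within the cycle changes from one period to the next.

To build one, we first group the data by year, month, and weekday, compute the mean of each group, and convert the weekdays into an ordered categorical so that they appear in calendar order (Monday to Sunday) rather than alphabetically.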
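Why the ordered `Categorical` matters: `groupby` sorts string keys alphabetically, which would put Friday first on the x-axis. The sketch below groups two weeks of hypothetical hourly data (starting Monday, 1 January 2024) by month and weekday, then shows how an ordered `Categorical` restores calendar order when sorting.

```python
import pandas as pd

weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical hourly data for two full weeks of January 2024
index = pd.date_range("2024-01-01", periods=14 * 24, freq="h")
df = pd.DataFrame({"total_consumption_kw": 1.0}, index=index)
df["month"] = index.month_name()
df["weekday"] = index.day_name()

grouped = df.groupby(["month", "weekday"])["total_consumption_kw"].mean().reset_index()
# groupby sorts string keys alphabetically: Friday, Monday, Saturday, ...
alphabetical = grouped["weekday"].tolist()

# An ordered Categorical makes sorting follow calendar order instead
grouped["weekday"] = pd.Categorical(grouped["weekday"], categories=weekdays, ordered=True)
grouped = grouped.sort_values("weekday")
print(grouped["weekday"].tolist())
```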

In [None]:
data_grouped = data.groupby(["year", "month", "weekday"]).mean(numeric_only=True).reset_index()
data_grouped["weekday"] = pd.Categorical(data_grouped["weekday"], 
                                          categories=["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"], 
                                          ordered=True)

In [None]:
data_grouped

Then, we create a separate DataFrame for each month.
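The twelve filters in the next cell can also be built in one step by iterating over `groupby("month")`, which yields one `(month_name, sub_frame)` pair per month. A sketch on a small hypothetical table with the same column names as `data_grouped`:

```python
import pandas as pd

# Hypothetical grouped table with the same columns as data_grouped
grouped_example = pd.DataFrame({
    "month": ["January", "January", "February", "February"],
    "weekday": ["Monday", "Tuesday", "Monday", "Tuesday"],
    "total_consumption_kw": [1.2, 1.1, 1.0, 0.9],
})

# One entry per month, keyed by month name; equivalent to filtering
# grouped_example[grouped_example["month"] == name] for every month
by_month = {name: frame for name, frame in grouped_example.groupby("month")}
print(by_month["January"])
```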

In [None]:
data_grouped_jan = data_grouped[data_grouped["month"] == "January"]
data_grouped_feb = data_grouped[data_grouped["month"] == "February"]
data_grouped_mar = data_grouped[data_grouped["month"] == "March"]
data_grouped_apr = data_grouped[data_grouped["month"] == "April"]
data_grouped_may = data_grouped[data_grouped["month"] == "May"]
data_grouped_jun = data_grouped[data_grouped["month"] == "June"]
data_grouped_jul = data_grouped[data_grouped["month"] == "July"]
data_grouped_aug = data_grouped[data_grouped["month"] == "August"]
data_grouped_sep = data_grouped[data_grouped["month"] == "September"]
data_grouped_oct = data_grouped[data_grouped["month"] == "October"]
data_grouped_nov = data_grouped[data_grouped["month"] == "November"]
data_grouped_dec = data_grouped[data_grouped["month"] == "December"]


In [None]:
data_grouped_jan

Finally, we plot one line per month with seaborn's `lineplot` function. We can see that electricity consumption is higher on Mondays in the winter months.

In [None]:
plt.figure(figsize=(12, 6))
sns.lineplot(data=data_grouped_jan, x="weekday", y="total_consumption_kw", label="January")
sns.lineplot(data=data_grouped_feb, x="weekday", y="total_consumption_kw", label="February")
sns.lineplot(data=data_grouped_mar, x="weekday", y="total_consumption_kw", label="March")
sns.lineplot(data=data_grouped_apr, x="weekday", y="total_consumption_kw", label="April")
sns.lineplot(data=data_grouped_may, x="weekday", y="total_consumption_kw", label="May")
sns.lineplot(data=data_grouped_jun, x="weekday", y="total_consumption_kw", label="June")
sns.lineplot(data=data_grouped_jul, x="weekday", y="total_consumption_kw", label="July")
sns.lineplot(data=data_grouped_aug, x="weekday", y="total_consumption_kw", label="August")
sns.lineplot(data=data_grouped_sep, x="weekday", y="total_consumption_kw", label="September")
sns.lineplot(data=data_grouped_oct, x="weekday", y="total_consumption_kw", label="October")
sns.lineplot(data=data_grouped_nov, x="weekday", y="total_consumption_kw", label="November")
sns.lineplot(data=data_grouped_dec, x="weekday", y="total_consumption_kw", label="December")
plt.title("Mean Power Consumption by Weekday and Month")
plt.xlabel("")
plt.ylabel("Mean Power Consumption (kW)")
plt.xticks(rotation=45)
plt.legend()
plt.show()
