# Lab04: Timeseries Data
![Time Series](https://uploads-ssl.webflow.com/5ec4696a9b6d337d51632638/6033e511c460742564ad33f7_63C156C6-39FD-4AB8-A947-0CA2F2B58180-p-800.png)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Time Series Data Structures
Pandas offers additional data structures for working with dates and times. A single point in time is represented as a ``Timestamp``.

In [None]:
pd.to_datetime('2019-12-03 1:35pm')

In [None]:
pd.to_datetime('7/8/1952')

The above date is interpreted as ``month/day/year``. The order can be changed by setting the ``dayfirst`` parameter.

In [None]:
pd.to_datetime('7/8/1952', dayfirst=True)
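
As a minimal sketch of what ``dayfirst`` changes, the same ambiguous string can be parsed both ways and the resulting fields compared. The variable names are only illustrative.

```python
import pandas as pd

# Parse the same ambiguous string with both conventions.
month_first = pd.to_datetime('7/8/1952')               # read as month/day/year
day_first = pd.to_datetime('7/8/1952', dayfirst=True)  # read as day/month/year

print(month_first.month, month_first.day)
print(day_first.month, day_first.day)
```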

If we supply a list or array of strings as input to ``to_datetime()``, a sequence of date/time values in a ``DatetimeIndex`` object is returned. This is the core data structure that powers much of pandas' time series functionality.

In [None]:
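# The strings use different formats; pandas 2.0+ needs format='mixed' to parse each one individually.
pd.to_datetime(['2018-01-05', '7/8/1952', 'Oct 10, 1995'], format='mixed')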

If all strings share the same date/time format, we can specify it explicitly with the ``format`` parameter. This can significantly speed up parsing for very large datasets.

In [None]:
pd.to_datetime(['2/25/10', '8/6/17', '12/15/12'], format='%m/%d/%y')

In [None]:
pd.to_datetime(['2/25/2010', '8/6/2017', '12/15/2012'], format='%m/%d/%Y')
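
An explicit ``format`` is also strict: entries that do not match it raise an error by default. The following sketch, using a made-up list with one bad entry, shows how ``errors='coerce'`` turns such entries into ``NaT`` instead, which can be handy when inspecting messy data.

```python
import pandas as pd

raw = ['2/25/10', '8/6/17', 'not a date']  # made-up input with one bad entry

# With errors='coerce', entries that do not match the format become NaT
# instead of raising an exception.
parsed = pd.to_datetime(raw, format='%m/%d/%y', errors='coerce')

print(parsed)
print(parsed.isna().sum(), "value(s) could not be parsed")
```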

## Creating a TimeSeries DataFrame
We use daily Open Power System Data for Germany, which includes electricity consumption, wind power production, and solar power production for 2006-2017, all in GWh. The data is from [this tutorial](https://github.com/Open-Power-System-Data/time_series).

First, read the data into `df` and display the head. The data can be found [here](https://raw.githubusercontent.com/ADSLab-Salzburg/DataAnalysiswithPython/main/data/opsd_germany_daily.csv).

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Ignore this cell - this is for automatic testing.

Describe the data frame.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Show the datatypes used in each column. Which datatype is not well suited to our use case?

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

As you can see, the Date column is of type _object_. We want it as a _datetime64_ column instead. To convert it, have a look at the [pandas.to_datetime](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html) function. Print the resulting datatypes.
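
The general pattern can be sketched on a tiny stand-in frame. The frame ``toy`` and its values are made up for illustration and are not the lab data.

```python
import pandas as pd

# Hypothetical toy frame with made-up values; dates start out as plain strings.
toy = pd.DataFrame({'Date': ['2006-01-01', '2006-01-02'],
                    'Consumption': [1000.0, 1100.0]})
print(toy.dtypes)

# Overwrite the string column with its parsed datetime version.
toy['Date'] = pd.to_datetime(toy['Date'])
print(toy.dtypes)
```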

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Ignore this cell - this is for automatic testing.

In the following cell, one of the dates is stored in `one_day`. You can easily extract attributes such as year, month, day, or hour from this object. Our data contains no time of day, so the time defaults to 00:00:00. Play around a bit.

In [None]:
one_day = df.iloc[0, 0]
print(one_day)
print(one_day.year)
print(one_day.month)
print(one_day.day)

### Use Date as Index
Instead of using arbitrary integer values as the index, use the date. The dates are unique anyway. Print the head.
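
A minimal sketch of the idea on a made-up frame (``toy`` is hypothetical): once a datetime column becomes the index, pandas uses a ``DatetimeIndex``, and uniqueness can be checked directly.

```python
import pandas as pd

# Hypothetical toy frame with made-up values.
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2006-01-01', '2006-01-02', '2006-01-03']),
    'Consumption': [1000.0, 1100.0, 1200.0],
})

# Move the datetime column into the index.
toy = toy.set_index('Date')

print(type(toy.index).__name__)
print(toy.index.is_unique)
print(toy.head())
```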

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Ignore this cell - this is for automatic testing.

### Adding New Columns
Add new columns named _Year_, _Month_ (name, Jan-Dec), and _Weekday_ (name, Mon-Sun) and fill them with the corresponding values from the index. For further functionality of the ``DatetimeIndex`` class, have a look at the [API docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html).

Print the head.
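
One possible way to derive such columns is sketched here on a made-up frame. Shortening the names to three letters is an assumption about the desired labels, not a requirement of the lab.

```python
import pandas as pd

idx = pd.date_range('2006-01-01', periods=3, freq='D')
toy = pd.DataFrame({'Consumption': [1000.0, 1100.0, 1200.0]}, index=idx)  # made-up values

# A DatetimeIndex exposes its date parts as attributes and methods.
toy['Year'] = toy.index.year
toy['Month'] = toy.index.month_name().str[:3]  # full English name, cut to 3 letters
toy['Weekday'] = toy.index.day_name().str[:3]  # full English name, cut to 3 letters

print(toy)
```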

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Ignore this cell - this is for automatic testing.

### Time-based Indexing
To select a specific date, just use `loc` with the date as a string. To return a whole month, omit the day from the string. Play around a bit.

In [None]:
df.loc['2014-12-11']

In [None]:
df.loc['2012-03']
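
Partial strings also work as slice bounds, and with a ``DatetimeIndex`` both ends of the slice are included. A small sketch on a made-up daily frame (``toy`` is hypothetical):

```python
import pandas as pd

idx = pd.date_range('2012-01-01', '2012-12-31', freq='D')
toy = pd.DataFrame({'Consumption': range(len(idx))}, index=idx)  # made-up values

one_month = toy.loc['2012-03']                 # partial string: all of March
few_days = toy.loc['2012-03-10':'2012-03-12']  # slice: both end dates included

print(len(one_month), len(few_days))
```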

## Let's plot something!
In the following, you can use the pandas plotting functionality (which makes it easier) and refine the plots using matplotlib.

First, plot the `Consumption` column. Use `figsize=(11,5)` and `linewidth=.5`. Add axis labels and a title.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Hmm... We do not see much here. Let's try a different styling. Can we see a pattern (a difference between weekdays and weekends)?

Plot the same as before, but with a different `marker` and `linestyle='None'` (the string; passing Python's `None` keeps the default solid line). See [the docs](https://matplotlib.org/3.3.3/api/_as_gen/matplotlib.pyplot.plot.html).
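
The marker-only style can be sketched on a synthetic series. The values are made up so that weekends sit lower than weekdays. Note that matplotlib expects the string ``'None'`` to suppress the connecting line.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2016-01-01', periods=60, freq='D')
# Made-up series: lower values on weekends, purely for illustration.
values = np.where(idx.dayofweek >= 5, 1000.0, 1400.0)
toy = pd.Series(values, index=idx, name='Consumption')

# Markers only: the string 'None' switches the connecting line off.
ax = toy.plot(marker='.', linestyle='None', figsize=(11, 5))
ax.set_xlabel('Date')
ax.set_ylabel('Consumption (GWh)')
ax.set_title('Toy consumption, markers only')
```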

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Now add two additional subplots for `Solar` and `Wind`. Again, use pandas to plot the data (same parameters as before). Adjust the `figsize` so that the data is clearly visible. For subplots with pandas, have a look [in the docs](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.plot.html).
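
A sketch of how ``subplots=True`` behaves, using random stand-in data rather than the lab data. The axis label text is only an example.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2016-01-01', periods=90, freq='D')
rng = np.random.default_rng(0)
# Random stand-in values, not real measurements.
toy = pd.DataFrame(rng.random((90, 3)) * 100,
                   index=idx, columns=['Consumption', 'Solar', 'Wind'])

# subplots=True draws one Axes per column and returns them as an array.
axes = toy.plot(marker='.', linestyle='None', alpha=0.5,
                figsize=(11, 9), subplots=True)
for ax in axes:
    ax.set_ylabel('Daily total (GWh)')
axes[-1].set_xlabel('Date')
```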

Don't forget about axis labeling!

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Investigating Patterns
... by looking at slices of the data or grouping information ...

Plot the `Consumption` for a single year of your choice.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Let's try to group the data (e.g. monthly). You know an appropriate visualization for that!
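
Before plotting, it can help to look at the grouped numbers themselves. The sketch below groups a synthetic daily series by calendar month and computes a per-month summary. The data and the choice of the median are assumptions for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2016-01-01', '2016-12-31', freq='D')
rng = np.random.default_rng(1)
# Synthetic values, not real measurements.
toy = pd.DataFrame({'Consumption': rng.normal(1300, 100, len(idx))}, index=idx)
toy['Month'] = toy.index.month

# One group per calendar month; each group holds that month's daily values.
monthly_median = toy.groupby('Month')['Consumption'].median()
print(monthly_median)
```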

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Comparing Consumption and Production
### Resampling the Data
For further analysis, a daily view may not be optimal. Let's resample our data to a monthly view (the sum of the data in each month).

Create a new data frame with the columns `'Consumption', 'Wind', 'Solar', 'Wind+Solar'`.
Use [resample](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html) to bring that data to a monthly frequency, then use [sum](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.resample.Resampler.sum.html?highlight=sum#pandas.core.resample.Resampler.sum) on the resampled data to compute the monthly totals.

Print the tail.
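
The mechanics can be sketched with constant made-up values, which make the monthly sums easy to verify by hand. The sketch uses the month-start alias ``'MS'``. Month-end labels are also possible, but their alias is spelled differently across pandas versions (``'M'`` in older releases, ``'ME'`` in newer ones).

```python
import pandas as pd

idx = pd.date_range('2017-01-01', '2017-03-31', freq='D')
# Constant made-up values so that each monthly sum is easy to check.
toy = pd.DataFrame({'Wind': 2.0, 'Solar': 1.0}, index=idx)

# 'MS' labels each monthly bin with the first day of the month.
monthly = toy.resample('MS').sum()
print(monthly)
```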

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Plot `'Consumption', 'Wind', 'Solar'` in one plot. Use different colors and different styles. Use a legend and don't forget about axis labeling.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Use Your Own Data
Go to [Wiki Pageviews](https://tools.wmflabs.org/pageviews/) and download a dataset of the pageviews of a Wikipedia page of your choice. 
- Enter the page name(s).
- Click into the Dates field and choose "All time".
- Download the .csv file.

Next, load the data into a pandas ``DataFrame``:
- Change the index to a ``DatetimeIndex``.
- Inspect the dataset (are there missing values?).
- Add additional columns (Weekday, Month).
- Try to find patterns:
  - selecting just one year/month
  - grouping monthly/weekly

Finally, for the submission, show at least one plot with a weekly aggregate of numbers.
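
A weekly aggregate can be computed with ``resample`` as well. The sketch below uses a made-up series named ``views``; the column names in a downloaded file may differ.

```python
import pandas as pd

idx = pd.date_range('2021-01-04', periods=28, freq='D')  # starts on a Monday
views = pd.Series(range(28), index=idx, name='views')    # hypothetical counts

# 'W' bins end on Sunday by default, so each bin here covers Monday to Sunday.
weekly = views.resample('W').sum()
print(weekly)
```

Calling ``weekly.plot()`` on the result would then draw one point per week.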

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### Inspiration/Further Reading
- [Time Series Analysis with Pandas (Tutorial)](https://www.dataquest.io/blog/tutorial-time-series-analysis-with-pandas/)
- [Working with Time Series in Python (Tutorial)](https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html)