<br>

<img src="./image/Logo/logo_elia_group.png" width = 200>

<br>

# Dealing with datetime in Pandas
<br>

The following section is a pretty important one. Much of the data you work with in the energy sector has some sort of time component. But most of the time, when you load your data, the **dates are just strings**. Python **doesn't recognize them as dates or datetimes**, so you won't be able to, for example, select a certain time range or get the data from every Friday or from a certain hour. What you have to do is convert your datetime data into datetime objects.

<br>
Overall, datetime components can be really tricky! That doesn't mean that they are impossible to work with. Just follow each example below, and if you need a break, feel free to take one &#9749;.

## Datetime as Index
<br>
In order to get started with this section, let's import pandas as well as the data set you are going to work with: 

In [None]:
import pandas as pd

In [None]:
elia_load = pd.read_csv("./data/energy/elia_load_2019_01_15.csv", sep = ";")

The data you are working with in the following might look familiar. It is the measured and upscaled load on the Elia grid from January 2019. Let's have a look: 

In [None]:
elia_load.head(n=3)

There are two ways to work with datetime data:

1. Your datetime **is an index**, in this case a DatetimeIndex
2. Your datetime **is a column**, in this case a Datetime object 

Currently, as you can see in your dataframe, your datetime is a column called `Datetime`. Let's check its data type:

In [None]:
elia_load.dtypes

As you can see, the data type of the column `Datetime` is just `object` (not a datetime type). In order to work with your datetime, you need to tell Python that the data in `Datetime` actually is a date. To do so, you can create a so-called DatetimeIndex. This DatetimeIndex can be created right when you load your data set:

1. Just use the `parse_dates` parameter and set it to `True`.
2. In addition, you need to tell pandas to treat the first column as the index. This is done with another parameter called `index_col`, which you set to `0`.

In [None]:
load_dt_index = pd.read_csv("./data/energy/elia_load_2019_01_15.csv", sep = ";", parse_dates = True, index_col = 0)

In [None]:
load_dt_index.head(n=3)

In [None]:
load_dt_index.index

Nice! Now your first column became a DatetimeIndex! &#128512; 
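
If you want to try this without the course file, here is a minimal sketch. The tiny CSV text, its values and the variable names `with_strings` and `with_index` are made up for illustration; only `sep`, `parse_dates` and `index_col` come from the cell above.

```python
import io

import pandas as pd

# Hypothetical sample in a similar layout to the course file (values are made up)
csv_text = """Datetime;Load
2019-01-15 22:00:00;10500.0
2019-01-15 22:15:00;10400.0
2019-01-15 22:30:00;10300.0
"""

# Without parse_dates, the Datetime column is read as plain strings
with_strings = pd.read_csv(io.StringIO(csv_text), sep=";")

# With parse_dates=True and index_col=0, the first column becomes a DatetimeIndex
with_index = pd.read_csv(io.StringIO(csv_text), sep=";", parse_dates=True, index_col=0)

print(with_strings["Datetime"].dtype)
print(type(with_index.index).__name__)
```

Comparing the two print statements shows the difference between a column of strings and a real DatetimeIndex.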

<br>

To simplify things, you now remove the timezone. Timezones can be tricky in Python, and as you are only working in one timezone, you don't need to worry about them right now. To drop the timezone, you use `tz_localize(None)`.

In [None]:
load_dt_index = load_dt_index.tz_localize(None)

In [None]:
load_dt_index.head(n=3)
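
To see what `tz_localize(None)` does, here is a small sketch on made-up data. The timezone `Europe/Brussels` and the load values are assumptions for illustration, not taken from the course file.

```python
import pandas as pd

# Hypothetical timezone-aware index (the timezone is an assumption)
idx = pd.date_range("2019-01-15 22:00", periods=4, freq="15min", tz="Europe/Brussels")
aware = pd.DataFrame({"Load": [10500.0, 10400.0, 10300.0, 10200.0]}, index=idx)

# tz_localize(None) drops the timezone information but keeps the local wall-clock time
naive = aware.tz_localize(None)

print(aware.index[0])
print(naive.index[0])
```

The clock time stays the same; only the UTC offset disappears from the index.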

If you are interested in a specific time range, you can select your data just by using `.loc[]`. Remember, `.loc[]` is **label based**.

In [None]:
load_night = load_dt_index.loc["2019-01-15 23:45:00":"2019-01-15 22:00:00"]

In [None]:
load_night

You can now do many many other things with this DatetimeIndex. For instance...

- You can get the data of all Fridays within your data set

In [None]:
# weekday counts from Monday = 0, so Friday = 4
friday = load_dt_index[load_dt_index.index.weekday == 4]
friday.shape

- You can get the data from a specific hour

In [None]:
night_hour = load_dt_index[load_dt_index.index.hour == 23]
night_hour.shape
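
If you want to double-check such filters, you can test them on a small made-up index where you know the answer. The data below is hypothetical: two weeks of hourly values starting on a Monday.

```python
import pandas as pd

# Hypothetical data: two weeks of hourly values, starting Monday 2019-01-14
idx = pd.date_range("2019-01-14", periods=14 * 24, freq="h")
load = pd.DataFrame({"Load": range(len(idx))}, index=idx)

# weekday: Monday = 0, ..., Friday = 4, ..., Sunday = 6
fridays = load[load.index.weekday == 4]

# hour: 0 to 23
late_hour = load[load.index.hour == 23]

print(fridays.index.day_name().unique())
print(late_hour.shape)
```

Using `day_name()` on the filtered index is a quick way to confirm that you picked the weekday you meant.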

- You can select your data by using:
    - df_name.index.year
    - df_name.index.month
    - df_name.index.isocalendar().week (in older pandas versions: df_name.index.week)
    - df_name.index.weekday
    - df_name.index.day

- You can also do resampling with DatetimeIndex: So let's get the average load over a day

In [None]:
daily_load = load_dt_index.resample('D').mean().round()
daily_load.columns = ["Elia Grid Mean Load"]
daily_load.head()

But be careful when moving to **different levels of granularity** or when aggregating your data. Ask yourself: What data am I showing now? Did the data change fundamentally? How can I indicate this to future users of the data and future readers of my code? 

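For instance, the load is measured in MW. If you aggregate with `.mean()`, the result is still a power in MW. If you aggregate with `.sum()` instead, the result is no longer a load in MW: for quarter-hourly values, the sum multiplied by 0.25 h gives the energy in MWh. This is why you always have to make sure that you rename your data to reflect such a change.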
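
To make the unit question concrete, here is a small sketch. It assumes quarter-hourly load values in MW; the numbers and the names `load_mw`, `mean_mw` and `energy_mwh` are made up.

```python
import pandas as pd

# Hypothetical quarter-hourly load in MW over two hours
idx = pd.date_range("2019-01-15 00:00", periods=8, freq="15min")
load_mw = pd.Series([100.0] * 4 + [200.0] * 4, index=idx, name="Load_MW")

# The mean over the period is still a power in MW
mean_mw = load_mw.resample("2h").mean().rename("Mean_Load_MW")

# Energy in MWh: each quarter-hourly value counts for 0.25 h
energy_mwh = (load_mw * 0.25).resample("2h").sum().rename("Energy_MWh")

print(mean_mw)
print(energy_mwh)
```

Renaming the result right after aggregating keeps the unit visible for whoever reads the data next.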

### Exercise

Now it's your turn. 

1. Update the following cell to upload the data with 
    - the column `Datetime` being a **DatetimeIndex** 
    - this DatetimeIndex being the only Index

In [None]:
physical_flow = pd.read_csv("./data/energy/physical_flow_2021_1_01.csv", sep = ";")

In [None]:
physical_flow.head(n=3)

2. Resample your data to get the `mean()` by hour
  - also, update the column name to `Physical_Flow_mean`

3. Remove the timezone from `physical_flow` and store the data without timezone in a variable called `none_tz`

4. Select all data from `none_tz` that is from 2021-12-01 between 5:00 am and 5:45 am, and store the data in a variable called `early_morning`

&#128077; Great, well done!

## Datetime as column - pd.to_datetime
<br>
Besides the DatetimeIndex, there is another way to work with your datetime data. It keeps your datetime in a column, but still makes Python recognize that it isn't just a string, but a datetime. To do so, you use:

- `pd.to_datetime()`
- Syntax: `df_name["Datetime"] = pd.to_datetime(df_name["Datetime"])`
- and define its format (see `help(pd.to_datetime)`)

But let's do an example so that you can see what `pd.to_datetime` is all about: 

### Example

In the following, you...

1. create a string that looks like a date
2. convert that string into a datetime with `pd.to_datetime`
3. print out the (now) datetime object 
4. check its new data type

In [None]:
date = "2022-05-15"
print(date)

In [None]:
datetime = pd.to_datetime(date)
print(datetime)

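The same works for a Series. In addition, pandas can recognize different formats. Depending on your pandas version, you may have to pass `format="mixed"` to `pd.to_datetime` when one Series mixes several formats, because pandas 2.0 and later infer a single format from the first value. Take a look at the output as well as the data type: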

In [None]:
s_dates = pd.Series(["2022-05-15",
                     "2022 May 14th",
                     "13.05.22",
                     "2022-05-12"])

s_dates

In [None]:
s_datetime = pd.to_datetime(s_dates)
s_datetime

This can be super important, because sometimes you might receive data from different sources with dates in different formats. &#128526;
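
When you know the format of your dates, it is usually safer to state it explicitly instead of letting pandas guess. A minimal sketch with made-up strings in day.month.year notation:

```python
import pandas as pd

# Hypothetical dates written as day.month.two-digit year
s_dates = pd.Series(["13.05.22", "14.05.22", "15.05.22"])

# An explicit format removes any ambiguity between day and month
s_datetime = pd.to_datetime(s_dates, format="%d.%m.%y")

# errors="coerce" turns values that do not match the format into NaT
with_bad_value = pd.to_datetime(
    pd.Series(["13.05.22", "not a date"]), format="%d.%m.%y", errors="coerce"
)

print(s_datetime)
print(with_bad_value)
```

Whether you prefer `errors="coerce"` or an error message for bad values depends on how much you trust the source of your data.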

### Make your Datetime readable 
<br>
Imagine you want to return your datetimes as a string, so that they are readable for a user. This can be done with the `.strftime()` method. This way, you can format your datetime object as a string and return it.

In [None]:
print(datetime)

In [None]:
# %d is the day of the month (%m would print the month number)
print(datetime.strftime("The load was measured on %A, %b %dth"))

With format codes such as `%A`, `%b` and so on, you can tell pandas how to read and therefore print your dates. `%A`, for instance, returns the weekday as the locale's full name, while `%a` returns its abbreviated name. `%b` returns the month as an abbreviated name and `%d` the day of the month. But don't worry, you don't need to know these by heart. Just check the [documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).
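
Here is a short sketch that combines a few of these format codes. The timestamp and the variable names are made up; the weekday and month names depend on your locale.

```python
import pandas as pd

ts = pd.Timestamp("2022-05-15 14:30:00")

# %A full weekday name, %b abbreviated month, %d day, %Y year, %H:%M time
label = ts.strftime("%A, %b %d %Y at %H:%M")
print(label)  # with an English locale: Sunday, May 15 2022 at 14:30

# A purely numeric format does not depend on the locale
iso_like = ts.strftime("%Y-%m-%d %H:%M")
print(iso_like)  # 2022-05-15 14:30
```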

## Select datetime with pd.to_datetime
<br>

Now that you are all prepared, let's get into our data set.

In [None]:
elia_load.head()

In [None]:
elia_load.dtypes

With `pd.to_datetime` you convert your string into a Datetime object. With the parameter `format = ` you tell Python the format of your Datetime/how to parse your date. Again, check the [documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) to get the parameters you need.

In [None]:
elia_load["Datetime"] = pd.to_datetime(elia_load["Datetime"], format = "%Y-%m-%dT%H:%M:%S%z")

In [None]:
elia_load.head()

Your data set may look almost the same, but if you check the data types again, you will notice that your column `Datetime`...

In [None]:
elia_load.dtypes
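
The same conversion can be tried on a small made-up dataframe. The two rows below are hypothetical; they only imitate strings that match the format used above.

```python
import pandas as pd

# Hypothetical rows with ISO-like strings including a UTC offset
df = pd.DataFrame(
    {
        "Datetime": ["2019-01-15T23:45:00+01:00", "2019-01-15T23:30:00+01:00"],
        "Load": [10200.0, 10300.0],
    }
)

dtype_before = df["Datetime"].dtype

# %z parses the UTC offset at the end of each string
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%Y-%m-%dT%H:%M:%S%z")

print(dtype_before, "->", df["Datetime"].dtype)
```

The data type changes from a string type to a timezone-aware datetime type, even though the printed values look almost the same.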

So since Python now knows that your first column represents Datetimes, you can do many cool things with them. <br>

For instance, with `day_name()` pandas tells you the weekday of a date. Let's see which day of the week the first row falls on:

In [None]:
elia_load.loc[0, "Datetime"].day_name()
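
For a whole column instead of a single value, pandas offers the `.dt` accessor. The sketch below uses made-up rows; the column name `Weekday` is an assumption for illustration.

```python
import pandas as pd

# Hypothetical dataframe with a datetime column
df = pd.DataFrame(
    {
        "Datetime": pd.to_datetime(
            ["2019-01-15 22:45", "2019-01-15 23:00", "2019-01-16 00:00"]
        ),
        "Load": [10400.0, 10300.0, 10100.0],
    }
)

# .dt gives access to datetime properties for every row of the column
df["Weekday"] = df["Datetime"].dt.day_name()

# The same properties can be used to filter rows
hour_23 = df[df["Datetime"].dt.hour == 23]

print(df)
print(hour_23)
```

This mirrors what `.index.weekday` or `.index.hour` do for a DatetimeIndex, just with `.dt` in between.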

And of course, you can also select data. To do so, you use `.loc` as well as `.between`. Pay attention to the `,` separating those start and end datetimes: 

In [None]:
night = elia_load.loc[elia_load["Datetime"].between("2019-01-15 22:45:00+01:00","2019-01-15 23:45:00+01:00")]

In [None]:
night
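
A nice property of `.between` is that it compares values row by row, so it does not depend on how your rows are sorted. A minimal sketch on made-up data that is ordered newest first (an assumption for illustration):

```python
import pandas as pd

# Hypothetical quarter-hourly timestamps 22:00 ... 23:45, newest first
times = pd.date_range("2019-01-15 22:00", periods=8, freq="15min")[::-1]
df = pd.DataFrame({"Datetime": times, "Load": range(8)})

# between() includes both the start and the end value by default
mask = df["Datetime"].between("2019-01-15 22:45:00", "2019-01-15 23:15:00")
night = df.loc[mask]

print(night)
```

If your column is timezone-aware, the start and end strings should carry an offset as well, as in the cell above.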

### Exercise 

Let's play around a bit using the physical flow data again.

In [None]:
physical_flow.head(n=3)

1. Convert your `Datetime` data into real Datetime objects by using `pd.to_datetime`

2. Print out the name of the day of the 4th row

3. Let's now say that there was a huge imbalance on the grid on the 23rd of July 2002.
    - Turn the following string into a datetime using `pd.to_datetime`
    - Print out a **string** saying "Exact time of imbalance: Tuesday, Jul 23rd 23:07:44:000000"
    - Hint: `.strftime()`

In [None]:
imbalance_date = "23/07/2002 23:07:44"

<br>

## Recap, Tips & Takeaways &#128161;

<br>

<div class="alert alert-block alert-success">

**Let's recap what you have learned in this section:**

- You always need to convert your Datetime data into Datetime objects in order to work with it 
- There are mainly two ways to do so: 
    1. DatetimeIndex: with `parse_dates = True` and `index_col = 0` when uploading your data 
       * here, you can use `.loc` to select certain Datetimes
    2. `pd.to_datetime`
       * here, you additionally use `.between`
       * with the parameter `format = ` you can parse your Datetime data 
       * with `.strftime()` you can print out Datetime as String and adjust its format

        
</div>