<br>

<img src="./image/Logo/logo_elia_group.png" width = 200>

<br>

# Dealing with datetime in Pandas
<br>

This section is an important one. Much of the data you work with in the energy sector has a time component. But when you load your data, the **dates are usually just strings**. Python **doesn't recognize them as date or datetime types**, so you can't, for example, select a certain date and time range, such as all data from Fridays or from a certain hour. What you have to do is convert your date and time data from strings into actual datetime objects.

<br>
Overall, datetime components can be really tricky! That doesn't mean that they are impossible to work with. Just follow each example below, and if you need a break, feel free to take one &#9749;.

## Datetime as Index
<br>
In order to get started with this section, let's import pandas as well as the data set you are going to work with: 

In [None]:
import pandas as pd
elia_load = pd.read_csv("./data/energy/elia_load_2019_01_15.csv", sep = ";")

The data you are working with in the following might look familiar. It is the measured and upscaled load on the Elia grid from January 2019. Let's have a look: 

In [None]:
elia_load.head(n=3)

There are two ways to work with datetime data:

1. Your datetime **is an index**, in this case a DatetimeIndex
2. Your datetime **is a column**, in this case a Datetime object 

Currently, as you can see in your DataFrame `elia_load`, your datetime is a column called `Datetime`. Let's check its data type:

In [None]:
elia_load.dtypes

As we can see, the data type of the column `Datetime` is just `object` and **not a datetime type** yet. To work with your datetime, you need to tell Python that the data in `Datetime` actually represents dates. To do so, you can create a so-called `DatetimeIndex`. This `DatetimeIndex` can be created right when you read your data set from the CSV:

1. Just use the `parse_dates` parameter and set it to `True`.
2. In addition, you need to tell Python to treat the first column as the index. This is done by using another parameter called `index_col` which you set to `0`.

In [None]:
load_dt_index = pd.read_csv("./data/energy/elia_load_2019_01_15.csv", sep = ";", parse_dates = True, index_col = 0)
load_dt_index.head(n=3)

In [None]:
load_dt_index.index

Nice! Now your first column became a `DatetimeIndex`! &#128512; 
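
If you want to try this without the CSV file, the same idea can be sketched on a small in-memory example. The two rows below are made up for illustration; only the column names `Datetime` and `Elia Grid Load` come from the data set above. The second variant, converting with `pd.to_datetime` and then calling `set_index`, is an alternative route for data that is already loaded, not the one used in this section.

```python
import io
import pandas as pd

# Two made-up rows in the same layout as the Elia file (values are illustrative)
csv_text = (
    "Datetime;Elia Grid Load\n"
    "2019-01-15T00:00:00+01:00;9500.0\n"
    "2019-01-15T00:15:00+01:00;9400.0\n"
)

# Variant 1: parse the first column as dates while reading
df_read = pd.read_csv(io.StringIO(csv_text), sep=";", parse_dates=True, index_col=0)

# Variant 2: read first, then convert the column and make it the index
df_late = pd.read_csv(io.StringIO(csv_text), sep=";")
df_late["Datetime"] = pd.to_datetime(df_late["Datetime"])
df_late = df_late.set_index("Datetime")

print(type(df_read.index).__name__)
print(type(df_late.index).__name__)
```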

<br>

To simplify things, you now remove the time zone. Time zones can be tricky in Python, and as we are only working in one time zone, you don't need to worry about them right now. To drop the time zone, use `tz_localize(None)`, which keeps the local clock time and only discards the time zone information.

In [None]:
load_dt_index = load_dt_index.tz_localize(None)
load_dt_index.head(n=10)
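
There are two ways to end up with an index without a time zone, and they give different clock times. The sketch below uses a made-up three-row frame in the `Europe/Brussels` zone (an assumption for illustration) to show that `tz_localize(None)` keeps the local time, while `tz_convert(None)` first converts to UTC.

```python
import pandas as pd

# Made-up quarter-hourly values with a Brussels time zone (UTC+1 in January)
idx = pd.date_range("2019-01-15 22:00", periods=3, freq="15min", tz="Europe/Brussels")
df = pd.DataFrame({"Elia Grid Load": [10500.0, 10400.0, 10300.0]}, index=idx)

local_naive = df.tz_localize(None)  # drops the zone, keeps the local clock time
utc_naive = df.tz_convert(None)     # converts to UTC first, then drops the zone

print(local_naive.index[0])  # 2019-01-15 22:00:00
print(utc_naive.index[0])    # 2019-01-15 21:00:00
```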

If you are interested in a specific time range, you can select your data just by using `.loc[]`. Remember, `.loc[]` is **label based**.

In [None]:
load_night = load_dt_index.sort_index().loc["2019-01-15 22:00:00" : "2019-01-15 23:45:00"]
load_night
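
Two details of label-based slicing on a `DatetimeIndex` are worth knowing: both endpoints of the slice are included, and a partial date string such as `"2019-01-15"` selects the whole day. A minimal sketch on made-up data (the values are just a running counter):

```python
import pandas as pd

# Eight made-up quarter-hourly rows spanning midnight
idx = pd.date_range("2019-01-15 23:00", periods=8, freq="15min")
load = pd.DataFrame({"Elia Grid Load": range(8)}, index=idx)

# Slice with full timestamps: 23:30, 23:45, 00:00 and 00:15 are all returned
around_midnight = load.loc["2019-01-15 23:30:00":"2019-01-16 00:15:00"]
print(len(around_midnight))  # 4

# Partial string: every row that falls on 15 January
whole_day = load.loc["2019-01-15"]
print(len(whole_day))  # 4
```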

&#128467; You can now do many many other things with this DatetimeIndex. For instance...

- You can get the data of all Fridays within your data set

In [None]:
friday = load_dt_index[load_dt_index.index.weekday == 4]  # Monday is 0, so Friday is 4
friday.shape
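
Pandas numbers weekdays from Monday = 0 to Sunday = 6, so Friday corresponds to 4. If you don't want to remember the numbers, you can compare against `day_name()` instead. The sketch below builds one made-up week to show the mapping; the English day names assume the default locale.

```python
import pandas as pd

# One week starting on Monday, 14 January 2019
week = pd.date_range("2019-01-14", periods=7, freq="D")

# Weekday number -> weekday name
mapping = pd.Series(week.day_name(), index=week.weekday)
print(mapping)

# Selecting by name instead of by number
fridays = week[week.day_name() == "Friday"]
print(fridays)
```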

- You can get the data from a specific hour

In [None]:
night_hour = load_dt_index[load_dt_index.index.hour == 23]
night_hour.shape

- You can select your data by using:
    - df_name.index.year
    - df_name.index.month
    - df_name.index.isocalendar().week (the older `df_name.index.week` was removed in pandas 2.0)
    - df_name.index.weekday
    - df_name.index.day

- You can also do a resampling with DatetimeIndex, e.g. get the average load over a day:

In [None]:
# resample to daily means and store the result under a name that reflects the aggregation
daily_load = load_dt_index.resample('D')['Elia Grid Load'].mean().round().rename("Elia Grid Mean Load")
daily_load

But be careful when moving to **different levels of granularity** or when aggregating your data. Ask yourself: What data am I showing now? Did the data change fundamentally? How can I indicate this to future users of the data and future readers of my code? 

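For instance, the load is a power measured in MW. If you aggregate it with `.mean()`, the result is still in MW. If you use `.sum()` over a certain time period instead, the result is no longer a power: it relates to energy, and with quarter-hourly values, for example, you only get MWh after multiplying each value by 0.25 h. This is why you always have to make sure that you rename your data to reflect what it now represents.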
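
A minimal sketch of the unit question, assuming quarter-hourly load values in MW. The constant 100 MW series is made up, and the result names are only suggestions: averaging keeps the unit MW, while energy in MWh is obtained by weighting every value with its duration of 0.25 h before summing.

```python
import pandas as pd

# One made-up day of quarter-hourly load at a constant 100 MW
idx = pd.date_range("2019-01-15 00:00", periods=96, freq="15min")
load_mw = pd.Series(100.0, index=idx, name="Elia Grid Load")

# Mean over the day: still a power, unit MW
daily_mean_mw = load_mw.resample("D").mean().rename("Elia Grid Mean Load [MW]")

# Energy over the day: each quarter hour contributes MW * 0.25 h
daily_energy_mwh = (load_mw * 0.25).resample("D").sum().rename("Elia Grid Energy [MWh]")

print(daily_mean_mw.iloc[0])     # 100.0
print(daily_energy_mwh.iloc[0])  # 2400.0
```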

### Exercise

Now it's your turn. 

1. Update the following cell to upload the data with 
    - the column `Datetime` being a **DatetimeIndex** 
    - this DatetimeIndex being the only Index

In [None]:
physical_flow = pd.read_csv("./data/energy/physical_flow_2021_1_01.csv", sep = ";")
physical_flow.head(n=3)

2. Resample your data to get the **mean by hour** and also, update the name to `Physical_Flow_mean`

In [None]:
# delete this line and replace it with your solution

3. Remove the time zone information from `physical_flow` and store the data without it in a new variable called `none_tz`

In [None]:
# delete this line and replace it with your solution

4. Select all data from `none_tz` that is from **2021-12-01 between 5:00am and 5:45am** and store it in a variable called `early_morning`

In [None]:
# delete this line and replace it with your solution

&#128077; Great, well done!

## Datetime as column - pd.to_datetime
<br>
Besides a DatetimeIndex, there is another way to work with your datetime data: you keep it in a column, but still make Python recognize that it isn't just a string but a datetime object. To do so, you use:

- [pandas.to_datetime](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html#pandas-to-datetime) and define its format (see `#help(pd.to_datetime)`)

But let's do an example:
1. Create a string that looks like a date
2. Convert that string into a datetime with `pandas.to_datetime`
3. Print out the (now) datetime object 
4. Check its new data type

In [None]:
date = "2022-05-15"
type(date)

In [None]:
datetime = pd.to_datetime(date)
print(datetime)
type(datetime)

The same goes for a Series. In addition, Pandas recognizes different formats. Take a look at the output as well as its data type:

In [None]:
s_dates = pd.Series(["2022-05-15",
                     "2022 May 14th",
                     "13.05.22",
                     "2022-05-12"])

s_dates

In [None]:
s_datetime = pd.to_datetime(s_dates, format='mixed', dayfirst=True)
s_datetime

This can be super important, because sometimes you might receive data from different sources with dates in different formats. &#128526;
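
When you do know the format, passing it explicitly removes any ambiguity, and `errors="coerce"` turns values that don't match into `NaT` instead of raising an error. The sketch below uses made-up strings, including one deliberately broken entry:

```python
import pandas as pd

# Made-up day-first dates plus one value that is not a date at all
raw = pd.Series(["15.05.22", "14.05.22", "not a date"])

# Explicit format: day.month.two-digit year; unparseable values become NaT
parsed = pd.to_datetime(raw, format="%d.%m.%y", errors="coerce")

print(parsed)
print(parsed.isna().sum())  # 1
```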

### Make your Datetime readable 
<br>
Imagine you want to return your datetimes as a string that is readable for a user. This can be done with the `.strftime()` method, which formats your datetime object as a string in a layout of your choice.

In [None]:
print(datetime)
print(datetime.strftime("The load was measured on %A, %b %dth"))

With directives such as `%A` and `%b`, you tell Pandas how to format your dates when printing them. `%A`, for instance, returns the weekday as the locale's full name, and `%b` returns the month as its abbreviated name. But don't worry, you don't need to know these by heart. Just check the [documentation of strftime and strptime](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).
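
Note that literal text in the format string, such as the `th` after `%d`, is printed as is, so it only reads correctly for days like the 15th. One way around this is a small helper that picks the suffix. `ordinal_suffix` is a hypothetical function written for this sketch, not part of Pandas, and the English names assume the default locale.

```python
import pandas as pd


def ordinal_suffix(day: int) -> str:
    """Return the English ordinal suffix for a day of the month."""
    if 11 <= day % 100 <= 13:  # 11th, 12th, 13th are irregular
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")


ts = pd.Timestamp("2022-05-15")

# Let strftime handle weekday and month, then append day and suffix manually
text = ts.strftime("%A, %b ") + f"{ts.day}{ordinal_suffix(ts.day)}"
print(text)
```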

## Select datetime with pd.to_datetime
<br>

Now that you are all prepared, let's get into our data set:

In [None]:
elia_load.head()

In [None]:
elia_load.dtypes

With `pd.to_datetime` you convert your strings into datetime objects. With the parameter `format` you tell Pandas how to parse your dates. Again, check the [documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) to get the directives you need.

In [None]:
elia_load["Datetime"] = pd.to_datetime(elia_load["Datetime"], format = "%Y-%m-%dT%H:%M:%S%z")
elia_load.head()

Your data set may look almost the same, but if you check the data types again, you will see that your column `Datetime` is no longer of type `object` but has a datetime type:

In [None]:
elia_load.dtypes

So since Python now knows that your first column represents Datetimes, you can do many cool things with them. <br>

For instance, with `day_name()` Pandas tells you the weekday of a date. Let's see which day of the week the first row falls on:

In [None]:
elia_load.loc[0, "Datetime"].day_name()
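
The call above works on a single value. To apply the same kind of operation to the whole column at once, a datetime column offers the `.dt` accessor. A minimal sketch on three made-up timestamps (the helper column names `weekday_name` and `hour` are chosen for illustration):

```python
import pandas as pd

# Three made-up timestamps in a column called "Datetime"
df = pd.DataFrame(
    {"Datetime": pd.to_datetime(["2019-01-15 22:00", "2019-01-18 23:15", "2019-01-19 08:30"])}
)

# .dt exposes the datetime properties for every row of the column
df["weekday_name"] = df["Datetime"].dt.day_name()
df["hour"] = df["Datetime"].dt.hour

# Boolean mask: keep only rows from 22:00 onwards
late = df[df["Datetime"].dt.hour >= 22]
print(df)
print(len(late))  # 2
```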

And of course, you can also select data. To do so, you use `.loc` as well as `.between`. Pay attention to the `,` separating those start and end datetimes: 

In [None]:
night = elia_load.loc[elia_load["Datetime"].between("2019-01-15 22:45:00+01:00","2019-01-15 23:45:00+01:00")]
night
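
By default, `.between` includes both boundaries. If you want to exclude one of them, the `inclusive` parameter accepts `"both"`, `"neither"`, `"left"` or `"right"`. The sketch below uses made-up quarter-hourly rows in the `Europe/Brussels` time zone (an assumption for illustration):

```python
import pandas as pd

# Eight made-up quarter-hourly rows from 22:00 to 23:45, time zone aware
df = pd.DataFrame(
    {
        "Datetime": pd.date_range(
            "2019-01-15 22:00", periods=8, freq="15min", tz="Europe/Brussels"
        ),
        "Elia Grid Load": range(8),
    }
)

start = pd.Timestamp("2019-01-15 22:45:00+01:00")
end = pd.Timestamp("2019-01-15 23:45:00+01:00")

both = df.loc[df["Datetime"].between(start, end)]                         # 22:45 ... 23:45
left_only = df.loc[df["Datetime"].between(start, end, inclusive="left")]  # 22:45 ... 23:30

print(len(both))       # 5
print(len(left_only))  # 4
```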

### Exercise 

Let's play around a bit using the physical flow data again.

In [None]:
physical_flow.head()

1. Convert your `Datetime` data into real Datetime objects by using `pd.to_datetime`

In [None]:
# delete this line and replace it with your solution

2. Print out the name of the day of the 4th row

In [None]:
# delete this line and replace it with your solution

3. Let's now say that there was a huge imbalance on the grid on the **23rd of July 2002**:
    - Turn the string `imbalance_date` into a datetime using `pd.to_datetime`
    - print out a **string** saying "Exact time of imbalance: Tuesday, Jul 23rd 23:07:44:000000" 

    *Hint: use `.strftime()`*

In [None]:
imbalance_date = "23/07/2002 23:07:44"
# delete this line and replace it with your solution

<br>

## Recap, Tips & Takeaways &#128161;

<br>

<div class="alert alert-block alert-success">

**Let's recap what you have learned in this section:**

- You always need to convert your Datetime data into Datetime objects in order to work with it 
- There are mainly two ways to do so: 
    1. DatetimeIndex: with `parse_dates = True` and `index_col = 0` when reading your data 
       * here, you can use `.loc` to select certain Datetimes
    2. `pd.to_datetime`
       * here, you additionally use `.between`
       * with the parameter `format = ` you can parse your Datetime data 
       * with `.strftime()` you can print out Datetime as String and adjust its format

        
</div>