<br>

<img src="./image/Logo/logo_elia_group.png" width = 200>

<br>

# Data Visualization
<br>

&#128077; Congratulations, you have learned the basics of Python and even gone beyond them by exploring the famous Pandas package. Now you can sit back and relax. 😎 

This last section is the fun part and gives you a short overview of how to visualize your data quickly. This is where you get all the "WOWs" and "Ahhhs". ✨🤯 
This section covers three libraries that you can use to visualize your data: **matplotlib**, **seaborn** and **plotly**.


[Matplotlib](https://matplotlib.org/)
- Built to emulate [MATLAB](https://www.mathworks.com/products/matlab.html)'s graphic commands but is independent of MATLAB
- One of the most widely used and best-known visualization libraries in Python

[Seaborn](https://seaborn.pydata.org/)
- Built on top of Matplotlib
- Adds additional functionality
- Works well with Pandas data structures like DataFrames and Series

[Plotly](https://plotly.com/)
- Creates interactive graphs by default
- Is just really cool stuff &#128526;


## Make your data ready for Visualization
<br>

Before you create fancy bar charts, boxplots or interactive graphs, you have to prepare your data. No worries, this is what you have been trained for all along. <br>
1. First, load your data: 

In [None]:
import pandas as pd

energy_borders = pd.read_csv("./data/energy/physical_flow_2021_1_01.csv", sep = ";")

2. Understand your data. In this case, a positive value means export from Belgium, while a negative value means import into Belgium.

In [None]:
energy_borders.head(3)

In [None]:
energy_borders.dtypes

3. Check for missing values and replace or drop them if there are any 

In [None]:
energy_borders.isna().any().any()
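If the check above returns `True`, you have to decide whether to drop or replace the missing values. The following is a minimal sketch of both options. The `sample` DataFrame and its values are made up for illustration and only mimic two columns of `energy_borders`; filling with the column mean is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-sample (made-up values) that mimics two columns of energy_borders
sample = pd.DataFrame({
    "Control area": ["France", "France", "Germany", "Germany"],
    "Physical Flow Value": [120.5, np.nan, -80.0, -95.5],
})

# Count the missing values per column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Option 1: drop every row that contains at least one missing value
dropped = sample.dropna()

# Option 2: replace missing flow values, here with the mean of the column
filled = sample.fillna({"Physical Flow Value": sample["Physical Flow Value"].mean()})
print(filled)
```

Which option fits best depends on your data: dropping is simple, but it leaves gaps in a time series.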

4. Convert your Datetime data into a Datetime object

In [None]:
energy_borders["Datetime"] = pd.to_datetime(energy_borders["Datetime"])
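Why bother with the conversion? Once a column holds real datetime values, Pandas can work with its date and time components. Here is a small sketch on made-up timestamps (the `sample` DataFrame is hypothetical and not part of the energy data):

```python
import pandas as pd

# Hypothetical timestamps stored as plain strings, as they come out of a CSV file
sample = pd.DataFrame({
    "Datetime": ["2021-01-01 00:00:00", "2021-01-01 00:15:00", "2021-01-01 13:30:00"]
})

# Convert the strings into datetime values
sample["Datetime"] = pd.to_datetime(sample["Datetime"])

# After the conversion, the .dt accessor exposes date and time components
sample["hour"] = sample["Datetime"].dt.hour
sample["weekday"] = sample["Datetime"].dt.day_name()
print(sample)
```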

&#128077; Great! Now your data is ready to be visualized.

But before we get started with the different libraries, there is an easy and super short way to visualize your data ad hoc: `pandas.DataFrame.plot()`! This is a built-in Pandas method. By default, matplotlib is used as the plotting backend.

In [None]:
energy_borders.plot(x = "Datetime", y = "Physical Flow Value")

Sure, this is not the fanciest plot. However, to get a first idea of how your data looks, this can come in handy!
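`DataFrame.plot()` is not limited to line plots: the `kind` parameter selects other plot types. The sketch below uses a made-up `sample` DataFrame (hypothetical values, same column names as `energy_borders`) so that it runs on its own.

```python
import pandas as pd

# Hypothetical mini-sample (made-up values) with the same column names as energy_borders
sample = pd.DataFrame({
    "Datetime": pd.date_range("2021-01-01", periods=8, freq="15min"),
    "Physical Flow Value": [310.0, 250.5, -40.2, -120.0, 15.3, 180.9, 90.1, -60.4],
})

# Line plot (the default kind)
ax_line = sample.plot(x="Datetime", y="Physical Flow Value")

# Histogram of the same column, selected with the `kind` parameter
ax_hist = sample.plot(y="Physical Flow Value", kind="hist", bins=4, title="Distribution of flows")
```

Both calls return a Matplotlib axes object, so you can keep customizing the plot with the Matplotlib commands shown in the next section.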

## Matplotlib
<br>

Matplotlib is often used in research and scientific settings. You can do all sorts of plots with it, from bar charts and histograms to scatterplots and boxplots.
There are many different ways to use Matplotlib. Most of its functionality is provided through the submodule called pyplot, which is usually imported with `import matplotlib.pyplot as plt`.

- `plt.subplots` is the basic command to start with. It creates two objects:
    - A figure (fig) which is a container that holds everything you see on the page
    - An axes object (ax), which is the canvas you draw on
- Feel free to check out the [user guide](https://matplotlib.org/stable/tutorials/introductory/usage.html#sphx-glr-tutorials-introductory-usage-py)
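To make the split between figure and axes more tangible, here is a small sketch in which one figure holds two axes objects. All numbers and titles are made up for illustration.

```python
import matplotlib.pyplot as plt

# One figure (the paper) holding two axes (the drawings), stacked and sharing the x axis
fig, (ax_top, ax_bottom) = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(6, 4))

hours = [0, 1, 2, 3, 4]

ax_top.plot(hours, [100, 150, 90, -20, -60])    # made-up values
ax_top.set_title("Hypothetical flow A")

ax_bottom.plot(hours, [-30, -10, 40, 80, 120])  # made-up values
ax_bottom.set_title("Hypothetical flow B")
ax_bottom.set_xlabel("Hour")

# Adjust the spacing so that titles and labels do not overlap
fig.tight_layout()
```

In a notebook the figure is rendered at the end of the cell. In a plain Python script, finish with `plt.show()`.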

<img src = "./image/matplotlib_example.png" width = 400>

[Image Source](https://matplotlib.org/stable/gallery/lines_bars_and_markers/spectrum_demo.html#sphx-glr-gallery-lines-bars-and-markers-spectrum-demo-py)

## Example Matplotlib

Let's do an example plot with Matplotlib. For that, check out all countries in your DataFrame `energy_borders`. All of them represent actual neighbouring bidding zones to Belgium:

In [None]:
energy_borders["Control area"].unique()

In [None]:
germany = energy_borders.loc[energy_borders["Control area"] == "Germany"]
france = energy_borders.loc[energy_borders["Control area"] == "France"]

In [None]:
germany.head(3)

In [None]:
france.head(3)

Let's visualize!

In [None]:
import matplotlib.pyplot as plt

# analogy: fig is like a paper and ax is what you would draw on it 
fig, ax = plt.subplots() # creating the fig and ax objects

# adding data to our plot
ax.plot(germany["Datetime"],germany["Physical Flow Value"], label = "Germany") # adding the first line plot
ax.plot(france["Datetime"], france["Physical Flow Value"], label = "France") # adding the second line plot

ax.set_ylabel("Physical Flow in MW") # setting the label of the y axis
ax.legend() # adding a legend
plt.xticks(rotation = 45) # rotating the tick labels of the x axis

# showing our plot
plt.show()

### Exercise

Look at the plot above and try to add a line of code that draws a horizontal line at y = 0, which, in this case, marks the threshold between import and export. <br>
Remember, you can find basically everything in the documentation, stackoverflow or just by googling it 😎!

In [None]:
# delete this line and replace it with your solution

## Seaborn 

Seaborn is a data visualization library based on matplotlib. It is great for statistical data visualization. To work with seaborn in this notebook, import both matplotlib and seaborn: matplotlib functions such as `plt.xticks` are used to adjust the seaborn plots. Check out the [documentation](https://seaborn.pydata.org/introduction.html) to get more info. 
<br>

<img src = "./image/seaborn_example_plot.png" width = 500>

So let's get right to it!

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

energy_borders_2 = pd.read_csv("./data/energy/physical_flow_2021_1_01.csv", sep = ";") # keep Datetime as a regular column, do NOT set it as a DatetimeIndex
energy_borders_2["Datetime"] = pd.to_datetime(energy_borders_2["Datetime"]) # instead, convert the Datetime column with pd.to_datetime

Select the bidding zone that you would like to visualize: 

In [None]:
energy_borders_2["Control area"].unique()

In [None]:
luxembourg = energy_borders_2[energy_borders_2["Control area"] == "Luxembourg"]
netherlands = energy_borders_2[energy_borders_2["Control area"] == "Netherlands"]

In [None]:
luxembourg.head(3)

In [None]:
netherlands.head(3)

Data visualization: 

In [None]:
sns.lineplot(x = "Datetime", y = "Physical Flow Value", data = luxembourg, label = "Luxembourg") # plotting Luxembourg
sns.lineplot(x = "Datetime", y = "Physical Flow Value", data = netherlands, label = "Netherlands") # plotting Netherlands
plt.xticks(rotation=45)
plt.legend()

plt.gcf().axes[0].yaxis.get_major_formatter().set_scientific(False) # switching off scientific notation for the y tick labels
plt.show()

Cool, right? 😎 Please note that the x and y axis labels are **automatically set from the column names**. 
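The line that switches off scientific notation in the cell above is quite long. If you hold on to the axes object, `ax.ticklabel_format` is a shorter way to get the same effect. The sketch below uses plain Matplotlib and made-up values that are large enough to trigger scientific notation by default.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Made-up values, large enough that Matplotlib would label the axis as 1e6 by default
ax.plot([0, 1, 2], [1_000_000, 2_500_000, 4_000_000])

# Show plain numbers instead of scientific notation on the y axis
ax.ticklabel_format(style="plain", axis="y")
```

Since `sns.lineplot` returns the axes object it draws on, the same call should also work on a seaborn plot.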
<br>

You can also create a data set with multiple selected countries and visualize them:

In [None]:

filter_countries = (energy_borders_2["Control area"] == "Germany") | (energy_borders_2["Control area"] == "Netherlands") | (energy_borders_2["Control area"] == "Luxembourg") 

sns.lineplot(x = "Datetime", y = "Physical Flow Value", hue = "Control area", data = energy_borders_2[filter_countries])
plt.xticks(rotation=45)
plt.show()
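Chaining comparisons with `|` gets long quickly when you select several countries. A shorter alternative is `Series.isin`. The sketch below uses a made-up `sample` DataFrame (hypothetical values) to show that both filters select the same rows.

```python
import pandas as pd

# Hypothetical mini-sample (made-up values) with the same column names as energy_borders_2
sample = pd.DataFrame({
    "Control area": ["Germany", "France", "Netherlands", "Luxembourg", "France", "Germany"],
    "Physical Flow Value": [100.0, -200.0, 300.0, -50.0, -210.0, 120.0],
})

# Short version: one list of countries and isin
selected = ["Germany", "Netherlands", "Luxembourg"]
filter_isin = sample["Control area"].isin(selected)

# Long version: one comparison per country, chained with |
filter_chained = (
    (sample["Control area"] == "Germany")
    | (sample["Control area"] == "Netherlands")
    | (sample["Control area"] == "Luxembourg")
)

# Both filters mark exactly the same rows
print(filter_isin.equals(filter_chained))
print(sample[filter_isin])
```

You can pass the filtered DataFrame to `sns.lineplot` with `hue = "Control area"` exactly as in the cell above.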

## Plotly
<br>

Plotly is by far the most fun library for data visualization. The main advantage of this library is that its plots are **interactive by default**, with mouse-hover functionality, interactive buttons (e.g. zoom in) and much more. It is built on a JavaScript graphing library but has a Python wrapper, so there is no need to know JavaScript.

There are three ways to create Plotly figures:

1. **plotly.express** for simple, quick plots `(px)`
    - Specify a DataFrame and its columns as arguments
    - Quick, nice but less customization
2. **plotly.graph_objects** for more customization `(go)`
    - go.X classes like `go.Bar()` or `go.Scatter()`
    - Many more customization options, but also more code needed
    - You pass the DataFrame columns themselves (e.g. `df["col_name"]`) instead of just the column names
3. **plotly.figure_factory** for specific, advanced figures
    - Out of scope for this section, but look [here](https://plotly.com/python/figure-factories/) for more information

If you want to learn more about Plotly, check out the **documentation** links below:

1. [Basics](https://plotly.com/python/)
2. [Plotly Express](https://plotly.com/python-api-reference/plotly.express.html)
3. [Graph objects](https://plotly.com/python-api-reference/plotly.graph_objects.html)
4. [go.Figure](https://plotly.com/python-api-reference/generated/plotly.graph_objects.Figure.html)

### Plotly Express

In [None]:
import pandas as pd
import plotly.express as px

# creating a dummy DataFrame
flow_by_area = pd.DataFrame({"Control Area":["Luxembourg", "Netherlands", "Germany", "France"], "Mean Physical Flow in MW": [-234.05,378.23,120.42,-232.09]})

#  create the plot
fig  = px.bar(data_frame = flow_by_area, x = "Control Area", y = "Mean Physical Flow in MW", title = "Physical Flow by Area")
fig.show()
print(fig)

But what if you would like to change the colour? No problem!

In [None]:
#  create the graph - coloring the bars by passing the column "Mean Physical Flow in MW" to the color argument and defining a custom color scale (optional)
fig  = px.bar(data_frame = flow_by_area, x = "Control Area", y = "Mean Physical Flow in MW", title = "Physical Flow by Area", color="Mean Physical Flow in MW",  color_continuous_scale=["rgb(0,255,0)","rgb(0,178,238)"])
fig.show()

### Scatterplot

Scatterplots are pretty useful for observing relationships between variables. They use dots to represent (x, y) value pairs. In Plotly, you can easily add hover or mouseover effects to display further info on each value pair. But first, let's have another look at the data you are using: 

In [None]:
energy_borders_2.head()

In [None]:
# Create the scatterplot
fig = px.scatter(data_frame = energy_borders_2,
  x = "Datetime", 
  y = "Physical Flow Value", 
  color = 'Control area',
  # Add columns to the hover information
  hover_data = ["Resolution code", "Control area"],
  # Add bold variable in hover information
  hover_name = 'Control area'
)

# add an annotation (floating annotation with "xref":"paper", "yref":"paper", "x":0.5, "y":0.8)
my_annotation = {"x": "2021-12-01 18:45:00+01:00", "y":1707.232, "showarrow": True, "arrowhead":3, "text": "Look at this outlier", "font": {"size": 10, "color": "black"}}
fig.update_layout({"annotations" : [my_annotation]})
# Show the plot
fig.show()

As you can see, you can do lots of cool stuff with **Plotly**. Even annotations such as *"Look at this outlier"* are possible! Check out the hover effects! 🚀

<br>

## Recap, Tips & Takeaways &#128161;

<br>

<div class="alert alert-block alert-success">

**Let's recap what you have learned in this chapter:**

- `pandas.DataFrame.plot(x = "col_name", y = "col_name")` makes it possible to quickly plot your data and comes out of the box with Pandas 😎
- To work with **Matplotlib** you have to **import matplotlib.pyplot as plt** and create the fig and ax objects `fig, ax = plt.subplots()`
- To work with **Seaborn**, you need to **import matplotlib.pyplot as plt** AND **import seaborn as sns**
- **Plotly** is great for interactive plots
    - You need to **import pandas as pd** AND **import plotly.express as px**
    - With `plotly.express` you can plot interactive graphs very quickly
- And last but not least, you can visualize basically anything! You can easily adjust your plots, customize them, create thresholds, additional axes and so on. Just check the documentation and play around a bit. 💪
</div>