# Introduction to Plotly

This notebook introduces the basic concepts needed to build a plot with Plotly. 

## Initialization

We import the necessary libraries (in particular `plotly.graph_objects`). 

In [1]:
import plotly.graph_objects as go
import pandas as pd

## Data preparation

We'll be using a dataset that contains energy consumption data from a hybrid vehicle. 

Each row is a measurement of the consumption of a vehicle during a trip. The vehicle switches from electric mode to fuel mode depending on various parameters, such as the charge of the battery, power needed, etc. 

The variables are: 

- Distance in EV mode (km): Total distance driven completely in electric mode (this occurs when the battery is charged sufficiently). 
- Fuel consumption (L/100km): Fuel consumption rate for this trip. 
- Electric Consumption (kWh/100km): Electric consumption rate for this trip. 
- Distance (km): Total trip distance. 
- Avg speed (km/h): Average speed during the trip. 
- Energy recovered (kWh): Total amount of energy recovered during the trip. A hybrid vehicle produces (recovers) electric energy e.g. when braking or driving downhill. 

In [2]:
FILE = "data/ElectricVehicle.xlsx"

In [3]:
df = pd.read_excel(FILE)
df.head()

Unnamed: 0,Distance in EV mode (km),Fuel consumption (L/100km),Electric Consimption (kWh/100km),Distance (km),Avg speed (km/h),Energy recovered (kWh)
0,27.7,0.3,12.0,29.4,43.3,
1,24.1,0.5,14.9,24.9,14.1,
2,1.9,4.7,3.2,21.9,22.4,
3,0.1,5.9,5.6,21.8,16.8,2.1
4,2.1,4.7,3.8,32.5,19.1,3.1


Let's rename the columns to usable names. 

In [4]:
df.columns = ['distance_electric', 'fuel', 'electric', 'total_distance', 'speed', 'energy_recovered']

And let's add a variable giving the fraction of the trip (between 0 and 1) that was driven in electric mode; this should be related to the charge of the battery. 

In [5]:
df['charge'] = df.distance_electric / df.total_distance
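
As a quick sanity check, the same computation can be reproduced on the five rows displayed by `df.head()`, without needing the Excel file. This is only a minimal sketch: the name `sample` is hypothetical, and the empty `energy_recovered` cells are entered as `None`.

```python
import pandas as pd

# First five rows as displayed by df.head(), using the renamed columns
sample = pd.DataFrame({
    'distance_electric': [27.7, 24.1, 1.9, 0.1, 2.1],
    'fuel': [0.3, 0.5, 4.7, 5.9, 4.7],
    'electric': [12.0, 14.9, 3.2, 5.6, 3.8],
    'total_distance': [29.4, 24.9, 21.9, 21.8, 32.5],
    'speed': [43.3, 14.1, 22.4, 16.8, 19.1],
    'energy_recovered': [None, None, None, 2.1, 3.1],
})

# Fraction of each trip driven in electric mode (between 0 and 1)
sample['charge'] = sample.distance_electric / sample.total_distance

print(sample[['distance_electric', 'total_distance', 'charge']].round(3))
```

For the first trip this gives 27.7 / 29.4, which is about 0.942: roughly 94% of the distance was driven in electric mode.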

## Plotting with Plotly

This is an example of a scatter plot using the `plotly` package. 

- We instantiate a `Figure` object. 
- As an argument, we give a "trace" or a list of traces. You can think of a trace as a "layer" of a plot. This trace is a scatter plot, which can be created with `go.Scatter`. 
- For each trace, you need to give:
    - The necessary `x` and/or `y`
    - The `mode` (type of plot; here `markers` specifies a point plot). 
    - Additional arguments that specify **properties** of the plot. Here, for instance, `marker_color` and `marker_size` specify what the markers should look like. 
- Then, with `update_layout`, we can specify x and y axis titles, and a plot title. 
- We display the figure with the `show` method.  

In [6]:
figure = go.Figure(
    go.Scatter(
        x=df.charge,
        y=df.electric,
        mode='markers',
        marker_color = "blue",
        marker_size = 10,
    )   
)
figure.update_layout(
    title='Electric Vehicle',
    xaxis_title='Initial charge',
    yaxis_title='Electric consumption (kWh/100km)'
)
figure.show()

# If this error appears:
#   ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed
# install the nbformat package (e.g. `pip install nbformat`)


In [7]:
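import sklearn.preprocessing as pre
from sklearn.cluster import KMeans

# Cluster the trips on the two plotted variables
data2 = df[['charge', 'electric']]

# Standardize both variables so that neither dominates the distance computation
data2 = pre.scale(data2)

# Fit k-means with 3 clusters
kmeans = KMeans(n_clusters=3).fit(data2)

# Store the cluster label of each trip
df['cluster'] = kmeans.predict(data2)

figure = go.Figure()

# Add one trace per cluster, so each cluster gets its own color and legend entry
for cluster in df.cluster.unique():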
    data = df[df.cluster == cluster]
    figure.add_trace(
        go.Scatter(
            x=data.charge,
            y=data.electric,
            mode='markers',
            marker_size=10,
            name=f'Cluster {cluster}'
        )
    )

figure.update_layout(
    title='Electric Vehicle',
    xaxis_title='Initial charge',
    yaxis_title='Electric consumption (kWh/100km)'
)




## Understanding properties

Properties such as `marker` specify how to display the points of a plot. Plotly allows styling to a high level of detail. A tutorial on the important properties that are related to markers can be found [here](https://plotly.com/python/marker-style/). 

### Styling with dictionaries

The most common option for styling is to pass a dictionary to an argument such as `marker`, for example `marker=dict(size=8)`. You could also pass the dictionary in explicit form, like `marker={'size': 8}` (as done for `figure3` later in this notebook). 

### Styling with shortcuts

Plotly also allows specifying properties as "shortcuts" instead of using a dictionary. Please refer to the example below. 

For instance, `marker_size=8` is equivalent to the dictionary

    marker = {
        'size': 8
    }

and `marker_line_width=2` is equivalent to

    marker = {
        'line': {
            'width': 2
        }
    }

Basically, Plotly interprets an underscore (`_`) in an argument name as a shortcut for a nested dictionary entry (the Plotly documentation calls this "magic underscore notation"). This is illustrated in the figure below. 

In [8]:

figure2 = go.Figure(
    go.Scatter(
        x=df.charge,
        y=df.electric,
        mode='markers',
        marker_size=df.speed,
        marker_color="blue",
        marker_line_width=2,
        marker_line_color='blue',
        #marker_showscale=True,
        #marker_colorscale='blues',    
    )   
)
figure2.update_layout(
    title='Electric Vehicle',
    xaxis_title='Initial charge',
    yaxis_title='Electric consumption (kWh/100km)'
)
figure2.show()


## Value-based vs. list-based properties

A property or subproperty can take a single value or a list-like object. 

- When a property (like the color) is a single value, all points get the same value. 
- When the property (like the color) is a list, each point in the data gets the corresponding value of the list, in order (make sure the list has as many elements as there are points). 

In the example above, `marker_size` was set to `df.speed`, so each point is sized according to the value of its column `speed`. Setting `marker_color` to `df.speed` works the same way: each point is then colored according to its speed. 

In that case, we can display a color scale on the right by setting the property `marker_showscale=True`. 

## Understanding traces

In Plotly, a **trace** is a separate specification of how data should be plotted. 
You can think of it as a new "series" of data.
You can plot different traces on the same plot and each will be plotted according to its own specification.  

In [9]:
# Define a new binary variable high_speed
high_speed = df.speed > 30

# Plot the data for high speed only
figure3 = go.Figure(
    go.Scatter(
        name = "High speed",
        x=df.charge[high_speed],
        y=df.electric[high_speed],
        mode='markers',
        marker={
            'size': 8,
            'color': "red",
            'line': {
                'width': 2,
                'color': 'red'
            }
        }
    )   
)

# Now add another scatterplot (trace) for low speed
# It will be plotted on the same figure but with a different color
figure3.add_trace(
    go.Scatter(
        name = "Low speed",
        x=df.charge[~high_speed],
        y=df.electric[~high_speed],
        mode='markers',
        marker={
            'size': 8,
            'color': "lightgrey",
            'line': {
                'width': 2,
                'color': 'blue'
            }
        }
    )   
)

# The legend (shown by default when a figure has several traces) lets the user distinguish between the two groups
figure3.update_layout(
    title='Electric Vehicle',
    xaxis_title='Initial charge',
    yaxis_title='Electric consumption (kWh/100km)',
    # showlegend=True
)

figure3.show()



## Understanding the data object

Each Plotly figure has a property called `data` that contains a tuple of trace objects describing everything that is in the plot: 

In [10]:
figure3.data

(Scatter({
     'marker': {'color': 'red', 'line': {'color': 'red', 'width': 2}, 'size': 8},
     'mode': 'markers',
     'name': 'High speed',
     'x': array([0.94217687, 0.        , 0.        , 0.58885017, 0.        , 0.92068966,
                 0.24444444, 0.9972752 , 0.08962264, 0.94425087, 0.01376812, 0.03301127]),
     'y': array([12. ,  3.6,  1.3, 10. ,  2.4, 11.7,  7.9, 13.8,  3.3, 12.2,  4.1,  2.9])
 }),
 Scatter({
     'marker': {'color': 'lightgrey', 'line': {'color': 'blue', 'width': 2}, 'size': 8},
     'mode': 'markers',
     'name': 'Low speed',
     'x': array([0.96787149, 0.08675799, 0.00458716, 0.06461538, 0.81385281, 0.5036855 ,
                 0.44289236, 0.00847458, 0.        , 0.01453958, 0.39339339, 0.        ,
                 1.        , 0.08370044, 0.97084548, 0.46897547, 0.02912621, 0.01848049,
                 0.        , 0.46336429, 0.42716858, 0.31391403, 0.25375171, 0.46818923,
                 0.16305022, 0.61151079, 0.00427807, 0.89726027, 0.50712251

This is useful because it allows accessing what has been displayed programmatically (something that is less straightforward in many other plotting libraries). Besides, it will be useful when we design interactions and callbacks. 