# Python for Data Analysis - Plotting with Plotly

* There are many plotting libraries for Python, including Matplotlib, Plotly, Bokeh, and Seaborn.
* Matplotlib is worth learning, as it is commonly used in academic settings for creating report-ready plots.
* However, some of the other plotting libraries, such as Plotly and Seaborn, provide a convenient way of creating interactive and visually appealing plots.


In this example, we will load the hills data set as before. This is `The Database of British and Irish Hills v18`, which is freely available under a Creative Commons Attribution 4.0 License at `https://www.hills-database.co.uk/downloads.html`. The data set contains grid reference information for peaks, hills, and cols in Britain.

In [1]:
import os
import pandas as pd

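# Build the path to the CSV file, relative to the current working directory
filename = "DoBIH_v18.csv"
data_folder = "data/"
project_folder = "../"
filepath = os.path.join(project_folder, data_folder, filename)

# Confirm that the file can be found before trying to read it
print(f"My data file is located at: '{filepath}'")
print(f"My data path is valid: {os.path.exists(filepath)}")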

df = pd.read_csv(filepath, encoding='utf-8', engine='python')

My data file is located at: '../data/DoBIH_v18.csv'
My data path is valid: True
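
Before plotting, it can help to confirm that the columns used in the rest of this section are present. The sketch below assumes the column names `Longitude`, `Latitude`, `Country`, and `Metres` that the later cells rely on. `missing_columns` is a hypothetical helper, and the two-row frame is made-up stand-in data rather than rows from the database.

```python
import pandas as pd

# Columns that the plotting cells in this section refer to
REQUIRED_COLUMNS = ["Longitude", "Latitude", "Country", "Metres"]


def missing_columns(frame, required):
    """Return the names in `required` that are not columns of `frame`."""
    return [column for column in required if column not in frame.columns]


# Made-up stand-in data (NOT real rows); "Country" is deliberately left out
sample_df = pd.DataFrame(
    {
        "Longitude": [-5.0, -3.2],
        "Latitude": [56.8, 54.5],
        "Metres": [1000.0, 900.0],
    }
)

print(missing_columns(sample_df, REQUIRED_COLUMNS))  # ['Country']
```

With the real data, calling `missing_columns(df, REQUIRED_COLUMNS)` should return an empty list if the file loaded as expected.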


* We can use Plotly Express, a high-level interface to Plotly with sensible default values, to get started very quickly.
* First, let's reproduce the Matplotlib scatter example we saw previously.
* We will need to install Plotly in our virtual environment first (for example, with `pip install plotly`).
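
As a minimal sketch, the check below uses only the standard library to report whether the packages used in this section can be imported from the active environment. The helper name `is_installed` is our own and is not part of Plotly or pandas.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the import system can locate `package_name`."""
    return importlib.util.find_spec(package_name) is not None


# Report on the two third-party packages this section depends on
for name in ("pandas", "plotly"):
    if is_installed(name):
        print(f"{name}: available")
    else:
        print(f"{name}: missing, try 'pip install {name}'")
```

If a package is reported as missing, make sure the virtual environment is activated before installing it.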

In [2]:
import plotly.express as px

fig = px.scatter(
    df, 
    x="Longitude", 
    y="Latitude",
    color="Country",
    hover_data=["Metres"],  # a list of column names works across Plotly versions
    title="Location of hills in Great Britain"
    )
# Lock the y-axis scale to the x-axis so that the map is not stretched
fig.layout.yaxis.scaleanchor = "x"
fig.show()

* Let's make the markers smaller and draw them as upward-pointing triangles.

In [3]:
fig = px.scatter(
    df, 
    x="Longitude", 
    y="Latitude",
    color="Country",
    hover_data=["Metres"],
    title="Location of hills in Great Britain",
    symbol_sequence=["triangle-up"]
    )
fig.update_traces(marker={'size': 2})
fig.layout.yaxis.scaleanchor="x"
fig.show()

* We can also lower the marker opacity, so that dense clusters of overlapping points are easier to read.

In [4]:
fig = px.scatter(
    df, 
    x="Longitude", 
    y="Latitude",
    color="Country",
    hover_data=["Metres"],
    opacity=0.6,
    title="Location of hills in Great Britain",
    symbol_sequence=["triangle-up"]
    )
fig.update_traces(marker={'size': 2})
fig.layout.yaxis.scaleanchor="x"
fig.show()

* Let's colour the points not by country, but by their height. This is a continuous variable, so a continuous colour scale should work well.

In [5]:
fig = px.scatter(
    df, 
    x="Longitude", 
    y="Latitude",
    color="Metres",
    hover_data=["Metres"],
    title="Location of hills in Great Britain",
    symbol_sequence=["triangle-up"]
    )
fig.update_traces(marker={'size': 3})
fig.layout.yaxis.scaleanchor="x"
fig.show()

* It is very easy to change the colour scale of the plot with the `color_continuous_scale` argument.

In [6]:
fig = px.scatter(
    df, 
    x="Longitude", 
    y="Latitude",
    color="Metres",
    color_continuous_scale='Viridis',
    hover_data=["Metres"],
    title="Location of hills in Great Britain",
    symbol_sequence=["triangle-up"]
    )
fig.update_traces(marker={'size': 3})
fig.layout.yaxis.scaleanchor="x"
fig.show()

* Let's filter our data to include only hills of at least 700 metres before plotting.

In [7]:
threshold_height = 700
tall_hills_df = df.loc[df["Metres"] >= threshold_height].sort_values("Metres")

fig = px.scatter(
    tall_hills_df, 
    x="Longitude", 
    y="Latitude",
    color="Metres",
    color_continuous_scale='Inferno',
    hover_data=["Metres"],
    title=f"Location of hills above {threshold_height} metres in Great Britain",
    symbol_sequence=["triangle-up"]
    )
fig.layout.yaxis.scaleanchor="x"
fig.show()
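
The filter-and-sort step in the cell above can be tried on a small frame. The sketch below uses made-up names and heights, not rows from the database. Sorting in ascending order places the tallest rows last, and because Plotly generally draws the points of a trace in row order, the tallest hills then end up on top where markers overlap.

```python
import pandas as pd

# Made-up heights for illustration only
toy_df = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Metres": [650.0, 1200.0, 700.0, 950.0],
    }
)

threshold_height = 700

# Keep rows at or above the threshold, then sort so the tallest come last
tall_toy_df = toy_df.loc[toy_df["Metres"] >= threshold_height].sort_values("Metres")

print(tall_toy_df["Name"].tolist())  # ['C', 'D', 'B']
```

Note that `>=` keeps hills exactly at the threshold; use `>` instead if only strictly taller hills are wanted.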