# Starter Guide to Data Visualisation

There are hundreds of tools for data visualisation and analysis; among the most popular are Excel, Python, R and BI tools (Tableau, Power BI, etc.).

In Python alone, there are several libraries you can use to make interesting and unique visualisations. In this notebook, I will take a brief look at Matplotlib, Seaborn and Plotly and how we can make some common plots with them. We will also look at some additional ways to edit a plot: styling it, arranging subplots with GridSpec, adding text, etc.

**Remember:** This notebook is just meant to give you an introduction to some plots and act as an inspiration for you to explore new libraries and plots.

### A Brief Introduction to the libraries

For those who are new to these libraries: seaborn is built on top of matplotlib, so every seaborn plot is a matplotlib figure under the hood and can be fine-tuned with matplotlib calls. I personally prefer seaborn due to its range of customisations, but you could use matplotlib directly as well.

Furthermore, if you are looking to build an interactive experience for the user, Plotly is the way to go. Again, there are alternatives here, like Bokeh, Marvel, etc. In this notebook, you can interact with the Plotly figures: zoom in by dragging over an area, click a legend entry to hide or show that trace (double-click it to isolate the trace), and double-click the plot area to reset it to the default view.

# Imports

For this demonstration, we will primarily use the Iris dataset for simplicity, plus Plotly's built-in Gapminder dataset for several of the visualisations.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.gridspec as gridspec
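# palettable is a separate package (pip install palettable) that provides the color palettes used in the Colors section
import palettable.scientific.sequential as palette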
import matplotlib as mpl

In [None]:
country_data = px.data.gapminder()
iris = px.data.iris()
canada_data = country_data[country_data['country'] == 'Canada']
brazil_data = country_data[country_data['country'] == 'Brazil']
asia_data = country_data[country_data['continent'] == 'Asia']

# GridSpec

Before we dive into the different plots using various libraries, let's take a look at how to set up plots using GridSpec, which lets us arrange several subplots of different sizes on a single figure. There are several ways to customise your figure layout and GridSpec is my go-to. If there are other methods you use, leave a comment and I'd love to add them here as well.

For this demonstration, we will divide the figure into a 3x4 grid (3 rows, 4 columns) and let some subplots span several cells. You can fit a different figure in each region and size the regions however you like. This is pretty useful when you want a single figure that combines visualisations of different features, organised alongside explanatory text.

In [None]:
fig = plt.figure(figsize=(18, 12))
spec = gridspec.GridSpec(ncols=4, nrows=3, figure=fig)

# fig.suptitle titles the whole figure (plt.title here would create an extra full-size axes)
fig.suptitle('GridSpec plots in a 3x4 grid')

# spec[row, column] selects cells; slices make a subplot span several rows/columns
ax = fig.add_subplot(spec[0, 0])
ax.text(.25, .5, 'Plot on row 1 and column 1')
ax = fig.add_subplot(spec[0, 1:3])
ax.text(.25, .5, 'Plot on row 1 and column 2-3')
ax = fig.add_subplot(spec[0, 3])
ax.text(.25, .5, 'Plot on row 1 and column 4')
ax = fig.add_subplot(spec[1:3, 0])
ax.text(.25, .5, 'Plot on row 2-3 and column 1')
ax = fig.add_subplot(spec[1, 1])
ax.text(.25, .5, 'Plot on row 2 and column 2')
ax = fig.add_subplot(spec[2, 1])
ax.text(.25, .5, 'Plot on row 3 and column 2')
ax = fig.add_subplot(spec[1:3, 2:4])
ax.text(.3, .5, 'Plot on row 2-3 and column 3-4')
plt.show()
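GridSpec isn't the only option. Since matplotlib 3.3, `plt.subplot_mosaic` can describe the same kind of layout as a grid of labels, where repeating a label makes that subplot span the cells it occupies. A sketch of the same 3x4 layout as above (the letter labels are arbitrary):

```python
import matplotlib.pyplot as plt

# Each label is one subplot; repeated labels span several cells
layout = [
    ["A", "B", "B", "C"],
    ["D", "E", "F", "F"],
    ["D", "G", "F", "F"],
]
fig, axd = plt.subplot_mosaic(layout, figsize=(18, 12))
fig.suptitle("subplot_mosaic version of the 3x4 grid")

# axd is a dict mapping each label to its Axes
for label, ax in axd.items():
    ax.text(0.5, 0.5, f"Plot {label}", ha="center", va="center", fontsize=16)
plt.show()
```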

# Styling

Seaborn comes with five built-in styles: darkgrid, whitegrid, dark, white and ticks (matplotlib has its own set of style sheets, which you can list with plt.style.available and apply with plt.style.use). You can use these or customise your own; I will be using some customisations throughout the notebook to give you examples.

In [None]:
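# The Iris data was already loaded above with px.data.iris().
# To load a local copy instead (its column names may differ from the Plotly version):
# iris = pd.read_csv('../input/iris/Iris.csv')
# iris.head(5)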

# Creating Grid and plotting different styles
fig = plt.figure(figsize=(18, 9))
spec = gridspec.GridSpec(ncols=3, nrows=2, figure=fig)

sns.set_style("darkgrid")
ax = fig.add_subplot(spec[0,0])
plt.title("Dark Grid")
sns.stripplot(x="species", y="sepal_length", data=iris)

sns.set_style("dark")
ax = fig.add_subplot(spec[0,1])
plt.title("Dark")
sns.stripplot(x="species", y="sepal_length", data=iris)

sns.set_style("whitegrid")
ax = fig.add_subplot(spec[0,2])
plt.title("White Grid")
sns.stripplot(x="species", y="sepal_length", data=iris)

sns.set_style("white")
ax = fig.add_subplot(spec[1,0])
plt.title("White")
sns.stripplot(x="species", y="sepal_length", data=iris)

sns.set_style("ticks")
ax = fig.add_subplot(spec[1,1])
plt.title("Ticks")
sns.stripplot(x="species", y="sepal_length", data=iris)

plt.tight_layout()
plt.show()

# Despining

In Seaborn and Matplotlib, you have options to despine the figures, i.e. remove some or all of the axis border lines (spines) for a more minimal look.

In [None]:
fig = plt.figure(figsize=(18, 5))
spec = gridspec.GridSpec(ncols=2, nrows=1, figure=fig)

sns.set_style("white")
ax = fig.add_subplot(spec[0,0])
sns.stripplot(x="species", y="sepal_length", data=iris)

sns.set_style("white")
ax1 = fig.add_subplot(spec[0,1])
sns.stripplot(x="species", y="sepal_length", data=iris)

for s in ['top', 'right']:
    ax.spines[s].set_visible(False)
    
for s in ['top', 'right', 'bottom', 'left']:
    ax1.spines[s].set_visible(False)

plt.show()

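Note: If you wish to despine every subplot in the current figure in one call, you could alternatively use the following functions:

- sns.despine() removes the top and right spines
- sns.despine(left=True, bottom=True) removes the left and bottom spines as well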

In the figures above, we wanted to remove different spines from each subplot, so we used a separate loop for each axes instead.

# Colors

The default color schemes provided by seaborn are pretty good. While there are several options available to us, I prefer using my own color schemes. For instance, if I'm performing a country-wise analysis, I like to keep a consistent color for each country across all my plots.

For this purpose, I use the color schemes available at https://jiffyclub.github.io/palettable/scientific/sequential/. Here, you can get palettes of up to 20 colors and choose a sequential or a diverging scheme. In the figures below, I've shown 2 color schemes using 12 colors (one per year in the Canada data), but you can customise them however you like.

In [None]:
fig = plt.figure(figsize=(18, 5))
spec = gridspec.GridSpec(ncols=2, nrows=1, figure=fig)

ax1 = fig.add_subplot(spec[0,0])
plt.bar(canada_data['year'], canada_data['pop'], color=palette.Oslo_12.hex_colors)

ax1 = fig.add_subplot(spec[0,1])
plt.bar(canada_data['year'], canada_data['pop'], color=palette.LaJolla_12.hex_colors)
plt.show()

# Line Plot

In [None]:
sns.set_style('white')
fig = plt.figure(figsize=(15, 4))

# I have also added the matplotlib code for this graph if you want to use that instead
# plt.plot(canada_data['year'], canada_data['lifeExp'])
sns.set(rc={"axes.facecolor":"#283747", "axes.grid":False,'xtick.labelsize':14,'ytick.labelsize':14})
sns.lineplot(data=canada_data, x='year', y='lifeExp', color = "#FF5722", label='Canada')
sns.lineplot(data=brazil_data, x='year', y='lifeExp', color = "#FFEB3B", label='Brazil')
# Anchor the legend's upper-left corner just outside the right edge of the axes
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

# Plotly
figure1 = px.line(canada_data, x="year", y="lifeExp", title='Life expectancy in Canada')
figure1.show()

# Plotly: one line per country in Asia
figure1 = px.line(asia_data, x="year", y="lifeExp", color='country', title='Life expectancy in Asia')
figure1.show()

# Scatter Plot

The figures below show the versatility of both seaborn and Plotly, even on a dataset as basic as Iris.

In [None]:
sns.set_style('white')
fig = plt.figure(figsize=(15, 8))

# We have used some common customisations below. You can add many more!
sns.scatterplot(data=iris, x='sepal_width', y='sepal_length', hue='petal_length', style="species", size='petal_length')
plt.legend(loc=0)
plt.show()

# Plotly
figure1 = px.scatter(iris, x="sepal_width", y="sepal_length", color='petal_length')
figure1.show()
figure2 = px.scatter(iris, x="sepal_width", y="sepal_length", color="species",
                 size='petal_length', hover_data=['petal_width'])
figure2.show()

figure3 = px.scatter(iris, x="sepal_length", y="sepal_width", marginal_x="histogram", marginal_y="rug")
figure3.show()

# Bar Plot

In [None]:
canada_data

In [None]:
americas = country_data[(country_data['continent'] == 'Americas') & (country_data['year']==2007)]

fig = plt.figure(figsize=(15, 8))
sns.barplot(data=canada_data, x='year', y='pop')
plt.show()

# Plotly
figure = px.bar(canada_data, x='year', y='pop',
             hover_data=['lifeExp', 'gdpPercap'], color='lifeExp',
             labels={'pop':'population of Canada'}, height=400)
figure.show()

figure2 = px.bar(americas, y='pop', x='country', text='pop')
figure2.update_traces(texttemplate='%{text:.2s}', textposition='outside')
figure2.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
figure2.show()

# Dist Plot

Here, we have used the figure-level function "displot" in seaborn. You can try to play around with kdeplot, histplot and others (you may also see distplot in older code, but it has been deprecated since seaborn 0.11).

In [None]:
import plotly.figure_factory as ff

# Add histogram data: three normal samples centred at -2, 0 and 2
x1 = np.random.randn(200) - 2
x2 = np.random.randn(200)
x3 = np.random.randn(200) + 2

sns.set_style('white')
hist_data = [x1, x2, x3]
group_labels = ['Group 1', 'Group 2', 'Group 3']
colors = ['#835AF1', '#7FA6EE', '#B8F7D4']

# Wide-form DataFrame (one column per group) so the legend shows the group names
# Histogram with a kernel density estimate (density curve)
sns.displot(pd.DataFrame(dict(zip(group_labels, hist_data))), kde=True)
plt.show()

fig = ff.create_distplot(hist_data, group_labels, colors=colors, bin_size=.25,curve_type='normal', #override default 'kde'
                         show_curve=True)
fig.show()

# Bubble Plot

In [None]:
data2007 = country_data[country_data['year'] == 2007]
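# edgecolor (singular) is used because seaborn sets its own default for it, which would override edgecolors
sns.scatterplot(data=data2007, x="gdpPercap", y="lifeExp", size="pop", hue="continent", palette="viridis", edgecolor="black", alpha=0.5, sizes=(10, 1000))
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')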
plt.show()

fig = px.scatter(data2007, x="gdpPercap", y="lifeExp",size="pop", color="continent", hover_name="country", log_x=True, size_max=60)
fig.show()

# 3D Plot

Seaborn has no 3D plotting functions, but matplotlib can draw 3D scatter plots through its mplot3d toolkit, and Plotly has px.scatter_3d.

In [None]:
fig = px.scatter_3d(iris, x='sepal_length', y='sepal_width', z='petal_width',color='petal_length', symbol='species')
fig.show()

# Conclusion

This notebook only covers a few of the plots possible with the three libraries explored. I encourage you to take these as a reference and delve into the thousands of possibilities available to you with Python libraries alone.

## How can you help?

If there are any libraries or plots that you would like me to cover, leave a comment and I'll be sure to add them. Also, if you liked this notebook, I'd love it if you would leave an upvote :)