In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Introduction

Hi, and welcome, fellow visualization enthusiast!

Reading a visualization is fun, but it is even more fun when the visualization is beautiful. Personally, I believe a good data visualization not only gives a meaningful insight but is also pleasant to look at. It's not only about presenting your data insightfully, but also about presenting it to your audience in a way that is easy on the eyes. It's much like software: a product that focuses only on functionality and ignores UI/UX ends up looking, well, meh.

So in this notebook I want to share with you, the viewers, my experience of how to make your visualizations stand out, aimed at beginners. A little disclaimer before we start: I'm also still learning about visualization, and I appreciate your feedback!

# Colors

Color is very important in any visual representation, whether it is software, a banner, or, in this case, a data visualization.

## Don't Choose Default Color Palette

Make your own! (or borrow one from a color palette generator, hehe)

The default colors in almost every data visualization package (for example Matplotlib, Seaborn, etc.) kind of ... suck. They look dead, pale, or ... you name it. See the color palette below.

In [None]:
import seaborn as sns

colors_default = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']

sns.palplot(colors_default)

See, it looks so pale and dead. So how do we choose good colors?

Well, I'm no expert on color, but from my experience a good color is one that doesn't put too much stress on your eyes, i.e. one that isn't too high-contrast. If you want to learn the theory behind color, I suggest learning how colors are chosen from a UI/UX perspective. Since we are not UI/UX designers, let's use our intuition OR a color palette generator! You can use a website called [coolors.co](https://coolors.co/generate).
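By the way, the "default" palette shown above is matplotlib's standard color cycle (the `tab10` colors). Once you have a palette you like, you can install it as the default color cycle so every new plot picks it up without passing `color=` each time. A minimal sketch, assuming a recent matplotlib and the `cycler` package it ships with; `my_palette` is just a copy of the `colors_mix` list defined later in this notebook.

```python
import matplotlib as mpl
from cycler import cycler

# matplotlib's built-in default color cycle (the pale palette shown above)
default_cycle = mpl.rcParamsDefault['axes.prop_cycle'].by_key()['color']
print(default_cycle)

# Your own palette, e.g. copied from a palette generator (same values as colors_mix)
my_palette = ["#17869E", '#264D58', '#179E66', '#D35151', '#E9DAB4', '#E9B4B4', '#D3B651', '#6351D3']

# Every plot created after this line cycles through my_palette instead of the defaults
mpl.rcParams['axes.prop_cycle'] = cycler(color=my_palette)
```

This changes the global settings for the rest of the session; if you only want the palette for one figure, `plt.rc_context({'axes.prop_cycle': cycler(color=my_palette)})` used as a `with` block scopes it.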

Tip: 
* My personal workflow is to split the color palette into groups of shades, as shown below
* **DO NOT** use absolute black (#000000); shift it slightly toward gray, for example #333333
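Matplotlib draws text, tick labels and axis lines in pure black by default. One way to apply the "no absolute black" tip everywhere at once is to override those defaults through `rcParams`. A small sketch, assuming you are happy with `#333333` as your off-black (any dark gray works):

```python
import matplotlib as mpl

OFF_BLACK = '#333333'  # a softened black instead of pure #000000

# Use the off-black for text, axis labels, axes edges (spines) and tick marks/labels
mpl.rcParams.update({
    'text.color': OFF_BLACK,
    'axes.labelcolor': OFF_BLACK,
    'axes.edgecolor': OFF_BLACK,
    'xtick.color': OFF_BLACK,
    'ytick.color': OFF_BLACK,
})
```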

In [None]:
import seaborn as sns

colors_aft = ['#2C302E', '#474A48', '#909590', '#9AE19D', '#537A5A']

colors_blue = ["#132C33", "#264D58", '#17869E', '#51C4D3', '#B4DBE9']
colors_dark = ["#1F1F1F", "#313131", '#636363', '#AEAEAE', '#DADADA']
colors_red = ["#331313", "#582626", '#9E1717', '#D35151', '#E9B4B4']
colors_mix = ["#17869E", '#264D58', '#179E66', '#D35151', '#E9DAB4', '#E9B4B4', '#D3B651', '#6351D3']

sns.palplot(colors_aft)

sns.palplot(colors_blue)
sns.palplot(colors_dark)
sns.palplot(colors_red)
sns.palplot(colors_mix)


Here's the difference between the default palette and my cherry-picked colors:

In [None]:
sns.palplot(colors_mix)
sns.palplot(colors_default)

Okay, let's compare the difference it makes when you choose more lively colors!

In [None]:
import pandas as pd
import matplotlib.pyplot as plt 

data = pd.read_csv("/kaggle/input/titanic/train.csv")
df_1 = data.groupby('Embarked').count()['PassengerId'].reset_index()
df_2 = data.groupby(['Sex', 'Embarked']).count()['PassengerId'].unstack().reset_index()

fig, ax = plt.subplots(1, 3, figsize=(16, 6))
ax[0].bar(df_1.Embarked, df_1.PassengerId)
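df_2.plot(x='Sex', kind='bar', ax=ax[1])  # use Sex for the x tick labels instead of 0/1
sns.kdeplot(data=data, x='Fare', fill=True, ax=ax[2])  # fill replaces the deprecated shade argument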

plt.show()

Now let's use our own colors. Another tip: you can adjust the opacity of your visualization with the `alpha` argument. My rule of thumb is to set alpha to 0.8, but of course it is up to you. Adding some transparency makes the colors more "passive" (I don't know how else to describe it), so they are a little more pleasant to look at.

Tip: 
* add opacity (alpha)
* add edgecolor, but **DO NOT** use absolute black
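Why does `alpha` make colors feel calmer? When a color with opacity α is drawn over an opaque background, what you see is α·color + (1 − α)·background, so on a white canvas every channel is pulled toward white. A small worked sketch, assuming a plain white background; `blend_over_background` is a hypothetical helper written for this illustration, not a matplotlib function.

```python
import numpy as np
from matplotlib.colors import to_rgb, to_hex

def blend_over_background(color, alpha, background='#FFFFFF'):
    """Hypothetical helper: the solid color seen when `color` is drawn with `alpha` over `background`."""
    fg = np.array(to_rgb(color))
    bg = np.array(to_rgb(background))
    # Standard "over" compositing onto an opaque background
    return to_hex(alpha * fg + (1 - alpha) * bg)

base = '#17869E'  # colors_blue[2] from this notebook
for a in (1.0, 0.8, 0.3):
    print(a, blend_over_background(base, a))
```

Lower alpha gives a lighter, less saturated solid color, which is why the kde plot below, drawn with `alpha=0.3`, looks so soft.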

In [None]:
import pandas as pd
import matplotlib.pyplot as plt 

data = pd.read_csv("/kaggle/input/titanic/train.csv")
df_1 = data.groupby('Embarked').count()['PassengerId'].reset_index()
df_2 = data.groupby(['Sex', 'Embarked']).count()['PassengerId'].unstack().reset_index()

fig, ax = plt.subplots(1, 3, figsize=(16, 6))

ax[0].bar(df_1.Embarked, df_1.PassengerId, color=colors_blue[2], alpha=0.8, edgecolor=colors_dark[1])
df_2.plot(x='Sex', kind='bar', ax=ax[1], color=colors_mix[0:3], alpha=0.8, edgecolor=colors_dark[1])
sns.kdeplot(data=data, x='Fare', fill=True, ax=ax[2], color=colors_blue[2], alpha=0.3)

plt.show()

See! Just by choosing the right colors you have already improved your viz! And that's only from getting the colors right, so let's move on.

# Don't Just Use a Title!

One of the most important things in data visualization is how you tell the story behind it, right? Well then, create a story!

Give your audience a short overview of what you are presenting in your visualization. You can add one with `suptitle` or with a text element placed next to the title. This way the visualization doesn't feel too "lonely", and the audience also gets a little information about what you are showing.

What matters is that you don't want your viewers to think too hard about your visualization. A short overview helps them understand at least the context of what you are visualizing.

Let's use another example.

In [None]:
data = pd.read_csv("/kaggle/input/world-happiness-report-2021/world-happiness-report-2021.csv")
SEA = data[data['Regional indicator'] == "Southeast Asia"]['Country name'].to_list()
def getSea(row):
    # Group countries into Indonesia, other Southeast Asian (SEA) countries, and the rest
    if row['Country name'] == "Indonesia":
        return "Indonesia"
    elif row['Country name'] in SEA:
        return "SEA"
    else:
        return "Other"

In [None]:
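df = data
# Label each country as Indonesia, SEA, or Other
df['Countries'] = df.apply(getSea, axis=1)
meanx = df['Social support'].mean()
meany = df['Healthy life expectancy'].mean()
# .iloc[0] turns the one-row selection into a plain number for ax.text
singx = df.loc[df['Country name'] == 'Singapore', 'Social support'].iloc[0]
singy = df.loc[df['Country name'] == 'Singapore', 'Healthy life expectancy'].iloc[0]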

fig, ax = plt.subplots(figsize=(18, 8), dpi=75)

sns.scatterplot(
    data=df, 
    x='Social support', 
    y='Healthy life expectancy', 
    size='Logged GDP per capita', 
    ax=ax, sizes=(5, 1000),
    alpha=0.9,
    hue='Countries',
    palette=[colors_dark[4], colors_blue[1], colors_red[2]]
)
linex = ax.axvline(meanx, linestyle='dotted', color=colors_dark[1], alpha=0.8, label='Average')
liney = ax.axhline(meany, linestyle='dotted', color=colors_dark[1], alpha=0.8)
text  = ax.text(
    s="Singapore",
    x=singx-0.013,
    y=singy+1.5,
    color=colors_dark[2]
)

# Focus only here

ax.legend(bbox_to_anchor=(1.05, 1), ncol=1, borderpad=1, frameon=False, fontsize=12)
ax.set_xlabel("Social support", fontsize=14, labelpad=10, fontweight='bold', color=colors_dark[0])
ax.set_ylabel("Healthy life expectancy", fontsize=14, labelpad=10, fontweight='bold', color=colors_dark[0])
xmin, xmax = ax.get_xlim()
ymin, ymax = ax.get_ylim()

plt.title("Healthy Life Expectancy Vs Social Support", fontsize=18, color=colors_dark[0])
plt.show()

So, what's wrong with the visualization above? It's confusing for viewers, right?

The problems are: 
1. The title is confusing
2. Viewers don't understand the context
3. Viewers need to think hard in order to understand the visualization

To help your viewers understand what you are visualizing, you can: 
1. Use an appropriate title
2. Use a subtitle to give a short overview of what you are visualizing and what you want to point out
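For comparison, the same headline-plus-overview layout can also be built from matplotlib's own title functions: `fig.suptitle` for the headline and a left-aligned `ax.set_title` for the overview. A minimal sketch on made-up random data (the points and the wording are placeholders); `x=0.125` matches matplotlib's default left subplot margin (`figure.subplot.left`), which is an assumption if you change the layout.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y = rng.normal(size=50), rng.normal(size=50)  # placeholder data

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x, y, color='#17869E', alpha=0.8)

# Headline: a figure-level title, left-aligned with the left edge of the axes
fig.suptitle("Headline: what the chart is about", x=0.125, ha='left',
             fontsize=20, fontweight='bold', color='#1F1F1F')
# Overview: a smaller, lighter, left-aligned axes title under the headline
ax.set_title("One or two sentences telling the viewer what to notice.",
             loc='left', fontsize=12, color='#636363')
plt.show()
```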

So let's apply a quick fix by adjusting the title and adding a short subtitle.

In [None]:
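df = data
# Label each country as Indonesia, SEA, or Other
df['Countries'] = df.apply(getSea, axis=1)
meanx = df['Social support'].mean()
meany = df['Healthy life expectancy'].mean()
# .iloc[0] turns the one-row selection into a plain number for ax.text
singx = df.loc[df['Country name'] == 'Singapore', 'Social support'].iloc[0]
singy = df.loc[df['Country name'] == 'Singapore', 'Healthy life expectancy'].iloc[0]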

fig, ax = plt.subplots(figsize=(18, 8), dpi=75)

sns.scatterplot(
    data=df, 
    x='Social support', 
    y='Healthy life expectancy', 
    size='Logged GDP per capita', 
    ax=ax, sizes=(5, 1000),
    alpha=0.9,
    hue='Countries',
    palette=[colors_dark[4], colors_blue[1], colors_red[2]]
)
linex = ax.axvline(meanx, linestyle='dotted', color=colors_dark[1], alpha=0.8, label='Average')
liney = ax.axhline(meany, linestyle='dotted', color=colors_dark[1], alpha=0.8)
text  = ax.text(
    s="Singapore",
    x=singx-0.013,
    y=singy+1.5,
    color=colors_dark[2]
)

# Focus only here

ax.legend(bbox_to_anchor=(1.05, 1), ncol=1, borderpad=1, frameon=False, fontsize=12)
ax.set_xlabel("Social support", fontsize=14, labelpad=10, fontweight='bold', color=colors_dark[0])
ax.set_ylabel("Healthy life expectancy", fontsize=14, labelpad=10, fontweight='bold', color=colors_dark[0])
xmin, xmax = ax.get_xlim()
ymin, ymax = ax.get_ylim()

plt.text(s="Social support, Healthy life expectancy\nand GDP per Capita", ha='left', x=xmin, y=ymax*1.04, fontsize=24, fontweight='bold', color=colors_dark[0])
plt.title("It seems that Indonesia still falls in the third quadrant along with 3 other SEA countries, which still have room for improvement\nSingapore has the best score among SEA countries", loc='left', fontsize=13, color=colors_dark[2])  
plt.show()

As you can see, I used `plt.text` instead of `suptitle`. It's a little hack of mine, since `Text` is a lot more flexible in my opinion.

So yeah, it's a bit better now. The real challenge, for me and for you, is coming up with a good overview.

Tip: 
1. You can use ax.get_xlim() and ax.get_ylim() to get the axis limits. This way you can position your text, title, etc. more neatly
2. Use different colors for the title and the subtitle, but not too different
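An alternative to computing positions from `get_xlim()`/`get_ylim()` (e.g. `y=ymax*1.04`) is to place text in axes-fraction coordinates with `transform=ax.transAxes`, where (0, 0) is the bottom-left corner of the plotting area and (1, 1) the top-right. The text then stays put no matter what the data limits are. A sketch; `add_headline` is a hypothetical helper, and the 1.02/1.08 offsets are tuned by eye for this figure size.

```python
import matplotlib.pyplot as plt

def add_headline(ax, title, subtitle, title_color='#1F1F1F', subtitle_color='#636363'):
    """Hypothetical helper: put a bold title and a lighter subtitle above an Axes.

    Coordinates are axes fractions (transform=ax.transAxes), so the layout
    does not depend on the data limits.
    """
    ax.text(0, 1.08, title, transform=ax.transAxes, ha='left', va='bottom',
            fontsize=20, fontweight='bold', color=title_color)
    ax.text(0, 1.02, subtitle, transform=ax.transAxes, ha='left', va='bottom',
            fontsize=12, color=subtitle_color)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot([0, 1, 2], [10, 30, 20], color='#17869E')
add_headline(ax, "Headline goes here", "A short overview of what to look at.")
plt.show()
```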

# Setting Up Your Viz

The thing I love about matplotlib is that it is highly customizable. Almost anything you see in your plot can be customized, IF you read the documentation. If you want to change the default look of your matplotlib plots, you can always search Google for it. In this section I want to share how I set up my visualizations.

## Grid

The default grid sucks. The problem with the default matplotlib grid is that its alpha is 1, it is drawn on both axes, and it is drawn on top of bars and scatter markers. Look at the visualization below.

That grid makes the data on the canvas harder to see, especially in scatter plots. And on a bar plot the default grid is even worse.

In [None]:
import matplotlib.pyplot as plt 
import pandas as pd 

data = pd.read_csv("/kaggle/input/iris/Iris.csv")
df = data.groupby('Species').mean()['SepalLengthCm']

fig, ax = plt.subplots(1, 2, figsize=(16, 8))

ax[0].scatter(data.SepalLengthCm, data.SepalWidthCm)
ax[0].grid()

ax[1].bar(df.index, df.values)
ax[1].grid()

plt.show()

To fix this, you can do three things:
* Move the grid behind the data with `Axes.set_axisbelow(True)`
* Lower its opacity (alpha)
* Choose which axis gets gridlines (`axis='x'` or `axis='y'`)

Tip: 
1. When choosing alpha, make sure the grid is barely noticeable
2. If the bars are vertical, use y-axis gridlines; if they are horizontal, use x-axis gridlines
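If you like these grid settings, you don't have to repeat them on every Axes: matplotlib exposes the same options as `rcParams`, so you can set them once at the top of a notebook. A sketch, assuming vertical bar charts (hence `'y'`) and an alpha of 0.4; adjust both to taste.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.rcParams.update({
    'axes.grid': True,        # turn the grid on for every new Axes
    'axes.grid.axis': 'y',    # horizontal gridlines only (suits vertical bars)
    'axes.axisbelow': True,   # draw gridlines and ticks behind the data
    'grid.alpha': 0.4,        # faint gridlines
})

fig, ax = plt.subplots()
ax.bar(['a', 'b', 'c'], [3, 5, 4], color='#17869E')
plt.show()
```

These are global settings; wrap the plotting code in `with plt.rc_context({...}):` instead if you only want them for one figure.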

Let's fix that.

In [None]:
import matplotlib.pyplot as plt 
import pandas as pd 

data = pd.read_csv("/kaggle/input/iris/Iris.csv")
df = data.groupby('Species').mean()['SepalLengthCm']

fig, ax = plt.subplots(1, 2, figsize=(16, 8))

ax[0].scatter(data.SepalLengthCm, data.SepalWidthCm)
ax[0].grid(alpha=0.4)
ax[0].set_axisbelow(True)

ax[1].bar(df.index, df.values)
ax[1].grid(axis='y', alpha=0.4)
ax[1].set_axisbelow(True)

plt.show()

## Axes

Did you know that you can remove the axis spines from your plot? Yes! Sometimes you want to remove some of them, for example the top and right spines. Removing them can make your viz feel more "open" (again, I don't know how else to describe it), but then again it is up to you. Let's use the previous example.
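Seaborn also has a shortcut for this: `sns.despine` hides the top and right spines by default, and `left=True` / `bottom=True` remove those sides too. A small sketch on placeholder data that reproduces the spine changes from the cell below:

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(1, 2, figsize=(12, 4))
ax[0].scatter([1, 2, 3], [3, 1, 2], color='#17869E')     # placeholder data
ax[1].bar(['a', 'b', 'c'], [3, 5, 4], color='#17869E')   # placeholder data

sns.despine(ax=ax[0])             # remove the top and right spines (the defaults)
sns.despine(ax=ax[1], left=True)  # also remove the left spine for the bar chart
plt.show()
```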

In [None]:
import matplotlib.pyplot as plt 
import pandas as pd 

data = pd.read_csv("/kaggle/input/iris/Iris.csv")
df = data.groupby('Species').mean()['SepalLengthCm']

fig, ax = plt.subplots(1, 2, figsize=(16, 8))

ax[0].scatter(data.SepalLengthCm, data.SepalWidthCm)
ax[0].grid()
ax[0].spines['right'].set_visible(False)
ax[0].spines['top'].set_visible(False)

ax[1].bar(df.index, df.values)
ax[1].grid()
ax[1].spines['right'].set_visible(False)
ax[1].spines['top'].set_visible(False)
ax[1].spines['left'].set_visible(False)


plt.show()

# Final

To wrap up this guide, let's rework the previous examples using everything I've shared so far.

List of things to change: 
* Colors
* Title and subtitle
* Grid
* Spines

In [None]:
import matplotlib.pyplot as plt 
import pandas as pd 

data = pd.read_csv("/kaggle/input/iris/Iris.csv")
df = data.groupby('Species').mean()['SepalLengthCm']

fig, ax = plt.subplots(figsize=(16, 8))

sns.scatterplot(x=data.SepalLengthCm, y=data.SepalWidthCm, hue=data.Species, palette=colors_mix[0:3], alpha=1, ax=ax)

# Change grid to back and set opacity
ax.grid(alpha=0.2)
ax.set_axisbelow(True)

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

# Label
ax.set_xlabel("Sepal Length Cm", fontsize=14, labelpad=10, fontweight='bold', color=colors_dark[0])
ax.set_ylabel("Sepal Width Cm", fontsize=14, labelpad=10, fontweight='bold', color=colors_dark[0])

xmin, xmax = ax.get_xlim()
ymin, ymax = ax.get_ylim()

# Title
plt.text(s="Stand Out Your Viz | Iris Dataset", ha='left', x=xmin, y=ymax*1.108, fontsize=24, color=colors_dark[0])
plt.text(s="How to Differentiate Iris Species By Its Size", ha='left', x=xmin, y=ymax*1.07, fontsize=24, fontweight='bold', color=colors_dark[0])
plt.title("This visualization shows that Iris species can be identified by their sepal sizes\nIris-setosa tends to have a relatively short sepal length but a wider sepal width", loc='left', fontsize=13, color=colors_dark[2]) 

plt.show()

In [None]:
import matplotlib.pyplot as plt 
import pandas as pd 

data = pd.read_csv("/kaggle/input/iris/Iris.csv")
df = data.groupby('Species').mean()['SepalLengthCm']

fig, ax = plt.subplots(figsize=(16, 8))

bars = ax.bar(df.index, df.values, color=colors_dark[4], edgecolor=colors_dark[3], alpha=0.7)

# You can set individual bars colors, alpha, etc
bars[2].set_alpha(0.7)
bars[2].set_color(colors_mix[2])
bars[2].set_edgecolor(colors_dark[0])

ax.grid(axis='y', alpha=0.2)
ax.set_axisbelow(True)

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)

# Label
ax.set_xlabel("Species", fontsize=14, labelpad=10, fontweight='bold', color=colors_dark[0])
ax.set_ylabel("Average Sepal Length Cm", fontsize=14, labelpad=10, fontweight='bold', color=colors_dark[0])

xmin, xmax = ax.get_xlim()
ymin, ymax = ax.get_ylim()

# Title
plt.text(s="Stand Out Your Viz | Iris Dataset", ha='left', x=xmin, y=ymax*1.18, fontsize=24, color=colors_dark[0])
plt.text(s="How to Differentiate Iris Species By Its Size", ha='left', x=xmin, y=ymax*1.12, fontsize=24, fontweight='bold', color=colors_dark[0])
plt.title("This visualization shows that Iris species can be identified by their sepal sizes\nOn average, Iris-virginica has the longest sepal length", loc='left', fontsize=13, color=colors_dark[2]) 

plt.show()


# That's It

I guess that's it: my guide on how to make your visualizations stand out more. There is still a lot you can do to improve on this, but I hope this notebook gives you the basic knowledge to make a beautiful viz.

Things you can learn more about: 
* Choosing the right plot
* Emphasizing the purpose of your plot by highlighting what you want viewers to see
* Storytelling
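As a starting point for highlighting, here is a sketch that generalizes the last bar chart: instead of picking `bars[2]` by hand, it mutes every bar and gives the accent color to the largest value, found with `np.argmax`. The labels and values are made-up placeholders, and `bar_label` assumes matplotlib 3.4 or newer.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D']              # placeholder categories
values = np.array([4.2, 5.9, 5.1, 4.8])    # placeholder values

muted, accent = '#DADADA', '#17869E'
# Every bar muted except the one we want the viewer to notice (here: the largest)
colors = [muted] * len(values)
colors[int(np.argmax(values))] = accent

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(labels, values, color=colors, edgecolor='#636363', alpha=0.8)
ax.bar_label(bars, fmt='%.1f', padding=3)  # print each value on top of its bar
plt.show()
```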

Hope this helps! If you have any criticism, feedback, or something you'd like to add, please feel free to comment!