## Imports

In [23]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

## Introduction

When I first started out with my data analysis projects, I was not entirely sure which type of graph was appropriate for exploring my data.
The goal of this notebook is to help you understand which chart to use, and when, depending on the data you have.
I hope this helps you on your analytics journey!

## Line Graph

The line graph is a simple yet effective way to **analyze your data or trends over time**.
For example, take the fruits and vegetables sold at your local grocery store each year from 2017 to 2022.
Let's analyze their trends.



In [24]:
# Yearly totals: each year sums 365 random daily sales of 0 to 49 items,
# so the numbers change every time the cell runs.
xYears = [2017, 2018, 2019, 2020, 2021, 2022]
df = pd.DataFrame(dict(
    xYears=xYears,
    fruitsSold=[sum(np.random.randint(0, 50, 365)) for _ in range(len(xYears))],
    veggiesSold=[sum(np.random.randint(0, 50, 365)) for _ in range(len(xYears))],
))
df['goodsSold'] = df['fruitsSold'] + df['veggiesSold']
df

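|   | xYears | fruitsSold | veggiesSold | goodsSold |
|---|--------|------------|-------------|-----------|
| 0 | 2017   | 9325       | 8730        | 18055     |
| 1 | 2018   | 8807       | 8668        | 17475     |
| 2 | 2019   | 9196       | 8700        | 17896     |
| 3 | 2020   | 9158       | 8753        | 17911     |
| 4 | 2021   | 8707       | 9135        | 17842     |
| 5 | 2022   | 9164       | 8781        | 17945     |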
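The counts above are drawn at random, so every run of the cell produces different numbers. A minimal sketch, assuming you want reproducible data, uses NumPy's seeded `Generator` instead of `np.random.randint`; the seed value `42` is an arbitrary choice, and the resulting numbers will not match the table above.

```python
import numpy as np
import pandas as pd

# Seeded generator: the same seed always yields the same "sales" (42 is arbitrary).
rng = np.random.default_rng(42)

xYears = [2017, 2018, 2019, 2020, 2021, 2022]
df = pd.DataFrame({
    'xYears': xYears,
    # One year of daily sales (0 to 49 items per day), summed per year.
    'fruitsSold': [int(rng.integers(0, 50, 365).sum()) for _ in xYears],
    'veggiesSold': [int(rng.integers(0, 50, 365).sum()) for _ in xYears],
})
df['goodsSold'] = df['fruitsSold'] + df['veggiesSold']
print(df)
```

Re-running the cell with the same seed reproduces the same table, which makes the charts below stable between sessions.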


In [25]:
fig = px.line(df, x='xYears', y=['fruitsSold', 'veggiesSold'], title="Fruits and Veggies Sold Since 2017",
              markers=True, color_discrete_sequence=px.colors.qualitative.G10)
fig.show()

Another example could be tracking profit over the last five years. Line graphs are a simple, quick, and effective way to get a better understanding of how your data changes.

## Bar Charts 

Bar charts are a great way to **compare data across categories**. Let's have a look!

**Note:** bar charts are also a good way to view trends over time.

In [26]:
# Canadian population over the years using bar charts.
data_canada = px.data.gapminder().query("country == 'Canada'")  # load gapminder as a pandas DataFrame and keep Canada's rows
fig = px.bar(data_canada, x='year', y='pop')
fig.show()

I appreciate how good Plotly graphs look. In the next chart, coloring the bars by life expectancy makes its increase over the years easy to see at a glance.

In [27]:
# Let's prettify our last example
fig = px.bar(data_canada, x='year', y='pop',
             hover_data=['lifeExp', 'gdpPercap'], color='lifeExp',  # bar color varies with life expectancy
             labels={'pop': 'population of Canada'}, height=400,
             color_continuous_scale=px.colors.sequential.Viridis)
fig.show()

## Heat Map
Heat maps are a great way to visually **understand the relationship between two factors**. Let's take a look at an example.

Say you are building a productivity app where users log their productivity level at different times of the day. How can we visually find relationships between these factors?

The heat map below shows at a glance on which day, and at what time of day, this user was most productive.


In [28]:
data=[[1, 25, 30, 50, 1], [20, 1, 60, 80, 30], [30, 60, 1, 5, 20]]
fig = px.imshow(data,
                labels=dict(x="Day of Week", y="Time of Day", color="Productivity"),
                x=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'],
                y=['Morning', 'Afternoon', 'Evening'], text_auto=True
               )
fig.update_xaxes(side="top")
fig.show()



## Treemaps

Treemaps allow us to **visualize hierarchical data as a proportion of a whole**.

Here is a great example with countries and their respective populations.

With Plotly, you can click on any part of the graph to zoom in more precisely on what you want to analyze.

In [29]:
df = px.data.gapminder().query("year == 2007")
fig = px.treemap(df, path=[px.Constant("world"), 'continent', 'country'], values='pop',
                  color='lifeExp', hover_data=['iso_alpha'],
                  color_continuous_scale='RdBu',
                  color_continuous_midpoint=np.average(df['lifeExp'], weights=df['pop']))
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()

## Gantt Charts

Gantt charts show **durations over time**. A good use case for this chart is illustrating a project schedule.

Let's take a look.

In [30]:
df = pd.DataFrame([
    dict(Task="Job A", Start='2009-01-01', Finish='2009-02-28', Completion_pct=50),
    dict(Task="Job B", Start='2009-03-05', Finish='2009-04-15', Completion_pct=25),
    dict(Task="Job C", Start='2009-02-20', Finish='2009-05-30', Completion_pct=75)
])

fig = px.timeline(df, x_start="Start", x_end="Finish", y="Task", color="Completion_pct", 
                  color_continuous_scale='Blues')
fig.update_yaxes(autorange="reversed")
fig.show()
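Since each Gantt bar spans from `Start` to `Finish`, its length is the task's duration. A small pandas sketch, reusing the same three jobs, makes those durations explicit, which can be handy for sorting tasks or labeling bars:

```python
import pandas as pd

df = pd.DataFrame([
    dict(Task="Job A", Start='2009-01-01', Finish='2009-02-28'),
    dict(Task="Job B", Start='2009-03-05', Finish='2009-04-15'),
    dict(Task="Job C", Start='2009-02-20', Finish='2009-05-30'),
])

# Parse the date strings, then subtract to get a Timedelta per task.
df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])
df['Duration_days'] = (df['Finish'] - df['Start']).dt.days

print(df[['Task', 'Duration_days']])
```

This gives 58, 41, and 99 days. The subtraction counts the days between the two dates; add 1 if your schedule treats the finish day as a working day. The chart also makes it visible that Job C overlaps both Job A and Job B.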

## Bullet Charts

Bullet charts allow us to **evaluate performance of a metric against a goal**.

Here is the definition from the Plotly documentation:

> Stephen Few's Bullet Chart was invented to replace dashboard gauges and meters, combining both types of charts into simple bar charts with qualitative bars (steps), quantitative bar (bar) and performance line (threshold); all into one simple layout. Steps typically are broken into several values, which are defined with an array.




In [31]:
fig = go.Figure(go.Indicator(
    mode = "number+gauge+delta",
    gauge = {'shape': "bullet"},
    value = 220,
    delta = {'reference': 300},
    domain = {'x': [0, 1], 'y': [0, 1]},
    title = {'text': "Profit"}))
fig.update_layout(height = 250)

fig.show()

In [32]:
fig = go.Figure(go.Indicator(
    mode = "number+gauge+delta", value = 220,
    domain = {'x': [0.1, 1], 'y': [0, 1]},
    title = {'text' :"<b>Profit</b>"},
    delta = {'reference': 200},
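    gauge = {
        'shape': "bullet",
        'axis': {'range': [None, 300]},
        # Performance line (threshold): the goal the metric is measured against.
        'threshold': {
            'line': {'color': "red", 'width': 2},
            'thickness': 0.75,
            'value': 280},
        # Qualitative bands (steps) behind the quantitative bar.
        'steps': [
            {'range': [0, 150], 'color': "lightgray"},
            {'range': [150, 250], 'color': "gray"}]}))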
fig.update_layout(height = 250)
fig.show()

## Scatterplots

Scatterplots allow us to investigate **relationships between quantitative variables**.

They help us spot any **correlation** between those variables.

In [33]:
df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")  # trendline="ols" requires the statsmodels package
fig.show()

In [34]:
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",
                 size='petal_length', hover_data=['petal_width'])
fig.show()

## Histograms

Histograms allow us to understand the **distribution of our data**.

Here is a great [article](https://www.analyticsvidhya.com/blog/2021/05/normal-distribution-an-ultimate-guide/#:~:text=In%20normally%20distributed%20data%2C%20there,standard%20deviations%20of%20the%20mean.) explaining the properties of a normal distribution.

In [35]:
df = px.data.tips()
fig = px.histogram(df, x="total_bill", nbins=20)
fig.show()

## Area Maps

Area maps are maps that **highlight the area** of a specific location.

Here is an example:

In [36]:
fig = go.Figure(go.Scattermapbox(
    fill = "toself",
    lon = [-74, -70, -70, -74], lat = [47, 47, 45, 45],
    marker = { 'size': 10, 'color': "orange" }))

fig.update_layout(
    mapbox = {
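        'style': "open-street-map",  # "stamen-*" styles may no longer render since Stamen's tile hosting ended; this one needs no token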
        'center': {'lon': -73, 'lat': 46 },
        'zoom': 5},
    showlegend = False)

fig.show()
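The four `lon`/`lat` pairs above are the corners of a rectangle listed in drawing order, and `fill = "toself"` joins the last point back to the first to shade the enclosed region. A minimal sketch of a hypothetical helper, `bbox_polygon`, that turns a bounding box into those corner lists plus its center point:

```python
def bbox_polygon(lon_min, lon_max, lat_min, lat_max):
    """Hypothetical helper: corners of a lon/lat box in drawing order
    (top-left, top-right, bottom-right, bottom-left), plus the box center."""
    lon = [lon_min, lon_max, lon_max, lon_min]
    lat = [lat_max, lat_max, lat_min, lat_min]
    center = {'lon': (lon_min + lon_max) / 2, 'lat': (lat_min + lat_max) / 2}
    return lon, lat, center

lon, lat, center = bbox_polygon(-74, -70, 45, 47)
print(lon, lat, center)
```

The resulting lists can be passed as `lon=lon, lat=lat` to `go.Scattermapbox(fill="toself", ...)`, and `center` as the map's `'center'` to keep the shaded box in view.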

## Box-and-Whisker Plots

Box-and-whisker plots show the **distribution of a set of data**.

A histogram is often enough to see the shape of a distribution, but a box-and-whisker plot summarizes it with the median, the quartiles, and potential outliers, and lets you compare several sets of data side by side in the same graph.

In [37]:
df = px.data.tips()
fig = px.box(df, x="time", y="total_bill", points="all")
fig.show()

In [38]:
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

fig = go.Figure()
fig.add_trace(go.Box(y=data, quartilemethod="linear", name="Linear Quartile Mode"))
fig.add_trace(go.Box(y=data, quartilemethod="inclusive", name="Inclusive Quartile Mode"))
fig.add_trace(go.Box(y=data, quartilemethod="exclusive", name="Exclusive Quartile Mode"))
fig.update_traces(boxpoints='all', jitter=0)
fig.show()
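The three traces differ only in how Q1 and Q3 are computed. Following Plotly's description of two of the methods, *exclusive* splits the sorted data at the median and, when the count is odd, leaves the median out of both halves, while *inclusive* puts it in both halves; Q1 and Q3 are then the medians of the lower and upper halves. A plain-Python sketch of those two rules (the `linear` method interpolates and is not reproduced here):

```python
from statistics import median

def quartiles(values, method="exclusive"):
    """Q1 and Q3 by the median-of-halves rule.

    exclusive: for an odd count, the median is left out of both halves.
    inclusive: for an odd count, the median is included in both halves.
    """
    if method not in ("exclusive", "inclusive"):
        raise ValueError("method must be 'exclusive' or 'inclusive'")
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1 and method == "inclusive":
        lower, upper = data[:half + 1], data[half:]
    else:
        # Even counts split cleanly; odd counts drop the middle value.
        lower, upper = data[:half], data[n - half:]
    return median(lower), median(upper)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quartiles(data, "exclusive"))  # (2.5, 7.5)
print(quartiles(data, "inclusive"))  # (3, 7)
```

With an even number of points both rules coincide, which is why the difference only shows up for odd-sized data like the nine values above.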