<a href="https://colab.research.google.com/github/gonzalo711/-DAPT-Project-2-FIFA-Money-Ball/blob/main/1_plotly.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Plotly

- Plotly is an open-source Python library for interactive data visualization, which makes it a helpful tool for exploring and understanding data. Its graph objects (`plotly.graph_objects`) are the building blocks of every figure, and `plotly.express` is a high-level, easy-to-use interface built on top of them. Plotly can draw many types of charts, including scatter plots, line charts, bar charts, box plots, histograms and pie charts.



_Why plotly over other visualization tools or libraries?_

1. Plotly's hover tooltips make it easier to spot outliers or anomalies among a large number of data points.
2. Its charts are visually attractive and suit a wide range of audiences.
3. It allows extensive customization of our graphs, which makes our plots more meaningful and understandable for others.
4. It is user-friendly and interactive.

## Getting started with Plotly:

#### Plotly sub-modules:

There are three main modules in Plotly. They are:

- **plotly.plotly** : acts as the interface between the local machine and Plotly. It contains functions that require a response from Plotly’s server. Since Plotly version 4, this module lives in the separate `chart-studio` package as `chart_studio.plotly`.

- **plotly.graph_objects** : contains the objects (Figure, layout, data, and the definitions of plots such as scatter plots and line charts) that are responsible for creating the plots. A Figure can be represented either as a dict or as an instance of plotly.graph_objects.Figure, and it is serialized as JSON before being passed to plotly.js. The figure printed later in this section shows this structure.

- **plotly.tools** : provides a variety of utility functions and classes that help simplify and enhance the process of creating visualizations with Plotly.


Finally, there is another sub-module called plotly.express that creates the entire Figure at once. It uses graph_objects internally and returns a graph_objects.Figure instance.

Throughout this tutorial we'll mostly use __plotly.express__!

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px

In [2]:
# import dataset

superstore = pd.read_csv("Sample - Superstore.csv")
superstore.head(3)

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,CA-2016-152156,08/11/2016,11/11/2016,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0.0,41.9136
1,2,CA-2016-152156,08/11/2016,11/11/2016,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3,0.0,219.582
2,3,CA-2016-138688,12/06/2016,16/06/2016,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,90036,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,2,0.0,6.8714


##### Inspecting a figure's metadata:

In [3]:
# Creating the Figure instance
fig = px.line(x=[1,2, 3], y=[1, 2, 3])

# printing the figure instance
print(fig)

fig.show()



Figure({
    'data': [{'hovertemplate': 'x=%{x}<br>y=%{y}<extra></extra>',
              'legendgroup': '',
              'line': {'color': '#636efa', 'dash': 'solid'},
              'marker': {'symbol': 'circle'},
              'mode': 'lines',
              'name': '',
              'orientation': 'v',
              'showlegend': False,
              'type': 'scatter',
              'x': array([1, 2, 3]),
              'xaxis': 'x',
              'y': array([1, 2, 3]),
              'yaxis': 'y'}],
    'layout': {'legend': {'tracegroupgap': 0},
               'margin': {'t': 60},
               'template': '...',
               'xaxis': {'anchor': 'y', 'domain': [0.0, 1.0], 'title': {'text': 'x'}},
               'yaxis': {'anchor': 'x', 'domain': [0.0, 1.0], 'title': {'text': 'y'}}}
})


- Figures are represented as trees whose root node has three top-level attributes (data, layout and frames); the named nodes below them are called ‘attributes’. In the example above, layout.legend is a nested dictionary: legend is a key inside the layout dictionary, and its value is itself a dictionary.

- The plotly.tools module contains utility functions that can enhance the Plotly experience.

In [4]:
fig["layout"]

Layout({
    'legend': {'tracegroupgap': 0},
    'margin': {'t': 60},
    'template': '...',
    'xaxis': {'anchor': 'y', 'domain': [0.0, 1.0], 'title': {'text': 'x'}},
    'yaxis': {'anchor': 'x', 'domain': [0.0, 1.0], 'title': {'text': 'y'}}
})

In [5]:
fig.show()

#### Line Plot ( _px.line() function_ ):

This function creates a line plot. When a pandas DataFrame is passed, each row of data_frame is represented as a vertex of a polyline mark in 2D space.

##### Parameters and Syntax:

##### Syntax:
plotly.express.line(data_frame=None, x=None, y=None, line_group=None, color=None, line_dash=None, hover_name=None, hover_data=None, title=None, template=None, width=None, height=None)

##### Parameters:

- data_frame: A DataFrame, array-like or dict. It needs to be passed for column names to be used in the other parameters.

- x, y: Each of these parameters is either the name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks along the x and y axes in Cartesian coordinates, respectively.

- color: Values from this column or array_like are used to assign colors to marks.

- line_group: Values from this column or array_like are used to group rows of data_frame into lines.

- line_dash: Values from this column or array_like are used to assign dash patterns to lines.

- hover_name: Values from this column or array_like appear in bold in the hover tooltip.

- hover_data: Either a list of column names in data_frame, or a dict with column names as keys and, as values, True (default formatting), False (hide the column), a formatting string such as ':.3f', list-like data to appear in the hover tooltip, or a tuple with a bool or formatting string as first element and list-like data as second element. Values from these columns appear as extra data in the hover tooltip.

In [None]:
superstore.head()

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,State,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales
0,1,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96
1,2,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94
2,3,CA-2017-138688,12/06/2017,16/06/2017,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62
3,4,US-2016-108966,11/10/2016,18/10/2016,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,5,US-2016-108966,11/10/2016,18/10/2016,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368


In [6]:
superstore.isnull().sum()

Row ID           0
Order ID         0
Order Date       0
Ship Date        0
Ship Mode        0
Customer ID      0
Customer Name    0
Segment          0
Country          0
City             0
State            0
Postal Code      0
Region           0
Product ID       0
Category         0
Sub-Category     0
Product Name     0
Sales            0
Quantity         0
Discount         0
Profit           0
dtype: int64

In [7]:
superstore.dtypes

Row ID             int64
Order ID          object
Order Date        object
Ship Date         object
Ship Mode         object
Customer ID       object
Customer Name     object
Segment           object
Country           object
City              object
State             object
Postal Code        int64
Region            object
Product ID        object
Category          object
Sub-Category      object
Product Name      object
Sales            float64
Quantity           int64
Discount         float64
Profit           float64
dtype: object

In [8]:
# The dates are stored as DD/MM/YYYY, so parse them day-first to avoid mixing up day and month
superstore["Order Date"] = pd.to_datetime(superstore["Order Date"], dayfirst=True)
superstore["Ship Date"] = pd.to_datetime(superstore["Ship Date"], dayfirst=True)
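
A string such as 08/11/2016 is ambiguous: it can be read as 8 November or as 11 August. The sketch below, using two date strings in the same style as the dataset, shows how an explicit format or `dayfirst=True` removes the ambiguity. The variable names are illustrative.

```python
import pandas as pd

raw = pd.Series(["08/11/2016", "16/06/2016"])  # day/month/year strings

# An explicit format leaves no room for guessing
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
print(parsed.dt.month.tolist())

# For a single ambiguous string, the default interpretation is month-first
month_first = pd.to_datetime("08/11/2016")
day_first = pd.to_datetime("08/11/2016", dayfirst=True)
print(month_first.month, day_first.month)
```

An explicit `format` is the stricter choice, since it raises an error on any value that does not match; `dayfirst=True` is more forgiving.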



In [9]:
superstore.sort_values(by='Order Date', inplace = True)

plot = px.line(
    superstore,
    x = 'Order Date',
    y = 'Sales',
    title= "Sales by Order Date",
    markers=True)  # markers is a boolean flag: True draws a marker on every data point of the line

plot.show()

In [10]:
# Hover mode "x": hovering shows a tooltip for every trace at the same x position

plot = px.line(
    superstore.sort_values(by="Order Date"),
    x = 'Order Date',
    y = 'Sales',
    title= " Sales by Order Date",
    color = 'Segment')

plot.update_layout(hovermode = "x")
plot.show()

In [None]:
# Hover mode "x unified": a single tooltip lists all traces at the hovered x position

plot = px.line(
    superstore.sort_values(by="Order Date"),
    x = 'Order Date',
    y = 'Sales',
    title= " Sales by Order Date",
    color = 'Segment')

plot.update_layout(hovermode = "x unified")
plot.show()

In [None]:
# Update the figure: draw markers on the lines and reset the hover settings to their defaults

plot.update_traces(mode="markers+lines", hovertemplate=None)
plot.update_layout(hovermode = None)

##### Bar Charts:

In [None]:
# plotting the bar chart

category_sales = (
    superstore
    .groupby(["Segment", "Category"])
    .agg({"Sales":"sum"})
    .sort_values(by="Sales", ascending=False)
    .reset_index()
)

fig = px.bar(category_sales, x="Segment", y="Sales", color = "Category")

# showing the plot
fig.show()

In [None]:
category_sales

Unnamed: 0,Segment,Category,Sales
0,Consumer,Technology,401011.665
1,Consumer,Furniture,387696.258
2,Consumer,Office Supplies,359352.608
3,Corporate,Technology,244041.837
4,Corporate,Office Supplies,224130.536
5,Corporate,Furniture,220321.7018
6,Home Office,Technology,182402.371
7,Home Office,Office Supplies,121939.19
8,Home Office,Furniture,120640.6159


##### Grouped Bar Chart:



In [None]:
fig = px.bar(
    category_sales,
    x="Sales",
    y="Category",
    color = "Segment",
    barmode = "group",
    orientation="h",
    text='Sales' )

fig.update_traces(textposition="outside")
fig.show()

##### Stacked Bar Chart:

- A stacked bar chart uses bars to compare categories of data while also showing how each whole breaks down into parts. Each bar represents a whole, and the segments within it represent the different parts or categories of that whole.

In [None]:
# plotting the bar chart
fig = px.bar(category_sales, x="Category", y="Sales", color = "Segment", barmode="stack")

# showing the plot
fig.show()


##### Percentage Stacked Bar Chart:

In [None]:
# compute each segment's share of sales within its category:
category_sales["segment_share"] = category_sales['Sales'] / category_sales.groupby('Category')['Sales'].transform('sum')

# plotting the bar chart
fig = px.bar(category_sales, x="Category", y="segment_share", color = "Segment", barmode="stack",  text='segment_share', title="Segment share in each Category")
fig.show()

##### Histograms:

A histogram groups data points into successive numerical intervals (bins) and draws one rectangle per bin, whose area is proportional to the frequency of the variable in that interval. It resembles a vertical bar chart, except that there are no gaps between the bars.

In [None]:
# Technology is the Category with the highest sales, so let's analyze the distribution of the number of orders per day.

# Number of distinct Technology orders on each day (the "Order ID" column now holds that count)
tech_supplies = superstore[superstore["Category"] == "Technology"].groupby("Order Date").agg({"Order ID": pd.Series.nunique}).reset_index()
# Number of days on which each daily order count occurred
orders_quantity_daily = pd.DataFrame(tech_supplies.groupby("Order ID")["Order Date"].count()).reset_index()

# Plotting Histogram
fig = px.histogram(orders_quantity_daily, x="Order ID", y="Order Date")
fig.update_xaxes(title_text = "Order Quantity per Day")
fig.update_yaxes(title_text = "Frequency of Orders")
fig.update_layout(
    title = dict(text="Order Quantity Frequency on a Daily Basis",
                 font=dict(size=35),
                 automargin = True)
)

# showing the plot
fig.show()

##### Scatter Plot:

- A scatter plot uses points to represent individual observations, with the values of two variables plotted along the x-axis and y-axis. The pattern of the resulting points can reveal a correlation between the variables. It is particularly useful for understanding the relationship between two variables and for checking the dispersion of a dataset.

In [None]:
# Plotting Scatter chart

# It may be interesting to look at the time between Order Date and Ship Date:

superstore["Order Processing Days"] = (superstore['Ship Date'] - superstore['Order Date']).dt.days

fig = px.scatter(superstore, x="Row ID", y="Order Processing Days", color = "Segment")

# showing the plot
fig.show()

In [None]:
# Once again, let's filter for Technology Category:

tech_scatter = superstore[superstore["Category"] == "Technology"].groupby(["Row ID","Segment","Order Processing Days"]).agg({"Sales": "sum"}).reset_index()

fig = px.scatter(
    tech_scatter,
    x="Row ID",
    y="Order Processing Days",
    size="Sales",
    color ="Segment",
    hover_data=["Segment"]
)
fig.show()


##### Pie Chart:

- A pie chart, also known as a circle graph, is a circular statistical graphic divided into slices to illustrate numerical proportions. Each slice (sector) shows the relative size of one part of the data, such as its relative frequency or magnitude.

In [None]:
# Plotting pie chart
fig = px.pie(superstore, values="Sales", names="Segment", color_discrete_sequence=px.colors.sequential.RdBu)

# showing the plot
fig.show()

##### Box-plot chart:

- Also known as a box-and-whisker plot, a box plot summarizes a set of values through its minimum, first quartile, median, third quartile and maximum. The box spans from the first quartile to the third quartile, and a line drawn through the box marks the median.

In [None]:
# Plotting box chart
fig = px.box(superstore, x="Category", y="Sales", color="Category")

# showing plot
fig.show()

In [None]:
#other view:
fig = px.box(superstore, x = "Category", y="Sales", points="all", color="Category")
fig.show()

##### Violin Plots:

- A violin plot visualizes the distribution of numerical data for different variables. It is similar to a box plot, but adds a rotated density plot on each side, giving more information about the density estimate along the y-axis. The density is mirrored and the resulting shape is filled in, creating an image resembling a violin. The advantage of a violin plot is that it can show nuances in the distribution that aren’t perceptible in a box plot. On the other hand, the box plot shows the outliers in the data more clearly.

In [None]:
# Plotting Violin chart
fig = px.violin(superstore, x="Category", y="Sales", color="Segment")

# showing plot
fig.show()

##### Gantt Chart:

- A Gantt chart uses a series of horizontal bars to show the amount of work done or production completed in a given period of time in relation to the amount planned for those projects.

In [None]:
import plotly.figure_factory as ff

# Data to be plotted

df = [dict(Task="Define Requirements", Start='2020-01-01', Finish='2020-01-08'),
    dict(Task="Perform Analysis", Start='2020-01-06', Finish='2020-01-15'),
    dict(Task="Build Dashboard", Start='2020-01-15', Finish='2020-02-01')]

# Creating the plot
fig = ff.create_gantt(df)
fig.show()

##### Map Chart:

In [None]:
# upper case all state names:
superstore["State"] = superstore["State"].str.upper()

# D.C. is not one of the 50 states listed below; as a simplification, its sales are counted under WASHINGTON:
superstore["State"] = superstore["State"].str.replace('DISTRICT OF COLUMBIA', 'WASHINGTON')

In [None]:
states = ['ALABAMA', 'ALASKA', 'ARIZONA', 'ARKANSAS', 'CALIFORNIA', 'COLORADO', 'CONNECTICUT', 'DELAWARE', 'FLORIDA', 'GEORGIA', 'HAWAII', 'IDAHO', 'ILLINOIS', 'INDIANA', 'IOWA', 'KANSAS', 'KENTUCKY', 'LOUISIANA', 'MAINE', 'MARYLAND', 'MASSACHUSETTS', 'MICHIGAN', 'MINNESOTA', 'MISSISSIPPI', 'MISSOURI', 'MONTANA', 'NEBRASKA', 'NEVADA', 'NEW HAMPSHIRE', 'NEW JERSEY', 'NEW MEXICO', 'NEW YORK', 'NORTH CAROLINA', 'NORTH DAKOTA', 'OHIO', 'OKLAHOMA', 'OREGON', 'PENNSYLVANIA', 'RHODE ISLAND', 'SOUTH CAROLINA', 'SOUTH DAKOTA', 'TENNESSEE', 'TEXAS', 'UTAH', 'VERMONT', 'VIRGINIA', 'WASHINGTON', 'WEST VIRGINIA', 'WISCONSIN', 'WYOMING']
abrev = ['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY']
state_abrev = dict(zip(states, abrev))

In [None]:
def get_state_abbr(state_name):
    return state_abrev.get(state_name)

superstore['State Abbr'] = superstore['State'].apply(get_state_abbr)

In [None]:
# which states couldn't we match?

superstore[superstore['State Abbr'].isna()]["State"].unique()

array([], dtype=object)
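
The lookup-and-check pattern used above can be tried on a tiny example. In this sketch the dictionary `lookup` and the Series `toy_states` are made up for illustration, and `Series.map` is used as an alternative to `apply` with a helper function; names missing from the dictionary come back as NaN.

```python
import pandas as pd

# Small illustrative lookup (the tutorial uses the full 50-state dictionary)
lookup = {"KENTUCKY": "KY", "CALIFORNIA": "CA"}
toy_states = pd.Series(["KENTUCKY", "CALIFORNIA", "PUERTO RICO"])

# map() looks each value up in the dictionary; unmatched names become NaN
abbr = toy_states.map(lookup)

# Names that could not be matched
unmatched = toy_states[abbr.isna()].unique()
print(list(unmatched))
```

An empty result, as in the cell above, means every state name found an abbreviation.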

In [None]:
import plotly.graph_objects as go

# group sales by state
state_sales = superstore.groupby(["State","State Abbr"]).agg({"Sales":"sum"}).reset_index()

# define the choropleth trace: one color per state, scaled by total sales
data = dict(type = 'choropleth',
            locations = state_sales["State Abbr"],
            locationmode = 'USA-states',
            z = state_sales["Sales"],
            text = state_sales["State Abbr"],
            colorbar = {'title': 'Sales'})

# define layout
layout = dict(geo ={'scope': 'usa'})

# create map
choromap = go.Figure(data = [data], layout = layout)

# plotting graph
choromap.show()

##### 3D Surface Plot:

- A surface plot displays three-dimensional data with X, Y and Z values. Rather than showing individual data points, it shows a functional relationship between a dependent variable Z and two independent variables X and Y. In the example below, Z is computed from X and Y.

In [None]:
x = np.outer(np.linspace(-2, 2, 30), np.ones(30))
y = x.copy().T
z = np.cos(x ** 2 + y ** 2)

# plotting the figure
fig = go.Figure(data=[go.Surface(x=x, y=y, z=z)])

fig.show()

#### Elements:

##### - Drop-down menu:

In [None]:
import plotly.graph_objects as go  # use `go` so that `px` stays bound to plotly.express

# Data to be Plotted
random_x = superstore["Category"]
random_y = superstore["Sales"]

plot = go.Figure(data=[go.Scatter(
    x=random_x,
    y=random_y,
    mode='markers')
])

# Add dropdown
plot.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(
                    args=["type", "scatter"],
                    label="Scatter Plot",
                    method="restyle"
                ),
                dict(
                    args=["type", "bar"],
                    label="Bar Chart",
                    method="restyle"
                ),
                dict(
                    args=["type", "pie"],
                    label="Pie Chart",
                    method="restyle"
                )
            ]),
            direction="down",
        ),
    ]
)

plot.show()

##### - Buttons:

In [None]:
# Data to be Plotted
random_x = superstore["Category"]
random_y = superstore["Sales"]

plot = go.Figure(data=[go.Scatter(
    x=random_x,
    y=random_y,
    mode='markers')
])

# Add buttons
plot.update_layout(
    updatemenus=[
        dict(
            type="buttons",
            buttons=list([
                dict(
                    args=["type", "scatter"],
                    label="Scatter Plot",
                    method="restyle"
                ),
                dict(
                    args=["type", "bar"],
                    label="Bar Chart",
                    method="restyle"
                ),
                dict(
                    args=["type", "pie"],
                    label="Pie Chart",
                    method="restyle"
                )
            ]),
            direction="left",
        ),
    ]
)

plot.show()