
# Chapter 1 - Introduction 

## 1.1 The Basics

Plotly means many related things. At its core, it refers to Plotly.js, a JavaScript library for data visualization. Plotly.js itself is built on another JavaScript library, D3.js. D3 stands for Data-Driven Documents. D3.js brings data to life by manipulating documents using HTML, SVG, CSS, and JavaScript.

In the context of this book, Plotly refers to `Plotly Python`, a Python library based on Plotly.js. Besides Python, Plotly.js has been made available for other programming languages, including R, Julia, and MATLAB.

Plotly also refers to the Canadian company that developed the aforementioned `Plotly Python` library. The company also developed Dash and Chart Studio, which build on its Plotly graphing libraries. Dash provides a Python framework for developing interactive dashboards and web applications. Chart Studio is a web-based drag-and-drop tool for creating interactive visualizations without writing code.

This book introduces the Plotly Python library through examples. Readers are expected to have some familiarity with Python, NumPy, and Pandas. In particular, readers should be proficient with Python's lists and dictionaries, NumPy's random number generation, and Pandas' data processing and aggregation functions.

This book is meant to be read and run. The entire book is available online and can be opened in Jupyter Notebook for hands-on practice.

This chapter illustrates the fundamental concepts of data visualization and Plotly. 

###  1.1.1 Data Types

Data can be broadly classified into two types:
- Numerical 
    - Interval
    - Ratio
- Categorical
    - Ordinal
    - Nominal
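The four data types can be sketched with a small Pandas DataFrame. This is an illustration only: the column names and values below are hypothetical. Numerical columns are stored as numbers; an ordinal column can be stored as an ordered `pd.Categorical`, while a nominal column is a plain (unordered) category.

```python
import pandas as pd

# Hypothetical records, one column per data type
df = pd.DataFrame({
    "temperature_c": [21.5, 25.0, 18.2],         # numerical, interval: 0 °C is not "no temperature"
    "income_usd": [52000, 61000, 48000],         # numerical, ratio: a true zero exists
    "education": ["Bachelor", "PhD", "Master"],  # categorical, ordinal
    "favorite_color": ["red", "blue", "green"],  # categorical, nominal
})

# Ordinal data: categories with a meaningful order
df["education"] = pd.Categorical(
    df["education"], categories=["Bachelor", "Master", "PhD"], ordered=True
)

# Nominal data: categories without any order
df["favorite_color"] = df["favorite_color"].astype("category")

print(df.dtypes)
print(df["education"].max())  # the highest level according to the declared order
```

Because the education categories are ordered, comparisons such as `max()` are meaningful; for the unordered color column they are not.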


### 1.1.2 Visual Markers

Data visualization uses visual markers to bring data to life. A marker represents a data point. A marker has four major properties:

- Shape (point, line,  square, rectangle, triangle, area, etc.)
- Position or location (coordinates on axes)
- Size 
- Color 

### 1.1.3 Plotly Structure

Plotly uses JavaScript Object Notation (JSON) to describe how data are visualized. JSON is a standard format for web applications and data integration. It is similar to Python's dictionary object and uses key-value pairs to describe data and computing instructions.

A Plotly data visualization is represented by a **Figure** object. A figure has two components: **Data** and **Layout**. 

The data component is a list of **Traces**. Each trace describes one set of data drawn as a particular chart type, such as a box plot, bar chart, or scatter plot.

The layout component describes the overall characteristics of a figure, such as its title, legend, and axis titles, among many others.

A sophisticated visualization can be built by combining multiple traces, each representing a distinct visual component, with a customized layout.

![](https://github.com/wcj365/plotly-python/blob/main/static/images/plotly_module.jpg?raw=1)

## 1.2 Getting Started 



### 1.2.1 Plotly Libraries

In [None]:
# As of this writing, Google Colab has Plotly version 4.4.1 pre-installed
# We need to upgrade it to the latest version

!pip install --upgrade plotly



In [None]:
import numpy as np                  # We use NumPy to generate sample data for plotting
import plotly.graph_objects as go   # The graph_objects package is the core of Plotly
import plotly.io as pio

import plotly
plotly.__version__

'5.3.1'

### 1.2.2 Theme Templates

Plotly visualization supports multiple themes represented by templates. 

We can set a default template up front by assigning a template name to `pio.templates.default`. That template is then applied to all visualizations created afterward.

Each visualization can also specify its own template through the `template` property of the figure's layout.

In [None]:
# This shows the default template is "plotly"
# This also lists all supported templates. 

print(pio.templates)

Templates configuration
-----------------------
    Default template: 'plotly'
    Available templates:
        ['ggplot2', 'seaborn', 'simple_white', 'plotly',
         'plotly_white', 'plotly_dark', 'presentation', 'xgridoff',
         'ygridoff', 'gridon', 'none']



In [None]:
# Change the default template from plotly to plotly_dark

pio.templates.default = "plotly_dark"

### 1.2.3 An Empty Figure
The `Figure` class in the `graph_objects` package encapsulates a Plotly chart.

This example shows an empty chart without any data.

In [None]:
fig = go.Figure()  # instantiate a Figure object

fig.show()         # Show the figure using the new default template "plotly_dark"

In [None]:
fig.update_layout(template="seaborn")  # The seaborn template applies only to this figure

fig.show()         # Show the figure using the specified template "seaborn"

We can use Python's `print()` function to print the content of a figure, including its Data component and its Layout component.

Here, the output shows that the Data component is an empty list since this figure has no data. The Layout component also has minimal information since we have not specified anything special. The template has a long list of attributes with default values, so it is represented with "..." for brevity.


In [None]:
print(fig)

Figure({
    'data': [], 'layout': {'template': '...'}
})


## 1.3 Simple Examples

### 1.3.1 A "Hello World" Chart
This example uses the `update_layout()` method of the Figure class to add a title to the figure.

This simple chart has no data to display.



In [None]:
fig = go.Figure()
fig.update_layout(title="Hello World!")

# Alternatively,
# my_layout = go.Layout(title="Hello World!")
# fig = go.Figure(layout = my_layout)

fig.show()

Here, the output shows that this figure has no data, but its layout has a value for the title.

In [None]:
print(fig)

Figure({
    'data': [], 'layout': {'template': '...', 'title': {'text': 'Hello World!'}}
})


### 1.3.2. A Boxplot of Ages of Some Men

Here, we create a box plot trace for a list of numbers and add it to the Figure object using the `add_trace()` method of the Figure class. We create the trace with the `go.Box` class from `plotly.graph_objects`; the resulting trace holds key-value pairs, much like a Python dictionary. Alternatively, we can describe the trace with a plain Python dictionary. See the section on Best Practices for which option to choose.

#### Statistics Background

"A boxplot is a standardized way of displaying the distribution of data based on a five number summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”). It can tell you about your outliers and what their values are. It can also tell you if your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed."

https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51

Fences can be used to identify extreme values (outliers) in box plots. Sometimes you might see references to “inner fences” and “outer fences”. With the interquartile range defined as IQR = Q3 - Q1, these are:
- Lower inner fence: Q1 - (1.5 * IQR)
- Upper inner fence: Q3 + (1.5 * IQR)
- Lower outer fence: Q1 - (3 * IQR)
- Upper outer fence: Q3 + (3 * IQR)

Points beyond the inner fences in either direction are mild outliers; points beyond the outer fences in either direction are extreme outliers.
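The fence arithmetic can be checked with NumPy. This sketch uses a small hypothetical sample chosen so that one value is a mild outlier (beyond an inner fence but within the outer fences) and one is an extreme outlier. `np.percentile` interpolates linearly by default; `go.Box` also has a `quartilemethod` property (default `"linear"`), so the quartiles should generally agree, but treat that as an assumption to verify for your data.

```python
import numpy as np

# Hypothetical sample with two unusually large values
data = np.array([10, 12, 13, 14, 15, 16, 18, 30, 90])

# Quartiles and median (linear interpolation by default)
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

lower_inner, upper_inner = q1 - 1.5 * iqr, q3 + 1.5 * iqr
lower_outer, upper_outer = q1 - 3 * iqr, q3 + 3 * iqr

beyond_inner = (data < lower_inner) | (data > upper_inner)
beyond_outer = (data < lower_outer) | (data > upper_outer)

mild = data[beyond_inner & ~beyond_outer]  # between an inner and an outer fence
extreme = data[beyond_outer]               # beyond an outer fence

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("mild outliers:", mild.tolist())        # [30]
print("extreme outliers:", extreme.tolist())  # [90]
```

Here Q1 = 13 and Q3 = 18, so the inner fences are 5.5 and 25.5 and the outer fences are -2 and 33.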


We also add a title for the X axis.

In [None]:
# Here, we use NumPy to generate random integers representing the ages of a group of men.
# No random seed is set, so the values change every time this cell runs.

male_ages = np.random.randint(low=1, high=101, size=20)   # 20 random integers from 1 to 100 (high is exclusive).

print(male_ages)

[97  5 23 33 78 26 79 59 23 69 66 68 36 34 22 99 52 22 19 80]
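Because no seed is set, the ages above differ on every run. If you want reproducible sample data, one option (an alternative to `np.random.randint`, not what this chapter uses) is NumPy's seeded `Generator`. The seed value 42 below is an arbitrary choice.

```python
import numpy as np

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(seed=42)
male_ages = rng.integers(low=1, high=101, size=20)   # integers from 1 to 100; high is exclusive

# Re-creating the generator with the same seed reproduces the same ages
rng_again = np.random.default_rng(seed=42)
same_ages = rng_again.integers(low=1, high=101, size=20)

print(male_ages)
print(np.array_equal(male_ages, same_ages))
```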


In [None]:
# A trace can be a graph object such as go.Box, or a plain Python dict of key-value pairs.
# The key "x" holds the values plotted along the X axis (a list or a NumPy array).
# In the dict form, the key "type" names the chart type: "box" for a box plot, "scatter" for a scatter plot, etc.

trace_0 = go.Box(   # type of chart: Boxplot
    x=male_ages,
    name="Male"     # The name of the trace, used in the legend to distinguish multiple traces.
)

# Alternatively, just use the Python dictionary 
# trace_0 = {
#     "x":male_ages,
#     "type":"box",    # type of chart: Boxplot
#     "name":"Male"    # The name of the trace, used as a legend to distinguish multiple traces.
# }

fig = go.Figure()
fig.add_trace(trace_0)

# Alternatively,
# fig = go.Figure(data=[trace_0])

fig.update_layout(
    title="Boxplot of Ages of Some Men",
    xaxis={"title":"Age"}         # This is equivalent to xaxis_title="Age"
)

fig.show()

Since Plotly figures are interactive, you can hover the mouse over the box to see the five-number summary.

Here, the `print()` output shows that the figure has one trace of type box, along with its data points. The layout also contains the custom properties we specified: the figure title and the title of the X axis. Because the ages are regenerated randomly each time the notebook runs, the values printed here may differ from those printed earlier.

In [None]:
print(fig)

Figure({
    'data': [{'name': 'Male',
              'type': 'box',
              'x': array([47, 48, 38,  1, 44, 80, 23, 66, 50, 80, 79, 44, 61, 66, 43, 82, 51, 68,
                          42, 70])}],
    'layout': {'template': '...', 'title': {'text': 'Boxplot of Ages of Some Men'}, 'xaxis': {'title': {'text': 'Age'}}}
})


### 1.3.3. A Boxplot of Ages of Some Men and Women
We add another trace representing the boxplot of ages of some women.

In [None]:
male_ages = np.random.randint(low=1, high=100, size=20)

trace_0 = go.Box(   
    x=male_ages,    
    name="Male"   
)

female_ages = np.random.randint(low=1, high=100, size=20)

trace_1 = go.Box(   
    x=female_ages,    
    name="Female"    
)

fig = go.Figure()
fig.add_trace(trace_0)
fig.add_trace(trace_1)

# Alternatively,
# fig = go.Figure(data=[trace_0, trace_1])

fig.update_layout(
    title="Boxplot of Ages of Some Men and Women",
    xaxis={"title":"Age"},
    showlegend=True             # The legend can be shown or hidden
)

fig.show()

Since the labels "Male" and "Female" already appear on the Y axis, the color legend on the upper right is not necessary. We can hide it by setting the `showlegend` property of the layout to `False`.

In [None]:
fig.update_layout(showlegend=False)

# Alternatively,
# fig.layout.showlegend = False

fig.show()

## 1.4. Export Figures 

There are several ways to export a Plotly visualization.


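### 1.4.1 Output JSON Data 

As mentioned in 1.2.3, we can use Python's `print()` function to print the content of a figure, including its Data component and its Layout component. Here, the output shows that the figure has two traces, along with the properties of the layout. Note that `print()` displays a Python representation of the figure (for example, NumPy arrays appear as `array(...)`) rather than a strict JSON string.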

In [None]:
print(fig)

Figure({
    'data': [{'name': 'Male',
              'type': 'box',
              'x': array([21, 11, 65,  7,  4, 15, 88, 10, 15, 90, 92, 50, 89, 27, 54, 44, 48, 53,
                          96, 43])},
             {'name': 'Female',
              'type': 'box',
              'x': array([23, 22, 35, 65,  8,  6, 34, 22, 43, 98, 10, 82, 21, 50,  7, 61, 37, 96,
                           9, 78])}],
    'layout': {'showlegend': False,
               'template': '...',
               'title': {'text': 'The Boxplot for Age'},
               'xaxis': {'title': {'text': 'Age'}}}
})


### 1.4.2 Output as Static Images

Plotly visualizations can be downloaded as static images by clicking the download (camera) icon. The icon is at the far left of the toolbar that appears on the upper right of the visualization when you move the mouse over that area.

We can also write code to export a visualization to a static image file in a variety of formats, including PNG, JPG, and PDF. Since the generated image files are static, the interactivity is lost.

To generate static images from Plotly visualizations, the ancillary Python library `kaleido` must be installed first.

In [None]:
!pip install kaleido



In [None]:
fig.write_image("boxplot.png")
fig.write_image("boxplot.pdf")


### 1.4.3 Output to HTML Files

Plotly visualizations can be exported to HTML files, which can be opened in a web browser for display and interaction.

The parameter `include_plotlyjs` accepts several values; two commonly used ones result in very different file sizes.

When the option is `True`, the Plotly.js JavaScript library is embedded in the file, which adds about 3 MB to its size.

When the option is set to `"cdn"` (content delivery network), the Plotly.js library is not included in the file but loaded from `https://cdn.plot.ly`. This makes the resulting file much smaller, although the browser then needs internet access to render the chart.


In [None]:
fig.write_html("boxplot_include_js.html", include_plotlyjs=True)

fig.write_html("boxplot_cdn.html", include_plotlyjs="cdn")


## 1.5 Python Flexibility

Plotly Python is flexible and provides multiple ways to achieve the same thing.


### 1.5.1 Different Ways to Create a Figure 

We can create an empty Figure object and then add traces and update layout properties like this:
```
trace_0 = go.Box(   
    x=male_ages,    
    name="Male"   
)

fig = go.Figure()
fig.add_trace(trace_0)
fig.update_layout(title="A Boxplot")
```
Alternatively, we can create the traces and put them in a list, create a Layout object with the desired properties, and then create the figure by passing the list and the Layout object as the `data` and `layout` arguments:

```
trace_0 = go.Box(   
    x=male_ages,    
    name="Male"   
)

my_layout = go.Layout(title="A Boxplot")
fig = go.Figure(data=[trace_0], layout=my_layout)
```


    

### 1.5.2 Different Ways to Create a Trace

We can use a trace class from the `graph_objects` package. Here, we use `go.Box` to create a box plot trace. The resulting object behaves much like a Python dictionary describing the box plot.

```
trace_0 = go.Box(
    x=[10, 3, -5, -35, 23, 8, 78, -65, 13,31, 82],  
    name="Trace Name"                   
)
```

Alternatively, we can use a Python dictionary object to represent a trace:

```
trace_0 = {                         
    "x":[10, 3, -5, -35, 23, 8, 78, -65, 13,31, 82],  
    "type":"box",
    "name":"Trace Name"                   
}
```

### 1.5.3 Different Ways to Specify a Layout Property

For example, to specify the title of the X axis, the following three methods work the same:

- `fig.update_layout(xaxis={"title":"Age"})`
- `fig.update_layout(xaxis_title="Age")`
- `fig.layout.xaxis.title = "Age"`

Python is a flexible language and offers alternative ways to achieve the same outcome. In some cases, there are industry best practices. For example, the commonly used indentation is four spaces. In other cases, it is up to your personal preference. In the latter, you should try to pick one and use it consistently. 


## 1.6. Summary

Here are the steps to create a Plotly chart:

1. Create an instance of the Figure class. 
2. Create traces (one or more) each representing a plot.
3. Add the traces to the Figure instance.
4. Update the layout of the figure (title, legend, etc.).
5. Display or export the figure.