<div align="right" style="text-align:right;">
        <a href="https://www.clusterkit.co.th" title="clusterkit.co.th">
                <img style="width:200px;display:inline;" width="200px" src="../assets/logo.png" alt="ClusterKit">
        </a>
</div>

# Plotly & Dash Overview

## Content

- [Plotly & Dash Overview](#Plotly-&-Dash-Overview)
    - [NumPy](#NumPy)
        - [NumPy Overview](#NumPy-Overview)
        - [NumPy Arrays](#NumPy-Arrays)
        - [Operations On Arrays](#Operations-On-Arrays)
        - [Additional Functionality](#Additional-Functionality)
    - [Pandas](#Pandas)
        - [Pandas Overview](#Pandas-Overview)
        - [Series](#Series)
        - [DataFrames](#DataFrames)
    - [Plotly](#Plotly)
        - [Plotly Overview](#Plotly-Overview)
        - [Plotly Basic](#Plotly-Basic)
        - [Scatterplot](#Scatterplot)
        - [Line Chart](#Line-Chart)
        - [Bar Chart](#Bar-Chart)
        - [Bubble Chart](#Bubble-Chart)
        - [Heatmap](#Heatmap)
    - [Dash](#Dash)
        - [Dash Overview](#Dash-Overview)
        - [Dash Layout](#Dash-Layout)
        - [Converting Plotly To Dash](#Converting-Plotly-To-Dash)
        - [Dash Components](#Dash-Components)
        - [Dash Callbacks](#Dash-Callbacks)
        - [Multiple Inputs](#Multiple-Inputs)
        - [Multiple Outputs](#Multiple-Outputs)
        - [Hover Over Data](#Hover-Over-Data)
        - [Click Data](#Click-Data)
        - [Selected Data](#Selected-Data)
        - [Live Updating](#Live-Updating)
        - [Authentication](#Authentication)
        - [Deployment](#Deployment)

# NumPy

## NumPy Overview

NumPy is a first-rate library for numerical programming.

- Widely used in academia, finance and industry.  
- Mature, fast, stable and under continuous development.  

The essential problem that NumPy solves is fast array processing.

For example, suppose we want to create an array of 1 million random draws from a uniform distribution and compute the mean.

If we did this in pure Python, it would be orders of magnitude slower than the equivalent C or Fortran code.

This is because:

- Loops in Python over Python data types like lists carry significant overhead.  
- C and Fortran code contains a lot of type information that can be used for optimization.  
- Various optimizations can be carried out during compilation when the compiler sees the instructions as a whole.  

However, for a task like the one described above, there’s no need to switch back to C or Fortran.

Instead, we can use NumPy, where the instructions look like this:

In [19]:
import numpy as np

x = np.random.uniform(0, 1, size=1000000)
x

array([0.46906046, 0.28075918, 0.21946581, ..., 0.10859233, 0.34069026,
       0.17174221])

In [20]:
x.mean()

0.5000360687009411

The operations of creating the array and computing its mean are both passed out to carefully optimized machine code compiled from C.

More generally, NumPy sends operations *in batches* to optimized C and Fortran code.

This is similar in spirit to MATLAB, which provides an interface to fast Fortran routines.
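
To make the comparison concrete, here is a minimal sketch that computes the same mean twice, once with a pure Python loop and once with NumPy. The variable names (`python_mean`, `numpy_seconds`, and so on) are illustrative, and the measured times depend on the machine, so no specific speed-up is claimed.

```python
import random
import time

import numpy as np

n = 1_000_000

# Pure Python: draw n uniform samples one at a time and average them.
start = time.perf_counter()
total = 0.0
for _ in range(n):
    total += random.uniform(0, 1)
python_mean = total / n
python_seconds = time.perf_counter() - start

# NumPy: draw all n samples in one call and average them in compiled code.
start = time.perf_counter()
numpy_mean = np.random.uniform(0, 1, size=n).mean()
numpy_seconds = time.perf_counter() - start

print(f"pure Python: mean={python_mean:.4f}, time={python_seconds:.3f}s")
print(f"NumPy:       mean={numpy_mean:.4f}, time={numpy_seconds:.3f}s")
```

Both means should land close to 0.5; only the elapsed time is expected to differ noticeably.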

### NumPy is great for operations that are naturally *vectorized*.

Vectorized operations are precompiled routines that can be sent in batches, like

- matrix multiplication and other linear algebra routines  
- generating a vector of random numbers  
- applying a fixed transformation (e.g., sine or cosine) to an entire array  
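
The three kinds of vectorized operations listed above can be sketched as follows. The small matrices and the names `A`, `B`, `draws` and `sines` are made up for illustration.

```python
import numpy as np

# Matrix multiplication: one call dispatches the whole product.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.eye(2)              # 2x2 identity matrix
product = A @ B            # equals A, because B is the identity

# Generating a vector of random numbers in a single call.
draws = np.random.uniform(0, 1, size=5)

# Applying a fixed transformation to every element at once.
angles = np.linspace(0, np.pi, 3)   # 0, pi/2, pi
sines = np.sin(angles)

print(product)
print(draws)
print(sines)
```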

### References

[https://numpy.org/doc/1.17/reference/index.html](https://numpy.org/doc/1.17/reference/index.html)

## NumPy Arrays

### Basic

In [29]:
my_list = [0, 1, 2, 3, 4]
my_list

[0, 1, 2, 3, 4]

In [30]:
arr = np.array(my_list)
arr

array([0, 1, 2, 3, 4])

In [31]:
type(arr)

numpy.ndarray

### Create a range of integers with `arange` (start, stop, step size)

In [32]:
a = np.arange(0, 10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [33]:
a = np.arange(0, 10, 2)
a

array([0, 2, 4, 6, 8])

### Create an array of zeros

In [35]:
a = np.zeros((5, 5))
a

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

### Create an array of ones

In [36]:
a = np.ones((2, 4))
a

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]])

### Create a single random integer

In [41]:
a = np.random.randint(0, 10)
a

2

### Create a 2D matrix of random integers

In [42]:
a = np.random.randint(0, 10, (3, 3))
a

array([[2, 8, 4],
       [6, 4, 8],
       [9, 3, 6]])
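
The random outputs shown above change from run to run. If reproducible draws are needed, one option is a seeded generator. The sketch below uses `np.random.default_rng`, which is a different interface from the `np.random.randint` calls above; the seed value `42` is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same draws.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

a = rng_a.integers(0, 10, size=(3, 3))   # integers in [0, 10)
b = rng_b.integers(0, 10, size=(3, 3))

print(a)
print(np.array_equal(a, b))
```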

### Create linearly spaced array

In [48]:
a = np.linspace(0, 10, 6)
a

array([ 0.,  2.,  4.,  6.,  8., 10.])
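
A short sketch contrasting the two range constructors used above: `np.arange` takes a step size and excludes the stop value, while `np.linspace` takes a number of points and, by default, includes the stop value.

```python
import numpy as np

stepped = np.arange(0, 10, 2)    # step of 2, stop value 10 is excluded
spaced = np.linspace(0, 10, 6)   # 6 evenly spaced points, 10 is included

print(stepped)   # [0 2 4 6 8]
print(spaced)    # [ 0.  2.  4.  6.  8. 10.]
```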

## Operations On Arrays

In [49]:
arr = np.random.randint(0, 100, 10)
arr

array([50, 45, 65, 24, 47, 27, 46, 60, 73, 26])

In [50]:
arr.max()

73

In [51]:
arr.min()

24

In [52]:
arr.mean()

46.3

In [59]:
arr.argmin()

3

In [60]:
arr.argmax()

8

In [61]:
arr.reshape(2, 5)

array([[50, 45, 65, 24, 47],
       [27, 46, 60, 73, 26]])
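
As a small sketch of how these methods fit together, the example below reuses the values printed above (hard-coded here so the result is reproducible). It checks that `argmax` returns the position of the maximum and shows the `axis` argument on the reshaped array, which the cells above do not use.

```python
import numpy as np

# Same values as the random array shown above, fixed for reproducibility.
arr = np.array([50, 45, 65, 24, 47, 27, 46, 60, 73, 26])

# argmax/argmin return positions; indexing with them recovers max/min.
largest = arr[arr.argmax()]    # 73
smallest = arr[arr.argmin()]   # 24

# After reshaping, reductions can run along one axis at a time.
grid = arr.reshape(2, 5)
row_max = grid.max(axis=1)     # maximum of each row
col_min = grid.min(axis=0)     # minimum of each column

print(largest, smallest)
print(row_max)
print(col_min)
```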

## Additional Functionality

In [68]:
mat = np.arange(0, 100).reshape(10, 10)
mat

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

In [70]:
row = 0
col = 1

### Select an individual number

In [72]:
mat[row, col]

1

### Select an entire column

In [73]:
mat[:, col]

array([ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91])

### Select an entire row

In [74]:
mat[row, :]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
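
The same indexing syntax extends to ranges of rows and columns. A minimal sketch, using the same `mat` as above (the name `block` is illustrative):

```python
import numpy as np

mat = np.arange(0, 100).reshape(10, 10)

# Rows 0-1 and columns 1-2: slices exclude their stop index.
block = mat[0:2, 1:3]

# Negative indices count from the end, so -1 is the last row.
last_row = mat[-1, :]

print(block)      # [[ 1  2] [11 12]]
print(last_row)   # 90 through 99
```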

### Masking

In [77]:
mat > 50

array([[False, False, False, False, False, False, False, False, False,
        False],
       [False, False, False, False, False, False, False, False, False,
        False],
       [False, False, False, False, False, False, False, False, False,
        False],
       [False, False, False, False, False, False, False, False, False,
        False],
       [False, False, False, False, False, False, False, False, False,
        False],
       [False,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True]])

In [79]:
mat[mat>50]

array([51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

Masking allows you to use conditional filters to grab elements.
We'll see this idea used again in pandas.
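
Masks can also be combined and used for assignment. The sketch below is one way to do this; the names `mask`, `selected` and `capped` are illustrative.

```python
import numpy as np

mat = np.arange(0, 100).reshape(10, 10)

# Combine conditions element-wise with & (and) or | (or).
# The parentheses are required because & binds tighter than > and ==.
mask = (mat > 50) & (mat % 2 == 0)
selected = mat[mask]          # even numbers from 52 to 98

# A mask can also choose which elements to overwrite.
capped = mat.copy()
capped[capped > 50] = 50      # cap every value at 50

print(selected)
print(capped.max())
```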

# Pandas

## Pandas Overview

[Pandas](http://pandas.pydata.org/) is a package of fast, efficient data analysis tools for Python.

Its popularity has surged in recent years, coincident with the rise
of fields such as data science and machine learning.
  
Just as [NumPy](http://www.numpy.org/) provides the basic array data type plus core array operations, pandas

1. defines fundamental structures for working with data and  
1. endows them with methods that facilitate operations such as  
  
  - reading in data  
  - adjusting indices  
  - working with dates and time series  
  - sorting, grouping, re-ordering and general data munging 
  - dealing with missing values, etc., etc.  
  
More sophisticated statistical functionality is left to other packages, such
as [statsmodels](http://www.statsmodels.org/) and [scikit-learn](http://scikit-learn.org/), which work with pandas data structures.
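
To give a feel for the operations listed above, here is a minimal sketch that touches each of them once. The CSV text, the tickers `AAA` and `BBB`, and the column names are all made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: one return value is missing on purpose.
csv_text = """date,ticker,ret
2020-01-02,AAA,0.01
2020-01-01,AAA,
2020-01-01,BBB,0.03
2020-01-02,BBB,-0.01
"""

# Reading in data, parsing the date column as timestamps.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Adjusting indices: use the dates as row labels.
df = df.set_index("date")

# Sorting by date.
df = df.sort_index()

# Dealing with missing values: treat a missing return as zero.
df["ret"] = df["ret"].fillna(0.0)

# Grouping: average return per ticker.
mean_by_ticker = df.groupby("ticker")["ret"].mean()

print(df)
print(mean_by_ticker)
```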

This lecture provides a basic introduction to pandas.

Throughout the lecture, we assume that the following imports have taken
place:

In [81]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Series

Two important data types defined by pandas are `Series` and `DataFrame`.

You can think of a `Series` as a “column” of data, such as a collection of observations on a single variable.

A `DataFrame` is an object for storing related columns of data.

Let’s start with `Series`.

In [86]:
s = pd.Series(np.random.randn(4), name='daily returns')
s

0   -0.837713
1    0.871841
2   -0.249015
3    0.835131
Name: daily returns, dtype: float64

Here you can imagine the indices `0, 1, 2, 3` as indexing four listed
companies, and the values being daily returns on their shares.

Pandas `Series` are built on top of NumPy arrays and support many similar
operations:

In [89]:
s * 100

0   -83.771273
1    87.184114
2   -24.901461
3    83.513091
Name: daily returns, dtype: float64

In [None]:
np.abs(s)
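
Since the values above are random, here is a small sketch with fixed, made-up numbers showing what these operations return: both arithmetic and NumPy functions give back a `Series` with the original index.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values in place of the random draws above.
s = pd.Series([-0.5, 1.5, -2.0], name="daily returns")

scaled = s * 100      # element-wise arithmetic, as with NumPy arrays
abs_s = np.abs(s)     # a NumPy function applied to a Series

print(scaled)
print(abs_s)
```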

But `Series` provide more than NumPy arrays.

They have some additional (statistically oriented) methods:

In [90]:
s.describe()

count    4.000000
mean     0.155061
std      0.841654
min     -0.837713
25%     -0.396189
50%      0.293058
75%      0.844308
max      0.871841
Name: daily returns, dtype: float64

Their indices are also more flexible:

In [91]:
s.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']
s

AMZN   -0.837713
AAPL    0.871841
MSFT   -0.249015
GOOG    0.835131
Name: daily returns, dtype: float64

Viewed in this way, `Series` are like fast, efficient Python dictionaries
(with the restriction that the items in the dictionary all have the same
type—in this case, floats).

In fact, you can use much of the same syntax as Python dictionaries:

In [92]:
s['AMZN']

-0.8377127304410259

In [93]:
s['AMZN'] = 0
s

AMZN    0.000000
AAPL    0.871841
MSFT   -0.249015
GOOG    0.835131
Name: daily returns, dtype: float64

In [94]:
'AAPL' in s

True
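
The dictionary analogy goes a little further than the cells above show. A minimal sketch, with rounded made-up values and a hypothetical missing ticker `TSLA`:

```python
import pandas as pd

# A Series can be built directly from a dictionary; keys become the index.
returns = pd.Series({"AMZN": -0.8, "AAPL": 0.9, "MSFT": -0.2, "GOOG": 0.8})

amzn = returns["AMZN"]                # lookup by label
has_aapl = "AAPL" in returns          # membership tests the index labels
fallback = returns.get("TSLA", 0.0)   # default for a label that is absent
labels = list(returns.keys())         # keys() returns the index

print(amzn, has_aapl, fallback)
print(labels)
print(returns.to_dict())              # convert back to a plain dictionary
```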

## DataFrames

While a `Series` is a single column of data, a `DataFrame` is several columns, one for each variable.

In essence, a `DataFrame` in pandas is analogous to a (highly optimized) Excel spreadsheet.

Thus, it is a powerful tool for representing and analyzing data that are naturally organized into rows and columns, often with descriptive indexes for individual rows and individual columns.
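
As a minimal sketch of this structure, the example below builds a `DataFrame` by hand from the small salary table that appears in this section, then uses the names as descriptive row labels. Building the frame from a dictionary is only one of several possible approaches.

```python
import pandas as pd

# One dictionary entry per column.
df = pd.DataFrame({
    "Name": ["John", "Sally", "Alyssa"],
    "Salary": [50000, 120000, 80000],
    "Age": [34, 45, 27],
})

# Each column of a DataFrame is a Series.
salary = df["Salary"]

# Use the names as row labels, then look up one cell by row and column.
by_name = df.set_index("Name")
sally_salary = by_name.loc["Sally", "Salary"]

print(type(salary))
print(by_name)
print(sally_salary)
```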

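Let’s look at an example that reads data from a CSV file.

As a reminder of the format, here’s the content of a small CSV file, `dataset/salaries.csv`:

```text
Name,Salary,Age
John,50000,34
Sally,120000,45
Alyssa,80000,27
```

The cell below reads a larger file, `test_pwt.csv`, directly from a URL; `pd.read_csv` accepts a URL as well as a local path.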

In [95]:
df = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/test_pwt.csv')
type(df)

pandas.core.frame.DataFrame

In [96]:
df

| Unnamed: 0 | country | country isocode | year | POP | XRAT | tcgdp | cc | cg |
|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | ARG | 2000 | 37335.653 | 0.9995 | 295072.2 | 75.716805 | 5.578804 |
| 1 | Australia | AUS | 2000 | 19053.186 | 1.72483 | 541804.7 | 67.759026 | 6.720098 |
| 2 | India | IND | 2000 | 1006300.297 | 44.9416 | 1728144.0 | 64.575551 | 14.072206 |
| 3 | Israel | ISR | 2000 | 6114.57 | 4.07733 | 129253.9 | 64.436451 | 10.266688 |
| 4 | Malawi | MWI | 2000 | 11801.505 | 59.543808 | 5026.222 | 74.707624 | 11.658954 |
| 5 | South Africa | ZAF | 2000 | 45064.098 | 6.93983 | 227242.4 | 72.71871 | 5.726546 |
| 6 | United States | USA | 2000 | 282171.957 | 1.0 | 9898700.0 | 72.347054 | 6.032454 |
| 7 | Uruguay | URY | 2000 | 3219.793 | 12.099592 | 25255.96 | 78.97874 | 5.108068 |
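
The small `salaries.csv` content shown earlier can be loaded in the same way. The sketch below parses that text from an in-memory string instead of a file, so it runs without the `dataset/` folder or a network connection, and then applies the masking idea from the NumPy section to filter rows. The age threshold of 30 is arbitrary.

```python
import io

import pandas as pd

# The salaries.csv content from above, held in a string.
csv_text = """Name,Salary,Age
John,50000,34
Sally,120000,45
Alyssa,80000,27
"""

# read_csv accepts any file-like object, not only a path or URL.
salaries = pd.read_csv(io.StringIO(csv_text))

# A boolean mask selects rows, just as it selected array elements in NumPy.
over_30 = salaries[salaries["Age"] > 30]

print(salaries)
print(over_30["Name"].tolist())   # ['John', 'Sally']
```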


# Plotly

## Plotly Overview

## Plotly Basic

## Scatterplot

## Line Chart

## Bar Chart

## Bubble Chart

## Heatmap

# Dash

## Dash Overview

## Dash Layout

## Converting Plotly To Dash

## Dash Components

## Dash Callbacks

## Multiple Inputs

## Multiple Outputs

## Hover Over Data

## Click Data

## Selected Data

## Live Updating

## Authentication

## Deployment