# Interactive Python visualization dashboards with Plotly and Dash

##### Russell Romney, 1/24/2018


### Python has very powerful data tools, but interactive visualization is a way to make the data come to life for the user

---

**What we are going to do today:**

1. Python: what you need to know
2. Linear regression: basics
3. Making a basic interactive Dash app
4. Making a Plotly chart in a Dash app
5. Making an interactive chart in a Dash app
6. Interactive linear regression in a Dash app

Helpful links:

Dataset: [Graduate admissions data from Kaggle](https://www.kaggle.com/mohansacharya/graduate-admissions/home)

GitHub repo with all the code and this Jupyter Notebook: [python4data-dash repo @ github](https://github.com/russellromney/python4data-dash)

Dash documentation: [Plotly Dash Docs](https://dash.plot.ly/)


I AM DOING ALL THIS ON A MACBOOK, BUT YOU CAN DO THIS ON A WINDOWS MACHINE AS WELL. IF YOU DO NOT HAVE COMMAND-LINE EXPERIENCE, IT IS EASIEST TO USE THE ANACONDA PROMPT THAT COMES WITH THE ANACONDA DISTRIBUTION INSTEAD OF THE WINDOWS COMMAND LINE.

## Python: what you need to know


GET STARTED

Download and install Anaconda with Python 3.5, 3.6, or 3.7. You don't strictly **need** it if you are on a Mac or Linux, but it's a really great thing to have if you are using Python to code.

Packages you need:

* Comes with Anaconda: `pandas scikit-learn`
* Doesn't come with: `dash dash-core-components dash-html-components dash-renderer`

We are not working in Jupyter Notebook today. You will need a code editor or IDE (a place to write and edit code). I use Visual Studio Code, as I think it is the fastest and most usable, but Atom, PyCharm, Notepad++, and many other options are very good as well. You could even use a plain text editor for this, but it wouldn't do syntax highlighting.


### Coding in Python for data

Pandas dataframe (remember from Frank last week?)

In [26]:
import pandas as pd
df = pd.read_csv('Admission_Predict_Ver1.1.csv')
df.head(10)

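|   | Serial No. | GRE Score | TOEFL Score | University Rating | SOP | LOR | CGPA | Research | Chance of Admit |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 337 | 118 | 4 | 4.5 | 4.5 | 9.65 | 1 | 0.92 |
| 1 | 2 | 324 | 107 | 4 | 4.0 | 4.5 | 8.87 | 1 | 0.76 |
| 2 | 3 | 316 | 104 | 3 | 3.0 | 3.5 | 8.0 | 1 | 0.72 |
| 3 | 4 | 322 | 110 | 3 | 3.5 | 2.5 | 8.67 | 1 | 0.8 |
| 4 | 5 | 314 | 103 | 2 | 2.0 | 3.0 | 8.21 | 0 | 0.65 |
| 5 | 6 | 330 | 115 | 5 | 4.5 | 3.0 | 9.34 | 1 | 0.9 |
| 6 | 7 | 321 | 109 | 3 | 3.0 | 4.0 | 8.2 | 1 | 0.75 |
| 7 | 8 | 308 | 101 | 2 | 3.0 | 4.0 | 7.9 | 0 | 0.68 |
| 8 | 9 | 302 | 102 | 1 | 2.0 | 1.5 | 8.0 | 0 | 0.5 |
| 9 | 10 | 323 | 108 | 3 | 3.5 | 3.0 | 8.6 | 0 | 0.45 |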


ADDING COLUMNS

In [27]:
# change the column names to remove weird spaces
df.columns = [x.strip() for x in df.columns]

# add new columns with combined data
df['SOP_LOR'] = df.SOP + df.LOR
df['SOP_LOR_UR'] = df.SOP + df.LOR + df['University Rating']
df['LOR_UR'] = df.LOR + df['University Rating']
df['SOP_UR'] = df.SOP + df['University Rating']
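
To see why the column names get stripped first, here is a minimal sketch on a made-up two-row CSV (not the real admissions file). It assumes one header, `LOR `, carries a trailing space, which is the kind of "weird space" the comment above refers to. The names `csv_text` and `demo` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV text: note the trailing space in the "LOR " header
csv_text = "SOP,LOR ,CGPA\n4.5,4.5,9.65\n4.0,4.5,8.87\n"
demo = pd.read_csv(io.StringIO(csv_text))

# Before cleaning, the column is really named 'LOR ' so demo.LOR is not found
raw_names = list(demo.columns)
had_attr_before = hasattr(demo, "LOR")
print(raw_names, had_attr_before)

# Strip whitespace from every column name, then attribute access works
demo.columns = [name.strip() for name in demo.columns]
demo["SOP_LOR"] = demo.SOP + demo.LOR
print(demo)
```

Without the `strip()` step, `demo.LOR` would raise an `AttributeError`, and you would have to write `demo['LOR ']` with the space included.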


DICTIONARIES

In [28]:
# key:value pairs
# two ways

d1 = dict(one='1',two='2')
print(d1)

d2 = {"one":"1","two":"2"}
print(d2)

print(d1==d2)
d1['one']

{'one': '1', 'two': '2'}
{'one': '1', 'two': '2'}
True


'1'

DEFINING FUNCTIONS

In [29]:
def function_1(input1, input2):
    a = input1 + " is first; "
    b = input2 + " is second but not third"
    return a+b

function_1("go","vandals")

'go is first; vandals is second but not third'

RUNNING A PYTHON FILE

Assuming the Python script is named `app.py`:

`$ python3 app.py`

LOCALHOST SERVERS

Websites are hosted on servers. You can run a server on your own computer that only you can reach, at `http://localhost:<PORT>`

Flask is a web microframework that comes with a built-in development server. When you run a file using Flask, that server listens on `localhost:5000` by default. Dash's development server works the same way, but defaults to port 8050.
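
As a rough illustration of what "a file using Flask" looks like, here is a minimal sketch. The names `server` and `index` and the greeting text are made up for this example. It uses Flask's test client to request the page, so it can be tried without opening a port.

```python
from flask import Flask

# Create the Flask application object
server = Flask(__name__)


@server.route("/")
def index():
    # Whatever this function returns becomes the body of the page at "/"
    return "Hello from localhost!"


# The test client sends a request to the app without starting a real server
response = server.test_client().get("/")
print(response.status_code, response.get_data(as_text=True))

# To serve it for real, uncomment the next line and open http://localhost:5000
# server.run(port=5000)
```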

## Making a basic interactive Dash app

Dash is built on top of `Flask`. It makes it easy to create `HTML objects` in Python and renders them with the `React.js framework`, which lets you communicate with the page and asynchronously change/update parts of the webapp without reloading the whole page.

You pass special HTML objects to the Dash app, which renders them as real HTML objects in a webapp. To start, you can use the development server to see these webapps or even run them on servers like I do.

An example of a simple Dash app I made: [Mortgage Years App](https://mortgage-years.herokuapp.com)

#### How it works

Each HTML object can have an ID and several attributes. Dash provides special HTML objects in the `dash_html_components` module that are a lot like normal HTML objects, and wrappers for React components (HTML objects that you can easily talk to) in the `dash_core_components` module. You can even build them yourself if you'd like!

PRO DASH TIP:

`import dash`

`import dash_core_components as dcc` 

`import dash_html_components as html`

`from dash.dependencies import Input, Output`

CALLBACKS

Dash lets you define `callback functions` that listen for changes in the special HTML and React objects. When a change happens, the callback automatically updates an attribute of another object on the page, based on the values of one or many of the other objects.

DOES THIS MAKE SENSE? MAYBE NOT?

#### LET'S SEE THIS IN ACTION!  ---> IDE

`app1.py`

---

## Making a Plotly chart in a Dash app

Plotly is an interactive charting library built on top of D3.js, but all you need to know is that `Plotly lets you easily pass special dictionaries of attributes to a figure that you then plot`.

#### PLOTLY CHARTING FLOW

`import plotly.graph_objs as go`

**data**: a list of graph objects (special dictionaries) from `plotly.graph_objs`; e.g. `go.Scatter`, `go.Bar`, `go.Histogram`, etc. These are "traces". Each trace is independent of the others, so traces can be added to and removed from the canvas at will. This makes it very easy to add new layers to the graph.

**layout**: a special dictionary `go.Layout` of chart layout attributes

**figure**: a special dictionary `go.Figure` containing a data object and a layout object; this is the thing you plot

1 NEW THING
* static graph that shows a static relationship between two variables

#### LET'S SEE THIS IN ACTION! ---> IDE

`app2.py`

---

## Making an interactive Plotly chart in a Dash app

This is where we start to abstract things away a little bit - and we start to see the real power of Dash and Plotly.

We want to see the relationship between any of the variables and the chance of admission, without needing to create many new graphs. 

HOW WE WILL DO IT

We will create a dropdown menu where we can select the name of the column we want to visualize. Then, whichever column name is passed, we will update the graph's `figure` attribute with a new scatter plot of the data for that column.

2 NEW THINGS:
* dropdown menu
* function to update the graph

#### LET'S SEE THIS IN ACTION! ---> IDE

`app3.py`

---

## Interactive linear regression in a Dash app

(If we have time or if people are interested)

> "This gives you superpowers" - Russell

Creating models again and again is boring, and creating lots of graphs is also boring. Let's figure out a way to get around this. 

### Linear regression: basics

Generally: a "line of best fit" for a given variable, as predicted by other variables. 

Goal: create a line that minimizes the error between the predicted results and the actual results.

**How to do this in Python (very simple):**
1. split your data into train and test data
2. create a linear regression model INSTANCE
3. fit the model to your TRAIN DATA
4. use the fitted model to predict values for your TEST DATA
5. determine the model's accuracy

In our example, we will use only one variable to predict the new values. The fitted model gives us a coefficient (the slope of the line): the expected change in the target variable (the one we are trying to predict) for a one-unit increase in the explanatory variable (the one we are using to predict the target).

> One way to determine the model's quality is to look at the R^2: the proportion of the variation in the target variable (here measured on the test data) that the model explains
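
In symbols, for a single explanatory variable $x$ and target $y$, ordinary least squares (the method scikit-learn's `LinearRegression` fits) chooses the intercept $\beta_0$ and slope $\beta_1$ that minimize the sum of squared errors on the train data. The notation here is introduced only for this note:

$$
\hat{y}_i = \beta_0 + \beta_1 x_i, \qquad (\beta_0, \beta_1) = \arg\min_{b_0,\, b_1} \sum_{i \in \text{train}} \left( y_i - b_0 - b_1 x_i \right)^2
$$

The R^2 then compares the model's squared errors on the test data with those of always predicting the test mean $\bar{y}$:

$$
R^2 = 1 - \frac{\sum_{i \in \text{test}} (y_i - \hat{y}_i)^2}{\sum_{i \in \text{test}} (y_i - \bar{y})^2}
$$

An R^2 of 1 means perfect predictions, and an R^2 of 0 means the model does no better than predicting the mean.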

In [22]:
# import the objects we use
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

In [23]:
# split the data into train and test data
# (no random_state is set, so the split and the numbers below change on every run)
import numpy as np
x = np.array(df['GRE Score']).reshape(-1,1)
y = df['Chance of Admit']
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=.35)

# create the model
lr = LinearRegression()

# fit the model to the data
lr.fit(x_train,y_train)

# create predictions
pred = lr.predict(x_test)

In [24]:
# coefficient
lr.coef_

array([0.01016382])

In [25]:
r2_score(y_test,pred)

0.5996766572568488

What we can do is use the same concept from the last app:
* take in a column name
* plot the data for that column

...but we add a few steps:
* split the data into train and test sets
* create a linear regression model for that column
* create predictions
* see how good the model is
* plot the results of the model on the same chart as the data

So the powerful thing is: just by selecting a column name, we can create and test a regression model and graph its prediction on the same chart as the data!

> "That is pretty cool." - Russell

I'm going to just build this one myself. Don't feel the need to follow along, as you'll get the idea and can refer to the notes later if you'd like.

#### LET'S SEE THIS IN ACTION! ---> IDE

`app4.py`

---

# Now you have seen the power of Dash and Plotly. Happy visualizing, etc.! Dash can do so much more than just visualizing. It's built on a versatile backend and a hip & powerful frontend, so it can build fully featured web apps (it does have a few limits though).

### That is what I am using it for right now - I'm building a fully featured investment analytics platform using Dash (as well as some other things that make it more powerful). 


---

# Fin.