<a href="https://colab.research.google.com/github/sensei-jirving/Online-DS-PT-01.24.22-cohort-notes/blob/main/BonusLectures/InteractiveVisualizations/PreClass_Interactive_Visualizations_with_Plotly_Express.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Interactive Visualizations - with Plotly Express
- 02/25/22 - Bonus Office Hours
- 01.24.22 Cohort

>- A brief introduction to creating interactive visualizations using Plotly.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


## Plotly Imports
import plotly.express as px 
import plotly.graph_objects as go
import plotly.io as pio

## Data
- King County House Prices:
    - Originally downloaded from Kaggle:
        - https://www.kaggle.com/harlfoxem/housesalesprediction 
    - [Google Drive Share Url](https://drive.google.com/file/d/1kam1UuRmCA8a9i5_ZXaP9GBc7J-B79Mm/view?usp=sharing)
        - Directly Loadable Publish Url: 
        ```python
        filename = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSEZuzn_iFfNfMDBO4pRkwUMj62a_ptghfTGpdMZCFbZQjCmXuE3wtu9X2RU91eKgzuHLYimclPfo53/pub?output=csv'
        df = pd.read_csv(filename)
        ```


In [None]:
## Load the data, set the index col to be the id column 
filename = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSEZuzn_iFfNfMDBO4pRkwUMj62a_ptghfTGpdMZCFbZQjCmXuE3wtu9X2RU91eKgzuHLYimclPfo53/pub?output=csv'
df = pd.read_csv(filename,index_col=0,
                 parse_dates=['date'])
df

In [None]:
## Quick inspection of dataframe
report = pd.DataFrame({'column': df.columns,
                       'dtype': df.dtypes, 
                       'nunique': df.nunique(),
                       '# nulls': df.isna().sum(),
                       '% nulls': df.isna().sum()/len(df) * 100}).reset_index(drop=True)
report

In [None]:
df.describe().round(2)

### Some quick feature engineering

In [None]:
## Making category was_renovated out of yr_renovated
df['was_renovated'] = df['yr_renovated'] > 0
df['was_renovated'].value_counts()

In [None]:
## Making has_basement from sqft_basement
df['has_basement'] = df['sqft_basement'] > 0
df['has_basement'].value_counts()

# 🕹 Plotly Express


- Plotly Express: https://plotly.com/python/plotly-express/ 
    - Plotly Express is a submodule of a larger package called plotly.
    - Plain plotly is a bit tedious to code (manually constructing every line with loops, complex dictionary organization, etc.).
    - Plotly Express is designed to be the new easy entry point for Plotly visuals.



>- Plotly Express is to Plotly as Seaborn is to Matplotlib:
    - Doing more for us with less code.
- Like working with Seaborn, we make the figure with Plotly Express and then use the underlying plotly structure to customize our visualizations.

## Plotly Things to Demonstrate

- Scatter Plots
    - plain
    - color-coded
    - with a trendline
    - Using themes
- Scatter Matrices (For EDA ONLY!)
- Histograms & Bar Plots
- Maps (scatter)

## Scatter Plots

In [None]:
## simple scatterplot


#### Hover_data

In [None]:
## simple scatterplot + hoverdata


### Customizing Plotly Express Figures

- Like seaborn, we will want to save the output of our plotly express function and then use its methods to update it. 

In [None]:
## remake scatterplot and save as fig
fig = None
fig

### What's in a Plotly Fig?

- the data and aesthetics are stored in attributes/dictionaries inside of the figure
- For details: https://plotly.com/python/figure-structure/


- To change the figure, there are a few key methods we will use to modify our visual:
    - fig.update_traces
    - fig.update_layout
    - fig.show


In [None]:
## Print fig to see info
fig.show()
print(fig)

In [None]:
## to make smaller markers/different colors - use fig.update_traces 
# make markers smaller with a white border


In [None]:
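## functionize customization for efficiency
## (signature matches the later call: update_figure(fig, marker_style=dict(size=5)))
def update_figure(fig, marker_style=None):
    fig.update_traces(marker=marker_style or {})  # apply marker styling to every trace
    fig.show()
    return fig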

### Color-Coded Scatterplot by waterfront

In [None]:
## scatterplot with color=waterfront


### Color-Coded Scatterplot by waterfront - with a trendline

In [None]:
## scatterplot with trendline


In [None]:
## convert waterfront to string for viz


#### Themes
- https://plotly.com/python/templates/

In [None]:
import plotly.io as pio
pio.templates

In [None]:
## remake figure with different theme/template


### Scatter Matrix

In [None]:
## Determining which cols to exclude from the scatter matrix
exclude_cols = ['date','zipcode','lat','long','yr_built','yr_renovated','view',
                'condition','has_basement','was_renovated','sqft_lot15','sqft_living15']
plot_cols = df.drop(columns=exclude_cols).columns
plot_cols

In [None]:
## Scatter matrix (EDA ONLY)



## Histograms & Barplots

### Histogram + boxplot!

In [None]:
### histogram + boxplot


### Bar plots

>- Different from a seaborn barplot:
- Plots each house (row) as its own stacked segment, which shows up as a horizontal line in the bar.
- Does not play nicely with large datasets!

In [None]:
## barplot of has_basement vs price
# take a sample of only 1000 homes
sample = None


## Mapping

In [None]:
# reminding ourselves of our dataframe
df.head(2)

#### `px.scatter_mapbox`

In [None]:
# lat vs long, color by price


> ***Why aren't we seeing anything???***

- To the Documentation!
    - [px.scatter_mapbox docs](https://plotly.github.io/plotly.py-docs/generated/plotly.express.scatter_mapbox.html#plotly-express-scatter-mapbox)
- read the "mapbox_style" parameter 

- adding hover_data

In [None]:
## saving all columns but lat/long as hover_cols
hover_cols = df.drop(columns=['lat','long']).columns
hover_cols

In [None]:
## Plot scatter mapbox with hover_data
fig = px.scatter_mapbox(df,lat='lat',lon='long',color='price',
                        mapbox_style='open-street-map',
                        hover_data=hover_cols) ## adding hover data
update_figure(fig,marker_style=dict(size=5));


- Turn off scroll-wheel zoom using `config`:
    - https://plotly.com/python/configuration-options/
```python
config = dict({'scrollZoom':False})
fig.show(config=config)
```

# Appendix

## 3d Scatterplot

- Not an ideal choice when the data has different units!

In [None]:
# 3d scatter plot
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

plot_cols = ['sqft_living','sqft_lot','bedrooms']
plot_df = df[['price'] + plot_cols ].copy()
plot_df


In [None]:
## scale data
plot_df[plot_cols] = scaler.fit_transform( plot_df[plot_cols])
plot_df.describe().round(2)

In [None]:
## use scatter_3d 
