## Before we start:
* Final assignment available on EduFlow during Monday class.

### A few questions:
* **Other tools for visualization:**
  * [Power BI](https://powerbi.microsoft.com/)
  * [Looker Studio](https://lookerstudio.google.com/) (formerly Google Data Studio)
  * [Tableau](https://www.tableau.com/)
  * Plus a host of other solutions
* **Why Python & Pandas vs other more accessible visualization tools?**
  * In short: Python for scale, performance, and flexibility over speed.
  * Combine them:
    * Python to get, filter, merge data (e.g. via API)
    * Initial analysis/QA using visualization (e.g. Pandas)
    * Generate output (csv/excel/json)
    * Load into Power BI/Tableau etc for further analysis and more advanced visualizations.
  * Typically Python for EDA (Exploratory Data Analysis); Power BI etc for more general analysis / business reporting.
* **Assignment 4.3**
  * Objective 2 - Confusion about question/wording, uncertainty about grouping/sum
  * Objective 3 - Color bars

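The combined workflow from the question above (get, filter, and merge data in Python, run a quick QA, then export for a BI tool) could look roughly like the sketch below. The two small DataFrames, their column names, and the threshold of 50 are hypothetical stand-ins for data that would normally come from an API.

```python
import io
import pandas as pd

# Hypothetical stand-ins for data that would normally come from an API
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [120, 80, 45, 300],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["North", "South", "North"],
})

# Filter: keep orders of at least 50 (arbitrary example threshold)
large_orders = orders[orders["amount"] >= 50]

# Merge: attach each customer's region to the remaining orders
merged = large_orders.merge(customers, on="customer_id", how="left")

# Quick QA: total amount per region
summary = merged.groupby("region", as_index=False)["amount"].sum()

# Generate output: CSV text that Power BI / Tableau can load
buffer = io.StringIO()
summary.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

In practice the last step would usually write straight to a file, e.g. `summary.to_csv("summary.csv", index=False)`, and that file is then loaded into the BI tool.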

# 4.4 Visualizing Data in Python

## [Seaborn](https://seaborn.pydata.org/tutorial/introduction)
* Uses matplotlib as its engine, just like pandas
* More aesthetically pleasing and modern, better for higher-quality visualizations

Why? It is important to know which resources are available, especially the common ("best practice") ones with widespread industry usage, solid documentation, and strong community support.

In [None]:
# The import alias "sns" is a play on the name of the West Wing character "Samuel Norman Seaborn"
# Install first if needed: pip install seaborn
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from random import randint

In [None]:
# 30 random integer values between 0 and 1000 (inclusive) per column
df = pd.DataFrame({
    "Income": [randint(0, 1000) for _ in range(30)],
    "Expenses": [randint(0, 1000) for _ in range(30)],
})
df.head()

### Updating scatter plots

In [None]:
# https://seaborn.pydata.org/generated/seaborn.regplot.html?highlight=regplot#seaborn.regplot
# https://www.geeksforgeeks.org/seaborn-regression-plots/

# With only one plot, there is no need to create a figure or subplot grid explicitly
sns.regplot(data=df, x="Income", y="Expenses")
plt.show()

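By default, `regplot` draws the scatter points together with a straight-line regression fit. As a rough illustration of what that line represents, the sketch below computes an ordinary least-squares line with `numpy.polyfit`. The five data points are hypothetical and fixed so the result is reproducible. This is not how seaborn computes the fit internally, only an equivalent way to get a straight-line fit.

```python
import numpy as np

# Hypothetical, fixed sample so the result is reproducible
income = np.array([100, 200, 300, 400, 500], dtype=float)
expenses = np.array([150, 220, 310, 380, 470], dtype=float)

# A degree-1 polynomial fit is an ordinary least-squares straight line
slope, intercept = np.polyfit(income, expenses, deg=1)

# Points on the fitted line, one per observed income value
fitted = slope * income + intercept
print(f"Expenses = {slope:.2f} * Income + {intercept:.2f}")
```

With the notebook's random `df`, the slope will usually be close to zero, since Income and Expenses are generated independently.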
### Updating histogram plots

In [None]:
# https://seaborn.pydata.org/generated/seaborn.histplot.html?highlight=histplot#seaborn.histplot
# With kde=True and stat="density", shows a density-normalized histogram with a superimposed kernel density estimate
# The default bin size is determined using a reference rule that depends on the sample size and variance
# Heads up! distplot() is DEPRECATED; use histplot() or displot() instead

fig = plt.figure(figsize=(10,3))
income_axes = fig.add_subplot(1,2,1)
expense_axes = fig.add_subplot(1,2,2)
plt.subplots_adjust(wspace=.3)

# Plot data
sns.histplot(df["Income"], ax=income_axes, kde=True, stat="density")
sns.histplot(df["Expenses"], ax=expense_axes, kde=True, stat="density")
plt.show()

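To make "density normalization" and the "reference rule" for bins more concrete, the sketch below reproduces both ideas with NumPy alone. The sample values are hypothetical and fixed, since the notebook's data is random. It assumes the `"auto"` rule of `numpy.histogram_bin_edges`, which is where seaborn's documentation says the `bins` argument is passed.

```python
import numpy as np

# Hypothetical, fixed sample (the notebook's data is random)
values = np.array([12, 45, 47, 51, 58, 60, 63, 71, 88, 95], dtype=float)

# "auto" asks NumPy to pick the bin edges using a reference rule
edges = np.histogram_bin_edges(values, bins="auto")

# density=True rescales the counts so that the total bar area is 1
density, _ = np.histogram(values, bins=edges, density=True)
bar_areas = density * np.diff(edges)
total_area = bar_areas.sum()
print(len(edges) - 1, "bins, total area =", round(total_area, 6))
```

Because the bars integrate to 1, the histogram shares a y-axis scale with the kernel density estimate, which is why the two can be overlaid.
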
### Box plots

In [None]:
# https://seaborn.pydata.org/generated/seaborn.boxplot.html?highlight=boxplot#seaborn.boxplot
# https://www.geeksforgeeks.org/box-plot/
# Depicts data through their quartiles

sns.boxplot(data=df, orient="h")
plt.show()

<img src="https://media.geeksforgeeks.org/wp-content/uploads/20201127012952/boxplot.png" style="width: 600px;"/>

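The quartiles that a box plot depicts can also be computed directly with pandas. The sketch below uses a hypothetical, fixed sample and assumes the usual 1.5 × IQR whisker rule (seaborn's default `whis=1.5`) to flag outliers.

```python
import pandas as pd

# Hypothetical, fixed sample (the notebook's data is random)
income = pd.Series([100, 200, 300, 400, 500, 600, 700, 800, 2000])

# The box spans Q1 to Q3, with a line at the median
q1, median, q3 = income.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = income[(income < lower_fence) | (income > upper_fence)]
print(q1, median, q3, outliers.tolist())
```

Here the value 2000 lies above the upper fence, so a box plot of this sample would show it as a single point beyond the whisker.
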
## Use Cases for Visualization