# Intro to Data Visualization

![Image](./images/intro.png)

__Data Part Time Jan 2024__

In [None]:
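# but why??? a raw table of numbers is hard to read at a glance
import pandas as pd

# load the vehicles dataset, sort by vehicle class, and keep the first 1000 rows
df = pd.read_csv('./datasets/vehicles.csv', low_memory=False).sort_values(by=['Vehicle Class']).head(1000)

# average CO2 emissions per make (pivot_table aggregates with the mean by default)
table_data = df.pivot_table(index=['Make'], values=['CO2 Emission Grams/Mile']).reset_index()
table_data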

In [None]:
# here's why!!! the same averages drawn as horizontal bars are easy to compare
plot_data = table_data.plot.barh(x='Make', y='CO2 Emission Grams/Mile', figsize=(13,8))

In [None]:
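# but be careful... a pie chart with one slice per make quickly becomes unreadable
# set_index returns a new frame here, so table_data is not mutated and the cell can be re-run
plot_problem = table_data.set_index('Make').plot.pie(y='CO2 Emission Grams/Mile', figsize=(13, 13))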
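
To see why the pie chart is a risky choice here, the sketch below draws the same values as a pie and as sorted horizontal bars. The data is a small made-up table (the makes and numbers are illustrative assumptions, not taken from `vehicles.csv`). Note also that a pie implies parts of a whole, which per-make averages are not.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values for illustration only; not taken from vehicles.csv
toy = pd.DataFrame({
    'Make': ['A', 'B', 'C', 'D', 'E', 'F'],
    'CO2 Emission Grams/Mile': [470, 410, 520, 455, 430, 490],
})

fig, (ax_pie, ax_bar) = plt.subplots(ncols=2, figsize=(13, 5))

# Pie: slices of similar size are very hard to rank by eye
toy.set_index('Make').plot.pie(y='CO2 Emission Grams/Mile', ax=ax_pie, legend=False)
ax_pie.set_ylabel('')
ax_pie.set_title('Pie: which slice is bigger?')

# Sorted bars: lengths on a common axis are easy to compare
sorted_toy = toy.sort_values('CO2 Emission Grams/Mile')
sorted_toy.plot.barh(x='Make', y='CO2 Emission Grams/Mile', ax=ax_bar, legend=False)
ax_bar.set_xlabel('CO2 Emission Grams/Mile')
ax_bar.set_title('Sorted bars: the ranking is obvious')

fig.tight_layout()
```

Sorting before plotting is the key step: the bar chart then doubles as a ranking.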

# Use cases

<p align="center"><img src="./images/reporting.png"></p>

### Static Plotting Libraries for Python

- [matplotlib: Visualization with Python](https://matplotlib.org/) 

_Alternatives: [plotnine (a ggplot2-style grammar of graphics)](https://plotnine.readthedocs.io/en/stable/index.html)_

- [seaborn: statistical data visualization](http://seaborn.pydata.org/index.html)

- [pandas.DataFrame.plot](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html)



<p align="center"><img src="./images/anatomy_figure.jpg"></p>
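
The parts named in the anatomy diagram map directly onto matplotlib objects. A minimal sketch, assuming the object-oriented interface (`plt.subplots`) and made-up data (the name `x_demo` is chosen for this example only):

```python
import matplotlib.pyplot as plt
import numpy as np

x_demo = np.linspace(0, 2 * np.pi, 100)

# Figure: the whole canvas; Axes: one plotting area inside it
fig, ax = plt.subplots(figsize=(8, 4))

# Lines: each plot call adds a Line2D artist; the label feeds the legend
ax.plot(x_demo, np.sin(x_demo), label='sin(x)')
ax.plot(x_demo, np.cos(x_demo), label='cos(x)')

ax.set_title('Anatomy of a figure')        # title
ax.set_xlabel('x (radians)')               # x-axis label
ax.set_ylabel('value')                     # y-axis label
ax.set_xticks([0, np.pi, 2 * np.pi])       # major tick positions
ax.set_xticklabels(['0', 'π', '2π'])       # major tick labels
ax.grid(True)                              # grid
ax.legend()                                # legend built from the line labels
```

Working with `fig` and `ax` explicitly, rather than only through `plt`, makes it clear which part of the figure each call changes.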

In [None]:
# import libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# magic for inline plotting in jupyter notebook (not necessary anymore)
%matplotlib inline

In [None]:
# line plot data
x = np.arange(0,10,1)
y = np.power(x, 2)
print(x)
print(y)

In [None]:
# line plot ('o-' draws circle markers joined by a line; 'o' alone would draw markers only)
plt.plot(x, y, 'o-')

In [None]:
# also line plot data
np.random.seed(19680801)

dt = 0.01
t = np.arange(0, 30, dt)
nse1 = np.random.randn(len(t))                 # white noise 1
nse2 = np.random.randn(len(t))                 # white noise 2

# Two signals with a coherent part at 10 Hz and a random part
s1 = np.sin(2 * np.pi * 10 * t) + nse1
s2 = np.sin(2 * np.pi * 10 * t) + nse2

print(s1)
print(s2)
print(t)

In [None]:
# also a line plot
fig, axs = plt.subplots(ncols=1, nrows=2, figsize=(15,5))
axs[0].plot(t, s1, t, s2)
axs[0].set_xlim(0, 2)
axs[0].set_xlabel('time (s)')
axs[0].set_ylabel('s1 and s2')
axs[0].grid(True)
# coherence between the two signals; NFFT and Fs are passed by keyword for clarity
cxy, f = axs[1].cohere(s1, s2, NFFT=256, Fs=1. / dt)
axs[1].set_ylabel('coherence')
fig.tight_layout()
plt.show()

In [None]:
# dataframe for seaborn
x = np.arange(100)
y = x ** 2
data = pd.DataFrame({'x': x, 'y': y})
data.head()

In [None]:
# seaborn line plot

sns.lineplot(data=data, x='x', y='y')

In [None]:
# pandas always very helpful
df = pd.DataFrame({'pig': [20, 18, 489, 675, 1776], 
                   'horse': [4, 25, 281, 600, 1900], 
                   'tardigrade': [58, 250, 28, 6, 19]},
                  index=[1990, 1997, 2003, 2009, 2014])
df

In [None]:
# pandas plotting
lines = df.plot.line(figsize=(18,8))

### Dynamic Plotting Libraries for Python

- [Plotly Python Open Source Graphing Library](https://plotly.com/python/) 

_Alternatives: [bokeh](https://docs.bokeh.org/en/latest/index.html#), [pygal](https://www.pygal.org/en/latest/index.html), [gleam](https://github.com/dgrtwo/gleam)_

__Other uses:__ [folium](https://python-visualization.github.io/folium/), [geoplotlib](https://github.com/andrea-cuttone/geoplotlib), [missingno](https://github.com/ResidentMario/missingno)

### Interactive Data Visualization Software
- __[Microsoft Power BI](https://app.powerbi.com/)__
_===> [go to gallery](https://community.powerbi.com/t5/COVID-19-Data-Stories-Gallery/bd-p/pbi_covid19_datastories)_

- __[Tableau](https://public.tableau.com/)__ 
_===> [go to gallery](https://public.tableau.com/app/discover/viz-of-the-day)_

- [Looker](https://looker.com/)

- [Qlik](https://www.qlik.com/us/)

- [MicroStrategy](https://www.microstrategy.com/en)

#### [Gartner Magic Quadrant](https://powerbi.microsoft.com/en-my/blog/microsoft-named-a-leader-in-the-2023-gartner-magic-quadrant-for-analytics-and-bi-platforms/)

<p align="center"><img src="./images/magic_quadrant_2021_2022.jpg"></p>


<p align="center"><img src="./images/magic_quadrant_2023.jpg"></p>


### Virtual Machines for Power BI 

> Windows 1 - 0 Macintosh!!! Sorry... Power BI Desktop runs only on Windows, so Mac users need a virtual machine.

- [VirtualBox](https://www.virtualbox.org/wiki/VirtualBox) _Free_

- [Parallels](https://www.parallels.com/) _Not Free_

### Web Applications and Libraries (Mostly Open Source)
- [Streamlit](https://streamlit.io/)
- [Flourish](https://flourish.studio/)
- [Charticulator](https://charticulator.com/)
- [D3](https://d3js.org/)

#### Chart Literacy

<p align="center"><img src="./images/ft.png"></p>

### Useful Tools

- [Visual Vocabulary](https://github.com/Financial-Times/chart-doctor/tree/main/visual-vocabulary)

- [Data Visualization Catalogue](https://datavizcatalogue.com/index.html)

- [Adobe Color](https://color.adobe.com/es/create/image)

- [Hex to RGB Color Converter](https://www.rapidtables.com/convert/color/hex-to-rgb.html)

- [Resize Images](https://www.reduceimages.com/)
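
The hex-to-RGB conversion itself is simple arithmetic: each pair of hexadecimal digits is one channel from 0 to 255. A minimal sketch using only the standard library (the helper names `hex_to_rgb` and `rgb_to_hex` are made up for this example):

```python
def hex_to_rgb(hex_color):
    """Convert '#rrggbb' (leading '#' optional) to an (r, g, b) tuple of ints in 0..255."""
    digits = hex_color.lstrip('#')
    if len(digits) != 6:
        raise ValueError(f'expected 6 hex digits, got {hex_color!r}')
    # Read two hex digits at a time: red, green, blue
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of ints in 0..255 back to '#rrggbb'."""
    return '#{:02x}{:02x}{:02x}'.format(*rgb)


print(hex_to_rgb('#1f77b4'))       # (31, 119, 180)
print(rgb_to_hex((255, 127, 14)))  # #ff7f0e
```

matplotlib accepts hex strings such as `'#1f77b4'` directly as colors, so the conversion is mainly useful when a tool expects RGB triples.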

### Create Effective Views

- Emphasize the most important data

- Orient your views for legibility

- Organize your views

- Avoid overloading your views

- Limit the number of colors and shapes in a single view

- __Design Holistic Dashboards!!!__
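
Several of these guidelines can be applied directly in code. The sketch below uses made-up sales figures (the region names and values are assumptions for illustration). It sorts the bars, orients them horizontally so labels stay legible, and reserves a single accent color for the one bar that matters.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data for illustration only
sales = pd.Series({'North': 120, 'South': 95, 'East': 210, 'West': 140, 'Central': 80},
                  name='sales').sort_values()

# Emphasize the most important data: one accent color, everything else muted
highlight = sales.idxmax()
colors = ['tab:orange' if region == highlight else 'lightgray' for region in sales.index]

fig, ax = plt.subplots(figsize=(8, 4))

# Orient for legibility: horizontal bars keep category labels readable
ax.barh(sales.index, sales.values, color=colors)

# Avoid overloading the view: drop spines that carry no information
for side in ('top', 'right'):
    ax.spines[side].set_visible(False)

ax.set_title(f'{highlight} leads sales')
ax.set_xlabel('Sales (units)')
fig.tight_layout()
```

Putting the takeaway in the title and using only two colors means the reader does not have to search the chart for the message.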

# The End