# Visualization in Python

Libraries:

- matplotlib (the standard one!)
- seaborn
- bokeh
- ggplot
- plotly

etc.

In [None]:
### Generate some random numbers for later use

import numpy as np
import pandas as pd

# Data
df=pd.DataFrame({
    'x': range(1,11),
    'y_1': np.random.randn(10),
    'y_2': np.random.randn(10)+range(1,11),
    'y_3': np.random.randn(10)+range(11,21)
})
df

## 1. Concepts of a matplotlib graph

![Chart Component](https://matplotlib.org/_images/anatomy.png "Chart components")

Start with the initialization step: import `pyplot`.

```python
import matplotlib.pyplot as plt  # The essential import
```

In [None]:
# Do the import
import matplotlib.pyplot as plt

## 2. Line plot




### 2.1 Standard methods

Usage:
1. ```plt.plot([1, 4, 9, 16])```
    - Single list: the list is treated as y-values; x-values default to `range(len(y))`
2. ```plt.plot([1, 2, 3, 4], [1, 4, 9, 16])```
    - Two lists: the first list is treated as x-values, the second as y-values
3. ```plt.plot([1, 2, 3, 4], [1, 4, 9, 16], "ro")```
    - The third parameter is a format string that sets the color, marker, and line style (`"ro"` means red circles with no connecting line)
4. ```plt.plot([1, 2, 3, 4], [1, 4, 9, 16], "g-^", [1, 4, 9, 16])```
    - Several data sets can be passed in one call. Here the trailing list has no x-values of its own, so it is plotted against its indices
5. ```plt.plot("x", "y", "ro", data=<subscriptable>)```
    - With a pandas DataFrame (or any other subscriptable), pass the column names and matplotlib looks them up in `data`

A list of line objects (`Line2D`), one per plotted series, is returned and can be used for further configuration.

What does **subscriptable** mean? Answer: any object that supports ```aObject['key']```, such as a dict or a pandas DataFrame.
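
Usage 5 is easier to see with concrete data. The sketch below is only an illustration: the names `as_dict`, `as_frame`, `dict_lines` and `frame_lines` are made up for this example. It passes a plain dict and then a DataFrame as `data`, and uses the return value of `plt.plot()` to tweak a line afterwards.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data: anything that supports obj['key'] works for data=
as_dict = {"x": [1, 2, 3, 4], "y": [1, 4, 9, 16]}
as_frame = pd.DataFrame(as_dict)

# The strings "x" and "y" are looked up in the object passed as data
dict_lines = plt.plot("x", "y", "ro", data=as_dict)     # red circles, no line
frame_lines = plt.plot("x", "y", "b--", data=as_frame)  # blue dashed line

# plot() returns a list with one Line2D per series, usable for later changes
frame_lines[0].set_linewidth(3)
```

Both calls draw on the same chart, so the red circles sit on top of the dashed blue line.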

**Demonstration**

In [None]:
plt.plot([1, 4, 9, 16])

In [None]:
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])

In [None]:
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], "g-^", [1, 4, 9, 16])

### 2.2 Pandas's shortcut

Pandas provides a very convenient shortcut to matplotlib functionality. The syntax is much shorter!

**Demonstration**

In [None]:
df.set_index("x").plot()
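
The pandas shortcut still produces an ordinary matplotlib chart. As a minimal sketch (with `demo_df` as a hypothetical stand-in for the `df` built at the top of this notebook), `DataFrame.plot()` returns the matplotlib Axes it drew on, so the usual matplotlib calls can be applied to the result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the notebook's df
demo_df = pd.DataFrame({
    "x": range(1, 11),
    "y_1": np.random.randn(10),
    "y_2": np.random.randn(10) + np.arange(1, 11),
})

# One line per column; the index ("x") is used for the x-axis
ax = demo_df.set_index("x").plot()

# The returned object is a matplotlib Axes, so it can be configured as usual
ax.set_title("Pandas line plot")
ax.set_ylabel("value")
```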

## 3. Bar plot on categorical data

### 3.1 Standard methods

1. ```plt.bar([0,1,2,3], [1, 4, 9, 16])```
    - You need both x and y values for a bar chart
2. ```plt.bar(['apple', 'orange', 'lemon', 'mango'], [1, 4, 9, 16])```
    - x values can be categorical
    
A container holding one rectangle per bar (`BarContainer`) is returned, which can be used for further configuration.
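
To illustrate what "further configuration" can look like, here is a small sketch that iterates over the object returned by `plt.bar()` and recolors the tallest bar. The variable names (`fruits`, `counts`, `bars`, `tallest`, `heights`) are chosen for this example only.

```python
import matplotlib.pyplot as plt

fruits = ["apple", "orange", "lemon", "mango"]
counts = [1, 4, 9, 16]

# The returned container can be iterated: one rectangle per bar
bars = plt.bar(fruits, counts)

# Pick the tallest rectangle and highlight it
tallest = max(bars, key=lambda rect: rect.get_height())
tallest.set_color("orange")

# Each rectangle knows its own geometry
heights = [rect.get_height() for rect in bars]
```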

**Demonstration**

In [None]:
# Show
plt.bar(['apple', 'orange', 'lemon', 'mango'], [1, 4, 9, 16])

### 3.2 Pandas's shortcut

We need to use `.plot.bar()` as the shortcut.

In [None]:
df.set_index("x").plot.bar()

## 4. Scatter plot



### 4.1 Standard methods

1. Standard
    - `plt.scatter([0,1,2,3], [1, 4, 9, 16])`
2. Add color and size to scatter plot
    - `plt.scatter('a', 'b', c='c', s='d', data=data)`
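
The second form can be sketched as follows. The dict below is hypothetical sample data whose keys simply match the names `'a'`, `'b'`, `'c'` and `'d'` used in the call above; `c` supplies the values mapped to colors and `s` the marker areas.

```python
import matplotlib.pyplot as plt

# Hypothetical data: the keys match the names used in the scatter() call
data = {
    "a": [0, 1, 2, 3],          # x positions
    "b": [1, 4, 9, 16],         # y positions
    "c": [0.1, 0.4, 0.7, 1.0],  # numbers mapped to colors through the colormap
    "d": [20, 80, 180, 320],    # marker areas (in points squared)
}

points = plt.scatter("a", "b", c="c", s="d", data=data)

# A colorbar explains what the colors stand for
plt.colorbar(points, label="c")
```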

In [None]:
# Show
plt.scatter([0,1,2,3], [1, 4, 9, 16], c='skyblue', s=100)

### 4.2 Pandas's shortcut

**Exercise**

> Plot the samples of height vs weight in `PP-L5-Human_Weight.csv`
1. Load the csv into a dataframe
2. Use `plt.scatter()` function to plot the data

In [None]:
hw_df = pd.read_csv("https://drive.google.com/u/1/uc?id=1imrvK_KN8JrjJSWmyZscQ4U1xfqeY-7l&export=download")

# Type your code below


Use `plot.scatter()` as the shortcut.

In [None]:
df.plot.scatter(x="x", y="y_1")

## 5. Histogram



A histogram is an aggregated view of the data. It groups a series of values into bins of equal width (e.g. 1-3, 3-5, 5-7) and shows how many values fall into each bin.

We can access it through the standard `plt.hist()` or pandas's `df.hist()`.
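
As a minimal sketch of the binning idea, the example below uses the bins 1-3, 3-5 and 5-7 on a small hypothetical sample (`values` and `bin_edges` are made-up names). Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
import matplotlib.pyplot as plt

values = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6]  # hypothetical sample
bin_edges = [1, 3, 5, 7]                 # bins 1-3, 3-5, 5-7

# hist() returns the count per bin, the bin edges, and the drawn bars
counts, edges, patches = plt.hist(values, bins=bin_edges, edgecolor="black")
# counts holds 3, 4, 3: three values in 1-3, four in 3-5, three in 5-7
```

Passing an integer instead (for example `bins=3`) lets matplotlib choose equal-width bins over the data range.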

**Demonstration**

In [None]:
# Try it
hw_df = pd.read_csv("https://drive.google.com/u/1/uc?id=1imrvK_KN8JrjJSWmyZscQ4U1xfqeY-7l&export=download")

# A small inline sample that overrides the downloaded data for this demo
hw_df = pd.DataFrame({
    'height': [148,157,157,157,157,168,168,168,175,175],
    'weight': [50,60,70,70,70,70,80,80,80,90]})


hw_df.hist()

## 6. Figure and Axes

When we call `plt.xxx()`, matplotlib draws onto a default Axes inside a default Figure, creating them if they do not exist yet.

However, we can take control of the figure and the Axes ourselves.

### 6.1 Multiple plots

```python
fig = plt.figure()  # an empty figure with no Axes
fig, ax = plt.subplots()  # a figure with a single Axes
fig, axs = plt.subplots(2, 2)  # a figure with a 2x2 grid of Axes
```

Every time `plt.figure()` or `plt.subplots()` is run, matplotlib remembers the current figure and the current Axes.

Every subsequent plotting call draws on that Axes (unless you specify another one).

Behind the scenes, ```plt.plot([1, 4, 9, 16])``` looks up the current Axes, then runs ```ax.plot([1, 4, 9, 16])```.
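
The "current Axes" lookup can be inspected directly. The sketch below uses `plt.gca()` (get current Axes) and `plt.sca()` (set current Axes); the variable name `current` is chosen only for this example.

```python
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 2)

# plt.gca() returns the Axes that plt.* functions currently draw on
current = plt.gca()
plt.plot([1, 4, 9, 16])  # lands on `current`

# plt.sca() switches the current Axes; later plt.* calls follow it
plt.sca(axs[0])
plt.plot([16, 9, 4, 1])  # lands on axs[0]
```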

Indeed, we can draw directly on a particular Axes.

```python
fig, axs = plt.subplots(2, 2)
axs[1][0].plot([1, 4, 9, 16])

```

**Demonstration**

In [None]:
fig, axs = plt.subplots(2, 2)
axs[1][0].plot([1, 4, 9, 16])

### 6.2 Styling your plot

1. Add a title to your figure
    - `fig.suptitle('')`
2. Add a title to your Axes
    - `ax.set_title('')`
3. Set x label:
    - `ax.set_xlabel('')`
4. Set y label:
    - `ax.set_ylabel('')`
5. Add legend:
    - `ax.legend()`
6. Add text:
    - `ax.text()`
7. Set a predefined style (this is a global setting):
    - `plt.style.use('ggplot')`
    - `plt.style.use('seaborn-v0_8-darkgrid')` (named `'seaborn-darkgrid'` before matplotlib 3.6)

8. And more:
    - `ax.set_xticks([])`
    - `ax.set_xticklabels([])`
    - `ax.set_yticks([])`
    - `ax.set_yticklabels([])`
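
Several of the calls listed above can be combined on one chart. This is only a sketch: the data, the titles and the label texts are invented for the example.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16], label="squares")

fig.suptitle("Figure title")   # title for the whole figure
ax.set_title("Axes title")     # title for this Axes only
ax.set_xlabel("n")
ax.set_ylabel("n squared")
ax.legend()                    # uses the label= given to plot()
ax.text(2, 12, "a note at (2, 12)")  # position is in data coordinates

# Fix the tick positions first, then give each position a custom label
ax.set_xticks([1, 2, 3, 4])
ax.set_xticklabels(["one", "two", "three", "four"])
```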

In [None]:
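# Show
fig, axs = plt.subplots(2, 2)
# Two series on the bottom-left Axes: an (x, y) pair, then a y-only list plotted against its indices
axs[1][0].plot([1, 4, 9, 16], [2,6,9,10], [1,5,15,27])
axs[0][1].set_title('Hello')  # title on the (empty) top-right Axes
axs[1][0].legend(['M','F'])   # one label per plotted series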

### 6.3 Divide into multiple series

We can draw multiple series onto the same Axes.

Each series can also be styled individually (marker, color, line width, line style).



In [None]:
# Show
plt.plot('x', 'y_1', data=df, marker='o', markerfacecolor='blue', markersize=12, color='skyblue', linewidth=4)
plt.plot('x', 'y_2', data=df, marker='', color='olive', linewidth=2)
plt.plot('x', 'y_3', data=df, marker='', color='olive', linewidth=2, linestyle='dashed', label="toto")
plt.legend()

## 7. Fancy graphs

Matplotlib offers extensive customization options, which allow us to fine-tune almost every part of a chart.

**Examples**

https://matplotlib.org/3.1.0/gallery/lines_bars_and_markers/scatter_hist.html

https://matplotlib.org/3.1.1/gallery/images_contours_and_fields/image_annotated_heatmap.html


However, it takes a lot of time to accomplish that. Thus, we will use Seaborn for fancy graphs. (See next lesson!)