# Visualizations

## Matplotlib

- Matplotlib is a Python library for producing plots and other two-dimensional data visualizations.

- It is an open-source plotting library.

- It is used mainly for 2D graphics, with 3D plotting available through the `mplot3d` toolkit.

- With matplotlib, data can be visualized easily as line plots, histograms, bar charts, and more.

## Types of Plots
1. Bar Chart

2. Scatter Plot

3. Pie Chart

4. Histogram

5. Line chart

6. Area Graph

7. And many more

There are a number of plotting libraries, but we will focus on `matplotlib`. Another popular plotting library is [`seaborn`](https://seaborn.pydata.org/).

In [None]:
import seaborn as sns

In [None]:
sns.get_dataset_names()

In [None]:
titanic = sns.load_dataset('titanic')
titanic.head(5)

In [None]:
type(titanic)

In [None]:
tips = sns.load_dataset('tips')
tips.head(5)

In [None]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 5)
plt.rcParams['figure.dpi'] = 100

**Line Plots**

Perhaps the simplest of all plots is the visualization of a single function $y=f(x)$. For all matplotlib plots, we start by creating a figure and an axes. In their simplest form, a figure and axes can be created as follows:

In [None]:
fig = plt.figure()
ax = plt.axes()

In [None]:
type(fig)

In [None]:
type(ax)

<center><img src="https://files.realpython.com/media/fig_map.bc8c7cabd823.png" width="300" height="300"></center>
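The relationship between a figure and its axes can also be inspected in code. The following is a minimal sketch (the variable names `demo_fig` and `demo_ax` are illustrative, not part of the original notebook) showing that a figure keeps a list of its axes and that each axes points back to its parent figure.

```python
import matplotlib.pyplot as plt

# Create one figure containing a single axes
demo_fig, demo_ax = plt.subplots()

# A Figure stores its Axes objects in the .axes list
print(demo_fig.axes[0] is demo_ax)

# Each Axes keeps a reference to the Figure that contains it
print(demo_ax.figure is demo_fig)

# Each Axes owns an x-axis object and a y-axis object
print(type(demo_ax.xaxis).__name__, type(demo_ax.yaxis).__name__)
```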

The `plt.plot()` function takes additional arguments that control the appearance of each line, such as its color and style. To adjust the color, you can use the `color` keyword, which accepts a string argument representing virtually any imaginable color.

In [None]:
import numpy as np
x = np.linspace(0, 10, 100)

In [None]:
fig, ax = plt.subplots()
plt.plot(x, np.sin(x), color='blue');            # Specify the color by name
plt.plot(x, np.sin(x-1), color='g');             # Short color code (rgbcmyk)
plt.plot(x, np.sin(x-2), color='0.75');          # Grayscale between 0 and 1
plt.plot(x, np.sin(x-3), color='#FFDD44');       # HEX code (RRGGBB from 00 to FF)
plt.plot(x, np.sin(x-4), color=(1.0, 0.2, 0.3)); # RGB tuple (values between 0 and 1)
plt.plot(x, np.sin(x-5), color='chartreuse');    # All HTML color names are supported
ax.set_xlabel('$x$')
ax.set_ylabel('$f(x)$')
ax.set_title('Sine Functions');

Matplotlib does a decent job of choosing default axes limits for your plot, but sometimes it’s nice to have finer control.

In [None]:
plt.plot(x, np.sin(x))
plt.xlim(-1,11)
plt.ylim(-1.5, 1.5);
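The same limits can be set through the axes object instead of the `plt` interface. This is a small sketch assuming the same sine data as above; `lim_fig`, `lim_ax`, `x_limits`, and `y_limits` are names chosen here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

lim_fig, lim_ax = plt.subplots()
lim_ax.plot(x, np.sin(x))

# Axes methods that mirror plt.xlim() and plt.ylim()
lim_ax.set_xlim(-1, 11)
lim_ax.set_ylim(-1.5, 1.5)

# The getters return the current (lower, upper) limits
x_limits = lim_ax.get_xlim()
y_limits = lim_ax.get_ylim()
print(x_limits, y_limits)
```

Working on the axes object tends to be clearer once a figure contains more than one set of axes, because each call states which axes it changes.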

Titles and axis labels are the simplest labels to add, and there are methods for setting them quickly.

In [None]:
plt.plot(x, np.sin(x))
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title("A Sine Curve");
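When working with an axes object, the labels and title can be set in a single call. The sketch below assumes the same sine data and uses `Axes.set()`, which accepts several properties as keyword arguments; `label_fig` and `label_ax` are illustrative names.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

label_fig, label_ax = plt.subplots()
label_ax.plot(x, np.sin(x))

# Set the x label, y label, and title in one call
label_ax.set(xlabel='x', ylabel='sin(x)', title='A Sine Curve')

# The matching getters return the text that was set
print(label_ax.get_title())
```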

In [None]:
x = np.random.randint(low = 1, high = 11, size = 50)
y = x + np.random.randint(1, 5, size = x.size)
data = np.column_stack((x, y))
data[:5]

In [None]:
fig, (ax1, ax2) = plt.subplots(nrows = 1, ncols = 2, figsize = (8, 4))

In [None]:
fig, (ax1, ax2) = plt.subplots(nrows = 1, ncols = 2, figsize = (8, 4))

ax1.scatter(x = x, y = y, marker = 'o', c = 'r', edgecolor = 'b')
ax1.set_title('Scatter: $x$ versus $y$')
ax1.set_xlabel('$x$')
ax1.set_ylabel('$y$')

# Bin edges at each integer; the + 2 gives the largest value its own bin
ax2.hist(data, bins=np.arange(data.min(), data.max() + 2), label = ('x', 'y'))
ax2.legend(loc = (0.65, 0.8))
ax2.set_title('Frequencies of $x$ and $y$')
ax2.yaxis.tick_right()

**Bar plots** 

A bar plot displays the counts of the labels from a categorical column. This can be done using `pandas`

```
df.plot.bar()
```

or using `matplotlib`

```
plt.bar()
```
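As a rough side-by-side comparison of the two approaches, the sketch below counts the labels in a small made-up Series (the data and the names `fruit`, `label_counts`, `bar_fig`, `left_ax`, and `right_ax` are hypothetical) and draws the same bar plot with each interface.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical categorical data, for illustration only
fruit = pd.Series(['apple', 'pear', 'apple', 'plum', 'apple', 'pear'])

# Count how often each label occurs
label_counts = fruit.value_counts()

bar_fig, (left_ax, right_ax) = plt.subplots(nrows=1, ncols=2, figsize=(8, 4))

# pandas: plot the counts directly from the Series
label_counts.plot.bar(ax=left_ax, rot=0, title='pandas')

# matplotlib: pass the categories and the bar heights explicitly
right_ax.bar(x=list(label_counts.index), height=label_counts.values)
right_ax.set_title('matplotlib')
```

The pandas version is shorter when the counts already live in a Series or DataFrame, while the matplotlib version makes the categories and heights explicit.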

In [None]:
url = 'https://raw.githubusercontent.com/mahmoudharding/exceed/main/data/titanic.csv'
titanic = pd.read_csv(url)
titanic

In [None]:
titanic['Survived'].value_counts()

In [None]:
titanic['Sex'].value_counts()

In [None]:
pd.crosstab(titanic['Survived'], titanic['Sex'])

In [None]:
tbl = pd.crosstab(titanic['Sex'], titanic['Survived'])
tbl

In [None]:
tbl.plot.bar()

In [None]:
tbl.plot.bar(rot=0);

In [None]:
titanic.info()

In [None]:
titanic['Survived'][:5]

In [None]:
titanic['Survived'] = titanic['Survived'].astype('category')
titanic['Survived'][:5]

In [None]:
titanic['Survived'].value_counts()

In [None]:
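# Rename the category labels (0 becomes 'Died', 1 becomes 'Survived').
# Assigning to .cat.categories directly no longer works in pandas 2.0 and later.
titanic['Survived'] = titanic['Survived'].cat.rename_categories(['Died', 'Survived'])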
titanic['Survived'][:5]

In [None]:
titanic['Survived'].value_counts()
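Category labels can also be renamed with an explicit mapping, which avoids relying on the order of the categories. This is a minimal sketch on a made-up 0/1 column; the Series `status` is hypothetical and stands in for a column such as `Survived`.

```python
import pandas as pd

# Hypothetical 0/1 column, for illustration only
status = pd.Series([0, 1, 1, 0, 0]).astype('category')
print(list(status.cat.categories))

# rename_categories returns a new Series; only the labels change
status = status.cat.rename_categories({0: 'Died', 1: 'Survived'})
print(list(status.cat.categories))
print(status.value_counts())
```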

**Note:** In `plt.bar()`:

- `x` represents the categories.

- `height` represents the corresponding heights.

In [None]:
plt.bar(x = ['Died', 'Survived'], height = titanic['Survived'].value_counts());

Let's convert the `Embarked` and `Sex` columns to type category.

In [None]:
titanic.Embarked = titanic.Embarked.astype('category')
titanic.Sex = titanic.Sex.astype('category')

In [None]:
print(titanic['Embarked'][:6])
print('\n')
print(titanic['Sex'][:6])

In [None]:
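# Rename the category labels; the new names follow the sorted order of the originals
titanic.Embarked = titanic.Embarked.cat.rename_categories(["Cherbourg", "Queenstown", "Southampton"])
titanic.Sex = titanic.Sex.cat.rename_categories(["Female", "Male"])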

**Stacked Bar Plot**

In [None]:
tbl = pd.crosstab(titanic.Embarked, titanic.Survived)
tbl

In [None]:
type(tbl)

In [None]:
print(titanic.Embarked.cat.categories)

Using `.loc` we can slice our two-way contingency table. 

In [None]:
tbl.loc[:, 'Survived']

In [None]:
# Use a list to select several row labels at once
tbl.loc[['Cherbourg', 'Southampton'], 'Survived':]
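The same kinds of `.loc` selections can be tried on a small made-up table. The data and the names `toy`, `toy_tbl`, `died_col`, `subset`, and `b_died` below are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data, for illustration only
toy = pd.DataFrame({
    'port':   ['A', 'A', 'B', 'B', 'B', 'C'],
    'status': ['Died', 'Survived', 'Died', 'Died', 'Survived', 'Survived'],
})

# Two-way contingency table: rows are ports, columns are statuses
toy_tbl = pd.crosstab(toy['port'], toy['status'])

# All rows, one column: returns a Series
died_col = toy_tbl.loc[:, 'Died']

# A list of row labels with a single column label
subset = toy_tbl.loc[['A', 'C'], 'Survived']

# A single cell: row label first, then column label
b_died = toy_tbl.loc['B', 'Died']
print(b_died)
```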

In [None]:
p1 = plt.bar(x = ["Cherbourg", "Queenstown", "Southampton"],
             height = tbl.loc[:, "Died"], label = 'Died')

p2 = plt.bar(x = ["Cherbourg", "Queenstown", "Southampton"],
             height = tbl.loc[:, "Survived"],
             bottom = tbl.loc[:, "Died"], label = 'Survived')

plt.title("Passenger Counts for Each Port")
plt.xlabel("\nPort Embarked")
plt.ylabel("Count")
plt.legend();

In [None]:
labels = ["Cherbourg", "Queenstown", "Southampton"]

fig, ax = plt.subplots()
ax.bar(labels, tbl.loc[:, "Died"], label = 'Died')
ax.bar(labels, height = tbl.loc[:, "Survived"], 
               bottom = tbl.loc[:, "Died"], 
               label = 'Survived')

ax.set_title("Passenger Counts for Each Port")
ax.set_xlabel("\nPort Embarked")
ax.set_ylabel('Count')
ax.legend();

In [None]:
tbl.plot.bar(stacked = True, rot = 0);

**Side by Side Bar Chart**

In [None]:
tbl.plot.bar(rot = 0);

In [None]:
ax = tbl.plot.bar(rot = 0)
ax.set_title("Passenger Counts For Each Port")
ax.set_xlabel("\nPort")
ax.set_ylabel("Count")
ax.legend(loc = "upper left");

**Histograms**

A histogram is an approximate representation of the distribution (the frequency and pattern) of numerical data.
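To see what a histogram actually computes, the counts per bin can be produced directly with `np.histogram`. This is a small sketch on made-up ages; the array `ages` and the bin edges are hypothetical.

```python
import numpy as np

# Hypothetical ages, for illustration only
ages = np.array([2, 8, 15, 22, 24, 29, 31, 35, 47, 63])

# Bin edges: [0, 20), [20, 40), [40, 60), [60, 80]
edges = [0, 20, 40, 60, 80]

# np.histogram returns the count in each bin and the edges used
age_counts, used_edges = np.histogram(ages, bins=edges)
print(age_counts)
```

Each bar in a histogram has a height equal to one of these counts, so changing the bin edges changes the shape of the plot.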

Now let's look at the distribution of the ages of the passengers on the Titanic.

In [None]:
titanic.Age.hist();

In [None]:
ax = titanic.Age.hist()
ax.grid(False)
ax.set_title("Distribution of Passenger Ages")
ax.set_xlabel("\nAges")
ax.set_ylabel("Count");

In [None]:
ax = titanic.Age.hist(edgecolor = "black")
ax.grid(False)
ax.set_title("Distribution of Passenger Ages")
ax.set_xlabel("\nAges")
ax.set_ylabel("Count");

In [None]:
ax = titanic.Age.hist(color = "red", edgecolor = "black", bins = 6)
ax.grid(False)
ax.set_title("Distribution of Passenger Ages")
ax.set_xlabel("\nAges")
ax.set_ylabel("Count");

In [None]:
bins = [0, 10, 20, 30, 40, 50, 60, 70]
ax = titanic.Age.hist(color = "lightblue", edgecolor = "blue", bins = bins)
ax.grid(False)
ax.set_title("Distribution of Passenger Ages")
ax.set_xlabel("\nAges")
ax.set_ylabel("Count");

**Side by Side Histograms**

In [None]:
ax1, ax2 = titanic.Age.hist(edgecolor = "black", bins = bins, by = titanic.Survived, rot = 0)
ax1.set_xlabel("\nAge")
ax2.set_xlabel("\nAge");

**Overlaid Histograms**

In [None]:
male = titanic.loc[titanic.Sex == 'Male'].Age
female = titanic.loc[titanic.Sex == 'Female'].Age

num_bins = 10

ax = male.hist(bins = num_bins, alpha = 0.5, label = 'Male', edgecolor = 'black')
ax = female.hist(bins = num_bins, alpha = 0.5, label = 'Female', edgecolor = 'black')
ax.grid(False)
ax.set_xlabel("\nAge")
ax.set_ylabel("Count")
ax.legend();

In [None]:
age_died = titanic.loc[titanic.Survived == "Died", "Age"]
age_survived = titanic.loc[titanic.Survived == "Survived", "Age"]
ax = age_died.hist(alpha = 0.5, label = 'Died', edgecolor = 'black')
ax = age_survived.hist(alpha = 0.5, label = 'Survived', edgecolor = 'black')
ax.grid(False)
ax.set_xlabel("\nAge")
ax.set_ylabel("Count")
ax.legend();

**Scatter Plot**

Scatter plots are used to visualize the relationship between two numerical features, for example whether they have a linear association.
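One way to put a number on a linear association is the Pearson correlation coefficient. The sketch below uses simulated data (the names `x_demo`, `y_demo`, and `r`, the slope, and the noise level are all assumptions made for illustration) and shows the coefficient next to the scatter plot it describes.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: y is roughly a linear function of x plus noise
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 10, size=100)
y_demo = 2.0 * x_demo + rng.normal(0, 1, size=100)

# Pearson correlation coefficient, a value between -1 and 1
r = np.corrcoef(x_demo, y_demo)[0, 1]

corr_fig, corr_ax = plt.subplots()
corr_ax.scatter(x_demo, y_demo, alpha=0.5)
corr_ax.set_xlabel('x')
corr_ax.set_ylabel('y')
corr_ax.set_title(f'Simulated linear association (r = {r:.2f})')
```

A coefficient near 1 or -1 goes with points that fall close to a straight line, while a value near 0 suggests little linear association.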

In [None]:
titanic.plot.scatter(x = "Age", y = "Fare");

In [None]:
titanic.plot.scatter(x = "Age", y = "Fare", c = "red", s = 40, marker = "^");

In [None]:
# plt.scatter returns a PathCollection rather than an Axes, so the result is not stored
plt.scatter(x = titanic.loc[titanic.Survived == "Died"].Age, 
            y = titanic.loc[titanic.Survived == "Died"].Fare, 
            alpha = 0.5, c = "red", 
            label = "Died")
plt.scatter(x = titanic.loc[titanic.Survived == "Survived"].Age, 
            y = titanic.loc[titanic.Survived == "Survived"].Fare, 
            alpha = 0.25, c = "blue", 
            label = "Survived")

plt.title("Scatter Plot of Age vs. Fare by Survival Status")
plt.xlabel("Age")
plt.ylabel("Fare")
plt.legend();

**Line Plot**

A line plot connects data points with line segments in order, which makes it well suited to showing how a value changes over time.

In [None]:
avocado = pd.read_csv('https://raw.githubusercontent.com/mahmoudharding/exceed/main/data/avocado.csv', index_col = 0)
avocado.head(5)

Suppose we want the average total volume for each day.

In [None]:
...

In [None]:
...

In [None]:
...

In [None]:
...

In [None]:
avocado_vol = pd.read_csv('https://raw.githubusercontent.com/mahmoudharding/exceed/main/data/avocado_total_vol.csv', index_col = 0)
avocado_vol.head(5)

In [None]:
labels = avocado_vol['Date'].to_list()
labels

In [None]:
xs = np.arange(12)
ax = avocado_vol.plot('Date', 'Average Total Volume', legend = None)
ax.set_xticks(xs)
ax.set_xticklabels(labels, rotation = -45)

plt.title("Average Total Volume")
plt.xlabel("\n Date")
plt.ylabel("Volume");