# Matplotlib Tutorial

Author: Lauren Gliane

Welcome to the tutorial! To get you comfortable with using Matplotlib, we will cover:

1. Why Learn Matplotlib?
2. Plotting Lines and Points
3. Bar Charts and Histograms
4. Scatter Plots
5. Box Plots
6. Customization (markers, line styles, labels, grids)

**Note:** You should complete the NumPy tutorial before this one!


## 1. Why Learn Matplotlib?

Whether you’re exploring trends, explaining your project results, or impressing in a presentation, **Matplotlib lets you bring your data to life with colorful, customizable charts**. It’s a must-have skill for turning numbers into something people can actually see and understand, and it helps your work stand out in any class, research project, or career path.

In the context of **Machine Learning applications**, Matplotlib can be used for:

1. **Understand model behavior**
   - Visualization can offer insights into your model by showing the learning process of a neural network or decision boundaries (where your model starts to change its mind).
2. **Hyperparameter tuning**
   - Plots can help understand the effect of changing hyperparameters on your model's performance, which lets you understand how to achieve the best results.


### Importing our libraries

If the cell below throws an error, run `pip install matplotlib numpy` in your terminal.


In [None]:
import matplotlib.pyplot as plt
import numpy as np

## 2. Plotting Lines and Points

### The Importance of Lines in ML

**Exploratory Data Analysis (EDA)** is the process of getting to know the data you are working with. EDA involves plotting your data to discover patterns and identify anomalies, so we need visualization tools like Matplotlib to perform it.


### Plotting Lines

[Matplotlib Line Documentation](https://matplotlib.org/stable/gallery/lines_bars_and_markers/simple_plot.html), [matplotlib.pyplot.plot() Documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html), [matplotlib.pyplot.show() Documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.show.html)

In its simplest form, the `plt.plot()` function takes two arguments: the x-values of your points, then the y-values. The function adds elements to the current figure but does not display anything yet. `plt.show()` displays the figure; when you run a script from the terminal, it opens a graphical user interface (GUI) window.

_Note:_
_In Jupyter Notebooks, calling `plt.show()` is not necessary, but you must call it to view your plots when running Python code in your terminal._


In [None]:
# create an array of points: (0, 0) and (10, 10)
points = np.array([[0, 0], [10, 10]])

# Use slicing to pass in the x-values first, then the y-values
plt.plot(points[:, 0], points[:, 1])
plt.show()
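
As a further sketch of the same idea, `plt.plot()` joins however many points it is given with straight segments, so a smooth-looking curve is just many closely spaced points. The example below assumes 50 evenly spaced x-values from `np.linspace()` and the function y = x², both chosen only for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# 50 evenly spaced x-values between 0 and 10 (an illustrative choice)
x = np.linspace(0, 10, 50)
# y-values computed element-wise from x
y = x**2

# plt.plot() returns a list with one Line2D object per plotted line
lines = plt.plot(x, y)
plt.show()
```

Lowering the number of points (for example, to 5) should make the individual straight segments visible.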

### Plotting Points

To plot only the points without a connecting line, pass the format string `"o"` as the third argument to `plt.plot()`.

The points or **markers** can be changed to different symbols. The [matplotlib.markers Documentation](https://matplotlib.org/stable/api/markers_api.html) provides a key for all available customization symbols.

#### Exercise 1: Try out different shapes

Open the documentation link above and plot the same points with different markers, such as a square or a star!


In [None]:
# create an array of 10 random points
points = np.random.rand(10, 2) * 10

# use the 'o' shortcut to only plot points, no line
plt.plot(points[:, 0], points[:, 1], "o")
plt.show()

#### Exercise 2

On your own, plot the 3 coordinates (0, 4), (3, 7), and (9, 17) using triangle markers.


In [None]:
# write your code here

#### **Solution**

(Try to solve it yourself first before looking!)

Click the cells below to reveal a possible solution approach.


In [None]:
points = np.array([[0, 4], [3, 7], [9, 17]])

plt.plot(points[:, 0], points[:, 1], "^")
plt.show()

## 3. Bar Charts and Histograms

Bar charts and histograms are important in machine learning for:

1. **Visualizing and comparing categorical data:** bar charts allow you to easily compare frequencies of discrete values.
2. **Evaluating model performance:** performance metrics (such as error rates or accuracy) are often visualized through bar charts to compare the effect of changes.


### Bar Charts

[matplotlib.pyplot.bar Documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar.html)

Matplotlib uses the `plt.bar()` function to create bar charts.

#### Demo

Our x-values will be fruit names (categories). Our y-values will be the number of each fruit (values).


In [None]:
# categories and values
x = np.array(["Apples", "Bananas", "Cherries", "Dates"])
y = np.array([5, 7, 3, 8])

plt.bar(x, y)
plt.show()
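
The same call can compare a performance metric across models, as mentioned at the start of this section. The sketch below is only an illustration: the model names and accuracy scores are made-up placeholder values, not real results.

```python
import matplotlib.pyplot as plt
import numpy as np

# hypothetical model names and accuracy scores, made up for illustration
models = np.array(["Model A", "Model B", "Model C"])
accuracy = np.array([0.72, 0.85, 0.91])

# plt.bar() returns a container holding one rectangle per bar
bars = plt.bar(models, accuracy)
# accuracy is a fraction, so fix the y-axis to the 0-1 range
plt.ylim(0, 1)
plt.show()
```

Fixing the y-axis range with `plt.ylim()` keeps small differences between bars from looking larger than they are.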

#### Exercise 3

Create a **horizontal bar** chart to show the **top 4** fruits and their percentages according to this [report of popular fruits](https://www.freshproduce.com/resources/consumer-trends/top-20/).

_Hint: Use [plt.barh()](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.barh.html) instead of `plt.bar()` to create a horizontal chart. If your x-axis doesn't look quite right, take a look at the docs!_


In [None]:
# write your code here

#### **Solution**

(Try to solve it yourself first before looking!)

Click the cells below to reveal a possible solution approach.


In [None]:
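# a single mixed array would turn the numbers into strings,
# so keep the names and the percentages in separate arrays
fruits = np.array(["Grapes", "Apples", "Berries", "Bananas"])
percentages = np.array([74, 77, 82, 84])

plt.barh(fruits, percentages)
plt.show()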

### Histograms

Histograms show the distribution of a dataset, which helps you identify whether it is normal, skewed, or something else. Seeing the distribution can also provide valuable insights that guide:

1. **Data Preprocessing**: transforming your data to a different scale to aid learning
2. **Feature Engineering**: turning raw data into relevant information to improve your model
3. **Model Selection**: choosing a model that fits the given problem, which helps avoid overfitting and underfitting

#### Demo: Try it out!

To generate our histogram, use the `hist()` function.
We'll use [numpy.random.normal()](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html) to generate 250 random values with a mean of 170 and a standard deviation of 10.


In [None]:
# generate 250 random values with mean 170 and std 10
x = np.random.normal(170, 10, 250)

plt.hist(x)
plt.show()
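
To illustrate how a histogram can guide data preprocessing, the sketch below compares a right-skewed dataset with its log-transformed version. The log-normal data, the seed, and the choice of a log transform are assumptions made for this example, not part of the demo above.

```python
import matplotlib.pyplot as plt
import numpy as np

# seeded generator so the sketch is reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)
# log-normal samples are right-skewed: most values are small, a few are large
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
# taking the log of log-normal data gives normally distributed values
transformed = np.log(skewed)

# left panel: raw data, right panel: log-transformed data
plt.subplot(1, 2, 1)
plt.hist(skewed)
plt.subplot(1, 2, 2)
plt.hist(transformed)
plt.show()
```

The left histogram should show a long tail to the right, while the right one should look roughly bell-shaped.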

#### Exercise 4

Alter the code above to **change the number of bins** in the histogram using the `bins` parameter (e.g., `plt.hist(x, bins=20)`). Experiment with different bin counts and observe how the apparent shape of the distribution changes.


In [None]:
# write your code here

## 4. Scatter Plots

Scatter plots are important in ML for visually identifying patterns, trends, and correlations between variables, detecting outliers, and understanding the distribution of data.

1. **Pattern and Trend Recognition**: identifying themes within a dataset (image/voice recognition, movement/behavior patterns)
2. **Understanding Data Distribution**: discovering whether two variables have a positive, negative, or no correlation
3. **Predictive Modeling Insights**: comparing actual values against predicted values

#### Demo: Try it out!

Use the [scatter()](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html) function to generate a scatter plot from the x and y arrays.


In [None]:
x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11])
y = np.array([99, 86, 87, 88, 100, 86, 103, 87, 94, 78])

plt.scatter(x, y)
plt.show()
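
One common use from the list above is comparing actual values against predicted values. The sketch below uses synthetic data: the "actual" values and the noisy "predictions" are generated at random purely for illustration and do not come from a real model.

```python
import matplotlib.pyplot as plt
import numpy as np

# seeded generator so the sketch is reproducible (the seed is arbitrary)
rng = np.random.default_rng(1)
# hypothetical "actual" target values and noisy "predictions" of them
actual = rng.uniform(0, 100, 50)
predicted = actual + rng.normal(0, 8, 50)

plt.scatter(actual, predicted)
# reference line y = x: a perfect prediction would land exactly on it
plt.plot([0, 100], [0, 100], linestyle="--", color="gray")
plt.show()
```

Points far from the dashed line correspond to predictions with large errors.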

#### Exercise 5: Challenge yourself!

Modify the scatter plot to show red dots instead of the default blue ones. Hint: pass `color='red'` to `scatter()`.


In [None]:
# write your code here

## 5. Box Plots

A box plot (also called a box-and-whisker plot) shows the distribution, median, and outliers of a dataset.
Box plots matter for machine learning applications because they are a compact and effective way to visualize a data distribution and detect outliers. This is a crucial step in the exploratory data analysis (EDA) and data preprocessing phases of any ML project.

Use the [boxplot()](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html) function to create one!

#### Demo: Try it out!

We'll use [numpy.random.normal()](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html) to generate random numbers again.


In [None]:
# generate random data
data = np.random.normal(100, 20, 200)

plt.boxplot(data)
plt.show()
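
To see what the box and whiskers represent, the quantities can be computed by hand with NumPy. This is a sketch under the assumption that `plt.boxplot()` is called with its default `whis=1.5` setting; the seed and variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

# seeded generator so the sketch is reproducible (the seed is arbitrary)
rng = np.random.default_rng(2)
data = rng.normal(100, 20, 200)

# the box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# with the default whis=1.5, points beyond these fences are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1:.1f}, median={median:.1f}, Q3={q3:.1f}, outliers={len(outliers)}")

plt.boxplot(data)
plt.show()
```

The number of individual points drawn beyond the whiskers should match the printed outlier count.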

## 6. Customization

The default graphs look clean, but customizing them can better highlight the themes in your data. We'll cover how to add/change:

- [markers](https://matplotlib.org/stable/api/markers_api.html)
- [linestyles](https://matplotlib.org/stable/gallery/lines_bars_and_markers/linestyles.html)
- [axis labels](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xlabel.html)
- [grid](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.grid.html)

#### Try it out!

Run the code below to see a custom marker, line style, and color in action.


In [None]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 6, 8, 14, 20])

# marker: circle
# linestyle:  dashed
# color: purple
plt.plot(x, y, marker="o", linestyle="--", color="purple")
plt.show()


### Lines and Markers

A variety of [linestyles](https://matplotlib.org/stable/gallery/lines_bars_and_markers/linestyles.html) and [markers](https://matplotlib.org/stable/api/markers_api.html) are provided in the Matplotlib library. Complete the challenge below with the specified color and marker, but also test out other symbols on your own!

#### Exercise 7: Challenge yourself!

Try customizing the plot to use a dotted line style, triangle markers, and a green line color. _(Hint: Check the documentation linked above.)_


In [None]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 6, 8, 14, 20])

# write your code below


#### **Solution**

(Try to solve it yourself first before looking!)

Click the cells below to reveal a possible solution approach.


In [None]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 6, 8, 14, 20])

plt.plot(x, y, linestyle=":", marker="^", color="green")
plt.show()

### Labeling: Title and Axes

One of the most important parts: let's give our graph a [title](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.title.html), an [x-axis label](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xlabel.html), and a [y-axis label](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.ylabel.html).
(Each link opens the Matplotlib documentation.)

#### Exercise 8

Use the following functions to create labels for the graph below:

1. `plt.title(your_title)`
2. `plt.xlabel(your_xlabel)`
3. `plt.ylabel(your_ylabel)`


In [None]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 6, 8, 14, 20])

plt.plot(x, y, marker="o", linestyle="--", color="purple")

# write your code here

plt.show()
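
### Grid

A grid is the remaining customization from the list at the start of this section. A minimal sketch, assuming the same `x` and `y` arrays as above, could use `plt.grid()`; the dotted style and transparency values are optional choices made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 6, 8, 14, 20])

plt.plot(x, y, marker="o", linestyle="--", color="purple")
# turn on grid lines; the dotted style and transparency are optional choices
plt.grid(True, linestyle=":", alpha=0.5)
plt.show()
```

Grid lines make it easier to read values off the axes, which helps when comparing points by eye.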