# Matplotlib Tutorial
Author: Lauren Gliane

Welcome to the tutorial! To get you comfortable with Matplotlib, we will cover:
1. Why Learn Matplotlib?
2. Plotting Lines/Points
3. Bar Charts and Histograms
4. Scatter Plots
5. Box Charts
6. Customization (markers, line, labels, grids)
7. Random Data and Importing Data


## 1. Why Learn Matplotlib?
Matplotlib lets you turn raw data into clear, story-driven visuals. Whether you're exploring trends, explaining project results, or preparing a presentation, it lets you build colorful, customizable charts. Turning numbers into something people can see, understand, and remember is a useful skill in any class, research project, or career path.

Matplotlib also gives you fine-grained control over nearly every element of a figure. Create anything you want: graph your performance, plot a dataset you found, and teach others this valuable skill!

## 2. Plotting A Line
In its simplest form, the plot() function takes two arguments: the first holds the x-coordinates of the points and the second holds the y-coordinates. By default, it draws a line connecting the points in order.

[Matplotlib Line Documentation](https://matplotlib.org/stable/gallery/lines_bars_and_markers/simple_plot.html)

#### Test it out
Run the code below to practice calling the **array()**, **plot()**, and **show()** functions. The cell works through these steps:

1. Import the **libraries** (pyplot and numpy)
2. Create an **array of x-points** (0 and 10) in the **xpoints variable**
3. Create an **array of y-points** (0 and 240) in the **ypoints variable**
4. Generate and show the plot using **plot(xpoints, ypoints)** and **show()**

In [None]:
# import our libraries
import matplotlib.pyplot as plt
import numpy as np

# create an array using the array() function in the numpy library
xpoints = np.array([0, 10])
ypoints = np.array([0, 240])

# generate and show our plot using the plot() and show() functions from pyplot library
plt.plot(xpoints, ypoints)
plt.show()

### Plotting Points
To plot only the points, with no line connecting them, pass the shortcut 'o' as a third argument. The 'o' stands for circle markers.


In [None]:
# create an array
xpoints = np.array([0, 10])
ypoints = np.array([0, 240])

# use the 'o' shortcut to only plot points, no line
plt.plot(xpoints, ypoints, 'o')
plt.show()
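
Other marker codes can take the place of 'o' in the third argument. The sketch below is a small variation on the cell above, assuming the same two points; it uses 'x', one of the marker codes listed in the Matplotlib markers documentation, to draw crosses instead of circles. The variable name `lines` is only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([0, 10])
ypoints = np.array([0, 240])

# a format string that holds only a marker code draws markers with no connecting line
lines = plt.plot(xpoints, ypoints, 'x')
plt.show()
```

plot() returns a list of the line objects it created, which is why the result can be stored and inspected later.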

### Plotting Multiple Points
Below is an example of how to plot just the four coordinates (1,3), (2,7), (3,9), (5,2).

#### Try it out!
On your own, alter the example code below and plot 3 coordinates (0,4), (9,7), and (3,17).

In [None]:
# two arrays of coords (1,3), (2,7), (3,9), (5,2)
xpoints = np.array([1, 2, 3, 5])
ypoints = np.array([3, 7, 9, 2])

# the 'o' shortcut plots only the four points, with no connecting line
plt.plot(xpoints, ypoints, 'o')
plt.show()

## 3. Bar Charts and Histograms

### Bar Charts

Bar charts are useful when comparing values between different categories. Matplotlib uses the bar() function to create these.

[matplotlib.pyplot.bar Documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar.html)

#### Try it out!
Our x values will be fruit names (categories). Our y values will be the number of each fruit (values).

In [None]:
# categories and values
x = np.array(["Apples", "Bananas", "Cherries", "Dates"])
y = np.array([5, 7, 3, 8])

plt.bar(x, y)
plt.show()

#### Challenge yourself!
Change the bar chart we just made into a horizontal bar chart using [barh()](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.barh.html).
Change the fruits and values to reflect the values in this [report of popular fruits](https://www.freshproduce.com/resources/consumer-trends/top-20/).

### Histograms
Histograms are used to show the distribution of a dataset. We use the hist() function to create them.

#### Try it out!
We'll use [numpy.random.normal()](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html) to generate 250 random values with a mean of 170 and a standard deviation of 10.

In [None]:
# generate 250 random values with mean 170 and std 10
x = np.random.normal(170, 10, 250)

plt.hist(x)
plt.show()

#### Challenge yourself!
Change the number of bins in the histogram using the bins parameter (e.g., plt.hist(x, bins=20)). Try experimenting with different bin counts and observe how it changes the distribution shape.
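
To see in numbers what the bin count changes, one option is to compute the bar heights without drawing anything. The sketch below uses numpy.histogram(), which returns the counts and bin edges for a given number of bins. The seeded generator and the three bin counts are assumptions chosen for illustration, not part of the exercise.

```python
import numpy as np

# seeded generator so the numbers repeat between runs (an assumption for this sketch)
rng = np.random.default_rng(0)
x = rng.normal(170, 10, 250)

for bins in (5, 20, 50):
    # counts holds the height of each bar, edges holds the bin boundaries
    counts, edges = np.histogram(x, bins=bins)
    width = edges[1] - edges[0]
    print(bins, "bins: width", round(width, 2), "tallest bar", counts.max())
```

More bins means narrower bars that each hold fewer values, so the shape looks more jagged. Fewer bins smooths the shape but hides detail.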

## 4. Scatter Plots

Scatter plots show the relationship between two variables. Use the [scatter()](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html) function.

#### Try it out!

In [None]:
x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11])
y = np.array([99, 86, 87, 88, 100, 86, 103, 87, 94, 78])

plt.scatter(x, y)
plt.show()

#### Challenge yourself!
Modify the scatter plot to show red dots instead of default blue ones. Hint: add a color='red' parameter in scatter().

## 5. Box Charts
A box plot (or box-and-whisker plot) shows the median, quartiles, spread, and outliers of a dataset. Use the [boxplot()](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html) function to create one!
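
The numbers behind a box plot can be computed by hand, which may help when reading one. The sketch below assumes the default boxplot() setting, where points more than 1.5 times the interquartile range beyond the box are drawn as outliers. The names `lower_fence` and `upper_fence` and the seeded generator are illustrative choices.

```python
import numpy as np

# seeded generator so the numbers repeat between runs (an assumption for this sketch)
rng = np.random.default_rng(0)
data = rng.normal(100, 20, 200)

# the box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# points beyond these fences are drawn as individual outlier markers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print("median:", round(median, 1))
print("box from", round(q1, 1), "to", round(q3, 1))
print("number of outliers:", len(outliers))
```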

#### Try it out!
We'll use [numpy.random.normal()](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html) to generate random numbers again.

In [None]:
# generate random data
data = np.random.normal(100, 20, 200)

plt.boxplot(data)
plt.show()

#### Challenge yourself!
Generate two sets of random data and plot them together in the same box chart by passing a list of datasets into boxplot(). For example:

In [None]:
# data1 and data2 stand for the two datasets you generated
plt.boxplot([data1, data2])

## 6. Customization
The default graphs look clean, but customizing them can make your data easier to read and better match the theme of your presentation. We'll cover how to add/change:
- [markers](https://matplotlib.org/stable/api/markers_api.html)
- [linestyles](https://matplotlib.org/stable/gallery/lines_bars_and_markers/linestyles.html)
- [axis labels](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xlabel.html)
- [grid](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.grid.html)

#### Try it out!
Run the code below to see all four customizations in action.

In [None]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 6, 8, 14, 20])

plt.plot(x, y, marker='o', linestyle='--', color='purple')
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.grid(True)
plt.show()
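
The marker and line style can also be packed into a single format string passed as the third argument to plot(). The sketch below should draw the same purple dashed line with circle markers as the cell above; the variable name `line` is only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 6, 8, 14, 20])

# 'o--' combines the marker ('o') and the line style ('--') in one format string
line, = plt.plot(x, y, 'o--', color='purple')
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.grid(True)
plt.show()
```

The keyword form is longer but easier to read, while the format string is handy for quick experiments.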


#### Challenge yourself!
Try customizing the line style to be dotted (':'), use triangle markers ('^'), and set the line color to green.

## 7. Random Data and Importing Data
You won't always type in your own data points. You might want to generate random data for testing or import existing data from a file.

### Generating Random Data
You can use the [numpy.random.rand()](https://numpy.org/doc/stable/reference/random/generated/numpy.random.rand.html) function to generate random values between 0 and 1 (0 is included, 1 is not).

In [None]:
# generate 50 random x and y points between 0 and 1
x = np.random.rand(50)
y = np.random.rand(50)

# create a scatter plot
plt.scatter(x, y)
plt.show()
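
Random data changes every time the cell runs, which can make a plot hard to compare between runs. One way to get the same points each time is to set a seed first with numpy.random.seed(). The seed value 42 below is an arbitrary choice for this sketch.

```python
import matplotlib.pyplot as plt
import numpy as np

# fixing the seed makes rand() return the same sequence on every run
np.random.seed(42)
x = np.random.rand(50)
y = np.random.rand(50)

plt.scatter(x, y)
plt.show()
```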

### Importing Data from a File
You can also import real-world data using the [numpy.loadtxt()](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html) function. Suppose you have a CSV file named data.csv with two columns (x, y):

##### data.csv

1,2
2,4
3,1
4,3
5,7

You can load and plot data.csv like this:

In [None]:
# load CSV data into two arrays
data = np.loadtxt("data.csv", delimiter=",")
x = data[:,0]   # first column
y = data[:,1]   # second column

plt.plot(x, y, marker='o')
plt.show()
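
If you don't have data.csv on disk yet, the cell above will fail with a file-not-found error. The sketch below first writes the five rows shown earlier to data.csv in the current working directory (an assumption about where you want the file), then loads and plots them. It also uses the unpack=True option of loadtxt(), which returns the columns as separate arrays.

```python
import matplotlib.pyplot as plt
import numpy as np

# write the example rows to data.csv so the file exists
rows = ["1,2", "2,4", "3,1", "4,3", "5,7"]
with open("data.csv", "w") as f:
    f.write("\n".join(rows) + "\n")

# unpack=True returns one array per column instead of a single 2D array
x, y = np.loadtxt("data.csv", delimiter=",", unpack=True)

plt.plot(x, y, marker='o')
plt.show()
```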

#### Challenge yourself!

1. Use np.random.rand() to generate 100 random points, but scale them up to be between 0 and 100 instead of 0 and 1.
Hint: multiply the output of rand() by 100.

2. Create your own small CSV file with at least 5 (x, y) pairs and use loadtxt() to plot them.