In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Plotting Data with the Matplotlib Package

Sometimes data is straightforward to understand if simply printed as numbers, like we've been doing:

In [None]:
[1, 2, 4, 9, 20]

But visual charts are often much more effective at conveying patterns in data! 
Matplotlib can create plots out of collections of data.  For example, a line graph:

In [None]:
plt.plot([1, 2, 4, 9, 20]);
# The trailing ";" hides the text output of plt.plot (a list of line objects), so only the plot is shown

These figures can then be saved as image files and shared with others, like in presentations or papers:

In [None]:
plt.plot([1, 2, 4, 9, 20]);
plt.savefig("myplot.png")
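
A slightly fuller sketch of saving a figure follows. The file name `example_plot.png` and the `dpi` value are illustrative assumptions rather than part of the example above. In a notebook, the figure is typically finalized when the cell finishes, so `plt.savefig` should run in the same cell as the plotting call:

```python
from pathlib import Path

import matplotlib.pyplot as plt

# Hypothetical output file name, chosen only for this illustration
output_path = Path("example_plot.png")

plt.plot([1, 2, 4, 9, 20])
# dpi sets the image resolution; bbox_inches="tight" trims extra whitespace around the plot
plt.savefig(output_path, dpi=150, bbox_inches="tight")
```

The file extension decides the image format, so a name ending in `.pdf` or `.svg` would save a vector image instead.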

In this notebook, we'll explore a few of the basic plots and how to adjust them to be more informative. 

### Special Things to Look Out For

  - **Line Order Matters**: Pay attention to what happens when code is written in a different order (e.g. data creation before plotting, plotting before labeling)
  - **Functions Have Optional Parameters that Change Their Behavior**: Many of the plotting functions will let you modify them by adding a named parameter (e.g. `plt.hist(data, bins=20, density=True, cumulative=False)`).

## Line Plots with the plt.plot() function

  - **`plt.plot(data)`**: Makes a line plot, with the y axis showing the values in `data` and the x axis showing each value's index (0, 1, 2, ...)
  - **`plt.plot(x, y)`**: Makes a line plot, with the x and y axes showing the values in `x` and `y`.
    - *Note*: `x` and `y` must have the same length.
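
As a small illustration of the two call styles, here is a sketch using made-up data (the names `x_values` and `y_values` are assumptions for this example only). Both lines are drawn on the same plot, so the difference in x positions is visible:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical example data: five x positions and their squares
x_values = np.array([0, 10, 20, 30, 40])
y_values = x_values ** 2

# With only the y values, the x axis uses each value's index (0, 1, 2, 3, 4)
lines_by_index = plt.plot(y_values)

# With both x and y values, the x axis uses the actual x positions (0 to 40)
lines_by_x = plt.plot(x_values, y_values)
```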


**Exercises**

Make a line plot of this list.  What does the data look like?

In [None]:
[1, 2, 4, 2, 1]

Make a line plot of this data. What does the data look like?

In [None]:
data = np.sin(np.linspace(0, np.pi * 4, 200))

Notice that the x axis of the plot above shows each value's index (0 to 199) rather than the x values the sine was computed from.  Make a line plot of the data, specifying both the x and y values this time.  What is different about this plot?

In [None]:
x = np.linspace(0, np.pi * 4, 200)
data = np.sin(x)

## Labeling Your Data: Adding Text

Let's add some text to our plots, so people understand what the plots represent!

  - **`plt.title("My Title")`**: Adds a title to the current plot
  - **`plt.xlabel("X")`**:  Adds a label to the x axis
  - **`plt.ylabel("Y")`**: Adds a label to the y axis
  - **`plt.plot(data, label="My label"); plt.legend()`**:  Labels the plotted data and makes a legend with the labels.
    - *Note*: Normally this is done on separate lines of code.


Example: You can combine them together and label everything!

In [None]:
plt.plot([1, 2, 4, 8, 16], label="Powers of 2")
plt.plot([1, 3, 9, 27], label="Powers of 3")
plt.title("Comparing Powers of Integers")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend();

**Exercises**

Make a line plot of the data and add the title "Sin of X":

In [None]:
x = np.linspace(0, np.pi * 10, 200)
data = np.sin(x)

Make a line plot of the data and label the Y axis "Sin of X":

In [None]:
x = np.linspace(0, np.pi * 10, 200)
data = np.sin(x)

Make a line plot of the data and put the label "Sin of X" in a legend:

In [None]:
x = np.linspace(0, np.pi * 10, 200)
data = np.sin(x)

Make a line plot of each dataset and put the labels "Sin of X" and "Cos of X" in a legend:

In [None]:
x = np.linspace(0, np.pi * 4, 200)
data1 = np.sin(x)
data2 = np.cos(x)

## Histograms

When you have a lot of data, often you just want to know how much data you have and around what values your data is clustered (the data's "distribution").  This is where histograms come in: on the x axis, they show the values in your data (grouped together into ranges called "bins"), and on the y axis, how often values in each bin occurred in your dataset.

  - **`plt.hist(data)`**: Make a histogram of the data
  - **`plt.hist(data, bins=20)`**: Make a histogram of the data, specifying the number of bins
  - **`plt.hist(data, density=True)`**: Make a probability density of the data (normalizes the y axis so that the total area of the bars is 1)
  - **`plt.hist(data, cumulative=True)`**: Make a cumulative histogram of the data
  - **`plt.hist(data, label="My data"); plt.legend()`**: Label the data and put the label in a legend.
  - **`plt.hist(data, alpha=0.5)`**: Specify how opaque (i.e. how not-transparent) the plot is, from 0-1.
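
To show how several of these optional parameters combine, here is a minimal sketch. The data, the seed, and the variable names are assumptions made for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical example data: 1000 draws from a normal distribution (seeded for repeatability)
rng = np.random.default_rng(0)
samples = rng.normal(loc=5, scale=2, size=1000)

# plt.hist returns the bar heights, the bin edges, and the drawn bar objects
heights, bin_edges, bars = plt.hist(samples, bins=20, density=True, alpha=0.5, label="Samples")
plt.legend()

# With density=True, the bar areas (height * width) add up to 1
total_area = np.sum(heights * np.diff(bin_edges))
```

Note that 20 bins are described by 21 bin edges, since each bin needs a left and a right edge.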
  

**Exercises**

Make a histogram of this data.  Looking at the plot, how would you describe the data's distribution (min, max, average, shape)?

In [None]:
n_data_points = 20
data = np.random.random(size=n_data_points)

Make a histogram of this data.  Looking at the plot, how would you describe the data's distribution (min, max, average, shape)?

In [None]:
n_data_points = 200
data = np.random.random(size=n_data_points)

Make a histogram of this data. Looking at the plot, how would you describe the data's distribution (min, max, average, shape)?

In [None]:
n_data_points = 2000
data = np.random.random(size=n_data_points)

Make a histogram of this data. Looking at the plot, how would you describe the data's distribution (min, max, average, shape)?

In [None]:
n_data_points = 2000
data = np.random.normal(size=n_data_points)

Make a histogram of this data, setting the number of bins to 100, then 20, then 10, then 5 (just change the number and run the code again, looking at the graph).  How does the number of bins affect your interpretation of the data's distribution?

In [None]:
n_data_points = 2000
data = np.random.normal(size=n_data_points)

Make a histogram of this data with normalized Y values (a probability density).  What is different about this, compared to the histogram above?

In [None]:
n_data_points = 200
data = np.random.normal(size=n_data_points)

Make a cumulative histogram of this data. What is different about this, compared to the histogram above?

In [None]:
n_data_points = 200
data = np.random.normal(size=n_data_points)

Make two histograms, one for each of the two datasets.  Label the data in a legend, and make the data more transparent for easier viewing.

In [None]:
n_data_points = 200
data1 = np.random.normal(0, 1, size=n_data_points)
data2 = np.random.normal(2, 0.5, size=n_data_points)

## Examples of Other Plots

### Scatter Plots

  - **`plt.scatter(x, y)`**
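
A scatter plot draws one marker per (x, y) pair without connecting them. The sketch below uses made-up data; the variable names and the `s` and `alpha` values are assumptions chosen for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical example data: a noisy straight line (seeded for repeatability)
rng = np.random.default_rng(0)
x_values = rng.uniform(0, 10, size=50)
y_values = 2 * x_values + rng.normal(0, 1, size=50)

# s sets the marker size and alpha the opacity; both are optional
points = plt.scatter(x_values, y_values, s=20, alpha=0.7)
plt.xlabel("x")
plt.ylabel("y")
```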
  

Make a scatter plot of this data:

In [None]:
x = np.random.uniform(-5, 5, size=200)
noise = np.random.normal(0, 12, size=200)
y = x ** 3 + noise

### Heatmaps

  - **`plt.imshow(data)`**
  - **`plt.imshow(data, cmap='gray')`**
  - **`plt.imshow(data, cmap='jet')`**
  - **`plt.imshow(data, cmap='viridis')`**
  - **`plt.imshow(data, cmap='PiYG')`**
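
`plt.imshow` colors each cell of a 2D array according to its value, and `cmap` picks the color scheme. Here is a minimal sketch with a small made-up grid (the `grid` data is an assumption for this example); `plt.colorbar()` adds a scale showing which color corresponds to which value:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical example data: a 10 x 10 grid where each cell holds (row index + column index)
rows, columns = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
grid = rows + columns

# Draw the grid as an image, then add a color scale next to it
image = plt.imshow(grid, cmap="viridis")
plt.colorbar()
```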

Make a heatmap of the 2D data:

In [None]:
data_1d = np.sin(np.linspace(-np.pi * 4, np.pi * 4, 200))
data_2d = np.tile(data_1d, (200, 1))

### Subplots

Take a look at the example below, showing how to use the `plt.subplot` function:

In [None]:
plt.subplot(1, 2, 1)  # plt.subplot(number of rows, number of columns, active subplot)
plt.plot(x)  # x was defined in the scatter plot cell above
plt.title('My Data')
plt.xlabel("X axis")
plt.ylabel('Y Axis')
plt.xlim(-3.5, 20);  # show only part of the x axis

plt.subplot(1, 2, 2)  # switch to the second subplot
plt.plot(x, label='My Data')
plt.plot(x*2, label='My Data*2')
plt.legend();
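
For comparison, here is a self-contained sketch that stacks two subplots vertically instead of placing them side by side. The data and titles are assumptions made for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical example data
angles = np.linspace(0, 2 * np.pi, 100)

fig = plt.figure()  # start a fresh figure

top = plt.subplot(2, 1, 1)  # 2 rows, 1 column, first subplot is active
plt.plot(angles, np.sin(angles))
plt.title("Sine")

bottom = plt.subplot(2, 1, 2)  # second subplot is now active
plt.plot(angles, np.cos(angles))
plt.title("Cosine")

plt.tight_layout()  # adjust spacing so titles and axes do not overlap
```

Each plotting or labeling call applies to whichever subplot was activated most recently, which is why the order of the lines matters here.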
