# Matplotlib Tutorial: Introduction to Data Visualization


Name:  
UID: 

In this tutorial, we will explore the basics of Matplotlib, a popular Python library for creating static, interactive, and animated visualizations. Matplotlib provides a wide range of functionalities for generating histograms, scatter plots, bar charts, and more.

# 1. Installation

Before getting started, make sure Matplotlib is installed in your Python environment; you can install it with pip. By convention, the matplotlib.pyplot module is imported as plt. We will also import NumPy, conventionally as np, to generate our sample data.

In [None]:
%pip install matplotlib

In [9]:
import matplotlib.pyplot as plt
import numpy as np

# 2. Basic Plotting
We'll begin by creating a simple line plot using Matplotlib. First, let's generate some sample data. The NumPy linspace function returns evenly spaced numbers over a specified interval. It is called as:

np.linspace(start, stop, num_samples)

See if you can generate 100 samples between 0 and 10.
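
As a quick illustration of how np.linspace spaces its samples, here is a minimal sketch on a smaller interval than the exercise asks for. The variable name samples is only for illustration. Note that both endpoints are included by default.

```python
import numpy as np

# Five evenly spaced samples from 0 to 1; both endpoints are included by default
samples = np.linspace(0, 1, 5)

print(samples)       # [0.   0.25 0.5  0.75 1.  ]
print(len(samples))  # 5
```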

In [10]:
# Generate data
x = np.linspace(0, 10, 100)
y = np.sin(x)

Now that we have a dataset, let's plot it. The matplotlib plot function creates a simple line plot from x and y values. Its basic call looks like:

plt.plot(x, y, ...)

Experiment with the other parameters, such as label, color, linestyle, and linewidth!

In [None]:
# Create a figure (graph)
plt.figure(figsize=(8, 6))

# Plot the x and y data
plt.plot(..., ..., label='sin(x)', color='blue', linestyle='--', linewidth=2)

# Add labels and title
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title('Sine Function')

# Add legend
plt.legend()

# Show plot
plt.grid(True)
plt.show()
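
To show how the styling parameters interact, the sketch below draws two curves on one set of axes with different color, linestyle, and linewidth values. It uses the fig, ax = plt.subplots() form rather than plt.figure(), and the cosine data and the variable name t are illustrative choices, not part of the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: two cosine curves over one period
t = np.linspace(0, 2 * np.pi, 50)

fig, ax = plt.subplots(figsize=(8, 6))

# Each call to ax.plot adds one line; the keyword arguments style that line only
ax.plot(t, np.cos(t), label='cos(t)', color='red', linestyle='-', linewidth=1)
ax.plot(t, np.cos(2 * t), label='cos(2t)', color='black', linestyle=':', linewidth=3)

ax.set_xlabel('t')
ax.set_ylabel('value')
ax.set_title('Two Styled Lines')
ax.legend()      # builds the legend from the label of each line
ax.grid(True)
```

With the default inline notebook backend the figure renders when the cell finishes; in a plain script, end with plt.show().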

# 3. Scatter Plot

One of the most important parts of data science is generating data according to our needs. Let's generate 100 random data points and plot them as a scatter plot. To generate our data, we can use np.random.rand, which returns random values between 0 and 1 in a given shape (for this exercise, let's stick to 1 dimension):

np.random.rand(d0, d1, ..., dn)

The matplotlib scatter function is similar to the plot function in that many parameters affect the resulting visualization. Here, let's plot our random data with random marker sizes, which are set through the "s" parameter.
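
Before filling in the cell, it may help to see what shapes and value ranges these generators return. The sketch below is a small illustration, not the exercise solution; the seed and the variable names a and b are assumptions made here so the run is repeatable.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable (an illustrative choice)

a = np.random.rand(5)     # 1-D array of 5 values drawn uniformly from [0, 1)
b = np.random.rand(2, 3)  # 2-D array with 2 rows and 3 columns

# Integers from 10 up to, but not including, 100
sizes = np.random.randint(10, 100, size=5)

print(a.shape, b.shape)  # (5,) (2, 3)
print(sizes.shape)       # (5,)
```

When an array is passed as the "s" parameter of scatter, it should contain one size per point, so its length should match the x and y arrays.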


In [None]:
# Generate random data
x = np.random.rand(...)
y = np.random.rand(...)
sizes = np.random.randint(10, 100, size=100)

# Create scatter plot
plt.figure(figsize=(8, 6))
plt.scatter(..., ..., s=..., alpha=0.5, c='red', marker='o')

# Add labels and title
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot')

# Show plot
plt.grid(True)
plt.show()


# 4. Bar Chart

Another important consideration is when to use one visualization method over another. Bar charts, for example, suit data that is organized into discrete categories. Let's generate some data organized by an arbitrary classification: create one array for the categories you wish to plot and another for their values, then use the plt.bar function to create a bar chart.

In [None]:
# Generate data
categories = ...
values = ...

# Create bar chart
plt.figure(figsize=(8, 6))
plt.bar(..., ..., color='green')

# Add labels and title
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Chart')

# Show plot
plt.grid(True)
plt.show()
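
One common way to prepare the two arrays is to start from a dictionary of counts. The sketch below assumes entirely hypothetical data (students per lab section) and uses the Axes form of the bar function; it is one possible arrangement, not the expected answer to the exercise.

```python
import matplotlib.pyplot as plt

# Hypothetical data: number of students in each lab section
counts = {'Section A': 12, 'Section B': 18, 'Section C': 9}

# Split the dictionary into the two parallel sequences that bar expects
categories = list(counts.keys())
values = list(counts.values())

fig, ax = plt.subplots(figsize=(8, 6))
bars = ax.bar(categories, values, color='green')

ax.set_xlabel('Categories')
ax.set_ylabel('Values')
ax.set_title('Bar Chart')
ax.grid(True, axis='y')  # horizontal grid lines only, which are easier to read against bars
```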


# 5. Histogram

In certain applications, we want to look at the overall distribution of our data. One way to do this is with a histogram. Try generating data with a specific mean and standard deviation, and observe how the resulting histogram changes as you vary them. To generate normally distributed data, we can use the np.random.normal function as:

np.random.normal(mean, standard_dev, num_samples)
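
As a sanity check on what this function returns, the sketch below draws samples with illustrative parameter values (chosen here, not prescribed by the exercise) and compares the sample statistics with the requested ones. It also calls np.histogram, which produces the bin counts without drawing anything.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable (an illustrative choice)

# Illustrative parameter values
mean, standard_dev, num_samples = 5.0, 2.0, 10_000
data = np.random.normal(mean, standard_dev, num_samples)

# With this many samples, the sample statistics land close to the requested values
print(data.mean(), data.std())

# Bin counts and bin edges for 20 equal-width bins
counts, bin_edges = np.histogram(data, bins=20)
print(counts.sum())    # 10000: every sample falls in exactly one bin
print(len(bin_edges))  # 21: one more edge than there are bins
```

More bins give a finer view of the shape but noisier counts per bin, so it is worth trying several values for the bins parameter.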

In [None]:
# Generate random data
data = np.random.normal(..., ..., ...)

# Create histogram
plt.figure(figsize=(8, 6))
plt.hist(data, bins=..., color='skyblue', edgecolor='black')

# Add labels and title
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram')

# Show plot
plt.grid(True)
plt.show()


# 6. 3D Plot

Naturally, we are not limited to visualizing data in two dimensions. 3D plots can be useful when working with datasets that have multiple features, and they can provide a more detailed view of the data. To plot 3-dimensional data, we need arrays of values for all three axes, along with the scatter3D function. Also try other functions within the library to create an interactive 3D model of the data!

In [None]:
# Generate 3-dimensional data
x = ...
y = ...
z = ...

# Create and show the plot
plt.figure()
ax = plt.axes(projection='3d')
ax.scatter3D(..., ..., ..., c='r')

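# Label all three axes through the Axes object (pyplot has no zlabel function)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
ax.set_title('3D Plot')

ax.grid(True)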
plt.show()
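
For a concrete picture of what the three arrays can look like, the sketch below assumes illustrative data, points along a helix, and plots them with scatter3D. The helix and the viewing angles are choices made here for demonstration only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: 200 points along a helix
t = np.linspace(0, 4 * np.pi, 200)
x = np.cos(t)
y = np.sin(t)
z = t

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter3D(x, y, z, c='r')

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
ax.set_title('3D Plot')

# Set the camera position: elevation and azimuth angles in degrees
ax.view_init(elev=30, azim=45)
```

Changing the elev and azim values is a simple way to inspect the same data from several sides without an interactive backend.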

# 7. Animations

One of the most useful yet complex applications of matplotlib is creating animations of datasets that change over time, in order to highlight trends and observe how the data grows. Use the sample code below to create a GIF of a line plot that gains one random point per frame.

In [45]:
from PIL import Image as im

class tutorial():
    def __init__(self):
        # Initialize object with array to store each frame of animation
        self.snaps = []
    
    def snap(self, x, y):
        # Plots the data into a figure, then saves that figure as an individual frame of animation
        TEMPFILE = "temp.png"
        fig, ax = plt.subplots()
        plt.xlabel('X')
        plt.ylabel('Y')
        plt.title("PLT Animation")
        ax.plot(x, y)

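        fig.savefig(TEMPFILE)
        plt.close(fig)  # close this specific figure so frames do not accumulate in memory
        # Copy the pixels into a new in-memory image so the frame no longer depends on TEMPFILE
        self.snaps.append(im.fromarray(np.asarray(im.open(TEMPFILE))))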

    def generate_gif(self, num_frames):
        # Builds each array iteratively and saves a "snapshot" at each iteration
        x = [0]
        y = [0]
        for _ in range(num_frames):
            x.append(x[-1] + 1)               # next x value is one step to the right
            y.append(np.random.randint(10))   # random integer from 0 to 9
            self.snap(x, y)
            
# Initialize objects and call appropriate method
gif = tutorial()
gif.generate_gif(...)

# Save the array of snapshots as a single .gif file
images = gif.snaps
images[0].save(
    'tutorial.gif',
    optimize=False,
    save_all=True,
    append_images=images[1:],
    loop=0,
    duration=500
)
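
A variation on the approach above renders each frame into an in-memory buffer instead of a temporary PNG file. The sketch below is one possible way to do that, not a replacement for the class above; the helper name render_frame, the frame count of 3, and writing the GIF to a BytesIO object are all assumptions made for illustration.

```python
import io

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image


def render_frame(x, y):
    """Draw one line plot and return it as an in-memory PIL image."""
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_title('PLT Animation')

    buffer = io.BytesIO()
    fig.savefig(buffer, format='png')  # write the PNG into memory, not to disk
    plt.close(fig)                     # release the figure once it has been saved

    buffer.seek(0)
    return Image.open(buffer).copy()   # copy() forces the pixel data to load


# Grow the data one random point at a time, rendering a frame after each step
x, y, frames = [0], [0], []
for _ in range(3):
    x.append(x[-1] + 1)
    y.append(np.random.randint(10))
    frames.append(render_frame(x, y))

# Assemble the frames into a GIF, here kept in memory rather than saved to a file
gif_bytes = io.BytesIO()
frames[0].save(gif_bytes, format='GIF', save_all=True,
               append_images=frames[1:], loop=0, duration=500)
```

To write the result to disk instead, pass a filename such as 'tutorial.gif' in place of gif_bytes, as in the cell above.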