# Basic Data Visualization with Matplotlib

This notebook introduces the fundamentals of data visualization with Matplotlib, one of the most popular Python visualization libraries. It covers line plots, bar charts, histograms, and scatter plots, followed by customization techniques, subplot layouts, and exporting figures.

## 1. Import Required Libraries

In [None]:
# Import the necessary libraries
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Render figures inline in Jupyter notebooks
%matplotlib inline

# Check library versions (the version string lives on the matplotlib package, not on pyplot)
print(f"Matplotlib version: {matplotlib.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

## 2. Setting Up the Visualization Environment

Matplotlib offers various styles and configurations to enhance the appearance of our visualizations. Let's explore some of these options.
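
Besides switching whole styles, individual settings can be overridden for a limited scope. The sketch below assumes `plt.rc_context`, which restores the previous `rcParams` when the `with` block exits; the line width of 4 is an arbitrary value chosen for illustration.

```python
import matplotlib.pyplot as plt

original_width = plt.rcParams["lines.linewidth"]

# Override a single setting only inside the with-block
with plt.rc_context({"lines.linewidth": 4}):
    inside_width = plt.rcParams["lines.linewidth"]

# After the block, the earlier value is back in place
restored_width = plt.rcParams["lines.linewidth"]
print(inside_width, restored_width == original_width)
```

This keeps one-off tweaks from leaking into later cells, which global assignments to `plt.rcParams` would do.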

In [None]:
# List available styles
print("Available styles:", plt.style.available)

# Set the style to a more modern look
# (the name 'seaborn-v0_8' requires Matplotlib 3.6+; older releases call it 'seaborn')
plt.style.use('seaborn-v0_8')

# Define default figure size for better visibility
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['figure.dpi'] = 100

# Sample style demonstration
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Most style settings are read when a figure or axes is created, so each
# figure is built inside its own style context rather than styled afterwards.
styles = [
    ('default', 'Default Style'),
    ('seaborn-v0_8', 'Seaborn Style'),
    ('dark_background', 'Dark Background Style'),
]

for style_name, title in styles:
    with plt.style.context(style_name):
        fig, ax = plt.subplots(figsize=(5, 4))
        ax.plot(x, y)
        ax.set_title(title)
        plt.show()

## 3. Basic Line Plots

Line plots are one of the most basic and widely used plot types. They're great for showing trends over time or continuous relationships between variables.
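
As a small illustration of a trend over time, the sketch below plots a handful of monthly values against `datetime64` dates. The numbers are made up for this example and do not come from any real dataset; `fig.autofmt_xdate()` is used only to keep the date labels readable.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up monthly values, used only to illustrate a trend over time
months = np.arange("2024-01", "2024-07", dtype="datetime64[M]")
values = np.array([12, 15, 14, 18, 21, 19])

fig, ax = plt.subplots(figsize=(10, 6))
(line,) = ax.plot(months, values, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Value")
ax.set_title("Trend over Time (synthetic data)")
fig.autofmt_xdate()  # rotate date labels so they do not overlap
```

In a notebook the figure is displayed when the cell finishes; in a script, call `plt.show()` afterwards.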

In [None]:
# Create some sample data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.sin(x) * np.cos(x)

# Create a simple line plot
plt.figure(figsize=(10, 6))
plt.plot(x, y1)
plt.title('Simple Line Plot: Sine Function')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.grid(True)
plt.show()

# Multiple lines with customization
plt.figure(figsize=(10, 6))
plt.plot(x, y1, label='sin(x)', color='blue', linestyle='-', linewidth=2)
plt.plot(x, y2, label='cos(x)', color='red', linestyle='--', linewidth=2)
plt.plot(x, y3, label='sin(x)*cos(x)', color='green', linestyle='-.', linewidth=2)
plt.title('Multiple Line Plot with Different Line Styles')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid(True)
plt.show()

# Demonstrating markers
plt.figure(figsize=(10, 6))
plt.plot(x[::10], y1[::10], 'bo-', label='sin(x) with markers', markersize=8)
plt.plot(x[::10], y2[::10], 'rs--', label='cos(x) with markers', markersize=8)
plt.title('Line Plots with Markers')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid(True)
plt.show()

## 4. Bar Charts and Histograms

Bar charts compare values across categories, while histograms show the distribution of a numeric variable by counting how many values fall into each bin.
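
To make the distinction concrete: a histogram first groups continuous values into bins and then draws one bar per bin. The sketch below uses `np.histogram` to compute such counts without drawing anything; the seed, sample size, and bin count are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the result is reproducible
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

# counts[i] is the number of samples in [edges[i], edges[i + 1]);
# the last bin also includes its right edge
counts, edges = np.histogram(samples, bins=30)

print(len(counts), len(edges), counts.sum())
```

There is always one more edge than there are bins, and the counts add up to the number of samples because the default range spans the minimum and maximum of the data.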

In [None]:
# Bar Chart Example
categories = ['Category A', 'Category B', 'Category C', 'Category D', 'Category E']
values = [25, 40, 30, 55, 15]

plt.figure(figsize=(10, 6))
plt.bar(categories, values, color='skyblue', edgecolor='darkblue')
plt.title('Simple Bar Chart')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

# Horizontal Bar Chart
plt.figure(figsize=(10, 6))
plt.barh(categories, values, color='lightgreen', edgecolor='darkgreen')
plt.title('Horizontal Bar Chart')
plt.xlabel('Values')
plt.ylabel('Categories')
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.show()

# Grouped Bar Chart
group_labels = ['Group 1', 'Group 2', 'Group 3', 'Group 4']
group1_values = [15, 30, 25, 40]
group2_values = [25, 20, 30, 35]

x = np.arange(len(group_labels))
width = 0.35

fig, ax = plt.subplots(figsize=(10, 6))
rects1 = ax.bar(x - width/2, group1_values, width, label='Series 1', color='lightblue')
rects2 = ax.bar(x + width/2, group2_values, width, label='Series 2', color='lightcoral')

ax.set_title('Grouped Bar Chart')
ax.set_xlabel('Groups')
ax.set_ylabel('Values')
ax.set_xticks(x)
ax.set_xticklabels(group_labels)
ax.legend()
ax.grid(axis='y', linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()

# Histogram Example
# Generate random data (seeded so the histogram is reproducible)
np.random.seed(42)
data = np.random.randn(1000)

plt.figure(figsize=(10, 6))
plt.hist(data, bins=30, color='skyblue', edgecolor='darkblue', alpha=0.7)
plt.title('Histogram of Random Data')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

# Multiple Histograms
data1 = np.random.normal(0, 1, 1000)
data2 = np.random.normal(3, 1, 1000)

plt.figure(figsize=(10, 6))
plt.hist(data1, bins=30, alpha=0.5, label='Distribution 1', color='skyblue')
plt.hist(data2, bins=30, alpha=0.5, label='Distribution 2', color='lightcoral')
plt.title('Multiple Histograms')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()
plt.grid(linestyle='--', alpha=0.7)
plt.show()

## 5. Scatter Plots

Scatter plots are ideal for visualizing relationships between two continuous variables and identifying patterns or correlations.
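
A visual impression of correlation can be checked numerically. The sketch below builds synthetic data with a linear relationship plus noise and computes the Pearson correlation coefficient with `np.corrcoef`; the seed, slope, and noise level are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.random(50)
y = 2 * x + rng.normal(0, 0.2, 50)  # linear relationship plus noise

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.2f}")
```

A value of `r` close to 1 indicates a strong positive linear relationship, which is what an upward-sloping cloud of points in a scatter plot suggests.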

In [None]:
# Generate sample data
np.random.seed(42)
x = np.random.rand(50)
y = 2 * x + np.random.normal(0, 0.2, 50)

# Simple scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(x, y, color='blue', alpha=0.7)
plt.title('Simple Scatter Plot')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True, linestyle='--', alpha=0.7)
plt.show()

# Scatter plot with trend line
plt.figure(figsize=(10, 6))
plt.scatter(x, y, color='blue', alpha=0.7, label='Data points')

# Add trend line
coef = np.polyfit(x, y, 1)
poly_line = np.poly1d(coef)
x_line = np.linspace(min(x), max(x), 100)
y_line = poly_line(x_line)
plt.plot(x_line, y_line, 'r-', label=f'Trend line: y = {coef[0]:.2f}x + {coef[1]:.2f}')

plt.title('Scatter Plot with Trend Line')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.7)
plt.show()

# Scatter plot with size and color mapping
np.random.seed(42)
x = np.random.rand(100)
y = np.random.rand(100)
sizes = np.random.rand(100) * 500  # Varying sizes
colors = np.random.rand(100)  # Varying colors

plt.figure(figsize=(10, 6))
scatter = plt.scatter(x, y, s=sizes, c=colors, alpha=0.6, cmap='viridis')

plt.colorbar(scatter, label='Color Value')
plt.title('Scatter Plot with Varying Size and Color')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True, linestyle='--', alpha=0.7)
plt.show()

## 6. Customizing Plot Elements

Titles, axis labels, ticks, annotations, colormaps, and spines can all be adjusted to make a plot clearer and more visually appealing.
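
When working with the object-oriented interface, several of these properties can be set in one call. The following is a minimal sketch that assumes `Axes.set`, which accepts property names as keyword arguments; the labels and limits are illustrative values.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, np.sin(x), linewidth=2)

# ax.set applies several properties in a single call
ax.set(title="Sine Wave", xlabel="x", ylabel="sin(x)",
       xlim=(0, 10), ylim=(-1.2, 1.2))
ax.grid(True, linestyle="--", alpha=0.7)

print(ax.get_title(), ax.get_xlabel(), ax.get_ylabel())
```

This is equivalent to calling `set_title`, `set_xlabel`, `set_ylabel`, `set_xlim`, and `set_ylim` one after another.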

In [None]:
# Create sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Basic customization example
plt.figure(figsize=(10, 6))
plt.plot(x, y, 'b-', linewidth=2)

# Custom title and labels
plt.title('Customized Sine Wave Plot', fontsize=16, fontweight='bold')
plt.xlabel('X-axis Label', fontsize=12)
plt.ylabel('Y-axis Label', fontsize=12)

# Custom tick parameters
plt.xticks(fontsize=10)
plt.yticks(fontsize=10)

# Adding a grid
plt.grid(True, linestyle='--', alpha=0.7)

# Adding annotations (sin(x) peaks at pi/2 and reaches its trough at 3*pi/2)
plt.annotate('Peak', xy=(np.pi / 2, 1), xytext=(2, 0.8),
            arrowprops=dict(facecolor='black', shrink=0.05, width=1.5))
plt.annotate('Trough', xy=(3 * np.pi / 2, -1), xytext=(5.2, -0.8),
            arrowprops=dict(facecolor='black', shrink=0.05, width=1.5))

# Customizing axis limits
plt.xlim(0, 10)
plt.ylim(-1.2, 1.2)

plt.show()

# Customizing with different color schemes
plt.figure(figsize=(12, 10))

# Create 4 subplots
for i, cmap_name in enumerate(['viridis', 'plasma', 'magma', 'cividis']):
    ax = plt.subplot(2, 2, i+1)
    
    # Create data for the color map example
    x = np.random.rand(100)
    y = np.random.rand(100)
    
    # Create scatter with different colormaps
    scatter = ax.scatter(x, y, c=x+y, cmap=cmap_name, s=100, alpha=0.7)
    plt.colorbar(scatter, ax=ax, label=f'{cmap_name} colormap')
    
    ax.set_title(f'{cmap_name.capitalize()} Color Scheme')
    ax.set_xlabel('X-axis')
    ax.set_ylabel('Y-axis')
    ax.grid(True, linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()

# Custom spines and axes styling
x = np.linspace(-3, 3, 100)
y = x**2

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y, 'r-', linewidth=2)

# Move the left spine to x=0
ax.spines['left'].set_position(('data', 0))
# Move the bottom spine to y=0
ax.spines['bottom'].set_position(('data', 0))
# Hide the top and right spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Only show ticks on the left and bottom spines
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')

ax.set_title('Custom Spines and Axes')
plt.grid(True, linestyle='--', alpha=0.7)
plt.show()

## 7. Subplots and Multiple Figures

Creating multiple plots within a single figure allows for easy comparison and helps tell a more comprehensive data story.
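
For a regular grid, the array returned by `plt.subplots` can be filled in a loop instead of indexing each panel by hand. The sketch below assumes a 2x2 layout and four illustrative curves; `axes.flat` iterates over the panels row by row.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
panels = [
    ("sin(x)", np.sin(x)),
    ("cos(x)", np.cos(x)),
    ("sin(2x)", np.sin(2 * x)),
    ("cos(2x)", np.cos(2 * x)),
]

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# axes is a 2x2 NumPy array of Axes objects
for ax, (title, y) in zip(axes.flat, panels):
    ax.plot(x, y)
    ax.set_title(title)
    ax.grid(True)

fig.tight_layout()
```

The loop form scales to larger grids without repeating the plotting code for every panel.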

In [None]:
# Simple subplots example
x = np.linspace(0, 2*np.pi, 100)

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Top left: sin(x)
axes[0, 0].plot(x, np.sin(x), 'b-')
axes[0, 0].set_title('sin(x)')
axes[0, 0].grid(True)

# Top right: cos(x)
axes[0, 1].plot(x, np.cos(x), 'r-')
axes[0, 1].set_title('cos(x)')
axes[0, 1].grid(True)

# Bottom left: sin(2x)
axes[1, 0].plot(x, np.sin(2*x), 'g-')
axes[1, 0].set_title('sin(2x)')
axes[1, 0].grid(True)

# Bottom right: cos(2x)
axes[1, 1].plot(x, np.cos(2*x), 'm-')
axes[1, 1].set_title('cos(2x)')
axes[1, 1].grid(True)

plt.tight_layout()
plt.show()

# Subplots with shared axes
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.sin(x) * np.exp(-0.1*x)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(10, 8))

ax1.plot(x, y1)
ax1.set_title('sin(x)')
ax1.grid(True)

ax2.plot(x, y2)
ax2.set_title('sin(x) * exp(-0.1*x)')
ax2.set_xlabel('x')
ax2.grid(True)

plt.tight_layout()
plt.show()

# More complex subplot layout
fig = plt.figure(figsize=(12, 8))

# Define the grid structure: 3 rows, 3 columns
ax1 = plt.subplot2grid((3, 3), (0, 0), colspan=2)  # Spans 2 columns
ax2 = plt.subplot2grid((3, 3), (0, 2), rowspan=2)  # Spans 2 rows
ax3 = plt.subplot2grid((3, 3), (1, 0))
ax4 = plt.subplot2grid((3, 3), (1, 1))
ax5 = plt.subplot2grid((3, 3), (2, 0), colspan=3)  # Spans all 3 columns

# Add some example plots
ax1.plot(np.random.rand(10), 'ro-')
ax1.set_title('Plot 1 (Spans 2 columns)')
ax1.grid(True)

ax2.plot(np.random.rand(10), 'g^-')
ax2.set_title('Plot 2 (Spans 2 rows)')
ax2.grid(True)

ax3.plot(np.random.rand(10), 'bs-')
ax3.set_title('Plot 3')
ax3.grid(True)

ax4.plot(np.random.rand(10), 'yo-')
ax4.set_title('Plot 4')
ax4.grid(True)

ax5.plot(np.random.rand(20), 'md-')
ax5.set_title('Plot 5 (Spans 3 columns)')
ax5.grid(True)

plt.tight_layout()
plt.show()

## 8. Saving and Exporting Visualizations

Save your visualizations to various file formats for use in reports, presentations, or publications.
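
The set of writable formats depends on the active backend. The sketch below queries it with `fig.canvas.get_supported_filetypes()` and saves a figure into an in-memory buffer instead of a file, so nothing is written to disk; the plotted points are placeholder values.

```python
import io

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Mapping of file extension to a human-readable description
supported = fig.canvas.get_supported_filetypes()
print(sorted(supported))

# savefig accepts a file-like object when the format is given explicitly
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100, bbox_inches="tight")
png_bytes = buffer.getvalue()

# Every PNG file starts with the same 8-byte signature
print(png_bytes[:8] == b"\x89PNG\r\n\x1a\n")
```

Writing to a buffer is handy when the image is passed on to another program, for example a web response, rather than stored as a file.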

In [None]:
# Create a sample plot to save
plt.figure(figsize=(10, 6))
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y, 'b-', linewidth=2)
plt.title('Sine Wave Plot', fontsize=16)
plt.xlabel('X-axis', fontsize=12)
plt.ylabel('Y-axis', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.7)

# Save the figure in various formats
# PNG format (raster) - good for web display
plt.savefig('sine_wave.png', dpi=300, bbox_inches='tight')

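# JPG format (raster, lossy) - good for photographs, smaller file size
# (the quality setting is passed through pil_kwargs; the old quality= argument was removed from savefig)
plt.savefig('sine_wave.jpg', dpi=300, bbox_inches='tight', pil_kwargs={'quality': 90})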

# PDF format (vector) - good for printing and publications
plt.savefig('sine_wave.pdf', bbox_inches='tight')

# SVG format (vector) - good for web and editing in vector graphics software
plt.savefig('sine_wave.svg', bbox_inches='tight')

plt.show()

print("The figure has been saved in the following formats:")
print("- PNG (raster): sine_wave.png")
print("- JPG (raster): sine_wave.jpg")
print("- PDF (vector): sine_wave.pdf")
print("- SVG (vector): sine_wave.svg")

# Tips for saving high-quality figures
print("\nTips for saving high-quality figures:")
print("1. Use vector formats (PDF, SVG) for publications and printing")
print("2. Use higher DPI (300+) for raster formats (PNG, JPG) to ensure good quality")
print("3. Use bbox_inches='tight' to eliminate unnecessary whitespace")
print("4. For web display, prefer PNG over JPG for plots: lossless compression keeps lines and text sharp")
print("5. For data-heavy visualizations, consider using interactive formats")

## Conclusion

In this notebook, we've covered the basics of data visualization with Matplotlib:

- Setting up the visualization environment
- Creating various types of plots: line plots, bar charts, histograms, and scatter plots
- Customizing plot elements for better readability and visual appeal
- Working with subplots to create complex visualizations
- Saving plots in different formats for various use cases

These foundational skills will help you create effective visualizations for data analysis, presentations, and publications. Once you are comfortable with these basics, you can explore more advanced visualization techniques and other libraries such as Seaborn and Plotly.