# Matplotlib Introduction

Matplotlib is a powerful Python library for creating static, animated, and interactive visualizations. It is widely used for data visualization in scientific computing, engineering, and data analysis.

This notebook covers:
- Basic concepts of Matplotlib
- Creating simple plots
- Customizing plots
- Plotting with example data

## Importing Matplotlib

`matplotlib.pyplot` is the main plotting module in Matplotlib and is conventionally imported under the alias `plt`. This notebook also imports NumPy (`np`) to generate sample data and pandas (`pd`) for the DataFrame examples.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Creating a Simple Line Plot

Let's plot a simple line graph using lists of x and y values.

In [None]:
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.title('Simple Line Plot')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()

**Explanation:**
- `x` and `y` are Python lists that store the horizontal and vertical coordinates for each data point. Each pair `(x[i], y[i])` represents a point on the graph.
- `plt.plot(x, y)` creates a line plot by connecting the data points in order. This is useful for visualizing trends or relationships between two variables.
- `plt.title('Simple Line Plot')` adds a descriptive title to the top of the plot, helping viewers understand what the graph represents.
- `plt.xlabel('X Axis')` and `plt.ylabel('Y Axis')` label the horizontal and vertical axes, clarifying what each axis measures.
- `plt.show()` renders the figure. In a notebook the plot appears below the cell; in a standalone script it opens a plot window.
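
To make "connecting the data points in order" concrete, the following minimal sketch inspects the object that `plt.plot` returns. The variable names `lines`, `line`, and `points` are illustrative and do not appear elsewhere in this notebook.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# plt.plot returns a list of Line2D objects, one per line drawn
lines = plt.plot(x, y)
line = lines[0]

# The line keeps the points in the order they were passed in
points = [(float(px), float(py))
          for px, py in zip(line.get_xdata(), line.get_ydata())]
print(points)

plt.show()
```

Because points are joined in the order given, unsorted x values produce a line that doubles back on itself. Sorting the data by x first is usually what you want when plotting a trend.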

## Customizing the Plot

You can change a plot's appearance by passing keyword arguments such as `color`, `linestyle`, and `marker` to `plt.plot`, and by adding a legend and grid lines.

In [None]:
plt.plot(x, y, color='red', linestyle='--', marker='o', label='Data')
plt.title('Customized Line Plot')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.legend()
plt.grid(True)
plt.show()

**Explanation:**
- `color='red'` changes the color of the line to red, making it stand out visually.
- `linestyle='--'` sets the line style to dashed, which can help differentiate multiple lines in a plot.
- `marker='o'` adds circular markers at each data point, making individual values easier to identify.
- `label='Data'` assigns a label to the line, which is used in the legend for clarity when multiple lines are present.
- `plt.legend()` displays a legend on the plot. Only plotted elements that were given a `label` appear in it.
- `plt.grid(True)` overlays grid lines on the plot, making it easier to read values and compare data points.
- `plt.show()` renders the customized plot for viewing.
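
The legend and line styles become more useful once a plot holds more than one line. The sketch below draws two lines with different styles; the series `y_double` and `y_square` are made-up data for illustration. It also shows Matplotlib's format-string shorthand, which packs color, line style, and marker into one short string.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y_double = [2, 4, 6, 8, 10]   # hypothetical series for illustration
y_square = [1, 4, 9, 16, 25]  # hypothetical series for illustration

# Keyword arguments, as in the cell above
(line_a,) = plt.plot(x, y_double, color='red', linestyle='--', marker='o', label='2x')
# Format-string shorthand: 'g' = green, ':' = dotted line, 's' = square markers
(line_b,) = plt.plot(x, y_square, 'g:s', label='x squared')

plt.title('Two Styled Lines')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
legend = plt.legend()  # one legend entry per labeled line
plt.grid(True)
plt.show()
```

Keyword arguments are easier to read in shared code, while the shorthand is convenient for quick exploration.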

## Bar Chart Example

Bar charts are useful for comparing discrete categories.

In [None]:
categories = ['A', 'B', 'C', 'D']
values = [10, 24, 36, 18]
plt.bar(categories, values, color='skyblue')
plt.title('Bar Chart Example')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

**Explanation:**
- `plt.bar(categories, values, color='skyblue')` creates a vertical bar chart, where each bar represents a category and its height corresponds to its value. The `color` argument sets the bars to a light blue shade for better visibility.
- `categories` is a list of labels for each bar, making it easy to compare discrete groups.
- `values` is a list of numbers that determine the height of each bar, representing the data for each category.
- `plt.title('Bar Chart Example')` adds a title to the chart, summarizing what the visualization shows.
- `plt.xlabel('Category')` and `plt.ylabel('Value')` label the axes, clarifying what each axis represents.
- `plt.show()` displays the bar chart so you can visually compare the categories.
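
Reading exact values off a bar chart can be hard, so a common refinement is to print each value above its bar. The following is one possible sketch using `plt.text`; the names `bars`, `x_center`, and `heights` are illustrative.

```python
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']
values = [10, 24, 36, 18]

# plt.bar returns a container holding one rectangle per bar
bars = plt.bar(categories, values, color='skyblue')

# Write each value just above its bar, centered horizontally
for bar, value in zip(bars, values):
    x_center = bar.get_x() + bar.get_width() / 2
    plt.text(x_center, bar.get_height(), str(value), ha='center', va='bottom')

heights = [bar.get_height() for bar in bars]

plt.title('Bar Chart with Value Labels')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()
```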

## Scatter Plot Example

Scatter plots are useful for visualizing relationships between two variables.

In [None]:
x = np.random.rand(50)
y = np.random.rand(50)
plt.scatter(x, y, color='green')
plt.title('Scatter Plot Example')
plt.xlabel('X Value')
plt.ylabel('Y Value')
plt.show()

**Explanation:**
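- `np.random.rand(50)` returns 50 floating-point numbers drawn uniformly from the half-open interval [0, 1). It is called once for `x` and once for `y`, so the two arrays are independent, and because no seed is set the plot differs on every run.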
- `plt.scatter(x, y, color='green')` plots each (x, y) pair as a separate point on the graph, using green color for the markers. Scatter plots are ideal for visualizing the relationship or correlation between two variables.
- `plt.title('Scatter Plot Example')` adds a title to the plot, indicating what the data represents.
- `plt.xlabel('X Value')` and `plt.ylabel('Y Value')` label the axes, helping interpret the meaning of each dimension.
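- `plt.show()` displays the scatter plot. Since `x` and `y` here are independent random values, the points should look evenly scattered with no clear pattern; data with a real relationship would show a trend or clusters.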
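
To see what a relationship between two variables looks like in a scatter plot, the sketch below builds `y` from `x` plus a little noise. The linear relationship, the noise level, and the seed are all arbitrary choices for illustration. It uses NumPy's `default_rng` generator with a fixed seed so the data is the same on every run.

```python
import numpy as np
import matplotlib.pyplot as plt

# A seeded generator makes the "random" data identical on every run
rng = np.random.default_rng(42)  # the seed value 42 is arbitrary

x = rng.random(50)                   # 50 values in [0, 1)
noise = rng.normal(0, 0.1, size=50)  # small Gaussian noise
y = 2 * x + noise                    # hypothetical linear relationship

points = plt.scatter(x, y, color='green')
plt.title('Scatter Plot with a Linear Trend')
plt.xlabel('X Value')
plt.ylabel('Y Value')
plt.show()
```

The points should fall roughly along a rising diagonal band, in contrast to the evenly scattered cloud produced by two independent random arrays.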

## Saving Plots to Files

You can save your plots to image files for use in reports or presentations.

In [None]:
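# Redefine the data: x and y currently hold the random arrays from the scatter cell
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.title('Save Plot Example')
plt.savefig('my_plot.png')  # save before closing the figure
plt.close()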

**Explanation:**
- `plt.savefig('my_plot.png')` saves the current figure as 'my_plot.png' in your working directory. The file format is inferred from the extension, so `.png` produces a PNG image. Call it before `plt.close()` (and, in scripts, before `plt.show()`); otherwise the saved image may be empty.
- `plt.close()` closes the current figure and frees the memory it holds. This matters when a script generates many plots, because figures created through `pyplot` stay in memory until they are closed.
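
For reports, you often want control over resolution and margins. The sketch below passes two commonly used `savefig` options, `dpi` and `bbox_inches`, and then checks that the file was written. It saves into a temporary directory to keep the working directory clean; `out_dir` and `out_path` are illustrative names, and the value `dpi=150` is an arbitrary choice.

```python
import os
import tempfile
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Write into a temporary directory so the working directory stays untouched
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'my_plot.png')  # hypothetical output location

plt.plot(x, y)
plt.title('Save Plot Example')
# dpi sets the resolution; bbox_inches='tight' trims surrounding whitespace
plt.savefig(out_path, dpi=150, bbox_inches='tight')
plt.close()

# Confirm the file exists and report its size in bytes
print(os.path.exists(out_path), os.path.getsize(out_path))
```

Changing the extension in the file name, for example to `.pdf` or `.svg`, is enough to switch to a vector format.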

## Example: Pandas DataFrame Line Plot

Let's create a simple DataFrame and plot its columns using Matplotlib.

In [None]:
data = {
    'Year': [2020, 2021, 2022, 2023],
    'Sales': [100, 150, 200, 250]
}
df = pd.DataFrame(data)
plt.figure(figsize=(6,4))
plt.plot(df['Year'], df['Sales'], marker='o')
plt.title('Sales Over Years')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.grid(True)
plt.show()

**Explanation:**
- A pandas DataFrame is created to organize yearly sales data, making it easy to manipulate and visualize.
- `plt.figure(figsize=(6,4))` creates a new figure 6 inches wide and 4 inches tall; `figsize` is given as `(width, height)` in inches.
- `plt.plot(df['Year'], df['Sales'], marker='o')` draws a line graph with years on the x-axis and sales on the y-axis, using circular markers to highlight each data point. This helps visualize trends over time.
- `plt.title('Sales Over Years')` adds a descriptive title to the plot.
- `plt.xlabel('Year')` and `plt.ylabel('Sales')` label the axes, clarifying what each axis measures.
- `plt.grid(True)` overlays grid lines, making it easier to read values and compare data points.
- `plt.show()` displays the plot for analysis.
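
pandas also offers its own plotting shortcut, `DataFrame.plot`, which draws with Matplotlib behind the scenes. The sketch below reproduces the chart above that way. It additionally pins the x-axis ticks to the actual years, since Matplotlib may otherwise choose fractional tick positions for integer data.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Year': [2020, 2021, 2022, 2023],
    'Sales': [100, 150, 200, 250],
})

# DataFrame.plot draws with Matplotlib and returns the Axes it drew on
ax = df.plot(x='Year', y='Sales', marker='o', figsize=(6, 4),
             title='Sales Over Years')
ax.set_xlabel('Year')
ax.set_ylabel('Sales')
ax.set_xticks(df['Year'].tolist())  # one tick per year, no fractional years
ax.grid(True)
plt.show()
```

The returned `ax` object exposes the same customizations as the `plt` functions, under names such as `set_xlabel` and `set_title`.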

## Example: Pandas DataFrame Bar Plot

Now let's visualize categorical data using a bar plot.

In [None]:
data = {
    'Fruit': ['Apple', 'Banana', 'Orange', 'Grape'],
    'Quantity': [30, 45, 10, 22]
}
df = pd.DataFrame(data)
plt.figure(figsize=(6,4))
plt.bar(df['Fruit'], df['Quantity'], color='skyblue')
plt.title('Fruit Quantity')
plt.xlabel('Fruit')
plt.ylabel('Quantity')
plt.show()

**Explanation:**
- A pandas DataFrame is created to store fruit names and their corresponding quantities, making the data easy to manage and plot.
- `plt.figure(figsize=(6,4))` creates a new figure with a width of 6 inches and a height of 4 inches.
- `plt.bar(df['Fruit'], df['Quantity'], color='skyblue')` generates a bar chart with fruit names on the x-axis and quantities on the y-axis. The bars are colored light blue for visual appeal.
- `plt.title('Fruit Quantity')` adds a title to the chart, summarizing the data being presented.
- `plt.xlabel('Fruit')` and `plt.ylabel('Quantity')` label the axes, clarifying what each axis represents.
- `plt.show()` displays the bar chart, allowing for easy comparison of quantities across different fruits.
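
Bars are drawn in the order of the DataFrame rows, so preparing the data in pandas first changes how the chart reads. As a small sketch, sorting by `Quantity` before plotting makes the ranking visible at a glance; `df_sorted` is an illustrative name.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Orange', 'Grape'],
    'Quantity': [30, 45, 10, 22],
})

# Sort so the tallest bar comes first
df_sorted = df.sort_values('Quantity', ascending=False)

plt.figure(figsize=(6, 4))
plt.bar(df_sorted['Fruit'], df_sorted['Quantity'], color='skyblue')
plt.title('Fruit Quantity (Sorted)')
plt.xlabel('Fruit')
plt.ylabel('Quantity')
plt.show()
```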

## Example: Pandas DataFrame Scatter Plot

Let's visualize the relationship between two variables using a scatter plot.

In [None]:
data = {
    'Height': [150, 160, 170, 180, 190],
    'Weight': [50, 60, 65, 80, 90]
}
df = pd.DataFrame(data)
plt.figure(figsize=(6,4))
plt.scatter(df['Height'], df['Weight'], color='green')
plt.title('Height vs Weight')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.show()

**Explanation:**
- A pandas DataFrame is created to store height and weight data for several individuals, making it easy to plot and analyze.
- `plt.figure(figsize=(6,4))` creates a new 6 by 4 inch figure (width, then height).
- `plt.scatter(df['Height'], df['Weight'], color='green')` creates a scatter plot with height on the x-axis and weight on the y-axis. Each point represents an individual's measurements, and the green color makes the points stand out.
- `plt.title('Height vs Weight')` adds a descriptive title to the plot.
- `plt.xlabel('Height (cm)')` and `plt.ylabel('Weight (kg)')` label the axes, clarifying the units and meaning of each axis.
- `plt.show()` displays the scatter plot, making it easy to observe relationships or patterns between height and weight.
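
A scatter plot shows a relationship visually; a correlation coefficient puts a number on it. The sketch below computes the Pearson correlation with the pandas `Series.corr` method and includes it in the title. The variable name `r` is illustrative, and with only five data points the value should be treated as a demonstration rather than a meaningful statistic.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Height': [150, 160, 170, 180, 190],
    'Weight': [50, 60, 65, 80, 90],
})

# Pearson correlation between the two columns (ranges from -1 to 1)
r = df['Height'].corr(df['Weight'])

plt.figure(figsize=(6, 4))
plt.scatter(df['Height'], df['Weight'], color='green')
plt.title(f'Height vs Weight (r = {r:.2f})')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.show()
```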

## Summary

Matplotlib provides flexible tools for creating a wide variety of plots, including line, bar, and scatter plots. By customizing colors, labels, and styles, you can make your data visualizations clear and informative.

Pandas is a powerful library for data manipulation and analysis. It works seamlessly with Matplotlib, allowing you to easily plot data stored in DataFrames.

**Key Takeaways:**
- Use Matplotlib to visualize data as line, bar, and scatter plots, among other chart types.
- Use Pandas to organize and prepare your data for plotting.
- Customize your plots with titles, axis labels, colors, markers, legends, and grids.
- Save your plots to files for sharing or reporting.

**Practice:**
- Try modifying the example code to use your own data and experiment with different plot types and customizations.
- Explore more advanced features such as subplots, histograms, pie charts, and interactive plots.
- Combine Pandas and Matplotlib to analyze and visualize real-world datasets.
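
As a starting point for the practice items on subplots and histograms, here is a minimal sketch that places a line plot and a histogram side by side. The sample data, the seed, and the bin count are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed so the histogram is reproducible
samples = rng.normal(size=200)  # hypothetical sample data

# One row, two columns of axes in a single figure
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

ax_line.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], marker='o')
ax_line.set_title('Line Plot')

# hist returns the bar counts, the bin edges, and the drawn patches
counts, bin_edges, _ = ax_hist.hist(samples, bins=20, color='skyblue')
ax_hist.set_title('Histogram')

fig.tight_layout()  # prevent titles and labels from overlapping
plt.show()
```

Each axes object returned by `plt.subplots` is customized through its own methods, such as `set_title`, rather than through the `plt` functions used earlier in this notebook.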