# Creating Visualizations

<center><img src="../images/stock/pexels-goumbik-669622.jpg" width="500"></center>

Often, just looking at the raw numbers in a dataset isn't enough to really grasp what's going on.

That's where **data visualization** comes in! By presenting information as visuals like charts and graphs, we can tap into our natural ability to spot trends and patterns much more easily and effectively.

# Plotting with Matplotlib

**Matplotlib** is a popular and incredibly useful Python library specifically designed for creating data visualizations.

The way Matplotlib works is by organizing each plot into a set of distinct elements. This structured approach gives you a lot of control, allowing you to create highly customized charts and graphs to suit your needs.

## Coding Styles

Matplotlib offers two primary approaches to coding:

__Explicit (Object-Oriented) Style:__ 

* In this style, you create Figure and Axes objects directly and then use their methods to create plots.
* This approach provides the greatest control and flexibility, especially for complex plots with multiple subplots or custom layouts.

__Implicit (Pyplot) Style:__ 

* This style relies on the `pyplot` module to automatically create and manage Figure and Axes objects.
* You use `pyplot` functions (like `plt.plot()`, `plt.bar()`, `plt.hist()`) to add plotting elements. Matplotlib handles the underlying object creation and management.

For simpler plotting tasks, the implicit (`pyplot`) style is often sufficient and more concise. 

Since this lesson focuses on introducing the basic plot types, we'll primarily use the `pyplot` style for its ease of use. 

However, for more advanced or intricate visualizations, the explicit, object-oriented approach is generally recommended.
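
To make the difference concrete, here is a minimal sketch that draws the same line in both styles. The data values and titles are made up for illustration; `plt.subplots()` is used here as one common way to get a Figure and Axes pair in the explicit style.

```python
import matplotlib.pyplot as plt

# Made-up data, used only for illustration
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Implicit (pyplot) style: pyplot keeps track of the "current" Figure and Axes
plt.figure()
plt.plot(x, y)
plt.title("Implicit style")

# Explicit (object-oriented) style: we hold the Figure and Axes objects ourselves
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Explicit style")

plt.show()
```

Both versions produce the same kind of chart; the explicit version simply gives you named objects (`fig`, `ax`) to call methods on.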

## Core Workflow

To bring your data to life with Matplotlib, you'll generally follow these core steps:

### 0. Installing Matplotlib

To use Matplotlib in your Python environment, you would typically install it by running the following command in a terminal (inside a Jupyter Notebook cell, prefix it with `!`):

```bash
pip install matplotlib
```

However, in our JupyterHub environment, the Matplotlib package is already set up and ready for you to use. 

### 1. Importing Matplotlib

To start using the plotting capabilities of Matplotlib in your Python code, the first step is to import the necessary module. The convention when working with Matplotlib is to import the `pyplot` module and give it a shorter alias, `plt`.

You'll typically see the following line at the beginning of your Python scripts or Jupyter Notebook cells when you want to create plots:

```python
import matplotlib.pyplot as plt
```

Let's go ahead and import Matplotlib using the code cell below:

In [None]:
## Begin Example


## End Example

### 2. Prepare the Data

As we've discussed extensively, and you've already practiced, a crucial step *before* visualizing any data is to ensure it's in the right format. This means that to create meaningful plots, you must first:

* **Acquire the data:** Get your hands on the information you want to visualize.
* **Clean the data:** Handle any missing values, errors, or inconsistencies.
* **Perform necessary transformations:** Reshape, aggregate, or calculate new features as needed for your visualization.

Essentially, well-prepared data is the foundation for creating clear and insightful visualizations. 

### 3. Create a Figure

In Matplotlib, every plot you create lives within a **Figure** object. Think of the Figure as the entire canvas or window on which your plot will be drawn. You need to create a Figure before you can start adding any actual plots (like lines, bars, or scatter points).

You can create a new, empty Figure using the `plt.figure()` command:

```python
plt.figure()
```

The `plt.figure()` function also allows you to customize the Figure in various ways. 

One of the most common options you'll use is the `figsize` argument. This lets you specify the width and height of your figure in inches, influencing its size and aspect ratio when displayed.

For example, to create a figure with a specific width and height, you would pass a tuple containing the dimensions to the `figsize` argument:

```python
plt.figure(figsize=(width_in_inches, height_in_inches))
```

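So, `plt.figure(figsize=(10, 6))` would create a figure that is 10 inches wide and 6 inches tall. Note that `figsize` must be passed as a keyword argument, because the first positional argument of `plt.figure()` is the figure's identifier, not its size.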
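
As a quick check, the sketch below creates a figure with `figsize` and reads the size back from the Figure object. The variable names are only illustrative.

```python
import matplotlib.pyplot as plt

# Create a figure that is 10 inches wide and 6 inches tall
fig = plt.figure(figsize=(10, 6))

# Read the size back (in inches) to confirm the setting took effect
width, height = fig.get_size_inches()
print(width, height)
```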

### 4. Call Plotting Functions

Once you have your data prepared and a Figure created (either implicitly or explicitly), you can start adding actual visual elements to it using Matplotlib's various **plotting functions**. 

These functions take your data as input and generate different types of plots. 

Here are some of the most commonly used plotting functions available through `plt` (from `matplotlib.pyplot`):

* **`plt.plot(x, y, ...)`**: Creates **line plots**. 
* **`plt.bar(x, height, ...)`**: Creates **bar charts**. 
* **`plt.hist(x, ...)`**: Creates **histograms**.
* **`plt.pie(x, ...)`**: Creates **pie charts**. 

Each of these functions has various optional arguments that allow you to customize the appearance of your plot, such as colors, line styles, marker shapes, and more. We'll explore some of these customizations in the following sections.

### 5. Customize the Plot

Once you've created a basic plot, you'll often want to add more information and make it visually clearer and more informative. Matplotlib provides several functions to **customize** your plots with elements like labels, titles, legends, and grid lines. Here are some of the essential customization functions available through `plt`:

* **Labels:** Help viewers understand what the axes represent.
    * **`plt.xlabel('X-axis Label')`**: Sets the label for the horizontal (x) axis. Replace `'X-axis Label'` with a descriptive name for your x-axis data.
    * **`plt.ylabel('Y-axis Label')`**: Sets the label for the vertical (y) axis. Replace `'Y-axis Label'` with a descriptive name for your y-axis data.

* **Title:** Provides context and summarizes the main point of the plot.
    * **`plt.title('Plot Title')`**: Sets the title of your plot. Replace `'Plot Title'` with a concise and informative title.

* **Legend:** Explains what different lines, markers, or bars in your plot represent. This is especially useful when you have multiple datasets plotted on the same axes.
    * **`plt.legend()`**: Displays the legend. For the legend to show meaningful information, you need to have included a `label` argument when you initially called your plotting functions (e.g., `plt.plot(x, y, label='Data Series A')`).

* **Grid:** Adds grid lines to the plot, which can make it easier to read the values on the axes.
    * **`plt.grid(True)`**: Turns the grid lines on. You can also customize the appearance of the grid (e.g., line style, color) by passing additional arguments to this function.

By using these customization functions, you can transform a basic plot into a clear, informative, and professional-looking visualization. We'll see examples of how to use these in the following sections.
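
Here is a small sketch that puts the customization functions together on one plot. The sales figures and product names are invented purely for illustration.

```python
import matplotlib.pyplot as plt

# Made-up data, used only for illustration
months = [1, 2, 3, 4, 5, 6]
sales_a = [10, 12, 9, 14, 15, 13]
sales_b = [8, 9, 11, 10, 12, 14]

plt.figure(figsize=(8, 4))

# The label argument is what plt.legend() displays for each line
plt.plot(months, sales_a, label="Product A")
plt.plot(months, sales_b, label="Product B")

plt.xlabel("Month")
plt.ylabel("Units Sold")
plt.title("Monthly Sales (made-up data)")
plt.legend()
plt.grid(True)

# Keep a reference to the current Axes in case we want to inspect it later
ax = plt.gca()

plt.show()
```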

### 6. Display or Save the Plot

Once you've created and customized your plot, the final step is to either **display** it on your screen or **save** it to a file.

Matplotlib provides simple functions for these purposes:

**`plt.show()`** 

* This function displays the plot. When you run a Python script, the plot typically opens in a separate interactive window; in a Jupyter Notebook, it is rendered directly below the cell containing `plt.show()`.

**`plt.savefig('filename')`**

* This function allows you to save your plot to a file.
* Matplotlib supports various file formats, and the file extension you use in the filename will usually determine the format.

**Saving in different formats:**
 

```python
# PNG
plt.savefig('my_plot.png') 

# JPG
plt.savefig('my_plot.jpg') 

# PDF
plt.savefig('my_plot.pdf')   

# SVG
plt.savefig('my_plot.svg')      
```

**Controlling the resolution (DPI):** 

For image formats like PNG and JPG, you can also control the resolution (dots per inch) using the `dpi` argument:

```python
plt.savefig('high_resolution_plot.png', dpi=300)
```

A higher DPI value generally results in a sharper image with more detail but also a larger file size.
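
The sketch below saves a small plot and then checks that the file was written. Saving into a temporary folder is just an assumption made to keep the example self-contained; in practice you would pass whatever path you like. It is usually safest to call `plt.savefig()` before `plt.show()`, since the figure may no longer be available once it has been shown.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

# A tiny plot with made-up data
plt.figure(figsize=(4, 3))
plt.plot([0, 1, 2], [0, 1, 4])

# Save into a temporary folder so the example does not clutter your workspace
out_dir = Path(tempfile.mkdtemp())
png_path = out_dir / "my_plot.png"
plt.savefig(png_path, dpi=150)

# Confirm the file now exists on disk
print(png_path.exists())
```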

# Creating Common Visualizations

Now that we've covered the fundamental workflow of plotting with Matplotlib, let's dive into some of the most frequently used and effective types of visualizations. Understanding when and how to use these different plot types is crucial for exploring your data and communicating insights.

In this section, we'll focus on the following four key visualization types:

* **Line Graphs**
* **Bar Graphs** 
* **Pie Charts** 
* **Histograms** 

We'll explore each of these in more detail, including how to create them using Matplotlib and when they are most appropriate.

## Line Graphs

Line graphs (or line charts) are particularly useful for illustrating trends in data across a continuous period. 

The x-axis of the graph usually represents a time series (like days, months, or years), and the y-axis displays the corresponding numerical data. 

By connecting these data points with lines, we can clearly see how values rise and fall over time.

### Basic Syntax

Here's the basic syntax for creating a line plot with Matplotlib:

```python
plt.plot(x, y)
```

* `x`: The horizontal coordinates of the data points. These values determine the position of the data points along the x-axis.

* `y`: The vertical coordinates of the data points. These values determine the position of the data points along the y-axis and, consequently, the shape of the line.


__Commonly Used Optional Arguments:__

* `color`: The color of the line.  You can specify colors using names, hex codes, or RGB tuples.
* `linestyle`: The style of the line (e.g., solid, dashed, dotted). 
* `label`: A string used to label the line in the plot's legend.
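
A minimal sketch of these optional arguments in action, using made-up data:

```python
import matplotlib.pyplot as plt

# Made-up data, used only for illustration
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# plt.plot() returns a list of line objects; we unpack the single line here
(line,) = plt.plot(x, y, color="tab:red", linestyle="--", label="y = x squared")

plt.legend()
plt.show()
```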

### Line Graph: Tesla

<center><img src="../images/stock/pexels-adaptphotos-11099564.jpg" width="500"></center>

As we just discussed, line graphs are particularly effective at illustrating how a value changes over a continuous period. Speaking of things that rise and fall over time, let's take another look at Tesla's stock performance. This is a classic example of data that can be beautifully visualized with a line graph to reveal trends and volatility.

To get started, we'll create a DataFrame containing Tesla's stock performance over the past year using the `yfinance` module. This will give us the data we need to plot the stock price against time. Let's proceed with fetching this data.

In [None]:
## Begin Example




## End Example

Next up, using `matplotlib`, let's visualize Tesla's closing price. 

__Here's how we'll do it:__

* Create a new figure for the plot with a specified size using `plt.figure()` and the `figsize` argument.

* Plot the data using `plt.plot()`, specifying the data for the x- and y-axes
    * Provide the data for the x-axis (the date)
    * Provide the data for the y-axis (the closing stock prices)

* Customize the plot
    * Use `plt.xlabel()` to set the label for the x-axis
    * Use `plt.ylabel()` to set the label for the y-axis
    * Use `plt.title()` to set the title of the plot

* Display the plot
    * Call `plt.show()` to display the generated plot.
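
The steps above can be sketched as follows. To keep the example runnable without a network connection, it uses a tiny hand-made DataFrame with invented prices in place of the real `yfinance` download; the `Close` column name and the date index are assumptions about how the fetched data is laid out.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data: invented closing prices standing in for the real download
prices = pd.DataFrame(
    {"Close": [250.0, 255.5, 248.2, 260.1, 265.3]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Create the figure, then plot date (x-axis) against closing price (y-axis)
plt.figure(figsize=(10, 6))
plt.plot(prices.index, prices["Close"])

# Customize the plot
plt.xlabel("Date")
plt.ylabel("Closing Price (USD)")
plt.title("Closing Price Over Time (placeholder data)")

# Keep a reference to the current Axes in case we want to inspect it later
ax = plt.gca()

plt.show()
```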


In [None]:
## Begin Plot








## End Plot

## Bar Graphs

Bar graphs are used to represent categorical data with rectangular bars. 

Typically, the height of each bar is proportional to the value it represents, making it easy to visually compare the magnitudes of different categories. 

While bars can be horizontal, they are most often displayed vertically. This allows for quick and intuitive comparisons between groups.

### Basic Syntax

Here's the basic syntax for creating a bar graph with Matplotlib:

```python
plt.bar(x, height)
```

* `x`: The x-coordinates of the bars. These values specify the horizontal position of each bar. Commonly, `x` represents categorical data.

* `height`: The height of each bar. These values represent the magnitude or quantity of what you are measuring for each category.

__Commonly Used Optional Arguments__

* `color`: The color(s) of the bars. You can specify colors as names, hex codes, or RGB tuples. If you provide a single color, it is used for all bars; a sequence assigns one color per bar.

* `width`: The width(s) of the bars. You can specify a single value to make all bars the same width, or a sequence of values to give each bar a different width. The default is `0.8`.

* `align`: The alignment of the bars relative to their x-coordinates. Possible values are `'center'` (the default) or `'edge'`.

* `label`: A string that labels the bar graph in the legend.

* `edgecolor`: The color of the borders of the bars.
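
A minimal sketch of a bar graph using a few of these options. The categories and values are made up for illustration.

```python
import matplotlib.pyplot as plt

# Made-up data, used only for illustration
categories = ["A", "B", "C"]
values = [5, 12, 8]

# One bar per category; all bars share the same fill color, border, and width
bars = plt.bar(categories, values, color="skyblue", edgecolor="black", width=0.6)

plt.show()
```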

### Bar Graph: BikeTown

<center><img src="../images/stock/pexels-cristiana-raluca-213635-686230.jpg" width="500"></center>

Now, let's switch gears and explore **bar graphs**. As we discussed, bar graphs are excellent for comparing the values of different categories. To illustrate this, we'll use data from BikeTown, a bike-sharing service.

We'll be working with a dataset containing trip information from May 2018, which is available at the following URL:

`https://s3.amazonaws.com/biketown-tripdata-public/2018_05.csv`

Our goal with this data will be to focus on understanding the popularity of different **Payment Plan types** offered by BikeTown. By creating a bar graph, we can easily compare the number of trips taken under each payment plan, providing a clear visual comparison of their usage. 

Let's start by loading and preparing this dataset.

In [None]:
## Begin Example



## End Example

Now that we have loaded the BikeTown data, let's create the bar graph to visualize the number of trips for each payment plan.

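And here's how we'll do it:

* Count the number of trips for each payment plan by calling `.value_counts()` on the payment plan column.

* Create a new figure using `plt.figure()` and the `figsize` argument.

* Plot the counts using `plt.bar()`
    * Provide the payment plan names for `x`
    * Provide the trip counts for `height`

* Customize the plot with `plt.xlabel()`, `plt.ylabel()`, and `plt.title()`.

* Call `plt.show()` to display the generated plot.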
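
One way this could look is sketched below. It does not download the real dataset: the handful of payment plan labels is invented for illustration, and the variable names are hypothetical. The idea is simply to count how often each category appears and pass those counts to `plt.bar()`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented stand-in for the payment plan column of the real dataset
payment_plans = pd.Series(
    ["Subscriber", "Casual", "Subscriber", "Subscriber", "Casual", "Subscriber"]
)

# Count the number of trips per payment plan
plan_counts = payment_plans.value_counts()

# Category names go on the x-axis, counts determine the bar heights
plt.figure(figsize=(8, 5))
bars = plt.bar(plan_counts.index, plan_counts.values)

plt.xlabel("Payment Plan")
plt.ylabel("Number of Trips")
plt.title("Trips per Payment Plan (placeholder data)")
plt.show()
```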

In [None]:
## Begin Plot








## End Plot

## Histograms

Histograms are a type of bar chart specifically designed to show frequency distributions. 

For each unique value (or a defined interval of values) in a dataset, a vertical bar is drawn. 

The height of this bar directly corresponds to the frequency – that is, how many times that particular value (or values within the interval) appears in the data.

### Basic Syntax

Here is the basic syntax for creating a Histogram with Matplotlib:

```python
plt.hist(data)
```
* `data`:  The dataset to be plotted as a histogram. `data` should be a sequence of numerical data.

__Commonly Used Optional Arguments__

* `bins`:  This determines the number of bins (intervals) into which the data is grouped.  A larger number of bins can reveal more detail in the distribution, while a smaller number can provide a broader overview.

* `color`:  The color of the bars in the histogram.

* `edgecolor`:  The color of the borders of the bars.
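
A minimal sketch of a histogram with these options, using a short made-up list of numbers. `plt.hist()` returns the count in each bin, the bin edges, and the drawn bars, which can be handy for checking how the data was grouped.

```python
import matplotlib.pyplot as plt

# Made-up data, used only for illustration
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]

# Group the values into 3 equal-width bins between the minimum and maximum
counts, bin_edges, patches = plt.hist(
    data, bins=3, color="steelblue", edgecolor="black"
)

plt.show()
```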

### Histograms: BikeTown

<center><img src="../images/stock/pexels-stitch-20123868.jpg"></center>

Continuing with the BikeTown data, let's create a histogram to depict the frequency of distance values.

Let's load in the data from the `Distance_Miles` column and clean up one very specific entry: that bike ride that was thousands of miles long.
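
A cleaning step like this could be sketched as a simple filter. The distances below are invented, and the 100-mile cutoff is a hypothetical threshold chosen only for this illustration; the right cutoff for the real data is a judgment call.

```python
import pandas as pd

# Invented distances, including one implausibly long ride
trips = pd.DataFrame({"Distance_Miles": [0.8, 1.5, 2.3, 3521.0, 1.1, 4.7]})

# Hypothetical cutoff: treat anything at or above 100 miles as a data error
max_reasonable_miles = 100

# Keep only the rows below the cutoff
cleaned = trips[trips["Distance_Miles"] < max_reasonable_miles]
distances = cleaned["Distance_Miles"]

print(distances.max())
```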

In [None]:
## Begin Example








## End Example

Now that we *took care of* the odd data, let's visualize it as a histogram.

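Here's how we'll do it:

* Create a new figure using `plt.figure()` and the `figsize` argument.

* Plot the cleaned `Distance_Miles` values using `plt.hist()`, choosing a number of `bins`.

* Customize the plot with `plt.xlabel()`, `plt.ylabel()`, and `plt.title()`.

* Call `plt.show()` to display the generated plot.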

In [None]:
## Begin Plot








## End Plot

## Pie Charts

Pie charts are circular graphs that show how a dataset is divided into different categories, with each category's contribution represented as a proportion of the whole.

Think of the entire pie as representing 100% of the data. Each "slice" or "wedge" of the pie corresponds to a different category within the dataset.

The size of each slice (its area) is proportional to the percentage of the total that the category represents. This makes pie charts a very intuitive way to visually compare the relative sizes or proportions of different categories.

### Basic Syntax

Here's the basic syntax for creating a pie chart with Matplotlib:

```python
plt.pie(data)
```

* `data`:  A 1-dimensional sequence of numerical values. This is the core data that determines the size of each slice in the pie chart.

__Commonly Used Optional Arguments__

* `labels`: A sequence of strings, providing a label for each wedge in the pie chart.
* `colors`: A sequence of colors, one for each wedge, allowing you to visually distinguish between categories.
* `autopct`: A string or a callable used to label the wedges with their numerical values (typically percentages).
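
A minimal sketch of a pie chart using these options. The fruit names and sizes are made up; the format string `"%1.1f%%"` asks for each percentage to be shown with one decimal place.

```python
import matplotlib.pyplot as plt

# Made-up data, used only for illustration
sizes = [50, 30, 20]
labels = ["Apples", "Bananas", "Cherries"]
colors = ["tomato", "gold", "crimson"]

# With autopct set, plt.pie() also returns the percentage text objects
wedges, texts, autotexts = plt.pie(
    sizes, labels=labels, colors=colors, autopct="%1.1f%%"
)

plt.show()
```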


### Pie Chart: FIFA 22

<center><img src="../images/stock/pexels-a-darmel-7862351.jpg" width="500"></center>

You should have a `.csv` file at the following location:
* `../data/players_22.csv`

This dataset, sourced from [Sports-Statistics](https://sports-statistics.com/sports-data/fifa-2022-dataset-csvs/), includes players data for the Career Mode from FIFA 22:

* Every player available from FIFA 22
* 100+ attributes
* Player positions
* Player attributes
* Player personal data
* And more

For demonstration purposes, let's load in the data from the `preferred_foot` column, and inspect the first 5 entries.

In [None]:
## Begin Example



## End Example

It may be a good idea to visualize this data as a Pie Chart.

Here's how we'll do it: 

* __Count the occurrences of each preferred foot:__
    * Use `.value_counts()` on the `preferred_foot` column. 
    * This creates a new Series whose index holds the unique values, `Left` and `Right`, with their corresponding frequencies (counts) as the values.

* __Use the counts as pie slice sizes:__

    * We'll use the `Left` and `Right` counts as inputs for the pie chart function.

    * Each count will determine the size of the corresponding slice in the pie.


__Pie Chart Abstract Syntax:__
```python
plt.pie(slices)
```

* `slices`: The only required input. It should be a sequence of values (such as a list or a Pandas Series) representing the size of each pie slice.

Here are some commonly used optional parameters for customizing the pie chart's appearance:
* __`labels`:__ A list of strings providing labels for each slice.
* __`colors`:__ A list of colors for each slice.
* __`autopct`:__ A string or a callable that formats the numerical values displayed on each slice
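
The counting and plotting steps can be sketched as follows. This does not read the real CSV file: the short Series of `Left` and `Right` values is invented so the example runs on its own, and the variable names are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented stand-in for the preferred_foot column
preferred_foot = pd.Series(
    ["Right", "Left", "Right", "Right", "Left", "Right", "Right", "Right"]
)

# Count how many times each value appears
foot_counts = preferred_foot.value_counts()

# Use the counts as slice sizes and the index values as slice labels
wedges, texts, autotexts = plt.pie(
    foot_counts,
    labels=foot_counts.index.tolist(),
    autopct="%1.1f%%",
)

plt.title("Preferred Foot (placeholder data)")
plt.show()
```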

Now let's get to crafting that chart:

In [None]:
## Begin Plot








## End Plot

## Conclusion

This notebook has provided a basic introduction to creating common types of plots in Matplotlib. 

For more detailed information and advanced customization options, please refer to the official [Matplotlib documentation](https://matplotlib.org/stable/index.html).