# Creating Visualizations

To truly understand a dataset, looking at the raw numbers alone is often not enough.

Data visualization helps. Presenting information visually, in charts and graphs, makes trends and patterns much easier to recognize.

# Plotting with Matplotlib

Matplotlib is a widely used and powerful Python library for creating data visualizations. 

It structures each plot as a collection of organized elements, allowing for highly customized charts and graphs.

## Installing Matplotlib

```bash
pip install matplotlib
```

## Using Matplotlib
When working with Matplotlib in Python, it's common practice to import its `pyplot` module using the alias `plt`.

To get started, let's import `pandas`, `matplotlib.pyplot`, and `yfinance`:

In [None]:
## Begin Code

import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
## End Code
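
To make the idea of a plot as a collection of organized elements more concrete, here is a minimal sketch. It assumes the common `plt.subplots()` entry point and uses made-up numbers; the variable names are illustrative only.

```python
import matplotlib.pyplot as plt

# Hypothetical data, used only for illustration
x_values = [1, 2, 3, 4]
y_values = [10, 20, 15, 25]

# A Figure is the whole canvas; an Axes is one plotting area inside it
fig, ax = plt.subplots(figsize=(6, 4))

# Each element of the plot can be customized separately
line, = ax.plot(x_values, y_values, color="green", marker="o")
ax.set_title("Anatomy of a Plot")
ax.set_xlabel("x values")
ax.set_ylabel("y values")

plt.show()
```

The `plt.` functions used in the rest of this section act on the current figure and axes, so they modify these same elements behind the scenes.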

# Common Visualizations

Next, we're going to explore some of the most common and useful ways to visualize data. We'll focus on these four key types:

* Line Graphs
* Bar Graphs
* Pie Charts
* Histograms

## Line Graphs

Line graphs (or line charts) are particularly useful for illustrating trends in data across a continuous period. 

The x-axis of the graph usually represents a time series (like days, months, or years), and the y-axis displays the corresponding numerical data. 

By connecting these data points with lines, we can clearly see how values rise and fall over time.
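
As a small sketch of this idea, the example below plots six months of temperatures against their dates. The temperature values are invented for illustration and do not come from any dataset used in this section.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly temperatures (degrees Celsius), for illustration only
months = pd.date_range("2024-01-01", periods=6, freq="MS")
temperatures = pd.Series([2.0, 4.5, 9.0, 13.5, 18.0, 21.5], index=months)

plt.figure(figsize=(8, 4))

# plt.plot returns a list of line objects; here there is exactly one
(line,) = plt.plot(temperatures.index, temperatures.values, marker="o")

plt.xlabel("Month")
plt.ylabel("Average Temperature (°C)")
plt.title("Hypothetical Monthly Temperatures")
plt.show()
```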

### Line Graph: Tesla

Speaking of rises and falls over time, let's take another look at Tesla's stock performance.

Let's create a DataFrame containing Tesla's stock performance over the past year using the `yfinance` module:

In [None]:
## Begin Code
!pip install yfinance
import yfinance as yf

tesla_ticker = "TSLA"

tesla = yf.Ticker(tesla_ticker)

# Fetch one year of price history as a DataFrame
tesla_history = tesla.history(period="1y")

tesla_history.head()
## End Code

Next up, using `matplotlib`, let's visualize Tesla's closing price using the following steps:

__Create the Line Graph__
* `plt.figure(figsize=(12, 6))`: Creates a new figure for the plot with a specified size (width=12 inches, height=6 inches). This helps in making the plot more readable.
* `plt.plot(tesla_history.index, tesla_history['Close'], label='Tesla Stock Price', color='blue')`: This is the core plotting function:
    * `tesla_history.index`: Provides the dates for the x-axis.
    * `tesla_history['Close']`: Provides the closing prices for the y-axis. We are choosing to plot the closing price, but you could also plot `'Open'`, `'High'`, or `'Low'`.
    * `label='Tesla Stock Price'`: Sets the label for the line, which will appear in the legend.
    * `color='blue'`: Sets the color of the line to blue.

__Add Labels and Title__
* `plt.xlabel("Date")`: Sets the label for the x-axis as `"Date"`.
* `plt.ylabel("Closing Price (USD)")`: Sets the label for the y-axis as `"Closing Price (USD)"`.
* `plt.title("Tesla Stock Price Over the Last Year")`: Sets the title of the plot.
* `plt.grid(True)`: Adds a grid to the plot, which can make it easier to read the values.
* `plt.legend()`: Displays the legend, which identifies the plotted line.

__Rotate x-axis labels for better readability__
* `plt.xticks(rotation=45)`: Rotates the x-axis labels by 45 degrees. This is often useful when you have many dates to prevent the labels from overlapping.
* `plt.tight_layout()`: Adjusts the plot layout to provide reasonable spacing between different elements, preventing labels from getting cut off.

__Display the Plot__
* `plt.show()`: Displays the generated plot.


In [None]:
## Begin Code
figure_size = (12, 6)
x = tesla_history.index
y = tesla_history["Close"]
x_label = "Date"
y_label = "Closing Price (USD)"
title = "Tesla Stock Price Over the Last Year"

# Create the line graph
# (pandas shortcut: tesla_history.plot.line(y="Close", title=title))
plt.figure(figsize=figure_size)
plt.plot(x, y, label="Tesla Stock Price", color="blue")

# Add labels and title
plt.xlabel(x_label)
plt.ylabel(y_label)
plt.title(title)
plt.grid(True)
plt.legend()

# Rotate x-axis labels for better readability
plt.xticks(rotation=45)
plt.tight_layout()

# Display the plot
plt.show()
## End Code

## Bar Graphs

Bar graphs are used to represent categorical data with rectangular bars. 

Typically, the height of each bar is proportional to the value it represents, making it easy to visually compare the magnitudes of different categories. 

While bars can be horizontal, they are most often displayed vertically. This allows for quick and intuitive comparisons between groups.
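
To illustrate the two orientations, here is a minimal sketch that draws the same made-up category counts with `bar` (vertical) and `barh` (horizontal). The categories and numbers are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical category counts, for illustration only
categories = ["Apples", "Bananas", "Cherries"]
counts = [12, 30, 7]

# Draw the same data as vertical and horizontal bars, side by side
fig, (left_ax, right_ax) = plt.subplots(1, 2, figsize=(10, 4))

# Vertical bars: the height of each bar encodes the value
vertical_bars = left_ax.bar(categories, counts, color="skyblue")
left_ax.set_title("Vertical Bars")
left_ax.set_ylabel("Count")

# Horizontal bars: the width of each bar encodes the value
horizontal_bars = right_ax.barh(categories, counts, color="lightcoral")
right_ax.set_title("Horizontal Bars")
right_ax.set_xlabel("Count")

plt.tight_layout()
plt.show()
```

Horizontal bars tend to be handy when category names are long, since the labels read left to right without rotation.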

In [None]:
## Begin Code
URL = "https://s3.amazonaws.com/biketown-tripdata-public/2018_05.csv"

biketown = pd.read_csv(URL)

biketown.head()
## End Code

In [None]:
## Begin Code

# Count the number of trips for each payment plan
payment_plans = biketown['PaymentPlan'].value_counts()

# Create the bar graph
size = (12, 6)
colors = ['skyblue', 'lightcoral']
title = 'Number of Trips by Payment Plan'
x_label = 'Payment Plan'
y_label = 'Number of Trips'

plt.figure(figsize=size)
payment_plans.plot(kind='bar', color=colors)

# Labels and Titles
plt.title(title)
plt.xlabel(x_label)
plt.ylabel(y_label)
plt.xticks(rotation=0)  # Keep x-axis labels horizontal
plt.tight_layout()

# Show Plot
plt.show()


## End Code

## Pie Charts

Pie charts are circular graphs that display the proportion of each category as a percentage of the total. 

The entire pie represents 100% of the data, and each slice represents a different category. 

The area of each slice is proportional to the percentage that category contributes to the whole dataset, offering a clear visual comparison of these proportions.

### Pie Chart: Example

You should have a `.csv` file at the following location:
* `../data/players_22.csv`

This dataset, sourced from [Sports-Statistics](https://sports-statistics.com/sports-data/fifa-2022-dataset-csvs/), includes player data for the Career Mode from FIFA 22:

* Every player available from FIFA 22
* 100+ attributes
* Player positions
* Player attributes
* Player personal data
* And more

For demonstration purposes, let's load in the data from the `preferred_foot` column, and inspect the first 5 entries.

In [None]:
## Begin Example
FILE = "../data/players_22.csv"

fifa = pd.read_csv(FILE, usecols=["preferred_foot"])

fifa.head()
## End Example

Because this column contains only two categories, a Pie Chart is a reasonable way to show what share of players prefers each foot.

What we'll do is: 

* Use `.value_counts()` to create a new Series containing the unique values, `Left` and `Right`, as indices and their respective frequencies as values.
* Use the `Left` and `Right` counts as wedges in our Pie Chart.
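
As a quick sketch of what `.value_counts()` produces, the example below runs it on a tiny hand-written Series. The six entries are invented and stand in for the real `preferred_foot` column.

```python
import pandas as pd

# Hypothetical sample of foot preferences, for illustration only
sample = pd.Series(["Right", "Left", "Right", "Right", "Left", "Right"],
                   name="preferred_foot")

# value_counts() returns a Series: unique values as the index, counts as the values
sample_counts = sample.value_counts()

print(sample_counts["Right"])  # 4
print(sample_counts["Left"])   # 2
```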

__Pie Chart Abstract Syntax:__
```python
plt.pie(wedges)
```

* `wedges`: Is a required input, a sequence of values representing the size of each wedge.

Here are some commonly used optional parameters:
* `labels`: A list of strings providing labels for each wedge.
* `colors`: A list of colors for each wedge.
* `autopct`: A string or callable used to label the wedges with their numeric value(s).
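
Here is a minimal sketch that combines these three optional parameters. The counts and color names are made up for illustration and are not taken from the FIFA dataset.

```python
import matplotlib.pyplot as plt

# Hypothetical counts, for illustration only
labels = ["Left", "Right"]
wedges = [25, 75]
colors = ["lightcoral", "skyblue"]

plt.figure(figsize=(5, 5))

# With autopct set, plt.pie returns the wedge patches, the label texts,
# and the percentage texts
patches, texts, autotexts = plt.pie(wedges,
                                    labels=labels,
                                    colors=colors,
                                    autopct="%.1f%%")

plt.title("Hypothetical Foot Preference")
plt.show()
```

In the format string, `%.1f` prints the percentage with one decimal place and `%%` prints a literal percent sign.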


In [None]:
## Begin Example

# Get Value Counts (select the column so the result is indexed by "Left"/"Right")
counts = fifa["preferred_foot"].value_counts()
right_foot = counts["Right"]
left_foot = counts["Left"]

# Create Pie Chart
plt.figure(figsize=(5, 5), dpi=100)

title = "Soccer Player Foot Preference"
labels = ["Left", "Right"]
wedges = [left_foot, right_foot]

plt.pie(wedges,
        labels=labels,
        autopct="%.2f %%")

plt.title(title)

# Show the Chart
plt.show()
## End Example

## Histograms

Histograms are a type of bar chart designed to show the frequency distribution of numerical data.

A vertical bar is drawn for each value or, more commonly, for each interval of values (called a bin).

The height of each bar corresponds to the frequency, that is, how many values in the data fall within that bin.
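
To see the counting in action, here is a small sketch with nine made-up values split into four equal-width intervals. `plt.hist` returns the counts and the interval edges alongside the drawn bars, so we can inspect them directly.

```python
import matplotlib.pyplot as plt

# Hypothetical values, for illustration only
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Split the range 1-5 into 4 equal-width intervals
counts, bin_edges, bars = plt.hist(values, bins=4, range=(1, 5))

print(bin_edges)  # the 5 edges that bound the 4 intervals
print(counts)     # how many values fall in each interval
plt.show()
```

Each interval includes its left edge and excludes its right edge, except the last one, which includes both. That is why the value 5 is counted together with the two 4s in the final bar.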

__Basic Syntax__

```python
plt.hist(data)
```

__Commonly Used Optional Arguments__
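* `bins`: The number of bins, or a sequence of bin edges.
* `color`: The fill color of the bars.
* `edgecolor`: The color of the bar outlines.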
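
The sketch below shows these optional arguments together. The distances are randomly generated stand-ins, not the Biketown data, and the color choices are arbitrary.

```python
import random
import matplotlib.pyplot as plt

# Hypothetical trip distances in miles, generated for illustration only
random.seed(0)
distances = [random.uniform(0, 10) for _ in range(200)]

counts, bin_edges, bars = plt.hist(distances,
                                   bins=10,            # number of intervals
                                   color="skyblue",    # fill color of the bars
                                   edgecolor="black")  # outline color of the bars

plt.xlabel("Distance (miles)")
plt.ylabel("Number of Trips")
plt.title("Hypothetical Trip Distances")
plt.show()
```

An outline color makes neighboring bars easier to tell apart, which helps most when many bars have similar heights.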



In [None]:
## Begin Example

#biketown_clean = biketown[biketown["RouteID"] != 7242673]
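# Select the trip distances (in miles)
biketown_clean = biketown["Distance_Miles"].copy()

# Only bin trips between 0 and 15 miles; values outside this range are ignored
plt.hist(biketown_clean, range=(0, 15))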
plt.show()
## End Example