# Creating Histograms

This notebook has been adapted, with permission, from [basketball-and-data-science/content/03-statistics.ipynb](https://github.com/callysto/basketball-and-data-science/blob/main/content/03-statistics.ipynb).

(Open in 
[Callysto](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https://github.com/pbeens/Data-Analysis&branch=main&subPath=BADS/03-statistics/03-01-histograms.ipynb&depth=1) | [Colab](https://githubtocolab.com/pbeens/Data-Analysis/blob/main/BADS/03-statistics/03-01-histograms.ipynb))

# Getting Our Data

---

Let's work with Raptors data from 2023:

In [None]:
# Importing pandas and plotly.express libraries
import pandas as pd
import plotly.express as px

# Reading the data from the url into a pandas dataframe
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/raptors-2023.csv'
raptors_df = pd.read_csv(url)

# Reviewing Our Data

Note: You can review the names of the data columns [here](https://github.com/pbeens/Data-Dunkers/blob/main/Data/raptors-2023-Column-Names.md). *(Hint: Hold Ctrl/Cmd (⌘) while clicking to open in a new tab).*

Let's take a quick look at the top of the data:

In [None]:
display(raptors_df.head())

...and the names of the columns:

In [None]:
display(raptors_df.columns)

...and just these two columns:

In [None]:
display(raptors_df[['Pos', 'FG%']])

...and the unique values of the Position column (with `.unique()`):

In [None]:
display(raptors_df['Pos'].unique())

...and how many players there are at each position (with `.value_counts()`):

In [None]:
display(raptors_df['Pos'].value_counts())

# Plotting a Histogram

A histogram is like a bar graph, but instead of drawing one bar per category it groups numeric data into ranges called "bins" and displays how many values fall in each bin.
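
To make the idea of bins concrete, here is a minimal sketch that counts values into bins by hand. The numbers are made up for illustration and are not taken from the Raptors data; the bin start and width are also arbitrary choices.

```python
# Made-up shooting percentages for illustration only (not the Raptors data)
values = [0.41, 0.43, 0.46, 0.47, 0.48, 0.52, 0.56, 0.61]

# Three bins, each 0.10 wide, starting at 0.40 (arbitrary choices)
start = 0.40
bin_width = 0.10
counts = [0, 0, 0]

for value in values:
    # Work out which bin this value falls into (0, 1, or 2)
    index = int((value - start) / bin_width)
    counts[index] += 1

# Print a simple text histogram: one '#' per value in the bin
for index, count in enumerate(counts):
    left = start + index * bin_width
    right = left + bin_width
    print(f"{left:.2f} to {right:.2f}: {'#' * count}")
```

The first bin ends up with five values, the second with two, and the third with one. A plotted histogram draws those same counts as bar heights.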

Let's look at how well the Raptors do with field goals (FG%):

In [None]:
px.histogram(raptors_df, 
    x='FG%', 
    title='Raptors Field Goal Percentage')

What can you say about the shape of the histogram?

Let's change the number of bins (with `nbins=15`):

In [None]:
px.histogram(raptors_df, 
    x='FG%', 
    title='Raptors Field Goal Percentage', 
    nbins=15)
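
To see why the number of bins changes the shape, the sketch below counts the same made-up values twice with `numpy.histogram`, once with 3 bins and once with 6. The values and the 0.40 to 0.70 range are assumptions for illustration, not the Raptors data. Note also that Plotly appears to treat `nbins` as an upper limit rather than an exact count, so a chart may show fewer bins than requested.

```python
import numpy as np

# Made-up shooting percentages for illustration only (not the Raptors data)
values = [0.41, 0.43, 0.46, 0.47, 0.48, 0.52, 0.56, 0.61]

results = {}
for bins in (3, 6):
    # Count the values in equal-width bins between 0.40 and 0.70
    counts, edges = np.histogram(values, bins=bins, range=(0.40, 0.70))
    results[bins] = counts.tolist()
    print(f"{bins} bins: {results[bins]}")
```

With fewer bins the counts are lumped together; with more bins there is more detail, but some bins may hold only one value or none at all.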

Just like with other visualizations, we can use the `color` parameter to divide the data by another column from our dataset. Clicking on the labels in the legend will turn those traces on and off.

In [None]:
px.histogram(raptors_df, 
    x='FG%', 
    title='Raptors Field Goal Percentage by Position', 
    color='Pos')
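
One way to picture what colouring by a column does is to count the values per bin and per category in a table. This is a rough sketch using `pd.cut` and `pd.crosstab` on a small made-up dataframe; the players, bin edges, and labels are hypothetical, and this is not necessarily how Plotly computes the chart internally.

```python
import pandas as pd

# Made-up players for illustration only (not the Raptors data)
df = pd.DataFrame({
    'Pos': ['PG', 'PG', 'SF', 'SF', 'C', 'C'],
    'FG%': [0.41, 0.46, 0.44, 0.53, 0.58, 0.62],
})

# Put each FG% value into one of three labelled bins
df['bin'] = pd.cut(df['FG%'],
    bins=[0.40, 0.50, 0.60, 0.70],
    labels=['0.40-0.50', '0.50-0.60', '0.60-0.70'])

# Count the players in each bin, split by position
table = pd.crosstab(df['bin'], df['Pos'])
print(table)
```

Each column of the table corresponds to one coloured trace in the histogram, and each row corresponds to one bin.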

# Exercise

Create a histogram that shows free throw percentage, coloured by age.

In [None]:
# Importing pandas and plotly.express libraries
import pandas as pd
import plotly.express as px

# Reading the data from the url into a pandas dataframe
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/raptors-2023.csv'
raptors_df = pd.read_csv(url)

# Write your program here



---
Back to [Lessons](https://github.com/pbeens/Data-Dunkers/blob/main/Lessons.md)