This notebook has been adapted from https://github.com/callysto/basketball-and-data-science/blob/main/content/03-statistics.ipynb, with permission.

(Open in 
[Callysto](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https://github.com/pbeens/Data-Analysis&branch=main&subPath=BADS/03-statistics/03-01-histograms.ipynb&depth=1) | [Colab](https://githubtocolab.com/pbeens/Data-Analysis/blob/main/BADS/03-statistics/03-01-histograms.ipynb))

---

Statistics are used in many fields of study to investigate why things happen, when they occur, and whether their recurrence is predictable. Some everyday examples of how statistics are used include¹:

- **Biology**: Statistics can be used to analyze data from experiments and research studies in biology.
- **Business growth**: Statistics can be used to analyze sales data and other business metrics to identify trends and opportunities for growth.
- **Economics**: Statistics can be used to analyze economic data such as GDP, inflation rates, and unemployment rates.
- **Farming & gardening**: Statistics can be used to analyze crop yields and other agricultural data.
- **Groceries**: Statistics can be used to analyze sales data for grocery stores and other retailers.
- **Housing**: Statistics can be used to analyze housing data such as home prices and rental rates.
- **Infrastructure**: Statistics can be used to analyze data related to infrastructure such as traffic patterns and road conditions.
- **Medicine**: Statistics can be used to analyze medical data such as patient outcomes and drug efficacy.
- **Warranties**: Statistics can be used to analyze warranty claims data to identify trends and potential issues with products.
- **Website performance**: Statistics can be used to analyze website traffic data and user behavior.

1. Source: Conversation with Bing, 2023-07-10

---

# Statistics

Let's work with Raptors data from 2023:

In [None]:
# Importing pandas and plotly.express libraries
import pandas as pd
import plotly.express as px

# Reading the data from the url into a pandas dataframe
url = 'https://raw.githubusercontent.com/pbeens/Data-Analysis/main/Data/raptors-2023.csv'
raptors_df = pd.read_csv(url)

Let's take a quick look at the top of the data:

In [None]:
display(raptors_df.head())

...and the names of the columns:

In [None]:
display(raptors_df.columns)

...and just these two columns:

In [None]:
display(raptors_df[['Pos', 'FG%']])
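The double brackets matter here. A minimal sketch, using a small made-up DataFrame (the players and values are hypothetical, not the real Raptors data): passing a *list* of column names returns a DataFrame, while passing a single name returns a Series.

```python
import pandas as pd

# Hypothetical mini roster (made-up values, not the real Raptors data)
df = pd.DataFrame({
    'Player': ['A', 'B', 'C'],
    'Pos': ['PG', 'C', 'SF'],
    'FG%': [0.45, 0.58, 0.47],
})

# Double brackets: a list of column names, so the result is a DataFrame
two_cols = df[['Pos', 'FG%']]
print(type(two_cols).__name__, two_cols.shape)

# Single brackets: one column name, so the result is a Series
one_col = df['Pos']
print(type(one_col).__name__, one_col.shape)
```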

...and the unique values of the Position column (with `.unique()`):

In [None]:
display(raptors_df['Pos'].unique())

...and how many players play each position (with `.value_counts()`):

In [None]:
display(raptors_df['Pos'].value_counts())
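To see how `.unique()` and `.value_counts()` relate, here is a minimal sketch on a made-up Series of positions (hypothetical values, not the real roster). `.unique()` lists each distinct value once, in order of first appearance; `.value_counts()` gives one count per distinct value, sorted from most to least common, and the counts add up to the total number of rows.

```python
import pandas as pd

# Hypothetical positions (made-up, for illustration only)
pos = pd.Series(['G', 'F', 'G', 'C', 'F', 'G'], name='Pos')

# Distinct values, in the order they first appear
unique_positions = pos.unique()

# One count per distinct value, most common first
counts = pos.value_counts()

print(list(unique_positions))
print(counts)
print('Total players:', counts.sum())
```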

# Histograms

Let's introduce another type of visualization, called a histogram. 

A histogram is like a bar graph for numeric data: it groups the values into ranges called "bins" and displays how many values fall in each bin.
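The binning step can be done by hand with pandas. A minimal sketch, assuming a handful of made-up FG% values and four hand-picked equal-width bins (both are hypothetical, chosen only to illustrate): `pd.cut` assigns each value to a bin, and counting the values in each bin gives the bar heights a histogram would draw.

```python
import pandas as pd

# Hypothetical field goal percentages (made-up values, not real Raptors stats)
fg_pct = pd.Series([0.38, 0.42, 0.44, 0.45, 0.47, 0.51, 0.55, 0.61])

# Four equal-width bins from 0.35 to 0.65 (by default each bin includes its right edge)
bins = [0.35, 0.425, 0.50, 0.575, 0.65]
binned = pd.cut(fg_pct, bins=bins)

# Count the values in each bin, listed in bin order: these are the bar heights
bin_counts = binned.value_counts().sort_index()
print(bin_counts)
```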

Let's look at how the Raptors players' field goal percentages (FG%) are distributed:

In [None]:
px.histogram(raptors_df, 
    x='FG%', 
    title='Raptors Field Goal Percentage')

Let's change the number of bins (with `nbins=15`). Plotly treats `nbins` as the maximum number of bins, so the chart may show fewer:

In [None]:
px.histogram(raptors_df, 
    x='FG%', 
    title='Raptors Field Goal Percentage', 
    nbins=15)
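Changing the number of bins reshapes the histogram without changing the data. A minimal sketch, using NumPy's `np.histogram` as a stand-in on made-up FG% values (hypothetical, not real stats). Note that `np.histogram(..., bins=n)` always uses exactly `n` equal-width bins spanning the data's range, whereas Plotly chooses its own bin edges, so the exact counts will differ from the chart above.

```python
import numpy as np

# Hypothetical FG% values (made-up, for illustration only)
fg_pct = np.array([0.38, 0.42, 0.44, 0.45, 0.47, 0.51, 0.55, 0.61])

results = {}
for n in (3, 6):
    # Split the range [min, max] into n equal-width bins and count values in each
    counts, edges = np.histogram(fg_pct, bins=n)
    results[n] = counts
    print(n, 'bins:', counts.tolist(), 'edges:', np.round(edges, 3).tolist())

# More bins give a more detailed shape, but the total count never changes
print({n: int(c.sum()) for n, c in results.items()})
```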

Just like with other visualizations, we can use the `color` argument to split the data by another column from our dataset. Clicking on the labels in the legend will turn those traces on and off.

In [None]:
px.histogram(raptors_df, 
    x='FG%', 
    title='Raptors Field Goal Percentage by Position', 
    color='Pos')
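Colouring a histogram splits each bin's count by the colour column. A minimal sketch of the same idea as a table, assuming a made-up mini roster and hand-picked bins (both hypothetical): `pd.crosstab` counts players for every (bin, position) pair, so each cell corresponds to one coloured segment of a bar and each row total to a full bar.

```python
import pandas as pd

# Hypothetical mini roster (made-up values, not real Raptors stats)
df = pd.DataFrame({
    'Pos': ['G', 'G', 'F', 'F', 'C', 'G', 'F', 'C'],
    'FG%': [0.38, 0.42, 0.44, 0.45, 0.47, 0.51, 0.55, 0.61],
})

# Use the same bins for every position
fg_bins = pd.cut(df['FG%'], bins=[0.35, 0.46, 0.56, 0.66])

# Rows are bins, columns are positions, cells are player counts
table = pd.crosstab(fg_bins, df['Pos'])
print(table)

# Each row total is the height of the whole bar for that bin
print(table.sum(axis=1).tolist())
```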

# Exercise

Create a histogram that shows the free throw percentage. Colour it by age.

In [None]:
# Importing pandas and plotly.express libraries
import pandas as pd
import plotly.express as px

# Reading the data from the url into a pandas dataframe
url = 'https://raw.githubusercontent.com/pbeens/Data-Analysis/main/Data/raptors-2023.csv'
raptors_df = pd.read_csv(url)

# Write your program here



---
Back to [Lessons](https://github.com/pbeens/Data-Dunkers/blob/main/Lessons.md)