# Statistics

This notebook has been adapted from https://github.com/callysto/basketball-and-data-science/blob/main/content/03-statistics.ipynb, with permission.

(Open in [Callysto](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https://github.com/pbeens/Data-Analysis&branch=main&subPath=BADS/03-statistics/03-02-statistics.ipynb&depth=1) | [Colab](https://githubtocolab.com/pbeens/Data-Analysis/blob/main/BADS/03-statistics/03-02-statistics.ipynb))

---

Statistics are used in many fields of study to investigate why things happen, when they occur, and whether they are likely to recur. Some everyday examples of how statistics are used include¹:

- **Biology**: Statistics can be used to analyze data from experiments and research studies in biology.
- **Business growth**: Statistics can be used to analyze sales data and other business metrics to identify trends and opportunities for growth.
- **Economics**: Statistics can be used to analyze economic data such as GDP, inflation rates, and unemployment rates.
- **Farming & gardening**: Statistics can be used to analyze crop yields and other agricultural data.
- **Groceries**: Statistics can be used to analyze sales data for grocery stores and other retailers.
- **Housing**: Statistics can be used to analyze housing data such as home prices and rental rates.
- **Infrastructure**: Statistics can be used to analyze data related to infrastructure such as traffic patterns and road conditions.
- **Medicine**: Statistics can be used to analyze medical data such as patient outcomes and drug efficacy.
- **Warranties**: Statistics can be used to analyze warranty claims data to identify trends and potential issues with products.
- **Website performance**: Statistics can be used to analyze website traffic data and user behavior.

1. Source: Conversation with Bing, 2023-07-10

---

Let's calculate some basic statistics, including the mean, median, minimum, and maximum.

Recall the names of the columns in the Raptors data file:

|Column|Meaning|
|-|-|
|Age|Player's age on February 1 of the season|
|Lg|League|
|Pos|Position|
|G|Games|
|GS|Games Started|
|MP|Minutes Played Per Game|
|FG|Field Goals Per Game|
|FGA|Field Goal Attempts Per Game|
|FG%|Field Goal Percentage|
|3P|3-Point Field Goals Per Game|
|3PA|3-Point Field Goal Attempts Per Game|
|3P%|3-Point Field Goal Percentage|
|2P|2-Point Field Goals Per Game|
|2PA|2-Point Field Goal Attempts Per Game|
|2P%|2-Point Field Goal Percentage|
|eFG%|Effective Field Goal Percentage*|
|FT|Free Throws Per Game|
|FTA|Free Throw Attempts Per Game|
|FT%|Free Throw Percentage|
|ORB|Offensive Rebounds Per Game|
|DRB|Defensive Rebounds Per Game|
|TRB|Total Rebounds Per Game|
|AST|Assists Per Game|
|STL|Steals Per Game|
|BLK|Blocks Per Game|
|TOV|Turnovers Per Game|
|PF|Personal Fouls Per Game|
|PTS|Points Per Game|

<span style="font-size:10px">*This statistic adjusts for the fact that a 3-point field goal is worth one more point than a 2-point field goal.</span>
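The footnote above can be made concrete with the standard formula for effective field goal percentage, eFG% = (FG + 0.5 * 3P) / FGA, where a made 3-pointer counts as 1.5 made field goals. The sketch below applies it to two hypothetical players (made-up numbers, not Raptors data); the function name `effective_fg_pct` is our own. Because the per-game columns in a real file are rounded, a value computed this way may differ slightly from the file's own `eFG%` column.

```python
import pandas as pd

def effective_fg_pct(fg, three_p, fga):
    """Effective field goal percentage: (FG + 0.5 * 3P) / FGA.

    A made 3-pointer is weighted as 1.5 made field goals because it is
    worth one more point than a 2-pointer.
    """
    return (fg + 0.5 * three_p) / fga

# Hypothetical per-game numbers for two players (not real Raptors data)
players = pd.DataFrame({
    'FG': [6.0, 4.0],
    '3P': [2.0, 0.0],
    'FGA': [14.0, 8.0],
})

# Plain field goal percentage ignores the extra point from 3-pointers
players['FG%'] = players['FG'] / players['FGA']

# Effective field goal percentage gives 3-pointers extra weight
players['eFG%'] = effective_fg_pct(players['FG'], players['3P'], players['FGA'])

print(players.round(3))
```

Here the first player has the lower FG% (about 0.429 versus 0.500), but both players have an eFG% of 0.500 once the first player's two 3-pointers are credited for their extra point.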

## min(), max(), median(), mean()

Looking specifically at the Games column (G), let's calculate the minimum, maximum, mean, and median number of games played. 

In [None]:
# Import the pandas library
import pandas as pd

# Read the CSV data from the URL into a pandas DataFrame
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/raptors-2023.csv'
raptors_df = pd.read_csv(url)

# minimum
print('Minimum =', raptors_df['G'].min())

# maximum
print('Maximum =', raptors_df['G'].max())

# mean
print('Mean =', raptors_df['G'].mean())

# median
print('Median =', raptors_df['G'].median())
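The mean and the median can differ noticeably when a column contains an unusual value. As a hedged illustration with made-up numbers (not the Raptors data), a single player who appeared in only 2 games pulls the mean down to 63.4, while the median (the middle value once the numbers are sorted) stays at 78:

```python
import pandas as pd

# Hypothetical games-played values for a small roster (not real data).
# One player appeared in only 2 games.
games = pd.Series([82, 80, 78, 75, 2])

# The mean adds every value, so the single low value drags it down
print('Mean =', games.mean())

# The median is the middle value of the sorted list: 2, 75, 78, 80, 82
print('Median =', games.median())
```

This is one reason to look at both numbers: a large gap between them hints that a few values are far from the rest.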

# Exercise

What is the average (mean) age of the Raptors players?

In [None]:
# Write your program here.

# Supplemental

## Average for all numeric columns

What if we want to look at the averages of *all* the numeric columns? Simple!

In [None]:
display(raptors_df.mean(numeric_only=True).round(2))
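The `numeric_only=True` argument matters because columns such as `Pos` and `Lg` hold text, which cannot be averaged; in recent pandas versions, calling `mean()` on a DataFrame with text columns and without this argument typically raises a `TypeError`. A small sketch on a hypothetical table (not the Raptors data) shows the argument in action, along with an equivalent approach using `select_dtypes()`:

```python
import pandas as pd

# Small hypothetical table mixing a text column with numeric columns
df = pd.DataFrame({
    'Pos': ['PG', 'SF', 'C'],
    'G': [80, 70, 60],
    'PTS': [20.5, 15.0, 9.5],
})

# Average only the numeric columns; the text column 'Pos' is skipped
print(df.mean(numeric_only=True).round(2))

# Equivalent: keep only the numeric columns first, then average them
print(df.select_dtypes(include='number').mean().round(2))
```

Both lines print averages for `G` (70.0) and `PTS` (15.0) only.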

## All stats at once

Display summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for all numeric columns with `describe()`:

In [None]:
raptors_df.describe()
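By default, `describe()` in the cell above summarizes only the numeric columns. Passing `include='all'` also summarizes text columns such as `Pos`, reporting how many distinct values they contain (`unique`), the most common value (`top`), and how often it appears (`freq`). A minimal sketch on a hypothetical table (not the Raptors data):

```python
import pandas as pd

# Small hypothetical table (not the Raptors data) with one text column
df = pd.DataFrame({
    'Pos': ['PG', 'SF', 'C', 'PG'],
    'G': [80, 70, 60, 50],
})

# Default: only the numeric column 'G' is summarized
print(df.describe())

# include='all': 'Pos' is summarized with count, unique, top, and freq
print(df.describe(include='all'))
```

On the real data, `raptors_df.describe(include='all')` would be the analogous call.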

What if we only want to look at one column?

In [None]:
display(raptors_df['G'].describe())
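Several rows of `describe()` repeat the statistics calculated earlier: `min` and `max` match `min()` and `max()`, `mean` matches `mean()`, and the `50%` row (the 50th percentile) is the median. A quick check on a hypothetical column (made-up numbers, not the Raptors data):

```python
import pandas as pd

# Hypothetical games-played values standing in for the 'G' column
games = pd.Series([82, 80, 78, 75, 2], name='G')

stats = games.describe()
print(stats)

# The 50% row is the median; min, max, and mean match the individual methods
print(stats['50%'] == games.median())
print(stats['min'] == games.min(), stats['max'] == games.max())
print(stats['mean'] == games.mean())
```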

## f-strings

By using a Python feature called *f-strings* (formatted string literals), we can control how many decimal places are shown and make the output easier to read:

In [None]:
# minimum
print(f"Minimum = {raptors_df['G'].min():.0f}")

# maximum
print(f"Maximum = {raptors_df['G'].max():.0f}")

# mean
print(f"Mean = {raptors_df['G'].mean():.1f}")

# median
print(f"Median = {raptors_df['G'].median():.0f}")
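In the cell above, the part after the colon inside the braces is a format specification: `.0f` means a fixed-point number with 0 decimal places, and `.1f` means one decimal place. A small sketch using a hypothetical value (not computed from the Raptors file):

```python
# Hypothetical mean value (not computed from the Raptors file)
mean_games = 47.6363636

print(f"{mean_games}")      # no format spec: full precision
print(f"{mean_games:.0f}")  # 0 decimal places: 48
print(f"{mean_games:.1f}")  # 1 decimal place: 47.6
print(f"{mean_games:.2f}")  # 2 decimal places: 47.64
```

One caveat: the median of an even number of values can end in .5, and `:.0f` rounds that away. Python rounds an exact half to the nearest even digit, so `f"{40.5:.0f}"` gives `40`. Using `:.1f` for the median keeps that information visible.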

---
Back to [Lessons](https://github.com/pbeens/Data-Dunkers/blob/main/Lessons.md)