---
Title: "Intro to Categorical Model-Making Module by Austin Hayes"

Author:
  - Name: Austin Hayes
  - Email: ahayes65@charlotte.edu
  - Affiliation: University of North Carolina at Charlotte

Date: July 29, 2025

Description: Using WNBA team per-game stats, we review the basics of categorical model-making. We build QDA, LDA, and logistic regression models to predict which team will become the next WNBA champion.

Categories:
  - Logistic Regression
  - Quadratic Discriminant Analysis
  - Linear Discriminant Analysis
  - Importing and Reading data
  - Data Cleaning
  - Data Science
  - Pandas


### Data

This dataset comes from Basketball Reference (https://www.basketball-reference.com).

Visit the original data page here: https://www.basketball-reference.com/wnba/years/2024.html

The data set contains 60 rows and 26 columns. Each row represents a WNBA team during the 2024 season.

Download data: 

Available in the [Intro to Categorical Model-Making Module by Austin Hayes](https://github.com/schuckers/Charlotte_SCORE_Summer25/tree/main/Data%20for%20Modules/Data%20For%20Intro%20to%20Categorical%20Model-Making%20Module%20by%20Austin%20Hayes) folder on GitHub: [2024_WNBA_Per_Game.csv](https://raw.githubusercontent.com/schuckers/Charlotte_SCORE_Summer25/refs/heads/main/Data%20for%20Modules/Data%20For%20Intro%20to%20Categorical%20Model-Making%20Module%20by%20Austin%20Hayes/2024_WNBA_Per_Game.csv)
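As a minimal sketch of reading the file, the raw CSV link above can be passed straight to `pandas.read_csv`. The helper name `load_per_game` and the tiny in-memory sample are illustrative assumptions, not part of the module; the sample values are made up so the snippet runs without a network connection.

```python
import io

import pandas as pd

# Raw CSV link taken from the download section above.
DATA_URL = (
    "https://raw.githubusercontent.com/schuckers/Charlotte_SCORE_Summer25/"
    "refs/heads/main/Data%20for%20Modules/"
    "Data%20For%20Intro%20to%20Categorical%20Model-Making%20Module%20by%20Austin%20Hayes/"
    "2024_WNBA_Per_Game.csv"
)


def load_per_game(source=DATA_URL):
    """Read the per-game table from a URL, a local path, or a file-like object.

    Hypothetical helper for illustration only.
    """
    return pd.read_csv(source)


# Offline stand-in with made-up values, used only to show the call pattern.
sample_csv = io.StringIO("Team,PTS,CHAMPION\nTeam A,80.1,0\nTeam B,86.3,1\n")
sample = load_per_game(sample_csv)
print(sample.shape)
```

Calling `load_per_game()` with no argument would download the real file; checking `.shape` and `.columns` afterwards is a quick way to compare it against the variable table below.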

---

### Variables and their Descriptions:


<details>
<summary><b>Variable Descriptions</b></summary>

| Variable | Description |
|----------|-------------|
| Rk       | Rank of the team in the league |
| Team     | Name of the team |
| G        | Games played |
| MP       | Minutes played per game |
| FG       | Field Goals made per game |
| FGA      | Field Goal attempts per game |
| FG%      | Field Goal percentage (FG ÷ FGA) |
| 3P       | Three-Point Field Goals made per game |
| 3PA      | Three-Point Field Goal attempts per game |
| 3P%      | Three-Point Field Goal percentage (3P ÷ 3PA) |
| 2P       | Two-Point Field Goals made per game |
| 2PA      | Two-Point Field Goal attempts per game |
| 2P%      | Two-Point Field Goal percentage (2P ÷ 2PA) |
| FT       | Free Throws made per game |
| FTA      | Free Throw attempts per game |
| FT%      | Free Throw percentage (FT ÷ FTA) |
| ORB      | Offensive Rebounds per game |
| DRB      | Defensive Rebounds per game |
| TRB      | Total Rebounds per game |
| AST      | Assists per game |
| STL      | Steals per game |
| BLK      | Blocks per game |
| TOV      | Turnovers per game |
| PF       | Personal Fouls per game |
| PTS      | Points scored per game |
| CHAMPION | Did that team win the championship that year? 1 = Yes, 0 = No |

</details>

---

# Review Questions

The questions below review the material covered in this module.

```python
# Import the necessary libraries for the module
# Basic data science library for importing and manipulating data
import pandas as pd

# Library for mathematical operations
import numpy as np

# Import only the necessary functions from the sklearn library
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
```

### 1.) How does KNN work, and what must be done to the data before fitting a KNN model?

### 2.) Using the 2020-2024 dataset, create a list of predictor variables that includes FG%, 3P%, 2P%, FT%, ORB, and AST. Scale these predictor variables. Then create a new column in the 2020-2024 dataset that indicates whether a team's points ('PTS') are over 85 per game. 1 = True, 0 = False.

Hint: _Use .astype(int)_ 
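To illustrate the hint, here is a small sketch on a made-up table. The values, the two-predictor subset, and the column name `OVER_85` are assumptions chosen for illustration; the real exercise uses the full predictor list and the 2020-2024 data.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up per-game values, not taken from the real dataset.
toy = pd.DataFrame({
    "FG%": [0.44, 0.46, 0.42, 0.45],
    "AST": [19.5, 22.1, 18.3, 20.7],
    "PTS": [80.2, 86.4, 78.9, 85.0],
})

# A comparison yields True/False; .astype(int) converts that to 1/0.
toy["OVER_85"] = (toy["PTS"] > 85).astype(int)

# Standardize each predictor to mean 0 and standard deviation 1.
predictors = ["FG%", "AST"]
X_scaled = StandardScaler().fit_transform(toy[predictors])

print(toy["OVER_85"].tolist())
```

Note that a team at exactly 85.0 points is coded 0 here, because the comparison is strictly greater than 85.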

### 3.) Using the scaled data you created in question 2, fit a QDA model to predict whether each team will average more than 85 points per game during the 2025 WNBA season. Record an accuracy score.
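The general fit, predict, and score pattern for QDA might look like the sketch below. It runs on synthetic two-class data generated with NumPy, so the numbers say nothing about WNBA teams; the variable names and the 75/25 split are assumptions for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data: two classes with different centers and spreads.
rng = np.random.default_rng(1)
X_demo = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
    rng.normal(loc=2.0, scale=0.5, size=(100, 2)),
])
y_demo = np.repeat([0, 1], 100)

# Hold out a quarter of the rows to measure accuracy on unseen data.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1, stratify=y_demo
)

qda = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr)
qda_accuracy = accuracy_score(y_te, qda.predict(X_te))
print(f"QDA accuracy: {qda_accuracy:.3f}")
```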

### 4.) Repeat question 3 with three KNN models instead. Use k = 5, k = 25, and k = 53, respectively. Record an accuracy score for each.
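One way to compare several values of k is a simple loop, sketched here on synthetic data. The data, the split size, and the choice to fit the scaler on the training rows only are assumptions for illustration, not the module's prescribed workflow.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic data: the label depends on the first two features only.
rng = np.random.default_rng(0)
X_knn = rng.normal(size=(200, 3))
y_knn = (X_knn[:, 0] + X_knn[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X_knn, y_knn, test_size=0.25, random_state=0
)

# KNN is distance-based, so features are standardized before fitting.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Fit one model per k and store its test accuracy.
scores = {}
for k in (5, 25, 53):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train_s, y_train)
    scores[k] = accuracy_score(y_test, knn.predict(X_test_s))

print(scores)
```

Keep in mind that `n_neighbors` cannot exceed the number of training rows, which matters for k = 53 on a small dataset.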

### 5.) As in questions 3 and 4, make the same prediction using a logistic regression model. Record an accuracy score.

### 6.) Which model(s) had the highest accuracy score? Is there a tie among the most accurate models, or is one model clearly the best in terms of accuracy?

### 7.) What differentiates LDA and QDA? List a few differences.
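One difference can be checked empirically: how each model estimates covariance. The sketch below fits both models on synthetic data (an assumption for illustration) with `store_covariance=True` and inspects the fitted `covariance_` attribute of each.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

# Synthetic data: class 0 is widely spread, class 1 is tightly clustered.
rng = np.random.default_rng(2)
X_cov = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(150, 2)),
    rng.normal(loc=2.0, scale=0.4, size=(150, 2)),
])
y_cov = np.repeat([0, 1], 150)

lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X_cov, y_cov)
qda_cov = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X_cov, y_cov)

# LDA stores a single covariance matrix shared by all classes.
print(lda.covariance_.shape)

# QDA stores one covariance matrix per class.
print(len(qda_cov.covariance_))
print(qda_cov.covariance_[0][0, 0], qda_cov.covariance_[1][0, 0])
```

In this sketch the per-class variances reported by QDA differ noticeably between the two classes, while LDA summarizes both with one matrix.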