# **Temp Title**

##### *Tanisha Amrin, Charmaine Chui, Jakob Sereda, Julian Wright*

## **Introduction**

Basketball is one of the most popular sports in North America and worldwide (*SportsPro Media*, 2023), and it has been played by millions of people, both professional and amateur, since its invention in 1891 (*National Geographic*, 2021). In professional basketball, all eyes are on the National Basketball Association (NBA), which is considered the premier professional basketball league in the world (*SportsPro Media*, 2023). Each of the 30 NBA teams holds its players to an extremely high standard of performance, signing only the very best in the world to take the court in its jersey.

A key aspect of basketball is the set of positions players take on the court. Each player fills one position, and although many roles are officially documented, they fall into three main categories: centers, forwards, and guards, with hybrids and combinations filling the gaps (*Under Armour*). NBA coaches are tasked with assigning drafted players and new recruits to the positions that best suit their body type and skill set. This decision can have a profound impact on a team's performance, since players placed in the right role are able to show their unique talents (*RedBull*, 2022). The goal of this project is to build a model, grounded in statistical analysis, that can assist coaches in assigning positions to their players.

The question we seek to answer is: **What position is a player most likely to play, given their height, weight, free-throw percentage, and 3-point field goal percentage?**

The data set we will use to answer this question was pulled from [*nba.com*](https://www.nba.com/stats/players), the official website of the NBA, using the NBA API Client package, [*github.com/swar/nba_api*](https://github.com/swar/nba_api) (Swar Patel, Randall Forbes, et al.). It contains stats on individual players during each NBA season. There are 58 columns and 4,916 rows in this data set, and missing values are represented by "NA". Because the NBA stores a large amount of data, the data set has many columns, so we describe only the columns relevant to our analysis:
- **PERSON_ID:** the given player's id
- **HEIGHT:** player's height in inches
- **WEIGHT:** player's weight in lbs
- **POSITION:** player's position (one of Forward, Center, Guard, Center-Forward, Forward-Center, Guard-Forward, Forward-Guard)
- **FG3_PCT:** 3-point field goal percentage, value between 0 and 1 (3-point field goals made / 3-point field goals attempted)
- **FT_PCT:** free throw percentage, value between 0 and 1 (free throws made / free throws attempted)
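
To make the two percentage columns concrete, the sketch below recomputes them from made and attempted counts. The counts, and the column names `fg3m`, `fg3a`, `ftm`, and `fta`, are made up for illustration and are not taken from our data set. Returning `NA` when a player has zero attempts is a choice made in this sketch only; it is not a statement about how the source data encodes that case.

```r
# Hypothetical shooting totals for three players (illustrative values only).
shots <- data.frame(
  fg3m = c(30, 0, 5),     # 3-point field goals made
  fg3a = c(80, 0, 70),    # 3-point field goals attempted
  ftm  = c(150, 40, 22),  # free throws made
  fta  = c(200, 50, 55)   # free throws attempted
)

# Percentage = made / attempted, a value between 0 and 1.
# With zero attempts the ratio is undefined, so this sketch returns NA.
pct <- function(made, attempted) {
  ifelse(attempted > 0, made / attempted, NA_real_)
}

shots$fg3_pct <- round(pct(shots$fg3m, shots$fg3a), 3)
shots$ft_pct  <- round(pct(shots$ftm, shots$fta), 3)

print(shots)
```

Both columns are proportions rather than values out of 100, which matches the "value between 0 and 1" descriptions above.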

## **Preliminary Exploratory Data Analysis**

In [19]:
# importing libraries
library(tidyverse)
library(tidymodels)
library(RColorBrewer)

── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom        1.0.5     ✔ rsample      1.2.0
✔ dials        1.2.0     ✔ tune         1.1.2
✔ infer        1.0.5     ✔ workflows    1.1.3
✔ modeldata    1.2.0     ✔ workflowsets 1.0.1
✔ parsnip      1.1.1     ✔ yardstick    1.2.0
✔ recipes      1.0.8     

── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter()   masks stats::filter()
✖ recipes::fixed()  masks stringr::fixed()
✖ dplyr::lag()      masks stats::lag()

In [21]:
set.seed(2023) 

# reading in the data from the web
nba_data <- read_csv("https://raw.githubusercontent.com/jakobsereda/dsci-100-project/main/stats.csv") 

# cleaning and wrangling the data
names(nba_data) <- tolower(names(nba_data))

nba_clean <- nba_data |>
    select(person_id, first_name, last_name, height, weight, position, season_id, 
           gp, gs, min, fg_pct, fg3_pct, ft_pct, reb, ast, stl, blk, tov, pf, pts) |>
    na.omit() |>
    # collapse each hybrid position into the position listed first
    mutate(position = case_when(
        position == "Center-Forward" ~ "Center",
        position == "Guard-Forward" ~ "Guard",
        position %in% c("Forward-Guard", "Forward-Center") ~ "Forward",
        TRUE ~ position
    )) |>
    mutate(position = as_factor(position))

# splitting the data into training and testing sets
nba_split <- initial_split(nba_clean, prop = 3/4, strata = position)

nba_train <- training(nba_split)
nba_test <- testing(nba_split)

head(nba_train)

Rows: 4916 Columns: 58
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (21): FIRST_NAME, LAST_NAME, DISPLAY_FIRST_LAST, DISPLAY_LAST_COMMA_FIR...
dbl  (31): PERSON_ID, HEIGHT, WEIGHT, SEASON_EXP, TEAM_ID, FROM_YEAR, TO_YEA...
lgl   (5): GAMES_PLAYED_CURRENT_SEASON_FLAG, DLEAGUE_FLAG, NBA_FLAG, GAMES_P...
dttm  (1): BIRTHDATE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


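| person_id | first_name | last_name | height | weight | position | season_id | gp | gs | min | fg_pct | fg3_pct | ft_pct | reb | ast | stl | blk | tov | pf | pts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<fct>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 76003 | Kareem | Abdul-Jabbar | 86 | 225 | Center | 1988-89 | 74 | 74 | 1695 | 0.475 | 0.0 | 0.739 | 334 | 74 | 38 | 85 | 95 | 196 | 748 |
| 76009 | Mark | Acres | 83 | 220 | Center | 1992-93 | 18 | 7 | 269 | 0.531 | 0.5 | 0.688 | 67 | 5 | 3 | 6 | 13 | 34 | 64 |
| 76011 | Alvan | Adams | 81 | 210 | Center | 1987-88 | 82 | 25 | 1646 | 0.496 | 0.5 | 0.844 | 365 | 183 | 82 | 41 | 140 | 245 | 611 |
| 203500 | Steven | Adams | 83 | 265 | Center | 2022-23 | 42 | 42 | 1133 | 0.597 | 0.0 | 0.364 | 485 | 97 | 36 | 46 | 79 | 98 | 361 |
| 1628389 | Bam | Adebayo | 81 | 255 | Center | 2023-24 | 53 | 53 | 1835 | 0.508 | 0.071 | 0.769 | 546 | 214 | 57 | 51 | 130 | 128 | 1057 |
| 202374 | Solomon | Alabi | 85 | 252 | Center | 2011-12 | 14 | 0 | 122 | 0.361 | 0.0 | 0.875 | 47 | 3 | 2 | 9 | 5 | 11 | 33 |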
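
Because the split is stratified on `position`, the training and testing sets should each hold roughly the same share of every position. A minimal sketch of that check follows. It runs on a small made-up data frame (`toy`, its sizes, and the helper `position_share` are hypothetical stand-ins, not part of our analysis) so that it is self-contained.

```r
library(dplyr)
library(rsample)

set.seed(2023)

# Hypothetical stand-in for nba_clean: 120 players across three positions.
toy <- tibble(
  position = factor(rep(c("Center", "Forward", "Guard"), times = c(20, 50, 50))),
  height   = round(rnorm(120, mean = 78, sd = 3))
)

# Same call shape as the split above: 75% training, stratified on position.
toy_split <- initial_split(toy, prop = 3/4, strata = position)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# Count each position and convert the counts to shares of the data frame.
position_share <- function(df) {
  df |>
    count(position) |>
    mutate(share = n / sum(n))
}

print(position_share(toy_train))
print(position_share(toy_test))
```

Applying the same helper to `nba_train` and `nba_test` would show whether the real split kept the position shares comparable.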


## **Bibliography**

- Data pulled from [*nba.com*](https://www.nba.com/stats/players), the official website of the NBA, using the NBA API Client package, [*github.com/swar/nba_api*](https://github.com/swar/nba_api) (Swar Patel, Randall Forbes, et al.). *Note: this data set was not provided on Canvas, but was approved by a TA during tutorial.*
- McMurray, Ben. "Why the NBA is America's Most Globally Relevant Sports Property." *SportsPro Media*, 24 Oct. 2023, [*link*](https://www.sportspromedia.com/insights/analysis/nba-tv-rights-revenue-global-popularity-data-ampere-analysis/).
- Toole, T.C. "Here's the History of Basketball - From Peach Baskets in Springfield to Global Phenomenon." *National Geographic*, 27 Mar. 2021, [*link*](https://www.nationalgeographic.com/history/article/basketball-only-major-sport-invented-united-states-how-it-was-created).
- "What are the 5 Basketball Positions and Their Roles?" *Under Armour*, [*link*](https://www.underarmour.com/en-us/t/playbooks/basketball/basketball-positions/).
- Lister, Aimee. "Basketball Positions Explained: What Each Player Does." *RedBull*, 21 Jul. 2022, [*link*](https://www.redbull.com/us-en/basketball-positions-what-each-player-does).