Title: Predicting tennis player rank categories based on the number of seasons active, prize money won, age, and height.

Background information:
Tennis rankings are determined by the number of points a player has. Points are acquired by playing in tournaments. Winning a higher-status tournament, such as a Grand Slam, earns more points (2000 for a Grand Slam title). Winning tournaments also earns players prize money and, as with points, higher-status tournaments pay more. Higher-ranking players are therefore more likely to have earned more prize money. Additionally, players who have competed in more seasons have had more opportunities to win tournaments, which may indicate a higher ranking.



Question:
The question we will be exploring in this project is: Can we use known player statistics (number of seasons active, prize money, age, and height) of the top 500 players to predict which rank category a future player will fall into?

To answer this question, we will be using the “Player Stats for the Top 500 Players” data frame, which provides statistics on each top player such as age, height, number of seasons active, and current rank.


Variables:

We will be using the number of seasons a player has played in, how much prize money they've won, the age of the player, and the height of the player.
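
The response variable in the question is a rank category rather than the raw rank. As a minimal sketch of how a numeric rank could be turned into a category, the example below uses base R's cut(). The category boundaries (bins of 100 ranks) and the names example_ranks and rank_category are assumptions made for illustration only and are not taken from the project.

```r
# A few example ranks (hypothetical input; in the project these would come
# from the rank column of the cleaned data frame).
example_ranks <- c(31, 104, 178, 236, 378, 417)

# Assumed category boundaries: 1-100, 101-200, ..., 401-500.
# cut() uses right-closed intervals by default, so rank 100 falls in "1-100".
rank_category <- cut(example_ranks,
                     breaks = c(0, 100, 200, 300, 400, 500),
                     labels = c("1-100", "101-200", "201-300",
                                "301-400", "401-500"))

rank_category
```

The result is a factor, which is the type a classification model would need for its response variable.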

In [1]:
library(repr)
library(tidyverse)
library(tidymodels)
library(readxl)
library(janitor)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom        1.0.5     ✔ rsample

In [25]:
# Read the raw player statistics.
tennis_data <- read_csv("Data/player_stats.csv", skip = 1)

# Keep the columns of interest. For age, current rank, and height, keep only
# the part before the first space and convert it to a number.
tennis_df <- tennis_data |>
    clean_names() |>
    select(age, current_rank, prize_money, height, seasons) |>
    separate(col = height,
             into = c("height_cm", "discard"),
             sep = " ") |>
    separate(col = current_rank,
             into = c("rank", "discard_1"),
             sep = " ") |>
    separate(col = age,
             into = c("age", "discard_2"),
             sep = " ") |>
    select(-discard_2, -discard_1, -discard) |>
    mutate(age = as.numeric(age),
           rank = as.numeric(rank),
           height_cm = as.numeric(height_cm))

# Column means of the predictors, ignoring missing values.
means <- tennis_df |>
    select(age, prize_money:seasons) |>
    map_dfr(mean, na.rm = TRUE)

means
tennis_df

New names:
• `` -> `...1`
Rows: 500 Columns: 38
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (24): Age, Country, Plays, Wikipedia, Current Rank, Best Rank, Name, Bac...
dbl (14): ...1, Prize Money, Turned Pro, Seasons, Titles, Best Season, Retir...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


age,prize_money,height_cm,seasons
<dbl>,<dbl>,<dbl>,<dbl>
25.96794,3416440,185.7913,6.494652


age,rank,prize_money,height_cm,seasons
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
26,378,,,
18,326,59040,,
32,178,3261567,185,14
21,236,374093,,2
27,183,6091971,193,11
22,31,1517157,,5
28,307,278709,,1
21,232,59123,,1
25,417,122734,,5
20,104,74927,,3
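
Several cells in the preview above are empty, most often in height_cm. As a rough sketch of how the missing values could be counted per column, the example below rebuilds only the ten rows shown above in a small data frame and counts the NA entries. The name preview is hypothetical, and treating the blank cells as NA is an assumption about how the output was rendered.

```r
# The ten rows shown in the preview, with blank cells entered as NA
# (assumption: blanks in the printed output are missing values).
preview <- data.frame(
  age         = c(26, 18, 32, 21, 27, 22, 28, 21, 25, 20),
  rank        = c(378, 326, 178, 236, 183, 31, 307, 232, 417, 104),
  prize_money = c(NA, 59040, 3261567, 374093, 6091971,
                  1517157, 278709, 59123, 122734, 74927),
  height_cm   = c(NA, NA, 185, NA, 193, NA, NA, NA, NA, NA),
  seasons     = c(NA, NA, 14, 2, 11, 5, 1, 1, 5, 3)
)

# Number of missing values in each column.
missing_counts <- colSums(is.na(preview))
missing_counts
```

Running the same count on the full tennis_df would show how many of the 500 players have complete data for each predictor.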
