**Introduction:**

An exploratory look at improving the Pythagorean Win Percentage and Projected Win Percentage calculations for predicting the season win percentage of NBA teams.

**Theory:**

Season win predictions should be adjusted for the quality of opponents played, as measured by Strength of Schedule.

**Sample data:**

Chose the Chicago Bulls for the 2016-2017 season because they finished with an even .500 record.  Omitted games prior to 12/15/16 to reduce erratic early-season behavior.

In [None]:
library(ggplot2)

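# Load the daily standings and parse the date column
standings <- read.csv("../input/2016-17_standings.csv")
standings$stDate <- as.Date(standings$stDate)

# Keep Chicago Bulls rows from 12/15/16 onward (12/15 itself is included)
team <- subset(standings, teamAbbr == "CHI" & stDate >= as.Date("2016-12-15"))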

tail(team)

**Season Win Percentage:**

The fraction of games a team has won at the end of the regular season.  The goal of predicting a season win percentage is to mirror this line for the entire season.

In [None]:
# The final record comes from the last standings date in the sample
maxDate <- subset(team, team$stDate == max(team$stDate))
seasonWinPct <- maxDate$gameWon / maxDate$gamePlay

g <- ggplot(data=team, aes(x=stDate)) +
    geom_line(aes(y=seasonWinPct, color="Season Win%")) +
    ylab(label="Win Percentage") +
    xlab(label="Date") +
    ggtitle("12/15/16 to 04/12/17 for Chicago Bulls") +
    # Predefine legend entries and colors for every series added in later cells
    scale_colour_manual("",
        breaks = c("Season Win%", "Pyth13.91 Win%", "Projected Win%", "SosPyth13.91 Win%", "SosProjected Win%"),
        values = c("Season Win%"="black", "Pyth13.91 Win%"="blue", "Projected Win%"="green", "SosPyth13.91 Win%"="red", "SosProjected Win%"="darkgoldenrod4"))
g

**Pythagorean Win Percentage:**

An estimate of a team’s end-of-season win percentage, based on the points scored and allowed up to any given point in the season.

The calculation was invented by Bill James for baseball and was later adapted to other sports.  American sports executive Daryl Morey is credited with being the first to adapt the concept to basketball, using the exponent 13.91.

Formula: ptsFor ^ 13.91 / (ptsFor ^ 13.91 + ptsAgnst ^ 13.91)
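
The formula above can be written as a small R helper. This is only a sketch: `pythWinPct` is a hypothetical name, and the sample totals are made up for illustration rather than taken from the standings file, which already supplies the value as `pyth.13.91`.

```r
# Pythagorean win percentage from cumulative points for and against.
# The default exponent 13.91 is the basketball value quoted above.
pythWinPct <- function(ptsFor, ptsAgnst, exponent = 13.91) {
    ptsFor^exponent / (ptsFor^exponent + ptsAgnst^exponent)
}

# Made-up totals, for illustration only
pythWinPct(ptsFor = 8400, ptsAgnst = 8400)   # equal totals give 0.5
pythWinPct(ptsFor = 8500, ptsAgnst = 8400)   # outscoring opponents gives a value above 0.5
```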

In [None]:
g + geom_line(aes(y=pyth.13.91, color="Pyth13.91 Win%")) +
    geom_point(aes(y=pyth.13.91, color="Pyth13.91 Win%"))

**Projected Win Percentage:**

NBAstuffer defines this formula, which uses a team's net point differential rather than points scored and points allowed separately.  Each point of per-game differential translates to 2.7 wins over the course of an 82-game season.

Formula: [(((ptsFor - ptsAgnst) / gamePlay) * 2.7) + 41] / 82
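
As a rough sketch, the same calculation in R might look like the following. The function name `projWinPct` and its `winsPerPoint` and `seasonGames` parameters are assumptions introduced here; the 41 in the formula is taken to be half of the 82-game season. The standings file already provides this value as `pw.`.

```r
# Projected win percentage from average point differential per game.
projWinPct <- function(ptsFor, ptsAgnst, gamePlay,
                       winsPerPoint = 2.7, seasonGames = 82) {
    avgMargin <- (ptsFor - ptsAgnst) / gamePlay
    # A team with zero differential is projected to win half its games
    (avgMargin * winsPerPoint + seasonGames / 2) / seasonGames
}

# Made-up totals, for illustration only
projWinPct(ptsFor = 4000, ptsAgnst = 4000, gamePlay = 40)   # zero differential gives 0.5
projWinPct(ptsFor = 4040, ptsAgnst = 4000, gamePlay = 40)   # +1 per game gives (2.7 + 41) / 82
```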

In [None]:
g + geom_line(aes(y=pyth.13.91, color="Pyth13.91 Win%")) +
    geom_point(aes(y=pyth.13.91, color="Pyth13.91 Win%")) +
    geom_line(aes(y=pw., color="Projected Win%")) +
    geom_point(aes(y=pw., color="Projected Win%"))

**Strength of Schedule:**

A rating of a team's schedule: the stronger the opponents, the higher the SOS rating; the weaker the opponents, the lower the SOS rating.

Formula: [(2 * Opponents' Record) + (Opponents' Opponents' Record)] / 3
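
A minimal sketch of this formula in R, assuming each "record" is expressed as a win percentage between 0 and 1. The function and argument names are hypothetical; the standings file already carries the result in its `sos` column.

```r
# Strength of Schedule: opponents' record counts twice as much as
# the record of the opponents' opponents.
strengthOfSchedule <- function(oppWinPct, oppOppWinPct) {
    (2 * oppWinPct + oppOppWinPct) / 3
}

# Made-up inputs, for illustration only
strengthOfSchedule(oppWinPct = 0.5, oppOppWinPct = 0.5)   # an average schedule gives 0.5
strengthOfSchedule(oppWinPct = 0.6, oppOppWinPct = 0.5)   # stronger opponents raise the rating
```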

**Pythagorean Win Percentage influenced by Strength of Schedule:**

Formula: (Pythagorean Win Percentage + Strength of Schedule) / 2
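
The equal-weight average above is one case of a weighted blend. The sketch below introduces a hypothetical helper, `blendWithSos`, with an assumed `sosWeight` parameter; the default of 0.5 reproduces the formula as written.

```r
# Blend a win-percentage estimate with Strength of Schedule.
# sosWeight = 0.5 is the equal-weight average; 0 returns the estimate unchanged.
blendWithSos <- function(winPct, sos, sosWeight = 0.5) {
    stopifnot(sosWeight >= 0, sosWeight <= 1)
    (1 - sosWeight) * winPct + sosWeight * sos
}

# Made-up inputs, for illustration only
blendWithSos(winPct = 0.6, sos = 0.4)                   # same as (0.6 + 0.4) / 2
blendWithSos(winPct = 0.6, sos = 0.4, sosWeight = 0.25) # leans toward the base estimate
```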

In [None]:
sosPythWinPct <- (team$pyth.13.91 + team$sos) / 2

g + geom_line(aes(y=pyth.13.91, color="Pyth13.91 Win%")) +
    geom_point(aes(y=pyth.13.91, color="Pyth13.91 Win%")) +
    geom_line(aes(y=pw., color="Projected Win%")) +
    geom_point(aes(y=pw., color="Projected Win%")) +
    geom_line(aes(y=sosPythWinPct, color="SosPyth13.91 Win%")) +
    geom_point(aes(y=sosPythWinPct, color="SosPyth13.91 Win%")) 

**Projected Win Percentage influenced by Strength of Schedule:**

Formula: (Projected Win Percentage + Strength of Schedule) / 2

In [None]:
sosProjWinPct <- (team$pw. + team$sos) / 2

g + geom_line(aes(y=pyth.13.91, color="Pyth13.91 Win%")) +
    geom_point(aes(y=pyth.13.91, color="Pyth13.91 Win%")) +
    geom_line(aes(y=pw., color="Projected Win%")) +
    geom_point(aes(y=pw., color="Projected Win%")) +
    geom_line(aes(y=sosPythWinPct, color="SosPyth13.91 Win%")) +
    geom_point(aes(y=sosPythWinPct, color="SosPyth13.91 Win%")) +
    geom_line(aes(y=sosProjWinPct, color="SosProjected Win%")) +
    geom_point(aes(y=sosProjWinPct, color="SosProjected Win%"))

**Conclusion:**

Except for a few weeks in January, when the Bulls' record hovered around their Season Win Percentage, blending Strength of Schedule into the Pythagorean Win Percentage and Projected Win Percentage calculations tracks the Season Win Percentage noticeably more closely than the base calculations do on the sample data.

Next steps would be to test these calculations against the rest of the league and across multiple seasons, and to evaluate whether the weight given to Strength of Schedule in these formulas should be adjusted for greater accuracy.
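
One possible way to approach the weighting question is to score each candidate weight by the mean absolute error between the daily prediction and the final Season Win Percentage. This is a sketch under assumptions, not a method used in this notebook: the helper names are hypothetical, and mean absolute error is just one reasonable choice of metric.

```r
# Mean absolute error between a daily prediction series and the final win percentage.
meanAbsError <- function(predicted, seasonWinPct) {
    mean(abs(predicted - seasonWinPct), na.rm = TRUE)
}

# Score a range of Strength of Schedule weights.
# A weight of 0 is the base calculation; 0.5 is the equal-weight average.
scoreSosWeights <- function(winPct, sos, seasonWinPct,
                            weights = seq(0, 1, by = 0.1)) {
    errors <- sapply(weights, function(w) {
        meanAbsError((1 - w) * winPct + w * sos, seasonWinPct)
    })
    data.frame(sosWeight = weights, meanAbsError = errors)
}

# Possible use with the objects built earlier in this notebook:
# scoreSosWeights(team$pyth.13.91, team$sos, seasonWinPct)
# scoreSosWeights(team$pw., team$sos, seasonWinPct)
```

The row with the smallest `meanAbsError` would suggest a weight for this sample only; a weight chosen this way would still need checking against other teams and seasons.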