# NBA Top 10 GOAT Debate through the numbers
This is a quick notebook that uses data and explicit criteria to evaluate the Greatest Player of All Time (GOAT) in the NBA.
For too long, we have heard so-called "sports analysts" make up criteria for their GOAT on the spot and argue for their pick without applying those same criteria to every player.

I believe debating a GOAT doesn't have much meaning: it is extremely hard to calculate the value a player adds to a team, and even harder to compare players who never competed against each other at their best. Still, this notebook takes a data-first shot at deciding the NBA GOAT.

If my values of what makes an NBA player great could be phrased in one sentence, it would be:
"You play the game to win, so the main factors should be winning!"

## Criteria
- (4 points) Lead a team to a **championship** as the **best** player
- (3 points) Lead a team to a **championship** as the **co-best** player
- (2 points) Lead a team to the **finals** as the **best** player
- (1 point) Lead a team to the **finals** as the **co-best** player
- (1 point) Be the **2nd best** player on a **championship** team
- (1 point) Lead a team to the **conference finals** as the **best** player
- (1 point) Finish in the **top 3 for MVP** voting in a season
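
The point values above can be written down as a small lookup table. The sketch below is only an illustration: the names `POINTS` and `total_goat_points` are hypothetical, the keys mirror the short codes used in the scoring cell later in the notebook (c1, c12, f1, f12, c2, cf1, mvp3), and the example counts are LeBron James's row from the results table.

```python
# Points per criterion, keyed by the short codes used in the scoring cell.
# POINTS and total_goat_points are illustrative names, not part of goat_utils.
POINTS = {
    "c1": 4,    # champion as the best player
    "c12": 3,   # champion as the co-best player
    "f1": 2,    # finals as the best player
    "f12": 1,   # finals as the co-best player
    "c2": 1,    # 2nd-best player on a championship team
    "cf1": 1,   # conference finals as the best player
    "mvp3": 1,  # top-3 MVP finish
}


def total_goat_points(counts):
    """Sum count * points over all criteria; missing criteria count as 0."""
    return sum(POINTS[key] * counts.get(key, 0) for key in POINTS)


# Counts taken from LeBron James's row in the results table.
lebron = {"c1": 3, "c12": 1, "f1": 5, "f12": 1, "c2": 0, "cf1": 1, "mvp3": 11}
print(total_goat_points(lebron))  # 3*4 + 1*3 + 5*2 + 1*1 + 0*1 + 1*1 + 11*1 = 38
```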

## How to decide whether a player is the best, co-best, or 2nd-best player on the team (rank)?
For this I decided to use a stat called PER (Player Efficiency Rating): https://www.basketball-reference.com/about/per.html.
PER takes into account all the well-known stats such as points, rebounds, assists, and shooting percentage, along with many others.
As a result, it gives us a single number by which we can compare players.
It's not a perfect number, but it's pretty great and, in my opinion, the best we have (VORP could also be considered).

The rank of a player is decided based on the PER of the players on the team:
- in the regular season
- and in the playoffs (valued higher than the regular season)

If the PER of two players is "close enough" then they will be considered co-best players.
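
One way the weighting and the "close enough" rule could fit together is sketched below. This is a guess at the idea, not the actual `rank_on_team` helper from `goat_utils`: the function name `sketch_rank`, the dict-based inputs, and the exact tie handling are assumptions. The default weights (0.35 for the regular season, 0.65 for the playoffs) and the 1.5 margin are the values set later in the notebook.

```python
def sketch_rank(player_id, season_per, playoff_per,
                season_weight=0.35, playoff_weight=0.65, co_margin=1.5):
    """Return '1', '1/2', '2', or 'other' for player_id (hypothetical helper).

    season_per and playoff_per map player ids to PER values (assumed format).
    Only players present in both mappings are ranked.
    """
    combined = {
        pid: season_weight * season_per[pid] + playoff_weight * playoff_per[pid]
        for pid in season_per
        if pid in playoff_per
    }
    ordered = sorted(combined, key=combined.get, reverse=True)
    if player_id not in ordered[:2]:
        return "other"
    if len(ordered) == 1:
        return "1"
    gap = combined[ordered[0]] - combined[ordered[1]]
    if gap <= co_margin:
        return "1/2"  # the top two are close enough to count as co-best
    return "1" if ordered[0] == player_id else "2"


# Made-up PER values, purely for illustration.
season = {"a": 27.0, "b": 25.0, "c": 18.0}
playoffs = {"a": 26.0, "b": 26.0, "c": 15.0}
print(sketch_rank("a", season, playoffs))  # weighted gap is about 0.7, so '1/2'
print(sketch_rank("c", season, playoffs))  # not in the top two, so 'other'
```

With a larger gap (for example 30.0 vs 20.0 in both mappings), the same function returns '1' for the leader and '2' for the runner-up.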

## What is this not taking into account?
- PER isn't very good at evaluating defense and it might give too much value to rebounding.
- By simply looking at the best PER we are also not taking into account dominance - how much better a player is than the rest.

## All players are included, but only post-1980 players can be compared fairly
- How many rules do you need to change before a sport becomes a different sport?
    - basketball before the 3pt line in the NBA looked much different and required a completely different skill-set.
- I decided to make a cutoff in 1980 for multiple reasons:
    - that is when the 3pt shot was added, which really opened up the floor (I understand this didn't happen immediately).
    - PER uses blocks, steals, and turnovers: blocks and steals were first recorded in the 1973-74 season, and turnovers in the 1977-78 season.
    - the 70s don't really have a player in the GOAT discussion (except Kareem)
- This removes a few players from the discussion: Bill, Wilt, Oscar, Jerry, etc.
- As mentioned, the only player who sits between eras is Kareem:
    - should his entire career count?
    - should only his career post 1980 count?
    - should he not be considered?
    - **I have decided to include his entire career into the mix**

In [2]:
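import json

import pandas as pd
import requests  # needed by the scraping cell
from bs4 import BeautifulSoup  # needed by the scraping cell
from goat_utils import *  # project helpers, e.g. parse_num_mvp, rank_on_team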

BASE_URL = "https://www.basketball-reference.com/"

players_data = {}

# Collecting the Data
- The data for the players in the dictionary has already been collected and is in a JSON file

## Scrape the data
- Basketball Reference limits how many requests can be made to its pages
- so I suggest commenting out blocks of players and running the cell below for only a few players at a time
- the results persist in players_data between runs
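
A minimal sketch of that batching idea, assuming a pause between requests is enough to stay under the limit. The helper names `next_batch` and `polite_get`, the batch size, and the delay are all hypothetical; check Basketball Reference's current usage policy for the actual limit. In the scraping cell, the `fetch` argument would be `requests.get`.

```python
import time


def next_batch(players_urls, players_data, batch_size=3):
    """Return up to batch_size players that have not been collected yet."""
    pending = [name for name in players_urls if name not in players_data]
    return pending[:batch_size]


def polite_get(fetch, url, delay_seconds=5.0):
    """Call fetch(url), then pause so consecutive requests are spaced out."""
    response = fetch(url)
    time.sleep(delay_seconds)
    return response


# Illustration with placeholder data (no network access needed).
urls = {"player a": "url-a", "player b": "url-b", "player c": "url-c"}
collected = {"player a": {"seasons": []}}
print(next_batch(urls, collected, batch_size=1))  # ['player b']
```

One caveat with this approach: the scraping cell creates a player's entry before its seasons are fetched, so a player whose scrape failed halfway would be skipped by `next_batch` unless that entry is removed first.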

In [95]:
players_urls = {
    "michael jordan": "https://www.basketball-reference.com/players/j/jordami01.html",
    "lebron james": "https://www.basketball-reference.com/players/j/jamesle01.html",
    "magic johnson": "https://www.basketball-reference.com/players/j/johnsma02.html",
    "shaquille o'neal": "https://www.basketball-reference.com/players/o/onealsh01.html",
    "tim duncan": "https://www.basketball-reference.com/players/d/duncati01.html",
    "kobe bryant": "https://www.basketball-reference.com/players/b/bryanko01.html",
    "larry bird": "https://www.basketball-reference.com/players/b/birdla01.html",
    "stephen curry": "https://www.basketball-reference.com/players/c/curryst01.html",
    "hakeem olajuwon": "https://www.basketball-reference.com/players/o/olajuha01.html",
    "moses malone": "https://www.basketball-reference.com/players/m/malonmo01.html",
    "isiah thomas": "https://www.basketball-reference.com/players/t/thomais01.html",
    "kareem abdul-jabbar": "https://www.basketball-reference.com/players/a/abdulka01.html",
    "dirk nowitzki": "https://www.basketball-reference.com/players/n/nowitdi01.html",
    "dwyane wade": "https://www.basketball-reference.com/players/w/wadedw01.html",
    "nikola jokic": "https://www.basketball-reference.com/players/j/jokicni01.html",
    "kevin durant": "https://www.basketball-reference.com/players/d/duranke01.html",
    "giannis antetokounmpo": "https://www.basketball-reference.com/players/a/antetgi01.html",
    ############## PRE-1980 PLAYERS ############
    "bill russell": "https://www.basketball-reference.com/players/r/russebi01.html",
    "wilt chamberlain": "https://www.basketball-reference.com/players/c/chambwi01.html",
    "jerry west": "https://www.basketball-reference.com/players/w/westje01.html",
    "oscar robertson": "https://www.basketball-reference.com/players/r/roberos01.html",
    "john havlicek": "https://www.basketball-reference.com/players/h/havlijo01.html",
    "sam jones": "https://www.basketball-reference.com/players/j/jonessa01.html",
}

for player, player_url in players_urls.items():

    print(f"processing player: {player}")

    # fetch the player's main page
    response = requests.get(player_url)
    if response.status_code != 200:
        raise Exception(f"Got status {response.status_code} instead of 200 for {player_url}")
    player_page_soup = BeautifulSoup(response.content, "html.parser")

    # get top-N mvp results
    mvp_results = parse_num_mvp(player_page_soup, 3)

    # get all links to teams by year
    team_urls = extract_team_urls(player_page_soup, BASE_URL)

    players_data[player] = {}
    players_data[player]['url'] = players_urls[player]
    players_data[player]['mvp_results'] = mvp_results
    
    players_data[player]['seasons'] = []

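    for team_url in team_urls:

        # team URLs end in "<year>.html", so these four characters are the season year
        season = int(team_url[-9:-5])

        print(f"...processing season: {season}")

        # fetch the team's page for that season
        response = requests.get(team_url)
        if response.status_code != 200:
            raise Exception(f"Got status {response.status_code} instead of 200 for {team_url}")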
        team_year_soup = BeautifulSoup(response.content, "html.parser")

        team_result = parse_team_playoff_result(team_year_soup)
        top_team_year_per = get_top_regular_season_per_players(team_year_soup, top_n=3)
        top_team_playoffs_per = get_top_playoff_per_players(team_year_soup, top_n=3)

        players_data[player]['seasons'].append({
            "year": season,
            "team_result": team_result,
            "top_season_per": top_team_year_per,
            "top_playoff_per": top_team_playoffs_per,
        })

print(f"Currently we have data for {len(players_data.keys())} players.")

processing player: isiah thomas
...processing season: 1982
...processing season: 1983
...processing season: 1984
...processing season: 1985
...processing season: 1986
...processing season: 1987
...processing season: 1988
...processing season: 1989
...processing season: 1990
...processing season: 1991
...processing season: 1992
...processing season: 1993
...processing season: 1994
Currently we have data for 23 players.


### Save the collected data

In [96]:
with open('players_data.json', 'w') as file:
    json.dump(players_data, file, indent=4)
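
Because the cell above rewrites players_data.json from whatever is currently in players_data, running it after a kernel restart with only a few scraped players would drop the rest. One hedged way around that is to merge into the existing file first. `save_merged` is a hypothetical helper, not part of the notebook or of `goat_utils`.

```python
import json
import os


def save_merged(new_data, path):
    """Merge new_data into the JSON file at path (new entries win), then save."""
    merged = {}
    if os.path.exists(path):
        with open(path, "r") as file:
            merged = json.load(file)
    merged.update(new_data)
    with open(path, "w") as file:
        json.dump(merged, file, indent=4)
    return merged
```

It would be called as `save_merged(players_data, 'players_data.json')` in place of the plain `json.dump`.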

## Load the collected data from JSON

In [3]:
# Read from JSON file
with open('players_data.json', 'r') as file:
    loaded_dict = json.load(file)

len(loaded_dict.keys()), loaded_dict.keys()

(23,
 dict_keys(['michael jordan', 'lebron james', 'magic johnson', "shaquille o'neal", 'tim duncan', 'kobe bryant', 'larry bird', 'stephen curry', 'hakeem olajuwon', 'moses malone', 'kareem abdul-jabbar', 'dirk nowitzki', 'dwyane wade', 'nikola jokic', 'kevin durant', 'giannis antetokounmpo', 'bill russell', 'wilt chamberlain', 'jerry west', 'oscar robertson', 'john havlicek', 'sam jones', 'isiah thomas']))

# Transforming the Data
- (4 points) Lead a team to a **championship** as the **best** player
- (3 points) Lead a team to a **championship** as the **co-best** player
- (2 points) Lead a team to the **finals** as the **best** player
- (1 point) Lead a team to the **finals** as the **co-best** player
- (1 point) Be the **2nd best** player on a **championship** team
- (1 point) Lead a team to the **conference finals** as the **best** player
- (1 point) Finish in the **top 3 for MVP** voting in a season

In [4]:
SEASON_PER_WEIGHT = 0.35  # weight of the regular-season PER when ranking players on a team
PLAYOFF_PER_WEIGHT = 0.65  # weight of the playoff PER when ranking players on a team
CO_MARGIN = 1.5  # if the best and 2nd-best PER are within this margin, the two players count as co-best

goat_res = []

for player in loaded_dict:

    player_goat_res = {
        "player": player,
        "goat_points": 0,
        "c1": {"count": 0, "goat_points": 0, "years": []},
        "c12": {"count": 0, "goat_points": 0, "years": []},
        "f1": {"count": 0, "goat_points": 0, "years": []},
        "c2": {"count": 0, "goat_points": 0, "years": []},
        "f12": {"count": 0, "goat_points": 0, "years": []},
        "cf1": {"count": 0, "goat_points": 0, "years": []},
        "mvp3": {"count": 0, "goat_points": 0, "years": []},
    }

    URL_ID = '/' + loaded_dict[player]['url'].replace(BASE_URL, '')

    mvp_results = loaded_dict[player]['mvp_results']

    player_goat_res["mvp3"]["count"] = mvp_results[0]
    player_goat_res["mvp3"]["years"] = mvp_results[1]
    
    for season in loaded_dict[player]['seasons']:
        if season['team_result'] == 'other':
            continue

        rank = rank_on_team(
            URL_ID,
            season['top_season_per'],
            season['top_playoff_per'],
            SEASON_PER_WEIGHT,
            PLAYOFF_PER_WEIGHT,
            CO_MARGIN,
        )

        # 'champions' = won the title; 'conference champions' = reached the finals but lost
        if season['team_result'] == 'champions' and rank == '1':
            player_goat_res["c1"]["count"] += 1
            player_goat_res["c1"]["years"].append(season['year'])
        if season['team_result'] == 'champions' and rank == '1/2':
            player_goat_res["c12"]["count"] += 1
            player_goat_res["c12"]["years"].append(season['year'])
        if season['team_result'] == 'champions' and rank == '2':
            player_goat_res["c2"]["count"] += 1
            player_goat_res["c2"]["years"].append(season['year'])
        if season['team_result'] == 'conference champions' and rank == '1':
            player_goat_res["f1"]["count"] += 1
            player_goat_res["f1"]["years"].append(season['year'])
        if season['team_result'] == 'conference champions' and rank == '1/2':
            player_goat_res["f12"]["count"] += 1
            player_goat_res["f12"]["years"].append(season['year'])
        if season['team_result'] == 'conference finals' and rank == '1':
            player_goat_res["cf1"]["count"] += 1
            player_goat_res["cf1"]["years"].append(season['year'])

    # GOAT point calculations
    player_goat_res["c1"]["goat_points"] = player_goat_res["c1"]["count"] * 4
    player_goat_res["c12"]["goat_points"] = player_goat_res["c12"]["count"] * 3
    player_goat_res["c2"]["goat_points"] = player_goat_res["c2"]["count"] * 1
    player_goat_res["f1"]["goat_points"] = player_goat_res["f1"]["count"] * 2
    player_goat_res["f12"]["goat_points"] = player_goat_res["f12"]["count"] * 1
    player_goat_res["cf1"]["goat_points"] = player_goat_res["cf1"]["count"] * 1
    player_goat_res["mvp3"]["goat_points"] = player_goat_res["mvp3"]["count"] * 1
    
    player_goat_res["goat_points"] = (
        player_goat_res["mvp3"]["goat_points"]
        + player_goat_res["c1"]["goat_points"]
        + player_goat_res["c12"]["goat_points"]
        + player_goat_res["c2"]["goat_points"]
        + player_goat_res["f1"]["goat_points"]
        + player_goat_res["f12"]["goat_points"]
        + player_goat_res["cf1"]["goat_points"]
    )
    
    goat_res.append(player_goat_res)

# Analysing the Data
- we might consider different criteria to rank the best players on a team (e.g. VORP)

In [5]:
transformed_goat_res = []

for player_data in goat_res:  # flatten each player's nested criterion dicts into one row
    expanded_player_data = {}
    for key, value in player_data.items():
        if isinstance(value, dict):  # Check if the value is a dictionary
            for sub_key, sub_value in value.items():
                if sub_key == "goat_points":
                    continue
                expanded_key = f"{key}_{sub_key}"
                expanded_player_data[expanded_key] = sub_value
        else:
            expanded_player_data[key] = value
    transformed_goat_res.append(expanded_player_data)

goat_df = pd.DataFrame(transformed_goat_res)

In [6]:
goat_df_post_1980 = goat_df[~goat_df.player.isin([
    "bill russell",
    "wilt chamberlain",
    "jerry west",
    "oscar robertson",
    "john havlicek",
    "sam jones",
])]

goat_df_post_1980.sort_values("goat_points", ascending=False).reset_index(drop=True).head(10)

index,player,goat_points,c1_count,c1_years,c12_count,c12_years,f1_count,f1_years,c2_count,c2_years,f12_count,f12_years,cf1_count,cf1_years,mvp3_count,mvp3_years
0,lebron james,38,3,"[2012, 2013, 2016]",1,[2020],5,"[2007, 2014, 2015, 2017, 2018]",0,[],1,[2011],1,[2009],11,"[2005-06, 2008-09, 2009-10, 2010-11, 2011-12, ..."
1,michael jordan,36,6,"[1991, 1992, 1993, 1996, 1997, 1998]",0,[],0,[],0,[],0,[],2,"[1989, 1990]",10,"[1986-87, 1987-88, 1988-89, 1989-90, 1990-91, ..."
2,magic johnson,30,2,"[1987, 1988]",2,"[1982, 1985]",2,"[1989, 1991]",1,[1980],1,[1984],1,[1986],9,"[1982-83, 1983-84, 1984-85, 1985-86, 1986-87, ..."
3,kareem abdul-jabbar,30,2,"[1971, 1980]",2,"[1982, 1985]",2,"[1974, 1983]",0,[],1,[1984],2,"[1972, 1977]",9,"[1969-70, 1970-71, 1971-72, 1972-73, 1973-74, ..."
4,tim duncan,26,3,"[2003, 2005, 2007]",2,"[1999, 2014]",0,[],0,[],1,[2013],2,"[2008, 2012]",5,"[1998-99, 2000-01, 2001-02, 2002-03, 2003-04]"
5,shaquille o'neal,24,3,"[2000, 2001, 2002]",0,[],2,"[1995, 2004]",1,[2006],0,[],2,"[1996, 1998]",5,"[1994-95, 1999-00, 2000-01, 2001-02, 2004-05]"
6,larry bird,23,2,"[1984, 1986]",1,[1981],2,"[1985, 1987]",0,[],0,[],0,[],8,"[1980-81, 1981-82, 1982-83, 1983-84, 1984-85, ..."
7,stephen curry,19,2,"[2015, 2022]",2,"[2017, 2018]",1,[2016],0,[],0,[],0,[],3,"[2014-15, 2015-16, 2020-21]"
8,kobe bryant,17,1,[2009],1,[2010],1,[2008],3,"[2000, 2001, 2002]",0,[],0,[],5,"[2002-03, 2006-07, 2007-08, 2008-09, 2009-10]"
9,kevin durant,15,0,[],2,"[2017, 2018]",2,"[2012, 2019]",0,[],0,[],1,[2011],4,"[2009-10, 2011-12, 2012-13, 2013-14]"


In [7]:
goat_df.sort_values("goat_points", ascending=False).reset_index(drop=True).head(10)

index,player,goat_points,c1_count,c1_years,c12_count,c12_years,f1_count,f1_years,c2_count,c2_years,f12_count,f12_years,cf1_count,cf1_years,mvp3_count,mvp3_years
0,bill russell,40,4,"[1959, 1960, 1961, 1962]",5,"[1957, 1963, 1964, 1965, 1966]",0,[],0,[],0,[],0,[],9,"[1957-58, 1958-59, 1959-60, 1960-61, 1961-62, ..."
1,lebron james,38,3,"[2012, 2013, 2016]",1,[2020],5,"[2007, 2014, 2015, 2017, 2018]",0,[],1,[2011],1,[2009],11,"[2005-06, 2008-09, 2009-10, 2010-11, 2011-12, ..."
2,michael jordan,36,6,"[1991, 1992, 1993, 1996, 1997, 1998]",0,[],0,[],0,[],0,[],2,"[1989, 1990]",10,"[1986-87, 1987-88, 1988-89, 1989-90, 1990-91, ..."
3,kareem abdul-jabbar,30,2,"[1971, 1980]",2,"[1982, 1985]",2,"[1974, 1983]",0,[],1,[1984],2,"[1972, 1977]",9,"[1969-70, 1970-71, 1971-72, 1972-73, 1973-74, ..."
4,magic johnson,30,2,"[1987, 1988]",2,"[1982, 1985]",2,"[1989, 1991]",1,[1980],1,[1984],1,[1986],9,"[1982-83, 1983-84, 1984-85, 1985-86, 1986-87, ..."
5,tim duncan,26,3,"[2003, 2005, 2007]",2,"[1999, 2014]",0,[],0,[],1,[2013],2,"[2008, 2012]",5,"[1998-99, 2000-01, 2001-02, 2002-03, 2003-04]"
6,shaquille o'neal,24,3,"[2000, 2001, 2002]",0,[],2,"[1995, 2004]",1,[2006],0,[],2,"[1996, 1998]",5,"[1994-95, 1999-00, 2000-01, 2001-02, 2004-05]"
7,larry bird,23,2,"[1984, 1986]",1,[1981],2,"[1985, 1987]",0,[],0,[],0,[],8,"[1980-81, 1981-82, 1982-83, 1983-84, 1984-85, ..."
8,stephen curry,19,2,"[2015, 2022]",2,"[2017, 2018]",1,[2016],0,[],0,[],0,[],3,"[2014-15, 2015-16, 2020-21]"
9,kobe bryant,17,1,[2009],1,[2010],1,[2008],3,"[2000, 2001, 2002]",0,[],0,[],5,"[2002-03, 2006-07, 2007-08, 2008-09, 2009-10]"
