# Capstone Project Data Preparation

* Authors: Jules Mejia
* Instructor name: Hardik
* Date: Sunday 16 July 2023
***

## Overview

The data for this project is sourced from [Basketball Reference](https://www.basketball-reference.com/) and [Hoops Hype](https://hoopshype.com/). Both websites are trusted and reputable sources of NBA statistics and news. The business problem is aimed at players looking to secure a big contract for the 2023-24 season and beyond. Therefore, the data is taken from the latest season, 2022-23, as it best represents the trends in the current NBA landscape.

Basketball Reference also lists player salaries, but not in a form that can easily be related to the players' statistics. The salaries are therefore web scraped from the Hoops Hype page [2022/23 NBA Player Salaries](https://hoopshype.com/salaries/players/2022-2023/).

Below are the tables taken from the [2022-23 season](https://www.basketball-reference.com/leagues/NBA_2023_totals.html) pages. They contain the statistics each player accumulated during the regular season.
* Totals
* Per Game
* Advanced
* Play-by-Play
* Shooting
* Adjusted Shooting

The aim of the data preparation is to create a .csv file that is ready for exploratory data analysis.

In [1]:
# Web scraping for the player salary

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Send a GET request to the webpage and fail early on an HTTP error
url = "https://hoopshype.com/salaries/players/2022-2023/"
response = requests.get(url, timeout=30)
response.raise_for_status()

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

# Find the table containing the player salaries
table = soup.find('table')

# Extract the data from the table
rows = table.find_all('tr')

data = []
for row in rows:
    # Extract the columns from each row
    columns = row.find_all('td')

    # Extract the player name and salary
    if len(columns) >= 3:
        player_name = columns[1].text.strip()
        player_salary = columns[2].text.strip()
        data.append([player_name, player_salary])

# Create a DataFrame from the extracted data
df_sal = pd.DataFrame(data, columns=['Player', 'Salary'])

# Remove the first row as it contains the column names
df_sal = df_sal.iloc[1:]

# Reset the index
df_sal = df_sal.reset_index(drop=True)

# Print the transformed DataFrame
print(df_sal)

                Player       Salary
0        Stephen Curry  $48,070,014
1            John Wall  $47,345,760
2    Russell Westbrook  $47,080,179
3         LeBron James  $44,474,988
4         Kevin Durant  $44,119,845
..                 ...          ...
569          Gabe York      $32,171
570         Ibou Badji      $18,226
571   Tristan Thompson      $16,700
572       RaiQuan Gray       $5,849
573      Jacob Gilyard       $5,849

[574 rows x 2 columns]
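
The `Salary` column in the output above is scraped as text (for example `$48,070,014`), so it would likely need converting to a number before any exploratory analysis. The following is a minimal sketch of one way to do that, assuming every value uses the same `$` and comma formatting. The `salaries` frame is a small stand-in built from three rows shown in the output, not the full scrape.

```python
import pandas as pd

# Stand-in for df_sal, using three rows copied from the printed output
salaries = pd.DataFrame({
    "Player": ["Stephen Curry", "John Wall", "Jacob Gilyard"],
    "Salary": ["$48,070,014", "$47,345,760", "$5,849"],
})

# Strip the dollar sign and thousands separators, then convert to integers
salaries["Salary"] = (
    salaries["Salary"]
    .str.replace(r"[$,]", "", regex=True)
    .astype("int64")
)

print(salaries.dtypes)
```

If some scraped values turn out to be empty or malformed, `pd.to_numeric(..., errors="coerce")` may be a safer choice than `astype`, since it produces missing values rather than raising an error.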


In [2]:
df_sal.to_csv('player_salaries_22-23.csv', index=False)

In [7]:
file_path = 'player_totals_22-23.xlsx'

df_tot = pd.read_excel(file_path)

df_tot.head()

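| | Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | ... | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | Player-additional |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Precious Achiuwa | C | 23 | TOR | 55 | 12 | 1140 | 196 | 404 | ... | 100 | 228 | 328 | 50 | 31 | 30 | 59 | 102 | 508 | achiupr01 |
| 1 | 2 | Steven Adams | C | 29 | MEM | 42 | 42 | 1133 | 157 | 263 | ... | 214 | 271 | 485 | 97 | 36 | 46 | 79 | 98 | 361 | adamsst01 |
| 2 | 3 | Bam Adebayo | C | 25 | MIA | 75 | 75 | 2598 | 602 | 1114 | ... | 184 | 504 | 688 | 240 | 88 | 61 | 187 | 208 | 1529 | adebaba01 |
| 3 | 4 | Ochai Agbaji | SG | 22 | UTA | 59 | 22 | 1209 | 165 | 386 | ... | 43 | 78 | 121 | 67 | 16 | 15 | 41 | 99 | 467 | agbajoc01 |
| 4 | 5 | Santi Aldama | PF | 22 | MEM | 77 | 20 | 1682 | 247 | 525 | ... | 85 | 286 | 371 | 97 | 45 | 48 | 60 | 143 | 696 | aldamsa01 |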


In [11]:
df_tot.columns

Index(['Rk', 'Player', 'Pos', 'Age', 'Tm', 'G', 'GS', 'MP', 'FG', 'FGA', 'FG%',
       '3P', '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT', 'FTA', 'FT%',
       'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS',
       'Player-additional'],
      dtype='object')
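
Because the salaries and the statistics come from different websites, the only shared column is `Player`, and name spellings may not always agree between the two sources. The sketch below shows one possible way to check which statistics rows find no salary match before joining. The two frames are illustrative stand-ins: the names come from the outputs above, but the presence of Steven Adams on the salary side is assumed purely for illustration, and `unmatched_players` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def unmatched_players(stats: pd.DataFrame, salaries: pd.DataFrame) -> list:
    """Return the players in `stats` that have no row in `salaries`."""
    # A left join keeps every statistics row; the indicator column
    # records whether a matching salary row was found
    merged = stats.merge(
        salaries[["Player"]].drop_duplicates(),
        on="Player",
        how="left",
        indicator=True,
    )
    return merged.loc[merged["_merge"] == "left_only", "Player"].tolist()

# Illustrative stand-ins (assumed contents, not the full tables)
stats = pd.DataFrame({"Player": ["Precious Achiuwa", "Steven Adams"]})
salaries = pd.DataFrame({"Player": ["Stephen Curry", "Steven Adams"]})

print(unmatched_players(stats, salaries))
```

Names returned by a check like this could then be inspected by hand to decide whether the mismatch is a spelling difference or a player who genuinely has no salary entry.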

In [8]:
df_pg = pd.read_excel('player_pergame_22-23.xlsx')

df_pg.head()

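| | Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | ... | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | Player-additional |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Precious Achiuwa | C | 23 | TOR | 55 | 12 | 20.7 | 3.6 | 7.3 | ... | 1.8 | 4.1 | 6.0 | 0.9 | 0.6 | 0.5 | 1.1 | 1.9 | 9.2 | achiupr01 |
| 1 | 2 | Steven Adams | C | 29 | MEM | 42 | 42 | 27.0 | 3.7 | 6.3 | ... | 5.1 | 6.5 | 11.5 | 2.3 | 0.9 | 1.1 | 1.9 | 2.3 | 8.6 | adamsst01 |
| 2 | 3 | Bam Adebayo | C | 25 | MIA | 75 | 75 | 34.6 | 8.0 | 14.9 | ... | 2.5 | 6.7 | 9.2 | 3.2 | 1.2 | 0.8 | 2.5 | 2.8 | 20.4 | adebaba01 |
| 3 | 4 | Ochai Agbaji | SG | 22 | UTA | 59 | 22 | 20.5 | 2.8 | 6.5 | ... | 0.7 | 1.3 | 2.1 | 1.1 | 0.3 | 0.3 | 0.7 | 1.7 | 7.9 | agbajoc01 |
| 4 | 5 | Santi Aldama | PF | 22 | MEM | 77 | 20 | 21.8 | 3.2 | 6.8 | ... | 1.1 | 3.7 | 4.8 | 1.3 | 0.6 | 0.6 | 0.8 | 1.9 | 9.0 | aldamsa01 |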
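
The Totals and Per Game tables share the same column names (`MP`, `PTS`, and so on), so combining them into a single file requires telling the two versions apart. One possible approach, sketched below, is to join on the `Player-additional` identifier and add suffixes to the overlapping columns. The `tot` and `pg` frames are small stand-ins built from the first two rows of each output above; the suffix names `_tot` and `_pg` are arbitrary choices, not the notebook's.

```python
import pandas as pd

# Stand-ins for df_tot and df_pg, using values from the displayed rows
tot = pd.DataFrame({
    "Player-additional": ["achiupr01", "adamsst01"],
    "Player": ["Precious Achiuwa", "Steven Adams"],
    "MP": [1140, 1133],
    "PTS": [508, 361],
})
pg = pd.DataFrame({
    "Player-additional": ["achiupr01", "adamsst01"],
    "MP": [20.7, 27.0],
    "PTS": [9.2, 8.6],
})

# Join on the player identifier; overlapping column names get a suffix,
# and validate raises if the key is not unique on either side
combined = tot.merge(
    pg,
    on="Player-additional",
    suffixes=("_tot", "_pg"),
    validate="one_to_one",
)

print(combined.columns.tolist())
```

If a player appears on more than one row in the full tables, the `validate="one_to_one"` check would raise an error, and the join key would probably need to include `Tm` as well.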
