<a href="https://colab.research.google.com/github/hudsonts/my_projects/blob/main/BasketballEnglandScoutingReport.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Basketball England Scouting Report**
###### This notebook is organized in phases that track player performance across age groups to assess team strengths and weaknesses, and to identify deficiencies in Great Britain's rosters at each age group

###### *The approach for this task is broken down into the following steps*

1. Webscrape and clean data across various FIBA webpages to pull the most important statistics about each team
2. Look at the stats of the top performing teams, and the worst performing teams in a given division to highlight discrepancies

# 1. Webscrape and clean data across various FIBA webpages to pull the most important statistics about each team

In [34]:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import warnings

# The requests below use verify=False, which skips TLS certificate verification.
# The following lines only silence the InsecureRequestWarning messages that this triggers.
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
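
The scraping methods of the `StatDisplay` class defined below request URLs that differ only in the part after `#` (`#|tab=roster` versus `#|tab=overview,average_statistics`). A URL fragment is handled by the browser and is not sent to the server as part of an HTTP request, so these URLs should all fetch the same page. The sketch below illustrates this with the standard library only; `team_urls` and `BASE` are hypothetical names used for illustration and are not part of the notebook.

```python
from urllib.parse import urldefrag

# URL pattern copied from the StatDisplay methods; BASE is a hypothetical name
BASE = "https://www.fiba.basketball/europe/{division}/2022/team/{country}"


def team_urls(division, country):
    """Return the roster and statistics URLs in the form the notebook builds them."""
    page = BASE.format(division=division, country=country)
    return {
        "roster": page + "#|tab=roster",
        "stats": page + "#|tab=overview,average_statistics",
    }


urls = team_urls("u18", "Spain")

# urldefrag splits a URL into (url_without_fragment, fragment)
request_targets = {name: urldefrag(url).url for name, url in urls.items()}

# Both entries point at the same page once the fragment is removed
print(request_targets["roster"] == request_targets["stats"])
```

If this holds for the FIBA pages, a single download per team could be parsed for the roster and the statistics alike, instead of the four separate requests that `display_stats` currently triggers.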


In [73]:
class StatDisplay:
    def __init__(self, division, country):
      self.division = division
      self.country = country

    def get_country_info(self):
      return f"{self.country}, {self.division}"

    def get_height(self):
      url = f"https://www.fiba.basketball/europe/{self.division}/2022/team/{self.country}#|tab=roster"
      r = requests.get(url, verify=False)
      soup = bs(r.text, 'html.parser')
      results = soup.find(id="team_profile_roster")
      heights = results.find_all('div', class_='height')

      height_list = []  # Store the extracted heights

      for height in heights[1:13]:
        height_text = height.text.strip()
        height_list.append(height_text[73:])

      return height_list

    def get_birthdate(self):
      url = f"https://www.fiba.basketball/europe/{self.division}/2022/team/{self.country}#|tab=roster"
      r = requests.get(url, verify=False)
      soup = bs(r.text, 'html.parser')
      results = soup.find(id="team_profile_roster")
      birthdates = results.find_all('div', class_='birth')

      birth_list = []  # Store the extracted birthdates

      for birth in birthdates[1:13]:
        birth_text = birth.text.strip()
        birth_list.append(birth_text[:10])

      return birth_list

    def get_position(self):
      url = f"https://www.fiba.basketball/europe/{self.division}/2022/team/{self.country}#|tab=roster"
      r = requests.get(url,verify=False)
      soup = bs(r.text, 'html.parser')
      results = soup.find(id="team_profile_roster")
      positions = results.find_all('div', class_='position')

      position_list = [] # Store the extracted positions

      for position in positions:
        position_text = position.text.strip()
        position_list.append(position_text)

      return position_list

    def display_stats(self):
        url = f"https://www.fiba.basketball/europe/{self.division}/2022/team/{self.country}#|tab=overview,average_statistics"

        r = requests.get(url, verify=False)
        soup = bs(r.text, 'html.parser')
        player_table = soup.find('table', class_='comparative columnhover responsive')

        if player_table is None:
            print("Player table not found. Please check the URL or the website structure.")
            return

        # Create empty lists for stats to be held
        players_list = []
        points_list = []
        games_played_list = []
        minutes_avg_list = []
        fg_pct_list = []
        fg3_pct_list = []
        ft_pct_list = []
        o_rebounds_list = []
        d_rebounds_list = []
        rebounds_list = []
        assists_list = []
        fouls_list = []
        turnovers_list = []
        steals_list = []
        blocks_list = []
        stocks_list = []
        plus_minus_list = []
        player_efficiency_list = []

        # Iterate through HTML table in url
        for player in player_table.find_all('tbody'):
            rows = player.find_all('tr')
            for row in rows:
                cells = row.find_all('td')
                # Skip rows too short to hold every statistic read below (indices up to 19)
                if len(cells) < 20:
                    continue
                player_name = row.find('td', class_='fixed aligned_left')
                games_played = cells[2]
                minutes_avg = cells[3]
                fg_pct = cells[4]
                fg3_pct = cells[6]
                ft_pct = cells[7]

                # Check that the row has enough cells for the highest index read below (18)
                if len(row.find_all('td')) >= 19:
                    o_rebounds = row.find_all('td')[9]
                    d_rebounds = row.find_all('td')[10]
                    assists = row.find_all('td')[12]
                    fouls = row.find_all('td')[13]
                    turnovers = row.find_all('td')[14]
                    steals = row.find_all('td')[15]
                    blocks = row.find_all('td')[16]
                    plus_minus = row.find_all('td')[17]
                    player_efficiency = row.find_all('td')[18]

                    if len(row.find_all('td')) >= 20:
                        points = row.find_all('td')[19]

                        # Ensures we are considering only nonempty strings and cleaning data accordingly
                        if player_name is not None and games_played is not None and minutes_avg is not None and \
                                fg_pct is not None and fg3_pct is not None:
                            player_name = player_name.text.strip()
                            points = points.text.strip()
                            games_played = games_played.text.strip()
                            minutes_avg = minutes_avg.text.strip()
                            o_rebounds = o_rebounds.text.strip()
                            d_rebounds = d_rebounds.text.strip()
                            assists = assists.text.strip()
                            fouls = fouls.text.strip()
                            turnovers = turnovers.text.strip()
                            steals = steals.text.strip()
                            blocks = blocks.text.strip()
                            plus_minus = plus_minus.text.strip()
                            player_efficiency = player_efficiency.text.strip()
                            fg_pct = fg_pct.text.strip()
                            fg3_pct = fg3_pct.text.strip().split('/')[1][3:]
                            ft_pct = ft_pct.text.strip().split('/')[1][3:]
                            if fg_pct[6] == '.':
                                fg_pct = fg_pct.split('/')[1][4:]
                            elif fg_pct[5] == '.':
                                fg_pct = fg_pct.split('/')[1][3:]
                            if player_name and games_played:
                                players_list.append(player_name)
                                points_list.append(points)
                                games_played_list.append(games_played)
                                minutes_avg_list.append(minutes_avg)
                                fg_pct_list.append(fg_pct)
                                fg3_pct_list.append(fg3_pct)
                                ft_pct_list.append(ft_pct)
                                o_rebounds_list.append(o_rebounds)
                                d_rebounds_list.append(d_rebounds)
                                rebounds_list.append(float(o_rebounds) + float(d_rebounds))
                                assists_list.append(assists)
                                fouls_list.append(fouls)
                                turnovers_list.append(turnovers)
                                steals_list.append(steals)
                                blocks_list.append(blocks)
                                stocks_list.append(float(steals) + float(blocks))
                                plus_minus_list.append(plus_minus)
                                player_efficiency_list.append(player_efficiency)

        pd.set_option('display.max_columns', None)
        pd.set_option('display.expand_frame_repr', False)

        df = pd.DataFrame({'PLAYERS': players_list,
                           'Position' : self.get_position(),
                           'Height' : self.get_height(),
                           'PTS': points_list,
                           'GP': games_played_list,
                           'MIN': minutes_avg_list,
                           'FG%': fg_pct_list,
                           '3PT FG%': fg3_pct_list,
                           'FT%': ft_pct_list,
                           'OREB': o_rebounds_list,
                           'DREB': d_rebounds_list,
                           'TOT': rebounds_list,
                           'AST': assists_list,
                           'PF': fouls_list,
                           'TO': turnovers_list,
                           'STL': steals_list,
                           'BLK': blocks_list,
                           'STOCKS': stocks_list,
                           '+/-': plus_minus_list,
                           'EF': player_efficiency_list,
                           'Birthdate' : self.get_birthdate()})
        df.index = range(1, len(df) + 1)

        return df

# Test DataFrame result
t = StatDisplay('u18', 'Spain')
df = t.display_stats()
print(t.get_country_info())
print(df)

Spain, u18
         PLAYERS       Position Height   PTS GP   MIN    FG% 3PT FG%    FT% OREB DREB   TOT  AST   PF   TO  STL  BLK  STOCKS   +/-    EF   Birthdate
1   J. Rodriguez          Guard   6'4"   9.0  7  24.8  32.3%   26.2%  66.7%  0.3  2.9   3.2  4.4  1.1  3.3  0.7  0.0     0.7  14.7   7.0  09/08/2004
2     I. Almansa    Point Guard   6'1"  15.7  7  24.6  61.7%      -%    50%  3.7  7.0  10.7  0.9  1.6  1.6  1.6  1.1     2.7  18.9  22.6  06/09/2004
3      A. Garuba        Forward   6'6"   8.6  7  22.8  37.3%   20.8%    55%  1.9  2.9   4.8  0.9  1.6  2.3  2.0  0.3     2.3  12.3   7.6  25/04/2004
4      E. Pinedo          Guard   6'4"   5.6  7  21.8  43.6%   33.3%    30%  2.7  1.6   4.3  1.0  2.1  0.7  1.3  0.9     2.2  14.0   8.1  18/02/2004
5      R. Villar  Small Forward   6'8"   8.4  7  21.2    50%   36.4%    63%  0.7  2.3   3.0  4.0  1.4  2.1  1.6  0.3     1.9  14.4  11.0  10/01/2004
6     N. Cebrian        Forward   6'4"   5.7  7  19.8  31.6%   30.3%  85.7%  0.1  2.4   2.5  2.
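
Every column in the DataFrame above holds text (`32.3%`, `-%`, `6'4"`, `09/08/2004`), so the values cannot be compared or averaged directly. The following is a minimal sketch of how such strings could be converted, assuming the formats match the printed output. The helper names `parse_percent`, `height_to_cm` and `age_on` are hypothetical, and the reference date in the usage lines is an arbitrary example rather than an actual tournament date.

```python
import re
from datetime import date, datetime


def parse_percent(text):
    """Convert '32.3%' to 32.3; return None for placeholders such as '-%'."""
    cleaned = text.strip().rstrip("%")
    try:
        return float(cleaned)
    except ValueError:
        return None


def height_to_cm(text):
    """Convert a feet-and-inches string such as 6'4" to centimetres."""
    match = re.fullmatch(r"\s*(\d+)'\s*(\d+)\"?\s*", text)
    if match is None:
        return None
    feet, inches = int(match.group(1)), int(match.group(2))
    return round((feet * 12 + inches) * 2.54, 1)


def age_on(birthdate_text, reference):
    """Age in whole years on `reference` for a DD/MM/YYYY birthdate string."""
    born = datetime.strptime(birthdate_text, "%d/%m/%Y").date()
    had_birthday = (reference.month, reference.day) >= (born.month, born.day)
    return reference.year - born.year - (0 if had_birthday else 1)


# Values taken from the first Spain row above; the date is an arbitrary example
print(parse_percent("32.3%"))                    # 32.3
print(parse_percent("-%"))                       # None
print(height_to_cm("6'4\""))                     # 193.0
print(age_on("09/08/2004", date(2022, 8, 1)))    # 17
```

Returning `None` for unparseable cells keeps missing values, such as a player with no three-point attempts, distinct from a genuine 0%.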

# 2. Look at the stats of the top performing teams compared to the worst performing teams

##### Using the results of the 2022 tournaments, the three top performing teams were Spain, Turkey, and Serbia, while the lowest performing teams were North Macedonia, Great Britain, and Montenegro. The statistical averages of these teams' players are shown below:

In [61]:
teams = [
    StatDisplay('u18', 'Spain'),
    StatDisplay('u18', 'Turkey'),
    StatDisplay('u18', 'Serbia'),
    StatDisplay('u18', 'North-Macedonia'),
    StatDisplay('u18', 'Great-Britain'),
    StatDisplay('u18', 'Montenegro')
]

for team in teams:
    print(team.get_country_info())
    print(team.display_stats())

Spain, u18
         PLAYERS       Position   PTS GP   MIN    FG% 3PT FG%    FT% OREB DREB   TOT  AST   PF   TO  STL  BLK  STOCKS   +/-    EF   Birthdate
1   J. Rodriguez          Guard   9.0  7  24.8  32.3%   26.2%  66.7%  0.3  2.9   3.2  4.4  1.1  3.3  0.7  0.0     0.7  14.7   7.0  09/08/2004
2     I. Almansa    Point Guard  15.7  7  24.6  61.7%      -%    50%  3.7  7.0  10.7  0.9  1.6  1.6  1.6  1.1     2.7  18.9  22.6  06/09/2004
3      A. Garuba        Forward   8.6  7  22.8  37.3%   20.8%    55%  1.9  2.9   4.8  0.9  1.6  2.3  2.0  0.3     2.3  12.3   7.6  25/04/2004
4      E. Pinedo          Guard   5.6  7  21.8  43.6%   33.3%    30%  2.7  1.6   4.3  1.0  2.1  0.7  1.3  0.9     2.2  14.0   8.1  18/02/2004
5      R. Villar  Small Forward   8.4  7  21.2    50%   36.4%    63%  0.7  2.3   3.0  4.0  1.4  2.1  1.6  0.3     1.9  14.4  11.0  10/01/2004
6     N. Cebrian        Forward   5.7  7  19.8  31.6%   30.3%  85.7%  0.1  2.4   2.5  2.4  0.9  2.3  1.1  0.1     1.2  10.4   5.9  02/02/
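
The output above lists per-player averages, so comparing the top and bottom teams still requires a team-level summary. Below is a minimal sketch of one way to get it: an unweighted mean of each column across a team's players. It is written in plain Python so that it runs without the scraper; `to_number` and `team_averages` are hypothetical helpers, and the sample rows are three Spain players copied from the output above with only a subset of columns.

```python
from statistics import mean


def to_number(text):
    """Convert a scraped cell such as '9.0' or '32.3%' to float; None if not numeric."""
    try:
        return float(text.strip().rstrip("%"))
    except ValueError:
        return None


def team_averages(players, columns):
    """Unweighted mean of each column across players, skipping non-numeric cells."""
    averages = {}
    for column in columns:
        values = [to_number(player[column]) for player in players]
        values = [value for value in values if value is not None]
        averages[column] = round(mean(values), 2) if values else None
    return averages


# Three rows copied from the Spain output above (subset of columns)
spain_sample = [
    {"PLAYERS": "J. Rodriguez", "PTS": "9.0", "FG%": "32.3%", "3PT FG%": "26.2%", "AST": "4.4"},
    {"PLAYERS": "I. Almansa", "PTS": "15.7", "FG%": "61.7%", "3PT FG%": "-%", "AST": "0.9"},
    {"PLAYERS": "A. Garuba", "PTS": "8.6", "FG%": "37.3%", "3PT FG%": "20.8%", "AST": "0.9"},
]

averages = team_averages(spain_sample, ["PTS", "FG%", "3PT FG%", "AST"])
print(averages)
```

Because every player counts equally here, a bench player with few minutes moves the result as much as a starter. Weighting by `MIN` or by shot attempts would likely give a fairer team figure, but that is a design choice this sketch does not make.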

### **To better understand our opponents for the upcoming tournaments, we must adjust rosters and make predictions**