## Necessary input: competition, year, data_type

### Input explanation and example

#### Competition: 
Country (e.g. "England"), competition name (e.g. "Premier League"), or three-letter country code (e.g. "eng")

#### Year:
Starting year of the season, as an integer (e.g. for the 2024-2025 season, input 2024)

#### Data Type: 

Standard stats - input "stats"

Goalkeeping - input "keepers"

Advanced goalkeeping - input "keepersadv"

Shooting - input "shooting"

Passing - input "passing"

Pass Types - input "passing_types"

Goal and Shot Creation - input "gca"

Defensive Actions - input "defense"

Possession - input "possession"

Playing Time - input "playingtime"

Miscellaneous Stats - input "misc"
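
The data type codes listed above can be checked before any request is made. The following is a minimal sketch, not part of the original notebook: `DATA_TYPES` and `validate_data_type` are hypothetical names introduced here for illustration, and the codes are copied from the list above.

```python
# Data type codes from the list above, mapped to their display names.
DATA_TYPES = {
    "stats": "Standard stats",
    "keepers": "Goalkeeping",
    "keepersadv": "Advanced goalkeeping",
    "shooting": "Shooting",
    "passing": "Passing",
    "passing_types": "Pass Types",
    "gca": "Goal and Shot Creation",
    "defense": "Defensive Actions",
    "possession": "Possession",
    "playingtime": "Playing Time",
    "misc": "Miscellaneous Stats",
}


def validate_data_type(data_type):
    """Hypothetical helper: return data_type unchanged, or raise ValueError."""
    if data_type not in DATA_TYPES:
        valid = ", ".join(sorted(DATA_TYPES))
        raise ValueError(f"Unknown data_type {data_type!r}; expected one of: {valid}")
    return data_type
```

A check like this would give a more specific error than a generic "Invalid data input" string when the code is mistyped.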

#### Further explanation

For competition, both uppercase and lowercase are accepted.

For competition and data_type, quotation marks are necessary, because both are passed as strings.
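
The internal `check_comp` module is not shown in this notebook, so the following is only a sketch of how a case-insensitive competition lookup might work. The table `_COMPETITIONS`, the function `lookup_competition`, and the values `9` and `"Premier-League"` are assumptions made for illustration, not the actual implementation.

```python
# Hypothetical lookup table; the real check_comp module may differ.
# The competition ID 9 and the slug "Premier-League" are assumed values.
_COMPETITIONS = {
    "england": (9, "Premier-League"),
    "premier league": (9, "Premier-League"),
    "eng": (9, "Premier-League"),
}


def lookup_competition(competition):
    """Return (competition ID, URL slug) for a country, league name, or code."""
    key = competition.strip().lower()  # makes the lookup case-insensitive
    return _COMPETITIONS[key]          # raises KeyError for unknown input


print(lookup_competition("ENG"))             # (9, 'Premier-League')
print(lookup_competition("Premier League"))  # (9, 'Premier-League')
```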

In [94]:
## Create a dataset with all the players
## Create a dataframe with all the possible data for every player
## Do the same for the goalkeepers

## Team Data
## Match Data

## League Standings

## Add other possible competitions

In [54]:
## Importing necessary libraries

import pandas as pd
import numpy as np

import requests
from bs4 import BeautifulSoup

import time

import warnings
warnings.filterwarnings('ignore')

from internal_packages import check_comp

In [56]:
## Fetching seasonal player data

def fetch_seasonal_player_data(url):
    
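    ## Downloading the page with requests and parsing it with BeautifulSoup
    ## Some tables are wrapped in HTML comments, so the comment markers are stripped before parsing
    
    resp = requests.get(url)
    resp.encoding = 'utf-8'
    soup = BeautifulSoup(resp.text.replace('<!--', '').replace('-->', ''), 'html.parser')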

    ## Creating two lists which will be filled with data
    
    headers, rows = [], []
    
    ## Selecting the table body and its rows; the 'data-stat' attributes of the cells in the very first row will be used as headers

    table_data = soup.find_all('tbody')[2]
    table_rows = table_data.find_all('tr')

    ## Getting the headers from the cells of the first row and putting them into the headers list
    
    for cell in table_rows[0].find_all(['th', 'td'], recursive=False):
        headers.append(cell.get('data-stat'))
        
    ## Getting every row value and putting them into the rows list

    for row in table_rows:
        row_values = []
        row_data = row.find_all('td')
        for data_value in row_data:
            ## 'data-append-csv' is an attribute holding the unique code of every player
            ## It is stored before the player name and will be used as 'player_id' later
            if data_value.get('data-append-csv'):
                row_values.append(data_value.get('data-append-csv'))
            row_values.append(data_value.get_text())
        rows.append(row_values)
        
    ## Creating the dataframe

    df = pd.DataFrame(rows, columns = headers).fillna(0) ## Creating the dataframe from the data; missing values become 0
    df = df.loc[df['player'] != 0].reset_index(drop = True) ## FBRef has an unused row approximately every 25 rows, which ends up empty here. This removes it and immediately resets the index to ease further iteration

    ## Minutes above one thousand are displayed with a comma, so ',' is removed from the string to allow conversion into a numerical value later
    
    if 'minutes' in headers:
        df['minutes'] = df['minutes'].str.replace(',', '')
    elif 'gk_minutes' in headers:
        df['gk_minutes'] = df['gk_minutes'].str.replace(',', '')

    ## For the current season, the age column uses the format yy-ddd (the player's age in years and days)
    ## The loop below removes the dash and the days, so the column can be converted into a numerical value
    
    for i, row in df.iterrows():
        try:
            if '-' in row['age']: ## If the age contains '-'
                age_num = row['age'].split('-')[0] ## Splitting the string at '-' and keeping only the part before it (years of age)
                df.at[i, 'age'] = age_num ## Writing the cleaned age back into the row
        except Exception: pass ## Ignoring rows where the transformation is not possible or not necessary

    df.rename(columns={'ranker' : 'player_id'}, inplace=True) ## Renaming the first column from ranker to player_id for easier understanding
    df = df.apply(pd.to_numeric, errors = 'ignore') ## Converting every column that can be parsed into a numerical type; columns that cannot be converted are left unchanged
    
    return df ## Returning the dataframe from the function
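
The two cleaning rules used in the function above (removing the thousands separator from minutes, and keeping only the years from a `yy-ddd` age) can be expressed as small standalone functions, which makes them easy to check without a network request. This is a sketch using plain Python rather than pandas; `clean_minutes` and `age_in_years` are hypothetical names that do not appear in the notebook.

```python
def clean_minutes(value):
    """Drop the thousands separator, e.g. '1,158' becomes 1158."""
    return int(value.replace(",", ""))


def age_in_years(value):
    """Keep only the years from a 'yy-ddd' age; plain ages pass through."""
    return int(value.split("-")[0])


print(clean_minutes("1,158"))  # 1158
print(age_in_years("32-104"))  # 32
print(age_in_years("32"))      # 32
```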

In [73]:
## Creating the url and returning fetched dataframe

def seasonal_player_data(competition, year, data_type):
    try:
        comp = check_comp.check_comp(competition) ## Resolving the competition with the internal check_comp helper
        season = f"{year}-{year + 1}" ## Creating the season string, e.g. 2024-2025
        url = f"https://fbref.com/en/comps/{comp[0]}/{season}/{data_type}/{season}-{comp[1]}-Stats" ## Creating the url
        df = fetch_seasonal_player_data(url) ## Fetching data

        return df ## Returning the dataframe if all inputs are valid
    except Exception:
        return "Invalid data input" ## Returning a string that signals invalid input
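
The URL assembly in the function above can be tried in isolation, without calling `check_comp` or sending a request. The sketch below assumes that `comp[0]` is a competition ID and `comp[1]` a URL slug; the values `9` and `"Premier-League"` are assumed for illustration, and `build_stats_url` is a hypothetical name.

```python
def build_stats_url(comp_id, comp_slug, year, data_type):
    """Assemble the URL the same way seasonal_player_data does."""
    season = f"{year}-{year + 1}"  # year must be an int, e.g. 2024
    return (
        f"https://fbref.com/en/comps/{comp_id}/{season}/"
        f"{data_type}/{season}-{comp_slug}-Stats"
    )


print(build_stats_url(9, "Premier-League", 2024, "keepers"))
# https://fbref.com/en/comps/9/2024-2025/keepers/2024-2025-Premier-League-Stats
```

Passing the year as a string (e.g. "2024") makes `year + 1` raise a `TypeError`, which the original function reports as "Invalid data input".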

In [76]:
seasonal_player_data("eng", 2024, "keepers")

| | player_id | player | nationality | position | team | age | birth_year | gk_games | gk_games_starts | gk_minutes | ... | gk_ties | gk_losses | gk_clean_sheets | gk_clean_sheets_pct | gk_pens_att | gk_pens_allowed | gk_pens_saved | gk_pens_missed | gk_pens_save_pct | matches |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7a2e46a8 | Alisson | br BRA | GK | Liverpool | 32 | 1992 | 13 | 13 | 1158 | ... | 3 | 1 | 5 | 38.5 | 0 | 0 | 0 | 0 | | Matches |
| 1 | 2f965a72 | Alphonse Areola | fr FRA | GK | West Ham | 31 | 1993 | 11 | 10 | 910 | ... | 2 | 6 | 2 | 20.0 | 0 | 0 | 0 | 0 | | Matches |
| 2 | 28d596a0 | Kepa Arrizabalaga | es ESP | GK | Bournemouth | 30 | 1994 | 15 | 15 | 1350 | ... | 3 | 4 | 4 | 26.7 | 1 | 1 | 0 | 0 | 0.0 | Matches |
| 3 | 5e253986 | Brandon Austin | eng ENG | GK | Tottenham | 26 | 1999 | 1 | 1 | 90 | ... | 0 | 1 | 0 | 0.0 | 0 | 0 | 0 | 0 | | Matches |
| 4 | 3a949a25 | Martin Dúbravka | sk SVK | GK | Newcastle Utd | 36 | 1989 | 7 | 7 | 630 | ... | 0 | 1 | 5 | 71.4 | 0 | 0 | 0 | 0 | | Matches |
| 5 | 3bb7b8b4 | Ederson | br BRA | GK | Manchester City | 31 | 1993 | 14 | 14 | 1260 | ... | 2 | 4 | 3 | 21.4 | 2 | 2 | 0 | 0 | 0.0 | Matches |
| 6 | 9328b835 | Łukasz Fabiański | pl POL | GK | West Ham | 39 | 1985 | 13 | 12 | 1070 | ... | 3 | 4 | 2 | 16.7 | 3 | 3 | 0 | 0 | 0.0 | Matches |
| 7 | a92ab7be | Mark Flekken | nl NED | GK | Brentford | 31 | 1993 | 22 | 22 | 1925 | ... | 4 | 10 | 2 | 9.1 | 1 | 1 | 0 | 0 | 0.0 | Matches |
| 8 | c3e39f12 | Fraser Forster | eng ENG | GK | Tottenham | 36 | 1988 | 7 | 7 | 630 | ... | 2 | 4 | 1 | 14.3 | 2 | 2 | 0 | 0 | 0.0 | Matches |
| 9 | e5a76dfe | Dean Henderson | eng ENG | GK | Crystal Palace | 27 | 1997 | 22 | 22 | 1980 | ... | 9 | 7 | 6 | 27.3 | 1 | 0 | 1 | 0 | 100.0 | Matches |
