## Data preprocessing
Before the analysis, we preprocess the raw data down to what the analysis actually needs.

### Importing packages

In [239]:
import requests
import pandas as pd
import numpy as np
from dotenv import load_dotenv
import os
from bs4 import BeautifulSoup
import re

In [64]:
load_dotenv()
api_key = os.environ.get('API_KEY')

### Looking up season ID (seasonId) metadata
The data we need is Champions League season data, so we keep only the ID values that correspond to Champions League seasons.

In [69]:
seasonId_res = requests.get('https://static.api.nexon.co.kr/fifaonline4/latest/seasonid.json')

if seasonId_res.status_code == 200:
    seasonId_parsed_data = seasonId_res.json()
    seasonId_data = pd.DataFrame(seasonId_parsed_data)
    # print(seasonId_data)
elif seasonId_res.status_code == 404:
    print('Not Found.')
else:
    print('An error has occurred.')

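# Keep only the seasons whose className mentions UEFA (the Champions League seasons).
# Note: seasonId_data is only defined when the request above returned 200.
uefa_data = seasonId_data.loc[seasonId_data['className'].str.contains('UEFA')]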
print(uefa_data)

    seasonId                          className  \
16       221  19 UCL (19 UEFA Champions League)   
29       242  20 UCL (20 UEFA Champions League)   
42       260  21 UCL (21 UEFA Champions League)   

                                            seasonImg  
16  https://ssl.nexon.com/s2/game/fo4/obt/external...  
29  https://ssl.nexon.com/s2/game/fo4/obt/external...  
42  https://ssl.nexon.com/s2/game/fo4/obt/external...  


The output shows that the Champions League season IDs are 221, 242, and 260.
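
The same filter can also collect the IDs into a list instead of reading them off the printed table. The following is a minimal sketch on plain Python records rather than a DataFrame; the function name `ucl_season_ids` and the last record (seasonId 999) are made up for illustration, while the other three records are copied from the output above.

```python
# Records shaped like the entries of seasonid.json (seasonImg omitted).
seasons = [
    {'seasonId': 221, 'className': '19 UCL (19 UEFA Champions League)'},
    {'seasonId': 242, 'className': '20 UCL (20 UEFA Champions League)'},
    {'seasonId': 260, 'className': '21 UCL (21 UEFA Champions League)'},
    {'seasonId': 999, 'className': 'Hypothetical non-UCL season'},  # made-up record
]


def ucl_season_ids(records, keyword='UEFA'):
    """Return the seasonId of every record whose className contains keyword."""
    return [r['seasonId'] for r in records if keyword in r['className']]


ucl_ids = ucl_season_ids(seasons)
print(ucl_ids)
```

Holding the IDs in a variable avoids hard-coding 221, 242, and 260 in later steps.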

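### Looking up player unique identifier (spid) metadata
A player unique identifier consists of the 3-digit season ID (seasonid) followed by the 6-digit player ID (pid).
Using the Champions League season IDs found above, we filter further so that only players who appeared in the Champions League remain.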
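
As a small illustration of the 3 + 6 digit layout, the sketch below splits an spid arithmetically instead of via string prefixes. The helper names `split_spid`, `make_spid`, and `is_ucl` are hypothetical and not part of the Nexon API; the sample value 260020801 is one of the identifiers returned by the metadata endpoint.

```python
UCL_SEASON_IDS = {221, 242, 260}  # Champions League season IDs found above


def split_spid(spid: int) -> tuple[int, int]:
    """Split a 9-digit spid into (seasonId, pid)."""
    # The last six digits are the pid, the leading three the seasonId.
    season_id, pid = divmod(spid, 1_000_000)
    return season_id, pid


def make_spid(season_id: int, pid: int) -> int:
    """Rebuild an spid from its seasonId and pid."""
    return season_id * 1_000_000 + pid


def is_ucl(spid: int) -> bool:
    """True if the spid belongs to one of the Champions League seasons."""
    return split_spid(spid)[0] in UCL_SEASON_IDS


print(split_spid(260020801))  # (260, 20801)
```

Note that the pid comes back as an integer, so leading zeros (020801) are not preserved; `make_spid` restores them implicitly.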

In [80]:
spId_res = requests.get('https://static.api.nexon.co.kr/fifaonline4/latest/spid.json')

if spId_res.status_code == 200:
    spId_parsed_data = spId_res.json()
    spId_data = pd.DataFrame(spId_parsed_data)
    # print(spId_data)
elif spId_res.status_code == 404:
    print('Not Found.')
else:
    print('An error has occurred.')

# The first three digits of id are the seasonId, so a string prefix match selects one season.
uefa19_data = spId_data.loc[spId_data['id'].astype(str).str.startswith('221')]
uefa20_data = spId_data.loc[spId_data['id'].astype(str).str.startswith('242')]
uefa21_data = spId_data.loc[spId_data['id'].astype(str).str.startswith('260')]

print(uefa21_data)

             id        name
6819  260002147  M. 스테켈렌뷔르흐
6820  260020801  크리스티아누 호날두
6821  260124375     부라크 일마즈
6822  260135507       페르난지뉴
6823  260138412      제임스 밀너
...         ...         ...
7179  260263413      R. 시미치
7180  260263439   파울루 베르나르두
7181  260263943      하비 세라노
7182  260264022   곤살루 에스테베스
7183  260265459       T. 모턴

[365 rows x 2 columns]


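The analysis only needs Chelsea players, so a team identifier would have been helpful, but unfortunately the metadata does not provide one.
It looks like the Chelsea squad list has to be entered by hand.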

In [79]:
chelsea_player = [
    '안토니오 뤼디거',
    '티아구 실바',
    '티모 베르너',
    '에두아르 멘디',
    'A. 크리스텐센',
    '리스 제임스',
    '로멜루 루카쿠',
    '조르지뉴',
    '하킴 지예시',
    'C. 허드슨-오도이',
    '아스필리쿠에타',
    '루벤 로프터스-칙',
    '카이 하베르츠',
    '트레보 찰로바',
    '메이슨 마운트',
    '마르코스 알론소',
    '벤 칠웰',
    '사울',
    '은골로 캉테',
    '크리스천 풀리식',
    '말랑 사르',
    '로스 바클리',
    '케파',
    'M. 베티넬리'
]


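# Escape each name so that characters such as '.' are matched literally,
# and build each mask from the frame that is being filtered.
chelsea_pattern = '|'.join(re.escape(name) for name in chelsea_player)

uefa19_chelsea_data = uefa19_data.loc[uefa19_data['name'].str.contains(chelsea_pattern)]
uefa20_chelsea_data = uefa20_data.loc[uefa20_data['name'].str.contains(chelsea_pattern)]
uefa21_chelsea_data = uefa21_data.loc[uefa21_data['name'].str.contains(chelsea_pattern)]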

print(uefa21_chelsea_data)

             id        name
6833  260164240      티아구 실바
6862  260184432     아스필리쿠에타
6884  260192505     로멜루 루카쿠
6885  260192638    마르코스 알론소
6909  260199189      로스 바클리
6935  260204246     M. 베티넬리
6943  260205452    안토니오 뤼디거
6944  260205498        조르지뉴
6953  260206585          케파
6963  260208421          사울
6965  260208670      하킴 지예시
6984  260212188      티모 베르너
6998  260213661    A. 크리스텐센
6999  260213666   루벤 로프터스-칙
7004  260215914      은골로 캉테
7040  260227796    크리스천 풀리식
7050  260229984        벤 칠웰
7056  260230918     트레보 찰로바
7069  260233064     메이슨 마운트
7077  260234642     에두아르 멘디
7084  260235454       말랑 사르
7086  260235790     카이 하베르츠
7103  260238074      리스 제임스
7116  260240740  C. 허드슨-오도이
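
Two things are worth keeping in mind when names are matched through a joined pattern: `str.contains` interprets the pattern as a regular expression, and it matches substrings. The sketch below uses only the standard `re` module to show both effects. The helper `build_name_pattern` and the test strings 'CX 허드슨-오도이' and '케파 아리사발라가' are made up for illustration and are not taken from the metadata.

```python
import re


def build_name_pattern(names):
    """Join names into one alternation with regex metacharacters escaped."""
    return '|'.join(re.escape(name) for name in names)


names = ['케파', 'C. 허드슨-오도이']
raw_pattern = '|'.join(names)             # '.' still means "any character"
safe_pattern = build_name_pattern(names)  # '.' is matched literally

# An unescaped '.' lets an unrelated string slip through.
raw_hit = re.search(raw_pattern, 'CX 허드슨-오도이') is not None
safe_hit = re.search(safe_pattern, 'CX 허드슨-오도이') is not None

# Escaping does not stop substring matches: a short name also
# matches any longer name that contains it.
substring_hit = re.search(safe_pattern, '케파 아리사발라가') is not None

# Exact matching avoids both effects.
exact_hit = '케파 아리사발라가' in set(names)

print(raw_hit, safe_hit, substring_hit, exact_hit)
```

With pandas, exact matching corresponds to `Series.isin(names)`; whether substring or exact matching is the better fit depends on how consistently the names are spelled in the metadata.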


### Data crawling
To look up each player's data, it looks like we need to crawl the player detail page on the FIFA Online 4 Data Center site.

https://fifaonline4.nexon.com/DataCenter/PlayerInfo?spid={player unique identifier (spid)}
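
The URL template above can be filled in with the standard library. This is a minimal sketch: the helper names `player_info_url` and `spid_from_url` are hypothetical, and only the base URL and the `spid` query parameter come from the template.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = 'https://fifaonline4.nexon.com/DataCenter/PlayerInfo'


def player_info_url(spid: int) -> str:
    """Build the player detail page URL for one spid."""
    return f'{BASE_URL}?{urlencode({"spid": spid})}'


def spid_from_url(url: str) -> int:
    """Recover the spid from a player detail page URL."""
    query = parse_qs(urlparse(url).query)
    return int(query['spid'][0])


page_url = player_info_url(260020801)
print(page_url)
```

Building the query string with `urlencode` keeps the code correct if more query parameters are ever added; plain string concatenation works just as well for a single numeric parameter.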

In [249]:
url = 'https://fifaonline4.nexon.com/DataCenter/PlayerInfo?spid='

# Common selector prefixes for the three areas of the player detail page.
header = '#middle .datacenter .player_view .content_header '
middle = '#middle .datacenter .player_view .content_middle '
bottom = '#middle .datacenter .player_view .content_bottom '

# Header fields shared by the simple and detail tables: column name -> selector suffix.
header_fields = {
    'name': '.info_name .name',
    'position': '.thumb .position',
    'overall': '.thumb .ovr',
    'birth': '.info_etc .birth',
    'height': '.info_etc .height',
    'weight': '.info_etc .weight',
    'physical': '.info_etc .physical',
    'skill': '.info_etc .skill',
    'foot': '.info_etc .foot',
    'season': '.info_etc .season',
    'team': '.info_team .team .txt',
    'nation': '.info_team .nation .txt',
}
# Fields whose text is surrounded by whitespace on the page.
strip_fields = {'birth', 'skill', 'foot'}


def crawl_players(spids):
    """Crawl each player's detail page and return (simple_df, detail_df)."""
    simple_rows, detail_rows = [], []

    for spid in spids:
        response = requests.get(url + str(spid))

        if response.status_code == 200:
            soup = BeautifulSoup(response.text, 'html.parser')

            # Header information, identical in both tables.
            base = {}
            for col, sel in header_fields.items():
                text = soup.select_one(header + sel).getText()
                base[col] = text.strip() if col in strip_fields else text
            skill_wrap = soup.select_one(header + '.skill_wrap').getText()
            # Raw string so that \s is a regex token, not a string escape.
            base['skill_wrap'] = re.sub(r"\s|특성", "", skill_wrap)

            # Stat name/value pairs: content_middle -> simple, content_bottom -> detail.
            for rows, area in ((simple_rows, middle), (detail_rows, bottom)):
                row = dict(base)
                for s, v in zip(soup.select(area + '.txt'), soup.select(area + '.value')):
                    row[s.getText()] = v.getText()
                rows.append(row)

        elif response.status_code == 404:
            print('Not Found.')
        else:
            print('An error has occurred.')

    return pd.DataFrame(simple_rows), pd.DataFrame(detail_rows)


player_simple_19_df, player_detail_19_df = crawl_players(uefa19_chelsea_data['id'])
player_simple_20_df, player_detail_20_df = crawl_players(uefa20_chelsea_data['id'])
player_simple_21_df, player_detail_21_df = crawl_players(uefa21_chelsea_data['id'])

with pd.ExcelWriter('player_data.xlsx') as writer:
    player_simple_19_df.to_excel(writer, sheet_name='simple_19', index=False)
    player_detail_19_df.to_excel(writer, sheet_name='detail_19', index=False)
    player_simple_20_df.to_excel(writer, sheet_name='simple_20', index=False)
    player_detail_20_df.to_excel(writer, sheet_name='detail_20', index=False)
    player_simple_21_df.to_excel(writer, sheet_name='simple_21', index=False)
    player_detail_21_df.to_excel(writer, sheet_name='detail_21', index=False)
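
The crawler cleans the trait text (`skill_wrap`) by removing all whitespace and the label 특성 ("trait") with `re.sub`. Writing the pattern as a raw string, `r"\s|특성"`, keeps `\s` from being read as a string escape, which recent Python versions warn about. The sketch below shows what that substitution does; the sample text is hypothetical and only approximates what the page might return, and `split_traits` is a made-up alternative, not part of the original notebook.

```python
import re

# Hypothetical raw text of the .skill_wrap element (layout assumed).
raw_skill_wrap = "특성\n  긴 패스 선호\n  플레이 메이커\n"


def clean_skill_wrap(text: str) -> str:
    """Remove every whitespace character and the label '특성'."""
    return re.sub(r"\s|특성", "", text)


def split_traits(text: str) -> list[str]:
    """Alternative: drop the label but keep one entry per line."""
    lines = text.replace('특성', '').splitlines()
    return [line.strip() for line in lines if line.strip()]


cleaned = clean_skill_wrap(raw_skill_wrap)
traits = split_traits(raw_skill_wrap)
print(cleaned)
print(traits)
```

Because `\s` also removes the spaces inside each trait name, the single-string form runs all traits together; the list form keeps them separable if individual traits are needed later in the analysis.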