# 2018 KBO Mock Rookie Draft

## Formulas
* WAR = RAR / 10
* RAR = RAA + 20 * (PA / 600)
* RAA = wRAA + wSB + UBR + UZR + Position Constant
* wRAA = ((wOBA - league wOBA)/wOBAscale) * PA
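
The four formulas above chain together, so a small sketch may help show the order of evaluation. The function names and the sample inputs (`league_woba`, `woba_scale`, and the example stat line) are illustrative assumptions, not values taken from this notebook.

```python
def wraa(woba, league_woba, woba_scale, pa):
    """Weighted runs above average: ((wOBA - league wOBA) / wOBAscale) * PA."""
    return (woba - league_woba) / woba_scale * pa


def hitter_war(woba, pa, league_woba, woba_scale,
               wsb=0.0, ubr=0.0, uzr=0.0, position_constant=0.0):
    """Chain wRAA -> RAA -> RAR -> WAR using the formulas listed above."""
    raa = wraa(woba, league_woba, woba_scale, pa) + wsb + ubr + uzr + position_constant
    rar = raa + 20 * (pa / 600)
    return rar / 10


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
example = hitter_war(woba=0.400, pa=600, league_woba=0.340, woba_scale=1.2)
print(round(example, 2))
```

With these made-up inputs, wRAA is about 30 runs, the replacement-level term adds 20, and RAR of about 50 becomes roughly 5 WAR.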

Hitters<br/>
**wOBA (weighted On-Base Average) = (0.735 * (NIBB + HBP) + 0.90 * 1B + 1.24 * 2B + 1.56 * 3B + 1.95 * HR) / PA**


Pitchers<br/>
**FIP = ((13 x HR) + (3 x BB) - (2 x K)) / IP + cFIP**


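Hitters and pitchers combined<br/>
The overall ranking across hitters and pitchers is computed from WAR (Wins Above Replacement).

Pitchers: starter = ((expected win percentage above replacement per 9 innings) / 9) * innings<br/>
reliever = (starter formula) * ((1 + gmLI) / 2)<br/>
Hitters: RAR / (R/W)<br/>
https://namu.wiki/w/WAR<br/>
NIBB: non-intentional bases on balls<br/>
IBB: intentional bases on balls<br/>
PA: plate appearances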
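
As a rough illustration of the combined-ranking formulas, the sketch below writes each one as a small function. The function and parameter names are assumptions made for this example, and the inputs are invented. The notebook does not say how the expected win percentage above replacement is obtained, so it is simply passed in.

```python
def starter_war(win_pct_above_repl_per9, innings):
    """Starter: ((expected win% above replacement per 9 innings) / 9) * innings."""
    return win_pct_above_repl_per9 / 9 * innings


def reliever_war(win_pct_above_repl_per9, innings, gm_li):
    """Reliever: the starter formula scaled by (1 + gmLI) / 2."""
    return starter_war(win_pct_above_repl_per9, innings) * ((1 + gm_li) / 2)


def hitter_war_from_rar(rar, runs_per_win):
    """Hitter: RAR divided by runs per win (R/W)."""
    return rar / runs_per_win


# Invented inputs, for illustration only.
print(starter_war(0.09, 90))
print(reliever_war(0.09, 30, gm_li=1.4))
print(hitter_war_from_rar(50, runs_per_win=10))
```

With `runs_per_win=10`, the hitter formula reduces to the `WAR = RAR / 10` shortcut listed at the top of this section.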

**References**<br/>
https://en.wikipedia.org/wiki/WOBA<br/>
http://ko.yagongso.wikidok.net/wp-d/59e0e5f411f7fa6a3ea5545a/View



## 1. Hitters

In [2]:
import pandas as pd
import numpy as np

#Import data
hitter = pd.read_csv('2017hitter.csv')
hitter.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 619 entries, 0 to 618
Data columns (total 17 columns):
순위     619 non-null int64
이름     619 non-null object
경기     619 non-null int64
타율     619 non-null float64
타석     619 non-null int64
타수     619 non-null int64
안타     619 non-null int64
2루타    619 non-null int64
3루타    619 non-null int64
홈런     619 non-null int64
타점     619 non-null int64
득점     619 non-null int64
사사구    619 non-null int64
삼진     619 non-null int64
출루율    619 non-null float64
장타율    619 non-null float64
도루     619 non-null int64
dtypes: float64(3), int64(13), object(1)
memory usage: 82.3+ KB


### Data Cleaning for Hitters

In [3]:
# Check for missing values
hitter.isnull().sum()

순위     0
이름     0
경기     0
타율     0
타석     0
타수     0
안타     0
2루타    0
3루타    0
홈런     0
타점     0
득점     0
사사구    0
삼진     0
출루율    0
장타율    0
도루     0
dtype: int64

In [4]:
hitter.head()

Unnamed: 0,순위,이름,경기,타율,타석,타수,안타,2루타,3루타,홈런,타점,득점,사사구,삼진,출루율,장타율,도루
0,1,장재우(광천고),1,1.0,4,4,4,0,0,0,0,2,0,0,1.0,1.0,1
1,2,문정재(광천고),1,0.5,4,2,1,0,1,0,1,1,0,0,0.5,1.5,0
2,3,배지환(경북고),27,0.474,120,95,45,3,5,1,17,31,20,10,0.556,0.642,30
3,4,천현재(부경고),15,0.463,53,41,19,6,3,0,13,9,11,2,0.566,0.756,2
4,5,김민석(부경고),15,0.462,36,26,12,1,0,1,9,7,9,3,0.583,0.615,1


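Hitters who did not reach the qualifying number of plate appearances need to be filtered out.<br/>

2017 schedule:
* First-half weekend league: 6.5 games
* Second-half weekend league: 6.5 games
* Golden Lion Flag (황금사자기)
* Blue Dragon Flag Championship (청룡선수권)
* President's Cup (대통령배)
* Phoenix Flag (봉황대기)
* National Sports Festival (전국체전)

Excluding the National Sports Festival, each tournament is counted as 1.5 games on average, which gives 19 games as the average number of games per team (6.5 + 6.5 + 4 x 1.5).

Qualifying plate appearances: team games x 0.8 x 3 = 19 x 0.8 x 3 = 45.6, rounded up to 46 PA<br/>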
http://www.hsbaseball.kr/board/bbs_cmu_read.php?idxno=972&menu_idxno=11&page=31&search_item=&search_word=&category=
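
The threshold arithmetic can be checked with a few lines. This is only a sketch of the calculation described above; the variable names are made up for the example.

```python
import math

# Two weekend-league halves at 6.5 games each.
weekend_league_games = 6.5 + 6.5

# Four tournaments counted at 1.5 games each
# (the National Sports Festival is excluded).
tournament_games = 4 * 1.5

team_games = weekend_league_games + tournament_games

# Qualifying plate appearances: team games x 0.8 x 3, rounded up.
qualifying_pa = math.ceil(team_games * 0.8 * 3)

print(team_games, qualifying_pa)
```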

In [5]:
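# Keep qualified hitters; .copy() avoids SettingWithCopyWarning when columns are added later
hitter = hitter[hitter['타석'] >= 46].copy()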
hitter.shape

(553, 17)

In [6]:
# Add a singles (단타) column: hits minus doubles, triples, and home runs
hitter['단타'] = hitter['안타'] - (hitter['2루타'] + hitter['3루타'] + hitter['홈런'])
col_names = ['순위','이름','경기','타율','타석','타수','안타','단타','2루타','3루타','홈런','타점','득점','사사구','삼진','출루율','장타율','도루']
hitter = hitter.reindex(columns=col_names)
hitter.head()

Unnamed: 0,순위,이름,경기,타율,타석,타수,안타,단타,2루타,3루타,홈런,타점,득점,사사구,삼진,출루율,장타율,도루
2,3,배지환(경북고),27,0.474,120,95,45,36,3,5,1,17,31,20,10,0.556,0.642,30
3,4,천현재(부경고),15,0.463,53,41,19,10,6,3,0,13,9,11,2,0.566,0.756,2
5,6,안영환(신흥고),16,0.452,46,31,14,10,3,1,0,6,8,12,1,0.605,0.613,2
6,7,홍혁준(충훈고),14,0.447,59,47,21,16,4,1,0,14,12,9,7,0.517,0.574,7
7,8,이상훈(김해고),13,0.438,54,48,21,15,2,4,0,4,9,6,1,0.5,0.646,7


In [7]:
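# Overall hitting metric: wOBA, with 사사구 (walks + hit-by-pitch) standing in for NIBB + HBP
# Note: the denominator here is 타수 (at-bats), whereas the formula in the Formulas section divides by PA (타석)
hitter['wOBA'] = (0.735*hitter['사사구'] + 0.9*hitter['단타'] + 1.24*hitter['2루타'] + 1.56*hitter['3루타'] + 1.95*hitter['홈런']) / hitter['타수']
hitter = hitter.sort_values(by='wOBA', ascending=False)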
hitter.head(10)

Unnamed: 0,순위,이름,경기,타율,타석,타수,안타,단타,2루타,3루타,홈런,타점,득점,사사구,삼진,출루율,장타율,도루,wOBA
16,17,박영완(대구고),14,0.4,59,35,14,7,5,2,0,13,14,20,6,0.596,0.657,2,0.866286
5,6,안영환(신흥고),16,0.452,46,31,14,10,3,1,0,6,8,12,1,0.605,0.613,2,0.745161
3,4,천현재(부경고),15,0.463,53,41,19,10,6,3,0,13,9,11,2,0.566,0.756,2,0.712317
23,24,김민기(덕수고),30,0.393,124,89,35,25,8,1,1,13,27,31,10,0.545,0.539,13,0.659719
25,26,추재현(신일고),22,0.39,99,77,30,18,6,3,3,25,23,22,4,0.525,0.662,4,0.653766
8,9,강백호(서울고),31,0.434,133,106,46,30,13,0,3,34,37,27,10,0.549,0.642,10,0.649198
2,3,배지환(경북고),27,0.474,120,95,45,36,3,5,1,17,31,20,10,0.556,0.642,30,0.637579
63,64,김다운(율곡고),25,0.353,115,85,30,18,8,4,0,14,34,29,7,0.518,0.541,4,0.631471
145,146,박준형(북일고),16,0.311,68,45,14,10,3,1,0,14,10,19,5,0.5,0.422,3,0.627667
29,30,박민석(장충고),19,0.385,86,65,25,20,2,2,1,17,21,20,7,0.529,0.523,13,0.619231
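
The wOBA values in this table are divided by 타수 (at-bats), while the formula in the Formulas section divides by PA. The sketch below recomputes the first row (박영완) both ways to show how much the denominator matters. The helper name `woba` and the dictionary layout are assumptions for this example, and 사사구 is assumed to stand in for NIBB + HBP.

```python
# Linear weights from the wOBA formula, keyed by the notebook's column names.
WEIGHTS = {'사사구': 0.735, '단타': 0.90, '2루타': 1.24, '3루타': 1.56, '홈런': 1.95}


def woba(row, denominator):
    """Weighted sum of on-base events divided by the chosen denominator column."""
    weighted = sum(weight * row[col] for col, weight in WEIGHTS.items())
    return weighted / row[denominator]


# Stat line copied from the first row of the table above.
row = {'타석': 59, '타수': 35, '사사구': 20, '단타': 7, '2루타': 5, '3루타': 2, '홈런': 0}

woba_ab = woba(row, '타수')  # reproduces the value shown in the table
woba_pa = woba(row, '타석')  # follows the formula as written (per PA)

print(round(woba_ab, 6), round(woba_pa, 6))
```

Because walk-heavy hitters have far fewer at-bats than plate appearances, the at-bat denominator inflates their wOBA, so the ranking could change if PA were used instead.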


## 2. Pitchers

In [8]:
pitcher = pd.read_csv('2017pitcher.csv')
pitcher.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 233 entries, 0 to 232
Data columns (total 20 columns):
순위        233 non-null int64
이름(팀명)    233 non-null object
경기        233 non-null int64
승         233 non-null int64
패         233 non-null int64
세이브       233 non-null int64
이닝        233 non-null float64
타자        233 non-null int64
타수        233 non-null int64
피안타       233 non-null int64
피홈        233 non-null int64
희생        233 non-null int64
4구        233 non-null int64
사구        233 non-null int64
삼진        233 non-null int64
실점        233 non-null int64
자책        233 non-null int64
투구        233 non-null int64
S         233 non-null int64
방어율       233 non-null float64
dtypes: float64(2), int64(17), object(1)
memory usage: 36.5+ KB


### Data Cleaning for Pitchers

In [9]:
pitcher.isnull().sum()

순위        0
이름(팀명)    0
경기        0
승         0
패         0
세이브       0
이닝        0
타자        0
타수        0
피안타       0
피홈        0
희생        0
4구        0
사구        0
삼진        0
실점        0
자책        0
투구        0
S         0
방어율       0
dtype: int64

In [10]:
pitcher.head()

Unnamed: 0,순위,이름(팀명),경기,승,패,세이브,이닝,타자,타수,피안타,피홈,희생,4구,사구,삼진,실점,자책,투구,S,방어율
0,1,김정우(동산고),24,5,0,0,28.0,107,95,15,0,2,5,3,29,5,3,422,0,0.965
1,2,김영준(선린인고),18,5,1,0,65.0,258,219,44,0,12,15,8,55,19,7,987,0,0.97
2,3,안승민(원주고),14,0,1,0,17.2,73,63,11,0,1,5,4,8,6,2,282,0,1.02
3,4,남가현(배명고),23,2,1,0,26.0,103,85,16,0,6,9,1,18,5,3,381,0,1.04
4,5,정철원(안산공고),23,9,0,0,85.0,334,290,59,0,7,30,7,82,17,10,1230,0,1.059


Qualifying innings: team games (19) x 0.8 = 15.2 innings<br/>

In [11]:
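# Keep qualified pitchers; .copy() avoids SettingWithCopyWarning when the FIP column is added
pitcher = pitcher[pitcher['이닝'] >= 15.2].copy()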
pitcher.shape

(226, 20)

In [12]:
# FIP = ((13 x HR) + (3 x BB) - (2 x K)) / IP + cFIP, with cFIP = 3.16
pitcher['FIP'] = (13*pitcher['피홈'] + 3*pitcher['4구'] - 2*pitcher['삼진']) / pitcher['이닝'] + 3.16
pitcher = pitcher.sort_values(by='FIP', ascending=True)
pitcher.head()

Unnamed: 0,순위,이름(팀명),경기,승,패,세이브,이닝,타자,타수,피안타,...,희생,4구,사구,삼진,실점,자책,투구,S,방어율,FIP
49,49,곽빈(배명고),23,2,2,0,28.1,112,92,16,...,8,6,6,41,9,7,431,0,2.225,0.88242
28,29,전용주(안산공고),23,3,1,0,40.1,157,135,26,...,4,9,5,58,10,8,610,0,1.786,1.264738
13,14,양창섭(덕수고),30,7,2,0,50.1,198,178,39,...,9,7,2,56,10,8,692,0,1.431,1.343633
65,66,강백호(서울고),31,3,2,0,31.2,143,116,27,...,5,14,4,49,15,9,560,0,2.559,1.365128
117,118,윤강찬(김해고),13,4,4,0,66.1,276,247,64,...,8,6,15,73,36,26,994,0,3.529,1.420212
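
The 이닝 column appears to use baseball notation, where 17.2 means 17 and 2/3 innings (every value shown ends in .0, .1, or .2). If that assumption holds, dividing by the raw value slightly misstates FIP. The sketch below shows one possible conversion, using 안승민's line from `pitcher.head()` (이닝 17.2, 피홈 0, 4구 5, 삼진 8). `innings_to_float` and `fip` are hypothetical helper names, not part of the notebook.

```python
def innings_to_float(ip):
    """Convert baseball notation (17.2 = 17 innings and 2 outs) to true innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # the digit after the point counts outs: 0, 1, or 2
    return whole + outs / 3


def fip(hr, bb, k, innings, c_fip=3.16):
    """FIP as defined in this notebook: (13*HR + 3*BB - 2*K) / IP + cFIP."""
    return (13 * hr + 3 * bb - 2 * k) / innings + c_fip


fip_raw = fip(hr=0, bb=5, k=8, innings=17.2)
fip_converted = fip(hr=0, bb=5, k=8, innings=innings_to_float(17.2))

print(round(fip_raw, 4), round(fip_converted, 4))
```

The difference is small for a single pitcher, but it also affects the qualifying threshold, since 15.2 in baseball notation is 15 and 2/3 innings, not 15.2.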


## 3. Hitters and Pitchers Combined (by WAR)

In [19]:
total = pd.concat([hitter,pitcher], ignore_index=True, sort=False)
total.head()

Unnamed: 0,순위,이름,경기,타율,타석,타수,안타,단타,2루타,3루타,...,피홈,희생,4구,사구,실점,자책,투구,S,방어율,FIP
0,17,박영완(대구고),14,0.4,59.0,35,14.0,7.0,5.0,2.0,...,,,,,,,,,,
1,6,안영환(신흥고),16,0.452,46.0,31,14.0,10.0,3.0,1.0,...,,,,,,,,,,
2,4,천현재(부경고),15,0.463,53.0,41,19.0,10.0,6.0,3.0,...,,,,,,,,,,
3,24,김민기(덕수고),30,0.393,124.0,89,35.0,25.0,8.0,1.0,...,,,,,,,,,,
4,26,추재현(신일고),22,0.39,99.0,77,30.0,18.0,6.0,3.0,...,,,,,,,,,,
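
One thing to watch in the concatenation: the hitter table names its player column `이름`, while the pitcher table uses `이름(팀명)`, so `pd.concat` keeps them as two separate columns. The sketch below uses tiny made-up frames to show one way to align them first. The player names, the values, and the `role` column are hypothetical additions for this example.

```python
import pandas as pd

# Tiny stand-ins for the real hitter and pitcher tables (invented values).
hitter_demo = pd.DataFrame({'이름': ['Hitter A'], '타석': [59], 'wOBA': [0.5]})
pitcher_demo = pd.DataFrame({'이름(팀명)': ['Pitcher B'], '이닝': [28.1], 'FIP': [1.5]})

# Align the name column so both tables share a single player column.
pitcher_demo = pitcher_demo.rename(columns={'이름(팀명)': '이름'})

# Hypothetical helper column to tell the two groups apart after merging.
hitter_demo['role'] = 'hitter'
pitcher_demo['role'] = 'pitcher'

total_demo = pd.concat([hitter_demo, pitcher_demo], ignore_index=True, sort=False)
print(total_demo)
```

Columns that exist in only one table (such as `wOBA` or `FIP`) are filled with NaN for rows from the other table, which matches the NaN entries in the output above.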
