# Phase 4 - Hypothesis Test

In [4]:
#import requests
#from bs4 import BeautifulSoup

import pandas as pd
import numpy as np
import time

import seaborn
from matplotlib import pyplot

from sklearn.linear_model import LogisticRegression

%load_ext sql

%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

%sql duckdb:///:memory:

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [3]:
chess_games_cleaned = pd.read_csv('chess_games_cleaned.csv')
display(chess_games_cleaned)

Unnamed: 0.1,Unnamed: 0,Result,WhiteElo,BlackElo,WhiteRatingDiff,ECO,Opening,TimeControl,Termination,Base (min),Increment (sec),Win_Rate
0,0,1-0,1901,1896,5,D10,slav defense,300+5,Time forfeit,5.0,5,0.490376
1,1,0-1,1641,1627,14,C20,king's pawn opening: 2.b3,300+0,Normal,5.0,0,0.412060
2,2,1-0,1647,1688,-41,B01,scandinavian defense: mieses-kotroc variation,180+0,Time forfeit,3.0,0,0.558550
3,3,0-1,1945,1900,45,B90,"sicilian defense: najdorf, lipnitsky attack",180+0,Time forfeit,3.0,0,0.444695
4,4,0-1,1773,1809,-36,C27,vienna game,180+0,Normal,3.0,0,0.579278
...,...,...,...,...,...,...,...,...,...,...,...,...
601600,601600,0-1,1798,1753,45,B06,modern defense,60+0,Time forfeit,1.0,0,0.480550
601601,601601,0-1,1711,1578,133,B08,pirc defense: classical variation,300+0,Normal,5.0,0,0.457798
601602,601602,1-0,1762,1683,79,C00,st. george defense,300+4,Normal,5.0,4,0.539982
601603,601603,1-0,2023,1742,281,A45,indian game,180+0,Normal,3.0,0,0.450352


# Hypothesis Test 1
#### Elo rating differential will be a more influential factor than opening played in predicting win probability (βelo > βopening)

* Fit a multivariate logistic regression (win/not win) using opening played, Elo rating differential, and their interaction
* Test whether each coefficient has a significant p-value
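
As a minimal sketch of the plan above, the following fits a logistic regression with an Elo-difference term, an opening-group dummy, and their interaction, then computes a two-sided Wald p-value for each coefficient. It runs on synthetic data and uses a hand-written Newton-Raphson fit in NumPy so that standard errors are available. The names `elo_diff`, `is_open`, `fit_logit`, and `wald_p_values`, and the simulated coefficients, are assumptions for illustration, not the notebook's actual model or results. Elo difference is standardized here because comparing the sizes of βelo and βopening only makes sense when the predictors are on comparable scales.

```python
import math
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; return (coefficients, standard errors)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))           # predicted win probability
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information matrix
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))

def wald_p_values(beta, se):
    """Two-sided p-values for H0: coefficient = 0, using the normal approximation."""
    return np.array([math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)])

# Synthetic data (assumption): standardized Elo difference and one opening dummy
rng = np.random.default_rng(0)
n = 5000
elo_diff = rng.normal(0.0, 1.0, n)
is_open = rng.integers(0, 2, n).astype(float)
X = np.column_stack([np.ones(n), elo_diff, is_open, elo_diff * is_open])
true_beta = np.array([0.0, 1.0, 0.1, 0.0])   # arbitrary values used only to simulate outcomes
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta, se = fit_logit(X, y)
p_values = wald_p_values(beta, se)
names = ["intercept", "elo_diff", "is_open", "elo_diff:is_open"]
for name, b, p in zip(names, beta, p_values):
    print(f"{name:>18}: coef={b:+.3f}  p={p:.4f}")
```

A library such as statsmodels reports the same coefficients, standard errors, and p-values directly; the manual fit is used here only to keep the sketch dependency-light.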


#### Common aims in opening play

Whether they are trying to gain the upper hand as White, or to equalize as Black or to create dynamic imbalances, players generally devote a lot of attention in the opening stages to the following strategies:[7]

* Development: One of the main aims of the opening is to mobilize the pieces on useful squares where they will have impact on the game. To this end, knights are usually developed to f3, c3, f6, and c6 (or sometimes e2, d2, e7, or d7), and both players' king and queen pawns are moved so the bishops can be developed (alternatively, the bishops may be fianchettoed with a maneuver such as g3 and Bg2). Rapid mobilization is the key. The queen, and to a lesser extent the rooks, are not usually played to a central position until later in the game, when many minor pieces and pawns are no longer present.[8]
* Control of the center: At the start of the game, it is not clear on which part of the board the pieces will be needed. However, control of the central squares allows pieces to be moved to any part of the board relatively easily, and can also have a cramping effect on the opponent. The classical view is that central control is best effected by placing pawns there, ideally establishing pawns on d4 and e4 (or d5 and e5 for Black). However, the hypermodern school showed that it was not always necessary or even desirable to occupy the center in this way, and that too broad a pawn front could be attacked and destroyed, leaving its architect vulnerable; an impressive-looking pawn center is worth little unless it can be maintained. The hypermoderns instead advocated controlling the center from a distance with pieces, breaking down one's opponent's center, and only taking over the center oneself later in the game. This leads to openings such as Alekhine's Defense – in a line like 1.e4 Nf6 2.e5 Nd5 3.d4 d6 4.c4 Nb6 5.f4 (the Four Pawns Attack) White has a formidable pawn center for the moment, but Black hopes to undermine it later in the game, leaving White's position exposed.[9]
* King safety: The king is somewhat exposed in the middle of the board. Measures must be taken to reduce his vulnerability. It is therefore common for both players either to castle in the opening (simultaneously developing one of the rooks) or to otherwise bring the king to the side of the board via artificial castling.
* Prevention of pawn weakness: Most openings strive to avoid the creation of pawn weaknesses such as isolated, doubled and backward pawns, pawn islands, etc. Some openings sacrifice endgame considerations for a quick attack on the opponent's position. Some unbalanced openings for Black, in particular, make use of this idea, such as the Dutch and the Sicilian. Other openings, such as the Alekhine and the Benoni, invite the opponent to overextend and form pawn weaknesses. Specific openings accept pawn weaknesses in exchange for compensation in the form of dynamic play. (See Pawn structure.)
* Piece coordination: As the players mobilize their pieces, they both seek to ensure that they are working harmoniously towards the control of key squares.[9]
* Create positions in which the player is more comfortable than the opponent: Transposition is one common way of doing this.[10][11]

Apart from these ideas, other strategies used in the middlegame may also be carried out in the opening. These include preparing pawn breaks to create counterplay, creating weaknesses in the opponent's pawn structure, seizing control of key squares, making favorable exchanges of minor pieces (e.g. gaining the bishop pair), or gaining a space advantage, whether in the center or on the flanks.


To test our hypothesis, we group individual openings into broader categories so that the test generalizes beyond single lines. The four groups are 'Open Game', 'Semi-Open Game', 'Semi-Closed Game', and 'Closed Game'.

Open Game: White opens by playing 1.e4 (the pawn in front of the king moves forward two squares), which is the most popular opening move and has many strengths: it immediately stakes a claim in the center and frees two pieces (the queen and the king's bishop) for development and attack. The oldest openings in chess follow 1.e4. Bobby Fischer wrote that 1.e4 was "Best by test." On the negative side, 1.e4 places a pawn on an undefended square and weakens the squares d4 and f4. If Black keeps the symmetry by replying 1...e5, so that both players have advanced the pawn in front of their king, the result is an Open Game (Hooper & Whyld 1992) (Watson 2006:87–90). This allows a lot of piece development that is not hindered by blockage from other pieces.

In [6]:
#CLASSIFICATION FOR EACH OPENING --> Open Game, Semi-Open Game, Semi-Closed Game, Closed Game
import re

#Lowercase name fragments that identify each game type in the "Opening" column
open_game = [
    "portuguese opening", "centre pawn opening", "vienna game", "bishop's opening",
    "danish gambit", "center game", "alapin's opening", "ruy lopez",
    "ponziani opening", "three knights game", "four knights game", "italian game",
    "giuoco piano", "evans gambit", "hungarian defense", "two knights defense",
    "scotch game", "inverted hungarian opening", "konstantinopolsky opening",
    "elephant gambit", "philidor defense", "latvian gambit", "damiano defense",
    "petrov's defense", "greco defense", "napoleon opening", "king's gambit",
    "king's pawn opening", "danvers opening", "bongcloud attack",
]

#Spelling fixed: "nimzowitch" -> "nimzowitsch", "alehkine" -> "alekhine".
#Both "alekhine defense" and "alekhine's defense" are listed because the
#dataset's exact spelling is not confirmed here.
semi_open_game = [
    "corn stalk defense", "st. george defense", "lemming defense", "owen's defense",
    "sicilian defense", "caro-kann defense", "nimzowitsch defense",
    "scandinavian defense", "balogh defense", "pirc defense", "french defense",
    "fred defense", "barnes defense", "alekhine defense", "alekhine's defense",
    "borg opening", "modern defense", "goldsmith defense", "carr defense",
    "adams defense",
]

#"bogo-indian defense" now uses a plain hyphen (was an en dash, which may not match).
#"queen's gambit declined" is omitted: it is covered by "queen's gambit" in closed_game,
#which is applied later and would overwrite it anyway.
semi_closed_game = [
    "polish defense", "benoni defense", "queen's knight defense", "wade defense",
    "englund gambit", "english defense", "keres defense", "dutch defense",
    "indian game", "nimzo-indian defense", "queen's indian defense",
    "bogo-indian defense", "blumenfeld countergambit", "catalan opening",
    "king's indian defense", "benko gambit", "old indian defense",
    "budapest gambit", "modern benoni",
]

#NOTE: "queen's gambit" includes both Queen's Gambit Accepted and Queen's Gambit Declined
closed_game = [
    "queen's pawn", "closed game", "queen's gambit", "slav defense",
    "stonewall attack", "colle system", "richter-veresov attack", "torre attack",
    "symmetrical defense", "chigorin defense", "baltic defense", "marshall defense",
    "blackmar-diemer gambit", "london system",
]

def contains_any(names):
    """Return a boolean Series: True where Opening contains any of the given fragments."""
    pattern = "|".join(re.escape(name) for name in names)
    return chess_games_cleaned["Opening"].str.contains(pattern, na=False)

#Later assignments overwrite earlier ones, so an opening that matches several groups
#keeps the last matching label (Open Game has the highest priority).
#Openings that match no group keep a missing GameType.
chess_games_cleaned.loc[contains_any(semi_closed_game), 'GameType'] = 'Semi Closed Game'
chess_games_cleaned.loc[contains_any(semi_open_game), 'GameType'] = 'Semi Open Game'
chess_games_cleaned.loc[contains_any(closed_game), 'GameType'] = 'Closed Game'
chess_games_cleaned.loc[contains_any(open_game), 'GameType'] = 'Open Game'

display(chess_games_cleaned.head(50))

Unnamed: 0.1,Unnamed: 0,Result,WhiteElo,BlackElo,WhiteRatingDiff,ECO,Opening,TimeControl,Termination,Base (min),Increment (sec),Win_Rate,GameType
0,0,1-0,1901,1896,5,D10,slav defense,300+5,Time forfeit,5.0,5,0.490376,Closed Game
1,1,0-1,1641,1627,14,C20,king's pawn opening: 2.b3,300+0,Normal,5.0,0,0.41206,Open Game
2,2,1-0,1647,1688,-41,B01,scandinavian defense: mieses-kotroc variation,180+0,Time forfeit,3.0,0,0.55855,Semi Open Game
3,3,0-1,1945,1900,45,B90,"sicilian defense: najdorf, lipnitsky attack",180+0,Time forfeit,3.0,0,0.444695,Semi Open Game
4,4,0-1,1773,1809,-36,C27,vienna game,180+0,Normal,3.0,0,0.579278,Open Game
5,5,0-1,1895,1886,9,B10,caro-kann defense: two knights attack,180+0,Time forfeit,3.0,0,0.52122,Semi Open Game
6,6,1-0,2155,2356,-201,D02,queen's pawn game: london system,180+0,Normal,3.0,0,0.499276,Closed Game
7,7,0-1,2010,2111,-101,A45,indian game,300+0,Normal,5.0,0,0.450352,Semi Closed Game
8,8,1-0,1764,1773,-9,B01,scandinavian defense: mieses-kotroc variation,180+0,Time forfeit,3.0,0,0.55855,Semi Open Game
9,9,0-1,1649,1638,11,C57,"italian game: two knights defense, traxler cou...",900+3,Normal,15.0,3,0.408377,Open Game
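
Because an opening name can match more than one group, or none at all, it can help to audit the labels before modeling. The sketch below is a minimal example on a hypothetical four-row frame with shortened, made-up group lists. It counts how many groups each opening matches and lists the rows that match nothing, which would be left with a missing `GameType`. The names `toy`, `groups`, and `n_groups` are illustrative and not part of the notebook.

```python
import re
import pandas as pd

# Hypothetical sample of opening names (lowercase, as in the cleaned data)
toy = pd.DataFrame({"Opening": [
    "slav defense",
    "queen's pawn game: modern defense",
    "english opening",
    "sicilian defense: najdorf",
]})

# Shortened, illustrative group lists (not the full lists used in the notebook)
groups = {
    "Open Game": ["italian game"],
    "Semi Open Game": ["sicilian defense", "modern defense"],
    "Closed Game": ["slav defense", "queen's pawn"],
}

# One boolean column per group: does the opening contain any of its fragments?
matches = pd.DataFrame({
    label: toy["Opening"].str.contains(
        "|".join(re.escape(name) for name in names), na=False
    )
    for label, names in groups.items()
})

toy["n_groups"] = matches.sum(axis=1)   # 0 = unmatched, 2+ = ambiguous
unmatched = toy.loc[toy["n_groups"] == 0, "Opening"].tolist()

print(toy)
print("Unmatched openings:", unmatched)
```

Rows with `n_groups` of 2 or more are the ones whose final label depends on the order in which the groups are assigned.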


In [11]:
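#Encoding results numerically: 1 = White win, 0 = Black win, 2 = draw
#NOTE: despite its name, Result_Binary takes three values; draws (2) must be
#dropped or recoded before fitting a win/not-win logistic regression
chess_games_cleaned.loc[(chess_games_cleaned['Result'] == '1-0'), 'Result_Binary'] = 1
chess_games_cleaned.loc[(chess_games_cleaned['Result'] == '0-1'), 'Result_Binary'] = 0
chess_games_cleaned.loc[(chess_games_cleaned['Result'] == '1/2-1/2'), 'Result_Binary'] = 2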

chess_games_cleaned

Unnamed: 0.1,Unnamed: 0,Result,WhiteElo,BlackElo,WhiteRatingDiff,ECO,Opening,TimeControl,Termination,Base (min),Increment (sec),Win_Rate,GameType,Result_Binary
0,0,1-0,1901,1896,5,D10,slav defense,300+5,Time forfeit,5.0,5,0.490376,Closed Game,1.0
1,1,0-1,1641,1627,14,C20,king's pawn opening: 2.b3,300+0,Normal,5.0,0,0.412060,Open Game,0.0
2,2,1-0,1647,1688,-41,B01,scandinavian defense: mieses-kotroc variation,180+0,Time forfeit,3.0,0,0.558550,Semi Open Game,1.0
3,3,0-1,1945,1900,45,B90,"sicilian defense: najdorf, lipnitsky attack",180+0,Time forfeit,3.0,0,0.444695,Semi Open Game,0.0
4,4,0-1,1773,1809,-36,C27,vienna game,180+0,Normal,3.0,0,0.579278,Open Game,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
601600,601600,0-1,1798,1753,45,B06,modern defense,60+0,Time forfeit,1.0,0,0.480550,Semi Open Game,0.0
601601,601601,0-1,1711,1578,133,B08,pirc defense: classical variation,300+0,Normal,5.0,0,0.457798,Semi Open Game,0.0
601602,601602,1-0,1762,1683,79,C00,st. george defense,300+4,Normal,5.0,4,0.539982,Semi Open Game,1.0
601603,601603,1-0,2023,1742,281,A45,indian game,180+0,Normal,3.0,0,0.450352,Semi Closed Game,1.0
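
Since `Result_Binary` takes the value 2 for draws, a win/not-win model needs one more step before fitting. The following is a minimal sketch of the two usual options on a hypothetical four-row frame. It assumes White's win is the positive class; whether draws count as "not win" or are dropped is a modeling choice the notebook has not fixed. `toy`, `WhiteWin`, and `decisive` are illustrative names.

```python
import pandas as pd

# Hypothetical sample of results in the same notation as the Result column
toy = pd.DataFrame({"Result": ["1-0", "0-1", "1/2-1/2", "1-0"]})

# Option 1: treat draws as "not win" for White (1 = White win, 0 = anything else)
toy["WhiteWin"] = (toy["Result"] == "1-0").astype(int)

# Option 2: keep only decisive games and model White win vs. Black win
decisive = toy[toy["Result"] != "1/2-1/2"].copy()

print(toy["WhiteWin"].tolist())
print(len(decisive))
```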


# Hypothesis Test 2
#### A time constraint less than 10 minutes will amplify the impact of rating differential in predicting win probability by at least 1.5 times compared to games with time constraints greater than or equal to 10 minutes. (βrating diff with constraint < 10 minutes / βrating diff with constraint >= 10 minutes >= 1.5)

* Single-variable regression for Elo difference vs win probability, data = constraint < 10 minutes
* Single-variable regression for Elo difference vs win probability, data = constraint >= 10 minutes
* Compare the two coefficients: the hypothesis is supported if their ratio is at least 1.5
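
The comparison above can be sketched as follows, assuming logistic regression as in Hypothesis Test 1. The data are synthetic: `base_min` stands in for the `Base (min)` column, `elo_diff` for the rating differential, and the simulated slopes are arbitrary values chosen so that the true ratio is 1.5. None of the numbers are results from the chess dataset, and `fit_logit_1d` is a hypothetical helper.

```python
import numpy as np

def fit_logit_1d(x, y, n_iter=25):
    """Newton-Raphson fit of P(y=1) = sigmoid(b0 + b1 * x); returns array [b0, b1]."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta

# Synthetic games (assumption): base time in minutes and Elo difference
rng = np.random.default_rng(1)
n = 40000
base_min = rng.choice([1.0, 3.0, 5.0, 10.0, 15.0], size=n)
elo_diff = rng.normal(0.0, 100.0, n)

# Arbitrary simulated effect: steeper slope in games under 10 minutes
true_slope = np.where(base_min < 10, 0.006, 0.004)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_slope * elo_diff))).astype(float)

# Fit one single-variable model per subset and compare the Elo coefficients
fast = base_min < 10
b_fast = fit_logit_1d(elo_diff[fast], y[fast])[1]
b_slow = fit_logit_1d(elo_diff[~fast], y[~fast])[1]
ratio = b_fast / b_slow

print(f"beta (< 10 min)  = {b_fast:.5f}")
print(f"beta (>= 10 min) = {b_slow:.5f}")
print(f"ratio            = {ratio:.2f}")
```

A point estimate of the ratio is not a test on its own; a fuller analysis would also attach a confidence interval to the ratio, for example by bootstrapping the two subsets.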
