# DAV 5400 Fall 2019 Project 3
## Analyzing Chess Tournament Results 
### Group Members:
- Julian Ruggiero
- Randy Leon
- Omar M. Hussein
<img src="chess.jpg" />

### Introduction

- In this assignment, you’re given a text file (“tournamentinfo.txt”) with chess tournament results where the information
has some structure. Your job is to create a Jupyter Notebook that generates a .CSV file with the following information
for all of the chess players:

- 1] Player’s Name
- 2] Player’s State
- 3] Total Number of Points
- 4] Player’s Pre-Rating
- 5] Average Pre-Tournament Chess Rating of Opponents

**Example : Gary Hua, ON, 6.0, 1794, 1605**

In [31]:
## importing all of the tools we need.
import re
import pandas as pd
import numpy as np
import urllib.request
url = "https://raw.githubusercontent.com/OMS1996/DAV-5400/master/tournamentinfo.txt"
data = urllib.request.urlopen(url).readlines()
type(data)
content = [x.strip().decode("utf-8") for x in data]
content

['-----------------------------------------------------------------------------------------',
 'Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round|',
 'Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  |',
 '-----------------------------------------------------------------------------------------',
 '1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|',
 'ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|',
 'MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W

Here we made sure our code is reproducible by reading the text file directly from the internet. Anyone should be able to run this code, since the data is pulled straight from Omar's GitHub repository and nobody has to download the text file first. The `.decode("utf-8")` call converts each line from bytes, which is what `urlopen` returns, into a regular string.
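As a minimal sketch of the strip-and-decode step, the snippet below applies the same pattern to two hard-coded byte strings that stand in for what `readlines()` returns. The helper name `decode_lines`, the sample bytes, and their leading spaces are assumptions made for illustration and are not part of the notebook.

```python
def decode_lines(raw_lines, encoding="utf-8"):
    """Strip surrounding whitespace from each bytes line and decode it to str."""
    return [line.strip().decode(encoding) for line in raw_lines]

# Stand-in for the list of bytes objects returned by readlines() (illustrative values)
sample_raw = [
    b"    1 | GARY HUA                        |6.0  |W  39|\n",
    b"   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |\n",
]

sample_lines = decode_lines(sample_raw)
print(sample_lines[0])
```

Working on decoded strings is what allows the later cells to apply string regular expressions to the content.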

In the following cells we create several regular expressions to extract the data relevant to our project.

Player’s State

In [32]:
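# Regex for the player's state: a quote followed by two capital letters,
# which is how each state line begins once the list is cast to a string
regex_state = r"'[A-Z]{2}"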
# Compiling the regex
state_reg=re.compile(regex_state)
# Getting all the values and avoiding errors
state = state_reg.findall(str(content))
# Checking to see if it captured all the 64 values or not
print(len(state))
# Viewing
print(state[0:5])
# Removing the extraneous chars
states = [s.replace('\'', '') for s in state]
# Viewing
states[0:5]

64
["'ON", "'MI", "'MI", "'MI", "'MI"]


['ON', 'MI', 'MI', 'MI', 'MI']

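The first step was to obtain the states found in the text file. We used a regular expression that looks for a single quote followed by two capital letters. Because the list is cast to a string first, every state line begins with that quote, while header lines such as 'Pair' and 'Num' are skipped because their second letter is lowercase. We confirmed that all 64 players were captured, removed the quote, and viewed the first five states.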
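One possible alternative, sketched here under the assumption that the lines are processed one at a time instead of as a single string, is to anchor the pattern at the start of each line and capture the two letters directly. The names `state_line`, `sample_lines`, and `sample_states` are illustrative; the sample lines are shortened copies of the output shown earlier.

```python
import re

# Two capital letters at the start of a line, then optional spaces and a pipe
state_line = re.compile(r"^([A-Z]{2})\s*\|")

sample_lines = [
    "Pair | Player Name                     |Total|Round|",
    "1 | GARY HUA                        |6.0  |W  39|",
    "ON | 15445895 / R: 1794   ->1817     |N:2  |W    |",
    "2 | DAKSHESH DARURI                 |6.0  |W  63|",
    "MI | 14598900 / R: 1553   ->1663     |N:2  |B    |",
]

sample_states = []
for line in sample_lines:
    match = state_line.match(line)
    if match:
        # group(1) holds only the two-letter code, so no quote has to be removed
        sample_states.append(match.group(1))

print(sample_states)  # ['ON', 'MI']
```

With a capture group there is no stray quote character to strip afterwards, and the match no longer depends on how Python prints a list.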

Player’s Pre-Rating

In [41]:
# Regex for item 4: Player’s Pre-Rating
regex_Pre = r"R:\s*\d{1,4}"
# Compiling the Values
pre_reg=re.compile(regex_Pre)
# Casting into string to avoid errors and make it into list format
pre =pre_reg.findall(str(content))
print(pre[0:5])
# Removing all the extraneous values 
pres = [p.replace('R', '') for p in pre]
pres = [p.replace(':', '') for p in pres]
pres = [p.replace(' ', '') for p in pres]
# Casting the Values into integers for further future calculations
pres= [int(p) for p in pres]
# Viewing to make sure the values are correct
print(len(pres))
pres[:5]

['R: 1794', 'R: 1553', 'R: 1384', 'R: 1716', 'R: 1655']
64


[1794, 1553, 1384, 1716, 1655]

Each player's pre-rating was found with a regular expression that looks for a capital "R" followed by a colon, optional whitespace, and then one to four digits.

We then removed the R, the colon, and the spaces so that only the digits of the player's pre-rating remained, and cast those digits to integers.

Lastly, we printed the number of values to confirm that all 64 players were captured, and printed the first five values to test our code.
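The three `replace` calls could also be avoided with a capture group around the digits. The following is a small sketch of that idea, not the notebook's method; `pre_rating`, `sample_lines`, and `sample_pres` are illustrative names, and the two lines are copied from the output shown earlier.

```python
import re

# Same pattern as above, with parentheses around the digits to capture them
pre_rating = re.compile(r"R:\s*(\d{1,4})")

sample_lines = [
    "ON | 15445895 / R: 1794   ->1817     |N:2  |W    |",
    "MI | 14598900 / R: 1553   ->1663     |N:2  |B    |",
]

sample_pres = []
for line in sample_lines:
    match = pre_rating.search(line)
    if match:
        # group(1) is just the digits, ready to be cast to int
        sample_pres.append(int(match.group(1)))

print(sample_pres)  # [1794, 1553]
```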

In [34]:
# Option 2: also includes records with a first name, two middle names joined by "-", and a last name.
# Number of rounds
rounds = 7
# Regex to extract result + opponent for one round, e.g. "W  39|"
re_opp = r"([A-Z]{1})\s+(\d*)\|"
# Repeat the round regex once per round to avoid writing one long regex by hand.
regex_rounds = re_opp * rounds
# Create the general regex: pair number, name, total points, then the rounds.
re1 = re.compile(r"([0-9]{1,})\s+\|\s+([^\|]+)\|(\d+\.?\d+)\s+\|" + regex_rounds)
result = []
# Keep only the lines that match the player-row pattern.
for element in content:
    if re1.match(element):
        result.append(re1.findall(element))

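We defined the number of rounds as the constant 7, wrote a regular expression that extracts each round's result and the opponent's pair number, repeated it once per round, and appended it to a general regular expression that also captures the player's pair number, name, and total points. Every line of the file that matches this pattern is added to a list.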
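To make the repeated pattern concrete, here is a sketch that applies an equivalent regular expression to the first player line shown in the output above. The variable names (`ROUNDS`, `round_part`, `player_line`, `pairs`) are illustrative, and stripping the name is an extra step assumed here for readability.

```python
import re

ROUNDS = 7
# One round: a result letter, spaces, an optional opponent number, and a pipe
round_part = r"([A-Z])\s+(\d*)\|"
# Pair number, name, total points, followed by the round pattern repeated 7 times
player_line = re.compile(
    r"(\d+)\s+\|\s+([^|]+)\|(\d+\.?\d+)\s+\|" + round_part * ROUNDS
)

sample = "1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"

match = player_line.match(sample)
player_id, name, points, *round_fields = match.groups()
name = name.strip()  # the name group also captures the padding spaces
# Pair each result with its opponent: [('W', '39'), ('W', '21'), ...]
pairs = list(zip(round_fields[0::2], round_fields[1::2]))

print(player_id, name, points)
print(pairs)
```

Because the opponent group is `(\d*)`, a round without an opponent number yields an empty string, which is the value the notebook later replaces with NaN.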

In [35]:
#convert from a list of lists to a list of tuples.
newList=[element for big_list in result for element in big_list]

In [36]:
# Creating the table with the column names.
df=pd.DataFrame(newList,columns=['Player_ID', 'Name', 'Points','r1_res','r1_op','r2_res',
                                 'r2_op','r3_res','r3_op','r4_res','r4_op','r5_res','r5_op',
                                 'r6_res','r6_op','r7_res','r7_op'])
#Showing the first five values.
df.head()

Unnamed: 0,Player_ID,Name,Points,r1_res,r1_op,r2_res,r2_op,r3_res,r3_op,r4_res,r4_op,r5_res,r5_op,r6_res,r6_op,r7_res,r7_op
0,1,GARY HUA,6.0,W,39,W,21,W,18,W,14,W,7,D,12,D,4
1,2,DAKSHESH DARURI,6.0,W,63,W,58,L,4,W,17,W,16,W,20,W,7
2,3,ADITYA BAJAJ,6.0,L,8,W,61,W,25,W,21,W,11,W,13,W,12
3,4,PATRICK H SCHILLING,5.5,W,23,D,28,W,2,W,26,D,5,W,19,D,1
4,5,HANSHI ZUO,5.5,W,45,W,37,D,12,D,13,D,4,W,14,W,17


Here, after flattening our list of lists into a list of tuples, we were able to create a dataframe with all of our variables.

In [37]:
# Adding in the states to the data frame.
df['state']=states

In [38]:
# Adding in the pre ratings to the data frame.
df['pre_rating']=pres

In [39]:
# showing the first five, sanity check.
df.head()

Unnamed: 0,Player_ID,Name,Points,r1_res,r1_op,r2_res,r2_op,r3_res,r3_op,r4_res,r4_op,r5_res,r5_op,r6_res,r6_op,r7_res,r7_op,state,pre_rating
0,1,GARY HUA,6.0,W,39,W,21,W,18,W,14,W,7,D,12,D,4,ON,1794
1,2,DAKSHESH DARURI,6.0,W,63,W,58,L,4,W,17,W,16,W,20,W,7,MI,1553
2,3,ADITYA BAJAJ,6.0,L,8,W,61,W,25,W,21,W,11,W,13,W,12,MI,1384
3,4,PATRICK H SCHILLING,5.5,W,23,D,28,W,2,W,26,D,5,W,19,D,1,MI,1716
4,5,HANSHI ZUO,5.5,W,45,W,37,D,12,D,13,D,4,W,14,W,17,MI,1655


We then added our previously made lists of states and pre-ratings as columns in our dataframe.

In [13]:
# Replacing empty strings with NaN so missing opponents can be detected as null values later.
df=df.replace(r'', np.nan)

Creating a subset to help facilitate the calculation of the average

In [14]:
# Creating a subset with the player ID, the seven opponent columns, and the pre-rating.
player_opponents_df = df.iloc[:,[0,4,6,8,10,12,14,16,18]]
# Showing the first five.
player_opponents_df.head()

Unnamed: 0,Player_ID,r1_op,r2_op,r3_op,r4_op,r5_op,r6_op,r7_op,pre_rating
0,1,39,21,18,14,7,12,4,1794
1,2,63,58,4,17,16,20,7,1553
2,3,8,61,25,21,11,13,12,1384
3,4,23,28,2,26,5,19,1,1716
4,5,45,37,12,13,4,14,17,1655


In [15]:
# Storing the averages
averages = []
# Names of the seven opponent columns
opp_cols = ['r1_op', 'r2_op', 'r3_op', 'r4_op', 'r5_op', 'r6_op', 'r7_op']
# Looping through every row
for index, row in player_opponents_df.iterrows():
    # Running sum of the opponents' pre-ratings
    temp = 0
    # Count the number of opponents
    count = 0
    for col in opp_cols:
        # Skip rounds that have no opponent
        if pd.notna(row[col]):
            # Opponent IDs are 1-based; column 8 of the subset holds pre_rating
            temp += int(player_opponents_df.iloc[int(row[col]) - 1, 8])
            count += 1
    # Divide the total by the number of opponents and truncate to an integer
    averages.append(int(temp / count))
# Show all the averages
print(averages)

[1605, 1469, 1563, 1573, 1500, 1518, 1372, 1468, 1523, 1554, 1467, 1506, 1497, 1515, 1483, 1385, 1498, 1480, 1426, 1410, 1470, 1300, 1213, 1357, 1363, 1506, 1221, 1522, 1313, 1144, 1259, 1378, 1276, 1375, 1149, 1388, 1384, 1539, 1429, 1390, 1248, 1149, 1106, 1327, 1152, 1357, 1392, 1355, 1285, 1296, 1356, 1494, 1345, 1206, 1406, 1414, 1363, 1391, 1319, 1330, 1327, 1186, 1350, 1263]


Here, we used a for loop to obtain the average pre-tournament rating of each player's opponents. We skipped null values, which mark rounds without an opponent, so that they would not throw off the average.
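The arithmetic behind each average can be sketched on a toy example. The function `opponent_average` and the three ratings below are hypothetical and serve only to show how empty rounds are skipped and how the result is truncated to an integer, as in the loop above.

```python
def opponent_average(opponent_ids, rating_by_id):
    """Average the pre-ratings of the listed opponents, skipping empty slots."""
    ratings = [rating_by_id[int(o)] for o in opponent_ids if o not in ("", None)]
    if not ratings:
        return None  # no opponents, so no average is defined
    # int() truncates the fractional part rather than rounding
    return int(sum(ratings) / len(ratings))

# Hypothetical pre-ratings keyed by player ID (not taken from the tournament file)
toy_ratings = {1: 1500, 2: 1400, 3: 1301}

# A player who faced players 2 and 3 and had one round without an opponent
toy_avg = opponent_average(["2", "3", ""], toy_ratings)
print(toy_avg)  # (1400 + 1301) / 2 = 1350.5, truncated to 1350
```

Returning `None` when a player has no opponents is a design choice of this sketch; it avoids a division by zero in that case.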

In [30]:
#add the averages of the opponents for every Player to the data frame
df['opp_avg']=averages
df.head()

Unnamed: 0,Player_ID,Name,Points,r1_res,r1_op,r2_res,r2_op,r3_res,r3_op,r4_res,r4_op,r5_res,r5_op,r6_res,r6_op,r7_res,r7_op,state,pre_rating,opp_avg
0,1,GARY HUA,6.0,W,39,W,21,W,18,W,14,W,7,D,12,D,4,ON,1794,1605
1,2,DAKSHESH DARURI,6.0,W,63,W,58,L,4,W,17,W,16,W,20,W,7,MI,1553,1469
2,3,ADITYA BAJAJ,6.0,L,8,W,61,W,25,W,21,W,11,W,13,W,12,MI,1384,1563
3,4,PATRICK H SCHILLING,5.5,W,23,D,28,W,2,W,26,D,5,W,19,D,1,MI,1716,1573
4,5,HANSHI ZUO,5.5,W,45,W,37,D,12,D,13,D,4,W,14,W,17,MI,1655,1500


In [40]:
# Writing the data frame to a comma-separated values (CSV) file.
df.to_csv('chessinformation.csv')

We have successfully created a .csv file that contains all of the required information. After running all cells of this notebook, you should find "chessinformation.csv" in the notebook's working directory.
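If only the five fields requested in the introduction are wanted in the file, the output can be restricted to them. The sketch below uses the standard-library `csv` module on a single hand-written record so that it runs on its own. In pandas, the corresponding step would be to select those columns and pass `index=False` to `to_csv`. The names `REQUIRED`, `records`, and `csv_text` are illustrative, and the record reuses the Gary Hua values shown above.

```python
import csv
import io

# The five fields requested in the introduction, in the order of the example line
REQUIRED = ["Name", "state", "Points", "pre_rating", "opp_avg"]

# One record shaped like a row of the data frame (round columns abbreviated)
records = [
    {"Player_ID": "1", "Name": "GARY HUA", "Points": "6.0", "r1_res": "W",
     "r1_op": "39", "state": "ON", "pre_rating": 1794, "opp_avg": 1605},
]

buffer = io.StringIO()
# extrasaction="ignore" drops every key that is not listed in fieldnames
writer = csv.DictWriter(buffer, fieldnames=REQUIRED, extrasaction="ignore",
                        lineterminator="\n")
writer.writeheader()
writer.writerows(records)

csv_text = buffer.getvalue()
print(csv_text)
```

The data row produced here has the same shape as the example line in the introduction: name, state, points, pre-rating, and opponents' average.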