# Benjamin Freund
# Project 3

For this project, we are presented with a text file containing structured results from a chess tournament. Our task is to generate a CSV file containing each player's name, state, total number of points, and pre-rating.

Note: The assignment also asked for the average pre-tournament chess rating of each player's opponents. I was not able to calculate this properly. However, comments and commented-out code throughout this assignment show how I attempted to calculate it. Any comment or code of this nature is marked with two pound signs (##) instead of one (#).

The first step is to import the necessary libraries. The pandas library will be imported to create a data frame of the necessary information, and the re library will be imported to help extract the necessary data.

In [1]:
# Importing the pandas and re libraries
import pandas as pd
import re

The next step is to open the text file in our Python environment and to read the lines into a variable.

In [2]:
# Opening the text file and reading its lines into the variable lines
# The with statement closes the file automatically once the lines have been read
with open('tournamentinfo.txt') as chess_data:
    lines = chess_data.readlines()

Next, we print lines out line by line to display the text file in its proper structure.

In [3]:
# Printing out the text file line by line
for line in lines:
    print(line)

-----------------------------------------------------------------------------------------

 Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 

 Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 

-----------------------------------------------------------------------------------------

    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|

   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |

-----------------------------------------------------------------------------------------

    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|

   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |

-----------------------------------------------------------------------------------------

    3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12

Now we need to extract the requested data. In order to do this, we are going to loop through each line, split it, and index the necessary data. Some notes about how this works follow.

Each player occupies three lines in the file. Taking each line's index modulo 3, the dashed line above the player's name has a remainder of 0, the line with the name has a remainder of 1, and the line below the name has a remainder of 2. The fields on each line are separated by the | character, so we can split each line on | and then index the resulting pieces individually to extract the desired data.
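
As a small illustration of the splitting and indexing described above, the sketch below applies it to two lines copied from the printed output. The variable names `name_line`, `state_line`, `name_fields`, and `state_fields` are only used for this example and do not appear elsewhere in the notebook.

```python
# Two sample lines copied from the printed output above
name_line = "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
state_line = "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

# Splitting on | yields one string per column; strip() removes the padding spaces
name_fields = [field.strip() for field in name_line.split("|")]
state_fields = [field.strip() for field in state_line.split("|")]

print(name_fields[1])   # the player's name
print(name_fields[2])   # the total points
print(state_fields[0])  # the state
print(state_fields[1])  # the USCF ID and ratings, still combined in one string
```

The last field shows why the pre-rating needs an extra step: splitting on | alone leaves the ID, pre-rating, and post-rating together.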

This data will then be put into a list of lists. Each list within the larger list will be a player and all of their information.

The code and comments in the cell below should help clarify the process.

In [4]:
# Creating an empty list data. This will be the larger, outer list
data = []

# Creating an empty list entry. This will be the smaller, inner list where each player's information is stored
entry = []

# Looping through each line
for index, line in enumerate(lines):
    # A remainder of 0 when the index is divided by 3 indicates a dashed separator line
    if index % 3 == 0:
        # The entry built so far is complete, so add it to the larger data list
        # (the very first time, this adds an empty list)
        data.append(entry)
        # Starting a new, empty entry for the next player
        entry = []
    # A remainder of 1 indicates the line with the player's name
    elif index % 3 == 1:
        # From this line, we want to extract the player's name and total points
        # Therefore, we split the line by the | separator, index the second and third values, and strip away the spaces
        ## We also want to extract the opponents, so we can calculate their average pre-rating
        ## Therefore, we split the line by the | separator, index each opponent, only select the opponents' IDs,
        ## and strip away the spaces.
        entry.append(line.split("|")[1].strip())
        entry.append(line.split("|")[2].strip())
        ## entry.append(line.split("|")[3][1:].strip())
        ## entry.append(line.split("|")[4][1:].strip())
        ## entry.append(line.split("|")[5][1:].strip())
        ## entry.append(line.split("|")[6][1:].strip())
        ## entry.append(line.split("|")[7][1:].strip())
        ## entry.append(line.split("|")[8][1:].strip())
        ## entry.append(line.split("|")[9][1:].strip())
    # A remainder of 2 indicates the line below the player's name
    else:
        # From this line, we want to extract the player's state and pre-rating
        # To get the state, we split the line by |, index the first value, and strip away the spaces
        # To get the pre-rating, we use a regex to split the line by |, :, and -.
        # Then, we index the third value, strip away the spaces, and keep the first four characters
        entry.append(line.split("|")[0].strip())
        entry.append(re.split(r'[|:-]', line)[2].strip()[0:4])

# Using this method, the first entry list is an empty list, while the second is the headings
# Therefore, since we only need from the third entry list and on, we get rid of the first two and store the rest
needed_data = data[2:]

# Return needed_data
needed_data

[['GARY HUA', '6.0', 'ON', '1794'],
 ['DAKSHESH DARURI', '6.0', 'MI', '1553'],
 ['ADITYA BAJAJ', '6.0', 'MI', '1384'],
 ['PATRICK H SCHILLING', '5.5', 'MI', '1716'],
 ['HANSHI ZUO', '5.5', 'MI', '1655'],
 ['HANSEN SONG', '5.0', 'OH', '1686'],
 ['GARY DEE SWATHELL', '5.0', 'MI', '1649'],
 ['EZEKIEL HOUGHTON', '5.0', 'MI', '1641'],
 ['STEFANO LEE', '5.0', 'ON', '1411'],
 ['ANVIT RAO', '5.0', 'MI', '1365'],
 ['CAMERON WILLIAM MC LEMAN', '4.5', 'MI', '1712'],
 ['KENNETH J TACK', '4.5', 'MI', '1663'],
 ['TORRANCE HENRY JR', '4.5', 'MI', '1666'],
 ['BRADLEY SHAW', '4.5', 'MI', '1610'],
 ['ZACHARY JAMES HOUGHTON', '4.5', 'MI', '1220'],
 ['MIKE NIKITIN', '4.0', 'MI', '1604'],
 ['RONALD GRZEGORCZYK', '4.0', 'MI', '1629'],
 ['DAVID SUNDEEN', '4.0', 'MI', '1600'],
 ['DIPANKAR ROY', '4.0', 'MI', '1564'],
 ['JASON ZHENG', '4.0', 'MI', '1595'],
 ['DINH DANG BUI', '4.0', 'ON', '1563'],
 ['EUGENE L MCCLURE', '4.0', 'MI', '1555'],
 ['ALAN BUI', '4.0', 'ON', '1363'],
 ['MICHAEL R ALDRICH', '4.0', 'MI', 

Initially, this looks nice. However, on closer inspection, some of the pre-ratings end in a P. These are three-digit ratings, where the four-character slice above also picked up the letter that follows the digits. The following code finds every such entry and removes the P.

In [5]:
# Compiling a regex that matches three digits followed by a P at the start of a string
# This is done once, before the loop, rather than once per player
regex = re.compile(r'\A\d{3}P')

# Looping through each player in the needed_data list
for player in needed_data:
    # Checking whether the pre-rating is three digits followed by a P
    if regex.match(player[3]):
        # Replacing the original pre-rating with the pre-rating without the P
        player[3] = player[3][0:3]
    
# Return needed_data
needed_data

[['GARY HUA', '6.0', 'ON', '1794'],
 ['DAKSHESH DARURI', '6.0', 'MI', '1553'],
 ['ADITYA BAJAJ', '6.0', 'MI', '1384'],
 ['PATRICK H SCHILLING', '5.5', 'MI', '1716'],
 ['HANSHI ZUO', '5.5', 'MI', '1655'],
 ['HANSEN SONG', '5.0', 'OH', '1686'],
 ['GARY DEE SWATHELL', '5.0', 'MI', '1649'],
 ['EZEKIEL HOUGHTON', '5.0', 'MI', '1641'],
 ['STEFANO LEE', '5.0', 'ON', '1411'],
 ['ANVIT RAO', '5.0', 'MI', '1365'],
 ['CAMERON WILLIAM MC LEMAN', '4.5', 'MI', '1712'],
 ['KENNETH J TACK', '4.5', 'MI', '1663'],
 ['TORRANCE HENRY JR', '4.5', 'MI', '1666'],
 ['BRADLEY SHAW', '4.5', 'MI', '1610'],
 ['ZACHARY JAMES HOUGHTON', '4.5', 'MI', '1220'],
 ['MIKE NIKITIN', '4.0', 'MI', '1604'],
 ['RONALD GRZEGORCZYK', '4.0', 'MI', '1629'],
 ['DAVID SUNDEEN', '4.0', 'MI', '1600'],
 ['DIPANKAR ROY', '4.0', 'MI', '1564'],
 ['JASON ZHENG', '4.0', 'MI', '1595'],
 ['DINH DANG BUI', '4.0', 'ON', '1563'],
 ['EUGENE L MCCLURE', '4.0', 'MI', '1555'],
 ['ALAN BUI', '4.0', 'ON', '1363'],
 ['MICHAEL R ALDRICH', '4.0', 'MI', 
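
As an alternative to slicing four characters and then removing the P, the pre-rating could be captured directly with a single regex that takes only the digits following `R:`. This is a minimal sketch, not the method used in this notebook. The helper name `pre_rating` and the second sample line are made up for illustration, and the sketch assumes that a provisional rating appears as digits immediately followed by a P.

```python
import re

# Hypothetical pattern: capture the run of digits that follows "R:", stopping at the first non-digit
PRE_RATING = re.compile(r"R:\s*(\d+)")

def pre_rating(state_line):
    """Return the pre-rating on a state line as an int, or None if no rating is found."""
    match = PRE_RATING.search(state_line)
    return int(match.group(1)) if match else None

# A line copied from the printed output above
sample = "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
# A made-up line in the same layout, with an assumed provisional-style rating
provisional = "   MI | 00000000 / R:  999P9  ->1000P10  |N:4  |W    |B    |W    |B    |W    |B    |W    |"

print(pre_rating(sample))
print(pre_rating(provisional))
```

Because the capture group stops at the first character that is not a digit, three-digit and four-digit ratings are handled the same way and no separate cleanup pass is needed.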

In [6]:
## In order to calculate the average pre-rating of the opponents, the following code would be run.
## This didn't work, however, because when this code was run in conjunction with the commented out code above
## (symbolized by the ##), the P returned in some entries. I'm not sure why this happened.

## Looping through each player in needed_data
## for player in needed_data:
    ## Storing the IDs of each opponent in opponents
    ## opponents = player[2:8]
    ## Creating an empty list called opponent_scores
    ## opponent_scores = []
    ## Looping through each opponent in opponents
    ## for opponent in opponents:
        ## Appending the score of each opponent to the opponent_scores list
        ## Note: We index opponent-1 because the list in the original text file started with index 1, while Python
        ## lists start with index 0.
        ## opponent_scores.append(int(needed_data[int(opponent)-1][10]))
        ## Calculate the average of opponent_scores
        ## average = sum(opponent_scores) / len(opponent_scores)
        ## Print the average
        ## print(average)
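
One possible explanation for the returning P: with the seven opponent lines enabled in In [4], each entry becomes name, total points, seven opponent IDs, state, and pre-rating, so the pre-rating moves from index 3 to index 10. The cleanup in In [5] still checks `player[3]`, which would then hold an opponent ID, so the P would never be removed. For the same reason, the opponents would sit at `player[2:9]` rather than `player[2:8]`.

The sketch below shows one way the average could be calculated. It uses a made-up three-player tournament rather than the real file, and the names `players`, `opponent_ids`, and `average_opponent_rating` are hypothetical. It also assumes that a round without an opponent appears as a letter with no number after it.

```python
# Made-up miniature tournament: pair number -> pre-rating and round fields (as they look after splitting on |)
players = {
    1: {"pre_rating": 1500, "rounds": ["W   2", "D   3", "H    "]},
    2: {"pre_rating": 1400, "rounds": ["L   1", "W   3", "B    "]},
    3: {"pre_rating": 1300, "rounds": ["U    ", "D   1", "L   2"]},
}

def opponent_ids(rounds):
    """Return the pair numbers of the opponents actually played."""
    ids = []
    for field in rounds:
        number = field[1:].strip()  # drop the result letter, keep the pair number
        if number.isdigit():        # assumed: rounds without an opponent have no number
            ids.append(int(number))
    return ids

def average_opponent_rating(pair_number):
    """Return the mean pre-rating of a player's opponents, or None if there were none."""
    ids = opponent_ids(players[pair_number]["rounds"])
    if not ids:
        return None
    return sum(players[i]["pre_rating"] for i in ids) / len(ids)

# The average is computed once per player, after all of that player's opponents are collected
for pair_number in players:
    print(pair_number, average_opponent_rating(pair_number))
```

Keying the players by pair number avoids the off-by-one adjustment between the file's numbering and Python's list indices.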

Now that we have the desired information, we will store this information in a Pandas data frame.

In [7]:
# Creating the data frame tournament_info based off the aforementioned info
tournament_info = pd.DataFrame.from_records(needed_data, columns = ['Name', 'Total Points', 'State', 'Pre-Rating'])

# Displaying tournament_info
tournament_info

     Name                  Total Points  State  Pre-Rating
0    GARY HUA              6.0           ON     1794
1    DAKSHESH DARURI       6.0           MI     1553
2    ADITYA BAJAJ          6.0           MI     1384
3    PATRICK H SCHILLING   5.5           MI     1716
4    HANSHI ZUO            5.5           MI     1655
...  ...                   ...           ...    ...
59   JULIA SHEN            1.5           MI     967
60   JEZZEL FARKAS         1.5           ON     955
61   ASHWIN BALAJI         1.0           MI     1530
62   THOMAS JOSEPH HOSMER  1.0           MI     1175


Now that we have the data frame, we can generate a CSV using the code below.

In [8]:
# Generate a CSV based on the above data frame
tournament_info.to_csv('tournament_info.csv', index = False)

This writes a CSV file, tournament_info.csv, containing each player's name, total points, state, and pre-rating.