### MY470 Computer Programming

### Final Assignment, MT 2022

#### \*\*\* Due 12:00 noon on Monday, January 17, 2022 \*\*\*

---

## Fifteen Years of Women's Tennis

The final assignment asks you to use the computational thinking and programming skills you learned in the course to answer an empirical social science question. You are expected to apply the best practices and theoretical concepts we covered in the course to produce a program that not only returns the correct output but is also legible, modular, and reasonably optimized. The assignment assumes mastery of loops, conditionals, and functions, as well as awareness of issues related to runtime performance.

In honor of Emma Raducanu's historical achievements this year, we will study the results of womens' tennis matches over the period 2007-2021. Your objectives are to parse the data, reconstruct tournament brackets, identify the top players, and implement an algorithm to provide an alternative ranking for the players.

**NOTE: You are only allowed to use fundamental Python data types (lists, tuples, dictionaries, numpy.ndarray, etc.) to complete this assignment.** You are not allowed to use advanced data querying and data analysis packages such as pandas, sqlite, networkx, or similar. We impose this restriction in order to test your grasp of fundamental programming concepts, not your scripting experience with Python libraries you acquired from prior work or other courses. 

#### Hints

Although this assignment is quite streamlined, imagine that the tasks here are part of a larger project. How would you structure your program if in the future you may need to use a different dataset with similar structure, manipulate the data somewhat differently, add additional analyses, or modify the focus of the current analysis?  

Keep different data manipulations in separate functions/methods and group related functions/classes in separate `.py` files. Name your modules in an informative way.

Using an object-oriented approach to solve the problems is entirely optional and you will not obtain bonus points for this. If you are not confident in your programming skills, we recommend developing your solution with functions only.

### Data

You will find the data in the repository [https://github.com/lse-my470/assignment-final-data.git](https://github.com/lse-my470/assignment-final-data.git). Please clone the data repository in the same directory where you clone the repository `assignment-final-yourgithubname`. Keep the name for the data folder `assignment-final-data`. Any time when you refer to the data in your code, please use a relative path such as `'../assignment-final-data/filename.csv'` instead of an absolute path such as `'/Users/myname/Documents/my470/assignment-final-data/filename.csv'`. This way, we will be able to test your submission with our own copy of the data without having to modify your code.

The repository contains fifteen `.csv` files with match results, one file for each year. Each file contains the following variables:

* Tournament – the name of the tournament that the match was part of.
* Start date – the date when the tournament starts.
* End date – the date when the tournament ends.
* Best of – 3 means that first player to win 2 sets wins match (all WTA matches are best of 3 sets).
* Player 1, Player 2 – names of the players in the match.
* Rank 1, Rank 2 – WTA ranks of Player 1 and Player 2 before the start of the tournament. Not all players will have a ranking.
* Set 1-3 – result for each set played where the score is shown as: number of games won by Player 1 - number of games won by Player 2. The player that wins the most games in a set wins that set.
* Comment
  * Completed means match was played.
  * Player retired means that the named player withdrew and the other player won by default.

#### Hints

When writing your code, test it on a small "toy dataset", instead of the entire data. This way, you won't need to wait for minutes/hours just to find out that you have a syntax error!

You should not modify the original data in any way. If your code creates new data files, you should save them in the directory where this file resides. 

You should consider whether and which estimates to save on disk to speed up queries, instead of calcualting them again and again from the raw data. If you decide to do so, please write your code to save any such files with processed data in the directory where this file resides. This way, we can run your code without having to alter it. Think about what is the most efficient way, both in terms of time and space, to save the data.

### 1. Reconstructing the tournaments

Tournaments in tennis are typically in knockout format. In each round there are several pairs of fixtures, the winners of each match advance to the next round and the losers are eliminated. The winners then proceed to the next round and the process continues until two players contest the final. Typically, the rounds in the competition go as follows: \[`First Round`, `Second Round`, ...,\] `Quarterfinals`, `Semifinals`, `Final`.

In some cases, tournaments have a `Round Robin` (also known as all-play-all) group stage, meaning that each player in a group plays against each other player in turn. There are usually two parallel groups with 4 players in each. The top player(s) (i.e. those who won the most matches) in each group advance to a short knockout stage (typically just Semifinals and Final). These tournaments are:

* Sony Ericsson Championships 2007-2015
* Commonwealth Bank Tournament of Champions 2009
* Qatar Airways Tournament of Champions Sofia 2012
* Garanti Koza WTA Tournament of Champions 2013-2014
* BNP Paribas WTA Finals 2016-2018
* WTA Elite Trophy 2015-2019
* WTA Finals 2019, 2021

Very occasionally, tournaments also include a `Third Place` match too.

Your task is to identify the winner in each match and the round in which the match was played. To check your work, please call the procedures you have writen to print the answers to the following questions:

* Who won the final of the 2021 Women's US Open?
* Who played against whom in the 4th Round of the 2018 French Open? 
* In which round was Venus Williams eliminated in the 2011 Australian Open?
* How many finals has Naomi Osaka played in until now?
* How many times have Venus and Serena Williams played against each other and how many of these matches each won?

### 2. Winners win

One simple and naive way to rank players is to count how many matches they won each year. Write a procedure that estimates this. Then print the three top ranked players for the year 2021 and for the period 2007-2021, together with the total number of matches they won. Higher scores are better, so ranking is the reversed order of most matches won.


### 3. Winners don't lose

A more sophisticated ranking algorithm will account for the fact that some players may play fewer games (e.g., due to an injury) and that wins in later stages of a tournament (e.g., in the final and semi-final compared to earlier rounds of the competition) are more important. Write another procedure that estimates a player's rank by adding `r` points for every win and subtracting `1/r` for every loss, where `r = 1` for the lowest elimination round of the tournament, `r = 2` for the next round of the tournament, and so on. In other words, `r` starts at 1 and increases for every next elimination round. This way, winning larger competitions brings more points (they have more elimination rounds), wins in later rounds improve one's rank more, and losses in earlier rounds diminish one's rank more. For round-robin stage matches, assume that `r = 1`. 

Use this measure to print the three top ranked players for the year 2021 and for the period 2007-2021, together with their scores.


### 4. Winners beat other winners (WbW)

Another idea for ranking players is that winning over better players should count more. However, how we measure a good player depends on whether they beat other good players, so we get into a recursive situation. Do not worry, you will not have to write a recursive procedure here as every recursive solution can be rewritten as iteration! This is what we will do:

1. First, count all the players in the given data and assign each a score of 1/n, where n is the number of unique players.

2. Then, repeatedly do the following sequence of steps:
    1. Each player divides its current score equally among all the matches they have lost and passes these shares onto the players they lost to. If the player never lost, then they pass their current score to themselves. If the player lost two times to a specific individual, then they pass two shares to that individual.   
    2. Each player updates their score to be the sum of the shares they receive.    
    3. Rescale the score of each player by multiplying it by 0.85 and adding 0.15/n.
    
3. Repeat the procedure until adjustments are too small to matter. You may need to come up with modifications of the algorithm or the data if the algorithm cannot converge or produces non-sensical results in specific situations.

This algorithm essentially starts with a world in which everyone is equally important and then starts to "pass importance" iteratively to the winners until an equilibrium is reached.

Use this measure, which we will call WbW, to print the three top ranked players for the year 2021 and for the period 2007-2021, together with their scores. 


### 5. Compare your WbW ranking measure

In fact, the data already contain the players' official WTA rank in the variables Rank1 and Rank2, which follows its own [complex procedure](https://en.wikipedia.org/wiki/WTA_Rankings). How well does your WbW ranking correlate with the WTA ranking over time? 

Use the data from 2007 to estimate the players' WbW ranking at the end of this year. This will give you sufficient data to initialize your estimate. 

Then, starting with 2008, update your ranking before the start of each tournament, based only on results from tournaments completed in the previous 52 weeks. Use a scatter plot from `matplotlib` to plot the players' WTA ranking on the x-axis against your WbW ranking calculated before the start of the same tournament on the y-axis. If an individual's WTA ranking changes mid-tournament (rare), take their first listed ranking within that tournament. Each point on the plot should be a player's ranking before the start of each tournament for each tournament they participated in from 2008 until now. So the number of points to plot will be the number of unique individuals that took part in each tournament for all tournaments during that period.

Write a couple of sentences to note what you observe. You will not be marked for your observations but some reflection is important as it could help you identify problems with your code, for example.


In [381]:
import os
import csv
import math
os.getcwd()


'/Users/gamzetekin/Desktop/assignment-final-gatekin'

In [382]:
absolutepath = os.getcwd()+'/assignment-final-data-main'
print(absolutepath)

fileDirectory = os.path.dirname(absolutepath)
print(fileDirectory)
#Path of parent directory
parentDirectory = os.path.dirname(fileDirectory)
print(parentDirectory)
#Navigate to Strings directory
newPath = os.path.join(parentDirectory, 'Strings')   
print(newPath)


/Users/gamzetekin/Desktop/assignment-final-gatekin/assignment-final-data-main
/Users/gamzetekin/Desktop/assignment-final-gatekin
/Users/gamzetekin/Desktop
/Users/gamzetekin/Desktop/Strings


## Import and run your code here

Keep your code in separate `.py` files and then import it in the code cell below. Then call the functions/methods you need to conduct the analysis described above and print the requested outputs. We should be able to run the cell below to calculate again the results and get the requested output, without having to modify your code in any way. 

In [427]:
# Import modules to estimate and show results

#Question 1 identify the winner in each match and the round in which the match was played
# First Read the data 

winners = []
Start_Date = []
End_Date = []
BestOf = []
Player_1 = []
Player_2 = []
Set_1 = []
Set_2 = []
Set_3 = []
Tournament_name = []
Rank_1 = []
Rank_2 = []
Status = []
Year = []
 

years = list(range(2007, 2022))
print(years)
years = [str(i) for i in years]
print(type(years))
print(years[1])

filename = ['/'+ i + '.csv' for i in years]
j = 0

for i in filename:
    print(absolutepath+i)
    with open(absolutepath+i, 'r') as csv_file:
        csv_reader = csv.reader(csv_file)
        next(csv_reader)
        for line in csv_reader:
            Tournament_name.append(line[0])
            Start_Date.append(line[1])
            End_Date.append(line[2])
            BestOf.append(line[3])
            Player_1.append(line[4])
            Player_2.append(line[5])
            Rank_1.append(line[6])
            Rank_2.append(line[7])
            Set_1.append(line[8])
            Set_2.append(line[9])
            Set_3.append(line[10])
            Status.append(line[11])
            Year.append(years[j])
        j = j + 1
            
Data = [Tournament_name,End_Date,Start_Date,BestOf,Player_1,Player_2,Rank_1,Rank_2,Set_1,Set_2,Set_3,Status,Year]

print(type(Data))   
print(len(Player_2))

#for the exceptions where the player retired 
retired_player_names = [i.rsplit(' ', 1)[0] for i in Status]

for i in range(0,len(retired_player_names)):
    if(retired_player_names[i] != 'Completed'):
        if(retired_player_names[i] == Player_1[i]):
            Status[i] = "retired_1"
        elif(retired_player_names[i] == Player_2[i]):
            Status[i] = "retired_2"
set(Status)    



[2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]
<class 'list'>
2008
/Users/gamzetekin/Desktop/assignment-final-gatekin/assignment-final-data-main/2007.csv
/Users/gamzetekin/Desktop/assignment-final-gatekin/assignment-final-data-main/2008.csv
/Users/gamzetekin/Desktop/assignment-final-gatekin/assignment-final-data-main/2009.csv
/Users/gamzetekin/Desktop/assignment-final-gatekin/assignment-final-data-main/2010.csv
/Users/gamzetekin/Desktop/assignment-final-gatekin/assignment-final-data-main/2011.csv
/Users/gamzetekin/Desktop/assignment-final-gatekin/assignment-final-data-main/2012.csv
/Users/gamzetekin/Desktop/assignment-final-gatekin/assignment-final-data-main/2013.csv
/Users/gamzetekin/Desktop/assignment-final-gatekin/assignment-final-data-main/2014.csv
/Users/gamzetekin/Desktop/assignment-final-gatekin/assignment-final-data-main/2015.csv
/Users/gamzetekin/Desktop/assignment-final-gatekin/assignment-final-data-main/2016.csv
/Users/gamzetekin/D

{'Completed', 'retired_1', 'retired_2'}

**Key for Data**

Tournament_name(line[0])  
Start_Date(line[1])  
End_Date(line[2])  
BestOf(line[3])  
Player_1(line[4])  
Player_2(line[5])  
Rank_1(line[6])  
Rank_2(line[7])  
Set_1(line[8])  
Set_2(line[9])  
Set_3(line[10])  
Status(line[11])  
Year(line[12])



In [516]:
#Functions used in answering questions

def FindIndices(arr,item):
    """Looks for an item within a list and returns which index it is from"""
    ind = []
    for i in range(0,len(arr)):
        if(arr[i] == item):
            ind.append(i)
    return ind

def Subset(data,year,tournament):
    """Takes as argument the year and the name of the tournament an iterates over the their indices
    and finds the intersection of tournament name and the year"""
    year_ind = FindIndices(data[12],year)
    tournament_ind = FindIndices(data[0],tournament)
    
    intersect = set.intersection(set(year_ind),set(tournament_ind))
    return list(intersect)


def ReturnRoundIndices(data, year, tournament, rnd):
    """Takes as argument the year, tournament name and round number, and iterates over the indeces of 
    the matches in that round"""
    indices = Subset(data,year,tournament)
    #print(len(indices))
    round_count = math.log(len(indices)+1,2)
    #print(math.log(len(indices)+1,2))
    if math.log(len(indices)+1,2) < rnd:
        return 'FALSE'
    
    lastn = int((2**(round_count-rnd) + 2**(round_count-rnd)) - 1)
 
    indices = indices[lastn*-1:]
    print(indices)
    indices[0:int((2**(round_count-rnd)))]
    
    return indices[0:int((2**(round_count-rnd)))]

def FindWinners(data, indices):
    """Takes as argument the indeces, and
    finds the winner of a single match by comparing 3 sets"""
    player_1 = 0
    player_2 = 0
    
    if data[11] == 'retired_1': #if the first player retires than first one wins autamtically
        return data[5][indices]
    elif data[11] == 'retired_2':#if the second player retires than first one wins autamtically
        return data[4][indices]
    
    set_1 = data[8][indices].strip().split('-') 
    
    if set_1 != ['']:
        [int(x) for x in set_1 ] #convert to integer
        if set_1[0] > set_1[1]:
            player_1 += 1 
        else:
            player_2 += 1
    
    set_2 = data[9][indices].strip().split('-') 
    
    if set_2 != ['']:
        [int(x) for x in set_2]
        if set_2[0] > set_2[1]:
            player_1 += 1 
        else:
            player_2 += 1
    
    set_3 = data[10][indices].strip().split('-')
   
    if set_3 != ['']:
        [int(x) for x in set_3]
        if set_3[0] > set_3[1]:
            player_1 += 1 
        else:
            player_2 += 1
    
    if player_1 > player_2:
        return data[4][indices]
    else:
        return data[5][indices]
    

def PlayerEliminated(data,tournament,player_name,year):
    """Takes as argumenent the tournamet, player_name and year, and returns which round a 
    specified player is eliminated"""    
    indices = Subset(data,year,tournament) #Find the index of a a given match
 
    last_index = -1
    for i in indices:
        if player_name == data[4][i]:
            if last_index < i:
                last_index = i

    for i in indices:
        if player_name == data[5][i]:
             if last_index < i:
                last_index = i 
                
    if last_index == -1: #if the player did not participate in a given tournament
        return -1, -2

    #Find the round of a given match  
    match_count = len(indices)
    round_count = math.ceil(math.log(match_count+1,2))
    

    match_order = indices.index(last_index)
    match_reverse_order = match_count-match_order

    r = int(round_count - math.floor(math.log(match_reverse_order, 2)))

    #if the player is not eliminated untill the last round
    if(r == round_count):
        winner = FindWinners(data,last_index)
        if winner == player_name:
            return "Champion", round_count
    
    return r ,round_count

def FacetoFace(data,player_1,player_2):
    """Takes as argumnent player_1 and player_2, 
    then returns how many matches players payed against each other and how many of those
    each player won """
    player_1_Wins = 0 
    player_2_Wins = 0
    
    #iterate over player indices
    match_indices = FindIndices(data[4],player_1)
    match_indices2 = FindIndices(data[5],player_2)
    
    match_indices3 = FindIndices(data[4],player_2)
    match_indices4 = FindIndices(data[5],player_1)
    
    #Find the intersection of each player indices to find the matches against eachother
    intersect_1_and_2 = set.intersection(set(match_indices),set(match_indices2))
    intersect_2_and_3 = set.intersection(set(match_indices3),set(match_indices4))
    
    all_matches = list(list(intersect_1_and_2) + list(intersect_2_and_3))

    for i in all_matches:
        temp = FindWinners(data,i)
        if player_1 == temp:
            player_1_Wins += 1
        else: 
            player_2_Wins += 1
    
    return player_1_Wins, player_2_Wins, player_1_Wins + player_2_Wins
        
def MatchesWon(data,year):
    """Counts how many matches players won each year"""
    winners = []
    frequency = {}
    
    
    if year == 'All':
        years_index = data[12]
    else:
        years_index = FindIndices(data[12],year)
    
    
    for i in years_index:
        winners.append(FindWinners(data,int(i)))
        
    for item in winners:
        if item in frequency:
            frequency[item] += 1
        else:
            frequency[item] = 1
    
    from collections import Counter
    k = Counter(frequency)
    # Finding 3 highest values
    high = k.most_common(3)
    for i in high:
        print(i[0]," :",i[1]," ")
    
    return frequency
#Source for top most frequent element of a dictionary: 
#https://www.geeksforgeeks.org/python-program-to-find-the-highest-3-values-in-a-dictionary/

def FindLosers(data,indices):
    """Takes as argument the indeces, and
    finds the loser of a single match by comparing 3 sets"""
    player_1 = 0
    player_2 = 0
    
    if data[11] == 'retired_1': #if the first player retires than first one wins autamtically
        return data[5][indices]
    elif data[11] == 'retired_2':#if the second player retires than first one wins autamtically
        return data[4][indices]
    
    set_1 = data[8][indices].strip().split('-') 
    
    if set_1 != ['']:
        [int(x) for x in set_1 ] #convert to integer
        if set_1[0] > set_1[1]:
            player_1 += 1 
        else:
            player_2 += 1
    
    set_2 = data[9][indices].strip().split('-') 
    
    if set_2 != ['']:
        [int(x) for x in set_2]
        if set_2[0] > set_2[1]:
            player_1 += 1 
        else:
            player_2 += 1
    
    set_3 = data[10][indices].strip().split('-')
   
    if set_3 != ['']:
        [int(x) for x in set_3]
        if set_3[0] > set_3[1]:
            player_1 += 1 
        else:
            player_2 += 1
    
    if player_1 < player_2:
        return data[4][indices]
    else:
        return data[5][indices]


def FindPlayersRank(data,year):
    """Takes as argument the year and returns the rank of the player"""
    if year == 'All':
        indices = data[12]
        indices = [int(i) for i in indices]
    else:
        indices = FindIndices(data[12],year)
    
    frequency = {}
 
    for i in indices:
    
        round_indices = Subset(data,data[12][i],data[0][i])
        match_count = len(round_indices)
        round_count = math.log(match_count+1,2)

        match_order = round_indices.index(i)
        match_reverse_order = match_count-match_order

        r = round_count - math.floor(math.log(match_reverse_order, 2))#Rounding down the round values
        
        winners = FindWinners(data,i)
        if winners in frequency:
            frequency[winners] += r
        else:
            frequency[winners] = r 

        losers = FindLosers(data,i)
        if losers in frequency:
            frequency[losers] -= 1/r
        else:
            frequency[losers] = - 1/r
    
    from collections import Counter
    k = Counter(frequency)
    # Finding 3 highest values
    high = k.most_common(3)
    for i in high:
        print(i[0]," :",i[1]," ")
    return frequency
    
#The runtime of the function is a bit slow but it is working


def WbW(dataset,year):
    """Assumes that all players starts with 1/n scores and as players lose and win matches 
    updates the scores and finally rescale the score of each player by 
    multiplying it by 0.85 and adding 0.15/n """
    data = []
    if year != 'All':
        year_id = FindIndices(dataset[12],year) 
        for i in range(0,len(dataset)):
            data.append([dataset[i][j] for j in year_id])
    else:
        data = dataset
    #Find unique players by combining lines 4(player_1) lines 5(player_2)
    unique_players = set((data[4]) +(data[5])) 
    
    list_players = list(unique_players)
    
    #Every player starts with 1/n scores
    scores = {i:1/len(list_players) for i in list_players}
    updated_scores = {i:0 for i in list_players} 
    
    winners = []
    losers = []
    for i in range(0,len(data[4])):
        winners.append(FindWinners(data,i))
        
    for i in range(0,len(data[4])):
        losers.append(FindLosers(data,i))
    
   
    
    while(1):
        updated_scores = {i:0 for i in list_players}
        for i in list_players: # 1- Find indeces in losers
            losers_index = FindIndices(losers,i)
            corr_winners = [winners[i] for i in losers_index] #2- get the corresponding winners

            frequency = {}

            #3- look for frequency of each player in the result from step 2     
            for item in corr_winners:
                if item in frequency:
                    frequency[item] += 1
                else:
                    frequency[item] = 1

            if len(losers_index) == 0: #no mathces lost scenario
                
                updated_scores[i] = scores[i]
            else:
                for player in frequency.keys():
                    updated_scores[player] += frequency[player] * (scores[i]/len(losers_index))
        
        updated_scores = {i: updated_scores[i] * 0.85 + 0.15/len(list_players) for i in updated_scores}
        #Rescale the score of each player by multiplying it by 0.85 and adding 0.15/n.
        change = 0
        for j in scores.keys():
            change += abs(scores[j] - updated_scores[j])
        
        
        if(change < 0.01):
        
            break
            
    
        prev_change = change
        scores = updated_scores
        
  
       
     
    
    return scores


In [499]:

#Problem 1: Who won the final of the 2021 Women's US Open?

print('index it is found:') 

ReturnRoundIndices(Data,'2021','US Open',7)


print('Winner of 2021 Womens US Open:')
FindWinners(Data,35124)

index it is found:
[35124]
Winner of 2021 Womens US Open:


'Raducanu E.'

In [500]:
#Q1 Problem 2 : Who played against whom in the 4th Round of the 2018 French Open?

ReturnRoundIndices(Data,'2018','French Open',4)
print(Data[4][28342],'vs',Data[5][28342]) 

print(Data[4][28343],'vs',Data[5][28343])


print(Data[4][28344],'vs',Data[5][28344])

print(Data[4][28345],'vs',Data[5][28345] )


print(Data[4][28346],'vs',Data[5][28346])


print(Data[4][28347],'vs',Data[5][28347])


print(Data[4][28348],'vs',Data[5][28348])


print(Data[4][28349],'vs',Data[5][28349])




[28342, 28343, 28344, 28345, 28346, 28347, 28348, 28349, 28350, 28351, 28352, 28353, 28354, 28355, 28356]
Buzarnescu M. vs Keys M.
Strycova B. vs Putintseva Y.
Stephens S. vs Kontaveit A.
Mertens E. vs Halep S.
Kasatkina D. vs Wozniacki C.
Kerber A. vs Garcia C.
Williams S. vs Sharapova M.
Tsurenko L. vs Muguruza G.


In [513]:
#Q1 Problem 3: In which round Venus Williams was eliminated in 2011 Australian Open
print('Venus Williams eliminated at round:')
eliminated , total_rnd = PlayerEliminated(Data,'Australian Open','Williams V.','2011')
print(eliminated)

Venus Williams eliminated at round:
3


In [515]:
#Q1 Problem 4: How many finals has Naomi Osaka played in until now?

def FinalsPlayed(data,player):
    FindIndices(data[4],player)
    FindIndices(data[5],player)
    ind = FindIndices(data[4],player) + FindIndices(data[5],player)#combine player_1 and player_2 indices
    
    tournaments = set([data[0][i] for i in ind])
    year = set([data[12][i] for i in ind])
    finals = 0
    for i in tournaments:
        for j in year:
            r, round_count = PlayerEliminated(data,i,player,j)
            
            
            if r == 'Champion' or r == round_count:
                finals += 1
                
                
    return finals


a = FinalsPlayed(Data, 'Osaka N.')
print(a)
                       

7


In [505]:
#Q1 Problem 5: How many times have Venus and Serena Williams played against each other 
#and how many of these matches each won?

print('Matches Serena Williams won, Matches Venus Williams won and Total matches played in that respect of order')
FacetoFace(Data, 'Williams S.', 'Williams V.')


Matches Serena Williams won, Matches Venus Williams won and Total matches played in that respect of order


(12, 6, 18)

In [434]:

#Q2 print the three top ranked players for the year 2021 and for the period 2007-2021

print('Three top ranked players for the year 2021:')
MatchesWon(Data,'2021') 

print('Three top ranked players for the period 2007-2021:')
MatchesWon(Data,'All')


Three top ranked players for the year 2021:
Kontaveit A.  : 49  
Jabeur O.  : 48  
Sabalenka A.  : 44  
Three top ranked players for the period 2007-2021:
Radwanska A.  : 2522  
Szavay A.  : 2521  
Chakvetadze A.  : 2514  


{'Safina D.': 2509,
 'Bartoli M.': 2386,
 'Bammer S.': 2433,
 'Ivanovic A.': 2448,
 'Williams S.': 2468,
 'Jankovic J.': 2445,
 'Williams V.': 2458,
 'Vakulenko J.': 2422,
 'Szavay A.': 2521,
 'Radwanska A.': 2522,
 'Chakvetadze A.': 2514,
 'Paszek T.': 2483,
 'Azarenka V.': 2444,
 'Kuznetsova S.': 1055,
 'Peer S.': 2447}

In [414]:
#Q3 print the three top ranked players for the year 2021 and for the period 2007-2021

print('Three top ranked players for the year 2021:')
FindPlayersRank(Data,'2021') 

print('Three top ranked players for the period 2007-2021:')
FindPlayersRank(Data,'All')

Three top ranked players for the year 2021:
Barty A.  : 113.90342809894918  
Sabalenka A.  : 99.08588678661731  
Jabeur O.  : 86.69097912503963  
Three top ranked players for the period 2007-2021:
Radwanska A.  : 7566.0  
Szavay A.  : 7563.0  
Chakvetadze A.  : 7542.0  


{'Safina D.': 7527.0,
 'Rolle A.': -836.3333333333585,
 'Bartoli M.': 7158.0,
 'Safarova L.': -795.3333333333538,
 'Bammer S.': 7299.0,
 'Dementieva E.': -811.0000000000223,
 'Ivanovic A.': 7344.0,
 'Dushevina V.': -816.0000000000229,
 'Williams S.': 7404.0,
 'Zvonareva V.': -822.6666666666903,
 'Jankovic J.': 7335.0,
 'Cornet A.': -815.0000000000227,
 'Williams V.': 7374.0,
 'Bondarenko A.': -819.3333333333566,
 'Vakulenko J.': 7266.0,
 'Kirilenko M.': -807.3333333333552,
 'Szavay A.': 7563.0,
 'Petrova N.': -840.333333333359,
 'Radwanska A.': 7566.0,
 'Sharapova M.': -840.6666666666923,
 'Chakvetadze A.': 7542.0,
 'Mirza S.': -838.0000000000254,
 'Paszek T.': 7449.0,
 'Schnyder P.': -827.6666666666908,
 'Azarenka V.': 7332.0,
 'Hingis M.': -814.6666666666894,
 'Kuznetsova S.': 3165.0,
 'Medina Garrigues A.': -351.6666666666641,
 'Peer S.': 7341.0,
 'Vaidisova N.': -815.6666666666895}

In [517]:
# Q4 print the three top ranked players for the year 2021 and for the period 2007-2021

scores_2021 = WbW(Data,'2021')

print('Three top ranked players for the year 2021:')
max_key = max(scores_2021, key=scores_2021.get)
scores_2021 = sorted(scores_2021.items(), key=lambda x: x[1], reverse=True)
scores_2021[:3]


Three top ranked players for the year 2021:


[('Barty A.', 0.026849487647621274),
 ('Sabalenka A.', 0.024556762256554426),
 ('Kontaveit A.', 0.022489388001369206)]

In [518]:
#Q4 print the three top ranked players the period 2007-2021

scores_All = WbW(Data,'All')

print('Three top ranked players for the period 2007-2021:')
max_key = max(scores_All, key=scores_All.get)
scores_All = sorted(scores_All.items(), key=lambda x: x[1], reverse=True)
scores_All[:3]

Three top ranked players for the period 2007-2021:


[('Williams S.', 0.017032908908715772),
 ('Wozniacki C.', 0.01490017135899838),
 ('Azarenka V.', 0.013955874990086095)]

---

### Evaluation

| Aspect  | Mark     | Comment   
|:-------:|:--------:|:----------------------
| 1       |   /15    |      
| 2       |   /5     |   
| 3       |   /10    | 
| 4       |   /10    | 
| 5       |   /10     | 
| Legibility     |   /10    | 
| Modularity     |   /10    | 
| Optimization   |   /30    | 
|**Total**|  **/100**  | 
