## Analysing ATP tennis matches with pandas

The dataset we will work with consists of men's tennis matches played at Wimbledon.<br><br>Many different approaches are possible, so what I am trying to do here is to draw attention to points that might not seem obvious at first glance.<br>The data I am using can be downloaded in .csv format from the following link:<br><br>https://raw.githubusercontent.com/solajozsef/ipython-notebooks/main/AtpWimbledonMatches.csv<br><br>If you need more data, it is available at<br><br>http://tennis-data.co.uk<br><br>and there is a 'notes.txt' page explaining the abbreviations at<br><br>http://tennis-data.co.uk/notes.txt<br>

## 1. Show installed versions

### Importing some libraries

In [44]:
import numpy as np
import pandas as pd
import math
import random
import glob

<br>The following setting makes the notebook display the output of every expression in a cell, not just the last one.<br>

In [48]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

<br>The key player in this notebook is pandas, so let's check which version we are using.<br><br>

In [3]:
!pip show pandas

Name: pandas
Version: 1.2.3
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /home/sj/.local/lib/python3.8/site-packages
Requires: numpy, python-dateutil, pytz
Required-by: visions, seaborn, phik, pandas-profiling
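The same information is also available from inside Python, without a shell command: `pd.__version__` holds the installed pandas version string, and `pd.show_versions()` prints it together with the versions of numpy and the other dependencies, which is handy when comparing results across machines. A minimal sketch:

```python
import numpy as np
import pandas as pd

# Version strings of the two core libraries used in this notebook
print('pandas:', pd.__version__)
print('numpy: ', np.__version__)

# For a full report of pandas and all of its dependencies, uncomment:
# pd.show_versions()
```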


Setting some display options.

In [4]:
pd.set_option("display.max_rows", 3000, "display.max_columns", 50)
pd.get_option("display.max_rows")
pd.get_option("display.max_columns")

3000

50
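`pd.set_option` changes a setting for the rest of the session. If a larger display is only needed for a single output, `pd.option_context` applies the same option/value pairs temporarily inside a `with` block and restores the previous values on exit. A small sketch:

```python
import pandas as pd

# Remember the current value so we can see that it is restored afterwards
before = pd.get_option("display.max_rows")

# The options below are only in effect inside the with block
with pd.option_context("display.max_rows", 3000, "display.max_columns", 50):
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")
print(before, inside, after)
```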

<br>The dataset we are using can be downloaded from the following GitHub link.<br><br>

In [7]:
url = 'https://raw.githubusercontent.com/solajozsef/ipython-notebooks/main/AtpWimbledonMatches.csv'

<br>Because it is a .csv file, we read it into a pandas DataFrame with read_csv().<br><br>

In [8]:
df = pd.read_csv(url)

<br>Some basic info about df columns.<br><br>

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1905 entries, 0 to 1904
Data columns (total 21 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Location    1905 non-null   object 
 1   Tournament  1905 non-null   object 
 2   Date        1905 non-null   object 
 3   Winner      1905 non-null   object 
 4   Loser       1905 non-null   object 
 5   WRank       1905 non-null   int64  
 6   LRank       1905 non-null   int64  
 7   W1          1898 non-null   float64
 8   L1          1898 non-null   float64
 9   W2          1885 non-null   float64
 10  L2          1886 non-null   float64
 11  W3          1858 non-null   float64
 12  L3          1858 non-null   float64
 13  W4          922 non-null    float64
 14  L4          922 non-null    float64
 15  W5          358 non-null    float64
 16  L5          358 non-null    float64
 17  Wsets       1899 non-null   float64
 18  Lsets       1899 non-null   float64
 19  B365W       1897 non-null  

In [10]:
df.shape

(1905, 21)

<br>As we can see, there are 1905 rows and 21 columns in the DataFrame we are going to play with.<br><br>

In [11]:
df.describe()

| | WRank | LRank | W1 | L1 | W2 | L2 | W3 | L3 | W4 | L4 | W5 | L5 | Wsets | Lsets | B365W | B365L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1905.0 | 1905.0 | 1898.0 | 1898.0 | 1885.0 | 1886.0 | 1858.0 | 1858.0 | 922.0 | 922.0 | 358.0 | 358.0 | 1899.0 | 1899.0 | 1897.0 | 1899.0 |
| mean | 49.756955 | 88.554331 | 5.800843 | 4.278714 | 5.855703 | 4.229056 | 5.824004 | 4.043595 | 5.805857 | 4.053145 | 7.178771 | 4.444134 | 2.938915 | 0.678252 | 1.777132 | 5.196707 |
| std | 70.233601 | 95.966322 | 1.196269 | 1.815351 | 1.184819 | 1.792565 | 1.142381 | 1.835665 | 1.237422 | 1.810932 | 4.082675 | 4.441459 | 0.344239 | 0.772421 | 1.696712 | 5.187248 |
| min | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.002 |
| 25% | 11.0 | 36.0 | 6.0 | 3.0 | 6.0 | 3.0 | 6.0 | 3.0 | 6.0 | 3.0 | 6.0 | 3.0 | 3.0 | 0.0 | 1.11 | 2.0 |
| 50% | 30.0 | 70.0 | 6.0 | 4.0 | 6.0 | 4.0 | 6.0 | 4.0 | 6.0 | 4.0 | 6.0 | 4.0 | 3.0 | 0.0 | 1.3 | 3.4 |
| 75% | 68.0 | 109.0 | 6.0 | 6.0 | 6.0 | 6.0 | 6.0 | 6.0 | 6.0 | 6.0 | 7.0 | 5.0 | 3.0 | 1.0 | 1.72 | 6.5 |
| max | 1065.0 | 1085.0 | 7.0 | 7.0 | 7.0 | 7.0 | 7.0 | 7.0 | 7.0 | 7.0 | 70.0 | 68.0 | 3.0 | 2.0 | 29.0 | 34.0 |


<br>We can check the first 5 rows of df.<br><br>

In [12]:
df.head()

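| | Location | Tournament | Date | Winner | Loser | WRank | LRank | W1 | L1 | W2 | L2 | W3 | L3 | W4 | L4 | W5 | L5 | Wsets | Lsets | B365W | B365L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | London | Wimbledon | 29/06/15 | Broady L. | Matosevic M. | 182 | 138 | 5.0 | 7.0 | 4.0 | 6.0 | 6.0 | 3.0 | 6.0 | 2.0 | 6.0 | 3.0 | 3.0 | 2.0 | 2.62 | 1.44 |
| 1 | London | Wimbledon | 29/06/15 | Cilic M. | Moriya H. | 9 | 174 | 6.0 | 3.0 | 6.0 | 2.0 | 7.0 | 6.0 | NaN | NaN | NaN | NaN | 3.0 | 0.0 | 1.02 | 17.0 |
| 2 | London | Wimbledon | 29/06/15 | Berankis R. | Haider-Maurer A. | 90 | 57 | 6.0 | 2.0 | 5.0 | 2.0 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 0.0 | 1.5 | 2.5 |
| 3 | London | Wimbledon | 29/06/15 | Thiem D. | Sela D. | 30 | 85 | 2.0 | 6.0 | 6.0 | 3.0 | 6.0 | 4.0 | 6.0 | 4.0 | NaN | NaN | 3.0 | 1.0 | 1.53 | 2.37 |
| 4 | London | Wimbledon | 29/06/15 | Granollers M. | Tipsarevic J. | 72 | 486 | 6.0 | 3.0 | 6.0 | 4.0 | 6.0 | 2.0 | NaN | NaN | NaN | NaN | 3.0 | 0.0 | 1.57 | 2.25 |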


<br>Fill the NaN (Not a Number) values with 0 and check the first 5 rows.<br><br>

In [27]:
df.fillna(value=0, inplace=True)

In [28]:
df.head()

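| | location | tournament | date | winner | loser | wrank | lrank | w1 | l1 | w2 | l2 | w3 | l3 | w4 | l4 | w5 | l5 | wsets | lsets | b365w | b365l |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | London | Wimbledon | 29/06/15 | broady l. | matosevic m. | 182 | 138 | 5.0 | 7.0 | 4.0 | 6.0 | 6.0 | 3.0 | 6.0 | 2.0 | 6.0 | 3.0 | 3.0 | 2.0 | 2.62 | 1.44 |
| 1 | London | Wimbledon | 29/06/15 | cilic m. | moriya h. | 9 | 174 | 6.0 | 3.0 | 6.0 | 2.0 | 7.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.02 | 17.0 |
| 2 | London | Wimbledon | 29/06/15 | berankis r. | haider-maurer a. | 90 | 57 | 6.0 | 2.0 | 5.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.5 | 2.5 |
| 3 | London | Wimbledon | 29/06/15 | thiem d. | sela d. | 30 | 85 | 2.0 | 6.0 | 6.0 | 3.0 | 6.0 | 4.0 | 6.0 | 4.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.53 | 2.37 |
| 4 | London | Wimbledon | 29/06/15 | granollers m. | tipsarevic j. | 72 | 486 | 6.0 | 3.0 | 6.0 | 4.0 | 6.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.57 | 2.25 |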
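One side effect of filling every NaN with 0: the missing betting odds (1905 - 1897 = 8 in B365W and 1905 - 1899 = 6 in B365L, according to the counts above) also become 0, and a 0 passes filters such as `b365w <= 1.3` used later, so those matches would be counted as heavy favorites. A minimal sketch that fills only the set-score columns and leaves the odds as NaN; the helper name `fill_scores_only` is hypothetical, it matches the score columns by pattern so it works with upper- or lowercase names, and the toy rows are invented for illustration:

```python
import numpy as np
import pandas as pd

def fill_scores_only(frame):
    """Return a copy of frame with NaN set scores replaced by 0.

    Only W1..W5, L1..L5, Wsets and Lsets (any letter case) are filled;
    the betting odds keep their NaN values.
    """
    # Column names made of w/l followed by a set number 1-5 or 'sets'
    score_cols = frame.filter(regex='(?i)^[wl]([1-5]|sets)$').columns
    filled = frame.copy()
    filled[score_cols] = filled[score_cols].fillna(0)
    return filled

# Toy rows invented for illustration (not taken from the dataset)
toy = pd.DataFrame({
    'W1': [6.0, 6.0], 'L1': [3.0, 4.0],
    'W4': [np.nan, 6.0], 'L4': [np.nan, 2.0],
    'Wsets': [3.0, 3.0], 'Lsets': [0.0, 1.0],
    'B365W': [1.2, np.nan], 'B365L': [4.5, 2.0],
})
filled = fill_scores_only(toy)
print(filled)
# A NaN odd never passes a <= filter, a 0 always does
print('rows with B365W <= 1.3:', (filled['B365W'] <= 1.3).sum())
```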


<br>We can list df's column names with:<br><br>

In [19]:
df.columns

Index(['Location', 'Tournament', 'Date', 'Winner', 'Loser', 'WRank', 'LRank',
       'W1', 'L1', 'W2', 'L2', 'W3', 'L3', 'W4', 'L4', 'W5', 'L5', 'Wsets',
       'Lsets', 'B365W', 'B365L'],
      dtype='object')

<br>I like converting the column names to lowercase; it makes life easier.<br><br>

In [20]:
df.columns = df.columns.str.lower()

In [21]:
df.columns

Index(['location', 'tournament', 'date', 'winner', 'loser', 'wrank', 'lrank',
       'w1', 'l1', 'w2', 'l2', 'w3', 'l3', 'w4', 'l4', 'w5', 'l5', 'wsets',
       'lsets', 'b365w', 'b365l'],
      dtype='object')

<br>I also recommend converting the contents of the 'winner' and 'loser' columns to lowercase, because we are going to use them a lot.<br><br>

In [22]:
df['winner'] = df['winner'].str.lower()

In [23]:
df['loser'] = df['loser'].str.lower()

<br>and check if it worked<br><br>

In [29]:
df.head(10)

| | location | tournament | date | winner | loser | wrank | lrank | w1 | l1 | w2 | l2 | w3 | l3 | w4 | l4 | w5 | l5 | wsets | lsets | b365w | b365l |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | London | Wimbledon | 29/06/15 | broady l. | matosevic m. | 182 | 138 | 5.0 | 7.0 | 4.0 | 6.0 | 6.0 | 3.0 | 6.0 | 2.0 | 6.0 | 3.0 | 3.0 | 2.0 | 2.62 | 1.44 |
| 1 | London | Wimbledon | 29/06/15 | cilic m. | moriya h. | 9 | 174 | 6.0 | 3.0 | 6.0 | 2.0 | 7.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.02 | 17.0 |
| 2 | London | Wimbledon | 29/06/15 | berankis r. | haider-maurer a. | 90 | 57 | 6.0 | 2.0 | 5.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.5 | 2.5 |
| 3 | London | Wimbledon | 29/06/15 | thiem d. | sela d. | 30 | 85 | 2.0 | 6.0 | 6.0 | 3.0 | 6.0 | 4.0 | 6.0 | 4.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.53 | 2.37 |
| 4 | London | Wimbledon | 29/06/15 | granollers m. | tipsarevic j. | 72 | 486 | 6.0 | 3.0 | 6.0 | 4.0 | 6.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.57 | 2.25 |
| 5 | London | Wimbledon | 29/06/15 | goffin d. | zeballos h. | 15 | 128 | 7.0 | 6.0 | 6.0 | 1.0 | 6.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.11 | 6.5 |
| 6 | London | Wimbledon | 29/06/15 | verdasco f. | klizan m. | 43 | 39 | 4.0 | 6.0 | 6.0 | 2.0 | 6.0 | 3.0 | 6.0 | 7.0 | 13.0 | 11.0 | 3.0 | 2.0 | 1.28 | 3.5 |
| 7 | London | Wimbledon | 29/06/15 | kyrgios n. | schwartzman d. | 29 | 64 | 6.0 | 0.0 | 6.0 | 2.0 | 7.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.04 | 13.0 |
| 8 | London | Wimbledon | 29/06/15 | mayer l. | kokkinakis t. | 21 | 71 | 7.0 | 6.0 | 7.0 | 6.0 | 6.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.53 | 2.37 |
| 9 | London | Wimbledon | 29/06/15 | isner j. | soeda g. | 17 | 91 | 7.0 | 6.0 | 6.0 | 4.0 | 6.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.06 | 10.0 |


<br>Looking more closely at the 'date' column, we notice that it holds the dates as text (e.g. '29/06/15'), so to turn them into uniform datetime values we run the following one-liner.<br><br>

In [32]:
df.date = pd.to_datetime(df.date)

<br>Check the result.<br><br>

In [33]:
df.head()

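| | location | tournament | date | winner | loser | wrank | lrank | w1 | l1 | w2 | l2 | w3 | l3 | w4 | l4 | w5 | l5 | wsets | lsets | b365w | b365l |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | London | Wimbledon | 2015-06-29 | broady l. | matosevic m. | 182 | 138 | 5.0 | 7.0 | 4.0 | 6.0 | 6.0 | 3.0 | 6.0 | 2.0 | 6.0 | 3.0 | 3.0 | 2.0 | 2.62 | 1.44 |
| 1 | London | Wimbledon | 2015-06-29 | cilic m. | moriya h. | 9 | 174 | 6.0 | 3.0 | 6.0 | 2.0 | 7.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.02 | 17.0 |
| 2 | London | Wimbledon | 2015-06-29 | berankis r. | haider-maurer a. | 90 | 57 | 6.0 | 2.0 | 5.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.5 | 2.5 |
| 3 | London | Wimbledon | 2015-06-29 | thiem d. | sela d. | 30 | 85 | 2.0 | 6.0 | 6.0 | 3.0 | 6.0 | 4.0 | 6.0 | 4.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.53 | 2.37 |
| 4 | London | Wimbledon | 2015-06-29 | granollers m. | tipsarevic j. | 72 | 486 | 6.0 | 3.0 | 6.0 | 4.0 | 6.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.57 | 2.25 |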
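A caveat about this step: in the pandas version used here (1.2.3), `pd.to_datetime` reads an ambiguous string such as '01/07/05' month-first by default, giving 2005-01-07 instead of 1 July 2005, while '29/06/15' is still parsed correctly because 29 cannot be a month. That would explain why some Wimbledon matches further down carry January to May dates (e.g. 2005-01-07 for Federer against Hewitt). Newer pandas versions guess the format differently, so the safest option is to state it explicitly. A sketch, assuming every value in the column is written as day/month/two-digit year; the strings below are made up:

```python
import pandas as pd

# Made-up date strings in the dd/mm/yy layout of the CSV file
raw = pd.Series(['29/06/15', '01/07/05', '14/07/19'])

# An explicit format leaves nothing to guess: day, month, two-digit year
dates = pd.to_datetime(raw, format='%d/%m/%y')
print(dates.dt.strftime('%Y-%m-%d').tolist())
# ['2015-06-29', '2005-07-01', '2019-07-14']
```

Applied to the notebook, this would be `df.date = pd.to_datetime(df.date, format='%d/%m/%y')`. If some rows turn out to use a different layout, an explicit format raises an error, and `dayfirst=True` is the more forgiving alternative.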


<br>We can, for example, sort our DataFrame by date.<br><br>

In [38]:
df.sort_values(by='date', ascending=True, ignore_index=True).head()

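| | location | tournament | date | winner | loser | wrank | lrank | w1 | l1 | w2 | l2 | w3 | l3 | w4 | l4 | w5 | l5 | wsets | lsets | b365w | b365l |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | London | Wimbledon | 2005-01-07 | federer r. | hewitt l. | 1 | 2 | 6.0 | 3.0 | 6.0 | 4.0 | 7.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.11 | 6.5 |
| 1 | London | Wimbledon | 2005-02-07 | roddick a. | johansson t. | 4 | 22 | 6.0 | 7.0 | 6.0 | 2.0 | 7.0 | 6.0 | 7.0 | 6.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.19 | 4.5 |
| 2 | London | Wimbledon | 2005-03-07 | federer r. | roddick a. | 1 | 4 | 6.0 | 2.0 | 7.0 | 6.0 | 6.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.14 | 5.5 |
| 3 | London | Wimbledon | 2005-06-20 | mayer f. | ventura s. | 57 | 110 | 6.0 | 4.0 | 7.0 | 5.0 | 6.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.06 | 8.0 |
| 4 | London | Wimbledon | 2005-06-20 | philippoussis m. | beck k. | 142 | 43 | 7.0 | 5.0 | 6.0 | 4.0 | 6.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.66 | 2.1 |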


In [36]:
df.sort_values(by=['winner', 'loser'], ascending=True, ignore_index=True).iloc[:50, :]

| | location | tournament | date | winner | loser | wrank | lrank | w1 | l1 | w2 | l2 | w3 | l3 | w4 | l4 | w5 | l5 | wsets | lsets | b365w | b365l |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | London | Wimbledon | 2006-06-27 | agassi a. | pashanski b. | 20 | 71 | 2.0 | 6.0 | 6.0 | 2.0 | 6.0 | 4.0 | 6.0 | 3.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.1 | 6.5 |
| 1 | London | Wimbledon | 2006-06-29 | agassi a. | seppi a. | 20 | 68 | 6.0 | 4.0 | 7.0 | 6.0 | 6.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.44 | 2.62 |
| 2 | London | Wimbledon | 2017-04-07 | albot r. | bagnis f. | 108 | 107 | 4.0 | 6.0 | 6.0 | 4.0 | 7.0 | 6.0 | 7.0 | 6.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.2 | 4.5 |
| 3 | London | Wimbledon | 2018-04-07 | albot r. | bedene a. | 98 | 71 | 6.0 | 2.0 | 4.0 | 6.0 | 7.0 | 6.0 | 5.0 | 7.0 | 6.0 | 3.0 | 3.0 | 2.0 | 3.4 | 1.33 |
| 4 | London | Wimbledon | 2018-03-07 | albot r. | carreno busta p. | 98 | 12 | 3.0 | 6.0 | 6.0 | 0.0 | 6.0 | 7.0 | 6.0 | 2.0 | 6.0 | 1.0 | 3.0 | 2.0 | 3.0 | 1.4 |
| 5 | London | Wimbledon | 2016-06-29 | albot r. | elias g. | 110 | 92 | 3.0 | 6.0 | 6.0 | 2.0 | 7.0 | 5.0 | 6.0 | 4.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.28 | 3.75 |
| 6 | London | Wimbledon | 2009-06-24 | almagro n. | beck k. | 48 | 143 | 6.0 | 4.0 | 7.0 | 6.0 | 3.0 | 6.0 | 3.0 | 6.0 | 7.0 | 5.0 | 3.0 | 2.0 | 1.9 | 1.8 |
| 7 | London | Wimbledon | 2016-06-27 | almagro n. | dutra silva r. | 47 | 386 | 6.0 | 3.0 | 7.0 | 6.0 | 5.0 | 7.0 | 3.0 | 6.0 | 6.0 | 3.0 | 3.0 | 2.0 | 1.12 | 6.0 |
| 8 | London | Wimbledon | 2008-06-24 | almagro n. | granollers m. | 12 | 52 | 4.0 | 6.0 | 6.0 | 3.0 | 7.0 | 5.0 | 6.0 | 2.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.3 | 3.4 |
| 9 | London | Wimbledon | 2011-06-23 | almagro n. | isner j. | 15 | 47 | 7.0 | 6.0 | 7.0 | 6.0 | 6.0 | 7.0 | 6.0 | 3.0 | 0.0 | 0.0 | 3.0 | 1.0 | 2.75 | 1.4 |


So let's get to it.


Let's see how many favorites, with odds of, say, 1.30 or lower, won their match and how many lost. The winner's odds are in the 'b365w' column.

To get a clear view we need to know how many players with odds of 1.30 or lower appear in the winners' column (b365w) and how many in the losers' column (b365l).

In [14]:
df.query('(b365w <= 1.3) or (b365l <= 1.3) ').b365w.count()

1104

In [15]:
df.query('(b365w <= 1.3)').b365w.count()

961

<br>When the favorite loses, his odds appear in the 'b365l' column.<br><br>

In [16]:
df.query('(b365l <= 1.3)').b365l.count()

143

<br>So out of 1104 matches the favorites won 961 and lost 143, a win rate of about 87% (961 / 1104 ≈ 0.87).<br><br>
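The three queries above can be folded into one small helper that also does the division. A sketch, not part of pandas: `favorite_record` is a hypothetical name, it assumes the lowercase odds columns `b365w` and `b365l`, and because decimal odds are never below 1 it only counts odds from 1 up to the threshold, which also skips any missing odds filled with 0. The toy odds are invented:

```python
import pandas as pd

def favorite_record(frame, threshold):
    """Wins, losses and win rate of favorites priced at <= threshold.

    A favorite who won has his odds in 'b365w', one who lost in 'b365l'.
    The lower bound of 1 skips odds that were filled with 0.
    """
    wins = int(frame['b365w'].between(1, threshold).sum())
    losses = int(frame['b365l'].between(1, threshold).sum())
    total = wins + losses
    rate = wins / total if total else float('nan')
    return wins, losses, rate

# Toy odds invented for illustration (the last row mimics filled-in zeros)
toy = pd.DataFrame({'b365w': [1.10, 1.25, 3.50, 1.80, 0.0],
                    'b365l': [6.00, 3.75, 1.30, 2.00, 0.0]})
print(favorite_record(toy, 1.3))  # (2, 1, 0.6666666666666666)
```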

<br>How many winners had a world ranking above, say, 900 (wrank > 900)?<br><br>

In [17]:
df[df['wrank'] > 900].count().b365w

2

<br>and who are they?<br><br>

In [18]:
df[df['wrank'] > 900]

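| | location | tournament | date | winner | loser | wrank | lrank | w1 | l1 | w2 | l2 | w3 | l3 | wsets | lsets | b365w | b365l |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1320 | London | Wimbledon | 26/06/07 | Kiefer N. | Volandri F. | 1065 | 27 | 6.0 | 3.0 | 7.0 | 6.0 | 6.0 | 1.0 | 3.0 | 0.0 | 1.16 | 4.5 |
| 1354 | London | Wimbledon | 28/06/07 | Kiefer N. | Santoro F. | 1065 | 70 | 6.0 | 4.0 | 6.0 | 3.0 | 6.0 | 4.0 | 3.0 | 0.0 | 2.37 | 1.53 |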


<br>Say I would like to count the favorites who lost when their odds were between 1.4 and 1.5 (at least 1.4, below 1.5).<br><br>

In [19]:
df.query('(b365l >= 1.4) and (b365l < 1.5)').count().b365w

51

<br>... and the total number of matches in which the favorite's odds fell in this range<br><br>

In [20]:
df.query('((b365w >= 1.4) and (b365w < 1.5)) or ((b365l >= 1.4) and (b365l < 1.5))').count().b365w

150

<br>So altogether there were 150 matches within these favorite odds; the favorite lost 51 of them and won the other 99.<br><br>
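Instead of querying one odds range at a time, the favorite's odds can be binned with `pd.cut` and the record computed for every bin at once. A sketch under stated assumptions: the favorite is taken to be the player with the lower of the two odds, matches with missing or equal odds are skipped, `favorite_win_rate_by_odds` is a hypothetical helper, and the bin edges and toy odds are arbitrary choices for illustration:

```python
import pandas as pd

def favorite_win_rate_by_odds(frame, bins):
    """Wins ('sum'), matches ('count') and win rate ('mean') of the
    favorite, i.e. min(b365w, b365l), for each odds bin."""
    odds = frame[['b365w', 'b365l']].dropna()
    odds = odds[odds['b365w'] != odds['b365l']]              # no clear favorite
    fav_odds = odds.min(axis=1)                              # favorite's odds
    fav_won = (odds['b365w'] < odds['b365l']).astype(int)   # 1 if the favorite won
    # right=False gives half-open bins [a, b), like the 1.4 <= odds < 1.5 query
    groups = pd.cut(fav_odds, bins=bins, right=False)
    return fav_won.groupby(groups, observed=False).agg(['sum', 'count', 'mean'])

# Toy odds invented for illustration
toy = pd.DataFrame({'b365w': [1.10, 1.45, 2.40, 1.42, 1.05],
                    'b365l': [6.00, 2.60, 1.55, 2.75, 9.00]})
print(favorite_win_rate_by_odds(toy, [1.0, 1.2, 1.4, 1.5, 2.0]))
```

Odds that fall outside every bin (for example 0 after `fillna(0)`) end up in no category and are left out of the result.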

<br>Underdogs with odds of 4.0 or higher winning the match: <br><br>

In [18]:
df.query('b365w >= 4').count().b365w

109

<br>And underdogs with odds of 4.0 or higher losing the match: <br><br>

In [21]:
df.query('b365l >= 4').count().b365w

845

<br>Let's see how the top players perform against each other. The following cell keeps the matches where both players' world rankings were between 1 and 10 and the favorite (the player with the lower odds) won, and prints the relevant columns.<br><br>wrank: winner's rank<br>lrank: loser's rank<br>b365w: winner's odds<br>b365l: loser's odds<br><br>The first column is the row index of df.<br>

In [22]:
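df.query('(1 <= wrank <= 10) and (1 <= lrank <= 10) and (b365w < b365l)')[['wrank', 'lrank', 'b365w', 'b365l']]

| | wrank | lrank | b365w | b365l |
|---|---|---|---|---|
| 123 | 1 | 9 | 1.08 | 7.50 |
| 126 | 1 | 2 | 1.83 | 2.00 |
| 247 | 1 | 10 | 1.05 | 9.00 |
| 253 | 1 | 2 | 1.16 | 5.00 |
| 380 | 2 | 6 | 1.11 | 6.50 |
| 505 | 1 | 2 | 1.11 | 6.50 |
| 507 | 1 | 4 | 1.14 | 5.50 |
| 630 | 1 | 6 | 1.66 | 2.20 |
| 631 | 4 | 10 | 1.22 | 4.33 |
| 633 | 1 | 4 | 1.57 | 2.37 |
| 757 | 3 | 7 | 1.11 | 6.50 |
| 761 | 1 | 3 | 1.55 | 2.60 |
| 884 | 5 | 7 | 1.20 | 4.50 |
| 888 | 5 | 6 | 1.22 | 4.50 |
| 1011 | 1 | 9 | 1.14 | 5.50 |
| 1014 | 1 | 4 | 1.50 | 2.62 |
| 1136 | 1 | 6 | 1.10 | 7.00 |
| 1140 | 1 | 8 | 1.16 | 5.00 |
| 1268 | 2 | 9 | 1.10 | 7.00 |
| 1269 | 2 | 7 | 1.25 | 4.33 |
| 1382 | 1 | 10 | 1.01 | 15.00 |
| 1395 | 2 | 5 | 1.25 | 4.00 |
| 1396 | 1 | 2 | 1.14 | 5.50 |
| 1519 | 4 | 3 | 1.28 | 3.75 |
| 1522 | 4 | 9 | 1.40 | 3.00 |
| 1523 | 2 | 4 | 1.61 | 2.40 |
| 1646 | 1 | 4 | 1.36 | 3.20 |
| 1900 | 4 | 5 | 1.44 | 2.75 |
| 1903 | 4 | | | |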

In [23]:
df.query('(1 <= wrank <= 10) and (1 <= lrank <= 10) and (b365l < b365w)')[['wrank', 'lrank', 'b365w', 'b365l']]

| | wrank | lrank | b365w | b365l |
|---|---|---|---|---|
| 379 | 6 | 3 | 3.50 | 1.30 |
| 760 | 3 | 2 | 2.10 | 1.72 |
| 1005 | 9 | 7 | 2.87 | 1.40 |
| 1015 | 2 | 1 | 2.30 | 1.66 |
| 1137 | 8 | 4 | 2.10 | 1.72 |
| 1142 | 2 | 1 | 2.50 | 1.57 |
| 1267 | 7 | 3 | 2.62 | 1.50 |
| 1645 | 8 | 2 | 10.00 | 1.07 |
| 1648 | 8 | 10 | 2.00 | 1.80 |
| 1777 | 2 | 1 | 2.20 | 1.66 |
| 1897 | 5 | 9 | 2.20 | 1.66 |
| 1902 | 3 | 1 | 2.87 | 1.44 |

<br>The first table shows the matches where the favorite won and the second where the favorite lost.<br>

Let's focus on very strong favorites whose bookmaker odds are 1.10 or lower and see how they perform.<br>First we check how many matches they won.<br><br>

In [24]:
df.query('b365w <= 1.1').count().b365w

458

<br>Now, how many matches did they lose?<br><br>

In [25]:
df.query('b365l <= 1.1').count().b365w

24

<br>The figures are quite impressive: strong favorites with odds of 1.10 or lower won 458 of their 482 matches (458 + 24), about 95%. <br><br>

In [26]:
df.query('b365l <= 1.1').head()

| | location | tournament | date | winner | loser | wrank | lrank | w1 | l1 | w2 | l2 | w3 | l3 | wsets | lsets | b365w | b365l |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29 | London | Wimbledon | 29/06/15 | Ilhan M. | Janowicz J. | 82 | 47 | 7.0 | 6.0 | 6.0 | 4.0 | 6.0 | 7.0 | 3.0 | 1.0 | 7.0 | 1.1 |
| 46 | London | Wimbledon | 30/06/15 | Ramos-Vinolas A. | Istomin D. | 65 | 62 | 6.0 | 2.0 | 6.0 | 2.0 | 3.0 | 2.0 | 2.0 | 0.0 | 7.0 | 1.1 |
| 223 | London | Wimbledon | 30/06/06 | Verdasco F. | Nalbandian D. | 30 | 3 | 7.0 | 6.0 | 7.0 | 6.0 | 6.0 | 2.0 | 3.0 | 0.0 | 7.0 | 1.08 |
| 317 | London | Wimbledon | 24/06/09 | Gimeno-Traver D. | Dent T. | 98 | 266 | 7.0 | 5.0 | 7.0 | 6.0 | 4.0 | 6.0 | 3.0 | 2.0 | 7.0 | 1.08 |
| 464 | London | Wimbledon | 23/06/05 | Tursunov D. | Henman T. | 152 | 9 | 3.0 | 6.0 | 6.0 | 2.0 | 3.0 | 6.0 | 3.0 | 2.0 | 7.0 | 1.08 |
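Whether 95% is really impressive depends on what the bookmaker expected. Decimal odds convert to an implied probability as p = 1 / odds, so odds of 1.10 imply 1 / 1.10 ≈ 0.91, and every favorite in this group was priced at roughly 91% or more. Bookmaker odds include a margin (the implied probabilities of the two players add up to more than 1), so 1 / odds is only a rough estimate. A sketch comparing the average implied probability with the observed win rate; `implied_vs_actual` is a hypothetical helper, it assumes the lowercase odds columns, it ignores odds below 1 (such as missing odds filled with 0), and the toy odds are invented:

```python
import pandas as pd

def implied_vs_actual(frame, threshold):
    """Average implied win probability (1 / odds) versus the observed
    win rate of favorites whose decimal odds were <= threshold."""
    # Decimal odds are >= 1, so the lower bound also drops odds filled with 0
    won = frame.loc[frame['b365w'].between(1, threshold), 'b365w']    # favorite won
    lost = frame.loc[frame['b365l'].between(1, threshold), 'b365l']   # favorite lost
    fav_odds = pd.concat([won, lost])
    if fav_odds.empty:
        return float('nan'), float('nan')
    implied = (1 / fav_odds).mean()
    actual = len(won) / len(fav_odds)
    return implied, actual

# Toy odds invented for illustration
toy = pd.DataFrame({'b365w': [1.05, 1.10, 1.08, 9.00],
                    'b365l': [9.00, 6.50, 7.00, 1.05]})
implied, actual = implied_vs_actual(toy, 1.10)
print(f'implied {implied:.3f}  actual {actual:.3f}')
```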


<br>We can search for players by name like so<br><br>

In [27]:
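name = 'Nadal'
# case=False because the names were converted to lowercase above ('nadal r.')
wn = df.winner.str.contains(name, case=False)
ls = df.loser.str.contains(name, case=False)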

<br>Above we created two boolean masks, 'wn' and 'ls': 'wn' is True in the rows where the name appears in the 'winner' column, and 'ls' in the rows where it appears in the 'loser' column.<br>If we want to see how many times this player won or lost, the code is below.<br>Note: to refer to a Python variable inside a query string, we prefix it with the '@' character.<br><br>

In [28]:
print('win: ', df.query('@wn == True').shape[0],  'times')
print('lose: ', df.query('@ls == True').shape[0], 'times')

win:  51 times
lose:  11 times


<br>The same process with Thiem.<br><br>

In [29]:
name = 'Thiem'
# case-insensitive search, since the names in df are lowercase
wn = df.winner.str.contains(name, case=False)
ls = df.loser.str.contains(name, case=False)

In [30]:
print('win: ', df.query('@wn == True').shape[0],  'times')
print('lose: ', df.query('@ls == True').shape[0], 'times')

win:  5 times
lose:  6 times


<br>We can even write a little function for this operation.<br><br>

In [31]:
def win_lose(name):
    # Boolean masks: True where `name` appears (case-insensitively) among winners / losers
    wn = df.winner.str.contains(name, case=False)
    ls = df.loser.str.contains(name, case=False)

    # Only print a line if the player appears on that side at all
    if wn.any():
        print('win:  ', wn.sum(), 'times')

    if ls.any():
        print('lose: ', ls.sum(), 'times')

<br>and then we only have to modify the 'name' variable.<br><br>

In [34]:
name = 'Cilic'
win_lose(name)

win:   29 times
lose:  13 times


In [35]:
name = 'Berankis'
win_lose(name)

win:   3 times
lose:  7 times


In [37]:
name = 'Djoko'
win_lose(name)

win:   72 times
lose:  10 times
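Calling win_lose for one name at a time works, but the win/loss record of every player can be built in one go by counting how often each name appears in the 'winner' and 'loser' columns. A sketch using `value_counts`; `win_loss_table` is a hypothetical helper, it needs the exact lowercase name (e.g. 'djokovic n.') rather than a fragment, and the toy rows use invented player names:

```python
import pandas as pd

def win_loss_table(frame):
    """Wins and losses per player, most wins first."""
    table = pd.DataFrame({
        'win': frame['winner'].value_counts(),
        'lose': frame['loser'].value_counts(),
    }).fillna(0).astype(int)          # a player missing on one side has 0 there
    return table.sort_values('win', ascending=False)

# Toy rows with invented player names
toy = pd.DataFrame({
    'winner': ['smith a.', 'smith a.', 'jones b.', 'brown c.'],
    'loser':  ['jones b.', 'brown c.', 'smith a.', 'taylor d.'],
})
print(win_loss_table(toy))
```

On the notebook's data, `win_loss_table(df).head(10)` would list the ten players with the most wins in the file.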


<br>If you want to know who has defeated Djokovic, here is one possible way:<br><br>

In [43]:
df[df.loser.str.contains('kovic') == True][['winner', 'loser', 'b365w', 'b365l']]

| | winner | loser | b365w | b365l |
|---|---|---|---|---|
| 243 | ancic m. | djokovic n. | 1.36 | 3.0 |
| 375 | haas t. | djokovic n. | 3.2 | 1.36 |
| 486 | grosjean s. | djokovic n. | 1.14 | 5.0 |
| 632 | berdych t. | djokovic n. | 2.0 | 1.8 |
| 885 | berdych t. | djokovic n. | 4.0 | 1.25 |
| 1142 | murray a. | djokovic n. | 2.5 | 1.57 |
| 1243 | querrey s. | djokovic n. | 26.0 | 1.01 |
| 1395 | nadal r. | djokovic n. | 1.25 | 4.0 |
| 1718 | safin m. | djokovic n. | 9.0 | 1.05 |
| 1902 | federer r. | djokovic n. | 2.87 | 1.44 |


<br>Or we can check how the matches between Djokovic and Federer ended. One thing to note here: pandas' contains() method returns partial matches too, and it treats the pattern as a regular expression, so '|' means "or". So if you do not remember the exact name of a player, a few characters will do the trick: instead of 'djokovic' I used only 'joko' (and 'edere' for Federer), and the code still works.<br><br>

In [42]:
df[df.loser.str.contains('joko|edere') & df.winner.str.contains('joko|edere')]

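| | location | tournament | date | winner | loser | wrank | lrank | w1 | l1 | w2 | l2 | w3 | l3 | w4 | l4 | w5 | l5 | wsets | lsets | b365w | b365l |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 126 | London | Wimbledon | 2015-12-07 | djokovic n. | federer r. | 1 | 2 | 7.0 | 6.0 | 6.0 | 7.0 | 6.0 | 4.0 | 6.0 | 3.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.83 | 2.0 |
| 761 | London | Wimbledon | 2019-07-14 | djokovic n. | federer r. | 1 | 3 | 7.0 | 6.0 | 1.0 | 6.0 | 7.0 | 6.0 | 4.0 | 6.0 | 13.0 | 12.0 | 3.0 | 2.0 | 1.55 | 2.6 |
| 1523 | London | Wimbledon | 2014-06-07 | djokovic n. | federer r. | 2 | 4 | 6.0 | 7.0 | 6.0 | 4.0 | 7.0 | 6.0 | 5.0 | 7.0 | 6.0 | 4.0 | 3.0 | 2.0 | 1.61 | 2.4 |
| 1902 | London | Wimbledon | 2012-06-07 | federer r. | djokovic n. | 3 | 1 | 6.0 | 3.0 | 3.0 | 6.0 | 6.0 | 4.0 | 6.0 | 3.0 | 0.0 | 0.0 | 3.0 | 1.0 | 2.87 | 1.44 |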
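The combined 'joko|edere' pattern only checks that each side matches one of the two fragments, not that one side is Djokovic and the other Federer; and since the pattern is a regular expression, a '.' in a name such as 'ferrero j.c.' matches any character. A sketch of a hypothetical `head_to_head` helper that searches each name literally (`regex=False`) and case-insensitively, and keeps only the matches between the two players in either order; the toy rows use invented player names:

```python
import pandas as pd

def head_to_head(frame, a, b):
    """Matches between a player matching `a` and one matching `b`, either way round."""
    def has(col, name):
        # Literal, case-insensitive substring search (no regex special characters)
        return frame[col].str.contains(name, case=False, regex=False)

    a_beat_b = has('winner', a) & has('loser', b)
    b_beat_a = has('winner', b) & has('loser', a)
    return frame[a_beat_b | b_beat_a]

# Toy rows with invented player names
toy = pd.DataFrame({
    'winner': ['smith a.', 'jones b.', 'smith a.', 'brown c.'],
    'loser':  ['jones b.', 'smith a.', 'taylor d.', 'jones b.'],
})
print(head_to_head(toy, 'smi', 'JONES'))
```

On the notebook's df, `head_to_head(df, 'djokovic', 'federer')` should return the same four rows as the cell above.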


<br>Another way is to sort df by the 'winner' and 'loser' columns so we can see who played whom. Many options arise here.<br><br>

In [39]:
df.sort_values(by=['winner', 'loser'], ascending=True, ignore_index=True).iloc[1500:1650, :]

| | location | tournament | date | winner | loser | wrank | lrank | w1 | l1 | w2 | l2 | w3 | l3 | w4 | l4 | w5 | l5 | wsets | lsets | b365w | b365l |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1500 | London | Wimbledon | 2005-06-21 | rochus o. | goldstein p. | 35 | 101 | 6.0 | 4.0 | 6.0 | 2.0 | 6.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.16 | 4.5 |
| 1501 | London | Wimbledon | 2006-06-27 | rochus o. | muller g. | 29 | 66 | 6.0 | 4.0 | 7.0 | 5.0 | 6.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.28 | 3.5 |
| 1502 | London | Wimbledon | 2008-06-23 | rochus o. | sela d. | 67 | 66 | 6.0 | 4.0 | 7.0 | 5.0 | 6.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.61 | 2.2 |
| 1503 | London | Wimbledon | 2006-06-29 | rochus o. | zib t. | 29 | 126 | 6.0 | 1.0 | 6.0 | 1.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.12 | 5.5 |
| 1504 | London | Wimbledon | 2005-06-25 | roddick a. | andreev i. | 4 | 42 | 6.0 | 2.0 | 6.0 | 2.0 | 7.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.06 | 8.0 |
| 1505 | London | Wimbledon | 2012-06-27 | roddick a. | baker j. | 25 | 186 | 7.0 | 6.0 | 6.0 | 4.0 | 7.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.04 | 13.0 |
| 1506 | London | Wimbledon | 2011-06-21 | roddick a. | beck a. | 10 | 156 | 6.0 | 4.0 | 7.0 | 6.0 | 6.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.02 | 12.0 |
| 1507 | London | Wimbledon | 2009-06-29 | roddick a. | berdych t. | 6 | 20 | 7.0 | 6.0 | 6.0 | 4.0 | 6.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.5 | 2.5 |
| 1508 | London | Wimbledon | 2005-06-24 | roddick a. | bracciali d. | 4 | 120 | 7.0 | 5.0 | 6.0 | 3.0 | 6.0 | 7.0 | 4.0 | 6.0 | 6.0 | 3.0 | 3.0 | 2.0 | 1.02 | 12.0 |
| 1509 | London | Wimbledon | 2009-06-23 | roddick a. | chardy j. | 6 | 41 | 6.0 | 3.0 | 7.0 | 6.0 | 4.0 | 6.0 | 6.0 | 3.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.06 | 8.0 |


<br>If you do not need that many columns, select only the ones you want:<br>

In [41]:
df.sort_values(by=['winner', 'loser'], ascending=True, ignore_index=True).iloc[600:900, :][['date', 'winner', 'b365w', 'loser', 'b365l']]

| | date | winner | b365w | loser | b365l |
|---|---|---|---|---|---|
| 600 | 2010-06-24 | ferrer d. | 1.22 | serra f. | 4.0 |
| 601 | 2006-06-28 | ferrer d. | 1.39 | stadler s. | 2.75 |
| 602 | 2008-06-23 | ferrer d. | 1.06 | stakhovsky s. | 8.0 |
| 603 | 2007-06-29 | ferrero j.c. | 4.0 | blake j. | 1.22 |
| 604 | 2005-06-20 | ferrero j.c. | 1.28 | delgado j. | 3.5 |
| 605 | 2009-06-27 | ferrero j.c. | 2.62 | gonzalez f. | 1.44 |
| 606 | 2007-06-26 | ferrero j.c. | 1.14 | hajek j. | 5.0 |
| 607 | 2006-06-28 | ferrero j.c. | 1.3 | karanusic r. | 3.39 |
| 608 | 2005-06-22 | ferrero j.c. | 1.38 | lee h.t. | 2.87 |
| 609 | 2005-06-25 | ferrero j.c. | 1.66 | mayer f. | 2.1 |
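Scrolling through the sorted frame shows who played whom; grouping by both columns counts every winner/loser pairing directly, with the most frequent pairings first. A sketch with invented player names; on the real df the same chain would list all pairings at once:

```python
import pandas as pd

# Toy rows with invented player names
toy = pd.DataFrame({
    'winner': ['smith a.', 'smith a.', 'jones b.', 'jones b.', 'jones b.'],
    'loser':  ['brown c.', 'taylor d.', 'smith a.', 'smith a.', 'brown c.'],
})

# Number of times each winner beat each loser, most frequent first
pairs = (toy.groupby(['winner', 'loser'])
            .size()
            .sort_values(ascending=False))
print(pairs)
```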


<br>Modify the code blocks as you like!