# Kamindu's Test Match Data Analysis

Kamindu Mendis is Sri Lanka's latest Test prodigy. He is constantly compared to Don Bradman, mostly owing to his 100+ average and, more recently, to equalling Bradman's mark of 13 innings to reach 1,000 Test runs. But is he really as good as we think? I decided to run an analysis to find out.

### Document
<b>Author:</b> Sahan Fernando<br>
<b>Title:</b> Kamindu's Test Match Data Analysis <br>

### Data source

<b>Author:</b> Sahan Fernando<br>
<b>Title:</b> Kamindu's test performances dataset<br>
<b>Description: </b>Compiled scorecard information for every Test that Kamindu Mendis has played, from his debut on 8 July 2022 through his purple patch in 2024, sourced from ESPNcricinfo stats.

### Import libraries

In [208]:
import pandas as pd
import numpy as np

### Import the data set

In [209]:
df_read = pd.read_csv("kamindu_test_dataset.csv")
df_read.head(12)

Unnamed: 0,batting_position,batter,mode_of_dismissal,score,balls,minutes,fours,sixes,strike_rate,team_innings_total,innings,opposition,venue,host_country,date,year,sl_result,test_no
0,1.0,Pathum Nissanka,c Green b Starc,6.0,25.0,38.0,0.0,0.0,24.0,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
1,2.0,Dimuth Karunaratne (c),lbw b Swepson,86.0,165.0,246.0,10.0,0.0,52.12,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
2,3.0,Kusal Mendis,lbw b Lyon,85.0,161.0,253.0,9.0,0.0,52.79,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
3,4.0,Angelo Mathews,c Labuschagne b Starc,52.0,117.0,163.0,4.0,0.0,44.44,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
4,5.0,Dinesh Chandimal,not out,206.0,326.0,545.0,16.0,5.0,63.19,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
5,6.0,Kamindu Mendis,b Swepson,61.0,137.0,197.0,7.0,0.0,44.52,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
6,7.0,Niroshan Dickwella †,c Cummins b Lyon,5.0,13.0,17.0,0.0,0.0,38.46,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
7,8.0,Ramesh Mendis,lbw b Starc,29.0,98.0,118.0,1.0,0.0,29.59,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
8,9.0,Maheesh Theekshana,b Cummins,10.0,27.0,40.0,2.0,0.0,37.03,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
9,10.0,Prabath Jayasuriya,b Starc,0.0,9.0,16.0,0.0,0.0,0.0,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471


## Data exploration and cleaning

In [210]:
df_read.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 192 entries, 0 to 191
Data columns (total 18 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   batting_position    176 non-null    float64
 1   batter              192 non-null    object 
 2   mode_of_dismissal   176 non-null    object 
 3   score               155 non-null    float64
 4   balls               141 non-null    float64
 5   minutes             141 non-null    float64
 6   fours               141 non-null    float64
 7   sixes               141 non-null    float64
 8   strike_rate         140 non-null    float64
 9   team_innings_total  168 non-null    float64
 10  innings             192 non-null    int64  
 11  opposition          192 non-null    object 
 12  venue               192 non-null    object 
 13  host_country        192 non-null    object 
 14  date                192 non-null    object 
 15  year                192 non-null    int64  
 16  sl_resul

In [211]:
#df_read =df_read.astype({'batting_position':'int','score':'int','balls':'int','minutes':'int','fours':'int','sixes':'int'})
#df_read.info()
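
The integer cast above is commented out, most likely because several columns contain missing values (for example rows where a batter did not bat), and a plain `int` column cannot hold NaN. A minimal sketch of one way around this, assuming pandas' nullable `Int64` dtype is acceptable for the analysis; the `toy` frame is illustrative data, not the real dataset:

```python
import pandas as pd

# Toy frame with a missing row, mimicking a "did not bat" entry (illustrative only).
toy = pd.DataFrame({"score": [61.0, None, 102.0], "balls": [137.0, None, 127.0]})

# A plain int cast fails as soon as a NaN is present.
try:
    toy.astype({"score": "int"})
    cast_failed = False
except (ValueError, TypeError):
    cast_failed = True

# The nullable "Int64" dtype keeps the gap as <NA> and stores the rest as integers.
toy_int = toy.astype({"score": "Int64", "balls": "Int64"})
print(cast_failed)
print(toy_int.dtypes)
```

Whether to convert at all is a judgment call: the float columns work fine for sums and means, so the cast mainly improves how the tables read.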

In [212]:
#df_read.head()

In [213]:
df_read.describe()

Unnamed: 0,batting_position,score,balls,minutes,fours,sixes,strike_rate,team_innings_total,innings,year,test_no
count,176.0,155.0,141.0,141.0,141.0,141.0,140.0,168.0,192.0,192.0,192.0
mean,6.0,30.245161,56.148936,83.765957,3.35461,0.312057,48.011714,334.857143,2.375,2023.75,2535.375
std,3.1713,38.256371,61.601477,93.642301,4.028709,0.934534,31.280143,134.172571,1.113929,0.663167,24.919241
min,1.0,0.0,0.0,2.0,0.0,0.0,0.0,157.0,1.0,2022.0,2471.0
25%,3.0,4.0,12.0,19.0,0.0,0.0,25.0,236.0,1.0,2024.0,2536.75
50%,6.0,13.0,26.0,41.0,2.0,0.0,50.0,298.5,2.5,2024.0,2545.5
75%,9.0,48.0,91.0,135.0,6.0,0.0,67.2925,418.0,3.0,2024.0,2548.25
max,11.0,206.0,326.0,545.0,16.0,6.0,135.71,602.0,4.0,2024.0,2551.0


In [214]:
#Remove the captain "(c)" and wicket-keeper "†" markers from player names. These roles change between matches,
#so leaving the markers in would split one player across several names when aggregating.
#Work on a copy so that df_read keeps the raw data, and use a raw string for the regex pattern.
df = df_read.copy()
df['batter'] = df['batter'].replace({r'\(c\)':'','†':''},regex=True)
df.head(12)


Unnamed: 0,batting_position,batter,mode_of_dismissal,score,balls,minutes,fours,sixes,strike_rate,team_innings_total,innings,opposition,venue,host_country,date,year,sl_result,test_no
0,1.0,Pathum Nissanka,c Green b Starc,6.0,25.0,38.0,0.0,0.0,24.0,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
1,2.0,Dimuth Karunaratne,lbw b Swepson,86.0,165.0,246.0,10.0,0.0,52.12,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
2,3.0,Kusal Mendis,lbw b Lyon,85.0,161.0,253.0,9.0,0.0,52.79,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
3,4.0,Angelo Mathews,c Labuschagne b Starc,52.0,117.0,163.0,4.0,0.0,44.44,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
4,5.0,Dinesh Chandimal,not out,206.0,326.0,545.0,16.0,5.0,63.19,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
5,6.0,Kamindu Mendis,b Swepson,61.0,137.0,197.0,7.0,0.0,44.52,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
6,7.0,Niroshan Dickwella,c Cummins b Lyon,5.0,13.0,17.0,0.0,0.0,38.46,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
7,8.0,Ramesh Mendis,lbw b Starc,29.0,98.0,118.0,1.0,0.0,29.59,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
8,9.0,Maheesh Theekshana,b Cummins,10.0,27.0,40.0,2.0,0.0,37.03,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
9,10.0,Prabath Jayasuriya,b Starc,0.0,9.0,16.0,0.0,0.0,0.0,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471


In [215]:
#Some player names have trailing spaces. Strip the surrounding whitespace

df['batter']= df['batter'].str.strip()
df.head(12)

Unnamed: 0,batting_position,batter,mode_of_dismissal,score,balls,minutes,fours,sixes,strike_rate,team_innings_total,innings,opposition,venue,host_country,date,year,sl_result,test_no
0,1.0,Pathum Nissanka,c Green b Starc,6.0,25.0,38.0,0.0,0.0,24.0,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
1,2.0,Dimuth Karunaratne,lbw b Swepson,86.0,165.0,246.0,10.0,0.0,52.12,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
2,3.0,Kusal Mendis,lbw b Lyon,85.0,161.0,253.0,9.0,0.0,52.79,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
3,4.0,Angelo Mathews,c Labuschagne b Starc,52.0,117.0,163.0,4.0,0.0,44.44,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
4,5.0,Dinesh Chandimal,not out,206.0,326.0,545.0,16.0,5.0,63.19,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
5,6.0,Kamindu Mendis,b Swepson,61.0,137.0,197.0,7.0,0.0,44.52,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
6,7.0,Niroshan Dickwella,c Cummins b Lyon,5.0,13.0,17.0,0.0,0.0,38.46,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
7,8.0,Ramesh Mendis,lbw b Starc,29.0,98.0,118.0,1.0,0.0,29.59,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
8,9.0,Maheesh Theekshana,b Cummins,10.0,27.0,40.0,2.0,0.0,37.03,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
9,10.0,Prabath Jayasuriya,b Starc,0.0,9.0,16.0,0.0,0.0,0.0,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
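
The two clean-up steps above (dropping the role markers, then trimming whitespace) can also be chained on the string accessor. A small sketch on a standalone Series; the `names` variable is introduced here purely for illustration:

```python
import pandas as pd

# Sample names carrying the captain and wicket-keeper markers.
names = pd.Series(["Dimuth Karunaratne (c)", "Niroshan Dickwella †", "Kusal Mendis"])

# Remove "(c)" or "†" with one regex, then trim the leftover whitespace.
cleaned = names.str.replace(r"\(c\)|†", "", regex=True).str.strip()
print(cleaned.tolist())
```

Doing both in one expression guarantees that no name is left with a trailing space after a marker is removed.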


In [216]:
#Remove the "Extras" rows, as they are not needed for our analysis

df.drop(df[df.batter=="Extras"].index,inplace=True)
df.head()

Unnamed: 0,batting_position,batter,mode_of_dismissal,score,balls,minutes,fours,sixes,strike_rate,team_innings_total,innings,opposition,venue,host_country,date,year,sl_result,test_no
0,1.0,Pathum Nissanka,c Green b Starc,6.0,25.0,38.0,0.0,0.0,24.0,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
1,2.0,Dimuth Karunaratne,lbw b Swepson,86.0,165.0,246.0,10.0,0.0,52.12,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
2,3.0,Kusal Mendis,lbw b Lyon,85.0,161.0,253.0,9.0,0.0,52.79,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
3,4.0,Angelo Mathews,c Labuschagne b Starc,52.0,117.0,163.0,4.0,0.0,44.44,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
4,5.0,Dinesh Chandimal,not out,206.0,326.0,545.0,16.0,5.0,63.19,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471


In [217]:
#Filter out just Kamindu's records
kamindu = df[(df['batter']=='Kamindu Mendis')]
kamindu.head()

Unnamed: 0,batting_position,batter,mode_of_dismissal,score,balls,minutes,fours,sixes,strike_rate,team_innings_total,innings,opposition,venue,host_country,date,year,sl_result,test_no
5,6.0,Kamindu Mendis,b Swepson,61.0,137.0,197.0,7.0,0.0,44.52,554.0,2,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
17,6.0,Kamindu Mendis,DNB,,,,,,,,4,Australia,Galle,Sri Lanka,8-Jul,2022,win,2471
30,7.0,Kamindu Mendis,c †Litton Das b Nahid Rana,102.0,127.0,176.0,11.0,3.0,80.31,280.0,1,Bangladesh,Sylhet,Bangladesh,22-Mar,2024,win,2536
43,8.0,Kamindu Mendis,c Mehidy Hasan Miraz b Taijul Islam,164.0,237.0,300.0,16.0,6.0,69.19,418.0,3,Bangladesh,Sylhet,Bangladesh,22-Mar,2024,win,2536
54,7.0,Kamindu Mendis,not out,92.0,167.0,236.0,7.0,2.0,55.08,531.0,1,Bangladesh,Chattogram,Bangladesh,30-Mar,2024,win,2537


In [218]:
kamindu.groupby('opposition')['test_no'].count()

opposition
Australia      2
Bangladesh     4
England        6
New Zealand    4
Name: test_no, dtype: int64
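
Note that `count()` on `test_no` counts scorecard rows (one per team innings), not distinct Tests. A hedged sketch of how both figures could be shown side by side, using a toy frame whose test numbers follow the Australia and Bangladesh rows shown earlier:

```python
import pandas as pd

# One row per innings; two innings rows per Test in this toy sample.
toy = pd.DataFrame({
    "opposition": ["Australia", "Australia",
                   "Bangladesh", "Bangladesh", "Bangladesh", "Bangladesh"],
    "test_no": [2471, 2471, 2536, 2536, 2537, 2537],
})

# "count" gives innings rows, "nunique" gives distinct Tests.
summary = toy.groupby("opposition")["test_no"].agg(innings_rows="count", tests="nunique")
print(summary)
```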

In [219]:
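#numeric_only=True skips the text columns; recent pandas versions raise a TypeError without it
kamindu.groupby('opposition').mean(numeric_only=True)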

opposition,batting_position,score,balls,minutes,fours,sixes,strike_rate,team_innings_total,innings,year,test_no
Australia,6.0,61.0,137.0,197.0,7.0,0.0,44.52,554.0,3.0,2022.0,2471.0
Bangladesh,7.25,91.75,137.0,183.25,9.0,2.75,64.38,346.5,2.0,2024.0,2536.5
England,7.166667,53.4,84.8,121.4,6.2,0.8,64.344,255.333333,2.666667,2024.0,2546.333333
New Zealand,5.0,103.0,145.0,229.333333,10.0,1.333333,82.34,405.333333,2.0,2024.0,2550.0


In [220]:
#Team-wide averages per opposition, for comparison with Kamindu's figures
df.groupby('opposition').mean(numeric_only=True)

opposition,batting_position,score,balls,minutes,fours,sixes,strike_rate,team_innings_total,innings,year,test_no
Australia,6.0,49.090909,98.818182,151.545455,4.454545,0.454545,35.103636,554.0,3.0,2022.0,2471.0
Bangladesh,6.0,31.833333,54.095238,80.380952,3.261905,0.52381,49.721905,346.5,2.0,2024.0,2536.5
England,6.0,24.576271,41.508475,64.237288,2.949153,0.152542,49.826552,255.333333,2.666667,2024.0,2546.333333
New Zealand,6.0,39.758621,72.724138,102.689655,3.896552,0.275862,46.801379,405.333333,2.0,2024.0,2550.0


In [221]:
#Averages of the other batters in positions 1 to 7, excluding Kamindu
df[(df['batting_position']<8)&(df['batter']!='Kamindu Mendis')].groupby('opposition').mean(numeric_only=True)

opposition,batting_position,score,balls,minutes,fours,sixes,strike_rate,team_innings_total,innings,year,test_no
Australia,3.666667,73.333333,134.5,210.333333,6.5,0.833333,45.833333,554.0,3.0,2022.0,2471.0
Bangladesh,3.64,33.76,55.72,85.16,3.48,0.44,58.3008,349.36,2.04,2024.0,2536.48
England,3.594595,27.685714,44.228571,71.714286,3.342857,0.085714,51.180857,256.324324,2.702703,2024.0,2546.324324
New Zealand,3.833333,45.333333,86.166667,117.444444,4.388889,0.222222,51.681667,405.333333,2.0,2024.0,2550.0


In [222]:
#make a dataframe with just 2024 data
df_2024 = df[(df['year']==2024)]
df_2024.head()

Unnamed: 0,batting_position,batter,mode_of_dismissal,score,balls,minutes,fours,sixes,strike_rate,team_innings_total,innings,opposition,venue,host_country,date,year,sl_result,test_no
24,1.0,Nishan Madushka,c Mehidy Hasan Miraz b Khaled Ahmed,2.0,9.0,9.0,0.0,0.0,22.22,280.0,1,Bangladesh,Sylhet,Bangladesh,22-Mar,2024,win,2536
25,2.0,Dimuth Karunaratne,b Khaled Ahmed,17.0,37.0,65.0,1.0,0.0,45.94,280.0,1,Bangladesh,Sylhet,Bangladesh,22-Mar,2024,win,2536
26,3.0,Kusal Mendis,c Zakir Hasan b Khaled Ahmed,16.0,26.0,48.0,2.0,0.0,61.53,280.0,1,Bangladesh,Sylhet,Bangladesh,22-Mar,2024,win,2536
27,4.0,Angelo Mathews,run out (Najmul Hossain Shanto),5.0,7.0,14.0,1.0,0.0,71.42,280.0,1,Bangladesh,Sylhet,Bangladesh,22-Mar,2024,win,2536
28,5.0,Dinesh Chandimal,c Mehidy Hasan Miraz b Shoriful Islam,9.0,13.0,24.0,0.0,0.0,69.23,280.0,1,Bangladesh,Sylhet,Bangladesh,22-Mar,2024,win,2536


## Comparison with the current Sri Lanka squad in 2024

In [223]:
df[(df['year']==2024)]['score'].sum()

3940.0

In [224]:
df[(df['batter']=='Kamindu Mendis')&(df['year']==2024)]['score'].sum()

943.0

In [225]:
(df[(df['batter']=='Kamindu Mendis')&(df['year']==2024)]['score'].sum()/df[(df['year']==2024)]['score'].sum())*100

23.934010152284262

In [226]:
#creating a dataframe with aggregate scores for individual batters in 2024
df_batter_score = df_2024[['batter','score','balls','minutes','fours','sixes']]

df_batter_score= df_batter_score.groupby('batter').sum().sort_values('score',ascending=False).reset_index()
df_batter_score.head()

Unnamed: 0,batter,score,balls,minutes,fours,sixes
0,Kamindu Mendis,943.0,1407.0,2028.0,97.0,19.0
1,Dhananjaya de Silva,580.0,903.0,1303.0,63.0,6.0
2,Dinesh Chandimal,461.0,813.0,1342.0,54.0,2.0
3,Angelo Mathews,438.0,937.0,1434.0,37.0,2.0
4,Dimuth Karunaratne,398.0,758.0,1218.0,41.0,2.0


In [227]:
#Percentage of runs scored by Kamindu from the total runs scored by Sri Lanka in 2024
(df_batter_score[df_batter_score['batter']=='Kamindu Mendis']['score'].iloc[0])*100/(df[(df['year']==2024)]['score'].sum())

23.934010152284262
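
The share computed above is simply Kamindu's 2024 runs divided by the team's 2024 runs: 943 / 3940 × 100 ≈ 23.93%. The same calculation can be wrapped in a small helper so it is not repeated inline. This is a sketch only; `run_share` and the toy data are hypothetical names and values, not part of the original notebook:

```python
import pandas as pd

def run_share(frame, batter, year):
    """Percentage of the team's runs in `year` scored by `batter`."""
    in_year = frame[frame["year"] == year]
    total = in_year["score"].sum()
    own = in_year.loc[in_year["batter"] == batter, "score"].sum()
    # Guard against a year with no runs recorded.
    return 100 * own / total if total else float("nan")

# Toy data: batter A scores 50 of the 100 runs made in 2024.
toy = pd.DataFrame({
    "batter": ["A", "B", "A", "B"],
    "year": [2024, 2024, 2024, 2023],
    "score": [30.0, 50.0, 20.0, 999.0],
})
share = run_share(toy, "A", 2024)
print(share)

# Cross-check of the figure reported above.
print(943 / 3940 * 100)
```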

In [228]:
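#Scratch example of how iterrows() yields (index, row) pairs.
#It uses its own variable name so that the main df is not overwritten.
demo_df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

for index, row in demo_df.iterrows():
    print(f"Index: {index}, Row: {row['A'], row['B']}")
    print(row['A'])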
    

Index: 0, Row: (1, 4)
1
Index: 1, Row: (2, 5)
2
Index: 2, Row: (3, 6)
3


In [229]:
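#Count how many times each batter was dismissed in 2024 ("not out" and "DNB" entries are excluded).
#These counts are the denominator of the batting average.
innings_dict = {}

for _, batter_row in df_batter_score.iterrows():
    batter = batter_row['batter']
    innings_count = 0

    #separate loop variables avoid shadowing the outer row
    for _, innings_row in df_2024.iterrows():
        dismissal = innings_row['mode_of_dismissal'].strip()
        if innings_row['batter'] == batter and dismissal not in ('not out', 'DNB'):
            innings_count += 1

    innings_dict[batter] = innings_count

print(innings_dict)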

{'Kamindu Mendis': 10, 'Dhananjaya de Silva': 12, 'Dinesh Chandimal': 12, 'Angelo Mathews': 12, 'Dimuth Karunaratne': 13, 'Kusal Mendis': 10, 'Pathum Nissanka': 6, 'Milan Rathnayake': 5, 'Nishan Madushka': 8, 'Prabath Jayasuriya': 9, 'Vishwa Fernando': 6, 'Lahiru Kumara': 6, 'Ramesh Mendis': 2, 'Asitha Fernando': 3, 'Kasun Rajitha': 0, 'Nishan Peiris': 0}
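
The nested loop above scans every 2024 row once per batter. A vectorised alternative is sketched below on toy data; it assumes, as in the notebook, that `mode_of_dismissal` holds either a dismissal description, `not out`, or `DNB`:

```python
import pandas as pd

# Toy scorecard rows; note the stray trailing space in the last entry.
toy = pd.DataFrame({
    "batter": ["A", "A", "A", "B", "B"],
    "mode_of_dismissal": ["b Starc", "not out", "DNB", "lbw b Lyon", "c Green b Starc "],
})

# True where the batter was NOT dismissed in that row.
not_dismissed = toy["mode_of_dismissal"].str.strip().isin(["not out", "DNB"])

# Summing the inverted mask per batter counts the dismissals.
dismissals = (~not_dismissed).groupby(toy["batter"]).sum().to_dict()
print(dismissals)
```

The result has the same shape as `innings_dict`, so it could be passed to `map` in the same way.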


In [230]:
df_batter_score['innings'] = df_batter_score['batter'].map(innings_dict)
df_batter_score.head()

Unnamed: 0,batter,score,balls,minutes,fours,sixes,innings
0,Kamindu Mendis,943.0,1407.0,2028.0,97.0,19.0,10
1,Dhananjaya de Silva,580.0,903.0,1303.0,63.0,6.0,12
2,Dinesh Chandimal,461.0,813.0,1342.0,54.0,2.0,12
3,Angelo Mathews,438.0,937.0,1434.0,37.0,2.0,12
4,Dimuth Karunaratne,398.0,758.0,1218.0,41.0,2.0,13


In [231]:
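#Batting average = runs scored / times dismissed.
#Batters who were never dismissed get inf here (or NaN if they also scored no runs).
df_batter_score['batting_average'] = df_batter_score['score']/df_batter_score['innings']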
df_batter_score.head()

Unnamed: 0,batter,score,balls,minutes,fours,sixes,innings,batting_average
0,Kamindu Mendis,943.0,1407.0,2028.0,97.0,19.0,10,94.3
1,Dhananjaya de Silva,580.0,903.0,1303.0,63.0,6.0,12,48.333333
2,Dinesh Chandimal,461.0,813.0,1342.0,54.0,2.0,12,38.416667
3,Angelo Mathews,438.0,937.0,1434.0,37.0,2.0,12,36.5
4,Dimuth Karunaratne,398.0,758.0,1218.0,41.0,2.0,13,30.615385
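
The dismissal counts above include batters with zero dismissals, for whom the division has no meaningful result. One hedged way to handle this is to treat a zero denominator as missing, so the average comes out as NaN rather than infinity. The toy frame below is illustrative:

```python
import numpy as np
import pandas as pd

# Batter B scored runs but was never dismissed.
toy = pd.DataFrame({
    "batter": ["A", "B"],
    "score": [943.0, 12.0],
    "innings": [10, 0],
})

# Replace a zero dismissal count with NaN before dividing.
toy["batting_average"] = toy["score"] / toy["innings"].replace(0, np.nan)
print(toy)
```

A NaN average is easier to filter out or display as "-" than an infinite value, which would otherwise sort to the top of any ranking.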


In [232]:
df_batter_score.to_csv("kamindu_SL_batters_comparison.csv",index=False)

## Comparison with the modern-day "Big 4"

Now that we have compared Kamindu with the current Sri Lankan players, let us compare him with the modern-day (and all-time) greats:

1. Virat Kohli (India)
2. Joe Root (England)
3. Kane Williamson (New Zealand)
4. Steven Smith (Australia)

We shall see how Kamindu's stats compare with each player's best calendar year in Tests, as well as with their first 8 Tests.

In [233]:
match_innings_list = []
agg_score_list = []
agg_average_list = []
agg_innings_list = []
batter_list = []

match_innings = 0
innings_count = 0
aggregate_score = 0
aggregate_average = 0


#Running totals for Kamindu: one output row per team innings of every Test he played (DNB rows included)
for index, row in kamindu.iterrows():
    match_innings += 1
    batter = 'Kamindu Mendis'
    dismissal = row['mode_of_dismissal'].strip()
    if dismissal != 'DNB':
        #runs count whether or not he was dismissed; only dismissals increase the innings count
        aggregate_score = aggregate_score+row['score']
        if dismissal != 'not out':
            innings_count += 1
        if innings_count>0:
            aggregate_average = aggregate_score/innings_count

    batter_list.append(batter)
    match_innings_list.append(match_innings)
    agg_score_list.append(aggregate_score)
    agg_average_list.append(aggregate_average)
    agg_innings_list.append(innings_count)
    
    

            
kamindu_aggregate_dict = {'batter':batter_list,'match_innings':match_innings_list, 'aggregate_scores':agg_score_list, 'aggregate_average':agg_average_list, 'aggregate_innings':agg_innings_list}
batter_aggregate_stats = pd.DataFrame(kamindu_aggregate_dict)
batter_aggregate_stats.head()

Unnamed: 0,batter,match_innings,aggregate_scores,aggregate_average,aggregate_innings
0,Kamindu Mendis,1,61.0,61.0,1
1,Kamindu Mendis,2,61.0,61.0,1
2,Kamindu Mendis,3,163.0,81.5,2
3,Kamindu Mendis,4,327.0,109.0,3
4,Kamindu Mendis,5,419.0,139.666667,3
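
The running totals above can also be derived with cumulative sums instead of an explicit loop. The sketch below rebuilds the first five rows from Kamindu's scores and dismissal entries shown earlier. One difference from the loop is an assumption of this sketch: the average is NaN, not 0, until the first dismissal.

```python
import pandas as pd

# Kamindu's first five scorecard rows, as listed in the filtered table above.
toy = pd.DataFrame({
    "score": [61.0, None, 102.0, 164.0, 92.0],
    "mode_of_dismissal": [
        "b Swepson", "DNB", "c Litton Das b Nahid Rana",
        "c Mehidy Hasan Miraz b Taijul Islam", "not out",
    ],
})

mode = toy["mode_of_dismissal"].str.strip()
dismissed = ~mode.isin(["DNB", "not out"])

running = pd.DataFrame({
    "match_innings": list(range(1, len(toy) + 1)),
    # DNB rows have no score, so treat them as adding 0 runs.
    "aggregate_scores": toy["score"].fillna(0).cumsum(),
    "aggregate_innings": dismissed.cumsum(),
})

# Divide only where at least one dismissal has occurred.
denominator = running["aggregate_innings"].where(running["aggregate_innings"] > 0)
running["aggregate_average"] = running["aggregate_scores"] / denominator
print(running)
```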


In [234]:
def batter_aggregate_stats_calc(batter_name,score_list):
    match_innings_list = []
    agg_score_list = []
    agg_average_list = []
    agg_innings_list = []
    batter_list = []

    match_innings = 0
    innings_count = 0
    aggregate_score = 0
    aggregate_average = 0


    for row in score_list:
        match_innings += 1
        if isinstance(row, str):
            #'DNB' leaves the totals unchanged
            #a not-out score is written with a trailing '*', e.g. '8*'
            if row == 'DNB':
                pass
            elif '*' in row:
                #not out: the runs count, but the dismissal count does not change
                score = int(row.split('*')[0])
                aggregate_score = aggregate_score+score
                if innings_count>0:
                    aggregate_average = aggregate_score/innings_count
            else:
                print("Issue with score input: "+row)
                break

        elif isinstance(row, (int, float)):
            #a plain number is a completed innings, i.e. the batter was dismissed
            innings_count += 1
            aggregate_score = aggregate_score+row
            aggregate_average = aggregate_score/innings_count

        else:
            print("Issue with score input")
            break
            


        batter_list.append(batter_name)
        match_innings_list.append(match_innings)
        agg_score_list.append(aggregate_score)
        agg_average_list.append(aggregate_average)
        agg_innings_list.append(innings_count)
        
    
    batter_aggregate_dict = {'batter':batter_list,'match_innings':match_innings_list, 'aggregate_scores':agg_score_list, 'aggregate_average':agg_average_list, 'aggregate_innings':agg_innings_list}
    batter_aggregate_stats = pd.DataFrame(batter_aggregate_dict)
    return batter_aggregate_stats
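
The function above relies on an input convention: a plain number is a completed innings, a string such as `'20*'` is a not-out score, and `'DNB'` means the batter did not bat. The sketch below isolates that parsing step. `parse_score` and `running_average` are hypothetical helpers written for illustration, and they raise an error on bad input instead of printing and stopping:

```python
def parse_score(entry):
    """Return (runs, dismissed) for one score entry."""
    if isinstance(entry, str):
        text = entry.strip()
        if text == "DNB":
            return 0, False
        if text.endswith("*") and text[:-1].isdigit():
            return int(text[:-1]), False
        raise ValueError(f"Unrecognised score entry: {entry!r}")
    if isinstance(entry, (int, float)):
        return entry, True
    raise ValueError(f"Unrecognised score entry: {entry!r}")


def running_average(entries):
    """Return (total runs, dismissals, average) after each entry."""
    total, dismissals, rows = 0, 0, []
    for entry in entries:
        runs, dismissed = parse_score(entry)
        total += runs
        dismissals += int(dismissed)
        # No average exists until the first dismissal.
        average = total / dismissals if dismissals else None
        rows.append((total, dismissals, average))
    return rows


# First five entries of the Joe Root score list used in this notebook.
rows = running_average([73, "20*", 4, 0, 10])
for row in rows:
    print(row)
```

Separating parsing from accumulation makes it easier to check each rule on its own before trusting the aggregate table.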


In [235]:
#Virat's best Test year (2016) -> 1216 runs at an average of 75.93 (he scored more runs in 2018, but at a lower average)
#Virat's first 8 Test matches
virat_score_list = [4,15,0,27,30,'DNB',52,63,11,0,23,9,44,75,116,22]
#Creating a dataframe of aggregate stats for Virat's first 8 Test matches
virat_aggregate_stats = batter_aggregate_stats_calc('Virat Kohli',virat_score_list)
virat_aggregate_stats.head()


Unnamed: 0,batter,match_innings,aggregate_scores,aggregate_average,aggregate_innings
0,Virat Kohli,1,4,4.0,1
1,Virat Kohli,2,19,9.5,2
2,Virat Kohli,3,19,6.333333,3
3,Virat Kohli,4,46,11.5,4
4,Virat Kohli,5,76,15.2,5


In [236]:
#Joe Root's best Test year (2021) -> 1708 runs at an average of 61.00
#Joe Root's first 8 Test matches
root_score_list = [73,'20*',4,0,10,'DNB',45,29,40,71,104,28,30,5,6,180]
#Creating a dataframe of aggregate stats for Joe Root's first 8 Test matches
root_aggregate_stats = batter_aggregate_stats_calc('Joe Root',root_score_list)
root_aggregate_stats.head()


Unnamed: 0,batter,match_innings,aggregate_scores,aggregate_average,aggregate_innings
0,Joe Root,1,73,73.0,1
1,Joe Root,2,93,93.0,1
2,Joe Root,3,97,48.5,2
3,Joe Root,4,97,32.333333,3
4,Joe Root,5,107,26.75,4


In [57]:
batter_aggregate_stats.to_csv("kamindu_aggregate_stats.csv",index=False)

## Comparison with Donald Bradman himself

It would be remiss not to compare Kamindu with the man whose mark he equalled (joint 3rd fastest to 1,000 Test runs). While this comparison may be unfair given how much the cricket landscape has changed, it does make for an interesting one.