# Test cricket analysis

Submitted by:
<br>Nabila Tajrin Bristy
<br>IIT, University of Dhaka
<br>Dhaka, Bangladesh

#### Objective
This project analyzes the Test cricket career statistics stored in the Excel file test_cricket.xlsx.

#### Variables
Player, Span, Mat, Inns, NO, Runs, HS, Ave, 100, 50, 0.

#### Tasks:
##### Part 01
1. Import the Excel file into your Jupyter notebook.<br>
2. Display the first 10 rows of the dataframe.<br>
3. Create a markdown cell and explain the meaning of each column.<br>
4. Find the number of rows and columns in the dataframe.<br>
5. Find the data statistics and check for the data types.<br>
6. Are there any missing values present in the dataset?<br>
7. Rename the column names appropriately.<br>
8. Remove a column from the dataframe.<br>
##### Part 02
9. Remove the columns BBI and BBM.<br>
10. How many players played for ICC?<br>
11. How many different countries are present in this dataset?<br>
12. Which player(s) played for the longest period of time?<br>
13. Which player(s) played for the shortest period of time?<br>
14. How many Australian bowlers are present in this dataset?<br>
15. Is there any Bangladeshi player present in this dataset?<br>
16. Which player had the lowest economy rate?<br>
17. Which player had the lowest strike rate?<br>
18. Which player had the lowest bowling average?<br>

#### References:
- Dataset source: https://stats.espncricinfo.com/ci/content/records/93276.html

### Import required libraries and packages

In [7]:
import pandas as pd
import numpy as np

### 1. Reading the Excel file

In [21]:
# read the 'runs' sheet of the test_cricket.xlsx file
df = pd.read_excel('test_cricket.xlsx', sheet_name = 'runs')
display(df)

Unnamed: 0,Player,Span,Mat,Inns,NO,Runs,HS,Ave,100,50,0
0,SR Tendulkar (INDIA),1989-2013,200,329,33,15921,248*,53.78,51,68,14
1,RT Ponting (AUS),1995-2012,168,287,29,13378,257,51.85,41,62,17
2,JH Kallis (ICC/SA),1995-2013,166,280,40,13289,224,55.37,45,58,16
3,R Dravid (ICC/INDIA),1996-2012,164,286,32,13288,270,52.31,36,63,8
4,AN Cook (ENG),2006-2018,161,291,16,12472,294,45.35,33,57,9
...,...,...,...,...,...,...,...,...,...,...,...
92,IT Botham (ENG),1977-1992,102,161,6,5200,208,33.54,14,22,14
93,FDM Karunaratne (SL),2012-2021,72,139,5,5176,244,38.62,12,26,12
94,JH Edrich (ENG),1963-1976,77,127,9,5138,310*,43.54,12,24,6
95,A Ranatunga (SL),1982-2000,93,155,12,5105,135*,35.69,4,38,12


### 2. Display the first 10 rows of the dataframe

In [9]:
display(df.head(10))

Unnamed: 0,Player,Span,Mat,Inns,NO,Runs,HS,Ave,100,50,0
0,SR Tendulkar (INDIA),1989-2013,200,329,33,15921,248*,53.78,51,68,14
1,RT Ponting (AUS),1995-2012,168,287,29,13378,257,51.85,41,62,17
2,JH Kallis (ICC/SA),1995-2013,166,280,40,13289,224,55.37,45,58,16
3,R Dravid (ICC/INDIA),1996-2012,164,286,32,13288,270,52.31,36,63,8
4,AN Cook (ENG),2006-2018,161,291,16,12472,294,45.35,33,57,9
5,KC Sangakkara (SL),2000-2015,134,233,17,12400,319,57.4,38,52,11
6,BC Lara (ICC/WI),1990-2006,131,232,6,11953,400*,52.88,34,48,17
7,S Chanderpaul (WI),1994-2015,164,280,49,11867,203*,51.37,30,66,15
8,DPMD Jayawardene (SL),1997-2014,149,252,15,11814,374,49.84,34,50,15
9,AR Border (AUS),1978-1994,156,265,44,11174,205,50.56,27,63,11


### 3. Create a markdown cell and explain the meaning of each column

#### Features:
- Player: Name of the player, followed in parentheses by the team(s) he played for.
- Span: Years of the player's first and last Test matches.
- Mat: Number of matches played.
- Inns: Number of innings batted.
- NO: Number of innings in which the player remained not out.
- Runs: Total runs scored in the player's Test career.
- HS: Highest score in a single innings; an asterisk (*) marks a not-out innings.
- Ave: Batting average, i.e. runs scored divided by the number of dismissals (Inns - NO).
- 100: Number of centuries (scores of 100 or more).
- 50: Number of half-centuries (scores from 50 to 99).
- 0: Number of ducks (dismissals for zero).
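
The relationship between Runs, Inns, NO and Ave can be checked by hand. The sketch below is illustrative only: `batting_average` is a hypothetical helper, not part of the notebook, and truncating to two decimals is an assumption that happens to match the rows shown above (for example 53.78 for SR Tendulkar rather than the rounded 53.79).

```python
import math

def batting_average(runs, innings, not_outs):
    """Runs per dismissal, truncated (not rounded) to two decimals.

    Truncation is an assumption made to match the values in the table.
    """
    dismissals = innings - not_outs
    if dismissals == 0:
        return None  # the average is undefined if the player was never dismissed
    return math.floor(runs / dismissals * 100) / 100

# Figures taken from the first two rows of the dataframe
tendulkar = batting_average(15921, 329, 33)
ponting = batting_average(13378, 287, 29)
print(tendulkar, ponting)
```

Using dismissals (Inns - NO) rather than innings as the divisor is why a player with many not-out innings can have a high average.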

In [10]:
#Checking the data type of each column
display(df.dtypes)

Player     object
Span       object
Mat         int64
Inns        int64
NO          int64
Runs        int64
HS         object
Ave       float64
100         int64
50          int64
0           int64
dtype: object
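
Span and HS are reported as object because they hold text such as 1989-2013 and 248*. A minimal sketch of how they might be converted to numeric columns follows, assuming every Span has the form start-end and that a trailing asterisk is the only non-numeric character in HS. The frame `sample` and the new column names are made up for illustration.

```python
import pandas as pd

# Three rows copied from the output above
sample = pd.DataFrame({
    "Player": ["SR Tendulkar (INDIA)", "RT Ponting (AUS)", "BC Lara (ICC/WI)"],
    "Span": ["1989-2013", "1995-2012", "1990-2006"],
    "HS": ["248*", "257", "400*"],
})

# Split Span into first and last year, then compute the career length
years = sample["Span"].str.split("-", expand=True).astype(int)
sample["Start"] = years[0]
sample["End"] = years[1]
sample["Years"] = sample["End"] - sample["Start"]

# Separate the not-out marker from the numeric highest score
hs_text = sample["HS"].astype(str)  # guards against cells read as integers
sample["HS_not_out"] = hs_text.str.endswith("*")
sample["HS_runs"] = hs_text.str.rstrip("*").astype(int)

print(sample[["Player", "Years", "HS_runs", "HS_not_out"]])
```

A numeric Years column is what questions about the longest or shortest career need, for example via `idxmax()` or a comparison against `max()`.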

In [11]:
#Checking for missing values
display(df.isnull().sum())

Player    0
Span      0
Mat       0
Inns      0
NO        0
Runs      0
HS        0
Ave       0
100       0
50        0
0         0
dtype: int64

### 4. Number of rows and columns in the dataframe

In [12]:
print('Rows, Columns:', df.shape)
#or, do it separately
print("Number of rows:", df.shape[0])
print("Number of columns:", df.shape[1])

Rows, Columns: (97, 11)
Number of rows: 97
Number of columns: 11


### 5. Data statistics and data types

In [13]:
#Data statistics
df.describe()

Unnamed: 0,Mat,Inns,NO,Runs,Ave,100,50,0
count,97.0,97.0,97.0,97.0,97.0,97.0,97.0,97.0
mean,104.979381,178.752577,16.051546,7574.175258,46.781031,20.546392,35.474227,11.329897
std,27.064729,44.963418,8.754012,2224.255278,8.168268,8.226001,11.499178,4.147594
min,52.0,80.0,5.0,5062.0,30.3,4.0,13.0,2.0
25%,86.0,146.0,10.0,5825.0,42.29,15.0,27.0,9.0
50%,102.0,176.0,15.0,7214.0,45.84,19.0,33.0,11.0
75%,117.0,200.0,20.0,8540.0,50.66,24.0,42.0,14.0
max,200.0,329.0,49.0,15921.0,99.94,51.0,68.0,22.0


In [14]:
#Data types
df.dtypes

Player     object
Span       object
Mat         int64
Inns        int64
NO          int64
Runs        int64
HS         object
Ave       float64
100         int64
50          int64
0           int64
dtype: object

### 6. Checking missing values in the dataset

In [15]:
df.isnull().sum()

Player    0
Span      0
Mat       0
Inns      0
NO        0
Runs      0
HS        0
Ave       0
100       0
50        0
0         0
dtype: int64

### 7. Renaming the column names appropriately

In [16]:
#Present column names
df.head()

Unnamed: 0,Player,Span,Mat,Inns,NO,Runs,HS,Ave,100,50,0
0,SR Tendulkar (INDIA),1989-2013,200,329,33,15921,248*,53.78,51,68,14
1,RT Ponting (AUS),1995-2012,168,287,29,13378,257,51.85,41,62,17
2,JH Kallis (ICC/SA),1995-2013,166,280,40,13289,224,55.37,45,58,16
3,R Dravid (ICC/INDIA),1996-2012,164,286,32,13288,270,52.31,36,63,8
4,AN Cook (ENG),2006-2018,161,291,16,12472,294,45.35,33,57,9


In [17]:
renamed_cols = {'Mat': 'Match', 'Inns': 'Innings', 'NO': 'Not_out', 'HS': 'High_score', 'Ave': 'Average', 100: 'Centuries', 50: 'Half_centuries', 0: 'Ducks'}
df.rename(columns = renamed_cols, inplace = True)
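
The rename above uses the integers 100, 50 and 0 as keys, which suggests the numeric header cells were read from Excel as integers rather than strings. The sketch below, on a made-up one-row frame called `demo`, shows why the key type matters: `rename` silently skips keys that do not match any column.

```python
import pandas as pd

# Made-up frame whose last three column labels are integers
demo = pd.DataFrame([[200, 51, 68, 14]], columns=["Mat", 100, 50, 0])

# A string key does not match the integer label, so nothing changes
by_string = demo.rename(columns={"100": "Centuries"})

# Integer keys match and the columns are renamed
by_int = demo.rename(columns={100: "Centuries", 50: "Half_centuries", 0: "Ducks"})

# Alternative: convert every label to a string first
normalized = demo.copy()
normalized.columns = normalized.columns.map(str)

print(list(by_string.columns))
print(list(by_int.columns))
print(list(normalized.columns))
```

Checking `df.columns` before renaming is a cheap way to confirm which label types are actually present.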

In [18]:
#Updated column names
df.head()

Unnamed: 0,Player,Span,Match,Innings,Not_out,Runs,High_score,Average,Centuries,Half_centuries,Ducks
0,SR Tendulkar (INDIA),1989-2013,200,329,33,15921,248*,53.78,51,68,14
1,RT Ponting (AUS),1995-2012,168,287,29,13378,257,51.85,41,62,17
2,JH Kallis (ICC/SA),1995-2013,166,280,40,13289,224,55.37,45,58,16
3,R Dravid (ICC/INDIA),1996-2012,164,286,32,13288,270,52.31,36,63,8
4,AN Cook (ENG),2006-2018,161,291,16,12472,294,45.35,33,57,9


### 8. Removing a column from the dataframe

In [19]:
df.drop('Ducks', axis = 1, inplace = True)

In [20]:
#Dataframe after dropping the Ducks column
df.head()

Unnamed: 0,Player,Span,Match,Innings,Not_out,Runs,High_score,Average,Centuries,Half_centuries
0,SR Tendulkar (INDIA),1989-2013,200,329,33,15921,248*,53.78,51,68
1,RT Ponting (AUS),1995-2012,168,287,29,13378,257,51.85,41,62
2,JH Kallis (ICC/SA),1995-2013,166,280,40,13289,224,55.37,45,58
3,R Dravid (ICC/INDIA),1996-2012,164,286,32,13288,270,52.31,36,63
4,AN Cook (ENG),2006-2018,161,291,16,12472,294,45.35,33,57


## Part 02
### 9. Remove the columns BBI and BBM

In [22]:
#Work on a copy of the dataframe (the 'runs' sheet has no BBI or BBM columns)
file = df.copy()
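
BBI and BBM do not appear among the batting columns listed earlier, so they presumably belong to a bowling table that is not loaded in this notebook. As a hedged sketch, dropping them could look like the following. The frames `bowling` and `batting` and all of their values are invented placeholders, not data from the file.

```python
import pandas as pd

# Placeholder bowling table; names and figures are invented for illustration
bowling = pd.DataFrame({
    "Player": ["Bowler A", "Bowler B"],
    "Wkts": [100, 80],
    "BBI": ["7/50", "6/40"],
    "BBM": ["10/120", "9/100"],
})

# Drop both columns in a single call
trimmed = bowling.drop(columns=["BBI", "BBM"], errors="ignore")

# With errors="ignore" the same call is a no-op on a frame without them
batting = pd.DataFrame({"Player": ["Batter A"], "Runs": [5000]})
unchanged = batting.drop(columns=["BBI", "BBM"], errors="ignore")

print(list(trimmed.columns))
print(list(unchanged.columns))
```

Without `errors="ignore"`, `drop` raises a `KeyError` when a listed column is missing, which is often the safer default while exploring.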

### 10. How many players played for ICC?

In [23]:
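#Count the players whose team list in the Player column includes ICC
#(df.shape[0] would count all 97 rows, not only the ICC players)
icc_players = df['Player'].str.contains('ICC', regex = False)
print("Number of players who played for ICC:", icc_players.sum())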
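
Team membership is encoded in the parentheses of the Player column, with multiple teams separated by a slash, as in JH Kallis (ICC/SA). A minimal sketch of parsing those codes on the first five rows shown earlier follows. The variable names are illustrative, and the pattern assumes each name ends with exactly one parenthesised team list.

```python
import pandas as pd

# First five Player values from the dataframe
players = pd.Series([
    "SR Tendulkar (INDIA)",
    "RT Ponting (AUS)",
    "JH Kallis (ICC/SA)",
    "R Dravid (ICC/INDIA)",
    "AN Cook (ENG)",
], name="Player")

# Extract the text inside the parentheses and split it into team codes
teams = players.str.extract(r"\(([^)]*)\)", expand=False).str.split("/")

# Players whose team list includes ICC
icc_count = int(teams.apply(lambda codes: "ICC" in codes).sum())

# Distinct national teams, leaving out the ICC World XI code
countries = sorted({code for codes in teams for code in codes if code != "ICC"})

print(icc_count)
print(countries)
```

Splitting into codes avoids false matches that a plain substring search could produce if a player's name happened to contain the same letters.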
