## Importing Libraries

In [2]:
import numpy as np
import pandas as pd

## Definition of the Data


Virat Kohli is one of the most famous cricketers in the world. Here you are given a dataset of 238 ODI innings played by Virat Kohli between 18 August 2008 and 11 February 2020 (the earliest and latest values in the `Date` column). You are required to analyze Virat Kohli's performance in ODI matches.

1. runs: Number of runs scored by Virat in the innings.
2. NotOut: Whether Virat was not out at the end of the innings (1 for not out, 0 for out).
3. mins: Number of minutes Virat spent batting in the innings.
4. bf: Number of balls faced by Virat in the innings.
5. fours: Number of fours hit by Virat.
6. sixes: Number of sixes hit by Virat.
7. sr: Virat's strike rate in the innings, calculated as (runs / bf) * 100.
8. Inns: The innings of the match in which Virat batted (1 for first innings, 2 for second innings).
9. Opp: The opposing team.
10. Ground: The venue where the match was played.
11. Date: The date of the match, stored as text such as `18-Mar-12` (format `%d-%b-%y`).
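As a quick sanity check on the `sr` definition, the sketch below recomputes `(runs / bf) * 100` for the five rows shown by `df.head()` (values copied from that output) and compares the result with the stored column. The 0.01 tolerance is an assumption: the stored values appear to be truncated rather than rounded to two decimals (183 / 148 × 100 = 123.648…, stored as 123.64).

```python
import numpy as np
import pandas as pd

# runs, bf and sr copied from the five rows printed by df.head() in this notebook
sample = pd.DataFrame({
    'runs': [183, 160, 157, 154, 140],
    'bf':   [148, 159, 129, 134, 107],
    'sr':   [123.64, 100.62, 121.7, 114.92, 130.84],
})

# Recompute the strike rate from its definition: (runs / bf) * 100
sample['sr_calc'] = sample['runs'] / sample['bf'] * 100

# Stored and recomputed values should agree to within 0.01 (assumed tolerance)
sample['sr_ok'] = np.isclose(sample['sr'], sample['sr_calc'], atol=0.01)
print(sample[['sr', 'sr_calc', 'sr_ok']])
```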

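## Working with the Date column

The `Date` column is read in as text such as `18-Mar-12` (format `%d-%b-%y`), so calling `max()` or `min()` on it compares strings alphabetically rather than chronologically. The cells below (execution counts 388 to 409) were run after the dataset was loaded in *Reading the dataset* and after the questions further down, which is why those later outputs still show dates in the original text format. They parse the strings into datetimes, convert them back to text in sortable ISO format (`%Y-%m-%d`), and then find the date range.

In [388]:
df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%y')  # '18-Mar-12' -> Timestamp('2012-03-18')

In [402]:
df['Date'] = df['Date'].dt.strftime('%Y-%m-%d')  # back to text, e.g. '2012-03-18', which sorts chronologically

In [406]:
df['Date'].max()

'2020-02-11'

In [409]:
df['Date'].min()

'2008-08-18'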
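A minimal illustration of why the parsing step matters, using a few hypothetical dates written in the dataset's `%d-%b-%y` format: the maximum of the raw strings is simply the alphabetically last value, while the maximum of the parsed dates is the latest date.

```python
import pandas as pd

# Hypothetical dates in the same '%d-%b-%y' text format as the Date column
dates = pd.Series(['18-Aug-08', '18-Mar-12', '31-Jul-12', '11-Feb-20'])

# String comparison is alphabetical: '31-...' sorts after '18-...' and '11-...'
print(dates.max())            # 31-Jul-12

# Parsing first gives a chronological comparison
parsed = pd.to_datetime(dates, format='%d-%b-%y')
print(parsed.max().date())    # 2020-02-11

# ISO text ('%Y-%m-%d') also sorts chronologically, so max()/min() work on it too
iso = parsed.dt.strftime('%Y-%m-%d')
print(iso.max(), iso.min())   # 2020-02-11 2008-08-18
```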

## Reading the dataset

In [7]:
df = pd.read_csv('Virat_ODI.csv')

In [9]:
df.head()

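|   | runs | NotOut | mins | bf | fours | sixes | sr | Inns | Opp | Ground | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 183 | 0 | 211 | 148 | 22 | 1 | 123.64 | 2 | Pakistan | Dhaka | 18-Mar-12 |
| 1 | 160 | 1 | 220 | 159 | 12 | 2 | 100.62 | 1 | South Africa | Cape Town | 07-Feb-18 |
| 2 | 157 | 1 | 217 | 129 | 13 | 4 | 121.7 | 1 | West Indies | Visakhapatnam | 24-Oct-18 |
| 3 | 154 | 1 | 202 | 134 | 16 | 1 | 114.92 | 2 | New Zealand | Mohali | 23-Oct-16 |
| 4 | 140 | 0 | 0 | 107 | 21 | 2 | 130.84 | 2 | West Indies | Guwahati | 21-Oct-18 |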


In [11]:
df.tail()

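|   | runs | NotOut | mins | bf | fours | sixes | sr | Inns | Opp | Ground | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 233 | 0 | 0 | 7 | 5 | 0 | 0 | 0.0 | 2 | South Africa | Durban | 08-Dec-13 |
| 234 | 0 | 0 | 1 | 3 | 0 | 0 | 0.0 | 1 | England | Cardiff | 27-Aug-14 |
| 235 | 0 | 0 | 5 | 5 | 0 | 0 | 0.0 | 1 | Sri Lanka | The Oval | 08-Jun-17 |
| 236 | 0 | 0 | 7 | 4 | 0 | 0 | 0.0 | 1 | Australia | Chennai | 17-Sep-17 |
| 237 | 0 | 0 | 0 | 1 | 0 | 0 | 0.0 | 1 | West Indies | Visakhapatnam | 18-Dec-19 |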


In [13]:
df.columns

Index(['runs', 'NotOut', 'mins', 'bf', 'fours', 'sixes', 'sr', 'Inns', 'Opp',
       'Ground', 'Date'],
      dtype='object')

In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 238 entries, 0 to 237
Data columns (total 11 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   runs    238 non-null    int64  
 1   NotOut  238 non-null    int64  
 2   mins    238 non-null    int64  
 3   bf      238 non-null    int64  
 4   fours   238 non-null    int64  
 5   sixes   238 non-null    int64  
 6   sr      238 non-null    float64
 7   Inns    238 non-null    int64  
 8   Opp     238 non-null    object 
 9   Ground  238 non-null    object 
 10  Date    238 non-null    object 
dtypes: float64(1), int64(7), object(3)
memory usage: 20.6+ KB


In [27]:
df.describe()

|       | runs | NotOut | mins | bf | fours | sixes | sr | Inns |
|---|---|---|---|---|---|---|---|---|
| count | 238.0 | 238.0 | 238.0 | 238.0 | 238.0 | 238.0 | 238.0 | 238.0 |
| mean | 49.861345 | 0.163866 | 63.617647 | 53.470588 | 4.689076 | 0.508403 | 80.319286 | 1.563025 |
| std | 43.123531 | 0.370934 | 59.129701 | 39.314339 | 4.667416 | 0.988293 | 35.869022 | 0.497057 |
| min | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 25% | 11.0 | 0.0 | 10.0 | 18.0 | 1.0 | 0.0 | 62.9125 | 1.0 |
| 50% | 37.0 | 0.0 | 44.5 | 48.5 | 3.5 | 0.0 | 81.86 | 2.0 |
| 75% | 81.75 | 0.0 | 109.75 | 83.0 | 7.0 | 1.0 | 100.0 | 2.0 |
| max | 183.0 | 1.0 | 220.0 | 159.0 | 22.0 | 7.0 | 209.09 | 2.0 |


In [29]:
df.describe(include='object')

|        | Opp | Ground | Date |
|---|---|---|---|
| count | 238 | 238 | 238 |
| unique | 13 | 60 | 238 |
| top | Sri Lanka | Dhaka | 18-Mar-12 |
| freq | 46 | 13 | 1 |


In [21]:
df.select_dtypes(include=np.number).mean()

runs      49.861345
NotOut     0.163866
mins      63.617647
bf        53.470588
fours      4.689076
sixes      0.508403
sr        80.319286
Inns       1.563025
dtype: float64

In [23]:
df.select_dtypes(include=np.number).median()

runs      37.00
NotOut     0.00
mins      44.50
bf        48.50
fours      3.50
sixes      0.00
sr        81.86
Inns       2.00
dtype: float64

In [25]:
df.isnull().sum()

runs      0
NotOut    0
mins      0
bf        0
fours     0
sixes     0
sr        0
Inns      0
Opp       0
Ground    0
Date      0
dtype: int64
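`isnull()` reports no missing values, but `describe()` shows a minimum of 0 for `mins`, and `head()` includes row 4 with 140 runs off 107 balls in 0 minutes, so a 0 may stand in for an unrecorded time. The sketch below assumes (the dataset does not say this) that `mins == 0` with at least one ball faced means "not recorded" and converts those entries to `NaN`. The rows are copied from the `head()`/`tail()` output; whether a one-ball innings could genuinely be logged as 0 minutes is a judgment call.

```python
import numpy as np
import pandas as pd

# Rows 0, 4 and 237 as printed by head()/tail() in this notebook
sample = pd.DataFrame({
    'runs': [183, 140, 0],
    'mins': [211, 0, 0],
    'bf':   [148, 107, 1],
})

print(sample.isnull().sum().sum())   # 0: nothing is flagged as missing yet

# Assumption: 0 minutes for an innings that faced at least one ball = not recorded
suspect = (sample['mins'] == 0) & (sample['bf'] > 0)
sample['mins'] = sample['mins'].mask(suspect, np.nan)

print(sample['mins'].isnull().sum())  # 2
```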

## Filtering the data

1. What is the average number of runs scored by the player?
2. How many matches has the player remained not out?
3. Which opponent has the player scored the most runs against?
4. What is the most common ground where the player has played?
5. What is the player's highest strike rate in a match?
6. How many centuries (100+ runs) has the player scored?
7. What is the average strike rate across all matches?
8. What is the maximum number of fours hit in a single match?
9. What is the total number of sixes hit by the player?
10. How many times has the player scored 150 or more runs in a match?


### 1. What is the average number of runs scored by Virat?

In [303]:
print(f"Virat's mean runs per innings: {df['runs'].mean():.2f}")

Virat's mean runs per innings: 49.86
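The value above is the mean runs per innings. In cricket, the batting average is conventionally total runs divided by the number of dismissals, so not-out innings add runs without adding a dismissal. From the totals implied by the outputs above (a mean of 49.861345 over 238 innings is 11,867 runs, with 39 not-outs), that would be 11867 / 199 ≈ 59.63. A small sketch of both calculations on hypothetical data:

```python
import pandas as pd

# Hypothetical innings, for illustration only
innings = pd.DataFrame({
    'runs':   [183, 160, 0, 37],
    'NotOut': [0, 1, 0, 0],
})

mean_runs = innings['runs'].mean()                    # runs per innings
dismissals = len(innings) - innings['NotOut'].sum()   # innings that ended in a dismissal
batting_avg = innings['runs'].sum() / dismissals      # cricket's batting average

print(f'Mean runs per innings: {mean_runs:.2f}')      # 95.00
print(f'Batting average:       {batting_avg:.2f}')    # 126.67
```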


### 2. How many matches has the player remained not out?

In [306]:
not_outs = (df['NotOut'] == 1).sum()  # count the innings flagged as not out
print('Total matches in which Virat finished not out:', not_outs)

Total matches in which Virat finished not out: 39


### 3. Which opponent has Virat scored the most runs against?

In [309]:
df1 = df.groupby('Opp')

In [311]:
df1[['runs']].sum()

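| Opp | runs |
|---|---|
| Afghanistan | 67 |
| Australia | 1910 |
| Bangladesh | 680 |
| England | 1178 |
| Ireland | 78 |
| Netherlands | 12 |
| New Zealand | 1378 |
| Pakistan | 536 |
| South Africa | 1287 |
| Sri Lanka | 2220 |

Only 10 of the 13 opponents appear in this output, so the 2235-run maximum belongs to one of the opponents not listed.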


In [313]:
print('Most runs Virat has scored against a single opponent:', df1['runs'].sum().max())

Most runs Virat has scored against a single opponent: 2235
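`max()` returns the largest total but not the opponent it belongs to. `Series.idxmax()` returns the index label of the maximum, which after `groupby('Opp')` is the opponent's name. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical innings, for illustration only
toy = pd.DataFrame({
    'Opp':  ['Sri Lanka', 'Australia', 'Sri Lanka', 'England', 'Australia'],
    'runs': [100, 80, 50, 120, 90],
})

runs_by_opp = toy.groupby('Opp')['runs'].sum()   # Australia 170, England 120, Sri Lanka 150

# max() gives the largest total; idxmax() gives the opponent that holds it
print(runs_by_opp.idxmax(), runs_by_opp.max())    # Australia 170

# Sorting shows the full ranking rather than only the top entry
print(runs_by_opp.sort_values(ascending=False))
```

On the real data, `df.groupby('Opp')['runs'].sum().idxmax()` should name the opponent with the 2235-run total.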


### 4. What is the most common ground where Virat has played?

In [316]:
df.Ground.value_counts().max()

13

In [318]:
ground_counts = df['Ground'].value_counts()  # matches per ground, most frequent first

In [320]:
print('The ground where Virat has played the most:', ground_counts.idxmax())
print('Number of matches there:', ground_counts.max())

The ground where Virat has played the most: Dhaka
Number of matches there: 13
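A tempting way to attach ground names to counts is to `zip` the `Ground` column with `value_counts()`, but that pairs the i-th row of the data with the i-th most common count, so the pairs are not aligned. The index of `value_counts()` already holds the ground names. Illustrated on hypothetical data:

```python
import pandas as pd

# Hypothetical grounds, for illustration only
grounds = pd.Series(['Mohali', 'Dhaka', 'Dhaka', 'Perth', 'Dhaka', 'Mohali'])
counts = grounds.value_counts()   # Dhaka 3, Mohali 2, Perth 1 (sorted by count)

# zip pairs the i-th *row* with the i-th *largest count*: unrelated values
misaligned = list(zip(grounds, counts.tolist()))
print(misaligned)                 # [('Mohali', 3), ('Dhaka', 2), ('Dhaka', 1)]

# The value_counts index already carries the names
print(counts.idxmax(), counts.max())   # Dhaka 3
```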


### 5. What is the player's highest strike rate in a match?

In [323]:
best_sr = df[df['sr'] == df['sr'].max()]  # row(s) with the highest strike rate

In [325]:
print(f"Virat's highest strike rate in a match is {best_sr['sr'].iloc[0]}, scoring {best_sr['runs'].iloc[0]} runs against {best_sr['Opp'].iloc[0]}")

Virat's highest strike rate in a match is 209.09, scoring 23 runs against West Indies
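The highest strike rate, 209.09 from 23 runs, implies an innings of 11 balls (23 / 11 × 100 ≈ 209.09). Short innings dominate a raw maximum, so it is common to require a minimum number of balls faced. The sketch below uses a hypothetical 30-ball threshold on hypothetical data; the cutoff is an assumption, not part of the original analysis.

```python
import pandas as pd

# Hypothetical innings, for illustration only
toy = pd.DataFrame({
    'runs': [23, 183, 10, 120],
    'bf':   [11, 148, 5, 90],
})
toy['sr'] = toy['runs'] / toy['bf'] * 100

# Assumption: only innings of at least MIN_BALLS balls qualify (arbitrary cutoff)
MIN_BALLS = 30
qualified = toy[toy['bf'] >= MIN_BALLS]

best = qualified.loc[qualified['sr'].idxmax()]
print(f"{int(best['runs'])} runs off {int(best['bf'])} balls, SR {best['sr']:.2f}")
# 120 runs off 90 balls, SR 133.33
```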


### 6. How many centuries (100+ runs) has the Virat scored?

In [328]:
(df['runs'] >= 100).sum()  # True counts as 1, so this counts innings of 100+

43

In [330]:
Centuries = df[df['runs']>=100].shape[0]

In [331]:
print('Virat has scored',Centuries,'Centuries')

Virat has scored 43 Centuries
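The same kind of count extends to score bands. A sketch using `pd.cut` with right-open bins to count innings under 50, fifties (50 to 99) and hundreds (100+), on hypothetical scores:

```python
import pandas as pd

# Hypothetical scores, for illustration only
runs = pd.Series([0, 12, 49, 50, 99, 100, 183])

# right=False makes the bins [0, 50), [50, 100), [100, inf)
bands = pd.cut(runs, bins=[0, 50, 100, float('inf')], right=False,
               labels=['<50', '50-99', '100+'])

print(bands.value_counts().sort_index())   # <50: 3, 50-99: 2, 100+: 2
```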


### 7. What is the average strike rate across all matches?

In [335]:
avg = df['sr']

In [336]:
print('The Average Strike Rate of Virat across all matches:',avg.mean())

The Average Strike Rate of Virat across all matches: 80.31928571428571
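The mean of per-match strike rates gives a very short innings the same weight as a long one. The usual career strike rate is total runs divided by total balls faced, times 100. From the means in `describe()` (49.861345 runs and 53.470588 balls per innings over 238 innings, i.e. 11,867 runs off 12,726 balls), that comes to about 93.25. The difference on two hypothetical innings:

```python
import pandas as pd

# Hypothetical innings, for illustration only
toy = pd.DataFrame({'runs': [100, 2], 'bf': [100, 1]})
toy['sr'] = toy['runs'] / toy['bf'] * 100                # 100.0 and 200.0

mean_of_srs = toy['sr'].mean()                           # each innings weighted equally
career_sr = toy['runs'].sum() / toy['bf'].sum() * 100    # weighted by balls faced

print(round(mean_of_srs, 2), round(career_sr, 2))        # 150.0 100.99
```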


### 8. What is the maximum number of fours hit in a single match?

In [340]:
fs=df[df['fours']==df['fours'].max()]

In [341]:
print(f"Virat's most fours in a single match: {fs['fours'].iloc[0]}, on {fs['Date'].iloc[0]} against {fs['Opp'].iloc[0]} at {fs['Ground'].iloc[0]}")

Virat's most fours in a single match: 22, on 18-Mar-12 against Pakistan at Dhaka
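Boolean filtering keeps the original index labels, so `fs['fours'][0]` looks up the label 0, not the first row. That works here only because the 22-four innings happens to be row 0; `.iloc[0]` selects by position and works wherever the matching row sits. Illustrated on hypothetical data:

```python
import pandas as pd

# Hypothetical innings, for illustration only
toy = pd.DataFrame({'fours': [5, 22, 3], 'Opp': ['England', 'Pakistan', 'Ireland']})

top = toy[toy['fours'] == toy['fours'].max()]   # keeps the original index label (1)

print(top['fours'].iloc[0], top['Opp'].iloc[0])  # 22 Pakistan (position-based)
# top['fours'][0] would raise KeyError here: label 0 is not in the filtered frame
```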


### 9. What is the total number of sixes hit by Virat?

In [345]:
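# sum() adds up the sixes; value_counts().sum() would only count the 238 rows.
# At this point (execution count 345) Date is still text like '18-Mar-12', whose max is the
# alphabetically last string rather than the latest date, so parse it before taking the max.
last_date = pd.to_datetime(df['Date'], format='%d-%b-%y').max().date()
print(f'Virat has hit {df.sixes.sum()} total sixes till {last_date}')

Virat has hit 121 total sixes till 2020-02-11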
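`value_counts()` tabulates how many innings had 0, 1, 2, … sixes, so summing it returns the number of innings (238), not the number of sixes. `sum()` on the column adds up the sixes themselves; the `describe()` mean of 0.508403 over 238 innings implies a total of 121. On hypothetical data:

```python
import pandas as pd

# Hypothetical sixes per innings, for illustration only
sixes = pd.Series([1, 0, 0, 4, 2])

print(sixes.value_counts())        # how many innings had 0, 1, 2, 4 sixes
print(sixes.value_counts().sum())  # 5 -> just the number of innings
print(sixes.sum())                 # 7 -> total sixes hit
```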


### 10. How many times has the player scored 150 or more runs in a match?

In [359]:
v_150=df[df['runs']>=150]

In [361]:
print(f"Virat has scored 150+ runs {v_150.shape[0]} times, with a highest score of {v_150['runs'].max()}")

Virat has scored 150+ runs 4 times, with a highest score of 183
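`>= 150` also counts an innings of exactly 150, whereas "more than 150" strictly means `> 150`; the two counts differ only if such an innings exists. On hypothetical scores:

```python
import pandas as pd

# Hypothetical scores, for illustration only
runs = pd.Series([149, 150, 151, 183])

print((runs > 150).sum())    # 2 -> strictly more than 150
print((runs >= 150).sum())   # 3 -> 150 or more
```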
