# IPL Dataset Analysis

## Problem Statement
An IPL match raises several questions for anyone with limited knowledge of cricket, the game on which the league is based. This analysis explores what happens during a match and which factors lead a team to win.

## About the IPL
The Indian Premier League (IPL) is a professional T20 cricket league in India, contested every year during April and May by teams representing Indian cities. It is the most-attended cricket league in the world and ranks sixth among all sports leagues by attendance. Its teams include players from around the world, and the league is competitive and entertaining, with many close matches.

The IPL and other cricket-related datasets are available at [cricsheet.org](https://cricsheet.org/). Feel free to visit the website and explore the data yourself; exploring new sources of data is one of the more interesting things a data scientist gets to do.

## About the dataset:
Snapshot of the data you will be working on:<br>
<br>
The dataset has 1452 data points and 23 features<br>

|Features|Description|
|-----|-----|
|match_code|Code identifying an individual match|
|date|Date on which the match was played|
|city|City where the match was played|
|team1|First team in the match|
|team2|Second team in the match|
|toss_winner|Team that won the toss|
|toss_decision|Decision taken by the toss winner (e.g. bat)|
|winner|Team that won the match|
|win_type|How the match was won (by runs, by wickets, etc.)|
|win_margin|Margin by which the match was won|
|inning|Innings number (1st or 2nd)|
|delivery|Delivery identifier in over.ball form (e.g. 7.6)|
|batting_team|Team currently batting|
|batsman|Batsman on strike|
|non_striker|Batsman at the non-striker's end|
|bowler|Current bowler|
|runs|Runs scored by the batsman on the delivery|
|extras|Extra runs conceded on the delivery|
|total|Total runs on the delivery (runs plus extras)|
|extras_type|Type of extra (wide, no-ball, leg bye, etc.)|
|player_out|Player dismissed on the delivery|
|wicket_kind|How the player was dismissed (caught, lbw, bowled, etc.)|
|wicket_fielders|Fielder(s) involved in the dismissal, e.g. the catcher|
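
The cells in this notebook address columns by position (for example `data_ipl[:, 3]` for `team1`). As a small sketch, and assuming the CSV columns appear in the same order as the rows of the table above, a name-to-index lookup can make those positions easier to read. `COLUMNS` and `COL` are hypothetical names introduced here, not part of the original notebook.

```python
# Column names in the order listed in the feature table (assumed to match the CSV).
COLUMNS = [
    "match_code", "date", "city", "team1", "team2", "toss_winner",
    "toss_decision", "winner", "win_type", "win_margin", "inning",
    "delivery", "batting_team", "batsman", "non_striker", "bowler",
    "runs", "extras", "total", "extras_type", "player_out",
    "wicket_kind", "wicket_fielders",
]

# Hypothetical helper: map each column name to its positional index.
COL = {name: index for index, name in enumerate(COLUMNS)}

print(COL["team1"], COL["extras"], COL["player_out"])  # prints: 3 17 20
```

With such a lookup, `data_ipl[:, COL["team1"]]` would select the same column as `data_ipl[:, 3]`.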


### Analysing the data with NumPy

### Read the data with NumPy

In [2]:
import numpy as np
# Data does not always come as CSV; other file formats exist too.
# This exercise helps you learn how to read such files with NumPy.
path = 'ipl_matches_small.csv'
data_ipl = np.genfromtxt(path, delimiter=',', skip_header=1, dtype=str)



In [3]:
data_ipl

array([['392203', '2009-05-01', 'East London', ..., '', '', ''],
       ['392203', '2009-05-01', 'East London', ..., '', '', ''],
       ['392203', '2009-05-01', 'East London', ..., '', '', ''],
       ...,
       ['335987', '2008-04-21', 'Jaipur', ..., '', '', ''],
       ['335987', '2008-04-21', 'Jaipur', ..., '', '', ''],
       ['335987', '2008-04-21', 'Jaipur', ..., '', '', '']], dtype='<U21')
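
To see what `np.genfromtxt` does with these arguments without the real file at hand, the sketch below reads a tiny in-memory CSV. The three-column sample is made up for illustration; only the `delimiter`, `skip_header`, and `dtype` arguments mirror the cell above.

```python
import io

import numpy as np

# Hypothetical three-column sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "match_code,city,runs\n"
    "392203,East London,4\n"
    "392203,East London,0\n"
    "335987,Jaipur,6\n"
)

# skip_header=1 drops the header row; dtype=str keeps every field as text,
# so numeric columns must be converted explicitly before doing arithmetic.
sample = np.genfromtxt(sample_csv, delimiter=",", skip_header=1, dtype=str)

print(sample.shape)   # (rows, columns)
print(sample[:, 1])   # second column: city
```

Because every field is loaded as a string, the resulting array has a single Unicode dtype, which is why the output above ends in `dtype='<U21'`.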

In [4]:
data_ipl[:,3]

array(['Kolkata Knight Riders', 'Kolkata Knight Riders',
       'Kolkata Knight Riders', ..., 'Rajasthan Royals',
       'Rajasthan Royals', 'Rajasthan Royals'], dtype='<U21')

### Calculate the number of unique matches in the provided dataset

In [5]:
set(data_ipl[:,0])

{'335987', '392197', '392203', '392212', '501226', '729297'}

In [6]:
# Knowing the total number of matches puts the later statistics in context.
len(set(data_ipl[:,0]))

6
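
As an alternative to Python's `set`, `np.unique` stays inside NumPy and returns the distinct values as a sorted array. A minimal sketch on a made-up `match_code` column:

```python
import numpy as np

# Hypothetical match_code column: one entry per delivery, so codes repeat.
match_codes = np.array(["392203", "392203", "335987", "392197", "335987"])

unique_codes = np.unique(match_codes)  # sorted array of distinct values
n_matches = unique_codes.size

print(unique_codes)
print(n_matches)
```

On the real data, `np.unique(data_ipl[:, 0]).size` should give the same count as `len(set(data_ipl[:, 0]))`.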

### Find the set of all unique teams that played in the matches in the data set.

In [7]:
# This exercise identifies all the teams that played in the matches in this dataset.
team1_set = set(data_ipl[:, 3])
team2_set = set(data_ipl[:, 4])
unique_teams = team1_set.union(team2_set)

In [8]:
unique_teams

{'Chennai Super Kings',
 'Deccan Chargers',
 'Kings XI Punjab',
 'Kolkata Knight Riders',
 'Mumbai Indians',
 'Pune Warriors',
 'Rajasthan Royals'}
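
The same union can be sketched with `np.union1d`, which returns the sorted, de-duplicated values found in either array. The two short columns below are illustrative samples, not the real `team1` and `team2` columns.

```python
import numpy as np

# Hypothetical team1 / team2 columns (one entry per delivery).
team1 = np.array(["Kolkata Knight Riders", "Deccan Chargers", "Kolkata Knight Riders"])
team2 = np.array(["Mumbai Indians", "Mumbai Indians", "Rajasthan Royals"])

# union1d de-duplicates each input and merges them into one sorted array.
teams = np.union1d(team1, team2)

print(teams)
print(len(teams))
```

Unlike a `set`, the result has a stable, sorted order, which can be convenient when the team list is printed or compared between runs.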

### Find the sum of all extras across all deliveries in all matches in the dataset

In [9]:
# An exercise to make you familiar with indexing and slicing within the data.
extras = data_ipl[:, 17]

In [10]:
type(extras)

numpy.ndarray

In [27]:
extras_int = extras.astype(np.int16)
extras_int.sum()

88
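
The conversion step matters because the array was loaded with `dtype=str`, so the column has to be cast to integers before it can be summed. A small sketch on a made-up `extras` column shows the cast, the total, and a count of deliveries that conceded at least one extra:

```python
import numpy as np

# Hypothetical extras column, stored as text like the real data.
extras_col = np.array(["0", "1", "0", "4", "0", "1"])

extras_int = extras_col.astype(int)      # cast text to integers
total_extras = extras_int.sum()          # total extra runs
deliveries_with_extras = np.count_nonzero(extras_int)  # deliveries with any extra

print(total_extras, deliveries_with_extras)
```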

### Get the array of all delivery numbers when a given player got out. Also mention the wicket type.

In [13]:
wicket_filter = (data_ipl[:, 20] == 'SR Tendulkar')
wickets_arr = data_ipl[wicket_filter]

In [14]:
wickets_arr

array([['392203', '2009-05-01', 'East London', 'Kolkata Knight Riders',
        'Mumbai Indians', 'Mumbai Indians', 'bat', 'Mumbai Indians',
        'runs', '9.0', '1', '7.6', 'Mumbai Indians', 'SR Tendulkar',
        'AM Nayar', 'AB Agarkar', '0', '0', '0', '', 'SR Tendulkar',
        'caught', 'BB McCullum'],
       ['392197', '2009-04-27', 'Port Elizabeth',
        'Kolkata Knight Riders', 'Mumbai Indians', 'Mumbai Indians',
        'bat', 'Mumbai Indians', 'runs', '92.0', '1', '12.2',
        'Mumbai Indians', 'SR Tendulkar', 'ST Jayasuriya', 'LR Shukla',
        '0', '0', '0', '', 'SR Tendulkar', 'lbw', ''],
       ['392212', '2009-05-06', 'Centurion', 'Deccan Chargers',
        'Mumbai Indians', 'Deccan Chargers', 'bat', 'Deccan Chargers',
        'runs', '19.0', '2', '1.5', 'Mumbai Indians', 'SR Tendulkar',
        'PR Shah', 'RP Singh', '0', '0', '0', '', 'SR Tendulkar',
        'bowled', '']], dtype='<U21')

In [15]:
wickets_arr[:, 11]

array(['7.6', '12.2', '1.5'], dtype='<U21')

In [16]:
wickets_arr[:, 21]

array(['caught', 'lbw', 'bowled'], dtype='<U21')
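
The filter-then-select pattern above can be wrapped in a small function so that it works for any player. This is a sketch only: `dismissals` is a hypothetical helper, and the five-row sample uses a reduced three-column layout (delivery, player_out, wicket_kind) rather than the full 23 columns.

```python
import numpy as np

# Hypothetical reduced sample: delivery, player_out, wicket_kind.
rows = np.array([
    ["7.5", "", ""],
    ["7.6", "SR Tendulkar", "caught"],
    ["12.1", "", ""],
    ["12.2", "SR Tendulkar", "lbw"],
    ["3.4", "ST Jayasuriya", "bowled"],
])


def dismissals(data, player, delivery_col=0, player_out_col=1, wicket_kind_col=2):
    """Return (delivery, wicket_kind) pairs for deliveries on which `player` got out."""
    mask = data[:, player_out_col] == player          # boolean mask over rows
    return data[mask][:, [delivery_col, wicket_kind_col]]


result = dismissals(rows, "SR Tendulkar")
print(result)
```

For the full dataset the column arguments would be 11, 20, and 21, matching the indices used in the cells above.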

### In how many matches has the team `Mumbai Indians` won the toss?

In [17]:
# This exercise helps you gather statistics on one particular team.
team_records = data_ipl[data_ipl[:, 5] == 'Mumbai Indians']

In [18]:
type(team_records)

numpy.ndarray

In [19]:
unique_matches = set(team_records[:, 0])
len(unique_matches)

2
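
Each match spans many delivery rows, so the rows must be collapsed to one entry per match before counting. The sketch below generalises the idea to every team at once, using a made-up two-column sample (match_code, toss_winner); the variable names are illustrative.

```python
from collections import Counter

import numpy as np

# Hypothetical sample: match_code and toss_winner, repeated once per delivery.
records = np.array([
    ["392203", "Mumbai Indians"],
    ["392203", "Mumbai Indians"],
    ["392197", "Mumbai Indians"],
    ["392212", "Deccan Chargers"],
    ["392212", "Deccan Chargers"],
])

# Keep one toss winner per match code, so repeated delivery rows collapse.
toss_by_match = {str(code): str(winner) for code, winner in records}

# Count how many matches each team won the toss in.
toss_counts = Counter(toss_by_match.values())

print(toss_counts)
```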

### Create a filter that keeps only those records where the batsman scored 6 runs. Also, who has scored the maximum number of sixes overall?

In [20]:
len(data_ipl[data_ipl[:, 16].astype(int) == 6])

59

In [21]:
# An exercise to find the most aggressive, or perhaps the highest-scoring, player
sixes = data_ipl[data_ipl[:, 16].astype(np.int16) == 6]

In [22]:
sixes[:, 13]

array(['SR Tendulkar', 'SR Tendulkar', 'JP Duminy', 'JP Duminy',
       'JP Duminy', 'JP Duminy', 'BJ Hodge', 'BJ Hodge', 'BJ Hodge',
       'SR Tendulkar', 'SR Tendulkar', 'ST Jayasuriya', 'ST Jayasuriya',
       'SR Tendulkar', 'ST Jayasuriya', 'ST Jayasuriya', 'SR Tendulkar',
       'Harbhajan Singh', 'Harbhajan Singh', 'CH Gayle', 'SC Ganguly',
       'TL Suman', 'TL Suman', 'AC Gilchrist', 'RG Sharma', 'DR Smith',
       'Y Venugopal Rao', 'PR Shah', 'PR Shah', 'RR Raje', 'DR Smith',
       'DR Smith', 'DR Smith', 'SV Samson', 'SV Samson', 'SR Watson',
       'R Bhatia', 'DS Kulkarni', 'DS Kulkarni', 'MEK Hussey', 'M Vijay',
       'MS Dhoni', 'S Badrinath', 'JD Ryder', 'M Manhas', 'K Goel',
       'K Goel', 'KC Sangakkara', 'Yuvraj Singh', 'Yuvraj Singh',
       'Yuvraj Singh', 'IK Pathan', 'Kamran Akmal', 'SR Watson',
       'SR Watson', 'SR Watson', 'SR Watson', 'SR Watson', 'RA Jadeja'],
      dtype='<U21')

In [23]:
from collections import Counter
most_sixes_scored = Counter(sixes[:, 13])

In [24]:
most_sixes_scored

Counter({'SR Tendulkar': 6,
         'JP Duminy': 4,
         'BJ Hodge': 3,
         'ST Jayasuriya': 4,
         'Harbhajan Singh': 2,
         'CH Gayle': 1,
         'SC Ganguly': 1,
         'TL Suman': 2,
         'AC Gilchrist': 1,
         'RG Sharma': 1,
         'DR Smith': 4,
         'Y Venugopal Rao': 1,
         'PR Shah': 2,
         'RR Raje': 1,
         'SV Samson': 2,
         'SR Watson': 6,
         'R Bhatia': 1,
         'DS Kulkarni': 2,
         'MEK Hussey': 1,
         'M Vijay': 1,
         'MS Dhoni': 1,
         'S Badrinath': 1,
         'JD Ryder': 1,
         'M Manhas': 1,
         'K Goel': 2,
         'KC Sangakkara': 1,
         'Yuvraj Singh': 3,
         'IK Pathan': 1,
         'Kamran Akmal': 1,
         'RA Jadeja': 1})

In [25]:
most_sixes_scored.most_common(3)

[('SR Tendulkar', 6), ('SR Watson', 6), ('JP Duminy', 4)]
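
The result above shows a tie at the top, which a single "maximum" lookup would hide. As a NumPy-only sketch, `np.unique` with `return_counts=True` gives the per-player counts, and comparing against the maximum keeps every tied leader. The `runs` and `batsmen` arrays and the player names are made up for illustration.

```python
import numpy as np

# Hypothetical runs and batsman columns, aligned row by row.
runs = np.array(["6", "1", "6", "4", "6", "6", "0"])
batsmen = np.array([
    "Player A", "Player A", "Player B", "Player B",
    "Player A", "Player B", "Player C",
])

six_mask = runs.astype(int) == 6                 # rows where the batsman hit a six
names, counts = np.unique(batsmen[six_mask], return_counts=True)

# Keep every player whose count equals the maximum, so ties are not dropped.
top_hitters = names[counts == counts.max()]

print(dict(zip(names.tolist(), counts.tolist())))
print(top_hitters)
```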