# IPL Dataset Analysis

## Problem Statement
We want to understand what happens during an IPL match, which raises several questions given our limited knowledge of cricket, the game on which it is based. This analysis explores which factors led a team to win and how much each of them matters.

## About the Dataset:
The Indian Premier League (IPL) is a professional T20 cricket league in India, contested every year during April and May by teams representing Indian cities. It is the most-attended cricket league in the world and ranks sixth among all sports leagues. Its teams field players from around the world, and the league is competitive and entertaining, with many close matches.

The IPL and other cricket-related datasets are available at [cricsheet.org](https://cricsheet.org/). Feel free to visit the website and explore the data yourself; exploring new sources of data is one of the more interesting things a data scientist gets to do.

## About the dataset:
The dataset you will be working on has 1452 data points and 23 features:<br>

|Features|Description|
|-----|-----|
|match_code|Code identifying an individual match|
|date|Date on which the match was played|
|city|Location where the match was played|
|team1|First team in the match|
|team2|Second team in the match|
|toss_winner|Which of the two teams won the toss|
|toss_decision|Decision taken by the toss winner|
|winner|Winner of the match|
|win_type|How the team won (by wickets, runs, etc.)|
|win_margin|Margin by which the team won|
|inning|Inning (1st or 2nd)|
|delivery|Ball delivery|
|batting_team|Team currently batting|
|batsman|Batsman currently on strike|
|non_striker|Batsman at the non-striker's end|
|bowler|Current bowler|
|runs|Runs scored|
|extras|Extra runs scored|
|total|Total runs scored on that delivery, including runs and extras|
|extras_type|Type of extra (wide, no ball, leg bye, etc.)|
|player_out|Player who got out|
|wicket_kind|How the player got out|
|wicket_fielders|Fielder who took the catch|


### Analysing data using the NumPy module

### Read the data using the NumPy module

In [1]:
import numpy as np
# Not every dataset comes as a tidy CSV; there are other file formats too.
# This exercise will help you deal with other file formats and how to read them.
path = './data/ipl_matches_small.csv'
data_ipl = np.genfromtxt(path, delimiter=',', skip_header=1, dtype=str)
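Positional column indices such as `3` or `17` are easy to mix up. One way to make them less opaque is to keep the header row and turn it into a name-to-index lookup. The sketch below is a minimal illustration on a made-up three-column sample read from an in-memory string; `sample_csv`, `col`, and the team names are assumptions for illustration, not part of the dataset.

```python
import io
import numpy as np

# Hypothetical three-column sample; the values are invented for illustration.
sample_csv = io.StringIO(
    "match_code,team1,runs\n"
    "1001,Team A,4\n"
    "1001,Team A,0\n"
    "1002,Team B,6\n"
)

# Read everything as strings and do NOT skip the header, so it survives as row 0.
raw = np.genfromtxt(sample_csv, delimiter=",", dtype=str)
header, rows = raw[0], raw[1:]

# Map each column name to its position so columns can be selected by name.
col = {name: i for i, name in enumerate(header)}

# Select the 'runs' column by name and convert it from strings to integers.
runs = rows[:, col["runs"]].astype(int)
print(col)
print(runs.sum())
```

With such a lookup, an expression like `rows[:, col["team1"]]` states its intent without a reader having to count columns in the feature table.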



In [2]:
data_ipl[:,3]

array(['Kolkata Knight Riders', 'Kolkata Knight Riders',
       'Kolkata Knight Riders', ..., 'Rajasthan Royals',
       'Rajasthan Royals', 'Rajasthan Royals'], dtype='<U21')

### Calculate the number of unique matches in the provided dataset

In [3]:
# We need to know how many matches were held in total so that further statistics can be read with that in mind.
print('Unique no. of matches is ',len(np.unique(data_ipl[:,0])))

Unique no. of matches is  6
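Because every row is one delivery, the same `np.unique` call can also report how many deliveries each match contributes. The following is a small sketch on invented match codes (the `match_codes` array is an assumption, not data from the file), using the `return_counts=True` option.

```python
import numpy as np

# Invented match codes, one entry per delivery; not taken from the real file.
match_codes = np.array(["1001", "1001", "1002", "1003", "1003", "1003"])

# np.unique returns the sorted distinct values; return_counts adds their frequencies.
codes, deliveries = np.unique(match_codes, return_counts=True)

n_matches = len(codes)
per_match = dict(zip(codes.tolist(), deliveries.tolist()))
print(n_matches)
print(per_match)
```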


### Find the set of all unique teams that played in the matches in the data set.

In [4]:
# This exercise identifies every team that played in the matches in the dataset.
teams = np.unique(data_ipl[:, 3:5])
print('Unique teams: ', teams)

Unique teams:  ['Chennai Super Kings' 'Deccan Chargers' 'Kings XI Punjab'
 'Kolkata Knight Riders' 'Mumbai Indians' 'Pune Warriors'
 'Rajasthan Royals']
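The slice `3:5` selects two columns at once, and `np.unique` flattens its input by default, so teams are collected from both `team1` and `team2` in a single call. A minimal sketch with placeholder team names (the `fixtures` array is made up for illustration):

```python
import numpy as np

# Placeholder fixtures: each row is (team1, team2); names are invented.
fixtures = np.array([
    ["Team A", "Team B"],
    ["Team C", "Team A"],
    ["Team B", "Team C"],
])

# With no axis argument, np.unique flattens the 2-D input and
# returns a sorted 1-D array of the distinct values.
teams = np.unique(fixtures)
print(teams.ndim, teams)
```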


### Find the sum of all extras across all deliveries in the dataset

In [40]:
# An exercise to get familiar with indexing and slicing within the data.
# Use the built-in int: the np.int alias was removed in NumPy 1.24.
sum_of_extra = np.sum(data_ipl[:,17].astype(int))
print('Sum of all extras is ',sum_of_extra)

Sum of all extras is  88
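Since the file is loaded with `dtype=str`, numeric columns have to be converted before any arithmetic. The sketch below shows that conversion on a handful of invented rows and, as one possible extension, restricts the sum to a single extras type with a boolean mask. The `rows` array and its values are assumptions for illustration only.

```python
import numpy as np

# Invented rows of (extras, extras_type); the values are illustrative only.
rows = np.array([
    ["0", ""],
    ["1", "wides"],
    ["2", "legbyes"],
    ["1", "wides"],
    ["0", ""],
])

# Convert the string column to integers with the built-in int type.
extras = rows[:, 0].astype(int)
total_extras = extras.sum()

# Keep only deliveries whose type is 'wides', then sum their extras.
wides_total = extras[rows[:, 1] == "wides"].sum()
print(total_extras, wides_total)
```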


### Get the array of all delivery numbers when a given player got out. Also mention the wicket type.

In [13]:
wicket_details = data_ipl[np.ix_(data_ipl[:, 21] != '', (11, 20, 21))]

In [17]:
print('Array of all delivery numbers when a player got out and wicket type is: ',wicket_details)
print()

Array of all delivery numbers when a player got out and wicket type is:  [['3.2' 'ST Jayasuriya' 'caught']
 ['5.5' 'Harbhajan Singh' 'caught']
 ['7.6' 'SR Tendulkar' 'caught']
 ['11.4' 'AM Nayar' 'bowled']
 ['15.6' 'GR Napier' 'caught']
 ['18.6' 'AM Rahane' 'caught']
 ['0.4' 'SC Ganguly' 'bowled']
 ['2.2' 'CH Gayle' 'bowled']
 ['14.5' 'MN van Wyk' 'caught']
 ['17.2' 'LR Shukla' 'bowled']
 ['18.6' 'BJ Hodge' 'run out']
 ['19.3' 'BB McCullum' 'caught']
 ['12.2' 'SR Tendulkar' 'lbw']
 ['13.5' 'Harbhajan Singh' 'caught']
 ['14.4' 'ST Jayasuriya' 'caught']
 ['15.1' 'AM Nayar' 'run out']
 ['16.6' 'DJ Bravo' 'caught']
 ['18.5' 'S Dhawan' 'caught']
 ['1.7' 'BB McCullum' 'caught']
 ['2.7' 'CH Gayle' 'caught']
 ['10.2' 'BJ Hodge' 'bowled']
 ['12.1' 'SC Ganguly' 'caught']
 ['12.3' 'AN Ghosh' 'caught']
 ['13.2' 'Yashpal Singh' 'caught']
 ['14.5' 'LR Shukla' 'caught']
 ['15.1' 'BAW Mendis' 'bowled']
 ['15.2' 'AB Dinda' 'bowled']
 ['1.5' 'HH Gibbs' 'caught']
 ['5.3' 'TL Suman' 'caught']
 ['9.4' 'AC 

### In how many matches has the team `Mumbai Indians` won the toss?

In [36]:
# This exercise gathers statistics for one particular team.
# Column 5 is toss_winner and column 0 is match_code.
toss_wins = np.unique(data_ipl[data_ipl[:, 5] == 'Mumbai Indians', 0])
print(len(toss_wins), "times 'Mumbai Indians' has won the toss")

2 times 'Mumbai Indians' has won the toss
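Row filtering of this kind can be written either with `np.ix_` or with a plain boolean mask. The sketch below, on invented rows with placeholder team names, suggests that both forms select the same match codes, and that deduplicating those codes is what turns a count of deliveries into a count of matches.

```python
import numpy as np

# Invented rows of (match_code, toss_winner); one row per delivery.
rows = np.array([
    ["1001", "Team A"],
    ["1001", "Team A"],
    ["1002", "Team B"],
    ["1003", "Team A"],
    ["1003", "Team A"],
])

# Boolean mask: True for deliveries in matches where Team A won the toss.
won_toss = rows[:, 1] == "Team A"

# Two equivalent ways to pull the match_code column for those rows.
via_mask = rows[won_toss, 0]
via_ix = rows[np.ix_(won_toss, (0,))][:, 0]

# Deduplicate, because each match appears once per delivery.
n_toss_wins = len(np.unique(via_mask))
print(n_toss_wins)
```

The mask form is shorter when only one column is needed; `np.ix_` is convenient when several columns are selected together.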


### Create a filter that keeps only the records where the batsman scored 6 runs. Who has scored the most sixes overall?

In [46]:
# An exercise to find the most aggressive (or highest-scoring) batsman
most_sixes = data_ipl[np.ix_(data_ipl[:, 16] == '6', (13, 16))]
print('Batsman scored 6 runs:',most_sixes)

Batsman scored 6 runs: [['SR Tendulkar' '6']
 ['SR Tendulkar' '6']
 ['JP Duminy' '6']
 ['JP Duminy' '6']
 ['JP Duminy' '6']
 ['JP Duminy' '6']
 ['BJ Hodge' '6']
 ['BJ Hodge' '6']
 ['BJ Hodge' '6']
 ['SR Tendulkar' '6']
 ['SR Tendulkar' '6']
 ['ST Jayasuriya' '6']
 ['ST Jayasuriya' '6']
 ['SR Tendulkar' '6']
 ['ST Jayasuriya' '6']
 ['ST Jayasuriya' '6']
 ['SR Tendulkar' '6']
 ['Harbhajan Singh' '6']
 ['Harbhajan Singh' '6']
 ['CH Gayle' '6']
 ['SC Ganguly' '6']
 ['TL Suman' '6']
 ['TL Suman' '6']
 ['AC Gilchrist' '6']
 ['RG Sharma' '6']
 ['DR Smith' '6']
 ['Y Venugopal Rao' '6']
 ['PR Shah' '6']
 ['PR Shah' '6']
 ['RR Raje' '6']
 ['DR Smith' '6']
 ['DR Smith' '6']
 ['DR Smith' '6']
 ['SV Samson' '6']
 ['SV Samson' '6']
 ['SR Watson' '6']
 ['R Bhatia' '6']
 ['DS Kulkarni' '6']
 ['DS Kulkarni' '6']
 ['MEK Hussey' '6']
 ['M Vijay' '6']
 ['MS Dhoni' '6']
 ['S Badrinath' '6']
 ['JD Ryder' '6']
 ['M Manhas' '6']
 ['K Goel' '6']
 ['K Goel' '6']
 ['KC Sangakkara' '6']
 ['Yuvraj Singh' '6']
 ['Y

In [62]:
# Count sixes per batsman, then sort the (name, count) pairs by count, highest first.
unique, counts = np.unique(most_sixes[:, 0], return_counts=True)
player = sorted(dict(zip(unique, counts)).items(), reverse=True, key=lambda x: x[1])
print(player[0][0],'has scored the most sixes')

SR Tendulkar has scored the most sixes
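The same ranking can be obtained without building and sorting a dictionary: `np.argmax` over the counts returned by `np.unique` gives the position of the most frequent name directly. This is an alternative sketch on placeholder names (`batsmen_with_six` is invented), not a replacement for the approach above.

```python
import numpy as np

# Placeholder names, one entry per six hit; not taken from the real dataset.
batsmen_with_six = np.array(
    ["Player X", "Player Y", "Player X", "Player Z", "Player X", "Player Y"]
)

# Distinct names (sorted) and how often each one appears.
names, counts = np.unique(batsmen_with_six, return_counts=True)

# argmax returns the index of the first maximum, i.e. the top six-hitter.
top_index = np.argmax(counts)
top_batsman = names[top_index]
top_sixes = counts[top_index]
print(top_batsman, top_sixes)
```

When two batsmen are tied, `np.argmax` reports the one that comes first in sorted name order.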
