# Probability Concepts

In [1]:
import pandas as pd

In [3]:
cric_df = pd.read_csv('https://raw.githubusercontent.com/tkseneee/Dataset/master/crick.csv') 
print(cric_df.shape)
cric_df.head()

(3932, 7)


| | Scorecard | Team 1 | Team 2 | Winner | Margin | Ground | Match Date |
|---|---|---|---|---|---|---|---|
| 0 | ODI # 1 | Australia | England | Australia | 5 wickets | Melbourne | Jan 5, 1971 |
| 1 | ODI # 2 | England | Australia | England | 6 wickets | Manchester | Aug 24, 1972 |
| 2 | ODI # 3 | England | Australia | Australia | 5 wickets | Lord's | Aug 26, 1972 |
| 3 | ODI # 4 | England | Australia | England | 2 wickets | Birmingham | Aug 28, 1972 |
| 4 | ODI # 5 | New Zealand | Pakistan | New Zealand | 22 runs | Christchurch | Feb 11, 1973 |


## Handle Missing Values

In [4]:
cric_df.isna().sum()

Scorecard       0
Team 1          0
Team 2          0
Winner          0
Margin        179
Ground          0
Match Date      0
dtype: int64

In [5]:
cric_df=cric_df.dropna()
cric_df.shape

(3753, 7)

## Problem Statement

Is the probability of India winning a match higher when India bats first or when India bats last?

## Solution

**India batted first** (create a column `is_ind_bat_first` with values 1 and 0). India batted first if either:
  * India won by runs (e.g. India won by 5 runs) => Winner = India and Margin contains 'run'
         OR
  * the opponent won by wickets (e.g. opponent won by 4 wickets) => Winner != India and Margin contains 'wicket'
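
The rule above can be sketched as a small standalone function. This is an illustrative sketch, not the notebook's own cell: the function name `india_batted_first` and the sample margins are assumptions, and it presumes every row is a match India played with a margin expressed in runs or wickets.

```python
def india_batted_first(winner: str, margin: str) -> int:
    """Return 1 if the result implies India batted first, else 0.

    Assumes the row is a match India played and that `margin`
    is a string such as '5 runs' or '4 wickets'.
    """
    # A win by runs means the winner defended a total, i.e. batted first.
    india_defended = winner == "India" and "run" in margin
    # A win by wickets means the winner chased, so the loser batted first.
    opponent_chased = winner != "India" and "wicket" in margin
    return int(india_defended or opponent_chased)


# Hypothetical results, one per combination of winner and margin type.
print(india_batted_first("India", "5 runs"))         # India defended a total
print(india_batted_first("Australia", "4 wickets"))  # opponent chased India's total
print(india_batted_first("India", "6 wickets"))      # India chased
print(india_batted_first("England", "22 runs"))      # opponent defended
```

Any margin that mentions neither runs nor wickets falls through to 0 under this rule, so such rows would be counted as "batted last".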
  
---

**Matches India batted first** => sum of the `is_ind_bat_first` column values

**Matches India batted first and Won** => Matches India batted first and winner = India

**Probability India Won Given India Batted First** = Matches India batted first and Won / Matches India batted first = 0.51
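
The same ratio can be written as a conditional probability. The symbols are introduced here for illustration and do not appear in the original: $W$ is the event "India won", $F$ is the event "India batted first", and $N(\cdot)$ counts matches played by India.

$$
P(W \mid F) = \frac{P(W \cap F)}{P(F)} = \frac{N(W \cap F)}{N(F)} \approx 0.51
$$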

---

**Matches India batted last** => Matches played by India - Matches India Batted First

**Matches India batted last and Won** => Matches India batted last and winner = India 

**Probability India Won Given India Batted Last** = Matches India batted last and Won / Matches India batted last = 0.57

**Probability India Wins** = Total Matches India Won / Total Matches Played By India = 0.54
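
As a rough consistency check, the three reported figures can be related through the law of total probability, P(win) = P(win | first) · P(first) + P(win | last) · P(last). The sketch below uses only the rounded values quoted above, so the implied share of matches in which India batted first is approximate; the variable names are illustrative.

```python
# Rounded probabilities as reported in this notebook.
p_win = 0.54        # P(India wins)
p_win_first = 0.51  # P(India wins | India batted first)
p_win_last = 0.57   # P(India wins | India batted last)

# Solve p_win = p_win_first * p_first + p_win_last * (1 - p_first) for p_first.
p_first = (p_win_last - p_win) / (p_win_last - p_win_first)

# Recombine the two conditional probabilities to confirm they give p_win back.
recombined = p_win_first * p_first + p_win_last * (1 - p_first)

print(round(p_first, 2))
print(round(recombined, 2))
```

With these rounded inputs the overall win probability sits midway between the two conditional values, which suggests India batted first in roughly half of its matches.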

---

## Observation

India's probability of winning is higher when India bats last (0.57) than when India bats first (0.51).

In [48]:
# total matches india won
matches_ind_won = cric_df[cric_df.Winner=='India'].shape[0]

# total matches india played
matches_ind_played_df = cric_df[(cric_df['Team 1']=='India')|(cric_df['Team 2']=='India')].copy()
matches_ind_played = matches_ind_played_df.shape[0]

# total matches india batted first
is_ind_bat_first = lambda winner,margin: 1 if (winner=='India' and 'run' in margin) or ((winner!='India') and 'wicket' in margin) else 0
# syntax to use apply method on more than one column
matches_ind_played_df['is_ind_bat_first'] = matches_ind_played_df.apply(lambda x: is_ind_bat_first(x.Winner, x.Margin), axis=1)
matches_ind_bat_first = matches_ind_played_df.is_ind_bat_first.sum()

# total matches India Batted first and won
matches_ind_bat_first_win = matches_ind_played_df[(matches_ind_played_df.is_ind_bat_first==1)&(matches_ind_played_df.Winner=='India')].shape[0]


# total matches india batted last
matches_ind_bat_last = matches_ind_played_df.shape[0]-matches_ind_bat_first
# total matches India Batted last and won
matches_ind_bat_last_win = matches_ind_played_df[(matches_ind_played_df.is_ind_bat_first==0) & (matches_ind_played_df.Winner=='India')].shape[0]

ind_win_prob = round(matches_ind_won/matches_ind_played,2)
ind_bat_first_win_prob = round(matches_ind_bat_first_win/matches_ind_bat_first,2)
ind_bat_last_win_prob = round(matches_ind_bat_last_win/matches_ind_bat_last,2)

# '{:.0%}' multiplies the fraction by 100, so 0.54 is printed as 54%
print("India Winning Probability = {:.0%}".format(ind_win_prob))
print("India Winning Probability given India batted first = {:.0%}".format(ind_bat_first_win_prob))
print("India Winning Probability given India batted last = {:.0%}".format(ind_bat_last_win_prob))

India Winning Probability = 54%
India Winning Probability given India batted first = 51%
India Winning Probability given India batted last = 57%
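
The row-wise `apply` above could also be expressed with vectorized string operations and a `groupby`. The following is a minimal sketch on a tiny hand-made DataFrame, not the real dataset: the `sample` rows and the names `innings` and `win_rate` are assumptions made for illustration, and every row is presumed to be a match India played.

```python
import pandas as pd

# Hypothetical results for matches India played (not taken from crick.csv).
sample = pd.DataFrame({
    "Winner": ["India", "India", "Australia", "England", "India"],
    "Margin": ["5 runs", "4 wickets", "3 wickets", "20 runs", "7 wickets"],
})

india_won = sample["Winner"].eq("India")
by_runs = sample["Margin"].str.contains("run")
by_wickets = sample["Margin"].str.contains("wicket")

# India batted first if it won by runs or the opponent won by wickets.
bat_first = (india_won & by_runs) | (~india_won & by_wickets)

# Label each match, then average the win indicator within each group.
innings = bat_first.map({True: "bat_first", False: "bat_last"})
win_rate = india_won.astype(int).groupby(innings).mean()

print(win_rate)
```

Grouping on a labelled key gives both conditional win rates in one step, at the cost of being slightly less explicit than the separate counts used in the cell above.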
