## Introduction to Basic and Conditional Probability

## Basic Probability

To calculate basic probabilities from IPL (Indian Premier League) match data in Python, load the data with the pandas library and count outcomes. Each example below estimates a probability as a relative frequency: the number of matches in which an event occurred divided by the total number of matches.

In [4]:
# importing the libraries
import pandas as pd
import numpy as np

In [5]:
# reading the dataset
df=pd.read_excel("matches.xlsx")

# checking the first five rows
df.head()

Unnamed: 0,id,Season,city,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue,umpire1,umpire2,umpire3
0,1,IPL-2017,Hyderabad,2017-05-04 00:00:00,Sunrisers Hyderabad,Royal Challengers Bangalore,Royal Challengers Bangalore,field,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
1,2,IPL-2017,Pune,2017-06-04 00:00:00,Mumbai Indians,Rising Pune Supergiant,Rising Pune Supergiant,field,normal,0,Rising Pune Supergiant,0,7,SPD Smith,Maharashtra Cricket Association Stadium,A Nand Kishore,S Ravi,
2,3,IPL-2017,Rajkot,2017-07-04 00:00:00,Gujarat Lions,Kolkata Knight Riders,Kolkata Knight Riders,field,normal,0,Kolkata Knight Riders,0,10,CA Lynn,Saurashtra Cricket Association Stadium,Nitin Menon,CK Nandan,
3,4,IPL-2017,Indore,2017-08-04 00:00:00,Rising Pune Supergiant,Kings XI Punjab,Kings XI Punjab,field,normal,0,Kings XI Punjab,0,6,GJ Maxwell,Holkar Cricket Stadium,AK Chaudhary,C Shamshuddin,
4,5,IPL-2017,Bangalore,2017-08-04 00:00:00,Royal Challengers Bangalore,Delhi Daredevils,Royal Challengers Bangalore,bat,normal,0,Royal Challengers Bangalore,15,0,KM Jadhav,M Chinnaswamy Stadium,,,


In [6]:
# checking the columns present in the data
df.columns

Index(['id', 'Season', 'city', 'date', 'team1', 'team2', 'toss_winner',
       'toss_decision', 'result', 'dl_applied', 'winner', 'win_by_runs',
       'win_by_wickets', 'player_of_match', 'venue', 'umpire1', 'umpire2',
       'umpire3'],
      dtype='object')

**Calculating the probability of a team winning a match:**

In [7]:
# Total number of matches
total_matches = len(df)

# Number of matches won by Mumbai Indians
team_wins = len(df[df['winner'] == 'Mumbai Indians'])

probability = team_wins / total_matches
print("Probability of Mumbai Indians winning a match: {0:.2f}%".format(probability*100))

Probability of Mumbai Indians winning a match: 14.42%
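
The same ratio can also be written as the mean of a boolean mask, because pandas treats `True` as 1 and `False` as 0. The sketch below is illustrative only: the `toy` rows are made up rather than taken from matches.xlsx, and `empirical_probability` is a hypothetical helper name.

```python
import pandas as pd

# Toy data for illustration only; not taken from matches.xlsx
toy = pd.DataFrame({
    "winner": ["Mumbai Indians", "Chennai Super Kings",
               "Mumbai Indians", "Kolkata Knight Riders"],
})

def empirical_probability(mask: pd.Series) -> float:
    """Return the fraction of rows where the boolean mask is True."""
    return float(mask.mean())

# Two of the four toy matches were won by Mumbai Indians
p_mi = empirical_probability(toy["winner"] == "Mumbai Indians")
print("Probability of Mumbai Indians winning a match: {0:.2f}%".format(p_mi * 100))
```

Using the mask mean avoids building a filtered DataFrame just to take its length.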


**Calculating the probability distribution of toss results:**

In [8]:
# Count of each team winning the toss
toss_counts = df['toss_winner'].value_counts()

# Total number of matches
total_matches = len(df)

toss_probability = (toss_counts / total_matches)*100
toss_probability = round(toss_probability, 2)
print("Probability distribution of toss results:")
print(toss_probability)

Probability distribution of toss results:
Mumbai Indians                 12.96
Kolkata Knight Riders          12.17
Chennai Super Kings            11.77
Royal Challengers Bangalore    10.71
Kings XI Punjab                10.71
Delhi Daredevils               10.58
Rajasthan Royals               10.58
Sunrisers Hyderabad             6.08
Deccan Chargers                 5.69
Pune Warriors                   2.65
Gujarat Lions                   1.98
Delhi Capitals                  1.32
Kochi Tuskers Kerala            1.06
Rising Pune Supergiants         0.93
Rising Pune Supergiant          0.79
Name: toss_winner, dtype: float64
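
The output above lists 'Rising Pune Supergiants' and 'Rising Pune Supergiant' separately. These appear to be two spellings of the same franchise, so their shares probably belong together. A minimal sketch, assuming the two labels should be merged and using made-up toss results, that also lets `value_counts(normalize=True)` do the division:

```python
import pandas as pd

# Toy toss results for illustration only; not taken from matches.xlsx
toss = pd.Series([
    "Rising Pune Supergiants", "Rising Pune Supergiant",
    "Mumbai Indians", "Mumbai Indians",
], name="toss_winner")

# Assumption: both spellings refer to the same team
toss_clean = toss.replace({"Rising Pune Supergiants": "Rising Pune Supergiant"})

# normalize=True returns relative frequencies that sum to 1
toss_probability = (toss_clean.value_counts(normalize=True) * 100).round(2)
print(toss_probability)
```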


**Probability of a specific joint outcome in the toss (e.g., winning the toss and choosing to bat):**

In [9]:
# Total number of tosses (one per match)
total_tosses = len(df)

# Matches in which Chennai Super Kings won the toss AND chose to bat
batting_choice = len(df[(df['toss_winner'] == 'Chennai Super Kings') & (df['toss_decision'] == 'bat')])

# Joint probability: the denominator is all tosses, not only those won by Chennai Super Kings
probability = batting_choice / total_tosses
print("Probability of Chennai Super Kings winning the toss and choosing to bat: {0:.2f}%".format(probability*100))

Probability of Chennai Super Kings winning the toss and choosing to bat: 6.35%


**Probability that the toss winner chooses to field and goes on to win the match:**

In [10]:
# Total number of matches
total_matches = len(df)

# Matches in which the toss winner chose to field AND won the match
toss_field = len(df[(df['toss_decision'] == 'field') & (df['winner'] == df['toss_winner'])])

# Joint probability: the denominator is all matches
probability = toss_field / total_matches
print("Probability of the toss winner choosing to field and winning the match: {0:.2f}%".format(probability*100))

Probability of the toss winner choosing to field and winning the match: 34.26%
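
The figure above is a joint probability: it divides by all matches, so it reflects both how often the toss winner fields and how often that choice ends in a win. Dividing by the number of matches in which the toss winner fielded gives the conditional probability instead. A minimal sketch of the difference on made-up rows (team names `A` and `B` are placeholders):

```python
import pandas as pd

# Toy data for illustration only; not taken from matches.xlsx
toy = pd.DataFrame({
    "toss_winner":   ["A", "B", "A", "B", "A"],
    "toss_decision": ["field", "field", "bat", "field", "field"],
    "winner":        ["A", "A", "A", "B", "B"],
})

fielded = toy["toss_decision"] == "field"
toss_winner_won = toy["winner"] == toy["toss_winner"]

# Joint: P(toss winner fields AND wins), denominator is all matches
joint = (fielded & toss_winner_won).mean()

# Conditional: P(toss winner wins | toss winner fields),
# denominator is only the matches where the toss winner fielded
conditional = (fielded & toss_winner_won).sum() / fielded.sum()

print("Joint: {0:.2f}%  Conditional: {1:.2f}%".format(joint * 100, conditional * 100))
```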


**Probability of a specific event occurring in a match (e.g., a given player being named player of the match):**

In [11]:
# Total number of matches
total_matches = len(df)

# Matches in which MS Dhoni was named player of the match
# (the dataset has no per-player scores, so centuries cannot be counted from it)
potm_matches = len(df[df['player_of_match'] == 'MS Dhoni'])

probability = potm_matches / total_matches
print("Probability of MS Dhoni being the player of the match: {0:.2f}%".format(probability*100))

Probability of MS Dhoni being the player of the match: 2.25%


## Set Theory

Set operations show which umpires appear in one column, the other, or both. The examples below build one set of unique names from each of the umpire1 and umpire2 columns and then take their union, intersection, and difference:

In [12]:
# Extract unique umpires from the first umpire column
umpire1_set = set(df['umpire1'])

# Extract unique umpires from the second umpire column
umpire2_set = set(df['umpire2'])

# Perform union of umpires from both columns
union_set = umpire1_set.union(umpire2_set)

print("Union Set:")
print(union_set)

Union Set:
{'GA Pratapkumar', 'Subroto Das', 'Rod Tucker', 'BG Jerling', 'K Ananthapadmanabhan', 'Ulhas Gandhe', 'K Hariharan', 'RK Illingworth', 'A Deshmukh', 'TH Wijewardene', 'C Shamshuddin', 'AV Jayaprakash', 'PG Pathak', 'Aleem Dar', 'Nand Kishore', 'K Srinath', 'Bruce Oxenford', 'Vineet Kulkarni', 'SL Shastri', 'Anil Dandekar', 'S Das', 'Anil Chaudhary', 'Sundaram Ravi', 'MR Benson', 'S Asnani', 'Marais Erasmus', 'Yeshwant Barde', 'RJ Tucker', 'Nigel Llong', 'AY Dandekar', 'HDPK Dharmasena', 'SJ Davis', 'VK Sharma', 'SD Fry', 'GAV Baxter', 'JD Cloete', 'AL Hill', 'I Shivram', 'AK Chaudhary', 'IL Howell', 'M Erasmus', 'SJA Taufel', 'YC Barde', 'KN Ananthapadmanabhan', 'Nitin Menon', 'Asad Rauf', 'S Ravi', 'Chris Gaffaney', 'Nanda Kishore', 'O Nandan', 'NJ Llong', 'PR Reiffel', nan, 'Virender Kumar Sharma', 'SD Ranade', 'CB Gaffaney', 'VA Kulkarni', 'SK Tarapore', 'KN Anantapadmanabhan', 'Ian Gould', 'SS Hazare', 'DJ Harper', 'BR Doctrove', 'AM Saheba', 'K Bharatan', 'RM Deshpande'

In [13]:
# Perform intersection of umpires from both columns
intersection_set = umpire1_set.intersection(umpire2_set)

print("Intersection Set:")
print(intersection_set)

Intersection Set:
{'Anil Chaudhary', 'S Das', 'CB Gaffaney', 'Rod Tucker', 'BG Jerling', 'VA Kulkarni', 'K Ananthapadmanabhan', 'MR Benson', 'Ulhas Gandhe', 'SK Tarapore', 'K Hariharan', 'RK Illingworth', 'A Deshmukh', 'S Asnani', 'Yeshwant Barde', 'AK Chaudhary', 'C Shamshuddin', 'IL Howell', 'SS Hazare', 'DJ Harper', 'Nigel Llong', 'Ian Gould', 'BR Doctrove', 'HDPK Dharmasena', 'AM Saheba', 'AV Jayaprakash', 'M Erasmus', 'SJA Taufel', 'A Nanda Kishore', 'KN Ananthapadmanabhan', 'SJ Davis', 'Nitin Menon', 'PG Pathak', 'S Ravi', 'Chris Gaffaney', 'RE Koertzen', 'BNJ Oxenford', 'CK Nandan', 'O Nandan', 'Nanda Kishore', 'NJ Llong', 'A Nand Kishore', 'SD Fry', 'PR Reiffel', 'K Srinath', nan, 'JD Cloete', 'Bruce Oxenford', 'Vineet Kulkarni', 'Kumar Dharmasena', 'SL Shastri', 'Anil Dandekar'}


In [14]:
# Perform difference between umpires of the first column and second column
difference1_set = umpire1_set.difference(umpire2_set)

print("Difference (Umpire1 - Umpire2) Set:")
print(difference1_set)

Difference (Umpire1 - Umpire2) Set:
{'K Bharatan', 'RM Deshpande', 'Marais Erasmus', 'YC Barde', 'BF Bowden', 'Sundaram Ravi', 'Aleem Dar', 'Asad Rauf', 'GAV Baxter', 'AY Dandekar'}
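
The union and intersection printed earlier contain `nan` because `set()` keeps missing values from the columns. A minimal sketch, on made-up rows, that drops missing values before building the sets and uses the operator forms of the same set methods:

```python
import pandas as pd

# Toy data for illustration only; None marks a missing umpire
toy = pd.DataFrame({
    "umpire1": ["AY Dandekar", "S Ravi", None, "Nitin Menon"],
    "umpire2": ["NJ Llong", "Nitin Menon", "S Ravi", None],
})

# dropna() removes missing values so they do not end up in the sets
umpire1_set = set(toy["umpire1"].dropna())
umpire2_set = set(toy["umpire2"].dropna())

union_set = umpire1_set | umpire2_set          # in either column
intersection_set = umpire1_set & umpire2_set   # in both columns
difference1_set = umpire1_set - umpire2_set    # only in umpire1
symmetric_set = umpire1_set ^ umpire2_set      # in exactly one column

print(sorted(union_set))
print(sorted(intersection_set))
print(sorted(difference1_set))
print(sorted(symmetric_set))
```

Sorting before printing gives a stable order, since Python sets are unordered.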


## Conditional Probability
Conditional probability is the probability of an event occurring given that another event has already occurred: P(A | B) = P(A and B) / P(B). With match data, this means restricting the denominator to the matches in which the condition B holds. Here's an example of calculating conditional probability in IPL data using Python:

**Suppose you want to calculate the probability of a team winning the match given that they have won the toss. You can use the following code:**

In [15]:
# Number of matches in which Mumbai Indians won the toss
toss_wins = len(df[df['toss_winner'] == 'Mumbai Indians'])

# Number of those matches that Mumbai Indians also went on to win
match_wins_given_toss = len(df[(df['toss_winner'] == 'Mumbai Indians') & (df['winner'] == 'Mumbai Indians')])

conditional_probability = match_wins_given_toss / toss_wins
print("Conditional Probability of Mumbai Indians winning the match given they won the toss: {0:.2f}%".format(conditional_probability*100))

Conditional Probability of Mumbai Indians winning the match given they won the toss: 57.14%


In this code, toss_wins represents the number of matches where Mumbai Indians won the toss, and match_wins_given_toss represents the number of matches Mumbai Indians won after winning the toss. The conditional probability is then calculated by dividing match_wins_given_toss by toss_wins.
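
The same calculation can be run for every team at once by grouping on the toss winner. A minimal sketch, assuming the same toss_winner and winner column names and using made-up rows with placeholder teams `A` and `B`:

```python
import pandas as pd

# Toy data for illustration only; not taken from matches.xlsx
toy = pd.DataFrame({
    "toss_winner": ["A", "A", "A", "B", "B"],
    "winner":      ["A", "B", "A", "B", "A"],
})

# True where the toss winner also won the match
toy["toss_winner_won"] = toy["winner"] == toy["toss_winner"]

# Mean of the boolean column within each group = P(win match | won toss)
p_win_given_toss = toy.groupby("toss_winner")["toss_winner_won"].mean()
print((p_win_given_toss * 100).round(2))
```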

**Suppose you want to calculate the probability of Mumbai Indians winning the match given that the toss winner (either team) chose to field. You can use the following code:**

In [16]:
# Number of matches where the toss winner (either team) chose to field
toss_field = len(df[(df['toss_decision'] == 'field')])

# Number of those matches won by Mumbai Indians
match_wins_given_field = len(df[(df['toss_decision'] == 'field') & (df['winner'] == 'Mumbai Indians')])

conditional_probability = match_wins_given_field / toss_field
print("Probability of Mumbai Indians winning given that the toss winner chose to field: {0:.2f}%".format(conditional_probability*100))

Probability of Mumbai Indians winning given that the toss winner chose to field: 13.82%


In this code, toss_field represents the number of matches where the toss decision was to field, and match_wins_given_field represents the number of those matches won by Mumbai Indians. The conditional probability is then calculated by dividing match_wins_given_field by toss_field. Because neither filter checks toss_winner, the condition covers every match in which some team chose to field, including matches Mumbai Indians did not play. To condition on Mumbai Indians themselves winning the toss and choosing to field, add df['toss_winner'] == 'Mumbai Indians' to both filters.

## Bayes' Theorem

Bayes' theorem is a fundamental concept in probability theory that allows us to update the probability of an event based on new evidence. Here's an example of applying Bayes' theorem in IPL data using Python:
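
Written out, with A standing for "Mumbai Indians win the match" and B for "Mumbai Indians win the toss" (the letters are labels chosen here for illustration), the theorem reads:

$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
$$

Here P(A) is the prior, P(B | A) is the likelihood, P(B) is the evidence, and P(A | B) is the posterior.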

**Suppose you want to calculate the probability of a team winning the match given that they have won the toss. You also have prior knowledge about the overall win rate of teams in the IPL. Here's how you can use Bayes' theorem to update the probability:**

In [None]:
# Total number of matches
total_matches = len(df)

# Number of matches in which Mumbai Indians won the toss
toss_wins = len(df[df['toss_winner'] == 'Mumbai Indians'])

# Number of matches won by Mumbai Indians
match_wins = len(df[df['winner'] == 'Mumbai Indians'])

# Number of matches in which Mumbai Indians won both the toss and the match
match_wins_and_toss = len(df[(df['winner'] == 'Mumbai Indians') & (df['toss_winner'] == 'Mumbai Indians')])

# Prior: P(win), the probability of Mumbai Indians winning a match
prior_probability = match_wins / total_matches

# Likelihood: P(toss | win), the probability of having won the toss given a match win
likelihood = match_wins_and_toss / match_wins

# Evidence: P(toss), the probability of Mumbai Indians winning the toss
evidence = toss_wins / total_matches

# Applying Bayes' theorem: P(win | toss) = P(toss | win) * P(win) / P(toss)
posterior_probability = (likelihood * prior_probability) / evidence

print("Posterior Probability of Mumbai Indians winning the match given they won the toss: {0:.2f}%".format(posterior_probability*100))

In this code, prior_probability represents the probability of Mumbai Indians winning the match before considering the toss outcome. likelihood represents the probability of Mumbai Indians winning the toss given that they won the match. evidence represents the probability of Mumbai Indians winning the toss based on the total matches. Bayes' theorem is then applied to calculate the posterior_probability of Mumbai Indians winning the match given they won the toss.
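
As a sanity check, a posterior obtained through Bayes' theorem should equal the conditional probability computed directly from the counts. A minimal sketch with hypothetical counts (the numbers below are made up, not taken from matches.xlsx):

```python
# Hypothetical counts for illustration only
total_matches = 100
match_wins = 40            # matches the team won
toss_wins = 50             # matches in which the team won the toss
match_wins_and_toss = 25   # matches in which it won both

prior = match_wins / total_matches              # P(win)
likelihood = match_wins_and_toss / match_wins   # P(toss | win)
evidence = toss_wins / total_matches            # P(toss)

posterior = likelihood * prior / evidence       # P(win | toss) via Bayes' theorem
direct = match_wins_and_toss / toss_wins        # P(win | toss) from counts

print("Bayes: {0:.2f}%  Direct: {1:.2f}%".format(posterior * 100, direct * 100))
```

When every term is estimated from the same table of counts, the two routes agree. Bayes' theorem becomes more useful when the prior or likelihood comes from a different source than the evidence.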