## **1. Introduction to the Dataset & Business Context**  
About the Dataset and Business Case  

<table align="center" width="100%">
    <tr>
        <td width="35%">
            <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/101/398/original/sachin.webp">
        </td>
        <td>
            <div align="center">
                <font color="#e66e82" size="5">
                    <b>Sachin Tendulkar ODI Cricket Career</b>
                </font>
            </div>
        </td>
    </tr>
</table>

**Dataset:** <font color="violet">**Sachin Tendulkar ODI Cricket Career**</font> 🏏  
We are analyzing a dataset containing the **ODI (One Day International) cricket stats** of Sachin Tendulkar, one of the greatest cricketers of all time. The dataset provides detailed performance metrics across **360 matches** of his illustrious career. By studying this data, we can explore and gain insights into:  

- **Performance Patterns:** <font color="skyblue">Runs, strike rate, centuries, boundaries</font>, and other key performance indicators.  
- **Match Outcomes:** The relationship between Sachin's performances and the overall outcomes of matches (<font color="green">win or loss</font>).  
- **Opposition & Venue Analysis:** How performance varies against different opponents and at specific grounds.  

**Key Business Objectives:**  
1. <font color="teal">**Understanding Winning Patterns**</font>: Explore how Sachin’s performance influenced match results.  
2. <font color="teal">**Performance Insights**</font>: Identify trends in centuries, strike rates, and boundary contributions.  
3. <font color="teal">**Strategic Scenarios**</font>: Use probability-based analysis to examine specific scenarios, such as how often a century coincides with a win.  
4. <font color="teal">**Historical Appreciation**</font>: Showcase Sachin's impact and consistency in ODI cricket with data-driven insights.  

We’ll use this dataset to address a series of probability-related questions, exploring Sachin’s performances across various scenarios. This analysis will demonstrate **data manipulation**, **type conversions**, and **logical operations**.

In [None]:
!wget https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/101/397/original/Sachin_ODI.csv

--2025-01-02 05:11:53--  https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/101/397/original/Sachin_ODI.csv
Resolving d2beiqkhq929f0.cloudfront.net (d2beiqkhq929f0.cloudfront.net)... 3.167.84.28, 3.167.84.196, 3.167.84.9, ...
Connecting to d2beiqkhq929f0.cloudfront.net (d2beiqkhq929f0.cloudfront.net)|3.167.84.28|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26440 (26K) [text/plain]
Saving to: ‘Sachin_ODI.csv’


2025-01-02 05:11:53 (141 KB/s) - ‘Sachin_ODI.csv’ saved [26440/26440]



In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sachin_data = pd.read_csv("Sachin_ODI.csv")

sachin_data.head()

Unnamed: 0,runs,NotOut,mins,bf,fours,sixes,sr,Inns,Opp,Ground,Date,Winner,Won,century
0,13,0,30,15,3,0,86.66,1,New Zealand,Napier,1995-02-16,New Zealand,False,False
1,37,0,75,51,3,1,72.54,2,South Africa,Hamilton,1995-02-18,South Africa,False,False
2,47,0,65,40,7,0,117.5,2,Australia,Dunedin,1995-02-22,India,True,False
3,48,0,37,30,9,1,160.0,2,Bangladesh,Sharjah,1995-04-05,India,True,False
4,4,0,13,9,1,0,44.44,2,Pakistan,Sharjah,1995-04-07,Pakistan,False,False


In [None]:
sachin_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 360 entries, 0 to 359
Data columns (total 14 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   runs     360 non-null    int64  
 1   NotOut   360 non-null    int64  
 2   mins     360 non-null    object 
 3   bf       360 non-null    int64  
 4   fours    360 non-null    int64  
 5   sixes    360 non-null    int64  
 6   sr       360 non-null    float64
 7   Inns     360 non-null    int64  
 8   Opp      360 non-null    object 
 9   Ground   360 non-null    object 
 10  Date     360 non-null    object 
 11  Winner   360 non-null    object 
 12  Won      360 non-null    bool   
 13  century  360 non-null    bool   
dtypes: bool(2), float64(1), int64(6), object(5)
memory usage: 34.6+ KB
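
The `info()` summary lists `mins` as `object` rather than a numeric dtype, which usually means at least one entry is not a plain number. A minimal sketch of one way to convert such a column, assuming the non-numeric entries are placeholders such as `"-"` (the sample values below are hypothetical, not taken from the CSV):

```python
import pandas as pd

# Hypothetical sample: the "-" placeholder is an assumption, not read from Sachin_ODI.csv.
mins_raw = pd.Series(["30", "75", "-", "65"], name="mins")

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising.
mins_numeric = pd.to_numeric(mins_raw, errors="coerce")

print(mins_numeric.dtype)         # float64, because NaN forces a float column
print(mins_numeric.isna().sum())  # number of entries that failed to parse
```

Counting the `NaN` values after conversion is a quick way to see how many rows would be affected before deciding whether to drop or impute them.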




---



## 🎯 <font color="magenta">2. Basic Probability Terminology & Concepts</font>





### 🌟 <font color="skyblue">1. Experiment, Outcomes, and Sample Space</font>

#### **Experiment**  
An experiment is any action or process that generates outcomes.  
💡 *Example*: Observing Sachin’s performance in an ODI match is an experiment.

#### **Outcome**  
An outcome is a single result of an experiment.  
💡 *Example*: Sachin scoring 100 runs in a specific match is one possible outcome.

#### **Sample Space (S)**  
The sample space is the set of all possible outcomes of an experiment.  
💡 *Example*: For Sachin’s ODI career, the sample space might include all possible scores he could achieve in a match (0, 1, 2, ..., 200+).


In [None]:
# Extract the distinct scores observed in the dataset
# (an empirical subset of the theoretical sample space 0, 1, 2, ...)
sample_space = sachin_data['runs'].unique()

# Display the sample space
print("Sample Space (Unique Runs):", sorted(sample_space))
print("Number of Unique Outcomes in the Sample Space:", len(sample_space))

Sample Space (Unique Runs): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 60, 61, 62, 63, 64, 65, 67, 68, 69, 70, 71, 72, 74, 77, 78, 79, 80, 81, 82, 83, 85, 86, 87, 88, 89, 90, 91, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 104, 105, 110, 111, 112, 113, 114, 117, 118, 120, 122, 123, 124, 127, 128, 134, 137, 138, 139, 140, 141, 143, 146, 152, 163, 175, 186, 200]
Number of Unique Outcomes in the Sample Space: 122
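
The unique values only tell us which outcomes were observed, not how often each one occurred. A small sketch of turning observed outcomes into relative frequencies with `value_counts(normalize=True)`, using a hypothetical toy series in place of `sachin_data['runs']`:

```python
import pandas as pd

# Hypothetical toy scores standing in for sachin_data['runs'].
runs = pd.Series([0, 13, 37, 0, 100, 13, 47, 0])

# Relative frequency of each observed score (an empirical probability estimate).
empirical = runs.value_counts(normalize=True).sort_index()

print(empirical)
print("Sum of relative frequencies:", empirical.sum())
```

The relative frequencies sum to 1, which is the property any probability assignment over a sample space must satisfy.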


### 🌟 <font color="green">2. Events</font>

An event is a subset of the sample space. It can contain one or more outcomes.  
💡 *Example*:  
- **Event A**: Sachin scores more than 100 runs in a match.  
- **Event B**: Sachin scores exactly 0 runs (a duck).

### **Types of Events**

1. **Mutually Exclusive Events**  
   Events that cannot happen at the same time.  
   💡 *Example*: Sachin scoring a century and scoring a duck in the same match.

2. **Joint (Non-Mutually Exclusive) Events**  
   Events that can happen simultaneously.  
   💡 *Example*: Sachin scores more than 50 runs *and* the match is won by India.

3. **Independent Events**  
   Events where the occurrence of one does not change the probability of the other, so that $P(A \cap B) = P(A) \cdot P(B)$.  
   💡 *Example*: Sachin’s performance and the weather conditions, if we assume the weather has no effect on his batting and no matches are cancelled.

4. **Exhaustive Events**  
   A set of events that covers all possible outcomes in the sample space.  
   💡 *Example*:  
   - Event A: Sachin scores less than 50 runs.  
   - Event B: Sachin scores 50 or more runs.  
   Together, these two events are exhaustive.
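
The event types above can be checked mechanically once each event is written as a boolean mask. A minimal sketch on hypothetical toy data (the `toy` frame and its values are illustrative, not rows from `Sachin_ODI.csv`):

```python
import pandas as pd

# Hypothetical toy data, not rows from Sachin_ODI.csv.
toy = pd.DataFrame({
    "runs": [0, 12, 55, 102, 48, 75, 120, 30],
    "Won":  [False, True, True, True, False, False, True, True],
})

duck = toy["runs"] == 0
century = toy["runs"] >= 100
below_50 = toy["runs"] < 50
fifty_plus = toy["runs"] >= 50

# Mutually exclusive: no match satisfies both events.
print("duck and century overlap:", (duck & century).any())

# Exhaustive: every match falls into at least one of the events.
print("below_50 or fifty_plus covers all:", (below_50 | fifty_plus).all())

# Independence check: compare P(A and B) with P(A) * P(B).
p_joint = (fifty_plus & toy["Won"]).mean()
p_product = fifty_plus.mean() * toy["Won"].mean()
print(f"P(A and B) = {p_joint:.4f}, P(A) * P(B) = {p_product:.4f}")
```

When the last two numbers differ noticeably, the data does not support treating the events as independent; with a sample this small the comparison is only illustrative.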

### 🌟 <font color="orange">3. Visualizing Events with Set Operations</font>

Probability often uses **set theory** to describe events. Here are three key operations:



#### <font color="magenta">Intersection (∩)</font>: Matches where both events happen

💡 *Example*: Matches where Sachin scored more than 50 runs *and* India won.



In [None]:
sachin_data.head()

Unnamed: 0,runs,NotOut,mins,bf,fours,sixes,sr,Inns,Opp,Ground,Date,Winner,Won,century
0,13,0,30,15,3,0,86.66,1,New Zealand,Napier,1995-02-16,New Zealand,False,False
1,37,0,75,51,3,1,72.54,2,South Africa,Hamilton,1995-02-18,South Africa,False,False
2,47,0,65,40,7,0,117.5,2,Australia,Dunedin,1995-02-22,India,True,False
3,48,0,37,30,9,1,160.0,2,Bangladesh,Sharjah,1995-04-05,India,True,False
4,4,0,13,9,1,0,44.44,2,Pakistan,Sharjah,1995-04-07,Pakistan,False,False


In [None]:
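# Define events
event_A = sachin_data[sachin_data['runs'] > 50]  # Event A: Sachin scored > 50 runs
event_B = sachin_data[sachin_data['Won'] == True]  # Event B: India wins

# Intersection (A ∩ B): with no `on` argument, merge joins on every shared column,
# so only rows present in both events survive (the result gets a fresh 0..n-1 index)
intersection = pd.merge(event_A, event_B, how='inner')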

# Display results
print("Matches where Sachin scored > 50 runs and India won:")
intersection[['Opp', 'Ground', 'runs', 'century']]

Matches where Sachin scored > 50 runs and India won:


Unnamed: 0,Opp,Ground,runs,century
0,Sri Lanka,Sharjah,112,True
1,Kenya,Cuttack,127,True
2,West Indies,Gwalior,70,False
3,Pakistan,Sharjah,118,True
4,Pakistan,Toronto,89,False
...,...,...,...,...
68,Sri Lanka,Cuttack,96,False
69,South Africa,Gwalior,200,True
70,Australia,Ahmedabad,53,False
71,Pakistan,Mohali,85,False


#### <font color="skyblue">Union (∪)</font>: Matches where either event happens

💡 *Example*: Matches where Sachin scored more than 50 runs *or* India won.


In [None]:
# Union (A ∪ B) using concatenation and dropping duplicates
union = pd.concat([event_A, event_B]).drop_duplicates()

# Display results
print("Matches where Sachin scored > 50 runs or India won:")
union[['Opp', 'Ground', 'runs', 'century']]

Matches where Sachin scored > 50 runs or India won:


Unnamed: 0,Opp,Ground,runs,century
5,Sri Lanka,Sharjah,112,True
10,New Zealand,Nagpur,65,False
12,Kenya,Cuttack,127,True
13,West Indies,Gwalior,70,False
14,Australia,Mumbai,90,False
...,...,...,...,...
346,West Indies,Chennai,2,False
349,Sri Lanka,Mumbai,18,False
351,Sri Lanka,Perth,48,False
356,Sri Lanka,Hobart,39,False
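
The union can likewise be written with the `|` operator on boolean masks, which avoids relying on `drop_duplicates` to remove the overlap. A sketch on hypothetical toy data (values are illustrative only):

```python
import pandas as pd

# Hypothetical toy data, not rows from Sachin_ODI.csv.
toy = pd.DataFrame({
    "runs": [13, 112, 70, 48, 89],
    "Won":  [False, True, True, True, False],
})

scored_over_50 = toy["runs"] > 50
india_won = toy["Won"]

# A ∪ B: rows where at least one of the two masks is True.
union_rows = toy[scored_over_50 | india_won]

# Counting check: |A ∪ B| = |A| + |B| - |A ∩ B|
n_a = scored_over_50.sum()
n_b = india_won.sum()
n_both = (scored_over_50 & india_won).sum()
print(len(union_rows), "==", n_a + n_b - n_both)
```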


#### <font color="green">Complement (A')</font>: Matches where the event does not happen

💡 *Example*: Here A is the event “Sachin scored a century”, so A' contains the matches where he did **not** score one.

In [None]:
# Complement (A'): Matches where Sachin did not score a century
event_A_complement = sachin_data[sachin_data['runs'] < 100]  # Not a century

# Display results
print("Matches where Sachin did not score a century:")
event_A_complement[['Opp', 'Ground', 'runs', 'century']]

Matches where Sachin did not score a century:


Unnamed: 0,Opp,Ground,runs,century
0,New Zealand,Napier,13,False
1,South Africa,Hamilton,37,False
2,Australia,Dunedin,47,False
3,Bangladesh,Sharjah,48,False
4,Pakistan,Sharjah,4,False
...,...,...,...,...
354,Sri Lanka,Brisbane,22,False
355,Australia,Sydney,14,False
356,Sri Lanka,Hobart,39,False
357,Sri Lanka,Dhaka,6,False
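
The complement can also be taken directly from a boolean column with the `~` operator, and its probability follows from $P(A') = 1 - P(A)$. A small sketch, assuming a boolean `century` column like the one in the dataset but with hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy data, not rows from Sachin_ODI.csv.
toy = pd.DataFrame({
    "runs":    [13, 112, 70, 100, 89],
    "century": [False, True, False, True, False],
})

not_century = ~toy["century"]      # A': logical negation of the event mask

p_century = toy["century"].mean()  # P(A): the mean of a boolean column is a proportion
p_not_century = not_century.mean() # P(A')

print(toy[not_century])
print(f"P(A) + P(A') = {p_century + p_not_century:.1f}")
```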




---
## 🎯 <font color="magenta">3. Probability Rules</font>

Understanding the fundamental rules of probability helps us compute the likelihood of events systematically. Let’s explore these rules using **Sachin Tendulkar’s ODI career data**.


### 🌟 <font color="skyblue">1. Addition Rule</font>

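The addition rule gives the probability that at least one of two events occurs (their union). The overlap $P(A \cap B)$ is subtracted because matches that belong to both events would otherwise be counted twice.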

**Formula:**

$$
P(A \cup B) = P(A) + P(B) - P(A \cap B)
$$

💡 **Example:**

What is the probability that Sachin scored more than 50 runs (Event A) **or** India won the match (Event B)?  

In [None]:
# Total matches
total_matches = len(sachin_data)

# Event A: Sachin scored > 50 runs
event_A = sachin_data[sachin_data['runs'] > 50]

# Event B: India won the match
event_B = sachin_data[sachin_data['Won'] == True]

# Intersection (A ∩ B): Matches where Sachin scored > 50 runs and India won
intersection = pd.merge(event_A, event_B, how='inner')

# Probabilities
P_A = len(event_A) / total_matches
P_B = len(event_B) / total_matches
P_A_intersection_B = len(intersection) / total_matches

# Applying the addition rule
P_A_union_B = P_A + P_B - P_A_intersection_B

print(f"Probability of scoring > 50 runs or India winning: {P_A_union_B:.4f}")

Probability of scoring > 50 runs or India winning: 0.6389
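
To see why the overlap has to be subtracted, it helps to compare the rule against a direct count of the union. A minimal sketch on hypothetical toy data (not the real dataset), where the naive sum $P(A) + P(B)$ even exceeds 1:

```python
import pandas as pd

# Hypothetical toy data, not rows from Sachin_ODI.csv.
toy = pd.DataFrame({
    "runs": [0, 12, 55, 102, 48, 75, 120, 30],
    "Won":  [False, True, True, True, False, False, True, True],
})

A = toy["runs"] > 50
B = toy["Won"]

p_a, p_b = A.mean(), B.mean()
p_a_and_b = (A & B).mean()

naive_sum = p_a + p_b                    # double-counts matches in both events
by_rule = p_a + p_b - p_a_and_b          # addition rule
direct = (A | B).mean()                  # proportion of rows in the union

print(f"naive sum: {naive_sum:.3f}, addition rule: {by_rule:.3f}, direct: {direct:.3f}")
```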


### 🌟 <font color="green">2. Multiplication Rule</font>

The multiplication rule calculates the probability of both events happening (intersection of events).

**Formula:**  

$$
P(A \cap B) = P(A) \cdot P(B \mid A)
$$

💡 **Example:**

What is the probability that Sachin scored more than 50 runs (Event A) **and** India won the match (Event B)?

In [None]:
# Event A: Sachin scored > 50 runs (event_A and total_matches come from the addition-rule cell)
P_A = len(event_A) / total_matches

# Conditional probability P(B | A): share of wins among the matches in Event A
P_B_given_A = len(event_A[event_A['Won'] == True]) / len(event_A)

# Applying the multiplication rule
P_A_intersection_B = P_A * P_B_given_A

print(f"Probability of scoring > 50 runs and India winning: {P_A_intersection_B:.4f}")

Probability of scoring > 50 runs and India winning: 0.2028
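
A quick sanity check for the multiplication rule is to compare $P(A) \cdot P(B \mid A)$ with the joint probability counted directly. A sketch on hypothetical toy data (values are illustrative, not from the dataset):

```python
import pandas as pd

# Hypothetical toy data, not rows from Sachin_ODI.csv.
toy = pd.DataFrame({
    "runs": [0, 12, 55, 102, 48, 75, 120, 30],
    "Won":  [False, True, True, True, False, False, True, True],
})

A = toy["runs"] > 50
B = toy["Won"]

p_a = A.mean()                    # P(A)
p_b_given_a = B[A].mean()         # P(B | A): win rate restricted to rows in A
p_joint_rule = p_a * p_b_given_a  # multiplication rule
p_joint_direct = (A & B).mean()   # direct count of rows in both events

print(f"rule: {p_joint_rule:.4f}, direct: {p_joint_direct:.4f}")
```

The two values agree because $P(B \mid A)$ is itself defined as the joint count divided by the count of A, so the `len(event_A)` terms cancel.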


### 🌟 <font color="orange">3. Marginal vs. Joint Probability</font>

- **Marginal Probability**: Probability of a single event happening, regardless of any other event (e.g., P(A) or P(B)).  
  💡 *Example*: Probability that Sachin scored a century (Event C).  

- **Joint Probability**: Probability of two events happening together (e.g., P(A ∩ B)).  
  💡 *Example*: Probability that Sachin scored more than 50 runs **and** hit at least one six (the combined event is labeled Event D in the code).

In [None]:
# Marginal probability: Sachin scored a century
event_C = sachin_data[sachin_data['century'] == True]
P_C = len(event_C) / total_matches

# Joint probability: Sachin scored > 50 runs and hit at least 1 six
event_D = sachin_data[(sachin_data['runs'] > 50) & (sachin_data['sixes'] > 0)]
P_D = len(event_D) / total_matches

print(f"Marginal Probability of Sachin scoring a century: {P_C:.4f}")
print(f"Joint Probability of scoring > 50 runs and hitting at least 1 six: {P_D:.4f}")

Marginal Probability of Sachin scoring a century: 0.1278
Joint Probability of scoring > 50 runs and hitting at least 1 six: 0.1889
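
Marginal and joint probabilities can be read from a single contingency table. A minimal sketch using `pd.crosstab` with `normalize="all"` and `margins=True`, on hypothetical toy data (the labels `over_50`, `six`, and so on are names chosen for this illustration):

```python
import pandas as pd

# Hypothetical toy data, not rows from Sachin_ODI.csv.
toy = pd.DataFrame({
    "runs":  [0, 12, 55, 102, 48, 75, 120, 30],
    "sixes": [0, 0, 1, 2, 0, 0, 3, 1],
})

# String labels keep the table easy to index once the "All" margin is added.
over_50 = (toy["runs"] > 50).map({True: "over_50", False: "50_or_less"}).rename("runs")
hit_six = (toy["sixes"] > 0).map({True: "six", False: "no_six"}).rename("sixes")

# Inner cells are joint probabilities; the "All" row and column hold the marginals.
table = pd.crosstab(over_50, hit_six, normalize="all", margins=True)
print(table)

joint = table.loc["over_50", "six"]       # P(runs > 50 and at least one six)
marginal = table.loc["over_50", "All"]    # P(runs > 50)
```

Each marginal is the sum of the joint probabilities in its row or column, which is where the name comes from.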


### 🌟 <font color="purple">4. Conditional Probability</font>

Conditional probability calculates the likelihood of an event given that another event has already occurred.

**Formula:**  

$$
P(B \mid A) = \frac{P(A \cap B)}{P(A)}
$$

💡 **Example:**  

Given that Sachin scored more than 50 runs (Event A), what is the probability that he scored a century (Event C)?

In [None]:
# Event A: Sachin scored > 50 runs
event_A = sachin_data[sachin_data['runs'] > 50]

# Event C: Sachin scored a century
event_C_given_A = event_A[event_A['century'] == True]

# Conditional probability
P_C_given_A = len(event_C_given_A) / len(event_A)

print(f"Probability of scoring a century given runs > 50: {P_C_given_A:.4f}")

Probability of scoring a century given runs > 50: 0.3866
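
Conditional probability is not symmetric: $P(C \mid A)$ and $P(A \mid C)$ generally differ. A short sketch on hypothetical toy data (not the real dataset) makes the contrast visible, since every century is also a score above 50:

```python
import pandas as pd

# Hypothetical toy scores, not rows from Sachin_ODI.csv.
toy = pd.DataFrame({"runs": [0, 12, 55, 102, 48, 75, 120, 30]})

over_50 = toy["runs"] > 50     # Event A
century = toy["runs"] >= 100   # Event C

# P(C | A): share of centuries among the innings above 50.
p_century_given_over_50 = century[over_50].mean()

# P(A | C): share of innings above 50 among the centuries (always 1 here).
p_over_50_given_century = over_50[century].mean()

print(f"P(C | A) = {p_century_given_over_50:.2f}")
print(f"P(A | C) = {p_over_50_given_century:.2f}")
```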




---

## 🎯 <font color="magenta">4. Bayes' Theorem</font>

Bayes' Theorem is a cornerstone of probability theory that helps us revise probabilities based on new evidence. It lets us compute a **conditional probability** $P(A \mid B)$ from the reverse conditional $P(B \mid A)$ together with the marginal probabilities $P(A)$ and $P(B)$.


### 🌟 <font color="skyblue">1. Formula for Bayes' Theorem</font>

The formula for Bayes' Theorem is:  
$$
P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}
$$

Where:  
- $ P(A \mid B) $: Probability of $ A $ occurring given that $ B $ has occurred (posterior).  
- $ P(B \mid A) $: Probability of $ B $ occurring given that $ A $ has occurred (likelihood).  
- $ P(A) $: Probability of $ A $ occurring (prior).  
- $ P(B) $: Probability of $ B $ occurring (evidence).


### 🌟 <font color="green">2. Intuition Behind Bayes' Theorem</font>

💡 **Example:**  
In Sachin’s dataset:
- Event $ A $: Sachin scores a century.  
- Event $ B $: Sachin batted in the first innings of the match (`Inns == 1`).

We want to compute $ P(A \mid B) $:  
> *What is the probability that Sachin scores a century given he batted first?*

In [None]:
# Define the events
event_A = sachin_data[sachin_data['century'] == True]  # Event A: Scored a century
event_B = sachin_data[sachin_data['Inns'] == 1]        # Event B: Batted first

# Total matches
total_matches = len(sachin_data)

# Probabilities
P_A = len(event_A) / total_matches  # Prior probability of scoring a century
P_B = len(event_B) / total_matches  # Probability of batting first
P_B_given_A = len(event_A[event_A['Inns'] == 1]) / len(event_A)  # Likelihood

# Bayes' Theorem
P_A_given_B = (P_B_given_A * P_A) / P_B

print(f"Probability of scoring a century given Sachin batted first: {P_A_given_B:.4f}")

Probability of scoring a century given Sachin batted first: 0.1765
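
When the full dataset is available, the posterior can also be counted directly, which gives a useful cross-check on the Bayes calculation. A minimal sketch on hypothetical toy data (the `century` and `Inns` values below are illustrative, not from the CSV):

```python
import pandas as pd

# Hypothetical toy data, not rows from Sachin_ODI.csv.
toy = pd.DataFrame({
    "century": [False, False, False, True, False, False, True, False],
    "Inns":    [1, 2, 1, 1, 2, 2, 2, 1],
})

A = toy["century"]       # Event A: scored a century
B = toy["Inns"] == 1     # Event B: batted in the first innings

p_a = A.mean()               # prior
p_b = B.mean()               # evidence
p_b_given_a = B[A].mean()    # likelihood

posterior_bayes = p_b_given_a * p_a / p_b
posterior_direct = A[B].mean()   # share of centuries among first-innings rows

print(f"Bayes: {posterior_bayes:.4f}, direct: {posterior_direct:.4f}")
```

Bayes' Theorem matters most when the direct count is not available, for example when only the likelihood and the marginals have been reported.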


### 🌟 <font color="purple">3. Real-Life Relevance</font>

Bayes' Theorem is widely used in:
- **Spam Detection**: Classifying emails as spam or not based on keywords.  
- **Medical Testing**: Revising the probability of having a disease based on test results.  
- **Sports Analytics**: Understanding player performance under specific conditions.


### 🌟 <font color="skyblue">4. Simple Tree Diagram Approach (Optional)</font>

A **tree diagram** can help visualize the probabilities involved in Bayes' Theorem.

💡 **Example**: Using Tree Diagram Data for Complex Joint Probabilities  

What is the probability of **Sachin scoring between 50 and 100 runs** and **India winning**, taking the batting order into account?

![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/101/404/original/Screenshot_2024-12-30_113248.png)
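
Each path through a tree diagram multiplies the probabilities along its branches, and the paths that end in the event of interest are then added up. A rough sketch of that calculation on hypothetical toy data, assuming "between 50 and 100" means `50 <= runs < 100` (both the data and that boundary choice are assumptions, not taken from the figure):

```python
import pandas as pd

# Hypothetical toy data, not rows from Sachin_ODI.csv.
toy = pd.DataFrame({
    "Inns": [1, 1, 1, 1, 2, 2, 2, 2],
    "runs": [60, 80, 20, 110, 55, 10, 95, 70],
    "Won":  [True, False, True, True, True, False, False, True],
})

branches = {}
for inns, group in toy.groupby("Inns"):
    # First split: which innings Sachin batted in.
    p_inns = len(group) / len(toy)
    # Second split: scored 50-99 given that innings.
    mid = group[(group["runs"] >= 50) & (group["runs"] < 100)]
    p_mid_given_inns = len(mid) / len(group)
    # Third split: India won given the innings and the score band.
    p_win_given_mid = mid["Won"].mean()
    branches[int(inns)] = p_inns * p_mid_given_inns * p_win_given_mid

# Adding the branch products gives P(50-99 runs and India won).
total = sum(branches.values())
print(branches, total)
```

The total matches a direct count of rows that satisfy both conditions, which is a convenient way to validate a hand-drawn tree.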

---
