# 🏏 **Cricket Data Analytics Project**

This notebook analyzes **T20 World Cup 2022** match results, player performance, and other cricket-related statistics. It utilizes **Pandas** for data manipulation and **JSON/CSV** for data storage. The notebook covers data loading, transformation, and analysis to derive insights from cricket matches.

---

# T20 World Cup 2022 Data Analysis Project 
### This notebook processes cricket match data including player stats, match results, and detailed summaries from the T20 World Cup 2022


This section of code defines **file paths** for reading and writing cricket match data in both CSV and JSON formats.  

---

### 🔍 **Breakdown of the Code**
1. **Import Required Libraries**
   ```python
   import pandas as pd  # For data manipulation and analysis
   import json          # For JSON file operations
   ```
   - `pandas`: Helps in reading and processing structured data like CSV and JSON files.  
   - `json`: Allows parsing JSON files into Python dictionaries.  

2. **Define File Paths (CSV & JSON)**
   - **Input Data Files (Raw Data)**  
     - CSV and JSON files store match results, player statistics, and other tournament-related information.  
     - These files contain historical **T20 World Cup 2022** data.  

   - **Output Data Files (Processed Data)**  
     - After data processing, structured information is saved in **fact tables** (match statistics) and **dimension tables** (player info).  

---

### 📂 **File Organization**
| File Type | Data Stored | Format | Example |
|-----------|------------|--------|---------|
| **Match Results** | Match summaries | CSV / JSON | `t20_wc_match_results.json` |
| **Batting Stats** | Player batting performances | CSV / JSON | `t20_wc_batting_summary.csv` |
| **Bowling Stats** | Player bowling performances | CSV / JSON | `t20_wc_bowling_summary.csv` |
| **Player Info** | Player details (names, teams) | CSV / JSON | `t20_wc_player_info.json` |
| **Scorecard URLs** | Links to match scorecards | JSON | `scorecard_urls.json` |
| **Fact Tables** | Processed match data | CSV | `fact_batting_summary.csv` |
| **Dimension Tables** | Player details (structured) | CSV | `dim_players.csv` |

---

### 🚀 **Purpose of This Code**
✔️ Defines **file paths** for structured data processing.  
✔️ Helps in **loading raw data** for cleaning, transformation, and analysis.  
✔️ Organizes the data into **fact tables** (performance stats) and **dimension tables** (player details).  


In [1]:
#----------------------------------------------------------
# Required Libraries
#----------------------------------------------------------
import pandas as pd  # For data manipulation and analysis
import json         # For JSON file operations

#----------------------------------------------------------
# Input Data File Paths - CSV Format
#----------------------------------------------------------
# Match results from ESPN for T20 World Cup 2022
t20_wc_match_results_csv_file_path = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\csv\t20_wc_match_results.csv"

# Player performance summaries
batting_summary_csv_file_path = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\csv\t20_wc_batting_summary.csv"
bowling_summary_csv_file_path = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\csv\t20_wc_bowling_summary.csv"

# Player information and details
player_details_csv_file_path = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\csv\t20_wc_player_info.csv"
player_details_imagerurl_csv_file_path = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\csv\t20_wc_player_info_with_imageurl.csv"

#----------------------------------------------------------
# Input Data File Paths - JSON Format
#----------------------------------------------------------
# Match results and summaries
t20_wc_match_results_json_file_path = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\json\t20_wc_match_results.json"
batting_summary_json_file_path = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\json\t20_wc_batting_summary.json"
bowling_summary_json_file_path = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\json\t20_wc_bowling_summary.json"

# Player information files
player_details_json_file_path = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\json\t20_wc_player_info.json"
player_details_imageurl_json_file_path = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\json\t20_wc_player_info_with_imageurl.json"

# Additional match data
scorecard_urls_json_file_path = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\json\scorecard_urls.json"

#----------------------------------------------------------
# Output File Paths - Processed Data
#----------------------------------------------------------
# Fact tables for performance analysis
fact_bating_summary = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\Power BI Imports\fact_batting_summary.csv"
fact_bowling_summary = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\Power BI Imports\fact_bowling_summary.csv"

# Dimension tables for player information
dim_players = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\Power BI Imports\dim_players.csv"
dim_players_no_images = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\Power BI Imports\dim_players_no_images.csv"

# Dimension table for match summary
dim_match_summary = r"E:\Work\Personal\Codes\Python Codes\Cricket Data Analytics Project\archive\Power BI Imports\dim_match_summary.csv"
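
The cell above hard-codes absolute Windows paths, so the notebook only runs on a machine with the same folder layout. As a minimal sketch of one alternative, the paths can be built from a single base directory with `pathlib`. `BASE_DIR` and the three directory constants are hypothetical names introduced here, not part of the original notebook; the file names are the ones used above.

```python
from pathlib import Path

# Hypothetical base directory: point this at wherever the "archive" folder lives.
BASE_DIR = Path("archive")

CSV_DIR = BASE_DIR / "csv"
JSON_DIR = BASE_DIR / "json"
OUTPUT_DIR = BASE_DIR / "Power BI Imports"

# Same file names as in the cell above, built from the shared base directory.
t20_wc_match_results_json_file_path = JSON_DIR / "t20_wc_match_results.json"
batting_summary_json_file_path = JSON_DIR / "t20_wc_batting_summary.json"
bowling_summary_json_file_path = JSON_DIR / "t20_wc_bowling_summary.json"
dim_players = OUTPUT_DIR / "dim_players.csv"

print(t20_wc_match_results_json_file_path.name)
```

`Path` objects can be passed directly to `open()` and to pandas readers and writers, so the rest of the notebook would not need to change under this assumption.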

The code snippet below reads a **JSON file** containing T20 World Cup 2022 match results and loads its content into a Python list.  

### 🔍 Breakdown of the Code:
1. **`with open(t20_wc_match_results_json_file_path, encoding="utf-8") as file:`**  
   - Opens the JSON file located at `t20_wc_match_results_json_file_path`.  
   - Uses `utf-8` encoding to ensure special characters (if any) are handled correctly.  
   - The `with open(...)` syntax ensures the file is properly closed after reading.  

2. **`data = json.load(file)`**  
   - Reads the content of the JSON file.  
   - Converts the JSON structure into the matching Python objects. Here the top level is a **list** containing one **dictionary** (`dict`).  

3. **`data`**  
   - Displays the loaded JSON data.  
   - The output is a list whose first element holds the match results under the `matchSummary` key.  

### 🏏 Example Output (Simplified):
```python
[
    {
        "matchSummary": [
            {
                "Team 1": "England",
                "Team 2": "Pakistan",
                "Winner": "England",
                "Margin": "5 wickets",
                "Ground": "Melbourne",
                "Match Date": "Nov 13, 2022",
                "Scorecard": "T20I # 1879"
            },
            ...
        ]
    }
]
```
The JSON file holds a list whose first element contains a `matchSummary` list, with one record per match giving the teams, winner, margin, ground, match date, and scorecard number.  
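
To make the list-versus-dictionary distinction concrete, here is a small sketch that parses an inline JSON string with the same shape as the file. The names `sample_json` and `sample_data` are hypothetical, and the single record is copied from the cell output below.

```python
import json

# Inline sample mirroring the file's structure: a list with one dictionary
# whose "matchSummary" key holds the individual match records.
sample_json = """
[{"matchSummary": [
    {"Team 1": "England", "Team 2": "Pakistan", "Winner": "England",
     "Margin": "5 wickets", "Ground": "Melbourne",
     "Match Date": "Nov 13, 2022", "Scorecard": "T20I # 1879"}
]}]
"""

sample_data = json.loads(sample_json)

print(type(sample_data).__name__)      # top level is a list
print(type(sample_data[0]).__name__)   # its first element is a dict
print(sample_data[0]["matchSummary"][0]["Winner"])
```

`json.loads` parses a string, while `json.load` (used in the notebook) reads from an open file; both return the same kind of Python objects.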


In [2]:
# open the espn_t20_wc_2022_results_json_file
with open(t20_wc_match_results_json_file_path, encoding="utf-8") as file:
    data = json.load(file)
data

[{'matchSummary': [{'Team 1': 'England',
    'Team 2': 'Pakistan',
    'Winner': 'England',
    'Margin': '5 wickets',
    'Ground': 'Melbourne',
    'Match Date': 'Nov 13, 2022',
    'Scorecard': 'T20I # 1879'},
   {'Team 1': 'England',
    'Team 2': 'India',
    'Winner': 'England',
    'Margin': '10 wickets',
    'Ground': 'Adelaide',
    'Match Date': 'Nov 10, 2022',
    'Scorecard': 'T20I # 1878'},
   {'Team 1': 'New Zealand',
    'Team 2': 'Pakistan',
    'Winner': 'Pakistan',
    'Margin': '7 wickets',
    'Ground': 'Sydney',
    'Match Date': 'Nov 9, 2022',
    'Scorecard': 'T20I # 1877'},
   {'Team 1': 'India',
    'Team 2': 'Zimbabwe',
    'Winner': 'India',
    'Margin': '71 runs',
    'Ground': 'Melbourne',
    'Match Date': 'Nov 6, 2022',
    'Scorecard': 'T20I # 1873'},
   {'Team 1': 'Bangladesh',
    'Team 2': 'Pakistan',
    'Winner': 'Pakistan',
    'Margin': '5 wickets',
    'Ground': 'Adelaide',
    'Match Date': 'Nov 6, 2022',
    'Scorecard': 'T20I # 1872'},
   {'T

The code below extracts **match summary data** from the loaded JSON and converts it into a Pandas **DataFrame** for further analysis.  

---

### 🔍 Breakdown of the Code:
1. **`data[0]['matchSummary']`**  
   - Extracts the `matchSummary` field from the **first** element (`data[0]`) in the JSON data.  
   - The JSON data (`data`) is a list; its first element is a dictionary whose `matchSummary` key holds a list with one record per match.  
   - Each record contains the two teams, the winner, the margin, the ground, the match date, and the scorecard number.  

2. **`dataframe = data[0]['matchSummary']`**  
   - Stores the extracted list of match records in `dataframe`. Despite its name, this variable is still a plain Python list at this point.  

3. **`dataframe_match = pd.DataFrame(dataframe)`**  
   - Converts that list of dictionaries into a **Pandas DataFrame** for structured tabular analysis.  
   - This makes it easier to manipulate and analyze match summary data using Pandas functions.  

4. **`dataframe_match.head()`**  
   - Displays the first **five rows** of the `dataframe_match` DataFrame.  
   - Useful for quickly inspecting the data.  

---
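
The steps above can be sketched on a two-record sample. This is only an illustration: `records` and `sample_match_df` are hypothetical names, and the two records are taken from the cell output shown earlier.

```python
import pandas as pd

# Two match records in the same shape as data[0]['matchSummary'].
records = [
    {"Team 1": "England", "Team 2": "Pakistan", "Winner": "England",
     "Margin": "5 wickets", "Ground": "Melbourne",
     "Match Date": "Nov 13, 2022", "Scorecard": "T20I # 1879"},
    {"Team 1": "England", "Team 2": "India", "Winner": "England",
     "Margin": "10 wickets", "Ground": "Adelaide",
     "Match Date": "Nov 10, 2022", "Scorecard": "T20I # 1878"},
]

# Each dictionary becomes one row; the dictionary keys become column names.
sample_match_df = pd.DataFrame(records)

print(sample_match_df.shape)
print(list(sample_match_df.columns))
```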

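### 🏏 Example JSON Structure (`data[0]`)
```json
{
    "matchSummary": [
        {"Team 1": "England", "Team 2": "Pakistan", "Winner": "England", "Margin": "5 wickets", "Ground": "Melbourne", "Match Date": "Nov 13, 2022", "Scorecard": "T20I # 1879"},
        {"Team 1": "England", "Team 2": "India", "Winner": "England", "Margin": "10 wickets", "Ground": "Adelaide", "Match Date": "Nov 10, 2022", "Scorecard": "T20I # 1878"}
    ]
}
```

### 🏏 Expected DataFrame Output:
| Team 1  | Team 2   | Winner  | Margin     | Ground    | Match Date   | Scorecard   |
|---------|----------|---------|------------|-----------|--------------|-------------|
| England | Pakistan | England | 5 wickets  | Melbourne | Nov 13, 2022 | T20I # 1879 |
| England | India    | England | 10 wickets | Adelaide  | Nov 10, 2022 | T20I # 1878 |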

---

### ✅ What This Code Achieves:
- Extracts **match summaries** from the JSON dataset.
- Converts them into a **structured DataFrame** for easy processing.
- Enables further analysis like filtering matches, finding top-performing teams, etc.


In [3]:
dataframe = data[0]['matchSummary']
dataframe_match = pd.DataFrame(dataframe)
dataframe_match.head()

Unnamed: 0,Team 1,Team 2,Winner,Margin,Ground,Match Date,Scorecard
0,England,Pakistan,England,5 wickets,Melbourne,"Nov 13, 2022",T20I # 1879
1,England,India,England,10 wickets,Adelaide,"Nov 10, 2022",T20I # 1878
2,New Zealand,Pakistan,Pakistan,7 wickets,Sydney,"Nov 9, 2022",T20I # 1877
3,India,Zimbabwe,India,71 runs,Melbourne,"Nov 6, 2022",T20I # 1873
4,Bangladesh,Pakistan,Pakistan,5 wickets,Adelaide,"Nov 6, 2022",T20I # 1872


In [4]:
dataframe_match.shape

(42, 7)

### **📌 What Does `dataframe_match.shape` Do?**
The `.shape` attribute in Pandas **returns the dimensions** of the DataFrame as a tuple. Here, `(42, 7)` means 42 match records and 7 columns:  
```python
(rows, columns)
```

---

In [5]:
dataframe_match.rename({'Scorecard':'Match_ID'}, axis=1, inplace=True)
dataframe_match.head()

Unnamed: 0,Team 1,Team 2,Winner,Margin,Ground,Match Date,Match_ID
0,England,Pakistan,England,5 wickets,Melbourne,"Nov 13, 2022",T20I # 1879
1,England,India,England,10 wickets,Adelaide,"Nov 10, 2022",T20I # 1878
2,New Zealand,Pakistan,Pakistan,7 wickets,Sydney,"Nov 9, 2022",T20I # 1877
3,India,Zimbabwe,India,71 runs,Melbourne,"Nov 6, 2022",T20I # 1873
4,Bangladesh,Pakistan,Pakistan,5 wickets,Adelaide,"Nov 6, 2022",T20I # 1872


### **📌 What Does This Code Do?**
This code **renames a column** in the `dataframe_match` DataFrame, changing `"Scorecard"` to `"Match_ID"`.

---

### **🔍 Breakdown of the Code**
```python
dataframe_match.rename({'Scorecard': 'Match_ID'}, axis=1, inplace=True)
```
- **`rename({'Scorecard': 'Match_ID'}, axis=1)`**  
  - Finds the column named `"Scorecard"` and renames it to `"Match_ID"`.  
  - `axis=1` specifies that we are renaming a **column** (not rows).  

- **`inplace=True`**  
  - Applies the change **directly** to `dataframe_match` without creating a new DataFrame.  

- **`dataframe_match.head()`**  
  - Displays the first **five rows** of the updated DataFrame.  

---
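
As a hedged alternative to `axis=1` with `inplace=True`, the same rename can be written with the `columns=` keyword and a reassignment, which leaves the original DataFrame untouched. The sample frame and the names `sample` and `renamed` are hypothetical.

```python
import pandas as pd

# Small stand-in for dataframe_match with only three of its columns.
sample = pd.DataFrame({
    "Team 1": ["England"],
    "Team 2": ["Pakistan"],
    "Scorecard": ["T20I # 1879"],
})

# columns= is equivalent to passing the mapping with axis=1.
# Without inplace=True, rename returns a new DataFrame.
renamed = sample.rename(columns={"Scorecard": "Match_ID"})

print(list(sample.columns))
print(list(renamed.columns))
```

Returning a new DataFrame makes it easier to re-run a cell without side effects, at the cost of one extra variable.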

### **📊 Why Rename `Scorecard` to `Match_ID`?**
✔️ **"Match_ID" is more meaningful** – likely represents a unique match identifier.  
✔️ Ensures **consistent column naming** for further data processing.  
✔️ Useful for **merging datasets** (e.g., linking with batting & bowling stats).  

---

### **🧐 Example Before & After**
#### **Before Renaming (`dataframe_match.head()`)**
| Team 1 | Team 2 | Winner | Margin | Ground | Match Date | **Scorecard** |
|--------|--------|--------|--------|--------|------------|---------------|
| England | Pakistan | England | 5 wickets | Melbourne | Nov 13, 2022 | `T20I # 1879` |
| England | India | England | 10 wickets | Adelaide | Nov 10, 2022 | `T20I # 1878` |

#### **After Renaming (`dataframe_match.head()`)**
| Team 1 | Team 2 | Winner | Margin | Ground | Match Date | **Match_ID** |
|--------|--------|--------|--------|--------|------------|--------------|
| England | Pakistan | England | 5 wickets | Melbourne | Nov 13, 2022 | `T20I # 1879` |
| England | India | England | 10 wickets | Adelaide | Nov 10, 2022 | `T20I # 1878` |

---

In [6]:
match_ids_dict = {}
for index, row in dataframe_match.iterrows():
    key1 = row['Team 1'] + ' Vs ' + row['Team 2']
    key2 = row['Team 2'] + ' Vs ' + row['Team 1']

    match_ids_dict[key1] = row['Match_ID']
    match_ids_dict[key2] = row['Match_ID']
match_ids_dict

{'England Vs Pakistan': 'T20I # 1879',
 'Pakistan Vs England': 'T20I # 1879',
 'England Vs India': 'T20I # 1878',
 'India Vs England': 'T20I # 1878',
 'New Zealand Vs Pakistan': 'T20I # 1877',
 'Pakistan Vs New Zealand': 'T20I # 1877',
 'India Vs Zimbabwe': 'T20I # 1873',
 'Zimbabwe Vs India': 'T20I # 1873',
 'Bangladesh Vs Pakistan': 'T20I # 1872',
 'Pakistan Vs Bangladesh': 'T20I # 1872',
 'Netherlands Vs South Africa': 'T20I # 1871',
 'South Africa Vs Netherlands': 'T20I # 1871',
 'England Vs Sri Lanka': 'T20I # 1867',
 'Sri Lanka Vs England': 'T20I # 1867',
 'Australia Vs Afghanistan': 'T20I # 1864',
 'Afghanistan Vs Australia': 'T20I # 1864',
 'Ireland Vs New Zealand': 'T20I # 1862',
 'New Zealand Vs Ireland': 'T20I # 1862',
 'Pakistan Vs South Africa': 'T20I # 1861',
 'South Africa Vs Pakistan': 'T20I # 1861',
 'Bangladesh Vs India': 'T20I # 1860',
 'India Vs Bangladesh': 'T20I # 1860',
 'Netherlands Vs Zimbabwe': 'T20I # 1859',
 'Zimbabwe Vs Netherlands': 'T20I # 1859',
 'Englan

### **📌 What Does This Code Do?**
This code **creates a dictionary (`match_ids_dict`)** that maps each match (both team order variations) to its corresponding `Match_ID`.

---

### **🔍 Breakdown of the Code**
#### **Step 1: Initialize an Empty Dictionary**
```python
match_ids_dict = {}
```
- This dictionary will store **match identifiers (`Match_ID`)** for both possible team orderings.

---

#### **Step 2: Iterate Through Each Row in `dataframe_match`**
```python
for index, row in dataframe_match.iterrows():
```
- `iterrows()` allows us to **loop through each row** of the DataFrame.
- `row` represents **one match record** at a time.

---

#### **Step 3: Create Two Keys for Each Match**
```python
key1 = row['Team 1'] + ' Vs ' + row['Team 2']
key2 = row['Team 2'] + ' Vs ' + row['Team 1']
```
- `key1` stores the match as **"Team 1 Vs Team 2"**.
- `key2` stores the match as **"Team 2 Vs Team 1"**.
- This ensures that we can **look up a match by either team order**.

---

#### **Step 4: Store Match IDs for Both Keys**
```python
match_ids_dict[key1] = row['Match_ID']
match_ids_dict[key2] = row['Match_ID']
```
- Both variations (`key1` and `key2`) **point to the same `Match_ID`**.
- This helps when **searching for match data regardless of team order**.

---

### **📊 Example Input & Output**
#### **Example `dataframe_match` Data**
| Team 1   | Team 2  | Match_ID  |
|----------|--------|------------|
| India    | Pakistan  | match_001  |
| England  | Australia | match_002  |

#### **Generated `match_ids_dict`**
```python
{
    "India Vs Pakistan": "match_001",
    "Pakistan Vs India": "match_001",
    "England Vs Australia": "match_002",
    "Australia Vs England": "match_002"
}
```

---

### **🚀 Why Use This Dictionary?**
✔️ **Fast match lookups** by either team order.  
✔️ Useful for **joining with other datasets** (e.g., batting & bowling stats).  
✔️ Helps in **searching match records efficiently**.  

---
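
The same lookup can be built without `iterrows()` by zipping the three columns, which avoids creating a Series object for every row. This is a sketch on a hypothetical two-row sample (`sample` and `lookup` are illustrative names), not the notebook's own code.

```python
import pandas as pd

# Two rows copied from the match summary output above.
sample = pd.DataFrame({
    "Team 1": ["England", "New Zealand"],
    "Team 2": ["Pakistan", "Pakistan"],
    "Match_ID": ["T20I # 1879", "T20I # 1877"],
})

lookup = {}
for team1, team2, match_id in zip(sample["Team 1"], sample["Team 2"], sample["Match_ID"]):
    # Register both orderings so either "A Vs B" or "B Vs A" resolves.
    lookup[f"{team1} Vs {team2}"] = match_id
    lookup[f"{team2} Vs {team1}"] = match_id

print(lookup["Pakistan Vs England"])
```

One caveat applies to both versions: dictionary keys are unique, so if the same two teams met more than once, the later row would overwrite the earlier match ID.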

In [7]:
# open the batting_summary_json_file
with open(batting_summary_json_file_path, encoding="utf-8") as file:
    battingSummarydata = json.load(file)
    all_records = []
    for battingSummary in battingSummarydata:
        all_records.extend(battingSummary['battingSummary'])
dataframe_battingSummary = pd.DataFrame(all_records)
dataframe_battingSummary

Unnamed: 0,match,battingTeam,battingPos,batsmanName,batsmanNameUrl,dismissal,runs,balls,4s,6s,SR
0,Pakistan Vs England,Pakistan,1,Mohammad Rizwan †,https://www.espncricinfo.com/cricketers/mohamm...,b Curran,15,14,0,1,107.14
1,Pakistan Vs England,Pakistan,2,Babar Azam (c),https://www.espncricinfo.com/cricketers/babar-...,c & b Rashid,32,28,2,0,114.28
2,Pakistan Vs England,Pakistan,3,Mohammad Haris,https://www.espncricinfo.com/cricketers/mohamm...,c Stokes b Rashid,8,12,1,0,66.66
3,Pakistan Vs England,Pakistan,4,Shan Masood,https://www.espncricinfo.com/cricketers/shan-m...,c Livingstone b Curran,38,28,2,1,135.71
4,Pakistan Vs England,Pakistan,5,Iftikhar Ahmed,https://www.espncricinfo.com/cricketers/iftikh...,c †Buttler b Stokes,0,6,0,0,0.00
...,...,...,...,...,...,...,...,...,...,...,...
694,Sri Lanka Vs Namibia,Namibia,7,Wanindu Hasaranga,https://www.espncricinfo.com/cricketers/wanind...,c Loftie-Eaton b Scholtz,4,8,0,0,50.00
695,Sri Lanka Vs Namibia,Namibia,8,Chamika Karunaratne,https://www.espncricinfo.com/cricketers/chamik...,c Baard b Smit,5,8,0,0,62.50
696,Sri Lanka Vs Namibia,Namibia,9,Pramod Madushan,https://www.espncricinfo.com/cricketers/pramod...,run out (van Lingen/†Green),0,0,0,0,-
697,Sri Lanka Vs Namibia,Namibia,10,Dushmantha Chameera,https://www.espncricinfo.com/cricketers/dushma...,c Erasmus b Wiese,8,15,0,0,53.33


### **📌 What Does This Code Do?**  
This code **opens the batting summary JSON file**, extracts all batting records, and converts them into a **pandas DataFrame** for further analysis.

---

### **🔍 Breakdown of the Code**
#### **Step 1: Open the Batting Summary JSON File**
```python
with open(batting_summary_json_file_path, encoding="utf-8") as file:
    battingSummarydata = json.load(file)
```
- Opens the JSON file specified in `batting_summary_json_file_path`.
- Loads the content into `battingSummarydata` using `json.load(file)`, which converts the JSON into a Python **list of dictionaries**.

---

#### **Step 2: Extract All Batting Records**
```python
all_records = []
for battingSummary in battingSummarydata:
    all_records.extend(battingSummary['battingSummary'])
```
- **`battingSummarydata` is a list of match summaries**, where each match contains **a 'battingSummary' key**.
- We iterate through each match and **extract the 'battingSummary' list**.
- We use `extend()` to **flatten** all match records into a **single list (`all_records`)**.

---

#### **Step 3: Convert Data into a Pandas DataFrame**
```python
dataframe_battingSummary = pd.DataFrame(all_records)
```
- Converts the **list of all batting records** into a **structured pandas DataFrame** (`dataframe_battingSummary`).
- This DataFrame will now contain **individual player batting stats from all matches**.

---
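
Steps 2 and 3 can also be expressed in one call with `pd.json_normalize`, which flattens the nested lists found under a given key. The sketch below assumes the file has the list-of-dictionaries shape described above; `sample_batting` and `flat` are hypothetical names, and the records keep only three of the real columns.

```python
import pandas as pd

# Two entries, each with its own nested 'battingSummary' list.
sample_batting = [
    {"battingSummary": [
        {"match": "Pakistan Vs England", "batsmanName": "Babar Azam (c)", "runs": 32},
        {"match": "Pakistan Vs England", "batsmanName": "Shan Masood", "runs": 38},
    ]},
    {"battingSummary": [
        {"match": "Sri Lanka Vs Namibia", "batsmanName": "Maheesh Theekshana", "runs": 11},
    ]},
]

# record_path names the key whose lists should be concatenated into rows.
flat = pd.json_normalize(sample_batting, record_path="battingSummary")

print(len(flat))
print(list(flat.columns))
```

The explicit loop with `extend()` stays easier to debug if some entries lack the key, so either form is a reasonable choice.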

In [8]:
dataframe_battingSummary.tail()

Unnamed: 0,match,battingTeam,battingPos,batsmanName,batsmanNameUrl,dismissal,runs,balls,4s,6s,SR
694,Sri Lanka Vs Namibia,Namibia,7,Wanindu Hasaranga,https://www.espncricinfo.com/cricketers/wanind...,c Loftie-Eaton b Scholtz,4,8,0,0,50.00
695,Sri Lanka Vs Namibia,Namibia,8,Chamika Karunaratne,https://www.espncricinfo.com/cricketers/chamik...,c Baard b Smit,5,8,0,0,62.50
696,Sri Lanka Vs Namibia,Namibia,9,Pramod Madushan,https://www.espncricinfo.com/cricketers/pramod...,run out (van Lingen/†Green),0,0,0,0,-
697,Sri Lanka Vs Namibia,Namibia,10,Dushmantha Chameera,https://www.espncricinfo.com/cricketers/dushma...,c Erasmus b Wiese,8,15,0,0,53.33
698,Sri Lanka Vs Namibia,Namibia,11,Maheesh Theekshana,https://www.espncricinfo.com/cricketers/mahees...,not out,11,11,0,1,100.00


### **📌 What Does `dataframe_battingSummary.tail()` Do?**  
This **displays the last 5 rows** of the `dataframe_battingSummary` DataFrame, giving a quick look at the end of the dataset. These are the final rows in file order, which are not necessarily the most recent performances.

---

### **🔍 Breakdown of the Code**
```python
dataframe_battingSummary.tail()
```
- **`tail(n=5)`**: By default, `tail()` returns the last **5 rows** of the DataFrame.
- If you need more rows, you can specify `n`:  
  ```python
  dataframe_battingSummary.tail(10)  # Returns last 10 rows
  ```
---

### **🚀 Why Use `tail()`?**
✔️ Quickly inspect the **end of the dataset**.  
✔️ Check if the **data is correctly structured**.  
✔️ Useful when dealing with **large datasets** (instead of printing everything).  

In [9]:
dataframe_battingSummary["out/not_out"] = dataframe_battingSummary.dismissal.apply(lambda x: "not_out" if x == "not out" else "out")
dataframe_battingSummary

Unnamed: 0,match,battingTeam,battingPos,batsmanName,batsmanNameUrl,dismissal,runs,balls,4s,6s,SR,out/not_out
0,Pakistan Vs England,Pakistan,1,Mohammad Rizwan †,https://www.espncricinfo.com/cricketers/mohamm...,b Curran,15,14,0,1,107.14,out
1,Pakistan Vs England,Pakistan,2,Babar Azam (c),https://www.espncricinfo.com/cricketers/babar-...,c & b Rashid,32,28,2,0,114.28,out
2,Pakistan Vs England,Pakistan,3,Mohammad Haris,https://www.espncricinfo.com/cricketers/mohamm...,c Stokes b Rashid,8,12,1,0,66.66,out
3,Pakistan Vs England,Pakistan,4,Shan Masood,https://www.espncricinfo.com/cricketers/shan-m...,c Livingstone b Curran,38,28,2,1,135.71,out
4,Pakistan Vs England,Pakistan,5,Iftikhar Ahmed,https://www.espncricinfo.com/cricketers/iftikh...,c †Buttler b Stokes,0,6,0,0,0.00,out
...,...,...,...,...,...,...,...,...,...,...,...,...
694,Sri Lanka Vs Namibia,Namibia,7,Wanindu Hasaranga,https://www.espncricinfo.com/cricketers/wanind...,c Loftie-Eaton b Scholtz,4,8,0,0,50.00,out
695,Sri Lanka Vs Namibia,Namibia,8,Chamika Karunaratne,https://www.espncricinfo.com/cricketers/chamik...,c Baard b Smit,5,8,0,0,62.50,out
696,Sri Lanka Vs Namibia,Namibia,9,Pramod Madushan,https://www.espncricinfo.com/cricketers/pramod...,run out (van Lingen/†Green),0,0,0,0,-,out
697,Sri Lanka Vs Namibia,Namibia,10,Dushmantha Chameera,https://www.espncricinfo.com/cricketers/dushma...,c Erasmus b Wiese,8,15,0,0,53.33,out


### **📌 What Does This Code Do?**  
This **creates a new column** `"out/not_out"` in `dataframe_battingSummary` based on the `"dismissal"` column.  

---

### **🔍 Code Breakdown**
```python
dataframe_battingSummary["out/not_out"] = dataframe_battingSummary.dismissal.apply(lambda x: "not_out" if x == "not out" else "out")
```
1. **`dataframe_battingSummary.dismissal.apply(...)`**  
   - This applies a **lambda function** (anonymous function) to each value in the `"dismissal"` column.
   
2. **Lambda function logic:**
   ```python
   lambda x: "not_out" if x == "not out" else "out"
   ```
   - If a player’s dismissal status is `"not out"`, it assigns `"not_out"`.  
   - Otherwise, it assigns `"out"`.

3. **Stores the result** in a new column called `"out/not_out"`.

---
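
For a simple two-way label like this, a vectorized `numpy.where` can replace the row-by-row `apply`. The following is a minimal sketch on a hypothetical three-row sample; the dismissal strings are copied from the output above.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "dismissal": ["b Curran", "not out", "run out (van Lingen/†Green)"],
})

# Compare the whole column at once, then pick a label per element.
sample["out/not_out"] = np.where(sample["dismissal"] == "not out", "not_out", "out")

print(sample["out/not_out"].tolist())
```

Both versions label every value other than the exact string `"not out"` as `"out"`, so any other dismissal text the source data might contain would need its own handling.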

In [10]:
dataframe_battingSummary.drop(columns=['dismissal'], inplace=True)
dataframe_battingSummary.drop(columns=['batsmanNameUrl'], inplace=True)
dataframe_battingSummary

Unnamed: 0,match,battingTeam,battingPos,batsmanName,runs,balls,4s,6s,SR,out/not_out
0,Pakistan Vs England,Pakistan,1,Mohammad Rizwan †,15,14,0,1,107.14,out
1,Pakistan Vs England,Pakistan,2,Babar Azam (c),32,28,2,0,114.28,out
2,Pakistan Vs England,Pakistan,3,Mohammad Haris,8,12,1,0,66.66,out
3,Pakistan Vs England,Pakistan,4,Shan Masood,38,28,2,1,135.71,out
4,Pakistan Vs England,Pakistan,5,Iftikhar Ahmed,0,6,0,0,0.00,out
...,...,...,...,...,...,...,...,...,...,...
694,Sri Lanka Vs Namibia,Namibia,7,Wanindu Hasaranga,4,8,0,0,50.00,out
695,Sri Lanka Vs Namibia,Namibia,8,Chamika Karunaratne,5,8,0,0,62.50,out
696,Sri Lanka Vs Namibia,Namibia,9,Pramod Madushan,0,0,0,0,-,out
697,Sri Lanka Vs Namibia,Namibia,10,Dushmantha Chameera,8,15,0,0,53.33,out


### **📌 What Does This Code Do?**  
This **removes unnecessary columns** (`'dismissal'` and `'batsmanNameUrl'`) from `dataframe_battingSummary`.

---

### **🔍 Code Breakdown**
```python
dataframe_battingSummary.drop(columns=['dismissal'], inplace=True)
dataframe_battingSummary.drop(columns=['batsmanNameUrl'], inplace=True)
```
1. **`.drop(columns=[...])`**  
   - Removes specified columns from the DataFrame.  

2. **`inplace=True`**  
   - Modifies the DataFrame **directly** without needing reassignment.

---
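
The two `drop` calls can be merged into one by passing both column names in a single list. The sketch below uses a hypothetical one-row sample with a placeholder URL and returns a new DataFrame instead of modifying the original.

```python
import pandas as pd

sample = pd.DataFrame({
    "batsmanName": ["Shan Masood"],
    "batsmanNameUrl": ["https://example.com/placeholder"],  # placeholder, not a real link
    "dismissal": ["c Livingstone b Curran"],
    "runs": [38],
})

# One call removes both columns; the result is assigned to a new name.
trimmed = sample.drop(columns=["dismissal", "batsmanNameUrl"])

print(list(trimmed.columns))
```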

In [11]:
dataframe_battingSummary['match_id'] = dataframe_battingSummary['match'].map(match_ids_dict)
dataframe_battingSummary

Unnamed: 0,match,battingTeam,battingPos,batsmanName,runs,balls,4s,6s,SR,out/not_out,match_id
0,Pakistan Vs England,Pakistan,1,Mohammad Rizwan †,15,14,0,1,107.14,out,T20I # 1879
1,Pakistan Vs England,Pakistan,2,Babar Azam (c),32,28,2,0,114.28,out,T20I # 1879
2,Pakistan Vs England,Pakistan,3,Mohammad Haris,8,12,1,0,66.66,out,T20I # 1879
3,Pakistan Vs England,Pakistan,4,Shan Masood,38,28,2,1,135.71,out,T20I # 1879
4,Pakistan Vs England,Pakistan,5,Iftikhar Ahmed,0,6,0,0,0.00,out,T20I # 1879
...,...,...,...,...,...,...,...,...,...,...,...
694,Sri Lanka Vs Namibia,Namibia,7,Wanindu Hasaranga,4,8,0,0,50.00,out,T20I # 1823
695,Sri Lanka Vs Namibia,Namibia,8,Chamika Karunaratne,5,8,0,0,62.50,out,T20I # 1823
696,Sri Lanka Vs Namibia,Namibia,9,Pramod Madushan,0,0,0,0,-,out,T20I # 1823
697,Sri Lanka Vs Namibia,Namibia,10,Dushmantha Chameera,8,15,0,0,53.33,out,T20I # 1823


### **📌 What Does This Code Do?**  
This **adds a `match_id` column** to `dataframe_battingSummary` by mapping the `'match'` column to the corresponding **Match ID** using `match_ids_dict`.

---

### **🔍 Code Breakdown**
```python
dataframe_battingSummary['match_id'] = dataframe_battingSummary['match'].map(match_ids_dict)
```
1. **`match_ids_dict`**  
   - A **dictionary** created earlier that maps `"Team 1 Vs Team 2"` (in either order) → `Match_ID`.
   
2. **`.map(match_ids_dict)`**  
   - Looks up each value of the `'match'` column in `match_ids_dict` and returns the corresponding **Match ID**. The `'match'` column itself is unchanged, and any value missing from the dictionary becomes `NaN`.

3. **New `'match_id'` column**  
   - Stores the mapped **Match ID** for each row.

---
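
Because `.map()` yields `NaN` for keys it cannot find, it is worth checking for unmatched rows after the mapping. This sketch uses hypothetical names (`lookup`, `sample`, `unmatched`) and a deliberately misspelled match string to show the check.

```python
import pandas as pd

lookup = {
    "Pakistan Vs England": "T20I # 1879",
    "England Vs Pakistan": "T20I # 1879",
}

# The second value uses a lowercase "vs", so it is not a key in the lookup.
sample = pd.DataFrame({"match": ["Pakistan Vs England", "Pakistan vs England"]})
sample["match_id"] = sample["match"].map(lookup)

# Rows where the lookup failed have NaN in match_id.
unmatched = sample[sample["match_id"].isna()]

print(len(unmatched))
```

An empty `unmatched` frame on the real data would confirm that every batting row was linked to a match.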


In [12]:
dataframe_battingSummary['batsmanName'] = dataframe_battingSummary['batsmanName'].str.replace('â€', '').str.replace('†', '')
dataframe_battingSummary.head()

Unnamed: 0,match,battingTeam,battingPos,batsmanName,runs,balls,4s,6s,SR,out/not_out,match_id
0,Pakistan Vs England,Pakistan,1,Mohammad Rizwan,15,14,0,1,107.14,out,T20I # 1879
1,Pakistan Vs England,Pakistan,2,Babar Azam (c),32,28,2,0,114.28,out,T20I # 1879
2,Pakistan Vs England,Pakistan,3,Mohammad Haris,8,12,1,0,66.66,out,T20I # 1879
3,Pakistan Vs England,Pakistan,4,Shan Masood,38,28,2,1,135.71,out,T20I # 1879
4,Pakistan Vs England,Pakistan,5,Iftikhar Ahmed,0,6,0,0,0.0,out,T20I # 1879


### **📌 What Does This Code Do?**  
This **cleans the `'batsmanName'` column** by **removing unwanted special characters** like `â€` and `†`.

---

### **🔍 Code Breakdown**
```python
dataframe_battingSummary['batsmanName'] = dataframe_battingSummary['batsmanName'] \
    .str.replace('â€', '').str.replace('†', '')
```
1. **`.str.replace('â€', '')`**  
   - Finds and removes the sequence **`â€`** from `'batsmanName'`. This sequence is how `†` appears when UTF-8 text is decoded with the wrong encoding.
   
2. **`.str.replace('†', '')`**  
   - Removes the special symbol **†** (which marks the wicketkeeper on the scorecard).

3. **`dataframe_battingSummary.head()`**  
   - Displays the first 5 rows of the cleaned DataFrame.

---

### **📊 Example Before & After**
#### **Before Cleaning**
| Batsman Name      |
|-------------------|
| Mohammad Rizwan † |
| Babar Azam (c)    |
| Shan Masood       |

#### **After Cleaning**
| Batsman Name     |
|------------------|
| Mohammad Rizwan  |
| Babar Azam (c)   |
| Shan Masood      |

The captain marker `(c)` is left in place because the code only targets `â€` and `†`.

---
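
Removing `†` from a value such as `"Mohammad Rizwan †"` may leave a trailing space, and the `(c)` marker is untouched. The sketch below shows one possible extension; stripping `(c)` and whitespace is an assumption that goes beyond the original cell, and `names` and `cleaned` are hypothetical names.

```python
import pandas as pd

names = pd.Series(["Mohammad Rizwan †", "Babar Azam (c)", "Shan Masood"])

cleaned = (
    names
    .str.replace("†", "", regex=False)    # wicketkeeper marker
    .str.replace("(c)", "", regex=False)  # captain marker (assumed unwanted)
    .str.strip()                          # drop leftover surrounding whitespace
)

print(cleaned.tolist())
```

Passing `regex=False` treats `(c)` as literal text instead of a regular-expression group.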

### **🚀 Why Is This Useful?**
✔️ **Removes unnecessary symbols** for **better readability**.  
✔️ **Prepares names for further analysis** (e.g., grouping, merging).  
✔️ **Avoids data inconsistency** in reports and visualizations.  


In [13]:
# open the bowling_summary_json_file
with open(bowling_summary_json_file_path, encoding="utf-8") as file:
    bowlingSummaryData = json.load(file)
    all_records = []
    for bowlingSummary in bowlingSummaryData:
        all_records.extend(bowlingSummary['bowlingSummary'])
dataframe_bowlingSummary = pd.DataFrame(all_records)
dataframe_bowlingSummary

Unnamed: 0,match,bowlingTeam,bowlerName,bowlerNameUrl,overs,maidens,runs,wickets,economy,0s,4s,6s,wd,nb
0,England Vs Pakistan,England,Ben Stokes,https://www.espncricinfo.com/cricketers/ben-st...,4,0,32,1,8.00,6,1,0,2,1
1,England Vs Pakistan,England,Chris Woakes,https://www.espncricinfo.com/cricketers/chris-...,3,0,26,0,8.66,7,2,1,2,0
2,England Vs Pakistan,England,Sam Curran,https://www.espncricinfo.com/cricketers/sam-cu...,4,0,12,3,3.00,15,0,0,0,0
3,England Vs Pakistan,England,Adil Rashid,https://www.espncricinfo.com/cricketers/adil-r...,4,1,22,2,5.50,10,1,0,1,0
4,England Vs Pakistan,England,Chris Jordan,https://www.espncricinfo.com/cricketers/chris-...,4,0,27,2,6.75,9,3,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,Namibia Vs Sri Lanka,Sri Lanka,David Wiese,https://www.espncricinfo.com/cricketers/david-...,4,0,16,2,4.00,13,1,0,0,0
496,Namibia Vs Sri Lanka,Sri Lanka,Bernard Scholtz,https://www.espncricinfo.com/cricketers/bernar...,4,0,18,2,4.50,10,1,0,0,0
497,Namibia Vs Sri Lanka,Sri Lanka,Ben Shikongo,https://www.espncricinfo.com/cricketers/ben-sh...,3,1,22,2,7.33,6,3,0,0,0
498,Namibia Vs Sri Lanka,Sri Lanka,JJ Smit,https://www.espncricinfo.com/cricketers/jj-smi...,3,0,16,1,5.33,7,0,0,1,0


### **📌 What Does This Code Do?**  
This **opens, reads, and processes the bowling summary JSON file** into a **structured DataFrame**.

---

### **🔍 Code Breakdown**
```python
with open(bowling_summary_json_file_path, encoding="utf-8") as file:
    bowlingSummaryData = json.load(file)
```
1. **Opens the JSON file** (`bowling_summary_json_file_path`).
2. **Reads and loads the data** into the variable `bowlingSummaryData` as a Python list of dictionaries.

---

```python
all_records = []
for bowlingSummary in bowlingSummaryData:
    all_records.extend(bowlingSummary['bowlingSummary'])
```
3. **Creates an empty list (`all_records`)** to store the extracted data.
4. **Loops through `bowlingSummaryData`**, extracting the `'bowlingSummary'` key from each entry.
5. **Extends `all_records`** by appending all individual bowling records.

---

```python
dataframe_bowlingSummary = pd.DataFrame(all_records)
dataframe_bowlingSummary
```
6. **Converts `all_records` into a Pandas DataFrame** (`dataframe_bowlingSummary`).
7. **Displays the DataFrame** to verify the extracted data.

---
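
The batting and bowling cells follow the same load-and-flatten pattern, so they could share one helper. `load_summary` below is a hypothetical function, not part of the notebook, and the demo writes a small sample file to a temporary directory so the sketch runs without the real data.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def load_summary(path, key):
    """Load a JSON file shaped like [{key: [records, ...]}, ...] into one DataFrame."""
    with open(path, encoding="utf-8") as file:
        payload = json.load(file)
    records = []
    for entry in payload:
        records.extend(entry[key])  # flatten each entry's nested list
    return pd.DataFrame(records)


# Demo data: two entries with one bowling record each (values from the output above).
sample = [
    {"bowlingSummary": [{"match": "England Vs Pakistan", "bowlerName": "Sam Curran", "wickets": 3}]},
    {"bowlingSummary": [{"match": "Namibia Vs Sri Lanka", "bowlerName": "JJ Smit", "wickets": 1}]},
]

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample_bowling.json"
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    bowling_df = load_summary(sample_path, "bowlingSummary")

print(bowling_df["bowlerName"].tolist())
```

With such a helper, the batting cell would reduce to a call like `load_summary(batting_summary_json_file_path, "battingSummary")`.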

### **🚀 Why Is This Useful?**
✔️ **Extracts structured data** from JSON for analysis.  
✔️ **Converts nested data** into a Pandas DataFrame.  
✔️ **Prepares data for further processing** (like filtering, aggregating, and visualization).  

In [14]:
dataframe_bowlingSummary.drop(columns=['bowlerNameUrl'], inplace=True)
dataframe_bowlingSummary

Unnamed: 0,match,bowlingTeam,bowlerName,overs,maidens,runs,wickets,economy,0s,4s,6s,wd,nb
0,England Vs Pakistan,England,Ben Stokes,4,0,32,1,8.00,6,1,0,2,1
1,England Vs Pakistan,England,Chris Woakes,3,0,26,0,8.66,7,2,1,2,0
2,England Vs Pakistan,England,Sam Curran,4,0,12,3,3.00,15,0,0,0,0
3,England Vs Pakistan,England,Adil Rashid,4,1,22,2,5.50,10,1,0,1,0
4,England Vs Pakistan,England,Chris Jordan,4,0,27,2,6.75,9,3,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,Namibia Vs Sri Lanka,Sri Lanka,David Wiese,4,0,16,2,4.00,13,1,0,0,0
496,Namibia Vs Sri Lanka,Sri Lanka,Bernard Scholtz,4,0,18,2,4.50,10,1,0,0,0
497,Namibia Vs Sri Lanka,Sri Lanka,Ben Shikongo,3,1,22,2,7.33,6,3,0,0,0
498,Namibia Vs Sri Lanka,Sri Lanka,JJ Smit,3,0,16,1,5.33,7,0,0,1,0


### **📌 What Does This Code Do?**  
Removes the **'bowlerNameUrl'** column from `dataframe_bowlingSummary`, keeping only relevant data.  

✔ **Cleans the dataset**  
✔ **Removes unnecessary URL information**  
✔ **Prepares for analysis**  
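
Because `inplace=True` mutates the DataFrame, running the cell a second time raises a `KeyError` once the column is gone. A minimal sketch of a re-runnable variant, using a hypothetical sample frame with placeholder URLs:

```python
import pandas as pd

# Hypothetical sample; the URLs are placeholders, not real scorecard links.
sample_df = pd.DataFrame({
    "bowlerName": ["Sam Curran", "JJ Smit"],
    "bowlerNameUrl": ["https://example.com/a", "https://example.com/b"],
    "wickets": [3, 1],
})

# errors="ignore" turns the drop into a no-op when the column is absent,
# so repeating the operation does not raise a KeyError.
cleaned_df = sample_df.drop(columns=["bowlerNameUrl"], errors="ignore")
cleaned_again = cleaned_df.drop(columns=["bowlerNameUrl"], errors="ignore")

print(list(cleaned_again.columns))  # ['bowlerName', 'wickets']
```

Assigning the result to a new name also leaves the original frame untouched, which can make debugging easier.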

In [15]:
dataframe_bowlingSummary['match_id'] = dataframe_bowlingSummary['match'].map(match_ids_dict)
dataframe_bowlingSummary

Unnamed: 0,match,bowlingTeam,bowlerName,overs,maidens,runs,wickets,economy,0s,4s,6s,wd,nb,match_id
0,England Vs Pakistan,England,Ben Stokes,4,0,32,1,8.00,6,1,0,2,1,T20I # 1879
1,England Vs Pakistan,England,Chris Woakes,3,0,26,0,8.66,7,2,1,2,0,T20I # 1879
2,England Vs Pakistan,England,Sam Curran,4,0,12,3,3.00,15,0,0,0,0,T20I # 1879
3,England Vs Pakistan,England,Adil Rashid,4,1,22,2,5.50,10,1,0,1,0,T20I # 1879
4,England Vs Pakistan,England,Chris Jordan,4,0,27,2,6.75,9,3,0,0,0,T20I # 1879
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,Namibia Vs Sri Lanka,Sri Lanka,David Wiese,4,0,16,2,4.00,13,1,0,0,0,T20I # 1823
496,Namibia Vs Sri Lanka,Sri Lanka,Bernard Scholtz,4,0,18,2,4.50,10,1,0,0,0,T20I # 1823
497,Namibia Vs Sri Lanka,Sri Lanka,Ben Shikongo,3,1,22,2,7.33,6,3,0,0,0,T20I # 1823
498,Namibia Vs Sri Lanka,Sri Lanka,JJ Smit,3,0,16,1,5.33,7,0,0,1,0,T20I # 1823


### **📌 What Does This Code Do?**  
Adds a **'match_id'** column to `dataframe_bowlingSummary` by mapping match names to their unique IDs from `match_ids_dict`.  

✔ **Links match data with unique IDs**  
✔ **Facilitates merging with other datasets**
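
`Series.map` with a dictionary leaves `NaN` wherever a match name has no key, so it can be worth checking for unmapped rows before merging. A small sketch, assuming a hypothetical two-entry lookup (`sample_match_ids`) and placeholder bowler names; the notebook's real `match_ids_dict` is built in an earlier cell:

```python
import pandas as pd

# Hypothetical lookup standing in for match_ids_dict.
sample_match_ids = {
    "England Vs Pakistan": "T20I # 1879",
    "Namibia Vs Sri Lanka": "T20I # 1823",
}

sample_df = pd.DataFrame({
    # The third name has the teams reversed, so it has no key in the lookup.
    "match": ["England Vs Pakistan", "Namibia Vs Sri Lanka", "Pakistan Vs England"],
    "bowlerName": ["Bowler A", "Bowler B", "Bowler C"],
})

sample_df["match_id"] = sample_df["match"].map(sample_match_ids)

# Collect the match names that did not receive an ID.
unmapped = sample_df.loc[sample_df["match_id"].isna(), "match"].unique()
print(list(unmapped))  # ['Pakistan Vs England']
```

An empty `unmapped` result would indicate that every row received an ID.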

In [16]:
# Open the player details (with image URLs) JSON file
with open(player_details_imageurl_json_file_path, encoding="utf-8") as file:
    playerDetailsImageData = json.load(file)
dataframe_playerDetailsImageData = pd.DataFrame(playerDetailsImageData)
dataframe_playerDetailsImageData

Unnamed: 0,name,team,born,age,nicknames,battingStyle,bowlingStyle,fieldingPosition,playingRole,height,playerImageUrl
0,Mohammad Rizwan †,Pakistan,"June 01, 1992, Peshawar, Khyber Pakhtunkhwa",32y 254d,"Rizi, Rizu",Right hand Bat,Right arm Medium,Wicketkeeper,Wicketkeeper Batter,5ft 7in,"https://img1.hscicdn.com/image/upload/f_auto,t..."
1,Babar Azam (c),Pakistan,"October 15, 1994, Lahore, Punjab",30y 118d,,Right hand Bat,Right arm Offbreak,,Batter,,"https://img1.hscicdn.com/image/upload/f_auto,t..."
2,Mohammad Haris,Pakistan,"March 30, 2001, Peshawar",23y 317d,,Right hand Bat,Right arm Offbreak,Occasional Wicketkeeper,Wicketkeeper Batter,5ft 4in,"https://img1.hscicdn.com/image/upload/f_auto,t..."
3,Shan Masood,Pakistan,"October 14, 1989, Kuwait",35y 119d,Shaani,Left hand Bat,Right arm Medium fast,,Opening Batter,,"https://img1.hscicdn.com/image/upload/f_auto,t..."
4,Iftikhar Ahmed,Pakistan,"September 03, 1990, Peshawar, North-West Front...",34y 160d,,Right hand Bat,Right arm Offbreak,,Middle order Batter,,"https://img1.hscicdn.com/image/upload/f_auto,t..."
...,...,...,...,...,...,...,...,...,...,...,...
1194,David Wiese,Namibia,"May 18, 1985, Roodepoort",39y 268d,,Right hand Bat,Right arm Medium fast,,Allrounder,,"https://img1.hscicdn.com/image/upload/f_auto,t..."
1195,Bernard Scholtz,Namibia,"October 03, 1990, Keetmanshoop, Namibia",34y 130d,,Right hand Bat,Slow Left arm Orthodox,,Bowler,,"https://img1.hscicdn.com/image/upload/f_auto,t..."
1196,Ben Shikongo,Namibia,"May 08, 2000",24y 278d,,Right hand Bat,Right arm Medium fast,,Bowler,,"https://img1.hscicdn.com/image/upload/f_auto,t..."
1197,JJ Smit,Namibia,"November 10, 1995, keetmanshoop",29y 92d,,Right hand Bat,Left arm Medium fast,,Bowling Allrounder,,"https://img1.hscicdn.com/image/upload/f_auto,t..."


### **📌 What Does This Code Do?**  
Loads player details (including image URLs) from a JSON file into a Pandas DataFrame.  

✔ **Reads JSON data**  
✔ **Converts it into a structured DataFrame**  
✔ **Prepares for further processing**

In [17]:
selected_columns = ['name', 'team', 'battingStyle', 'bowlingStyle', 'playerImageUrl','playingRole']
dataframe_playerDetailsImageData = dataframe_playerDetailsImageData[selected_columns]
dataframe_playerDetailsImageData.head()

Unnamed: 0,name,team,battingStyle,bowlingStyle,playerImageUrl,playingRole
0,Mohammad Rizwan †,Pakistan,Right hand Bat,Right arm Medium,"https://img1.hscicdn.com/image/upload/f_auto,t...",Wicketkeeper Batter
1,Babar Azam (c),Pakistan,Right hand Bat,Right arm Offbreak,"https://img1.hscicdn.com/image/upload/f_auto,t...",Batter
2,Mohammad Haris,Pakistan,Right hand Bat,Right arm Offbreak,"https://img1.hscicdn.com/image/upload/f_auto,t...",Wicketkeeper Batter
3,Shan Masood,Pakistan,Left hand Bat,Right arm Medium fast,"https://img1.hscicdn.com/image/upload/f_auto,t...",Opening Batter
4,Iftikhar Ahmed,Pakistan,Right hand Bat,Right arm Offbreak,"https://img1.hscicdn.com/image/upload/f_auto,t...",Middle order Batter


### **📌 What Does This Code Do?**  
Filters `dataframe_playerDetailsImageData` to keep only key player details.  

✔ **Retains essential columns:** `name`, `team`, `battingStyle`, `bowlingStyle`, `playerImageUrl`, `playingRole`  
✔ **Removes unnecessary data**  
✔ **Prepares for analysis & visualization**
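
Selecting columns with a list raises a `KeyError` if any name is missing. A hedged sketch of a slightly more defensive selection, using a hypothetical sample frame and the illustrative name `wanted`:

```python
import pandas as pd

# Hypothetical sample with one column that is not needed downstream.
sample_players = pd.DataFrame({
    "name": ["Player A", "Player B"],
    "team": ["Pakistan", "Namibia"],
    "height": ["5ft 7in", None],
    "playingRole": ["Batter", "Bowler"],
})

wanted = ["name", "team", "playingRole"]

# Fail early with a clear message if an expected column is absent.
missing = [col for col in wanted if col not in sample_players.columns]
if missing:
    raise KeyError(f"Missing columns: {missing}")

# .copy() makes the result an independent DataFrame, so later in-place
# changes (such as pop/insert) do not refer back to the original.
trimmed = sample_players[wanted].copy()
print(list(trimmed.columns))  # ['name', 'team', 'playingRole']
```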

In [18]:
# Move the playerImageUrl column to the 3rd position
playerImageUrl = dataframe_playerDetailsImageData.pop('playerImageUrl')
dataframe_playerDetailsImageData.insert(2, 'playerImageUrl', playerImageUrl)
dataframe_playerDetailsImageData.head()

Unnamed: 0,name,team,playerImageUrl,battingStyle,bowlingStyle,playingRole
0,Mohammad Rizwan †,Pakistan,"https://img1.hscicdn.com/image/upload/f_auto,t...",Right hand Bat,Right arm Medium,Wicketkeeper Batter
1,Babar Azam (c),Pakistan,"https://img1.hscicdn.com/image/upload/f_auto,t...",Right hand Bat,Right arm Offbreak,Batter
2,Mohammad Haris,Pakistan,"https://img1.hscicdn.com/image/upload/f_auto,t...",Right hand Bat,Right arm Offbreak,Wicketkeeper Batter
3,Shan Masood,Pakistan,"https://img1.hscicdn.com/image/upload/f_auto,t...",Left hand Bat,Right arm Medium fast,Opening Batter
4,Iftikhar Ahmed,Pakistan,"https://img1.hscicdn.com/image/upload/f_auto,t...",Right hand Bat,Right arm Offbreak,Middle order Batter


### **🔹 What Does This Do?**  
Moves the **'playerImageUrl'** column to the **3rd position** in `dataframe_playerDetailsImageData`.  

✔ **Enhances data organization**  
✔ **Improves readability**  
✔ **Keeps player image URLs closer to key attributes**
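
For illustration, the same reordering can also be written without mutating the frame, by listing the columns in the desired order. The sample data below is hypothetical, with a placeholder image URL:

```python
import pandas as pd

# Hypothetical one-row sample; the URL is a placeholder.
sample_players = pd.DataFrame({
    "name": ["Player A"],
    "team": ["Pakistan"],
    "battingStyle": ["Right hand Bat"],
    "playerImageUrl": ["https://example.com/a.png"],
})

# Non-mutating alternative: state the target column order explicitly.
ordered_cols = ["name", "team", "playerImageUrl", "battingStyle"]
reordered = sample_players[ordered_cols]

# Mutating version, as in the cell above: pop removes the column and
# returns it; insert places it at zero-based position 2 (third column).
image_col = sample_players.pop("playerImageUrl")
sample_players.insert(2, "playerImageUrl", image_col)

print(list(sample_players.columns) == list(reordered.columns))  # True
```

The explicit list is easier to read when several columns move at once; `pop` and `insert` are convenient for moving a single column.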

In [19]:
# Remove duplicate records based on the 'name' column, keeping the first occurrence
dataframe_playerDetailsImageData = dataframe_playerDetailsImageData.drop_duplicates(subset=['name'], keep='first')

# Display the updated DataFrame
dataframe_playerDetailsImageData

Unnamed: 0,name,team,playerImageUrl,battingStyle,bowlingStyle,playingRole
0,Mohammad Rizwan †,Pakistan,"https://img1.hscicdn.com/image/upload/f_auto,t...",Right hand Bat,Right arm Medium,Wicketkeeper Batter
1,Babar Azam (c),Pakistan,"https://img1.hscicdn.com/image/upload/f_auto,t...",Right hand Bat,Right arm Offbreak,Batter
2,Mohammad Haris,Pakistan,"https://img1.hscicdn.com/image/upload/f_auto,t...",Right hand Bat,Right arm Offbreak,Wicketkeeper Batter
3,Shan Masood,Pakistan,"https://img1.hscicdn.com/image/upload/f_auto,t...",Left hand Bat,Right arm Medium fast,Opening Batter
4,Iftikhar Ahmed,Pakistan,"https://img1.hscicdn.com/image/upload/f_auto,t...",Right hand Bat,Right arm Offbreak,Middle order Batter
...,...,...,...,...,...,...
1031,Kashif Daud,United Arab Emirates,"https://img1.hscicdn.com/image/upload/f_auto,t...",Right hand Bat,Right arm Medium fast,Bowling Allrounder
1049,Divan la Cock,Namibia,"https://img1.hscicdn.com/image/upload/f_auto,t...",Right hand Bat,Legbreak,Opening Batter
1072,Gerhard Erasmus,Namibia,"https://img1.hscicdn.com/image/upload/f_auto,t...",Right hand Bat,Right arm Offbreak,Allrounder
1141,Zawar Farid,United Arab Emirates,"https://img1.hscicdn.com/image/upload/f_auto,t...",Right hand Bat,Right arm Medium,Bowler


In [20]:
# Open the player details JSON file
with open(player_details_json_file_path, encoding="utf-8") as file:
    playerDetailsData = json.load(file)
dataframe_playerDetailsData = pd.DataFrame(playerDetailsData)
dataframe_playerDetailsData

Unnamed: 0,name,team,born,age,nicknames,battingStyle,bowlingStyle,fieldingPosition,playingRole,height
0,Mohammad Rizwan †,Pakistan,"June 01, 1992, Peshawar, Khyber Pakhtunkhwa",32y 254d,"Rizi, Rizu",Right hand Bat,Right arm Medium,Wicketkeeper,Wicketkeeper Batter,5ft 7in
1,Babar Azam (c),Pakistan,"October 15, 1994, Lahore, Punjab",30y 118d,,Right hand Bat,Right arm Offbreak,,Batter,
2,Mohammad Haris,Pakistan,"March 30, 2001, Peshawar",23y 317d,,Right hand Bat,Right arm Offbreak,Occasional Wicketkeeper,Wicketkeeper Batter,5ft 4in
3,Shan Masood,Pakistan,"October 14, 1989, Kuwait",35y 119d,Shaani,Left hand Bat,Right arm Medium fast,,Opening Batter,
4,Iftikhar Ahmed,Pakistan,"September 03, 1990, Peshawar, North-West Front...",34y 160d,,Right hand Bat,Right arm Offbreak,,Middle order Batter,
...,...,...,...,...,...,...,...,...,...,...
1194,David Wiese,Namibia,"May 18, 1985, Roodepoort",39y 268d,,Right hand Bat,Right arm Medium fast,,Allrounder,
1195,Bernard Scholtz,Namibia,"October 03, 1990, Keetmanshoop, Namibia",34y 130d,,Right hand Bat,Slow Left arm Orthodox,,Bowler,
1196,Ben Shikongo,Namibia,"May 08, 2000",24y 278d,,Right hand Bat,Right arm Medium fast,,Bowler,
1197,JJ Smit,Namibia,"November 10, 1995, keetmanshoop",29y 92d,,Right hand Bat,Left arm Medium fast,,Bowling Allrounder,


### **🔹 What Does This Do?**  
Loads **player details** from a JSON file into a Pandas DataFrame.  

✔ **Reads JSON data**  
✔ **Converts it into a structured DataFrame**  
✔ **Prepares for further processing**

In [21]:
selected_columns = ['name', 'team', 'battingStyle', 'bowlingStyle','playingRole']
dataframe_playerDetailsData = dataframe_playerDetailsData[selected_columns]
dataframe_playerDetailsData.head()

Unnamed: 0,name,team,battingStyle,bowlingStyle,playingRole
0,Mohammad Rizwan †,Pakistan,Right hand Bat,Right arm Medium,Wicketkeeper Batter
1,Babar Azam (c),Pakistan,Right hand Bat,Right arm Offbreak,Batter
2,Mohammad Haris,Pakistan,Right hand Bat,Right arm Offbreak,Wicketkeeper Batter
3,Shan Masood,Pakistan,Left hand Bat,Right arm Medium fast,Opening Batter
4,Iftikhar Ahmed,Pakistan,Right hand Bat,Right arm Offbreak,Middle order Batter


### **🔹 What Does This Do?**  
Filters `dataframe_playerDetailsData` to keep only key player attributes.  

✔ **Retains essential columns:** `name`, `team`, `battingStyle`, `bowlingStyle`, `playingRole`  
✔ **Removes unnecessary data**  
✔ **Prepares for streamlined analysis**

In [22]:
# Filter duplicate records based on the 'name' column
duplicate_players = dataframe_playerDetailsData[dataframe_playerDetailsData.duplicated(subset=['name'], keep=False)]

# Display duplicate records
duplicate_players

Unnamed: 0,name,team,battingStyle,bowlingStyle,playingRole
0,Mohammad Rizwan †,Pakistan,Right hand Bat,Right arm Medium,Wicketkeeper Batter
1,Babar Azam (c),Pakistan,Right hand Bat,Right arm Offbreak,Batter
2,Mohammad Haris,Pakistan,Right hand Bat,Right arm Offbreak,Wicketkeeper Batter
3,Shan Masood,Pakistan,Left hand Bat,Right arm Medium fast,Opening Batter
4,Iftikhar Ahmed,Pakistan,Right hand Bat,Right arm Offbreak,Middle order Batter
...,...,...,...,...,...
1194,David Wiese,Namibia,Right hand Bat,Right arm Medium fast,Allrounder
1195,Bernard Scholtz,Namibia,Right hand Bat,Slow Left arm Orthodox,Bowler
1196,Ben Shikongo,Namibia,Right hand Bat,Right arm Medium fast,Bowler
1197,JJ Smit,Namibia,Right hand Bat,Left arm Medium fast,Bowling Allrounder


In [23]:
# Remove duplicate records based on the 'name' column, keeping the first occurrence
dataframe_playerDetailsData = dataframe_playerDetailsData.drop_duplicates(subset=['name'], keep='first')

# Display the updated DataFrame
dataframe_playerDetailsData

Unnamed: 0,name,team,battingStyle,bowlingStyle,playingRole
0,Mohammad Rizwan †,Pakistan,Right hand Bat,Right arm Medium,Wicketkeeper Batter
1,Babar Azam (c),Pakistan,Right hand Bat,Right arm Offbreak,Batter
2,Mohammad Haris,Pakistan,Right hand Bat,Right arm Offbreak,Wicketkeeper Batter
3,Shan Masood,Pakistan,Left hand Bat,Right arm Medium fast,Opening Batter
4,Iftikhar Ahmed,Pakistan,Right hand Bat,Right arm Offbreak,Middle order Batter
...,...,...,...,...,...
1031,Kashif Daud,United Arab Emirates,Right hand Bat,Right arm Medium fast,Bowling Allrounder
1049,Divan la Cock,Namibia,Right hand Bat,Legbreak,Opening Batter
1072,Gerhard Erasmus,Namibia,Right hand Bat,Right arm Offbreak,Allrounder
1141,Zawar Farid,United Arab Emirates,Right hand Bat,Right arm Medium,Bowler
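
The two duplicate-handling cells differ only in the `keep` argument. A minimal sketch on hypothetical data shows the difference, and why the deduplicated output keeps gaps in its index (for example 1031, 1049, 1072, 1141 above):

```python
import pandas as pd

# Hypothetical sample in which "Player A" appears twice.
sample_players = pd.DataFrame({
    "name": ["Player A", "Player B", "Player A", "Player C"],
    "team": ["Pakistan", "Namibia", "Pakistan", "Namibia"],
})

# keep=False flags every row whose name occurs more than once.
all_dupes = sample_players[sample_players.duplicated(subset=["name"], keep=False)]

# keep="first" retains one row per name and preserves the original index labels.
deduped = sample_players.drop_duplicates(subset=["name"], keep="first")

print(all_dupes.index.tolist())  # [0, 2]
print(deduped.index.tolist())    # [0, 1, 3]

# Optional: renumber the rows from 0 after deduplication.
deduped_reset = deduped.reset_index(drop=True)
print(deduped_reset.index.tolist())  # [0, 1, 2]
```

Whether to reset the index is a design choice; it makes no difference to the CSV export when `index=False` is used.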


In [24]:
dataframe_match.to_csv(dim_match_summary,index=False)
dataframe_battingSummary.to_csv(fact_bating_summary, index=False)
dataframe_bowlingSummary.to_csv(fact_bowling_summary, index=False)
dataframe_playerDetailsData.to_csv(dim_players_no_images,index=False)
dataframe_playerDetailsImageData.to_csv(dim_players,index=False)

### **🔹 What Does This Do?**  
- **Saves the cleaned match DataFrame** (`dataframe_match`) to the CSV path in `dim_match_summary`.
- **Saves the cleaned batting summary DataFrame** (`dataframe_battingSummary`) to the CSV path in `fact_bating_summary`.
- **Saves the cleaned bowling summary DataFrame** (`dataframe_bowlingSummary`) to the CSV path in `fact_bowling_summary`.
- **Saves the cleaned player details DataFrame** (`dataframe_playerDetailsData`) to the CSV path in `dim_players_no_images`.
- **Saves the cleaned player details DataFrame with image URLs** (`dataframe_playerDetailsImageData`) to the CSV path in `dim_players`.
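
Passing `index=False` keeps the DataFrame's row labels out of the file. A small round-trip sketch, writing to an in-memory buffer instead of the notebook's real output paths (the sample data is hypothetical):

```python
import io
import pandas as pd

# Hypothetical sample standing in for one of the cleaned DataFrames.
sample_df = pd.DataFrame({
    "name": ["Player A", "Player B"],
    "team": ["Pakistan", "Namibia"],
})

# Write to an in-memory buffer rather than a file on disk.
buffer = io.StringIO()
sample_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# With index=False the header holds only the data columns.
print(csv_text.splitlines()[0])  # name,team

# Reading the text back yields the same columns and row count.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.shape)  # (2, 2)
```

Without `index=False`, the file would gain an extra unnamed leading column holding the index, which then appears as `Unnamed: 0` when the CSV is read back.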