The notebook imports BeautifulSoup (from bs4), requests, pandas, numpy, and json. BeautifulSoup parses HTML and XML documents, requests makes the HTTP requests that fetch web pages, pandas handles tabular data manipulation, numpy supports numerical operations, and json reads and writes JSON data.

Together these cover the tasks in this notebook: scraping a web page, loading JSON files, and cleaning the resulting data.

In [52]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import json

### Step-1

This step fetches the 2022 Twenty20 International match results table from the ESPNcricinfo site. It uses BeautifulSoup to parse the page's HTML and extract the table.

The code works as follows:

1. The variable URL holds the address of the page to scrape.

2. requests.get() sends an HTTP GET request to that URL and retrieves the page content. The response is stored in the variable r.

3. BeautifulSoup(r.text, 'lxml') parses the response body with the lxml parser, and soup.find() locates the table container by its CSS classes.

4. find_all() collects the header spans and the td cells, and two list comprehensions extract their text into titles and data.

In [None]:
## Collect the match summary table for the year 2022 from the ESPNcricinfo page
## This method uses Beautiful Soup: we create a soup and extract the table from it
URL = 'https://www.espncricinfo.com/records/year/team-match-results/2022-2022/twenty20-internationals-3'
r = requests.get(URL)
soup = BeautifulSoup(r.text,'lxml')
table = soup.find(class_="ds-overflow-x-auto ds-scrollbar-hide")
headers = table.find_all("span", class_="ds-cursor-pointer")
body = table.find_all("td", class_="ds-min-w-max")
titles = [i.text for i in headers]
data = [j.text for j in body]
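
The cell above leaves `titles` and `data` as flat lists, so the cells still have to be grouped into rows before they form a table. Below is a minimal sketch of that grouping, assuming every row has exactly one cell per header. The `chunk_rows` helper and the sample values are hypothetical and not part of the original notebook.

```python
def chunk_rows(cells, n_cols):
    """Group a flat list of cell texts into rows of n_cols items each."""
    if n_cols <= 0:
        raise ValueError("n_cols must be positive")
    if len(cells) % n_cols != 0:
        raise ValueError("cell count is not a multiple of the column count")
    return [cells[i:i + n_cols] for i in range(0, len(cells), n_cols)]


# Hypothetical stand-ins for the scraped `titles` and `data` lists
titles = ["Team 1", "Team 2", "Winner"]
data = ["Team A", "Team B", "Team A",
        "Team C", "Team D", "Team D"]

rows = chunk_rows(data, len(titles))
records = [dict(zip(titles, row)) for row in rows]
print(records[0])
```

The resulting `rows` could then be passed to `pd.DataFrame(rows, columns=titles)` to build the table.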

In [None]:
## Collect the match summary table for the year 2022 from the ESPNcricinfo page
## Since we only need the table, pandas can read HTML tables directly into DataFrames
## Note: pd.read_html returns a list of DataFrames, one per table found on the page
URL = 'https://www.espncricinfo.com/records/year/team-match-results/2022-2022/twenty20-internationals-3'
table = pd.read_html(URL)
table

#### The remaining data was extracted using the Bright Data website

### Match Result Summary

1. The code reads the match results JSON file and creates a pandas DataFrame from its matchSummary records.
2. It then renames the scorecard column to match_id, saves the table as CSV, and builds a lookup dictionary from team pairings to match_id.

In [None]:
with open('t20_json_files/t20_wc_match_results.json') as f:
    data = json.load(f)
df_result = pd.DataFrame(data[0]['matchSummary'])
df_result  
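
The indexing `data[0]['matchSummary']` implies that the file holds a list whose first element is an object containing a `matchSummary` list of records. Here is a small sketch of that shape, parsed from an inline string instead of the real file. Only the key names `matchSummary`, `team1`, `team2`, and `scorecard` come from the notebook; all values are invented for illustration.

```python
import json

# Hypothetical miniature of t20_wc_match_results.json (values are made up)
raw = """
[
  {
    "matchSummary": [
      {"team1": "Team A", "team2": "Team B", "scorecard": "Match 1"},
      {"team1": "Team C", "team2": "Team D", "scorecard": "Match 2"}
    ]
  }
]
"""

data = json.loads(raw)
match_records = data[0]["matchSummary"]  # list of dicts, one per match

for rec in match_records:
    print(rec["team1"], "Vs", rec["team2"], "->", rec["scorecard"])
```

A list of dictionaries like `match_records` is exactly what `pd.DataFrame(...)` accepts, with one row per dictionary and one column per key.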

In [None]:
# The 'scorecard' field is used as the match identifier, so rename it
df_result.rename(columns={'scorecard': 'match_id'}, inplace=True)
df_result

In [127]:
df_result.to_csv('t20_csv_files/t20_wc_match_results.csv', index = False)

In [None]:
# Map both "A Vs B" and "B Vs A" to the same match_id so that
# lookups work whichever order the teams are listed in
match_id_dict = {}
for index, row in df_result.iterrows():
    key1 = f"{row['team1']} Vs {row['team2']}"
    key2 = f"{row['team2']} Vs {row['team1']}"
    match_id_dict[key1] = row['match_id']
    match_id_dict[key2] = row['match_id']

match_id_dict
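
Both orderings are registered presumably because the batting and bowling records may list the two teams in either order. The same idea can be sketched on plain dictionaries. The `build_match_lookup` helper, team names, and IDs below are hypothetical.

```python
def build_match_lookup(matches):
    """Map 'A Vs B' and 'B Vs A' to the same match_id."""
    lookup = {}
    for m in matches:
        lookup[f"{m['team1']} Vs {m['team2']}"] = m["match_id"]
        lookup[f"{m['team2']} Vs {m['team1']}"] = m["match_id"]
    return lookup


# Hypothetical records; the real ones come from df_result
matches = [
    {"team1": "Team A", "team2": "Team B", "match_id": "Match 1"},
    {"team1": "Team C", "team2": "Team D", "match_id": "Match 2"},
]

lookup = build_match_lookup(matches)
print(lookup["Team B Vs Team A"])
```

One caveat of this design: if the same two teams meet more than once, the later match overwrites the earlier entry, so the lookup is only safe when each pairing is unique.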

### Batting Summary

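1. The code reads the batting summary JSON file, flattens the per-match battingSummary lists into a single list, and creates a pandas DataFrame from it.
2. It then derives an out/not out column from dismissal, strips stray characters from player names, and attaches the match_id for each row.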

In [None]:
with open('t20_json_files/t20_wc_batting_summary.json') as f:
    data = json.load(f)
    bat_lst = []
    for i in data:
        bat_lst.extend(i['battingSummary']) 
df_bat = pd.DataFrame(bat_lst)
df_bat
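
The loop above flattens one list of batting rows per match into a single list. An equivalent sketch uses `itertools.chain.from_iterable`, shown here on hypothetical data. The `battingSummary` and `batsmanName` keys come from the notebook, while the `runs` key and all values are assumptions.

```python
from itertools import chain

# Hypothetical per-match records; each holds a list of batting rows
data = [
    {"battingSummary": [{"batsmanName": "Player A", "runs": "10"}]},
    {"battingSummary": [{"batsmanName": "Player B", "runs": "25"},
                        {"batsmanName": "Player C", "runs": "0"}]},
]

# Concatenate the inner lists without an explicit loop
bat_lst = list(chain.from_iterable(rec["battingSummary"] for rec in data))
print(len(bat_lst))
```

Either form produces the same flat list of dictionaries, so the choice is a matter of taste.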

In [None]:
# An empty dismissal string means the batter was not dismissed
df_bat['Result'] = df_bat['dismissal'].apply(lambda x: 'not out' if x == '' else 'out')

In [None]:
df_bat.drop(columns='dismissal', inplace=True)

In [None]:
# Strip the mis-decoded characters ('â€') that appear in some player names
df_bat['batsmanName'] = df_bat['batsmanName'].apply(lambda x: x.replace('â€', ''))
df_bat.head(15)
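
Sequences like 'â€' typically appear when UTF-8 bytes are decoded with a single-byte code page such as cp1252. One plausible origin, which is an assumption and not confirmed by the source, is the dagger '†' that scorecards often use to mark a wicketkeeper. The sketch below reproduces the effect on a hypothetical name and shows that reversing the wrong decoding recovers the original text.

```python
original = "Player A\u2020"  # hypothetical name ending in a dagger

# Decoding UTF-8 bytes as cp1252 produces the stray characters
garbled = original.encode("utf-8").decode("cp1252")
print(repr(garbled))

# Reversing the mistake recovers the original text
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

If that is the cause here, opening the JSON files with `open(path, encoding='utf-8')` would avoid the problem at load time instead of patching it afterwards.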

In [None]:
df_bat['match_id'] = df_bat['match'].map(match_id_dict)
df_bat
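
`Series.map` with a dictionary yields `NaN` for any value that is not a key, so a mismatch in team naming would silently leave `match_id` empty. Below is a plain-Python sketch of a check for such gaps, using a hypothetical lookup and hypothetical match labels.

```python
# Hypothetical lookup and labels; the real ones come from
# match_id_dict and df_bat['match']
match_id_dict = {"Team A Vs Team B": "Match 1", "Team B Vs Team A": "Match 1"}
match_labels = ["Team A Vs Team B", "Team B Vs Team A", "Team A vs Team C"]

# Distinct labels that have no entry in the lookup
unmatched = sorted({m for m in match_labels if m not in match_id_dict})
print(unmatched)
```

On the DataFrame itself, `df_bat['match_id'].isna().sum()` reports how many rows failed to map.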

In [90]:
df_bat.to_csv('t20_csv_files/t20_wc_batting_summary.csv', index = False)

### Bowling Summary

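1. The code reads the bowling summary JSON file, flattens the per-match bowlingSummary lists into a single list, and creates a pandas DataFrame from it.
2. It then attaches the match_id for each row using the same lookup dictionary and saves the result as CSV.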

In [None]:
with open('t20_json_files/t20_wc_bowling_summary.json') as f:
    data = json.load(f)
    all_rec = []
    for rec in data:
        all_rec.extend(rec['bowlingSummary'])
df_bowl = pd.DataFrame(all_rec)

In [None]:
df_bowl['match_id'] = df_bowl['match'].map(match_id_dict)
df_bowl

In [126]:
df_bowl.to_csv('t20_csv_files/t20_wc_bowling_summary.csv', index = False)

### Player Info

1. The code reads the player info JSON file directly into a pandas DataFrame.
2. It then strips stray characters from the name column and saves the result as CSV.

In [None]:
with open('t20_json_files/t20_wc_player_info.json') as f:
    data = json.load(f)
df_info = pd.DataFrame(data)
# Strip the mis-decoded characters ('â€') that appear in some player names
df_info['name'] = df_info['name'].apply(lambda x: x.replace('â€', ''))
df_info.head(30)

In [125]:
df_info.to_csv('t20_csv_files/t20_wc_player_info.csv', index = False)