This notebook demonstrates a Python script that extracts Indian Premier League (IPL) series results from https://www.espncricinfo.com/ using web scraping.
It uses the requests library to fetch the HTML content and BeautifulSoup to parse it and extract the desired information.
At the end, the scraped data is saved as a CSV file.

Click [link](https://www.espncricinfo.com/records/trophy/team-series-results/indian-premier-league-117) to visit the source page (official ESPNcricinfo site).

In [1]:
from bs4 import BeautifulSoup
import pandas as pd

import requests

In [2]:
# URL of the ESPNcricinfo page to scrape
url = "https://www.espncricinfo.com/records/trophy/team-series-results/indian-premier-league-117"

# Send an HTTP GET request; fail fast on timeouts and HTTP error responses
response = requests.get(url, timeout=30)
response.raise_for_status()

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

The page at the above URL contains a single table. The `thead span` selector returns the column names, and the `span` elements inside each `tbody tr` row hold the records.

In [3]:
# Select the column names from the table header
column_list = [item.get_text() for item in soup.select('thead span')]

column_list

['Series/Tournament', 'Season', 'Winner', 'Margin']

In [4]:
# Collect one record per table row
data = []
for item in soup.select('tbody tr'):
    spans = item.select('span')
    links = item.find_all('a')
    # Series name, season, winner, and the href of the row's first link
    data.append([spans[1].get_text(), spans[2].get_text(), spans[3].get_text(), links[0].get('href')])

In [5]:
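# The fourth value in each record is the link, so it appears under 'Margin' until the columns are renamed
series_results = pd.DataFrame(data, columns=column_list)
series_results.head()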

   Series/Tournament                        Season   Winner       Margin
0  Indian Premier League (in India)         2007/08  Royals       /series/indian-premier-league-2007-08-313494/m...
1  Indian Premier League (in South Africa)  2009     Chargers     /series/indian-premier-league-2009-374163/matc...
2  Indian Premier League (in India)         2009/10  Super Kings  /series/indian-premier-league-2009-10-418064/m...
3  Indian Premier League (in India)         2011     Super Kings  /series/indian-premier-league-2011-466304/matc...
4  Indian Premier League (in India)         2012     KKR          /series/indian-premier-league-2012-520932/matc...


The column names do not match their contents: Season holds the year, Series/Tournament effectively indicates where the tournament was played, and Margin holds the series link.

In [6]:
# Rename columns
series_results.columns = ['country', 'year', 'winner', 'link']
series_results.head()

   country                                  year     winner       link
0  Indian Premier League (in India)         2007/08  Royals       /series/indian-premier-league-2007-08-313494/m...
1  Indian Premier League (in South Africa)  2009     Chargers     /series/indian-premier-league-2009-374163/matc...
2  Indian Premier League (in India)         2009/10  Super Kings  /series/indian-premier-league-2009-10-418064/m...
3  Indian Premier League (in India)         2011     Super Kings  /series/indian-premier-league-2011-466304/matc...
4  Indian Premier League (in India)         2012     KKR          /series/indian-premier-league-2012-520932/matc...
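
The values in the link column are site-relative paths. If absolute URLs are needed, one option is `urljoin` from the standard library. This is a minimal sketch, assuming the paths resolve against the page URL used above; the relative path below is a hypothetical placeholder, not a scraped value.

```python
from urllib.parse import urljoin

# Page URL used earlier in the notebook
BASE_URL = "https://www.espncricinfo.com/records/trophy/team-series-results/indian-premier-league-117"

# Hypothetical relative path, shaped like the scraped hrefs (not a real scraped value)
relative_link = "/series/example-series-000000/match-schedule"

# A path starting with "/" replaces the whole path of the base URL
absolute_link = urljoin(BASE_URL, relative_link)
print(absolute_link)
```

On the DataFrame, the same call could be applied per row, for example `series_results.link.apply(lambda href: urljoin(BASE_URL, href))`.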


Some pre-processing steps:
- Extract only the place name in the country column.
- Put the correct year in the year column.

In [7]:
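# Keep only the text inside the parentheses, e.g. "in South Africa"
series_results.country = series_results.country.apply(
    lambda country: country.replace('(', '/').replace(')', '/').split('/')[-2])
# Drop only the leading "in" so that place names containing "in" stay intact
series_results.country = series_results.country.str.replace(r'^\s*in\s+', '', regex=True).str.strip()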
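
The place-name step can also be tried outside pandas with a single regular expression. The sketch below is only an illustration: `extract_place` and `PLACE_PATTERN` are hypothetical names that do not appear in the notebook, and the code assumes each series name carries a parenthesised "(in <place>)" suffix like the rows shown earlier.

```python
import re

# Hypothetical helper (not part of the original notebook).
# Assumes the series name contains a parenthesised "(in <place>)" suffix.
PLACE_PATTERN = re.compile(r"\(in\s+([^)]+)\)")


def extract_place(series_name):
    """Return the place inside "(in <place>)", or the input unchanged if absent."""
    match = PLACE_PATTERN.search(series_name)
    return match.group(1).strip() if match else series_name


names = [
    "Indian Premier League (in India)",
    "Indian Premier League (in South Africa)",
]
places = [extract_place(name) for name in names]
print(places)
```

Because the pattern only strips the "in" that directly follows the opening parenthesis, a place name that itself contains the letters "in" is left untouched.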

In [8]:
year_list = series_results.year.tolist()
len_list = len(year_list)

# Replace split seasons such as "2007/08" with a single year inferred from a neighbouring row
for i in range(len_list):
    if '/' in year_list[i]:
        if i == 0 and i != len_list - 1:
            # First row: one year before the next season
            year_list[i] = str(int(year_list[i + 1]) - 1)
        else:
            # Any other row: one year after the previous season
            year_list[i] = str(int(year_list[i - 1]) + 1)

series_results.year = year_list
series_results

   country               year  winner       link
0  India                 2008  Royals       /series/indian-premier-league-2007-08-313494/m...
1  South Africa          2009  Chargers     /series/indian-premier-league-2009-374163/matc...
2  India                 2010  Super Kings  /series/indian-premier-league-2009-10-418064/m...
3  India                 2011  Super Kings  /series/indian-premier-league-2011-466304/matc...
4  India                 2012  KKR          /series/indian-premier-league-2012-520932/matc...
5  India                 2013  Mumbai       /series/indian-premier-league-2013-586733/matc...
6  United Arab Emirates  2014  KKR          /series/pepsi-indian-premier-league-2014-69587...
7  India                 2015  Mumbai       /series/pepsi-indian-premier-league-2015-79112...
8  India                 2016  Sunrisers    /series/ipl-2016-968923/match-schedule-fixture...
9  India                 2017  Mumbai       /series/ipl-2017-1078425/match-schedule-fixtur...
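
The year-normalisation rule used above can be checked on a small list in isolation. This is a sketch under stated assumptions: `normalize_seasons` is a hypothetical helper name, the seasons are in chronological order with one tournament per year, and a split season in the first position is followed by a single-year entry.

```python
def normalize_seasons(seasons):
    """Replace split seasons such as "2007/08" using the neighbouring entries."""
    years = list(seasons)  # work on a copy so the input is left unchanged
    for i, season in enumerate(years):
        if "/" not in season:
            continue
        if i == 0:
            # First entry: one year before the following season
            years[i] = str(int(years[i + 1]) - 1)
        else:
            # Otherwise: one year after the (already normalised) previous season
            years[i] = str(int(years[i - 1]) + 1)
    return years


print(normalize_seasons(["2007/08", "2009", "2009/10", "2011"]))
```

Inferring the year from a neighbour, rather than from the digits after the slash, relies on the one-tournament-per-year assumption and would need revisiting if a year were skipped.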


In [9]:
series_results.winner.value_counts()

Super Kings    5
Mumbai         5
KKR            2
Royals         1
Chargers       1
Sunrisers      1
Titans         1
Name: winner, dtype: int64

In [10]:
# Write the cleaned table to a CSV file in the working directory
series_results.to_csv("ipl_series_results.csv", index=False)
print("ipl_series_results.csv saved !!!")

ipl_series_results.csv saved !!!
