# Web Data Scraping

__Web Data Scraping__ is a technique for extracting data from websites by programmatically requesting web pages and pulling out the information you need. It is useful when a site does not provide an __*Application Programming Interface (API)*__, or when the API's rate limits make it impractical to collect the volume of data you need. The key steps of web scraping are:

1. __Sending a Request:__ The first step is to send a request to the web server hosting the website from which data is to be scraped. The request is typically made over HTTP or HTTPS.

2. __Receiving the Response:__ The server responds to the request by sending back the requested web page, often in HTML format. Other formats like JSON and XML can also be received depending on the API or web service.

3. __Parsing the Data:__ Once the data is received, it needs to be parsed. For HTML, this usually involves using libraries like BeautifulSoup in Python, which allow for easy navigation of the structure of the HTML and extraction of the relevant information.

4. __Data Extraction:__ After parsing, the necessary data is extracted. This could be product details on an e-commerce site, stock prices, sports statistics, or any other information available on the web.
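
The parsing and extraction steps can be sketched without touching the network. In the minimal example below, the HTML string, the `name` and `price` class names, and the product values are all made up for illustration; the string stands in for the body of a response that steps 1 and 2 would normally deliver.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the body of an HTTP response (steps 1 and 2).
html = """
<table>
  <tbody>
    <tr><td class="name">Widget</td><td class="price">9.99</td></tr>
    <tr><td class="name">Gadget</td><td class="price">24.50</td></tr>
  </tbody>
</table>
"""

# Step 3: parse the HTML into a navigable tree.
soup = BeautifulSoup(html, 'html.parser')

# Step 4: extract the fields of interest from each table row.
products = []
for tr in soup.select('tbody tr'):
    name = tr.select_one('td.name').get_text(strip=True)
    price = float(tr.select_one('td.price').get_text(strip=True))
    products.append({'name': name, 'price': price})

print(products)
```

The same pattern (select the rows, then read the cells of each row) is what the ESPN example later in this notebook follows.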

# Imports

In [None]:
import pandas as pd
import numpy as np
import requests
import warnings
import matplotlib.pyplot as plt
import seaborn as sns
from bs4 import BeautifulSoup

warnings.filterwarnings("ignore")

# Web Scrape ESPN NBA Data

In [None]:
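# URL of the ESPN NBA league standings page

url = 'https://www.espn.ph/nba/table/_/group/league'

# Request headers: a browser-like User-Agent, since some sites reject the default one

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# Send the request; time out instead of hanging, and fail early on an HTTP error

response = requests.get(url = url, headers = headers, timeout = 30)
response.raise_for_status()

# Create a bs4 object to parse the HTML content

soup = BeautifulSoup(response.content, 'html.parser')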

In [None]:
# Find the table containing teams

table_teams = soup.select_one('tbody.Table__TBODY')

# Find the table with the stats

table_data = soup.select_one('div.Table__Scroller tbody.Table__TBODY')
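
The two selectors differ only in scope: `select_one` returns the first match in document order, so the unscoped selector picks up the first `tbody`, while the scoped one only matches a `tbody` nested inside `div.Table__Scroller`. The sketch below illustrates this on a heavily simplified, hypothetical piece of markup; the real page structure is assumed, not confirmed, to follow this two-table layout.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup imitating a two-table layout:
# team names in the first table, stats in a second table inside a scroller div.
html = """
<div>
  <table><tbody class="Table__TBODY">
    <tr><td><span class="hide-mobile">Cleveland Cavaliers</span></td></tr>
  </tbody></table>
  <div class="Table__Scroller">
    <table><tbody class="Table__TBODY">
      <tr><td>54</td><td>10</td></tr>
    </tbody></table>
  </div>
</div>
"""

soup_demo = BeautifulSoup(html, 'html.parser')

# Unscoped: first tbody with this class anywhere in the document.
first_tbody = soup_demo.select_one('tbody.Table__TBODY')

# Scoped: only a tbody that is a descendant of div.Table__Scroller.
scoped_tbody = soup_demo.select_one('div.Table__Scroller tbody.Table__TBODY')

team_names = [s.get_text() for s in first_tbody.find_all('span', class_='hide-mobile')]
stat_cells = [td.get_text() for td in scoped_tbody.find_all('td')]

print(team_names)
print(stat_cells)
```

If a selector matches nothing, `select_one` returns `None`, so checking the result before calling `find_all` on it gives a clearer error when the page layout changes.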

In [None]:
# Define column names

column_names = ["Wins", "Losses", "WinPCT", "GB", "Home", "Away", "Div", "Conf", "PPG", "Opp_PPG", "Diff", "Strk", "L10"]

In [None]:
# Team Names

team = []

for span in table_teams.find_all('span', class_ = 'hide-mobile'):
    team.append(span.get_text())

# team

In [None]:
# Team Stats

stat = []

for tr in table_data.find_all('tr'):
    # Collect the text of every cell in this row
    row_data = [td.get_text() for td in tr.find_all('td')]
    stat.append(row_data)

# stat
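
Before building a DataFrame, it can help to check that every scraped row has one cell per expected column, since a `<tr>` with no `<td>` cells yields an empty list. The sketch below assumes such a row might appear; the names `raw_rows`, `clean_rows`, and `expected_columns` are illustrative, and only three of the thirteen columns are used to keep it short.

```python
# Illustrative subset of the column names used in this notebook.
expected_columns = ["Wins", "Losses", "WinPCT"]

# Rows as a scraper might return them; the empty one is a hypothetical
# <tr> that contains no <td> cells (for example a header or spacer row).
raw_rows = [
    ["54", "10", "0.844"],
    [],
    ["53", "11", "0.828"],
]

# Keep only rows whose cell count matches the number of columns.
clean_rows = [row for row in raw_rows if len(row) == len(expected_columns)]
skipped = len(raw_rows) - len(clean_rows)

print(f"kept {len(clean_rows)} rows, skipped {skipped}")
```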

In [None]:
# Create Dataframe for Team

df_team = pd.DataFrame(team, columns = ['Team'])
df_team.head()

Unnamed: 0,Team
0,Cleveland Cavaliers
1,Oklahoma City Thunder
2,Boston Celtics
3,Los Angeles Lakers
4,Denver Nuggets


In [None]:
# Create Dataframe for Stats

df_stats = pd.DataFrame(stat, columns = column_names)
df_stats.head()

Unnamed: 0,Wins,Losses,WinPCT,GB,Home,Away,Div,Conf,PPG,Opp_PPG,Diff,Strk,L10
0,54,10,0.844,-,29-4,25-6,11-1,37-7,122.9,111.5,11.4,W14,10-0
1,53,11,0.828,1,28-4,24-7,11-3,32-10,119.6,106.6,13.0,W7,9-1
2,46,18,0.719,8,22-11,24-7,11-2,32-11,117.0,108.2,8.8,W4,8-2
3,40,22,0.645,13,25-7,15-15,11-3,27-12,112.9,111.1,1.8,L1,8-2
4,41,23,0.641,13,22-9,19-14,6-5,24-14,121.2,116.5,4.7,L1,6-4


In [None]:
# Concatenate

df = pd.concat([df_team, df_stats], axis = 1)
df

Unnamed: 0,Team,Wins,Losses,WinPCT,GB,Home,Away,Div,Conf,PPG,Opp_PPG,Diff,Strk,L10
0,Cleveland Cavaliers,54,10,0.844,-,29-4,25-6,11-1,37-7,122.9,111.5,11.4,W14,10-0
1,Oklahoma City Thunder,53,11,0.828,1,28-4,24-7,11-3,32-10,119.6,106.6,13.0,W7,9-1
2,Boston Celtics,46,18,0.719,8,22-11,24-7,11-2,32-11,117.0,108.2,8.8,W4,8-2
3,Los Angeles Lakers,40,22,0.645,13,25-7,15-15,11-3,27-12,112.9,111.1,1.8,L1,8-2
4,Denver Nuggets,41,23,0.641,13,22-9,19-14,6-5,24-14,121.2,116.5,4.7,L1,6-4
5,New York Knicks,40,23,0.635,13.5,21-11,19-12,10-3,28-13,116.9,112.8,4.1,L3,5-5
6,Memphis Grizzlies,40,24,0.625,14,22-10,18-14,10-5,23-16,122.7,116.7,6.0,W2,4-6
7,Houston Rockets,39,25,0.609,15,21-10,18-14,12-3,24-16,113.2,109.3,3.9,W2,5-5
8,Milwaukee Bucks,36,27,0.571,17.5,21-11,14-16,6-6,26-18,114.6,112.5,2.1,L2,7-3
9,Indiana Pacers,35,27,0.565,18,19-9,15-17,7-4,20-19,116.6,115.3,1.3,L2,6-4
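
Every value produced by `get_text()` is a string, so the columns need converting before any numeric analysis. The sketch below rebuilds the first three rows of the table above as `df_sample` (a name introduced here) and converts a few columns. Treating the `-` in `GB` as 0 games behind is an assumption about how the leader should be encoded, and the `Home_W` and `Home_L` column names are illustrative.

```python
import pandas as pd

# First three rows of the standings above, as scraped (all values are strings).
df_sample = pd.DataFrame({
    "Team": ["Cleveland Cavaliers", "Oklahoma City Thunder", "Boston Celtics"],
    "Wins": ["54", "53", "46"],
    "Losses": ["10", "11", "18"],
    "WinPCT": ["0.844", "0.828", "0.719"],
    "GB": ["-", "1", "8"],
    "Home": ["29-4", "28-4", "22-11"],
})

# Convert plain numeric columns; errors="coerce" turns non-numeric text into NaN.
for col in ["Wins", "Losses", "WinPCT"]:
    df_sample[col] = pd.to_numeric(df_sample[col], errors="coerce")

# "GB" shows "-" for the leader; encode it as 0 games behind (an assumption).
df_sample["GB"] = pd.to_numeric(df_sample["GB"].replace("-", "0"), errors="coerce")

# Split a "wins-losses" record such as "29-4" into two integer columns.
df_sample[["Home_W", "Home_L"]] = df_sample["Home"].str.split("-", expand=True).astype(int)

print(df_sample.dtypes)
```

The same conversions could be applied to `df`, with the record-splitting step repeated for `Away`, `Div`, `Conf`, and `L10`.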
