# Web Scraping - NBA Data

## Objective

> The goal of this project is to extract data from the [Sports Reference](https://www.sports-reference.com/) site. Specifically, I chose to collect men's basketball data from the [NBA](https://www.basketball-reference.com/).
We will run an ETL process and then analyze the data, looking for information that answers our questions.


In [1]:
# Load libraries

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd
import requests

### ETL

> Extract, transform, and load (ETL) is the process that data-driven organizations use to collect data from multiple sources and bring it together to support discovery, reporting, analysis, and decision-making.

In this step we run the ETL process so that the same code can load a DataFrame for any season in `year_range`, that is, from 2000 through 2022.

In [2]:
# Range of seasons from 2000 to 2022:
from datetime import date
start = date(2000, 1, 1)
end = date(2022, 1, 1)

year_range = [year for year in range(start.year, end.year + 1)]

In [3]:
names = ['Season_{}'.format(i) for i in year_range]
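
The season labels and page URLs can also be kept together in one mapping, which makes it easy to look up a single season inside `year_range`. A minimal sketch, assuming the same URL pattern used in this notebook; `build_season_urls` is a hypothetical helper, not part of any library:

```python
def build_season_urls(first_year, last_year):
    """Map labels such as 'Season_2000' to their per-game stats URL.

    Hypothetical helper: the URL pattern is the one used in this notebook.
    """
    pattern = 'https://www.basketball-reference.com/leagues/NBA_{}_per_game.html'
    return {
        'Season_{}'.format(year): pattern.format(year)
        for year in range(first_year, last_year + 1)
    }


season_urls = build_season_urls(2000, 2022)
print(len(season_urls))            # 23 seasons, 2000 through 2022 inclusive
print(season_urls['Season_2022'])
```

Keeping the label and the URL in one dict avoids indexing two parallel lists by position.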

In [10]:
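# Extract the data and load each season into a DataFrame
import time

for name, year in zip(names, year_range):
    url = 'https://www.basketball-reference.com/leagues/NBA_{}_per_game.html'.format(year)

    # Download and parse the HTML for this season
    html = urlopen(url)
    soup = BeautifulSoup(html, 'html.parser')

    # Column names come from the first row; drop the first one because
    # the matching cell in each data row is not a <td>
    headers = [th.get_text() for th in soup.find_all('tr', limit=2)[0].find_all('th')]
    headers = headers[1:]

    # Skip the first header row and any row without <td> cells
    # (for example a header row repeated inside the table body)
    rows = soup.find_all('tr')[1:]
    stats = [[td.get_text() for td in row.find_all('td')]
             for row in rows if row.find_all('td')]

    # Bind the DataFrame to a module-level name such as Season_2000
    globals()[name] = pd.DataFrame(stats, columns=headers)

    # Pause between requests so the site is not hit in a tight loop
    time.sleep(4)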
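
The parsing step can be isolated in a function and checked against a small inline HTML table before requesting anything from the site. A minimal sketch, assuming each data row carries its rank in a `<th>` cell and its statistics in `<td>` cells, as the loop above implies; `parse_stats_table` and the sample HTML are hypothetical:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample that mimics the assumed layout:
# a header row, two data rows, and a header row repeated in the body.
SAMPLE_HTML = """
<table>
  <tr><th>Rk</th><th>Player</th><th>Pos</th><th>PTS</th></tr>
  <tr><th>1</th><td>Player A</td><td>SG</td><td>11.4</td></tr>
  <tr><th>Rk</th><th>Player</th><th>Pos</th><th>PTS</th></tr>
  <tr><th>2</th><td>Player B</td><td>C</td><td>6.9</td></tr>
</table>
"""


def parse_stats_table(html):
    """Turn an HTML stats table into a DataFrame of strings."""
    soup = BeautifulSoup(html, 'html.parser')
    all_rows = soup.find_all('tr')

    # First row holds the column names; drop the first one (the rank),
    # because the rank cell of a data row is a <th>, not a <td>.
    headers = [th.get_text() for th in all_rows[0].find_all('th')][1:]

    # Keep only rows that actually contain <td> cells.
    stats = []
    for row in all_rows[1:]:
        cells = [td.get_text() for td in row.find_all('td')]
        if cells:
            stats.append(cells)

    return pd.DataFrame(stats, columns=headers)


sample_df = parse_stats_table(SAMPLE_HTML)
print(sample_df)
```

Separating download from parsing means the parser can be tested offline, and a change in the page layout shows up as a failing check rather than a misaligned DataFrame.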


In [5]:
# 2000 season
Season_2000.head()

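|  | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | ... | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tariq Abdul-Wahad | SG | 25 | TOT | 61 | 56 | 25.9 | 4.5 | 10.6 | 0.424 | ... | 0.756 | 1.7 | 3.1 | 4.8 | 1.6 | 1.0 | 0.5 | 1.7 | 2.4 | 11.4 |
| 1 | Tariq Abdul-Wahad | SG | 25 | ORL | 46 | 46 | 26.2 | 4.8 | 11.2 | 0.433 | ... | 0.762 | 1.7 | 3.5 | 5.2 | 1.6 | 1.2 | 0.3 | 1.9 | 2.5 | 12.2 |
| 2 | Tariq Abdul-Wahad | SG | 25 | DEN | 15 | 10 | 24.9 | 3.4 | 8.7 | 0.389 | ... | 0.738 | 1.6 | 1.9 | 3.5 | 1.7 | 0.4 | 0.8 | 1.3 | 2.1 | 8.9 |
| 3 | Shareef Abdur-Rahim | SF | 23 | VAN | 82 | 82 | 39.3 | 7.2 | 15.6 | 0.465 | ... | 0.809 | 2.7 | 7.4 | 10.1 | 3.3 | 1.1 | 1.1 | 3.0 | 3.0 | 20.3 |
| 4 | Cory Alexander | PG | 26 | DEN | 29 | 2 | 11.3 | 1.0 | 3.4 | 0.286 | ... | 0.773 | 0.3 | 1.2 | 1.4 | 2.0 | 0.8 | 0.1 | 1.0 | 1.3 | 2.8 |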


In [7]:
# 2022 season
Season_2022.head()

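|  | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | ... | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Precious Achiuwa | C | 22 | TOR | 73 | 28 | 23.6 | 3.6 | 8.3 | 0.439 | ... | 0.595 | 2.0 | 4.5 | 6.5 | 1.1 | 0.5 | 0.6 | 1.2 | 2.1 | 9.1 |
| 1 | Steven Adams | C | 28 | MEM | 76 | 75 | 26.3 | 2.8 | 5.1 | 0.547 | ... | 0.543 | 4.6 | 5.4 | 10.0 | 3.4 | 0.9 | 0.8 | 1.5 | 2.0 | 6.9 |
| 2 | Bam Adebayo | C | 24 | MIA | 56 | 56 | 32.6 | 7.3 | 13.0 | 0.557 | ... | 0.753 | 2.4 | 7.6 | 10.1 | 3.4 | 1.4 | 0.8 | 2.6 | 3.1 | 19.1 |
| 3 | Santi Aldama | PF | 21 | MEM | 32 | 0 | 11.3 | 1.7 | 4.1 | 0.402 | ... | 0.625 | 1.0 | 1.7 | 2.7 | 0.7 | 0.2 | 0.3 | 0.5 | 1.1 | 4.1 |
| 4 | LaMarcus Aldridge | C | 36 | BRK | 47 | 12 | 22.3 | 5.4 | 9.7 | 0.55 | ... | 0.873 | 1.6 | 3.9 | 5.5 | 0.9 | 0.3 | 1.0 | 0.9 | 1.7 | 12.9 |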
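
Because every cell comes from `get_text()`, the DataFrames built by the scraping loop hold strings, so a transform step is needed before doing arithmetic on columns such as `PTS`. A minimal sketch using `pandas.to_numeric`; the sample values are taken from the 2022 output above, and `to_numeric_columns` is a hypothetical helper:

```python
import pandas as pd

# Two rows with values from the 2022 output, stored as strings
# the way get_text() would return them.
raw = pd.DataFrame(
    [['Precious Achiuwa', 'C', '22', 'TOR', '9.1'],
     ['Steven Adams', 'C', '28', 'MEM', '6.9']],
    columns=['Player', 'Pos', 'Age', 'Tm', 'PTS'],
)


def to_numeric_columns(df, text_columns=('Player', 'Pos', 'Tm')):
    """Return a copy where every non-text column is converted to numbers.

    Values that cannot be parsed (for example empty strings) become NaN.
    """
    out = df.copy()
    for col in out.columns:
        if col not in text_columns:
            out[col] = pd.to_numeric(out[col], errors='coerce')
    return out


clean = to_numeric_columns(raw)
print(clean.dtypes)
print(clean['PTS'].mean())
```

Using `errors='coerce'` trades strictness for convenience: unparseable cells become `NaN` instead of raising, so it is worth checking for missing values afterwards.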


In [34]:
# Save every season DataFrame to CSV:

for name in names:
    globals()[name].to_csv('{}.csv'.format(name))
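
Writing with the default `to_csv` settings stores the row index as well, which comes back as an `Unnamed: 0` column when the file is read again. A minimal sketch of a save loop that leaves the index out and writes into one folder; `save_seasons`, the `seasons` dict, and the `data` folder name are hypothetical, and the frames here are tiny stand-ins for the scraped ones:

```python
from pathlib import Path
import tempfile

import pandas as pd


def save_seasons(seasons, out_dir):
    """Write each DataFrame in `seasons` to <out_dir>/<name>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, df in seasons.items():
        path = out_dir / '{}.csv'.format(name)
        # index=False keeps the row index out of the file
        df.to_csv(path, index=False)
        paths.append(path)
    return paths


# Stand-in data; in the notebook these would be the scraped DataFrames.
seasons = {
    'Season_2000': pd.DataFrame({'Player': ['Player A'], 'PTS': [11.4]}),
    'Season_2022': pd.DataFrame({'Player': ['Player B'], 'PTS': [9.1]}),
}

with tempfile.TemporaryDirectory() as tmp:
    written = save_seasons(seasons, Path(tmp) / 'data')
    reloaded = pd.read_csv(written[0])

print(list(reloaded.columns))
```

The temporary directory is only there to keep the example from leaving files behind; in the notebook a regular folder path would be passed instead.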