# Part I: Setup & Data

In this section you'll want to:
- **import any packages** you'll need for your analysis
    - Reminder: we'll be doing web scraping, so you'll need to include the following:
        - `import requests`
        - `import bs4` 
        - `from bs4 import BeautifulSoup`

- **get the data**:
    - read the Congress dataset in (URL: https://raw.githubusercontent.com/fivethirtyeight/data/master/congress-age/congress-terms.csv)
    - Read, understand, and run the web-scraping code below

In [1]:
# YOUR SETUP CODE HERE
import pandas as pd
import requests
import bs4
from bs4 import BeautifulSoup

In [2]:
# READ CONGRESS DATA IN HERE
politics = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/congress-age/congress-terms.csv')
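A quick first look at `politics` helps confirm the file loaded as expected. The sketch below assumes the CSV has `chamber` and `age` columns (check `politics.columns` before relying on that). The helper `mean_age_by_chamber` is a hypothetical name, and the toy frame holds made-up values that only show the shape of the result:

```python
import pandas as pd

def mean_age_by_chamber(df):
    """Average age per chamber (assumes 'chamber' and 'age' columns exist)."""
    return df.groupby("chamber")["age"].mean()

# Usage on the real data: mean_age_by_chamber(politics)
# Toy frame with made-up values, just to illustrate the output:
toy = pd.DataFrame({"chamber": ["house", "house", "senate"],
                    "age": [50.0, 60.0, 70.0]})
print(mean_age_by_chamber(toy))
```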

In [10]:
## CODE TO SCRAPE A TABLE FROM BASKETBALL-REFERENCE.COM
## UNDERSTAND AND RUN ALL FOLLOWING CELLS

# specify webpage we want to scrape 
wiki = 'https://www.basketball-reference.com/teams/GSW/2010/gamelog/?fbclid=IwAR3BFW5ivLDuQE5NRVkPbHnEIlwe-CCCsoeo8RxOxmcPHssP0_mzfJgsVr8'
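req = requests.get(wiki, timeout=10)  # give up if the server doesn't answer within 10 s
req.raise_for_status()  # stop here on an HTTP error (e.g. 404) instead of parsing an error page
soup = BeautifulSoup(req.content, 'html.parser')  # parse the page's HTML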
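Separating the download from the parsing makes the parsing step easy to test without network access. A minimal sketch, where `make_soup`, `fetch_soup`, and the 10-second timeout are illustrative choices rather than part of the assignment:

```python
import requests
from bs4 import BeautifulSoup

def make_soup(content):
    """Parse raw HTML (bytes or str) with Python's built-in html.parser."""
    return BeautifulSoup(content, "html.parser")

def fetch_soup(url, timeout=10):
    """Download url and parse it, failing loudly on HTTP errors."""
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return make_soup(resp.content)

# Usage (needs network access):
# soup = fetch_soup(wiki)
```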

In [11]:
wikitables = soup.find_all("table", class_='row_summable sortable stats_table')  # get all game-log tables
# extract the table we want (the first match is the Regular Season table)
tbl1 = wikitables[0]
#tbl2 = wikitables[6]
tbl1

<table class="row_summable sortable stats_table" data-cols-to-freeze="3" id="tgl_basic"><caption>Regular Season Table</caption>
<colgroup><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/></colgroup>
<thead>
<tr class="over_header">
<th aria-label="" class=" over_header center" colspan="6" data-stat=""></th><th></th><th></th>
<th aria-label="" class=" over_header center" colspan="16" data-stat="header_totals">Team</th><th></th>
<th aria-label="" class=" over_header center" colspan="16" data-stat="header_opp">Opponent</th>
</tr>
<tr>
<th aria-label="Rank" class="ranker poptip sort_default_asc show_partial_when_sorting right" data-stat="ranker" data-tip="Rank" scope="col">Rk</th>
<th aria-label="Season Game" class=" poptip right" data-stat="game_season" data-tip="Season Game" scope="col">G</th>
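The output above shows that the regular-season table carries `id="tgl_basic"`, so it can be looked up directly instead of by position in a list. A sketch on a small hand-written HTML snippet (hypothetical, not the real page). It also shows that `class_` with a single class name matches any table whose class list includes it:

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written stand-in for the downloaded page (not real data).
SAMPLE_HTML = """
<table class="row_summable sortable stats_table" id="tgl_basic">
  <thead><tr><th>Rk</th><th>G</th><th>Opp</th></tr></thead>
  <tbody><tr><th>1</th><td>1</td><td>HOU</td></tr></tbody>
</table>
<table class="other_table" id="other"><tr><td>x</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Look the table up by its id attribute (returns None if it is missing).
tbl = soup.find("table", id="tgl_basic")

# Equivalent lookup with a CSS selector.
same = soup.select_one("table#tgl_basic")

# A single class name matches any table that has that class among others.
stats_tables = soup.find_all("table", class_="stats_table")
print(tbl["id"], same["id"], len(stats_tables))
```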


In [12]:
# create some empty dataframes
# note: the tables aren't the same size. ugh.
new_tbl1 = pd.DataFrame(columns=range(0, 40), index=range(0, 91))  # size hard-coded by hand: 40 columns x 91 rows
#new_tbl2 = pd.DataFrame(columns=range(0,13), index = range(0,3))

In [24]:
#new_tbl1

In [35]:
# get the column names for our first table
cols_list = []
for row in tbl1.find_all('tr'):  # every row in the table, header and body alike
    for head in row.find_all('th'):  # <th> cells hold the column names...
        cols_list.append(head.get_text())  # ...but the over_header cells and each body row's rank cell are <th> too
# cols_list ends up longer than the 40 columns of new_tbl1, so this assignment
# would raise a ValueError (length mismatch) and is left disabled:
#new_tbl1.columns = [s.replace('\n','') for s in cols_list] # get rid of new line characters in column names
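Collecting every `<th>` in the table also picks up the over_header cells and any `<th>` inside body rows. A minimal sketch that reads names from the last row of `<thead>` only, assuming (as in the output above) that the real column names sit in that row. The snippet is hypothetical and `header_names` is an illustrative helper:

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written snippet mimicking the page's layout (not real data).
SAMPLE_HTML = """
<table id="tgl_basic">
  <thead>
    <tr class="over_header"><th colspan="2"></th><th>Team</th></tr>
    <tr><th>Rk</th><th>G</th><th>Opp</th></tr>
  </thead>
  <tbody>
    <tr><th>1</th><td>1</td><td>HOU</td></tr>
    <tr><th>2</th><td>2</td><td>PHO</td></tr>
  </tbody>
</table>
"""

def header_names(table):
    """Column names from the last header row only (skips the over_header row)."""
    header_row = table.find("thead").find_all("tr")[-1]
    return [th.get_text(strip=True) for th in header_row.find_all("th")]

table = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table", id="tgl_basic")
names = header_names(table)

# For comparison: every <th> in the table, which mixes in unwanted cells.
all_th = [th.get_text(strip=True) for th in table.find_all("th")]
print(names)
print(all_th)
```

Note that `names` still starts with `Rk`, while body rows keep their rank in a `<th>`, so matching names to `<td>` data means dropping that first entry.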

In [36]:
cols_list
new_tbl1.columns

RangeIndex(start=0, stop=40, step=1)

In [41]:
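# fill in the contents for our first table
row_marker = -1  # start at -1 so that, after the over_header row, index 0 lines up with the column-header row
for row in tbl1.find_all('tr'):
    column_marker = 0
    columns = row.find_all('td')  # body cells use <td>, not the <th> used for headers
    for column in columns:
        new_tbl1.iat[row_marker, column_marker] = column.get_text()
        column_marker += 1
    row_marker += 1  # incremented for every <tr>, so header-only rows (no <td>) leave blank rows behind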
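Preallocating a frame of hard-coded size and tracking `row_marker` by hand works, but collecting rows in a list avoids needing to know the size in advance and skips header-only rows automatically. A sketch on a hypothetical hand-written snippet. The repeated header row inside `<tbody>` is illustrative, and `table_rows` is an illustrative helper:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, hand-written snippet (not real data).
SAMPLE_HTML = """
<table id="tgl_basic">
  <thead><tr><th>Rk</th><th>G</th><th>Opp</th></tr></thead>
  <tbody>
    <tr><th>1</th><td>1</td><td>HOU</td></tr>
    <tr class="thead"><th>Rk</th><th>G</th><th>Opp</th></tr>
    <tr><th>2</th><td>2</td><td>PHO</td></tr>
  </tbody>
</table>
"""

def table_rows(table):
    """Return the <td> text of each row, skipping rows that have no <td>."""
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if not cells:  # header rows contain only <th>, so skip them
            continue
        rows.append([td.get_text(strip=True) for td in cells])
    return rows

table = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table", id="tgl_basic")
df = pd.DataFrame(table_rows(table), columns=["G", "Opp"])
print(df)
```

The frame's size comes from the data, so no blank rows are left where header rows used to be.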

In [42]:
new_tbl1

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,30,31,32,33,34,35,36,37,38,39
0,Rk,G,Date,,Opp,W/L,Tm,Opp,FG,FGA,...,3P%,FT,FTA,FT%,ORB,TRB,AST,STL,BLK,TOV
1,1,2009-10-28,,HOU,L,107,108,44,89,.494,...,16,20,.800,13,41,30,14,7,18,22
2,2,2009-10-30,@,PHO,L,101,123,36,85,.424,...,21,34,.618,9,47,30,7,7,23,25
3,3,2009-11-04,,MEM,W,113,105,47,87,.540,...,24,36,.667,13,46,24,8,4,17,20
4,4,2009-11-06,,LAC,L,90,118,29,84,.345,...,24,32,.750,7,44,28,11,9,16,29
5,5,2009-11-08,@,SAC,L,107,120,42,83,.506,...,29,38,.763,15,52,21,8,4,14,22
6,6,2009-11-09,,MIN,W,146,105,52,91,.571,...,16,20,.800,21,53,23,7,6,28,31
7,7,2009-11-11,@,IND,L,94,108,37,92,.402,...,25,34,.735,16,57,18,5,12,18,25
8,8,2009-11-13,@,NYK,W,121,107,49,84,.583,...,15,19,.789,12,37,21,7,4,21,21
9,9,2009-11-14,@,MIL,L,125,129,46,82,.561,...,19,30,.633,15,46,21,5,4,14,22
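Every cell scraped with `get_text()` is a string (e.g. `.494`, `107`), so numeric columns need converting before any arithmetic. A sketch using `pd.to_numeric` with `errors="coerce"`, which turns unparseable text into `NaN`. The column names and the two-row toy frame (values copied from the first two games above) are illustrative, and `to_numeric_columns` is a hypothetical helper:

```python
import pandas as pd

def to_numeric_columns(df, columns):
    """Return a copy of df with the listed columns converted to numbers."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")  # unparseable text -> NaN
    return out

# Toy frame with string cells, as they come out of get_text().
toy = pd.DataFrame({"Opp": ["HOU", "PHO"],
                    "Tm": ["107", "101"],
                    "FG%": [".494", ".424"]})
clean = to_numeric_columns(toy, ["Tm", "FG%"])
print(clean.dtypes)
```

Text columns such as the opponent code are left out of the list so they stay as strings.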
