## BeautifulSoup Tutorial 1
### Objective: scrape data from nasdaq.com
ref: https://towardsdatascience.com/web-scraping-for-beginners-beautifulsoup-scrapy-selenium-twitter-api-f5a6d0589ea6
Data will be scraped from the following table:
https://www.nasdaq.com/markets/indices/major-indices.aspx

### Steps:
#### 1.) Select the URL to scrape
#### 2.) Decide what information to scrape from the site
#### 3.) Send a GET request
#### 4.) Inspect website
#### 5.) Parse the HTML with BeautifulSoup
#### 6.) Select data, append to list
#### 7.) Download data to CSV, save locally
#### 8.) Use pandas to analyze the data


In [1]:
# Import the following libraries:
from time import time, sleep
from random import randint
from IPython.display import clear_output
from requests import get
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
import pandas as pd
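`sleep` and `randint` are imported but not used in the cells shown here. A common reason to import them in a scraping notebook is to pause for a random interval between requests so the site is not hit in rapid bursts. A minimal sketch of that pattern, assuming a hypothetical `fetch` callable that takes a URL and returns its content:

```python
from random import randint
from time import sleep
from typing import Callable, Iterable, List


def fetch_politely(urls: Iterable[str],
                   fetch: Callable[[str], str],
                   min_delay: int = 1,
                   max_delay: int = 3) -> List[str]:
    """Fetch each URL in turn, sleeping a random whole number of seconds between requests.

    `fetch` is a hypothetical callable (for example, a wrapper around requests.get)
    that takes a URL and returns the page content.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # randint is inclusive on both ends: pause for min_delay..max_delay seconds
            sleep(randint(min_delay, max_delay))
        results.append(fetch(url))
    return results


# Usage with a stand-in fetch function and no delay:
pages = fetch_politely(["a", "b"], fetch=str.upper, min_delay=0, max_delay=0)
print(pages)  # ['A', 'B']
```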

### Request the page

In [2]:
url = 'https://www.nasdaq.com/markets/indices/major-indices.aspx'
response = get(url)
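Called with no extra arguments, `get(url)` waits indefinitely if the server never responds, and an error status (404, 500, ...) is easy to miss because it still returns a `Response`. A slightly more defensive sketch; the 10-second timeout and the User-Agent string are assumptions chosen for illustration, not requirements of the site:

```python
from typing import Optional

import requests

# Assumed values for illustration; adjust them for the site being scraped.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; tutorial-scraper)"}
TIMEOUT_SECONDS = 10


def fetch_html(url: str, session: Optional[requests.Session] = None) -> str:
    """Download `url` and return its body as text.

    Raises requests.HTTPError for 4xx/5xx responses and requests.Timeout if
    connecting or waiting for data takes longer than TIMEOUT_SECONDS.
    """
    getter = session.get if session is not None else requests.get
    response = getter(url, headers=DEFAULT_HEADERS, timeout=TIMEOUT_SECONDS)
    response.raise_for_status()  # fail loudly instead of parsing an error page
    return response.text


# Usage (performs a real network request, so it is commented out here):
# page_text = fetch_html('https://www.nasdaq.com/markets/indices/major-indices.aspx')
```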

In [3]:
print(url)

https://www.nasdaq.com/markets/indices/major-indices.aspx


In [4]:
print(response)

<Response [200]>


### Create a BeautifulSoup object from the response above, using the html.parser parser

In [7]:
page_html = BeautifulSoup(response.text, 'html.parser')
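`html.parser` is built on Python's standard library, so it needs no extra install; `lxml` and `html5lib` are optional third-party parsers that BeautifulSoup can use instead, and they may build slightly different trees from malformed HTML. A small offline sketch of two equivalent ways to locate the table, using a hand-written HTML snippet in place of the live page:

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the downloaded page (not the real nasdaq.com markup).
sample_html = """
<html><body>
  <table class="USMN_MarketIndices">
    <tr><th>Symbol</th><th>Name</th></tr>
    <tr><td><h3><a href="#">XAX</a></h3></td><td>Amex Composite</td></tr>
  </table>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Keyword search on the class attribute, and the equivalent CSS selector.
table = soup.find("table", class_="USMN_MarketIndices")
same_table = soup.select_one("table.USMN_MarketIndices")

print(table == same_table)          # True
print(table.find("a").get_text())  # XAX
```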

### Select the index rows from the table on the page

In [8]:
data = []  # create an empty list to hold the parsed rows

In [14]:
stable = page_html.find('table', attrs={'class':'USMN_MarketIndices'}) #find the table
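`find` returns `None` when nothing matches, so if the page layout or the class name changes, the `find_all` call further down fails with `AttributeError: 'NoneType' object has no attribute 'find_all'`. A sketch of a guard that fails with a clearer message; the helper name `find_indices_table` is hypothetical:

```python
from bs4 import BeautifulSoup


def find_indices_table(soup: BeautifulSoup, css_class: str = "USMN_MarketIndices"):
    """Return the first <table> with the given class, or raise a clear error if it is missing."""
    table = soup.find("table", attrs={"class": css_class})
    if table is None:
        raise ValueError(f"No <table class={css_class!r}> found; the page layout may have changed.")
    return table


demo = BeautifulSoup('<table class="USMN_MarketIndices"><tr><td>XAX</td></tr></table>', "html.parser")
print(find_indices_table(demo).td.get_text())  # XAX
```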

In [16]:
print(stable) # /home/python/beautifulsoup/stable.png

<table class="USMN_MarketIndices">
<thead>
<tr>
<th>Symbol</th>
<th>Name</th>
<th>Index Value</th>
<th>Change Net / %</th>
<th>High</th>
<th>Low</th>
</tr>
</thead>
<tr>
<td>
<h3><a href="https://www.nasdaq.com/aspx/infoquotes.aspx?symbol=XAX&amp;selected=XAX">XAX</a></h3>
</td>
<td><a href="https://www.nasdaq.com/aspx/infoquotes.aspx?symbol=XAX&amp;selected=XAX">Amex Composite</a></td>
<td>2,543.03</td>
<td><span class="red">15.17 ▼ 0.59%</span></td>
<td>2,558.2</td>
<td>2,532.45</td>
</tr>
<tr>
<td>
<h3><a href="https://www.nasdaq.com/aspx/infoquotes.aspx?symbol=VOLNDX&amp;selected=VOLNDX">VOLNDX</a></h3>
</td>
<td><a href="https://www.nasdaq.com/aspx/infoquotes.aspx?symbol=VOLNDX&amp;selected=VOLNDX">DWS NASDAQ-100 Volatility Target Index</a></td>
<td>1,689.47</td>
<td>unch</td>
<td>1,689.47</td>
<td>1,689.47</td>
</tr>
<tr>
<td>
<h3><a href="https://www.nasdaq.com/aspx/infoquotes.aspx?symbol=FTSEQ500&amp;selected=FTSEQ500">FTSEQ500</a></h3>
</td>
<td><a href="https://www.nasdaq.com

In [17]:
rows = stable.find_all('tr')  # find all rows; each <tr> element is a table row
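The first `<tr>` holds `<th>` header cells rather than `<td>` data cells, which is why the first list in the parsed `data` output further down is empty. Reading the `<th>` cells separately recovers the column names as well. A sketch on a trimmed, hand-copied version of the table printed above (first data row only):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table markup printed above; &nbsp; mirrors the \xa0 seen in the output.
table_html = """
<table class="USMN_MarketIndices">
<thead><tr><th>Symbol</th><th>Name</th><th>Index Value</th>
<th>Change Net / %</th><th>High</th><th>Low</th></tr></thead>
<tr><td><h3><a href="#">XAX</a></h3></td><td><a href="#">Amex Composite</a></td>
<td>2,543.03</td><td><span class="red">15.17&nbsp;▼&nbsp;0.59%</span></td>
<td>2,558.2</td><td>2,532.45</td></tr>
</table>
"""

sample_rows = BeautifulSoup(table_html, "html.parser").find("table").find_all("tr")

# The header row uses <th>; the data rows use <td>.
headers = [th.get_text(strip=True) for th in sample_rows[0].find_all("th")]
body = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in sample_rows[1:]]

print(headers)  # ['Symbol', 'Name', 'Index Value', 'Change Net / %', 'High', 'Low']
print(body)
```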

In [18]:
print(rows)

[<tr>
<th>Symbol</th>
<th>Name</th>
<th>Index Value</th>
<th>Change Net / %</th>
<th>High</th>
<th>Low</th>
</tr>, <tr>
<td>
<h3><a href="https://www.nasdaq.com/aspx/infoquotes.aspx?symbol=XAX&amp;selected=XAX">XAX</a></h3>
</td>
<td><a href="https://www.nasdaq.com/aspx/infoquotes.aspx?symbol=XAX&amp;selected=XAX">Amex Composite</a></td>
<td>2,543.03</td>
<td><span class="red">15.17 ▼ 0.59%</span></td>
<td>2,558.2</td>
<td>2,532.45</td>
</tr>, <tr>
<td>
<h3><a href="https://www.nasdaq.com/aspx/infoquotes.aspx?symbol=VOLNDX&amp;selected=VOLNDX">VOLNDX</a></h3>
</td>
<td><a href="https://www.nasdaq.com/aspx/infoquotes.aspx?symbol=VOLNDX&amp;selected=VOLNDX">DWS NASDAQ-100 Volatility Target Index</a></td>
<td>1,689.47</td>
<td>unch</td>
<td>1,689.47</td>
<td>1,689.47</td>
</tr>, <tr>
<td>
<h3><a href="https://www.nasdaq.com/aspx/infoquotes.aspx?symbol=FTSEQ500&amp;selected=FTSEQ500">FTSEQ500</a></h3>
</td>
<td><a href="https://www.nasdaq.com/aspx/infoquotes.aspx?symbol=FTSEQ500&amp;select

In [19]:
for row in rows:  # iterate over each row
    cols = row.find_all('td')  # find the cells; each data cell is a <td> element
    cols = [ele.text.strip() for ele in cols]  # strip leading and trailing whitespace from each cell's text, ref: https://www.programiz.com/python-programming/methods/string/strip
    data.append([ele for ele in cols if ele])  # drop empty strings and append the remaining values to the list
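The `Change Net / %` cell packs three things into one string, e.g. `'15.17\xa0▼\xa00.59%'`: the net change, a direction arrow and the percentage, separated by non-breaking spaces (`\xa0`), while unchanged indices show `'unch'`. A standard-library sketch that turns that cell into signed numbers, assuming ▼ marks a decline and ▲ a rise (as the sample rows suggest); the name `parse_change` is hypothetical:

```python
import re
from typing import Optional, Tuple

# Matches e.g. '15.17\xa0▼\xa00.59%'; \s also matches the non-breaking space \xa0.
CHANGE_RE = re.compile(r"^([\d,.]+)\s*([▲▼])\s*([\d,.]+)%$")


def parse_change(cell: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (net_change, percent_change) as signed floats, or (0.0, 0.0) for 'unch'.

    Assumes ▼ means a decline and ▲ a rise; returns (None, None) for unrecognized cells.
    """
    cell = cell.strip()
    if cell == "unch":
        return 0.0, 0.0
    match = CHANGE_RE.match(cell)
    if match is None:
        return None, None
    net, arrow, pct = match.groups()
    sign = -1.0 if arrow == "▼" else 1.0
    return sign * float(net.replace(",", "")), sign * float(pct.replace(",", ""))


print(parse_change("15.17\xa0▼\xa00.59%"))  # (-15.17, -0.59)
print(parse_change("0.76\xa0▲\xa00.40%"))   # (0.76, 0.4)
print(parse_change("unch"))                 # (0.0, 0.0)
```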

In [20]:
print(data)  # the HTML is now parsed: every non-empty cell of the table, row by row

[[], ['XAX', 'Amex Composite', '2,543.03', '15.17\xa0▼\xa00.59%', '2,558.2', '2,532.45'], ['VOLNDX', 'DWS NASDAQ-100 Volatility Target Index', '1,689.47', 'unch', '1,689.47', '1,689.47'], ['FTSEQ500', 'FTSE NASDAQ 500 Index', '5,914.69', 'unch', '5,946.98', '5,899.54'], ['RCMP', 'NASDAQ Capital Market Composite Index', '189.5', '0.76\xa0▲\xa00.40%', '189.95', '188.2'], ['IXIC', 'NASDAQ Composite', '8,030.05', '19.59\xa0▼\xa00.24%', '8,094.06', '7,976.77'], ['NQGM', 'NASDAQ Global Market Composite', '2,401.15', '0.40\xa0▼\xa00.02%', '2,411.98', '2,376.27'], ['NQGS', 'NASDAQ Global Select Market Composite', '3,782.43', '9.66\xa0▼\xa00.25%', '3,813.18', '3,757.57'], ['QOMX', 'NASDAQ OMX 100 Index', '1,945.96', 'unch', '1,945.96', '1,945.96'], ['ILTI', 'NASDAQ OMX AeA Illinois Tech Index', '1,054.38', 'unch', '1,054.38', '1,054.38'], ['QMEA', 'NASDAQ OMX Middle East North Africa Index', '128.54', 'unch', '128.54', '128.54'], ['IXNDX', 'NASDAQ-100', '7,719.8', '32.05\xa0▼\xa00.41%', '7,791.
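Steps 7 and 8 (CSV and pandas) follow naturally from `data`. A hedged sketch that drops the empty header entry, labels the columns with the table's own header names, converts the comma-formatted numbers and produces CSV text. The two rows are copied from the output above into a separate `sample_data` list so the notebook's own variables are not overwritten; the filename mentioned in the comment is hypothetical:

```python
import pandas as pd

# Two rows copied from the printed `data` list above; in the notebook, use `data` directly.
sample_data = [
    [],
    ['XAX', 'Amex Composite', '2,543.03', '15.17\xa0▼\xa00.59%', '2,558.2', '2,532.45'],
    ['VOLNDX', 'DWS NASDAQ-100 Volatility Target Index', '1,689.47', 'unch', '1,689.47', '1,689.47'],
]

# Column names taken from the table's <th> header cells.
columns = ['Symbol', 'Name', 'Index Value', 'Change Net / %', 'High', 'Low']

# Keep only complete rows. The header row parsed to [] (it has <th>, not <td>, cells), and
# dropping blank cells in the loop could shorten a row, so any row of the wrong length is skipped.
complete_rows = [row for row in sample_data if len(row) == len(columns)]
df = pd.DataFrame(complete_rows, columns=columns)

# Remove thousands separators, then convert: '2,543.03' -> 2543.03
for col in ['Index Value', 'High', 'Low']:
    df[col] = pd.to_numeric(df[col].str.replace(',', '', regex=False))

# With no path, to_csv returns the CSV as a string; pass a filename
# (e.g. the hypothetical 'nasdaq_indices.csv') to save it locally instead.
csv_text = df.to_csv(index=False)
print(df[['Symbol', 'Index Value']])
```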