# Web Scraping with Python
## Using BeautifulSoup and Selenium

## Import Libraries and Classes

* We will be using the **selenium** library to control our browser (Chrome).
* The **BeautifulSoup (bs4)** library is used to parse and process HTML code. It can also be used for web scraping of simple sites without using selenium.
* We use the **time** module and the **datetime** class to convert between date strings and the internal date representations used by the web site (Unix timestamps) and our database.
* The **Stock** and **DailyData** classes are used to store the information about the stocks and price history in our portfolio. They are defined in the stock_class.py file.
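
The `stock_class.py` file is not reproduced in this notebook. Judging only from how `Stock` and `DailyData` are used in the cells that follow, a minimal stand-in might look like the sketch below. The field names, types, and the use of dataclasses are inferred assumptions, not the actual contents of that file.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DailyData:
    """One day of price history (assumed fields: date, close, volume)."""
    date: datetime
    close: float
    volume: float


@dataclass
class Stock:
    """One holding in the portfolio (assumed fields: symbol, name, shares)."""
    symbol: str
    name: str
    shares: float
    DataList: list = field(default_factory=list)

    def add_data(self, daily_data: DailyData) -> None:
        # Append one day's record to this stock's price history.
        self.DataList.append(daily_data)


# Illustrative values only, not real market data.
msft = Stock("MSFT", "Microsoft", 100)
msft.add_data(DailyData(datetime(2020, 1, 2), 100.0, 1000.0))
print(len(msft.DataList))
```

Giving `DataList` a `default_factory` means each `Stock` gets its own history list rather than sharing one across instances.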

In [None]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time
from datetime import datetime
from stock_class import Stock, DailyData

## Retrieve Data from Yahoo! Finance using Selenium and Beautiful Soup
The code below has two parts. The first uses Selenium to control the browser, which navigates to the web site and retrieves the page. The second uses BeautifulSoup to parse the HTML, locate the tags of interest, and extract the price and volume data.
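
Before the browser can navigate anywhere, the start and end dates have to be turned into the Unix timestamps that the history URL expects in `period1` and `period2`. The sketch below isolates that step using only the standard library. The helper names `to_epoch` and `build_history_url` are hypothetical and introduced here for illustration; the query parameters are the ones used in the function that follows.

```python
import time
from urllib.parse import urlencode


def to_epoch(date_text, fmt="%m/%d/%y"):
    """Convert a date string to whole seconds since the Unix epoch (local time)."""
    return int(time.mktime(time.strptime(date_text, fmt)))


def build_history_url(symbol, date_start, date_end):
    """Assemble the price-history URL for one stock symbol."""
    query = urlencode({
        "period1": to_epoch(date_start),
        "period2": to_epoch(date_end),
        "interval": "1d",
        "filter": "history",
        "frequency": "1d",
    })
    return "https://finance.yahoo.com/quote/" + symbol + "/history?" + query


url = build_history_url("MSFT", "1/1/20", "3/1/20")
print(url)
```

Because `time.mktime` interprets the date in the local time zone, the exact timestamp values depend on where the code runs.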

In [None]:
# Get stock price history from web using Web Scraping
def retrieve_stock_web(dateStart,dateEnd,stock_list):
    # Use Selenium to control the browser and retrieve HTML code.
    dateFrom = str(int(time.mktime(time.strptime(dateStart,"%m/%d/%y"))))
    dateTo = str(int(time.mktime(time.strptime(dateEnd,"%m/%d/%y"))))
    recordCount = 0
    for stock in stock_list:
        stockSymbol = stock.symbol
        url = "https://finance.yahoo.com/quote/"+stockSymbol+"/history?period1="+dateFrom+"&period2="+dateTo+"&interval=1d&filter=history&frequency=1d"
        # Note this code assumes the use of the Chrome browser.
        # You will have to modify if you are using a different browser.
        options = webdriver.ChromeOptions()
        options.add_experimental_option('excludeSwitches',['enable-logging'])
        options.add_experimental_option("prefs",{'profile.managed_default_content_settings.javascript': 2})
        try:
            driver = webdriver.Chrome(options=options)
        except Exception as err:
            # Only a failure to start Chrome is reported as a missing driver.
            raise RuntimeWarning("Chrome Driver Not Found") from err
        try:
            driver.implicitly_wait(60)
            driver.get(url)
            pageSource = driver.page_source
        finally:
            # Close the browser for this stock even if the page load fails.
            driver.quit()
        # Use BeautifulSoup to parse and process HTML code.
        soup = BeautifulSoup(pageSource,"html.parser")
        dataRows = soup.find_all('tr')
        for row in dataRows:
            td = row.find_all('td')
            rowList = [i.text for i in td]
            columnCount = len(rowList)
            # A standard data row has 7 cells; special rows such as dividends are ignored.
            if columnCount == 7:
                rowDate = datetime.strptime(rowList[0],"%b %d, %Y")
                closePrice = float(rowList[5].replace(',',''))
                volume = float(rowList[6].replace(',',''))
                daily_data = DailyData(rowDate,closePrice,volume)
                stock.add_data(daily_data)
                recordCount += 1
    return recordCount

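The row-filtering logic inside `retrieve_stock_web()` can be tried without a browser. The sketch below applies the same rules (keep only 7-cell rows, strip thousands separators, parse the date) to plain lists of cell text, which stand in for what `row.find_all('td')` yields. The function name `parse_history_row` and the sample rows are hypothetical, and the numbers are made up rather than real quotes.

```python
from datetime import datetime


def parse_history_row(cells):
    """Return (date, close, volume) for a 7-cell data row, or None for any other row."""
    if len(cells) != 7:
        return None  # e.g. a dividend row, which has fewer cells
    date = datetime.strptime(cells[0], "%b %d, %Y")
    close = float(cells[5].replace(",", ""))
    volume = float(cells[6].replace(",", ""))
    return date, close, volume


# Made-up cell text, shaped like the rows the scraper expects.
rows = [
    ["Jan 02, 2020", "10.00", "11.00", "9.00", "10.50", "10.25", "1,234,500"],
    ["Jan 03, 2020", "0.50 Dividend"],
]
parsed = [parse_history_row(r) for r in rows]
print(parsed)
```

Keeping the parsing in a small function like this makes it easier to check the column indexes against the live page if the table layout ever changes.
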
## Create Portfolio and Add Stock

In [None]:
portfolio = []
stock = Stock('MSFT','Microsoft',100)
portfolio.append(stock)


## Call retrieve_stock_web() to Get Stock Data

In [None]:
retrieve_stock_web('1/1/20','3/1/20',portfolio)

## Display Results

In [None]:
for stock in portfolio:
    headingLabel = stock.name + " - " + str(stock.shares) + " Shares\n"
    print(headingLabel)
    print("- Date -   - Price -   - Volume -")
    print("=================================")
    for daily_data in stock.DataList:
        row = daily_data.date.strftime("%m/%d/%y") + "   " +  '${:0,.2f}'.format(daily_data.close) + "   " + str(daily_data.volume)
        print(row)