# Webscraping Yahoo Finance
## Introduction
In this challenge, we will use what we have learnt in class to scrape stock information from Yahoo Finance.


The general steps for scraping a website are:
1. Identify the target website.
2. Learn how the website constructs its URLs so we can retrieve the desired page.
3. Programmatically retrieve the page using the URL pattern we reverse-engineered.
4. Use libraries to traverse the page and extract the information we need.
5. Organise the information and return it in the desired format.


In [None]:
# Some of the libraries we need
import requests
from bs4 import BeautifulSoup
import json

## Step 1. Identify target website

Our target website is Yahoo Finance. Our desired information is the stock information available on the website. 

Before we start scraping, have a look at what a typical stock information page looks like:

__[Apple - AAPL](https://finance.yahoo.com/quote/AAPL?p=AAPL)__ : https://finance.yahoo.com/quote/AAPL?p=AAPL


## Step 2. Deconstruct URL

We don't want to manually type each stock into the search box. Instead, by studying the URL, we can work out its structure and tweak it to reach the page that holds the stock information we want.


For example, look at the URL in **Step 1**. The ticker symbol of Apple is **AAPL**. Notice that **AAPL** appears twice in the URL.

Perhaps, if we replace **AAPL** in the URL with any ticker symbol we want, we can get the stock information for that stock. 

For example, try __[Google - GOOG](https://finance.yahoo.com/quote/GOOG?p=GOOG)__ : https://finance.yahoo.com/quote/GOOG?p=GOOG


With that knowledge, write a function that returns the URL for any stock ticker.
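
As a rough illustration of the URL pattern described above, the sketch below substitutes a ticker into both places where it appears. The name `build_quote_url` is hypothetical (it is not part of the exercise), and the pattern is only assumed to hold for other tickers, based on the two examples shown.

```python
from urllib.parse import quote


def build_quote_url(ticker):
    # Percent-encode the ticker in case it contains characters that are not URL-safe.
    safe_ticker = quote(ticker.strip())
    # Assumed pattern: the ticker appears once in the path and once in the "p" query parameter.
    return f"https://finance.yahoo.com/quote/{safe_ticker}?p={safe_ticker}"


print(build_quote_url("AAPL"))
print(build_quote_url("GOOG"))
```

Keeping the URL construction in one small function means the rest of the scraper never needs to know how the URL is laid out.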


In [None]:
def get_stock_url(ticker):
    # Fill in this part to build and return the URL for a particular stock
    raise NotImplementedError("get_stock_url is not implemented yet")





## Step 3. Retrieve website

Now that we know which URL to go to, we can retrieve the webpage programmatically.

To do so, we use a Python library called **requests**, which we have imported above.

After retrieving the webpage, we pass it to another library, **BeautifulSoup**, which will greatly facilitate our web scraping.
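
To see what that hand-off looks like without making a network request, the sketch below parses a small hard-coded HTML string. The markup and the helper name `parse_html` are made up for illustration; in the notebook, the text to parse would presumably be `response.text`.

```python
from bs4 import BeautifulSoup


def parse_html(html_text):
    # "html.parser" is Python's built-in parser, so no extra install is needed.
    return BeautifulSoup(html_text, "html.parser")


# Hypothetical stand-in for the HTML returned by the website.
sample_html = (
    "<html><head><title>Sample quote page</title></head>"
    "<body><p>Hello</p></body></html>"
)

soup = parse_html(sample_html)
print(soup.title.get_text())
print(soup.find("p").get_text())
```

The returned object can then be searched with methods such as `find`, which is what the next step relies on.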

**Be patient, the site might take some time to retrieve (~60 seconds)**


In [None]:
def retrieve_website_html(url):
    # We set HTTP headers to "trick" Yahoo Finance into thinking the request comes from a desktop browser.
    # Yahoo Finance does something called device targeting: it sends slightly different content
    # to different devices (e.g. mobile vs desktop).
    # Our scraping logic is based on the desktop version of the website, so we request that version.
    headers = {"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "accept-encoding": "gzip, deflate, br",
            "accept-language": "en-GB,en;q=0.9,en-US;q=0.8,ml;q=0.7",
            "cache-control": "max-age=0",
            "dnt": "1",
            "sec-fetch-dest": "document",
            "sec-fetch-mode": "navigate",
            "sec-fetch-site": "none",
            "sec-fetch-user": "?1",
            "upgrade-insecure-requests": "1",
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"}
    
    response = requests.get(url, headers=headers, verify=False)
    
    # Use BeautifulSoup to parse the HTML content of the response, then return the parsed result

## Step 4. Retrieve desired information

This step is probably the hardest step. 

We need to look at how a website is constructed, and use bits of its metadata (such as tag names and class names) to reliably retrieve the information we need.


### Chrome Developer Tools
To do so, we can open the **Chrome Developer Tools**. 

On the Yahoo Finance page:
1. Hover the mouse over any element you would like to know more about.
2. Right-click on it.
3. Select **Inspect**.

A side bar should pop up showing the raw HTML. As you hover your mouse over the HTML elements, the corresponding element is highlighted on the page.



### Example
In the code below, we use the `find` method to look for an HTML element with *tag* `span` and *class* `Fz(36px)`. 

We then retrieve the textual information nested inside this element with `getText()`.
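
A minimal sketch of that lookup, run against a hard-coded snippet rather than the live page. The snippet's markup is an assumption modelled on the class name mentioned above; the real page may differ or change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the part of the page that shows the price.
sample_html = '<div><span class="Fw(b) Fz(36px)">317.94</span></div>'
soup = BeautifulSoup(sample_html, "html.parser")

# class_ matches an element if any one of its classes equals the given string.
price_element = soup.find("span", class_="Fz(36px)")

# getText() returns the text nested inside the element; strip thousands separators before converting.
price = float(price_element.getText().replace(",", ""))
print(price)
```

If `find` matches nothing it returns `None`, so real scraping code would normally check for that before calling `getText()`.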



### Your turn!
Now, try to identify and scrape more financial data.

Here's a list for you to try:
1. PE Ratio (TTM)
2. EPS (TTM)
3. Earnings Date
4. Market Cap

#### Expected Output for ticker, AAPL:
<code>
    {
        'stock_ticker': 'AAPL',
        'price': 317.94,
        'pe_ratio': 24.98,
        'market_cap': '1.378T',
        'eps': 11.89
    }
</code>
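
One possible way to approach values that sit next to a label is to find the cell containing the label text and then read its neighbouring cell. The sketch below does this on a hypothetical table; the markup and the helper name `find_value_by_label` are assumptions for illustration and are not taken from the actual Yahoo Finance page, so the selectors will need adapting to what the Developer Tools show.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each row holds a label cell followed by a value cell.
sample_html = """
<table>
  <tr><td>PE Ratio (TTM)</td><td>24.98</td></tr>
  <tr><td>EPS (TTM)</td><td>11.89</td></tr>
  <tr><td>Market Cap</td><td>1.378T</td></tr>
</table>
"""
soup = BeautifulSoup(sample_html, "html.parser")


def find_value_by_label(soup, label):
    # Locate the <td> whose text is exactly the label.
    label_cell = soup.find("td", string=label)
    if label_cell is None:
        return None
    # The value is assumed to be in the next <td> of the same row.
    value_cell = label_cell.find_next_sibling("td")
    return value_cell.get_text(strip=True) if value_cell else None


summary = {
    "pe_ratio": float(find_value_by_label(soup, "PE Ratio (TTM)")),
    "eps": float(find_value_by_label(soup, "EPS (TTM)")),
    "market_cap": find_value_by_label(soup, "Market Cap"),
}
print(summary)
```

Returning `None` for a missing label keeps one absent field from crashing the whole scrape.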

In [1]:
def get_stock_information(ticker, website_html):    
######## Add in your web scraping code here to scrape more information #############
    
####################################################################################
######## Remember to add the information to the summary_data object below ##########
    
    summary_data = {
       'ticker': ticker
    }
    return summary_data

## Putting it all together

The function below puts everything together.
Use the functions you have created above to get the data:
1. get_stock_url
2. retrieve_website_html
3. get_stock_information

In [None]:
def get_ticker_data(ticker):
    
    url = get_stock_url(ticker)
    print("getting from URL: " + url) #j_ignore_
    website = retrieve_website_html(url)

    summary_data = get_stock_information(ticker, website)
    
    return summary_data


## Step 5. Use the function you have created to fill in the data
Download `stock_list_empty.csv` and use your function to fill in the missing data.
Remember, you can amend data in a DataFrame using `.loc[row_index, column_name]`.
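
As a sketch of the `.loc` pattern, the example below fills a small in-memory DataFrame instead of the real CSV. The column names, the `fake_get_ticker_data` stand-in, and the GOOG value are all placeholders for illustration; the actual columns in `stock_list_empty.csv` may differ.

```python
import pandas as pd

# Hypothetical stand-in for the contents of the CSV file.
stock_df = pd.DataFrame({
    "ticker": ["AAPL", "GOOG"],
    "price": [float("nan"), float("nan")],
})


def fake_get_ticker_data(ticker):
    # Placeholder for the real scraper; the GOOG price is a made-up value.
    sample = {"AAPL": {"price": 317.94}, "GOOG": {"price": 100.0}}
    return sample[ticker]


for row_index in stock_df.index:
    ticker = stock_df.loc[row_index, "ticker"]
    data = fake_get_ticker_data(ticker)
    # .loc[row_index, column_name] writes a single cell in place.
    stock_df.loc[row_index, "price"] = data["price"]

print(stock_df)
```

With the real scraper, `fake_get_ticker_data` would be swapped for `get_ticker_data`.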

In [10]:
import pandas as pd

stock_df = pd.read_csv("stock_list_empty.csv")


### Use your function below.

