# Scraping Top Companies' Stock Details from companiesmarketcap.com and Yahoo Finance
### Introduction
This project scrapes the web to get stock details for the N largest companies trading in the US (for example, the top 100). The stock symbols are obtained from the companiesmarketcap.com website. The details for each stock are obtained from finance.yahoo.com.

### Project Outline    
- We use web scraping to accomplish this goal. Web scraping is a technique for programmatically downloading a web page and parsing the information it contains.
- First, we get the stock symbols of the N largest companies (by market capitalization) from the companiesmarketcap.com website, https://companiesmarketcap.com/usa/largest-companies-in-the-usa-by-market-cap/?page=1
![](https://i.imgur.com/AHMq49x.png)
- Then, for each stock, we get details such as price, market capitalization, and company name from the Yahoo Finance website, https://finance.yahoo.com/
![](https://i.imgur.com/TyerqTr.png)
- Tools used:
   - Python / Jupyter Notebook
   - Python `requests` package to download the web page
   - Python `BeautifulSoup` package to parse the HTML page downloaded with the `requests` package
- Finally, we save the information to a CSV file in the following format:
```
Company,Symbol,Marketprice,previousClosePrice,changeInPrice,Volume,MarketCap
Sundial Growers Inc.,SNDL,0.575,0.6113,-0.04,66243601,1.184B
Microsoft Corporation,MSFT,287.93,290.73,-2.8,34264008,2.159T
Snap Inc.,SNAP,38.01,39.45,-1.44,23064203,61.74B
Robinhood Markets Inc.,HOOD,11.81,12.25,-0.44,19269403,9.869B

```


In [64]:
!pip install jovian --upgrade --quiet

In [65]:
import jovian

In [66]:
# Execute this to save new versions of the notebook
jovian.commit(project="webscrape-top-n-stocks")

[jovian] Updating notebook "pramation/webscrape-top-n-stocks" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/pramation/webscrape-top-n-stocks


'https://jovian.ai/pramation/webscrape-top-n-stocks'

### Importing necessary libraries
- `requests` to download the web page
- `BeautifulSoup` to parse the downloaded HTML page
- `pandas` to read the CSV file into a DataFrame
- `math` for the `ceil` function

In [67]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from math import ceil

### Defining 'get_url_page' function to accept a URL and return an HTML document.

In [68]:
def get_url_page(url_link):
    '''Accepts a URL as a parameter,
       downloads the web page,
       and returns a parsed document of type BeautifulSoup
       (or an empty string if the request fails).'''

    # Use requests.get to download the web page
    stock_page_response = requests.get(url_link)

    # On a non-success status code, report it and return an empty string
    # so that callers can skip this page
    if not stock_page_response.ok:
        print('Status code for {}: {}'.format(url_link, stock_page_response.status_code))
        return ''

    # On success, run the page text through the HTML parser
    stock_page_doc = BeautifulSoup(stock_page_response.text, 'html.parser')

    # Return the BeautifulSoup document
    return stock_page_doc
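
A request can also fail before any status code comes back, for example on a timeout. The following is a minimal sketch, not part of the original notebook, of a fetch helper that returns `None` on any failure. The names `fetch_text`, `FakeResponse`, and `fake_get`, the `get` parameter, and the header value are assumptions made for illustration. In practice `requests.get` could be passed as `get`, since it accepts `headers` and `timeout` keyword arguments.

```python
from typing import Callable, Optional

# Hypothetical header value; some sites reject requests without a browser-like User-Agent.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; stock-scraper-example)"}


def fetch_text(url: str, get: Callable, timeout: float = 10.0) -> Optional[str]:
    """Return the page text, or None if the request fails.

    `get` is any callable that takes the same keyword arguments as
    requests.get (a hypothetical parameter so the sketch can run offline).
    """
    try:
        response = get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    except Exception as error:  # e.g. connection errors or timeouts
        print("Request for {} failed: {}".format(url, error))
        return None

    if not response.ok:
        print("Status code for {}: {}".format(url, response.status_code))
        return None
    return response.text


class FakeResponse:
    """Stand-in for a response object, for offline demonstration only."""

    def __init__(self, status_code: int, text: str = ""):
        self.status_code = status_code
        self.text = text
        self.ok = status_code < 400


def fake_get(url, headers=None, timeout=None):
    # Pretend that only the AAPL quote page exists
    if url.endswith("/AAPL"):
        return FakeResponse(200, "<h1>Apple Inc. (AAPL)</h1>")
    return FakeResponse(404)


print(fetch_text("https://finance.yahoo.com/quote/AAPL", fake_get))
print(fetch_text("https://finance.yahoo.com/quote/LBSI", fake_get))
```

Returning `None` rather than an empty string lets callers test the result with `is None`, at the cost of changing the contract that the notebook's other functions rely on.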

### Defining 'get_popular_stocks' function to get N most popular stocks

In [69]:
def get_popular_stocks(num_stocks=10):
    '''
      Builds a list of the most popular stock symbols.
      Returns a list of the first N symbols.
    '''
    # Work out how many pages to fetch; each page lists 100 stocks.
    # Always fetch at least one page.
    page_numbers = max(1, ceil(num_stocks / 100))

    stocks_symbols = []
    for page_number in range(1, page_numbers + 1):
        popular_stocks_url = 'https://companiesmarketcap.com/usa/largest-companies-in-the-usa-by-market-cap/?page=' + str(page_number) + '/'

        print("Web Page: ", popular_stocks_url)
        # Call 'get_url_page' to get the parsed HTML document
        stocks_page_doc = get_url_page(popular_stocks_url)

        # get_url_page returns an empty string on failure; skip such pages
        if not stocks_page_doc:
            continue

        # Each ticker symbol sits in a <div class="company-code"> tag
        stocks_symbols_tags = stocks_page_doc.find_all('div', {'class': 'company-code'})

        # Extract the ticker symbol text from each tag
        for stocks_symbols_tag in stocks_symbols_tags:
            stocks_symbols.append(stocks_symbols_tag.text.strip())

    # Return the list with N stocks
    return stocks_symbols[:num_stocks]
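
The page arithmetic in `get_popular_stocks` can be checked on its own, without any network access. This is a small sketch under the notebook's stated assumption that each listing page holds 100 stocks. The names `page_urls`, `BASE_URL`, and `STOCKS_PER_PAGE` are hypothetical, and the URLs use the `?page=N` form shown in the project outline.

```python
from math import ceil

BASE_URL = "https://companiesmarketcap.com/usa/largest-companies-in-the-usa-by-market-cap/"
STOCKS_PER_PAGE = 100  # taken from the comment in get_popular_stocks


def page_urls(num_stocks, per_page=STOCKS_PER_PAGE):
    """Return the listing-page URLs needed to cover num_stocks symbols."""
    # Round up so that a partly used page is still fetched; fetch at least one page
    pages = max(1, ceil(num_stocks / per_page))
    return ["{}?page={}".format(BASE_URL, page) for page in range(1, pages + 1)]


for count in (10, 100, 101, 250):
    print(count, "stocks ->", len(page_urls(count)), "page(s)")
```

Rounding up matters at the boundaries: 100 stocks fit on one page, while 101 need a second page even though only one symbol is taken from it.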

### Defining 'get_name_n_symbol' and 'get_ticker_details' functions to get each stock's details

In [70]:
def get_name_n_symbol(companyName):
    '''
       A helper function that accepts a heading such as "Apple Inc. (AAPL)"
       and returns the company name and the ticker symbol
    '''
    # Split on the last "(" so that names containing parentheses still work
    cName = companyName.rsplit("(", 1)
    return cName[0].strip(), cName[1].strip().strip(')')

def get_ticker_details(ticker_symbol):
    '''
       Accepts a ticker symbol, gets the parsed HTML document,
       finds the appropriate tags and their values (text),
       cleans up the data and returns the stock details as a Python dictionary
       (or an empty string if the page could not be fetched)
    '''

    ticker_url = 'https://finance.yahoo.com/quote/' + ticker_symbol

    # Get the parsed HTML document
    stock_page_doc = get_url_page(ticker_url)

    # get_url_page returns an empty string when the request fails
    if len(stock_page_doc) == 0:
        return ''

    # Use the helper get_name_n_symbol to extract the company name and ticker symbol from the h1 text
    cName, ticker = get_name_n_symbol(stock_page_doc.h1.text)

    # Use the find method of the BeautifulSoup object to get the tag values,
    # removing thousands separators so that the numbers can be converted
    value_class = "Ta(end) Fw(600) Lh(14px)"
    MarketPrice = stock_page_doc.find('fin-streamer', {'class': "Fw(b) Fz(36px) Mb(-4px) D(ib)", 'data-field': "regularMarketPrice"}).text.replace(",", "")
    previousClosePrice = stock_page_doc.find('td', {'class': value_class, 'data-test': "PREV_CLOSE-value"}).text.replace(",", "")
    Volume = stock_page_doc.find('td', {'class': value_class, 'data-test': "TD_VOLUME-value"}).text.replace(",", "")

    # Some tickers (e.g. S&P) do not have a market cap; use '0' for those
    market_cap_tag = stock_page_doc.find('td', {'class': value_class, 'data-test': "MARKET_CAP-value"})
    MarketCap = market_cap_tag.text.replace(',', '') if market_cap_tag is not None else '0'

    ticker_dict = {'Company': cName.replace(',', ''),
                   'Symbol': ticker,
                   'Marketprice': float(MarketPrice),
                   'previousClosePrice': float(previousClosePrice),
                   'changeInPrice': round(float(MarketPrice) - float(previousClosePrice), 2),
                   'Volume': int(Volume),
                   'MarketCap': MarketCap}

    # Return the dictionary with the stock details
    return ticker_dict
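
The helper above assumes that the page heading has the form `Company Name (SYMBOL)`. As a hedged alternative, the sketch below uses a regular expression that returns `None` when the heading does not match, instead of raising an error. The names `split_heading` and `HEADING_PATTERN` are hypothetical, and the heading format is an assumption inferred from the helper, not something checked against the live page.

```python
import re

# Matches "<name> (<symbol>)", where the symbol is the last parenthesised group
HEADING_PATTERN = re.compile(r"^(?P<name>.*)\((?P<symbol>[^()]+)\)\s*$")


def split_heading(heading):
    """Return (company_name, ticker_symbol), or None if the heading does not match."""
    match = HEADING_PATTERN.match(heading.strip())
    if match is None:
        return None
    return match.group("name").strip(), match.group("symbol").strip()


print(split_heading("Apple Inc. (AAPL)"))
print(split_heading("Example Fund (Class A) (XYZ)"))  # made-up name with extra parentheses
print(split_heading("Heading without a symbol"))
```

Because the greedy `.*` consumes as much as it can, only the final parenthesised group is treated as the symbol, so a company name that itself contains parentheses is kept intact.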

### Defining 'write_csv' function

In [71]:
def write_csv(dict_items, file_name):
    '''
       Accepts a list of Python dictionaries with stock details and writes them to a CSV file.
       Entries that are not dictionaries (failed lookups) are skipped.
       Prints a success message after the file is written.
    '''

    # Keep only the successfully scraped entries
    rows = [item for item in dict_items if isinstance(item, dict)]
    if not rows:
        print("No stock details to write to file '{}'".format(file_name))
        return

    # Open the file for writing
    with open(file_name, 'w') as f:

        # Use the keys of the first dictionary as the header row
        headers = list(rows[0].keys())
        f.write(",".join(headers) + "\n")

        # For each dictionary, write its values in header order
        for row in rows:
            values = [str(row.get(header, '')) for header in headers]
            f.write(",".join(values) + "\n")

    print("Writing to file '{}' completed".format(file_name))
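
`write_csv` joins values with commas by hand, which is why `get_ticker_details` strips commas from company names first. As an alternative sketch, not the notebook's method, the standard library's `csv.DictWriter` quotes such values automatically. The name `write_rows` and the sample company name with a comma are illustrative assumptions; the column names come from the sample file format shown in the outline.

```python
import csv
import io

FIELDNAMES = ["Company", "Symbol", "Marketprice", "previousClosePrice",
              "changeInPrice", "Volume", "MarketCap"]


def write_rows(rows, output):
    """Write dictionaries to an open text stream as CSV and return the row count."""
    writer = csv.DictWriter(output, fieldnames=FIELDNAMES, restval="", extrasaction="ignore")
    writer.writeheader()
    written = 0
    for row in rows:
        if isinstance(row, dict):  # skip failed lookups such as ''
            writer.writerow(row)
            written += 1
    return written


sample = [
    {"Company": "Amazon.com, Inc.", "Symbol": "AMZN", "Marketprice": 2753.43,
     "previousClosePrice": 2912.82, "changeInPrice": -159.39,
     "Volume": 3449112, "MarketCap": "1.401T"},
    "",  # a failed lookup
]
buffer = io.StringIO()
print(write_rows(sample, buffer))
print(buffer.getvalue())
```

When writing to a real file instead of a `StringIO` buffer, the `csv` module documentation recommends opening it with `newline=""`.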

### Defining 'verify_results' function to verify the output:
- Display a sample of the output
- Get the number of records and compare it with the number of stock symbols passed.

In [72]:
def verify_results(file_name):
    '''
        Verifies the file output.
        Accepts the file name as a parameter and displays the row count and a sample of the output.
    '''

    # Create a DataFrame from the CSV file
    stocks_df = pd.read_csv(file_name)

    # Print the number of rows read from the file
    print('')
    print('Checking Output written to the file')
    print('---------------------------------------')
    print("Number of records written to the file : ", len(stocks_df))
    print('')
    # Display the first 4 rows of the file along with the headers
    print("Sample Output : ")
    display(stocks_df.head(4))
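
A row count alone does not say which symbols are missing from the file. The sketch below, which is an addition and not part of the notebook, compares the requested symbols with the `Symbol` column of the CSV. The name `missing_symbols` is hypothetical, and it uses the standard library's `csv` module rather than pandas so that it runs without extra packages.

```python
import csv
import io


def missing_symbols(requested, csv_text):
    """Return the requested symbols that have no row in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    written = {row["Symbol"] for row in reader}
    return [symbol for symbol in requested if symbol not in written]


csv_text = (
    "Company,Symbol,Marketprice,previousClosePrice,changeInPrice,Volume,MarketCap\n"
    "Apple Inc.,AAPL,159.98,163.17,-3.19,73117951,2.611T\n"
    "Microsoft Corporation,MSFT,280.89,289.86,-8.97,29489527,2.106T\n"
)
print(missing_symbols(["AAPL", "MSFT", "LBSI"], csv_text))
```

For a file on disk, the same comparison could be made by reading the file's text first and passing it as `csv_text`.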

### Define 'scrape_stocks_info' function to bring all the functions together.
 - Gets the popular stock symbols
 - Passes each symbol to the `get_ticker_details` function to get the stock details
 - Builds a list of dictionaries with the stock details
 - Writes the information to a file
 - Verifies the file that was written

In [73]:
def scrape_stocks_info(num_stocks):
    '''
      Accepts the number of stocks to be processed and writes the stock information to a file
    '''

    # Get the list of popular stocks and pass them to 'get_ticker_details' one by one.
    # This returns a list of dictionaries with the stock details.
    print("Start processing Stock symbols...")
    stocks_info = [get_ticker_details(ticker_name) for ticker_name in get_popular_stocks(num_stocks)]

    # Drop symbols whose page could not be fetched (get_ticker_details returns '' for those)
    stocks_info = [stock for stock in stocks_info if stock]
    print("End processing Stock symbols...")

    # Pass the list of dictionaries to 'write_csv', which writes it to the file
    file_name = str(num_stocks) + "_most_popular_stocks_on_yahoo.csv"
    write_csv(stocks_info, file_name)

    # Verify the results
    verify_results(file_name)

### Call 'scrape_stocks_info' with the number of stocks to process to get the stock details and write them to a file


In [74]:
scrape_stocks_info(200)

Start processing Stock symbols...
Web Page:  https://companiesmarketcap.com/usa/largest-companies-in-the-usa-by-market-cap/?page=1/
Web Page:  https://companiesmarketcap.com/usa/largest-companies-in-the-usa-by-market-cap/?page=2/
Status code for https://finance.yahoo.com/quote/LBSI: 404
End processing Stock symbols...
Writing to file '200_most_popular_stocks_on_yahoo.csv' completed

Checking Output written to the file
---------------------------------------
Number of records written to the file :  199

Sample Output : 


Unnamed: 0,Company,Symbol,Marketprice,previousClosePrice,changeInPrice,Volume,MarketCap
0,Apple Inc.,AAPL,159.98,163.17,-3.19,73117951,2.611T
1,Microsoft Corporation,MSFT,280.89,289.86,-8.97,29489527,2.106T
2,Alphabet Inc.,GOOG,2553.74,2642.44,-88.7,1071672,1.687T
3,Amazon.com Inc.,AMZN,2753.43,2912.82,-159.39,3449112,1.401T
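
The `MarketCap` column in the sample output is stored as text such as `2.611T`, which cannot be sorted or summed directly. The following sketch converts such strings to numbers. The function name `market_cap_to_number` is hypothetical, and the meaning of the `K` and `M` suffixes is an assumption, since only `B` and `T` appear in the output above.

```python
# Assumed suffix meanings: thousand, million, billion, trillion
SUFFIX_MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def market_cap_to_number(text):
    """Convert strings such as '2.611T' or '61.74B' to a float."""
    text = text.strip().replace(",", "")
    if not text:
        raise ValueError("empty market cap value")
    suffix = text[-1].upper()
    if suffix in SUFFIX_MULTIPLIERS:
        return float(text[:-1]) * SUFFIX_MULTIPLIERS[suffix]
    # No suffix: the value is already a plain number (for example the '0' placeholder)
    return float(text)


caps = ["1.401T", "2.611T", "61.74B", "0"]
for cap in sorted(caps, key=market_cap_to_number, reverse=True):
    print(cap, "->", "{:,.0f}".format(market_cap_to_number(cap)))
```

The results are floats, so very large values may carry small rounding errors; that is usually acceptable for ranking or plotting.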



### Summary
In this project, we got the details of the largest companies trading on US stock exchanges (the top 200 in the run above).

This was accomplished by following the steps outlined below:
1. We got the list of stock tickers of the largest companies from the `companiesmarketcap.com` website, using the `requests` and `BeautifulSoup` libraries.
2. For each stock symbol, we got the following details from `https://finance.yahoo.com`:
   * Today's price
   * Previous day's closing price
   * Change in price
   * Volume
   * Market capitalization
   * Company name

   We created a Python dictionary to hold these details.<br>
   This is done by the function `get_ticker_details`, which uses the `requests` and `BeautifulSoup` libraries.


3. We built a list with the details of all the stock symbols from the above step.
4. We wrote the data from the above step to a CSV file with the `write_csv` function.
5. Finally, we verified the data written to the file by doing the following:
    * We read the data from the CSV file into a pandas DataFrame.
    * We got the row count and compared it with the expected number (199 rows were written for 200 symbols because one quote page returned a 404).
    * We displayed a sample of the output and checked it visually.
6. This information can now be used to get a sense of the day's market trend for these stocks and possibly to inform a buy/sell decision.


### Future Work
With the foundation for scraping stock details in place, we can schedule this to run every day and track the stock prices over a period of time.
This can be used to analyze stock trends.
The data can further be used to build machine learning models for predictive analysis of stock prices.
In the future, this approach can be extended to analyze stocks traded on stock exchanges across the globe.
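
Tracking prices over time needs each day's results to be kept rather than overwritten. One possible approach, sketched here as an assumption and not as part of the project, is to append each run to a history file with an added `Date` column. The function name `append_daily_snapshot`, the file name `stock_history.csv`, and the dates and second-day price are illustrative values.

```python
import csv
import os
import tempfile
from datetime import date


def append_daily_snapshot(rows, history_file, snapshot_date=None):
    """Append one day's rows to history_file, adding a Date column."""
    snapshot_date = snapshot_date or date.today()
    fieldnames = ["Date"] + list(rows[0].keys())

    # Write the header only when the file is new or empty
    needs_header = not os.path.exists(history_file) or os.path.getsize(history_file) == 0

    # newline="" is what the csv module documentation recommends for files
    with open(history_file, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        for row in rows:
            writer.writerow({"Date": snapshot_date.isoformat(), **row})


# Demonstration in a temporary folder with illustrative values
with tempfile.TemporaryDirectory() as folder:
    history_path = os.path.join(folder, "stock_history.csv")
    append_daily_snapshot([{"Symbol": "AAPL", "Marketprice": 159.98}], history_path, date(2022, 1, 1))
    append_daily_snapshot([{"Symbol": "AAPL", "Marketprice": 160.50}], history_path, date(2022, 1, 2))
    with open(history_path, newline="") as f:
        history_rows = list(csv.DictReader(f))

for history_row in history_rows:
    print(history_row)
```

A scheduler such as cron could then run the scraper once a day and call a function like this with that day's list of dictionaries.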
       

### References
1. Python official documentation. https://docs.python.org/3/

2. Requests library. https://pypi.org/project/requests/

3. Beautiful Soup documentation. https://www.crummy.com/software/BeautifulSoup/bs4/doc/

4. Jovian, Introduction to Web Scraping. https://jovian.ai/aakashns/python-web-scraping-and-rest-api

5. Pandas library documentation. https://pandas.pydata.org/docs/
