Web Scraping using Python BeautifulSoup

FAANGM Stock data scraping from Yahoo Finance

In [57]:
!pip3 install requests



In [58]:
!pip3 install BeautifulSoup4



In [59]:
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd

In [60]:
url = "https://finance.yahoo.com/quote/NFLX"

In [61]:
response = requests.get(url)  # Send an HTTP GET request to the URL; returns a Response object

In [62]:
response.status_code  # 200 (OK) means the request succeeded and the server returned the page

200
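
The status check above can be made a little more explicit. The following is a minimal sketch using only the standard library; `describe_status` is a hypothetical helper name, not part of requests. In the notebook itself, `response.raise_for_status()` is the usual way to stop early, since it raises an exception for 4xx and 5xx responses.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short, human-readable summary of an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase  # e.g. 200 -> "OK"
    except ValueError:
        phrase = "Unknown"  # code is not a registered HTTP status
    outcome = "success" if 200 <= code < 300 else "not a success"
    return f"{code} {phrase} ({outcome})"


print(describe_status(200))
print(describe_status(404))
```

With a live response this would be called as `describe_status(response.status_code)`.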

In [63]:
soup = BeautifulSoup(response.text,'html.parser')

BeautifulSoup: A class from the bs4 (Beautiful Soup) library that parses HTML and XML documents. It lets you extract data from tags, navigate the document tree, and search for specific elements.

response.text: The body of the response returned by the requests call, decoded to a string. Here it holds the HTML of the page.

'html.parser': The parser that BeautifulSoup should use. 'html.parser' is Python's built-in HTML parser from the standard library, so it needs no extra installation. It is reasonably fast and suitable for most purposes.
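
To see these three pieces working together without a network request, here is a small sketch that parses a made-up HTML string in place of `response.text`. The markup, the `summary` class, and the variable names are invented for illustration and do not reflect Yahoo Finance's actual page.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for response.text (an assumption for this demo)
sample_html = """
<html>
  <head><title>Example Corp (EXMP) Stock Price</title></head>
  <body>
    <div class="summary"><span>101.25</span><span>+1.10</span></div>
  </body>
</html>
"""

# Parse the string with Python's built-in HTML parser
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Navigate to the <title> tag and read its text
page_title = sample_soup.title.text

# Find the first matching <div>, then collect the text of every <span> inside it
summary_div = sample_soup.find("div", {"class": "summary"})
spans = [span.text for span in summary_div.find_all("span")]

print(page_title)
print(spans)
```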

In [64]:
print(soup.title.text)

Netflix, Inc. (NFLX) Stock Price, News, Quote & History - Yahoo Finance


In [65]:
# Locate the quote container once, then read its first three <fin-streamer> values
streamers = soup.find('div', {'class': 'container yf-mgkamr'}).find_all('fin-streamer')
price = streamers[0].text
change = streamers[1].text
percent = streamers[2].text

soup.find('div', {'class': 'container yf-mgkamr'}): Searches the HTML for the first <div> tag whose class attribute is 'container yf-mgkamr'. The find method returns the first matching element, or None if nothing matches.

.find_all('fin-streamer'): Searches within that <div> for every <fin-streamer> tag and returns them as a list.

[0]: Selects the first <fin-streamer> tag in that list. Here it holds the price, while [1] and [2] hold the change and the percent change.

.text: Extracts the text content of the tag, which is usually the data point you want.
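
Because find returns None when nothing matches, the chained call fails with an AttributeError if the page layout differs from what the code expects. Below is a hedged sketch of a more defensive version. `extract_quote` is a hypothetical helper, the class name is simply the one used in this notebook, and the sample markup is invented to mimic the structure described above. Note that a multi-word class string like this one matches only when the class attribute has exactly that value, in that order.

```python
from bs4 import BeautifulSoup


def extract_quote(html, container_class="container yf-mgkamr"):
    """Return price, change, and percent from the HTML, or None if the layout differs."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": container_class})
    if container is None:
        return None  # the expected <div> is not on the page
    values = [tag.text.strip() for tag in container.find_all("fin-streamer")]
    if len(values) < 3:
        return None  # fewer <fin-streamer> tags than the three we need
    price, change, percent = values[:3]
    return {"price": price, "change": change, "percent": percent}


# Invented markup that mimics the structure the notebook relies on
sample_page = (
    '<div class="container yf-mgkamr">'
    "<fin-streamer>697.54</fin-streamer>"
    "<fin-streamer>+13.70</fin-streamer>"
    "<fin-streamer>(+2.00%)</fin-streamer>"
    "</div>"
)

print(extract_quote(sample_page))
print(extract_quote('<div class="something-else"></div>'))
```

With a live page this would be called as `extract_quote(response.text)`.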

In [66]:
print(price, change, percent)

697.54 +13.70 (+2.00%)


In [69]:
mystocks = ['META','NFLX','GOOG','MSFT','AAPL','AMZN']
stockData = []

def getData(comp):
    """Scrape price, change, and percent change for one ticker symbol."""
    url = f'https://finance.yahoo.com/quote/{comp}'
    req = requests.get(url)
    soup = BeautifulSoup(req.text, 'html.parser')
    # Find the quote container once instead of repeating the search for each field
    streamers = soup.find('div', {'class': 'container yf-mgkamr'}).find_all('fin-streamer')
    stock = {
        'comp': comp,
        'price': streamers[0].text,
        'change': streamers[1].text,
        'percent': streamers[2].text,
    }
    return stock

for comp in mystocks:
    stockData.append(getData(comp))
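
As written, the loop stops at the first ticker whose page cannot be parsed. The following is a minimal sketch of a more tolerant loop, under the assumption that a short pause between requests is acceptable. `collect_quotes` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for getData so the example runs without network access.

```python
import time


def collect_quotes(tickers, fetch_one, delay_seconds=1.0):
    """Call fetch_one for each ticker; keep going when one of them fails."""
    results, failures = [], {}
    for index, ticker in enumerate(tickers):
        if index > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)  # pause between requests
        try:
            results.append(fetch_one(ticker))
        except Exception as exc:  # e.g. AttributeError when find() returned None
            failures[ticker] = repr(exc)
    return results, failures


def fake_fetch(ticker):
    """Stand-in for getData that fails for one made-up symbol."""
    if ticker == "BAD":
        raise AttributeError("'NoneType' object has no attribute 'find_all'")
    return {"comp": ticker, "price": "1.00"}


results, failures = collect_quotes(["AAA", "BAD", "BBB"], fake_fetch, delay_seconds=0)
print(results)
print(failures)
```

In the notebook this would be called as `stockData, failed = collect_quotes(mystocks, getData)`.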

In [70]:
stockData

[{'comp': 'META', 'price': '521.12', 'change': '-6.88', 'percent': '(-1.30%)'},
 {'comp': 'NFLX',
  'price': '697.54',
  'change': '+13.70',
  'percent': '(+2.00%)'},
 {'comp': 'GOOG', 'price': '167.01', 'change': '+2.51', 'percent': '(+1.53%)'},
 {'comp': 'MSFT', 'price': '419.49', 'change': '+8.89', 'percent': '(+2.17%)'},
 {'comp': 'AAPL', 'price': '232.05', 'change': '+5.56', 'percent': '(+2.46%)'},
 {'comp': 'AMZN', 'price': '174.21', 'change': '+3.41', 'percent': '(+2.00%)'}]
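
Every value in the output above is a string, including the sign, the parentheses, and the percent symbol. If the numbers are needed for calculations, they can be converted as in the following sketch. `to_number` and `to_numeric_record` are hypothetical helpers, and the comma handling assumes prices may include thousands separators.

```python
def to_number(text):
    """Convert strings such as '697.54', '+13.70', or '(+2.00%)' to float."""
    cleaned = text.strip().strip("()").rstrip("%").replace(",", "")
    return float(cleaned)


def to_numeric_record(record):
    """Return a copy of one scraped record with numeric fields as floats."""
    return {
        "comp": record["comp"],
        "price": to_number(record["price"]),
        "change": to_number(record["change"]),
        "percent": to_number(record["percent"]),
    }


sample = {"comp": "META", "price": "521.12", "change": "-6.88", "percent": "(-1.30%)"}
print(to_numeric_record(sample))
```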

Storing the retrieved data in JSON and CSV formats

In [71]:
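# Write the list of dicts to a JSON file
with open('stockData.json', 'w') as f:
    json.dump(stockData, f)

# Build a DataFrame (one row per stock) and write it to CSV;
# the DataFrame index is written as the first, unnamed column
df = pd.DataFrame(stockData)
df.to_csv('stockData.csv')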
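
To confirm that both files hold what was scraped, the data can be written and read back. The sketch below uses the standard library's csv module in place of pandas, which is a swapped-in technique and not the approach used above. It writes to a temporary folder so that it does not overwrite the real files. `save_and_reload` is a hypothetical helper.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_and_reload(records, folder):
    """Write records to JSON and CSV in folder, then read both files back."""
    folder = Path(folder)
    json_path = folder / "stockData.json"
    csv_path = folder / "stockData.csv"

    with open(json_path, "w") as f:
        json.dump(records, f, indent=2)

    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)

    with open(json_path) as f:
        from_json = json.load(f)
    with open(csv_path, newline="") as f:
        from_csv = list(csv.DictReader(f))
    return from_json, from_csv


# Two records copied from the output shown earlier
records = [
    {"comp": "META", "price": "521.12", "change": "-6.88", "percent": "(-1.30%)"},
    {"comp": "NFLX", "price": "697.54", "change": "+13.70", "percent": "(+2.00%)"},
]

with tempfile.TemporaryDirectory() as tmp:
    from_json, from_csv = save_and_reload(records, tmp)

print(from_json == records, from_csv == records)
```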