This repository contains a few web scrapers I built primarily using Python's requests and bs4 libraries.
There are two webscrapers in this repo:
bookstore_webscraper.py: This file contains functions that help scrape book information from this website.
nasdaq_webscraper.py: This file contains functions that help scrape stock data from this NASDAQ webpage.
To scrape book data from pages 1 to 3 of the bookstore catalogue, run the following command:
data = bookstore_scraper('http://books.toscrape.com/catalogue/', 1, 3)
The output of the above function is a pandas DataFrame.
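To illustrate how a base URL and a page range can map to individual catalogue pages, here is a minimal sketch. The helper name page_urls is hypothetical and is not taken from bookstore_webscraper.py; it assumes the catalogue pages follow the page-N.html pattern, and the actual scraper may build its URLs differently.

```python
def page_urls(base_url, first_page, last_page):
    """Build one catalogue URL per page, inclusive of both ends.

    Hypothetical helper: assumes pages are named page-N.html under base_url.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    base = base_url.rstrip("/")  # avoid a double slash when joining
    return [f"{base}/page-{n}.html" for n in range(first_page, last_page + 1)]


# Same arguments as the bookstore_scraper call above.
for url in page_urls("http://books.toscrape.com/catalogue/", 1, 3):
    print(url)
```

Each URL in the list could then be fetched with requests and parsed with bs4, one page at a time.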
To scrape stock data from pages 1 to 10 of the NASDAQ listing, run the following commands:
urls = generate_url('https://www.nasdaq.com/screening/companies-by-industry.aspx', 10)
companies = get_financial_data(urls)
The output of get_financial_data is a nested dictionary that maps each company name to a list of records, of the form:
{'Company_Name': [{'Symbol': 'XXX',
                   'Current_Market_Cap': 'YYY',
                   'Country': 'ZZZ',
                   'IPO_Year': 'AAA',
                   'Subsector': 'BBB'}],
 .....}
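Because each company name maps to a list of records, the result is often easier to work with once flattened into one row per record. The sketch below assumes the structure shown above; flatten_companies is a hypothetical helper that is not part of nasdaq_webscraper.py, and the sample entry uses placeholder values only.

```python
def flatten_companies(companies):
    """Turn {name: [record, ...]} into a flat list of row dictionaries.

    Hypothetical helper: each row gains a 'Company_Name' key so the
    company name is not lost when the nesting is removed.
    """
    rows = []
    for name, records in companies.items():
        for record in records:
            row = {"Company_Name": name}
            row.update(record)  # copy Symbol, Country, etc. into the row
            rows.append(row)
    return rows


# Placeholder data in the same shape as the scraper's output.
sample = {
    "Example Corp": [{
        "Symbol": "XXX",
        "Current_Market_Cap": "YYY",
        "Country": "ZZZ",
        "IPO_Year": "AAA",
        "Subsector": "BBB",
    }],
}

rows = flatten_companies(sample)
print(rows[0]["Company_Name"], rows[0]["Symbol"])
```

A list of row dictionaries like this can be passed directly to pandas.DataFrame(rows) if a tabular view is preferred.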