Web Scrapers

This repository contains web scrapers I built primarily with Python's requests and bs4 (Beautiful Soup) libraries.

There are two web scrapers in this repo:

bookstore_webscraper.py: Functions for scraping book information from the Books to Scrape website (http://books.toscrape.com).
nasdaq_webscraper.py: Functions for scraping stock data from the NASDAQ companies-by-industry page (https://www.nasdaq.com/screening/companies-by-industry.aspx).

Sample Usage for Each File:

To scrape book data from catalogue pages 1 to 3 with bookstore_webscraper.py, call bookstore_scraper as follows.

data = bookstore_scraper('http://books.toscrape.com/catalogue/', 1, 3)

bookstore_scraper returns a pandas DataFrame.
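As a rough illustration of what scraping a page range involves, the sketch below builds one catalogue URL per page. The helper name build_page_urls and the page-N.html naming pattern are assumptions made for this example; they are not taken from bookstore_webscraper.py, which may construct its URLs differently.

```python
from urllib.parse import urljoin


def build_page_urls(base_url, first_page, last_page):
    """Return one catalogue URL per page in [first_page, last_page].

    Hypothetical helper: assumes pages are named page-<n>.html under base_url,
    and that base_url ends with a trailing slash.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return [urljoin(base_url, f"page-{n}.html")
            for n in range(first_page, last_page + 1)]


urls = build_page_urls('http://books.toscrape.com/catalogue/', 1, 3)
for url in urls:
    print(url)
```

Each URL could then be fetched with requests and parsed with bs4, one page at a time, before the rows are combined into a single DataFrame.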

To scrape company data from pages 1 to 10 with nasdaq_webscraper.py, generate the page URLs and pass them to get_financial_data.

urls = generate_url('https://www.nasdaq.com/screening/companies-by-industry.aspx', 10)
companies = get_financial_data(urls)

get_financial_data returns a nested dictionary, keyed by company name, of the form:

{'Company_Name': [{'Symbol': 'XXX',
                   'Current_Market_Cap': 'YYY',
                   'Country': 'ZZZ',
                   'IPO_Year': 'AAA',
                   'Subsector': 'BBB'}],
 ...}
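Because each company name maps to a list of field dictionaries, the result is often easier to work with once flattened into one row per record. The following is a minimal sketch of that step; flatten_companies and the 'Company' row key are hypothetical names introduced here, and the sample values are the placeholders from the structure shown above, not real data.

```python
def flatten_companies(companies):
    """Turn {name: [{field: value, ...}, ...]} into a list of flat row dicts."""
    rows = []
    for name, records in companies.items():
        for record in records:
            # Copy the company name into each row so nothing is lost
            # when the outer dictionary level is removed.
            row = {'Company': name}
            row.update(record)
            rows.append(row)
    return rows


# Placeholder input mirroring the documented output shape (not real data).
sample = {
    'Company_Name': [{'Symbol': 'XXX',
                      'Current_Market_Cap': 'YYY',
                      'Country': 'ZZZ',
                      'IPO_Year': 'AAA',
                      'Subsector': 'BBB'}],
}

rows = flatten_companies(sample)
print(rows)
```

The resulting list of dictionaries can be passed to pandas.DataFrame(rows) if a tabular view, like the one the bookstore scraper returns, is preferred.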
