A Python web scraper that collects winter holiday book data from holiday book lists and Amazon book/product pages. It currently scrapes data for just under 200 books, more than 100 of which are Christmas books.
- requests
- pandas
- bs4
- jupyter (optional - if you want to view the .ipynb file)
- main.py
  - Script for the web scraper. It scrapes book lists for links to children's books about Thanksgiving, Hanukkah, Kwanzaa, Christmas, and New Year's. It then follows those links to scrape book/product information from the individual Amazon pages and writes the data to a CSV file.
- Christmas Book Web Scraping.ipynb
  - Scratch notebook; it is mostly empty now.
- holiday_books.csv
  - CSV output from the main.py script.
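The two-stage flow described for main.py (collect Amazon links from a list page, then parse each product page and write a CSV) might look roughly like the sketch below. This is not the code in main.py: the function names, the output columns, and the `productTitle` element id are assumptions made for illustration, and the real script may select and extract fields differently.

```python
from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup


def extract_amazon_links(list_html, base_url):
    """Return the unique Amazon links found in a book-list page, in page order."""
    soup = BeautifulSoup(list_html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs against the list page's URL.
        url = urljoin(base_url, anchor["href"])
        if "amazon.com" in url and url not in links:
            links.append(url)
    return links


def parse_product_page(product_html, url, holiday):
    """Build one output row from a product page.

    Assumption: the title lives in an element with id="productTitle".
    """
    soup = BeautifulSoup(product_html, "html.parser")
    title_tag = soup.find(id="productTitle")
    return {
        "holiday": holiday,
        "title": title_tag.get_text(strip=True) if title_tag else None,
        "url": url,
    }


def save_books(rows, path="holiday_books.csv"):
    """Write the collected rows (a list of dicts) to a CSV file."""
    pd.DataFrame(rows).to_csv(path, index=False)
```

Keeping the parsing functions separate from the network requests means they can be tried on saved HTML without hitting Amazon.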
I had to use several web scraping/crawler user agents to get the Amazon data. Those user agents were pulled from this WhatIsMyBrowser.com list: Crawler User Agents.
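A minimal sketch of rotating through several user agents with requests is shown below. The two strings are well-known crawler user agents included only as examples; they are not necessarily the ones this project uses, and `header_cycle` and `fetch_html` are hypothetical helper names.

```python
import itertools

import requests

# Example crawler user agents (assumption: the project's actual list is not
# shown here and may differ).
USER_AGENTS = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
]


def header_cycle(user_agents):
    """Yield request headers forever, rotating through the given user agents."""
    for agent in itertools.cycle(user_agents):
        yield {"User-Agent": agent}


def fetch_html(url, headers_iter, max_attempts=3, timeout=10):
    """Try the request with a different user agent on each attempt.

    Returns the page HTML on the first successful response, or None if
    every attempt fails.
    """
    for _ in range(max_attempts):
        try:
            response = requests.get(url, headers=next(headers_iter), timeout=timeout)
        except requests.RequestException:
            continue  # network error: move on to the next user agent
        if response.ok:
            return response.text
    return None
```

A single `header_cycle(USER_AGENTS)` iterator can be shared across all product-page requests so that consecutive requests use different user agents.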
- BeautifulSoup (bs4) documentation
- Requests for Humans Documentation
- Thanksgiving Books for Kids
- 10 Best Hanukkah Books for Kids
- The 10 Best Kwanzaa Books for Kids
- 100 Christmas Books Every Child Should Read Before They Turn 10
  - This page actually has links for ~120 books.
- Happy New Years Books for Kids