
A generic crawler to crawl ecommerce websites


psr-ai/crawler


Smart Crawler v1.0

Crawls e-commerce websites such as Amazon and Flipkart and extracts structured information from them.

Python Version

2.7.10

Running locally

  1. Clone the repo
  2. Create a virtual environment (here is the doc)
  3. Activate the virtual environment
  4. Just in case, I have also committed my virtual env (created on a Mac) if you want to activate it directly instead
  5. Navigate to the root directory in a terminal and run pip install -r requirements.txt
  6. Edit run.py and set the url variable to the URL of the first page of results to be extracted
  7. Optionally, set the number of results to scrape in run.py: Crawler(total_items_to_scrape=your_desired_number)
  8. Run python run.py; the results are printed to the console as UTF-8 and saved as CSV to output/output.csv
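Once a run finishes, the CSV output can be loaded for further processing. The following is a minimal Python 3 sketch. It assumes that output/output.csv is UTF-8 encoded and starts with a header row, which the steps above do not confirm. load_results is a hypothetical helper name.

```python
import csv


def load_results(path="output/output.csv"):
    """Read the crawler's CSV output into a list of dicts, one per row.

    Assumption: the file is UTF-8 and its first row is a header; the
    actual column names depend on what the crawler extracted.
    """
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```

For example, rows = load_results() returns one dict per scraped item, keyed by the header names. Using csv.DictReader means the values can be accessed by column name rather than position.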

Magic

Since similar elements are grouped under the same key, NLP can easily be used to rename the keys to meaningful column names such as Price, Reviews, etc.
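On a results page, every item repeats the same markup, so text nodes that share a tag/class path can be treated as one column. The following Python 3 sketch illustrates this idea under stated assumptions and is not the crawler's actual code. Three points are assumptions:

- Keying by the tag/class path is a guess at what "the same key" means.
- PathGrouper, guess_column and extract_columns are hypothetical names.
- The renaming step uses simple regular-expression heuristics in place of NLP.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "input", "meta", "link", "hr", "area",
             "base", "col", "embed", "source", "track", "wbr"}


class PathGrouper(HTMLParser):
    """Collect text nodes keyed by the tag/class path leading to them (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # e.g. ["div.item", "span.price"]
        self.groups = defaultdict(list)  # path -> list of text values

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        cls = dict(attrs).get("class") or ""  # class may be missing or valueless
        self.stack.append(tag + ("." + ".".join(cls.split()) if cls else ""))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating sloppy HTML.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.groups[" > ".join(self.stack)].append(text)


def guess_column(values):
    """Heuristic stand-in for the NLP renaming step (assumption, not NLP)."""
    if all(re.search(r"(₹|Rs\.?|\$)\s*[\d,]+", v) for v in values):
        return "Price"
    if all(re.search(r"\b[\d,]+\s+(ratings?|reviews?)\b", v, re.I) for v in values):
        return "Reviews"
    return None


def extract_columns(html, min_items=2):
    """Group repeated fields into columns and try to give them readable names."""
    parser = PathGrouper()
    parser.feed(html)
    parser.close()
    columns = {}
    for path, values in parser.groups.items():
        if len(values) < min_items:  # a path seen once is not a repeated field
            continue
        name = guess_column(values) or path
        if name in columns:          # avoid overwriting an earlier match
            name = path
        columns[name] = values
    return columns
```

Given two product cards, each with div class "item" containing span elements for title, price ("$199") and rating count ("1,024 ratings"), extract_columns returns three columns. The price and rating columns are renamed to Price and Reviews. The title column keeps its path, div.item > span.title, as its key because no heuristic matches it. A real NLP-based step would replace guess_column while keeping the grouping logic unchanged.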
