Smart Crawler v1.0
Crawls e-commerce websites such as Amazon and Flipkart and extracts structured information
Python version: 2.7.10
- Clone the repo
- Create a virtual environment (see the virtualenv documentation)
- Activate the virtual environment
- A virtual env for Mac is also committed to the repo, in case you want to activate it directly
- Navigate to the root directory in a terminal and run pip install -r requirements.txt
- Edit run.py and set the url variable to the URL of the first page of results to be extracted
- You can specify the number of desired results in run.py: Crawler(total_items_to_scrape=your_desired_number)
- Run run.py; UTF-8 output is printed to the console and CSV output is written to output/output.csv
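After a run, the CSV output can be inspected with a few lines of Python. This is only a minimal sketch: read_output is a hypothetical helper that is not part of the repo, it makes no assumption about the columns or a header row, and it is written for Python 3 even though the crawler itself targets 2.7.10.

```python
import csv
import io


def read_output(path="output/output.csv"):
    """Return every row of the crawler's CSV output as a list of strings.

    Hypothetical helper for illustration; the default path is the one
    named in the steps above.
    """
    # newline="" lets the csv module handle quoted line breaks itself.
    with io.open(path, encoding="utf-8", newline="") as handle:
        return [row for row in csv.reader(handle)]
```

Calling read_output() from the root directory after running run.py should return one list per scraped row, which can then be printed or filtered as needed.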
Since similar elements are grouped under the same key, the keys could be renamed to meaningful columns such as Price and Reviews using NLP.
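To make the renaming idea concrete, here is a rough sketch that uses plain regular-expression rules as a stand-in for a real NLP model. Everything in it is an assumption made for illustration: the PATTERNS table, the guess_column_name and rename_keys helpers, the 0.6 threshold, and the shape of the input (a dict mapping each opaque key to the list of values extracted under it). It is written for Python 3.

```python
import re
from collections import Counter

# Hypothetical rules: a column label and a pattern its values tend to match.
PATTERNS = [
    # Currency symbol ($ or the rupee sign \u20b9) or "Rs." followed by a number.
    ("Price", re.compile(r"(?:[$\u20b9]|\bRs\.?)\s*\d[\d,]*(?:\.\d+)?", re.IGNORECASE)),
    # A count followed by "review(s)", e.g. "1,234 customer reviews".
    ("Reviews", re.compile(r"\b\d[\d,]*\s+(?:customer\s+)?reviews?\b", re.IGNORECASE)),
]


def guess_column_name(values, threshold=0.6):
    """Return the label matched by at least `threshold` of the non-empty values, else None."""
    values = [v for v in values if v]
    if not values:
        return None
    counts = Counter()
    for value in values:
        for label, pattern in PATTERNS:
            if pattern.search(value):
                counts[label] += 1
                break  # the first matching rule wins for this value
    if not counts:
        return None
    label, hits = counts.most_common(1)[0]
    return label if hits / len(values) >= threshold else None


def rename_keys(table):
    """Rename each key to a guessed column label, keeping the key when unsure."""
    renamed = {}
    for key, values in table.items():
        label = guess_column_name(values)
        # Fall back to the original key if no label fits or it is already used.
        if label is None or label in renamed:
            label = key
        renamed[label] = values
    return renamed
```

Keys whose values match no rule are left unchanged, so a wrong guess never discards data; a trained classifier could later replace guess_column_name without changing rename_keys.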