AmazonPlaywrightSpider is a Scrapy + Playwright-based web scraper that extracts product details (title, price, rating, image) from Amazon.com. It automates a Chromium browser to scrape dynamic product data, even from JavaScript-heavy pages. It works without any kind of proxy and is able to make 300+ requests to Amazon.
- Playwright-powered Scraping: Handles JavaScript-rendered Amazon pages.
- Colorful CLI: Fully color-coded output with banners and warnings.
- Ethical Notice System: Shows a warning box before starting.
- Auto Data Export: Saves results to `product.csv` and `product.json`.
- Pagination Support: Automatically crawls through multiple result pages.
- Lightweight & Customizable: Works directly with `scrapy crawl amazon_playwright`.
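The pagination logic itself is not shown in this README. As a rough sketch of the idea, assuming result pages are addressed by a `page` query parameter (an assumption, as are the helper name `next_page_url` and the `max_pages` cap), the next URL could be derived like this:

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def next_page_url(url: str, max_pages: int = 5) -> Optional[str]:
    """Return the URL of the next results page, or None once max_pages is reached.

    Assumes the page number lives in a `page` query parameter (hypothetical).
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # A missing `page` parameter is treated as page 1.
    current = int(query.get("page", ["1"])[0])
    if current >= max_pages:
        return None
    query["page"] = [str(current + 1)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
```

In a spider callback, the returned URL could be passed to a new request until the helper returns `None`. The cap keeps a crawl from running through every result page by accident.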
- Python 3.9+
- Scrapy
- Scrapy-Playwright
- Playwright (Chromium browser)
- Node.js (optional; Playwright for Python ships its own bundled driver)
- Programming language
- Web scraping framework
- Headless browser automation
- Async I/O event system for Scrapy
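The project's actual `settings.py` is not reproduced in this README. scrapy-playwright is typically enabled with settings along these lines; the browser type and launch options shown are assumptions, not the author's values:

```python
# settings.py (sketch): wiring Scrapy to Playwright via scrapy-playwright.

# Route HTTP and HTTPS downloads through the Playwright download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Browser choice and launch options; the values here are assumptions.
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}
```

With these in place, a request is rendered in the browser only when it carries `meta={"playwright": True}`; other requests go through the handler without Playwright.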
amazon_scraper/
│
├── amazon/
│   ├── spiders/
│   │   └── amazon_playwright_spider.py   # main spider (this file)
│   └── settings.py                       # Scrapy configuration
│
├── product.json                          # output file (auto-generated)
├── product.csv                           # output file (auto-generated)
└── README.md                             # documentation

1️⃣ Clone Repository
git clone https://github.com/your-username/amazon-playwright-scraper.git
cd amazon-playwright-scraper

2️⃣ Create Virtual Environment
python -m venv venv
venv\Scripts\activate # (Windows)
# or
source venv/bin/activate # (Linux/Mac)

3️⃣ Install Dependencies
pip install scrapy scrapy-playwright

4️⃣ Install Playwright Browsers
playwright install

Option 1: From Scrapy CLI
scrapy crawl amazon_playwright

Option 2: Run Script Directly
python amazon_playwright_spider.py

When you run it directly, it will:
- Show a fancy banner
- Display a warning box
- Show version and author
- Ask confirmation before crawling
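The confirmation step listed above could look roughly like the following. This is a minimal sketch, not the script's actual code: the function name `confirm`, the prompt text, and the accepted answers are all assumptions.

```python
def confirm(prompt: str = "Start crawling? [y/N]: ", input_fn=input) -> bool:
    """Return True only when the user explicitly answers yes.

    input_fn is injectable so the prompt can be exercised without a terminal.
    """
    answer = input_fn(prompt).strip().lower()
    return answer in {"y", "yes"}


# Typical use at the top of a run script (hypothetical):
#     if not confirm():
#         raise SystemExit("Aborted by user.")
```

Defaulting to "no" on an empty answer means a stray Enter key does not start a crawl.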
Sample JSON Output
[
  {
    "title": "Logitech Wireless Mouse M510",
    "price": "$24.99",
    "rating": "4.7 out of 5 stars",
    "image": "https://images.amazon.com/...jpg"
  },
  {
    "title": "HP USB Keyboard 320K",
    "price": "$17.45",
    "rating": "4.5 out of 5 stars",
    "image": "https://images.amazon.com/...jpg"
  }
]

When the spider finishes running, it automatically saves results in product.csv and product.json.
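How the spider writes these files is not shown in this README. One way to produce both outputs from a list of scraped items, using only the standard library, is sketched below; the function name `export_items` and the `out_dir` parameter are hypothetical.

```python
import csv
import json
from pathlib import Path

# Column order taken from the sample output in this README.
FIELDS = ["title", "price", "rating", "image"]


def export_items(items, out_dir="."):
    """Write items (a list of dicts) to product.csv and product.json in out_dir."""
    out = Path(out_dir)
    with open(out / "product.json", "w", encoding="utf-8") as fh:
        json.dump(items, fh, ensure_ascii=False, indent=2)
    # newline="" lets the csv module control line endings itself.
    with open(out / "product.csv", "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(items)
    return out / "product.csv", out / "product.json"
```

Scrapy's built-in feed exports (the `FEEDS` setting) can produce the same two files without custom code, which may be what the project relies on.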
Here's an example of how the CSV output looks:
| title | price | rating | image |
|---|---|---|---|
| Logitech MX Master 3S Wireless Mouse | $99.99 | 4.8 out of 5 stars | https://m.media-amazon.com/images/I/71X9ppvP+aL._AC_SL1500_.jpg |
| Corsair K70 RGB TKL Mechanical Gaming Keyboard | $129.99 | 4.7 out of 5 stars | https://m.media-amazon.com/images/I/81uO-KnH1HL._AC_SL1500_.jpg |
| Razer Kraken V3 X Gaming Headset | $49.99 | 4.5 out of 5 stars | https://m.media-amazon.com/images/I/61QyH9PoWQL._AC_SL1500_.jpg |
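In the sample outputs above, `price` and `rating` are stored as display strings. If numeric values are needed for sorting or filtering, they could be parsed after the crawl. The helpers below are a sketch under that assumption; `parse_price` and `parse_rating` are hypothetical names, not part of the spider.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: str) -> Optional[Decimal]:
    """Extract a decimal amount from a string such as "$24.99"."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    # Thousands separators are dropped before conversion.
    return Decimal(match.group().replace(",", "")) if match else None


def parse_rating(text: str) -> Optional[float]:
    """Extract the leading score from a string such as "4.7 out of 5 stars"."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of\s+5", text)
    return float(match.group(1)) if match else None
```

Both helpers return `None` for strings they cannot parse, so missing prices or ratings do not raise during post-processing.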
The files `product.csv` and `product.json` are saved automatically in your project root directory after each crawl.

- This script is for educational and research use only.
- Do NOT use it for aggressive or commercial scraping.
- Always respect Amazon's Terms of Service.
- Use download delays and low concurrency to prevent blocking.
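The last point maps onto standard Scrapy settings. The values below are illustrative assumptions, not the project's configuration; they show one conservative way to pace a crawl:

```python
# settings.py (sketch): conservative crawl pacing. All values are assumptions.
DOWNLOAD_DELAY = 3                   # base wait in seconds between requests
RANDOMIZE_DOWNLOAD_DELAY = True      # vary the wait between 0.5x and 1.5x the base
CONCURRENT_REQUESTS = 1              # one request in flight at a time
CONCURRENT_REQUESTS_PER_DOMAIN = 1   # same limit applied per domain
AUTOTHROTTLE_ENABLED = True          # adapt the delay to server response times
AUTOTHROTTLE_START_DELAY = 3         # initial AutoThrottle delay in seconds
AUTOTHROTTLE_MAX_DELAY = 30          # upper bound on the adaptive delay
```

Lower concurrency and longer delays make a crawl slower but put less load on the target site.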
- Version: v1.0
- Language: Python
- Framework: Scrapy + Playwright
- Add support for multiple Amazon categories
- Implement rotating user-agents & proxy pool
- Add progress bar for live scraping status
- Build web dashboard for live scraped data
Made with ❤️ by MS Coder
Version 1.0 • Built for learning, with style & responsibility