
🕵️‍♂️ A colorful, ethical Amazon web scraper built with Scrapy + Playwright. ⚙️ Extracts product titles, prices, ratings, and images, all with a modern CLI, banner, and warning system. It scrapes Amazon without a proxy and can make 300+ requests.


mscoder-py/amazon-playwright-scraper


🕷️ Amazon Product Scraper v1.0

Python • License: MIT • Made with Love

⚙️ Overview

AmazonPlaywrightSpider is a Scrapy + Playwright-based web scraper that extracts product details (title, price, rating, image) from Amazon.com. It automates a Chromium browser to scrape dynamic product data, even from JavaScript-heavy pages. It works without any kind of proxy and can make 300+ requests to Amazon.

✨ Features

  • 🕹 Playwright-powered Scraping – Handles JavaScript-rendered Amazon pages.
  • 🌈 Colorful CLI – Fully color-coded output with banners and warnings.
  • ⚠️ Ethical Notice System – Shows a warning box before starting.
  • 📦 Auto Data Export – Saves results to product.csv and product.json.
  • 🧭 Pagination Support – Automatically crawls through multiple result pages.
  • 💻 Lightweight & Customizable – Works directly with scrapy crawl amazon_playwright.
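
The pagination feature can be pictured with a small standard-library sketch. It assumes that search result pages are addressed by a `page` query parameter; `next_page_url` and `page_urls` are hypothetical helpers for illustration, and the actual spider may instead follow the "Next" link on each results page.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def next_page_url(url: str) -> str:
    """Return `url` with its `page` query parameter increased by one.

    Assumption: a missing `page` parameter means page 1.
    """
    parts = urlparse(url)
    query = parse_qs(parts.query)
    current = int(query.get("page", ["1"])[0])
    query["page"] = [str(current + 1)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def page_urls(start_url: str, max_pages: int):
    """Yield up to `max_pages` result-page URLs, beginning with `start_url`."""
    url = start_url
    for _ in range(max_pages):
        yield url
        url = next_page_url(url)


if __name__ == "__main__":
    # Example search URL; the query string is an assumption for this demo.
    for url in page_urls("https://www.amazon.com/s?k=mouse", max_pages=3):
        print(url)
```

Capping the crawl with `max_pages` keeps the request count predictable, which fits the low-volume use this README recommends.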

🧰 Requirements

  • Python 3.9+
  • Scrapy
  • Scrapy-Playwright
  • Playwright (Chromium browser)
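  • Node.js (used by the Playwright driver; the Playwright Python package ships its own copy, so a separate install is usually unnecessary)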

🧠 Technology Stack

Python

  • Programming language.

Scrapy

  • Web scraping framework

Playwright

  • Headless browser automation

Twisted Reactor

  • Async I/O event system for Scrapy

🧩 Project Structure

amazon_scraper/
│
├── amazon/
│   ├── spiders/
│   │   └── amazon_playwright_spider.py   # main spider (this file)
│   └── settings.py                       # Scrapy configuration
│
├── product.json                          # output file (auto-generated)
├── product.csv                           # output file (auto-generated)
└── README.md                             # documentation
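
For orientation, `settings.py` in a scrapy-playwright project usually contains entries like the ones below. This is a sketch based on the settings that scrapy-playwright documents, not a copy of this repository's file; the specific values are assumptions.

```python
# settings.py (sketch; the repository's real values may differ)

# Route HTTP and HTTPS downloads through Playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Use Chromium, the browser named in this README.
PLAYWRIGHT_BROWSER_TYPE = "chromium"

# Assumed value: run without a visible window. Set to False to watch the browser.
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}
```

With these settings, a request is rendered in the browser only when it carries `meta={"playwright": True}`; other requests go through Scrapy's regular downloader.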

⚙️ Installation Guide

1️⃣ Clone Repository

git clone https://github.com/mscoder-py/amazon-playwright-scraper.git
cd amazon-playwright-scraper

2️⃣ Create Virtual Environment

python -m venv venv
venv\Scripts\activate  # (Windows)
# or
source venv/bin/activate  # (Linux/Mac)

3️⃣ Install Dependencies

pip install scrapy scrapy-playwright

4️⃣ Install Playwright Browsers

playwright install

▢️ How to Run

Option 1: From Scrapy CLI

scrapy crawl amazon_playwright

Option 2: Run Script Directly

python amazon_playwright_spider.py

When you run it directly, it will:

  • Show a fancy banner

  • Display a warning box

  • Show version and author

  • Ask confirmation before crawling
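
The start-up flow above could look roughly like the following standard-library sketch. All names here (`warning_box`, `confirm`, `run_cli`) and the banner and warning text are hypothetical; the real script's output and wording will differ.

```python
YELLOW = "\033[93m"  # ANSI bright yellow
RESET = "\033[0m"    # ANSI reset


def warning_box(lines, width=60):
    """Return `lines` framed in an ASCII box that is `width` characters wide.

    Lines longer than `width - 4` are not wrapped or truncated.
    """
    border = "+" + "-" * (width - 2) + "+"
    body = ["| " + line.ljust(width - 4) + " |" for line in lines]
    return "\n".join([border, *body, border])


def confirm(prompt="Start crawling? [y/N] ", input_fn=input):
    """Return True only if the user answers 'y' or 'yes' (case-insensitive)."""
    return input_fn(prompt).strip().lower() in {"y", "yes"}


def run_cli(input_fn=input):
    """Print banner, warning, and version, then ask before crawling."""
    print("=== Amazon Product Scraper ===")  # placeholder banner
    print(YELLOW + warning_box([
        "For educational and research use only.",
        "Respect Amazon's Terms of Service.",
    ]) + RESET)
    print("Version: v1.0 | Author: MS Coder")
    if not confirm(input_fn=input_fn):
        print("Aborted.")
        return False
    # A real script would start the Scrapy crawl at this point.
    return True
```

Calling `run_cli()` from a terminal prompts the user; passing a custom `input_fn` makes the confirmation step easy to exercise without a keyboard.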

🧾 Output Example

Sample JSON Output

[
    {
        "title": "Logitech Wireless Mouse M510",
        "price": "$24.99",
        "rating": "4.7 out of 5 stars",
        "image": "https://images.amazon.com/...jpg"
    },
    {
        "title": "HP USB Keyboard 320K",
        "price": "$17.45",
        "rating": "4.5 out of 5 stars",
        "image": "https://images.amazon.com/...jpg"
    }
]
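
Because `price` and `rating` are exported as display strings, downstream analysis usually needs them as numbers. The sketch below shows one way to convert them; `parse_price`, `parse_rating`, and `normalize_item` are hypothetical helpers, and the patterns assume the formats shown in the sample above.

```python
import re


def parse_price(text):
    """Convert a string such as '$1,299.00' to a float, or None if no number is found."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return float(match.group().replace(",", "")) if match else None


def parse_rating(text):
    """Convert '4.7 out of 5 stars' to 4.7, or None if the text does not match."""
    match = re.match(r"\s*(\d+(?:\.\d+)?) out of 5", text or "")
    return float(match.group(1)) if match else None


def normalize_item(item):
    """Return a copy of `item` with numeric `price` and `rating` fields."""
    return {
        **item,
        "price": parse_price(item.get("price")),
        "rating": parse_rating(item.get("rating")),
    }


if __name__ == "__main__":
    sample = {
        "title": "Logitech Wireless Mouse M510",
        "price": "$24.99",
        "rating": "4.7 out of 5 stars",
        "image": "https://images.amazon.com/...jpg",
    }
    print(normalize_item(sample))
```

To apply this to a real export, load the list with `json.load(open("product.json", encoding="utf-8"))` and map `normalize_item` over it.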

📊 Sample CSV Output

When the spider finishes running, it automatically saves results in product.csv and product.json.

Here's an example of how the CSV output looks:

| title | price | rating | image |
| --- | --- | --- | --- |
| Logitech MX Master 3S Wireless Mouse | $99.99 | 4.8 out of 5 stars | https://m.media-amazon.com/images/I/71X9ppvP+aL._AC_SL1500_.jpg |
| Corsair K70 RGB TKL Mechanical Gaming Keyboard | $129.99 | 4.7 out of 5 stars | https://m.media-amazon.com/images/I/81uO-KnH1HL._AC_SL1500_.jpg |
| Razer Kraken V3 X Gaming Headset | $49.99 | 4.5 out of 5 stars | https://m.media-amazon.com/images/I/61QyH9PoWQL._AC_SL1500_.jpg |

📁 The files are saved automatically in your project root directory after each crawl:

product.csv
product.json
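
In Scrapy, automatic export of this kind is commonly configured through the `FEEDS` setting. The fragment below is a sketch of how both files could be produced; whether this project uses `FEEDS` or writes the files some other way is an assumption, as is the choice to overwrite on each run.

```python
# Feed export sketch (settings.py or the spider's custom_settings).
# Assumption: each crawl replaces the previous output files.
FEEDS = {
    "product.json": {"format": "json", "encoding": "utf8", "overwrite": True},
    "product.csv": {"format": "csv", "encoding": "utf8", "overwrite": True},
}

# Fix the column order so the CSV header matches the fields listed in this README.
FEED_EXPORT_FIELDS = ["title", "price", "rating", "image"]
```

Relative paths in `FEEDS` are resolved against the directory the crawl is started from, so running from the project root places both files there.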

⚠️ Important Notes

  • This script is for educational and research use only.

  • Do NOT use it for aggressive or commercial scraping.

  • Always respect Amazon's Terms of Service.

  • Use download delays and low concurrency to prevent blocking.
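
The last note maps onto standard Scrapy settings. The values below are illustrative assumptions, not the project's actual configuration; tune them conservatively.

```python
# Politeness settings sketch (settings.py); all numbers are assumed examples.

DOWNLOAD_DELAY = 3                   # seconds to wait between requests
RANDOMIZE_DOWNLOAD_DELAY = True      # vary the delay between 0.5x and 1.5x
CONCURRENT_REQUESTS = 1              # one request in flight overall
CONCURRENT_REQUESTS_PER_DOMAIN = 1   # and at most one per domain

# Let Scrapy slow down further when responses get slower.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 3
AUTOTHROTTLE_MAX_DELAY = 30
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
```

Low concurrency matters even more with Playwright, since every in-flight request can hold an open browser page.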

🧑‍💻 Author & Credits

Developer: MS Coder

  • Version: v1.0
  • Language: Python
  • Framework: Scrapy + Playwright

💡 Future Plans

  • Add support for multiple Amazon categories

  • Implement rotating user-agents & proxy pool

  • Add progress bar for live scraping status

  • Build web dashboard for live scraped data


🧾 License & Credits

MIT License • Made with Python • Powered by Scrapy • Playwright Integration


Made with ❤️ by MS Coder
Version 1.0 • Built for learning, with style & responsibility 🧠

🏁 Final Note

“Scrape responsibly. Automate smartly. Respect platforms.”

- MS Coder
