A parallelized parser to extract attributes from Amazon product pages

saurabh3949/AmzGlass

Amazon Glass

Parse Amazon product pages using Python to get the following attributes:

  • Title
  • Brand
  • Product description
  • Tech specs
  • Price
  • Image URL (to be added)
  • Category
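As a rough illustration of what extracting one of these attributes involves, here is a minimal sketch that pulls a title out of a saved product page using only the Python standard library. It is not the repository's spider, which runs through Scrapy. The element id `productTitle` and the names `TitleExtractor` and `extract_title` are assumptions made for this example; check them against your own saved pages.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TitleExtractor(HTMLParser):
    """Collect the text inside the element whose id equals target_id."""

    def __init__(self, target_id="productTitle"):  # assumed id, not taken from the repo
        super().__init__()
        self.target_id = target_id
        self._depth = 0      # greater than 0 while inside the target element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self._depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace left over from the page layout.
        return " ".join("".join(self._chunks).split())


def extract_title(html):
    """Return the cleaned title text, or None if the element is missing."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title or None
```

The same pattern would apply to the other attributes with a different target element, though real product pages vary in layout and a selector-based approach is usually less fragile.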

Usage

git clone https://github.com/saurabh3949/AmzGlass.git
cd AmzGlass
pip install -r requirements.txt

Now copy all the product HTML files into the data folder and run:

scrapy crawl amazon -o output.json

This writes the extracted attributes to output.json.
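
Once output.json exists, it can be inspected from Python. Scrapy's JSON feed export writes a single JSON array with one object per item, so a plain `json.load` is enough. The sketch below is hedged: the helper names `load_extracts` and `summarize` are hypothetical, and the field names `title`, `brand`, and `price` are assumptions based on the attribute list above, not confirmed keys from the spider.

```python
import json


def load_extracts(path="output.json"):
    """Return the list of item dicts stored in a Scrapy JSON export."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize(items, fields=("title", "brand", "price")):  # assumed field names
    """Count how many items have a non-empty value for each field."""
    return {field: sum(1 for item in items if item.get(field)) for field in fields}


if __name__ == "__main__":
    items = load_extracts()
    print(f"{len(items)} products parsed")
    print(summarize(items))
```

A per-field count like this is a quick way to spot pages where an attribute was not found.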

If you find a bug, please email me at saurabh3949@gmail.com
