Amazon ASIN Scraper

This project allows you to input a book's ASIN or product url from Amazon.com (no official additional region support at this time) and retrieve the book's metadata by scraping the html page that's returned from their server, including the: title, author, and description (as html and raw text). I went with the scraping route because Amazon's API doesn't seem adquate to get all potentially relevant information about a book.

It also converts some text classes within the description html into their proper html element such as .a-text-bold into <b></b>. This is so you can keep the formatting while removing classes that aren't useful since you don't have the stylesheet for it anyway. It then strips the remaining classes so you're left with barebones (and not prettified) html. It does not however make any other additions and if you're using this to fill out a form, the final formatting will be up to your form's interpretation and you may need to make adjustments. I do also make a minimal effort to format the raw text of the description by looping through any elements within it and creating a custom description text with proper line breaks. This does not work for every book and for those you will be better off directly copying the text right from the page.

It also dumps JSON specifically targeted for MAM fast form filling, but may be compatible with others. It dumps the following format:

{
    title: result.title,
    authors: [result.author],
    isbn: 'ASIN:' + result.asin,
    language: 'English',
    thumbnail: result.imageUrl
}

For now this is just a node terminal application and you will input the ASIN via terminal and retrieve the metadata via terminal. Future improvements would be giving a few options for how a user will get their requested data. For example: the terminal, a file output, and an embeded webserver that runs a web-UI for easier data input and copying.

Installation

Requirements

Node.js (LTS or higher)

Setup

Clone this project or download the .zip from the same place
Open a terminal window and point it to the amazon-asin-scraper folder
- cd <filepath to folder>
Run npm install
To start the application run npm run dev

Usage

This project uses TypeScript (TS) which is a superset of JavaScript (JS). These days modern JS is fairly similar to TS making it incredibly easy to get started with. This project includes a transpiler that will process the .ts and .tsx files located in the src directory into JS.

To build the project, run npm run build in the terminal. This will transpile the .ts files into dist folder where you'll see index.js which is the entry point of the application. You can then run the application with node dist/index.js or combine both the build and run commands with npm run dev.

From here the app will ask you for an Amazon ASIN or url input. This can either be one or multiple ASIN that are comma separated, OR one url.

Valid inputs may look like this:
Amazon ASIN: B0CPN3LC6N
Amazon ASIN: B0CPN3LC6N,B0CQQ3VB8S,B0C1ZPZC1Y
Amazon ASIN: https://www.amazon.com/gp/product/B0CPCJW9SX
Amazon ASIN: https://www.amazon.com/gp/product/B0CPCJW9SX/ref=kinw_myk_ro_title

Keep in mind that the more scraping this app does within a short amount of time, the more likely it is to be flagged as automation and denied access. I do not yet know if this is against the Amazon usage agreement, nor do I know of the potential consequences if it is. I'm not mass-scraping books, I just do a few maximum at a time, once or twice per day.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
src		src
.eslintrc.js		.eslintrc.js
.gitignore		.gitignore
.prettierrc.js		.prettierrc.js
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Amazon ASIN Scraper

Installation

Requirements

Setup

Usage

About

Releases

Packages

Languages

License

nullgato/amazon-asin-scraper

Folders and files

Latest commit

History

Repository files navigation

Amazon ASIN Scraper

Installation

Requirements

Setup

Usage

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages