
nilportugues/amazon-scraper

 
 


Amazon Product Scraper


A tool for scraping product information from Amazon.

If you like this tool, please star it.




Features

  • Scrape products from search results
  • Scrape product data by ASIN
  • Scrape product reviews
  • Filter results to sponsored products only
  • Filter results to discounted products only
  • Results can be saved to JSON/CSV files
  • You can scrape up to 500 products and 1000 reviews


Note:

  • Empty parameter = empty value

Possible errors

  • If you run into any, let me know

Installation

Install from NPM

$ npm i -g amazon-buddy

Install from YARN

$ yarn global add amazon-buddy

Usage

Terminal

$ amazon-buddy --help

Usage: amazon-buddy <command> [options]

Commands:
  amazon-buddy products   scrape for a products from the provided key word
  amazon-buddy reviews    scrape reviews from a product by using ASIN
  amazon-buddy asin [id]  scrape data from a single product by using ASIN

Options:
  --help, -h     help                                                  [boolean]
  --version      Show version number                                   [boolean]
  --keyword, -k  Amazon search keyword ex. 'Xbox one'     [string] [default: ""]
  --number, -n   Number of products to scrape. Maximum 100 products or 300 reviews        [default: 10]
  --filetype      Type of the output file where data will be saved. 'all' - save
                  data to both the 'json' and 'csv' files
                            [choices: "csv", "json", "all", ""] [default: "csv"]
  --sort         If searching for a products then list will be sorted by a higher
                 score(reviews*rating). If searching for a reviews then they will
                 be sorted by rating.                 [boolean] [default: false]
  --discount, -d Scrape only products with the discount
                                                      [boolean] [default: false]
  --sponsored     Scrape only sponsored products      [boolean] [default: false]
  --min-rating    Minimum allowed rating                            [default: 1]
  --max-rating    Maximum allowed rating                            [default: 5]
  --host, -H      The custom amazon host (can be www.amazon.fr, www.amazon.de, etc.)
                                            [string] [default: "www.amazon.com"]
  --random-ua     Randomize user agent version. This helps to prevent request
                  blocking from the amazon side       [boolean] [default: false]
  --timeout, -t   Timeout between requests, in milliseconds: 1000 ms = 1
                  second                                   [number] [default: 0]


Examples:
  amazon-buddy products -k 'Xbox one'
  amazon-buddy products -k 'Xbox one' -H 'www.amazon.fr'
  amazon-buddy reviews B01GW3H3U8
  amazon-buddy asin B01GW3H3U8
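
The --sort option above describes a score of reviews*rating for products. The sketch below only illustrates that arithmetic on made-up data. It is not the tool's own code: `sortByScore` is a hypothetical helper, and the `reviews` (review count) and `rating` fields are assumed names for this example.

```javascript
// Hypothetical helper: order items by score = reviews * rating, highest first.
// Assumed item shape for this sketch: { title, reviews, rating }.
function sortByScore(items) {
    // Copy first so the caller's array is left untouched
    return [...items].sort(
        (a, b) => b.reviews * b.rating - a.reviews * a.rating
    );
}

// Made-up sample data
const sampleItems = [
    { title: 'A', reviews: 100, rating: 4 },  // score 400
    { title: 'B', reviews: 1000, rating: 3 }, // score 3000
    { title: 'C', reviews: 10, rating: 5 },   // score 50
];

console.log(sortByScore(sampleItems).map((item) => item.title).join(','));
```

A product with many reviews and a middling rating can outrank a product with a perfect rating and few reviews, which is the point of weighting by review count.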

Example 1

Scrape 40 products from the Amazon search results for the keyword "vacuum cleaner" and save the result to a CSV file

$ amazon-buddy products -k 'vacuum cleaner' -n 40 --filetype csv

The file will be saved in the folder from which you run the script: products(vacuum cleaner)_1589470796380

Example 2

Scrape 100 reviews from a product by using its ASIN. NOTE: an ASIN is a unique Amazon product ID. It can be found in the product URL, or, if you have scraped a product list with this tool, in the CSV/JSON files.

$ amazon-buddy reviews B01GW3H3U8 -n 100

The file will be saved in the folder from which you run the script: reviews(B01GW3H3U8)_1589470878252

Example 3

Scrape 300 products for the keyword "xbox one" with a minimum rating of 3 and a maximum rating of 4, and save everything to a CSV file

$ amazon-buddy products -k 'xbox one' -n 300 --min-rating 3 --max-rating 4

The file will be saved in the folder from which you run the script: 1552945544582_products.csv

Module

Promise

const amazonScraper = require('amazon-buddy');

(async () => {
    try {
        // Collect 50 products from a keyword 'xbox one'
        const products = await amazonScraper.products({ keyword: 'Xbox One', number: 50, save: true });
        // Collect products that are located on page number 2
        // (named products_page so it does not clash with the `reviews` constant below)
        const products_page = await amazonScraper.products({ keyword: 'Xbox One', bulk: false, page: 2 });
        // Collect 50 products from a keyword 'xbox one' with rating between 3-5 stars
        const products_rank = await amazonScraper.products({ keyword: 'Xbox One', number: 50, rating: [3, 5] });

        // Collect 50 reviews from a product ID B01GW3H3U8
        const reviews = await amazonScraper.reviews({ asin: 'B01GW3H3U8', number: 50, save: true });
        // Collect 50 reviews from a product ID B01GW3H3U8  with rating between 1-2 stars
        const reviews_rank = await amazonScraper.reviews({ asin: 'B01GW3H3U8', number: 50, rating: [1, 2] });

        // Collect data for a single product with ID B01GW3H3U8
        const product_by_asin = await amazonScraper.asin({ asin: 'B01GW3H3U8' });
    } catch (error) {
        console.log(error);
    }
})();
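
When several products need to be fetched one after another, the promise calls above can be wrapped in a small helper. The following is a minimal sketch, not part of amazon-buddy: `sleep` and `scrapeAsins` are hypothetical helpers, and the only library call assumed is `asin({ asin })` as shown above. The scraper object is passed in as an argument, so the helper can be tried with a stub before running it against Amazon.

```javascript
// Hypothetical helper: resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: fetch several products one at a time.
// `scraper` is expected to expose asin({ asin }) and return a promise.
async function scrapeAsins(scraper, asins, delayMs = 0) {
    const results = [];
    for (const asin of asins) {
        try {
            const data = await scraper.asin({ asin });
            results.push({ asin, ok: true, data });
        } catch (error) {
            // Record the failure and continue with the next product
            results.push({ asin, ok: false, error: String(error) });
        }
        if (delayMs > 0) {
            await sleep(delayMs);
        }
    }
    return results;
}
```

With the real module this could be called as `scrapeAsins(require('amazon-buddy'), ['B01GW3H3U8'], 1000)`. Collecting failures instead of throwing keeps one bad ASIN from aborting the whole batch.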

Event

  • You won't be able to use promises.
  • {sort} and {save} will be ignored.

const amazonScraper = require('amazon-buddy');

let products = amazonScraper.products({
    keyword: 'xbox',
    number: 50,
    event: true,
});

products.on('error message', (error) => {
    console.log(error);
});

products.on('item', (item) => {
    console.log(item);
});

products.on('completed', () => {
    console.log('completed');
});
products._startScraper();
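
Because promises are not available in event mode, gathering every item into one array takes a little glue code. The sketch below shows one way to do it. `collectItems` is a hypothetical helper, it relies only on Node's built-in `events` module and on the event names used above ('item', 'completed', 'error message'), and a plain EventEmitter stands in for the scraper so the example runs without network access.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: gather every 'item' until 'completed' fires.
function collectItems(emitter) {
    return new Promise((resolve, reject) => {
        const items = [];
        emitter.on('item', (item) => items.push(item));
        emitter.once('completed', () => resolve(items));
        emitter.once('error message', (error) => reject(error));
    });
}

// Stand-in emitter that mimics the scraper's events
const fake = new EventEmitter();
const pending = collectItems(fake);
fake.emit('item', { asin: 'B01N6HLV9L' });
fake.emit('item', { asin: 'B01GW3H3U8' });
fake.emit('completed');

pending.then((items) => console.log(items.length)); // prints 2
```

With the real scraper, the listeners would need to be attached (that is, `collectItems(products)` called) before `products._startScraper()`, otherwise early items could be missed.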

JSON/CSV output(products):

[{
    asin: 'B01N6HLV9L',
    discounted: false,  // is true if product is with the discount
    sponsored: false,  // is true if product is sponsored
    amazonChoice: true,// if amazon choice badge is present
    price: '$32.99',
    before_discount: '$42.99', // displayed only if price is discounted
    title:'product title',
    url:'long amazon url'
}...]
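
In the product output, price and before_discount are strings that include a currency symbol, so they have to be parsed before any arithmetic. The following sketch computes a discount percentage from one product record. `parsePrice` and `discountPercent` are hypothetical helpers, and the code assumes prices formatted like '$32.99' with a dot as the decimal separator; other Amazon hosts may format prices differently.

```javascript
// Hypothetical helper: '$1,042.99' -> 1042.99. Returns NaN if no digits are found.
function parsePrice(text) {
    const cleaned = String(text).replace(/[^0-9.]/g, '');
    return cleaned === '' ? NaN : Number(cleaned);
}

// Hypothetical helper: whole-number discount percentage for one product record.
// Returns 0 when either price is missing or cannot be parsed.
function discountPercent(product) {
    const now = parsePrice(product.price);
    const before = parsePrice(product.before_discount);
    if (!Number.isFinite(now) || !Number.isFinite(before) || before <= 0) {
        return 0;
    }
    return Math.round(((before - now) / before) * 100);
}

const sampleProduct = { price: '$32.99', before_discount: '$42.99' };
console.log(discountPercent(sampleProduct)); // prints 23
```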

JSON/CSV output(reviews):

[{
    id: 'R335O5YFEWQUNE',
    review_data: '6-Apr-17',
    name: 'Bob',
    title: 'Happy Gamer',
    rating: 5,
    review: 'blah blah blah'
}...]
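
Once reviews are saved in this shape, they can be post-processed locally. The sketch below filters a review list by a [min, max] rating range, mirroring the {rating} option, and computes an average. `filterByRating` and `averageRating` are hypothetical helpers, and the sample records are made up for illustration.

```javascript
// Hypothetical helper: keep reviews whose rating lies within [min, max].
function filterByRating(reviews, [min, max] = [1, 5]) {
    return reviews.filter((r) => r.rating >= min && r.rating <= max);
}

// Hypothetical helper: mean rating, or 0 for an empty list.
function averageRating(reviews) {
    if (reviews.length === 0) return 0;
    const total = reviews.reduce((sum, r) => sum + r.rating, 0);
    return total / reviews.length;
}

// Made-up sample data in the documented shape
const sampleReviews = [
    { id: 'R1', name: 'Bob', title: 'Happy Gamer', rating: 5, review: '...' },
    { id: 'R2', name: 'Ann', title: 'Broke quickly', rating: 1, review: '...' },
    { id: 'R3', name: 'Joe', title: 'Not great', rating: 2, review: '...' },
];

const lowRated = filterByRating(sampleReviews, [1, 2]);
console.log(lowRated.length, averageRating(lowRated)); // prints 2 1.5
```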

JSON/CSV output(asin):

{
    title: 'Apple iPhone 6S, 64GB, Rose Gold - For AT&T / T-Mobile (Renewed)',
    url: 'https://www.amazon.com/dp/B01CR1FQMG',
    reviews: { total_reviews: 2406, rating: '3.8', answered_questions: 677 },
    price: { current_price: 14.98, discounted: false, before_price: 14.98, savings_amount: 0, savings_percent: 0 },
    images: [
        'https://images-na.ssl-images-amazon.com/images/I/412jWjEIzKL._AC_SY879_.jpg',
        'https://images-na.ssl-images-amazon.com/images/I/41XdO4T0xvL._AC_SY879_.jpg',
        'https://images-na.ssl-images-amazon.com/images/I/31qHuwnKOkL._AC_SY879_.jpg',
        'https://images-na.ssl-images-amazon.com/images/I/21CAx9aDlfL._AC_SY879_.jpg',
    ],
    storeID: 'wireless',
    brand: 'Amazon Renewed',
    badges: { amazonChoice: false, amazonPrime: false },
}
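
The asin result is nested (reviews, price, badges), which does not map directly onto a single CSV row. The sketch below shows one possible way to flatten such an object into dotted column names. `flatten` is a hypothetical helper written for this example, and it is not a description of how the tool itself produces its CSV output.

```javascript
// Hypothetical helper: turn nested objects into a flat { 'a.b': value } map.
// Arrays are joined into one cell so each record stays a single row.
function flatten(value, prefix = '', out = {}) {
    for (const [key, child] of Object.entries(value)) {
        const name = prefix ? `${prefix}.${key}` : key;
        if (Array.isArray(child)) {
            out[name] = child.join(' | ');
        } else if (child !== null && typeof child === 'object') {
            flatten(child, name, out);
        } else {
            out[name] = child;
        }
    }
    return out;
}

const row = flatten({
    url: 'https://www.amazon.com/dp/B01CR1FQMG',
    reviews: { total_reviews: 2406, rating: '3.8' },
    badges: { amazonChoice: false, amazonPrime: false },
});
console.log(Object.keys(row).join(','));
```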

Options

let options = {
    //Search keyword: {string default: ""}
    keyword: "",

    //Number of products to scrape: {int default: 10}
    number: 10,

    // If {bulk} is set to {false}, products are scraped one page at a time and {number} is ignored
    // Very useful if you need to scrape products from a specific page
    bulk: true,

    // Search result {page} number
    // You can set this value to 5 and scraper will collect all products starting from the {page} number 5
    page: 0,

    // Enable/disable EventEmitter: {boolean default: false}
    // If enabled then you won't be able to use promises
    event: false,

    // Save result to a file: {string default: ''}
    // Allowed values: 'json', 'csv', 'all', ''
    // 'all' - save result to both JSON and CSV files
    filetype: '',

    //Set proxy: {string default: ""}
    proxy: "",

    //Filter by rating. [minRating, maxRating]: {array default: [1,5]}
    rating:[1,5],

    //Sorting. If searching for products, the list is sorted by the highest score (number of reviews * rating). If searching for reviews, they are sorted by rating: {boolean default: false}
    sort: false,

    //Scrape only products with the discount: {boolean default: false}
    discount: false,

    //Scrape only sponsored products: {boolean default: false}
    sponsored: false,

    //Search on custom amazon host to list products in specific language
    host: "www.amazon.de",

    //Randomize user agent version. This helps to prevent request blocking from the amazon side
    randomUa: false,

    //Timeout between requests. Timeout is set in milliseconds: 1000 ms = 1 second
    timeout: 0
};
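
A caller usually overrides only a few of these options. The sketch below merges overrides onto the documented defaults and checks two of the constraints stated above (the rating range and the allowed filetype values). `DEFAULTS` and `buildOptions` are hypothetical names for this example, the host default is taken from the CLI help text, and the library may validate its input differently.

```javascript
// Defaults as listed in this README (host default taken from the CLI help)
const DEFAULTS = {
    keyword: '',
    number: 10,
    bulk: true,
    page: 0,
    event: false,
    filetype: '',
    proxy: '',
    rating: [1, 5],
    sort: false,
    discount: false,
    sponsored: false,
    host: 'www.amazon.com',
    randomUa: false,
    timeout: 0,
};

// Hypothetical helper: merge overrides onto the defaults and sanity-check them.
function buildOptions(overrides = {}) {
    const options = { ...DEFAULTS, ...overrides };
    const [min, max] = options.rating;
    if (!(min >= 1 && max <= 5 && min <= max)) {
        throw new RangeError('rating must be [min, max] within 1..5');
    }
    if (!['json', 'csv', 'all', ''].includes(options.filetype)) {
        throw new RangeError(`unsupported filetype: ${options.filetype}`);
    }
    return options;
}

const opts = buildOptions({ keyword: 'xbox one', rating: [3, 4] });
console.log(opts.number, opts.host); // prints 10 www.amazon.com
```

The resulting object could then be passed to `amazonScraper.products(opts)`, with the check catching a reversed rating range before any request is made.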

License

MIT

Free Software
