l-portet/yellow-scraper

yellow-scraper

Scrape the French yellow pages (Pages Jaunes) with Puppeteer

⚠️ MAY BE DEPRECATED: The Pages Jaunes page layout and data structure may change, and this scraper is not automatically updated to follow those changes.

Installation

npm install

Usage

Set up the config.js file

Sample config

module.exports = {
    query: {
        keyword: 'luthier',
        location: 'Rennes'
    }, // Will search all 'luthier' businesses in 'Rennes'
    headless: true, // Run Chrome in headless mode
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    acceptLanguage: 'fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7,la;q=0.6',
    outputFilename: 'output',
    outputFormat: 'csv', // Supported formats: 'json', 'csv'
    maxResults: -1, // -1 => no limit, N => the scraper stops once N results have been reached
    puppeteerArgs: [], // Additional args for Puppeteer (a proxy, for example)
    baseURL: 'https://www.pagesjaunes.fr', // Only target this domain if you have the proper rights
    safeMode: true // Safe mode sets a delay between each query
}
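
A typo in config.js is easy to miss until the scraper is already running. As a rough sketch, the options above could be checked before launch. `validateConfig` is a hypothetical helper that is not part of this repository, and the rules it enforces are assumptions drawn from the comments in the sample config (for instance, that `maxResults` is either -1 or a positive integer). The proxy address is a placeholder; `--proxy-server` is a standard Chromium command-line flag.

```javascript
// Hypothetical helper, not part of yellow-scraper: checks a config object
// against the constraints described in the sample config comments.
function validateConfig(config) {
    const errors = [];
    const query = config.query || {};

    if (typeof query.keyword !== 'string' || query.keyword.trim() === '') {
        errors.push('query.keyword must be a non-empty string');
    }
    if (typeof query.location !== 'string' || query.location.trim() === '') {
        errors.push('query.location must be a non-empty string');
    }
    if (!['json', 'csv'].includes(config.outputFormat)) {
        errors.push("outputFormat must be 'json' or 'csv'");
    }
    // Assumption: -1 means "no limit"; any other value must be a positive integer.
    const max = config.maxResults;
    if (!(max === -1 || (Number.isInteger(max) && max > 0))) {
        errors.push('maxResults must be -1 or a positive integer');
    }
    if (!Array.isArray(config.puppeteerArgs)) {
        errors.push('puppeteerArgs must be an array');
    }
    return errors;
}

const sampleConfig = {
    query: { keyword: 'luthier', location: 'Rennes' },
    outputFormat: 'csv',
    maxResults: -1,
    // Example of an extra Chromium flag; the proxy address is a placeholder.
    puppeteerArgs: ['--proxy-server=http://127.0.0.1:8080']
};

console.log(validateConfig(sampleConfig)); // prints []
```

An empty array means no problem was found; otherwise each entry names the option to fix.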

Run the scraper

npm start
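
The `outputFormat` option selects between JSON and CSV output. The repository's own writer is not shown in this README, so the following is only a sketch of how a list of results might be serialized to either format. `serializeResults`, `escapeCsvField`, and the record fields (`name`, `city`) are hypothetical names chosen for illustration, and the CSV quoting follows the usual convention of doubling embedded quotes.

```javascript
// Hypothetical sketch, not the scraper's actual writer: turns an array of
// flat records into a JSON or CSV string, depending on `format`.
function escapeCsvField(value) {
    const text = value === null || value === undefined ? '' : String(value);
    // Quote the field when it contains a delimiter, a quote or a line break.
    return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

function serializeResults(records, format) {
    if (format === 'json') {
        return JSON.stringify(records, null, 2);
    }
    if (format === 'csv') {
        if (records.length === 0) return '';
        // Assumption: every record has the same keys as the first one.
        const columns = Object.keys(records[0]);
        const header = columns.map(column => escapeCsvField(column)).join(',');
        const rows = records.map(record =>
            columns.map(column => escapeCsvField(record[column])).join(',')
        );
        return [header, ...rows].join('\n');
    }
    throw new Error('Unsupported format: ' + format);
}

// The field names and values below are made up for illustration.
const records = [
    { name: 'Atelier "Bois & Cordes"', city: 'Rennes' },
    { name: 'Lutherie Dupont, père et fils', city: 'Rennes' }
];

console.log(serializeResults(records, 'csv'));
```

Quoting matters here because business names often contain commas or quotation marks, which would otherwise shift columns when the file is opened in a spreadsheet.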

Todo

Export to Excel format (xls)

Issues

If you find an issue, feel free to contact me or open an issue on GitHub. You can also contribute by creating a pull request.

Disclaimer

I can't be held responsible for any abusive use of this software or for any problem it causes. Make sure you have the proper rights before you run it.
