kbrown333/scraper_api

Synopsis

This application exposes a scraper API that other applications can call to retrieve the raw HTML content of third-party sites.

Quick Start

To start the server, open a terminal in the project folder and run 'npm start' (after installing dependencies with 'npm install'). To check that it is working, open the following link in a browser such as Chrome: http://localhost:3000/scrape?to_url=google.com&https=true
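The same check can be scripted instead of done in a browser. The following is a minimal sketch, assuming Node.js 18 or later (where fetch is available globally) and the server listening on port 3000 as in the link above. fetchScrape is a hypothetical helper, not part of this project; it takes an optional fetch implementation so it can be exercised without a live server.

```javascript
// Hypothetical client helper: request raw HTML from a running scraper_api instance.
// Assumes Node.js 18+ (global fetch) and the server listening on localhost:3000.
async function fetchScrape(url, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    // Surface HTTP errors instead of returning an error page as if it were content.
    throw new Error(`scrape request failed with status ${res.status}`);
  }
  return res.text(); // raw HTML as a string
}

// Usage (requires `npm start` to be running):
// fetchScrape("http://localhost:3000/scrape?to_url=google.com&https=true")
//   .then((html) => console.log(html.slice(0, 200)));
```

Throwing on a non-2xx status keeps an error page from being mistaken for scraped content; whether the server actually reports failures with non-2xx status codes is an assumption.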

API Parameters

Each GET request to /scrape accepts the following query parameters, so requests can be configured dynamically.

FORMAT: http://localhost:<port>/scrape?to_url=<url>&https=<true/false>&iso_body=<true/false>&remove_origin=<true/false>&prep_html=<true/false>
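Because to_url must be passed without a scheme, it can help to build request URLs programmatically. A minimal sketch, assuming the server runs at http://localhost:3000 as in Quick Start; buildScrapeUrl and its option names are hypothetical and simply mirror the query parameters. It moves any http:// or https:// prefix into the https flag and only sends the optional flags that are explicitly set.

```javascript
// Hypothetical helper that assembles a /scrape request URL.
// Assumes the server is at http://localhost:3000 (the Quick Start default).
function buildScrapeUrl(target, options = {}, base = "http://localhost:3000") {
  const match = /^(https?):\/\//i.exec(target);
  // to_url must not carry a scheme; move it into the `https` flag instead.
  const toUrl = match ? target.slice(match[0].length) : target;
  // An explicit options.https wins; otherwise infer it from the stripped scheme.
  const useHttps = options.https ?? (match ? match[1].toLowerCase() === "https" : false);

  const url = new URL("/scrape", base);
  url.searchParams.set("to_url", toUrl);
  url.searchParams.set("https", String(useHttps));
  // Optional flags are only sent when explicitly provided.
  for (const flag of ["iso_body", "remove_origin", "prep_html"]) {
    if (options[flag] !== undefined) {
      url.searchParams.set(flag, String(Boolean(options[flag])));
    }
  }
  return url.toString();
}
```

For example, buildScrapeUrl("https://google.com") produces the Quick Start link, http://localhost:3000/scrape?to_url=google.com&https=true. URLSearchParams also percent-encodes any path or query string inside to_url, so it is not confused with the API's own parameters.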

  1. to_url (required): the address of the page to scrape, without the 'http://' or 'https://' prefix (use the https parameter to choose the protocol instead)
  2. https (optional): set to true if the site requires HTTPS; omit it or set it to false to use HTTP
  3. iso_body (optional): set to true to return only the contents of the HTML body
  4. remove_origin (optional): set to true to remove references to third-party resources (scripts, styles, iframes, etc.); use this if you are injecting the HTML into another page
  5. prep_html (optional): set to true to change HTML container tags (such as <html>) into injectable tags (<html_scraper>, etc.); use this if you are injecting the HTML into another page
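The three post-processing flags can be illustrated with a short sketch. This is not the project's actual implementation, which is not shown here; it is a hypothetical, regex-based approximation that assumes the flags arrive as the strings "true"/"false", as query-string values do. Regular expressions are used only to keep the example self-contained; they miss many real-world HTML edge cases, so an HTML parser would be the more robust choice. Which tags prep_html renames beyond <html> is also an assumption.

```javascript
// Hypothetical sketch of the iso_body, remove_origin and prep_html flags.
// Regex-based for brevity; not the project's actual implementation.

// iso_body: keep only what is inside <body>...</body> (input unchanged if no body tag).
function isolateBody(html) {
  const match = /<body\b[^>]*>([\s\S]*)<\/body>/i.exec(html);
  return match ? match[1] : html;
}

// remove_origin: drop elements that load resources from the scraped site
// (scripts, iframes, and <link> tags such as external stylesheets).
function removeOrigin(html) {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, "")
    .replace(/<iframe\b[\s\S]*?<\/iframe>/gi, "")
    .replace(/<link\b[^>]*>/gi, "");
}

// prep_html: rename container tags so they can be injected into another page,
// e.g. <html> -> <html_scraper>. Renaming <head> and <body> too is an assumption.
function prepHtml(html) {
  return html.replace(
    /<(\/?)(html|head|body)\b/gi,
    (_, slash, tag) => `<${slash}${tag.toLowerCase()}_scraper`
  );
}

// Query-string flags arrive as strings, so only "true" (case-insensitive) enables a step.
function postProcess(html, query = {}) {
  const on = (value) => String(value).toLowerCase() === "true";
  let out = html;
  if (on(query.iso_body)) out = isolateBody(out);
  if (on(query.remove_origin)) out = removeOrigin(out);
  if (on(query.prep_html)) out = prepHtml(out);
  return out;
}
```

Applying iso_body first means remove_origin and prep_html only see the body content; the order used by the real server is not documented.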

About

Node.js / Express server for downloading raw HTML from third-party websites (scraping).
