GitHub - kbrown333/scraper_api: Node JS / Express server for downloading raw html from 3rd party websites (scraping)

Synopsis

This application exposes a scraper API that other applications can call to request the raw HTML content of third-party sites.

To start the server, open a command line interface, navigate to the project folder, and run 'npm start'. To test the server, open the following link in a browser such as Chrome: http://localhost:3000/scrape?to_url=google.com&https=true
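The same test request can also be made from code. The sketch below is a minimal client, not part of this project: the name `fetchRawHtml` is hypothetical, it assumes the server is listening on port 3000 as in the test link above, that the response body is the scraped HTML as plain text, and that a global `fetch` is available (Node 18 or later).

```javascript
// Hypothetical client helper (not part of this repository).
// Assumes the server started with 'npm start' listens on localhost:3000.
async function fetchRawHtml(toUrl, { https = false } = {}, fetchImpl = globalThis.fetch) {
  // to_url is passed without a scheme; the 'https' flag selects http or https.
  const endpoint =
    `http://localhost:3000/scrape?to_url=${encodeURIComponent(toUrl)}&https=${https}`;

  const response = await fetchImpl(endpoint);
  if (!response.ok) {
    throw new Error(`Scrape request failed with status ${response.status}`);
  }
  // Assumption: the body is the raw HTML of the scraped page.
  return response.text();
}
```

With the server running, `fetchRawHtml('google.com', { https: true }).then(console.log)` should print whatever the endpoint returns. The `fetchImpl` argument exists only so the helper can be exercised with a stub instead of a live server.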

The API accepts the query parameters below on GET requests, so each request can be configured dynamically.

FORMAT: http://localhost:<port>/scrape?to_url=<url>&https=<true/false>&iso_body=<true/false>&remove_origin=<true/false>&prep_html=<true/false>

to_url (required): the link you want to scrape (omit 'http://' or 'https://' and use the 'https' parameter instead)
https (optional): set to true if the site requires https; leave blank or set to false if http is allowed
iso_body (optional): returns only the HTML body
remove_origin (optional): removes references to third-party links (scripts, styles, iframes, etc.); use this if you are injecting the HTML into another page.
prep_html (optional): changes HTML container tags (<html>, <head>, <body>) into injectable tags (<html_scraper>, etc.); use this if you are injecting the HTML into another page.
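To illustrate how these parameters combine into a request, here is a small sketch that assembles the query string. The helper name `buildScrapeUrl` and the default base address `http://localhost:3000` are assumptions made for this example; only the parameter names come from the list above.

```javascript
// Hypothetical helper (not part of this repository): builds a /scrape URL
// from the documented parameters.
function buildScrapeUrl(toUrl, options = {}, base = 'http://localhost:3000') {
  // The docs ask for to_url without a scheme, so strip one if present
  // and use it as the default for the 'https' flag.
  const match = /^(https?):\/\//i.exec(toUrl);
  const target = match ? toUrl.slice(match[0].length) : toUrl;
  const https = options.https ?? (match ? match[1].toLowerCase() === 'https' : false);

  const url = new URL('/scrape', base);
  url.searchParams.set('to_url', target);
  if (https) url.searchParams.set('https', 'true');

  // Optional flags are only sent when enabled; omitting them leaves them blank.
  for (const flag of ['iso_body', 'remove_origin', 'prep_html']) {
    if (options[flag]) url.searchParams.set(flag, 'true');
  }
  return url.toString();
}

console.log(buildScrapeUrl('https://google.com', { iso_body: true, prep_html: true }));
// prints "http://localhost:3000/scrape?to_url=google.com&https=true&iso_body=true&prep_html=true"
```

Using `URL` and `URLSearchParams` keeps the target address correctly percent-encoded when it contains a path or query string of its own.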
