This application exposes a scraper API that lets other applications request raw HTML content from third-party sites.
To start the server, open a command line interface, navigate to the project folder, then type 'npm start'. To test the server, open the following link in Chrome: http://localhost:3000/scrape?to_url=google.com&https=true
The API accepts the following query parameters on GET requests, so each request can be configured individually.
FORMAT: http://localhost:<port>/scrape?to_url=<url>&https=<true/false>&iso_body=<true/false>&remove_origin=<true/false>&prep_html=<true/false>
- to_url (required): the link you want to scrape (omit 'http://' or 'https://' and use the 'https' parameter instead)
- https (optional): set to true if the site requires https; leave blank or set to false if http is allowed
- iso_body (optional): returns only the HTML body
- remove_origin (optional): removes references to third-party links (scripts, styles, iframes, etc.); use this if you are injecting the HTML into another page.
- prep_html (optional): changes HTML container tags into injectable tags (<html_scraper>, etc.); use this if you are injecting the HTML into another page.
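
The parameters above can be assembled into a request URL programmatically. The following is a minimal client-side sketch, not part of this project: the names buildScrapeUrl and scrape are hypothetical, the base address assumes the server is running on localhost:3000 as in the test link, and it assumes that leaving an optional flag out is equivalent to setting it to false. The scrape helper also assumes Node 18 or later, where fetch is available globally.

```javascript
// Hypothetical client helper; not part of the scraper project itself.
// Assumes the server listens on localhost:3000, as in the test link.
const DEFAULT_BASE = "http://localhost:3000";

function buildScrapeUrl(target, options = {}, base = DEFAULT_BASE) {
  // Split off the scheme so to_url carries only the host and path.
  const match = /^(https?):\/\/(.*)$/i.exec(target);
  const scheme = match ? match[1].toLowerCase() : null;
  const bare = match ? match[2] : target;

  const params = new URLSearchParams();
  params.set("to_url", bare);

  // An explicit https option wins; otherwise infer it from the scheme, if any.
  const https = options.https ?? (scheme === "https");
  if (https) params.set("https", "true");

  // Assumption: optional flags are only sent when enabled.
  for (const flag of ["iso_body", "remove_origin", "prep_html"]) {
    if (options[flag]) params.set(flag, "true");
  }
  return `${base}/scrape?${params.toString()}`;
}

// Fetch the scraped HTML as text (requires a running server and global fetch).
async function scrape(target, options = {}) {
  const response = await fetch(buildScrapeUrl(target, options));
  if (!response.ok) {
    throw new Error(`Scrape request failed with status ${response.status}`);
  }
  return response.text();
}
```

For example, buildScrapeUrl("https://google.com") produces the same link as the test URL above, and scrape("google.com", { https: true, iso_body: true }) would request only the body of the page. Note that URLSearchParams percent-encodes characters such as '/' in to_url, so this sketch relies on the server decoding query values in the usual way.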