Harvest: Web Scraper

A simple web scraper that takes a snapshot of a target website. The key word is "simple": the scraper can capture and store as much data as possible, navigate the site, and save the result in multiple formats, but it never performs data extraction or processing. That step happens further down the line, in a different project. This protects the scraper from site restructuring that would otherwise break data extraction.
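To make the snapshot-only idea concrete, here is a minimal sketch of what a stored snapshot could look like. The `Snapshot` type, the `buildSnapshot` and `serializeSnapshot` helpers, and the two format names are all hypothetical and are not taken from this project's code; they only illustrate saving the raw page in more than one format without parsing it.

```typescript
// Hypothetical snapshot record; the field names are assumptions for illustration.
interface Snapshot {
  url: string;
  capturedAt: string; // ISO 8601 timestamp
  html: string; // raw page markup, stored untouched
}

// Hypothetical output formats: the raw markup alone, or markup plus metadata as JSON.
type SnapshotFormat = "html" | "json";

// Wraps the raw markup with capture metadata. No parsing or extraction happens here.
function buildSnapshot(url: string, html: string, now: Date = new Date()): Snapshot {
  return { url, capturedAt: now.toISOString(), html };
}

// Serializes the same snapshot into one of the supported formats.
function serializeSnapshot(snapshot: Snapshot, format: SnapshotFormat): string {
  return format === "html" ? snapshot.html : JSON.stringify(snapshot);
}
```

Because the record holds only the raw markup and capture metadata, a change in the target site's layout changes the stored content but not the code that stores it.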

Features

Developer "Quality-of-Life" Features

Developer Notes

Install the GCloud/Firebase CLI and set up an account
Initial setup

npm install -g firebase-tools
npm install --prefix ./functions

TypeScript

sudo npm install -g typescript

Unit Test

npm test --prefix ./functions

Deploy

firebase deploy --token $FIREBASE_TOKEN --project $FIREBASE_PROJECT --only functions

ERROR: Failed to launch chrome!

When running Node on Ubuntu, install the libraries Chrome depends on:

sudo apt-get install \
gconf-service \
libasound2 \
libatk1.0-0 \
libatk-bridge2.0-0 \
libc6 \
libcairo2 \
libcups2 \
libdbus-1-3 \
libexpat1 \
libfontconfig1 \
libgcc1 \
libgconf-2-4 \
libgdk-pixbuf2.0-0 \
libglib2.0-0 \
libgtk-3-0 \
libnspr4 \
libpango-1.0-0 \
libpangocairo-1.0-0 \
libstdc++6 \
libx11-6 \
libx11-xcb1 \
libxcb1 \
libxcomposite1 \
libxcursor1 \
libxdamage1 \
libxext6 \
libxfixes3 \
libxi6 \
libxrandr2 \
libxrender1 \
libxss1 \
libxtst6 \
ca-certificates \
fonts-liberation \
libappindicator1 \
libnss3 \
lsb-release \
xdg-utils \
wget
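Before installing the whole list, it can help to see which shared libraries are actually missing. Running `ldd` on the Chrome binary prints a `=> not found` line for each unresolved library. The sketch below parses that output; the `missingLibraries` helper is hypothetical and not part of this project, and the path to the Chrome binary depends on your installation.

```typescript
// Returns the names of shared libraries that `ldd` reported as "not found".
// Input is the text printed by `ldd <path-to-chrome>`.
function missingLibraries(lddOutput: string): string[] {
  return lddOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.endsWith("=> not found"))
    .map((line) => line.split("=>")[0].trim());
}
```

Each returned name corresponds to a library that one of the packages above is expected to provide.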

Notes for IntelliJ Users

Please use the Windows Subsystem for Linux and install Node.js there, then set "Settings > Languages and Frameworks > Node.js and NPM > Node Interpreter" to Ubuntu.
Also check "Settings > Languages and Frameworks > JavaScript > JavaScript Language Version".
