A simple web scraper that takes a snapshot of a target website. The keyword is "simple": the scraper takes in and stores as much data as it can, performs navigation, and stores the result in multiple formats, but it never performs data extraction or processing. That step happens further down the line, in a different project. This protects the scraper from site restructurings that would otherwise break data extraction.
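A minimal sketch of the snapshot-only idea, assuming a hypothetical `Order` shape and hypothetical helper names (`parseOrder`, `buildSnapshot`); the project's real types may differ. It validates an incoming order and wraps the raw capture without parsing it:

```typescript
// All names below are hypothetical illustrations, not the project's actual API.
type OutputTarget = "http" | "pubsub" | "storage";

interface Order {
  url: string;              // page to snapshot
  outputs: OutputTarget[];  // where the result should be written
}

interface Snapshot {
  url: string;
  takenAt: string;          // ISO 8601 timestamp
  html: string;             // raw HTML, stored untouched
  screenshot: Uint8Array;   // raw image bytes
}

const OUTPUT_TARGETS: OutputTarget[] = ["http", "pubsub", "storage"];

// Validate an order, whether it arrived over HTTP or PubSub.
function parseOrder(input: unknown): Order {
  if (typeof input !== "object" || input === null) {
    throw new Error("order must be an object");
  }
  const { url, outputs } = input as { url?: unknown; outputs?: unknown };
  if (typeof url !== "string" || !/^https?:\/\//i.test(url)) {
    throw new Error("order.url must be an http(s) URL");
  }
  if (!Array.isArray(outputs) || outputs.length === 0) {
    throw new Error("order.outputs must be a non-empty array");
  }
  for (const target of outputs) {
    if (OUTPUT_TARGETS.indexOf(target) === -1) {
      throw new Error("unknown output target: " + String(target));
    }
  }
  return { url, outputs: outputs as OutputTarget[] };
}

// Wrap the raw capture as-is; no extraction or processing happens here.
function buildSnapshot(
  order: Order,
  html: string,
  screenshot: Uint8Array,
  now: Date
): Snapshot {
  return { url: order.url, takenAt: now.toISOString(), html, screenshot };
}
```

Because the snapshot keeps the HTML untouched, a change in the target site's markup only affects the downstream extraction project, not this one.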
- wait for "orders" from HTTP
- wait for "orders" from PubSub
- navigate websites
- take a screenshot
- store the html contents
- write results to HTTP response
- write results to PubSub
- write results to Cloud Storage
- perform other commands aside from basic navigation
- security
- DoS mitigation
- continuous integration
- TypeScript
- unit tests
- unit test mocks
- integration tests running on local emulator
- environment variables
- install the GCloud/Firebase CLI and set up an account
- initial setup
npm install -g firebase-tools
npm install --prefix ./functions
sudo npm install -g typescript
npm test --prefix ./functions
firebase deploy --token $FIREBASE_TOKEN --project $FIREBASE_PROJECT --only functions
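The deploy command above relies on `FIREBASE_TOKEN` and `FIREBASE_PROJECT` being set. As a hedged sketch, a CI script could check both before invoking the CLI; `buildDeployArgs` is a hypothetical helper, and the environment is passed in as a plain object so it stays easy to unit test:

```typescript
// Hypothetical helper: builds the argument list for `firebase deploy`
// from the two environment variables used in the command above.
type Env = Record<string, string | undefined>;

function buildDeployArgs(env: Env): string[] {
  const token = env["FIREBASE_TOKEN"];
  const project = env["FIREBASE_PROJECT"];
  if (!token || !project) {
    // Report every missing variable at once rather than failing one by one.
    const missing: string[] = [];
    if (!token) missing.push("FIREBASE_TOKEN");
    if (!project) missing.push("FIREBASE_PROJECT");
    throw new Error("Missing environment variables: " + missing.join(", "));
  }
  return ["deploy", "--token", token, "--project", project, "--only", "functions"];
}
```

In a Node script this could be called as `buildDeployArgs(process.env)` and the result handed to whatever spawns the `firebase` binary.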
- I'm running Node.js on Ubuntu, where the following packages need to be installed:
sudo apt-get install \
gconf-service \
libasound2 \
libatk1.0-0 \
libatk-bridge2.0-0 \
libc6 \
libcairo2 \
libcups2 \
libdbus-1-3 \
libexpat1 \
libfontconfig1 \
libgcc1 \
libgconf-2-4 \
libgdk-pixbuf2.0-0 \
libglib2.0-0 \
libgtk-3-0 \
libnspr4 \
libpango-1.0-0 \
libpangocairo-1.0-0 \
libstdc++6 \
libx11-6 \
libx11-xcb1 \
libxcb1 \
libxcomposite1 \
libxcursor1 \
libxdamage1 \
libxext6 \
libxfixes3 \
libxi6 \
libxrandr2 \
libxrender1 \
libxss1 \
libxtst6 \
ca-certificates \
fonts-liberation \
libappindicator1 \
libnss3 \
lsb-release \
xdg-utils \
wget
- Please use the Windows Subsystem for Linux (WSL), install Node.js there, and select it under "Settings > Languages and Frameworks > Node.js and NPM > Node Interpreter: Ubuntu"
- Settings > Languages and Frameworks > JavaScript > JavaScript Language Version