hellomouse/apps-site-service

Hellomouse Apps Site Queue

A fun microservice for scraping content from websites for Hellomouse Apps.

Features:

  • Download webpages as HTML (with assets such as CSS, videos and images embedded as base64), PDF, or WEBP (screenshot)
  • Special handling for certain websites, currently we have:
    • Twitter / X: Tweets are downloaded as HTML + attached media (images, videos)
    • Reddit: Posts and comments are downloaded with any attached assets
    • SoundCloud: Songs are downloaded with metadata (HTML + audio)
    • Newgrounds: Songs are downloaded with metadata (HTML + audio)
    • Imgur: Albums and galleries are downloaded with all images and metadata (HTML + images / videos)
    • YouTube: Videos are downloaded
    • Pixiv: Albums are downloaded
    • Bilibili: Videos are downloaded

Built With

  • Node.js
  • PostgreSQL
  • Puppeteer

Setup

Install dependencies

npm install

Set up the config. You will need a running PostgreSQL database as well as the hellomouse-apps-api server (run the server first to generate the required tables).

There is an example config in the root directory. Copy it and rename it to config.js. Here are the properties:

export const dbUser = 'hellomouse_board';  // PostgreSQL user
export const dbIp = '127.0.0.1';           // PostgreSQL server address
export const dbPort = 5433;                // PostgreSQL server port
export const dbPassword = 'my password';   // PostgreSQL user password
export const dbName = 'hellomouse_board';  // PostgreSQL database name

export const fileDir = './saves';          // Directory for all stored files; web files generally go under <fileDir>/site_downloads/file.ext
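
As a rough illustration of how these values fit together, the sketch below builds a PostgreSQL connection URI and a download path from a config-shaped object. The helper names `buildConnectionString` and `downloadPath` are hypothetical and not taken from this repository; the service may well pass the fields to its database client individually. It uses `require` so it can be run on its own.

```javascript
const path = require('node:path');

// Values mirroring the example config above.
const config = {
  dbUser: 'hellomouse_board',
  dbIp: '127.0.0.1',
  dbPort: 5433,
  dbPassword: 'my password',
  dbName: 'hellomouse_board',
  fileDir: './saves',
};

// Hypothetical helper: build a PostgreSQL connection URI from the config.
// User and password are percent-encoded so spaces and symbols stay valid.
function buildConnectionString(c) {
  const user = encodeURIComponent(c.dbUser);
  const password = encodeURIComponent(c.dbPassword);
  return `postgresql://${user}:${password}@${c.dbIp}:${c.dbPort}/${c.dbName}`;
}

// Hypothetical helper: where a downloaded web file would land under fileDir,
// following the <fileDir>/site_downloads/file.ext layout described above.
function downloadPath(c, fileName) {
  return path.join(c.fileDir, 'site_downloads', fileName);
}

console.log(buildConnectionString(config));
console.log(downloadPath(config, 'file.ext'));
```

Percent-encoding matters here because the example password contains a space, which would otherwise make the URI invalid.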

To set up yt-dlp (optional), place your browser cookies in secret/yt-cookies.txt for downloading YouTube videos and in secret/bilibili-cookies.txt for downloading Bilibili videos.
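
A minimal sketch of how a cookie file might be handed to yt-dlp, assuming the service invokes the yt-dlp command line. The `--cookies FILE` option is a real yt-dlp flag (it expects a Netscape-format cookie file); the helper names `cookieFileFor` and `buildYtDlpArgs` and the hostname matching are assumptions made for illustration, not this repository's code.

```javascript
const fs = require('node:fs');

// Cookie file locations named in the setup notes above.
const COOKIE_FILES = {
  youtube: 'secret/yt-cookies.txt',
  bilibili: 'secret/bilibili-cookies.txt',
};

// True when host is the domain itself or one of its subdomains.
function isOnDomain(host, domain) {
  return host === domain || host.endsWith('.' + domain);
}

// Hypothetical helper: pick the cookie file for a URL, or null if none applies.
function cookieFileFor(url) {
  const host = new URL(url).hostname;
  if (isOnDomain(host, 'youtube.com') || host === 'youtu.be') return COOKIE_FILES.youtube;
  if (isOnDomain(host, 'bilibili.com')) return COOKIE_FILES.bilibili;
  return null;
}

// Build the yt-dlp argument list; --cookies is only added when the file exists,
// which keeps the cookie setup optional.
function buildYtDlpArgs(url, cookieFile = cookieFileFor(url)) {
  const args = [];
  if (cookieFile && fs.existsSync(cookieFile)) args.push('--cookies', cookieFile);
  args.push(url);
  return args;
}
```

The resulting array could then be passed to something like `child_process.spawn('yt-dlp', args)`.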

To set up Pixiv cookies (optional, for bypassing rate limiting and age restrictions), export your browser cookies as a JS array of objects like [{ name: ... }] and save the result in secret/pixiv-cookies.txt.
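
The sketch below shows one way such a cookie file could be read and sanity-checked. It assumes the exported array is valid JSON and that each cookie carries string `name` and `value` fields; the `value` field, the helper name `parsePixivCookies`, and the sample cookie are assumptions for illustration and may not match what the service actually expects.

```javascript
// Hypothetical helper: parse exported cookies (assumed to be valid JSON) and
// keep only entries that have string `name` and `value` fields.
function parsePixivCookies(text) {
  const parsed = JSON.parse(text);
  if (!Array.isArray(parsed)) {
    throw new TypeError('expected an array of cookie objects');
  }
  return parsed.filter(
    (c) => c !== null && typeof c === 'object' &&
      typeof c.name === 'string' && typeof c.value === 'string'
  );
}

// Made-up sample content; real exports will contain different cookies.
const sample = '[{"name": "example_cookie", "value": "abc123"}, {"name": 42}]';
console.log(parsePixivCookies(sample).length);
```

In practice the text would come from the file itself, for example via `fs.readFileSync('secret/pixiv-cookies.txt', 'utf8')`.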

Run the server:

node index.js
