Mao Yu edited this page Dec 17, 2023 · 7 revisions

Epublifier

This extension is an interactive web scraping tool targeting text- and image-heavy sites such as web novels, documentation, and blogs.

I wrote this tool specifically to create ePubs for offline reading. So far, I have gotten it to work on:

  1. Novel Updates
  2. Wuxiaworld
  3. Royal Road
  4. Python documentation

Installation

Example Usage

Extracting a list of pages


Traversing a webapp through the Next button


Extracting other documentation


Features

  • Supports novels with many chapters (tested with up to 300 chapters)
  • Downloads and embeds images
  • Selectively parses and compiles chapters chosen with check boxes
  • Automatically extracts the main content with Readability.js
  • Automatically parses the cover image, author, title, and description from some sites:
    • Novel Updates
    • Royal Road
  • Configurable parsers for lists of links or for webapps

How to Use

Novel Updates

  • 🛈 Click the ☰ menu button (Show all chapters) above the chapter list
  • Click Epublifier's icon on your browser's extension bar, which will open a popup. It will automatically try to load the series' metadata.
  • Select some chapters (or all of them)
    • 🛈 You may use Shift+Click to select a range of chapters to include or delete.
  • Click Parse. If all goes well, each entry in the Parsed column should change from a circle to a checkmark
  • Click Epub to generate the ePub as a download

Wuxiaworld

  • Navigate to the first chapter of a series
  • Click Epublifier's icon on your browser's extension bar, which will open a popup.
  • Open the Add Page Parser options tab and configure the time to wait (adjust it to your internet speed; decimals are supported) and the maximum number of chapters to parse per click.
  • Click the Add This Page button to parse the number of chapters set in max chaps.
  • Select chapters with the check boxes, or use Shift+Click to select a range
  • Click Epub to generate the ePub as a download
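The Add Page Parser steps above amount to a simple loop: extract the current page, follow its Next link, wait, and repeat up to max chaps times. The following is a minimal sketch under stated assumptions, not Epublifier's code. `addPages`, `waitSeconds`, `maxChaps`, and the `site` object with its `extractCurrent`, `hasNext`, and `clickNext` methods are hypothetical names. The `site` object is a fake that stands in for the real browser tab, so the example runs in Node.

```javascript
// Hypothetical sketch of an "Add This Page"-style traversal loop.
// `site` is a stand-in for the real tab/DOM; all names are illustrative only.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function addPages(site, { waitSeconds, maxChaps }) {
  const chapters = [];
  for (let i = 0; i < maxChaps; i++) {
    chapters.push(site.extractCurrent()); // content of the page shown now
    if (!site.hasNext()) break;           // no Next button: reached the end
    site.clickNext();                     // advance so the next batch starts on a new page
    await sleep(waitSeconds * 1000);      // fractional seconds work, e.g. 0.5
  }
  return chapters;
}

// Fake site with three chapters, used only to exercise the loop.
function makeFakeSite(titles) {
  let pos = 0;
  return {
    extractCurrent: () => ({ title: titles[pos] }),
    hasNext: () => pos < titles.length - 1,
    clickNext: () => { pos += 1; },
  };
}

const site = makeFakeSite(["Ch 1", "Ch 2", "Ch 3"]);
addPages(site, { waitSeconds: 0.01, maxChaps: 2 })
  .then((batch) => {
    console.log(batch.map((c) => c.title)); // → ["Ch 1", "Ch 2"]
    return addPages(site, { waitSeconds: 0.01, maxChaps: 2 });
  })
  .then((batch) => console.log(batch.map((c) => c.title))); // → ["Ch 3"]
```

Because the loop clicks Next after each extracted chapter, a second call continues from where the first stopped. This is one way to get the batch-by-batch behavior of clicking Add This Page repeatedly.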

Royal Road

  • Go to any series' table of contents page
  • Click Epublifier's icon on your browser's extension bar, which will open a popup. It will automatically try to load the series' metadata.
  • Select some chapters (or all of them)
    • 🛈 You may use Shift+Click to select a range of chapters to include or delete.
  • Click Parse. If all goes well, each entry in the Parsed column should change from a circle to a checkmark
  • Click Epub to generate the ePub as a download

Custom websites (Non-webapp)

If a website has a list of chapter links that can be matched with a user-defined query selector and a regex on the link text, you can try the Chapter Links parser.

  • Click Epublifier's icon on your browser's extension bar, which will open a popup. It will say "No parser available"
  • Go to the Links Parser tab, and select Chapter Links in the list box
  • Configure the regex and the query selector for the links
  • Click the (Re)Parse links button
  • Select some chapters (or all of them)
    • 🛈 You may use Shift+Click to select a range of chapters to include or delete.
  • Click Parse. If all goes well, each entry in the Parsed column should change from a circle to a checkmark
  • Click Epub to generate the ePub as a download
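The Chapter Links parser pairs a query selector, which picks the candidate links, with a regex on the link text, which decides which candidates are chapters. The following is a minimal sketch of the regex half under assumptions. `filterChapterLinks` is a hypothetical name, and the candidates are plain `{ text, href }` objects standing in for what `document.querySelectorAll(selector)` would return in the page, so the example runs outside a browser. Dropping duplicate URLs is an extra step added for illustration.

```javascript
// Hypothetical sketch: keep only links whose trimmed text matches `pattern`.
// In a page, candidates could be built with:
//   Array.from(document.querySelectorAll(selector), (a) => ({ text: a.textContent, href: a.href }))
function filterChapterLinks(candidates, pattern) {
  const re = new RegExp(pattern); // no "g" flag, so test() keeps no state between calls
  const seen = new Set();
  const chapters = [];
  for (const { text, href } of candidates) {
    const title = text.trim();
    if (!re.test(title) || seen.has(href)) continue; // skip non-chapters and repeated URLs
    seen.add(href);
    chapters.push({ title, url: href });
  }
  return chapters;
}

const candidates = [
  { text: " Chapter 1 ", href: "https://example.com/c1" },
  { text: "About", href: "https://example.com/about" },
  { text: "Chapter 2", href: "https://example.com/c2" },
  { text: "Chapter 2", href: "https://example.com/c2" },
];
console.log(filterChapterLinks(candidates, "^Chapter \\d+"));
// → two entries: "Chapter 1" (…/c1) and "Chapter 2" (…/c2)
```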

Custom webapps

If a webapp has a Next button, you can try the Add Page parser.

  • Click Epublifier's icon on your browser's extension bar, which will open a popup. It will say "No parser available"
  • Select Parse as app
  • Open the Add Page Parser tab
  • Click the search button next to the Next Element textbox and select the page's Next button; it should be highlighted in red
  • Click the search button next to the Title Element textbox and select the title text, or leave it blank to let the parser try to auto-detect the title
  • Click the Add This Page button repeatedly until you reach the end
  • Select chapters with the check boxes, or use Shift+Click to select a range
  • Click Epub to generate the ePub as a download
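When the Title Element is left blank, the parser has to guess a title. The following sketch shows one possible fallback order and should not be read as Epublifier's actual heuristic. It uses the user-selected text if there is any, then the first non-empty `<h1>`, then the first non-empty `<h2>`, and finally the document title. `pickTitle` and the plain-object `page` (with `headings` and `docTitle`) are hypothetical stand-ins for DOM lookups such as `document.querySelector("h1")`, so the example runs in Node.

```javascript
// Hypothetical title fallback; the heading order is an assumed heuristic.
// `page` is a plain-object stand-in for the DOM of the current chapter.
function pickTitle(page, selectedTitleText) {
  // 1. Prefer whatever the user picked with the Title Element selector.
  if (selectedTitleText && selectedTitleText.trim()) return selectedTitleText.trim();
  // 2. Otherwise try the first non-empty <h1>, then <h2>.
  for (const tag of ["h1", "h2"]) {
    const heading = page.headings.find((h) => h.tag === tag && h.text.trim());
    if (heading) return heading.text.trim();
  }
  // 3. Last resort: the document's <title>.
  return page.docTitle.trim();
}

const page = {
  headings: [{ tag: "h2", text: "Chapter 12: The Gate" }, { tag: "h1", text: "  " }],
  docTitle: "My Web Novel - Chapter 12",
};
console.log(pickTitle(page, null));           // → "Chapter 12: The Gate"
console.log(pickTitle(page, " Chapter 12 ")); // → "Chapter 12"
```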

Advanced User Configurations

Warning: Advanced configuration requires JavaScript knowledge.

Currently, this extension does not save any modifications to the parser definitions, so keep a local copy of your changes.

Overview

main_def - This object defines all the parsers in the current file. Any new parser you add must be registered here.

Detector - The detector function tries to determine which parser to use for the current page.

There are two types of parsers:

  1. Links parser - This type of parser extracts a list of chapter links from a page's URL and DOM
  2. Text parser - This type of parser extracts text content from a page's URL and DOM
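To make the overview concrete, here is a hypothetical registry in the spirit of `main_def`, with one parser of each type and a detector that picks the first parser whose URL test passes. The field names (`type`, `matches`, `parse`), the parser names, and the `novels.example.com` URLs are assumptions for illustration, not Epublifier's actual definitions. The real parsers receive a DOM; here they take plain objects so the sketch runs in Node.

```javascript
// Hypothetical parser registry; names and fields are illustrative assumptions.
const main_def = {
  example_toc: {
    type: "links", // returns a list of chapters
    matches: (url) => url.hostname === "novels.example.com" && url.pathname.endsWith("/toc"),
    parse: (url, links) => links.map(({ text, href }) => ({ title: text.trim(), url: href })),
  },
  example_chapter: {
    type: "text", // returns the text of a single chapter
    matches: (url) => url.hostname === "novels.example.com" && url.pathname.includes("/chapter/"),
    parse: (url, page) => page.body,
  },
};

// Detector: the name of the first parser (in definition order) that accepts
// the URL, or null, which corresponds to "No parser available".
function detect(defs, pageUrl) {
  const url = new URL(pageUrl);
  for (const [name, def] of Object.entries(defs)) {
    if (def.matches(url)) return name;
  }
  return null;
}

console.log(detect(main_def, "https://novels.example.com/my-novel/toc"));       // → "example_toc"
console.log(detect(main_def, "https://novels.example.com/my-novel/chapter/3")); // → "example_chapter"
console.log(detect(main_def, "https://elsewhere.example.org/"));                // → null
```

Keeping detection as a per-parser URL test means adding a parser only requires a new entry in the registry, which fits the rule that every new parser goes in `main_def`.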

Links Parser

A links parser returns a list of chapters; see either of the two examples.
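Since a links parser works from the page URL, one detail worth handling is that chapter links are often relative. The following sketch turns raw anchors into an ordered chapter list and resolves each `href` against the page URL with the standard `URL` constructor. `toChapterList`, the `{ index, title, url }` record shape, and the "Chapter N" fallback for empty link text are hypothetical choices made for illustration.

```javascript
// Hypothetical links-parser helper: ordered chapter records with absolute URLs.
// `anchors` are plain { text, href } objects standing in for <a> elements.
function toChapterList(pageUrl, anchors) {
  return anchors.map((a, index) => ({
    index,                                           // keeps table-of-contents order
    title: a.text.trim() || `Chapter ${index + 1}`,  // fallback for empty link text
    url: new URL(a.href, pageUrl).href,              // resolve relative hrefs
  }));
}

const chapters = toChapterList("https://site.example/book/toc", [
  { text: "Prologue", href: "/book/c/1" },
  { text: "", href: "c/2" },
]);
console.log(chapters);
// → urls "https://site.example/book/c/1" and "https://site.example/book/c/2";
//   the second title falls back to "Chapter 2"
```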

Text Parser

A text parser mostly just extracts the text content of a page.