ReadabilityJS for Ruby

Clean up web pages and extract the main content, powered by Mozilla Readability.

This is a Ruby wrapper gem for Readability. It runs Readability in a Node.js process via nodo.

    require 'readability_js'
    html = File.read("my_article.html")
    # parse_extended applies a DEFAULT_SELECTOR_BLACKLIST, and you can add your own selectors to it.
    # Matching elements are removed from the content before it is parsed.
    result = ReadabilityJs.parse_extended(html, blacklist_selectors: [".advertisement", "#sponsored"])
    p result

Query parameters

You can pass all parameters supported by Readability; check out the RubyDoc for more details.

Here is an example with all parameters. The camelCase parameter names are converted to snake_case in Ruby:

    require 'readability_js'
    data = ReadabilityJs.parse(
      # TODO: add parameters here
    )
    # => Hash
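The naming conversion mentioned above can be sketched as follows. This is only an illustration of the camelCase to snake_case convention: `snake_case` is a hypothetical helper, not part of readability_js, and the option names are taken from the Mozilla Readability documentation. Whether the gem accepts each of them under exactly these names is an assumption, so check the RubyDoc before relying on it.

```ruby
# Hypothetical helper (not part of readability_js): converts a camelCase
# option name into the snake_case form a Ruby caller would expect to use.
def snake_case(name)
  name.to_s.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# A few option names documented by Mozilla Readability.
READABILITY_OPTIONS = %w[maxElemsToParse nbTopCandidates charThreshold keepClasses].freeze

READABILITY_OPTIONS.each do |js_name|
  puts "#{js_name} -> #{snake_case(js_name)}"
end
```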

Parse response

The response object is a Hash. It contains the data returned by Readability, with the hash keys converted to snake_case.

    {
      "title" => "Article Title",
      "content" => "<div>...</div>",
      "text_content" => "Plain text content",
      "markdown_content" => "## Markdown content", # only for extended parse
      "length" => 1234,
      "excerpt" => "This is an excerpt of the article...",
      "byline" => "Author Name",
      "dir" => "ltr",
      "site_name" => "example.com",
      "lang" => "en",
      "published_time" => "2024-01-01T12:00:00Z",
      "image_url" => "https://example.com/image.jpg" # only for extended parse
    }
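As a rough sketch of how such a Hash might be consumed, the snippet below builds a one-line summary from a result. The sample hash reuses placeholder values from the response shown above instead of calling the gem, and `summarize` is a hypothetical helper, not part of readability_js. It treats the extended-only key `image_url` as optional, since a plain parse does not return it.

```ruby
# Sample hash shaped like the response above; the values are placeholders.
result = {
  "title" => "Article Title",
  "text_content" => "Plain text content",
  "length" => 1234,
  "byline" => "Author Name",
  "published_time" => "2024-01-01T12:00:00Z"
}

# Hypothetical helper: builds a short summary line from a parse result.
def summarize(result)
  title  = result.fetch("title", "(untitled)")
  byline = result["byline"] || "unknown author"
  # "image_url" is only present for the extended parse, so it may be missing.
  image  = result["image_url"] ? "with image" : "no image"
  "#{title} by #{byline} (length #{result['length']}, #{image})"
end

puts summarize(result)
```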

Documentation

Check out the documentation at RubyDoc:
https://rubydoc.info/github/magynhard/ruby-readability_js

As this library is only a wrapper, check out the original Readability documentation:
https://github.com/mozilla/readability

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/magynhard/ruby-readability_js.

This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
