CL-BOILERPIPE is a Common Lisp library for extracting the main content from web pages like newspaper articles and blog posts. It was designed for expanding truncated articles in feeds.

CL-BOILERPIPE is based on the Java Boilerpipe library, which is in turn based on Kohlschütter et al., “Boilerplate Detection using Shallow Text Features”.

Only the simplest version of the Boilerpipe algorithm is implemented here; I find that it works well enough.

Usage

Given an HTML string, call:

    (cl-boilerpipe:strip-boilerpipe html)

This returns the main content as another HTML string.
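
As a slightly fuller illustration, the call above might be used from the REPL as sketched below. This is a minimal sketch, not a definitive recipe: it assumes the `cl-boilerpipe` system is installed somewhere ASDF can find it, and the sample page and the variable names `*page*` and `*content*` are made up for the example. Only `strip-boilerpipe` comes from the library.

    ;; Assumption: the cl-boilerpipe system is visible to ASDF.
    ;; Evaluate these forms one at a time at the REPL (or LOAD the file),
    ;; so the package exists before the last forms are read.
    (asdf:load-system "cl-boilerpipe")

    ;; Hypothetical input: a page with a navigation bar and one article paragraph.
    (defparameter *page*
      "<html><body>
         <div><a href=\"/\">Home</a> | <a href=\"/about\">About</a></div>
         <p>This is the body of the article. It runs to several sentences,
            which is the kind of block the library is meant to keep.</p>
       </body></html>")

    ;; STRIP-BOILERPIPE takes the HTML as a string and returns the
    ;; main content as another HTML string.
    (defparameter *content* (cl-boilerpipe:strip-boilerpipe *page*))

    ;; Print whatever was extracted.
    (format t "~a~%" *content*)

What survives extraction depends on the page, so it is worth trying the function on a few real articles from the feeds you care about.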

ruricolist/cl-boilerpipe
