A dataset of popular pages (taken from <dir.yahoo.com>) with manually marked-up semantic blocks.
A dataset of random pages with manually marked-up semantic blocks.
Forked from misja/python-boilerpipe
Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages
Generate an SVG Scrabble board from any text document.
Forked from bcoe/sandcastle
Forked from tpopela/vips_java
Implementation of the Vision-Based Page Segmentation (VIPS) algorithm in Java