HtmlParser

HtmlParser is a Java library that parses HTML as input and returns clean, easy-to-read text.

EXAMPLES

Instantiate the JHtmlParser class via any one of the provided constructors, depending on where the interested HTML page is from:

JHtmlParser jHtmlParser = new JHtmlParser(uri, userAgent); // URI
JHtmlParser jHtmlParser = new JHtmlParser(url, userAgent);  // URL

Start content extraction by running:

jHtmlParser.init();

The output is clean, readable content in HTML format. You can obtain the output with:

jHtmlParser.outerHtml();

Also you can obtain the output object Page, which contain domain site, source document and output article:

jHtmlParser.getPage();

INCLUDING IN YOUR PROJECT

If you use Maven to build your project you can simply add a dependency to this library.

    <dependency>
        <groupId>com.github.falloutfire</groupId>
        <artifactId>htmlparser</artifactId>
        <version>0.2.2</version>
    </dependency>

If you use Gradle to build your project you can simply add a dependency to this library.

    implementation 'com.github.falloutfire:htmlparser:0.2.2'

DEPENDENCY

HtmlParser depends on the jsoup library.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
gradle/wrapper		gradle/wrapper
src/main/java/com/github/falloutfire/htmlparser		src/main/java/com/github/falloutfire/htmlparser
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.gradle		build.gradle
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HtmlParser

EXAMPLES

INCLUDING IN YOUR PROJECT

DEPENDENCY

About

Releases

Packages

Languages

License

falloutfire/HtmlParser

Folders and files

Latest commit

History

Repository files navigation

HtmlParser

EXAMPLES

INCLUDING IN YOUR PROJECT

DEPENDENCY

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages