oiwn/dom-content-extraction

dom-content-extraction

A Rust implementation of the paper by Fei Sun, Dandan Song, and Lejian Liao:

Content Extraction via Text Density (CETD)

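use dom_content_extraction::{DensityTree, get_node_text};
use scraper::Html;

// Assumption: `html` is a &str holding the page source.
let document = Html::parse_document(html);

// Build the text-density tree from the parsed document (&scraper::Html).
let dtree = DensityTree::from_document(&document);
// The last entry of the sorted nodes is taken as the main-content candidate.
let sorted_nodes = dtree.sorted_nodes();
let node_id = sorted_nodes.last().unwrap().node_id;

println!("{}", get_node_text(node_id, &document));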
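
The snippet above picks a node by text density. As a rough illustration of that idea only, here is a minimal sketch that assumes the basic definition "characters in a subtree divided by the number of tags in that subtree, with a tag count of zero treated as one". The `Node` type and the `counts` and `text_density` functions are hypothetical and are not part of this crate's API; the crate's actual scoring may differ.

```rust
/// Hypothetical toy DOM node (not a type from `dom_content_extraction` or `scraper`).
struct Node {
    /// Text placed directly inside this element.
    text: &'static str,
    /// Child elements.
    children: Vec<Node>,
}

/// Returns (character count, descendant tag count) for the subtree rooted at `node`.
fn counts(node: &Node) -> (usize, usize) {
    let mut chars = node.text.chars().count();
    let mut tags = 0;
    for child in &node.children {
        let (c, t) = counts(child);
        chars += c;
        tags += t + 1; // the child itself plus its own descendants
    }
    (chars, tags)
}

/// Assumed definition: characters per tag, treating zero tags as one.
fn text_density(node: &Node) -> f64 {
    let (chars, tags) = counts(node);
    chars as f64 / tags.max(1) as f64
}

fn main() {
    // A navigation bar: many tags, little text.
    let nav = Node {
        text: "",
        children: vec![
            Node { text: "Home", children: vec![] },
            Node { text: "About", children: vec![] },
            Node { text: "Blog", children: vec![] },
        ],
    };
    // An article body: few tags, a lot of text.
    let article = Node {
        text: "",
        children: vec![Node {
            text: "A long paragraph of running text that forms the main content of the page.",
            children: vec![],
        }],
    };

    println!("nav density:     {:.2}", text_density(&nav));
    println!("article density: {:.2}", text_density(&article));
    assert!(text_density(&article) > text_density(&nav));
}
```

Under this assumed definition, link-heavy blocks such as menus score low and prose-heavy blocks score high, which is why the densest node is a reasonable candidate for the main content.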

Read the documentation on docs.rs.

About

DOM Based Content Extraction via Text Density
