0.5.0
Release Notes for 0.5.0 (2025-02-06)
New Features
- Introduced `Config::candidate_select_mode`, allowing you to choose between the Readability.js order and the crate's own implementation for adjusting the top candidate.
- Added `Config::text_mode` to control text formatting. The default is `TextMode::Raw`, ensuring full compatibility with previous versions.
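As a rough illustration of how the two new options might be combined, the sketch below uses local stand-in types rather than the crate's real definitions. Only the names `Config`, `candidate_select_mode`, `text_mode`, and `TextMode::Raw` come from these notes; the other variant names and the default for `candidate_select_mode` are assumptions made for the example.

```rust
// Stand-in types for illustration only; they are NOT the crate's definitions.

/// How the top candidate is adjusted. Variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum CandidateSelectMode {
    /// Follow the Readability.js order (assumed to be the default).
    #[default]
    Readability,
    /// Use the crate's own implementation (hypothetical variant name).
    CrateSpecific,
}

/// How extracted text is formatted. Only `Raw` is named in the notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum TextMode {
    /// Default mode, matching the behavior of previous versions.
    #[default]
    Raw,
    /// Hypothetical alternative mode.
    Formatted,
}

#[derive(Debug, Clone, Default)]
struct Config {
    candidate_select_mode: CandidateSelectMode,
    text_mode: TextMode,
}

/// Opts in to both new behaviors and leaves any other field at its default.
fn opt_in_config() -> Config {
    Config {
        candidate_select_mode: CandidateSelectMode::CrateSpecific,
        text_mode: TextMode::Formatted,
        ..Default::default()
    }
}
```

Code that builds its configuration from `Config::default()` would keep `TextMode::Raw` in this sketch, which is the compatibility guarantee described above.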
Changes
- Optimized `Readability::grab_article` to retain only the best attempt among failed ones, reducing unnecessary data retention.
- Improved internal code to reduce execution time.
- Normalized `Metadata.byline` when it is assigned during article extraction, removing new lines and trailing spaces.
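Two of the changes above can be sketched in isolation. The code below is not the crate's implementation: `Attempt`, `select_article`, and `normalize_byline` are hypothetical names, "best" is assumed to mean the longest extracted text, and line breaks are assumed to be replaced by a single space.

```rust
/// One extraction attempt; `text_len` is the length of its extracted text.
#[derive(Debug, Clone, PartialEq)]
struct Attempt {
    content: String,
    text_len: usize,
}

/// Returns the first attempt that reaches `min_len`. If every attempt fails,
/// returns the longest failed one. Only one failed attempt is held at a time.
fn select_article(
    attempts: impl IntoIterator<Item = Attempt>,
    min_len: usize,
) -> Option<Attempt> {
    let mut best_failed: Option<Attempt> = None;
    for attempt in attempts {
        if attempt.text_len >= min_len {
            return Some(attempt);
        }
        let is_better = best_failed
            .as_ref()
            .map_or(true, |best| attempt.text_len > best.text_len);
        if is_better {
            // The previously stored attempt is dropped here.
            best_failed = Some(attempt);
        }
    }
    best_failed
}

/// Joins the lines of a byline with single spaces and trims whitespace
/// around each line, which also removes trailing spaces.
fn normalize_byline(raw: &str) -> String {
    raw.lines()
        .map(str::trim)
        .filter(|part| !part.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}
```

Keeping a single `Option<Attempt>` instead of a list of all failed attempts is what bounds memory use in this sketch.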
Breaking Changes
- Revised document filtering logic: filtering content is now handled separately from the process of extracting elements for scoring. Previously, filtering was tightly coupled with scoring, which impacted the document structure and in some cases resulted in inconsistent content extraction. With the new approach:
  - The removal of duplicate `Metadata.title` elements is handled more accurately, reducing redundancy and improving document clarity.
  - Byline detection for `Metadata.byline` is less likely to mistake commentators or unrelated elements for the article's author, addressing edge cases observed in `mozilla/readability` test pages where byline extraction failed or was inaccurate.
  - Structural changes in the processed document may affect downstream processing, as the filtered content now more precisely reflects the intended scoring elements.
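The separation described above can be pictured as two independent passes. This is a simplified sketch, not the crate's code: the flat `Node` list, the class hints, and the set of scorable tags are all assumptions chosen for the example.

```rust
/// A flattened stand-in for a DOM element (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Node {
    tag: &'static str,
    class: &'static str,
}

/// Pass 1: remove nodes that look like non-article content.
/// This pass computes no scores.
fn filter_document(nodes: Vec<Node>) -> Vec<Node> {
    // Assumed hints; a real filter would use richer heuristics.
    const UNLIKELY: [&str; 3] = ["comment", "sidebar", "footer"];
    nodes
        .into_iter()
        .filter(|node| !UNLIKELY.iter().any(|hint| node.class.contains(*hint)))
        .collect()
}

/// Pass 2: select the elements to score from the already filtered document.
/// This pass removes nothing, so it cannot alter the document structure.
fn elements_to_score(nodes: &[Node]) -> Vec<&Node> {
    // Assumed subset of tags that are worth scoring.
    const SCORABLE: [&str; 3] = ["p", "pre", "td"];
    nodes
        .iter()
        .filter(|node| SCORABLE.contains(&node.tag))
        .collect()
}
```

Because the second pass only borrows the filtered nodes, any structural difference a downstream consumer sees comes from the filtering pass alone.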
Bug Fixes
- Fixed the omission of manually created `p` elements used for scoring.
- Corrected ancestor assignment beyond the `body` element to prevent runtime panics caused by incorrect root element assignments.
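The second fix concerns how far up the tree ancestors are collected. A minimal sketch of a bounded ancestor walk is shown below, assuming a hypothetical index-based tree in which each node stores its parent's index. It illustrates the idea of stopping at `body` and is not the crate's actual code.

```rust
/// A node in a hypothetical index-based tree.
struct Node {
    tag: &'static str,
    parent: Option<usize>,
}

/// Collects up to `max_depth` ancestors of `start`, nearest first,
/// and never climbs past the `body` element.
fn scoring_ancestors(nodes: &[Node], start: usize, max_depth: usize) -> Vec<usize> {
    let mut ancestors = Vec::new();
    let mut current = nodes[start].parent;
    while let Some(idx) = current {
        if ancestors.len() >= max_depth {
            break;
        }
        ancestors.push(idx);
        if nodes[idx].tag == "body" {
            // Stop here so `html` or the document root is never assigned.
            break;
        }
        current = nodes[idx].parent;
    }
    ancestors
}
```

For a chain `html > body > div > p`, the walk from `p` yields the `div` and `body` indices only.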
Full Changelog: 0.4.0...0.5.0