0.5.0

@niklak niklak released this 06 Feb 11:27
· 441 commits to main since this release

Release Notes for 0.5.0 (2025-02-06) 🚀

✨ New Features

  • Introduced Config::candidate_select_mode, allowing you to choose between the Readability.js order or the crate's exclusive implementation for adjusting the top candidate.

  • Added Config::text_mode to control text formatting. The default is TextMode::Raw, ensuring full compatibility with previous versions.

🔧 Changes

  • Optimized Readability::grab_article to retain only the best attempt among failed ones, reducing unnecessary data retention.

  • Improved internal code to reduce execution time.

  • Metadata.byline is now normalized when assigned during article extraction: new lines and trailing spaces are removed.
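
As a rough illustration of the byline normalization described above, here is a minimal sketch in plain Rust. The helper name `normalize_byline` is hypothetical and this is not the crate's actual code; it assumes that "removing new lines" means joining the non-empty lines with a single space after trimming each one.

```rust
/// Hypothetical helper (not part of the crate's API): joins the lines of a
/// raw byline into one line and drops surrounding whitespace.
/// Assumption: empty lines are discarded and the rest are joined by one space.
fn normalize_byline(raw: &str) -> String {
    raw.lines()
        .map(str::trim) // strip leading and trailing spaces on each line
        .filter(|line| !line.is_empty()) // skip blank lines left by markup
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under these assumptions, calling `normalize_byline("  By Jane Doe \n   Staff Writer  \n")` yields a single-line string with no leading or trailing whitespace.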

⚠️ Breaking Changes

  • Revised document filtering logic: filtering content is now handled separately from the process of extracting elements for scoring. Previously, filtering was tightly coupled with scoring, which impacted the document structure and in some cases resulted in inconsistent content extraction. With the new approach:
    • The removal of duplicate Metadata.title elements is handled more accurately, reducing redundancy and improving document clarity.
    • Metadata.byline is less likely to incorrectly identify commentators or unrelated elements as the article's author, addressing edge cases observed in mozilla/readability test pages where byline extraction failed or was inaccurate.
    • Structural changes in the processed document may affect downstream processing, as the filtered content now more precisely reflects the intended scoring elements.

πŸ› Bug Fixes

  • Fixed the omission of manually created p elements used for scoring.

  • Corrected ancestor assignment beyond the body element to prevent runtime panics caused by incorrect root element assignments.
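
To show the general idea behind bounding ancestor traversal at the body element, here is a small self-contained sketch. It does not reproduce the crate's internals: the `Node` struct, the index-based tree, and the function `ancestors_within_body` are all hypothetical stand-ins, and including `body` itself in the result is an assumption made for the example.

```rust
/// Hypothetical stand-in for a DOM node: a tag name plus the index of its
/// parent in a flat node list (None for the root).
struct Node {
    tag: &'static str,
    parent: Option<usize>,
}

/// Collects up to `max_depth` ancestors of `start`, nearest first, and never
/// walks above the `body` element. Invalid indices end the walk instead of
/// panicking.
fn ancestors_within_body(nodes: &[Node], start: usize, max_depth: usize) -> Vec<usize> {
    let mut out = Vec::new();
    // Nothing to collect if `start` is missing or is already `body`.
    let Some(first) = nodes.get(start) else { return out };
    if first.tag == "body" {
        return out;
    }
    let mut current = first.parent;
    while let Some(idx) = current {
        if out.len() >= max_depth {
            break;
        }
        let Some(node) = nodes.get(idx) else { break };
        out.push(idx);
        if node.tag == "body" {
            break; // stop here so `html` is never treated as a candidate
        }
        current = node.parent;
    }
    out
}
```

With a tree of `html > body > div > p`, the walk from `p` returns the indices of `div` and `body` and stops there, so nothing above `body` is ever assigned as an ancestor.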

Changelog

Full Changelog: 0.4.0...0.5.0