Extraction fails for frontend-heavy Angular SPA HTML with large inline style/font blocks #283

I ran Defuddle on the saved page source of a large Angular SPA and extraction failed. The HTML consisted mostly of framework boilerplate and very large inline CSS and font definitions.

Observed Behavior

Defuddle returned empty or invalid Markdown output. Extraction appears to fail because the HTML contains very little readable, semantic content relative to the surrounding DOM noise.

The HTML included:

large inline <style> blocks
thousands of @font-face declarations
bootstrap/material CSS
Angular app shell markup
tracking scripts and metadata
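
To put a rough number on how noisy such a payload is, here is a minimal sketch that estimates what fraction of the page source sits inside `<script>` and `<style>` blocks. `noiseShare` is a hypothetical helper, not part of Defuddle, and it relies on a regex rather than a real HTML parser, so treat the result as an approximation.

```typescript
// Hypothetical diagnostic, not part of Defuddle.
// Returns the fraction (0..1) of characters that belong to <script> or
// <style> elements, including their tags.
function noiseShare(html: string): number {
  if (html.length === 0) {
    return 0;
  }
  // Matches <script ...>...</script> and <style ...>...</style>,
  // case-insensitively and across newlines. Regex-based, so unusual
  // markup (for example ">" inside an attribute value) may be miscounted.
  const noisy = /<(script|style)(\s[^>]*)?>[\s\S]*?<\/\1\s*>/gi;
  const withoutNoise = html.replace(noisy, "");
  return (html.length - withoutNoise.length) / html.length;
}
```

A value close to 1 would support the observation that almost none of the payload is readable content.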

Expected Behavior

Defuddle should ideally:

ignore noisy/non-semantic nodes during preprocessing
or provide a preprocessing option for frontend-heavy SPA HTML

Suggested Improvement

A preprocessing step before readability extraction could help significantly, for example removing:

script
style
noscript
svg
stylesheet-related nodes

before running extraction.
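
The removal step suggested above could look roughly like the following sketch. `stripNoisyNodes` is a hypothetical name, not an existing Defuddle function, and "stylesheet-related nodes" is interpreted here as `<link rel="stylesheet">` elements, which is an assumption. It works on the raw HTML string with regexes; a real implementation would more likely remove the nodes from the parsed DOM.

```typescript
// Elements whose whole subtree is dropped; mirrors the list in this issue.
const NOISY_TAGS: readonly string[] = ["script", "style", "noscript", "svg"];

// Hypothetical helper, not part of Defuddle. Regex-based, so it only
// approximates what a DOM-based pass would do (nested <svg> elements or
// ">" inside attribute values are not handled).
function stripNoisyNodes(html: string): string {
  let out = html;
  for (const tag of NOISY_TAGS) {
    // Remove <tag ...>...</tag>, non-greedy, case-insensitive, across newlines.
    const paired = new RegExp(
      `<${tag}(\\s[^>]*)?>[\\s\\S]*?</${tag}\\s*>`,
      "gi"
    );
    out = out.replace(paired, "");
  }
  // Assumption: "stylesheet-related nodes" means <link rel="stylesheet">.
  out = out.replace(/<link\b[^>]*\brel\s*=\s*["']?stylesheet["']?[^>]*>/gi, "");
  return out;
}
```

The cleaned string would then be handed to the extractor in place of the original page source.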

Additional Context

The issue was reproduced consistently using fixture-based testing with a saved HTML payload from an Angular application page source.
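
For anyone who wants to reproduce this without the original payload, here is a minimal sketch of a synthetic fixture plus a regression check. Everything in it is an assumption for illustration: the markup is a simplified stand-in for the real Angular page, `buildNoisyFixture` and `extractionKeepsContent` are hypothetical names, and the extractor is passed in as a plain function so the sketch does not assume any particular Defuddle API.

```typescript
// Builds a synthetic Angular-style page: one large inline <style> block,
// a tracking script, an app shell, and a single paragraph of real content.
function buildNoisyFixture(fontFaceCount: number, content: string): string {
  const fontFaces: string[] = [];
  for (let i = 0; i < fontFaceCount; i++) {
    fontFaces.push(
      `@font-face{font-family:"F${i}";src:url(data:font/woff2;base64,AAAA)}`
    );
  }
  return [
    "<!doctype html><html><head><title>Fixture</title>",
    `<style>${fontFaces.join("")}</style>`,
    "<script>window.dataLayer=[];</script>",
    "</head><body><app-root><main><article>",
    `<p>${content}</p>`,
    "</article></main></app-root></body></html>",
  ].join("\n");
}

// The extractor is injected: it takes HTML and returns the extracted text
// or Markdown. Wrap whichever extraction call is being tested.
type Extractor = (html: string) => string;

// Returns true when the known content paragraph survives extraction.
function extractionKeepsContent(extract: Extractor, fontFaceCount: number): boolean {
  const content = "Readable paragraph that extraction should keep.";
  const output = extract(buildNoisyFixture(fontFaceCount, content));
  return output.indexOf(content) !== -1;
}
```

Raising `fontFaceCount` into the thousands should approximate the conditions described in this report, although the saved payload remains the authoritative reproduction.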
