Options
```js
const options = {
  wordwrap: null,
  selectors: [
    { selector: 'a', options: { ignoreHref: true } },
    { selector: 'img', format: 'skip' },
    { selector: 'nav', format: 'skip' },
    { selector: 'header', format: 'skip' },
    { selector: 'footer', format: 'skip' },
    { selector: '*[data-elementor-type=footer]', format: 'skip' },
    { selector: '*[data-elementor-type=header]', format: 'skip' },
  ],
};
```

**Version information**

"html-to-text": "^9.0.5",
"next": "^13.4.8",

----

When scraping a webpage, I try to remove the header, footer, images, navs, and links so that only the text remains. However, for some reason the footer text still appears in the result. I tried this on Elementor sites (with and without the data attributes) and on non-Elementor sites (with the `footer` tag), and also with and without the asterisk before the data attribute selector.
```js
const { htmlToText } = require('html-to-text');

// Minimal input covering every element the selectors are meant to skip.
const html = `
<header>header</header>
<div data-elementor-type="header">elementor type header</div>
<p>paragraph</p>
<div data-elementor-type="footer">elementor type footer</div>
<footer>footer</footer>`;

const options = {
  wordwrap: null,
  selectors: [
    { selector: 'a', options: { ignoreHref: true } },
    { selector: 'img', format: 'skip' },
    { selector: 'nav', format: 'skip' },
    { selector: 'header', format: 'skip' },
    { selector: 'footer', format: 'skip' },
    { selector: '*[data-elementor-type=footer]', format: 'skip' },
    { selector: '*[data-elementor-type=header]', format: 'skip' },
  ],
};

const text = htmlToText(html, options);
console.log(text);
```
Outputs only
paragraph
Start reducing your issue to a minimal example to find out what might be wrong in your case.
With no follow-up, I consider this resolved.
Most likely cause: unexpected input HTML, and insufficient attention to what the input HTML actually contains and which options are actually used.
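One way to check what the input HTML actually contains is to count the markers the selectors rely on before converting. The sketch below is only an illustration: `countMarkers` and the `sample` string are hypothetical and not part of html-to-text, and the regular expressions are a rough check, not an HTML parser.

```javascript
// Hypothetical helper (not part of html-to-text): counts how often each
// marker targeted by the "skip" selectors occurs in the raw input HTML.
function countMarkers(html) {
  const patterns = {
    header: /<header[\s>]/gi,
    footer: /<footer[\s>]/gi,
    nav: /<nav[\s>]/gi,
    elementorHeader: /data-elementor-type\s*=\s*["']?header\b/gi,
    elementorFooter: /data-elementor-type\s*=\s*["']?footer\b/gi,
  };
  const counts = {};
  for (const [name, pattern] of Object.entries(patterns)) {
    // match() with a global regex returns all matches, or null if none.
    counts[name] = (html.match(pattern) || []).length;
  }
  return counts;
}

// Assumed sample: the footer here is a plain <div>, so neither the
// `footer` selector nor the data-attribute selector would apply to it.
const sample = `
<header>header</header>
<div data-elementor-type="header">elementor type header</div>
<p>paragraph</p>
<div class="site-footer">footer text in a plain div</div>`;

console.log(countMarkers(sample));
```

If a count is zero for the page being scraped, the corresponding selector has nothing to match, and the footer text is coming from some other element that needs its own selector.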