Feature Request: ignoreSelector Array #159

Closed
GeorgeWL opened this issue Jun 7, 2018 · 9 comments


@GeorgeWL

GeorgeWL commented Jun 7, 2018

I think this lib is excellent, but I've run into a limitation that I'd like to see addressed through the options object:

There are times when you want to transform all of the contents of a selector apart from one or two inner elements that are only used on the dynamic version of the website (say, an element that only shows on hover), without having to move them out into another element just so they don't get parsed by HtmlToText.

I propose the addition of an array of selectors which defaults to empty.

so something like:

let options = {
    hideLinkHrefIfSameAsText: true,
    ignoreImage: true,
    uppercaseHeadings: false,
    preserveNewLines: true,
    tables: '.content',
    baseElement: ['table.content', 'table.footer'],
    // proposed option: selectors using the same method as baseElement does
    ignoreSelectors: ['.hidden', '.hover', '#hover']
}

The parser would then ignore any HTML elements matching those selectors.

@Niek

Niek commented Feb 9, 2021

There's a PR here but it's severely outdated: #186

The best solution would be for tags to allow CSS selectors, so you could set format: 'skip' on them. Is that hard to add @KillyMXI?

@KillyMXI
Member

KillyMXI commented Feb 9, 2021

@Niek yeah, I'm thinking about how to expand the system in this direction. I don't want to rush it and end up with something slow or hard to evolve.

As a workaround it might be possible to make a custom formatter to handle the filtering for you. I thought about including something like this along with skip but couldn't finalize the design so left it out for now.
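
A minimal sketch of what such a filtering formatter might look like. It assumes custom formatters are called with (elem, walk, builder, formatOptions) and that elements are htmlparser2-style nodes exposing attribs and children. The helper name makeClassFilterFormatter and the formatter key filteredContainer are hypothetical and not part of the library.

```javascript
// Hypothetical helper: builds a formatter that drops any element carrying
// one of the given class names and walks the children of everything else.
// Assumption: formatters receive (elem, walk, builder, formatOptions).
function makeClassFilterFormatter(ignoredClasses) {
  const ignored = new Set(ignoredClasses);
  return function classFilterFormatter(elem, walk, builder, formatOptions) {
    const classAttr = (elem.attribs && elem.attribs.class) || '';
    const classes = classAttr.split(/\s+/).filter(Boolean);
    if (classes.some((name) => ignored.has(name))) {
      return; // emit nothing for this element or its descendants
    }
    walk(elem.children, builder); // otherwise process the children
  };
}

// Options sketch: route <div> elements through the filtering formatter.
const options = {
  formatters: {
    filteredContainer: makeClassFilterFormatter(['hidden', 'hover']),
  },
  tags: {
    div: { format: 'filteredContainer' },
  },
};
```

Note that in this sketch the non-ignored div elements are only walked, so whatever block handling (line breaks around the element) they would normally get is lost. A fuller version would need to reproduce that behaviour itself.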

@mattcobb

mattcobb commented Feb 12, 2021

+1 for this feature. Our specific use case is that we would like to skip tags of a given class, for example <div class="reflist"> or <ol class="references"> from Wikipedia articles.

@KillyMXI
Member

I can see this being the most demanded feature.
I can't promise it will be the next thing I'll ship but it is on the table and I'll give it more attention.

@mattcobb

Other than PR 186 being outdated, what issues does it have?

@KillyMXI
Member

I think I should write a contribution guideline to avoid PRs that are unlikely to be merged.

#186:

  • does a couple of things outside of its own defined scope (a minor issue in the grand scheme of things);
  • has no open issue with an agreed design proposal;
  • the provided design is not something I'm happy with, and enough breaking changes already happen naturally that I don't want to consciously publish things I know I'm going to break soon.

What exactly makes me unhappy about it:

  • I'd like to avoid adding an ignoredSelectors array and solve it differently. I have a couple of approaches in mind with a bunch of considerations to check;
  • it adds its own limited selector-checking logic. There are already a couple of places in the code with similar but different selector logic. I want to unify that. And again, there are some connected questions I'll have to explore;
  • I think it does way too many checks (performance hit) and I can do better.

So it's really not in a state where I could write it down and delegate. A lot of exploratory work.

@KillyMXI
Member

Short update:
Solving this issue along with several related ones will be the topic of the upcoming version (8.0.0).
I've decided on the design and done a good portion of the work for html-to-text.
Then I realized I can't use the package I was aiming for and have to develop a new one. This is half-way done too.
No ETA though. I don't expect much trouble completing both parts. But most disruptions currently come from elsewhere.

@KillyMXI
Member

KillyMXI commented Jun 2, 2021

I've pushed new code into a separate branch - selectors
Tracking issue: #228

I'd be grateful if some of you could test this version on real tasks and provide your feedback before release (in a week probably).
(You can install the package directly from the GitHub branch. Instructions are in the tracking issue.)

@KillyMXI
Member

KillyMXI commented Jun 9, 2021

Version 8 is now live.
Arbitrary selectors can be ignored as follows:

{
  selectors: [
    { selector: 'foo.hidden', format: 'skip' }
  ]
}
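
For anyone coming from the originally proposed ignoreSelectors array, a flat list of selectors can be mapped onto this shape. A minimal sketch follows. The helper name skipSelectors is hypothetical and not part of html-to-text, and whether each selector in the list is supported by the library's selector syntax is an assumption to verify.

```javascript
// Hypothetical helper (not part of html-to-text): turns a flat list of CSS
// selectors into `selectors` entries that use the 'skip' format shown above.
function skipSelectors(cssSelectors) {
  return cssSelectors.map((selector) => ({ selector, format: 'skip' }));
}

// The list from the original request, expressed in the version 8 shape.
const options = {
  selectors: skipSelectors(['.hidden', '.hover', '#hover']),
};
```

The resulting options object is meant to be passed to the library as usual. Entries for other formats can be appended to the same selectors array.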

KillyMXI closed this as completed Jun 9, 2021