
Can you define options for a baseElement selector? #281

Closed
pgoldweic opened this issue Jan 24, 2023 · 7 comments

@pgoldweic

pgoldweic commented Jan 24, 2023

I am trying to retrieve text for a specific selector ONLY, with specific options for that selector. I've tried the following:

let text = convert(html, { baseElements: { selectors: ['h1.PageTitle'] } })

which works correctly but does NOT apply any options for how I want the title to show up. When I try instead:

let text = convert(html, {
  baseElements: { selectors: ['h1.PageTitle'] },
  selectors: [ { selector: 'h1.PageTitle', format: 'block', options: { uppercase: false } } ]
})

I get the text for the whole document and NOT just for the baseElement selectors. What am I doing wrong? Or, is there no way to specify formatting for the base element selectors?

@pgoldweic pgoldweic changed the title Can you define a selector that applies only to those meant to be included in the output (via baseElements)? Can you define a selector that applies only to the relevant HTML meant to be included in the output (via baseElements)? Jan 24, 2023
@pgoldweic pgoldweic changed the title Can you define a selector that applies only to the relevant HTML meant to be included in the output (via baseElements)? Can you define options for a selector that applies only to the baseElements? Jan 24, 2023
@pgoldweic pgoldweic changed the title Can you define options for a selector that applies only to the baseElements? Can you define options for a baseElement selector? Jan 24, 2023
@KillyMXI
Member

I checked it to make sure, and I can't reproduce the issue.

{
  baseElements: { selectors: ['div.foo'] },
  selectors: [
    { selector: 'div.foo', format: 'blockTag', options: { leadingLineBreaks: 5 } }
  ]
}

-- this works just fine in my experiments; the elements are selected and formatted accordingly.

That's how it works in the code: selected base elements are processed by the same rules as any child elements.

I can't see any typos in your second example (the block formatter has nothing to do with the uppercase option, but that's irrelevant to the described issue).
Make sure you are running what you think you are running.

@pgoldweic
Author

Thanks @KillyMXI for your prompt response! However, I still see the full document text in my tests... this is very odd. I have double-checked that my syntax is correct and haven't found anything wrong yet. I've also tried the 'heading' format instead of 'block' to see if that changes anything, but the output is the same. Let me know if you have any other ideas. Thanks!

@KillyMXI
Member

I don't have enough information to even guess.

How do you run your code?
If in Node.js, then what Node version is it?
Are you using html-to-text version 9.0.3?
Is there any chance you're editing one file but testing another?
Are you preprocessing your html in any way before converting?

Try to make an isolated example: run npm init in a separate package, run npm i html-to-text, and in index.js do only the conversion, as in the example, but with your HTML and options. Then run it with node ./index.js.
Does the issue persist this way? If yes, I'd like to take a look at the reproduction example (code and HTML). If not, you'll have to keep narrowing down the cause by looking at how your pipeline differs.

@pgoldweic
Author

ok @KillyMXI, I think I figured out how to resolve the problem, although I'm not sure I can explain it myself (most likely I misunderstood the configuration instructions for better performance, that is, the 'compile' option). This morning I had changed the line in my script that read:

const { convert } = require('html-to-text')

and replaced it with:

const { compile } = require('html-to-text')
const convert = compile({
    wordwrap: 130
})

and then used 'convert' just like I was using it before the change. However, this caused the code to break as I described earlier. When I changed it back to using the original configuration for 'convert', it started working again. From here I conclude that the 'compile' configuration is likely not appropriate for regular use.

@KillyMXI
Member

KillyMXI commented Jan 24, 2023

const { compile } = require('html-to-text')

const convert = compile({ ...options }) // options here

const text = convert(html) // no options here

-- this convert is different: it already has the options baked in. You can't add more options later when you call it.
It is recommended when you have to process many documents with the same options.

Perhaps I can improve the documentation a bit to make the difference clearer.
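The difference between the two call styles can be sketched with a toy model of the "compile once, convert many" pattern. This is NOT html-to-text's source: toyConvert and toyCompile are hypothetical stand-ins (the converter only strips tags and optionally upper-cases), chosen so that the flow of options is easy to see. The point is that a compiled converter captures its options when it is created, so an options object passed at call time has no effect.

```javascript
// Hypothetical toy model of the compile-once pattern; not html-to-text's code.

// Stand-in converter: crude tag stripping, for illustration only
// (not a real HTML parser), plus one option to make its effect visible.
function toyConvert(html, options = {}) {
  const text = html.replace(/<[^>]*>/g, '');
  return options.uppercase ? text.toUpperCase() : text;
}

// Stand-in compile: options are captured once, when the converter is created.
function toyCompile(options = {}) {
  const captured = { ...options };
  // The returned function takes only the HTML; any extra argument is ignored.
  return (html) => toyConvert(html, captured);
}

// One-off conversion: options are supplied on every call.
const direct = toyConvert('<h1>Page Title</h1>', { uppercase: true });

// Compiled converter: the call-time options object is silently ignored.
const convert = toyCompile({ uppercase: false });
const compiled = convert('<h1>Page Title</h1>', { uppercase: true });

console.log(direct);   // PAGE TITLE
console.log(compiled); // Page Title
```

Applied to the case in this thread, the fix with the real library would be to pass every option to compile up front (wordwrap: 130 together with the baseElements and selectors settings) and then call convert(html) with no second argument.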

@pgoldweic
Author

That sounds like a good idea @KillyMXI . Thanks for your explanation!

@KillyMXI
Member

I updated the README a bit. That will hopefully reduce the chance of such confusion.

The documentation is due for a rework. I'm not paying much attention to it until I get around to properly organizing it.
