TidyNodeGetText returns text with a new line appended #775

Closed
davidgumberg opened this issue Nov 20, 2018 · 2 comments
Comments

@davidgumberg

davidgumberg commented Nov 20, 2018

Currently `tidyNodeGetText` returns the node text with the newline character selected by `TidyNewline` appended, and there is no way to retrieve the text without that trailing newline.

https://github.com/htacg/tidy-html5/blob/master/src/tidylib.c#L2388

@geoffmcl
Contributor

@davidgumberg as the code `TY_(PFlushLine)( doc, 0 );` indicates, it appends the line ending selected by the `TidyNewline` option to the buffer...

Yes, this could be avoided by an option, but which option? An existing one, or a new one...

Conversely, an app using this API could easily remove the trailing newline from the buffer, if need be...
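As a rough illustration of that caller-side workaround, here is a minimal sketch in plain C. The helper name `strip_trailing_newline` is hypothetical and not part of libtidy; it only assumes that the text is available as a byte pointer plus a length, and that the appended line ending is one of LF, CRLF, or CR.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a libtidy function): returns the length of
 * `data` after dropping at most one trailing line ending.
 * Recognizes "\n", "\r\n" and a lone "\r"; the bytes are not modified.
 */
size_t strip_trailing_newline(const unsigned char *data, size_t size)
{
    if (data == NULL || size == 0)
        return 0;

    if (data[size - 1] == '\n') {
        size--;                                   /* drop LF */
        if (size > 0 && data[size - 1] == '\r')
            size--;                               /* drop the CR of a CRLF pair */
    } else if (data[size - 1] == '\r') {
        size--;                                   /* drop a lone CR */
    }
    return size;
}
```

Assuming the `bp` and `size` fields of `TidyBuffer` as declared in `tidybuffio.h`, a caller might apply it after `tidyNodeGetText` with something like `buf.size = (uint) strip_trailing_newline(buf.bp, buf.size);`. Code that later reads the buffer as a C string would presumably also need to write a terminating NUL at the new end.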

Maybe the docs for `tidyNodeGetText` could be enhanced with:

Gets the text of a node and places it into the given TidyBuffer.

The text will be terminated with a `TidyNewline`.

If you want the raw utf-8 stream see `tidyNodeGetValue`.

Or do you think there should be an option controlling this pretty-print output? If so, which one?

Look forward to further feedback... thanks...

@davidgumberg
Author

davidgumberg commented Nov 21, 2018

I agree that implementing an option and rewriting `tidyNodeGetText` is significantly more work than having the user simply remove the character from the buffer.

I'm not at all qualified to say whether this option should be available, because I don't have experience with API maintenance and don't know much about the priorities and considerations of library developers. I'm also not sure that my use case is representative of most people using the library, as I'm currently using it to scrape data from web pages, not to display and correct errors in a document.

So I think that for use cases like mine, `tidyNodeGetValue` may be more appropriate, but I do agree that the documentation should be updated as you said.
