When the preserveComments directive is enabled, HTML comments are moved to the end #61

Open
kenakune opened this issue Dec 16, 2020 · 2 comments
kenakune commented Dec 16, 2020

I enabled the preserveComments directive in the policy configuration.

I used this input:

<p>this is a test content before start testing</p>
<!-- TESTING COMMENT --><p>another line</p>
<p>end of the content</p>

Then I ran:

Policy policy = Policy.getInstance(App.class.getResourceAsStream("/antisamyConfig.xml"));
AntiSamy sanitizer = new AntiSamy(policy); 
CleanResults scanned = sanitizer.scan(input);
String sanitized = scanned.getCleanHTML(); 

The output was:

<p>this is a test content before start testing</p>
<p>another line</p>
<p>end of the content</p>
<!-- TESTING COMMENT -->
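One way to make this reordering checkable in a regression test is to compare the sequence of comments and tags before and after sanitizing. The sketch below is a hypothetical, stdlib-only helper: `CommentOrderCheck`, `tokens` and `sameOrder` are illustrative names, not AntiSamy API. It assumes the sanitizer leaves the tags themselves unchanged, as in the input and output above; a real policy may rewrite attributes, in which case a looser comparison would be needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentOrderCheck {
    // Matches an HTML comment or any other tag. The comment alternative is
    // tried first, so a whole "<!-- ... -->" is returned as one token.
    private static final Pattern TOKEN =
            Pattern.compile("<!--.*?-->|<[^>]+>", Pattern.DOTALL);

    // Returns the comments and tags of the HTML in document order.
    static List<String> tokens(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // True if comments and tags appear in the same relative order.
    static boolean sameOrder(String input, String output) {
        return tokens(input).equals(tokens(output));
    }

    public static void main(String[] args) {
        String input = "<p>this is a test content before start testing</p>\n"
                + "<!-- TESTING COMMENT --><p>another line</p>\n"
                + "<p>end of the content</p>";
        String output = "<p>this is a test content before start testing</p>\n"
                + "<p>another line</p>\n"
                + "<p>end of the content</p>\n"
                + "<!-- TESTING COMMENT -->";
        // Prints false because the comment was moved after the last </p>.
        System.out.println(sameOrder(input, output));
    }
}
```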
@spassarop
Collaborator

Hi @kenakune. I didn't know which version, directives or parser you were using, so I tested the combinations.

With the SAX parser, I could reproduce your issue: no matter which other directives I use, the comment goes to the end. I debugged the behavior in the Neko library, but unfortunately I could not figure out how it actually works or why the comment is moved to the end. It seems to write the comment to some buffer during processing and may not do the same for the <p> tags; perhaps that buffer is read at the end and appended to the result... but that's just speculation.
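The buffering hypothesis above can be illustrated with a toy model. This is not NekoHTML's code: the `Event` class and `serialize` method are hypothetical, and only show how writing markup immediately while holding comments in a side buffer until the end would produce exactly the reported output.

```java
import java.util.List;

public class BufferedCommentModel {
    // Hypothetical parse event: either a comment or a piece of other markup.
    // NekoHTML's real event API is different; this is only a toy.
    static final class Event {
        final boolean isComment;
        final String text;

        Event(boolean isComment, String text) {
            this.isComment = isComment;
            this.text = text;
        }
    }

    // Toy serializer following the speculation: markup goes straight to the
    // result, while comments are held in a side buffer that is appended only
    // after every event has been processed.
    static String serialize(List<Event> events) {
        StringBuilder result = new StringBuilder();
        StringBuilder pendingComments = new StringBuilder();
        for (Event e : events) {
            if (e.isComment) {
                pendingComments.append("<!--").append(e.text).append("-->");
            } else {
                result.append(e.text);
            }
        }
        // Flushing the buffer last is what moves every comment to the end.
        return result.append(pendingComments).toString();
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(false, "<p>this is a test content before start testing</p>"),
                new Event(true, " TESTING COMMENT "),
                new Event(false, "<p>another line</p>"),
                new Event(false, "<p>end of the content</p>"));
        // Prints the three paragraphs followed by <!-- TESTING COMMENT -->
        System.out.println(serialize(events));
    }
}
```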

With the DOM parser, the comment just disappears :/ so that's even worse. Something seems to be wrong in the output formatters, which put the comments wherever they want (or nowhere).

It's difficult for me to say whether this can be solved at the AntiSamy level or whether it's a problem in the underlying libraries, such as NekoHTML. @nahsra is the one who may know how that works internally, but for now I cannot offer any solution :(

davewichers added the bug label Jan 8, 2022
@nahsra
Owner

nahsra commented Jan 11, 2022

(Just FYI, I edited the issue description to have code formatting.)

Sorry, I don't understand why that's happening. I took a look at the preserveComments directive, and it looks like it's being honored correctly and set appropriately for every scan in both engines. We don't control which events (elements) are emitted back to our listeners, so this is probably upstream of us. I've not had much luck convincing the NekoHTML maintainers to issue updates even for DoS issues (see AntiSamyDOMScanner.java#L152). So even if we spent the time to diagnose the issue, I'm skeptical anything could be done about it: the transformation is lossy, so we wouldn't even be able to create a workaround for you.

I just checked, and we do have several test cases that involve comments; there don't seem to be any issues with them, so my suspicion is that this is a narrow case. Wish I had a better answer!
