Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TextFormatter handling improvements #1516

Closed
luceos opened this issue Jul 23, 2018 · 14 comments
Closed

TextFormatter handling improvements #1516

luceos opened this issue Jul 23, 2018 · 14 comments

Comments

@luceos
Copy link
Member

luceos commented Jul 23, 2018

The comment post validator limits the content to 65535 characters. The text formatter compiles that text to something it can understand (xml). Storing it to the database could create a truncated value which text formatter can no longer transform back.

We need to tackle this before beta 8, as this has caused some mischievous behavior.

In addition we need to capture any errors text formatter throws while rendering the saved content.

PHP message: PHP Warning:  DOMDocument::loadXML(): Premature end of data in tag p line 81 in Entity, line: 82 in /home/forge/discuss.flarum.org/vendor/s9e/text-formatter/src/Renderer.php on line 20
PHP message: PHP Warning:  DOMDocument::loadXML(): Premature end of data in tag r line 1 in Entity, line: 82 in /home/forge/discuss.flarum.org/vendor/s9e/text-formatter/src/Renderer.php on line 20
PHP message: PHP Fatal error:  Uncaught TypeError: Argument 1 passed to Renderer_823d754b1702b30c553dd3e2526c676306a6a45c::at() must be an instance of DOMNode, null given, called in /home/forge/discuss.flarum.org/storage/formatter/Renderer_823d754b1702b30c553dd3e2526c676306a6a45c.php on line 30 and defined in /home/forge/discuss.flarum.org/storage/formatter/Renderer_823d754b1702b30c553dd3e2526c676306a6a45c.php:33
Stack trace:
#0 /home/forge/discuss.flarum.org/storage/formatter/Renderer_823d754b1702b30c553dd3e2526c676306a6a45c.php(30): Renderer_823d754b1702b30c553dd3e2526c676306a6a45c->at(NULL)
#1 /home/forge/discuss.flarum.org/vendor/s9e/text-formatter/src/Renderer.php(28): Renderer_823d754b1702b30c553dd3e2526c676306a6a45c->renderRichText('<r><p>&lt;r&gt;...')

I am pushing this onto beta 8 to prevent abuse of user forums.

Based on research by @sijad

@luceos luceos added this to the 0.1.0-beta.8 milestone Jul 23, 2018
@luceos luceos changed the title TextFormatter improvements TextFormatter handling improvements Jul 23, 2018
@sijad
Copy link
Contributor

sijad commented Jul 23, 2018

in some cases textformatter skip renderQuick() and try to parse text via DOMDocument as truncated post's content is not an valid XML, DOMDocument returns a null node thus when textformatter try to access to node properties it throws this error.

currently there's a way to bypass 65535 limit using some unicodes, which results truncated text, I guess the best solution here would be to make sure post content never truncate.

@franzliedke
Copy link
Contributor

/cc @JoshyPHP

@JoshyPHP
Copy link
Contributor

JoshyPHP commented Jul 23, 2018

I've updated the renderer to throw an exception if loadXML fails. It'll prevent it from causing a fatal error. s9e/TextFormatter@806df83 will be in 1.2.2.

If you feed malformed XML to the renderer, it may return malformed HTML or throw an exception, depending on whether you hit a hot path or a cold one. You'll have to test the XML manually if there's a chance it's malformed.

@JoshyPHP
Copy link
Contributor

1.2.2 has been released and it should detect truncated XML regardless of what code path it takes. It does not validate the XML systematically though, so if the XML is corrupted in any other way than a simple truncation, it will not always detect it.

@franzliedke franzliedke self-assigned this Aug 24, 2018
@franzliedke

This comment has been minimized.

@JoshyPHP

This comment has been minimized.

@franzliedke
Copy link
Contributor

Sorry, @JoshyPHP. I think I confused this with issue with #1452. Let's continue this discussion there.

@Magicalex
Copy link

Magicalex commented Sep 16, 2018

Why the comment post validator limits the content to 65535 characters?
see: #1044

@JoshyPHP
Copy link
Contributor

JoshyPHP commented Sep 16, 2018

The column used to be TEXT, which is limited to 65535 characters.

As of 986102c it's MEDIUMTEXT, limited to 2^24-1.

@tobyzerner
Copy link
Contributor

So do we need to do anything else here? posts.content is now MEDIUMTEXT but the PostValidator still limits post content to 65535 characters, then there shouldn't be an issue with database truncation any more. We can make the 65535 limit configurable down the track.

@franzliedke
Copy link
Contributor

I think you're right. The validation happens after the content is parsed / transformed by TextFormatter, so this should be fine.

@franzliedke
Copy link
Contributor

Would be worth a regression test, though...

@b1n4ryj4n
Copy link

Is it still limited to 65535 characters?

@Magicalex
Copy link

Yes it's still limited by PostValidator
https://github.com/flarum/core/blob/master/src/Post/PostValidator.php#L19

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants