[ticket/16985] Fix MYSQLi bug - Incorrect string value for non-BMP chars #6384

Draft: wants to merge 1 commit into base: master
5 changes: 4 additions & 1 deletion phpBB/phpbb/db/driver/mysqli.php
@@ -331,7 +331,10 @@ function sql_freeresult($query_id = false)
      */
     function sql_escape($msg)
     {
-        return @mysqli_real_escape_string($this->db_connect_id, $msg);
+        return @mysqli_real_escape_string(
+            $this->db_connect_id,
+            utf8_encode_ucr($msg)

Member

This assumes the intent of every caller, including all extensions that already use this function. What exactly happens when utf8_encode_ucr has already been called on the input?

Contributor Author

Because utf8_encode_ucr only modifies non-BMP Unicode characters, and the HTML entities it produces are ASCII-only (i.e. they always lie well within the BMP), it is idempotent across multiple calls. In other words, utf8_encode_ucr(utf8_encode_ucr(utf8_encode_ucr($str))) will always yield the same result as utf8_encode_ucr($str).

See this playground on repl.it (functions copied directly from includes/utf/utf_tools.php):

https://replit.com/@lionel_rowe/utf8encodeucr
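
The idempotency argument above can also be checked locally with a minimal sketch. Note that encode_non_bmp_as_ncr() below is a hypothetical stand-in written for illustration, not the actual utf8_encode_ucr() from includes/utf/utf_tools.php; it assumes valid UTF-8 input and the mbstring extension (for mb_ord).

```php
<?php
// Hypothetical stand-in for utf8_encode_ucr(); NOT phpBB's actual code.
// Assumes valid UTF-8 input and the mbstring extension (for mb_ord).
function encode_non_bmp_as_ncr(string $text): string
{
    // Replace every code point above the BMP (U+10000 to U+10FFFF)
    // with a decimal numeric character reference such as &#128169;
    $result = preg_replace_callback(
        '/[\x{10000}-\x{10FFFF}]/u',
        static function (array $match): string {
            return '&#' . mb_ord($match[0], 'UTF-8') . ';';
        },
        $text
    );

    // preg_replace_callback() returns null on error (e.g. invalid UTF-8);
    // fall back to the unchanged input in that case.
    return $result ?? $text;
}

$once  = encode_non_bmp_as_ncr('jimmy💩');
$twice = encode_non_bmp_as_ncr($once);

echo $once, "\n";            // jimmy&#128169;
var_dump($once === $twice);  // bool(true)
```

The second call is a no-op because the output of the first call contains only ASCII characters, which the pattern never matches.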

Member

Not convinced this is the right approach. It is arbitrarily deciding how everyone should store their data and will likely produce unexpected results. Probably better to do specific implementations where needed.

Contributor

> Probably better to do specific implementations where needed.

As has been done so far. Unless there are bugs not yet uncovered, or some areas that lack fallbacks, everything has been covered until proven otherwise.

Contributor Author

> arbitrarily deciding how everyone should store their data

I wholeheartedly agree in principle, but unfortunately that ship sailed long ago, back when utf8_encode_ucr was first introduced into the codebase, and probably even before that: the vast majority of string fields in phpBB are already (unnecessarily) stored as HTML, which should really be handled at the templating layer. For example, if I create a user with the username "<jimmy>", this is stored in the database as &lt;jimmy&gt;. All this PR does is ensure that (for example) the username "jimmy💩" would likewise be stored as jimmy&#128169;, to avoid MYSQLi throwing errors all over the place, which I've already encountered multiple times (I use a lot of emojis 😂).

Worst case with this PR is that a user occasionally sees strings looking like "&#128169;", as opposed to the current worst case, which is that they get full-page error messages and potential data loss (e.g. the action they were performing is lost).

If this was merged along with #6377, even that relatively minor tradeoff could also be eliminated.

Member

I think there are many less-than-ideal things about the phpBB codebase, as surely everyone will agree.

Ok, you're putting new logic into a function that is core to the product (dbal) and have so far justified it with "the function exists so it should be used in a spot that affects any and all code that uses it (that ship has sailed)", "if it breaks someone's code then so be it (user will sometimes see HTML encoded representations of the text)", and "the function wasn't used here because the code base is less than ideal".

While these aren't really good reasons or arguments, I realize this PR is for master anyway, so it's probably OK as a potentially breaking change for the next major version, given the expectation that some extensions may need changes regardless.

Contributor @3D-I, Apr 26, 2022

@DavidIQ Do you think it is a good idea to have Emoji in usernames or other parts of the code? I don't think so as we have already discussed this and put fallbacks in place.

#5556

This cannot be a general fix for places where the text parser is not already present; we have already considered all possibilities. If a new deficiency is discovered, then the same function should be used (or its counterpart for HTML). This PR does not make logical sense.

Contributor @3D-I, Apr 26, 2022

> if I create a user with the username "<jimmy>", this is stored in the database as &lt;jimmy&gt;. All this PR does is ensure that (for example) username "jimmy💩" would also be stored as jimmy&#128169; to avoid MYSQLi throwing errors all over the place

#5556 - We do not want Emoji in usernames and some other places.

It does not make sense; please try it first. There are multiple configurations for usernames at registration time. I would suggest that you learn more about phpBB at the UI level. Names like <jimmy> are possible: https://www.phpbb.com/community/memberlist.php?first_char=other#memberlist

So do not confuse what is stored in the database with what is being rendered at runtime (html).

Contributor Author

@3D-I the username thing was just an example. Also, if an admin wants to allow users to have emojis in their username, why shouldn't that be allowed?

Contributor Author

@DavidIQ

> Ok, you're putting new logic into a function that is core to the product (dbal) and have so far justified it with "the function exists so it should be used in a spot that affects any and all code that uses it (that ship has sailed)", "if it breaks someone's code then so be it (user will sometimes see HTML encoded representations of the text)", and "the function wasn't used here because the code base is less than ideal".

What I'm saying is that any breakage this causes will be less severe than what it fixes, i.e. it's better to occasionally show raw HTML entities to users than to occasionally show them full-page MYSQLi errors that may also cause them to lose work.

However, I accept that there may be a better way to fix this, so I'll have a bit more of a think about this one.

+        );
     }

     /**