-
-
Notifications
You must be signed in to change notification settings - Fork 946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ticket/16985] Fix MYSQLi bug - Incorrect string value for non-BMP chars #6384
Draft
lionel-rowe
wants to merge
1
commit into
phpbb:master
Choose a base branch
from
lionel-rowe:ticket/16985
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is assuming the intention for all usage and all extensions that already use this function. What exactly happens when this function has already been called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because
utf8_encode_ucr
only modifies non-BMP unicode characters, and HTML entities are ASCII only (i.e. will always lie well within the BMP), that means it's idempotent for multiple calls. In other words,utf8_encode_ucr(utf8_encode_ucr(utf8_encode_ucr($str)))
will always yield the same result asutf8_encode_ucr($str)
.See this playground on repl.it (functions copied directly from
includes/utf/utf_tools.php
):https://replit.com/@lionel_rowe/utf8encodeucr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not convinced this is the right approach. It is arbitrarily deciding how everyone should store their data and will likely produce unexpected results. Probably better to do specific implementations where needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it has been done so far, let's say there are no bugs yet uncovered or there are some areas that have not fallbacks. Everything has been covered until proven otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wholeheartedly agree in principle, but unfortunately that ship has sailed long ago, back when
utf8_encode_ucr
was introduced into the codebase in the first place, and probably even before that — the vast majority of string fields in phpBB are already (unnecessarily) stored as HTML, which should really be handled at the templating layer. For example, if I create a user with the username "<jimmy>", this is stored in the database as<jimmy>
. All this PR does is ensure that (for example) username "jimmy💩" would also be stored asjimmy💩
to avoid MYSQLi throwing errors all over the place, which I've already encountered multiple times (I use a lot of emojis 😂).Worst case with this PR is that a user occasionally sees strings looking like "💩", as opposed to the current worst case, which is that they get full-page error messages and potential data loss (e.g. the action they were performing is lost).
If this was merged along with #6377, even that relatively minor tradeoff could also be eliminated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, you're putting new logic into a function that is core to the product (dbal) and have so far justified it with "the function exists so it should be used in a spot that affects any and all code that uses it (that ship has sailed)", "if it breaks someone's code then so be it (user will sometimes see HTML encoded representations of the text)", and "the function wasn't used here because the code base is less than ideal".
While these aren't really good reasons/arguments I realize this PR is for master anyways so it's probably ok as a potentially breaking change for the next major version given the expectation that some extensions may need some changes anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DavidIQ Do you think it is a good idea to have Emoji in usernames or other parts of the code? I don't think so as we have already discussed this and put fallbacks in place.
#5556
This cannot be a general fixing where the text parser is not already otherwise present, we have already considered all possibilities. If a new deficiency is discovered then the same function should be used (or its counterpart for HTML). This PR does not make logical sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#5556 - We do not want Emoji in usernames and some other places.
It does not make sense, please try first. There are multiple configurations for usernames at registration time. I would suggest that you learn more about phpBB at the UI level. Names like
<jimmy>
are possible. https://www.phpbb.com/community/memberlist.php?first_char=other#memberlistSo do not confuse what is stored in the database with what is being rendered at runtime (html).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@3D-I the username thing was just an example. Also, if an admin wants to allow users to have emojis in their username, why shouldn't that be allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm saying is that any ways this breaks code will be less severe than what it fixed, i.e. it's better to occasionally show raw HTML entities to users than to occasionally show full-page MYSQLi errors to users that may also cause them to lose work.
However, I accept that there may be a better way to fix this, so I'll have a bit more of a think about this one.