Warning: htmlspecialchars() Invalid multibyte sequence in argument in core/DataTable/Renderer.php on line 223 #3259
Comments
A temporary quick fix for users with this issue is as follows:
But this is only temporary, definitely not the ideal fix. If it fails and returns an empty string, maybe we could filter the initial input to a-zA-Z0-9 or similar and keep only the safe characters?
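The "keep only the safe characters" fallback could look roughly like the sketch below. It is written in Python purely as an illustration, not as the PHP that Piwik would need, and `keep_safe_chars` is a hypothetical name. It assumes the input arrives as raw bytes of unknown encoding.

```python
import re


def keep_safe_chars(raw: bytes) -> str:
    """Hypothetical fallback: keep only ASCII letters and digits."""
    # Drop every byte that is not plain ASCII, whatever the real charset was.
    text = raw.decode("ascii", errors="ignore")
    # Then drop anything outside the a-zA-Z0-9 whitelist.
    return re.sub(r"[^a-zA-Z0-9]", "", text)


print(keep_safe_chars(b"caf\xe9 2012"))  # caf2012
```

The obvious cost is that labels in non-Latin scripts would be reduced to nothing, so this only makes sense as a last resort.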
Fix suggested by Anthon: I see in the PHP doc: http://www.php.net/manual/en/function.iconv.php#108643 that maybe this doesn't work in all PHP versions? The comment suggests: Any particular thoughts?
We can try to run some tests between PHP versions and compare. The more portable, the better.
I get a similar error for some actions under "Visitors --> Visitor Log":
There is a similar bug, reported in the forums
@capedfuzz maybe similar to the other bug you're dealing with.
This bug is due to query parameter values being encoded in non-UTF-8 charsets. When archiving, there's no way to know what this other encoding is, and so we have to use UTF-8 w/ htmlspecialchars. I came up w/ two possible ways of fixing this one:
When the charset is defined and is not UTF-8, maybe in the Tracker, we should also re-encode the URL in UTF-8, since it's likely the URL parameters are encoded in the page charset?
Replying to matt:
I tried that and it works in that it displays the label correctly, but then the URL becomes invalid. As in, if you visit the link, it won't work anymore. As far as I can tell, there has to be some way to store the charset of the query params somewhere (like in the piwik_site table).
How can we quickly fix this problem, since the proper solution would have performance overhead which we can't consider for this use case. The goal is to see a quick fix, for example, would |
Replying to matt:
From what I remember of my test, this would strip the non-UTF-8 encoded characters from the string, so I guess "www.site.com?q=%EC%ED..." would turn into "www.site.com?q=". Since this only affects the query string, we might be able to do this to each individual query param, and for stuff that doesn't have UTF-8 encoding, we just display the encoded data (i.e., %EC%ED)? Not sure if it'll work, though, but when I get a chance I'll check it out.
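A rough Python sketch of that per-parameter idea, for illustration only (the thread is about PHP, and `label_for_query` is a hypothetical helper): each value is decoded when its bytes are valid UTF-8 and left percent-encoded otherwise.

```python
from urllib.parse import unquote_to_bytes


def label_for_query(query: str) -> str:
    """Hypothetical helper: decode UTF-8 values, keep the rest encoded."""
    parts = []
    for pair in query.split("&"):
        name, sep, value = pair.partition("=")
        try:
            shown = unquote_to_bytes(value).decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: show the raw %XX form instead of dropping it.
            shown = value
        parts.append(name + sep + shown)
    return "&".join(parts)


print(label_for_query("q=%EC%ED&lang=%C3%A9"))  # q=%EC%ED&lang=é
```

Only the displayed label changes here; the stored URL would stay as it was, so the link should keep working.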
(In [7336]) Refs #3259, strip non-UTF-8 chars from strings in outputted XML/HTML so htmlspecialchars won't fail.
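For readers who want the idea without reading the changeset, here is a minimal Python analogue of "strip, then escape". It is not the code from [7336]; `escape_label` is a hypothetical name and Python's `html.escape` stands in for htmlspecialchars.

```python
import html


def escape_label(raw: bytes) -> str:
    """Hypothetical analogue: drop invalid UTF-8 bytes, then HTML-escape."""
    # errors="ignore" silently removes byte sequences that are not valid UTF-8.
    cleaned = raw.decode("utf-8", errors="ignore")
    return html.escape(cleaned, quote=True)


print(escape_label(b"q=\xec\xed<b>"))  # q=&lt;b&gt;
```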
I've applied one part of a fix for this. This will make sure the warning doesn't display and will make sure at least SOMETHING is shown as the label for strings w/ non-UTF-8 chars. The non-UTF-8 stuff won't show though. The only way I can think of to fix that is to look for each %.. encoded item and check if the decoded char is invalid UTF-8. If so, it gets double encoded so html_entity_decode won't turn it into the non-UTF-8 value. This seems excessive, but I can't think of another way of fixing this (specifying a charset column for piwik_site might work, but would likely take a while to implement).
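The double-encoding idea could be sketched as follows. This is a Python illustration under assumptions, not the Piwik implementation: `protect_invalid_escapes` is a hypothetical name, and it checks whole runs of consecutive %XX escapes rather than single escapes, because one UTF-8 character can span several of them.

```python
import re
from urllib.parse import unquote, unquote_to_bytes

# A run of one or more consecutive %XX escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def protect_invalid_escapes(url: str) -> str:
    """Hypothetical helper: re-encode '%' in runs that are not valid UTF-8."""
    def fix(match: re.Match) -> str:
        run = match.group(0)
        try:
            unquote_to_bytes(run).decode("utf-8")
            return run  # valid UTF-8, safe to decode later
        except UnicodeDecodeError:
            # Turn %EC into %25EC so one round of decoding yields "%EC".
            return run.replace("%", "%25")
    return _ESCAPE_RUN.sub(fix, url)


protected = protect_invalid_escapes("q=%EC%ED&x=%C3%A9")
print(protected)           # q=%25EC%25ED&x=%C3%A9
print(unquote(protected))  # q=%EC%ED&x=é
```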
Looks good to me! Thanks for researching and finding this fix, which is better than before. We can't really afford to store charsets per website... too complicated. However, I have an idea for a possible "ultimate" solution: could we use http://php.net/manual/en/function.mb-detect-encoding.php ? :) That looks interesting. Maybe we could run this function when htmlspecialchars returns an empty string (prior to your commit)? Would it work to detect the encoding of the URL containing non-UTF-8 characters?
Replying to matt:
We can tell if it's not UTF-8, but that doesn't mean we can find out what the true encoding is. Some non-UTF-8 strings can still be interpreted as UTF-8, though the result will be gibberish.
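A small Python illustration of both points, using the %EC%ED bytes from earlier in the thread. The candidate charsets are assumptions picked for the example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


raw = bytes([0xEC, 0xED])  # the bytes behind "%EC%ED"

# Point 1: we can tell it is not UTF-8 ...
print(is_valid_utf8(raw))  # False

# ... but several single-byte charsets accept it, each with a different result.
decoded = {name: raw.decode(name) for name in ("latin-1", "cp1251")}
print(decoded)

# Point 2: bytes written in another charset may also happen to be valid UTF-8.
ambiguous = "Ã©".encode("latin-1")
print(is_valid_utf8(ambiguous))  # True
```

Nothing in the bytes themselves says which reading is the intended one, so any detection step is a guess.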
This function will find out the encoding, I think: http://php.net/manual/en/function.mb-detect-encoding.php. It might work on the non-UTF-8 URL, so you could display a label that works with non-UTF-8 URLs without modifying the URLs: simply detect encoding > convert from detectedEncoding to UTF-8 > parse URL to get the pretty name <label>
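The detect > convert > parse pipeline might look like this in outline. The sketch is Python rather than PHP and does not use mb_detect_encoding; "detection" here is just trying an assumed candidate list in order, and `pretty_label` is a hypothetical helper.

```python
from urllib.parse import unquote_to_bytes, urlsplit


def pretty_label(url: str, candidates=("utf-8", "cp1251")) -> str:
    """Hypothetical helper: build a display label, leave the URL untouched."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    raw = unquote_to_bytes(target)
    for encoding in candidates:
        try:
            # The first candidate that decodes cleanly wins (a guess).
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched: fall back to replacement characters.
    return raw.decode("utf-8", errors="replace")


print(pretty_label("http://www.site.com/search?q=%EC%ED"))
```

The result depends entirely on the order of the candidate list, which is the weakness pointed out in the previous reply.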
The warning is gone, for now I think that's good enough.
We have a variation of a similar error in #3859
This bug has occurred again in Piwik 2.2.2. I created a ticket at #5157
Reported in forum
Input data:
with API request: index.php?module=API&method=Actions.getPageUrls&idSite=1&period=day&date=today&format=xml&token_auth=x
results in:
Bug fix ?
I'm not sure if we can reproduce this error with this input data, and I'm not sure which fix is correct in this case. Is it a bug in PHP? Is it that we truncate the input data too early?
We could fix it by keeping the original input data if the result of htmlspecialchars is empty. But then it could maybe cause an XSS or other security issue if someone manages to generate bogus data leading to this error in order to print out an XSS payload?
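One way to avoid echoing the raw input while still showing something is to substitute a replacement character for each invalid sequence and escape the result. A minimal Python sketch of that alternative, with `safe_fallback` as a hypothetical name and `html.escape` standing in for htmlspecialchars:

```python
import html


def safe_fallback(raw: bytes) -> str:
    """Hypothetical fallback: never return unescaped input."""
    # Invalid sequences become U+FFFD instead of aborting the whole string.
    text = raw.decode("utf-8", errors="replace")
    # Markup characters are still escaped, so bogus bytes cannot smuggle tags.
    return html.escape(text, quote=True)


print(safe_fallback(b"\xec<script>alert(1)</script>"))
```

Whether PHP can be made to behave the same way across the versions Piwik supports is a separate question that this sketch does not answer.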