Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Warning: htmlspecialchars() Invalid multibyte sequence in argument in core/DataTable/Renderer.php on line 223 #3259

Closed
mattab opened this Issue · 19 comments

4 participants

@mattab
Owner

Reported in forum
Input data:

                <url>http://example/?c_id=6&amp;sc_id=97&amp;chars=%EC%E5%F8%EA%EE%E2%FB%E5&amp;brand=ZELMER</url>

                <url>http:/example/?c_id=7&amp;sc_id=206&amp;chars=%E3%E0%E7%EE%E2%FB%E5</url>

with API request: index.php?module=API&method=Actions.getPageUrls&idSite=1&period=day&date=today&format=xml&token_auth=x

results in:

Warning: htmlspecialchars() [<a href='function.htmlspecialchars'>function.htmlspecialchars</a>]: Invalid multibyte sequence in argument in //public_html/core/DataTable/Renderer.php on line 223

Backtrace -->
#0 Piwik_ErrorHandler(...) called at [:]#1 htmlspecialchars(...) called at [//public_html/core/DataTable/Renderer.php:223]#2 Piwik_DataTable_Renderer::formatValueXml(...) called at [//public_html/core/DataTable/Renderer/Xml.php:342]#3 Piwik_DataTable_Renderer_Xml->renderDataTable(...) called at [//public_html/core/DataTable/Renderer/Xml.php:337]#4 Piwik_DataTable_Renderer_Xml->renderDataTable(...) called at [//public_html/core/DataTable/Renderer/Xml.php:142]#5 Piwik_DataTable_Renderer_Xml->renderTable(...) called at [//public_html/core/DataTable/Renderer/Xml.php:33]#6 Piwik_DataTable_Renderer_Xml->render(...) called at [//public_html/core/API/ResponseBuilder.php:221]#7 Piwik_API_ResponseBuilder->getRenderedDataTable(...) called at [//public_html/core/API/ResponseBuilder.php:325]#8 Piwik_API_ResponseBuilder->handleDataTable(...) called at [//public_html/core/API/ResponseBuilder.php:79]#9 Piwik_API_ResponseBuilder->getResponse(...) called at [//public_html/core/API/Request.php:130]#10 Piwik_API_Request->process(...) called at [//public_html/plugins/API/Controller.php:27]#11 Piwik_API_Controller->index(...) called at [:]#12 call_user_func_array(...) called at [//public_html/core/FrontController.php:138]#13 Piwik_FrontController->dispatch(...) called at [//public_html/index.php:53]

Bug fix ?
I'm not sure if we can reproduce this error with this input data, and I'm not sure which fix is correct in this case. Is it a bug in PHP? is it that we truncate the input data too early?

we could fix it by keeping the original input data if the result of htmlspecialchars is empty. But then it could maybe cause some XSS or other security issue if someone manages to generate bogus data leading to this error in order to print out a xss?

@mattab
Owner

A temporary quick fix for users with this issue is as follwos:

Index: core/DataTable/Renderer.php
===================================================================
--- core/DataTable/Renderer.php (revision 6483)
+++ core/DataTable/Renderer.php (working copy)
@@ -220,7 +220,7 @@
            && !is_numeric($value)) 
        {
            $value = html_entity_decode($value, ENT_COMPAT, 'UTF-8');
-           $value = htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
+           $value = @htmlspecialchars($value, ENT_COMPAT, 'UTF-8');

But this is only temporary, definitely not the ideal fix.

if it fails and returns empty string, maybe we could filte rthe initial input to a-zA-Z0-9 or similar and keep only the safe characters?

@mattab
Owner

Fix suggested by Anthon:
if(function_exists('iconv')) $value = @iconv('UTF-8', 'UTF-8//IGNORE', $value);

I see in the php doc: http://www.php.net/manual/en/function.iconv.php#108643

that maybe this doesn't work in all php versions?

The comment suggests:
ini_set('mbstring.substitute_character', "none");
$text= mb_convert_encoding($text, 'UTF-8', 'UTF-8');

Any particular thoughts?

@robocoder
Collaborator

We can try to run some tests between php versions and compare.

The more portable, the better.

@anonymous-piwik-user

I have a similar error at some actions at "Visitors --> Visitor Log":

There is an error. Please report the message (Piwik 1.8.3) and full backtrace in the Piwik forums (please do a Search first as it might have been reported already!).

Warning: htmlspecialchars(): Invalid multibyte sequence in argument in /srv/www/htdocs/piwik/core/SmartyPlugins/modifier.escape.php on line 31

Backtrace -->
#0 Piwik_ErrorHandler(...) called at [:]#1 htmlspecialchars(...) called at [/srv/www/htdocs/piwik/core/SmartyPlugins/modifier.escape.php:31]#2 smarty_modifier_escape(...) called at [/srv/www/htdocs/piwik/tmp/templates_c/%%46^462^46227F0A%%visitorLog.tpl.php:337]#3 include(...) called at [/srv/www/htdocs/piwik/libs/Smarty/Smarty.class.php:1263]#4 Smarty->fetch(...) called at [/srv/www/htdocs/piwik/core/View.php:133]#5 Piwik_View->render(...) called at [/srv/www/htdocs/piwik/core/Controller.php:153]#6 Piwik_Controller->renderView(...) called at [/srv/www/htdocs/piwik/plugins/Live/Controller.php:84]#7 Piwik_Live_Controller->getVisitorLog(...) called at [:]#8 call_user_func_array(...) called at [/srv/www/htdocs/piwik/core/FrontController.php:138]#9 Piwik_FrontController->dispatch(...) called at [/srv/www/htdocs/piwik/index.php:53]
@mattab
Owner

There is a similar bug, reported in the forums

@mattab
Owner

@capedfuzz maybe similar to the other bug you're dealing with.

@diosmosis
Collaborator

This bug is due to query parameter values being encoded in non-UTF-8 charsets. When archiving, there's no way to know what this other encoding is, and so we have to use UTF-8 w/ htmlspecialchars. I came up w/ two possible ways of fixing this one:

  • Either associate the charset w/ the URLs in the DB when tracking (using the 'cs' query param I added in #3450). (Not sure how to do this efficiently)
  • Or, add a charset field to the piwik_site table and let users set it if their site does not use UTF-8. We can display an info here saying it's better to just use UTF-8.
@mattab
Owner

when the charset is defined and not utf-8, maybe in the Tracker, we should also re-encode the URL in UTF-8, since it's likely the URL parameters are encoded in the page charset...?

@diosmosis
Collaborator

Replying to matt:

when the charset is defined and not utf-8, maybe in the Tracker, we should also re-encode the URL in UTF-8, since it's likely the URL parameters are encoded in the page charset...?

I tried that and it works in that it displays the label correctly, but then the URL becomes invalid. As in, if you visit the link, it won't work anymore. As far as I can tell, there has to be some way to store the charset of the query params somewhere (like in the piwik_site table).

@mattab
Owner

How can we quickly fix this problem, since the proper solution would have performance overhead which we can't consider for this use case.

The goal is to see a quick fix, for example, would
if(function_exists('iconv')) $value = @iconv('UTF-8', 'UTF-8//IGNORE', $value);
work to force convert to Utf8 the url when the htmlspecialchars return empty?

@diosmosis
Collaborator

Replying to matt:

How can we quickly fix this problem, since the proper solution would have performance overhead which we can't consider for this use case.

The goal is to see a quick fix, for example, would
if(function_exists('iconv')) $value = @iconv('UTF-8', 'UTF-8//IGNORE', $value);
work to force convert to Utf8 the url when the htmlspecialchars return empty?

From what I remember of my test, this would strip the non-UTF8 encoded characters from the string, so I guess "www.site.com?q=%EC%ED..." would turn into "www.site.com?q=". Since this only affects the query string, we might be able to do this to each individual query param and for stuff that doesn't have UTF8 encoding, we just display the encoded data (ie, %EC%ED)? Not sure if it'll work, though, but when I get a chance I'll check it out.

@diosmosis
Collaborator

(In [7336]) Refs #3259, strip non-UTF-8 chars from strings in outputted XML/HTML so htmlspecialchars won't fail.

@diosmosis
Collaborator

I've applied one part of a fix for this. This will make sure the warning doesn't display and will make sure at least SOMETHING is shown as the label for strings w/ non-UTF-8 chars. The non-UTF-8 stuff won't show though.

The only way I can think of to fix that is to look for each %.. encoded item and check if the decoded char is invalid UTF-8. If so, it gets double encoded so html_entity_decode won't turn it into the non-UTF-8 value. This seems excessive, but I can't think of another way of fixing this (specifying a charset column for piwik_site might work, but would likely take a while to implement).

@mattab
Owner

Looks good to me! Thanks for researching and finding this fix, which is better than before. We cant really afford to store charsets per website... too complicated.

However, I have an idea for a possible "ultimate" solution: could we use http://php.net/manual/en/function.mb-detect-encoding.php ?? :) that looks interesting. Maybe we could run this function when htmlspecialchars return empty string (prior to your commmit) ? would it work to detect encoding of the URL containing non utf8 characters?

@diosmosis
Collaborator

Replying to matt:

Looks good to me! Thanks for researching and finding this fix, which is better than before. We cant really afford to store charsets per website... too complicated.

However, I have an idea for a possible "ultimate" solution: could we use http://php.net/manual/en/function.mb-detect-encoding.php ?? :) that looks interesting. Maybe we could run this function when htmlspecialchars return empty string (prior to your commmit) ? would it work to detect encoding of the URL containing non utf8 characters?

We can tell if it's not UTF-8, but that doesn't mean we can find out what the true encoding is. Some non-UTF8 strings can still be interpreted as UTF-8 though the result will be gibberish.

@mattab
Owner

This function will find out the encoding I think: http://php.net/manual/en/function.mb-detect-encoding.php -- it might work on the non utf8 URL, so you could display a label that works with non UTF8 URLs without modifying the URLs, simply detect encoding > convert from detectedEncoding to UTF8 > Parse URL To get the Pretty name <label>

@diosmosis
Collaborator

The warning is gone, for now I think that's good enough.

@mattab
Owner

We have a variation of similar error in: #3859

@mattab
Owner

Invalid multibyte sequence in argument in /html/piwik/vendor/twig/twig/lib/Twig/Extension/Core.php on line 927

This bug has occurred again in Piwik 2.2.2. I created ticket at #5157

@mattab mattab added this to the 1.10 - Piwik 1.10 milestone
@diosmosis diosmosis was assigned by mattab
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.