Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always use htmlspecialchars($str, ENT_COMPAT, 'UTF-8') #10399

Closed
mahagr opened this issue May 11, 2016 · 17 comments
Closed

Always use htmlspecialchars($str, ENT_COMPAT, 'UTF-8') #10399

mahagr opened this issue May 11, 2016 · 17 comments

Comments

@mahagr
Copy link
Contributor

mahagr commented May 11, 2016

One of our users ran into some issues with his Joomla installation because of his server had default_charset=latin1 in php.ini.

Warning: htmlspecialchars(): charset `latin1' not supported, assuming utf-8 in ....

Documentation for htmlspecialchars() says that: "PHP 5.6: The default value for the encoding parameter was changed to be the value of the default_charset configuration option"

The issue can be fixed by making sure that all htmlspecialchars() calls have encoding set.

Additionally in my tests there is an issue if for some reason incoming encoding is wrong, which is easy to fix by using ENT_SUBSTITUTE (PHP 5.4+ only!):

    var_dump(htmlspecialchars('Täällä olen minä',  ENT_COMPAT, 'UTF-8'));
    var_dump(htmlspecialchars('T'.chr(132).chr(132).'ll'.chr(132).' olen min'.chr(132), ENT_COMPAT, 'UTF-8'));
    var_dump(htmlspecialchars('T'.chr(132).chr(132).'ll'.chr(132).' olen min'.chr(132), ENT_COMPAT|ENT_SUBSTITUTE, 'UTF-8'));

Results:

string(20) "Täällä olen minä"
string(0) ""
string(24) "T��ll� olen min�"
@Bakual
Copy link
Contributor

Bakual commented May 11, 2016

Afaik, we usually define the encoding when using htmlspecialchars().
Do you have a specific place where it wasn't done and produced that error?

@mahagr
Copy link
Contributor Author

mahagr commented May 11, 2016

A quick search indicates that there are probably hundreds of locations where the call only has a single parameter. It looks like the parameters are omitted more often than used.

@brianteeman
Copy link
Contributor

And yet we have never had reports. Isn't this a server config issue?
On 11 May 2016 12:08 pm, "Matias Griese" notifications@github.com wrote:

A quick search indicates that there are probably hundreds of locations
where the call only has a single parameter. It looks like the parameters
are omited more often than used.


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#10399 (comment)

@mahagr mahagr changed the title Always use htmlspecialchars($str, ENT_COMPAT|ENT_SUBSTITUTE, 'UTF-8') Always use htmlspecialchars($str, ENT_COMPAT, 'UTF-8') May 11, 2016
@mahagr
Copy link
Contributor Author

mahagr commented May 11, 2016

Yes, in this case the issues were caused by a bad server configuration, but PHP documentation says that omitting the encoding parameter is not recommended.

But as you still support PHP 5.3, you may get into troubles as the function call in PHP 5.3 still uses ISO-8859-1 encoding by default, which may result on broken characters with UTF-8.

So not having those two extra parameters set is a bad idea in both PHP 5.3 and 5.6+.

@brianteeman
Copy link
Contributor

If it was just one or two places to edit then of course I would say go for
it but if it really is so many places and with no other reports i am
reluctant

On 11 May 2016 at 12:25, Matias Griese notifications@github.com wrote:

Yes, in this case the issues were caused by a bad server configuration,
but PHP documentation says that omitting the encoding parameter is not
recommended.

But as you still support PHP 5.3, you may get into troubles as the
function call in PHP 5.3 still uses ISO-8859-1 encoding by default, which
may result on broken characters with UTF-8.

So not having those two extra parameters set is a bad idea in both PHP 5.3
and 5.6+.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#10399 (comment)

Brian Teeman
Co-founder Joomla! and OpenSourceMatters Inc.
http://brian.teeman.net/

@Bakual
Copy link
Contributor

Bakual commented May 11, 2016

So latin1 isn't a supported value for default_charset and thus it produced the warning. At least I understand the documentation like this.

I'm a bit with Brian here. While it would be recommended to always specify the encoding in htmlspecialchars(), we didn't really experience any issues in the places where it wasn't done.
Technically, the issue is valid. In practice, it doesn't seem to matter.

If someone wants to track down all occurances and fix them, go ahead. It could be a pain to test though.

@mahagr
Copy link
Contributor Author

mahagr commented May 11, 2016

LOL, it looks that at least @zero-24 is working on this issue. :)

I reported this issue as it took fairly long time to figure out what was wrong in the user setup and as it affected our software as well. And the warning was indeed caused by bad configuration.

BTW: Would it be a good idea to introduce some common escape() method into Joomla? I hate using htmlspecialchars() for some obvious reasons.

@Bakual
Copy link
Contributor

Bakual commented May 11, 2016

BTW: Would it be a good idea to introduce some common escape() method into Joomla? I hate using htmlspecialchars() for some obvious reasons.

You mean something like https://github.com/joomla/joomla-cms/blob/staging/libraries/legacy/view/legacy.php#L347-L367

@mahagr
Copy link
Contributor Author

mahagr commented May 11, 2016

Yes, but something that can be used outside of views.

@Bakual
Copy link
Contributor

Bakual commented May 11, 2016

Not sure if it makes sense for modules and plugins as those usually don't have that many strings to escape.
It's only a single function call we're talking about. You may end up writing more code than what you try to replace 😄

@mahagr
Copy link
Contributor Author

mahagr commented May 11, 2016

Yeah, I agree its likely much bigger change, but it would prevent people from omitting parameters. But as you can see, nobody uses $this->escape() either in template files... :)

@Bakual
Copy link
Contributor

Bakual commented May 11, 2016

But as you can see, nobody uses $this->escape() either in template files... :)

That was my thinking as well, but a search proved me wrong 😄

@brianteeman
Copy link
Contributor

Closed as @zero-24 is being a hero


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/10399.

@zero-24
Copy link
Contributor

zero-24 commented May 11, 2016

I'm doing it step by step so it can be easily reviewed and tested 😄

@izharaazmi
Copy link
Contributor

Can we have something to abstract this dirty looking function call into something like: JHtmlString::escape() or JHtml::escape() and use it consistently in the core at least?

About the 3rd party developers we can't help much. I have seen many of them still using $_GET, $_REQUEST, $_SESSION etc. So they are not a question here.

PS: If done so, we may deprecate the JViewLegacy::escape too and have a common method everywhere. This is however just a flying idea and needs discussion.

@Bakual
Copy link
Contributor

Bakual commented May 11, 2016

I'm not a huge fan of wrapping a native PHP function into an own static method.
JViewLegacy::escape is a bit special as it allows different encoding and escaping mechanism being used. I don't know if that functionality is actually used anywhere but it's there. If not for that additional feature, I would be in favor of just replacing it with the htmlspecialchars() call directly.

@mahagr
Copy link
Contributor Author

mahagr commented May 11, 2016

I can see one use for the method: being able to do better in PHP 5.4+:

htmlspecialchars($str, ENT_COMPAT|ENT_SUBSTITUTE, 'UTF-8');

That would not break if input is for some reason in non-UTF-8 encoding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants