Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New way for input getString and getHtml #16875

Closed
wants to merge 1 commit into from

Conversation

csthomas
Copy link
Contributor

@csthomas csthomas commented Jun 27, 2017

Pull Request for Issue ##16812

I'm not sure if this is correct. This is my initial concept.

Summary of Changes

  1. New way for getString() method:
    getString() - always remove all tags

Above idea is borrowed from #16842

  1. New way for getHtml() method:
    getHtml() - escapes < and > instead remove it:
  • if character < is followed by a white space, ex 4 < 5 then it will be replaced by 4 &lt; 5
  • if > does not have paired <, ex: A>B => A&gt;B
  • if <> then that characters will be replace by &lt;&gt; ???
  • if < does not have a pair with > then it will be replaced by &lt;, ex: A<B => A&lt;B
  1. Fix notice:
    Notice: A non well formed numeric value encountered in .../libraries/joomla/filter/input.php on line 1202 divHello "Joomla"

tested as:

$_REQUEST['test'] = '<div&#x003e;Hello &quot;Joomla&quot;&#60;/div>';
echo JFactory::getApplication()->input->getString('test', '');die;

Testing Instructions

Not yet.
I suspect that above point 2 could not be accepted in full. Any comments?

Expected result

getString() will always return clean text without any tags. Input whitelist and blacklist will not change that.

Actual result

...

Documentation Changes Required

Probably.

Details

  1. For filter instance with default blacklist enabled:
No. Input Type Old behaviour* This PR
1. '<em' 'STRING' 'em' ''
'<em' '' 'em' '&lt;em'
'<em' 'HTML' 'em' '&lt;em'
'&lt;em' 'STRING' 'em' ''
'&lt;em' '' 'em' '&lt;em'
'&lt;em' 'HTML' '&lt;em' '&lt;em'
2. 'em>' 'STRING' 'em>' 'em>'
'em>' '' 'em>' 'em&gt;'
'em>' 'HTML' 'em>' 'em&gt;'
'em&gt;' 'STRING' 'em>' 'em>'
'em&gt;' '' 'em>' 'em&gt;'
'em&gt;' 'HTML' 'em&gt;' 'em&gt;'
3. '< ' 'STRING' ' ' '< '
'< ' '' ' ' '&lt; '
'< ' 'HTML' ' ' '&lt; '
'&lt; ' 'STRING' ' ' '< '
'&lt; ' '' ' ' '&lt; '
'&lt; ' 'HTML' '&lt; ' '&lt; '
4.** '<>' 'STRING' '' ''
'<>' '' '' '&lt;&gt;'
'<>' 'HTML' '' '&lt;&gt;'
'&lt;&gt;' 'STRING' '' ''
'&lt;&gt;' '' '' '&lt;&gt;'
'&lt;&gt;' 'HTML' '&lt;&gt;' '&lt;&gt;'

(*) Old behaviour means behaviour on J3.7.2 and older.
(**) I'm going to revert behaviour for point 4 on this PR.

@joeforjoomla
Copy link
Contributor

Sounds good to me.
Just tested the getString method and it works as expected. Now it's possible to preserve '<' and '>' characters if used in a generic context.

@rdeutz rdeutz self-assigned this Jun 27, 2017
'',
'<>',
'&lt;&gt;',
'From generic cases'
Copy link
Contributor

@rdeutz rdeutz Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't looks right to me, I think this should be let it as it is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add one detail. This test is parsing input as HTML (not STRING). As a result should be valid HTML. It can not stay as <>. I can only remove it. I probably do that.

'',
'< >',
'&lt; &gt;',
'From generic cases'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The php function strip_tags() do almost the same, means output < >. Old version from 3.7.2 leave only >

'<< >>',
'&lt;&lt; &gt;&gt;',
'From generic cases'
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Contributor Author

@csthomas csthomas Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may change that to output only &gt; or >

@rdeutz
Copy link
Contributor

rdeutz commented Jun 28, 2017

I am not going to merge this. After some discussion I came to the conclusion that we need to look deeper into this and that we don't have the time to do this before the next release will be out.

@csthomas
Copy link
Contributor Author

OK. If this can help I can remove all changes from point 2, means all changes for getHtml().
This way it will be more similar to #16842

@joeforjoomla
Copy link
Contributor

joeforjoomla commented Jun 28, 2017

@rdeutz If this issue #16842 is going to be introduced in the next 3.7.3 update it should be at least documented as potential issues can arise. For example our extensions need an update because the '>' character is stripped out by getString.
If we don't have time to get it definitely fixed for 3.7.3, at least it would be the case to revert the getString method to still allow the '>' character as in 3.7.2 to avoid bugs.

@csthomas
Copy link
Contributor Author

I wrote small PR to fix only B/C break in input filter. It is not complicated as this one.

@csthomas csthomas closed this Nov 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants