trapping dodgy alt
Nicholas Peter Bamber committed Oct 4, 2010
1 parent 6a7cadf commit 6bbe560
Showing 4 changed files with 102 additions and 2 deletions.
27 changes: 25 additions & 2 deletions lib/HTML/Acid.pm
@@ -43,6 +43,12 @@ Readonly my $URL_REGEX => qr{
\z # end of string
}xms;

Readonly my $ALT_REGEX => qr{
\A # start of string
[\w\s\.\,]+ #
\z # end of string
}xms;

sub new {
my $class = shift;
my %args = @_;
@@ -60,6 +66,7 @@ sub new {
api_version => 3,
empty_element_tags=>1,
strict_comment=>1,
attr_encoded=>1,
);

# Set up HTML::Parser handlers
@@ -226,7 +233,7 @@ sub _img_start {
$self->_buffer("<img alt=\"$alt\" height=\"$height\" src=\"$src\" "
."title=\"$title\" width=\"$width\" />");
}
- else {
+ elsif ($alt =~ $ALT_REGEX) {
$self->_buffer(" $alt ");
}
return;
@@ -394,7 +401,8 @@ URL. They may also have a C<title> attribute.
=item * Images must have C<src>, C<title>, C<alt>, C<height> and C<width>
attributes. The C<src> attribute must match the same regular expression
as C<href>. If any of these tags are missing the image is replaced by
- the contents of the alt tag.
+ the contents of the alt attribute, so long as it consists only of alphanumeric
+ characters, spaces, full stops and commas. Otherwise the image is removed.
=item * All other tags must have no attributes and may only contain text.
@@ -459,6 +467,21 @@ This module works by subclassing L<HTML::Parser>.
None reported.
=head1 TODO
=over
=item * More relaxed treatment of the alt tag would be good. However it is
easier to go from restrictive behaviour to more relaxed so it will stay like
it is for now.
=item * Sooner or later a little more flexibility in handling attributes
will be required.
=item * I think this module could do with an XS backend for a speed up.
=back
=head1 BUGS AND LIMITATIONS
No bugs have been reported.
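
The new `$ALT_REGEX` check can be tried in isolation. The following is a minimal sketch, not code from the module: the pattern is copied from the diff, but it is declared with a plain `my` (so the example runs without the Readonly module) and `alt_fallback` is a hypothetical helper introduced only for illustration. It also shows that `\w` and `\s` accept slightly more than the POD wording suggests, namely underscores and any whitespace including newlines.

```perl
use strict;
use warnings;

# Same pattern as $ALT_REGEX in the diff; plain `my` instead of Readonly
# so that only core Perl is needed to run this sketch.
my $ALT_REGEX = qr{
    \A           # start of string
    [\w\s\.\,]+  # word characters, whitespace, full stops, commas
    \z           # end of string
}xms;

# Hypothetical helper (not part of HTML::Acid): returns the text that
# would stand in for an incomplete <img>, or '' if the alt is rejected.
sub alt_fallback {
    my ($alt) = @_;
    return q{} if !defined $alt;
    return $alt =~ $ALT_REGEX ? " $alt " : q{};
}

print '[', alt_fallback('A cat, asleep.'), "]\n";   # [ A cat, asleep. ]
print '[', alt_fallback('snake_case'), "]\n";       # [ snake_case ]
print '[', alt_fallback(q{<script type='text/javascript'>alert('XSS')</script>}), "]\n";   # []
print '[', alt_fallback(q{}), "]\n";                # []
```

Because the pattern is anchored with `\A` and `\z` and uses `+`, an empty alt and any alt containing a single disallowed character (such as `<`, `'` or `&`) are both rejected outright rather than partially cleaned.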
33 changes: 33 additions & 0 deletions t/in/38-dodgy-alt
@@ -0,0 +1,33 @@
<h3>The first<br/> test</h3 class="really_end">
<em>This</em> is a very conformant
<a href="/xhtml#def">XHTML</a>
fragment. <span class="outrageous">I hope you like it</span>. I will be using it as test material</em>
during the development of this module. Actually not just <em>development</em>
but subsequent <strong>support</strong> will both use this and variant
files for regression tests. <h3 id="play_havoc">Blah<p>This paragraph is intended to include <br/>all the<br/>
features permitted in our restricted subset of HTML. Therefore I need to
include an utterly gratuitous
<img alt="<script type='text/javascript'>alert('XSS')</script>" height="100000" title="a gratuitous image" width="100000"><a id="/blah" title="Woo I;m invisible">
</a>
at no extra cost.</p><p>










</p>
<h3 id="second_paragraph"></h3>
<p>This second paragraph is likely to be a bit shorter. However I cannot
entirely guarantee that. That is I might have to revise that prediction
once the paragraph is complete.</p>
<h3 id="test2">The big idea</h3>
<p>Actually I am sorry to <span class="invisible">admit</span> it, but I am running out of ideas for
test material. I mean I really needed a second header and so I had to
have yet another paragraph. I do apologize for any inconvenience reading
this <strong>material may have caused you.
<br/>
22 changes: 22 additions & 0 deletions t/out/38-dodgy-alt
@@ -0,0 +1,22 @@
<h3 id="the-first-test">The first test</h3>
<p><em>This</em> is a very conformant
<a href="/xhtml#def">XHTML</a>
fragment. I hope you like it. I will be using it as test material
during the development of this module. Actually not just <em>development</em>
but subsequent <strong>support</strong> will both use this and variant
files for regression tests. </p><h3 id="play_havoc">Blah</h3><p>This paragraph is intended to include all the
features permitted in our restricted subset of HTML. Therefore I need to
include an utterly gratuitous

at no extra cost.</p>

<p>This second paragraph is likely to be a bit shorter. However I cannot
entirely guarantee that. That is I might have to revise that prediction
once the paragraph is complete.</p>
<h3 id="test2">The big idea</h3>
<p>Actually I am sorry to admit it, but I am running out of ideas for
test material. I mean I really needed a second header and so I had to
have yet another paragraph. I do apologize for any inconvenience reading
this <strong>material may have caused you.

</strong></p>
22 changes: 22 additions & 0 deletions t/variant/38-dodgy-alt
@@ -0,0 +1,22 @@
<h3 id="the-first-test">The first test</h3>
<p><em>This</em> is a very conformant
<a href="/xhtml#def">XHTML</a>
fragment. I hope you like it. I will be using it as test material
during the development of this module. Actually not just <em>development</em>
but subsequent <strong>support</strong> will both use this and variant
files for regression tests. </p><h3 id="play_havoc">Blah</h3><p>This paragraph is intended to include all the
features permitted in our restricted subset of HTML. Therefore I need to
include an utterly gratuitous
<div></div></p><p>
at no extra cost.</p>

<p>This second paragraph is likely to be a bit shorter. However I cannot
entirely guarantee that. That is I might have to revise that prediction
once the paragraph is complete.</p>
<h3 id="test2">The big idea</h3>
<p>Actually I am sorry to admit it, but I am running out of ideas for
test material. I mean I really needed a second header and so I had to
have yet another paragraph. I do apologize for any inconvenience reading
this <strong>material may have caused you.

</strong></p>
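
The fixture's `<img>` has no `src` attribute and a script tag in its `alt`, which is why nothing of it survives in `t/out/38-dodgy-alt`. The decision described in the POD change can be sketched as below. This is an illustration under stated assumptions, not the body of `_img_start`: `render_img` and `@REQUIRED` are hypothetical names, and the `src` check against `$URL_REGEX` is omitted because that pattern is not shown in full in this diff.

```perl
use strict;
use warnings;

# Pattern copied from the diff (compact form, same meaning under /x).
my $ALT_REGEX = qr{\A [\w\s\.\,]+ \z}xms;

# Attributes the POD says an image must carry, in output order.
my @REQUIRED = qw(alt height src title width);

# Hypothetical stand-in for the branch in _img_start.
# Assumption: validation of src against $URL_REGEX is left out.
sub render_img {
    my (%attr) = @_;
    my @missing = grep { !defined $attr{$_} } @REQUIRED;

    # All five attributes present: emit the image itself.
    if (!@missing) {
        return sprintf
            '<img alt="%s" height="%s" src="%s" title="%s" width="%s" />',
            @attr{@REQUIRED};
    }

    # Incomplete image: fall back to the alt text only if it is harmless.
    my $alt = $attr{alt};
    return " $alt " if defined $alt && $alt =~ $ALT_REGEX;

    # Otherwise the image is dropped entirely.
    return q{};
}

# The image from t/in/38-dodgy-alt: no src, hostile alt.
my $dodgy = render_img(
    alt    => q{<script type='text/javascript'>alert('XSS')</script>},
    height => 100000,
    title  => 'a gratuitous image',
    width  => 100000,
);
print "dodgy: [$dodgy]\n";    # dodgy: []

# Same attributes but a harmless alt: the alt text is kept.
my $benign = render_img(alt => 'A small logo', height => 10, width => 10);
print "benign: [$benign]\n";  # benign: [ A small logo ]
```

Before this commit the fallback branch was a bare `else`, so the hostile alt in the first call would have been written to the output as-is.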
