Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

heap-buffer-overflow S_scan_const (toke.c:2975) #15624

Closed
p5pRT opened this issue Sep 23, 2016 · 14 comments

Comments

@p5pRT
Copy link
Collaborator

@p5pRT p5pRT commented Sep 23, 2016

Migrated from rt.perl.org#129342 (status was 'resolved')

Searchable as RT129342$

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Sep 23, 2016

From @geeknik

Triggered in Perl v5.25.5-8-g3c42ae1 with AFL+ASAN.

./perl -e 'y%\o-0%%'

=================================================================
==7216==ERROR​: AddressSanitizer​: heap-buffer-overflow on address
0x60200000e1af at pc 0x000000681ac7 bp 0x7fff7a36dd50 sp 0x7fff7a36dd48
READ of size 1 at 0x60200000e1af thread T0
  #0 0x681ac6 in S_scan_const /root/perl/toke.c​:2975​:21
  #1 0x615b1f in Perl_yylex /root/perl/toke.c​:4774​:10
  #2 0x6addee in Perl_yyparse /root/perl/perly.c​:334​:19
  #3 0x59c491 in S_parse_body /root/perl/perl.c​:2374​:9
  #4 0x59282c in perl_parse /root/perl/perl.c​:1689​:2
  #5 0x4de5d5 in main /root/perl/perlmain.c​:121​:18
  #6 0x7f15f31aab44 in __libc_start_main /build/glibc-uPj9cH/glibc-2.
19/csu/libc-start.c​:287
  #7 0x4de26c in _start (/root/perl/perl+0x4de26c)

0x60200000e1af is located 1 bytes to the left of 10-byte region
[0x60200000e1b0,0x60200000e1ba)
allocated by thread T0 here​:
  #0 0x4c0beb in malloc (/root/perl/perl+0x4c0beb)
  #1 0x7f8617 in Perl_safesysmalloc /root/perl/util.c​:153​:21

SUMMARY​: AddressSanitizer​: heap-buffer-overflow /root/perl/toke.c​:2975
S_scan_const
Shadow bytes around the buggy address​:
  0x0c047fff9be0​: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9bf0​: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9c00​: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9c10​: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9c20​: fa fa fa fa fa fa fa fa fa fa fa fa fa fa 00 00
=>0x0c047fff9c30​: fa fa fd fd fa[fa]00 02 fa fa 00 04 fa fa 01 fa
  0x0c047fff9c40​: fa fa 05 fa fa fa 00 02 fa fa 00 02 fa fa 00 04
  0x0c047fff9c50​: fa fa 02 fa fa fa 00 02 fa fa 00 07 fa fa 00 fa
  0x0c047fff9c60​: fa fa 00 02 fa fa 05 fa fa fa 00 02 fa fa 06 fa
  0x0c047fff9c70​: fa fa 00 02 fa fa 05 fa fa fa 00 05 fa fa 04 fa
  0x0c047fff9c80​: fa fa 05 fa fa fa 05 fa fa fa 00 00 fa fa 00 02
Shadow byte legend (one shadow byte represents 8 application bytes)​:
  Addressable​: 00
  Partially addressable​: 01 02 03 04 05 06 07
  Heap left redzone​: fa
  Heap right redzone​: fb
  Freed heap region​: fd
  Stack left redzone​: f1
  Stack mid redzone​: f2
  Stack right redzone​: f3
  Stack partial redzone​: f4
  Stack after return​: f5
  Stack use after scope​: f8
  Global redzone​: f9
  Global init order​: f6
  Poisoned by user​: f7
  Container overflow​: fc
  ASan internal​: fe
==7216==ABORTING

A non-ASAN build of 5.20.2 errors out​:
Missing braces on \o{} at -e line 1, within string
Execution of -e aborted due to compilation errors.

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Oct 5, 2016

From @hvds

The tr/// range constructor in scan_const knows that the characters representing start and end of the range have been written to sv; however on parse errors we don't do that but still continue.

The effect is to read a (possibly utf8) character before the start of sv's PV, but then still give the parse error. I don't think this is a security concern; if someone gives a +1 I'll push the patch and open up the ticket.

The attached patch fixes it, so valgrind shows no error on 'y//\o-x/' nor on 'y//x-\o/', but I can't think how to write a test.

Hugo

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Oct 5, 2016

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Oct 5, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jan 17, 2017

From @tonycoz

On Wed, 05 Oct 2016 07​:08​:23 -0700, hv wrote​:

The tr/// range constructor in scan_const knows that the characters
representing start and end of the range have been written to sv;
however on parse errors we don't do that but still continue.

The effect is to read a (possibly utf8) character before the start of
sv's PV, but then still give the parse error. I don't think this is a
security concern; if someone gives a +1 I'll push the patch and open
up the ticket.

The attached patch fixes it, so valgrind shows no error on 'y//\o-x/'
nor on 'y//x-\o/', but I can't think how to write a test.

Your fix looks fine to me.

Patch with a test attached.

Tony

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jan 17, 2017

From @tonycoz

0001-perl-129342-test-for-buffer-overflow.patch
From d8435d6bd3614c149b02ee591dba8d7f769843fc Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Tue, 17 Jan 2017 11:52:53 +1100
Subject: (perl #129342) test for buffer overflow

---
 t/lib/croak/toke | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/t/lib/croak/toke b/t/lib/croak/toke
index d35eab6..f1817b3 100644
--- a/t/lib/croak/toke
+++ b/t/lib/croak/toke
@@ -350,3 +350,10 @@ EXPECT
 syntax error at - line 4, near "ϡ time"
   (Might be a runaway multi-line ϡϡ string starting on line 3)
 Execution of - aborted due to compilation errors.
+########
+# NAME tr/// handling of mis-formatted \o characters
+# may only fail with ASAN
+tr/\o-0//;
+EXPECT
+Missing braces on \o{} at - line 2, within string
+Execution of - aborted due to compilation errors.
-- 
2.1.4

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jan 19, 2017

From @hvds

On Mon, 16 Jan 2017 16​:53​:51 -0800, tonyc wrote​:

Your fix looks fine to me.

Patch with a test attached.

Thanks Tony, I've pushed my fix and your test as 3dd4eae and 59143e2, and will move the ticket to the open queue.

Hugo

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jan 19, 2017

@hvds - Status changed from 'open' to 'pending release'

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jan 20, 2017

From @khwilliamson

I now think this is the wrong solution.

There are other occurrences of where there is an error in the parse and we continue on, so that the tr code will have invalid inputs. Rather than fix them all (and we might miss some), why not have the tr code check if there has been an error, and if so, quit now. instead of forging ahead and getting into trouble?

Trying to continue parsing in the face of errors is problematic. There are times when we can't obviously continue, and there are times when we can continue, so as to find the most number of problems in a single run. (I remember how frustrating it was to work with SNOBOL that quit on the first error. This was compounded by the fact that it was punch cards, and the turnaround was 8 hours per job, except in the middle of the night.) And there are times when we think we can continue, but it creates problems. This is one of them. And we could continue except in the case of tr. So I think the tr code should be the part that changes.

--
Karl Williamson

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jan 21, 2017

From @hvds

On Fri, 20 Jan 2017 10​:49​:40 -0800, khw wrote​:

I now think this is the wrong solution.

There are other occurrences of where there is an error in the parse
and we continue on, so that the tr code will have invalid inputs.
Rather than fix them all (and we might miss some), why not have the tr
code check if there has been an error, and if so, quit now. instead of
forging ahead and getting into trouble?

Trying to continue parsing in the face of errors is problematic.
There are times when we can't obviously continue, and there are times
when we can continue, so as to find the most number of problems in a
single run. (I remember how frustrating it was to work with SNOBOL
that quit on the first error. This was compounded by the fact that it
was punch cards, and the turnaround was 8 hours per job, except in the
middle of the night.) And there are times when we think we can
continue, but it creates problems. This is one of them. And we could
continue except in the case of tr. So I think the tr code should be
the part that changes.

I'm happy to consider replacing my fix with such an approach - do you have an implementation to show? "Wrong" feels like an overstatement though, and I don't think further improvements belong on this ticket.

More generally, scan_const as a whole has become unwieldy enough to merit some refactoring. And in considering the current complexity, I'm also looking askance at your recently added cases for range_min == range_max and range_min + 1 == range_max​: a small compile-time benefit for code that nobody would normally write does not seem worth any added complexity to the implementation.

Hugo

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jan 21, 2017

From @khwilliamson

On 01/21/2017 03​:29 AM, Hugo van der Sanden via RT wrote​:

On Fri, 20 Jan 2017 10​:49​:40 -0800, khw wrote​:

I now think this is the wrong solution.

There are other occurrences of where there is an error in the parse
and we continue on, so that the tr code will have invalid inputs.
Rather than fix them all (and we might miss some), why not have the tr
code check if there has been an error, and if so, quit now. instead of
forging ahead and getting into trouble?

Trying to continue parsing in the face of errors is problematic.
There are times when we can't obviously continue, and there are times
when we can continue, so as to find the most number of problems in a
single run. (I remember how frustrating it was to work with SNOBOL
that quit on the first error. This was compounded by the fact that it
was punch cards, and the turnaround was 8 hours per job, except in the
middle of the night.) And there are times when we think we can
continue, but it creates problems. This is one of them. And we could
continue except in the case of tr. So I think the tr code should be
the part that changes.

I'm happy to consider replacing my fix with such an approach - do you have an implementation to show? "Wrong" feels like an overstatement though, and I don't think further improvements belong on this ticket.

Perhaps you take the word "wrong" more seriously than I meant it. The
fix fixes the bug reported in this ticket. In that sense it's right.
But in thinking about it, there appear to be similar issues that weren't
addressed by this ticket,l and I think the approach you took could be
extended to fix these, but we might miss some, or some might get added
later by someone not familiar with the nuances. I was suggesting a
different approach, as I believe that the tr code is the only part of
this routine that relies on reading the results of previous loop iterations.

I hadn't come up with an implementation, but it would essentially be
done by looking at the syntax error count, at the point where it
otherwise would be looking at the results of the previous iterations,
and if that count was not zero, bailing out immediately. Perhaps this
could be abstracted into a macro or function. I don't like the
asymmetry currently where it calls yyerror or Perl_croak depending on
whether to bail out later or now. I would prefer there to be something
like yyerror_quit_now(), and it might display a message to the effect of
not being able to continue parsing.

More generally, scan_const as a whole has become unwieldy enough to merit some refactoring.

It's been unwieldy longer than I've been involved in this project; and
in fact a comment last touched by Dave Mitchell says it is "terrifying
code". I've tried to help by adding comments. I haven't spent any time
considering how to refactor, other than to extract out some code that
was common to regcomp.c and this function (and with subtle variations)
into separate functions. As always, patches welcome.
  And in considering the current complexity, I'm also looking askance at
your recently added cases for range_min == range_max and range_min + 1
== range_max​: a small compile-time benefit for code that nobody would
normally write does not seem worth any added complexity to the
implementation.

You're right, but the reason I did this is that there is now code below
that doesn't work correctly unless the range is at least 3 elements
long. Rather than special casing that, it seemed to me preferable to
just handle the cases earlier when it actually would be of some (albeit
uncommon) benefit.

Hugo

---
via perlbug​: queue​: perl5 status​: pending release
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=129342

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Feb 14, 2017

From @khwilliamson

On 01/21/2017 03​:43 PM, Karl Williamson wrote​:

On 01/21/2017 03​:29 AM, Hugo van der Sanden via RT wrote​:

On Fri, 20 Jan 2017 10​:49​:40 -0800, khw wrote​:

I now think this is the wrong solution.

There are other occurrences of where there is an error in the parse
and we continue on, so that the tr code will have invalid inputs.
Rather than fix them all (and we might miss some), why not have the tr
code check if there has been an error, and if so, quit now. instead of
forging ahead and getting into trouble?

Trying to continue parsing in the face of errors is problematic.
There are times when we can't obviously continue, and there are times
when we can continue, so as to find the most number of problems in a
single run. (I remember how frustrating it was to work with SNOBOL
that quit on the first error. This was compounded by the fact that it
was punch cards, and the turnaround was 8 hours per job, except in the
middle of the night.) And there are times when we think we can
continue, but it creates problems. This is one of them. And we could
continue except in the case of tr. So I think the tr code should be
the part that changes.

I'm happy to consider replacing my fix with such an approach - do you
have an implementation to show? "Wrong" feels like an overstatement
though, and I don't think further improvements belong on this ticket.

Perhaps you take the word "wrong" more seriously than I meant it. The
fix fixes the bug reported in this ticket. In that sense it's right.
But in thinking about it, there appear to be similar issues that weren't
addressed by this ticket,l and I think the approach you took could be
extended to fix these, but we might miss some, or some might get added
later by someone not familiar with the nuances. I was suggesting a
different approach, as I believe that the tr code is the only part of
this routine that relies on reading the results of previous loop
iterations.

I hadn't come up with an implementation, but it would essentially be
done by looking at the syntax error count, at the point where it
otherwise would be looking at the results of the previous iterations,
and if that count was not zero, bailing out immediately. Perhaps this
could be abstracted into a macro or function. I don't like the
asymmetry currently where it calls yyerror or Perl_croak depending on
whether to bail out later or now. I would prefer there to be something
like yyerror_quit_now(), and it might display a message to the effect of
not being able to continue parsing.

More generally, scan_const as a whole has become unwieldy enough to
merit some refactoring.

It's been unwieldy longer than I've been involved in this project; and
in fact a comment last touched by Dave Mitchell says it is "terrifying
code". I've tried to help by adding comments. I haven't spent any time
considering how to refactor, other than to extract out some code that
was common to regcomp.c and this function (and with subtle variations)
into separate functions. As always, patches welcome.
And in considering the current complexity, I'm also looking askance at
your recently added cases for range_min == range_max and range_min + 1
== range_max​: a small compile-time benefit for code that nobody would
normally write does not seem worth any added complexity to the
implementation.

You're right, but the reason I did this is that there is now code below
that doesn't work correctly unless the range is at least 3 elements
long. Rather than special casing that, it seemed to me preferable to
just handle the cases earlier when it actually would be of some (albeit
uncommon) benefit.

In the end, I came to the conclusion that Hugo's original approach is
probably the best. I pushed 6b58f9b
to that effect​:

  commit 6b58f9b
  Author​: Karl Williamson <khw@​cpan.org>
  Date​: Mon Feb 13 19​:22​:59 2017 -0700

* toke.c​: Make sure things are initialized

  Commit 3dd4eae fixed a bug
wherein the
  tr/// operator parsing code could be looking at uninitialized data.
  This happens only because we try to carry on when we find errors,
so as
  to find as many errors as possible in a single run, as a
convenience to
  the person debugging the script being compiled. And we failed to
  initialize stuff upon getting an error; stuff that was later looked at
  by tr///.

  That commit fixed the ticket by making sure the things mentioned there
  got initialized upon error, but didn't handle the various other places
  in the loop where the same thing could happen.

  At the time, I thought it would be easier to instead change the tr///
  handling code to know that its inputs were problematic, and to avoid
  looking at them in that case. This is easily done, and would
  automatically catch all the cases in the loop, now and any added
in the
  future.

  But then I thought, maybe tr/// isn't the only operator that could be
  thrown off by this. It is the most obvious one, to someone who knows
  how it goes about getting compiled; but there may be other operators
  that I don't know how they get compiled and have the same or a similar
  problem. The better solution then would be to extend
  3dd4eae to make sure everything gets
  initialized when there is an error. That is what this current commit
  does.

  The previous few commits have refactored things so as to minimize the
  number of places that need to be handled here, down to three. I
kinda
  doubt that new constructs will be added, at this stage in the language
  development, that would require the same initialization handling.
But,
  if they were, hopefully those doing it would follow the existing
  paradigm that this commit and 3dd4eae
  establish.

  Another way to handle this would have been to, instead of doing an
  initialize-and-'continue', to instead jump to a common label at the
  bottom of the loop which does the initialization. I think it doesn't
  matter much which, so left it as this.

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented May 30, 2017

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.26.0, this and 210 other issues have been
resolved.

Perl 5.26.0 may be downloaded via​:
https://metacpan.org/release/XSAWYERX/perl-5.26.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented May 30, 2017

@khwilliamson - Status changed from 'pending release' to 'resolved'

@p5pRT p5pRT closed this May 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can’t perform that action at this time.