Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

commit.c: ensure find_header_mem() doesn't scan beyond given range #1652

Closed
wants to merge 1 commit into from

Conversation

Chand-ra
Copy link

@Chand-ra Chand-ra commented Feb 5, 2024

Thanks for the feedback, Kyle and René! I have update the
patch to actually solve the problem at hand but I am not very
sure about the resulting dropping of const-ness of 'eol' from
this and how big of a problem it might create (if any). I wonder
if a custom strchrnul() is the best solution to this after all.

cc: René Scharfe l.s.r@web.de
cc: Kyle Lippincott spectral@google.com
cc: Jeff King peff@peff.net

@Chand-ra
Copy link
Author

Chand-ra commented Feb 5, 2024

/preview

Copy link

gitgitgadget bot commented Feb 5, 2024

Preview email sent as pull.1652.git.1707153446245.gitgitgadget@gmail.com

@Chand-ra
Copy link
Author

Chand-ra commented Feb 5, 2024

/submit

Copy link

gitgitgadget bot commented Feb 5, 2024

Submitted as pull.1652.git.1707153705840.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1652/Chand-ra/strchrnul-v1

To fetch this version to local tag pr-1652/Chand-ra/strchrnul-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1652/Chand-ra/strchrnul-v1

Copy link

gitgitgadget bot commented Feb 5, 2024

On the Git mailing list, René Scharfe wrote (reply to this):

Am 05.02.24 um 18:21 schrieb Chandra Pratap via GitGitGadget:
> From: Chandra Pratap <chandrapratap3519@gmail.com>
>
> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
> ---
>     commit.c: ensure strchrnul() doesn't scan beyond range
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1652%2FChand-ra%2Fstrchrnul-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1652/Chand-ra/strchrnul-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1652
>
>  commit.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index ef679a0b939..a65b8e92e94 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1743,15 +1743,9 @@ const char *find_header_mem(const char *msg, size_t len,
>  	int key_len = strlen(key);
>  	const char *line = msg;
>
> -	/*
> -	 * NEEDSWORK: It's possible for strchrnul() to scan beyond the range
> -	 * given by len. However, current callers are safe because they compute
> -	 * len by scanning a NUL-terminated block of memory starting at msg.
> -	 * Nonetheless, it would be better to ensure the function does not look
> -	 * at msg beyond the len provided by the caller.
> -	 */
>  	while (line && line < msg + len) {
>  		const char *eol = strchrnul(line, '\n');
> +		assert(eol - line <= len);

Something like this might work in Verse, but C is more simple-minded.
You can't undo an out-of-bounds access after the fact, and assert()
would be compiled out if the code is built with NDEBUG anyway.

If you want to make the code work with buffers that lack a terminating
NUL then you need to replace the strchrnul() call with something that
respects buffer lengths.  You could e.g. call memchr().  Don't forget
to check for NUL to preserve the original behavior.  Or you could roll
your own custom replacement, perhaps like this:

char *strnchrnul(const char *s, int c, size_t len)
{
	while (len-- && *s && *s != c)
		s++;
	return (char *)s;
}

A test with the new unit-test framework would be nice.  It should be
possible to show that the current code runs over the passed len,
without causing undefined behavior.  E.g. find_header_mem("foo bar",
2, "foo", &len) is safe, but returns "bar" instead of NULL.

>
>  		if (line == eol)
>  			return NULL;
>
> base-commit: a54a84b333adbecf7bc4483c0e36ed5878cac17b

Copy link

gitgitgadget bot commented Feb 5, 2024

User René Scharfe <l.s.r@web.de> has been added to the cc: list.

Copy link

gitgitgadget bot commented Feb 6, 2024

On the Git mailing list, Kyle Lippincott wrote (reply to this):

On Mon, Feb 5, 2024 at 9:23 AM Chandra Pratap via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Chandra Pratap <chandrapratap3519@gmail.com>
>
> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
> ---
>     commit.c: ensure strchrnul() doesn't scan beyond range
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1652%2FChand-ra%2Fstrchrnul-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1652/Chand-ra/strchrnul-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1652
>
>  commit.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index ef679a0b939..a65b8e92e94 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1743,15 +1743,9 @@ const char *find_header_mem(const char *msg, size_t len,
>         int key_len = strlen(key);
>         const char *line = msg;
>
> -       /*
> -        * NEEDSWORK: It's possible for strchrnul() to scan beyond the range
> -        * given by len. However, current callers are safe because they compute
> -        * len by scanning a NUL-terminated block of memory starting at msg.
> -        * Nonetheless, it would be better to ensure the function does not look
> -        * at msg beyond the len provided by the caller.
> -        */
>         while (line && line < msg + len) {
>                 const char *eol = strchrnul(line, '\n');
> +               assert(eol - line <= len);

I don't think this is sufficient to address the NEEDSWORK. `assert` is
only active in debug builds, and strchrnul would have already
potentially exceeded the bounds of its memory by the time this check
is happening. We'd need a safe version of strchrnul that took the
maximum length and never exceeded it.

>
>                 if (line == eol)
>                         return NULL;
>
> base-commit: a54a84b333adbecf7bc4483c0e36ed5878cac17b
> --
> gitgitgadget
>

Copy link

gitgitgadget bot commented Feb 6, 2024

User Kyle Lippincott <spectral@google.com> has been added to the cc: list.

Copy link

gitgitgadget bot commented Feb 6, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

René Scharfe <l.s.r@web.de> writes:

>>  	while (line && line < msg + len) {
>>  		const char *eol = strchrnul(line, '\n');
>> +		assert(eol - line <= len);
>
> Something like this might work in Verse, but C is more simple-minded.
> You can't undo an out-of-bounds access after the fact, and assert()
> would be compiled out if the code is built with NDEBUG anyway.

Good comments.  Thanks.

Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
@Chand-ra Chand-ra changed the title commit.c: ensure strchrnul() doesn't scan beyond range commit.c: ensure find_header_mem() doesn't scan beyond given range Feb 7, 2024
@Chand-ra
Copy link
Author

Chand-ra commented Feb 7, 2024

/preview

Copy link

gitgitgadget bot commented Feb 7, 2024

Preview email sent as pull.1652.v2.git.1707311986346.gitgitgadget@gmail.com

@Chand-ra
Copy link
Author

Chand-ra commented Feb 7, 2024

/submit

Copy link

gitgitgadget bot commented Feb 7, 2024

Submitted as pull.1652.v2.git.1707314246530.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1652/Chand-ra/strchrnul-v2

To fetch this version to local tag pr-1652/Chand-ra/strchrnul-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1652/Chand-ra/strchrnul-v2

Copy link

gitgitgadget bot commented Feb 7, 2024

On the Git mailing list, René Scharfe wrote (reply to this):

Am 07.02.24 um 14:57 schrieb Chandra Pratap via GitGitGadget:
> From: Chandra Pratap <chandrapratap3519@gmail.com>
>
> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
> ---
>     commit.c: ensure find_header_mem() doesn't scan beyond given range
>
>     Thanks for the feedback, Kyle and René! I have update the patch to
>     actually solve the problem at hand but I am not very sure about the
>     resulting dropping of const-ness of 'eol' from this and how big of a
>     problem it might create (if any). I wonder if a custom strchrnul() is
>     the best solution to this after all.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1652%2FChand-ra%2Fstrchrnul-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1652/Chand-ra/strchrnul-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/1652
>
> Range-diff vs v1:
>
>  1:  1c62f6ee353 ! 1:  dcb2de3faea commit.c: ensure strchrnul() doesn't scan beyond range
>      @@ Metadata
>       Author: Chandra Pratap <chandrapratap3519@gmail.com>
>
>        ## Commit message ##
>      -    commit.c: ensure strchrnul() doesn't scan beyond range
>      +    commit.c: ensure find_header_mem() doesn't scan beyond given range
>
>           Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
>
>      @@ commit.c: const char *find_header_mem(const char *msg, size_t len,
>       -	 * at msg beyond the len provided by the caller.
>       -	 */
>        	while (line && line < msg + len) {
>      - 		const char *eol = strchrnul(line, '\n');
>      -+		assert(eol - line <= len);
>      +-		const char *eol = strchrnul(line, '\n');
>      ++		char *eol = (char *) line;
>      ++		for (size_t i = 0; i < len && *eol && *eol != '\n'; i++) {
>      ++			eol++;
>      ++		}
>
>        		if (line == eol)
>        			return NULL;
>
>
>  commit.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index ef679a0b939..9a460b2fd6f 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1743,15 +1743,11 @@ const char *find_header_mem(const char *msg, size_t len,
>  	int key_len = strlen(key);
>  	const char *line = msg;
>
> -	/*
> -	 * NEEDSWORK: It's possible for strchrnul() to scan beyond the range
> -	 * given by len. However, current callers are safe because they compute
> -	 * len by scanning a NUL-terminated block of memory starting at msg.
> -	 * Nonetheless, it would be better to ensure the function does not look
> -	 * at msg beyond the len provided by the caller.
> -	 */
>  	while (line && line < msg + len) {
> -		const char *eol = strchrnul(line, '\n');
> +		char *eol = (char *) line;
> +		for (size_t i = 0; i < len && *eol && *eol != '\n'; i++) {
> +			eol++;
> +		}

This uses the pointer eol only for reading, so you can keep it const.

The loop starts counting from 0 to len for each line, which cannot be
right.  find_header_mem("headers\nfoo bar", 9, "foo", &len) would still
return "bar" instead of NULL.

You could initialize i to the offset of line within msg instead (i.e.
i = line - msg).  Or check eol < msg + len instead of i < len -- then
you don't even need to introduce that separate counter.

Style nit: We tend to omit curly braces if they contain only a single
statement.

>  		if (line == eol)
>  			return NULL;
>
> base-commit: a54a84b333adbecf7bc4483c0e36ed5878cac17b

Copy link

gitgitgadget bot commented Feb 7, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

René Scharfe <l.s.r@web.de> writes:

>> -	/*
>> -	 * NEEDSWORK: It's possible for strchrnul() to scan beyond the range
>> -	 * given by len. However, current callers are safe because they compute
>> -	 * len by scanning a NUL-terminated block of memory starting at msg.
>> -	 * Nonetheless, it would be better to ensure the function does not look
>> -	 * at msg beyond the len provided by the caller.
>> -	 */
>>  	while (line && line < msg + len) {
>> -		const char *eol = strchrnul(line, '\n');
>> +		char *eol = (char *) line;
>> +		for (size_t i = 0; i < len && *eol && *eol != '\n'; i++) {
>> +			eol++;
>> +		}
>
> This uses the pointer eol only for reading, so you can keep it const.
>
> The loop starts counting from 0 to len for each line, which cannot be
> right.  find_header_mem("headers\nfoo bar", 9, "foo", &len) would still
> return "bar" instead of NULL.
>
> You could initialize i to the offset of line within msg instead (i.e.
> i = line - msg).  Or check eol < msg + len instead of i < len -- then
> you don't even need to introduce that separate counter.
>
> Style nit: We tend to omit curly braces if they contain only a single
> statement.

All true.  As we already use an extra variable 'i' for counting, we
can do without eol and reference line[i] instead, which would make
the whole thing something like

	while (line && line < msg + len) {
		size_t i;
		for (i = 0;
                     i < len && line[i] && line[i] != '\n';
		     i++)
			;
		if (key_len < i &&
		    !strncmp(line, key, ken_len) &&
		    linhe[key_len] == ' ') {
			*out_len = i - key_len - 1;
			return line + key_len + 1;
		}
                line = line[i] ? line + i + 1 : NULL;
	}

which is not too bad, simply because the original already needed to
know the length of the current line and due to lack of this "i" you
introduced, it used "eol-line" instead.  Now you have "i", the code
may get even simpler by getting rid of "eol".

Copy link

gitgitgadget bot commented Feb 8, 2024

On the Git mailing list, Jeff King wrote (reply to this):

On Mon, Feb 05, 2024 at 08:57:46PM +0100, René Scharfe wrote:

> If you want to make the code work with buffers that lack a terminating
> NUL then you need to replace the strchrnul() call with something that
> respects buffer lengths.  You could e.g. call memchr().  Don't forget
> to check for NUL to preserve the original behavior.  Or you could roll
> your own custom replacement, perhaps like this:

I'm not sure it is worth retaining the check for NUL. The original
function added by me in fe6eb7f2c5 (commit: provide a function to find a
header in a buffer, 2014-08-27) just took a NUL-terminated string, so
we certainly were not expecting embedded NULs.

In cfc5cf428b (receive-pack.c: consolidate find header logic,
2022-01-06) we switched to taking the "len" parameter, but the new
caller just passes strlen(msg) anyway.

I guess you could argue that before that commit, receive-pack.c's
find_header() which took a length was buggy to use strchrnul(). It gets
fed with a push-cert buffer. I guess it's possible for there to be an
embedded NUL there, but in practice there shouldn't be. If we are
thinking of malformed or malicious input, it's not clear which behavior
(finding or not finding a header past a NUL) is more harmful. So all
things being equal, I would try to reduce the number of special cases
here by not worrying about NULs.

(Though if somebody really wants to dig, it's possible there's a clever
dual-parser attack here where "\nfoo\0bar baz" finds the header "bar
baz" in one parser but not in another).

-Peff

Copy link

gitgitgadget bot commented Feb 8, 2024

User Jeff King <peff@peff.net> has been added to the cc: list.

Copy link

gitgitgadget bot commented Feb 8, 2024

On the Git mailing list, René Scharfe wrote (reply to this):

Am 08.02.24 um 02:00 schrieb Jeff King:
> On Mon, Feb 05, 2024 at 08:57:46PM +0100, René Scharfe wrote:
>
>> If you want to make the code work with buffers that lack a terminating
>> NUL then you need to replace the strchrnul() call with something that
>> respects buffer lengths.  You could e.g. call memchr().  Don't forget
>> to check for NUL to preserve the original behavior.  Or you could roll
>> your own custom replacement, perhaps like this:
>
> I'm not sure it is worth retaining the check for NUL. The original
> function added by me in fe6eb7f2c5 (commit: provide a function to find a
> header in a buffer, 2014-08-27) just took a NUL-terminated string, so
> we certainly were not expecting embedded NULs.
>
> In cfc5cf428b (receive-pack.c: consolidate find header logic,
> 2022-01-06) we switched to taking the "len" parameter, but the new
> caller just passes strlen(msg) anyway.
>
> I guess you could argue that before that commit, receive-pack.c's
> find_header() which took a length was buggy to use strchrnul(). It gets
> fed with a push-cert buffer. I guess it's possible for there to be an
> embedded NUL there, but in practice there shouldn't be. If we are
> thinking of malformed or malicious input, it's not clear which behavior
> (finding or not finding a header past a NUL) is more harmful. So all
> things being equal, I would try to reduce the number of special cases
> here by not worrying about NULs.
>
> (Though if somebody really wants to dig, it's possible there's a clever
> dual-parser attack here where "\nfoo\0bar baz" finds the header "bar
> baz" in one parser but not in another).

Good point.  A _mem function shouldn't worry about NULs.  Its callers
are responsible for that -- if necessary.

No idea what an attacker could do with nonce and push-option headers
with varying visibility.  Version detection?  Something worse?

But anyway: If NULs are of no concern and we currently end parsing when
we see one in all cases, why do we need a _mem function at all?  The
original version of the function, find_commit_header(), should suffice.
check_nonce() could be run against the NUL-terminated sigcheck.payload
and check_cert_push_options() parses an entire strbuf, so there is no
risk of out-of-bounds access.

René

Copy link

gitgitgadget bot commented Feb 8, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

René Scharfe <l.s.r@web.de> writes:

> But anyway: If NULs are of no concern and we currently end parsing when
> we see one in all cases, why do we need a _mem function at all?  The
> original version of the function, find_commit_header(), should suffice.
> check_nonce() could be run against the NUL-terminated sigcheck.payload
> and check_cert_push_options() parses an entire strbuf, so there is no
> risk of out-of-bounds access.

If I recall correctly, the caller that does not pass strlen() as the
payload length gives a length that is shorter than the buffer, i.e.
"stop the parsing here, do not get confused into thinking the
garbage after this point contains useful payload" was the reason why
we have a separate "len".

Copy link

gitgitgadget bot commented Feb 8, 2024

On the Git mailing list, Kyle Lippincott wrote (reply to this):

On Thu, Feb 8, 2024 at 11:48 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> René Scharfe <l.s.r@web.de> writes:
>
> > But anyway: If NULs are of no concern and we currently end parsing when
> > we see one in all cases, why do we need a _mem function at all?  The
> > original version of the function, find_commit_header(), should suffice.
> > check_nonce() could be run against the NUL-terminated sigcheck.payload
> > and check_cert_push_options() parses an entire strbuf, so there is no
> > risk of out-of-bounds access.
>
> If I recall correctly, the caller that does not pass strlen() as the
> payload length gives a length that is shorter than the buffer, i.e.
> "stop the parsing here, do not get confused into thinking the
> garbage after this point contains useful payload" was the reason why
> we have a separate "len".
>

I just rediscovered that. I think this probably should be something
that caller (check_nonce) implements, then. Having a _mem function
implies to me (though I'm very new to this codebase) that it supports
embedded NULs, but that's not what's happening here.

Copy link

gitgitgadget bot commented Feb 8, 2024

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Feb 08, 2024 at 11:48:05AM -0800, Junio C Hamano wrote:

> René Scharfe <l.s.r@web.de> writes:
> 
> > But anyway: If NULs are of no concern and we currently end parsing when
> > we see one in all cases, why do we need a _mem function at all?  The
> > original version of the function, find_commit_header(), should suffice.
> > check_nonce() could be run against the NUL-terminated sigcheck.payload
> > and check_cert_push_options() parses an entire strbuf, so there is no
> > risk of out-of-bounds access.
> 
> If I recall correctly, the caller that does not pass strlen() as the
> payload length gives a length that is shorter than the buffer, i.e.
> "stop the parsing here, do not get confused into thinking the
> garbage after this point contains useful payload" was the reason why
> we have a separate "len".

Yes, check_nonce() passes in a length limited by the start of the actual
signature, as determined by parse_signed_buffer(). Though that generally
comes after a blank line, which would also stop find_header() from
parsing further.

But more interestingly: even though we pass a buf/len pair to
parse_signed_buffer(), it then calls get_format_by_sig() which takes
only a NUL-terminated string. So:

  1. It is not possible for the buf/len pair we pass to check_nonce() to
     contain a NUL. And thus there is no caller of find_header_mem()
     that can contain an embedded NUL. So switching from strchrnul() to
     just memchr() should be OK there.

  2. That raises the question of whether parse_signed_buffer() has a
     similar walk-too-far problem. ;) The answer is no, because we feed
     it from a strbuf. But it's not a great pattern overall.

-Peff

Copy link

gitgitgadget bot commented Feb 8, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes:

>   1. It is not possible for the buf/len pair we pass to check_nonce() to
>      contain a NUL. And thus there is no caller of find_header_mem()
>      that can contain an embedded NUL. So switching from strchrnul() to
>      just memchr() should be OK there.

Correct.

>   2. That raises the question of whether parse_signed_buffer() has a
>      similar walk-too-far problem. ;) The answer is no, because we feed
>      it from a strbuf. But it's not a great pattern overall.

True, too.

Thanks.

@Chand-ra Chand-ra closed this May 10, 2024
@Chand-ra Chand-ra deleted the strchrnul branch May 10, 2024 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant