fuzz: add new oss-fuzz fuzzer for date.c / date.h #1612

arthurscchan · 2023-11-10T20:55:36Z

This patch adds a new OSS-Fuzz fuzzer, fuzz-date.c, to the oss-fuzz directory. It fuzzes date.c / date.h in the base directory.

The .gitignore of the oss-fuzz directory and the Makefile have been modified to accommodate the new fuzzer fuzz-date.c.

v2: Fixed the order of the objects in .gitignore and the Makefile, and fixed some of the logic and formatting of the fuzz-date.c fuzzer.

v3: Fixed the creation and memory allocation of the fuzzing str. Also fixed the tz type and sign-extended the data before passing it to the tz variable.

v4: Fixed the tz variable allocation and some of the bytes used for the fuzzing variables.

Comment:
Yes, indeed. It is quite annoying to have that twice.
Yes, tz should be considered attacker-controllable, so negative values should be covered. But it is tricky to fuzz, because date.c::gm_time_t() calls die() if the value is invalid, and that exits the fuzzer directly. OSS-Fuzz may treat this as an issue (or bug) because the fuzzer exits "unexpectedly". I agree that if we consider tz "attacker-controllable", we should include negative values, but since doing so will cause the fuzzer to exit, I am not sure it is the right approach from a fuzzing perspective. Also, date.c already handles this with its conditional checks, so it may be worth doing some checking to exclude invalid values before calling date.c::show_date(), although this may mean copying some of the conditional checks from date.c.
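
To make the "exclude some invalid values first" option concrete, here is a minimal sketch. It assumes tz is a decimal +/-hhmm value (so -0700 is the integer -700). The helper names tz_is_plausible() and tz_to_minutes() are hypothetical, and the bounds are an illustration only, not the checks that date.c performs.

```c
#include <assert.h>

/*
 * Hypothetical pre-filter: accept only offsets strictly inside
 * (-2400, 2400) whose minutes part is below 60.  These bounds are an
 * assumption for illustration, not copied from date.c.
 */
static int tz_is_plausible(int tz)
{
	int abs_tz;

	if (tz <= -2400 || tz >= 2400)
		return 0;
	abs_tz = tz < 0 ? -tz : tz;
	return abs_tz % 100 < 60;
}

/* Convert a plausible +/-hhmm offset into signed minutes. */
static int tz_to_minutes(int tz)
{
	int abs_tz, minutes;

	assert(tz_is_plausible(tz));
	abs_tz = tz < 0 ? -tz : tz;
	minutes = (abs_tz / 100) * 60 + abs_tz % 100;
	return tz < 0 ? -minutes : minutes;
}
```

A fuzzer using such a filter would return 0 early for rejected offsets. The trade-off is that the rejected inputs never reach show_date(), so its own handling of absurd offsets goes unexercised.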

Additional comment for v4:
Thanks for the suggestion. Yes, that may be the easier approach. Since the new logic uses only 2 bytes for the int16_t tz, the local and dmtype variables can use separate bytes to increase "randomness".

Thanks for taking the time to contribute to Git! Please be advised that the
Git community does not use github.com for their contributions. Instead, we use
a mailing list (git@vger.kernel.org) for code submissions, code reviews, and
bug reports. Nevertheless, you can use GitGitGadget (https://gitgitgadget.github.io/)
to conveniently send your Pull Requests commits to our mailing list.

Please read the "guidelines for contributing" linked above!

cc: Jeff King peff@peff.net

arthurscchan · 2023-11-11T17:38:54Z

/submit

gitgitgadget · 2023-11-11T17:39:46Z

Submitted as pull.1612.git.1699724379458.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1612/arthurscchan/new-fuzzer-date-v1

To fetch this version to local tag pr-1612/arthurscchan/new-fuzzer-date-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1612/arthurscchan/new-fuzzer-date-v1

gitgitgadget · 2023-11-12T06:03:00Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/Makefile b/Makefile
> index 03adcb5a480..c9fe99a8c88 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -752,6 +752,7 @@ ETAGS_TARGET = TAGS
>  FUZZ_OBJS += oss-fuzz/fuzz-commit-graph.o
>  FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o
>  FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o
> +FUZZ_OBJS += oss-fuzz/fuzz-date.o

The same comment applies to .gitignore but I think the existing
entries are sorted and fuzz-date should be added between
fuzz-commit-graph and fuzz-pack-headers.

> diff --git a/oss-fuzz/fuzz-date.c b/oss-fuzz/fuzz-date.c
> new file mode 100644
> index 00000000000..29bcaf595e4
> --- /dev/null
> +++ b/oss-fuzz/fuzz-date.c
> @@ -0,0 +1,75 @@
> +#include "git-compat-util.h"
> +#include "date.h"
> +
> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
> +
> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
> +{
> +	int type;
> +	int time;
> +	int num;
> +	char *str;
> +	timestamp_t ts;
> +	enum date_mode_type dmtype;
> +	struct date_mode *dm;
> +
> +	if (size <= 8)
> +	{
> +		return 0;
> +	}

How much do we care about sticking to our coding style for source
files in this directory?  If we do (and I do not see a strong reason
not to), let's lose the {unneeded braces} around a single statement
block.

> +	type = (*((int *)data)) % 8;
> +	data += 4;
> +	size -= 4;

I'd prefer to avoid these hardcoded and unexplained constants.  I
think "8" relates to the number of case arms below?  I am not sure
if "4" is justifiable without "we assume everybody's int is four
bytes long", but if that is what is going on, perhaps use uint32_t
or something?

Also, is data[] guaranteed to be aligned well to allow us to do the
above?  As we only need to spread to DATE_UNIX-1 types (because we
exclude DATE_STRFTIME), it is sufficient to look at the lower nibble
of a single byte.  The upper nibble could be used to fuzz the .local
bit if you wanted to, so I wonder if something like this would do:

	local_bit = !!(*data & 0x10);
	dmtype = (enum date_mode_type)(*data % DATE_UNIX);
	if (dmtype == DATE_STRFTIME)
		return 0;
	data++;
	size--;

> +	time = abs(*((int *)data));
> +	data += 4;
> +	size -= 4;

Ditto.  Rename "time" because the second parameter to show_date() is
*not* "time" but "tz".  The valid range of "tz" comfortably fits in
16-bit signed int, but note that there are valid negative values in
the range.

Are we assuming that the "tz" is attacker controlled?  Why are you
limiting its value to non-negative, yet you are not rejecting absurd
timezone offsets?  Good values lie in a range much narrower than
between -2400 and 2400.  Subjecting "tz" to fuzzer is perfectly
fine, but then limiting its value to non-negative contradicts with
it, so I am not sure what your intention is.

As I used the first byte to fuzz dmtype and .local, let's use the
next three bytes to allow feeding overly wild timezone values to the
machinery and see what breaks, perhaps like so:

	tz = *data++; /* int tz; */
	tz = (tz << 8) | *data++;
	tz = (tz << 8) | *data++;
	size -= 3;

Now the upfront length check needs to reject any input shorter than
4 bytes, so do so with a comment accordingly, perhaps like

	if (size < 4)
		/*
		 * we use the first byte to fuzz dmtype and local,
		 * then the next three bytes to fuzz tz offset,
		 * and the remainder is fed as end-user input to
		 * approxidate().
		 */
		return 0;

before everything I wrote so far.

> +	str = (char *)malloc(size+1);

	(char *)malloc(size + 1);

> +	if (!str)
> +	{
> +		return 0;
> +	}

Ditto on {unnecessary braces}.

> +	memcpy(str, data, size);
> +	str[size] = '\0';
> +
> +	ts = approxidate_careful(str, &num);
> +	free(str);
> +
> +	switch(type)
> +	{
> +		case 0: default:
> +			dmtype = DATE_NORMAL;
> +			break;

Style.  In our codebase, "switch" and "case" align at the same
column, and case arms are written one per line, i.e.,

	switch (type) {
	case 0:
	default:
		...

The way dmtype is handled in a switch() below tells me that you do
not consider it is a potential attack vector (e.g., an attacker
cannot force us to use dmtype==DATE_STRFTIME without the format and
cause us to die).  Am I reading your intention correctly?

If so, I'd just do the "use the lower nibble of the first byte" as I
showed earlier, and this large switch statement will go away.

> +		case 1:
> +			dmtype = DATE_HUMAN;
> +			break;
> +		case 2:
> +			dmtype = DATE_SHORT;
> +			break;
> +		case 3:
> +			dmtype = DATE_ISO8601;
> +			break;
> +		case 4:
> +			dmtype = DATE_ISO8601_STRICT;
> +			break;
> +		case 5:
> +			dmtype = DATE_RFC2822;
> +			break;
> +		case 6:
> +			dmtype = DATE_RAW;
> +			break;
> +		case 7:
> +			dmtype = DATE_UNIX;
> +			break;
> +	}
> +
> +	dm = date_mode_from_type(dmtype);
> +	dm->local = 1;

Don't we want to allow the incoming data to fuzz the local bit, too?

> +	show_date(ts, time, dm);
> +
> +	date_mode_release(dm);
> +
> +	return 0;
> +}
>
> base-commit: dadef801b365989099a9929e995589e455c51fed

gitgitgadget · 2023-11-12T12:44:52Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

> As I used the first byte to fuzz dmtype and .local, let's use the
> next three bytes to allow feeding overly wild timezone values to the
> machinery and see what breaks, perhaps like so:
>
> 	tz = *data++; /* int tz; */
> 	tz = (tz << 8) | *data++;
> 	tz = (tz << 8) | *data++;
> 	size -= 3;

Just this part.  As data points at unsigned char, the above would
not give us any negative number.  We'd need to sign-extend the
24-bit resulting value if we are going to adopt the above approach.
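
One way to do that sign extension is sketched below. It assumes the three bytes are read most significant first, as in the snippet quoted above. The helper name read_tz24() is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine three unsigned bytes (most significant first) into a 24-bit
 * quantity, then sign-extend it.  read_tz24() is a hypothetical helper
 * used only for this illustration.
 */
static int32_t read_tz24(const uint8_t *data)
{
	uint32_t raw;

	assert(data);
	raw = ((uint32_t)data[0] << 16) |
	      ((uint32_t)data[1] << 8) |
	      (uint32_t)data[2];

	/* Bit 23 is the sign bit of a 24-bit two's complement value. */
	if (raw & 0x800000)
		return (int32_t)raw - 0x1000000;
	return (int32_t)raw;
}
```

Because the bytes are assembled as unsigned values and the sign is applied once at the end, positive and negative results are equally likely for evenly distributed input.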

arthurscchan · 2023-11-13T16:21:41Z

/submit

gitgitgadget · 2023-11-13T16:22:55Z

Submitted as pull.1612.v2.git.1699892568344.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1612/arthurscchan/new-fuzzer-date-v2

To fetch this version to local tag pr-1612/arthurscchan/new-fuzzer-date-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1612/arthurscchan/new-fuzzer-date-v2

gitgitgadget · 2023-11-13T18:38:57Z

On the Git mailing list, Jeff King wrote (reply to this):

On Mon, Nov 13, 2023 at 04:22:48PM +0000, Arthur Chan via GitGitGadget wrote:

> +	str = (char *)malloc(size + 1);
> +	if (!str)
> +		return 0;
> +	memcpy(str, data, size);
> +	str[size] = '\0';

Is it important that we avoid calling die() if the malloc fails here?

The usual way to write this in our code base is just:

  str = xmemdupz(data, size);

It's not entirely a style thing; we sometimes audit the code base
looking for computations on malloc sizes (for integer overflows) as well
as sites that should be using xmalloc and are not. Obviously we can
exclude oss-fuzz/ from such audits, but if there's no reason not to
prefer our usual style, it's one less thing to worry about.

-Peff

gitgitgadget · 2023-11-13T18:39:00Z

User Jeff King <peff@peff.net> has been added to the cc: list.

gitgitgadget · 2023-11-13T23:29:52Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);

It is somewhat annoying that everybody has to repeat this twice
here, but it is not your fault X-<.

> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
> +{
> +	int local;
> +	int num;
> +	uint16_t tz;

tz offset can be negative, so uint16_t is not appropriate.  See
date.c:gm_time_t() that is eventually called from show_date().

> +	char *str;
> +	timestamp_t ts;
> +	enum date_mode_type dmtype;
> +	struct date_mode *dm;
> +
> +	if (size <= 4)
> +		/*
> +		 * we use the first byte to fuzz dmtype and local,
> +		 * then the next three bytes to fuzz tz	offset,
> +		 * and the remainder (at least one byte) is fed
> +		 * as end-user input to approxidate_careful().
> +		 */
> +		return 0;
> +
> +	local = !!(*data & 0x10);
> +	dmtype = (enum date_mode_type)(*data % DATE_UNIX);
> +	if (dmtype == DATE_STRFTIME)
> +		/*
> +		 * Currently DATE_STRFTIME is not supported.
> +		 */
> +		return 0;

There is an off-by-one error above, as modulo DATE_UNIX will never
yield DATE_UNIX.  Presumably we could do something silly like

	tmp = *data % DATE_UNIX;
	if (DATE_STRFTIME <= tmp)
		tmp++;
	dmtype = (enum date_mode_type)tmp;

to pick values from [0..DATE_UNIX) and then shift everything above
DATE_STRFTIME by one to create a hole there and fill DATE_UNIX at
the same time, without wasting a sample by returning.
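
The hole-punching trick can be checked in isolation. The sketch below uses a stand-in enum, sketch_date_mode_type, whose member order is assumed to mirror enum date_mode_type in date.h. That order is an assumption, and the only property the code relies on is that the STRFTIME member comes before the UNIX member. pick_dmtype() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for enum date_mode_type; the member order is an assumption. */
enum sketch_date_mode_type {
	SK_DATE_NORMAL = 0,
	SK_DATE_HUMAN,
	SK_DATE_RELATIVE,
	SK_DATE_SHORT,
	SK_DATE_ISO8601,
	SK_DATE_ISO8601_STRICT,
	SK_DATE_RFC2822,
	SK_DATE_STRFTIME,
	SK_DATE_RAW,
	SK_DATE_UNIX
};

/* Map any byte onto a mode type, skipping SK_DATE_STRFTIME. */
static enum sketch_date_mode_type pick_dmtype(uint8_t byte)
{
	int tmp = byte % SK_DATE_UNIX; /* 0 .. SK_DATE_UNIX - 1 */

	/* Shift everything at or above the hole up by one. */
	if (SK_DATE_STRFTIME <= tmp)
		tmp++;
	assert(tmp != SK_DATE_STRFTIME);
	return (enum sketch_date_mode_type)tmp;
}
```

With this mapping every input byte yields a usable type, and the last member becomes reachable, which the plain modulo never produced.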

> +	data++;
> +	size--;
> +
> +	tz = *data++;
> +	tz = (tz << 8) | *data++;
> +	tz = (tz << 8) | *data++;
> +	size -= 3;

If your tz is 16-bit wide, then we do not have to eat three bytes
here, do we?

You never answered my question on your intention.  Is "tz"
considered attacker controlled (and needs to be fuzzed including
invalid values)?

> +	str = (char *)malloc(size + 1);
> +	if (!str)
> +		return 0;
> +	memcpy(str, data, size);
> +	str[size] = '\0';
> +
> +	ts = approxidate_careful(str, &num);
> +	free(str);
> +
> +	dm = date_mode_from_type(dmtype);
> +	dm->local = local;
> +	show_date(ts, (int16_t)tz, dm);
> +
> +	date_mode_release(dm);
> +
> +	return 0;
> +}
>
> base-commit: dadef801b365989099a9929e995589e455c51fed

Thanks.

gitgitgadget · 2023-11-13T23:29:54Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes:

> On Mon, Nov 13, 2023 at 04:22:48PM +0000, Arthur Chan via GitGitGadget wrote:
>
>> +	str = (char *)malloc(size + 1);
>> +	if (!str)
>> +		return 0;
>> +	memcpy(str, data, size);
>> +	str[size] = '\0';
>
> Is it important that we avoid calling die() if the malloc fails here?
>
> The usual way to write this in our code base is just:
>
>   str = xmemdupz(data, size);
>
> It's not entirely a style thing; we sometimes audit the code base
> looking for computations on malloc sizes (for integer overflows) as well
> as sites that should be using xmalloc and are not. Obviously we can
> exclude oss-fuzz/ from such audits, but if there's no reason not to
> prefer our usual style, it's one less thing to worry about.

Good point.  Thanks.
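
For readers unfamiliar with the helper: xmemdupz(data, size) returns a freshly allocated, NUL-terminated copy of the buffer and dies instead of returning NULL. The standalone sketch below only imitates that behavior. The name dup_with_nul() is hypothetical, the messages and exit code are placeholders, and this is not Git's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the allocate, copy and terminate pattern.
 * It exits on failure, so callers never need a NULL check.
 */
static char *dup_with_nul(const void *data, size_t len)
{
	char *copy;

	assert(data || !len);
	if (len == SIZE_MAX) {
		/* len + 1 would wrap around to zero */
		fprintf(stderr, "fatal: data too large\n");
		exit(128);
	}
	copy = malloc(len + 1);
	if (!copy) {
		fprintf(stderr, "fatal: out of memory\n");
		exit(128);
	}
	if (len)
		memcpy(copy, data, len);
	copy[len] = '\0';
	return copy;
}
```

Keeping the size arithmetic inside one helper is what makes call sites easy to audit: there is no open-coded size + 1 left to check.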

arthurscchan · 2023-11-14T10:52:10Z

/submit

gitgitgadget · 2023-11-14T10:53:12Z

Submitted as pull.1612.v3.git.1699959186146.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1612/arthurscchan/new-fuzzer-date-v3

To fetch this version to local tag pr-1612/arthurscchan/new-fuzzer-date-v3:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1612/arthurscchan/new-fuzzer-date-v3

gitgitgadget · 2023-11-14T17:06:01Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes:

>      ++	tmp_data = (int8_t*)data;
>      ++	tz = *tmp_data++;
>      ++	tz = (tz << 8) | *tmp_data++;
>      ++	tz = (tz << 8) | *tmp_data++;

This has a funny skew towards negative numbers.  Any time the MSB of
any one of the three bytes is set, tz becomes negative.  Worse, a byte
taken from *tmp_data that has its MSB on will _wipe_ what was read
into tz so far, because its higher order bits above the 8th bit are
sign extended.  If the incoming data is evenly distributed, 7/8 of the
time, you'd end up with a negative number in tz, no?

I think you can and should pick bytes with uint8_t pointer to avoid
sign extending individual bytes and sign extend the resulting number
at the end.  Or if it is too cumbersome to do so, using "int16_t tz"
and filling it with two bytes from *data will sign extend itself
when we pass it to show_date() as a parameter of type "int", which
may be the easiest to code, I suspect.
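
A minimal sketch of the two-byte variant follows. It assumes the bytes are taken most significant first, and read_tz16() is a hypothetical helper name. The subtraction keeps the conversion to int16_t within range, so the code does not rely on implementation-defined narrowing.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a 16-bit signed offset from two unsigned bytes.  The bytes are
 * combined without per-byte sign extension, and the sign is applied
 * once to the 16-bit result.  read_tz16() is a hypothetical helper.
 */
static int read_tz16(const uint8_t *data)
{
	uint16_t raw;
	int16_t tz;

	assert(data);
	raw = (uint16_t)(((unsigned)data[0] << 8) | data[1]);

	/* Bit 15 is the sign bit of a 16-bit two's complement value. */
	tz = (int16_t)((raw & 0x8000) ? (int32_t)raw - 0x10000 : (int32_t)raw);

	/* Returning int16_t as int sign-extends, as it would for a parameter. */
	return tz;
}
```

The result covers -32768 to 32767 evenly, which comfortably includes every valid offset as well as plenty of invalid ones.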

Thanks.

Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>

arthurscchan · 2023-11-17T17:47:07Z

/submit

gitgitgadget · 2023-11-17T17:47:55Z

Submitted as pull.1612.v4.git.1700243267653.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1612/arthurscchan/new-fuzzer-date-v4

To fetch this version to local tag pr-1612/arthurscchan/new-fuzzer-date-v4:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1612/arthurscchan/new-fuzzer-date-v4

gitgitgadget · 2023-11-20T17:09:29Z

This branch is now known as ac/fuzz-show-date.

gitgitgadget · 2023-11-20T17:09:30Z

This patch series was integrated into seen via git@7cec110.

gitgitgadget · 2023-11-20T17:26:19Z

There was a status update in the "New Topics" section about the branch ac/fuzz-show-date on the Git mailing list:

Subject approxidate() and show_date() machinery to OSS-Fuzz.

Will merge to 'next'?
source: <pull.1612.v4.git.1700243267653.gitgitgadget@gmail.com>

gitgitgadget · 2023-11-27T14:30:13Z

There was a status update in the "Cooking" section about the branch ac/fuzz-show-date on the Git mailing list:

Subject approxidate() and show_date() machinery to OSS-Fuzz.

Will merge to 'next'?
source: <pull.1612.v4.git.1700243267653.gitgitgadget@gmail.com>

gitgitgadget · 2023-12-09T01:13:58Z

This patch series was integrated into seen via git@0407405.

gitgitgadget · 2023-12-09T02:08:29Z

There was a status update in the "Cooking" section about the branch ac/fuzz-show-date on the Git mailing list:

Subject approxidate() and show_date() machinery to OSS-Fuzz.

Will merge to 'next'.
source: <pull.1612.v4.git.1700243267653.gitgitgadget@gmail.com>

gitgitgadget · 2023-12-10T07:10:23Z

This patch series was integrated into seen via git@ad7d331.

gitgitgadget · 2023-12-10T07:10:24Z

This patch series was integrated into next via git@80b7ebc.

gitgitgadget · 2023-12-11T16:27:51Z

This patch series was integrated into next via git@f36795a.

gitgitgadget · 2023-12-11T22:15:57Z

This patch series was integrated into seen via git@7cf98cf.

gitgitgadget · 2023-12-12T01:27:32Z

There was a status update in the "Cooking" section about the branch ac/fuzz-show-date on the Git mailing list:

Subject approxidate() and show_date() machinery to OSS-Fuzz.

Will merge to 'master'.
source: <pull.1612.v4.git.1700243267653.gitgitgadget@gmail.com>

gitgitgadget · 2023-12-13T01:12:05Z

This patch series was integrated into seen via git@6c52031.

gitgitgadget · 2023-12-13T22:17:20Z

This patch series was integrated into seen via git@094e5a6.

gitgitgadget · 2023-12-14T19:29:41Z

This patch series was integrated into seen via git@eb14b71.

gitgitgadget · 2023-12-15T23:05:35Z

This patch series was integrated into seen via git@371c63d.

gitgitgadget · 2023-12-19T00:54:46Z

This patch series was integrated into seen via git@3335365.

gitgitgadget · 2023-12-19T00:54:47Z

This patch series was integrated into master via git@3335365.

gitgitgadget · 2023-12-19T00:54:48Z

This patch series was integrated into next via git@3335365.

gitgitgadget · 2023-12-19T00:54:50Z

Closed via 3335365.

arthurscchan force-pushed the new-fuzzer-date branch 3 times, most recently from 43aeea6 to d43724c Compare November 10, 2023 21:20

arthurscchan force-pushed the new-fuzzer-date branch 2 times, most recently from 76a1ad5 to 2928e2b Compare November 13, 2023 15:23

arthurscchan force-pushed the new-fuzzer-date branch from 2928e2b to 5b9d59a Compare November 13, 2023 22:10

arthurscchan force-pushed the new-fuzzer-date branch 3 times, most recently from 4cc78fa to 046bca3 Compare November 14, 2023 00:17

arthurscchan force-pushed the new-fuzzer-date branch 2 times, most recently from fcc1933 to 9b43f41 Compare November 17, 2023 16:48

fuzz: add new oss-fuzz fuzzer for date.c / date.h

33a72d4

Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>

arthurscchan force-pushed the new-fuzzer-date branch from 9b43f41 to 33a72d4 Compare November 17, 2023 16:54

gitgitgadget bot added the seen label Nov 20, 2023

gitgitgadget bot added the next label Dec 10, 2023

gitgitgadget bot added the master label Dec 19, 2023

gitgitgadget bot closed this Dec 19, 2023
