Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strengthen fsck checks for submodule URLs #1635

Closed

Conversation

vdye
Copy link

@vdye vdye commented Jan 9, 2024

While testing 'git fsck' checks on .gitmodules URLs, I noticed that some invalid URLs were passing the checks. Digging into it a bit more, the issue turned out to be that 'credential_from_url_gently()' parses certain URLs (like "http://example.com:something/deeper/path") incorrectly, in a way that appeared to return a valid result.

Fortunately, these URLs are rejected in fetches/clones/pushes anyway because 'url_normalize()' (called in 'validate_remote_url()') correctly identifies them as invalid. So, to bring 'git fsck' in line with other (stronger) validation done on remote URLs, this series replaces the 'credential_from_url_gently()' check with one that uses 'url_normalize()'.

  • Patch 1 moves 'check_submodule_url()' to a public location so that it can be used outside of 'fsck.c'.
  • Patch 2 removes the obsolete/never-used code in 'test-tool submodule check-name' handling names provided on the command line.
  • Patch 3 adds a 'check-url' mode to 'test-tool submodule', calling the now-public 'check_submodule_url()' method on a given URL, and adds new tests checking valid and invalid submodule URLs.
  • Patch 4 replaces the 'credential_from_url_gently()' check with 'url_normalize()' followed by 'url_decode()' and an explicit check for newlines (to preserve the newline handling added in 07259e74ec1 (fsck: detect gitmodules URLs with embedded newlines, 2020-03-11)).

Changes since V1

  • Added 'TEST_TOOL_CHECK_URL_USAGE' to 'submodule_usage'.
  • Removed unused/unreachable code related to command line inputs in 'test-tool submodule check-name' and 'test-tool submodule check-url'.
  • Split the new t7450 test case into two tests (the first contains URLs that are validated successfully, the second demonstrates a URL incorrectly marked valid) to clearly show which pattern is handled improperly. The tests are merged in the final patch once the validation is corrected.

Thanks!

  • Victoria

cc: Patrick Steinhardt ps@pks.im
cc: Jeff King peff@peff.net

@vdye vdye self-assigned this Jan 9, 2024
@vdye
Copy link
Author

vdye commented Jan 9, 2024

/submit

Copy link

gitgitgadget bot commented Jan 9, 2024

Submitted as pull.1635.git.1704822817.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1635/vdye/vdye/strengthen-fsck-url-checks-v1

To fetch this version to local tag pr-1635/vdye/vdye/strengthen-fsck-url-checks-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1635/vdye/vdye/strengthen-fsck-url-checks-v1

@@ -15,6 +15,13 @@ static const char *submodule_check_name_usage[] = {
NULL
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +#define TEST_TOOL_CHECK_URL_USAGE \
> +	"test-tool submodule check-url <url>"
> +static const char *submodule_check_url_usage[] = {
> +	TEST_TOOL_CHECK_URL_USAGE,
> +	NULL
> +};

Granted, the entry that follows this new one already uses the same
pattern, but TEST_TOOL_CHECK_URL_USAGE being used only once here and
nowhere else, with its name almost as long as the value it expands to,
I found it unnecessarily verbose and confusing.

>  #define TEST_TOOL_IS_ACTIVE_USAGE \
>  	"test-tool submodule is-active <name>"
>  static const char *submodule_is_active_usage[] = {


> +typedef int (*check_fn_t)(const char *);
> +
>  /*
>   * Exit non-zero if any of the submodule names given on the command line is
>   * invalid. If no names are given, filter stdin to print only valid names
>   * (which is primarily intended for testing).
>   */

OK.  As long as each of the input lines are unique, we can use the
usual "does the actual output match the expected?" to test many of
them at once, and notice if there is an extra one in the output that
shouldn't have been emitted, or there is a missing one that should
have.

> -static int check_name(int argc, const char **argv)
> +static int check_submodule(int argc, const char **argv, check_fn_t check_fn)
>  {
>  	if (argc > 1) {
>  		while (*++argv) {
> -			if (check_submodule_name(*argv) < 0)
> +			if (check_fn(*argv) < 0)

Quite nice way to reuse what we already have, thanks to [1/3].

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +#define TEST_TOOL_CHECK_URL_USAGE \
>> +	"test-tool submodule check-url <url>"
>> +static const char *submodule_check_url_usage[] = {
>> +	TEST_TOOL_CHECK_URL_USAGE,
>> +	NULL
>> +};
> 
> Granted, the entry that follows this new one already uses the same
> pattern, but TEST_TOOL_CHECK_URL_USAGE being used only once here and
> nowhere else, with its name almost as long as the value it expands to,
> I found it unnecessarily verbose and confusing.

This is only used once because I missed the second place it should be used
(in 'submodule_usage[]'). It's still somewhat verbose, but once I fix that
it'll at least have the benefit of avoiding some duplication.

@@ -14,6 +14,8 @@
#include "parse-options.h"
#include "thread-utils.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Update the validation of "curl URL" submodule URLs (i.e. those that specify
> an "http[s]" or "ftp[s]" protocol) in 'check_submodule_url()' to catch more
> invalid URLs. The existing validation using 'credential_from_url_gently()'
> parses certain URLs incorrectly, leading to invalid submodule URLs passing
> 'git fsck' checks. Conversely, 'url_normalize()' - used to validate remote
> URLs in 'remote_get()' - correctly identifies the invalid URLs missed by
> 'credential_from_url_gently()'.
>
> To catch more invalid cases, replace 'credential_from_url_gently()' with
> 'url_normalize()' followed by a 'url_decode()' and a check for newlines
> (mirroring 'check_url_component()' in the 'credential_from_url_gently()'
> validation).

Thanks.  Left hand and right hand checking the same thing in
different ways and coming up with different result is never a happy
situation.  Making sure we consistently use the same definition of
what the valid URLs are is a very welcome thing to do, of course.

> -test_expect_failure 'check urls' '
> +test_expect_success 'check urls' '
>  	cat >expect <<-\EOF &&
>  	./bar/baz/foo.git
>  	https://example.com/foo.git

It is a bit unfortunate that from here we cannot tell which bogus
URLs in this test that were incorrectly accepted are now rejected.

Among the many bogus URLs in the input, we used to allow

    http://example.com:test/foo.git

(we do not accept non-numeric representation of port numbers, so
http://example.com:http/foo.git would also be rejected), but with
this change, it is now rejected.  All the other bogus ones are
rejected just as before this change.

Will queue.  Thanks.

Copy link

gitgitgadget bot commented Jan 10, 2024

This branch is now known as vd/fsck-submodule-url-test.

Copy link

gitgitgadget bot commented Jan 10, 2024

This patch series was integrated into seen via git@a9e89ef.

@gitgitgadget gitgitgadget bot added the seen label Jan 10, 2024
Copy link

gitgitgadget bot commented Jan 10, 2024

There was a status update in the "New Topics" section about the branch vd/fsck-submodule-url-test on the Git mailing list:

Tighten URL checks fsck makes in a URL recorded for submodules.

Will merge to 'next'?
source: <pull.1635.git.1704822817.gitgitgadget@gmail.com>

@@ -14,6 +14,8 @@
#include "parse-options.h"
#include "thread-utils.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Tue, Jan 09, 2024 at 05:53:37PM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Update the validation of "curl URL" submodule URLs (i.e. those that specify
> an "http[s]" or "ftp[s]" protocol) in 'check_submodule_url()' to catch more
> invalid URLs. The existing validation using 'credential_from_url_gently()'
> parses certain URLs incorrectly, leading to invalid submodule URLs passing
> 'git fsck' checks. Conversely, 'url_normalize()' - used to validate remote
> URLs in 'remote_get()' - correctly identifies the invalid URLs missed by
> 'credential_from_url_gently()'.

Okay, so we retain the wrong behavior of `credential_from_url_gently()`,
right? I wonder whether this can be abused in any way, doubly so because
the function gets invoked with untrusted input from the remote server
when we handle redirects in `http_request_reauth()`. But the redirect
URL we end up passing to `credential_from_url_gently()` would have to
contain a non-numeric port, and curl seemingly does not know to handle
those either.

Other callsites include fsck (which you're fixing) and the credential
store (which is entirely user-controlled). It would be great regardless
to fix the underlying bug in `credential_from_url_gently()` eventually
though. But I do not think that this has to be part this patch series
here, which is a strict improvement.

Thanks!

Patrick

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Patrick Steinhardt wrote:
> On Tue, Jan 09, 2024 at 05:53:37PM +0000, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> Update the validation of "curl URL" submodule URLs (i.e. those that specify
>> an "http[s]" or "ftp[s]" protocol) in 'check_submodule_url()' to catch more
>> invalid URLs. The existing validation using 'credential_from_url_gently()'
>> parses certain URLs incorrectly, leading to invalid submodule URLs passing
>> 'git fsck' checks. Conversely, 'url_normalize()' - used to validate remote
>> URLs in 'remote_get()' - correctly identifies the invalid URLs missed by
>> 'credential_from_url_gently()'.
> 
> Okay, so we retain the wrong behavior of `credential_from_url_gently()`,
> right? I wonder whether this can be abused in any way, doubly so because
> the function gets invoked with untrusted input from the remote server
> when we handle redirects in `http_request_reauth()`. But the redirect
> URL we end up passing to `credential_from_url_gently()` would have to
> contain a non-numeric port, and curl seemingly does not know to handle
> those either.

Correct, nothing about 'credential_from_url_gently()' changes here. As for
whether it could be abused - I don't *think* so, but I'm definitely not a
security expert. If it helps, here's a more detailed breakdown of the issue:

In 'credential_from_url_1()', suppose we have URL
"http://example.com:test/repo.git". Stepping through the variables:

- 'cp' is "example.com:test/repo.git"
- 'at' is NULL
- 'colon' is ":test/repo.git"
- 'slash' is "/repo.git"

Because 'at' is NULL, we set 'host = cp'. Later, because 'slash - host > 0',
we call 'url_decode_mem()' on "example.com:test" (which, in this case,
doesn't change anything) and the result 'host' to "example.com:test".

The issue for the fsck check is that 'credential_from_url_gently()' doesn't
validate the hostname it extracts (e.g. whether ':' precedes a valid port,
or if the hostname contains a '%'-escaped sequence). I don't *think* that
could be abused since, like you said, cURL should just reject the invalid
URL altogether.

> 
> Other callsites include fsck (which you're fixing) and the credential
> store (which is entirely user-controlled). It would be great regardless
> to fix the underlying bug in `credential_from_url_gently()` eventually
> though. But I do not think that this has to be part this patch series
> here, which is a strict improvement.

Agreed! I think normalizing the URL before trying to extract the credentials
may be all that's needed to avoid surprise URL errors, but that probably
warrants a separate patch submission (with appropriately thorough testing).

> 
> Thanks!
> 
> Patrick

Copy link

gitgitgadget bot commented Jan 10, 2024

User Patrick Steinhardt <ps@pks.im> has been added to the cc: list.

Copy link

gitgitgadget bot commented Jan 10, 2024

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Jan 09, 2024 at 05:53:34PM +0000, Victoria Dye via GitGitGadget wrote:

> While testing 'git fsck' checks on .gitmodules URLs, I noticed that some
> invalid URLs were passing the checks. Digging into it a bit more, the issue
> turned out to be that 'credential_from_url_gently()' parses certain URLs
> (like "http://example.com:something/deeper/path") incorrectly, in a way that
> appeared to return a valid result.

I don't think that checks was ever intended to be an overall URL-quality
check. The reason we used the credential code in the fsck check is that
we were checking for URLs which triggered a specific credential-related
vulnerability.

I don't mind tightening things further as long as:

  1. We are not allowing any cases that the credential code would have
     forbidden (i.e., something that might let the vulnerability slip
     through, since ultimately it is the credential code which will need
     to be protected). You ported over the newline check, which is the
     main thing. It's possible that there is some difference between the
     two parsers that may allow an invalid input to create a newline for
     one but not the other, but having now looked over the code, I don't
     think so.

     And I think one could argue that the security-importance of the
     fsck check has mostly run its course. The real fix was in the
     credential code itself, and the matching fsck change was mostly
     about protecting downstream clients until they were upgraded. Now
     that it's been several years, there's not as much value there.

  2. It is not making it harder for users to work with repositories that
     may contain malformed URLs that _aren't_ vulnerabilities. It sounds
     like the specific cases you found already don't work at all with
     Git, so presumably nobody is using them. By making it an fsck
     check, though, any mistakes that are embedded in history (even if
     they are now corrected) will make it a pain to use the repository
     with sites that enable transfer.fsckObjects.

     My gut feeling is that this is probably OK in practice. If it does
     cause pain, we might consider loosening the fsck.gitmodulesUrl
     severity (under the notion from above that it is no longer a
     critical security check). But if it doesn't cause real-world pain,
     being pickier is probably better (it may save us from a
     vulnerability down the road).

-Peff

Copy link

gitgitgadget bot commented Jan 10, 2024

User Jeff King <peff@peff.net> has been added to the cc: list.

@@ -15,6 +15,13 @@ static const char *submodule_check_name_usage[] = {
NULL
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Jan 09, 2024 at 05:53:36PM +0000, Victoria Dye via GitGitGadget wrote:

> +#define TEST_TOOL_CHECK_URL_USAGE \
> +	"test-tool submodule check-url <url>"

I don't think this command-line "<url>" mode works at all. Your
underlying function can handle either stdin or arguments:

> -static int check_name(int argc, const char **argv)
> +static int check_submodule(int argc, const char **argv, check_fn_t check_fn)
>  {
>  	if (argc > 1) {
>  		while (*++argv) {
> -			if (check_submodule_name(*argv) < 0)
> +			if (check_fn(*argv) < 0)
>  				return 1;
>  		}
>  	} else {
>  		struct strbuf buf = STRBUF_INIT;
>  		while (strbuf_getline(&buf, stdin) != EOF) {
> -			if (!check_submodule_name(buf.buf))
> +			if (!check_fn(buf.buf))
>  				printf("%s\n", buf.buf);
>  		}
>  		strbuf_release(&buf);

...but the new caller rejects them before we get there:

> +static int cmd__submodule_check_url(int argc, const char **argv)
> +{
> +	struct option options[] = {
> +		OPT_END()
> +	};
> +	argc = parse_options(argc, argv, "test-tools", options,
> +			     submodule_check_url_usage, 0);
> +	if (argc)
> +		usage_with_options(submodule_check_url_usage, options);
> +
> +	return check_submodule(argc, argv, check_submodule_url);
>  }

So you'd want at least:

diff --git a/t/helper/test-submodule.c b/t/helper/test-submodule.c
index da89d265f0..6b964c88ab 100644
--- a/t/helper/test-submodule.c
+++ b/t/helper/test-submodule.c
@@ -88,8 +88,6 @@ static int cmd__submodule_check_url(int argc, const char **argv)
 	};
 	argc = parse_options(argc, argv, "test-tools", options,
 			     submodule_check_url_usage, 0);
-	if (argc)
-		usage_with_options(submodule_check_url_usage, options);
 
 	return check_submodule(argc, argv, check_submodule_url);
 }

but then that reveals another mismatch. In check_submodule() above we
expect argv[0] to be uninteresting (i.e., the name of the program), but
parse_options() will already have thrown it away. So we silently fail to
check the first option (which is especially bad since the only output is
the exit code, and thus the skipped one looks the same as one that
validated correctly).

All of this is inherited from the existing check_name() code, which I
think has all of the same bugs. The test scripts all just use the stdin
mode, so they don't notice. It's not too hard to fix, but maybe it's
worth just ripping out the unreachable code.

-Peff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Jeff King wrote:
> On Tue, Jan 09, 2024 at 05:53:36PM +0000, Victoria Dye via GitGitGadget wrote:
> 
>> +#define TEST_TOOL_CHECK_URL_USAGE \
>> +	"test-tool submodule check-url <url>"
> 
> I don't think this command-line "<url>" mode works at all. Your
> underlying function can handle either stdin or arguments:

...

> All of this is inherited from the existing check_name() code, which I
> think has all of the same bugs. The test scripts all just use the stdin
> mode, so they don't notice. It's not too hard to fix, but maybe it's
> worth just ripping out the unreachable code.

Thanks for pointing out those issues, I think removing the command line
input mode is the way to go. The description of the 'check_name()' mentions
that the stdin mode was "primarily intended for testing". But as 85321a346b5
(submodule--helper: move "check-name" to a test-tool, 2022-09-01) pointed
out, 'check_name()' was never used outside of tests anyway, so whatever use
case was imagined for the command line mode never seemed to have existed. 

Combine that with the fact that the command line mode is so different from
the stdin mode (non-zero exit code for invalid names, prints nothing vs.
zero exit code, prints valid names), there don't seem to be any real
downsides to removing the unused code.

> 
> -Peff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Jan 11, 2024 at 08:54:47AM -0800, Victoria Dye wrote:

> > All of this is inherited from the existing check_name() code, which I
> > think has all of the same bugs. The test scripts all just use the stdin
> > mode, so they don't notice. It's not too hard to fix, but maybe it's
> > worth just ripping out the unreachable code.
> 
> Thanks for pointing out those issues, I think removing the command line
> input mode is the way to go. The description of the 'check_name()' mentions
> that the stdin mode was "primarily intended for testing". But as 85321a346b5
> (submodule--helper: move "check-name" to a test-tool, 2022-09-01) pointed
> out, 'check_name()' was never used outside of tests anyway, so whatever use
> case was imagined for the command line mode never seemed to have existed. 
> 
> Combine that with the fact that the command line mode is so different from
> the stdin mode (non-zero exit code for invalid names, prints nothing vs.
> zero exit code, prints valid names), there don't seem to be any real
> downsides to removing the unused code.

That sounds like a good plan to me. :)

-Peff

Copy link

gitgitgadget bot commented Jan 10, 2024

This patch series was integrated into seen via git@14fe7a7.

Copy link

gitgitgadget bot commented Jan 12, 2024

This patch series was integrated into seen via git@02f54ec.

Copy link

gitgitgadget bot commented Jan 12, 2024

There was a status update in the "Cooking" section about the branch vd/fsck-submodule-url-test on the Git mailing list:

Tighten URL checks fsck makes in a URL recorded for submodules.

Expecting a reroll (and review response).
cf. <20240110103812.GB16674@coredump.intra.peff.net>
cf. <ZZ46MrjSocJl-kpU@tanuki>
source: <pull.1635.git.1704822817.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Jan 13, 2024

This patch series was integrated into seen via git@f2de042.

Copy link

gitgitgadget bot commented Jan 16, 2024

This patch series was integrated into seen via git@652c970.

Copy link

gitgitgadget bot commented Jan 16, 2024

There was a status update in the "Cooking" section about the branch vd/fsck-submodule-url-test on the Git mailing list:

Tighten URL checks fsck makes in a URL recorded for submodules.

Expecting a reroll (and review response).
cf. <20240110103812.GB16674@coredump.intra.peff.net>
cf. <ZZ46MrjSocJl-kpU@tanuki>
source: <pull.1635.git.1704822817.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Jan 17, 2024

This patch series was integrated into seen via git@2b5062e.

Move 'check_submodule_url' out of 'fsck.c' and into 'submodule-config.h' as
a public method, similar to 'check_submodule_name'. With the function now
accessible outside of 'fsck', it can be used in a later commit to extend
'test-tool submodule' to check the validity of submodule URLs as it does
with names in the 'check-name' subcommand.

Other than its location, no changes are made to 'check_submodule_url' in
this patch.

Signed-off-by: Victoria Dye <vdye@github.com>
The 'check-name' subcommand to 'test-tool submodule' is documented as being
able to take a command line argument '<name>'. However, this does not work -
and has never worked - because 'argc > 0' triggers the usage message in
'cmd__submodule_check_name()'. To simplify the helper and avoid future
confusion around proper use of the subcommand, remove any references to
command line arguments for 'check-name' in usage strings and handling in
'check_name()'.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Victoria Dye <vdye@github.com>
@vdye vdye force-pushed the vdye/strengthen-fsck-url-checks branch from 8930715 to 5eb2f1c Compare January 17, 2024 20:00
Add tests to 't7450-bad-git-dotfiles.sh' to check the validity of different
submodule URLs. To verify this directly (without setting up test
repositories & submodules), add a 'check-url' subcommand to 'test-tool
submodule' that calls 'check_submodule_url' in the same way that
'check-name' calls 'check_submodule_name'.

Add two tests to separately address cases where the URL check correctly
filters out invalid URLs and cases where the check misses invalid URLs. Mark
the latter ("url check misses invalid cases") with 'test_expect_failure' to
indicate that this not the undesired behavior.

Signed-off-by: Victoria Dye <vdye@github.com>
@vdye vdye force-pushed the vdye/strengthen-fsck-url-checks branch from 5eb2f1c to c5428bd Compare January 17, 2024 21:53
Copy link

gitgitgadget bot commented Jan 18, 2024

This patch series was integrated into seen via git@e80bc5a.

Update the validation of "curl URL" submodule URLs (i.e. those that specify
an "http[s]" or "ftp[s]" protocol) in 'check_submodule_url()' to catch more
invalid URLs. The existing validation using 'credential_from_url_gently()'
parses certain URLs incorrectly, leading to invalid submodule URLs passing
'git fsck' checks. Conversely, 'url_normalize()' - used to validate remote
URLs in 'remote_get()' - correctly identifies the invalid URLs missed by
'credential_from_url_gently()'.

To catch more invalid cases, replace 'credential_from_url_gently()' with
'url_normalize()' followed by a 'url_decode()' and a check for newlines
(mirroring 'check_url_component()' in the 'credential_from_url_gently()'
validation).

Signed-off-by: Victoria Dye <vdye@github.com>
@vdye vdye force-pushed the vdye/strengthen-fsck-url-checks branch from c5428bd to b79b1a7 Compare January 18, 2024 01:08
@vdye
Copy link
Author

vdye commented Jan 18, 2024

/submit

Copy link

gitgitgadget bot commented Jan 18, 2024

Submitted as pull.1635.v2.git.1705542918.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1635/vdye/vdye/strengthen-fsck-url-checks-v2

To fetch this version to local tag pr-1635/vdye/vdye/strengthen-fsck-url-checks-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1635/vdye/vdye/strengthen-fsck-url-checks-v2

Copy link

gitgitgadget bot commented Jan 18, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> While testing 'git fsck' checks on .gitmodules URLs, I noticed that some
> invalid URLs were passing the checks. Digging into it a bit more, the issue
> turned out to be that 'credential_from_url_gently()' parses certain URLs
> (like "http://example.com:something/deeper/path") incorrectly, in a way that
> appeared to return a valid result.
>
> Fortunately, these URLs are rejected in fetches/clones/pushes anyway because
> 'url_normalize()' (called in 'validate_remote_url()') correctly identifies
> them as invalid. So, to bring 'git fsck' in line with other (stronger)
> validation done on remote URLs, this series replaces the
> 'credential_from_url_gently()' check with one that uses 'url_normalize()'.
>
>  * Patch 1 moves 'check_submodule_url()' to a public location so that it can
>    be used outside of 'fsck.c'.
>  * Patch 2 removes the obsolete/never-used code in 'test-tool submodule
>    check-name' handling names provided on the command line.
>  * Patch 3 adds a 'check-url' mode to 'test-tool submodule', calling the
>    now-public 'check_submodule_url()' method on a given URL, and adds new
>    tests checking valid and invalid submodule URLs.
>  * Patch 4 replaces the 'credential_from_url_gently()' check with
>    'url_normalize()' followed by 'url_decode()' and an explicit check for
>    newlines (to preserve the newline handling added in 07259e74ec1 (fsck:
>    detect gitmodules URLs with embedded newlines, 2020-03-11)).

Nicely done.  I'll wait for a few days to see if anybody else has
reaction but after reading the patches myself, my inclination is to
suggest merging it to 'next'.

Thanks.

@@ -9,7 +9,7 @@
#include "submodule.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> The 'check-name' subcommand to 'test-tool submodule' is documented as being
> able to take a command line argument '<name>'. However, this does not work -
> and has never worked - because 'argc > 0' triggers the usage message in
> 'cmd__submodule_check_name()'. To simplify the helper and avoid future
> confusion around proper use of the subcommand, remove any references to
> command line arguments for 'check-name' in usage strings and handling in
> 'check_name()'.
>
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  t/helper/test-submodule.c | 29 +++++++++--------------------
>  1 file changed, 9 insertions(+), 20 deletions(-)

Excellent, both of you.

Copy link

gitgitgadget bot commented Jan 18, 2024

This patch series was integrated into seen via git@e4697cd.

@@ -9,12 +9,19 @@
#include "submodule.h"

#define TEST_TOOL_CHECK_NAME_USAGE \
"test-tool submodule check-name <name>"
"test-tool submodule check-name"
static const char *submodule_check_name_usage[] = {
TEST_TOOL_CHECK_NAME_USAGE,
NULL
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Thu, Jan 18, 2024 at 01:55:17AM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Add tests to 't7450-bad-git-dotfiles.sh' to check the validity of different
> submodule URLs. To verify this directly (without setting up test
> repositories & submodules), add a 'check-url' subcommand to 'test-tool
> submodule' that calls 'check_submodule_url' in the same way that
> 'check-name' calls 'check_submodule_name'.
> 
> Add two tests to separately address cases where the URL check correctly
> filters out invalid URLs and cases where the check misses invalid URLs. Mark
> the latter ("url check misses invalid cases") with 'test_expect_failure' to
> indicate that this not the undesired behavior.

Nit: this should probably say "to indicate that this is not the desired
behaviour." But given that the other patches in this series look good to
me I don't think this warrants a reroll.

Thanks!

Patrick

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Patrick Steinhardt <ps@pks.im> writes:

>> Add two tests to separately address cases where the URL check correctly
>> filters out invalid URLs and cases where the check misses invalid URLs. Mark
>> the latter ("url check misses invalid cases") with 'test_expect_failure' to
>> indicate that this not the undesired behavior.
>
> Nit: this should probably say "to indicate that this is not the desired
> behaviour." But given that the other patches in this series look good to
> me I don't think this warrants a reroll.

Good eyes.

I'll rewrite that part to "... to indicate that this is currently
broken, which will be fixed in the next step." before merging the
series to 'next'.

Thanks.

Copy link

gitgitgadget bot commented Jan 20, 2024

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Jan 18, 2024 at 10:24:51AM -0800, Junio C Hamano wrote:

> >  * Patch 1 moves 'check_submodule_url()' to a public location so that it can
> >    be used outside of 'fsck.c'.
> >  * Patch 2 removes the obsolete/never-used code in 'test-tool submodule
> >    check-name' handling names provided on the command line.
> >  * Patch 3 adds a 'check-url' mode to 'test-tool submodule', calling the
> >    now-public 'check_submodule_url()' method on a given URL, and adds new
> >    tests checking valid and invalid submodule URLs.
> >  * Patch 4 replaces the 'credential_from_url_gently()' check with
> >    'url_normalize()' followed by 'url_decode()' and an explicit check for
> >    newlines (to preserve the newline handling added in 07259e74ec1 (fsck:
> >    detect gitmodules URLs with embedded newlines, 2020-03-11)).
> 
> Nicely done.  I'll wait for a few days to see if anybody else has
> reaction but after reading the patches myself, my inclination is to
> suggest merging it to 'next'.

It all looks good to me to go to 'next'.

After simplifying the input handling in patch 2, I probably would not
have bothered with the abstracted interface in patch 3 (and instead just
repeated the few lines of boilerplate, since there's so much already).
Mostly just because function pointers in C often make reading and
debugging more annoying. But I don't think it's a very big deal either
way in this instance.

-Peff

Copy link

gitgitgadget bot commented Jan 20, 2024

This patch series was integrated into seen via git@9b70840.

Copy link

gitgitgadget bot commented Jan 20, 2024

This patch series was integrated into next via git@dad35ae.

@gitgitgadget gitgitgadget bot added the next label Jan 20, 2024
Copy link

gitgitgadget bot commented Jan 20, 2024

There was a status update in the "Cooking" section about the branch vd/fsck-submodule-url-test on the Git mailing list:

Tighten URL checks fsck makes in a URL recorded for submodules.

Will merge to 'master'.
source: <pull.1635.v2.git.1705542918.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Jan 22, 2024

This patch series was integrated into seen via git@e3024dd.

Copy link

gitgitgadget bot commented Jan 23, 2024

This patch series was integrated into seen via git@69d76ed.

Copy link

gitgitgadget bot commented Jan 23, 2024

This patch series was integrated into seen via git@74fc911.

Copy link

gitgitgadget bot commented Jan 24, 2024

There was a status update in the "Cooking" section about the branch vd/fsck-submodule-url-test on the Git mailing list:

Tighten URL checks fsck makes in a URL recorded for submodules.

Will merge to 'master'.
source: <pull.1635.v2.git.1705542918.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Jan 24, 2024

This patch series was integrated into seen via git@81202ee.

Copy link

gitgitgadget bot commented Jan 26, 2024

This patch series was integrated into seen via git@3c3bfed.

Copy link

gitgitgadget bot commented Jan 26, 2024

This patch series was integrated into seen via git@76bd129.

Copy link

gitgitgadget bot commented Jan 26, 2024

This patch series was integrated into master via git@76bd129.

Copy link

gitgitgadget bot commented Jan 26, 2024

This patch series was integrated into next via git@76bd129.

@gitgitgadget gitgitgadget bot added the master label Jan 26, 2024
Copy link

gitgitgadget bot commented Jan 26, 2024

Closed via 76bd129.

@gitgitgadget gitgitgadget bot closed this Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant