
Add NTLM support for HTTP(s) servers and proxies #5052

Merged 26 commits from ethomson/netrefactor into master on Jun 11, 2019
Conversation

@ethomson (Member)

When we removed libcurl, we inadvertently removed support for NTLM connections to proxies; libcurl supported it automatically, which I don't think any of the contributors realized.

This adds NTLM support by including https://github.com/ethomson/ntlmclient, which is a pure C NTLM client implementation that was built to live nicely within the libgit2 codebase.

In addition, it does some refactoring of the HTTP codebase to support NTLM connections to both endpoint servers and proxies.
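
For context on why this needs more than a one-shot authentication flow: NTLM is a connection-oriented challenge/response scheme that takes three legs over HTTP. A minimal sketch of the sequence (the enum and strings below are an illustrative model, not the ntlmclient API):

```c
#include <stdio.h>

/* Shape of the NTLM handshake; an illustrative model, not ntlmclient. */
typedef enum {
    NTLM_STATE_NEGOTIATE,  /* leg 1, client: Authorization: NTLM <Type 1> */
    NTLM_STATE_CHALLENGE,  /* leg 2, server: 401 + WWW-Authenticate: NTLM <Type 2> */
    NTLM_STATE_RESPONSE,   /* leg 3, client: Authorization: NTLM <Type 3> */
    NTLM_STATE_COMPLETE    /* server accepts (200) or rejects (401) */
} ntlm_state;

int main(void)
{
    static const char *legs[] = {
        "client sends Type 1 (Negotiate) in an Authorization header",
        "server answers 401 with a Type 2 (Challenge) in WWW-Authenticate",
        "client sends Type 3 (Authenticate) computed from the challenge",
        "server accepts or rejects; the *connection* is now authenticated"
    };

    for (ntlm_state s = NTLM_STATE_NEGOTIATE; s <= NTLM_STATE_COMPLETE; s++)
        printf("leg %d: %s\n", (int)s + 1, legs[s]);

    return 0;
}
```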

@ethomson (Member Author)

Rebased on master.

@ethomson (Member Author)

What would help review this? Should I break out the necessary HTTP client refactoring so that one can review those bits first, then follow up with the addition of the NTLM library?

@tiennou (Contributor) left a comment:

Sorry, I had wanted to review this sooner but got caught up in other things. So here's a "sparse" review: I've read everything, but didn't look hard for bugs, as most of it seems NFC and/or (nice) cleanups.


static int ntlm_init_context(
http_auth_ntlm_context *ctx,
const git_net_url *url)
@tiennou (Contributor):
This is out of order, as the git_net_url rename commit is later in the patch series. Just a heads up.

@ethomson (Member Author):
I think the GitHub UI may be showing commits out of order, 8497027 (the rename to git_net_url) is the first commit in this series.

src/net.c (outdated):
git_buf_oom(&host) ||
git_buf_oom(&port) ||
git_buf_oom(&path) ||
git_buf_oom(&query) ||
@tiennou (Contributor):
Nitpick: indent.

@@ -682,12 +682,6 @@ void test_online_clone__ssh_memory_auth(void)
cl_git_pass(git_clone(&g_repo, _remote_url, "./foo", &g_options));
}

void test_online_clone__url_with_no_path_returns_EINVALIDSPEC(void)
{
cl_git_fail_with(git_clone(&g_repo, "http://github.com", "./foo", &g_options),
@tiennou (Contributor):
👍

return -1;
}

git_buf_dispose(&request);
@tiennou (Contributor):
Nitpick: I went looking for a realloc call and found none. I'm not sure I see how this makes it "not realloc", though I understand the intent is to remove that dispose call ⬆️ so the request isn't lost. Maybe?

@ethomson (Member Author):
Sorry, I could have been clearer. We're not explicitly calling realloc, but we were doing so implicitly, since we defined the request buffer inside the replay block. That created a new buffer for every replay, which is unnecessary. Instead we can clear the existing buffer and put the new URI into it.
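
Roughly, the change looks like this; a sketch assuming libgit2's internal `git_buf` API, with `must_replay` and `generate_request` as hypothetical stand-ins for the real loop condition and request builder:

```c
#include "buffer.h"  /* libgit2-internal git_buf */

/* Hypothetical stand-ins for the real loop condition and builder. */
extern int must_replay(void);
extern int generate_request(git_buf *request, const char *url);

/* Before: the buffer lived inside the replay loop, so every replay
 * implicitly allocated a new backing store. */
static int send_request_old(const char *url)
{
    while (must_replay()) {
        git_buf request = GIT_BUF_INIT;

        if (generate_request(&request, url) < 0)  /* allocates anew */
            return -1;

        /* ... send the request, examine the response ... */
        git_buf_dispose(&request);
    }

    return 0;
}

/* After: hoist the buffer out of the loop; git_buf_clear resets the
 * length but keeps the allocation, so replays reuse it. */
static int send_request_new(const char *url)
{
    git_buf request = GIT_BUF_INIT;
    int error = 0;

    while (must_replay()) {
        git_buf_clear(&request);  /* resets length, keeps allocation */

        if ((error = generate_request(&request, url)) < 0)
            break;

        /* ... send the request, examine the response ... */
    }

    git_buf_dispose(&request);
    return error;
}
```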

@@ -0,0 +1,33 @@
/*
@tiennou (Contributor):
Skipping the external dependency code out of sheer laziness. Would you like it reviewed anyway? (I remember taking a look a few months ago and didn't see anything that stood out.)

@ethomson (Member Author):
I'm pretty comfortable with it. The functional portions are a reasonably direct port of Microsoft's code, and from a technical perspective I've run this through fuzzers and memory checkers galore. So although there may still be bugs (and I'm sure there will be), I think we'll need to ship this to find them.

@@ -483,22 +537,22 @@ static int on_headers_complete(http_parser *parser)
if (parser->status_code == 407 && get_verb == s->verb)
return on_auth_required(&t->proxy.cred,
parser,
&t->proxy,
@tiennou (Contributor):
Nitpick: strange indent? Not sure who's wrong though 😅.

proxy_auth_types);

/* Check for an authentication failure. */
if (parser->status_code == 401 && get_verb == s->verb)
return on_auth_required(&t->server.cred,
parser,
&t->server,
@tiennou (Contributor):
Ditto.

}

return error;
return git_net_url_parse(&t->proxy.url, t->proxy_opts.url);
@tiennou (Contributor):
🔥👏

}
}

/* Enforce a reasonable cap on the number of replays */
@tiennou (Contributor):
Nitpick: indent.

@ethomson (Member Author) commented Jun 8, 2019

/rebuild

@libgit2-azure-pipelines
Okay, @ethomson, I started to rebuild this pull request as build #2017.

"Connection data" is an imprecise and largely incorrect name; these
structures are actually parsed URLs.  Provide a parser that takes a URL
string and produces a URL structure (if it is valid).

Separate the HTTP redirect handling logic from URL parsing, keeping a
`gitno_connection_data_handle_redirect` whose only job is redirect
handling logic and does not parse URLs itself.
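
The shape of that parser, sketched against libgit2's internal `net.h` (the URL and the field walkthrough are illustrative):

```c
#include "net.h"  /* libgit2-internal header: git_net_url and friends */

static int example(void)
{
    git_net_url url = GIT_NET_URL_INIT;

    if (git_net_url_parse(&url, "https://user@example.com:8443/repo.git") < 0)
        return -1;

    /* url.scheme   -> "https"
     * url.host     -> "example.com"
     * url.port     -> "8443"
     * url.path     -> "/repo.git"
     * url.username -> "user"
     *
     * Per the following commit, a URL with no path at all parses
     * with an implicit path of "/". */

    git_net_url_dispose(&url);
    return 0;
}
```
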
There's no reason a git repository couldn't be at the root of a server,
and URLs should have an implicit path of '/' when one is not specified.

We did not properly support default credentials for proxies, only for
destination servers.  Refactor the credential handling to support sending
either username/password _or_ default credentials to either the proxy or
the destination server.

This actually shares the authentication logic between proxy servers and
destination servers.  Due to copy/pasta drift over time, they had
diverged.  Now they share a common logic which is: first, use
credentials specified in the URL (if there were any), treating empty
username and password (ie, "http://:@foo.com/") as default credentials,
for compatibility with git.  Next, call the credential callbacks.
Finally, fallback to WinHTTP compatibility layers using built-in
authentication like we always have.
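
A sketch of that shared ordering; every type and helper here (`parsed_url`, `auth_server`, `set_userpass`, and friends) is a hypothetical stand-in, not libgit2's actual code:

```c
/* Hypothetical types standing in for libgit2's transport state. */
typedef struct {
    const char *username, *password;
} parsed_url;

typedef struct {
    int  (*cred_cb)(void **cred, void *payload);
    void  *cred_cb_payload;
    void  *cred;
} auth_server;

/* Hypothetical helpers, declared only to make the sketch read. */
extern int set_default_credentials(auth_server *server);
extern int set_userpass(auth_server *server, const char *u, const char *p);
extern int fallback_builtin_auth(auth_server *server);

/* The shared ordering: URL credentials, then callbacks, then
 * built-in (WinHTTP-style) authentication. */
static int acquire_credentials(auth_server *server, const parsed_url *url)
{
    /* 1. Credentials embedded in the URL win. An empty username and
     *    password ("http://:@foo.com/") selects default (machine)
     *    credentials, for compatibility with git. */
    if (url->username && url->password) {
        if (!*url->username && !*url->password)
            return set_default_credentials(server);

        return set_userpass(server, url->username, url->password);
    }

    /* 2. Otherwise, ask the caller via the credential callback. */
    if (server->cred_cb)
        return server->cred_cb(&server->cred, server->cred_cb_payload);

    /* 3. Finally, fall back to the platform's built-in
     *    authentication (the WinHTTP compatibility layer). */
    return fallback_builtin_auth(server);
}
```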

Allowing default credentials for proxies requires moving the security
level downgrade into the credential setting routines themselves.
We will update our security level to "high" by default which means that
we will never send default credentials without prompting.  (A lower
setting, like the WinHTTP default of "medium" would allow WinHTTP to
handle credentials for us, despite what a user may have requested with
their structures.)  Now we start with "high" and downgrade to "low" only
after a user has explicitly requested default credentials.
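
Using the real WinHTTP option, the downgrade looks roughly like this (the `set_autologon` helper and its parameters are illustrative):

```c
#include <windows.h>
#include <winhttp.h>

/* Sketch: start every request at "high" (never auto-send default
 * credentials), dropping to "low" only once the user has explicitly
 * asked for default credentials. `request` is a WinHTTP request
 * handle obtained elsewhere. */
static BOOL set_autologon(HINTERNET request, BOOL allow_default_creds)
{
    DWORD policy = allow_default_creds ?
        WINHTTP_AUTOLOGON_SECURITY_LEVEL_LOW :
        WINHTTP_AUTOLOGON_SECURITY_LEVEL_HIGH;

    return WinHttpSetOption(request,
        WINHTTP_OPTION_AUTOLOGON_POLICY,
        &policy, sizeof(policy));
}
```
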
Update our CI tests to start a proxy that requires NTLM authentication;
ensure that our Windows HTTP client can speak NTLM.

Increase the permissible replay count; with multiple-step authentication
schemes (NTLM, Negotiate), proxy authentication and redirects, we need
to be mindful of the number of steps it takes to get connected.

Seven seems high, but it can be exhausted quickly with just a single
authentication failure over a redirected, multi-step authentication pipeline.
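
A hypothetical worked count of how fast seven replays disappear (the leg counts are illustrative, not measured):

```c
/* Each entry is one or more replays of the request. */
enum {
    REPLAY_REDIRECT        = 1, /* redirect to the canonical URL */
    REPLAY_PROXY_AUTH_FAIL = 2, /* proxy NTLM negotiate/challenge, rejected */
    REPLAY_PROXY_AUTH_OK   = 2, /* proxy NTLM legs on the retry */
    REPLAY_SERVER_AUTH     = 2, /* server NTLM negotiate/challenge legs */

    REPLAY_TOTAL = REPLAY_REDIRECT + REPLAY_PROXY_AUTH_FAIL +
                   REPLAY_PROXY_AUTH_OK + REPLAY_SERVER_AUTH  /* == 7 */
};
```
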
We cannot examine the keep-alive status of the http parser in
`http_connect`; it's too late and the critical information about whether
keep-alive is supported has been destroyed.

Per the documentation for `http_should_keep_alive`:

> If http_should_keep_alive() in the on_headers_complete or
> on_message_complete callback returns 0, then this should be
> the last message on the connection.

Query it there and set the state.
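
A sketch of capturing it in the callback, using the real `http_parser` API with a hypothetical transport struct hanging off `parser->data`:

```c
#include "http_parser.h"

/* Hypothetical stand-in for libgit2's transport state. */
struct transport_state {
    unsigned keepalive : 1;
};

static int on_message_complete(http_parser *parser)
{
    struct transport_state *t = parser->data;

    /* Valid only inside on_headers_complete/on_message_complete;
     * by the time http_connect runs, this information is gone. */
    t->keepalive = http_should_keep_alive(parser) ? 1 : 0;

    return 0;
}
```
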
Some authentication mechanisms (like HTTP Basic and Digest) create
credentials in a single step, but more complex mechanisms like NTLM and
Negotiate require a challenge/response after negotiation, taking several
round-trips.  Add an `is_complete` function so we know when a mechanism
has round-tripped enough to constitute a single authentication attempt,
at which point it has either succeeded or failed to authenticate.
When we get an authentication failure, we must consume the entire body
of the response.  If we only read half of the body (on the assumption
that we can ignore the rest) then we will never complete the parsing of
the message.  This means that we will never set the complete flag, and
our replay must actually tear down the connection and try again.

This is particularly problematic for stateful authentication mechanisms
(SPNEGO, NTLM) that require that we keep the connection alive.

Note that the prior code is only a problem when the 401 that we are
parsing is too large to be read in a single chunked read from the http
parser.

But now, in the authentication-failed scenario, we will continue to
invoke the http parser until we've got a complete message.  Note that we
need not do anything with the message: once we see the authentication
failure, we stop adding data to our buffer and simply loop in the
parser, letting it advance its internal state.
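
Sketched as a drain loop; `my_socket`, `socket_read`, and `message_complete` are hypothetical stand-ins, while `http_parser_execute` is the real parser entry point:

```c
#include <sys/types.h>
#include "http_parser.h"

/* Hypothetical stand-ins for the transport's socket layer. */
typedef struct my_socket my_socket;
extern ssize_t socket_read(my_socket *s, char *buf, size_t len);
extern int message_complete(const http_parser *parser);

/* Drain the remainder of a 401/407 response so the keep-alive
 * connection can be replayed; a sketch, not libgit2's actual code. */
static int drain_response(http_parser *parser,
    const http_parser_settings *settings, my_socket *socket)
{
    char buf[4096];
    ssize_t len;

    while (!message_complete(parser)) {
        if ((len = socket_read(socket, buf, sizeof(buf))) < 0)
            return -1;   /* transport error */

        if (len == 0)
            return -1;   /* server hung up mid-message */

        /* Feed the parser so it can advance its internal state; the
         * callbacks discard body data once we know auth failed. */
        if (http_parser_execute(parser, settings, buf, (size_t)len) != (size_t)len)
            return -1;
    }

    return 0;
}
```
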
We must always consume the full parser body if we're going to
keep-alive.  So in the authentication failure case, continue advancing
the http message parser until it's complete, then we can retry the
connection.

Not doing so would mean that we have to tear the connection down and
start over.  Advancing through fully (even though we don't use the data)
will ensure that we can retry a connection with keep-alive.
Ensure that the server supports the particular credential type that
we're specifying.  Previously we considered credential types as an
input to an auth mechanism - since the HTTP transport only supported
default credentials (via negotiate) and username/password credentials
(via basic), this worked.  However, if we are to add another mechanism
that uses username/password credentials, we'll need to be careful to
identify the types that are accepted.
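
Illustrated with a hypothetical mechanism descriptor (the credential-type flags mirror libgit2's 2019-era constants):

```c
/* Bit flags mirroring libgit2's 2019-era credential types. */
#define GIT_CREDTYPE_USERPASS_PLAINTEXT (1u << 0)
#define GIT_CREDTYPE_DEFAULT            (1u << 3)

/* Hypothetical descriptor: both Basic and NTLM accept
 * username/password, so accepted types must be declared explicitly
 * rather than inferred from the mechanism. */
typedef struct {
    const char *name;
    unsigned int credtypes; /* credential types this mechanism accepts */
} auth_mechanism;

static const auth_mechanism basic = {
    "Basic", GIT_CREDTYPE_USERPASS_PLAINTEXT
};
static const auth_mechanism ntlm = {
    "NTLM", GIT_CREDTYPE_USERPASS_PLAINTEXT | GIT_CREDTYPE_DEFAULT
};

/* Only offer a credential to a mechanism that supports its type. */
static int supports(const auth_mechanism *m, unsigned int credtype)
{
    return (m->credtypes & credtype) != 0;
}
```
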
A "connection" to a server is transient, and we may reconnect to a
server in the midst of authentication failures (if the remote indicates
that we should, via `Connection: close`) or in a redirect.
Hold an individual authentication context instead of trying to maintain
all the contexts; we can select the preferred context during the initial
negotiation.

Subsequent authentication steps will re-use the chosen authentication
(until such time as it's rejected) instead of trying to manage multiple
contexts when all but one will never be used (since we can only
authenticate with a single mechanism at a time.)

Also, when we're given a 401 or 407 in the middle of challenge/response
handling, short-circuit immediately without incrementing the retry
count.  The multi-step authentication is expected, not a "retry", and
should not be penalized as such.

This means that we don't need to keep the contexts around and ensures
that we do not unnecessarily fail for too many retries when we have
challenge/response auth on a proxy and a server and potentially
redirects in play as well.
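
The flow, sketched with hypothetical names (libgit2's real types differ):

```c
/* Hypothetical shapes; libgit2's real types and names differ. */
typedef struct auth_context {
    int (*is_complete)(struct auth_context *ctx);
} auth_context;

typedef struct {
    auth_context *ctx;  /* the single chosen context, or NULL */
} http_server_state;

extern int  replay_without_retry_penalty(http_server_state *server);
extern int  replay_with_retry_penalty(http_server_state *server);
extern int  choose_context(auth_context **out,
                           http_server_state *server,
                           const char *challenge);
extern void free_auth_context(http_server_state *server);

static int on_auth_required(http_server_state *server, const char *challenge)
{
    /* A context mid-challenge/response is *expected* to see another
     * 401/407: replay immediately, without charging a retry. */
    if (server->ctx && !server->ctx->is_complete(server->ctx))
        return replay_without_retry_penalty(server);

    /* Otherwise select the single preferred mechanism and (re)start
     * its handshake; there is no list of contexts to maintain. */
    free_auth_context(server);

    if (choose_context(&server->ctx, server, challenge) < 0)
        return -1;

    return replay_with_retry_penalty(server);
}
```
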
For request-based authentication mechanisms (Basic, Digest) we should
keep the authentication context alive across socket connections, since
the authentication headers must be transmitted with every request.

However, we should continue to remove authentication contexts for
mechanisms with connection affinity (NTLM, Negotiate) since we need to
reauthenticate for every socket connection.
Instead of using `is_complete` to decide whether we have connection or
request affinity for authentication mechanisms, set a boolean on the
mechanism definition itself.
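
A sketch of the descriptor with the explicit bit (names illustrative):

```c
/* Hypothetical mechanism definition with an explicit affinity bit. */
typedef struct {
    const char *name;

    /* 1: the handshake authenticates the socket (NTLM, Negotiate);
     *    tear the context down when the connection closes.
     * 0: every request carries its own header (Basic, Digest);
     *    keep the context alive across connections. */
    unsigned connection_affinity : 1;
} auth_mechanism_definition;

static const auth_mechanism_definition basic_def = { "Basic", 0 };
static const auth_mechanism_definition ntlm_def  = { "NTLM",  1 };
```
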
We stop the read loop when we have read all the data.  We should also
consider the server's feelings.

If the server hangs up on us, we need to stop our read loop.  Otherwise,
we'll try to read from the server - and fail - ad infinitum.
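
Sketched with hypothetical names:

```c
#include <sys/types.h>

/* Hypothetical stand-ins for the transport's types. */
typedef struct my_socket my_socket;
typedef struct { int connection_closed; } response_state;

extern ssize_t socket_read(my_socket *s, char *buf, size_t len);
extern int  done_reading(const response_state *state);
extern void consume(response_state *state, const char *buf, size_t len);

static int read_response(my_socket *socket, response_state *state)
{
    char buf[4096];
    ssize_t len;

    while (!done_reading(state)) {
        if ((len = socket_read(socket, buf, sizeof(buf))) < 0)
            return -1;  /* transport error */

        if (len == 0) {
            /* The server hung up; stop instead of retrying forever.
             * On a keep-alive connection this is legal after any
             * complete response, and the caller may reconnect. */
            state->connection_closed = 1;
            break;
        }

        consume(state, buf, (size_t)len);
    }

    return 0;
}
```
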
When we have a keep-alive connection to a server, that server may
legally drop the connection for any reason once a successful request and
response have occurred.  It's common for servers to drop the connection
after some amount of time or some number of requests.

When we're issuing a CONNECT to a proxy, we expect the connection to the
proxy to be kept alive.  However, during authentication negotiations,
the proxy may close the connection; reconnect if it does.

When we send HTTP credentials but the server rejects them, tear down the
authentication context so that we can start fresh.  To maintain this
state, additionally move all of the authentication handling into
`on_auth_required`.
@ethomson merged commit 110b589 into master on Jun 11, 2019
@ethomson deleted the ethomson/netrefactor branch on February 2, 2020
joebonrichie pushed a commit to solus-packages/atom that referenced this pull request Aug 17, 2023
This appears to be the result of Atom's git-utils package vendoring libgit2, in 5.6.0 and 5.6.1, at an incorrect point in libgit2's history: there was an ABI change in which gitno_connection_data was renamed to git_net_url. That landed on June 11th, per [this pull request](libgit2/libgit2#5052), but the vendored copy doesn't account for subsequent commits to src/net.c; realistically, they should sort out their submodule or not rely on it in the first place. The new git-utils package was pulled in specifically for 1.41.0, so reverting it to 5.5.0 (from 1.40.0) resolves the issue.

Alongside this change I also went ahead and removed some long unnecessary node engine min / max version comparisons.