Follow cURL's rules for parsing and matching NO_PROXY #1332

abatkin · 2021-09-20T04:48:06Z

There are a few ways in which reqwest's handling of NO_PROXY differs from cURL (and other implementations). The biggest issue is that whitespace between entries should be ignored/trimmed, but is not (e.g. "NO_PROXY='a, b'" would never match "b"). In addition, according to cURL's rules, a NO_PROXY entry without a leading dot should match the domain itself as well as any subdomains (reqwest only handles exact matches if there is no leading dot), and entries with a leading dot should only match subdomains (but reqwest allows exact matches). Finally, cURL allows a special entry "*" to match all hosts (effectively disabling use of the proxy).
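To make the rules above concrete, here is a minimal sketch of the matching as described in this PR description. It is not the code in src/proxy.rs; `parse_no_proxy` and `bypasses_proxy` are hypothetical names used only for illustration, and lowercasing the entries is my own assumption.

```rust
/// Hypothetical helper: split a NO_PROXY value on commas, trimming the
/// whitespace around each entry and dropping empty entries.
fn parse_no_proxy(value: &str) -> Vec<String> {
    value
        .split(',')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(str::to_ascii_lowercase) // assumption: compare case-insensitively
        .collect()
}

/// Hypothetical helper: returns true when `host` should bypass the proxy
/// under the rules described in the PR description.
fn bypasses_proxy(entries: &[String], host: &str) -> bool {
    let host = host.to_ascii_lowercase();
    entries.iter().any(|entry| {
        if entry.as_str() == "*" {
            // The wildcard entry matches every host.
            true
        } else if entry.starts_with('.') {
            // Leading dot: subdomains only, never the bare domain.
            host.len() > entry.len() && host.ends_with(entry.as_str())
        } else {
            // No leading dot: the domain itself, or any subdomain of it.
            host == *entry
                || host
                    .strip_suffix(entry.as_str())
                    .map_or(false, |prefix| prefix.ends_with('.'))
        }
    })
}
```

Requiring a `.` immediately before the matched suffix keeps an entry such as `google.com` from matching an unrelated host such as `notgoogle.com`.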

All changes in behavior have corresponding changes to the tests.

I tried to keep the logic clear/clean/idiomatic but I'm definitely open to suggestions for improvement.

src/proxy.rs

seanmonstar · 2021-09-21T23:05:04Z

Thanks for this! I think trying to match curl's rules is a great goal. Just to help me reason on the change, will this "break" any behavior that people might have come to rely on? I realize in a crazy way, people can rely on any ol' bug, but I mean if things were sufficiently different.

abatkin · 2021-09-23T03:23:42Z

The addition of wildcard (*) support is a new feature that shouldn't break anything, and handling embedded spaces in lists (i.e. google.com,microsoft.com and google.com, microsoft.com should be equivalent) I look at as a bugfix.

This does change behavior in a breaking way for a couple cases:

| no_proxy | prior behavior | new behavior |
| --- | --- | --- |
| `no_proxy=google.com` (no leading dot) | `google.com` (exact match) would match and not use the proxy; `www.google.com` (subdomains) would not match and would use the proxy | `google.com` (exact match) will match and will not use the proxy; **`www.google.com` (subdomains) will also match and will not use the proxy** |
| `no_proxy=.google.com` (leading dot) | `google.com` (exact match) and `www.google.com` (subdomains) would match and not use the proxy | **`google.com` (exact match) will not match and will use the proxy**; `www.google.com` (subdomains) will continue to match and not use the proxy |

I would argue that the first of these changes is absolutely correct (i.e. if this breaks someone, they need to fix their no_proxy) whereas the second one is a little more questionable. Doing some more research, it does appear that - while cURL follows these (newly implemented) rules - not all other implementations do. In other words, some implementations treat no_proxy=google.com and no_proxy=.google.com identically (match on exact strings sans dot, as well as substrings).

I can undo that second bit if you think it is safer. So: in this instance, should reqwest prefer better compatibility with the existing behavior, or copy cURL's behavior? (I'm torn)

abatkin · 2021-09-23T03:34:15Z

I should add that reqwest's no_proxy behavior already diverges from cURL: reqwest allows IP address (both v4 and v6) matches to specify a "prefix mask" (i.e. no_proxy=192.168.1.0/24 would match 192.168.1.17) whereas cURL only allows exact string matches. There are other implementations that do what reqwest does, and I'd say the behavior here is far superior compared to cURL.
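As an illustration of the "prefix mask" matching mentioned above, here is a minimal sketch using only the standard library. It is not reqwest's implementation; `ip_in_cidr` is a hypothetical helper name, and the parsing choices (for example, treating any malformed input as a non-match) are assumptions.

```rust
use std::net::IpAddr;

/// Hypothetical helper: returns true when `addr` falls inside the network
/// written as `cidr`, e.g. "192.168.1.0/24". Malformed input never matches.
fn ip_in_cidr(cidr: &str, addr: &str) -> bool {
    let (network, prefix_len) = match cidr.split_once('/') {
        Some(parts) => parts,
        None => return false,
    };
    let (network, addr, prefix_len) = match (
        network.parse::<IpAddr>(),
        addr.parse::<IpAddr>(),
        prefix_len.parse::<u32>(),
    ) {
        (Ok(n), Ok(a), Ok(p)) => (n, a, p),
        _ => return false,
    };
    match (network, addr) {
        (IpAddr::V4(n), IpAddr::V4(a)) if prefix_len <= 32 => {
            // Keep only the top `prefix_len` bits; a /0 prefix matches everything.
            let mask = if prefix_len == 0 { 0 } else { u32::MAX << (32 - prefix_len) };
            (u32::from(n) & mask) == (u32::from(a) & mask)
        }
        (IpAddr::V6(n), IpAddr::V6(a)) if prefix_len <= 128 => {
            let mask = if prefix_len == 0 { 0 } else { u128::MAX << (128 - prefix_len) };
            (u128::from(n) & mask) == (u128::from(a) & mask)
        }
        // Mixed address families or an out-of-range prefix length never match.
        _ => false,
    }
}
```

With this sketch, `ip_in_cidr("192.168.1.0/24", "192.168.1.17")` is true, while a hostname such as `example.com` fails to parse as an IP address and is simply not matched by this rule.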

abatkin · 2021-10-03T02:31:56Z

@seanmonstar Rebased against master, reverted the (incompatible) bit where NO_PROXY entries beginning with . no longer matched themselves, and added some docs to the NoProxy struct.

seanmonstar

> reverted the (incompatible) bit where NO_PROXY entries beginning with . no longer matched themselves

I kinda think we could make that change even though it's breaking, since it doesn't seem like someone would normally be using .foo.bar unless trying to get the exact behavior of curl. But we can also make that a follow-up... Either way, thanks for this! It's excellent!
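For reference, the behavior after the revert discussed above (a leading dot no longer excludes the domain itself) can be sketched as follows. This is an illustration based only on the comments in this thread, not the merged code; `matches_domain` is a hypothetical name.

```rust
/// Hypothetical helper: matching for a single NO_PROXY entry after the
/// revert, where `.google.com` and `google.com` behave identically.
fn matches_domain(entry: &str, host: &str) -> bool {
    let entry = entry.trim();
    if entry == "*" {
        // The wildcard entry matches every host.
        return true;
    }
    // A leading dot is ignored, so the entry also matches the bare domain.
    let domain = entry.strip_prefix('.').unwrap_or(entry);
    if domain.is_empty() {
        return false;
    }
    host == domain
        || host
            .strip_suffix(domain)
            .map_or(false, |prefix| prefix.ends_with('.'))
}
```

Making the stricter cURL rule a follow-up, as suggested, would amount to skipping the `host == domain` case when the entry starts with a dot.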

abatkin added 7 commits October 2, 2021 19:44

Handle NO_PROXY with spaces around items in list

88116aa

Leading dot in NO_PROXY should only match subdomains, not the domain itself

7da7a76

Follow cURL's rules for matching NO_PROXY entries

4e24f3f

This commit makes NO_PROXY entries without leading dots match subdomains (which they had not done previously). Additionally, the magic entry "*" matches all domains (effectively disabling use of the proxy).

rustfmt

31868ba

add some comments around noproxy tests

6ff1f42

add another no_proxy test to make sure no_proxy="," works

ec56161

one more no_proxy test

8fa4bb8

abatkin force-pushed the master branch from 2ac6ae5 to 8fa4bb8 Compare October 2, 2021 23:45

abatkin added 2 commits October 2, 2021 21:47

no_proxy entries exact match domains even if they start with .

a7d8c95

More complete docs on NoProxy

b56c9b8

seanmonstar approved these changes Oct 5, 2021

seanmonstar enabled auto-merge (squash) October 5, 2021 00:19

seanmonstar disabled auto-merge October 7, 2021 18:39

seanmonstar merged commit 203cd5b into seanmonstar:master Oct 7, 2021
