Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: support environment-defined proxy #8381

Closed
silverwind opened this issue Sep 2, 2016 · 84 comments
Closed

http: support environment-defined proxy #8381

silverwind opened this issue Sep 2, 2016 · 84 comments
Labels
feature request Issues that request new features to be added to Node.js. http Issues or PRs related to the http subsystem. https Issues or PRs related to the https subsystem.

Comments

@silverwind
Copy link
Contributor

silverwind commented Sep 2, 2016

To enable HTTP connectivity behind corporate firewalls, a number of tools and programming languages support HTTP/HTTPS proxies defined through environment variables like

HTTP_PROXY=http://proxy.com
HTTPS_PROXY=https://proxy.com
NO_PROXY="*.home.com,another.com"

Note that there seems to be no consensus on the case of these variables and all-lowercase variable names are also very common. My limited research suggest that at least the following languages automatically obtain and use a proxy from the environment:

  • Python
  • Go
  • Ruby
  • R

The request module also supports these variables, but I feel they show be respected by core http and https for best compatibilty.

@silverwind silverwind added http Issues or PRs related to the http subsystem. https Issues or PRs related to the https subsystem. feature request Issues that request new features to be added to Node.js. labels Sep 2, 2016
@martinheidegger
Copy link

While I do agree that this is a very common thing to want, I think it is important to point out that so are also many other things that request implements. I think it is pretty cool of Node to allow to bypass the environment variable, so If it is implemented I think it should be optional and for backwards compatibility reasons "off-by-default".

@silverwind
Copy link
Contributor Author

silverwind commented Sep 2, 2016

off-by-default

I don't think this is going to work here. Imagine a CLI tool spawning a child process of node. In that case, the user cannot reasonably provide a --flag to enable proxy environment support. I think support should be unconditionally enabled. If one wants to skip the proxy, they can always do export NO_PROXY="*" (or unset the variables).

@martinheidegger
Copy link

martinheidegger commented Sep 2, 2016

...cannot reasonably provide a --flag to enable proxy environment support...

I think this is a fallacy (don't ask me which) because the statement can be reversed to: The user cannot reasonably provide a --not-flag to disable the proxy environment support.

The person that writes the request:

http.request(url, {
  respectEnv: true
})

decides if this request respects the environment variable (new mode) or not (legacy).

I am pushing this because HTTP_PROXY environment variables and errors related to them are horrible to debug once you have an application running and just upgraded a node version.

In any case I would say that a default: respectEnv=false should be treated as semver:minor.
A default of respectEnv=true as semver:major.

@jasnell
Copy link
Member

jasnell commented Sep 2, 2016

Not saying this shouldn't be done, but there is a bit of a security risk inherent in this. Using an environment variable, it would be possible for a rogue module to set the environment variable discreetly causing traffic to be redirected through the proxy without the developers/users awareness.

@silverwind
Copy link
Contributor Author

@jasnell Pretty sure something similar could be achieved by monkey-patching http right now. In the end, you just have to trust your modules.

@jasnell
Copy link
Member

jasnell commented Sep 2, 2016

Yep, as I said, I didn't say it shouldn't be done ;-) If we do it tho, we need to make sure the risks are well documented.

@mscdex
Copy link
Contributor

mscdex commented Sep 2, 2016

-1 as I think this is something that is better handled in userland

@silverwind
Copy link
Contributor Author

silverwind commented Sep 2, 2016

This would depend on proxy support in the agent, something which #1490 looks to have been for, but was closed for some reason.

Also to quote @sindresorhus from sindresorhus/got#79:

I strongly believe this is something that should be a part of Node.js and not every module doing HTTP

@tjwebb
Copy link

tjwebb commented Dec 5, 2016

it would be possible for a rogue module to set the environment variable discreetly causing traffic to be redirected through the proxy

A Rogue module can monkey-patch all the methods in the http module if it wants to. Rogue modules are their own separate problem that I don't think we can solve here.

-1 as I think this is something that is better handled in userland

I don't think the structure of a network which is beyond my control qualifies as a userland problem. My OS deals with any proxy I may or may not need to connect through so that each application doesn't have to deal with this. Does this example analogy not hold for nodejs core? If not, do we still want to continue burdening every module using http with implementing this option?

@bnoordhuis
Copy link
Member

Related: #1490

To summarize: "Proxy support?" "Not saying no but..."

@matthewwiesen
Copy link

I would agree that ultimately this is something that should be handled directly within node.

Node should be read my environment variables to see that I am starting the node executable with my http_proxy environment variable and thus I want all http requests to go through my proxy.

curl, npm, git, etc, all respect these environment variables by default. This is the purpose of the environment. The user has already specified these within their environment, so the fact that they aren't being respected is very confusing.

It shouldn't be left up to a non-node-core library developer to decide to support my proxy environment by properly configuring node supported HTTP module with my environment variables or via a configuration to be passed into their library because ultimately I may not be directly using those modules, such as in the case of using a framework that includes a library that includes a library that interfaces with the HTTP module. Thus this "userland" solution essentially creates a recursive issue through all dependencies which is much harder to solve in all cases.

Since the node HTTP library is at the core of this, it should solve this issue by respecting the environment variables used at run time and override other attempts where libraries / modules may try to set these settings itself unless some other environment variable is passed to allow for this.

@bnoordhuis
Copy link
Member

@matthewwiesen The counterargument to your argument is that curl, npm and git are all end user programs, whereas node is a platform.

A better comparison is python: the builtin httplib doesn't respect http_proxy, that is left to user libraries. Python, unlike node, has a strong "batteries included" philosophy, so that is saying something.

Also: big slippery slope. Yes, curl respects http_proxy... but it also honors all_proxy, no_proxy with patterns and wildcards, and happily parses your .netrc with every connection. I don't think that has a place in core. Core is for mechanism, not policy.

@matthewwiesen
Copy link

Looking at the Python, it seems that the documentation does mention respecting http_proxy in regard to the core urllib.request module:

https://docs.python.org/3.4/library/urllib.request.html?highlight=http_proxy

In addition, if proxy settings are detected (for example, when a *_proxy environment variable like http_proxy is set), ProxyHandler is default installed and makes sure the requests are handled through the proxy.

Although, this page does mention to "users" that is recommended for a higher-level http client interface such as the Requests module, which seems to also support the environment variables:

http://docs.python-requests.org/en/master/user/advanced/#proxies

Looking at Go, it seems to be pretty similar, it seem to be supported within their Net HTTP Transport module:
https://golang.org/src/net/http/transport.go?h=http_proxy

Looking at Ruby, this seems to be supported as well:
https://github.com/ruby/ruby/blob/f845a9ef76c0195254ded79c85c24332534f4057/lib/net/http.rb#L638

Unless I'm mistaken on these sources.

@bnoordhuis
Copy link
Member

I picked httplib because it's the python counterpart to node's http module. urllib is an 'open any kind of url' toolkit - useful, but without a node equivalent.

requests is actually a good example of what I mean. It's the python sister to request. request supports proxies, tunnels, auth, etc; it's the go-to package for anyone making http requests.

I don't see a compelling reason to duplicate that functionality in core when a best-of-breed solution exists and is in widespread use.

@sam-github
Copy link
Contributor

node's focus is implementing HTTP mechanism well, not policies, to enable diversity and iterative improvement of user-facing HTTP client libraries like https://www.npmjs.com/package/fetch and https://www.npmjs.com/package/request. It happens that node core http is reasonably useable as-is, but if you hit its limitations, I suggest you use something else.

Since node's APIs cannot be changed/improved without causing great difficulties, because they can't be properly versioned, unlike userland modules, we should be very, very cautious in adding functionality. The "widely useful" test is not sufficient, it needs to be "doing it in userland causes real problems". node comes with npm included - its a great way to select the API and version of that API that you want to use that has the features you need.

@matthewwiesen
Copy link

So just to be clear, the proposed solution here is to:

  1. Review application frameworks/modules/libraries that one uses
  2. If the framework/module/library doesn't provide support for proxy environment settings, open an issue with the author.
  3. They become aware of proxy environments and then need to review their own code and dependencies to see what is causing the issue and why the proxy environment is not being respected.
  4. This leads to potentially more issues being opened as we travel to upstream modules/libraries to support this to downstream module/library authors.
  5. Keep going until we get to the root cause, which leads us to likely one of two scenarios:
    1. Proxy environment is supported, but is either not configured properly through dependency chain to support it back through all libraries to the end user.
    2. Proxy environment is not supported at all and we need to get everyone on board to support this.

This ultimately leads toward educating a significant sector of the Node community to HTTP proxy environments, when this is something that specific to how HTTP requests should be handled within a specific user environment.

Why should we go to every node module author to tell them that we can't use their framework/module/library because at the end of the day somewhere something is wraping the node HTTP request and that they specifically didn't support the HTTP Proxy environment because in all likelyhood they were not aware of this use case since probably over 95% of the node community doesn't have an environment like this. They likely aren't aware of running their code within a proxy environment and thus did not allow for or implement pass-thru support to allow for an end user to configure their code to ultimately configure Node HTTP to work properly in an environment where access to the external internet must go through a HTTP Proxy.

@bnoordhuis To address some of your comments from here:

I don't think proxy support in core is completely out of the question but it's a bit of a slippery slope.

Speaking from prior experience where proxy support was added to an existing product, you start with HTTP CONNECT but you end up supporting everything from SOCKSv5 to HTTP POST, basic/digest/and-so-on auth, Kerberos, NTLMv1 and v2, custom CA chains, client certificates, certificate fingerprinting, etc., etc.

There is no strict need to add it to core either because perfectly good user-land solutions exist.

I don't see how allowing HTTP proxy will suddenly open the flood gate to supporting all of those items you mention.

  • SOCKS5 operates at a lower level than HTTP, so why would this be supported?
  • NTLM/Kerberos/etc - These are their own protocols, so why would these be supported?

Regarding the SSL Certificates:

I will say that I think supporting internal CA certificates should also be supported since ultimately CA certificates themselves are already supported within Nodes HTTPS module here via the TLS module here here, which indicates that when the ca value is "omitted several well known "root" CAs (like VeriSign) will be used", which are included via openssl by default.

The same issue as the proxy environment occurs in this scenario where with custom internal Root CA certificates when there is not an easy way to inject the internal CA certificate chains at run time via an environment variable so that the TLS module will augment its default CA and append my internal CA certs to the existing chain it already supports. But what is required today is that we necessitate that all downstream modules play nice to allow for easily hooking into and passing these configuration options through the full chain so that my HTTPS requests to sites secured with an internal CA issued certificate will validate properly. Or there is always the insecure NODE_TLS_REJECT_UNAUTHORIZED=0 work around to just not validate my internal CA issued certificates if there may be no way for me configure/pass my Root CA certificate through the various dependency trees of my frameworks / libraries which may have not thought to support this since over 95% of the node community doesn't run their own Certificate Authority.

Finally, in regard to security with respect to the "rogue module" scenario, wouldn't implementing this feature in core prevent this?

If node HTTP was by itself setting the proxy settings via the user supplied environment variable and essentially rejecting configuration options passed to it when an environment variable is available, then there would be no way for a "rogue module" siting between the end users code and the Node HTTP module to manipulate this.

I agree that supporting these changes, does seem to be breaking with the current behavior of how things work today, but :

  1. Probably over 95% of users wouldn't be affected by this to begin with.
  2. This greatly simplifies HTTP proxy and internal CA chain management by allowing end users to directly configure / override current behaviors to actually support their environment by default.
  3. These changes could be rolled in over time to give people who run their node applications/scripts with a HTTP proxy environment variable set, would be presented a warning to indicate that a change in behavior of this will occur and inform that there is a way to revert this behavior via a separate environment variable (e.g. NODE_HTTP_PROXY_ENV_VARIABLE=0) or something of that nature.

Personally, I would think that if someone where starting their application/script with their HTTP proxy environment set, then it is not unreasonable to assume that this is what they want since they configured this. If they did not want to proxy their requests, they could always do what they do now, by not setting the HTTP proxy and directly configuring this within their various libraries / modules or directly via the node HTTP lib.

@silverwind
Copy link
Contributor Author

silverwind commented Dec 6, 2016

I will say that I think supporting internal CA certificates should also be supported

Support for custom CAs has landed in #9139. It's not released yet, but should be available in the next minor 7.x release, and it'll possibly be backported some point after that.

@sam-github
Copy link
Contributor

To be clear, your straw man procedure:

  1. Review application frameworks/modules/libraries that one uses
  2. If the framework/module/library doesn't provide support for proxy environment settings, open an issue with the author.
    ... etc, etc

is just that, a strawman. The actual procedure is:

  1. use request or fetch when making client requests
  2. if you find a module not doing that, ask them why they aren't
  3. if for some reason authors have a compelling reason to code directly at the HTTP layer using the node HTTP APIs... they are responsible for that choice, and thankfully, node allows them to do exactly that, its APIs allow quite flexible use of HTTP.

Also, I suggest that 95% of users are probably using request already, so are already unaffected by this.

What exact problem did you have that led you to think Node should do this in core?

Did you not know that node encourages the use of npm modules, wrote code using http, and your code didn't work, and that surprised you? If so, we can strengthen the docs on this.

Or did you find a third party module that was coding directly to the http API, and it didn't work? If so, we should still strengthen the docs, and an issue should be opened with that third party module.

@bnoordhuis
Copy link
Member

I don't see how allowing HTTP proxy will suddenly open the flood gate to supporting all of those items you mention.

That's simply how such things play out when a piece of software is popular. It's only a matter of time before someone requests them because "feature $x doesn't work for me because you don't support feature $y."

To answer your specific questions: SOCKS because people use it (request supports it although I don't know if it does DNS-over-SOCKS), NTLM and Kerberos because many proxies require authenticated connections.

@matthewwiesen
Copy link

@silverwind great to hear this is being included. Looking at #9139 this seems like good news on this front. Glad to see there is agreement there.

@sam-github The example that I am providing is that:

I would like to use a Node Web Application Framework, this application is dependent upon lots of modules. Those modules are dependent upon modules, which are dependent upon modules and ultimately we arrive at https://www.npmjs.com/package/got, which is were the we see the issue.

got appears to directly interface with HTTP, but doesn't appear to support HTTP Proxy itself and bills itself as stripped down version of request. However, got appears to be somewhat popular with 3.5 million downloads over the last month, but it appears that 645 other modules are use got as a direct dependency.

Now you say that request is depended upon 95% of the base, but npm shows it gets installed 18.5 million times per month and 19k modules are dependent upon it. While got appears to be in the minority here, it doesn't seem that insignificant either.

So what are my options here, go to the got authors and ask them to support HTTP Proxy?
Go to the dependencies that use got and tell them that they should switch to request to support HTTP Proxy?

If got ends up supporting HTTP proxy, do they do as I suggest and make this simple by looking for the environment variable (much in the same way that request does this or do they implement this in some other way to require passing configuration options through code so that I need to get modules that depend on this to support these new configuration options?

Great we solved this for got now, but what about the next module that breaks this convention?

You see how to essentially support HTTP proxy we are forcing this to be done on every module? Or the solution is just to not use that module?

I don't see how that is a solution at all.

I certainly don't see a straw man here as since I'm not misrepresenting or exaggerating the options here.

If anything the opposing argument I see here is that if we implement HTTP proxy then it opens the flood gate, which certainly an assumption and definitely not true.

@sam-github
Copy link
Contributor

Thanks @matthewwiesen , that helps a lot to give context.

And I'm still not in agreement, and somewhat baffled. got exists exactly because node core doesn't support:

following redirects, promises, streams, retries, automagically handling gzip/deflate and some convenience options. Created because request is bloated (several megabytes!).

Why is this one single feature (that is missing from got), proxy support, something that should be merged into node core, and the laundry lists of features above something that should not? Where would you and @sindresorhus draw the line? I draw the line at: must be implemented in the Node HTTP layer to enable user land to efficiently implement friendly APIs on top. By which categorization, proxy, redirects, etc. are out.

Particularly when the very existence of got is a reaction to request having "too many" features, you can see how one person's "bloat" is another's "essential", and how much node should strive to not have any unnecessary features, so it can never be bloated.

I'd say, the bare-bones nature of node's HTTP enables fetch/request/got to all have their own opinions on what features are bloat, and what are essential. If got thinks proxy support (or following redirects) is essential, then implement it.

got can get together with fetch (https://github.com/andris9/fetch/issues/23) and maybe even request, and share the same code, code which may already exist: sindresorhus/got#79 (comment)

@matthewwiesen
Copy link

matthewwiesen commented Dec 7, 2016

This brings me to my original request.

If a user has explicitly defined the HTTP proxy environment variable in the environment that the node processes is started, when would it ever be the desired behavior that this is not respected?

This is the main reason why I believe this should be embedded in core, so that this is enforced as per the users explicit request by setting the HTTP proxy environment variable within the shell to ensure that ALL HTTP requests made through node pass through this user's proxy since if this is not done undesirable effects will occur.

I believe this fits into the similar use case of why it is important to allow the user to define their own custom CA certificates within an environment variable and I agree with the point you made here.

Ultimately this is necessary in core to ensure that node HTTP behaves in a way that is conducive to the corporate/enterprise environment in a consistent/enforceable fashion, so that the HTTP Proxy is honored regardless of downstream module authors since they may not support this because they are not familiar with proxy environments which ultimately leads to their modules not behaving correctly within the corporate environment where a proxy is essentially standard. This will ultimately mean that requests they expect to function normally will be blocked because they are not routed through the HTTP Proxy as defined by the user's environment variables.

The ultimate desired effect from the user's perspective is that there exists a need for ALL HTTP requests honor the proxy environment since it is undesirable where when the HTTP Proxy environment variable is set and is not enforced as this leads to undesirable behaviors with the HTTP environment where requests will not function unless all downstream modules that interface with Node HTTP do so where the proxy environment is ultimately configured, which begs the question as to why would we not want this in core?

It is because of this which is why I propose the HTTP proxy to be included into core so that this is the default effect when a user supplies the HTTP proxy environment.

As for all other HTTP customization/configurations these are not relevant to core.

If the user did not want to enforce the HTTP Proxy in this proposed way, it would be reasonable that the user not define their HTTP Proxy environment variable and leave this unset.

Edit:
Fixed some spelling/typos being on mobile.

@sam-github
Copy link
Contributor

sam-github commented Dec 7, 2016

@nodejs/ctc needs more input. I'd be interested in knowing from the authors of request (@mikeal, @simov ), fetch (@andris9), and got (@sindresorhus).

I'm half convinced there should be a canonical github.com/nodejs/http-proxy or something of the like that can be used by down stream HTTP client APIs, still not so convinced that it should be in node core, would like to hear more from implementors.

@andris9
Copy link

andris9 commented Dec 7, 2016

Just to be sure, did you had node-fetch in mind instead of my fetch which is some pretty old code and not used so much?

I don't know a lot about HTTP proxies but I have added proxy support to Nodemailer for SMTP connections (docs here). Basically I added a new method getSocket that by default opens a new socket against provided host and port but can be overriden to provide an existing socket from somewhere else.

@cawoodm
Copy link

cawoodm commented Dec 4, 2019

So... does node-fetch support no_proxy or not?

@silverwind
Copy link
Contributor Author

silverwind commented Aug 26, 2020

I think http/https proxy support should be reconsidered for a future fetch API. I still run into the issue often and this issue alone has 71 votes currently showing there is a strong need for this from the community.

heylel-b-sh added a commit to datirium/scidap-satellite that referenced this issue Nov 6, 2020
Tries to read 'proxySettings' string from config.json. If it was
provided, sets it to --all-proxy parameter when running aria2c
and https_proxy environment variable when running satellite (node)
Refere to these links:
nodejs/node#8381
https://github.com/request/request#controlling-proxy-behaviour-using-environment-variables
tommoor added a commit to outline/outline that referenced this issue Apr 17, 2021
closes #1893

For some fun discussion on why this is required, see this issue: nodejs/node#8381
tommoor added a commit to outline/outline that referenced this issue May 13, 2021
* fix: Add server-side proxy support via fetch-with-proxy

closes #1893

For some fun discussion on why this is required, see this issue: nodejs/node#8381

* lint
@wang1212
Copy link

Because nodejs does not support http_proxy configuration proxy, users in some regions cannot use many third-party tool libraries that need to download remote resources, unless the tool library supports local loading of resources or provides parameter configuration, but obviously many tool libraries do not not support.

@bnoordhuis
Copy link
Member

FWIW, I've changed my mind on this. I feel it would be convenient to have proxy support in core (esp. because of fetch()) but I haven't thought too deeply about the details. Feature creep is still a real problem, too.

Having said all that, I suspect this is waiting for nodejs/undici#1650 before anything else can happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. http Issues or PRs related to the http subsystem. https Issues or PRs related to the https subsystem.
Projects
None yet
Development

No branches or pull requests