Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: support environment-defined proxy #8381

Closed
silverwind opened this issue Sep 2, 2016 · 81 comments
Closed

http: support environment-defined proxy #8381

silverwind opened this issue Sep 2, 2016 · 81 comments

Comments

@silverwind
Copy link
Contributor

@silverwind silverwind commented Sep 2, 2016

To enable HTTP connectivity behind corporate firewalls, a number of tools and programming languages support HTTP/HTTPS proxies defined through environment variables like

HTTP_PROXY=http://proxy.com
HTTPS_PROXY=https://proxy.com
NO_PROXY="*.home.com,another.com"

Note that there seems to be no consensus on the case of these variables and all-lowercase variable names are also very common. My limited research suggest that at least the following languages automatically obtain and use a proxy from the environment:

  • Python
  • Go
  • Ruby
  • R

The request module also supports these variables, but I feel they show be respected by core http and https for best compatibilty.

@martinheidegger

This comment has been minimized.

Copy link

@martinheidegger martinheidegger commented Sep 2, 2016

While I do agree that this is a very common thing to want, I think it is important to point out that so are also many other things that request implements. I think it is pretty cool of Node to allow to bypass the environment variable, so If it is implemented I think it should be optional and for backwards compatibility reasons "off-by-default".

@silverwind

This comment has been minimized.

Copy link
Contributor Author

@silverwind silverwind commented Sep 2, 2016

off-by-default

I don't think this is going to work here. Imagine a CLI tool spawning a child process of node. In that case, the user cannot reasonably provide a --flag to enable proxy environment support. I think support should be unconditionally enabled. If one wants to skip the proxy, they can always do export NO_PROXY="*" (or unset the variables).

@martinheidegger

This comment has been minimized.

Copy link

@martinheidegger martinheidegger commented Sep 2, 2016

...cannot reasonably provide a --flag to enable proxy environment support...

I think this is a fallacy (don't ask me which) because the statement can be reversed to: The user cannot reasonably provide a --not-flag to disable the proxy environment support.

The person that writes the request:

http.request(url, {
  respectEnv: true
})

decides if this request respects the environment variable (new mode) or not (legacy).

I am pushing this because HTTP_PROXY environment variables and errors related to them are horrible to debug once you have an application running and just upgraded a node version.

In any case I would say that a default: respectEnv=false should be treated as semver:minor.
A default of respectEnv=true as semver:major.

@jasnell

This comment has been minimized.

Copy link
Member

@jasnell jasnell commented Sep 2, 2016

Not saying this shouldn't be done, but there is a bit of a security risk inherent in this. Using an environment variable, it would be possible for a rogue module to set the environment variable discreetly causing traffic to be redirected through the proxy without the developers/users awareness.

@silverwind

This comment has been minimized.

Copy link
Contributor Author

@silverwind silverwind commented Sep 2, 2016

@jasnell Pretty sure something similar could be achieved by monkey-patching http right now. In the end, you just have to trust your modules.

@jasnell

This comment has been minimized.

Copy link
Member

@jasnell jasnell commented Sep 2, 2016

Yep, as I said, I didn't say it shouldn't be done ;-) If we do it tho, we need to make sure the risks are well documented.

@mscdex

This comment has been minimized.

Copy link
Contributor

@mscdex mscdex commented Sep 2, 2016

-1 as I think this is something that is better handled in userland

@silverwind

This comment has been minimized.

Copy link
Contributor Author

@silverwind silverwind commented Sep 2, 2016

This would depend on proxy support in the agent, something which #1490 looks to have been for, but was closed for some reason.

Also to quote @sindresorhus from sindresorhus/got#79:

I strongly believe this is something that should be a part of Node.js and not every module doing HTTP

@tjwebb

This comment has been minimized.

Copy link

@tjwebb tjwebb commented Dec 5, 2016

it would be possible for a rogue module to set the environment variable discreetly causing traffic to be redirected through the proxy

A Rogue module can monkey-patch all the methods in the http module if it wants to. Rogue modules are their own separate problem that I don't think we can solve here.

-1 as I think this is something that is better handled in userland

I don't think the structure of a network which is beyond my control qualifies as a userland problem. My OS deals with any proxy I may or may not need to connect through so that each application doesn't have to deal with this. Does this example analogy not hold for nodejs core? If not, do we still want to continue burdening every module using http with implementing this option?

@bnoordhuis

This comment has been minimized.

Copy link
Member

@bnoordhuis bnoordhuis commented Dec 5, 2016

Related: #1490

To summarize: "Proxy support?" "Not saying no but..."

@matthewwiesen

This comment has been minimized.

Copy link

@matthewwiesen matthewwiesen commented Dec 5, 2016

I would agree that ultimately this is something that should be handled directly within node.

Node should be read my environment variables to see that I am starting the node executable with my http_proxy environment variable and thus I want all http requests to go through my proxy.

curl, npm, git, etc, all respect these environment variables by default. This is the purpose of the environment. The user has already specified these within their environment, so the fact that they aren't being respected is very confusing.

It shouldn't be left up to a non-node-core library developer to decide to support my proxy environment by properly configuring node supported HTTP module with my environment variables or via a configuration to be passed into their library because ultimately I may not be directly using those modules, such as in the case of using a framework that includes a library that includes a library that interfaces with the HTTP module. Thus this "userland" solution essentially creates a recursive issue through all dependencies which is much harder to solve in all cases.

Since the node HTTP library is at the core of this, it should solve this issue by respecting the environment variables used at run time and override other attempts where libraries / modules may try to set these settings itself unless some other environment variable is passed to allow for this.

@bnoordhuis

This comment has been minimized.

Copy link
Member

@bnoordhuis bnoordhuis commented Dec 6, 2016

@matthewwiesen The counterargument to your argument is that curl, npm and git are all end user programs, whereas node is a platform.

A better comparison is python: the builtin httplib doesn't respect http_proxy, that is left to user libraries. Python, unlike node, has a strong "batteries included" philosophy, so that is saying something.

Also: big slippery slope. Yes, curl respects http_proxy... but it also honors all_proxy, no_proxy with patterns and wildcards, and happily parses your .netrc with every connection. I don't think that has a place in core. Core is for mechanism, not policy.

@matthewwiesen

This comment has been minimized.

Copy link

@matthewwiesen matthewwiesen commented Dec 6, 2016

Looking at the Python, it seems that the documentation does mention respecting http_proxy in regard to the core urllib.request module:

https://docs.python.org/3.4/library/urllib.request.html?highlight=http_proxy

In addition, if proxy settings are detected (for example, when a *_proxy environment variable like http_proxy is set), ProxyHandler is default installed and makes sure the requests are handled through the proxy.

Although, this page does mention to "users" that is recommended for a higher-level http client interface such as the Requests module, which seems to also support the environment variables:

http://docs.python-requests.org/en/master/user/advanced/#proxies

Looking at Go, it seems to be pretty similar, it seem to be supported within their Net HTTP Transport module:
https://golang.org/src/net/http/transport.go?h=http_proxy

Looking at Ruby, this seems to be supported as well:
https://github.com/ruby/ruby/blob/f845a9ef76c0195254ded79c85c24332534f4057/lib/net/http.rb#L638

Unless I'm mistaken on these sources.

@bnoordhuis

This comment has been minimized.

Copy link
Member

@bnoordhuis bnoordhuis commented Dec 6, 2016

I picked httplib because it's the python counterpart to node's http module. urllib is an 'open any kind of url' toolkit - useful, but without a node equivalent.

requests is actually a good example of what I mean. It's the python sister to request. request supports proxies, tunnels, auth, etc; it's the go-to package for anyone making http requests.

I don't see a compelling reason to duplicate that functionality in core when a best-of-breed solution exists and is in widespread use.

@sam-github

This comment has been minimized.

Copy link
Member

@sam-github sam-github commented Dec 6, 2016

node's focus is implementing HTTP mechanism well, not policies, to enable diversity and iterative improvement of user-facing HTTP client libraries like https://www.npmjs.com/package/fetch and https://www.npmjs.com/package/request. It happens that node core http is reasonably useable as-is, but if you hit its limitations, I suggest you use something else.

Since node's APIs cannot be changed/improved without causing great difficulties, because they can't be properly versioned, unlike userland modules, we should be very, very cautious in adding functionality. The "widely useful" test is not sufficient, it needs to be "doing it in userland causes real problems". node comes with npm included - its a great way to select the API and version of that API that you want to use that has the features you need.

@matthewwiesen

This comment has been minimized.

Copy link

@matthewwiesen matthewwiesen commented Dec 6, 2016

So just to be clear, the proposed solution here is to:

  1. Review application frameworks/modules/libraries that one uses
  2. If the framework/module/library doesn't provide support for proxy environment settings, open an issue with the author.
  3. They become aware of proxy environments and then need to review their own code and dependencies to see what is causing the issue and why the proxy environment is not being respected.
  4. This leads to potentially more issues being opened as we travel to upstream modules/libraries to support this to downstream module/library authors.
  5. Keep going until we get to the root cause, which leads us to likely one of two scenarios:
    1. Proxy environment is supported, but is either not configured properly through dependency chain to support it back through all libraries to the end user.
    2. Proxy environment is not supported at all and we need to get everyone on board to support this.

This ultimately leads toward educating a significant sector of the Node community to HTTP proxy environments, when this is something that specific to how HTTP requests should be handled within a specific user environment.

Why should we go to every node module author to tell them that we can't use their framework/module/library because at the end of the day somewhere something is wraping the node HTTP request and that they specifically didn't support the HTTP Proxy environment because in all likelyhood they were not aware of this use case since probably over 95% of the node community doesn't have an environment like this. They likely aren't aware of running their code within a proxy environment and thus did not allow for or implement pass-thru support to allow for an end user to configure their code to ultimately configure Node HTTP to work properly in an environment where access to the external internet must go through a HTTP Proxy.

@bnoordhuis To address some of your comments from here:

I don't think proxy support in core is completely out of the question but it's a bit of a slippery slope.

Speaking from prior experience where proxy support was added to an existing product, you start with HTTP CONNECT but you end up supporting everything from SOCKSv5 to HTTP POST, basic/digest/and-so-on auth, Kerberos, NTLMv1 and v2, custom CA chains, client certificates, certificate fingerprinting, etc., etc.

There is no strict need to add it to core either because perfectly good user-land solutions exist.

I don't see how allowing HTTP proxy will suddenly open the flood gate to supporting all of those items you mention.

  • SOCKS5 operates at a lower level than HTTP, so why would this be supported?
  • NTLM/Kerberos/etc - These are their own protocols, so why would these be supported?

Regarding the SSL Certificates:

I will say that I think supporting internal CA certificates should also be supported since ultimately CA certificates themselves are already supported within Nodes HTTPS module here via the TLS module here here, which indicates that when the ca value is "omitted several well known "root" CAs (like VeriSign) will be used", which are included via openssl by default.

The same issue as the proxy environment occurs in this scenario where with custom internal Root CA certificates when there is not an easy way to inject the internal CA certificate chains at run time via an environment variable so that the TLS module will augment its default CA and append my internal CA certs to the existing chain it already supports. But what is required today is that we necessitate that all downstream modules play nice to allow for easily hooking into and passing these configuration options through the full chain so that my HTTPS requests to sites secured with an internal CA issued certificate will validate properly. Or there is always the insecure NODE_TLS_REJECT_UNAUTHORIZED=0 work around to just not validate my internal CA issued certificates if there may be no way for me configure/pass my Root CA certificate through the various dependency trees of my frameworks / libraries which may have not thought to support this since over 95% of the node community doesn't run their own Certificate Authority.

Finally, in regard to security with respect to the "rogue module" scenario, wouldn't implementing this feature in core prevent this?

If node HTTP was by itself setting the proxy settings via the user supplied environment variable and essentially rejecting configuration options passed to it when an environment variable is available, then there would be no way for a "rogue module" siting between the end users code and the Node HTTP module to manipulate this.

I agree that supporting these changes, does seem to be breaking with the current behavior of how things work today, but :

  1. Probably over 95% of users wouldn't be affected by this to begin with.
  2. This greatly simplifies HTTP proxy and internal CA chain management by allowing end users to directly configure / override current behaviors to actually support their environment by default.
  3. These changes could be rolled in over time to give people who run their node applications/scripts with a HTTP proxy environment variable set, would be presented a warning to indicate that a change in behavior of this will occur and inform that there is a way to revert this behavior via a separate environment variable (e.g. NODE_HTTP_PROXY_ENV_VARIABLE=0) or something of that nature.

Personally, I would think that if someone where starting their application/script with their HTTP proxy environment set, then it is not unreasonable to assume that this is what they want since they configured this. If they did not want to proxy their requests, they could always do what they do now, by not setting the HTTP proxy and directly configuring this within their various libraries / modules or directly via the node HTTP lib.

@silverwind

This comment has been minimized.

Copy link
Contributor Author

@silverwind silverwind commented Dec 6, 2016

I will say that I think supporting internal CA certificates should also be supported

Support for custom CAs has landed in #9139. It's not released yet, but should be available in the next minor 7.x release, and it'll possibly be backported some point after that.

@sam-github

This comment has been minimized.

Copy link
Member

@sam-github sam-github commented Dec 6, 2016

To be clear, your straw man procedure:

  1. Review application frameworks/modules/libraries that one uses
  2. If the framework/module/library doesn't provide support for proxy environment settings, open an issue with the author.
    ... etc, etc

is just that, a strawman. The actual procedure is:

  1. use request or fetch when making client requests
  2. if you find a module not doing that, ask them why they aren't
  3. if for some reason authors have a compelling reason to code directly at the HTTP layer using the node HTTP APIs... they are responsible for that choice, and thankfully, node allows them to do exactly that, its APIs allow quite flexible use of HTTP.

Also, I suggest that 95% of users are probably using request already, so are already unaffected by this.

What exact problem did you have that led you to think Node should do this in core?

Did you not know that node encourages the use of npm modules, wrote code using http, and your code didn't work, and that surprised you? If so, we can strengthen the docs on this.

Or did you find a third party module that was coding directly to the http API, and it didn't work? If so, we should still strengthen the docs, and an issue should be opened with that third party module.

@bnoordhuis

This comment has been minimized.

Copy link
Member

@bnoordhuis bnoordhuis commented Dec 6, 2016

I don't see how allowing HTTP proxy will suddenly open the flood gate to supporting all of those items you mention.

That's simply how such things play out when a piece of software is popular. It's only a matter of time before someone requests them because "feature $x doesn't work for me because you don't support feature $y."

To answer your specific questions: SOCKS because people use it (request supports it although I don't know if it does DNS-over-SOCKS), NTLM and Kerberos because many proxies require authenticated connections.

@matthewwiesen

This comment has been minimized.

Copy link

@matthewwiesen matthewwiesen commented Dec 6, 2016

@silverwind great to hear this is being included. Looking at #9139 this seems like good news on this front. Glad to see there is agreement there.

@sam-github The example that I am providing is that:

I would like to use a Node Web Application Framework, this application is dependent upon lots of modules. Those modules are dependent upon modules, which are dependent upon modules and ultimately we arrive at https://www.npmjs.com/package/got, which is were the we see the issue.

got appears to directly interface with HTTP, but doesn't appear to support HTTP Proxy itself and bills itself as stripped down version of request. However, got appears to be somewhat popular with 3.5 million downloads over the last month, but it appears that 645 other modules are use got as a direct dependency.

Now you say that request is depended upon 95% of the base, but npm shows it gets installed 18.5 million times per month and 19k modules are dependent upon it. While got appears to be in the minority here, it doesn't seem that insignificant either.

So what are my options here, go to the got authors and ask them to support HTTP Proxy?
Go to the dependencies that use got and tell them that they should switch to request to support HTTP Proxy?

If got ends up supporting HTTP proxy, do they do as I suggest and make this simple by looking for the environment variable (much in the same way that request does this or do they implement this in some other way to require passing configuration options through code so that I need to get modules that depend on this to support these new configuration options?

Great we solved this for got now, but what about the next module that breaks this convention?

You see how to essentially support HTTP proxy we are forcing this to be done on every module? Or the solution is just to not use that module?

I don't see how that is a solution at all.

I certainly don't see a straw man here as since I'm not misrepresenting or exaggerating the options here.

If anything the opposing argument I see here is that if we implement HTTP proxy then it opens the flood gate, which certainly an assumption and definitely not true.

@sam-github

This comment has been minimized.

Copy link
Member

@sam-github sam-github commented Dec 6, 2016

Thanks @matthewwiesen , that helps a lot to give context.

And I'm still not in agreement, and somewhat baffled. got exists exactly because node core doesn't support:

following redirects, promises, streams, retries, automagically handling gzip/deflate and some convenience options. Created because request is bloated (several megabytes!).

Why is this one single feature (that is missing from got), proxy support, something that should be merged into node core, and the laundry lists of features above something that should not? Where would you and @sindresorhus draw the line? I draw the line at: must be implemented in the Node HTTP layer to enable user land to efficiently implement friendly APIs on top. By which categorization, proxy, redirects, etc. are out.

Particularly when the very existence of got is a reaction to request having "too many" features, you can see how one person's "bloat" is another's "essential", and how much node should strive to not have any unnecessary features, so it can never be bloated.

I'd say, the bare-bones nature of node's HTTP enables fetch/request/got to all have their own opinions on what features are bloat, and what are essential. If got thinks proxy support (or following redirects) is essential, then implement it.

got can get together with fetch (https://github.com/andris9/fetch/issues/23) and maybe even request, and share the same code, code which may already exist: sindresorhus/got#79 (comment)

@matthewwiesen

This comment has been minimized.

Copy link

@matthewwiesen matthewwiesen commented Dec 7, 2016

This brings me to my original request.

If a user has explicitly defined the HTTP proxy environment variable in the environment that the node processes is started, when would it ever be the desired behavior that this is not respected?

This is the main reason why I believe this should be embedded in core, so that this is enforced as per the users explicit request by setting the HTTP proxy environment variable within the shell to ensure that ALL HTTP requests made through node pass through this user's proxy since if this is not done undesirable effects will occur.

I believe this fits into the similar use case of why it is important to allow the user to define their own custom CA certificates within an environment variable and I agree with the point you made here.

Ultimately this is necessary in core to ensure that node HTTP behaves in a way that is conducive to the corporate/enterprise environment in a consistent/enforceable fashion, so that the HTTP Proxy is honored regardless of downstream module authors since they may not support this because they are not familiar with proxy environments which ultimately leads to their modules not behaving correctly within the corporate environment where a proxy is essentially standard. This will ultimately mean that requests they expect to function normally will be blocked because they are not routed through the HTTP Proxy as defined by the user's environment variables.

The ultimate desired effect from the user's perspective is that there exists a need for ALL HTTP requests honor the proxy environment since it is undesirable where when the HTTP Proxy environment variable is set and is not enforced as this leads to undesirable behaviors with the HTTP environment where requests will not function unless all downstream modules that interface with Node HTTP do so where the proxy environment is ultimately configured, which begs the question as to why would we not want this in core?

It is because of this which is why I propose the HTTP proxy to be included into core so that this is the default effect when a user supplies the HTTP proxy environment.

As for all other HTTP customization/configurations these are not relevant to core.

If the user did not want to enforce the HTTP Proxy in this proposed way, it would be reasonable that the user not define their HTTP Proxy environment variable and leave this unset.

Edit:
Fixed some spelling/typos being on mobile.

@sam-github

This comment has been minimized.

Copy link
Member

@sam-github sam-github commented Dec 7, 2016

@nodejs/ctc needs more input. I'd be interested in knowing from the authors of request (@mikeal, @simov ), fetch (@andris9), and got (@sindresorhus).

I'm half convinced there should be a canonical github.com/nodejs/http-proxy or something of the like that can be used by down stream HTTP client APIs, still not so convinced that it should be in node core, would like to hear more from implementors.

@andris9

This comment has been minimized.

Copy link

@andris9 andris9 commented Dec 7, 2016

Just to be sure, did you had node-fetch in mind instead of my fetch which is some pretty old code and not used so much?

I don't know a lot about HTTP proxies but I have added proxy support to Nodemailer for SMTP connections (docs here). Basically I added a new method getSocket that by default opens a new socket against provided host and port but can be overriden to provide an existing socket from somewhere else.

@matthewwiesen

This comment has been minimized.

Copy link

@matthewwiesen matthewwiesen commented Aug 9, 2017

@silverwind Created #14705 to track that separately of this.

@ghost

This comment has been minimized.

Copy link

@ghost ghost commented Sep 20, 2017

any updates? :D

@martinheidegger

This comment has been minimized.

Copy link

@martinheidegger martinheidegger commented Sep 22, 2017

Since this issue has become hard to oversee, is it okay if I open another issue that lists #8381 (comment) as a step-by-step guide for possible contributions/contributors?

@bnoordhuis

This comment has been minimized.

Copy link
Member

@bnoordhuis bnoordhuis commented Sep 25, 2017

@martinheidegger Yes, good idea.

I'll close out this issue. As you say, it's become rather long and meandering.

@martinheidegger

This comment has been minimized.

Copy link

@martinheidegger martinheidegger commented Sep 26, 2017

Opened #15620 as a follow-up.

@dejayc

This comment has been minimized.

Copy link

@dejayc dejayc commented Feb 22, 2018

Not sure if anyone will see this, but I wanted to chime in that the Node ecosystem of HTTP modules has become rather obnoxious. Writing a React app from behind a proxy has become a real pain.

Many popular modules have hard-coded dependencies upon other HTTP modules, such as cross-fetch, isomorphic-fetch, axios, etc. Those modules, more often than not, do not support HTTP_PROXY at all, or if they do, don't support no_proxy, etc.

So, what to do in this case? Hit up every module author, and hope that somehow they "eventually" get around to fixing the issue? Install some temporary workaround that someone published to npm because the module has effectively become abandonware? And what to do when the workaround itself becomes abandonware?

I've spent several days chasing down red herrings and dead ends, trying to find modules that support isomorphic HTTP support for proxies. While yes, Node engineers will ultimately disclaim responsibility for "userland concerns" such as proxies, the result is that Node itself becomes a shining paradigm of austerity surrounded by steaming piles of module garbage. Congrats on that accomplishment!

@matthewwiesen

This comment has been minimized.

Copy link

@matthewwiesen matthewwiesen commented Feb 23, 2018

@martinheidegger

This comment has been minimized.

Copy link

@martinheidegger martinheidegger commented Feb 23, 2018

@dejayc + @matthewwiesen I understand your frustration and feel for your need to vent, but there is a path laid out for people wanting to contribute: #15620

@cztomsik

This comment has been minimized.

Copy link

@cztomsik cztomsik commented Jul 17, 2018

@dejayc If you have (or can somehow get) socks5 proxy, you can workaround this with tsocks unless you're on windows then it's much harder (I don't really recall but it was somehow possible with WideCap).

Anyway, it would be really great if there was a support in node.js for http_proxy env var or at least some globally installable module which does monkey-patch http (I don't know about any yet and I don't have time doing it so tsocks is fine for me now)

@AndyOGo

This comment has been minimized.

Copy link

@AndyOGo AndyOGo commented Aug 27, 2018

Why is this closed?
I still have proxy issues with node 10 on windows.

@dejayc

This comment has been minimized.

Copy link

@dejayc dejayc commented Sep 4, 2018

Because it's not Node's problem, from what I understand.

@martinheidegger

This comment has been minimized.

Copy link

@martinheidegger martinheidegger commented Sep 4, 2018

@AndyOGo @dejayc See #15620

@robophil

This comment has been minimized.

Copy link

@robophil robophil commented Apr 25, 2019

Would be a "nice to have". It's respected by request but there are a fair amount of other modules for making http requests out there 🤷‍♂️

@gajus

This comment has been minimized.

Copy link

@gajus gajus commented Apr 25, 2019

For Node.js v12 and above you can use https://github.com/gajus/global-agent.

For older version of Node.js you can use https://www.npmjs.com/package/global-tunnel-ng.

@cawoodm

This comment has been minimized.

Copy link

@cawoodm cawoodm commented Dec 4, 2019

So... does node-fetch support no_proxy or not?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.