Support for http_proxy and https_proxy variables #94

Closed
ndmitchell opened this Issue Jan 13, 2015 · 19 comments

Projects

None yet

4 participants

@ndmitchell
Contributor

The standard is that the environment variables http_proxy and https_proxy respectively supply values for proxy for http/https connections. It would be good if that was easily configurable, without going as far as adding calls to getEnv/proxy explicitly.

@snoyberg
Owner

I'd imagine that setting an option when creating the manager as to whether to use these variables is the right way forward. I don't think respecting these variables as a default is a good call, but having it as an option does make sense.

@snoyberg snoyberg added a commit that referenced this issue Jan 14, 2015
@snoyberg Proxy environment variables #94 6e61db6
@snoyberg
Owner

@ndmitchell Can you check out the commit I just pushed? I'm not set up to easily test proxies right now, so I'd appreciate if you give this a shot.

@ndmitchell
Contributor

@snoyberg thanks for the quick response, I'll certainly give it a go, although I'm wondering how... I'm currently using http-conduit, as when I gave http-client a quick go it didn't transparently and easily download https files (it looked like I needed another package and that I needed to explicitly pick https, which seems overkill). For reference, what I'm really after is a function downloadFile, which I've written as:

import qualified Network.HTTP.Conduit as C
import qualified Data.Conduit as C

downloadFile :: FilePath -> String -> IO ()
downloadFile file url = withSocketsDo $ do
    request <- C.parseUrl url
    C.withManager $ \manager -> do
        response <- C.http request manager
        C.responseBody response C.$$+- sinkFile file

Is http-conduit deprecated in favor of http-client? If so, how do you write a similarly http/https agnostic function in http-client? Perhaps downloadFile would be a nice example - since once the file is downloaded you can always do a readFile if you absolutely must.

For reference, the original code is at: https://github.com/ndmitchell/hogle/blob/master/src/Web.hs

@creichert
Contributor

I'm not snoyberg but I can try to help.

http-conduit builds on top of http-client to support streaming. It also happens to use the extra package you mentioned, http-client-tls. All you really need to support http/https would be tlsManagerSettings. This is the same ManagerSettings http-conduit uses (e.g. conduitManagerSettings): https://github.com/snoyberg/http-client/blob/master/http-conduit/Network/HTTP/Conduit.hs#L283

import qualified Data.ByteString.Lazy as B
import qualified Network.HTTP.Client.TLS as H
import qualified Network.HTTP.Client as H


downloadFile :: FilePath -> String -> IO ()
downloadFile file url = do
    request <- H.parseUrl url
    H.withManager H.tlsManagerSettings $ \manager -> do
      response <- H.httpLbs request manager
      B.writeFile file $ H.responseBody response


main :: IO ()
main = downloadFile "DOWNLOAD" uri
  where
    uri = "http://github.com/snoyberg/http-client/blob/master/README.md"


Hope that helps!

(I fixed a doc type in #97 which may have been confusing the http-client-tls package name)

@creichert
Contributor

Also, worth mentioning my comment is unrelated to the new proxy env patch. I'm not exactly sure how that fits here as most of @snoyberg's changes are focused in http-client.

@snoyberg
Owner

@creichert is right. To clarify one specific point: http-conduit is not deprecated, it provides a specific higher-level wrapping around http-client, but other wrappings are possible as well. It's also possible to use the lower-level interface directly if desired. The code you have for downloading a file looks correct to me.

@ndmitchell
Contributor

I should note that curl and wget obey the http_proxy variables by default, so it wouldn't be an unreasonable default to have http-client do the same. As it is, I suspect as a proxy user I'm going to have to email each package author and ask "please use the proxy thingy" since it won't work unless they think to enable proxies, and no one will. On to the code:

  • My http_proxy was set to ukproxy1.uk.standardchartered.com:8080. Note the omission of a leading http:// which causes Manager.hs line 474 to raise an exception. I suggest you allow either http:// or missing.
  • After changing the variable, I get the error message FailedConnectionException2 "ukproxy1.uk.standardchartered.com" 8080 False getAddrInfo: does not exist (error 10093). I confirm I can ping that address. I tried substituting in the IP address, and that doesn't help. I moved to requesting http://example.com, and using defaultManagerSettings and an explicit proxy on the request, but all gave the same error (or with the IP address instead of the proxy name). I tried getting the same page using wget and it worked automatically. I don't think that's anything to do with this patch, but it's still a problem.
@snoyberg
Owner

I'm willing to reconsider setting the default to using the environment variables, but I need some time to think about it.

The getAddrInfo error message usually comes from a missing withSocketsDo. Just a sanity check: is that used in your example?

@ndmitchell
Contributor

Take your time, changing the defaults should never be done lightly.

I had missed withSocketsDo, with that it all works. A few months ago me and @fegu looked into the problem, and realised withSocketsDo is totally pointless, and could be made unnecessary with about 2 additional instructions in a not-very-hot path. I've pinged him so we can try and turn that into a patch and get it upstream.

@snoyberg
Owner

Cool, sounds good. Regarding the default behavior... I'll write an email to the cafe, probably next week. Since I'm about to get on a plane, sending the email now is just begging for pain. I'll leave this issue open until we decide on that and this code is released to Hackage.

@nh2
nh2 commented Jan 19, 2015

I should note that curl and wget obey the http_proxy variables by default

I would be careful to compare compare program binaries and libraries. That said, libcurl and Python's urllib seem to support the http_proxy as well, so http-client doing this wouldn't be unexpected.

@snoyberg snoyberg added a commit that referenced this issue Jan 21, 2015
@snoyberg Change default proxy settings #94
Use environment variables if available
dde918b
@snoyberg
Owner

I received only positive feedback regarding changing the default, so I've gone ahead with that. If others could do a sanity check of the current master, I'd appreciate it. I'll release in the next few days if there are no bug reports.

@ndmitchell
Contributor

I found two issues:

  • localhost:8000 is considered an invalid proxy because it doesn't have a leading http://. This should be added by default (like curl and wget do).
  • If the https_proxy variable is set to something bogus, and I only do http:// requests, I still get a failure (probably when constructing the manager). Should the parsing be lazy? Or should the manager fail then? Not sure if it matters either way, but something to be aware of.
@snoyberg
Owner

I like the current behavior regarding an invalid environment variable. I would rather find out reliably and early that there's a problem with the Manager, rather than waiting until I actually try to use it.

I just pushed a commit that (hackily) fixes the first point.

@ndmitchell
Contributor

Your fix is incorrect, and still doesn't work. The problem is that parseURI succeeds on localhost:80, so your mplus doesn't get a chance to be activated.

I fixed the code in pull request #100 - it took me two attempts, this is tricky code! I wonder if it needs pulling out as a function so it can be tested independently?

@snoyberg
Owner

Wow, that's a bit tedious. Thanks for fixing it.

@snoyberg
Owner

Oh, does that mean that I should release the code to Hackage?

@ndmitchell
Contributor

Yes, as far as I'm concerned it's good to go.

@snoyberg
Owner

Cool, thanks. Releasing now.

@snoyberg snoyberg closed this Jan 22, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment