New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for http_proxy and https_proxy variables #94

Closed
ndmitchell opened this Issue Jan 13, 2015 · 19 comments

Comments

Projects
None yet
4 participants
@ndmitchell
Contributor

ndmitchell commented Jan 13, 2015

The standard is that the environment variables http_proxy and https_proxy respectively supply values for proxy for http/https connections. It would be good if that was easily configurable, without going as far as adding calls to getEnv/proxy explicitly.

@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Jan 14, 2015

Owner

I'd imagine that setting an option when creating the manager as to whether to use these variables is the right way forward. I don't think respecting these variables as a default is a good call, but having it as an option does make sense.

Owner

snoyberg commented Jan 14, 2015

I'd imagine that setting an option when creating the manager as to whether to use these variables is the right way forward. I don't think respecting these variables as a default is a good call, but having it as an option does make sense.

snoyberg added a commit that referenced this issue Jan 14, 2015

@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Jan 14, 2015

Owner

@ndmitchell Can you check out the commit I just pushed? I'm not set up to easily test proxies right now, so I'd appreciate if you give this a shot.

Owner

snoyberg commented Jan 14, 2015

@ndmitchell Can you check out the commit I just pushed? I'm not set up to easily test proxies right now, so I'd appreciate if you give this a shot.

@ndmitchell

This comment has been minimized.

Show comment
Hide comment
@ndmitchell

ndmitchell Jan 14, 2015

Contributor

@snoyberg thanks for the quick response, I'll certainly give it a go, although I'm wondering how... I'm currently using http-conduit, as when I gave http-client a quick go it didn't transparently and easily download https files (it looked like I needed another package and that I needed to explicitly pick https, which seems overkill). For reference, what I'm really after is a function downloadFile, which I've written as:

import qualified Network.HTTP.Conduit as C
import qualified Data.Conduit as C

downloadFile :: FilePath -> String -> IO ()
downloadFile file url = withSocketsDo $ do
    request <- C.parseUrl url
    C.withManager $ \manager -> do
        response <- C.http request manager
        C.responseBody response C.$$+- sinkFile file

Is http-conduit deprecated in favor of http-client? If so, how do you write a similarly http/https agnostic function in http-client? Perhaps downloadFile would be a nice example - since once the file is downloaded you can always do a readFile if you absolutely must.

For reference, the original code is at: https://github.com/ndmitchell/hogle/blob/master/src/Web.hs

Contributor

ndmitchell commented Jan 14, 2015

@snoyberg thanks for the quick response, I'll certainly give it a go, although I'm wondering how... I'm currently using http-conduit, as when I gave http-client a quick go it didn't transparently and easily download https files (it looked like I needed another package and that I needed to explicitly pick https, which seems overkill). For reference, what I'm really after is a function downloadFile, which I've written as:

import qualified Network.HTTP.Conduit as C
import qualified Data.Conduit as C

downloadFile :: FilePath -> String -> IO ()
downloadFile file url = withSocketsDo $ do
    request <- C.parseUrl url
    C.withManager $ \manager -> do
        response <- C.http request manager
        C.responseBody response C.$$+- sinkFile file

Is http-conduit deprecated in favor of http-client? If so, how do you write a similarly http/https agnostic function in http-client? Perhaps downloadFile would be a nice example - since once the file is downloaded you can always do a readFile if you absolutely must.

For reference, the original code is at: https://github.com/ndmitchell/hogle/blob/master/src/Web.hs

@creichert

This comment has been minimized.

Show comment
Hide comment
@creichert

creichert Jan 15, 2015

Contributor

I'm not snoyberg but I can try to help.

http-conduit builds on top of http-client to support streaming. It also happens to use the extra package you mentioned, http-client-tls. All you really need to support http/https would be tlsManagerSettings. This is the same ManagerSettings http-conduit uses (e.g. conduitManagerSettings): https://github.com/snoyberg/http-client/blob/master/http-conduit/Network/HTTP/Conduit.hs#L283

import qualified Data.ByteString.Lazy as B
import qualified Network.HTTP.Client.TLS as H
import qualified Network.HTTP.Client as H


downloadFile :: FilePath -> String -> IO ()
downloadFile file url = do
    request <- H.parseUrl url
    H.withManager H.tlsManagerSettings $ \manager -> do
      response <- H.httpLbs request manager
      B.writeFile file $ H.responseBody response


main :: IO ()
main = downloadFile "DOWNLOAD" uri
  where
    uri = "http://github.com/snoyberg/http-client/blob/master/README.md"


Hope that helps!

(I fixed a doc type in #97 which may have been confusing the http-client-tls package name)

Contributor

creichert commented Jan 15, 2015

I'm not snoyberg but I can try to help.

http-conduit builds on top of http-client to support streaming. It also happens to use the extra package you mentioned, http-client-tls. All you really need to support http/https would be tlsManagerSettings. This is the same ManagerSettings http-conduit uses (e.g. conduitManagerSettings): https://github.com/snoyberg/http-client/blob/master/http-conduit/Network/HTTP/Conduit.hs#L283

import qualified Data.ByteString.Lazy as B
import qualified Network.HTTP.Client.TLS as H
import qualified Network.HTTP.Client as H


downloadFile :: FilePath -> String -> IO ()
downloadFile file url = do
    request <- H.parseUrl url
    H.withManager H.tlsManagerSettings $ \manager -> do
      response <- H.httpLbs request manager
      B.writeFile file $ H.responseBody response


main :: IO ()
main = downloadFile "DOWNLOAD" uri
  where
    uri = "http://github.com/snoyberg/http-client/blob/master/README.md"


Hope that helps!

(I fixed a doc type in #97 which may have been confusing the http-client-tls package name)

@creichert

This comment has been minimized.

Show comment
Hide comment
@creichert

creichert Jan 15, 2015

Contributor

Also, worth mentioning my comment is unrelated to the new proxy env patch. I'm not exactly sure how that fits here as most of @snoyberg's changes are focused in http-client.

Contributor

creichert commented Jan 15, 2015

Also, worth mentioning my comment is unrelated to the new proxy env patch. I'm not exactly sure how that fits here as most of @snoyberg's changes are focused in http-client.

@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Jan 15, 2015

Owner

@creichert is right. To clarify one specific point: http-conduit is not deprecated, it provides a specific higher-level wrapping around http-client, but other wrappings are possible as well. It's also possible to use the lower-level interface directly if desired. The code you have for downloading a file looks correct to me.

Owner

snoyberg commented Jan 15, 2015

@creichert is right. To clarify one specific point: http-conduit is not deprecated, it provides a specific higher-level wrapping around http-client, but other wrappings are possible as well. It's also possible to use the lower-level interface directly if desired. The code you have for downloading a file looks correct to me.

@ndmitchell

This comment has been minimized.

Show comment
Hide comment
@ndmitchell

ndmitchell Jan 15, 2015

Contributor

I should note that curl and wget obey the http_proxy variables by default, so it wouldn't be an unreasonable default to have http-client do the same. As it is, I suspect as a proxy user I'm going to have to email each package author and ask "please use the proxy thingy" since it won't work unless they think to enable proxies, and no one will. On to the code:

  • My http_proxy was set to ukproxy1.uk.standardchartered.com:8080. Note the omission of a leading http:// which causes Manager.hs line 474 to raise an exception. I suggest you allow either http:// or missing.
  • After changing the variable, I get the error message FailedConnectionException2 "ukproxy1.uk.standardchartered.com" 8080 False getAddrInfo: does not exist (error 10093). I confirm I can ping that address. I tried substituting in the IP address, and that doesn't help. I moved to requesting http://example.com, and using defaultManagerSettings and an explicit proxy on the request, but all gave the same error (or with the IP address instead of the proxy name). I tried getting the same page using wget and it worked automatically. I don't think that's anything to do with this patch, but it's still a problem.
Contributor

ndmitchell commented Jan 15, 2015

I should note that curl and wget obey the http_proxy variables by default, so it wouldn't be an unreasonable default to have http-client do the same. As it is, I suspect as a proxy user I'm going to have to email each package author and ask "please use the proxy thingy" since it won't work unless they think to enable proxies, and no one will. On to the code:

  • My http_proxy was set to ukproxy1.uk.standardchartered.com:8080. Note the omission of a leading http:// which causes Manager.hs line 474 to raise an exception. I suggest you allow either http:// or missing.
  • After changing the variable, I get the error message FailedConnectionException2 "ukproxy1.uk.standardchartered.com" 8080 False getAddrInfo: does not exist (error 10093). I confirm I can ping that address. I tried substituting in the IP address, and that doesn't help. I moved to requesting http://example.com, and using defaultManagerSettings and an explicit proxy on the request, but all gave the same error (or with the IP address instead of the proxy name). I tried getting the same page using wget and it worked automatically. I don't think that's anything to do with this patch, but it's still a problem.
@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Jan 16, 2015

Owner

I'm willing to reconsider setting the default to using the environment variables, but I need some time to think about it.

The getAddrInfo error message usually comes from a missing withSocketsDo. Just a sanity check: is that used in your example?

Owner

snoyberg commented Jan 16, 2015

I'm willing to reconsider setting the default to using the environment variables, but I need some time to think about it.

The getAddrInfo error message usually comes from a missing withSocketsDo. Just a sanity check: is that used in your example?

@ndmitchell

This comment has been minimized.

Show comment
Hide comment
@ndmitchell

ndmitchell Jan 16, 2015

Contributor

Take your time, changing the defaults should never be done lightly.

I had missed withSocketsDo, with that it all works. A few months ago me and @fegu looked into the problem, and realised withSocketsDo is totally pointless, and could be made unnecessary with about 2 additional instructions in a not-very-hot path. I've pinged him so we can try and turn that into a patch and get it upstream.

Contributor

ndmitchell commented Jan 16, 2015

Take your time, changing the defaults should never be done lightly.

I had missed withSocketsDo, with that it all works. A few months ago me and @fegu looked into the problem, and realised withSocketsDo is totally pointless, and could be made unnecessary with about 2 additional instructions in a not-very-hot path. I've pinged him so we can try and turn that into a patch and get it upstream.

@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Jan 16, 2015

Owner

Cool, sounds good. Regarding the default behavior... I'll write an email to the cafe, probably next week. Since I'm about to get on a plane, sending the email now is just begging for pain. I'll leave this issue open until we decide on that and this code is released to Hackage.

Owner

snoyberg commented Jan 16, 2015

Cool, sounds good. Regarding the default behavior... I'll write an email to the cafe, probably next week. Since I'm about to get on a plane, sending the email now is just begging for pain. I'll leave this issue open until we decide on that and this code is released to Hackage.

@nh2

This comment has been minimized.

Show comment
Hide comment
@nh2

nh2 Jan 19, 2015

I should note that curl and wget obey the http_proxy variables by default

I would be careful to compare compare program binaries and libraries. That said, libcurl and Python's urllib seem to support the http_proxy as well, so http-client doing this wouldn't be unexpected.

nh2 commented Jan 19, 2015

I should note that curl and wget obey the http_proxy variables by default

I would be careful to compare compare program binaries and libraries. That said, libcurl and Python's urllib seem to support the http_proxy as well, so http-client doing this wouldn't be unexpected.

snoyberg added a commit that referenced this issue Jan 21, 2015

Change default proxy settings #94
Use environment variables if available
@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Jan 21, 2015

Owner

I received only positive feedback regarding changing the default, so I've gone ahead with that. If others could do a sanity check of the current master, I'd appreciate it. I'll release in the next few days if there are no bug reports.

Owner

snoyberg commented Jan 21, 2015

I received only positive feedback regarding changing the default, so I've gone ahead with that. If others could do a sanity check of the current master, I'd appreciate it. I'll release in the next few days if there are no bug reports.

@ndmitchell

This comment has been minimized.

Show comment
Hide comment
@ndmitchell

ndmitchell Jan 22, 2015

Contributor

I found two issues:

  • localhost:8000 is considered an invalid proxy because it doesn't have a leading http://. This should be added by default (like curl and wget do).
  • If the https_proxy variable is set to something bogus, and I only do http:// requests, I still get a failure (probably when constructing the manager). Should the parsing be lazy? Or should the manager fail then? Not sure if it matters either way, but something to be aware of.
Contributor

ndmitchell commented Jan 22, 2015

I found two issues:

  • localhost:8000 is considered an invalid proxy because it doesn't have a leading http://. This should be added by default (like curl and wget do).
  • If the https_proxy variable is set to something bogus, and I only do http:// requests, I still get a failure (probably when constructing the manager). Should the parsing be lazy? Or should the manager fail then? Not sure if it matters either way, but something to be aware of.
@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Jan 22, 2015

Owner

I like the current behavior regarding an invalid environment variable. I would rather find out reliably and early that there's a problem with the Manager, rather than waiting until I actually try to use it.

I just pushed a commit that (hackily) fixes the first point.

Owner

snoyberg commented Jan 22, 2015

I like the current behavior regarding an invalid environment variable. I would rather find out reliably and early that there's a problem with the Manager, rather than waiting until I actually try to use it.

I just pushed a commit that (hackily) fixes the first point.

@ndmitchell

This comment has been minimized.

Show comment
Hide comment
@ndmitchell

ndmitchell Jan 22, 2015

Contributor

Your fix is incorrect, and still doesn't work. The problem is that parseURI succeeds on localhost:80, so your mplus doesn't get a chance to be activated.

I fixed the code in pull request #100 - it took me two attempts, this is tricky code! I wonder if it needs pulling out as a function so it can be tested independently?

Contributor

ndmitchell commented Jan 22, 2015

Your fix is incorrect, and still doesn't work. The problem is that parseURI succeeds on localhost:80, so your mplus doesn't get a chance to be activated.

I fixed the code in pull request #100 - it took me two attempts, this is tricky code! I wonder if it needs pulling out as a function so it can be tested independently?

snoyberg added a commit that referenced this issue Jan 22, 2015

@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Jan 22, 2015

Owner

Wow, that's a bit tedious. Thanks for fixing it.

Owner

snoyberg commented Jan 22, 2015

Wow, that's a bit tedious. Thanks for fixing it.

@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Jan 22, 2015

Owner

Oh, does that mean that I should release the code to Hackage?

Owner

snoyberg commented Jan 22, 2015

Oh, does that mean that I should release the code to Hackage?

@ndmitchell

This comment has been minimized.

Show comment
Hide comment
@ndmitchell

ndmitchell Jan 22, 2015

Contributor

Yes, as far as I'm concerned it's good to go.

Contributor

ndmitchell commented Jan 22, 2015

Yes, as far as I'm concerned it's good to go.

@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Jan 22, 2015

Owner

Cool, thanks. Releasing now.

Owner

snoyberg commented Jan 22, 2015

Cool, thanks. Releasing now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment