Network API #2189

Open
eed3si9n opened this Issue Aug 31, 2015 · 18 comments

Member

eed3si9n commented Aug 31, 2015

For sbt 1.0, we'd like to have a Network API that can download metadata and artifacts.

Some features to consider:

  • Out of the box proxy support
  • Caching based on URL
    • Potentially take advantage of Sonatype HTTP headers
  • Connection pooling that works with https and redirection (repo.scala-sbt.org redirects to Bintray, which redirects to some CDN)
  • https://github.com/AsyncHttpClient/async-http-client/ wrapper?

Member

jsuereth commented Aug 31, 2015

Also, think about resumable downloads (on failure). The aether connectors have example code for how to do that.
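
As a rough illustration of resuming after a failure (this is not taken from the Aether connectors; the function names are hypothetical), a client can ask for only the missing tail of a file with a standard HTTP `Range` header, and use `If-Range` so that a changed file is re-sent in full:

```python
from typing import Dict, Optional


def resume_headers(bytes_on_disk: int, etag: Optional[str] = None) -> Dict[str, str]:
    """Headers for continuing a partial download (hypothetical helper)."""
    if bytes_on_disk <= 0:
        # Nothing downloaded yet: a plain GET is enough.
        return {}
    headers = {"Range": f"bytes={bytes_on_disk}-"}
    if etag is not None:
        # If the file changed since the first attempt, the server ignores
        # the range and answers 200 with the whole body.
        headers["If-Range"] = etag
    return headers


def should_append(status: int) -> bool:
    """206 Partial Content: append to the partial file. Anything else: start over."""
    return status == 206
```

This only helps against servers that advertise `Accept-Ranges: bytes`; for others the download would have to restart from zero.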

Member

dwijnand commented Aug 31, 2015

"Smartness in terms of caching/Sonatype HTTP headers" what's meant by this? Google didn't come up with anything.

Member

eed3si9n commented Aug 31, 2015

"Smartness in terms of caching/Sonatype HTTP headers" what's meant by this?

This is an idea @jsuereth has been talking about for a while, but the gist is that since the downloader is specialized to sbt's usage, we want to add smart caching that's aware of what's mutable and what's immutable. Another hint we can use for caching is the ETag and custom HTTP headers used by Sonatype and/or Bintray.

See for example:

$ curl -X HEAD -L -i https://oss.sonatype.org/content/repositories/snapshots/com/eed3si9n/treehugger_2.10/0.2.4-SNAPSHOT/maven-metadata.xml
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 273
Content-Type: application/xml
Date: Mon, 31 Aug 2015 20:30:27 GMT
ETag: "{SHA1{d08d5210d442b0aefd51379b3bcce0b94bfb2f68}}"
Last-Modified: Sun, 30 Aug 2015 23:03:42 GMT
Server: nginx
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Connection: keep-alive

treehugger 0.2.4-SNAPSHOT has to be from circa 2013, but Last-Modified says 2015, so we can't trust it. If the SHA1 is correct, we could use the ETag instead.
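
A minimal sketch of how the ETag could be used, assuming (unverified, as noted above) that the value inside `{SHA1{...}}` really is the SHA-1 of the response body. The helper names are made up for illustration:

```python
import hashlib
import re
from typing import Optional

# Matches the ETag shape seen in the response above: "{SHA1{<40 hex chars>}}"
_SHA1_ETAG = re.compile(r'^"?\{SHA1\{([0-9a-fA-F]{40})\}\}"?$')


def sha1_from_etag(etag: str) -> Optional[str]:
    """Extract the hex digest from a Sonatype-style ETag, or None if it has another shape."""
    m = _SHA1_ETAG.match(etag.strip())
    return m.group(1).lower() if m else None


def cache_is_fresh(cached_bytes: bytes, etag: str) -> bool:
    """True if the locally cached file hashes to the digest advertised in the ETag."""
    expected = sha1_from_etag(etag)
    if expected is None:
        return False  # unknown ETag format: fall back to downloading
    return hashlib.sha1(cached_bytes).hexdigest() == expected
```

With this, a HEAD response would be enough to decide whether the cached copy can be kept, without trusting Last-Modified.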

Member

dwijnand commented Aug 31, 2015

Is there any concern about depending on async-http-client on the classpath? Probably the same goes for Netty. I recently had to move away from it (I was using it via dispatch) in a Play library, as Play 2.3 and Play 2.4 used AHC 1.8 and 1.9, which are binary incompatible (yeah..), and it looks like AHC is working towards a 2.0, which I'm assuming will be incompatible too.

Member

dwijnand commented Aug 31, 2015

I meant concern WRT possible, known sbt plugins that already depend on AHC.

Member

eed3si9n commented Aug 31, 2015

That is a major concern. But if sbt shipped with a Network API, maybe all the plugins can just use that and the problem might be resolved. Is that too optimistic?
I feel like downloading bits is something for which the community, or at least our plugin ecosystem, should just agree on one implementation and stick with it for a while, similar to JSON ASTs.

Member

eed3si9n commented Aug 31, 2015

@jroper, @richdougherty What's your opinion on this topic since you might be affected in more than one way? ▶️ 🌏

Member

dwijnand commented Aug 31, 2015

  1. I assumed sbt's Network API would be a wrapper of a client for the benefit of optimising for artifacts/metadata IO, so not general purpose.
  2. Even then, sometimes the dependencies aren't direct (e.g. a plugin uses a library for talking to GitHub's API, which uses AHC version 1.8).

Member

jroper commented Sep 1, 2015

If using async-http-client, I'd want to shade it.

Member

dwijnand commented Sep 1, 2015

Indeed I was thinking the same.

Member

dwijnand commented Sep 5, 2015

Ugh, looks like Sonatype snapshots keeps updating Maven metadata files, changing the lastUpdated field of the XML, so the Last-Modified and ETag are constantly changing even if no new publish was made..

<?xml version="1.0" encoding="UTF-8"?>
<metadata modelVersion="1.1.0">
  <groupId>com.eed3si9n</groupId>
  <artifactId>treehugger_2.10</artifactId>
  <version>0.2.4-SNAPSHOT</version>
  <versioning>
    <lastUpdated>20150902225502</lastUpdated>
  </versioning>
</metadata>
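
One possible workaround, sketched here as an assumption rather than anything agreed in this thread: compare metadata files by content with the `lastUpdated` element stripped, so that a rewrite which only bumps that timestamp is not treated as a new publish. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def metadata_fingerprint(xml_bytes: bytes) -> str:
    """Serialize maven-metadata.xml with every <lastUpdated> element removed."""
    root = ET.fromstring(xml_bytes)
    # Snapshot the element list first so removal does not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "lastUpdated":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Two downloads whose fingerprints are equal can be considered the same for resolution purposes. The file still has to be fetched, so this saves re-resolution work rather than the request itself.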

Member

dwijnand commented Sep 5, 2015

I've explored HTTP response headers for the repos that we care about:

https://repo1.maven.org/maven2/
https://jcenter.bintray.com/

https://oss.sonatype.org/content/repositories/releases/
https://oss.sonatype.org/content/repositories/snapshots/

https://repo.typesafe.com/typesafe/releases/
https://repo.typesafe.com/typesafe/ivy-releases/

https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/

// A supported alternative
https://repo.typesafe.com/typesafe/maven-releases/

// These 3 don't actually exist, and can't because they redirect to Bintray which doesn't support snapshots
https://repo.typesafe.com/typesafe/snapshots/
https://repo.typesafe.com/typesafe/ivy-snapshots/
https://repo.scala-sbt.org/scalasbt/sbt-plugin-snapshots/

and recorded the responses and some notes on each: https://github.com/dwijnand/sbt-net/blob/master/notes.md

One of the surprising discoveries is that Last-Modified/If-Modified-Since doesn't work for the Typesafe repos, but does for the sbt plugin repo, despite both being redirects to Bintray, and even when using the underlying Bintray URL directly. For reference, the jCenter Bintray repo does support it.
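
Since support differs per repository, a client would probably need a per-repository switch. A small sketch of building conditional request headers under that assumption (the `trust_last_modified` flag and the function names are hypothetical, not an existing sbt setting):

```python
from typing import Dict, Optional


def conditional_headers(etag: Optional[str],
                        last_modified: Optional[str],
                        trust_last_modified: bool) -> Dict[str, str]:
    """Validators to send along with a GET for something already in the cache."""
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    # Only send If-Modified-Since to repositories known to honour it.
    if last_modified and trust_last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def use_cached_copy(status: int) -> bool:
    """304 Not Modified means the cached file is still valid."""
    return status == 304
```

For a repository where If-Modified-Since is known not to work, the flag would be false and the client would rely on the ETag alone, or skip validation for immutable release artifacts altogether.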

Member

dwijnand commented Sep 5, 2015

Please let me know if there are other repos I should care about.

Perhaps fetching from ~/.m2/repository/?

[Edit: Uhm.. I don't know what I was thinking.. there are no HTTP response headers when fetching from ~/.m2/repository/ -.-]

Member

eed3si9n commented Sep 6, 2015

Please let me know if there are other repos I should care about.

Local installations of Artifactory and Nexus, I guess.

Member

eed3si9n commented Oct 9, 2015

Some interesting facts from "Wasteful HTTP accesses when sbt resolves POMs and JARs" (sbtがpomやjarを解決する際の無駄なhttpアクセス), a Storify of @tkawachi's analysis of sbt traffic. The following is my translation of what he tweeted:

  • when a .pom is found, the .pom.sha1 request goes back to the same server, but for the .jar, it tries again from the first resolver.
  • HEAD calls can add latency (depending on where you live)
  • redirection from repo.typesafe.com to dl.bintray.com adds around 200ms. There's no Keep-Alive
  • when grabbing .pom.sha1, there are two HEAD calls, and then a GET -> 200.
  • for a pair of ivy.xml and ivy.xml.sha1 it took 2032ms. Without redirection it would be 1104ms. Furthermore, without HEAD, it would be 445ms.
  • resolvers are attempted sequentially from the first one. It might make sense to put Maven Central first, ahead of repo.typesafe.com or repo.scala-sbt.org
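
To put the ivy.xml figures in proportion, here is a quick breakdown using only the three numbers reported above (the function name is just for illustration):

```python
def breakdown(total_ms: int, no_redirect_ms: int, no_head_ms: int) -> dict:
    """Split the measured time into redirect cost, HEAD cost and the remainder."""
    parts = {
        "redirects": total_ms - no_redirect_ms,
        "HEAD calls": no_redirect_ms - no_head_ms,
        "remaining": no_head_ms,
    }
    # Map each part to (milliseconds, rounded share of the total in percent).
    return {name: (ms, round(100 * ms / total_ms)) for name, ms in parts.items()}


for name, (ms, pct) in breakdown(2032, 1104, 445).items():
    print(f"{name}: {ms} ms ({pct}%)")
# redirects: 928 ms (46%)
# HEAD calls: 659 ms (32%)
# remaining: 445 ms (22%)
```

So by these figures, roughly three quarters of the time for that pair of files goes to redirects and HEAD requests.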

Member

jroper commented Oct 12, 2015

That explains a lot.

huntc commented Oct 18, 2015

Coming in late here, but presuming that the Ivy libs we're using also rely on the blocking JDK-based APIs to reach across the network, I'd imagine we'd be OK doing so here too, at least in order to keep our dependencies to a minimum. I agree that our API here should be non-blocking and async, of course.

Contributor

wsargent commented Mar 27, 2016

Caching API is here -- it's set up for Play WS right now, but I can abstract it to be directly on async-http-client: https://github.com/playframework/play-ws-cache
