Network API #2189

Open
eed3si9n opened this Issue Aug 31, 2015 · 18 comments

@eed3si9n
Member

For sbt 1.0, we'd like to have a Network API that can download metadata and artifacts.

Some features to consider:

  • Out of the box proxy support
  • Caching based on URL
    • Potentially take advantage of Sonatype HTTP headers
  • Connection pooling that works with https and redirection (repo.scala-sbt.org redirects to Bintray, which redirects to some CDN)
  • https://github.com/AsyncHttpClient/async-http-client/ wrapper?
@jsuereth
Member

Also, think about resumable downloads (on failure). The aether connectors have example code for how to do that.
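
As a rough illustration of what resuming could look like at the HTTP level (this is not the aether connector code, just a minimal sketch using the standard `Range` and `If-Range` request headers), the client can ask for the bytes it is missing and check whether the server honored the request. The function names `resume_headers` and `should_append` are hypothetical.

```python
from typing import Dict, Optional


def resume_headers(bytes_on_disk: int, etag: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for resuming a partially downloaded file.

    bytes_on_disk is the size of the partial file we already have.
    etag, if known, is sent as If-Range so the server only returns a
    partial body when the remote file has not changed in the meantime.
    """
    if bytes_on_disk <= 0:
        return {}  # nothing to resume, do a plain GET
    headers = {"Range": f"bytes={bytes_on_disk}-"}
    if etag is not None:
        headers["If-Range"] = etag
    return headers


def should_append(status: int) -> bool:
    """Decide how to treat the response to a ranged GET.

    206 means the server sent only the missing bytes, so append them.
    200 means the range was ignored (or the file changed), so start over.
    """
    if status == 206:
        return True
    if status == 200:
        return False
    raise ValueError(f"unexpected status for ranged GET: {status}")
```

Whether a given repository honors ranged requests would need to be checked per server; an `Accept-Ranges: bytes` response header is the usual hint.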

@dwijnand
Member

"Smartness in terms of caching/Sonatype HTTP headers" what's meant by this? Google didn't come up with anything.

@eed3si9n
Member

"Smartness in terms of caching/Sonatype HTTP headers" what's meant by this?

This is an idea @jsuereth has been talking about for a while, but the gist is that since the downloader is specialized to sbt's usage, we want to add smart caching that's aware of what's mutable and what's immutable. Another hint we can use for caching is the ETag and the custom HTTP headers used by Sonatype and/or Bintray.

See for example:

$ curl -X HEAD -L -i https://oss.sonatype.org/content/repositories/snapshots/com/eed3si9n/treehugger_2.10/0.2.4-SNAPSHOT/maven-metadata.xml
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 273
Content-Type: application/xml
Date: Mon, 31 Aug 2015 20:30:27 GMT
ETag: "{SHA1{d08d5210d442b0aefd51379b3bcce0b94bfb2f68}}"
Last-Modified: Sun, 30 Aug 2015 23:03:42 GMT
Server: nginx
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Connection: keep-alive

treehugger 0.2.4-SNAPSHOT has to be from circa 2013, but the Last-Modified says 2015, so we can't trust it. If SHA1 is correct, we could use the ETag.
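
A minimal sketch of that check, assuming the ETag always has the `"{SHA1{...}}"` shape shown in the response above (other servers will use other formats, in which case this falls back to "not current"). The function names are hypothetical.

```python
import hashlib
import re
from typing import Optional

# Matches the ETag shape seen above: "{SHA1{<40 hex characters>}}"
_SHA1_ETAG = re.compile(r'^"?\{SHA1\{([0-9a-fA-F]{40})\}\}"?$')


def sha1_from_etag(etag: str) -> Optional[str]:
    """Extract the SHA-1 hex digest from a Sonatype-style ETag, if present."""
    match = _SHA1_ETAG.match(etag.strip())
    return match.group(1).lower() if match else None


def cached_copy_is_current(cached_bytes: bytes, etag: str) -> bool:
    """True if the locally cached bytes hash to the digest in the ETag.

    Returns False when the ETag is not in the expected format, so the
    caller re-downloads instead of trusting an unknown validator.
    """
    expected = sha1_from_etag(etag)
    if expected is None:
        return False
    return hashlib.sha1(cached_bytes).hexdigest() == expected
```

With something like this, a HEAD response would be enough to decide whether the cached file needs to be fetched again, provided the SHA1 in the header really is the digest of the body.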

@dwijnand
Member

Is there any concern about having async-http-client on the classpath? The same probably goes for Netty. I recently had to move away from it (I was using it via dispatch) in a Play library, as Play 2.3 and Play 2.4 used AHC 1.8 and 1.9, which are binary incompatible (yeah..), and it looks like AHC is working towards a 2.0, which I'm assuming will be incompatible too.

@dwijnand
Member

I meant concern WRT possible, known sbt plugins that already depend on AHC.

@eed3si9n
Member

That is a major concern. But if sbt shipped with a Network API, maybe all the plugins can just use that and the problem might be resolved. Is that too optimistic?
I feel like downloading bits is something where the community, or at least our plugin ecosystem, should just agree on one implementation and stick with it for a while, similar to JSON ASTs.

@eed3si9n
Member

@jroper, @richdougherty What's your opinion on this topic since you might be affected in more than one way? ▶️ 🌏

@dwijnand
Member
  1. I assumed sbt's Network API would be a wrapper of a client for the benefit of optimising for artifacts/metadata IO, so not general purpose.
  2. Even then, sometimes the dependencies aren't direct (eg. plugin uses a library for talking to GitHub's API, which uses AHC version 1.8).
@jroper
Member
jroper commented Sep 1, 2015

If using async-http-client, I'd want to shade it.

@dwijnand
Member
dwijnand commented Sep 1, 2015

Indeed I was thinking the same.

@dwijnand
Member
dwijnand commented Sep 5, 2015

Ugh, looks like Sonatype snapshots keeps updating the Maven metadata files, changing the lastUpdated field of the XML, so the Last-Modified and ETag are constantly changing even if no new publish was made..

<?xml version="1.0" encoding="UTF-8"?>
<metadata modelVersion="1.1.0">
  <groupId>com.eed3si9n</groupId>
  <artifactId>treehugger_2.10</artifactId>
  <version>0.2.4-SNAPSHOT</version>
  <versioning>
    <lastUpdated>20150902225502</lastUpdated>
  </versioning>
</metadata>
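
One possible workaround, sketched under the assumption that `lastUpdated` is the only field that churns without a real publish: fingerprint the metadata with that element stripped out, and compare fingerprints instead of relying on Last-Modified or ETag. The function name `stable_fingerprint` is hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def stable_fingerprint(metadata_xml: bytes) -> str:
    """SHA-1 of maven-metadata.xml with every <lastUpdated> element removed."""
    root = ET.fromstring(metadata_xml)
    # Snapshot the element list first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "lastUpdated":
                parent.remove(child)
    canonical = ET.tostring(root, encoding="utf-8")
    return hashlib.sha1(canonical).hexdigest()
```

Two downloads of the file above that differ only in `lastUpdated` would then produce the same fingerprint, while a change to any other element would not. This still requires a GET of the metadata, so it only avoids re-resolving, not the request itself.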
@dwijnand
Member
dwijnand commented Sep 5, 2015

I've explored HTTP response headers for the repos that we care about:

https://repo1.maven.org/maven2/
https://jcenter.bintray.com/

https://oss.sonatype.org/content/repositories/releases/
https://oss.sonatype.org/content/repositories/snapshots/

https://repo.typesafe.com/typesafe/releases/
https://repo.typesafe.com/typesafe/ivy-releases/

https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/

// A supported alternative
https://repo.typesafe.com/typesafe/maven-releases/

// These 3 don't actually exist, and can't because they redirect to Bintray which doesn't support snapshots
https://repo.typesafe.com/typesafe/snapshots/
https://repo.typesafe.com/typesafe/ivy-snapshots/
https://repo.scala-sbt.org/scalasbt/sbt-plugin-snapshots/

and recorded the responses and some notes on each: https://github.com/dwijnand/sbt-net/blob/master/notes.md

One of the surprising discoveries is that Last-Modified/If-Modified-Since doesn't work for the Typesafe repos, but does for the sbt plugin repo, despite both being redirects to Bintray. This holds even when using the underlying Bintray URL directly. For reference, the jCenter Bintray repo does support it.
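
For context, a conditional request boils down to echoing the cached validators back to the server and treating a 304 as "cache is still good". The sketch below only builds the headers and interprets the status code; it makes no network call, and the function names are hypothetical.

```python
from typing import Dict, Optional


def conditional_headers(last_modified: Optional[str] = None,
                        etag: Optional[str] = None) -> Dict[str, str]:
    """Build conditional GET headers from validators saved with the cached file."""
    headers: Dict[str, str] = {}
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    if etag is not None:
        headers["If-None-Match"] = etag
    return headers


def can_reuse_cache(status: int) -> bool:
    """304 Not Modified means the cached copy can be reused.

    A repo that ignores the validators simply answers 200 with a full
    body, so the client must be prepared to replace the cached copy.
    """
    return status == 304
```

A repo that does not support If-Modified-Since still works with this scheme; it just never produces the 304 shortcut, so every check costs a full download.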

@dwijnand
Member
dwijnand commented Sep 5, 2015

Please let me know if there are other repos I should care about.

Perhaps fetching from ~/.m2/repository/?

[Edit: Uhm.. I don't know what I was thinking.. there are no HTTP response headers when fetching from ~/.m2/repository/ -.-]

@eed3si9n
Member
eed3si9n commented Sep 6, 2015

Please let me know if there are other repos I should care about.

Local installations of Artifactory and Nexus, I guess.

@eed3si9n
Member
eed3si9n commented Oct 9, 2015

Some interesting facts from "sbtがpomやjarを解決する際の無駄なhttpアクセス" ("Wasteful HTTP accesses when sbt resolves poms and jars"), a Storify of @tkawachi's analysis of sbt traffic. The following is my translation of what he tweeted:

  • when .pom is found, the .pom.sha1 request goes back to the same server, but for .jar, it starts again from the first resolver.
  • HEAD calls can add latency (depending on where you live)
  • redirection from repo.typesafe.com to dl.bintray.com adds around 200ms. there's no Keep-Alive
  • when grabbing .pom.sha1, there are two HEAD calls, and then a GET -> 200.
  • for a pair of ivy.xml and ivy.xml.sha1 it took 2032ms. without redirection it would be 1104ms. furthermore, without HEAD, it would be 445ms.
  • resolvers are attempted sequentially from the first one. it might make sense to make Maven Central the first one instead of repo.typesafe.com or repo.scala-sbt.org
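
To put the ivy.xml numbers above in perspective, here is the arithmetic spelled out. It uses only the three timings quoted in the list; attributing each difference to a single cause is a simplification.

```python
# Timings quoted above for fetching ivy.xml plus ivy.xml.sha1, in milliseconds.
measured_ms = 2032                  # as observed, with redirects and HEAD calls
without_redirect_ms = 1104          # estimated without redirection
without_redirect_or_head_ms = 445   # estimated without redirection and HEAD

redirect_cost_ms = measured_ms - without_redirect_ms
head_cost_ms = without_redirect_ms - without_redirect_or_head_ms
saved_fraction = (measured_ms - without_redirect_or_head_ms) / measured_ms

print(redirect_cost_ms)              # 928
print(head_cost_ms)                  # 659
print(round(saved_fraction * 100))   # 78
```

So roughly 78% of the time for this pair of files goes to redirects and HEAD calls rather than to the GETs that actually transfer data.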
@jroper
Member
jroper commented Oct 12, 2015

That explains a lot.

@huntc
huntc commented Oct 18, 2015

Coming in late here, but presuming that the ivy libs we're using also rely on the blocking JDK based APIs to reach across the network, I'd imagine we'd be ok in doing so here too - at least in order to keep our dependencies to a minimum. I agree that our API here should be non-blocking and async of course.

@wsargent

Caching API is here -- it's set up for Play WS right now, but I can abstract it to be directly on async-http-client: https://github.com/playframework/play-ws-cache
