Federated Amazon S3 support for IGV desktop, migration towards Java11, Gradle and better release engineering #620

Open
wants to merge 79 commits into
base: master
from

4 participants
@brainstorm

commented Jan 30, 2019

This pull request is a follow-up to one of my tweets and should fix issue #320. I am not a Java expert, so forgive my not-so-idiomatic code :-S

  • Adds AWS Cognito OAuth2/OIDC federated identity (with Google in our case, but any social login is supported).
  • Adds dynamic JTree S3 object loader, so that object listings are populated as needed, not all at once in a LoadDialog nor hardcoded statically as in the GA4GH prototype LoadDialog.
  • Corrects some OAuth2 implementation defects, e.g. the state parameter should be a random string to prevent CSRF.
  • Gradle pulls AWS dependencies instead of keeping them as binaries under /lib.
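
As an illustration of the state parameter point above, here is a minimal sketch of how a random, URL-safe state value could be generated and later checked against the value returned in the redirect. The class and method names (OAuthState, newState, matches) are made up for this example and are not taken from OAuthUtils.java.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper, not part of IGV: generates and verifies OAuth2 "state" values.
final class OAuthState {
    private static final SecureRandom RANDOM = new SecureRandom();

    private OAuthState() { }

    /** Returns a fresh, URL-safe random string to send as the "state" query parameter. */
    static String newState() {
        byte[] bytes = new byte[32]; // 256 random bits
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** Compares the state we sent with the one that came back in the redirect. */
    static boolean matches(String expected, String returned) {
        if (expected == null || returned == null) {
            return false;
        }
        // MessageDigest.isEqual avoids leaking how many leading characters matched.
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                returned.getBytes(StandardCharsets.UTF_8));
    }
}
```

The caller would keep the generated value for the duration of the login attempt and reject any redirect whose state does not match.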

TODO:

  • A bit more cleanup and refactoring (ActionListeners for UI are too complex/specific, factor out).
  • Ask @igvteam about their Gradle dependency management policy, integrate dependency pulling in .travis.yml. In other words: does it make sense to store the dep .jars under /lib instead of just declaring them on build.gradle?
  • Discuss with @igvteam and refactor oauth-config.json vs .properties logic in OAuthUtils.java fetchOauthProperties().
  • Make sure the pre-existing Google flow still works.
  • Write up a blog post about the whole authentication/Cognito setup and integration with IGV: https://umccr.org/blog/2019/02/16/igv-amazon/
  • Maybe ask @igvteam to fix up the dialog/forms that need JFormDesigner? (we don't have a license) :/
  • Migrate to Java11
  • Migrate lib/*.jar to Gradle implementation definitions.
  • Tighten up the (transitive) dependencies a bit
  • Merge into master! 🎉

Thanks @reisingerf for the last laps on the UI side of this PR and brilliant pair programming ;)
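
For readers unfamiliar with the "populate as needed" JTree idea from the description, a rough sketch follows. It is not the code in this PR: LazyTreeLoader, its placeholder child and the lister function (standing in for an S3 listing call) are all assumptions made for illustration, and the lister only receives the node name rather than a full key prefix.

```java
import java.util.List;
import java.util.function.Function;
import javax.swing.event.TreeExpansionEvent;
import javax.swing.event.TreeWillExpandListener;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.DefaultTreeModel;

// Hypothetical sketch: fills in a folder node's children the first time it is expanded.
final class LazyTreeLoader implements TreeWillExpandListener {
    private static final String PLACEHOLDER = "Loading...";

    private final DefaultTreeModel model;
    private final Function<String, List<String>> lister; // assumed: folder name -> child names

    LazyTreeLoader(DefaultTreeModel model, Function<String, List<String>> lister) {
        this.model = model;
        this.lister = lister;
    }

    /** Creates a folder node with a dummy child so that Swing shows an expand handle. */
    static DefaultMutableTreeNode folder(String name) {
        DefaultMutableTreeNode node = new DefaultMutableTreeNode(name);
        node.add(new DefaultMutableTreeNode(PLACEHOLDER));
        return node;
    }

    /** Replaces the placeholder with real children; does nothing if already loaded. */
    void load(DefaultMutableTreeNode node) {
        boolean unloaded = node.getChildCount() == 1
                && PLACEHOLDER.equals(((DefaultMutableTreeNode) node.getFirstChild()).getUserObject());
        if (!unloaded) {
            return;
        }
        node.removeAllChildren();
        for (String child : lister.apply(String.valueOf(node.getUserObject()))) {
            // Names ending in "/" are treated as folders, mirroring S3 common prefixes.
            node.add(child.endsWith("/") ? folder(child) : new DefaultMutableTreeNode(child));
        }
        model.nodeStructureChanged(node);
    }

    @Override
    public void treeWillExpand(TreeExpansionEvent event) {
        load((DefaultMutableTreeNode) event.getPath().getLastPathComponent());
    }

    @Override
    public void treeWillCollapse(TreeExpansionEvent event) {
        // Nothing to do on collapse.
    }
}
```

It would be attached with tree.addTreeWillExpandListener(new LazyTreeLoader(model, lister)). A real loader would probably run the listing off the event dispatch thread so the UI stays responsive.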

brainstorm added some commits Jan 13, 2019

Working AWS Cognito+Google federated example. Now we should parse the returned JWT tokens accordingly to get access to i.e S3 buckets
Bump htsjdk version to make sure all S3 bugs are addressed. Now it generates pre-signed S3 urls but the .bai detection is failing
Polish gradle dependencies, we do not want to pull all of AWS jdk. Remove the introduced and redundant HTTP class from Cognito example code.
Kill custom getAuthPage function and fix scope for AWS while keeping it compatible with Google. Adding event so that UI reacts with an AWS S3 dataset selection box as soon as we are authenticated

@brainstorm brainstorm changed the title Federated Amazon/AWS S3 support for IGV desktop Federated Amazon S3 support for IGV desktop Jan 30, 2019

brainstorm added some commits Jan 30, 2019

Remove stray AWS test constant. Run gradle build from .travis.yml to fetch/cache AWS dependencies defined in main build.gradle file
Java8 seems to build fine, apply the same dependencies/repository directives to Java11 gradle file. Remove sudo privs since it might invalidate caches for gradle according to: https://stackoverflow.com/a/27365925/457116
Drop htsjdk from Gradle, @igvteam seems to prefer .jars under /lib instead, one for discussion. Bump AWS lib versions and possible fix for https://travis-ci.org/igvteam/igv/builds/486242952
Factor out a bit Amazon/Google-specific bits from Oauth via getters/setters. Disable travisci for Java11 for now
Refactor the actionListener for the dynamic S3 JTree loader Swing dialog. Also assign filename to the locator
Narrowed outstanding XXX (TODOs) to just ResourceLocator ones. Catch permission error exceptions and present them to user
Fix more Oauth2 profile defects. Add logged in message in statusbar. Now the saveRefreshToken() is not multi-provider aware and the logic should be refactored a bit more.
Need to rethink/refactor the whole token refresh logic. Disabling cross-session refresh token saving logic for now
@brainstorm

Author

commented Feb 4, 2019

@igvteam @jrobinso, first, thanks a ton for considering this PR!

Practical question, to reach the merge-to-master goal itemized in the PR description faster: would it make sense to remove some of the .jars you keep under lib/ and put the equivalent Maven references in build.gradle? It would remove binary fat from the repo. What are your thoughts and internal discussions on that point?

@jrobinso

Contributor

commented Feb 4, 2019

@brainstorm We don't have time to take that on right now. Perhaps in a future refactoring when I can devote some time. The amount of disk space taken is infinitesimal by modern standards, no motivation to devote time to this.

When you think you are done developing (I see regular pushes) and this is ready for review let me know. It looks exciting, I'd like to get it in. Try to make it as minimally disruptive as possible, perhaps opening other git issues for non-essential improvements. I'm juggling many projects now so review time is precious, and anything that requires a lot of re-testing of unrelated areas will make it difficult. Thanks!

@brainstorm

Author

commented Feb 4, 2019

@jrobinso Gotcha, I'll leave the Gradle stuff as-is then, without removing the jars, good points.

I think that in the interest of getting this merged fast, I'll fix the oauth-config.json vs .properties logic. That bit and the automatic .vcf/.bam indexes detection are the two main issues preventing merge to master right now, imho.
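
To make the index detection point concrete, this is roughly the kind of helper I have in mind. It is only a sketch based on common naming conventions (.bam.bai, .cram.crai, .tbi next to bgzipped files); IndexGuesser and guessIndex are hypothetical names, and it ignores complications such as query strings in pre-signed URLs or tools that write sample.bai instead of sample.bam.bai.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: guesses where the index of a data file conventionally lives.
final class IndexGuesser {
    private IndexGuesser() { }

    /** Returns the conventional index location for the given path, if one is known. */
    static Optional<String> guessIndex(String dataPath) {
        String lower = dataPath.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".bam")) {
            return Optional.of(dataPath + ".bai");
        }
        if (lower.endsWith(".cram")) {
            return Optional.of(dataPath + ".crai");
        }
        if (lower.endsWith(".vcf.gz") || lower.endsWith(".bed.gz")) {
            return Optional.of(dataPath + ".tbi"); // tabix index
        }
        return Optional.empty(); // unknown type: let the caller decide
    }
}
```

Returning an empty Optional for unknown types keeps the guess explicit, so callers can fall back to whatever the existing loading logic does.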

I will try to expedite those changes this week and call it done over here once I've tested that I didn't break anything else.

Thanks for your patience and interest!

@jrobinso

Contributor

commented Feb 4, 2019

OK, sounds good. We can tackle the jars and other issues in future work.

Fix remaining outstanding XXX from the resourcelocator and manually auto-detect types (since the existing logic does not do that). Add Oauth login status on status bar. Thanks @reisingerf for the XP session ;)
@jrobinso

Contributor

commented Feb 6, 2019

@brainstorm It looks like there is a compile error (travis build). I don't have time currently to dig into it, could you check the travis output?

@brainstorm

Author

commented Feb 6, 2019

@jrobinso No worries, working on it right now! As I mentioned I will @-mention you when it's ready-ready for good, tackling the fetchOauthProperties() now ;)

@jrobinso

Contributor

commented Feb 6, 2019

Ahh, ok, I saw a comment and interpreted it as "ready". No rush, I couldn't look at it right away even if it was ready.

brainstorm added some commits Feb 6, 2019

Correct defects on oauth-config.json handling of optional vs mandatory parameters. Add helper method to guess (some) index files location
Clarify current OAuth config parsing logic, does not support multi-provider, would require a provider manager class and refactoring of Oauth, not doing that in this PR
@davideby

Contributor

commented Mar 29, 2019

@brainstorm I've been taking a closer look at the branch at https://github.com/umccr/igv/commits/umccr . Can you give us a tighter list of dependencies? Obviously some new things will be necessary for the AWS integration but this is pulling in like 50 new jars.

One thing I know for sure about @jrobinso is he hates pulling in new dependencies unless they're absolutely necessary.

We just went through a round of cutting out unused stuff (probably more to be done there) so we don't want to re-grow it if it can be avoided.

brainstorm added some commits Mar 4, 2019

Ignoring Goby testcase for now. Test-data needs to be regenerated for org.broad.igv.goby.GobyAlignmentQueryReaderTest: 'This alignment requires upgrading, please download goby 2 and use its concatenate tool to upgrade.'

Small gradle cleanup to accommodate the 'application' plugin, potentially getting rid of custom deploy code. Fix S3 commonprefix. S3 presigned URLs not working yet
AWS presigned urls working for V2 AWS JAVA SDK with session-bound expiration times. Gradle target for Mac fixed. Thanks @reisingerf. Fix various Travisci deploy issues.

Deploy IGV release with Java11

Deploy with new/validated travisci config: travis-ci/travis-ci#8289 (comment)

Explicitly excluding old htsjdk:2.8.1: java.lang.module.FindException: Two versions of module htsjdk found in ./Contents/MacOS/../Java/lib (htsjdk-2.18.2-7-g3c48018-SNAPSHOT.jar and htsjdk-2.8.1.jar)

Correct indentation, only a Draft release was generated

Revert to known-working .travisci deploy syntax

Disable phony GROOVY warnings on travisci [ci skip]
Exclude jsap and htsjdk without version since the latter is under /lib shipped as jar, for now... until htsjdk releases a new one
HTSJDK 2.19.0 released, solving the CSI read issues/tests. Deprecate Java8 and fix TravisCI deployment.

Downgrade slf4j since 'java.lang.ClassNotFoundException: org.slf4j.LoggerFactory', thanks @wandergeek for reporting

Simplify deployment artifact naming, users were confused about which zip to download. Thanks @wandergeek @reisingerf

Deploy the right target for Mac, deploy Linux zip release too

Deprecate Java8 gradle build, relocate resources to more sensible paths. Preferences.tab not being read through the IDEA editor but works fine through ./gradlew run and builds? Optimize imports, move about.properties to more standard app-level, default .properties. [ci skip]

Remove remaining java8 cruft and leftovers
Attempt to cut down on dependencies required/pulled by AWS SDK, following @davideby demand. I.e: managed to drop math2 over math3, but modern release engineering tools like ProGuard should really be the tools managing the (transitive) dependency pruning efficiently/effortlessly, not error-prone humans like me ;( [ci skip]
org.apache.log4j deprecation in favour of org.apache.logging.log4j (newer), caught this leftover test error

@brainstorm brainstorm changed the title Federated Amazon S3 support for IGV desktop Federated Amazon S3 support for IGV desktop, migration towards Java11, Gradle and better release engineering Apr 1, 2019

@brainstorm

Author

commented Apr 1, 2019

@davideby @jrobinso, agree with you, keeping as few jars as possible #ftw. As you can see, I just merged the (rebased) umccr branch into aws_support, so ready to merge after the (minor) tweaks mentioned by @davideby earlier.

As you can see in this changeset, I stretched myself to trim those jars/deps, but they are indeed required, otherwise the AWS functionality itself breaks. So I boiled it down to the minimal subset of absolutely required modules, hope it helps:

        // Amazon (excluded) deps
        exclude group: 'software.amazon', module: 'flow'
        exclude group: 'software.amazon.awssdk', module: 'annotations'
        //exclude group: 'software.amazon.awssdk', module: 'aws-xml-protocol'
        //exclude group: 'software.amazon.awssdk', module: 'http-client-spi'
        //exclude group: 'software.amazon.awssdk', module: 'apache-client'
        //exclude group: 'software.amazon.awssdk', module: 'utils'
        //exclude group: 'software.amazon.awssdk', module: 'annotations'
        //exclude group: 'com.fasterxml.jackson.core', module: 'jackson-core'
        //exclude group: 'com.fasterxml.jackson.core', module: 'jackson-annotations'
        //exclude group: 'com.fasterxml.jackson.core', module: 'jackson-databind'
        //exclude group: 'org.reactivestreams', module: 'reactive-streams'
        //exclude group: 'com.typesafe.netty', module: 'netty-reactive-streams'
        //exclude group: 'com.typesafe.netty', module: 'netty-reactive-streams-http'

(...)

        // Amazon deps
        [group: 'software.amazon.awssdk', name: 'http-client-spi', version: '2.5.0'],
        [group: 'software.amazon.awssdk', name: 'cognitoidentity', version: '2.5.0'],
        [group: 'software.amazon.awssdk', name: 's3', version: '2.5.0']

I also trimmed a few other leftovers like math2 (in favor of math3) and the old log4j in favor of the more modern counterpart, among others.

That being said, better release engineering tools like ProGuard & co should do a better job than me as a human, so I'll leave this bit to the experts. In any case, I hope my contributions have set the stage for you guys to move towards a good future CI/CD setup with more automation, so you can have more time for the fun stuff ;)

Take care @jrobinso and @davideby, thanks for the patience and support in this work.

Cheers and thanks for all the fish! ;)

/cc @ohofmann @reisingerf @wandergeek

@jrobinso

Contributor

commented Apr 1, 2019

Hey @brainstorm @davideby

Sorry to jump in the middle here, and this is probably a dumb question, but could I get just a flat list (in text) of new dependencies? Yes I'm being lazy.

RE log4J, IGV uses that extensively and I have no plans to change that at this time unless it is breaking something. Let's keep this focused on the new AWS functionality. There are 578 files changed, there's no way I can review 578 files. I realize much of that is probably library jars added/removed, but it's hard to believe that adding OAuth and some enhanced S3 support requires this magnitude of change.

@brainstorm

Author

commented Apr 1, 2019

@jrobinso, as I mentioned, most of those files changed are due to the required/asked for Java11 migration (i.e global search and replace for log4j.Logger to log4j.LogManager).

On log4j, it works fine, the only change is from org.apache.log4j (old Java8 import IIRC) to org.apache.logging.log4j (new Java11 namespace?). No big deal, just changed imports really.
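
To show how mechanical that change is, here is a small sketch of the kind of search and replace involved. As far as I know, org.apache.log4j is the Log4j 1.x API and org.apache.logging.log4j is the Log4j 2.x API, where loggers are obtained through LogManager.getLogger. The Log4jImportRewriter class is a made-up illustration, not the tool that was actually used on the IGV sources.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the mechanical Log4j 1.x -> 2.x source rewrite.
final class Log4jImportRewriter {
    private static final String OLD_IMPORT = "import org.apache.log4j.Logger;";
    private static final String NEW_IMPORTS =
            "import org.apache.logging.log4j.LogManager;\n"
          + "import org.apache.logging.log4j.Logger;";
    // Matches calls such as Logger.getLogger(Foo.class), but not LogManager.getLogger(...).
    private static final Pattern OLD_FACTORY = Pattern.compile("\\bLogger\\.getLogger\\(");

    private Log4jImportRewriter() { }

    /** Rewrites one source file's text; files without the old import are left untouched. */
    static String rewrite(String source) {
        if (!source.contains(OLD_IMPORT)) {
            return source; // e.g. java.util.logging users must not be rewritten
        }
        String withImports = source.replace(OLD_IMPORT, NEW_IMPORTS);
        return OLD_FACTORY.matcher(withImports).replaceAll("LogManager.getLogger(");
    }
}
```

Each touched file ends up with one extra import line and a changed factory call, which is why the file count is large even though the per-file diff is tiny.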

Of the ~3435 files that IGV has (via find . | wc -l), 578 files touched mostly by global search and replace (often single-line changes or just deleted .jars) shouldn't be a big deal to quickly review, I hope :/

I was under the impression that @davideby was comfortable with those changes before merging it in this branch. Again, if it's overwhelming, I can point out the main changes, but Java11 migration really did require this level of heavy lifting, sorry about that :/

Here's a ./gradlew dependencies dump for your convenience @jrobinso:

https://hardbin.com/ipfs/QmPKQoSqcfAQcS1RBAvvKYubFMqAXgJ2Ky3V2Qj85rK79T/#FJitGKtLhcGVdzQYUF7qv2sb95yTTH4hxsVRKFDqBG1R

@jrobinso

Contributor

commented Apr 1, 2019

@brainstorm @davideby We have already ported to Java 11 without this PR; it's released now, in fact. So why was the log4j change necessary? It seems to account for a majority of the file changes. Also lots of import changes generally; did you globally run some import optimizer?

Could you explain the change of the goby paths?

(-) import edu.cornell.med.icb.goby.alignments.AlignmentReaderImpl;
(+) import org.campagnelab.goby.alignments.AlignmentReaderImpl;

I would like to get to what is actually new, the import changes look like noise, but will require testing.

Have you built a distributable zip with the new dependencies? Perhaps @davideby has if not. What is the total size now?

@brainstorm

Author

commented Apr 1, 2019

Right, sorry, I meant Java11 and Gradle dependency management.

The modified goby import paths are motivated by the publicly available Maven artifacts, namely:

https://mvnrepository.com/search?q=org.campagnelab.goby

There's no other public Maven reference to Goby AFAIK.

I only automatically optimized imports for the Amazon-related classes (mine); the other classes are only changed according to the public artifacts (manually and when necessary). The log4j changes are also motivated by the modern, publicly available jars, I just wanted to make sure no old bugs creep up with this new build:

https://mvnrepository.com/search?q=log4j

[Screenshot, 2019-04-01 20:23:32]

The distributable zips I generated are available at https://github.com/umccr/igv/releases and weigh around 50-60 MB:

[Screenshot, 2019-04-01 20:20:11]

Hope that makes sense? Happy to answer more questions. Again, I know the amount of changes can seem overwhelming at first, but there's logic to it, not just noise, IMHO :/

@jrobinso

Contributor

commented Apr 1, 2019

@brainstorm By noise I mean it's not germane to the main topic of the PR, just generic improvements made along the way. Focused PRs are much easier to handle. There are at least 3 topics here now: the new S3 functionality (actually there are 2 or 3 included there), changes to the build, and jar updates. The log4j update alone is a separate PR. The requirement for Maven surprised me, but perhaps you and David have discussed that? I'm not asking for any more changes, just leave this as is, but it will be some time before it can be absorbed on this end. I think you are building from a branch for your local needs, correct?

@brainstorm

Author

commented Apr 1, 2019

@jrobinso Totally hear you, I'm very much in favour of small/atomic pullrequests, that's why I first asked for review on the umccr branch commits before merging it here until I got the ok, so that it didn't become too much at once. Sorry if I miscommunicated that in any way :-S

It is for our local needs (connect to AWS S3), but the code is generalized, it doesn't have any custom/hardcoded UMCCR stuff, afaik, happy to address that if you guys find something site-specific.

@jrobinso

Contributor

commented Apr 2, 2019

@brainstorm Thanks for your efforts, but this is not a small atomic pull request. I thought we had agreed to leave gradle and jars alone, other than changes (which should be minimal) required for the new S3 functionality. We can't change this many things at once on a project this size. Every change of a jar version has potential consequences; for example, did you try exporting an SVG file (batik library version changed)? I cycle through projects and am not actively working on IGV desktop at the moment. When I cycle back to this I'll take a good look at the S3 functions to see if they can be extracted from the rest of this PR. I asked if you were building locally to confirm that we weren't blocking you in any way.

Fix SVG export by using newest batik dependencies instead of legacy ones from org.eclipse. Thanks @jrobinso for reporting the issue, also realized that SVG export functionality should perhaps be covered by tests? /cc @davideby
@brainstorm

Author

commented Apr 3, 2019

I've reported the issue with BATIK and Java11 modules upstream here:

https://issues.apache.org/jira/browse/BATIK-1260

I think that @simonsteiner1984 is one of its core devs and has solved Java11 module issues in the past, I hope this is an easy fix and that it can be updated upstream as Batik 1.12.x or similar release.

@brainstorm

Author

commented Apr 3, 2019

@jrobinso I did not say that this is a small/atomic pullrequest as it is right now.

We did discuss with @davideby above about Gradle/jars/java11 (see messages above in the thread), that's why I asked to review the commits from the umccr branch first before merging it into this branch. Explicitly.

You are not blocking me in any way, but it's always preferable to have one's work available to more researchers. On top of that, we do also prefer to work with a good upstream OSS project rather than yet another fork ;)

@jrobinso

Contributor

commented Apr 3, 2019

@brainstorm There's no way I can accept a PR with this many changes, again unrelated to the functionality that was the original goal. The risk/reward ratio is too great. Again, when I am actively developing IGV again I will look at the S3 changes specifically and see if I can pull them out. Changes to the build process are not germane to that; it is a separate topic. I thought we agreed to leave that basically as is. Ditto updating jars: new things you needed excepted, we have a working Java 11 IGV with the current jars.

@ohofmann


commented Apr 3, 2019

Thanks @jrobinso. I agree with @brainstorm that we'd like to see those changes make it back to IGV, particularly since we'll build on those for DOS/DRS support and htsget/Crypt4GH support down the road. Let's discuss how we can help with this once you cycle back to IGV.

@jrobinso

Contributor

commented Apr 3, 2019

@ohofmann @brainstorm First thank you for your efforts. The key thing will be breaking this up into smaller focused chunks, each with a specific purpose. The starting point should be the current master, which is stable and working. Ideally for the S3 support we would have a focused PR without additional changes, improvements though they might be, and with the build process left as it is. We have not made any decision yet to use Maven for dependency management. I would expect minimal changes to existing code and jars for this step.

Updates to existing jars would be another set of PRs, but each would need to be weighed for benefit vs risk, also considering our ability and time available to test the change. I like to commit what should be no-effect improvements separately from new features and bug fixes, mixing those things makes it hard to later isolate problems introduced.

Tens of thousands of people use IGV and I personally have to respond to every issue, so we have to be deliberate and maybe slower than any of us would like.

@jrobinso

Contributor

commented Apr 8, 2019

@ohofmann @brainstorm I'm not back to IGV yet but am getting closer, and looking this over. I think the way forward is for me to check out this PR locally, extract the S3 support bits and try to make a minimal changeset. I did a quick review and think I can do that. Then I will submit a PR back to you, which you can update this PR with or create a new one. That is a bit convoluted, I could just merge it directly, but I want Roman to get credit, that is have a github record of the accepted PR.

@brainstorm

Author

commented Apr 8, 2019

Sounds great, thanks Jim!

Not convoluted, it's pretty much GitHub life these days, I'm used to it ;)

@brainstorm

Author

commented May 9, 2019

@jrobinso Let me know if you want me to resolve those latest conflicts introduced on TribbleFeatureSource and IGVUrlHelper classes. I don't want changes to pile up making this unmergeable. If you don't want that I'll assume you have it under control and that AWS support is on the way and so I'll keep my hands off, exciting! ;) 👍
