Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated path-based interactive tools with entry point path injection, support for ITs with relative links, shortened URLs, doc and config updates including Podman job_conf #16795

Merged
merged 18 commits into from Nov 16, 2023

Conversation

sveinugu
Copy link
Contributor

@sveinugu sveinugu commented Oct 6, 2023

Work on path-based interactive tools started with issue #14690 concluded. Latest additions:

  • Support for path based ITs with relative links, implemented through:
    • New entry point attribute requires_path_in_url to signal to gx-it-proxy whether the path should be stripped from the URL (if False, which is default) or not stripped (if True)
  • Support for path based ITs with absolute links using one of:
    • Mechanism for injecting entry point URL path in HTTP header variable, implemented through:
      • New entry point attribute label to better identify specific entry point
      • New environment variable injection method named entry_point_path_for_label
    • Mechanism for injecting entry point URL path in environment variable at tool startup, implemented through:
      • New entry point attribute requires_path_in_header named to signal to gx-it-proxy that the entry point path should be added to the request as a HTTP header of the provided name (if set)
  • All interactive tool URLs are shortened to the level of interactivetools_shorten_url, removing the need for this config (now removed). This has been implemented through:
    • Removing the unneeded /access part for path-based ITs
    • Identifying entry point class with ep instead of interactivetoolentrypoint (only one class implemented anyway)
    • Reducing entry point encoded identifier from 16 to 12-13 chars using base36 [a-z][0-9] instead of base16 [a-f][0-9] (hex)
    • Using 8-byte cryptographic token encoded with base36 (~13 chars) instead of hex-encoded uuid (32 chars)
    • interactivetools_shorten_url used hex-encoded entry point id (16 chars)+10 first chars of uuid = 26 chars. New base36-encoded identifier + token is 25-26 chars
    • interactivetools_shorten_url already stripped .interactivetoolentrypoint. Adding back .ep adds 3 chars but brings consistency. If shorter URLs are needed for the kubernetes use case, the galaxy.yml config interactivetools_prefix="interactivetool" can be shortened. @almahmoud looks ok?
    • Example domain-based URL: https://24q1dbzrknq1v-1a1p13jnahscj.ep.interactivetool.myserver.net/
    • Example path-based URL: https://myserver.net/interactivetool/ep/24q1dbzrknq1v/1a1p13jnahscj/
  • Example tools for various modes have been properly configured:
    • Jupytool (jupyter_notebook_1.0.0) is now a path-based IT using entry point path injection into environment variable
    • OpenRefine and Ubuntu XFCE Desktop (guacamole_desktop) are now path-based ITs without path injection (as they only use relative links)
    • RStudio is now a path-based IT using entry point path injection as HTTP header
  • Small updates of galaxy.yml.sample, galaxy.yml.sample.interactivetools and tool_conf.xml.sample
  • Fetches GALAXY_WEB_PORT from galaxy_infrastructure_web_port instead of hardcoding to 8080 (all ITs)
  • Reorganisation and update of interactive tools docs. Added section on path-based interactive tools. Updated sample configs
  • Added job_conf.yml.interactivetool
  • Added job_conf.yml.interactivetool.podman to document a working config to run ITs on Podman (@hexylena)
  • Added note in galaxy.yml.sample on interactivetools_map
  • Depends on gravity v1.0.4 (PR: Minimal path change needed for galaxy PR 16795 gravity#114) , which again depends on gx-it-proxy version 0.0.6 (PR: Added support for requires_path_in_url and requires_path_in_header_named gx-it-proxy#17)

Much details in a single PR here, but did not want to spend even more time on individual PRs.

FIxes #14690

Thanks to @morj-uio, @hexylena and @natefoo for help!

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • [ X ] This is a refactoring of components with existing test coverage.
  • [ X ] Instructions for manual testing are as follows:
    1. Open https://proto2-test.fairtracks.net/
    2. Start the updated interactive tools (Jupytool, RStudio, Open Refine, Ubuntu XFCE Desktop), notice that they are hosted under the same subdomain, but with ids and tokens in the path.
    3. Notice that the tools still work.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@bernt-matthias
Copy link
Contributor

So ITs work with podman? This would be great news for me.

@sveinugu
Copy link
Contributor Author

sveinugu commented Oct 6, 2023

So ITs work with podman? This would be great news for me.

Yes, at least under CentOS stream! Finding the correct config has been a long series of trial and error but this last version has been running without a glitch for a long time.

Note that I have configured to ignore the SELinux issues. A proper implementation would add a Podman mode in Galaxy that correctly labels output files.

There is also an issue with Galaxy overwriting the HOME env, which confuses podman, but re-setting this in the docker_cmd is a good workaround. The Podman config is really independent of the rest of the PR and can be put to use immediately.

EDIT: This is for Docker containers that runs as root on the inside, which is then automatically mapped to the galaxy user by Podman. I now force-pushed an update with a setting for non-root containers, however this will break root containers if I remember correctly.

@hexylena
Copy link
Member

hexylena commented Oct 6, 2023

RStudio is still domain-based IT but prepared for path injection, pending someone to figure out how to configure this (@hexylena?)

yes, should be possible. It looks like rocker-versioned2 doesn't support it directly as an env var but we could send it along in the proxy itself? I.e. add to your existing PR for gx-it-proxy, the header will probably only be read by RStudio and it'll solve it. rocker-org/rocker-versioned2#388 (comment), or we use our forked image and add it there. But yes! it is easy to support :)

1 similar comment
@hexylena

This comment was marked as duplicate.

@hexylena
Copy link
Member

hexylena commented Oct 6, 2023

Added job_conf.yml.interactivetool.podman to document a working config to run ITs on Podman (@hexylena)

<3 this is fantastic news

@sveinugu
Copy link
Contributor Author

sveinugu commented Oct 6, 2023

yes, should be possible. It looks like rocker-versioned2 doesn't support it directly as an env var but we could send it along in the proxy itself? I.e. add to your existing PR for gx-it-proxy, the header will probably only be read by RStudio and it'll solve it. rocker-org/rocker-versioned2#388 (comment), or we use our forked image and add it there. But yes! it is easy to support :)

Great info! Feels very much like a hack though to add this to all proxy requests. Also, the solution might be needed for other tools as well. It seems we should really add a requires_ep_path_in_header="X-RStudio-Root-Path" attribute, and that starts to look like a new PR together with other things that will appear when going through the IT list?

@hexylena
Copy link
Member

hexylena commented Oct 6, 2023

Feels very much like a hack though to add this to all proxy requests

agreed, I think exposing a custom env var that gets set at start time is a better choice, though a bit more complicated to manage on top of an existing container (it'll be fine for rstudio but)

@sveinugu
Copy link
Contributor Author

sveinugu commented Oct 6, 2023

agreed, I think exposing a custom env var that gets set at start time is a better choice, though a bit more complicated to manage on top of an existing container (it'll be fine for rstudio but)

My point is that this is really an alternative injection method which is actually quite simple to implement and might solve a bit of headaches (I had to give up on an env-based solution for RStudio myself). But let's then make it a proper feature.

@sveinugu
Copy link
Contributor Author

sveinugu commented Oct 6, 2023

quite simple to implement

On the other hand, it would probably not be much work to add it to this PR, and it also has some consequences on naming. So instead of requires_entry_point_path (which I was not so happy with in the first place), we can have two attributes requires_path_in_url and requires_path_in_header and then we can have RStudio running with path-based URL by the end of the day (given that the fix you describe works for the current RStudio image). Let me give it a try!

@almahmoud
Copy link
Member

@sveinugu Is it absolutely necessary to remove interactivetools_shorten_url ? The purpose was to shorten the subdomain token, not just the prefix, so saying that one can just change the prefix doesn't accomplish the same thing. We are currently using this feature at Bioconductor, and would rather not have it removed if there is no good reason to do so.

IIRC, without the shortening, using ITs out of the box without wildcard certificates would not work, while with the shortened URL, the admin would only have to setup wildcard certs if they are launching over 100 interactive tools per week. On the Galaxy side, this is especially relevant for the AnVIL, as the progress to ITs working there would be going backwards from previous work if this is removed.

So, given that there are some possible negative implications, what are the advantages of removing the feature as opposed to just turning it off when one doesn't need it? .ep can stay, but it would be ideal to still have the option to collapse the subdomain into a shorter one when needed, and I don't quite understand how it hurts to have the option

@sveinugu
Copy link
Contributor Author

sveinugu commented Oct 6, 2023

@sveinugu Is it absolutely necessary to remove interactivetools_shorten_url ? The purpose was to shorten the subdomain token, not just the prefix, so saying that one can just change the prefix doesn't accomplish the same thing. We are currently using this feature at Bioconductor, and would rather not have it removed if there is no good reason to do so.

IIRC, without the shortening, using ITs out of the box without wildcard certificates would not work, while with the shortened URL, the admin would only have to setup wildcard certs if they are launching over 100 interactive tools per week. On the Galaxy side, this is especially relevant for the AnVIL, as the progress to ITs working there would be going backwards from previous work if this is removed.

So, given that there are some possible negative implications, what are the advantages of removing the feature as opposed to just turning it off when one doesn't need it? .ep can stay, but it would be ideal to still have the option to collapse the subdomain into a shorter one when needed, and I don't quite understand how it hurts to have the option

Perhaps I have misunderstood something or perhaps I was unclear in my description. So if the PR is merged, all subdomains should now be more or less the same length as if interactivetools_shorten_url=True (plus 2 to 3 characters). Since I believe I read that the limit is 63 characters I don't think there is a need for an option to shave off these 2-3 chars? So all domain-based urls will now look like this: https://24q1dbzrknq1v-1a1p13jnahscj.ep.interactivetool.myserver.net/ which is 59 characters in this example, and there is 15 characters in the "interactivetool" part that you can further shave off by the interactivetools_prefix config if e.g. the domain name is substantially longer than myserver.net. The token is now reduced for all users.

@sveinugu
Copy link
Contributor Author

sveinugu commented Oct 6, 2023

IIRC, without the shortening, using ITs out of the box without wildcard certificates would not work, while with the shortened URL, the admin would only have to setup wildcard certs if they are launching over 100 interactive tools per week.

I was not aware that it was possible to use ITs out of the box without wildcard certificates at all? And where does that 100 per week limit come from? Can you enlighten me?

@sveinugu
Copy link
Contributor Author

sveinugu commented Oct 6, 2023

Btw, @almahmoud, I am trying to push for, build towards path-based support for all ITs, which will end the wildcard tyranny once and for all!

See: https://f1000research.com/slides/12-1163

The general shortening of the URL is really only a side-thing here. That came about mainly from nuisiance ("why are the full urls so ridiculously long?") and also due to a general wish to simplify the code a bit by removing what I believe is the now unnecessary interactivetools_shorten_url to improve maintainability and reduce bugs (e.g. #15224).

@sveinugu
Copy link
Contributor Author

I have now rebased the code on top of latest dev. Dependent PRs galaxyproject/gx-it-proxy#17 and galaxyproject/gravity#114 have both been merged. Checks seems to be successful (2 still running as I write this). So everything should be in order for a potential merge. @mvdbeek?

@mvdbeek mvdbeek merged commit 43c5573 into galaxyproject:dev Nov 16, 2023
48 of 49 checks passed
@mvdbeek
Copy link
Member

mvdbeek commented Nov 16, 2023

Thanks a lot @sveinugu!

Copy link

This PR was merged without a "kind/" label, please correct.

@sveinugu
Copy link
Contributor Author

Thanks a lot @sveinugu!

And thanks for the review and merge! Great to have this finalised!

@hexylena
Copy link
Member

hexylena commented Nov 16, 2023

Congratulations @sveinugu this is so fantastic to see!

@sveinugu sveinugu deleted the it_inject_entry_point_path branch November 20, 2023 19:12
@mvdbeek mvdbeek added the highlight/admin Included in admin/dev release notes label Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Path-based interactive tools stopped working in 22.05
6 participants