This repository has been archived by the owner on Mar 22, 2022. It is now read-only.

Centralize fetch / install script #181

Closed
ekingery opened this issue Oct 16, 2020 · 24 comments
@ekingery
Contributor

ekingery commented Oct 16, 2020

Currently we hardcode a handful of helper / installation scripts into every track repo. Specifically relevant to configlet are the fetch-configlet and fetch-configlet.ps1 scripts.

Up for discussion / consideration is centralizing the fetch / install scripts, similar to the way Homebrew and many other software projects provide a bash installer: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"

This would give us much more control and flexibility in the configlet distribution process and would prevent us from requiring backwards compatibility in release file naming and other similar constraints. Ideally it would be the last time we need to attempt to track down and update all of the fetch_configlet scripts in all of the track repos.
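A rough sketch of what such a centralized installer could look like, assuming a hypothetical repo path, asset naming scheme, and pinned version (none of these reflect configlet's actual release layout):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a centralized install.sh, fetched and run via
# a Homebrew-style one-liner. The repo path, asset filenames, and the
# pinned version below are assumptions for illustration only.
set -euo pipefail

# Map `uname -s` output to the OS name assumed in release asset filenames.
release_os() {
  case "$1" in
    Linux)  echo "linux" ;;
    Darwin) echo "mac" ;;
    *)      echo "unsupported" ;;
  esac
}

# Build the (assumed) download URL for a given version and OS.
asset_url() {
  local version="$1" os="$2"
  echo "https://github.com/exercism/configlet/releases/download/${version}/configlet-${os}-64bit.tgz"
}

install_configlet() {
  local os url
  os="$(release_os "$(uname -s)")"
  [ "$os" != "unsupported" ] || { echo "unsupported platform" >&2; return 1; }
  url="$(asset_url "v4.0.0" "$os")"   # version resolution omitted in this sketch
  mkdir -p bin
  curl -fsSL "$url" | tar -xz -C bin
}

# install_configlet   # uncomment to actually download a release
```

Centralizing this logic means the platform detection and asset-naming rules live in exactly one place, which is the point of the proposal.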

As an example of a backwards-compatibility hurdle we face now: we must manually upload release files renamed from *.tar.gz to *.tgz, because GoReleaser allows customizing every part of the filename except the extension. We are also not adhering to the updated naming standards we established in the CLI releases.

For track maintainers using configlet locally, it seems reasonable that we supply a separate set of installation instructions / scripts for people who don't have bash available on their computer.

@ErikSchierboom
Member

I like the idea of tracks being able to install configlet without having to update the actual script. We should have a bash script for Unix-based systems, and a PowerShell script for Windows-based systems.

A great example of this is .NET Core, which provides both these install scripts: https://docs.microsoft.com/en-us/dotnet/core/tools/dotnet-install-script

@ekingery
Contributor Author

ekingery commented Oct 20, 2020

Another instance of increased fragmentation of fetch_configlet is the new GH Actions build. This simpler version should be easier to maintain, but does add to the problem of hardcoding release names, for example.

Also worth noting is that python (and probably a few other repos) have their own fetch script and do not use fetch_configlet at all.

Lastly, #183 is an example of a fix that would need to be hardcoded into 35+ repos. All this to say, I think we should consider a strategy for configlet (and canonical-data-syncer) distribution sooner rather than later. Perhaps as part of the switch from Travis to GH Actions?

@ee7
Member

ee7 commented Jan 13, 2021

Did we decide anything here?

The option proposed by this PR (option 2 in the below list) seems good to me, assuming we're set on keeping a fetch script in every track repo.

Some options:
1a. Keep a fetch-configlet script in every track repo, and do nothing. This has been the "strategy" so far, and caused some tracks to have silently broken scripts.
1b. Keep a fetch-configlet script in every track repo, and make a PR to every repo when we make an important change. This is a big maintenance burden, especially if tracks independently start trying to make changes to the fetch-configlet scripts.
1c. Keep a fetch-configlet script in every track repo, using git submodule. This makes it harder for a contributor to use the script, as a simple git clone is no longer sufficient to download it - users must run git clone --recurse-submodules, or a plain git clone followed by git submodule update --init.
1d. Keep a fetch-configlet script in every track repo, using git subtree
1e. Keep a fetch-configlet script in every track repo, using git subrepo (see discussion of submodule vs subtree vs subrepo)
1f. Keep a fetch-configlet script in every track repo, using some other git-based approach.
2. Add a fetch-fetch-configlet script to every track repo, which downloads the latest version of the fetch-configlet script and executes it. This can avoid the downsides of 1b and 1c, but is slightly less convenient for people who want to inspect every script that they run. It's also critical that the upstream script is not modified maliciously. From @SleeplessByte: The script that performs fetch-fetch-configlet can also just be named fetch-configlet.
3a. Remove the fetch-configlet script from every track repo, and just give a link in the docs to the upstream fetch-configlet location.
3b. Remove the non-CI fetch-configlet script entirely, and exclusively link to the upstream configlet release page.
4. Keep the configlet upgrade self-updating functionality around. Encourage people to only use that, or combine that with some above option. (I would personally like the new configlet to have it eventually).
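Sketching option 2 under some assumptions (the upstream URL here is hypothetical), the wrapper script committed to each track repo could be as small as:

```shell
#!/usr/bin/env bash
# Sketch of option 2: a tiny wrapper, committed to each track repo, that
# downloads the real fetch-configlet from a central location and runs it.
# The upstream URL is an assumption for illustration.
set -euo pipefail

upstream_url() {
  echo "https://raw.githubusercontent.com/exercism/configlet/main/fetch-configlet"
}

fetch_and_run() {
  local tmp
  tmp="$(mktemp)"
  curl -fsSL "$(upstream_url)" -o "$tmp"
  bash "$tmp" "$@"
  rm -f "$tmp"
}

# fetch_and_run "$@"   # uncomment to execute the downloaded script
```

Because this wrapper contains no platform detection or release-naming logic, it should rarely need to change once landed in the track repos.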

Anything else?

I think CI on every track should use the exercism/github-actions/configlet-ci action to run configlet regardless of the above choice.

Another instance of increased fragmentation of fetch_configlet is the new GH Actions build.

Yes, but I do think the fragmentation is manageable here, and worth it.

This simpler version should be easier to maintain

I agree.

but does add to the problem of hardcoding release names, for example.

Can you expand here?

Also worth noting is that python (and probably a few other repos) have their own fetch script and do not use fetch_configlet at all.

Indeed. But I think that tracks that choose to provide their own fetch scripts can take responsibility for maintaining them. I don't think we want to ban them, so there's not much we can do about it.

@ekingery
Contributor Author

Thanks for the response, @ee7! I was thinking along similar lines as your option 2 proposal above. However, rather than having the install script pull fetch-configlet, I am proposing that wherever fetch-configlet is used, we replace it with a single command that runs the shell script installer from the main branch of the repo. This way, we don't actually need fetch-configlet at all (unless I'm missing something, which is likely)! This is the example command from Homebrew:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"

but does add to the problem of hardcoding release names, for example.

I was pointing out that we still hardcode release names in the GHA script. This code could be updated to use the shell script installer as well.

The reason I made this proposal was for easier installation and updating with GoReleaser. Now that configlet is going to be rewritten in nim, I no longer have a vested interest and I'm sure whatever you and @ErikSchierboom come up with will be fine. Feel free to adjust and/or close out this issue as you see fit.

@ErikSchierboom
Member

I like option 2, as it solves the "how to update the script across all tracks" issue.

This way, we don't actually need fetch-configlet at all

I would still have it just as a convenience. I wouldn't want all tracks to figure out the exact bash command to run the installer.

@SaschaMann @SleeplessByte What do you think?

@SaschaMann
Contributor

I don't use the script outside of CI and don't really care. I download it from the GH release page and watch the repo for releases. I don't know if first-time or casual contributors bother downloading it at all.

Having a script to download a script to download a tool seems a bit cumbersome though. I'd slightly prefer option 1b, with an opt-out for tracks that use their own tool. I don't see how that's a big maintenance burden. The script doesn't change too often, and if they make downstream changes they can opt-out.

@SleeplessByte
Member

I use the scripts locally all the time. I don't mind if they were to go away, but is a fetch-fetch script worth the DX cost for the few times the fetch script updates?

I'd rather have a CI job on the track that periodically pulls the latest fetch script from a central repo.

@ErikSchierboom
Member

I'd rather have a CI job on the track that periodically pulls the latest fetch script from a central repo.

I would be totally fine with this.

@SaschaMann
Contributor

Why not push it rather than periodically running it?

@SleeplessByte
Member

I would, but those are 1a and 1b and apparently are problematic.

@ErikSchierboom
Member

1a is the current situation, which I don't particularly like because it is easy for stuff to break without tracks knowing about the fix. I'm fine with 1b.

@SleeplessByte
Member

In this case I think the maintenance burden is really low as it's "Approve and merge".

@ErikSchierboom
Member

Yeah. We will have tracks that are not actively maintained, but that is something we can work around once we get the exercism/reviewers permissions sorted out.

@ee7
Member

ee7 commented Jan 15, 2021

However, rather than having the install script pull fetch-configlet, I am proposing that wherever fetch-configlet is used, we replace that with a single command that runs the shell script installer from the main branch of the repo. This way, we don't actually need fetch-configlet at all (unless I'm missing something, which is likely)! This is the example command from homebrew:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"

The fetch scripts have two main uses:

  • In CI (historically): to download configlet so it can be used to run configlet lint (some tracks also use fmt and generate in CI).
  • Locally by contributors: during development to get a local copy of configlet to use while preparing PRs.

If the first use case was the only one, then I agree your suggestion seems better than having a copy of the script in every repo.

But for the first use case, it's better now to use the relatively recent exercism/github-actions/configlet-ci GitHub Action (see e.g. here) which runs an inherently centralized and simplified script - it doesn't have to do platform detection, and can rely on jq being installed.

It's true that we could provide for the second use case by removing the fetch scripts and documenting everywhere that users should run that command. But I think that's worse than fetch-fetch-configlet script (option 2) because:

  • It's perhaps less convenient for the user: they need to find the command that they want to copy-paste
  • It's less convenient for us: if the command ever needs to be updated (we hope that it doesn't), it's harder to do that across documentation on many tracks than in a file.
  • I don't want to train an inexperienced user that it's a good idea to copy-paste commands of the form bash -c "$(curl foo)", (although that's a little better than curl foo | bash, see e.g. https://www.idontplaydarts.com/2016/04/detecting-curl-pipe-bash-server-side). It's also pretty bad to tell them to execute a script that downloads and executes another script, though.

I don't see how that's a big maintenance burden. The script doesn't change too often, and if they make downstream changes they can opt-out.

In this case I think the maintenance burden is really low as it's "Approve and merge".

Sorry, I was unclear.

By "maintenance burden" I meant something like: "somebody has to spend time coding the thing that automates creating those PRs, and that time might be better spent doing other Exercism work".

But if that's easy to automate then my preference is for option 1b.

I think it's messy to maintain the same one or two files across 75+ track repos, and that centralizing it as a dependency is cleaner. But that generally makes the script harder to inspect or use. There's also an argument that we've had the fetch scripts for a long time, and contributors might expect them to be there.

@SaschaMann
Contributor

By "maintenance burden" I meant something like: "somebody has to spend time coding the thing that automates creating those PRs, and that time might be better spent doing other Exercism work".

Ah okay. Pushing a file across repos is pretty straightforward.

@SleeplessByte
Member

SleeplessByte commented Jan 15, 2021

If fetch-fetch-configlet is renamed to fetch-configlet and actually does:

  1. fetch-fetch-configlet
  2. followed by fetch-configlet (which can then be inline-executed, instead of what we have right now)

then nothing changes for current maintainers and I can agree fully with the change of the contents as @ee7 describes.

@ErikSchierboom
Member

Yes, that seems to me to be the most reasonable solution. That way, the fetch-configlet script in the track's repo will hardly ever change and tracks still have the exact same functionality. We should use a CODEOWNERS rule for the fetched script, as that file will become more important due to it being used in all tracks.

@ee7
Member

ee7 commented Jan 22, 2021

I don't know what's the right decision here.

I don't personally use the fetch scripts locally, and I wouldn't personally run a script that does "download a script from an open source repo, and execute that script without inspection" in this context. But maybe configlet users are fine with that; after all, the script is downloading a configlet binary - they already trust that the binary itself isn't malicious, and the binary comes from the same repo as the upstream fetch script.

Given that it's apparently easy to automate "open PRs in ~80 repos after an upstream change to fetch-configlet" then maybe keeping the status quo with 1b is fine. We'd probably want to set CODEOWNERS to exercism/maintainers-admin in every repo for that fetch script so that it can only be changed upstream, which should make it always trivial to merge the automated PRs. We could also add a comment at the top of the fetch script that links to the upstream location (I think this would have helped in the past).

If we did option 2 then keeping the script named fetch-configlet is fine by me.

I edited my first post in this issue to add 1d/1e/1f, explicitly mentioning git subtree and git subrepo - let's discuss whether they'd be a better fit.

@ErikSchierboom
Member

I don't personally use the fetch scripts locally, and I wouldn't personally run a script that does "download a script from an open source repo, and execute that script without inspection" in this context. But maybe configlet users are fine with that; after all, the script is downloading a configlet binary - they already trust that the binary itself isn't malicious, and the binary comes from the same repo as the upstream fetch script.

My thinking was indeed along the second part of your reasoning: people already trust the configlet binary, so we hardly lose anything by making them trust the script too.

Given that it's apparently easy to automate "open PRs in ~80 repos after an upstream change to fetch-configlet" then maybe keeping the status quo with 1b is fine. We'd probably want to set CODEOWNERS to exercism/maintainers-admin in every repo for that fetch script so that it can only be changed upstream, which should make it always trivial to merge the automated PRs. We could also add a comment at the top of the fetch script that links to the upstream location (I think this would have helped in the past).

One thing to note here is that not all tracks are very actively maintained. We could indeed use CODEOWNERS to add guarantees to make it trivial to merge the PRs, but I don't particularly see the benefit of doing that over downloading a centralized script. It still feels like we are introducing a burden on the maintainers that we don't need to introduce.

For me, the most important thing is ease of maintenance. Option 2 sounds to me like it is easier to maintain, although admittedly the script doesn't change that often (it does change from time to time though).

A related question is whether tracks are actually interested in the contents of the fetch script. I'm guessing that in most cases, they don't really care about the actual script's contents, but only in what it helps them achieve (downloading the latest configlet). If this is true, they'd probably be fine with having the track's fetch script download the actual fetch script.

By the way, if we care enough about "don't execute external scripts", we could always have an option to pass a flag that first shows which script will be executed. I'd not want to make this the default though, as I think most maintainers don't care enough about this :)

@SleeplessByte
Member

By the way, if we care enough about "don't execute external scripts", we could always have an option to pass a flag that first shows which script will be executed. I'd not want to make this the default though, as I think most maintainers don't care enough about this :)

I think this is the best of both worlds.

[...] after all, the script is downloading a configlet binary - they already trust that the binary itself isn't malicious, and the binary comes from the same repo as the upstream fetch script.

Yep. I use these scripts all the time and this is exactly how I feel.

@ErikSchierboom
Member

Okay, let's do the following:

  • Two new scripts will be created that download the latest fetch script from this repo and immediately run it. We'll open scripted PRs to add them to the tracks, and functionality-wise it will appear to tracks as if nothing has changed
  • The two new scripts will have an option to require confirmation before execution, showing the script that will be executed
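A minimal sketch of that shared logic, assuming a hypothetical upstream URL and a hypothetical --confirm flag name (neither is decided here):

```shell
#!/usr/bin/env bash
# Sketch of the agreed approach: the track-local script downloads the
# upstream fetch script and runs it, optionally showing it first.
# The upstream URL and the --confirm flag name are assumptions.
set -euo pipefail

UPSTREAM="https://raw.githubusercontent.com/exercism/configlet/main/fetch-configlet"

confirm_requested() {
  [ "${1:-}" = "--confirm" ]
}

run_upstream() {
  local tmp
  tmp="$(mktemp)"
  curl -fsSL "$UPSTREAM" -o "$tmp"
  if confirm_requested "${1:-}"; then
    shift
    cat "$tmp"   # show the script that is about to be executed
    read -rp "Run the script above? [y/N] " answer
    if [ "$answer" != "y" ]; then
      rm -f "$tmp"
      echo "Aborted." >&2
      return 1
    fi
  fi
  bash "$tmp" "$@"
  rm -f "$tmp"
}

# run_upstream "$@"   # uncomment to execute
```

The confirmation step is opt-in, matching the point above that most maintainers would not want it as the default.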

@ee7
Member

ee7 commented Jan 26, 2021

I'm okay with that. But I'll mention:

  • It is generally non-trivial to write idiomatic, bug-free scripts.
  • Adding any complexity to the script that performs "fetch-fetch" increases the chance that we need to update it later.
  • Updating that script across 80 repos isn't much better than updating across 80 repos the script that performs "fetch".

Can we explicitly rule out git subtree and git subrepo? The end user doesn't need to know that we're using them - they don't need to change their git clone command (unlike with submodules), and they don't need to install anything. But we can update the fetch script in every repo just with a pull command and a commit.
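For comparison, the subtree workflow being discussed would look roughly like this; the prefix path and branch name are assumptions, and the commands are wrapped in helper functions purely for illustration:

```shell
#!/usr/bin/env bash
# Sketch of the git subtree workflow under discussion. The prefix path
# (bin/fetch) and branch name (main) are assumptions for illustration.
set -euo pipefail

# One-time setup in a track repo: vendor the upstream script as a subtree.
subtree_init() {
  local repo="$1"   # e.g. https://github.com/exercism/configlet.git
  git subtree add --prefix=bin/fetch "$repo" main --squash
}

# Later, updating the vendored script in a track repo is one pull command
# plus the commit it creates - no change to contributors' clone workflow.
subtree_update() {
  local repo="$1"
  git subtree pull --prefix=bin/fetch "$repo" main --squash
}
```

Unlike submodules, the vendored files are ordinary tracked files after `git subtree add`, so a plain `git clone` is all a contributor needs.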

The best discussion I've found that compares submodule, subtree and subrepo is here.

See also the top-level documentation in that repo: https://github.com/ingydotnet/git-subrepo

I think this is messier than the "fetch-fetch" approach in some ways, but cleaner in others. I don't know which I prefer, but I do think we should explicitly reject git subtree and git subrepo as an option before changing every track repo to do "fetch-fetch".

@ErikSchierboom
Member

I'm hereby explicitly ruling that out :) We won't do anything with git submodule/subtree/subrepo.

@ekingery
Contributor Author

ekingery commented Feb 9, 2021

Seems like we can close this issue out - please re-open if I've missed something! Thanks for all of the contributions - this is a tricky one...
