
feat: prefer remote taskfiles over cached ones #1345

Merged: 6 commits merged on Nov 17, 2023


@pd93 (Member) commented Sep 22, 2023

As per a comment in the discussion on #1152, this PR changes the Remote Taskfiles experiment to prefer remote files over locally cached ones.

Previously, if a user used the --download flag to cache a file, this file would be automatically preferred over the remote copy. The only way to get a newer version of that file was to delete the cached version entirely.

The new behaviour always prefers the remote copy and will always attempt to fetch and use it unless the user specifies the --offline flag (in which case Task will search for a cached version).

In addition, if the network times out when trying to fetch a remote copy, Task will now search for a cached version and use that instead.
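The resolution order described above can be sketched as follows. This is a minimal Python model, not Task's actual (Go) implementation: `resolve_taskfile`, the injected `fetch` callable and the hash-named cache files are all hypothetical, and a network timeout is assumed to surface as `TimeoutError`.

```python
import hashlib
from pathlib import Path
from typing import Callable

# Hypothetical fetcher: takes (url, timeout_seconds) and returns the file's bytes.
# It is assumed to raise TimeoutError when the network times out.
Fetcher = Callable[[str, float], bytes]


class CacheMiss(Exception):
    """Raised when a cached copy is needed but none exists."""


def cache_path(cache_dir: Path, url: str) -> Path:
    # Hypothetical layout: one file per URL, named by the URL's SHA-256 hash.
    return cache_dir / hashlib.sha256(url.encode()).hexdigest()


def read_cached(cache_dir: Path, url: str) -> bytes:
    path = cache_path(cache_dir, url)
    if not path.is_file():
        raise CacheMiss(url)
    return path.read_bytes()


def resolve_taskfile(
    url: str,
    fetch: Fetcher,
    cache_dir: Path,
    *,
    offline: bool = False,
    download: bool = False,
    timeout: float = 10.0,
) -> bytes:
    if offline:
        # --offline: skip the network entirely and use the cached copy.
        return read_cached(cache_dir, url)
    try:
        # Default: always prefer the remote copy.
        data = fetch(url, timeout)
    except TimeoutError:
        # Network timed out: fall back to a cached copy, if there is one.
        return read_cached(cache_dir, url)
    if download:
        # --download: store what was just fetched so it can be used later.
        cache_dir.mkdir(parents=True, exist_ok=True)
        cache_path(cache_dir, url).write_bytes(data)
    return data
```

Note that in this model a timeout with nothing cached still fails (`CacheMiss`), which matches the PR description: the cache is only a fallback, never the first choice.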


My question now is: do we need --download at all? I would argue that we can now make --download the default behaviour, i.e. we always cache the file when we download it. If we don't do this, then there is a risk that the cached copy will slowly diverge (if a user doesn't remember to use --download periodically when the remote copy changes). I think it's safer to assume that the user always wants the latest version of the file cached.

I'm also considering a --timeout flag. I have set the default to 10 seconds (which seems reasonable to me), but I can see CI/script users wanting to adjust this for various reasons.
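A `--timeout` flag with the 10-second default could look roughly like this. The sketch uses Python's `argparse` purely for illustration; the flag names come from this thread, but accepting the value as a number of seconds and the `parse_args`/`positive_seconds` helpers are assumptions.

```python
import argparse

DEFAULT_TIMEOUT = 10.0  # seconds, the default proposed above


def positive_seconds(value: str) -> float:
    # Reject zero or negative timeouts with a readable error.
    seconds = float(value)
    if seconds <= 0:
        raise argparse.ArgumentTypeError("timeout must be greater than 0")
    return seconds


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="task")
    parser.add_argument("tasks", nargs="*", help="tasks to run")
    parser.add_argument("--offline", action="store_true",
                        help="use cached remote Taskfiles without fetching")
    parser.add_argument("--download", action="store_true",
                        help="download and cache remote Taskfiles")
    parser.add_argument("--timeout", type=positive_seconds,
                        default=DEFAULT_TIMEOUT,
                        help="seconds to wait for a remote Taskfile")
    return parser.parse_args(argv)
```

With this sketch, `parse_args(["--timeout", "30", "build"])` gives a CI job a 30-second budget while interactive users keep the 10-second default.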

@pd93 mentioned this pull request Sep 22, 2023
@caphrim007

The way I try to frame stuff like this CLI-args question is to consider whether it passes or fails the principle of least astonishment (POLA), both for me and for the engineers I work with.

In this example, if I were to look at the following syntax in a taskfile,

```yaml
includes:
  my-remote-namespace: https://raw.githubusercontent.com/my-org/my-repo/main/Taskfile.yml
```

my gut reaction, before any thinking occurs, is that the system is going to download that Taskfile.yml. In other words, my personal POLA is that URLs are not much different from filesystem paths: I point to a thing and it just gets it.

Caching, given the above, is an implementation detail to me as the user; it's invisible. If it works, yay. If it doesn't, something seems a little slow, but the system still got my file... I wonder why it's slow... who cares, yay.

This also seems to match what I've been able to grok from my colleagues.

So my 2c is that the --download arg is redundant, and the --offline arg exposes an internal behavior of the system that might happen anyway under normal operating conditions. That said, to your points, I might want to specify it deliberately if I'm using Task in situations where robots are involved and I have full control over my environment.

Happy to hear others' opinions.

@pd93 (Member Author) commented Sep 23, 2023

Thanks @caphrim007. Really appreciate your thoughts :) I've had a bit more time to think about this.

I think the main reasons for the --offline flag are:

  1. If a user wants to view/edit a remote Taskfile before execution, or for debugging
  2. If a user knows that they don't have an internet connection and they don't want to wait for the timeout.

To expand on point 1: if we kept the --download flag but made it so that it never executes a command, it would further facilitate the ability to "view/edit a remote Taskfile before execution".

So maybe all we need to do is:

  1. Change the default behaviour to always download/cache files (as previously stated)
  2. Change the --download flag so that it never executes tasks
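The two proposed changes can be modelled together in a few lines. In this hypothetical sketch, `RemoteTaskfiles` caches every successful fetch (change 1) and `run` returns before executing anything when `download` is set (change 2); the in-memory dict stands in for the on-disk cache, and the `execute` callback stands in for real task execution.

```python
from typing import Callable, Dict, List

Execute = Callable[[bytes, str], None]


class RemoteTaskfiles:
    """Hypothetical model: every successful fetch also refreshes the cache."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self.fetch = fetch
        self.cache: Dict[str, bytes] = {}  # stands in for the on-disk cache

    def get(self, url: str, offline: bool = False) -> bytes:
        if offline:
            # Raises KeyError if the file was never downloaded.
            return self.cache[url]
        data = self.fetch(url)
        self.cache[url] = data  # change 1: caching is the default
        return data


def run(remote: RemoteTaskfiles, url: str, tasks: List[str], execute: Execute,
        *, download: bool = False, offline: bool = False) -> int:
    """Return how many tasks were executed."""
    taskfile = remote.get(url, offline=offline)
    if download:
        # change 2: --download only refreshes the cache; nothing is executed.
        return 0
    for name in tasks:
        execute(taskfile, name)
    return len(tasks)
```

A user could then run with `download=True` to inspect or edit the cached file, and later run the same tasks with `offline=True` against exactly that copy.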

@blackjid (Contributor) commented Oct 3, 2023

I think I would want this to behave a bit like dependencies...

  • On the first run, download and cache (possibly logging to stdout that a download is happening).
    • You can always run a command to upgrade the dependencies, e.g. task includes upgrade.
      • There might be an option to auto-upgrade every x hours/minutes.
  • On subsequent runs, just use the cache until:
    • the reference changes, for example the URL is changed to point to a different branch/tag, or
    • the auto-upgrade TTL is reached and a new download happens.

I think this is mostly the same as "always download and cache" with a configurable timeout, but presented in a different way. At least, I'm more used to thinking in terms of dependency management than in terms of cache/timeout.
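The dependency-style model above can be sketched as a cache keyed by include namespace that records which URL was fetched and when. `IncludeCache`, the `ttl` in seconds and the `upgrade` argument (standing in for the suggested `task includes upgrade` command) are all hypothetical; the injectable `clock` just makes the TTL testable.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CacheEntry:
    url: str           # the reference the include pointed at when cached
    fetched_at: float  # when it was downloaded (seconds since the epoch)
    data: bytes


class IncludeCache:
    def __init__(self, fetch: Callable[[str], bytes], ttl: Optional[float] = None,
                 clock: Callable[[], float] = time.time) -> None:
        self.fetch = fetch
        self.ttl = ttl      # None means "never auto-upgrade"
        self.clock = clock
        self.entries: Dict[str, CacheEntry] = {}

    def needs_download(self, namespace: str, url: str, upgrade: bool = False) -> bool:
        entry = self.entries.get(namespace)
        if upgrade or entry is None:
            return True   # explicit upgrade, or first run
        if entry.url != url:
            return True   # reference changed, e.g. a different branch/tag
        if self.ttl is not None and self.clock() - entry.fetched_at >= self.ttl:
            return True   # auto-upgrade TTL reached
        return False      # otherwise, just use the cache

    def get(self, namespace: str, url: str, upgrade: bool = False) -> bytes:
        if self.needs_download(namespace, url, upgrade):
            self.entries[namespace] = CacheEntry(url, self.clock(), self.fetch(url))
        return self.entries[namespace].data
```

Keying by namespace rather than by URL is what lets a changed reference (say, `main` to a tag) trigger a fresh download without any explicit upgrade step.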

@c-ameron commented Nov 2, 2023

I like @blackjid's idea!

For context, I want to use these features to have a standard set of Taskfile includes across my org.

Personally, I would like it to download first and then use the cache by default. I run Task a lot (multiple times a minute if I'm running a debugging command like task test -- feature/a), so the extra network overhead would be unwelcome.

I like the idea of having the user force a new download to overwrite the cached files. The auto-update-ttl idea is also great: it would give users the low latency of not fetching on every task run, while still allowing automatic updates in a soft manner.

Another suggestion would be to have these options as keys inside the .yml file as well. To me, it would be clunky to have to pass these all as flags when running my tasks. For example:

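```yaml
includes:
  my-remote-namespace:
    taskfile: https://raw.githubusercontent.com/my-org/my-repo/main/Taskfile.yml
    offline: true            # proposed option, not an existing key
    auto-update-ttl: weekly  # proposed option, not an existing key
```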

Thanks!

@pd93 (Member Author) commented Nov 2, 2023

Hey all. Thanks for the comments and sorry for the lack of progress on the experiment lately. It's been a busy month or so for me!

I've pushed the changes discussed in previous comments and I believe this is ready for a review when @andreynering has some time.

@c-ameron I like the idea of a cache TTL, but I'm going to leave this as out-of-scope for now. I don't think this addition would affect the API, so it could easily be added later. The same goes for adding the flags as keys in the file. The schema is a bit harder to amend for an experiment if we were to change our minds on anything, so I'd like to concentrate on the fundamentals for now (so that we can deliver this experiment quicker) and then revisit these bits later. That said, please feel free to open issues for these features so that they aren't forgotten.

@andreynering (Member) left a comment

👏 👏 👏

any calls to remote sources.
Whenever you run a remote Taskfile, the latest copy will be downloaded from the
internet and cached locally. If for whatever reason, you lose access to the
internet, you will still be able to run your tasks by specifying the `--offline`
Member

As we discussed on Discord, it would be useful to have an `offline: true` setting and a `TASK_OFFLINE=1` environment variable so users can set this once and have it always enabled.

Can be on another PR if you prefer, no problem.
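One way a command-line flag, a `TASK_OFFLINE=1` environment variable and an `offline: true` setting could combine is sketched below. The precedence (flag, then environment, then config), the accepted truthy values and the `offline_enabled` helper are assumptions for illustration, not decided behaviour.

```python
import os
from typing import Mapping, Optional


def offline_enabled(flag: Optional[bool],
                    config: Mapping[str, object],
                    env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether to run offline; `flag` is None when --offline wasn't given."""
    if env is None:
        env = os.environ
    # 1. An explicit command-line flag always wins.
    if flag is not None:
        return flag
    # 2. Then the TASK_OFFLINE environment variable.
    value = env.get("TASK_OFFLINE")
    if value is not None:
        return value.strip().lower() in ("1", "true")
    # 3. Finally, an `offline: true` setting from a config file.
    return bool(config.get("offline", False))
```

Letting the flag override the environment, and the environment override the file, means a CI job can force a mode without editing any Taskfile.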


Thanks for this! I also created it as an issue as requested :)
#1403

@pd93 (Member Author)

I think the functionality here is ready, so let's get this merged. I'll work on the schema/env options in another PR as suggested.

@c-ameron Thanks for creating the issues. I've added them to the TODO list in the experiment issue so they're not forgotten.

Comment on lines +81 to +82
of trying to download it. You are able to use the `--download` flag to update
the cached version of the remote files without running any tasks.
Member

> You are able to use the `--download` flag to update the cached version of the remote files without running any tasks.

Yes, that's the idea 👍

@pd93 merged commit 546a4d7 into main on Nov 17, 2023
11 checks passed
@pd93 deleted the prefer-remote-files branch on November 17, 2023 20:51

5 participants