[Proposal]: Flyte System Tags and metadata #3320

Merged
merged 11 commits into from
Jul 20, 2023

Conversation

Contributor

@kumare3 kumare3 commented Feb 7, 2023

This RFC proposes a way to add tags and a description to executions.
This would help users in multiple ways. Please read the RFC for more context.

Signed-off-by: Ketan Umare ketan.umare@gmail.com

kumare3 and others added 2 commits February 6, 2023 17:15
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <16888709+kumare3@users.noreply.github.com>
Contributor

@goyalankit goyalankit left a comment

very useful feature!

rfc/system/0001-flyte-execution-tags.md
#### Approach 2: Certain label keys are treated special
- “group” will group everything
- “experiment” will also group everything with higher priority.
- “name” will override the execution id with the name?
Contributor

@goyalankit goyalankit Feb 7, 2023

Are users allowed to change the labels? If they are then overriding might be an issue since you might have already fired async events to external systems. So I think it might be useful to maintain executionID as an identifier that can't be modified once execution has been created.

Alternatively, this could be an alias to the execution ID rather than overriding?

Contributor Author

The executionID cannot be changed: it is immutable and unique per project/domain.
The name is just an alias. I will update the doc to reflect this.

Contributor Author

But I do like the idea of immutable labels as well: once added, you cannot change them.

So we'll support both mutable and immutable labels?

Contributor

@flixr flixr left a comment

Nice! Mostly sounds good to me.
Just that I would go with the name tags on the flyte (cli) level.
Querying could still be done on k8s labels as well...

A workflow or task can be executed using

```bash
pyflyte run --remote --labels k:v --labels k1:v1 test.py wf --input1=10
```
Contributor

I would probably call the arg here --tag to not confuse this with kubernetes labels.
And then probably assign the tags to k8s annotations.

Contributor Author

Great point, but the RPC field is sadly already called label, and these will become k8s labels.

@davidmirror-ops added the rfc label Mar 29, 2023


## 7 Potential Impact and Dependencies
We this this is one of the most requested features in Flyte and will solve
Member

Suggested change
We this this is one of the most requested features in Flyte and will solve
This is one of the most requested features in Flyte and will solve

available on each execution. The users are allowed to filter an exection simply
by clicking on a label and then all executions are filtered by that label.

#### Approach 2: Certain label keys are treated special
Member

If this approach is chosen, I wonder whether it would be nicer for the user to do

pyflyte run --remote --group foo --experiment bar ...

instead of

pyflyte run --remote --labels group:foo --labels experiment:bar ...

This doesn't mean that under the hood the labels mechanism couldn't be used.

Member

I'm somewhat opposed to this as it could be confusing to users as to what is a label vs what is a keyword cli argument 🤔

Member

Or do we wanna create group and experiment as CLI arguments and introduce them as a concept? 🤔

Member

@fg91 fg91 May 11, 2023

I personally prefer option 1: treat all labels the same way. Users might not want to follow the categories we deem sensible. Experiment tracking servers like MLflow or Wandb, which also have such a tagging mechanism, simply allow users to assign arbitrary tags. I would argue that ML engineers are used to this and that we should provide the same UX without imposing special naming conventions.

Only exception: execution name
I find it really helpful to have the pod names include customizable identifiers.
We have a registration script, similar to pyflyte run, which has an --execution_name arg. The user-provided value is suffixed with a random uuid, like the one currently chosen for the execution ids, and the result is checked against the execution name regex again and then passed to FlyteRemote.execute(execution_name=...) (already supported, see here). So I wouldn't handle the execution name with a pod label but with the pod's metadata.name.

This comment is another argument for not treating execution names with labels but instead metadata.name since I agree that tags need to be mutable.
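
The naming scheme described in this comment can be sketched in a few lines of Python. This is a minimal illustration, not the registration script itself: `NAME_PATTERN` and `build_execution_name` are hypothetical names, and the regex is an assumed DNS-label-style rule rather than Flyte's actual execution name regex.

```python
import re
import uuid

# Assumed, DNS-label-style pattern used only for illustration.
# The real execution name rule is defined by Flyte and may differ.
NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,61}[a-z0-9])?")


def build_execution_name(prefix: str, suffix_len: int = 8) -> str:
    """Append a random suffix to a user-provided prefix and validate the result."""
    # Take the first few hex characters of a random UUID as the suffix.
    suffix = uuid.uuid4().hex[:suffix_len]
    name = f"{prefix}-{suffix}"
    # Re-check the combined name, since the suffix changes its length.
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid execution name: {name!r}")
    return name


if __name__ == "__main__":
    print(build_execution_name("train-resnet"))
```

The resulting string would then be passed as `execution_name` to `FlyteRemote.execute`, as the comment describes.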

Member

@bstadlbauer @fg91 @elibixby @flixr @goyalankit Some questions for you

  1. Do you prefer key-value pair tags or tags that only have key?
  2. Should we add tags to Kubernetes label?

Currently, the execution spec (with labels) is serialized to bytes and stored in the execution table, so it's impossible to add, delete, or update tags. If we instead use the k8s client to filter flyteworkflow objects (CRD) by labels, we cannot search for an execution after its CR is deleted.

I have a PR that adds a tags table. It allows us to easily add, update, and delete tags, and even attach tags to a task, workflow, or project. However, it does not use key-value pair tags for now. If we decide to use key-value pair tags, I just need to add a new column to the tags table and update the query. I'd like to know your thoughts first.

Btw, the current implementation works with both MySQL and Postgres.
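
To make the tags-table idea concrete, here is a rough sketch using Python's built-in `sqlite3` module. It is not the schema from the PR mentioned above: the table name `execution_tags`, its columns, and the helper functions are all assumptions chosen for illustration, and SQLite stands in for MySQL/Postgres.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical key-only tags table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE execution_tags (
            execution_id TEXT NOT NULL,
            tag          TEXT NOT NULL,
            PRIMARY KEY (execution_id, tag)
        )
        """
    )
    return conn


def add_tag(conn: sqlite3.Connection, execution_id: str, tag: str) -> None:
    # INSERT OR IGNORE makes adding the same tag twice a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO execution_tags (execution_id, tag) VALUES (?, ?)",
        (execution_id, tag),
    )


def remove_tag(conn: sqlite3.Connection, execution_id: str, tag: str) -> None:
    conn.execute(
        "DELETE FROM execution_tags WHERE execution_id = ? AND tag = ?",
        (execution_id, tag),
    )


def executions_with_tag(conn: sqlite3.Connection, tag: str) -> list:
    rows = conn.execute(
        "SELECT execution_id FROM execution_tags WHERE tag = ? ORDER BY execution_id",
        (tag,),
    )
    return [row[0] for row in rows]
```

Because tags live in their own rows rather than inside the serialized execution spec, they can be added or removed at any time, and they remain queryable after the Kubernetes resource is gone. Moving to key-value tags would mean adding a value column, as noted above.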

Contributor

My $0.02 is let's keep it simple and support what you call key-only tags.

A person can 'hack' this to resemble key-value if needed (e.g., 'costcenter-12'), but we don't need to manage that complexity on the back end or in the UI when we get to figuring out how to let folks use tags to sort/group things.

Member

  • In my opinion key-only tags are perfectly fine and what ML engineers are used to from experiment tracking servers

  • Should we add tags to Kubernetes label

    I think being able to add/delete/update tags after the execution has already started or ended is an important feature. User story: an experiment is training/trained really well and I want to mark it for later. This is something that is not known when starting the execution. But updating/deleting/adding tags when the execution is already running would mean that the k8s labels are not in sync with what is stored in the tags table. I'd therefore say that I wouldn't apply the tags as labels to k8s.

Member

Sorry for the late response here, but I agree with what's been said above. Key-only tags would also solve all our use cases 👍

> I think being able to add/delete/update tags after the execution has already started or ended is an important feature

+1 to this and to the reasoning for not applying those to k8s.

fg91 previously approved these changes Mar 30, 2023
Member

@fg91 fg91 left a comment

I like this proposal very much. Currently we use decks to link to Wandb runs corresponding to Flyte executions.
We then use tags in wandb to do the grouping by experiments, tags, ... that you describe in the RFC.
Would love to do this directly in Flyte.

@davidmirror-ops
Contributor

03-30-2023 Meeting notes:
KU: launchplan name could be the default grouping tag
Tim Sheiner: that's not very prominent right now in the UI
KU: we could add all to system tags but could be an overload
TS: make sure that this proposal is not redundant to the fact users are already naming workflows and tasks
TS: instead of treating this proposal as tags, treat it as a separate...
GG: their use case is running workflows from notebooks (using Quarto/Jupyter for reports)

@tsheiner
Contributor

Slightly perpendicular to this proposal but related to the notion of making executions easier to identify:

I note that Prefect uses a system of nonsense ids for executions, which is much friendlier looking and far easier to remember than Flyte's alphanumeric ids. For example, ‘enigmatic-waxbill’ or ‘massive-antelope’. Would something like this be possible for Flyte?

@davidmirror-ops
Contributor

04-13-2023 notes: no updates

bstadlbauer previously approved these changes May 11, 2023
Member

@bstadlbauer bstadlbauer left a comment

Looks good to me!

There is an ongoing slack conversation related to MySQL tag storage here - not 100% sure what that is about? cc @kumare3 @ByronHsu

@fg91
Member

fg91 commented May 11, 2023

For visibility: @kasimiraula proposed that users should be able to create notes for executions, similar to what is currently possible when aborting an execution #3646

@kumare3
Contributor Author

kumare3 commented May 25, 2023

> Slightly perpendicular to this proposal but related to the notion of making executions easier to identify:
>
> I note that Prefect uses a system of nonsense ids for executions, which is much friendlier looking and far easier to remember than Flyte's alphanumeric ids. For example, ‘enigmatic-waxbill’ or ‘massive-antelope’. Would something like this be possible for Flyte?

I think this is possible, and I have thought about it. We should weigh the entropy considerations and the cost of doing this, and then I'm all for it 👍🏽
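
The entropy concern can be made concrete with a short back-of-the-envelope calculation. The sketch below assumes names are drawn uniformly at random from an adjective list and a noun list; the list sizes are hypothetical, and the collision estimate is the standard birthday-bound approximation, not a measurement of any real system.

```python
import math


def name_space_bits(n_adjectives: int, n_nouns: int) -> float:
    """Bits of entropy in a uniformly random adjective-noun pair."""
    return math.log2(n_adjectives * n_nouns)


def collision_probability(n_names: int, n_executions: int) -> float:
    """Approximate chance that at least two of n_executions random names coincide.

    Uses the birthday-bound approximation 1 - exp(-k(k-1) / (2N)).
    """
    if n_executions > n_names:
        return 1.0  # pigeonhole: a collision is certain
    k = n_executions
    return 1.0 - math.exp(-k * (k - 1) / (2.0 * n_names))


if __name__ == "__main__":
    # Hypothetical word lists of 1,000 adjectives and 1,000 nouns.
    total = 1_000 * 1_000
    print(f"{name_space_bits(1_000, 1_000):.1f} bits")
    print(f"{collision_probability(total, 1_000):.2%} collision chance at 1,000 executions")
```

Since execution ids must be unique per project/domain, a scheme like this would likely need either larger word lists, an extra random suffix, or a retry on collision.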

Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
fg91 previously approved these changes Jul 11, 2023
Member

@fg91 fg91 left a comment

Thanks for incorporating what was discussed in the contributors' syncs. I like that now:

  • tags will not be attached to k8s objects as labels but instead saved in the database so that they can be modified/deleted during/after the execution.
  • all tags are equal and Flyte doesn't impose any special names.
  • we have a clear distinction between tags and notes.

Please consider all comments below as nit-picks that you can just resolve in case you don't agree.

A workflow or task can be executed using

```bash
pyflyte run --remote --tags '["hello", "world"]' test.py wf --input1=10
```
Member

Could we do --tag hello --tag world instead of providing a list of tags in string format?

Member

maybe we can support both?

Member

pyflyte run --remote --tags '["hello", "world"]' 
# and 
pyflyte run --remote --tag hello --tag world
# and 
pyflyte run --remote --tag hello --tags '["key1", "key2"]' 

Member

My opinion about this is not so strong that I'd say we need to support two options in case others prefer --tags '["hello", "world"]'. I personally find lists or JSON in string representation as CLI args a bit cumbersome.
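
For comparison, supporting both forms is not much code. The sketch below uses the standard-library `argparse` purely as an illustration of the parsing logic; it is not how pyflyte implements its options, and `parse_tags` is a hypothetical helper. It accepts a repeatable `--tag` flag and a JSON-list `--tags` flag, then merges them while dropping duplicates.

```python
import argparse
import json


def parse_tags(argv: list) -> list:
    """Collect tags from repeated --tag flags and an optional JSON-list --tags flag."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--tag", action="append", default=[])
    parser.add_argument("--tags", default="[]")
    # Ignore any other arguments (script name, workflow inputs, ...).
    args, _unknown = parser.parse_known_args(argv)

    listed = json.loads(args.tags)
    if not isinstance(listed, list) or not all(isinstance(t, str) for t in listed):
        raise ValueError("--tags must be a JSON list of strings")

    # Merge both sources, keeping first-seen order and dropping duplicates.
    merged = []
    for tag in [*args.tag, *listed]:
        if tag not in merged:
            merged.append(tag)
    return merged


if __name__ == "__main__":
    print(parse_tags(["--tag", "hello", "--tags", '["key1", "key2"]']))
```

The repeatable flag avoids shell quoting of a JSON string, which is the ergonomic concern raised in this thread.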

pingsutw and others added 5 commits July 11, 2023 20:52
Co-authored-by: Fabio M. Graetz, Ph.D. <fabiograetz@googlemail.com>
Signed-off-by: Kevin Su <pingsutw@gmail.com>
Co-authored-by: Fabio M. Graetz, Ph.D. <fabiograetz@googlemail.com>
Signed-off-by: Kevin Su <pingsutw@gmail.com>
Co-authored-by: Fabio M. Graetz, Ph.D. <fabiograetz@googlemail.com>
Signed-off-by: Kevin Su <pingsutw@gmail.com>
Co-authored-by: Fabio M. Graetz, Ph.D. <fabiograetz@googlemail.com>
Signed-off-by: Kevin Su <pingsutw@gmail.com>
Signed-off-by: Kevin Su <pingsutw@apache.org>
@pingsutw pingsutw requested a review from fg91 July 12, 2023 15:24
Member

@fg91 fg91 left a comment

LG!

@eapolinario eapolinario merged commit 845d0f5 into master Jul 20, 2023
6 checks passed
@eapolinario eapolinario deleted the flyte-tags branch July 20, 2023 17:06
Labels
rfc A label for RFC issues
Projects
Status: Implemented