Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor changes #4

Merged
merged 2 commits into from
Nov 14, 2023
Merged

Minor changes #4

merged 2 commits into from
Nov 14, 2023

Conversation

AlexandreDecan
Copy link
Collaborator

Hello,

Also, please add some documentation for the two new functions. I don't see why they were implemented. Is this something you internally use? Why is this called "verbose" exactly? That's not really the kind of behaviour I see when using "-v" in most CLI.

@pooya-rostami
Copy link
Owner

  • It was a feature requested by @tommens on the detailed output of the tool. In case you have the addition of a job. he wanted to see all the changes in the level of step, etc. in the output of CLI.
  • The two new functions are used internally in the CLI output section. It does not change any functionality in the tool for finding the changes.
  • I agree that verbose behavior in CLI tools is different than this but I just came up with the name because to the extent it makes sense. Do you have any other suggestions for it?

@AlexandreDecan
Copy link
Collaborator Author

You said "In case you have the addition of a job. he wanted to see all the changes in the level of step, etc. in the output of CLI." but I don't really see what was missing. If I remember, in case a job is added, the whole job is displayed in the output, including its steps, etc. I haven't tried the verbose option, so I don't know exactly what the output looks like and what it brings compared to the "default" output.

@pooya-rostami
Copy link
Owner

  • first, nothing is missing. I'm using the output of the tool and showing it in more detail (you can say in another way) in CLI
  • second, in the changes you made in the code in your commit in the print section, in case of verbose I'm showing the 'value' alongside the 'path' of the removed item. If I accept the current changes, the tool will not show that 'value'. Please revert the change in the CLI function (other parts look good) or (better option:) add another if in "elif kind == "removed": " so if it is verbose, also show the 'value' and not only 'path'

@AlexandreDecan
Copy link
Collaborator Author

(1) What are the "more details" that are provided when using --verbose ? I'm really missing the need for "another way" here, esp. since the tool is mostly aimed at being used as a library, not really as a CLI. Perhaps we should change the default way or reporting changes so that it meets Tom's expectations, instead of adding more and more options?

(2) I'll do that.

@tommens
Copy link
Collaborator

tommens commented Nov 14, 2023

If we want to use the tool as a library, and if we want to compute, for example (this is just one example among many), the exact number of steps that are being added, removed or modified between two workflow file versions, then this is currently not immediately possible using the non-verbose version, in the case where the steps being affected are part of a new job being added (or job being removed?) Currently a job addition is considered to be a single atomic change (with a single value that contains all of its details). So if you want to use the library to count the number of changes/additions/removals made to steps, you would need to manually process the output of the tool to unwind the value of this atomic change in its more primitive constituents. It is our intention to use the tool for counting fine-grained changes (additions/ removals/modifications) for the different kinds of syntactic elements contained in a workflow, so for this it would be useful to have an output in which all of the details that are part of a single job addition (removal) are repored as a list of changes (rather than a single value).

@AlexandreDecan
Copy link
Collaborator Author

AlexandreDecan commented Nov 14, 2023

Ok. But we're talking about the CLI, not the functions themselves (as provided by the library), so the --verbose option doesn't change anything when the tool is used as a library. That said, I better understand the rationale behind this change, even if I tend to disagree with: the tool aims to provide a list of changes, and this is what it does. The fact that the tool cannot be used "as-is" to compute, for example, the number of steps or keys that were added/removed/moved/changed, etc. is not per-se a limitation of the tool itself: one can easily use the output of the provided functions to compute this (and that's why we provide the tool as a library: to allow people to extend it the way they want).

I tend to fully adhere to the KISS principle when writing such tools: if one needs more functionalities, or different behaviours, either it's a common use-case and in that situation, we may decide to implement it as part of the tool, or it isn't (or it adds technical debt, maintenance burden, or...) and one can write a wrapper with the desired functionalities/behaviours.

Currently a job addition is considered to be a single atomic change (with a single value that contains all of its details). So if you want to use the library to count the number of changes/additions/removals made to steps, you would need to manually process the output of the tool to unwind the value of this atomic change in its more primitive constituents.

If something is added or removed in a workflow file, it is quite obvious that everything below is added or removed. I don't see the value in providing the full list of subpaths/values that were added/removed (the same way that diff reports on block of lines that are added/removed, and not every individual line). Processing the diff is always an option (you can iterate on each block of lines from diff to report on each of them individually; similarly, you can iterate on the keys/items of a dict/list to report on each of the underlying change). I see that as a post-processing that is out of the scope of the tool, even if it will be useful for us at some point.

It is also very likely we'll do a major "2.0.0" update at some point (likely, soon) with a complete rewriting of the tool so that it is easier to manipulate as a library. For what I've in mind currently, I would say we really need a Path object to describe, in a structured way, a path in a workflow file. Such a path object could also be fuzzy (e.g., "jobs.*.steps.uses" to get all Actions), with a match method to find concrete instances of this fuzzy path given a workflow file. We also need a Change object (with potential subclasses for removals, additions, editions, moves, ...). These objects will be associated to Path instances (fuzzy or not), perhaps with a .apply(workflow) method to address #2 (if the path object can be extended to take into account values, encapsulated in another kind of object, we can even use this for automated refactoring, for example to automatically replace "jobs.*.steps.uses = actions/checkout@*" with a newer version of the Action; TL;DR: a kind of xpath for YAML). Then, we can have as_text, as_json, as_tuple, etc. methods to output the change in a specific format (moving the responsibility of displaying the changes to the class itself rather than some top-level code in the CLI). To address what you proposed, I can think of a walk method (on a change) to get all the atomic changes (e.g., when called on a Addition object, returns all "sub-additions" if any). But let's discuss about all of these "2.0.0"-related things later :-)

@pooya-rostami pooya-rostami merged commit 0d4b0a5 into main Nov 14, 2023
10 checks passed
@AlexandreDecan
Copy link
Collaborator Author

What about - -atomic instead of - -verbose to report on all, including nested ones, changes? (if you really want to keep this option as part of the cli). Or - -nested?

@AlexandreDecan AlexandreDecan deleted the AlexandreDecan-patch-1 branch November 15, 2023 10:04
@tommens
Copy link
Collaborator

tommens commented Nov 15, 2023

I am not convinced of the words "--atomic" or "--nested", they do not convince me. What about something like "--detailed"?

@AlexandreDecan
Copy link
Collaborator Author

I'm not convinced by "--detailed" since it does not bring more details: it just unfolds non-atomic changes into atomic ones. That said, I still consider it's a bad idea to have such flag for the CLI (I understand, to some extent, the use-case you mentioned, but IMHO this shouldn't be part of the tool itself, or if you really want to have it in the tool, it should be part of the Python API, not the CLI). I really think we should keep the tool as simple as possible, while allowing (or, said differently, not doing anything to prevent) other developers to make use of it for their own use-case using their own code. If several developers share similar or close use-cases, then we can decide to support these use-cases directly in gawd.

@tommens
Copy link
Collaborator

tommens commented Nov 15, 2023

What about --unfold ? One particular use case for which this unfolded version would be particularly useful (for ourselves) is to detect whether some step is moved from an existing job to a new job (or whether some step is moved from a removed job to an existing job). Currently, such "step moves" are not distinguishable from "step additions" or "step removals" if one does not check INSIDE the context of a job.

@AlexandreDecan
Copy link
Collaborator Author

Unfolding is a post-processing on the changes that are reported by the tool. The use-case you indicate is kinda specific: one could also consider moving step parameters, moving environment variables (between jobs, between steps, from a step to a job, globally, etc.) in which cases the unfolding won't be helpful.

I'm not saying that the use-case you mention isn't interesting, but I think we shouldn't support it directly in the tool (or at least, not advertise it for the CLI part). On the other hand, we can mention it (and explain how to deal with it) in the documentation, perhaps in a "Recipes" section (that can also include the solution to integrate gawd into git, as mentioned in #1).

In any case, if we decide to keep it in the tool:

  • If we keep it in the CLI, I'm ok with --unfold as a flag to "unfold non-atomic additions and removals into atomic ones";
  • If we keep it in the Python API: the implementation should be reworked:
    • Python code is not consistent (I suggested to use the black formatter to avoid this, and I already fixed some lines);
    • Paths are not handled consistently (there's an hard-coded check to see whether we are in a step to use the bracket notation for positions, while this should apply to all sequences, not only for steps);
    • None of the two functions are documented ('flatten is quite obscure; creating_verbose_version is.. erh..);
    • We should decide whether the unfolded additions (or removals) should be reported only for leaves, or for the whole subtree (i.e., if a job X if added, should we only mention an "added" change for each path leading to a value (e.g., X.name, X.steps[0].uses, X.steps[0].name, ...), or should we also mention an "added" change for X, for X.steps, X.steps[0], ...). I'm in favour of the second option, which seems to be the current behaviour (even if https://github.com/pooya-rostami/gawd/blob/main/gawd/__init__.py#L312 and https://github.com/pooya-rostami/gawd/blob/main/gawd/__init__.py#L314 seem to suggest that some internal nodes could be added as well, I don't think that those conditional statements can be reached by a proper GHA workflow file).
    • We should have tests for it.

@AlexandreDecan
Copy link
Collaborator Author

Two things I forgot:

  • Other possibilities than --unfold: --full, to display the full list of changes, including those affecting subtrees; --expand to expand the list of changes so that each subpath affected by a change is reported as well; --flat to return a flat list of changes instead of combining added/removed trees into a single change.
  • What do we do for "changed" ? Currently, "changed" is only reported for paths leading to leaves (because this is how we map subtrees, and it adds little value to report that intermediate nodes have been "changed" because a subpath has been changed) but for consistency, perhaps we should propagate "changed" for all paths from a changed leaf to the root? I'm not saying this is interesting to implement/report, it's just a matter of consistency: something that is added or removed means that everything below (i.e., "descendants") is added/removed, similarly, something that is changed means that everything above (i.e., "ancestors") has changed (and in that case, do we consider that "added", "removed", "renamed" and "moved" also changed the ancestors?).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants