Revamp Source Code event structure #261

Open
sselberg opened this issue May 21, 2021 · 6 comments

@sselberg

sselberg commented May 21, 2021

I'm mostly versed in git and not so much in other SCM tools, so the examples use git.

Description

The current Source Code events seem to be modeled around Gerrit's "patchset-created" and "change-merged" events and aren't optimal for describing the various states of a Source Code tree.
For instance, there are no events describing Source Code States (labeled by, for instance, a git branch or git tag).

For git in particular, one of the biggest issues is that a new branch is extremely cheap in git but potentially extremely expensive in Eiffel if you wish to model the git commit graph with Eiffel.
Creating a branch in git simply adds a reference to a commit, so it's O(1) regardless of how big the commit graph is, whereas in Eiffel 2 events are created for every commit reachable from the branch (since the SCC and SCS don't describe a commit but the relationship a commit has to a branch, and an SCS needs an SCC link).
The table below compares the number of events created by the current and proposed structures when a new branch is created in a big repo like the Linux kernel:

| Event structure | Number of commits reachable from branch | Number of events created |
| --- | --- | --- |
| Current | 800,000 | 1,600,000 |
| Proposed | 800,000 | 1 |
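
The arithmetic behind this comparison can be sketched with a toy commit graph. This is only an illustration, assuming (as described above) that the current structure needs one SCC and one SCS per reachable commit while the proposed structure needs a single event for the new branch. The graph, the `refs` dictionary and all function names are made up for this example and are not part of git or Eiffel.

```python
# Toy commit graph (hypothetical): commit id -> list of parent commit ids.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c2"], "c5": ["c3", "c4"]}
refs = {"refs/heads/master": "c5"}


def create_branch(refs, name, commit_id):
    """In git, creating a branch only adds one reference, independent of graph size."""
    refs[f"refs/heads/{name}"] = commit_id


def reachable(parents, tip):
    """Return the set of commits reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def events_current(parents, tip):
    """Current structure: one SCC and one SCS per commit reachable from the branch."""
    return 2 * len(reachable(parents, tip))


def events_proposed():
    """Proposed structure: a single event describing the new branch."""
    return 1


create_branch(refs, "release-1.0", "c5")
print(events_current(parents, "c5"), events_proposed())
```

With 800,000 reachable commits instead of 5, the same calculation gives the 1,600,000 versus 1 shown in the table.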

Motivation

To make the modelling of Source Code life-cycle more consistent and flexible.

Exemplification

There are certain parts that are interesting when describing the lifecycle of Source Code:

| Term | Description | git equivalent |
| --- | --- | --- |
| Source Code Change | Delta between old and current state | git commit |
| Source Code Definition | Label that defines the meaning of a Source Code State | git branch, git tag |
| Source Code Definition Change Request | A request to update a mutable Source Code Definition | Request to update a branch-pointer to a new commit (Gerrit change, GitHub PR, GitLab MR...) |

To model this in Eiffel you could use the following events:
(The data fields in the following examples are not meant to be an exhaustive list, but merely an example of possible fields)

SourceCodeChangeCreated:

For git this would be equivalent to someone pushing a commit to a git server, and it would look something like the current SourceChangeCreated#gitIdentifier minus the branch field.
Data fields:

  • gitIdentifier
    • commitId
    • repoName
    • details (for Gerrit this could be the REST EndPoint that gives you a commit description)

Links:

  • PREVIOUS_VERSION: [1..many] SourceCodeChangeCreated (equivalent to git-commit-parents)

SourceDefinitionChangeRequestCreated

Data fields:

  • changeRequest
    • targetBranch
    • changeIdentifier
    • requester|changeOwner

Links:

  • CHANGE: [1..many] SourceCodeChangeCreated
  • PREVIOUS_VERSION: [1] SourceDefinitionChangeRequestCreated (a link to the SDCR corresponding to the previous patch-set of the change or the previous state of the PR/MR) (relevance?).

SourceDefinitionUpdated

For updates of mutable definitions (f.i. git-branches)
Data fields:

  • branch (f.i. "master")
  • reference (f.i. "refs/heads/master")
  • Updater

Links:

  • CHANGE: [0..many] SourceDefinitionChangeRequestCreated (the change requests that were included in the branch update)
  • PREVIOUS_VERSION: [1] SourceDefinitionUpdated (the previous branch update)
  • BASE: [1] SourceCodeChangeCreated

SourceDefinitionCreated

For creation of immutable definitions (f.i. git-tags)
Data fields:

  • tag (f.i. "v3.2.1")
  • reference (f.i. "refs/tags/v3.2.1")
  • Creator

Links:

  • BASE: [1] SourceCodeChangeCreated
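
To make the proposed events a bit more concrete, here is a minimal sketch of three of them as plain Python objects. It is not a schema: the `Event` class, the helper functions and the `{"type", "target"}` link shape are assumptions made for illustration, SourceDefinitionChangeRequestCreated and the Updater/Creator fields are left out for brevity, and the PREVIOUS_VERSION link on a branch update is treated as optional so that a first update can be expressed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence
import uuid


@dataclass
class Event:
    """Simplified, hypothetical event: a type, a data payload and typed links."""
    type: str
    data: Dict[str, object]
    links: List[Dict[str, str]] = field(default_factory=list)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


def link(link_type: str, target: Event) -> Dict[str, str]:
    """Build a link record pointing at the id of another event."""
    return {"type": link_type, "target": target.id}


def source_code_change_created(commit_id: str, repo_name: str,
                               parents: Sequence[Event] = ()) -> Event:
    """One event per commit; PREVIOUS_VERSION links mirror the git commit parents."""
    data = {"gitIdentifier": {"commitId": commit_id, "repoName": repo_name}}
    return Event("SourceCodeChangeCreated", data,
                 [link("PREVIOUS_VERSION", p) for p in parents])


def source_definition_updated(branch: str, base: Event,
                              previous: Optional[Event] = None,
                              change_requests: Sequence[Event] = ()) -> Event:
    """A mutable definition (branch) now points at the commit event `base`."""
    data = {"branch": branch, "reference": f"refs/heads/{branch}"}
    links = [link("BASE", base)]
    if previous is not None:
        links.append(link("PREVIOUS_VERSION", previous))
    links += [link("CHANGE", cr) for cr in change_requests]
    return Event("SourceDefinitionUpdated", data, links)


def source_definition_created(tag: str, base: Event) -> Event:
    """An immutable definition (tag) pointing at the commit event `base`."""
    data = {"tag": tag, "reference": f"refs/tags/{tag}"}
    return Event("SourceDefinitionCreated", data, [link("BASE", base)])


# Usage with made-up commit ids and repo name.
root = source_code_change_created("a1b2c3", "example-repo")
child = source_code_change_created("d4e5f6", "example-repo", parents=[root])
branch_update = source_definition_updated("master", base=child)
tag = source_definition_created("v3.2.1", base=child)
```

Note that the branch and the tag both resolve to exactly one commit event through their BASE link, which is what keeps branch and tag creation independent of the size of the commit graph.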

Use cases

User creates two changes that are submitted

  1. User pushes a patch-set and creates a new change
    Events created: SCCC1, SDCR1
  2. User pushes a new patch-set for an existing change
    Events created: SCCC2, SDCR2
  3. User pushes a patch-set and creates a new change on top of the previous change
    Events created: SCCC3, SDCR3
  4. Both changes are merged
    Events created: SDU1
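
One possible reading of the links between these events is sketched below. The link targets are an interpretation, not taken from the diagrams: it assumes a pre-existing branch head (SCCC0) and branch update (SDU0), that the second change is based on the latest patch-set of the first, and that the merge is a fast-forward without a merge commit. The `event` helper is hypothetical.

```python
def event(name, event_type, **links):
    """Build a minimal event record; keyword arguments map link type to target names."""
    return {"name": name, "type": event_type, "links": links}


# Assumed pre-existing state: the branch head commit and the last branch update.
sccc0 = event("SCCC0", "SourceCodeChangeCreated")
sdu0 = event("SDU0", "SourceDefinitionUpdated", BASE=["SCCC0"])

created = [
    # Step 1: first patch-set of a new change.
    event("SCCC1", "SourceCodeChangeCreated", PREVIOUS_VERSION=["SCCC0"]),
    event("SDCR1", "SourceDefinitionChangeRequestCreated", CHANGE=["SCCC1"]),
    # Step 2: second patch-set of the same change (an amended commit, same parent).
    event("SCCC2", "SourceCodeChangeCreated", PREVIOUS_VERSION=["SCCC0"]),
    event("SDCR2", "SourceDefinitionChangeRequestCreated",
          CHANGE=["SCCC2"], PREVIOUS_VERSION=["SDCR1"]),
    # Step 3: a new change stacked on top of the previous change.
    event("SCCC3", "SourceCodeChangeCreated", PREVIOUS_VERSION=["SCCC2"]),
    event("SDCR3", "SourceDefinitionChangeRequestCreated", CHANGE=["SCCC3"]),
    # Step 4: both changes are merged with a single branch update.
    event("SDU1", "SourceDefinitionUpdated",
          CHANGE=["SDCR2", "SDCR3"], BASE=["SCCC3"], PREVIOUS_VERSION=["SDU0"]),
]

for e in created:
    print(e["name"], e["links"])
```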

Proposed structure: [figure: merge-2-changes-new]
Current structure: [figure: merge-2-changes-old]

User updates a branch from upstream

  1. User pushes a stack of commits to copy an upstream repo
    Events created: ...SCCCn-1, SCCCn, SDU1

Proposed structure: [figure: update-from-upstream-new]
Current structure: [figure: update-from-upstream-old]

User creates a new branch from existing commit

  1. User pushes existing commit to new branch
    Events created: SDU1

Proposed structure: [figure: new-branch-from-existing-commit-new]
Current structure: [figure: new-branch-from-existing-commit-old]

Benefits

It would significantly cut down on the number of events necessary for modelling the Source Code life-cycle.
To model Source Code you currently need, for each git commit, (SCC + SCS) × (number of branches that the commit is reachable from) events.
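
Written out as a formula, the current cost could be expressed as follows. The notation is introduced here for illustration only: C is the set of commits, B the set of branches, B(c) the set of branches from which commit c is reachable, and R(b) the set of commits reachable from branch b. The factor 2 stands for the SCC + SCS pair.

```latex
N_{\text{current}} \;=\; \sum_{c \in C} 2\,\lvert B(c) \rvert \;=\; 2 \sum_{b \in B} \lvert R(b) \rvert
```

Both sums count the same (commit, branch) pairs, once grouped per commit and once per branch, so every additional branch adds twice the number of commits reachable from it.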

Possible Drawbacks

It requires a total revamp of the current SCM event structure.

TODO

If we rewrite the source code event structure we should do that with Hierarchical Source Code Structures (f.i. git submodules) in mind.

sselberg changed the title from "WIP: Revamp Source Code event structure" to "Revamp Source Code event structure" on Jun 15, 2021
@sselberg
Author

sselberg commented Feb 2, 2022

I updated the event-graphs describing the difference between new and old event structures.

@erkist

erkist commented Feb 3, 2022

I think it looks good. I cannot come up with any use case I would want to realize that I could not do with these events (together with ConfidenceLevelModifiedEvent, as Magnus suggested in the last community meeting).

Some minor feedback and/or questions (most of them are premature, as this is a proposal and not a spec):

SourceDefinitionX. I would prefer SourceReferenceX. The term "definition" implies to me "what is it" whereas reference is "where is it", and I think the events are more representing the "where" part.

SourceDefinitionCreated/Updated. I think Eiffel generally uses the term Created for something that later Submitted/Published events will link to, but in this proposal these two events are completely separate and are not allowed to link between each other. I don't have good ideas for new names, so I will give my bad proposals: SourceImmutableReferenceCreated, SourceMutableReferenceModified

As discussed in the community meeting, SourceDefinitionUpdated needs to allow 0..1 previous links, as the previous link won't exist when the branch is first created. The same is true for SourceCodeChangeCreated, to allow for the creation of new repos with only one initial commit. And for SourceDefinitionChangeRequestCreated to be able to create the first patch set.
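
The relaxed multiplicities could be captured in a small rule table, sketched below. This is only an illustration of the suggestion, not a schema: `LINK_RULES` and `validate_links` are hypothetical names, and the bounds combine the link lists in the proposal with the 0..1 (or 0..many) relaxations mentioned above.

```python
from collections import Counter

# (min, max) occurrences per link type; None means no upper bound.
LINK_RULES = {
    "SourceCodeChangeCreated": {"PREVIOUS_VERSION": (0, None)},
    "SourceDefinitionChangeRequestCreated": {
        "CHANGE": (1, None),
        "PREVIOUS_VERSION": (0, 1),
    },
    "SourceDefinitionUpdated": {
        "CHANGE": (0, None),
        "PREVIOUS_VERSION": (0, 1),
        "BASE": (1, 1),
    },
    "SourceDefinitionCreated": {"BASE": (1, 1)},
}


def validate_links(event_type, link_types):
    """Return a list of multiplicity violations for an event's link types."""
    rules = LINK_RULES[event_type]
    counts = Counter(link_types)
    errors = []
    for link_type, (low, high) in rules.items():
        n = counts.get(link_type, 0)
        if n < low or (high is not None and n > high):
            errors.append(f"{link_type}: found {n}, expected {low}..{high or 'many'}")
    for link_type in counts:
        if link_type not in rules:
            errors.append(f"{link_type}: not allowed on {event_type}")
    return errors


# A first branch update has a BASE but no PREVIOUS_VERSION, and is accepted.
print(validate_links("SourceDefinitionUpdated", ["BASE"]))
# A branch update without a BASE is rejected.
print(validate_links("SourceDefinitionUpdated", ["PREVIOUS_VERSION"]))
```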

Also as discussed in the community meeting, the flow for when a change results in a merge commit might need a diagram to explain what events would be sent and with what content. Probably also for PRs that are squashed on submit.

I guess the Creator attribute of SourceDefinitionCreated is optional, as e.g. Git lightweight tags don't have a creator.

When it comes to incoming links from other Eiffel events, what are valid targets? For instance CLM, I guess it should be able to point to all of these events? Or only SCCC and SDCRC (as SDC and SDU can always be resolved to exactly one SCCC)?

@m-linner-ericsson
Member

I came to think of Git Flow when we talked about branches with special meaning. I guess that is in such a model that you see problems with the current events.

@sselberg
Author

sselberg commented Feb 3, 2022

Not entirely comparable: git-flow was created under the assumption that all you had was git branches, not Gerrit changes, GitHub pull requests, etc.
But typically (and I think this model is the most common) we have two different types of branches that are protected by code-review (i.e. interesting from an integration perspective):

  • master - where all new features are developed
  • release - for bug-fixes on a release track

For the release branches there are different integration policies, for instance release-1.0 might need to be tested on devices a, b and c, whereas release-2.0 might need to be tested against c, d and e.

This is a rather simplified picture, and developers/project-owners/teams are to some extent free to create the branches they feel they need outside of these namespaces. To complicate matters further, not all repositories are governed by the same rule-set w.r.t. branch namespaces, which makes it non-trivial to decide whether we should create events for a branch or not.

magnusbaeck added the "protocol" (All protocol changes) and "protocol-incompat" (Protocol changes that aren't backwards compatible) labels on Nov 18, 2022
@e-backmark-ericsson
Member

@e-backmark-ericsson has agreed to provide a proposal based on SCM changes, from the summit 2023.1

@e-backmark-ericsson
Member

Eiffel Community meeting held on Nov 23. The outcome was that @magnusbaeck et al. will update their proposal with possibilities to describe source change updates, including changes being abandoned.
