Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tags are one commit early #9282

Open
Beuc opened this issue Aug 21, 2019 · 13 comments

Comments

@Beuc
Copy link
Contributor

commented Aug 21, 2019

Hi,

From 1.38.33, checking out a tagged version of emscripten comes with an emscripten-version.txt containing 1.38.[version-1] (1.38.33 -> 1.38.32; 1.38.42 -> 1.38.41...).
This confuses emscripten, it will emit warnings e.g. "shared:WARNING: (Emscripten: system change: 1.38.32..." for 1.38.33 with the wrong version number.

emscripten-version.txt is usually correct in the next commit - is that a bug or a feature ? :)

(emsdk overwrites emscripten-version AFAICS, but the problem is present when compiling from source following emscripten.org's instructions)

@kripken

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

This is a downside of the current versioning method - slightly annoying, yeah. The key issue is that the source of truth for what is a revision now depends not just on the emscripten repo, but whether all relevant repos (emscripten, llvm, binaryen) pass all the tests together. That lets us know that our releases work properly :) but it does mean that after we "certify" a set of commits across the repos we then only update emscripten-version.txt after the fact.

See https://github.com/emscripten-core/emscripten/blob/incoming/docs/process.md#minor-version-updates-1xy-to-1xy1 (around "delayed" in particular).

This is probably only noticeable when bisecting, and shouldn't break bisection, but it can be odd...

@Beuc

This comment has been minimized.

Copy link
Contributor Author

commented Aug 21, 2019

I'm not used to have a project report a wrong version when compiled from source, TBH.
I wouldn't be surprised if this leads to ambiguous reports and downstream issues :/
What? Not my problem? Yeah, OK ;)

@kripken

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

You're definitely right, yeah - I wish there was a better way.

Hard to fix given git history is immutable + the decision of what to tag is between multiple repos. Maybe we need a monorepo ;)

So far there hasn't been a misreported version in the bug tracker. I think 99% of users don't use emscripten from git. edit: removed inaccurate statement

@dschuff

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

I guess the issue is that any arbitrary combination of commits that the release builder tests is a candidate for release, but only those combinations have prebuilt binaries. But Doesn't emsdk only use the hash in the emscripten-releases repo? Does it still actually use the emscripten tags for any purpose?
If the answer is no, we could:

  1. close the tree (i.e. no commits allowed) until a particular release candidate finishes testing;
  2. add a text-only commit and tag it. This would mean the tag wouldn't match the emscripten-releases DEPS, but it shouldn't affect correctness.
@kripken

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

Correct, the emsdk only uses the hashes in the releases repo. emscripten-version.txt and the tags in the emscripten repo are just for convenience for people using that repo in git.

We do in practice kind of close the tree - I always say I am tagging before doing so, and it seems to work as people don't land stuff in the middle :) but there is still at least 1 commit difference, as the new commit updating emscripten-version.txt is after the "right" one that the CI tested across all the repos. And the tag is in fact the right one. So those two are at least 1 commit off.

Yeah, tagging the commit after, so it matches the .txt file, is an option. But then it wouldn't match the releases hashes, which can also be confusing. Not sure which is better.

@sbc100

This comment has been minimized.

Copy link
Collaborator

commented Aug 21, 2019

This makes me sad.. but I don't have the answer 😢

@hujiajie

This comment has been minimized.

Copy link
Contributor

commented Aug 22, 2019

Yeah, tagging the commit after, so it matches the .txt file, is an option. But then it wouldn't match the releases hashes, which can also be confusing. Not sure which is better.

I'm a little confused here. Why not update the hashes in emsdk after tagging so that they are matched?

@kripken

This comment has been minimized.

Copy link
Member

commented Aug 22, 2019

@hujiajie

The hashes of the combinations that passed the tests are also in a git repo. So it's the same problem again, we'd have to do a force-push to rewrite history.

(Having them in a git repo there is very convenient for bisection across repos, etc.)

Also, the update would necessarily be a commit that was not checked. It's true that it's a nearby commit that maybe only changes the emscripten-version.txt and the changelog, but in theory, something could go wrong there.

@hujiajie

This comment has been minimized.

Copy link
Contributor

commented Aug 22, 2019

@kripken Thanks for the quick reply!

The hashes of the combinations that passed the tests are also in a git repo. So it's the same problem again, we'd have to do a force-push to rewrite history.

I suppose the hashes are for gclient, but I still don't quite understand why a force-push is required. Do you mean a force-push to the emscripten-releases repo or a force-push to any of the github repos? Sorry if I missed something in your workflow.

Also, the update would necessarily be a commit that was not checked. It's true that it's a nearby commit that maybe only changes the emscripten-version.txt and the changelog, but in theory, something could go wrong there.

I'm not sure if this is feasible on the bot:

  1. sync repos
  2. change emscripten-version.txt and create a new local commit for this change on the bot side
  3. run tests
  4. if everything goes well, create a new release branch on github and push the new local commit to this release branch, and also tag the new commit there

This may cause some other oddness though.

BTW, how can you close the tree? Will all collaborators with merge permission be informed or only the googlers?

@Beuc

This comment has been minimized.

Copy link
Contributor Author

commented Aug 22, 2019

In other projects I've worked on, the version bump is done a few commits before the final release / tag.
Incidentally this allows all tests to run with the target version number.

(also, IMHO making life more difficult for people who compile from source is deterring contribution and downstream packaging :/)

@kripken

This comment has been minimized.

Copy link
Member

commented Aug 22, 2019

@hujiajie

I suppose the hashes are for gclient, but I still don't quite understand why a force-push is required.

Let's say that the automatic test infrastructure finds that the set of commits (x,y,z) (x in repo 1 - say, emscripten, y in repo 2 - say, llvm, z in repo 3 - say, binaryen) pass all tests. The bot then adds a commit C to the repo that has all the tested-as-good combinations, and calls it version N. So checking out C in that repo gives you a file that says version N is (x,y,z).

If we want to note that fact in repo 1 (emscripten), we can easily add a tag, since it's just metadata. But if we add a commit that changes a file saying "this repo is now at version N" (like emscripten-version.txt) then we have a problem. Adding such a commit creates a new commit x' (!= x). We kind of want to say that's the canonical commit for that version, and not the tested one x, but (1) it wasn't tested, and (2) the only way to change things in the repo with version N is (x,y,z) would be to force-push it to say version N is (x',y,z).

change emscripten-version.txt

Yeah, we can imagine some atomic operation where the bot pushes a commit like that, then tests that combination. If it fails, though, then we've added a commit with a non-existent version. If many attempts are required before a good commit is found, that will be messy. It also means a complete closure of the tree while the bot tries this (several hours, compared to right now, which we in effect close the tree for a few minutes (since the hours-long testing is already behind us, we know what is a good set of commits).

BTW, how can you close the tree? Will all collaborators with merge permission be informed or only the googlers?

Well, it's not crucial - in the worst case, a commit might fall through. We diverge by 1 commit at least anyhow. So I haven't been super-careful about this. I sometimes mention it on irc, sometimes on internal chat, and sometimes not at all if it's like 6am and I assume no one is up ;) I don't think I've been consistent either way.

@Beuc

In other projects I've worked on, the version bump is done a few commits before the final release / tag.
Incidentally this allows all tests to run with the target version number.

Interesting, yeah, maybe pushing the version number first is better. That does require a more sophisticated atomic process though, as mentioned above, and has the problems with what happens if the tests fail, etc.

Another option here that we've considered is to remove emscripten-version.txt. It's really the problem here, since adding a metadata tag works great.

  • The emsdk already creates emscripten-version.txt when it checks out a version.
  • If someone is using git, perhaps emcc could populate the version based on the git history (find the closest tag, etc.). But that's not easy, especially if someone moves around between commits a lot (should emcc constantly check the git hash and guess at the version?), I don't have a great idea here.
  • Another option is not having a emscripten-version.txt if emscripten wasn't created by the emsdk. It could just say "dev version". That would be a breaking change, though, for example the C defines for the version number would no longer be numeric, etc. - I experimented with this and saw a bunch of breakage on the test suite which scared me.

So overall it's hard to see a better solution here, without a complex atomic process for version releasing, or without a breaking change?

@hujiajie

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2019

So checking out C in that repo gives you a file that says version N is (x,y,z)

I checked out C (emscripten-releases, IIUC) and just saw (x,y,z) in DEPS, but where is N (1.38.Y)? If N comes from emsdk, then considering how the hashes are rolled in DEPS, I can understand the risk now.

Yeah, we can imagine some atomic operation where the bot pushes a commit like that, then tests that combination.

I'm not sure if there's any misunderstanding here. Let's say the branch head of emscripten-core/emscripten is C and emscripten-core/emscripten-fastcomp is at F now:

emscripten          ... -> A -> B -> C

emscripten-fastcomp ... -> D -> E -> F

so C and F are recorded in the latest emscripten-releases repo. The bot checks out repos according to that and finds 1.X.Y in emscripten-version.txt. Then it bumps the version number and creates the new commits C' and F' for that, so the git history on the bot looks like this now:

emscripten          ... -> A -> B -> C
                                      \
                                       C'

emscripten-fastcomp ... -> D -> E -> F
                                      \
                                       F'

Note C' and F' are local (maybe in detached state) and not pushed to any public repo yet. The bot starts testing now to see if the combination of (C',F') is good. If the trees are left open in the next few hours, let's assume X, Y and Z are committed during this period:

emscripten          ... -> A -> B -> C -> X -> Y

emscripten-fastcomp ... -> D -> E -> F -> Z

At this point, the test finishes. If (C',F') is broken, then nothing will happen. Otherwise, C' and F' are pushed to a new release branch separately and they are tagged:

emscripten          ... -> A -> B -> C -> X -> Y
                                      \
                                       C' [tag 1.X.(Y+1)]

emscripten-fastcomp ... -> D -> E -> F -> Z
                                      \
                                       F' [tag 1.X.(Y+1)]

So at least the tag should match with emscripten-version.txt now, and we know (C',F') has been well tested. The limitation is that emscripten-version.txt in X, Y or Z still shows 1.X.Y, and the development branch still needs an extra commit to update emscripten-version.txt.

I forgot to take emsdk into consideration in my previous comment, but maybe it can still use the hash number in emscripten-releases that refers to (C,F). If so, I guess binaries for (C,F) will be downloaded, and these binaries should be identical to (C',F'). The version file may still show 1.X.Y, and that's the only difference between (C,F) and (C',F'), so I think that can be easily patched after downloading.

Thank you very much for taking time to read this. Probably it's too complex and does not bring much benefit.

@kripken

This comment has been minimized.

Copy link
Member

commented Aug 26, 2019

(N comes from the emsdk, yes, I slightly simplified in my examples before...)

Thanks @hujiajie, yeah, I think that could work. That does break the "every interesting commit on the main branch is in linear order" (which is slightly annoying but not the end of the world), and also maybe emscripten-version may be updated more than once if there are commits in the middle (also not the end of the world, but possibly confusing). Overall it's a nice idea.

The big problem with adopting that is setting up CI. We've set things up as close as possible to the "standard" way that Chromium CI works, to make it as easy as possible to use that infrastructure and get support from the devs. Writing something more custom would have the opposite downside.

Thinking about your idea, maybe a related option is to assume that modifying emscripten-version.txt has no side effects. In that case, maybe we can make the following small modification to the current model:

  • We see that (X, Y, Z) pass tests.
  • Close the tree.
  • Commit emscripten-version.txt to relevant places (the only thing allowed on the closed tree), creating new commits X', Y' (assuming Z is not a repo with a version file).
  • (X', Y', Z) can be the new version, even though it wasn't tested, because the difference is just those version files. We must manually commit it to the releases repo, or otherwise very carefully make sure that the autorollers end up commiting it later.
  • Open the tree.

That last part worries me, and might make this not a great option...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.