For files created in development versions, the content does not always match the file format version number #17846

cbjeukendrup · 2023-06-04T16:47:33Z

This issue describes rather a problem for developers and testers than a problem for users. The fact that we are having this problem with our current policy is obvious; the reason that I'm still logging an issue for it, is to discuss if and how we should improve this.

The problem

Our current policy looks something like this:

While building new features for 4.1, we change the file format, but don't update the File Format Version Number (FFVN) yet, because there might be more changes upcoming.
A helpful user (or one of us) tests the latest nightly build, and creates a score in it.
We make more changes to the file format.
Finally, we update the FFVN.

The score created in step 2 is now in an inconsistent state. Its FFVN is still 400, but the contents of this file will be a mix of (at worst) 3 different versions:

410, because it was created after step 1;
400, because it was created before step 3;
no existing version at all but something between 400 and 410, because some of the changes in step 3 might be revisions of the changes in step 1, rather than independent changes.

Later, when the user opens this score in the released version of 4.1, things don't work properly. The score is detected as corrupted, or causes crashes, or other problems occur...
We spend a lot of time trying to find the causes, and eventually we finally realise that this score is from an unreleased version (because luckily the commit of the MuseScore version is also written to the file). We conclude that we can't do anything and don't need to do anything, but we have wasted a lot of time figuring that out.

How common is this and how problematic is it for users?

Not very, of course, since users are advised to use nightly builds with care and not for real work. But for 4.0, this has happened several times, which caused us (and potentially others) to waste quite some time on useless debugging.

Proposed solution

I propose that we simply increase the FFVN in every pull request that changes any aspect of the file format.

This way, we rule out the possibility of a file being somewhere in-between between two versions.

Problems to overcome

Given our current ideas about the reading code (i.e. creating a full isolated version of the reading code for every existing FFVN), the proposed solution would give two problems (or, better said, it magnifies these problems rather than introducing them):

a lot of work: duplicating the whole reading code every time we change something, even for a simple typo fix, is a lot of tedious work.
a lot of duplication: that doesn't need any explanation, I guess. (Keep in mind that while we could decide that we will never provide bug fixes for reading files from old versions, we will still need to maintain all versions of the reading code every time when making changes to Libmscore's data structures.)

Possible solutions to these problems

Drop the idea of making the reading code for each version completely isolated
Instead, allow different versions to share code. That way, there is no duplication anymore and not a lot of extra work. Only for things that are actually different, there will be different code paths. Reg. how to choose which code path to take: I don't have strong opinions on whether that should be done using virtual methods, if-statements, or something else.
Allow a reader version to work for a range of version numbers, rather than exactly one version number.
This might be seen as "worst of both worlds" though. It will mean that there's still some duplication, and still if-statements that make things complicated.
Explicitly drop support for those never-released file format versions
For example, if the current version number is 408, and we make more changes so the number becomes 409, we drop support for 408 scores (which makes testing development builds slightly less convenient, because the only thing you can do with scores from an old development version is throwing them away, but that might be understandable for the user).

Additional thought

We could create named constants for file version numbers. For example:

static constexpr int FFV_NEW_ORNAMENTS = 408;
static constexpr int FFV_NEW_EXPRESSIONS = 409;
static constexpr int FFV_NEW_ORNAMENTS_SOME_EXTRA_FIX = 410;

...

// When reading expressions:
if (ctx.fileFormatVersion < FFV_NEW_EXPRESSIONS) {
    // read old expression
} else {
    // read new expression
}

...

// When reading ornaments:
if (ctx.fileFormatVersion < FFV_NEW_ORNAMENTS) {
    // read old ornament
} else if (ctx.fileFormatVersion < FFV_NEW_ORNAMENTS_SOME_EXTRA_FIX) {
    // read ornament from some intermediate version
} else {
    // read ornament from latest version
}

(This example does not necessarily correspond to the reality, but you get the idea)

bkunda · 2023-06-07T13:51:51Z

@cbjeukendrup would this maybe be more appropriate as a discussion rather than an issue?
We can definitely raise it in one of our weekly team calls.

cbjeukendrup · 2023-06-07T16:55:22Z

I'd not prefer moving it to Discussions, because Discussions are less convenient and nobody ever looks at them. And this issue is somewhat important for choosing our development policy.

cbjeukendrup · 2023-09-04T17:03:44Z

@igorkorsukov Now that both engraving developers are about to make changes to the file format, this issue becomes actual again.

To summarise: since different changes to the file format are made independently of each other, a score from a development build between two file format changes will not properly conform to any released version of the file format. When opening such a score in the final release, this may lead to strange bugs, which consumes development time.
Therefore, I think we should increase the file format version number after every change that we make to the file format, so that scores from development builds between file format changes also have a file format version number to conform to. That way, we can detect that those scores are from a development version, and decide what to do with them.

Indeed, deciding what to do, is what we need to do now. The "problem" is namely that for every file format version change, we need to create a whole new copy of the reading code. (Although this leads to an ever-growing code duplication and thus also an ever growing compilation time, the advantages are clear: simple code paths, no side effects of changes. So right now I won't propose to change that.)
We need to decide on a policy about older versions. In the original post here, I mentioned some options, but perhaps the best option is to preserve the following versions:

every version that has been the final version in some release (100...114, 200...206, 301, 302, 400, 410)
the latest development version of course (for example, 423)
the previous development versions (for example, 422), so that scores from the previous nightly can still be opened in the latest nightly, so to speak. But since we don't preserve the reading code from two file format versions ago (for example, 421), scores from that version cannot be opened in the latest nightly. However, you can use the previous nightly to convert that score from 421 to 422, so that it can be imported in the latest nightly (which will then save it in 423 format).
(those examples assume that there has been a version change from 422 to 423 between the previous and latest nightly, which is of course not every day the case).

@igorkorsukov Do you think this approach is okay?

igorkorsukov · 2023-09-04T19:22:30Z

@cbjeukendrup

Change file format version

I think one should change the file format version as soon as there are new structure changes.
I think that the version should become the next release, i.e. no need to increment the version for each change, no need to add intermediate versions. Subsequent changes should complement the current new development version.

For example, suppose we now have a file format version of 400
We added new ornaments, made version 410
We added expressions - added changes to version 410 (will also read ornaments)
Added some fixes - added changes to version 410 (will also read ornaments and expressions)
Released version 410

Yes, with this approach, it may happen that files created by the nightly version may not open in the next nightly version if some intermediate fix changes something rather than expands.
We need to understand how important this really is. Users should not expect that files created by nightly versions can be opened in the future, can we somehow warn about this?

By adding support for reading intermediate formats, the main thing that confuses me is that for the short-term convenience of a limited number of people, we complicate the code that will need to be maintained throughout the life of the application.
For example:

3 september: added new ornaments, made version 410
5 sep: a couple of people downloaded the version and created simple scores
6 sep: decided to rename a couple of fields or somehow change the structure of the ornament record
7 sep: in addition to the old implementation, we have added a new implementation of reading ornaments
4 oct: release

So it turns out, for an intermediate implementation of reading ornaments, which lives 4 days and with the it was a couple of simple scores were created - we will support this all our lives in the future. I don't think it's logical.

I think that if we really need to keep reading intermediate versions when they break, then it’s better not to change the file format version, but to add one more field to the file, something like devPatch.
Then we can call the reader version 410 and check in the right places

// When reading expressions:
if (ctx.devPatch < FFV_NEW_EXPRESSIONS) {
    // read old expression
} else {
    // read new expression
}

Then it will be more clear what is what.

Isolated read implementations

This was done mainly so that it was possible to greatly change the file structure, and not just slightly expand the current one. For example, remove the Measure as a top-level container.

a lot of work: duplicating the whole reading code every time we change something, even for a simple typo fix, is a lot of tedious work.
a lot of duplication: that doesn't need any explanation

Here you need explanations, for example, what kind of renaming are we talking about?
If we are talking about code renaming, for example, renaming a method, or a class name, then the IDE can handle it without problems.
If we are talking about renaming a field in a file, then independent implementations work well here, the old ones do not need to be touched, nothing will change in the old files, and the new name should only be in the new implementation.
Let's discuss more cases in which this approach is not convenient.

The old approach, without isolated implementations, confused me mainly because we have a lot of code and don’t even know what it is for, what it is for, i.e. we can only add and expand and cannot change or remove anything.
Well, we cannot change the structure in a significant way, for example, just the relationship between parents and children.

igorkorsukov · 2023-09-04T19:40:35Z

Alternatively, I had this idea:
We can change the file format version into components
Like this:

structureVersion - version of the structure as a whole, the relationship of parents and children
extensionVersion - structure extension version, i.e. the structure does not change, but only expands (in theory, even older versions of the application can read such a file, ignoring new elements)
patchVersion - this is a version of intermediate changes, what we talked about above

Or maybe even better done according to https://semver.org
only patches need to be made so that they are backwards compatible (for example, duplicating the old and new implementations when writing)

Then it is logical to make independent readers for the version of the structure, and inside check for the version of the extension and the patch

cbjeukendrup · 2023-09-04T22:12:24Z

a lot of work: duplicating the whole reading code every time we change something, even for a simple typo fix, is a lot of tedious work.
a lot of duplication: that doesn't need any explanation

Here you need explanations, for example, what kind of renaming are we talking about?

I meant that if we want to change the name of a field in the file, then we need to update the file format version number, and because we need to update the version number, we have to create a whole new copy of the reading code, while the difference between the new version and the old version is only one word.

Another example, as far as I know, read410 is very similar to read400.

In this situation, we might indeed choose to create not a whole new copy, but just update the existing reader to support the version with typo fix too.
However, I'm not sure if that's really convenient in the longer term. What if we later decide that we also want to add bigger changes? Then we still need to create a copy, but we've already modified the original for that typo change, so after copying we need to restore the original to its original state. Not optimal.
And, as I said in the previous comment, when each version is isolated, it's very clear what belongs to what, there are no distracting if statements, and when making changes to the latest version you can be sure that won't have side effects on the existing versions.
On the other hand, having each version isolated does make the codebase a lot bigger, which increases compile time... and versions will doubtlessly have a lot in common so there is code duplication... it's a difficult dilemma.

One thing that we can do: after we have updated the version number to 420 and created a 420 reader, we are certain that 410 is frozen. That means we can safely start identifying pieces of code that are shared between 410 and 400, and let 410 call 400's methods in those places. That way, we still don't risk making changes with side effects, but we also filter out the worst code duplication cases. And we can do the same for 420 once we've moved to the next version.
But not sure if that's a good idea.

I'm not sure if I like the idea of using different version components. That might cause problems because older MuseScore versions won't know about this, and actually I like the simplicity of having just one number that just increases. But it is true that there is benefit in adding a way to distinguish very big file format changes from changes that will still be understood correctly by older versions.
(But again, the problem there is that when beginning the development of a new version, we can't always be sure how big the final changes in that version will be... so we can't always decide if it should be a major version number bump or a minor one.)

(regarding SemVer: note that if we make a patch release but maintain backward compatibility, we would not have had to increase the version number at all 🙂)

But back to the original problem of this issue:

So it turns out, for an intermediate implementation of reading ornaments, which lives 4 days and with the it was a couple of simple scores were created - we will support this all our lives in the future. I don't think it's logical.

That's why I suggested in my previous comment to keep only the latest version, and the previous version during the development. So, in the long term, nothing really changes, but only during development there will be always one intermediate version at a time in order to be able to convert scores from the previous to the latest one.

As an alternative to increasing the entire version number, we can indeed add an additional "dev_version" attribute, that will be increased during development. If this attribute is present in a score, it is from a development build; if it's not, then the score is from a released version. For development builds, each nightly build supports reading files from the current and previous development file format version, so that there is a way to "convert" existing scores. This will be handled by if statements, so we don't create a new copy of the reading code, and we can remove these if statements just before the release when there won't be any other changes.

Perhaps that is indeed the best solution to this issue, since it's so simple.

One more concrete question: what do you think, for the 420 format, should we create new copy of the reading code, or just extend the 410 code? It looks like Michele is doing the second option, in #18722.

igorkorsukov · 2023-09-05T07:38:09Z

I agree that it is not convenient and redundant to create a new copy of the read implementation if we want to add a couple of fields..
But I don’t like it when a lot of ifs appear, when it’s not clear what is what, when it’s easy to break the reading of old formats, and therefore we need to keep the code as it is, even if we don’t know why ...

There is another question here, do we want to make large structural changes to the file within one major version of the application? For example, if we decide to remove a Measure as a top-level container, will we do it in 4th or will we release the 5th version? Although the user will not be affected by this change, it may be necessary for subsequent changes.
(@bkunda probably a question for you :) )

If we do not make large structural changes in one major version, then I think it will be fine to keep one copy of the reading implementation per major version. Although in general, nothing prevents we from creating a copy of the implementation of reading at any time if necessary.

So I see options:

One generic read implementation - I think this is not realistic, even before there were separate implementations for 114, 206 but they were not completely isolated. This approach does not allow us to significantly change the structure of files. And over time there will be too many ifs
Make copies of the read implementation as needed - For example, if we know that we will significantly change the structure, or if too many changes have accumulated. In fact, this is the approach that was before, only it is proposed to completely isolate implementations. I don’t like this option because someone needs to decide ... whether this person will be available at each such moment or not, in practice I’m afraid people will simply expand the current version, fearing to create a new one ... (in fact, the whole libmscore developed like this)
Create an isolated implementation of reading for each major version - without thinking whether it is practically necessary or not, that is, just a rule. It's pretty simple, so it might work.
Make components of version (similar to server) - It is important here to decouple it from the release version of the application. For major versions of the file format, create isolated read implementations. Do extensions and patches in a major implementation.

@cbjeukendrup What do you like or what other options are there?

igorkorsukov · 2023-09-05T07:40:59Z

As an alternative to increasing the entire version number, we can indeed add an additional "dev_version" attribute, that will be increased during development. If this attribute is present in a score, it is from a development build; if it's not, then the score is from a released version. For development builds, each nightly build supports reading files from the current and previous development file format version, so that there is a way to "convert" existing scores. This will be handled by if statements, so we don't create a new copy of the reading code, and we can remove these if statements just before the release when there won't be any other changes.

Perhaps that is indeed the best solution to this issue, since it's so simple.

I like this approach :)

cbjeukendrup assigned igorkorsukov Jun 4, 2023

bkunda assigned RomanPudashkin Jun 7, 2023

bkunda added the P1 Priority: High label Jul 5, 2023

bkunda added this to To do in 4.x SHORTLIST via automation Aug 31, 2023

bkunda unassigned RomanPudashkin Aug 31, 2023

cbjeukendrup self-assigned this Jan 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

For files created in development versions, the content does not always match the file format version number #17846

For files created in development versions, the content does not always match the file format version number #17846

cbjeukendrup commented Jun 4, 2023

bkunda commented Jun 7, 2023

cbjeukendrup commented Jun 7, 2023

cbjeukendrup commented Sep 4, 2023

igorkorsukov commented Sep 4, 2023 •

edited

Loading

igorkorsukov commented Sep 4, 2023

cbjeukendrup commented Sep 4, 2023

igorkorsukov commented Sep 5, 2023

igorkorsukov commented Sep 5, 2023

For files created in development versions, the content does not always match the file format version number #17846

For files created in development versions, the content does not always match the file format version number #17846

Comments

cbjeukendrup commented Jun 4, 2023

The problem

How common is this and how problematic is it for users?

Proposed solution

Problems to overcome

Possible solutions to these problems

Additional thought

bkunda commented Jun 7, 2023

cbjeukendrup commented Jun 7, 2023

cbjeukendrup commented Sep 4, 2023

igorkorsukov commented Sep 4, 2023 • edited Loading

Change file format version

Isolated read implementations

igorkorsukov commented Sep 4, 2023

cbjeukendrup commented Sep 4, 2023

igorkorsukov commented Sep 5, 2023

igorkorsukov commented Sep 5, 2023

igorkorsukov commented Sep 4, 2023 •

edited

Loading