Fix incorrect model files on case-insensitive file systems#3148
Fix incorrect model files on case-insensitive file systems#3148
Conversation
This fixes some incorrect model files on case-insensitive file systems when the package names are the same but the capitalization is different. For example, when there are two packages `Volo.Abp.TestApp.MongoDb` and `Volo.Abp.TestApp.MongoDB`, there would be 1 model file for each package. However, on case-insensitive file systems, the second file would overwrite the first file. This results in missing models. This fixes it by canonicalizing the filenames to lowercase and writing all files with the same package name to the same file.
|
|
||
| // Ensure that if a file exists on disk, we use the same capitalization | ||
| // as the original file. | ||
| actualFilenameByCanonicalFilename[canonicalFilename] = filename; |
There was a problem hiding this comment.
I'm wondering what will happen if two files with the same canonicalization already exist on disk.
What I believe will happen is that we'll arbitrarily pick one of them as the true filename. Which one we pick will depend on the order they were listed by codeql resolve extensions when we listed existing packs. We'll merge their data together into one file (unless they contain data for the same method signature, in which the canonical file will win) and write all of this data back out to the canonical file. The non-canonical file will remain untouched.
This could be quite bad actually. When the extension or CodeQL reads in data extensions it'll read data from both the canonical and non-canonical files, but when the extension writes data it'll only update the canonical file. Effectively the non-canonical file will because invisible and untouchable by the model editor, but it'll continue to exist and supply model data to CodeQL.
Do you agree with my understanding here, or have I got it wrong?
I'm not sure what else we could do here. Should we display a warning? Or could we go as far as to delete the other file after we've merged its data into the canonical file?
There was a problem hiding this comment.
I think you're right, it'll just pick one arbitrarily and merge all data into that file. I'm not sure if we should be deleting the file because we could be deleting data that is not in the merged file (this can happen if the CodeQL query doesn't return the method). This can only happen on Linux or on a (non-default) case-sensitive filesystem on Windows/macOS.
I'm not sure if we should be handling this case because having two files with the same canonical filename will already fail on Windows/macOS. For example, if you create a model pack with a.yml and A.yml, I suspect that the CodeQL CLI will already arbitrarily pick one of these files on case-insensitive filesystems. Starting from this PR, we should never create filenames with different capitalizations, and I don't really think this is something the model editor should check.
This bug is also very unlikely to happen in Java since package names are usually lowercase, and even in C# it should be very unlikely. The only reason that I ran into this bug is because one database contained multiple test projects with different capitalization.
There was a problem hiding this comment.
This bug is also very unlikely to happen in Java since package names are usually lowercase, and even in C# it should be very unlikely.
That's a good point. It is unlikely to happen in practice. It's likely only to happen if people manually create a conflicting file. Given we don't have many users right now and it's unlikely someone hit this already I think we can ignore it for now.
we could be deleting data that is not in the merged file
Doesn't change our decision, but I don't think this is true. For every method we're saving we'll load the full contents of any files that canonicalise that way, and then write the canonical files back out again.
Anyway, let's move on for now. We're agreed we want to do this transformation to the canonical file. If any users hit bugs or confusion around this then we can add extra logging or warnings later on.
robertbrignull
left a comment
There was a problem hiding this comment.
This LGTM as a fix for case-insensitive file systems.
There could still be edge cases. We could cover this in more unit tests. But maybe we should avoid getting bogged down and instead just go ahead and assess further if we see bugs.
This fixes some incorrect model files on case-insensitive file systems when the package names are the same but the capitalization is different.
For example, when there are two packages
Volo.Abp.TestApp.MongoDbandVolo.Abp.TestApp.MongoDB, there would be 1 model file for each package. However, on case-insensitive file systems, the second file would overwrite the first file. This results in missing models. This fixes it by canonicalizing the filenames to lowercase and writing all files with the same package name to the same file.Checklist
ready-for-doc-reviewlabel there.