SCUMM: Replace UB-triggering serialization code with Common::Serializer #1077
This PR replaces the UB-triggering serialization code used by SCUMM engine with the standard, safe, hack-free Common::Serializer.
During this work I also found several cases where I am not sure what the right thing to do is, and I marked them with TODOs.
It would be helpful for someone with more knowledge to look and say what should be done in these cases.
I tested with all the games I own, loaded various old save games pulled from the bug tracker, and asked someone I know who has a few rarer versions of Zak to test those as well. They all load and appear to work correctly:
maniac-v1 is from the DOTT CD, monkey1 and monkey2 are from the Monkey Island Madness release. Always English releases.
Notably, I don’t have the ability to test the non-PC Maniac Mansions, Loom, or most of the HE games. So if someone could do that, to make sure nothing has been obviously broken, that would be appreciated.
So, the elegant compile-time computation which was working and is working is replaced by this run-time verbose code for the sake of addressing compiler warnings?
Colin, this follows the SCI engine and is an example of simplistic engineering.
The engine is stable and the code is working, so I am against this change. Fix the compiler or silence the warning; this is pointless.
I will review and address the detected inconsistencies when I get some time. Those could be bugs.
Revisiting older engines to replace custom solutions with newer common infrastructure seems like good practice to me. The new code feels safer and easier to debug thanks to the removal of the struct offset computations and use of void *.
I've tested saving and loading with MM C64 demo and The Dig Mac Demo.
@sev-, I really appreciate your willingness to spend some of your valuable time reviewing my patches. I understand and respect that you care deeply about this matter, but I must ask that a less combative tone be used when giving me feedback in the future. The way in which concerns were communicated here left me feeling very disrespected and uninterested in engaging with you, which is not how I want to feel, and I hope not how you want me to feel either.
In order to improve this code (and my coding practices), I need feedback which is specific and objective. I acknowledge your opinion that this new code is less elegant, and I am unable to respond to that feedback productively without more specificity and objectivity regarding how this implementation is worse than what it is replacing.
This code does eliminate a source of UB in ScummVM. Regardless of whether or not this non-conforming code has worked successfully in the past, it is still valid for any conforming compiler to break it, and it does currently crash at least one compiler’s analysis tooling. Subjectively, I prefer to be proactive in ensuring that ScummVM is conforming to standards so that it will not be broken in the future by a conforming compiler or architecture which chooses to behave differently, regardless of how unlikely such a scenario may appear. As I have learned two times in the recent past, implausible-seeming changes can and do happen, and scrambling to fix a defect in a critical subsystem like this after it’s broken seems like a worse idea to me than slowly, carefully, and deliberately fixing it today.
With regards to verbosity, objectively I can only measure SLOC and the amount of repetition in the new code, and by those measurements this patch reduces SLOC and eliminates additional structures, the repetition of class names, and the repetition of calls to APIs like
I acknowledge and share a concern that this is a relatively significant change to save code, which is always the riskiest place to make changes as it can inadvertently cause corruption of game progress. I still feel that fixing this code is the correct course of action, for the reasons I have explained. This change is being intentionally introduced at the beginning of a release cycle since it ensures that if there is a mistake there is a maximal amount of time for users to find and report any bugs prior to another release. The amount of testing I performed prior to submitting this PR for review was done in order to reduce the risk of any breakage of old save games which would not be noticed by users during normal play. The common code which is being used to replace the engine-specific serialisation code is itself well-used and well-tested, so is unlikely to be a source of new bugs.
If you are willing to continue to work with me on this PR, I would appreciate it if you could take another look and offer some specific and measurable examples of problems with the code, tell me why they are problems, and why the old code does it better. If not, I completely understand, and will defer to the feedback of the rest of the group on whether or how this patch can be improved.
Thank you again for your feedback and consideration, and I hope that I have explained things clearly here in a way that allows a productive discussion to continue.
I explained the primary drawback which this patch introduces: now all the code is runtime, while the previous one was compile-time.
Operating on void pointers has always been a core feature of the language, and in this particular case it is used to calculate the relative position of a class/struct member. That is logical code and does exactly what is needed: we do not care about types, we only need the numbers that represent offsets.
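For readers following along, the offset-table idea described here can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the SCUMM engine's actual code: `ObjectDemo`, `FieldEntry`, `kEndOfTable`, `saveFields`, and `loadFields` are hypothetical names, and the standard `offsetof` macro stands in for whatever offset computation the engine used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a game object; not the real SCUMM ObjectData.
struct ObjectDemo {
    uint32_t imOffset;
    uint32_t cdOffset;
    int16_t walkX;
};

// One table entry: where the member lives and how many bytes it occupies.
struct FieldEntry {
    uint16_t offs;  // byte offset of the member inside the struct
    uint8_t size;   // size of the member in bytes
};

const uint16_t kEndOfTable = 0xFFFF;  // sentinel marking the end of a table

// The offsets are compile-time constants, but the table is interpreted at run time.
const FieldEntry kObjectDemoFields[] = {
    {offsetof(ObjectDemo, imOffset), sizeof(uint32_t)},
    {offsetof(ObjectDemo, cdOffset), sizeof(uint32_t)},
    {offsetof(ObjectDemo, walkX), sizeof(int16_t)},
    {kEndOfTable, 0}
};

// Walk the table and append each member's raw bytes to `out`.
// Bytes are copied in host byte order; a real save format would fix the endianness.
void saveFields(const void *base, const FieldEntry *table, std::vector<uint8_t> &out) {
    assert(base != nullptr);
    const uint8_t *bytes = static_cast<const uint8_t *>(base);
    for (const FieldEntry *e = table; e->offs != kEndOfTable; ++e)
        out.insert(out.end(), bytes + e->offs, bytes + e->offs + e->size);
}

// Inverse of saveFields: copy bytes from `in` back into the listed members.
void loadFields(void *base, const FieldEntry *table, const std::vector<uint8_t> &in) {
    assert(base != nullptr);
    uint8_t *bytes = static_cast<uint8_t *>(base);
    std::size_t pos = 0;
    for (const FieldEntry *e = table; e->offs != kEndOfTable; ++e) {
        assert(pos + e->size <= in.size());
        std::memcpy(bytes + e->offs, in.data() + pos, e->size);
        pos += e->size;
    }
}
```

Note that the type information is lost once a member is reduced to an offset and a size, so the loop has to trust the table entirely; that trade-off is what the rest of this discussion is about.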
Now, there is not a single issue in production or in the wild. Only some obscure tool is crashing, or perhaps just complaining (I am not sure what exactly is happening there), and this tool is not used by anyone else. This is why I think it is better to fix the tool.
I was hoping that you would come up with some clever way of calculating the same pointers at compile time that would make your tool happy. At this moment that is unfortunately not the case, and you used a generic approach which is used by many engines, including SCI, whose codebase you are very well versed in. But I would rather not touch a crucial part of a stable engine for the sake of fixing a non-existent issue.
In the past, I was a proponent of fixing "all Coverity reports", and I am grateful to my teammates for keeping me out of that and spending their time and energy on explaining it to me. What is happening here is very similar to that story in my opinion, but now I am trying to keep you out of fixing issues detected by some static code analyser.
As for the disrespect, that was not my intention, nor do I believe I made any comments, in public or in private, that could suggest otherwise. If you're referring to my words about the "simplistic engineering", that is my evaluation of the approach, not of your abilities or personality.
This is exactly the opposite of what is happening.
In the new code, calculation of all information needed to serialise an object member is now performed automatically by the compiler at compile time. One need only look at the disassembly generated by the compiler to see it is so. From
    .loc […] ## engines/scumm/saveload.cpp:800:4
    mov edx, 8
    mov ecx, -1
    mov rdi, r13
    mov rsi, rbx # od.OBIMoffset
    call __ZN6Common10Serializer14syncAsUint32LEIjEEvRT_jj
    .loc […] ## engines/scumm/saveload.cpp:801:4
    lea rsi, [rbx + 4] # od.OBCDoffset
    mov edx, 8
    mov ecx, -1
    mov rdi, r13
    call __ZN6Common10Serializer14syncAsUint32LEIjEEvRT_jj
    .loc […] ## engines/scumm/saveload.cpp:802:4
    lea rsi, [rbx + 8] # od.walk_x
    mov edx, 8
    mov ecx, -1
    mov rdi, r13
    call __ZN6Common10Serializer14syncAsUint16LEIsEEvRT_jj
    # […et cetera…]
The key things to note here are that this code is not doing any branching on each property to find the correct serializer, as must be done in the old
    .loc […] ## engines/scumm/saveload.cpp:1939:14
    mov r8d, dword ptr [rdx] # extra load of sle->offs
    .loc […] ## engines/scumm/saveload.cpp:1939:19
    cmp r8d, 65535 # extra termination check
    .loc […] ## engines/scumm/saveload.cpp:1939:2
    je LBB47_11
    LBB47_1: ## loop, once for each object member
    .loc […] ## engines/scumm/saveload.cpp:1942:22
    mov bl, byte ptr [rdx + 4] # extra load of sle->type
    .loc […] ## engines/scumm/saveload.cpp:1944:23
    cmp byte ptr [rdx + 9], 98 # extra load of sle->maxVersion
    .loc […] ## engines/scumm/saveload.cpp:1944:7
    jne LBB47_2
    .loc […] ## engines/scumm/saveload.cpp:1941:10
    movzx ecx, word ptr [rdx + 6] # extra load of sle->size
    # […et cetera…]
(This is not actually all of the extra code, I got tired of cleaning the disassembly and this makes my point well enough.)
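For anyone who prefers source to disassembly, the direct-call style that the first listing corresponds to can be sketched as follows. This is only an illustration under stated assumptions: `DemoSerializer`, `ObjectDemo`, and `syncObject` are hypothetical stand-ins and not ScummVM's `Common::Serializer`; only the method names and the (value, minimum version, maximum version) argument shape are taken from the mangled symbols and the `8` / `-1` constants visible in the listing.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified serializer with versioned fields.
// It is NOT ScummVM's Common::Serializer; only the calling style is similar.
class DemoSerializer {
public:
    static const uint32_t kLastVersion = 0xFFFFFFFFu;

    // Saving mode: appends to an internal buffer.
    explicit DemoSerializer(uint32_t version)
        : _version(version), _saving(true), _pos(0) {}
    // Loading mode: reads back from a copy of an existing buffer.
    DemoSerializer(uint32_t version, const std::vector<uint8_t> &data)
        : _version(version), _saving(false), _pos(0), _buf(data) {}

    template<typename T>
    void syncAsUint32LE(T &val, uint32_t minVersion = 0, uint32_t maxVersion = kLastVersion) {
        syncLE(val, 4, minVersion, maxVersion);
    }
    template<typename T>
    void syncAsUint16LE(T &val, uint32_t minVersion = 0, uint32_t maxVersion = kLastVersion) {
        syncLE(val, 2, minVersion, maxVersion);
    }
    const std::vector<uint8_t> &data() const { return _buf; }

private:
    template<typename T>
    void syncLE(T &val, unsigned size, uint32_t minVersion, uint32_t maxVersion) {
        if (_version < minVersion || _version > maxVersion)
            return;  // the field does not exist in this save version
        if (_saving) {
            const uint32_t v = static_cast<uint32_t>(val);
            for (unsigned i = 0; i < size; ++i)  // least significant byte first
                _buf.push_back(static_cast<uint8_t>(v >> (8 * i)));
        } else {
            uint32_t v = 0;
            for (unsigned i = 0; i < size; ++i)
                v |= static_cast<uint32_t>(_buf.at(_pos++)) << (8 * i);
            val = static_cast<T>(v);
        }
    }

    uint32_t _version;
    bool _saving;
    std::size_t _pos;
    std::vector<uint8_t> _buf;
};

// Hypothetical object with the same member shapes as in the listing.
struct ObjectDemo {
    uint32_t imOffset;
    uint32_t cdOffset;
    int16_t walkX;
};

// Each member is named directly, so the compiler resolves its address and type.
// The same function is used for both saving and loading.
void syncObject(DemoSerializer &s, ObjectDemo &od) {
    s.syncAsUint32LE(od.imOffset, 8);
    s.syncAsUint32LE(od.cdOffset, 8);
    s.syncAsUint16LE(od.walkX, 8);
}
```

In this style there is no table and no sentinel check; a caller builds a `DemoSerializer` in saving or loading mode and passes it to `syncObject`, and fields whose version range excludes the current save version are skipped.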
So now that you see this new code is actually better at runtime, is this objection resolved to your satisfaction? If there are other objections which I have not addressed in this comment or the previous one, please let me know so that I can consider them. Thanks.
However, at this moment there is a warning:
Colin, could you take a look at it?
...and with the FMTOWNS dual layer code disabled, there are compilation errors:
Thanks for your willingness to move ahead with this, I appreciate it.
Sorry about those warnings. create_project still does not set up the same warning flags for Xcode projects as for GNU Make, and I generate new projects so infrequently that I forget that I need to fix them (as when I generated a new project to work on the SCUMM engine). I’ve added it to my list of things to look into so this doesn’t happen to other Xcode users in future.
I’ve pushed a fix for the