Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SCUMM: Replace UB-triggering serialization code with Common::Serializer #1077

Merged
merged 3 commits into from Jan 31, 2018

Conversation

csnover
Copy link
Member

@csnover csnover commented Dec 1, 2017

Fixes Trac#10342.

This PR replaces the UB-triggering serialization code used by SCUMM engine with the standard, safe, hack-free Common::Serializer.

During this work I also discovered several things where I am not sure what is the right thing to do, where I added TODOs:

  1. In Sprite::saveLoadWithSerializer for save games < version 64 it was trying to serialize a pointer to pointer to SpriteInfo as an array of SpriteInfo. I changed this to serialize whatever is dereferenced from the first pointer, but I have no idea if this makes sense (it at least makes more sense than the previous code which was bogus and only compiled due to the use of void * in the old save code).
  2. In ScummEngine::saveLoadWithSerializer, _grabbedCursor is always serialised as a 8k byte sequence but it is actually 16k and sometimes stores native-endianness int16s.
  3. In ScummEngine::saveLoadWithSerializer, _cyclRects is size 16 but only 10 entries are serialised.
  4. In ScummEngine_v5::saveLoadWithSerializer, _cursorImages is size 4*17 but is serialised as 4*16.

It would be helpful for someone with more knowledge to look and say what should be done in these cases.

I tried testing with all the games I own, pulling various old save games to load from the bug tracker, and asked someone I know with a few rarer versions of Zak to also test that, and they all load and appear to work correctly:

Game ID Save version Ticket # Platform
comi 16 766 dos
comi 28 1408 dos
comi 73 4145 dos
comi 98 10340 dos
dig 16 518 dos
dig 30 1535 dos
dig 59 2488 dos
dig 96 dos
dig 98 dos
ft 9 506 dos
ft 22 1223 dos
ft 59 2422 dos
ft 98 dos
maniac-v1 48 2028 dos
maniac-v1 73 3467 dos
maniac-v1 98 dos
monkey1 9 418 dos
monkey1 9 200 dos
monkey1 29 1572 dos
monkey1 73 4199 dos
monkey1 98   dos
monkey2 9 241 dos
monkey2 22 1224 dos
monkey2 42 2082 dos
monkey2 76 4130 dos
monkey2 9 8064 dos
monkey2 98   dos
pajama-win 94 6448 win
pajama-win 98 win
samnmax 9 248 dos
samnmax 28 1430 dos
samnmax 73 3832 dos
samnmax 97 7175 dos
samnmax 98 dos
tentacle 8 250 dos
tentacle 28 1064 dos
tentacle 68 2729 dos
tentacle 94 7141 dos
tentacle 98 dos
zak 77 3099 fmtowns
zak 80 4683 fmtowns
zak 81 5624 fmtowns
zak 98   fmtowns
zak 98 amiga v2

maniac-v1 is from the DOTT CD, monkey1 and monkey2 are from the Monkey Island Madness release. Always English releases.

Notably, I don’t have the ability to test the non-PC Maniac Mansions, Loom, or most of the HE games. So if someone could do that, to make sure nothing has been obviously broken, that would be appreciated.

@sev-
Copy link
Member

sev- commented Dec 8, 2017

So, the elegant compile-time computation which was working and is working is replaced by this run-time verbose code for the sake of addressing compiler warnings?

Colin, this follows SCI engine and is an example of a simplistic engineering.

The engine is stable, the code is working, I am against this change. Fix the compiler or shut the warning, it is pointless.

I will address and review the detected inconsistencies when I get some time. Those could be the bugs.

@bgK
Copy link
Member

bgK commented Dec 8, 2017

Revisiting older engines to replace custom solutions with newer common infrastructure seems like good practice to me. The new code feels safer and easier to debug thanks to the removal of the struct offset computations and use of void *.

I've tested saving and loading with MM C64 demo and The Dig Mac Demo.

@csnover
Copy link
Member Author

csnover commented Dec 12, 2017

@sev-, I really appreciate your willingness to spend some of your valuable time reviewing my patches. I understand and respect that you care deeply about this matter, and, I must request that in the future a less combative tone be used when writing to me with feedback. The way in which concerns were communicated here left me feeling very disrespected and disinterested in engaging with you, which is not how I want to feel, and I hope not how you want me to feel either.

In order to improve this code (and my coding practices), I need feedback which is specific and objective. I acknowledge your opinion that this new code is less elegant, and I am unable to respond to that feedback productively without more specificity and objectivity regarding how this implementation is worse than what it is replacing.

This code does eliminate a source of UB in ScummVM. Regardless of whether or not this non-conforming code has worked successfully in the past, it is still valid for any conforming compiler to break it, and it does currently crash at least one compiler’s analysis tooling. _Sub_jectively, I prefer to be proactive in ensuring that ScummVM is conforming to standards so that it will not be broken in the future by a conforming compiler or architecture which chooses to behave differently, regardless of how unlikely such a scenario may appear. As I have learned two times in the recent past, implausible-seeming changes can and do happen, and scrambling to fix a defect in a critical subsystem like this after it’s broken seems like a worse idea to me than slowly, carefully, and deliberately fixing it today.

With regards to verbosity, objectively I can only measure SLOC and the amount of repetition in the new code, and by those measurements this patch reduces SLOC and eliminates additional structures, the repetition of class names, and the repetition of calls to APIs like saveLoadEntries, so by all these quantifiable measurements appears to me to be less verbose than the code which it is replacing.

I acknowledge and share a concern that this is a relatively significant change to save code, which is always the riskiest place to make changes as it can inadvertently cause corruption of game progress. I still feel that fixing this code is the correct course of action, for the reasons I have explained. This change is being intentionally introduced at the beginning of a release cycle since it ensures that if there is a mistake there is a maximal amount of time for users to find and report any bugs prior to another release. The amount of testing I performed prior to submitting this PR for review was done in order to reduce the risk of any breakage of old save games which would not be noticed by users during normal play. The common code which is being used to replace the engine-specific serialisation code is itself well-used and well-tested, so is unlikely to be a source of new bugs.

If you are willing to continue to work with me on this PR, I would appreciate it if you could take another look and offer some specific and measurable examples of problems with the code, tell me why they are problems, and why the old code does it better. If not, I completely understand, and will defer to the feedback of the rest of the group on whether or how this patch can be improved.

Thank you again for your feedback and consideration, and I hope that I have explained things clearly here in a way that allows a productive discussion to continue.

@hpvb
Copy link
Contributor

hpvb commented Dec 12, 2017

Hey @csnover what UB was triggered with the old save code? Maybe we can come up with a 'lighter touch' fix if @sev- continues to be worried about a change this large?

@csnover
Copy link
Member Author

csnover commented Dec 12, 2017

@hpvb The UB triggering behaviour here is taking the offset of a member of a non-POD object (the OFFS macro). I am happy to discuss alternatives if anyone can propose some.

@sev-
Copy link
Member

sev- commented Dec 12, 2017

I explained the primary drawback which this patch introduces: now all the code is runtime, while the previous one was compile-time.

Operating void pointers was always the primary feature of the language, and in this particular case is used to calculate the relative position of a class/struct member. That is a logical code and does exactly what is needed. That is, we do not care about types, we need only numbers which represent offsets.

Now, there is no single issue in production, nor in the wild. Only some obscure tool is crashing or even just complaining (I am not sure what is exactly happening there), and this tool is not used by anyone else. This is why I think it is better to fix the tool.

I was hoping that you will come with some tricky way of calculating the same pointers at compile time, which will make your tool happy. At this moment this is unfortunately not the case and you used a generic approach which is used by many engines, including the SCI, which codebase you're very well versed with. But I would rather not touch the crucial part of the stable engine for the sake of fixing a non-existent issue.

In the past, I was a proponent of fixing "all Coverity reports", and I am grateful to my teammates for keeping me out of that and spending their time and energy on explaining it to me. What is happening here is very similar to that story in my opinion, but now I am trying to keep you out of fixing issues detected by some static code analyser.

And as of the disrespect, it is not my intention, neither I believe, I did any comments, both on public and in private which could suggest otherwise. If you're referring to my words about the "simplistic engineering", then it is my evaluation of the approach, not your abilities or personality.

@csnover
Copy link
Member Author

csnover commented Dec 12, 2017

I explained the primary drawback which this patch introduces: now all the code is runtime, while the previous one was compile-time.

This is exactly the opposite of what is happening.

In the new code, calculation of all information needed to serialise an object member is now performed automatically by the compiler at compile time. One need only look at the disassembly generated by the compiler to see it is so. From syncWithSerializer(Common::Serializer &s, ObjectData &od), here is the code it takes to save 3 member properties:

        .loc    [] ## engines/scumm/saveload.cpp:800:4
        mov     edx, 8
        mov     ecx, -1
        mov     rdi, r13
        mov     rsi, rbx # od.OBIMoffset
        call    __ZN6Common10Serializer14syncAsUint32LEIjEEvRT_jj
        .loc    [] ## engines/scumm/saveload.cpp:801:4
        lea     rsi, [rbx + 4] # od.OBCDoffset
        mov     edx, 8
        mov     ecx, -1
        mov     rdi, r13
        call    __ZN6Common10Serializer14syncAsUint32LEIjEEvRT_jj
        .loc    [] ## engines/scumm/saveload.cpp:802:4
        lea     rsi, [rbx + 8] # od.walk_x
        mov     edx, 8
        mov     ecx, -1
        mov     rdi, r13
        call    __ZN6Common10Serializer14syncAsUint16LEIsEEvRT_jj
        # […et cetera…]

The key things to note here are that this code is not doing any branching on each property to find the correct serializer, as must be done in the old saveArrayOf. It is not needing to perform any loop termination check, as must be done in the old saveEntries. It is not needing to do any extra loads from memory to learn offsets, as must be done in the old saveEntries. It is not needing to do any extra checks to decide if a save entry is an array entry, as must be done in the old saveEntries. At runtime, this new code objectively is doing less work than the old code. All of this extra runtime work does not exist in the new code:

        .loc   []  ## engines/scumm/saveload.cpp:1939:14
        mov     r8d, dword ptr [rdx] # extra load of sle->offs
        .loc   []  ## engines/scumm/saveload.cpp:1939:19
        cmp     r8d, 65535 # extra termination check
        .loc   []  ## engines/scumm/saveload.cpp:1939:2
        je      LBB47_11
LBB47_1:                                ## loop, once for each object member
        .loc    [] ## engines/scumm/saveload.cpp:1942:22
        mov     bl, byte ptr [rdx + 4] # extra load of sle->type
        .loc    [] ## engines/scumm/saveload.cpp:1944:23
        cmp     byte ptr [rdx + 9], 98 # extra load of sle->maxVersion
        .loc    [] ## engines/scumm/saveload.cpp:1944:7
        jne     LBB47_2
        .loc    [] ## engines/scumm/saveload.cpp:1941:10
        movzx   ecx, word ptr [rdx + 6] # extra load of sle->size
        # […et cetera…]

(This is not actually all of the extra code, I got tired of cleaning the disassembly and this makes my point well enough.)

So now that you see this new code is actually better at runtime, is this objection resolved to your satisfaction? If there are other objections which I have not addressed in this comment or the previous one, please let me know so that I can consider them. Thanks.

@sev-
Copy link
Member

sev- commented Jan 31, 2018

In the end, I decided to give it a go.

Thank you, Colin, for your good work. Merging.

@sev- sev- merged commit 2e061d9 into scummvm:master Jan 31, 2018
@sev-
Copy link
Member

sev- commented Jan 31, 2018

However, at this moment there is a warning:

./engines/scumm/music.h:86:15: warning: 'Scumm::MusicEngine::saveLoadWithSerializer' hides overloaded virtual function
      [-Woverloaded-virtual]
        virtual void saveLoadWithSerializer(Common::Serializer &ser, ScummEngine *scumm, bool fixAfterLoad = true) {}
                     ^
./common/serializer.h:280:15: note: hidden overloaded virtual function 'Common::Serializable::saveLoadWithSerializer' declared
      here: different number of parameters (1 vs 3)
        virtual void saveLoadWithSerializer(Serializer &ser) = 0;
                     ^
1 warning generated.

Colin, could you take a look at it?

@sev-
Copy link
Member

sev- commented Jan 31, 2018

...and with disable FMTOWNS dual layer code, there are compilation errors:

http://buildbot.scummvm.org/builders/master-ds/builds/5233/steps/compile/logs/stdio

@csnover
Copy link
Member Author

csnover commented Feb 1, 2018

Thanks for your willingness to move ahead with this, I appreciate it.

Sorry about those warnings, create_project still does not set up the same warning flags for Xcode projects as for GNU Make, and I generate new projects so infrequently that I forget that I need to fix them (as when I generated a new project to work on SCUMM engine). I’ve added it to my list as something to look into in future so this doesn’t happen to other Xcode users in future.

I’ve pushed a fix for the DISABLE_TOWNS_DUAL_LAYER_MODE branch. My apologies for missing that during my initial work. I’m still waiting for Buildbot to catch up, but I did compile locally with that definition added so it is hopefully fixed. I will monitor that build until it is healthy again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants