Skip to content

Fixes Random Desyncs Due To Statue Exception And NextThingId Missmatch#632

Merged
notfood merged 6 commits intorwmt:devfrom
Tick-git:Fixes-New-Statue-Exception-And-NextThingId-Missmatch
Aug 24, 2025
Merged

Fixes Random Desyncs Due To Statue Exception And NextThingId Missmatch#632
notfood merged 6 commits intorwmt:devfrom
Tick-git:Fixes-New-Statue-Exception-And-NextThingId-Missmatch

Conversation

@Tick-git
Copy link
Copy Markdown
Contributor

@Tick-git Tick-git commented Aug 19, 2025

Label: 1.6, High Priority, Fix

  • This problem causes save files to constantly desync.
  • The issue is caused by the new 1.6 statue created in the art bench.
  • The root cause happens during the statue expose process, a nextThingId mismatch occurs because the host uses negative local thing IDs (patched by us) while the clients use unpatched ones.
  • I suspect the problem occurs more often when flying with the gravship, since a statue can sometimes be present on NPC maps (just a guess).

Tested:

  • Test that damaged save files work correctly
  • Test creating a Statue with art bench
  • Spawning a Statue via debug tool (Statues look different but do not lead to problems)

This PR fixes two issues with the new method CompStatue.InitFakePawn:

  1. Pawn name sync issue: The pawn getter is synced while pawn is not fully initialized, which caused syncing errors due to a missing PawnKindDef.

  1. Desyncs due to UniqueIdsPatch (nextThingId missmatch): This was the main reason I investigated the issue. Many desyncs occurred because UniqueIdsPatch (look link below for code) generates negative thing IDs if certain values are set in a specific way. These values differ between client and host - specifically the Multiplayer.InInterface property. The method InitFakePawn is called inside an ExecuteWhenFinished event (look below for code). On the host, this event runs as an async long event, while on the client it runs as a synced event. As a result, the LongEventHandler.currentEvent field is different on host and client, which makes InInterface differ as well. This causes the host to generate negative new thing IDs, while the clients generate positive ones. Since clients get ahead in nextThingIds, the desyncs occur
image
class CompStatue
...
public override void PostExposeData()
{
    ...
    if (Scribe.mode == LoadSaveMode.PostLoadInit)
    {
        ...
        LongEventHandler.ExecuteWhenFinished(new Action(this.InitFakePawn));
    }
}

[HarmonyPatch(typeof(UniqueIDsManager))]
[HarmonyPatch(nameof(UniqueIDsManager.GetNextID))]
public static class UniqueIdsPatch
{
// Start at -2 because -1 is sometimes used as the uninitialized marker
private static int localIds = -2;
public static bool useLocalIdsOverride;
private static bool UseLocalIds =>
Multiplayer.Client != null && (useLocalIdsOverride || Multiplayer.InInterface || Current.ProgramState == ProgramState.Entry);
static bool Prefix()
{
return !UseLocalIds;
}
static void Postfix(ref int __result)
{
if (UseLocalIds)
__result = localIds--;
}
}

Important:

I'm not entirely sure if the fix is the right approach. I haven’t looked into why we use two different events or why we rely on local IDs that are negative values. If the fix feels off to you, I can dig deeper into the problem.

1.6 introduces Statues, which causes problems.
This fixes two issues with the new method InitFakePawn:

1. Pawn name sync issue: The pawn name was being synced because the getter was called before the pawn was fully initialized. At that point, the pawn was not set up correctly, which caused syncing errors due to the missing PawnKindDef.

2. Desyncs due to UniqueIdsPatch: This was the main reason I investigated the issue. Many desyncs occurred because UniqueIdsPatch generates negative thing IDs if certain values are set in a specific way. These values differ between client and host — specifically the Multiplayer.InInterface property.
The method InitFakePawn is called inside an ExecuteWhenFinished event. On the host, this event runs as an async long event, while on the client it runs as a synced event. As a result, the LongEventHandler.currentEvent field is different on host and client, which makes InInterface differ as well.
This causes the host to generate negative new thing IDs, while the clients generate positive ones. Since clients get ahead in nextThingIds, the next time a thing is generated a desync occurs.
@notfood notfood added fix Fixes for a bug or desync. high priority Important for enjoyable gameplay with the mod. 1.6 Fixes or bugs relating to 1.6 (Not Odyssey). labels Aug 19, 2025
@notfood notfood moved this to In review in 1.6 and Odyssey Aug 19, 2025
@SokyranTheDragon
Copy link
Copy Markdown
Member

I'm not entirely sure if the fix is the right approach. I haven’t looked into why we use two different events or why we rely on local IDs that are negative values. If the fix feels off to you, I can dig deeper into the problem.

To explain the highlighted part:

The game expects a lot of objects in the game to have a unique ID, and desyncs will happen when they are out of sync between players.

However, sometimes we don't want the IDs to be in syncs. Usually in situations when something happens only for one of the clients - for example, displaying a message when rejecting input. It would be pointless to display a message for every player when it's related to something one person did. Likewise, syncing a message saying "increment IDs by 1" every time such a message is generated would be wasteful - especially since those messages can get quite spammy.

So a simple solution to that is to just use negative IDs. Anything with a negative ID is considered a local object that is not synced with all players (unless a bug such as this one occurs). The game never really uses negative values for ID (with exception of -1/-2/-3), meaning it's pretty safe to do so as well. And when we're syncing something using local IDs with other players - that's when such object gets assigned a positive ID.

@SokyranTheDragon
Copy link
Copy Markdown
Member

The method InitFakePawn is called inside an ExecuteWhenFinished event (look below for code). On the host, this event runs as an async long event, while on the client it runs as a synced event. As a result, the LongEventHandler.currentEvent field is different on host and client, which makes InInterface differ as well.

To expand a bit on this: ExecuteWhenFinished isn't really the event - it's the method to be called once the event finishes. And currentEvent is not null if the event was asynchronous since it calls the callback passed to ExecuteWhenFinished first, and then sets the field to null. Meanwhile, synchronous and enumerator events first set the field to null, and then run the callback.

We'll likely need to modify UpdateCurrentAsynchronousEvent so it does its cleanup first, and then calls the callback (through the ExecuteToExecuteWhenFinished method).

@SokyranTheDragon
Copy link
Copy Markdown
Member

I believe that modifying UpdateCurrentAsynchronousEvent should be enough to fix this issue (and other potential ones, likely related to mods).

However, if this is not the case, then we'd need to look into other solutions first.

@Tick-git
Copy link
Copy Markdown
Contributor Author

Tick-git commented Aug 20, 2025

I have already written a comment and deleted it due me looking wrongly at the code and making a wrong suggestion. Looking into it tomorrow again - Thanks for the feedback :)

@Tick-git
Copy link
Copy Markdown
Contributor Author

Tick-git commented Aug 20, 2025

Pawn name sync issue: The pawn getter is synced while the pawn isn’t fully initialized, which causes sync errors due to a missing PawnKindDef.

This will still be a problem, I assume. My first solution for that was to change pawn.Name to pawn.nameInit to bypass syncing via the Name getter.

I also looked into the different execution orders between UpdateCurrentAsynchronousEvent() and the other two, but I assumed the different cleanup order was intentional.

I’ll implement your proposal and make the change for problem 1 as written above :)

Calling the Pawn.Name getter in CompStatue.InitFakePawn causes issues because the pawn has not been initialized correctly at this point.
Fix Problem 2 from rwmt#632 by adjusting the execution of the cleanup logic in UpdateCurrentAsynchronousEvent, as suggested by @SokyranTheDragon
@Tick-git
Copy link
Copy Markdown
Contributor Author

Tick-git commented Aug 20, 2025

I implemented the suggested changes for Problem 2 and updated the logic for Problem 1 by replacing pawn.Name with pawn.nameInt.

I created two variations of your proposal:

  • Option A (included in this PR): uses a simple transpiler to change the cleanup logic for both Multiplayer and Singleplayer.
  • Option B: applies the same logic, but only in Multiplayer.

You can see the implementation for Option B here. I kept the logs so you can review how I tested the functionality, since this is the first time I’ve done it this way.

https://github.com/Tick-git/Multiplayer/blob/95904c636c01da663765ebdd63fcb79868929251/Source/Client/Patches/LongEvents.cs#L86-L141

@maxsupermanhd
Copy link
Copy Markdown

We migrated our playthrough to 9d235f4 and so far no issues

@SokyranTheDragon
Copy link
Copy Markdown
Member

The UpdateCurrentAsynchronousEvent patch - pretty clean. I probably would have tried to re-arrange the instructions, so your approach is definitely simpler.

image

The InitFakePawn patch - small question first, is it still necessary after patching UpdateCurrentAsynchronousEvent? If I recall correctly, one of the conditions for syncing was LongEventHandler.currentEvent either being or not being null (I don't recall fully). I'd double check first if the name still syncs or not, just to be 100% sure.

As for the patch itself, assuming it's still needed - the patch is pretty clean and easy to understand. However, would it work if instead of a transpiler we surround the method with Prefix/Finalizer and set Multiplayer.dontSync to true/false in those? The method doesn't look like it should sync anything while it's called. I feel like the method itself should already be in a synced context (like ticking or synced method), and not be called from UI code.

image

@Tick-git
Copy link
Copy Markdown
Contributor Author

Tick-git commented Aug 23, 2025

Important: From what tool are those screenshots? 😮😮😮

small question first, is it still necessary after patching UpdateCurrentAsynchronousEvent

Yep, I did test it

If I recall correctly, one of the conditions for syncing was LongEventHandler.currentEvent either being or not being null

I thought you had this in mind. But I think then it should be the other way around. Meaning leaving the async version as it is and change the other two. But this would just fix the syncing error in the expose process.

public static bool ShouldSync => InInterface && !dontSync;
public static bool InInterface =>
Client != null
&& !Ticking
&& !ExecutingCmds
&& !reloading
&& Current.ProgramState == ProgramState.Playing
&& LongEventHandler.currentEvent == null;

would it work if instead of a transpiler we surround the method with Prefix/Finalizer and set Multiplayer.dontSync

Yeah that would work. I already had it implemented that way in the unchanged version of this PR. Happily change it back.

I feel like the method itself should already be in a synced context (like ticking or synced method), and not be called from UI code.

I think we don’t update the LongEvents through our tick system (I quickly double checked). The problem happens in the expose process of the statue.
In most cases, InitFakePawn should be invoked by the tick system - for example, when a statue is created at the art table.

@SokyranTheDragon
Copy link
Copy Markdown
Member

I thought you had this in mind. But I think then it should be the other way around. Meaning leaving the async version as it is and change the other two. But this would just fix the syncing error in the expose process.

Honestly, I must have misremembered this code then. Oh well, my bad. Perhaps we should consider adding another condition that checks if we're inside ExecuteToExecuteWhenFinished? Well, probably not worth overthinking it right now.

would it work if instead of a transpiler we surround the method with Prefix/Finalizer and set Multiplayer.dontSync

Yeah that would work. I already had it implemented that way in the unchanged version of this PR. Happily change it back.

Using a prefix/finalizer would probably be safer in case of conflicts with other mods - other mods may use the code we're replacing as hooks, or something of the sorts. Although that's probably not the most likely scenario, I feel. Anyway, either way it's done should work.

So yeah, I'm giving this PR a green light. Even either in current form, or using prefix/finalizer with dontSync set to true/false in them.

@Tick-git
Copy link
Copy Markdown
Contributor Author

I reverted the change and opened #654 so the additional InInterface condition can be added later

Copy link
Copy Markdown
Member

@notfood notfood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, ready?

@notfood notfood merged commit 9b9a085 into rwmt:dev Aug 24, 2025
1 check passed
@github-project-automation github-project-automation bot moved this from In review to Done in 1.6 and Odyssey Aug 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

1.6 Fixes or bugs relating to 1.6 (Not Odyssey). fix Fixes for a bug or desync. high priority Important for enjoyable gameplay with the mod.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants