Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forge configs occasionally become corrupted #32

Open
embeddedt opened this issue Jul 18, 2023 · 6 comments
Open

Forge configs occasionally become corrupted #32

embeddedt opened this issue Jul 18, 2023 · 6 comments
Labels
bug A bug or error

Comments

@embeddedt
Copy link
Member

This is a continuation of MinecraftForge/MinecraftForge#9122.

After performing an analysis of the code Forge uses to interact with the Night Config library, I identified a design flaw which seems to have the potential for race conditions and corruption. In ConfigFileTypeHandler when a config is first registered and loaded, a file watcher is attached. Shortly after that method returns save is invoked on the config object. In summary, this can lead to the following sequence of events:

  1. The CommentedFileConfig instance is built and the watcher is registered (the file on disk is now being watched for changes)
  2. Forge code invokes config.save() which calls into Night Config to write to the file synchronously.
  3. As part of the process of saving, Files.newOutputStream is invoked with the config file's path, which, according to the Javadoc, "initially [truncates] an existing regular-file to a size of 0 if it exists." I verified using a simple test program that, at least on Linux, the OutputStream given by this method is not atomic. The file is truncated immediately upon return of the method, and writes are seen in the underlying file even before the stream is closed.
  4. Because Night Config's file watcher is asynchronous, it will detect any changes to the underlying file, regardless of whether or not saving has actually completed. That means it may see the fully truncated file, or a half-written file. In this scenario, attempting to read from the incomplete file is very likely to result in random errors when the system encounters some corrupted version.

When testing on 1.16, I was able to reliably observe symptoms similar to the theory I have described above - by augmenting the Forge file watcher's run function with my own, I confirmed that the currentlyWriting flag in WriteSyncFileConfig (the config class in NC responsible for saving) was still true when the watcher was invoked, indicating that saving is NOT completed before the watcher is fired.

I think fully fixing this will require a fundamental rethink of how the file watcher is used in Forge, as ideally it needs to never be listening when .save is called.

For completeness, here is a link to the Discord conversation in which I first brought up this new theory.

@quat1024
Copy link

It doesn't help that nightconfig's filewatcher is apparently really aggressive and checks for file changes up to a million times a second apparently (???) which could exacerbate this race

@TheElectronWill
Copy link

TheElectronWill commented Aug 31, 2023

Hi! NightConfig's author here.
First of all, thanks for reporting the issue. I cannot access the discord's conversation, do you have an invitation to the relevant server?
The currentlyWriting flag is meant to avoid that: if save() hasn't finished yet, load() will do nothing. It's not a problem if the FileWatcher calls the handler, because load() won't do anything... at least that's the theory.

In practice, I suspect that the problem comes from the load() method: the double-checked locking idiom is badly implemented:

	@Override
	public void load() {
		if (!currentlyWriting) {
			synchronized (this) {
                                // HERE is missing a double-check if(!currentlyWriting)
				if (closed) {
					throw new IllegalStateException("Cannot (re)load a closed FileConfig");
				}
				parser.parse(nioPath, config, parsingMode, nefAction);
			}
		}
	}

edit: but since write() and parse() are in a synchronized block, reading and writing still isn't possible at the same time. So it's weird 🤔

As for the FileWatcher, that shouldn't cause the problem. A partial write of the file and/or partially updated config may be the root cause instead.

@AterAnimAvis
Copy link
Contributor

It's the NeoForge (previously Forge) Discord Server, https://discord.neoforged.net

@TheElectronWill
Copy link

thanks!

@TheElectronWill
Copy link

TheElectronWill commented Jan 28, 2024

Update: TheElectronWill/night-config#152 has been merged with a full rewrite of FileConfig, based on concurrent configurations (see the comments on the PR). When the new API is used properly, it should hopefully fix the problem :)

@TheElectronWill
Copy link

NightConfig v3.7.0 released 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug or error
Projects
None yet
Development

No branches or pull requests

6 participants
@quat1024 @AterAnimAvis @TheElectronWill @sciwhiz12 @embeddedt and others