Redefining SURVIVABLE_NONATOMIC to use atomic writes but to skip fsync(2) #163

Closed
wants to merge 4 commits into from


basil
Member

@basil basil commented Jun 29, 2021

See JENKINS-66001 and jenkinsci/jenkins#5599. Part 2 of a 5-part series to make Pipeline's SURVIVABLE_NONATOMIC mode behave as advertised in the documentation (parts 3 through 5 are in jenkinsci/workflow-support-plugin#120, jenkinsci/workflow-cps-plugin#452, and jenkinsci/workflow-job-plugin#199 respectively).

I tested this by running a Pipeline job with 100 steps in SURVIVABLE_NONATOMIC mode while simultaneously attaching a remote debugger to XmlFile and monitoring fsync(2) calls with syncsnoop.bt. I confirmed that AtomicFileWriter was being used in the Java debugger, and I confirmed that fsync(2) was not being used with syncsnoop.bt.
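To make the distinction in the test above concrete, here is a minimal sketch of a write-to-temp-then-rename pattern in which the `fsync(2)` step is optional. It is not the code of Jenkins's `AtomicFileWriter`; the class name `TempThenRenameSketch`, the `sync` flag, and the file names are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not the Jenkins AtomicFileWriter implementation.
final class TempThenRenameSketch {

    /**
     * Writes data to a temporary file next to the target, then renames it over the target.
     * When sync is false, the data is left in the page cache (no forced flush to storage).
     */
    static void write(Path target, byte[] data, boolean sync) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Same directory as the target, so the rename stays within one filesystem.
        Path tmp = Files.createTempFile(dir, "atomic", ".tmp");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (sync) {
                // Asks the OS to flush file content and metadata to the storage device.
                ch.force(true);
            }
        }
        try {
            // Throws if the platform cannot perform the move atomically.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sketch");
        Path target = dir.resolve("example.xml");
        write(target, "<node/>".getBytes(StandardCharsets.UTF_8), false);
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

With `sync` set to `false`, a reader in the same OS instance sees either the old file or the complete new one where the rename is atomic, but nothing guarantees the bytes have reached the device if the OS or hardware fails.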

@jglick
Member

jglick commented Jul 6, 2021

There is a problem with core Javadoc. I am looking into it.

Member

@jtnord jtnord left a comment


The title is incorrect: writes are definitely not atomic with this change (writes never are atomic).

The move of the file after writing may be atomic, but it also may not be. This depends on both the operating system and the file system, and the atomicity of the move (overwrite/replace) is best effort.

Thus it is not correct to say it will tolerate JVM crashes. It may for some users of some operating systems on some filesystems.
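As a rough illustration of what "best effort" means here, the sketch below requests an atomic move through `java.nio.file.Files.move` and falls back to a plain replace when the platform reports that it cannot honor the request. The class and method names are hypothetical, and this is not claimed to be how Jenkins core handles the case.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of a best-effort atomic replace.
final class BestEffortMoveSketch {

    /**
     * Moves source over target. Returns true if the atomic move was accepted by the
     * platform, false if a non-atomic replace had to be used instead.
     */
    static boolean replace(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (AtomicMoveNotSupportedException e) {
            // The OS/filesystem combination cannot do this atomically; a crash during
            // the fallback may leave the target missing or incomplete.
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sketch");
        Path tmp = dir.resolve("atomic123.tmp");
        Path target = dir.resolve("example.xml");
        Files.write(tmp, "<node/>".getBytes(StandardCharsets.UTF_8));
        System.out.println("atomic move used: " + replace(tmp, target));
    }
}
```

The return value only tells the caller which path was taken. Even when the atomic move is accepted, how the result behaves after an OS or storage failure remains a property of the filesystem.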

Member

@jtnord jtnord left a comment


Update the comments to reflect what is actually happening and that this is both OS and filesystem dependent.

@@ -1,6 +1,6 @@
 FlowDurabilityHint.PERFORMANCE_OPTIMIZED.description=Performance-optimized: much faster (requires clean shutdown to save running pipelines)
 FlowDurabilityHint.PERFORMANCE_OPTIMIZED.tooltip=Avoids writing data with every step, avoids atomic writes of data. Pipelines can resume if Jenkins shuts down cleanly, but running pipelines lose step information and cannot resume if Jenkins unexpectedly fails.
-FlowDurabilityHint.SURVIVABLE_NONATOMIC.description=Less durability, a bit faster (specialty use only)
-FlowDurabilityHint.SURVIVABLE_NONATOMIC.tooltip=Writes data with every step but avoids atomic writes. On some filesytems this is faster than maximum durability mode, but running pipeline data may be lost if disk writes are interrupted or fail.
+FlowDurabilityHint.SURVIVABLE_NONATOMIC.description=Less durability, a bit faster (requires stable OS and storage but tolerates dirty JVM shutdown)
Member

@jtnord jtnord Aug 12, 2021


Suggested change
-FlowDurabilityHint.SURVIVABLE_NONATOMIC.description=Less durability, a bit faster (requires stable OS and storage but tolerates dirty JVM shutdown)
+FlowDurabilityHint.SURVIVABLE_NONATOMIC.description=Less durability, a bit faster (requires stable OS and storage but may tolerate a dirty JVM shutdown if certain OS/Filesystem combinations are met)

It is important to let users know that this will not always tolerate a dirty JVM shutdown. It is dependent on the OS/filesystem support for atomic moves (POSIX compliance). I am not going to provide a list of combinations that do or do not support this. If some guidance is necessary, then the requirement is generally a local block device with a reasonably modern file system on a Unix-like operating system.

It may be that, because this is Pipeline and these small files are only ever written and never modified, things are better than they would be when updating something. But if anything else in Pipeline is written and would assume that the file is there (program.dat?), then I would expect you are still in the realm of dragons.
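For anyone trying to apply the guidance above about a local block device and a reasonably modern filesystem, the following sketch reports which file store backs a given directory, using the standard `java.nio.file.FileStore` API. The class name and the choice of directory are hypothetical, and the strings returned by `name()` and `type()` are platform specific, so treat the output as a hint rather than proof of atomic-move support.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports the file store behind a directory.
final class FileStoreReportSketch {

    /** Returns a short description such as "name (type)" for the store holding dir. */
    static String describe(Path dir) throws IOException {
        FileStore store = Files.getFileStore(dir);
        // type() is implementation specific; on Linux it is typically the mount's filesystem type.
        return store.name() + " (" + store.type() + ")";
    }

    public static void main(String[] args) throws IOException {
        // Assumption: pass the directory of interest (for example a build directory) as the first argument.
        Path dir = args.length > 0 ? Paths.get(args[0]) : Paths.get(".");
        System.out.println(dir.toAbsolutePath() + " is on " + describe(dir));
    }
}
```

A network filesystem type in the output would suggest being more cautious about relying on rename atomicity.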

-FlowDurabilityHint.SURVIVABLE_NONATOMIC.description=Less durability, a bit faster (specialty use only)
-FlowDurabilityHint.SURVIVABLE_NONATOMIC.tooltip=Writes data with every step but avoids atomic writes. On some filesytems this is faster than maximum durability mode, but running pipeline data may be lost if disk writes are interrupted or fail.
+FlowDurabilityHint.SURVIVABLE_NONATOMIC.description=Less durability, a bit faster (requires stable OS and storage but tolerates dirty JVM shutdown)
+FlowDurabilityHint.SURVIVABLE_NONATOMIC.tooltip=Writes data with every step but avoids flushing the page cache to the storage device. On some filesytems this is faster than maximum durability mode, but running pipeline data may be lost if disk writes are interrupted or fail.
Member

@jtnord jtnord Aug 12, 2021


Suggested change
-FlowDurabilityHint.SURVIVABLE_NONATOMIC.tooltip=Writes data with every step but avoids flushing the page cache to the storage device. On some filesytems this is faster than maximum durability mode, but running pipeline data may be lost if disk writes are interrupted or fail.
+FlowDurabilityHint.SURVIVABLE_NONATOMIC.tooltip=Writes data with every step but avoids flushing the page cache for the specific file to the storage device. On some filesytems this is faster than maximum durability mode, but running pipeline data may be lost if disk writes are interrupted or fail, or if the JVM terminates abruptly in some OS and filesystem combinations.

@jtnord
Member

jtnord commented May 24, 2022

@basil are you still interested in this PR, do you want to look at the suggestions?

@jglick
Member

jglick commented May 24, 2022

> It may [tolerate JVM crashes] for some users of some operating systems on some filesystems

FYI I have recently been checking the behavior of Jenkins when restored from EBS snapshots (using default durability settings) and do occasionally see weird cases of stray atomicxxx.tmp files inside running build directories, but I have not tried to track down the behavior at this level. I have always been leery of asking users to pick a durability setting without necessarily understanding the consequences. If you could assume that the filesystem orders writes (maybe not a good assumption), then you could rework how metadata files are stored to use a more journaled mode, so that the last file written includes links to the desired versions of previous files. But this could be a significant effort that might be better spent on e.g. SQL storage.
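One possible reading of the "journaled mode" idea above, sketched under the stated (and possibly unsafe) assumption that the filesystem orders writes: every commit writes new versioned data files first and a small manifest last, and a reader trusts only the versions the manifest names. Everything here is hypothetical, including the class name, the `manifest.txt` file, and the `name.version` naming scheme; it is not an existing Jenkins storage format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a manifest-last layout; not an existing Jenkins format.
final class ManifestJournalSketch {
    static final String MANIFEST = "manifest.txt";

    /** Writes each entry as "<name>.<version>", then writes the manifest that links to them. */
    static void commit(Path dir, int version, Map<String, String> contents) throws IOException {
        StringBuilder manifest = new StringBuilder();
        for (Map.Entry<String, String> e : contents.entrySet()) {
            String fileName = e.getKey() + "." + version;
            Files.write(dir.resolve(fileName), e.getValue().getBytes(StandardCharsets.UTF_8));
            manifest.append(e.getKey()).append('=').append(fileName).append('\n');
        }
        // The manifest goes last, so data files written after a crash point are never referenced.
        Path tmp = dir.resolve(MANIFEST + ".tmp");
        Files.write(tmp, manifest.toString().getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, dir.resolve(MANIFEST), StandardCopyOption.REPLACE_EXISTING);
    }

    /** Loads only the file versions named in the manifest; stray newer files are ignored. */
    static Map<String, String> load(Path dir) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        Path manifest = dir.resolve(MANIFEST);
        if (!Files.exists(manifest)) {
            return result;
        }
        for (String line : Files.readAllLines(manifest, StandardCharsets.UTF_8)) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // Skip blank or malformed lines.
            }
            byte[] bytes = Files.readAllBytes(dir.resolve(line.substring(eq + 1)));
            result.put(line.substring(0, eq), new String(bytes, StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("journal");
        Map<String, String> v1 = new LinkedHashMap<>();
        v1.put("node", "<node id='1'/>");
        commit(dir, 1, v1);
        System.out.println(load(dir));
    }
}
```

The sketch omits cleanup of superseded versions and does nothing to enforce write ordering, which is exactly the assumption flagged as questionable above.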

@basil
Member Author

basil commented May 24, 2022

I do not accept your suggestions.

@basil basil closed this May 24, 2022
4 participants