Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable lazy initialization of ext3/ext4 filesystems #38865

Merged
merged 1 commit into from
Jan 18, 2017

Conversation

codablock
Copy link
Contributor

@codablock codablock commented Dec 16, 2016

What this PR does / why we need it: It enables lazy inode table and journal initialization in ext3 and ext4.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #30752, fixes #30240

Release note:

Enable lazy inode table and journal initialization for ext3 and ext4

Special notes for your reviewer:
This PR removes the extended options to mkfs.ext3/mkfs.ext4, so that the defaults (enabled) for lazy initialization are used.

These extended options come from a script that was historically located at /usr/share/google/safe_format_and_mount and later ported to GO so this dependency to the script could be removed. After some search, I found the original script here: https://github.com/GoogleCloudPlatform/compute-image-packages/blob/legacy/google-startup-scripts/usr/share/google/safe_format_and_mount

Checking the history of this script, I found the commit Disable lazy init of inode table and journal.. This one introduces the extended flags with this description:

Now that discard with guaranteed zeroing is supported by PD,
initializing them is really fast and prevents perf from being affected
when the filesystem is first mounted.

The problem is, that this is not true for all cloud providers and all disk types, e.g. Azure and AWS. I only tested with magnetic disks on Azure and AWS, so maybe it's different for SSDs on these cloud providers. The result is that this performance optimization dramatically increases the time needed to format a disk in such cases.

When mkfs.ext4 is told to not lazily initialize the inode tables and the check for guaranteed zeroing on discard fails, it falls back to a very naive implementation that simply loops and writes zeroed buffers to the disk. Performance on this highly depends on free memory and also uses up all this free memory for write caching, reducing performance of everything else in the system.

As of #30752, there is also something inside kubelet that somehow degrades performance of all this. It's however not exactly known what it is but I'd assume it has something to do with cgroups throttling IO or memory.

I checked the kernel code for lazy inode table initialization. The nice thing is, that the kernel also does the guaranteed zeroing on discard check. If it is guaranteed, the kernel uses discard for the lazy initialization, which should finish in a just few seconds. If it is not guaranteed, it falls back to using bios, which does not require the use of the write cache. The result is, that free memory is not required and not touched, thus performance is maxed and the system does not suffer.

As the original reason for disabling lazy init was a performance optimization and the kernel already does this optimization by default (and in a much better way), I'd suggest to completely remove these flags and rely on the kernel to do it in the best way.

@k8s-ci-robot
Copy link
Contributor

Hi @codablock. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

If you have questions or suggestions related to this bot's behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 16, 2016
@k8s-reviewable
Copy link

This change is Reviewable

@codablock
Copy link
Contributor Author

CC @colemickens

@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Dec 16, 2016
@rootfs
Copy link
Contributor

rootfs commented Dec 16, 2016

@k8s-bot ok to test

@rootfs
Copy link
Contributor

rootfs commented Dec 16, 2016

lgtm, this is consistent with mkfs.ext4 default.

@codablock
Copy link
Contributor Author

@rootfs any chance to get this into 1.5.x as well? As you know, azure is affected by this and makes using large volumes very difficult

@rootfs
Copy link
Contributor

rootfs commented Dec 16, 2016

would love it in 1.5 once merged into 1.6
@lavalamp

@k8s-ci-robot
Copy link
Contributor

Jenkins unit/integration failed for commit c938794e60b60dde81a8767b59ff2cfbf77b6312. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@rootfs
Copy link
Contributor

rootfs commented Dec 16, 2016

@codablock have you fixed unit test too?

safe_format_and_mount_test.go:176: Unexpected args [-F /dev/foo]. Expecting [-E lazy_itable_init=0,lazy_journal_init=0 -F /dev/foo]
safe_format_and_mount_test.go:176: Unexpected args [-F /dev/foo]. Expecting [-E lazy_itable_init=0,lazy_journal_init=0 -F /dev/foo]
safe_format_and_mount_test.go:176: Unexpected args [-F /dev/foo]. Expecting [-E lazy_itable_init=0,lazy_journal_init=0 -F /dev/foo]
safe_format_and_mount_test.go:176: Unexpected args [-F /dev/foo]. Expecting [-E lazy_itable_init=0,lazy_journal_init=0 -F /dev/foo]
safe_format_and_mount_test.go:176: Unexpected args [-F /dev/foo]. Expecting [-E lazy_itable_init=0,lazy_journal_init=0 -F /dev/foo]
safe_format_and_mount_test.go:176: Unexpected args [-F /dev/foo]. Expecting [-E lazy_itable_init=0,lazy_journal_init=0 -F /dev/foo]
safe_format_and_mount_test.go:176: Unexpected args [-F /dev/foo]. Expecting [-E lazy_itable_init=0,lazy_journal_init=0 -F /dev/foo]
safe_format_and_mount_test.go:176: Unexpected args [-F /dev/foo]. Expecting [-E lazy_itable_init=0,lazy_journal_init=0 -F /dev/foo]

@k8s-ci-robot
Copy link
Contributor

Jenkins GCE e2e failed for commit c938794e60b60dde81a8767b59ff2cfbf77b6312. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@codablock
Copy link
Contributor Author

@rootfs Unit tests should be fixed now. I also removed the extended flags from all the bash scripts to be more consistent.

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 18, 2016
@saad-ali saad-ali added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Dec 21, 2016
@saad-ali
Copy link
Member

The root cause of #30752 appears to be an alignment issue in Azure. If that is the case, is this PR still necessary?

@codablock
Copy link
Contributor Author

@saad-ali Alignment turned out to not be the fault on #30752. As we don't even use partitions in the Azure Dynamic Disk provisioning case, there is no chance to have unaligned partitions. I also did some performance tests to confirm that we don't have alignment issues.

Please see #30752 (comment) for detailed information on this.

@colemickens
Copy link
Contributor

@saad-ali / @lavalamp Do we want to proceed with this? I'd like to see it merged since we can't seem to determine the underlying cause for why mkfs is so slow when run under kubelet with the current settings.

@brendandburns brendandburns self-assigned this Jan 10, 2017
@brendandburns
Copy link
Contributor

@codablock thanks for the detailed examination of the issue.

@saad-ali this looks ok to me and I think we should go ahead with it.

Any concerns before I LGTM?

Thanks
--brendan

@brendandburns brendandburns removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Jan 10, 2017
@brendandburns
Copy link
Contributor

I'm going to LGTM this. We can roll it back if @saad-ali has any further concerns.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 12, 2017
@codablock
Copy link
Contributor Author

@brendandburns Thanks for the LGTM :) The Github check Jenkins kops AWS e2e hangs on "Waiting for status to be reported" since quite some time. Do we have to retrigger CI?

Btw, is there any documentation available that describes the whole CI and merge process of k8s?

@ncdc
Copy link
Member

ncdc commented Jan 17, 2017

@k8s-bot kops aws e2e test this

@k8s-ci-robot
Copy link
Contributor

Jenkins kops AWS e2e failed for commit 13a2bc8. Full PR test history. cc @codablock

The magic incantation to run this job again is @k8s-bot kops aws e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ncdc
Copy link
Member

ncdc commented Jan 17, 2017

Saw various timeouts in several test cases. Re-kicking to see if it's temporary.

@k8s-bot kops aws e2e test this

@liggitt
Copy link
Member

liggitt commented Jan 18, 2017

@k8s-bot test this

@k8s-ci-robot
Copy link
Contributor

Jenkins verification failed for commit 13a2bc8. Full PR test history. cc @codablock

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@rootfs
Copy link
Contributor

rootfs commented Jan 18, 2017

@k8s-bot verify test this

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@colemickens
Copy link
Contributor

@saad-ali Can we get this one cherry-picked to 1.5.x as well?

@colemickens
Copy link
Contributor

ping @saad-ali for a 1.5.x cherry-pick.

@saad-ali saad-ali added this to the v1.5 milestone Feb 7, 2017
@saad-ali saad-ali added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cherrypick-candidate labels Feb 7, 2017
k8s-github-robot pushed a commit that referenced this pull request Feb 10, 2017
…rigin-release-1.5

Automatic merge from submit-queue

Automated cherry pick of #38865

Cherry pick of #38865 on release-1.5.

#38865: Enable lazy initialization of ext3/ext4 filesystems
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

@colemickens
Copy link
Contributor

colemickens commented Feb 10, 2017

Ah, I didn't realize this had been tagged... I went ahead and manually sent the PR using the ./hack/cherry_pick_pull.sh script.

edit: Oh, no, I guess the cherry-pick just happened to get merged right around the same time I opened an automated cherry-pick of my own...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mkfs.ext4 is slow when run by kubelet Creating ext4 filesystem on empty network volumes takes long time