Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP]Delete corrupt exist lower layer when pull image #42978

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

zvier
Copy link
Contributor

@zvier zvier commented Oct 31, 2021

If layer corrupt with disk full, host or daemon crash or some other reasons,
may let the layer link and lower file empty in the disk, which can lead new
image pull also get lower file with invalid content, typically image lower file's
content ends with an colon character. Despit the image pull success, but can not inspect
with error message "Error response from daemon: readlink /var/lib/docker/overlay2/l: invalid argument".
So, when pull image check exist layer, can add layer valid check logic. If
cause a corrupt layer, delete it and pull it again.

Signed-off-by: Jeff Zvier zvier20@gmail.com

@zvier
Copy link
Contributor Author

zvier commented Oct 31, 2021

Related ISSUE #35747

@zvier zvier changed the title Delete corrupt and exist lower layer and when pull image. Delete corrupt and exist lower layer when pull image. Oct 31, 2021
@zvier zvier changed the title Delete corrupt and exist lower layer when pull image. [WIP]Delete corrupt and exist lower layer when pull image. Oct 31, 2021
@zvier zvier changed the title [WIP]Delete corrupt and exist lower layer when pull image. Delete corrupt and exist lower layer when pull image. Oct 31, 2021
@zvier zvier changed the title Delete corrupt and exist lower layer when pull image. Delete corrupt and exist lower layer when pull image Nov 7, 2021
@zvier
Copy link
Contributor Author

zvier commented Nov 10, 2021

@thaJeztah PTAL

@zvier zvier changed the title Delete corrupt and exist lower layer when pull image Delete corrupt exist lower layer when pull image Nov 10, 2021
Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a preventative I would recommend changing overlay's create method to setup the dir in a staging area and then move into place once complete.
Then we can cleanup those staging areas on startup if there is stuff left over.

@@ -398,10 +398,24 @@ func (ls *layerStore) Get(l ChainID) (Layer, error) {
if layer == nil {
return nil, ErrLayerDoesNotExist
}
if !ls.driver.Exists(layer.cacheID) {
if err := ls.deleteCorruptLayer(layer); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though the layer itself is corrupt, deleting it at this level doesn't really seem right.
For drivers like btrfs this especially wouldn't work (granted it seems like it would be weirder to get into this situation with btrfs).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like if we want to recover from this we need a way to repair the layer, but this seems non-trivial from a more generic standpoint.

@zvier zvier changed the title Delete corrupt exist lower layer when pull image [WIP]Delete corrupt exist lower layer when pull image Nov 28, 2021
@zvier zvier force-pushed the fix_corrupt_layer branch 3 times, most recently from fa073f7 to 87bb081 Compare December 2, 2021 07:27
@zvier zvier force-pushed the fix_corrupt_layer branch 3 times, most recently from 9b269f2 to acbac61 Compare December 18, 2021 03:49
@zvier zvier force-pushed the fix_corrupt_layer branch 5 times, most recently from ae9540d to 87a2fbb Compare December 26, 2021 07:23
If lower layers corrupt with disk full, host or daemon crash or other reasons,
may let the layer link and lower file empty in the disk, which can lead new
image pull also get an invalid lower file, typically image lower file's
content ends with an colon. Despit the image pull success, but can not inspect
with error message "Error response from daemon: readlink /var/lib/docker/overlay2/l: invalid argument".
So, when pull image check exist layer, can add layer valid check logic. If
cause a corrupt layer, delete and pull register it again.

Signed-off-by: Jeff Zvier <zvier20@gmail.com>
@florianfischerx
Copy link

Hi,
@zvier are you still working on this pull request?
@cpuguy83 It seams you disagree with deleting the layer in case of corruption. How would a recovery strategy fundamentally look (i.e. is there even a generic way)? Looking through the code, it is unclear to me how exactly this issue can occur (is create the only problematic path?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants