Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tarfs: follow hardlinks in ReadFile #1305

Merged

Conversation

BradLugo
Copy link
Contributor

@BradLugo BradLugo commented Apr 7, 2024

Description

In ReadFile, we either follow symlinks or attempt to read the amount of data from the tar as specified by the Size field in tar.Header. When we come across a hardlink, however, we essentially tell the caller that the file is empty since the Size is 0.

These changes to ReadFile are largely based on the changes @hdonnay made to Open for handling hardlinks: 48317f8

@BradLugo BradLugo self-assigned this Apr 7, 2024
@BradLugo BradLugo requested a review from a team as a code owner April 7, 2024 21:30
@BradLugo BradLugo requested review from crozzy, hdonnay and RTann and removed request for a team April 7, 2024 21:30
@BradLugo
Copy link
Contributor Author

BradLugo commented Apr 7, 2024

When we come across a hardlink, however, we essentially tell the caller that the file is empty since the Size is 0.

I wanted to note that I don't know if this behavior is consistent. For example, perhaps there's a flag you can pass to tar when creating the archive that sets the size header for hardlinks.

@BradLugo BradLugo force-pushed the blugo/tarfs-readfile-hardlink-fix branch from 8745bcb to 24f2cf0 Compare April 7, 2024 21:35
Copy link

codecov bot commented Apr 7, 2024

Codecov Report

Attention: Patch coverage is 61.53846% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 55.76%. Comparing base (b28ae8c) to head (e8f9aff).
Report is 2 commits behind head on main.

Files Patch % Lines
pkg/tarfs/tarfs.go 61.53% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1305      +/-   ##
==========================================
- Coverage   55.79%   55.76%   -0.04%     
==========================================
  Files         265      265              
  Lines       16552    16562      +10     
==========================================
  Hits         9236     9236              
- Misses       6357     6365       +8     
- Partials      959      961       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@BradLugo BradLugo force-pushed the blugo/tarfs-readfile-hardlink-fix branch 2 times, most recently from 2b4d980 to 1786a01 Compare April 8, 2024 05:05
@BradLugo
Copy link
Contributor Author

BradLugo commented Apr 9, 2024

The more I think about this PR, the more I wonder if this is the right approach. Open and ReadFile share a lot of the same logic, and the os library's ReadFile calls its Open (https://cs.opensource.google/go/go/+/refs/tags/go1.22.2:src/os/file.go;l=768).

It makes me wonder if we should move the datasize logic to Open then call Open in ReadFile. The main problem with Open (IIUC) is that hardlinks still aren't technically implemented correctly, i.e., they return an inode with a tar.Header that has a Size of 0. That's not hugely problematic since the caller can follow the link themselves, but I think it goes against the idea of an inode

@BradLugo BradLugo force-pushed the blugo/tarfs-readfile-hardlink-fix branch from 1786a01 to 6fdd963 Compare April 9, 2024 15:34
go.mod Outdated Show resolved Hide resolved
t.Helper()
// This is a perfect candidate for using test.GenerateFixture, but
// creates an import cycle.
f, err := os.Create(filepath.Join(tmp, path.Base(t.Name())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: filepath.Base unless there's a specific reason path is used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was copied from the run function in TestSymlinks 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I used path throughout and swapped specific uses because io/fs mandates forward-slash paths and go just handles forward-slash paths, so it's moot unless the path could be absolute.

pkg/tarfs/tarfs.go Outdated Show resolved Hide resolved
switch {
case typ.IsRegular() && i.h.Typeflag != tar.TypeLink:
r = tar.NewReader(io.NewSectionReader(f.r, i.off, i.sz))
case typ.IsRegular() && i.h.Typeflag == tar.TypeLink:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious would all this just be equivalent to return f.ReadFile(i.h.Linkname)? If so, should we just combine this case and the symlink check into a single case?

@RTann
Copy link
Contributor

RTann commented Apr 9, 2024

The more I think about this PR, the more I wonder if this is the right approach. Open and ReadFile share a lot of the same logic, and the os library's ReadFile calls its Open (https://cs.opensource.google/go/go/+/refs/tags/go1.22.2:src/os/file.go;l=768).

It makes me wonder if we should move the datasize logic to Open then call Open in ReadFile. The main problem with Open (IIUC) is that hardlinks still aren't technically implemented correctly, i.e., they return an inode with a tar.Header that has a Size of 0. That's not hugely problematic since the caller can follow the link themselves, but I think it goes against the idea of an inode

I like the idea of implementing this as closely to os.ReadFile as possible, especially since Open and ReadFile, as implemented here, share a lot of the same logic

Also, Open uses the hardlink's "target" size, no?

@BradLugo
Copy link
Contributor Author

BradLugo commented Apr 10, 2024

The more I think about this PR, the more I wonder if this is the right approach. Open and ReadFile share a lot of the same logic, and the os library's ReadFile calls its Open (https://cs.opensource.google/go/go/+/refs/tags/go1.22.2:src/os/file.go;l=768).
It makes me wonder if we should move the datasize logic to Open then call Open in ReadFile. The main problem with Open (IIUC) is that hardlinks still aren't technically implemented correctly, i.e., they return an inode with a tar.Header that has a Size of 0. That's not hugely problematic since the caller can follow the link themselves, but I think it goes against the idea of an inode

I like the idea of implementing this as closely to os.ReadFile as possible, especially since Open and ReadFile, as implemented here, share a lot of the same logic

Also, Open uses the hardlink's "target" size, no?

Ya know... When I was digging into this before, I saw the note at the bottom of https://pkg.go.dev/archive/tar#Reader.Read and assumed the tarfs.file.h (i.e., the *tar.Header of the hardlink) we were setting was problematic, but after looking more thoroughly when writing my first response, I think you're right. Since we get *tar.Reader with the correct offset and sz, store it in tarfs.file.h, which (IIUC) is what tar.Reader.Read ultimately looks at, I think we avoid the case of getting (0, io.EOF) on hardlinks.

So never mind. Now I want to use Open even more


I guess it really comes down to the hint at the top of ReadFile:

// ReadFileFS is implemented because it can avoid allocating an intermediate
// "file" struct and can immediately allocate a byte slice of the correct
// size.

I'm open to having my mind changed, but maintaining these two functions that are very similar but diverge due to the allocation of a tarfs.file struct in such a critical code path isn't worth it. The PR proves it. Maybe I'm missing something though (like maybe tarfs.file.Read would cause us to allocate the data twice? not sure)

@BradLugo BradLugo force-pushed the blugo/tarfs-readfile-hardlink-fix branch from 5fae3dd to 608e0c2 Compare April 10, 2024 22:29
@hdonnay hdonnay self-assigned this Apr 12, 2024
Copy link
Member

@hdonnay hdonnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

On the rebase, can I trouble you to rewrite the commit summary line to not have fix(tarfs) but rather just tarfs? We try to follow something closer to the go commit style than "conventional commits".

@hdonnay
Copy link
Member

hdonnay commented Apr 12, 2024

I'll look though how this is handled in #669

@BradLugo
Copy link
Contributor Author

LGTM

On the rebase, can I trouble you to rewrite the commit summary line to not have fix(tarfs) but rather just tarfs? We try to follow something closer to the go commit style than "conventional commits".

Yup, will do!
Is this documented anywhere? If not, I can open up another PR and add it somewhere.


Before we merge this, do you mind giving your thoughts on #1305 (comment)? I wonder if we can/should remove ReadFile altogether

Signed-off-by: Brad Lugo <blugo@redhat.com>
@BradLugo BradLugo force-pushed the blugo/tarfs-readfile-hardlink-fix branch from 608e0c2 to e8f9aff Compare April 15, 2024 17:49
@crozzy
Copy link
Contributor

crozzy commented Apr 15, 2024

LGTM
On the rebase, can I trouble you to rewrite the commit summary line to not have fix(tarfs) but rather just tarfs? We try to follow something closer to the go commit style than "conventional commits".

Yup, will do! Is this documented anywhere? If not, I can open up another PR and add it somewhere.

Before we merge this, do you mind giving your thoughts on #1305 (comment)? I wonder if we can/should remove ReadFile altogether

https://github.com/quay/claircore/blob/main/docs/contributor/commit-style.md

@hdonnay
Copy link
Member

hdonnay commented Apr 15, 2024

/fast-forward

@github-actions github-actions bot merged commit e8f9aff into quay:main Apr 15, 2024
8 checks passed
@BradLugo BradLugo changed the title fix(tarfs): follow hardlinks in ReadFile tarfs: follow hardlinks in ReadFile Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants