Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix two inconsistency issues between stat cache and cache file #2152

Merged
merged 2 commits into from
May 30, 2023

Conversation

eryugey
Copy link
Contributor

@eryugey eryugey commented Apr 26, 2023

Details

Mark pagelist as unloaded if cache file has been truncated

If cache file size doesn't match object size, the cache file might be
corrupted, so invalidate it and save new cache stat file.

Fix inconsistency between stat cache file and cache file

We unlock stat cache file too early in FdEntity::Open(), and would
truncate cache file and update stat cache file, so there's a window that
stat cache doesn't reflect cache file status.

Copy link
Member

@ggtakec ggtakec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you tell us the issue(reason) that led you to submit this PR?

I still don't understand the purpose of this PR fix.
(I've added comments where I think it's intended.)

Therefore, I apologize for the inconvenience, but please let me know the phenomenon that led to the submission of the PR.

src/fdcache_entity.cpp Outdated Show resolved Hide resolved
@eryugey
Copy link
Contributor Author

eryugey commented May 10, 2023

Could you tell us the issue(reason) that led you to submit this PR?

I still don't understand the purpose of this PR fix. (I've added comments where I think it's intended.)

Therefore, I apologize for the inconvenience, but please let me know the phenomenon that led to the submission of the PR.

Sorry for the late reply..

For both patches, I didn't hit any problems in real uses, just noticed the potential issues when I was reading the code.

The first patch "Fix inconsistency between stat cache file and cache file", if cache file needs truncate (is_truncate is true), and cache stat file needs save (need_save_csf is true), there's a window that cache stat doesn't match the cache file, i.e. cache file has been truncated but cache stat file still says it's valid and loaded. This only may cause issue when multiple s3fs instances share the same cache dir, which is not a common setup.

 623         // truncate cache(tmp) file
 624         if(is_truncate){
 625             if(0 != ftruncate(physical_fd, size) || 0 != fsync(physical_fd)){
 626                 S3FS_PRN_ERR("ftruncate(%s) or fsync returned err(%d)", cachepath.c_str(), errno);
 627                 fclose(pfile);
 628                 pfile       = NULL;
 629                 physical_fd = -1;
 630                 inode       = 0;
 631                 return (0 == errno ? -EIO : -errno);
 632             }
 633         }
 634
 635         // reset cache stat file
 636         if(need_save_csf){
 637             if(!pagelist.Serialize(cfstat, true, inode)){
 638                 S3FS_PRN_WARN("failed to save cache stat file(%s), but continue...", path.c_str());
 639             }
 640         }

The second patch "Mark pagelist as unloaded if cache file has been truncated", in case of cache file was truncated to a smaller size and didn't match cache stat file, the if(size != pagelist.Size()) check would pass, but if(size != st.st_size) check would be true, and cache file would be truncated to a larger size again, resulting in a hole, so s3fs will read all zeros and get wrong data. So I invalidate pagelist and cache stat in this case as well.

@ggtakec
Copy link
Member

ggtakec commented May 10, 2023

@eryugey
Thanks for the detailed explanation.
I will check the background, so please wait just a little longer.

@ggtakec
Copy link
Member

ggtakec commented May 13, 2023

@eryugey

"Fix inconsistency between stat cache file and cache file"

I understood your issue regarding about this.
You are right, the original code had potential bugs due to inconsistent handling.
However, I don't think it's a good place to create the CacheFileStat object.
That code works fine, but it internally occurrs errors in the class method call in the non-cache case because the cache directory path is undefined.

"Mark pagelist as unloaded if cache file has been truncated"

Also I understood about this.
But, your fix would discard all caches when that problematic condition occurs.
Therefore, in this fix I think that pagelist.Resize should be used instead of calling pagelist.Init.

Suggestion

About the above two modifications, I wrote the code that I wanted you to modify for this PR code.(That code has been modified in line with my comment above.)
If the code below solves your concern, could you modify your code below into the code and update this PR?
(By that time, I think I've merged some PRs, so it would be helpful if you could rebase to the master branch before updating. Especially since Github Actions on macos10 is no longer supported, so any error will not put out.)

Thanks in advance for your assistance.

int FdEntity::Open(const headers_t* pmeta, off_t size, const struct timespec& ts_mctime, int flags, AutoLock::Type type)
{
    AutoLock auto_lock(&fdent_lock, type);

    S3FS_PRN_DBG("[path=%s][physical_fd=%d][size=%lld][ts_mctime=%s][flags=0x%x]", path.c_str(), physical_fd, static_cast<long long>(size), str(ts_mctime).c_str(), flags);

    if (!auto_lock.isLockAcquired()) {
        // had to wait for fd lock, return
        S3FS_PRN_ERR("Could not get lock.");
        return -EIO;
    }

    AutoLock auto_data_lock(&fdent_data_lock);

    // [NOTE]
    // When the file size is incremental by truncating, it must be keeped
    // as an untreated area, and this area is set to these variables.
    //
    off_t truncated_start = 0;
    off_t truncated_size  = 0;

    if(-1 != physical_fd){
        //
        // already open file
        //

        // check only file size(do not need to save cfs and time.
        if(0 <= size && pagelist.Size() != size){
            // truncate temporary file size
            if(-1 == ftruncate(physical_fd, size) || -1 == fsync(physical_fd)){
                S3FS_PRN_ERR("failed to truncate temporary file(physical_fd=%d) by errno(%d).", physical_fd, errno);
                return -errno;
            }
            // resize page list
            if(!pagelist.Resize(size, false, true)){      // Areas with increased size are modified
                S3FS_PRN_ERR("failed to truncate temporary file information(physical_fd=%d).", physical_fd);
                return -EIO;
            }
        }

        // set untreated area
        if(0 <= size && size_orgmeta < size){
            // set untreated area
            truncated_start = size_orgmeta;
            truncated_size  = size - size_orgmeta;
        }

        // set original headers and set size.
        off_t new_size = (0 <= size ? size : size_orgmeta);
        if(pmeta){
            orgmeta  = *pmeta;
            size_orgmeta = get_size(orgmeta);
        }
        if(new_size < size_orgmeta){
            size_orgmeta = new_size;
        }

    }else{
        //
        // file is not opened yet
        //
        bool  need_save_csf = false;  // need to save(reset) cache stat file
        bool  is_truncate   = false;  // need to truncate

        CacheFileStat* pcfstat = NULL;

        if(!cachepath.empty()){
            // using cache
            struct stat st;
            if(stat(cachepath.c_str(), &st) == 0){
                if(0 > compare_timespec(st, ST_TYPE_MTIME, ts_mctime)){
                    S3FS_PRN_DBG("cache file stale, removing: %s", cachepath.c_str());
                    if(unlink(cachepath.c_str()) != 0){
                        return (0 == errno ? -EIO : -errno);
                    }
                }
            }

            // open cache and cache stat file, load page info.
            pcfstat = new CacheFileStat(path.c_str());

            // try to open cache file
            if( -1 != (physical_fd = open(cachepath.c_str(), O_RDWR)) &&
                0 != (inode = FdEntity::GetInode(physical_fd))        &&
                pagelist.Serialize(*pcfstat, false, inode)            )
            {
                // succeed to open cache file and to load stats data
                memset(&st, 0, sizeof(struct stat));
                if(-1 == fstat(physical_fd, &st)){
                    S3FS_PRN_ERR("fstat is failed. errno(%d)", errno);
                    physical_fd = -1;
                    inode       = 0;
                    return (0 == errno ? -EIO : -errno);
                }
                // check size, st_size, loading stat file
                if(-1 == size){
                    if(st.st_size != pagelist.Size()){
                        pagelist.Resize(st.st_size, false, true); // Areas with increased size are modified
                        need_save_csf = true;                     // need to update page info
                    }
                    size = st.st_size;
                }else{
                    // First if the current cache file size and pagelist do not match, fix pagelist.
                    if(st.st_size != pagelist.Size()){
                        pagelist.Resize(st.st_size, false, true); // Areas with increased size are modified
                        need_save_csf = true;                     // need to update page info
                    }
                    if(size != pagelist.Size()){
                        pagelist.Resize(size, false, true);       // Areas with increased size are modified
                        need_save_csf = true;                     // need to update page info
                    }
                    if(size != st.st_size){
                        is_truncate = true;
                    }
                }

            }else{
                if(-1 != physical_fd){
                    close(physical_fd);
                }
                inode = 0;

                // could not open cache file or could not load stats data, so initialize it.
                if(-1 == (physical_fd = open(cachepath.c_str(), O_CREAT|O_RDWR|O_TRUNC, 0600))){
                    S3FS_PRN_ERR("failed to open file(%s). errno(%d)", cachepath.c_str(), errno);

                    // remove cache stat file if it is existed
                    if(!CacheFileStat::DeleteCacheFileStat(path.c_str())){
                        if(ENOENT != errno){
                            S3FS_PRN_WARN("failed to delete current cache stat file(%s) by errno(%d), but continue...", path.c_str(), errno);
                        }
                    }
                    return (0 == errno ? -EIO : -errno);
                }
                need_save_csf = true;       // need to update page info
                inode         = FdEntity::GetInode(physical_fd);
                if(-1 == size){
                    size = 0;
                    pagelist.Init(0, false, false);
                }else{
                    // [NOTE]
                    // The modify flag must not be set when opening a file,
                    // if the ts_mctime parameter(mtime) is specified(tv_nsec != UTIME_OMIT)
                    // and the cache file does not exist.
                    // If mtime is specified for the file and the cache file
                    // mtime is older than it, the cache file is removed and
                    // the processing comes here.
                    //
                    pagelist.Resize(size, false, (UTIME_OMIT == ts_mctime.tv_nsec ? true : false));

                    is_truncate = true;
                }
            }

            // open mirror file
            int mirrorfd;
            if(0 >= (mirrorfd = OpenMirrorFile())){
                S3FS_PRN_ERR("failed to open mirror file linked cache file(%s).", cachepath.c_str());
                return (0 == mirrorfd ? -EIO : mirrorfd);
            }
            // switch fd
            close(physical_fd);
            physical_fd = mirrorfd;

            // make file pointer(for being same tmpfile)
            if(NULL == (pfile = fdopen(physical_fd, "wb"))){
                S3FS_PRN_ERR("failed to get fileno(%s). errno(%d)", cachepath.c_str(), errno);
                close(physical_fd);
                physical_fd = -1;
                inode       = 0;
                return (0 == errno ? -EIO : -errno);
            }

        }else{
            // not using cache
            inode = 0;

            // open temporary file
            if(NULL == (pfile = FdManager::MakeTempFile()) || -1 ==(physical_fd = fileno(pfile))){
                S3FS_PRN_ERR("failed to open temporary file by errno(%d)", errno);
                if(pfile){
                    fclose(pfile);
                    pfile = NULL;
                }
                return (0 == errno ? -EIO : -errno);
            }
            if(-1 == size){
                size = 0;
                pagelist.Init(0, false, false);
            }else{
                // [NOTE]
                // The modify flag must not be set when opening a file,
                // if the ts_mctime parameter(mtime) is specified(tv_nsec != UTIME_OMIT)
                // and the cache file does not exist.
                // If mtime is specified for the file and the cache file
                // mtime is older than it, the cache file is removed and
                // the processing comes here.
                //
                pagelist.Resize(size, false, (UTIME_OMIT == ts_mctime.tv_nsec ? true : false));

                is_truncate = true;
            }
        }

        // truncate cache(tmp) file
        if(is_truncate){
            if(0 != ftruncate(physical_fd, size) || 0 != fsync(physical_fd)){
                S3FS_PRN_ERR("ftruncate(%s) or fsync returned err(%d)", cachepath.c_str(), errno);
                fclose(pfile);
                pfile       = NULL;
                physical_fd = -1;
                inode       = 0;
                return (0 == errno ? -EIO : -errno);
            }
        }

        // reset cache stat file
        if(need_save_csf && pcfstat){
            if(!pagelist.Serialize(*pcfstat, true, inode)){
                S3FS_PRN_WARN("failed to save cache stat file(%s), but continue...", path.c_str());
            }
        }
        if(pcfstat){
            delete pcfstat;
        }

        // set original headers and size in it.
        if(pmeta){
            orgmeta      = *pmeta;
            size_orgmeta = get_size(orgmeta);
        }else{
            orgmeta.clear();
            size_orgmeta = 0;
        }

        // set untreated area
        if(0 <= size && size_orgmeta < size){
            truncated_start = size_orgmeta;
            truncated_size  = size - size_orgmeta;
        }

        // set mtime and ctime(set "x-amz-meta-mtime" and "x-amz-meta-ctime" in orgmeta)
        if(UTIME_OMIT != ts_mctime.tv_nsec){
            if(0 != SetMCtime(ts_mctime, ts_mctime, AutoLock::ALREADY_LOCKED)){
                S3FS_PRN_ERR("failed to set mtime/ctime. errno(%d)", errno);
                fclose(pfile);
                pfile       = NULL;
                physical_fd = -1;
                inode       = 0;
                return (0 == errno ? -EIO : -errno);
            }
        }
    }

    // create new pseudo fd, and set it to map
    PseudoFdInfo*   ppseudoinfo = new PseudoFdInfo(physical_fd, flags);
    int             pseudo_fd   = ppseudoinfo->GetPseudoFd();
    pseudo_fd_map[pseudo_fd]    = ppseudoinfo;

    // if there is untreated area, set it to pseudo object.
    if(0 < truncated_size){
        if(!AddUntreated(truncated_start, truncated_size)){
            pseudo_fd_map.erase(pseudo_fd);
            if(pfile){
                fclose(pfile);
                pfile = NULL;
            }
            delete ppseudoinfo;
        }
    }

    return pseudo_fd;
}

@eryugey
Copy link
Contributor Author

eryugey commented May 18, 2023

Thanks for the suggestions! I'll find some free cycles to update this PR according to your suggestions.

We unlock stat cache file too early in FdEntity::Open(), and would
truncate cache file and update stat cache file, so there's a window that
stat cache doesn't reflect cache file status.

Suggested-by: Takeshi Nakatani <ggtakec@gmail.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
If cache file size doesn't match object size, the cache file might be
corrupted, so invalidate it and save new cache stat file.

Suggested-by: Takeshi Nakatani <ggtakec@gmail.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
@eryugey
Copy link
Contributor Author

eryugey commented May 30, 2023

Thanks for the suggestions! I'll find some free cycles to update this PR according to your suggestions.

All fixed, PTAL.

@ggtakec
Copy link
Member

ggtakec commented May 30, 2023

@eryugey Thank you for your correction and your contribution.
This PR of yours may solve other problems as well.

@ggtakec ggtakec merged commit 6ca5a24 into s3fs-fuse:master May 30, 2023
eryugey added a commit to eryugey/s3fs-fuse that referenced this pull request May 31, 2023
Previously pcfstat points to a raw pointer, and it may be leaked if
function returned before deleting it.

So use unique_ptr to automatically release the object.

Fixes: 6ca5a24 ("Fix two inconsistency issues between stat cache and cache file (s3fs-fuse#2152)")
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
eryugey added a commit to eryugey/s3fs-fuse that referenced this pull request May 31, 2023
Previously pcfstat points to a raw pointer, and it may be leaked if
function returned before deleting it.

So use unique_ptr to automatically release the object.

Fixes: 6ca5a24 ("Fix two inconsistency issues between stat cache and cache file (s3fs-fuse#2152)")
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
eryugey added a commit to eryugey/s3fs-fuse that referenced this pull request Jun 1, 2023
Previously pcfstat points to a raw pointer, and it may be leaked if
function returned before deleting it.

So use unique_ptr to automatically release the object.

Fixes: 6ca5a24 ("Fix two inconsistency issues between stat cache and cache file (s3fs-fuse#2152)")
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
eryugey added a commit to eryugey/s3fs-fuse that referenced this pull request Jun 1, 2023
Previously pcfstat points to a raw pointer, and it may be leaked if
function returned before deleting it.

So use smart pointer to automatically release the object.

Note that currently s3fs only uses c++03, so we use auto_ptr here, not
unique_ptr, which requires c++11.

Fixes: 6ca5a24 ("Fix two inconsistency issues between stat cache and cache file (s3fs-fuse#2152)")
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
eryugey added a commit to eryugey/s3fs-fuse that referenced this pull request Jun 1, 2023
Previously pcfstat points to a raw pointer, and it may be leaked if
function returned before deleting it.

So use smart pointer to automatically release the object.

Note that currently s3fs only uses c++03, so we use auto_ptr here, not
unique_ptr, which requires c++11.

Fixes: 6ca5a24 ("Fix two inconsistency issues between stat cache and cache file (s3fs-fuse#2152)")
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
gaul pushed a commit that referenced this pull request Jul 19, 2023
Previously pcfstat points to a raw pointer, and it may be leaked if
function returned before deleting it.

So use smart pointer to automatically release the object.

Note that currently s3fs only uses c++03, so we use auto_ptr here, not
unique_ptr, which requires c++11.

Fixes: 6ca5a24 ("Fix two inconsistency issues between stat cache and cache file (#2152)")
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
@gaul
Copy link
Member

gaul commented Jul 19, 2023

@eryugey could you share more about your use case? We have not seen users attempting to share a cache directory between multiple s3fs instances before. I don't think we guarantee any atomicity of these files.

@guaneryu
Copy link

guaneryu commented Aug 6, 2023

@eryugey could you share more about your use case? We have not seen users attempting to share a cache directory between multiple s3fs instances before. I don't think we guarantee any atomicity of these files.

I'm trying to use a distributed filesystem as s3fs's cache dir, so multiple s3fs instances may share the same cache to speed up I/O performance.

adamqqqplay pushed a commit to adamqqqplay/s3fs-fuse that referenced this pull request Oct 18, 2023
…fuse#2152)

commit 6ca5a24 upstream

* Fix inconsistency between stat cache file and cache file

We unlock stat cache file too early in FdEntity::Open(), and would
truncate cache file and update stat cache file, so there's a window that
stat cache doesn't reflect cache file status.

Suggested-by: Takeshi Nakatani <ggtakec@gmail.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>

* Mark pagelist as unloaded if cache file has been truncated

If cache file size doesn't match object size, the cache file might be
corrupted, so invalidate it and save new cache stat file.

Suggested-by: Takeshi Nakatani <ggtakec@gmail.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>

---------

Fixes: #49342993
Fixes: #49342673
Reference: s3fs-fuse#2152
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants