fix(files_external): S3 folder mtime updated on every read-only access#59539
Open
Antreesy wants to merge 4 commits into
Open
fix(files_external): S3 folder mtime updated on every read-only access#59539Antreesy wants to merge 4 commits into
Antreesy wants to merge 4 commits into
Conversation
33f416a to
4930285
Compare
16473af to
aa9e8ab
Compare
icewind1991
reviewed
Apr 24, 2026
icewind1991
reviewed
Apr 24, 2026
icewind1991
reviewed
Apr 24, 2026
icewind1991
reviewed
Apr 24, 2026
73aa23a to
f273bc7
Compare
f273bc7 to
960f3b4
Compare
Contributor
Author
|
@icewind1991 Can you take another look? |
normalizePath('') returns '.' for S3 object keys, but the filecache stores the
storage root under the key ''. getDirectoryMetaData() was calling getCache()->get('.')
which looks up by md5('.'), never matching the root entry stored at md5('').
The cache miss caused getDirectoryMetaData to return synthetic data with time() as
mtime/storage_mtime and uniqid() as etag on every call. The scanner then saw a
storage_mtime mismatch and wrote the fabricated timestamps back to the cache on every
occ files:scan run, regardless of whether any S3 content had changed.
Introduce getCachePath() to translate the normalized root path '.' back to '' before
any filecache lookup.
Signed-off-by: Maksim Sukharev <antreesy.web@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rs it Common::getMetaData (Common.php:667) always sets storage_mtime = mtime before returning. For S3 virtual directories this is wrong: mtime can be bumped by child propagation while storage_mtime should remain the actual last S3 change. When the scanner later calls getMetaData() it compares data['storage_mtime'] against cacheData['storage_mtime']. If they differ it writes the value back, triggering View::getCacheEntry to fire propagateChange on every read even when nothing on S3 changed. Override getMetaData() to restore storage_mtime from the live cache entry (or, for non-root directories not yet in the cache, from the S3 directory marker LastModified header) after the parent call. Signed-off-by: Maksim Sukharev <antreesy.web@gmail.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…changed Storage backends like AmazonS3 whose hasUpdated() always returns true (S3 has no cheap global change-detection mechanism) caused propagateChange() to be called on every watcher->update() after a folder was browsed, even when the underlying storage_mtime and etag were identical to the cached values. Before the fix, getCacheEntry() would call watcher->update() and then unconditionally call propagateChange() whenever the watcher reported a change. For S3 directories this meant every read bumped the parent mtime chain. After the fix, getCacheEntry() snapshots the cache entry before watcher->update() and compares it with the entry after. propagateChange() is only invoked when at least one metadata field (mtime, storage_mtime, size, or etag) actually changed. Signed-off-by: Maksim Sukharev <antreesy.web@gmail.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
960f3b4 to
82c63a3
Compare
Add null-safety checks to handle S3 responses that don't include
LastModified and ETag fields. This prevents 'Undefined array key'
warnings and deprecation notices when processing directory metadata
or incomplete S3 responses.
- objectToMetaData(): Check if LastModified/ETag exist before accessing
- getMetaData(): Check if LastModified exists before using in strtotime()
Fixes test failures in testStat where hasUpdated('/', time) would fail
when encountering S3 objects without complete metadata.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Maksim Sukharev <antreesy.web@gmail.com>
Contributor
Author
|
Attempted to trim down the changes to minimum |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The
mtime(andstorage_mtime) of an S3 external storage root folder was updatedon every
occ files:scan/occ files_external:scanrun and on every folder browse— even when nothing on S3 had changed.
AI findings
Compiled into HTML file here: s3-mtime-propagation.html
How to reproduce
occ files:scanorocc files_external:scanfor the folderRoot causes (3, fixed independently)
1. Wrong filecache key for the storage root (
AmazonS3#getDirectoryMetaData)normalizePath('')returns'.'for S3 object keys, but the filecache stores thestorage root under the key
''.getDirectoryMetaData()was callinggetCache()->get('.')which looks up bymd5('.'), never matching the root entrystored at
md5('').The cache miss caused
getDirectoryMetaDatato return synthetic data withtime()as
mtime/storage_mtimeanduniqid()asetagon every call. The scanner thensaw a
storage_mtimemismatch and wrote the fabricated timestamps back to the cacheon every scan run, regardless of whether any S3 content had changed.
Fix: introduce
getCachePath()to translate the normalized root path'.'back to''before any filecache lookup.2.
Common::getMetaDataclobbersstorage_mtime(AmazonS3#getMetaData)Common::getMetaData()always setsstorage_mtime = mtimebefore returning. For S3virtual directories
mtimecan be bumped by child propagation whilestorage_mtimeshould remain the actual last S3 change. When the scanner later calls
getMetaData()it compares
data['storage_mtime']againstcacheData['storage_mtime']; if theydiffer it writes the value back, triggering
View::getCacheEntryto firepropagateChangeon every read even when nothing on S3 changed.Fix: override
getMetaData()to restorestorage_mtimefrom the live cache entryafter the parent call.
3.
propagateChange()fired unconditionally after watcher update (View#getCacheEntry)S3's
hasUpdated()always returnstruefor directories (S3 has no cheap globalchange-detection mechanism).
getCacheEntry()calledwatcher->update()and thenunconditionally called
propagateChange()whenever the watcher reported a change,meaning every folder browse bumped the parent
mtimechain.Fix: snapshot the cache entry before
watcher->update()and compare it with theentry after.
propagateChange()is only invoked when at least one metadata field(
mtime,storage_mtime,size, oretag) actually changed.TODO
Checklist
3. to review, feature component)stable32)AI (if applicable)