fix(files_external): S3 folder mtime updated on every read-only access #59539

Open
Antreesy wants to merge 4 commits into master from fix/noid/s3-mtime-debug

Conversation

@Antreesy
Contributor

@Antreesy Antreesy commented Apr 9, 2026

  • Resolves: #

Summary

The mtime (and storage_mtime) of an S3 external storage root folder was updated
on every occ files:scan / occ files_external:scan run and on every folder browse
— even when nothing on S3 had changed.

AI findings

Compiled into HTML file here: s3-mtime-propagation.html

How to reproduce

  • Connect an S3 external storage and populate it (tested with rustfs as the S3 backend)
  • Create a public share and check that the folder mtimes have not changed
  • Reload the page: the 'Modified' date is updated
  • From the public link, open the share, list the folder contents, and download a file
  • Alternatively, run occ files:scan or occ files_external:scan for the folder
  • The 'Modified' date on the root folder is updated

Root causes (3, fixed independently)

1. Wrong filecache key for the storage root (AmazonS3#getDirectoryMetaData)

normalizePath('') returns '.' for S3 object keys, but the filecache stores the
storage root under the key ''. getDirectoryMetaData() was calling
getCache()->get('.') which looks up by md5('.'), never matching the root entry
stored at md5('').

The cache miss caused getDirectoryMetaData to return synthetic data with time()
as mtime/storage_mtime and uniqid() as etag on every call. The scanner then
saw a storage_mtime mismatch and wrote the fabricated timestamps back to the cache
on every scan run, regardless of whether any S3 content had changed.

Fix: introduce getCachePath() to translate the normalized root path '.' back to
'' before any filecache lookup.
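
The key mismatch can be illustrated with a small model. This is a sketch in Python, not the PHP from the patch: `normalize_path`, `get_cache_path`, and the dictionary-based `filecache` are simplified stand-ins that assume only what is described above (root normalized to '.', cache entries looked up by the md5 of the path).

```python
import hashlib


def normalize_path(path: str) -> str:
    """Stand-in for the S3 path normalization: the storage root becomes '.'."""
    path = path.strip("/")
    return path if path else "."


def get_cache_path(normalized: str) -> str:
    """Translate the normalized root '.' back to the filecache root key ''."""
    return "" if normalized == "." else normalized


def path_hash(path: str) -> str:
    """The toy filecache is keyed by the md5 of the path."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()


# Toy filecache holding only the storage root, stored under the key ''.
filecache = {path_hash(""): {"path": "", "mtime": 1700000000, "etag": "abc"}}


def lookup(path: str):
    return filecache.get(path_hash(path))


root = normalize_path("")
miss = lookup(root)                 # looks up md5('.'): no entry
hit = lookup(get_cache_path(root))  # looks up md5(''): the root entry
```

In this model `miss` is empty while `hit` is the stored root entry, which is the difference between fabricating metadata on every call and reusing the cached values.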

2. Common::getMetaData clobbers storage_mtime (AmazonS3#getMetaData)

Common::getMetaData() always sets storage_mtime = mtime before returning. For S3
virtual directories mtime can be bumped by child propagation while storage_mtime
should remain the actual last S3 change. When the scanner later calls getMetaData()
it compares data['storage_mtime'] against cacheData['storage_mtime']; if they
differ it writes the value back, triggering View::getCacheEntry to fire
propagateChange on every read even when nothing on S3 changed.

Fix: override getMetaData() to restore storage_mtime from the live cache entry
after the parent call.
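
A minimal Python sketch of the idea, assuming a simplified metadata dictionary. `common_get_metadata` and `s3_get_metadata` are hypothetical names that model the parent call and the override; they are not the PHP signatures.

```python
from typing import Optional


def common_get_metadata(mtime: int) -> dict:
    """Models the parent behaviour: storage_mtime is always set equal to mtime."""
    return {"mtime": mtime, "storage_mtime": mtime}


def s3_get_metadata(mtime: int, cached: Optional[dict]) -> dict:
    """Models the override: call the parent, then restore storage_mtime from the cache."""
    data = common_get_metadata(mtime)
    if cached is not None and "storage_mtime" in cached:
        data["storage_mtime"] = cached["storage_mtime"]
    return data


# A cached directory whose mtime was bumped by child propagation,
# while storage_mtime still records the last real S3 change.
cached_entry = {"mtime": 2000, "storage_mtime": 1000}

before = common_get_metadata(cached_entry["mtime"])
after = s3_get_metadata(cached_entry["mtime"], cached_entry)
```

With the parent alone, `before["storage_mtime"]` no longer matches the cached value, so the scanner sees a difference. With the override, `after["storage_mtime"]` equals the cached value and there is nothing to write back.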

3. propagateChange() fired unconditionally after watcher update (View#getCacheEntry)

S3's hasUpdated() always returns true for directories (S3 has no cheap global
change-detection mechanism). getCacheEntry() called watcher->update() and then
unconditionally called propagateChange() whenever the watcher reported a change,
meaning every folder browse bumped the parent mtime chain.

Fix: snapshot the cache entry before watcher->update() and compare it with the
entry after. propagateChange() is only invoked when at least one metadata field
(mtime, storage_mtime, size, or etag) actually changed.
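
The snapshot-and-compare step could look roughly like the following Python model. The cache is a plain dictionary and `watcher_update` / `propagate_change` are hypothetical callbacks standing in for the watcher and the propagator; only the four compared fields come from the description above.

```python
COMPARED_FIELDS = ("mtime", "storage_mtime", "size", "etag")


def entry_changed(before, after) -> bool:
    """True if the entry appeared, disappeared, or any compared field differs."""
    if before is None or after is None:
        return before is not after
    return any(before.get(f) != after.get(f) for f in COMPARED_FIELDS)


def get_cache_entry(cache: dict, path: str, watcher_update, propagate_change):
    """Snapshot the entry, run the watcher, and propagate only on a real change."""
    before = dict(cache[path]) if path in cache else None  # copy, not a reference
    watcher_update(cache, path)
    after = cache.get(path)
    if entry_changed(before, after):
        propagate_change(path)
    return after


def noop_update(cache: dict, path: str) -> None:
    """A watcher run that rewrites the entry with identical values."""
    cache[path] = dict(cache[path])


def real_update(cache: dict, path: str) -> None:
    """A watcher run that picks up a genuine change on the storage."""
    cache[path] = {**cache[path], "mtime": 200, "etag": "e2"}


propagated = []
cache = {"folder": {"mtime": 100, "storage_mtime": 100, "size": 10, "etag": "e1"}}
get_cache_entry(cache, "folder", noop_update, propagated.append)  # no propagation
get_cache_entry(cache, "folder", real_update, propagated.append)  # propagates once
```

Taking the snapshot as a copy matters in this model: comparing against a reference to the same dictionary would never detect a change.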


[Screenshots: before and after]

TODO

  • ...

Checklist

AI (if applicable)

  • The content of this PR was partly or fully generated using AI

@Antreesy Antreesy added this to the Nextcloud 34 milestone Apr 9, 2026
@Antreesy Antreesy self-assigned this Apr 9, 2026
@Antreesy Antreesy force-pushed the fix/noid/s3-mtime-debug branch from 33f416a to 4930285 Compare April 9, 2026 14:51
@Antreesy Antreesy marked this pull request as ready for review April 15, 2026 07:21
@Antreesy Antreesy requested a review from a team as a code owner April 15, 2026 07:21
@Antreesy Antreesy requested review from icewind1991 and nfebe and removed request for a team April 15, 2026 07:21
@Antreesy Antreesy assigned Antreesy and unassigned Antreesy Apr 15, 2026
@Antreesy Antreesy force-pushed the fix/noid/s3-mtime-debug branch from 16473af to aa9e8ab Compare April 24, 2026 08:07
Comment thread lib/private/Files/View.php Outdated
Comment thread apps/files_external/lib/Lib/Storage/AmazonS3.php Outdated
Comment thread apps/files_external/lib/Lib/Storage/AmazonS3.php Outdated
Comment thread apps/files_external/lib/Lib/Storage/AmazonS3.php Outdated
@Antreesy Antreesy force-pushed the fix/noid/s3-mtime-debug branch 2 times, most recently from 73aa23a to f273bc7 Compare April 27, 2026 13:16
@joshtrichards joshtrichards added the hotspot: file time handling ctime, mtime, etc. handling during various operations label May 17, 2026
@Antreesy Antreesy force-pushed the fix/noid/s3-mtime-debug branch from f273bc7 to 960f3b4 Compare May 20, 2026 07:55
@Antreesy Antreesy requested a review from icewind1991 May 20, 2026 13:38
@Antreesy
Contributor Author

@icewind1991 Can you take another look?

Antreesy and others added 3 commits May 22, 2026 17:55
normalizePath('') returns '.' for S3 object keys, but the filecache stores the
storage root under the key ''. getDirectoryMetaData() was calling getCache()->get('.')
which looks up by md5('.'), never matching the root entry stored at md5('').

The cache miss caused getDirectoryMetaData to return synthetic data with time() as
mtime/storage_mtime and uniqid() as etag on every call. The scanner then saw a
storage_mtime mismatch and wrote the fabricated timestamps back to the cache on every
occ files:scan run, regardless of whether any S3 content had changed.

Introduce getCachePath() to translate the normalized root path '.' back to '' before
any filecache lookup.

Signed-off-by: Maksim Sukharev <antreesy.web@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rs it

Common::getMetaData (Common.php:667) always sets storage_mtime = mtime before
returning. For S3 virtual directories this is wrong: mtime can be bumped by
child propagation while storage_mtime should remain the actual last S3 change.

When the scanner later calls getMetaData() it compares data['storage_mtime']
against cacheData['storage_mtime']. If they differ it writes the value back,
triggering View::getCacheEntry to fire propagateChange on every read even when
nothing on S3 changed.

Override getMetaData() to restore storage_mtime from the live cache entry (or,
for non-root directories not yet in the cache, from the S3 directory marker
LastModified header) after the parent call.

Signed-off-by: Maksim Sukharev <antreesy.web@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…changed

Storage backends like AmazonS3 whose hasUpdated() always returns true (S3 has
no cheap global change-detection mechanism) caused propagateChange() to be called
on every watcher->update() after a folder was browsed, even when the underlying
storage_mtime and etag were identical to the cached values.

Before the fix, getCacheEntry() would call watcher->update() and then
unconditionally call propagateChange() whenever the watcher reported a change.
For S3 directories this meant every read bumped the parent mtime chain.

After the fix, getCacheEntry() snapshots the cache entry before watcher->update()
and compares it with the entry after. propagateChange() is only invoked when at
least one metadata field (mtime, storage_mtime, size, or etag) actually changed.

Signed-off-by: Maksim Sukharev <antreesy.web@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Antreesy Antreesy force-pushed the fix/noid/s3-mtime-debug branch from 960f3b4 to 82c63a3 Compare May 22, 2026 16:00
Add null-safety checks to handle S3 responses that don't include
LastModified and ETag fields. This prevents 'Undefined array key'
warnings and deprecation notices when processing directory metadata
or incomplete S3 responses.

- objectToMetaData(): Check if LastModified/ETag exist before accessing
- getMetaData(): Check if LastModified exists before using in strtotime()

Fixes test failures in testStat where hasUpdated('/', time) would fail
when encountering S3 objects without complete metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Maksim Sukharev <antreesy.web@gmail.com>
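
The null-safety checks from the last commit can be modelled as below. This is a Python sketch under stated assumptions: the fallback values (`now` for a missing LastModified, `None` for a missing ETag) and the quote stripping are illustrative choices, not confirmed details of the patch, and `object_to_metadata` is a hypothetical stand-in for objectToMetaData().

```python
from datetime import datetime, timezone


def object_to_metadata(obj: dict, now: int) -> dict:
    """Build metadata from an S3 object record that may lack LastModified or ETag."""
    last_modified = obj.get("LastModified")  # may be absent for directory markers
    etag = obj.get("ETag")                   # may be absent in incomplete responses
    if last_modified:
        mtime = int(datetime.fromisoformat(last_modified).timestamp())
    else:
        mtime = now  # assumed fallback, for illustration only
    return {
        "mtime": mtime,
        "etag": etag.strip('"') if etag else None,  # assumed: quoted ETag values
    }


complete = object_to_metadata(
    {"LastModified": "2026-04-09T14:51:00+00:00", "ETag": '"abc123"'}, now=0
)
incomplete = object_to_metadata({}, now=42)
```

The point of the sketch is only that every access to an optional field is guarded, so an incomplete record yields fallback values instead of an undefined-key warning.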
@Antreesy
Contributor Author

Attempted to trim the changes down to the minimum.
Tested with rustfs connected as S3: neither occ files:scan nor occ files_external:scan updates the mtime of the root folder cache entry.

Labels

  • 3. to review (Waiting for reviews)
  • bug
  • feature: external storage
  • hotspot: file time handling (ctime, mtime, etc. handling during various operations)

4 participants