
Files are not deleted from S3 (primary) #20333

Open
solracsf opened this issue Apr 6, 2020 · 34 comments
Labels
0. Needs triage (Pending check for reproducibility or if it fits our roadmap) · 25-feedback · bug · feature: object storage · feature: trashbin

Comments

@solracsf
Member

solracsf commented Apr 6, 2020

How to use GitHub

  • Please use the 👍 reaction to show that you are affected by the same issue.
  • Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
  • Subscribe to receive notifications on status change and new comments.

Steps to reproduce

  1. Set S3 as primary storage
  2. Upload, say, 2000 files into a folder (JPGs here)
  3. Delete that folder, and try to empty the trashbin

Expected behaviour

Trashbin should be emptied correctly

Actual behaviour

After some time, an error appears ("Error while empty trash"). Reloading the page shows no more files, either in Files or in the Trashbin.

image

But the files are still on the object storage; here are the OBJECTS and SIZE stats:

image

Before these test operations (upload, delete...)

image

The following commands were executed afterwards:

sudo -u testing php occ files:scan test

Starting scan for user 1 out of 1 (test)
+---------+-------+--------------+
| Folders | Files | Elapsed time |
+---------+-------+--------------+
| 0       | 0     | 00:00:00     |
+---------+-------+--------------+
sudo -u testing php occ files:cleanup

0 orphaned file cache entries deleted
sudo -u www-data php occ trashbin:cleanup --all-users

Remove deleted files for all users
Remove deleted files for users on backend Database
   test

On a production instance, one user has reported that the interface shows he is "using" 1.9 GB of storage, but he has NO FILES or FOLDERS at all, either in FILES or TRASHBIN.
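For reference, the OBJECTS and SIZE figures from the screenshots can also be read straight from the S3 API. A minimal boto3 sketch (the endpoint and bucket below match the config further down; the credentials are placeholders):

# Sketch: count objects and total size in the Nextcloud bucket.
# Endpoint/bucket taken from the config below; credentials are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://10.1.0.2:8080",
    aws_access_key_id="KEY",
    aws_secret_access_key="SECRET",
)

objects, size = 0, 0
for page in s3.get_paginator("list_objects_v2").paginate(Bucket="testing.example.com"):
    for obj in page.get("Contents", []):
        objects += 1
        size += obj["Size"]

print(f"OBJECTS: {objects}  SIZE: {size / 1024**3:.2f} GiB")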

Server configuration

Operating system: Ubuntu 18.04

Web server: Nginx 1.17

Database: MariaDB 10.4

PHP version: 7.3

Nextcloud version: (see Nextcloud admin page) 18.0.3

Updated from an older Nextcloud/ownCloud or fresh install: Fresh install

Where did you install Nextcloud from: Official sources

Signing status:

Signing status
No errors have been found.

List of activated apps:

App list
Enabled:
  - accessibility: 1.4.0
  - admin_audit: 1.8.0
  - announcementcenter: 3.7.0
  - apporder: 0.9.0
  - cloud_federation_api: 1.1.0
  - dav: 1.14.0
  - external: 3.5.0
  - federatedfilesharing: 1.8.0
  - files: 1.13.1
  - files_accesscontrol: 1.8.1
  - files_automatedtagging: 1.8.2
  - files_pdfviewer: 1.7.0
  - files_rightclick: 0.15.2
  - files_sharing: 1.10.1
  - files_trashbin: 1.8.0
  - files_versions: 1.11.0
  - files_videoplayer: 1.7.0
  - groupfolders: 6.0.3
  - impersonate: 1.5.0
  - logreader: 2.3.0
  - lookup_server_connector: 1.6.0
  - notifications: 2.6.0
  - oauth2: 1.6.0
  - password_policy: 1.8.0
  - privacy: 1.2.0
  - provisioning_api: 1.8.0
  - settings: 1.0.0
  - sharebymail: 1.8.0
  - theming: 1.9.0
  - theming_customcss: 1.5.0
  - twofactor_backupcodes: 1.7.0
  - viewer: 1.2.0
  - workflow_script: 1.3.1
  - workflowengine: 2.0.0

Nextcloud configuration:

Config report
{
    "system": {
        "objectstore": {
            "class": "\\OC\\Files\\ObjectStore\\S3",
            "arguments": {
                "bucket": "testing.example.com",
                "autocreate": true,
                "key": "***REMOVED SENSITIVE VALUE***",
                "secret": "***REMOVED SENSITIVE VALUE***",
                "hostname": "10.1.0.2",
                "port": 8080,
                "use_ssl": false,
                "region": "fr-par",
                "use_path_style": true
            }
        },
        "log_type": "file",
        "logfile": "\/var\/log\/nextcloud\/testing.example.com-nextcloud.log",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "testing.example.com"
        ],
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "dbtype": "mysql",
        "version": "18.0.3.0",
        "overwrite.cli.url": "https:\/\/testing.example.com",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbport": "3306",
        "dbtableprefix": "oc_",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "dbdriveroptions": {
            "1009": "\/etc\/ssl\/mysql\/ca-cert.pem",
            "1008": "\/etc\/ssl\/mysql\/client-cert.pem",
            "1007": "\/etc\/ssl\/mysql\/client-key.pem",
            "1014": false
        },
        "installed": true,
        "skeletondirectory": "",
        "default_language": "fr",
        "default_locale": "fr_FR",
        "activity_expire_days": 30,
        "auth.bruteforce.protection.enabled": false,
        "blacklisted_files": [
            ".htaccess",
            "Thumbs.db",
            "thumbs.db"
        ],
        "htaccess.RewriteBase": "\/",
        "integrity.check.disabled": false,
        "knowledgebaseenabled": false,
        "logtimezone": "Europe\/Paris",
        "maintenance": false,
        "memcache.local": "\\OC\\Memcache\\APCu",
        "memcache.distributed": "\\OC\\Memcache\\Redis",
        "updatechecker": false,
        "appstoreenabled": false,
        "upgrade.disable-web": true,
        "filelocking.enabled": false,
        "overwriteprotocol": "https",
        "preview_max_scale_factor": 1,
        "redis": {
            "host": "***REMOVED SENSITIVE VALUE***",
            "port": 6379,
            "timeout": 2.5,
            "dbindex": 2,
            "password": "***REMOVED SENSITIVE VALUE***"
        },
        "quota_include_external_storage": false,
        "theme": "",
        "trashbin_retention_obligation": "auto, 7",
        "updater.release.channel": "stable",
        "mail_smtpmode": "smtp",
        "mail_smtpsecure": "tls",
        "mail_sendmailmode": "smtp",
        "mail_from_address": "***REMOVED SENSITIVE VALUE***",
        "mail_domain": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpauth": 1,
        "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpport": "587",
        "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
        "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "overwritehost": "testing.example.com",
        "preview_max_x": "1280",
        "preview_max_y": "800",
        "jpeg_quality": "70",
        "loglevel": 2,
        "enabledPreviewProviders": [
            "OC\\Preview\\PNG",
            "OC\\Preview\\JPEG",
            "OC\\Preview\\GIF",
            "OC\\Preview\\BMP",
            "OC\\Preview\\XBitmap"
        ],
        "apps_paths": [
            {
                "path": "\/var\/www\/apps",
                "url": "\/apps",
                "writable": false
            },
            {
                "path": "\/var\/www\/custom",
                "url": "\/custom_apps",
                "writable": true
            }
        ]
    }
}

Logs are completely empty (we have just fired up a test instance, and test this use case).

Similar to #17744

@solracsf solracsf added the bug and 0. Needs triage (Pending check for reproducibility or if it fits our roadmap) labels Apr 6, 2020
@solracsf solracsf changed the title from "If empty the trash times out, files are not deleted from S3 (primary)" to "If empty the trash fails, files are not deleted from S3 (primary)" Apr 6, 2020
@SimplyCorbett

Your issue is likely filelocking. Disable it in the Nextcloud config, disable Redis filelocking, restart PHP-FPM, and try reproducing this again.

In my case, disabling filelocking resolved all of my issues related to deletion. I just let the S3 backend handle the filelocking now.

@solracsf
Member Author

solracsf commented Apr 6, 2020

Filelocking is already disabled (see my config in the 1st post).

@SimplyCorbett

SimplyCorbett commented Apr 6, 2020

Filelocking is already disabled (see my config in the 1st post).

My bad, another foot-in-mouth moment. If you wait a while, are they removed from the backend? Sometimes with S3, deletion is delayed on the backend.

@solracsf
Member Author

solracsf commented Apr 6, 2020

Thanks, but I don't think so: if I upload a 200 MB file and delete it, I can see it in real time in the S3 backend. And 2 hours have passed now and the files are still there (cron is running every 5 minutes).

@SimplyCorbett

Thanks, but I don't think so: if I upload a 200 MB file and delete it, I can see it in real time in the S3 backend. And 2 hours have passed now and the files are still there (cron is running every 5 minutes).

Right, but with S3 in particular, if a file is removed but is locked on the S3 backend, it can take a while for it to process the deletions. 2 hours is a fairly long time, though.

If you have your S3 provider run the garbage collection process do the files stay or are they deleted?

@SimplyCorbett

With amazon they also include an option to retain locked objects for x days.

https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lock-overview.html
https://aws.amazon.com/blogs/storage/protecting-data-with-amazon-s3-object-lock/

Can you verify that's not the case and garbage collection doesn't resolve the issue?
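For what it's worth, one hedged way to verify this with boto3; the endpoint, bucket, and credentials below are placeholders:

# Sketch: check whether S3 Object Lock is configured on a bucket.
# Endpoint, bucket and credentials are placeholders.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3", endpoint_url="https://s3.example.com",
                  aws_access_key_id="KEY", aws_secret_access_key="SECRET")
try:
    conf = s3.get_object_lock_configuration(Bucket="my-bucket")
    print("Object Lock:", conf["ObjectLockConfiguration"])
except ClientError as err:
    if err.response["Error"]["Code"] == "ObjectLockConfigurationNotFoundError":
        print("No Object Lock configuration on this bucket")
    else:
        raise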

@solracsf
Member Author

solracsf commented Apr 6, 2020

I'm not using Amazon but Scaleway; they have Lifecycle Rules, but they are disabled by default.
What do you mean by GC in S3?

@SimplyCorbett

I'm not using Amazon but Scaleway; they have Lifecycle Rules, but they are disabled by default.
What do you mean by GC in S3?

I use radosgw-admin gc process. Each host has its own rules for garbage collection. Do they have an end-user option for garbage collection or an API call to run it?

If not, you would need to contact them directly and ask how often it runs and whether they can run it now.

@SimplyCorbett

SimplyCorbett commented Apr 6, 2020

Edit: Scaleway runs it once a day on their cold S3 storage. I don't know about the other storage options - best contact them about it.

@SimplyCorbett

SimplyCorbett commented Apr 7, 2020

I owe you an apology. I don't use Nextcloud for images but for file storage. You are correct that files are not being deleted properly when it comes to image previews.

@JUVOJustin

Same here with Wasabi as the storage backend. I am having many problems with S3 currently. Maybe something more general is broken.

@solracsf solracsf changed the title from "If empty the trash fails, files are not deleted from S3 (primary)" to "Files are not deleted from S3 (primary)" Apr 24, 2020
@solracsf
Member Author

solracsf commented Nov 24, 2020

I can still confirm this with v19.0.5.
My test instance is completely empty, no files at all, trashbin cleaned, but mc outputs this:

./mc du minio/bucket
2.7GiB

and ./mc ls minio/bucket lists hundreds of files from my different tests.
Some of the files were created more than one month ago in the bucket.

These are clearly not image previews, as I have big files in the bucket:

./mc ls minio/bucket
...
[2020-11-24 20:16:54 CET] 115KiB urn:oid:50503
[2020-10-07 10:10:30 CEST] 1.1KiB urn:oid:15082
[2020-11-24 20:38:10 CET]  99KiB urn:oid:55762
[2020-10-07 10:09:26 CEST] 5.5MiB urn:oid:14773
[2020-11-24 20:38:09 CET] 192KiB urn:oid:55750
[2020-10-07 09:59:00 CEST]  26KiB urn:oid:11050
[2020-10-07 10:10:27 CEST]   110B urn:oid:15034
[2020-10-06 11:21:30 CEST] 360KiB urn:oid:883
[2020-11-24 20:21:33 CET] 271KiB urn:oid:54307
[2020-10-07 09:59:52 CEST]  25MiB urn:oid:11158
...

Summary: an empty instance, and a bucket with 2.87 GB used and 3685 objects in it.
😮

@changsheng1239

Yes, I can confirm this issue with Nextcloud 20.0.1 as well.
My steps:

  1. Upload a 500 MB file.
  2. Cancel the upload at 100 MB.

Now my MinIO bucket has 10 x 10 MB chunked files which should have been deleted.

@solracsf
Member Author

solracsf commented Dec 31, 2020

I've tested it with AWS S3, to eliminate any compatibility issues with S3 'compatible' providers.
Object lock is disabled for this test case.

Problem remains.

  • Create an instance with one single user (admin) with AWS bucket as primary
  • Log in and upload 5 folders from my external HDD containing thousands of sub-folders and files of all sorts, for a total of 13.8 GiB
  • Select and delete them all: an error like "Can't delete folder" (nothing in the logs about this) showed up in the web UI for 2 of them, but after refreshing the Files tab it shows an empty root
  • Empty the trashbin with php occ trashbin:clean --all-users

Check the bucket stats of the now file-empty NC instance:

./mc du --versions aws/bucket
2.9GiB

This is a serious problem for many reasons, especially GDPR, when users request their files to be deleted and they aren't, on top of the S3 billing for objects we aren't using anymore.

cc @nextcloud/server-triage can someone take a look at this? I believe this affects every ObjectStorage instance, but since files are named urn:oid:xxx, nobody really knows which files are in their buckets.

@disco-panda

+1 for GDPR concerns.

It has been almost a year since this issue was brought up and S3 is heavily used in enterprise environments - any ideas when this will be prioritized? Unfortunately, we cannot rely on using S3 for storage if we cannot show that files are completely removed.

@caretuse

caretuse commented Apr 3, 2021

I have a suggestion on this issue.

I know it is hard to keep files (whether on a filesystem or on object storage) and the database in sync, not to mention handling the caches involved. I believe it is impossible to keep the database correct after a hardware failure, such as a simple power failure. So I suggest there should be a way to check the current file list and file information against the database.

There is a command, occ files:scan, to sync files and the database when using filesystem-based storage, but it is not applicable in setups using an object store as primary storage. I believe every server using object storage uses a standalone bucket (or directory), so it would be safe to clean up uncontrolled or unregistered files.

I would also appreciate it if developers took a look at the object-server direct download function; an issue was opened at #14675. This function can save our servers non-essential bandwidth and server load.

I use object storage (MinIO) as primary storage because files can be backed up easily. I don't need to shut down the Nextcloud server for a long time, and I can separate the database and file server easily. I believe this setup will be widely used at the enterprise level, and I hope this suggestion can help Nextcloud with deployment and migration.

@siglun88

I am having the same issue running Nextcloud 21.0.2 with Digital Ocean Spaces (S3) as primary storage. In my case it seems that the issue only occurs when server-side encryption is activated, although I haven't tested much without encryption, so I can't be too conclusive.

Also, I agree with @caretuse.
It would be much appreciated if both of these features could be implemented in some future release:

There is a command, occ files:scan, to sync files and the database when using filesystem-based storage, but it is not applicable in setups using an object store as primary storage. I believe every server using object storage uses a standalone bucket (or directory), so it would be safe to clean up uncontrolled or unregistered files.

I would also appreciate it if developers took a look at the object-server direct download function; an issue was opened at #14675. This function can save our servers non-essential bandwidth and server load.

@jeffglancy

Yes, I can confirm this issue with Nextcloud 20.0.1 as well.
My steps:

  1. Upload a 500 MB file.
  2. Cancel the upload at 100 MB.

Now my MinIO bucket has 10 x 10 MB chunked files which should have been deleted.

I have been running Nextcloud using S3 storage for over two years. I noticed my bucket was bloating early on. Digital Ocean shows my S3 was using 800GB even though my only user had 218 GB of files including versioning. I've been watching this issue for a long time now hoping for a solution, but finally got around to looking into it myself.

I compared the s3cmd la file list output to the oc_filecache database table. I expected to find extra trash files in the S3 bucket. I was perplexed to find that the file list matched perfectly. This was easy to check as the database fileid is the urn:oid:___ number. The file size from the data also matched. This led me to research more about S3 storage.

I finally found that the bloat was from old incomplete uploads. You can list these using the s3cmd multipart s3://BUCKET/ command. S3 allows large uploads to be uploaded as smaller multipart files which are then concatenated when the upload is complete. This helps reduce data transfer in case of interrupted uploads as it can resume near where it left off. It appears neither Nextcloud nor S3 storage is set to delete old incomplete multipart uploads by default. You can remove each file set individually using s3cmd abortmp s3://BUCKET/FILENAME UPLOAD_ID.
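The same listing and aborting can also be done with boto3 instead of s3cmd; a minimal sketch, with the bucket name as a placeholder and the abort call left commented out:

# Sketch: list incomplete multipart uploads; optionally abort them.
import boto3

s3 = boto3.client("s3")  # uses the normal AWS credential chain
bucket = "BUCKET"

for up in s3.list_multipart_uploads(Bucket=bucket).get("Uploads", []):
    print(up["Initiated"], up["Key"], up["UploadId"])
    # Uncomment to actually abort this stale upload:
    # s3.abort_multipart_upload(Bucket=bucket, Key=up["Key"], UploadId=up["UploadId"])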

Nextcloud could remove old multipart data if it kept track of them. But S3 has the ability to do so on its own. Using s3cmd you upload an XML rule to the S3 bucket:
s3cmd setlifecycle lifecycle.xml s3://BUCKET/
Where lifecycle.xml is:

<LifecycleConfiguration>
        <Rule>
                <ID>Remove uncompleted uploads</ID>
                <Prefix/>
                <Status>Enabled</Status>
                <AbortIncompleteMultipartUpload>
                        <DaysAfterInitiation>3</DaysAfterInitiation>
                </AbortIncompleteMultipartUpload>
        </Rule>
</LifecycleConfiguration>

This rule runs once a day at midnight UTC, according to what I found. After waiting a day, my nearly 800 incomplete uploads spanning over two years were gone, and my S3 storage now sits at 220 GB as it should.
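For those not using s3cmd, an equivalent rule can also be pushed with boto3; a sketch, with the bucket name as a placeholder:

# Sketch: lifecycle rule that aborts incomplete multipart uploads after 3 days.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="BUCKET",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "Remove uncompleted uploads",
            "Filter": {"Prefix": ""},  # apply to the whole bucket
            "Status": "Enabled",
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 3},
        }]
    },
)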

This doesn't appear to be the solution to all of the issues in this thread, but hopefully it helps some. In my case the files marked as trash or versioning in the database were being removed correctly according to the rules I have in Nextcloud's config file. I have transactional file locking disabled and encryption is not enabled.

@szaimen
Contributor

szaimen commented Aug 8, 2021

I suppose this is still happening on NC21.0.4?

@Sivarion

Sivarion commented Aug 29, 2021

I suppose this is still happening on NC21.0.4?

I use Nextcloud 22.0.1 and have the exact same problem with Scaleway S3. At this point I have about 35 GB used by my users, but storage is filled with 74 GB.

Edit:
Manually running ./occ trashbin:clean --all-users has fixed it for me, but I guess the problem will return in time.

@szaimen
Contributor

szaimen commented Sep 15, 2021

Manually running ./occ trashbin:clean --all-users has fixed it for me

Looks like the original issue is fixed then.

@acsfer can you still reproduce this on NC21.0.4 or NC22.1.1?

@solracsf
Member Author

@szaimen can't help anymore here, we moved away from S3...

@krakazyabra

I can confirm the problem still exists, even after upgrading to the latest (22.1.1) version.
./occ trashbin:clean --all-users didn't help.
In the interface I see 146.7 GB used, while MinIO shows 961 GB for this user and his bucket.
image

image

@ghost

ghost commented Nov 4, 2021

This issue has been automatically marked as stale because it has not had recent activity and seems to be missing some essential information. It will be closed if no further activity occurs. Thank you for your contributions.

@ghost ghost added the stale (Ticket or PR with no recent activity) label Nov 4, 2021
@szaimen szaimen added the 1. to develop (Accepted and waiting to be taken care of) label and removed the needs info, 0. Needs triage (Pending check for reproducibility or if it fits our roadmap) and stale (Ticket or PR with no recent activity) labels Nov 4, 2021
@agowa

agowa commented Nov 23, 2021

I have a similar issue with my S3 bucket. But for me it's most likely caused by partially failed multipart uploads, where Nextcloud didn't clean up the chunks it had already pushed into S3 after the upload failed (other issue: #29516).

@leonbollerup

Same issue here. Basically, I suggest everyone avoid using S3 as primary storage unless you want to throw money out the window.

@Scandiravian

Scandiravian commented Feb 22, 2022

I snooped around the Nextcloud database and it seems that the issue is that objects uploaded to S3 are not committed to the db until the transfer to S3 is completed. If a transfer is interrupted, Nextcloud loses track of the object, since no record of ongoing transfers is kept.

A potential fix could be to log ongoing transfers in the database and occasionally do a clean-up if something goes wrong.

Until this is fixed, Nextcloud will continue to bloat the bucket, so I've hacked together a Python script that cleans up the S3 storage. It doesn't solve any of the open issues with using S3 as primary storage - it simply cleans up orphaned objects in the bucket, thereby bringing down the amount of storage used by Nextcloud.

DISCLAIMER:
I'm a stranger on the internet providing a script that requires access to your personal (and probably sensitive) data -> Do not trust strangers on the internet. Please review the code before running it. I'm not responsible if this script destroys your data, corrupts your db, makes your house catch fire, or curses you to step on Lego bricks every time you have bare feet.

Since the issue seems to be caused by the db not being updated until a transfer to S3 is complete, the script might delete objects that have successfully been transferred to S3 but have not yet been recorded in the database, if it's run while a sync is in progress. Therefore you should not run this while a sync is in progress. I repeat: Do not run this while a sync is in progress!

I've run/tested this against my own setup (MinIO + Postgres) and haven't encountered any issues so far. If you use any other combination of S3-compatible storage and database, you'll need to modify the code to your needs:

gist
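For illustration only (this is not the gist itself), a condensed, read-only sketch of the same idea for MinIO + Postgres; all connection details are placeholders, and it only reports orphaned objects rather than deleting anything:

# Read-only sketch: report S3 objects with no matching row in oc_filecache.
# Connection details are placeholders; adapt to your own setup.
import boto3
import psycopg2

db = psycopg2.connect(host="localhost", dbname="nextcloud",
                      user="nextcloud", password="SECRET")
with db.cursor() as cur:
    cur.execute("SELECT fileid FROM oc_filecache")
    known = {f"urn:oid:{row[0]}" for row in cur.fetchall()}

s3 = boto3.client("s3", endpoint_url="http://minio:9000",
                  aws_access_key_id="KEY", aws_secret_access_key="SECRET")
in_bucket = set()
for page in s3.get_paginator("list_objects_v2").paginate(Bucket="nextcloud"):
    for obj in page.get("Contents", []):
        in_bucket.add(obj["Key"])

orphans = sorted(k for k in in_bucket if k.startswith("urn:oid:") and k not in known)
print(f"{len(orphans)} orphaned object(s)")
for key in orphans:
    print(key)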

@jeffglancy

jeffglancy commented Feb 22, 2022

Why reinvent the wheel? Look back at my post on July 4 and S3 lifecycle rules. Since then I have had zero issues with S3 storage bloating from NC 20 through 23.
(#20333 (comment))

@caretuse

@Scandiravian made a good script to solve the database inconsistency issue, although I believe this should be implemented in occ trashbin:cleanup --all-users, just like NeoTheThird mentioned in #29841.

@jeffglancy and otherguy also made a good script to solve another issue, which is cleaning up pending multipart uploads in S3. But lazy as I am, I would choose rclone cleanup s3:bucket from the rclone documentation, a rather simple and mistake-proof solution.

@solracsf solracsf closed this as completed Oct 3, 2022
@szaimen szaimen reopened this Oct 3, 2022
@szaimen

This comment has been minimized.

@szaimen szaimen added the needs info and 0. Needs triage (Pending check for reproducibility or if it fits our roadmap) labels and removed the 1. to develop (Accepted and waiting to be taken care of) label Jan 9, 2023
@caretuse

I tested some scenarios:

  1. Delete an S3 object with the Nextcloud server shut down: files remain, even after occ files:scan --all, until deleted manually from Nextcloud
  2. Turn off the S3 server (MinIO) while Nextcloud is deleting files: files remain in the trash bin (they appear again after reloading the webpage)

I can't confirm the scenario where files are not shown in Nextcloud; that would have to be manipulated at the database level, which goes beyond my interest.

Does anyone have an environment to test?

@Corinari

Corinari commented Feb 9, 2023

Hi @szaimen ,

we are currently running Nextcloud 25 and still experience this problem. On one instance, our S3 bucket shows 209 GB of data, while adding up the different users' quotas in NC itself comes to about 55 GB.
select sum(size/1024/1024/1024) as size_GB, count(*) as anzahl from oc_filecache where mimetype != 2; shows around 203 GB of data which is tracked by NC.
The trashbin is (almost) empty (~2 GB).

This occurs on different instances, which were built with a custom Docker image.

@mrAceT

mrAceT commented Feb 20, 2023

@Corinari

I created a script (S3->local) once upon a time when I had trouble with S3, partially because I found a bug and feared it was S3-related (it wasn't; fixed that one: #34422 ;) ). Later on (partially while creating that migration script), I dared to try to migrate back to S3. "Reversing" that script was quite a challenge, but I got it working. While creating that script I built in various "sanity checks", and I now run my "local->S3" script every now and then to clean up my S3. Barring a little hiccup every now and then, the script rarely needs to clean stuff up.

A few weeks ago I decided to publish it on Github, take a look at:
https://github.com/mrAceT/nextcloud-S3-local-S3-migration

PS: I have various users on my Nextcloud, totaling some 100+ GB of data

@aurelienpierre

I wrote a Python script to delete orphaned S3 objects (among other workarounds for NC's lack of proper S3 support): https://github.com/aurelienpierre/clean-nextloud-s3
