
File cache table excessively large (and does not shrink after data removal) / Nextcloud should defragment file cache table #7312

Open
ruedigerkupper opened this Issue Nov 27, 2017 · 13 comments


ruedigerkupper commented Nov 27, 2017

Problem

My Nextcloud instance (small-scale, single-server setup, 6 users) has an excessively large database (20 GB as of today). This size bears no sensible relation to the amount of data managed (approx. 1 TB previously, now reduced to 4.3 GB).

I have found that almost all of the 20GB is located in a single table, the file cache:

root@helge:~# ls -lh /var/snap/nextcloud/current/mysql/nextcloud/oc_filecache.*
-rw-r----- 1 root root 21K Jan 8 2017 /var/snap/nextcloud/current/mysql/nextcloud/oc_filecache.frm
-rw-r----- 1 root root 19G Nov 27 15:17 /var/snap/nextcloud/current/mysql/nextcloud/oc_filecache.ibd

This file seems to grow continuously but never shrinks.

Further details

  • My Nextcloud used to manage roughly 1 TB of data, most of it located on external storage (external disks attached to my server).
  • External storage was included through nextcloud’s “external storage” app as “Local”.
  • I have since removed all external storage from my nextcloud instance:
root@helge:~# nextcloud.occ files_external:list --all
No mounts configured
  • The data managed by my nextcloud instance is now only 4.3 GB (compared to 1 TB before). Surely this should reduce the file cache?
  • I have called “files:scan” and “files:cleanup”, but to no effect:
root@helge:~# nextcloud.occ files:cleanup
0 orphaned file cache entries deleted

Here’s something I don’t understand: there should be thousands of orphaned entries now that most of the data is gone!

=> I suspect that my file cache table is somehow broken, but I don’t know how to fix it. I believe I should clear the table and let Nextcloud regenerate it, but I do not know how to do that.

Steps to reproduce

Unsure, sorry

Expected behaviour

File cache table should shrink after external storage is removed.

Actual behaviour

File cache table stays the same size (and is excessively large).

General server configuration

Operating system: Linux helge 4.13.0-17-generic #20-Ubuntu SMP Mon Nov 6 10:04:08 UTC 2017 x86_64

Web server: Apache/2.4.28 (Unix) OpenSSL/1.0.2g (fpm-fcgi)

Database: mysql 5.7.18

PHP version: 7.0.23

PHP-modules loaded
 - Core
 - date
 - libxml
 - openssl
 - pcre
 - sqlite3
 - zlib
 - bz2
 - ctype
 - curl
 - dom
 - hash
 - fileinfo
 - filter
 - gd
 - SPL
 - iconv
 - intl
 - json
 - mbstring
 - mcrypt
 - PDO
 - session
 - pdo_sqlite
 - posix
 - Reflection
 - standard
 - SimpleXML
 - mysqlnd
 - exif
 - tokenizer
 - xml
 - xmlreader
 - xmlwriter
 - zip
 - pdo_mysql
 - cgi-fcgi
 - redis

Nextcloud configuration

Nextcloud version: 11.0.5 (stable) - 11.0.5.1 (snap version 3680)

Updated from an older Nextcloud/ownCloud or fresh install: Nextcloud snap, updated from previous snap versions (3317, 2707)

Where did you install Nextcloud from: snap

Are you using external storage, if yes which one: \OC\Files\Storage\Local, see description for details!

Are you using encryption: no

Are you using an external user-backend, if yes which one: Nextcloud sync client from the ubuntu store

Enabled apps
 - activity: 2.4.1
 - admin_audit: 1.1.0
 - audioplayer: 2.2.1
 - comments: 1.1.0
 - dav: 1.1.1
 - federatedfilesharing: 1.1.1
 - federation: 1.1.1
 - files: 1.6.1
 - files_external: 1.1.2
 - files_pdfviewer: 1.0.1
 - files_sharing: 1.1.1
 - files_texteditor: 2.2
 - files_trashbin: 1.1.0
 - files_versions: 1.4.0
 - files_videoplayer: 1.0.0
 - firstrunwizard: 2.0
 - gallery: 16.0.0
 - issuetemplate: 0.2.2
 - logreader: 2.0.0
 - lookup_server_connector: 1.0.0
 - nextcloud_announcements: 1.0
 - notifications: 1.0.1
 - password_policy: 1.1.0
 - provisioning_api: 1.1.0
 - serverinfo: 1.1.1
 - sharebymail: 1.0.1
 - survey_client: 0.1.5
 - systemtags: 1.1.3
 - theming: 1.1.1
 - twofactor_backupcodes: 1.0.0
 - workflowengine: 1.1.1
Disabled apps
 - activitylog
 - calendar
 - encryption
 - external
 - files_accesscontrol
 - files_automatedtagging
 - files_retention
 - ownbackup
 - templateeditor
 - user_external
 - user_ldap
 - user_saml
Content of config/config.php
{
    "apps_paths": [
        {
            "path": "\/snap\/nextcloud\/current\/htdocs\/apps",
            "url": "\/apps",
            "writable": false
        },
        {
            "path": "\/var\/snap\/nextcloud\/current\/nextcloud\/extra-apps",
            "url": "\/extra-apps",
            "writable": true
        }
    ],
    "supportedDatabases": [
        "mysql"
    ],
    "memcache.locking": "\\OC\\Memcache\\Redis",
    "memcache.local": "\\OC\\Memcache\\Redis",
    "redis": {
        "host": "\/tmp\/sockets\/redis.sock",
        "port": 0
    },
    "instanceid": "ocwzswpevaos",
    "passwordsalt": "***REMOVED SENSITIVE VALUE***",
    "secret": "***REMOVED SENSITIVE VALUE***",
    "trusted_domains": [
        "rkupper.no-ip.org",
        "helge"
    ],
    "datadirectory": "\/media\/Data\/nextcloud\/data",
    "overwrite.cli.url": "http:\/\/rkupper.no-ip.org",
    "dbtype": "mysql",
    "version": "11.0.5.1",
    "dbname": "nextcloud",
    "dbhost": "localhost:\/tmp\/sockets\/mysql.sock",
    "dbport": "",
    "dbtableprefix": "oc_",
    "dbuser": "***REMOVED SENSITIVE VALUE***",
    "dbpassword": "***REMOVED SENSITIVE VALUE***",
    "logtimezone": "UTC",
    "installed": true,
    "mail_smtpmode": "php",
    "mail_smtpsecure": "tls",
    "mail_from_address": "ruediger",
    "mail_domain": "rkupper.no-ip.org",
    "mail_smtpauthtype": "LOGIN",
    "mail_smtpauth": 1,
    "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
    "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
    "loglevel": 3,
    "maintenance": false,
    "singleuser": false,
    "log_rotate_size": 1073741824
}

Client configuration

Browser: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/605.1 (KHTML, like Gecko) Version/11.0 Safari/605.1 Ubuntu/17.10 (3.26.1-1ubuntu1) Epiphany/3.26.1 (Web App)

Operating system: GNU/Linux (Ubuntu 17.10)

ArnisR commented Nov 27, 2017

Yes, there is no way with NC commands to shrink the oc_filecache table if files and folders are deleted or renamed outside NC. The same seems to apply to external shares. If changes are made outside, NC can only add files to oc_filecache.

In my case oc_filecache was bloated by a large number of photos and the previews generated by the previewgenerator app (20 previews for each photo by default). Every preview is also registered in the oc_filecache table. I had to modify previewgenerator, manually delete all previews and all DB records for files that referenced the preview folder, then regenerate a reduced set of previews. It's a time-consuming process.

There is definitely a need for an NC maintenance command that keeps the oc_filecache table in sync with the local filesystem. It would also make it easier to use NC for fast-lane file upload and management via direct Samba/FTP shares, because HTTP/WebDAV upload of numerous files is laggy and not as robust and safe as Samba/FTP.

MorrisJobke commented Nov 27, 2017

ruedigerkupper commented Nov 27, 2017

@ArnisR: That would explain the size of the file cache: The external disk hosted my photo library and was indexed by previewgenerator. So how would I proceed to reduce the file cache manually? Is there a way to simply clear the table and regenerate it?

ArnisR commented Nov 27, 2017

@ruedigerkupper: I can't say. There are several other kinds of records in that table concerning the NC system, not only files, so I wouldn't do it. I saw a thread where someone posted a PHP file that compares the actual files with the records in the database and deletes the records for non-existent files. But nobody has confirmed it works, and my knowledge of MySQL is too limited to judge.

I didn't risk it. Since most of the overhead came from my experiments with previews, I dealt only with those. I put NC in maintenance mode and made a backup of the database, deleted all content of the appdata_.../preview/ folder, then with phpMyAdmin selected (via search) all records in the oc_filecache table whose "path" column contains appdata_your-NCspecificpath-here/preview/ and deleted them. I have NC 12; I don't know whether NC 11 puts previews in a different folder (under each user).

Then I modified previewgenerator - rullzer/previewgenerator#78

Then I switched maintenance mode off and ran preview:generate-all. In my case it took almost 3 days to finish generating previews for ~85 K photos on an i3 CPU.

At this moment I have problem with this - #7269

ruedigerkupper commented Nov 28, 2017

Thanks, ArnisR! I have meanwhile solved my problem, and it turned out to be much easier than that:

  • Investigating the oc_filecache table from the mysql prompt showed that it only had about 4,000 rows -- nothing that could account for 20 GB!
  • However, the table space was largely empty:
mysql> show table status like "oc_filecache";
+--------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-------------+----------------+---------------------+---------------------+------------+-----------+----------+----------------+---------+
| Name         | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free   | Auto_increment | Create_time         | Update_time         | Check_time | Collation | Checksum | Create_options | Comment |
+--------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-------------+----------------+---------------------+---------------------+------------+-----------+----------+----------------+---------+
| oc_filecache | InnoDB |      10 | Dynamic    | 4106 |           5119 |    21020672 |               0 |     35749888 | 19579011072 |       20409581 | 2017-11-26 23:42:15 | 2017-11-27 18:56:42 | NULL       | utf8_bin  |     NULL |                |         |
+--------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-------------+----------------+---------------------+---------------------+------------+-----------+----------+----------------+---------+

Note "Data_length" (= 21 MB) and "Data_free" (= 19 GB)!! So the removed files had in fact been cleaned from the file cache, but the InnoDB table was so badly fragmented that it occupied roughly 1000 times the disk space its data needed.

Going through the MySQL reference, I came across https://dev.mysql.com/doc/refman/5.7/en/innodb-file-defragmenting.html, which states that an InnoDB table can be defragmented by issuing a "no-op" ALTER TABLE command:

mysql> ALTER TABLE oc_filecache FORCE;

And -- magically -- this reduced the table to a file size of 8 MB (!!!):

mysql> show table status like "oc_filecache";
+--------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------+----------+----------------+---------+
| Name         | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+--------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------+----------+----------------+---------+
| oc_filecache | InnoDB |      10 | Dynamic    | 4168 |            381 |     1589248 |               0 |       671744 |   2097152 |       20409581 | 2017-11-28 00:40:41 | NULL        | NULL       | utf8_bin  |     NULL |                |         |
+--------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------+----------+----------------+---------+
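
For anyone checking whether their own instance suffers the same fragmentation, the per-table numbers can also be pulled from information_schema in one query (a sketch; 'nextcloud' is my schema name and may differ on your setup):

```sql
-- List InnoDB tables where reclaimable space (DATA_FREE) dwarfs the
-- live data (DATA_LENGTH); a high free_ratio marks a candidate for
-- defragmentation via ALTER TABLE ... FORCE.
SELECT TABLE_NAME,
       DATA_LENGTH,
       DATA_FREE,
       ROUND(DATA_FREE / GREATEST(DATA_LENGTH, 1), 1) AS free_ratio
FROM   information_schema.TABLES
WHERE  TABLE_SCHEMA = 'nextcloud'
  AND  ENGINE = 'InnoDB'
ORDER BY DATA_FREE DESC;
```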

Summary

So let's get this straight:

  • After removing external storage, Nextcloud left me with a table that occupied 20 GB of disk space, but which contained only 8 MB of data.
  • How the table got so badly fragmented is unclear, but obviously it can happen.
  • The problem could be solved by a simple mysql command for defragmenting the table (ALTER TABLE oc_filecache FORCE;).
  • But the regular Nextcloud cron job does not perform such defragmentation, nor does Nextcloud provide a command for doing it manually.
  • Instead, the user is required to read the MySQL reference (a steep learning curve -- until yesterday, I had never worked with MySQL ;-) )

Feature request

Nextcloud should defragment the oc_filecache table regularly, probably as part of the regular maintenance cron job.
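
For what it's worth, MySQL already ships the corresponding maintenance statement: for InnoDB tables, OPTIMIZE TABLE is internally mapped to the same ALTER TABLE ... FORCE rebuild, so a maintenance job would only need to issue:

```sql
-- On InnoDB, MySQL rewrites OPTIMIZE TABLE into ALTER TABLE ... FORCE
-- (an online table rebuild), reclaiming the fragmented space.
OPTIMIZE TABLE oc_filecache;
```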

@ruedigerkupper ruedigerkupper changed the title File cache table excessively large (and does not shrink after data removal) File cache table excessively large (and does not shrink after data removal) / Nextcloud should defragment file cache table Nov 28, 2017

@MorrisJobke MorrisJobke referenced this issue Dec 8, 2017

Merged

13.0.0 beta 2 #7425

28 of 28 tasks complete
LnLogN commented Dec 18, 2017

Same issue here: our internal Nextcloud service has only a few hundred users, but the database was over 130 GB. We just upgraded from ownCloud 9 to Nextcloud 12.0.4 and manually cleared the oc_filecache table, leaving only the shared links (with the PHP script ArnisR mentioned). That left only about 400 MB; after a scan of all local directories the database sits at about 700 MB (only 2 TB of data locally), and the web interface functions normally (the service rescans a mounted directory when a user enters it, and displays the files).

Looking closer at the file system, we found that most of the database inserts came from different users mounting the same file systems over SMB (the fs holds a few PB of data). The annoying thing is that when running occ files:scan $user, the service actually tries to scan all the folders this user has mounted, including the SMB-mounted ones, which takes forever. We couldn't afford to have users waiting for those scans, so we wrote a script to scan only local files.

So here are a few suggestions:

  1. Please allow files:scan to scan only local directories, with something like occ files:scan --local, because local directory paths are functionally much more important than mounted paths.
  2. Store all external paths in a separate temporary file cache table, maybe external_filecache, so that we can clear that table without affecting the web service.
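
As a stopgap until something like --local exists, later occ releases accept a --path argument that restricts files:scan to a single subtree; a sketch using the snap's wrapper ("alice" and "Documents" are placeholders for your own user and folder):

```shell
# Rescan only one local subtree instead of every mount the user has;
# the path is relative to the data directory: /<user>/files/<subdir>.
nextcloud.occ files:scan --path="/alice/files/Documents"
```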
rullzer commented Dec 19, 2017

Maybe an explicit occ command is fine. But I'm really against doing this in a regular cron job, as it basically rebuilds the table; on larger instances that is not a good idea.

cgrima commented Dec 27, 2017

Same issue here, with an 80 GB oc_filecache.ibd while my entire database holds roughly 4 TB. @ArnisR, @LnLogN, could you give a link to the PHP script you mentioned?

@ArnisR, could you give an idea on how long the defragmenting took for you?

For info, I pulled the oc_filecache.ibd growth rate from recent backups:

  • Sept. 1, 2017: 46 GB
  • Nov. 1, 2017: 61 GB
  • Dec. 1, 2017: 69 GB
  • Dec. 27, 2017: 85 GB

In the meantime, my data grew from ~3.5 TB to ~4 TB, with a mix of small and >1 GB files. 7 users.

pprotschka commented Apr 1, 2018

Same issue here. My database was 61 GB with only 400 GB of data. I had the root of an external Ubuntu server mounted... ~73,918,234 rows! lol. Anyway, is there a way to prevent the indexing of external shares?

@nextcloud-bot nextcloud-bot added the stale label Jun 20, 2018

McKay1717 commented Nov 16, 2018

Same issue here: 20 GB of DB for 200 GB of local data.
Nextcloud version 14.0.3.0

@nextcloud-bot nextcloud-bot removed the stale label Nov 16, 2018

helmut72 commented Jan 3, 2019

anyways is there a way to prevent the indexing of external shares?

Second that. I also want to prevent indexing of external shares; it's not needed for my use case.

helmut72 commented Jan 10, 2019

MySQL regularly stops working because of this:

mysql> show table status like "oc_filecache";
+--------------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------+----------+-----------------------+---------+
| Name         | Engine | Version | Row_format | Rows     | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation   | Checksum | Create_options        | Comment |
+--------------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------+----------+-----------------------+---------+
| oc_filecache | InnoDB |      10 | Compressed | 76443704 |            134 | 10273595392 |               0 |  11548246016 |  82837504 |       84169982 | 2018-12-11 12:45:45 | 2019-01-10 09:51:44 | NULL       | utf8mb4_bin |     NULL | row_format=COMPRESSED |         |
+--------------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------+----------+-----------------------+---------+
1 row in set (0,00 sec)
chaosgrid commented Jan 11, 2019

*bump

I'm quite surprised that there is no "clean up files in the index that have actually been deleted on disk" functionality in Nextcloud, since this is obviously vital to keeping Nextcloud running smoothly.
In my case, I moved the data directory, and some files (appdata and __groupfolder files) ended up duplicated in the file cache because a new storage entry was generated. I reset the original storage entry, but now I have stale filecache entries. I will probably delete those manually, and then I will also have to clean up tables like oc_share and oc_activity for the deleted filecache entries...

I wonder: we also have SMB-mounted external storages, and for those Nextcloud automatically cleans up the filecache for files that get deleted (and I did not activate the notify feature, as far as I'm aware)... why does this not work for local files?
