Issues with v28 - 'sudo nextcloud.occ upgrade;' fails on very large databases #2758

Closed
LMRW opened this issue May 30, 2024 · 50 comments

@LMRW
Contributor

LMRW commented May 30, 2024

Hi There

  1. Your readme still says v27
  2. I installed v28 latest channel snap update
  3. My NC instance goes down and never returns
  4. nextcloud.occ status shows Nextcloud or one of the apps require upgrade - only a limited number of commands are available
  5. I run sudo nextcloud.occ upgrade
  6. I see errors - nextcloud snap Repair error: An exception occurred while executing a query: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away & Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
  7. I cannot upgrade and my nextcloud instance is down and unavailable

Luckily I run within a VM. So I have restored a snapshot and locked to channel 27 for now.

System:
Ubuntu LTS 22.04 Arm64

Note:
Prior to this, for the last few days I was occasionally seeing "System Internal Error" Nextcloud-branded error screens when changing page/URL. A refresh would always resolve the issue and it would work the second time. I never saw these before this week, and they now appear a couple of times a day. Unsure how related. My snaps auto-refresh daily.

@LMRW
Contributor Author

LMRW commented May 30, 2024

And as a note - thank you again for the amazing product. HPF works great recently and I enjoy your snap 99% of time upgrades perfectly every time. As I have a VM snapshot now, I'm happy to experiment. I can do whatever you like on my prod environment backup, as it is a VM snapshot I can restore, test, break and rewind. Hopefully an ideal situation to help.

@LMRW
Contributor Author

LMRW commented May 30, 2024

Further note: sudo nextcloud.mysql-client does work, but does not show the usual welcome screen of mysql copyright etc

@user8446

I'm also on Ubuntu LTS 22.04 (but x86/64) and the upgrade to 28 went without issue so we can eliminate something global here:

$ sudo nextcloud.occ status
  - installed: true
  - version: 28.0.6.1
  - versionstring: 28.0.6
  - edition:
  - maintenance: false
  - needsDbUpgrade: false
  - productname: Nextcloud
  - extendedSupport: false

@LMRW
Contributor Author

LMRW commented May 30, 2024

Thank you for confirming

Interestingly, I have attempted it a couple of times now (thanks to the VM snapshot) and it fails every time, so it's not a fluke

I might try disabling all apps next (like nextcloud mail etc) and see if works better

@scubamuc
Member

scubamuc commented May 30, 2024

@LMRW thanks for your request and your heads up

... snap 99% of time upgrades perfectly every time

with your help we'll surely reach 100% 👍

run the debugging script and post here

@user8446

I'm also on Ubuntu LTS 22.04 (but x86/64) and the upgrade to 28 went without issue so we can eliminate something global here:

while I'd agree, let's see if we can help @LMRW solve his issue.

@LMRW

I might try disabling all apps next (like nextcloud mail etc) and see if works better

yip, that's exactly what @user8446 is suggesting, but not ALL apps, only 3rd party apps

@user8446

user8446 commented May 30, 2024

So snap services nextcloud.mysql is showing enabled and active?

With the error screens you were getting earlier in the week, MySQL monitor not working, and the specific MySQL error this seems to point in that direction.

@LMRW
Contributor Author

LMRW commented May 31, 2024

I will do a deep dive tomorrow and post the debug script output, but I wanted to mention this immediately in case it is relevant

I read this: #2734

This week I (not via snap) installed cmake on this VM for the first time. Could that be related?

@scubamuc
Member

@stondino00

installed cmake on this VM for the first time. Could that be related?

could it?

@scubamuc
Member

@user8446

upgrade to 28 went without issue so we can eliminate something global here:

4 instances (x86/64) upgraded without issues for me too.

had some issues with "News" app not being up to date, but activated it anyway without further issues.

@steinger

The upgrade to 28 worked well for me with Ubuntu 22.04 LTS (x86_64), but I had to restart cron with sudo nextcloud.occ background:cron, and the following message came up for the Collectives app "Collectives app is enabled, but PDO SQLite driver is missing." I deactivated Collectives because I didn't really need it.

I also use the News app; version 24.0.0 is not supported on NC28. You can activate it on NC28, but then the sidebar layout is broken and you have to adjust the custom.css. Templates are available on GitHub / News, or you can manually install the unstable version 25.0.0.

I would like to take this opportunity to thank the Nextcloud Snap team for their hard work in bringing NC28 to Snap. I read in your issue how much effort went into this version. Great!

@scubamuc
Member

@steinger

following message came up for the Collectives app "Collectives app is enabled, but PDO SQLite driver is missing."

confirmed this:

(screenshot)

would you mind creating a new issue to track this?

@pachulo
Member

pachulo commented May 31, 2024

would you mind creating a new issue to track this?

It is already created.

@scubamuc
Member

@steinger

already created...

done....

@scubamuc
Member

scubamuc commented May 31, 2024

@steinger

just did this to allow News 25.0.0: sudo nextcloud.occ app:update --allow-unstable news (see comment), after which the app updated successfully

works for me

@steinger

@scubamuc

When I searched for the News app problem, I first found the custom.css workaround in nextcloud/news#2610 and tried it; only later did I see nextcloud/news#2585 about installing the unstable 25.0.0.

What still made me hesitant was whether the unstable channel updates automatically, and how you go from unstable back to stable.

But if the new version is so cool, maybe I should try it 😉

@scubamuc
Member

scubamuc commented May 31, 2024

@steinger

yes I agree, I'm thinking along the same lines. how to get back to stable?

Set the update channel in the Admin settings for Nextcloud to beta and wait until the update for News is offered in the webinterface

we'll have to watch nextcloud/news#2585

But if the new version is so cool, maybe I should try it 😉

yeah it is quite spiffy, we could watch this together. dunno 'bout you, but I rely heavily on my news app

@LMRW
Contributor Author

LMRW commented May 31, 2024

I also have nextcloud news installed but haven't seen any issues with that specifically as nothing is booting at all. I still plan for a deep dive later today. Will report all I find. Thank you everyone

@scubamuc
Member

@LMRW, if all else fails, you can revert to the last known good version, like @adrianvg did here: #2759 (comment)

@LMRW
Contributor Author

LMRW commented May 31, 2024

Thank you. I read previously we cannot skip major versions? So I cannot just skip 28 and go 27->29 when released? The comment in #2759 (comment) seems to indicate thats possible?

@steinger

@scubamuc

yeah it is quite spiffy, we could watch this together. dunno 'bout you, but I rely heavily on my news app

yeah, I feel the same, the news app is one of my most important ones.

I have now installed the unstable version. Then a message appeared: Missing optional index "news_feeds_deleted_at_index" in the table "news_feeds". I guess the new version came with new indexes, so I ran nextcloud.occ db:add-missing-indices again.

Later I noticed that the cron job on the Nextcloud snap had failed again; the failure started exactly at the time when I installed the unstable News app.

@scubamuc
Member

@steinger,

I can't confirm cron failing... but you can try this

sudo nextcloud.occ background:cron

@scubamuc
Member

scubamuc commented May 31, 2024

@LMRW,

Thank you. I read previously we cannot skip major versions? So I cannot just skip 28 and go 27->29 when released? The comment in #2759 (comment) seems to indicate thats possible?

that is correct, but last known good was 28.0.5 if I'm not mistaken?

but this stumps me too:

(screenshot)

@Pilzinsel64 could you make a suggestion?

@Pilzinsel64
Member

Pilzinsel64 commented May 31, 2024

@scubamuc That's ok, as 28.0.6snap1 is the latest one and active. But why 27.1.9snap1 actually is in stable might be just a mistake. However, as long as it's disabled it's all fine. Shouldn't happen again for the next update.

Also, yes, last known good was 28.0.5. but for me 28.0.6 is also nicely working (as manual install and as snap).

You also cannot skip a major version on the Nextcloud side. It will likely break your instance if you skip necessary migration steps.

installed cmake on this VM for the first time. Could that be related?

I can't imagine that it has a negative side effect. We moved to make snap, yes, but that is only relevant for building the snap.

@pachulo
Member

pachulo commented May 31, 2024

Thank you. I read previously we cannot skip major versions? So I cannot just skip 28 and go 27->29 when released? The comment in #2759 (comment) seems to indicate thats possible?

@LMRW that's not possible. What @adrianvg did was revert to the 27.1.9snap1, because it still was on their system, and then started tracking the 27/stable channel, so that they don't get the update to 28 yet.
The thing with this approach is that they will need to manually update to a newer version/channel soon, as version 27 will be out of support next month.
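
A rough sketch of the revert-and-pin approach described above, assuming the previous revision is still present on the system (as it was for @adrianvg). The `run` and `pin_to_track` names are made up for this example and are not part of the snap. With `DRY_RUN=1` (the default here) the commands are only printed, not executed.

```shell
#!/bin/bash
# Sketch only: revert the nextcloud snap to the previously installed revision
# and follow a fixed track so the next major version is not pulled in.
# "run" and "pin_to_track" are hypothetical helper names for this example.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

pin_to_track() {
  local track="$1"                      # e.g. 27
  run sudo snap revert nextcloud        # back to the previously installed revision
  run sudo snap refresh nextcloud --channel="${track}/stable"
}

pin_to_track 27
```

Setting `DRY_RUN=0` would actually run the two commands. As noted above, this only buys time: the pinned track still needs a manual move to a newer channel later.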

that is correct, but last known good was 28.0.5 if I'm not mistaken?

@scubamuc no, 28.0.5 was never released to the latest/stable channel, the most used one, because we were still solving #2659

But why 27.1.9snap1 actually is in stable might be just a mistake.

@Pilzinsel64 it's not a mistake: it is the exact version that was in latest/stable before 28.0.6snap1 was promoted.

@noticons

noticons commented Jun 1, 2024

I don't know if this will help, but my own server wasn't working, so I watched the logs, and saw that Circles was throwing errors. So I disabled it and everything worked perfectly.
sudo nextcloud.occ log:watch
sudo nextcloud.occ app:disable circles

@LMRW
Contributor Author

LMRW commented Jun 1, 2024

I disabled all non-default apps.

I stopped nextcloud

sudo snap stop nextcloud

I run the below:
sudo snap refresh nextcloud --channel=latest

Response was:

Error: cannot perform the following tasks:
- Run pre-refresh hook of "nextcloud" snap if present (run hook "pre-refresh": 
-----
Waiting for Apache...

<exceeded maximum runtime of 10m0s>

I started Nextcloud up... still v27, still working.

sudo snap start nextcloud

sudo nextcloud.occ status

  - installed: true
  - version: 27.1.9.1
  - versionstring: 27.1.9
  - edition: 
  - maintenance: false
  - needsDbUpgrade: false
  - productname: Nextcloud
  - extendedSupport: false

Now I ran the refresh a second time. This time I did not stop Nextcloud first, and this time the snap upgraded.

Now sudo nextcloud.occ status would return:

Nextcloud or one of the apps require upgrade - only a limited number of commands are available
You may use your browser or the occ upgrade command to do the upgrade
  - installed: true
  - version: 28.0.6.1
  - versionstring: 28.0.6
  - edition: 
  - maintenance: true
  - needsDbUpgrade: true
  - productname: Nextcloud
  - extendedSupport: false
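
A check like the one above can be scripted before deciding whether to run the upgrade. This is a minimal sketch: `status_field` is a hypothetical helper name, and it only assumes the `  - key: value` layout shown in the status output in this thread.

```shell
#!/bin/bash
# Sketch only: pull a single field out of `nextcloud.occ status` output.
# "status_field" is a hypothetical helper name for this example.

status_field() {
  # usage: <status output> | status_field <name>
  # Prints the value of the "  - <name>: <value>" line, if present.
  sed -n "s/^ *- $1: *//p"
}

# Only query the real instance when the snap command is available.
if command -v nextcloud.occ >/dev/null 2>&1; then
  status="$(sudo nextcloud.occ status)"
  if [ "$(printf '%s\n' "$status" | status_field needsDbUpgrade)" = "true" ]; then
    echo "Database upgrade pending for $(printf '%s\n' "$status" | status_field versionstring)"
  fi
fi
```

The extra "Nextcloud or one of the apps require upgrade" lines are ignored because they do not start with `- `.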

I run sudo nextcloud.occ upgrade

The response was

Nextcloud or one of the apps require upgrade - only a limited number of commands are available
You may use your browser or the occ upgrade command to do the upgrade
Setting log level to debug
Updating database schema
Updated database
Repair error: An exception occurred while executing a query: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Repair warning: Unable to clear the frontend cache
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Doctrine\DBAL\Exception: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused
Update failed
Maintenance mode is kept active
Resetting log level

I have no non-default apps enabled, and it is still not working.

One clue: This is a large Nextcloud instance with 5TB of external data. The MySQL database is over 6GB of file cache.

Could it be so big the upgrade is timing out?

@LMRW
Contributor Author

LMRW commented Jun 1, 2024

@scubamuc I ran the debug script. The output contains too much personal information for me to paste in full publicly. Is there anything in particular I should share?

Some points of interest below though:

I saw this repeatedly:

2024-06-01T18:15:11.101396Z 18 [Warning] [MY-011958] [InnoDB] Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. Your buffer pool size is 128 MB. Maybe you should make the buffer pool bigger?. Starting the InnoDB Monitor to print diagnostics, including lock heap and hash index sizes.
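
To gauge how often this warning fires, a small filter like the following could be run over the MySQL error log. This is only a sketch: `buffer_pool_warnings` is a made-up helper name and it reads log text on stdin. The error code `MY-011958` and the "buffer pool size is N MB" wording are taken from the line above.

```shell
#!/bin/bash
# Sketch only: count InnoDB buffer pool pressure warnings in a MySQL error log
# and report the buffer pool size mentioned in the last one.
# "buffer_pool_warnings" is a hypothetical helper name for this example.

buffer_pool_warnings() {
  # Reads log text on stdin; prints "<warning count> <pool size in MB>".
  awk '/\[MY-011958\]/ {
         n++
         if (match($0, /buffer pool size is [0-9]+ MB/)) {
           s = substr($0, RSTART, RLENGTH)
           gsub(/[^0-9]/, "", s)     # keep only the number
           size = s
         }
       }
       END { print n + 0, size }'
}

# usage (the log path is a placeholder, not a real location):
#   buffer_pool_warnings < /path/to/mysql/error.log
```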

This as well:

2024-06-01T18:15:16.856321Z 18 [ERROR] [MY-013183] [InnoDB] Assertion failure: row0sel.cc:5292:!use_semi_consistent thread 281473370234704
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2024-06-01T18:15:16Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=7deb154fe73c77f981e87ea13d1ba7cc0b4114ed
Thread pointer: 0xffff30005fc0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = ffffa03f16f8 thread_stack 0x100000
/snap/nextcloud/42571/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x44) [0xaaaae5b8fadc]
/snap/nextcloud/42571/bin/mysqld(print_fatal_signal(int)+0x394) [0xaaaae4d0887c]
/snap/nextcloud/42571/bin/mysqld(my_server_abort()+0xb0) [0xaaaae4d08a00]
/snap/nextcloud/42571/bin/mysqld(my_abort()+0x14) [0xaaaae5b89a74]
/snap/nextcloud/42571/bin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x290) [0xaaaae5de48f0]
/snap/nextcloud/42571/bin/mysqld(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)+0x1e6c) [0xaaaae5d5b0a4]
/snap/nextcloud/42571/bin/mysqld(ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int)+0x1e0) [0xaaaae5bea168]
/snap/nextcloud/42571/bin/mysqld(handler::ha_rnd_next(unsigned char*)+0x19c) [0xaaaae4dfa6fc]
/snap/nextcloud/42571/bin/mysqld(TableScanIterator::Read()+0x78) [0xaaaae4f398b0]
/snap/nextcloud/42571/bin/mysqld(Sql_cmd_update::update_single_table(THD*)+0x938) [0xaaaae4c854b0]
/snap/nextcloud/42571/bin/mysqld(Sql_cmd_update::execute_inner(THD*)+0xd0) [0xaaaae4c86c68]
/snap/nextcloud/42571/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x18c) [0xaaaae4c17204]
/snap/nextcloud/42571/bin/mysqld(mysql_execute_command(THD*, bool)+0xe84) [0xaaaae4bbd2a4]
/snap/nextcloud/42571/bin/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x36c) [0xaaaae4bc0034]
/snap/nextcloud/42571/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x149c) [0xaaaae4bc19fc]
/snap/nextcloud/42571/bin/mysqld(do_command(THD*)+0x1c4) [0xaaaae4bc2c84]
/snap/nextcloud/42571/bin/mysqld(+0x10bad44) [0xaaaae4cfad44]
/snap/nextcloud/42571/bin/mysqld(+0x243dba8) [0xaaaae607dba8]
/lib/aarch64-linux-gnu/libpthread.so.0(+0x7088) [0xffffacac8088]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (ffff3044fa10): is an invalid pointer
Connection ID (thread ID): 18
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
2024-06-01T18:15:17.067259Z 0 [System] [MY-010116] [Server] /snap/nextcloud/42571/bin/mysqld (mysqld 8.0.37) starting as process 63257
2024-06-01T18:15:17.073816Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2024-06-01T18:15:17.354331Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2024-06-01T18:15:17.495087Z 0 [Warning] [MY-013829] [Server] Missing data directory for ICU regular expressions: //lib/private/.
2024-06-01T18:15:17.495613Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2024-06-01T18:15:17.496901Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/tmp' in the path is accessible to all OS users. Consider choosing a different directory.
2024-06-01T18:15:17.503707Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: /tmp/mysqlx.sock
2024-06-01T18:15:17.503752Z 0 [System] [MY-010931] [Server] /snap/nextcloud/42571/bin/mysqld: ready for connections. Version: '8.0.37'  socket: '/tmp/sockets/mysql.sock'  port: 0  Source distribution.
2024-06-01T18:15:17.763168Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.37).
2024-06-01T18:15:18.975033Z 0 [System] [MY-010910] [Server] /snap/nextcloud/42571/bin/mysqld: Shutdown complete (mysqld 8.0.37)  Source distribution.
2024-06-01T18:15:20.167152Z 0 [System] [MY-010116] [Server] /snap/nextcloud/42571/bin/mysqld (mysqld 8.0.37) starting as process 63824
2024-06-01T18:15:20.173040Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2024-06-01T18:15:20.231718Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2024-06-01T18:15:20.305477Z 0 [Warning] [MY-013829] [Server] Missing data directory for ICU regular expressions: //lib/private/.
2024-06-01T18:15:20.306065Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2024-06-01T18:15:20.306975Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/tmp' in the path is accessible to all OS users. Consider choosing a different directory.
2024-06-01T18:15:20.314750Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: /tmp/mysqlx.sock
2024-06-01T18:15:20.314758Z 0 [System] [MY-010931] [Server] /snap/nextcloud/42571/bin/mysqld: ready for connections. Version: '8.0.37'  socket: '/tmp/sockets/mysql.sock'  port: 0  Source distribution.

  </p>
</details>

<details>
  <summary>Redis</summary>
  <p>

Some redis errors:

2904:M 01 Jun 2024 18:16:13.797 # Server initialized
2904:M 01 Jun 2024 18:16:13.797 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
2904:M 01 Jun 2024 18:16:13.810 * Loading RDB produced by version 7.0.15
2904:M 01 Jun 2024 18:16:13.810 * RDB age 177 seconds
2904:M 01 Jun 2024 18:16:13.810 * RDB memory usage when created 50.24 Mb
2904:M 01 Jun 2024 18:16:13.960 * Done loading RDB, keys loaded: 24495, keys expired: 57.
2904:M 01 Jun 2024 18:16:13.960 * DB loaded from disk: 0.150 seconds
2904:M 01 Jun 2024 18:16:13.960 * The server is now ready to accept connections at /tmp/sockets/redis.sock
2904:signal-handler (1717262181) Received SIGTERM scheduling shutdown...
2904:signal-handler (1717262181) Received SIGTERM scheduling shutdown...
2904:M 01 Jun 2024 18:16:21.697 # User requested shutdown...
2904:M 01 Jun 2024 18:16:21.697 * Saving the final RDB snapshot before exiting.
2904:M 01 Jun 2024 18:16:21.853 * DB saved on disk
2904:M 01 Jun 2024 18:16:21.853 * Removing the pid file.
2904:M 01 Jun 2024 18:16:21.853 * Removing the unix socket file.
2904:M 01 Jun 2024 18:16:21.853 # Redis is now ready to exit, bye bye...
2908:C 01 Jun 2024 18:17:16.303 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2908:C 01 Jun 2024 18:17:16.303 # Redis version=7.0.15, bits=64, commit=0dfbd247, modified=0, pid=2908, just started
2908:C 01 Jun 2024 18:17:16.303 # Configuration loaded
2908:M 01 Jun 2024 18:17:16.303 * Increased maximum number of open files to 10032 (it was originally set to 1024).
2908:M 01 Jun 2024 18:17:16.303 * monotonic clock: POSIX clock_gettime
2908:M 01 Jun 2024 18:17:16.304 * Running mode=standalone, port=0.
2908:M 01 Jun 2024 18:17:16.304 # Server initialized
2908:M 01 Jun 2024 18:17:16.304 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
2908:M 01 Jun 2024 18:17:16.317 * Loading RDB produced by version 7.0.15
2908:M 01 Jun 2024 18:17:16.317 * RDB age 55 seconds
2908:M 01 Jun 2024 18:17:16.317 * RDB memory usage when created 50.11 Mb
2908:M 01 Jun 2024 18:17:16.452 * Done loading RDB, keys loaded: 24500, keys expired: 7.
2908:M 01 Jun 2024 18:17:16.452 * DB loaded from disk: 0.136 seconds
2908:M 01 Jun 2024 18:17:16.452 * The server is now ready to accept connections at /tmp/sockets/redis.sock
29082908:signal-handler (:signal-handler (17172624501717262450) ) Received SIGTERM scheduling shutdown...Received SIGTERM scheduling shutdown...

Also showed multiple times

RepairErrorEvent: Repair error: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [2002] Connection refused","userAgent":"--","version":"27.1.9.1","data":{"app":"updater"}}

@LMRW
Contributor Author

LMRW commented Jun 1, 2024

I'm thinking now maybe my MySQL database is abnormally large? It's about 14GB uncompressed, and a zipped dump is about 5GB.

But it's always been about this size and previous nextcloud snap updates worked.

I use files_external to connect to a very large storage drive, so it's all file cache data
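
One way to see where the size sits is to list the largest tables. The following is a sketch, assuming the schema is named `nextcloud` as in the commands used elsewhere in this thread. `largest_tables_sql` is a made-up helper name, and the sizes reported by `information_schema.tables` are estimates for InnoDB that do not include undo tablespaces.

```shell
#!/bin/bash
# Sketch only: show the largest tables in the Nextcloud schema.
# "largest_tables_sql" is a hypothetical helper name; the schema name
# "nextcloud" is an assumption taken from the commands in this thread.

largest_tables_sql() {
  local schema="$1" limit="${2:-10}"
  cat <<SQL
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb,
       table_rows
FROM information_schema.tables
WHERE table_schema = '${schema}'
ORDER BY (data_length + index_length) DESC
LIMIT ${limit};
SQL
}

if command -v nextcloud.mysql-client >/dev/null 2>&1; then
  sudo nextcloud.mysql-client "nextcloud" -e "$(largest_tables_sql nextcloud 10)"
else
  largest_tables_sql nextcloud 10   # no snap here: just print the query
fi
```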

@LMRW
Contributor Author

LMRW commented Jun 1, 2024

OK, GOOD NEWS I now have NC 28 working! 😃
I upgraded my prod v27 to v28 and kept all my users and data on same install.

The issue was indeed that my database was too large.

I started with a 14GB nextcloud MySQL database which could not be upgraded by nextcloud snap as part of the v28 upgrade process. For whatever reason, a database that large fails during sudo nextcloud.occ upgrade;

The reason my database was so large is that a database can corrupt and grow dramatically when using the files_external app with large external locations, and when this occurs, nextcloud snap cannot seamlessly upgrade a user without error, as it cannot handle databases that large. files_external has a known issue where fast-moving folders can cause tremendously large databases of old caches (like a video editor's server editing lots of online media, with render files and caches: they are generated, moved and deleted quickly, and files_external caches them all and never removes them, even when the files are gone)
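
To see which storage holds most of the cached entries, a query along these lines could help. It is a sketch under assumptions: the `oc_` table prefix is taken from the table names used in this thread, the `storage` / `numeric_id` join reflects the usual Nextcloud schema as far as I know, and `filecache_per_storage_sql` is a made-up helper name. Rows with a NULL `storage_id` would be cache entries whose storage no longer exists.

```shell
#!/bin/bash
# Sketch only: count file cache entries per storage.
# "filecache_per_storage_sql" is a hypothetical helper name; the oc_ prefix
# and the schema name "nextcloud" are assumptions taken from this thread.

filecache_per_storage_sql() {
  local prefix="${1:-oc_}" limit="${2:-20}"
  cat <<SQL
SELECT s.id AS storage_id, COUNT(*) AS cached_entries
FROM ${prefix}filecache f
LEFT JOIN ${prefix}storages s ON s.numeric_id = f.storage
GROUP BY f.storage, s.id
ORDER BY cached_entries DESC
LIMIT ${limit};
SQL
}

if command -v nextcloud.mysql-client >/dev/null 2>&1; then
  sudo nextcloud.mysql-client "nextcloud" -e "$(filecache_per_storage_sql oc_ 20)"
else
  filecache_per_storage_sql oc_ 20   # no snap here: just print the query
fi
```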

So to fix this for myself, I essentially shrank my database (from 14GB to 500MB in the end) and performed the snap upgrade...

The Steps I took

  1. Inside nextcloud web admin, I removed all my files_external shares, but left the app enabled. This is so I can next shrink my database as small as possible by removing all my large external files path caches.

  2. I created and ran the bash script below. It deletes all file caches inside Nextcloud, and then rescans to rebuild them. This means all the old crap built up over time is removed, and only existing good file paths remain. And for now, as my external files are still not connected, it only adds local file paths back as cache listings, which are very small. I also made sure to perform backups at the start and end, both for safety and for comparing the sizes of the SQL dumps.

(And before I ran the script below, I followed https://github.com/nextcloud-snap/nextcloud-snap/wiki/Backup-and-Restore to perform a snap backup of my working v27)

#!/bin/bash
echo "SCRIPT TO RESET NEXTCLOUD FILE CACHE DATABASE TABLES";
echo "1. BACKUP OLD DATABASE IN FULL";
sudo nextcloud.export;
echo "2. TRUNCATE OLD FILE CACHE DATABASE TABLES";
sudo nextcloud.mysql-client "nextcloud" -e "TRUNCATE TABLE oc_filecache; TRUNCATE TABLE oc_filecache_extended; TRUNCATE TABLE oc_storages;"
echo "3. FIX & RESCAN FILES";
sudo nextcloud.occ files:repair-tree -n -vvv;
sudo nextcloud.occ files:scan --all -n -vvv;
sudo nextcloud.occ files:scan-app-data -n -vvv;
echo "4. REPAIR DATABASE";
sudo nextcloud.mysql-client "nextcloud" -e "REPAIR TABLE oc_filecache; REPAIR TABLE oc_filecache_extended; REPAIR TABLE oc_storages;";
echo "5. OPTIMIZE DATABASE";
sudo nextcloud.mysql-client "nextcloud" -e "OPTIMIZE TABLE oc_filecache; OPTIMIZE TABLE oc_filecache_extended; OPTIMIZE TABLE oc_storages;";
echo "6. RESCAN CHANGES AND CLEANUP";
sudo nextcloud.occ files:scan --all --shallow -n -vvv;
sudo nextcloud.occ files:scan --all --unscanned -n -vvv;
sudo nextcloud.occ files:cleanup -n -vvv;
echo "7. BACKUP NEW DATABASE (should be smaller)";
sudo nextcloud.export;
  3. I upgraded NC Snap to 28 as usual. As I had previously locked to channel 27 to save myself auto-update headaches over the last few days while it wasn't working, I had to undo that and switch back to the latest channel.
#!/bin/bash
echo "SCRIPT TO INSTALL LATEST NEXTCLOUD SNAP AND UPGRADE DATABASE AND REBUILD INDICES";
sudo snap refresh nextcloud --channel=latest;
sudo snap refresh --unhold nextcloud;
sudo nextcloud.occ status;
sudo nextcloud.occ upgrade;
sudo nextcloud.occ status;
sudo nextcloud.occ db:add-missing-indices;
  4. I checked, and right now my database is only 300MB, down from 14GB. This is because it now contains only local files; none of my external storage is cached, as it remains unconnected.

But now that NC is working, I can re-add my files_external shares in the Nextcloud 28 web admin. Once they were all re-added, I ran the script below to scan them and double check that everything is still in good condition:

#!/bin/bash
echo "SCRIPT TO MAKE NEXTCLOUD FILE SCAN NEW EXTERNAL STORAGE AND PERFORM SOME MAINTENANCE";
echo "1. RESCAN CHANGES AND CLEANUP";
sudo nextcloud.occ files:repair-tree -n -vvv;
sudo nextcloud.occ files:scan --all -n -vvv;
sudo nextcloud.occ files:scan-app-data -n -vvv;
sudo nextcloud.occ files:cleanup -n -vvv;
  5. Success! Now I'm on NC 28 with all my external drives again.

I checked my database size once more, and it is only 500MB, which is shocking but true: it is now a cache of the EXACT same set of files, yet it shrank from 14GB by approx 13.5GB.

  6. A final step was needed...
    As Nextcloud Snap MySQL keeps an 'undo' history, I now had gigantic 14GB MySQL undo files in /var/snap/nextcloud/current/mysql. Those needed clearing too... I used a few commands I found at https://stackoverflow.com/a/77391700

I ran sudo nextcloud.mysql-client to enter the MySQL client.

Then I ran the commands below:

use nextcloud;

CREATE UNDO TABLESPACE temp_undo_003 ADD DATAFILE 'temp_undo_003.ibu';

ALTER UNDO TABLESPACE innodb_undo_001 SET INACTIVE;
SELECT NAME, STATE FROM INFORMATION_SCHEMA.INNODB_TABLESPACES WHERE NAME = 'innodb_undo_001';
ALTER UNDO TABLESPACE innodb_undo_001 SET ACTIVE;

ALTER UNDO TABLESPACE innodb_undo_002 SET INACTIVE;
SELECT NAME, STATE FROM INFORMATION_SCHEMA.INNODB_TABLESPACES WHERE NAME = 'innodb_undo_002';
ALTER UNDO TABLESPACE innodb_undo_002 SET ACTIVE;

ALTER UNDO TABLESPACE temp_undo_003 SET INACTIVE;
DROP UNDO TABLESPACE temp_undo_003;

That cleared out the massive undo_001 and undo_002 files, getting my SSD space free!
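
To check the undo files before and after, something like this could list them by size. It is a sketch: `list_undo_files` is a made-up helper name, the directory is the one mentioned above, and the `undo_*` and `*.ibu` patterns are assumptions based on the file names in this comment. The directory probably needs root to read.

```shell
#!/bin/bash
# Sketch only: list MySQL undo tablespace files in a data directory by size.
# "list_undo_files" is a hypothetical helper name; the file name patterns
# are assumptions based on the names mentioned in this thread.

list_undo_files() {
  local dir="$1"
  # One line per matching file: human-readable size, then path, smallest first.
  find "$dir" -maxdepth 1 -type f \( -name 'undo_*' -o -name '*.ibu' \) \
    -exec du -h {} + | sort -h
}

mysql_dir="/var/snap/nextcloud/current/mysql"
if [ -r "$mysql_dir" ]; then
  list_undo_files "$mysql_dir"
fi
```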

Now I finally was not wasting space and had NC28 snap.

KNOWN ISSUES WITH THIS SOLUTION:

  1. The Nextcloud web admin was showing a ridiculous number of historic errors (in the 100,000s). So I cleared my nextcloud snap logs, as a) I wanted a fresh start to track from here onwards and b) they were all irrelevant errors, resolved once I had completed my database cleanup. I used https://github.com/nextcloud-snap/nextcloud-snap/wiki/Where-to-find-logs-of-components for guidance (thanks again to the great nextcloud snap devs and their well maintained docs!)

Otherwise web admin was complaining about the very high error count.

  2. Another problem: any Nextcloud client apps will probably now resync any synced files, and shared files may lose their URLs? This is an assumption based on the fact that I deleted and restored the external file directories, but I did not test for this and it is not a deal breaker for me if it does occur. Edit: confirmed, it did do this! All my shares, external and internal to other users, were lost

RESULTS IN SUMMARY
files_external caused me to have a 14GB database which nextcloud snap could not upgrade with sudo nextcloud.occ upgrade; inside nextcloud snap v28.

Nextcloud has a bug or design flaw where the database grows over the years and can never be shrunk back down. Essentially, if you create, then move or delete a file, it is not removed from the database by Nextcloud. So fast-moving folders with a lot of activity (like video editors) can cause insanely large databases.

I cleared it right down by removing everything but local files, which left it at 300MB; when I rescanned my externals afresh, it was only 500MB.

At the 300mb point in my process, nextcloud snap v28 successfully installed. But I assume at the 500mb point it would have been fine too.
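The sizes quoted above can be checked per table. A minimal sketch, assuming the snap's database schema is named `nextcloud` and that the snap's `nextcloud.mysql-client` wrapper reads SQL from standard input (both are assumptions, not confirmed in this thread). `table_size_sql` is a hypothetical helper that only prints the query, and the figures MySQL reports for InnoDB tables are approximate:

```shell
# table_size_sql: print a query listing the ten largest tables of a schema.
# The schema name defaults to "nextcloud" (an assumption about the snap).
table_size_sql() {
    schema=${1:-nextcloud}
    cat <<EOF
SELECT TABLE_NAME,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS size_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = '$schema'
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC
LIMIT 10;
EOF
}

# Print the query only; nothing is sent to a database here.
table_size_sql nextcloud
```

Piping the output into the client, for example `table_size_sql nextcloud | sudo nextcloud.mysql-client`, should list the largest tables; with a bloated file cache one would expect the filecache table near the top.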

NOTES:

  1. I'm currently unsure whether I needed to remove all my files_external shares in step 1: even after adding all the files back in, my database is only 500 MB after my fixing process, whereas before it was 14 GB for the exact same external storages! So I probably would have been fine not removing the external file sources in web admin in step 1 and just starting at step 2... but for users with incredibly large databases, maybe even optimising would not have got it small enough for the nextcloud snap upgrade process? Somewhere between 300 MB and 14 GB (exactly where is unknown), the nextcloud snap can no longer upgrade a database.
  2. Apparently, growing external files databases are a very well known NC issue. It never tidies up the filecache table... so it grows and grows out of control, and the cleanup commands do not work as advertised:

https://www.reddit.com/r/NextCloud/comments/du62gw/nextcloud_and_external_storage/?rdt=44521
https://help.nextcloud.com/t/oc-filecache-full-of-orphaned-files-how-to-remove-fast-and-efficiently/184151
https://help.nextcloud.com/t/oc-files-cleanup-dosent-clean-up-anything/107132
https://help.nextcloud.com/t/purging-oc-storages-and-oc-filecache/84441

My steps above though resolved this issue for me.

  3. VERY IMPORTANT! If you found this comment via a search engine and are experiencing the same issue of large nextcloud external files databases and/or the nextcloud snap not updating to a working NC v28 as a result... I STRONGLY recommend you do NOT run any of my scripts above!! I shared them for the nextcloud snap devs only, to show what fixed things for ME. What works for YOU may not be my scripts exactly, so I really do not think you should run them.

  4. I entirely ruled out apps, including Nextcloud News/Mail and 3rd party ones, as the cause of the upgrade fault, as I left them all enabled during my successful update.

And key note:

5) I will leave this issue open please as the issue now is "users with very large databases cannot upgrade"? We now at least know the issue for me was that.

--

And thank you everyone above in comments, and those in the URLs I linked, and of course the amazing nextcloud and nextcloud snap devs. It all really helped me get NC 28 up and running.

I hope the above was helpful to someone.

@LMRW
Contributor Author

LMRW commented Jun 1, 2024

TLDR: sudo nextcloud.occ upgrade; fails on very large databases! It's tested working on 300 MB databases and tested failing on 14 GB databases. And databases that large are not unusual for users of the files_external app, due to a known bug with that app & nextcloud.

@LMRW changed the title from "Issues with v28" to "Issues with v28 - 'sudo nextcloud.occ upgrade;' fails on very large databases" on Jun 1, 2024
@scubamuc
Member

scubamuc commented Jun 2, 2024

@LMRW,

> OK, GOOD NEWS I now have NC 28 working! 😃
> I upgraded my prod v27 to v28 and kept all my users and data on same install.

job well done and thanks for your detailed description. this thread will be an information goldmine for large database issues.

> @scubamuc I run the debug script - its a bit too much with some personal information for me to paste the whole output publicly. Is there anything in particular to share?

... personal information is minimal and generally not a leak issue, and the debug script is absolutely necessary for supporting issues here. luckily you were able to triage the issue yourself. 👍

> ...my mysql database is abnormally large...

definitely so!

most of the database scanning and cleaning jobs mentioned in your first script are usually default maintenance jobs which Nextcloud handles automatically.

> TLDR: sudo nextcloud.occ upgrade; fails on very large databases!

for interest's sake, do you think this could be a "timing" issue?

for a similar setup, nowhere near the size of your external files,
[screenshot]

external files are being scanned, requiring good network speeds to external media and volumes...

@LMRW
Contributor Author

LMRW commented Jun 2, 2024

My external mounts are SFTP. They are not too slow because when I run file scan manually, they scan fine (about 30 minutes), and also Nextcloud snap has upgraded many times perfectly fine before with these mounted. So if the file scan taking too long is the issue -- this was not an issue in any prior nextcloud snap upgrade.

So my hunch is that either v28 is different somehow, or it's coincidence and I have just now gone over the size threshold where it breaks (for example: maybe my database was 13 GB before and that was fine, but now it's 14 GB and it isn't). If that's the case, I think perhaps the upgrade script is timing out? When I run it, it runs for a couple of minutes and then fails. It's about the same amount of time every time I have tried. It's not tested, but I wouldn't be surprised if there was, for example, a 5 minute limit.

Also to clarify: in one sense I'm hesitant to call this a nextcloud snap issue, because really databases shouldn't be that large and it's a nextcloud files_external issue... but we should suppose that a 14 GB database is not impossible! Someone could have a lot of valid file caches, large teams, a tonne of nextcloud talk conversations etc. It could be possible to reach 14 GB naturally without the files_external bug, so perhaps this should still be solved?
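The timing theory above (each run failing after roughly the same couple of minutes) could be checked by recording how long each attempt takes. A minimal sketch; `timed_run` is a hypothetical helper, not part of the snap, and it is demonstrated here with a harmless command rather than the real upgrade:

```shell
# timed_run: run any command, then report its exit status and elapsed
# wall-clock seconds on stderr. The command's exit status is passed through.
timed_run() {
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    echo "exit=$status elapsed=$((end - start))s" >&2
    return "$status"
}

# Harmless demonstration. For the real check one would run:
#   timed_run sudo nextcloud.occ upgrade
timed_run sleep 1
```

If several failed attempts report nearly identical elapsed times, that would point towards a fixed limit somewhere; widely varying times would point away from it.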

@scubamuc
Member

scubamuc commented Jun 2, 2024

@LMRW

> I'm hesitant to call this a nextcloud snap issue

agree 100%

@Pilzinsel64
Member

As far as I know there isn't a timer in the background. When occ upgrade is called, nextcloud-fixer waits until it's done, as you can never predict how long it will take. 🤔

But yes, I agree, there may be two "bugs". One upstream for files_external and probably one in the snap. Not sure why, can't see a reason for this yet.

@LMRW
Contributor Author

LMRW commented Jun 2, 2024

That sounds like a good theory

Hope we can all track it down together

Thank you again everyone

I really enjoy the snap version of nextcloud.

@LMRW
Contributor Author

LMRW commented Jun 2, 2024

> Prior to this for last few days I was occasionally seeing "System Internal Error" Nextcloud branded error screens when changing page/url. A refresh would always resolve the issue and work second time. I never saw these before this week and they have been a couple of times a day now. Unsure how related. My snaps auto refresh daily.

An update on the above from my original comment

files_external has a DECADE old bug

https://help.nextcloud.com/t/still-issue-cant-get-app-storage-app-files-external-user-not-logged-in/170160/16
nextcloud/server#38940
Same bug in 2014: owncloud/core#19610

I'm wondering if it's that which is causing the error screens.

The above is absolutely not snap related or even related to this issue, but in case anyone found my original comment quoted above, here is the answer for that too. I added some new SFTP shares and removed some old ones around the time these errors started.

@scubamuc
Member

scubamuc commented Jun 3, 2024

@LMRW, yeah, been watching this too. There seems to be no fix in the pipeline yet. Hopefully this is fixed in 29. My SFTP shares seem to be working okay, but there have been warnings in the logs. not at the moment though, seems to be stable.

just to compare, are your SFTP connections local or external? mine are mostly local.

@pachulo
Member

pachulo commented Jun 3, 2024

You can continue the discussion, but this can be closed, as it is already solved.

Thanks to everyone involved!

@pachulo closed this as completed on Jun 3, 2024
@adrianvg

adrianvg commented Jun 3, 2024

> Thank you. I read previously we cannot skip major versions? So I cannot just skip 28 and go 27->29 when released? The comment in #2759 (comment) seems to indicate that's possible?

> @LMRW that's not possible. What @adrianvg did was revert to the 27.1.9snap1, because it still was on their system, and then started tracking the 27/stable channel, so that they don't get the update to 28 yet. The thing with this approach is that they will need to manually update to a newer version/channel soon, as version 27 will be out of support next month.

Didn't realise it was to be EOLed quite that soon. Thanks for the heads up @scubamuc!
Will need to look into this ASAP I think...
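Since major versions cannot be skipped, moving from 27 to 29 means refreshing through each intermediate channel in turn. A minimal sketch that only prints the `snap refresh` commands in order; `upgrade_path` is a hypothetical helper, and the `N/stable` channel naming is extrapolated from the `27/stable` channel mentioned above:

```shell
# upgrade_path: print one "snap refresh" command per major version step,
# from the version after $1 up to and including $2. Nothing is executed.
upgrade_path() {
    from=$1
    to=$2
    next=$((from + 1))
    while [ "$next" -le "$to" ]; do
        echo "sudo snap refresh nextcloud --channel=$next/stable"
        next=$((next + 1))
    done
}

upgrade_path 27 29
# prints:
#   sudo snap refresh nextcloud --channel=28/stable
#   sudo snap refresh nextcloud --channel=29/stable
```

Each refresh would need to finish, including its database upgrade, before the next command is run.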

@scubamuc
Member

scubamuc commented Jun 4, 2024

@LMRW, so here's how I got my logs cleaned up after logs were being spammed by "user not logged in". since local SFTP connects per SSH credentials to the local server (NAS in my case), I deleted the global credentials which had my Nextcloud user credentials saved. After entering SSH/SFTP user credentials there, log spamming stopped for me.

I'm guessing that if the global credentials are empty, then the credentials per SFTP connection will be used and logs spamming will stop.

@LMRW
Contributor Author

LMRW commented Jun 4, 2024

That's very interesting. I used to use credentials per connection (local sftp). I very recently swapped to using global.

Can I ask how you deleted the global ones please?

@LMRW
Contributor Author

LMRW commented Jun 4, 2024

@pachulo agreed this thread now has moved in a different direction but I do think an issue remains... very large databases cannot be upgraded.

The question is

  1. how many people have 14gb databases?
  2. of those who do, how many have legitimate data (lots of nextcloud talk conversations for example) and how many have spam invalid data (like I did from files_external)

I would counter that anyone with a legitimately large database won't be able to upgrade the nextcloud snap at the moment.

But maybe it's not a realistic real world issue?

@scubamuc
Member

scubamuc commented Jun 5, 2024

@LMRW

> Can I ask how you deleted the global ones please?

get settings: sudo nextcloud.occ config:app:get external_files

delete setting: sudo nextcloud.occ config:app:delete "app config value"

[screenshot]
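The two occ commands above are abbreviated: `config:app:get` and `config:app:delete` each take an app id followed by a config key name, and `config:list` accepts an optional app id. A minimal dry-run sketch that only prints the full command lines; `occ_config_cmds` and `example_key` are hypothetical, and the app id is passed in as an argument because this thread uses both `external_files` and `files_external`:

```shell
# occ_config_cmds: print (do not run) the occ commands for listing an app's
# config, reading one key, and deleting that key.
occ_config_cmds() {
    app=$1
    key=$2
    echo "sudo nextcloud.occ config:list $app"
    echo "sudo nextcloud.occ config:app:get $app $key"
    echo "sudo nextcloud.occ config:app:delete $app $key"
}

# example_key is a placeholder; the real key names come from config:list.
occ_config_cmds files_external example_key
```

Running the printed `config:list` line first shows which keys actually exist before anything is deleted.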

@LMRW
Contributor Author

LMRW commented Jun 5, 2024

> @LMRW
>
> > Can I ask how you deleted the global ones please?
>
> get settings: sudo nextcloud.occ config:app:get external_files
>
> delete setting: sudo nextcloud.occ config:app:delete "app config value"
>
> [screenshot]

Amazing thank you

I set them both to empty and pressed save in the UI yesterday, before I knew this command

Checking today I see

sudo nextcloud.occ config:list external_files 
{
    "apps": {
        "external_files": []
    }
}

So looks like I already removed them

Thank you very kindly for your tip here to do it via CLI and also tracking down the cause.

This thread has a wealth of information now.

Maybe, when I have time, I will open new issues on files_external to share with their team

Thank you

@scubamuc
Member

@noticons thanks for that and for the command for watching the logs in live mode 👌🤿👍
added this to the wiki: sudo nextcloud.occ log:watch

@scubamuc
Member

@steinger see here nextcloud/news#2585 (comment)

@steinger

@scubamuc yes, I saw it and updated it.
My problem since updating to NC28 and the News app is that the snap's Nextcloud cron keeps crashing after a few days and has to be restarted. I just don't know whether the problem is Nextcloud or the News app.

@LMRW
Contributor Author

LMRW commented Jun 20, 2024

> @scubamuc yes, I saw it and updated it.
>
> My problem since updating to NC28 and the News app is that the snap's Nextcloud cron keeps crashing after a few days and has to be restarted. I just don't know whether the problem is Nextcloud or the News app.

My cron also crashed a few times -- I also use the new news app. I upgraded to v29 today. I'm also unsure why. But I think we need a new issue.

@Pilzinsel64
Member

Please check for or create issues in the respective app's repository then.
For example: nextcloud/news#2693

@scubamuc
Member

@steinger see here: #2793 (comment)

there is a new version of fulltextsearch which prevents cron from failing....
