Optimization of queries on very large datasets #45680
Shouldn't the DB optimizer do that, i.e. filter for this first?
Similar report: #28114

The SQL query comes from: https://github.com/nextcloud/server/blob/master/lib/private/Preview/BackgroundCleanupJob.php
There are no rows in oc_filecache where fileid is NULL, because fileid is the primary key; therefore your suggestion to first select all rows where fileid is NULL does not work.
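To illustrate the point above: the NULLs the cleanup query looks for come from unmatched rows in a LEFT JOIN, not from the `fileid` column itself, which can never be NULL in the table. A minimal sketch, using made-up table names and values (not the real Nextcloud schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE filecache (fileid INTEGER PRIMARY KEY, name TEXT);
INSERT INTO filecache VALUES (1, 'a'), (2, 'b');
CREATE TABLE previews (name TEXT);          -- names that should match a fileid
INSERT INTO previews VALUES ('1'), ('999'); -- '999' is an orphaned preview
""")

-- comment syntax fixed below --
# LEFT JOIN: preview rows with no matching filecache row get fileid = NULL,
# so "f.fileid IS NULL" selects exactly the orphans.
orphans = con.execute("""
    SELECT p.name
    FROM previews p
    LEFT JOIN filecache f ON f.fileid = CAST(p.name AS INTEGER)
    WHERE f.fileid IS NULL
""").fetchall()
print(orphans)  # [('999',)]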
Regarding this: my idea was mainly based on this line
After a deeper investigation I found that the problem is with the LEFT JOIN; the database takes too much time (hours) to execute it. I'm beginning to think the problem could be in my MariaDB database: it has around 5M entries and a simple count takes around 1 minute to run. So this is probably a question for the forums instead.
I'm commenting mostly in case someone else finds this issue in the future, but I found the problem: the oc_filecache table had corrupted itself. I had to drop the table and recreate it with the most recent schema script that I found here: After recreating it I ran
For anyone coming across this: deleting the filecache table is a bad idea. The filecache table is not merely a cache anymore, and deleting it will lose all shares and other information about files.
Is your feature request related to a problem? Please describe.
Two weeks ago I set up a Nextcloud instance for a company to store over 700k small XML invoice files. After the upload finished I expected Nextcloud to generate a lot of data in the database (indexes, previews, ...), so I didn't mind that it would take some time.

After a week of it running I began to research whether this was normal, and apparently it wasn't. I noticed a few queries that used a lot of CPU and disk I/O and wouldn't let the drive spin down.
What is causing a lot of trouble, from what I could see, is a query containing the clause:

```sql
a.path LIKE 'appdata\\_oc39a4gb5j29/preview/_/_/_/_/_/_/_/%'
```

I tried to find where in the code this query is built by the server, so I could try to fix it myself and submit a PR, but I couldn't find it, as I have never worked on the codebase.
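For reference on that pattern: in SQL LIKE, `_` is a single-character wildcard and `%` matches any run of characters, while an escaped `\_` matches a literal underscore. A small sketch of that distinction, with made-up paths (using SQLite's explicit ESCAPE clause; MySQL/MariaDB treat backslash as the escape character by default):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (path TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [
    ("appdata_abc/preview/1/2/x",),   # literal underscore after 'appdata'
    ("appdataXabc/preview/1/2/x",),   # 'X' instead: must NOT match
])

# \_ = literal underscore, _ = any single char, % = any run of chars
rows = con.execute(
    r"SELECT path FROM t WHERE path LIKE 'appdata\_%/preview/_/_/%' ESCAPE '\'"
).fetchall()
print(rows)  # [('appdata_abc/preview/1/2/x',)]
```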
Describe the solution you'd like
Basically, it would be to do some fast checks before running the long and complex query. The second command in the terminal shown in the image is a basic command that counts all the entries where fileid is NULL, and it returned zero. If an if-statement with such a check were added before this query, the server would be able to skip it, let the drive spin down, and save resources.
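A minimal sketch of that guard idea. The schema, names, and the cheap predicate here are illustrative only, not the actual Nextcloud code; note also that, as a comment above points out, a plain `fileid IS NULL` count on the table itself cannot work, since the NULLs come from the join:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE filecache (fileid INTEGER PRIMARY KEY, path TEXT)")
con.execute("INSERT INTO filecache VALUES (1, 'appdata/preview/x')")

def cleanup_old_previews(con):
    # Fast guard: if no candidate rows exist at all, skip the heavy
    # query entirely and let the disk spin down.
    (n,) = con.execute(
        "SELECT COUNT(*) FROM filecache WHERE path LIKE 'appdata/preview/%'"
    ).fetchone()
    if n == 0:
        return 0  # nothing to clean up
    # ...otherwise run the expensive LEFT JOIN cleanup here...
    return n

print(cleanup_old_previews(con))  # 1
```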
Describe alternatives you've considered
Maybe doing something more complex, using transactions to filter the data first with simple, fast queries and only then executing the complex queries; something like: get all rows where fileid == NULL -> filter those where storagetype == 2 -> ...
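A sketch of that staged-filtering alternative: narrow the candidate set with cheap queries first, then run the more selective check only on the survivors. The schema and values are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE filecache (
    fileid  INTEGER PRIMARY KEY,
    storage INTEGER,
    path    TEXT
);
INSERT INTO filecache VALUES
    (1, 2, 'appdata/preview/a'),
    (2, 2, 'files/doc.xml'),
    (3, 1, 'appdata/preview/b');
""")

# Stage 1: cheap filter on an indexed column
candidates = [r[0] for r in con.execute(
    "SELECT fileid FROM filecache WHERE storage = 2")]

# Stage 2: run the more selective predicate on the small candidate list only
placeholders = ",".join("?" * len(candidates))
previews = [r[0] for r in con.execute(
    "SELECT fileid FROM filecache "
    "WHERE fileid IN (" + placeholders + ") "
    "AND path LIKE 'appdata/preview/%'",
    candidates)]

print(previews)  # [1]
```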
Additional context