Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: optimize preview cleanup sql #40514

Merged
merged 1 commit into from
Feb 6, 2023
Merged

Conversation

DeepDiver1975
Copy link
Member

@DeepDiver1975 DeepDiver1975 commented Nov 28, 2022

Description

Related Issue

Motivation and Context

How Has This Been Tested?

  • test environment:
  • test case 1:
  • test case 2:
  • ...

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Database schema changes (next release will require increase of minor version instead of patch)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Technical debt
  • Tests only (no source changes)

Checklist:

  • Code changes
  • Unit tests added
  • Acceptance tests added
  • Documentation ticket raised:
  • Changelog item, see TEMPLATE

@update-docs
Copy link

update-docs bot commented Nov 28, 2022

Thanks for opening this pull request! The maintainers of this repository would appreciate it if you would create a changelog item based on your changes.

@mmattel
Copy link
Contributor

mmattel commented Nov 29, 2022

the failing test: php unit 7.4 for sqlite windows

+ php occ config:app:set core enable_external_storage --value=yes
Config value enable_external_storage for app core set to yes
+ case "${FILES_EXTERNAL_TYPE}" in
+ wait-for-it -t 600 fsweb.test.owncloud.com:445
services aren't ready in 10m0s

@DeepDiver1975
Copy link
Member Author

the failing test: php unit 7.4 for sqlite windows

oldie .....

@DeepDiver1975 DeepDiver1975 marked this pull request as ready for review November 29, 2022 13:20
@mmattel mmattel force-pushed the fix/preview-cleanup-sql branch from df9d791 to b6f7436 Compare January 3, 2023 10:04
@mmattel
Copy link
Contributor

mmattel commented Jan 3, 2023

@phil-davis I just have rebased but the test php unit 7.4 for sqlite windows still fails. Can you have a look and support to make CI happy?

@DeepDiver1975
Copy link
Member Author

Can you have a look and support to make CI happy?

It cannot be fixed - see #40426

@phil-davis
Copy link
Contributor

phil-davis commented Jan 3, 2023

@mmattel that CI pipeline has been out-of-action for "a long time". I have been waiting for a replacement Windows SMB share to be made available for CI, but no joy yet.

If someone reviews and approves the PR, then I (or someone else) can post a "success" drone result and merge the PR.

@DeepDiver1975 you have set the "3 - To Review" label, but nobody is actually called up as reviewers, and this has not been added to the "ownTeam Board" project, and there is no PR description or changelog. Is it really ready for review?

@mmattel
Copy link
Contributor

mmattel commented Jan 3, 2023

We are looking forward to have 10.12 in a forseeable timeframe and this fix is more than welcomed to get in.

@DeepDiver1975
Copy link
Member Author

There is no label set because I do not consider this finished. If time permits I will return 🙈

@phil-davis
Copy link
Contributor

There is no label set

I can see that the "3 - To Review" label was set on Nov 29, 2022 - that is what made me ask if it is really ready for review.

@DeepDiver1975
Copy link
Member Author

I can see that the "3 - To Review" label was set on Nov 29, 2022 - that is what made me ask if it is really ready for review.

😕

@DeepDiver1975
Copy link
Member Author

My current objection:

  • the original query was analysed and did not raise the referenced performance issues
  • the new proposed query seems to solve some observed performance issues
  • will this now open up new performance issues on different dbms?
  • a closer look is necessary

@mmattel mmattel force-pushed the fix/preview-cleanup-sql branch from b6f7436 to b5846e2 Compare January 5, 2023 14:17
@mmattel
Copy link
Contributor

mmattel commented Jan 5, 2023

rebased

@ownclouders
Copy link
Contributor

💥 Acceptance tests pipeline webUITrashbin-chrome-mariadb10.2-php7.4 failed. The build has been cancelled.

https://drone.owncloud.com/owncloud/core/37544/143

@ownclouders
Copy link
Contributor

💥 Acceptance tests pipeline webUISharingAcceptSh-chrome-mariadb10.2-php7.4 failed. The build has been cancelled.

https://drone.owncloud.com/owncloud/core/37544/133

@ownclouders
Copy link
Contributor

💥 Acceptance tests pipeline webUISharingIntUsers1-chrome-mariadb10.2-php7.4 failed. The build has been cancelled.

https://drone.owncloud.com/owncloud/core/37544/138

@ownclouders
Copy link
Contributor

💥 Acceptance tests pipeline webUITags-chrome-mariadb10.2-php7.4 failed. The build has been cancelled.

https://drone.owncloud.com/owncloud/core/37544/142

@mmattel mmattel force-pushed the fix/preview-cleanup-sql branch from b5846e2 to f52b2b0 Compare January 6, 2023 12:04
@sonarqubecloud
Copy link

sonarqubecloud bot commented Jan 6, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

No Coverage information No Coverage information
0.0% 0.0% Duplication

@mmattel
Copy link
Contributor

mmattel commented Jan 6, 2023

At least CI is now green after it refused to get there 😄

@jnweiger
Copy link
Contributor

I'd like to see some comments on the subquery

select `fileid` from `*PREFIX*filecache` where `fc`.`name` = CAST(`*PREFIX*filecache`.`fileid` as CHAR(24))

I'd say, his almost always returns empty, as the numeric fileid - (even if cast to string) is very unlikely to be equal to the name??

@DeepDiver1975
Copy link
Member Author

I'd say, his almost always returns empty, as the numeric fileid - (even if cast to string) is very unlikely to be equal to the name??

absolutly not - for thumbnails the the folder name is the fileid. Believe it or not - this is all done by intention ;-)

@jnweiger
Copy link
Contributor

Understood.
I tried creating several files and folders with numeric names that match the fileid of thumbnails and fotos. They all survived the
the cleanup jobs correctly. It seems quite save despite that namespace clash.

I thought, that something like

WHERE `fc`.`path` = CONCAT("thumbnails/",CAST(oc_filecache.fileid  as CHAR(24)))

should be safter, but I could not provoke any misbehaviour.
Mindblowing technology. :-)

@jvillafanez
Copy link
Member

Something to consider is to remove the sorting and make a query per user.

Using fileid > 1

MariaDB [owncloud]> explain select fileid, name, user_id from oc_filecache fc join oc_mounts on storage = storage_id where parent in (select fileid from oc_filecache where storage in (select numeric_id from oc_storages where id like 'home::%' or id like 'object::user:%') and path = 'thumbnails')   and fc.fileid not in (select fileid from oc_filecache where fc.name = CAST(oc_filecache.fileid as CHAR(24)))   and fc.fileid > 1   order by user_id, fileid;
+------+--------------------+--------------+-----------------+----------------------------------------------------------------------------------------------------------+---------------------+---------+-------------------------------+------+--------------------------------------------------------------+
| id   | select_type        | table        | type            | possible_keys                                                                                            | key                 | key_len | ref                           | rows | Extra                                                        |
+------+--------------------+--------------+-----------------+----------------------------------------------------------------------------------------------------------+---------------------+---------+-------------------------------+------+--------------------------------------------------------------+
|    1 | PRIMARY            | oc_mounts    | ALL             | mounts_storage_index                                                                                     | NULL                | NULL    | NULL                          | 1    | Using temporary; Using filesort                              |
|    1 | PRIMARY            | oc_storages  | range           | PRIMARY,storages_id_index                                                                                | storages_id_index   | 259     | NULL                          | 2    | Using where; Using index; Using join buffer (flat, BNL join) |
|    1 | PRIMARY            | fc           | ref             | PRIMARY,fs_storage_path_hash,fs_parent_name_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size | fs_storage_mimetype | 4       | owncloud.oc_mounts.storage_id | 9    | Using index condition; Using where                           |
|    1 | PRIMARY            | oc_filecache | eq_ref          | PRIMARY,fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size                     | PRIMARY             | 8       | owncloud.fc.parent            | 1    | Using where                                                  |
|    4 | DEPENDENT SUBQUERY | oc_filecache | unique_subquery | PRIMARY                                                                                                  | PRIMARY             | 8       | func                          | 1    | Using index; Using where                                     |
+------+--------------------+--------------+-----------------+----------------------------------------------------------------------------------------------------------+---------------------+---------+-------------------------------+------+--------------------------------------------------------------+
5 rows in set (0.001 sec)

Without sorting and fileid > 1 and user_id = admin

MariaDB [owncloud]> explain select fileid, name from oc_filecache fc join oc_mounts on storage = storage_id where parent in (select fileid from oc_filecache where storage in (select numeric_id from oc_storages where id like 'home::%' or id like 'object::user:%') and path = 'thumbnails')   and fc.fileid not in (select fileid from oc_filecache where fc.name = CAST(oc_filecache.fileid as CHAR(24)))   and fc.fileid > 1   and user_id = 'admin';
+------+--------------------+--------------+-----------------+----------------------------------------------------------------------------------------------------------+------------------------+---------+-------------------------------+------+--------------------------------------------------------------+
| id   | select_type        | table        | type            | possible_keys                                                                                            | key                    | key_len | ref                           | rows | Extra                                                        |
+------+--------------------+--------------+-----------------+----------------------------------------------------------------------------------------------------------+------------------------+---------+-------------------------------+------+--------------------------------------------------------------+
|    1 | PRIMARY            | oc_mounts    | ref             | mounts_user_root_index,mounts_user_index,mounts_storage_index                                            | mounts_user_root_index | 258     | const                         | 1    | Using index condition                                        |
|    1 | PRIMARY            | oc_storages  | range           | PRIMARY,storages_id_index                                                                                | storages_id_index      | 259     | NULL                          | 2    | Using where; Using index; Using join buffer (flat, BNL join) |
|    1 | PRIMARY            | fc           | ref             | PRIMARY,fs_storage_path_hash,fs_parent_name_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size | fs_storage_mimetype    | 4       | owncloud.oc_mounts.storage_id | 9    | Using index condition; Using where                           |
|    1 | PRIMARY            | oc_filecache | eq_ref          | PRIMARY,fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size                     | PRIMARY                | 8       | owncloud.fc.parent            | 1    | Using where                                                  |
|    4 | DEPENDENT SUBQUERY | oc_filecache | unique_subquery | PRIMARY                                                                                                  | PRIMARY                | 8       | func                          | 1    | Using index; Using where                                     |
+------+--------------------+--------------+-----------------+----------------------------------------------------------------------------------------------------------+------------------------+---------+-------------------------------+------+--------------------------------------------------------------+
5 rows in set (0.001 sec)

The search on the oc_mount table should use an index now, boosting the performance. Previously, we had to sort a temporary table that didn't seem to be indexed. Note that there could be thousands of users, so that temporary table could be really huge.

@DeepDiver1975
Copy link
Member Author

Something to consider is to remove the sorting and make a query per user.

The order by is necessary so that the pagination will work in a predictable manner.

@jvillafanez
Copy link
Member

With around 2000 images for Brian and 100 images for Carol, these are the results for the queries:

MariaDB [owncloud]> select fileid, name, user_id
    -> from oc_filecache fc join oc_mounts on storage = storage_id
    -> where parent in (
    ->   select fileid from oc_filecache where storage in (
    ->     select numeric_id from oc_storages where id like 'home::%' or id like 'object::user:%'
    ->   ) and path = 'thumbnails')
    ->   and not exists(select `fileid` from oc_filecache where fc.name = CAST(oc_filecache.fileid as CHAR(24)))
    ->   and fc.fileid > 1
    ->   order by user_id, fileid;
Empty set (17.584 sec)
MariaDB [owncloud]> select fileid, name, user_id
    -> from oc_filecache fc join oc_mounts on storage = storage_id
    -> where parent in (
    ->   select fileid from oc_filecache where storage in (
    ->     select numeric_id from oc_storages where id like 'home::%' or id like 'object::user:%'
    ->   ) and path = 'thumbnails')
    ->   and fc.fileid not in (select fileid from oc_filecache where fc.name = CAST(oc_filecache.fileid as CHAR(24)))
    ->   and fc.fileid > 1
    ->   order by user_id, fileid;
2114 rows in set (0.034 sec)
MariaDB [owncloud]> select fileid, name
    -> from oc_filecache fc join oc_mounts on storage = storage_id
    -> where parent in (
    ->   select fileid from oc_filecache where storage in (
    ->     select numeric_id from oc_storages where id like 'home::%' or id like 'object::user:%'
    ->   ) and path = 'thumbnails')
    ->   and fc.fileid not in (select fileid from oc_filecache where fc.name = CAST(oc_filecache.fileid as CHAR(24)))
    ->   and fc.fileid > 1
    ->   and user_id = 'Brian';
2004 rows in set (0.028 sec)
MariaDB [owncloud]> select fileid, name
    -> from oc_filecache fc join oc_mounts on storage = storage_id
    -> where parent in (
    ->   select fileid from oc_filecache where storage in (
    ->     select numeric_id from oc_storages where id like 'home::%' or id like 'object::user:%'
    ->   ) and path = 'thumbnails')
    ->   and fc.fileid not in (select fileid from oc_filecache where fc.name = CAST(oc_filecache.fileid as CHAR(24)))
    ->   and fc.fileid > 1
    ->   and user_id = 'Carol';
106 rows in set (0.005 sec)

@DeepDiver1975
Copy link
Member Author

Analysis from the new query:

-> Limit: 1000 row(s)  (actual time=2.472..2.484 rows=183 loops=1)
    -> Sort: oc_mounts.user_id, fc.fileid, limit input to 1000 row(s) per chunk  (actual time=2.471..2.477 rows=183 loops=1)
        -> Stream results  (cost=15.32 rows=12) (actual time=2.366..2.411 rows=183 loops=1)
            -> Inner hash join (oc_mounts.storage_id = fc.`storage`)  (cost=15.32 rows=12) (actual time=2.362..2.383 rows=183 loops=1)
                -> Table scan on oc_mounts  (cost=0.17 rows=4) (actual time=0.006..0.007 rows=4 loops=1)
                -> Hash
                    -> Nested loop antijoin  (cost=12.67 rows=6) (actual time=0.275..2.297 rows=182 loops=1)
                        -> Nested loop inner join  (cost=11.35 rows=6) (actual time=0.270..1.985 rows=182 loops=1)
                            -> Nested loop inner join  (cost=8.70 rows=2) (actual time=0.257..1.349 rows=3 loops=1)
                                -> Filter: ((oc_storages.id like 'home::%') or (oc_storages.id like 'object::user:%'))  (cost=0.65 rows=4) (actual time=0.013..0.017 rows=3 loops=1)
                                    -> Covering index scan on oc_storages using storages_id_index  (cost=0.65 rows=4) (actual time=0.011..0.013 rows=4 loops=1)
                                -> Filter: (oc_filecache.`path` = 'thumbnails')  (cost=1.45 rows=1) (actual time=0.088..0.442 rows=1 loops=3)
                                    -> Index lookup on oc_filecache using fs_storage_path_hash (storage=oc_storages.numeric_id)  (cost=1.45 rows=6) (actual time=0.014..0.426 rows=193 loops=3)
                            -> Index lookup on fc using fs_parent_name_hash (parent=oc_filecache.fileid), with index condition: (fc.fileid > 1)  (cost=0.69 rows=3) (actual time=0.007..0.210 rows=61 loops=3)
                        -> Filter: (fc.`name` = cast(oc_filecache.fileid as char(24) charset utf8mb4))  (cost=0.28 rows=1) (actual time=0.002..0.002 rows=0 loops=182)
                            -> Single-row covering index lookup on oc_filecache using PRIMARY (fileid=fc.fileid)  (cost=0.28 rows=1) (actual time=0.001..0.001 rows=1 loops=182)

Previous query:

-> Limit: 1000 row(s)  (actual time=4.461..4.475 rows=165 loops=1)
    -> Sort: oc_mounts.user_id, fc.fileid, limit input to 1000 row(s) per chunk  (actual time=4.460..4.468 rows=165 loops=1)
        -> Stream results  (cost=115497.16 rows=326) (actual time=4.143..4.417 rows=165 loops=1)
            -> Hash antijoin (fc.`name` = cast(oc_filecache.fileid as char(24) charset utf8mb4))  (cost=115497.16 rows=326) (actual time=4.140..4.392 rows=165 loops=1)
                -> Nested loop inner join  (cost=18.13 rows=0.5) (actual time=1.227..3.150 rows=215 loops=1)
                    -> Inner hash join (no condition)  (cost=9.91 rows=9) (actual time=1.213..1.218 rows=12 loops=1)
                        -> Table scan on oc_mounts  (cost=0.30 rows=4) (actual time=0.005..0.007 rows=4 loops=1)
                        -> Hash
                            -> Nested loop inner join  (cost=8.70 rows=2) (actual time=0.303..1.196 rows=3 loops=1)
                                -> Filter: ((oc_storages.id like 'home::%') or (oc_storages.id like 'object::user:%'))  (cost=0.65 rows=4) (actual time=0.009..0.011 rows=3 loops=1)
                                    -> Covering index scan on oc_storages using storages_id_index  (cost=0.65 rows=4) (actual time=0.007..0.008 rows=4 loops=1)
                                -> Filter: (oc_filecache.`path` = 'thumbnails')  (cost=1.45 rows=1) (actual time=0.104..0.395 rows=1 loops=3)
                                    -> Index lookup on oc_filecache using fs_storage_path_hash (storage=oc_storages.numeric_id)  (cost=1.45 rows=6) (actual time=0.015..0.382 rows=235 loops=3)
                    -> Filter: (fc.`storage` = oc_mounts.storage_id)  (cost=0.64 rows=0.05) (actual time=0.115..0.160 rows=18 loops=12)
                        -> Index lookup on fc using fs_parent_name_hash (parent=oc_filecache.fileid), with index condition: (fc.fileid > 1)  (cost=0.64 rows=3) (actual time=0.005..0.157 rows=71 loops=12)
                -> Hash
                    -> Covering index scan on oc_filecache using fs_storage_mimetype  (cost=115460.54 rows=709) (actual time=0.016..0.085 rows=709 loops=1)

@DeepDiver1975
Copy link
Member Author

-> Table scan on oc_mounts (cost=0.17 rows=4) (actual time=0.006..0.007 rows=4 loops=1)

I agree with @jvillafanez that this is yet another optimization opportunity .... but maybe something for a followup pr ...

@DeepDiver1975 DeepDiver1975 merged commit 5129a55 into master Feb 6, 2023
@delete-merged-branch delete-merged-branch bot deleted the fix/preview-cleanup-sql branch February 6, 2023 14:42
@mmattel
Copy link
Contributor

mmattel commented Feb 6, 2023

Do we need any changelog?

@DeepDiver1975
Copy link
Member Author

Do we need any changelog?

argh - totally missed that .... will submit pr ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

previews:cleanup issues
6 participants