Slow searches after upgrading from 6.0.6 to 7.0.4, Server CPU load peaks at 100% while searching #13237

Closed
legnabar opened this Issue Jan 10, 2015 · 27 comments

legnabar commented Jan 10, 2015

Hello everybody,

Before I start, as this is my first post, I would like to thank all the developers for the awesome work on ownCloud; it's wonderful software.

I came across a weird issue after migrating a stable and rock-solid deployment based on OC 6.0.6 with 300 users and LDAP integration. The Search app in the web UI, which had always been blazing fast (I'm not using Lucene search so far), is now sluggish (e.g. one search takes more than 7 seconds to draw results for a user account with fewer than 300 files). For users with more files in their accounts the issue gets worse: no results are displayed at all and, even worse, during the search the Apache processes take up the whole CPU, up to 100%, degrading the performance of the whole installation.

I did a lot of testing on my own before posting this issue here. For example, the issue only appears for existing users migrated from the 6.x version: if I create a new user within OC 7.0.4 and copy over the same files from an existing user with a lot of files, the search works as expected (within a second) and the CPU doesn't peak. Therefore, I assume it may have something to do with the upgrade process and with something that is not updated correctly somewhere in the DB.

There is nothing relevant in the logs and the upgrade process completed successfully. I have also checked that downgrading to 6.0.6 solves the issue.

I would appreciate it if you could have a look at this issue and let me know of any possible solution for this.

If you need further information please let me know.

Thank you in advance.

Kind Regards,

Steps to reproduce

1. Log in to OC with a user migrated from OC 6 and enter any pattern in the search box.
2. The search takes too long (several seconds) to complete, or never finishes.
3. While the search task is running, the CPU on the server stays at 100% due to Apache processes.

Expected behaviour

As in OC 6, the search should complete within a second or two and should not severely affect the performance of the server by driving the CPU to 100%.

Actual behaviour

The search, especially for the users with the most files, takes too long to complete or never ends, and while it is running the CPU on the server peaks due to Apache processes.

Server configuration

Operating system: Ubuntu 14.04.1 LTS

Web server: Apache/2.4.7 (Ubuntu) OpenSSL/1.0.1f

Database: MySQL 5.5.40-0ubuntu0.14.04.1 (Ubuntu)

PHP version: 5.5.9-1ubuntu4.5

ownCloud version: OC 7.0.4

Updated from an older ownCloud or fresh install:
UPDATED FROM 6.0.6

List of activated apps:
Enabled:

  • activity
  • documents
  • files
  • files_external
  • files_pdfviewer
  • files_sharing
  • files_texteditor
  • files_trashbin
  • files_versions
  • gallery
  • user_ldap

The content of config/config.php:

<?php
$CONFIG = array (
  'instanceid' => 'XXXX-OMITTED-XXXXX',
  'theme' => 'custom',
  'passwordsalt' => 'XXXX-OMITTED-XXXXX',
  'datadirectory' => '/owncloud/data',
  'dbtype' => 'mysql',
  'version' => '7.0.4.2',
  'dbname' => 'XXXX-OMITTED-XXXXX',
  'dbhost' => 'localhost',
  'dbtableprefix' => 'oc_',
  'dbuser' => 'XXXX-OMITTED-XXXXX',
  'dbpassword' => '',
  'installed' => true,
  'ldapIgnoreNamingRules' => false,
  'mail_smtpmode' => 'smtp',
  'mail_smtphost' => 'XXXX-OMITTED-XXXXX',
  'mail_smtpauth' => false,
  'loglevel' => '1',
  'maintenance' => false,
  'enable_previews' => true,
  'preview_max_x' => NULL,
  'preview_max_y' => NULL,
  'enable_avatars' => true,
  'knowledgebaseenabled' => false,
  'trusted_domains' => 
  array (
    0 => 'XXXX-OMITTED-XXXXX',
  ),
  'secret' => 'XXXXX-OMITTED-XXXXXX',
  'share_folder' => '/Shared',
);

**Are you using external storage, if yes which one:** 
NO
**Are you using encryption:** 
NO
### Client configuration
**Browser:** Google Chrome

LukasReschke commented Jan 10, 2015

> I did a lot of testing on my own before posting this issue here. For example, the issue only appears for existing users migrated from the 6.x version: if I create a new user within OC 7.0.4 and copy over the same files from an existing user with a lot of files, the search works as expected (within a second) and the CPU doesn't peak. Therefore, I assume it may have something to do with the upgrade process and with something that is not updated correctly somewhere in the DB.

@icewind1991 Sounds like a FS problem – did you happen to notice anything like that?


legnabar commented Jan 10, 2015

FS? I don't think so: the newly created user, with the very same files copied over from an existing user who suffers from this issue, does not face the problem, and the filesystem where the files of both users are stored is the same. In short, the issue seems to affect only old users from 6.0.6, not new ones, regardless of the number of files or the storage quota usage.

Thank you for your help.

Best regards,


LukasReschke commented Jan 10, 2015

Ah - Yes - I referred to our internal "Filesystem" used by ownCloud :)

We have some wrappers and caches for FS access - that's where I suspect the problem lies.


legnabar commented Jan 10, 2015

Sorry Lukas, I didn't realise you were actually forwarding the question to icewind1991 :-).

Thanks again for your prompt response.

Anything you may need please let me know.

Best Regards,


LukasReschke commented Jan 10, 2015

If you want to do us a VERY big favour you could profile the request, so we can see right away where the problem possibly lies.

For that I'd recommend:

  1. Go to blackfire.io
  2. Register with your GitHub account
  3. Install the PHP module
  4. Install the client on your machine
  5. Log in as a user that experiences the problem
  6. Open the network console
  7. Find the slow search request and click "Copy as cURL" (Chrome offers this feature)
  8. Open a command line, type "blackfire " and paste the curl command
  9. Open the URL generated by blackfire
  10. Make screenshots of the whole result page (the more the better)
  11. Repeat the same for a user without problems (best: copy the files to a new user)
  12. In the blackfire web interface click "Compare" on those two requests
  13. Make screenshots as well
  14. Post them here
  15. Always good: disable the PHP module again afterwards

Please note that blackfire is proprietary software, but it was created by the guys behind Symfony, so I consider it trustworthy enough ;-)

Such a trace would help us a lot since we can see directly where most of the time is spent and analyze it without needing to guess :)


LukasReschke commented Jan 10, 2015

(Please note that I'm on my mobile right now, so the actual process might slightly differ from what I remember ;-))


legnabar commented Jan 10, 2015

Hi Lukas,

Thank you for your help, I will follow your directions as soon as possible and provide you with the required info.

Have a nice day!

Kind regards,


legnabar commented Jan 10, 2015

Hi Lukas,

I'm back again. I've performed all the required profiling on my side and here are screen captures of the results:

In order to test all the possible scenarios described for the issue, I tested it with two users who have issues (migrated from OC 6.0.6): one with fewer files, where the search completes but takes a lot of time (almost 17 seconds), and the other with many more files, where the search never completes. The search pattern is the same throughout all the testing.

For comparison, I also have a newly created user to whom I copied over the same files as the user with the large file list (exactly the one below whose search never completes). In this case the search completes in just 137 ms.

Screen captures for all the tests so far:

User 1: User migrated from 6.0.6, very few files, search task ended after almost 17 seconds:

[screenshots]

User 2: User migrated from 6.0.6, large file list, search task never ends:

[screenshots]

User 3: Newly created user, Same files as user2. Search behaves as expected:

[screenshots]

Comparison between User 1 (16 seconds) and User 3 (137 ms):

[screenshots]

Comparison between User 2 (never completes) and User 3:

[screenshots]

I hope this will help you find the issue. However, if you need further details or any further testing on my side, please let me know.

Thank you in advance.

Kind regards.


LukasReschke commented Jan 10, 2015

Do you happen to have an external SMB mount for some of the users or on the instance?

The dependency check for external SMB storage seems to take horribly long, as can be seen at https://cloud.githubusercontent.com/assets/10474615/5692857/72d80e3a-9906-11e4-9a18-7a655fd9f504.png – 52.11% of 16.6s spent in \OC_Helper::findBinaryPath('smbclient') is horrible. We need to cache that return value.

#13192 should fix that if you have a memory cache installed. – For instances without a memory cache we need #13211.

To test whether that is the actual problem you could modify

/**
 * check if smbclient is installed
 */
public static function checkDependencies() {
    $smbClientExists = (bool) \OC_Helper::findBinaryPath('smbclient');
    return $smbClientExists ? true : array('smbclient');
}

to always return true (comment out all lines in the function and add a return true;, assuming you have smbclient installed on your machine).

@DeepDiver1975 Pretty interesting case with a lot of debug information. Please schedule the array memory cache replacement (#13211). I'd say that certainly warrants 8.0 if we take a look at the horrible performance here.


LukasReschke commented Jan 10, 2015

Alternatively, try disabling the files_external application if you don't require it. That should at least take this out of the equation. If that does not help, new stack traces would be appreciated :-)


LukasReschke commented Jan 10, 2015

Still, I wonder why there is such a huge difference here – maybe @icewind1991 knows more.


legnabar commented Jan 11, 2015

Hi Lukas,

As suggested, I disabled the files_external app and the user whose search took 17 seconds improved a lot; I'll send you new profile screenshots later on. However, the user with no response at all still has the issue (no response and high CPU load). I don't know whether it has anything to do with the same issue, but while further testing this problematic user I found out that he also has issues when accessing the picture Gallery (it also takes forever) and Documents (same there).

We do use SMB mounts for some users, but in the meantime, while troubleshooting this issue, I think it's better to leave the app disabled rather than take the risk of searches badly affecting the performance of the whole instance.

Thank you very much for your help.

Kind regards,


legnabar commented Jan 11, 2015

Hi again,

I performed further testing on my side: I did the profiling once again with the files_external app disabled.

The search for the user that took 17 seconds now takes less than 6 seconds, although I still think that is far more than expected. Here are the captures for this search:

[screenshot 2015-01-11 10:45:35]

[screenshot 2015-01-11 10:48:17]

And compared with the same search by the same user before disabling files_external:

[screenshot 2015-01-11 10:56:10]

On the other hand, while testing the user with the large amount of files, the search still does not complete, but this time it ends 43 seconds later with a PHP error:

[Sun Jan 11 10:24:08.768699 2015] [:error] [pid 9219] [client 172.18.X.X:50396] PHP Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 56061 bytes) in /var/www/owncloud/apps/files_sharing/lib/cache.php on line 164, referer: https://SERVER/index.php/apps/files/
[Sun Jan 11 10:27:28.237122 2015] [:error] [pid 9733] [client 172.18.X.X:43989] PHP Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 55773 bytes) in /var/www/owncloud/apps/files_sharing/lib/cache.php on line 164, referer: https://SERVER/index.php/apps/files/

[screenshot 2015-01-11 10:52:32]

I hope this is helpful for you. If you need further testing please let me know.

Thank you in advance.

Kind Regards,

Angel.


LukasReschke commented Jan 11, 2015

Can you do a profiling run for the request that gives the memory error and then click on "Memory" in blackfire? – That may give us some insights.


legnabar commented Jan 11, 2015

Hi Lukas,

Actually I have already done it, but for some unknown reason blackfire does not gather any info on it and in the slot it reads: "Unknown Profile context".

I'll keep on trying.

Any ideas on why blackfire does not record it correctly?

Thank you.

Kind regards,

Angel.


LukasReschke commented Jan 11, 2015

For the user where searching now works I see the following performance enhancements possible:

I wonder whether the difference between the user where searching is fast and the user where it is slow is that one of them has received some shares? – I see a lot of sharing-related code paths being called which in sum again take more than 80% (!) of the processing time. Can you confirm that the first user has active shares?

> Any ideas on why blackfire does not record it correctly?

It seems their module does not produce a complete profile when the script times out (or requires too much memory, etc.) before finishing. So that is nothing we can change.


legnabar commented Jan 11, 2015

Yes, I can confirm that the user with no response, apart from having more files, also has lots of shared files and folders in both directions: files shared with others and files that others share with him.

Thank you.

Angel.


LukasReschke commented Jan 11, 2015

Roughly how many files are we talking about here for that user – hundreds, thousands, millions?
Also, can you take a look at how many shares he has?

The following SQL commands should help you for that:

  • Listing of all shares that the user "test" has received
    • SELECT * FROM oc_share WHERE share_with = "test";
  • Listing of all shares that the user "test" has created
    • SELECT * FROM oc_share WHERE uid_owner = "test";

For the file count I don't have a handy SQL query available since I'm bad with joins etc. ;-) – I'd go with the following command in the user's files folder in data: find . -type f | wc -l
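If you prefer to do the counting directly in SQL, something along these lines might work (an untested sketch – the storage id in oc_storages is usually home::&lt;user&gt;, but for older accounts it may look like local::/owncloud/data/&lt;user&gt;/, so adjust accordingly):

```sql
-- Number of shares received by the user "test"
SELECT COUNT(*) FROM oc_share WHERE share_with = 'test';

-- Number of shares created by the user "test"
SELECT COUNT(*) FROM oc_share WHERE uid_owner = 'test';

-- Rough file count for the user "test" via the filecache
SELECT COUNT(*)
  FROM oc_filecache fc
  JOIN oc_storages st ON fc.storage = st.numeric_id
 WHERE st.id IN ('home::test', 'local::/owncloud/data/test/')
   AND fc.path LIKE 'files/%';
```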

Would be great if you could complete the following list with the missing XXX:

  1. User 1 has about 300 files and XXX shares – searches take about 17 seconds and after disabling files_external take only 6 seconds.
  2. User 2 has about XXX files and XXX shares – searching are timing out because too much memory allocated.
  3. User 3 is a copy of the files of User 2 without shares – searches are very fast with about 130 ms.

In the case of User 3, does the search actually return sane results?


legnabar commented Jan 11, 2015

Hi Lukas,

I come back with the info; surprisingly, the figures do not seem too high to me:

  1. User 1 has 975 files, 12 shares with others and 8 shares with him
    (this one takes 6 seconds after disabling the files_external app).
  2. User 2 has 6475 files, 109 shares with others and 14 shares with him
    (this one never completes).
  3. User 3 is a copy of User 2, so 6476 files, but no shares so far. Searches
    are very fast (as expected). I may now try to add more shares to this test user
    with another dummy user in order to recreate an environment similar to that of
    User 1.

Yesterday with User 2 I also tried clearing the oc_fscache table (delete
from oc_fscache where user="User2";) but it didn't help. Another test I did
was to REMOVE the files in the filesystem for User 2 and then copy the files
over again (all the shares were lost); I ran occ files:scan user2 but was still
stuck with the same issue. I then rolled back all these changes by restoring
a full DB backup.

Finally, yes, the results for User 3 are correct and consistent, and they are
as quick as they were in version 6.0.X.

Hope this helps.

Kind regards,


PVince81 commented Jan 12, 2015

Can you check whether the slow users are the ones receiving shares?
New users usually don't have any shares initially, so it could be a reason.


legnabar commented Jan 12, 2015

Hi,

I have been doing further testing with the newly created user (User 3), who has the same amount of files uploaded; I then shared a total of 20 folders with him from another user, and I can see that the search time is now much worse than when this user was first created. So, as you suggested, the issue may lie within the shared files list.

I would single out User 2 (the one that didn't complete and ended with the PHP memory error) because I think this user has another issue that is affecting the behaviour and causing the memory error, not only while searching but also while browsing the Gallery or (this user happens to be a member of the admin group) in the user management web page, as you can see below:

"[Sun Jan 11 20:24:55.287741 2015] [:error] [pid 22083] [client IPADDRESS:63967] PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted (tried to allocate 56241 bytes) in /var/www/owncloud/apps/files_sharing/lib/sharedstorage.php on line 75, referer: https://SERVERNAME/index.php/settings/users
[Mon Jan 12 08:08:21.718764 2015] [:error] [pid 14795] [clientIPADDRESS:4497] PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted (tried to allocate 56025 bytes) in /var/www/owncloud/apps/files_sharing/lib/sharedstorage.php on line 75, referer: https://SERVERNAME/index.php/apps/files/
[Mon Jan 12 08:09:06.675757 2015] [:error] [pid 27908] [clientIPADDRESS:4549] PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted (tried to allocate 55575 bytes) in /var/www/owncloud/apps/files_sharing/lib/cache.php on line 339, referer: https://SERVERNAME/index.php/apps/gallery
[Mon Jan 12 08:53:00.911939 2015] [:error] [pid 31760] [clientIPADDRESS:3961] PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted (tried to allocate 55872 bytes) in /var/www/owncloud/apps/files_sharing/lib/cache.php on line 164, referer: https://SERVERNAME/index.php/settings/users"

Regarding this user in particular, I did further research within the DB tables and this user has lots of references in the oc_properties table like this one:

| Username | /FolderX/filename.jpg                                                                                                                                                             | {DAV:}getetag                                     | "518a3ca62d6770.80708943"     |  4307 |

I can't see any references like this for other users. Is this something that is in use in OC 7.0.4, or could these entries instead be safely removed?
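For reference, something like the following query can be used to check how these entries are distributed across users (assuming the oc_ table prefix from my config):

```sql
-- Count leftover getetag rows per user in oc_properties
SELECT userid, COUNT(*) AS getetag_rows
  FROM oc_properties
 WHERE propertyname = '{DAV:}getetag'
 GROUP BY userid
 ORDER BY getetag_rows DESC;
```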

Take into account that this is an installation that started with OC 4.5 and was then upgraded to 5.0 and 6.0.X until reaching 7.0.4 now, so could it be that something "legacy" in the DB is making noise to the point of causing this issue?

I will continue with the research on my own; however, if you need anything else on my side please let me know.

Thank you very much for the superb support.

Best Regards,

Edit: added triple backquotes for log


LukasReschke commented Jan 12, 2015

> Thank you very much for the superb support.

And all that for free – isn't an open-source community great? :-)

Thanks for your patience and help in debugging this. Usually people are much less helpful and restrict themselves to posting things like "That crap doesn't work" ;-)


PVince81 commented Jan 12, 2015

From what I saw in the code, searching through folders as a share recipient is slow, because the search query runs on every received shared folder separately. I'm not aware of any changes regarding that part; I believe 6.0.6 had that code path too.

As for oc_properties: that table isn't supposed to contain "getetag" so I think you can remove all such entries.

Also, I saw you talking about oc_fscache: that table is very old and obsolete, unused since OC 5 I believe.
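If you want to clean those getetag rows up by hand in the meantime, a statement along these lines should do it (just a sketch – please take a DB backup first):

```sql
-- Remove the stale WebDAV etag entries left over from older versions
DELETE FROM oc_properties WHERE propertyname = '{DAV:}getetag';
```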


legnabar commented Jan 12, 2015

Absolutely Lukas, I think ownCloud is not only one of the best pieces of software I've tried in my 14 years of IT experience, but also one of the best supported so far, much better than the majority of proprietary services for sure :-). Awesome, please carry on.

Regarding the unused tables in the DB, I supposed they were no longer in use, but now I will remove their entries, at least for the most problematic user, just to see whether any of this could affect him. Actually, the most problematic user is my own user; I think it was the very first user of the instance back in version 4.5 :-). Can I safely delete the whole oc_fscache table? Do you have the table names of a fresh (not migrated) installation of OC 7.0.4 with the apps I currently have enabled, so that I can compare?

@PVince81 I assure you that with 6.0.6 everything was working fine; nevertheless, I'm quite convinced that this could be a legacy issue or something inherited from even older versions.

For the moment, with files_external disabled, I am not aware of any other user except mine with the memory timeout issue. For every single user I tested the searches work; however, as I told you, they take much longer than in version 6.0.6 – they should be in the ms range and not in the seconds range as they are now.

Thank you Vince & Lukas.

Best Regards,


DeepDiver1975 commented Jan 12, 2015

> As for oc_properties: that table isn't supposed to contain "getetag" so I think you can remove all such entries.

Cries for another repair step


PVince81 commented Jan 12, 2015

Don't cry, just raised this: #13281


legnabar commented Jan 13, 2015

Hi Again,

I did some more research by comparing DBs: I installed a 7.0.4 from scratch on another demo server and enabled the same apps on both instances (except for Calendar). The comparison between the two instances, the old one and the newly deployed one, shows the following differences:

[screenshot: comparison of the table lists]

As you can see in the picture on the left, apart from the already mentioned oc_fscache, there are other tables that exist in the upgraded install but do not exist in a newly deployed instance. Of course some of these could be generated through use, but they look quite suspicious in my humble opinion:

First, oc_oc_filecache_6dxxxxxxx, as far as I know, is a temporary table used during the migration process, and it should have been deleted after completion, which in this case did not happen. I think it is safe to drop this table completely, so I will do so; the cleanup I plan to run is sketched below. I remind you that no errors were triggered during the upgrade process from 6.0.6. The original oc_filecache table has a total of 1.5 million entries, which I think is fine because every single file stored on the instance has an entry in this table.
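Roughly (and only after taking a full DB backup; the table name here is just the redacted placeholder from above):

```sql
-- Drop the leftover temporary table from the 6.x -> 7.x migration
-- (substitute the real table name for the redacted placeholder).
DROP TABLE oc_oc_filecache_6dxxxxxxx;
```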

Then, oc_permissions seems to map shared files to the numeric value of the permissions associated with them, but I am wondering whether this is still of use in 7.0.4, as the same info is also in oc_share.

Last, the table oc_queuedtask has zero rows, so it doesn't seem to be in use at all.

I hope this information can shed light on this issue, or at least be helpful to improve the installation process of ownCloud.

Thank you in advance.

Best Regards,
