Updating $page->parent extremely slow; Sometimes selector 'has_parent' is not working #1297

Closed
dennisspohr opened this issue Dec 28, 2020 · 8 comments

Comments

@dennisspohr

Short description of the issue

Three weeks ago we upgraded ProcessWire from 3.0.148 to 3.0.165. Since then we have had seemingly random problems when using $page->find(). We figured out that the same selector worked with $pages->find(), which suggests that the "has_parent" selector was what failed. The issue occurred only sometimes and always resolved itself after a few minutes, without us doing anything. It looks like data is missing from the pages_parents table and the system somehow repairs it after some time.

Additionally, changing the parent of a page became extremely slow. The following code took 20-30 seconds:

$page->of(false);
$page->parent = $anotherpage;
$page->save();
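
One way to confirm that the time is spent in the save itself is to wrap the call in a timer. The following is a minimal sketch in plain PHP, not taken from the thread: `timeCall()` is a hypothetical helper (not a ProcessWire function), and the closure uses a stand-in workload so the example runs on its own.

```php
<?php
// Hypothetical helper (not part of ProcessWire): runs a callable and
// returns its result together with the elapsed wall-clock seconds.
function timeCall(callable $fn): array {
    $start = hrtime(true);                       // monotonic clock, nanoseconds
    $result = $fn();
    $seconds = (hrtime(true) - $start) / 1e9;    // convert to seconds
    return ['result' => $result, 'seconds' => $seconds];
}

// Stand-in workload so the sketch is runnable outside ProcessWire.
// In a real template the closure body would be the three lines above:
// $page->of(false); $page->parent = $anotherpage; $page->save();
$timing = timeCall(function () {
    usleep(10000); // simulate roughly 10 ms of work
    return 'saved';
});

printf("%s in %.3f s\n", $timing['result'], $timing['seconds']);
```

Timing the parent change separately from the rest of the request helps distinguish a slow pages_parents update from other slow code on the same page load.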

We switched back to ProcessWire 3.0.148 and have not had these issues since.

Setup/Environment

  • ProcessWire version: 3.0.165
  • PHP version: 7.4.13
  • MySQL: MariaDB 10.3.27 with InnoDB

Additional details

Our database is quite large:

  • 900,000 entries in the pages table
  • 90,000 entries in the pages_parents table

Thread

https://processwire.com/talk/topic/24848-change-of-page-parent-is-extremely-slow/

@dennisspohr
Author

Another thought on this topic - I guess this error appeared after Ryan changed some behaviour dealing with page parents.
Would you consider adding a config flag for choosing which method to use?

We have wanted to upgrade to the latest version for a long time, but because of this bug it is just not possible for us. The latest ProcessWire version has many features we currently cannot use. We really hope this issue gets fixed soon, or that this config setting would be an option.

This issue was posted 3 months ago and we would highly appreciate any feedback on this. Thank you!

@ryancramerdesign
Member

@dennisspohr It is essentially an index of every page that has children. This part of the code was rewritten because the previous version had some issues that caused it to be sometimes unreliable. Now it is reliable, and should be faster most of the time as well, but part of the reason it is more reliable is that it recreates larger portions of the index in some cases. Some parent changes can affect other parent changes, so it can have to rewrite large portions of the index depending on the case. The index isn't technically necessary for anything other than to fulfill "has_parent=..." portions of a selector. So if you don't need "has_parent" in your selectors, then it would be okay to turn this index off. Let me know if that would help in your case and I can add a setting to disable that index.
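
To illustrate why a single parent change can touch many index rows, here is a toy model in plain PHP. It is a simplified sketch under stated assumptions, not ProcessWire's actual schema or code: the function names `ancestors()`, `buildIndex()` and `findDescendants()` are hypothetical, and the index is modelled as "page that has children => list of its ancestor ids".

```php
<?php
// Simplified model, NOT ProcessWire's actual implementation.
// $pages maps page id => parent id (0 means "no parent").

/** Return the ancestor ids of $id, nearest first. */
function ancestors(array $pages, int $id): array {
    $out = [];
    for ($p = $pages[$id] ?? 0; $p !== 0; $p = $pages[$p] ?? 0) {
        $out[] = $p;
    }
    return $out;
}

/** Build the index: id of each page that has children => its ancestor ids. */
function buildIndex(array $pages): array {
    $index = [];
    foreach (array_unique(array_values($pages)) as $parentId) {
        if ($parentId !== 0) $index[$parentId] = ancestors($pages, $parentId);
    }
    return $index;
}

/** Ids of all descendants of $ancestorId, resolved through the index. */
function findDescendants(array $pages, array $index, int $ancestorId): array {
    $found = [];
    foreach ($pages as $id => $parentId) {
        $direct = $parentId === $ancestorId;
        $indirect = in_array($ancestorId, $index[$parentId] ?? [], true);
        if ($direct || $indirect) $found[] = $id;
    }
    sort($found);
    return $found;
}

// 1 is the root; 2 and 5 sit under 1; 3 sits under 2; 4 sits under 3.
$pages = [1 => 0, 2 => 1, 3 => 2, 4 => 3, 5 => 1];
$before = buildIndex($pages);
$underTwo = findDescendants($pages, $before, 2);

// Move page 2 (with its subtree) under page 5, then rebuild the index.
$pages[2] = 5;
$after = buildIndex($pages);
$underFive = findDescendants($pages, $after, 5);

// Index entries that are new or changed because of that one move.
$rewritten = [];
foreach ($after as $id => $list) {
    if (($before[$id] ?? null) !== $list) $rewritten[] = $id;
}
sort($rewritten);
```

In this toy tree, moving one page changes the index entries for the moved page, its descendant that has children, and the new parent, which is the kind of cascading rewrite described above.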

@dennisspohr
Author

@ryancramerdesign The point is that in our case the code becomes extremely slow. I guess there should be a better solution. We use the has_parent selector quite often, so removing it is unfortunately not an option for us.

@lparikka

lparikka commented Jan 3, 2022

> @dennisspohr It is essentially an index of every page that has children. This part of the code was rewritten because the previous version had some issues that caused it to be sometimes unreliable. Now it is reliable, and should be faster most of the time as well, but part of the reason it is more reliable is that it recreates larger portions of the index in some cases. Some parent changes can affect other parent changes, so it can have to rewrite large portions of the index depending on the case. The index isn't technically necessary for anything other than to fulfill "has_parent=..." portions of a selector. So if you don't need "has_parent" in your selectors, then it would be okay to turn this index off. Let me know if that would help in your case and I can add a setting to disable that index.

Would it be possible to turn the index off while moving a large number of pages under a new parent, then turn it back on and have the system recalculate the index?

@dennisspohr
Author

@ryancramerdesign Are there any updates on this?

Unfortunately, our performance problems have increased. There are interesting new features that could help us deal with them, but because of the issue described above we are still stuck on version 3.0.148.

  1. Is it possible to update our PW version while keeping the old has_parent behaviour?
  2. Or are you perhaps thinking of a fix or workaround?
  3. Even on that old version we still have problems changing the parent of a page, which very often ends in a timeout when loading the page.

By now we have 9.3 million entries in the pages table.
The pages_parents table has 4.5 million entries.

I opened this issue more than 2 years ago and I really hope we can find a suitable solution together.
Thank you!

@ryancramerdesign
Member

@dennisspohr This is an issue I've not been able to duplicate and have not had other reports of either. Using the original example of $page->parent = $anotherpage; $page->save(); in several contexts, the worst case I was able to get was slightly under 1 second on the debug timer, when it had to update the pages_parents index for 255k pages. I think the issue may be specific to conditions in your installation or the size of it; at least I don't have the ability to duplicate it here. If you are able to provide me with a copy of your installation's database export, that would help. Or if you'd like to hire me by the hour to find a solution for this particular site, I can do that too.

I also wanted to mention there's a hook you can use to bypass the update of the pages_parents table on a parent change. For instance, it's likely not necessary for you to keep track of parents for repeater pages (if you are using them), so you can tell it to skip updating the pages_parents table for repeaters, as one example:

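// The page that repeater pages live under
$repeaters = $pages->get('name=repeaters, parent_id=2');

$wire->addHookBefore('Pages::save', function($event) use($repeaters) {
  list($page, $options) = $event->arguments();
  // Only pages below the repeaters page skip the pages_parents update
  if($page->parents->has($repeaters)) {
    $options['saveParentsTable'] = false;
    $event->arguments(1, $options); // hand the modified options back to Pages::save
  }
});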

ryancramerdesign added a commit to processwire/processwire that referenced this issue Feb 2, 2023
…rocesswire-issues#1297 by rewriting code that builds pages_parents table and requires fewer changes to the table. This is called on page parent changes and clone operations. Needs further testing on installation with 1+ million pages to compare with previous and confirm performance improvement while maintaining same accuracy.
@ryancramerdesign
Member

@dennisspohr See the attached commit above for an attempted fix for this issue.

@dennisspohr
Author

Works!
