Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix perfomance for search all project of user #1900

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

fAtsteD
Copy link

@fAtsteD fAtsteD commented Jul 30, 2023

Fix problem 32796

Performance with the project tree in 3 levels deep decreased very much.
It increases loading for the main page to 6 times faster for the problem.

Copy link
Member

@dregad dregad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution.

Do we really need a new, separate cache variable for this ?

$g_user_all_accessible_subprojects_cache is a superset of
$g_user_accessible_subprojects_cache, so I believe it would be better to reuse the existing variable, changing its structure to add user dimension and adapt user_get_accessible_subprojects() function accordingly.

@fAtsteD
Copy link
Author

fAtsteD commented Jul 31, 2023

Yeah, I can use like $g_user_accessible_subprojects_cache, but the functions have some differences that can imply the project:
user_get_accessible_subprojects() - return all subprojects in one level deep only
user_get_all_accessible_subprojects() - return all subprojects through all levels

Both functions do not return the tree for subprojects, they return one dimension array that holds all projects. If I use one cache variable, it requires different logic or it has to hold the tree structure.

@fAtsteD fAtsteD requested a review from dregad August 1, 2023 13:34
@dregad
Copy link
Member

dregad commented Aug 1, 2023

If I use one cache variable, it requires different logic or it has to hold the tree structure.

user_get_all_accessible_subprojects() recursively walks the project tree to get the subprojects, but as you pointed out it returns a flat list, so I really don't see why the cache should store a tree.

Given a user_id and a project_id, user_get_accessible_subprojects() will always return the same list of projects, which is what should be cached, to avoid execution of the expensive SELECT DISTINCT statements to retrieve subprojects from the DB.

Therefore I'm thinking that using a 2-dimensional cache with user_id and project_id in user_get_accessible_subprojects() should be enough to avoid the performance hit, even though it would be slightly more costly than having a specific cache for user_get_accessible_subprojects() since it would still go through the (cached) list of projects.

@fAtsteD
Copy link
Author

fAtsteD commented Aug 1, 2023

I misunderstood you. You think the problem is from the database, but the problem is from the code.

Function user_get_all_accessible_subprojects() runs slow and it runs many times. I have a problem that the database does not load, but the server works minutes before the page is ready.

I see that user_get_accessible_subprojects() caches only one project data and fixes the cache for all projects by users. I checked the performance with my database and it is a bit better. It loads the page in ~51 seconds instead of ~55.

My data use a tree structure of projects and they have many projects in the second and third levels, so mostly of the time it searches for subprojects of subprojects, etc. After updating the code (changing cache as you offered), I saw that it will be best for user_get_all_accessible_subprojects() to return the flat array of cached variable $g_user_accessible_subprojects_cache.

@fAtsteD
Copy link
Author

fAtsteD commented Aug 1, 2023

Maybe it requires more refactoring?

  • run a request to the database in user_get_all_accessible_subprojects()
  • change user_get_accessible_subprojects() for recall to user_get_all_accessible_subprojects() with the required project?

Copy link
Member

@dregad dregad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You think the problem is from the database

That is not correct. There is no problem with the database AFAIK, it's the code in user_get_accessible_subprojects() that I think should cache the SQL queries' result to avoid running them again and increase performance.

Maybe it requires more refactoring?
* run a request to the database in user_get_all_accessible_subprojects()
* change user_get_accessible_subprojects() for recall to user_get_all_accessible_subprojects() with the required project?

I'm not sure I understand what you mean by this.

Would you be able to provide your project list and hierarchy, so I can get a better understanding of your usage scenario and make some tests locally with meaningful data (e.g. a SQL dump of project and project_hierarchy tables would be great; feel free to anonymize project name, description and file_path

@dregad
Copy link
Member

dregad commented Aug 7, 2023

@fAtsteD waiting for your feedback.

@fAtsteD
Copy link
Author

fAtsteD commented Aug 8, 2023

Sorry, I have some busy weekends.

Tables with anonymizing data. It has possibly all active projects. If you run with my initial PR suggestion you can see performance in the loading page in times.

projects.sql.txt

@fAtsteD fAtsteD requested a review from dregad August 11, 2023 11:37
@dregad
Copy link
Member

dregad commented Aug 11, 2023

@fAtsteD thanks for the sample data. I'm working on other priorities at the moment, but I have not forgotten this. I'll get back to it as soon as I can, probably some time next week if all goes well.

In the meanwhile, may I suggest you address the merge conflicts ?

@fAtsteD
Copy link
Author

fAtsteD commented Aug 11, 2023

Yes, I fix conflicts.

@fAtsteD fAtsteD force-pushed the bug-32796 branch 4 times, most recently from 377cdf4 to e063999 Compare August 12, 2023 05:38
@dregad
Copy link
Member

dregad commented Aug 16, 2023

@fAtsteD I imported your project list and hierarchy, and created some dummy issues in each project in a fresh install.

I must be missing something, because I'm not seeing any noticeable difference in execution time between master branch and your PR; in both cases view_all_bug_page.php loads in under 0.35s on average (29 queries), logged in as administrator.

Did you make any changes to MantisBT source code, or are you using any plugins ?

@fAtsteD
Copy link
Author

fAtsteD commented Aug 16, 2023

I recheck, turned off all plugins, use the master branch and the problem persist (50+ seconds loading). I cannot understand why you don't have any problem. When I start with my branch, the loading time is 8 seconds.

Maybe I need some other checking, but I cannot understand why that part fixes the problem with my database.

For information, projects and users are nearly the same number. Each project in the leaf of the project tree can have up to 50 bugs.

@dregad
Copy link
Member

dregad commented Aug 17, 2023

For information, projects and users are nearly the same number. Each project in the leaf of the project tree can have up to 50 bugs.

The number of users or bugs should not have any impact on the

Not being able to reproduce makes it difficult for me to understand what the problem is, which is required to evaluate if the proposed fix is appropriate.

Maybe you can try to reproduce in a fresh install ? You could also try to build a simple test program that can be run independently from MantisBT and calling only the necessary functions to demonstrate the problem.

@dregad
Copy link
Member

dregad commented Aug 24, 2023

@fAtsteD you did not provide any additional information following my note of last week. I need to be able to reproduce the problem otherwise I will not be able to properly review and test.

@fAtsteD
Copy link
Author

fAtsteD commented Aug 24, 2023

I have a partially free weekend, so I'll retest the problem and create a plugin for generating test data for a clean install of the app.

I'll return on Monday with some updates.

@dregad
Copy link
Member

dregad commented Aug 24, 2023

Great, thanks for your help and understanding.

@fAtsteD
Copy link
Author

fAtsteD commented Aug 27, 2023

I started the new clean project and created a simple plugin for generating data of projects (with a hierarchy), bugs, and users. It is mostly used part data and they are huge (we then clean up our Mantis but it can happen later again). All data is generated it does not have any from the existing database of anonymized but it recreates the problem and with a fix from PR the performance increases.

Plugin for generating data - it is very simple and it takes a very long time to create. Use dump. The goal was to automate recreation but does not performance)

So, if it requires the full database, maybe my fix is not right, and the problem is deeper. I tried to run through Xdebug with writing time for functions, and the most run function was user_get_all_accessible_subprojects().

full_database_bug-32796.zip - this rar in zip because it is more than GitHub can attach.

@fAtsteD fAtsteD requested a review from dregad September 7, 2023 17:48
@dregad
Copy link
Member

dregad commented Sep 9, 2023

@fAtsteD sorry I've been busy and did not have time to look into this following your last update.

@fAtsteD
Copy link
Author

fAtsteD commented Oct 26, 2023

@dregad Do you have any updates? Do you check the reproducibility of the issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants