Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Opening the tasks directory lead to not acceptable loading times #8637

Closed
2 tasks done
Tracked by #7734
StefanKock opened this issue Mar 31, 2022 · 4 comments · Fixed by #8711
Closed
2 tasks done
Tracked by #7734

Opening the tasks directory lead to not acceptable loading times #8637

StefanKock opened this issue Mar 31, 2022 · 4 comments · Fixed by #8711
Assignees
Labels
backend Affects the web backend change A change of an existing feature (ticket type) events performance Issues that are meant to increase the performance of the application qa-verified Issue has been tested and verified by QA tasks

Comments

@StefanKock
Copy link
Contributor

StefanKock commented Mar 31, 2022

Problem Description

In some German health departments the loading times for opening the tasks directory (my tasks + done) takes way too long. The health departments are reporting loading times up to 2,5 minutes.
The Long Query Logs can be found here

Proposed Change

  • 1. Reuse Joins, see marked duplicates in comment (except refactoring of jurisdiction checks -> Reuse joins for jurisdiction predicates [5] #8688)
  • 2. Avoid table joins if only the id needs to be checked.
    • Found for: event21_.reportingUser_id, event21_.responsibleUser_id, task0_.assigneeUser_id

Possible Alternatives

Additional Information

  • Split out from [PERFORMANCE] Opening the samples leads to not acceptable loading times #8610
  • Finding by @dinua: There are too many duplicate joins for example on cases. It seems that the reusage pattern for joins (AbstractDomainObjectJoins) is not yet introduced but should be.
  • Existing tests can be used to generate and compare SQL:
    • TaskFacadeEjbTest.testGetIndexList
    • TaskFacadeEjbTest.testFilterTasksByUserJurisdiction, especially query 2b. Region user now sees tasks from district level (set break point before and after to capture the Select Query in console)
@StefanKock StefanKock added tasks backend Affects the web backend change A change of an existing feature (ticket type) performance Issues that are meant to increase the performance of the application labels Mar 31, 2022
@StefanKock StefanKock added this to Backlog in SORMAS Team 2 - DEV - Iteration Backlog via automation Mar 31, 2022
@StefanKock StefanKock changed the title Opening the samples and tasks directory lead to not acceptable loading times Opening the tasks directory lead to not acceptable loading times Mar 31, 2022
@dinua
Copy link
Contributor

dinua commented Mar 31, 2022

here is an example how the query for search all tasks looks like
Between same entities there are to many joins
I think each time we do something like this
Join<Contact, Case> contactCaze = contact.join(Contact.CAZE, JoinType.LEFT);
a new join is created

query
select distinct 
........
from Task task0_ 
left outer join cases case1_ on task0_.caze_id=case1_.id 
left outer join Person person2_ on case1_.person_id=person2_.id 
left outer join Location location3_ on person2_.address_id=location3_.id 
left outer join Region region4_ on case1_.responsibleRegion_id=region4_.id 
left outer join Region region5_ on case1_.region_id=region5_.id 
left outer join District district6_ on case1_.responsibleDistrict_id=district6_.id 
left outer join District district7_ on case1_.district_id=district7_.id 
left outer join Community community8_ on case1_.responsibleCommunity_id=community8_.id 
left outer join Community community9_ on case1_.community_id=community9_.id 
left outer join Contact contact10_ on task0_.contact_id=contact10_.id 
left outer join Person person11_ on contact10_.person_id=person11_.id 
left outer join Location location12_ on person11_.address_id=location12_.id 
left outer join Region region13_ on contact10_.region_id=region13_.id 
left outer join District district14_ on contact10_.district_id=district14_.id 
left outer join Community community15_ on contact10_.community_id=community15_.id 
left outer join cases case16_ on contact10_.caze_id=case16_.id
left outer join Person person17_ on case16_.person_id=person17_.id 
left outer join cases case18_ on contact10_.caze_id=case18_.id  -- <- duplicates case16_ 
left outer join cases case19_ on contact10_.caze_id=case19_.id   -- <- duplicates case16_ 
left outer join cases case20_ on contact10_.caze_id=case20_.id  -- <- duplicates case16_ 
left outer join events event21_ on task0_.event_id=event21_.id 
left outer join Location location22_ on event21_.eventLocation_id=location22_.id 
left outer join Region region23_ on location22_.region_id=region23_.id 
left outer join District district24_ on location22_.district_id=district24_.id 
left outer join Community community25_ on location22_.community_id=community25_.id 
left outer join users user26_ on event21_.reportingUser_id=user26_.id
left outer join users user27_ on event21_.responsibleUser_id=user27_.id
left outer join users user28_ on event21_.reportingUser_id=user28_.id    -- <- duplicates user26_ 
left outer join users user29_ on event21_.responsibleUser_id=user29_.id    -- <- duplicates user27_ 
left outer join travelentry travelentr30_ on task0_.travelEntry_id=travelentr30_.id 
left outer join Region region31_ on travelentr30_.responsibleRegion_id=region31_.id 
left outer join District district32_ on travelentr30_.responsibleDistrict_id=district32_.id 
left outer join Community community33_ on travelentr30_.responsibleCommunity_id=community33_.id 
left outer join Person person34_ on travelentr30_.person_id=person34_.id 
left outer join cases case35_ on travelentr30_.resultingCase_id=case35_.id 
left outer join users user36_ on task0_.creatorUser_id=user36_.id 
left outer join users user37_ on task0_.assigneeUser_id=user37_.id
left outer join users user38_ on task0_.assigneeUser_id=user38_.id    -- <- duplicates user37_ 
left outer join cases case39_ on task0_.caze_id=case39_.id    -- <- duplicates case1_ 
left outer join Contact contact40_ on task0_.contact_id=contact40_.id    -- <- duplicates contact10_ 
left outer join cases case41_ on contact40_.caze_id=case41_.id    -- <- duplicates case16_ 
left outer join events event42_ on task0_.event_id=event42_.id    -- <- duplicates event21_ 
left outer join travelentry travelentr43_ on task0_.travelEntry_id=travelentr43_.id    -- <- duplicates travelentr30_ 

where 
......

@StefanKock StefanKock self-assigned this Apr 4, 2022
@StefanKock StefanKock moved this from Backlog to In Progress in SORMAS Team 2 - DEV - Iteration Backlog Apr 4, 2022
@StefanKock
Copy link
Contributor Author

StefanKock commented Apr 4, 2022

One severe finding I have is that many duplicated joins come in with the inJurisdictionOrOwned methods, because there we have the break in the transient joins that are not reused anymore (see TaskService.getJurisdictionSelections).

A clear sign is still this pattern in the query (Query updated after all other fixable findings where done):

...
LEFT OUTER JOIN cases case17_ ON contact11_.caze_id=case17_.id
LEFT OUTER JOIN Person person18_ ON case17_.person_id=person18_.id
LEFT OUTER JOIN cases case19_ ON contact11_.caze_id=case19_.id   -- <- duplicates case17_ 
LEFT OUTER JOIN cases case20_ ON contact11_.caze_id=case20_.id   -- <- duplicates case17_ 
LEFT OUTER JOIN cases case21_ ON contact11_.caze_id=case21_.id   -- <- duplicates case17_ 
LEFT OUTER JOIN cases case22_ ON contact11_.caze_id=case22_.id   -- <- duplicates case17_ 
...
LEFT OUTER JOIN EVENTS event24_ ON task0_.event_id=event24_.id
LEFT OUTER JOIN LOCATION location25_ ON event24_.eventLocation_id=location25_.id
LEFT OUTER JOIN Region region26_ ON location25_.region_id=region26_.id
LEFT OUTER JOIN District district27_ ON location25_.district_id=district27_.id
LEFT OUTER JOIN Community community28_ ON location25_.community_id=community28_.id
LEFT OUTER JOIN LOCATION location29_ ON event24_.eventLocation_id=location29_.id   -- <- duplicates location25_
LEFT OUTER JOIN LOCATION location30_ ON event24_.eventLocation_id=location30_.id   -- <- duplicates location25_
LEFT OUTER JOIN EventParticipant eventperso31_ ON event24_.id=eventperso31_.event_id
LEFT OUTER JOIN LOCATION location32_ ON event24_.eventLocation_id=location32_.id   -- <- duplicates location25_
...

I assume this finding affects most of the currently investigated performance problems.

TaskFacade.getIndexList BEFORE.sql.txt

@StefanKock
Copy link
Contributor Author

StefanKock commented Apr 4, 2022

Current state: Only duplicated joins are left over that are initiated by "Entity"Service.inJurisdictionOrOwned as sub-joins from the queried entity.

I reduced my example query from 54 joins to 41 joins affecting tasks and events queries.

TaskFacade.getIndexList AFTER.sql.txt

Update 2002-04-05: Decision for this ticket: We want to provide the changes we already have in a hotfix and push the problem with "Entity"Service.inJurisdictionOrOwned to a new ticket with a more holistic approach (-> #8688).

@AndyBakcsy-she
Copy link

Marking ticket as verified on the latest sormas hotfix 1.70.2 branch on @BarnaBartha's local environment.
Barna used a DB backup with a lot of data.
The Tasks Directory has approximately 92000 tasks and Loading time was 15 seconds.
From 2.5 minutes to 15 is a big improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Affects the web backend change A change of an existing feature (ticket type) events performance Issues that are meant to increase the performance of the application qa-verified Issue has been tested and verified by QA tasks
Projects
None yet
3 participants