Improve performance of getAllAfter queries into DTOs [8] #9320

StefanKock · 2022-05-24T12:39:38Z

Problem Description

As shown by the following analysis, many getAllAfter methods show an inperformant pattern:

The "Entity"Service.getAllAfter method takes some seconds. As shown in Improve performance of PersonService.getAllAfter method [3] #8946 (comment), this can be improved by initially fetching only the ids (reduced distinct effort) and using a dedicated index with appropriate sorting.
"Entity"Service.inJurisdictionOrOwned per each entity used for pseudonymization seems to be inperformant. For Cases it took ~330ms per entity, for Persons ~0,3ms per entity with in clause (PersonService.getInJurisdictionIDs).

Analysis

Dataset:

1225343 persons
85677 cases
114782 contacts
236583 tasks
2600890 immunizations
1991 vaccinations
15693 samples
11846 events
534 eventparticipants

The following measurements were taken from backend logs (EJB methods) and the Postgres logs.
Observations:

the actual SQL queries are executed in few seconds while the EJB methods take minutes to complete
a relevant contribution to runtime comes from inJurisdictionOrOwned methods, here mainly the number of calls
batch size is relevant. Some requests did not complete within reasonable time for a batch size of 10000, these were measured for batch size 1000.

Requests:

http://localhost:6080/sormas-rest/persons/all/1637090372005/10000/NO_LAST_SYNCED_UUID: 3 min 5 sec
server: 2 min 58 sec

SQL queries:
duration: 948.716 ms
duration: 2297.275 ms
duration: 4012.220 ms
duration: 4393.538 ms
duration: 696.277 ms
~ 12.5 sec
Note: SQL query times are inaccurate, either due to an error in the analysis or changes introduced later, see later analysis of this method.
http://localhost:6080/sormas-rest/cases/all/1637090372005/10000/NO_LAST_SYNCED_UUID: 8 min 21 sec
server: 8 min 18 sec

SQL queries:
duration: 787.011 ms
duration: 449.616 ms
duration: 146.224 ms
duration: 178.515 ms
duration: 154.552 ms
duration: 136.836 ms
duration: 142.597 ms
duration: 143.688 ms
duration: 150.584 ms
duration: 152.298 ms
duration: 133.814 ms
duration: 147.687 ms
~ 3 sec
http://localhost:6080/sormas-rest/contacts/all/1637090372005/10000/NO_LAST_SYNCED_UUID: 35 sec
server: 35 sec

SQL queries:
duration: 151.413 ms
duration: 143.237 ms
duration: 152.661 ms
duration: 144.054 ms
duration: 145.374 ms
duration: 155.861 ms
duration: 177.212 ms
duration: 153.647 ms
duration: 201.516 ms
duration: 187.382 ms
~ 1.6 sec
http://localhost:6080/sormas-rest/tasks/all/1637090372005/1000/NO_LAST_SYNCED_UUID: 1 min 27 sec
server: 1 min 24 sec

SQL queries:
duration: 466.097 ms
duration: 154.093 ms
duration: 132.150 ms
duration: 137.751 ms
duration: 140.757 ms
duration: 135.526 ms
duration: 136.612 ms
duration: 126.399 ms
duration: 124.695 ms
duration: 127.888 ms
duration: 133.940 ms
~1.9 sec
http://localhost:6080/sormas-rest/samples/all/1637090372005/10000/NO_LAST_SYNCED_UUID: 27 sec
server: 27 sec

SQL queries:
duration: 129.587 ms
http://localhost:6080/sormas-rest/immunizations/all/1637090372005/1000/NO_LAST_SYNCED_UUID (1min 50 sec)
server: 1 min 48 sec

SQL queries:
duration: 6985.128 ms
duration: 106.608 ms
~ 7 sec

Proposed Change

1. Rewrite AdoServiceWithUserFilter.getAllAfter to first fetch the needed ids (see pattern in PersonService.getAllAfter, then fetch the entities by id with IN-clause (use BaseAdoService.getByIds).
2. Add indices for sorting.
3. Use the pattern of PersonService.getInJurisdictionIDs to query by ids with IN clause also for other entities where "Entity"Service.inJurisdictionOrOwned is currently running one query per Entity.

Acceptance Criteria

An analysis before/after shows the performance improvement (also mentioning the amount of existing and queried entities). -> comment

Implementation Details

All getAllAfter and getInJurisdictionIDs methods avoid parameter limit exception with IterableHelper.executeBatched batching.
Remove or adapt not with superclass aligned getAllAfter implementations in:
- CampaignService.getAllAfter
- CampaignFormMetaService.getAllAfter
- ContactService.getAllAfter (active contacts filter)
- EventParticipantService.getAllAfter (not archived filter)
- EventService.getAllAfter (configured EventUserFilterCriteria)
- ImmunizationService.getAllAfter
- BaseTravelEntryService.getAllAfter
Remove or adapt not with superclass aligned implementations parallel to getAllAfter:
- CampaignFormDataService.getAllActiveAfter
- TreatmentService.getAllActiveTreatmentsAfter
- PathogenTestService.getAllActivePathogenTestsAfter
- PrescriptionService.getAllActivePrescriptionsAfter
- SampleService.getAllActiveSamplesAfter
- ActionService.getAllActionsAfter
- AdditionalTestService.getAllActiveAdditionalTestsAfter
- ClinicalVisitService.getAllActiveClinicalVisitsAfter
- TaskService.getAllActiveTasksAfter
- CaseService.getAllActiveCasesAfter with boolean includeExtendedChangeDateFilters that is never used outside of tests (introduced with Also consider indirectly dependent objects when determining when a case was last updated [0.5] #2059, partly removed by [SORMAS2SurvNet] Cleanup: Remove CaseResource::getAllCasesWithExtendedChangeDateFilters #2674).

Additional Information

Sibling to Improve performance of PersonService.getAllAfter method [3] #8946
Batch jurisdiction processing for Persons was done in Pseudonymization of sensitive data [3+2] #2130 (commit 305b169).
I tried to use CREATE INDEX CONCURRENTLY but that does not work within a transaction as it is used in the automatic db migration. If deployments get stuck because of INDEX creation (not sure what the likelyhood is), they can be created with CREATE INDEX CONCURRENTLY IF NOT EXISTS ... before on running instance.
This ticket improves the performance of the following functions:
1. Sync for android-app (getAllAfter methods) for most to all entities.
2. List (getIndexList) for AdditionalTest, PathogenTest
3. List (getByUuids, getBy"Reference"/"criteria") for Case, Sample, Task, AdditionalTest, PathogenTest, Prescription, Treatment, Vaccination, Visit.

Performance Analysis

Preliminary analysis of performance before and after the changes introduced in PR #10282

(tested on dafd47d)

The analysis was conducted by measuring the runtime of REST calls to the respective endpoints with batch size 1000.
A substantial improvement can be observed for all entities with long response times:

cases (1000) .................00:05:28 -> 00:01:31
contacts (1000) ..............00:01:07 -> 00:00:30
immunizations (1000) .........00:01:16 -> 00:00:12
samples (1000) ...............00:01:41 -> 00:00:20
tasks (1000) .................00:01:06 -> 00:00:18
visits (1000) ................00:00:27 -> 00:00:04

An exception is pathogen test with no improvement / slightly worse performance:

pathogentests (1000) .........00:01:15 -> 00:01:26

The results based on the dataset of the initial analysis (see issue description). Further measurements will be performed as needed (e.g., retesting of individual endpoints, larger batch sizes etc.).

Analysis data:
performance_9320.zip

Detailed analysis

Full list of REST calls
Performance of REST calls (.../<entities>/all/) before the refacoring:

actions (1000) ...............00:00:00
additionaltests (1000) .......00:00:01
aggregatereports (1000) ......00:00:00
areas (1000) .................00:00:00
campaigns (1000) .............00:00:00
cases (1000) .................00:05:28
clinicalvisits (1000) ........00:00:00
communities (1000) ...........00:00:00
contacts (1000) ..............00:01:07
continents (1000) ............00:00:00
countries (1000) .............00:00:01
customizableenumvalues (1000) 00:00:00
diseaseconfigurations (1000) .00:00:00
districts (1000) .............00:00:00
eventparticipants (1000) .....00:00:07
events (1000) ................00:00:04
featureconfigurations (1000) .00:00:00
immunizations (1000) .........00:01:16
pathogentests (1000) .........00:01:15
persons (1000) ...............00:00:07
pointsofentry (1000) .........00:00:00
prescriptions (1000) .........00:00:00
regions (1000) ...............00:00:00
samples (1000) ...............00:01:41
sharerequests (1000) .........00:00:00
subcontinents (1000) .........00:00:00
tasks (1000) .................00:01:06
treatments (1000) ............00:00:00
users (1000) .................00:00:00
userroles (1000) .............00:00:00
visits (1000) ................00:00:27
weeklyreports (1000) .........00:00:01

... and after:

actions (1000) ...............00:00:01
additionaltests (1000) .......00:00:01
aggregatereports (1000) ......00:00:00
areas (1000) .................00:00:00
campaigns (1000) .............00:00:01
cases (1000) .................00:01:31
clinicalvisits (1000) ........00:00:00
communities (1000) ...........00:00:00
contacts (1000) ..............00:00:30
continents (1000) ............00:00:00
countries (1000) .............00:00:00
customizableenumvalues (1000) 00:00:00
diseaseconfigurations (1000) .00:00:00
districts (1000) .............00:00:00
eventparticipants (1000) .....00:00:05
events (1000) ................00:00:02
featureconfigurations (1000) .00:00:00
immunizations (1000) .........00:00:12
pathogentests (1000) .........00:01:26
persons (1000) ...............00:00:08
pointsofentry (1000) .........00:00:00
prescriptions (1000) .........00:00:01
regions (1000) ...............00:00:00
samples (1000) ...............00:00:20
sharerequests (1000) .........00:00:00
subcontinents (1000) .........00:00:00
tasks (1000) .................00:00:18
treatments (1000) ............00:00:01
users (1000) .................00:00:00
userroles (1000) .............00:00:00
visits (1000) ................00:00:04
weeklyreports (1000) .........00:00:00

Execution details for selected entities

Cases
before:

after:

Here the calls to CaseService.inJurisdictionOrOwned per selected case have been dropped, roughly accounting for the measurde difference.

Contacts
before:

after:

As for cases, a heavy reduction of calls to ContactService.inJurisdictionOrOwned.

Immunizations
before:

after:

Here the main culprits were calls to ImmunizationService.inJurisdictionOrOwned and FeatureConfigurationFacadeEjb.isPropertyValueTrue

Tasks
before:

after:

Here, calls to TaskService.inJurisdictionOrOwned on a per object basis accounted for most of the time spent.

Analysis for different batch sizes

A analysis with different batch sizes after completion of this task shows:

request durations are substantially improved (see times above)
there is little overhead (long running SQL queries etc.) for small batch sizes: for batch size 1, no entity exceeds a response time of 1second
for larger batch sizes, response time is approximately linear to the batch size
this is consistent across all entities (response time for 1000 pathogentests is down to 15 sec now)

Response times for selected entities:

cases
(1) ....................00:00:01
(10) ...................00:00:03
(100) ..................00:00:08
(500) ..................00:00:36
(1000) .................00:01:11
(5000) .................00:05:52

persons
(1) ..................00:00:01
(10) .................00:00:00
(100) ................00:00:01
(500) ................00:00:04
(1000) ...............00:00:07
(5000) ...............00:00:33

samples
(1) ..................00:00:01
(10) .................00:00:01
(100) ................00:00:02
(500) ................00:00:10
(1000) ...............00:00:17
(5000) ...............00:01:10

Analysis data:
differentBatchSizes.zip

The text was updated successfully, but these errors were encountered:

ClimedoHealthCodingTests · 2022-05-31T10:43:38Z

Hi there, I am not sure if this is related but since recently we are having issues to get all tasks created within the last 24 hours via /tasks/all/{since} No matter the size of the response e.g. even if only 1 task will be returned matching the time criteria. No response even for 5 minutes. The highest possible timespan that we got working on some instances is last 4 hours. Even that one takes a couple of minutes to respond. Let us know if you need more info around the issue and keep us posted about the outcome here.

StefanKock · 2022-05-31T13:24:51Z

Hi there, I am not sure if this is related but since recently we are having issues to get all tasks created within the last 24 hours via /tasks/all/{since}

Identifying the changed entities is what we want to improve. That includes also tasks.

ClimedoHealthCodingTests · 2022-06-01T07:54:46Z

Perfect! Are there already some concrete plans when can this one be expected to make its way to production?

StefanKock · 2022-09-06T08:19:47Z

Discussed current status and next steps with @HolgerReiseVSys and @syntakker. Results:

Update feature branch to current development branch
@syntakker will start with performance tests on cases, contacts, immunizations, tasks, samples as prechecks
Integrate CaseService.getAllActiveCasesAfter as all the other sibling methods.
Refactor BaseAdoService.getAll to getList to be reusable with other filters, make it protected or leave public?
Indizes for all entities that implement AdoServiceWithUserFilter

- Removed superfluous includeExtendedChangeDateFilters parameter

- Removed redundant getAll(Predicate, Integer) method

- Sorting matches BaseAdoService.getList queries

- Removed redundant getAll(Predicate, Integer) method

- Sorting matches BaseAdoService.getList queries

- Unified methods in CoreAdoServices for Facade: - getInJurisdictionIds: List<Long> - inJurisdictionOrOwned: boolean - inJurisdictionOrOwned: Predicate - Made jurisdiction check and convertToDto resilient against null entites

- new: BaseAdoService.getIdList with batching

Applied toPseudonymizedDtos and createPseudonymizer pattern in corresponding Facades

- Adjusted to match patterns for CoreAdo, added VisitQueryContext

- Case dependent entities: SurveillanceReport, ClinicalVisit, Prescription, Treatment - Sample dependent entities: AdditionalTest, PathogenTest - Immunization dependent entities: Vaccination

- Removed duplicate ContactJurisdictionFlagsDto - Simplified jurisdiction check where only Contact itself is relevant

- Removed obsolete method BaseAdoService.getBatchedQueryResults - Removed duplicate method CaseService.createActiveCasesFilter - Fixed naming of a Case unit test

…getAllAfter_methods #9320 improve getAllAfter methods and batch pseudonymization

StefanKock · 2022-09-21T08:21:52Z

Changes on performance tooling still open

* Performance analysis: process directories #9320 * Default log level TRACE for cargo server #9320 * Automate REST performance analysis #9320 * update TROUBLESHOOTING.md #9320

StefanKock added backend Affects the web backend change A change of an existing feature (ticket type) performance Issues that are meant to increase the performance of the application general Not directly concerned with any particular functional section of the application labels May 24, 2022

This was referenced May 24, 2022

Improve performance of android-app with large data #7372

Closed

Improve performance of PersonService.getAllAfter method [3] #8946

Closed

StefanKock added refactoring Technical refactoring of an existing feature code-quality Reliability, maintainability, reusability, or testability labels May 24, 2022

StefanKock changed the title ~~Improve performance of getAllAfter queries into DTOs~~ Improve performance of getAllAfter queries into DTOs [8] Jul 27, 2022

StefanKock added this to Backlog in SORMAS Team 2 - DEV - Iteration Backlog via automation Aug 31, 2022

StefanKock self-assigned this Sep 5, 2022

StefanKock moved this from Backlog to In Progress in SORMAS Team 2 - DEV - Iteration Backlog Sep 5, 2022

StefanKock added a commit that referenced this issue Sep 5, 2022

#9320 Refactor: getAllAfter with two steps

0b798cd

StefanKock added a commit that referenced this issue Sep 5, 2022

#9320 Removed superfluous duplicate methods

cd515a1

StefanKock added a commit that referenced this issue Sep 5, 2022

#9320 Unified usage to superclass getAllAfter method

0bd32e0

StefanKock mentioned this issue Sep 5, 2022

#9320 improve get all after methods #10282

Merged

StefanKock added a commit that referenced this issue Sep 6, 2022

#9320 Refactor: getAllAfter with two steps

75c5d61

StefanKock added a commit that referenced this issue Sep 6, 2022

#9320 Removed superfluous duplicate methods

f96b9ec

StefanKock added a commit that referenced this issue Sep 6, 2022

#9320 Unified usage to superclass getAllAfter method

f1f4ad1

StefanKock added a commit that referenced this issue Sep 6, 2022

#9320 Unified usage to superclass getAllAfter method (CaseService)

bf7f774

- Removed superfluous includeExtendedChangeDateFilters parameter

StefanKock added a commit that referenced this issue Sep 6, 2022

#9320 Removed superfluous duplicate methods in Facades

5438cba

StefanKock added a commit that referenced this issue Sep 6, 2022

#9320 Removed superfluous duplicate methods in Facades

c92f448

StefanKock added a commit that referenced this issue Sep 6, 2022

#9320 Moved two step querying from getAllAfter to getList

f346fb5

- Removed redundant getAll(Predicate, Integer) method

StefanKock added a commit that referenced this issue Sep 6, 2022

#9320 superfluous method override in PersonService

b3789b9

StefanKock assigned syntakker Sep 7, 2022

StefanKock added a commit that referenced this issue Sep 7, 2022

#9320 Add sort index for all user filterable entities

c7b257d

- Sorting matches BaseAdoService.getList queries

StefanKock added a commit that referenced this issue Sep 21, 2022

#9320 Moved two step querying from getAllAfter to getList

c6d78c6

- Removed redundant getAll(Predicate, Integer) method

StefanKock added a commit that referenced this issue Sep 21, 2022

#9320 superfluous method override in PersonService

73ba2bd

StefanKock added a commit that referenced this issue Sep 21, 2022

#9320 Add sort index for all user filterable entities

f683eba

- Sorting matches BaseAdoService.getList queries

StefanKock added a commit that referenced this issue Sep 21, 2022

#9320 Unified PersonService.getInJurisdictionIds with CoreAdo logic

20e5545

- new: BaseAdoService.getIdList with batching

StefanKock added a commit that referenced this issue Sep 21, 2022

#9320 getJurisdictionFlags for List of Tasks and Samples

712d70d

Applied toPseudonymizedDtos and createPseudonymizer pattern in corresponding Facades

StefanKock added a commit that referenced this issue Sep 21, 2022

#9320 getAllAfter & getInJurisdictionIds pattern for Visits

8258181

- Adjusted to match patterns for CoreAdo, added VisitQueryContext

StefanKock added a commit that referenced this issue Sep 21, 2022

#9320 Introduced JurisdictionFlagsService (for Task & Sample)

bdedb31

StefanKock added a commit that referenced this issue Sep 21, 2022

#9320 Applied JurisdictionCheckService pattern for PersonService

961ae45

StefanKock added a commit that referenced this issue Sep 21, 2022

#9320 Applied JurisdictionCheckService pattern for Abs.CoreAdoService

8d7600e

StefanKock added a commit that referenced this issue Sep 21, 2022

#9320 Applied JurisdictionFlagsService pattern for ContactService

7c32949

- Removed duplicate ContactJurisdictionFlagsDto - Simplified jurisdiction check where only Contact itself is relevant

StefanKock added a commit that referenced this issue Sep 21, 2022

#9320 Make sure getSelectionAttributes gets a result for every id

dcc72bb

StefanKock added a commit that referenced this issue Sep 21, 2022

#9320 Applied JurisdictionCheckService pattern for VisitService

b88439d

StefanKock added a commit that referenced this issue Sep 21, 2022

#9320 Addressed review comments

7b9f95f

- Removed obsolete method BaseAdoService.getBatchedQueryResults - Removed duplicate method CaseService.createActiveCasesFilter - Fixed naming of a Case unit test

StefanKock closed this as completed in #10282 Sep 21, 2022

StefanKock added a commit that referenced this issue Sep 21, 2022

Merge pull request #10282 from hzi-braunschweig/feature-9320_improve_…

9b3271f

…getAllAfter_methods #9320 improve getAllAfter methods and batch pseudonymization

SORMAS Team 2 - DEV - Iteration Backlog automation moved this from Review to Testing Sep 21, 2022

StefanKock reopened this Sep 21, 2022

SORMAS Team 2 - DEV - Iteration Backlog automation moved this from Testing to In Progress Sep 21, 2022

StefanKock moved this from In Progress to Review in SORMAS Team 2 - DEV - Iteration Backlog Sep 21, 2022

syntakker added a commit that referenced this issue Sep 21, 2022

Review findings #9320

82abfe4

syntakker added a commit that referenced this issue Sep 27, 2022

Review findings #9320

4a50592

syntakker added a commit that referenced this issue Sep 27, 2022

update TROUBLESHOOTING.md #9320

5b45887

StefanKock closed this as completed in #10351 Sep 27, 2022

SORMAS Team 2 - DEV - Iteration Backlog automation moved this from Review to Testing Sep 27, 2022

StefanKock moved this from Testing to Done in SORMAS Team 2 - DEV - Iteration Backlog Oct 14, 2022

SORMAS-JanBoehme removed this from Done in SORMAS Team 2 - DEV - Iteration Backlog May 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance of getAllAfter queries into DTOs [8] #9320

Improve performance of getAllAfter queries into DTOs [8] #9320

StefanKock commented May 24, 2022 •

edited by syntakker

ClimedoHealthCodingTests commented May 31, 2022

StefanKock commented May 31, 2022 •

edited

ClimedoHealthCodingTests commented Jun 1, 2022

StefanKock commented Sep 6, 2022 •

edited

StefanKock commented Sep 21, 2022

Improve performance of getAllAfter queries into DTOs [8] #9320

Improve performance of getAllAfter queries into DTOs [8] #9320

Comments

StefanKock commented May 24, 2022 • edited by syntakker

Problem Description

Proposed Change

Acceptance Criteria

Implementation Details

Additional Information

Performance Analysis

ClimedoHealthCodingTests commented May 31, 2022

StefanKock commented May 31, 2022 • edited

ClimedoHealthCodingTests commented Jun 1, 2022

StefanKock commented Sep 6, 2022 • edited

StefanKock commented Sep 21, 2022

StefanKock commented May 24, 2022 •

edited by syntakker

StefanKock commented May 31, 2022 •

edited

StefanKock commented Sep 6, 2022 •

edited