
Fix for the high memory usage #935

Merged: 4 commits merged into google:master on Feb 13, 2024

Conversation

@chandrashekar-s (Collaborator) commented Feb 5, 2024

Description of what I changed

Fixes #900

The incremental run used to fail when a large number of FHIR resources was already present in the system. This was because, during the incremental run, the delta records had to be merged with the existing records in the Parquet files, which required reading all of the Parquet records.

It turned out that the minimum amount of memory needed to read from all the Parquet shards in parallel is determined by the formula below:

memoryNeeded = Misc memory for JVM stack + (#Parallel Pipeline Threads * #Parallel Pipelines * Parquet Row Group Size)

This was causing failures with the default Parquet row group size of 128 MB, because loading row groups of that size into memory across all the pipeline threads required a large heap.
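As a purely illustrative calculation (the thread and pipeline counts here are assumed, not this project's actual defaults): a single pipeline running 32 parallel threads with the default 128 MB row groups needs

32 threads * 1 pipeline * 128 MB = 4 GB

of heap for the Parquet buffers alone, on top of the miscellaneous JVM memory; with 32 MB row groups the same setup needs only 1 GB.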

Changes have been made to reduce the Parquet row group size to 32 MB so that the pipelines can run even in a low-resource environment, and the size has also been made configurable. In addition, the application now fails fast if the configured memory is not sufficient for the chosen parameters.
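A minimal sketch of what such a fail-fast check could look like is shown below; the class name, method name, and the assumed 512 MB miscellaneous overhead are illustrative, not the actual code added in this PR.

public final class MemoryGuard {
  // Assumed fixed overhead for JVM stacks, metaspace, etc.; not a value taken from this PR.
  private static final long MISC_JVM_BYTES = 512L * 1024 * 1024;

  /**
   * Fails fast when the configured heap cannot hold one row group per pipeline thread,
   * following: memoryNeeded = misc + (threads * pipelines * rowGroupSize).
   */
  static void checkMemory(int pipelineThreads, int parallelPipelines, long rowGroupSizeBytes) {
    long needed = MISC_JVM_BYTES + (long) pipelineThreads * parallelPipelines * rowGroupSizeBytes;
    long maxHeap = Runtime.getRuntime().maxMemory(); // approximately the -Xmx value
    if (maxHeap < needed) {
      throw new IllegalStateException(
          String.format(
              "Max heap %d MB is below the estimated requirement of %d MB; "
                  + "increase -Xmx or reduce the Parquet row group size.",
              maxHeap / (1024 * 1024), needed / (1024 * 1024)));
    }
  }
}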

E2E test

Ran the pipelines incremental runs and verified the following

  • Tested with an initial load of 80K patients already present, then processed a delta of 100 patient records. The pipeline did not fail and successfully processed the records.

Checklist: I completed these to help reviewers :)

  • I have read and will follow the review process.

  • I am familiar with Google Style Guides for the language I have coded in.

    No? Please take some time and review Java and Python style guides.

  • My IDE is configured to follow the Google code styles.

    No? Unsure? -> configure your IDE.

  • I have added tests to cover my changes. (If you refactored existing code that was well tested you do not have to add tests)

  • I ran mvn clean package right before creating this pull request and added all formatting changes to my commit.

  • All new and existing tests passed.

  • My pull request is based on the latest changes of the master branch.

    No? Unsure? -> execute command git pull --rebase upstream master

@chandrashekar-s (Collaborator, Author) commented Feb 5, 2024

Hi @bashir2, the changes are complete. However, the application is failing to launch in the e2e test cases because the machine used in the e2e runs is of type n1-highcpu-32, which I assume is a 32-core machine, while the default values currently only support machines with up to 8 cores.

I just wanted to get your opinion: should the default application configuration be increased to support the 32-core machine, which would mean defaulting the Xmx value to more than 6 GB, or should we just add a WARN log instead of failing the application launch?

@chandrashekar-s (Collaborator, Author) commented

The other option I am considering is to automatically generate the JVM memory parameters based on the thread count and Parquet row group settings, which might be a little non-trivial.
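For illustration only, such a derivation could look roughly like the sketch below; the method name and the 512 MB overhead constant are hypothetical and not from this PR.

// Hypothetical sketch: turns the memory formula above into an -Xmx suggestion.
static String suggestXmx(int pipelineThreads, int parallelPipelines, long rowGroupSizeBytes) {
  long miscJvmBytes = 512L * 1024 * 1024; // assumed overhead for stacks, metaspace, etc.
  long neededBytes = miscJvmBytes + (long) pipelineThreads * parallelPipelines * rowGroupSizeBytes;
  long neededMb = (neededBytes + 1024 * 1024 - 1) / (1024 * 1024); // round up to whole megabytes
  return "-Xmx" + neededMb + "m";
}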

@chandrashekar-s (Collaborator, Author) commented Feb 7, 2024

In the latest commit, the default JVM memory values have been increased to support machines with up to 32 cores. Also, note that the Parquet row group size has been reduced from the default of 128 MB to 32 MB. This is to support running the pipelines without failure even in a low-resource environment. The downsides of reducing this value have been documented in the PR.
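For context, the row group size is a standard setting on the Parquet writer itself; a generic sketch using the Avro Parquet API (not this project's actual writer code, and the class name is made up) looks roughly like this:

import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public final class ExampleParquetWriterFactory {
  // Creates a writer whose row groups are capped at the given size, e.g. 32 * 1024 * 1024.
  static ParquetWriter<GenericRecord> create(Schema schema, String file, int rowGroupSizeBytes)
      throws IOException {
    return AvroParquetWriter.<GenericRecord>builder(new Path(file))
        .withSchema(schema)
        .withRowGroupSize(rowGroupSizeBytes)
        .build();
  }
}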

@bashir2 (Collaborator) left a comment

Thanks @chandrashekar-s for the investigation and the fix. All of my comments are minor suggestions or questions. Once addressed, please feel free to merge this.

@chandrashekar-s merged commit 8854f23 into google:master on Feb 13, 2024
5 checks passed
mozzy11 added a commit to mozzy11/openmrs-fhir-analytics that referenced this pull request Feb 14, 2024
remove redundant space

Fix for the high memory usage (google#935)

* Fix for the high memory usage

* Increased the default JVM memory values

* Review comments

minor format

increase time for reading parquet files

increase time

restore initial time for reading parquet files

Fixed jackson issue for hapi upgrade (google#940)

Changed e2e test to use the latest version of HAPI (google#941)

Bump org.slf4j:slf4j-api from 2.0.11 to 2.0.12 (google#943)

Bump commons-codec:commons-codec from 1.16.0 to 1.16.1 (google#942)

try increasing time for parquet reader

test with default parquet row size

increase time

increase more time

increase row size

revert parquet runtime time to 5 minutes

Bump com.google.apis:google-api-services-healthcare (google#945)

reset rowGroupSizeForParquetFiles
Development

Successfully merging this pull request may close these issues.

The pipeline fails with Out of Heap Space error for high number of FHIR resources