Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recreation #756

Closed
wants to merge 2 commits into from
Closed

Recreation #756

wants to merge 2 commits into from

Conversation

jawalonoski
Copy link
Member

@jawalonoski jawalonoski commented Jul 21, 2020

This pull request attempts to address #752.

Recreation of data requires the use of a random seed (-s), a clinician seed (-cs), and a reference date (-r).

Things that will not be identical:

  • UUIDs: these will change across runs. Sorry.
  • Export History: if you run the same parameters next year, and only keep the default 10 years of history, then the data in the record will be filtered differently (i.e. the oldest one year from the first run will not be exported)..

Summary of Changes

  • Randomness
    • I made the Random inside Person to be private and made a few new accessor methods (e.g. randBoolean()).
    • Except for a few places (e.g. Generator.java), that means Random objects are no longer passed around as method arguments, and there should be no use of Random in the code.
    • All random number access goes through the Person#rand* methods.
    • There should also be no System.currentTimeInMillis() calls except for a few spots (e.g. Generator.java and sometimes export paths when folder per run settings are used).
  • Reference Dates
    • I added an optional -r switch for referenceDate which can be used to specify when Synthea bases all the birth dates from. In my testing I used 20200320 (or March 20th, 2020) but you can use any date. This defaults to now (System.currentTimeinMillis()) if no value is supplied.
  • Modules and States
    • All modules and states are completely cloned across the patients now. The modules used to be passed directly, and only the states were cloned. Now the modules are cloned as well. This was done to remove any possibility of multi-thread race conditions on accessing modules. Possibly unnecessary.
  • Wiki

@codecov
Copy link

codecov bot commented Jul 21, 2020

Codecov Report

Merging #756 into master will decrease coverage by 0%.
The diff coverage is 78%.

Impacted file tree graph

@@           Coverage Diff            @@
##             master    #756   +/-   ##
========================================
- Coverage        81%     81%   -1%     
- Complexity     2681    2713   +32     
========================================
  Files           100     101    +1     
  Lines         16802   16853   +51     
  Branches       2317    2324    +7     
========================================
+ Hits          13699   13738   +39     
- Misses         2404    2426   +22     
+ Partials        699     689   -10     
Impacted Files Coverage Δ Complexity Δ
...g/mitre/synthea/modules/HealthInsuranceModule.java 96% <0%> (-2%) 17 <0> (ø)
...java/org/mitre/synthea/world/agents/Clinician.java 83% <ø> (+3%) 9 <0> (ø)
.../mitre/synthea/world/concepts/BirthStatistics.java 58% <0%> (ø) 17 <0> (ø)
src/main/java/App.java 57% <16%> (-2%) 0 <0> (ø)
.../mitre/synthea/editors/GrowthDataErrorsEditor.java 66% <26%> (-2%) 27 <4> (-1)
...g/mitre/synthea/helpers/RandomNumberGenerator.java 42% <42%> (ø) 6 <6> (?)
...ava/org/mitre/synthea/modules/EncounterModule.java 97% <50%> (+<1%) 36 <0> (+1)
src/main/java/org/mitre/synthea/engine/State.java 90% <85%> (+<1%) 18 <0> (ø)
...va/org/mitre/synthea/world/geography/Location.java 72% <87%> (+<1%) 46 <6> (+6)
src/main/java/org/mitre/synthea/engine/Module.java 81% <90%> (+1%) 43 <2> (+3)
... and 37 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5921215...b56a7c1. Read the comment docs.

@jawalonoski jawalonoski marked this pull request as ready for review July 21, 2020 20:08
clone.name = this.name;
clone.submodule = this.submodule;
clone.remarks = this.remarks;
clone.states = new ConcurrentHashMap<String, State>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you could move this inside the if block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

@@ -168,7 +168,7 @@ public Person(long seed) {
}

/**
* Retuns a random double.
* Returns a random double.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having the random number related methods as public Person methods, I think it would be cleaner if they were all gathered together into an interface (e.g. RandomProvider) that Person implements. Methods that need access to randomness but not other aspects of Person could then just accept a RandomProvider instead of Person and this would make it clear which methods need Person access vs just Randomness.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An interface like RandomProvider is a good idea. I'll work on that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I called the interface RandomNumberGenerator because I feel like Provider is overloaded with meaning, e.g. healthcare provider.

@@ -439,7 +439,8 @@ private Clinician generateClinician(long clinicianSeed, Random clinicianRand,
long clinicianIdentifier, Provider provider) {
Clinician clinician = null;
try {
Demographics city = location.randomCity(clinicianRand);
Person doc = new Person(clinicianIdentifier);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be Person(clinicianSeed) instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I admit that it is a little hard to follow, but clinicianSeed is the seed for ALL of the clinicians that can be passed in with the GeneratorOptions. The clinicianIdentifier is the seed for THIS particular clinician.

I'll make some changes in this area when I refactor with a RandomProvider as suggested above.

@jawalonoski
Copy link
Member Author

@hadleynet Ready for your re-review.

@hadleynet
Copy link
Collaborator

Rebased and merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants