Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring column output and export routines #3

Merged

Conversation

anthonysena
Copy link

@lhalvors @PRijnbeek I reviewed these changes and ran the code against our internal SQL PDW environment and ran into some issues. A first problem was that SQL PDW doesn't support INSERT statements that include common table expressions. As I got deeper into the review, I noted that you are storing data as:

Achilles_results.stratum_1 ==> concept_id
Achilles_results.stratum_2 ==> % people
Achilles_results.count_value ==> the frequency # for the concept_id in a person's history

As a result, you also had to modify the delete statement that removes any records based on the @smallcellcount setting:

delete from @results_database_schema.ACHILLES_results 
 where analysis_id <> 691 
 	and analysis_id <> 791 
 	and analysis_id <> 891 
 	and analysis_id <> 1891 
 	and count_value <= @smallcellcount;
delete from @results_database_schema.ACHILLES_results_dist 
 where analysis_id <> 691 
 	and analysis_id <> 791
 	and analysis_id <> 891
 	and analysis_id <> 1891
 	and count_value <= @smallcellcount;

In speaking with @chrisknoll, we noted that the Achilles_results.count_value should hold the count of persons as this is the convention that is used in the other analyses. As a result, I've refactored the frequency analyses to store the data as follows:

Achilles_results.stratum_1 ==> concept_id
Achilles_results.stratum_2 ==> the frequency # for the concept_id in a person's history
Achilles_results.count_value ==> the count of persons

I then revised the delete statements referenced above to remove your analyses from the list since we want to ensure that any analyses that have a person count <= @smallcellcount are censored properly.

Furthermore, I refactored the exportJSON SQL calls so that we calculate the % of persons and preserved the columns that you originally had in your query. The notable difference is that I've included a reference to another Achilles analysis to find the denominator when calculating the % which is consistent with the other export SQL scripts. Specifically, you will see this in the queries:

(select count_value from @results_database_schema.ACHILLES_results where analysis_id = 1) denom

Which references an analysis that computes the total # of people in the CDM.

I'd kindly ask you to review and test these changes. I know they will have a ripple effect to your changes for WebAPI/Atlas so I'm happy to help change that code as well. Please reach out with any questions and happy to discuss as needed. If and when you are comfortable with these changes, we can then push this update to the open pull request on the OHDSI repository.

Copy link

@lhalvors lhalvors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Achilles changes are OK.

In the exportToJson changes, the Y_NUM_PERSONS values currently come out as 1/100'th of the original.

Changing the following statement:
"1.0*num.count_value / denom.count_value as Y_NUM_PERSONS, "

to:
"cast(round((100.0*num.count_value / denom.count_value), 0) as int) as Y_NUM_PERSONS, "

should produce the same output values as before.

@anthonysena
Copy link
Author

Thanks Lars for the feedback. I've pushed a change to change the calculation of the Y_NUM_PERSONS column and I believe that should address the files for export. Please let me know if you find any other issues. Thanks!

@lhalvors lhalvors merged commit 18bbb4f into mi-erasmusmc:freq_distribution Aug 22, 2017
lhalvors pushed a commit that referenced this pull request Sep 13, 2017
* Frequency distribution of total persons that have at least x drug_exposure/observation/measurement records, respectively

* Refactoring column output and export routines (#3)

* Refactoring column output and export routines

* Fixing Y_NUM_PERSONS for export
PRijnbeek pushed a commit that referenced this pull request May 18, 2018
* Frequency distribution of total persons that have at least x drug_exposure/observation/measurement records, respectively

* Refactoring column output and export routines (#3)

* Refactoring column output and export routines

* Fixing Y_NUM_PERSONS for export

* Correct sub-query syntax for frequency distribution queries
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants