
Anatomy of Q2Q3

Liang Zhisheng edited this page Nov 29, 2021 · 5 revisions

2nd Q & 3rd Q

The 2nd and 3rd Q are processed together.

  1. initial_ref_data: To obtain the reference population result of the algorithm database
  • get_ref_cal: To read the previous reference population result file. The function "verify_data" checks whether the file corresponds to the current analysis setting. If it does, the result is loaded directly with the Python package pandas. Otherwise, a new reference population result is created using the functions "get_ref_gt", "cal_ref_from_txt" and "get_ref_prs".

  • verify_data: To verify whether the population result corresponds to the current analysis setting

  • get_ref_gt: To obtain the genotypes of the reference population in order to calculate the new result

  • cal_ref_from_txt: To calculate results for algorithms stored in plain text files

  • get_ref_prs: To extract PRS results for algorithms stored in GWAS files

    • plink --clump --clump-p1 --clump-p2 --clump-r2 --clump-kb: To extract independent variants from the raw GWAS file
    • plink --score: To calculate the polygenic risk score for each individual in the reference population
  2. load_vcf: To load the genotypes from the user genotype file into the “Human” object.
  3. load_database: To calculate the results for the algorithm database and store them in the “Human” object. cal_from_txt and get_prs are used here; they are similar to cal_ref_from_txt and get_ref_prs, except that get_prs does not run “plink --clump” to extract independent variants, because the list of these variants is already stored.
  4. add_distribution: To add distribution information for each of the user's indicators, based on the results of the reference population. In add_dist of the Ind object, the Python package matplotlib is used to display the distribution as a histogram or pie chart.
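
The reuse-or-recompute logic of get_ref_cal and verify_data can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the JSON cache file, the `setting` dictionary, and the `recompute` callback are all hypothetical stand-ins (the real implementation reads the result with pandas and rebuilds it with get_ref_gt, cal_ref_from_txt and get_ref_prs).

```python
import json
import tempfile
from pathlib import Path


def verify_data(cached: dict, setting: dict) -> bool:
    """Return True when the cached result was built with the same setting."""
    return cached.get("setting") == setting


def get_ref_cal(cache_file: Path, setting: dict, recompute) -> dict:
    """Load the cached reference result, or rebuild it when missing or stale."""
    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        if verify_data(cached, setting):
            return cached["result"]
    # Hypothetical stand-in for get_ref_gt + cal_ref_from_txt + get_ref_prs.
    result = recompute(setting)
    cache_file.write_text(json.dumps({"setting": setting, "result": result}))
    return result


calls = []


def fake_recompute(setting: dict) -> dict:
    """Toy recomputation that records how often it is invoked."""
    calls.append(setting)
    return {"n_algorithms": len(setting["algorithms"])}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "ref_result.json"
    first = get_ref_cal(cache, {"algorithms": ["a", "b"]}, fake_recompute)
    second = get_ref_cal(cache, {"algorithms": ["a", "b"]}, fake_recompute)
    third = get_ref_cal(cache, {"algorithms": ["a"]}, fake_recompute)
```

In this toy run the second call reuses the cached file, while the third call recomputes because the setting changed.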

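One plausible form of the "distribution information" attached by add_distribution is the user's position within the reference population. The sketch below assumes a percentile-style summary; the function name `percentile_in_reference` and the sample scores are hypothetical, and the actual add_dist method may compute or plot something different.

```python
from bisect import bisect_right
from typing import Sequence


def percentile_in_reference(user_value: float, ref_values: Sequence[float]) -> float:
    """Percentage of reference individuals whose value is <= user_value."""
    if not ref_values:
        raise ValueError("reference population is empty")
    ordered = sorted(ref_values)
    return 100.0 * bisect_right(ordered, user_value) / len(ordered)


# Made-up reference scores, for illustration only.
ref_scores = [0.1, 0.4, 0.5, 0.7, 0.9]
user_percentile = percentile_in_reference(0.5, ref_scores)
```

A value like this could then be marked on the histogram that matplotlib draws for the reference population.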

