# LDSC-SEG on TWAS Meta-Analysis Results for OUD

**Author**: Jesse Marks<br>
**NIH Project**: [Harnessing Knowledge of Gene Function in Brain Tissue for Discovering Biology Underlying Heroin Addiction](https://reporter.nih.gov/search/RC99reuHhEW0n_3WuFPU6g/project-details/10116351) <br>
**Charge Code**: 0218755.001.002.001<br>
**GitHub Issue**:  [Opioid Use Disorder TWAS Meta-analysis (Uniform Processing) #183](https://github.com/RTIInternational/bioinformatics/issues/183)<br>


**Description**:<br>
This notebook documents how we tested 47 phenotypes for heritability enrichment in the significantly differentially expressed genes from our TWAS meta-analysis.

We performed Stratified LD Score Regression (S-LDSC) analyses using the LD score regression approach described in [Finucane et al. 2018](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5896795/) on specifically expressed genes (LDSC-SEG).
LDSC-SEG applies stratified LD score regression to test whether disease heritability is enriched in the regions surrounding genes with the highest specific expression in a given tissue.
This approach helps to interpret GWAS signal by leveraging gene expression data.
We used gene sets from a TWAS meta-analysis of 4 OUD case/control datasets:

- [Corradin et al. (2022) Molecular Psychiatry](https://doi.org/10.1038/s41380-022-01477-y)
- [Mendez et al. (2021) Molecular Psychiatry](https://doi.org/10.1038/s41380-021-01259-y)
- [Seney et al. (2021) Biological Psychiatry](https://doi.org/10.1016/j.biopsych.2021.06.007)
- [Sosnowski et al. (2022) Drug and Alcohol Dependence Reports](https://doi.org/10.1016/j.dadr.2022.100040)

The meta-analysis results are published in [this](https://github.com/RTIInternational/bioinformatics/issues/183#issuecomment-1523806234) GitHub comment. We used the results with singletons removed, where a singleton is a gene that appears in only one study.
The file is named `meta_analysis_sumstats_no_singletons_20230214.csv`; Javan Carter emailed it to Jesse.

We filtered the results to obtain a set of significantly differentially expressed genes:

- List of genes with a Benjamini-Hochberg FDR <0.05

The `wfisher_adj_pvalue` column was used to create this gene list.
This column contains the Benjamini-Hochberg-adjusted [weighted Fisher's p-value](https://www.nature.com/articles/s41598-021-86465-y), to which we applied the FDR threshold.
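
As a hedged illustration of the filter described above, the sketch below selects the adjusted p-value column by its header name rather than by position, so the filter still works if columns are reordered. The toy CSV, its gene IDs, and its values are made up for illustration; only the column name `wfisher_adj_pvalue` and the 0.05 threshold come from this notebook.

```shell
#!/usr/bin/env bash
# Toy stand-in for the meta-analysis CSV (IDs and values are made up).
workdir=$(mktemp -d)
cat > "$workdir/toy_sumstats.csv" <<'EOF'
gencode_id,gene_name,wfisher_pvalue,wfisher_adj_pvalue
ENSG_TOY_A,GENEA,0.0001,0.003
ENSG_TOY_B,GENEB,0.2,0.35
ENSG_TOY_C,GENEC,0.004,0.049
ENSG_TOY_D,GENED,0.01,0.05
ENSG_TOY_E,GENEE,NA,NA
EOF

# Locate the column from the header, keep the header line,
# then keep only rows with a numeric value strictly below the threshold.
awk -F"," -v col="wfisher_adj_pvalue" -v fdr=0.05 '
    NR == 1 {
        for (i = 1; i <= NF; i++) if ($i == col) idx = i
        if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
        print
        next
    }
    $idx ~ /^[0-9.eE+-]+$/ && ($idx + 0) < fdr
' "$workdir/toy_sumstats.csv" > "$workdir/toy_sumstats_fdr0.05.csv"

# Number of genes passing the filter (line count minus the header).
n_sig=$(( $(wc -l < "$workdir/toy_sumstats_fdr0.05.csv") - 1 ))
echo "genes passing FDR < 0.05: $n_sig"
```

The explicit numeric check is a design choice in this sketch: it keeps missing values such as `NA` from being coerced to 0 and passing the threshold.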

Our aim was to determine whether genes (and their proximal genomic regions) that show differential expression by opioid use disorder are also enriched for genetic signal associated with opioid addiction and related phenotypes.

___

<br><br>

<details>
    <summary>phenotype list</summary>
    
    
___    
    
* Age of Initiation  (Liu et al., 2019 Nat Genet [30643251](https://pubmed.ncbi.nlm.nih.gov/30643251/))
* Alcohol Dependence (Walters et al., 2018 Nat Neurosci [30482948](https://pubmed.ncbi.nlm.nih.gov/30482948))
* Alcohol Drinks per Week (DPW) (Liu et al., 2019 Nat Genet [30643251](https://pubmed.ncbi.nlm.nih.gov/30643251/))
* Alzheimer's Disease (Lambert et al., 2013 Nat Genet [24162737](https://pubmed.ncbi.nlm.nih.gov/24162737))
* Amyotrophic Lateral Sclerosis (Rheenen et al., 2016 Nat Genet [27455348](https://pubmed.ncbi.nlm.nih.gov/27455348))
* Anorexia Nervosa (Watson et al., 2019 Nat Genet [31308545](https://pubmed.ncbi.nlm.nih.gov/31308545))
* Attention Deficit Hyperactivity Disorder (Demontis et al., 2019 Nat Genet [30478444](https://pubmed.ncbi.nlm.nih.gov/30478444))
* Autism Spectrum Disorders (Grove et al., 2019 Nat Genet [30804558](https://pubmed.ncbi.nlm.nih.gov/30804558))
* Bipolar Disorder (Stahl et al., 2019 Nat Genet [31043756](https://pubmed.ncbi.nlm.nih.gov/31043756))
* Cannabis Use Disorder (CUD) (Demontis et al., 2019 Nat Neurosci [31209380](https://pubmed.ncbi.nlm.nih.gov/31209380))
* Childhood IQ (Benyamin et al., 2014 Mol Psychiatry [23358156](https://pubmed.ncbi.nlm.nih.gov/23358156))
* Cigarettes Per Day (Liu et al., 2019 Nat Genet [30643251](https://pubmed.ncbi.nlm.nih.gov/30643251/))
* College Completion (Rietveld et al., 2013 Science [23722424](https://pubmed.ncbi.nlm.nih.gov/23722424))
* Cotinine Levels (Ware et al., 2016 Sci Rep [26833182](https://pubmed.ncbi.nlm.nih.gov/26833182/))
* Fagerstrom Test for Nicotine Dependence (FTND) (Quach et al., 2020 Nat Commun [33144568](https://pubmed.ncbi.nlm.nih.gov/33144568/))
* Heaviness of Smoking Index (HSI) (Quach et al., 2020 Nat Commun [33144568](https://pubmed.ncbi.nlm.nih.gov/33144568/))
* Insomnia (Jansen et al., 2019 Nat Genet [30804565](https://pubmed.ncbi.nlm.nih.gov/30804565/))
* Insomnia (Lane et al., 2019 Nat Genet [30804566](https://pubmed.ncbi.nlm.nih.gov/30804566/))
* Intelligence (Sniekers et al., 2017 Nat Genet [28530673](https://pubmed.ncbi.nlm.nih.gov/28530673))
* Lifetime Cannabis Use (Ever vs. Never) (Pasman et al., 2018 Nat Neurosci [30150663](https://pubmed.ncbi.nlm.nih.gov/30150663))
* Long Sleep Duration (Dashti et al., 2019 Nat Commun [30846698](https://pubmed.ncbi.nlm.nih.gov/30846698/))
* Major Depressive Disorder (Howard et al., 2018 Nat Commun [29662059](https://pubmed.ncbi.nlm.nih.gov/29662059))
* Mean Accumbens Volume (Hibar et al., 2015 Nature [25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/))
* Mean Caudate Volume (Hibar et al., 2015 Nature [25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/))
* Mean Hippocampus Volume (Hibar et al., 2015 Nature [25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/))
* Mean Pallidum Volume (Hibar et al., 2015 Nature [25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/))
* Mean Putamen Volume (Hibar et al., 2015 Nature [25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/))
* Mean Thalamus Volume (Hibar et al., 2015 Nature [25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/))
* Neo-conscientiousness (de Moor et al., 2012 Mol Psychiatry [21173776](https://pubmed.ncbi.nlm.nih.gov/21173776))
* Neo-openness to Experience (de Moor et al., 2012 Mol Psychiatry [21173776](https://pubmed.ncbi.nlm.nih.gov/21173776))
* Neuroticism (Okbay et al., 2016 Nat Genet [27089181](https://pubmed.ncbi.nlm.nih.gov/27089181))
* Opioid Addiction: GENOA GWAS meta-analysis
* Opioid Addiction: gSEM OA GWAS meta-analysis (i.e., GENOA, MVP-SAGE-YP, PGC-SUD, and Partners Health)
* Parkinson's Disease (Sanchez et al., 2009 Nat Genet [19915575](https://pubmed.ncbi.nlm.nih.gov/19915575))
* Post-traumatic Stress Disorder (Nievergelt et al., 2019 Nat Commun [31594949](https://pubmed.ncbi.nlm.nih.gov/31594949))
* Psychiatric Genetics Consortium Cross-disorder GWAS (Schizophrenia, Bipolar Disorder, MDD, ASD and ADHD) (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013 Lancet [23453885](https://pubmed.ncbi.nlm.nih.gov/23453885))
* Schizophrenia (Ripke et al., 2014 Nature [25056061](https://pubmed.ncbi.nlm.nih.gov/25056061))
* Short Sleep Duration (Dashti et al., 2019 Nat Commun [30846698](https://pubmed.ncbi.nlm.nih.gov/30846698/))
* Sleep Duration (Dashti et al., 2019 Nat Commun [30846698](https://pubmed.ncbi.nlm.nih.gov/30846698/))
* Sleep Duration (Jansen et al., 2019 Nat Genet [30804565](https://pubmed.ncbi.nlm.nih.gov/30804565/))
* Smoking Cessation (Liu et al., 2019 Nat Genet [30643251](https://pubmed.ncbi.nlm.nih.gov/30643251/))
* Smoking Initiation (Liu et al., 2019 Nat Genet [30643251](https://pubmed.ncbi.nlm.nih.gov/30643251/))
* Subjective Well Being (Okbay et al., 2016 Nat Genet [27089181](https://pubmed.ncbi.nlm.nih.gov/27089181))
* Total Intracranial Volume (ICV) (Hibar et al., 2015 Nature [25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/))
* Years of Schooling (Okbay et al., 2016 Nature [27225129](https://pubmed.ncbi.nlm.nih.gov/27225129))
</details><br><br>
    
    

## Create filtered gene list
Create a filtered gene list from the meta-analysis: genes significant at FDR < 0.05 after singletons are excluded.
Meta-analysis summary statistics without singletons are posted in this [comment](https://github.com/RTIInternational/bioinformatics/issues/183#issuecomment-1523806234) as file `meta_analysis_sumstats_no_singletons_20230214.csv`.

In [7]:
%%bash

cd /Users/jmarks/projects/heroin/ldsc/oud_twas_meta_issue183/20230426/
head -2 meta_analysis_sumstats_no_singletons_20230214.csv

gencode_id,gene_name,base_mean_expression_corradin,base_mean_log2cpm_corradin,fold_change_corradin,log2_fold_change_corradin,log2_fold_change_se_corradin,test_statistic_corradin,b_statistic_corradin,pvalue_corradin,adjusted_pvalue_corradin,base_mean_expression_mendez,base_mean_log2cpm_mendez,fold_change_mendez,log2_fold_change_mendez,log2_fold_change_se_mendez,test_statistic_mendez,b_statistic_mendez,pvalue_mendez,adjusted_pvalue_mendez,base_mean_expression_seney,base_mean_log2cpm_seney,fold_change_seney,log2_fold_change_seney,log2_fold_change_se_seney,test_statistic_seney,b_statistic_seney,pvalue_seney,adjusted_pvalue_seney,base_mean_expression_sosnowski,base_mean_log2cpm_sosnowski,fold_change_sosnowski,log2_fold_change_sosnowski,log2_fold_change_se_sosnowski,test_statistic_sosnowski,b_statistic_sosnowski,pvalue_sosnowski,adjusted_pvalue_sosnowski,num_datasets,fc_sign_corradin,fc_sign_mendez,fc_sign_seney,fc_sign_sosnowski,wfisher_fc_sign,wfisher_pvalue,wfisher_adj_pvalue
ENSG00000000

In [12]:
%%bash

cd /Users/jmarks/projects/heroin/ldsc/oud_twas_meta_issue183/20230426/
# confirm that the last column is wfisher_adj_pvalue (the Benjamini-Hochberg adjusted p-value)
head -2 meta_analysis_sumstats_no_singletons_20230214.csv | awk -F"," '{print $NF}'

# create the FDR < 0.05 filtered file: write the header, then append the rows below the threshold
# (the output stays comma-delimited despite the .tsv extension)
head -1 meta_analysis_sumstats_no_singletons_20230214.csv > \
  meta_analysis_sumstats_no_singletons_20230214_fdr0.05.tsv
tail -n +2 meta_analysis_sumstats_no_singletons_20230214.csv | \
  awk -F"," '$NF < 0.05' >> meta_analysis_sumstats_no_singletons_20230214_fdr0.05.tsv

wfisher_adj_pvalue
0.354047100238941


In [15]:
%%bash

# extract just the gencode_id
cd /Users/jmarks/projects/heroin/ldsc/oud_twas_meta_issue183/20230426/
tail -n +2 meta_analysis_sumstats_no_singletons_20230214_fdr0.05.tsv | \
cut -d"," -f1 > meta_analysis_sumstats_no_singletons_20230214_fdr0.05_genecode_id.tsv
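
Before passing the ID list downstream, it can help to confirm that it looks as expected. The following is a minimal sketch on a toy list (the IDs are placeholders, not real genes); it counts total and unique IDs and flags entries that do not start with the `ENSG` prefix seen in the `gencode_id` column.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Toy stand-in for the extracted ID list: one duplicate and one malformed entry.
printf '%s\n' ENSG_TOY_A ENSG_TOY_C ENSG_TOY_C NOT_A_GENE > "$workdir/toy_gene_ids.txt"

# Total IDs, unique IDs, and IDs lacking the expected prefix.
n_total=$(wc -l < "$workdir/toy_gene_ids.txt" | tr -d ' ')
n_unique=$(sort -u "$workdir/toy_gene_ids.txt" | wc -l | tr -d ' ')
n_bad_prefix=$(grep -vc '^ENSG' "$workdir/toy_gene_ids.txt")

echo "total=$n_total unique=$n_unique not_ENSG=$n_bad_prefix"
```

On the real list, a mismatch between the total and unique counts, or a nonzero count of malformed IDs, would suggest revisiting the filtering step.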

## Munge sumstats
Some of the sumstats were already munged, so we use those where we can.
The others we have to download and munge.


* ~Age of Initiation (Liu et al., 2019 Nat Genet 30643251)~
* ~Alcohol Dependence (Walters et al., 2018 Nat Neurosci 30482948)~
* ~Alcohol Drinks per Week (DPW) (Liu et al., 2019 Nat Genet 30643251)~
* ~Alzheimer's Disease (Lambert et al., 2013 Nat Genet 24162737)~
* ~Amyotrophic Lateral Sclerosis (Rheenen et al., 2016 Nat Genet 27455348)~
* ~Anorexia Nervosa (Watson et al., 2019 Nat Genet 31308545)~
* ~Attention Deficit Hyperactivity Disorder (Demontis et al., 2019 Nat Genet 30478444)~
* ~Autism Spectrum Disorders (Grove et al., 2019 Nat Genet 30804558)~
* ~Bipolar Disorder (Stahl et al., 2019 Nat Genet 31043756)~
* ~Brain Volume: Mean Accumbens Volume (Hibar et al., 2015 Nature 25607358)~
* ~Brain Volume: Mean Amygdala Volume (Hibar et al., 2015 Nature 25607358)~
* ~Brain Volume: Mean Caudate Volume (Hibar et al., 2015 Nature 25607358)~
* ~Brain Volume: Mean Hippocampus Volume (Hibar et al., 2015 Nature 25607358)~
* ~Brain Volume: Mean Pallidum Volume (Hibar et al., 2015 Nature 25607358)~
* ~Brain Volume: Mean Putamen Volume (Hibar et al., 2015 Nature 25607358)~
* ~Brain Volume: Mean Thalamus Volume (Hibar et al., 2015 Nature 25607358)~
* ~Brain Volume: Total Intracranial Volume (ICV) (Hibar et al., 2015 Nature 25607358)~
* ~Cannabis Use Disorder (CUD) (Demontis et al., 2019 Nat Neurosci 31209380)~
* ~Childhood IQ (Benyamin et al., 2014 Mol Psychiatry 23358156)~
* ~Cigarettes Per Day (Liu et al., 2019 Nat Genet 30643251)~
* ~College Completion (Rietveld et al., 2013 Science 23722424)~
* ~Cotinine Levels (Ware et al., 2016 Sci Rep 26833182)~
* ~Depressive Symptoms (Okbay et al., 2016 Nat Genet 27089181)~
* ~Fagerstrom Test for Nicotine Dependence (FTND) (Quach et al., 2020 Nat Commun 33144568)~
* ~Heaviness of Smoking Index (HSI) (Quach et al., 2020 Nat Commun 33144568)~
* ~Intelligence (Sniekers et al., 2017 Nat Genet 28530673)~
* ~Lifetime Cannabis Use (Ever vs. Never) (Pasman et al., 2018 Nat Neurosci 30150663)~
* ~Major Depressive Disorder (Howard et al., 2018 Nat Commun 29662059)~
* ~Neo-conscientiousness (de Moor et al., 2012 Mol Psychiatry 21173776)~
* ~Neo-openness to Experience (de Moor et al., 2012 Mol Psychiatry 21173776)~
* ~Neuroticism (Okbay et al., 2016 Nat Genet 27089181)~
* ~Opioid Addiction: GENOA GWAS meta-analysis~
* ~Opioid Addiction: gSEM OA GWAS meta-analysis (i.e., GENOA, MVP-SAGE-YP, PGC-SUD, and Partners Health)~
* ~Parkinson's Disease (Sanchez et al., 2009 Nat Genet 19915575)~
* ~Post-traumatic Stress Disorder (Nievergelt et al., 2019 Nat Commun 31594949)~
* ~Psychiatric Genetics Consortium Cross-disorder GWAS (Schizophrenia, Bipolar Disorder, MDD, ASD and ADHD) (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013 Lancet 23453885)~
* ~Schizophrenia (Ripke et al., 2014 Nature 25056061)~
* ~Smoking Cessation (Liu et al., 2019 Nat Genet 30643251)~
* ~Smoking Initiation (Liu et al., 2019 Nat Genet 30643251)~
* ~Subjective Well Being (Okbay et al., 2016 Nat Genet 27089181)~
* ~Years of Education (Okbay et al., 2022 Nature Genetics  35361970)~
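
One way to tell whether a downloaded file is already munged is to inspect its header. The sketch below is an illustration only: it assumes munged files carry the columns `SNP`, `A1`, `A2`, `Z`, and `N` (our understanding of the LDSC sumstats format), and the helper name `is_munged` and both toy files are hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Toy files: one mimicking munged output, one mimicking a raw GWAS table.
printf 'SNP\tA1\tA2\tZ\tN\nrs0000001\tA\tG\t0.500\t1000\n' | gzip > "$workdir/toy_munged.sumstats.gz"
printf 'RSID\tEffect_Allele\tPvalue\nrs0000001\tA\t0.5\n' | gzip > "$workdir/toy_raw.tbl.gz"

# Hypothetical helper: succeeds only if every expected column is in the header.
is_munged() {
    local header col
    header=$(gzip -dc "$1" | head -1)
    for col in SNP A1 A2 Z N; do
        printf '%s\n' "$header" | tr '\t' '\n' | grep -qx "$col" || return 1
    done
}

for f in "$workdir/toy_munged.sumstats.gz" "$workdir/toy_raw.tbl.gz"; do
    if is_munged "$f"; then status="munged"; else status="needs munging"; fi
    echo "$(basename "$f"): $status"
done
```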




In [None]:
cd sumstats/

aws s3 cp s3://rti-shared/ldsc/data/gscan_liu2019/munged/AgeOfInitiation.txt.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/alcohol_dependence_walters2018_nat_neurosci/munged/pgc_alcdep.eur_discovery.aug2018_release.txt.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/gscan_liu2019/munged/DrinksPerWeek.txt.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/alzheimers_disease_lambert2013_nat_genet/munged/alzheimers_disease_lambert2013_nat_genet.sumstats.gz .
aws s3 cp s3://rti-shared/ldsc/data/amyotrophic_lateral_sclerosis_rheenen2016_nat_genet/munged/amyotrophic_lateral_sclerosis_rheenen2016_nat_genet.sumstats.gz .
aws s3 cp s3://rti-shared/ldsc/data/anorexia_watson2019_nat_genet/munged/anorexia_watson2019_workflow_ready.txt.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/adhd_demontis2018_nat_genet/munged/daner_meta_filtered_NA_iPSYCH23_PGC11_sigPCs_woSEX_2ell6sd_EUR_Neff_70.meta.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/autism_spectrum_disorder_grove2019_nat_genet/munged/iPSYCH-PGC_ASD_Nov2017.munged.merged.txt.gz . 
aws s3 cp s3://rti-shared/ldsc/data/bipolar_disorder_stahl2019_nat_genet/munged/daner_PGC_BIP32b_mds7a_0416a.munged.merged.txt.gz .

aws s3 cp s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/ENIGMA2_MeanAccumbens_Combined_GenomeControlled_Jan23.tbl.sumstats.gz . # Mean Accumbens Volume (Hibar et al., 2015 Nature 25607358)
aws s3 cp s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/ENIGMA2_MeanAmygdala_Combined_GenomeControlled_Jan23.tbl.sumstats.gz .
aws s3 cp s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/ENIGMA2_MeanCaudate_Combined_GenomeControlled_Jan23.tbl.sumstats.gz . # Mean Caudate Volume (Hibar et al., 2015 Nature 25607358)
aws s3 cp s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/ENIGMA2_MeanHippocampus_Combined_GenomeControlled_Jan23.tbl.sumstats.gz . # Mean Hippocampus Volume (Hibar et al., 2015 Nature 25607358)
aws s3 cp s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/ENIGMA2_MeanPallidum_Combined_GenomeControlled_Jan23.tbl.sumstats.gz . # Mean Pallidum Volume (Hibar et al., 2015 Nature 25607358)
aws s3 cp s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/ENIGMA2_MeanPutamen_Combined_GenomeControlled_Jan23.tbl.sumstats.gz . # Mean Putamen Volume (Hibar et al., 2015 Nature 25607358)
aws s3 cp s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/ENIGMA2_MeanThalamus_Combined_GenomeControlled_Jan23.tbl.sumstats.gz . # Mean Thalamus Volume (Hibar et al., 2015 Nature 25607358)
aws s3 cp s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/ENIGMA2_ICV_Combined_GenomeControlled_Jan23.tbl.sumstats.gz .
#17

aws s3 cp s3://rti-shared/ldsc/data/cannabis_use_disorder_demontis2019_nat_neurosci/munged/CUD_GWAS_iPSYCH_June2019.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/childhood_intelligence_benyamin2014_mol_psych/munged/CHIC_Summary_Benyamin2014.sumstats.gz . # Childhood IQ s3://rti-shared/gwas_publicly_available_sumstats/childhood_intelligence_benyamin2014_mol_psych/raw/CHIC_Summary_Benyamin2014.txt.gz
aws s3 cp s3://rti-shared/ldsc/data/gscan_liu2019/munged/CigarettesPerDay.txt.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/educational_attainment_rietveld2013_science/munged/SSGAC_College_Rietveld2013_publicrelease.sumstats.gz . # College completion 
aws s3 cp s3://rti-shared/ldsc/data/cotinine_levels_ware2016_sci_rep/munged/cotinine_ware2016_workflow_ready.txt.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/depressive_symptoms_okbay2016/munged/DS_Full.txt.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/nicotine_dependence_quach2020_nat_commun/munged/ftnd_wave3_eur_quach2020_workflow_ready.txt.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/ukb_hsi/munged/ukb_gwa_003_workflow_ready.txt.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/intelligence_sniekers2017_nat_genet/munged/intelligence_sniekers2017_nat_genet_sumstats_formatted.sumstats.gz .
aws s3 cp s3://rti-shared/ldsc/data/lifetime_cannabis_use_pasman2018_nat_neurosci/munged/cannabis_icc_ukb_workflow_ready.txt.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/major_depressive_disorder_howard2018_nat_commun/munged/pgc_ukb_depression_gwas_workflow_ready.txt.munged.merged.txt.gz .
#28

aws s3 cp s3://rti-shared/ldsc/data/personality_demoor_2012_mol_psych/munged/GPC-1.NEO-CONSCIENTIOUSNESS.full.with_header.h19.sumstats.gz . # Neo-conscientiousness (de Moor et al., 2012 Mol Psychiatry 21173776)
aws s3 cp s3://rti-shared/ldsc/data/personality_demoor_2012_mol_psych/munged/GPC-1.NEO-OPENNESS.full.with_header.hg19.sumstats.gz . # Neo-openness to Experience (de Moor et al., 2012 Mol Psychiatry 21173776)

aws s3 cp s3://rti-shared/ldsc/data/neuroticism_okbay2016_nat_genet/munged/neuroticism_okbay2016_nat_genet.sumstats.gz .
aws s3 cp s3://rti-shared/ldsc/data/opioid_addiction_gaddis_mathur2022_sci_rep/munged/cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chrall.maf_gt_0.01.rsq_gt_0.8.sumstats_formatted.sumstats.gz .
aws s3 cp s3://rti-heroin/rti-midas-data/studies/ngc/GenomicSEM/results/29/gSEM/final/munged/genomicSEM_GWAS.oaALL.MVP1_MVP2_YP_SAGE.PGC.Song.table.sumstats.gz .
aws s3 cp s3://rti-shared/ldsc/data/parkinsons_disease_sanchez2009_nat_genet/munged/parkinsons_disease_sanchez2009_nat_genet.sumstats.gz .
aws s3 cp s3://rti-shared/ldsc/data/ptsd_nievergelt2019_nat_commun/munged/pts_eur_freeze2_overall.results.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/cross_disorder_gwas_pgc2013_lancet/munged/pgc.cross.full.2013-03.hg19.sumstats.gz . # pgc needs a liftover from hg18 (s3://rti-shared/gwas_publicly_available_sumstats/cross_disorder_gwas_pgc2013_lancet/raw/)
aws s3 cp s3://rti-shared/ldsc/data/schizophrenia_ripke2014_nature/munged/daner_natgen_pgc_eur.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/gscan_liu2019/munged/SmokingCessation.txt.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/gscan_liu2019/munged/SmokingInitiation.txt.munged.merged.txt.gz .
aws s3 cp s3://rti-shared/ldsc/data/subjective_wellbeing_okbay2016_nat_genet/munged/SWB_Full.sumstats.gz . # Subjective Well Being (Okbay et al., 2016 Nat Genet 27089181)
aws s3 cp s3://rti-shared/ldsc/data/years_schooling_okbay2022_nat_genet/munged/EA4_additive_excl_23andMe.sumstats.gz . # Years of Education (Okbay et al., 2022 Nature Genetics  35361970)
#41

aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/insomnia_jansen2019_nat_genet/raw/Insomnia-Jansen_2019.sumstats.gz .
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/insomnia_lane2019_nat_genet/raw/Insomnia-Lane_2019.sumstats.gz .
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/long_sleep_duration_dashti2019_nat_commun/raw/LongSleepDur-Dashti_2019.sumstats.gz .
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/short_sleep_duration_dashti2019_nat_commun/raw/ShortSleepDur-Dashti_2019.sumstats.gz .
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/sleep_duration_dashti2019_nat_commun/raw/SleepDuration-Dashti_2019.sumstats.gz .
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/sleep_duration_jansen2019_nat_genet/raw/Sleepdur-Jansen_2019.sumstats.gz .
#47
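
The running counts in the cell above are maintained by hand. As a sketch of an alternative, the downloads could be driven from a manifest so the count is computed rather than typed. The manifest file and the `example-bucket` paths below are hypothetical placeholders, and the loop is a dry run that only prints the `aws s3 cp` commands.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Hypothetical manifest: one S3 URI per line; blank lines and # comments are skipped.
cat > "$workdir/manifest.txt" <<'EOF'
s3://example-bucket/ldsc/data/trait_a/munged/trait_a.sumstats.gz
s3://example-bucket/ldsc/data/trait_b/munged/trait_b.sumstats.gz

# a comment line
s3://example-bucket/ldsc/data/trait_c/munged/trait_c.sumstats.gz
EOF

n_files=0
while IFS= read -r uri; do
    case "$uri" in ''|'#'*) continue ;; esac
    # Dry run: print the command instead of executing it; drop "echo" to download.
    echo aws s3 cp "$uri" .
    n_files=$((n_files + 1))
done < "$workdir/manifest.txt"
echo "files listed: $n_files"
```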

### Opioid Addiction
Restore initiated on 11/2/2022.

In [None]:
# use ldsc tool munge_sumstats.py to convert to sumstats format (https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format)

# source location: s3://rti-heroin/rti-midas-data/studies/ngc/meta/144/processing/
for chr in {1..22}; do
    aws s3 cp s3://rti-heroin/rti-midas-data/studies/ngc/meta/144/processing/oaall/cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chr$chr.maf_gt_0.01.rsq_gt_0.8.tsv.gz .
done

# combine into 1 file
# keep only SNPs with rsIDs, and just keep the rsID portion of MarkerName
zcat cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chr1.maf_gt_0.01.rsq_gt_0.8.tsv.gz  | head -1 > \
    cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chrall.maf_gt_0.01.rsq_gt_0.8.tsv

for chr in {1..22}; do
    awk '{split($1,a,":")} 
        {
            $1=a[1] 
            b=substr($1,1,2)
            {if (b=="rs") 
                {print $0}
            }
        }' OFS="\t" <(zcat cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chr$chr.maf_gt_0.01.rsq_gt_0.8.tsv.gz | tail -n +2) \
            >> cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chrall.maf_gt_0.01.rsq_gt_0.8.tsv
done


# munge:  docker interactive mode
docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chrall.maf_gt_0.01.rsq_gt_0.8.tsv \
    --snp MarkerName \
    --N-cas 7281 \
    --N-con 297550 \
    --a1 Allele1 \
    --a2 Allele2 \
    --p P-value \
    --signed-sumstats Effect,0 \
    --out cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chrall.maf_gt_0.01.rsq_gt_0.8.sumstats_formatted


    
# upload to s3
aws s3 cp cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chrall.maf_gt_0.01.rsq_gt_0.8.sumstats_formatted.sumstats.gz s3://rti-shared/ldsc/data/opioid_addiction_gaddis_mathur2022_sci_rep/munged/
aws s3 cp cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chrall.maf_gt_0.01.rsq_gt_0.8.sumstats_formatted.log s3://rti-shared/ldsc/data/opioid_addiction_gaddis_mathur2022_sci_rep/munged/
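
To make the rsID filtering step above easier to follow, here is a minimal sketch on toy data. It assumes `MarkerName` values are colon-delimited with the variant ID first (which is what the `split` on `:` implies); the toy rows and file names are made up. Rows whose ID does not start with `rs` are dropped, and the remaining `MarkerName` values are trimmed to the rsID.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
in="$workdir/toy.chr1.tsv.gz"
out="$workdir/toy.chrall.tsv"

# Toy per-chromosome file: two rsID markers and one positional marker.
printf 'MarkerName\tAllele1\tAllele2\tEffect\tP-value\nrs0000001:1000:A:G\ta\tg\t0.01\t0.5\n1:2000:C:T\tc\tt\t-0.02\t0.3\nrs0000002:3000:G:T\tg\tt\t0.03\t0.1\n' | gzip > "$in"

# Write the header once, then append the filtered rows.
gzip -dc "$in" | head -1 > "$out"
gzip -dc "$in" | tail -n +2 | \
    awk 'BEGIN {OFS="\t"} {split($1, a, ":"); $1 = a[1]} $1 ~ /^rs/' >> "$out"

cat "$out"
```

The awk program here is a compact equivalent of the one in the cell above: it replaces the first field with the text before the first colon and prints the row only when that text begins with `rs`.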

### Brain Volume
Use the LDSC tool `munge_sumstats.py` to convert to the sumstats format (https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format).

In [None]:
# Mean Accumbens Volume (Hibar et al., 2015 Nature 25607358)

aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/brain_volume_hibar2015_nature/ENIGMA2_MeanAccumbens_Combined_GenomeControlled_Jan23.tbl.gz .

zcat ENIGMA2_MeanAccumbens_Combined_GenomeControlled_Jan23.tbl.gz  | head
#RSID CHR_BP_hg19b37 Effect_Allele Non_Effect_Allele Freq_European_1000Genomes Effect_Beta StdErr Pvalue N
#rs667647 5:29439275 T C 0.347 0.9454 1.1303 0.4029 13112

# interactive mode
docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats ENIGMA2_MeanAccumbens_Combined_GenomeControlled_Jan23.tbl.gz \
    --snp RSID \
    --N-col N \
    --a1 Effect_Allele  \
    --a2 Non_Effect_Allele \
    --p Pvalue \
    --signed-sumstats Effect_Beta,0 \
    --out ENIGMA2_MeanAccumbens_Combined_GenomeControlled_Jan23.tbl


# upload to s3
aws s3 cp ENIGMA2_MeanAccumbens_Combined_GenomeControlled_Jan23.tbl.sumstats.gz s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/
aws s3 cp ENIGMA2_MeanAccumbens_Combined_GenomeControlled_Jan23.tbl.log s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/


In [None]:
# Mean Amygdala Volume (Hibar et al., 2015 Nature 25607358)

aws s3 cp  s3://rti-shared/gwas_publicly_available_sumstats/brain_volume_hibar2015_nature/ENIGMA2_MeanAmygdala_Combined_GenomeControlled_Jan23.tbl.gz .

zcat ENIGMA2_MeanAmygdala_Combined_GenomeControlled_Jan23.tbl.gz  | head -2
#RSID CHR_BP_hg19b37 Effect_Allele Non_Effect_Allele Freq_European_1000Genomes Effect_Beta StdErr Pvalue N
#rs667647 5:29439275 T C 0.347 2.3536 2.4545 0.3376 13160

# interactive mode
docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats ENIGMA2_MeanAmygdala_Combined_GenomeControlled_Jan23.tbl.gz \
    --snp RSID \
    --N-col N \
    --a1 Effect_Allele  \
    --a2 Non_Effect_Allele \
    --p Pvalue \
    --signed-sumstats Effect_Beta,0 \
    --out ENIGMA2_MeanAmygdala_Combined_GenomeControlled_Jan23.tbl


# upload to s3
aws s3 cp ENIGMA2_MeanAmygdala_Combined_GenomeControlled_Jan23.tbl.sumstats.gz s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/
aws s3 cp ENIGMA2_MeanAmygdala_Combined_GenomeControlled_Jan23.tbl.log s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/


In [None]:
# Mean Caudate Volume (Hibar et al., 2015 Nature 25607358)
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/brain_volume_hibar2015_nature/ENIGMA2_MeanCaudate_Combined_GenomeControlled_Jan23.tbl.gz .

zcat ENIGMA2_MeanCaudate_Combined_GenomeControlled_Jan23.tbl.gz  | head
#RSID CHR_BP_hg19b37 Effect_Allele Non_Effect_Allele Freq_European_1000Genomes Effect_Beta StdErr Pvalue N
#rs667647 5:29439275 T C 0.347 3.1005 5.1190 0.5447 13171

docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats ENIGMA2_MeanCaudate_Combined_GenomeControlled_Jan23.tbl.gz \
    --snp RSID \
    --N-col N \
    --a1 Effect_Allele  \
    --a2 Non_Effect_Allele \
    --p Pvalue \
    --signed-sumstats Effect_Beta,0 \
    --out ENIGMA2_MeanCaudate_Combined_GenomeControlled_Jan23.tbl


# upload to s3
aws s3 cp ENIGMA2_MeanCaudate_Combined_GenomeControlled_Jan23.tbl.sumstats.gz s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/
aws s3 cp ENIGMA2_MeanCaudate_Combined_GenomeControlled_Jan23.tbl.log s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/

In [None]:
# Mean Hippocampus Volume (Hibar et al., 2015 Nature 25607358)
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/brain_volume_hibar2015_nature/ENIGMA2_MeanHippocampus_Combined_GenomeControlled_Jan23.tbl.gz .

zcat ENIGMA2_MeanHippocampus_Combined_GenomeControlled_Jan23.tbl.gz | head -2
#RSID CHR_BP_hg19b37 Effect_Allele Non_Effect_Allele Freq_European_1000Genomes Effect_Beta StdErr Pvalue N
#rs667647 5:29439275 T C 0.347 -7.4896 4.9232 0.1282 13163


docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats ENIGMA2_MeanHippocampus_Combined_GenomeControlled_Jan23.tbl.gz \
    --snp RSID \
    --N-col N \
    --a1 Effect_Allele  \
    --a2 Non_Effect_Allele \
    --p Pvalue \
    --signed-sumstats Effect_Beta,0 \
    --out ENIGMA2_MeanHippocampus_Combined_GenomeControlled_Jan23.tbl

    
# upload to s3
aws s3 cp ENIGMA2_MeanHippocampus_Combined_GenomeControlled_Jan23.tbl.sumstats.gz s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/
aws s3 cp ENIGMA2_MeanHippocampus_Combined_GenomeControlled_Jan23.tbl.log s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/


In [None]:
# Mean Pallidum Volume (Hibar et al., 2015 Nature 25607358)
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/brain_volume_hibar2015_nature/ENIGMA2_MeanPallidum_Combined_GenomeControlled_Jan23.tbl.gz .

zcat ENIGMA2_MeanPallidum_Combined_GenomeControlled_Jan23.tbl.gz  | head -2
#RSID CHR_BP_hg19b37 Effect_Allele Non_Effect_Allele Freq_European_1000Genomes Effect_Beta StdErr Pvalue N
#rs667647 5:29439275 T C 0.347 -3.0672 2.0149 0.1279 13142

docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats ENIGMA2_MeanPallidum_Combined_GenomeControlled_Jan23.tbl.gz \
    --snp RSID \
    --N-col N \
    --a1 Effect_Allele  \
    --a2 Non_Effect_Allele \
    --p Pvalue \
    --signed-sumstats Effect_Beta,0 \
    --out ENIGMA2_MeanPallidum_Combined_GenomeControlled_Jan23.tbl


# upload to s3
aws s3 cp ENIGMA2_MeanPallidum_Combined_GenomeControlled_Jan23.tbl.sumstats.gz s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/
aws s3 cp ENIGMA2_MeanPallidum_Combined_GenomeControlled_Jan23.tbl.log s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/

In [None]:
# Mean Putamen Volume (Hibar et al., 2015 Nature 25607358)
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/brain_volume_hibar2015_nature/ENIGMA2_MeanPutamen_Combined_GenomeControlled_Jan23.tbl.gz .

zcat ENIGMA2_MeanPutamen_Combined_GenomeControlled_Jan23.tbl.gz  | head
#RSID CHR_BP_hg19b37 Effect_Allele Non_Effect_Allele Freq_European_1000Genomes Effect_Beta StdErr Pvalue N
#rs667647 5:29439275 T C 0.347 -3.2910 6.2791 0.6002 13145

docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats ENIGMA2_MeanPutamen_Combined_GenomeControlled_Jan23.tbl.gz \
    --snp RSID \
    --N-col N \
    --a1 Effect_Allele  \
    --a2 Non_Effect_Allele \
    --p Pvalue \
    --signed-sumstats Effect_Beta,0 \
    --out ENIGMA2_MeanPutamen_Combined_GenomeControlled_Jan23.tbl


# Note: I had to manually edit munge_sumstats.py, replacing line 1 below with line 2.
# Without the edit, munging failed with:
#   ValueError: WARNING: median value of SIGNED_SUMSTATS is 0.13 (should be close to 0.0). This column may be mislabeled.
# Raymond Walters suggested relaxing the tolerance threshold slightly; see https://groups.google.com/g/ldsc_users/c/RLbVw3e_PU0
# Verified the median in R: median(df$Effect_Beta) returns 0.129
# 1. check_median(dat.SIGNED_SUMSTAT, signed_sumstat_null, 0.10, sign_cname))
# 2. check_median(dat.SIGNED_SUMSTAT, signed_sumstat_null, 0.15, sign_cname))


# upload to s3
aws s3 cp ENIGMA2_MeanPutamen_Combined_GenomeControlled_Jan23.tbl.sumstats.gz s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/
aws s3 cp ENIGMA2_MeanPutamen_Combined_GenomeControlled_Jan23.tbl.log s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/
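
The median check can also be done in the shell before munging. The sketch below uses a toy table with the same column layout as the ENIGMA2 files (all values are made up) and compares the absolute median of `Effect_Beta` against the two tolerances mentioned above. It assumes the check passes when the absolute median is at most the tolerance, which is our reading of the error message rather than a statement about the script's internals.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Toy table in the ENIGMA2 column layout; Effect_Beta is column 6.
cat > "$workdir/toy.tbl" <<'EOF'
RSID CHR_BP_hg19b37 Effect_Allele Non_Effect_Allele Freq_European_1000Genomes Effect_Beta StdErr Pvalue N
rs0000001 1:1000 T C 0.30 0.20 1.0 0.84 1000
rs0000002 1:2000 T C 0.10 -0.50 1.0 0.62 1000
rs0000003 1:3000 T C 0.40 0.13 1.0 0.90 1000
rs0000004 1:4000 T C 0.25 0.05 1.0 0.96 1000
rs0000005 1:5000 T C 0.45 0.90 1.0 0.37 1000
EOF

# Sort the betas numerically and take the middle value (or the mean of the two middle values).
median_beta=$(tail -n +2 "$workdir/toy.tbl" | awk '{print $6}' | sort -g | awk '
    {v[NR] = $1}
    END {
        if (NR % 2) print v[(NR + 1) / 2]
        else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }')
echo "median Effect_Beta: $median_beta"

# Compare the absolute median with each tolerance.
for tol in 0.10 0.15; do
    awk -v m="$median_beta" -v t="$tol" 'BEGIN {
        a = (m < 0) ? -m : m
        print "tolerance " t ": " ((a <= t) ? "pass" : "fail")
    }'
done
```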

In [None]:
# Mean Thalamus Volume (Hibar et al., 2015 Nature 25607358)
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/brain_volume_hibar2015_nature/ENIGMA2_MeanThalamus_Combined_GenomeControlled_Jan23.tbl.gz .

zcat ENIGMA2_MeanThalamus_Combined_GenomeControlled_Jan23.tbl.gz  | head -2
#RSID CHR_BP_hg19b37 Effect_Allele Non_Effect_Allele Freq_European_1000Genomes Effect_Beta StdErr Pvalue N
#rs667647 5:29439275 T C 0.347 -3.0636 6.5794 0.6415 13193

docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats ENIGMA2_MeanThalamus_Combined_GenomeControlled_Jan23.tbl.gz \
    --snp RSID \
    --N-col N \
    --a1 Effect_Allele  \
    --a2 Non_Effect_Allele \
    --p Pvalue \
    --signed-sumstats Effect_Beta,0 \
    --out ENIGMA2_MeanThalamus_Combined_GenomeControlled_Jan23.tbl


# upload to s3
aws s3 cp ENIGMA2_MeanThalamus_Combined_GenomeControlled_Jan23.tbl.sumstats.gz s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/
aws s3 cp ENIGMA2_MeanThalamus_Combined_GenomeControlled_Jan23.tbl.log s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/
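
The seven cells above differ only in the structure name, so the munge commands could be generated in a loop. This is a sketch, not what was actually run: it is a dry run that prints each command, with the flags copied from the cells above. Putamen would still need the relaxed median tolerance described earlier.

```shell
#!/usr/bin/env bash
# Structures munged in the cells above.
structures="Accumbens Amygdala Caudate Hippocampus Pallidum Putamen Thalamus"

n_cmds=0
for s in $structures; do
    prefix="ENIGMA2_Mean${s}_Combined_GenomeControlled_Jan23.tbl"
    # Dry run: print the command; remove "echo" to execute inside the LDSC container.
    echo python /opt/ldsc/munge_sumstats.py \
        --sumstats "${prefix}.gz" \
        --snp RSID \
        --N-col N \
        --a1 Effect_Allele \
        --a2 Non_Effect_Allele \
        --p Pvalue \
        --signed-sumstats Effect_Beta,0 \
        --out "$prefix"
    n_cmds=$((n_cmds + 1))
done
echo "commands generated: $n_cmds"
```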

####  Total Intracranial Volume: to-do


There were issues with this that I need to resolve.
```
I attempted to perform LDSC with this set of GWAS results. However, when I attempted to munge the sumstats I received the following error message:


ValueError: WARNING: median value of SIGNED_SUMSTATS is -22.41 (should be close to 0.0). This column may be mislabeled.

 

I calculated this in R to verify (note it was not exactly the same but that is because the munge_sumstats.py script provided by the LDSC repo removed some SNPs during QC):

> df <- read.table("ENIGMA2_ICV_Combined_GenomeControlled_Jan23.tbl.gz", header=TRUE)

> median(df$Effect_Beta)   # [1] -20.121

 

Do you know why the Effect_Beta column would not be close to 0? Here is a sample of the file – it looks like the beta values are way away from 0 and the SE is huge.

zcat ENIGMA2_ICV_Combined_GenomeControlled_Jan23.tbl.gz | head

RSID CHR_BP_hg19b37 Effect_Allele Non_Effect_Allele Freq_European_1000Genomes Effect_Beta StdErr Pvalue N

rs667647 5:29439275 T C 0.347 -148.8340 2029.8618 0.9415 11373

rs113534962 5:85928892 T C 0.05013 6239.4568 4368.8262 0.1532 11373

rs2366866 10:128341232 T C 0.4578 1025.8514 1895.1917 0.5883 11373

rs472303 3:62707519 T C 0.06332 5058.2986 4040.6084 0.2106 11373
```

**response**:
```
This is because the GWAS was run in the raw scale of the data not on residuals or z scores.
The variable you are looking at is the basically volume the brain in mm3, as a consequence all beta and se are scaled accordingly
```


<br>

**What we did**:
We edited `munge_sumstats.py` to relax the tolerance of its median check, so that the raw-scale betas would no longer raise this error.
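
The raw-scale betas are not a problem for the Z-scores themselves. The sketch below assumes, as we understand `munge_sumstats.py`, that only the sign of the signed statistic is combined with the P-value to form the Z-score. The ratio `Effect_Beta / StdErr` is unit-free, so the four rows sampled above should reproduce their reported P-values even though the betas are in mm^3. The helper name `two_sided_p` is ours, for illustration only.

```python
import math

# Rows copied from the ICV file sample above: (RSID, Effect_Beta, StdErr, Pvalue)
rows = [
    ("rs667647", -148.8340, 2029.8618, 0.9415),
    ("rs113534962", 6239.4568, 4368.8262, 0.1532),
    ("rs2366866", 1025.8514, 1895.1917, 0.5883),
    ("rs472303", 5058.2986, 4040.6084, 0.2106),
]


def two_sided_p(z):
    """Two-sided normal P-value for a Z-score (hypothetical helper)."""
    return math.erfc(abs(z) / math.sqrt(2))


for rsid, beta, se, p_reported in rows:
    z = beta / se  # unit-free: the mm^3 scale cancels out
    print(f"{rsid}\tZ={z:.4f}\tP_from_Z={two_sided_p(z):.4f}\tP_reported={p_reported}")
```

If the recomputed and reported P-values agree, the column is correctly labeled and only the median check, which expects betas centered near 0, is tripped by the measurement scale.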

In [None]:
# Total Intracranial Volume (ICV) (Hibar et al., 2015 Nature 25607358) 
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/brain_volume_hibar2015_nature/ENIGMA2_ICV_Combined_GenomeControlled_Jan23.tbl.gz .

zcat ENIGMA2_ICV_Combined_GenomeControlled_Jan23.tbl.gz  | head -2
#RSID CHR_BP_hg19b37 Effect_Allele Non_Effect_Allele Freq_European_1000Genomes Effect_Beta StdErr Pvalue N
#rs667647 5:29439275 T C 0.347 -148.8340 2029.8618 0.9415 11373

docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash

# we edited tolerance in munge_sumstats.py so this would pass QC
python /opt/ldsc/munge_sumstats.py \
    --sumstats ENIGMA2_ICV_Combined_GenomeControlled_Jan23.tbl.gz \
    --snp RSID \
    --N-col N \
    --a1 Effect_Allele  \
    --a2 Non_Effect_Allele \
    --p Pvalue \
    --signed-sumstats Effect_Beta,0 \
    --out ENIGMA2_ICV_Combined_GenomeControlled_Jan23.tbl



# upload to s3
aws s3 cp ENIGMA2_ICV_Combined_GenomeControlled_Jan23.tbl.sumstats.gz s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/
aws s3 cp ENIGMA2_ICV_Combined_GenomeControlled_Jan23.tbl.log s3://rti-shared/ldsc/data/brain_volume_hibar2015_nature/munged/

### Childhood IQ

In [None]:
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/childhood_intelligence_benyamin2014_mol_psych/raw/CHIC_Summary_Benyamin2014.txt.gz .

zcat CHIC_Summary_Benyamin2014.txt.gz  | head
#SNP CHR BP A1 A2 FREQ_A1 EFFECT_A1 SE P
#rs1000000 chr12 125415860 A G 0.373 -0.0143 0.0156 0.3604
#rs10000010 chr4 21294943 T C 0.575 0.0028 0.0127 0.8237


# munge:  docker interactive mode
docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats CHIC_Summary_Benyamin2014.txt.gz \
    --snp SNP \
    --N 17989 \
    --a1 A1 \
    --a2 A2 \
    --p P \
    --signed-sumstats EFFECT_A1,0 \
    --out CHIC_Summary_Benyamin2014
    
# upload to s3
aws s3 cp CHIC_Summary_Benyamin2014.sumstats.gz s3://rti-shared/ldsc/data/childhood_intelligence_benyamin2014_mol_psych/munged/
aws s3 cp CHIC_Summary_Benyamin2014.log s3://rti-shared/ldsc/data/childhood_intelligence_benyamin2014_mol_psych/munged/

### College Completion
Rietveld et al., 2013 Science


In [None]:
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/educational_attainment_rietveld2013_science/raw/SSGAC_Rietveld2013.zip .

unzip SSGAC_Rietveld2013.zip

head SSGAC_College_Rietveld2013_publicrelease.txt
#MarkerName      Effect_Allele   Other_Allele    EAF     OR      SE      Pvalue
#rs3813193       C       G       0.16    0.985   0.014   0.2823
#rs4075116       T       C       0.78    1.018   0.012   0.1199


docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats SSGAC_College_Rietveld2013_publicrelease.txt \
    --snp MarkerName \
    --N 95427 \
    --a1 Effect_Allele  \
    --a2 Other_Allele \
    --p Pvalue \
    --signed-sumstats OR,1 \
    --out SSGAC_College_Rietveld2013_publicrelease

# upload to s3
aws s3 cp SSGAC_College_Rietveld2013_publicrelease.sumstats.gz s3://rti-shared/ldsc/data/educational_attainment_rietveld2013_science/munged/
aws s3 cp SSGAC_College_Rietveld2013_publicrelease.log s3://rti-shared/ldsc/data/educational_attainment_rietveld2013_science/munged/
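
The second field of `--signed-sumstats` is the null value of the named column: 1 for an odds ratio, 0 for a beta or Z-score. The sketch below is our own illustration (the function `effect_direction` is hypothetical and not part of LDSC) of how a null value turns either kind of column into an effect direction, using the two rows printed above.

```python
def effect_direction(value, null_value):
    """Return +1, -1, or 0 for a signed statistic relative to its null value."""
    if value > null_value:
        return 1
    if value < null_value:
        return -1
    return 0


# Odds ratios from the college completion file (null value 1)
print(effect_direction(0.985, 1))  # rs3813193: OR below 1
print(effect_direction(1.018, 1))  # rs4075116: OR above 1

# A beta uses a null value of 0, as in the other munge commands
print(effect_direction(-0.0143, 0))
```

Passing `OR,0` by mistake would make every odds ratio look like a positive effect, which is why the null value has to match the column type.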

### Cross-Disorder (PGC)
The raw summary statistics are on genome build 36 (hg18).

In [None]:
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/cross_disorder_gwas_pgc2013_lancet/raw/pgc.cross.full.2013-03.zip .

unzip pgc.cross.full.2013-03.zip

head pgc.cross.full.2013-03.txt
#snpid hg18chr bp a1 a2 or se pval info ngt CEUaf
#rs3131972       1       742584  A       G       1.047   0.0361  0.2032  0.774   0       0.16055
#rs3131969       1       744045  A       G       1.067   0.0377  0.08597 0.939   0       0.133028

# need to convert to build 37
# use liftover workflow: liftover_genomic_annotations
git clone --recurse-submodules https://github.com/RTIInternational/biocloud_gwas_workflows

# once liftover completed, now we can use munge_sumstats.py

head pgc.cross.full.2013-03.hg19.txt
#snpid chr GRCh37_POS a1 a2 or se pval info ngt CEUaf
#rs3131972 1 752721 A G 1.047 0.0361 0.2032 0.774 0 0.16055


docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats pgc.cross.full.2013-03.hg19.txt \
    --snp snpid \
    --N-cas 33332 \
    --N-con 27888 \
    --a1 a1  \
    --a2 a2 \
    --p pval \
    --signed-sumstats or,1 \
    --out pgc.cross.full.2013-03.hg19

# upload to s3
aws s3 cp pgc.cross.full.2013-03.hg19.sumstats.gz s3://rti-shared/ldsc/data/cross_disorder_gwas_pgc2013_lancet/munged/
aws s3 cp pgc.cross.full.2013-03.hg19.log s3://rti-shared/ldsc/data/cross_disorder_gwas_pgc2013_lancet/munged/


### Intelligence
Sniekers et al., 2017 Nat Genet	28530673

```
## Association results of the meta-analysis for intelligence based on 78,308 individuals in 13 cohorts. 

## Version date: 10-07-2017

#Columns:
Chromosome: chromosome number
position: base pair position of the SNP on the chromosome (reported on GRCh37)
rsid: SNP rs number
ref: effect allele
alt: non-effect allele
N: sample size
MAF: minor allele frequency in UK Biobank
Beta: effect size of the effect allele
SE: standard error of the effect
Zscore: Z-score computed in METAL by a weighted Z-score method
p_value: P-value computed in METAL by a weighted Z-score method
direction: direction of the effect in each of the cohorts, order: CHIC (consisting of 6 cohorts), UKB-wb, UKB-ts, ERF, GENR, HU, MCTFR, STR

Beta/SE were calculated from METAL Z-scores using the formula from Zhu et al (Nature Genetics, 2016):

Beta = Zscore / sqrt( 2 * MAF * ( 1 - MAF) * ( N + Zscore^2 ) )
SE = 1 / sqrt( 2 * MAF * ( 1 - MAF ) * ( N + Zscore^2 ) )
```
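
As a worked example of the formula quoted in the README, the sketch below recomputes Beta and SE for rs10875231, the SNP shown in the file preview in the next cell. The function name is ours, and we assume the overall sample size (N = 78,308) applies to this SNP, since the preview does not show a per-SNP N.

```python
import math


def beta_se_from_z(z, maf, n):
    """Beta and SE from a Z-score, MAF, and sample size (formula quoted above)."""
    denom = math.sqrt(2 * maf * (1 - maf) * (n + z ** 2))
    return z / denom, 1 / denom


# rs10875231: Zscore = 0.05, MAF = 0.234588; N is assumed to be the full 78,308
beta, se = beta_se_from_z(z=0.05, maf=0.234588, n=78308)
print(f"Beta={beta:.6e}  SE={se:.6e}")
```

The results should be close to the Beta and SE columns in the file. Small differences would be expected if the per-SNP sample size differs from the total.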

In [None]:
# use ldsc tool munge_sumstats.py to convert to sumstats format (https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format)

# intelligence s3://rti-shared/gwas_publicly_available_sumstats/intelligence_sniekers2017_nat_genet/raw/sumstats.txt.gz
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/intelligence_sniekers2017_nat_genet/raw/sumstats.txt.gz .
zcat sumstats.txt.gz  | head
#Chromosome      position        rsid    ref     alt     MAF     Beta    SE      Zscore  p_value direction
#1       100000012       rs10875231      T       G       0.234588        0.000298163293384453    0.00596326586768906     0.05    0.9599  +-++--+-

# interactive mode
docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats sumstats.txt.gz \
    --snp rsid \
    --N 78308 \
    --a1 ref \
    --a2 alt \
    --p p_value \
    --signed-sumstats Beta,0 \
    --out intelligence_sniekers2017_nat_genet_sumstats_formatted



# upload to s3
aws s3 cp intelligence_sniekers2017_nat_genet_sumstats_formatted.log s3://rti-shared/ldsc/data/intelligence_sniekers2017_nat_genet/munged/
aws s3 cp intelligence_sniekers2017_nat_genet_sumstats_formatted.sumstats.gz s3://rti-shared/ldsc/data/intelligence_sniekers2017_nat_genet/munged/

### Personality

In [None]:
# Neo-conscientiousness (de Moor et al., 2012 Mol Psychiatry 21173776)
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/personality_demoor_2012_mol_psych/raw/GPC-1.NEO-CONSCIENTIOUSNESS.full.txt .

# Need to add headers - they were missing. The readme says "The files contain the following information (columns): SNPID CHR BP A1 A2 BETA SE PVALUE INFO NCOH MAF"
echo -e "SNPID\tCHR\tBP\tA1\tA2\tBETA\tSE\tPVALUE\tINFO\tNCOH\tMAF" > GPC-1.NEO-CONSCIENTIOUSNESS.full.with_header.txt
cat GPC-1.NEO-CONSCIENTIOUSNESS.full.txt >> GPC-1.NEO-CONSCIENTIOUSNESS.full.with_header.txt

# perform liftover to build37 (currently build 36)
# use liftover workflow: liftover_genomic_annotations
git clone --recurse-submodules https://github.com/RTIInternational/biocloud_gwas_workflows

# once liftover completed, now we can use munge_sumstats.py

zcat GPC-1.NEO-CONSCIENTIOUSNESS.full.with_header.h19.txt.gz | head -2
#SNPID CHR HG19_POS A1 A2 BETA SE PVALUE INFO NCOH MAF
#rs3121561 1 990380 t c -.0144 .1282 .9105 .531 8 .267

docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats GPC-1.NEO-CONSCIENTIOUSNESS.full.with_header.h19.txt.gz \
    --snp SNPID \
    --N 17375 \
    --a1 A1  \
    --a2 A2 \
    --p PVALUE \
    --signed-sumstats BETA,0 \
    --out GPC-1.NEO-CONSCIENTIOUSNESS.full.with_header.h19

# upload to s3
aws s3 cp GPC-1.NEO-CONSCIENTIOUSNESS.full.with_header.h19.sumstats.gz s3://rti-shared/ldsc/data/personality_demoor_2012_mol_psych/munged/
aws s3 cp GPC-1.NEO-CONSCIENTIOUSNESS.full.with_header.h19.log s3://rti-shared/ldsc/data/personality_demoor_2012_mol_psych/munged/

In [None]:
# Neo-openness to Experience (de Moor et al., 2012 Mol Psychiatry 21173776)
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/personality_demoor_2012_mol_psych/raw/GPC-1.NEO-OPENNESS.full.txt .

# Need to add headers - they were missing. The readme says "The files contain the following information (columns): SNPID CHR BP A1 A2 BETA SE PVALUE INFO NCOH MAF"
echo -e "SNPID\tCHR\tBP\tA1\tA2\tBETA\tSE\tPVALUE\tINFO\tNCOH\tMAF" > GPC-1.NEO-OPENNESS.full.with_header.txt
cat GPC-1.NEO-OPENNESS.full.txt >> GPC-1.NEO-OPENNESS.full.with_header.txt

# perform liftover to build37 (currently build 36)
# use liftover workflow: liftover_genomic_annotations
git clone --recurse-submodules https://github.com/RTIInternational/biocloud_gwas_workflows

# once liftover completed, now we can use munge_sumstats.py

zcat GPC-1.NEO-OPENNESS.full.with_header.hg19.txt.gz | head -2
#SNPID CHR HG19_POS A1 A2 BETA SE PVALUE INFO NCOH MAF
#rs3121561 1 990380 t c -.0259 .1199 .8293 .536 8 .267

docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats GPC-1.NEO-OPENNESS.full.with_header.hg19.txt.gz \
    --snp SNPID \
    --N 17375 \
    --a1 A1  \
    --a2 A2 \
    --p PVALUE \
    --signed-sumstats BETA,0 \
    --out GPC-1.NEO-OPENNESS.full.with_header.hg19

# upload to s3
aws s3 cp GPC-1.NEO-OPENNESS.full.with_header.hg19.sumstats.gz s3://rti-shared/ldsc/data/personality_demoor_2012_mol_psych/munged/
aws s3 cp GPC-1.NEO-OPENNESS.full.with_header.hg19.log s3://rti-shared/ldsc/data/personality_demoor_2012_mol_psych/munged/

In [None]:
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/personality_demoor_2012_mol_psych/raw/GPC-1.NEO-EXTRAVERSION.full.txt .

In [None]:

aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/personality_demoor_2012_mol_psych/raw/GPC-1.NEO-AGREEABLENESS.full.txt .

### Subjective Wellbeing

In [None]:
aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/subjective_wellbeing_okbay2016_nat_genet/raw/SWB_Full.txt.gz .

zcat SWB_Full.txt.gz  | head -2
#MarkerName      CHR     POS     A1      A2      EAF     Beta    SE      Pval
#rs2075677       20      47701024        A       G       0.7743  0.021   0.004   1.879e-08


docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats SWB_Full.txt.gz \
    --snp MarkerName \
    --N 298420 \
    --a1 A1  \
    --a2 A2 \
    --p Pval \
    --signed-sumstats Beta,0 \
    --out SWB_Full

# upload to s3
aws s3 cp SWB_Full.sumstats.gz s3://rti-shared/ldsc/data/subjective_wellbeing_okbay2016_nat_genet/munged/
aws s3 cp SWB_Full.log s3://rti-shared/ldsc/data/subjective_wellbeing_okbay2016_nat_genet/munged/

###  Years of Education 
(Okbay et al., 2022 Nature Genetics  35361970)

In [None]:
# use ldsc tool munge_sumstats.py to convert to sumstats format (https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format)

aws s3 cp s3://rti-shared/gwas_publicly_available_sumstats/years_schooling_okbay2022_nat_genet/raw/EA4_additive_excl_23andMe.txt.gz .

zcat EA4_additive_excl_23andMe.txt.gz  | head -2
#rsID    Chr     BP      Effect_allele   Other_allele    EAF_HRC Beta    SE      SE_unadj        P       P_unadj
#rs667647        5       29439275        T       C       0.376548        -0.00032        0.00179 0.00167 0.86    0.8504

docker run -it -v $PWD:/data/ rtibiocloud/ldsc:v1.0.1_9501d4d bash
python /opt/ldsc/munge_sumstats.py \
    --sumstats EA4_additive_excl_23andMe.txt.gz \
    --snp rsID \
    --N 765283 \
    --a1 Effect_allele  \
    --a2 Other_allele \
    --p P \
    --signed-sumstats Beta,0 \
    --out EA4_additive_excl_23andMe

# upload to s3
aws s3 cp EA4_additive_excl_23andMe.sumstats.gz s3://rti-shared/ldsc/data/years_schooling_okbay2022_nat_genet/munged/
aws s3 cp EA4_additive_excl_23andMe.log s3://rti-shared/ldsc/data/years_schooling_okbay2022_nat_genet/munged/

# Genomic start-end locations for GENCODE v30 IDs

<br><br>
https://www.gencodegenes.org/human/release_43lift37.html

Get the start and end positions for each gene from the GTF annotation. The column layout is described at
https://www.gencodegenes.org/pages/data_format.html

| column-number   | content                | values/format           |
|-----------------|------------------------|-------------------------|
| 4               | genomic start location | integer-value (1-based) |
| 5               | genomic end location   | integer-value           |
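
To make the column numbering concrete, here is a minimal sketch of pulling the gene ID, chromosome, start, and end out of a single GTF line. The 1-based columns 4 and 5 in the table become indices 3 and 4 in Python. The function name and the example line are ours; the example keeps only the `gene_id` attribute and omits the remaining attributes.

```python
import re


def parse_gene_line(gtf_line):
    """Return (gene_id, chrom, start, end) for a 'gene' feature, else None."""
    fields = gtf_line.rstrip("\n").split("\t")
    if fields[2] != "gene":
        return None
    match = re.search(r'gene_id "([^"]+)"', fields[8])
    # Drop the version and liftover suffix, e.g. ENSG00000223972.6_6
    gene_id = match.group(1).split(".")[0]
    return gene_id, fields[0], int(fields[3]), int(fields[4])


# Example line assembled for illustration (other attributes omitted)
example = "\t".join(
    ["chr1", "HAVANA", "gene", "12010", "13670", ".", "+", ".",
     'gene_id "ENSG00000223972.6_6";']
)
print(parse_gene_line(example))
```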

In [None]:
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/GRCh37_mapping/gencode.v43lift37.annotation.gtf.gz

In [3]:
cd /Users/jmarks/projects/heroin/ldsc/oud_twas_meta_issue183/20230426

/Users/jmarks/Library/CloudStorage/OneDrive-ResearchTriangleInstitute/Projects/heroin/ldsc/oud_twas_meta_issue183/20230426


In [7]:
import pandas as pd

df = pd.read_csv("meta_analysis_sumstats_no_singletons_20230214.csv", sep=",")
df.head()

Unnamed: 0,gencode_id,gene_name,base_mean_expression_corradin,base_mean_log2cpm_corradin,fold_change_corradin,log2_fold_change_corradin,log2_fold_change_se_corradin,test_statistic_corradin,b_statistic_corradin,pvalue_corradin,...,pvalue_sosnowski,adjusted_pvalue_sosnowski,num_datasets,fc_sign_corradin,fc_sign_mendez,fc_sign_seney,fc_sign_sosnowski,wfisher_fc_sign,wfisher_pvalue,wfisher_adj_pvalue
0,ENSG00000000003.14,TSPAN6,284.065328,2.172214,0.991802,-0.011876,0.135991,-0.087329,-5.773293,0.930999,...,0.467805,0.904407,4,-1.0,1.0,1.0,1.0,1,0.090018,0.354047
1,ENSG00000000419.12,DPM1,375.845245,2.603202,0.887022,-0.172957,0.09731,-1.777386,-4.461017,0.085795,...,0.743068,0.958975,4,-1.0,-1.0,1.0,-1.0,-1,0.381822,0.650737
2,ENSG00000000457.14,SCYL3,538.317581,3.143996,0.973343,-0.03898,0.089259,-0.436711,-5.912914,0.665494,...,0.079513,0.676163,4,-1.0,-1.0,1.0,1.0,1,0.241729,0.53736
3,ENSG00000000460.17,C1orf112,157.927412,1.322165,0.984871,-0.021993,0.19838,-0.110862,-5.567076,0.912475,...,0.54293,0.923965,4,-1.0,1.0,-1.0,-1.0,-1,1.0,1.0
4,ENSG00000000938.13,FGR,44.234796,-0.796507,0.904991,-0.144025,0.349171,-0.412477,-5.151557,0.682966,...,0.672601,0.948767,4,-1.0,1.0,1.0,-1.0,1,0.387745,0.6563


In [8]:
ann = pd.read_csv("gencode.v43lift37.annotation.gtf", sep="\t", skiprows=5, header=None)
ann.columns = ["chromosome_name", "annotation_source", "feature_type","genomic_start_location","genomic_end_location","score(not_used)","genomic_strand","genomic_phase", "additional_information"]
ann.head()

Unnamed: 0,chromosome_name,annotation_source,feature_type,genomic_start_location,genomic_end_location,score(not_used),genomic_strand,genomic_phase,additional_information
0,chr1,HAVANA,gene,12010,13670,.,+,.,"gene_id ""ENSG00000223972.6_6""; gene_type ""tran..."
1,chr1,HAVANA,transcript,12010,13670,.,+,.,"gene_id ""ENSG00000223972.6_6""; transcript_id ""..."
2,chr1,HAVANA,exon,12010,12057,.,+,.,"gene_id ""ENSG00000223972.6_6""; transcript_id ""..."
3,chr1,HAVANA,exon,12179,12227,.,+,.,"gene_id ""ENSG00000223972.6_6""; transcript_id ""..."
4,chr1,HAVANA,exon,12613,12697,.,+,.,"gene_id ""ENSG00000223972.6_6""; transcript_id ""..."


In [12]:
import gzip
# https://storage.googleapis.com/broad-alkesgroup-public/LDSCORE/make_annot_sample_files/ENSG_coord.txt
# https://storage.googleapis.com/broad-alkesgroup-public-requester-pays/LDSCORE/make_annot_sample_files/ENSG_coord.txt

fdr = "0.05"
in1 = "meta_analysis_sumstats_no_singletons_20230214.csv"
out1 = f"meta_analysis_sumstats_no_singletons_20230214_fdr{fdr}_coord.tsv"
annfile = "gencode.v43lift37.annotation.gtf.gz"

with open(in1) as inF, gzip.open(annfile, 'rt') as annF, open(out1, 'w') as outF:
    for _ in range(5):
        next(annF)
    line = annF.readline()
    
    gencode_dic  = {}
    while line:
        sl = line.split("\t")
        if sl[2] == "gene":
            gencode = sl[8].split(";")[0] # remove all additional_info except "gene_id <ENSG...>"
            gencode = gencode.split(" ")[1].strip('"') # remove "gene_id" portion, and double quotes
            gencode = gencode.split(".")[0] # remove suffix  <ENSG...>
            gencode_dic[gencode] = [sl[0], sl[3], sl[4]] # chr, start-, and end-genomic position
        line = annF.readline()
    
    print(dict(list(gencode_dic.items())[0:2]))

    outF.write("GENE\tCHR\tSTART\tEND\n")
    next(inF)
    line = inF.readline()
    while line:
        sl = line.split(",")
        gencode = sl[0].split(".")[0]
        wfisher_adj_p = sl[-1]
        if float(wfisher_adj_p) < float(fdr):
            if gencode in gencode_dic:
                chrom = gencode_dic[gencode][0]
                start = gencode_dic[gencode][1]
                end = gencode_dic[gencode][2]
                outline = f"{gencode}\t{chrom}\t{start}\t{end}\n"
                outF.write(outline)
            else:
                print(gencode)
        line = inF.readline()

{'ENSG00000223972': ['chr1', '12010', '13670'], 'ENSG00000227232': ['chr1', '14404', '29570']}
ENSG00000254615
ENSG00000263278
ENSG00000277209


In [None]:
# extract just the gene
tail -n +2 meta_analysis_sumstats_no_singletons_20230214_fdr0.05_coord.tsv | \
  cut -f1 > meta_analysis_sumstats_no_singletons_20230214_fdr0.05_geneset.tsv


# Stratified LDSC
https://github.com/bulik/ldsc/wiki/LD-Score-Estimation-Tutorial#partitioned-ld-scores

In [None]:
# download files needed for partitioned heritability analysis
wget https://storage.googleapis.com/broad-alkesgroup-public/LDSCORE/1000G_Phase3_baseline_ldscores.tgz
wget https://storage.googleapis.com/broad-alkesgroup-public/LDSCORE/1000G_Phase3_plinkfiles.tgz
wget https://storage.googleapis.com/broad-alkesgroup-public/LDSCORE/1000G_Phase3_frq.tgz

wget https://storage.googleapis.com/broad-alkesgroup-public/LDSCORE/weights_hm3_no_hla.tgz
#wget https://storage.googleapis.com/broad-alkesgroup-public/LDSCORE/hapmap3_snps.tgz

# extract files
tar -xvf 1000G_Phase3_baseline_ldscores.tgz
tar -xvf 1000G_Phase3_plinkfiles.tgz
tar -xvf 1000G_Phase3_frq.tgz
#tar -xvf hapmap3_snps.tgz
tar -xvf weights_hm3_no_hla.tgz

# make directory for logs
mkdir logs/


# interactive session
docker run -it -v $PWD:/data/ \
    rtibiocloud/ldsc:v1.0.1_0bb574e bash

In [None]:
date=20230426
window=100000
#for fdr in {"0.05","0.10"}; do # loop through each BED file
for fdr in "0.05"; do
    coord_file=/data/deg_bedfiles/meta_analysis_sumstats_no_singletons_20230214_fdr${fdr}_coord.tsv 
    geneset_file=/data/deg_bedfiles/meta_analysis_sumstats_no_singletons_20230214_fdr${fdr}_geneset.tsv

    # store processing files for each meta in separate dir
    mkdir -p /data/{annotations_ldscores,results}/fdr$fdr/

    for j in {1..22}; do # loop through each chromosome
        # create annotation files
        python /opt/ldsc/make_annot.py \
            --gene-set-file $geneset_file \
            --gene-coord-file $coord_file \
            --windowsize $window \
            --bimfile /data/1000g/1000G_EUR_Phase3_plink/1000G.EUR.QC.$j.bim \
            --annot-file /data/annotations_ldscores/fdr$fdr/oa_twas_meta_fdr${fdr}genes_window${window}_chr$j.annot.gz >> /data/logs/sldsc_${date}.log 2>&1

        # compute LD scores
        python /opt/ldsc/ldsc.py \
            --l2 \
            --thin-annot \
            --ld-wind-cm 1 \
            --print-snps /data/1000g/1000G_EUR_Phase3_baseline/print_snps.txt \
            --bfile /data/1000g/1000G_EUR_Phase3_plink/1000G.EUR.QC.$j \
            --annot /data/annotations_ldscores/fdr$fdr/oa_twas_meta_fdr${fdr}genes_window${window}_chr$j.annot.gz \
            --out /data/annotations_ldscores/fdr$fdr/oa_twas_meta_fdr${fdr}genes_window${window}_chr$j >> /data/logs/sldsc_${date}.log 2>&1
    done # end chr loop

    for trait in {"insomnia_jansen","insomnia_lane","long_sleep_duration_dashti","short_sleep_duration_dashti","sleep_duration_dashti","sleep_duration_jansen","adhd","age_of_initiation","alcohol_dependence","alzheimers_disease","amyotrophic_lateral_sclerosis","anorexia","autism","bipolar","brain_volume_mean_accumbens","brain_volume_mean_amygdala","brain_volume_mean_caudate","brain_volume_mean_hippocampus","brain_volume_mean_pallidum","brain_volume_mean_putamen","brain_volume_mean_thalamus","brain_volume_total_intracranial","cannabis_use_disorder","childhood_intelligence","cigarettes_per_day","college_completion","cotinine_levels","cross_disorder","depressive_symptoms","drinks_per_week","ftnd","heaviness_smoking_index","intelligence","lifetime_cannabis_use","major_depressive_disorder","neo_conscientiousness","neo_openness","neuroticism","opioid_addiction_144","opioid_addiction_gsem","parkinsons","ptsd","schizophrenia","smoking_cessation","smoking_initiation","subjective_wellbeing","years_of_education"}; do # loop through all traits
        case $trait in  # use sumstats files that corresponds to the trait name for the h2 estimate
            "insomnia_jansen") stats=/data/sumstats/Insomnia-Jansen_2019.sumstats.gz ;;
            "insomnia_lane") stats=/data/sumstats/Insomnia-Lane_2019.sumstats.gz ;;
            "long_sleep_duration_dashti") stats=/data/sumstats/LongSleepDur-Dashti_2019.sumstats.gz ;;
            "short_sleep_duration_dashti") stats=/data/sumstats/ShortSleepDur-Dashti_2019.sumstats.gz ;;
            "sleep_duration_dashti") stats=/data/sumstats/SleepDuration-Dashti_2019.sumstats.gz ;;
            "sleep_duration_jansen") stats=/data/sumstats/Sleepdur-Jansen_2019.sumstats.gz ;;
            "adhd") stats=/data/sumstats/daner_meta_filtered_NA_iPSYCH23_PGC11_sigPCs_woSEX_2ell6sd_EUR_Neff_70.meta.munged.merged.txt.gz ;;
            "age_of_initiation") stats=/data/sumstats/AgeOfInitiation.txt.munged.merged.txt.gz ;;
            "alcohol_dependence") stats=/data/sumstats/pgc_alcdep.eur_discovery.aug2018_release.txt.munged.merged.txt.gz ;;
            "alzheimers_disease") stats=/data/sumstats/alzheimers_disease_lambert2013_nat_genet.sumstats.gz ;;
            "amyotrophic_lateral_sclerosis") stats=/data/sumstats/amyotrophic_lateral_sclerosis_rheenen2016_nat_genet.sumstats.gz ;;
            "anorexia") stats=/data/sumstats/anorexia_watson2019_workflow_ready.txt.munged.merged.txt.gz ;;
            "autism") stats=/data/sumstats/iPSYCH-PGC_ASD_Nov2017.munged.merged.txt.gz ;;
            "bipolar") stats=/data/sumstats/daner_PGC_BIP32b_mds7a_0416a.munged.merged.txt.gz ;;
            "brain_volume_mean_accumbens") stats=/data/sumstats/ENIGMA2_MeanAccumbens_Combined_GenomeControlled_Jan23.tbl.sumstats.gz ;;
            "brain_volume_mean_amygdala") stats=/data/sumstats/ENIGMA2_MeanAmygdala_Combined_GenomeControlled_Jan23.tbl.sumstats.gz ;;
            "brain_volume_mean_caudate") stats=/data/sumstats/ENIGMA2_MeanCaudate_Combined_GenomeControlled_Jan23.tbl.sumstats.gz ;;
            "brain_volume_mean_hippocampus") stats=/data/sumstats/ENIGMA2_MeanHippocampus_Combined_GenomeControlled_Jan23.tbl.sumstats.gz ;;
            "brain_volume_mean_pallidum") stats=/data/sumstats/ENIGMA2_MeanPallidum_Combined_GenomeControlled_Jan23.tbl.sumstats.gz ;;
            "brain_volume_mean_putamen") stats=/data/sumstats/ENIGMA2_MeanPutamen_Combined_GenomeControlled_Jan23.tbl.sumstats.gz ;;
            "brain_volume_mean_thalamus") stats=/data/sumstats/ENIGMA2_MeanThalamus_Combined_GenomeControlled_Jan23.tbl.sumstats.gz ;;
            "brain_volume_total_intracranial") stats=/data/sumstats/ENIGMA2_ICV_Combined_GenomeControlled_Jan23.tbl.sumstats.gz ;;
            "cannabis_use_disorder") stats=/data/sumstats/CUD_GWAS_iPSYCH_June2019.munged.merged.txt.gz ;;
            "childhood_intelligence") stats=/data/sumstats/CHIC_Summary_Benyamin2014.sumstats.gz ;;
            "cigarettes_per_day") stats=/data/sumstats/CigarettesPerDay.txt.munged.merged.txt.gz ;;
            "college_completion") stats=/data/sumstats/SSGAC_College_Rietveld2013_publicrelease.sumstats.gz ;;
            "cotinine_levels") stats=/data/sumstats/cotinine_ware2016_workflow_ready.txt.munged.merged.txt.gz ;;
            "cross_disorder") stats=/data/sumstats/pgc.cross.full.2013-03.hg19.sumstats.gz ;;
            "depressive_symptoms") stats=/data/sumstats/DS_Full.txt.munged.merged.txt.gz ;;
            "drinks_per_week") stats=/data/sumstats/DrinksPerWeek.txt.munged.merged.txt.gz ;;
            "ftnd") stats=/data/sumstats/ftnd_wave3_eur_quach2020_workflow_ready.txt.munged.merged.txt.gz ;;
            "heaviness_smoking_index") stats=/data/sumstats/ukb_gwa_003_workflow_ready.txt.munged.merged.txt.gz ;;
            "intelligence") stats=/data/sumstats/intelligence_sniekers2017_nat_genet_sumstats_formatted.sumstats.gz ;;
            "lifetime_cannabis_use") stats=/data/sumstats/cannabis_icc_ukb_workflow_ready.txt.munged.merged.txt.gz ;;
            "major_depressive_disorder") stats=/data/sumstats/pgc_ukb_depression_gwas_workflow_ready.txt.munged.merged.txt.gz ;;
            "neo_conscientiousness") stats=/data/sumstats/GPC-1.NEO-CONSCIENTIOUSNESS.full.with_header.h19.sumstats.gz ;;
            "neo_openness") stats=/data/sumstats/GPC-1.NEO-OPENNESS.full.with_header.hg19.sumstats.gz ;;
            "neuroticism") stats=/data/sumstats/neuroticism_okbay2016_nat_genet.sumstats.gz ;;
            "opioid_addiction_144") stats=/data/sumstats/cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chrall.maf_gt_0.01.rsq_gt_0.8.sumstats_formatted.sumstats.gz ;;
            "opioid_addiction_gsem") stats=/data/sumstats/genomicSEM_GWAS.oaALL.MVP1_MVP2_YP_SAGE.PGC.Song.table.sumstats.gz ;;
            "parkinsons") stats=/data/sumstats/parkinsons_disease_sanchez2009_nat_genet.sumstats.gz ;;
            "ptsd") stats=/data/sumstats/pts_eur_freeze2_overall.results.munged.merged.txt.gz ;;
            "schizophrenia") stats=/data/sumstats/daner_natgen_pgc_eur.munged.merged.txt.gz ;;
            "smoking_cessation") stats=/data/sumstats/SmokingCessation.txt.munged.merged.txt.gz ;;
            "smoking_initiation") stats=/data/sumstats/SmokingInitiation.txt.munged.merged.txt.gz ;;
            "subjective_wellbeing") stats=/data/sumstats/SWB_Full.sumstats.gz ;;
            "years_of_education") stats=/data/sumstats/EA4_additive_excl_23andMe.sumstats.gz ;;
        esac

        # compute partitioned heritability estimate
        python /opt/ldsc/ldsc.py \
            --h2 $stats \
            --overlap-annot \
            --print-coefficients \
            --w-ld-chr "/data/weights_hm3_no_hla/weights." \
            --frqfile-chr "/data/1000g/1000G_Phase3_frq/1000G.EUR.QC." \
            --ref-ld-chr "/data/annotations_ldscores/fdr$fdr/oa_twas_meta_fdr${fdr}genes_window${window}_chr,/data/1000g/1000G_EUR_Phase3_baseline/baseline." \
            --out "/data/results/fdr$fdr/${trait}_with_oa_twas_meta_analysis_deg_genes_fdr${fdr}_window${window}" >> /data/logs/sldsc_${date}.log 2>&1
    done
done
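
For intuition about `--windowsize`: `make_annot.py` marks a SNP with 1 when it lies within the window around any gene in the gene set, and 0 otherwise. The following is a simplified sketch of that idea, not the LDSC implementation. The function name, the toy SNP positions, and the inclusive boundary handling are our assumptions, and the two gene intervals are borrowed from the GENCODE lookup printed earlier purely as example coordinates.

```python
def annotate_snps(snp_positions, gene_intervals, window):
    """Return 1/0 per SNP: 1 if the SNP is within `window` bp of any gene."""
    flags = []
    for pos in snp_positions:
        in_window = any(
            max(start - window, 0) <= pos <= end + window
            for start, end in gene_intervals
        )
        flags.append(1 if in_window else 0)
    return flags


# Example gene intervals on chr1 (start, end)
genes = [(12010, 13670), (14404, 29570)]
# Toy SNP positions, invented for illustration
snps = [5000, 100000, 129570, 129571, 500000]
print(annotate_snps(snps, genes, window=100000))
```

A larger window assigns more SNPs to the annotation, which raises `Prop._SNPs` in the results and dilutes any signal concentrated near the genes.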

In [None]:
cd results/
#for fdr in {"0.05","0.10"}; do
for fdr in "0.05"; do
    #outfile=fdr${fdr}/all_phenotypes_oa_twas_meta_analysis_deg_fdr${fdr}_window100000_final_results.tsv
    date=20230426 # date stamp of the S-LDSC run; set here in case this cell runs in a fresh shell
    outfile=${date}_all_phenotypes_oa_twas_meta_analysis_deg_fdr${fdr}_window100000_final_results.tsv
    touch $outfile
    head -1 fdr${fdr}/smoking_initiation_with_oa_twas_meta_analysis_deg_genes_fdr${fdr}_window100000.results > $outfile
        
    for file in fdr${fdr}/*_fdr${fdr}_window100000.results; do
        trait=$(echo $file |  sed "s/_with_oa_twas_meta_analysis_deg_genes_fdr.*//") # remove suffix
        trait=$(echo $trait |  sed "s/fdr$fdr\///") # remove directory prefix
        #echo $trait
        # take the first data row and replace its Category field with the trait name
        awk -v trait="$trait" 'BEGIN {OFS = "\t"} {$1 = trait; print}' \
            <(tail -n +2 "$file" | head -1) >> "$outfile"
    done
done
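
As a sanity check on the collated table, the columns of an LDSC `.results` file are related by simple arithmetic: `Enrichment` is `Prop._h2 / Prop._SNPs`, and `Coefficient_z-score` is `Coefficient / Coefficient_std_error`. The sketch below checks this for the `adhd` and `bipolar` rows of the results table that follows. The one-sided P-value derived from the coefficient z-score is our assumption about how LDSC-SEG significance is usually assessed; it is not a column in the file.

```python
import math

# Values copied from the adhd and bipolar rows of the results table
rows = {
    "adhd": dict(prop_snps=0.0285585403778, prop_h2=0.0218416626472,
                 enrichment=0.764803185254, coef=-1.7931184739e-08,
                 coef_se=2.66764276906e-08, coef_z=-0.672173386443),
    "bipolar": dict(prop_snps=0.0285585403778, prop_h2=0.0553429664919,
                    enrichment=1.93787797835, coef=8.14007177363e-08,
                    coef_se=3.69282398094e-08, coef_z=2.20429454955),
}

for trait, r in rows.items():
    enrichment = r["prop_h2"] / r["prop_snps"]
    z = r["coef"] / r["coef_se"]
    # Assumed one-sided test of coefficient > 0: P(Z > z)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    print(f"{trait}\tenrichment={enrichment:.4f}\tz={z:.4f}\tp_one_sided={p_one_sided:.4f}")
```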


## `20230426_all_phenotypes_oa_twas_meta_analysis_deg_fdr0.05_window100000_final_results.tsv`

| Category                        | Prop._SNPs      | Prop._h2         | Prop._h2_std_error | Enrichment      | Enrichment_std_error | Enrichment_p          | Coefficient        | Coefficient_std_error | Coefficient_z-score |
|---------------------------------|-----------------|------------------|--------------------|-----------------|----------------------|-----------------------|--------------------|-----------------------|---------------------|
| adhd                            | 0.0285585403778 | 0.0218416626472  | 0.0090670100208    | 0.764803185254  | 0.317488565622       | 0.45892889838275241   | -1.7931184739e-08  | 2.66764276906e-08     | -0.672173386443     |
| age_of_initiation               | 0.0285585403778 | 0.0502879966467  | 0.0108837018478    | 1.7608741897    | 0.381101474508       | 0.04433372279776529   | 4.58042879956e-09  | 2.60758075056e-09     | 1.7565817659        |
| alcohol_dependence              | 0.0285585403778 | 0.0548965248059  | 0.0348263100259    | 1.92224546772   | 1.21947093812        | 0.44728299424909179   | 1.77590001351e-08  | 2.08756741059e-08     | 0.850703074064      |
| alzheimers_disease              | 0.0285585403778 | 0.0921687084102  | 0.0243617232568    | 3.22736061405   | 0.853045111356       | 0.0048173643188058899 | 1.27574511214e-08  | 1.0089860799e-08      | 1.26438326312       |
| amyotrophic_lateral_sclerosis   | 0.0285585403778 | 0.0822621328248  | 0.0487384989315    | 2.8804739926    | 1.7066172951         | 0.22948851915609661   | 6.40013438058e-09  | 1.30449809916e-08     | 0.490620445113      |
| anorexia                        | 0.0285585403778 | 0.0385441021739  | 0.00788139227577   | 1.34965238643   | 0.275973217521       | 0.20969636467222094   | 1.22590059351e-08  | 1.06589852453e-08     | 1.1501100389        |
| autism                          | 0.0285585403778 | 0.0451919217177  | 0.0100902931505    | 1.58243107385   | 0.353319638083       | 0.10359815584061807   | 1.80297408629e-08  | 1.14901887822e-08     | 1.56914226603       |
| bipolar                         | 0.0285585403778 | 0.0553429664919  | 0.00893782758731   | 1.93787797835   | 0.312965139992       | 0.0035877202244739411 | 8.14007177363e-08  | 3.69282398094e-08     | 2.20429454955       |
| brain_volume_mean_accumbens     | 0.0285585403778 | 0.109255847429   | 0.0609878222465    | 3.82568037383   | 2.13553709117        | 0.15008412778879612   | 4.77232973376e-08  | 3.68745676338e-08     | 1.29420628905       |
| brain_volume_mean_amygdala      | 0.0285585403778 | 3.35293650887    | 326.834431013      | 117.405737986   | 11444.3674883        | 0.44009345502786612   | 2.51653696577e-08  | 3.46107487859e-08     | 0.727096943592      |
| brain_volume_mean_caudate       | 0.0285585403778 | 0.0136770335025  | 0.0275876556431    | 0.478912203548  | 0.966003699004       | 0.59298779856737416   | -4.97303271716e-08 | 3.36165043058e-08     | -1.47934260859      |
| brain_volume_mean_hippocampus   | 0.0285585403778 | 0.0637949162309  | 0.038899108992     | 2.23382971913   | 1.36208323245        | 0.35649409787054409   | 2.32031469882e-08  | 3.40504896763e-08     | 0.681433577278      |
| brain_volume_mean_pallidum      | 0.0285585403778 | 0.0503240374619  | 0.0426661931661    | 1.7621361875    | 1.49399068025        | 0.60741518890954715   | 5.06791284829e-09  | 3.70349778731e-08     | 0.136841254925      |
| brain_volume_mean_putamen       | 0.0285585403778 | 0.0374269259307  | 0.0151159110264    | 1.31053357194   | 0.52929564419        | 0.55151764306165774   | -4.73503026e-09    | 3.19535506888e-08     | -0.148184791923     |
| brain_volume_mean_thalamus      | 0.0285585403778 | 0.0195029497787  | 0.0320422689719    | 0.68291129451   | 1.12198552685        | 0.77286313039780552   | -2.66088367194e-08 | 3.35168362924e-08     | -0.793894641107     |
| brain_volume_total_intracranial | 0.0285585403778 | 0.0930995928961  | 0.0480732892671    | 3.25995627453   | 1.68332444975        | 0.11555899537207429   | 3.84828279833e-08  | 4.57753075027e-08     | 0.840689666171      |
| cannabis_use_disorder           | 0.0285585403778 | 0.0498441178592  | 0.0556938523225    | 1.745331421     | 1.95016452472        | 0.69946909167385385   | 1.48555754541e-08  | 4.84808134505e-08     | 0.306421744951      |
| childhood_intelligence          | 0.0285585403778 | 0.0468767053935  | 0.0228603508995    | 1.64142511394   | 0.800473364434       | 0.41204460609808291   | 1.63832454921e-08  | 2.26603170072e-08     | 0.722992775735      |
| cigarettes_per_day              | 0.0285585403778 | 0.056668713115   | 0.0145259218004    | 1.98430005054   | 0.508636702305       | 0.04906691268293336   | 8.41099793927e-09  | 5.32286141427e-09     | 1.58016474311       |
| college_completion              | 0.0285585403778 | 0.0396086968536  | 0.0119682570927    | 1.38693001567   | 0.419078038806       | 0.35480762818213707   | 5.134984481e-09    | 6.31810272158e-09     | 0.812741531323      |
| cotinine_levels                 | 0.0285585403778 | -0.14065670065   | 1.26424078357      | -4.92520621813  | 44.2683963132        | 0.29443644112753609   | -9.28404690287e-08 | 8.70489220635e-08     | -1.06653209285      |
| cross_disorder                  | 0.0285585403778 | 0.0452115174308  | 0.0100135703059    | 1.58311723333   | 0.350633126675       | 0.097777884898957251  | 1.1309618687e-08   | 9.24556977886e-09     | 1.22324734522       |
| depressive_symptoms             | 0.0285585403778 | 0.0132509568445  | 0.0126849673168    | 0.46399279057   | 0.444174217205       | 0.22521914278205299   | -2.85187018978e-09 | 3.14264917979e-09     | -0.907473289773     |
| drinks_per_week                 | 0.0285585403778 | 0.0437547063271  | 0.00793313759707   | 1.53210583413   | 0.277785121092       | 0.057314547168120081  | 2.84252849979e-09  | 2.0694913921e-09      | 1.37353965841       |
| ftnd                            | 0.0285585403778 | 0.0533257339227  | 0.0220124339059    | 1.86724297591   | 0.770782876672       | 0.227983534914764     | 7.87051975194e-09  | 9.21694421151e-09     | 0.853918562522      |
| heaviness_smoking_index         | 0.0285585403778 | -0.0114019455374 | 0.0345535304881    | -0.399248189388 | 1.2099193457         | 0.22156439126918423   | -2.00037511266e-08 | 1.32519029778e-08     | -1.50950027027      |
| insomnia_jansen                 | 0.0285585403778 | 0.0466908798907  | 0.00746965759594   | 1.63491828619   | 0.261556000311       | 0.014890191507080573  | 3.63161114359e-09  | 1.83225738954e-09     | 1.98204202331       |
| insomnia_lane                   | 0.0285585403778 | 0.0337556404988  | 0.00525726230056   | 1.18198059327   | 0.18408721983        | 0.32523730810315699   | 8.56168777159e-10  | 1.72157152738e-09     | 0.49731815585       |
| intelligence                    | 0.0285585403778 | 0.0286979240168  | 0.00703996064471   | 1.00488062895   | 0.246509819885       | 0.9842284869915352    | -3.24015541385e-09 | 7.30372660017e-09     | -0.443630435698     |
| lifetime_cannabis_use           | 0.0285585403778 | 0.041688556005   | 0.010576488658     | 1.45975793768   | 0.370344160382       | 0.2167117382330421    | 4.61541677428e-09  | 3.97417704299e-09     | 1.16135157653       |
| long_sleep_duration_dashti      | 0.0285585403778 | 0.0519581112113  | 0.00972696151918   | 1.81935457919   | 0.340597292106       | 0.015149368946889265  | 2.44894416328e-09  | 1.15667258757e-09     | 2.11723195449       |
| major_depressive_disorder       | 0.0285585403778 | 0.0313116278636  | 0.00534241592517   | 1.09640154747   | 0.187068941707       | 0.60684181128177894   | 1.84547879528e-09  | 1.9105546643e-09      | 0.965938755779      |
| neo_conscientiousness           | 0.0285585403778 | 0.0872372705898  | 0.0606434161416    | 3.05468239748   | 2.12347743755        | 0.26499568499757148   | 2.86039983162e-08  | 2.79695428398e-08     | 1.02268379859       |
| neo_openness                    | 0.0285585403778 | 0.0131477552503  | 0.0412141387912    | 0.460379104689  | 1.4431458417         | 0.7039275394390937    | -2.54707284418e-09 | 2.37021220099e-08     | -0.107461806294     |
| neuroticism                     | 0.0285585403778 | 0.0400072017286  | 0.00753811520689   | 1.40088398074   | 0.263953097994       | 0.13255689617427546   | 5.78300756976e-09  | 3.68265835747e-09     | 1.57033507005       |
| opioid_addiction_144            | 0.0285585403778 | 0.0485327390292  | 0.0210483557255    | 1.69941244851   | 0.737024912584       | 0.34693489982297065   | 1.23287901682e-09  | 1.55595945026e-09     | 0.792359348836      |
| opioid_addiction_gsem           | 0.0285585403778 | 0.0227665366945  | 0.0129875497636    | 0.797188385448  | 0.45476938218        | 0.65671730995442534   | -4.08742007091e-09 | 6.21762057624e-09     | -0.657392972245     |
| parkinsons                      | 0.0285585403778 | 0.0181220846509  | 0.0377686844456    | 0.634559204049  | 1.32250051809        | 0.78266557298464967   | -1.07449345627e-07 | 1.03802934431e-07     | -1.03512820919      |
| ptsd                            | 0.0285585403778 | 0.0158311448575  | 0.0252035463427    | 0.554340125512  | 0.882522216097       | 0.61399475633185308   | -2.11969766734e-09 | 5.90378681398e-09     | -0.359040347175     |
| schizophrenia                   | 0.0285585403778 | 0.0439524960139  | 0.00518827391697   | 1.53903159729   | 0.181671536722       | 0.0040658986922064563 | 5.79191332067e-08  | 2.90745874737e-08     | 1.99208787602       |
| short_sleep_duration_dashti     | 0.0285585403778 | 0.0290202867982  | 0.00759054637314   | 1.01616841807   | 0.265789016971       | 0.95145320502812791   | -6.25991327036e-10 | 1.89010402753e-09     | -0.331194113085     |
| sleep_duration_dashti           | 0.0285585403778 | 0.0332153737468  | 0.00546705152353   | 1.16306272335   | 0.191433156289       | 0.39363104017150297   | 6.41672872362e-10  | 2.03834723678e-09     | 0.314800570179      |
| sleep_duration_jansen           | 0.0285585403778 | 0.0297621507776  | 0.00570452852212   | 1.04214537521   | 0.199748602227       | 0.83297985102434069   | -3.57065485203e-10 | 2.15179662249e-09     | -0.165938305447     |
| smoking_cessation               | 0.0285585403778 | 0.0559441827237  | 0.0117306565981    | 1.95893004277   | 0.410758268557       | 0.018762694696852374  | 4.25079654331e-09  | 2.11565124529e-09     | 2.00921420899       |
| smoking_initiation              | 0.0285585403778 | 0.0368913203302  | 0.00490912864099   | 1.29177891594   | 0.171897042918       | 0.088340120785442969  | 3.04574391241e-09  | 1.81285272096e-09     | 1.68008348234       |
| subjective_wellbeing            | 0.0285585403778 | 0.0310637889443  | 0.0120178613071    | 1.08772327063   | 0.420814969818       | 0.83517060985567571   | 1.27967917034e-10  | 1.67202258157e-09     | 0.076534801889      |
| years_of_education              | 0.0285585403778 | 0.0399693020891  | 0.00366596731897   | 1.39955689473   | 0.128366760712       | 0.0021923834379482966 | 7.45857708254e-09  | 2.81437949099e-09     | 2.65016750812       |
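
The derived columns in the table above can be sanity-checked by hand. The sketch below assumes the columns follow the standard LDSC `.results` layout (proportion of SNPs, proportion of h2 and its standard error, enrichment and its standard error, enrichment p-value, coefficient, coefficient standard error, coefficient z-score). It recomputes the enrichment and the coefficient z-score for three rows copied from the table, then converts each z-score to a one-sided p-value for a positive coefficient, which is a common way to read LDSC-SEG output. The function names are illustrative and are not part of LDSC.

```python
import math

# Values copied from three rows of the table above.
# Assumption: columns are in the standard LDSC .results order.
rows = {
    "years_of_education": {
        "prop_snps": 0.0285585403778,
        "prop_h2": 0.0399693020891,
        "coef": 7.45857708254e-09,
        "coef_se": 2.81437949099e-09,
    },
    "bipolar": {
        "prop_snps": 0.0285585403778,
        "prop_h2": 0.0553429664919,
        "coef": 8.14007177363e-08,
        "coef_se": 3.69282398094e-08,
    },
    "intelligence": {
        "prop_snps": 0.0285585403778,
        "prop_h2": 0.0286979240168,
        "coef": -3.24015541385e-09,
        "coef_se": 7.30372660017e-09,
    },
}


def enrichment(prop_h2, prop_snps):
    """Enrichment = share of heritability divided by share of SNPs."""
    return prop_h2 / prop_snps


def coefficient_z(coef, coef_se):
    """Z-score of the annotation coefficient."""
    return coef / coef_se


def one_sided_p(z):
    """Upper-tail p-value of a standard normal, i.e. P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


for name, r in rows.items():
    z = coefficient_z(r["coef"], r["coef_se"])
    print(
        f"{name}: enrichment={enrichment(r['prop_h2'], r['prop_snps']):.4f}, "
        f"z={z:.4f}, one-sided p={one_sided_p(z):.4g}"
    )
```

A negative z-score, as for `intelligence`, yields a one-sided p-value above 0.5, so such rows carry no evidence of positive enrichment under this test. Any multiple-testing threshold applied across phenotypes is a separate choice and is not encoded here.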