Extracting disease-specific genomic coordinates from GWAS catalog

Disease- or trait-specific SNP sets, genomic coordinates

A collection of datasets from various publications containing genomic coordinates of disease- and/or trait-associated SNPs, together with scripts for processing them.

Autoimmune diseases

  • autoimmune folder. Description of autoimmune-related genomics datasets. R.GR.autoimmune - working folder with an R project for the analysis of 39 disease/trait-associated SNP sets.

  • gwasCatalog folder. Scripts to extract the coordinates of disease-specific SNP sets into separate files. Description of genomics datasets and databases related to complex diseases.

  • tumorportal folder. Description of genomics datasets and databases related to cancers.

  • population folder. Individual-specific genotypes of various populations. See that folder for details.

Large data collections are in the data subfolders of the autoimmune, gwasCatalog, and tumorportal folders. Each subfolder has its own README file with the dataset-specific explanations.

Disease-disease similarities

Disease-disease similarity based on symptom similarity. Zhou X, Menche J, Barabási A-L, Sharma A: Human symptoms-disease network. Nat Commun 2014, 5(May):4212.

  • human-cooccur-disease-network.txt.gz - data from Supplementary Data 1. List of all 4,442 diseases within PubMed and their occurrence. To see which diseases are the most frequently studied, sort the list by occurrence: zcat < human-cooccur-disease-network.txt.gz | sort -k2 -n -r > human-cooccur-disease-names.txt

  • human-sig-disease-network.txt.gz - data from Supplementary Data 4. List of disease links in the disease network with both significant shared symptoms and shared genes/PPIs. In total there are 133,106 such connections between 1,596 distinct diseases. The table has 3 columns: "MeSH Disease Term", "MeSH Disease Term", "symptom similarity score".

  • human-disease-to-UMLS.xlsx - data from Supplementary Table 6. Mapping from HPO phenotypes to UMLS semantic types (from UMLS 2012AA), 33,977 records.

  • human-disease-to-SNOMED.xlsx - data from Supplementary Table 7. SNOMED-CT symptom-disease relationships. The data file has six components: disease-symptom relationships, disease list, disease terms, symptom list, symptom terms and SNOMED semantic types. There are 2,340 records of disease-symptom relationships, which include 1,623 diseases and 817 symptoms. The SNOMED semantic type component lists the semantic types of concepts and their numbers in SNOMED.

  • Rzhetsky_A_Appendix3.pdf - A list of diseases, their ICD9 codes, and brief descriptions. Rzhetsky, A., Wajngurt, D., Park, N., & Zheng, T. (2007). Probing genetic overlap among complex human phenotypes. Proceedings of the National Academy of Sciences of the United States of America, 104, 11694–11699. doi:10.1073/pnas.0704820104
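
The two gzip-compressed network files listed above can also be inspected from Python. The following is a minimal sketch, not part of the repository's scripts. It assumes both files are tab-delimited, that human-cooccur-disease-network.txt.gz holds the disease name in column 1 and its occurrence count in column 2, and that human-sig-disease-network.txt.gz holds the three columns described above. The function names read_rows, top_diseases, and network_summary are hypothetical.

```python
import csv
import gzip


def read_rows(path):
    """Yield rows from a gzip-compressed text file.

    Assumption: the file is tab-delimited; adjust the delimiter if it is not.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def top_diseases(rows, n=10):
    """Return the n (disease, occurrence) pairs with the highest occurrence.

    Assumption: column 1 is the disease name, column 2 an integer count.
    Rows whose second column is not an integer (e.g. a header) are skipped.
    """
    pairs = []
    for row in rows:
        if len(row) < 2:
            continue
        try:
            pairs.append((row[0], int(row[1])))
        except ValueError:
            continue
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


def network_summary(rows):
    """Return (number of links, number of distinct diseases).

    Assumption: each row is (disease, disease, symptom similarity score).
    Rows whose third column is not numeric (e.g. a header) are skipped.
    """
    diseases = set()
    links = 0
    for row in rows:
        if len(row) < 3:
            continue
        try:
            float(row[2])
        except ValueError:
            continue
        diseases.update(row[:2])
        links += 1
    return links, len(diseases)
```

For example, top_diseases(read_rows("human-cooccur-disease-network.txt.gz")) would list the most frequently studied diseases, and network_summary(read_rows("human-sig-disease-network.txt.gz")) can be compared against the 133,106 connections and 1,596 diseases quoted above as a sanity check on the file layout.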

Disease-disease relationships based on gene/protein interaction networks. Suthram S, Dudley JT, Chiang AP, Chen R, Hastie TJ, Butte AJ: Network-based elucidation of human disease similarities reveals common functional modules enriched for pluripotent drug targets. PLoS Comput Biol 2010, 6:1–10.

  • Suthram_TableS2_diseases-umls.xlsx - data from Supplementary Table S1. List of 54 diseases, their UMLS IDs and GEO IDs.

  • Suthram_TableS2_disease-relationships.xlsx - data from Supplementary Table S2. List of the 138 significant disease-disease correlations. Other correlations are not significant.
