### Interactive Playbook for iderare_pheno library

#### Part 1 - Standardizing the terminology

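This notebook is an interactive playbook that demonstrates how to use the iderare_pheno library, which generates phenotypes for the iderare dataset. Part 1 covers the converter module, which maps clinical codes from terminologies such as SNOMEDCT, LOINC, ICD-10, and ORPHA to ORPHA, HPO, and OMIM codes.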

In [None]:
from iderare_pheno.converter import term2omim, term2orpha, term2hpo, batchconvert

In [None]:
# Example: convert a SNOMEDCT code to ORPHA code(s) with term2orpha, which returns a list of ORPHA codes
snomed_code = 'SNOMEDCT:190794006'
term2orpha_res = term2orpha(snomed_code)
print('Orpha List for {}'.format(snomed_code), term2orpha_res)

In [None]:
# Example: convert LOINC / SNOMEDCT codes to HPO with term2hpo, which returns a list of HPO codes
loinc_code = 'LOINC:1751-7|L'
snomed_code = 'SNOMEDCT:75183008'

term2hpo_res = term2hpo(loinc_code) + term2hpo(snomed_code)
print('HPO List result after converting LOINC and SNOMEDCT to HPO are', term2hpo_res)

In [None]:
# Example: convert ICD10 and ORPHA codes to OMIM with term2omim, which returns a list of OMIM codes
icd10_code = 'ICD-10:E74.0'
orpha_code = 'ORPHA:355'

term2omim_res = term2omim(icd10_code) + term2omim(orpha_code)
print('OMIM List result after converting ICD10 and ORPHA to OMIM are', term2omim_res)

In [None]:
# Example: automatically convert a mixed list of SNOMEDCT, LOINC, ICD10, OMIM, HPO, and ORPHA codes to OMIM and HPO with batchconvert
clinical_data_list = ['SNOMEDCT:258211005', 'SNOMEDCT:36760000', 'SNOMEDCT:271737000', 'SNOMEDCT:389026000', 'SNOMEDCT:70730006', 'SNOMEDCT:127035006', 'SNOMEDCT:33688009', 'SNOMEDCT:75183008', 'SNOMEDCT:59927004', 'SNOMEDCT:312894000', 'SNOMEDCT:56786000', 'SNOMEDCT:1388004', 'LOINC:1751-7|L', 'LOINC:2085-9|L', 'LOINC:777-3|L', 'LOINC:2519-7|H', 'LOINC:1742-6|H', 'LOINC:1920-8|H', 'HP:0002366', 'HP:0006568', 'HP:0004333', 'HP:0001531', 'SNOMEDCT:65959000', 'SNOMEDCT:190794006', 'SNOMEDCT:66751000', 'ICD-10:E74.0']

# batchconvert returns two lists: the HPO codes and the diagnosis (OMIM) codes
hpo_sets, diagnosis_sets = batchconvert(clinical_data_list)

print('HPO List result after converting mixed codes to HPO are', hpo_sets)
print('OMIM List result after converting mixed codes to OMIM are', diagnosis_sets)
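
The codes used throughout Part 1 share a `PREFIX:CODE` shape, and the LOINC examples carry a trailing `|L` or `|H` marker. As a minimal sketch of how such strings could be split and grouped by terminology before conversion, the helpers below (`parse_code` and `group_by_system`) are hypothetical names introduced for illustration only. They are not part of iderare_pheno, and the sketch treats the trailing marker as an opaque flag because the notebook does not define it.

```python
from typing import NamedTuple, Optional


class ParsedCode(NamedTuple):
    system: str           # terminology prefix, e.g. "SNOMEDCT" or "LOINC"
    code: str             # identifier within that terminology
    flag: Optional[str]   # trailing "|L" / "|H" marker, if present


def parse_code(term: str) -> ParsedCode:
    """Split 'SYSTEM:CODE' or 'SYSTEM:CODE|FLAG' into its parts (hypothetical helper)."""
    system, sep, rest = term.partition(":")
    if not sep or not system or not rest:
        raise ValueError(f"expected 'SYSTEM:CODE', got {term!r}")
    code, _, flag = rest.partition("|")
    return ParsedCode(system, code, flag or None)


def group_by_system(terms):
    """Group parsed codes by their terminology prefix, keeping input order."""
    groups = {}
    for term in terms:
        parsed = parse_code(term)
        groups.setdefault(parsed.system, []).append(parsed)
    return groups


sample = ["SNOMEDCT:75183008", "LOINC:1751-7|L", "HP:0002366", "ICD-10:E74.0"]
for system, items in group_by_system(sample).items():
    print(system, [(p.code, p.flag) for p in items])
```

Grouping first makes it easy to see which terminologies a mixed list contains before handing it to a converter.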

#### Part 2 - Similarity and ontology linkage analysis between the phenotype and given differential diagnoses

This part covers the similarity and ontology linkage analysis between the phenotype and the differential diagnoses.

Given ```diagnosis_sets``` and ```hpo_sets``` as input, the analysis is performed with the ```iderare_pheno.simrec``` (similarity & recommendation) module.

This part produces two results: a ```full``` linkage and similarity analysis of diagnosis_sets, and a ```filtered``` one. The ```filtered``` analysis shows only the diagnoses whose similarity score is **greater than the preset threshold**, or the **n-top diagnoses** if no disease passes the minimum threshold.
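
The threshold-with-fallback rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: `filter_diagnoses` is a hypothetical helper, and the disease names and scores are made-up toy values.

```python
def filter_diagnoses(scored, threshold=0.5, top_n=100):
    """Keep diagnoses scoring above `threshold`; if none do, fall back to the top `top_n`.

    `scored` is a list of (disease_name, similarity_score) pairs (toy structure).
    """
    # Sort descending by similarity score
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    # Strictly greater than the threshold, as described in the text
    passing = [pair for pair in ranked if pair[1] > threshold]
    return passing if passing else ranked[:top_n]


toy_scores = [("Disease A", 0.72), ("Disease B", 0.41), ("Disease C", 0.55)]

# Two diseases pass the 0.5 threshold, so the fallback is not used
print(filter_diagnoses(toy_scores, threshold=0.5, top_n=1))

# Nothing passes 0.9, so the top-1 fallback applies
print(filter_diagnoses(toy_scores, threshold=0.9, top_n=1))
```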

The similarity score is calculated using the ```graph``` method with the ```bma``` (best-match average) strategy.
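
To make the best-match average strategy concrete, here is a minimal sketch of the commonly used definition: each term in one set is matched to its most similar term in the other set, the best matches are averaged in both directions, and the two means are averaged. The pairwise scores below are made-up toy values standing in for whatever term-to-term method is configured, `best_match_average` is a hypothetical helper, and the library's exact computation may differ.

```python
def best_match_average(set_a, set_b, sim):
    """Combine pairwise term similarities into one set-level score (BMA sketch)."""
    if not set_a or not set_b:
        return 0.0
    # Best match in set_b for every term in set_a, and vice versa
    a_to_b = [max(sim(a, b) for b in set_b) for a in set_a]
    b_to_a = [max(sim(a, b) for a in set_a) for b in set_b]
    # Average the two directional means
    return (sum(a_to_b) / len(a_to_b) + sum(b_to_a) / len(b_to_a)) / 2


# Toy pairwise similarities between patient terms (p*) and disease terms (d*)
TOY_SIM = {
    ("p1", "d1"): 0.8, ("p1", "d2"): 0.2,
    ("p2", "d1"): 0.4, ("p2", "d2"): 0.6,
}


def toy_sim(a, b):
    return TOY_SIM[(a, b)]


score = best_match_average(["p1", "p2"], ["d1", "d2"], toy_sim)
print(round(score, 3))
```

Averaging in both directions keeps the score symmetric, so a disease with many unmatched terms is penalized just as a patient with many unmatched phenotypes is.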

In [None]:
from iderare_pheno.simrec import hpo2omim_similarity

In [None]:
## First, set the similarity threshold and the number of top differential diagnoses to analyze
thr = 0.5 # Filter option 1: keep diagnoses with a similarity score above 0.5
diffx = 100 # Filter option 2 / fallback if none pass the threshold: keep the top 100 differential diagnoses

In [None]:
## hpo2omim_similarity returns 3 objects:
# 1. first (list): the similarity scores
# 2. second (array): the linkage, disease names, and OMIM disease IDs of all differential diagnoses, sorted descending by similarity score
# 3. third (array): the linkage, disease names, and OMIM disease IDs of the filtered diagnoses, sorted descending by similarity score

s_sim, [lnk_all, sr_dis_name, sr_dis_id], [lnk_thr, sr_dis_name_thr, sr_dis_id_thr] = hpo2omim_similarity(diagnosis_sets, hpo_sets, threshold=thr, differential=diffx)

#### Part 3 - Getting the recommendation of diagnoses based on the similarity and ontology linkage analysis between the phenotype provided and OMIM databases

This part covers how to use the gene and phenotype differential diagnoses set generator to build an OMIM-based differential diagnoses set from the phenotype provided.

#### Part 4 - Other utilities

These utility functions are helpers that transform the data into the object / serialized format of hpo3, along with other file conversion functions.