# RareCrowds

In [1]:
#!pip install rarecrowds
#!sudo apt install graphviz

## DiseaseAnnotations

Disease information is extracted from Orphanet's Orphadata (product 4, product 9 (prevalence) and product 9 (ages)) and from the HPOA file created by the Monarch Initiative within the HPO project. By default, the Orphanet and OMIM phenotypic descriptions of a rare disease, both extracted from the HPOA file, are intersected. In principle, you do not need to parse the data provided by these institutions yourself.


In [2]:
from rarecrowds import DiseaseAnnotations
dann = DiseaseAnnotations(mode="intersect")
data = dann.data["ORPHA:324"]
data

{'source': {},
 'phenotype': {'HP:0000083': {'frequency': 'HP:0040281',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0000093': {'frequency': 'HP:0040282',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0000822': {'frequency': 'HP:0040283',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0000823': {'frequency': 'HP:0040282',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0000966': {'frequency': 'HP:0040281',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0001004': {'frequency': 'HP:0040283',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0001014': {'frequency': 'HP:0040281',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0001131': {'frequency': 'HP:0040281',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0001250': {'frequency': 'HP:0040283',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0001635': {'frequency': 'HP:0040281',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0001681': {'frequency': 'HP:0040283',
   'modifier': {'diagnosticC
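Each phenotype entry stores its frequency as an ID from the HPO frequency subontology. As a rough sketch of how such a record could be summarized, the snippet below groups terms by frequency label. The `FREQUENCY_LABELS` table is hard-coded here for illustration (it is not read from RareCrowds), `group_by_frequency` is a hypothetical helper, and `record` reproduces only the first three entries of the output above.

```python
from collections import defaultdict

# Labels of the HPO frequency subontology, hard-coded for illustration.
FREQUENCY_LABELS = {
    "HP:0040280": "Obligate",
    "HP:0040281": "Very frequent",
    "HP:0040282": "Frequent",
    "HP:0040283": "Occasional",
    "HP:0040284": "Very rare",
    "HP:0040285": "Excluded",
}

# First three phenotype entries of the ORPHA:324 record shown above.
record = {
    "phenotype": {
        "HP:0000083": {"frequency": "HP:0040281",
                       "modifier": {"diagnosticCriteria": True}},
        "HP:0000093": {"frequency": "HP:0040282",
                       "modifier": {"diagnosticCriteria": True}},
        "HP:0000822": {"frequency": "HP:0040283",
                       "modifier": {"diagnosticCriteria": True}},
    }
}


def group_by_frequency(record):
    """Map each frequency label to the HPO terms annotated with it."""
    groups = defaultdict(list)
    for term, info in record.get("phenotype", {}).items():
        label = FREQUENCY_LABELS.get(info.get("frequency"), "Unknown")
        groups[label].append(term)
    return dict(groups)


grouped = group_by_frequency(record)
print(grouped)
```

Terms whose frequency ID is missing or not in the table end up under `"Unknown"` instead of raising an error.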

Based on this data, you can subset the annotations to obtain a list of diseases of interest. Doing so is highly recommended at the start of developing a phenotype-based prediction algorithm:

In [3]:
# `dann` comes from the previous cell
ann = dann.data
print(f"# total initial entities: {len(ann)}")
## Keep only disorders
for dis, val in list(ann.items()):
    if val["group"] != "Disorder":
        del ann[dis]
print(f"# diseases: {len(ann)}")
## Keep only those with phenotypic information
for dis, val in list(ann.items()):
    if not val.get("phenotype"):
        del ann[dis]
print(f"# diseases with phenotype data: {len(ann)}")
## Remove clinical syndromes
for dis, val in list(ann.items()):
    if val["type"].lower() == "clinical syndrome":
        del ann[dis]
print(f"# diseases w/o clinical syndromes: {len(ann)}")
## Keep only selected prevalences
valid_prev = [
    ">1 / 1000",
    "6-9 / 10 000",
    "1-5 / 10 000",
    "1-9 / 100 000",
    "Unknown",
    "Not yet documented",
]
for dis, val in list(ann.items()):
    if "prevalence" in val:
        classes = [
            a["class"] for a in val["prevalence"] if a["type"] == "Point prevalence"
        ]
        if not any(x in valid_prev for x in classes):
            del ann[dis]
    else:
        del ann[dis]
print(f"# diseases with valid prevalence: {len(ann)}")


# total initial entities: 3617
# diseases: 3258
# diseases with phenotype data: 1920
# diseases w/o clinical syndromes: 1916
# diseases with valid prevalence: 527
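The cell above deletes entries from `ann` as it goes. If you would rather leave the original annotations untouched, the same four filters can be expressed as a single predicate that builds a new dictionary. This is only a sketch: `keep` and `filter_annotations` are hypothetical helpers, the field names are taken from the cell above, and the toy records are invented for illustration and are not real Orphanet entries.

```python
VALID_PREVALENCE = {
    ">1 / 1000",
    "6-9 / 10 000",
    "1-5 / 10 000",
    "1-9 / 100 000",
    "Unknown",
    "Not yet documented",
}


def keep(record, valid_prev=VALID_PREVALENCE):
    """Return True if a record passes all four filters used above."""
    if record.get("group") != "Disorder":
        return False
    if not record.get("phenotype"):
        return False
    if record.get("type", "").lower() == "clinical syndrome":
        return False
    classes = [
        p.get("class")
        for p in record.get("prevalence", [])
        if p.get("type") == "Point prevalence"
    ]
    return any(c in valid_prev for c in classes)


def filter_annotations(annotations):
    """Build a filtered copy; the input dictionary is not modified."""
    return {dis: rec for dis, rec in annotations.items() if keep(rec)}


# Toy records, invented for illustration only.
toy = {
    "ORPHA:A": {
        "group": "Disorder",
        "type": "Disease",
        "phenotype": {"HP:0000083": {}},
        "prevalence": [{"type": "Point prevalence", "class": "1-9 / 100 000"}],
    },
    "ORPHA:B": {"group": "Group of disorders", "type": "Category"},
    "ORPHA:C": {
        "group": "Disorder",
        "type": "Clinical syndrome",
        "phenotype": {"HP:0001250": {}},
        "prevalence": [{"type": "Point prevalence", "class": "Unknown"}],
    },
    "ORPHA:D": {
        "group": "Disorder",
        "type": "Disease",
        "phenotype": {"HP:0001250": {}},
    },
}

subset = filter_annotations(toy)
print(sorted(subset))
```

Unlike the cell above, this version uses `.get`, so a record with a missing `group` or `type` field is dropped by the group check or passed through the type check instead of raising a `KeyError`.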


## HPO
Symptoms are organized through the Human Phenotype Ontology (HPO). If you are not familiar with it, please visit the HPO website.

RareCrowds includes the HPO module to look up specific symptom IDs and other items in the ontology, such as the frequency subontology. This module gives you access to each term and to the relationships between terms.

To get information about a specific HPO term, run the following lines:

In [4]:
from rarecrowds import Hpo
hpo = Hpo()
hpo["HP:0000083"] # Renal insufficiency

{'id': 'HP:0000083', 'label': 'Renal insufficiency'}

![renal insufficiency](https://github.com/martin-fabbri/colab-notebooks/raw/master/pytorch-geometric/images/hierarchy-hp-0000083.png)

Terms from the frequency subontology can be looked up the same way:

In [5]:
hpo["HP:0040281"]

{'id': 'HP:0040281', 'label': 'Very frequent'}

In order to see the successors or predecessors of a particular term, run any of the following lines:

![renal insufficiency ontology](https://github.com/martin-fabbri/colab-notebooks/raw/master/pytorch-geometric/images/renal-insufficiency-oncology.png)

In [6]:
hpo.successors(["HP:0000083"])

# HP:0001919 Acute kidney injury
# HP:0004713 Reversible renal failure
# HP:0012622 Chronic kidney disease

['HP:0001919', 'HP:0004713', 'HP:0012622']

In [7]:
hpo.predecessors(["HP:0000083"])

# HP:0012211 Abnormal renal physiology

['HP:0012211']

In [8]:
hpo.predecessors(["HP:0012211"])

# HP:0000077 Abnormality of the kidney
# HP:0011277 Abnormality of the urinary system physiology

['HP:0000077', 'HP:0011277']
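Repeating the `predecessors` call until no new terms appear yields every ancestor of a term. The sketch below does this on a plain dictionary that contains only the child-to-parent links visible in the outputs above; `PARENTS` and `ancestors` are illustrative names and are not part of RareCrowds.

```python
# Child -> parents links copied from the outputs above. The parents of
# HP:0000077 and HP:0011277 are not listed, so the walk stops there.
PARENTS = {
    "HP:0000083": ["HP:0012211"],
    "HP:0012211": ["HP:0000077", "HP:0011277"],
}


def ancestors(term, parents):
    """Collect every term reachable by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("HP:0000083", PARENTS)))
```

The `seen` set keeps the walk from visiting a term twice, which matters in the HPO because a term can have more than one parent.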

In [9]:
hpo.simplify(['HP:0001250', 'HP:0007359'])

{'HP:0007359'}
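The output above suggests that `simplify` drops any term that is already implied by a more specific term in the same set. The following sketch reproduces that behaviour under this assumption. It does not use the library: `simplify_terms` is a hypothetical stand-in, and the single link from HP:0007359 to HP:0001250 is assumed so that the toy graph matches the output shown.

```python
# Assumed child -> parents link, chosen to be consistent with the output above.
PARENTS = {"HP:0007359": ["HP:0001250"]}


def ancestors(term, parents):
    """Collect every term reachable by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def simplify_terms(terms, parents):
    """Keep only the terms that are not an ancestor of another given term."""
    terms = set(terms)
    implied = set()
    for term in terms:
        implied |= ancestors(term, parents)
    return terms - implied


print(simplify_terms(["HP:0001250", "HP:0007359"], PARENTS))
```

Terms with no ancestor relationship to each other are all kept, so the result is the set of most specific terms.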

![PhenoLines](https://github.com/martin-fabbri/colab-notebooks/raw/master/pytorch-geometric/images/renal-insuficiency.png)

To dump the adjacency matrix as JSON:

In [10]:
# hpo.json_adjacency()

## PatientSampler

This module allows the creation of realistic patient profiles based on the disease annotations. The following steps are followed to sample a patient from a given disease:

1. Sample symptoms using the symptom frequency.
2. For the selected symptoms, sample imprecision as a Poisson process: with a certain probability, a symptom is replaced by a less specific term from the HPO ontology.
3. Add random noise by sampling random HPO terms. The number of noise terms also follows a Poisson process, while the terms themselves are drawn uniformly from the phenotypic abnormality subontology (excluding overly uninformative terms).
4. Sample patient age by assuming that it is close to the disease onset plus a delay of ~1 month.
5. Sample patient sex taking into account the inheritance pattern of the disease.
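
The first three steps can be sketched in plain Python. This is one possible reading of the description, not the sampler's actual implementation: the probabilities in `FREQUENCY_PROB` are assumed midpoints of the HPO frequency ranges, `imprecision_rate` and `noise_rate` are hypothetical parameters, and the parent links and noise pool are toy inputs. Steps 4 and 5 (age and sex) are left out.

```python
import math
import random

# Assumed sampling probability per frequency class (midpoint of each range).
FREQUENCY_PROB = {
    "HP:0040280": 1.0,    # Obligate (100%)
    "HP:0040281": 0.895,  # Very frequent (80-99%)
    "HP:0040282": 0.545,  # Frequent (30-79%)
    "HP:0040283": 0.17,   # Occasional (5-29%)
    "HP:0040284": 0.025,  # Very rare (1-4%)
    "HP:0040285": 0.0,    # Excluded (0%)
}


def poisson(lam, rng):
    """Draw a Poisson-distributed integer (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_patient(phenotype, parents, noise_pool, rng,
                   imprecision_rate=0.5, noise_rate=1.0):
    # Step 1: keep each annotated symptom according to its frequency.
    selected = [
        term for term, info in phenotype.items()
        if rng.random() < FREQUENCY_PROB.get(info.get("frequency"), 0.5)
    ]
    # Step 2: climb a Poisson-distributed number of levels towards the root.
    observed = set()
    for term in selected:
        for _ in range(poisson(imprecision_rate, rng)):
            options = parents.get(term)
            if not options:
                break
            term = rng.choice(options)
        observed.add(term)
    # Step 3: add a Poisson-distributed number of uniformly drawn noise terms.
    for _ in range(poisson(noise_rate, rng)):
        observed.add(rng.choice(noise_pool))
    return observed


# Toy inputs for illustration.
phenotype = {
    "HP:0000083": {"frequency": "HP:0040281"},
    "HP:0000822": {"frequency": "HP:0040283"},
}
parents = {
    "HP:0000083": ["HP:0012211"],
    "HP:0012211": ["HP:0000077", "HP:0011277"],
}
noise_pool = ["HP:0001250", "HP:0001635"]

print(sample_patient(phenotype, parents, noise_pool, random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the draw reproducible without touching the global random state.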

To sample 5 patients per disease, run the following lines:

In [15]:
from rarecrowds import PatientSampler
sampler = PatientSampler()
patients = sampler.sample(patient_params="default", N=5)
patients["ORPHA:1488"]

{'id': 'ORPHA:1488',
 'name': 'Cooper-Jabs syndrome',
 'phenotype': {'HP:0000413': {'frequency': 'HP:0040281',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0001249': {'frequency': 'HP:0040281',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0001629': {'frequency': 'HP:0040281',
   'modifier': {'diagnosticCriteria': True}}},
 'cohort': [{'ageOnset': 0.06545664221658443,
   'phenotype': {'HP:0000413': {}, 'HP:0010438': {}, 'HP:0001249': {}}},
  {'ageOnset': 0.10439734201198989,
   'phenotype': {'HP:0000413': {},
    'HP:0010438': {},
    'HP:0012638': {},
    'HP:0007181': {}}},
  {'ageOnset': 0.06668522064453812,
   'phenotype': {'HP:0000413': {},
    'HP:0001629': {},
    'HP:0012638': {},
    'HP:0100643': {}}},
  {'ageOnset': 0.10520708850282642,
   'phenotype': {'HP:0000372': {},
    'HP:0001629': {},
    'HP:0001249': {},
    'HP:0007018': {}}},
  {'ageOnset': 0.07486578570783506,
   'phenotype': {'HP:0000413': {}, 'HP:0001713': {}, 'HP:0011446': {}}}]}
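
To see how far the sampled cohort drifts from the disease annotation, you can count how often each term occurs and separate the annotated terms from the rest. The sketch below works on a copy of the ORPHA:1488 output above, reduced to term IDs; `term_counts` and `extra_terms` are hypothetical helpers. Whether a non-annotated term comes from imprecision or from noise cannot be told from the IDs alone.

```python
from collections import Counter

# ORPHA:1488 record from the output above, reduced to the term IDs.
record = {
    "phenotype": {"HP:0000413": {}, "HP:0001249": {}, "HP:0001629": {}},
    "cohort": [
        {"phenotype": {"HP:0000413": {}, "HP:0010438": {}, "HP:0001249": {}}},
        {"phenotype": {"HP:0000413": {}, "HP:0010438": {},
                       "HP:0012638": {}, "HP:0007181": {}}},
        {"phenotype": {"HP:0000413": {}, "HP:0001629": {},
                       "HP:0012638": {}, "HP:0100643": {}}},
        {"phenotype": {"HP:0000372": {}, "HP:0001629": {},
                       "HP:0001249": {}, "HP:0007018": {}}},
        {"phenotype": {"HP:0000413": {}, "HP:0001713": {}, "HP:0011446": {}}},
    ],
}


def term_counts(record):
    """Count in how many sampled patients each HPO term appears."""
    counts = Counter()
    for patient in record.get("cohort", []):
        counts.update(patient.get("phenotype", {}).keys())
    return counts


def extra_terms(record):
    """Terms seen in the cohort that are not part of the disease annotation."""
    return set(term_counts(record)) - set(record.get("phenotype", {}))


counts = term_counts(record)
extra = extra_terms(record)
print(counts.most_common(3))
print(sorted(extra))
```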

In [17]:
patients['ORPHA:3255']

{'id': 'ORPHA:3255',
 'name': 'Filippi syndrome',
 'phenotype': {'HP:0000028': {'frequency': 'HP:0040281',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0000252': {'frequency': 'HP:0040281',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0000430': {'frequency': 'HP:0040281',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0000431': {'frequency': 'HP:0040281',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0001249': {'frequency': 'HP:0040281',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0000322': {'frequency': 'HP:0040282',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0000337': {'frequency': 'HP:0040282',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0000648': {'frequency': 'HP:0040282',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0001511': {'frequency': 'HP:0040282',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0000233': {'frequency': 'HP:0040283',
   'modifier': {'diagnosticCriteria': True}},
  'HP:0001250': {'frequency': 'HP:004

## PhenotypicComparison

In [23]:
import networkx as nx
import plotly.graph_objects as go
from typing import List, Dict

from rarecrowds.utils.hpo import Hpo

In [24]:
def prepare_data(G, disease_set):
    mapping = {n: n.replace(":", "_") for n in G.nodes}
    G = nx.relabel_nodes(G, mapping)
    # pos = nx.drawing.nx_pydot.graphviz_layout(G, prog="dot")
    # data = {
    #     "edges": {"x": [], "y": []},
    #     "preds": {"x": [], "y": [], "labels": []},
    #     "phens": {"x": [], "y": [], "labels": []},
    # }
    # for edge in G.edges:
    #     x0, y0 = pos[edge[0]]
    #     x1, y1 = pos[edge[1]]
    #     data["edges"]["x"].append(x0)
    #     data["edges"]["x"].append(x1)
    #     data["edges"]["x"].append(None)
    #     data["edges"]["y"].append(y0)
    #     data["edges"]["y"].append(y1)
    #     data["edges"]["y"].append(None)
    # for node in G.nodes:
    #     x, y = pos[node]
    #     label = hpo[node.replace("_", ":")]
    #     label = f"{label['id']}: {label['label']}"
    #     if node.replace("_", ":") in disease_set:
    #         data["phens"]["x"].append(x)
    #         data["phens"]["y"].append(y)
    #         data["phens"]["labels"].append(label)
    #     else:
    #         data["preds"]["x"].append(x)
    #         data["preds"]["y"].append(y)
    #         data["preds"]["labels"].append(label)
    # return data
    return None

In [46]:
def plot_disease(patient: Dict, name: str = "", code: str = ""):
    patient_set = set(patient)
    predecessors = set(hpo.predecessors(list(patient_set), 1000))
    hpo_set = patient_set.union(predecessors)
    hpo_set.discard("HP:0000001")  # drop the ontology root if present

    # G is a networkx DiGraph restricted to the selected terms
    G = hpo.Graph.subgraph(list(hpo_set))

    plt_data = prepare_data(G, patient_set)

    # edge_trace = go.Scatter(
    #     x=plt_data["edges"]["x"],
    #     y=plt_data["edges"]["y"],
    #     name="HPO links",
    #     line=dict(width=0.75, color="#888"),
    #     hoverinfo="none",
    #     mode="lines",
    # )

    # pred_trace = go.Scatter(
    #     x=plt_data["preds"]["x"],
    #     y=plt_data["preds"]["y"],
    #     name="Predecessor terms",
    #     text=plt_data["preds"]["labels"],
    #     mode="markers",
    #     marker=dict(color="#888", size=5, line_width=0),
    # )

    # terms_trace = go.Scatter(
    #     x=plt_data["phens"]["x"],
    #     y=plt_data["phens"]["y"],
    #     name="Input terms",
    #     mode="markers",
    #     text=plt_data["phens"]["labels"],
    #     marker=dict(size=10, line_width=1),
    # )

    # fig = go.Figure(
    #     data=[edge_trace, pred_trace, terms_trace],
    #     layout=go.Layout(
    #         width=1000,
    #         height=600,
    #         showlegend=True,
    #         hovermode="closest",
    #         margin=dict(b=20, l=5, r=5, t=40),
    #         xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
    #         yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
    #     ),
    # )

    # if not name:
    #     name = patient.get("name")
    # if not code:
    #     code = patient.get("id")
    # if name or code:
    #     title = "HPO terms"
    #     if name:
    #         title += f" of {name}"
    #     if code:
    #         if "orpha" in code.lower():
    #             link = "http://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=en&Expert="
    #             link += code.split(":")[1]
    #         elif "omim" in code.lower():
    #             link = "https://www.omim.org/entry/"
    #             link += code.split(":")[1]
    #         elif "mondo" in code.lower():
    #             link = "https://monarchinitiative.org/disease/"
    #             link += code.upper()
    #         title += f" <a href='{link}'>({code})</a>"
    #     fig.update_layout(title=title, titlefont_size=14)
    # fig.show()
    # return fig
    return None

In [34]:
patient = patients['ORPHA:324']['cohort'][0]['phenotype']
patient

{'HP:0002326': {},
 'HP:0033354': {},
 'HP:0011675': {},
 'HP:0000823': {},
 'HP:0000481': {},
 'HP:0001014': {},
 'HP:0011025': {},
 'HP:0011277': {},
 'HP:0012638': {},
 'HP:0025276': {},
 'HP:0006780': {},
 'HP:0000253': {}}

In [47]:
plot_disease(patient)

<class 'networkx.classes.digraph.DiGraph'>
