# Coding Club - NetworkX

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nhs-pycom/coding-club/blob/main/introduction-to-networkx/introduction-to-networkx.ipynb)

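This notebook gives a short introduction to `networkx`, a Python library for building, analysing, and drawing graphs. It uses a published health knowledge graph of diseases and symptoms as the example dataset.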

In [None]:
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
import re

print(nx.__version__)

In [None]:
# Create an empty undirected graph.
G = nx.Graph()

# With no nodes or edges yet, this draws a blank canvas.
nx.draw(G)
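
An empty graph becomes more useful once it has nodes and edges. The following is a minimal sketch of how that might look; the node names, the `kind` attribute, and the `score` values are made up for illustration and do not come from the dataset used later.

```python
import networkx as nx

# Start from an empty undirected graph, as in the cell above.
G_demo = nx.Graph()

# Nodes can carry arbitrary attributes (here a hypothetical "kind" label).
G_demo.add_node("asthma", kind="disease")

# Adding an edge also creates any endpoint that does not exist yet.
# The "score" values are invented for this example.
G_demo.add_edge("asthma", "wheezing", score=0.5)
G_demo.add_edge("asthma", "cough", score=0.3)

print(G_demo.number_of_nodes(), G_demo.number_of_edges())
print(list(G_demo.neighbors("asthma")))
```

Passing `G_demo` to `nx.draw` with `with_labels=True` should then show the three labelled nodes.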

**Learning a Health Knowledge Graph from Electronic Medical Records. Nature Scientific Reports, 2017**

*Maya Rotmensch, Yoni Halpern, Abdulhakim Tlimat, Steven Horng, and David Sontag.*

[Paper](https://www.nature.com/articles/s41598-017-05778-z) | [GitHub](https://github.com/clinicalml/HealthKnowledgeGraph)

The dataset used below is the Health Knowledge Graph (KG) for 157 diseases and 491 symptoms, learned from patients' data using a noisy-OR Bayesian network, as described in the paper.

In [None]:
kg_url = 'https://raw.githubusercontent.com/clinicalml/HealthKnowledgeGraph/master/DerivedKnowledgeGraph_final.csv'

df_kg = pd.read_csv(kg_url)

df_kg.head()

In [None]:
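# Split the comma-separated symptom string into one entry per symptom.
df_kg['split_symptoms'] = df_kg['Symptoms'].apply(lambda elem: elem.split(','))

# Turn each entry into a (symptom name, score) tuple.
# The dot is escaped so it matches only a literal decimal point.
df_kg['split_symptoms'] = df_kg['split_symptoms'].apply(
    lambda elem: [(
        ent.split('(')[0].strip(),
        float(re.search(r'0\.\d{3}', ent)[0])
    ) for ent in elem]
)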

df_kg = df_kg.drop('Symptoms', axis=1)

print(df_kg.shape)

df_kg.head()
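
To see what the parsing step does to a single cell, here is a small standalone sketch. It assumes each entry looks like `name (0.123)`; the sample string and its scores are invented, and `parse_symptoms` is a hypothetical helper rather than part of the notebook.

```python
import re

# Made-up example in the assumed "name (score)" format.
sample = "pain (0.318),fever (0.119),swelling (0.112)"


def parse_symptoms(cell):
    """Return a list of (symptom, score) tuples from one comma-separated cell."""
    pairs = []
    for ent in cell.split(','):
        # Everything before the opening parenthesis is the symptom name.
        name = ent.split('(')[0].strip()
        # The score is a number of the form 0.ddd; the dot is escaped
        # so that it matches only a literal decimal point.
        score = float(re.search(r'0\.\d{3}', ent)[0])
        pairs.append((name, score))
    return pairs


parsed = parse_symptoms(sample)
print(parsed)
```

Note that this approach would break on a symptom name that itself contains a comma, or on a score that does not have exactly three decimal places.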

In [None]:
long_rows = []

# Expand each disease row into one (disease, symptom, score) tuple per symptom.
for _, row in df_kg.iterrows():
    long_rows.extend(
        (row['Diseases'], symptom, score)
        for symptom, score in row['split_symptoms']
    )
    
df_kg_long = pd.DataFrame(
    long_rows, 
    columns=('disease', 'symptom', 'score')
)

print(df_kg_long.shape)

df_kg_long.head()
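
A long table with one row per disease and symptom pair is effectively an edge list, so it can be handed to `networkx` directly. The sketch below shows one way this might be done with `nx.from_pandas_edgelist`; it uses a tiny invented table with the same column names rather than the real data, and `df_example` and `G_example` are hypothetical names.

```python
import networkx as nx
import pandas as pd

# Tiny invented edge list with the same columns as the long table above.
df_example = pd.DataFrame(
    [
        ("asthma", "wheezing", 0.5),
        ("asthma", "cough", 0.3),
        ("bronchitis", "cough", 0.4),
    ],
    columns=("disease", "symptom", "score"),
)

# Each row becomes an edge; the score is stored as an edge attribute.
G_example = nx.from_pandas_edgelist(
    df_example,
    source="disease",
    target="symptom",
    edge_attr="score",
)

print(G_example.number_of_nodes(), G_example.number_of_edges())
print(G_example["asthma"]["cough"]["score"])
print(G_example.degree("cough"))
```

Because the graph is undirected, a symptom shared by several diseases ends up as a single node connected to each of them, which is what makes shared symptoms easy to spot.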