# PubMed Knowledge Graph

This notebook is part of a series that walks through the process of generating a knowledge graph of PubMed articles.

This notebook will:
* Load structured patient journey data into a Neo4j instance

In [1]:
import os

import pandas as pd
from pyneoinstance import Neo4jInstance, load_yaml_file

## PyNeoInstance

Our database credentials and all of our queries are stored in the `pyneoinstance_config.yaml` file. 

This makes it easy to manage our queries and keeps the notebook code clean. 

In [2]:
config = load_yaml_file("pyneoinstance_config.yaml")

db_info = config['db_info']

constraints = config['initializing_queries']['constraints']
indexes = config['initializing_queries']['indexes']

load_query = config['loading_queries']['patient_journey_graph']
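The YAML file itself is not shown in this notebook. As a rough sketch, the keys accessed above imply a structure like the one below, written here as the Python dictionary that `load_yaml_file` would be expected to return. Only the key names come from the notebook. The queries, node labels, and property names are hypothetical placeholders, and the `$rows` parameter reflects an assumption about how PyNeoInstance hands DataFrame rows to a query.

```python
# Hypothetical shape of the parsed config. Only the key names
# (db_info, initializing_queries, loading_queries, ...) come from the notebook;
# every query, label, and property below is a placeholder.
example_config = {
    "db_info": {
        "uri": "neo4j://localhost:7687",
        "user": "neo4j",
        "password": "password",
        "database": "neo4j",
    },
    "initializing_queries": {
        "constraints": {
            "patient_id": (
                "CREATE CONSTRAINT patient_id IF NOT EXISTS "
                "FOR (p:Patient) REQUIRE p.id IS UNIQUE"
            ),
        },
        "indexes": {
            "visit_date": (
                "CREATE INDEX visit_date IF NOT EXISTS "
                "FOR (v:Visit) ON (v.date)"
            ),
        },
    },
    "loading_queries": {
        # Assumption: the rows of each DataFrame partition arrive as $rows.
        "patient_journey_graph": (
            "WITH $rows AS rows UNWIND rows AS row "
            "MERGE (p:Patient {id: row.member_id}) "
            "SET p.age = row.age, p.sex = row.sex"
        ),
    },
}

# Same lookups as the cell above, using separate names to avoid clobbering it.
example_constraints = example_config["initializing_queries"]["constraints"]
example_indexes = example_config["initializing_queries"]["indexes"]
example_load_query = example_config["loading_queries"]["patient_journey_graph"]
```

Keeping constraints and indexes as named entries is what lets the notebook pass `list(constraints.values())` straight to a write call later on.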

This graph object will handle database connections and read / write transactions for us.

In [3]:
graph = Neo4jInstance(db_info.get('uri', os.getenv("NEO4J_URI", "neo4j://localhost:7687")), # use config value -> use env value -> use default value
                      db_info.get('user', os.getenv("NEO4J_USER", "neo4j")), 
                      db_info.get('password', os.getenv("NEO4J_PASSWORD", "password")))
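One caveat with this pattern: `dict.get` only falls back when the key is absent. If a key is present in the YAML file but left empty, the lookup returns `None` and neither the environment variable nor the default is used. A minimal sketch of a stricter lookup, using a hypothetical `resolve_setting` helper (not part of PyNeoInstance), could look like this:

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    config: Mapping[str, Optional[str]], key: str, env_var: str, default: str
) -> str:
    """Return the config value, else the environment value, else the default."""
    value = config.get(key)
    if value:  # skips missing keys, None, and empty strings
        return value
    # An unset or empty environment variable falls through to the default.
    return os.getenv(env_var) or default


# Hypothetical config where 'uri' is present but empty.
example_db_info = {"uri": None, "user": "neo4j"}
example_uri = resolve_setting(
    example_db_info, "uri", "NEO4J_URI", "neo4j://localhost:7687"
)
example_user = resolve_setting(example_db_info, "user", "NEO4J_USER", "neo4j")
```

Here `example_uri` comes from `NEO4J_URI` if that variable is set and otherwise from the default, while `example_user` is taken from the config.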

This helper computes how many partitions a DataFrame should be split into for a given target batch size. The result is passed to PyNeoInstance when ingesting data.

In [4]:
def get_partition(data: pd.DataFrame, batch_size: int = 500) -> int:
    """
    Determine how many partitions to split the data into for the desired batch size.

    Parameters
    ----------
    data : pd.DataFrame
        The Pandas DataFrame to partition.
    batch_size : int
        The desired batch size.

    Returns
    -------
    int
        The number of partitions (always at least 1).
    """

    # Floor division; fall back to a single partition for small DataFrames.
    partition = max(1, len(data) // batch_size)
    print(f"partition: {partition}")
    return partition
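Because the helper rounds down, any DataFrame with fewer than twice `batch_size` rows is sent as a single batch, so individual batches can be larger than `batch_size`. The sketch below works through the arithmetic on plain row counts and compares it with a ceiling-based variant. `partitions_floor` and `partitions_ceil` are illustrative names, and the largest-batch figures assume rows are split as evenly as possible across partitions.

```python
import math


def partitions_floor(n_rows: int, batch_size: int = 500) -> int:
    """Same rounding as the helper above: round down, minimum of 1."""
    return max(1, n_rows // batch_size)


def partitions_ceil(n_rows: int, batch_size: int = 500) -> int:
    """Alternative: round up so no batch exceeds batch_size."""
    return max(1, math.ceil(n_rows / batch_size))


def largest_batch(n_rows: int, partitions: int) -> int:
    """Size of the biggest batch when rows are split as evenly as possible."""
    return math.ceil(n_rows / partitions)


for n_rows in (221, 750, 1250):
    floor_parts = partitions_floor(n_rows)
    ceil_parts = partitions_ceil(n_rows)
    print(
        n_rows,
        floor_parts, largest_batch(n_rows, floor_parts),
        ceil_parts, largest_batch(n_rows, ceil_parts),
    )
# 221 1 221 1 221
# 750 1 750 2 375
# 1250 2 625 3 417
```

For the 221-row dataset used in this notebook both variants give a single partition, so the choice only matters for larger loads.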

## Constraints + Indexes

In [5]:
def create_constraints_and_indexes() -> None:
    """
    Create the constraints and indexes defined in the config file.
    """
    try:
        if constraints and len(constraints) > 0:
            graph.execute_write_queries(database=db_info['database'], queries=list(constraints.values()))
    except Exception as e:
        print(e)

    try:
        if indexes and len(indexes) > 0:
            graph.execute_write_queries(database=db_info['database'], queries=list(indexes.values()))
    except Exception as e:
        print(e)

In [6]:
create_constraints_and_indexes()

## Load Data

Our patient journey data will be loaded from a Pandas DataFrame.

In [7]:
def load_patient_journey_data(graph: Neo4jInstance, data: pd.DataFrame) -> None:
    """
    Load patient journey data into the graph.
    """

    print(f"Loading {len(data)} patient journey rows")
    res = graph.execute_write_query_with_data(database=db_info['database'], 
                                            data=data, 
                                            query=load_query, 
                                            partitions=get_partition(data, batch_size=500),
                                            parallel=False)
    print(res)

In [8]:
data = pd.read_csv("data/protocol/extended_patient_journey.csv")

In [9]:
data.head()

| Unnamed: 0 | member_id | age | sex | zip_code | protocol | diagnosis | diagnosis_code | procedure_code | procedure_name | medications | visit_date | lab_test | lab_loinc | lab_value | outcome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | P001 | 65 | M | 72652 | Protocol_A | Hypertension | I10 | 83036 | HbA1c Test | Metformin, GLP-1 | 2023-08-17 | A1C | 4548-4 | 6.7 | A1C Controlled |
| 1 | P001 | 65 | M | 72652 | Protocol_A | Hypertension | I10 | 83036 | HbA1c Test | Metformin, GLP-1 | 2023-11-15 | A1C | 4548-4 | 8.2 | A1C Not Controlled |
| 2 | P001 | 65 | M | 72652 | Protocol_A | Hypertension | I10 | 83036 | HbA1c Test | Metformin, GLP-1 | 2024-02-13 | A1C | 4548-4 | 8.0 | A1C Not Controlled |
| 3 | P001 | 65 | M | 72652 | Protocol_A | Hypertension | I10 | 83036 | HbA1c Test | Metformin, GLP-1 | 2024-05-13 | A1C | 4548-4 | 8.7 | A1C Not Controlled |
| 4 | P002 | 47 | F | 25666 | Protocol_A | Type 2 Diabetes | E11.9 | 83036 | HbA1c Test | Metformin, GLP-1 | 2024-02-13 | A1C | 4548-4 | 6.7 | A1C Controlled |


Now we load the patient journey data into the graph.

In [10]:
load_patient_journey_data(graph, data)

Loading 221 patient journey rows
partition: 1
{'labels_added': 139, 'relationships_created': 863, 'nodes_created': 139, 'properties_set': 938}
