## 1. Connect to Postgres Database and Verify Connection
The MIMIC-IV data is stored in a Postgres database with a separate schema for each module (Hosp, ICU, ED, and Note).

We will use the `psycopg2` package to connect to Postgres and verify a connection to the database.

In [2]:
import psycopg2
from psycopg2 import OperationalError, DatabaseError
import pandas
import csv

In [3]:
### Connection settings (hardcoded local defaults) ###
DB_NAME = 'smcdougall'
USERNAME = 'postgres'
PASSWORD = 'postgres'
HOST = 'localhost'
PORT = 5432
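
The connection values in this notebook are plain constants. A minimal sketch of reading them from the environment instead, assuming the standard libpq variable names (`PGDATABASE`, `PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT`) and falling back to the values used in this notebook. `load_db_settings` is a hypothetical helper, not part of `psycopg2`:

```python
import os

def load_db_settings(environ=os.environ):
    """Return connection settings, preferring environment variables over notebook defaults."""
    return {
        'db_name': environ.get('PGDATABASE', 'smcdougall'),
        'username': environ.get('PGUSER', 'postgres'),
        'password': environ.get('PGPASSWORD', 'postgres'),
        'host': environ.get('PGHOST', 'localhost'),
        # Environment values are strings, so the port is converted explicitly.
        'port': int(environ.get('PGPORT', 5432)),
    }
```

The dictionary keys match the parameter names of `connect_to_postgres`, so the result could be passed as `connect_to_postgres(**load_db_settings())`.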

In [4]:
def connect_to_postgres(db_name, username, password, host, port):
    connection = None
    try:
        connection = psycopg2.connect(
            dbname=db_name,
            user=username,
            password=password,
            host=host,
            port=port
        )
        print('Connected to db:', db_name)
        return connection
    except OperationalError as e:
        print('Received the following error:', e)
        return None

In [5]:
def verify_postgres_connection(connection):
    if connection is not None:
        try:
            cur = connection.cursor()
            cur.execute('SELECT version();')
            db_version = cur.fetchone()
            print('The Postgres database version is:', db_version)
            cur.close()
        except DatabaseError as e:
            print('Received the following error:', e)
    else:
        print('Connection to Postgres failed.')

In [6]:
def close_connection(connection):
    if connection is not None:
        connection.close()
        print('Postgres connection has been closed.')

In [7]:
connection = connect_to_postgres(DB_NAME, USERNAME, PASSWORD, HOST, PORT)
verify_postgres_connection(connection)
close_connection(connection)

Connected to db: smcdougall
The Postgres database version is: ('PostgreSQL 14.5 on aarch64-apple-darwin20.6.0, compiled by Apple clang version 12.0.5 (clang-1205.0.22.9), 64-bit',)
Postgres connection has been closed.
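
The cells in this notebook repeat the same connect, use, close sequence, and an exception raised between connecting and closing would leave the connection open. A minimal sketch of a context manager that guarantees the close. `managed_connection` is a hypothetical helper name, and the `connect` argument is assumed to be a function such as `connect_to_postgres` that returns either a connection object or `None`:

```python
from contextlib import contextmanager

@contextmanager
def managed_connection(connect, *args, **kwargs):
    """Open a connection with `connect`, yield it, and always close it afterwards."""
    connection = connect(*args, **kwargs)
    try:
        yield connection
    finally:
        # Runs on normal exit and on exceptions; skipped if connecting failed.
        if connection is not None:
            connection.close()

# Usage with the notebook's helpers (not run here):
# with managed_connection(connect_to_postgres, DB_NAME, USERNAME, PASSWORD, HOST, PORT) as connection:
#     verify_postgres_connection(connection)
```

A wrapper like this is useful because `psycopg2`'s own `with connection:` block ends the transaction but does not close the connection.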


## 2. Verify Ability to Read Data from all 4 Modules
The four MIMIC-IV modules being used are hosp, icu, ed, and note.
Verify that a table in each module can be read, so that the modules can be filtered appropriately later.

Hosp module:

In [8]:
connection = connect_to_postgres(DB_NAME, USERNAME, PASSWORD, HOST, PORT)
cur = connection.cursor()
# Restrict unqualified table lookups to the mimiciv_hosp schema, so `patients`
# resolves to mimiciv_hosp.patients (other user schemas are no longer searched)
cur.execute('SET search_path to mimiciv_hosp')
cur.execute('SELECT COUNT(*) from patients')
# fetch result
count = cur.fetchone()[0]
print('Count of rows:', count)

cur.close()
close_connection(connection)

Connected to db: smcdougall
Count of rows: 299712
Postgres connection has been closed.


ICU module:

In [9]:
connection = connect_to_postgres(DB_NAME, USERNAME, PASSWORD, HOST, PORT)
cur = connection.cursor()
cur.execute('SET search_path to mimiciv_icu')
cur.execute('SELECT COUNT(*) from caregiver')
# fetch result
count = cur.fetchone()[0]
print('Count of rows:', count)

cur.close()
close_connection(connection)

Connected to db: smcdougall
Count of rows: 15468
Postgres connection has been closed.


ED module:

In [10]:
connection = connect_to_postgres(DB_NAME, USERNAME, PASSWORD, HOST, PORT)
cur = connection.cursor()
cur.execute('SET search_path to mimiciv_ed')
cur.execute('SELECT COUNT(*) from diagnosis')
# fetch result
count = cur.fetchone()[0]
print('Count of rows:', count)

cur.close()
close_connection(connection)

Connected to db: smcdougall
Count of rows: 899050
Postgres connection has been closed.


Note module:

In [11]:
connection = connect_to_postgres(DB_NAME, USERNAME, PASSWORD, HOST, PORT)
cur = connection.cursor()
cur.execute('SET search_path to mimiciv_note')
cur.execute('SELECT COUNT(*) from discharge')
# fetch result
count = cur.fetchone()[0]
print('Count of rows:', count)

cur.close()
close_connection(connection)

Connected to db: smcdougall
Count of rows: 331793
Postgres connection has been closed.
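
The four checks above differ only in schema and table name. A sketch of building the same `COUNT(*)` queries from a list, using schema-qualified table names instead of `SET search_path`. `MODULE_TABLES` and `count_query` are hypothetical names, and the identifier check is a conservative assumption to avoid placing arbitrary text into the SQL:

```python
import re

# (schema, table) pairs checked in this section
MODULE_TABLES = [
    ('mimiciv_hosp', 'patients'),
    ('mimiciv_icu', 'caregiver'),
    ('mimiciv_ed', 'diagnosis'),
    ('mimiciv_note', 'discharge'),
]

_IDENTIFIER = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')

def count_query(schema, table):
    """Build a row-count query for a schema-qualified table."""
    for name in (schema, table):
        # Only plain identifiers are accepted, since they become part of the SQL text.
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f'unsafe identifier: {name!r}')
    return f'SELECT COUNT(*) FROM {schema}.{table}'

for schema, table in MODULE_TABLES:
    print(count_query(schema, table))
```

Each generated string could then be passed to `cur.execute(...)` on a single shared connection, rather than reconnecting for every module.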


## 3. Store our filtering criteria
To start, we will filter by the following:
- (1) ICD-9/ICD-10 codes pertaining to pregnancy (from both hosp module and ED module)
- (2) DRG Codes (diagnosis-related group) related to pregnancy

### Diagnosis Code filtering - hosp module
- Relevant table - `diagnoses_icd`
- Relevant columns - `icd_code` and `icd_version`

The query will be of the form `(icd_version = 9 AND icd_code LIKE ...) OR (icd_version = 10 AND icd_code LIKE ...)`

### ICD-10 codes
- Z33 - Pregnant State
- Z34 - Encounter for supervision of normal pregnancy
- Z3A - weeks of gestation
- O00-O9A - complications of pregnancy

### ICD-10 query

```
.... FROM diagnoses_icd
WHERE (icd_code LIKE 'Z33%'
    OR icd_code LIKE 'Z34%'
    OR icd_code LIKE 'Z3A%'
    OR icd_code LIKE 'O0%'
    OR icd_code LIKE 'O1%'
    OR icd_code LIKE 'O2%'
    OR icd_code LIKE 'O3%'
    OR icd_code LIKE 'O4%'
    OR icd_code LIKE 'O5%'
    OR icd_code LIKE 'O6%'
    OR icd_code LIKE 'O7%'
    OR icd_code LIKE 'O8%'
    OR icd_code LIKE 'O9%')
AND icd_version = 10
```
### ICD-9 codes
- V22.0-V22.9 - normal pregnancy
- V23.0-V23.9 - pregnancy, including high-risk pregnancy
- V24 - Routine postpartum care and examination
- V27 - Outcome of delivery
- V28 - Antenatal screening
- 630-679 - complications of pregnancy (includes ectopic pregnancy, complications related to pregnancy, puerperium, course of labor and delivery, etc.)

### ICD-9 query
```
... FROM diagnoses_icd
WHERE (icd_code LIKE 'V22%'
    OR icd_code LIKE 'V23%'
    OR icd_code LIKE 'V24%'
    OR icd_code LIKE 'V27%'
    OR icd_code LIKE 'V28%'
    OR icd_code LIKE '63%'
    OR icd_code LIKE '64%'
    OR icd_code LIKE '65%'
    OR icd_code LIKE '66%'
    OR icd_code LIKE '67%')
AND icd_version = 9
```
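
In SQL, `AND` binds more tightly than `OR`, so the version check only applies to the whole group of `LIKE` patterns when that group is wrapped in parentheses. A small sketch of the difference, using an in-memory SQLite table as a stand-in for `diagnoses_icd` (the rows are invented for illustration, and SQLite is used only so the example runs without a Postgres server):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE diagnoses_icd (subject_id INTEGER, icd_code TEXT, icd_version INTEGER)')
conn.executemany(
    'INSERT INTO diagnoses_icd VALUES (?, ?, ?)',
    [
        (1, 'V221', 9),      # ICD-9 code with a pregnancy prefix
        (2, 'V224XXA', 10),  # ICD-10 code that happens to share the 'V22' prefix
        (3, '67401', 9),     # ICD-9 code in the 630-679 range
    ],
)

# Without parentheses this is parsed as: A OR (B AND icd_version = 9)
loose_ids = [row[0] for row in conn.execute("""
    SELECT subject_id FROM diagnoses_icd
    WHERE icd_code LIKE 'V22%' OR icd_code LIKE '67%' AND icd_version = 9
    ORDER BY subject_id
""")]

# With parentheses: (A OR B) AND icd_version = 9
strict_ids = [row[0] for row in conn.execute("""
    SELECT subject_id FROM diagnoses_icd
    WHERE (icd_code LIKE 'V22%' OR icd_code LIKE '67%') AND icd_version = 9
    ORDER BY subject_id
""")]

print(loose_ids)   # [1, 2, 3]
print(strict_ids)  # [1, 3]
conn.close()
```

Subject 2 is picked up only by the unparenthesized form, because its `V22` prefix matches regardless of `icd_version`.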


### Diagnosis Code filtering - ED module
- module - `mimiciv_ed`
- table - `diagnosis`
- columns - `icd_code` and `icd_version`
- the queries are the same as the hosp queries, except that the table is called `diagnosis` instead of `diagnoses_icd`

### ICD-10 codes
- Z33 - Pregnant State
- Z34 - Encounter for supervision of normal pregnancy
- Z3A - weeks of gestation
- O00-O9A - complications of pregnancy

### ICD-10 query

```
.... FROM diagnosis
WHERE (icd_code LIKE 'Z33%'
    OR icd_code LIKE 'Z34%'
    OR icd_code LIKE 'Z3A%'
    OR icd_code LIKE 'O0%'
    OR icd_code LIKE 'O1%'
    OR icd_code LIKE 'O2%'
    OR icd_code LIKE 'O3%'
    OR icd_code LIKE 'O4%'
    OR icd_code LIKE 'O5%'
    OR icd_code LIKE 'O6%'
    OR icd_code LIKE 'O7%'
    OR icd_code LIKE 'O8%'
    OR icd_code LIKE 'O9%')
AND icd_version = 10
```
### ICD-9 codes
- V22.0-V22.9 - normal pregnancy
- V23.0-V23.9 - pregnancy, including high-risk pregnancy
- V24 - Routine postpartum care and examination
- V27 - Outcome of delivery
- V28 - Antenatal screening
- 630-679 - complications of pregnancy (includes ectopic pregnancy, complications related to pregnancy, puerperium, course of labor and delivery, etc.)

### ICD-9 query
```
... FROM diagnosis
WHERE (icd_code LIKE 'V22%'
    OR icd_code LIKE 'V23%'
    OR icd_code LIKE 'V24%'
    OR icd_code LIKE 'V27%'
    OR icd_code LIKE 'V28%'
    OR icd_code LIKE '63%'
    OR icd_code LIKE '64%'
    OR icd_code LIKE '65%'
    OR icd_code LIKE '66%'
    OR icd_code LIKE '67%')
AND icd_version = 9
```
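
Since the hosp and ED filters share the same code prefixes, the predicate can be generated once and reused for both tables. A sketch under that assumption: `ICD10_PREFIXES`, `ICD9_PREFIXES`, `like_group`, and `pregnancy_icd_predicate` are hypothetical names, and the prefixes are copied from the lists above:

```python
ICD10_PREFIXES = ['Z33', 'Z34', 'Z3A'] + [f'O{digit}' for digit in range(10)]
ICD9_PREFIXES = ['V22', 'V23', 'V24', 'V27', 'V28', '63', '64', '65', '66', '67']

def like_group(alias, prefixes, version):
    """Return '((code LIKE ... OR ...) AND icd_version = v)' for one ICD version."""
    likes = ' OR '.join(f"{alias}.icd_code LIKE '{prefix}%'" for prefix in prefixes)
    return f'(({likes}) AND {alias}.icd_version = {version})'

def pregnancy_icd_predicate(alias='di'):
    """Combine the ICD-10 and ICD-9 groups into one parenthesized predicate."""
    icd10 = like_group(alias, ICD10_PREFIXES, 10)
    icd9 = like_group(alias, ICD9_PREFIXES, 9)
    return f'({icd10} OR {icd9})'

predicate = pregnancy_icd_predicate()
hosp_sql = f'SELECT COUNT(DISTINCT di.subject_id) FROM mimiciv_hosp.diagnoses_icd di WHERE {predicate}'
ed_sql = f'SELECT COUNT(DISTINCT di.subject_id) FROM mimiciv_ed.diagnosis di WHERE {predicate}'
```

The literal `%` signs in the generated SQL are fine as long as `cur.execute` is called without a parameters argument, as it is throughout this notebook.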


### Diagnosis Code filtering - DRG codes (hosp module)
- Hosp module
- `drgcodes` table
- `drg_code` column (note the values are numbers but they are represented as **strings**)

```
... FROM drgcodes
WHERE drg_code BETWEEN '370' AND '384'
```
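(`BETWEEN` is used rather than a chain of `LIKE` patterns, which would be longer and less efficient here. Because `drg_code` is text, the range is compared as strings, not as numbers. DRG numbering also differs between grouper types, so the matched rows are worth checking against the `drg_type` and `description` columns.)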
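
Because `drg_code` holds digits stored as text, `BETWEEN '370' AND '384'` compares strings character by character rather than as numbers. The sketch below mimics that with Python string comparison (an approximation, since Postgres text ordering also depends on the database collation) and contrasts it with a numeric check. Both helper names are hypothetical:

```python
def in_text_range(code, low, high):
    """Compare as strings, the way BETWEEN behaves on a text column."""
    return low <= code <= high

def in_numeric_range(code, low, high):
    """Compare as integers; non-numeric codes never match."""
    return code.isdigit() and low <= int(code) <= high

for code in ['370', '384', '38', '3700', '385']:
    print(code, in_text_range(code, '370', '384'), in_numeric_range(code, 370, 384))
```

Codes such as `'38'` and `'3700'` fall inside the text range but outside the numeric one. In SQL, the numeric variant would need a cast, for example `drg_code::integer BETWEEN 370 AND 384`, assuming every `drg_code` value is numeric.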

## Get Number of Unique Subjects we are starting with
- inner join the `hosp` diagnosis table with the `ed` diagnosis table on `subject_id` to get the starting number of patients that have been given diagnoses in both the hospital and the ED setting (an inner join drops subjects that appear in only one of the two tables)
- this gives us a baseline for seeing how much filtering we have done

In [12]:
def count_unique_subject_ids_from_diagnosis(connection):
    cur = connection.cursor()
    cur.execute("""
        SELECT COUNT(DISTINCT di.subject_id)
        FROM mimiciv_hosp.diagnoses_icd di
        INNER JOIN mimiciv_ed.diagnosis ed
        ON di.subject_id = ed.subject_id
    """)
    count = cur.fetchone()[0]
    print("Number of unique patients with diagnoses:", count)
    cur.close()

connection = connect_to_postgres(DB_NAME, USERNAME, PASSWORD, HOST, PORT)
count_unique_subject_ids_from_diagnosis(connection)
close_connection(connection)

Connected to db: smcdougall
Number of unique patients with diagnoses: 122474
Postgres connection has been closed.
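
An inner join keeps only subjects that appear in both tables. To make the distinction between "both" and "either" concrete, here is a sketch on toy data in an in-memory SQLite database (the table names are simplified stand-ins for `mimiciv_hosp.diagnoses_icd` and `mimiciv_ed.diagnosis`, and the rows are invented):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE hosp_dx (subject_id INTEGER)')
conn.execute('CREATE TABLE ed_dx (subject_id INTEGER)')
conn.executemany('INSERT INTO hosp_dx VALUES (?)', [(1,), (1,), (2,)])
conn.executemany('INSERT INTO ed_dx VALUES (?)', [(2,), (3,)])

# Subjects with a diagnosis in BOTH tables (what the inner join counts)
both = conn.execute("""
    SELECT COUNT(DISTINCT h.subject_id)
    FROM hosp_dx h
    INNER JOIN ed_dx e ON h.subject_id = e.subject_id
""").fetchone()[0]

# Subjects with a diagnosis in EITHER table
either = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT subject_id FROM hosp_dx
        UNION
        SELECT subject_id FROM ed_dx
    ) AS combined
""").fetchone()[0]

print(both, either)  # 1 3
conn.close()
```

Only subject 2 appears in both toy tables, while subjects 1, 2, and 3 appear in at least one.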


## Get Number of Unique Subjects with Hosp Pregnancy Diagnosis

In [13]:
def count_unique_hosp_pregnant_subjects(connection):
    cur = connection.cursor()
    cur.execute("""
        SELECT COUNT(DISTINCT di.subject_id)
        FROM mimiciv_hosp.diagnoses_icd di
        INNER JOIN mimiciv_hosp.drgcodes d ON di.subject_id = d.subject_id
        INNER JOIN mimiciv_hosp.patients p ON di.subject_id = p.subject_id
        WHERE (((di.icd_code LIKE 'Z33%' 
            OR di.icd_code LIKE 'Z34%' 
            OR di.icd_code LIKE 'Z3A%' 
            OR di.icd_code LIKE 'O0%' 
            OR di.icd_code LIKE 'O1%' 
            OR di.icd_code LIKE 'O2%' 
            OR di.icd_code LIKE 'O3%' 
            OR di.icd_code LIKE 'O4%' 
            OR di.icd_code LIKE 'O5%' 
            OR di.icd_code LIKE 'O6%' 
            OR di.icd_code LIKE 'O7%' 
            OR di.icd_code LIKE 'O8%' 
            OR di.icd_code LIKE 'O9%') AND di.icd_version = 10)
            OR
            ((di.icd_code LIKE 'V22%'
            OR di.icd_code LIKE 'V23%'
            OR di.icd_code LIKE 'V24%'
            OR di.icd_code LIKE 'V27%'
            OR di.icd_code LIKE 'V28%'
            OR di.icd_code LIKE '63%'
            OR di.icd_code LIKE '64%'
            OR di.icd_code LIKE '65%'
            OR di.icd_code LIKE '66%'
            OR di.icd_code LIKE '67%') AND di.icd_version = 9)
            OR (d.drg_code BETWEEN '370' AND '384'))
            AND p.gender = 'F';
    """)
    count = cur.fetchone()[0]
    print("Number of unique patients:", count)
    cur.close()

In [14]:
connection = connect_to_postgres(DB_NAME, USERNAME, PASSWORD, HOST, PORT)
count_unique_hosp_pregnant_subjects(connection)
close_connection(connection)

Connected to db: smcdougall
Number of unique patients: 17485
Postgres connection has been closed.
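
One property of the query above is worth checking: because `drgcodes` is attached with an `INNER JOIN`, a subject with a matching ICD code but no row in `drgcodes` is dropped before the `WHERE` clause is evaluated. The sketch below shows the effect on invented rows in an in-memory SQLite database and compares it with an `EXISTS`-based alternative. Whether such subjects actually occur in MIMIC-IV is not verified here, and the simplified predicate uses a single ICD prefix for brevity:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE diagnoses_icd (subject_id INTEGER, icd_code TEXT, icd_version INTEGER)')
conn.execute('CREATE TABLE drgcodes (subject_id INTEGER, drg_code TEXT)')
conn.executemany('INSERT INTO diagnoses_icd VALUES (?, ?, ?)', [
    (1, 'Z331', 10),  # ICD match, has a DRG row outside the range
    (2, 'Z331', 10),  # ICD match, no DRG row at all
    (3, 'I10', 10),   # no ICD match, but a DRG row inside the range
])
conn.executemany('INSERT INTO drgcodes VALUES (?, ?)', [(1, '775'), (3, '372')])

# Same join shape as the notebook query: subjects without DRG rows disappear
inner_join_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT di.subject_id
    FROM diagnoses_icd di
    INNER JOIN drgcodes d ON di.subject_id = d.subject_id
    WHERE (di.icd_code LIKE 'Z33%' AND di.icd_version = 10)
       OR d.drg_code BETWEEN '370' AND '384'
    ORDER BY di.subject_id
""")]

# Alternative: test for a DRG row with EXISTS, so the ICD condition stands alone
exists_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT di.subject_id
    FROM diagnoses_icd di
    WHERE (di.icd_code LIKE 'Z33%' AND di.icd_version = 10)
       OR EXISTS (
           SELECT 1 FROM drgcodes d
           WHERE d.subject_id = di.subject_id
             AND d.drg_code BETWEEN '370' AND '384'
       )
    ORDER BY di.subject_id
""")]

print(inner_join_ids)  # [1, 3]
print(exists_ids)      # [1, 2, 3]
conn.close()
```

Subject 2 is kept only by the `EXISTS` form. Both forms are sketches for comparison, not a replacement for the counts reported in this notebook.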


## Get Number of Unique Subjects with ED Pregnancy Diagnosis

In [15]:
def count_unique_ed_pregnant_subjects(connection):
    cur = connection.cursor()
    cur.execute("""
        SELECT COUNT(DISTINCT di.subject_id)
        FROM mimiciv_ed.diagnosis di
        INNER JOIN mimiciv_hosp.patients p ON di.subject_id = p.subject_id
        WHERE (((di.icd_code LIKE 'Z33%' 
            OR di.icd_code LIKE 'Z34%' 
            OR di.icd_code LIKE 'Z3A%' 
            OR di.icd_code LIKE 'O0%' 
            OR di.icd_code LIKE 'O1%' 
            OR di.icd_code LIKE 'O2%' 
            OR di.icd_code LIKE 'O3%' 
            OR di.icd_code LIKE 'O4%' 
            OR di.icd_code LIKE 'O5%' 
            OR di.icd_code LIKE 'O6%' 
            OR di.icd_code LIKE 'O7%' 
            OR di.icd_code LIKE 'O8%' 
            OR di.icd_code LIKE 'O9%') AND di.icd_version = 10)
            OR
            ((di.icd_code LIKE 'V22%'
            OR di.icd_code LIKE 'V23%'
            OR di.icd_code LIKE 'V24%'
            OR di.icd_code LIKE 'V27%'
            OR di.icd_code LIKE 'V28%'
            OR di.icd_code LIKE '63%'
            OR di.icd_code LIKE '64%'
            OR di.icd_code LIKE '65%'
            OR di.icd_code LIKE '66%'
            OR di.icd_code LIKE '67%') AND di.icd_version = 9))
            AND p.gender = 'F';
    """)
    count = cur.fetchone()[0]
    print("Number of unique patients:", count)
    cur.close()

In [16]:
connection = connect_to_postgres(DB_NAME, USERNAME, PASSWORD, HOST, PORT)
count_unique_ed_pregnant_subjects(connection)
close_connection(connection)

Connected to db: smcdougall
Number of unique patients: 4877
Postgres connection has been closed.


## Get Number of Unique Subjects across Both Modules

In [17]:
def get_all_pregnant_subject_ids(connection):
    cur = connection.cursor()
    cur.execute("""
        SELECT subject_id
        FROM (
        SELECT di.subject_id
                FROM mimiciv_hosp.diagnoses_icd di
                INNER JOIN mimiciv_hosp.drgcodes d ON di.subject_id = d.subject_id
                INNER JOIN mimiciv_hosp.patients p ON di.subject_id = p.subject_id
                WHERE (((di.icd_code LIKE 'Z33%' 
                    OR di.icd_code LIKE 'Z34%' 
                    OR di.icd_code LIKE 'Z3A%' 
                    OR di.icd_code LIKE 'O0%' 
                    OR di.icd_code LIKE 'O1%' 
                    OR di.icd_code LIKE 'O2%' 
                    OR di.icd_code LIKE 'O3%' 
                    OR di.icd_code LIKE 'O4%' 
                    OR di.icd_code LIKE 'O5%' 
                    OR di.icd_code LIKE 'O6%' 
                    OR di.icd_code LIKE 'O7%' 
                    OR di.icd_code LIKE 'O8%' 
                    OR di.icd_code LIKE 'O9%') AND di.icd_version = 10)
                    OR
                    ((di.icd_code LIKE 'V22%'
                    OR di.icd_code LIKE 'V23%'
                    OR di.icd_code LIKE 'V24%'
                    OR di.icd_code LIKE 'V27%'
                    OR di.icd_code LIKE 'V28%'
                    OR di.icd_code LIKE '63%'
                    OR di.icd_code LIKE '64%'
                    OR di.icd_code LIKE '65%'
                    OR di.icd_code LIKE '66%'
                    OR di.icd_code LIKE '67%') AND di.icd_version = 9)
                    OR (d.drg_code BETWEEN '370' AND '384'))
                    AND p.gender = 'F'
        UNION
        SELECT di.subject_id
                FROM mimiciv_ed.diagnosis di
                INNER JOIN mimiciv_hosp.patients p ON di.subject_id = p.subject_id
                WHERE (((di.icd_code LIKE 'Z33%' 
                    OR di.icd_code LIKE 'Z34%' 
                    OR di.icd_code LIKE 'Z3A%' 
                    OR di.icd_code LIKE 'O0%' 
                    OR di.icd_code LIKE 'O1%' 
                    OR di.icd_code LIKE 'O2%' 
                    OR di.icd_code LIKE 'O3%' 
                    OR di.icd_code LIKE 'O4%' 
                    OR di.icd_code LIKE 'O5%' 
                    OR di.icd_code LIKE 'O6%' 
                    OR di.icd_code LIKE 'O7%' 
                    OR di.icd_code LIKE 'O8%' 
                    OR di.icd_code LIKE 'O9%') AND di.icd_version = 10)
                    OR
                    ((di.icd_code LIKE 'V22%'
                    OR di.icd_code LIKE 'V23%'
                    OR di.icd_code LIKE 'V24%'
                    OR di.icd_code LIKE 'V27%'
                    OR di.icd_code LIKE 'V28%'
                    OR di.icd_code LIKE '63%'
                    OR di.icd_code LIKE '64%'
                    OR di.icd_code LIKE '65%'
                    OR di.icd_code LIKE '66%'
                    OR di.icd_code LIKE '67%') AND di.icd_version = 9))
                    AND p.gender = 'F'
    ) AS combined_results;
    """)
    rows = cur.fetchall()
    subject_ids = [row[0] for row in rows]
    cur.close()
    return subject_ids

In [18]:
connection = connect_to_postgres(DB_NAME, USERNAME, PASSWORD, HOST, PORT)
subject_ids = get_all_pregnant_subject_ids(connection)
close_connection(connection)
print(len(subject_ids))

Connected to db: smcdougall
Postgres connection has been closed.
19088


In [19]:
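# Hardcoded subject_ids of deceased patients to drop from the cohort
# (a set makes each membership test constant time)
deceased_pats_to_exclude = {10495653, 11611136, 10504589, 11101737, 15047583, 15014156, 15695321, 19017858, 18805396, 18892314, 17809756, 18186302}
subject_ids = [sub for sub in subject_ids if sub not in deceased_pats_to_exclude]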
print(len(subject_ids))

19076


### Export the results to a CSV file for later comparison and use

In [20]:
def export_to_csv(id_list, file_path):
    with open(file_path, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        # add header
        writer.writerow(['subject_id'])
        for item in id_list:
            writer.writerow([item])

In [21]:
export_to_csv(subject_ids, 'hosp_and_ed_pregnant_subjects.csv')
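
For later use, the exported file can be read back into a list of integers. A minimal sketch using the standard `csv` module. `read_subject_ids` is a hypothetical helper, and the round trip below writes to a temporary file with the same layout rather than touching the real export:

```python
import csv
import os
import tempfile

def read_subject_ids(file_path):
    """Read a single-column CSV with a 'subject_id' header back into a list of ints."""
    with open(file_path, newline='') as csv_file:
        reader = csv.DictReader(csv_file)
        return [int(row['subject_id']) for row in reader]

# Round-trip check on a temporary file with the same layout as the export
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'subjects.csv')
    with open(path, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(['subject_id'])
        writer.writerows([[10495653], [11611136]])
    round_trip = read_subject_ids(path)

print(round_trip)  # [10495653, 11611136]
```

Converting back to `int` matters because `csv` returns every field as a string, which would not compare equal to the integer IDs returned by the database.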