---

Some/all notebooks were run on a login/interactive node (extremely limited resources).  

---

To speed this up, you can run this notebook on a compute node. 

To do so, log in to an interactive node and open two terminals:  

---
---

Terminal 1:

---

(You may want to increase the time limit to be safe; the command below is just an example.)  

`salloc --time=6:00:00 --ntasks=1 --cpus-per-task=64 --nodes=1 --account=bmi_facelli-np --partition=bmi_facelli-np`  

The terminal returns when resources are allocated; further commands will be executed on the compute node.  

`hostname -f`  
`<compute node name>` (copy this to clipboard)  

`export XDG_RUNTIME_DIR=""`  

`conda activate jupy2` (activate your conda env or venv that has jupyter/etc. dependencies)  

`jupyter notebook --no-browser --port=8889`  

---
---

Terminal 2: 

---

`google-chrome &`  

`ssh -N -L localhost:8888:localhost:8889 <compute node name>`  

Chrome should now open on your interactive node. Copy the Jupyter link from the compute node terminal into Chrome (just change `...:8889/tree...` to `...:8888/lab...`) to access or create your notebooks.  
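
The URL change described above (port `8889` to `8888`, `/tree` to `/lab`) can be sketched as a small helper. This is only an illustration: the function name `tunnel_url` and the token in the example are made up, and it assumes the link printed by Jupyter has the form `http://localhost:8889/tree?token=...`.

```python
from urllib.parse import urlsplit, urlunsplit


def tunnel_url(jupyter_url, local_port=8888):
    """Rewrite the link printed by Jupyter so it goes through the SSH tunnel."""
    parts = urlsplit(jupyter_url)
    # Use the local end of the tunnel (8888) instead of the remote port (8889).
    netloc = f"{parts.hostname}:{local_port}"
    # Open JupyterLab instead of the classic tree view.
    path = parts.path
    if path.startswith("/tree"):
        path = path.replace("/tree", "/lab", 1)
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


# Example with a made-up token:
print(tunnel_url("http://localhost:8889/tree?token=abc123"))
# → http://localhost:8888/lab?token=abc123
```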

---
---
---

Prior to this (or through a new terminal in Jupyter), you will need to initialize/create a PostgreSQL DB (adjust paths/names as necessary):  

---

`cd /uufs/chpc.utah.edu/common/home/u0740821/dissertation/data/diabetes`  

`mkdir -p pgsql/data`  

`module load postgresql/15.2`  

`initdb -D /uufs/chpc.utah.edu/common/home/u0740821/dissertation/data/diabetes/pgsql/data`  

`pg_ctl -D /uufs/chpc.utah.edu/common/home/u0740821/dissertation/data/diabetes/pgsql/data -l logfile start`  

`createdb diabetes`  

`psql -d diabetes`  

`GRANT ALL PRIVILEGES ON DATABASE diabetes TO u0740821;`  
`\q`  
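
Since the same data directory is repeated in several of the commands above, one option is to build them from a single base path so that only one value needs adjusting. The sketch below only assembles and prints the commands; it does not run them. The function name `postgres_setup_commands` and the example path are hypothetical, and `module load postgresql/15.2` still has to be run in the shell first.

```python
import shlex
from pathlib import Path


def postgres_setup_commands(base_dir, db_name):
    """Assemble the setup commands for a database stored under base_dir/pgsql/data."""
    data_dir = Path(base_dir) / "pgsql" / "data"
    return [
        ["mkdir", "-p", str(data_dir)],
        ["initdb", "-D", str(data_dir)],
        ["pg_ctl", "-D", str(data_dir), "-l", "logfile", "start"],
        ["createdb", db_name],
    ]


# Print the commands for review; the path here is a placeholder.
for cmd in postgres_setup_commands("/path/to/project", "diabetes"):
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the paths before pasting them into a terminal.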

---

---

You will also need to extract the `.zip` file containing all your TriNetX CSVs:  

`unzip -l your_zip_file.zip`  

This lists the files contained in the zip.  

`unzip your_zip_file.zip patient.csv diagnosis.csv -d /path/to/destination` (recommend `/scratch/general/vast/<your user ID>`)  
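
If you would rather do the extraction from a notebook cell, the same two steps (list the archive, then extract selected files) can be sketched with Python's standard `zipfile` module. The helper name `extract_members` is hypothetical, and the file names in the usage comment are just the ones from the example above.

```python
import zipfile
from pathlib import Path


def extract_members(zip_path, members, dest_dir):
    """Extract only the named files from a zip archive into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        available = set(zf.namelist())  # same information as `unzip -l`
        missing = [m for m in members if m not in available]
        if missing:
            raise FileNotFoundError(f"not in archive: {missing}")
        zf.extractall(path=dest, members=members)
    return [dest / m for m in members]


# Usage (paths are placeholders):
# extract_members("your_zip_file.zip", ["patient.csv", "diagnosis.csv"], "/path/to/destination")
```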

---

---

# Generate `constraint.sql`

---

__Note:__  
  
`--` starts a SQL comment; `psql` will not execute anything after it on that line. Add or remove comment markers as appropriate.
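
After toggling comments, it can help to see which lines of a script will actually be executed. The helper below is a minimal sketch with a hypothetical name (`active_lines`); it only recognizes lines that are entirely commented out with `--`, not trailing comments or `/* ... */` blocks.

```python
def active_lines(sql):
    """Return the lines of a SQL script that are neither blank nor `--` comments."""
    return [
        line
        for line in sql.splitlines()
        if line.strip() and not line.lstrip().startswith("--")
    ]


example = """-- Encounter Constraints
-- ALTER TABLE encounter DROP CONSTRAINT IF EXISTS pk_encounter CASCADE;

-- Patient Demographic Constraints
ALTER TABLE patient_demographic
    ADD CONSTRAINT pk_patient_demographic PRIMARY KEY (patient_id);
"""

for line in active_lines(example):
    print(line)
```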


In [6]:
val = """----------------------------------------
-- Add constraints to the tables --
----------------------------------------

-- USING MINIMAL FURTHER BELOW INSTEAD PER DUPLICATES 
--   (e.g., some encounter_id's are present in more than one row,
--    even though data dictionary indicates it is encounter table PK)
-- 
-- -- Patient Demographic Constraints
-- ALTER TABLE patient_demographic DROP CONSTRAINT IF EXISTS pk_patient_demographic CASCADE;
-- ALTER TABLE patient_demographic
--     ADD CONSTRAINT pk_patient_demographic PRIMARY KEY (patient_id);
-- 
-- 
-- -- Encounter Constraints
-- ALTER TABLE encounter DROP CONSTRAINT IF EXISTS pk_encounter CASCADE;
-- ALTER TABLE encounter DROP CONSTRAINT IF EXISTS fk_encounter_patient CASCADE;
-- ALTER TABLE encounter
--     ADD CONSTRAINT pk_encounter PRIMARY KEY (encounter_id),
--     ADD CONSTRAINT fk_encounter_patient FOREIGN KEY (patient_id) REFERENCES patient_demographic(patient_id);
-- 
-- -- Lab Result Constraints
-- ALTER TABLE lab_result DROP CONSTRAINT IF EXISTS fk_lab_result_patient CASCADE;
-- ALTER TABLE lab_result DROP CONSTRAINT IF EXISTS fk_lab_result_encounter CASCADE;
-- ALTER TABLE lab_result
--     ADD CONSTRAINT fk_lab_result_patient FOREIGN KEY (patient_id) REFERENCES patient_demographic(patient_id),
--     ADD CONSTRAINT fk_lab_result_encounter FOREIGN KEY (encounter_id) REFERENCES encounter(encounter_id);
-- 
-- -- Diagnosis Constraints
-- ALTER TABLE diagnosis DROP CONSTRAINT IF EXISTS fk_diagnosis_patient CASCADE;
-- ALTER TABLE diagnosis DROP CONSTRAINT IF EXISTS fk_diagnosis_encounter CASCADE;
-- ALTER TABLE diagnosis
--     ADD CONSTRAINT fk_diagnosis_patient FOREIGN KEY (patient_id) REFERENCES patient_demographic(patient_id),
--     ADD CONSTRAINT fk_diagnosis_encounter FOREIGN KEY (encounter_id) REFERENCES encounter(encounter_id);
-- 
-- -- Procedure Constraints
-- ALTER TABLE procedure DROP CONSTRAINT IF EXISTS fk_procedure_patient CASCADE;
-- ALTER TABLE procedure DROP CONSTRAINT IF EXISTS fk_procedure_encounter CASCADE;
-- ALTER TABLE procedure
--     ADD CONSTRAINT fk_procedure_patient FOREIGN KEY (patient_id) REFERENCES patient_demographic(patient_id),
--     ADD CONSTRAINT fk_procedure_encounter FOREIGN KEY (encounter_id) REFERENCES encounter(encounter_id);
-- 
-- -- Medication Constraints
-- ALTER TABLE medication DROP CONSTRAINT IF EXISTS fk_medication_patient CASCADE;
-- ALTER TABLE medication DROP CONSTRAINT IF EXISTS fk_medication_encounter CASCADE;
-- ALTER TABLE medication
--     ADD CONSTRAINT fk_medication_patient FOREIGN KEY (patient_id) REFERENCES patient_demographic(patient_id),
--     ADD CONSTRAINT fk_medication_encounter FOREIGN KEY (encounter_id) REFERENCES encounter(encounter_id);
-- 
-- -- Vital Sign Constraints
-- ALTER TABLE vital_sign DROP CONSTRAINT IF EXISTS fk_vital_sign_patient CASCADE;
-- ALTER TABLE vital_sign DROP CONSTRAINT IF EXISTS fk_vital_sign_encounter CASCADE;
-- ALTER TABLE vital_sign
--     ADD CONSTRAINT fk_vital_sign_patient FOREIGN KEY (patient_id) REFERENCES patient_demographic(patient_id),
--     ADD CONSTRAINT fk_vital_sign_encounter FOREIGN KEY (encounter_id) REFERENCES encounter(encounter_id);

-- Patient Demographic Constraints
ALTER TABLE patient_demographic DROP CONSTRAINT IF EXISTS pk_patient_demographic CASCADE;
ALTER TABLE patient_demographic
    ADD CONSTRAINT pk_patient_demographic PRIMARY KEY (patient_id);

-- Lab Result Constraints
ALTER TABLE lab_result DROP CONSTRAINT IF EXISTS fk_lab_result_patient CASCADE;
ALTER TABLE lab_result
    ADD CONSTRAINT fk_lab_result_patient FOREIGN KEY (patient_id) REFERENCES patient_demographic(patient_id);

-- Diagnosis Constraints
ALTER TABLE diagnosis DROP CONSTRAINT IF EXISTS fk_diagnosis_patient CASCADE;
ALTER TABLE diagnosis
    ADD CONSTRAINT fk_diagnosis_patient FOREIGN KEY (patient_id) REFERENCES patient_demographic(patient_id);

-- Medication Constraints
ALTER TABLE medication DROP CONSTRAINT IF EXISTS fk_medication_patient CASCADE;
ALTER TABLE medication
    ADD CONSTRAINT fk_medication_patient FOREIGN KEY (patient_id) REFERENCES patient_demographic(patient_id);
"""

# Write to SQL file
with open('constraint.sql', 'w') as f:
    f.write(val)

# print(val)
!cat constraint.sql


---

---

# Run `constraint.sql`

## `psql -d diabetes -v ON_ERROR_STOP=1 -v diabetes_data_dir=/scratch/general/vast/u0740821/ -f /uufs/chpc.utah.edu/common/home/u0740821/dissertation/data/diabetes/constraint.sql`
---
```
NOTICE:  constraint "pk_patient_demographic" of relation "patient_demographic" does not exist, skipping
ALTER TABLE
ALTER TABLE
NOTICE:  constraint "fk_lab_result_patient" of relation "lab_result" does not exist, skipping
ALTER TABLE
ALTER TABLE
NOTICE:  constraint "fk_diagnosis_patient" of relation "diagnosis" does not exist, skipping
ALTER TABLE
ALTER TABLE
NOTICE:  constraint "fk_medication_patient" of relation "medication" does not exist, skipping
ALTER TABLE
ALTER TABLE
```
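
The commented-out block in `constraint.sql` notes that some `encounter_id` values appear in more than one row, which is why the encounter primary key is not added. A `GROUP BY ... HAVING COUNT(*) > 1` query is one way to check a candidate key for duplicates before adding a constraint. The sketch below runs that query against toy, made-up data in an in-memory SQLite database purely so it is runnable anywhere; it is an assumption that you would run the same query text against the real tables in `psql`. The helper name `find_duplicate_keys` is hypothetical.

```python
import sqlite3

DUPLICATE_KEY_SQL = """
SELECT {column}, COUNT(*) AS n_rows
FROM {table}
GROUP BY {column}
HAVING COUNT(*) > 1
ORDER BY n_rows DESC
"""


def find_duplicate_keys(conn, table, column):
    """Return (value, row_count) pairs for values of `column` that repeat."""
    # table/column are formatted into the SQL text, so pass trusted names only.
    query = DUPLICATE_KEY_SQL.format(table=table, column=column)
    return conn.execute(query).fetchall()


# Toy data standing in for the encounter table (values are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encounter (encounter_id TEXT, patient_id TEXT)")
conn.executemany(
    "INSERT INTO encounter VALUES (?, ?)",
    [("e1", "p1"), ("e2", "p1"), ("e2", "p2"), ("e3", "p3")],
)
print(find_duplicate_keys(conn, "encounter", "encounter_id"))
# → [('e2', 2)]
```

An empty result means the column is unique and could carry a primary key.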

---

---

---

# Generate `index.sql`

---

In [7]:
val = """----------------------------------------
-- Add indexes to the tables --
----------------------------------------

-- Diagnosis Indexes
DROP INDEX IF EXISTS idx_diagnosis_patient_id;
CREATE INDEX idx_diagnosis_patient_id ON diagnosis(patient_id);

-- CREATE INDEX idx_diagnosis_encounter_id ON diagnosis(encounter_id);

DROP INDEX IF EXISTS idx_diagnosis_code;
CREATE INDEX idx_diagnosis_code ON diagnosis(code);

DROP INDEX IF EXISTS idx_diagnosis_date;
CREATE INDEX idx_diagnosis_date ON diagnosis(date);

DROP INDEX IF EXISTS idx_diagnosis_code_date;
CREATE INDEX idx_diagnosis_code_date ON diagnosis(code, date);



-- Encounter Indexes
DROP INDEX IF EXISTS idx_encounter_patient_id;
CREATE INDEX idx_encounter_patient_id ON encounter(patient_id);



-- Lab Result Indexes
DROP INDEX IF EXISTS idx_lab_result_patient_id;
CREATE INDEX idx_lab_result_patient_id ON lab_result(patient_id);

-- CREATE INDEX idx_lab_result_encounter_id ON lab_result(encounter_id);

DROP INDEX IF EXISTS idx_lab_result_code;
CREATE INDEX idx_lab_result_code ON lab_result(code);

DROP INDEX IF EXISTS idx_lab_result_date;
CREATE INDEX idx_lab_result_date ON lab_result(date);

DROP INDEX IF EXISTS idx_lab_result_code_date;
CREATE INDEX idx_lab_result_code_date ON lab_result(code, date);



-- Medication Indexes
DROP INDEX IF EXISTS idx_medication_patient_id;
CREATE INDEX idx_medication_patient_id ON medication(patient_id);

-- CREATE INDEX idx_medication_encounter_id ON medication(encounter_id);

DROP INDEX IF EXISTS idx_medication_code;
CREATE INDEX idx_medication_code ON medication(code);

DROP INDEX IF EXISTS idx_medication_start_date;
CREATE INDEX idx_medication_start_date ON medication(start_date);

DROP INDEX IF EXISTS idx_medication_code_start_date;
CREATE INDEX idx_medication_code_start_date ON medication(code, start_date);



-- Procedure Indexes
-- CREATE INDEX idx_procedure_patient_id ON procedure(patient_id);
-- -- CREATE INDEX idx_procedure_encounter_id ON procedure(encounter_id);
-- CREATE INDEX idx_procedure_code ON procedure(code);
-- CREATE INDEX idx_procedure_date ON procedure(date);

-- Vital Sign Indexes
-- CREATE INDEX idx_vital_sign_patient_id ON vital_sign(patient_id);
-- -- CREATE INDEX idx_vital_sign_encounter_id ON vital_sign(encounter_id);
-- CREATE INDEX idx_vital_sign_code ON vital_sign(code);
-- CREATE INDEX idx_vital_sign_date ON vital_sign(date);
-- CREATE INDEX idx_vital_sign_code_date ON vital_sign(code, date);
"""

# Write to SQL file
with open('index.sql', 'w') as f:
    f.write(val)

# print(val)
!cat index.sql


---

---

# Run `index.sql`

## `psql -d diabetes -v ON_ERROR_STOP=1 -v diabetes_data_dir=/scratch/general/vast/u0740821/ -f /uufs/chpc.utah.edu/common/home/u0740821/dissertation/data/diabetes/index.sql`
---
```
NOTICE:  index "idx_diagnosis_patient_id" does not exist, skipping
DROP INDEX
CREATE INDEX
NOTICE:  index "idx_diagnosis_code" does not exist, skipping
DROP INDEX
CREATE INDEX
NOTICE:  index "idx_diagnosis_date" does not exist, skipping
DROP INDEX
CREATE INDEX
NOTICE:  index "idx_diagnosis_code_date" does not exist, skipping
DROP INDEX
CREATE INDEX
NOTICE:  index "idx_encounter_patient_id" does not exist, skipping
DROP INDEX
CREATE INDEX
NOTICE:  index "idx_lab_result_patient_id" does not exist, skipping
DROP INDEX
CREATE INDEX
NOTICE:  index "idx_lab_result_code" does not exist, skipping
DROP INDEX
CREATE INDEX
NOTICE:  index "idx_lab_result_date" does not exist, skipping
DROP INDEX
CREATE INDEX
NOTICE:  index "idx_lab_result_code_date" does not exist, skipping
DROP INDEX
CREATE INDEX
NOTICE:  index "idx_medication_patient_id" does not exist, skipping
DROP INDEX
CREATE INDEX
NOTICE:  index "idx_medication_code" does not exist, skipping
DROP INDEX
CREATE INDEX
NOTICE:  index "idx_medication_start_date" does not exist, skipping
DROP INDEX
CREATE INDEX
NOTICE:  index "idx_medication_code_start_date" does not exist, skipping
DROP INDEX
CREATE INDEX
```
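
The `DROP INDEX` / `CREATE INDEX` pairs in `index.sql` follow a regular `idx_<table>_<columns>` naming pattern, so they could also be generated rather than typed out. This is only a sketch of that idea, not how the file above was produced; the function name `index_statements` is hypothetical.

```python
def index_statements(table, column_groups):
    """Build DROP/CREATE INDEX pairs named idx_<table>_<col1>_<col2>..."""
    statements = []
    for columns in column_groups:
        name = f"idx_{table}_{'_'.join(columns)}"
        statements.append(f"DROP INDEX IF EXISTS {name};")
        statements.append(f"CREATE INDEX {name} ON {table}({', '.join(columns)});")
    return statements


# Reproduce the diagnosis section of index.sql.
diagnosis_sql = index_statements(
    "diagnosis",
    [("patient_id",), ("code",), ("date",), ("code", "date")],
)
print("\n".join(diagnosis_sql))
```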

---

---

---


# Check Size


---

In [8]:
!du -sh pgsql/

1.6T	pgsql/
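
If `du` is not convenient (for example, when you want the number inside Python), a rough equivalent can be sketched with `os.walk`. Note the assumption: this sums apparent file sizes, whereas `du` reports disk blocks used, so the two figures can differ. The helper names `dir_size_bytes` and `format_size` are hypothetical.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def format_size(n_bytes):
    """Format a byte count with 1024-based units, similar to `du -h`."""
    size = float(n_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}T"


# Usage (path as in the cell above):
# print(format_size(dir_size_bytes("pgsql/")))
```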
