# Clone the CAML-MIMIC Repository

In [1]:
!git clone https://github.com/jamesmullenbach/caml-mimic.git

Cloning into 'caml-mimic'...


In [3]:
!git clone https://github.com/acadTags/Explainable-Automated-Medical-Coding.git

Cloning into 'Explainable-Automated-Medical-Coding'...


Next, modify the files in the caml-mimic repo so that they run with the latest libraries.
Also update ```caml-mimic/notebooks/dataproc_mimic_III.ipynb``` to reflect any changes made to the code.

1. Your MIMIC-III data must be organized with the following structure within the caml-mimic repository:
```
mimicdata/
|   D_ICD_DIAGNOSES.csv
|   D_ICD_PROCEDURES.csv
|   ICD9_descriptions (already in repo)
└───mimic3/
|   |   NOTEEVENTS.csv
|   |   DIAGNOSES_ICD.csv
|   |   PROCEDURES_ICD.csv
|   |   *_hadm_ids.csv (already in repo)
└───saved_models/
```
2. To get started, first edit `constants.py` to point to the `mimicdata` and `mimicdata/mimic3` directories above.

3. Open Jupyter Notebook `notebooks/dataproc_mimic_III.ipynb`, run all cells (in the menu, click Cell -> Run All)
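
Before running the preprocessing notebook, it can help to confirm that the files are where the layout above expects them. The following is a minimal sketch, not part of either repository: the helper name `missing_mimic_files` is hypothetical, and the path `caml-mimic/mimicdata` assumes the repo was cloned into the current working directory.

```python
from pathlib import Path

# Files the layout above expects, relative to the mimicdata/ directory.
# Files marked "already in repo" are left out of the check.
EXPECTED_FILES = [
    "D_ICD_DIAGNOSES.csv",
    "D_ICD_PROCEDURES.csv",
    "mimic3/NOTEEVENTS.csv",
    "mimic3/DIAGNOSES_ICD.csv",
    "mimic3/PROCEDURES_ICD.csv",
]


def missing_mimic_files(data_dir):
    """Return the expected files that are not present under data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Assumed location: the mimicdata/ folder inside the cloned caml-mimic repo.
missing = missing_mimic_files("caml-mimic/mimicdata")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All expected MIMIC-III files are in place.")
```

If the check reports missing files, copy them from your MIMIC-III download before editing `constants.py`.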

Next, convert each generated .csv file into a text file in which every line holds one document followed by its labels, in the format ```doc__label__labelA labelB labelC```.

In [None]:
# Define a function to convert a caml-mimic .csv file into an HLAN .txt file
import pandas as pd


def txt_from_csv(caml_mimic_csv_file, hlan_text_file):
    """Write one `text__label__labelA labelB` line per row of the .csv file."""
    df = pd.read_csv(caml_mimic_csv_file)
    df.info()  # print a summary of the loaded columns

    text_lines = df["TEXT"].tolist()
    # The .csv separates labels with ";", the target format with spaces
    label_lines = df["LABELS"].map(lambda line: line.replace(";", " ")).tolist()

    with open(hlan_text_file, "w") as fh:
        for text_line, label_line in zip(text_lines, label_lines):
            print(f"{text_line}__label__{label_line}", file=fh)

# Convert the text and place it in the correct folder for Explainable-Automated-Medical-Coding
txt_from_csv("caml-mimic/mimicdata/mimic3/dev_50.csv", "Explainable-Automated-Medical-Coding/datasets/mimiciii_dev_50_th0.txt")
txt_from_csv("caml-mimic/mimicdata/mimic3/test_50.csv", "Explainable-Automated-Medical-Coding/datasets/mimiciii_test_50_th0.txt")
txt_from_csv("caml-mimic/mimicdata/mimic3/train_50.csv", "Explainable-Automated-Medical-Coding/datasets/mimiciii_train_50_th0.txt")
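
To sanity-check the conversion above, the generated lines can be parsed back into a document and its labels. This is a minimal sketch under the assumption that each document sits on a single line; `parse_hlan_line` and `count_documents` are hypothetical helpers, not functions from either repository.

```python
def parse_hlan_line(line):
    """Split one 'doc__label__labelA labelB' line into (text, [labels])."""
    # rpartition splits on the last separator, so the labels are always the tail.
    text, sep, labels = line.rstrip("\n").rpartition("__label__")
    if not sep:
        raise ValueError("line has no '__label__' separator")
    return text, labels.split()


def count_documents(path):
    """Parse every line of a converted file and return how many documents it holds."""
    count = 0
    with open(path) as fh:
        for line in fh:
            parse_hlan_line(line)  # raises ValueError on a malformed line
            count += 1
    return count


# Illustrative line only; the text and codes are made up for the example.
text, labels = parse_hlan_line("admitted with chest pain__label__401.9 38.93\n")
print(text)
print(labels)
```

Calling `count_documents` on each generated `.txt` file and comparing the result with the row count of the matching `.csv` file is a quick way to spot documents that were split across lines.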


Next, write the model class, taking inspiration from the HLAN class in ```https://github.com/dmcguire81/CS598DL4H``` and modifying the code so that it runs with the latest libraries.

Now download the embedding models provided by the author and place them in the embeddings folder. Optionally, you can also download the checkpoints to use the model for prediction directly.

Once all the modifications are done, do a trial run with the following command to check that the code runs.
```
!python HLAN/HAN_train.py \
    --dataset 'mimic-50' \
    --batch_size 128 \
    --per_label_attention=False \
    --per_label_sent_only=False \
    --num_epochs=100 \
    --early_stop_lr=0.00002 \
    --remove_ckpts_before_train=False \
    --use_label_embedding=True \
    --ckpt_dir checkpoints/HAN+LE/ \
    --log_dir logs/HAN+LE \
    --word2vec_model_path Explainable-Automated-Medical-Coding/embeddings/processed_full.w2v \
    --label_embedding_model_path Explainable-Automated-Medical-Coding/embeddings/code-emb-mimic3-tr-400.model \
    --label_embedding_model_path_per_label Explainable-Automated-Medical-Coding/embeddings/code-emb-mimic3-tr-200.model
```

Once it completes an epoch, you can stop the run if you plan to train on Colab; otherwise, you can try out the different models and compare the results.
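
The training commands in this notebook differ only in batch size, the two attention flags, and the output directories. As a rough sketch, a small helper can assemble the command for each variant so the shared flags are written once. `build_train_command` and the `VARIANTS` table are hypothetical names introduced here; the flag values are copied from the commands in this notebook.

```python
import shlex

# Embedding folder used by the commands in this notebook.
EMBEDDINGS = "Explainable-Automated-Medical-Coding/embeddings"

# Flag combinations copied from the training commands in this notebook.
VARIANTS = {
    "HLAN+LE": {"batch_size": 32, "per_label_attention": True, "per_label_sent_only": False},
    "HA-GRU+LE": {"batch_size": 32, "per_label_attention": True, "per_label_sent_only": True},
    "HAN+LE": {"batch_size": 128, "per_label_attention": False, "per_label_sent_only": False},
}


def build_train_command(variant):
    """Return the argument list for one variant (hypothetical helper)."""
    cfg = VARIANTS[variant]
    return [
        "python", "HLAN/HAN_train.py",
        "--dataset", "mimic-50",
        "--batch_size", str(cfg["batch_size"]),
        f"--per_label_attention={cfg['per_label_attention']}",
        f"--per_label_sent_only={cfg['per_label_sent_only']}",
        "--num_epochs=100",
        "--early_stop_lr=0.00002",
        "--remove_ckpts_before_train=False",
        "--use_label_embedding=True",
        "--ckpt_dir", f"checkpoints/{variant}/",
        "--log_dir", f"logs/{variant}",
        "--word2vec_model_path", f"{EMBEDDINGS}/processed_full.w2v",
        "--label_embedding_model_path", f"{EMBEDDINGS}/code-emb-mimic3-tr-400.model",
        "--label_embedding_model_path_per_label", f"{EMBEDDINGS}/code-emb-mimic3-tr-200.model",
    ]


# Print the trial-run command so it can be pasted after "!" in a notebook cell.
print(shlex.join(build_train_command("HAN+LE")))
```

Keeping the variants in one table makes it easier to see which flags actually change between runs when comparing results.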

To train the model in Colab, you have two options:
- Upload all the files to Google Drive
<h4 align="center"> Or </h4>

- Clone the same repos into the Colab environment, upload the dataset there, and run the training in Colab

# Training of the Models

In [None]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


In [None]:
path_to_gdrive_folder = '/content/gdrive/MyDrive/MIMIC-III/Reproduce-Explainable-Automated-Medical-Coding'
%cd '/content/gdrive/MyDrive/MIMIC-III/Reproduce-Explainable-Automated-Medical-Coding'

In [None]:
!pip install tf-slim
!pip install tflearn

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tf-slim
  Downloading tf_slim-1.1.0-py2.py3-none-any.whl (352 kB)
Installing collected packages: tf-slim
Successfully installed tf-slim-1.1.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tflearn
  Downloading tflearn-0.5.0.tar.gz (107 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: 

In [None]:
!python HLAN/HAN_train.py \
    --dataset 'mimic-50' \
    --batch_size 32 \
    --per_label_attention=True \
    --per_label_sent_only=False \
    --num_epochs=100 \
    --early_stop_lr=0.00002 \
    --remove_ckpts_before_train=False \
    --use_label_embedding=True \
    --ckpt_dir checkpoints/HLAN+LE/ \
    --log_dir logs/HLAN+LE \
    --word2vec_model_path Explainable-Automated-Medical-Coding/embeddings/processed_full.w2v \
    --label_embedding_model_path Explainable-Automated-Medical-Coding/embeddings/code-emb-mimic3-tr-400.model \
    --label_embedding_model_path_per_label Explainable-Automated-Medical-Coding/embeddings/code-emb-mimic3-tr-200.model


2023-03-22 18:43:50.447560: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-22 18:43:52.115427: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-22 18:43:52.115559: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
Instructions for updating:
non-resource variables are not supported in the long term
['caml-mi

In [None]:
!python HLAN/HAN_train.py \
    --dataset 'mimic-50' \
    --batch_size 32 \
    --per_label_attention=True \
    --per_label_sent_only=True \
    --num_epochs=100 \
    --early_stop_lr=0.00002 \
    --remove_ckpts_before_train=False \
    --use_label_embedding=True \
    --ckpt_dir checkpoints/HA-GRU+LE/ \
    --log_dir logs/HA-GRU+LE \
    --word2vec_model_path Explainable-Automated-Medical-Coding/embeddings/processed_full.w2v \
    --label_embedding_model_path Explainable-Automated-Medical-Coding/embeddings/code-emb-mimic3-tr-400.model \
    --label_embedding_model_path_per_label Explainable-Automated-Medical-Coding/embeddings/code-emb-mimic3-tr-200.model

2023-03-23 04:36:27.495162: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-23 04:36:28.418473: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-23 04:36:28.418577: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
Instructions for updating:
non-resource variables are not supported in the long term
['caml-mi

In [None]:
!python HLAN/HAN_train.py \
    --dataset 'mimic-50' \
    --batch_size 128 \
    --per_label_attention=False \
    --per_label_sent_only=False \
    --num_epochs=100 \
    --early_stop_lr=0.00002 \
    --remove_ckpts_before_train=False \
    --use_label_embedding=True \
    --ckpt_dir checkpoints/HAN+LE/ \
    --log_dir logs/HAN+LE \
    --word2vec_model_path Explainable-Automated-Medical-Coding/embeddings/processed_full.w2v \
    --label_embedding_model_path Explainable-Automated-Medical-Coding/embeddings/code-emb-mimic3-tr-400.model \
    --label_embedding_model_path_per_label Explainable-Automated-Medical-Coding/embeddings/code-emb-mimic3-tr-200.model

2023-03-23 05:02:22.767565: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-23 05:02:24.145493: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-23 05:02:24.145612: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
Instructions for updating:
non-resource variables are not supported in the long term
['caml-mi

In [None]:
!python HLAN/HAN_train.py \
    --dataset 'mimic-50' \
    --batch_size 32 \
    --per_label_attention=True \
    --per_label_sent_only=True \
    --num_epochs=100 \
    --early_stop_lr=0.00002 \
    --remove_ckpts_before_train=False \
    --use_label_embedding=True \
    --ckpt_dir checkpoints/HLAN/ \
    --log_dir logs/HLAN \
    --word2vec_model_path Explainable-Automated-Medical-Coding/embeddings/processed_full.w2v \
    --label_embedding_model_path Explainable-Automated-Medical-Coding/embeddings/code-emb-mimic3-tr-400.model \
    --label_embedding_model_path_per_label Explainable-Automated-Medical-Coding/embeddings/code-emb-mimic3-tr-200.model

2023-03-23 05:21:08.311247: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-23 05:21:09.195487: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-23 05:21:09.195585: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
Instructions for updating:
non-resource variables are not supported in the long term
['caml-mi

Next, try the original code, modified to run with the latest libraries, to check whether the results differ.

In [None]:
!python HLAN/HAN_train.py \
    --dataset mimic3-ds-50 \
    --batch_size 32 \
    --per_label_attention=True \
    --per_label_sent_only=True \
    --num_epochs 100 \
    --report_rand_pred=False \
    --running_times 1 \
    --early_stop_lr 0.00002 \
    --remove_ckpts_before_train=False \
    --use_label_embedding=True \
    --ckpt_dir ../checkpoints/checkpoint_HAN_50_per_label_bs32_LE/ \
    --use_sent_split_padded_version=False \
    --marking_id 50-hlan \
    --gpu=True

2023-03-23 07:08:13.617659: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-23 07:08:14.609317: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-23 07:08:14.609414: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
Instructions for updating:
non-resource variables are not supported in the long term
path sele