# TimEHR

This tutorial demonstrates how to use the TimEHR plugin (`timehr`) to generate synthetic electronic health record (EHR) data. TimEHR can generate *irregularly sampled* time series with *missing values*. Currently, it only supports *continuous-valued* time series. It first generates static data with `CTGAN` and then generates time series conditioned on the static data.
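To make *irregularly sampled* and *missing values* concrete, the toy sketch below arranges a handful of measurements for one patient into a time-by-variable grid. Unobserved cells become `NaN`, and a binary mask records which values were actually measured. The variable names and values are hypothetical, not taken from P12 or P19. The sketch only illustrates the data shape and is not how TimEHR preprocesses data internally.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format records for one ICU stay: each row is a single
# measurement taken at an arbitrary (irregular) time, in hours.
records = pd.DataFrame(
    {
        "time": [0.0, 0.5, 0.5, 2.25, 7.0],
        "variable": ["HR", "HR", "Temp", "HR", "Temp"],
        "value": [88.0, 92.0, 37.1, 95.0, 37.6],
    }
)

# Pivot to a time x variable grid: one row per distinct timestamp and
# one column per variable. Cells with no measurement become NaN.
grid = records.pivot_table(index="time", columns="variable", values="value")

# Observation mask: 1 where a value was measured, 0 where it is missing.
mask = grid.notna().astype(int)

print(grid)
print(mask)
```

Most cells in real ICU data are empty, so the mask carries information that a model of such data has to reproduce alongside the values.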


* Please check out the [TimEHR GitHub repository](https://github.com/hojjatkarami/TimEHR) for more information.
* You can find the original paper [here](https://arxiv.org/abs/2402.06318).
* To install the additional required libraries:
    ```
    pip install wandb python-dotenv hydra-core
    ```
### Config file
* We use `hydra` to manage configuration files, which are located in the `configs` folder. The parameters of the TimEHR model are set in `configs/config.yaml`, and dataset-specific settings are stored in `configs/data/{DATA}.yaml`.
* Download the `configs` folder from the [original repository](https://github.com/hojjatkarami/TimEHR/tree/main) and put it in the same folder as this notebook.


### Datasets
* Follow the [instructions](https://github.com/hojjatkarami/TimEHR/tree/main/data) to prepare the `P12` or `P19` dataset. Both are EHR datasets containing irregularly sampled time series from ICU patients.
* Alternatively, you can download them from [here](https://drive.google.com/drive/folders/1QsK1tcH5NV5Xu2cEMOJvCy3IicSvRSme?usp=sharing).
* Put the datasets in a folder and set `path_processed` in the data config file to that folder.

In [None]:
# !pip install synthcity
# !pip uninstall -y torchaudio torchdata


# display the output of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# auto reload modules. useful for development
%load_ext autoreload
%autoreload 2

In [None]:
# make sure the src folder is on the Python path (useful for development)
import sys
sys.path.insert(0,"../src")

In [None]:
# stdlib
import sys
import warnings

# third party
from tqdm import tqdm

# synthcity absolute
import synthcity.logger as log
from synthcity.plugins import Plugins

log.add(sink=sys.stderr, level="INFO")
warnings.filterwarnings("ignore")

# Load configs



In [None]:
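# Hydra can only be initialized once per kernel. Clearing the global
# instance first makes this cell safe to re-run without restarting the kernel.
from hydra import initialize, compose
from hydra.core.global_hydra import GlobalHydra
from omegaconf import OmegaConf

GlobalHydra.instance().clear()
initialize(version_base=None, config_path="configs")
cfg = compose(config_name="config.yaml")

# inspect the composed configuration
print(OmegaConf.to_yaml(cfg))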

# Load datasets

In [None]:
from synthcity.plugins.core.models.timehr.data_utils import get_datasets

train_dataset, val_dataset = get_datasets(cfg.data, split=cfg.split, preprocess=True)

# Import model

In [None]:
from synthcity.plugins import Plugins

generators = Plugins()

# check that the plugin is available (should display True)
"timehr" in generators.list()

# instantiate the plugin with the composed Hydra config
timehr_model = generators.get("timehr", cfg)

# Train

* It is highly recommended to use `wandb` to log the training process. You can create a free account on [wandb](https://wandb.ai/site) and get your API key from [here](https://wandb.ai/authorize). Put the API key in a `.env` file in the root directory of the project.

```bash
WANDB_API_KEY=your_api_key
```
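`load_dotenv()` reads `KEY=VALUE` lines from `.env` into `os.environ`. The stdlib-only sketch below imitates that basic behavior so you can see what happens. It is a simplified stand-in (no quoting rules, variable expansion, or `export` prefixes), and `load_env_file` is a hypothetical helper, so keep using `python-dotenv` in practice. Like `load_dotenv()` with its default settings, it does not overwrite variables that are already set.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path):
    """Simplified .env loader (hypothetical helper, not python-dotenv).

    Copies KEY=VALUE lines into os.environ, skipping blank lines and
    lines starting with '#'. Existing environment variables are kept.
    """
    loaded = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        loaded[key] = value
        os.environ.setdefault(key, value)  # do not override existing values
    return loaded


# Demo with a throwaway file instead of your real .env.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# wandb credentials\nWANDB_API_KEY=your_api_key\n")
    values = load_env_file(env_path)

print(values)
print("WANDB_API_KEY" in os.environ)
```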

* TimEHR will train two modules: `CWGAN` and `Pix2pix`. Please check the original paper for more information about these models. Each module will be saved in a separate wandb project. You can check the training process on the wandb dashboard.

In [None]:
# init wandb
import os
import wandb
from dotenv import load_dotenv

# read WANDB_API_KEY from the .env file into the environment
load_dotenv()
wandb.login(key=os.getenv("WANDB_API_KEY"))


In [None]:
# train

timehr_model._fit(train_dataset, val_dataset)

# Generate

`fake_static` is generated with `CTGAN`, and `fake_data` (the time series) is generated by TimEHR conditioned on `fake_static`.
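The two-stage idea (sample static attributes first, then sample time series given those attributes) can be illustrated with a toy numpy sampler. Everything below is hypothetical: a normal distribution stands in for CTGAN, and a noisy age-dependent baseline stands in for TimEHR's conditional generator. It only shows the shape of the pipeline and the conditioning dependency, not the actual models.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_static(count):
    # Stage 1 stand-in for CTGAN: one static attribute (age) per patient.
    return rng.normal(loc=65.0, scale=12.0, size=(count, 1))


def sample_series(static, n_steps=48, n_vars=3, missing_rate=0.7):
    # Stage 2 stand-in for the conditional generator: each patient's series
    # has a baseline that depends on that patient's static age value.
    count = static.shape[0]
    baseline = 0.5 * static[:, :, None]  # shape (count, 1, 1)
    values = baseline + rng.normal(size=(count, n_steps, n_vars))
    # Randomly hide entries so the output has missing values, as EHR data does.
    observed = rng.random(values.shape) > missing_rate
    return np.where(observed, values, np.nan)


# Toy names avoid overwriting the notebook's fake_static / fake_data.
toy_static = sample_static(count=1000)
toy_series = sample_series(toy_static)
print(toy_static.shape, toy_series.shape)
print(f"fraction missing: {np.isnan(toy_series).mean():.2f}")
```

Because stage 2 reads the output of stage 1, patients with similar static attributes get similar series, which is the property the conditioning is meant to preserve.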

In [None]:
fake_static, fake_data = timehr_model._generate(count=1000, train_dataset=train_dataset, method="ctgan")



In [None]:
# converting to dataframes
from synthcity.plugins.core.models.timehr.utils import mat2df

df_ts_fake, df_static_fake = mat2df(
    fake_data,
    fake_static,
    train_dataset.dynamic_processor,
    train_dataset.static_processor,
)


In [None]:
df_static_fake
df_ts_fake

# Conclusion
* Please refer to the original repository for visualization and evaluation of the generated data.
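As a very first sanity check before the fuller evaluation in the repository, you could compare per-variable missing rates between real and synthetic time series. The sketch below assumes both are numpy arrays of shape (patients, time steps, variables) with `NaN` marking missing entries. That layout is an assumption here, so adapt it to whatever your real and generated data actually hold. The helper names are hypothetical.

```python
import numpy as np


def missing_rate_per_variable(data):
    """Fraction of missing (NaN) entries for each variable.

    Assumes `data` has shape (patients, time_steps, variables).
    """
    return np.isnan(data).mean(axis=(0, 1))


def compare_missingness(real, synthetic):
    # Absolute gap in missing rate per variable; smaller means closer.
    return np.abs(missing_rate_per_variable(real) - missing_rate_per_variable(synthetic))


# Tiny hand-made example: 2 patients, 2 time steps, 2 variables.
real = np.array([[[1.0, np.nan], [2.0, np.nan]],
                 [[np.nan, 3.0], [4.0, np.nan]]])
synthetic = np.array([[[1.5, 2.5], [np.nan, np.nan]],
                      [[2.0, np.nan], [3.0, 1.0]]])

print(missing_rate_per_variable(real))
print(compare_missingness(real, synthetic))
```

Matching missingness says nothing about whether the observed values are realistic, so treat this only as a quick check alongside the repository's evaluation.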