# Running CLMBR

CLMBR is a model that predicts future patient events from past events. Running it relies on command-line tools provided by the `femr` package (`pip install femr`).

The general approach is:
1. Create a dictionary
2. Create batches of patient data
3. Train the model
4. Compute representations
5. Use the model

The commands are as follows:
1. `clmbr_create_dictionary`
    - `clmbr_create_survival_dictionary`
2. `clmbr_create_batches`
3. `clmbr_train_model`
4. `clmbr_compute_representations`
5. Reference the model directory in the code
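
The numbered steps map onto a fixed sequence of shell commands. This sketch chains steps 1 to 4 from Python. It uses only the flags that appear in this notebook's commands and `--help` output. The `./example_data/...` paths mirror the examples here. `./example_data/representations`, the destination for step 4, is a hypothetical choice, and `clmbr_pipeline` and `run` are illustrative helpers, not part of femr. By default the sketch only prints the commands.

```python
import shlex
import subprocess

# Paths mirror the examples in this notebook; REPRESENTATIONS is a hypothetical destination.
DATA = "./example_data/example_etl_output"
DICTIONARY = "./example_data/dictionary"
BATCHES = "./example_data/batches"
MODEL = "./example_data/trained_model"
REPRESENTATIONS = "./example_data/representations"


def clmbr_pipeline(vocab_size: int = 5, learning_rate: float = 0.1) -> list[list[str]]:
    """Return the CLI invocations for steps 1-4, in order (hypothetical helper)."""
    return [
        # 1. Build the code dictionary from the ETL output.
        ["clmbr_create_dictionary", "--data_path", DATA, DICTIONARY],
        # 2. Batch the patient data; vocab_size must be less than the number of dictionary codes.
        ["clmbr_create_batches", "--data_path", DATA,
         "--dictionary_path", DICTIONARY,
         "--transformer_vocab_size", str(vocab_size),
         "--task", "clmbr", BATCHES],
        # 3. Train the model; per the install notes, this step uses the CUDA build of JAX.
        ["clmbr_train_model", "--data_path", DATA, "--batches_path", BATCHES,
         "--learning_rate", str(learning_rate), "--rotary_type", "disabled", MODEL],
        # 4. Compute representations with the trained model.
        ["clmbr_compute_representations", "--data_path", DATA,
         "--batches_path", BATCHES, "--model_dir", MODEL, REPRESENTATIONS],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it too when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(clmbr_pipeline())
```

Passing `dry_run=False` runs each step with `subprocess.run(..., check=True)`. A failing step then raises `subprocess.CalledProcessError`, which keeps later steps from running on missing inputs.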


To use CLMBR, you'll need to have CUDA on Linux.

You'll need to install some extra packages:
```bash
# clmbr_create_dictionary
pip install dm-haiku optax

# clmbr_create_batches
pip install jax
pip install --upgrade "jax[cpu]"

# clmbr_train_model, clmbr_compute_representations
# If using CUDA on Linux
pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
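
Before running the commands, you may want to confirm that the packages above are importable and whether JAX actually sees a GPU. This is a minimal sketch. The module list is an assumption derived from the `pip install` lines (the `dm-haiku` package is imported as `haiku`), and `check_environment` is a hypothetical helper.

```python
import importlib.util

# Import names assumed from the pip installs above (dm-haiku is imported as `haiku`).
REQUIRED_MODULES = ["femr", "jax", "haiku", "optax"]


def check_environment() -> dict:
    """Report which CLMBR dependencies are importable and which JAX backend is active."""
    # find_spec locates a top-level module without importing it.
    report = {name: importlib.util.find_spec(name) is not None for name in REQUIRED_MODULES}
    report["jax_backend"] = None
    if report["jax"]:
        import jax

        # Typically "gpu" when a CUDA device is usable, "cpu" otherwise.
        report["jax_backend"] = jax.default_backend()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

On a CUDA machine, a `jax_backend` of `cpu` suggests that the CUDA-enabled JAX build is either not installed or cannot see the GPU.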

```bash
clmbr_create_dictionary --data_path ./example_data/example_etl_output ./example_data/dictionary
```

```
Banned 0 out of 10
Got age statistics ... {"mean":23346308.57142857,"std":9540706.023008432}
```
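
The age statistics are printed as JSON after the log message, but the output does not state their unit. This sketch parses the line and converts the values to years under the assumption that ages are measured in minutes. That unit is a guess, not something this notebook confirms, and `parse_age_stats` and `to_years` are hypothetical helpers.

```python
import json

# Log line copied from the clmbr_create_dictionary output above.
LOG_LINE = 'Got age statistics ... {"mean":23346308.57142857,"std":9540706.023008432}'

# ASSUMPTION: ages are in minutes (365.25-day years); the source does not confirm this.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def parse_age_stats(line: str) -> dict:
    """Extract the JSON object that follows the log message."""
    return json.loads(line[line.index("{"):])


def to_years(minutes: float) -> float:
    """Convert a duration in minutes to years."""
    return minutes / MINUTES_PER_YEAR


stats = parse_age_stats(LOG_LINE)
print(f"mean ~ {to_years(stats['mean']):.1f} years, std ~ {to_years(stats['std']):.1f} years")
```

If the unit is minutes, the mean is about 44.4 years with a standard deviation of about 18.1 years, which is a plausible age distribution. A different unit would change these numbers entirely.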


```bash
# `--transformer_vocab_size` must be less than the number of codes in the dictionary.
clmbr_create_batches --data_path ./example_data/example_etl_output --dictionary_path ./example_data/dictionary --transformer_vocab_size 5 --task clmbr ./example_data/batches
```

```
2023-05-18 11:03:35,670 [MainThread  ] [INFO ]  Preparing batches with Namespace(directory='./example_data/batches', data_path='./example_data/example_etl_output', dictionary_path='./example_data/dictionary', task='clmbr', transformer_vocab_size=5, clmbr_survival_dictionary_path=None, labeled_patients_path=None, is_hierarchical=False, seed=97, val_start=70, test_start=85, batch_size=16384, note_embedding_data=None, limit_to_patients_file=None, limit_before_date=None)
2023-05-18 11:03:35,671 [MainThread  ] [INFO ]  Wrote config ...
2023-05-18 11:03:35,671 [MainThread  ] [INFO ]  Starting to load
When mapping codes, dropped 0 out of 5
2023-05-18 11:03:35,672 [MainThread  ] [INFO ]  Loaded
When mapping codes, dropped 0 out of 5
2023-05-18 11:03:35,673 [MainThread  ] [INFO ]  Number of train patients 1
```
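
The batching Namespace includes `seed=97`, `val_start=70`, and `test_start=85`. One plausible reading, which has not been checked against the femr source, is that each patient is assigned to one of 100 buckets. Under that reading, buckets below 70 would go to training, 70 to 84 to validation, and 85 and above to test. This sketch shows a deterministic split of that kind. `assign_split` and its SHA-256 bucketing are hypothetical and will not reproduce femr's actual assignments.

```python
import hashlib

# Hypothetical split scheme; only the three numbers come from the Namespace above.
SEED, VAL_START, TEST_START = 97, 70, 85


def assign_split(patient_id: int, seed: int = SEED) -> str:
    """Deterministically map a patient to train/val/test via a 0-99 bucket."""
    digest = hashlib.sha256(f"{seed}:{patient_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    if bucket < VAL_START:
        return "train"
    if bucket < TEST_START:
        return "val"
    return "test"


if __name__ == "__main__":
    splits = [assign_split(pid) for pid in range(10_000)]
    # For a large population this approaches 70% / 15% / 15%.
    for name in ("train", "val", "test"):
        print(name, splits.count(name) / len(splits))
```

With only a handful of example patients, a split like this can leave very few patients in any one partition. That may explain why the log reports a single training patient.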


```bash
clmbr_train_model --data_path ./example_data/example_etl_output --batches_path ./example_data/batches --learning_rate 0.1 --rotary_type disabled ./example_data/trained_model
```

```
2023-05-18 10:51:11,403 [MainThread  ] [INFO ]  Training model with Namespace(directory='./example_data/trained_model', data_path='./example_data/example_etl_output', batches_path='./example_data/batches', learning_rate=0.1, rotary_type='disabled', clmbr_survival_dim=None, num_batch_threads=None, start_from_checkpoint=None, freeze_weights=False, token_dropout=0, internal_dropout=0, weight_decay=0, max_iter=None, hidden_size=768, intermediate_size=3072, n_heads=12, n_layers=6, attention_width=512, dev_batches_path=None, early_stopping_window_steps=None)
2023-05-18 10:51:11,404 [MainThread  ] [INFO ]  Got config {'data_path': './example_data/example_etl_output', 'batch_info_path': './example_data/batches/batch_info.msgpack', 'seed': 97, 'task': {'type': 'clmbr', 'vocab_size': 8192}, 'transformer': {'vocab_size': 5, 'hidden_size': 768, 'intermediate_size': 3072, 'n_heads': 12, 'n_layers': 6, 'rotary': 'disabled', 'attention_width': 496, 'internal_dropout': 0, 'is_hierarchical': False, 'no
```

```bash
clmbr_train_model --help
```

```
usage: Train [-h] --data_path DATA_PATH --batches_path BATCHES_PATH
             --learning_rate LEARNING_RATE --rotary_type ROTARY_TYPE
             [--clmbr_survival_dim CLMBR_SURVIVAL_DIM]
             [--num_batch_threads NUM_BATCH_THREADS]
             [--start_from_checkpoint START_FROM_CHECKPOINT]
             [--freeze_weights] [--token_dropout TOKEN_DROPOUT]
             [--internal_dropout INTERNAL_DROPOUT]
             [--weight_decay WEIGHT_DECAY] [--max_iter MAX_ITER]
             [--hidden_size HIDDEN_SIZE]
             [--intermediate_size INTERMEDIATE_SIZE] [--n_heads N_HEADS]
             [--n_layers N_LAYERS] [--attention_width ATTENTION_WIDTH]
             [--dev_batches_path DEV_BATCHES_PATH]
             [--early_stopping_window_steps EARLY_STOPPING_WINDOW_STEPS]
             directory

positional arguments:
  directory

options:
  -h, --help            show this help message and exit
  --data_path DATA_PATH
  --batches_path BATCHES_PATH
  --learning_rate LEARNING_
```
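
The usage text marks `--data_path`, `--batches_path`, `--learning_rate`, and `--rotary_type` as required, plus the positional output `directory`. All other flags are optional, and the training log shows their defaults (for example `hidden_size=768`, `n_layers=6`, `attention_width=512`). This sketch builds a command line from keyword arguments and rejects names that do not appear in the usage text. `train_argv` is a hypothetical helper, not part of femr. It does not validate flag values; for instance, `disabled` is the only `--rotary_type` value that appears in this notebook.

```python
import shlex

# Optional flags that take a value, copied from the `clmbr_train_model --help` usage above.
VALUE_FLAGS = {
    "clmbr_survival_dim", "num_batch_threads", "start_from_checkpoint",
    "token_dropout", "internal_dropout", "weight_decay", "max_iter",
    "hidden_size", "intermediate_size", "n_heads", "n_layers",
    "attention_width", "dev_batches_path", "early_stopping_window_steps",
}
# `--freeze_weights` appears in the usage line without a value, so treat it as a switch.
SWITCH_FLAGS = {"freeze_weights"}


def train_argv(directory: str, data_path: str, batches_path: str,
               learning_rate: float, rotary_type: str, **options) -> list[str]:
    """Build a clmbr_train_model command line, rejecting flags not in the help output."""
    argv = ["clmbr_train_model",
            "--data_path", data_path,
            "--batches_path", batches_path,
            "--learning_rate", str(learning_rate),
            "--rotary_type", rotary_type]
    for name, value in options.items():
        if name in SWITCH_FLAGS:
            if value:
                argv.append(f"--{name}")
        elif name in VALUE_FLAGS:
            argv += [f"--{name}", str(value)]
        else:
            raise ValueError(f"unknown clmbr_train_model option: {name}")
    argv.append(directory)  # the positional `directory` argument goes last
    return argv


if __name__ == "__main__":
    argv = train_argv("./example_data/trained_model",
                      "./example_data/example_etl_output",
                      "./example_data/batches",
                      0.1, "disabled", n_layers=2, max_iter=100)
    print(shlex.join(argv))
```

A typo such as `n_layer=2` fails here with a `ValueError` before any training starts. Without this check, argparse would reject the flag only when the tool runs.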

```bash
clmbr_compute_representations --help
```

```
usage: Compute representations [-h] --data_path DATA_PATH --batches_path
                               BATCHES_PATH --model_dir MODEL_DIR
                               destination

positional arguments:
  destination

options:
  -h, --help            show this help message and exit
  --data_path DATA_PATH
  --batches_path BATCHES_PATH
  --model_dir MODEL_DIR
```


TODO: How do we reference the trained CLMBR model directory in code (step 5)?