# Train DeepMalNet on Kaggle

## How to run this notebook on Kaggle

1. Upload this notebook to Kaggle.
2. Mount the dataset [(link in README)](https://github.com/laam-egg/DeepMalNet).
3. If a previous run of this notebook exists, mount its output as well,
    so that training resumes from the latest checkpoint.
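
Kaggle exposes mounted datasets and notebook outputs under `/kaggle/input`, which is the layout the path constants in this notebook rely on. As a minimal sketch, the snippet below lists what is currently mounted so that a missing input is noticed before training starts. The helper name `mounted_inputs` is hypothetical and not part of DeepMalNet.

```python
import os


def mounted_inputs(input_root="/kaggle/input"):
    """Return the sorted entry names under input_root, or [] if it does not exist.

    Hypothetical helper for illustration only; not part of DeepMalNet.
    """
    if not os.path.isdir(input_root):
        return []
    return sorted(os.listdir(input_root))


# On Kaggle this should include the dataset and, if mounted, the previous run's output.
print(mounted_inputs())
```

On the very first run there is no previous output to mount, so only the dataset entry is expected.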

## How to run this notebook locally (Linux)

This guide assumes the model checkpoints directory is `$PROJECT_ROOT/checkpoints`.
Create it if it does not exist yet:

```sh
cd $PROJECT_ROOT
mkdir -p checkpoints
```

1. **Use the DeepMalNet conda environment to run the notebook.**
2. Inside `$PROJECT_ROOT/kaggle`, create a new directory
    named `DeepMalNet-on-Kaggle`, then `cd` to it.
3. Create directories for compatibility with Kaggle:
    
    ```sh
    sudo mkdir -p /kaggle
    sudo chown $USER:$USER /kaggle
    cd /kaggle
    mkdir -p input
    mkdir -p working
    ```

4. Create a symlink to the LMDB database containing training
    data:

    ```sh
    cd /kaggle/input
    sudo ln -s $PATH_TO_LMDB_DIR ember2024-lmdb
    ```

5. Create a symlink to the directory that contains the trained
    model checkpoints:

    ```sh
    cd /kaggle/working
    sudo ln -s $PROJECT_ROOT/checkpoints checkpoints
    ```

6. Create a second symlink to the same checkpoints directory,
    this time under `/kaggle/input`. The notebook loads checkpoints
    from this input path, so each new run can resume training from
    the latest checkpoint saved by the previous run.

    ```sh
    cd /kaggle/input
    mkdir -p deepmalnet-on-kaggle
    cd deepmalnet-on-kaggle
    mkdir -p DeepMalNet-on-Kaggle
    cd DeepMalNet-on-Kaggle
    sudo ln -s $PROJECT_ROOT/checkpoints checkpoints
    ```
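
Because the steps above rely on several symlinks, a dangling link (for example, from an unset `$PATH_TO_LMDB_DIR`) is an easy mistake to make. The following is a minimal sketch for checking that each link resolves to something that exists. The helper name `check_symlink` is hypothetical, and the path list simply mirrors the locations created in steps 4 to 6.

```python
import os


def check_symlink(path):
    """Return the resolved target of path; raise if path is missing or dangling.

    Hypothetical helper for illustration only; not part of DeepMalNet.
    """
    if not os.path.lexists(path):
        raise FileNotFoundError(f"{path} does not exist")
    target = os.path.realpath(path)
    if not os.path.exists(target):
        raise FileNotFoundError(f"{path} points to missing target {target}")
    return target


if __name__ == "__main__":
    # Locations created in steps 4 to 6 of the local setup.
    expected = (
        "/kaggle/input/ember2024-lmdb",
        "/kaggle/working/checkpoints",
        "/kaggle/input/deepmalnet-on-kaggle/DeepMalNet-on-Kaggle/checkpoints",
    )
    for path in expected:
        try:
            print(path, "->", check_symlink(path))
        except FileNotFoundError as exc:
            print("PROBLEM:", exc)
```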


In [None]:
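# Where the next cell clones the DeepMalNet repository.
DeepMalNet_PATH = "/kaggle/working/DeepMalNet-on-Kaggle/DeepMalNet"
# LMDB database containing the training data.
LMDB_PATH = "/kaggle/input/ember2024-lmdb"
# Checkpoints from a previous run, mounted as input (if any).
MODEL_INPUT_DIR_PATH = "/kaggle/input/deepmalnet-on-kaggle/DeepMalNet-on-Kaggle/checkpoints"
# Where this run saves its checkpoints.
MODEL_OUTPUT_DIR_PATH = "/kaggle/working/DeepMalNet-on-Kaggle/checkpoints"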

In [None]:
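# Work inside a dedicated directory and fetch a fresh copy of the code.
!mkdir -p DeepMalNet-on-Kaggle
%cd DeepMalNet-on-Kaggle
!rm -rf DeepMalNet
!git clone https://github.com/laam-egg/DeepMalNet
%cd DeepMalNet
# Record the exact commit used for this run.
!echo "DeepMalNet commit:"
!git log --pretty=format:'%H' -n 1
!echo
%pip install -r requirements-kaggle.txt
%cd ..
# Directory where this run's checkpoints are saved.
!mkdir -p checkpoints
%cd ..

import sys
# Make the cloned repository importable.
sys.path.append(DeepMalNet_PATH)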

In [None]:
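from DeepMalNet.training import Trainer

trainer = Trainer(LMDB_PATH, num_epochs=16)
# Resume from the latest checkpoint of a previous run, if one is found.
trainer.load_last_checkpoint(MODEL_INPUT_DIR_PATH, sanity_check_if_found=True)
trainer.train()
# Save to the output directory so that a later run can mount it as input.
trainer.save(MODEL_OUTPUT_DIR_PATH)
trainer.sanity_check()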
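
The idea of resuming from the latest previous checkpoint can be illustrated independently of DeepMalNet. The sketch below is not the implementation of `Trainer.load_last_checkpoint`; it assumes that checkpoints are plain files in one directory and that "latest" means most recently modified. The function name `find_latest_checkpoint` and the `pattern` parameter are both hypothetical.

```python
import glob
import os


def find_latest_checkpoint(checkpoint_dir, pattern="*"):
    """Return the most recently modified file matching pattern, or None if there is none.

    Assumption: "latest" is decided by modification time. DeepMalNet may
    use a different rule (for example, an epoch number in the file name).
    """
    candidates = [
        path
        for path in glob.glob(os.path.join(checkpoint_dir, pattern))
        if os.path.isfile(path)
    ]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


# A missing or empty directory yields None, which corresponds to a first run.
print(find_latest_checkpoint("/kaggle/input/deepmalnet-on-kaggle/DeepMalNet-on-Kaggle/checkpoints"))
```

Selecting by modification time keeps the sketch independent of any file-naming scheme, at the cost of being sensitive to files that are copied or touched after training.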