ckpt to nemo or model #8735

Closed
gslin1224 opened this issue Mar 23, 2024 · 2 comments

Comments

@gslin1224

gslin1224 commented Mar 23, 2024

Hi everyone,

I'm using the Colab notebook to train my own model. The performance looks promising after the first epoch. However, I'm struggling to figure out how to convert the checkpoint to the NeMo format.

[NeMo I 2024-03-22 15:19:31 token_classification_model:161] 
    label                                                precision    recall       f1           support   
    O (label_id: 0)                                         77.87      55.56      64.85        171
    B-DIS (label_id: 1)                                      2.65      31.25       4.88         16
    I-DIS (label_id: 2)                                     13.33       1.44       2.60        139
    -------------------
    micro avg                                               31.29      31.29      31.29        326
    macro avg                                               31.28      29.41      24.11        326
    weighted avg                                            46.66      31.29      35.36        326
    
[NeMo W 2024-03-22 15:19:32 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:438: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 256 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
      rank_zero_warn(
    
Epoch 1: 33% 117250/356647 [4:12:20<8:35:13, 7.74it/s, v_num=2-07, lr=4.08e-5, train_step_timing in s=0.104, val_loss=0.00559]
[NeMo I 2024-03-22 15:19:32 preemption:56] Preemption requires torch distributed to be initialized, disabling preemption
[NeMo I 2024-03-23 04:23:00 token_classification_model:161] 
    label                                                precision    recall       f1           support   
    O (label_id: 0)                                         99.77      99.82      99.80    6081509
    B-DIS (label_id: 1)                                     98.21      97.71      97.96     608465
    I-DIS (label_id: 2)                                    100.00     100.00     100.00    3735127
    -------------------
    micro avg                                               99.76      99.76      99.76   10425101
    macro avg                                               99.33      99.18      99.25   10425101
    weighted avg                                            99.76      99.76      99.76   10425101

Could you please advise on how to convert the checkpoint to the NeMo format?

Also, when running a full training run, will it automatically save the checkpoint with the lowest validation loss, or the checkpoint from the last epoch?
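
(For reference: which checkpoints are kept is controlled by the checkpoint callback configuration; in NeMo this is usually set through exp_manager's checkpoint_callback_params. Below is a minimal sketch using plain PyTorch Lightning's ModelCheckpoint, assuming the model logs a val_loss metric.)

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep the best checkpoint by validation loss and also the last one;
# "val_loss" must match the metric name the model actually logs.
checkpoint_callback = ModelCheckpoint(
    monitor="val_loss",  # metric to track
    mode="min",          # lower val_loss is better
    save_top_k=1,        # keep only the single best checkpoint
    save_last=True,      # additionally keep last.ckpt from the final step
)

trainer = pl.Trainer(callbacks=[checkpoint_callback])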

@taras-sereda

Hi, for these purposes I wrote a tiny script for my ASR models that accepts a .ckpt checkpoint and saves it as a .nemo model.

I think something similar should work for NLP models too.

import sys
from pathlib import Path

import nemo.collections.asr as nemo_asr


def get_nemo_ckpt_path(ckpt_path: Path) -> Path:
    # Derive the output name from a checkpoint named like
    # "<name>--epoch=<N>.ckpt" (the naming scheme my exp_manager produces).
    ckpt_name = ckpt_path.stem
    name, other = ckpt_name.split("--")
    epoch = other.split("=")[-1]
    out_path = ckpt_path.with_name(f"{name}-epoch-{epoch}.nemo")
    return out_path


def export_to_nemo(lightning_ckpt_path: Path):
    # Restore the model from the Lightning checkpoint, then serialize it
    # as a self-contained .nemo archive.
    model = nemo_asr.models.EncDecCTCModelBPE.load_from_checkpoint(
        str(lightning_ckpt_path)
    )
    out_path = get_nemo_ckpt_path(lightning_ckpt_path)
    model.save_to(str(out_path))
    print(f"Successfully saved model checkpoint: {out_path} in nemo format")


if __name__ == "__main__":
    in_path = Path(sys.argv[1])
    assert in_path.exists()
    export_to_nemo(in_path)
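
(The script takes the checkpoint path as its only argument, e.g. python export_to_nemo.py checkpoints/model--epoch=1.ckpt, where the script name is illustrative. Note that get_nemo_ckpt_path assumes the "name--epoch=N" filename pattern, so adjust it if your exp_manager names checkpoints differently.)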

@gslin1224
Author


Thanks for your reply.

It seems like changing import nemo.collections.asr as nemo_asr to the corresponding NLP collection import might work.
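
(A minimal sketch of that adaptation, assuming the NLP collection's TokenClassificationModel, the class that appears in the training logs above; the output naming here is illustrative, not tested:)

from pathlib import Path

from nemo.collections.nlp.models import TokenClassificationModel


def export_to_nemo(ckpt_path: Path) -> Path:
    # Restore from the Lightning .ckpt, then write a .nemo archive
    # next to it with the same base name.
    model = TokenClassificationModel.load_from_checkpoint(str(ckpt_path))
    out_path = ckpt_path.with_suffix(".nemo")
    model.save_to(str(out_path))
    return out_path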

Have you tried converting the .nemo format to GGUF? Alternatively, how do you build a model back from your .nemo file?
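
(For the loading part, restore_from is the counterpart of save_to; a minimal sketch, assuming the same model class as above:)

from nemo.collections.nlp.models import TokenClassificationModel

# "model.nemo" is a placeholder path to the exported archive.
model = TokenClassificationModel.restore_from("model.nemo")
model.eval()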

Thank you.
