Add save functionality to CTCDecoder #4113

@FredHaa

Description

🚀 The feature

Add a CTCDecoder.save_to_dir(save_dir: str | Path) function that saves the lexicon, tokens, KenLM file, decoder_options, and anything else required to rebuild the decoder to a directory.

Saving the KenLM file requires either support in flashlight-text or passing the file path to the CTCDecoder init instead of the KenLM object, so that the file can be copied to save_dir.
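
To make the second option concrete, here is a rough sketch of what the save/load round trip could look like if the decoder kept the original file paths around. It is not existing torchaudio code: `save_decoder_files`, `load_decoder_files`, and the `decoder_config.json` manifest name are all hypothetical, and it only uses the standard library.

```python
import json
import shutil
from pathlib import Path

# Hypothetical manifest name; not something torchaudio defines today.
CONFIG_NAME = "decoder_config.json"


def save_decoder_files(save_dir, lexicon, tokens, lm, decoder_options):
    """Copy the decoder's input files into save_dir and record the options.

    lexicon, tokens and lm are file paths (lexicon and lm may be None).
    decoder_options must be a JSON-serializable dict.
    """
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    manifest = {"decoder_options": decoder_options, "files": {}}
    for key, src in (("lexicon", lexicon), ("tokens", tokens), ("lm", lm)):
        if src is None:
            continue
        src = Path(src)
        # Use a fixed stem per role so two inputs with the same name cannot collide.
        dst = save_dir / f"{key}{src.suffix}"
        shutil.copyfile(src, dst)
        manifest["files"][key] = dst.name
    (save_dir / CONFIG_NAME).write_text(json.dumps(manifest, indent=2))
    return save_dir


def load_decoder_files(save_dir):
    """Return (paths, decoder_options) as written by save_decoder_files."""
    save_dir = Path(save_dir)
    manifest = json.loads((save_dir / CONFIG_NAME).read_text())
    paths = {key: str(save_dir / name) for key, name in manifest["files"].items()}
    return paths, manifest["decoder_options"]
```

The paths and options returned by `load_decoder_files` could then be handed to whatever constructor builds the decoder. This only works if the LM is available as a path at save time, which is the constraint described above.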

Motivation, pitch

HF transformers is looking at replacing its dependency on pyctcdecode with the torchaudio CTCDecoder (huggingface/transformers/issues/41230).

To support pushing the decoder to the Hub, CTCDecoder needs something equivalent to pyctcdecode.BeamSearchDecoderCTC.save_to_dir.

I'll be happy to make a PR.
