
jimbozhang/hf_transformers_custom_model_ced


Pretrained CED on Hugging Face

CED models are simple Vision Transformer (ViT)-based models for audio tagging.

Install

git clone https://github.com/jimbozhang/hf_transformers_custom_model_ced.git
cd hf_transformers_custom_model_ced
pip install .

Inference

>>> from ced_model.feature_extraction_ced import CedFeatureExtractor
>>> from ced_model.modeling_ced import CedForAudioClassification

>>> model_name = "mispeech/ced-mini"
>>> feature_extractor = CedFeatureExtractor.from_pretrained(model_name)
>>> model = CedForAudioClassification.from_pretrained(model_name)

>>> import torchaudio
>>> audio, sampling_rate = torchaudio.load("resources/JeD5V5aaaoI_931_932.wav")
>>> assert sampling_rate == 16000
>>> inputs = feature_extractor(audio, sampling_rate=sampling_rate, return_tensors="pt")

>>> import torch
>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_id = torch.argmax(logits, dim=-1).item()
>>> model.config.id2label[predicted_class_id]
'Finger snapping'
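The example above keeps only the single highest-scoring class. Audio tagging is usually multi-label (one clip can contain several sound events), so it can be useful to list the top few tags with an independent per-class sigmoid score. The sketch below is a hypothetical helper, not part of `ced_model`; it assumes `logits` holds one unnormalized score per class (pass `apply_sigmoid=False` if the model already returns probabilities).

```python
import math
from typing import Dict, List, Sequence, Tuple


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def top_k_tags(
    scores: Sequence[float],
    id2label: Dict[int, str],
    k: int = 3,
    apply_sigmoid: bool = True,
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs for one clip."""
    if apply_sigmoid:
        # Independent sigmoid per class: multi-label scores need not sum to 1.
        scores = [_sigmoid(s) for s in scores]
    # Sigmoid is monotonic, so the ranking (and top-1 class) matches argmax on raw logits.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(id2label[i], scores[i]) for i in ranked[:k]]
```

Assuming a batch of one clip, something like `top_k_tags(logits[0].tolist(), model.config.id2label, k=5)` would list the five strongest tags; the first entry is the same class that `torch.argmax` picks.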

Fine-tuning

example_finetune_esc50.ipynb demonstrates how to train a linear head on the ESC-50 dataset with the CED encoder frozen.
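The notebook is the reference for the actual training code. As a rough illustration of the idea (frozen encoder, trainable linear classifier), here is a minimal PyTorch sketch. The small `nn.Sequential` encoder, its 64/32 dimensions, and the names `LinearProbe` and `train_step` are hypothetical placeholders; how to obtain the CED encoder and its embedding size from `CedForAudioClassification` is not shown here.

```python
import torch
from torch import nn

NUM_CLASSES = 50  # ESC-50 labels each clip with one of 50 classes


class LinearProbe(nn.Module):
    """Frozen encoder followed by a trainable linear classification head."""

    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.encoder = encoder
        # Freeze the encoder: its parameters receive no gradients.
        for param in self.encoder.parameters():
            param.requires_grad_(False)
        self.head = nn.Linear(embed_dim, num_classes)

    def train(self, mode: bool = True) -> "LinearProbe":
        super().train(mode)
        # Keep the frozen encoder in eval mode (e.g. dropout disabled) while the head trains.
        self.encoder.eval()
        return self

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No autograd graph is needed through the frozen encoder.
        with torch.no_grad():
            embeddings = self.encoder(x)
        return self.head(embeddings)


def train_step(model: LinearProbe, optimizer: torch.optim.Optimizer,
               x: torch.Tensor, y: torch.Tensor) -> float:
    """One optimization step with single-label cross-entropy; returns the loss."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Hypothetical stand-in for the CED encoder: 64-dim input -> 32-dim embedding.
    encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
    model = LinearProbe(encoder, embed_dim=32)
    # Only the head's parameters are trainable, so only they go to the optimizer.
    optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-2)
    x = torch.randn(8, 64)  # dummy batch of 8 inputs
    y = torch.randint(0, NUM_CLASSES, (8,))  # dummy class indices
    for step in range(5):
        print(f"step {step}: loss {train_step(model, optimizer, x, y):.4f}")
```

Setting `requires_grad_(False)` and handing only trainable parameters to the optimizer keeps the encoder weights fixed; overriding `train()` keeps the encoder in eval mode so layers such as dropout behave as they do at inference time.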

About

🤗 Transformers custom model for CED.
