# Audio Classification with a pipeline

## Hugging Face Example

In [1]:
from datasets import Audio, load_dataset

minds = load_dataset("PolyAI/minds14", name="en-AU", split="train")
minds = minds.cast_column("audio", Audio(sampling_rate=16_000))

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


In [2]:
from transformers import pipeline

classifier = pipeline(
    "audio-classification",
    model="anton-l/xtreme_s_xlsr_300m_minds14",
)

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


Some weights of the model checkpoint at anton-l/xtreme_s_xlsr_300m_minds14 were not used when initializing Wav2Vec2ForSequenceClassification: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing Wav2Vec2ForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForSequenceClassification were not initialized from the model checkpoint at anton-l/xtreme_s_xlsr_300m_minds14 and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos

In [3]:
example = minds[0]

In [4]:
classifier(example["audio"]["array"])

[{'score': 0.9625311493873596, 'label': 'pay_bill'},
 {'score': 0.0286727175116539, 'label': 'freeze'},
 {'score': 0.003349797800183296, 'label': 'card_issues'},
 {'score': 0.0020058038644492626, 'label': 'abroad'},
 {'score': 0.000848432828206569, 'label': 'high_value_payment'}]

In [5]:
id2label = minds.features["intent_class"].int2str
id2label(example["intent_class"])

'pay_bill'
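
To make the comparison between the classifier's output and the dataset label explicit, here is a minimal sketch. The helper name `top_label` is hypothetical, and the list copies the scores printed above rather than calling the model again.

```python
def top_label(predictions):
    """Return the label with the highest score from a pipeline-style result list."""
    return max(predictions, key=lambda p: p["score"])["label"]


# Scores copied from the classifier output shown above (not recomputed here).
predictions = [
    {"score": 0.9625311493873596, "label": "pay_bill"},
    {"score": 0.0286727175116539, "label": "freeze"},
    {"score": 0.003349797800183296, "label": "card_issues"},
    {"score": 0.0020058038644492626, "label": "abroad"},
    {"score": 0.000848432828206569, "label": "high_value_payment"},
]

# Ground-truth label returned by id2label(example["intent_class"]) above.
true_label = "pay_bill"

is_correct = top_label(predictions) == true_label
print(is_correct)  # prints True
```

For this single example the top prediction matches the ground-truth intent.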

## My Example

Model card: https://huggingface.co/mispeech/ced-base (the cell below loads the related `mispeech/ced-tiny` checkpoint)

In [87]:
from transformers import pipeline

# Alternative AudioSet checkpoint, left disabled:
# pipe = pipeline("audio-classification", model="MIT/ast-finetuned-audioset-10-10-0.4593")
pipe = pipeline("audio-classification", model="mispeech/ced-tiny")

ValueError: The checkpoint you are trying to load has model type `ced` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

In [18]:
audioset = load_dataset("agkphysics/AudioSet", "balanced", streaming=True)
audioset

IterableDatasetDict({
    train: IterableDataset({
        features: ['video_id', 'audio', 'labels', 'human_labels'],
        n_shards: 1
    })
    test: IterableDataset({
        features: ['video_id', 'audio', 'labels', 'human_labels'],
        n_shards: 1
    })
})

In [88]:
pipe({
    "path": "",
    "array": example["audio"]["array"],
    "sampling_rate": example["audio"]["sampling_rate"],
})

[{'score': 0.864713191986084, 'label': 'Alarm clock'},
 {'score': 0.06491716206073761, 'label': 'Beep, bleep'},
 {'score': 0.017518045380711555, 'label': 'Alarm'},
 {'score': 0.012518809176981449, 'label': 'Inside, small room'},
 {'score': 0.004593153018504381, 'label': 'Clock'}]

In [62]:
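import librosa

# librosa.load resamples to 22,050 Hz and downmixes to mono by default;
# pass sr=None to keep the file's native sampling rate.
array, sampling_rate = librosa.load("audio-samples/angelo-clap3.mp3")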

In [89]:
pipe({"path": "", "array": array, "sampling_rate": sampling_rate})

[{'score': 0.12241095304489136, 'label': 'Cap gun'},
 {'score': 0.10804083198308945, 'label': 'Gunshot, gunfire'},
 {'score': 0.07040899991989136, 'label': 'Sound effect'},
 {'score': 0.06782793253660202, 'label': 'Slap, smack'},
 {'score': 0.06452888250350952, 'label': 'Arrow'}]
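
The top score for the clap recording is about 0.12, much lower than the 0.86 seen for the MInDS-14 clip above, so it may be worth treating such results as uncertain. A minimal sketch of one way to do that follows. The helper `confident_predictions` and the `min_score` threshold of 0.5 are assumptions for illustration, not part of the pipeline API, and the lists copy the first entries of the outputs printed above.

```python
def confident_predictions(predictions, min_score=0.5):
    """Keep only predictions whose score is at least min_score (hypothetical threshold)."""
    return [p for p in predictions if p["score"] >= min_score]


# First entries of the two results shown above (copied, not recomputed).
minds_result = [
    {"score": 0.864713191986084, "label": "Alarm clock"},
    {"score": 0.06491716206073761, "label": "Beep, bleep"},
]
clap_result = [
    {"score": 0.12241095304489136, "label": "Cap gun"},
    {"score": 0.10804083198308945, "label": "Gunshot, gunfire"},
]

print([p["label"] for p in confident_predictions(minds_result)])  # ['Alarm clock']
print(confident_predictions(clap_result))  # []
```

An empty list signals that no label cleared the threshold. The right cutoff depends on the model and the application.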