# Master’s Project – Advanced Deep Learning  
# Industrial Sound Anomaly Detection using Transformers  

**Authors:** 
**Date:** November 2025  


**Chosen approach:**  
Fine-tuning the **Audio Spectrogram Transformer (AST)** – MIT/ast-finetuned-audioset-10-10-0.4593  
→ Among the strongest published models on MIMII, to our knowledge  
→ Fully satisfies the "transformer required" constraint  
→ Easy to implement with Hugging Face (< 50 lines)
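AST cuts the log-Mel spectrogram into overlapping 16×16 patches, and each patch becomes one input token. The short sketch below counts these tokens to make the idea concrete. It assumes values that we believe are the `ASTConfig` defaults in Hugging Face `transformers` and in this checkpoint: 128 Mel bins, 1024 frames, patch size 16, and stride 10 on both axes. Please check `model.config` before relying on them. `ast_num_patches` is a hypothetical helper, not a library function.

```python
def ast_num_patches(num_mel_bins: int = 128, max_length: int = 1024,
                    patch_size: int = 16, frequency_stride: int = 10,
                    time_stride: int = 10) -> tuple:
    """Count the overlapping patches AST cuts from a (max_length x num_mel_bins) input.

    Hypothetical helper; the default values are assumed to match ASTConfig.
    """
    # Patches along the frequency axis (patch_size window sliding by frequency_stride)
    freq_patches = (num_mel_bins - patch_size) // frequency_stride + 1
    # Patches along the time axis (patch_size window sliding by time_stride)
    time_patches = (max_length - patch_size) // time_stride + 1
    return freq_patches, time_patches, freq_patches * time_patches


freq_p, time_p, n_patches = ast_num_patches()
print(freq_p, time_p, n_patches)  # 12 101 1212
# The Transformer sequence also carries 2 extra tokens ([CLS] and distillation)
print(n_patches + 2)              # 1214
```

With a 10 ms hop, 1024 frames cover about 10.24 s. A single 10-second clip therefore fits in one AST input. As we understand the feature extractor, shorter inputs are padded up to 1024 frames.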

## 1. Problem Statement & Industrial Impact

Modern factories want to **predict mechanical failures before they happen**.  
Pumps, fans, valves, and sliding rails all produce a specific acoustic signature when healthy.  
As soon as a defect appears (worn bearing, leakage, imbalance, etc.), the sound changes, often **very subtly**.

Goal of this project:  
Build a model that **automatically detects these acoustic anomalies in real time** using **Transformer-based models**, currently the state of the art for modeling long-range dependencies in audio signals.

Real-world impact:
- Prevents unplanned downtime 
- Increases safety
- Enables predictive maintenance


## 2. Dataset: MIMII DUE (2021) – The standard benchmark

Source: 

- 5 machine types: **fan**, **gearbox**, **pump**, **slider**, **valve**
- Real factory recordings (with natural background noise)
- 10-second clips, 16 kHz, mono
- Two classes: **normal** vs **anomalous** (multiple simulated faults)

Clip counts produced by the restructuring script below (training uses source-domain clips only):

| Machine   | Train (normal) | Test (normal) | Test (anomalous) | Total      |
|-----------|----------------|---------------|------------------|------------|
| fan       | 3,000          | 600           | 600              | 4,200      |
| gearbox   | 3,017          | 720           | 687              | 4,424      |
| pump      | 3,000          | 600           | 600              | 4,200      |
| slider    | 3,000          | 610           | 604              | 4,214      |
| valve     | 3,000          | 600           | 600              | 4,200      |
| **Total** | **15,017**     | **3,130**     | **3,091**        | **21,238** |


## Final Strategy 

We train and validate on the **pump** machine only:
- Source domain: used for training
- Target domain: test only, never seen during training
- Zero-shot transfer test: the same model evaluated on **fan**

This allows:
- Very fast training
- Evaluation of **cross-machine transfer** (pump → fan)
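The strategy above boils down to three labelled file lists. The sketch below builds them. It assumes the `final_dataset/<machine>/{train,test}/{normal,anomalous}` layout produced by the restructuring step in Section 3.1, and the label convention 0 = normal, 1 = anomalous. `build_splits` and `_labelled` are hypothetical helpers.

```python
from pathlib import Path

NORMAL, ANOMALOUS = 0, 1  # label convention assumed for this sketch


def _labelled(folder: Path, label: int) -> list:
    """All .wav files in `folder`, sorted for reproducibility, paired with `label`."""
    return [(p, label) for p in sorted(folder.glob("*.wav"))]


def build_splits(root="final_dataset", train_machine="pump", transfer_machine="fan"):
    """Hypothetical helper: train on normal clips of one machine, test on that
    machine, and keep a second machine for zero-shot transfer evaluation."""
    root = Path(root)
    src, tgt = root / train_machine, root / transfer_machine
    return {
        # Unsupervised setting: only normal clips are used for training
        "train": _labelled(src / "train" / "normal", NORMAL),
        "test": _labelled(src / "test" / "normal", NORMAL)
                + _labelled(src / "test" / "anomalous", ANOMALOUS),
        # Never used for training: the same model is evaluated as-is on this machine
        "zero_shot": _labelled(tgt / "test" / "normal", NORMAL)
                     + _labelled(tgt / "test" / "anomalous", ANOMALOUS),
    }
```

Calling `build_splits()` and printing `{k: len(v) for k, v in splits.items()}` should report 3,000 / 1,200 / 1,200 clips, given the counts reported later in this notebook.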



## 3. Data Exploration & Visualization


### 3.1 Data loading

In [4]:
import torch
import torchaudio
import librosa
import librosa.display
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from pathlib import Path
import os
import IPython.display as ipd

# sns.set(style="whitegrid")
# Fix the random seeds for reproducibility
np.random.seed(42)
_ = torch.manual_seed(42)  # assign the result so the Generator object is not echoed

# First Task: Loading the dataset

### Online, through the Zenodo website

In [None]:
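import os
import sys

# Create the data directory
os.makedirs("data", exist_ok=True)

# Install zenodo-get into the kernel's environment
%pip install zenodo-get

# Use `python -m` instead of calling the CLI directly: the `zenodo_get`
# executable may not be on PATH (e.g. on Windows)
!{sys.executable} -m zenodo_get -r 4740355 -o data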



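As far as we know, `zenodo_get` only downloads the archives and does not unpack them. The restructuring code below expects folders such as `data/dev_data_fan/fan/`. The sketch below extracts them with the standard-library `zipfile` module. It assumes the record ships one archive per machine, named like `dev_data_fan.zip`, whose top-level folder is the machine name. Adjust the paths if the archive layout differs. `extract_all` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def extract_all(data_dir="data"):
    """Extract every .zip in `data_dir` into a folder named after the archive.

    e.g. data/dev_data_fan.zip -> data/dev_data_fan/ (assumed to contain fan/...)
    Archives whose target folder already exists are skipped.
    """
    data_dir = Path(data_dir)
    extracted = []
    for archive in sorted(data_dir.glob("*.zip")):
        target = data_dir / archive.stem
        if target.exists():
            continue  # already extracted on a previous run
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Running `extract_all("data")` once after the download should produce the `data/dev_data_<machine>/<machine>/` folders listed in `machine_map`.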

### Getting it from Google Drive

In [None]:
#from google.colab import drive
#drive.mount('/content/drive')

#DATA_ROOT = "/content/drive/----/MIMII_DUE"  


### If the dataset is already available locally

In [None]:
DATA_ROOT = Path('C:/Users/melissa/Desktop/Industrial Sound Anomaly Detection using Transformers/final_product')          
print("Dataset found:", DATA_ROOT.exists())

Dataset found: True


### Restructuring the dataset folder

The raw dataset contains two kinds of folders: those whose names start with **dev** followed by the machine name (e.g. `dev_data_fan`), and those that start with **eval** (e.g. `eval_data_fan`).

Each folder of the first type contains the subfolders `train`, `source_test`, and `target_test`. Each subfolder holds WAV files of sounds captured from that machine. The problem is that whether a clip is normal or anomalous is indicated only in its file name.

To make the upcoming processing easier, we restructure these folders. For each machine folder, we walk through its subfolders. Each WAV file is copied into `train/normal`, `test/normal`, or `test/anomalous`, according to the split and label written in its name. Training clips are taken from the source domain only (`source_train`); target-domain training clips are skipped.

In the end we obtain a new folder, **final_dataset**, which is the one we use from here on.

The new folder is structured like this:

```

final_dataset/
    fan/
        test/
            anomalous/
                section_00_source_test_anomaly_0000.wav
            normal/
                section_00_source_test_normal_0000.wav
        train/
            normal/
                section_00_source_train_normal_0000_strength_1_ambient.wav
    gearbox/
    pump/
    slider/
    valve/


```
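The label lives only in the file name, so it can help to parse names explicitly. This is a stricter alternative to the substring checks used in the script below. The sketch uses a regular expression inferred from the two example names in the tree above. The pattern is an assumption. Files that do not match return `None`, so they can be inspected rather than silently misfiled. `parse_dcase_name` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed naming scheme:
# section_<id>_<domain>_<split>_<label>_<index>[_<extra attributes>].wav
_NAME_RE = re.compile(
    r"^section_(?P<section>\d+)_(?P<domain>source|target)_(?P<split>train|test)"
    r"_(?P<label>normal|anomaly)_(?P<index>\d+)(?P<extra>_.*)?\.wav$"
)


def parse_dcase_name(filename: str) -> Optional[dict]:
    """Split a clip name into its fields, or return None if it does not match."""
    m = _NAME_RE.match(filename.lower())
    if m is None:
        return None
    fields = m.groupdict()
    # Optional attributes such as "strength_1_ambient" (empty string if absent)
    fields["extra"] = (fields["extra"] or "").lstrip("_")
    return fields
```

For example, `parse_dcase_name("section_00_source_train_normal_0000_strength_1_ambient.wav")["label"]` gives `"normal"`.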


In [None]:
import shutil
from pathlib import Path

def create_final_product_dcase2021(raw_root="data", final_root="final_dataset"):
    """
    Works with the official DCASE 2021 Task 2 / MIMII DUE structure:
    dev_data_fan/fan/
    dev_data_pump/pump/
    ...
    Creates a clean unsupervised anomaly-detection structure for every machine
    listed in machine_map (the five MIMII DUE machine types).
    """
    raw_root = Path(raw_root)
    final_root = Path(final_root)
    final_root.mkdir(parents=True, exist_ok=True)

    # The five MIMII DUE machine types (name -> folder inside raw_root)
    machine_map = {
        "fan":       "dev_data_fan/fan",
        "pump":      "dev_data_pump/pump",
        "valve":     "dev_data_valve/valve",
        "slider":    "dev_data_slider/slider",
        "gearbox":   "dev_data_gearbox/gearbox",
    }

    total_train_normal = 0
    total_test_normal = 0
    total_test_anomalous = 0

    for machine_name, subfolder in machine_map.items():
        machine_dir = raw_root / subfolder
        if not machine_dir.exists():
            print(f"{machine_name.upper()} → folder not found: {machine_dir}")
            continue

        out_dir = final_root / machine_name
        (out_dir / "train/normal").mkdir(parents=True, exist_ok=True)
        (out_dir / "test/normal").mkdir(parents=True, exist_ok=True)
        (out_dir / "test/anomalous").mkdir(parents=True, exist_ok=True)

        train_n = test_n = test_a = 0

        for wav_file in machine_dir.rglob("*.wav"):
            name = wav_file.name.lower()

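            # The label and the split are encoded only in the file name
            is_normal = "normal" in name
            is_anomaly = "anomaly" in name or "anomalous" in name
            # Only source-domain training clips are kept; target_train clips are skipped
            is_source_train = "source_train" in name
            is_test = "source_test" in name or "target_test" in name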

            if is_normal and is_source_train:
                shutil.copy(wav_file, out_dir / "train/normal" / wav_file.name)
                train_n += 1
            elif is_normal and is_test:
                shutil.copy(wav_file, out_dir / "test/normal" / wav_file.name)
                test_n += 1
            elif is_anomaly and is_test:
                shutil.copy(wav_file, out_dir / "test/anomalous" / wav_file.name)
                test_a += 1

        print(f"{machine_name.upper():9} → train: {train_n:4} normal | test: {test_n:3} normal + {test_a:3} anomalous")
        total_train_normal += train_n
        total_test_normal += test_n
        total_test_anomalous += test_a

    print("\nFINAL_DATASET CREATED SUCCESSFULLY!")
    print(f"Total → train normal: {total_train_normal} | test: {total_test_normal} normal + {total_test_anomalous} anomalous")
    print(f"Folder: {final_root.resolve()}")

# Run this once: it copies every clip into the new folder structure
create_final_product_dcase2021()

FAN       → train: 3000 normal | test: 600 normal + 600 anomalous
PUMP      → train: 3000 normal | test: 600 normal + 600 anomalous
VALVE     → train: 3000 normal | test: 600 normal + 600 anomalous
SLIDER    → train: 3000 normal | test: 610 normal + 604 anomalous
GEARBOX   → train: 3017 normal | test: 720 normal + 687 anomalous
TOY_CAR → folder not found: data\dev_data_toy_car\toy_car
TOY_TRAIN → folder not found: data\dev_data_toy_train\toy_train

FINAL_PRODUCT CREATED SUCCESSFULLY!
Total → train normal: 15017 | test: 3130 normal + 3091 anomalous
Folder: C:\Users\melissa\Desktop\Industrial Sound Anomaly Detection using Transformers\final_product


### Checking the final results

In [None]:
DATA_DIR = Path("final_dataset/pump")
print("Final split – Pump")
print(f"Train → {len(list((DATA_DIR/'train'/'normal').glob('*.wav'))):,} normal clips (source_train)")
print(f"Test  → {len(list((DATA_DIR/'test'/'normal').glob('*.wav'))):,} normal clips + {len(list((DATA_DIR/'test'/'anomalous').glob('*.wav'))):,} anomalous clips")

Final split – Pump
Train → 3,000 normal clips (source_train)
Test  → 600 normal clips + 600 anomalous clips


# Second Task: Exploring the pump machine dataset

In [7]:
# Paths of the pump dataset
pump_path = Path("final_dataset/pump")

train_normal_path   = pump_path / "train" / "normal"
test_normal_path    = pump_path / "test" / "normal"
test_anomalous_path = pump_path / "test" / "anomalous"

print(f"Train normal   : {len(list(train_normal_path.glob('*.wav')))} files")
print(f"Test normal    : {len(list(test_normal_path.glob('*.wav')))} files")
print(f"Test anomalous : {len(list(test_anomalous_path.glob('*.wav')))} files")

Train normal   : 3000 files
Test normal    : 600 files
Test anomalous : 600 files
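Before listening, it is worth checking that the clips really have the advertised format: 16 kHz, mono, about 10 s. The sketch below does this with the standard-library `wave` module. It assumes the files are PCM WAVs; `wave` cannot read compressed or floating-point WAVs. `wav_info` and `unexpected_formats` are hypothetical helpers.

```python
import wave
from pathlib import Path


def wav_info(path) -> dict:
    """Read sample rate, channel count, sample width and duration from a PCM WAV header."""
    with wave.open(str(path), "rb") as w:
        sr = w.getframerate()
        return {
            "sample_rate": sr,
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "duration_s": w.getnframes() / sr,
        }


def unexpected_formats(folder, sample_rate=16000, channels=1, duration_s=10.0, tol=0.05):
    """Return (name, info) for every .wav in `folder` whose format differs from the expected one."""
    bad = []
    for p in sorted(Path(folder).glob("*.wav")):
        info = wav_info(p)
        if (info["sample_rate"] != sample_rate or info["channels"] != channels
                or abs(info["duration_s"] - duration_s) > tol):
            bad.append((p.name, info))
    return bad
```

With the paths defined above, `unexpected_formats(train_normal_path)` should return an empty list if every training clip matches the expected format.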


## Pump sound exploration

In [None]:
import random
# Listen to some PUMP sounds (normal vs anomalous),
# picking one random example of each
random.seed(42)

normal_files = list(train_normal_path.glob("*.wav"))
anomalous_files = list(test_anomalous_path.glob("*.wav"))

normal_file = random.choice(normal_files)
anomalous_file = random.choice(anomalous_files)

print("We will now listen to two pump sounds:")
print("1. Normal pump")
print("2. Anomalous (broken) pump")
print("\nLet's try to hear them and see if we can spot any difference...")

# Play the normal sound first
print("Now playing: NORMAL pump sound")
ipd.display(ipd.Audio(str(normal_file)))

# Play the anomalous sound right after
print("Now playing: ANOMALOUS pump sound (same machine, but with a fault)")
ipd.display(ipd.Audio(str(anomalous_file)))



We will now listen to two pump sounds:
1. Normal pump
2. Anomalous (broken) pump

Let's try to hear them and see if we can spot any difference...
Now playing: NORMAL pump sound


Now playing: ANOMALOUS pump sound (same machine, but with a fault)

### Listen carefully…

We just heard:
1. A **normal pump** (healthy machine)  
2. An **anomalous pump** (same type of pump, but with a mechanical fault)

**Question**: Who can reliably tell which one was broken?

**Almost nobody can!**  
The difference is extremely subtle, often imperceptible to human ears.

**This is exactly why we need deep learning and Transformers!**

Machines produce tiny changes in sound when a fault appears.  
Our ears can barely detect them, but a Transformer model trained on thousands of spectrograms **can learn these hidden patterns automatically**.

**Goal of this project**: Build a Transformer-based model that listens to a pump and instantly says:  
"Everything is OK" or "Warning: anomaly detected!"

But before we jump into building the model, we will visualize the waveforms and log-Mel spectrograms to start seeing the tiny differences that the model will learn.
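In the notebook, `librosa.feature.melspectrogram` followed by `librosa.power_to_db` is the usual tool for this. To show what a log-Mel spectrogram actually computes, here is a dependency-free NumPy sketch. The steps are: framing, Hann window, power spectrum, triangular Mel filters, then a log in dB. Several details are assumptions for illustration:

- It uses the HTK Mel formula; librosa defaults to the Slaney scale, so values will differ slightly.
- It references dB to 1 rather than to the maximum.
- It applies no center padding.
- `n_fft=400`, `hop=160`, and `n_mels=128` (25 ms / 10 ms / 128 bands at 16 kHz) are chosen to roughly mirror AST's input features.

```python
import numpy as np


def hz_to_mel(f):
    # HTK Mel scale (assumption; librosa's default is the Slaney variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft//2 + 1 FFT bins to n_mels Mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 equally spaced points on the Mel axis give left/center/right edges
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(y, sr=16000, n_fft=400, hop=160, n_mels=128, eps=1e-10):
    """Return an (n_mels, n_frames) log-Mel spectrogram in dB."""
    y = np.asarray(y, dtype=float)
    if len(y) < n_fft:
        y = np.pad(y, (0, n_fft - len(y)))
    n_frames = 1 + (len(y) - n_fft) // hop
    # Indices of overlapping frames: shape (n_frames, n_fft)
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T            # (n_mels, n_frames)
    return 10.0 * np.log10(mel + eps)
```

For a 10-second clip at 16 kHz (160,000 samples), this gives 1 + (160,000 − 400) // 160 = 998 frames of 128 Mel bands. The result can be displayed with `plt.imshow(S, origin="lower", aspect="auto")`.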