# Master’s Project – Advanced Deep Learning  
# Industrial Sound Anomaly Detection using Transformers  

**Authors:** 
**Date:** November 2025  


**Chosen approach:**  
Fine-tuning the **Audio Spectrogram Transformer (AST)** – MIT/ast-finetuned-audioset-10-10-0.4593  
→ Currently the **best published model on MIMII**  
→ 100 % compliant with the "transformer required" constraint  
→ Extremely easy to implement with Hugging Face (< 50 lines)

## 1. Problem Statement & Industrial Impact

Modern factories want to **predict mechanical failures before they happen**.  
Pumps, fans, valves, and sliding rails all produce a specific acoustic signature when healthy.  
As soon as a defect appears (worn bearing, leakage, imbalance, etc.), the sound changes, often **very subtly**.

Goal of this project:  
Build an intelligent model that **automatically detects these sound anomalies in real time** using **Transformer-based models**, currently the state-of-the-art for modeling long-term dependencies in audio signals.

Real-world impact:
- Prevents unplanned downtime 
- Increases safety
- Enables predictive maintenance


## 2. Dataset: MIMII (2021) – The standard benchmark

Source: 

- 4 machine types: **fan**, **pump**, **valve**, **slider**
- Real factory recordings (with natural background noise)
- 10-second clips, 16 kHz, mono
- Two classes: **normal** vs **anomalous** (multiple simulated faults)

| Machine | Normal | Anomalous | Total   |
|---------|--------|-----------|---------|
| fan     | ~6,500 | ~2,000    | ~8,500  |
| pump    | ~6,500 | ~2,000    | ~8,500  |
| valve   | ~6,500 | ~2,000    | ~8,500  |
| slider  | ~6,500 | ~2,000    | ~8,500  |
| **Total** | **~26,000** | **~8,000** | **~34,000** |
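
To make the clip format above concrete, the following minimal sketch computes how many waveform samples a 10-second, 16 kHz mono clip contains and roughly how many spectrogram frames it yields. The 25 ms window, 10 ms hop, and 128 mel bins are assumptions (values commonly used with AST-style front ends), not settings taken from this notebook.

```python
# Clip format stated in the dataset description.
SAMPLE_RATE_HZ = 16_000
CLIP_SECONDS = 10

# Assumed front-end settings (not stated in this notebook).
WINDOW_MS = 25
HOP_MS = 10
NUM_MEL_BINS = 128


def samples_per_clip(sample_rate_hz: int, seconds: int) -> int:
    """Number of waveform samples in one mono clip."""
    return sample_rate_hz * seconds


def num_frames(n_samples: int, sample_rate_hz: int, window_ms: int, hop_ms: int) -> int:
    """Number of full analysis windows that fit in the clip (no padding)."""
    window = sample_rate_hz * window_ms // 1000
    hop = sample_rate_hz * hop_ms // 1000
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // hop


n_samples = samples_per_clip(SAMPLE_RATE_HZ, CLIP_SECONDS)
frames = num_frames(n_samples, SAMPLE_RATE_HZ, WINDOW_MS, HOP_MS)
print(f"{n_samples:,} samples per clip")
print(f"approximate spectrogram shape: ({frames}, {NUM_MEL_BINS})")
```

Under these assumptions, each clip becomes a time-frequency input of about a thousand frames, which is the order of magnitude a transformer has to attend over.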


## Final Strategy 

We focus on the **pump** machine only for training and validation:
- Source domains (training)
- Target domain (test only): never seen during training
- Zero-shot transfer test: same model on **fan** 

This allows:
- Very fast training
- **Cross-machine transfer** (pump → fan)



## 3. Data Exploration & Visualization


### 3.1 Data loading

In [2]:
import torch
import torchaudio
import librosa
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from pathlib import Path
import os

#sns.set(style="whitegrid")
torch.manual_seed(42)

<torch._C.Generator at 0x1e675c532b0>

In [None]:
#from google.colab import drive
#drive.mount('/content/drive')

#DATA_ROOT = "/content/drive/MyDrive/MIMII_DUE"  


In [3]:
DATA_ROOT = Path('C:/Users/melissa/Desktop/Industrial Sound Anomaly Detection using Transformers/final_product')
print("Dataset found:", DATA_ROOT.exists())

Dataset found: True


### 3.2 Restructuring the dataset folders

The dataset contains two kinds of folders: those whose names start with **dev** followed by the machine name (e.g. dev_data_fan) and those whose names start with **eval** (e.g. eval_data_fan).

Each **dev** folder contains the subfolders train, target-test, and source-test. Each subfolder holds a set of WAV files, which are recordings of the machine in question. The problem is that the label (anomalous or normal) is only encoded in the name of each WAV file.

To simplify the later processing steps, we restructure these folders in this section.

For each machine folder, we walk through its subfolders and inspect every WAV file: if the file name marks the recording as anomalous, the file goes into the anomalous folder; otherwise it goes into the normal folder.
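
The filename-based rule described above can be sketched as a small pure function. This is only an illustration: the keywords mirror those used in this notebook's restructuring cell, while `route_clip` and the example file names are hypothetical and chosen just for the demonstration.

```python
from typing import Optional


def route_clip(filename: str) -> Optional[str]:
    """Return the destination subfolder for a clip, or None if it is skipped."""
    name = filename.lower()
    is_anomalous = "anomaly" in name or "anomalous" in name
    is_normal = "normal" in name and not is_anomalous
    is_train = "source_train" in name
    is_test = "source_test" in name or "target_test" in name

    if is_normal and is_train:
        return "train/normal"
    if is_normal and is_test:
        return "test/normal"
    if is_anomalous and is_test:
        return "test/anomalous"
    return None  # e.g. target-domain training clips are not copied


# Hypothetical file names, for illustration only.
examples = [
    "section_00_source_train_normal_0001.wav",
    "section_00_target_test_normal_0002.wav",
    "section_00_source_test_anomaly_0003.wav",
    "section_00_target_train_normal_0004.wav",
]
for example in examples:
    print(f"{example} -> {route_clip(example)}")
```

Keeping the decision in a function that only looks at the name makes the rule easy to check before any file is copied.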

In the end, we obtain a new folder, **final_product**, which is the one we use in the rest of this notebook.

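The new folder is structured as follows:

- `final_product/`
  - one folder per machine (e.g. `fan/`, `pump/`)
    - `train/normal/`
    - `test/normal/`
    - `test/anomalous/`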



In [None]:
import shutil
from pathlib import Path

def create_final_product_dcase2021(raw_root="data", final_root="final_product"):
    """
    Works with the official DCASE 2021 Task 2 / MIMII DUE structure:
    dev_data_fan/fan/
    dev_data_pump/pump/
    ...
    Creates clean unsupervised anomaly detection structure for ALL 7 machines.
    """
    raw_root = Path(raw_root)
    final_root = Path(final_root)
    final_root.mkdir(exist_ok=True)

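    # All 7 machines; folders that are missing locally are reported and skipped
    machine_map = {
        "fan":       "dev_data_fan/fan",
        "pump":      "dev_data_pump/pump",
        "valve":     "dev_data_valve/valve",
        "slider":    "dev_data_slider/slider",
        "gearbox":   "dev_data_gearbox/gearbox",
        "toy_car":   "dev_data_toy_car/toy_car",
        "toy_train": "dev_data_toy_train/toy_train",
    }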

    total_train_normal = 0
    total_test_normal = 0
    total_test_anomalous = 0

    for machine_name, subfolder in machine_map.items():
        machine_dir = raw_root / subfolder
        if not machine_dir.exists():
            print(f"{machine_name.upper()} → folder not found: {machine_dir}")
            continue

        out_dir = final_root / machine_name
        (out_dir / "train/normal").mkdir(parents=True, exist_ok=True)
        (out_dir / "test/normal").mkdir(parents=True, exist_ok=True)
        (out_dir / "test/anomalous").mkdir(parents=True, exist_ok=True)

        train_n = test_n = test_a = 0

        for wav_file in machine_dir.rglob("*.wav"):
            name = wav_file.name.lower()

            is_normal = "normal" in name
            is_anomaly = "anomaly" in name or "anomalous" in name
            is_source_train = "source_train" in name
            is_test = "source_test" in name or "target_test" in name

            if is_normal and is_source_train:
                shutil.copy(wav_file, out_dir / "train/normal" / wav_file.name)
                train_n += 1
            elif is_normal and is_test:
                shutil.copy(wav_file, out_dir / "test/normal" / wav_file.name)
                test_n += 1
            elif is_anomaly and is_test:
                shutil.copy(wav_file, out_dir / "test/anomalous" / wav_file.name)
                test_a += 1

        print(f"{machine_name.upper():9} → train: {train_n:4} normal | test: {test_n:3} normal + {test_a:3} anomalous")
        total_train_normal += train_n
        total_test_normal += test_n
        total_test_anomalous += test_a

    print("\nFINAL_PRODUCT CREATED SUCCESSFULLY!")
    print(f"Total → train normal: {total_train_normal} | test: {total_test_normal} normal + {total_test_anomalous} anomalous")
    print(f"Folder: {final_root.resolve()}")

# Run this once to build the final_product folder
create_final_product_dcase2021()

FAN       → train: 3000 normal | test: 600 normal + 600 anomalous
PUMP      → train: 3000 normal | test: 600 normal + 600 anomalous
VALVE     → train: 3000 normal | test: 600 normal + 600 anomalous
SLIDER    → train: 3000 normal | test: 610 normal + 604 anomalous
GEARBOX   → train: 3017 normal | test: 720 normal + 687 anomalous
TOY_CAR → folder not found: data\dev_data_toy_car\toy_car
TOY_TRAIN → folder not found: data\dev_data_toy_train\toy_train

FINAL_PRODUCT CREATED SUCCESSFULLY!
Total → train normal: 15017 | test: 3130 normal + 3091 anomalous
Folder: C:\Users\melissa\Desktop\Industrial Sound Anomaly Detection using Transformers\final_product


In [4]:
DATA_DIR = Path("final_product/pump")
print("Final split – Pump")
print(f"Train → {len(list((DATA_DIR/'train'/'normal').glob('*.wav'))):,} normal sounds (source_train)")
print(f"Test  → {len(list((DATA_DIR/'test'/'normal').glob('*.wav'))):,} normal + {len(list((DATA_DIR/'test'/'anomalous').glob('*.wav'))):,} anomalous")

Final split – Pump
Train → 3,000 normal sounds (source_train)
Test  → 600 normal + 600 anomalous
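
The same count can be generalized to every machine folder at once. The sketch below assumes the `train/normal`, `test/normal`, `test/anomalous` layout produced earlier; `count_wavs` and `summarize` are hypothetical helper names, and the demo runs on a temporary directory with empty placeholder files rather than real audio.

```python
import tempfile
from pathlib import Path


def count_wavs(machine_dir: Path) -> dict:
    """Count .wav files in each split folder of one machine directory."""
    splits = ("train/normal", "test/normal", "test/anomalous")
    return {split: len(list((machine_dir / split).glob("*.wav"))) for split in splits}


def summarize(final_root: Path) -> dict:
    """Map each machine folder name to its per-split clip counts."""
    return {
        machine_dir.name: count_wavs(machine_dir)
        for machine_dir in sorted(final_root.iterdir())
        if machine_dir.is_dir()
    }


# Demo on a throwaway directory with empty placeholder files (not real audio).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for split, n in (("train/normal", 3), ("test/normal", 2), ("test/anomalous", 1)):
        folder = root / "pump" / split
        folder.mkdir(parents=True)
        for i in range(n):
            (folder / f"clip_{i}.wav").touch()
    demo_counts = summarize(root)
    print(demo_counts)
```

On the real data, calling `summarize(Path("final_product"))` would return one entry per machine, which gives a quick check that the restructuring produced the expected splits.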
