# Audio Classification System

## Overview
This notebook implements an automated audio classification system that:
- Analyzes audio files with the YAMNet model
- Performs AI-based categorization with an OpenAI GPT-4-family model (`gpt-4o-mini` in the code)
- Exports results in XML and CSV formats
- Generates JSON files for further processing

## How it Works
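1. **Audio Analysis**: YAMNet scores each file against its 521 AudioSet classes, and librosa computes spectral features (dominant frequency, spectral centroid, spectral bandwidth, RMS loudness)
2. **AI Classification**: The GPT model receives these labels and features as text (not the raw audio) and rates each predefined category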
3. **Export**: Results are saved in structured formats

## Prerequisites
- Python environment with `tensorflow`, `tensorflow-hub`, `librosa`, `openai`, and `pandas` installed
- `classes.csv` with category definitions in the same folder
- Audio files in the specified directory
- OpenAI API key
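
For illustration, `classes.csv` could look like the sketch below. Only the two column names (`Label`, `Meaning Sound`) come from the loading code in this notebook; the category labels and descriptions are made-up placeholders, and the CSV is read from an in-memory string so the example runs without any files.

```python
import io
import pandas as pd

# Hypothetical contents of classes.csv; only the column names
# "Label" and "Meaning Sound" are taken from the notebook.
example_csv = io.StringIO(
    "Label,Meaning Sound\n"
    "brightness,How bright or dull the sound is perceived\n"
    "density,How many sound events occur per unit of time\n"
    ",Row without a label is dropped\n"
)

# Rows missing either column are discarded, as in the loading cell
example_df = pd.read_csv(example_csv).dropna(subset=["Label", "Meaning Sound"])

# One "label: description" line per category, ready to embed in a prompt
example_block = "\n".join(
    f"{label}: {meaning}"
    for label, meaning in zip(example_df["Label"], example_df["Meaning Sound"])
)
print(example_block)
```

Each remaining row becomes one line of the category definitions that are embedded in the system prompt.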


In [None]:
# =============================================================================
# DEPENDENCIES & IMPORTS
# =============================================================================

# Install dependencies (run once if not already installed)
# !pip install tensorflow tensorflow-hub librosa openai pandas

# Standard library imports
import os                    # File system operations
import datetime             # Timestamps for metadata
import re                   # Regular expressions for text processing
import json                 # JSON file processing
import sys                  # System-specific parameters
from pathlib import Path    # Modern path handling
from xml.etree.ElementTree import Element, SubElement, ElementTree, tostring  # XML creation
from xml.dom import minidom # XML formatting

# Machine Learning & Audio Processing
import tensorflow as tf     # TensorFlow for YAMNet model
import tensorflow_hub as hub  # TensorFlow Hub for pre-trained models
import librosa              # Audio analysis and processing
import numpy as np          # Numerical computations

# Data Processing & AI
import pandas as pd         # Tabular data processing
from openai import OpenAI   # OpenAI API for GPT-4 classification

In [None]:
# =============================================================================
# CONFIGURATION
# =============================================================================

# ── Path Configuration ────────────────────────────────────────────────────────
# Main directory with audio files (recursive search for .wav, .mp3, .flac, .ogg)
folder_path = "/Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3"

# CSV file with category definitions (must contain "Label" and "Meaning Sound" columns)
csv_path = "classes.csv"  # Relative path - CSV is in the same folder as the notebook

# Output directory for XML files (preserves folder structure)
xml_output_dir = "/Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3"

# ── OpenAI API Configuration ──────────────────────────────────────────────────
# IMPORTANT: Never commit a real API key. The key is read from the
# OPENAI_API_KEY environment variable; the placeholder is only a fallback
# and must be replaced before running if the variable is not set.
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY", "sk-proj-YOUR-API-KEY-HERE"))


## 1. Model Initialization and Category Loading

In this section:
- The YAMNet model is loaded (Google's AudioSet model)
- Category definitions are read from the CSV file
- Audio files are searched in the directory
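
The recursive file search can also be sketched with `pathlib`. This is a minimal illustration, not the notebook's own implementation: `find_audio_files` and the file names are hypothetical, and the demo runs on a temporary directory.

```python
import tempfile
from pathlib import Path

AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}

def find_audio_files(base_dir):
    """Return sorted (full_path, relative_path) pairs for audio files under base_dir."""
    base = Path(base_dir)
    return sorted(
        (str(p), str(p.relative_to(base)))
        for p in base.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )

# Demo on a throwaway directory with placeholder file names
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    for name in ["a.wav", "notes.txt", "sub/b.FLAC"]:
        (Path(tmp) / name).touch()
    found = [rel for _, rel in find_audio_files(tmp)]
    print(found)
```

Keeping the relative path alongside the full path is what later allows the XML output to mirror the input folder structure.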


In [None]:

# =============================================================================
# MODEL INITIALIZATION AND CATEGORY LOADING
# =============================================================================

# ── Load YAMNet Model ─────────────────────────────────────────────────────────
# YAMNet is a pre-trained model from Google for audio classification
# It can recognize 521 different audio classes (AudioSet dataset)
print("Loading YAMNet model...")
yamnet_model = hub.load("https://tfhub.dev/google/yamnet/1")

# Download and load YAMNet class mapping
class_map_path = tf.keras.utils.get_file(
    'yamnet_class_map.csv',
    'https://raw.githubusercontent.com/tensorflow/models/master/research/audioset/yamnet/yamnet_class_map.csv'
)
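# Parse with pandas: display names such as "Child speech, kid speaking" contain
# commas, so a plain split(',') would truncate them and keep trailing newlines
class_names = pd.read_csv(class_map_path)["display_name"].tolist()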
print(f"YAMNet model loaded with {len(class_names)} audio classes")

# ── Load Categories from CSV File ─────────────────────────────────────────────
# The CSV file must contain "Label" and "Meaning Sound" columns
print(f"Loading categories from: {csv_path}")
category_df = pd.read_csv(csv_path)
category_df = category_df.dropna(subset=["Label", "Meaning Sound"])
categories = category_df["Label"].tolist()
descriptions = category_df["Meaning Sound"].tolist()
instruction_block = "\n".join([f"{cat}: {desc}" for cat, desc in zip(categories, descriptions)])

print(f"Loaded categories: {categories}")

# ── Define LLM Prompt for GPT-4 ───────────────────────────────────────────────
# This prompt instructs GPT-4 to provide a value, confidence,
# and reasoning for each category
system_message = f"""
You are an expert in perceptual sound classification.
You will receive audio analysis data and return, for EVERY category below,
(1) a value between 0 and 1 (how strongly the sound fits),
(2) a confidence between 0 and 1 (your confidence in that value),
(3) a single-sentence reasoning.

Here are the category definitions to use for all requests:

{instruction_block}

STRICT OUTPUT FORMAT (one line per category; no extra text):
Category | value | confidence | reasoning
"""

# ── Collect Audio Files ───────────────────────────────────────────────────────
# Recursive search for audio files in the specified directory
print(f"Searching for audio files in: {folder_path}")
audio_files = []
for root, dirs, files in os.walk(folder_path):
    for file in files:
        if file.lower().endswith((".wav", ".mp3", ".flac", ".ogg")):
            full_path = os.path.join(root, file)
            rel_path = os.path.relpath(full_path, folder_path)
            audio_files.append((full_path, rel_path))

print(f"Found: {len(audio_files)} audio files")
for full_path, rel_path in audio_files:
    print(f"  - {rel_path}")

# ── Utils ─────────────────────────────────────────────────────────────────────
def normalize(text):
    return re.sub(r'\W+', '', str(text)).lower()

normalized_categories = {normalize(cat): cat for cat in categories}

# Robust parser for "Category | value | confidence | reasoning"
line_regexes = [
    re.compile(r"^\s*(.+?)\s*\|\s*([01](?:\.\d+)?)\s*\|\s*([01](?:\.\d+)?)\s*\|\s*(.+?)\s*$"),
    re.compile(r"^\s*(.+?)\s*:\s*([01](?:\.\d+)?)\s*\|\s*([01](?:\.\d+)?)\s*\|\s*(.+?)\s*$"),
    re.compile(r"^\s*(.+?)\s*:\s*val(?:ue)?\s*=\s*([01](?:\.\d+)?)\s*,\s*conf(?:idence)?\s*=\s*([01](?:\.\d+)?)\s*,\s*reason(?:ing)?\s*=\s*(.+?)\s*$")
]

def parse_llm_lines(raw: str):
    out = {}
    for line in raw.splitlines():
        if not line.strip():
            continue
        for rgx in line_regexes:
            m = rgx.match(line)
            if m:
                key_raw, val_str, conf_str, reason = m.groups()
                key_norm = normalize(key_raw)
                if key_norm in normalized_categories:
                    cat = normalized_categories[key_norm]
                    try:
                        val = float(val_str)
                        conf = float(conf_str)
                        # clip defensively to [0,1]
                        val = max(0.0, min(1.0, val))
                        conf = max(0.0, min(1.0, conf))
                        out[cat] = (val, conf, reason.strip())
                    except ValueError:
                        pass
                break
    # Ensure all categories present; fill missing with zeros and note reason
    for cat in categories:
        if cat not in out:
            out[cat] = (0.0, 0.0, "Not provided by model; defaulted to 0 with low confidence.")
    return out


def pretty_xml(elem) -> bytes:
    # Convert Element → bytes with xml.etree
    rough = tostring(elem, encoding="utf-8")
    # Re-parse with minidom for pretty printing
    parsed = minidom.parseString(rough)
    return parsed.toprettyxml(indent="  ", encoding="utf-8")

def safe_xml_filename(rel_path: str) -> str:
    # flatten nested paths and swap extension
    base = re.sub(r"[\\/]+", "__", rel_path)
    base = re.sub(r"[^A-Za-z0-9_.-]+", "_", base)
    base = os.path.splitext(base)[0] + ".xml"
    return base

# ── Analyze one audio file ────────────────────────────────────────────────────
def analyze_audio(file_path):
    waveform, sr = librosa.load(file_path, sr=16000)
    scores, _, _ = yamnet_model(waveform)
    mean_scores = tf.reduce_mean(scores, axis=0).numpy()
    top_n = 15
    top_indices = mean_scores.argsort()[-top_n:][::-1]
    top_labels = [(class_names[i], float(mean_scores[i])) for i in top_indices]
    top_3_labels = ", ".join([label for label, _ in top_labels[:3]])

    # Audio features
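    magnitude = np.abs(librosa.stft(waveform))  # linear magnitude spectrogram
    spectral_centroid = librosa.feature.spectral_centroid(y=waveform, sr=sr)
    spectral_bandwidth = librosa.feature.spectral_bandwidth(y=waveform, sr=sr)
    # Dominant frequency: FFT bin with the highest mean magnitude.
    # (Taking argmax of |dB relative to max| would select the quietest bin instead.)
    freqs = librosa.fft_frequencies(sr=sr)  # default n_fft=2048, matching librosa.stft
    dominant_freq = float(freqs[np.argmax(magnitude.mean(axis=1))])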
    rms = float(librosa.feature.rms(y=waveform).mean())

    label_str = ", ".join([f"{l} ({s:.2f})" for l, s in top_labels])

    # Category scoring (value + confidence + reasoning)
    user_message = f"""
Audio analysis:

YAMNet top labels: {label_str}
Dominant frequency: {dominant_freq:.2f} Hz
Spectral centroid (mean): {float(np.mean(spectral_centroid)):.2f} Hz
Spectral bandwidth (mean): {float(np.mean(spectral_bandwidth)):.2f} Hz
Average loudness (RMS): {rms:.4f}

Return one line PER CATEGORY exactly as:
Category | value | confidence | reasoning
"""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message}
        ],
        temperature=0.2
    )
    raw_output = response.choices[0].message.content
    triples = parse_llm_lines(raw_output)  # {cat: (value, confidence, reasoning)}

    # Plain-language description for metadata
    desc_prompt = f"""
Describe this sound in 1–2 plain English sentences for a metadata 'reasoning' field.
Focus on perceptual qualities and context.

YAMNet: {label_str}
Dominant frequency: {dominant_freq:.2f} Hz
Spectral centroid: {float(np.mean(spectral_centroid)):.2f} Hz
Spectral bandwidth: {float(np.mean(spectral_bandwidth)):.2f} Hz
Loudness (RMS): {rms:.4f}
"""
    desc_response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are an expert in acoustic and perceptual sound description."},
            {"role": "user", "content": desc_prompt}
        ],
        temperature=0.5
    )
    sound_description = desc_response.choices[0].message.content.strip()

    return triples, top_3_labels, sound_description

# ── XML writer (per file) ─────────────────────────────────────────────────────
def write_xml_for_file(rel_path: str, triples: dict, top3: str, summary: str, out_dir: str = xml_output_dir):
    root = Element("audio_classification")
    metadata = SubElement(root, "metadata")
    reasoning_meta = SubElement(metadata, "reasoning")
    reasoning_meta.text = f"{summary} | Top-3 YAMNet: {top3}"
    analysis_date = SubElement(metadata, "analysis_date")
    analysis_date.text = datetime.date.today().isoformat()

    for cat in categories:
        val, conf, why = triples[cat]
        p = SubElement(root, "parameter")
        p.set("name", str(cat))
        p.set("value", f"{float(val):.4f}")
        p.set("confidence", f"{float(conf):.4f}")
        p.set("reasoning", why)

    xml_bytes = pretty_xml(root)

    # --- Preserve folder structure: rel_path -> out_dir/<rel_path>.xml ---
    xml_rel_path = Path(rel_path).with_suffix(".xml")     # e.g. sub/xy.wav -> sub/xy.xml
    xml_path = Path(out_dir) / xml_rel_path

    # Create subfolders
    xml_path.parent.mkdir(parents=True, exist_ok=True)

    with open(xml_path, "wb") as f:
        f.write(xml_bytes)

    return str(xml_path)


# ── Batch process: CSV + XMLs ─────────────────────────────────────────────────
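# XML files are written to `xml_output_dir` from the configuration cell.
# Reassigning it here would not change where files go, because the default
# `out_dir` of write_xml_for_file() is bound when the function is defined.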
csv_rows = []

for full_path, rel_path in audio_files:
    print(f"\nProcessing {rel_path}...")
    triples, top3, description = analyze_audio(full_path)

    # CSV entry
    row = {"File": rel_path, "Top 3 Labels": top3, "LLM Description": description}
    for cat in categories:
        val, conf, why = triples[cat]
        row[cat] = val
        row[f"{cat}__confidence"] = conf
        row[f"{cat}__reason"] = why
    csv_rows.append(row)

    # Write XML
    xml_path = write_xml_for_file(rel_path, triples, top3, description)
    print(f"  → XML saved: {xml_path}")

# Save CSV overview
df = pd.DataFrame(csv_rows)
csv_output_path = "audio_classification_results.csv"
df.to_csv(csv_output_path, index=False, encoding="utf-8")
print(f"\nResults saved to: {csv_output_path}\nXMLs in: {os.path.abspath(xml_output_dir)}")

Found 4 audio files.

Processing PerryComo_MagicMoments.wav...
  → XML saved: /Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3/PerryComo_MagicMoments.xml

Processing Palestrina_MissaAeternaSanctus.wav...
  → XML saved: /Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3/Palestrina_MissaAeternaSanctus.xml

Processing Glockenschlag1.wav...
  → XML saved: /Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3/Glockenschlag1.xml

Processing Glockenschlag2.wav...
  → XML saved: /Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3/Glockenschlag2.xml

Results saved to: audio_classification_results.csv
XMLs in: /Users/jonashammerer/Documents/25_projekte/01_alte_post/6 classification/classification_v0.2/xml_outputs


## 2. Utility Functions

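The helper functions described here are defined in the code cell of section 1; the code cell that closes this section generates the JSON files for the Max patch (see section 4). The helpers cover:
- `normalize()`: Text normalization for category comparison
- `parse_llm_lines()`: Parsing of GPT-4 responses
- `pretty_xml()` and `safe_xml_filename()`: XML formatting and filename safety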
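
As a simplified sketch of the parsing step, the example below handles only the pipe-separated format and fills in skipped categories with zeros. `parse_reply`, the category names, and the sample reply are hypothetical; the notebook's own parser accepts additional line formats and clips values to the range 0 to 1.

```python
import re

# Simplified sketch of the pipe-separated format; category names are placeholders.
LINE_PATTERN = re.compile(
    r"^\s*(.+?)\s*\|\s*([01](?:\.\d+)?)\s*\|\s*([01](?:\.\d+)?)\s*\|\s*(.+?)\s*$"
)

def parse_reply(raw, known_categories):
    """Map each known category to (value, confidence, reasoning)."""
    # Compare names case-insensitively and without punctuation or spaces
    lookup = {re.sub(r"\W+", "", c).lower(): c for c in known_categories}
    parsed = {}
    for line in raw.splitlines():
        match = LINE_PATTERN.match(line)
        if not match:
            continue
        name, value, confidence, reasoning = match.groups()
        key = re.sub(r"\W+", "", name).lower()
        if key in lookup:
            parsed[lookup[key]] = (float(value), float(confidence), reasoning)
    # Categories the model skipped get a zero score with zero confidence
    for category in known_categories:
        parsed.setdefault(category, (0.0, 0.0, "Not provided by model."))
    return parsed

reply = "Brightness | 0.8 | 0.9 | Strong high-frequency content.\nsome stray text"
result = parse_reply(reply, ["brightness", "density"])
print(result)
```

Normalizing names before the lookup means a reply that capitalizes or pads a category name still maps to the label from `classes.csv`.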


In [None]:
import sys
import json
from pathlib import Path

def build_json(name: str) -> dict:
    return {
        "pattrstorage": {
            "name": name,
            "slots": {}
        }
    }

def generate_json_for_wavs_multi_dirs(input_dir=None, base_output_dir=None):
    if input_dir is None:
        input_dir = folder_path
    if base_output_dir is None:
        base_output_dir = folder_path

    input_dir = Path(input_dir)
    base_output_dir = Path(base_output_dir)

    # Folders and associated names
    target_dirs_and_names = [
        ("data-Ablp", "blp"),
        ("data-Bblp", "blp"),
        ("data-grain", "grain"),
        ("data-stretch", "stretch"),
    ]

    print(f"Input: {input_dir}")
    print(f"Base Output: {base_output_dir}")

    if not input_dir.exists() or not input_dir.is_dir():
        print(f"Error: Input folder does not exist or is not a folder: {input_dir}", file=sys.stderr)
        return 1

    wav_paths = [p for p in input_dir.rglob("*") if p.is_file() and p.suffix.lower() == ".wav"]
    print(f"Found WAV files: {wav_paths}")

    if not wav_paths:
        print("Note: No .wav files found.")
        return 0

    for target, name in target_dirs_and_names:
        output_dir = base_output_dir / target
        output_dir.mkdir(parents=True, exist_ok=True)
        used_names = {}
        for wav in wav_paths:
            base_name = wav.stem
            json_name = f"{base_name}.json"
            json_out = output_dir / json_name

            # Avoid collisions
            if json_out.exists():
                count = used_names.get(base_name, 1)
                while True:
                    json_name = f"{base_name}_{count}.json"
                    json_out = output_dir / json_name
                    if not json_out.exists():
                        used_names[base_name] = count + 1
                        break
                    count += 1

            payload = build_json(name)
            with json_out.open("w", encoding="utf-8") as f:
                json.dump(payload, f, ensure_ascii=False, indent=4)
        print(f"Done. {len(wav_paths)} JSON files saved in: {output_dir}")

    print("All folders have been populated.")
    return 0

generate_json_for_wavs_multi_dirs()

Input: /Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3
Base Output: /Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3
Found WAV files: [PosixPath('/Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3/PerryComo_MagicMoments.wav'), PosixPath('/Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3/Palestrina_MissaAeternaSanctus.wav'), PosixPath('/Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3/Glockenschlag1.wav'), PosixPath('/Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3/Glockenschlag2.wav')]
Done. 4 JSON files saved in: /Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3/data-Ablp
Done. 4 JSON files saved in: /Users/jonashammerer/Documents/25_projekte/01_alte_post/0material/material_3/data-Bblp
Done. 4 JSON files saved in: /Users/jonashammerer/Documents/25_projekte/01_a

0

## 3. Audio Analysis Functions

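These functions, defined in the code cell of section 1, perform the actual audio analysis:
- `analyze_audio()`: Analyzes an audio file with YAMNet and librosa, then sends the results to the GPT model; returns the per-category scores, the top-3 YAMNet labels, and a short description
- `write_xml_for_file()`: Creates the XML output for each file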
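
The label selection inside `analyze_audio()` averages the per-frame scores over time and keeps the highest-scoring classes. The sketch below reproduces that step with NumPy on a synthetic score matrix; the class names and scores are placeholders, not real YAMNet output.

```python
import numpy as np

# Synthetic stand-in for YAMNet output: 3 time frames x 4 classes.
# The class names are placeholders, not the real class map.
frame_scores = np.array([
    [0.10, 0.70, 0.05, 0.15],
    [0.20, 0.60, 0.10, 0.10],
    [0.30, 0.50, 0.15, 0.05],
])
demo_names = ["Speech", "Music", "Bell", "Silence"]

mean_scores = frame_scores.mean(axis=0)              # average over time frames
top_n = 2
top_indices = mean_scores.argsort()[-top_n:][::-1]   # highest mean score first
top_labels = [(demo_names[i], float(mean_scores[i])) for i in top_indices]
print(top_labels)
```

Averaging over frames yields one score per class for the whole file, so short events in a long recording contribute little to the final ranking.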


## 4. JSON Generation for Further Processing

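The `generate_json_for_wavs_multi_dirs()` cell in section 2 creates JSON files for integration into other systems:
- Creates a `pattrstorage` JSON skeleton for each `.wav` file (other audio formats are skipped)
- Writes one file per WAV into each of the four target folders (`data-Ablp`, `data-Bblp`, `data-grain`, `data-stretch`)
- Prevents name conflicts through automatic numbering (`name_1.json`, `name_2.json`, ...)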
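
The numbering scheme can be sketched in isolation as follows. `unique_json_path` and the file stems are hypothetical, and the demo writes to a temporary directory rather than the real output folders.

```python
import json
import tempfile
from pathlib import Path

def unique_json_path(output_dir, stem):
    """Return output_dir/stem.json, or stem_1.json, stem_2.json, ... if taken."""
    candidate = output_dir / f"{stem}.json"
    counter = 1
    while candidate.exists():
        candidate = output_dir / f"{stem}_{counter}.json"
        counter += 1
    return candidate

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    written = []
    # Two sources share the placeholder stem "bell", e.g. from different subfolders
    for stem in ["bell", "bell", "choir"]:
        target = unique_json_path(out_dir, stem)
        payload = {"pattrstorage": {"name": "grain", "slots": {}}}
        target.write_text(json.dumps(payload, indent=4), encoding="utf-8")
        written.append(target.name)
    print(written)
```

Because the check is based on files that already exist on disk, running the generator a second time on the same folder produces numbered copies rather than overwriting the earlier files.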


## Summary

### What was created:

1. **CSV File**: `audio_classification_results.csv`
   - Overview table with all classification results
   - Contains values, confidences, and reasoning for each category

2. **XML Files**: Individual XML files for each audio file
   - Structured metadata with analysis date
   - Detailed parameters for each category
   - Preserves original folder structure

3. **JSON Files**: For integration into other systems
   - `data-Ablp/` and `data-Bblp/`: Buffer Loop Player
   - `data-grain/`: Granular Synthesizer  
   - `data-stretch/`: Time-Stretch Player
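
To make the XML layout concrete, the sketch below builds one document with the same element and attribute names that `write_xml_for_file()` uses. All values, including the category name, the description, and the date, are placeholders.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# Placeholder values; a real run fills these from the classification results.
root = Element("audio_classification")
metadata = SubElement(root, "metadata")
SubElement(metadata, "reasoning").text = (
    "A single resonant bell strike. | Top-3 YAMNet: Bell, Church bell, Chime"
)
SubElement(metadata, "analysis_date").text = "2025-01-01"

# One <parameter> element per category, with scores stored as attributes
parameter = SubElement(root, "parameter")
parameter.set("name", "brightness")
parameter.set("value", f"{0.8:.4f}")
parameter.set("confidence", f"{0.9:.4f}")
parameter.set("reasoning", "Strong high-frequency content.")

xml_text = tostring(root, encoding="unicode")
print(xml_text)
```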

### Next Steps:
- Review the results in the CSV file
- Use the XML files for detailed analyses
- Integrate the JSON files into your audio processing pipeline
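
As one possible starting point for such analyses, the sketch below reads the per-category scores back out of an XML document. The inline XML is a stand-in with placeholder values; for a real file, `ET.parse(path).getroot()` would replace `ET.fromstring(...)`.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for one generated file; the values are placeholders.
xml_text = """<audio_classification>
  <metadata>
    <reasoning>A single resonant bell strike.</reasoning>
    <analysis_date>2025-01-01</analysis_date>
  </metadata>
  <parameter name="brightness" value="0.8000" confidence="0.9000" reasoning="Strong high-frequency content." />
  <parameter name="density" value="0.1000" confidence="0.7000" reasoning="One isolated event." />
</audio_classification>"""

root = ET.fromstring(xml_text)

# Map each category name to its (value, confidence) pair
scores = {
    p.get("name"): (float(p.get("value")), float(p.get("confidence")))
    for p in root.findall("parameter")
}
print(scores)
```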

### Customization:
- Change `folder_path` for other audio directories
- Adapt the categories in `classes.csv` to your needs
- Modify the JSON structure depending on your use case
