# Audio Classification System

## Overview
This notebook implements an automated audio classification system that:
- Analyzes audio files with the YAMNet model
- Performs AI-based categorization with an OpenAI GPT model (`gpt-4o-mini`)
- Exports results in XML and CSV formats
- Generates JSON preset files for the Max patch

---

## 🔧 PREREQUISITES & INSTALLATION

### System Requirements
- **Python 3.8 or newer** (Python 3.9-3.11 recommended)
- **8GB RAM minimum** (16GB recommended for processing many files)
- **Internet connection** (for downloading models and API access)

### Step 1: Install Python
If you don't have Python installed:

**Windows:**
1. Download from: https://www.python.org/downloads/
2. During installation, check ✅ "Add Python to PATH"
3. Complete the installation

**macOS:**
1. Python 3 may already be installed
2. Check version: Open Terminal and type `python3 --version`
3. If not installed, download from: https://www.python.org/downloads/

### Step 2: Install Required Libraries
Open your **Terminal** (macOS) or **Command Prompt** (Windows) and run:

```bash
pip install tensorflow tensorflow-hub librosa openai pandas
```

**If you encounter errors**, try:
```bash
pip3 install tensorflow tensorflow-hub librosa openai pandas
```

**Note:** Installation may take 5-10 minutes as TensorFlow is a large library.
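To confirm the installation before opening the notebook, a small check like the following can help. It is a minimal sketch, not part of the notebook: `missing_modules` is a hypothetical helper, and the list uses import names rather than pip package names (`tensorflow-hub` is imported as `tensorflow_hub`).

```python
import importlib.util

# Import names of the libraries installed in Step 2.
REQUIRED_MODULES = ["tensorflow", "tensorflow_hub", "librosa", "openai", "pandas"]

def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

If anything is reported as missing, rerun the `pip install` command above in the same Python environment that the notebook uses.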

### Step 3: Get OpenAI API Key
1. Go to: https://platform.openai.com/api-keys
2. Sign up or log in to your OpenAI account
3. Click "Create new secret key"
4. Copy the key (you'll need it in Cell 1)

**Important:** Keep your API key private! Never share it publicly.
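One way to keep the key out of the notebook file is to read it from an environment variable instead of pasting it into the configuration cell. The sketch below assumes the key has been stored in `OPENAI_API_KEY`; `load_api_key` is a hypothetical helper, not part of the notebook.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from an environment variable and fail loudly if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set.")
    return key

# Possible use in the configuration cell (assumption, not the notebook's default):
#   api_key = load_api_key()
```

With this approach the notebook can be shared or committed without exposing the key.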

---

## 📌 HOW TO USE THIS NOTEBOOK:

### ✏️ STEP 1: Configure Settings (Cell 1 Below)
- **YOU MUST EDIT** the configuration cell (Cell 1)
- Set your folder path and API key
- ⚠️ **CRITICAL**: Ensure audio file names contain NO umlauts, accents, or spaces!

### ▶️ STEP 2: Run All Cells  
- After configuration, run all cells in order
- **DO NOT MODIFY** code in cells 2-4
- Just click "Run" on each cell

### ✅ STEP 3: Check Output
- Classification XMLs will be saved with your audio files
- JSON preset folders will be created in your material folder
- See summary at the end for next steps


---

# ✏️ CELL 1: CONFIGURATION - YOU MUST EDIT THIS!

⚠️ **REQUIRED: Edit the values in the cell below**

**What to change:**
1. **folder_path**: Full path to your new audio material folder
2. **api_key**: Your OpenAI API key (get from: https://platform.openai.com/api-keys)

**File Naming Requirements (CRITICAL!):**
- ❌ NO umlauts (ä, ö, ü)
- ❌ NO accents (é, à, ñ)  
- ❌ NO spaces (use underscores _)

**Examples:**
```
❌ Wrong: "Paco De Lucía.wav"
✅ Correct: "Paco_De_Lucia.wav"

❌ Wrong: "Müller & Söhne.wav"
✅ Correct: "Mueller_und_Soehne.wav"
```
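The naming rules above can be checked before running the notebook. The following is a minimal sketch with two hypothetical helpers: `is_safe_name` flags names that break the rules, and `suggest_name` proposes a replacement. The umlaut mapping is an assumption based on the examples; symbols such as `&` are left for you to rename by hand.

```python
import unicodedata

# German umlauts get their usual two-letter spelling (assumed from the examples above).
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}

def is_safe_name(name):
    """True if the name is pure ASCII and contains no whitespace."""
    return name.isascii() and not any(ch.isspace() for ch in name)

def suggest_name(name):
    """Propose an ASCII-only name with underscores instead of spaces."""
    # Compose characters first so decomposed umlauts (common on macOS) are matched.
    name = unicodedata.normalize("NFC", name)
    for src, dst in UMLAUTS.items():
        name = name.replace(src, dst)
    # Split remaining accented letters into base letter + accent, then drop the accent.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "_".join(ascii_only.split())

print(suggest_name("Paco De Lucía.wav"))
```

The function only suggests a name; renaming the files on disk is left as a deliberate manual step.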


In [None]:
# ═════════════════════════════════════════════════════════════════════════════
# ✏️ CONFIGURATION - EDIT THESE VALUES!
# ═════════════════════════════════════════════════════════════════════════════

# ── Your Audio Material Folder Path ──────────────────────────────────────────
# Replace with the FULL path to your new audio material folder
# 
# Example Windows: 
#   folder_path = "C:/Users/YourName/Documents/APO_Main/apo_material/new_sounds"
# 
# Example macOS: 
#   folder_path = "/Users/YourName/Documents/APO_Main/apo_material/new_sounds"

folder_path = "PUT_YOUR_FOLDER_PATH_HERE"  # ← CHANGE THIS!

# ── Your OpenAI API Key ──────────────────────────────────────────────────────
# Replace with your OpenAI API key
# Get your key from: https://platform.openai.com/api-keys

api_key = "PUT_YOUR_API_KEY_HERE"  # ← CHANGE THIS!

# ═════════════════════════════════════════════════════════════════════════════
# ⚠️ DO NOT EDIT BELOW THIS LINE!
# ═════════════════════════════════════════════════════════════════════════════

# Internal configuration (automatically set)
csv_path = "classes.csv"  # Category definitions (must be in same folder as this notebook)
xml_output_dir = folder_path  # XMLs will be saved in the material folder

print("✅ Configuration loaded!")
print(f"   Folder: {folder_path}")
print(f"   Categories file: {csv_path}")


---

# ▶️ CELLS 2-4: RUN THESE CELLS - DO NOT MODIFY!

**Instructions:**
1. Run Cell 2 (Load Dependencies)
2. Run Cell 3 (Initialize Model & Analyze)
3. Run Cell 4 (Generate JSON Presets)
4. See summary at the end

⚠️ **Do NOT modify** any code in the cells below!

---


In [None]:
# ═════════════════════════════════════════════════════════════════════════════
# CELL 2: LOAD DEPENDENCIES
# ═════════════════════════════════════════════════════════════════════════════
# ▶️ RUN THIS CELL - DO NOT MODIFY!

# ─────────────────────────────────────────────────────────────────────────────
# TROUBLESHOOTING: If you get "ModuleNotFoundError", the libraries are not 
# installed. Run this command in your Terminal/Command Prompt:
#
#   pip install tensorflow tensorflow-hub librosa openai pandas
#
# Then restart this notebook and try again.
# ─────────────────────────────────────────────────────────────────────────────

print("Loading libraries...")

import os, datetime, re, json, sys
from pathlib import Path
from xml.etree.ElementTree import Element, SubElement, tostring
from xml.dom import minidom
import tensorflow as tf
import tensorflow_hub as hub
import librosa
import numpy as np
import pandas as pd
from openai import OpenAI

# Verify versions
print("\n📦 Library versions:")
print(f"   TensorFlow: {tf.__version__}")
print(f"   Librosa: {librosa.__version__}")
print(f"   Pandas: {pd.__version__}")
print(f"   NumPy: {np.__version__}")

# Initialize OpenAI client
client = OpenAI(api_key=api_key)

print("\n✅ All dependencies loaded successfully!")
print("   You can proceed to Cell 3.")


In [None]:
# ═════════════════════════════════════════════════════════════════════════════
# CELL 3: INITIALIZE MODEL & ANALYZE AUDIO FILES
# ═════════════════════════════════════════════════════════════════════════════
# ▶️ RUN THIS CELL - DO NOT MODIFY!

# Load YAMNet model
print("Loading YAMNet model...")
yamnet_model = hub.load("https://tfhub.dev/google/yamnet/1")
class_map_path = tf.keras.utils.get_file('yamnet_class_map.csv',
    'https://raw.githubusercontent.com/tensorflow/models/master/research/audioset/yamnet/yamnet_class_map.csv')
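import csv  # some display names are quoted and contain commas, so a plain split(',') truncates them
with open(class_map_path, encoding="utf-8", newline="") as f:
    class_names = [row["display_name"] for row in csv.DictReader(f)]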
print(f"✅ YAMNet loaded ({len(class_names)} classes)\n")

# Load categories
print(f"Loading categories from: {csv_path}")
category_df = pd.read_csv(csv_path).dropna(subset=["Label", "Meaning Sound"])
categories = category_df["Label"].tolist()
descriptions = category_df["Meaning Sound"].tolist()
instruction_block = "\n".join([f"{cat}: {desc}" for cat, desc in zip(categories, descriptions)])
print(f"✅ Loaded {len(categories)} categories\n")

# System prompt for the GPT model
system_message = f"""You are an expert in perceptual sound classification.
You will receive audio analysis data and return, for EVERY category below,
(1) a value between 0 and 1 (how strongly the sound fits),
(2) a confidence between 0 and 1 (your confidence in that value),
(3) a single-sentence reasoning.

Here are the category definitions:
{instruction_block}

STRICT OUTPUT FORMAT (one line per category; no extra text):
Category | value | confidence | reasoning"""

# Find audio files
print(f"Searching for audio files in: {folder_path}")
audio_files = []
for root, dirs, files in os.walk(folder_path):
    for file in files:
        if file.lower().endswith((".wav", ".mp3", ".flac", ".ogg")):
            full_path = os.path.join(root, file)
            audio_files.append((full_path, os.path.relpath(full_path, folder_path)))
print(f"✅ Found {len(audio_files)} audio files\n")

# Utility functions
def normalize(text):
    return re.sub(r'\W+', '', str(text)).lower()

normalized_categories = {normalize(cat): cat for cat in categories}
line_regexes = [
    re.compile(r"^\s*(.+?)\s*\|\s*([01](?:\.\d+)?)\s*\|\s*([01](?:\.\d+)?)\s*\|\s*(.+?)\s*$"),
    re.compile(r"^\s*(.+?)\s*:\s*([01](?:\.\d+)?)\s*\|\s*([01](?:\.\d+)?)\s*\|\s*(.+?)\s*$"),
]

def parse_llm_lines(raw):
    out = {}
    for line in raw.splitlines():
        if not line.strip(): continue
        for rgx in line_regexes:
            m = rgx.match(line)
            if m:
                key_raw, val_str, conf_str, reason = m.groups()
                key_norm = normalize(key_raw)
                if key_norm in normalized_categories:
                    cat = normalized_categories[key_norm]
                    try:
                        val = max(0.0, min(1.0, float(val_str)))
                        conf = max(0.0, min(1.0, float(conf_str)))
                        out[cat] = (val, conf, reason.strip())
                    except ValueError:
                        pass  # skip lines whose numbers cannot be parsed
                break
    for cat in categories:
        if cat not in out:
            out[cat] = (0.0, 0.0, "Not provided by model.")
    return out

def pretty_xml(elem):
    rough = tostring(elem, encoding="utf-8")
    return minidom.parseString(rough).toprettyxml(indent="  ", encoding="utf-8")

def analyze_audio(file_path):
    waveform, sr = librosa.load(file_path, sr=16000)
    scores, _, _ = yamnet_model(waveform)
    mean_scores = tf.reduce_mean(scores, axis=0).numpy()
    top_indices = mean_scores.argsort()[-15:][::-1]
    top_labels = [(class_names[i], float(mean_scores[i])) for i in top_indices]
    top_3_labels = ", ".join([label for label, _ in top_labels[:3]])
    
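    # Linear magnitude spectrogram. dB values relative to the maximum are <= 0, so
    # taking their absolute value would select the quietest bin, not the loudest.
    S = np.abs(librosa.stft(waveform))
    spectral_centroid = librosa.feature.spectral_centroid(y=waveform, sr=sr)
    spectral_bandwidth = librosa.feature.spectral_bandwidth(y=waveform, sr=sr)
    # Map the strongest bin (averaged over time) to its center frequency in Hz.
    dominant_freq = float(librosa.fft_frequencies(sr=sr)[np.argmax(S.mean(axis=1))])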
    rms = float(librosa.feature.rms(y=waveform).mean())
    label_str = ", ".join([f"{l} ({s:.2f})" for l, s in top_labels])
    
    user_message = f"""Audio analysis:
YAMNet top labels: {label_str}
Dominant frequency: {dominant_freq:.2f} Hz
Spectral centroid: {float(np.mean(spectral_centroid)):.2f} Hz
Spectral bandwidth: {float(np.mean(spectral_bandwidth)):.2f} Hz
Average loudness (RMS): {rms:.4f}

Return one line PER CATEGORY exactly as:
Category | value | confidence | reasoning"""
    
    response = client.chat.completions.create(model="gpt-4o-mini",
        messages=[{"role": "system", "content": system_message},
                  {"role": "user", "content": user_message}], temperature=0.2)
    triples = parse_llm_lines(response.choices[0].message.content)
    
    desc_response = client.chat.completions.create(model="gpt-4o-mini",
        messages=[{"role": "system", "content": "You are an expert in acoustic sound description."},
                  {"role": "user", "content": f"Describe this sound in 1-2 sentences:\n{label_str}"}],
        temperature=0.5)
    description = desc_response.choices[0].message.content.strip()
    
    return triples, top_3_labels, description

def write_xml(rel_path, triples, top3, summary):
    root = Element("audio_classification")
    metadata = SubElement(root, "metadata")
    SubElement(metadata, "reasoning").text = f"{summary} | Top-3: {top3}"
    SubElement(metadata, "analysis_date").text = datetime.date.today().isoformat()
    for cat in categories:
        val, conf, why = triples[cat]
        p = SubElement(root, "parameter")
        p.set("name", str(cat))
        p.set("value", f"{float(val):.4f}")
        p.set("confidence", f"{float(conf):.4f}")
        p.set("reasoning", why)
    xml_path = Path(xml_output_dir) / Path(rel_path).with_suffix(".xml")
    xml_path.parent.mkdir(parents=True, exist_ok=True)
    xml_path.write_bytes(pretty_xml(root))
    return str(xml_path)

# Process files
print("="*80 + "\nPROCESSING AUDIO FILES\n" + "="*80 + "\n")
csv_rows = []
for idx, (full_path, rel_path) in enumerate(audio_files, 1):
    print(f"[{idx}/{len(audio_files)}] {rel_path}")
    try:
        triples, top3, desc = analyze_audio(full_path)
        row = {"File": rel_path, "Top 3 Labels": top3, "Description": desc}
        for cat in categories:
            val, conf, why = triples[cat]
            row[cat], row[f"{cat}__conf"], row[f"{cat}__reason"] = val, conf, why
        csv_rows.append(row)
        xml_path = write_xml(rel_path, triples, top3, desc)
        print(f"  ✅ XML: {xml_path}\n")
    except Exception as e:
        print(f"  ❌ ERROR: {e}\n")

csv_path_out = os.path.join(folder_path, "audio_classification_results.csv")
pd.DataFrame(csv_rows).to_csv(csv_path_out, index=False, encoding="utf-8")
print("\n" + "="*80 + "\n✅ CLASSIFICATION COMPLETE!\n" + "="*80)
print(f"📄 CSV: {csv_path_out}")
print(f"📋 XMLs: {folder_path}")


In [None]:
# ═════════════════════════════════════════════════════════════════════════════
# CELL 4: GENERATE JSON PRESET FILES
# ═════════════════════════════════════════════════════════════════════════════
# ▶️ RUN THIS CELL - DO NOT MODIFY!

print("\n" + "="*80 + "\nGENERATING JSON PRESET FILES\n" + "="*80 + "\n")

def build_json(name):
    return {"pattrstorage": {"name": name, "slots": {}}}

input_dir = Path(folder_path)
target_dirs = [("data-Ablp", "blp"), ("data-Bblp", "blp"), 
               ("data-grain", "grain"), ("data-stretch", "stretch")]

wav_paths = [p for p in input_dir.rglob("*") if p.is_file() and p.suffix.lower() == ".wav"]
print(f"Found {len(wav_paths)} WAV files\n")

for target, name in target_dirs:
    output_dir = input_dir / target
    output_dir.mkdir(parents=True, exist_ok=True)
    for wav in wav_paths:
        json_path = output_dir / f"{wav.stem}.json"
        json_path.write_text(json.dumps(build_json(name), indent=4, ensure_ascii=False), encoding="utf-8")
    print(f"✅ {len(wav_paths)} JSON files → {output_dir}")

print("\n" + "="*80 + "\n✅ JSON PRESET GENERATION COMPLETE!\n" + "="*80)


---

# ✅ SUMMARY & NEXT STEPS

## What was created:

### 1. Classification XMLs ✅
- **Location**: Saved alongside your audio files
- **Content**: YAMNet analysis + GPT categorization (`gpt-4o-mini`)
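To inspect a classification XML outside Max, it can be read back with Python's standard library. The sketch below mirrors the structure written by Cell 3 (a `metadata` element plus one `parameter` element per category). The category name `Brightness` and all values in the sample are invented for illustration, and `parse_parameters` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented sample in the layout produced by write_xml() in Cell 3.
SAMPLE_XML = """
<audio_classification>
  <metadata>
    <reasoning>A bright bell strike. | Top-3: Bell, Chime, Music</reasoning>
    <analysis_date>2025-01-01</analysis_date>
  </metadata>
  <parameter name="Brightness" value="0.8000" confidence="0.6000" reasoning="High spectral centroid."/>
</audio_classification>
""".strip()

def parse_parameters(root):
    """Map each category name to (value, confidence, reasoning)."""
    return {
        p.get("name"): (float(p.get("value")), float(p.get("confidence")), p.get("reasoning"))
        for p in root.findall("parameter")
    }

root = ET.fromstring(SAMPLE_XML)
params = parse_parameters(root)
summary = root.findtext("metadata/reasoning")
print(summary)
print(params)
```

For a file on disk, `ET.parse(path).getroot()` returns the same kind of root element.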

### 2. CSV Overview ✅
- **File**: `audio_classification_results.csv` in your material folder
- **Content**: Complete overview of all classifications
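Each CSV row holds the columns `File`, `Top 3 Labels`, and `Description`, followed by three columns per category: `<category>`, `<category>__conf`, and `<category>__reason`. A minimal sketch for pulling the scores out of one row is shown below. The sample row and the category `Brightness` are invented, and `category_scores` is a hypothetical helper.

```python
import csv
import io

# Invented sample in the column layout written by Cell 3.
SAMPLE_CSV = (
    "File,Top 3 Labels,Description,Brightness,Brightness__conf,Brightness__reason\n"
    'bell.wav,"Bell, Chime, Music",A bright bell strike.,0.8,0.6,High spectral centroid.\n'
)

def category_scores(row):
    """Collect (value, confidence) per category from one CSV row."""
    scores = {}
    for column, value in row.items():
        if column.endswith("__conf"):
            category = column[: -len("__conf")]
            scores[category] = (float(row[category]), float(value))
    return scores

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row["File"], category_scores(row))
```

To read the real file, replace the `io.StringIO(...)` object with the result of `open(path, newline="", encoding="utf-8")`.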

### 3. JSON Preset Folders ✅
Created in your material folder:
- `data-Ablp/` - Loop Player A presets
- `data-Bblp/` - Loop Player B presets  
- `data-grain/` - Granular Synthesizer presets
- `data-stretch/` - Time-Stretch Player presets
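Each generated preset file is an empty `pattrstorage` skeleton, as written by `build_json` in Cell 4. The real parameter values are added later in Max. The sketch below shows that structure and a hypothetical helper, `is_empty_preset`, that tells an untouched skeleton apart from a preset that has already been refined.

```python
import json

def is_empty_preset(text, expected_name):
    """True if the JSON text is an untouched skeleton for the given player name."""
    storage = json.loads(text).get("pattrstorage", {})
    return storage.get("name") == expected_name and storage.get("slots") == {}

# Skeleton as produced by Cell 4 for the granular synthesizer folder.
skeleton = json.dumps({"pattrstorage": {"name": "grain", "slots": {}}}, indent=4)
print(skeleton)
print(is_empty_preset(skeleton, "grain"))
```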

---

## 📋 Next Steps:

### 1. Copy Preset Folders to APO_Main
```
[your_material_folder]/data-Ablp/    → APO_Main/data-Ablp/
[your_material_folder]/data-Bblp/    → APO_Main/data-Bblp/
[your_material_folder]/data-grain/   → APO_Main/data-grain/
[your_material_folder]/data-stretch/ → APO_Main/data-stretch/
```
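The copy step can also be scripted. The following is a minimal sketch, assuming Python 3.8 or newer (for `dirs_exist_ok`); `copy_preset_folders` is a hypothetical helper and both paths are placeholders for your own folders. Files in `APO_Main` that share a name with a new preset are overwritten, so back up any refined presets first.

```python
import shutil
from pathlib import Path

PRESET_FOLDERS = ["data-Ablp", "data-Bblp", "data-grain", "data-stretch"]

def copy_preset_folders(material_dir, apo_main_dir):
    """Copy each existing preset folder into APO_Main, merging with existing content."""
    copied = []
    for folder in PRESET_FOLDERS:
        src = Path(material_dir) / folder
        if not src.is_dir():
            continue  # skip folders that Cell 4 did not create
        dst = Path(apo_main_dir) / folder
        # Merges into an existing folder; files with the same name are replaced.
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied

# Example call with placeholder paths:
#   copy_preset_folders("/path/to/your_material_folder", "/path/to/APO_Main")
```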

### 2. Back Up Material to NAS
- Copy your entire material folder to NAS server **data-pg8**

### 3. Refine Presets in Max
- Open the Max patch
- Load and test new sounds  
- Adjust preset parameters for desired results

### 4. Update Partitur Files
- Create preset combinations in Max
- Save to `partitur.txt` (outdoor) or `partitur_in.txt` (indoor)
- Mirror partitur files to NAS **data-pg8** for brain control

---

**For detailed instructions, see: `readme.md` Section 10.4**
