# 🎧 Comprehensive List of Audio Features

---

## 🔊 1. Time-Domain Features

Derived directly from waveform data.

| Feature              | Description                                  | Use Case                    |
|----------------------|----------------------------------------------|-----------------------------|
| Zero Crossing Rate   | Rate of sign changes in the signal per frame | Percussive sound detection  |
| Energy               | Sum of squared amplitudes                    | Volume/activity detection   |
| RMS Energy           | Root Mean Square amplitude                   | Rough loudness proxy        |
| Envelope             | Smoothed amplitude curve                     | Dynamics, onset detection   |
| Temporal Centroid    | Center of mass of signal energy in time      | Attack/release timing       |
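
The definitions in the table can be sketched in a few lines of plain Python. This is a minimal illustration using only the standard library, not how a production library computes these values; the test signal and the `SAMPLE_RATE` constant are assumptions made for the example. In practice these functions would be applied to short frames of a recording rather than to the whole signal.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)

def energy(frame):
    """Sum of squared amplitudes."""
    return sum(x * x for x in frame)

def rms(frame):
    """Root mean square amplitude."""
    return math.sqrt(energy(frame) / len(frame))

def temporal_centroid(frame, sample_rate):
    """Energy-weighted mean time of the frame, in seconds."""
    total = energy(frame)
    if total == 0:
        return 0.0
    return sum(i * x * x for i, x in enumerate(frame)) / (total * sample_rate)

# Hypothetical test signal: one second of a 5 Hz sine sampled at 1000 Hz.
SAMPLE_RATE = 1000
signal = [math.sin(2 * math.pi * 5 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

print(f"ZCR:               {zero_crossing_rate(signal):.4f}")
print(f"Energy:            {energy(signal):.2f}")
print(f"RMS:               {rms(signal):.4f}")
print(f"Temporal centroid: {temporal_centroid(signal, SAMPLE_RATE):.3f} s")
```

A steady sine has its energy spread evenly, so its temporal centroid sits near the middle of the signal; a sound with a sharp attack and long decay would pull the centroid toward the start.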

---

## 🔁 2. Frequency-Domain Features

Extracted using FFT or spectrograms.

| Feature              | Description                                      | Use Case                   |
|----------------------|--------------------------------------------------|----------------------------|
| Spectral Centroid    | Center of mass of the spectrum                   | Brightness of sound        |
| Spectral Bandwidth   | Spread around the centroid                       | Timbre, sharpness          |
| Spectral Contrast    | Difference between spectral peaks/valleys       | Instrument/speaker ID      |
| Spectral Rolloff     | Frequency below which a set share (often 85–95%) of energy lies | Bright vs dark sound       |
| Spectral Flatness    | Measures tone-like vs noise-like quality         | Noise/music classification |
| Spectral Flux        | How spectrum changes between frames              | Onset detection            |
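
As a rough sketch of how several of these features follow from a magnitude spectrum, the code below uses a naive DFT from the standard library so it runs without dependencies. The rolloff here is computed on power (squared magnitude), which is one of several conventions; the 0.85 default, the `eps` constant, and the test tone are assumptions for illustration. Real code would normally use an FFT on windowed frames.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectral_centroid(mags, sample_rate, n):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total

def spectral_rolloff(mags, sample_rate, n, fraction=0.85):
    """Lowest bin frequency below which `fraction` of the power lies."""
    power = [m * m for m in mags]
    threshold = fraction * sum(power)
    running = 0.0
    for k, p in enumerate(power):
        running += p
        if running >= threshold:
            return k * sample_rate / n
    return (len(mags) - 1) * sample_rate / n

def spectral_flatness(mags, eps=1e-12):
    """Geometric mean over arithmetic mean of power: near 0 tonal, near 1 noisy."""
    power = [m * m + eps for m in mags]
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    return geometric / (sum(power) / len(power))

def spectral_flux(prev_mags, mags):
    """Euclidean distance between the spectra of two consecutive frames."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(prev_mags, mags)))

# Hypothetical test tone: 8 Hz sine, 64 samples at 64 Hz, so it lands on bin 8.
SAMPLE_RATE, N = 64, 64
tone = [math.sin(2 * math.pi * 8 * i / SAMPLE_RATE) for i in range(N)]
mags = magnitude_spectrum(tone)

print(f"Centroid: {spectral_centroid(mags, SAMPLE_RATE, N):.2f} Hz")
print(f"Rolloff:  {spectral_rolloff(mags, SAMPLE_RATE, N):.2f} Hz")
print(f"Flatness: {spectral_flatness(mags):.6f}")
print(f"Flux vs. silence: {spectral_flux([0.0] * len(mags), mags):.2f}")
```

For a pure tone, centroid and rolloff both sit at the tone's frequency and flatness is close to zero; broadband noise would push flatness toward one.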

---

## 🧠 3. Perceptual Features

Mimic how humans perceive sound.

| Feature                   | Description                                  | Use Case                  |
|---------------------------|----------------------------------------------|---------------------------|
| MFCC (Mel-Frequency Cepstral Coefficients) | Compact summary of the spectral envelope (shaped by the vocal tract in speech) | Speech/speaker recognition |
| Chroma Features           | Energy per musical pitch class               | Chord/key recognition     |
| Tonnetz                   | Harmonic relationships between pitches       | Music genre/harmony       |
| Mel Spectrogram           | Spectrogram on a Mel scale                   | Deep learning input       |
| Bark/Mel Frequency Bands  | Perceptual frequency bands                   | Psychoacoustic modeling   |
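
Two of the perceptual mappings behind this table are simple enough to sketch directly. The example below assumes the widely used HTK-style Mel formula (other variants exist) and equal-tempered tuning with A4 = 440 Hz; the `peaks` list is made-up data standing in for spectral peaks, and `chroma_vector` is a toy illustration rather than a full chroma extractor.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_mel(hz):
    """HTK-style Mel scale (an assumed variant; libraries differ)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def pitch_class(hz, a4=440.0):
    """Index 0..11 of the nearest equal-tempered pitch class (0 = C)."""
    midi = 69 + 12 * math.log2(hz / a4)
    return round(midi) % 12

def chroma_vector(peaks):
    """Fold (frequency_hz, energy) pairs into 12 pitch-class bins."""
    chroma = [0.0] * 12
    for hz, e in peaks:
        chroma[pitch_class(hz)] += e
    return chroma

# Hypothetical peaks: an A major triad (A4, C#5, E5) plus A3 an octave below.
peaks = [(440.0, 1.0), (554.37, 0.8), (659.25, 0.6), (220.0, 0.5)]
chroma = chroma_vector(peaks)

print(f"1000 Hz is about {hz_to_mel(1000.0):.1f} mel")
for name, value in zip(PITCH_CLASSES, chroma):
    if value > 0:
        print(f"{name}: {value:.1f}")
```

Note how A3 and A4 land in the same bin: chroma deliberately discards octave information, which is why it suits chord and key recognition.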

---

## 🧩 4. Rhythm & Temporal Features

| Feature              | Description                                  | Use Case                  |
|----------------------|----------------------------------------------|---------------------------|
| Tempo                | Beats per minute (BPM)                       | Beat tracking             |
| Beat Histogram       | Distribution of beat intervals               | Rhythm structure analysis |
| Onset Strength       | Per-frame strength of new note/event attacks | Music/speech segmentation |
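
Once beat times are known, tempo and the beat histogram reduce to arithmetic on the gaps between beats. The sketch below assumes the beats have already been detected by some other step; the `beat_times` values are invented, and the 10 ms bin width is an arbitrary choice for the example.

```python
from collections import Counter
from statistics import median

def inter_beat_intervals(beat_times):
    """Gaps between consecutive beats, in seconds."""
    return [b - a for a, b in zip(beat_times, beat_times[1:])]

def tempo_bpm(beat_times):
    """Tempo from the median interval, which is robust to a few odd beats."""
    return 60.0 / median(inter_beat_intervals(beat_times))

def beat_histogram(beat_times):
    """Count intervals per bin; keys are interval lengths in ms, 10 ms bins."""
    return Counter(round(iv * 100) * 10 for iv in inter_beat_intervals(beat_times))

# Hypothetical beat times in seconds: steady 0.5 s pulse with one late beat.
beat_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.75]

print(f"Tempo: {tempo_bpm(beat_times):.1f} BPM")
print(f"Histogram (ms -> count): {dict(beat_histogram(beat_times))}")
```

A single sharp peak in the histogram suggests a steady pulse; several peaks at related intervals hint at swing, syncopation, or tempo changes.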

---

## 🧬 5. Statistical Features (over time)

| Feature              | Description                                  | Use Case                    |
|----------------------|----------------------------------------------|-----------------------------|
| Mean/Variance/Std    | Stats over frames for each feature           | Emotion/speaker detection   |
| Skewness/Kurtosis    | Shape of distribution over time              | Feature characterization    |
| Delta / Delta-Delta  | First and second time derivatives of features (e.g., MFCCs) | Speech dynamics             |
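
To make the aggregation concrete, here is a small sketch that summarizes one frame-level feature track. The `track` values are placeholders for any per-frame feature (an MFCC coefficient, RMS, and so on). The delta here is a plain first difference, which is a simplification; libraries often use a smoothed, regression-based estimate over several frames.

```python
import math
import statistics

def summarize(track):
    """Mean, variance, std, skewness, and excess kurtosis of a feature track."""
    mean = statistics.fmean(track)
    var = statistics.pvariance(track, mean)
    std = math.sqrt(var)
    if std == 0:
        return {"mean": mean, "var": 0.0, "std": 0.0, "skew": 0.0, "kurtosis": 0.0}
    m3 = statistics.fmean((x - mean) ** 3 for x in track)
    m4 = statistics.fmean((x - mean) ** 4 for x in track)
    return {
        "mean": mean,
        "var": var,
        "std": std,
        "skew": m3 / std ** 3,
        "kurtosis": m4 / var ** 2 - 3.0,  # excess kurtosis: 0 for a Gaussian
    }

def delta(track):
    """First difference between consecutive frames (simplified delta)."""
    return [b - a for a, b in zip(track, track[1:])]

# Hypothetical per-frame values of a single feature.
track = [1.0, 2.0, 3.0, 4.0, 5.0]
stats = summarize(track)
d1 = delta(track)   # delta
d2 = delta(d1)      # delta-delta

print(stats)
print("delta:", d1)
print("delta-delta:", d2)
```

Summaries like these turn a variable-length sequence of frames into a fixed-length vector, which is what most classical classifiers expect.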

---

## 🎙️ 6. Voice-Specific Features

Specialized for speech/emotion analysis.

| Feature              | Description                                  | Use Case                  |
|----------------------|----------------------------------------------|---------------------------|
| Pitch (F0)           | Fundamental frequency of voice               | Gender, emotion, speaker  |
| Jitter               | Cycle-to-cycle variation in pitch period     | Voice health, emotion     |
| Shimmer              | Cycle-to-cycle variation in amplitude        | Fatigue, stress           |
| HNR                  | Harmonic-to-noise ratio                      | Voice clarity             |
| Formants (F1, F2...) | Resonant frequencies of vocal tract          | Vowel detection           |
| Voiced/Unvoiced      | Whether a frame has vocal-fold vibration     | Speech segmentation       |
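
Jitter and shimmer are easiest to understand from their "local" definitions: the mean absolute difference between consecutive cycles, divided by the overall mean. The sketch below assumes the glottal cycles have already been located, so it starts from per-cycle periods and peak amplitudes; both lists are invented numbers for illustration, and the function names are not from any library.

```python
def local_variation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

def mean_f0(periods):
    """Mean fundamental frequency in Hz from cycle periods in seconds."""
    return len(periods) / sum(periods)

# Hypothetical per-cycle measurements from a sustained vowel.
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]   # seconds per cycle
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]          # peak amplitude per cycle

jitter = local_variation(periods)
shimmer = local_variation(amplitudes)

print(f"Mean F0: {mean_f0(periods):.1f} Hz")
print(f"Jitter:  {jitter * 100:.2f} %")
print(f"Shimmer: {shimmer * 100:.2f} %")
```

Both measures are only meaningful on voiced segments, which is one reason the voiced/unvoiced decision usually comes first in a voice-analysis pipeline.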

---

## 🔐 7. Deep Learning Features

Learned representations from large models.

| Feature              | Description                                  | Use Case                      |
|----------------------|----------------------------------------------|-------------------------------|
| VGGish Embeddings    | CNN features trained on YouTube audio        | General audio tagging         |
| YAMNet Embeddings    | Audio classification embeddings              | Sound event detection         |
| OpenL3               | Learned from audio-video pairing             | Cross-modal retrieval         |

---

## 🧪 Task-Specific Features

| Task             | Feature Examples                          |
|------------------|--------------------------------------------|
| Speaker ID       | MFCC, i-vectors, x-vectors                |
| Emotion Analysis | Pitch, jitter, statistical MFCCs          |
| Music Genre      | Chroma, tempo, MFCC, spectral contrast    |
| Keyword Spotting | MFCC, log-mel spectrogram, onset energy   |
| Birdsong         | Pitch patterns, harmonics, contour        |

---

## 🧠 Pro Tip

> You almost never feed the raw waveform directly to classical models.  
> Instead, use features like MFCC, Chroma, or Mel Spectrograms.
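
One reason for this is practical: classical models expect a fixed-length input, while recordings vary in length. A common pattern is to cut the waveform into overlapping frames, compute a feature per frame, and then summarize. The sketch below uses per-frame RMS as a stand-in for richer features such as MFCCs; the frame and hop sizes, the summary statistics chosen, and the test signal are all assumptions for the example.

```python
import math

def split_frames(signal, frame_length, hop_length):
    """Overlapping frames; trailing samples that do not fill a frame are dropped."""
    last_start = len(signal) - frame_length
    return [signal[s:s + frame_length] for s in range(0, last_start + 1, hop_length)]

def frame_rms(frame):
    """Root mean square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def feature_vector(signal, frame_length=256, hop_length=128):
    """Fixed-length summary [mean, std, min, max] of the per-frame RMS track."""
    track = [frame_rms(f) for f in split_frames(signal, frame_length, hop_length)]
    mean = sum(track) / len(track)
    std = math.sqrt(sum((v - mean) ** 2 for v in track) / len(track))
    return [mean, std, min(track), max(track)]

# Hypothetical signal at 1000 Hz: half a second of silence, then a 50 Hz tone.
silence = [0.0] * 500
tone = [math.sin(2 * math.pi * 50 * n / 1000) for n in range(500)]
signal = silence + tone

vector = feature_vector(signal)
print(f"{len(split_frames(signal, 256, 128))} frames -> vector of length {len(vector)}")
print([round(v, 3) for v in vector])
```

The vector has the same length no matter how long the recording is, so it can be passed to a classifier such as an SVM or random forest.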

---

## ✅ Common Feature Libraries

| Library           | Feature Set              | Description                           |
|-------------------|--------------------------|----------------------------------------|
| `librosa`         | MFCC, chroma, spectral    | Most commonly used Python library     |
| `pyAudioAnalysis` | High-level audio features | Emotion/speaker/content analysis       |
| `openSMILE`       | Large research-level set  | 6k+ features for emotion/speech tasks  |
| `torchaudio`      | Deep-learning ready       | Spectrograms, MFCCs, filters           |

---

## 📌 Summary

- **Time-Domain** → Amplitude changes (ZCR, RMS)
- **Frequency-Domain** → Spectral shape & brightness (centroid, rolloff)
- **Perceptual** → What humans hear (MFCC, Chroma)
- **Rhythm** → Timing structure (tempo, onset strength)
- **Voice-Specific** → Emotions & speech (pitch, jitter, formants)
- **Statistical** → Aggregates and changes over time (mean, variance, deltas)
- **DL Features** → Learned by CNNs from real-world audio

---

## 🚀 Want to Learn Practically?

I can walk you through:
- Extracting each type of feature
- Visualizing and interpreting it
- Using it for ML models or classification tasks

---

Let me know if you want to start with:
- **MFCC**
- **Chroma**
- **Spectral Features**
- **All of them (in order)**

