# Audio
> A brief description of Meyda features

- keywords: ['meyda', 'audio']

## Useful Links and Their Descriptions

**[MusicBeam](https://www.musicbeam.org/)**: Open-source music visualizer for live shows.  
**[Flutopedia - Sound Color](http://www.flutopedia.com/sound_color.htm)**: Explains the concept of "sound color" (timbre) in music.  
**[Beat Detection Using Web Audio](http://joesul.li/van/beat-detection-using-web-audio/)**: Tutorial on detecting beats in audio using the Web Audio API.  
**[W3C Web NFC](https://w3c.github.io/web-nfc/)**: Draft W3C specification for the Web NFC API (Near Field Communication in browsers).  
**[Chrome Extensions Developer Guide](https://developer.chrome.com/extensions/devguide)**: Guide for building Chrome browser extensions.  
**[Capture MediaStream from Canvas/Video](https://developers.google.com/web/updates/2016/10/capture-stream)**: Google guide on capturing media streams from canvas or video elements.  
**[MDN getUserMedia API](https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia)**: Documentation for accessing camera/microphone in the browser.


## Audio Feature Analysis Overview  
**Library**: [Meyda - Audio Feature Extraction](https://meyda.js.org/)

---

## What Is "The Signal"?

The *signal* is a stream of audio samples—a time-ordered array of values (typically `Float32Array`) representing the amplitude of sound pressure over time.  
Each value typically ranges between **-1.0 and 1.0**, with `0.0` representing silence or neutral pressure.
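
As a concrete picture of such a signal, here is a minimal sketch that builds one frame of a sine wave as a plain list of floats. The function name `sine_frame` and its default parameters (44.1 kHz sample rate, 512 samples) are illustrative assumptions, not part of Meyda.

```python
import math

def sine_frame(freq_hz, sample_rate=44100, num_samples=512, amplitude=0.5):
    """Return one frame of a sine wave; every sample lies in [-amplitude, amplitude]."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(num_samples)
    ]

frame = sine_frame(440.0)  # hypothetical 440 Hz test tone
print(len(frame), min(frame), max(frame))
```

In a browser the same data would typically arrive as a `Float32Array` rather than a list.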

---

## What Is the Time Domain?

The **time domain** is the representation of sound as it varies over **time**.  
It’s how waveforms are stored and visualized—amplitude on the y-axis, time on the x-axis.

All audio originates in the time domain. Time-domain features directly analyze these raw waveforms without translating them into frequency components.

---

## Time-Domain Features

**RMS (Root Mean Square)**  
- **How it works**:  
  1. Square each sample (removes sign, emphasizes larger values).  
  2. Take the average of those squares (mean).  
  3. Take the square root of that mean (brings it back to the original unit).  

- **Why it correlates with perceived loudness**:  
  RMS is the square root of the waveform's average **power**, and louder sounds carry more power.  
  It doesn’t account for how hearing sensitivity varies with frequency, but it gives a good rough estimate of amplitude-based loudness.
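
The three RMS steps can be sketched directly in plain Python. This is an illustration of the formula, not Meyda's own code, and the name `rms` is chosen here for convenience.

```python
import math

def rms(samples):
    squares = [s * s for s in samples]          # 1. square each sample
    mean_square = sum(squares) / len(squares)   # 2. average the squares
    return math.sqrt(mean_square)               # 3. square root of the mean

print(rms([0.5, -0.5, 0.5, -0.5]))  # 0.5
```

For a full cycle of a sine wave with peak amplitude 1.0, the same function returns roughly 0.707 (that is, 1/√2).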

---

**ZCR (Zero Crossing Rate)**  
- **What is a zero-crossing?**  
  When the signal value **changes sign**, i.e., it moves from positive to negative or vice versa.  
  Example: going from `+0.2` to `-0.3` crosses zero.

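- **Why it's useful**: it helps distinguish percussive from pitched sounds.  
  - **Percussive** sounds: chaotic, irregular oscillations → ZCR varies widely from one buffer to the next.  
  - **Pitched** sounds: periodic waveforms → ZCR stays roughly constant across buffers.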
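
A minimal sketch of counting zero crossings in one buffer follows. It assumes a sample of exactly `0.0` counts as non-negative; a real library may treat that edge case differently.

```python
def zero_crossings(samples):
    """Count how many adjacent sample pairs differ in sign (0.0 counts as non-negative)."""
    count = 0
    for prev, curr in zip(samples, samples[1:]):
        if (prev >= 0) != (curr >= 0):
            count += 1
    return count

print(zero_crossings([0.2, -0.3, -0.1, 0.4]))  # 2
```

Dividing the count by the number of adjacent pairs (`len(samples) - 1`) turns it into a rate that can be compared across buffer sizes.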

**Energy**  
- Related to RMS, but omits both the mean and the square root.  
- Sums the squared amplitude values over the frame; indicates the total "intensity" of the signal and, unlike RMS, grows with frame length.
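
A short sketch of energy as a sum of squares, with its relationship to RMS spelled out. The function name `energy` is an illustrative choice.

```python
import math

def energy(samples):
    """Sum of squared sample values over the frame."""
    return sum(s * s for s in samples)

frame = [0.5, -0.5, 0.5, -0.5]
total = energy(frame)
print(total)                          # 1.0
print(math.sqrt(total / len(frame)))  # 0.5, the RMS of the same frame
```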

---

## What Is the Spectral Domain?

The **spectral (or frequency) domain** represents **how much energy is present at different frequencies** rather than at different time points.  
This representation is obtained with the **Discrete Fourier Transform**, usually computed by the Fast Fourier Transform (FFT) algorithm, which decomposes each frame of the signal into sine waves at evenly spaced frequencies.

Spectral features help answer: *"What frequencies are present in the sound?"* and *"How is the energy distributed among them?"*

---

## Spectral Features

**Amplitude Spectrum**  
The magnitude of the FFT; shows which frequencies are present and their intensity.

**Power Spectrum**  
Amplitude spectrum squared; emphasizes strong frequency components.
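
To make the two spectra concrete, the sketch below computes them straight from the DFT definition. It is deliberately naive (O(N²)) and keeps only the first N/2 bins; a real application would call an FFT routine instead. Bin `k` corresponds to a frequency of `k * sample_rate / N`.

```python
import cmath
import math

def amplitude_spectrum(samples):
    """Magnitudes of DFT bins 0 .. N/2 - 1, computed from the definition."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum

def power_spectrum(samples):
    """Amplitude spectrum squared, bin by bin."""
    return [a * a for a in amplitude_spectrum(samples)]

# A sine completing exactly 4 cycles in 32 samples concentrates in bin 4.
frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
spec = amplitude_spectrum(frame)
peak_bin = max(range(len(spec)), key=lambda k: spec[k])
print(peak_bin)  # 4
```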

**Spectral Centroid**  
The weighted average frequency; a measure of “brightness.”
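
A sketch of the centroid as an amplitude-weighted mean of bin frequencies. It reports the result in Hz, which is an assumption made here for readability; an implementation may report a bin index instead.

```python
def spectral_centroid(amplitudes, sample_rate, frame_size):
    """Amplitude-weighted mean frequency in Hz; 0.0 for an all-zero spectrum."""
    total = sum(amplitudes)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / frame_size  # frequency step between adjacent bins
    return sum(k * bin_hz * a for k, a in enumerate(amplitudes)) / total

# Equal energy at 100 Hz and 300 Hz puts the centroid halfway between them.
print(spectral_centroid([0, 1, 0, 1], sample_rate=800, frame_size=8))  # 200.0
```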

**Spectral Flatness**  
Measures tonality vs noisiness as the ratio of the spectrum's geometric mean to its arithmetic mean. Flatness near 1 = white noise; near 0 = tonal signal.
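
As a sketch, assuming the common definition of flatness (geometric mean of the spectrum divided by its arithmetic mean):

```python
import math

def spectral_flatness(amplitudes):
    """Geometric mean / arithmetic mean; 0.0 if any bin is zero (geometric mean is 0)."""
    n = len(amplitudes)
    arithmetic_mean = sum(amplitudes) / n
    if arithmetic_mean == 0 or min(amplitudes) <= 0:
        return 0.0
    # Average the logs, then exponentiate, to avoid underflow in a long product.
    geometric_mean = math.exp(sum(math.log(a) for a in amplitudes) / n)
    return geometric_mean / arithmetic_mean

print(spectral_flatness([1.0, 1.0, 1.0, 1.0]))     # flat spectrum, close to 1
print(spectral_flatness([1e-6, 1e-6, 1.0, 1e-6]))  # single peak, close to 0
```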

**Spectral Flux**  
Quantifies the change between successive spectra. High flux = a rapidly changing spectrum, as at note onsets or in rough, dynamic sounds.
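
One common way to quantify that change is the Euclidean distance between consecutive amplitude spectra. The sketch below uses that definition as an assumption; libraries differ in normalization and in whether they keep only positive differences.

```python
import math

def spectral_flux(prev_amplitudes, curr_amplitudes):
    """Euclidean distance between two amplitude spectra of equal length."""
    return math.sqrt(
        sum((c - p) ** 2 for p, c in zip(prev_amplitudes, curr_amplitudes))
    )

print(spectral_flux([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0, nothing changed
print(spectral_flux([0.0, 0.0], [3.0, 4.0]))            # 5.0
```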

**Spectral Slope**  
Linear regression on the spectrum; shows whether energy trends toward low or high frequencies.

**Spectral Rolloff**  
The frequency below which 99% of the spectrum's energy resides (Meyda's threshold; other tools often use 85% or 95%).
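
A sketch of rolloff as a cumulative sum over the power spectrum. The `fraction` parameter and the choice to accumulate squared amplitudes are assumptions made for illustration; an implementation may accumulate amplitudes instead.

```python
def spectral_rolloff(amplitudes, sample_rate, frame_size, fraction=0.99):
    """Lowest bin frequency (Hz) at which cumulative power reaches `fraction` of the total."""
    if not amplitudes:
        return 0.0
    bin_hz = sample_rate / frame_size
    power = [a * a for a in amplitudes]
    threshold = fraction * sum(power)
    cumulative = 0.0
    for k, p in enumerate(power):
        cumulative += p
        if cumulative >= threshold:
            return k * bin_hz
    return (len(power) - 1) * bin_hz  # fallback for floating-point rounding

flat = [1.0, 1.0, 1.0, 1.0]
print(spectral_rolloff(flat, sample_rate=800, frame_size=8))                # 300.0
print(spectral_rolloff(flat, sample_rate=800, frame_size=8, fraction=0.5))  # 100.0
```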

**Spectral Spread**  
How widely the energy is distributed around the centroid.

**Spectral Skewness**  
Asymmetry of the energy distribution around the centroid; positive values mean a longer tail toward high frequencies.

**Spectral Kurtosis**  
Peakedness; high values imply tonal/pointy spectra.
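
Spread, skewness, and kurtosis can all be read as moments of the spectrum treated as a distribution over frequency. The sketch below works in units of bin index and assumes a spectrum with nonzero total and nonzero spread; the function name `spectral_shape` is hypothetical.

```python
import math

def spectral_shape(amplitudes):
    """Return (centroid, spread, skewness, kurtosis), all in bin-index units."""
    total = sum(amplitudes)
    weights = [a / total for a in amplitudes]  # normalize so the weights sum to 1
    centroid = sum(k * w for k, w in enumerate(weights))

    def central_moment(order):
        return sum(((k - centroid) ** order) * w for k, w in enumerate(weights))

    spread = math.sqrt(central_moment(2))       # standard deviation
    skewness = central_moment(3) / spread ** 3  # asymmetry
    kurtosis = central_moment(4) / spread ** 4  # peakedness
    return centroid, spread, skewness, kurtosis

# A symmetric three-bin bump: centroid 1, zero skewness.
print(spectral_shape([1.0, 2.0, 1.0]))
```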

**Chroma**  
Energy content mapped onto 12 pitch classes (C, C♯, ..., B).
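
A simplified sketch of the idea: convert each bin's frequency to the nearest equal-tempered note, then add that bin's power to the matching pitch class. This assumes A4 = 440 Hz and a hard assignment of each bin to one class; real chroma implementations usually apply a smoother filter bank.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, a4_hz=440.0):
    """Index 0..11 of the nearest equal-tempered pitch class (0 = C)."""
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi_note % 12

def chroma(amplitudes, sample_rate, frame_size):
    """Sum each bin's power into one of 12 pitch-class buckets."""
    buckets = [0.0] * 12
    bin_hz = sample_rate / frame_size
    for k, a in enumerate(amplitudes):
        if k == 0:
            continue  # bin 0 is DC and has no pitch
        buckets[pitch_class(k * bin_hz)] += a * a
    return buckets

print(PITCH_CLASSES[pitch_class(440.0)])   # A
print(PITCH_CLASSES[pitch_class(261.63)])  # C
```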

---

## Perceptual Features

These account for how humans **actually hear** (nonlinear, frequency-sensitive).

**Loudness (Bark Scale)**  
Better matches human perception than RMS; uses 24 critical bands of hearing.

**Perceptual Spread**  
Distribution of energy across Bark bands; corresponds to perceived richness.

**Perceptual Sharpness**  
Emphasizes high frequencies; reflects how "snappy" a sound feels.

**MFCC (Mel-Frequency Cepstral Coefficients)**  
Compact description of the spectral envelope (timbre), computed on the Mel scale, which approximates how humans perceive pitch spacing; used heavily in speech/music recognition.
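
To show what a Mel-scale warp looks like, here is a sketch using one widely used conversion formula, `mel = 2595 * log10(1 + f / 700)`. Other variants of the Mel scale exist, so treat the constants as an assumption rather than as what any particular library uses.

```python
import math

def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (one common formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The same 100 Hz step covers fewer mels at high frequencies than at low ones.
low_step = hz_to_mel(1100.0) - hz_to_mel(1000.0)
high_step = hz_to_mel(8100.0) - hz_to_mel(8000.0)
print(low_step > high_step)  # True
```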

---

## Why Perceptual Features Differ

Human hearing is nonlinear and frequency-selective:

| Feature             | Raw (Physical)        | Perceptual (Human-Centered)        |
|---------------------|------------------------|-------------------------------------|
| Loudness            | RMS, Energy            | Bark-scale Loudness                |
| Spread              | Spectral Spread        | Perceptual Spread (Bark)           |
| Brightness/Sharpness| Centroid, Slope        | Sharpness (Weighted High Bands)    |
| Pitch/Timbre Content| Chroma, Centroid       | MFCCs, Mel-scale representation     |

---

## Slope vs Skewness Explained

You can have:

- 🔹 **Level spectrum, skewed**  
  - **Slope = 0** → No linear trend across frequencies (no overall tilt)  
  - **Skew ≠ 0** → Energy is spread unevenly around the centroid, with a longer tail on one side  
  - 🧠 *Imagine a strong peak just below the middle of the frequency range, balanced by a weaker peak far above it: the overall tilt is zero, but the energy isn't symmetrically placed around the centroid.*

- 🔹 **Sloped spectrum, symmetric**  
  - **Slope ≠ 0** → Energy trends down (or up) across frequencies  
  - **Skew = 0** → Energy is symmetrically spread around the centroid  
  - 🧠 *Imagine a single symmetric bump sitting in the low frequencies: the spectrum as a whole slopes downward, but the bump is balanced around its own center.*

### Analogy: The Seesaw  
- **Slope** = whether the whole seesaw is tilted.  
- **Skewness** = whether the mass (energy) is unevenly distributed even if the seesaw is level.
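
The independence of the two measures can be checked numerically. The sketch below uses two made-up five-bin spectra and works in bin-index units: slope is a least-squares fit of amplitude against bin index, and skewness is the normalized third central moment around the centroid. Both helpers are illustrative and not taken from Meyda.

```python
def slope(amplitudes):
    """Least-squares slope of amplitude against bin index."""
    n = len(amplitudes)
    mean_k = (n - 1) / 2
    mean_a = sum(amplitudes) / n
    cov = sum((k - mean_k) * (a - mean_a) for k, a in enumerate(amplitudes))
    var = sum((k - mean_k) ** 2 for k in range(n))
    return cov / var

def skewness(amplitudes):
    """Third central moment around the centroid, divided by spread cubed."""
    total = sum(amplitudes)
    centroid = sum(k * a for k, a in enumerate(amplitudes)) / total
    m2 = sum((k - centroid) ** 2 * a for k, a in enumerate(amplitudes)) / total
    m3 = sum((k - centroid) ** 3 * a for k, a in enumerate(amplitudes)) / total
    return m3 / m2 ** 1.5

level_but_skewed = [0.0, 2.0, 0.0, 0.0, 1.0]      # strong low peak, weak far peak
tilted_but_symmetric = [1.0, 2.0, 1.0, 0.0, 0.0]  # symmetric bump at the low end

print(slope(level_but_skewed), skewness(level_but_skewed))          # about 0 and 0.71
print(slope(tilted_but_symmetric), skewness(tilted_but_symmetric))  # about -0.4 and 0
```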