Legacy Audio Analyser documentation

Robin Fernandes edited this page May 26, 2023 · 1 revision

Legacy audio analyser for automatic keyframe creation from audio data

Parseq's built-in audio analyser is deprecated and will be removed in upcoming versions. Its documentation has been removed from the README and is archived in this wiki page.

All functionality from the Audio Analyser has been moved into other parts of the Parseq UI, namely Time series creation for pitch detection, and the Reference audio section for BPM detection and audio event detection.

Audio analyser general info (read this first)

  • ⚠️ This feature is experimental. That's why it's quite separate from the main Parseq UI for now. The keyframes it generates can be merged into an existing Parseq document using the "Merge keyframes" button in the main UI.
  • Tempo, onset event and pitch detection use Aubio, via AubioJS. See the Aubio CLI documentation for the meaning of analysis parameters.
  • Not all parameters are exposed by AubioJS. Some look like they should be, but aren't (those are grayed out in the UI).
  • All processing runs in the browser, using web workers. This seems to be faster in Chrome and Safari compared to Firefox. You can speed things up by increasing the hop sizes to larger multiples of 2 (trading off accuracy).
  • Parseq generally expects audio with a constant tempo (shuffles and other changing tempos are not yet supported). Also, tempo detection is not perfect, so you can override the detected value before generating keyframes. If the first beat is not at the very beginning of the track, you will need to enter a manual offset for now.
  • Pitch detection is sketchy with beats in the mix. You may want to run this multiple times on different audio layers and do multiple merges.
  • The frames-per-second (FPS) value specified in the analyser must match that of the Parseq doc you're merging with, or you'll be out of sync.
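
To see why the FPS values must match, consider how an event time in seconds maps to a frame number. The sketch below is an illustration only; `timeToFrame` is a hypothetical helper, not part of Parseq's API:

```javascript
// Hypothetical helper: convert an event time in seconds to a frame
// number at a given FPS (rounding behaviour assumed, for illustration).
function timeToFrame(timeSec, fps) {
  return Math.round(timeSec * fps);
}

// The same onset at 2.5s lands on different frames at different FPS,
// which is why the analyser FPS and document FPS must agree:
console.log(timeToFrame(2.5, 20)); // 50
console.log(timeToFrame(2.5, 30)); // 75
```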

Using the Audio analyser

The audio analyser UI is split into 3 parts:

  • Audio analysis where you load an audio file, and run algorithms on it for tempo, onset event and pitch detection.
  • Visualisation & playback where you can see your audio wave, spectrogram, detected pitch and beat & event positions, as well as play the audio file.
  • Conversion to Parseq keyframes where you define how the result of the audio analysis will be mapped to Parseq keyframes.

Digging into each section:

Audio Analysis

Parseq performs 3 types of analysis on your audio file:

  • Tempo detection: attempts to identify the overall number of beats per minute in the audio file.
  • Onset (event) detection: attempts to identify the moments where meaningful changes occur in the audio file. These could be drum beats, new instruments being introduced, etc...
  • Pitch detection: attempts to identify the dominant frequency at each moment of your audio file.

Here are the settings you can tweak for this analysis:

  • File: opens a file browser for you to select an audio file. Most audio formats are supported.
  • Sample rate: the samples per second to use for processing.
  • Tempo detection settings:
    • Tempo buffer: number of samples used for each analysis pass. Recommended to leave as is.
    • Tempo hop: number of samples between two consecutive analysis passes. Lower values increase precision at the expense of increased CPU time, but going too low will produce invalid results.
  • Onset event detection settings (see the aubioonset CLI docs for more complete definitions):
    • Onset buffer: number of samples used for each analysis pass. Recommended to leave as is.
    • Onset hop: number of samples between two consecutive analysis passes. Lower values increase precision at the expense of increased CPU time, but going too low will produce invalid results.
    • Onset threshold: defines how picky to be when identifying onset events. Lower threshold values result in more events, higher values result in fewer events.
    • Onset silence: from the aubio docs: "volume in dB under which the onset will not be detected. A value of -20.0 would eliminate most onsets but the loudest ones. A value of -90.0 would select all onsets."
    • Onset method: a selection of onset detection algorithms to choose from. Experiment with these, as they can produce vastly different results.
  • Pitch detection settings (see the aubiopitch CLI docs for more complete definitions):
    • Pitch buffer: number of samples used for each analysis pass. Recommended to leave as is.
    • Pitch hop: number of samples between two consecutive analysis passes. Lower values increase precision at the expense of increased CPU time, but going too low will produce invalid results.
    • Pitch method: a selection of pitch detection algorithms to choose from. Experiment with these, as they can produce vastly different results.
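
To build intuition for the buffer/hop trade-off mentioned above: the time between two consecutive analysis passes is the hop size divided by the sample rate, so halving the hop roughly doubles both temporal precision and CPU work. A quick illustration (`hopResolutionMs` exists only for this example, it is not part of AubioJS):

```javascript
// Illustrative only: time resolution of the analysis, in milliseconds,
// given a hop size in samples and a sample rate in Hz.
function hopResolutionMs(hopSamples, sampleRate) {
  return (hopSamples / sampleRate) * 1000;
}

// Halving the hop size halves the interval between analysis passes:
console.log(hopResolutionMs(512, 44100)); // ~11.6 ms between passes
console.log(hopResolutionMs(256, 44100)); // ~5.8 ms between passes
```
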
Visualisation & playback

The top section shows the waveform and, after running the analysis, will include markers for beats in blue at the bottom, and for onset events in red at the top. Onset markers can be dragged, but beat markers cannot. The waveform can be zoomed with the slider at the bottom. Beneath that is a simple spectrogram view.

Next is a graph showing detected pitch values over time in light purple. Typically, the pitch detection is quite "jagged", i.e. the values might include undesirable spikes. After you generate keyframes, you will see 2 additional curves appear, which should be smoother: the normalised values in dark purple, which have had filtering and normalisation applied as per your settings (see next section), and the keyframed values in red, which are the actual values that will be used in Parseq keyframes.
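
The smoothing of those spikes can be pictured as a median-based outlier filter. The sketch below illustrates the general technique, not Parseq's actual implementation; `removeOutliers` and its window/tolerance parameters are hypothetical:

```javascript
// Median of a numeric array (helper for the sketch below).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Replace any pitch sample that deviates from the median of its
// neighbourhood by more than `tolerance` with that median.
function removeOutliers(pitches, windowSize, tolerance) {
  return pitches.map((p, i) => {
    const lo = Math.max(0, i - windowSize);
    const hi = Math.min(pitches.length, i + windowSize + 1);
    const m = median(pitches.slice(lo, hi));
    return Math.abs(p - m) > tolerance ? m : p;
  });
}

// A single 880 Hz glitch in an otherwise steady ~220 Hz line is flattened:
console.log(removeOutliers([220, 221, 880, 219, 220], 2, 50));
// [220, 221, 220, 219, 220]
```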

Conversion to Parseq keyframes

Beats and Onset events can be converted to keyframes with custom values and interpolation functions for any Parseq-controllable field. Pitch data can be used to set a value at each generated keyframe.

There is a range of settings that define how the audio analysis data will be converted to Parseq keyframes:

  • Tempo settings:
    • Filtering:
      • Include every Nth beat and Starting from allow you to pick a subset of beats to convert to keyframes. You can set either one of these to a ridiculously high value to skip generating keyframes for beats altogether.
      • Custom label: include custom text in the "info" field for these keyframes.
    • Correction:
      • BPM override: ignore the detected BPM and use this value instead. Defaults to a rounded version of the detected BPM.
      • First beat offset: the time of the first beat in seconds. Useful if the song does not start immediately.
    • Set value: the value data to include in the generated keyframes
  • Onset settings:
    • Filtering:
      • Include every Nth event and Starting from allow you to pick a subset of onset events to convert to keyframes. You can set either one of these to a ridiculously high value to skip generating keyframes for onset events altogether.
      • Custom label: include custom text in the "info" field for these keyframes.
    • Set value: the value data to include in the generated keyframes
  • Pitch settings:
    • Filtering:
      • Outlier tolerance: Parseq uses a Median Differencing algorithm to try to remove unwanted outliers from the pitch detection data. Higher values allow more outliers through, lower values (above 0) remove more outliers, and -1 disables outlier elimination completely.
      • Discard above / discard below: a more brute-force way of removing pitch detection glitches: ignore any pitch data points outside of these thresholds.
    • Normalisation: you'll usually want to map the pitch data to a different range for the values to make sense in Deforum. For example, if you're using pitch to set a prompt weight, you'll want to normalise the pitch data to a min of 0 and a max of 1. For a rotation, you might want a min of -45 and a max of 45.
    • Set value: unlike beats and onset events, pitch data points do not create keyframes because it is a continuous data stream rather than discrete events. Therefore, the normalised pitch value is assigned to the keyframes generated from the beat and onset events. Here you can set the field and interpolation that should be used (the value is the pitch value itself).
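
To make the conversion concrete, here is a hedged sketch of two of the transformations described above: generating beat keyframe positions from a BPM, first-beat offset and FPS, and min-max normalising pitch values into a target range. `beatFrames` and `normalise` are hypothetical names for illustration, not Parseq's actual code:

```javascript
// Generate frame numbers for beats at a constant BPM, honouring the
// "Include every Nth beat" and "Starting from" filters described above.
function beatFrames(bpm, firstBeatOffsetSec, durationSec, fps, everyNth = 1, startingFrom = 0) {
  const beatInterval = 60 / bpm; // seconds per beat
  const frames = [];
  for (let t = firstBeatOffsetSec, i = 0; t < durationSec; t += beatInterval, i++) {
    if (i >= startingFrom && (i - startingFrom) % everyNth === 0) {
      frames.push(Math.round(t * fps));
    }
  }
  return frames;
}

// Min-max normalise values into [outMin, outMax].
function normalise(values, outMin, outMax) {
  const inMin = Math.min(...values);
  const inMax = Math.max(...values);
  return values.map(v => outMin + ((v - inMin) / (inMax - inMin)) * (outMax - outMin));
}

// 120 BPM = one beat every 0.5s; at 20 FPS, beats land every 10 frames:
console.log(beatFrames(120, 0, 2, 20)); // [0, 10, 20, 30]
// Map raw pitch values onto a 0..1 prompt-weight range:
console.log(normalise([100, 200, 300], 0, 1)); // [0, 0.5, 1]
```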

Once you are happy with your generated keyframes, hit Copy Keyframes and merge them into your Parseq Doc with the Merge Keyframes button beneath the grid in the main Parseq UI.

Note that you can run multiple merges if you wish to set multiple different values on each keyframe.