<a href="https://colab.research.google.com/github/gwengo/AI-art/blob/main/maua_stylegan2_audioreactive.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Audioreactive video with StyleGAN

This notebook will walk through the various tools and techniques in Hans Brouwer’s **Audio-reactive Latent Interpolations with StyleGAN**.

Hans has a great number of resources that were hugely helpful in learning about what’s below: [Research Paper](https://wavefunk.xyz/assets/audio-reactive-stylegan/paper.pdf) | [Blog Post](https://wavefunk.xyz/audio-reactive-stylegan) | https://github.com/JCBrouwer/maua-stylegan2/

Hans already has a [really nice notebook](https://colab.research.google.com/drive/1Ig1EXfmBC01qik11Q32P0ZffFtNipiBR#scrollTo=fOde375CrLZ0) but I wanted to walk through my process and tools in my own way.



---

**The best way to understand this notebook is to follow along with the video tutorial created in conjunction with it.**

This notebook was created for my Advanced StyleGAN class taught in the summer of 2021. If you find this notebook and content useful, please consider signing up for my [Patreon](https://www.patreon.com/bustbright) or [YouTube channel](https://www.youtube.com/channel/UCaZuPdmZ380SFUMKHVsv_AA/join). You can also send me a one-time payment on [Venmo](https://venmo.com/Derrick-Schultz).


Let’s start. Did we get a V100? Hopefully! But a P100 will work too. Anything else and we might run into issues. (Yes, you should just pay for Colab Pro, it’s $10/month.)

In [None]:
!nvidia-smi -L

Install my fork of Hans Brouwer’s repo and related libraries.

In [None]:
!git clone https://github.com/dvschultz/maua-stylegan2
%cd maua-stylegan2/
!pip install ninja madmom kornia ffmpeg-python cython

## Upload model and audio files

Your model file must be in the format of a Rosinality `.pt` model. If you trained your model with a TensorFlow repo or the newer ADA PyTorch repo, you can convert it to this format in [this notebook](https://colab.research.google.com/github/dvschultz/stylegan2-ada-pytorch/blob/main/SG2_ADA_PT_to_Rosinality.ipynb).

For audio, I recommend uploading `.wav` files. You may also want to look at a tool like Spleeter or Demucs to separate layers of a song. A short demo of Demucs follows this section.

If you don’t have a model to use, you can download a version of my model trained using the art of [Frea Buckler](https://www.instagram.com/freabuckler/). **Please do not use this model for any commercial work.**

In [None]:
# Derrick Schultz's Frea Buckler network (from awesome-pretrained-stylegan2)
!gdown https://drive.google.com/u/0/uc?id=1YzZemZAp7BVW701_BZ7uabJWJJaS2g7v

In [None]:
!gdown --id 1huJHdsDlj6x50j_uI1wvsIY8zW6O4lVb -O /content/freagan.pt


The audio files used in this demo were made available as a part of a remix contest. I don’t believe they are available any longer, sorry.

### Demucs

Demucs is a machine learning model from Facebook that does audio source separation. That means it will take an audio file and split it into separate files for drums, bass, melody, and vocals. I find the results fairly hit or miss but try it out.

A better option would be to find music that is available as "stems": recordings that are separated during the recording process and have much higher fidelity. You can often find stems for electronic music on Beatport, Splice, or other remix contest sites.

Demucs is pretty straightforward to use so I’ll include it here. 

In [None]:
!pip install demucs musdb museval

In [None]:
!python -m demucs.separate -h

In [None]:
!python -m demucs.separate /content/IKnowU_all-60s.wav -n demucs48_hq --shifts 10

## Visualize Audio

Let’s first load our audio using a library called librosa.

### Aside: Shorten Audio

The longer your audio clip, the longer it takes to render the video. While experimenting with different settings you may want to use a shorter clip. We can use ffmpeg to cut out a shorter section.

Don’t make your audio too short. Something 30-60 seconds is usually good.

In [None]:
audio_path = "/content/IKnowU_all.wav"
output_path = "/content/IKnowU_all-30s.wav"
start_seconds = 30
end_seconds = 60
!ffmpeg -i {audio_path} -af "atrim={start_seconds}:{end_seconds}" {output_path}

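If you would rather build the trim command from Python (for example to cut several clips in a loop), here is a minimal sketch. `trim_command` is a hypothetical helper of mine, not part of the repo; it only assembles the same `atrim=start:end` arguments used in the cell above, so the resulting clip is `end - start` seconds long. The list it returns could be handed to `subprocess.run`.

```python
import shlex


def trim_command(audio_path, output_path, start_seconds, end_seconds):
    """Build the ffmpeg argument list for an atrim cut (hypothetical helper)."""
    if end_seconds <= start_seconds:
        raise ValueError("end_seconds must be greater than start_seconds")
    audio_filter = f"atrim={start_seconds}:{end_seconds}"
    return ["ffmpeg", "-i", audio_path, "-af", audio_filter, output_path]


cmd = trim_command("/content/IKnowU_all.wav", "/content/IKnowU_all-30s.wav", 30, 60)
clip_length = 60 - 30  # atrim keeps the audio between start and end: 30 seconds

# Print the command as a shell-safe string for inspection.
print(shlex.join(cmd))
```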
### Visualize Chromagraphs and Onsets

In [None]:
import librosa as rosa
import audioreactive as ar
import numpy as np

audiofile = "/content/IKnowU_all.wav"
melody_audiofile = "/content/IKnowU_melody.wav"
drums_audiofile = "/content/IKnowU_drums.wav"
bass_audiofile = "/content/IKnowU_bass.wav"
vox_audiofile = "/content/IKnowU_vox.wav"
duration = 120  # in seconds
fps = 30
n = int(round(duration * fps))  # number of video frames

# librosa's `duration` is measured in seconds, so pass `duration` rather than the frame count `n`
audio, sr = rosa.load(audiofile, offset=0, duration=duration)
melody, sr = rosa.load(melody_audiofile, offset=0, duration=duration)
drums, sr = rosa.load(drums_audiofile, offset=0, duration=duration)
bass, sr = rosa.load(bass_audiofile, offset=0, duration=duration)
vox, sr = rosa.load(vox_audiofile, offset=0, duration=duration)

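To make the relationship between `duration`, `fps`, and `n` concrete, here is a small arithmetic sketch. It assumes librosa's default sample rate of 22050 Hz (the same value hard-coded in the cells that follow); `samples_per_frame` is just an illustrative name, not something the repo defines.

```python
# Illustrative arithmetic only; no audio is loaded here.
duration = 120  # seconds of audio to analyze
fps = 30        # video frames per second
sr = 22050      # librosa's default sample rate when sr is not specified

n = int(round(duration * fps))  # number of video frames to render
samples_per_frame = sr / fps    # audio samples covered by each video frame

print(n)                  # 3600
print(samples_per_frame)  # 735.0
```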
The first thing we might want to do is look at the waveform of the audio. This might provide us with something like "volume" over time.

In [None]:
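%matplotlib inline
import matplotlib.pyplot as plt
import librosa.display  # make sure the display submodule is loaded before using rosa.display

plt.figure(figsize=(14, 5))
# plot the sampled signal (librosa 0.10+ removed waveplot; use rosa.display.waveshow there)
rosa.display.waveplot(audio, sr=sr)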

In [None]:
plt.figure(figsize=(14, 5))
rosa.display.waveplot(melody, sr=sr)

Hans’s article, however, states there are really two main ways we’ll want to use the audio track: chromagraph and onsets.

A chromagraph (more commonly called a chromagram) gives us the energy over time in each of the 12 pitch classes of the Western chromatic scale (C, C#, D, and so on up to B).

In [None]:
chroma = ar.chroma(audio, 22050, n)
print("chroma, all:")
ar.plot_spectra([chroma], chroma=True)

melody_chroma = ar.chroma(melody, 22050, n)
print("chroma, melody:")
ar.plot_spectra([melody_chroma], chroma=True)

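If "12 tones" feels abstract, this sketch shows how any frequency folds down to one of the 12 pitch classes, which is the idea behind each row of the chroma plot. It uses the standard MIDI convention (A4 = 440 Hz = note 69); `pitch_class` is my own illustrative helper and not how `ar.chroma` is implemented.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(frequency_hz):
    """Return the pitch-class name nearest to a frequency (A4 = 440 Hz)."""
    midi_note = 69 + 12 * math.log2(frequency_hz / 440.0)
    return PITCH_CLASSES[round(midi_note) % 12]


print(pitch_class(440.0))   # A
print(pitch_class(880.0))   # A (an octave higher, same pitch class)
print(pitch_class(261.63))  # C
```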
Onsets will show us the rhythms/spikes in audio changes. Onsets can also look at specific frequency ranges.

In [None]:
hi_onsets = ar.onsets(audio, 22050, n, fmin=150, smooth=3)
hi_onsets_sm = ar.onsets(audio, 22050, n, fmin=150, smooth=10)
lo_onsets = ar.onsets(audio, 22050, n, fmax=150, smooth=3)
lo_onsets_sm = ar.onsets(audio, 22050, n, fmax=150, smooth=20)


print("onsets:")
ar.plot_signals([hi_onsets, hi_onsets_sm, lo_onsets, lo_onsets_sm])

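To get a feel for why a larger `smooth` value gives a rounder, longer-lasting envelope, here is a toy version built with NumPy. This is not the algorithm inside `ar.onsets`; I'm assuming `smooth` acts roughly like a smoothing width, and `toy_onsets` simply takes rises in a loudness curve, blurs them with a Gaussian, and scales the result to 0..1.

```python
import numpy as np


def toy_onsets(energy, smooth):
    """Toy onset envelope: rises in energy, Gaussian-smoothed, scaled to 0..1."""
    rises = np.maximum(np.diff(energy, prepend=energy[0]), 0.0)
    radius = 3 * smooth
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / smooth) ** 2)
    kernel /= kernel.sum()
    smoothed = np.convolve(rises, kernel, mode="same")
    return smoothed / smoothed.max()


# A per-frame loudness curve with two sudden hits (a loud one and a quieter one).
energy = np.zeros(100)
energy[20:30] = 1.0
energy[60:70] = 0.5

sharp = toy_onsets(energy, smooth=1)
soft = toy_onsets(energy, smooth=5)

# Wider smoothing keeps the envelope above half its peak for more frames.
print((sharp > 0.5).sum(), (soft > 0.5).sum())
```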
In [None]:
hi_onsets = ar.onsets(drums, 22050, n, fmin=1000, smooth=3)
hi_onsets_sm = ar.onsets(drums, 22050, n, fmin=1000, smooth=10)
lo_onsets = ar.onsets(drums, 22050, n, fmax=250, smooth=3)
lo_onsets_sm = ar.onsets(drums, 22050, n, fmax=250, smooth=10)

bass_scaled = lo_onsets * 0.2

print("onsets:")
# ar.plot_signals([hi_onsets, hi_onsets_sm, lo_onsets, lo_onsets_sm])
ar.plot_signals([lo_onsets, bass_scaled])

In [None]:
bass_onsets = ar.onsets(bass, 22050, n, fmax=500, smooth=3)
bass_onsets_sm = ar.onsets(bass, 22050, n, fmax=500, smooth=20)

print("onsets:")
ar.plot_signals([bass_onsets, bass_onsets_sm])

In [None]:
vox_onsets = ar.onsets(vox, 22050, n, fmin=500, smooth=3)
vox_onsets_sm = ar.onsets(vox, 22050, n, fmin=500, smooth=20)

print("onsets:")
ar.plot_signals([vox_onsets, vox_onsets_sm])
print(vox_onsets.shape)

## Basic example

Let’s start by generating a video using the defaults in the notebook. This assumes you have a single audio track.

Let’s run `--help` to see what options we have.

In [None]:
!python generate_audiovisual.py --help

We can run a pretty basic video generation script below (swap out the path to your audio file after the `--audio_file` argument):

In [None]:
!python generate_audiovisual.py --ckpt "/content/freagan.pt" --audio_file "/content/IKnowU_all.wav" --output_dir '/content/output'

Here’s what my output looked like. You’ll notice it sometimes picks up the correct drum beats, but it’s somewhat inconsistent. This is likely due to the chromagraph being processed over a mixed audio file rather than separating out the individual instruments and processing them separately.

## Customize the output

This creates a pretty decent video, but we can do better. Now let’s look at customizing a couple of these functions.

In [None]:
!mkdir /content/maua-stylegan2/custom/
!cp /content/maua-stylegan2/audioreactive/examples/default.py /content/maua-stylegan2/custom/custom.py

Hans recommends running garbage collection every time we redo a visualization.

In [None]:
print("Time                     GPU        Used      Free")
!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader
import gc
import torch
gc.collect()
torch.cuda.empty_cache()
!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader

We’ll start by just building up our chromagraph so we have nice smooth latents that correspond to tones. Hans’s examples also add a layer of onsets. I made [an example video without (left) and with (right) onsets](https://drive.google.com/file/d/1_1G0y-hgHkX8zfsMxhtJmoV8sVUU8WZy/view?usp=sharing) so you can see the difference. Onsets help pick up small details in the track so we’ll include them below.

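One way to picture turning a chromagraph into latents: each of the 12 pitch classes gets its own latent vector, and every frame is a weighted mix of those 12 according to how strong each pitch class is at that moment. The NumPy sketch below shows that idea only. I haven't reproduced the repo's `ar.chroma_weight_latents` here, and the shapes (18 layers, 512 dimensions) are assumptions for a 1024px model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_layers, latent_dim = 4, 18, 512  # assumed shapes, for illustration

# One latent per pitch class (12 in total), each of shape (n_layers, latent_dim).
pitch_latents = rng.standard_normal((12, n_layers, latent_dim))

# Per-frame chroma weights: each row sums to 1 across the 12 pitch classes.
chroma = rng.random((n_frames, 12))
chroma /= chroma.sum(axis=1, keepdims=True)

# Weighted sum over the pitch-class axis gives one latent per frame.
frame_latents = np.einsum("fp,pld->fld", chroma, pitch_latents)
print(frame_latents.shape)  # (4, 18, 512)

# A frame whose chroma is entirely one pitch class reproduces that class's latent.
one_hot = np.zeros((1, 12))
one_hot[0, 9] = 1.0
only_a = np.einsum("fp,pld->fld", one_hot, pitch_latents)
```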
In [None]:
%%writefile /content/maua-stylegan2/custom/custom.py
import torch as th

import librosa as rosa
import audioreactive as ar

def initialize(args):
    # Use just the melody file so the vocals, bass, and drums don't interfere
    melody_audiofile = "/content/IKnowU_melody-60s.wav"
    args.melody, sr = rosa.load(melody_audiofile, offset=0, duration=args.n_frames)

    # melody onsets
    args.mel_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)
    args.mel_lo_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmax=250, smooth=3, power=2)

    return args


def get_latents(selection, args):
    chroma = ar.chroma(args.melody, args.sr, args.n_frames)
    chroma_latents = ar.chroma_weight_latents(chroma, selection)
    latents = ar.gaussian_filter(chroma_latents, 4)

    # we can use onsets to capture quick changes/stabs in the melody
    lo_onsets = args.mel_lo_onsets[:, None, None]
    hi_onsets = args.mel_hi_onsets[:, None, None]

    latents = hi_onsets * selection[[-6]] + (1 - hi_onsets) * latents
    latents = lo_onsets * selection[[-10]] + (1 - lo_onsets) * latents

    latents = ar.gaussian_filter(latents, 2, causal=0.2)

    return latents


def get_noise(height, width, scale, num_scales, args):
    # we'll look at noise later

    return None

def get_bends(args):
    bends = []
    return bends

def get_rewrites(args):
    rewrites = {}
    return rewrites

def get_truncation(args):
    #fixed truncation
    truncation = 0.7
    return truncation


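The two lines in `get_latents` that mix in `selection[[-6]]` and `selection[[-10]]` are a per-frame linear blend: when the onset value is 0 the frame keeps its chroma latent, and when it is 1 the frame jumps fully to the target latent. Here is the same pattern in NumPy with tiny stand-in arrays (the zeros and ones are placeholders, not real latents), mainly to show what `[:, None, None]` does to the shapes.

```python
import numpy as np

n_frames, n_layers, latent_dim = 5, 18, 512

latents = np.zeros((n_frames, n_layers, latent_dim))  # stand-in for the smoothed chroma latents
target = np.ones((1, n_layers, latent_dim))           # stand-in for selection[[-6]]
onsets = np.array([0.0, 0.25, 1.0, 0.5, 0.0])         # one value per frame, in 0..1

# [:, None, None] turns shape (5,) into (5, 1, 1) so it broadcasts over layers and dims.
weights = onsets[:, None, None]
blended = weights * target + (1 - weights) * latents

print(blended.shape)     # (5, 18, 512)
print(blended[:, 0, 0])  # follows the onset envelope frame by frame
```

The double brackets in `selection[[-6]]` matter for the same reason: they keep the leading dimension, so the target has shape `(1, layers, dims)` and broadcasts against every frame.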
Let’s run this. Using `--manual_seed` lets us make minor changes to our code while still getting the same vectors (or we can change that value and get totally new vectors).

In [None]:
!python generate_audiovisual.py \
--ckpt "/content/freagan.pt" \
--audio_file "/content/IKnowU_melody-60s.wav" \
--audioreactive_file "custom/custom.py" \
--output_dir '/content/output' \
--output_file "/content/output/chroma-melody-v4-onsets-deep.mp4" \
--out_size 1024 \
--manual_seed 0

[This video](https://drive.google.com/file/d/15-sM_QhTdxRvZVN1tF3VD7A91UV1j0yT/view?usp=sharing) will show the difference between our default example and choosing custom vectors

In [None]:
print("Time                     GPU        Used      Free")
!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader
import gc
import torch
gc.collect()
torch.cuda.empty_cache()
!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader

## Using custom vectors
This is nice, but what if we wanted to pick our own vectors?

(*See the bottom of this notebook for a way to generate images from seeds to make your choices*)

In [None]:
%%writefile /content/maua-stylegan2/custom/custom.py
import torch as th

import librosa as rosa
import audioreactive as ar
from models.stylegan2 import Generator

def initialize(args):
    melody_audiofile = "/content/IKnowU_melody-60s.wav"
    args.melody, sr = rosa.load(melody_audiofile, offset=0, duration=args.n_frames)

    # melody onsets
    args.mel_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)
    args.mel_lo_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmax=250, smooth=3, power=2)

    return args


def get_latents(selection, args):
    chroma = ar.chroma(args.melody, args.sr, args.n_frames)
    # selection gives us the randomized vectors. let's skip this and make our own
    # chroma_latents = ar.chroma_weight_latents(chroma, selection)

    generator = Generator(
        1024, 512, 8, channel_multiplier=2, constant_input=not False, checkpoint='/content/freagan.pt',
    ).cuda()

    # create custom vectors
    seeds = [176,44,53,60,76,92,140,13,21,7,10,42]
    saved = []
    for idx, i in enumerate(seeds):
        th.manual_seed(i)
        zs = th.randn((1, 512), device="cuda")
        saved.append(zs)
    zs = th.cat((saved),0)
    # convert zs to ws
    custom_vectors = generator(zs, map_latents=True).cpu()

    #back to our regularly scheduled program...
    chroma_latents = ar.chroma_weight_latents(chroma, custom_vectors) 

    latents = ar.gaussian_filter(chroma_latents, 4)

    lo_onsets = args.mel_lo_onsets[:, None, None]
    hi_onsets = args.mel_hi_onsets[:, None, None]

    latents = hi_onsets * custom_vectors[[-4]] + (1 - hi_onsets) * latents
    latents = lo_onsets * custom_vectors[[-7]] + (1 - lo_onsets) * latents

    latents = ar.gaussian_filter(latents, 2, causal=0.2)

    return latents


def get_noise(height, width, scale, num_scales, args):
    # we'll look at noise later
    return None

def get_bends(args):
    bends = []
    return bends

def get_rewrites(args):
    rewrites = {}
    return rewrites

def get_truncation(args):
    #fixed truncation
    truncation = 0.7
    return truncation


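The seed loop works because seeding the random number generator makes the draw repeatable: the same seed always yields the same z vector, so a list of 12 seeds pins down 12 specific images. The sketch below shows that property with NumPy as a stand-in for `th.manual_seed` plus `th.randn`. The actual numbers will differ from PyTorch's, and `z_from_seed` is a hypothetical helper for illustration.

```python
import numpy as np


def z_from_seed(seed, latent_dim=512):
    """Draw one reproducible z vector for a seed (NumPy stand-in for torch)."""
    return np.random.default_rng(seed).standard_normal((1, latent_dim))


seeds = [176, 44, 53, 60, 76, 92, 140, 13, 21, 7, 10, 42]
zs = np.concatenate([z_from_seed(s) for s in seeds], axis=0)

print(zs.shape)  # (12, 512)
```

Twelve seeds are used because the chromagraph has 12 pitch classes, one vector per class.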
In [None]:
!python generate_audiovisual.py \
--ckpt "/content/freagan.pt" \
--audio_file "/content/IKnowU_melody-60s.wav" \
--audioreactive_file "custom/custom.py" \
--output_dir '/content/output' \
--output_file "/content/output/custom-vectors.mp4" \
--out_size 1024 \
--manual_seed 0 #this is technically unnecessary now

### Example

[Here’s where we stand.](https://drive.google.com/file/d/1-W9p6E2VSiLRONSEQLCtWyvZIIYUr4ze/view?usp=sharing) On the left are the default vectors; on the right are the vectors I picked.

### Aside: an alternative to hard-coding this directly into the custom script
Alternatively, you can pass the script a `.npy` file containing the vectors you want to use. In fact, a nice little feature of this tool is that your last used vectors are saved to `/content/maua-stylegan2/workspace/last-latents.npy` (well, not the custom vectors above, because we’re hacking around it).

(Note: if you run this make sure you remove/comment out the code in the cell above that generates custom vectors)

In [None]:
!python generate_audiovisual.py \
--ckpt "/content/freagan.pt" \
--audio_file "/content/IKnowU_melody-60s.wav" \
--audioreactive_file "custom/custom.py" \
--output_dir '/content/output' \
--output_file "/content/output/custom-vectors.mp4" \
--out_size 1024 \
--latent_file '/content/maua-stylegan2/workspace/last-latents.npy' # or another custom file with 12 vectors

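If you want to build your own `.npy` file, the NumPy side is simple. This sketch saves and reloads an array of 12 vectors. The `(12, 18, 512)` shape and the file name are assumptions on my part, so load `last-latents.npy` and check its `.shape` first, then match it.

```python
import os
import tempfile

import numpy as np

# Assumed layout: 12 latents, 18 layers, 512 dimensions (not confirmed against the repo).
vectors = np.random.default_rng(0).standard_normal((12, 18, 512)).astype(np.float32)

path = os.path.join(tempfile.mkdtemp(), "my-latents.npy")  # hypothetical file name
np.save(path, vectors)

loaded = np.load(path)
print(loaded.shape, loaded.dtype)  # (12, 18, 512) float32
```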
In [None]:
print("Time                     GPU        Used      Free")
!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader
import gc
import torch
gc.collect()
torch.cuda.empty_cache()
!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader


## I want to feel/see the bass 

Ok, we’ve got a nice little melody/interpolation going here now. But we can do a lot more. My song has a lot of bass in it, so I want to emphasize that. I think about when that bass drops, and I think about truncation. So let’s match the onsets of some bass to truncation levels 

In [None]:
%%writefile /content/maua-stylegan2/custom/custom.py
import torch as th

import librosa as rosa
import audioreactive as ar
from models.stylegan2 import Generator

def initialize(args):
    melody_audiofile = "/content/IKnowU_melody-60s.wav"
    bass_audiofile = "/content/IKnowU_bass-60s.wav"
    args.melody, sr = rosa.load(melody_audiofile, offset=0, duration=args.n_frames)
    args.bass, sr = rosa.load(bass_audiofile, offset=0, duration=args.n_frames)

    # melody onsets
    args.mel_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)
    args.mel_lo_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmax=250, smooth=3, power=2)

    # bass onsets
    # I only really need one here
    # args.bass_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)
    args.bass_onsets = ar.onsets(args.bass, 22050, args.n_frames, fmax=250, smooth=10)

    return args


def get_latents(selection, args):
    chroma = ar.chroma(args.melody, args.sr, args.n_frames)
    # selection gives us the randomized vectors. let's skip this and make our own
    # chroma_latents = ar.chroma_weight_latents(chroma, selection)

    generator = Generator(
        1024, 512, 8, channel_multiplier=2, constant_input=not False, checkpoint='/content/freagan.pt',
    ).cuda()

    # create custom vectors
    seeds = [176,44,53,60,76,92,140,13,21,7,10,42]
    saved = []
    for idx, i in enumerate(seeds):
        th.manual_seed(i)
        zs = th.randn((1, 512), device="cuda")
        saved.append(zs)
    zs = th.cat((saved),0)
    # convert zs to ws
    custom_vectors = generator(zs, map_latents=True).cpu()

    #back to our regularly scheduled program...
    chroma_latents = ar.chroma_weight_latents(chroma, custom_vectors) 

    latents = ar.gaussian_filter(chroma_latents, 4)

    lo_onsets = args.mel_lo_onsets[:, None, None]
    hi_onsets = args.mel_hi_onsets[:, None, None]

    latents = hi_onsets * custom_vectors[[-4]] + (1 - hi_onsets) * latents
    latents = lo_onsets * custom_vectors[[-7]] + (1 - lo_onsets) * latents

    latents = ar.gaussian_filter(latents, 2, causal=0.2)

    return latents


def get_noise(height, width, scale, num_scales, args):
    # we'll look at noise later
    # if width > 256:
    #     return None

    # lo_onsets = args.lo_onsets[:, None, None, None].cuda()
    # hi_onsets = args.hi_onsets[:, None, None, None].cuda()

    # noise_noisy = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device="cuda"), 5)
    # noise = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device="cuda"), 128)

    # if width < 128:
    #     noise = lo_onsets * noise_noisy + (1 - lo_onsets) * noise
    # if width > 32:
    #     noise = hi_onsets * noise_noisy + (1 - hi_onsets) * noise

    # noise /= noise.std() * 2.5

    return None

def get_bends(args):
    bends = []
    return bends

def get_rewrites(args):
    rewrites = {}
    return rewrites

def get_truncation(args):
    # let's use truncation to emphasize bass
    # onsets range from 0.0 to 1.0, so let's map our truncation from 0.7 to 2.2
    truncation = args.bass_onsets*1.5 + 0.7
    return truncation

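The truncation line is a plain linear remap from the onset range (0 to 1) onto a truncation range (0.7 to 2.2). Writing it as a small function makes it easier to try other ranges. `remap` is my own illustrative helper, not part of the repo.

```python
def remap(value, low, high):
    """Linearly map a value in 0..1 onto the range low..high."""
    return low + value * (high - low)


# Same mapping as `args.bass_onsets * 1.5 + 0.7`, written as a range.
for onset in (0.0, 0.5, 1.0):
    print(onset, round(remap(onset, 0.7, 2.2), 2))
```

With no bass the truncation rests at 0.7. Values above 1.0 push the latent further from the average than it started, which is why heavy bass hits look more extreme.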

Let’s run this and check out our results.

(Another note: I’ve combined my melody and bass tracks into one file with ffmpeg. I’m referencing that in `--audio_file` so it’s used in the final render, but I’m using the individual tracks to generate my chromagraph and onsets.)

In [None]:
!python generate_audiovisual.py \
--ckpt "/content/freagan.pt" \
--audio_file "/content/IKnowU_bass+melody-60s.wav" \
--audioreactive_file "custom/custom.py" \
--output_dir '/content/output' \
--output_file "/content/output/bass-truncation.mp4" \
--out_size 1024

### Example

[Here’s a video](https://drive.google.com/file/d/1tq3tKkV-FAlE2vl6-s0vFgaHvCYMhGwb/view?usp=sharing) showing the bass influencing the truncation. It’s fairly subtle, but if you watch toward the end you will see some differences.

In [None]:
print("Time                     GPU        Used      Free")
!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader
import gc
import torch
gc.collect()
torch.cuda.empty_cache()
!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader

## Mapping Drums
Now we have all of our melodic reactions in, let’s look at how to use drums. You could do a lot with all the various drum pieces, but I’ll do two things. Let’s start by using them to influence the noise.

Noise in StyleGAN can change a lot from model to model. In most cases it changes the grain of images (or the hair textures in something like FFHQ) but sometimes it can affect more important details. You might want to test it a bit on your model (one trick is to hold the latent vectors and truncation fixed and only apply noise to your video to see what happens).

[This video](https://drive.google.com/file/d/1sQksgVtgN7bJX8cjaYMCPamM8eVLpocb/view?usp=sharing) shows what audioreactive noise looks like when it is isolated without any other effects.

In [None]:
%%writefile /content/maua-stylegan2/custom/custom.py
import torch as th

import librosa as rosa
import audioreactive as ar
from models.stylegan2 import Generator

def initialize(args):
    melody_audiofile = "/content/IKnowU_melody-60s.wav"
    bass_audiofile = "/content/IKnowU_bass-60s.wav"
    drums_audiofile = "/content/IKnowU_drums-60s.wav"
    all_audiofile = "/content/IKnowU_all-60s.wav"
    args.melody, sr = rosa.load(melody_audiofile, offset=0, duration=args.n_frames)
    args.bass, sr = rosa.load(bass_audiofile, offset=0, duration=args.n_frames)
    args.drums, sr = rosa.load(drums_audiofile, offset=0, duration=args.n_frames)
    args.all, sr = rosa.load(all_audiofile, offset=0, duration=args.n_frames)

    # melody onsets
    args.mel_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)
    args.mel_lo_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmax=250, smooth=3, power=2)

    # bass onsets
    # I only really need one here
    # args.bass_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)
    args.bass_onsets = ar.onsets(args.bass, 22050, args.n_frames, fmax=250, smooth=10)

    # drums onsets
    # hi uses fmin (hi-hats, snares) and lo uses fmax (kick) so the two envelopes differ;
    # the cutoffs match the exploration cells earlier in the notebook
    args.drums_hi_onsets = ar.onsets(args.drums, 22050, args.n_frames, fmin=1000, smooth=5, clip=97, power=2)
    args.drums_lo_onsets = ar.onsets(args.drums, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)

    # all onsets
    args.all_hi_onsets = ar.onsets(args.all, 22050, args.n_frames, fmin=150, smooth=5, clip=97, power=2)
    args.all_lo_onsets = ar.onsets(args.all, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)

    return args


def get_latents(selection, args):
    # to isolate noise, uncomment the following line and comment everything else in this function out (except the return line)
    # latents = selection[0].repeat(args.n_frames,1,1)

    chroma = ar.chroma(args.melody, args.sr, args.n_frames)
    generator = Generator(
        1024, 512, 8, channel_multiplier=2, constant_input=not False, checkpoint='/content/freagan.pt',
    ).cuda()

    # create custom vectors
    seeds = [13,21,7,10,42,176,44,53,60,76,92,140]
    saved = []
    for idx, i in enumerate(seeds):
        th.manual_seed(i)
        zs = th.randn((1, 512), device="cuda")
        saved.append(zs)
    zs = th.cat((saved),0)
    # convert zs to ws
    custom_vectors = generator(zs, map_latents=True).cpu()
    chroma_latents = ar.chroma_weight_latents(chroma, custom_vectors) 

    latents = ar.gaussian_filter(chroma_latents, 4)

    lo_onsets = args.mel_lo_onsets[:, None, None]
    hi_onsets = args.mel_hi_onsets[:, None, None]

    latents = hi_onsets * custom_vectors[[-4]] + (1 - hi_onsets) * latents
    latents = lo_onsets * custom_vectors[[-7]] + (1 - lo_onsets) * latents

    latents = ar.gaussian_filter(latents, 2, causal=0.2)

    return latents


def get_noise(height, width, scale, num_scales, args):
    if width > 512:
        return None

    lo_onsets = args.drums_lo_onsets[:, None, None, None].cuda()
    hi_onsets = args.drums_hi_onsets[:, None, None, None].cuda()

    noise_noisy = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device="cuda"), 5)
    noise = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device="cuda"), 128)

    if width < 128:
        noise = lo_onsets * noise_noisy + (1 - lo_onsets) * noise
    if width > 32:
        noise = hi_onsets * noise_noisy + (1 - hi_onsets) * noise

    noise /= noise.std() * 2.5

    return noise.cpu()

def get_bends(args):
    bends = []
    return bends

def get_rewrites(args):
    rewrites = {}
    return rewrites

def get_truncation(args):
    # to isolate noise, uncomment the following line and comment everything else out (except the return line)
    # truncation = 0.7
    
    # onsets range from 0.0 to 1.0, so let's map our truncation from 0.7 to 2.2
    truncation = args.bass_onsets*1.5 + 0.7
    return truncation

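The `width` checks in `get_noise` decide which noise maps react to which envelope. This sketch replays just those conditions for each noise-map size in a 1024px model. `noise_sources` is a hypothetical helper, and the list of widths is my assumption about which sizes get requested.

```python
def noise_sources(width):
    """Return which onset envelopes modulate the noise map of a given width."""
    if width > 512:
        return None  # get_noise returns None here, so no custom noise at this size
    sources = []
    if width < 128:
        sources.append("lo")
    if width > 32:
        sources.append("hi")
    return sources


for width in (4, 8, 16, 32, 64, 128, 256, 512, 1024):
    print(width, noise_sources(width))
```

So the small maps (coarse structure) follow the low onsets, the large maps (fine grain) follow the high onsets, and the 64px map gets both. The final `noise /= noise.std() * 2.5` leaves the noise with a standard deviation of 1 / 2.5 = 0.4, which keeps it from overpowering the image.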

I think at this point we can bring in a section of the whole song to see what it looks like.

In [None]:
!python generate_audiovisual.py \
--ckpt "/content/freagan.pt" \
--audio_file "/content/IKnowU_all-60s.wav" \
--audioreactive_file "custom/custom.py" \
--output_dir '/content/output' \
--output_file "/content/output/plus-noise.mp4" \
--out_size 1024

### Example

[This video](https://drive.google.com/file/d/15-sM_QhTdxRvZVN1tF3VD7A91UV1j0yT/view?usp=sharing) will compare default noise (left) with audioreactive noise (right). You’ll notice there is still noise in the default video. That’s because StyleGAN always includes some noise in its model. But in the custom video we can control it and use it when desired.

In [None]:
print("Time                     GPU        Used      Free")
!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader
import gc
import torch
gc.collect()
torch.cuda.empty_cache()
!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader

## Network Bends
I like the drums, but I kinda want them to do a little something more. Let’s look at making the kick and snare do a little zoom into the frame using a network bend.

Hans has three built-in bend options: Zoom, Translate, and Rotate. They’re a little finicky, so I’ve only implemented Zoom here.

In [None]:
%%writefile /content/maua-stylegan2/custom/custom.py
from functools import partial

import torch as th

import librosa as rosa
import audioreactive as ar
from models.stylegan2 import Generator

def initialize(args):
    melody_audiofile = "/content/IKnowU_melody.wav"
    bass_audiofile = "/content/IKnowU_bass.wav"
    drums_audiofile = "/content/IKnowU_drums.wav"
    all_audiofile = "/content/IKnowU_all.wav"
    args.melody, sr = rosa.load(melody_audiofile, offset=0, duration=args.n_frames)
    args.bass, sr = rosa.load(bass_audiofile, offset=0, duration=args.n_frames)
    args.drums, sr = rosa.load(drums_audiofile, offset=0, duration=args.n_frames)
    args.all, sr = rosa.load(all_audiofile, offset=0, duration=args.n_frames)

    # melody onsets
    args.mel_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)
    args.mel_lo_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmax=250, smooth=3, power=2)

    # bass onsets
    # I only really need one here
    # args.bass_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)
    args.bass_onsets = ar.onsets(args.bass, 22050, args.n_frames, fmax=250, smooth=10)

    # drums onsets
    # hi uses fmin (hi-hats, snares) and lo uses fmax (kick) so the two envelopes differ;
    # the cutoffs match the exploration cells earlier in the notebook
    args.drums_hi_onsets = ar.onsets(args.drums, 22050, args.n_frames, fmin=1000, smooth=5, clip=97, power=2)
    args.drums_lo_onsets = ar.onsets(args.drums, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)

    # all onsets
    args.all_hi_onsets = ar.onsets(args.all, 22050, args.n_frames, fmin=150, smooth=5, clip=97, power=2)
    args.all_lo_onsets = ar.onsets(args.all, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)

    return args


def get_latents(selection, args):

    chroma = ar.chroma(args.melody, args.sr, args.n_frames)
    generator = Generator(
        1024, 512, 8, channel_multiplier=2, constant_input=not False, checkpoint='/content/freagan.pt',
    ).cuda()

    # create custom vectors
    seeds = [176,44,53,60,76,92,140,13,21,7,10,42]
    saved = []
    for idx, i in enumerate(seeds):
        th.manual_seed(i)
        zs = th.randn((1, 512), device="cuda")
        saved.append(zs)
    zs = th.cat((saved),0)
    # convert zs to ws
    custom_vectors = generator(zs, map_latents=True).cpu()
    chroma_latents = ar.chroma_weight_latents(chroma, custom_vectors) 

    latents = ar.gaussian_filter(chroma_latents, 4)

    lo_onsets = args.mel_lo_onsets[:, None, None]
    hi_onsets = args.mel_hi_onsets[:, None, None]

    latents = hi_onsets * custom_vectors[[-4]] + (1 - hi_onsets) * latents
    latents = lo_onsets * custom_vectors[[-7]] + (1 - lo_onsets) * latents

    latents = ar.gaussian_filter(latents, 2, causal=0.2)

    return latents


def get_noise(height, width, scale, num_scales, args):
    if width > 256:
        return None

    lo_onsets = args.drums_lo_onsets[:, None, None, None].cuda()
    hi_onsets = args.drums_hi_onsets[:, None, None, None].cuda()

    noise_noisy = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device="cuda"), 5)
    noise = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device="cuda"), 128)

    if width < 128:
        noise = lo_onsets * noise_noisy + (1 - lo_onsets) * noise
    if width > 32:
        noise = hi_onsets * noise_noisy + (1 - hi_onsets) * noise

    noise /= noise.std() * 2.5

    return noise.cpu()

def get_bends(args):

    # transform = th.nn.Sequential(
    #     th.nn.ReplicationPad2d((2, 2, 2, 2)), ar.AddNoise(0.025 * th.randn(size=(1, 1, 4, 8), device="cuda")),
    # )
    # bends = [{"layer": 0, "transform": transform}]

    # both envelopes use the lo onsets, so only the isolated kick drum drives the zoom
    hi_onsets = args.drums_lo_onsets
    lo_onsets = args.drums_lo_onsets
    scaled_onsets = (lo_onsets * 0.75) + 1.0 # switch from range of (0.0, 1.0) to (1.0, 1.75)
    scaled_onsets += (hi_onsets * 0.75) # add the second envelope for extra punch; the total range is now (1.0, 2.5)

    # don't worry about this too much but we want to apply this bend to a low layer of the stylegan model
    # apply network bending to second layer in StyleGAN
    # lower layer network bends have more fluid outcomes
    tl = 4
    h = 2 ** tl
    w = h

    translation = scaled_onsets.unsqueeze(1)
    transform = lambda batch: partial(ar.Zoom, h=h, w=w)(batch)
    bends = [{"layer": tl, "transform": transform, "modulation": translation}]  # add network bend to list dict

    return bends

def get_rewrites(args):
    rewrites = {}
    return rewrites

def get_truncation(args):    
    # onsets range from 0.0 to 1.0, so let's map our truncation from 0.7 to 2.2
    truncation = (args.bass_onsets * 1.5) + 0.7
    return truncation

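It helps to check what zoom values `get_bends` actually produces. The sketch below repeats the arithmetic on single onset values. `zoom_scale` is an illustrative helper; only the 0.75 multipliers, the `+ 1.0` offset, and `tl = 4` come from the cell above.

```python
def zoom_scale(lo_onset, hi_onset):
    """Combine two onset values (each 0..1) into a zoom factor, as in get_bends."""
    scale = lo_onset * 0.75 + 1.0
    scale += hi_onset * 0.75
    return scale


tl = 4
h = w = 2 ** tl  # 16: the feature-map size passed to ar.Zoom

print(zoom_scale(0.0, 0.0))  # 1.0, no zoom between hits
print(zoom_scale(1.0, 0.0))  # 1.75
print(zoom_scale(1.0, 1.0))  # 2.5, both envelopes peaking together
```

Because the cell above points both `hi_onsets` and `lo_onsets` at `args.drums_lo_onsets`, the two always peak together, so the zoom swings between 1.0 and 2.5 on each kick. Lower the 0.75 multipliers if that feels too strong.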

At this point I’m going to render the full song. You may find that the section of audio you rendered previously looks slightly different now. That’s an indication that the rest of the audio has additional peaks that weren’t there when you processed the onsets on the shorter clip. You may need to go back and edit some of your previous settings if so.

In [None]:
!python generate_audiovisual.py \
--ckpt "/content/freagan.pt" \
--audio_file "/content/IKnowU_all.wav" \
--audioreactive_file "custom/custom.py" \
--output_dir '/content/output' \
--output_file "/content/output/add-bends.mp4" \
--out_size 1024

### Example
[Video example is here](https://drive.google.com/file/d/1guHgynifzYeLmu-FkHxICtDdp-tvUEiC/view?usp=sharing). One interesting effect of using the whole song is that the bass kicks early in the song don’t have as much effect as they do later.

In [None]:
print("Time                     GPU        Used      Free")
!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader
import gc
import torch
gc.collect()
torch.cuda.empty_cache()
!nvidia-smi --query-gpu=timestamp,name,memory.used,memory.free --format=csv,noheader

## Using Feature Vectors

Ok this is looking really good! We have this vocal file, so we might as well use it. We’ve previously covered feature vectors, so I’m going to have the vocals activate a single feature vector.

(*See more in Appendix 2 at the bottom about trying out feature vectors*)
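
Activating a feature vector with the vocals boils down to this: for every frame, push the latent along one direction by an amount proportional to that frame’s vocal onset. Below is a rough sketch in plain Python, with tiny 2-value latents standing in for the real tensors. The `activate_direction` helper and all of the numbers are made up for illustration; the cell that follows does the same thing with tensors of shape `[n_frames, 18, 512]`.

```python
def activate_direction(latents, onsets, direction, strength=10.0):
    """Add `strength * onset * direction` to each frame's latent vector."""
    return [
        [w + strength * onset * d for w, d in zip(frame, direction)]
        for frame, onset in zip(latents, onsets)
    ]

# Three frames that would otherwise show the same image.
latents = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
# Made-up vocal onsets: silent, half strength, full strength.
onsets = [0.0, 0.5, 1.0]
# A made-up feature direction.
direction = [1.0, -1.0]

moved = activate_direction(latents, onsets, direction, strength=2.0)
for frame in moved:
    print(frame)
```

When the vocals are silent the latent is untouched, and the louder the onset, the further the frame moves along the direction. The `strength` value sets how far a full-strength onset pushes it.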


In [None]:
%%writefile /content/maua-stylegan2/custom/custom.py
from functools import partial

import torch as th

import librosa as rosa
import audioreactive as ar
from models.stylegan2 import Generator

def initialize(args):
    melody_audiofile = "/content/IKnowU_melody.wav"
    bass_audiofile = "/content/IKnowU_bass.wav"
    drums_audiofile = "/content/IKnowU_drums.wav"
    all_audiofile = "/content/IKnowU_all.wav"
    vox_audiofile = "/content/IKnowU_vox.wav"
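    # note: librosa's `duration` is in seconds, while n_frames is a frame count;
    # at more than 1 fps that is longer than the audio, so each file is loaded in full
    args.melody, sr = rosa.load(melody_audiofile, offset=0, duration=args.n_frames)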
    args.bass, sr = rosa.load(bass_audiofile, offset=0, duration=args.n_frames)
    args.drums, sr = rosa.load(drums_audiofile, offset=0, duration=args.n_frames)
    args.all, sr = rosa.load(all_audiofile, offset=0, duration=args.n_frames)
    args.vox, sr = rosa.load(vox_audiofile, offset=0, duration=args.n_frames)

    # melody onsets
    args.mel_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)
    args.mel_lo_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmax=250, smooth=3, power=2)

    # bass onsets
    # I only really need one here
    # args.bass_hi_onsets = ar.onsets(args.melody, 22050, args.n_frames, fmin=250, smooth=3, power=2)
    args.bass_onsets = ar.onsets(args.bass, 22050, args.n_frames, fmax=250, smooth=10)

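    # drums onsets
    # note: hi and lo use identical settings here (fmax=150), so both follow the low end;
    # give the hi onsets an fmin instead if you want them to follow snares and hats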
    args.drums_hi_onsets = ar.onsets(args.drums, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)
    args.drums_lo_onsets = ar.onsets(args.drums, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)

    # all onsets
    args.all_hi_onsets = ar.onsets(args.all, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)
    args.all_lo_onsets = ar.onsets(args.all, 22050, args.n_frames, fmax=150, smooth=5, clip=97, power=2)

    # vox onsets
    # let's skip the bass onsets
    args.vox_hi_onsets = ar.onsets(args.vox, 22050, args.n_frames, fmin=250, smooth=3)
    print(args.vox_hi_onsets.shape)

    return args


def get_latents(selection, args):

    chroma = ar.chroma(args.melody, args.sr, args.n_frames)
    generator = Generator(
        1024, 512, 8, channel_multiplier=2, constant_input=True, checkpoint='/content/freagan.pt',
    ).cuda()

    # create custom vectors
    seeds = [176,44,53,60,76,92,140,13,21,7,10,42]
    saved = []
    for seed in seeds:
        # seed the generator so each z is reproducible
        th.manual_seed(seed)
        saved.append(th.randn((1, 512), device="cuda"))
    zs = th.cat(saved, 0)
    # convert zs to ws
    custom_vectors = generator(zs, map_latents=True).cpu()
    chroma_latents = ar.chroma_weight_latents(chroma, custom_vectors) 

    latents = ar.gaussian_filter(chroma_latents, 4)

    lo_onsets = args.mel_lo_onsets[:, None, None]
    hi_onsets = args.mel_hi_onsets[:, None, None]

    latents = hi_onsets * custom_vectors[[-4]] + (1 - hi_onsets) * latents
    latents = lo_onsets * custom_vectors[[-7]] + (1 - lo_onsets) * latents

    latents = ar.gaussian_filter(latents, 2, causal=0.2)

    # read in feature vector file
    eigvec = th.load('/content/freagan-factor.pt')["eigvec"].to("cuda")
    # I chose vector 17 after looking thru some feature vectors
    direction = eigvec[:, 17].unsqueeze(0)

    # convert direction to w vector
    direction_w = generator(direction, map_latents=True).repeat(args.n_frames,1,1).cpu() # [1,18,512] to [n_frames,18,512]
    
    # 10 is the strength of the effect the vector will have, you can make it smaller or larger
    vox_hi_onsets = args.vox_hi_onsets[:, None, None]
    print(vox_hi_onsets.shape)
    dist = 10 * vox_hi_onsets * direction_w 

    latents += dist

    return latents


def get_noise(height, width, scale, num_scales, args):
    if width > 256:
        return None

    lo_onsets = args.drums_lo_onsets[:, None, None, None].cuda()
    hi_onsets = args.drums_hi_onsets[:, None, None, None].cuda()

    noise_noisy = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device="cuda"), 5)
    noise = ar.gaussian_filter(th.randn((args.n_frames, 1, height, width), device="cuda"), 128)

    if width < 128:
        noise = lo_onsets * noise_noisy + (1 - lo_onsets) * noise
    if width > 32:
        noise = hi_onsets * noise_noisy + (1 - hi_onsets) * noise

    noise /= noise.std() * 2.5

    return noise.cpu()

def get_bends(args):

    # transform = th.nn.Sequential(
    #     th.nn.ReplicationPad2d((2, 2, 2, 2)), ar.AddNoise(0.025 * th.randn(size=(1, 1, 4, 8), device="cuda")),
    # )
    # bends = [{"layer": 0, "transform": transform}]

    # we'll use just the lo onsets so only the isolated kick drum drives the zoom
    hi_onsets = args.drums_lo_onsets  # the lo onsets again, so the kick is counted twice
    lo_onsets = args.drums_lo_onsets
    scaled_onsets = (lo_onsets * 0.75) + 1.0  # switch from range of (0.0, 1.0) to (1.0, 1.75)
    scaled_onsets += (hi_onsets * 0.75)  # a little more punch; combined range is (1.0, 2.5)

    # don't worry about this too much but we want to apply this bend to a low layer of the stylegan model
    # apply network bending to second layer in StyleGAN
    # lower layer network bends have more fluid outcomes
    tl = 4
    h = 2 ** tl
    w = h

    translation = scaled_onsets.unsqueeze(1)
    transform = lambda batch: partial(ar.Zoom, h=h, w=w)(batch)
    bends = [{"layer": tl, "transform": transform, "modulation": translation}]  # add network bend to list dict
    
    return bends

def get_rewrites(args):
    rewrites = {}
    return rewrites

def get_truncation(args):    
    # onsets range from 0.0 to 1.0, so let's map our truncation from 0.7 to 2.2
    truncation = (args.bass_onsets * 1.5) + 0.7

    return truncation
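
The scaling in `get_bends` and `get_truncation` above is the same linear remap each time: multiply the 0.0 to 1.0 onsets by a range and add an offset. Here’s a minimal sketch in plain Python. The `scale_onsets` helper and the sample onsets are made up for illustration; the real code does this directly on tensors.

```python
def scale_onsets(onsets, scale, offset):
    """Map onsets in 0.0..1.0 onto offset..(offset + scale)."""
    return [v * scale + offset for v in onsets]

onsets = [0.0, 0.5, 1.0]  # made-up onset values

# get_truncation: 0.0..1.0 becomes roughly 0.7..2.2
truncation = scale_onsets(onsets, 1.5, 0.7)

# get_bends, first step: 0.0..1.0 becomes 1.0..1.75
zoom = scale_onsets(onsets, 0.75, 1.0)

# get_bends, second step: another onset track is added on top,
# so the zoom can peak at 1.0 + 0.75 + 0.75 = 2.5
zoom_total = [z + v * 0.75 for z, v in zip(zoom, onsets)]

print(truncation)
print(zoom)
print(zoom_total)
```

If an effect feels too strong or too weak, the scale is the number to change. The offset sets what the effect looks like when the audio is silent.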


## Final Render
Let’s do the final render. If you know this is your final video, I recommend using `--ffmpeg_preset veryslow`. This will take a little longer to render but will give you a better-quality video and a smaller file size.

In [None]:
!python generate_audiovisual.py \
--ckpt "/content/freagan.pt" \
--audio_file "/content/IKnowU_all.wav" \
--audioreactive_file "custom/custom.py" \
--output_dir '/content/output' \
--output_file "/content/output/add-feature-vectors.mp4" \
--ffmpeg_preset veryslow \
--out_size 1024

### Example

[Here’s the last step of our video.](https://drive.google.com/file/d/1BwHWovSi6V_MPe55JHOm8HVG6goT_XDz/view?usp=sharing)

### Final Comparison

If you want to see the difference between the default video and the custom version you can [see the final output on my Vimeo channel](https://vimeo.com/560688589).

# Appendix

Some tools not completely related to the audioreactive tool, but helpful for various components of it.

## Appendix 1: Generating Seed Images for Testing

It’s possible that a seed chosen in the ADA repo will generate the same image in this repo, but it’s safer to test your seeds in this repo directly. The code below will generate an image for each seed so you can pick the ones to use elsewhere in this notebook.
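
Why might a seed not carry over? A seed only pins down the output of one particular random number generator. As far as I know the ADA repo draws its latents with NumPy, while the cell below uses torch on the GPU, so the same seed number can give a different `z`. Here’s a stand-in demonstration using two generators from Python’s standard library. These are not the actual NumPy or torch generators, and both helper functions are made up for illustration.

```python
import random

def latent_gauss(seed, size=4):
    """Draw a small 'latent' with random.gauss."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

def latent_normalvariate(seed, size=4):
    """Draw a small 'latent' with random.normalvariate (a different algorithm)."""
    rng = random.Random(seed)
    return [rng.normalvariate(0.0, 1.0) for _ in range(size)]

# Same seed, same generator: identical values every time.
print(latent_gauss(42) == latent_gauss(42))

# Same seed, different generator: the values no longer match.
print(latent_gauss(42) == latent_normalvariate(42))
```

A seed number is only meaningful together with the code that turned it into a latent.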

In [None]:
# set seeds here
seeds = range(0,200)
truncation = 0.5
size = 1024 # edit this if your model is smaller than 1024

#you probably don't need to edit anything below here
import os
import torch as th
from torchvision import utils
from models.stylegan2 import Generator

os.makedirs('/content/seeds',exist_ok=True)

generator = Generator(
    size, 512, 8, channel_multiplier=2, constant_input=True, checkpoint='/content/freagan.pt',
).cuda()

if truncation < 1:
    with th.no_grad():
        mean_latent = generator.mean_latent(4096)
else:
    mean_latent = None

with th.no_grad():
    generator.eval()
    for idx, i in enumerate(seeds):
        th.manual_seed(i)
        sample_z = th.randn((1,512), device="cuda")

        sample, _ = generator(
            [sample_z], truncation=truncation, truncation_latent=mean_latent
        )

        utils.save_image(
            sample,
            f"/content/seeds/{str(i).zfill(6)}.jpg",
            nrow=1,
            normalize=True,
            value_range=(-1, 1),
        )

Best thing to do is probably zip up your images and download them.

In [None]:
!zip -r /content/seeds.zip /content/seeds/

If you want to redo this with different seeds, run the cell below to delete the previous folder.

In [None]:
!rm -r /content/seeds

## Appendix 2: Generating Feature Vectors
You can generate feature vectors directly from the rosinality repo. 

Below we’ll install it.

In [None]:
%cd /content/
!git clone https://github.com/dvschultz/stylegan2-pytorch
%cd /content/stylegan2-pytorch

Now we can compute the feature vectors from the model with the following script:

In [None]:
!python closed_form_factorization.py /content/freagan.pt --out /content/freagan-factor.pt

Now we can render preview images for our feature vectors. Because features can operate differently on different images, I highly recommend you choose your seeds using Appendix 1 before doing this step, then use those seeds in the first variable below.

Then set your feature vectors in `vecs`. You could choose `range(0,512)` to get all 512 of them, but be warned that’s going to be a lot of images to sift through.
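
To get a feel for how many images that is before you commit, here’s a quick count. It assumes one output file per seed and vector pair, which is how the cell below names its files; the `count_outputs` helper is made up for illustration.

```python
seeds = [176, 44, 53, 60, 76, 92, 140, 13, 21, 7, 10, 42]

def count_outputs(seeds, vecs):
    """One image file is written for every (seed, vector) pair."""
    return len(seeds) * len(vecs)

# The first 50 vectors.
print(count_outputs(seeds, range(0, 50)))

# Every vector: indexes 0 through 511 (range's end value is exclusive).
print(count_outputs(seeds, range(0, 512)))
```

With these 12 seeds that is 600 files for the first 50 vectors and 6144 files for all of them.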

In [None]:
# set seeds, vectors, and truncation
seeds = [176,44,53,60,76,92,140,13,21,7,10,42]
vecs = range(0,50)
truncation = 0.5
model_file = '/content/freagan.pt'
feature_file = '/content/freagan-factor.pt'

# you probably don't need to edit anything below here
import os
import torch
from torchvision import utils
from model import Generator

os.makedirs('/content/vectors',exist_ok=True)


def line_interpolate(zs, steps):
    # linearly interpolate between consecutive latents, `steps` frames per pair
    out = []
    for i in range(len(zs) - 1):
        for index in range(steps):
            fraction = index / float(steps)
            out.append(zs[i + 1] * fraction + zs[i] * (1 - fraction))
    return out

eigvec = torch.load(feature_file)["eigvec"].to("cuda")
ckpt = torch.load(model_file)
g = Generator(1024, 512, 8, channel_multiplier=2).to("cuda")
g.load_state_dict(ckpt["g_ema"], strict=False)

trunc = g.mean_latent(4096)

for idx, i in enumerate(seeds):
    torch.manual_seed(i)

    latent = torch.randn(1, 512, device="cuda")
    latent = g.get_latent(latent)

    for v_idx, v in enumerate(vecs):
        direction = 10 * eigvec[:, v].unsqueeze(0)

        img, _ = g(
            [latent],
            truncation=truncation,
            truncation_latent=trunc,
            input_is_latent=True,
        )
        img1, _ = g(
            [latent + direction],
            truncation=truncation,
            truncation_latent=trunc,
            input_is_latent=True,
        )
        img2, _ = g(
            [latent - direction],
            truncation=truncation,
            truncation_latent=trunc,
            input_is_latent=True,
        )

        utils.save_image(
            torch.cat([img1, img, img2], 0),
            f"/content/vectors/index_{v}-degree_10-seed_{i}.jpg",
            normalize=True,
            value_range=(-1, 1),
            # nrow=args.n_sample,
        )


I recommend zipping and downloading these to look at on your desktop.

In [None]:
!zip -r /content/vectors.zip /content/vectors/

From here I recommend you look thru the images and find a single vector (`index_X`) that you like for all of the seeds. Lower indexes will make drastic modifications, while higher indexes will often exhibit much more subtle changes.
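
If you’d rather sort through the files with code first, here’s a small sketch that groups the file names by vector index so you can check each index across all of your seeds. It relies on the `index_X-degree_10-seed_Y.jpg` naming used above. The `group_by_index` helper and the example listing are made up for illustration.

```python
import re
from collections import defaultdict

# Matches names like "index_17-degree_10-seed_176.jpg".
NAME_PATTERN = re.compile(r"index_(\d+)-degree_(\d+)-seed_(\d+)\.jpg$")

def group_by_index(filenames):
    """Map each vector index to the sorted list of seeds rendered for it."""
    groups = defaultdict(list)
    for name in filenames:
        match = NAME_PATTERN.search(name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        index, _degree, seed = (int(g) for g in match.groups())
        groups[index].append(seed)
    return {index: sorted(found) for index, found in sorted(groups.items())}

# Made-up listing; in Colab you could pass os.listdir("/content/vectors").
example = [
    "index_17-degree_10-seed_176.jpg",
    "index_3-degree_10-seed_44.jpg",
    "index_17-degree_10-seed_44.jpg",
    "index_3-degree_10-seed_176.jpg",
    "notes.txt",
]

groups = group_by_index(example)
for index, found in groups.items():
    print(f"index_{index}: seeds {found}")
```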

If you want to return to the maua repo, make sure you `cd` back to it:

In [None]:
%cd /content/maua-stylegan2/

Cleanup if you want to run it again with different settings:

In [None]:
!rm -r /content/vectors
!rm /content/vectors.zip