<a href="https://colab.research.google.com/github/fzantalis/colab_collection/blob/master/Audio_Keyframe_Generator_For_Deforum_Stable_Diffusion.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Audio Keyframe Generator For Deforum Stable Diffusion**
This notebook automatically generates keyframe strings for Deforum Stable Diffusion animation settings.

[Deforum Stable Diffusion](https://colab.research.google.com/github/fzantalis/deforum-stable-diffusion-audio/blob/main/Deforum_Stable_Diffusion.ipynb) 

In Deforum Stable Diffusion, you can drive animation settings such as zoom and angle from the amplitude of a music track. To do so, we need to analyze the track and generate a keyframe string. This notebook automates the whole process.

More specifically, you upload a music track and the notebook decomposes it into separate elements: drums, bass, vocals, and other. Driving the animation from a single element seems to work better than using the full mix.

A keyframe string is then generated automatically from the selected element.
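
As a rough illustration of the output format, the sketch below builds a keyframe string from a list of per-frame values. The helper name `to_keyframe_string` and the sample values are hypothetical; the `frame: (value)` layout mirrors what step 8 of this notebook prints, minus the trailing separator that its loop leaves behind.

```python
def to_keyframe_string(values):
    """Format per-frame values as 'frame: (value), ...' (hypothetical helper)."""
    return ", ".join(f"{i}: ({v:.2f})" for i, v in enumerate(values))

# Made-up zoom values for three consecutive frames
example = to_keyframe_string([1.07, 1.32, 2.07])
print(example)  # 0: (1.07), 1: (1.32), 2: (2.07)
```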

# Audio Settings

In [None]:
#@markdown **1. Install the required libraries. Demucs will help us decompose our music track into separate instruments and vocals**
!pip install -q demucs
!pip install -q eyed3

In [None]:
#@markdown **2. Upload your music track**
from google.colab import files
uploaded = files.upload()
# Save the upload under a fixed name; if several files are uploaded, the last one wins
for name, data in uploaded.items():
  with open('audio_file.mp3', 'wb') as f:
    f.write(data)

In [None]:
#@markdown **3. Decompose your music track with demucs**
!python -m demucs.separate --mp3 --mp3-bitrate=128 audio_file.mp3

In [None]:
#@title 4. Select the audio element that you want to isolate
from IPython.display import Audio
element = 'drums' #@param ["drums", "bass", "vocals", "other"]
# Demucs writes one file per element under separated/<model name>/<track name>/
audio_file = "separated/htdemucs/audio_file/" + element + ".mp3"
Audio(filename=audio_file)

### Preview the audio waveform

In [None]:
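#@markdown **5. Plot the audio waveform**
import librosa
import librosa.display
import matplotlib.pyplot as plt
%matplotlib inline

# Load the isolated element (resampled to librosa's default 22050 Hz)
x, sr = librosa.load(audio_file)

plt.figure(figsize=(14, 5))
# waveshow replaces waveplot, which was removed in librosa 0.10
librosa.display.waveshow(x, sr=sr)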

In [None]:
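#@markdown **6. Plot the HPSS-modified (percussive) waveform**
import numpy as np

# Harmonic-percussive source separation; the larger percussive margin (5.0)
# makes the extraction of the percussive component stricter
wav_harmonic, wav_percussive = librosa.effects.hpss(x, margin=(1.0, 5.0))
plt.figure(figsize=(14, 5))
# waveshow replaces waveplot, which was removed in librosa 0.10
librosa.display.waveshow(wav_percussive, sr=sr)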

In [None]:
import eyed3
#@title **7. Scale the waveform according to your FPS and Audio duration and get the normalized amplitudes**
fps = 24 #@param  {type:"number"}
n_mels = 512 #@param {type:"number"}
function = "1.07 + amplitude**2" #@param {type:"string"}
#@markdown This is a function to apply to each frame value, where 'amplitude' is the original amplitude between 0 and 1

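# Hop length in samples, so that one spectrogram column covers one animation frame
frame_duration = int(sr / fps)
# Track length in whole seconds, read from the MP3 metadata
duration = round(eyed3.load(audio_file).info.time_secs)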

# Generate Mel Spectrogram
spec_raw = librosa.feature.melspectrogram(y=wav_percussive, sr=sr, n_mels=n_mels, hop_length=frame_duration)

# Obtain maximum value per time-frame
spec_max = np.amax(spec_raw, axis=0)

# Normalize all values between 0 and 1
spec_norm = (spec_max - np.min(spec_max)) / np.ptp(spec_max)

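# Trim to exactly the number of frames we want to generate (e.g. 3 seconds at 12 fps == 36).
# Note: np.resize truncates, or repeats values from the start if the array is too short.
amplitude_arr = np.resize(spec_norm, int(duration * fps))

# Use a separate name for the frame index so the audio samples in `x` are not overwritten
frame_idx = np.arange(amplitude_arr.shape[0])
plt.figure(figsize=(14, 5))
plt.plot(frame_idx, amplitude_arr)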
plt.show()
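
The relation between sample rate, FPS, and hop length in step 7 can be checked with a little arithmetic. The sketch below assumes librosa's default sample rate of 22050 Hz (what `librosa.load` returns when no `sr` is given) and an illustrative 3-second clip; the variable names `n_samples`, `n_columns`, and `n_frames` are introduced here only for the example.

```python
sr = 22050                 # assumed: librosa.load's default sample rate
fps = 24
duration = 3               # seconds, illustrative value

hop_length = int(sr / fps)                 # samples per animation frame
n_samples = sr * duration
# With librosa's default centered framing, a signal of n samples
# yields 1 + n // hop_length spectrogram columns
n_columns = 1 + n_samples // hop_length
n_frames = duration * fps                  # keyframes we actually need

print(hop_length, n_columns, n_frames)     # 918 73 72
```

In this case the spectrogram has one more column than the 72 frames needed, so `np.resize` simply drops the extra column.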

In [None]:
#@title 8. Generate audio keyframes for zoom
keyframe_string=""
for i, amplitude in enumerate(amplitude_arr):
    # Evaluate the user-supplied expression with 'amplitude' bound to this frame's value
    y="{:.2f}".format(eval(function))
    keyframe_string += str(i) + ": (" + y + "), "

print(keyframe_string)

In [None]:
#@title 9. Generate sample keyframes for rotation
import random
keyframe_string=""
# Every 50 frames, pick a random angle from -1, 0, or 1
for i in range(0, len(amplitude_arr), 50):
    random_angle = random.randint(-1, 1)
    keyframe_string += str(i) + ": (" + str(random_angle) + "), "

print(keyframe_string)

You can now copy and paste the keyframe string from step 8 into the 'zoom' setting in the [Deforum Stable Diffusion](https://colab.research.google.com/github/fzantalis/deforum-stable-diffusion-audio/blob/main/Deforum_Stable_Diffusion.ipynb) notebook, or experiment with the function to generate keyframes for any other setting.
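
To experiment with other functions, something along these lines may help. It is only a sketch: `keyframes_from_expression` is a hypothetical helper, the expression `10 * amplitude` and the sample amplitudes are made up, and `eval` is restricted to the single name `amplitude` as a precaution (it should still only be given expressions you trust).

```python
def keyframes_from_expression(amplitudes, expression):
    """Apply a per-frame expression in 'amplitude' and format the result (hypothetical helper)."""
    parts = []
    for i, amplitude in enumerate(amplitudes):
        # Expose only 'amplitude' to the expression; builtins are disabled
        value = eval(expression, {"__builtins__": {}}, {"amplitude": amplitude})
        parts.append(f"{i}: ({value:.2f})")
    return ", ".join(parts)

# Made-up normalized amplitudes and a made-up scaling expression
result = keyframes_from_expression([0.0, 0.5, 1.0], "10 * amplitude")
print(result)  # 0: (0.00), 1: (5.00), 2: (10.00)
```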