<a href="https://colab.research.google.com/github/pollinations/hive/blob/main/notebooks/3%20Audio-To-Video/1%20Lucid%20Sonic%20Dreams.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://pollinations.ai/ipfs/QmTp8v31wrHt3mvdiTv5FkMVyh2MDhWdk45XT3ff28RuuC" />


Generate a music video from an audio file - the video moves with every sound and produces abstract art by travelling through the latent space of a StyleGAN. 


Lucid Sonic Dreams syncs GAN-generated visuals to music. By default, it uses [NVLabs StyleGAN2](https://github.com/NVlabs/stylegan2), with pre-trained models lifted from [Justin Pinkney's consolidated repository](https://github.com/justinpinkney/awesome-pretrained-stylegan2). Custom weights and other GAN architectures can be used as well.
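
As a rough illustration of what "travelling through the latent space" means, the sketch below moves a latent vector towards random target points and takes bigger steps when the audio is louder. It uses only NumPy. The function name `latent_walk`, the synthetic amplitude envelope, and the 512-dimensional latent size are assumptions made for this example, not the package's actual implementation.

```python
import numpy as np

def latent_walk(amplitude, latent_dim=512, base_speed=0.02, react=0.5, seed=0):
    """Return one latent vector per frame (hypothetical helper, not the package API).

    amplitude: 1-D array with one loudness value in [0, 1] per video frame.
    """
    rng = np.random.default_rng(seed)
    current = rng.standard_normal(latent_dim)
    target = rng.standard_normal(latent_dim)
    frames = []
    for a in amplitude:
        # Louder frames take a bigger step towards the current target.
        step = min(base_speed + react * float(a), 1.0)
        current = current + step * (target - current)
        frames.append(current.copy())
        # Pick a new random target once we are close to the old one.
        if np.linalg.norm(target - current) < 0.1 * np.sqrt(latent_dim):
            target = rng.standard_normal(latent_dim)
    return np.stack(frames)

# Synthetic "audio": a decaying beat every 25 frames (1 second at 25 fps).
t = np.arange(100)
amplitude = np.exp(-(t % 25) / 5.0)
latents = latent_walk(amplitude)
print(latents.shape)
```

Each row of `latents` would be fed to the generator to produce one video frame; the real package additionally separates percussive and harmonic components of the audio, which this sketch leaves out.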

For a more detailed description of the technique refer to: [Introducing Lucid Sonic Dreams: Sync GAN Art to Music with a Few Lines of Python Code!](https://towardsdatascience.com/introducing-lucid-sonic-dreams-sync-gan-art-to-music-with-a-few-lines-of-python-code-b04f88722de1)

Sample output can be found on [YouTube](https://youtu.be/l-nGC-ve7sI) and [Instagram](https://www.instagram.com/lucidsonicdreams/).

**[UPD 17.10.2021]** Exposed more parameters

**[UPD 1.10.2021]** Added Visionary Art Dataset

In [None]:
# Input audio file (wav or mp3)
audio_file = '' #@param {type: "string"}

# The style to use
style = "Visionary Art"  #@param ["Abstract art","Anime portraits","CIFAR 10","CIFAR 100","Doors","Imagenet","Maps","Visionary Art","WikiArt","beetles","cakes","car (config-e)","car (config-f)","cat","church","faces (FFHQ config-e 256x256)","faces (FFHQ config-e)","faces (FFHQ config-f 512x512)","faces (FFHQ config-f)","faces (FFHQ slim 256x256)","figure drawings","flowers","fursona","grumpy cat","horse","microscope images","modern art","my little pony","obama","painting faces","panda","textures","trypophobia","ukiyoe faces","wildlife"]

# Resolution of the generated video 
resolution = 512 #@param {type: "integer"}

# Frames per second of generated video
fps = 25 #@param {type: "number"}

# The "strength" of the pulse. It is recommended to keep this between 0 and 100.
pulse_react = 80 #@param {type: "number"}

# Whether the pulse should react to percussive or harmonic elements
pulse_react_to = "percussive" #@param ["percussive", "harmonic"]

# The "strength" of the motion. It is recommended to keep this between 0 and 100.
motion_react = 80 #@param {type: "number"}

# Whether the motion should react to percussive or harmonic elements
motion_react_to = "harmonic" #@param ["harmonic", "percussive"]

# Degree of randomness of motion. Higher values will typically prevent the video from cycling through the same visuals repeatedly. Must range from 0 to 100.
motion_randomness = 50 #@param {type: "number"}

# Controls the variety of visuals generated. Lower values lead to lower variety. Note: A very low value will usually lead to "jittery" visuals. Must range from 0 to 100.
truncation = 100 #@param {type: "number"}

# Custom StyleGAN2 model file *(optional)*
file_custom_model = '' #@param {type: "string"}

# Load a FastGAN / Projected GAN model instead of StyleGAN2
use_fastgan = False #@param {type: "boolean"}

output_path = '/content'

In [None]:
if file_custom_model != '':
    style = file_custom_model

In [None]:
if use_fastgan:
    
    network_pkl = file_custom_model

    %cd /content/

    !git clone https://github.com/NVlabs/stylegan2-ada.git stylegan2
    !cp -rv stylegan2/dnnlib .
    import sys
    sys.path.append("/content/stylegan2")

    %cd /content
    !git clone https://github.com/autonomousvision/projected_gan
    !pip install timm dill
    sys.path.append("/content/projected_gan")
    # Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
    #
    # NVIDIA CORPORATION and its licensors retain all intellectual property
    # and proprietary rights in and to this software, related documentation
    # and any modifications thereto.  Any use, reproduction, disclosure or
    # distribution of this software and related documentation without an express
    # license agreement from NVIDIA CORPORATION is strictly prohibited.

    """Generate images using pretrained network pickle."""

    import os
    import re
    from typing import List, Optional, Tuple, Union

    import click
    import dnnlib
    import numpy as np
    import PIL.Image
    import torch

    import legacy

    #----------------------------------------------------------------------------

    def parse_range(s: Union[str, List]) -> List[int]:
        '''Parse a comma separated list of numbers or ranges and return a list of ints.
        Example: '1,2,5-10' returns [1, 2, 5, 6, 7, 8, 9, 10]
        '''
        if isinstance(s, list): return s
        ranges = []
        range_re = re.compile(r'^(\d+)-(\d+)$')
        for p in s.split(','):
            m = range_re.match(p)
            if m:
                ranges.extend(range(int(m.group(1)), int(m.group(2))+1))
            else:
                ranges.append(int(p))
        return ranges

    #----------------------------------------------------------------------------

    def parse_vec2(s: Union[str, Tuple[float, float]]) -> Tuple[float, float]:
        '''Parse a floating point 2-vector of syntax 'a,b'.
        Example:
            '0,1' returns (0,1)
        '''
        if isinstance(s, tuple): return s
        parts = s.split(',')
        if len(parts) == 2:
            return (float(parts[0]), float(parts[1]))
        raise ValueError(f'cannot parse 2-vector {s}')

    #----------------------------------------------------------------------------

    def make_transform(translate: Tuple[float,float], angle: float):
        m = np.eye(3)
        s = np.sin(angle/360.0*np.pi*2)
        c = np.cos(angle/360.0*np.pi*2)
        m[0][0] = c
        m[0][1] = s
        m[0][2] = translate[0]
        m[1][0] = -s
        m[1][1] = c
        m[1][2] = translate[1]
        return m

    #----------------------------------------------------------------------------
    # "/content/drive/MyDrive/sam/projected gan training/training-runs/00000-fastgan-sofia512-gpus1-batch64-/network-snapshot.pkl",
    # [1,2,3], 
    # 1,
    # "const", 
    # "/content",
    # (0,0), 
    # 0, 
    #  None)

    print('Loading networks from "%s"...' % network_pkl)
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    with dnnlib.util.open_url(network_pkl) as f:
        G = legacy.load_network_pkl(f)['G_ema'].to(device) # type: ignore

    noise_dim = G.z_dim

    def generate_images(
        G,
        z,
        truncation_psi: float,
        noise_mode: str,
        translate: Tuple[float,float],
        rotate: float,
        class_idx: Optional[int]
    ):
        """Generate an image from the latent batch `z` using the loaded generator `G`.

        Only the first image of the batch is returned, as an RGB PIL image.
        """

        # Labels.
        label = torch.zeros([1, G.c_dim], device=device)
        if G.c_dim != 0:
            if class_idx is None:
                raise click.ClickException('Must specify class label with --class when using a conditional network')
            label[:, class_idx] = 1
        else:
            if class_idx is not None:
                print ('warn: --class=lbl ignored when running on an unconditional network')

        # Generate images.
        #for seed_idx, seed in enumerate(seeds):

        # Construct an inverse rotation/translation matrix and pass to the generator.  The
        # generator expects this matrix as an inverse to avoid potentially failing numerical
        # operations in the network.
        if hasattr(G.synthesis, 'input'):
            m = make_transform(translate, rotate)
            m = np.linalg.inv(m)
            G.synthesis.input.transform.copy_(torch.from_numpy(m))

        img = G(z, label, truncation_psi=truncation_psi, noise_mode=noise_mode)
        img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
        return PIL.Image.fromarray(img[0].cpu().numpy(), 'RGB')


    def projected_gan(noise_batch, class_batch):
        # Use the device selected above so this also works without CUDA.
        noise_tensor = torch.from_numpy(noise_batch).to(device).float()
        return [generate_images(G, noise_tensor, 1, "const", (0,0), 0, None)]


# A. Set-Up

## A.1. Set Up GPU

Navigate to **Runtime -> Change runtime type** and make sure **Hardware accelerator** is set to GPU.
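
To confirm that a GPU runtime is active before installing anything, a quick check like the following can help. It is a minimal sketch that only looks for the `nvidia-smi` tool, which is normally present on Colab GPU runtimes; the helper name `gpu_available` is made up for this example.

```python
import shutil
import subprocess

def gpu_available() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    # "nvidia-smi -L" prints one line per detected GPU.
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout

print("GPU runtime detected" if gpu_available() else "No GPU found: check the runtime type")
```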

## A.2. Install Lucid Sonic Dreams

In [None]:
!pip install lucidsonicdreams

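if file_custom_model != '':
    # A custom model path is used as-is; nothing needs to be downloaded.
    model_file = file_custom_model
else:
    # Download the pre-trained weights for the selected style.
    model_file = f"{style}.pkl"
    !wget -N "https://pollinations.ai/ipfs/QmV5HQM1Ms3c6sejmsMwiCDNLhMLNs6f8jESenDvcYjfin/{model_file}"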

!ffmpeg -y -i "{audio_file}" -vn -acodec pcm_s16le /tmp/audio.wav
audio_file = '/tmp/audio.wav'

Collecting lucidsonicdreams
  Downloading lucidsonicdreams-0.4.tar.gz (11 kB)
Collecting tensorflow==1.15
  Downloading tensorflow-1.15.0-cp37-cp37m-manylinux2010_x86_64.whl (412.3 MB)
Collecting pygit2
  Downloading pygit2-1.6.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.6 MB)
Collecting mega.py
  Downloading mega.py-1.0.8-py2.py3-none-any.whl (19 kB)
Collecting tensorflow-estimator==1.15.1
  Downloading tensorflow_estimator-1.15.1-py2.py3-none-any.whl (503 kB)
Collecting tensorboard<1.16.0,>=1.15.0
  Downloading tensorboard-1.15.0-py3-none-any.whl (3.8 MB)
Collecting keras-applications>=1.0.8
  Downloading Keras_Applications-1.0.8-py3-none-any.whl (50 kB)

# B. Generate Sample Videos

## B.1. Choosing a Style

Styles can be selected using the **style** parameter, which takes in any of the following:

*   A valid default style name provided by the package. Run **show_styles()** to print valid values. *Note: These styles are loaded from [this repository](https://github.com/justinpinkney/awesome-pretrained-stylegan2) by Justin Pinkney.*

*   A path to a .pkl file that contains pre-trained StyleGAN weights

*   A custom function that takes noise_batch and class_batch parameters and outputs a list of Pillow Images (see example in **B.5**)
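
The third option can be sketched as follows. This toy function does not wrap a real GAN; it simply turns each noise vector into a flat-colour image, so it only demonstrates the expected interface: it receives `noise_batch` and `class_batch` and returns a list of Pillow images. The name `toy_style`, the 64x64 size, and the colour mapping are assumptions made for illustration.

```python
import numpy as np
from PIL import Image

def toy_style(noise_batch, class_batch):
    """Map each noise vector to a solid-colour 64x64 RGB image (illustration only)."""
    images = []
    for noise in np.asarray(noise_batch):
        # Squash the first three noise values into the 0..255 colour range.
        rgb = (np.tanh(noise[:3]) * 127.5 + 127.5).astype(np.uint8)
        pixels = np.tile(rgb, (64, 64, 1))  # shape (64, 64, 3)
        images.append(Image.fromarray(pixels))
    return images

# A batch of four 512-dimensional noise vectors; class_batch is unused here.
batch = np.random.default_rng(0).standard_normal((4, 512))
frames = toy_style(batch, None)
print(len(frames), frames[0].size)
```

A function like this would be passed as `style`, together with `input_shape` and `num_possible_classes`, in the same way the FastGAN branch of this notebook passes `projected_gan`.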





## B.2. Using Default Settings

This package is set up so that the only required arguments are the **file path to your audio track** and the **file name of the video output**. The cell below renders the video with the style and settings chosen in the form at the top of this notebook.

The song used here is **Chemical Love by Basically Saturday Night**. You can watch the official music video [here](https://youtu.be/Gi7oQrtyjKI), or listen to them on [Spotify](https://open.spotify.com/artist/46tGdhXAQbTvxVOGgy0Fqu?si=E8mUjbWbR2uiiMR2MUc_4w)!

Click [here](https://youtu.be/oGXfOmqFYTg) to view a full-length sample video without having to run the code.
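
The form fields at the top of this notebook use a 0 to 100 scale, while the call below passes fractions: `pulse_react` and `motion_react` are divided by 200, and `motion_randomness` and `truncation` by 100. The helper below restates that mapping in one place. `scale_form_values` is a hypothetical name, and the clamping to the 0 to 100 range is an added assumption, not something the notebook does.

```python
def scale_form_values(pulse_react, motion_react, motion_randomness, truncation):
    """Convert 0-100 form values into the fractions passed to hallucinate()."""
    def clamp(value):
        # Assumption: out-of-range inputs are clipped rather than rejected.
        return max(0.0, min(100.0, float(value)))

    return {
        "pulse_react": clamp(pulse_react) / 200,
        "motion_react": clamp(motion_react) / 200,
        "motion_randomness": clamp(motion_randomness) / 100,
        "truncation": clamp(truncation) / 100,
    }

# Form defaults from the top of the notebook: 80, 80, 50, 100.
print(scale_form_values(80, 80, 50, 100))
```

With the form defaults, the reactivity values end up at 0.4, the motion randomness at 0.5, and the truncation at 1.0.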

In [None]:
from lucidsonicdreams import LucidSonicDream

pulse_percussive = pulse_react_to == "percussive"
pulse_harmonic = pulse_react_to == "harmonic"

motion_percussive = motion_react_to == "percussive"
motion_harmonic =  motion_react_to == "harmonic"

if use_fastgan:
    L = LucidSonicDream(song = audio_file,
                        style = projected_gan, 
                        input_shape = noise_dim,
                        num_possible_classes = 0)
else:   
    L = LucidSonicDream(song = audio_file,
                        style = model_file)

L.hallucinate(file_name = 'output.mp4',
              resolution = resolution,
              fps = fps,
              motion_percussive = motion_percussive,
              motion_harmonic = motion_harmonic,
              pulse_percussive = pulse_percussive,
              pulse_harmonic = pulse_harmonic,
              pulse_react = pulse_react / 200,
              motion_react = motion_react / 200,
              motion_randomness = motion_randomness / 100,
              truncation = truncation / 100
              )
!cp output.mp4 $output_path/output.mp4
#files.download("output.mp4")

In [None]:
import os.path

# Fail loudly if the video was not written.
if not os.path.exists(os.path.join(output_path, 'output.mp4')):
    raise FileNotFoundError("Expected output file does not exist.")