# (Almost) One-click Jukebox notebook with autosaving

Speed upsampling is supported. The notebook switches to upsample mode automatically if a data file is detected in the folder provided.

Colab Pro users can use the 5b_lyrics model.
Free users can also use the 5b_lyrics model, if they are assigned a Tesla T4 GPU. If assigned a Tesla K80, the weaker 1b_lyrics model is recommended.

Join the Jukebox community at https://discord.gg/aEqXFN9amV

Big thank you to Michaels Lab for all the cool new stuff: https://www.youtube.com/user/CraftMine1000

In [None]:
from google.colab import drive
drive.mount('/gdrive')

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
#@title Check which GPU you were assigned by running this cell.
!nvidia-smi -L
your_lyrics = """
"""

GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-a8d06877-8ae6-7286-2151-a636accf26e8)


In [4]:
your_lyrics = """
Can you suffocate the pain or go with the lie of convention
there's glory in this gains, another face, your inventions
are for me...to belive you would lead to a safer place
hakcing in to my world...

We roll the coulds away to clear minds

I nearly let it get a hold...of... hold of me
you sold me to live in a cold... day... the day you stole me
but then I broke free
"""

In [5]:
#@title Select your settings and run this cell to start generating

from google.colab import drive
drive.mount('/content/gdrive')

!pip install --upgrade git+https://github.com/craftmine1000/jukebox-saveopt.git

import jukebox
import torch as t
import librosa
import os
from IPython.display import Audio
from jukebox.make_models import make_vqvae, make_prior, MODELS, make_model
from jukebox.hparams import Hyperparams, setup_hparams
from jukebox.sample import sample_single_window, _sample, \
                           sample_partial_window, upsample, \
                           load_prompts
from jukebox.utils.dist_utils import setup_dist_from_mpi
from jukebox.utils.torch_utils import empty_cache
# MPI Connect. MPI doesn't like being initialized twice, hence the following
try:
    if device is not None:
        pass
except NameError:
    rank, local_rank, device = setup_dist_from_mpi()

model = "5b_lyrics" #@param ["5b_lyrics", "1b_lyrics"]
hps = Hyperparams()
hps.sr = 44100
hps.n_samples = 7 #@param {type:"integer"}
# Specifies the directory to save the sample in.
# We set this to the Google Drive mount point.
hps.name = '/content/gdrive/MyDrive' #@param {type:"string"}
chunk_size = 16 if model in ('5b', '5b_lyrics') else 32
gpu_info = !nvidia-smi -L
if gpu_info[0].find('Tesla T4') >= 0:
  max_batch_size = 2
  print('Tesla T4 detected, max_batch_size set to 2')
elif gpu_info[0].find('Tesla K80') >= 0:
  max_batch_size = 8
  print('Tesla K80 detected, max_batch_size set to 8')
elif gpu_info[0].find('Tesla P100') >= 0:
  max_batch_size = 3
  print('Tesla P100 detected, max_batch_size set to 3')
elif gpu_info[0].find('Tesla V100') >= 0:
  max_batch_size = 3
  print('Tesla V100 detected, max_batch_size set to 3')
else:
  max_batch_size = 3
  print('Different GPU detected, max_batch_size set to 3.')
hps.levels = 3
speed_upsampling = False #@param {type: "boolean"}
if speed_upsampling:
  hps.hop_fraction = [1,1,.125]
else:
  hps.hop_fraction = [.5,.5,.125]

vqvae, *priors = MODELS[model]
vqvae = make_vqvae(setup_hparams(vqvae, dict(sample_length = 1048576)), device)
top_prior = make_prior(setup_hparams(priors[-1], dict()), vqvae, device)

# The default mode of operation.
# Creates songs based on artist and genre conditioning.
mode = 'ancestral' #@param ["ancestral", "primed"]
if mode == 'ancestral':
  codes_file=None
  audio_file=None
  prompt_length_in_seconds=None
if mode == 'primed':
  codes_file=None
  # Specify an audio file here.
  audio_file = '' #@param {type:"string"}
  # Specify how many seconds of audio to prime on.
  prompt_length_in_seconds=24 #@param {type:"integer"}

sample_length_in_seconds = 50 #@param {type:"integer"}

if os.path.exists(hps.name):
  # Identify the lowest level generated and continue from there.
  for level in [0, 1, 2]:
    data = f"{hps.name}/level_{level}/data.pth.tar"
    if os.path.isfile(data):
      codes_file = data
      if int(sample_length_in_seconds) > int(librosa.get_duration(filename=f'{hps.name}/level_2/item_0.wav')):
        mode = 'continue'
      else:
        mode = 'upsample'
      break

print('mode is now '+mode)
if mode == 'continue':
  print('Continuing from level 2')
if mode == 'upsample':
  print('Upsampling from level '+str(level))

sample_hps = Hyperparams(dict(mode=mode, codes_file=codes_file, audio_file=audio_file, prompt_length_in_seconds=prompt_length_in_seconds))

if mode == 'upsample':
  sample_length_in_seconds=int(librosa.get_duration(filename=f'{hps.name}/level_{level}/item_0.wav'))
  data = t.load(sample_hps.codes_file, map_location='cpu')
  zs = [z.cpu() for z in data['zs']]
  hps.n_samples = zs[-1].shape[0]

if mode == 'continue':
  data = t.load(sample_hps.codes_file, map_location='cpu')
  zs = [z.cpu() for z in data['zs']]
  hps.n_samples = zs[-1].shape[0]

hps.sample_length = (int(sample_length_in_seconds*hps.sr)//top_prior.raw_to_tokens)*top_prior.raw_to_tokens
# The sample must span at least one full context window of the top-level prior.
assert hps.sample_length >= top_prior.n_ctx*top_prior.raw_to_tokens, 'Please choose a larger sample_length_in_seconds'

# Note: Metas can contain different prompts per sample.
# By default, all samples use the same prompt.

select_artist = "1604" #@param {type:"string"}
select_genre = "58" #@param {type:"string"}
metas = [dict(artist = select_artist,
            genre = select_genre,
            total_length = hps.sample_length,
            offset = 0,
            lyrics = your_lyrics, 
            ),
          ] * hps.n_samples
labels = [None, None, top_prior.labeller.get_batch_labels(metas, 'cuda')]

sampling_temperature = .99 #@param {type:"number"}

lower_batch_size = 12
lower_level_chunk_size = 32
chunk_size = 16 if model in ('5b', '5b_lyrics') else 32
sampling_kwargs = [dict(temp=.99, fp16=True, max_batch_size=lower_batch_size,
                        chunk_size=lower_level_chunk_size),
                    dict(temp=0.99, fp16=True, max_batch_size=lower_batch_size,
                         chunk_size=lower_level_chunk_size),
                    dict(temp=sampling_temperature, fp16=True, 
                         max_batch_size=max_batch_size, chunk_size=chunk_size)]

if sample_hps.mode == 'ancestral':
  zs = [t.zeros(hps.n_samples,0,dtype=t.long, device='cpu') for _ in range(len(priors))]
  zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
elif sample_hps.mode == 'upsample':
  assert sample_hps.codes_file is not None
  # Load codes.
  data = t.load(sample_hps.codes_file, map_location='cpu')
  zs = [z.cpu() for z in data['zs']]
  assert zs[-1].shape[0] == hps.n_samples, f"Expected bs = {hps.n_samples}, got {zs[-1].shape[0]}"
  del data
  print('One click upsampling!')
elif sample_hps.mode == 'primed':
  assert sample_hps.audio_file is not None
  audio_files = sample_hps.audio_file.split(',')
  duration = (int(sample_hps.prompt_length_in_seconds*hps.sr)//top_prior.raw_to_tokens)*top_prior.raw_to_tokens
  x = load_prompts(audio_files, duration, hps)
  zs = top_prior.encode(x, start_level=0, end_level=len(priors), bs_chunks=x.shape[0])
  zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
elif sample_hps.mode == 'continue':
  data = t.load(sample_hps.codes_file, map_location='cpu')
  zs = [z.cuda() for z in data['zs']]
  zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
else:
  raise ValueError(f'Unknown sample mode {sample_hps.mode}.')

# Set this False if you are on a local machine that has enough memory (this allows you to do the
# lyrics alignment visualization during the upsampling stage). For a hosted runtime, 
# we'll need to go ahead and delete the top_prior if you are using the 5b_lyrics model.
if True:
  del top_prior
  empty_cache()
  top_prior=None
upsamplers = [make_prior(setup_hparams(prior, dict()), vqvae, 'cpu') for prior in priors[:-1]]
labels[:2] = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in upsamplers]

zs = upsample(zs, labels, sampling_kwargs, [*upsamplers, top_prior], hps)

Mounted at /content/gdrive
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/craftmine1000/jukebox-saveopt.git
  Cloning https://github.com/craftmine1000/jukebox-saveopt.git to /tmp/pip-req-build-c2f1sl2s
  Running command git clone -q https://github.com/craftmine1000/jukebox-saveopt.git /tmp/pip-req-build-c2f1sl2s
Using cuda True
Tesla V100 detected, max_batch_size set to 3
Downloading from azure
Restored from /root/.cache/jukebox/models/5b/vqvae.pth.tar
0: Loading vqvae in eval mode
Loading artist IDs from /usr/local/lib/python3.7/dist-packages/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /usr/local/lib/python3.7/dist-packages/jukebox/data/ids/v2_genre_ids.txt
Level:2, Cond downsample:None, Raw to tokens:128, Sample length:1048576
Downloading from azure
Restored from /root/.cache/jukebox/models/5b_lyrics/prior_level_2.pth.tar
0: Loading prior in eval mode
mode is now ancestral
Input artis

# Guide to the above settings:

**your_lyrics:** Specify the lyrics Jukebox should attempt to follow. You can paste any lyrics you want here, or leave it blank, in which case the vocals will be gibberish.

**model:**
OpenAI has trained a few different models for Jukebox. In this notebook, you can access the 5b_lyrics and 1b_lyrics models. The 5b_lyrics model is the superior one, but it also requires a stronger GPU to run properly. Which model to choose depends on the GPU you were assigned, which you can check by running the GPU check cell. Recommended settings: 5b_lyrics on a P100 or T4 GPU, 1b_lyrics on a K80 GPU.
(The 5b_lyrics model theoretically works on a K80 now, but sampling is going to be very slow.)

**hps.n_samples:**
Here you can choose how many samples to generate. Different GPUs can handle different numbers of samples. Recommended settings:
P100 GPU: 3 samples;
T4 GPU: 2 samples;
K80 GPU: up to 8 samples, but 1b_lyrics only.
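
The recommendations above can be expressed as a small lookup. This is only a sketch: the table is copied from this guide, while the function name `recommend_settings` and its return shape are hypothetical and not part of Jukebox or of the settings cell.

```python
# Recommendations copied from the guide text; names here are illustrative only.
RECOMMENDED = {
    "P100": ("5b_lyrics", 3),
    "T4": ("5b_lyrics", 2),
    "K80": ("1b_lyrics", 8),
}

def recommend_settings(gpu_line):
    """Return (model, n_samples) for one line of `nvidia-smi -L` output, or None."""
    for gpu_name, settings in RECOMMENDED.items():
        if f"Tesla {gpu_name}" in gpu_line:
            return settings
    return None  # GPU not covered by this guide

print(recommend_settings("GPU 0: Tesla T4 (UUID: GPU-...)"))
```

A GPU that the guide does not mention returns `None`, so the caller has to choose settings by hand.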

**hps.name:** Specifies the folder in Google Drive where your results will be saved. Choose a different name for each run; otherwise the notebook will detect the data file from the earlier run and try to upsample or continue it instead of starting fresh.
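
One way to avoid reusing a folder is to derive the name from the current time. The helper below is a hypothetical sketch (`unique_run_folder` and the `jukebox` label are not part of the notebook); the base path matches the Drive mount point used in the settings cell. You would paste its result into the hps.name field.

```python
from datetime import datetime

def unique_run_folder(base="/content/gdrive/MyDrive", label="jukebox", now=None):
    """Build a folder path that differs for every run by appending a timestamp."""
    now = now or datetime.now()
    return f"{base}/{label}_{now:%Y%m%d_%H%M%S}"

print(unique_run_folder())
```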

**speed_upsampling:** If selected, upsampling runs much faster, at the cost of the samples sounding slightly "choppy".
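
In the settings cell, this option changes `hps.hop_fraction` for the two upsampling levels from .5 to 1. The sketch below shows why that is faster, assuming the hop fraction sets the step between successive sampling windows as a fraction of the context length. The function `window_starts` and the token counts are hypothetical illustrations, not Jukebox's own code.

```python
def window_starts(total_tokens, n_ctx, hop_fraction):
    """Start offsets of sampling windows of n_ctx tokens covering total_tokens."""
    hop = int(n_ctx * hop_fraction)
    starts = []
    for start in range(0, total_tokens - n_ctx + hop, hop):
        # Clamp the final window so it ends exactly at total_tokens.
        starts.append(min(start, total_tokens - n_ctx))
    return starts

# Hypothetical sizes: 32768 tokens to fill, 8192 tokens per window.
print(len(window_starts(32768, 8192, 0.5)))  # 7 overlapping windows
print(len(window_starts(32768, 8192, 1.0)))  # 4 windows with no overlap
```

With a hop fraction of 1 the windows no longer overlap, so fewer windows are sampled, but each window sees no context from the previous one, which is a plausible source of the "choppy" sound.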

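**mode:** Available modes are primed and ancestral. Primed will continue an already existing song; ancestral generates a song from scratch. If a data file is detected in the folder provided, the notebook overrides this choice automatically: it selects continue mode when sample_length_in_seconds is longer than the existing sample, and upsample mode otherwise.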
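
The automatic mode selection can be sketched as a standalone function. It mirrors the check in the settings cell, with one simplification: `existing_seconds` stands in for the duration of `level_2/item_0.wav`, which the notebook measures with librosa. The function name `detect_mode` is hypothetical.

```python
import os

def detect_mode(folder, requested_seconds, existing_seconds, default_mode="ancestral"):
    """Return (mode, codes_file) based on saved data files in the run folder."""
    for level in (0, 1, 2):
        codes_file = os.path.join(folder, f"level_{level}", "data.pth.tar")
        if os.path.isfile(codes_file):
            # A longer requested length means the existing sample gets extended.
            if int(requested_seconds) > int(existing_seconds):
                return "continue", codes_file
            return "upsample", codes_file
    # No saved codes found: keep the mode chosen in the form.
    return default_mode, None

print(detect_mode("/content/gdrive/MyDrive/some_new_folder", 50, 0))
```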

**audio_file:** Only needed for primed mode. Specifies which song Jukebox will continue. Upload the file you want (needs to be .wav format!) to the root directory of your Google Drive and fill in its name above.

**prompt_length_in_seconds:** Only needed for primed mode. Specifies how many seconds of your file Jukebox will be primed on (that is, the point at which Jukebox "kicks in"). Recommended to keep below 24 seconds for memory reasons.

**sample_length_in_seconds:** Specifies how long your fully generated samples are going to be.
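
The settings cell rounds this length down to a whole number of top-level tokens and asserts that it covers at least one context window. A minimal sketch of that arithmetic follows. The sample rate and the value 128 come from the cell and its log output; `N_CTX = 8192` is an assumption, chosen because 8192 * 128 matches the logged sample length of 1048576.

```python
SR = 44100            # hps.sr in the settings cell
RAW_TO_TOKENS = 128   # "Raw to tokens:128" in the log output for level 2
N_CTX = 8192          # assumption: top-level context length in tokens

def token_aligned_length(seconds, sr=SR, raw_to_tokens=RAW_TO_TOKENS):
    """Length in audio samples, rounded down to a multiple of raw_to_tokens."""
    return (int(seconds * sr) // raw_to_tokens) * raw_to_tokens

def is_long_enough(seconds):
    """True if the aligned length covers one full context window."""
    return token_aligned_length(seconds) >= N_CTX * RAW_TO_TOKENS

print(token_aligned_length(50))  # 2204928
print(is_long_enough(23), is_long_enough(24))  # False True
```

Under these assumptions, anything shorter than roughly 24 seconds fails the assertion in the settings cell.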

**select_artist and select_genre:** List of available artists and genres can be found here: https://github.com/openai/jukebox/tree/master/jukebox/data/ids
The 5b_lyrics model utilizes the v2 lists, the 1b_lyrics model the v3 lists. It is possible to combine up to five v2 genres, for example "Hip Hop Pop Punk Disco". Combining v3 genres is not possible.

**sampling_temperature:** Determines the creativity and energy of Jukebox. The higher the temperature, the more chaotic and intense the result will be. You can experiment with this. It is recommended to keep the value between .96 and .999.
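
As a generic illustration of what a sampling temperature does (not Jukebox's actual sampling code), the sketch below divides a set of made-up logits by the temperature before applying a softmax. Lower temperatures concentrate probability on the most likely token; higher temperatures spread it out.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1/temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
for temp in (0.5, 0.96, 0.999):
    print(temp, [round(p, 3) for p in softmax_with_temperature(logits, temp)])
```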

Important links:

Official blog: https://openai.com/blog/jukebox/
Original repo: https://github.com/openai/jukebox/

License: Non-commercial, for details see: https://github.com/openai/jukebox/blob/master/LICENSE