<a href="https://colab.research.google.com/github/rafaelturon/pocs/blob/master/Testing_Jukebox.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

NVIDIA System Management Interface (nvidia-smi) is a command-line utility, built on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices. The cell below lists the GPUs attached to this runtime.

In [None]:
!nvidia-smi -L

Mount Google Drive to save sample levels as they are generated.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Prepare the environment.

In [None]:
!pip install git+https://github.com/openai/jukebox.git

In [None]:
import jukebox
import torch as t
import librosa
import os
from IPython.display import Audio
from jukebox.make_models import make_vqvae, make_prior, MODELS, make_model
from jukebox.hparams import Hyperparams, setup_hparams
from jukebox.sample import sample_single_window, _sample, \
                           sample_partial_window, upsample, \
                           load_prompts
from jukebox.utils.dist_utils import setup_dist_from_mpi
from jukebox.utils.torch_utils import empty_cache
rank, local_rank, device = setup_dist_from_mpi()

Sample from the 5B or 1B Lyrics Model

In [None]:
model = "5b_lyrics" # or "1b_lyrics"
hps = Hyperparams()
hps.sr = 44100
hps.n_samples = 3 if model=='5b_lyrics' else 8
# Specifies the directory to save the sample in.
# We set this to the Google Drive mount point.
hps.name = '/content/gdrive/My Drive/jukebo-samples-gorillaz'
chunk_size = 16 if model=="5b_lyrics" else 32
max_batch_size = 3 if model=="5b_lyrics" else 16
hps.levels = 3
hps.hop_fraction = [.5,.5,.125]

vqvae, *priors = MODELS[model]
vqvae = make_vqvae(setup_hparams(vqvae, dict(sample_length = 1048576)), device)
top_prior = make_prior(setup_hparams(priors[-1], dict()), vqvae, device)

To install youtube-dl on a UNIX-like system (Linux, OS X, etc.), run:

```
sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl

sudo chmod a+rx /usr/local/bin/youtube-dl
```

To download a track at the best available quality and convert its audio to WAV (replace MUSIC-ID with the video ID):


```
youtube-dl -ci -f 'bestvideo[ext=mp4]+bestaudio' -x --audio-format wav https://www.youtube.com/watch?v=MUSIC-ID
```





# Select mode
Run one of these cells to select the desired mode.

In [None]:
# The default mode of operation.
# Creates songs based on artist and genre conditioning.
mode = 'ancestral'
codes_file=None
audio_file=None
prompt_length_in_seconds=None

In [None]:
# Prime song creation using an arbitrary audio sample.
mode = 'primed'
codes_file=None
# Specify an audio file here.
audio_file = '/content/gdrive/My Drive/gorillaz-clint-eastwood.wav'
# Specify how many seconds of audio to prime on.
prompt_length_in_seconds=12

Run this cell to resume automatically from the latest checkpoint file, if one exists. This overrides the mode selected above. We assume that the existence of a checkpoint means generation is complete and it is time for upsampling to occur.

In [None]:
if os.path.exists(hps.name):
  # Identify the lowest level generated and continue from there.
  for level in [1, 2]:
    data = f"{hps.name}/level_{level}/data.pth.tar"
    if os.path.isfile(data):
      mode = 'upsample'
      codes_file = data
      print('Upsampling from level '+str(level))
      break
print('mode is now '+mode)
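
The resume check above can be tried in isolation to see which checkpoint it picks. This is a minimal sketch: `find_resume_checkpoint` is a hypothetical helper, not part of Jukebox, and it only mirrors the `level_N/data.pth.tar` layout that the cell above looks for. It runs against a temporary directory rather than the real Drive folder.

```python
import os
import tempfile


def find_resume_checkpoint(base_dir, levels=(1, 2)):
    """Return (level, path) of the first checkpoint found, or None.

    Hypothetical helper mirroring the notebook cell: levels are checked in
    order, so an existing level 1 checkpoint wins over level 2.
    """
    for level in levels:
        path = os.path.join(base_dir, f"level_{level}", "data.pth.tar")
        if os.path.isfile(path):
            return level, path
    return None


# Demonstrate on a throwaway directory instead of the real Drive folder.
with tempfile.TemporaryDirectory() as tmp:
    # Empty directory: nothing to resume from.
    no_checkpoint = find_resume_checkpoint(tmp)

    # Only the top level (2) has been generated.
    os.makedirs(os.path.join(tmp, "level_2"))
    open(os.path.join(tmp, "level_2", "data.pth.tar"), "wb").close()
    level_2_only = find_resume_checkpoint(tmp)

    # Level 1 has also been written, so it takes priority.
    os.makedirs(os.path.join(tmp, "level_1"))
    open(os.path.join(tmp, "level_1", "data.pth.tar"), "wb").close()
    both_levels = find_resume_checkpoint(tmp)

print(no_checkpoint, level_2_only[0], both_levels[0])
```

Because level 1 is checked first, a run interrupted during the final upsampling stage resumes from the level 1 codes rather than starting again from level 2.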

Run the cell below regardless of which mode you chose.

In [None]:
sample_hps = Hyperparams(dict(mode=mode, codes_file=codes_file, audio_file=audio_file, prompt_length_in_seconds=prompt_length_in_seconds))

Specify your choice of artist, genre, lyrics, and length of musical sample.

In [None]:
sample_length_in_seconds = 71          # Full length of musical sample to generate - we find songs in the 1 to 4 minute
                                       # range work well, with generation time proportional to sample length.  
                                       # This total length affects how quickly the model 
                                       # progresses through lyrics (model also generates differently
                                       # depending on if it thinks it's in the beginning, middle, or end of sample)
hps.sample_length = (int(sample_length_in_seconds*hps.sr)//top_prior.raw_to_tokens)*top_prior.raw_to_tokens
assert hps.sample_length >= top_prior.n_ctx*top_prior.raw_to_tokens, 'Please choose a longer sample length'
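
To make the rounding and the minimum-length check concrete, here is a small worked example. The values `RAW_TO_TOKENS = 128` and `N_CTX = 8192` are assumptions for illustration (their product equals the `sample_length = 1048576` passed to `make_vqvae` earlier); in the notebook the real values come from `top_prior`. The helper name `round_to_token_multiple` is hypothetical.

```python
def round_to_token_multiple(seconds, sr, raw_to_tokens):
    """Round a duration down to a whole number of top-level tokens, in samples."""
    return (int(seconds * sr) // raw_to_tokens) * raw_to_tokens


# Assumed values for illustration; in the notebook these come from top_prior.
SR = 44100
RAW_TO_TOKENS = 128
N_CTX = 8192

sample_length = round_to_token_multiple(71, SR, RAW_TO_TOKENS)
min_length = N_CTX * RAW_TO_TOKENS   # shortest sample the assert accepts
min_seconds = min_length / SR

print(sample_length)          # 3131008
print(min_length)             # 1048576
print(round(min_seconds, 2))  # 23.78
```

Under these assumptions, any requested length shorter than about 24 seconds would trip the assertion.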

In [None]:
# Note: Metas can contain different prompts per sample.
# By default, all samples use the same prompt.
metas = [dict(artist = "Gorillaz",
            genre = "Indie",
            total_length = hps.sample_length,
            offset = 0,
            lyrics = """I ain't happy, I'm feeling glad
I got sunshine in a bag
I'm useless, but not for long
The future is coming on

I ain't happy, I'm feeling glad
I got sunshine, in a bag
I'm useless, but not for long
The future is coming on
It's coming on
It's coming on
It's coming on

Yeah
Ha! Ha!

Finally someone let me out of my cage
Now time for me is nothing, 'cause I'm counting no age
Now I couldn't be there, now you shouldn't be scared
I'm good at repairs and I'm under each snare
Intangible! Bet you didn't think so I command you to
Panoramic view, look I'll make it all manageable
Pick and choose, sit and lose all you different crews
Chicks and dudes, who you think is really kickin' tunes?
Picture you gettin' down in a picture tube

Like you lit the fuse
You think it's fictional, mystical? Maybe!
Spiritual, hero who appears
In you to clear your view when you're too crazy
Lifeless, to know the definition for what life is
Priceless for you because I put you on the hype shit
You like it?
Gun smokin' righteous with one toke
You're psychic among those
Possess you with one go

I ain't happy, I'm feeling glad
I got sunshine, in a bag
I'm useless,but not for long
The future is coming on

I ain't happy, I'm feeling glad
I got sunshine, in a bag
I'm useless, but not for long
The future is coming on
It's coming on
It's coming on
It's coming on

The essence, the basics
Without it, did you make it?
Allow me to make this
Child-like in nature
Rhythm, you have it or you don't
That's a fallacy!
I'm in them, every sproutin' tree, every child apiece
Every cloud at sea, you see with your eyes
I see destruction and demise, corruption in disguise
From this fuckin' enterprise, now I'm sucked into your lies
Through Russel, not his muscles, but percussion he provides
For me as a guide
Y'all can see me now 'cause you don't see with your eye
You perceive with your mind, that's the inner
So I'mma stick around with Russ' and be your mentor
Bust a few rhymes so motherfuckers remember
What the thought is
I brought all this so you can survive when law is lawless
Feelings, sensations that you thought was dead
No squealing, remember (that it's all in your head)

I ain't happy, I'm feeling glad
I got sunshine, in a bag
I'm useless, but not for long
The future is coming on
I ain't happy, I'm feeling glad
I got sunshine, in a bag
I'm useless, but not for long

My future is coming on
It's coming on
It's coming on
It's coming on
It's coming on

My future is coming on
It's coming on
It's coming on
It's coming on
It's coming on

My future is coming on
It's coming on
It's coming on
It's coming on
It's coming on

My future is coming on
It's coming on
It's coming on

My future is coming on
It's coming on
It's coming on

My future is coming on
It's coming on
It's coming on
My future
""",
            ),
          ] * hps.n_samples
labels = [None, None, top_prior.labeller.get_batch_labels(metas, 'cuda')]
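
As the comment in the cell notes, each sample can be given its own prompt. The sketch below shows one way to build such a list with plain dictionaries, here varying only the lyrics. The `total_length` value and the short lyric excerpts are placeholders, and nothing in this block calls Jukebox. It also shows that `[meta] * n` repeats a reference to one dictionary, which is fine for identical prompts but not for per-sample edits.

```python
# Fields shared by every sample; total_length is a placeholder for hps.sample_length.
shared = dict(artist="Gorillaz", genre="Indie", total_length=3131008, offset=0)

# One lyric excerpt per sample (placeholders taken from the song above).
lyrics_per_sample = [
    "I ain't happy, I'm feeling glad",
    "I got sunshine in a bag",
    "The future is coming on",
]

# Independent dictionaries: each sample gets its own prompt.
metas_varied = [dict(shared, lyrics=lyrics) for lyrics in lyrics_per_sample]

# List multiplication repeats a reference to the same dictionary object.
metas_repeated = [dict(shared, lyrics=lyrics_per_sample[0])] * 3

print(len(metas_varied), metas_varied[0] is metas_varied[1])
print(len(metas_repeated), metas_repeated[0] is metas_repeated[1])
```

The length of the list should match `hps.n_samples`, since the batch of labels is built from one entry per sample.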

Optionally adjust the sampling temperature (we've found .98 or .99 to be our favorite).

In [None]:
sampling_temperature = .98

lower_batch_size = 16
max_batch_size = 3 if model == "5b_lyrics" else 16
lower_level_chunk_size = 32
chunk_size = 16 if model == "5b_lyrics" else 32
sampling_kwargs = [dict(temp=.99, fp16=True, max_batch_size=lower_batch_size,
                        chunk_size=lower_level_chunk_size),
                   dict(temp=.99, fp16=True, max_batch_size=lower_batch_size,
                        chunk_size=lower_level_chunk_size),
                   dict(temp=sampling_temperature, fp16=True,
                        max_batch_size=max_batch_size, chunk_size=chunk_size)]
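
For intuition about what the temperature does, the sketch below applies standard temperature scaling to a toy set of logits. This is the textbook technique, not code taken from Jukebox, and it is an assumption that `temp` plays this role inside the sampler. The function name and the logits are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temp):
    """Convert logits to probabilities after dividing by a temperature."""
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # toy values, not real model outputs
probs_by_temp = {}
for temp in (1.0, 0.98, 0.5):
    probs_by_temp[temp] = softmax_with_temperature(logits, temp)
    print(temp, [round(p, 3) for p in probs_by_temp[temp]])
```

Temperatures below 1 concentrate probability on the most likely token, so a value such as .98 makes sampling only slightly more conservative than the unscaled distribution.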

Now we're ready to sample from the model. We'll generate the top level (2) first, followed by the first upsampling (level 1), and the second upsampling (level 0). In this Colab we load the top prior separately from the upsamplers because of memory constraints on the hosted runtimes. If you are using a local machine, you can also load all models directly with make_models, and then use sample.py's ancestral_sampling to do all of this in one step.

After each level, we decode to raw audio and save the audio files.

This next cell will take a while (approximately 10 minutes per 20 seconds of music sample).

In [None]:
if sample_hps.mode == 'ancestral':
  zs = [t.zeros(hps.n_samples,0,dtype=t.long, device='cuda') for _ in range(len(priors))]
  zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
elif sample_hps.mode == 'upsample':
  assert sample_hps.codes_file is not None
  # Load codes.
  data = t.load(sample_hps.codes_file, map_location='cpu')
  zs = [z.cuda() for z in data['zs']]
  assert zs[-1].shape[0] == hps.n_samples, f"Expected bs = {hps.n_samples}, got {zs[-1].shape[0]}"
  del data
  print('Falling through to the upsample step later in the notebook.')
elif sample_hps.mode == 'primed':
  assert sample_hps.audio_file is not None
  audio_files = sample_hps.audio_file.split(',')
  duration = (int(sample_hps.prompt_length_in_seconds*hps.sr)//top_prior.raw_to_tokens)*top_prior.raw_to_tokens
  x = load_prompts(audio_files, duration, hps)
  zs = top_prior.encode(x, start_level=0, end_level=len(priors), bs_chunks=x.shape[0])
  zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
else:
  raise ValueError(f'Unknown sample mode {sample_hps.mode}.')
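
The rule of thumb quoted before this cell (about 10 minutes per 20 seconds of audio) gives a rough idea of how long top-level sampling will run. The helper below is a back-of-the-envelope sketch with a hypothetical name. Actual times depend on the GPU and the model, and the figure does not cover the later upsampling stage.

```python
def estimate_top_level_minutes(sample_seconds, minutes_per_20s=10):
    """Rough top-level sampling time from the 10-minutes-per-20-seconds rule."""
    return sample_seconds * minutes_per_20s / 20


estimated_minutes = estimate_top_level_minutes(71)
print(estimated_minutes)  # 35.5
```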

Listen to the results from the top level (note this will sound very noisy until we do the upsampling stage). You may have more generated samples, depending on the batch size you requested.

In [None]:
Audio(f'{hps.name}/level_2/item_0.wav')

We are now done with the large top_prior model, and instead load the upsamplers.

In [None]:
# Set this False if you are on a local machine that has enough memory (this allows you to do the
# lyrics alignment visualization during the upsampling stage). For a hosted runtime, 
# we'll need to go ahead and delete the top_prior if you are using the 5b_lyrics model.
if True:
  del top_prior
  empty_cache()
  top_prior=None
upsamplers = [make_prior(setup_hparams(prior, dict()), vqvae, 'cpu') for prior in priors[:-1]]
labels[:2] = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in upsamplers]

Please note: this next upsampling step will take several hours. On the free tier, Google Colab lets you run for up to 12 hours. As upsampling completes, samples will appear in the Files tab (accessible on the left of the Colab interface), under "samples" (or whatever hps.name is currently set to). Level 1 is the partially upsampled version, and Level 0 is the fully upsampled result.

In [None]:
zs = upsample(zs, labels, sampling_kwargs, [*upsamplers, top_prior], hps)

Listen to your final sample!

In [None]:
Audio(f'{hps.name}/level_0/item_0.wav')

In [None]:
del upsamplers
empty_cache()