<a href="https://colab.research.google.com/github/olaviinha/NeuralTextToAudio/blob/main/AudioLDM_pub.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#<font face="Trebuchet MS" size="6">AudioLDM<font color="#999" size="4">&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;</font><font color="#999" size="4">Text-to-audio</font><font color="#999" size="4">&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;</font><a href="https://github.com/olaviinha/NeuralTextToAudio" target="_blank"><font color="#999" size="4">Github</font></a>

Generate audio from a text prompt using [AudioLDM](https://github.com/haoheliu/AudioLDM).

This notebook has been optimized to use both the [Diffusers AudioLDM pipeline](https://huggingface.co/docs/diffusers/main/en/api/pipelines/audioldm) (for text-to-audio generation) and the native [AudioLDM Python API](https://github.com/haoheliu/AudioLDM) (for Audio-to-Audio and Style Transfer, incl. stereo simulation). This makes the first setup slow but later usage fast (see the tips below to make future setups considerably faster).

## Instructions and tips



### __Notebook usage__
#### __General__
- `mount_drive` is optional but highly recommended, as it enables you to auto-save all generated WAV files as well as used checkpoints directly to your Google Drive (and thus sync to your computer in near real-time, if you have Google Drive installed). Should you opt not to mount Google Drive, directory _faux_drive_ (`/content/faux_drive`) found in the Files browser of the Colab runtime works as if it was your _My Drive_. You may use it to upload/download files via Colab's own Files browser pretending it's your Google Drive.
- All directory and file paths should be relative to your Google Drive root (My Drive). E.g. `output_dir` value should be `Music/AI-Generated-Sounds` if you have a directory called _Music_ in your Drive, containing a subdirectory called _AI-Generated-Sounds_. All paths are case-sensitive.
- Checkpoint `audioldm-l-full` requires Premium GPU. Other checkpoints should work with standard GPU.
- `local_models_dir` (optional) will save the used checkpoints to your Google Drive and load them from there if they are already available. This saves significant time in Setup the next time you use the notebook.
- `output_dir` is where the generated WAV files will be saved.
- `batch` will just repeat whatever you're generating that many times.
- If seed is set to `0` (zero), a random seed will be used.
- You may use a `;` (semicolon) in the prompt field as a separator, in which case a separate audio file will be generated for each semicolon-separated prompt in a single run.
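
The semicolon and `batch` behaviour described above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's exact code: `expand_prompts` is a hypothetical helper name, and dropping empty parts is an assumption made for the sketch.

```python
def expand_prompts(prompt, batch=1):
    """Split a semicolon-separated prompt field and repeat it `batch` times."""
    # Each semicolon-separated part becomes its own prompt.
    # Assumption for this sketch: empty parts are dropped.
    prompts = [p.strip() for p in prompt.split(';') if p.strip()]
    # A batch of 0 is treated as 1, so at least one run always happens.
    return prompts * max(int(batch), 1)


jobs = expand_prompts("rain on a tin roof; distant thunder", batch=2)
print(jobs)
```

Each entry in `jobs` corresponds to one generated audio file, so two prompts with `batch` set to 2 yield four files.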

#### __Quality simulation__
Some audio quality enhancements using traditional methods – unrelated to AudioLDM per se – have been added to this notebook. These settings will not by any means make the audio sound excellent, but perhaps slightly better than the default 16 kHz mono.
- `stereo_width` (when greater than zero) will generate a secondary audio file by applying the same prompt as low-strength Style Transfer on the initial generation, and combine the two files as left and right channels with the given width.
- `convert_to_44khz` will convert the generated 16 kHz audio to 44.1 kHz audio __after__ generation, using FFmpeg with a decent interpolation filter. This can make some types of audio sound noticeably better.
- `simulate_high_end` will generate some high end (>10 kHz) by traditional methods using pitch shift and highpass filtering.
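
How `stereo_width` blends the two generations can be illustrated with a small sketch. It mirrors the cross-mixing idea using plain Python lists instead of NumPy arrays; `mix_stereo` is a hypothetical name and the sample values are made up for illustration.

```python
def mix_stereo(left, right, width):
    """Cross-mix two mono signals into a (left, right) pair.

    width=1.0 keeps the two signals fully separate, while width=0.5
    makes both output channels identical (effectively mono).
    """
    out_left = [l * width + r * (1 - width) for l, r in zip(left, right)]
    out_right = [r * width + l * (1 - width) for l, r in zip(left, right)]
    return out_left, out_right


left = [1.0, 0.0, -1.0]   # initial generation (made-up samples)
right = [0.0, 1.0, 1.0]   # style-transferred variant (made-up samples)

wide_l, wide_r = mix_stereo(left, right, 1.0)
mono_l, mono_r = mix_stereo(left, right, 0.5)
```

Values between 0.5 and 1.0 trade stereo spread against how different the two channels sound.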

#### __Audio-to-Audio generation__
- Enter a file path to `init_audio_file` field.
- Remove all text from `prompt` field.
- Set `style_strength` to zero.

#### __Style Transfer__ 
- Enter a file path to `init_audio_file` field.
- Describe the style you want in `prompt` field.
- Set `style_strength` to greater than zero.
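
The way these form fields select a mode can be summarized in a short sketch. It is a simplified reading of the instructions above, not the notebook's actual control flow; `choose_mode` and the file path are hypothetical.

```python
def choose_mode(init_audio_file, style_strength):
    """Return the generation mode implied by the form values."""
    if not init_audio_file:
        # No init audio: plain text-to-audio generation from the prompt.
        return 'text-to-audio'
    if style_strength > 0:
        # Init audio plus a non-zero strength: Style Transfer guided by the prompt.
        return 'style transfer'
    # Init audio with zero strength: Audio-to-Audio generation.
    return 'audio-to-audio'


print(choose_mode('', 0))
print(choose_mode('Music/loop.wav', 0))
print(choose_mode('Music/loop.wav', 0.3))
```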

### __Prompt tips__
Naturally, a good prompt depends on what you're after, but generally:
- Consider adding a more detailed description of the kind of sound you want (add adjectives, etc.).
- For better quality, you may try adding keywords generally associated with better quality, for example:
  - in studio
  - studio recording
  - high quality
  - album (for music)
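
If you build prompts in code, the keyword tip can be applied with a small helper. This is only a sketch: `add_quality_keywords` is a hypothetical function, and the default keywords are taken from the examples above.

```python
def add_quality_keywords(prompt, keywords=("studio recording", "high quality")):
    """Append quality-related keywords to a prompt, skipping any already present."""
    parts = [prompt.strip()]
    for kw in keywords:
        # Case-insensitive check so a keyword is not added twice.
        if kw.lower() not in prompt.lower():
            parts.append(kw)
    return ", ".join(parts)


print(add_quality_keywords("solo piano"))
```

Whether a given keyword actually helps depends on the prompt and checkpoint, so it is worth comparing results with and without it.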

### __Advanced__
If you are familiar with Python scripting and want to generate prompt lists programmatically (or just for clarity), you may assign your prompts to `prompt_list = []` and use the list by entering the text value `prompt_list` in the `prompt` field. If it is a list of text prompts, it is used in the same way as a `prompt` field value containing semicolons. You may also include prompt-specific parameters in `prompt_list` by making it a list of lists, where each sublist is in the format `['<prompt>', <seed>, <guidance_scale>, '<output_dir>']`. The Generate cell also reads a fifth element, `'<output_file>'`, as the output file name.
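
As a rough sketch of the list-of-lists form, the following builds `prompt_list` programmatically. All values (sounds, seed, directory and file names) are made up for illustration. The fifth element is included because the Generate cell's code reads it as the output file name.

```python
# Hypothetical values for illustration only.
sounds = ["rain on a tin roof", "distant thunder"]
guidance_scales = [2.5, 4.0]

prompt_list = []
for sound in sounds:
    for scale in guidance_scales:
        # A unique file name per entry so outputs do not overwrite each other.
        file_name = sound.replace(' ', '_') + '_' + str(scale) + '.wav'
        # Sublist format: [prompt, seed, guidance_scale, output_dir, output_file]
        prompt_list.append([sound + ", high quality", 12345, scale, "weather", file_name])

for entry in prompt_list:
    print(entry)
```

Run a cell like this after Setup (which resets `prompt_list` to an empty list), then enter `prompt_list` in the `prompt` field.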

# Setup

In [None]:
##@title #Setup
#@markdown This cell needs to be run only once. It will mount your Google Drive and setup prerequisites.<br>
#@markdown <small>Mounting Drive will enable this notebook to save outputs directly to your Drive. Otherwise you will need to copy/download them manually from this notebook.</small>

# Print colors
class c:
  title = '\033[96m'
  ok = '\033[92m'
  okb = '\033[94m'
  warn = '\033[93m'
  fail = '\033[31m'
  endc = '\033[0m'
  bold = '\033[1m'
  dark = '\33[90m'
  u = '\033[4m'

def op(typex, msg, value='', time=False):
  if time == True:
    stamp = timestamp(human_readable=True)
    typex = c.dark+stamp+' '+typex
  if value != '':
    print(typex+msg+c.endc, end=' ')
    print(value)
  else:
    print(typex+msg+c.endc)

quick_setup = False #@ param {type:"boolean"}

use_diffusers = True
use_github = True if quick_setup == False else False

force_setup = False
repositories = ['https://github.com/haoheliu/AudioLDM.git']
apt_packages = 'ffmpeg'
mount_drive = True #@param {type:"boolean"}
skip_setup = False #@ param {type:"boolean"}
local_models_dir = "" #@param {type:"string"}
use_checkpoint = "audioldm-m-full" #@param ["audioldm-s-full", "audioldm-l-full", "audioldm-s-full-v2","audioldm-m-text-ft", "audioldm-s-text-ft", "audioldm-m-full"]

if '-text-' in use_checkpoint:
  use_diffusers = False

pip_packages = 'transformers diffusers accelerate' if use_diffusers == True else ''

if quick_setup == True:
  op(c.title, 'Performing quick setup')

if use_github == False:
  repositories = []
  apt_packages = ''
  if local_models_dir != '':
    op(c.fail, '!! local_models_dir is ignored on quick setup')
  local_models_dir = ''

ckpt_urls = {
  "audioldm-s-full": "https://zenodo.org/record/7600541/files/audioldm-s-full?download=1",
  "audioldm-l-full": "https://zenodo.org/record/7698295/files/audioldm-full-l.ckpt?download=1",
  "audioldm-s-full-v2": "https://zenodo.org/record/7698295/files/audioldm-full-s-v2.ckpt?download=1",
  "audioldm-m-text-ft": "https://zenodo.org/record/7813012/files/audioldm-m-text-ft.ckpt?download=1",
  "audioldm-s-text-ft": "https://zenodo.org/record/7813012/files/audioldm-s-text-ft.ckpt?download=1",
  "audioldm-m-full": "https://zenodo.org/record/7813012/files/audioldm-m-full.ckpt?download=1"
}

ckpt_url = ckpt_urls[use_checkpoint]
use_ckpt = ckpt_url.split('files/')[1].split('?')[0]

import os
from google.colab import output, files
import warnings
warnings.filterwarnings('ignore')
%cd /content/

if pip_packages != '':
  !pip -q install {pip_packages}
if apt_packages != '':
  !apt-get update && apt-get install {apt_packages}

if use_diffusers == True:
  import torch
  from diffusers import AudioLDMPipeline
  from transformers import AutoProcessor, ClapModel

  # Use the GPU with half precision when available, otherwise fall back to CPU with full precision
  if torch.cuda.is_available():
      device = "cuda"
      torch_dtype = torch.float16
  else:
      device = "cpu"
      torch_dtype = torch.float32

  # load the diffusers pipeline
  if use_checkpoint == 'audioldm-s-full':
    repo_id = "cvssp/audioldm"
  else:
    repo_id = "cvssp/" + use_checkpoint
  pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch_dtype).to(device)
  pipe.unet = torch.compile(pipe.unet)

  # CLAP model (only required for automatic scoring)
  clap_model = ClapModel.from_pretrained("sanchit-gandhi/clap-htsat-unfused-m-full").to(device)
  processor = AutoProcessor.from_pretrained("sanchit-gandhi/clap-htsat-unfused-m-full")
  generator = torch.Generator(device)

import sys, time, ntpath, string, random, librosa, librosa.display, IPython, shutil, math, psutil, datetime, requests, pytz
import numpy as np
import soundfile as sf
from datetime import timedelta

def gen_id(type='short'):
  id = ''
  if type == 'timestamp':
    id = timestamp()
  if type == 'short':
    id = requests.get('https://api.inha.asia/k/?type=short').text
  if type == 'long':
    id = requests.get('https://api.inha.asia/k').text
  return id

def timestamp(no_slash=False, human_readable=False, helsinki_time=True, date_only=False):
  if helsinki_time == True:
    dt = datetime.datetime.now(pytz.timezone('Europe/Helsinki'))
  else:
    dt = datetime.datetime.now()
  if no_slash == True:
    dt = dt.strftime("%Y%m%d%H%M%S")
  else:
    if human_readable == True:
      dt = dt.strftime("%Y-%m-%d %H:%M:%S")
    else:
      if date_only == True:
        dt = dt.strftime("%Y-%m-%d")
      else:
        dt = dt.strftime("%Y-%m-%d_%H%M%S")
  return dt

def fix_path(path, add_slash=False):
  if path.endswith('/'):
    path = path #path[:-1]
  if not path.endswith('/'):
    path = path+"/"
  if path.startswith('/') and add_slash == True:
    path = path[1:]
  return path
  
def path_leaf(path):
  head, tail = ntpath.split(path)
  return tail or ntpath.basename(head)

def path_dir(path):
  return path.replace(path_leaf(path), '')

def path_ext(path, only_ext=False):
  filename, extension = os.path.splitext(path)
  if only_ext == True:
    extension = extension[1:]
  return extension

def basename(path):
  filename = os.path.basename(path).strip()#.replace(" ", "_")
  filebase = os.path.splitext(filename)[0]
  return filebase

def slug(s):
  valid_chars = "-_. %s%s" % (string.ascii_letters, string.digits)
  file = ''.join(c for c in s if c in valid_chars)
  file = file.replace(' ','_')
  return file
  
def fetch(url, save_as):
  headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}
  try:
    r = requests.get(url, stream=True, headers=headers, timeout=5)
    if r.status_code == 200:
      with open(save_as, 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)
      resp = r.status_code
    else:
      resp = 0
  except requests.exceptions.ConnectionError as e:
    r = 0
    resp = r
  return resp

def list_audio(path, midi=False):
  # glob and join are not imported elsewhere in this cell, so import them here
  from glob import glob
  from os.path import join
  audiofiles = []
  for ext in ('*.wav', '*.aiff', '*.aif', '*.caf', '*.flac', '*.mp3', '*.m4a', '*.ogg', '*.WAV', '*.AIFF', '*.AIF', '*.CAF', '*.FLAC', '*.MP3', '*.OGG'):
    audiofiles.extend(glob(join(path, ext)))
  if midi is True:
    for ext in ('*.mid', '*.midi', '*.MID', '*.MIDI'):
      audiofiles.extend(glob(join(path, ext)))
  audiofiles.sort()
  return audiofiles

def audio_player(input, sr=44100, limit_duration=2):
  if type(input) != np.ndarray:
    input, sr = librosa.load(input, sr=None, mono=False)
  if limit_duration > 0:
    last_sample = math.floor(limit_duration*60*sr)
    if input.shape[-1] > last_sample:
      input = input[..., :last_sample]  # trim along the sample axis (works for mono and multichannel)
      op(c.warn, 'WARN! Playback of below audio player is limited to first '+str(limit_duration)+' minutes to prevent Colab from crashing.\n')
  IPython.display.display(IPython.display.Audio(input, rate=sr))

# Mount Drive
if mount_drive == True:
  if not os.path.isdir('/content/drive'):
    from google.colab import drive
    drive.mount('/content/drive')
  if not os.path.isdir('/content/mydrive'):
    os.symlink('/content/drive/My Drive', '/content/mydrive')
  # Always set drive_root, also when the cell is re-run and both paths already exist
  drive_root = '/content/mydrive/'
  drive_root_set = True
else:
  drive_root = '/content/faux_drive/'
  if not os.path.isdir(drive_root):
    os.mkdir(drive_root)
  

if mount_drive == False:
  local_models_dir = ''

if len(repositories) > 0 and skip_setup == False:
  for repo in repositories:
    %cd /content/
    install_dir = fix_path('/content/'+path_leaf(repo).replace('.git', ''))
    repo = repo if '.git' in repo else repo+'.git'
    !git clone {repo}
    if os.path.isfile(install_dir+'setup.py') or os.path.isfile(install_dir+'setup.cfg'):
      !pip install -e {install_dir}
    if os.path.isfile(install_dir+'requirements.txt'):
      !pip install -r {install_dir}/requirements.txt

if len(repositories) == 1:
  %cd {install_dir}

dir_tmp = '/content/tmp/'
if not os.path.isdir(dir_tmp): os.mkdir(dir_tmp)

use_ckpt_path = os.path.expanduser('~')+'/.cache/audioldm/'

if not os.path.isdir(use_ckpt_path):
  os.makedirs(use_ckpt_path)

if os.path.isfile(use_ckpt_path+use_ckpt):
  op(c.ok, 'Checkpoint found:', use_ckpt)
else:
  if local_models_dir != '':
    models_dir = drive_root+fix_path(local_models_dir)
    if not os.path.isdir(models_dir):
      os.makedirs(models_dir)
    # for ckpt_url in ckpt_urls:
    #   use_ckpt = ckpt_url.split('files/')[1].split('?')[0]
    if os.path.isfile(models_dir+use_ckpt):
      op(c.title, 'Fetching local ckpt:', models_dir.replace(drive_root, '')+use_ckpt)
      shutil.copy(models_dir+use_ckpt, use_ckpt_path+use_ckpt)
      op(c.ok, 'Done.')
    else:
      op(c.warn, 'Downloading '+use_ckpt+' to ', models_dir.replace(drive_root, ''))
      !wget {ckpt_url} -O {models_dir}{use_ckpt}
      shutil.copy(models_dir+use_ckpt, use_ckpt_path+use_ckpt)
      op(c.ok, 'Done.')
  else:
    # for ckpt_url in ckpt_urls:
    #   use_ckpt = ckpt_url.split('files/')[1].split('?')[0]
    if quick_setup == False:
      models_dir = use_ckpt_path
      op(c.warn, 'Downloading', use_ckpt)
      !wget {ckpt_url} -O {models_dir}{use_ckpt}
      # shutil.copy(models_dir+use_ckpt, use_ckpt_path+use_ckpt)
      op(c.ok, 'Done.')
    else:
      op(c.warn, 'Skipping AudioLDM checkpoints...')

if use_github:
  _ckpt_path = use_ckpt_path+use_ckpt
  op(c.title, 'Build model', _ckpt_path)
  sys.path.append('/content/AudioLDM/audioldm/')
  from audioldm import text_to_audio, style_transfer, super_resolution_and_inpainting, build_model, latent_diffusion
  audioldm = build_model(ckpt_path=_ckpt_path, model_name=use_checkpoint)

def round_to_multiple(number, multiple):
  x = multiple * round(number / multiple)
  if x == 0: x = multiple
  return x

def text2audioDiffusers(text, negative_prompt, duration, guidance_scale, random_seed, n_candidates):
  waveforms = pipe(
      text,
      audio_length_in_s=duration,
      guidance_scale=guidance_scale,
      negative_prompt=negative_prompt,
      num_waveforms_per_prompt=n_candidates if n_candidates else 1,
      generator=generator.manual_seed(int(random_seed)),
  )["audios"]

  if waveforms.shape[0] > 1:
      waveform = score_waveforms(text, waveforms)
  else:
      waveform = waveforms[0]
  return waveform

def score_waveforms(text, waveforms):
  inputs = processor(text=text, audios=list(waveforms), return_tensors="pt", padding=True)
  inputs = {key: inputs[key].to(device) for key in inputs}
  with torch.no_grad():
      logits_per_text = clap_model(**inputs).logits_per_text  # this is the audio-text similarity score
      probs = logits_per_text.softmax(dim=-1)  # we can take the softmax to get the label probabilities
      most_probable = torch.argmax(probs)  # and now select the most likely audio waveform
  waveform = waveforms[most_probable]
  return waveform
  
def text2audio(text, duration, audio_path, guidance_scale, random_seed, n_candidates, steps):
  waveform = text_to_audio(
    audioldm,
    text,
    audio_path,
    random_seed,
    duration=duration,
    guidance_scale=guidance_scale,
    ddim_steps=steps,
    n_candidate_gen_per_text=int(n_candidates)
  )
  if(len(waveform) == 1):
    waveform = waveform[0]
  return waveform

def styleaudio(text, duration, audio_path, strength, guidance_scale, random_seed, steps):
  waveform = style_transfer(
    audioldm,
    text,
    audio_path,
    strength,
    random_seed,
    duration=duration,
    guidance_scale=guidance_scale,
    ddim_steps=steps,
  )
  if(len(waveform) == 1):
    waveform = waveform[0]
  return waveform


# time_mask_ratio_start_and_end=(0.10, 0.15), # regenerate the 10% to 15% of the time steps in the spectrogram
# time_mask_ratio_start_and_end=(1.0, 1.0), # no inpainting
# freq_mask_ratio_start_and_end=(0.75, 1.0), # regenerate the higher 75% to 100% mel bins
# freq_mask_ratio_start_and_end=(1.0, 1.0), # no super-resolution
def superres(text, duration, audio_path, guidance_scale, random_seed, n_candidates, steps):
  waveform = super_resolution_and_inpainting(
    audioldm,
    text,
    audio_path,
    random_seed,
    ddim_steps=steps,
    duration=duration,
    guidance_scale=guidance_scale,
    n_candidate_gen_per_text=n_candidates,
    freq_mask_ratio_start_and_end=(0.75, 1.0)
  )
  if(len(waveform) == 1):
    waveform = waveform[0]
  return waveform

def narrow_stereo(left_data, right_data, amount):
  left = left_data * amount + right_data * (1-amount)
  right = right_data * amount + left_data * (1-amount)
  return np.array([left, right])

def extract_prompt(file_path):
  # Recover [prompt, seed, guidance_scale] from a generated file name
  bn = basename(file_path)
  parts = bn.split('_')
  prompt = []
  seed = None
  guidance_scale = None
  for i, part in enumerate(parts):
    if i > 0 and not part.replace('.','').isnumeric() and part != '':
      prompt.append(part)
    elif len(part) > 6 and part.isnumeric():
      seed = int(part)
    elif '.' in part:
      guidance_scale = float(part)
  prompt = ' '.join(prompt)
  return [prompt, seed, guidance_scale]

import locale
def getpreferredencoding(do_setlocale = True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding

from scipy.signal import butter, lfilter, freqz

def highpass_audio(input, cutoff=10000, fs=44100, order=6):
  if type(input) == np.ndarray:
    data = input
    sr = fs
  else:
    data, sr = librosa.load(input, sr=None, mono=False)
  b, a = butter(order, cutoff, fs=sr, btype='highpass', analog=False)  # design the filter for the actual sample rate
  y = lfilter(b, a, data)
  return y

def apply_highpass_mix(input, hipass_vol=0.5, mix_vol=0.8, cutoff=10000, fx=44100, order=6):
  if type(input) == np.ndarray:
    data = input
    sr = fx
  else:
    data, sr = librosa.load(input, sr=None, mono=False)
  octave_up = librosa.effects.pitch_shift(data, sr=sr, n_steps=12, res_type='soxr_vhq') * hipass_vol
  hipass = highpass_audio(octave_up, cutoff, fx, order)
  return data*mix_vol+hipass

def applyInhaCSS():
  display(IPython.display.HTML('''
    <style type="text/css">
      :root {
        --bg-color: #eee;
        --fg-color: #333;
        --radius: 4px;
        --border: 0;
        --bold: 600;
        --dimmed: #ccc;
        --darkened: #aaa;
        --v-margin: 10px;
      }
      input.inhads,
      button.inhads {
        display: inline-block;
        border: var(--border);
        border-radius: var(--radius);
        background: var(--bg-color);
        color: var(--fg-color);
        margin-top: var(--v-margin);
        margin-bottom: calc(var(--v-margin) * 2);
      }
      input.inhads {
        padding: 10px 15px;
        min-width: 30%;
      }
      input.inhads.input-with-button {
        border-top-left-radius: var(--radius);
        border-bottom-left-radius: var(--radius);
        border-top-right-radius: 0;
        border-bottom-right-radius: 0;
      }
      button.inhads {
        cursor: pointer;
      }
      button.inhads.input-button {
        display: inline-block;
        margin: 0;
        padding: 4.5px 10px;
        position: relative;
        top: 3px;
        font-size: 20px;
        border-top-left-radius: 0;
        border-bottom-left-radius: 0;
        border-top-right-radius: var(--radius);
        border-bottom-right-radius: var(--radius)
      }
      button.inhads.download-button {
        padding: 10px 15px;
        font-weight: var(--bold);
      }
      button.inhads.download-button.done {
        background: var(--dimmed);
      }
      button.inhads.download-button.done::before {
        content: '✔ ';
        color: #080;
      }
      .inhads.input-note {
        padding-left: 10px;
        display: none;
        font-weight: var(--bold);
      }
      .inhads.disabled {
        color: #666;
        background: var(--darkened);
      }
    </style>
  '''))

def dl_btn(file_path, show_path=False, show_filename=False):
  applyInhaCSS()
  id = rnd_str(8)
  view_path = file_path if show_path else path_leaf(file_path) if show_filename else ''
  def download_file():
    files.download(file_path)
  display(IPython.display.HTML('''
    <button class="inhads download-button" id="btn_'''+id+'''">Download</button> '''+view_path+'''
    <script>
      document.querySelector("#btn_'''+id+'''").onclick = () => {
        let btn = document.querySelector("#btn_'''+id+'''");
        google.colab.kernel.invokeFunction("notebook.download'''+id+'''", [], {});
        btn.innerHTML='Downloading...';
        btn.classList.add('disabled');
        btn.disabled=true;
        setTimeout(() => {
          btn.innerHTML='Downloaded';
          btn.classList.remove('disabled');
          btn.classList.add('done');
          btn.disabled=false;
        }, 2000);
      };
    </script>
  '''))
  output.register_callback('notebook.download'+id, download_file)

def rnd_str(length):
  letters = string.ascii_lowercase
  result_str = ''.join(random.choice(letters) for i in range(length))
  return result_str

prompt_list = []
first_generated = False

output.clear()
# !nvidia-smi
print()
op(c.title, 'Using:', use_ckpt, time=True)
op(c.ok, 'Setup finished.', time=True)
print()


# Generate audio

In [None]:
##@title # Generate audio

start_from = 1

prompt = "" #@param {type:"string"}
negative_prompt = "low quality, average quality, snare" #@param {type:"string"}

output_dir = "" #@param {type:"string"}
duration = 10 #@param {type:"slider", min:2.5, max:30, step:2.5}
guidance_scale = 2.5 #@param {type:"slider", min:2, max:5, step:0.5}
seed = 0 #@param {type:"integer"}
candidates = 3 #@param {type:"slider", min:2, max:5, step:1}
batch = 1 #@param {type:"integer"}


#@markdown <br>

#@markdown #### <b>Style Transfer & Audio-to-Audio</b> settings – Ignore these settings if you just want to generate audio by text prompt.
init_audio_file = "" #@param {type:"string"}
style_strength = 0 #@param {type:"slider", min:0, max:1, step:0.05}

#@markdown <br>

#@markdown #### <b>Quality simulation</b> by traditional methods (unrelated to AudioLDM)
convert_to_44khz = True #@param {type:"boolean"}
stereo_width = 0.65 #@param {type:"slider", min:0, max:1, step:0.05}
keep_16k = True #@ param {type:"boolean"}
simulate_high_end = 0.25 #@param {type:"slider", min:0, max:0.5, step:0.05}
display_players = True #@ param {type:"boolean"}


# what_to_do = "Audio-to-audio generation" #@param ["Audio-to-audio generation", "Super-resolution", "Style Transfer"]

what_to_do = None
superresolution = False #@ param {type:"boolean"}

trunc = 150
settings_in_filename = False

if what_to_do == 'Audio-to-audio-generation': action = 'audio2audio'
if what_to_do == 'Super-resolution': action = 'superres'
if what_to_do == 'Style Transfer': action = 'style'
if what_to_do == 'Inpaint': action = 'inpaint'

ddim_steps = 200
og_seed = seed
og_duration = duration
uniq_id = gen_id()
sr = 16000

# Prompt/input
if ';' in prompt:
  inputs = prompt.split(';')
elif prompt == 'prompt_list':
  inputs = prompt_list
else:
  inputs = [prompt]

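# Strip whitespace from plain text prompts; list-form prompts are left as they are
if len(inputs) > 0 and isinstance(inputs[0], str):
  inputs = [x.strip() for x in inputs]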

# Output
if output_dir == '':
  if mount_drive is True:
    dir_out = dir_tmp
  if mount_drive is False:
    dir_out = drive_root+'generated-audio/'
    if not os.path.isdir(dir_out):
      os.mkdir(dir_out)
else:
  if not os.path.isdir(drive_root+output_dir):
    os.makedirs(drive_root+output_dir)
  dir_out = drive_root+fix_path(output_dir)

if convert_to_44khz == True and keep_16k == True:
  dir_16k = dir_out+'16khz/'
  if not os.path.isdir(dir_16k):
    os.mkdir(dir_16k)
    
og_dir_out = dir_out
if batch == 0: batch = 1  
inputs = inputs * batch

timer_start = time.time()
total = len(inputs)
action = 'generate'
init_path = None

op(c.title, 'Run ID', uniq_id, time=True)

for i, input in enumerate(inputs, start_from):

  dir_out = og_dir_out

  if not i % 10:
    output.clear()
    op(c.warn, 'Cell output is cleared every 10 generations to keep Colab running smoothly.', time=True)
    op(c.warn, 'You can find all audio files from directory', dir_out.replace(drive_root, ''), time=True)
    print()
    op(c.title, 'Run ID', uniq_id, time=True)
    
  prompt = input
  
  predefined_file_out = ''

  if isinstance(input, list):
    op(c.warn, 'Prompt-specific parameters found, ignoring form values.', time=True)
    prompt = input[0]
    seed = int(input[1])
    og_seed = seed
    guidance_scale = float(input[2])
    outd = input[3]
    if len(input) > 4:
      predefined_file_out = input[4]
    if outd != '':
      if not os.path.isdir(dir_out+outd):
        os.mkdir(dir_out+outd)
      dir_out = dir_out+outd+'/'
  
  ndx_info = str(i)+'/'+str(total)+' '
  print()

  if os.path.isfile(dir_out+predefined_file_out):
    op(c.warn, ndx_info+'Already exists, skipping', predefined_file_out)
    continue

  if init_audio_file != '':
    if os.path.isfile(drive_root+init_audio_file):
      init_path = drive_root+init_audio_file
      if superresolution is True:
        action = 'superres'
      elif style_strength > 0:
        init_filename = path_leaf(init_path)
        op(c.title, ndx_info+'Styling audio:', init_path.replace(drive_root, ''), time=True)
        op(c.title, 'With prompt:', prompt, time=True)
        action = 'style'
      else:
        op(c.title, ndx_info+'Audio-to-audio generation:', init_path.replace(drive_root, ''), time=True)
        # op(c.title, 'With prompt:', prompt, time=True)
        prompt = None
        action = 'audio2audio'
      # Trim duration if init duration is shorter than given duration
      init_y, init_sr = librosa.load(init_path, sr=None, mono=True)
      init_duration = librosa.get_duration(y=init_y, sr=init_sr)
      duration = round_to_multiple(init_duration, 2.5) if init_duration < og_duration else duration
      
    else:
      op(c.fail, ndx_info+'Init audio file not found!', time=True)
      sys.exit('Make sure init_audio_file is a valid audio file and a valid file path relative to your My Drive.')
  else:
    op(c.title, ndx_info+'Generating audio:', prompt, time=True)

    if first_generated == False:
      op(c.warn, 'First generation takes a short while to start. Next generations will be considerably faster.', time=True)
      print()
      first_generated = True

    if isinstance(input, list):
      print('File:', path_leaf(predefined_file_out))
      print('Using seed:', seed)
      print('Using guidance scale:', guidance_scale)

  if og_seed == 0: seed = int(time.time()) - 1229904000 + random.randint(11111111, 99999999)

  
  addn = str(seed)+'_'+str(guidance_scale)+'_' if settings_in_filename == True else ''
  fo_head = dir_out+uniq_id+'_'+str(i).zfill(3)+'__'+addn

  if action == 'generate':
    if predefined_file_out != '':
      file_out = dir_out+predefined_file_out
    else:
      file_out = fo_head+slug(prompt)[:trunc]+'.wav'

    if use_diffusers == True:
      generated_audio = text2audioDiffusers(prompt, negative_prompt, duration, guidance_scale, seed, candidates)
    else:
      generated_audio = text2audio(prompt, duration, None, guidance_scale, seed, candidates, ddim_steps)


  elif action == 'audio2audio':
    file_out = fo_head+basename(init_path)+'.wav'
    generated_audio = text2audio('placeholder', duration, init_path, guidance_scale, seed, candidates, ddim_steps)
  elif action == 'superres':
    file_out = fo_head+basename(init_path)+'.wav'
    y, init_sr = librosa.load(init_path, sr=None)  # keep the global 16 kHz output rate in sr
    duration = librosa.get_duration(y=y, sr=init_sr)
    if duration > 30: duration = 30
    generated_audio = superres(None, duration, init_path, guidance_scale, seed, candidates, ddim_steps)
  elif action == 'style':
    file_out = fo_head+basename(init_path)+'_'+slug(prompt)[:trunc]+'.wav'
    generated_audio = styleaudio(prompt, duration, init_path, style_strength, guidance_scale, seed, ddim_steps)
  else:
    op(c.fail, 'Something went wrong.')
    sys.exit()

  if stereo_width > 0:
    op(c.okb, 'Working on stereo...')
    lefty = dir_tmp+'left.wav'
    righto = dir_tmp+'right.wav'
    sf.write(lefty, generated_audio.T, sr, subtype='PCM_16')
    stereo_strength = 0.15  # low-strength style transfer; leaves the form's style_strength untouched
    left_channel, init_sr = librosa.load(lefty, sr=None, mono=True)
    left_duration = librosa.get_duration(y=left_channel, sr=sr)
    duration = round_to_multiple(left_duration, 2.5)
    right_channel = styleaudio(prompt, duration, lefty, stereo_strength, guidance_scale, seed, ddim_steps)
    sf.write(righto, right_channel.T, sr, subtype='PCM_16')
    left_channel, _ = librosa.load(lefty, sr=None)
    right_channel, _ = librosa.load(righto, sr=None)
    last_sample = min([len(left_channel), len(right_channel)])
    generated_audio = narrow_stereo(left_channel[:last_sample], right_channel[:last_sample], stereo_width)

  if convert_to_44khz == True:
    op(c.okb, 'Converting to 44.1 kHz...')
    tmp_16k = dir_tmp+'16k.wav'
    tmp_44k = dir_tmp+'44k.wav'
    sf.write(tmp_16k, generated_audio.T, sr, subtype='PCM_24')
    !ffmpeg -hide_banner -loglevel panic -i "{tmp_16k}" -ar 44100 "{tmp_44k}"
    if simulate_high_end > 0:
      op(c.okb, 'Simulating high-end...')
      hpm = apply_highpass_mix(tmp_44k, hipass_vol=simulate_high_end, mix_vol=0.8, cutoff=11000)
      sf.write(file_out, hpm.T, 44100, subtype='PCM_24')
      os.remove(tmp_44k)
    else:
      shutil.copy(tmp_44k, file_out)
      os.remove(tmp_44k)
    if keep_16k == True:
      shutil.copy(tmp_16k, file_out.replace(dir_out, dir_16k))
    os.remove(tmp_16k)
  else:
    sf.write(file_out, generated_audio.T, sr, subtype='PCM_24')

  if os.path.isfile(file_out):
    if display_players: audio_player(file_out, sr=sr)
    print()
    if output_dir == '':
      dl_btn(file_out, show_filename=True)
    else:
      op(c.ok, 'Saved as', file_out.replace(drive_root, ''), time=True)
  else:
    op(c.fail, 'Error saving', file_out.replace(drive_root, ''), time=True)  

# -- END THINGS --

timer_end = time.time()

print()
op(c.okb, 'Elapsed', timedelta(seconds=timer_end-timer_start), time=True)
op(c.ok, 'FIN.')