# Bark generation

## Optimizations

There are several ways to speed up generation; see the Hugging Face [article on optimizing Bark](https://huggingface.co/blog/optimizing-bark).

This notebook uses three of them:
1. Loading the model in half precision (float16)
2. Converting the model to BetterTransformer, which fuses some operations to make them faster
3. Generating in batches

Also remember that you can use `suno/bark-small` instead of `suno/bark` if a smaller, faster model is acceptable.

### Load model in float16 and convert to BetterTransformer

In [None]:
import torch
from optimum.bettertransformer import BetterTransformer
from transformers import BarkProcessor, BarkModel
from bark_tinkering.adversarial_speaker import find_semantics_by_wav, \
    create_voice_preset, save_voice_preset_safetensors, load_voice_preset_safetensors, load_voice_preset_numpy
from bark_tinkering.utils import make_text_generations, save_audio_from_generation
import IPython

device = 'cuda'
transformers_cache_dir = '.cache'
model_id = 'suno/bark'

processor: BarkProcessor = BarkProcessor.from_pretrained(model_id, cache_dir=transformers_cache_dir)

# Load model in float16
model: BarkModel = BarkModel.from_pretrained(model_id, cache_dir=transformers_cache_dir, torch_dtype=torch.float16).to(device)

# Convert model to BetterTransformer
model = BetterTransformer.transform(model, keep_original_model=False)
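
To check whether these optimizations actually help on your hardware, it is worth timing generation before and after applying them. The helper below is a minimal sketch and not part of `bark_tinkering`; `time_call` and its `sync` parameter are hypothetical names introduced here, and the workload is a stand-in so the example runs without a GPU.

```python
import time
from statistics import median
from typing import Callable, Optional


def time_call(fn: Callable[[], object], repeats: int = 3,
              sync: Optional[Callable[[], None]] = None) -> float:
    """Return the median wall-clock time in seconds of calling fn().

    `sync` is an optional hook (hypothetical, for illustration) that is
    called before starting and before stopping the clock, so that
    asynchronous GPU work is included in the measurement.
    """
    timings = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # flush pending work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work to finish before stopping the clock
        timings.append(time.perf_counter() - start)
    return median(timings)


# Stand-in workload so the sketch runs without a GPU or the model.
elapsed = time_call(lambda: sum(range(100_000)))
print(f"median time: {elapsed:.4f} s")
```

With the real model, one would presumably pass something like `lambda: model.generate(**inputs)` as `fn` and `torch.cuda.synchronize` as `sync`, once the processor inputs have been built.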

### Generate in batches

To generate a batch, pass a list of prompts to the processor instead of a single string.

In [None]:
voice_preset = load_voice_preset_safetensors('voice_presets/lina.safetensors')

text_prompt = [
    "Let's try generating speech, with Bark, a text-to-speech model",
    "Wow, batching is so great!",
    "I love Hugging Face it's so cool."]

inputs = processor(text_prompt, voice_preset=voice_preset).to(device)
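
When there are more prompts than fit into GPU memory at once, they can be split into fixed-size batches and processed one batch at a time. This is a minimal sketch; `chunk_prompts` is a hypothetical helper, not part of `transformers` or `bark_tinkering`, and a suitable `batch_size` depends on your GPU.

```python
from typing import Iterator, List


def chunk_prompts(prompts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `prompts` with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(prompts), batch_size):
        yield prompts[start:start + batch_size]


prompts = [f"Sentence number {i}." for i in range(5)]
batches = list(chunk_prompts(prompts, batch_size=2))
print([len(b) for b in batches])  # [2, 2, 1]
```

Each yielded list can then be passed to the processor in the same way as `text_prompt` above.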

In [None]:
torch.manual_seed(2)
with torch.inference_mode():
    # All samples in the batch are generated at once
    speech_output, speech_output_lengths = model.generate(
        **inputs,
        do_sample=True,
        semantic_temperature=0.7,
        coarse_temperature=1,
        fine_temperature=0.5,
        return_output_lengths=True,  # Important: lengths are needed to trim each sample after generation
        min_eos_p=0.05,  # minimum probability of the EOS token to stop generation and prevent Bark hallucinations
    )

for i in range(speech_output.size(0)):
    # Cut each sample to its own length before playback
    audio = speech_output[i, :speech_output_lengths[i]].cpu().numpy()
    IPython.display.display(IPython.display.Audio(audio, rate=24000))
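
The reason `return_output_lengths` matters is that a batch comes back as one rectangular array, so shorter samples carry trailing padding up to the length of the longest one. The toy example below illustrates the trimming step with NumPy; `trim_batch` is a hypothetical helper, and the zero padding and sample values are made up for illustration.

```python
import numpy as np


def trim_batch(padded: np.ndarray, lengths) -> list:
    """Return one 1-D array per row, cut to that row's true length."""
    return [padded[i, :int(n)] for i, n in enumerate(lengths)]


# Toy batch: three "waveforms" padded with zeros to a common length of 5.
padded = np.array([
    [0.1, 0.2, 0.3, 0.0, 0.0],
    [0.4, 0.5, 0.6, 0.7, 0.8],
    [0.9, 0.0, 0.0, 0.0, 0.0],
])
lengths = [3, 5, 1]

clips = trim_batch(padded, lengths)
print([clip.shape[0] for clip in clips])  # [3, 5, 1]
```

Without the lengths, every clip would be played or saved with the padding of the longest sample appended.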

### Do the same thing with utils

In [None]:
from bark_tinkering.utils import bark_generate
from notebook_utils import display_generation_result

torch.manual_seed(2)
with torch.inference_mode():
    generations = bark_generate(
        model,
        **inputs,
        do_sample=True,
        semantic_temperature=0.7,
        coarse_temperature=1,
        fine_temperature=0.5,
        min_eos_p=0.05,  # minimum probability of the EOS token to stop generation and prevent Bark hallucinations
    )

display_generation_result(generations)

Save the sarcastic voice line about batching (the second prompt, index 1).

In [None]:
save_audio_from_generation(model, generations[1], 'generations/sarcastic.wav')
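
If the `bark_tinkering` helpers are not available, a trimmed waveform can also be written to disk with the standard library alone. The sketch below is not how `save_audio_from_generation` works internally (that is not shown here); `write_wav` is a hypothetical function, the file name is arbitrary, and the sample rate of 24000 Hz is taken from the playback cell above.

```python
import wave

import numpy as np


def write_wav(path: str, audio: np.ndarray, sample_rate: int = 24000) -> None:
    """Write mono float audio in [-1, 1] to `path` as 16-bit PCM."""
    # Cast first: half-precision output would lose accuracy when scaled.
    samples = np.asarray(audio, dtype=np.float32)
    clipped = np.clip(samples, -1.0, 1.0)
    pcm = (clipped * 32767).astype("<i2")  # little-endian int16
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16 bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())


# One second of a 440 Hz tone as stand-in audio.
t = np.arange(24000) / 24000
write_wav("example_tone.wav", 0.3 * np.sin(2 * np.pi * 440 * t))
```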