# Going emotional

Bark sometimes picks up the emotion of a text prompt on its own.

We can exploit that: take a generation that came out emotional and create a speaker from it, so the speaker captures the emotion.
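The idea amounts to a small feedback loop: generate with a preset, pick one generation, turn it into the next preset, and repeat. Below is a minimal sketch of that loop, not the notebook's actual code. `escalate_preset` and its `generate`, `choose`, and `to_preset` parameters are hypothetical names introduced here for illustration. In this notebook they would roughly correspond to `generate_angry_voicelines`, picking a generation by ear, and `create_voice_preset_from_generation`. Toy stand-ins are used so the sketch runs without a model.

```python
from typing import Any, Callable, List


def escalate_preset(
    preset: Any,
    generate: Callable[[Any], List[Any]],
    choose: Callable[[List[Any]], int],
    to_preset: Callable[[Any], Any],
    rounds: int = 2,
) -> List[Any]:
    """Repeatedly turn a chosen generation into the next voice preset.

    Returns every preset produced, starting with the original one.
    All four callables are hypothetical placeholders, not a real API.
    """
    presets = [preset]
    for _ in range(rounds):
        # Generate a batch of voicelines with the most recent preset.
        generations = generate(presets[-1])
        # Pick one generation (in practice: listen and choose by hand).
        picked = generations[choose(generations)]
        # The picked generation becomes the next preset.
        presets.append(to_preset(picked))
    return presets


# Toy stand-ins: a "preset" is just an anger level, and every generation
# is a bit angrier than the preset it was generated with.
history = escalate_preset(
    preset=0,
    generate=lambda level: [level + 1, level + 2, level + 3],
    choose=lambda gens: 0,  # always pick the first generation
    to_preset=lambda gen: gen,
    rounds=2,
)
print(history)  # [0, 1, 2]
```

In the notebook itself the choice is made manually after listening, which is why `choose` is kept as a separate step rather than hard-coded.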

## Load model

In [None]:
import torch
from optimum.bettertransformer import BetterTransformer
from transformers import BarkProcessor, BarkModel
from bark_tinkering.adversarial_speaker import (
    find_semantics_by_wav,
    create_voice_preset,
    save_voice_preset_safetensors,
    load_voice_preset_safetensors,
    load_voice_preset_numpy,
    create_voice_preset_from_generation,
)
from bark_tinkering.utils import bark_generate
import IPython
from notebook_utils import display_generation_result

device = 'cuda'
transformers_cache_dir = '.cache'
model_id = 'suno/bark'

processor: BarkProcessor = BarkProcessor.from_pretrained(model_id, cache_dir=transformers_cache_dir)

# Load model in float16
model: BarkModel = BarkModel.from_pretrained(model_id, cache_dir=transformers_cache_dir, torch_dtype=torch.float16).to(device)

# Convert model to BetterTransformer
model = BetterTransformer.transform(model, keep_original_model=False)

## Baseline

In [None]:
original_voice_preset = load_voice_preset_safetensors('voice_presets/lina.safetensors')

In [None]:
def generate_neutral_voicelines(voice_preset):
    """Generate and display three emotionally neutral lines with the given preset."""
    text_prompt = [
        "Does that mean what I think it means?",
        "Who ordered a pizza?",
        "It suits you.",
    ]
    inputs = processor(text_prompt, voice_preset=voice_preset).to(device)
    # Fixed seed so repeated runs sample the same way
    torch.manual_seed(0)
    with torch.inference_mode():
        generations = bark_generate(model, **inputs, min_eos_p=0.05, return_output_lengths=True)
        display_generation_result(generations)

In [None]:
generate_neutral_voicelines(original_voice_preset)

## Generate angry voicelines

In [None]:
def generate_angry_voicelines(voice_preset):
    """Generate, display, and return five angry lines with the given preset."""
    text_prompt = [
        "I'm so furious right now, I could spit nails!",
        "Someone's gonna pay for this! This is freaking ridiculous!",
        "I want to rip someone's head off!",
        "This is beyond frustrating!",
        "You have got to be kidding me...",
    ]
    inputs = processor(text_prompt, voice_preset=voice_preset).to(device)
    # Fixed seed so repeated runs sample the same way
    torch.manual_seed(4)
    with torch.inference_mode():
        generations = bark_generate(model, **inputs, min_eos_p=0.05, return_output_lengths=True)
        display_generation_result(generations)
        return generations

In [None]:
angry_generations = generate_angry_voicelines(original_voice_preset)

### OK, that sounds a bit angry. Let's make a speaker from this generation

In [None]:
display_generation_result(angry_generations[0])

In [None]:
angry_voice_preset = create_voice_preset_from_generation(angry_generations[0])

In [None]:
generate_neutral_voicelines(angry_voice_preset)

The voice changed a bit, but I can't say Lina sounds really furious. It also seems that pizza makes everything better.

### Use angry speaker to generate even angrier voicelines

In [None]:
furious_generations = generate_angry_voicelines(angry_voice_preset)

Oooh

In [None]:
display_generation_result(furious_generations[2])

In [None]:
furious_voice_preset = create_voice_preset_from_generation(furious_generations[2])

In [None]:
generate_neutral_voicelines(furious_voice_preset)

Well, now it sounds dangerous. Let's save both presets.

In [None]:
save_voice_preset_safetensors(angry_voice_preset, 'voice_presets/lina_angry.safetensors')
save_voice_preset_safetensors(furious_voice_preset, 'voice_presets/lina_furious.safetensors')

As you may have noticed, some generations are very noisy, to say the least. This happens even without a speaker, but creating a speaker out of a generation that itself used a generated speaker seems to make it worse. So generate more samples and choose the clean ones.
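When there are many candidates, a crude automatic pre-screen can narrow down what to listen to. The sketch below ranks clips by zero-crossing rate, on the assumption that hissy, broadband noise flips sign far more often than voiced speech. This heuristic is not part of the notebook or of Bark. The helper names are hypothetical, and the clips are assumed to be available as plain sequences of float samples. Unvoiced consonants also raise the rate, so treat the ranking as a first pass before listening, not as a replacement for it.

```python
import math
import random
from typing import List, Sequence


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of consecutive sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    return crossings / (len(samples) - 1)


def rank_cleanest_first(clips: Sequence[Sequence[float]]) -> List[int]:
    """Return clip indices sorted from lowest to highest zero-crossing rate."""
    return sorted(range(len(clips)), key=lambda i: zero_crossing_rate(clips[i]))


# Synthetic stand-ins for real audio: a smooth tone versus broadband noise.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(1000)]

order = rank_cleanest_first([noise, tone])
print(order)  # [1, 0]: the tone (index 1) ranks as cleaner than the noise
```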