# Bark TTS Demo

#### Load documents, build the GPTSimpleVectorIndex

In [1]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader
from IPython.display import Markdown, display
from gpt_index.tts import barkTTS
from bark import SAMPLE_RATE
from IPython.display import Audio

INFO:numexpr.utils:Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
NumExpr defaulting to 8 threads.
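
Every log message in this notebook is printed twice, once with an `INFO:<logger>:` prefix and once bare. This follows from the setup cell: `logging.basicConfig` attaches one formatted handler to the root logger, and the extra `addHandler` call attaches a second, unformatted one. The sketch below reproduces the effect on an isolated logger writing to an in-memory buffer; the logger name `duplicate_demo` and the helper function are illustrative only and not part of the notebook.

```python
import io
import logging


def capture_with_two_handlers(message: str) -> str:
    """Log one message through two handlers and return everything written."""
    buffer = io.StringIO()

    # Use a private logger so the root logger's configuration is untouched.
    logger = logging.getLogger("duplicate_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()

    # Handler 1: same format that logging.basicConfig applies by default.
    formatted = logging.StreamHandler(buffer)
    formatted.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
    logger.addHandler(formatted)

    # Handler 2: no formatter set, so only the bare message is written.
    logger.addHandler(logging.StreamHandler(buffer))

    logger.info(message)
    return buffer.getvalue()


print(capture_with_two_handlers("NumExpr defaulting to 8 threads."))
```

Dropping the `addHandler` line from the setup cell should leave a single, prefixed copy of each message.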


In [2]:
# load documents
documents = SimpleDirectoryReader("./data").load_data()

In [3]:
# create index
index = GPTSimpleVectorIndex.from_documents(documents)

INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 17617 tokens
> [build_index_from_nodes] Total embedding token usage: 17617 tokens


In [4]:
# save index to disk
index.save_to_disk("index_simple.json")

In [5]:
# load index from disk
index = GPTSimpleVectorIndex.load_from_disk("index_simple.json")
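
Building the index above consumed 17617 embedding tokens, so on later runs it may be preferable to load the saved file rather than rebuild. A minimal sketch of that pattern follows; `load_or_build` and its callable parameters are hypothetical helpers introduced for illustration, and the demo uses plain JSON stand-ins rather than a real index.

```python
import json
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(
    path: str,
    build: Callable[[], T],
    save: Callable[[T, str], None],
    load: Callable[[str], T],
) -> T:
    """Return the object stored at `path`, building and saving it if absent."""
    if os.path.exists(path):
        return load(path)
    obj = build()
    save(obj, path)
    return obj


# Demo with stand-in callables that record which code path ran.
calls = []


def build_stub():
    calls.append("build")
    return {"docs": 3}


def save_stub(obj, path):
    with open(path, "w") as f:
        json.dump(obj, f)


def load_stub(path):
    calls.append("load")
    with open(path) as f:
        return json.load(f)


demo_path = os.path.join(tempfile.mkdtemp(), "index_simple.json")
first = load_or_build(demo_path, build_stub, save_stub, load_stub)
second = load_or_build(demo_path, build_stub, save_stub, load_stub)
print(calls)  # ['build', 'load']
```

In this notebook the three callables would wrap `GPTSimpleVectorIndex.from_documents(documents)`, `index.save_to_disk(path)` and `GPTSimpleVectorIndex.load_from_disk(path)` respectively.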

#### Query Index

In [6]:
# query the index (set logging to DEBUG for more detailed output)
response = index.query("How did the author use IBM 1401?")

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 4017 tokens
> [query] Total LLM token usage: 4017 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 9 tokens
> [query] Total embedding token usage: 9 tokens


In [7]:
display(Markdown(f"<b>{response}</b>"))

<b>

The author used the IBM 1401 to write programs in an early version of Fortran. He and his friend Rich Draves were given permission to use the machine, which was located in the basement of their junior high school. The programs he wrote were limited by the fact that the only form of input was data stored on punched cards, and he did not have any data stored on punched cards. His clearest memory of using the IBM 1401 was when he learned that programs could not terminate, when one of his programs did not. He used the IBM 1401 to explore programming and develop his skills in coding, while also painting still lives in his bedroom at night.</b>

#### Bark TTS

##### Bark is a transformer-based text-to-audio model created by Suno. It can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. The model can also produce nonverbal sounds such as laughing, sighing, and crying.

##### For more details on Bark, see https://github.com/suno-ai/bark

#### Inference speed

##### Bark has been tested on both CPU and GPU (PyTorch 2.0+, CUDA 11.7 and CUDA 12.0). Running it means running transformer models with more than 100M parameters, so hardware matters: on modern GPUs with PyTorch nightly, Bark can generate audio in roughly real time, while on older GPUs, the default Colab runtime, or CPU, inference may be 10-100x slower.
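
One way to see where a given machine falls in that range is to compute a real-time factor: generation time divided by the duration of the audio produced. The sketch below is only arithmetic. The helper name `real_time_factor` is hypothetical, and the default of 24,000 samples per second is an assumption about the value of bark's `SAMPLE_RATE`; pass the imported constant instead where it is available.

```python
ASSUMED_SAMPLE_RATE = 24_000  # assumption: substitute bark's SAMPLE_RATE


def real_time_factor(
    num_samples: int,
    generation_seconds: float,
    sample_rate: int = ASSUMED_SAMPLE_RATE,
) -> float:
    """Return generation time divided by audio duration.

    About 1.0 means roughly real time; 10.0 means ten times slower.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return generation_seconds / audio_seconds


# 240,000 samples at 24 kHz is 10 s of audio; generating it in 10 s is real time.
print(real_time_factor(240_000, 10.0))  # 1.0
```

In the notebook, `num_samples` would be `len(audio_array)` and `generation_seconds` the difference between two `time.perf_counter()` readings taken around the `generate_bark_audio` call.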

In [8]:
# Initialise barkTTS
tts = barkTTS()

#### Convert response to audio.

In [9]:
audio_array = tts.generate_bark_audio(str(response))

Audio(audio_array, rate=SAMPLE_RATE)

INFO:bark.generation:model loaded: 312.3M params, 1.269 loss
model loaded: 312.3M params, 1.269 loss


100%|██████████| 100/100 [00:01<00:00, 51.56it/s] 


INFO:bark.generation:model loaded: 314.4M params, 2.901 loss
model loaded: 314.4M params, 2.901 loss


100%|██████████| 8/8 [00:04<00:00,  1.93it/s]


INFO:bark.generation:model loaded: 302.1M params, 2.079 loss
model loaded: 302.1M params, 2.079 loss


100%|██████████| 100/100 [00:01<00:00, 65.43it/s] 
100%|██████████| 9/9 [00:04<00:00,  2.02it/s]
100%|██████████| 100/100 [00:01<00:00, 63.01it/s]
100%|██████████| 8/8 [00:04<00:00,  1.91it/s]
100%|██████████| 100/100 [00:01<00:00, 57.91it/s]
100%|██████████| 10/10 [00:05<00:00,  1.99it/s]
100%|██████████| 100/100 [00:01<00:00, 65.37it/s]
100%|██████████| 9/9 [00:04<00:00,  1.99it/s]
100%|██████████| 100/100 [00:02<00:00, 42.19it/s]
100%|██████████| 13/13 [00:06<00:00,  1.88it/s]
100%|██████████| 100/100 [00:01<00:00, 62.19it/s]
100%|██████████| 9/9 [00:04<00:00,  1.90it/s]
100%|██████████| 100/100 [00:02<00:00, 37.39it/s]
100%|██████████| 15/15 [00:07<00:00,  1.91it/s]
100%|██████████| 100/100 [00:02<00:00, 44.03it/s]
100%|██████████| 13/13 [00:06<00:00,  1.95it/s]
100%|██████████| 100/100 [00:02<00:00, 34.87it/s]
100%|██████████| 16/16 [00:08<00:00,  1.91it/s]
100%|██████████| 100/100 [00:02<00:00, 40.16it/s]
100%|██████████| 14/14 [00:07<00:00,  1.94it/s]
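
`Audio(...)` only plays the result inside the notebook. To keep the speech as a file, the samples can be written out as 16-bit PCM with the standard-library `wave` module. This is a minimal sketch that assumes `audio_array` is a one-dimensional sequence of floats in the range [-1.0, 1.0]; the helpers `write_wav` and `read_wav_header` are hypothetical, and the demo writes a short synthetic tone rather than Bark output.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path: str, samples, sample_rate: int) -> None:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    pcm = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        pcm += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))


def read_wav_header(path: str):
    """Return (frame rate, frame count) of a WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate(), wav_file.getnframes()


# Demo: 0.1 s of a 440 Hz tone at 24 kHz, written to a temporary directory.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 24_000) for n in range(2_400)]
demo_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
write_wav(demo_path, tone, 24_000)
print(read_wav_header(demo_path))  # (24000, 2400)
```

With the notebook's variables, the call would be `write_wav("response.wav", audio_array, SAMPLE_RATE)`, where the file name is arbitrary.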

#### Convert response to audio using cloned voices.

##### The `lang_speaker_voice` parameter selects one of Bark's default speaker voice presets. See `Voice Presets and Voice/Audio Cloning` in the Bark GitHub repo for the available voices.
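
Preset names follow the pattern `<language>_speaker_<n>`, as in `en_speaker_1`. The sketch below builds and checks such a name before it is passed to `generate_bark_audio`. The helper `speaker_preset` is hypothetical, and both the language list and the 0-9 speaker range are assumptions based on Bark's published preset library, so verify them against the repo before relying on them.

```python
# Assumption: language codes for which Bark ships default speaker presets.
ASSUMED_LANGUAGES = (
    "de", "en", "es", "fr", "hi", "it", "ja", "ko", "pl", "pt", "ru", "tr", "zh",
)
# Assumption: each language has speakers numbered 0 through 9.
ASSUMED_SPEAKER_IDS = range(10)


def speaker_preset(language: str, speaker_id: int) -> str:
    """Return a preset name such as 'en_speaker_1', validating both parts."""
    if language not in ASSUMED_LANGUAGES:
        raise ValueError(f"unknown language code: {language!r}")
    if speaker_id not in ASSUMED_SPEAKER_IDS:
        raise ValueError(f"speaker_id out of range: {speaker_id}")
    return f"{language}_speaker_{speaker_id}"


print(speaker_preset("en", 1))  # en_speaker_1
```

Failing early on a mistyped preset avoids waiting for the models to load before the error surfaces.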

In [10]:
audio_array = tts.generate_bark_audio(str(response), lang_speaker_voice="en_speaker_1")

Audio(audio_array, rate=SAMPLE_RATE)

100%|██████████| 100/100 [00:01<00:00, 50.73it/s]
100%|██████████| 11/11 [00:05<00:00,  1.90it/s]
100%|██████████| 100/100 [00:01<00:00, 55.39it/s] 
100%|██████████| 10/10 [00:05<00:00,  1.95it/s]
100%|██████████| 100/100 [00:01<00:00, 66.41it/s]
100%|██████████| 8/8 [00:04<00:00,  1.92it/s]
100%|██████████| 100/100 [00:01<00:00, 55.53it/s] 
100%|██████████| 10/10 [00:05<00:00,  1.85it/s]
100%|██████████| 100/100 [00:01<00:00, 77.22it/s] 
100%|██████████| 7/7 [00:03<00:00,  1.86it/s]
100%|██████████| 100/100 [00:01<00:00, 51.07it/s]
100%|██████████| 11/11 [00:05<00:00,  2.01it/s]
100%|██████████| 100/100 [00:01<00:00, 81.04it/s] 
100%|██████████| 7/7 [00:03<00:00,  1.95it/s]
100%|██████████| 100/100 [00:02<00:00, 37.09it/s]
100%|██████████| 15/15 [00:08<00:00,  1.87it/s]
100%|██████████| 100/100 [00:02<00:00, 41.62it/s]
100%|██████████| 13/13 [00:06<00:00,  1.87it/s]
100%|██████████| 100/100 [00:02<00:00, 37.07it/s] 
100%|██████████| 15/15 [00:07<00:00,  1.94it/s]