# Creating a Voice Assistant for a Knowledge Base

* [1. Sourcing Content from Hugging Face Hub](#sourcing)
    * [1.1. Scrape content](#scrape)
    * [1.2. Loading and splitting texts](#load_split)
* [2. Embedding and storing in Deep Lake](#storing)
* [3. Voice Assistant](#assistant)
* [4. User Interaction](#interaction)

The goal is to build a voice assistant that answers a user's spoken questions from a knowledge base, quickly and precisely.

Inspiration taken from: [GitHub repo](https://github.com/peterw/JarvisBase)

In [1]:
import sys, os
sys.path.append('..')

from keys import OPENAI_API_KEY, ACTIVELOOP_TOKEN, ELEVEN_API_KEY
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
os.environ["ACTIVELOOP_TOKEN"] = ACTIVELOOP_TOKEN
os.environ["ELEVEN_API_KEY"] = ELEVEN_API_KEY

Main stages:
1. Transcribing voice input into text: automatic speech recognition (ASR) with OpenAI's Whisper.
2. Generating a response to the question:
    - load the vector database, a repository housing the relevant documents
    - retrieve the documents and feed them, along with the question, to the LLM
    - the LLM then generates the response based on the retrieved documents
3. Generating voice output: text-to-speech with Eleven Labs.
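
The three stages can be read as a simple chain of function calls. The sketch below is only an illustration of that data flow: every callable passed in (`transcribe`, `retrieve`, `generate_answer`, `synthesize`) is a hypothetical stand-in, not the Whisper, Deep Lake, or Eleven Labs API, and the toy lambdas exist only so the example runs without any API keys.

```python
from typing import Callable, List


def answer_spoken_question(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    retrieve: Callable[[str], List[str]],
    generate_answer: Callable[[str, List[str]], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run the stages in order; all callables are hypothetical stand-ins."""
    question = transcribe(audio)                   # stage 1: speech -> text
    documents = retrieve(question)                 # stage 2: fetch relevant chunks
    answer = generate_answer(question, documents)  # stage 2: LLM answers from them
    return synthesize(answer)                      # stage 3: text -> speech


# Toy stand-ins so the sketch runs offline.
reply = answer_spoken_question(
    b"fake-audio",
    transcribe=lambda audio: "how do I upload a file?",
    retrieve=lambda question: ["See the upload guide."],
    generate_answer=lambda question, docs: f"Based on {len(docs)} document(s): {docs[0]}",
    synthesize=lambda text: text.encode("utf-8"),
)
print(reply.decode("utf-8"))
```

Keeping each stage behind a plain callable makes it easy to swap a real service for a stub while testing the rest of the flow.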

<hr>
<a class="anchor" id="sourcing">
    
## 1. Sourcing Content from Hugging Face Hub
    
</a>

The knowledge base for our voice assistant will be built from articles in the Hugging Face Hub documentation. We'll do some web scraping to collect the documents.

In [2]:
# Import necessary modules
import os
import requests
from bs4 import BeautifulSoup
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader
import re

# Set up the path for DeepLake (a vector database)
my_activeloop_org_id = "iryna"
my_activeloop_dataset_name = "voice_assistant_data"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"

# Set up an OpenAIEmbeddings instance
# (the embedding model is selected with the `model` argument)
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

<hr>
<a class="anchor" id="scrape">
    
### 1.1. Scrape content
    
</a>

In [3]:
def get_documentation_urls():
    # List of relative URLs for the desired Hugging Face documentation pages
    return [
        '/docs/huggingface_hub/guides/overview',
        '/docs/huggingface_hub/guides/download',
        '/docs/huggingface_hub/guides/upload',
        '/docs/huggingface_hub/guides/hf_file_system',
        '/docs/huggingface_hub/guides/repository',
        '/docs/huggingface_hub/guides/search',
    ]


def construct_full_url(base_url, relative_url):
    # Construct the full URL by appending the relative URL to the base URL
    return base_url + relative_url


def scrape_page_content(url):
    # Send a GET request to the URL and fail early on HTTP errors
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    # Parse the HTML response using BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract the desired content from the page (in this case, the body text)
    text = soup.body.text.strip()
    # Remove control characters and characters in the \x7f-\xff range
    # (other non-ASCII characters are kept)
    text = re.sub(r'[\x00-\x08\x0b-\x0c\x0e-\x1f\x7f-\xff]', '', text)
    # Collapse runs of whitespace and newlines into single spaces
    text = re.sub(r'\s+', ' ', text)
    return text.strip()


def scrape_all_content(base_url, relative_urls, filename):
    # Loop through the list of URLs, scrape content and add it to the content list
    content = []
    for relative_url in relative_urls:
        print("loading content for", relative_url)
        full_url = construct_full_url(base_url, relative_url)
        scraped_content = scrape_page_content(full_url)
        content.append(scraped_content.rstrip('\n'))

    # Write the scraped content to a file
    with open(filename, 'w', encoding='utf-8') as file:
        for item in content:
            file.write("%s\n" % item)
    
    return content
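
To see what the two substitutions in `scrape_page_content` actually do, here is a small offline sketch that applies the same regular expressions to a made-up string. The helper name `clean_text` and the sample text are assumptions for illustration only. Note that the character class covers control characters and the `\x7f-\xff` range, so a character such as `é` is dropped while one above `\xff` such as `€` passes through.

```python
import re


def clean_text(text: str) -> str:
    # Same two substitutions as in scrape_page_content.
    text = re.sub(r'[\x00-\x08\x0b-\x0c\x0e-\x1f\x7f-\xff]', '', text)
    # Tabs and newlines survive the first step and are collapsed here.
    text = re.sub(r'\s+', ' ', text)
    return text.strip()


# Made-up sample: contains é (\xe9), a NUL byte, newlines, a tab and a euro sign.
sample = "Caf\u00e9\x00 guide\n\n  costs 5\u20ac\t now "
cleaned = clean_text(sample)
print(repr(cleaned))
```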

<hr>
<a class="anchor" id="load_split">
    
### 1.2. Loading and splitting texts
    
</a>

In [4]:
# Define a function to load documents from a file
def load_docs(root_dir, filename):
    # Create an empty list to hold the documents
    docs = []
    try:
        # Load the file using the TextLoader class and UTF-8 encoding
        loader = TextLoader(os.path.join(root_dir, filename), encoding='utf-8')
        # Split the loaded file into separate documents and add them to the list of documents
        docs.extend(loader.load_and_split())
    except Exception as e:
        # Report the error instead of hiding it; the returned list may then be empty
        print(f"Could not load {filename}: {e}")
    # Return the list of documents
    return docs
  
    
def split_docs(docs):
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    return text_splitter.split_documents(docs)
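
The idea behind character-based splitting can be sketched in a few lines of plain Python: cut the text on a separator, then greedily merge neighbouring pieces while the result stays within `chunk_size`. This is a simplified illustration, not LangChain's implementation; the function name `split_on_separator` and the sample text are assumptions, and the sketch ignores `chunk_overlap` (which is set to 0 above anyway).

```python
from typing import List


def split_on_separator(text: str, separator: str = "\n\n", chunk_size: int = 1000) -> List[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate      # still fits (or is the first piece of a chunk)
        else:
            chunks.append(current)   # close the current chunk and start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "aaaa\n\nbbbb\n\ncccccccccccc\n\ndd"
chunks = split_on_separator(text, chunk_size=10)
print(chunks)
```

With `chunk_size=10` the first two pieces fit together, the 12-character piece stays whole even though it exceeds the limit, and the last piece forms its own chunk.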

<hr>
<a class="anchor" id="storing">
    
## 2. Embedding and storing in Deep Lake
    
</a>

In [5]:
# Set the root directory where the content file will be saved
root_dir ='../data/'
# Set the name of the file to which the scraped content will be saved
filename = 'voice_assistant_content.txt'
filepath = root_dir+filename

relative_urls = get_documentation_urls()
    
# Scrape all the content from the relative URLs and save it to the content file
base_url = 'https://huggingface.co'
content = scrape_all_content(base_url, relative_urls, filepath)

loading content for /docs/huggingface_hub/guides/overview
loading content for /docs/huggingface_hub/guides/download
loading content for /docs/huggingface_hub/guides/upload
loading content for /docs/huggingface_hub/guides/hf_file_system
loading content for /docs/huggingface_hub/guides/repository
loading content for /docs/huggingface_hub/guides/search


In [6]:
# Load the content from the file (load_docs joins root_dir and filename itself)
docs = load_docs(root_dir, filename)

# Split the content into individual documents
texts = split_docs(docs)
    
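# Create a DeepLake database with the given dataset path and embedding function
# (`embedding_function` triggers the deprecation warning shown in the output; `embedding` is the suggested replacement)
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)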
# Add the individual documents to the database
db.add_documents(texts)
    
# Clean up by deleting the content file
os.remove(filepath)

Using embedding function is deprecated and will be removed in the future. Please use embedding instead.


Your Deep Lake dataset has been successfully created!



Dataset(path='hub://iryna/voice_assistant_data', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape      dtype  compression
  -------    -------    -------    -------  ------- 
 embedding  embedding  (18, 1536)  float32   None   
    id        text      (18, 1)      str     None   
 metadata     json      (18, 1)      str     None   
   text       text      (18, 1)      str     None   


 

<hr>
<a class="anchor" id="assistant">
    
## 3. Voice Assistant
    
</a>

Once all the necessary data is stored in the Deep Lake vector database, we can use it in our voice assistant.

In [13]:
import openai
import streamlit as st
from audio_recorder_streamlit import audio_recorder
from elevenlabs import generate
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from streamlit_chat import message

# Constants
TEMP_AUDIO_PATH = "../data/temp_audio.wav"
AUDIO_FORMAT = "audio/wav"

In [14]:
# Function to create an instance of the DeepLake vector database
def load_embeddings_and_database(active_loop_data_set_path):
    embeddings = OpenAIEmbeddings()
    db = DeepLake(
        dataset_path=active_loop_data_set_path,
        read_only=True,
        embedding_function=embeddings
    )
    return db

In [15]:
# Transcribe audio using OpenAI Whisper API
def transcribe_audio(audio_file_path, openai_key):
    openai.api_key = openai_key
    try:
        with open(audio_file_path, "rb") as audio_file:
            response = openai.Audio.transcribe("whisper-1", audio_file)
        return response["text"]
    except Exception as e:
        print(f"Error calling Whisper API: {str(e)}")
        return None

In [16]:
# Record audio using audio_recorder and transcribe using transcribe_audio
def record_and_transcribe_audio():
    audio_bytes = audio_recorder()
    transcription = None
    if audio_bytes:
        st.audio(audio_bytes, format=AUDIO_FORMAT)

        with open(TEMP_AUDIO_PATH, "wb") as f:
            f.write(audio_bytes)

        if st.button("Transcribe"):
            transcription = transcribe_audio(TEMP_AUDIO_PATH, openai.api_key)
            os.remove(TEMP_AUDIO_PATH)
            display_transcription(transcription)

    return transcription


# Display the transcription of the audio on the app
def display_transcription(transcription):
    if transcription:
        st.write(f"Transcription: {transcription}")
        with open("audio_transcription.txt", "w+") as f:
            f.write(transcription)
    else:
        st.write("Error transcribing audio.")

        
# Get user input from Streamlit text input field
def get_user_input(transcription):
    return st.text_input("", value=transcription if transcription else "", key="input")

In [17]:
# Search the database for a response based on the user's query
def search_db(user_input, db):
    print(user_input)
    # Configure the retriever: cosine distance, 100 candidates fetched,
    # maximal marginal relevance requested, 4 documents returned
    retriever = db.as_retriever()
    retriever.search_kwargs['distance_metric'] = 'cos'
    retriever.search_kwargs['fetch_k'] = 100
    retriever.search_kwargs['maximal_marginal_relevance'] = True
    retriever.search_kwargs['k'] = 4
    # Answer the query with the chat model, using the retrieved documents as context
    model = ChatOpenAI(model_name='gpt-3.5-turbo')
    qa = RetrievalQA.from_llm(model, retriever=retriever, return_source_documents=True)
    return qa({'query': user_input})
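
The retriever settings in `search_db` ask for cosine distance and maximal marginal relevance (MMR). As a rough illustration of what MMR is meant to achieve, the sketch below picks `k` items from a candidate pool by trading relevance to the query against similarity to items already picked. It is a textbook-style sketch in plain Python, not Deep Lake's or LangChain's implementation; `mmr_select`, `lambda_mult`, and the toy vectors are assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity of two vectors; 0.0 if either has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query: Sequence[float], candidates: List[Sequence[float]],
               k: int = 4, lambda_mult: float = 0.5) -> List[int]:
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already selected (0.0 for the first pick).
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 1.0, 0.0]
candidates = [
    [1.0, 0.9, 0.0],  # 0: most relevant
    [1.0, 0.8, 0.0],  # 1: almost a duplicate of 0
    [0.0, 1.0, 0.0],  # 2: less relevant, but different from 0
]
picked = mmr_select(query, candidates, k=2)
print(picked)
```

A plain top-2 ranking by relevance would return candidates 0 and 1; MMR prefers candidate 2 as the second pick because it adds information that candidate 0 does not already cover.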

In [18]:
# Display conversation history using Streamlit messages
def display_conversation(history):
    for i in range(len(history["generated"])):
        message(history["past"][i], is_user=True, key=str(i) + "_user")
        message(history["generated"][i], key=str(i))

        # Voice the response using the Eleven Labs API
        voice = "Bella"
        text = history["generated"][i]
        # The key was exported to the environment in the first cell
        audio = generate(text=text, voice=voice, api_key=os.environ["ELEVEN_API_KEY"])
        st.audio(audio, format='audio/mp3')
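
Streamlit reruns the script on every interaction, so the loop in `display_conversation` requests speech for every past response again each time. One possible way to avoid repeated calls is to memoize the audio by text. The sketch below shows the idea with a hypothetical `synthesize` callable standing in for the text-to-speech request; in a real Streamlit app the cache would have to live in `st.session_state` (or behind `st.cache_data`) to survive reruns.

```python
from typing import Callable, Dict, List


def make_cached_synthesizer(synthesize: Callable[[str], bytes]) -> Callable[[str], bytes]:
    """Wrap a text-to-speech callable so each distinct text is synthesized only once."""
    cache: Dict[str, bytes] = {}

    def cached(text: str) -> bytes:
        if text not in cache:
            cache[text] = synthesize(text)  # only reached for unseen text
        return cache[text]

    return cached


# Fake TTS that records how often it is called (stand-in for a real API request).
calls: List[str] = []


def fake_tts(text: str) -> bytes:
    calls.append(text)
    return text.encode("utf-8")


speak = make_cached_synthesizer(fake_tts)
for response in ["I am ready to help you", "Here is the answer", "I am ready to help you"]:
    speak(response)
print(len(calls))
```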

<hr>
<a class="anchor" id="interaction">
    
## 4. User Interaction
    
</a>

In [None]:
# Main function to run the app
def main():
    # Initialize Streamlit app with a title
    st.write("# JarvisBase 🧙")
   
    # Load embeddings and the DeepLake database
    db = load_embeddings_and_database(dataset_path)

    # Record and transcribe audio
    transcription = record_and_transcribe_audio()

    # Get user input from text input or audio transcription
    user_input = get_user_input(transcription)

    # Initialize session state for generated responses and past messages
    if "generated" not in st.session_state:
        st.session_state["generated"] = ["I am ready to help you"]
    if "past" not in st.session_state:
        st.session_state["past"] = ["Hey there!"]
        
    # Search the database for a response based on user input and update the session state
    if user_input:
        output = search_db(user_input, db)
        print(output['source_documents'])
        st.session_state.past.append(user_input)
        response = str(output["result"])
        st.session_state.generated.append(response)

    #Display conversation history using Streamlit messages
    if st.session_state["generated"]:
        display_conversation(st.session_state)

# Run the main function when the script is executed
if __name__ == "__main__":
    main()

- [Eleven Labs Website](https://elevenlabs.io/)
- [Eleven Labs API documentation](https://api.elevenlabs.io/docs)
- [Voice Assistant GitHub Repo](https://github.com/peterw/JarvisBase)