# Voice Assistant for Knowledge Base
Build your own voice assistant for your knowledge base using Whisper.

---

## Introduction

This is how the system will work:
1. Record our voice input, which is the user query.
2. Transcribe the voice input into text using Whisper.
3. Use the `RetrievalQA` chain to retrieve relevant documents from the knowledge base and have an LLM generate the answer.
4. Convert the answer into voice output and play it.

The core of the project is a question-answering mechanism. It starts by loading the vector database, which stores the documents relevant to the expected queries. When a question is asked, the system retrieves the most relevant documents from this database and passes them, together with the question, to the LLM. The LLM then generates the response based on the retrieved documents.
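
The retrieve-then-generate flow described above can be sketched in plain Python. This is only a toy illustration: the three-dimensional "embeddings", the `toy_store` dictionary and the helper names are made up for the example, and the real system uses Deep Lake and `RetrievalQA` instead.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec: List[float], store: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the k document texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda text: cosine_similarity(query_vec, store[text]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, documents: List[str]) -> str:
    """Combine the retrieved documents and the question into one prompt for the LLM."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical toy store: document text -> made-up 3-dimensional embedding
toy_store = {
    "Use hf_hub_download to download a file.": [0.9, 0.1, 0.0],
    "Use upload_file to upload a file.": [0.1, 0.9, 0.0],
    "The Hub hosts models, datasets and Spaces.": [0.3, 0.3, 0.9],
}
query_vector = [1.0, 0.0, 0.0]  # pretend embedding of the question below
top_docs = retrieve(query_vector, toy_store, k=2)
prompt = build_prompt("How do I download a file?", top_docs)
print(prompt)
```

The prompt printed at the end is what would be sent to the LLM in the final step.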

## Setup

In [1]:
import openai
import os
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())
openai.api_type = os.environ.get("OPENAI_API_TYPE")
openai.api_base = os.environ.get("OPENAI_API_BASE")
openai.api_version = os.environ.get("OPENAI_API_VERSION")
openai.api_key = os.environ.get("OPENAI_API_KEY")

## Building the system

### 1. Getting the data

#### 1.1 Sourcing content from HF

In [2]:
from typing import List


def _get_documentation_urls() -> List[str]:
    # List of relative URLs for Hugging Face documentation pages
    return [
        "/docs/huggingface_hub/guides/overview",
        "/docs/huggingface_hub/guides/download",
        "/docs/huggingface_hub/guides/upload",
        # '/docs/huggingface_hub/guides/hf_file_system',
        # '/docs/huggingface_hub/guides/repository',
        # '/docs/huggingface_hub/guides/search',
        # You may add additional URLs here or replace all of them
    ]


def _construct_full_url(base_url: str, relative_url: str) -> str:
    # Construct the full URL by appending the relative URL to the base URL
    return base_url + relative_url

In [3]:
import requests
import re
from bs4 import BeautifulSoup


def _scrape_page_content(url: str) -> str:
    # Send a GET request to the URL and parse the HTML response using BeautifulSoup
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # Extract the desired content from the page (in this case, the body text)
    text = soup.body.text.strip()
    # Remove control characters and characters in the \x7f-\xff range (other non-ASCII text is kept)
    text = re.sub(r"[\x00-\x08\x0b-\x0c\x0e-\x1f\x7f-\xff]", "", text)
    # Remove extra whitespace and newlines
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def _scrape_all_content(
    base_url: str, relative_urls: List[str], file_path: str
) -> List[str]:
    # Loop through the list of URLs, scrape content and add it to the content list
    content = []
    for relative_url in relative_urls:
        full_url = _construct_full_url(base_url, relative_url)
        scraped_content = _scrape_page_content(full_url)
        content.append(scraped_content.rstrip("\n"))

    # Write the scraped content to a file
    with open(file_path, "w", encoding="utf-8") as file:
        for item in content:
            file.write("%s\n" % item)

    return content
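
To see what the two substitutions in `_scrape_page_content` do without making a network request, the cleanup can be tried on a small string. The `clean_text` helper and the sample input are assumptions made for this sketch; the regular expressions are copied from the function above.

```python
import re


def clean_text(text: str) -> str:
    """Apply the same two substitutions as the scraper to raw page text."""
    # Drop control characters and code points \x7f-\xff
    # (tab, newline and carriage return survive this step)
    text = re.sub(r"[\x00-\x08\x0b-\x0c\x0e-\x1f\x7f-\xff]", "", text)
    # Collapse every run of whitespace, including newlines, into one space
    text = re.sub(r"\s+", " ", text)
    return text.strip()


sample = "  Download\x00 files\n\n from the\tHub \xa9 "
print(clean_text(sample))  # prints "Download files from the Hub"
```

Because the character class stops at `\xff`, code points above it pass through unchanged, so the first substitution is narrower than a full non-ASCII filter.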

#### 1.2 Loading and splitting texts

In [None]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document
import os

# Define a function to load documents from a file
def _load_docs(file_path) -> List[Document]:
    # Create an empty list to hold the documents
    docs = []
    try:
        # Load the file using the TextLoader class and UTF-8 encoding
        loader = TextLoader(file_path, encoding="utf-8")
        # Split the loaded file into separate documents and add them to the list of documents
        docs.extend(loader.load_and_split())
    except Exception as e:
        # If loading fails, report the error and return an empty list of documents
        print(f"Error loading {file_path}: {e}")
    # Return the list of documents
    return docs


def _split_docs(docs: List[Document]) -> List[Document]:
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    return text_splitter.split_documents(docs)
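
The idea behind `chunk_size` can be illustrated with a small greedy splitter. This is a simplified sketch, not LangChain's implementation: the `split_text` helper is hypothetical, and it splits on a single space, whereas `CharacterTextSplitter` uses `"\n\n"` as its default separator.

```python
from typing import List


def split_text(text: str, chunk_size: int, separator: str = " ") -> List[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece fits, or it is a single oversized piece that cannot be split further
            current = candidate
        else:
            # The piece does not fit: close the current chunk and start a new one
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(split_text("one two three four five six", chunk_size=9))
# prints ['one two', 'three', 'four five', 'six']
```

A chunk can only end where the separator occurs, so text that contains no separator stays in one piece regardless of `chunk_size`.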

### 2. Embedding and storing in Deep Lake

In [5]:
from langchain.vectorstores import DeepLake
from langchain.embeddings import HuggingFaceEmbeddings


def create_knowledge_base(dataset_path: str) -> DeepLake:
    """
    Creates a DeepLake database from the Hugging Face documentation.

    :param dataset_path: The path for DeepLake database.
    :return: A DeepLake database.
    """
    base_url = "https://huggingface.co"
    # Set the file_path to which the scraped content will be saved
    file_path = "../../temp/voice_assistant_kb.txt"
    relative_urls = _get_documentation_urls()
    # Scrape all the content from the relative URLs and save it to the content file
    content = _scrape_all_content(base_url, relative_urls, file_path)
    # Load the content from the file
    docs = _load_docs(file_path)
    # Split the documents into smaller chunks
    docs = _split_docs(docs)
    # Create a DeepLake database with the given dataset path and embedding function
    db = DeepLake(dataset_path=dataset_path, embedding_function=HuggingFaceEmbeddings())
    # Add the individual documents to the database
    db.add_documents(docs)
    # Clean up by deleting the content file
    # os.remove(file_path)
    return db

### 3. Search db for answer

In [19]:
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import RetrievalQA
from typing import Dict

# Search the database for a response based on the user's query
def get_LLM_response(user_input: str, db: DeepLake) -> Dict[str, str]:
    """
    Generates LLM response after searching the database for relevant info based on the user's query.

    :param user_input: The user's query.
    :param db: The DeepLake database.
    :return: The LLM response.
    """
    retriever = db.as_retriever()
    retriever.search_kwargs["distance_metric"] = "cos"
    retriever.search_kwargs["fetch_k"] = 100
    retriever.search_kwargs["maximal_marginal_relevance"] = True
    retriever.search_kwargs["k"] = 4
    model = AzureChatOpenAI(deployment_name="gpt4", temperature=0)
    qa = RetrievalQA.from_llm(model, retriever=retriever, return_source_documents=True)

    print("\nQuerying LLM...")
    return qa(user_input)
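
The retriever settings above ask for maximal marginal relevance (MMR): fetch a larger candidate pool (`fetch_k`), then keep `k` results that are relevant to the query but not redundant with each other. The following is a minimal sketch of the general technique in plain Python, assuming cosine similarity and a trade-off weight named `lambda_mult`; it is not the Deep Lake or LangChain implementation, and the vectors are made up.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(
    query: List[float],
    candidates: List[List[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return the indices of k candidates chosen by maximal marginal relevance."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected candidate (0 if none yet)
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


toy_candidates = [
    [1.0, 0.0],    # relevant
    [0.99, 0.05],  # most relevant, near-duplicate of the first
    [0.6, 0.8],    # less relevant, but different
]
toy_query = [1.0, 0.3]
print(mmr_select(toy_query, toy_candidates, k=2))                   # prints [1, 2]
print(mmr_select(toy_query, toy_candidates, k=2, lambda_mult=1.0))  # prints [1, 0]
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection reduces to plain top-k by similarity, which returns the two near-duplicates.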

### 4. Voice Assistant

#### 4.1 Record Audio

In [7]:
import sounddevice as sd
from scipy.io.wavfile import write


def record_and_save_audio(record_sec: int, file_path: str) -> None:
    """
    Record audio for the given number of seconds and save it to the given file path.

    :param record_sec: The number of seconds to record audio for.
    :param file_path: The file path to which the recorded audio should be saved; must be a `.wav` file.
    """
    fs = 44100  # Sample rate

    print(f"\nRecording for {record_sec} seconds...")
    myrecording = sd.rec(int(record_sec * fs), samplerate=fs, channels=1)
    sd.wait()  # Wait until recording is finished
    print("Recording finished.")

    write(file_path, fs, myrecording)  # Save as WAV file
    print(f"Recording saved at {file_path}")

#### 4.2 Transcribe Recorded Audio (Speech to Text)

In [30]:
from typing import Optional

import whisper

# Transcribe audio using a locally loaded OpenAI Whisper model
def transcribe_audio(audio_file_path: str) -> Optional[str]:
    """
    Converts the given audio file to text using a local OpenAI Whisper model.

    :param audio_file_path: The path to the audio file to be transcribed.
    :return: The transcribed text, or None if transcription fails.
    """
    try:
        model = whisper.load_model("base")

        print("\nTranscribing audio...")
        response = model.transcribe(audio_file_path, fp16=False)
        print("Transcription complete.")

        print(f"Transcribed text: {response['text']}")

        return response["text"]
    except Exception as e:
        print(f"Error transcribing audio with Whisper: {str(e)}")
        return None

#### 4.3 Play the response (Text to Speech)

In [9]:
from gtts import gTTS
import sounddevice as sd
import soundfile as sf


def text_to_speech_play(text: str, file_path: str, language: str = "en") -> None:
    """
    Convert text to speech, save it to a file and then play it.

    :param text: The text to convert to speech.
    :param file_path: The path to the file to save the speech to.
    :param language: The language of the text.
    """
    myobj = gTTS(text=text, lang=language, slow=False)

    # Save the converted audio as an MP3 file at file_path
    myobj.save(file_path)
    print(f"\nResponse as speech saved at {file_path}")

    # Playing the converted file
    # Extract data and sampling rate from file
    data, fs = sf.read(file_path, dtype="float32")
    sd.play(data, fs)
    status = sd.wait()  # Wait until file is done playing

### 5. Putting it all together

#### 5.1 Get the knowledge base ready in the vector store

In [10]:
my_activeloop_org_id = os.environ.get("ACTIVELOOP_ORG_ID")
my_activeloop_dataset_name = "voice_assistant_kb"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
db = create_knowledge_base(dataset_path)

Your Deep Lake dataset has been successfully created!
The dataset is private so make sure you are logged in!


 

Dataset(path='hub://iamrk04/voice_assistant_kb', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
 embedding  embedding  (10, 768)  float32   None   
    id        text      (10, 1)     str     None   
 metadata     json      (10, 1)     str     None   
   text       text      (10, 1)     str     None   
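
If `ACTIVELOOP_ORG_ID` is not set, `os.environ.get` returns `None` and the f-string silently produces `hub://None/voice_assistant_kb`. A small guard can fail early instead. The `build_dataset_path` helper below is a hypothetical addition, not part of the original notebook, and `my-org` is a placeholder organization ID.

```python
import os
from typing import Mapping


def build_dataset_path(
    dataset_name: str,
    environ: Mapping[str, str] = os.environ,
    env_var: str = "ACTIVELOOP_ORG_ID",
) -> str:
    """Build a hub:// dataset path, raising if the organization ID is missing."""
    org_id = environ.get(env_var)
    if not org_id:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return f"hub://{org_id}/{dataset_name}"


# Demo with an explicit mapping so the example does not depend on the real environment
print(build_dataset_path("voice_assistant_kb", {"ACTIVELOOP_ORG_ID": "my-org"}))
# prints hub://my-org/voice_assistant_kb
```

Passing the environment as a parameter keeps the helper easy to test; calling it with only the dataset name reads from `os.environ`.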


#### 5.2 Get the VectorStore handle

In [11]:
def load_embeddings_and_database(active_loop_data_set_path: str) -> DeepLake:
    embeddings = HuggingFaceEmbeddings()
    print("\nLoading database...")
    db = DeepLake(
        dataset_path=active_loop_data_set_path,
        read_only=True,
        embedding_function=embeddings,
    )
    print("Database loaded.")
    return db

#### 5.3 The `main()` function

In [20]:
import os


def main(record_sec: int = 10) -> None:
    """
    The main function that runs the entire system.

    :param record_sec: The number of seconds to record audio for.
    """
    temp_folder = "../../temp"
    os.makedirs(temp_folder, exist_ok=True)
    record_audio_file_path = temp_folder + "/record.wav"
    output_audio_file_path = temp_folder + "/output.mp3"

    # record audio
    record_and_save_audio(record_sec=record_sec, file_path=record_audio_file_path)

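    # transcribe audio; stop early if transcription failed or produced no text
    text = transcribe_audio(audio_file_path=record_audio_file_path)
    if not text:
        print("No transcription available; aborting.")
        return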

    # load embeddings and database
    my_activeloop_org_id = os.environ.get("ACTIVELOOP_ORG_ID")
    my_activeloop_dataset_name = "voice_assistant_kb"
    dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
    db = load_embeddings_and_database(active_loop_data_set_path=dataset_path)

    # retrieve relevant documents and query the LLM
    response = get_LLM_response(user_input=text, db=db)
    print(f"Response: {response}")

    result = response["result"]

    # text to speech
    text_to_speech_play(text=result, file_path=output_audio_file_path)

### 6. Run the system

In [31]:
main(record_sec=8)


Recording for 8 seconds...
Recording finished.
Recording saved at ../../temp/record.wav

Transcribing audio...
Transcription complete.
Transcribed text:  How do I search for model at hugging face hub?

Loading database...
Deep Lake Dataset in hub://iamrk04/voice_assistant_kb already exists, loading from the storage
Database loaded.

Querying LLM...
Response: {'query': ' How do I search for model at hugging face hub?', 'result': 'To search for models on the Hugging Face Hub using the `huggingface_hub` library, you can use the `HfApi` class and its `search_models()` method. Here\'s an example of how to search for models:\n\n```python\nfrom huggingface_hub import HfApi\n\napi = HfApi()\nmodels = api.search_models("bert")\n\nfor model in models:\n    print(model.modelId)\n```\n\nReplace "bert" with the keyword or model name you want to search for. This will return a list of models matching your search query, and you can iterate through the results to display the model IDs.', 'source_docum