# Personal Assistant with web knowledge

![personal Assistant PoC](images/nvidia_personal_assistant.png?raw=true)

In [None]:
#!pip install --upgrade --quiet langchain-nvidia-ai-endpoints
#!pip install langchain==0.1.13 langchain-community==0.0.31 langchain-core==0.1.38
#!pip install beautifulsoup4==4.12.3 numpy==1.26.4 sounddevice==0.4.6 openai-whisper==20231117 rich==13.7.1 

## Core Tech: Nvidia + LangChain

### langchain-nvidia-ai-endpoints

The langchain-nvidia-ai-endpoints package contains LangChain integrations for building applications with models on NVIDIA NIM inference microservices. NVIDIA-hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/explore/discover).

NVIDIA NIM supports chat, embedding, and re-ranking models from both the community and NVIDIA. NVIDIA optimizes these models for NVIDIA accelerated infrastructure and packages each one as a NIM: an easy-to-use, prebuilt container that can be deployed anywhere on NVIDIA accelerated infrastructure with a single command.

For more information on accessing the chat models through this API, see the [ChatNVIDIA](https://python.langchain.com/v0.2/docs/integrations/chat/nvidia_ai_endpoints/) documentation.

In [None]:
# Foundation models through the NVIDIA NIM APIs or endpoints
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# along with the LangChain framework
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate

### App & GUI Dependency

In [None]:
# for application and GUI
import time
import threading
import numpy as np
import sounddevice as sd
import subprocess
from queue import Queue
from rich.console import Console

# for speech recognition
import whisper

# for search result scraping
import requests
from bs4 import BeautifulSoup

### Setup NVIDIA NIM endpoint

1. Create a free account with NVIDIA, which hosts NVIDIA AI Foundation models.
1. Click on your model of choice.
1. Under Input select the Python tab, and click Get API Key. Then click Generate Key.
1. Copy and save the generated key as NVIDIA_API_KEY. From there, you should have access to the endpoints.

In [None]:
# get access to the NVIDIA API
import getpass
import os
if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key

### LLM on Nvidia

In [None]:
# Working with NVIDIA API Catalog
chat = ChatNVIDIA(
    model="mistralai/mixtral-8x22b-instruct-v0.1",
    temperature=0.1,
    max_tokens=100,
    top_p=1.0,
)

# If you have NVIDIA hardware, you can run NIMs fully locally instead.
# Connect to a chat NIM running at localhost:8000, specifying the model it serves:
# chat = ChatNVIDIA(base_url="http://localhost:8000/v1", model="meta/llama3-8b-instruct")

### LangChain ConversationChain

`ConversationChain` holds a conversation and loads prior context from memory. Key features of LangChain's `ConversationChain`:
1. Context Management: Tracks the history of the conversation, allowing the AI to provide contextually relevant responses.
1. Memory: Stores important information across multiple turns, such as user preferences or key details mentioned earlier in the conversation.
1. Response Generation: Integrates with various language models to generate natural language responses.
1. State Management: Maintains the overall state of the conversation, keeping track of ongoing topics, user queries, and other relevant details.
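The context management and memory behavior described above can be illustrated without LangChain. The sketch below is a simplified, hypothetical stand-in (`BufferMemory` and `build_prompt` are illustrative names, not LangChain APIs): it keeps every turn verbatim, renders the transcript as `Prefix: text` lines, and substitutes it into a prompt shaped like the one used in this notebook.

```python
from dataclasses import dataclass, field

# Prompt shaped like the notebook's template; {history} and {input} are filled per turn.
TEMPLATE = (
    "The conversation transcript is as follows:\n"
    "{history}\n\n"
    "And here is the user's follow-up: {input}\n\n"
    "Your response:"
)


@dataclass
class BufferMemory:
    """Hypothetical buffer memory: stores every turn verbatim, in order."""

    human_prefix: str = "Human"
    ai_prefix: str = "Assistant"
    turns: list[tuple[str, str]] = field(default_factory=list)

    def save_context(self, user_text: str, ai_text: str) -> None:
        # One exchange adds two lines to the transcript.
        self.turns.append((self.human_prefix, user_text))
        self.turns.append((self.ai_prefix, ai_text))

    def history(self) -> str:
        # Each turn is rendered as "Prefix: text", one per line.
        return "\n".join(f"{role}: {text}" for role, text in self.turns)


def build_prompt(memory: BufferMemory, user_input: str) -> str:
    """Fill the template with the whole transcript plus the new user input."""
    return TEMPLATE.format(history=memory.history(), input=user_input)


memory = BufferMemory()
memory.save_context("Hi, I'm Ana.", "Hello Ana! How can I help?")
print(build_prompt(memory, "What's my name?"))
```

Because the prefix is joined to the text with `": "`, passing a prefix that already ends in a colon would render turns as `Assistant:: ...`. Note also that a buffer memory grows with every turn, so anything appended to the user input is resent on all later turns.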

In [None]:
template = """
You are a helpful and friendly AI assistant. You are polite, respectful, and aim to provide concise responses of fewer
than 20 words.

The conversation transcript is as follows:
{history}

And here is the user's follow-up: {input}

Your response:
"""

PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)

chain = ConversationChain(
    prompt=PROMPT,
    verbose=False,
    # the buffer renders turns as "<prefix>: <text>", so the prefix itself omits the colon
    memory=ConversationBufferMemory(ai_prefix="Assistant"),
    llm=chat,
)

## App & GUI

This voice assistant demo combines NVIDIA NIM with the LangChain framework. Key features include:

1. Nvidia NIM Computing Power: Serves as the BRAIN, hosting the LLM that generates each response.
1. LangChain Framework Coordination: Functions as the NERVE, connecting the prompt template, conversation memory, and LLM.
1. Edge Device Compatibility: Designed for voice or text input and output on edge devices such as AI-enabled PCs.

#### Current Demo Performance and Future Improvements

The current demo has significant delays when processing both input and output. For input, the OpenAI Whisper speech-to-text model runs locally on a MacBook. For output, the demo uses `say`, the built-in macOS text-to-speech command. To achieve real-time performance, the following improvements are proposed:

* Hardware Upgrade with Nvidia GPUs: 
   1. Deploy Nvidia NIMs locally to utilize self-hosted models
   1. Use NVIDIA TensorRT-LLM to run higher-quality text-to-speech models with customized voice options
* Multilingual input & output: Integrate NVIDIA technology to support multiple languages using models such as those available on Hugging Face
* Enhanced Answer Accuracy: Use LangGraph to build AI agents that improve the accuracy of responses

These enhancements aim to significantly reduce processing time and improve the overall efficiency and effectiveness of the demo.
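Part of the input latency comes from the Whisper step. Whisper transcribes 16 kHz mono audio given as float32 samples in [-1.0, 1.0), which is why the app below records 16-bit PCM and divides by 32768. The dependency-free sketch below shows that conversion with the standard `array` module (the app itself uses `np.frombuffer(...).astype(np.float32) / 32768.0`, which is the same arithmetic) and estimates how many seconds of audio each transcription must process. `pcm16_to_float` and `duration_seconds` are illustrative helper names.

```python
import array

SAMPLE_RATE = 16_000  # the app records 16 kHz, 16-bit, mono audio


def pcm16_to_float(raw: bytes) -> list[float]:
    """Convert signed 16-bit PCM bytes (native byte order) to floats in [-1.0, 1.0)."""
    samples = array.array("h")  # 'h' = signed short, 2 bytes on common platforms
    samples.frombytes(raw)      # raises ValueError if len(raw) is not a multiple of 2
    return [s / 32768.0 for s in samples]


def duration_seconds(raw: bytes, sample_rate: int = SAMPLE_RATE) -> float:
    """Length of a mono 16-bit recording: 2 bytes per sample."""
    return len(raw) / 2 / sample_rate


clip = array.array("h", [0, 16384, -32768, 32767]).tobytes()
print(pcm16_to_float(clip))
print(duration_seconds(b"\x00\x00" * 48_000), "seconds")
```

Dividing by 32768 rather than 32767 maps the most negative sample exactly to -1.0 and keeps every value below 1.0.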


In [None]:
console = Console()
stt = whisper.load_model("base.en") 

def record_audio(stop_event, data_queue):
    """
    Captures audio data from the user's microphone and adds it to a queue for further processing.

    Args:
        stop_event (threading.Event): An event that, when set, signals the function to stop recording.
        data_queue (queue.Queue): A queue to which the recorded audio data will be added.

    Returns:
        None
    """
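    def callback(indata, frames, time_info, status):
        # Called by sounddevice on its audio thread for each block of samples.
        # `time_info` avoids shadowing the `time` module inside this function.
        if status:
            console.print(status)
        data_queue.put(bytes(indata))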

    with sd.RawInputStream(
        samplerate=16000, dtype="int16", channels=1, callback=callback
    ):
        while not stop_event.is_set():
            time.sleep(0.1)


def transcribe(audio_np: np.ndarray) -> str:
    """
    Transcribes the given audio data using the Whisper speech recognition model.

    Args:
        audio_np (numpy.ndarray): The audio data to be transcribed.

    Returns:
        str: The transcribed text.
    """
    result = stt.transcribe(audio_np, fp16=False)  # Set fp16=True if using a GPU
    text = result["text"].strip()
    return text


def get_llm_response(text: str) -> str:
    """
    Generates a response to the given text using the language model served by NVIDIA NIM, via the ConversationChain.

    Args:
        text (str): The input text to be processed.

    Returns:
        str: The generated response.
    """
    response = chain.predict(input=text)
    if response.startswith("Assistant:"):
        response = response[len("Assistant:") :].strip()
    return response

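# define a function to fetch the search results page (SRP) for a query
def fetch_srp(query: str, max_chars: int = 2000) -> str:
    """
    Scrapes a Google search results page and returns its visible text.

    This parses HTML rather than calling an API, so the page layout may change and
    automated requests may be blocked or rate-limited.
    """
    try:
        r = requests.get(
            "https://www.google.com/search",
            params={"q": query, "hl": "en", "lr": "lang_en"},  # URL-encodes the query
            timeout=10,
        )
        r.raise_for_status()
    except requests.RequestException as exc:
        console.print(f"[red]Search request failed: {exc}")
        return ""

    # Use the 'html.parser' to parse the page and drop non-visible script/style content
    soup = BeautifulSoup(r.content, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Truncate (2000 chars is an arbitrary cap): the snippets are also stored in memory
    return soup.get_text(separator=" ", strip=True)[:max_chars]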

if __name__ == "__main__":
    console.print("[cyan]Assistant started! Press Ctrl+C to exit.")

    try:
        while True:
            console.input(
                "Press Enter to start recording, then press Enter again to stop."
            )

            data_queue = Queue()  # type: ignore[var-annotated]
            stop_event = threading.Event()
            recording_thread = threading.Thread(
                target=record_audio,
                args=(stop_event, data_queue),
            )
            recording_thread.start()

            input()
            stop_event.set()
            recording_thread.join()

            audio_data = b"".join(list(data_queue.queue))
            audio_np = (
                np.frombuffer(audio_data, dtype=np.int16).astype(np.float32) / 32768.0
            )

            if audio_np.size > 0:
                with console.status("Transcribing...", spinner="earth"):
                    text = transcribe(audio_np)
                console.print(f"[yellow]You: {text}")

                with console.status("Generating response...", spinner="earth"):
                    srp = fetch_srp(text)
                    # append the search-result snippets to the user's question
                    text = text + ". \n\nAnd here are related search result snippets to help you prepare a response: " + srp
                    response = get_llm_response(text)

                console.print(f"[cyan]Assistant: {response}")
                # `say` is the macOS text-to-speech command; it is not available on other platforms
                subprocess.run(["say", response])
            else:
                console.print(
                    "[red]No audio recorded. Please ensure your microphone is working."
                )

    except KeyboardInterrupt:
        console.print("\n[red]Exiting...")

    console.print("[blue]Session ended.")
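The `fetch_srp` step turns a search results page into plain text that is appended to the user's question; this is what gives the assistant its web knowledge. The dependency-free sketch below shows the same idea with Python's standard `html.parser` in place of BeautifulSoup, so it can be tested offline on a saved HTML string. `search_url`, `visible_text`, and `augment_question` are hypothetical helpers, and the 2,000-character cap is an arbitrary assumption to keep the prompt small.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def search_url(query: str) -> str:
    # urlencode escapes spaces and special characters such as '&' in the query
    return "https://www.google.com/search?" + urlencode({"q": query, "hl": "en", "lr": "lang_en"})


def visible_text(html: str, max_chars: int = 2000) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)[:max_chars]


def augment_question(question: str, snippets: str) -> str:
    # Retrieval-augmented input: the model sees the question plus the scraped text
    return f"{question}\n\nRelated search result snippets: {snippets}"


page = "<html><head><script>var x = 1;</script></head><body><p>Paris</p><p>is the capital</p></body></html>"
print(search_url("capital of France"))
print(augment_question("What is the capital of France?", visible_text(page)))
```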


## Acknowledgement

* Thanks to DUY HUYNH. The voice processing and console UI code is copied from DUY's work [Build your own voice assistant and run it locally: Whisper + Ollama + Bark](https://blog.duy.dev/build-your-own-voice-assistant-and-run-it-locally/)
* NVIDIA, for free NIM endpoint credits
* LangChain, for its many tutorials

## How to run the app

```shell
python app_nvidia.py
```
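The app speaks each reply with `subprocess.run(["say", response])`, which relies on the macOS `say` command; elsewhere that call raises `FileNotFoundError` and ends the session. A minimal guard, assuming you simply want to skip speech when `say` is not on `PATH`, is sketched below. `tts_command` and `speak` are hypothetical helper names; `which` is injectable so the lookup can be tested without a speaker.

```python
import shutil
import subprocess
from typing import Callable, Optional


def tts_command(
    text: str, which: Callable[[str], Optional[str]] = shutil.which
) -> Optional[list[str]]:
    """Return a command that speaks `text` with `say`, or None if `say` is not found."""
    say_path = which("say")
    return [say_path, text] if say_path else None


def speak(text: str) -> bool:
    """Speak `text` if `say` is available; return whether speech was attempted."""
    cmd = tts_command(text)
    if cmd is None:
        return False  # e.g. Linux or Windows: the reply is still printed to the console
    subprocess.run(cmd, check=False)
    return True
```

In the main loop, `subprocess.run(["say", response])` could then be replaced with `speak(response)`.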