## <b><font color='darkblue'>Preface</font></b>
([course link](https://learn.deeplearning.ai/courses/langchain-chat-with-your-data/lesson/1/introduction))

## <b><font color='darkblue'>Document Loading</font></b>
([course link](https://learn.deeplearning.ai/courses/langchain-chat-with-your-data/lesson/2/document-loading))

In [123]:
from IPython.display import display, Markdown, Latex

def print_markdown(markdown_content):
    display(Markdown(markdown_content))

### <b><font color='darkgreen'>Retrieval augmented generation</font></b>
<b><font size='3ptx'>In retrieval augmented generation (RAG), an LLM retrieves contextual documents from an external dataset as part of its execution</font></b>. This is useful if we want to ask questions about specific documents (e.g., our PDFs, a set of videos, etc.).

![RAG flow](images/1.PNG)

In [1]:
#! pip install langchain

In [2]:
import os
import openai
import sys
# sys.path.append(os.path.expanduser('~'))

from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

#### <b>PDF</b>
Let's load a PDF transcript from Andrew Ng's famous CS229 course! These documents are the result of automated transcription so words and sentences are sometimes split unexpectedly.

In [3]:
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("docs/machinelearning-lecture01.pdf")
pages = loader.load()

Each `page` is a **Document**. A **Document** contains text (`page_content`) and `metadata`.
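As a rough mental model (not LangChain's actual class), a Document can be pictured as a small record with two fields. The `SimpleDocument` name and the placeholder text below are made up for illustration; only the metadata values mirror what the loader reports further down.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document (illustration only)."""
    page_content: str                              # the extracted text
    metadata: dict = field(default_factory=dict)   # e.g. source file, page number


doc = SimpleDocument(
    page_content="MachineLearning-Lecture01 (placeholder text)",
    metadata={"source": "docs/machinelearning-lecture01.pdf", "page": 0},
)
print(doc.metadata["source"], doc.metadata["page"], len(doc.page_content))
```

Keeping the metadata next to the text is what later allows chunks to be filtered by source or page.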

In [4]:
len(pages)

22

In [5]:
page = pages[0]

In [6]:
print(page.page_content[0:500])

MachineLearning-Lecture01  
Instructor (Andrew Ng):  Okay. Good morning. Welcome to CS229, the machine 
learning class. So what I wanna do today is ju st spend a little time going over the logistics 
of the class, and then we'll start to  talk a bit about machine learning.  
By way of introduction, my name's  Andrew Ng and I'll be instru ctor for this class. And so 
I personally work in machine learning, and I' ve worked on it for about 15 years now, and 
I actually think that machine learning i


In [7]:
page.metadata

{'source': 'docs/machinelearning-lecture01.pdf', 'page': 0}

#### <b>YouTube</b>
**Note:** This can take several minutes to complete:

In [8]:
#!pip install yt_dlp
#!pip install pydub

In [9]:
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

In [10]:
# The code snippet below takes a long time to run:
#url="https://www.youtube.com/watch?v=jGwO_UgTS7I"
#save_dir="docs/youtube/"
#loader = GenericLoader(
#    YoutubeAudioLoader([url],save_dir),
#    OpenAIWhisperParser()
#)
#docs = loader.load()

In [11]:
#docs[0].page_content[0:500]

#### <b>URLs</b>

In [12]:
from langchain.document_loaders import WebBaseLoader

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [13]:
loader = WebBaseLoader("https://github.com/basecamp/handbook/blob/master/37signals-is-you.md")

In [14]:
docs = loader.load()

In [15]:
print('\n'.join([line for line in docs[0].page_content[:500].split('\n') if line.strip()]))

File not found · GitHub
Skip to content
Navigation Menu
Toggle navigation
            Sign in
        Product
GitHub Copilot
        Write better code with AI
Security
        Find and fix vulnerabilities
Actions
        Automate any workflow
Codespaces


#### <b>Notion</b>
Follow steps here for an example Notion site such as [this one](https://yolospace.notion.site/Blendle-s-Employee-Handbook-e31bff7da17346ee99f531087d8b133f):
* Duplicate the page into your own Notion space and export as Markdown / CSV.
* Unzip it and save it as a folder that contains the markdown file for the Notion page.

In [16]:
from langchain.document_loaders import NotionDirectoryLoader

In [17]:
# loader = NotionDirectoryLoader("docs/Notion_DB")
# docs = loader.load()

In [18]:
# print(docs[0].page_content[0:200])

In [19]:
# docs[0].metadata

## <b><font color='darkblue'>Document Splitting</font></b>
([course link](https://learn.deeplearning.ai/courses/langchain-chat-with-your-data/lesson/3/document-splitting)) <b><font size='3ptx'>Once you've loaded documents, you'll often want to transform them to better suit your application.</font></b>
![steps](images/2.PNG)

The simplest example is splitting a long document into smaller chunks that fit into your model's context window. LangChain has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents. ([more](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/))

In [20]:
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter

chunk_size = 26
chunk_overlap = 4

* [**RecursiveCharacterTextSplitter**](https://api.python.langchain.com/en/latest/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html): Splits text by recursively trying a list of separator characters, from coarsest to finest.
* [**CharacterTextSplitter**](https://api.python.langchain.com/en/latest/character/langchain_text_splitters.character.CharacterTextSplitter.html): Splits text on a single separator.

In [21]:
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)
c_splitter = CharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)

Why doesn't this split the string below? (It is exactly 26 characters long, so it fits into a single chunk of `chunk_size = 26`.)

In [22]:
text1 = 'abcdefghijklmnopqrstuvwxyz'
r_splitter.split_text(text1)

['abcdefghijklmnopqrstuvwxyz']

In [23]:
text2 = 'abcdefghijklmnopqrstuvwxyzabcdefg'
r_splitter.split_text(text2)

['abcdefghijklmnopqrstuvwxyz', 'wxyzabcdefg']

OK, this splits the string. The second chunk starts with `wxyz`, so the two chunks share 4 characters, which matches the `chunk_overlap = 4` we specified.
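For a string without any separators, the two chunks above are what a plain sliding window produces: each window is `chunk_size` characters long and starts `chunk_size - chunk_overlap` characters after the previous one. The sketch below is a simplified illustration under that assumption, not LangChain's implementation; `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):   # this window already reaches the end
            break
        start += step
    return chunks


chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyzabcdefg", 26, 4)
print(chunks)  # ['abcdefghijklmnopqrstuvwxyz', 'wxyzabcdefg']
```

The second window starts at index 22 (`26 - 4`), which is the letter `w`.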

In [24]:
text3 = "a b c d e f g h i j k l m n o p q r s t u v w x y z"

In [25]:
r_splitter.split_text(text3)

['a b c d e f g h i j k l m', 'l m n o p q r s t u v w x', 'w x y z']

In [26]:
c_splitter.split_text(text3)

['a b c d e f g h i j k l m n o p q r s t u v w x y z']

In [27]:
c_splitter = CharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    separator = ' '
)
c_splitter.split_text(text3)

['a b c d e f g h i j k l m', 'l m n o p q r s t u v w x', 'w x y z']

### <font color='darkgreen'><b>Recursive splitting details</b></font>
[**RecursiveCharacterTextSplitter**](https://api.python.langchain.com/en/latest/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html) is recommended for generic text.
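To make the "recursive" part concrete, here is a minimal sketch of the idea: split on the coarsest separator that occurs in the text, merge neighbouring pieces while they still fit, and recurse with finer separators only on pieces that remain too long. It is an illustration under simplifying assumptions (no overlap, separators are dropped at chunk boundaries), not the library's actual algorithm, and `recursive_split` is a hypothetical name.

```python
def recursive_split(text, chunk_size, separators):
    """Simplified recursive splitting: coarse separators first, finer ones on demand."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Use the first (coarsest) separator that occurs in the text.
    for index, sep in enumerate(separators):
        if sep == "" or sep in text:
            break
    else:
        # No usable separator left: fall back to hard cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    pieces = list(text) if sep == "" else text.split(sep)
    finer = separators[index + 1:]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate            # the piece still fits into the open chunk
            continue
        if current:
            chunks.append(current)         # close the open chunk
        if len(piece) <= chunk_size:
            current = piece                # start a new chunk with this piece
        else:
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "aaa bbb ccc\n\nddd eee"
print(recursive_split(sample, 8, ["\n\n", " ", ""]))  # ['aaa bbb', 'ccc', 'ddd eee']
```

The first paragraph is too long for one chunk, so only that paragraph is split again on spaces; the second paragraph stays intact.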

In [28]:
some_text = """When writing documents, writers will use document structure to group content. \
This can convey to the reader, which idea's are related. For example, closely related ideas \
are in sentances. Similar ideas are in paragraphs. Paragraphs form a document. \n\n  \
Paragraphs are often delimited with a carriage return or two carriage returns. \
Carriage returns are the "backslash n" you see embedded in this string. \
Sentences have a period at the end, but also, have a space.\
and words are separated by space."""

In [29]:
len(some_text)

496

In [30]:
c_splitter = CharacterTextSplitter(
    chunk_size=450,
    chunk_overlap=0,
    separator = ' '
)
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=450,
    chunk_overlap=0, 
    separators=["\n\n", "\n", " ", ""]
)

In [31]:
for line in c_splitter.split_text(some_text):
    print(f'> {line} <')

> When writing documents, writers will use document structure to group content. This can convey to the reader, which idea's are related. For example, closely related ideas are in sentances. Similar ideas are in paragraphs. Paragraphs form a document. 

 Paragraphs are often delimited with a carriage return or two carriage returns. Carriage returns are the "backslash n" you see embedded in this string. Sentences have a period at the end, but also, <
> have a space.and words are separated by space. <


In [32]:
for line in r_splitter.split_text(some_text):
    print(f'> {line} <')

> When writing documents, writers will use document structure to group content. This can convey to the reader, which idea's are related. For example, closely related ideas are in sentances. Similar ideas are in paragraphs. Paragraphs form a document. <
> Paragraphs are often delimited with a carriage return or two carriage returns. Carriage returns are the "backslash n" you see embedded in this string. Sentences have a period at the end, but also, have a space.and words are separated by space. <


Let's reduce the chunk size a bit and add a period to our separators:

In [33]:
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=150,
    chunk_overlap=0,
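    # Separators are matched literally unless `is_separator_regex=True` is passed,
    # so the escaped period below has no effect on the output that follows.
    separators=["\n\n", "\n", r"\. ", " ", ""]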
)
for line in r_splitter.split_text(some_text):
    print(f'> {line} <')

> When writing documents, writers will use document structure to group content. This can convey to the reader, which idea's are related. For example, <
> closely related ideas are in sentances. Similar ideas are in paragraphs. Paragraphs form a document. <
> Paragraphs are often delimited with a carriage return or two carriage returns. Carriage returns are the "backslash n" you see embedded in this <
> string. Sentences have a period at the end, but also, have a space.and words are separated by space. <


In [34]:
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=150,
    chunk_overlap=0,
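    # The lookbehind is also treated as a literal string here (no `is_separator_regex=True`),
    # which is why the output below is unchanged.
    separators=["\n\n", "\n", r"(?<=\. )", " ", ""]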
)
for line in r_splitter.split_text(some_text):
    print(f'> {line} <')

> When writing documents, writers will use document structure to group content. This can convey to the reader, which idea's are related. For example, <
> closely related ideas are in sentances. Similar ideas are in paragraphs. Paragraphs form a document. <
> Paragraphs are often delimited with a carriage return or two carriage returns. Carriage returns are the "backslash n" you see embedded in this <
> string. Sentences have a period at the end, but also, have a space.and words are separated by space. <


In [35]:
loader = PyPDFLoader("docs/machinelearning-lecture01.pdf")
pages = loader.load()

### <b><font color='darkgreen'>Token splitting</font></b>
<font size='3ptx'><b>We can also split on token count explicitly, if we want.</b></font>

This can be useful because LLMs often have context windows designated in tokens. Tokens are often ~4 characters.
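The sketch below illustrates both points with the standard library only: a rough token estimate from the ~4 characters rule of thumb, and chunking a token list by count. Whitespace-separated words stand in for real tokens here, which is an assumption for illustration; actual tokenizers split text into subword pieces. Both function names are hypothetical.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate based on the ~4 characters per token rule of thumb."""
    return math.ceil(len(text) / chars_per_token)


def split_by_token_count(tokens, chunk_size, chunk_overlap=0):
    """Group a token list into chunks of at most chunk_size tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
        start += chunk_size - chunk_overlap
    return chunks


words = ["a", "b", "c", "d", "e"]           # whitespace "tokens" as a stand-in
print(estimate_tokens("abcdefgh"))           # 2
print(split_by_token_count(words, 2))        # [['a', 'b'], ['c', 'd'], ['e']]
print(split_by_token_count(words, 3, 1))     # [['a', 'b', 'c'], ['c', 'd', 'e']]
```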

In [36]:
from langchain.text_splitter import TokenTextSplitter

text1 = "foo bar bazzyfoo"

In [37]:
text_splitter = TokenTextSplitter(chunk_size=1, chunk_overlap=0)
text_splitter.split_text(text1)

['foo', ' bar', ' b', 'az', 'zy', 'foo']

In [38]:
pages[0].page_content[:100]

'MachineLearning-Lecture01  \nInstructor (Andrew Ng):  Okay. Good morning. Welcome to CS229, the machi'

In [39]:
text_splitter = TokenTextSplitter(chunk_size=10, chunk_overlap=0)
docs = text_splitter.split_text(text1)

In [40]:
docs[0]

'foo bar bazzyfoo'

In [41]:
pages[0].metadata

{'source': 'docs/machinelearning-lecture01.pdf', 'page': 0}

### <b><font color='darkgreen'>Context aware splitting</font></b>
<b><font size='3ptx'>Chunking aims to keep text with common context together.</font></b>

<b>Text splitting often uses sentences or other delimiters to keep related text together</b>, but many documents (<font color='brown'>such as Markdown</font>) have structure (<font color='brown'>headers</font>) that can be explicitly used in splitting. We can use [**`MarkdownHeaderTextSplitter`**](https://python.langchain.com/api_reference/text_splitters/markdown/langchain_text_splitters.markdown.MarkdownHeaderTextSplitter.html) to preserve header metadata in our chunks, as shown below.
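The idea of header-aware splitting can be sketched in a few lines of plain Python: track which header is in effect at each level and attach that state to every block of text. This is a simplified illustration, not the library's implementation; `split_by_headers` is a hypothetical helper, and it would misread `#` comments inside fenced code blocks as headers.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.+)$")


def split_by_headers(markdown, max_level=3):
    """Return (metadata, content) pairs, one per block of text under a header path."""
    headers = {}             # level -> title of the header currently in effect
    lines, sections = [], []

    def flush():
        content = "\n".join(lines).strip()
        if content:
            metadata = {f"Header {lvl}": headers[lvl] for lvl in sorted(headers)}
            sections.append((metadata, content))
        lines.clear()

    for raw in markdown.splitlines():
        match = HEADER_RE.match(raw.strip())
        if match and len(match.group(1)) <= max_level:
            flush()
            level = len(match.group(1))
            # A new header closes every open header at the same or a deeper level.
            for lvl in [l for l in headers if l >= level]:
                del headers[lvl]
            headers[level] = match.group(2)
        else:
            lines.append(raw.strip())
    flush()
    return sections


sample = "# Title\n\n## Chapter 1\n\nHi this is Jim\n\n### Section\n\nHi this is Lance\n\n## Chapter 2\n\nHi this is Molly"
sections = split_by_headers(sample)
for metadata, content in sections:
    print(metadata, "->", content)
```

Each chunk carries its full header path, so a later retrieval step can tell which chapter or section a piece of text came from.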

In [42]:
from langchain.text_splitter import MarkdownHeaderTextSplitter

markdown_document = """# Title\n\n \
## Chapter 1\n\n \
Hi this is Jim\n\n Hi this is Joe\n\n \
### Section \n\n \
Hi this is Lance \n\n 
## Chapter 2\n\n \
Hi this is Molly"""

headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]

In [43]:
markdown_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on
)
md_header_splits = markdown_splitter.split_text(markdown_document)

In [44]:
md_header_splits[0]

Document(metadata={'Header 1': 'Title', 'Header 2': 'Chapter 1'}, page_content='Hi this is Jim  \nHi this is Joe')

In [45]:
md_header_splits[1]

Document(metadata={'Header 1': 'Title', 'Header 2': 'Chapter 1', 'Header 3': 'Section'}, page_content='Hi this is Lance')

Let's try it on a real Markdown file, `docs/BTTC_README.md` ([How to load Markdown](https://python.langchain.com/docs/how_to/document_loader_markdown/)):

In [46]:
!pip freeze | grep unstructure

unstructured==0.16.3
unstructured-client==0.26.1


In [47]:
from langchain_community.document_loaders import UnstructuredMarkdownLoader
from langchain_core.documents import Document

markdown_path = "docs/BTTC_README.md"
loader = UnstructuredMarkdownLoader(markdown_path)

In [48]:
#docs = loader.load()
#txt = ' '.join([d.page_content for d in docs])
txt = open(markdown_path).read()

In [49]:
print(txt[:200])

## Common utilities used in BT testing
This package is used to hold common utilities for BT testing.

## Installation
You can install the released package from pip:

```shell
$ pip install bttc
```

o


In [50]:
headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
    ("####", "Header 4"),
]
markdown_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on,
)

In [51]:
md_header_splits = markdown_splitter.split_text(txt)

In [52]:
print(md_header_splits[0].page_content[:50])

This package is used to hold common utilities for 


In [53]:
md_header_splits[0].metadata

{'Header 2': 'Common utilities used in BT testing'}

## <b><font color='darkblue'>Vectorstores and Embedding</font></b>
([course link](https://learn.deeplearning.ai/courses/langchain-chat-with-your-data/lesson/4/vectorstores-and-embedding)) <b><font size='3ptx'>Recall the overall workflow for retrieval augmented generation (RAG):</font></b>
![flow](images/4.PNG)

We just discussed `Document Loading` and `Splitting`. Here we prepare some splits to demonstrate vectorstores and embeddings.

In [54]:
from langchain_community.document_loaders.text import TextLoader

# Load Markdown files
loaders = [
    # One loader per Markdown file
    TextLoader('docs/BTTC_README.md'),
    TextLoader('docs/bt_utils_usages.md'),
    TextLoader('docs/general_utils_usages.md'),
    TextLoader('docs/wifi_utils_usages.md'),
    TextLoader('docs/background_knowledge_bluetooth.md'),
]

docs = []
for loader in loaders:
    docs.extend(loader.load())

In [55]:
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)

In [56]:
splits = text_splitter.split_documents(docs)

In [57]:
len(splits)

38

### <b><font color='darkgreen'>Embeddings</font></b>
<b><font size='3ptx'>[Embedding models](https://python.langchain.com/docs/concepts/#embedding-models) create a vector representation of a piece of text.</font></b> ([more](https://python.langchain.com/docs/integrations/text_embedding/))

Let's take our splits and embed them. Here we will use **[`OpenAIEmbeddings`](https://python.langchain.com/docs/integrations/text_embedding/openai/)**

In [119]:
import numpy as np
from langchain.embeddings.openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings()
persist_directory = './docs/chroma/'
collection_name = 'bttc_collection'

In [59]:
!rm -rf ./docs/chroma
!mkdir -p ./docs/chroma

In [60]:
sentence1 = "i like dogs"
sentence2 = "i like canines"
sentence3 = "the weather is ugly outside"

In [61]:
embedding1 = embedding.embed_query(sentence1)
embedding2 = embedding.embed_query(sentence2)
embedding3 = embedding.embed_query(sentence3)

In [62]:
np.dot(embedding1, embedding2)

0.9630396460189721

In [63]:
np.dot(embedding1, embedding3)

0.7702742084408513

In [64]:
np.dot(embedding2, embedding3)

0.7590147680413903
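A plain dot product works as a similarity score here under the assumption that the embedding vectors have unit length, which OpenAI states for its embedding models. For vectors of arbitrary length, cosine similarity divides out the magnitudes. A minimal standard-library sketch, with made-up 3-dimensional vectors standing in for real embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; equals the dot product for unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (invented for illustration, not real embeddings).
dogs = [0.9, 0.1, 0.0]
canines = [0.8, 0.2, 0.0]
weather = [0.1, 0.1, 0.9]

print(cosine_similarity(dogs, canines) > cosine_similarity(dogs, weather))  # True
```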

### <b><font color='darkgreen'>Vectorstores</font></b>

In [65]:
!pip freeze | grep chromadb

chromadb==0.5.15


In [66]:
from langchain_chroma import Chroma
# from langchain_community.vectorstores.chroma import Chroma

In [67]:
vectordb = Chroma.from_documents(
    collection_name=collection_name,
    documents=splits,
    embedding=embedding,
    persist_directory=persist_directory
)

In [68]:
print(vectordb._collection.count())

38


In [69]:
# help(vectordb)

### <b><font color='darkgreen'>Similarity Search</font></b>

In [70]:
question = "How can I get Bluetooth status?"

In [71]:
docs = vectordb.similarity_search(question, k=3)

In [72]:
len(docs)

3

In [73]:
print(docs[0].page_content)

# Get the Bluetooth Adapter state
>>> dut.bt.le_state
<BluetoothAdapterState.STATE_BLE_ON: 15>
```

If you conduct actions and would like to wait for certain Bluetooth Adapter
states, you could leverage method `dut.bt.wait_adapter_state`. e.g.:
```python
# Import package to get enumeration of Bluetooth Adapter state
>>> from bttc import constants
>>> bt_adapter_state = constants.BluetoothAdapterState

# Turn off Bluetooth
>>> dut.bt.disable()

# Wait for Bluetooth Adapter States `STATE_OFF` or `STATE_BLE_ON` for 3 seconds:
>>> dut.bt.wait_adapter_state({bt_adapter_state.STATE_OFF, bt_adapter_state.STATE_BLE_ON}, timeout_sec=3)
True
>>> dut.bt.le_state
<BluetoothAdapterState.STATE_BLE_ON: 15>
```

### Get Bluetooth status
You can call method `dut.bt.is_enabled` or access attribute `dut.bt.enabled` to
get the Bluetooth status of device. e.g.:
```python
>>> dut.bt.is_enabled()   # Checks if Bluetooth is enabled.
False
>>> dut.bt.enable()       # Enable the Bluetooth of device.
>>> dut.bt.

Let's save this so we can use it later!

In [75]:
#vectordb.persist()

### <b><font color='darkgreen'>Failure modes</font></b>
<b><font size='3ptx'>This seems great, and basic similarity search will get you 80% of the way there very easily.</font></b>

But some failure modes can crop up. Here are some edge cases that can arise; we'll fix them in the next class.

In [76]:
question = "What did they say about Bluetooth?"

In [77]:
docs = vectordb.similarity_search(question, k=5)

Notice that we're getting duplicate chunks. <b><font color='red'>Semantic search fetches all similar documents, but does not enforce diversity.</font></b>

In [78]:
docs[0]

Document(metadata={'source': 'docs/background_knowledge_bluetooth.md'}, page_content='## Terminology\n\n### Piconet\nA piconet in Bluetooth is a small network consisting of one master device and up to seven active slave devices. The master device controls the communication within the piconet, while slave devices follow its instructions. Piconets are the basic building blocks of larger Bluetooth networks.\n\nHow piconets work:\n1. Piconet Formation: A master device initiates a piconet by broadcasting its presence and inviting other devices to join.\n2. Device Roles: Once a device joins, it becomes a slave and synchronizes its clock and frequency hopping pattern with the master.\n3. Communication: The master controls when each device can transmit and receive data, ensuring orderly communication.\n4. Scatternet Formation: If a device is within range of multiple piconets, it can participate in multiple piconets simultaneously, forming a scatternet.\n\n## Abbreviation List\n\n### BDA (Bluet

In [79]:
docs[1]

Document(metadata={'source': 'docs/background_knowledge_bluetooth.md'}, page_content='### FHS (Frequency Hopping Spread Spectrum)\nFHS stands for Frequency Hopping Spread Spectrum. It is a radio transmission technique used by Bluetooth to minimize interference and improve the reliability of wireless communications.\n\nHow FHS works in Bluetooth:\n1. Frequency Hopping: Bluetooth devices rapidly switch between different frequencies within a designated range (the 2.4 GHz ISM band). This hopping pattern is pseudo-random and known to both the transmitting and receiving devices.\n2. Spread Spectrum: The transmitted signal is spread across a wider bandwidth than necessary, making it more resilient to narrowband interference.\n3. Reduced Interference:  By hopping frequencies, Bluetooth avoids staying on any single channel for long, reducing the likelihood of colliding with other wireless devices operating on the same frequencies.\n\n### LMP (Link Manager Protocol.)\n[**LMP**](https://www.bluet

We can see a new failure mode.

The question below is about `wifi_utils_usages.md`, but the results include chunks from other docs as well.

In [80]:
question = "How do we turn off the Wifi?"

In [81]:
docs = vectordb.similarity_search(question, k=5)

In [82]:
for doc in docs:
    print(doc.metadata)

{'source': 'docs/wifi_utils_usages.md'}
{'source': 'docs/bt_utils_usages.md'}
{'source': 'docs/wifi_utils_usages.md'}
{'source': 'docs/general_utils_usages.md'}
{'source': 'docs/general_utils_usages.md'}


In [83]:
print(docs[4].page_content)

For testing purpose, in order to reduce the flakiness or interruption from other
apps or system notification, you may want to turn on this setting. To do so, you can refer to below sample code:

```python
# Import the package `bttc` and instantiate DUT instance which we have to
# initialize uiautomator while the setting is turned on/off by UI operations.
>>> import bttc
>>> dut = bttc.get('36121FDJG000GR', init_snippet_uiautomator=True)

# Import the package `system_ui` to turn on/off DnD setting:
>>> from bttc.utils.ui_pages import system as system_ui

# Turn of `Do not Disturb` (DnD):
>>> system_ui.set_do_not_disturb(dut, True)

# Confirm that the global setting `zen_mode` reflects the setting:
>>> dut.gm.settings.g['zen_mode']
Settings.Namespace.Setting(name='zen_mode', value='1', err='')
```

If you want to turn if off, it is straight forward too:
```python
# Turn off `Do not Disturb` (DnD):
>>> system_ui.set_do_not_disturb(dut, False)    # Turn off DnD

# Confirm that the global s

Approaches discussed in the next lecture can be used to address both!

## <b><font color='darkblue'>Retrieval</font></b>
([course link](https://learn.deeplearning.ai/courses/langchain-chat-with-your-data/lesson/5/retrieval)) <b><font size='3ptx'>Retrieval is the centerpiece of our retrieval augmented generation (RAG) flow.</font></b>

![flow](images/5.PNG)

Let's get our vectorDB from before and try out different retrieval approaches.

In [84]:
embedding = OpenAIEmbeddings()
vectordb = Chroma(
    collection_name=collection_name,
    persist_directory=persist_directory,
    embedding_function=embedding)

In [85]:
print(vectordb._collection.count())

38


### <b><font color='darkgreen'>Vectorstore retrieval</font></b>
Let's prepare several docs as text for demonstrations of retrieval approaches:

In [86]:
texts = [
    """The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).""",
    """A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.""",
    """A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.""",
]

In [None]:
smalldb = Chroma.from_texts(texts, embedding=embedding)

In [None]:
question = "Tell me about all-white mushrooms with large fruiting bodies"

In [None]:
smalldb.similarity_search(question, k=2)

In [None]:
smalldb.max_marginal_relevance_search(question,k=2, fetch_k=3)

### <b><font color='darkgreen'>Addressing Diversity: Maximal Marginal Relevance</font></b>

<b><font size='3ptx'>Maximal Marginal Relevance (MMR) is a method used in information retrieval to select documents that are both relevant to the query and diverse with respect to the previously selected documents.</font></b> This approach helps in reducing redundancy and increasing the coverage of different aspects of the query in the selected documents ([more](https://www.kaggle.com/code/marcinrutecki/rag-mmr-search-in-langchain)).
![idea](images/6.PNG)

![idea](images/7.PNG)
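The selection loop behind MMR can be sketched as follows: repeatedly pick the candidate that maximizes `lambda * relevance_to_query - (1 - lambda) * similarity_to_already_selected`. This is a minimal standard-library sketch on made-up 2-dimensional vectors, not the vectorstore's implementation; the parameter name `lambda_mult` is chosen to resemble LangChain's, and in practice the candidates would be the `fetch_k` most similar chunks.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Return indices of k documents, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest document that was already picked.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.98, 0.2], [0.6, 0.8]]   # doc 1 is a near-duplicate of doc 0

print(mmr(query, docs, k=2, lambda_mult=1.0))  # [0, 1]: pure relevance
print(mmr(query, docs, k=2, lambda_mult=0.3))  # [0, 2]: diversity is rewarded
```

With `lambda_mult=1.0` the result matches plain similarity search; lowering it penalizes the near-duplicate and lets the more distinct document through.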

In [None]:
question = "Show me usage examples of BTTC?"
docs_ss = vectordb.similarity_search(question, k=3)

In [None]:
print(docs_ss[0].page_content[:100])

In [None]:
print(docs_ss[1].page_content[:100])

Note the difference in results with `MMR`:

In [None]:
docs_mmr = vectordb.max_marginal_relevance_search(question, k=3)

In [None]:
print(docs_mmr[0].page_content[:100])

In [None]:
print(docs_mmr[1].page_content[:100])

### <b><font color='darkgreen'>Addressing Specificity: working with metadata</font></b>
<b><font size='3ptx'>In the previous example, we saw that a question about Wifi functionality can return results from other, irrelevant docs as well.</font></b>

To address this, many vectorstores support operations on `metadata`. `metadata` provides context for each embedded chunk.
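Conceptually, a metadata filter narrows the candidate set before (or while) similarity ranking happens. The sketch below shows only the filtering step on plain dictionaries; the chunk texts are placeholders invented for illustration, and `filter_by_metadata` is a hypothetical helper rather than a vectorstore API.

```python
def filter_by_metadata(docs, where):
    """Keep only docs whose metadata matches every key/value pair in `where`."""
    return [
        d for d in docs
        if all(d["metadata"].get(key) == value for key, value in where.items())
    ]


# Placeholder chunks; only the `source` values mirror the files used in this notebook.
chunks = [
    {"page_content": "wifi chunk A", "metadata": {"source": "docs/wifi_utils_usages.md"}},
    {"page_content": "bluetooth chunk", "metadata": {"source": "docs/bt_utils_usages.md"}},
    {"page_content": "wifi chunk B", "metadata": {"source": "docs/wifi_utils_usages.md"}},
]

wifi_only = filter_by_metadata(chunks, {"source": "docs/wifi_utils_usages.md"})
print([d["page_content"] for d in wifi_only])  # ['wifi chunk A', 'wifi chunk B']
```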

In [None]:
question = "How to turn off Wifi?"

In [None]:
docs = vectordb.similarity_search(
    question,
    k=2,
    filter={"source": "docs/wifi_utils_usages.md"}
)

In [None]:
for d in docs:
    print(d.metadata)

#### <b>Working with metadata using self-query retriever</b>
<b><font size='3ptx'>But we have an interesting challenge: we often want to infer the metadata from the query itself.</font></b>

To address this, we can use `SelfQueryRetriever`, which uses an LLM to extract:
 
1. The `query` string to use for vector search
2. A metadata filter to pass in as well

Most vector databases support metadata filters, so this doesn't require any new databases or indexes.
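The output of that extraction step is a pair: a search string plus a filter. The toy sketch below shows only that shape. A hard-coded keyword table stands in for the LLM, which is a deliberate simplification; `ToyStructuredQuery`, `KEYWORD_TO_SOURCE` and `toy_self_query` are hypothetical names, not part of LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStructuredQuery:
    query: str                                   # text to use for the vector search
    filter: dict = field(default_factory=dict)   # metadata filter to pass along


# Hypothetical keyword table; a real self-query retriever asks an LLM instead.
KEYWORD_TO_SOURCE = {
    "wifi": "docs/wifi_utils_usages.md",
    "bluetooth": "docs/bt_utils_usages.md",
}


def toy_self_query(question):
    """Derive a metadata filter from the question text (keyword rule, illustration only)."""
    lowered = question.lower()
    for keyword, source in KEYWORD_TO_SOURCE.items():
        if keyword in lowered:
            return ToyStructuredQuery(query=question, filter={"source": source})
    return ToyStructuredQuery(query=question)


print(toy_self_query("How to turn off WiFi?"))
```

The resulting `filter` has the same shape as the dictionary passed to `similarity_search(..., filter=...)` earlier.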

In [None]:
!pip freeze | grep 'lark'

In [87]:
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

In [88]:
metadata_field_info = [
    AttributeInfo(
        name="source",
        description='''The docs explain the usage of BTTC module on certain functionalities.
        - docs/bt_utils_usages.md: Related to Bluetooth operations.
        - docs/background_knowledge_bluetooth.md: Related to background knowledge of Bluetooth.
        - docs/general_utils_usages.md: Related to general functionalities of BTTC. e.g. Get the model of device.
        - docs/wifi_utils_usages.md: Related to WiFi operations. e.g. Turn on/off WiFi of device.
        - docs/BTTC_README.md: The Introduction of BTTC module.
        ''',
        type="string",
    ),
]

**Note:** The default model for `OpenAI` (`from langchain.llms import OpenAI`) is `text-davinci-003`. Because OpenAI deprecated `text-davinci-003` on 4 January 2024, we use OpenAI's recommended replacement model, `gpt-3.5-turbo-instruct`, instead.

In [89]:
document_content_description = "Usage markdown files"
llm = OpenAI(model='gpt-3.5-turbo-instruct', temperature=0)



In [90]:
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectordb,
    document_content_description,
    metadata_field_info,
    verbose=True
)

In [91]:
question = "How to turn off WiFi?"

In [92]:
docs = retriever.get_relevant_documents(question)



In [93]:
for d in docs:
    print(d.metadata)

{'source': 'docs/wifi_utils_usages.md'}
{'source': 'docs/wifi_utils_usages.md'}
{'source': 'docs/wifi_utils_usages.md'}


### <b><font color='darkgreen'>Additional tricks: compression</font></b>
<b><font size='3ptx'>Another approach for improving the quality of retrieved docs is compression. ([more](https://python.langchain.com/docs/how_to/contextual_compression/))</font></b>

Information most relevant to a query may be buried in a document with a lot of irrelevant text. <b>Passing that full document through your application can lead to more expensive LLM calls and poorer responses.</b>

<b>Contextual compression is meant to fix this.</b> As the LangChain documentation puts it:

> The idea is simple: instead of immediately returning retrieved documents as-is, you can compress them using the context of the given query, so that only the relevant information is returned. “Compressing” here refers to both compressing the contents of an individual document and filtering out documents wholesale.

Below, `LLMChainExtractor` performs this compression step on top of our vectorstore retriever.
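Both effects of compression (trimming a document and dropping documents entirely) can be illustrated without an LLM. In the toy sketch below, a simple word-overlap rule stands in for the LLM's judgment of relevance; that rule, the sample sentences and the name `toy_compress` are assumptions made for illustration only.

```python
import re


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_compress(docs, query):
    """Keep only sentences sharing a word with the query; drop docs with none left."""
    query_words = words(query)
    compressed = []
    for text in docs:
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        kept = [s for s in sentences if query_words & words(s)]
        if kept:                       # documents with no relevant sentence are filtered out
            compressed.append(" ".join(kept))
    return compressed


# Invented sample documents.
docs = [
    "The clock can show seconds. Call disable to turn off WiFi.",
    "A piconet has one master device.",
]
print(toy_compress(docs, "turn off wifi"))  # ['Call disable to turn off WiFi.']
```

A word-overlap rule is far cruder than an LLM extractor (common words such as "to" would match almost everything), but the input and output shapes are the same: documents in, shorter and fewer documents out.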

In [94]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

In [95]:
def pretty_print_docs(docs):
    print(f"\n{'-' * 80}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

In [96]:
# Wrap our vectorstore
llm = OpenAI(temperature=0, model="gpt-3.5-turbo-instruct")
compressor = LLMChainExtractor.from_llm(llm)

In [97]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever()
)

In [98]:
question = "Show me a few usage examples in using BTTC."
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)

Document 1:

To see seconds on the clock during testing (helpful for precise timekeeping), use this code:
```python
# Import package `bttc` and get DUT instance.
>>> import bttc
>>> dut = bttc.get('36121FDJG000GR')

# Enable second on clock
>>> dut.gm.enable_second_on_clock()

# Disable second on clock
>>> dut.gm.disable_second_on_clock()
```
--------------------------------------------------------------------------------
Document 2:

1. Piconet Formation: A master device initiates a piconet by broadcasting its presence and inviting other devices to join.
2. Device Roles: Once a device joins, it becomes a slave and synchronizes its clock and frequency hopping pattern with the master.
3. Communication: The master controls when each device can transmit and receive data, ensuring orderly communication.
4. Scatternet Formation: If a device is within range of multiple piconets, it can participate in multiple piconets simultaneously, forming a scatternet.
5. BDA (Bluetooth Device Address)
BD

### <b><font color='darkgreen'>Combining various techniques</font></b>

In [99]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type = "mmr"))

In [100]:
question = "What can we use BTTC for?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)

Document 1:

- You can install the released package from pip:
- You may need `sudo` for the above commands if your system has certain permission restrictions.
- You can use function `get_all` to retrieve all local connected adb devices:
- Or use function `get` to retrieve a specific device by its serial number:
- The `dut.gm` utility, loaded automatically when you access your device, streamlines common actions like:
- Getting/setting device properties
- Searching logcat messages
- Managing airplane mode
- Taking screenshots
- Dumping system information (build number, model, etc.)
- And more!
--------------------------------------------------------------------------------
Document 2:

- BRCM stands for Broadcom.
- It is a major semiconductor company that designs and manufactures a wide range of chips, including those used for Bluetooth communication.
- To get the version of BRCM firmware (fw) version by BTTC, below is the sample code:
- Import package bttc and instantiate the DUT object

### <b><font color='darkgreen'>Other types of retrieval</font></b>
<b><font size='3ptx'>It's worth noting that a vector database is not the only kind of tool for retrieving documents. </font></b>

The `LangChain` retriever abstraction includes other ways to retrieve documents, such as TF-IDF or SVM.
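For intuition, TF-IDF retrieval needs no embedding model at all: each text becomes a sparse vector of term weights (term frequency times inverse document frequency), and texts are ranked by cosine similarity to the query vector. The sketch below is a minimal standard-library version with a smoothed IDF; it is an illustration under those assumptions, not what `TFIDFRetriever` does internally, and the sample texts are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(query, texts):
    """Return text indices ordered from most to least similar to the query."""
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    # Document frequency: in how many texts does each term occur?
    df = Counter(term for counts in docs for term in counts)

    def weigh(counts):
        # Smoothed inverse document frequency: rarer terms get larger weights.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = weigh(Counter(tokenize(query)))
    scores = [cosine(query_vec, weigh(counts)) for counts in docs]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


texts = [
    "turn off the wifi radio",
    "enable bluetooth on the device",
    "take a screenshot of the device",
]
ranking = tfidf_rank("how to turn off wifi", texts)
print(ranking[0])  # 0
```

Because it matches exact terms, this kind of retriever can do well on rare keywords and abbreviations but has no notion of synonyms.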

In [101]:
from langchain.retrievers import SVMRetriever
from langchain.retrievers import TFIDFRetriever
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [102]:
# Load Markdown files
loaders = [
    # BTTC usage docs plus Bluetooth background notes
    TextLoader('docs/BTTC_README.md'),
    TextLoader('docs/bt_utils_usages.md'),
    TextLoader('docs/general_utils_usages.md'),
    TextLoader('docs/wifi_utils_usages.md'),
    TextLoader('docs/background_knowledge_bluetooth.md'),
]

docs = []
for loader in loaders:
    docs.extend(loader.load())

In [107]:
# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)
splits = text_splitter.split_text("\n".join([doc.page_content for doc in docs]))

In [108]:
# Retrieve
svm_retriever = SVMRetriever.from_texts(splits, embedding)
tfidf_retriever = TFIDFRetriever.from_texts(splits)

In [114]:
question = "What is RNR in Bluetooth?"
docs_svm=svm_retriever.get_relevant_documents(question)
print(docs_svm[0].page_content)

How FHS works in Bluetooth:
1. Frequency Hopping: Bluetooth devices rapidly switch between different frequencies within a designated range (the 2.4 GHz ISM band). This hopping pattern is pseudo-random and known to both the transmitting and receiving devices.
2. Spread Spectrum: The transmitted signal is spread across a wider bandwidth than necessary, making it more resilient to narrowband interference.
3. Reduced Interference:  By hopping frequencies, Bluetooth avoids staying on any single channel for long, reducing the likelihood of colliding with other wireless devices operating on the same frequencies.

### LMP (Link Manager Protocol.)
[**LMP**](https://www.bluetooth.com/wp-content/uploads/Files/Specification/HTML/Core-54/out/en/br-edr-controller/link-manager-protocol-specification.html) in Bluetooth stands for Link Manager Protocol. It is a key protocol within the Bluetooth stack that is responsible for the establishment, management, and control of connections between Bluetooth dev



In [115]:
question = "What is RNR in Bluetooth?"
docs_tfidf=tfidf_retriever.get_relevant_documents(question)
print(docs_tfidf[0].page_content)

### Get Bluetooth Adapter State
The Bluetooth adapter state tells you the current condition of your device's Bluetooth radio.
Think of it like a switch for your Bluetooth: is it off, on, or somewhere in between? This state is crucial for applications that use Bluetooth, as they need to know whether Bluetooth is available and in what capacity.

Bluetooth Adapter State Values:
* `STATE_UNKNOWN` (0): The initial state, usually indicating the system hasn't yet determined the actual Bluetooth adapter state.
* `STATE_OFF` (10): Bluetooth is completely disabled. No Bluetooth functionality is available.
* `STATE_TURNING_ON` (11): The Bluetooth adapter is in the process of turning on. Bluetooth is not yet usable.
* `STATE_ON` (12): Bluetooth is fully enabled and ready for pairing and data exchange.
* `STATE_TURNING_OFF` (13): The Bluetooth adapter is shutting down. Bluetooth is no longer usable.
* `STATE_BLE_TURNING_ON` (14): The Bluetooth adapter is specifically turning on Bluetooth Low Energy

## <b><font color='darkblue'>Question Answering</font></b>
([course link](https://learn.deeplearning.ai/courses/langchain-chat-with-your-data/lesson/6/question-answering))
![flow](images/8.PNG)

* Multiple relevant documents have been retrieved from the vector store.
* Potentially compress the relevant splits to fit into the LLM context.
* Send the information along with our question to an LLM to select and format an answer.
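
The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `fit_to_budget` and `build_prompt` are hypothetical helpers, a character budget stands in for the model's token limit, and dropping low-ranked splits stands in for real compression.

```python
def fit_to_budget(splits, max_chars):
    """Keep splits in ranked order until the character budget is used up."""
    kept, used = [], 0
    for text in splits:
        if used + len(text) > max_chars:
            break
        kept.append(text)
        used += len(text)
    return kept


def build_prompt(question, splits):
    """Join the kept splits and the question into a single prompt string."""
    context = "\n\n".join(splits)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Stand-ins for splits returned by a retriever, most relevant first.
retrieved = [
    "To disable WiFi, call dut.wifi.disable().",
    "To enable WiFi, call dut.wifi.enable().",
    "A long, less relevant split. " * 20,
]
kept = fit_to_budget(retrieved, max_chars=200)
prompt = build_prompt("How to turn off the Wifi of device?", kept)
print(prompt)
```

The resulting `prompt` is what would be sent to the LLM; the long third split is dropped because it does not fit the budget.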

We have discussed `Document Loading` and `Splitting` as well as `Storage` and `Retrieval`. Let's load our vectorDB. The code below pins the OpenAI model version that was used when the course was filmed (`gpt-3.5-turbo-0301`) until its deprecation date in September 2023, and falls back to `gpt-3.5-turbo` afterwards.

LLM responses often vary between runs, and they may differ significantly when a different model version is used.

In [116]:
import datetime

current_date = datetime.datetime.now().date()
if current_date < datetime.date(2023, 9, 2):
    llm_name = "gpt-3.5-turbo-0301"
else:
    llm_name = "gpt-3.5-turbo"
print(llm_name)

gpt-3.5-turbo


In [120]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings()
vectordb = Chroma(
    collection_name=collection_name,
    persist_directory=persist_directory, embedding_function=embedding)

In [121]:
print(vectordb._collection.count())

38


In [122]:
question = "How to turn off the Wifi of device?"
docs = vectordb.similarity_search(question, k=3)
len(docs)

3

In [124]:
print_markdown(docs[0].page_content)

To disable the Bluetooth of device, you can call method `dut.bt.disable`:
```python
>>> dut.bt.enabled    # The Bluetooth is on for now.
True
>>> dut.bt.disable()  # Disable the Bluetooth of device.
>>> dut.bt.enabled    # Get the status of device's Bluetooth
False
```

### Dump the content of Bluetooth manager
To dump the content of Bluetooth manager, call method
`dut.bt.dump_bluetooth_manager`:
```python
>>> bluetooth_manager_output = dut.bt.dump_bluetooth_manager()
>>> print(bluetooth_manager_output[:100])
Bluetooth Status
  enabled: false
  state: BLE_ON
  address: XX:XX:XX:XX:D4:21
  name: Pixel 8 Pro
```

### Enable/Disable Airplane
To enable Airplane mode, we can call method `dut.bt.enable_airplane_mode`:
```python
# Currently, Airplane mode is off
>>> dut.adb.shell(['settings', 'get', 'global', 'airplane_mode_on']).decode().strip()
'0'

# Enable the Airplane mode
>>> dut.gm.enable_airplane_mode()

# Then we have Airplane mode as on
>>> dut.adb.shell(['settings', 'get', 'global', 'airplane_mode_on']).decode().strip()
'1'
```
To disable Airplane mode, we can call method `dut.bt.disable_airplane_mode`:
```python
# Currently, Airplane mode is on
>>> dut.adb.shell(['settings', 'get', 'global', 'airplane_mode_on']).decode().strip()
'1'

# Disable the Airplane mode
>>> dut.gm.disable_airplane_mode()

# Then we have Airplane mode as off
>>> dut.adb.shell(['settings', 'get', 'global', 'airplane_mode_on']).decode().strip()
'0'
```

### <b><font color='darkgreen'>RetrievalQA chain</font></b>
<b><font size='3ptx'>Developing a production-grade LLM application requires many refinements, but tracking multiple versions of prompts, models, and other components can be cumbersome</font>. The LangChain Hub offers a centralized registry to manage and version your LLM artifacts efficiently. ([more](https://docs.smith.langchain.com/old/cookbook/hub-examples/retrieval-qa-chain))</b>

In [127]:
#from langchain.chat_models import ChatOpenAI
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model_name=llm_name, temperature=0)

In [128]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever())

In [130]:
result = qa_chain.invoke({"query": question})

In [131]:
print_markdown(result["result"])

To turn off the WiFi of the device, you can call the method `dut.wifi.disable()` as shown below:
```python
# Disable the WiFi
>>> dut.wifi.disable()
```
This will turn off the WiFi on the device.

### <b><font color='darkgreen'>Prompt</font></b>

In [132]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
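
As a rough sketch of what happens to a template like this at query time, the two placeholders can be filled with plain string formatting. The shortened template, the sample splits, and the blank-line separator between splits are assumptions for illustration; the real chain obtains the context from the retriever.

```python
# Shortened, hypothetical template with the same two placeholders.
template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""

# Stand-ins for retrieved splits; a real chain gets these from the retriever.
retrieved_splits = [
    "You can retrieve the Bluetooth MAC address by method `dut.bt.get_bluetooth_mac_address`.",
    "To disable the Bluetooth of device, you can call method `dut.bt.disable`.",
]

filled = template.format(
    context="\n\n".join(retrieved_splits),
    question="How to get the Bluetooth MAC address of a device?",
)
print(filled)
```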

In [133]:
# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT})

In [134]:
question = "How to get the Bluetooth MAC address of a device?"

In [135]:
result = qa_chain.invoke({"query": question})

In [136]:
print_markdown(result["result"])

You can get the Bluetooth MAC address by using the method `dut.bt.get_bluetooth_mac_address()`. Thanks for asking!

In [137]:
result["source_documents"][0]

Document(metadata={'source': 'docs/bt_utils_usages.md'}, page_content='# Get the device time and starting point to search crash afterword.\n>>> begin_device_time = dut.gm.device_time\n>>> begin_device_time\n\'05-22 16:44:00.000\'\n\n# No crash in the beginning.\n>>> dut.bt.crash_since(begin_device_time)\nCrashInfo(total_num_crash=0, collected_crash_times=[])\n\n# Let\'s use below command to force Bluetooth to crash\n# $ adb shell "pgrep -f keyword \'com.google.android.bluetooth\' | xargs kill -9"\n# Then we could obtain the crash record now:\n>>> dut.bt.crash_since(begin_device_time)\nCrashInfo(total_num_crash=1, collected_crash_times=[\'05-22 16:48:52.396\'])\n\n# If we don\'t feed in device time, it will search all crash record.\n>>> dut.bt.crash_since()\nCrashInfo(total_num_crash=1, collected_crash_times=[\'05-22 16:48:52.396\'])\n```\n\n### Get Bluetooth MAC address\nYou can retrieve the Bluetooth MAC address by method\n`dut.bt.get_bluetooth_mac_address`. e.g.:\n```python\n>>> dut.

### <b><font color='darkgreen'>RetrievalQA chain types</font></b>
![chain types](images/9.PNG)

* **stuff**: This method simply "stuffs" all the retrieved documents into the prompt for the language model.
    * Pros: Simple, potentially good for smaller documents or when the model can handle a lot of context.
    * Cons: Can lead to exceeding the model's token limit, potentially losing important information from longer documents.
* **map_reduce**: The language model processes each document individually to generate an answer. A separate language model prompt combines the individual answers into a final, overall answer.
    * Pros: Handles longer documents effectively by breaking them down.
    * Cons: Can be less accurate than other methods if the individual answers are not well-synthesized in the reduction step.
* **refine**: Starts with an initial answer based on the first document. Refines the answer iteratively by incorporating information from subsequent documents.
    * Pros: Good for situations where documents build upon each other or have a chronological order.
    * Cons: May be biased towards the initial document and might not be suitable if documents cover diverse aspects.
* **map_rerank**: The language model processes each document individually to generate an answer. The answers are re-ranked based on their relevance or quality. The highest-ranked answer is selected.
    * Pros: Can improve accuracy by focusing on the most relevant answers.
    * Cons: The re-ranking step adds complexity and may not always be effective.
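
The control flow of the first three strategies can be sketched with a stub in place of the model. Everything here is illustrative: `fake_llm` is a hypothetical stand-in that only records the prompts it receives, the prompt wording is made up, and `map_rerank` is omitted because it needs a scoring step.

```python
calls = []


def fake_llm(prompt):
    """Hypothetical stand-in for an LLM: records the prompt, returns a tag."""
    calls.append(prompt)
    return f"answer#{len(calls)}"


def stuff(docs, question, llm):
    # One call: every document goes into a single prompt.
    context = "\n".join(docs)
    return llm(f"{context}\nQuestion: {question}")


def map_reduce(docs, question, llm):
    # Map: one call per document. Reduce: one more call to combine the answers.
    partial = [llm(f"{doc}\nQuestion: {question}") for doc in docs]
    return llm("Combine these answers:\n" + "\n".join(partial))


def refine(docs, question, llm):
    # Answer from the first document, then revise once per remaining document.
    answer = llm(f"{docs[0]}\nQuestion: {question}")
    for doc in docs[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context: {doc}\nQuestion: {question}"
        )
    return answer


docs = ["doc A", "doc B", "doc C"]
call_counts = {}
for strategy in (stuff, map_reduce, refine):
    calls.clear()
    strategy(docs, "What is BTTC?", fake_llm)
    call_counts[strategy.__name__] = len(calls)
print(call_counts)  # {'stuff': 1, 'map_reduce': 4, 'refine': 3}
```

The call counts show the trade-off: `stuff` is the cheapest but puts everything into one prompt, while `map_reduce` and `refine` keep each prompt small at the cost of more LLM calls.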

In [138]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce")

In [139]:
result = qa_chain_mr.invoke({"query": question})



In [140]:
print_markdown(result["result"])

To get the Bluetooth MAC address of a device, you can use the method `dut.bt.get_bluetooth_mac_address()` with the appropriate device object. Here is an example code snippet:

```python
>>> dut.bt.get_bluetooth_mac_address()
'D4:3A:2C:57:D4:21'
```

Alternatively, you can also retrieve the MAC address of a paired Bluetooth device by using the method `dut.bt.list_paired_devices(only_name=False)`. The MAC address will be under the `mac_addr` field in the output.

### <b><font color='darkgreen'>RetrievalQA limitations</font></b>
`RetrievalQA` does not preserve conversational history: each call to the chain is independent of the previous ones.

In [141]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever())

In [142]:
question = "How to turn on Bluetooth?"
result = qa_chain.invoke({"query": question})
print_markdown(result["result"])

To turn on Bluetooth on a device, you can call the method `dut.bt.enable()` as shown below:
```python
>>> dut.bt.enable()  # Enable the Bluetooth of the device.
>>> dut.bt.enabled   # Get the status of the device's Bluetooth.
True
```

In [143]:
question = "How to get the address of it?"
result = qa_chain.invoke({"query": question})
print_markdown(result["result"])

To retrieve the Bluetooth MAC address of a device, you can use the following method in the `bttc` library:

```python
>>> dut.bt.get_bluetooth_mac_address()
```

This command will return the Bluetooth MAC address of the device.

<b><font color='orange'>Note</font></b>: The LLM response varies. Here the follow-up still got a Bluetooth-related answer, most likely because the retrieved documents happen to cover Bluetooth MAC addresses, not because the model remembered the previous question. The point is simply that the model does not have access to past questions or answers. This is covered in the next section.
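
The difference between a stateless call and one that carries history can be sketched without any LLM. Both helper functions and the prompt wording below are hypothetical; they only illustrate what information each kind of prompt contains.

```python
def stateless_prompt(question):
    """Each call sees only the current question."""
    return f"Question: {question}"


def prompt_with_history(question, chat_history):
    """Prepend earlier (question, answer) turns so pronouns can be resolved."""
    turns = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)
    return f"Chat history:\n{turns}\nFollow-up question: {question}"


chat_history = [("How to turn on Bluetooth?", "Call dut.bt.enable().")]
follow_up = "How to get the address of it?"

print(stateless_prompt(follow_up))
print(prompt_with_history(follow_up, chat_history))
```

In the stateless prompt nothing says what "it" refers to, while the second prompt carries the earlier Bluetooth turn along with the follow-up.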

## <b><font color='darkblue'>Chat</font></b>
([course link](https://learn.deeplearning.ai/courses/langchain-chat-with-your-data/lesson/7/chat)) <b><font size='3ptx'>Recall the overall workflow for retrieval augmented generation (RAG):</font></b>
![flow](images/8.PNG)

Below we are going to build a chat with memory:

In [145]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True)

In [146]:
from langchain.chains import ConversationalRetrievalChain

retriever=vectordb.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory)

In [151]:
question = "How to turn Wifi off?"
result = qa.invoke({"question": question})

In [152]:
print_markdown(result['answer'])

To turn off WiFi on your device, you can call the method `dut.wifi.disable()` as shown below:
```python
# Disable the WiFi
>>> dut.wifi.disable()
```
This command will disable the WiFi on your device.

In [153]:
question = "How to enable it again?"
result = qa.invoke({"question": question})

In [154]:
print_markdown(result['answer'])

To enable WiFi on your device, you can use the following steps:

1. Import the necessary packages:
```python
import bttc
from bttc import wifi_utils
```

2. Retrieve the DUT object and bind it to the WiFi utility:
```python
dut = bttc.get('36121FDJG000GR')
wifi_utils.bind(dut)
```

3. Enable the WiFi on your device:
```python
dut.wifi.enable()
```

By following these steps, you should be able to successfully enable WiFi on your device.

### <b><font color='darkgreen'>Create a chatbot that works on your documents</font></b>
Feel free to copy this code and modify it to add your own features. You can try alternate memory and retriever models by changing the configuration in the `load_db` function and the `convchain` method. [Panel](https://panel.holoviz.org/) and [Param](https://param.holoviz.org/) have many useful features and widgets you can use to extend the GUI.


In [155]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.document_loaders import TextLoader, PyPDFLoader
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI

The chatbot code has been updated a bit since filming. The GUI appearance also varies depending on the platform it is running on.

In [166]:
def load_db(chain_type='stuff', k=10):
    # chain_type should be one of: ['stuff', 'map_reduce', 'refine', 'map_rerank']
    # load documents
    loaders = [
        # BTTC usage docs plus Bluetooth background notes
        TextLoader('docs/BTTC_README.md'),
        TextLoader('docs/bt_utils_usages.md'),
        TextLoader('docs/general_utils_usages.md'),
        TextLoader('docs/wifi_utils_usages.md'),
        TextLoader('docs/background_knowledge_bluetooth.md'),
    ]

    docs = []
    for loader in loaders:
        docs.extend(loader.load())

    # split documents
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    docs = text_splitter.split_documents(docs)
    # define embedding
    embeddings = OpenAIEmbeddings()
    # create vector database from data
    db = DocArrayInMemorySearch.from_documents(docs, embeddings)
    # define retriever
    retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": k})
    # create a chatbot chain. Memory is managed externally.
    qa = ConversationalRetrievalChain.from_llm(
        llm=ChatOpenAI(model_name=llm_name, temperature=0), 
        chain_type=chain_type, 
        retriever=retriever, 
        return_source_documents=True,
        return_generated_question=True,
    )
    return qa 

In [179]:
qa = load_db()
chat_history = [] 

In [180]:
question1 = 'What is BTTC?'
result = qa.invoke({"question": question1, "chat_history": chat_history})
chat_history.extend([(question1, result["answer"])])
print_markdown(result['answer'])

BTTC stands for Bluetooth Test Common. It is a Python package that provides utilities for Bluetooth testing. The package simplifies access to Bluetooth diagnostics and allows users to perform various Bluetooth-related actions and retrieve information from Bluetooth-enabled devices. The package includes functionalities such as getting Bluetooth firmware versions, checking Bluetooth adapter states, verifying connected devices, and accessing Bluetooth service information.

In [181]:
question2 = 'Can you show a few usage examples of it?'
result = qa.invoke({"question": question2, "chat_history": chat_history})
chat_history.extend([(question2, result["answer"])])
print_markdown(result['answer'])

Sure! Here are some examples of how you can use BTTC (Bluetooth Test Common) in your Python scripts for Bluetooth testing:

1. **Retrieve Device Information:**
   - You can use BTTC to retrieve information about connected devices, such as Bluetooth adapter state, bonded devices, and more.

2. **Capture Connection Failure Reasons:**
   - BTTC allows you to capture the reason for connection failures when pairing with Bluetooth devices. This can help in diagnosing issues.

3. **Preview Bluetooth Information:**
   - You can use BTTC to preview Bluetooth information, such as the Bluetooth status, enabled state, address, and device name.

4. **Manage Device Properties:**
   - BTTC simplifies property management on devices through the `dut.gm.props` attribute, allowing you to get and set device properties easily.

5. **Disable Fast Pair:**
   - With BTTC, you can disable Fast Pair functionality on devices using simple commands like `dut.bt.set_fast_pair_halfsheet(False)` or `dut.bt.disable_fastpair()`.

6. **Dump Bugreport:**
   - BTTC provides functionality to dump bug reports from devices to specific directories with custom file names for further analysis.

These are just a few examples of how BTTC can be used to streamline Bluetooth testing and diagnostics in Python scripts.

In [174]:
import panel as pn
import param

pn.extension()

class cbfs(param.Parameterized):
    chat_history = param.List([])
    answer = param.String("")
    db_query  = param.String("")
    db_response = param.List([])
    
    def __init__(self,  **params):
        super(cbfs, self).__init__( **params)
        self.panels = []
        self.loaded_file = "docs/cs229_lectures/MachineLearning-Lecture01.pdf"
        self.qa = load_db("stuff", 4)
    
    def call_load_db(self, count):
        if count == 0 or file_input.value is None:  # init or no file specified :
            return pn.pane.Markdown(f"Loaded File: {self.loaded_file}")
        else:
            file_input.save("temp.pdf")  # local copy
            self.loaded_file = file_input.filename
            button_load.button_style="outline"
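            # load_db() as defined above takes (chain_type, k) and always reads
            # the Markdown files under docs/, so the saved temp.pdf is not passed in.
            self.qa = load_db("stuff", 4)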
            button_load.button_style="solid"
        self.clr_history()
        return pn.pane.Markdown(f"Loaded File: {self.loaded_file}")

    def convchain(self, query):
        if not query:
            return pn.WidgetBox(pn.Row('User:', pn.pane.Markdown("", width=600)), scroll=True)
        result = self.qa.invoke({"question": query, "chat_history": self.chat_history})
        self.chat_history.extend([(query, result["answer"])])
        self.db_query = result["generated_question"]
        self.db_response = result["source_documents"]
        self.answer = result['answer'] 
        self.panels.extend([
            pn.Row('User:', pn.pane.Markdown(query, width=600)),
            pn.Row('ChatBot:', pn.pane.Markdown(self.answer, width=600, styles={'background-color': '#F6F6F6'}))
        ])
        inp.value = ''  #clears loading indicator when cleared
        return pn.WidgetBox(*self.panels,scroll=True)

    @param.depends('db_query')
    def get_lquest(self):
        if not self.db_query:
            return pn.Column(
                pn.Row(pn.pane.Markdown(f"Last question to DB:", styles={'background-color': '#F6F6F6'})),
                pn.Row(pn.pane.Str("no DB accesses so far"))
            )
        return pn.Column(
            pn.Row(pn.pane.Markdown(f"DB query:", styles={'background-color': '#F6F6F6'})),
            pn.pane.Str(self.db_query )
        )

    @param.depends('db_response', )
    def get_sources(self):
        if not self.db_response:
            return 
        rlist=[pn.Row(pn.pane.Markdown(f"Result of DB lookup:", styles={'background-color': '#F6F6F6'}))]
        for doc in self.db_response:
            rlist.append(pn.Row(pn.pane.Str(doc)))
        return pn.WidgetBox(*rlist, width=600, scroll=True)

    @param.depends('convchain', 'clr_history') 
    def get_chats(self):
        if not self.chat_history:
            return pn.WidgetBox(pn.Row(pn.pane.Str("No History Yet")), width=600, scroll=True)
        rlist=[pn.Row(pn.pane.Markdown(f"Current Chat History variable", styles={'background-color': '#F6F6F6'}))]
        for exchange in self.chat_history:
            rlist.append(pn.Row(pn.pane.Str(exchange)))
        return pn.WidgetBox(*rlist, width=600, scroll=True)

    def clr_history(self,count=0):
        self.chat_history = []
        return 

In [175]:
cb = cbfs()

file_input = pn.widgets.FileInput(accept='.pdf')
button_load = pn.widgets.Button(name="Load DB", button_type='primary')
button_clearhistory = pn.widgets.Button(name="Clear History", button_type='warning')
button_clearhistory.on_click(cb.clr_history)
inp = pn.widgets.TextInput( placeholder='Enter text here…')

bound_button_load = pn.bind(cb.call_load_db, button_load.param.clicks)
conversation = pn.bind(cb.convchain, inp) 

jpg_pane = pn.pane.Image( './img/convchain.jpg')

tab1 = pn.Column(
    pn.Row(inp),
    pn.layout.Divider(),
    pn.panel(conversation,  loading_indicator=True, height=300),
    pn.layout.Divider(),
)
tab2= pn.Column(
    pn.panel(cb.get_lquest),
    pn.layout.Divider(),
    pn.panel(cb.get_sources ),
)
tab3= pn.Column(
    pn.panel(cb.get_chats),
    pn.layout.Divider(),
)
tab4=pn.Column(
    pn.Row( file_input, button_load, bound_button_load),
    pn.Row( button_clearhistory, pn.pane.Markdown("Clears chat history. Can use to start a new topic" )),
    pn.layout.Divider(),
    pn.Row(jpg_pane.clone(width=400))
)
dashboard = pn.Column(
    pn.Row(pn.pane.Markdown('# ChatWithYourData_Bot')),
    pn.Tabs(('Conversation', tab1), ('Database', tab2), ('Chat History', tab3),('Configure', tab4))
)
dashboard

## <b><font color='darkblue'>Supplement</font></b>
* [Coursera - LangChain Chat with Your Data](https://www.coursera.org/learn/langchain-chat-with-your-data-project/home/week/1)