# Content Design for RAG
This notebook is part of a collection of material related to content design principles for retrieval-augmented generation (RAG).

You can explore the complete collection here: [Content Design for RAG on GitHub](https://github.com/spackows/ICAAI-2024_RAG-CD/blob/main/README.md)

## Improving RAG results using information typing
This notebook demonstrates how to improve RAG results by rewriting knowledge base content using information-typed, topic-based writing.

### Contents

**Implementation 1** (1/6 questions correct)
1. Original knowledge base content, feature-based writing style
2. Basic chunking at Markdown heading levels
3. Search for the most relevant chunk using the Chroma vector database
4. Ground an LLM prompt in the most relevant chunk

**Implementation 2** (4/6 questions correct)
- Rewrite knowledge base content using topic-based writing

**Implementation 3** (5/6 questions correct)
- Update search to use parent document retrieval

**Implementation 4** (6/6 questions correct)
1. Classify question type
2. Filter search by information type

## Implementation 1
- 1.1 Original knowledge base content, feature-based writing style
- 1.2 Basic chunking at Markdown heading levels
- 1.3 Search for the most relevant chunk using the Chroma vector database
- 1.4 Ground an LLM prompt in the most relevant chunk

### 1.1 Original knowledge base content, feature-based writing style

In [1]:
article_00 = """
# carbonWrite 9000
Congratulations on purchasing the carbonWrite 9000!


## Introduction
Once you have sharpened the end, you can use the pencil to write and draw on a variety of surfaces.


## Features
The carbonWrite 9000 has many state-of-the art features for writing with different line widths, writing in the dark, and erasing what you've written.

### Variable line widths
When the tip is dull, lines will be thick.  When the end is sharp, lines will be thin. 

### Built-in lighting
The carbonWrite 9000 has an on-board light for writing in the dark.

### State-of-the art rubOut(TM) feature
If you purchased the optional rubOut eraser feature, you can erase previous pencil output!

### Voice interface
You can submit administrative requests to the carbonWrite 9000 using voice commands.


## Administration
The carbonWrite 9000 battery has two modes:
- High performance, for faster response times and brighter light
- Long life, to extend the battery life as long as possible

### Command syntax
battery_config [ performance | longevity ]
"""

In [2]:
g_articles_org = {
    "00_carbonWrite-9000" : { "txt" : article_00 }
}

In [3]:
def saveArticleFiles( dir_name, articles_json ):
    for article_id in articles_json.keys():
        file_name = article_id + ".txt"
        txt = articles_json[ article_id ]["txt"]
        with open( dir_name + "/" + file_name, "w" ) as f:
            f.write( txt )

In [None]:
!mkdir -p content_org

In [5]:
saveArticleFiles( "content_org", g_articles_org )

### 1.2 Basic chunking at Markdown heading levels

See: https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/markdown_header_metadata
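
To make the idea of chunking at heading levels concrete, here is a minimal sketch in plain Python. It is not the algorithm used by `MarkdownHeaderTextSplitter`; the function name `split_at_headings` and the sample text are assumptions made for illustration, and the sketch ignores edge cases such as `#` characters inside code blocks.

```python
import re

def split_at_headings( markdown_txt, max_level=4 ):
    """Start a new chunk at every Markdown heading up to max_level (illustrative only)."""
    heading_re = re.compile( r"^(#{1,%d})\s+(.*)$" % max_level )
    chunks = []
    current = None
    for line in markdown_txt.splitlines():
        match = heading_re.match( line )
        if match:
            # A heading closes the previous chunk and opens a new one
            if current is not None:
                chunks.append( current )
            current = { "level"   : len( match.group( 1 ) ),
                        "heading" : match.group( 2 ).strip(),
                        "lines"   : [ line ] }
        elif current is not None:
            current["lines"].append( line )
        elif line.strip():
            # Text that appears before the first heading gets its own chunk
            current = { "level" : 0, "heading" : "", "lines" : [ line ] }
    if current is not None:
        chunks.append( current )
    return [ { "level"   : c["level"],
               "heading" : c["heading"],
               "text"    : "\n".join( c["lines"] ).strip() } for c in chunks ]

sample_md = """# Writing surfaces
You can write and draw on a variety of surfaces.

## Supported surfaces
- Paper
- Cardboard
"""

sample_chunks = split_at_headings( sample_md )
for chunk in sample_chunks:
    print( chunk["level"], repr( chunk["heading"] ) )
```

Each chunk keeps its heading line in the text, which roughly corresponds to passing `strip_headers=False` to the library splitter.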

In [7]:
!pip install langchain_community | tail -n 1



In [8]:
!pip install langchain_text_splitters | tail -n 1



In [9]:
!pip install unstructured | tail -n 1



In [10]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import MarkdownHeaderTextSplitter

def chunkContent( dir_name ):
    docs_chunks = []
    docs_loader = DirectoryLoader( dir_name )
    docs_arr = docs_loader.load()
    text_splitter = MarkdownHeaderTextSplitter( [ ( "#", "Header 1" ), ( "##", "Header 2" ), ( "###", "Header 3" ), ( "####", "Header 4" ) ], strip_headers=False )
    for doc in docs_arr:
        chunks = text_splitter.split_text( doc.page_content )
        for chunk in chunks:
            chunk.metadata["source"] = doc.metadata["source"]
        docs_chunks.extend( chunks )
    return docs_chunks

In [11]:
g_chunks_org = chunkContent( "content_org" )

In [12]:
for i in range( len( g_chunks_org ) ):
    print( g_chunks_org[i] )
    print( "\n" )

page_content='# carbonWrite 9000 Congratulations on purchasing the carbonWrite 9000!  
## Introduction Once you have sharpened the end, you can use the pencil to write and draw on a variety of surfaces.' metadata={'Header 1': 'carbonWrite 9000 Congratulations on purchasing the carbonWrite 9000!', 'Header 2': 'Introduction Once you have sharpened the end, you can use the pencil to write and draw on a variety of surfaces.', 'source': 'content_org/00_carbonWrite-9000.txt'}


page_content='## Features The carbonWrite 9000 has many state-of-the art features for writing with different line widths, writing in the dark, and erasing what you've written.  
### Variable line widths When the tip is dull, lines will be thick.  When the end is sharp, lines will be thin.' metadata={'Header 1': 'carbonWrite 9000 Congratulations on purchasing the carbonWrite 9000!', 'Header 2': "Features The carbonWrite 9000 has many state-of-the art features for writing with different line widths, writing in the dark,

### 1.3 Search for the most relevant chunk using the Chroma vector database

See: https://python.langchain.com/v0.1/docs/integrations/vectorstores/chroma

In [13]:
!pip install langchain_chroma | tail -n 1



In [14]:
import re
import chromadb
from langchain_chroma import Chroma

def createDocsMetadata( chunks, articles_json ):
    ids_arr = []
    txt_arr = []
    metadata_arr = []
    current_source = ""
    chunk_counter = 0
    for chunk in chunks:
        txt = chunk.page_content
        source = chunk.metadata["source"]
        topic_id = re.sub( r"^.*[\\\/]", "", source )
        topic_id = re.sub( r"\.txt$", "", topic_id )
        title = re.sub( r"[\-_]", " ", topic_id )
        content_type = articles_json[ topic_id ]["content_type"] if ( "content_type" in articles_json[ topic_id ] ) else ""
        if( source != current_source ):
            current_source = source
            chunk_counter = 0
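        # Zero-pad the chunk number so that IDs sort in document order
        num_str = str( chunk_counter ).zfill( 2 )
        # Use chunk_id rather than id to avoid shadowing the built-in id()
        chunk_id = source + "_" + num_str
        ids_arr.append( chunk_id )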
        txt_arr.append( txt )
        metadata_arr.append( { "source"       : source, 
                               "topic_id"     : topic_id, 
                               "title"        : title, 
                               "content_type" : content_type, 
                               "chunk_num"    : num_str  } )
        chunk_counter += 1
    return ids_arr, txt_arr, metadata_arr

def createSimilarityRetriever( chunks_arr, articles_json, chroma_client, collection_name ):
    ids_arr, txt_arr, metadata_arr = createDocsMetadata( chunks_arr, articles_json )
    collection = chroma_client.create_collection( collection_name )
    collection.add( ids=ids_arr, documents=txt_arr, metadatas = metadata_arr )
    return collection

In [15]:
g_chroma_client = chromadb.Client()

In [16]:
g_similarity_db_org = createSimilarityRetriever( g_chunks_org, g_articles_org, g_chroma_client, "collection_org2" )

In [17]:
def searchArticles( similarity_db, question_txt, content_type_filter=None ):

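    # Run the filtered search first, when a content type is given
    raw_search_results = None
    if content_type_filter is not None:
        raw_search_results = similarity_db.query( query_texts = [ question_txt ], where={ 'content_type': { '$eq': content_type_filter } }, n_results = 2 )
    # Fall back to an unfiltered search if no filter was given or the filter matched nothing
    if ( raw_search_results is None ) or ( len( raw_search_results["ids"][0] ) < 1 ):
        raw_search_results = similarity_db.query( query_texts = [ question_txt ], n_results = 2 )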
    
    num_results = len( raw_search_results["distances"][0] ) if ( ( "distances" in raw_search_results ) and ( len( raw_search_results["distances"] ) > 0 ) ) else 0
    
    search_results = []
    for i in range( num_results ):

        score        = raw_search_results["distances"][0][i]
        file_name    = raw_search_results["metadatas"][0][i]["source"]
        topic_id     = raw_search_results["metadatas"][0][i]["topic_id"]
        title        = raw_search_results["metadatas"][0][i]["title"]
        content_type = raw_search_results["metadatas"][0][i]["content_type"]
        chunk_num    = raw_search_results["metadatas"][0][i]["chunk_num"]
        txt          = raw_search_results["documents"][0][i]
        
        search_results.append( { "search_diff"  : round( 100 * score ) / 100, 
                                 "file_name"    : file_name,
                                 "topic_id"     : topic_id,
                                 "title"        : title,
                                 "content_type" : content_type,
                                 "chunk_num"    : chunk_num,
                                 "chunk"        : txt } )
    
    return search_results

In [18]:
question_txt = "What features does the carbonWrite 9000 have?"

searchArticles( g_similarity_db_org, question_txt )

[{'search_diff': 0.83,
  'file_name': 'content_org/00_carbonWrite-9000.txt',
  'topic_id': '00_carbonWrite-9000',
  'title': '00 carbonWrite 9000',
  'content_type': '',
  'chunk_num': '02',
  'chunk': '### Built-in lighting The carbonWrite 9000 has an on-board light for writing in the dark.'},
 {'search_diff': 0.84,
  'file_name': 'content_org/00_carbonWrite-9000.txt',
  'topic_id': '00_carbonWrite-9000',
  'title': '00 carbonWrite 9000',
  'content_type': '',
  'chunk_num': '05',
  'chunk': '## Administration The carbonWrite 9000 battery has two modes: - High performance, for faster response times and brighter light - Long life, to extend the battery life as long as possible  \n### Command syntax battery_config [ performance | longevity ]'}]
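
The `search_diff` values in these results are distances, so the chunk with the smallest value is treated as the most relevant. The sketch below illustrates that ranking idea with a toy bag-of-words cosine distance. This is only a stand-in: Chroma compares embedding vectors, not word counts, and the helper names and sample chunks here are assumptions made for illustration.

```python
import math
import re
from collections import Counter

def word_counts( txt ):
    # Lowercase the text and count alphanumeric tokens
    return Counter( re.findall( r"[a-z0-9]+", txt.lower() ) )

def cosine_distance( a, b ):
    # 0.0 means same direction, 1.0 means no words in common
    dot = sum( a[w] * b[w] for w in a if w in b )
    norm_a = math.sqrt( sum( v * v for v in a.values() ) )
    norm_b = math.sqrt( sum( v * v for v in b.values() ) )
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / ( norm_a * norm_b )

def rank_chunks( question_txt, chunks ):
    # Return ( distance, chunk ) pairs, closest chunk first
    question_vec = word_counts( question_txt )
    scored = [ ( cosine_distance( question_vec, word_counts( c ) ), c ) for c in chunks ]
    return sorted( scored, key=lambda pair: pair[0] )

toy_chunks = [
    "The carbonWrite 9000 has an on-board light for writing in the dark.",
    "You can erase previous pencil output with the rubOut eraser."
]

toy_ranked = rank_chunks( "How do I erase pencil output?", toy_chunks )
for distance, chunk in toy_ranked:
    print( round( distance, 2 ), chunk )
```

Word overlap is a crude proxy, but it shows why a question that shares little vocabulary with the right passage can rank another chunk first.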

### 1.4 Ground an LLM prompt in the most relevant chunk

See: [IBM watsonx.ai Python library](https://ibm.github.io/watsonx-ai-python-sdk)

### Prerequisites
Before you can prompt a foundation model in watsonx.ai, you must perform the following setup tasks:
- 1.4.1 Create an instance of the Watson Machine Learning service
- 1.4.2 Associate the Watson Machine Learning instance with the current project
- 1.4.3 Create an IBM Cloud API key
- 1.4.4 Look up the current project ID

### 1.4.1 Create an instance of the Watson Machine Learning service
If you don't already have an instance of the IBM Watson Machine Learning service, you can create an instance of the service from the IBM Cloud catalog: [Watson Machine Learning service](https://cloud.ibm.com/catalog/services/watson-machine-learning)

### 1.4.2 Associate an instance of the Watson Machine Learning service with the current project
The current project is the project in which you are running this notebook.

If an instance of Watson Machine Learning is not already associated with the current project, follow the instructions in this topic to do so: [Adding associated services to a project](https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/assoc-services.html?context=wx&audience=wdp)

### 1.4.3 Create an IBM Cloud API key
Create an IBM Cloud API key by following these instructions: [Creating an IBM Cloud API key](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui#create_user_key)

Then paste your new IBM Cloud API key into the `cloud_apikey` variable in the code cell below.

In [19]:
cloud_apikey = ""

g_credentials = { 
    "url"    : "https://us-south.ml.cloud.ibm.com", 
    "apikey" : cloud_apikey
}

### 1.4.4 Look up the current project ID
The current project is the project in which you are running this notebook. You can get the ID of the current project programmatically by running the following cell.

In [20]:
import os

g_project_id = os.environ["PROJECT_ID"]
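
If the notebook is run somewhere the `PROJECT_ID` environment variable is not set, `os.environ["PROJECT_ID"]` raises a `KeyError`. The following sketch shows one way to fail with a clearer message. The helper name `get_project_id` and the demo value are assumptions made for illustration, not part of the original notebook.

```python
import os

def get_project_id( env=os.environ, var_name="PROJECT_ID" ):
    """Return the project ID from the environment, or raise a readable error."""
    project_id = env.get( var_name, "" ).strip()
    if not project_id:
        raise RuntimeError( var_name + " is not set. Set it to the ID of your project before continuing." )
    return project_id

# Demo with a plain dict standing in for os.environ (the value is made up)
demo_project_id = get_project_id( { "PROJECT_ID" : "00000000-demo" } )
print( demo_project_id )
```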

Now prompt a model to answer the questions ...

In [21]:
g_qa_prompt_template = """
Article:
------
%s
------

Answer the following question using only information from the article. 
Answer in a complete sentence, with proper capitalization and punctuation. 
If there is no good answer in the article, say "I don't know".

Question: %s
Answer: 
"""
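
The template contains two `%s` placeholders, which Python's `%` operator fills in order: the article text first, then the question. Here is a small sketch using a shortened copy of the template. The names `demo_template`, `demo_chunk`, and `demo_question` are assumptions made for illustration.

```python
# Shortened stand-in for the prompt template, with the same two placeholders
demo_template = """
Article:
------
%s
------

Question: %s
Answer: 
"""

demo_chunk = "The carbonWrite 9000 is a pencil."
demo_question = "What is carbonWrite 9000?"

# The tuple is consumed left to right: chunk first, question second
demo_prompt = demo_template % ( demo_chunk, demo_question )
print( demo_prompt )
```

One caveat with this style of formatting: a literal percent sign in the template itself would need to be written as `%%`.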

In [22]:
from ibm_watson_machine_learning.foundation_models import Model

g_model_id = "google/flan-t5-xxl"

g_qa_prompt_parameters = {
    "decoding_method" : "greedy",
    "min_new_tokens"  : 0,
    "max_new_tokens"  : 300
}

g_qa_model = Model( g_model_id, g_credentials, g_qa_prompt_parameters, g_project_id )

In [23]:
import json

def promptLLM( prompt_txt, model, b_debug=False ):
    raw_response = model.generate( prompt_txt )
    if b_debug:
        print( "prompt_txt:\n'" + prompt_txt + "'\n" )
        print( "raw_response:\n" + json.dumps( raw_response, indent=3 ) )
    if ( "results" in raw_response ) \
       and ( len( raw_response["results"] ) > 0 ) \
       and ( "generated_text" in raw_response["results"][0] ):
        output = raw_response["results"][0]["generated_text"]
        return output
    else:
        return ""

def answerQuestion( chunk_txt, question_txt, b_debug=False ):
    prompt_txt = g_qa_prompt_template % ( chunk_txt, question_txt )
    generated_output = promptLLM( prompt_txt, g_qa_model, b_debug )
    return generated_output

In [24]:
question_txt = "What features does the carbonWrite 9000 have?"

relevant_chunks = searchArticles( g_similarity_db_org, question_txt )
print( "relevant_chunks:\n" + json.dumps( relevant_chunks, indent=3 ) + "\n" )

answerQuestion( relevant_chunks[0]["chunk"], question_txt, True )

relevant_chunks:
[
   {
      "search_diff": 0.83,
      "file_name": "content_org/00_carbonWrite-9000.txt",
      "topic_id": "00_carbonWrite-9000",
      "title": "00 carbonWrite 9000",
      "content_type": "",
      "chunk_num": "02",
      "chunk": "### Built-in lighting The carbonWrite 9000 has an on-board light for writing in the dark."
   },
   {
      "search_diff": 0.84,
      "file_name": "content_org/00_carbonWrite-9000.txt",
      "topic_id": "00_carbonWrite-9000",
      "title": "00 carbonWrite 9000",
      "content_type": "",
      "chunk_num": "05",
      "chunk": "## Administration The carbonWrite 9000 battery has two modes: - High performance, for faster response times and brighter light - Long life, to extend the battery life as long as possible  \n### Command syntax battery_config [ performance | longevity ]"
   }
]

prompt_txt:
'
Article:
------
### Built-in lighting The carbonWrite 9000 has an on-board light for writing in the dark.
------

Answer the following qu

'Built-in lighting'

In [25]:
def answerQuestions( questions, similarity_db, b_debug=False ):
    generated_answers = []
    for question_txt in questions:
        relevant_chunks = searchArticles( similarity_db, question_txt )
        chunk_txt = relevant_chunks[0]["chunk"]
        answer_txt = answerQuestion( chunk_txt, question_txt, b_debug )
        generated_answers.append( answer_txt )
    return generated_answers

In [26]:
g_questions = [
    "What is carbonWrite 9000?",
    "What features does the carbonWrite 9000 have?",
    "Can I write on cardboard?",
    "How can I erase what I wrote?",
    "How can I make my battery last longer?",
    "I'm having trouble writing because the end is dull. What can I do?"
]

In [27]:
g_generated_answers_org = answerQuestions( g_questions, g_similarity_db_org )

In [28]:
for i in range( len( g_questions ) ):
    print( g_questions[i] + "\n" + g_generated_answers_org[i] + "\n" )

What is carbonWrite 9000?
I don't know

What features does the carbonWrite 9000 have?
Built-in lighting

Can I write on cardboard?
Yes.

How can I erase what I wrote?
rubOut eraser feature

How can I make my battery last longer?
The carbonWrite 9000 battery has two modes: - High performance, for faster response times and brighter light - Long life, to extend the battery life as long as possible ### Command syntax battery_config [ performance | longevity ] ------

I'm having trouble writing because the end is dull. What can I do?
I don't know



## Implementation 2
Rewrite knowledge base content using information-typed, topic-based writing

In [29]:
article_01 = """
# carbonWrite 9000
The carbonWrite 9000 is a pencil.  

## Features
The carbonWrite 9000 has many state-of-the art features:
- The ability to produce different line widths
- An on-board light for writing in the dark, or in low light
- The rubOut feature for erasing what you've written
- A voice interface for administering your carbonWrite 9000
"""

In [30]:
article_02 = """
# Writing surfaces
You can write and draw on a variety of surfaces with the carbonWrite 9000.

Supported writing surfaces include:
- Paper
- Cardboard
- Wood
"""

In [31]:
article_03 = """
# Sharpening your carbonWrite 9000
If you need more of the carbon core to be sticking out at the writing 
end of your carbonWrite 9000, or if you want to write or draw thinner 
lines, you can sharpen your carbonWrite 9000.

## Lengthening the carbon writing tip
If the carbon sticking out at the writing end of your carbonWrite 
9000 gets too short, you can expose more of the carbon by unwinding 
the material that surrounds the inner core:
1. Grasp the tail of the white string near the carbon tip.
2. Gently pull on the string, unwinding the material around the 
circumfrence of the pencil.
3. Once the desired amount of carbon is exposed, use scissors 
to cut the trailing string and any material attatched to it.

## Shaping the carbon writing tip to a narrower point
If the tip of the carbon is dull or if the lines the pencil makes 
are too thick, then you can sharpen the tip in one of two ways:
- Rub the sides of the carbon tip on any rough surface to sharpen it to a 
narrower point.
- Use a sharp knife to whittle the carbon writing tip to a narrower point.
"""

In [32]:
article_04 = """
# Writing in the dark, or in low light
The carbonWrite 9000 has an on-board light for writing in the dark.

You do not need to take any manual steps to use the light.  When ambient 
lighting gets below the hard-coded threshold, the lightbulb illuminates 
automatically.
"""

In [33]:
article_05 = """
# Erasing what you wrote or drew
If you purchased the optional rubOut eraser feature, you can erase 
previous pencil output:
1. Lift the pencil from the page
2. Invert the pencil and place the rubOut eraser on the page
3. Rub the eraser over the lines you want to erase until the 
lines are gone
"""

In [34]:
article_06 = """
# Managing battery life
The carbonWrite 9000 battery has two modes:
- High performance, for brighter light
- Long life, to extend the battery life as long as possible

## Option 1: Voice interface
You can set the battery mode by speaking your requested mode 
to the carbonWrite 9000 voice interface.

## Option 2: System command line
You can also set the battery mode by calling the battery_config 
command from a system command line.
- To get the most life from your battery, call "battery_config 
longevity"
- To get the brightest light, at the expense of shorter batter 
life, call "battery_config performance"
"""

In [35]:
article_07 = """
# battery_config command
To configure the carbonWrite 9000 battery mode, call 
battery_config from a system command line.

## Syntax
battery_config [ performance | longevity ]

### Example 1: Configuring for performance
battery_config performance

### Example 2: Configuring for long battery life
battery_config longevity

"""

In [36]:
g_articles_new = {
    "01_carbonWrite-9000" : { "txt" : article_01, "content_type" : "concept" },
    "02_Writing-surfaces" : { "txt" : article_02, "content_type" : "concept" },
    "03_Sharpening-your-carbonWrite-9000" : { "txt" : article_03, "content_type" : "task" },
    "04_Writing-in-the-dark"    : { "txt" : article_04, "content_type" : "task" },
    "05_Erasing-what-you-wrote" : { "txt" : article_05, "content_type" : "task" },
    "06_Managing-battery-life"  : { "txt" : article_06, "content_type" : "task" },
    "07_battery_config-command" : { "txt" : article_07, "content_type" : "reference" }
}

In [None]:
!mkdir -p content_new

In [38]:
saveArticleFiles( "content_new", g_articles_new )

In [39]:
g_chunks_new = chunkContent( "content_new" )

In [None]:
for i in range( len( g_chunks_new ) ):
    print( g_chunks_new[i] )
    print( "\n" )

In [41]:
g_similarity_db_new = createSimilarityRetriever( g_chunks_new, g_articles_new, g_chroma_client, "collection_new" )

In [42]:
g_generated_answers_new = answerQuestions( g_questions, g_similarity_db_new )

In [43]:
for i in range( len( g_questions ) ):
    print( g_questions[i] + "\n" + g_generated_answers_new[i] + "\n" )

What is carbonWrite 9000?
A pencil.

What features does the carbonWrite 9000 have?
The ability to produce different line widths, An on-board light for writing in the dark, or in low light, The rubOut feature for erasing what you've written, A voice interface for administering your carbonWrite 9000

Can I write on cardboard?
Yes

How can I erase what I wrote?
If you purchased the optional rubOut eraser feature, you can erase previous pencil output: 1. Lift the pencil from the page 2. Invert the pencil and place the rubOut eraser on the page 3. Rub the eraser over the lines you want to erase until the lines are gone

How can I make my battery last longer?
Set the battery mode to "Long life".

I'm having trouble writing because the end is dull. What can I do?
I don't know



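Notice that the answer to the fifth question is incomplete: it names the battery mode but not the command that sets it. The top-ranked chunk, shown below, covers only the introduction and the voice interface option. The `battery_config` command is in the second-ranked chunk, which is not passed to the model.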

In [44]:
question_txt = "How can I make my battery last longer?"

relevant_chunks = searchArticles( g_similarity_db_new, question_txt )
print( "relevant_chunks:\n" + json.dumps( relevant_chunks, indent=3 ) + "\n" )

answerQuestion( relevant_chunks[0]["chunk"], question_txt, True )

relevant_chunks:
[
   {
      "search_diff": 0.95,
      "file_name": "content_new/06_Managing-battery-life.txt",
      "topic_id": "06_Managing-battery-life",
      "title": "06 Managing battery life",
      "content_type": "task",
      "chunk_num": "00",
      "chunk": "# Managing battery life The carbonWrite 9000 battery has two modes: - High performance, for brighter light - Long life, to extend the battery life as long as possible  \n## Option 1: Voice interface You can set the battery mode by speaking your requested mode to the carbonWrite 9000 voice interface."
   },
   {
      "search_diff": 0.96,
      "file_name": "content_new/06_Managing-battery-life.txt",
      "topic_id": "06_Managing-battery-life",
      "title": "06 Managing battery life",
      "content_type": "task",
      "chunk_num": "01",
      "chunk": "## Option 2: System command line You can also set the battery mode by calling the battery_config command from a system command line. - To get the most life from yo

'Set the battery mode to "Long life".'

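Notice that the answer to the sixth question is still "I don't know". The top-ranked chunk comes from the "Writing surfaces" topic, so the model never sees the sharpening instructions in the second-ranked chunk.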

In [45]:
question_txt = "I'm having trouble writing because the end is dull. What can I do?"

relevant_chunks = searchArticles( g_similarity_db_new, question_txt )
print( "relevant_chunks:\n" + json.dumps( relevant_chunks, indent=3 ) + "\n" )

answerQuestion( relevant_chunks[0]["chunk"], question_txt, True )

relevant_chunks:
[
   {
      "search_diff": 1.33,
      "file_name": "content_new/02_Writing-surfaces.txt",
      "topic_id": "02_Writing-surfaces",
      "title": "02 Writing surfaces",
      "content_type": "concept",
      "chunk_num": "00",
      "chunk": "# Writing surfaces You can write and draw on a variety of surfaces with the carbonWrite 9000.  \nSupported writing surfaces include:  \nPaper  \nCardboard  \nWood"
   },
   {
      "search_diff": 1.35,
      "file_name": "content_new/03_Sharpening-your-carbonWrite-9000.txt",
      "topic_id": "03_Sharpening-your-carbonWrite-9000",
      "title": "03 Sharpening your carbonWrite 9000",
      "content_type": "task",
      "chunk_num": "01",
      "chunk": "## Shaping the carbon writing tip to a narrower point If the tip of the carbon is dull or if the lines the pencil makes are too thick, then you can sharpen the tip in one of two ways: - Rub the sides of the carbon tip on any rough surface to sharpen it to a narrower point. - Use 

"I don't know"

## Implementation 3
Update search to use parent document retrieval
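
With parent document retrieval, the search still runs over small chunks, but the prompt is grounded in the whole article that the best chunk came from. The following self-contained sketch shows the lookup step on toy data. The helper name `parent_text_for_hit` and the toy article are assumptions made for illustration.

```python
import os

def parent_text_for_hit( hit, articles ):
    """Map a search hit back to the full text of the article it was cut from."""
    base_name = os.path.basename( hit["file_name"] )
    article_id, _ext = os.path.splitext( base_name )
    article = articles.get( article_id )
    return article["txt"].strip() if article else None

# Toy knowledge base with one article, keyed by file name without extension
toy_articles = {
    "06_Managing-battery-life" : {
        "txt" : "# Managing battery life\n"
                "The battery has two modes.\n\n"
                "## Option 2: System command line\n"
                "To get the most life from your battery, call \"battery_config longevity\".\n"
    }
}

# The hit holds only the first chunk, which does not mention the command
toy_hit = { "file_name" : "content_new/06_Managing-battery-life.txt",
            "chunk"     : "# Managing battery life The battery has two modes." }

toy_parent = parent_text_for_hit( toy_hit, toy_articles )
print( toy_parent )
```

The trade-off is a longer prompt: short, single-purpose topics keep the parent text small enough to pass to the model whole.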

In [46]:
def getParentTopicTxt( chunk, articles_json ):
    file_name = chunk["file_name"]
    file_name_base = re.sub( r"^.*[\\\/]", "", file_name )
    file_name_base = re.sub( r"\.txt$", "", file_name_base )
    for article_id in articles_json.keys():
        if article_id == file_name_base:
            return articles_json[ article_id ]["txt"].strip()
    return None

In [47]:
question_txt = "How can I make my battery last longer?"

relevant_chunks = searchArticles( g_similarity_db_new, question_txt )
parent_topic_txt = getParentTopicTxt( relevant_chunks[0], g_articles_new )
answerQuestion( parent_topic_txt, question_txt, True )

prompt_txt:
'
Article:
------
# Managing battery life
The carbonWrite 9000 battery has two modes:
- High performance, for brighter light
- Long life, to extend the battery life as long as possible

## Option 1: Voice interface
You can set the battery mode by speaking your requested mode 
to the carbonWrite 9000 voice interface.

## Option 2: System command line
You can also set the battery mode by calling the battery_config 
command from a system command line.
- To get the most life from your battery, call "battery_config 
longevity"
- To get the brightest light, at the expense of shorter batter 
life, call "battery_config performance"
------

Answer the following question using only information from the article. 
Answer in a complete sentence, with proper capitalization and punctuation. 
If there is no good answer in the article, say "I don't know".

Question: How can I make my battery last longer?
Answer: 
'

raw_response:
{
   "model_id": "google/flan-t5-xxl",
   "created_at": "2025

'To get the most life from your battery, call "battery_config longevity".'

## Implementation 4

### 4.1 Classify question type

In [48]:
g_classify_prompt_template = """Classify the given question into one of the following classes: what-is, how-to, syntax

Class: what-is
Description: The question is asking for factual details or for an explanation of a concept or idea

Class: how-to
Description: The question is asking for instructions on how to do something

Examples:

User input: How heavy is the pencil
what-is

User input: What is the command for turning on the light?
syntax

User input: What's the best way to hold it?
how-to

User input:Can I write upside down?
what-is

User input: How do you recharge the battery?
how-to

User input: Tell me the command syntax for reconfiguring?
syntax

User input: What kinds of writing are supported?
what-is

User input: %s
"""

In [49]:
g_classify_prompt_parameters = {
    "decoding_method" : "greedy",
    "min_new_tokens"  : 0,
    "max_new_tokens"  : 20
}

g_classify_model = Model( g_model_id, g_credentials, g_classify_prompt_parameters, g_project_id )

def classifyQuestion( question_txt, b_debug=False ):
    prompt_txt = g_classify_prompt_template % question_txt
    generated_output = promptLLM( prompt_txt, g_classify_model, b_debug )
    return generated_output
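
Generated class names can come back with stray whitespace, different capitalization, or unexpected text. A minimal sketch of normalizing the output before using it is shown below. The helper `normalize_class_name` and the choice of `what-is` as the fallback are assumptions made for illustration, not part of the original notebook.

```python
ALLOWED_CLASSES = ( "what-is", "how-to", "syntax" )

def normalize_class_name( raw_output, default="what-is" ):
    """Clean up a generated label and fall back to a default if it is not recognized."""
    cleaned = raw_output.strip().lower().rstrip( "." )
    return cleaned if cleaned in ALLOWED_CLASSES else default

print( normalize_class_name( " How-To\n" ) )
print( normalize_class_name( "unsure" ) )
```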

In [50]:
question_txt = "What is carbonWrite 9000?"

classifyQuestion( question_txt, True )

prompt_txt:
'Classify the given question into one of the following classes: what-is, how-to, syntax

Class: what-is
Description: The question is asking for factual details or for an explanation of a concept or idea

Class: how-to
Description: The question is asking for instructions on how to do something

Examples:

User input: How heavy is the pencil
what-is

User input: What is the command for turning on the light?
syntax

User input: What's the best way to hold it?
how-to

User input:Can I write upside down?
what-is

User input: How do you recharge the battery?
how-to

User input: Tell me the command syntax for reconfiguring?
syntax

User input: What kinds of writing are supported?
what-is

User input: What is carbonWrite 9000?
'

raw_response:
{
   "model_id": "google/flan-t5-xxl",
   "created_at": "2025-03-01T04:38:08.730Z",
   "results": [
      {
         "generated_text": "what-is",
         "generated_token_count": 4,
         "input_token_count": 171,
         "stop_reason": 

'what-is'

In [51]:
for question_txt in g_questions:
    print( question_txt )
    class_name = classifyQuestion( question_txt )
    print( class_name + "\n" )

What is carbonWrite 9000?
what-is

What features does the carbonWrite 9000 have?
what-is

Can I write on cardboard?
what-is

How can I erase what I wrote?
how-to

How can I make my battery last longer?
how-to

I'm having trouble writing because the end is dull. What can I do?
how-to



### 4.2 Filter search by information type
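
The idea is to map each question class to an information type and prefer chunks of that type. The sketch below applies the idea to a plain list of search hits, using the two hits returned earlier for the "dull end" question. In the notebook the filtering happens inside the vector database query instead. The mapping of `syntax` to `reference`, the helper name, and the fallback to unfiltered results are assumptions made for illustration.

```python
# Assumed mapping from question class to information type
CLASS_TO_TOPIC_TYPE = {
    "what-is" : "concept",
    "how-to"  : "task",
    "syntax"  : "reference"
}

def filter_by_topic_type( candidates, class_name, default_type="concept" ):
    """Keep hits whose content_type matches the question class, if any match."""
    topic_type = CLASS_TO_TOPIC_TYPE.get( class_name, default_type )
    matching = [ c for c in candidates if c["content_type"] == topic_type ]
    # Fall back to the unfiltered hits rather than returning nothing
    return matching if matching else list( candidates )

toy_candidates = [
    { "topic_id" : "02_Writing-surfaces", "content_type" : "concept", "search_diff" : 1.33 },
    { "topic_id" : "03_Sharpening-your-carbonWrite-9000", "content_type" : "task", "search_diff" : 1.35 }
]

toy_filtered = filter_by_topic_type( toy_candidates, "how-to" )
print( [ c["topic_id"] for c in toy_filtered ] )
```

With the filter applied, the sharpening task topic is the only remaining hit, even though the concept topic had a slightly smaller distance.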

In [52]:
def answerQuestions2( questions, similarity_db, articles_json, b_debug=False ):
    generated_answers = []
    for question_txt in questions:
        class_name = classifyQuestion( question_txt )
        topic_type = "concept"
        # The classifier returns one of: what-is, how-to, syntax
        if( "syntax" == class_name ):
            topic_type = "reference"
        elif( "how-to" == class_name ):
            topic_type = "task"
        relevant_chunks = searchArticles( similarity_db, question_txt, topic_type )
        if b_debug:
            print( question_txt )
            print( "class: " + class_name )
            print( "topic_type: " + topic_type )
            print( "relevant_chunks:\n" )
            print( relevant_chunks )
        parent_topic_txt = getParentTopicTxt( relevant_chunks[0], articles_json )
        answer_txt = answerQuestion( parent_topic_txt, question_txt, b_debug )
        generated_answers.append( answer_txt )
    return generated_answers

In [54]:
g_generated_answers_final = answerQuestions2( g_questions, g_similarity_db_new, g_articles_new, False )

In [81]:
import pandas as pd
from IPython.display import HTML

df = pd.DataFrame( { "Questions": g_questions, "Original answers": g_generated_answers_org, "Final answers": g_generated_answers_final } )

styles = [ 
    dict( selector="th", props=[("text-align","left"),] ),
    dict( selector="td", props=[("text-align","left"),("vertical-align","top"),("padding","10px"),("width","300px")] ) 
]

HTML( ( df.style.set_table_styles( styles ) ).hide().to_html(index=False) )

| Questions | Original answers | Final answers |
|---|---|---|
| What is carbonWrite 9000? | I don't know | A pencil. |
| What features does the carbonWrite 9000 have? | Built-in lighting | The ability to produce different line widths, An on-board light for writing in the dark, or in low light, The rubOut feature for erasing what you've written, A voice interface for administering your carbonWrite 9000 |
| Can I write on cardboard? | Yes. | Yes. |
| How can I erase what I wrote? | rubOut eraser feature | If you purchased the optional rubOut eraser feature, you can erase previous pencil output: 1. Lift the pencil from the page 2. Invert the pencil and place the rubOut eraser on the page 3. Rub the eraser over the lines you want to erase until the lines are gone |
| How can I make my battery last longer? | The carbonWrite 9000 battery has two modes: - High performance, for faster response times and brighter light - Long life, to extend the battery life as long as possible ### Command syntax battery_config [ performance \| longevity ] ------ | To get the most life from your battery, call "battery_config longevity". |
| I'm having trouble writing because the end is dull. What can I do? | I don't know | Sharpen the end. |
