# RAG Function Log Analysis and Feedback Evaluation with Topic Modeling

Use this notebook to gain insights into user feedback.<br><br>
This notebook implements log data retrieval, data preparation, topic modeling, and correlation analysis.<br>
It forms a basis for feedback analysis and is intended to be extended with further functions.


## Contents

This notebook contains the following parts:
- [Pre-Requisite Libraries and Dependencies](#Pre-Requisite-Libraries-and-Dependencies)
- [Connecting to Vector Database](#connect)
- [Import Log Records](#Import-Log-Records)
- [Map Feedback to Percentage](#Map-Feedback-to-Percentage)
- [Rating Distribution](#Rating-Distribution)
- [Topic Modeling](#Topic-Modeling)
- [Topic Labels](#Topic-Labels)
- [Score by Topic](#Score-by-Topic)
- [Score by Response Length](#Score-by-Response-Length)
- [Document Search Score by Topic](#Document-Search-Score-by-Topic)
- [Feedback on Answer by Topic](#Feedback-on-Answer-by-Topic)

## Topic Modeling

Topic modeling is a key feature of this analysis notebook. It is an unsupervised learning technique that clusters a set of text documents into groups by detecting common word and phrase patterns. Each group is represented by a topic, which is a set of keywords that appear to be relevant to the documents in that group. Depending on how closely its keywords relate to each other, a topic might appear explicit or somewhat abstract.
Three models for topic modeling are supported:
* **Watson NLP**: A hierarchical topic model that supports many tuning options, in particular stop word optimization.
* **Top2Vec**: An algorithm that leverages document and word embeddings to find topic vectors. It has few tuning options and does not require stop word removal. It is recommended to run this notebook in an appropriate environment template ("custom-top2vec-template") and to update the runtime accordingly.
* **BERTopic**: A model based on transformers and c-TF-IDF with few tuning options but very good out-of-the-box performance.

Watson NLP is used if parameter `topic_modeling_method` is set to `watson_nlp` or not set at all AND the Python module `watson_nlp` is available; if the module cannot be imported, the notebook switches to Top2Vec. On IBM Cloud, this notebook must run in an appropriate environment ("NLP + DO runtime...").<br>
Top2Vec is used if parameter `topic_modeling_method` is set to `top2vec`<br>
BERTopic is used if parameter `topic_modeling_method` is set to `bertopic`
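
The selection rules above can be summarized in a few lines. This is a minimal sketch, not the notebook's own code: `select_topic_modeling_method` and the `watson_nlp_available` flag are hypothetical names introduced for illustration, and the fallback to Top2Vec mirrors what the Watson NLP cell does when the import fails.

```python
SUPPORTED_METHODS = ['watson_nlp', 'top2vec', 'bertopic']

def select_topic_modeling_method(parameters, watson_nlp_available):
    """Return the topic modeling method to use (hypothetical helper)."""
    # Default to the first supported method when the parameter is not set
    method = parameters.get('topic_modeling_method', SUPPORTED_METHODS[0]).lower()
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"Topic modeling method '{method}' not supported")
    # Watson NLP needs the watson_nlp module; fall back to Top2Vec otherwise
    if method == 'watson_nlp' and not watson_nlp_available:
        return 'top2vec'
    return method

print(select_topic_modeling_method({}, watson_nlp_available=False))
```

With an empty parameter set and no `watson_nlp` module, the sketch selects `top2vec`.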


### Pre-Requisite Libraries and Dependencies
Download and import mandatory libraries and dependencies. 

Note: Some of the library versions may produce warnings after installation. These versions are required for the accelerator to run successfully, so ignore the warnings/errors and proceed with your execution.

In [None]:
!pip install elasticsearch==8.17.2 | tail -n 1
!pip install wordcloud | tail -n 1
!pip install pymilvus==2.5.11 | tail -n 1
!pip install 'torch>=2.3.0' | tail -n 1

If the cell below fails to import all the libraries, restart the kernel after the pip install and run the cell again.

In [None]:
from elasticsearch import Elasticsearch, helpers
from wordcloud import WordCloud
import os
import copy
import numpy as np
import pandas as pd
import math
import re
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
from datetime import datetime, timedelta
from matplotlib.colors import hsv_to_rgb, TABLEAU_COLORS

from pymilvus import(connections,Collection,utility)
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models.prompts import PromptTemplate, PromptTemplateManager
from ibm_watsonx_ai.foundation_models.utils.enums import PromptTemplateFormats
from ibm_watsonx_ai.foundation_models import ModelInference, get_supported_tasks


In [None]:
project_id=os.environ['PROJECT_ID']
# Environment and host url
hostname = os.environ['RUNTIME_ENV_APSX_URL']

if hostname.endswith("cloud.ibm.com"):
    environment = "cloud"
    runtime_region = os.environ["RUNTIME_ENV_REGION"]
else:
    environment = "on-prem"
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space()

### Import Parameter Set, Credentials and Helper Functions

In [None]:
try:
    filename = 'rag_helper_functions.py'
    wslib.download_file(filename)
    import rag_helper_functions
    print("rag_helper_functions imported from the project assets")
except NameError as e:
    print(str(e))
    print("If running watsonx.ai aaS on IBM Cloud, check that the first cell in the notebook contains a project token. If not, select the vertical ellipsis button from the notebook toolbar and `insert project token`. Also check that you have specified your ibm_api_key in the second code cell of the notebook")


In [None]:
parameter_sets = ["RAG_parameter_set","RAG_advanced_parameter_set"]

parameters=rag_helper_functions.get_parameter_sets(wslib, parameter_sets)

In [None]:
ibm_api_key=parameters['watsonx_ai_api_key']
space_uid = parameters['watsonx_ai_space_id']

if environment == "cloud":
    WML_SERVICE_URL=f"https://{runtime_region}.ml.cloud.ibm.com" 
    wml_credentials = Credentials(api_key=parameters['watsonx_ai_api_key'], url=WML_SERVICE_URL)
else:
    token = os.environ['USER_ACCESS_TOKEN']
    wml_credentials=Credentials(token=os.environ['USER_ACCESS_TOKEN'],url=hostname,instance_id='openshift')

### Set Watsonx.ai client
The cell below uses the Watson Machine Learning credentials to create an API client for interacting with the project and deployment space.

In [None]:
client = APIClient(wml_credentials)
client.set.default_project(project_id=project_id)

## Connecting to Vector Database
#### Connecting using Project Connection Asset (default)
The notebook, by default, will look for a connection asset in the project named `milvus_connect` or `elasticsearch_connect` or `datastax_connect`.  You can set this up by following the instructions in the project readme. 
This code checks if a specified connection exists in the project. If found, it retrieves the connection details and identifies the connection type. Depending on the connection type, it establishes a connection to the appropriate database. If the connection is not found, it raises an error indicating the absence of the specified connection in the project.

**Note** Datastax is not supported in this cloud version.

In [None]:
if 'log_index_name' not in parameters or parameters['log_index_name'] == '':
    # stop early: nothing can be read without a log index
    raise ValueError("Log index is not specified.")

log_index_name = parameters['log_index_name']
log_connection_name = parameters["log_connection_asset"]

# the named connection asset must exist in the project
if not any(conn['name'] == log_connection_name for conn in wslib.list_connections()):
    raise ValueError(f"Connection '{log_connection_name}' not found in the project.")

print(log_connection_name, "Log connection found in the project")
log_db_connection = wslib.get_connection(log_connection_name)
print("Successfully retrieved the log connection details")
log_connection_datatypesource_id = client.connections.get_details(log_db_connection['.']['asset_id'])['entity']['datasource_type']
log_connection_type = client.connections.get_datasource_type_details_by_id(log_connection_datatypesource_id)['entity']['name']
log_client = None
print("Log connection type is identified as:", log_connection_type)

try:
    if log_connection_type == 'elasticsearch':
        log_client = rag_helper_functions.create_and_check_elastic_client(log_db_connection, parameters['elastic_search_model_id'])
    elif log_connection_type == "milvus" or log_connection_type == "milvuswxd":
        milvus_credentials = rag_helper_functions.connect_to_milvus_database(log_db_connection, parameters)
    elif log_connection_type == "datastax":
        if environment == "cloud":
            raise ValueError("ERROR! we don't support datastax connection for Cloud as of now")
        datastax_session, datastax_cluster = rag_helper_functions.connect_to_datastax(log_db_connection, parameters)
except Exception:
    print(f"Cannot connect to database for index {log_index_name}.")
    raise

## Import Log Records

Read log records from the Elasticsearch index, Milvus collection, or Datastax collection, depending on the log connection type. A query filters the retrieved records.<br>
By default, data is retrieved for the last 30 days. If a custom date range is provided, it is used to filter the retrieved data.

Dates must be in **`mm/dd/yyyy`** format, and the date range must be valid: the start date cannot be in the future or later than the end date.
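
The defaulting rules can be illustrated with a small helper. This is a sketch under stated assumptions: `resolve_date_range` is a hypothetical function, not part of the notebook or of `rag_helper_functions`, and `now` is passed in explicitly so the result is reproducible. The resulting range feeds the same Elasticsearch `range` filter on `log_timestamp` that the notebook builds.

```python
from datetime import datetime, timedelta

def resolve_date_range(start_dt, end_dt, now):
    """Turn two mm/dd/yyyy strings (possibly empty) into a (start, end) pair."""
    # An empty end date means "now"; a future end date is clipped to "now"
    end = min(datetime.strptime(end_dt, '%m/%d/%Y'), now) if end_dt else now
    # An empty start date means "30 days before the end date"
    start = datetime.strptime(start_dt, '%m/%d/%Y') if start_dt else end - timedelta(days=30)
    if start > now:
        raise ValueError('Start date cannot be in the future.')
    if start > end:
        raise ValueError('Start date must not be later than end date.')
    return start, end

now = datetime(2025, 6, 30, 12, 0)  # fixed reference time for the example
start, end = resolve_date_range('', '', now)

# Elasticsearch range filter on the log timestamp
query_es = {"bool": {"must": [
    {"range": {"log_timestamp": {"gte": start.isoformat(), "lte": end.isoformat()}}}
]}}
print(query_es)
```

Note that a date given without a time parses to midnight, so an explicit end date marks the start of that day in the range filter.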

In [None]:
# For custom date range filtering, specify both dates in mm/dd/yyyy format.
start_dt = ''
end_dt = ''

In [None]:
query_es = {}
query_mil = ''
try:
    date_curr = datetime.now()

    if end_dt == '':
        date_en = datetime.now()
    else:
        date_en = datetime.strptime(end_dt, '%m/%d/%Y')
        if date_en > date_curr:
            date_en = date_curr
        
    if start_dt == '':
        date_st = date_en - timedelta(30)
    else:
        date_st = datetime.strptime(start_dt, '%m/%d/%Y')

    if date_st > date_curr:
        raise Exception('Invalid start date provided! It cannot be in the future.') 

    if date_st > date_en:
        raise Exception('Invalid date range provided!')

    print(f"Will retrieve log records from {date_st} to {date_en}")


    # Elasticsearch query
    query_es = {"bool": {"must": [ {"range": {"log_timestamp": {"gte": date_st.isoformat(), "lte": date_en.isoformat()}}} ] } }
    
    # Milvus query
    query_mil = ''
    delta = timedelta(days=1)
    while date_st <= date_en:
        dt = date_st.isoformat()
        if len(query_mil) > 0:
            query_mil += ' or '
        query_mil += f"log_timestamp like '{dt[:11]}%'"
        date_st += delta
        
except ValueError as e:
    print(f"Error: Invalid date provided: {e}. Please provide valid dates to proceed!")
except Exception as e:
    print(f"Error: {e} Please provide a valid date range to proceed!")

In [None]:
if query_es == {} and query_mil == '':
    raise Exception("Query not generated! Please provide a valid date range to proceed.")
else:       
    try:
        response_data = []
        if log_connection_type == 'elasticsearch':
            log_query = {"query": query_es, "_source": ["question", "response", "feedback", "source_documents.score"]}
            log_response = helpers.scan(
                log_client, 
                index = log_index_name,
                query = log_query
                )
            response_data = [dict(item)['_source'] for item in log_response]
            if len(response_data) == 0:
                # workaround in case helpers.scan function is not working: search
                print(f"Could not 'scan' data, trying 'search'.")
                log_response = log_client.search(
                    size = 10000,
                    index = log_index_name, 
                    body = log_query
                )
                response_data = [dict(item)['_source'] for item in log_response['hits']['hits']]
        elif log_connection_type == "milvus" or log_connection_type == "milvuswxd":
            collection = Collection(name=log_index_name)
            log_response = collection.query(expr=query_mil, output_fields=["question", "response", "feedback", "source_documents"])
            if len(log_response)>=1:
                for item in log_response:
                    source_docs = item.pop('source_documents')
                    source_doc_score = []
                    for doc in source_docs:
                        source_doc_score.append({'score': doc['score']})
                    item.update({'source_documents': source_doc_score})
                    response_data.append(copy.deepcopy(item))
        elif log_connection_type == "datastax":
            keyspace=log_db_connection.get('keyspace')
            #select_log_query = datastax_session.prepare(f"SELECT * FROM {keyspace}.{log_index_name} WHERE log_timestamp >= ? and log_timestamp <= ? ALLOW FILTERING")
            select_query = datastax_session.prepare(f"SELECT * FROM {keyspace}.{log_index_name} WHERE feedback !='' ALLOW FILTERING")
            results=datastax_session.execute(select_query)
            
            import ast
            for row in results:
                # source_documents and feedback are stored as stringified Python literals
                source_docs = ast.literal_eval(row.source_documents)
                source_doc_score = [{'score': doc['score']} for doc in source_docs]
                # build the record once per row, even when there are no source documents
                row_item = {
                    "question": row.question,
                    "response": row.response,
                    "feedback": ast.literal_eval(row.feedback),
                    "source_documents": source_doc_score
                }
                response_data.append(row_item)
            
    except:
        print(f"Cannot read data from index {log_index_name}.")
        raise
        
    data = pd.DataFrame().from_dict(response_data)
    
    if len(data)==0:
        raise Exception('No Feedback records found in the log index! Please index feedback records & re run this notebook')
    else:
        print(f"{str(len(data))} log records read.")

    data['source_documents'] = data['source_documents'].apply(lambda x: x if type(x)==list else [])
    data['feedback'] = data['feedback'].fillna('') if 'feedback' in data.columns else ''
    
    if len(data)<25:
        raise Exception('Volume of log entries is too low to run the analysis in this notebook! Please index more feedback records and re-run this notebook')
        
    if len(data)<50:
        print("Warning: Volume of log entries is low! Some parts of the notebook may fail or provide inconsistent results due to low volume of data.")
        
        

## Map Feedback to Percentage
Map the user rating to a percentage by linear interpolation. **Best rating maps to 100%, worst rating maps to 0%. If no feedback is given, rating percentage is -1.**
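
As a worked example of the interpolation, the sketch below reproduces the formula used in the following cells for both rating schemes. The function name `rating_percentages` is introduced here for illustration only; the rating lists are the ones from this notebook, ordered from best to worst.

```python
def rating_percentages(supported_ratings):
    """Map ratings (ordered best to worst) to evenly spaced percentages."""
    n = len(supported_ratings)
    step = 100 / (n - 1)  # distance between two neighboring ratings
    # Adding 0.5 before truncating rounds to the nearest integer
    mapping = {r: int(100.5 - i * step) for i, r in enumerate(supported_ratings)}
    mapping['n/a'] = -1   # records without feedback
    return mapping

five = rating_percentages(['perfect', 'good', 'ok', 'bad', 'very bad'])
two = rating_percentages(['positive', 'negative'])
print(five)  # {'perfect': 100, 'good': 75, 'ok': 50, 'bad': 25, 'very bad': 0, 'n/a': -1}
print(two)   # {'positive': 100, 'negative': 0, 'n/a': -1}
```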

In [None]:
# Thumb up, thumb down rating
#supported_ratings = ['positive', 'negative']

# 5 rating options
supported_ratings = ['perfect', 'good', 'ok', 'bad', 'very bad']

In [None]:
rating_count = len(supported_ratings)
rating_percentage = {supported_ratings[i]: int(100.5 - i * (100 / (rating_count-1))) for i in range(rating_count)} | {'n/a': -1}
rating_options = list(rating_percentage.keys())
rating_percentage

In [None]:
data['eval_rating_percent'] = data['feedback'].map(lambda val: -1 if not isinstance(val, dict) or len(val)==0 or not 'value' in val else 
    int(val['value']) if val['value'].isnumeric() else 
    rating_percentage[val['value']] if val['value'] in rating_options else
    -1 )

In [None]:
data['eval_rating'] = data['eval_rating_percent'].map(lambda val: \
    rating_options[len([v for v in [rating_percentage[rating] for rating in rating_percentage.keys()] if v > val])])

## Colors
Calculate colors for ratings using HSV (hue, saturation, value) format. The hue runs from 33% (green) down to 0% (red), while the value follows a sine curve from 0.6 up to 1 and back to 0.6.<br>
A color is calculated for each rating option, including lightgray for "no feedback".
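
The calculation can be unrolled step by step as follows. This sketch uses `colorsys.hsv_to_rgb` from the Python standard library instead of `matplotlib.colors.hsv_to_rgb`, which is an assumption made to keep the example dependency-free; for single colors the two should give the same RGB values. `rating_colors` is a hypothetical name.

```python
import colorsys
import math

def rating_colors(rating_count):
    """Return one RGB tuple per rating (best first) plus 'lightgray' for no feedback."""
    colors = []
    for x in range(rating_count - 1, -1, -1):      # best rating first
        t = x / (rating_count - 1)                 # 1.0 for best, 0.0 for worst
        hue = 0.33 * t                             # 0.33 is green, 0.0 is red
        value = 0.6 + 0.4 * math.sin(t * math.pi)  # 0.6 at both ends, 1.0 in the middle
        colors.append(colorsys.hsv_to_rgb(hue, 1, value))
    return colors + ["lightgray"]

colors = rating_colors(5)
for color in colors:
    print(color)
```

For five ratings this yields a dark green, a brighter yellow-green, a bright yellow, an orange, and a dark red, followed by the gray used for records without feedback.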

In [None]:
rating_color = [hsv_to_rgb((0.33 * (x/(rating_count-1)), 1, 0.6 + 0.4 * math.sin((x/(rating_count-1)) * math.pi))) for x in range(rating_count-1, -1, -1)] + ["lightgray"]


## Rating Distribution
Number of user ratings per rating option.<br>
Shows the rating distribution, with "n/a" for records without feedback (often the largest group). 

In [None]:
distribution = data['eval_rating'].value_counts()
distribution = {key: distribution[key] if key in distribution.keys() else 0 for key in rating_options}
plt.bar(distribution.keys(), distribution.values(), color = rating_color)
plt.show()


## Topic Modeling
Topic modeling can be done on user questions and/or on RAG function responses.<br>
Topic modeling on user questions might be less valuable because the texts are shorter and the questions vary more widely.<br>
Topic modeling on RAG function responses alone cannot find topics that users ask about but that are not covered by the content database.<br>
**Depending on the number of documents, the topic modeling procedure may take some time.**

In [None]:
# Topic modeling on user questions
evaluation_field = 'question'

# Topic modeling on RAG function responses
#evaluation_field = 'response'

In [None]:
# Determine topic modeling method. Possible values are 'watson_nlp', 'top2vec', 'bertopic'
topic_modeling_methods_supported = ['watson_nlp', 'top2vec', 'bertopic']
topic_modeling_method = topic_modeling_methods_supported[0] if 'topic_modeling_method' not in parameters else parameters['topic_modeling_method'].lower()
if topic_modeling_method not in topic_modeling_methods_supported:
    raise Exception(f"Topic modeling method '{topic_modeling_method}' not supported")

### Watson NLP
Topic detection with Watson NLP. To improve the detected topics, add words to `stopwords` that appear in your log records across many subjects and therefore do not help to separate topics. 

In [None]:
if not topic_modeling_method == topic_modeling_methods_supported[0]:
    print(f"Skipped ... Topic modeling method is '{topic_modeling_method}'.")
else:
    try:
        import watson_nlp
        from watson_nlp.toolkit.summary_utils import NGramSummary
        from watson_nlp.blocks.topics import HierarchicalClustering
        from watson_nlp import data_model as dm
    except:
        topic_modeling_method = topic_modeling_methods_supported[1]
        print(f"Watson NLP not used: Watson NLP not available in runtime.\nSwitching to topic detection method {topic_modeling_method}.")

if topic_modeling_method == topic_modeling_methods_supported[0]:

    eval_df = pd.DataFrame(data[evaluation_field])
    train_file = './TMP_train_data.csv'
    eval_df.to_csv(train_file)

    # load the syntax model
    syntax_model = watson_nlp.load('syntax_izumo_en_stock')

    csv_stream = dm.DataStream.from_csv(train_file, skip=1)
    syntax_data = syntax_model.stream(csv_stream[1])

    wnlp_stopwords = watson_nlp.load('text_stopwords_classification_ensemble_en_stock').stopwords
    stopwords = list(wnlp_stopwords)
    stopwords.extend(['create','add','delete','select','find','click','go','app','allow','type','based','base','provided','answer','question','following','information','difference'])
    # words that do not separate topics in actual business context, adjust to meet your needs
    #stopwords.extend(['transaction','report','program','code','reference','business','document','data','date'])

    summary_model = NGramSummary.train(train_data=syntax_data, train_params={
        'min_words_per_utterance': 5,
        'num_turns_to_remove': 0,
        'beginning_ratio': 1,
        'beginning_weighting_factor': 5,
        'min_ngram_size': 2,
        'max_ngram_size': 3,
        'max_ngrams': 50,
        'stopwords': list(stopwords)
    })

    topic_model = HierarchicalClustering.train(train_data=syntax_data, summary_model=summary_model, train_params = {
        'king_cluster_min_ratio': 1.5, 
        'min_records_per_king_cluster': 10 + int(len(eval_df) / 50),
        'num_topics_per_iteration': 10, 
        'max_num_iters_per_model': 4, 
        'max_ngrams_per_topic': 10
    })

    topics_list = sorted(topic_model.model.to_json_summary()['clusters'], key=lambda t: t['numDocuments'], reverse=True)

    keywords_list = {}
    for _t in topics_list:
        keywords = {s.split(',')[0].split('(')[0].strip(): float(s.split(',')[-1]) for s in _t['modelWords']}
        if _t['topicName'] in keywords_list:
            keywords_list[_t['topicName']] = keywords_list[_t['topicName']] | keywords
        else:
            keywords_list[_t['topicName']] = keywords
        
    topicnames = {i: name for i, name in enumerate(keywords_list.keys())} | {-1: 'unknown'}
    topicnames_reverse = {topicnames[key]: key for key in topicnames.keys()}

    def extract(item):
        return topicnames_reverse[item.topics[0].name] if len(item.topics) > 0 and item.topics[0].name in topicnames_reverse else -1
    data['eval_topic'] = data[evaluation_field].map(lambda r: extract(topic_model.run(syntax_model.run(r.strip()))))
    
    print(f"{str(len(topicnames)-1)} topics detected.")


### Top2Vec
Top2Vec finds topic vectors that are jointly embedded with the document and word vectors with distance between them representing semantic similarity. It does not require stopword lists, stemming or lemmatization. Also, it automatically finds the number of topics. See also https://github.com/ddangelov/Top2Vec.

In [None]:
if not topic_modeling_method == topic_modeling_methods_supported[1]:
    print(f"Skipped ... Topic modeling method is '{topic_modeling_method}'.")
else:
    keywords_list, topicnames = dict(), dict()
    try:
        !pip install scikit-learn==1.6.1 | tail -n 1
        !pip install -U top2vec==1.0.36 | tail -n 1
        try:
            from top2vec import top2vec
        except ImportError as e:
            raise ImportError(f"{str(e)}\n Please Restart kernel and re-run this notebook to continue. Or use `custom_top2vec_template` runtime for this notebook when this topic modelling method is selected!")

        # perform topic detection
        documents = data[evaluation_field].tolist()
        top2vec_config = {
            'min_count': 1,
            'embedding_model': 'all-MiniLM-L6-v2',
            'umap_args': {
                'n_neighbors': 10, 
                'n_components': 2, 
                'min_dist': 0.1, 
                'random_state': 42
            },
            'hdbscan_args': {
                'min_cluster_size': 25,
                'min_samples': 5
            },
            'contextual_top2vec': True
        }
        topic_model = top2vec.Top2Vec(documents=documents, **top2vec_config)

        # get topic detection results
        topic_words, word_scores, topic_nums = topic_model.get_topics(num_topics=None)

        # compile topic names from 4 most relevant keywords
        topicnames = {_i: '_'.join([x for i,x in enumerate(_keywords) if i < 4]) for _i, _keywords in enumerate(topic_words)} | {-1: 'unknown'}
        topicnames_reverse = {topicnames[key]: key for key in topicnames.keys()}

        # compile keywords lists (including scores)
        keywords_list = {topicnames[_i]: {_word_score[0]: _word_score[1] for _word_score in zip(topic_words[_i], word_scores[_i])} for _i in topicnames.keys() if _i>= 0}

        # assign topic to log records
        # if the topic score is lower than 0.18, the topic is considered 'unknown'
        topic_relevance = topic_model.get_document_topic_relevance()
        doc_topic_id = topic_relevance.argmax(axis=1)
        data['eval_topic'] = [doc_topic_id[_i] if topic_relevance[_i,doc_topic_id[_i]] >= 0.18 else -1 for _i in range(len(doc_topic_id))]

        # final message
        print(f"{str(len(topicnames)-1)} topics detected.")

    except Exception as e:
        if 'at least one array to concatenate' in str(e):
            raise Exception('Not enough documents to find more than 1 topic.')
        else:
            raise e

### BERTopic
BERTopic creates clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions. Find information on BERTopic at https://maartengr.github.io/BERTopic/index.html. The cell below detects stopwords, generates keywords (without stopwords), and derives topic names.

In [None]:
if not topic_modeling_method == topic_modeling_methods_supported[2]:
    print(f"Skipped ... Topic modeling method is '{topic_modeling_method}'.")
else:
    keywords_list, topicnames = dict(), dict()
    try:
        !pip install --upgrade bertopic | tail -n 1

        from bertopic import BERTopic

        docs = data[evaluation_field].astype(str).values.tolist()
        if len(docs) == 0:
            raise Exception('Data is very low!')
        elif len(docs) < 1000:
            topic_model = BERTopic()
        else:
            topic_model = BERTopic(nr_topics='auto')

        # fit_transform both fits the model and returns the topic assignments
        topics, probs = topic_model.fit_transform(docs)

        topic_count = len(topic_model.get_topic_info())

        if topic_count < 1 or (topic_count == 1 and topic_model.get_topic(-1)):
            raise Exception("Topic modeling failed: No topic found.")
        if not topic_model.get_topic(-1):
            print(f"Warning: Detected topics might be poor.")
        else:
            print(f"{str(topic_count-1)} topics detected.")

        # Detected words that are not relevant for topic separation ("stopwords")
        stopwords = [w[0] for w in topic_model.get_topic(-1)] if topic_model.get_topic(-1) else []

        # Generate list of keywords (without stopwords) and frequencies
        keywords_list = {"_".join([y for i,y in enumerate([x for x in r[1] if x not in stopwords]) if i < 4]): {w[0]: w[1] for w in topic_model.get_topic(r[0]) if w[0] not in stopwords} \
                         for r in topic_model.get_topic_info().loc[:, ['Topic','Representation']].values if r[0] >= 0}

        # Find duplicate topics
        topic_tuples = [(num, k) for num, k in enumerate(keywords_list.keys())]
        topic_tuples.sort(key = lambda t: t[1])
        prev_keyword = '#'
        prev_keyword_num = -1
        duplicate_map = {}
        for _t in topic_tuples:
            if not _t[1] == prev_keyword:
                prev_keyword = _t[1]
                prev_keyword_num = _t[0]
            else:
                duplicate_map[_t[0]] = prev_keyword_num
                print(f"Duplicate topic pruned: {prev_keyword}") 

        # Remove duplicate topics
        keywords_list = {k: keywords_list[k] for num, k in enumerate(keywords_list.keys()) if not num in duplicate_map.keys()}
        topics = [t if not t in duplicate_map else duplicate_map[t] for t in topics]

        # Generate topic name from top 4 keywords
        topicnames = {num: k for num, k in enumerate(keywords_list.keys())} | {-1: 'unknown'}

        # Append topic assignment to data frame
        data['eval_topic'] = topics

    except TypeError as e:
        print('BERTopic Error: Data is low')
        
    except Exception as e:
        print('Error:', e)

## Topic Labels
Use GenAI to generate topic labels from keywords and questions.
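
The model replies with free text, so the label has to be parsed out of the reply. The sketch below shows the parsing idea used in the labeling loop: look for a `Label:` prefix first, otherwise take the first non-empty line, and fall back to the keyword-based topic name if nothing usable remains. `extract_label` is a hypothetical helper and the sample replies are made up for illustration.

```python
import re

def extract_label(generated_text, fallback):
    """Extract a topic label from model output, or return the fallback name."""
    match = re.search(r'Label\s*:\s*([^\n]*)', generated_text, flags=re.IGNORECASE)
    if match is None:
        # No "Label:" prefix: skip leading whitespace and take the first line
        match = re.search(r'\s*([^\n]*)', generated_text)
    # Remove surrounding brackets, quotes, and spaces
    label = match.group(1).strip('[](){}" \'')
    return label if label else fallback

print(extract_label('Label: "Invoice Processing"\nReason: ...', 'invoice_payment'))
print(extract_label('\n  Password Reset\n', 'password_login'))
print(extract_label('', 'billing_invoice'))
```

The three calls return `Invoice Processing`, `Password Reset`, and the fallback `billing_invoice`, respectively.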

In [None]:
# for each topic, collect all questions assigned to it
df1 = data[['eval_topic', 'question']]
df_grouped = df1.groupby('eval_topic', as_index=False).agg({'question': list})

In [None]:
# get prompt template manager instance
prompt_template_manager = PromptTemplateManager(credentials=wml_credentials, project_id=project_id)

# get prompt asset for topic labelling
df_p = prompt_template_manager.list()
prompt_id = df_p[df_p['NAME'].str.contains("topic_labeling", na=False, case=False)]['ID'].iloc[0]
prompt = prompt_template_manager.load_prompt(prompt_id)

In [None]:
topics = df_grouped.to_dict(orient='records')
model = ModelInference(
    model_id=prompt.model_id,
    params=prompt.model_params,
    credentials=wml_credentials,
    project_id=project_id
)
topiclabels = {}
for _topic in topics:
    if _topic['eval_topic'] == -1:
        label = 'Others'
    else:
        variables = {
            'questions': '\n'.join(_topic['question']),
            'keywords': '"' + '"\n"'.join(keywords_list[topicnames[_topic['eval_topic']]].keys()) + '"'
        }
        input_text = prompt_template_manager.load_prompt(prompt_id, PromptTemplateFormats.STRING, prompt_variables=variables)
        generated_text = model.generate(prompt=input_text)['results'][0]['generated_text']
        matches = re.search(r'Label\s*:\s*([^\n]*)', generated_text, flags=re.IGNORECASE)
        if matches is None or not matches.group(0):
            matches = re.search(r'\s*([^\n]*)', generated_text, flags=re.IGNORECASE)
        try:
            label = (matches.group(1) if matches.group(1) else matches.group(0)).strip('[](){}" \'')
            if label == '' or label.startswith('\n'):
                label = topicnames[_topic['eval_topic']]
        except:
            print(f"Warning: Label not found in '{generated_text}'")
            label = topicnames[_topic['eval_topic']]
    topiclabels[_topic['eval_topic']] = label.strip()
    print(f"Topic {_topic['eval_topic']:d}: {label}")

## Topic Keywords
For the top 10 topics, plot the keywords as word clouds to visualize each topic's subject. Use this visualization to grasp the essence of the detected topics.

In [None]:
def plot_wordcloud_top10_topics(labels, keywords_list):
    colors = [color for name, color in TABLEAU_COLORS.items()]
    cloud = WordCloud(background_color='white', width=400, height=400, max_words=8, color_func=lambda *args, **kwargs: colors[i], prefer_horizontal=1.0)
    count = min(10,len(keywords_list))
    if count < 1:
        return
    
    # create and fill 2 x 5 grid of subplots with 10 word clouds
    figure, axes = plt.subplots(nrows=2, ncols=5, figsize=(25,10), dpi=100, sharex=True, sharey=True)
    for i, topic in enumerate(zip(labels, keywords_list.keys())):
        if i >= count:
            break
        figure.add_subplot(axes[divmod(i,5)])
        topic_words = keywords_list[topic[1]]
        cloud.generate_from_frequencies(topic_words, max_font_size=72)
        plt.gca().imshow(cloud)
        plt.gca().set_title(f"{str(topic[0])}: {labels[topic[0]]}", fontdict=dict(size=16))
        
    # in case there are less than 10 topics, remove redundant subplots
    for i in range(count,10):
        axes[divmod(i,5)].remove()

    # adjust margins, remove ticks and show plot
    plt.subplots_adjust(wspace=0, hspace=0)
    plt.margins(x=0, y=0)
    plt.tight_layout()
    plt.xticks([])
    plt.yticks([])
    plt.show()

In [None]:
plot_wordcloud_top10_topics({_id: _label if len(_label) < 33 else _label[:30]+'...' for _id, _label in topiclabels.items() if _id>=0}, keywords_list)

## Score by Topic
Find the correlation between rating and topic.<br>
A poor rating on a topic suggests that content on that topic is of low quality or missing.

In [None]:
def displayName(title):
    return ' '.join([_t[0].upper() + (_t[1:] if len(_t) > 1 else '') for _t in title.replace('eval_','').replace('_', ' ').strip().split()]) 

In [None]:
def correlation_matrix(eval_column, row_order=None, index_mapper=None, totals=True):
    name = eval_column.replace('eval_','').replace('_', ' ')
    if len(data)>0:
        correlation_df = data.groupby([eval_column, 'eval_rating']).size().unstack(fill_value=0).reindex(index=row_order, columns=supported_ratings)
        if index_mapper:
             correlation_df.index = correlation_df.index.map(index_mapper)
        if totals:
            correlation_df.loc['Total'] = correlation_df.sum(numeric_only=True)

        title = f"Rating by {displayName(name)}"
        display(correlation_df)
        correlation_df.plot.bar(rot=90, stacked=True, color=rating_color, title=title, legend=True, xlabel='')

        title = f"\nRating by {displayName(name)} [Percentages]"
        correlation_df_percentage = correlation_df.div(correlation_df.sum(axis=1), axis=0).mul(100).round(1) 
        print(title)
        display(correlation_df_percentage)
        correlation_df_percentage.plot.bar(rot=90, stacked=True, color=rating_color, title=title, legend=False, xlabel='')

In [None]:
def score(eval_column, value_column='eval_rating_percent', rename=None, reindex=None):
    if len(data)>0:
        score_df = data[[eval_column, value_column]].groupby([eval_column]).mean()
        if rename:
            score_df = score_df.rename(index=rename)
        if reindex:
            score_df = score_df.reindex(reindex)
        score_df.plot.barh(title=f"{displayName(value_column)} by {displayName(eval_column)}", legend=False, xlabel='', ylabel='') 

In [None]:
try:
    correlation_matrix('eval_topic', None, topiclabels, False)
except Exception as e:
    raise Exception('Error: Topics not generated, may be due to insufficient logs. Check topic modelling', e)

In [None]:
try:
    score('eval_topic', rename=topiclabels)
except Exception as e:
    raise Exception('Error: Topics not generated, possibly due to insufficient logs. Check topic modeling.') from e

## Score by Response Length
Find the correlation between rating and response length.<br>
Based on the result, model parameters can be adjusted to generate more or fewer tokens.
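
The idea can be tried on a small, made-up data set before running it on real logs. The following is a minimal sketch, not the notebook's own implementation: the `toy` frame and its values are hypothetical, and `pd.cut` is used here as an alternative to the manual bin labels computed in the cells of this section.

```python
import pandas as pd

# Hypothetical toy log records; column names mirror those used in this notebook
toy = pd.DataFrame({
    'response': ['yes', 'a b c', 'a b c d e f', 'a b c d e f g h i j'],
    'eval_rating_percent': [0, 50, 100, 100],
})

# Count words in each response
toy['eval_words'] = toy['response'].str.split().str.len()

# Split the observed word-count range into two equal-width bins
toy['eval_words_bins'] = pd.cut(toy['eval_words'], bins=2)

# Average rating per response-length bin
mean_by_bin = toy.groupby('eval_words_bins', observed=False)['eval_rating_percent'].mean()
print(mean_by_bin)
```

In this toy example the longer responses receive the higher average rating; with real logs the relationship may point in either direction.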

In [None]:
# Number of bins for histogram
number_of_bins = 10

In [None]:
# Count words in response
data['eval_words'] = data['response'].map(lambda txt: len(txt.split()))

In [None]:
# Calculate bin ranges and labels
hist, bins_raw = np.histogram(data['eval_words'], number_of_bins)
bins = [int(b+0.5) for b in bins_raw]
bins_labels = ['<'+str(bins[1])] + [str(bins[i])+'-'+str(bins[i+1]-1) for i in range(1,len(bins)-2)] + ['>='+str(bins[len(bins)-2])]

In [None]:
# Assign each word count to its bin label by counting the inner bin edges that are <= x
data['eval_words_bins'] = data['eval_words'].map(lambda x: bins_labels[len([v for v in bins[1:-1] if v <= x])])

In [None]:
_ = plt.hist(data['eval_words'], bins=bins)
plt.title("Response Length [Words]")
plt.show()

In [None]:
correlation_matrix('eval_words_bins', bins_labels, None, False)

In [None]:
score('eval_words_bins', reindex=bins_labels)

## Document Search Score by Topic
Lists the document search score by topic. The document search score is the maximum search score over all source documents in a log record. The search score by topic is the average document search score over all queries or responses on that topic.<br>
A poor score indicates insufficient coverage of the corresponding topic within the content database.
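
The two aggregation steps (maximum per log record, then mean per topic) can be illustrated on made-up data. This is a minimal sketch under assumptions: the `toy` frame, its topic numbers and its scores are hypothetical, and each source document is assumed to be a dictionary with a `score` key.

```python
import pandas as pd

# Hypothetical log records: each holds a topic number and a list of source documents
toy = pd.DataFrame({
    'eval_topic': [0, 0, 1],
    'source_documents': [
        [{'score': 0.75}, {'score': 0.5}],
        [{'score': 0.25}],
        [{'score': 0.125}, {'score': 0.25}],
    ],
})

# Step 1: document search score = best score among a record's source documents
toy['eval_search_score'] = toy['source_documents'].map(
    lambda docs: max(d['score'] for d in docs)
)

# Step 2: search score by topic = mean of the per-record scores
search_score_by_topic = toy.groupby('eval_topic')['eval_search_score'].mean()
print(search_score_by_topic)
```

Here topic 0 averages the per-record maxima 0.75 and 0.25, while topic 1 has a single record whose best document scores 0.25.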

In [None]:
if 'source_documents' in data:
    # Treat records without source documents as missing values
    data['source_documents'] = data['source_documents'].apply(lambda x: x if len(x)>=1 else np.nan)
    # Document search score: maximum score over all source documents of a record
    data['eval_search_score'] = data['source_documents'].map(lambda _scores: max([_s['score'] for _s in _scores]), na_action='ignore')
else:
    print("Document scores not available!")
    data['eval_search_score'] = 0

In [None]:
try:
    score('eval_topic', value_column='eval_search_score', rename=topiclabels)
except Exception as e:
    raise Exception('Error: Topics not generated, possibly due to insufficient logs. Check topic modeling.') from e

## Feedback on Answer by Topic
Comments that users have provided for the worst-rated answers. These might give insights into document quality and completeness as well as into the users' expectations.
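
The selection of comments can be sketched on made-up data as follows. This is an illustration under assumptions rather than the notebook's own code: the `toy` frame is hypothetical, and the `feedback` column is assumed to hold dictionaries with a `comment` key.

```python
import pandas as pd

# Hypothetical log records with topic, question, feedback and rating percentage
toy = pd.DataFrame({
    'eval_topic': [0, 0, 0, 1],
    'question': ['q1', 'q2', 'q3', 'q4'],
    'feedback': [{'comment': 'wrong answer'}, {'comment': ''},
                 {'comment': 'great'}, {'comment': 'too long'}],
    'eval_rating_percent': [0, 25, 100, 25],
})
rating_value_threshold = 50
number_of_comments_per_topic = 5

# Extract the comment text from each feedback dictionary
toy['comment'] = toy['feedback'].map(lambda f: f.get('comment'))

# Keep low-rated records that carry a non-empty comment
low_rated = toy[(toy['eval_rating_percent'] < rating_value_threshold)
                & (toy['eval_rating_percent'] >= 0)
                & toy['comment'].notna()
                & (toy['comment'] != '')]

# Worst ratings first, limited to the configured number of comments per topic
worst = (low_rated.sort_values('eval_rating_percent')
         .groupby('eval_topic')
         .head(number_of_comments_per_topic))
print(worst[['eval_topic', 'comment', 'question']])
```

The record with an empty comment and the record rated above the threshold are both dropped, so one comment per topic remains in this toy example.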

In [None]:
# number of comments per topic to be displayed
number_of_comments_per_topic = 5

# only comments with rating value below this threshold are displayed
rating_value_threshold = 50

In [None]:
html = '<table><tr><th style=\"vertical-align: top;text-align: left;\">Topic</th><th style=\"vertical-align: top;text-align: left;\">Comment</th><th style=\"vertical-align: top;text-align: left;\">Question</th></tr>'
for _topic_no in topiclabels.keys():
    feedback_raw = data[data['eval_topic'] == _topic_no][['question', 'feedback', 'eval_rating_percent']]
    feedback_dat = pd.DataFrame(data={'question': feedback_raw['question'], 'comment': feedback_raw['feedback'].str['comment'], 'value': feedback_raw['eval_rating_percent']})
    feedback_df  = feedback_dat[((feedback_dat['value'] < rating_value_threshold) & (feedback_dat['value'] >= 0)) & (feedback_dat['comment'].values != None) & (feedback_dat['comment'].values != '')].sort_values(by=['value'])
    comment_count = min(feedback_df.shape[0], number_of_comments_per_topic)
    html = html + f"<tr style=\"border-top: 1px solid;\"><td style=\"vertical-align: top;text-align: left;\" rowspan=\"{comment_count:1d}\">{topiclabels[_topic_no]}</td>" + \
                "<tr>".join(['<td style=\"vertical-align: top;text-align: left;\">'+_row[0]+'</td><td style=\"vertical-align: top;text-align: left;\">'+_row[1]+'</td></tr>' for _row in feedback_df[:comment_count][['comment', 'question']].values])
html = html + '</table>'

In [None]:
from IPython.display import display, HTML
display(HTML(html))

**Note:** For optimal performance, it is recommended to close the DataStax session once all steps of this notebook have been run. Executing the following cell closes the existing DataStax connections. To rerun any of the code cells above, create a new DataStax connection by rerunning the cells starting from `Connecting to Vector Database`.

In [None]:
if log_connection_type=="datastax" and environment != "cloud":
    if not datastax_session.is_shutdown:
        datastax_session.shutdown()
        print(f"datastax_session got shutdown : {datastax_session.is_shutdown}")
    if not datastax_cluster.is_shutdown:
        datastax_cluster.shutdown()
        print(f"datastax_cluster got shutdown : {datastax_cluster.is_shutdown}")

**Sample Materials, provided under license. <br>
Licensed Materials - Property of IBM. <br>
© Copyright IBM Corp. 2024, 2025. All Rights Reserved. <br>
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. <br>**