# 🤖 GenAI Model Validation Workshop <a class='anchor' id='top'></a>

[Program](https://docs.google.com/document/d/1uqOlTim6czjeK16xXz4tXvYznTxkiy19moeoGNSynSY/edit?tab=t.0#heading=h.8c6jf2k12z8) | [GitHub](https://github.com/h2oai/h2o-genai-model-validation-training) | [Enterprise h2oGPTe](https://h2ogpte.h2oworld.h2o.ai/) | [EvalStudio](https://eval-studio.h2oworld.h2o.ai/)


## 📝 Outline <a class='anchor' id='outline'></a>
0. [Environment Preparation](#preparation)
1. [Embedding and Explainability](#embedding_explainability)
2. [Test Generation and Benchmarking](#test_gen)
3. [Eval Metrics and RAG](#eval_metrics)
4. [Human Evals](#human_evals)

## 🛠️ Environment Preparation <a class='anchor' id='preparation'></a> [↑](#top)

### Check your browser's compatibility

Run the following cells. They check that your browser is compatible and refresh the page once.

*Technical note: the Python kernel is not affected by the page refresh.*

In [7]:
import ipywidgets as widgets

In [None]:
from IPython.display import display, Javascript
# Refresh the page only if a specific flag is not set
def refresh_page_once():
    display(Javascript("""
    if (!localStorage.getItem('pageRefreshed')) {
        localStorage.setItem('pageRefreshed', 'true');
        window.location.reload();
    } else {
        localStorage.removeItem('pageRefreshed');
    }
    """))
refresh_page_once()

In [9]:
widgets.Button(description="Your browser is OK", disabled=True, button_style='success')

Button(button_style='success', description='Your browser is OK', disabled=True, style=ButtonStyle())

### Get h2oGPTe API Key

1. Go to [h2oGPTe Settings](https://h2ogpte.h2oworld.h2o.ai/settings).
2. Generate a new API key and copy it.
3. Paste the key into the text box below.
4. Click the 'Generate config' button.

In [10]:
def generate_env(h2ogpte_url):       
    h2ogpte_api_key = widgets.Text(placeholder='Enter your API key', description="🔑 h2oGPTe API Key", style={'description_width': 'initial'})
    # Create a button to confirm the input
    confirm_button = widgets.Button(description='Generate config', button_style='success', tooltip='Generate config',icon='check')

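    def on_click(b):
        # Write the settings that load_dotenv() reads later in the notebook
        with open(".env", "w") as f:
            # Strip whitespace that is easily picked up when pasting the key
            f.write(f"H2OGPTE_API_KEY={h2ogpte_api_key.value.strip()}\n")
            f.write(f"H2OGPTE_URL='{h2ogpte_url}'\n")
            f.write("TOKENIZERS_PARALLELISM=false\n")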
    confirm_button.on_click(on_click)

    display(h2ogpte_api_key, confirm_button)

generate_env("https://h2ogpte.h2oworld.h2o.ai")

Text(value='', description='🔑 h2oGPTe API Key', placeholder='Enter your API key', style=TextStyle(description_…

Button(button_style='success', description='Generate config', icon='check', style=ButtonStyle(), tooltip='Gene…

In [26]:
!cat .env

H2OGPTE_API_KEY=sk-d1SJGpNBH9HBqr8JElpUKPMGYd6bIyIP7CQPgkNZ96sNyRcl
H2OGPTE_URL='https://h2ogpte.h2oworld.h2o.ai'
TOKENIZERS_PARALLELISM=false
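
Printing `.env` with `cat` shows the API key in clear text, which is easy to leak when a notebook is shared. As a hedged alternative, the sketch below masks secret-looking values before printing. The helper `mask_env_text` and its parameters are hypothetical names introduced for this illustration; they are not part of h2oGPTe or `python-dotenv`.

```python
def mask_env_text(env_text, secret_suffix="_API_KEY", visible=4):
    """Return env_text with the values of secret-looking variables masked.

    Hypothetical helper: a variable counts as secret when its name ends
    with `secret_suffix`; only the first `visible` characters are kept.
    """
    masked_lines = []
    for line in env_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip().endswith(secret_suffix):
            # Keep a short prefix so the key can still be recognized
            value = value[:visible] + "*" * max(len(value) - visible, 0)
        masked_lines.append(f"{name}{sep}{value}")
    return "\n".join(masked_lines)


# Made-up sample content, not a real key
sample = "H2OGPTE_API_KEY=sk-example-not-a-real-key\nTOKENIZERS_PARALLELISM=false"
print(mask_env_text(sample))
```

In the notebook, the same function could be applied to the real file with `mask_env_text(Path(".env").read_text())`.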

### 🐍 Prepare Python Environment [↑](#top)

In [12]:
# Suppress warnings
import warnings
warnings.filterwarnings("ignore")

# Load Environment Variables
from dotenv import load_dotenv

_ = load_dotenv()

In [None]:
# Python packages
from pathlib import Path

# Experiment
from h2o_mrm.experiment import Experiment

# Topic Modeling
from h2o_mrm.widgets import topic_model_widget

# Question Generation
from h2o_mrm.widgets.chunk_nav import create_qa_gen_widget
from h2o_mrm.widgets.chunk_nav.core import create_question_generator, create_summarizer

# Generated Question Evaluation
from h2o_mrm.widgets.aw_data_table import create_genqa_eval_widget

# RAG Models
from h2o_mrm.rag_models import H2OGPTERAG, H2ogpteConfig

In [16]:
CACHE_LOC="/tmp/home/jovyan/cache"
DOCS_LOC="/tmp/home/jovyan/docs"

# 1. Embedding and Explainability <a class='anchor' id='embedding_explainability'></a> [↑](#top)

The goal of this experiment is to analyze the document ["Comptroller’s Handbook: Model Risk Management"](https://www.occ.treas.gov/publications-and-resources/publications/comptrollers-handbook/files/model-risk-management/index-model-risk-management.html) in the context of RAG systems.

## Experiment

An experiment defines the scope of work, including the documents and the RAG system under test.

It performs:
 - chunking of the documents using the h2oGPTe chunking strategy;
 - embedding of the chunks into vectors using the given embedding model.
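
Once chunks are embedded, chunks with related content end up with vectors that point in similar directions, which is what later clustering and retrieval rely on. The sketch below shows cosine similarity, a common way to compare embeddings. The three-dimensional vectors are made up for illustration; real embedding vectors are much longer, and the source does not state which similarity measure the workshop library uses.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy "embeddings" (assumed values, not produced by any model)
chunk_a = [1.0, 0.0, 1.0]
chunk_b = [1.0, 0.0, 0.9]  # points almost the same way as chunk_a
chunk_c = [0.0, 1.0, 0.0]  # orthogonal to chunk_a

print(cosine_similarity(chunk_a, chunk_b))
print(cosine_similarity(chunk_a, chunk_c))
```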
  

> ℹ️ Note: computed results are pre-cached to speed up the workshop.
  

In [31]:
exp = Experiment(
    "OCC Handbook",  # Do not change the name: it is used for cache look-ups that speed up computation.
    max_tokens_per_chunk=320,
    embedding_model_name="BAAI/bge-m3",
    cache_dir=CACHE_LOC,
)
exp.add_documents([f"{DOCS_LOC}/pub-ch-model-risk.pdf"])

In [33]:
exp


Name:            OCC Handbook
Docs:            ['/tmp/home/jovyan/docs/pub-ch-model-risk.pdf']
Embedding model: BAAI/bge-m3
Chunks:          0 (max tokens/chunk: 320)
Topics:


Local cache embeddings: /tmp/home/jovyan/cache/chromadb
Local cache collection: /tmp/home/jovyan/cache/database.db


### Create Chunks

Divide the document into chunks of at most the specified number of tokens.
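
To make the token limit concrete, here is a minimal sketch of fixed-size chunking. It assumes whitespace-separated words as a stand-in for real tokenizer tokens and ignores sentence or section boundaries, so it is not the h2oGPTe chunking strategy; `chunk_tokens` is a hypothetical helper name.

```python
def chunk_tokens(text, max_tokens_per_chunk=320):
    """Split text into consecutive chunks of at most max_tokens_per_chunk tokens.

    Assumption: a "token" here is a whitespace-separated word, which only
    approximates what a model tokenizer would produce.
    """
    tokens = text.split()
    return [
        " ".join(tokens[start:start + max_tokens_per_chunk])
        for start in range(0, len(tokens), max_tokens_per_chunk)
    ]


# A synthetic 700-word document yields two full chunks and one remainder
text = " ".join(f"w{i}" for i in range(700))
chunks = chunk_tokens(text)
print([len(chunk.split()) for chunk in chunks])  # [320, 320, 60]
```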

In [34]:
# Create and Save Chunks
exp_chunks = exp.chunk_documents()

OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.


TODO: filter out chunks that should not be used for topic modeling.


#### Topic Modeling

In [None]:
exp.build_all_topic_models(
    n_neighbors=[40],
    n_components=[15],
    min_cluster_size=[5, 7, 9],
)
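
The call above passes a list of candidate values for each hyperparameter. In BERTopic-style pipelines, `n_neighbors` and `n_components` are UMAP settings and `min_cluster_size` is an HDBSCAN setting. Assuming `build_all_topic_models` fits one model per combination (the source does not confirm this), the lists expand into a grid as sketched below; `expand_grid` is a hypothetical helper.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the candidate values in grid."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


# Same candidate values as in the cell above
grid = {
    "n_neighbors": [40],
    "n_components": [15],
    "min_cluster_size": [5, 7, 9],
}

configs = expand_grid(grid)
for config in configs:
    print(config)
```

With one value each for `n_neighbors` and `n_components` and three for `min_cluster_size`, the grid contains three configurations.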

In [None]:
# List topic models for this experiment
exp.list_topic_models()

In [None]:
exp.set_best_topic_model()

In [None]:
from h2o_mrm.viz import create_topics_distribution_pie

create_topics_distribution_pie(exp.chunks, exp.topic_names)

In [None]:
from h2o_mrm.viz import create_chunk_distribution_map

create_chunk_distribution_map(exp.chunks, exp.topic_names)

In [None]:
tmw = topic_model_widget.create_widget(
    tm_config=exp.bertopic_model_config,
    create_topic_cluster_data=exp.build_topic_cluster_creator(
        show_doc_in_tooltip=True,
        show_topic_names=True,
    ),
    interactive=True,
)
tmw

In [None]:
# Change the hyperparameters of the topic model manually to create a new custom config

# my_topic_model_id = exp.add_topic_model(tmw.topic_model_config, name="my_topic_model_1")
# exp.set_topic_model(my_topic_model_id)

In [None]:
exp.get_num_chunks_in_topic_chart()

# 2. Test Generation and Benchmarking <a class='anchor' id='test_gen'></a> [↑](#top)

- Automatic prompt engineering
- Automatic QA generation


In [None]:
llama_summerizer = create_summarizer(
    model_type="h2ogpte",
    model_name="meta-llama/Meta-Llama-3.1-70B-Instruct",
)
llama_question_generator = create_question_generator(
    model_type="h2ogpte",
    model_name="meta-llama/Meta-Llama-3.1-70B-Instruct",
)

#### Interactive Question Generation

In [None]:
question_gen_widget = create_qa_gen_widget(
    exp.chunks,
    fig_data=exp.fig_data,
    summarize_text=llama_summerizer,
    generate_questions=llama_question_generator,
)
question_gen_widget

#### Automatic Question Generation

In [None]:
# exp.generate_questions(
#     topics=[
#         2,
#     ],
#     summarizer=llama_summerizer,
#     question_generator=llama_question_generator,
#     question_generator_name="Meta-Llama-3.1-70B-Instruct",
#     sampling_method="twinning",
# )

In [None]:
generated_questions = exp.list_generated_questions()
print(len(generated_questions))
for x in generated_questions[:5]:
    print(x)

#### Evaluate Generated Questions

In [None]:
exp.validate_generated_questions()

#### Load Validated Questions in a Widget

In [None]:
validated_questions = exp.get_validated_questions()
genq_eval_widget = create_genqa_eval_widget(validated_questions)
genq_eval_widget

# 3. Eval Metrics and RAG <a class='anchor' id='eval_metrics'></a> [↑](#top)

#### Metrics

- [X] Groundedness
- [X] Context Recall
- [X] Context Precision
- [X] Recall Relevancy
- [X] Precision Relevancy
- [X] Answer Relevancy
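
As an intuition for the two context metrics, the sketch below computes set-overlap versions of precision and recall over chunk IDs. This is a simplified assumption for illustration: the workshop library may compute these metrics differently (for example with an LLM judge), and the chunk IDs and the "relevant" set are made up.

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are relevant to the question."""
    if not retrieved:
        return 0.0
    hits = sum(1 for chunk_id in retrieved if chunk_id in relevant)
    return hits / len(retrieved)


def context_recall(retrieved, relevant):
    """Fraction of relevant chunks that the retriever actually returned."""
    if not relevant:
        return 0.0
    hits = sum(1 for chunk_id in relevant if chunk_id in retrieved)
    return hits / len(relevant)


# Hypothetical retrieval result for one question
retrieved = ["c1", "c2", "c3", "c4"]
relevant = {"c2", "c4", "c7"}

print(context_precision(retrieved, relevant))  # 2 of 4 retrieved chunks are relevant
print(context_recall(retrieved, relevant))     # 2 of 3 relevant chunks were retrieved
```

High precision with low recall suggests the retriever returns clean but incomplete context; the reverse suggests it returns the needed chunks buried among irrelevant ones.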



#### Get Answers from RAG

In [None]:
rag_name = "h2ogpte.dev.h2o.ai"
rag_version = "1.6.0-dev28"
llm_name = "meta-llama/Meta-Llama-3.1-70B-Instruct"
llm_args = dict(
    temperature=0.0,
    seed=42,
    max_new_tokens=4096,
)

In [None]:
rag_under_test_id = exp.register_rag_under_test(
    rag_name=rag_name,
    rag_version=rag_version,
    llm_name=llm_name,
    llm_args=llm_args,
    embedding_model_name="BAAI/bge-m3",
)
rag_under_test_id

In [None]:
%set_env H2OGPTE_API_KEY="sk-12ydJa9ujkjrI4wdIXBw4UcfYxmnQemJvsiUYr6uQLk7xu5H"

In [None]:
rag_collection_name = "OCC Handbook 3"
config = H2ogpteConfig.from_env()
rag = H2OGPTERAG(config, rag_collection_name, llm_name, llm_args)

In [None]:
rag.add_documents([Path("./pub-ch-model-risk.pdf")])

In [None]:
exp.get_answers_from_rag(
    rag_under_test_id=rag_under_test_id,
    answer_question=rag.answer_question,
)

In [None]:
exp.add_rag_chunks(rag_under_test_id, rag.get_all_chunks)

In [None]:
exp.evaluate_answers(rag_under_test_id)

In [None]:
exp.plot_metrics(rag_under_test_id)

# 4. Human Evaluation <a class='anchor' id='human_evals'></a> [↑](#top)