# LLM Evaluation with MLflow example

This notebook demonstrates how to evaluate LLMs and RAG systems with MLflow. It uses simple built-in metrics such as toxicity and exact match, LLM-judged metrics such as answer similarity, and a custom LLM-judged metric for professionalism.

For details about how to use `mlflow.evaluate()`, refer to Evaluate LLMs with MLflow ([AWS](https://docs.databricks.com/en/mlflow/llm-evaluate.html)|[Azure](https://learn.microsoft.com/azure/databricks/mlflow/llm-evaluate)).

## Requirements
 
To use the MLflow LLM evaluation feature, you need MLflow 2.8.0 or above.


If your cluster is running Databricks Runtime, uncomment and run the following cell to install the `mlflow` library. This step is required only for Databricks Runtime clusters. If your cluster is running Databricks Runtime ML, skip to the Set OpenAI Key step.

In [0]:
# If you are running Databricks Runtime, uncomment this line and run this cell:
#%pip install mlflow

Import the required libraries.

In [0]:
import os
import openai
import pandas as pd

import mlflow

# Check your MLflow version
mlflow.__version__

'2.9.2'
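The 2.8.0 requirement can also be checked programmatically. Here is a minimal sketch. It assumes plain `major.minor.patch` version strings, and it ignores suffixes such as `rc0` or `.dev0`. `meets_minimum_version` is a hypothetical helper and is not part of MLflow.

```python
import re


def meets_minimum_version(version: str, minimum: str = "2.8.0") -> bool:
    """Return True if `version` is at least `minimum`.

    Hypothetical helper: compares only the leading numeric components,
    so suffixes such as "rc0" or ".dev0" are ignored.
    """

    def numeric_parts(v: str) -> tuple:
        # Keep only the leading dotted run of integers, e.g. "2.9.2rc0" -> (2, 9, 2)
        match = re.match(r"\d+(?:\.\d+)*", v)
        if match is None:
            raise ValueError(f"Unrecognized version string: {v!r}")
        parts = [int(p) for p in match.group(0).split(".")]
        # Pad to three components so that "2.8" compares equal to "2.8.0"
        parts += [0] * (3 - len(parts))
        return tuple(parts)

    # Tuples of ints compare numerically, so "2.10.0" > "2.9.2"
    return numeric_parts(version) >= numeric_parts(minimum)
```

In the notebook, you could call it as `assert meets_minimum_version(mlflow.__version__), mlflow.__version__`. The components are compared as integers rather than strings, so `2.10.0` correctly sorts after `2.9.2`.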

## Set OpenAI Key

In [0]:
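# Read the API key from a Databricks secret scope (fill in your own scope and key names)
os.environ["OPENAI_API_KEY"] = dbutils.secrets.get(scope="", key="")
# Azure OpenAI connection settings (fill in your API version, endpoint URL, and deployment name)
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = ""
os.environ["OPENAI_API_BASE"] = ""
os.environ["OPENAI_DEPLOYMENT_NAME"] = ""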
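Placeholder empty strings are easy to leave in place by accident. Here is a minimal sketch that reports which of these settings are still unset or blank before you log any models. `missing_settings` is a hypothetical helper, and the variable names are the ones set in the cell above.

```python
import os
from typing import Iterable, List, Mapping, Optional

# The environment variables set in the cell above
REQUIRED_OPENAI_SETTINGS = (
    "OPENAI_API_KEY",
    "OPENAI_API_TYPE",
    "OPENAI_API_VERSION",
    "OPENAI_API_BASE",
    "OPENAI_DEPLOYMENT_NAME",
)


def missing_settings(
    names: Iterable[str] = REQUIRED_OPENAI_SETTINGS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are absent or blank in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    # A value of only whitespace counts as missing
    return [name for name in names if not env.get(name, "").strip()]
```

For example, `missing = missing_settings()` followed by `if missing: raise RuntimeError(f"Set these before continuing: {missing}")` fails fast instead of surfacing as an authentication error during `mlflow.evaluate()`.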

## Basic Question-Answering Evaluation

Create an evaluation dataset with two columns. The `inputs` column is passed to the model, and the `ground_truth` column is compared against the model's generated output.

In [0]:
eval_df = pd.DataFrame(
    {
        "inputs": [
            "How does useEffect() work?",
            "What does the static keyword in a function mean?",
            "What does the 'finally' block in Python do?",
            "What is the difference between multiprocessing and multithreading?",
        ],
        "ground_truth": [
            "The useEffect() hook tells React that your component needs to do something after render. React will remember the function you passed (we’ll refer to it as our “effect”), and call it later after performing the DOM updates.",
            "Static members belongs to the class, rather than a specific instance. This means that only one instance of a static member exists, even if you create multiple objects of the class, or if you don't create any. It will be shared by all objects.",
            "'Finally' defines a block of code to run when the try... except...else block is final. The finally block will be executed no matter if the try block raises an error or not.",
            "Multithreading refers to the ability of a processor to execute multiple threads concurrently, where each thread runs a process. Whereas multiprocessing refers to the ability of a system to run multiple processors in parallel, where each processor can run one or more threads.",
        ],
    }
)

Log a simple OpenAI model that asks `gpt-3.5-turbo` to answer each question in two sentences. Then call `mlflow.evaluate()` with the model and the evaluation DataFrame.

In [0]:
with mlflow.start_run() as run:
    system_prompt = "Answer the following question in two sentences"
    basic_qa_model = mlflow.openai.log_model(
        model="gpt-3.5-turbo",
        task=openai.ChatCompletion,
        artifact_path="model",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "{question}"},
        ],
    )
    results = mlflow.evaluate(
        basic_qa_model.model_uri,
        eval_df,
        targets="ground_truth",  # specify which column corresponds to the expected output
        model_type="question-answering",  # model type indicates which metrics are relevant for this task
        evaluators="default",
    )
results.metrics



2024/03/15 12:25:37 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2024/03/15 12:25:37 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
2024/03/15 12:25:38 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
2024/03/15 12:25:38 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: token_count
2024/03/15 12:25:38 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: toxicity
2024/03/15 12:25:39 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: flesch_kincaid_grade_level
2024/03/15 12:25:39 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: ari_grade_level
2024/03/15 12:25:39 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: exact_match


{'toxicity/v1/mean': 0.00020113740538363345,
 'toxicity/v1/variance': 1.06491546679447e-09,
 'toxicity/v1/p90': 0.00023621094587724655,
 'toxicity/v1/ratio': 0.0,
 'exact_match/v1': 0.0}
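To make the logged prompt template concrete, the sketch below builds the chat messages for one evaluation row, offline. It assumes that the single `{question}` placeholder is filled with the row's `inputs` value. MLflow performs this substitution internally; `build_messages` is a hypothetical helper for illustration only.

```python
from typing import Dict, List

SYSTEM_PROMPT = "Answer the following question in two sentences"

# Same structure as the `messages` argument passed to mlflow.openai.log_model above
MESSAGE_TEMPLATE = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "{question}"},
]


def build_messages(
    question: str, template: List[Dict[str, str]] = MESSAGE_TEMPLATE
) -> List[Dict[str, str]]:
    """Return a new message list with the {question} placeholder filled in."""
    # Replace only the literal placeholder, so braces inside the question
    # (e.g. code snippets) are left untouched, unlike with str.format
    return [
        {"role": m["role"], "content": m["content"].replace("{question}", question)}
        for m in template
    ]


messages = build_messages("How does useEffect() work?")
```

The helper builds new dictionaries rather than editing the template, so the same template can be reused for every row.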

Inspect the evaluation results table as a DataFrame to see per-row metrics and further assess model performance.

In [0]:
display(results.tables["eval_results_table"])

| inputs | ground_truth | outputs | token_count | toxicity/v1/score |
|---|---|---|---|---|
| How does useEffect() work? | The useEffect() hook tells React that your component needs to do something after render. React will remember the function you passed (we’ll refer to it as our “effect”), and call it later after performing the DOM updates. | useEffect() is a hook in React that allows you to run side-effects such as fetching data or updating the DOM after every render. It takes two arguments: a function that contains the side effect code and an optional array of dependencies that tells React when to re-run the effect. | 56 | 0.0002191865 |
| What does the static keyword in a function mean? | Static members belongs to the class, rather than a specific instance. This means that only one instance of a static member exists, even if you create multiple objects of the class, or if you don't create any. It will be shared by all objects. | The static keyword in a function means that the function retains the value of its local variables across multiple function calls. This also means that the function is only visible inside the file it is declared in. | 39 | 0.0001584084 |
| What does the 'finally' block in Python do? | 'Finally' defines a block of code to run when the try... except...else block is final. The finally block will be executed no matter if the try block raises an error or not. | The 'finally' block in Python is used to specify a block of code to be executed after the try and except blocks, regardless of whether an exception was raised or not. It is commonly used to clean up resources such as closing files or releasing network connections. | 52 | 0.0002435071 |
| What is the difference between multiprocessing and multithreading? | Multithreading refers to the ability of a processor to execute multiple threads concurrently, where each thread runs a process. Whereas multiprocessing refers to the ability of a system to run multiple processors in parallel, where each processor can run one or more threads. | Multiprocessing involves the execution of multiple processes simultaneously, utilizing multiple processors. On the other hand, multithreading involves the execution of multiple threads within a single process, utilizing a single processor. | 38 | 0.0001834476 |
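`exact_match/v1` is 0.0 because no generated answer is identical to its ground truth. Here is a minimal sketch of that aggregate. It assumes the metric is the fraction of rows where the output string equals the target string exactly, with no normalization. `exact_match_ratio` is a hypothetical helper, not MLflow's implementation.

```python
from typing import Sequence


def exact_match_ratio(outputs: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of rows whose output is exactly equal to its target string."""
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must have the same length")
    if not outputs:
        raise ValueError("need at least one row")
    # Plain string equality: whitespace, casing, and paraphrases all count as mismatches
    matches = sum(1 for out, tgt in zip(outputs, targets) if out == tgt)
    return matches / len(outputs)
```

For example, `exact_match_ratio(["Paris", "4"], ["Paris", "four"])` returns 0.5. Free-form answers almost never match character for character, which is why the next section uses an LLM judge to score semantic similarity instead.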


## LLM-judged correctness with OpenAI GPT-4

Construct an answer similarity metric with the `answer_similarity()` metric function, using GPT-4 as the judge.

In [0]:
from mlflow.metrics.genai import EvaluationExample, answer_similarity

# Create an example that illustrates what answer_similarity means for this problem.
example = EvaluationExample(
    input="What is MLflow?",
    output="MLflow is an open-source platform for managing machine "
    "learning workflows, including experiment tracking, model packaging, "
    "versioning, and deployment, simplifying the ML lifecycle.",
    score=4,
    justification="The definition effectively explains what MLflow is, "
    "its purpose, and its developer. It could be more concise for a 5-score.",
    grading_context={
        "targets": "MLflow is an open-source platform for managing "
        "the end-to-end machine learning (ML) lifecycle. It was developed by Databricks, "
        "a company that specializes in big data and machine learning solutions. MLflow is "
        "designed to address the challenges that data scientists and machine learning "
        "engineers face when developing, training, and deploying machine learning models."
    },
)

# Construct the metric using OpenAI GPT-4 as the judge
answer_similarity_metric = answer_similarity(model="openai:/gpt-4", examples=[example])

print(answer_similarity_metric)

EvaluationMetric(name=answer_similarity, greater_is_better=True, long_name=answer_similarity, version=v1, metric_details=
Task:
You must return the following fields in your response in two lines, one below the other:
score: Your numerical score for the model's answer_similarity based on the rubric
justification: Your reasoning about the model's answer_similarity score

You are an impartial judge. You will be given an input that was sent to a machine
learning model, and you will be given an output that the model produced. You
may also be given additional information that was used by the model to generate the output.

Your task is to determine a numerical score called answer_similarity based on the input and output.
A definition of answer_similarity and a grading rubric are provided below.
You must use the grading rubric to determine your score. You must also justify your score.

Examples could be included below for reference. Make sure to use them as references and to
understand them be
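The judge prompt printed above asks for a two-line reply: a `score:` line followed by a `justification:` line. Here is a minimal sketch of parsing a reply in that format, assuming the judge follows it. `parse_judge_reply` is hypothetical and is not MLflow's internal parser.

```python
from typing import Optional, Tuple


def parse_judge_reply(reply: str) -> Tuple[Optional[int], Optional[str]]:
    """Extract (score, justification) from a 'score: ...' / 'justification: ...' reply.

    Either field is None if its line is missing or the score is not an integer.
    """
    score: Optional[int] = None
    justification: Optional[str] = None
    for line in reply.strip().splitlines():
        # Split on the first colon only, so colons inside the justification survive
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip().lower()
        value = value.strip()
        if key == "score" and score is None:
            try:
                score = int(value)
            except ValueError:
                score = None
        elif key == "justification" and justification is None:
            justification = value
    return score, justification
```

This sketch keeps only the first line of the justification. That matches the two-line format the prompt requests, but a judge that wraps its reasoning across several lines would be truncated.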

Call `mlflow.evaluate()` again, this time passing the new `answer_similarity_metric` in `extra_metrics`.

In [0]:
with mlflow.start_run() as run:
    results = mlflow.evaluate(
        basic_qa_model.model_uri,
        eval_df,
        targets="ground_truth",
        model_type="question-answering",
        evaluators="default",
        extra_metrics=[answer_similarity_metric],  # use the answer similarity metric created above
    )
results.metrics

2024/03/15 12:25:45 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2024/03/15 12:25:45 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
2024/03/15 12:25:46 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...


2024/03/15 12:25:47 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: token_count
2024/03/15 12:25:47 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: toxicity
2024/03/15 12:25:48 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: flesch_kincaid_grade_level
2024/03/15 12:25:48 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: ari_grade_level
2024/03/15 12:25:48 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: exact_match
2024/03/15 12:25:48 INFO mlflow.models.evaluation.default_evaluator: Evaluating metrics: answer_similarity


{'toxicity/v1/mean': 0.0001848994470492471,
 'toxicity/v1/variance': 1.1711853175686401e-09,
 'toxicity/v1/p90': 0.0002216897177277133,
 'toxicity/v1/ratio': 0.0,
 'exact_match/v1': 0.0,
 'answer_similarity/v1/mean': 4.25,
 'answer_similarity/v1/variance': 0.1875,
 'answer_similarity/v1/p90': 4.7}

View the per-row LLM-judged answer similarity scores and their justifications.

In [0]:
display(results.tables["eval_results_table"])

| inputs | ground_truth | outputs | token_count | toxicity/v1/score | answer_similarity/v1/score | answer_similarity/v1/justification |
|---|---|---|---|---|---|---|
| How does useEffect() work? | The useEffect() hook tells React that your component needs to do something after render. React will remember the function you passed (we’ll refer to it as our “effect”), and call it later after performing the DOM updates. | The useEffect() function in React is a built-in Hook that allows you to run side effects in functional components after rendering. It takes two arguments: a function that defines the desired side effect and an array of dependencies that triggers the effect. | 47 | 0.0001849041 | 4 | The output provides a clear and concise definition of the useEffect() function in React, including its purpose, arguments, and how it works. It closely aligns with the provided targets in most aspects and has substantial semantic similarity. |
| What does the static keyword in a function mean? | Static members belongs to the class, rather than a specific instance. This means that only one instance of a static member exists, even if you create multiple objects of the class, or if you don't create any. It will be shared by all objects. | The static keyword in a function means that the function retains its value between invocations and its lifetime is throughout the entire program execution. A static function is only visible to other functions in the same file. | 40 | 0.0001750719 | 4 | The output effectively explains what the static keyword in a function means, including its retention of value and lifetime, and its visibility to other functions in the same file. It also includes additional information about static members belonging to the class and being shared by all objects. However, it could be more concise for a 5-score. |
| What does the 'finally' block in Python do? | 'Finally' defines a block of code to run when the try... except...else block is final. The finally block will be executed no matter if the try block raises an error or not. | The 'finally' block in Python is used to specify a block of code that will be executed after a 'try' block and/or an 'except' block, regardless of whether an exception has been raised or not. It is often used for tasks that need to be performed whether or not an error occurs, such as releasing resources or closing files. | 70 | 0.000237455 | 5 | The output closely aligns with the provided targets in all significant aspects. The output provides a clear and concise definition of the 'finally' block in Python, including its purpose and when it is executed, which closely matches the provided targets. The additional information used by the model also supports the accuracy of the output. |
| What is the difference between multiprocessing and multithreading? | Multithreading refers to the ability of a processor to execute multiple threads concurrently, where each thread runs a process. Whereas multiprocessing refers to the ability of a system to run multiple processors in parallel, where each processor can run one or more threads. | Multiprocessing refers to the use of multiple processors or cores to execute multiple processes simultaneously, while multithreading refers to the use of multiple threads within a single process to perform concurrent tasks. | 37 | 0.0001421669 | 4 | The output effectively explains the difference between multiprocessing and multithreading, including their definitions and how they differ in terms of processors and threads. However, it could be more concise for a 5-score. |


## Custom LLM-judged metric for professionalism

Create a custom metric that scores the professionalism of the model outputs. Use `make_genai_metric` with a metric definition, a grading prompt, a grading example, and a judge model configuration.

In [0]:
from mlflow.metrics.genai import EvaluationExample, make_genai_metric

professionalism_metric = make_genai_metric(
    name="professionalism",
    definition=(
        "Professionalism refers to the use of a formal, respectful, and appropriate style of communication that is tailored to the context and audience. It often involves avoiding overly casual language, slang, or colloquialisms, and instead using clear, concise, and respectful language"
    ),
    grading_prompt=(
        "Professionalism: If the answer is written using a professional tone, below "
        # End each rubric line with "\n" so the adjacent string literals don't run together
        "are the details for different scores:\n"
        "- Score 1: Language is extremely casual, informal, and may include slang or colloquialisms. Not suitable for professional contexts.\n"
        "- Score 2: Language is casual but generally respectful and avoids strong informality or slang. Acceptable in some informal professional settings.\n"
        "- Score 3: Language is balanced and avoids extreme informality or formality. Suitable for most professional contexts.\n"
        "- Score 4: Language is noticeably formal, respectful, and avoids casual elements. Appropriate for business or academic settings.\n"
        "- Score 5: Language is excessively formal, respectful, and avoids casual elements. Appropriate for the most formal settings such as textbooks."
    ),
    examples=[
        EvaluationExample(
            input="What is MLflow?",
            output=(
                "MLflow is like your friendly neighborhood toolkit for managing your machine learning projects. It helps you track experiments, package your code and models, and collaborate with your team, making the whole ML workflow smoother. It's like your Swiss Army knife for machine learning!"
            ),
            score=2,
            justification=(
                "The response is written in a casual tone. It uses contractions, filler words such as 'like', and exclamation points, which make it sound less professional. "
            ),
        )
    ],
    version="v1",
    model="openai:/gpt-4",
    parameters={"temperature": 0.0},
    grading_context_columns=[],
    aggregations=["mean", "variance", "p90"],
    greater_is_better=True,
)

print(professionalism_metric)

EvaluationMetric(name=professionalism, greater_is_better=True, long_name=professionalism, version=v1, metric_details=
Task:
You must return the following fields in your response in two lines, one below the other:
score: Your numerical score for the model's professionalism based on the rubric
justification: Your reasoning about the model's professionalism score

You are an impartial judge. You will be given an input that was sent to a machine
learning model, and you will be given an output that the model produced. You
may also be given additional information that was used by the model to generate the output.

Your task is to determine a numerical score called professionalism based on the input and output.
A definition of professionalism and a grading rubric are provided below.
You must use the grading rubric to determine your score. You must also justify your score.

Examples could be included below for reference. Make sure to use them as references and to
understand them before complet
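Python concatenates adjacent string literals with no separator, so rubric lines written that way can run together in the final prompt unless each one ends with a newline. One alternative is to keep the score descriptions in a dict and join them explicitly, sketched here with a hypothetical `build_grading_prompt` helper. The resulting string could be passed as the `grading_prompt` argument of `make_genai_metric`.

```python
from typing import Mapping


def build_grading_prompt(metric_name: str, intro: str, levels: Mapping[int, str]) -> str:
    """Render a rubric with one '- Score N: ...' line per level, in ascending score order."""
    lines = [f"{metric_name}: {intro}"]
    for score in sorted(levels):
        lines.append(f"- Score {score}: {levels[score]}")
    # Explicit newlines keep every level on its own line in the judge prompt
    return "\n".join(lines)


grading_prompt = build_grading_prompt(
    "Professionalism",
    "If the answer is written using a professional tone, below are the details for different scores:",
    {
        1: "Language is extremely casual, informal, and may include slang or colloquialisms. Not suitable for professional contexts.",
        2: "Language is casual but generally respectful and avoids strong informality or slang. Acceptable in some informal professional settings.",
        3: "Language is balanced and avoids extreme informality or formality. Suitable for most professional contexts.",
        4: "Language is noticeably formal, respectful, and avoids casual elements. Appropriate for business or academic settings.",
        5: "Language is excessively formal, respectful, and avoids casual elements. Appropriate for the most formal settings such as textbooks.",
    },
)
print(grading_prompt)
```

Keying the levels by score also makes it harder to skip or duplicate a level when editing the rubric later.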

Call `mlflow.evaluate()` with the new professionalism metric. Because this metric only grades the model output, `targets` is omitted here.

In [0]:
with mlflow.start_run() as run:
    results = mlflow.evaluate(
        basic_qa_model.model_uri,
        eval_df,
        model_type="question-answering",
        evaluators="default",
        extra_metrics=[professionalism_metric],  # use the professionalism metric we created above
    )
print(results.metrics)

2024/03/15 12:25:55 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2024/03/15 12:25:55 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
2024/03/15 12:25:57 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...


2024/03/15 12:25:58 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: token_count
2024/03/15 12:25:58 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: toxicity
2024/03/15 12:25:59 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: flesch_kincaid_grade_level
2024/03/15 12:25:59 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: ari_grade_level
2024/03/15 12:25:59 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: exact_match
2024/03/15 12:25:59 INFO mlflow.models.evaluation.default_evaluator: Evaluating metrics: professionalism


{'toxicity/v1/mean': 0.00022576300034415908, 'toxicity/v1/variance': 1.3472043575596806e-08, 'toxicity/v1/p90': 0.00035474369651637976, 'toxicity/v1/ratio': 0.0, 'professionalism/v1/mean': 3.0, 'professionalism/v1/variance': 0.0, 'professionalism/v1/p90': 3.0}


In [0]:
display(results.tables["eval_results_table"])

| inputs | ground_truth | outputs | token_count | toxicity/v1/score | professionalism/v1/score | professionalism/v1/justification |
|---|---|---|---|---|---|---|
| How does useEffect() work? | The useEffect() hook tells React that your component needs to do something after render. React will remember the function you passed (we’ll refer to it as our “effect”), and call it later after performing the DOM updates. | useEffect() is a hook in React that allows executing a function or invoking a side-effect on the component render. It is called after every render and it can accept dependencies, which determine if the function should be called or not. | 46 | 0.0001946291 | 3 | The response is written in a balanced tone. It avoids extreme informality or formality and uses clear and concise language. It is suitable for most professional contexts. |
| What does the static keyword in a function mean? | Static members belongs to the class, rather than a specific instance. This means that only one instance of a static member exists, even if you create multiple objects of the class, or if you don't create any. It will be shared by all objects. | The static keyword in a function means that the function is visible only within the file it is defined in and can retain its value between function calls. It is commonly used in functions that are not meant to be accessed or modified outside their source file. | 49 | 0.0001396205 | 3 | The response uses clear and concise language, avoiding overly casual language or slang. However, it could benefit from a slightly more formal tone for some professional contexts. |
| What does the 'finally' block in Python do? | 'Finally' defines a block of code to run when the try... except...else block is final. The finally block will be executed no matter if the try block raises an error or not. | The 'finally' block in Python is used to execute a set of statements whether an exception is raised or not. It is generally used to release external resources (such as files or network connections) or to perform cleanup tasks. | 45 | 0.0004233642 | 3 | The response uses a balanced tone that avoids extreme informality or formality. It is suitable for most professional contexts. However, it does include a few filler words such as 'generally' and 'or not', which slightly detract from its professionalism. |
| What is the difference between multiprocessing and multithreading? | Multithreading refers to the ability of a processor to execute multiple threads concurrently, where each thread runs a process. Whereas multiprocessing refers to the ability of a system to run multiple processors in parallel, where each processor can run one or more threads. | Multiprocessing is the ability of a system to have more than one processor or core for parallel execution of multiple processes, while multithreading is the ability of a system to have multiple threads within a process for parallel execution of multiple tasks within that process. | 50 | 0.0001454381 | 3 | The response uses a balanced tone that avoids extreme informality or formality. It is suitable for most professional contexts. However, there are a few instances of casual language such as "ability" instead of "capability" and "tasks" instead of "operations". |


Let's see whether we can improve on `basic_qa_model` by logging a new model with a system prompt that asks for extreme formality.

Call `mlflow.evaluate()` on the new model. The mean professionalism score rises from 3.0 to 3.25, a modest gain given that only four rows are evaluated.

In [0]:
with mlflow.start_run() as run:
    system_prompt = "Answer the following question using extreme formality."
    professional_qa_model = mlflow.openai.log_model(
        model="gpt-3.5-turbo",
        task=openai.ChatCompletion,
        artifact_path="model",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "{question}"},
        ],
    )
    results = mlflow.evaluate(
        professional_qa_model.model_uri,
        eval_df,
        model_type="question-answering",
        evaluators="default",
        extra_metrics=[professionalism_metric],
    )
print(results.metrics)



2024/03/15 12:26:08 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2024/03/15 12:26:08 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
2024/03/15 12:26:11 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...


2024/03/15 12:26:13 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: token_count
2024/03/15 12:26:13 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: toxicity
2024/03/15 12:26:15 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: flesch_kincaid_grade_level
2024/03/15 12:26:15 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: ari_grade_level
2024/03/15 12:26:15 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: exact_match
2024/03/15 12:26:15 INFO mlflow.models.evaluation.default_evaluator: Evaluating metrics: professionalism


{'toxicity/v1/mean': 0.0002741648022492882, 'toxicity/v1/variance': 1.0150108719897355e-08, 'toxicity/v1/p90': 0.000375902172527276, 'toxicity/v1/ratio': 0.0, 'professionalism/v1/mean': 3.25, 'professionalism/v1/variance': 0.1875, 'professionalism/v1/p90': 3.7}


In [0]:
display(results.tables["eval_results_table"])

| inputs | ground_truth | outputs | token_count | toxicity/v1/score | professionalism/v1/score | professionalism/v1/justification |
|---|---|---|---|---|---|---|
| How does useEffect() work? | The useEffect() hook tells React that your component needs to do something after render. React will remember the function you passed (we’ll refer to it as our “effect”), and call it later after performing the DOM updates. | Would you kindly allow me to explicate the intricacies of the method known as useEffect() utilized within the realm of React programming? The aforementioned method concerns itself with the handling of side effects which may occur during the operational course of a component. To elaborate further, the useEffect() method allows a React component to synchronize with its environment, generally by means of connecting to and reading from external data sources, addressing changes in the component’s prop values or state, incorporating DOM manipulations, or dispatching a function call when the component unmounts. By invoking useEffect(), the React component will subsequently trigger the function specified within the method as a side effect of the component rendering. Hence, it is imperative to employ useEffect() when implementing any feature within a React component that necessitates synchronization with the external environment. | 159 | 0.0002060573 | 3 | The response strikes a balance between formality and informality. It avoids extreme informality or formality and is suitable for most professional contexts. The language is clear, concise, and respectful, but could benefit from slightly more formal language in certain areas. |
| What does the static keyword in a function mean? | Static members belongs to the class, rather than a specific instance. This means that only one instance of a static member exists, even if you create multiple objects of the class, or if you don't create any. It will be shared by all objects. | Pardon me, esteemed inquirer, the static keyword in a function serves as a storage class specifier that determines the scope and duration of the variable or function to which it is applied. When utilized in the context of a function, the static keyword functions as a local variable that persists throughout the life of the program, allowing the variable to retain its value between multiple function calls and ensuring that its value is not lost upon exiting the function's block. | 89 | 0.0003810984 | 4 | The response is written in a noticeably formal and respectful tone, avoiding casual elements such as contractions or slang. It uses formal language and technical terms, making it appropriate for business or academic settings. |
| What does the 'finally' block in Python do? | 'Finally' defines a block of code to run when the try... except...else block is final. The finally block will be executed no matter if the try block raises an error or not. | Pray, allow me to elucidate the function of the 'finally' block in the Python programming language. The 'finally' block, being a part of a 'try' statement, serves the purpose of providing a code block that is executed regardless of whether an exception is raised or not. It is always executed after the 'try' and 'except' blocks, no matter the reason for the exception. In essence, it serves as a guarantee that the code within it will be executed no matter what. | 102 | 0.0003637776 | 3 | The response uses a balanced tone that avoids extreme informality or formality. It is suitable for most professional contexts. However, there are a few instances of overly complex language that could be simplified for better clarity. |
| What is the difference between multiprocessing and multithreading? | Multithreading refers to the ability of a processor to execute multiple threads concurrently, where each thread runs a process. Whereas multiprocessing refers to the ability of a system to run multiple processors in parallel, where each processor can run one or more threads. | May I seek clarification on the level of formality required for the answer? Is there a specific tone you would like me to use? | 27 | 0.0001457259 | 3 | The response uses a balanced tone that avoids extreme informality or formality. It is clear and concise, and respectful in its request for clarification. It would be suitable for most professional contexts. |
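Rather than comparing the two runs by eye, you can diff their `results.metrics` dictionaries. Here is a minimal sketch using a hypothetical `metric_deltas` helper. The two dictionaries are copied from the printed outputs of the two-sentence baseline run and the formal-prompt run above.

```python
from typing import Dict


def metric_deltas(baseline: Dict[str, float], candidate: Dict[str, float]) -> Dict[str, float]:
    """Return candidate - baseline for every metric name present in both runs."""
    shared = sorted(baseline.keys() & candidate.keys())
    return {name: candidate[name] - baseline[name] for name in shared}


# Values copied from the printed results.metrics of the two professionalism runs
baseline_metrics = {
    "toxicity/v1/mean": 0.00022576300034415908,
    "professionalism/v1/mean": 3.0,
    "professionalism/v1/variance": 0.0,
    "professionalism/v1/p90": 3.0,
}
formal_metrics = {
    "toxicity/v1/mean": 0.0002741648022492882,
    "professionalism/v1/mean": 3.25,
    "professionalism/v1/variance": 0.1875,
    "professionalism/v1/p90": 3.7,
}

for name, delta in metric_deltas(baseline_metrics, formal_metrics).items():
    print(f"{name}: {delta:+.4g}")
```

With four rows per run, the +0.25 change in the mean comes from a single row (the `static` question) moving from 3 to 4. The formal prompt also led the model to ask for clarification instead of answering the last question, and the professionalism metric alone does not penalize that. Both points are worth checking on a larger evaluation set before concluding that the new prompt is better.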
