# Exploring Tests in ValidMind:
## A Comprehensive Guide to List and Describe Tests

Welcome to this guide to the ValidMind Developer Framework! In this notebook, we'll walk through the utilities for listing and describing the tests that can be run against your models and datasets. Whether you're just getting started or looking for advanced tips, you'll find examples and explanations at each step.

Before we delve into the details, let's set up our environment by importing the necessary modules.

In [1]:
from validmind.tests import describe_test, list_tests

## Listing All Tests

The `list_tests` function provides a convenient way to retrieve all available tests in the `validmind.tests` module. When invoked without any parameters, it returns a pandas DataFrame containing detailed information about each test.

In [2]:
list_tests()

Test Type,Name,Description,ID
ThresholdTest,Bias,Evaluates bias in a Language Learning Model based on the order and distribution of exemplars in a prompt....,validmind.prompt_validation.Bias
ThresholdTest,Clarity,Evaluates and scores the clarity of prompts in a Language Learning Model based on specified guidelines....,validmind.prompt_validation.Clarity
ThresholdTest,Specificity,"Evaluates and scores the specificity of prompts provided to a Language Learning Model (LLM), based on clarity,...",validmind.prompt_validation.Specificity
ThresholdTest,Robustness,Assesses the robustness of prompts provided to a Language Learning Model under varying conditions and contexts....,validmind.prompt_validation.Robustness
ThresholdTest,Negative Instruction,"Evaluates and grades the use of affirmative, proactive language over negative instructions in ML model prompts....",validmind.prompt_validation.NegativeInstruction
ThresholdTest,Conciseness,Analyzes and grades the conciseness of prompts provided to a Language Learning Model....,validmind.prompt_validation.Conciseness
ThresholdTest,Delimitation,Evaluates the proper use of delimiters in prompts provided to Language Learning Models....,validmind.prompt_validation.Delimitation
Metric,Bert Score,"Evaluates text generation models' performance by calculating precision, recall, and F1 score based on BERT...",validmind.model_validation.BertScore
Metric,Bleu Score,Assesses translation quality by comparing machine-translated sentences with human-translated ones using BLEU score....,validmind.model_validation.BleuScore
Metric,Contextual Recall,Evaluates a Natural Language Generation model's ability to generate contextually relevant and factually correct...,validmind.model_validation.ContextualRecall


## Understanding Tags and Task Types

Effectively using ValidMind's tests involves a deep understanding of its 'tags' and 'task types'. Here's a breakdown:

- **Task Types**: Represent the kind of modeling task associated with a test. For instance:
  - **classification:** Classifying data into specific categories.
  - **regression:** Predicting a continuous outcome variable.
  - **text classification:** Classifying text into specific categories.
  - **text summarization:** Producing a concise summary for a text.

- **Tags**: Free-form descriptors providing detailed insights about a test. Some examples include:
  - **nlp:** Tests relevant for natural language processing.
  - **binary_classification:** Tests for binary classification tasks.
  - **forecasting:** Tests for forecasting and time-series analysis.
  - **tabular_data:** Tests for tabular data like CSVs and Excel spreadsheets.
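
To make the distinction concrete, here is a minimal sketch of how filtering by task type and tags could work over a small in-memory catalog. The `CatalogEntry` record, the `find_tests` helper, and the task and tag assignments below are hypothetical and chosen for illustration only; this is not ValidMind's implementation. It also assumes that a test must carry every requested tag, which is inferred from the `tags` example later in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical record describing one test (not a ValidMind class)."""
    test_id: str
    task_types: frozenset
    tags: frozenset


# Illustrative catalog: the IDs appear in this guide, but the task types
# and tags assigned to them here are assumptions made for the example.
CATALOG = [
    CatalogEntry(
        "validmind.model_validation.sklearn.ConfusionMatrix",
        frozenset({"classification"}),
        frozenset({"model_performance", "visualization"}),
    ),
    CatalogEntry(
        "validmind.data_validation.nlp.StopWords",
        frozenset({"text classification"}),
        frozenset({"nlp"}),
    ),
    CatalogEntry(
        "validmind.model_validation.statsmodels.BoxPierce",
        frozenset({"regression"}),
        frozenset({"forecasting"}),
    ),
]


def find_tests(catalog, task=None, tags=None):
    """Return IDs whose entry matches the task (if given) and has all tags."""
    wanted = set(tags or [])
    return [
        entry.test_id
        for entry in catalog
        if (task is None or task in entry.task_types)
        and wanted <= entry.tags  # assumption: every requested tag must match
    ]


regression_ids = find_tests(CATALOG, task="regression")
plot_ids = find_tests(CATALOG, tags=["model_performance", "visualization"])
print(regression_ids)
print(plot_ids)
```

A task type narrows the catalog to one kind of modeling problem, while tags act as independent labels that can be combined to narrow it further.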

## Searching for Specific Tests using `filter`, `task`, and `tags`

While listing all tests is valuable, there are times when you need to narrow down your search. The `list_tests` function offers `filter`, `task`, and `tags` parameters to assist in this.

In [3]:
list_tests(filter="classification")

Test Type,Name,Description,ID
ThresholdTest,Minimum ROCAUC Score,Validates model by checking if the ROC AUC score meets or surpasses a specified threshold....,validmind.model_validation.sklearn.MinimumROCAUCScore
Metric,Tabular Categorical Bar Plots,Generates and visualizes bar plots for each category in categorical features to evaluate dataset's composition....,validmind.data_validation.TabularCategoricalBarPlots
Metric,Confusion Matrix,Evaluates and visually represents the classification ML model's predictive performance using a Confusion Matrix...,validmind.model_validation.sklearn.ConfusionMatrix
Metric,Pearson Correlation Matrix,Evaluates linear dependency between numerical variables in a dataset via a Pearson Correlation coefficient heat map....,validmind.data_validation.PearsonCorrelationMatrix
Metric,Common Words,Identifies and visualizes the 40 most frequent non-stopwords in a specified text column within a dataset....,validmind.data_validation.nlp.CommonWords
Metric,Scatter Plot,"Creates a scatter plot matrix to visually analyze feature relationships, patterns, and outliers in a dataset....",validmind.data_validation.ScatterPlot
ThresholdTest,Hashtags,"Assesses hashtag frequency in a text column, highlighting usage trends and potential dataset bias or spam....",validmind.data_validation.nlp.Hashtags
Metric,Bivariate Histograms,"Generates bivariate histograms for paired features, aiding in visual inspection of categorical variables'...",validmind.data_validation.BivariateHistograms
ThresholdTest,Missing Values,Evaluates dataset quality by ensuring missing value ratio across all features does not exceed a set threshold....,validmind.data_validation.MissingValues
ThresholdTest,Stop Words,Evaluates and visualizes the frequency of English stop words in a text dataset against a defined threshold....,validmind.data_validation.nlp.StopWords


The call above uses the `filter` parameter, which comes in handy when you're targeting a specific test or tests that match a particular task type. Here it lists tests matching 'classification'.

In [4]:
list_tests(task="regression")

Test Type,Name,Description,ID
Metric,Model Metadata,**Purpose:**...,validmind.model_validation.ModelMetadata
Metric,Regression Models Coeffs,Compares feature importance by evaluating and contrasting coefficients of different regression models....,validmind.model_validation.statsmodels.RegressionModelsCoeffs
Metric,Box Pierce,Detects autocorrelation in time-series data through the Box-Pierce test to validate model performance....,validmind.model_validation.statsmodels.BoxPierce
Metric,Regression Coeffs Plot,Visualizes regression coefficients with 95% confidence intervals to assess predictor variables' impact on response...,validmind.model_validation.statsmodels.RegressionCoeffsPlot
Metric,Regression Model Sensitivity Plot,Tests the sensitivity of a regression model to variations in independent variables by applying shocks and...,validmind.model_validation.statsmodels.RegressionModelSensitivityPlot
Metric,Regression Models Performance,"Evaluates and compares regression models' performance using R-squared, Adjusted R-squared, and MSE metrics....",validmind.model_validation.statsmodels.RegressionModelsPerformance
Metric,Zivot Andrews Arch,Evaluates the order of integration and stationarity of time series data using Zivot-Andrews unit root test....,validmind.model_validation.statsmodels.ZivotAndrewsArch
Metric,Regression Model Outsample Comparison,Computes MSE and RMSE for multiple regression models using out-of-sample test to assess model's prediction accuracy...,validmind.model_validation.statsmodels.RegressionModelOutsampleComparison
Metric,Regression Model Forecast Plot Levels,Compares and visualizes forecasted and actual values of regression models on both raw and transformed datasets....,validmind.model_validation.statsmodels.RegressionModelForecastPlotLevels
Metric,Feature Importance And Significance,Evaluates and visualizes the statistical significance and feature importance using regression and decision tree...,validmind.model_validation.statsmodels.FeatureImportanceAndSignificance


The call above uses the `task` parameter, which pinpoints tests that align with a specific task type. Here it finds tests tailored for 'regression' tasks.

In [5]:
list_tests(tags=["model_performance", "visualization"])

Test Type,Name,Description,ID
Metric,Confusion Matrix,Evaluates and visually represents the classification ML model's predictive performance using a Confusion Matrix...,validmind.model_validation.sklearn.ConfusionMatrix
Metric,Precision Recall Curve,Evaluates the precision-recall trade-off for binary classification models and visualizes the Precision-Recall curve....,validmind.model_validation.sklearn.PrecisionRecallCurve
Metric,ROC Curve,Evaluates binary classification model performance by generating and plotting the Receiver Operating Characteristic...,validmind.model_validation.sklearn.ROCCurve
ThresholdTest,Training Test Degradation,Tests if model performance degradation between training and test datasets exceeds a predefined threshold....,validmind.model_validation.sklearn.TrainingTestDegradation
Metric,Log Regression Confusion Matrix,"Generates a confusion matrix for logistic regression model performance, utilizing thresholded probabilities for...",validmind.model_validation.statsmodels.LogRegressionConfusionMatrix
Metric,GINI Table,"Evaluates classification model performance using AUC, GINI, and KS metrics for training and test datasets....",validmind.model_validation.statsmodels.GINITable


The call above uses the `tags` parameter, which searches tests by their tags. Here it returns only tests designed for `model_performance` that also produce a plot (denoted by the `visualization` tag).

To work with a specific set of tests programmatically, you can store the results in a variable. For instance, let's list all regression tests and store them in `regression_tests` for further use. Passing `pretty=False` returns a plain list of test IDs rather than a formatted table.

In [6]:
regression_tests = list_tests(task="regression", pretty=False)
regression_tests

['validmind.model_validation.ModelMetadata',
 'validmind.model_validation.statsmodels.RegressionModelsCoeffs',
 'validmind.model_validation.statsmodels.BoxPierce',
 'validmind.model_validation.statsmodels.RegressionCoeffsPlot',
 'validmind.model_validation.statsmodels.RegressionModelSensitivityPlot',
 'validmind.model_validation.statsmodels.RegressionModelsPerformance',
 'validmind.model_validation.statsmodels.ZivotAndrewsArch',
 'validmind.model_validation.statsmodels.RegressionModelOutsampleComparison',
 'validmind.model_validation.statsmodels.RegressionModelForecastPlotLevels',
 'validmind.model_validation.statsmodels.FeatureImportanceAndSignificance',
 'validmind.model_validation.statsmodels.LJungBox',
 'validmind.model_validation.statsmodels.JarqueBera',
 'validmind.model_validation.statsmodels.PhillipsPerronArch',
 'validmind.model_validation.statsmodels.KolmogorovSmirnov',
 'validmind.model_validation.statsmodels.ResidualsVisualInspection',
 'validmind.model_validation.statsmode
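
Because the result is a plain list of ID strings, ordinary Python is enough to organize it further. The following is a minimal sketch, assuming test IDs follow the dotted `package.module.TestName` pattern seen in the output above. The list is hard-coded from that output so the snippet runs without ValidMind installed, and `group_by_module` is a hypothetical helper, not part of the library.

```python
from collections import defaultdict

# A few IDs copied from the output above; in practice this would be the
# list returned by list_tests(task="regression", pretty=False).
regression_tests = [
    "validmind.model_validation.ModelMetadata",
    "validmind.model_validation.statsmodels.RegressionModelsCoeffs",
    "validmind.model_validation.statsmodels.BoxPierce",
    "validmind.model_validation.statsmodels.JarqueBera",
]


def group_by_module(test_ids):
    """Group test names by the module path that precedes the final dot."""
    groups = defaultdict(list)
    for test_id in test_ids:
        module, _, name = test_id.rpartition(".")
        groups[module].append(name)
    return dict(groups)


grouped = group_by_module(regression_tests)
for module, names in grouped.items():
    print(f"{module}: {', '.join(names)}")
```

Grouping this way gives a quick view of which part of the framework each test comes from before you pick IDs to describe or run.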

## Delving into Test Details with `describe_test`

After identifying a set of potential tests, you might want to explore the specifics of an individual test. The `describe_test` function provides a deep dive into the details of a test. It reveals the test name, description, ID, test type, and required inputs. Below, we showcase how to describe a test using its ID:

In [7]:
describe_test("validmind.model_validation.sklearn.ConfusionMatrix")

HTML(value='\n<div>\n  <h2>Confusion Matrix</h2>\n  <p>Evaluates and visually represents the classification ML…

## Conclusion and Next Steps

With the functions presented in this guide, you should be able to list and filter ValidMind's available tests and find those you want to run against your model or dataset. The next step is to take the IDs of the tests you'd like to run and either create a Test Suite for reuse or run them directly to try them out. See the other notebooks for a tutorial on how to do both.