# LLMComparison

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDoxJudge/blob/main/examples/llm_comparison.ipynb)

**Explanation:**

- **Defining Model Data:** The `models` list contains one dictionary per language model (three in this example). Each dictionary has a `name`, an overall evaluation `score`, and a `metrics` dictionary holding the detailed results: `Faithfulness`, `AnswerRelevancy`, `Bias`, `Hallucination`, `KnowledgeRetention`, `Toxicity`, `precision`, `recall`, `f1_score`, and `BLEU`.

- **Importing LLMComparison:** The `LLMComparison` class is imported from the `indoxJudge.piplines` module. It compares the language models using the metrics supplied for each one.

- **Initializing the Comparison:** An instance of `LLMComparison` is created by passing the `models` list to it. This instance, `llm_comparison`, will handle the comparison of the models.

- **Plotting the Comparison:** The `plot` method is called with `mode="inline"` so that the comparison chart is rendered directly in the notebook output. This suits notebook environments such as Google Colab.

Together, these steps produce a visual comparison of several language models, which makes their relative strengths and weaknesses across the metrics easier to inspect.
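Because the comparison depends on every entry following the same layout, it can help to check the list before plotting. The following is a minimal sketch in plain Python, assuming the three-key layout (`name`, `score`, `metrics`) described above and metric values in the range 0 to 1 as in the sample data. `validate_models` is a hypothetical helper written for illustration; it is not part of indoxJudge.

```python
from numbers import Real

REQUIRED_KEYS = {"name", "score", "metrics"}  # layout assumed from this notebook


def validate_models(models):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    for index, model in enumerate(models):
        missing = REQUIRED_KEYS - model.keys()
        if missing:
            problems.append(f"entry {index}: missing keys {sorted(missing)}")
            continue
        for metric, value in model["metrics"].items():
            # Assumption: every metric is a number in [0, 1], as in the sample data.
            is_number = isinstance(value, Real) and not isinstance(value, bool)
            if not is_number or not 0.0 <= value <= 1.0:
                problems.append(f"{model['name']}: {metric}={value!r} is outside [0, 1]")
    return problems


sample = [
    {"name": "Model_1", "score": 0.50, "metrics": {"Faithfulness": 0.55, "BLEU": 0.11}},
    {"name": "Broken", "metrics": {"Faithfulness": 1.0}},  # deliberately lacks 'score'
]
print(validate_models(sample))
```

An empty result suggests the list matches the assumed layout; it does not guarantee that `LLMComparison` will accept it.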


```python
!pip install indoxJudge -U
```

```python
# One dictionary per model: a name, an overall score, and per-metric results.
models = [
    {
        "name": "Model_1",
        "score": 0.50,
        "metrics": {
            "Faithfulness": 0.55,
            "AnswerRelevancy": 1.0,
            "Bias": 0.45,
            "Hallucination": 0.8,
            "KnowledgeRetention": 0.0,
            "Toxicity": 0.0,
            "precision": 0.64,
            "recall": 0.77,
            "f1_score": 0.70,
            "BLEU": 0.11,
        },
    },
    {
        "name": "Model_2",
        "score": 0.61,
        "metrics": {
            "Faithfulness": 1.0,
            "AnswerRelevancy": 1.0,
            "Bias": 0.0,
            "Hallucination": 0.8,
            "KnowledgeRetention": 1.0,
            "Toxicity": 0.0,
            "precision": 0.667,
            "recall": 0.77,
            "f1_score": 0.71,
            "BLEU": 0.14,
        },
    },
    {
        "name": "Model_3",
        "score": 0.050,
        "metrics": {
            "Faithfulness": 1.0,
            "AnswerRelevancy": 1.0,
            "Bias": 0.0,
            "Hallucination": 0.83,
            "KnowledgeRetention": 0.0,
            "Toxicity": 0.0,
            "precision": 0.64,
            "recall": 0.76,
            "f1_score": 0.70,
            "BLEU": 0.10,
        },
    },
]
```
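Before plotting, the overall `score` values can be ranked as a quick sanity check on the data. This is a minimal sketch in plain Python that does not use indoxJudge; `rank_by_score` is a hypothetical helper, and the list below repeats only the names and scores from the cell above.

```python
def rank_by_score(models):
    """Return (name, score) pairs ordered from highest to lowest overall score."""
    ordered = sorted(models, key=lambda m: m["score"], reverse=True)
    return [(m["name"], m["score"]) for m in ordered]


# Names and overall scores copied from the cell above; metrics omitted for brevity.
scores_only = [
    {"name": "Model_1", "score": 0.50},
    {"name": "Model_2", "score": 0.61},
    {"name": "Model_3", "score": 0.050},
]

for name, score in rank_by_score(scores_only):
    print(f"{name}: {score:.2f}")
```

With these values the order is Model_2, Model_1, Model_3.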

```python
from indoxJudge.piplines import LLMComparison

# Build the comparison from the list defined above and render the chart inline.
llm_comparison = LLMComparison(models=models)
llm_comparison.plot(mode="inline")
```
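Where an inline figure cannot be rendered, for example when the code runs as a plain script, a similar side-by-side view can be approximated in text. The sketch below is plain Python and independent of indoxJudge; `metric_table` is a hypothetical helper, and the sample uses a subset of the metric values from the data above.

```python
def metric_table(models):
    """Build a plain-text table: one row per metric, one column per model."""
    names = [m["name"] for m in models]
    # Collect metric names in first-seen order across all models.
    metrics = list(dict.fromkeys(k for m in models for k in m["metrics"]))
    width = max((len(k) for k in metrics), default=0)
    lines = [" " * width + "".join(f"{n:>10}" for n in names)]
    for key in metrics:
        cells = []
        for m in models:
            value = m["metrics"].get(key)
            # A metric missing from one model is shown as 'n/a'.
            cells.append(f"{value:>10.2f}" if value is not None else f"{'n/a':>10}")
        lines.append(f"{key:<{width}}" + "".join(cells))
    return "\n".join(lines)


subset = [
    {"name": "Model_1", "metrics": {"Faithfulness": 0.55, "BLEU": 0.11}},
    {"name": "Model_2", "metrics": {"Faithfulness": 1.0, "BLEU": 0.14}},
]
print(metric_table(subset))
```

Passing the full `models` list instead of `subset` prints one row for each of the ten metrics.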