5 changes: 3 additions & 2 deletions README.md
@@ -78,8 +78,9 @@ os.environ["OPENAI_API_KEY"] = "your-openai-key"
dataset: Dataset

results = evaluate(dataset)
# {'ragas_score': 0.860, 'context_precision': 0.817,
# 'faithfulness': 0.892, 'answer_relevancy': 0.874}
# {'context_precision': 0.817,
# 'faithfulness': 0.892,
# 'answer_relevancy': 0.874}
```
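For context, a rough sketch of how the snippet above might be run end to end after this change; the `Dataset` columns, the metric imports, and the explicit `metrics=` list are assumptions drawn from the ragas docs of this era, not part of this diff.

```python
# Illustrative sketch only: build an evaluation dataset and score it with
# individual metrics, now that no aggregate ragas_score is returned.
import os

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

os.environ["OPENAI_API_KEY"] = "your-openai-key"

dataset = Dataset.from_dict({
    "question": ["When was the first Super Bowl?"],
    "answer": ["The first Super Bowl was held on January 15, 1967."],
    "contexts": [[
        "The First AFL-NFL World Championship Game was played on January 15, 1967."
    ]],
    "ground_truths": [["The first Super Bowl was held on January 15, 1967."]],
})

results = evaluate(dataset, metrics=[context_precision, faithfulness, answer_relevancy])
print(results)  # per-metric scores only, e.g. context_precision, faithfulness, answer_relevancy
```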

Refer to our [documentation](https://docs.ragas.io/) to learn more.
2 changes: 1 addition & 1 deletion docs/concepts/metrics/critique.md
@@ -5,7 +5,7 @@ This is designed to assess submissions based on predefined aspects such as `harm

Critiques within the LLM evaluators evaluate submissions based on the provided aspect. Ragas Critiques offers a range of predefined aspects like correctness, harmfulness, etc. (Please refer to `SUPPORTED_ASPECTS` for a complete list). If you prefer, you can also create custom aspects to evaluate submissions according to your unique requirements.

The `strictness` parameter plays a crucial role in maintaining a certain level of self-consistency in predictions, with an ideal range typically falling between 2 to 4. It's important to note that the scores obtained from aspect critiques are binary and do not contribute to the final Ragas score due to their non-continuous nature.
The `strictness` parameter plays a crucial role in maintaining a certain level of self-consistency in predictions, with an ideal range typically falling between 2 and 4.
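As an illustration of the aspect-critique flow described above (not part of this diff), here is a sketch; the import path, the `AspectCritique` constructor arguments, and the custom "conciseness" aspect are assumptions based on the ragas docs rather than anything introduced by this change.

```python
# Illustrative sketch: one predefined aspect plus a custom aspect, with
# strictness in the suggested 2-4 range for self-consistent, binary verdicts.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics.critique import AspectCritique, harmfulness

# Hypothetical custom aspect, shown only to demonstrate the constructor.
conciseness = AspectCritique(
    name="conciseness",
    definition="Is the submission concise and to the point?",
    strictness=3,  # number of self-consistency votes; 2-4 is the typical range
)

dataset = Dataset.from_dict({
    "question": ["When was the first Super Bowl?"],
    "answer": ["The first Super Bowl was held on January 15, 1967."],
    "contexts": [[
        "The First AFL-NFL World Championship Game was played on January 15, 1967."
    ]],
})

result = evaluate(dataset, metrics=[harmfulness, conciseness])
print(result)  # each aspect is scored 0 or 1 per sample
```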


```{hint}
2 changes: 1 addition & 1 deletion docs/getstarted/evaluation.md
@@ -96,7 +96,7 @@ result = evaluate(

result
```
and there you have it, all the scores you need. `ragas_score` gives you a single metric that you can use while 4 metrics individually would measure the different parts of your pipeline.
and there you have it, all the scores you need.

Now if we want to dig into the results and figure out examples where your pipeline performed poorly or really well, you can easily convert it into a pandas DataFrame and use your standard analytics tools too!
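A short sketch of that analysis step (not part of this diff); it assumes the `Result` object returned by `evaluate` exposes the `to_pandas()` helper shown elsewhere in the ragas docs, and that per-sample columns are named after the metrics.

```python
# Illustrative sketch: turn the Result into a DataFrame and look at the weakest rows.
df = result.to_pandas()
print(df.head())

# e.g. inspect the samples with the lowest faithfulness scores
print(df.sort_values("faithfulness").head())
```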

4 changes: 2 additions & 2 deletions docs/howtos/applications/compare_embeddings.md
@@ -109,7 +109,7 @@ result = evaluate(query_engine1, metrics, test_questions, test_answers)

```{code-block}
:caption: output
{'ragas_score': 0.3570, 'context_precision': 0.2378, 'context_recall': 0.7159}
{'context_precision': 0.2378, 'context_recall': 0.7159}
```

## Evaluate Bge embeddings
@@ -124,7 +124,7 @@ result = evaluate(query_engine2, metrics, test_questions, test_answers)

```{code-block}
:caption: output
{'ragas_score': 0.3883, 'context_precision': 0.2655, 'context_recall': 0.7227}
{'context_precision': 0.2655, 'context_recall': 0.7227}

```
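With no aggregate `ragas_score` to lean on, the two runs above are compared per metric. A sketch of one way to do that follows (not part of this diff); the names `result_openai` and `result_bge`, and dict-style access on the `Result`, are assumptions, since the document reuses the name `result` for both runs.

```python
# Illustrative sketch: per-metric comparison of the two embedding runs.
for metric in ("context_precision", "context_recall"):
    print(f"{metric}: openai={result_openai[metric]:.4f}  bge={result_bge[metric]:.4f}")
```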

4 changes: 2 additions & 2 deletions docs/howtos/applications/compare_llms.md
@@ -145,7 +145,7 @@ result_zephyr

```{code-block}
:caption: output
{'ragas_score': 0.7809, 'faithfulness': 0.8365, 'answer_relevancy': 0.8831, 'answer_correctness': 0.6605}
{'faithfulness': 0.8365, 'answer_relevancy': 0.8831, 'answer_correctness': 0.6605}
```

## Evaluate Falcon-7B-Instruct LLM
@@ -168,7 +168,7 @@

```{code-block}
:caption: output
{'ragas_score': 0.6956, 'faithfulness': 0.6909, 'answer_relevancy': 0.8651, 'answer_correctness': 0.5850}
{'faithfulness': 0.6909, 'answer_relevancy': 0.8651, 'answer_correctness': 0.5850}
```
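Since the aggregate `ragas_score` is gone, the two LLM runs can be put side by side per metric. A sketch under stated assumptions (not part of this diff): the name `result_falcon` and dict-style access on the `Result` are assumptions, while `result_zephyr` appears in the document above.

```python
# Illustrative sketch: tabulate both LLM runs per metric instead of a single score.
import pandas as pd

metrics = ["faithfulness", "answer_relevancy", "answer_correctness"]
scores = pd.DataFrame({
    "zephyr": {m: result_zephyr[m] for m in metrics},
    "falcon-7b-instruct": {m: result_falcon[m] for m in metrics},
})
print(scores)
```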

## Compare Scores
2 changes: 1 addition & 1 deletion docs/howtos/customisations/aws-bedrock.ipynb
@@ -263,7 +263,7 @@
"id": "a2dc0ec2",
"metadata": {},
"source": [
"and there you have the it, all the scores you need. `ragas_score` gives you a single metric that you can use while the other onces measure the different parts of your pipeline.\n",
"and there you have the it, all the scores you need.\n",
"\n",
"now if we want to dig into the results and figure out examples where your pipeline performed worse or really good you can easily convert it into a pandas array and use your standard analytics tools too!"
]
2 changes: 1 addition & 1 deletion docs/howtos/customisations/azure-openai.ipynb
@@ -258,7 +258,7 @@
"id": "a2dc0ec2",
"metadata": {},
"source": [
"and there you have the it, all the scores you need. `ragas_score` gives you a single metric that you can use while the other onces measure the different parts of your pipeline.\n",
"and there you have the it, all the scores you need.\n",
"\n",
"now if we want to dig into the results and figure out examples where your pipeline performed worse or really good you can easily convert it into a pandas array and use your standard analytics tools too!"
]
2 changes: 1 addition & 1 deletion docs/howtos/customisations/gcp-vertexai.ipynb
@@ -294,7 +294,7 @@
"id": "960f88fc-c90b-4ac6-8e97-252edd2f1661",
"metadata": {},
"source": [
"and there you have the it, all the scores you need. `ragas_score` gives you a single metric that you can use while the other onces measure the different parts of your pipeline.\n",
"and there you have the it, all the scores you need.\n",
"\n",
"now if we want to dig into the results and figure out examples where your pipeline performed worse or really good you can easily convert it into a pandas array and use your standard analytics tools too!"
]
2 changes: 1 addition & 1 deletion docs/howtos/integrations/langfuse.ipynb
@@ -559,7 +559,7 @@
{
"data": {
"text/plain": [
"{'ragas_score': 0.9309, 'faithfulness': 0.8889, 'answer_relevancy': 0.9771}"
"{'faithfulness': 0.8889, 'answer_relevancy': 0.9771}"
]
},
"execution_count": 15,
2 changes: 1 addition & 1 deletion docs/howtos/integrations/langsmith.ipynb
@@ -102,7 +102,7 @@
{
"data": {
"text/plain": [
"{'ragas_score': 0.7744, 'context_precision': 0.5976, 'faithfulness': 0.8889, 'answer_relevancy': 0.9300}"
"{'context_precision': 0.5976, 'faithfulness': 0.8889, 'answer_relevancy': 0.9300}"
]
},
"execution_count": 1,
2 changes: 1 addition & 1 deletion docs/howtos/integrations/llamaindex.ipynb
@@ -282,7 +282,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"{'ragas_score': 0.5142, 'faithfulness': 0.7000, 'answer_relevancy': 0.9550, 'context_precision': 0.2335, 'context_recall': 0.9800, 'harmfulness': 0.0000}\n"
"{faithfulness': 0.7000, 'answer_relevancy': 0.9550, 'context_precision': 0.2335, 'context_recall': 0.9800, 'harmfulness': 0.0000}\n"
]
}
],
8 changes: 4 additions & 4 deletions src/ragas/evaluation.py
@@ -42,8 +42,7 @@ def evaluate(
-------
Result
Result object containing the scores of each metric. You can use this to do analysis
later. If the top 3 metrics are provided then it also returns the `ragas_score`
for the entire pipeline.
later.

Raises
------
@@ -64,8 +63,9 @@
})

>>> result = evaluate(dataset)
>>> print(result["ragas_score"])
{'ragas_score': 0.860, 'context_precision': 0.817, 'faithfulness': 0.892,
>>> print(result)
{'context_precision': 0.817,
'faithfulness': 0.892,
'answer_relevancy': 0.874}
```
"""
3 changes: 1 addition & 2 deletions src/ragas/llama_index/evaluation.py
@@ -36,8 +36,7 @@ def evaluate(
-------
Result
Result object containing the scores of each metric. You can use this to do analysis
later. If the top 3 metrics are provided then it also returns the `ragas_score`
for the entire pipeline.
later.

Raises
------