Simplify example
pantonante committed Feb 22, 2024
1 parent 59cc31a commit a0713f0
Showing 1 changed file, README.md, with 29 additions and 28 deletions.

## Getting Started

This code is provided as a PyPI package. To install it, run the following command:

```bash
python3 -m pip install continuous-eval
```

If you want to install from source:

```bash
git clone https://github.com/relari-ai/continuous-eval.git && cd continuous-eval
poetry install --all-extras
```

To run LLM-based metrics, the code requires at least one of the LLM API keys to be set in `.env`. Take a look at the example env file `.env.example`.
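
As a quick sanity check, you can verify that a key is visible to your script before running LLM-based metrics. This is a sketch only: the variable names below are illustrative, and `.env.example` lists the ones the package actually reads.

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # read the .env file in the working directory
# OPENAI_API_KEY / ANTHROPIC_API_KEY are examples, not an exhaustive list.
assert any(os.getenv(k) for k in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")), \
    "Add at least one LLM API key to your .env file"
```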

## Run a single metric

Here's how you run a single metric on a datum.
Check all the available metrics in the [documentation](https://docs.relari.ai/).

```python
...
metric = PrecisionRecallF1()

print(metric(**datum))
```
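
For instance, a retrieval metric takes the retrieved chunks and the ground-truth chunks. The field names below follow the `use(...)` calls in the pipeline example further down; the values are made up for illustration.

```python
from continuous_eval.metrics.retrieval import PrecisionRecallF1

# Hypothetical datum: what the retriever returned vs. what it should have returned.
datum = {
    "retrieved_context": [
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
    ],
    "ground_truth_context": ["Paris is the capital of France."],
}

metric = PrecisionRecallF1()
print(metric(**datum))  # prints the computed precision / recall / F1 scores
```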

### Off-the-shelf Metrics

<table border="0">
  <tr>
    <td>...</td>
</tr>
</table>

## Run modularized evaluation over a dataset

Define your pipeline and select the metrics for each module.
You can also define your own metrics: you only need to extend the [Metric](continuous_eval/metrics/base.py#23) class and implement the `__call__` method.
Optional methods are `batch` (to implement optimizations for batch processing) and `aggregate` (to aggregate metric results over multiple samples).
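
For illustration, a custom metric might look roughly like this. It is a sketch only; the exact `Metric` base-class interface in `continuous_eval/metrics/base.py` may require more than what is shown here.

```python
from continuous_eval.metrics.base import Metric


class AnswerLength(Metric):
    """Hypothetical metric: number of words in the generated answer."""

    def __call__(self, answer: str, **kwargs):
        # Return a dict of metric name -> value, like the built-in metrics do.
        return {"answer_num_words": len(answer.split())}

    def aggregate(self, results):
        # Optional: average the per-sample values over the dataset.
        values = [r["answer_num_words"] for r in results]
        return {"answer_num_words": sum(values) / len(values)}
```

The pipeline below wires a retriever, a reranker and a generator together and attaches the chosen metrics to each module.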

```python
from continuous_eval.eval import Module, ModuleOutput, Pipeline, Dataset
from continuous_eval.metrics.retrieval import PrecisionRecallF1, RankedRetrievalMetrics
from continuous_eval.metrics.generation.text import DeterministicAnswerCorrectness
from typing import List, Dict

dataset = Dataset("dataset_folder")

# Simple 3-step RAG pipeline with Retriever->Reranker->Generation

retriever = Module(
    name="Retriever",
    input=dataset.question,
    output=List[str],
    eval=[
        PrecisionRecallF1().use(
            retrieved_context=ModuleOutput(),
            ground_truth_context=dataset.ground_truth_contexts,
        ),
    ],
)

reranker = Module(
    name="reranker",
    input=retriever,
    output=List[str],
    eval=[
        RankedRetrievalMetrics().use(
            retrieved_context=ModuleOutput(),
            ground_truth_context=dataset.ground_truth_contexts,
        ),
    ],
)

llm = Module(
    ...  # generation step, e.g. evaluated with DeterministicAnswerCorrectness
)

pipeline = Pipeline([retriever, reranker, llm], dataset=dataset)
```
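
In this example `ModuleOutput()` forwards each module's raw output to its metrics. The previous, fuller version of the example used a selector to pull a field out of richer outputs, a pattern that still applies if your modules return documents as dictionaries:

```python
from continuous_eval.eval import ModuleOutput

# Selector variant (from the earlier version of this example): extract the text
# field from documents returned as List[Dict[str, str]].
DocumentsContent = ModuleOutput(lambda x: [z["page_content"] for z in x])
```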

Now you can run the evaluation on your pipeline:

```python
from continuous_eval.eval.manager import eval_manager

# Register the pipeline with the evaluation manager
eval_manager.set_pipeline(pipeline)

eval_manager.start_run()
while eval_manager.is_running():
    if eval_manager.curr_sample is None:
        break
    q = eval_manager.curr_sample["question"]  # get the question or any other field
    # run your pipeline ...
    eval_manager.next_sample()
```

To **log** the results you just need to call the `eval_manager.log` method with the module name and its output, for example:

```python
eval_manager.log("answer_generator", response)
```
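
Putting the run loop and the logging together, a skeleton of the evaluation script might look like this. Note that `retrieve`, `rerank` and `ask` stand in for your own pipeline code and are not part of the package, and the module names passed to `log` must match the ones declared in the pipeline.

```python
eval_manager.start_run()
while eval_manager.is_running():
    if eval_manager.curr_sample is None:
        break
    q = eval_manager.curr_sample["question"]

    docs = retrieve(q)                    # your retrieval code
    eval_manager.log("Retriever", docs)

    reranked_docs = rerank(q, docs)       # your reranking code
    eval_manager.log("reranker", reranked_docs)

    response = ask(q, reranked_docs)      # your generation code
    eval_manager.log("answer_generator", response)

    eval_manager.next_sample()
```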

The evaluation manager also offers

- `eval_manager.run_metrics()` to run all the metrics defined in the pipeline
- `eval_manager.run_tests()` to run the tests defined in the pipeline (see the [documentation](https://docs.relari.ai) for more details)
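
For example, a run might end like this (a sketch; only the two calls above come from the list, and how you consume the computed results is up to you):

```python
eval_manager.run_metrics()  # compute every metric declared in the pipeline
eval_manager.run_tests()    # evaluate the tests declared in the pipeline, if any
```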

## Resources

