
Commit

updated graph and minor tweaks
yisz committed Feb 22, 2024
1 parent a0713f0 commit 88098ec
Showing 2 changed files with 15 additions and 5 deletions.
20 changes: 15 additions & 5 deletions README.md
@@ -37,9 +37,9 @@

- **Modularized Evaluation**: Measure each module in the pipeline with tailored metrics.

- **Comprehensive Metric Library**: Covers Retrieval-Augmented Generation (RAG), Code Generation, Tool Use, Agent Tool, Classification and a variety of LLM use cases. Mix and match Deterministic, Semantic and LLM-based metrics.
- **Comprehensive Metric Library**: Covers Retrieval-Augmented Generation (RAG), Code Generation, Agent Tool Use, Classification and a variety of other LLM use cases. Mix and match Deterministic, Semantic and LLM-based metrics.

- **Leverage User Feedback in Evaluation**: easily build a close-to-human ensemble evaluation pipeline with mathematical guarantees.
- **Leverage User Feedback in Evaluation**: Easily build a close-to-human ensemble evaluation pipeline with mathematical guarantees.

- **Synthetic Dataset Generation**: Generate large-scale synthetic datasets to test your pipeline.

@@ -51,7 +51,7 @@ This code is provided as a PyPI package. To install it, run the following command:
python3 -m pip install continuous-eval
```

if you want to install from source
If you want to install from source:

```bash
git clone https://github.com/relari-ai/continuous-eval.git && cd continuous-eval
@@ -133,11 +133,20 @@ print(metric(**datum))
<td>Deterministic</td>
<td>ToolSelectionAccuracy</td>
</tr>
<tr>
<td>Custom</td>
<td></td>
<td>Define your own metrics</td>
</tr>
</table>

You can also define your own metrics, you only need to extend the [Metric](continuous_eval/metrics/base.py#23) class implementing the `__call__` method.
To define your own metrics, you only need to extend the [Metric](continuous_eval/metrics/base.py#L23C7-L23C13) class and implement the `__call__` method.
Optional methods are `batch` (to implement optimizations for batch processing, where possible) and `aggregate` (to aggregate metric results over multiple samples).
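
For illustration, a custom metric could look like the sketch below. This is a minimal example that assumes `Metric` subclasses receive the datum fields as keyword arguments and return a dictionary of scores; the field names `answer` and `ground_truth_answers` are placeholders for whatever your pipeline produces.

```python
from continuous_eval.metrics.base import Metric


class AnswerLengthRatio(Metric):
    """Illustrative metric: length of the generated answer relative to
    the average length of the ground-truth answers (placeholder logic)."""

    def __call__(self, answer, ground_truth_answers, **kwargs):
        # Average length of the reference answers (guard against empty input).
        avg_ref = sum(len(gt) for gt in ground_truth_answers) / max(len(ground_truth_answers), 1)
        return {"answer_length_ratio": len(answer) / max(avg_ref, 1)}
```

It can then be called like any built-in metric, e.g. `AnswerLengthRatio()(answer="...", ground_truth_answers=["..."])`.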

## Run evaluation on pipeline modules

Define modules in your pipeline and select corresponding metrics.

```python
from continuous_eval.eval import Module, ModuleOutput, Pipeline, Dataset
from continuous_eval.metrics.retrieval import PrecisionRecallF1, RankedRetrievalMetrics
@@ -184,6 +193,7 @@ llm = Module(
)

pipeline = Pipeline([retriever, reranker, llm], dataset=dataset)
print(pipeline.graph_repr()) # optional: visualize the pipeline
```

Now you can run the evaluation on your pipeline:
@@ -204,7 +214,7 @@ To **log** the results, you just need to call the `eval_manager.log` method with the module name and its output:
eval_manager.log("answer_generator", response)
```

the evaluator manager also offers
The evaluator manager also offers the following (see the usage sketch after this list):

- `eval_manager.run_metrics()` to run all the metrics defined in the pipeline
- `eval_manager.run_tests()` to run the tests defined in the pipeline (see the documentation [docs](docs.relari.ai) for more details)
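
Putting these together, a typical loop might look like the sketch below. The dataset iteration and the pipeline calls are placeholders for your own code; only `eval_manager.log`, `eval_manager.run_metrics()` and `eval_manager.run_tests()` are taken from above.

```python
# Hypothetical end-to-end sketch; `my_retriever` and `my_generator` stand in for your pipeline.
for datum in dataset.data:  # assumes the dataset exposes its samples as an iterable
    docs = my_retriever(datum["question"])
    response = my_generator(datum["question"], docs)
    eval_manager.log("retriever", docs)            # log each module's output by name
    eval_manager.log("answer_generator", response)

eval_manager.run_metrics()  # run all the metrics defined in the pipeline
eval_manager.run_tests()    # run the tests defined in the pipeline
```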
Binary file modified docs/public/module-level-eval.png
