
Feat:Add RAG Benchmark method #1193

Merged — 50 commits merged into geekan:main on Apr 25, 2024

Conversation

@YangQianli92 (Contributor) commented Apr 15, 2024

Features

  • New MetaGPT-RAG assessment module, covering RougeL, Bleu, Recall, Hit Rate, MRR and other evaluation metrics (a minimal sketch of the Hit Rate/MRR computation follows this list).
  • Makes it easy to review the effect of the different RAG modules.
  • Supports custom evaluation datasets; follow the provided sample to structure your own data.
  • Added reranker support for Cohere and FlagEmbedding.
  • Based on the above, we evaluated the various RAG components of MetaGPT; the settings used are listed below and the results are shown in the attached figures:
    • LLM: chatgpt-3.5-1106-turbo
    • Embedding: text-embedding-3-small
    • chunk_size: 256
    • chunk_overlap: 0
    • similarity_top_k: 5
    • ranker_top_n: 3
      [evaluation result figures]
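
For reference, here is a minimal sketch of how Hit Rate and MRR are conventionally computed from retrieval results. This is the textbook definition, not necessarily the exact implementation in metagpt/rag/benchmark/base.py; the document IDs and queries below are hypothetical.

```python
# Hedged sketch: standard Hit Rate / MRR definitions, not the PR's actual code.
def hit(retrieved: list[str], ground_truths: list[str]) -> float:
    """1.0 if any ground-truth chunk appears among the retrieved chunks, else 0.0."""
    return 1.0 if any(gt in retrieved for gt in ground_truths) else 0.0

def reciprocal_rank(retrieved: list[str], ground_truths: list[str]) -> float:
    """1 / rank of the first retrieved chunk matching a ground truth (0.0 if no match)."""
    for rank, chunk in enumerate(retrieved, start=1):
        if chunk in ground_truths:
            return 1.0 / rank
    return 0.0

# Per-query scores are averaged over the dataset to obtain the reported Hit Rate and MRR.
queries = [
    (["doc_a", "doc_b", "doc_c"], ["doc_b"]),  # hypothetical retrieval result / ground truth
    (["doc_x", "doc_y", "doc_z"], ["doc_q"]),
]
mrr = sum(reciprocal_rank(r, g) for r, g in queries) / len(queries)
hit_rate = sum(hit(r, g) for r, g in queries) / len(queries)
print(f"MRR={mrr:.3f}, HitRate={hit_rate:.3f}")  # MRR=0.250, HitRate=0.500
```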

codecov-commenter commented Apr 15, 2024

Codecov Report

Attention: Patch coverage is 8.41121%, with 98 lines in your changes missing coverage. Please review.

Project coverage is 70.26%. Comparing base (933d6c1) to head (debe6b0).
Report is 22 commits behind head on main.

Files Patch % Lines
metagpt/rag/benchmark/base.py 0.00% 86 Missing ⚠️
metagpt/rag/factories/ranker.py 16.66% 10 Missing ⚠️
metagpt/rag/benchmark/__init__.py 0.00% 2 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1193      +/-   ##
==========================================
- Coverage   70.60%   70.26%   -0.34%     
==========================================
  Files         314      316       +2     
  Lines       18714    18821     +107     
==========================================
+ Hits        13213    13225      +12     
- Misses       5501     5596      +95     


Resolved (now outdated) review threads on: metagpt/rag/factories/ranker.py, metagpt/rag/schema.py, metagpt/rag/benchmark/base.py, examples/rag_bm.py
@geekan (Owner) commented Apr 22, 2024

/review


PR Review

⏱️ Estimated effort to review [1-5]

4, due to the extensive amount of new code across multiple files, involving complex functionalities such as data retrieval, ranking, and evaluation metrics. The PR integrates new features and configurations which require careful review to ensure correctness and performance.

🧪 Relevant tests

No

🔍 Possible issues

Possible Bug: The method rag_evaluate_single in rag_bm.py might return incorrect metrics if an exception is thrown and caught. The method catches all exceptions and returns a default metric set which might not accurately reflect the error state or provide meaningful feedback for debugging.

Performance Concern: The extensive use of synchronous file I/O operations and potentially large data processing in loops could lead to performance bottlenecks, especially noticeable when processing large datasets or when used in a high-latency network environment.

🔒 Security concerns

No

Code feedback:
relevant file: examples/rag_bm.py
suggestion:

Consider implementing more granular exception handling in the rag_evaluate_single method to differentiate between different types of errors (e.g., network issues, data format errors) and handle them appropriately. This will improve the robustness and debuggability of the module. [important]

relevant line: except Exception as e:
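
A possible shape for such handling, sketched with illustrative exception types and a placeholder metric dict rather than the module's actual API:

```python
import json
import logging

logger = logging.getLogger(__name__)

# Placeholder metric shape; the real default metrics in rag_bm.py may differ.
DEFAULT_METRICS = {"bleu": 0.0, "rougel": 0.0, "recall": 0.0, "hit_rate": 0.0, "mrr": 0.0}

def evaluate_single_guarded(evaluate_fn, question: str, reference: str) -> dict:
    """Run one evaluation, mapping distinct failure classes to distinct error tags."""
    try:
        return evaluate_fn(question, reference)
    except (ConnectionError, TimeoutError) as e:               # network / LLM endpoint issues
        logger.warning("network error for %r: %s", question, e)
        return {**DEFAULT_METRICS, "error": "network"}
    except (json.JSONDecodeError, KeyError, ValueError) as e:  # malformed dataset rows
        logger.error("data format error for %r: %s", question, e)
        return {**DEFAULT_METRICS, "error": "data_format"}
```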

relevant file: examples/rag_bm.py
suggestion:

To enhance performance, consider using asynchronous file operations or a more efficient data handling mechanism to manage I/O operations, especially when loading or writing large datasets in the rag_evaluate_pipeline method. [important]

relevant line: write_json_file((EXAMPLE_BENCHMARK_PATH / dataset.name / "bm_result.json").as_posix(), results, "utf-8")
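
One way to act on this without changing the result format is to push the blocking write onto a worker thread. A minimal sketch, assuming a plain synchronous JSON write as a stand-in for the repo's write_json_file helper:

```python
import asyncio
import json
from pathlib import Path

def write_json_file_sync(path: str, data, encoding: str = "utf-8") -> None:
    """Blocking JSON write (stand-in for the repo's write_json_file helper)."""
    Path(path).write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding=encoding)

async def write_results_async(path: str, results) -> None:
    # Offload the blocking write to a thread so the evaluation event loop stays responsive.
    await asyncio.to_thread(write_json_file_sync, path, results)

# Usage inside the pipeline would then look roughly like:
# await write_results_async((EXAMPLE_BENCHMARK_PATH / dataset.name / "bm_result.json").as_posix(), results)
```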

relevant file: metagpt/rag/benchmark/base.py
suggestion:

Optimize the compute_metric method by caching results of expensive operations like bleu_score and rougel_score if the same responses and references are being evaluated multiple times. This can significantly reduce computation time in scenarios with repetitive data. [medium]

relevant line: bleu_avg, bleu1, bleu2, bleu3, bleu4 = self.bleu_score(response, reference)
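
If identical (response, reference) pairs recur, memoizing the scorers is straightforward. A sketch with a stand-in scorer (the real bleu_score/rougel_score live in base.py and return richer tuples):

```python
from functools import lru_cache

def _expensive_score(response: str, reference: str) -> float:
    """Stand-in for an expensive scorer such as bleu_score or rougel_score."""
    shared = set(response.split()) & set(reference.split())
    return len(shared) / max(len(reference.split()), 1)

@lru_cache(maxsize=4096)
def cached_score(response: str, reference: str) -> float:
    # Identical (response, reference) pairs are computed once, then served from cache.
    return _expensive_score(response, reference)
```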

relevant file: examples/rag_bm.py
suggestion:

Refactor the rag_evaluate_pipeline method to break down its functionality into smaller, more manageable functions. This improves modularity and makes the code easier to maintain and test. [medium]

relevant line: async def rag_evaluate_pipeline(self, dataset_name: list[str] = ["all"]):
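
A hypothetical decomposition along those lines (function names are illustrative and the bodies are elided). As a side note, the mutable default `["all"]` in the signature above is usually better replaced with `None` or a tuple:

```python
# Hypothetical split of the pipeline into small, independently testable steps.
async def load_dataset(name: str) -> list[dict]:
    ...  # read questions/references for one dataset

async def evaluate_dataset(rows: list[dict]) -> list[dict]:
    ...  # run retrieval + generation + metric computation per row

async def save_results(name: str, results: list[dict]) -> None:
    ...  # persist results, e.g. bm_result.json

async def rag_evaluate_pipeline(dataset_names: tuple[str, ...] = ("all",)) -> None:
    for name in dataset_names:
        rows = await load_dataset(name)
        results = await evaluate_dataset(rows)
        await save_results(name, results)
```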


✨ Review tool usage guide:

Overview:
The review tool scans the PR code changes and generates a PR review that includes several types of feedback, such as possible PR issues, security threats, and relevant tests in the PR. More feedback types can be added by configuring the tool.

The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on any PR.

  • When commenting, to edit configurations related to the review tool (pr_reviewer section), use the following template:
/review --pr_reviewer.some_config1=... --pr_reviewer.some_config2=...
[pr_reviewer]
some_config1=...
some_config2=...

See the review usage page for a comprehensive guide on using this tool.

@better629 (Collaborator)
lgtm

geekan merged commit 2476672 into geekan:main on Apr 25, 2024
0 of 3 checks passed
@YangQianli92 (Contributor, Author) commented Apr 26, 2024

In the PR above, there is a slight error in the MRR calculation of the benchmark metrics. I have submitted another PR, #1228, to fix this bug, and all results have been recalculated after the fix.
