RecLM-eval by LINJH00 · Pull Request #104 · microsoft/RecAI

LINJH00 · 2025-09-11T08:02:28Z

Description

New Tasks Added
1. Two new tasks have been added, cf_ranking_mc and seq_ranking_mc, along with two metrics:
   - acc@1: the top-1 accuracy on the current evaluation task
   - none_ratio: the proportion of cases that our defined rules cannot recognize
2. Changed the ranking task.
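A minimal sketch of how the two new metrics could be computed, assuming each model answer has already been parsed into a single option, or `None` when the parsing rules fail. The function name `mc_metrics`, the `None` convention, and counting unparseable answers as wrong in acc@1 are assumptions for illustration, not the PR's implementation.

```python
from typing import Optional, Sequence


def mc_metrics(predictions: Sequence[Optional[str]], targets: Sequence[str]) -> dict[str, float]:
    """Compute acc@1 and none_ratio for a multiple-choice ranking task.

    predictions[i] is the option parsed from the model's answer, or None when
    the parsing rules could not recognize any option (assumed convention).
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    n = len(targets)
    if n == 0:
        return {"acc@1": 0.0, "none_ratio": 0.0}
    # A prediction is correct only if it was parsed and matches the target.
    correct = sum(p is not None and p == t for p, t in zip(predictions, targets))
    # Count answers the rules could not recognize.
    unparsed = sum(p is None for p in predictions)
    # Assumption: unparseable answers count as wrong in acc@1 and are also reported separately.
    return {"acc@1": correct / n, "none_ratio": unparsed / n}
```

Reporting none_ratio next to acc@1 makes it visible whether a low accuracy comes from wrong choices or from answers the rules could not parse.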
Checklist:

[√] I have added a description accordingly.
[√] This PR is being made to dev branch AND NOT TO main branch.

Leavingseason · 2025-09-17T02:40:24Z

-```
+```
+
+## Amazon_Fashion · Ranking Evaluation (1000 samples)


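You could create an "examples" folder in the RecLM-eval root directory and put the shell script that produces these results there, so that others can reproduce them. For example, "amazon_fashion_ranking_evalaution.sh".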

It looks like only the Ranking results are shown so far? Please also add the two result tables for cf_ranking_mc and seq_ranking_mc.

I've added the two result tables for cf_ranking_mc and seq_ranking_mc. Regarding "examples", do you mean we should upload a backup copy of our evaluation data?

Leavingseason · 2025-09-17T02:44:19Z

 {"text-embedding-3-small": {"input": 0.02, "output": 0}}
 {"text-embedding-3-large": {"input": 0.13, "output": 0}}
 {"ada v2": {"input": 0.1, "output": 0}}
+{"gpt-4.1": {"input": 3.0, "output": 12.0}}


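Could you update all the prices in this file to match the official OpenAI pricing? Most of them are outdated.
The chatgpt-4o-latest line can be removed; it's unclear which model it refers to.
Only keep gpt-35-turbo, gpt-4.1, gpt-4o, gpt-4o-mini, text-embedding-3-small, and text-embedding-3-large.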

Updated.
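The price file appears to hold one JSON object per line, mapping a model name to its input and output price. A minimal sketch of loading such a file, keeping only the models listed above, and estimating the cost of a run. The names `load_prices`, `estimate_cost`, and `KEEP_MODELS` are hypothetical, and treating the values as USD per 1M tokens is an assumption; the PR does not state the unit.

```python
import json
from typing import Iterable

# Models the reviewer asked to keep in the price table.
KEEP_MODELS = {
    "gpt-35-turbo", "gpt-4.1", "gpt-4o", "gpt-4o-mini",
    "text-embedding-3-small", "text-embedding-3-large",
}


def load_prices(lines: Iterable[str]) -> dict[str, dict[str, float]]:
    """Parse a JSON-lines price table: one {"model": {"input": x, "output": y}} per line."""
    prices: dict[str, dict[str, float]] = {}
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            prices.update(json.loads(line))
    # Drop any model that is not on the keep-list.
    return {m: p for m, p in prices.items() if m in KEEP_MODELS}


def estimate_cost(
    prices: dict[str, dict[str, float]],
    model: str,
    input_tokens: int,
    output_tokens: int,
    per_tokens: int = 1_000_000,
) -> float:
    """Estimated USD cost, assuming prices are quoted per `per_tokens` tokens."""
    p = prices[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / per_tokens
```

Filtering at load time keeps stale entries (such as "ada v2") from being used by accident even if they remain in the file.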

Leavingseason · 2025-09-17T02:51:21Z

+                    data["prompt"],
+                    tokenize=False,
+                    add_generation_prompt=True,
+                    enable_thinking=False  # <- turn off thought mode


For non-thinking models such as Llama 3.1 8B, will passing enable_thinking=False raise an exception?

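enable_thinking=False only takes effect in tokenizers that support this parameter (such as the Qwen series); if the model's tokenizer does not recognize the keyword at all, Transformers will raise a TypeError/ValueError/AttributeError. The code already catches these exceptions with try/except; once one is caught, it falls back to simple string concatenation and no longer calls apply_chat_template, which avoids the parameter incompatibility entirely. I also tested models with and without this parameter, and the evaluation results were the same.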

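"If the model's tokenizer does not recognize the keyword at all, Transformers will raise a TypeError/ValueError/AttributeError": then this approach is not right, and we can't go down this path. We must use apply_chat_template rather than concatenating the prompt ourselves. Here you need to first use an if condition with hasattr() to check whether the tokenizer has an enable_thinking attribute, and only then decide whether to pass it in.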
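The reviewer's suggestion (always go through apply_chat_template and only pass enable_thinking when the tokenizer supports it) can be sketched as follows. This is a minimal sketch, not the merged code: `supports_enable_thinking` and `build_prompt` are hypothetical names. Besides the hasattr() check the reviewer asked for, it also searches the tokenizer's `chat_template` for the string "enable_thinking"; that second check is an added assumption, based on our understanding that the keyword is read by the Jinja chat template (as in the Qwen3 templates) rather than stored as a tokenizer attribute.

```python
from typing import Any


def supports_enable_thinking(tokenizer: Any) -> bool:
    """Decide whether to pass enable_thinking to apply_chat_template."""
    # Check requested in review: an explicit attribute on the tokenizer.
    if hasattr(tokenizer, "enable_thinking"):
        return True
    # Assumption: the flag is consumed by the chat template, so look for it there.
    template = getattr(tokenizer, "chat_template", None)
    # chat_template may be None, a single string, or a dict of named templates.
    if isinstance(template, dict):
        template = " ".join(str(t) for t in template.values())
    return isinstance(template, str) and "enable_thinking" in template


def build_prompt(tokenizer: Any, messages: list[dict[str, str]]) -> str:
    """Always render via apply_chat_template; never concatenate strings by hand."""
    kwargs: dict[str, Any] = {"tokenize": False, "add_generation_prompt": True}
    if supports_enable_thinking(tokenizer):
        kwargs["enable_thinking"] = False  # turn off thinking mode only where supported
    return tokenizer.apply_chat_template(messages, **kwargs)
```

With a template that references enable_thinking, the flag is passed; with a template that never mentions it (presumably the case for Llama 3.1), the call is made without it, so there is no exception to catch and no manual fallback.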

Leavingseason · 2025-09-17T02:53:02Z

+        text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode()
+        return re.sub(r'[^a-z0-9]', '', text.lower())
+
+    def _map_titles(answer_line: str, candidates: list[str]) -> list[str]:


Please add some English comments.

Leavingseason · 2025-09-17T02:55:20Z

+            # filter history
+            mapped_titles = [t for t in mapped_titles if t not in history]
+            # pad if necessary with remaining candidates
+            if len(mapped_titles) < 20:


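Please add English comments to all large blocks of code.
Why is there a magic number 20 here? It should be defined as a variable so that different sizes can be configured flexibly.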

I added a parameter to the script to control this variable. It was originally set to 20 because our metrics use k = 1, 5, 10, and 20.
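A minimal sketch of the filter-then-pad step with the cutoff exposed as a parameter instead of the literal 20. The function name `finalize_ranking`, the parameter name `top_k`, and the deduplication step are hypothetical; the default of 20 is the largest metric cutoff mentioned above.

```python
from typing import Sequence

# Cutoffs k used by the ranking metrics, per the discussion above.
METRIC_KS = (1, 5, 10, 20)


def finalize_ranking(
    mapped_titles: Sequence[str],
    candidates: Sequence[str],
    history: Sequence[str],
    top_k: int = max(METRIC_KS),
) -> list[str]:
    """Filter out history items, then pad with unused candidates up to top_k."""
    history_set = set(history)
    seen: set[str] = set()
    ranked: list[str] = []
    # Keep the model's order, dropping items already in the user's history
    # (and, as an extra assumption, any duplicates).
    for title in mapped_titles:
        if title in history_set or title in seen:
            continue
        seen.add(title)
        ranked.append(title)
    # Pad with the remaining candidates, in their original order, until top_k.
    for cand in candidates:
        if len(ranked) >= top_k:
            break
        if cand not in history_set and cand not in seen:
            seen.add(cand)
            ranked.append(cand)
    return ranked[:top_k]
```

Deriving the default from the metric cutoffs keeps the list long enough for the largest k, while the script argument can still override it.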

Leavingseason · 2025-09-17T02:56:06Z

@@ -15,11 +15,53 @@

 ## If you use customerized deployment names, don't forget to add them to this list
 OPENAI_MODELS = ["gpt-35-turbo", "gpt-3.5-turbo", "gpt-4", "gpt-4-turbo", 


Please update this accordingly as well; it only needs to support the models mentioned above.

Done.

Leavingseason · 2025-09-17T02:59:43Z

-            --bench-name steam \
-            --model_path_or_name NousResearch/Hermes-3-Llama-3.1-8B \
+            --bench-name "${DATASETS[@]}" \
+            --model_path_or_name /home/data/model/qwen3-8B \


Unless it is a model you fine-tuned yourself, please use the original Hugging Face model ID, e.g. "Qwen/Qwen3-8B" here.

Done.

Leavingseason · 2025-09-17T03:01:05Z

+# Optional acceleration
+xformers==0.0.31
+triton==3.3.1      # CUDA kernels for xformers
+# flash-attn must be installed manually into an existing torch environment; after installing the requirements, run:


All Chinese text needs to be translated into English.

Done.

* change the task of ranking and add two tasks.
* new change
* new changes
* new changes
* change something here

Co-authored-by: LINJH00 <2020043053@email.szu.edu.cn>

LINJH00 added 2 commits September 10, 2025 16:20

change the task of ranking and add two tasks.

9b281c0

new change

7a808db

Leavingseason suggested changes Sep 17, 2025


LINJH00 added 3 commits September 17, 2025 22:38

new changes

d1c9314

new changes

ea9bb80

change something here

d2d0a57

Leavingseason merged commit 35939d5 into microsoft:dev Sep 28, 2025
1 check passed



LINJH00 commented Sep 11, 2025

Leavingseason Sep 18, 2025 • edited



2 participants
