Conversation
## Amazon_Fashion · Ranking Evaluation (1000 samples)
You could create an "examples" folder in the RecLM-eval root directory and put the shell script that produces this result there, so that others can reproduce it easily, e.g. "amazon_fashion_ranking_evalaution.sh".
At the moment only the Ranking results seem to be here? Please also add the two result tables for cf_ranking_mc and seq_ranking_mc.
The two result tables for cf_ranking_mc and seq_ranking_mc have been added. As for "examples", does that mean we should also upload a backup copy of our evaluation data?
```json
{"text-embedding-3-small": {"input": 0.02, "output": 0}}
{"text-embedding-3-large": {"input": 0.13, "output": 0}}
{"ada v2": {"input": 0.1, "output": 0}}
{"gpt-4.1": {"input": 3.0, "output": 12.0}}
```
Please update the prices in this file against the current OpenAI pricing page; most of them are outdated.
The chatgpt-4o-latest row can be removed, since it is unclear which model it refers to.
Only keep gpt-35-turbo, gpt-4.1, gpt-4o, gpt-4o-mini, text-embedding-3-small, and text-embedding-3-large.
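A sketch of the trimmed pricing map the review asks for. The prices shown are copied from the entries already in the file; the `None` placeholders mark values that must be filled in from the OpenAI pricing page and are deliberately not invented here:

```python
# Trimmed pricing map (per-1M-token prices, matching the existing file's units).
# Entries with None still need their current price from the OpenAI pricing page.
PRICING = {
    "gpt-35-turbo": {"input": None, "output": None},          # fill from pricing page
    "gpt-4.1": {"input": 3.0, "output": 12.0},                # as in the existing file
    "gpt-4o": {"input": None, "output": None},                # fill from pricing page
    "gpt-4o-mini": {"input": None, "output": None},           # fill from pricing page
    "text-embedding-3-small": {"input": 0.02, "output": 0},   # as in the existing file
    "text-embedding-3-large": {"input": 0.13, "output": 0},   # as in the existing file
}
```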
```python
data["prompt"],
tokenize=False,
add_generation_prompt=True,
enable_thinking=False  # <- turn off thought mode
```
For non-thinking models such as Llama 3.1 8B, will passing enable_thinking=False raise an exception?
enable_thinking=False only takes effect for tokenizers that support the parameter (e.g. the Qwen series); if the model's tokenizer does not accept the keyword at all, Transformers raises a TypeError/ValueError/AttributeError. The code already catches these exceptions with try/except; once caught, it falls back to simple string concatenation and skips apply_chat_template entirely, which fully avoids the parameter incompatibility. I also verified that evaluation results are identical with and without the parameter.
"If the model's tokenizer does not have the keyword at all, Transformers raises a TypeError/ValueError/AttributeError" — then that approach is wrong. We cannot go down that path: we must use apply_chat_template and must not concatenate the prompt ourselves. Instead, first check with an if condition plus hasattr() whether the tokenizer has the enable_thinking variable, and only then decide whether to pass it.
```python
text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode()
return re.sub(r'[^a-z0-9]', '', text.lower())


def _map_titles(answer_line: str, candidates: list[str]) -> list[str]:
```
```python
# filter history
mapped_titles = [t for t in mapped_titles if t not in history]
# pad if necessary with remaining candidates
if len(mapped_titles) < 20:
```
Please add some English comments to the larger code blocks.
Also, why is there a magic number 20 here? It should be defined as a variable so that different configurations can be supported flexibly.
A script argument has been added to control this value. It was initially set to 20 because our metrics use k = 1, 5, 10, and 20, so the list length was set to 20.
```python
## If you use customized deployment names, don't forget to add them to this list
OPENAI_MODELS = ["gpt-35-turbo", "gpt-3.5-turbo", "gpt-4", "gpt-4-turbo",
```
Please make the corresponding change here as well; only the models listed above need to be supported.
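A sketch of the trimmed model list this comment asks for, keeping only the names listed in the pricing-file comment above (an illustration of the requested change, not the merged code):

```python
## If you use customized deployment names, don't forget to add them to this list
OPENAI_MODELS = [
    "gpt-35-turbo",
    "gpt-4.1",
    "gpt-4o",
    "gpt-4o-mini",
    "text-embedding-3-small",
    "text-embedding-3-large",
]
```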
```diff
-    --bench-name steam \
-    --model_path_or_name NousResearch/Hermes-3-Llama-3.1-8B \
+    --bench-name "${DATASETS[@]}" \
+    --model_path_or_name /home/data/model/qwen3-8B \
```
If the model is not one you finetuned yourself, use the original Hugging Face identifier, e.g. "Qwen/Qwen3-8B" here.
```
# Optional acceleration
xformers==0.0.31
triton==3.3.1  # CUDA kernels for xformers
# flash-attn must be installed manually into an existing torch environment; run the following after installing requirements:
```
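The comment above describes a two-step install order: flash-attn compiles against an already-installed torch, so it cannot go in the same `pip install -r` pass. A setup sketch (the `--no-build-isolation` flag is the install mode documented by the flash-attn project; the requirements filename is assumed):

```shell
# Install torch and the other pinned dependencies first,
# then build flash-attn against the torch that is now present.
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```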
* change the task of ranking and add two tasks.
* new change
* new changes
* new changes
* change something here

Co-authored-by: LINJH00 <2020043053@email.szu.edu.cn>
Description
1. Added two new tasks, cf_ranking_mc and seq_ranking_mc, along with two metrics:
   - acc@1: the accuracy of the current evaluation task
   - none_ratio: the proportion of cases that cannot be recognized by our defined rules
2. Changed the ranking task.
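The two metrics described above can be sketched as follows. This is a hypothetical illustration, not the PR's code: the actual recognition rules live in the evaluation scripts, and here a prediction of `None` simply stands for an answer the rules could not recognize:

```python
def acc_at_1(preds, labels):
    """Fraction of samples whose top-1 prediction matches the label.
    Unrecognized predictions (None) count as wrong."""
    correct = sum(p is not None and p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def none_ratio(preds):
    """Fraction of predictions the defined rules failed to recognize."""
    return sum(p is None for p in preds) / len(preds)
```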
Checklist:
- [ ] This PR is targeted to the dev branch AND NOT TO the main branch.