Add scripts for g-leaderboard (GENIAC official evaluation) #31

hkiyomaru · 2024-08-28T08:34:11Z

What

Add scripts for evaluating LLMs using g-leaderboard (GENIAC official evaluation).

Related issues

#29

hkiyomaru · 2024-08-29T05:14:53Z

These are scripts for running the GENIAC evaluation locally.

@Taka008 Kodama-san, I think it will be more convenient if you can run this yourself, so could you please verify that it works? Please ask @cr-liu for the Azure OpenAI API endpoint and key.

@YumaTsuta I wrote these with reference to the llm-jp-eval-v1.3.1 scripts. Please review.

evaluation/installers/g-leaderboard/scripts/run_g-leaderboard.sh

evaluation/installers/g-leaderboard/scripts/env_common.sh

evaluation/installers/g-leaderboard/scripts/run_g-leaderboard.sh

ytivy

@hkiyomaru I've reviewed it.
Partly because it appears to require OPENAI_API_KEY, I haven't been able to test it, and I haven't checked the config-related parts in much detail.

evaluation/installers/g-leaderboard/scripts/env_common.sh

ytivy · 2024-08-29T08:26:15Z

It would be nice to have a wrapper for install.sh. Shall we do that when we have time?

Taka008 · 2024-08-29T08:26:46Z

I'll test it on my side once @YumaTsuta's fixes have been incorporated.

Co-authored-by: YumaTsuta <67862948+YumaTsuta@users.noreply.github.com>

evaluation/installers/g-leaderboard/README.md

hkiyomaru · 2024-08-29T09:03:48Z

@Taka008 The fixes are done, so please go ahead and test it. (I've confirmed that it works on my end.)

ytivy · 2024-08-29T10:32:20Z

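@hkiyomaru I noticed that the GPU-related module load calls are unnecessary, and I'm fixing the evaluation scripts (v1.4.0) accordingly (already tested). It is fine to apply the same change here, though I'm sorry for the extra work if you do.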

hkiyomaru · 2024-08-30T02:28:42Z

I've incorporated the llm-jp-eval v1.4.0 changes. Tested and working.

Taka008 · 2024-08-30T10:39:05Z

Is there a sample resources/config_base.yaml for the v3 series?

hkiyomaru · 2024-08-31T07:56:51Z

> Is there a sample resources/config_base.yaml for the v3 series?

The current one is intended for the v3 series. As for model size, it assumes 172B, and by default it reserves 8 GPUs for the MT Bench evaluation.

Taka008 · 2024-08-31T08:45:59Z

I tried running it on 1.7B v3, but there were quite a few empty answers, so I wondered whether I had done something wrong.
Is this how it is supposed to look?

hkiyomaru · 2024-08-31T15:40:44Z

Are there really that many empty answers? Jaster 4-shot has 0 empty answers, so it looks fine to me.

Taka008 · 2024-08-31T15:43:20Z

I was looking at MT-bench.

hkiyomaru · 2024-08-31T23:47:43Z

172B-instruct (55k steps) did not have the empty-answer problem.

https://wandb.ai/nii-geniac/llm-leaderboard/runs/8xrr9dqg

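It's hard to imagine that the model is emitting EOS right away, so perhaps it is emitting the conversation separator (###) and the output is being truncated? Either way, I suspect the problem stems from the base model lacking instruction-following ability.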
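A minimal sketch of the truncation hypothesis above, not the actual MT-bench post-processing used by g-leaderboard (which is not shown in this thread). It assumes the generated text is cut at the first occurrence of the separator ### and then stripped; under that assumption, a model that emits the separator before any answer text yields an empty answer. The function name and the sample strings are hypothetical.

```python
def truncate_at_separator(text: str, separator: str = "###") -> str:
    """Keep only the text before the first separator, then strip whitespace.

    Assumption: this mimics a stop-string style cut; the real pipeline may differ.
    """
    return text.split(separator, 1)[0].strip()


# Hypothetical raw generations from a base (non-instruction-tuned) model.
raw_outputs = [
    "### User: next question",        # separator emitted first
    "Tokyo. ### User: next question",  # answer, then separator
    "",                                # immediate EOS
]

answers = [truncate_at_separator(output) for output in raw_outputs]

# Count answers that end up empty after truncation.
empty_count = sum(1 for answer in answers if not answer)
print(f"{empty_count} of {len(answers)} answers are empty")
```

Comparing the number of empty answers before and after truncation on the real generations would help tell the separator case apart from an immediate EOS.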

Taka008 · 2024-09-01T10:10:02Z

I tried running the tuned 13B v3 exp4, and indeed there seems to be no problem.
https://wandb.ai/llm-jp-eval/test/runs/wouna8fd

Taka008 · 2024-09-03T01:01:48Z

@hkiyomaru
I've confirmed that it works in both the mdx and sakura environments. I've approved the PR.

hkiyomaru · 2024-09-03T01:41:08Z

Merging.

hkiyomaru added 12 commits August 28, 2024 17:32

[wip] add g-leaderboard

5c6f082

add OpenAI-related environment variables

ab8acba

configure blended run

683fd4c

fix

d1a5060

fix indent

08f3fb3

fix

04130c0

use env command

820cba1

fix envvar name

1f38534

update readme

392fe00

update documentation

347f71a

update documentation

e36abfc

update documentation

93cc663

hkiyomaru requested review from Taka008 and ytivy August 29, 2024 05:09

hkiyomaru marked this pull request as ready for review August 29, 2024 05:09

hkiyomaru added 2 commits August 29, 2024 14:22

update readme

44ee980

fix mtbench.model_id

33d3827

ytivy reviewed Aug 29, 2024

evaluation/installers/g-leaderboard/scripts/run_g-leaderboard.sh (outdated)

ytivy reviewed Aug 29, 2024

evaluation/installers/g-leaderboard/scripts/env_common.sh

ytivy reviewed Aug 29, 2024

evaluation/installers/g-leaderboard/scripts/run_g-leaderboard.sh (outdated)

ytivy reviewed Aug 29, 2024

evaluation/installers/g-leaderboard/scripts/env_common.sh (outdated)

hkiyomaru and others added 4 commits August 29, 2024 17:34

Update evaluation/installers/g-leaderboard/scripts/run_g-leaderboard.sh

5191baa

Co-authored-by: YumaTsuta <67862948+YumaTsuta@users.noreply.github.com>

Update evaluation/installers/g-leaderboard/scripts/env_common.sh

f11a2ee

Co-authored-by: YumaTsuta <67862948+YumaTsuta@users.noreply.github.com>

hardcode to use g-leaderboard branch

b39adaf

deploy blended run condig during installation

7f1fb58

ytivy reviewed Aug 29, 2024

evaluation/installers/g-leaderboard/README.md (outdated)

update readme

a69083b

remove env-specific process

c8050c1

Taka008 approved these changes Sep 3, 2024

hkiyomaru merged commit 68a5d31 into main Sep 3, 2024

hkiyomaru deleted the g-leaderboard branch September 3, 2024 01:41
