[Docs] Update docs for new entry script #246

Merged 7 commits on Aug 31, 2023
31 changes: 29 additions & 2 deletions README.md
@@ -325,9 +325,36 @@ Some third-party features, like Humaneval and Llama, may require additional step

## 🏗️ ️Evaluation

Make sure you have installed OpenCompass correctly and prepared your datasets according to the above steps. Please read the [Quick Start](https://opencompass.readthedocs.io/en/latest/get_started.html#quick-start) to learn how to run an evaluation task.
After ensuring that OpenCompass is installed correctly according to the above steps and the datasets are prepared, you can evaluate the performance of the LLaMA-7b model on the MMLU and C-Eval datasets using the following command:

For more tutorials, please check our [Documentation](https://opencompass.readthedocs.io/en/latest/index.html).
```bash
python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl
```

OpenCompass has predefined configurations for many models and datasets. You can list all available model and dataset configurations using the [tools](./docs/en/tools.md#list-configs).

```bash
# List all configurations
python tools/list_configs.py
# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu
```
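To make the keyword filtering concrete, here is a minimal sketch of how such a listing could work. It is not the implementation of `tools/list_configs.py`; the `list_configs` function, the substring matching rule, and the `models/` and `datasets/` layout under a root directory are all assumptions for illustration.

```python
from pathlib import Path
import tempfile


def list_configs(root, patterns=()):
    """Return config names found under root/models and root/datasets.

    Hypothetical helper: a name is kept when no patterns are given, or when
    at least one pattern occurs in it as a substring.
    """
    result = {}
    for kind in ("models", "datasets"):
        names = sorted(
            p.stem
            for p in (Path(root) / kind).rglob("*.py")
            if not p.stem.startswith("_")  # skip files such as __init__.py
        )
        if patterns:
            names = [n for n in names if any(pat in n for pat in patterns)]
        result[kind] = names
    return result


if __name__ == "__main__":
    # Build a throwaway directory tree so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        for rel in (
            "models/hf_llama_7b.py",
            "models/hf_opt_125m.py",
            "datasets/mmlu/mmlu_ppl.py",
            "datasets/ceval/ceval_ppl.py",
        ):
            path = Path(tmp) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
        print(list_configs(tmp, ["llama", "mmlu"]))
```

The names returned this way correspond to the values passed to `--models` and `--datasets` in the command above.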

You can also evaluate other HuggingFace models from the command line. Taking LLaMA-7b as an example:

```bash
# --hf-path           HuggingFace model path
# --model-kwargs      Arguments for model construction
# --tokenizer-kwargs  Arguments for tokenizer construction
# --max-out-len       Maximum number of tokens generated
# --max-seq-len       Maximum sequence length the model can accept
# --batch-size        Batch size
# --no-batch-padding  Don't enable batch padding; infer through a for loop to avoid performance loss
# --num-gpus          Number of required GPUs
python run.py --datasets ceval_ppl mmlu_ppl \
    --hf-path huggyllama/llama-7b \
    --model-kwargs device_map='auto' \
    --tokenizer-kwargs padding_side='left' truncation='left' use_fast=False \
    --max-out-len 100 \
    --max-seq-len 2048 \
    --batch-size 8 \
    --no-batch-padding \
    --num-gpus 1
```
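The `--model-kwargs` and `--tokenizer-kwargs` flags take `key=value` pairs. As a rough sketch of how such pairs could be turned into a keyword dictionary, the hypothetical `parse_kwargs` helper below decodes Python literals where it can and keeps everything else as a string. This is an assumption about the general approach, not the parsing code in `run.py`.

```python
import ast


def parse_kwargs(pairs):
    """Turn ['key=value', ...] into a dict (hypothetical helper).

    Values that are valid Python literals (False, 8, 'left') are decoded;
    bare words such as auto or left stay as plain strings.
    """
    kwargs = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        try:
            kwargs[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            kwargs[key] = raw
    return kwargs


if __name__ == "__main__":
    # The shell strips the quotes, so the program receives padding_side=left.
    print(parse_kwargs(["padding_side=left", "truncation=left", "use_fast=False"]))
```

Note that the shell removes the single quotes before the program sees the arguments, which is why the sketch falls back to a plain string when a value is not a literal.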

Through the command line or configuration files, OpenCompass also supports evaluating APIs or custom models, as well as more diversified evaluation strategies. Please read the [Quick Start](https://opencompass.readthedocs.io/en/latest/get_started.html) to learn how to run an evaluation task.

## 🔜 Roadmap

31 changes: 30 additions & 1 deletion README_zh-CN.md
@@ -326,7 +326,36 @@ unzip OpenCompassData.zip

## 🏗️ ️Evaluation

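After making sure that OpenCompass is installed correctly and the datasets are prepared according to the steps above, please read the [Quick Start](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id3) to learn how to run an evaluation task.
After making sure that OpenCompass is installed correctly and the datasets are prepared according to the steps above, you can evaluate the performance of the LLaMA-7b model on the MMLU and C-Eval datasets with the following command: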

```bash
python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl
```

OpenCompass has predefined configurations for many models and datasets. You can list all available model and dataset configurations using the [tools](./docs/zh_cn/tools.md#ListConfigs).

```bash
# List all configurations
python tools/list_configs.py
# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu
```

You can also evaluate other HuggingFace models from the command line. Again taking LLaMA-7b as an example:

```bash
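# --hf-path           HuggingFace model path
# --model-kwargs      Arguments for model construction
# --tokenizer-kwargs  Arguments for tokenizer construction
# --max-out-len       Maximum number of tokens generated
# --max-seq-len       Maximum sequence length the model can accept
# --batch-size        Batch size
# --no-batch-padding  Don't enable batch padding; infer through a for loop to avoid accuracy loss
# --num-gpus          Number of required GPUs
python run.py --datasets ceval_ppl mmlu_ppl \
    --hf-path huggyllama/llama-7b \
    --model-kwargs device_map='auto' \
    --tokenizer-kwargs padding_side='left' truncation='left' use_fast=False \
    --max-out-len 100 \
    --max-seq-len 2048 \
    --batch-size 8 \
    --no-batch-padding \
    --num-gpus 1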
```

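Through the command line or configuration files, OpenCompass also supports evaluating APIs or custom models, as well as more diversified evaluation strategies. Please read the [Quick Start](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id3) to learn how to run an evaluation task.

For more tutorials, please check our [Documentation](https://opencompass.readthedocs.io/zh_CN/latest/index.html).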

49 changes: 4 additions & 45 deletions configs/eval_demo.py
@@ -1,51 +1,10 @@
from mmengine.config import read_base

with read_base():
    from .datasets.winograd.winograd_ppl import winograd_datasets
    from .datasets.siqa.siqa_gen import siqa_datasets
    from .datasets.winograd.winograd_ppl import winograd_datasets
    from .models.hf_opt_125m import opt125m
    from .models.hf_opt_350m import opt350m

datasets = [*siqa_datasets, *winograd_datasets]

from opencompass.models import HuggingFaceCausalLM

# OPT-350M
opt350m = dict(
    type=HuggingFaceCausalLM,
    # the following are HuggingFaceCausalLM init parameters
    path='facebook/opt-350m',
    tokenizer_path='facebook/opt-350m',
    tokenizer_kwargs=dict(
        padding_side='left',
        truncation_side='left',
        proxies=None,
        trust_remote_code=True),
    model_kwargs=dict(device_map='auto'),
    max_seq_len=2048,
    # the following are not HuggingFaceCausalLM init parameters
    abbr='opt350m',  # Model abbreviation
    max_out_len=100,  # Maximum number of generated tokens
    batch_size=64,
    run_cfg=dict(num_gpus=1),  # Run configuration for specifying resource requirements
)

# OPT-125M
opt125m = dict(
    type=HuggingFaceCausalLM,
    # the following are HuggingFaceCausalLM init parameters
    path='facebook/opt-125m',
    tokenizer_path='facebook/opt-125m',
    tokenizer_kwargs=dict(
        padding_side='left',
        truncation_side='left',
        proxies=None,
        trust_remote_code=True),
    model_kwargs=dict(device_map='auto'),
    max_seq_len=2048,
    # the following are not HuggingFaceCausalLM init parameters
    abbr='opt125m',  # Model abbreviation
    max_out_len=100,  # Maximum number of generated tokens
    batch_size=128,
    run_cfg=dict(num_gpus=1),  # Run configuration for specifying resource requirements
)

models = [opt350m, opt125m]
models = [opt125m, opt350m]
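As a small illustration of what the two lists in `eval_demo.py` describe, the sketch below combines them into (model, dataset) pairs. The stand-in dictionaries and the `evaluation_pairs` helper are hypothetical, and the assumption that each dataset config carries an `abbr` key is mine; how OpenCompass actually partitions and schedules tasks is not shown in this diff.

```python
from itertools import product

# Stand-in values: the real configs are much richer; only 'abbr' is used here.
siqa_datasets = [dict(abbr="siqa")]
winograd_datasets = [dict(abbr="winograd")]
opt125m = dict(abbr="opt125m")
opt350m = dict(abbr="opt350m")

# Same composition as in the config: star-unpacking flattens the imported
# dataset lists into one list, and models are listed explicitly.
datasets = [*siqa_datasets, *winograd_datasets]
models = [opt125m, opt350m]


def evaluation_pairs(models, datasets):
    """Return every (model abbr, dataset abbr) combination (hypothetical helper)."""
    return [(m["abbr"], d["abbr"]) for m, d in product(models, datasets)]


pairs = evaluation_pairs(models, datasets)
for model_abbr, dataset_abbr in pairs:
    print(f"{model_abbr} on {dataset_abbr}")
```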
21 changes: 21 additions & 0 deletions configs/models/hf_opt_125m.py
@@ -0,0 +1,21 @@
from opencompass.models import HuggingFaceCausalLM

# OPT-125M
opt125m = dict(
    type=HuggingFaceCausalLM,
    # the following are HuggingFaceCausalLM init parameters
    path='facebook/opt-125m',
    tokenizer_path='facebook/opt-125m',
    tokenizer_kwargs=dict(
        padding_side='left',
        truncation_side='left',
        proxies=None,
        trust_remote_code=True),
    model_kwargs=dict(device_map='auto'),
    max_seq_len=2048,
    # the following are not HuggingFaceCausalLM init parameters
    abbr='opt125m',  # Model abbreviation
    max_out_len=100,  # Maximum number of generated tokens
    batch_size=128,
    run_cfg=dict(num_gpus=1),  # Run configuration for specifying resource requirements
)
21 changes: 21 additions & 0 deletions configs/models/hf_opt_350m.py
@@ -0,0 +1,21 @@
from opencompass.models import HuggingFaceCausalLM

# OPT-350M
opt350m = dict(
    type=HuggingFaceCausalLM,
    # the following are HuggingFaceCausalLM init parameters
    path='facebook/opt-350m',
    tokenizer_path='facebook/opt-350m',
    tokenizer_kwargs=dict(
        padding_side='left',
        truncation_side='left',
        proxies=None,
        trust_remote_code=True),
    model_kwargs=dict(device_map='auto'),
    max_seq_len=2048,
    # the following are not HuggingFaceCausalLM init parameters
    abbr='opt350m',  # Model abbreviation
    max_out_len=100,  # Maximum number of generated tokens
    batch_size=64,
    run_cfg=dict(num_gpus=1),  # Run configuration for specifying resource requirements
)
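The two OPT model files differ only in the checkpoint name, the abbreviation, and the batch size. One possible way to express that, sketched here under assumptions, is a small factory function. `make_opt_config` is a hypothetical helper that is not part of this PR, and the string `"HuggingFaceCausalLM"` stands in for the real class so the snippet runs without OpenCompass installed. Whether helper functions are acceptable inside an mmengine config file is not something this diff shows.

```python
def make_opt_config(size, batch_size, model_type="HuggingFaceCausalLM"):
    """Build an OPT model config dict (hypothetical helper).

    model_type is a placeholder string here; the real configs pass the
    HuggingFaceCausalLM class imported from opencompass.models.
    """
    repo = f"facebook/opt-{size}"
    return dict(
        type=model_type,
        # init parameters of the model class
        path=repo,
        tokenizer_path=repo,
        tokenizer_kwargs=dict(
            padding_side='left',
            truncation_side='left',
            proxies=None,
            trust_remote_code=True),
        model_kwargs=dict(device_map='auto'),
        max_seq_len=2048,
        # parameters that are not passed to the model class
        abbr=f"opt{size}",
        max_out_len=100,
        batch_size=batch_size,
        run_cfg=dict(num_gpus=1),
    )


opt125m = make_opt_config("125m", batch_size=128)
opt350m = make_opt_config("350m", batch_size=64)
models = [opt125m, opt350m]
```

Keeping one file per model, as the PR does, has the advantage that each config can be named directly on the command line; the factory is only a way to see which fields actually vary.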