open-compass · gaotongxiao · Aug 31, 2023 · Aug 22, 2023 · Aug 22, 2023 · Aug 25, 2023
diff --git a/README.md b/README.md
@@ -325,9 +325,36 @@ Some third-party features, like Humaneval and Llama, may require additional step
 
 ## 🏗️ ️Evaluation
 
-Make sure you have installed OpenCompass correctly and prepared your datasets according to the above steps. Please read the [Quick Start](https://opencompass.readthedocs.io/en/latest/get_started.html#quick-start) to learn how to run an evaluation task.
+After ensuring that OpenCompass is installed correctly according to the above steps and the datasets are prepared, you can evaluate the performance of the LLaMA-7b model on the MMLU and C-Eval datasets using the following command:
 
-For more tutorials, please check our [Documentation](https://opencompass.readthedocs.io/en/latest/index.html).
+```bash
+python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl
+```
+
+OpenCompass has predefined configurations for many models and datasets. You can list all available model and dataset configurations using the [tools](./docs/en/tools.md#list-configs).
+
+```bash
+# List all configurations
+python tools/list_configs.py
+# List all configurations related to llama and mmlu
+python tools/list_configs.py llama mmlu
+```
+
+You can also evaluate other HuggingFace models via the command line. Taking LLaMA-7b as an example, the command below sets the HuggingFace model path (`--hf-path`), the arguments for model and tokenizer construction (`--model-kwargs`, `--tokenizer-kwargs`), the maximum number of generated tokens (`--max-out-len`), the maximum sequence length the model can accept (`--max-seq-len`), the batch size (`--batch-size`), and the number of required GPUs (`--num-gpus`); `--no-batch-padding` disables batch padding and runs inference in a for loop to avoid accuracy loss. The flags carry no inline comments because a `#` comment after a trailing backslash would end the command at that line:
+
+```bash
+python run.py --datasets ceval_ppl mmlu_ppl \
+--hf-path huggyllama/llama-7b \
+--model-kwargs device_map='auto' \
+--tokenizer-kwargs padding_side='left' truncation='left' use_fast=False \
+--max-out-len 100 \
+--max-seq-len 2048 \
+--batch-size 8 \
+--no-batch-padding \
+--num-gpus 1
+```
+
+Through the command line or configuration files, OpenCompass also supports evaluating APIs or custom models, as well as more diverse evaluation strategies. Please read the [Quick Start](https://opencompass.readthedocs.io/en/latest/get_started.html) to learn how to run an evaluation task.
 
 ## 🔜 Roadmap
 

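Note on the `key=value` arguments passed to `--model-kwargs` and `--tokenizer-kwargs` in the README command: the shell removes the single quotes, so the program receives `padding_side=left`, not `padding_side='left'`. The sketch below shows one way such pairs could be turned into a Python dict. It is a hypothetical illustration only; `parse_kwargs` is an assumed name and this is not claimed to be how OpenCompass parses these flags.

```python
import ast


def parse_kwargs(pairs):
    """Turn ['key=value', ...] into a dict, guessing Python types for values.

    Hypothetical helper for illustration; not part of OpenCompass.
    """
    result = {}
    for pair in pairs:
        key, sep, raw = pair.partition('=')
        if not sep or not key:
            raise ValueError(f'expected key=value, got {pair!r}')
        try:
            # Handles literals such as False, 8, or a quoted string.
            result[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Bare words such as left or auto stay plain strings.
            result[key] = raw
    return result


# The shell strips the quotes, so Python sees padding_side=left.
tokenizer_kwargs = parse_kwargs(['padding_side=left', 'use_fast=False'])
print(tokenizer_kwargs)
```

Falling back to the raw string keeps bare words usable while still converting values like `False` or `8` to their Python types.
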
diff --git a/README_zh-CN.md b/README_zh-CN.md
@@ -326,7 +326,36 @@ unzip OpenCompassData.zip
 
 ## 🏗️ ️Evaluation
 
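-After making sure that OpenCompass is installed correctly and the datasets are prepared according to the above steps, please read the [Quick Start](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id3) to learn how to run an evaluation task.
+After making sure that OpenCompass is installed correctly and the datasets are prepared according to the above steps, you can evaluate the performance of the LLaMA-7b model on the MMLU and C-Eval datasets with the following command: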
+
+```bash
+python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl
+```
+
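+OpenCompass has predefined configurations for many models and datasets. You can list all available model and dataset configurations using the [tools](./docs/zh_cn/tools.md#ListConfigs).
+
+```bash
+# List all configurations
+python tools/list_configs.py
+# List all configurations related to llama and mmlu
+python tools/list_configs.py llama mmlu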
+```
+
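+You can also evaluate other HuggingFace models via the command line. Again taking LLaMA-7b as an example, the command below sets the HuggingFace model path (`--hf-path`), the arguments for model and tokenizer construction (`--model-kwargs`, `--tokenizer-kwargs`), the maximum number of generated tokens (`--max-out-len`), the maximum sequence length the model can accept (`--max-seq-len`), the batch size (`--batch-size`), and the number of required GPUs (`--num-gpus`); `--no-batch-padding` disables batch padding and runs inference in a for loop to avoid accuracy loss. The flags carry no inline comments because a `#` comment after a trailing backslash would end the command at that line:
+
+```bash
+python run.py --datasets ceval_ppl mmlu_ppl \
+--hf-path huggyllama/llama-7b \
+--model-kwargs device_map='auto' \
+--tokenizer-kwargs padding_side='left' truncation='left' use_fast=False \
+--max-out-len 100 \
+--max-seq-len 2048 \
+--batch-size 8 \
+--no-batch-padding \
+--num-gpus 1
+```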
+
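+Through the command line or configuration files, OpenCompass also supports evaluating APIs or custom models, as well as more diverse evaluation strategies. Please read the [Quick Start](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id3) to learn how to run an evaluation task.
 
 For more tutorials, please check our [Documentation](https://opencompass.readthedocs.io/zh_CN/latest/index.html).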
 

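Note on `python tools/list_configs.py llama mmlu`: the README describes this as listing all configurations related to llama and mmlu. A minimal sketch of that kind of keyword filter follows, assuming a name is kept when it contains any of the keywords. The `filter_configs` function and the matching rule are assumptions for illustration, not the actual implementation of the tool.

```python
def filter_configs(names, keywords):
    """Return config names containing any keyword, case-insensitively.

    Hypothetical helper; with no keywords, every name is returned.
    """
    if not keywords:
        return sorted(names)
    lowered = [keyword.lower() for keyword in keywords]
    return sorted(
        name for name in names
        if any(keyword in name.lower() for keyword in lowered)
    )


# Config names taken from this diff; the list itself is illustrative.
available = ['hf_llama_7b', 'hf_opt_125m', 'mmlu_ppl', 'ceval_ppl']
print(filter_configs(available, ['llama', 'mmlu']))
```
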
diff --git a/configs/eval_demo.py b/configs/eval_demo.py
@@ -1,51 +1,10 @@
 from mmengine.config import read_base
 
 with read_base():
-    from .datasets.winograd.winograd_ppl import winograd_datasets
     from .datasets.siqa.siqa_gen import siqa_datasets
+    from .datasets.winograd.winograd_ppl import winograd_datasets
+    from .models.hf_opt_125m import opt125m
+    from .models.hf_opt_350m import opt350m
 
 datasets = [*siqa_datasets, *winograd_datasets]
-
-from opencompass.models import HuggingFaceCausalLM
-
-# OPT-350M
-opt350m = dict(
-       type=HuggingFaceCausalLM,
-       # the folowing are HuggingFaceCausalLM init parameters
-       path='facebook/opt-350m',
-       tokenizer_path='facebook/opt-350m',
-       tokenizer_kwargs=dict(
-           padding_side='left',
-           truncation_side='left',
-           proxies=None,
-           trust_remote_code=True),
-       model_kwargs=dict(device_map='auto'),
-       max_seq_len=2048,
-       # the folowing are not HuggingFaceCausalLM init parameters
-       abbr='opt350m',                    # Model abbreviation
-       max_out_len=100,                   # Maximum number of generated tokens          
-       batch_size=64,
-       run_cfg=dict(num_gpus=1),    # Run configuration for specifying resource requirements
-    )
-
-# OPT-125M
-opt125m = dict(
-       type=HuggingFaceCausalLM,
-       # the folowing are HuggingFaceCausalLM init parameters
-       path='facebook/opt-125m',
-       tokenizer_path='facebook/opt-125m',
-       tokenizer_kwargs=dict(
-           padding_side='left',
-           truncation_side='left',
-           proxies=None,
-           trust_remote_code=True),
-       model_kwargs=dict(device_map='auto'),
-       max_seq_len=2048,
-       # the folowing are not HuggingFaceCausalLM init parameters
-       abbr='opt125m',                # Model abbreviation
-       max_out_len=100,               # Maximum number of generated tokens
-       batch_size=128,
-       run_cfg=dict(num_gpus=1),   # Run configuration for specifying resource requirements
-    )
-
-models = [opt350m, opt125m]
+models = [opt125m, opt350m]
diff --git a/configs/models/hf_opt_125m.py b/configs/models/hf_opt_125m.py
@@ -0,0 +1,21 @@
+from opencompass.models import HuggingFaceCausalLM
+
+# OPT-125M
+opt125m = dict(
+    type=HuggingFaceCausalLM,
+    # the following are HuggingFaceCausalLM init parameters
+    path='facebook/opt-125m',
+    tokenizer_path='facebook/opt-125m',
+    tokenizer_kwargs=dict(
+        padding_side='left',
+        truncation_side='left',
+        proxies=None,
+        trust_remote_code=True),
+    model_kwargs=dict(device_map='auto'),
+    max_seq_len=2048,
+    # the following are not HuggingFaceCausalLM init parameters
+    abbr='opt125m',  # Model abbreviation
+    max_out_len=100,  # Maximum number of generated tokens
+    batch_size=128,
+    run_cfg=dict(num_gpus=1),  # Run configuration for specifying resource requirements
+)
diff --git a/configs/models/hf_opt_350m.py b/configs/models/hf_opt_350m.py
@@ -0,0 +1,21 @@
+from opencompass.models import HuggingFaceCausalLM
+
+# OPT-350M
+opt350m = dict(
+    type=HuggingFaceCausalLM,
+    # the following are HuggingFaceCausalLM init parameters
+    path='facebook/opt-350m',
+    tokenizer_path='facebook/opt-350m',
+    tokenizer_kwargs=dict(
+        padding_side='left',
+        truncation_side='left',
+        proxies=None,
+        trust_remote_code=True),
+    model_kwargs=dict(device_map='auto'),
+    max_seq_len=2048,
+    # the following are not HuggingFaceCausalLM init parameters
+    abbr='opt350m',  # Model abbreviation
+    max_out_len=100,  # Maximum number of generated tokens
+    batch_size=64,
+    run_cfg=dict(num_gpus=1),  # Run configuration for specifying resource requirements
+)
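
Note on the two new model files: `hf_opt_125m.py` and `hf_opt_350m.py` differ only in `path`, `tokenizer_path`, `abbr`, and `batch_size`. The sketch below shows how the shared fields could be produced by one function. It is plain Python for illustration: `make_opt_config` is a hypothetical name, the string `'HuggingFaceCausalLM'` stands in for the imported class so the snippet runs without OpenCompass installed, and whether a helper like this can live inside a config file loaded through `read_base()` has not been verified here.

```python
def make_opt_config(size, batch_size, model_type='HuggingFaceCausalLM'):
    """Build a model config dict for an OPT checkpoint such as '125m'.

    Hypothetical helper; model_type is a stand-in for the real class.
    """
    path = f'facebook/opt-{size}'
    return dict(
        type=model_type,
        # model init parameters
        path=path,
        tokenizer_path=path,
        tokenizer_kwargs=dict(
            padding_side='left',
            truncation_side='left',
            proxies=None,
            trust_remote_code=True),
        model_kwargs=dict(device_map='auto'),
        max_seq_len=2048,
        # evaluation-side parameters
        abbr=f'opt{size}',
        max_out_len=100,
        batch_size=batch_size,
        run_cfg=dict(num_gpus=1),
    )


opt125m = make_opt_config('125m', batch_size=128)
opt350m = make_opt_config('350m', batch_size=64)
models = [opt125m, opt350m]
```

Each call builds fresh nested dicts, so changing the tokenizer arguments of one model does not affect the other.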