
Eval/vbench #312

Merged 18 commits on May 13, 2024
1 change: 1 addition & 0 deletions .github/workflows/unit-test.yml
@@ -30,6 +30,7 @@ jobs:
sudo apt-get install ffmpeg
python -m pip install --upgrade pip
pip install -v -e .[all]
pip install -v -e .[sandbox]
- name: Increase swapfile
run: |
df -h
1 change: 1 addition & 0 deletions Dockerfile
@@ -26,6 +26,7 @@ RUN cat environments/* | xargs pip install --default-timeout 1000
# install data-juicer then
COPY . .
RUN pip install -v -e .[all]
RUN pip install -v -e .[sandbox]

# install 3rd-party system dependencies
RUN apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
4 changes: 2 additions & 2 deletions README.md
@@ -178,12 +178,12 @@ The dependency options are listed below:
| Tag | Description |
|------------------|----------------------------------------------------------------------------------------------|
| `.` or `.[mini]` | Install minimal dependencies for basic Data-Juicer. |
| `.[all]` | Install all optional dependencies (including minimal dependencies and all of the following). |
| `.[all]` | Install all dependencies except sandbox. |
| `.[sci]` | Install all dependencies for all OPs. |
| `.[sandbox]` | Install all dependencies for sandbox. |
| `.[dist]` | Install dependencies for distributed data processing. (Experimental) |
| `.[dev]` | Install dependencies for developing the package as contributors. |
| `.[tools]` | Install dependencies for dedicated tools, such as quality classifiers. |
| `.[sandbox]` | Install all dependencies for sandbox. |

### Using pip

4 changes: 2 additions & 2 deletions README_ZH.md
@@ -161,12 +161,12 @@ pip install -v -e .[tools] # Install dependencies for dedicated tools
| Tag | Description |
|------------------|------------------------------|
| `.` or `.[mini]` | Install minimal dependencies for basic Data-Juicer |
| `.[all]` | Install all optional dependencies (including the minimal dependencies and all of the following) |
| `.[all]` | Install all dependencies except those for sandbox |
| `.[sci]` | Install full dependencies for all OPs |
| `.[sandbox]` | Install basic dependencies for sandbox |
| `.[dist]` | Install dependencies for distributed data processing (experimental) |
| `.[dev]` | Install dependencies for developing Data-Juicer as a contributor |
| `.[tools]` | Install dependencies for dedicated tools such as quality classifiers |
| `.[sandbox]` | Install basic dependencies for sandbox |

### Using pip

27 changes: 27 additions & 0 deletions configs/demo/sandbox/vbench_eval_config.yaml
@@ -0,0 +1,27 @@
type: vbench_video_evaluator

# The vbench prompts for video generation.
prompt_path: ./tools/mm_eval/vbench_metrics/VBench_full_info.json

# The path to the directory of generated videos
videos_path: /path/to/the/generated/videos

# The directory to store the evaluation results
result_dir: ./outputs/demo-sandbox/vbench_eval_results

# Give a name for this evaluation
eval_name: <eval_name>

# If true, load the required models for VBench from the cache path specified by the environment variable VBENCH_CACHE_DIR
load_ckpt_from_local: false

# The dimensions considered in this eval.
# All dimensions include: ['subject_consistency', 'background_consistency', 'temporal_flickering',
# 'motion_smoothness', 'dynamic_degree', 'aesthetic_quality', 'imaging_quality', 'object_class',
# 'multiple_objects', 'human_action', 'color', 'spatial_relationship', 'scene', 'temporal_style',
# 'appearance_style', 'overall_consistency']
# NOTE: The current version of vbench on PyPI lacks the third-party code needed for motion_smoothness.
# NOTE: Besides, when len(dimension_list) > 1, an error occurs during video loading.
dimension_list:
- subject_consistency
- dynamic_degree
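
For a quick smoke test, here is a hedged sketch of driving the new evaluator with a config equivalent to the YAML above; the `SimpleNamespace` stand-in and the direct instantiation are assumptions for this sketch, since the sandbox normally parses the YAML itself and dispatches it through the factory in `factories.py`:

```python
# Illustrative driver for the config above. SimpleNamespace stands in for the
# parsed YAML config object (an assumption for this sketch).
from types import SimpleNamespace

from data_juicer.core.sandbox.evaluators import VBenchEvaluator

eval_cfg = SimpleNamespace(
    type='vbench_video_evaluator',
    prompt_path='./tools/mm_eval/vbench_metrics/VBench_full_info.json',
    videos_path='/path/to/the/generated/videos',
    result_dir='./outputs/demo-sandbox/vbench_eval_results',
    eval_name='demo_eval',
    load_ckpt_from_local=False,
    dimension_list=['subject_consistency', 'dynamic_degree'],
)

# Returns the mean score over the evaluated dimensions; eval_obj is unused.
mean_score = VBenchEvaluator(eval_cfg).run(eval_type='data', eval_obj=None)
print(f'mean VBench score: {mean_score}')
```
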
50 changes: 50 additions & 0 deletions data_juicer/core/sandbox/evaluators.py
@@ -1,6 +1,12 @@
import json
import os
import shutil

import torch
from loguru import logger
from vbench import VBench

from data_juicer import cuda_device_count
from tools.mm_eval.inception_metrics.calc_metrics_for_videos import \
calc_metrics
# TODO: cannot import tools correctly if DJ is installed by pypi. Maybe we need
@@ -101,6 +107,50 @@ def run(self, eval_type, eval_obj, **kwargs):
'To be refactored from gpt4v related operators/tools.')


class VBenchEvaluator(BaseEvaluator):

    def get_score(self, result_path, dimension):
        # VBench writes one result file per dimension; the first element of
        # the per-dimension entry is the aggregate score.
        with open(result_path) as f:
            cur_result = json.load(f)
        return cur_result[dimension][0]

    def run(self, eval_type, eval_obj, **kwargs):
        if eval_type == 'data':
            prompt_path = self.eval_config.prompt_path
            videos_path = self.eval_config.videos_path
            result_dir = self.eval_config.result_dir
            name = self.eval_config.eval_name
            dimension_list = self.eval_config.dimension_list
            local = self.eval_config.load_ckpt_from_local
            if cuda_device_count() > 0:
                device = torch.device('cuda')
            else:
                device = torch.device('cpu')
            my_vbench = VBench(device, prompt_path, result_dir)
            result_dict = {'mean_score': 0, 'detail': {}}
            scores = []
            # Evaluate one dimension at a time; passing several dimensions in
            # a single call currently breaks video loading (see the NOTE in
            # the demo config).
            for dimension in dimension_list:
                logger.info(f'Evaluating for {dimension}')
                my_vbench.evaluate(videos_path=videos_path,
                                   name=f'{name}-{dimension}',
                                   dimension_list=[dimension],
                                   local=local)
                result_path = os.path.join(
                    result_dir, f'{name}-{dimension}_eval_results.json')
                score = self.get_score(result_path=result_path,
                                       dimension=dimension)
                result_dict['detail'][dimension] = score
                scores.append(score)
            result_dict['mean_score'] = sum(scores) / len(scores)

            with open(os.path.join(result_dir, name + '_merged_results.json'),
                      'w') as f:
                json.dump(result_dict, f)

            return float(result_dict['mean_score'])
        else:
            raise NotImplementedError(
                'Unsupported evaluation type: {}'.format(eval_type))


class LmHarnessEvaluator(BaseEvaluator):

def run(self, eval_type, eval_obj, **kwargs):
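
For reference, the result files involved in `VBenchEvaluator` above have roughly the following shapes; the per-dimension layout is inferred from `get_score`, and the score values are illustrative:

```python
# '{name}-{dimension}_eval_results.json' -- written by VBench.evaluate and
# read by get_score; the first element of the entry is the aggregate score:
#     {"subject_consistency": [0.93, [...per-video results...]]}
#
# '{name}_merged_results.json' -- written by VBenchEvaluator.run:
#     {"mean_score": 0.655,
#      "detail": {"subject_consistency": 0.93, "dynamic_degree": 0.38}}
```
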
9 changes: 6 additions & 3 deletions data_juicer/core/sandbox/factories.py
@@ -1,5 +1,6 @@
from data_juicer.core.sandbox.evaluators import (Gpt3QualityEvaluator,
InceptionEvaluator)
InceptionEvaluator,
VBenchEvaluator)
from data_juicer.core.sandbox.model_executors import (ModelscopeInferExecutor,
ModelscopeTrainExecutor)

@@ -16,9 +17,11 @@ def __call__(self, eval_cfg: dict = None, *args, **kwargs):
return None

evaluator = None
if eval_cfg.type == 'video_inception_evaluator':
if eval_cfg.type == 'vbench_video_evaluator':
evaluator = VBenchEvaluator(eval_cfg)
elif eval_cfg.type == 'video_inception_evaluator':
evaluator = InceptionEvaluator(eval_cfg)
if eval_cfg.type == 'dj_text_quality_classifier':
elif eval_cfg.type == 'dj_text_quality_classifier':
evaluator = Gpt3QualityEvaluator(eval_cfg)

return evaluator
4 changes: 2 additions & 2 deletions docs/Sandbox-ZH.md
@@ -9,8 +9,7 @@
```shell
pip install -v -e .[sandbox]

# or install the full dependencies directly
pip install -v -e .[all]
pip install detectron2@git+https://github.com/facebookresearch/detectron2.git@b7c7f4ba82192ff06f2bbb162b9f67b00ea55867
```

**NOTICE**: some sandbox dependencies also require extra domain dependencies. For example, if users want to train an NLP model from the ModelScope platform in the sandbox, they may need to install extra dependencies for the `modelscope` library
@@ -88,6 +87,7 @@ python tools/sandbox_starter.py --config configs/demo/sandbox/sandbox.yaml
| Component | Function | Desc. of Method `run` | Reference Materials |
| --- | --- | --- | --- |
| `Gpt3QualityEvaluator` | Evaluate the quality of a dataset with the GPT-3 text quality classifier reproduced by Data-Juicer | <br />- `eval_type`: The type of the object to be evaluated, currently only supports `"data"`<br />- `eval_obj`: The path to the dataset to be evaluated<br />- Return: The average quality score of the dataset samples<br /> | [Data-Juicer Quality Classifier Toolkit](https://github.com/modelscope/data-juicer/tree/main/tools/quality_classifier) |
| `VBenchEvaluator` | Evaluate videos generated from prompts across multiple dimensions with VBench | <br />- `eval_type`: The type of the object to be evaluated, currently only supports `"data"`<br />- `eval_obj`: An unused parameter<br />- Return: The average score of the generated videos across the evaluated dimensions<br /> | [VBench paper](https://arxiv.org/abs/2311.17982) |
| `InceptionEvaluator` | Evaluate generated videos via features extracted from video classification models | <br />- `eval_type`: The type of the object to be evaluated, currently only supports `"data"`<br />- `eval_obj`: An unused parameter<br />- Return: A dictionary of scores for the given metrics<br /> | [Inception Metrics](https://github.com/NVlabs/long-video-gan/tree/main/metrics) |

- Model Training Factory -- ModelTrainExecutorFactory
4 changes: 2 additions & 2 deletions docs/Sandbox.md
@@ -9,8 +9,7 @@ Before using sandbox, you might need to install sandbox-related third-party dependencies
```shell
pip install -v -e .[sandbox]

# or install all dependencies
pip install -v -e .[all]
pip install detectron2@git+https://github.com/facebookresearch/detectron2.git@b7c7f4ba82192ff06f2bbb162b9f67b00ea55867
```

**NOTICE**: some sandbox-related dependencies require extra domain dependencies. For example, if users want to train an NLP model from ModelScope
@@ -88,6 +87,7 @@ The currently supported component factories and the components supported within
| Component | Function | Desc. of Method `run` | Reference Materials |
| --- | --- | --- | --- |
| `Gpt3QualityEvaluator` | Evaluate the quality of a dataset using the GPT-3 text quality classifier reproduced by Data-Juicer. | <br />- `eval_type`: The type of the object to be evaluated by the evaluator, currently only supports `"data"`.<br />- `eval_obj`: The path to the dataset to be evaluated.<br />- Return: The average quality score of the dataset samples.<br /> | [Data-Juicer Quality Classifier Toolkit](https://github.com/modelscope/data-juicer/tree/main/tools/quality_classifier) |
| `VBenchEvaluator` | Evaluate the generated videos against the given prompts across multiple dimensions | <br />- `eval_type`: The type of the object to be evaluated by the evaluator, currently only supports `"data"`<br />- `eval_obj`: An unused parameter<br />- Return: The average score of the generated videos across the evaluated dimensions.<br /> | [VBench paper](https://arxiv.org/abs/2311.17982) |
| `InceptionEvaluator` | Evaluate the generated videos by features extracted from video classification models. | <br />- `eval_type`: The type of the object to be evaluated by the evaluator, currently only supports `"data"`<br />- `eval_obj`: An unused parameter<br />- Return: A dictionary of scores for the given metrics.<br /> | [Inception Metrics](https://github.com/NVlabs/long-video-gan/tree/main/metrics) |
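
A hedged sketch of how these evaluators are obtained and invoked; the factory-instance name is an assumption, while the `type` strings and the `run` signature come from this PR:

```python
# evaluator_factory is assumed to be an instance of the evaluator factory in
# data_juicer/core/sandbox/factories.py (shown above in this PR).
evaluator = evaluator_factory(eval_cfg)  # dispatches on eval_cfg.type
if evaluator is not None:
    # All three evaluators accept eval_type='data'; eval_obj is only used by
    # Gpt3QualityEvaluator (as the dataset path).
    result = evaluator.run(eval_type='data', eval_obj=None)
```
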

- ModelTrainExecutorFactory
5 changes: 5 additions & 0 deletions environments/sandbox_requires.txt
@@ -1,3 +1,8 @@
torch>=1.11.0,<2.0.0
**Collaborator** commented: Is the `<2.0.0` constraint here required by vbench? The docker image provided for the competition uses torch 2.2.0.

wandb
fire
pyspark
# vbench-related
vbench
# modelscope-related
modelscope
2 changes: 1 addition & 1 deletion environments/science_requires.txt
@@ -1,4 +1,4 @@
torch>=1.11.0
torch>=1.11.0,<2.0.0
torchaudio
easyocr
fasttext-wheel
3 changes: 1 addition & 2 deletions setup.py
@@ -41,10 +41,9 @@ def get_install_requirements(require_f_paths, env_dir='environments'):
'tools':
get_install_requirements(
['preprocess_requires.txt', 'quality_classifier_requires.txt']),
'sandbox':
get_install_requirements(['sandbox_requires.txt']),
}
extra_requires['all'] = [v for v in extra_requires.values()]
extra_requires['sandbox'] = get_install_requirements(['sandbox_requires.txt'])

with open('data_juicer/__init__.py', 'r') as f:
version = re.search(r'^__version__\s*=\s*[\'"]([^\'"]*)[\'"]', f.read(),
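
This reordering in `setup.py` is what makes `.[all]` exclude sandbox, as the updated READMEs state: `extra_requires['all']` snapshots the extras defined so far, and `sandbox` is registered only afterwards. A toy sketch of the effect (the requirement names are made up):

```python
# Toy reproduction of the ordering effect above (values are illustrative).
extra_requires = {
    'mini': ['pandas'],
    'sci': ['easyocr'],
}
# 'all' is computed first, so it only captures the keys defined so far...
extra_requires['all'] = [v for v in extra_requires.values()]
# ...and 'sandbox' is added afterwards, keeping it out of `.[all]`.
extra_requires['sandbox'] = ['vbench']
assert 'vbench' not in str(extra_requires['all'])
```
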
3 changes: 3 additions & 0 deletions tools/mm_eval/README.md
@@ -0,0 +1,3 @@
VBench from the paper "VBench: Comprehensive Benchmark Suite for Video Generative Models".

Please refer to [GitHub](https://github.com/Vchitect/VBench) for more details.
3 changes: 3 additions & 0 deletions tools/mm_eval/README_ZH.md
@@ -0,0 +1,3 @@
VBench is from the paper "VBench: Comprehensive Benchmark Suite for Video Generative Models".

Please refer to [GitHub](https://github.com/Vchitect/VBench) for more details.