### Video Data Processing Methods

Evaluation methods for video data fall into the following main categories:
- Motion scores
- Deep learning (DL) based scores
- CLIP-based scores
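
To make the motion score concrete, here is a minimal sketch of how such a score can be derived from optical flow. It is not DataFlow's implementation: the function name `motion_score` and its arguments are hypothetical, and it assumes the flow is already available as a list of per-pixel `(dx, dy)` displacements. The `relative` flag mirrors the option of the same name in the `VideoMotionFilter` configuration in this section, which normalizes the magnitude by the frame's diagonal length.

```python
import math


def motion_score(flow_vectors, frame_width, frame_height, relative=False):
    """Mean optical-flow magnitude over a list of (dx, dy) displacements.

    Hypothetical helper for illustration only; not part of DataFlow.
    """
    if not flow_vectors:
        return 0.0
    # Average Euclidean length of the displacement vectors.
    mean_magnitude = sum(math.hypot(dx, dy) for dx, dy in flow_vectors) / len(flow_vectors)
    if relative:
        # Normalize by the frame diagonal so the score is resolution independent.
        mean_magnitude /= math.hypot(frame_width, frame_height)
    return mean_magnitude


# Three sample displacements with magnitudes 5, 0 and 10 pixels.
flow = [(3.0, 4.0), (0.0, 0.0), (6.0, 8.0)]
absolute_score = motion_score(flow, 960, 540)
relative_score = motion_score(flow, 960, 540, relative=True)
print(absolute_score, relative_score)
```

A nearly static video yields a score close to zero, which is why a lower bound such as `min_score` can be used to drop slideshow-like clips.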

For a detailed configuration example, see `configs/video_process.yaml`. A minimal YAML configuration file (`./video_process.yaml`) looks like this:

```yaml
model_cache_path: '../ckpt' # Path to cache models
num_workers: 2
dependencies: [video]
save_path: './example.jsonl'
data:
  video:
    meta_data_path: 'demos/video_process/video5data.json' # Path to meta data (mainly for image or video data)
    data_path: 'demos/video_process/' # Path to dataset
    formatter: 'PureVideoFormatter' # Specify the data formatter
```

The `data` section specifies the paths to the data files or folders, along with related settings.
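
The sketch below shows one way to sanity-check this part of the configuration before running a pipeline. It rests on two assumptions that the source does not confirm: that each entry in `dependencies` names a key under `data`, and that `meta_data_path`, `data_path` and `formatter` are all required. The helper `missing_data_keys` is hypothetical, and the configuration is written as a plain dictionary rather than loaded from the YAML file.

```python
# Dictionary mirroring the YAML shown above (values copied from it).
config = {
    'dependencies': ['video'],
    'data': {
        'video': {
            'meta_data_path': 'demos/video_process/video5data.json',
            'data_path': 'demos/video_process/',
            'formatter': 'PureVideoFormatter',
        },
    },
}

# Assumption: these three keys are required for every modality.
REQUIRED_KEYS = ('meta_data_path', 'data_path', 'formatter')


def missing_data_keys(cfg):
    """Return {modality: [missing keys]} for each declared dependency."""
    problems = {}
    for modality in cfg.get('dependencies', []):
        entry = cfg.get('data', {}).get(modality)
        if entry is None:
            # The whole block for this modality is absent.
            problems[modality] = list(REQUIRED_KEYS)
            continue
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            problems[modality] = missing
    return problems


problems = missing_data_keys(config)
print(problems or 'data section looks complete')
```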

```yaml
processors:
  VideoResolutionFilter:
    min_width: 160
    max_width: 7680
    min_height: 120
    max_height: 4320
    scorer_args:
      num_workers: 4
      batch_size: 1
  VideoMotionFilter:                              # Keep samples with video motion scores within a specific range.
    min_score: 0.25                                         # the minimum motion score to keep samples
    max_score: 10                                     # the maximum motion score to keep samples
    scorer_args:
      batch_size: 1
      num_workers: 4
      min_score: 0.25                                         # the minimum motion score to keep samples
      max_score: 10000.0                                      # the maximum motion score to keep samples
      sampling_fps: 2                                         # the sampling rate, in frames per second, used to compute optical flow
      size: null                                              # resize frames along the smaller edge before computing optical flow, or a sequence like (h, w)
      max_size: null                                          # maximum allowed for the longer edge of resized frames
      relative: false                                         # whether to normalize the optical flow magnitude to [0, 1], relative to the frame's diagonal length
      any_or_all: any                                         # keep this sample when any/all videos meet the filter condition
```

The `processors` section defines the parameter settings for the scorers in use.
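
As an illustration of what these range parameters do, here is a minimal sketch of a range filter. It is not DataFlow's code: the helpers `filter_by_resolution` and `keep_sample` are hypothetical, and treating the bounds as inclusive is an assumption. The sample values (five videos at 960x540) are taken from the log output in this section.

```python
def in_range(value, low, high):
    """Inclusive range check (inclusiveness is an assumption)."""
    return low <= value <= high


def filter_by_resolution(widths, heights, min_width, max_width, min_height, max_height):
    """Return the indices of samples whose width and height are both in range."""
    kept = []
    for index, (width, height) in enumerate(zip(widths, heights)):
        if in_range(width, min_width, max_width) and in_range(height, min_height, max_height):
            kept.append(index)
    return kept


def keep_sample(per_video_ok, any_or_all='any'):
    """Combine per-video results for a sample that contains several videos."""
    if any_or_all == 'any':
        return any(per_video_ok)
    if any_or_all == 'all':
        return all(per_video_ok)
    raise ValueError(f"any_or_all must be 'any' or 'all', got {any_or_all!r}")


widths = [960.0] * 5
heights = [540.0] * 5
kept = filter_by_resolution(widths, heights, 160, 7680, 120, 4320)
print(kept)  # [0, 1, 2, 3, 4]
```

With the limits from the configuration above, all five sample videos pass, which matches the index list printed for `VideoResolutionFilter` in the log.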

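```python
import os
import sys

# Make the DataFlow package, two directories up, importable.
dataflow_path = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))
sys.path.insert(0, dataflow_path)
# Emulate command-line arguments so the config path is picked up inside the notebook.
sys.argv = ['notebook', '--config', './demos/video_process/video_process.yaml']

target_dir = os.path.abspath('../..')  # absolute path of the target directory
current_dir = os.getcwd()

if current_dir != target_dir:
    os.chdir(target_dir)  # change directory only if not already in the target directory

import dataflow
from dataflow.utils import process
```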
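
Overwriting `sys.argv` works inside a notebook because command-line parsers such as `argparse` read `sys.argv[1:]` when no argument list is passed explicitly. The sketch below demonstrates the mechanism with the standard library only. Whether DataFlow itself parses `--config` with `argparse` is an assumption, and `parse_config_path` is a hypothetical helper.

```python
import argparse
import sys


def parse_config_path(argv=None):
    """Return the value of --config; argv=None makes argparse read sys.argv[1:]."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--config', type=str, required=True)
    args = parser.parse_args(argv)
    return args.config


# Emulate a command line, as the notebook cell does.
sys.argv = ['notebook', '--config', './demos/video_process/video_process.yaml']
config_path = parse_config_path()
print(config_path)
```

Because the path is relative, it is resolved against the current working directory, which is why the cell also changes into the target directory before importing `process`.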



```python
process()
```

VideoResolutionFilter {'min_width': 160, 'max_width': 7680, 'min_height': 120, 'max_height': 4320, 'scorer_args': {'num_workers': 4, 'batch_size': 1}, 'num_workers': 2, 'model_cache_dir': '../ckpt'}
Module dataflow.process.text.refiners has no attribute VideoResolutionFilter
Module dataflow.process.text.filters has no attribute VideoResolutionFilter
Module dataflow.process.text.deduplicators has no attribute VideoResolutionFilter
Module dataflow.process.image.filters has no attribute VideoResolutionFilter
Module dataflow.process.image.deduplicators has no attribute VideoResolutionFilter
{'width': array([960., 960., 960., 960., 960.]), 'height': array([540., 540., 540., 540., 540.])}
[0, 1, 2, 3, 4]
VideoMotionFilter {'min_score': 0.25, 'max_score': 10, 'scorer_args': {'batch_size': 1, 'num_workers': 4, 'min_score': 0.25, 'max_score': 10000.0, 'sampling_fps': 2, 'size': None, 'max_size': None, 'relative': False, 'any_or_all': 'any'}, 'num_workers': 2, 'model_cache_dir': '../ckpt'}
Modul