Data Set Description Language Specification (DSDL, a New-Generation AI Data Set Description Language)

opendatalab/dsdl-docs

Introduction

DSDL (Data Set Description Language) is a new-generation description language for AI data sets. It aims to remove the inconvenience caused by the inconsistent formats in which AI data sets are published. The ultimate goal is interoperability across different tasks and different data modalities, in order to promote the further development of AI.

Design Goals

The design of DSDL is driven by three goals: generic, portable, and extensible. We refer to these three goals together as GPE.

  • Generic: The language aims to provide a unified representation standard for data across multiple fields of artificial intelligence, rather than being designed for a single field or task. It should be able to express data sets with different modalities and structures in a consistent format.

  • Portable: Write once, distribute everywhere. Data set descriptions can be widely distributed and exchanged, and used in different environments without modifying the source files. Achieving this goal is crucial for creating an open and thriving ecosystem. To this end, we need to examine the design details carefully and remove unnecessary dependencies on specific assumptions about the underlying facilities or organizations.

  • Extensible: One should be able to extend the boundary of expression without modifying the core standard. For a programming language such as C++ or Python, libraries and packages can significantly extend its application boundaries while the core language remains stable over a long period. Such libraries and packages form a rich ecosystem that keeps the language alive for a very long time.

Documentation

DSDL Specification and Tutorials

Citation

@misc{wang2024dsdl,
      title={DSDL: Data Set Description Language for Bridging Modalities and Tasks in AI Data}, 
      author={Bin Wang and Linke Ouyang and Fan Wu and Wenchang Ning and Xiao Han and Zhiyuan Zhao and Jiahui Peng and Yiying Jiang and Dahua Lin and Conghui He},
      year={2024},
      eprint={2405.18315},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

License

This project is released under the Apache 2.0 license.
