ARK

A GPU-driven system framework for scalable AI applications.

Pipelines	Build Status
Unit Tests (CUDA)
Unit Tests (ROCm)

NOTE (Nov 2023): ROCm unit tests will be migrated to an Azure pipeline in the future.

See Quick Start to get started.

Overview

ARK is a deep learning framework designed for highly optimized performance over distributed GPUs. Specifically, ARK adopts a GPU-driven execution model, where the GPU autonomously schedules and executes both computation and communication without any CPU intervention.

ARK provides a set of APIs for users to express their distributed deep learning applications. ARK then automatically schedules a GPU-driven execution plan for the application and generates GPU kernel code called a loop kernel. The loop kernel is a GPU kernel that contains a loop that iteratively executes the entire application, including both computation and communication. ARK then executes the loop kernel on the distributed GPUs.
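The flow described above (express the application, schedule a plan once, then run everything inside one loop) can be illustrated with a small CPU-only simulation. This is a conceptual sketch, not ARK's API: `Op`, `build_plan`, `loop_kernel`, and the two toy operations are hypothetical names chosen for illustration, and the real loop kernel is generated GPU code rather than Python.

```python
"""Conceptual CPU-only simulation of a loop kernel (hypothetical names, not ARK's API)."""
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Buffers = Dict[str, List[float]]  # one dict of named buffers per simulated GPU


@dataclass
class Op:
    name: str
    kind: str  # "compute" or "comm"
    run: Callable[[List[Buffers]], None]


def scale(ranks: List[Buffers]) -> None:
    # Compute step: every simulated GPU doubles its local buffer.
    for r in ranks:
        r["x"] = [2.0 * v for v in r["x"]]


def all_reduce_sum(ranks: List[Buffers]) -> None:
    # Communication step: element-wise sum across GPUs, result copied to all.
    total = [sum(vals) for vals in zip(*(r["x"] for r in ranks))]
    for r in ranks:
        r["x"] = list(total)


def build_plan() -> List[Op]:
    # The plan is fixed ahead of time; nothing is scheduled during execution.
    return [
        Op("scale", "compute", scale),
        Op("all_reduce_sum", "comm", all_reduce_sum),
    ]


def loop_kernel(plan: List[Op], ranks: List[Buffers], num_iters: int) -> List[Tuple[int, str, str]]:
    # A single loop runs the whole application, computation and
    # communication alike, with no outside scheduler between steps.
    trace = []
    for it in range(num_iters):
        for op in plan:
            op.run(ranks)
            trace.append((it, op.kind, op.name))
    return trace


if __name__ == "__main__":
    gpus = [{"x": [1.0, 2.0]}, {"x": [3.0, 4.0]}]
    for step in loop_kernel(build_plan(), gpus, num_iters=2):
        print(step)
    print(gpus)
```

The point of the sketch is the shape of `loop_kernel`: the plan is built once, and computation and communication steps are interleaved inside the same loop instead of being launched one by one by a host-side scheduler.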

Status & Roadmap

ARK is under active development, and some of its features will be added in future releases. The following describes the key features of each version.

New in ARK v0.5 (Latest Release)

Integrate with MSCCL++
Remove dependency on gpudma
Add AMD CDNA3 architecture support
Support communication for AMD GPUs
Optimize OpGraph scheduling
Add a multi-GPU Llama2 example

See #168 for details.

ARK v0.6 (TBU, Jan. 2024)

Overall performance optimization
Improve Python unit tests & code coverage

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Citations

ARK is a collaborative research initiative between KAIST and Microsoft Research. If you use this project in your research, please cite our NSDI'23 paper:

@inproceedings{HwangPSQCX23,
  author    = {Changho Hwang and
               KyoungSoo Park and
               Ran Shu and
               Xinyuan Qu and
               Peng Cheng and
               Yongqiang Xiong},
  title     = {ARK: GPU-driven Code Execution for Distributed Deep Learning},
  booktitle = {20th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 23)},
  year      = {2023},
  publisher = {{USENIX} Association},
}
