ArcticTraining v0.0.3
New Features
- Fastest Speculative Decoding in vLLM with Arctic Inference and Arctic Training
- Snowflake Arctic Embed Joins ArcticTraining: Simple And Scalable Embedding Model Training
- DPO Trainer
What's Changed
- Refactor Data Loading by @sfc-gh-mwyatt in #41
- Improve data loading error message by @sfc-gh-mwyatt in #55
- Add unit tests for DataFactory by @sfc-gh-mwyatt in #54
- Refactor class registry by @sfc-gh-mwyatt in #56
- Switch to all fully qualified imports by @sfc-gh-mwyatt in #59
- Add Wandb callback by @sfc-gh-mwyatt in #60
- add additional W&B args by @sfc-gh-mwyatt in #61
- full deprecate scheduler.lr setting by @sfc-gh-mwyatt in #64
- Move to smaller CI model by @sfc-gh-mwyatt in #67
- info log the caching path by @sfc-gh-sbekman in #68
- Add PEFT support by @sfc-gh-mwyatt in #66
- Add tokenizer name as input to cache path creation by @sfc-gh-mwyatt in #69
- Use cached environment in actions workflows by @sfc-gh-mwyatt in #70
- Fix error with unit test env caching by @sfc-gh-mwyatt in #71
- Fix cache path collision by @sfc-gh-mwyatt in #76
- Make arctic training logging manage all logging by @sfc-gh-lmerrick in #53
- Add ZeRO3 checkpoint support for PEFT models by @sfc-gh-mwyatt in #73
- remove redundant code by @sfc-gh-sbekman in #83
- Create STYLE_GUIDE.md by @sfc-gh-sbekman in #77
- add Makefile by @sfc-gh-sbekman in #78
- cpu adam support by @sfc-gh-jrasley in #89
- add basic step timer by @sfc-gh-jrasley in #91
- Multi-Replica Generation (PR 1/4) by @sfc-gh-srajbhandari in #95
- allow newer transformers versions by @sfc-gh-jrasley in #96
- Add help message about DeepSpeed args to ArcticTraining launcher by @sfc-gh-mwyatt in #100
- Error on yaml config duplicate keys by @sfc-gh-mwyatt in #99
- fix caller import by @sfc-gh-bzhai in #103
- Resolve data cache path collision bug by @sfc-gh-mwyatt in #102
- Refactoring data generation for Spec Decoding to use the new Multi-Re… by @sfc-gh-srajbhandari in #104
- Arctic Embed in Arctic Training! by @sfc-gh-lmerrick in #107
- Fix include and exclude git LFS example code by @sfc-gh-lmerrick in #108
- Allow local data files for huggingface data sources by @sfc-gh-mwyatt in #106
- require liger-kernel>=0.5.5 by @sfc-gh-sbekman in #109
- Switch to DistributedSampler by @sfc-gh-mwyatt in #105
- add news section by @sfc-gh-jrasley in #110
- Fix for cache path arg included values by @sfc-gh-mwyatt in #111
- Add support for user-passed data splits by @sfc-gh-mwyatt in #112
- Add dev flag to repeat data samples by @sfc-gh-mwyatt in #113
- ExCoT-DPO project by @sfc-gh-bzhai in #65
- update links by @sfc-gh-jrasley in #119
- Update DPO liger loss check by @sfc-gh-mwyatt in #118
- Add cache_fs_type field by @sfc-gh-mwyatt in #121
- add arxiv link in tutorial by @sfc-gh-jrasley in #120
- update readme eval by @sfc-gh-bzhai in #124
- elevate projects and add older news by @sfc-gh-jrasley in #123
- Add training metrics logging by @sfc-gh-mwyatt in #122
- Training metrics output to W&B by @sfc-gh-mwyatt in #125
- Increase max line width to 119 by @sfc-gh-mwyatt in #128
- ExCoT models are public now by @sfc-gh-jrasley in #130
- bump to v0.0.3 by @sfc-gh-jrasley in #129
- Small improvements / fixes for metrics logging by @sfc-gh-mwyatt in #127
- Fix for Liger model callback warning by @sfc-gh-mwyatt in #131
- fix rank by @sfc-gh-sbekman in #132
- another multi-node rank issue by @sfc-gh-sbekman in #133
- metrics: seqlen report by @sfc-gh-sbekman in #134
- human_format_base2_number change to 2 decimals by @sfc-gh-sbekman in #136
- Sparse Attention Recipe by @sfc-gh-srajbhandari in #135
- Update cli.py by @sfc-gh-zhyao in #139
- Fix clean_files_older_than_n_days not working caused by Azure API change by @sfc-gh-caxu in #140
- Add div_length config to SFTDataFactory by @sfc-gh-mwyatt in #142
- metrics: better seconds formatting by @sfc-gh-sbekman in #137
- Add Qwen2-SwiftKV and reorganize swiftkv project by @sfc-gh-aqiao in #145
- fix multi-epoch training by @sfc-gh-aqiao in #147
- project: authors + stableness + py versions by @sfc-gh-sbekman in #151
- Change python version for workflows by @sfc-gh-mwyatt in #154
- HfDeepSpeedConfig import by @sfc-gh-sbekman in #150
- [Makefile] new optional helper to remove unused imports by @sfc-gh-sbekman in #148
- Fix for license check hook adding multiple license by @sfc-gh-mwyatt in #156
- train iter log: add mem metrics by @sfc-gh-sbekman in #153
- an optional mem profiler by @sfc-gh-sbekman in #152
- Fix SwiftKV README by @sfc-gh-aqiao in #164
- Parallel SFT Data Packing by @sfc-gh-mwyatt in #162
- Settable pad length for SFTDataFactory by @sfc-gh-mwyatt in #163
- Human-friendly number parsing by @sfc-gh-mwyatt in #166
- Move max_length to base DataConfig by @sfc-gh-mwyatt in #168
- Fix ReadTheDocs build and add unit test by @sfc-gh-mwyatt in #169
- support mlp-variant-speculator by @sfc-gh-jaelee in #170
- Update README.md by @sfc-gh-aqiao in #173
- Add early exit kill switch by @sfc-gh-mwyatt in #175
- Allow stderr on non-print ranks by @sfc-gh-jrasley in #161
- Add data-process mode to CLI by @sfc-gh-mwyatt in #178
- Human friendly values for DeepSpeed config by @sfc-gh-mwyatt in #167
- Fix for python wheel builds by @sfc-gh-mwyatt in #179
New Contributors
- @sfc-gh-sbekman made their first contribution in #68
- @sfc-gh-lmerrick made their first contribution in #53
- @sfc-gh-srajbhandari made their first contribution in #95
- @sfc-gh-bzhai made their first contribution in #103
- @sfc-gh-zhyao made their first contribution in #139
- @sfc-gh-aqiao made their first contribution in #145
- @sfc-gh-jaelee made their first contribution in #170
Full Changelog: v0.0.2...v0.0.3