Releases: neuralmagic/sparseml
SparseML v1.8.0
New Features:
- `QuantizationModifier` implemented to use compressed-tensors as a backend; see the example sketch below. (#2307)
- INT4 and grouped quantization support. (#2307)
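As a rough illustration of the new backend, the sketch below shows what an INT4, grouped-quantization one-shot run might look like. It assumes the `oneshot` entry point and `SparseAutoModelForCausalLM` wrapper from `sparseml.transformers`, and the recipe fields follow the compressed-tensors configuration format; the model stub and calibration dataset are placeholders, so verify the exact schema and signatures against the 1.8 documentation before use.

```python
# Hedged sketch: one-shot INT4 grouped quantization with the compressed-tensors
# backed QuantizationModifier. The names below (oneshot, SparseAutoModelForCausalLM,
# recipe fields, model stub, dataset) are assumptions to check against the docs.
from sparseml.transformers import SparseAutoModelForCausalLM, oneshot

recipe = """
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      ignore: ["lm_head"]
      config_groups:
        group_0:
          targets: ["Linear"]
          weights:
            num_bits: 4        # INT4 weights
            type: "int"
            symmetric: true
            strategy: "group"  # grouped quantization
            group_size: 128
"""

model = SparseAutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
    device_map="auto",
)
oneshot(
    model=model,
    recipe=recipe,
    dataset="open_platypus",      # placeholder calibration dataset
    output_dir="./tinyllama-int4",
)
```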
Changes:
- UX updated for `GPTQModifier`. (#2263)
- Upgraded pydantic 1.x to 2.x for compatibility with external dependencies, such as Transformers. (#2248)
Resolved Issues:
- None
Known Issues:
- ONNX export and computer vision models are not officially supported in version 1.8; refer to v1.7.0 for computer vision support.
SparseML v1.7.0
New Features:
- Fine-tuning, one-shot, and general compression techniques now support large language models built on top of Hugging Face Transformers, including full FSDP support and model stages for transitioning between training and post-training pathways. (#1834, #1891, #1907, #1902, #1940, #1939, #1897, #1907, #1912)
- SparseML eval pathways have been added with plugins for perplexity and lm-eval-harness specifically for large language model support. (#1834)
- AutoModel for causal language models, including quantized and sparse quantized support, has been added (see the usage sketch below).
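A minimal usage sketch of the causal-LM AutoModel support is shown below; it assumes the `SparseAutoModelForCausalLM` class exposed through `sparseml.transformers`, and the checkpoint path is a placeholder.

```python
# Hedged sketch: loading a sparse/quantized causal LM through SparseML's
# AutoModel support; the checkpoint path below is a placeholder.
from sparseml.transformers import SparseAutoModelForCausalLM
from transformers import AutoTokenizer

checkpoint = "path/or/zoo-stub/to-a-sparse-quantized-llm"  # placeholder
model = SparseAutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

inputs = tokenizer("Sparse large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```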
Changes:
- Exporting pathways has been simplified across text generation and CV use cases to auto infer previously required arguments, such as task type. (#1858, #1878, #1880, #1883, #1884, #1888, #1889, #1890, #1898, #1908, #1909, #1910)
- Recipe pathways have been updated to fully support LLMs for model compression techniques. (#1802, #1804, #1819, #1825, #1849)
- Pruning for models that are partially quantized is now supported. (#1792)
- OBCQ modifier `target_ids` argument is now optional. (#1825)
- `sequence_length` for transformer exports is now automatically inferred if it is not supplied. (#1826)
- OBCQ now supports non-CUDA systems. (#1828)
- Neural Magic's Ultralytics Enterprise License has been updated with a December 2023 amendment as cited. (#2090)
Resolved Issues:
- KV-cache injections now function accurately with MPT models in DeepSparse and SparseML, where before they crashed on export for MPT models. (#1801)
- `SmoothQuant` updated to support proper device forwarding, where previously it would not work in FSDP setups and would crash. (#1830)
- With `nsamples` increased to 512, the stability of OBCQ improved, resulting in a higher likelihood of it converging correctly. (#1812)
- `SmoothQuant` NaN values during computation are resolved. (#1872)
- `TypeError` with OBCQ when no `sequence_length` is provided is now resolved. (#1899)
Known Issues:
- Memory usage is currently high for one-shot and fine-tuning algorithms on LLMs, resulting in the need for GPUs with more memory for model sizes 7B and above.
- Memory usage is currently high for export pathways for LLMs, resulting in a requirement of large CPU RAM (>150GB) to successfully export for model sizes 7B and above.
- Currently, exporting models created with quantization through FSDP pathways is failing on reloading the model from disk. The workaround is to perform quantization on a single GPU rather than multiple GPUs. A hotfix is forthcoming.
- Currently, multi-stage pipelines that include quantization and are running through FSDP will fail after running training and on initialization of the SparseGPT quantization stage. This is due to the FSDP state not being propagated correctly. The workaround is to restart the run from the saved checkpoint after training and pruning are finished. A hotfix is forthcoming.
SparseML v1.6.1 Patch Release
This is a patch release for 1.6.0 that contains the following changes:
- The Neural Magic DeepSparse Community License reference has been renamed from `LICENSE-NEURALMAGIC` to `LICENSE` in the NOTICE file. (#1915)
Known Issues:
- Python API bug when loading a SparseYolo model then calling `model.val()` returns `AttributeError: 'DetectionModel' object has no attribute 'args'`.
  - [Immediate Resolution] Run `model.model.args = model.overrides` before the `model.val()` function call (see the snippet after this list).
  - [FIX] Coming in Release 1.7.
- The compile time for dense LLMs can be very slow; this will be addressed in a forthcoming release.
- Docker images are not currently being pushed; a resolution for functional Docker builds is forthcoming. [RESOLVED]
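For reference, the immediate resolution above amounts to the following, assuming `model` is a SparseYolo model that has already been loaded:

```python
# Workaround for the AttributeError noted above. `model` stands for a SparseYolo
# model loaded in the usual way for your pipeline (loading code omitted here).
# model = <load your SparseYolo model>

model.model.args = model.overrides  # restore the missing `args` attribute on the DetectionModel
metrics = model.val()               # validation now runs without the AttributeError
```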
SparseML v1.6.0
New Features:
- Version support added:
- Ultralytics YOLOv8 training and sparsification pipelines added. (Documentation) (#1517, #1522, #1520, #1528, #1521, #1561, #1579, #1597, #1599, #1629, #1637, #1638, #1673, #1686, #1656, #1787)
- NOTICE updated to reflect the now public-facing Ultralytics Enterprise Software License Agreement for YOLOv3/v5/v8.
- Initial sparsification framework v2 added for better generative AI support and improved functionality and extensibility. (Documentation available in v1.7) (#1713, #1751, #1742, #1763, #1759, #1769)
- BLOOM, CodeGen, OPT, Falcon, GPTNeo, LLAMA, MPT, and Whisper large language and generative models are supported through transformers training, sparsification, and export pipelines. (Documentation) (#1562, #1571, #1585, #1584, #1616, #1633, #1590, #1644, #1615, #1664, #1646, #1631, #1648, #1683, #1687, #1677, #1692, #1694, #1699, #1703, #1709, #1691, #171, #1720, #1746)
- `QuantizationModifier` for PyTorch sparsification pathways implemented to enable cleaner, more robust, and simpler arguments for quantizing models in comparison to the legacy quantization modifier. (Documentation) (#1568, #1594, #1639, #1693, #1745, #1738)
- CLIP pruning, quantization, and export supported. (Documentation) (#1581, #1626, #1711)
- INT4 quantization support added for model sparsification and export. (Documentation available in v1.8 with LLM support expansion) (#1670)
- DDP support added to Torchvision image classification training and sparsification pipelines. (Documentation available in v1.8 with new research paper) (#1698, #1784)
- SparseGPT, OBC, and OBQ one-shot/post-training pruning and quantization modifiers added for PyTorch pathways (a rough recipe sketch is shown below). (Documentation) (#1705, #1736, #1737, #1761, #1770, #1781, #1776, #1777, #1758)
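As a rough sketch of how these one-shot modifiers are driven, a pruning recipe can be written along the following lines; the modifier and parameter names here (`SparseGPTModifier`, `sparsity`, `block_size`, `targets`) are illustrative assumptions to verify against the sparsification framework v2 documentation. Such a recipe would then be handed to the one-shot pathways in the same way as the quantization sketch shown in the v1.8.0 notes above.

```python
# Hedged sketch: a one-shot SparseGPT pruning recipe for the new PyTorch
# modifiers; parameter names are indicative only and should be checked against
# the documentation for the installed SparseML version.
recipe = """
pruning_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5                  # target 50% unstructured sparsity
      block_size: 128
      targets: ["re:model.layers.*"] # placeholder layer pattern
"""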
Changes:
- SparseML upgraded for SparseZoo V2 model file structure changes, which expands the number of supported files and reduces the number of bytes that need to be downloaded for model checkpoints, folders, and files. (#1719)
- Docker builds updated to consistently rebuild for new releases and nightlies. (#1506, #1531, #1543, #1537, #1665, #1684)
- README and documentation updated to include: Slack Community name change, Contact Us form introduction, Python version changes; corrections for YOLOv5, torchvision, transformers, and SparseZoo broken links; and installation command updates. (#1536, #1577, #1578, #1610, #1617, #1612, #1602, #1659, #1721, #1725, #1726, #1785)
- Improved support for large ONNX files to improve loading performance and limit memory issues, especially for LLMs. (#1515, #1540, #1514, #1586)
- Transformers datasets can now be created without a model needing to be passed in. (#1544, #1545)
- Torchvision training and sparsification pipelines updated to enable patch versions of torchvision as installable dependencies; the version was previously restricted to 0.14.0 and now supports 0.14.x. (#1556)
- Image classification training and sparsification pipelines for torchvision now support arguments for RGB means and standard deviations to be passed in, enabling overriding of the hardcoded default ImageNet values. (#1546)
- YOLOv5 training and sparsification pipelines migrated to install from `nm-yolov5` on PyPI, removing the autoinstall from the `nm-yolov5` GitHub repository that would happen on invocation of the relevant pathways and enabling more predictable environments. (#1518, #1564, #1566)
- Transformers training and sparsification pipelines migrated to install from `nm-transformers` on PyPI, removing the autoinstall from the `nm-transformers` GitHub repository that would happen on invocation of the relevant pathways and enabling more predictable environments. (#1518, #1553, #1564, #1566, #1730)
- Deprecated and no longer supported:
- Pydantic version pinned to <2.0, preventing potential issues with untested versions. (#1645)
- Automatic link checking added to GitHub Actions. (#1525)
Resolved Issues:
- ONNX export for MobileBERT previously resulted in an exported ONNX model with poor performance in DeepSparse; this has been fixed. (#1539)
- OpenCV is now installed for image classification pathways when running `pip install sparseml[torchvision]`. Previously, these pathways would crash with a missing opencv dependency error unless it was installed separately. (#1575)
- SciPy version dependency issues with `scikit-image` have been resolved; they previously resulted in incompatibility errors on install of `scikit-image` for computer vision pathways. (#1570)
- Transformers export pathways for quantized models addressed, where the export would improperly crash and fail to export for some transformers models. (#1654)
- Transformers data support for JSONL files through the question answering pathways was resulting in a `JSONDecodeError`; these files now load correctly. (#1667, #1669)
- Unit and integration tests updated to remove temporary test files and limit test file creation, which were not being properly deleted. (#1609, #1668, #1672, #1696)
- Image classification pipelines no longer crash with an extra argument error when using CIFAR10 or CIFAR100 datasets. (#1671)
Known Issues:
- The compile time for dense LLMs can be very slow; this will be addressed in a forthcoming release.
- Docker images are not currently being pushed; a resolution for functional Docker builds is forthcoming. [RESOLVED]
SparseML v1.5.4 Patch Release
This is a patch release for 1.5.0 that contains the following changes:
- ClearML logging has been enabled for transformers. (#81)
SparseML v1.5.3 Patch Release
This is a patch release for 1.5.0 that contains the following changes:
- Pinned the Pydantic dependency (a data validation library for Python) to < v2.0 to prevent current workflows from breaking; a Pydantic upgrade is planned for a future release. (#1651)
SparseML v1.5.2 Patch Release
This is a patch release for 1.5.0 that contains the following changes:
- The latest transformers datasets versions supported in 1.5 are incompatible with pandas 2.0, so pandas is now restricted to < 2.0; future releases will support later datasets versions. (#1634)
SparseML v1.5.1 Patch Release
This is a patch release for 1.5.0 that contains the following changes:
- Propagated `datasets_dir` argument in YOLOv8 training command to address missing args error. (#1620)
SparseML v1.5.0
New Features:
- PyTorch 1.13 support (#1143)
- Enabled patch versions for torchvision 0.14.x (#1557)
- YOLOv8 sparsification pipelines (view)
- Per layer distillation support for PyTorch Distillation modifier (#1272)
- Torchvision training pipelines:
- Product usage analytics tracking; to disable, run the command `export NM_DISABLE_ANALYTICS=True` (see the Python alternative sketched below). (#1487)
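If setting the variable in a shell profile is inconvenient, the same opt-out can be applied from Python, assuming the environment variable is read when `sparseml` is imported:

```python
# Disable product usage analytics before importing sparseml; this assumes the
# NM_DISABLE_ANALYTICS environment variable is checked at import time.
import os

os.environ["NM_DISABLE_ANALYTICS"] = "True"

import sparseml  # noqa: E402  # imported only after the env var is set
```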
Changes:
- Transformers and YOLOv5 integrations migrated from auto install to install from PyPI packages. Going forward, `pip install sparseml[transformers]` and `pip install sparseml[yolov5]` will need to be used.
- Error message updated when utilizing wandb loggers and wandb is not installed in the environment, telling the user to pip install wandb. (#1374)
- Keras and TensorFlow tests have been removed; these are no longer actively supported pathways.
- `scikit-learn` now replaces `sklearn` to stay current with dependency name changes. (#1294)
Resolved Issues:
- Using recipes that utilized the legacy PyTorch QuantizationModifier with DDP when restoring weights for sparse transfer no longer crashes. (#1490)
- Labels were not being set correctly when utilizing a distillation teacher different from the student with token classification pipelines, which caused training runs to crash; this has been fixed. (#1414)
- Q/DQ folding fixed on ONNX export for quantization nodes occurring before Softmax in transformer graphs; performance issues would result for some transformer models in DeepSparse. (#1343)
- Inaccurate metrics calculations for torchvision training pipelines led to discrepancies in top1 and top5 accuracies by ~1%. (#1341)
Known Issues:
- None
SparseML v1.4.4 Patch Release
This is a patch release for 1.4.0 that contains the following changes:
- Support implemented for overriding ONNX inputs with static and dynamic shapes. (#1476)