Releases: ludwig-ai/ludwig
v0.8.5
What's Changed
- Add function to free GPU memory by @Infernaught in #3643
- ❗ Enable LLM fine-tuning tests when no quantization is specified by @arnavgarg1 in #3626
- Add check to ensure selected backend works with quantization for LLMs by @arnavgarg1 in #3646
- [CI] Use a torch-nightly-compatible version of torchaudio by @justinxzhao in #3644
- Set do_sample default to True by @Infernaught in #3641
- FIX: Failure in audio feature related test by @jimthompson5802 in #3651
- Remove unnecessary peft config updating by @Infernaught in #3642
- FIX: docker build error for ludwig-gpu by @jimthompson5802 in #3658
- Exclude getdaft on Windows by @carlogrisetti in #3629
- Add daft back for windows since the wheels are now officially published by @arnavgarg1 in #3663
- fix: The final batch of an epoch is skipped when batch size is 1 by @jeffkinnison in #3653
- Place metric functions for BLEU and ROUGE on correct devices when using multiple GPUs by @arnavgarg1 in #3671
- Remove duplicate metrics by @Infernaught in #3670
- Increment epochs based on last_batch() instead of at the end of the train loop. by @justinxzhao in #3668
- [FEATURE] Support Merging LoRA Weights Into Base Model (Issue-3603) by @alexsherstinsky in #3649
- [FEATURE] Include Mistral-7B model in list of supported base models by @alexsherstinsky in #3674
- [MAINTENANCE] Partially reconcile type hints, fix some warnings, and fix comments in parts of the codebase. by @alexsherstinsky in #3673
- Improve error message for when an LLM base model can't be loaded. by @justinxzhao in #3675
- Fix eos_token and pad_token issue by @Infernaught in #3667
- FIX: error with nightly CI tests for test_resize_image by @jimthompson5802 in #3678
- [BUGFIX] Remove spurious test directory at the end of the test_llm.py::test_local_path_loading test run by @alexsherstinsky in #3680
- Add per-device logging to tensorboard by @Infernaught in #3677
- Fix dynamic generation config load during `model.predict` by @geoffreyangus in #3666
- [CI] Ensure that mlflow callback cleans up background-saving threads on trainer teardown. by @justinxzhao in #3683
- fix: temporarily remove config validation check for backend by @geoffreyangus in #3688
- fix: Failing test for backend with quantization by @arnavgarg1 in #3689
- [BUGFIX] Ensure that full base models and not only adapter weights get saved when merge_and_unload is set by @alexsherstinsky in #3679
- Add Ludwig Star History to README by @arnavgarg1 in #3696
- Use sphinx for all docstrings in api.py by @justinxzhao in #3693
- Fix binary variables being visualized as 0 and 1 by @Infernaught in #3691
- [MAINTENANCE] Fix the linting warnings in two backend component classes. by @alexsherstinsky in #3698
- [BUGFIX] Pin deepspeed<0.11, skip Horovod tests by @alexsherstinsky in #3700
- Unpin deepspeed following fix in v0.11.1 by @tgaddair in #3706
- Move on_epoch_end and epoch increment to after run_evaluation loop. by @justinxzhao in #3690
- Remove model_load_path from experiment by @Infernaught in #3707
- [FEATURE] Allow typehints without the quotes. by @alexsherstinsky in #3699
New Contributors
- @alexsherstinsky made their first contribution in #3649
Full Changelog: v0.8.4...v0.8.5
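Two of the entries above (#3649, #3679) concern merging LoRA adapter weights back into the base model. Mathematically, the merge folds a scaled low-rank product into the frozen weight matrix: W_merged = W + (alpha / r) * (B @ A). The toy sketch below illustrates the arithmetic in plain Python with made-up 2x2 shapes; it is not Ludwig's or PEFT's implementation.

```python
def mm(X, Y):
    """Plain-Python matrix multiply for small nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

r, alpha = 1, 2                    # LoRA rank and scaling (toy values)
W = [[1.0, 0.0], [0.0, 1.0]]       # frozen base weight, 2x2
B = [[0.5], [0.25]]                # LoRA "up" matrix, 2xr
A = [[1.0, 2.0]]                   # LoRA "down" matrix, rx2

delta = mm(B, A)                   # low-rank update, 2x2
scale = alpha / r
# Merged weight: base plus scaled low-rank update, so the adapter
# can be discarded and inference uses a single dense matrix.
W_merged = [[W[i][j] + scale * delta[i][j] for j in range(2)] for i in range(2)]
```

After merging, no extra adapter matmul is needed at inference time, which is the motivation for exposing a merge option in #3649.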
v0.8.4
What's Changed
- Add codellama to tokenizer list for set_pad_token by @Infernaught in #3598
- Set default eval batch size to 2 for LLM fine-tuning by @arnavgarg1 in #3599
- [CI] Explicitly set eval batch size in determinism tests, introduce a new integration test group, and exclude slow tests. by @justinxzhao in #3590
- [CI] Run sudo apt-get update in GHAs. by @justinxzhao in #3608
- Store steps_per_epoch in Trainer by @hungcs in #3601
- Updated characters, underscore and comma preprocessors to be TorchScriptable. by @martindavis in #3602
- [CI] Deflake: Explicitly set eval batch size for mlflow test. by @justinxzhao in #3612
- Fix registration for char error rate. by @justinxzhao in #3604
- fix: Load 8-bit quantized models for eval after fine-tuning by @jeffkinnison in #3606
- Add Code Alpaca and Consumer Complaints Datasets by @connor-mccorm in #3611
- Add support for gradient checkpointing for LLM fine-tuning by @arnavgarg1 in #3613
- Bump min support transformers to 4.33.0 by @tgaddair in #3616
- [CI] Fix failing tests on master by @arnavgarg1 in #3617
- Eliminate short-circuiting for loading from local by @Infernaught in #3600
- Refactor integration tests into matrix by @tgaddair in #3618
- fix: Check underlying model device type when moving 8-bit quantized models to GPU at eval by @jeffkinnison in #3622
- Fixed range validation for text generation penalty parameters by @tgaddair in #3623
- Update comment for predict to update Ludwig docs by @Infernaught in #3535
- Avoid deprecation warnings on pandas Series.fillna by @carlogrisetti in #3631
- QoL: Default to using fast tokenizer for Llama models by @arnavgarg1 in #3625
- fixed typo in EfficientNet's model variant from v2_ to v2_s by @saad-palapa in #3628
- Add pytorch profiler and additional tensorboard logs for GPU memory usage. by @justinxzhao in #3607
- Pin minimum transformers version to `4.33.2` by @arnavgarg1 in #3637
New Contributors
- @saad-palapa made their first contribution in #3628
Full Changelog: v0.8.3...v0.8.4
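#3613 above adds gradient checkpointing for LLM fine-tuning. The core idea, trading compute for memory by saving activations only at segment boundaries and recomputing the rest during the backward pass, can be sketched conceptually in plain Python (a toy illustration, not Ludwig's or PyTorch's actual autograd machinery):

```python
def checkpointed_forward(layers, x, every=2):
    """Run layers sequentially, storing activations only at every
    `every`-th boundary instead of after every layer."""
    saved = {0: x}
    h = x
    for i, layer in enumerate(layers, start=1):
        h = layer(h)
        if i % every == 0:
            saved[i] = h  # checkpointed activation
    return h, saved

def recompute_activation(layers, saved, i, every=2):
    """Recompute the activation after layer i from the nearest earlier
    checkpoint, as a backward pass would, instead of having stored it."""
    start = (i // every) * every
    h = saved[start]
    for layer in layers[start:i]:
        h = layer(h)
    return h

# With 4 toy layers and every=2, only activations 0, 2, 4 are kept;
# activation 3 is rebuilt from checkpoint 2 at the cost of one extra layer call.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x + 3, lambda x: x * 4]
out, saved = checkpointed_forward(layers, 1, every=2)
```

Memory drops from O(num_layers) stored activations to O(num_layers / every), at the cost of up to `every - 1` recomputed layer calls per activation needed in backward.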
v0.8.3
What's Changed
- Add test to show global_max_sequence_length can never exceed an LLM's context length by @arnavgarg1 in #3548
- WandB: Add metric logging support on eval end and epoch end by @arnavgarg1 in #3586
- schema: Add `prompt` validation check by @ksbrar in #3564
- Unpin Transformers for CodeLlama support by @arnavgarg1 in #3592
- Add support for Paged Optimizers (Adam, Adamw), 8-bit optimizers, and new optimizers: LARS, LAMB and LION by @arnavgarg1 in #3588
- fix: Failure in TabTransformer Combiner Unit test by @jimthompson5802 in #3596
- fix: Move target tensor to model output device in `check_module_parameters_updated` by @jeffkinnison in #3567
- Allow user to specify huggingface link or local path to pretrained lora weights by @Infernaught in #3572
Full Changelog: v0.8.2...v0.8.3
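#3588 above adds, among others, the LION optimizer. Lion's distinguishing trait is that it steps by the sign of an interpolated momentum rather than by the raw gradient magnitude. A minimal scalar sketch of one update (my own toy illustration with arbitrary hyperparameters, not the bitsandbytes or Ludwig implementation):

```python
def sign(x):
    return (x > 0) - (x < 0)

def lion_step(p, g, m, lr=0.1, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update for a scalar parameter p, gradient g, momentum m."""
    update = sign(beta1 * m + (1 - beta1) * g)  # sign of interpolated momentum
    p = p - lr * (update + weight_decay * p)    # fixed-size step + decoupled decay
    m = beta2 * m + (1 - beta2) * g             # momentum EMA update
    return p, m

# Minimize f(p) = p^2 from p = 3: while the gradient stays positive,
# the iterate walks toward 0 in fixed lr-sized steps.
p, m = 3.0, 0.0
for _ in range(10):
    p, m = lion_step(p, 2 * p, m)  # g = f'(p) = 2p
```

Because the step size is `lr * sign(...)` regardless of gradient scale, Lion needs only one momentum buffer per parameter, which is part of its memory appeal alongside the 8-bit and paged optimizers added in the same PR.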
v0.8.2
What's Changed
- int: Rename original `combiner_registry` to `combiner_config_registry`, update decorator name by @ksbrar in #3516
- Add mechanic to override default values for generation during model.predict() by @justinxzhao in #3520
- [feat] Support for numeric date feature inputs by @jeffkinnison in #3517
- Add new synthesized `response` column for text output features during postprocessing by @arnavgarg1 in #3521
- Disable flaky twitter bots dataset loading test. by @justinxzhao in #3439
- Add test that verifies that the generation config passed in at model.predict() is used correctly. by @justinxzhao in #3523
- Move loss metric to same device as inputs by @Infernaught in #3522
- Add comment about batch size tuning by @arnavgarg1 in #3526
- Ensure user sets backend to local w/ quantization by @Infernaught in #3524
- README: Update LLM fine-tuning config by @arnavgarg1 in #3530
- Revert "Ensure user sets backend to local w/ quantization (#3524)" by @tgaddair in #3531
- Revert "Ensure user sets backend to local w/ quantization" for release-0.8 branch and upgrade version to 0.8.1.post1 by @justinxzhao in #3532
- Improve observability during LLM inference by @arnavgarg1 in #3536
- [bug] Pin pydantic to < 2.0 by @jeffkinnison in #3537
- [bug] Support preprocessing `datetime.date` date features by @jeffkinnison in #3534
- Remove obsolete prompt tuning example. by @justinxzhao in #3540
- Add Ludwig 0.8 notebook to the README by @arnavgarg1 in #3542
- Add `effective_batch_size` to auto-adjust gradient accumulation by @tgaddair in #3533
- Refactor evaluation metrics to support decoded generated text metrics like BLEU and ROUGE. by @justinxzhao in #3539
- Fix sequence generator test. by @justinxzhao in #3546
- Revert "Add Cosine Annealing LR scheduler as a decay method (#3507)" by @justinxzhao in #3545
- Set default max_sequence_length to None for LLM text input/output features by @arnavgarg1 in #3547
- Add skip_all_evaluation as a mechanic to skip all evaluation. by @justinxzhao in #3543
- Roll-forward with fixes: Fix interaction between scheduler.step() and gradient accumulation steps, refactor schedulers to use `LambdaLR`, and add cosine annealing LR scheduler as a decay method. by @justinxzhao in #3555
- fix: Move model to the correct device for eval by @jeffkinnison in #3554
- Report loss in tqdm to avoid log spam by @tgaddair in #3559
- Wrap each metric update in try/except. by @justinxzhao in #3562
- Move DDP model to device if it hasn't been wrapped yet by @tgaddair in #3566
- ensure that there are enough colors to match the score index in visua… by @thelinuxkid in #3560
- Pin Transformers to 4.31.0 by @arnavgarg1 in #3569
New Contributors
- @thelinuxkid made their first contribution in #3560
Full Changelog: v0.8.1...v0.8.2
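#3533 above introduces `effective_batch_size` to auto-adjust gradient accumulation. The standard relationship is effective batch = micro-batch size × accumulation steps × data-parallel workers; the sketch below solves for the accumulation steps (an illustration of that arithmetic under the usual definition, not Ludwig's actual code):

```python
def grad_accum_steps(effective_batch_size, micro_batch_size, num_workers=1):
    """Smallest accumulation step count so that
    micro_batch_size * steps * num_workers >= effective_batch_size."""
    per_step = micro_batch_size * num_workers
    return max(1, -(-effective_batch_size // per_step))  # ceiling division
```

For example, an effective batch of 128 with micro-batches of 16 on 2 workers needs 4 accumulation steps; if the micro-batch already covers the target, the answer clamps to 1.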
v0.8.1.post1
What's Changed
- Revert "Ensure user sets backend to local w/ quantization" for release-0.8 branch and upgrade version to 0.8.1.post1 by @justinxzhao in #3532
Full Changelog: v0.8.1...v0.8.1.post1
v0.8.1
What's Changed
- Update ludwig version to 0.8.1 by @arnavgarg1 in #3527
- Release 0.8.1 latest by @arnavgarg1 in #3528
Full Changelog: v0.8...v0.8.1
v0.8: Low Code Framework to Efficiently Build Custom LLMs on Your Data
Full Release Blog Post Here: https://predibase.com/blog/ludwig-v0-8-open-source-toolkit-to-build-and-fine-tune-custom-llms-on-your-data
What's Changed
- Make fill_value a medium impact parameter in preprocessing by @arnavgarg1 in #3155
- Allow auto cherry-pick into release-0.7 by @tgaddair in #3157
- Fixed confidence_penalty for newer versions of pytorch by @tgaddair in #3156
- Fixed set explanations by @tgaddair in #3160
- Bump to hummingbird 0.4.8 by @tgaddair in #3164
- Update SequenceGeneratorDecoder to output predictions and probabilities by @jeffkinnison in #3152
- Disable sampling in preprocessing or when it results in too few rows by @ShreyaR in #3117
- Unpin pyarrow by @tgaddair in #3167
- Make Horovod an optional dependency when using Ray by @tgaddair in #3166
- Skip sample_ratio validation when using Dask to prevent materialization of DF by @tgaddair in #3174
- Fix TorchVision channel preprocessing by @geoffreyangus in #3173
- Bump Ludwig to v0.7.1 by @tgaddair in #3179
- Add fallback mirrors to dataset API by @abidwael in #3168
- Log cached dataset write paths during cache miss by @arnavgarg1 in #3181
- Re-enable benchmark tests on `Sarcos` dataset by @abidwael in #3169
- Disable passthrough decoder for all feature types by @arnavgarg1 in #3151
- Log when cached dataset can't be found by @arnavgarg1 in #3192
- Remove hard dependency on ludwig[tree]. Check `model.type()` instead of `instanceof(model)`. by @justinxzhao in #3184
- Add sequence decoder integration tests by @jeffkinnison in #3175
- int: [REBASE] Remove unnecessary JSON schema code by @ksbrar in #3196
- fix: [REBASE] Hoist uniqueItemProperties to top of feature JSON schema by @ksbrar in #3183
- Guarantee determinism when sampling (either overall via `sample_ratio`, or while balancing data) by @arnavgarg1 in #3191
- Use tmpdir for more files generated during tests by @tgaddair in #3197
- Removed .vscode and added to .gitignore by @tgaddair in #3201
- Revert vscode by @tgaddair in #3202
- Fixes dict_hash discrepancy by @w4nderlust in #3195
- Reset Ray address by @arnavgarg1 in #3200
- Unpin scikit-learn. by @justinxzhao in #3185
- Fixed learning_rate_scheduler params in automl by @tgaddair in #3203
- Fix Docker image dependencies and add tests for minimal install by @tgaddair in #3186
- Bump to v0.7.2 by @tgaddair in #3208
- fix: [REBASE] Misc. JSON schema fixes by @ksbrar in #3187
- fix: [REBASE] Streamline GBM defaults schema by @ksbrar in #3188
- Update sequence/text feature `max_sequence_length` default to `None` by @geoffreyangus in #3205
- Add category onehot encoder for both ECD and GBM by @tgaddair in #3057
- Enable `transformer` encoder and disable `embed` encoder from `SequenceCombiner` by @abidwael
by @abidwael in #3154 - Add auxiliary validation for all features to be present in comparator combiner entities by @abidwael in #3216
- Disable number decoder in the decoder config by @abidwael in #3217
- Add `non_zero` to `common_fields.NumFCLayersField` by @abidwael in #3215
- Add the ability to specify a local or S3 HF mirror for more guaranteed loading of pre-trained HF models. by @justinxzhao in #3211
- Schemafy merge_fixed_preprocessing_params by @tgaddair in #3223
- Tagger decoder config override and auxiliary validation checks by @arnavgarg1 in #3222
- Refactor loss implementation to use the schema config for all parameters by @tgaddair in #3227
- follow up: unregister passthrough decoder for all features by @abidwael in #3225
- top_k should be a positive integer by @connor-mccorm in #3230
- Refactored combiner registry and broke circular dep with schema by @tgaddair in #3228
- Implements `sequence_length` param by @geoffreyangus in #3221
- Add columns and data types to ludwig datasets by @connor-mccorm in #3231
- Fixes Explain step for tied weights by @geoffreyangus in #3214
- Fix @slow hf tests by @justinxzhao in #3233
- Remove partial RayTune checkpoints for trials that have not completed because of forceful termination by @arnavgarg1 in #3232
- Replace NaN in timeseries rows with `padding_value` by @tgaddair in #3238
- Adds support for TEXT features when using GBM with tf-idf encoder by @tgaddair in #3235
- Persist Dask Dataframe after binary image/audio reads by @arnavgarg1 in #3241
- Add Timeseries forecasting for column-major data, and introduce Timeseries output feature by @tgaddair in #3212
- fix: transform `onehot` encoder outputs to float32 tensor by @abidwael in #3242
- Pin `torchaudio` by @geoffreyangus in #3244
- Pin `torchvision` and `torchtext` by @geoffreyangus in #3248
- Filter entities from comparator combiner when not listed in input_features by @tgaddair in #3251
- Reset os.environ var in hf_utils. by @justinxzhao in #3253
- fix typo in `pretrained_model_name_or_path` by @abidwael in #3257
- Fixes torch DDP distributed metric computation for AUROC by @geoffreyangus in #3234
- Unpin `torchvision`, `torchtext`, and `torchaudio`. by @justinxzhao in #3255
- Added compute tiers to parameter metadata by @tgaddair in #3254
- Allow encoders for GBMs by @arnavgarg1 in #3258
- feat: Add env var `LUDWIG_SCHEMA_VALIDATION_POLICY` to change marshmallow validation strictness by @tgaddair in #3226
- Removes invalid keys from GBM defaults in the schema by @arnavgarg1 in #3252
- Add support for `model.compile` in PyTorch 2.0 by @tgaddair in #3246
- Fix ludwig docker by @tgaddair in #3264
- Bump to v0.7.3 by @tgaddair in #3267
- Update gpt text encoder `afn` parameter default to what's listed in HF docs. by @justinxzhao in #3261
- Add test for falling back to HF model that's not in the ludwig pretrained dir by @justinxzhao in #3256
- Fixed non-scalar (text, vector, set) output feature explanations by @tgaddair in #3269
- Use `LudwigFeatureDict` to permit module keys that are rejected by torch ModuleDict. by @justinxzhao in #3270
- Fixed handling of datetime types in input parquet files by @tgaddair in #3274
- Fix date explanations by @tgaddair in #3276
- Handle `np.bool` to JSON `NumpyEncoder` by @abidwael in #3280
- Update all combiners to use `.get` to access `LudwigFeatureDict` contents by @abidwael in #3279
- Check that concat combiner doesn't receive a mixture of non-reduced sequence and non-sequence features. by @justinxzhao in #3271
- Better handling for missing dataset columns by @connor-mccorm in #3285
- feat: Add `kwargs` option to all file readers and feed `nrows` where possible by @ksbrar in #3266
- Update version to 0.8.dev by @justinxzhao in #3286
- Added DeBERTa (v2 / v3) text encoder by @tgaddair in #3289
- Check `fill_with_const` has `fill_value` for binary features by @ab...
v0.7.5
What's Changed
- Fixed URI loading by @tgaddair in #3314
- Filter auto_transformer kwargs based on forward signature by @tgaddair in #3332
- Remove tables for Ludwig 0.7. by @justinxzhao in #3503
- Update ludwig version to v0.7.5 by @justinxzhao in #3506
Full Changelog: v0.7.4...v0.7.5
v0.7.4
What's Changed
- Tagger decoder config override and auxiliary validation checks (#3222)
Full Changelog: v0.7.3...v0.7.4
v0.7.3
What's Changed
- Support for PyTorch 2.0 via `trainer.compile: true` (#3246)
- Fix ludwig docker (#3264)
- Add env var `LUDWIG_SCHEMA_VALIDATION_POLICY` to change marshmallow validation strictness (#3226)
- Add `sequence_length` capability (#3259)
- Persist Dask Dataframe after binary image/audio reads (#3241)
- Replace NaN in timeseries rows with `padding_value` (#3238)
- Remove partial RayTune checkpoints for trials that have not completed because of forceful termination (#3232)
Full Changelog: v0.7.2...v0.7.3
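The `trainer.compile: true` flag above opts a model into PyTorch 2.0's `torch.compile`. A minimal config sketch showing where the flag sits; the feature names are placeholders, and only the `trainer.compile` key itself comes from these release notes:

```yaml
input_features:
  - name: review        # placeholder input feature
    type: text
output_features:
  - name: sentiment     # placeholder output feature
    type: category
trainer:
  compile: true         # opt in to torch.compile (requires PyTorch >= 2.0)
```

On older PyTorch versions the flag should simply be left out.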