Release v0.6.beta · ludwig-ai/ludwig

What's Changed

Fix ray nightly import by @jppgks in #2196
Restructured split config and added datetime splitting by @tgaddair in #2132
enh: Implements InferenceModule as a pipelined module with separate preprocessor, predictor, and postprocessor modules by @brightsparc in #2105
Explicitly pass data credentials when reading binary files from a RayBackend by @jeffreyftang in #2198
MlflowCallback: do not end run on_trainer_train_teardown by @jppgks in #2201
Fail hyperopt with full import error when Ray not installed by @tgaddair in #2203
Make convert_predictions() backend-aware by @hungcs in #2200
feat: MVP for explanations using Integrated Gradients from captum by @jppgks in #2205
[Torchscript] Adds GPU-enabled input types for Vector and Timeseries by @geoffreyangus in #2197
feat: Added model type GBM (LightGBM tree learner), as an alternative to ECD by @jppgks in #2027
[Torchscript] Parallelized Text/Sequence Preprocessing by @geoffreyangus in #2206
feat: Adding feature type shared parameter capability for hyperopt by @arnavgarg1 in #2133
Bump up version to 0.6.dev. by @justinxzhao in #2209
Define FloatOrAuto and IntegerOrAuto schema fields, and use them. by @justinxzhao in #2219
Define a dataclass for parameter metadata. by @justinxzhao in #2218
Add explicit handling for zero-length image byte buffers to avoid cryptic errors by @jeffreyftang in #2210
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2231
Create dataset util to form repeatable train/vali/test split by @amholler in #2159
Bug fix: Use safe rename which works across filesystems when writing checkpoints by @dantreiman in #2225
Add parameter metadata to the trainer schema. by @justinxzhao in #2224
Add an explicit call to merge_with_defaults() when loading a config from a model directory. by @justinxzhao in #2226
Fixes flaky test test_datetime_split[dask] by @dantreiman in #2232
Fixes prediction saving for models with Set output by @geoffreyangus in #2211
Make ExpectedImpact JSON serializable by @hungcs in #2233
standardised quotation marks, added missing word by @Marvjowa in #2236
Add boolean postprocessing to dataset type inference for automl by @magdyksaleh in #2193
Update get_repeatable_train_val_test_split to handle non-stratified split w/ no existing split by @amholler in #2237
Update R2 score to handle single sample computation by @arnavgarg1 in #2235
Input/Output Feature Schema Refactor by @connor-mccorm in #2147
Fix nan in entmax loss and flaky sparsemax/entmax loss tests by @dantreiman in #2238
Fix preprocessing dataset split API backwards compatibility upgrade bug. by @justinxzhao in #2239
Removing duplicates in constants from recent PRs by @arnavgarg1 in #2240
Add attention scores of the vit encoder as an additional return value by @Dennis-Rall in #2192
Unnest Audio Feature Preprocessing Config by @connor-mccorm in #2242
Fixed handling of invalid number values to treat as missing values by @tgaddair in #2247
Support saving numpy predictions to remote FS by @hungcs in #2245
Use global constant for description.json by @hungcs in #2246
Removed import warnings when LightGBM and Ray not requested by @tgaddair in #2249
Adds ability to read images from numpy files and numpy arrays by @geoffreyangus in #2212
Hyperopt steps per epoch not being computed correctly by @arnavgarg1 in #2175
Fixed splitting when providing pre-split inputs by @tgaddair in #2248
Added Backwards Compatibility for Audio Feature Preprocessing by @connor-mccorm in #2254
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2256
Fix: Don't skip saving the model if the save path already exists. by @justinxzhao in #2264
Load best weights outside of finally block, since load may throw an exception by @dantreiman in #2268
Reduce number of distributed tests. by @justinxzhao in #2270
[WIP] Adds inference_utils.py by @geoffreyangus in #2213
Run github checks for pushes and merges to *-stable. by @justinxzhao in #2266
Add ludwig logo and version to CLI help text. by @justinxzhao in #2258
Add hyperopt_statistics.json constant by @hungcs in #2276
fix: Make BaseTrainerConfig an abstract class by @ksbrar in #2273
[Torchscript] Adds --device argument to export_torchscript CLI command by @geoffreyangus in #2275
Use pytest tmpdir fixture wherever temporary directories are used in tests. by @justinxzhao in #2274
adding configs used in benchmarking by @abidwael in #2263
Fixes #2279 by @noahlh in #2284
adding hardware usage and software packages tracker by @abidwael in #2195
benchmarking utils by @abidwael in #2260
dataclasses for summarizing benchmarking results by @abidwael in #2261
Benchmarking core by @abidwael in #2262
Fixed default eval_batch_size when setting batch_size=auto by @tgaddair in #2286
Remove obsolete postprocess_inference_graph function. by @justinxzhao in #2267
[Torchscript] Adds BERT tokenizer + partial HF tokenizer support by @geoffreyangus in #2272
Support passing ground_truth as df for visualizations by @hungcs in #2281
catching urllib3 exception by @abidwael in #2294
Run pytest workflow on release branches. by @justinxzhao in #2291
Save checkpoint if train_steps is smaller than batcher's steps_per_epoch by @dantreiman in #2298
Fix typo in amazon review datasets: s/review_tile/review_title by @dantreiman in #2300
Refactor non-distributed automl utils into a separate directory. by @justinxzhao in #2296
Don't skip normalization in TabNet during inference on a single row. by @dantreiman in #2299
Fix error in postproc_predictions calculation in model.evaluate() by @arnavgarg1 in #2304
Test for parameter updates in Ludwig components by @jimthompson5802 in #2194
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2311
Use warnings to suppress repeated logs for failed image reads by @arnavgarg1 in #2312
Use ray dataset and drop type casting in binary_feature prediction post processing for speedup by @magdyksaleh in #2293
Add size_bytes to DatasetInfo and DataSource by @jeffreyftang in #2306
Fixes TensorDtype TypeError in Ray nightly by @geoffreyangus in #2320
Add configuration section for global feature parameters by @arnavgarg1 in #2208
Ensures unit tests are deleting artifacts during teardown by @geoffreyangus in #2310
Fixes unit test that had empty Dask partitions after splitting by @geoffreyangus in #2313
Serve json numpy encoding by @jeffkinnison in #2316
fix: Mlflow config being injected in hyperopt config by @hungcs in #2321
Update tests that use preprocessing to match new defaults config structure by @arnavgarg1 in #2323
Bump test timeout to 60 minutes by @tgaddair in #2325
Set a default value for size_bytes in DatasetInfo by @jeffreyftang in #2331
Pin nightly versions to fix CI by @geoffreyangus in #2327
Log number of failed image reads by @arnavgarg1 in #2317
Add test with encoder dependencies for global defaults by @arnavgarg1 in #2342
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2334
Add wine quality notebook to demonstrate using config defaults by @arnavgarg1 in #2333
fix: GBM tests failing after new release from upstream dependency by @jppgks in #2347
fix: restore overwrite of eval_batch_size on GBM schema by @jppgks in #2345
Removes empty partitions after dropping rows and splitting datasets by @geoffreyangus in #2328
fix: Properly serialize ParameterMetadata to JSON by @ksbrar in #2348
Test for parameter updates in Ludwig Components - Part 2 by @jimthompson5802 in #2252
refactor: Replace bespoke marshmallow fields that accept multiple types with a new 'combinatorial' OneOfField that accepts other fields as arguments. by @ksbrar in #2285
Use Ray Datasets to read binary files in parallel by @tgaddair in #2241
typos: Update README.md by @andife in #2358
Respect the resource requests in RayPredictor by @magdyksaleh in #2359
Resource tracker threading by @abidwael in #2352
Allow writing init_config results to remote filesystems by @tgaddair in #2364
Fixed export_mlflow command to not assume an existing registered_model_name by @tgaddair in #2369
fix: Fixes to serialization, and update to allow set repo location. by @brightsparc in #2367
Add amazon employee access challenge kaggle dataset by @justinxzhao in #2349
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2362
Wrap read of cached training set metadata in try/except for robustness by @jeffreyftang in #2373
Reduce dropout prob in test_conv1d_stack by @dantreiman in #2380
fever: change broken download links by @jppgks in #2381
Add default split config by @hungcs in #2379
Fix CI: Skip failing ray GBM tests by @justinxzhao in #2391
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2389
Triton ensemble export by @abidwael in #2251
Fix: Random dataset splitting with 0.0 probability for optional validation or test sets. by @justinxzhao in #2382
Print final training report as tabulated text. by @justinxzhao in #2383
Add Ray 2.0 to CI by @tgaddair in #2337
add GBM configs to benchmarking by @jppgks in #2395
Optional artifact logging for MLFlow by @ShreyaR in #2255
Simplify ludwig.benchmarking.benchmark API and add ludwig benchmark CLI by @abidwael in #2394
rename kaggle_api_key to kaggle_key by @jppgks in #2384
use new URL for yosemite dataset by @jppgks in #2385
Encoder refactor V2 by @dantreiman in #2370
re-enable GBM tests after new lightgbm-ray release by @jppgks in #2393
Added option to log artifact location while creating mlflow experiment by @ShreyaR in #2397
Treat dataset columns as object dtype during first pass of handle_missing_values by @jeffreyftang in #2398
fix: ParameterMetadata JSON serialization bug by @ksbrar in #2399
Adds registry to organize backward compatibility updates around versions and config sections by @dantreiman in #2335
Include split column in explanation df by @connor-mccorm in #2405
Fix AimCallback to use model_name as Run.name by @alberttorosyan in #2413
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2410
Hotfix: features eligible for shared params hyperopt by @arnavgarg1 in #2417
Nest FC Params in Decoder by @connor-mccorm in #2400
Hyperopt Backwards Compatibility by @connor-mccorm in #2419
Investigating test_resnet_block_layer intermittent test failure by @dantreiman in #2414
fix: Remove duplicate option from cell_type field schema by @ksbrar in #2428
Test for parameter updates in Ludwig Combiners - Part 3 by @jimthompson5802 in #2332
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2430
Hotfix: Proc column missing in output feature schema by @arnavgarg1 in #2435
Nest hyperopt parameters into decoder object by @arnavgarg1 in #2436
Fix: Make the twitter bots modeling example runnable by @justinxzhao in #2433
Add MLG-ULB creditcard fraud dataset by @jppgks in #2425
Bugfix: non-number inputs to GBM by @jppgks in #2418
GBM: log intermediate progress by @jppgks in #2421
Fix: Upgrade ludwig config before schema validation by @connor-mccorm in #2441
Log warning for calibration if validation set is trivially small by @dantreiman in #2440
Fixes calibration and adds example scripts by @dantreiman in #2431
Add medical no-show appointments dataset by @jppgks in #2387
Added conditional check for UNK token insertion into category feature vocab by @arnavgarg1 in #2429
Ensure synthetic dataset unit tests to clean up extra files. by @justinxzhao in #2442
Added feature specific parameter test for hyperopt by @arnavgarg1 in #2329
Fixed version transformation to accept user configs without ludwig_version by @tgaddair in #2424
Fix multiple partition predict by @magdyksaleh in #2422
Cache jsonschema validator to reduce memory pressure by @tgaddair in #2444
[tests] Added more explicit lifecycle management to Ray clusters during tests by @tgaddair in #2447
Fix: explicit keyword args for seaborn plot fn by @jppgks in #2454
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2453
Extended hyperopt to support nested configuration block parameters by @tgaddair in #2445
Consolidate missing value strategy to only include bfill and ffill by @arnavgarg1 in #2457
fix: Switched Learning Rate to NonNegativeFloat Field by @connor-mccorm in #2446
Support GitHub Codespaces by @jppgks in #2463
Enh: quality-of-life improvements for export_torchscript by @geoffreyangus in #2459
Disables batch_size: auto for CPU-only training by @geoffreyangus in #2455
bugfix: triton model version as a string by @abidwael in #2461
Updating images to Ray 2.0.0 and CUDA 11.3 by @abidwael in #2390
Loss, Split, and Defaults Schema Additions by @connor-mccorm in #2439
More precise resource usage tracking by @abidwael in #2363
Summarizing performance metrics and resource usage results by @abidwael in #2372

New Contributors

@Marvjowa made their first contribution in #2236
@Dennis-Rall made their first contribution in #2192
@abidwael made their first contribution in #2263
@noahlh made their first contribution in #2284
@jeffkinnison made their first contribution in #2316
@andife made their first contribution in #2358
@alberttorosyan made their first contribution in #2413

Full Changelog: v0.5.3...v0.6.beta
