v0.6.beta
Pre-release
What's Changed
- Fix ray nightly import by @jppgks in #2196
- Restructured split config and added datetime splitting by @tgaddair in #2132
- enh: Implements `InferenceModule` as a pipelined module with separate preprocessor, predictor, and postprocessor modules by @brightsparc in #2105
- Explicitly pass data credentials when reading binary files from a RayBackend by @jeffreyftang in #2198
- MlflowCallback: do not end run on_trainer_train_teardown by @jppgks in #2201
- Fail hyperopt with full import error when Ray not installed by @tgaddair in #2203
- Make convert_predictions() backend-aware by @hungcs in #2200
- feat: MVP for explanations using Integrated Gradients from captum by @jppgks in #2205
- [Torchscript] Adds GPU-enabled input types for Vector and Timeseries by @geoffreyangus in #2197
- feat: Added model type GBM (LightGBM tree learner), as an alternative to ECD by @jppgks in #2027
- [Torchscript] Parallelized Text/Sequence Preprocessing by @geoffreyangus in #2206
- feat: Adding feature type shared parameter capability for hyperopt by @arnavgarg1 in #2133
- Bump up version to 0.6.dev. by @justinxzhao in #2209
- Define `FloatOrAuto` and `IntegerOrAuto` schema fields, and use them. by @justinxzhao in #2219
- Define a dataclass for parameter metadata. by @justinxzhao in #2218
- Add explicit handling for zero-length image byte buffers to avoid cryptic errors by @jeffreyftang in #2210
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2231
- Create dataset util to form repeatable train/vali/test split by @amholler in #2159
- Bug fix: Use safe rename which works across filesystems when writing checkpoints by @dantreiman in #2225
- Add parameter metadata to the trainer schema. by @justinxzhao in #2224
- Add an explicit call to merge_with_defaults() when loading a config from a model directory. by @justinxzhao in #2226
- Fixes flaky test test_datetime_split[dask] by @dantreiman in #2232
- Fixes prediction saving for models with Set output by @geoffreyangus in #2211
- Make ExpectedImpact JSON serializable by @hungcs in #2233
- standardised quotation marks, added missing word by @Marvjowa in #2236
- Add boolean postprocessing to dataset type inference for automl by @magdyksaleh in #2193
- Update get_repeatable_train_val_test_split to handle non-stratified split w/ no existing split by @amholler in #2237
- Update R2 score to handle single sample computation by @arnavgarg1 in #2235
- Input/Output Feature Schema Refactor by @connor-mccorm in #2147
- Fix nan in entmax loss and flaky sparsemax/entmax loss tests by @dantreiman in #2238
- Fix preprocessing dataset split API backwards compatibility upgrade bug. by @justinxzhao in #2239
- Removing duplicates in constants from recent PRs by @arnavgarg1 in #2240
- Add attention scores of the vit encoder as an additional return value by @Dennis-Rall in #2192
- Unnest Audio Feature Preprocessing Config by @connor-mccorm in #2242
- Fixed handling of invalid number values to treat as missing values by @tgaddair in #2247
- Support saving numpy predictions to remote FS by @hungcs in #2245
- Use global constant for description.json by @hungcs in #2246
- Removed import warnings when LightGBM and Ray not requested by @tgaddair in #2249
- Adds ability to read images from numpy files and numpy arrays by @geoffreyangus in #2212
- Hyperopt steps per epoch not being computed correctly by @arnavgarg1 in #2175
- Fixed splitting when providing pre-split inputs by @tgaddair in #2248
- Added Backwards Compatibility for Audio Feature Preprocessing by @connor-mccorm in #2254
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2256
- Fix: Don't skip saving the model if the save path already exists. by @justinxzhao in #2264
- Load best weights outside of finally block, since load may throw an exception by @dantreiman in #2268
- Reduce number of distributed tests. by @justinxzhao in #2270
- [WIP] Adds `inference_utils.py` by @geoffreyangus in #2213
- Run github checks for pushes and merges to *-stable. by @justinxzhao in #2266
- Add ludwig logo and version to CLI help text. by @justinxzhao in #2258
- Add hyperopt_statistics.json constant by @hungcs in #2276
- fix: Make `BaseTrainerConfig` an abstract class by @ksbrar in #2273
- [Torchscript] Adds `--device` argument to `export_torchscript` CLI command by @geoffreyangus in #2275
- Use pytest tmpdir fixture wherever temporary directories are used in tests. by @justinxzhao in #2274
- adding configs used in benchmarking by @abidwael in #2263
- Fixes #2279 by @noahlh in #2284
- adding hardware usage and software packages tracker by @abidwael in #2195
- benchmarking utils by @abidwael in #2260
- dataclasses for summarizing benchmarking results by @abidwael in #2261
- Benchmarking core by @abidwael in #2262
- Fixed default eval_batch_size when setting batch_size=auto by @tgaddair in #2286
- Remove obsolete postprocess_inference_graph function. by @justinxzhao in #2267
- [Torchscript] Adds BERT tokenizer + partial HF tokenizer support by @geoffreyangus in #2272
- Support passing ground_truth as df for visualizations by @hungcs in #2281
- catching urllib3 exception by @abidwael in #2294
- Run pytest workflow on release branches. by @justinxzhao in #2291
- Save checkpoint if train_steps is smaller than batcher's steps_per_epoch by @dantreiman in #2298
- Fix typo in amazon review datasets: s/review_tile/review_title by @dantreiman in #2300
- Refactor non-distributed automl utils into a separate directory. by @justinxzhao in #2296
- Don't skip normalization in TabNet during inference on a single row. by @dantreiman in #2299
- Fix error in postproc_predictions calculation in model.evaluate() by @arnavgarg1 in #2304
- Test for parameter updates in Ludwig components by @jimthompson5802 in #2194
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2311
- Use warnings to suppress repeated logs for failed image reads by @arnavgarg1 in #2312
- Use ray dataset and drop type casting in binary_feature prediction post processing for speedup by @magdyksaleh in #2293
- Add size_bytes to DatasetInfo and DataSource by @jeffreyftang in #2306
- Fixes TensorDtype TypeError in Ray nightly by @geoffreyangus in #2320
- Add configuration section for global feature parameters by @arnavgarg1 in #2208
- Ensures unit tests are deleting artifacts during teardown by @geoffreyangus in #2310
- Fixes unit test that had empty Dask partitions after splitting by @geoffreyangus in #2313
- Serve json numpy encoding by @jeffkinnison in #2316
- fix: Mlflow config being injected in hyperopt config by @hungcs in #2321
- Update tests that use preprocessing to match new defaults config structure by @arnavgarg1 in #2323
- Bump test timeout to 60 minutes by @tgaddair in #2325
- Set a default value for size_bytes in DatasetInfo by @jeffreyftang in #2331
- Pin nightly versions to fix CI by @geoffreyangus in #2327
- Log number of failed image reads by @arnavgarg1 in #2317
- Add test with encoder dependencies for global defaults by @arnavgarg1 in #2342
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2334
- Add wine quality notebook to demonstrate using config defaults by @arnavgarg1 in #2333
- fix: GBM tests failing after new release from upstream dependency by @jppgks in #2347
- fix: restore overwrite of eval_batch_size on GBM schema by @jppgks in #2345
- Removes empty partitions after dropping rows and splitting datasets by @geoffreyangus in #2328
- fix: Properly serialize `ParameterMetadata` to JSON by @ksbrar in #2348
- Test for parameter updates in Ludwig Components - Part 2 by @jimthompson5802 in #2252
- refactor: Replace bespoke marshmallow fields that accept multiple types with a new 'combinatorial' `OneOfField` that accepts other fields as arguments. by @ksbrar in #2285
- Use Ray Datasets to read binary files in parallel by @tgaddair in #2241
- typos: Update README.md by @andife in #2358
- Respect the resource requests in RayPredictor by @magdyksaleh in #2359
- Resource tracker threading by @abidwael in #2352
- Allow writing init_config results to remote filesystems by @tgaddair in #2364
- Fixed export_mlflow command to not assume an existing registered_model_name by @tgaddair in #2369
- fix: Fixes to serialization, and update to allow set repo location. by @brightsparc in #2367
- Add amazon employee access challenge kaggle dataset by @justinxzhao in #2349
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2362
- Wrap read of cached training set metadata in try/except for robustness by @jeffreyftang in #2373
- Reduce dropout prob in test_conv1d_stack by @dantreiman in #2380
- fever: change broken download links by @jppgks in #2381
- Add default split config by @hungcs in #2379
- Fix CI: Skip failing ray GBM tests by @justinxzhao in #2391
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2389
- Triton ensemble export by @abidwael in #2251
- Fix: Random dataset splitting with 0.0 probability for optional validation or test sets. by @justinxzhao in #2382
- Print final training report as tabulated text. by @justinxzhao in #2383
- Add Ray 2.0 to CI by @tgaddair in #2337
- add GBM configs to benchmarking by @jppgks in #2395
- Optional artifact logging for MLFlow by @ShreyaR in #2255
- Simplify ludwig.benchmarking.benchmark API and add ludwig benchmark CLI by @abidwael in #2394
- rename kaggle_api_key to kaggle_key by @jppgks in #2384
- use new URL for yosemite dataset by @jppgks in #2385
- Encoder refactor V2 by @dantreiman in #2370
- re-enable GBM tests after new lightgbm-ray release by @jppgks in #2393
- Added option to log artifact location while creating mlflow experiment by @ShreyaR in #2397
- Treat dataset columns as object dtype during first pass of handle_missing_values by @jeffreyftang in #2398
- fix: ParameterMetadata JSON serialization bug by @ksbrar in #2399
- Adds registry to organize backward compatibility updates around versions and config sections by @dantreiman in #2335
- Include split column in explanation df by @connor-mccorm in #2405
- Fix AimCallback to use model_name as Run.name by @alberttorosyan in #2413
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2410
- Hotfix: features eligible for shared params hyperopt by @arnavgarg1 in #2417
- Nest FC Params in Decoder by @connor-mccorm in #2400
- Hyperopt Backwards Compatibility by @connor-mccorm in #2419
- Investigating test_resnet_block_layer intermittent test failure by @dantreiman in #2414
- fix: Remove duplicate option from `cell_type` field schema by @ksbrar in #2428
- Test for parameter updates in Ludwig Combiners - Part 3 by @jimthompson5802 in #2332
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2430
- Hotfix: Proc column missing in output feature schema by @arnavgarg1 in #2435
- Nest hyperopt parameters into decoder object by @arnavgarg1 in #2436
- Fix: Make the twitter bots modeling example runnable by @justinxzhao in #2433
- Add MLG-ULB creditcard fraud dataset by @jppgks in #2425
- Bugfix: non-number inputs to GBM by @jppgks in #2418
- GBM: log intermediate progress by @jppgks in #2421
- Fix: Upgrade ludwig config before schema validation by @connor-mccorm in #2441
- Log warning for calibration if validation set is trivially small by @dantreiman in #2440
- Fixes calibration and adds example scripts by @dantreiman in #2431
- Add medical no-show appointments dataset by @jppgks in #2387
- Added conditional check for UNK token insertion into category feature vocab by @arnavgarg1 in #2429
- Ensure synthetic dataset unit tests clean up extra files. by @justinxzhao in #2442
- Added feature specific parameter test for hyperopt by @arnavgarg1 in #2329
- Fixed version transformation to accept user configs without ludwig_version by @tgaddair in #2424
- Fix multiple partition predict by @magdyksaleh in #2422
- Cache jsonschema validator to reduce memory pressure by @tgaddair in #2444
- [tests] Added more explicit lifecycle management to Ray clusters during tests by @tgaddair in #2447
- Fix: explicit keyword args for seaborn plot fn by @jppgks in #2454
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2453
- Extended hyperopt to support nested configuration block parameters by @tgaddair in #2445
- Consolidate missing value strategy to only include bfill and ffill by @arnavgarg1 in #2457
- fix: Switched Learning Rate to NonNegativeFloat Field by @connor-mccorm in #2446
- Support GitHub Codespaces by @jppgks in #2463
- Enh: quality-of-life improvements for `export_torchscript` by @geoffreyangus in #2459
- Disables `batch_size: auto` for CPU-only training by @geoffreyangus in #2455
- bugfix: triton model version as a string by @abidwael in #2461
- Updating images to Ray 2.0.0 and CUDA 11.3 by @abidwael in #2390
- Loss, Split, and Defaults Schema Additions by @connor-mccorm in #2439
- More precise resource usage tracking by @abidwael in #2363
- Summarizing performance metrics and resource usage results by @abidwael in #2372
New Contributors
- @Marvjowa made their first contribution in #2236
- @Dennis-Rall made their first contribution in #2192
- @abidwael made their first contribution in #2263
- @noahlh made their first contribution in #2284
- @jeffkinnison made their first contribution in #2316
- @andife made their first contribution in #2358
- @alberttorosyan made their first contribution in #2413
Full Changelog: v0.5.3...v0.6.beta