​​v0.4.1: Ray training, Ray datasets, experimental AutoML with auto config generation integrated with hyperopt on RayTune, image improvements, Python3.9/TF2.7

@justinxzhao released this 01 Feb 07:28

Summary

This release features experimental AutoML with automatic config generation and auto-training integrated with hyperopt on RayTune, along with integrations with Ray training and Ray datasets. We're still working on a comprehensive overhaul of the documentation, and all of the new functionality will also be available in the upcoming v0.5.

Aside from critical bug fixes and new datasets, v0.4.1 will be the last release of Ludwig using TensorFlow. Starting with v0.5+ (release coming soon), Ludwig will use PyTorch as the backend for tensor computation. We will release a blog post detailing the rationale and impact of this decision. We wanted to do one last TensorFlow release so that everyone committed to the TensorFlow ecosystem who has used Ludwig so far can benefit from the many bug fixes and improvements made to the codebase that are not specific to PyTorch.

The next version v0.5 will also have several additional improvements that we’ll be excited to share in the coming weeks.

Additions

Improvements

  • Allow logging params to mlflow from any epoch by @tgaddair in #1211
  • Changed remote fs behavior to upload at the end of each epoch by @tgaddair in #1210
  • Add metric and loss modules for RMSE, RMSPE, and AUC by @ANarayan in #1214
  • [hyperopt] fixed metric_score to use test split when available by @tgaddair in #1239
  • Fixed metric selection to ignore config split if unavailable by @tgaddair in #1248
  • Ray Tune Intermediate Checkpoint Cleaning by @ANarayan in #1255
  • Do not initialize Ray if already initialized by @Yard1 in #1277
  • Changed default combiner to concat from tabnet by @ShreyaR in #1278
  • Ray data migration by @ShreyaR in #1260
  • Fix automl to treat binary as categorical when missing values present by @tgaddair in #1292
  • Add serialization for DatasetInfo and round avg_words to int by @hungcs in #1294
  • Cast max_length to int in build_sequence_matrix::pad by @Yard1 in #1295
  • [automl] update model config parameter ranges by @ANarayan in #1298
  • Change INFER_IMAGE_DIMENSIONS default to True by @hungcs in #1303
  • Add HTTPS retries for image urls by @hungcs in #1304
  • Return None for unreadable images and try to infer num channels by @hungcs in #1307
  • Add gray image/avg image fallbacks for unreachable images by @hungcs in #1312
  • Account for image extensions during image type inference by @hungcs in #1335
  • Fixed schema validation to handle null preprocessing values for strings by @tgaddair in #1344
  • Added default size and output_size for tabnet by @tgaddair in #1355
  • Removed DaskBackend and moved tests to RayBackend by @tgaddair in #1412
  • Perform preprocessing first before hyperopt when possible by @tgaddair in #1415
  • Employ a fallback str2bool mapping from the feature column's distinct values when the feature's values aren't boolean-like by @justinxzhao in #1471
  • Remove trailing dot in income label field in adult_census… by @amholler in #1475
  • Update Ludwig AutoML Feature Type Selection by @amholler in #1485
  • Update infer_type tests to reflect interface and functionality updates by @amholler in #1493
  • Skip converting to TensorDType if the column is binary by @tgaddair in #1547
  • Remove TensorDType conversion for all scalar types by @tgaddair in #1560
  • Update AutoML tabular model type choice to remove heuristic for concat by @amholler in #1548
  • Better handle empty fields with distinct_values=[] by @hungcs in #1574
  • Port #1476 ('dict' option for weights_initializer and bias_initializer) to tf_legacy by @ksbrar in #1599
  • Modify combiners to accept input_features as a dict instead of a list by @jeffreyftang in #1618
  • Update hyperopt: Choose best model from validation data; For stopped Ray Tune trials, run evaluate at search end by @amholler in #1612
  • Keep search_alg type in dict to record in hyperopt_statistics.json by @amholler in #1626
  • For ames_housing, remove test.csv from processing; it has no label column which prevents test split eval by @amholler in #1634
  • Improve Ludwig resilience to Ray Tune issues by @amholler in #1660
  • Handle download gzip files by @amholler in #1676
  • Upgrade tf from 2.5.2 to 2.7.0 by @justinxzhao in #1713
  • Add basic precommit to tf-legacy to pass precommit checks on tf-legacy PRs by @justinxzhao in #1718
  • For kdd datasets, do not include unlabeled test data by default by @amholler in #1704
  • Use config which has been previously validated by @vreyespue in #1213
  • Update Readme to activate directly the virtualenv by @vreyespue in #1212
  • doc: Correct README.md link to Developer Guide by @jimthompson5802 in #1217
  • Update pandas version by @w4nderlust in #1223
  • Modify Kaggle datasets to not process test sets by @ANarayan in #1233
  • Restructure dataframe preprocessing setup and change to avoid creatin… by @amholler in #1240
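
For readers unfamiliar with two of the metrics added in #1214, the sketch below shows the standard textbook definitions of RMSE and RMSPE in plain Python. It is an illustration only and not Ludwig's implementation: the function names `rmse` and `rmspe` are hypothetical, and the real metric and loss modules live in the TensorFlow codebase.

```python
import math
from typing import Sequence


def rmse(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("targets and predictions must be non-empty and equal in length")
    squared_errors = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmspe(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Root mean squared percentage error: sqrt(mean(((y - y_hat) / y)^2)).

    Assumption for this sketch: a zero target is rejected, because the
    relative error is undefined there.
    """
    if not targets or len(targets) != len(predictions):
        raise ValueError("targets and predictions must be non-empty and equal in length")
    if any(t == 0 for t in targets):
        raise ValueError("RMSPE is undefined when a target value is zero")
    squared_pct_errors = [((t - p) / t) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared_pct_errors) / len(squared_pct_errors))
```

RMSE is expressed in the units of the target, while RMSPE is scale-free, which makes it easier to compare across output features with different ranges.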

Bug fixes

Other changes and things to note

  • Moved experiments to separate repo by @tgaddair in #1245
  • Neuropod does not yet support python 3.9. Ludwig still supports neuropod for python<=3.8.

New Contributors

Full Changelog: v0.4...v0.4.1