Recent Changes

H2O

Yates (3.24.0.4) - 5/28/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/4/index.html

Bug

  • [PUBDEV-4305] - Fixed an error that occurred when applying as.matrix() to an h2o dataframe with numeric values of size ~ 600K x 300.
  • [PUBDEV-5937] - Introduced a new xgboost.predict.native.enable property, which ensures that H2OXGBoostEstimator no longer always predicts the same value.
  • [PUBDEV-6440] - Users can now parse files from s3 using s3's directory URL with s3 protocol.
  • [PUBDEV-6475] - Fixed an issue that caused h2o.getModelTree to produce an "invalid object for slot nas" error when XGBoost produced a root-node only decision tree.
  • [PUBDEV-6476] - Improved performance of H2OXGBoost on OS X.
  • [PUBDEV-6479] - In Stacked Ensembles, fixed a categorical encoding mismatch error when building the ensemble. Users can now use SE on top of base models that are trained with categorical encoding.
  • [PUBDEV-6483] - In Isolation Forest, you can now specify that mtries = the number of features.
  • [PUBDEV-6488] - Fixed an issue that caused XGBoost to produce a tree with split features being all NA.
  • [PUBDEV-6489] - In h2o.getModelTree, when retrieving a threshold for values that are all NAs, updated the description to state that the "Split value is NA."
  • [PUBDEV-6490] - Fixed an issue that caused trivial features with NAs to be given inflated importance when monotonicity constraints were enabled. As a result, variable importance values were incorrect.
  • [PUBDEV-6491] - Fixed an NPE issue at water.init.HostnameGuesser when trying to launch a Sparkling Water cluster.
  • [PUBDEV-6496] - Removed internal_cv_weights from h2o.predict_contributions() output when the prediction was used on a fold column from a model run with nfolds.
  • [PUBDEV-6521] - Models that use Label Encoding no longer predict incorrectly on test data.
  • [PUBDEV-6523] - Predictions now work correctly on a subset of training features when using categorical_encoding.
  • [PUBDEV-6532] - Fixed an issue in which XGBoost formatted non-integer numbers (doubles, floats) using Locale.ENGLISH to ensure that a decimal point "." was used instead of a comma ",". This locale setting also grouped large numbers by thousands and separated the groups with ",", which XGBoost could not parse.
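
The failure mode behind PUBDEV-6532 can be reproduced in miniature. The sketch below uses Python rather than Java, so it only illustrates the general problem (thousands grouping yields a string that a strict numeric parser rejects). It is not the H2O or XGBoost code path, and the helper name `parses_as_float` is made up for this example.

```python
# Illustrative only: a Python stand-in for the formatting behaviour described above.
value = 1234567.5

grouped = format(value, ",.1f")  # thousands are grouped with ","
plain = format(value, ".1f")     # no grouping, decimal point only


def parses_as_float(text):
    """Return True if a strict numeric parser accepts the text (hypothetical helper)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


print(grouped, parses_as_float(grouped))
print(plain, parses_as_float(plain))
```

The grouped string contains commas and is rejected, while the ungrouped string parses cleanly.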

New Feature

  • [PUBDEV-6478] - Added support for CDH 6.2.
  • [PUBDEV-6503] - Users can now specify an external IP for h2odriver callback.

Improvement

  • [PUBDEV-6519] - Added a "toCategoricalCol" helper function for column type conversion.
  • [PUBDEV-6522] - Renamed "Generic Models" to "MOJO Import" in the documentation.

Docs

  • [PUBDEV-6486] - Added CDH 6.2 to list of supported Hadoop platforms.
  • [PUBDEV-6511] - Added the import_hive_table() and import_mojo() functions to the R HTML documentation.

Yates (3.24.0.3) - 5/7/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/3/index.html

Bug

  • [PUBDEV-5969] - Updated H2O-3 plotting functionality to be compatible with Matplotlib version 3.0.0.
  • [PUBDEV-6384] - Flow now shows the correct long value of a seed.
  • [PUBDEV-6394] - Fixed an issue that caused Rapids string operations on enum (categorical) columns to yield counterintuitive results.
  • [PUBDEV-6402] - Fixed an issue that caused monotonicity constraint in XGBoost to fail with certain parameters.
  • [PUBDEV-6408] - Fixed an ArrayIndexOutOfBounds error that occurred when parsing quotes in CSV files.
  • [PUBDEV-6416] - Fixed an error with Grid Search that caused the API to print errors for all previous failures, not only those related to the model currently being added to the grid. This occurred even when the model was not added to the grid due to failure.
  • [PUBDEV-6431] - Fixed an exception that occurred when requesting Jobs from h2o.
  • [PUBDEV-6439] - When using Python 2.7, fixed an issue with non-ascii character handling in the as_data_frame() method.
  • [PUBDEV-6449] - Predicting on a dataset that has a response column with domain in a different order no longer leads to memory leaks.
  • [PUBDEV-6451] - Fixed an issue with retrieving details of a GLM model in Flow due to lack of support for long seeds.

Improvement

  • [PUBDEV-6419] - Simplified the directory structure of logs within downloaded zip archives.
  • [PUBDEV-6428] - Upgraded XGBoost to the latest stable build.
  • [PUBDEV-6435] - Users can now import and upload MOJOs in R and Python using `import_mojo()` and `upload_mojo()`.
  • [PUBDEV-6450] - It is now possible to retrieve a list of features from a trained model.

Docs

  • [PUBDEV-6024] - Enhanced the GBM Reproducibility FAQ.
  • [PUBDEV-6456] - Added information about the Target Encoding smoothing parameter to the User Guide.

Yates (3.24.0.2) - 4/16/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/2/index.html

Bug

  • [PUBDEV-6221] - In the R client, fixed a caching issue that caused tests to fail when running commands line by line after running the entire test at once.
  • [PUBDEV-6369] - Fixed an issue that caused h2o.upload_custom_metric to fail when using Python 3.
  • [PUBDEV-6370] - Fixed an issue that caused h2o.upload_custom_metric to fail on data that includes strings.
  • [PUBDEV-6371] - Fixed an issue with the K-Means_Example.flow.
  • [PUBDEV-6372] - The IP:port that is shown for logging now matches the IP:port that is described in the makeup of the cluster.
  • [PUBDEV-6377] - In XGBoost, fixed an AIOOB issue that occurred when running large data.
  • [PUBDEV-6390] - H2O-hive is now published to Maven central.
  • [PUBDEV-6393] - The Rapids as.factor operation no longer automatically converts non-ASCII strings to sanitized forms.
  • [PUBDEV-6395] - Fixed an AIOOB error in the AUC builder.
  • [PUBDEV-6399] - AUCBuilder now finds the first bin to merge when merging per-chunk histograms.
  • [PUBDEV-6409] - When running H2O on Hadoop, Hadoop now writes only to its container directory.
  • [PUBDEV-6418] - Users now receive a warning if two different versions of H2O are trying to communicate on the same node.
  • [PUBDEV-6421] - Fixed an issue that caused the H2O Python package to fail to load on a fresh install from pip.
  • [PUBDEV-6433] - Fixed an error that occurred when running multiple concurrent Group-By operations.

Improvement

  • [PUBDEV-6310] - The new GCP Marketplace offering contains the option to add a network tags script.

Docs

  • [PUBDEV-6040] - Added Python examples to the Target Encoding topic.
  • [PUBDEV-6401] - Fixed links to Sparkling Water topics in the Sparkling Water FAQ.
  • [PUBDEV-6425] - In CoxPH chapter, changed the link for the available R demo.

Yates (3.24.0.1) - 3/31/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/1/index.html

Bug

  • [PUBDEV-6159] - The AutoMLTest.java test suite now runs correctly on a local machine.
  • [PUBDEV-6189] - Fixed an issue in as_date that occurred when the column included NAs.
  • [PUBDEV-6208] - AutoML no longer fails if one of the Stacked Ensemble models is deleted.
  • [PUBDEV-6230] - Removed ellipses after the H2O server link when launching the Python client.
  • [PUBDEV-6231] - In Deep Learning, fixed an issue that occurred when running one-hot-encoding on categoricals.
  • [PUBDEV-6262] - When running GBM in R without specifically setting a seed, users can now extract the seed that was used to build the model and reproduce that model.
  • [PUBDEV-6266] - In predictions, fixed an issue that resulted in a "Categorical value out of bounds error" when calling a model.
  • [PUBDEV-6284] - The Python API no longer reverses the labels for positive and negative values in the standardized coefficients plot legend.
  • [PUBDEV-6346] - In R, fixed an issue that caused group_by mean to only calculate one column when multiple columns were specified.
  • [PUBDEV-6350] - Fixed an issue that caused the confusion_matrix method to return matrices for other metrics.
  • [PUBDEV-6357] - Fixed an issue that resulted in a "Categorical value out of bounds error" when calling a model using Python.
  • [PUBDEV-6360] - Improved the error message that displays when a user attempts to modify an Enum/categorical column as if it were a string.
  • [PUBDEV-6367] - Rows that start with a # symbol are no longer dropped during the import process.
  • [PUBDEV-6368] - Fixed an SVM import failure.
  • [PUBDEV-6376] - Fixed an issue that caused the default StackedEnsemble prediction to fail when applied to a test dataset without a response column.
  • [PUBDEV-6379] - Fixed handling of BAD state in CategoricalWrapperVec.

New Feature

  • [PUBDEV-4680] - Added Blending mode to Stacked Ensembles, which can be specified with the `blending_frame` parameter. With Blending mode, you do not use cross-validation preds to train the metalearner. Instead you score the base models on a holdout set and use those predicted values.
  • [PUBDEV-5801] - Model output now includes column names and types.
  • [PUBDEV-5809] - AutoML now includes a max_runtime_secs_per_model option.
  • [PUBDEV-5925] - In GLM, added support for negative binomial family.
  • [PUBDEV-5980] - Exposed Java target encoding to R.
  • [PUBDEV-6056] - For GBM and XGBoost models, users can now generate feature contributions (SHAP values).
  • [PUBDEV-6136] - Added support for Generic Models, which provide a means to use external, pretrained MOJO models in H2O for scoring. Currently only GBM, DRF, IF, and GLM MOJO models are supported.
  • [PUBDEV-6180] - Added the blending_frame parameter to Stacked Ensembles in Flow.
  • [PUBDEV-6196] - Added an include_algos parameter to AutoML in the R and Python APIs. Note that in Flow, users can specify exclude_algos only.
  • [PUBDEV-6339] - In the R and Python clients, added a function that calculates the chunk size based on raw size of the data, number of CPU cores, and number of nodes.
  • [PUBDEV-6344] - Added ability to import from Hive using metadata from Metastore.
  • [PUBDEV-6358] - Users can now choose the database where import_sql_select creates a temporary table.
  • [PUBDEV-6365] - Added support for monotonicity constraints for binomial GBMs.
  • [PUBDEV-6374] - Users can now define custom HTTP headers using an `-add_http_header` option.
  • [PUBDEV-6386] - XGBoost MOJO now uses Java predictor by default.
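
To make the Blending mode entry (PUBDEV-4680) concrete, here is a minimal sketch of the idea in plain Python: the base models are scored on a holdout set, and the metalearner is fitted on those holdout predictions instead of on cross-validation predictions. The base models, the toy holdout data, and the one-weight metalearner are all hypothetical stand-ins, not the H2O Stacked Ensemble implementation.

```python
# Hypothetical toy models and data; not the H2O Stacked Ensemble implementation.
def base_a(x):
    return 2.0 * x  # a base model that overshoots


def base_b(x):
    return 0.5 * x  # a base model that undershoots


# Holdout ("blending") rows as (feature, target); the true relation is y = x.
blending_frame = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]


def fit_metalearner(frame, steps=100):
    """Grid-search the weight w that minimises the holdout MSE of
    w * base_a(x) + (1 - w) * base_b(x)."""
    best_w, best_mse = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        mse = sum(
            (w * base_a(x) + (1 - w) * base_b(x) - y) ** 2 for x, y in frame
        ) / len(frame)
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w


w = fit_metalearner(blending_frame)


def ensemble(x):
    """Blend the base-model predictions using the weight learned on the holdout."""
    return w * base_a(x) + (1 - w) * base_b(x)


print(w, ensemble(10.0))
```

The design point is only that the metalearner never sees the rows the base models were trained on. In H2O itself the holdout is supplied through the `blending_frame` parameter.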

Task

  • [PUBDEV-4982] - Fixed an issue that caused the pyunit_lending_club_munging_assembly_large.py and pyunit_assembly_munge_large.py tests to sometimes fail when run inside a Docker container.
  • [PUBDEV-5876] - Simplified and improved the GLM COD implementation.

Improvement

  • [PUBDEV-5491] - SQLite support is available via any JDBC driver in streaming mode.
  • [PUBDEV-5993] - Updated Retrofit and okHttp dependencies.
  • [PUBDEV-6129] - Target Encoding is now available in the Python client.
  • [PUBDEV-6176] - Moved StackedEnsembleModel to hex.ensemble packages. In prior versions, this was in a root hex package.
  • [PUBDEV-6188] - Secret key ID and secret key are available for s3:// AWS protocol.
    • This can be done in the R client using:
      h2o.setS3Credentials(accessKeyId, accesSecretKey)

    • and in the Python client using:
      from h2o.persist import set_s3_credentials
      set_s3_credentials(access_key_id, secret_access_key)
  • [PUBDEV-6217] - Users can now specify AWS credentials at runtime.
  • [PUBDEV-6254] - The new blending_frame parameter is now available in AutoML.
  • [PUBDEV-6334] - Fixed an error in the Javadoc for the Frame.java sort function.
  • [PUBDEV-6363] - Fixed Hive delegation token generation.
  • [PUBDEV-6388] - Reordered the algorithms trained in AutoML and prioritized hardcoded XGBoost models.

Docs

  • [PUBDEV-4977] - Removed FAQ indicating that Java 9 was not yet supported.
  • [PUBDEV-6136] - Added a "Generic Models" chapter to the Algorithms section.
  • [PUBDEV-6179] - Added the blending_frame parameter to Stacked Ensembles documentation.
  • [PUBDEV-6280] - Added information about the Negative Binomial family to the GLM booklet and the user guide.
  • [PUBDEV-6289] - Improved the R and Python client documentation for the `sum` function.
  • [PUBDEV-6331] - Added include_algos, exclude_algos, max_models, and max_runtime_secs_per_model examples to the Parameters appendix.
  • [PUBDEV-6362] - In the User Guide and R and Python documentation, replaced references to "H2O Cloud" with "H2O Cluster".
  • [PUBDEV-6375] - Added information about predict_contributions to the Performance and Prediction chapter.
  • [PUBDEV-6381] - In the GBM chapter, noted that monotone_constraints is available for Bernoulli distributions in addition to Gaussian distributions.
  • Improved the GBM Reproducibility FAQ.

Xu (3.22.1.6) - 3/13/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/6/index.html

Bug

  • [PUBDEV-6335] - In GBM, added a check to ensure that monotonicity constraints can only be used when distribution="gaussian".
  • [PUBDEV-6342] - Fixed an issue that caused decreasing monotonic constraints to fail to work correctly. Min-Max bounds are now properly propagated to the subtrees.

Improvement

  • [PUBDEV-6343] - Added internal validation of monotonicity of GBM trees.
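
As a rough illustration of what validating monotonicity means in this release, the sketch below checks that a sequence of predictions, ordered by increasing feature value, never moves against the constrained direction. The helper `is_monotone` and its `direction` convention (1 for increasing, -1 for decreasing) are assumptions made for this example. They are not the internal H2O validation code.

```python
def is_monotone(predictions, direction=1):
    """Check predictions that are ordered by increasing feature value.

    direction=1 requires a non-decreasing sequence,
    direction=-1 requires a non-increasing sequence.
    """
    return all(
        direction * (later - earlier) >= 0
        for earlier, later in zip(predictions, predictions[1:])
    )


increasing = [0.1, 0.1, 0.4, 0.9]
decreasing = [0.9, 0.4, 0.4, 0.1]

print(is_monotone(increasing, direction=1))
print(is_monotone(decreasing, direction=-1))
print(is_monotone(decreasing, direction=1))
```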

Xu (3.22.1.5) - 3/4/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/5/index.html

Bug

  • [PUBDEV-6283] - Fixed an issue that caused stratified_split to fail when run on same column twice.
  • [PUBDEV-6290] - Fixed an error that occurred when retrieving AutoML leader model with max_models = 1 in R.
  • [PUBDEV-6292] - Fixed an issue that resulted in an extra NA row in the GLM variable importance frame.
  • [PUBDEV-6298] - h2odriver now works correctly on MapR.
  • [PUBDEV-6300] - Flow no longer displays an error when searching for a file without first providing a path.
  • [PUBDEV-6303] - GBM monotonicity constraints now correctly preserve the exact monotonicity.
  • [PUBDEV-6304] - Fixed the warning message that displays for categorical data with more than 10,000,000 values.
  • [PUBDEV-6305] - Users can now download logs from R after connecting via Steam.
  • [PUBDEV-6313] - In AutoML, created new partition rules for generating new validation and leaderboard frames when cross validation is disabled and validation/leaderboard frames are not provided:
    • If only the validation frame is missing: training/validation = 90/10.
    • If only the leaderboard frame is missing: training/leaderboard = 90/10.
    • If both the validation and leaderboard frames are missing: training/validation/leaderboard = 80/10/10.
  • [PUBDEV-6321] - Fixed resolution of `spark-shell --packages "ai.h2o:h2o-algos:"` by Spark Ivy resolver.
  • [PUBDEV-6333] - Fixed an issue that caused h2o driver to fail to start when Hive was not configured.
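
The AutoML partition rules listed under PUBDEV-6313 can be summarised as a small lookup. The sketch below assumes cross validation is disabled, as the entry states. The function name and the dictionary return shape are invented for illustration and do not mirror the AutoML source.

```python
def automl_split_ratios(has_validation, has_leaderboard):
    """Return the share of the training data assigned to each frame
    when cross validation is disabled (hypothetical helper)."""
    if not has_validation and not has_leaderboard:
        return {"training": 0.8, "validation": 0.1, "leaderboard": 0.1}
    if not has_validation:
        return {"training": 0.9, "validation": 0.1}
    if not has_leaderboard:
        return {"training": 0.9, "leaderboard": 0.1}
    # Both frames were supplied, so nothing is carved off the training data.
    return {"training": 1.0}


print(automl_split_ratios(has_validation=False, has_leaderboard=True))
print(automl_split_ratios(has_validation=True, has_leaderboard=False))
print(automl_split_ratios(has_validation=False, has_leaderboard=False))
```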

Improvement

  • [PUBDEV-6271] - In Isolation Forest, fixed an issue that caused the minimum and maximum path length to not be correctly calculated when there are no OOB observations.
  • [PUBDEV-6294] - A `check_constant_response` option is available in DRF and GBM. When enabled (default), an exception is thrown if the response column is a constant value.

Docs

  • [PUBDEV-5554] - When running XGBoost on Hadoop, the documentation now recommends that users set -extramempercent to 120.
  • [PUBDEV-6287] - Added the new check_constant_response option to the GBM and DRF chapters. Also added an example usage to the Parameters Appendix.
  • [PUBDEV-6301] - Added a description of the AUCPR metric to the Model Performance section in the User Guide.
  • [PUBDEV-6314] - Fixed the Random Grid Search in Python example in the Grid Search chapter.

Xu (3.22.1.4) - 2/15/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/4/index.html

Bug

  • [PUBDEV-6242] - Users can now save and load Isolation Forest models.
  • [PUBDEV-6264] - In K-Means, fixed an issue in which time columns were treated as if they were categorical.
  • [PUBDEV-6267] - Fixed Autoencoder `calculateReconstructionErrorPerRowData` error and set the default value of the result MSE to -1.

Improvement

  • [HEXDEV-733] - When using h2o.import_sql_table to read from a Hive table, the username and password no longer appear in the logs.
  • [PUBDEV-6207] - Monotone constraints are now exposed in Flow.
  • [PUBDEV-6277] - The check for constants in response columns is now optional for all models.

Docs

  • [PUBDEV-6032] - Added to the documentation that MOJO/POJO predict cannot parse columns enclosed in double quotes (for example, ""2"").
  • [PUBDEV-6174] - Updated the description for Gini in the User Guide.
  • [PUBDEV-6183] - Fixed the equation for Tweedie Deviance in the GLM booklet and in the User Guide.
  • [PUBDEV-6199] - Added a "Tokenize Strings" topic to the Data Manipulation chapter.
  • [PUBDEV-6245] - Added `predict_leaf_node_assignment` information to the User Guide in the Performance and Prediction chapter.
  • [PUBDEV-6253] - Noted in the documentation that the `custom` and `custom_increasing` stopping metric options are not available in the R client.

Xu (3.22.1.3) - 1/25/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/3/index.html

Bug

  • [PUBDEV-6186] - Improved error handling for a wrong Hive JDBC connector error.
  • [PUBDEV-6233] - Fixed an issue that caused H2O clusters to fail to come up on Cloudera 6 with HTTPS.

New Feature

  • [PUBDEV-6216] - Added Hive with Kerberos support for H2O on Hadoop.

Docs

  • [PUBDEV-6219] - Updated the default value for min_rows in the User Guide when used with XGBoost, DRF, and Isolation Forest.

Xu (3.22.1.2) - 1/18/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/2/index.html

Bug

  • [PUBDEV-6109] - In Flow, fixed an issue that caused POJOs, MOJOs, and genmodel.jar to fail to download. This occurred when Flow was launched via Enterprise Steam and in any deployment where user_context was specified.
  • [PUBDEV-6177] - Fixed an issue that caused H2OTree to fail with Isolation Forest models trained on data with categorical columns.
  • [PUBDEV-6178] - When a new tree is assembled from a model, the root node now includes information about the split feature in the description array.
  • [PUBDEV-6181] - Fixed an issue where Flow failed to provide the ability to ignore certain columns.
  • [PUBDEV-6192] - In Flow, fixed an issue where users were not able to select a frame when splitting a dataset.
  • [PUBDEV-6197] - Setting the `ignored_columns` parameter via the Python API now works correctly.
  • [PUBDEV-6198] - Fixed an issue that caused H2O to hang in Sparkling Water deployments.
  • [PUBDEV-6200] - Splitting frames now works correctly in Flow.
  • [PUBDEV-6201] - Import SQL Table now works correctly in Flow.
  • [PUBDEV-6203] - Fixed an issue with imports in Flow.
  • [PUBDEV-6204] - Fixed interaction pairs for GLM in Flow.
  • [PUBDEV-6206] - Fixed broken "Combine predictions with frame" in Flow.

Task

  • [PUBDEV-6171] - Fixed the pyunit_pubdev_3500_max_k_large.py unit test.
  • [PUBDEV-6172] - Fixed the runit_PUBDEV_5705_drop_columns_parser_gz.R unit test.

Improvement

  • [PUBDEV-6167] - Increased the XGBoost stress test timeout.
  • [PUBDEV-6188] - Implemented secret key credentials for s3:// AWS protocol.
  • [PUBDEV-6205] - Renamed .jade files to .pug.

Docs

  • [PUBDEV-6165] - Added HDP 3.0 and 3.1 to list of supported Hadoop versions.
  • [PUBDEV-6190] - Updated wording for Kmeans Scoring History Graph. This graph shows the number of iterations vs. the within-cluster sum of squares.

Xu (3.22.1.1) - 12/28/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/1/index.html

Bug

  • [PUBDEV-5236] - PCA tests now work correctly with the "from h2o.estimators.pca import H2OPrincipalComponentAnalysisEstimator" import statement.
  • [PUBDEV-5956] - Fixed an AutoMLTest test that was leaking keys in KeepCrossValidationFoldAssignment test.
  • [PUBDEV-6081] - Reduced the Invocation JMH level setup/teardown to only the training model.
  • [PUBDEV-6124] - In XGBoost, the default value of L2 regularization for tree models is now 1, which is consistent with native XGBoost.
  • [PUBDEV-6157] - Fixed an issue that caused Stacked Ensembles to fail with GLM metalearner when the same H2O instance was used to train a GLM multinomial classification model with more classes than what is used in Stacked Ensembles.

New Feature

  • [PUBDEV-5261] - Users can now specify `custom` and `custom_increasing` when setting the `stopping_metric` parameter in GBM and DRF.
  • [PUBDEV-5770] - Checkpoints can now be exported when running Grid Search or AutoML.

Improvement

  • [PUBDEV-5820] - Hadoop builds now work with Jetty 8 and 9.
  • [PUBDEV-5897] - R examples in the R package docs now use Hadley's style guide.

Docs

  • [PUBDEV-6048] - Added documentation for the new stopping_metric options in GBM and DRF.
  • [PUBDEV-6154] - Added CDH 6 and 6.1 to list of supported Hadoop versions.
  • [PUBDEV-6156] - In the XGBoost chapter, updated the default value for reg_lambda to be 1.

Xia (3.22.0.5) - 1/16/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xia/5/index.html

Bug

  • [PUBDEV-6198] - Fixed an H2O hang issue in Sparkling Water deployments.

Xia (3.22.0.4) - 1/4/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xia/4/index.html

Bug

  • [PUBDEV-6109] - In Flow, fixed an issue that caused POJOs, MOJOs, and genmodel.jar to fail to download. This occurred when Flow was launched via Enterprise Steam and in any deployment where user_context was specified.
  • [PUBDEV-6166] - On the external backend, H2O now explicitly passes the timestamp from the Spark Driver node.

Xia (3.22.0.3) - 12/21/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xia/3/index.html

Bug

  • [PUBDEV-5829] - Fixed an issue with the REST API. Calling "get model" no longer returns 0 for the timestamp of the model.
  • [PUBDEV-5959] - The PySparking client no longer hangs after re-connecting to the H2O external backend.
  • [PUBDEV-5990] - Fixed an OOM issue in h2o.arrange.
  • [PUBDEV-6059] - Fixed an issue that caused importing Parquet files with large Double data to fail.
  • [PUBDEV-6076] - After applying group_by to a time stamped column, the original time stamp format is now retained.
  • [PUBDEV-6079] - In AutoML, cross-validation metrics are now used for early stopping by default. Because of this, the validation_frame argument is now ignored unless nfolds==0 and, in that case, will be used for early stopping.
  • [PUBDEV-6098] - Fixed an issue that caused the MOJO visualizer to fail for Isolation Forest models.
  • [PUBDEV-6101] - StackedEnsembleMojoModel is now serializable.
  • [PUBDEV-6107] - In the R client, fixed an error that occurred when running getModelTree.
  • [PUBDEV-6109] - In Flow, fixed an issue that caused POJOs, MOJOs, and genmodel.jar to fail to download. This occurred when Flow was launched via Enterprise Steam and in any deployment where user_context was specified.
  • [PUBDEV-6111] - Fixed the formula used for calculating L2 distance.
  • [PUBDEV-6117] - The Python client now allows users to compare H2O XGBoost with native XGBoost on any H2O frame. The convert_H2OFrame_2_DMatrix method accepts any H2O frame and converts it to valid data for native XGBoost.
  • [PUBDEV-6120] - H2O XGBoost now reports correct variable importances. The variable importances are computed from the gains of their respective loss functions during tree construction.
  • [PUBDEV-6122] - Users can now save PDP plots.
  • [PUBDEV-6123] - Fixed an issue that resulted in a SQL exception when connecting H2O to a SQL server and importing a table.
  • [PUBDEV-6137] - Fixed an issue with GCS support on Hadoop environments.
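
The variable importance entry (PUBDEV-6120) says importances are derived from the gains recorded during tree construction. One simple way to picture that is total gain per feature, expressed as a share of all gain. The split records and the `gain_importance` helper below are hypothetical, and the exact scaling H2O reports may differ.

```python
from collections import defaultdict

# Hypothetical split records: (feature used by the split, gain achieved by the split).
splits = [("age", 4.0), ("income", 1.5), ("age", 2.5), ("zip", 2.0)]


def gain_importance(split_records):
    """Sum the gains per feature and express each as a share of the total gain."""
    totals = defaultdict(float)
    for feature, gain in split_records:
        totals[feature] += gain
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: -item[1])
    return {feature: gain / grand_total for feature, gain in ranked}


importance = gain_importance(splits)
for feature, share in importance.items():
    print(feature, round(share, 3))
```

A feature that is never chosen for a split accumulates no gain and therefore receives no importance under this scheme.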

New Feature

  • [PUBDEV-1984] - Added monotonic variables for GBM.
  • [PUBDEV-6030] - EasyPredictModelWrapper now calculates reconstruction errors for AutoEncoder.
  • [PUBDEV-6091] - The grid summary table now includes a timestamp column that shows when each model was added to the grid during a grid search.

Improvement

  • [PUBDEV-5865] - In GBM, users can now specify the `monotone_constraints` parameter.
  • [PUBDEV-6106] - Prediction contributions from each tree from MOJO to easywrapper are now exposed.
  • [PUBDEV-6110] - Updated Gradle to version 5.0.
  • [PUBDEV-6115] - Fixed the output of rankTsv in the AutoML leaderboard.

Docs

  • [PUBDEV-4377] - Updated the Prediction section to include information on how the prediction threshold is selected for classification problems.
  • [PUBDEV-6105] - Updated the description of enum_limited to indicate that T=1024.
  • [PUBDEV-6148] - In the GBM chapter, added `monotone_constraints` to list of available parameters.
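
For readers unfamiliar with `enum_limited` (PUBDEV-6105), the idea is to keep only the T most frequent categorical levels, with T=1024 in this release. The sketch below shows that reduction on a toy column. Mapping the remaining levels to a single catch-all value named "other" is an assumption of this example, as is the helper name `limit_levels`.

```python
from collections import Counter


def limit_levels(values, t=1024, other="other"):
    """Keep the t most frequent levels and replace the rest with a catch-all
    (the catch-all label is an assumption for illustration)."""
    keep = {level for level, _ in Counter(values).most_common(t)}
    return [value if value in keep else other for value in values]


column = ["a", "a", "a", "b", "b", "c"]

# A tiny t makes the effect visible on a toy column.
print(limit_levels(column, t=2))
```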

Xia (3.22.0.2) - 11/21/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xia/2/index.html

Bug

  • [PUBDEV-3281] - Fixed an issue that caused ARFF parser to parse some files incorrectly.
  • [PUBDEV-4737] - When performing a grid search in Python, fixed an issue that caused all models to return a model.type of "supervised."
  • [PUBDEV-5352] - When running DRF in the Python client, checkpointing on new data now works correctly.
  • [PUBDEV-5869] - Fixed an issue that caused the confusion matrix recall and precision values to be switched.
  • [PUBDEV-6036] - In the Python client, fixed an issue that caused the `offset_column` parameter to be ignored when it was passed in the GLM train statement.
  • [PUBDEV-6042] - The H2O Tree Handler now works correctly on Isolation Forest models.
  • [PUBDEV-6046] - When running AutoML, fixed an issue that resulted in a "Failed to get metric: auc from ModelMetrics type BinomialGLM" message.
  • [PUBDEV-6050] - In Flow, Precision and Recall definitions are no longer inverted in the confusion matrix.
  • [PUBDEV-6052] - Fixed the error message that displays when converting from a pandas dataframe to an h2oframe in Python 3.6.
  • [PUBDEV-6054] - In XGBoost, fixed an issue that resulted in a "Maximum amount of file descriptors hit" message.
  • [PUBDEV-6060] - Fixed the description of sample_rate in Isolation Forest.
  • [PUBDEV-6063] - Cross validation models are no longer deleted by default.
  • [PUBDEV-6065] - When viewing an AutoML leaderboard, fixed an issue that resulted in an ArrayIndexOutOfBoundsException if `sort_metric` was specified but no model was built.
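
Because two entries in this release (PUBDEV-5869 and PUBDEV-6050) concern precision and recall being swapped, the standard definitions are worth spelling out. The helper below is a generic sketch, not H2O code. Returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts.

    precision = TP / (TP + FP): how many predicted positives were correct.
    recall    = TP / (TP + FN): how many actual positives were found.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


precision, recall = precision_recall(tp=8, fp=2, fn=8)
print(precision, recall)
```

With 8 true positives, 2 false positives, and 8 false negatives, the two values differ clearly, which makes a swap easy to spot.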

New Feature

  • [PUBDEV-5766] - Added monotonicity constraints to H2O XGBoost.

Task

  • [PUBDEV-6039] - When generating MOJOs, h2o-genmodel.jar now includes a check for MOJO version 1.3 to determine whether the h2o-genmodel.jar and the MOJO version can work together. Prior versions of h2o-3 did not include MOJO 1.3, and as a result, MOJOs silently returned predicted values executed on an empty vector.

Improvement

  • [PUBDEV-5705] - With a new `skipped_columns` option, users can now specify to drop specific columns before parsing. Note that this functionality is not supported for SVMLight or Avro file formats.
  • [PUBDEV-6062] - The GLM multinomial coefficient table now includes the original levels as column names.

Docs

  • [PUBDEV-3216] - Created new Performance & Prediction and Variable Importance sections in the User Guide.
  • [PUBDEV-5313] - Updated the default value of `categorical_encoding` for XGBoost. This defaults to Auto (which is one_hot_encoding).
  • [PUBDEV-6012] - In the parameter entry for `weights_column`, updated the example to exclude the weight column in the list of predictors.
  • [PUBDEV-6016] - In the DRF FAQ, updated the "What happens when you try to predict on a categorical level not seen during training?" question.
  • [PUBDEV-6025] - TargetEncoder is now included in the Python module docs.
  • [PUBDEV-6041] - In GLM, updated the documentation to indicate that coordinate_descent is no longer experimental.
  • [PUBDEV-6064] - Added default values for `max_depth`, `sample_size`, and `sample_rate`. Also added a parameter description entry for `sample_size`, showing an Isolation Forest example.
  • [PUBDEV-6086] - Added the new `monotone_constraints` option to the XGBoost chapter.

Xia (3.22.0.1) - 10/26/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xia/1/index.html

Bug

  • [PUBDEV-5023] - In Python, the metalearner method is only available for Stacked Ensembles.
  • [PUBDEV-5658] - Fixed an issue that caused micro benchmark tests to fail to run in the jmh directory.
  • [PUBDEV-5663] - Fixed an issue that caused H2O to fail to export dataframes to S3.
  • [PUBDEV-5745] - Added the `keep_cross_validation_models` argument to Grid Search.
  • [PUBDEV-5746] - Improved efficiency of the `keep_cross_validation_models` parameter in AutoML.
  • [PUBDEV-5777] - Simplified the comparison of H2OXGBoost with native XGBoost when using the Python client.
  • [PUBDEV-5780] - Fixed JDBC ingestion for Teradata databases.
  • [PUBDEV-5824] - In the Python client and the Java API, multiple runs of the same AutoML instance no longer fail training new "Best Of Family" SE models that would include the newly generated models.
  • [PUBDEV-5873] - Fixed an issue that resulted in an AssertionError when calling `cbind` from the Python client.
  • [PUBDEV-5881] - AutoML now enforces case for the `sort_metric` option when using the Java API.
  • [PUBDEV-5903] - In AutoML, StackedEnsemble models are now always trained, even if the `max_runtime_secs` limit is reached.
  • [PUBDEV-5904] - In the R client, added documentation for helper functions.
  • [PUBDEV-5922] - Renamed `x` to `X` in the H2O-sklearn fit method to be consistent with the sklearn API.
  • [PUBDEV-5924] - Merging datasets now works correctly.
  • [PUBDEV-5931] - Building on Maven with h2o-ext-xgboost on versions later than 3.18.0.11 no longer results in a dependency error.
  • [PUBDEV-5933] - Fixed a Java 11 ORC file parsing failure.
  • [PUBDEV-5954] - Upgraded the version of the lodash package used in H2O Flow.
  • [PUBDEV-5967] - `-ip localhost` now works correctly on WSL.
  • [PUBDEV-5971] - CSV/ARFF Parser no longer treats blank lines as data lines with NAs.
  • [PUBDEV-5976] - Starting h2o-3 from the Python Client no longer fails on Java 10.0.2.
  • [PUBDEV-5995] - Fixed an issue that caused StackedEnsemble MOJO model to return an "IllegalArgumentException: categorical value out of range" message.
  • [PUBDEV-5996] - Removed the "nclasses" parameter from tree traversal routines.
  • [PUBDEV-5998] - Exposed H2OXGBoost parameters used to train a model to the Python API. Previously, this information was visible in the Java backend but was not passed back to the Python API.
  • [PUBDEV-5999] - Removed "illegal reflective access" warnings when starting H2O-3 with Java 10.
  • [PUBDEV-6004] - In Stacked Ensembles, changes made to data during scoring now apply to all models.
  • [PUBDEV-6005] - When running AutoML in Flow, updated the list of algorithms that can be selected in the "Exclude These Algorithms" section.

New Feature

  • [PUBDEV-5170] - Individual predictions of GBM trees are now exposed in the MOJO API.
  • [PUBDEV-5378] - Exposed target encoding in the Java API.
  • [PUBDEV-5399] - The `keep_cross_validation_fold_assignment` option is now available in AutoML.
  • [PUBDEV-5609] - Added support for the Isolation Forest algorithm in H2O-3. Note that this is a Beta version of the algorithm.
  • [PUBDEV-5668] - Added the `keep_cross_validation_fold_assignment` option to AutoML in Flow.
  • [PUBDEV-5681] - `h2o.connect` no longer ignores `strict_version_check=FALSE` when connecting to a Steam cluster.
  • [PUBDEV-5695] - Created an R demo for CoxPH. This is available here.
  • [PUBDEV-5775] - It is now possible to combine two models into one MOJO, with the second model using the prediction from the first model as a feature. These models can be from any algorithm or combination of algorithms except Word2Vec.
  • [PUBDEV-5852] - Implemented h2oframe.fillna(method='backward').
  • [PUBDEV-5977] - Sped up AutoML training on smaller datasets in client mode (Sparkling Water).
  • [PUBDEV-5979] - Exposed Java Target Encoding in the Python client.
  • [PUBDEV-5988] - Users can now specify a `-features` parameter when starting h2o from the command line. This allows users to remove experimental or beta algorithms when starting H2O-3. Available options for this parameter include `beta`, `stable`, and `experimental`.
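
As an illustration of the backward fill mentioned in PUBDEV-5852, the following is a minimal plain-Python sketch of what filling missing values "backward" means: each missing entry takes the next observed value that follows it. This is not H2O's implementation; the function name `backward_fill` and the `maxlen` cap used here are assumptions made for the sketch.

```python
from typing import List, Optional


def backward_fill(values: List[Optional[float]], maxlen: int = 1) -> List[Optional[float]]:
    """Fill missing entries (None) with the next observed value to their right.

    Hypothetical helper for illustration only. At most `maxlen` consecutive
    missing entries directly before an observed value are filled.
    """
    filled = list(values)
    next_value = None  # nearest observed value to the right of the current index
    gap = 0            # number of missing entries seen since that value
    for i in range(len(values) - 1, -1, -1):
        if values[i] is not None:
            next_value, gap = values[i], 0
        else:
            gap += 1
            if next_value is not None and gap <= maxlen:
                filled[i] = next_value
    return filled


column = [None, None, 3.0, None, 5.0, None]
print(backward_fill(column, maxlen=1))
print(backward_fill(column, maxlen=2))
```

A trailing missing value stays missing because no later observation exists to copy from.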

Task

  • [PUBDEV-4507] - Added XGBoost to AutoML.
  • [PUBDEV-5696] - Added an option to allow users to use a user-specified JDBC driver.
  • [PUBDEV-5722] - Exposed `pr_auc` to areas where you can find AUC, including scoring_history and model summary. Also added h2o.pr_auc() in R.
  • [PUBDEV-5901] - Added support for Java 11.
  • [PUBDEV-6001] - Improved the AutoML documentation in the User Guide.

Improvement

  • [PUBDEV-5590] - Added a `MAX_USR_CONNECTIONS_KEY` argument to limit number of sessions for import_sql_table.
  • [PUBDEV-5669] - Improved performance gap when importing data using Hive2.
  • [PUBDEV-5719] - Improved and cleaned up output for the h2o.mojo_predict_csv and h2o.mojo_predict_df functions.
  • [PUBDEV-5743] - Users can now visualize XGBoost trees when running predictions.
  • [PUBDEV-5761] - Added weights to partial dependence plots. Also added a level for missing values.
  • [PUBDEV-5822] - Users can now download the genmodel.jar in Flow for completed models.
  • [PUBDEV-5886] - In AutoML, changed the default for `keep_cross_validation_models` and `keep_cross_validation_predictions` from True to False.
  • [PUBDEV-5888] - Added support for predicting using the XGBoost Predictor.
  • [PUBDEV-5909] - In XGBoost, optimized the matrix exchange between Java and native C++ code.
  • [PUBDEV-5913] - Improved the h2o-3 README for installing in R and IntelliJ IDEA.
  • [PUBDEV-5927] - Introduced a simple "streaming" mode that allows H2O to read from a table using basic SQL:92 constructs.
  • [PUBDEV-5929] - In AutoML, `stopping_metric` is now based on `sort_metric`.
  • [PUBDEV-5952] - The requirements.txt file now includes the Colorama version.
  • [PUBDEV-5961] - In lockable.java, delete is now final in order to prevent inconsistent overrides.
  • [PUBDEV-5964] - Reverted AutoML naming change from Auto.Algo to Auto.algo.
  • [PUBDEV-6000] - In AutoML, automatic partitioning of the validation frame now uses 10% of the training data instead of 20%.
  • [PUBDEV-6002] - Changed model and grid indexing in autogenerated model names in AutoML to be 1 instead of 0 indexed.
  • [PUBDEV-6017] - Added the ability to allow public access to H2O instances started from R/Python. This can be done with the new `bind_to_localhost` (Boolean) parameter, which can be specified in `h2o.init()`.

Docs

  • [PUBDEV-4505] - Added Scala and Java examples to the Building and Extracting a MOJO topic.
  • [PUBDEV-4590] - Added a Scala example to the Stacked Ensembles topic.
  • [PUBDEV-5949] - Added Tree class method to the Python module documentation.
  • [PUBDEV-5641] - Removed references to UDP in the documentation.
  • [PUBDEV-5664] - Removed Sparkling Water topics from H2O-3 User Guide. These are in the Sparkling Water User Guide.
  • [PUBDEV-5674] - Added a Resources section to the Overview and included links to the awesome-h2o repository, H2O.ai blogs, and customer use cases.
  • [PUBDEV-5693] - Updated GCP Installation documentation with information about quota limits.
  • [PUBDEV-5709] - Updated Gains/Lift documentation. 16 groups are now used by default.
  • [PUBDEV-5756] - Added Python examples to the Cross-Validation topic in the User Guide.
  • [PUBDEV-5762] - Added `loss_by_col` and `loss_by_col_idx` to list of GLRM parameters.
  • [PUBDEV-5810] - Updated documentation for `class_sampling_factors`. `balance_classes` must be enabled when using `class_sampling_factors`.
  • [PUBDEV-5839] - Added a Python example for initializing and starting h2o-3 in Docker.
  • [PUBDEV-5857] - Updated the Admin menu documentation in Flow after adding "Download Gen Model" option.
  • [PUBDEV-5905] - In GBM and DRF, `enum_limited` is a supported option for `categorical_encoding`.
  • [PUBDEV-5962] - Added the -notify_local flag to list of flags available when starting H2O-3 from the command line.
  • [PUBDEV-5982] - Added documentation for Isolation Forest (beta).

Wright (3.20.0.10) - 10/16/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/10/index.html

Bug

  • [PUBDEV-5613] - AutoML now correctly respects the max_runtime_secs setting.
  • [PUBDEV-5856] - Fixed a multinomial COD solver bug.
  • [PUBDEV-5919] - Fixed an issue that caused importing of ARFF files to fail if the header was too large and/or with large datasets with categoricals.

Wright (3.20.0.9) - 10/1/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/9/index.html

Bug

  • [PUBDEV-5930] - Fixed an issue that caused H2O to fail when loading a GLRM model.

Improvement

  • [PUBDEV-5938] - log4j.properties can be loaded from classpath.
  • [PUBDEV-5939] - Buffer configuration is now available for http/https connections.

Wright (3.20.0.8) - 9/21/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/8/index.html

Bug

  • [PUBDEV-5855] - Fixed an issue that occurred when parsing columns that include double quotation.
  • [PUBDEV-5880] - The `max_runtime_secs` option is no longer ignored when using the Python client.
  • [PUBDEV-5906] - Fixed an XGBoost Sparsity detection test to make it deterministic.
  • [PUBDEV-5907] - Hadoop driver class no longer fails to parse new Java version string.

New Feature

  • [PUBDEV-5861] - Added a GBM/DRF Tree walker API in the R client.
  • [PUBDEV-5862] - The R API for obtaining and traversing model trees in GBM/DRF is available in Python.

Wright (3.20.0.7) - 8/31/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/7/index.html

Bug

  • [PUBDEV-5826] - Fixed an issue that caused a mismatch between GLRM MOJO predict and GLRM predict.
  • [PUBDEV-5841] - Fixed an issue that caused H2O XGBoost grid search to fail even when sizing the sessions at 4x the data size and using extramempercent of 150.
  • [PUBDEV-5848] - When performing multiple AutoML runs using the H2O R client, viewing the first AutoML leaderboard no longer results in an error.
  • [PUBDEV-5864] - H2O now only binds to the local interface when started from R/Python.
  • [PUBDEV-5871] - Fixed an issue that caused DeepLearning and XGBoost MOJOs to get a corrupted input row. This occurred when GenModel's helper functions that perform 1-hot encoding failed to correctly take into consideration cases where useAllFactorLevels = false and corrupted the first categorical value in the input row.
  • [PUBDEV-5872] - Added gamma, tweedie, and poisson objective functions to the XGBoost Java Predictor.
  • [PUBDEV-5877] - Fixed an issue in HDFS file import. In rare cases the import could fail due to temporarily inconsistent state of H2O distributed memory.

Wright (3.20.0.6) - 8/24/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/6/index.html

Bug

  • [PUBDEV-5724] - H2oApi.frameColumn in h2o-bindings.jar now correctly parses responses.
  • [PUBDEV-5751] - biz.k11i:xgboost-predictor:0.3.0 is now ported to the h2oai repo and released to Maven Central. This allows for easier deployment of H2O and Sparkling Water.
  • [PUBDEV-5786] - In GLM, the coordinate descent solver is now disabled only when family=multinomial.
  • [PUBDEV-5792] - Fixed an issue that caused the H2O parser to hang when reading a Parquet file.
  • [PUBDEV-5803] - Fixed an issue that resulted in an AutoML "Unauthorized" Error when running through Enterprise Steam via R.
  • [PUBDEV-5818] - Leaf Node assignment no longer produces the wrong paths for degenerated trees.
  • [PUBDEV-5823] - Updated the list of Python dependencies on the release download page and in the User Guide.
  • [PUBDEV-5826] - Fixed an issue that resulted in a mismatch between GLRM predict and GLRM MOJO predict.
  • [PUBDEV-5844] - Launching H2O on a machine with greater than 2TB no longer results in an integer overflow error.
  • [PUBDEV-5847] - The HTTP parser no longer reads fewer rows when the data is compressed.
  • [PUBDEV-5851] - AstFillNA Rapids expression now returns H2O.unimp() on backward methods.

New Feature

  • [PUBDEV-5735] - In GBM and DRF, tree traversal and information is now accessible from the R and Python clients. This can be done using the new h2o.getModelTree function.
  • [PUBDEV-5779] - In GBM, added a new staged_predict_proba function.
  • [PUBDEV-5812] - MOJO output now includes terminal node IDs.
  • [PUBDEV-5832] - In GBM/DRF, the H2OTreeClass function now allows you to specify categorical levels.

Task

  • [PUBDEV-5845] - Updated the XGBoost dependency to ai.h2o:xgboost-predictor:0.3.1.

Improvement

  • [PUBDEV-5837] - Terminal node IDs can now be retrieved in the predict_leaf_node_assignment function.

Docs

  • [PUBDEV-5836] - The User Guide now indicates that only Hive versions 2.2.0 or greater are supported for JDBC drivers. Hive 2.1 is not currently supported.
  • [PUBDEV-5838] - In GLM, the documentation for the Coordinate Descent solver now notes that Coordinate Descent is not available when family=multinomial.

Wright (3.20.0.5) - 8/8/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/5/index.html

Bug

  • [PUBDEV-5543] - Hive smoke tests no longer time out on HDP.
  • [PUBDEV-5793] - AutoML now correctly ignores columns specified in Flow.
  • [PUBDEV-5794] - In Flow, the Import SQL Table button now works correctly.
  • [PUBDEV-5806] - XGBoost cross validation now works correctly.
  • [PUBDEV-5811] - Fixed an issue that caused AutoML to fail in Flow due to the keep_cross_validation_fold_assignment option.
  • [PUBDEV-5814] - Multinomial Stacked Ensemble no longer fails when either XGBoost or Naive Bayes is the base model.
  • [PUBDEV-5816] - Fixed an issue that caused XGBoost to generate the wrong metrics for multinomial cases.
  • [PUBDEV-5819] - Increased the client_disconnect_timeout value when ClientDisconnectCheckThread searches for connected clients.

Improvement

  • [PUBDEV-5813] - Added automated Flow test for AutoML.

Wright (3.20.0.4) - 7/31/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/4/index.html

Bug

  • [PUBDEV-5555] - In Flow, increased the height of the summary section for the column summary.
  • [PUBDEV-5720] - Cross-validation now works correctly in XGBoost.
  • [PUBDEV-5739] - Documentation for the MOJO predict functions (mojo_predict_pandas and mojo_predict_csv) is now available in the Python User Guide.
  • [PUBDEV-5744] - Regression comparison tests no longer fail between H2OXGBoost and native XGBoost.
  • [PUBDEV-5760] - GBM/DRF MOJO scoring no longer allocates unnecessary objects for each scored row.

New Feature

  • [PUBDEV-5736] - In GBM, added point estimation as a metric.

Improvement

  • [PUBDEV-5429] - The h2o.importFile([List of Directory Paths]) function will now import all the files located in the specified folders.
  • [PUBDEV-5637] - Added Standard Error of Mean (SEM) to Partial Dependence Plots.
  • [PUBDEV-5718] - Added two new formatting options to hex.genmodel.tools.PrintMojo. The --decimalplaces (or -d) option allows you to set the number of places after the decimal point. The --fontsize (or -f) option allows you to set the fontsize. The default fontsize is 14.
  • [PUBDEV-5733] - Optimized the performance of ingesting large number of small Parquet files by using sequential parse.
  • [PUBDEV-5749] - Added support for weights in a calibration frame.
  • [PUBDEV-5752] - Added a new port_offset command. This parameter lets you specify the relationship of the API port ("web port") and the internal communication port. The previous implementation expected h2o port = api port + 1. Because there are assumptions in the code that the h2o port and API port can be derived from each other, we cannot fully decouple them. Instead, this new option lets the user specify an offset such that h2o port = api port + offset. This enables the user to move the communication port to a specific range, which can be firewalled.
  • [PUBDEV-5765] - Improved speed of ingesting data from HTTP/HTTPS data sources in standalone H2O.
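
The port relationship described in PUBDEV-5752 (h2o port = api port + offset) can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name `internal_port` and the range check are assumptions for the sketch, not part of H2O.

```python
def internal_port(api_port: int, port_offset: int = 1) -> int:
    """Derive the internal communication port from the API ("web") port.

    Hypothetical helper for illustration. The default offset of 1 mirrors
    the earlier behavior, where h2o port = api port + 1.
    """
    port = api_port + port_offset
    if not (1 <= api_port <= 65535 and 1 <= port <= 65535):
        raise ValueError(f"port out of range: api={api_port}, internal={port}")
    return port


print(internal_port(54321))                   # earlier behavior: API port + 1
print(internal_port(54321, port_offset=100))  # internal port moved to a separate range
```

Choosing a larger offset places the internal port in a range that can be firewalled separately from the API port.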

Docs

  • [PUBDEV-5694] - The User Guide now specifies that XLS/XLSX files must be BIFF 8 format. Other formats are not supported.
  • [PUBDEV-5731] - Added to docs that when downloading MOJOs/POJOs, users must specify the entire path and not just the relative path.
  • [PUBDEV-5774] - Added documentation for the new port_offset command when starting H2O.

Wright (3.20.0.3) - 7/10/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/3/index.html

Bug

  • [PUBDEV-5353] - The `fold_column` option now works correctly in XGBoost.
  • [PUBDEV-5560] - Calling `describe` on an empty H2O frame no longer results in an error in Python.
  • [PUBDEV-5576] - In XGBoost, when performing a grid search from Flow, the correct cross validation AUC score is now reported back.
  • [PUBDEV-5612] - Fixed an issue that caused XGBoost to fail with Tesla V100 drivers 70 and above and with CUDA 9.
  • [PUBDEV-5654] - H2O's XGBoost results no longer differ from native XGBoost when dmatrix_type="sparse".
  • [PUBDEV-5672] - In the R documentation, fixed the description for h2o.sum to state that this function indicates whether to return an H2O frame or one single aggregated sum.
  • [PUBDEV-5673] - H2O data import for Parquet files no longer fails on numeric decimalTypes.
  • [PUBDEV-5683] - Fixed an error that occurred when viewing the AutoML Leaderboard in Flow before the first model was completed.
  • [PUBDEV-5686] - When connecting to a Linux H2O Cluster from a Windows machine using Python, the `import_file()` function can now correctly locate the file on the Linux Server.
  • [PUBDEV-5692] - H2O now reports the project version in the logs.
  • [PUBDEV-5700] - In CoxPH, fixed an issue that caused training to fail to create JSON output when the dataset included too many features.
  • [PUBDEV-5707] - Users can now switch between edit and command modes on Scala cells.
  • [PUBDEV-5721] - Fixed an issue with the way that RMSE was calculated for cross-validated models.
  • [PUBDEV-5727] - In GLRM, fixed an issue that caused differences between the result of h2o.predict and MOJO predictions.

New Feature

  • [PUBDEV-5680] - Added a new `-report_hostname` flag that can be specified along with `-proxy` when starting H2O on Hadoop. When this flag is enabled, users can replace the IP address with the machine's host name when starting Flow.
  • [PUBDEV-5697] - Added support for the Amazon Redshift data warehouse.
  • [PUBDEV-5725] - Added support for CDH 5.9.

Task

  • [PUBDEV-5628] - Accessing secured (Kerberized) HDFS from a standalone H2O instance works correctly.
  • [PUBDEV-5656] - AutoML Python tests always use max models to avoid running out of time.
  • [PUBDEV-5682] - CoxPH now validates that a `stop_column` is specified. `stop_column` is a required parameter.
  • [PUBDEV-5688] - Fixed an issue that caused a GCS Exception to display when H2O was launched offline.

Improvement

  • [PUBDEV-5572] - In Flow, improved the display of the confusion matrix for multinomial cases.
  • [PUBDEV-5665] - Users will now see a Precision-Recall AUC when training binomial models.
  • [PUBDEV-5666] - Synchronous and Asynchronous Scala Cells are now allowed in H2O Flow.
  • [PUBDEV-5687] - H2O now autodetects string columns and skips them before calculating `groupby`. H2O also warns the user when this happens.

Docs

  • [PUBDEV-5424] - The h2o.mojo_predict_csv and h2o.mojo_predict_df functions now appear in the R HTML documentation.
  • [PUBDEV-5702] - In GLM, documented that the Poisson family uses the -log(maximum likelihood function) for deviance.
  • [PUBDEV-5710] - Fixed the R example in the "Replacing Values in a Frame" data munging topic. Columns and rows do not start at 0; R has a 1-based index.
  • [PUBDEV-5711] - Fixed the R example in the "Group By" data munging topic. Specify the "Month" column instead of the "NumberOfFlights" column when finding the number of flights in a given month based on origin.
  • [PUBDEV-5714] - Added the new `-report_hostname` flag to the list of Hadoop launch parameters.
  • [PUBDEV-5715] - Added Amazon Redshift to the list of supported JDBC drivers.
  • [PUBDEV-5726] - Added CDH 5.9 to the list of supported Hadoop platforms.

Wright (3.20.0.2) - 6/15/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/2/index.html

Bug

  • [PUBDEV-3950] - Fixed an issue that resulted in a null pointer exception for H2O ensembles.
  • [PUBDEV-5250] - In AutoML, ignored_columns are now passed in the API call when specifying both x and a fold_column during training.
  • [PUBDEV-5622] - Fixed a bug in documentation that incorrectly referenced 'calibrate_frame' instead of 'calibration_frame'.
  • [PUBDEV-5629] - java -jar h2o.jar no longer fails on Java 7.
  • [PUBDEV-5634] - Fixed a typo in the AutoML pydocs for sort_metric.
  • [PUBDEV-5651] - Exported CoxPH functions in R.

Task

  • [PUBDEV-5621] - Added balance_classes, class_sampling_factors, and max_after_balance_size options to AutoML in Flow.

Improvement

  • [PUBDEV-3754] - Updated the project URL, bug reports link, and list of authors in the h2o R package DESCRIPTION file.
  • [PUBDEV-5542] - Updated the description of the h2o R package in the DESCRIPTION file.
  • [PUBDEV-5570] - AutoML now produces an error message when a response column is missing.
  • [PUBDEV-5623] - Fixed intermittent test failures for AutoML.
  • [PUBDEV-5625] - Removed frame metadata calculation from AutoML.
  • [PUBDEV-5635] - Removed the keep_cross_validation_models = False argument from the AutoML User Guide examples.
  • [PUBDEV-5636] - Users can now set a MAX_CM_CLASSES parameter to set a maximum number of confusion matrix classes.

Docs

  • [PUBDEV-5619] - Updated the AutoML screenshot in Flow to show the newly added parameters.

Wright (3.20.0.1) - 6/6/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/1/index.html

Bug

  • [PUBDEV-4299] - In Scala, the `new H2OFrame()` API no longer fails when using http/https URL-based data sources.
  • [PUBDEV-4865] - Fixed an issue that caused the Java client JVM to get stuck with a latch/lock leak on the server.
  • [PUBDEV-5342] - Fixed an issue that caused intermittent NPEs in AutoML.
  • [PUBDEV-5357] - In parse, each lock now includes the owner rather than locking with null.
  • [PUBDEV-5359] - LDAP documentation now contains the correct name of the Auth module.
  • [PUBDEV-5426] - h2o.jar no longer includes a Jetty 6 dependency.
  • [PUBDEV-5462] - `model_summary` is now available when running Stacked Ensembles in R.
  • [PUBDEV-5478] - XGBoost now correctly respects the H2O `nthreads` parameter.
  • [PUBDEV-5488] - Fixed an invalid invariant in the recall calculation.
  • [PUBDEV-5497] - h2o-genmodel.jar can now be loaded into Spark's spark.executor.extraClassPath.
  • [PUBDEV-5501] - AutoML now correctly detects the leaderboard frame in H2O Flow.
  • [PUBDEV-5524] - In XGBoost, fixed an issue that resulted in a "Check failed: param.max_depth < 16 Tree depth too large" error.
  • [PUBDEV-5551] - Zero decimal values and NAs are now represented correctly in XGBoost.
  • [PUBDEV-5552] - Response variable datatype checks are now extended to include TIME datatypes.
  • [PUBDEV-5598] - The `-proxy` argument is now available as part of the h2odriver.args file.
  • [PUBDEV-5605] - Fixed `stopping_metric` values in user guide. Abbreviated values should be specified using uppercase characters (for example, MSE, RMSE, etc.).
  • [PUBDEV-5610] - Proxy Mode of h2odriver now supports a notification file (specified with the `-notify` argument).
  • [PUBDEV-5617] - Fixed an issue that caused h2o.predict to throw an exception in H2OCoxPH models with interactions with stratum.

New Feature

  • [PUBDEV-3901] - Added MOJO support in Python (via jar file).
  • [PUBDEV-4927] - Added the `sort_metric` argument to AutoML.
  • [PUBDEV-4939] - Users now have the option to save CV predictions and CV models in AutoML.
  • [PUBDEV-4968] - Added an `h2o.H2OFrame.rename` method to rename columns in Python.
  • [PUBDEV-4991] - MOJO and POJO support are now available for AutoML.
  • [PUBDEV-5019] - Added support for the Cox Proportional Hazard (CoxPH) algorithm. Note that this is currently available in R and Flow only. It is not yet available in Python.
  • [PUBDEV-5177] - Added h2o.get_automl()/h2o.getAutoML function to R/Python APIs.
  • [PUBDEV-5377] - Added the `balance_classes`, `class_sampling_factors`, and `max_after_balance_size` arguments to AutoML.
  • [PUBDEV-5408] - When running GLM in Flow, users can now see the InteractionPairs option.
  • [PUBDEV-5424] - Added support for MOJO scoring on a CSV or data frame in R.
  • [PUBDEV-5452] - Added an "export model as MOJO" button to Flow for supported algorithms.
  • [PUBDEV-5520] - Added support for XGBoost MOJO deployment on Windows 10.
  • [PUBDEV-5529] - GBM and DRF MOJOs and POJOs now return leaf node assignments.
  • [PUBDEV-5599] - Added the `sort_metric` option to AutoML in Flow.
  • [PUBDEV-5600] - keep_cross_validation_predictions and keep_cross_validation_models are now available when running AutoML in Flow.
  • [PUBDEV-5615] - Deep Learning MOJO now extends Serializable.

Story

  • [PUBDEV-5398] - In CoxPH, when a categorical column is only used for a numerical-categorical interaction, the algorithm will enforce useAllFactorLevels for that interaction.

Task

  • [PUBDEV-4570] - When running AutoML and XGBoost, fixed an issue that caused the adapting test frame to be different than the train frame.
  • [PUBDEV-4826] - Removed Domain length check for Stacked Ensembles.
  • [PUBDEV-5058] - GLRM predict no longer generates different outputs when performing predictions on training and testing dataframes.
  • [PUBDEV-5368] - Added support for ingesting data from Hive2 using SQLManager (JDBC interface). Note that this is experimental and is not yet suitable for large datasets.

Improvement

  • [PUBDEV-4375] - Replaced the Jama SVD computation in PCA with netlib-java library MTJ.
  • [PUBDEV-4518] - Created more tests in AutoML to ensure that all fold_assignment values and fold_column work correctly.
  • [PUBDEV-4571] - Fixed an NPE that occurred when clicking the View button while running AutoML.
  • [PUBDEV-4581] - Bundled Windows XGBoost libraries.
  • [PUBDEV-4618] - Search-based models are no longer duplicated when AutoML is run again on the same dataset with the same seed.
  • [PUBDEV-4718] - When running Stacked Ensembles in R, added support for a vector of base_models in addition to a list.
  • [PUBDEV-4956] - Added support for Java 9.
  • [PUBDEV-5388] - Fixed an issue that resulted in an additional progress bar when running h2o.automl() in R.
  • [PUBDEV-5411] - Fixed an issue that resulted in an additional progress bar when running AutoML in Python.
  • [PUBDEV-5440] - The runint_automl_args.R test now always builds at least 2 models.
  • [PUBDEV-5459] - Improved XGBoost speed by not recreating DMatrix in each iteration (during training).
  • [PUBDEV-5476] - `offset_column` is now exposed in EasyPredictModelWrapper.
  • [PUBDEV-5477] - Improved single node XGBoost performance.
  • [PUBDEV-5486] - Added support for pip 10.0.0.
  • [PUBDEV-5495] - In GLM, gamma distribution with 0's in the response results in an improved message: "Response value for gamma distribution must be greater than 0."
  • [PUBDEV-5499] - Added metrics to AutoML leaderboard. Binomial models now also show mean_per_class_error, rmse, and mse. Multinomial problems now also show logloss, rmse and mse. Regression models now also show mse.
  • [PUBDEV-5533] - Exposed `model dump` in XGBoost MOJOs.
  • [PUBDEV-5538] - Improved rebalance for Frames.
  • [PUBDEV-5553] - Introduced the precise memory allocation algorithm for XGBoost sparse matrices.
  • [PUBDEV-5577] - Improved SSL documentation.
  • [PUBDEV-5601] - The Exclude Algorithms section in Flow AutoML is now always visible, even if you have not yet selected a training frame.
  • [PUBDEV-5606] - Removed unused parameters, fields, and methods from AutoML. Also exposed buildSpec in the AutoML REST API.

Docs

  • [PUBDEV-4977] - Updated documentation to indicate support for Java 9.
  • [PUBDEV-5154] - Added the new `pca_impl` parameter to PCA section of the user guide.
  • [PUBDEV-5164] - Added a Checkpointing Models section to the User Guide. This describes how checkpointing works for each supported algorithm.
  • [PUBDEV-5401] - In the "Getting Data into H2O" section, added a link to the new Hive JDBC demo.
  • [PUBDEV-5407] - The Import File example now also shows how to import from HDFS.
  • [PUBDEV-5436] - Fixed markdown headings in the example Flows.
  • [PUBDEV-5474] - All installation examples use H2O version 3.20.0.1.
  • [PUBDEV-5494] - Added a "Data Manipulation" topic for target encoding in R.
  • [PUBDEV-5496] - Added new keep_cross_validation_models and keep_cross_validation_predictions options to the AutoML documentation.
  • [PUBDEV-5509] - Added an example of using XGBoost MOJO with Maven.
  • [PUBDEV-5513] - In the XGBoost chapter, added information describing how to disable XGBoost.
  • [PUBDEV-5554] - When running XGBoost on Hadoop, added a note that users should set -extramempercent to a much higher value.
  • [PUBDEV-5579] - Added a section for the CoxPH (Cox Proportional Hazards) algorithm.
  • [PUBDEV-5581] - Added a topic describing how to install H2O-3 from the Google Cloud Platform offering.

Wolpert (3.18.0.11) - 5/24/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/11/index.html

New Feature

  • [PUBDEV-5584] - Enabled Java 10 support for CRAN release.

Wolpert (3.18.0.10) - 5/22/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/10/index.html

Bug

  • [PUBDEV-5558] - Fixed an issue for adding Double.NaN to IntAryVisitor via addValue().

Task

  • [PUBDEV-5559] - Removed all code that referenced Google Analytics.
  • [PUBDEV-5565] - Disabled version check in H2O-3.
  • [PUBDEV-5567] - Removed all Google Analytics references and code from Flow.
  • [PUBDEV-5568] - Removed all Google Analytics references and code from Documentation.

Docs

  • [PUBDEV-5545] - The Security chapter in the User Guide now describes how to enforce system-level command-line arguments in h2odriver when starting H2O.

Wolpert (3.18.0.9) - 5/11/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/9/index.html

Bug

  • [PUBDEV-5290] - Fixed an issue that caused distributed XGBoost to not be registered in the REST API.
  • [PUBDEV-5325] - Fixed an issue that caused XGBoost to crash due to "too many open files."
  • [PUBDEV-5444] - Frames are now rebalanced correctly on multinode clusters.
  • [PUBDEV-5464] - Fixed an issue that prevented H2O libraries from loading in DBC.
  • [PUBDEV-5507] - Added more robust checks for Colorama version.
  • [PUBDEV-5510] - Added more robust checks for Colorama version in H2O Python client.
  • [PUBDEV-5518] - A response column is no longer required when performing Deep Learning grid search with autoencoder enabled.
  • [PUBDEV-5527] - Fixed a KeyV3 error message that incorrectly referenced KeyV1.
  • [PUBDEV-5544] - The external backend now stores sparse vector values correctly.

New Feature

  • [PUBDEV-5456] - Added a new rank_within_group_by function in R and Python for ranking groups and storing the ranks in a new column.
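
To illustrate what ranking within groups and storing the ranks in a new column means (PUBDEV-5456), here is a minimal plain-Python sketch over a list of row dictionaries. It does not call H2O; the function name `rank_within_group`, its parameters, and the absence of missing-value handling are assumptions of the sketch.

```python
from collections import defaultdict
from typing import Any, Dict, List


def rank_within_group(rows: List[Dict[str, Any]], group_col: str, sort_col: str,
                      new_col: str = "rank", ascending: bool = True) -> List[Dict[str, Any]]:
    """Return copies of `rows` with a 1-based rank per group stored in `new_col`.

    Hypothetical helper for illustration. Rows keep their original order;
    ties are ranked in order of appearance.
    """
    ranked = [dict(row) for row in rows]
    # Visit rows in sorted order, counting separately for each group.
    order = sorted(range(len(ranked)), key=lambda i: ranked[i][sort_col],
                   reverse=not ascending)
    counters: Dict[Any, int] = defaultdict(int)
    for i in order:
        group = ranked[i][group_col]
        counters[group] += 1
        ranked[i][new_col] = counters[group]
    return ranked


rows = [
    {"g": "a", "v": 3},
    {"g": "b", "v": 1},
    {"g": "a", "v": 1},
    {"g": "b", "v": 2},
]
for row in rank_within_group(rows, group_col="g", sort_col="v"):
    print(row)
```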

Improvement

  • [PUBDEV-5500] - Improved warning messages in AutoML.
  • [PUBDEV-5537] - System administrators can now create a configuration file with implicit arguments of h2odriver and use it to make sure the h2o cluster is started with proper security settings.

Wolpert (3.18.0.8) - 4/19/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/8/index.html

Wolpert (3.18.0.7) - 4/14/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/7/index.html

Bug

  • [PUBDEV-5485] - Fixed a MOJO/POJO scoring issue caused by a serialization bug in EasyPredictModelWrapper.

Wolpert (3.18.0.6) - 4/13/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/6/index.html

Bug

  • [PUBDEV-5484] - In XGBoost, fixed a memory issue that caused training to fail even when running on small datasets.
  • [PUBDEV-5441] - Files that contain a Ctrl-M character as part of the data in a row, where Ctrl-M also signifies the end of line, are now parsed correctly.
  • [PUBDEV-5458] - H2O-3 no longer displays the server version in HTTP response headers.
  • [PUBDEV-5460] - Updated the Mockito library.

Task

  • [PUBDEV-5449] - Conda packages are now available on S3, enabling installation for users who cannot access anaconda.org.

Improvement

  • [PUBDEV-5473] - Added an offset to predictBinomial Easy wrapper.

Docs

  • [PUBDEV-5227] - Updated the AutoML chapter of the User Guide to include a link to H2O World AutoML Tutorials and updated code examples that do not use leaderboard_frame.
  • [PUBDEV-5457] - Fixed links to POJO/MOJO tutorials in the GBM FAQ > Scoring section.

Wolpert (3.18.0.5) - 3/28/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/5/index.html

Bug

  • [PUBDEV-4933] - AutoML no longer trains a Stacked Ensemble with only one model.
  • [PUBDEV-5028] - GBM and GLM grids no longer fail in AutoML for multinomial problems.
  • [PUBDEV-5266] - Users can now merge/sort frames that contain string columns.
  • [PUBDEV-5303] - Fixed an issue that occurred with multinomial GLM POJO/MOJO models.
  • [PUBDEV-5334] - Users can no longer specify a value of 0 for the col_sample_rate_change_per_level parameter. The value for this parameter must be greater than 0 and <= 2.0.
  • [PUBDEV-5336] - The H2O-3 Python client no longer returns an incorrect answer when running a conditional statement.
  • [PUBDEV-5365] - Added support for CDH 5.14.
  • [PUBDEV-5366] - Fixed an issue that caused XGBoost to fail when running the airlines dataset on a single-node H2O cluster.
  • [PUBDEV-5370] - The H2O-3 parser can now handle utf-8 characters that appear in the header.
  • [PUBDEV-5394] - The H2O-3 parser no longer treats the "Ctrl-M" character as an end of line on Linux.
  • [PUBDEV-5414] - H2O no longer generates a warning when predicting without a weights column.
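
The range stated in PUBDEV-5334 for col_sample_rate_change_per_level (greater than 0 and at most 2.0) can be expressed as a small check. This is a sketch of the constraint only, not H2O's validation code; the function name is hypothetical.

```python
def validate_col_sample_rate_change_per_level(value: float) -> float:
    """Return `value` if it lies in (0, 2.0]; otherwise raise ValueError.

    Hypothetical helper for illustration of the documented range.
    """
    if not (0.0 < value <= 2.0):
        raise ValueError(
            "col_sample_rate_change_per_level must be > 0 and <= 2.0, got %r" % (value,)
        )
    return value


print(validate_col_sample_rate_change_per_level(1.0))
print(validate_col_sample_rate_change_per_level(2.0))
```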

New Feature

  • [PUBDEV-5402] - The AutoML leaderboard no longer prints NaNs for non-US locales.

Task

  • [PUBDEV-5235] - Added a demo of XGBoost in Flow.
  • [PUBDEV-5386] - Improved the ordinal regression parameter optimization by changing the implementation.

Improvement

  • [PUBDEV-3978] - In Flow, improved the vertical scrolling for training and validation metrics for thresholds.
  • [PUBDEV-5364] - Added more logging regarding the WatchDog client.
  • [PUBDEV-5383] - Replaced unknownCategoricalLevelsSeenPerColumn with ErrorConsumer events in POJO log messages.
  • [PUBDEV-5400] - Improved the logic that triggers rebalance.
  • [PUBDEV-5404] - AutoML now uses correct datatypes in the AutoML leaderboard TwoDimTable.

Docs

  • [PUBDEV-5292] - Added ``beta constraints`` and ``prior`` entries to the Parameters Appendix, along with examples in R and Python.
  • [PUBDEV-5369] - Added CDH 5.14 to the list of supported Hadoop platforms in the User Guide.
  • [PUBDEV-5413] - Updated the documentation for the Ordinal ``family`` option in GLM based on the new implementation. Also added new solvers to the documentation: GRADIENT_DESCENT_LH and GRADIENT_DESCENT_SQERR.
  • [PUBDEV-5416] - Added information about Extremely Randomized Trees (XRT) to the DRF chapter in the User Guide.
  • [PUBDEV-5421] - On the H2O-3 and Sparkling Water download pages, the link to the documentation site now points to the most recent version.
  • [PUBDEV-5432] - The ``target_encode_create`` and ``target_encode_apply`` functions are now included in the R HTML documentation.

Fault

  • [PUBDEV-5367] - Fixed an issue that caused SQLManager import to break on clusters with more than 100 nodes.

Wolpert (3.18.0.4) - 3/8/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/4/index.html

  • Fixed minor release process issue preventing Sparkling Water release.

Wolpert (3.18.0.3) - 3/2/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/3/index.html

Bug

  • [PUBDEV-5102] - In Flow, the metalearner_fold_column option now correctly displays a drop-down of column names.
  • [PUBDEV-5282] - Fixed an issue that caused data import and model building to fail when using Flow in IE 11.1944 on Windows 10 Enterprise.
  • [PUBDEV-5323] - Stacked Ensemble no longer fails when using a grid or list of GLMs as the base models.
  • [PUBDEV-5330] - Fixed an issue that caused an error during Parquet data ingest.
  • [PUBDEV-5335] - In Random Forest, added back the distribution and offset_column options for backward compatibility. Note that these options are deprecated and will be ignored if used.
  • [PUBDEV-5339] - MOJO export to a file now works correctly.
  • [PUBDEV-5343] - Fixed an NPE that occurred when checking if a request is Xhr.

New Feature

  • [PUBDEV-5008] - Added support for ordinal regression in GLM. This is specified using the `family` option.
  • [PUBDEV-5274] - Added the exclude_algos option to AutoML in Flow.
  • [PUBDEV-5308] - Added a Leave-One-Out Target Encoding option to the R API. This can help improve supervised learning results when there are categorical predictors with high cardinality. Note that a similar function for Python will be available at a later date.
  • [PUBDEV-5324] - POJO now logs error messages for all incorrect data types and includes default values rather than NULL when a data type is unexpected.
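
As a rough illustration of the leave-one-out target encoding idea mentioned in PUBDEV-5308, the plain-Python sketch below replaces each categorical level with the mean of the response over the other rows that share that level. It is not the H2O R implementation. The function name `loo_target_encode` and the fallback to the global mean for levels that occur only once are assumptions made for this example.

```python
from collections import defaultdict

def loo_target_encode(levels, targets):
    """Leave-one-out mean of `targets` per categorical level (illustrative only)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(levels, targets):
        sums[level] += y
        counts[level] += 1
    global_mean = sum(targets) / len(targets)

    encoded = []
    for level, y in zip(levels, targets):
        n_other = counts[level] - 1
        if n_other == 0:
            # Assumption: a level seen only once falls back to the global mean.
            encoded.append(global_mean)
        else:
            # Exclude the current row's own target to limit target leakage.
            encoded.append((sums[level] - y) / n_other)
    return encoded

levels = ["a", "a", "a", "b", "b", "c"]
targets = [1, 0, 1, 0, 0, 1]
print(loo_target_encode(levels, targets))  # [0.5, 1.0, 0.5, 0.0, 0.0, 0.5]
```

Because each row's own response is excluded from its encoding, high-cardinality levels are less likely to simply memorize the target.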

Improvement

  • [PUBDEV-5344] - Moved AutoML to the top of the Model menu in Flow.

Docs

  • [PUBDEV-5306] - In the GLM chapter, added Ordinal to the list of `family` options. Also added Ologit, Oprobit, and Ologlog to the list of `link` options, which can be used with the Ordinal family.

Wolpert (3.18.0.2) - 2/20/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/2/index.html

Bug

  • [PUBDEV-5301] - Distributed XGBoost no longer fails silently when expanding a 4G dataset on a 1TB cluster.
  • [PUBDEV-5254] - Fixed an issue that caused GLM Multinomial to not work properly.
  • [PUBDEV-5278] - In XGBoost, when the first domain of a categorical is parseable as an Int, the remaining columns are not automatically assumed to also be parseable as an Int. As a result of this fix, the default value of categorical_encoding in XGBoost is now AUTO rather than label_encoder.
  • [PUBDEV-5294] - Fixed an issue that caused XGBoost models to fail to converge when an unknown decimal separator existed.
  • [PUBDEV-5326] - Fixed an issue in ParseTime that led to parse failing.

Docs

  • [PUBDEV-5313] - In the User Guide, the default value for categorical_encoding in XGBoost is now AUTO rather than label_encoder.

Wolpert (3.18.0.1) - 2/12/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/1/index.html

Bug

  • [PUBDEV-4585] - Fixed an issue that caused XGBoost binary save/load to fail.
  • [PUBDEV-4593] - Fixed an issue that caused a Levenshtein Distance Normalization Error. Levenshtein distance is now implemented directly into H2O.
  • [PUBDEV-5112] - The Word2Vec Python API for pretrained models no longer requires a training frame. In addition, a new `from_external` option was added, which creates a new H2OWord2vecEstimator based on an external model.
  • [PUBDEV-5128] - Fixed an issue that caused the show function of the metrics base class to raise an exception because it did not check for the custom_metric_name key.
  • [PUBDEV-5129] - The fold column in Kmeans is no longer required to be in x.
  • [PUBDEV-5130] - The date is now parsed correctly when parsed from H2O-R.
  • [PUBDEV-5133] - In Flow, the scoring history plot is now available for GLM models.
  • [PUBDEV-5135] - The Parquet parser no longer fails if one of the files to parse has no records.
  • [PUBDEV-5145] - Added error checking and logging on all the uses of `water.util.JSONUtils.parse()`.
  • [PUBDEV-5155] - In AutoML, fixed an exception in Python binding that occurred when the leaderboard was empty.
  • [PUBDEV-5156] - In AutoML, fixed an exception in R binding that occurred when the leaderboard was empty.
  • [PUBDEV-5159] - Removed Pandas dependency for AutoML in Python.
  • [PUBDEV-5167] - In PySparkling, reading Parquet/Orc data with time type now works correctly in H2O.
  • [PUBDEV-5174] - Fixed a maximum recursion depth error when using `isin` in the H2O Python client.
  • [PUBDEV-5175] - When running getJobs in Flow, fixed a ClassNotFoundException that occurred when AutoML jobs existed.
  • [PUBDEV-5179] - Fixed an issue that caused a list of columns to be truncated in PySparkling. Light endpoint now returns all columns.
  • [PUBDEV-5186] - In AutoML, fixed a deadlock issue that occurred when two AutoML runs came in the same second, resulting in matching timestamps.
  • [PUBDEV-5191] - The offset_column and distribution parameters are no longer available in Random Forest.
  • [PUBDEV-5195] - Fixed an issue in XGBoost that caused MOJOs to fail to work without manually adding the Commons Logging dependency.
  • [PUBDEV-5203] - Fixed an issue that caused XGBoost to mangle the domain levels for datasets that have string response domains.
  • [PUBDEV-5213] - In Flow, the separator drop down now shows 3-digit decimal values instead of 2.
  • [PUBDEV-5215] - Users can now specify interactions when running GLM in Flow.
  • [PUBDEV-5228] - FrameMetadata code no longer uses hardcoded keys. Also fixed an issue that caused AutoML to fail when multiple AutoMLs are run simultaneously.
  • [PUBDEV-5229] - A frame can potentially have a null key. If there is a Frame with a null key (just a container for vecs), H2O no longer attempts to track a null key.
  • [PUBDEV-5256] - Users can now successfully build an XGBoost model as compile chain. XGBoost no longer fails to provide the compatible artifact for an Oracle Linux environment.
  • [PUBDEV-5265] - GLM no longer fails when a categorical column exists in the dataset along with an empty value on at least one row.
  • [PUBDEV-5286] - Fixed an issue that caused GBM grid to fail on some datasets when specifying `sample_rate` in the grid.
  • [PUBDEV-5287] - The x argument is no longer required when performing a grid search.
  • [PUBDEV-5297] - Fixed an issue that caused the Parquet parser to fail on Spark 2.0 (SW-707).
  • [PUBDEV-5315] - Fixed an issue that caused XGBoost OpenMP to fail on Ubuntu 14.04.
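
For readers unfamiliar with the metric named in PUBDEV-4593, here is a minimal pure-Python sketch of Levenshtein (edit) distance with one common normalization. It is not H2O's internal code. Dividing by the length of the longer string is an assumption chosen for illustration, and H2O may normalize differently.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def normalized_levenshtein(a, b):
    """Assumption: divide by the longer length so the result lies in [0, 1]."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest

print(levenshtein("kitten", "sitting"))  # 3
```

The explicit check for two empty strings avoids a division by zero, which is the kind of edge case a normalization step has to handle.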

New Feature

  • [PUBDEV-4111] - Added support for INT96 timestamp to the Parquet parser.
  • [PUBDEV-4652] - Added support for XGBoost multinode training in H2O. Note that this is still a BETA feature.
  • [PUBDEV-4980] - Users can now specify a list of algorithms to exclude during an AutoML run. This is done using the new `exclude_algos` parameter.
  • [PUBDEV-5204] - In GLM, users can now specify a list of interaction terms to include when building a model instead of relying on the default action of including all interactions.

Task

  • [PUBDEV-5230] - The Python PCA code examples in github and in the User Guide now use the h2o.estimators.pca.H2OPrincipalComponentAnalysisEstimator method instead of the h2o.transforms.decomposition.H2OPCA method.
  • [PUBDEV-5251] - Upgraded the XGBoost version. This now supports RHEL 6.

Improvement

  • [PUBDEV-5086] - Stacked Ensemble allows you to specify the metalearning algorithm to use when training the ensemble. When an algorithm is specified, Stacked Ensemble runs with the specified algorithm's default hyperparameter values. The new ``metalearner_params`` option allows you to pass in a dictionary/list of hyperparameters to use for that algorithm instead of the defaults.
  • [PUBDEV-5224] - Users can now specify a seed parameter in Stacked Ensemble.
  • [PUBDEV-5310] - Documented clouding behavior of an H2O cluster. This is available at https://github.com/h2oai/h2o-3/blob/master/h2o-docs/devel/h2o_clouding.rst.

Docs

  • [PUBDEV-5149] - Updated the documentation to indicate that datetime parsing from R and Flow now is UTC by default.
  • [PUBDEV-5151] - R documentation on docs.h2o.ai is now available in HTML format.
  • [PUBDEV-5172] - Added a new Cloud Integration topic for using H2O with AWS.
  • [PUBDEV-5221] - In the XGBoost chapter, added that XGBoost in H2O supports multicore.
  • [PUBDEV-5242] - Added `interaction_pairs` to the list of GLM parameters.
  • [PUBDEV-5283] - Added `metalearner_algorithm` and `metalearner_params` to the Stacked Ensembles chapter.
  • [PUBDEV-5311] - The H2O-3 download site now includes a link to the HTML version of the R documentation.
  • [PUBDEV-5312] - Updated the XGBoost documentation to indicate that multinode support is now available as a Beta feature.
  • [PUBDEV-5314] - Added the seed parameter to the Stacked Ensembles section of the User Guide.

Wheeler (3.16.0.4) - 1/15/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/4/index.html

Bug

  • [PUBDEV-5206] - Fixed several client deadlock issues.
  • [PUBDEV-5212] - When verifying that a supported version of Java is available, H2O no longer checks for version 1.6.
  • [PUBDEV-5216] - The H2O-3 download site has an updated link for the Sparkling Water README.
  • [PUBDEV-5220] - In Aggregator, fixed the way that a created mapping frame is populated.

New Feature

  • [PUBDEV-5209] - XGBoost can now be used in H2O on Hadoop with a single node.

Improvement

  • [PUBDEV-5210] - Deep Water is disabled in AutoML.
  • [PUBDEV-5211] - This release of H2O includes an upgraded XGBoost version.

Wheeler (3.16.0.3) - 1/8/2018

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/3/index.html

    Technical task

    • [PUBDEV-5184] - H2O-3 now allows definition of custom function directly in Python notebooks and enables iterative updates on defined functions.

    Bug

    • [PUBDEV-4863] - When a frame name includes numbers followed by alphabetic characters (for example, "250ML"), Rapids no longer parses the frame name as two tokens.
    • [PUBDEV-4897] - Fixed an issue that caused Partial Dependence Plots to use a different order of categorical values after calling as.factor.
    • [PUBDEV-5148] - Added support for CDH 5.13.
    • [PUBDEV-5180] - Fixed an issue that caused a Python 2 timestamp to be interpreted as two tokens.
    • [PUBDEV-5196] - Aggregator supports categorical features. Fixed a discrepancy in the Aggregator documentation.

    New Feature

    • [PUBDEV-4622] - In GBM, users can now specify quasibinomial distribution.
    • [PUBDEV-4965] - H2O-3 now supports the Netezza JDBC driver.

    Improvement

    • [PUBDEV-5171] - Users can now optionally export the mapping of rows in an aggregated frame to that of the original raw data.

    Docs

    • [PUBDEV-5120] - When using S3/S3N, revised the documentation to recommend that S3 should be used for data ingestion, and S3N should be used for data export.
    • [PUBDEV-5150] - The H2O User Guide has been updated to indicate support for CDH 5.13.
    • [PUBDEV-5162] - Updated the Anaconda section with information specifically for Python 3.6 users.
    • [PUBDEV-5178] - The H2O User Guide has been updated to indicate support for the Netezza JDBC driver.
    • [PUBDEV-5190] - Added "quasibinomial" to the list of `distribution` options in GBM.
    • [PUBDEV-5192] - Added the new `save_mapping_frame` option to the Aggregator documentation.

    Wheeler (3.16.0.2) - 11/30/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/2/index.html

    Bug

  • [PUBDEV-5115] - In AutoML, fixed an issue that caused the leaderboard_frame to be ignored when nfolds > 1.
  • [PUBDEV-5117] - Improved the warning that displays when mismatched jars exist.
  • [PUBDEV-5126] - The correct H2O version now displays in setup.py for sdist.

Improvement

  • [PUBDEV-5111] - Incorporated final improvements to the Sparkling Water booklet.
  • [PUBDEV-5127] - Automated Anaconda releases.
  • [PUBDEV-5131] - This version of H2O introduces light rest endpoints for obtaining frames in the python client.

Wheeler (3.16.0.1) - 11/24/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/1/index.html

    Technical Task

    • [PUBDEV-5087] - A backend Java API is now available for custom evaluation metrics.

    Bug

    • [PUBDEV-1465] - Users can now save models to and download models from S3.
    • [PUBDEV-3567] - When running h2o.merge in the R client, the status line indicator will no longer return quickly. Users can no longer enter new commands until the merge process is completed.
    • [PUBDEV-4172] - In the R client strings, training_frame is no longer described as an optional parameter.
    • [PUBDEV-4672] - The H2OFrame.mean method now works in Python 3.6.
    • [PUBDEV-4697] - Early stopping now works with perfectly predictive data.
    • [PUBDEV-4727] - h2o.group_by now works correctly when specifying a median() value.
    • [PUBDEV-4778] - In XGBoost fixed an issue that caused prediction on a dataset without a response column to return an error.
    • [PUBDEV-4853] - When running AutoML in Flow, users can now specify a project name.
    • [PUBDEV-4857] - h2odriver in proxy mode now correctly forwards the authentication headers to the H2O node.
    • [PUBDEV-4900] - H2O can ingest Parquet 1.8 files created by Spark.
    • [PUBDEV-4906] - Loading models and exporting models to/from AWS S3 now works correctly.
    • [PUBDEV-4907] - Fixed an issue that caused binary model imports and exports from/to S3 to fail.
    • [PUBDEV-4930] - Users can now load data from s3n resources after setting core-site.xml correctly.
    • [PUBDEV-4953] - Fixed an error that occurred when exporting data to s3.
    • [PUBDEV-4985] - Fixed an issue that caused H2O to "forget" that a column is of factor type if it contains only NA values.
    • [PUBDEV-4996] - The download instructions for Python now indicate that version 3.6 is supported.
    • [PUBDEV-5002] - In Flow, fixed an issue with retaining logs from the client node.
    • [PUBDEV-5003] - H2O now handles the case where the node is the client and the md5 should be ignored.
    • [PUBDEV-5005] - h2o.residual_deviance now works correctly.
    • [PUBDEV-5017] - h2o.predict no longer returns an error when the user does not specify an offset_column.
    • [PUBDEV-5033] - Fixed an issue with Spark string chunks.
    • [PUBDEV-5037] - Logs now display correctly on HADOOP, and downloaded logs no longer give an empty folder when the cluster is up.
    • [PUBDEV-5038] - Added an option for handling empty strings. If compare_empty is set to FALSE, empty strings will be handled as NaNs.
    • [PUBDEV-5040] - HTTP logs can now be obtained in Flow UI.
    • [PUBDEV-5048] - Fixed an issue with the progress bar that occurred when running PySparkling + DataBricks.
    • [PUBDEV-5067] - Fixed reporting of clients with the wrong md5.
    • [PUBDEV-5070] - In the R and Python clients, updated the strings for max_active_predictors to indicate that the default is now 5000.
    • [PUBDEV-5072] - h2o.merge now works correctly for one-to-many when all.x=TRUE.
    • [PUBDEV-5074] - Fixed an issue that caused GLM predict to fail when a weights column was not specified.
    • [PUBDEV-5081] - Reduced the number of URLs that get sent to google analytics.
    • [PUBDEV-5095] - When building a Stacked Ensemble model, the fold_column from AutoML is now piped through to the stacked ensemble.
    • [PUBDEV-5096] - Fixed an issue that caused GLM scoring to produce incorrect results for sparse data.

    Epic

    • [PUBDEV-4684] - This version of H2O includes support for Python 3.6.

    New Feature

    • [PUBDEV-3877] - MOJOs are now supported for Stacked Ensembles.
    • [PUBDEV-3743] - Users can now specify the metalearner algorithm type that StackedEnsemble should use. This can be AUTO, GLM, GBM, DRF, or Deep Learning.
    • [PUBDEV-3971] - Added a metalearner_folds option in Stacked Ensembles, enabling cross validation.
    • [PUBDEV-4085] - In GBM, endpoints are now exposed that allow for custom evaluation metrics.
    • [PUBDEV-4882] - When running AutoML through the Python or R clients, users can now specify the nfolds argument.
    • [PUBDEV-4891] - Added another Stacked Ensemble (top model for each algo) to AutoML.
    • [PUBDEV-5071] - The AutoML leaderboard now uses cross-validation metrics (new default).
    • [PUBDEV-4914] - K-Means POJOs and MOJOs now expose distances to cluster centers.
    • [PUBDEV-4957] - Multiclass stacking is now supported in AutoML. Removed the check that caused AutoML to skip stacking for multiclass.
    • [PUBDEV-5043] - Users can now specify a number of folds when running AutoML in Flow.
    • [PUBDEV-5084] - Added a metalearner_fold_column option in Stacked Ensembles, allowing for custom folds during cross validation.
    • [PUBDEV-4994] - The Aggregator Function is now exposed in the R client.
    • [PUBDEV-4995] - The Aggregator Function is now available in the Python client.

    Task

    • [PUBDEV-4803] - The current version of h2o-py is now published into PyPi.
    • [PUBDEV-4896] - Changed the behavior of auto-generation of validation and leaderboard frames in AutoML.
    • [PUBDEV-4931] - Updated the download site and the end user documentation to indicate that Python 3.6 is now supported.
    • [PUBDEV-4935] - PyPi/Anaconda descriptors now indicate support for Python 3.6.

    Improvement

    • [PUBDEV-4791] - Enabled the lambda search for the GLM metalearner in Stacked Ensembles. This is set to TRUE and early_stopping is set to FALSE.
    • [PUBDEV-4831] - Running `pip install` now installs the latest version of H2O-3.
    • [PUBDEV-4963] - In EasyPredictModelWrapper, preamble(), predict(), and fillRawData() are now protected rather than private.
    • [PUBDEV-5082] - MOJOs/POJOs will not be created for unsupported categorical_encoding values.
    • [PUBDEV-5109] - An AutoML run now outputs two StackedEnsemble model IDs. These are labeled StackedEnsemble_AllModels and StackedEnsemble_BestOfFamily.

    Docs

    • [PUBDEV-4298] - In the Data Manipulation chapter, added a topic for pivoting tables.
    • [PUBDEV-4662] - Added a topic to the Data Manipulation chapter describing the h2o.fillna function.
    • [PUBDEV-4747] - Added MOJO and POJO Quick Start sections directly into the Productionizing H2O chapter. Previously, this chapter included links to quick start files.
    • [PUBDEV-4810] - In the GBM booklet when describing nbins_cat, clarified that factors rather than columns get grouped together.
    • [PUBDEV-4816] - The description for the GLM lambda_max option now states that this is the smallest lambda that drives all coefficients to zero.
    • [PUBDEV-4833] - Updated the installation instructions for PySparkling.
    • [PUBDEV-4864] - Clarified that in H2O-3, sampling is without replacement.
    • [PUBDEV-4878] - Updated documentation to state that multiclass classification is now supported in Stacked Ensembles.
    • [PUBDEV-4879] - Updated documentation to state that multiclass stacking is now supported in AutoML.
    • [PUBDEV-4895] - Added an Early Stopping section to the Algorithms > Common chapter.
    • [PUBDEV-4945] - Added a note in Word2vec stating that binary format is not supported.
    • [PUBDEV-4946] - In the Parameters Appendix, updated the description for histogram_type=random.
    • [PUBDEV-4958] - In the Using Flow > Models > Run AutoML section, updated the AutoML screenshot to show the new Project Name field.
    • [PUBDEV-4971] - Added a Sorting Columns data munging topic describing how to sort a data frame by column or columns.
    • [PUBDEV-5000] - In KMeans, updated the list of model summary statistics and training metrics that are outputted.
    • [PUBDEV-5011] - Removed SortByResponse from the list of categorical_encoding options for Aggregator and K-Means.
    • [PUBDEV-5026] - Updated the Sparkling Water links on docs.h2o.ai to point to the latest release.
    • [PUBDEV-5032] - Added a section in the Algorithms chapter for Aggregator.
    • [PUBDEV-5056] - Updated the description for Save and Loading Models to indicate that H2O binary models are not compatible across H2O versions.
    • [PUBDEV-5057] - Added ignored_columns and 'x' parameters to AutoML section. Also added the 'x' parameter to the Parameters Appendix.
    • [PUBDEV-5062] - In DRF, added FAQs describing splitting criteria.
    • [PUBDEV-5085] - Added the new metalearner_folds and metalearner_fold_assignment parameters to the Defining a Stacked Ensemble Model section in the User Guide.
    • [PUBDEV-5089] - Updated the Sparkling Water booklet. (Also PUBDEV-5004.)
    • [PUBDEV-5092] - Added the new metalearner_algorithm parameter to Defining a Stacked Ensemble Model section in the User Guide.
    • [PUBDEV-5097] - The User Guide and the POJO/MOJO Javadoc have been updated to indicate that MOJOs are supported for Stacked Ensembles.
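
To make the `lambda_max` description in PUBDEV-4816 concrete, the sketch below computes the smallest penalty that zeroes every coefficient. It assumes a glmnet-style elastic-net objective with a Gaussian family, an unpenalized intercept, and centered predictors; H2O's exact scaling may differ. The helper names are hypothetical, and the single-feature lasso solver is included only to check the claim for `alpha = 1`.

```python
def lambda_max(columns, y, alpha=1.0):
    """Smallest lambda at which all coefficients are zero (assumed glmnet-style scaling)."""
    n = len(y)
    y_mean = sum(y) / n
    residual = [yi - y_mean for yi in y]
    # Largest absolute correlation between a predictor and the centered response.
    largest = max(abs(sum(x * r for x, r in zip(col, residual))) for col in columns)
    return largest / (n * alpha)

def soft_threshold(z, t):
    """Shrink z toward zero by t, clipping at zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def single_feature_lasso(col, y, lam):
    """Closed-form lasso coefficient for one centered predictor (alpha = 1)."""
    n = len(y)
    y_mean = sum(y) / n
    rho = sum(x * (yi - y_mean) for x, yi in zip(col, y)) / n
    return soft_threshold(rho, lam) / (sum(x * x for x in col) / n)

col = [-1.0, 0.0, 1.0]
y = [1.0, 2.0, 6.0]
lam = lambda_max([col], y)
print(single_feature_lasso(col, y, lam))      # 0.0
print(single_feature_lasso(col, y, lam / 2))  # 1.25
```

At `lam` the coefficient is exactly zero, and any smaller penalty lets it become nonzero, which is what "smallest lambda that drives all coefficients to zero" means in this setting.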

    Weierstrass (3.14.0.7) - 10/20/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/7/index.html

    Bug

    • [PUBDEV-4987] - Fixed an issue that caused h2o.H2OFrame.any() and h2o.H2OFrame.all() to not work properly if the frame contains only True values.
    • [PUBDEV-4988] - H2O no longer checks the H2O client hash code.

    Task

    • [PUBDEV-4003] - Generated Python API tests for Python Module Data in H2O and Data Manipulation.

    Weierstrass (3.14.0.6) - 10/9/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/6/index.html

    Bug

    • [SW-542] - Fixed an issue that prevented Sparkling Water from importing Parquet files.

    Weierstrass (3.14.0.5) - 10/9/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/5/index.html

    Bug

    • [PUBDEV-4870] - Fixed an issue that caused sorting to be done incorrectly.
    • [PUBDEV-4917] - Only relevant clients (the ones with the same cloud name) are now reported to H2O.
    • [PUBDEV-4954] - Improved error messaging in the case where H2O fails to parse a valid Parquet file.
    • [PUBDEV-4959] - Fixed an issue that allowed nodes from different clusters to kill different H2O clusters.
    • [PUBDEV-4979] - Fixed an issue that caused K-Means to improperly calculate scaled distance.

    Task

    • [PUBDEV-4925] - Nightly and stable releases will now have published sha256 hashes.
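
A published SHA-256 hash (PUBDEV-4925) lets you confirm that a downloaded release archive is intact. The following is a minimal sketch using only Python's standard `hashlib`. The function names are hypothetical, and where the published hash is found is not specified here, so it is passed in as a plain string.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, published_hex):
    """Compare the local digest with a published hex string, ignoring case and whitespace."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

A call such as `matches_published_hash("h2o.zip", published_hex)` returns `True` only when the digests agree; the file name here is a placeholder.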

    Improvement

    • [PUBDEV-4404] - The h2o.sort() function now includes an `ascending` parameter that allows you to specify whether a numeric column should be sorted in ascending or descending order.
    • [PUBDEV-4964] - H2O no longer terminates when an incompatible client tries to connect.

    Docs

    • [PUBDEV-4949] - Updated the list of required packages for the H2O-3 R client on the H2O Download site and in the User Guide.
    • [PUBDEV-4966] - Added an FAQ to the User Guide FAQ describing how Java 9 users can switch to a supported Java version.

    Weierstrass (3.14.0.3) - 9/18/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/3/index.html

    Technical Task

    • [PUBDEV-4873] - Introduced a Python client side AST optimization.

    Bug

    • [PUBDEV-3525] - In R, `h2o.arrange()` can now sort on a float column.
    • [PUBDEV-4723] - The `as_data_frame()` function no longer drops rows with NAs when `use_pandas` is set to TRUE.
    • [PUBDEV-4735] - In Deep Learning POJOs, fixed an issue in the sharing stage between threads.
    • [PUBDEV-4739] - Fixed an issue in R that caused `h2o.sub` to fail to retain the column names of the frame.
    • [PUBDEV-4757] - Running ifelse() on a constant column no longer results in an error.
    • [PUBDEV-4846] - Using + on string columns now works correctly.
    • [PUBDEV-4848] - Fixed an issue that caused a POJO and a MOJO to return different column names with the `getNames()` method.
    • [PUBDEV-4849] - The R and Python clients now have consistent timeout numbers.
    • [PUBDEV-4868] - Fixed an issue that resulted in an AIOOB error when predicting with GLM. NA responses are now removed prior to GLM scoring.
    • [PUBDEV-4909] - The set_name method now works correctly in the Python client.
    • [PUBDEV-4921] - Replaced the deprecated Clock class in timing.gradle.
    • [PUBDEV-4937] - The MOJO Reader now closes open files after reading.

    New Feature

    • [PUBDEV-4628] - MOJO support has been extended to include the Deep Learning algorithm.
    • [PUBDEV-4845] - Added the ability to import an encrypted (AES128) file into H2O. This can be configured globally by specifying the `-decrypt_tool` option and installing the tool in DKV.
    • [PUBDEV-4904] - The Decryption API is now exposed in the REST API and in the R client.

    Docs

    • [PUBDEV-4811] - Updated the MOJO Quick Start Guide to show separator differences between Linux/OS X and Windows. Also updated the R example to match the Python example.

    Weierstrass (3.14.0.2) - 8/21/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/2/index.html

    Bug

    • [PUBDEV-4804] - Fixed a broken link to the Hive tutorials from the Productionizing section in the User Guide.
    • [PUBDEV-4822] - Sparkling Water can now pass a data frame with a vector for conversion into H2OFrame. In prior versions, the vector was not properly expanded and resulted in a failure.

    Task

    • [PUBDEV-4802] - Added more tests to ensure that, when max_runtime_secs is set, the returned model works correctly.

    Improvement

    • [PUBDEV-4812] - This version of H2O includes an option to force toggle (on/off) a specific extension. This allows users to enable the XGBoost REST API on a system that does not support XGBoost.
    • [PUBDEV-4829] - A warning now displays when the minimal XGBoost version is used.

    Weierstrass (3.14.0.1) - 8/10/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/1/index.html

    Bug

    • [PUBDEV-2767] - In the R client, making a copy of a factor column and then changing the factor levels no longer causes the levels of the original column to change.
    • [PUBDEV-4584] - Added a **Leaderboard Frame** option in Flow when configuring an AutoML run.
    • [PUBDEV-4586] - The `h2o.performance` function now works correctly on XGBoost models.
    • [PUBDEV-4625] - In the Python client, improved the help string for `h2o_import_file`. This string now indicates that setting `(parse=False)` will return a list instead of an H2OFrame.
    • [PUBDEV-4654] - Removed the Ecko dependency. This is not needed.
    • [PUBDEV-4683] - Fixed an issue that caused the parquet parser to store numeric/float values in a string column. This issue occurred when specifying an unsupported type conversion in Parse Setup (for example, numeric -> string). Users will now encounter an error when attempting this. Additionally, users can now change Enums->Strings in parse setup.
    • [PUBDEV-4686] - Deep Learning POJOs are now thread safe.
    • [PUBDEV-4688] - Fixed the default print method for H2OFrame in Python. Now when a user types the H2OFrame name, a new line is added, and the header is pushed to the next line.
    • [PUBDEV-4702] - Fixed an issue that caused the `max_runtime_secs` parameter to not work correctly when run through the Python client. As a result of this fix, the `max_runtime_secs` parameter was added to Word2vec.
    • [PUBDEV-4704] - Fixed an issue that caused XGBoost grid search to fail when using the Python client.
    • [PUBDEV-4724] - When running with weighted data and columns that are constant after applying weights, a GLM lambda search no longer results in an AIOOB error.
    • [PUBDEV-4730] - The XGBoost `max_bin` parameter has been renamed to `max_bins`, and its default value is now 256.
    • [PUBDEV-4731] - XGBoost Python documentation is now available.
    • [PUBDEV-4732] - In XGBoost, the `learning_rate` (alias: `eta`) parameter now has a default value of 0.3.
    • [PUBDEV-4734] - In XGBoost, the `max_depth` parameter now has a default value of 6.
    • [PUBDEV-4735] - Multi-threading is now supported by downloaded POJOs.
    • [PUBDEV-4751] - The XGBoost `min_rows` (alias: `min_child_weight`) parameter now has a default value of 1.
    • [PUBDEV-4752] - The XGBoost `max_abs_leafnode_pred` (alias: `max_delta_step`) parameter now has a default value of 0.
    • [PUBDEV-4753] - H2O XGBoost default options are now consistent with XGBoost default values. This fix involved the following changes:
      • num_leaves has been renamed max_leaves, and its default value is 0.
      • The default value for reg_lambda is 0.
    • [PUBDEV-4756] - Removed the Guava dependency from the Deep Water API.
    • [PUBDEV-4776] - In XGBoost, the default value for sample_rate and the alias subsample are now both 1.0.
    • [PUBDEV-4777] - In XGBoost, the default value for colsample_bylevel (alias: colsample_bytree) has been changed to 1.0.
    • [PUBDEV-4783] - Hidden files are now ignored when reading from HDFS.

    New Feature

    • [PUBDEV-4446] - Added a `verbose` option to Deep Learning, DRF, GBM, and XGBoost. When enabled, this option will display scoring histories as a model job is running.
    • [PUBDEV-4682] - Added an `extra_classpath` option, which allows users to specify a custom classpath when starting H2O from the R and Python client.
    • [PUBDEV-4685] - Users can now override the type of a Str/Cat column in a Parquet file when the parser attempts to auto detect the column type.
    • [PUBDEV-4738] - Users can now run a standalone H2O instance and read from a Kerberized cluster's HDFS.
    • [PUBDEV-4745] - Added support for CDH 5.10.
    • [PUBDEV-4750] - Added support for MapR 5.2.

    Improvement

    • [PUBDEV-3947] - Fixed an issue that caused PCA to take 39 minutes to run on a wide dataset. The wide dataset method for PCA is now only enabled if the dataset is very wide.
    • [PUBDEV-4596] - XGBoost-specific WARN messages have been converted to TRACE.
    • [PUBDEV-4624] - When printing frames via `head()` or `tail()`, the `nrows` option now allows you to specify more than 10 rows. With this change, you can print the complete frame, if desired.
    • [PUBDEV-4630] - Improved the speed of converting a sparse matrix to an H2OFrame in R.
    • [PUBDEV-4664] - Added the following parameters to the XGBoost R/Py clients:
      • categorical_encoding
      • sample_type
      • normalize_type
      • rate_drop
      • one_drop
      • skip_drop
    • [PUBDEV-4676] - H2O can now handle sparse vectors as the input of the external frame handler.
    • [PUBDEV-4692] - Added MOJO support for Spark SVM.
    • [PUBDEV-4701] - When running AutoML from within Flow, the default `stopping_tolerance` is now NULL instead of 0.001.
    • [PUBDEV-4748] - Removed dependency on Reflections.

    Vapnik (3.12.0.1) - 6/6/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vapnik/1/index.html

    Epic

    • [PUBDEV-4273] - AutoML is now available in H2O. AutoML can be used for automatically training and tuning a number of models within a user-specified time limit or model limit. It is designed to run with as few parameters as possible, and the top performing models can be viewed on a leaderboard. More information about AutoML is available here.

    New Feature

    • [PUBDEV-4451] - With the addition of the AutoML feature, a new **Run AutoML** option is available in Flow under the **Models** dropdown menu.

    Vajda (3.10.5.4) - 7/17/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/4/index.html

    Bug

    • [PUBDEV-4694] - Fixed an issue that caused tree algos to waste memory by storing categorical values in every tree.

    Vajda (3.10.5.3) - 6/30/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/3/index.html

    Bug

    • [PUBDEV-4026] - Fixed an issue that resulted in "Unexpected character after column id:" warnings when parsing an SVMLight file.
    • [PUBDEV-4445] - h2o.predict now displays a warning if the features (columns) in the test frame do not contain those features used by the model.
    • [PUBDEV-4572] - The XGBoost REST API is now only registered when the backend library exists.
    • [PUBDEV-4595] - H2O no longer displays an error if there is a "/" in the user-supplied model name. Instead, a message will display indicating that the "/" is replaced with "_".
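
    The PUBDEV-4595 behavior can be illustrated with a minimal sketch in plain Python. This is not H2O's own code: the function name `sanitize_model_id` and the wording of the message are hypothetical, and only the replacement of "/" with "_" comes from the note above.

```python
def sanitize_model_id(model_id):
    """Return (clean_id, message); message is None when nothing was changed."""
    if "/" not in model_id:
        return model_id, None
    # Replace every "/" with "_" instead of rejecting the name.
    clean_id = model_id.replace("/", "_")
    # Hypothetical message text, shown to the user in place of an error.
    message = 'Replaced "/" with "_" in model name: {} -> {}'.format(model_id, clean_id)
    return clean_id, message
```

    For example, `sanitize_model_id("gbm/run1")` yields the name `gbm_run1` together with an informational message.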

    Improvement

    • [PUBDEV-3941] - Added support for autoencoder POJOs in the EasyPredictModelWrapper.
    • [PUBDEV-4269] - When using the Python client, H2O now warns the user about the minimum required Colorama version. Note that the current minimum version is 0.3.8.
    • [PUBDEV-4537] - Removed deprecation warnings from the H2O build.
    • [PUBDEV-4548] - Moved the initialization of XGBoost into the H2O core extension.

    Docs

    • [PUBDEV-4515] - Added a link to a paper describing balance classes in the balance_classes parameter topic.
    • [PUBDEV-4610] - Removed `laplace`, `huber`, and `quantile` from list of supported distributions in the XGBoost documentation.
    • [PUBDEV-4612] - Added heuristics to the FAQ > General Troubleshooting topic.

    Vajda (3.10.5.2) - 6/19/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/2/index.html

    Bug

    • [PUBDEV-3860] - In PCA, fixed an issue that resulted in errors when specifying `pca_method=glrm` on wide datasets. In addition, the GLRM algorithm can now be used with wide datasets.
    • [PUBDEV-4416] - Fixed issues with streamParse in ORC parser that caused a NullPointerException when parsing multifile from Hive.
    • [PUBDEV-4438] - Fixed an issue that occurred with H2O data frame indexing for large indices that resulted in off-by-one errors. Now, when indexing is set to a value greater than 1000, indexing between left and right sides is no longer inconsistent.
    • [PUBDEV-4456] - In DRF, fixed an issue that resulted in an AssertionError when run on certain datasets with weights.
    • [PUBDEV-4579] - Removed an incorrect Python example from the Sparkling Water booklet. Python users must start Spark using the H2O pysparkling egg on the Python path. Using `--package` when running the pysparkling app is not advised, as the pysparkling distribution already contains the required jar file.
    • [PUBDEV-4594] - In GLM, fixed an issue that caused a Runtime exception when specifying the quasibinomial family with `nfold = 2`.

    New Feature

    • [PUBDEV-3624] - Added top and bottom N functions, which allow users to grab the top or bottom N percent of a numerical column. The returned frame contains the original row indices of the top/bottom N percent values extracted into the second column.
    • [PUBDEV-4096] - When building Stacked Ensembles in R, the base_models parameter can accept models rather than just model IDs. Updated the documentation in the User Guide for the base_models parameter to indicate this.
    • [PUBDEV-4523] - Added the following new GBM and DRF parameters to the User Guide: `calibrate_frame` and `calibrate_model`.
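
    The top/bottom N idea from PUBDEV-3624 can be sketched in plain Python as follows. This is an illustration only, not the H2O implementation: the function name `top_n_percent`, the (row index, value) pair layout, the skipping of missing values, and the rounding rule are all assumptions.

```python
def top_n_percent(values, n_percent, bottom=False):
    """Return (row_index, value) pairs for the top (or bottom) n_percent of values."""
    if not 0 < n_percent <= 100:
        raise ValueError("n_percent must be in (0, 100]")
    # Pair each value with its original row index; missing values (None) are skipped.
    indexed = [(i, v) for i, v in enumerate(values) if v is not None]
    # Sort descending for "top", ascending for "bottom".
    indexed.sort(key=lambda pair: pair[1], reverse=not bottom)
    # Assumption: round the row count down, but keep at least one row.
    keep = max(1, int(len(indexed) * n_percent / 100.0))
    return indexed[:keep]
```

    With ten non-missing values and `n_percent=20`, the sketch returns two pairs, each carrying the row's original position so the result can be traced back to the source column.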

    Improvement

    • [PUBDEV-4531] - Improved PredictCsv.java as follows:
      • Enabled PredictCsv.java to accept arbitrary separator characters in the input dataset file if the user includes the optional flag `--separator` in the input arguments. If a user enters a special Java character as the separator, then H2O will add "\".
      • Enabled PredictCsv.java to perform setConvertInvalidNumbersToNa(setInvNumNA) if the optional flag `--setConvertInvalidNum` is included in the input arguments.
    • [PUBDEV-4578] - Fixed the R package so that a "browseURL" NOTE no longer appears.
    • [PUBDEV-4583] - In the R package documentation, improved the description of the GLM `alpha` parameter.

    Docs

    • [PUBDEV-4524] - In the "Using Flow - H2O’s Web UI" section of the User Guide, updated the Viewing Models topic to include that users can download the h2o-genmodel.jar file when viewing models in Flow.
    • [PUBDEV-4549] - The `group_by` function accepts a number of aggregate options, which were documented in the User Guide and in the Python package documentation. These aggregate options are now described in the R package documentation.
    • [PUBDEV-4575] - Added an initial XGBoost topic to the User Guide. Note that this is still a work in progress.

    Vajda (3.10.5.1) - 6/9/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/1/index.html

    Technical Task

    Bug

    • [PUBDEV-1457] - PCA no longer reports incorrect values when multiple eigenvectors exist.
    • [PUBDEV-1571] - Users can now specify the weights_column as a numeric index in R.
    • [PUBDEV-1578] - Fixed an issue that caused GLM models returned by h2o.glm() and h2o.getModel(..) to be different.
    • [PUBDEV-1616] - Fixed an issue that caused PCA with GLRM to display incorrect results on data.
    • [PUBDEV-2286] - Fixed an issue that caused `df.show(any_int)` to always display 10 rows.
    • [PUBDEV-2415] - Starting an H2O cloud from R no longer results in "Error in as.numeric(x["max_mem"]) : (list) object cannot be coerced to type 'double'"
    • [PUBDEV-2656] - `h2o::ifelse` now handles NA values the same way that `base::ifelse` does.
    • [PUBDEV-2715] - Fixed an issue in PCA that resulted in incorrect standard deviation and components results for non standardized data.
    • [PUBDEV-2759] - When performing a grid search with a `fold_assignment` specified and with `cross_validation` disabled, Python unit tests now display a Java error message. This is because a fold assignment is meaningless without cross validation.
    • [PUBDEV-2816] - The Python `h2o.get_grid()` function is now in the base h2o object, allowing you to use it the same way as `h2o.get_model()`, `h2o.get_frame()` etc.
    • [PUBDEV-3196] - The `.mean()` function can now be applied to a row in `H2OFrame.apply()`.
    • [PUBDEV-3350] - Fixed an issue that caused a negative value to display in the H2O cluster version.
    • [PUBDEV-3396] - GLM now checks to see if a response is encoded as a factor and warns the user if it is not.
    • [PUBDEV-3470] - Fixed an issue that resulted in an `h2o.init()` fail message even though the server had actually been started. As a result, H2O did not shutdown automatically upon exit.
    • [PUBDEV-3502] - Fixed an issue that caused PCA to hang when run on a wide dataset using the Randomized `pca_method`. Note that it is still not recommended to use Randomized with wide datasets.
    • [PUBDEV-3520] - `h2o.setLevels` now works correctly when wrapped into invisible.
    • [PUBDEV-3651] - Added a dependency for the roxygen2 package.
    • [PUBDEV-3711] - `h2o.coef` in R is now functional for multinomial models.
    • [PUBDEV-3729] - When converting a column to `type = string` with `.ascharacter()` in Python, the `structure` method now correctly recognizes the change.
    • [PUBDEV-3759] - Fixed an issue that caused GBM Grid Search to hang.
    • [PUBDEV-3777] - Subsetting an H2O frame now allows a 0-row subset, just as with data.frame.
    • [PUBDEV-3815] - Fixed an issue that caused the R `apply` method to fail to work with `h2o.var()`.
    • [PUBDEV-3859] - PCA no longer reports errors when using PCA on wide datasets with `pca_method = Randomized`. Note that it is still not recommended to use Randomized with wide datasets.
    • [PUBDEV-3900] - Jenkins builds no longer all share the same R package directory, and new H2O R libraries are installed during testing.
    • [PUBDEV-3905] - When trimming is done, H2O now checks if it passes the beginning of the string. This check prevents the code from going further down the memory with negative indexes.
    • [PUBDEV-3973] - Stacked Ensembles no longer fails when the `fold_assignment` for base learners is not `Modulo`.
    • [PUBDEV-3988] - Fixed an issue that caused H2O to generate invalid code in POJO for PCA/SVM.
    • [PUBDEV-4079] - Instead of using random charset for getting bytes from strings, the source code now centralizes "byte extraction" in StringUtils. This prevents different build machines from using different default encoders.
    • [PUBDEV-4090] - When performing a Random Hyperparameter Search, if the model parameter seed is set to the default value but a search_criteria seed is not, then the model parameter seed will now be set to search_criteria seed+0, 1, 2, ..., model_number. Seeding the built models makes random hyperparameter searches more repeatable.
    • [PUBDEV-4100] - Fixed a bad link that was included in the "A K/V Store for In-Memory Analytics, Part 2" blog.
    • [PUBDEV-4138] - Comments are now permitted in Content-Type header for application/json mime type. As a result, specifying content-type charset no longer results in the request body being ignored.
    • [PUBDEV-4143] - Improved the Python `group_by` option count column name to match the R client.
    • [PUBDEV-4146] - Fixed broken links in the "Hacking Algorithms into H2O" blog post.
    • [PUBDEV-4156] - The Python API now provides a method to extract parameters from `cluster_status`.
    • [PUBDEV-4171] - Fixed incorrect parsing of input parameters. Previously, system property parsing logic added the value of any system property other than "ga_opt_out" to the arguments list if a property was prefixed with "ai.h2o.". This caused an attempt to parse the value of a system property as if it were itself a system property and at times resulted in an "Unknown Argument" error.
    • [PUBDEV-4174] - Fixed intermittent pyunit_javapredict_dynamic_data_paramsDR.
    • [PUBDEV-4177] - Fixed orc parser test by setting timezone to local time.
    • [PUBDEV-4185] - H2O can now correctly handle preflight OPTIONS calls - specifically in the event of a (1) CORS request and (2) the request has a content type other than text/plain, application/x-www-form-urlencoded, or multipart/form-data.
    • [PUBDEV-4202] - In the REST API, POST of application/json requests no longer fails if requests expect required fields.
    • [PUBDEV-4216] - The R client `impute` function now checks for categorical values and returns an error if none exist.
    • [PUBDEV-4231] - Fixed a filepath issue that occurred on Windows 7 systems when specifying a network drive.
    • [PUBDEV-4234] - Added a response column to Stacked Ensembles so that it can be exposed in the Flow UI.
    • [PUBDEV-4235] - Updated the list of required packages on the H2O download page for the Python client.
    • [PUBDEV-4250] - Updated the header in the Confusion Matrix to make the list of actual vs predicted values more clear.
    • [PUBDEV-4300] - Explicit 1-hot encoding in FrameUtils no longer generates an invalid order of column names. MissingLevel is now the last column.
    • [PUBDEV-4304] - Fixed an issue that caused ModelBuilder to leak xval frames if hyperparameter errors existed.
    • [PUBDEV-4311] - Fixed an issue that caused PCA model output to fail to display the Importance of Components.
    • [PUBDEV-4314] - When using the H2O Python client, the varimp() function can now be used in PCA to retrieve the Importance of Components details.
    • [PUBDEV-4315] - Fixed an issue that caused an ArrayIndexOutOfBoundsException in GLM.
    • [PUBDEV-4316] - When a main model is cloned to create the CV models, clearValidationMessages() is now called. Messages are no longer all thrown into a single bucket, which previously caused confusion with the `error_count()`.
    • [PUBDEV-4317] - ModelBuilder.message(...) now correctly bumps the error count when the message is an error.
    • [PUBDEV-4319] - Fixed an issue with unseen categorical levels handling in GLM scoring. Prediction with "skip" missing value handling in GLM with more than one variable no longer fails.
    • [PUBDEV-4321] - ModelMetricsRegression._mean_residual_deviance is now exposed. For all algorithms except GLM, this is the mean residual deviance. For GLM, this is the total residual deviance.
    • [PUBDEV-4326] - Fixed an issue that caused the `~` operator to fail when used in the Python client. Now, all logical operators set their results as Boolean.
    • [PUBDEV-4328] - Fixed an issue that caused an assertion error in GLM.
    • [PUBDEV-4330] - In GLM, fixed an issue that caused GLM to fail when `quasibinomial` was specified with a link other than the default. Specifying an incorrect link for the quasibinomial family will now result in an error message.
    • [PUBDEV-4350] - Improved the doc strings for `sample_rate_per_class` in R and Python.
    • [PUBDEV-4351] - Fixed a bug in the cosine distance formula.
    • [PUBDEV-4352] - Fixed an issue with CBSChunk set with long argument.
    • [PUBDEV-4363] - C0DChunk with con == NaN now works with strings.
    • [PUBDEV-4378] - When retrieving a Variable Importance plot using the H2O Python client, the default number of features shown is now 10 (or all if < 10 exist). Also reduced the top and bottom margins of the Y axis.
    • [PUBDEV-4381] - When retrieving a Variable Importance plot using the H2O R client, the default number of features shown is now 10 (or all if < 10 exist).
    • [PUBDEV-4416] - Fixed an issue with ORC stream parsing.
    • [PUBDEV-4429] - Appended constant string to frame.
    • [PUBDEV-4495] - Fixed an issue with the View Log option in Flow.
    • [PUBDEV-4499] - The h2o.deepwater.available function is now working in the R API.
    • [PUBDEV-4542] - Fixed a bug with Log.info that resulted in bypassing log initialization.
    • [PUBDEV-4543] - LogsHandler now checks whether logging on specific level is enabled before accessing the particular log.
    • [PUBDEV-4546] - Fixed a logging issue that caused PID values to be set to an incorrect value. H2O now initializes PID before initializing SELF_ADDRESS. This change was necessary because initialization of SELF_ADDRESS triggers buffered logged messages to be logged, and PID is part of the log header.
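
    The seeding rule described in PUBDEV-4090 can be sketched in plain Python. This is a hedged illustration rather than H2O's code: the function name `model_seeds` is hypothetical, the value -1 for "seed left at its default" is an assumption, and the behavior outside the documented case (leaving the model seed untouched) is also an assumption.

```python
DEFAULT_SEED = -1  # assumption: sentinel meaning "seed left at its default"

def model_seeds(search_seed, n_models, model_seed=DEFAULT_SEED):
    """Return the seed used for each model built by a random search."""
    if model_seed == DEFAULT_SEED and search_seed != DEFAULT_SEED:
        # Documented case: model i is built with search_criteria seed + i.
        return [search_seed + i for i in range(n_models)]
    # Assumption: in every other case the model seed is left as the user gave it.
    return [model_seed] * n_models
```

    Because each model receives a seed derived only from the search seed and its position, rerunning the same search with the same search seed gives every model the same seed again.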

    Epic

    New Feature

    • [PUBDEV-47] - Generated R bindings are now available for the REST API.
    • [PUBDEV-103] - Flow: Implemented test infrastructure for Jenkins/CI.
    • [PUBDEV-525] - The R client now reports to the user when memory limits have been exceeded.
    • [PUBDEV-2022] - Added support to impute missing elements for RandomForest.
    • [PUBDEV-2348] - Added a probability calibration plot function.
    • [PUBDEV-2535] - A new h2o.pivot() function is available to allow pivoting of tables.
    • [PUBDEV-3666] - MOJO support has been extended to K-Means models.
    • [PUBDEV-3840] - Added two new options in GBM and DRF: `calibrate_model` and `calibrate_frame`. These flags allow you to retrieve calibrated probabilities for binary classification problems.
    • [PUBDEV-3850] - In Stacked Ensembles, added support for passing in models instead of model IDs when using the R client.
    • [PUBDEV-3970] - Added support for saving and loading binary Stacked Ensemble models.
    • [PUBDEV-4104] - Added support for idxmax, idxmin in Python H2OFrame to get an index of max/min values.
    • [PUBDEV-4105] - Added support for which.max, which.min in R H2OFrame to get an index of max/min values.
    • [PUBDEV-4134] - A new h2o.sort() function is available in the H2O Python client. This returns a new Frame that is sorted by column(s) in ascending order. The column(s) to sort by can be either a single column name, a list of column names, or a list of column indices.
    • [PUBDEV-4147] - Word2vec can now be used with the H2O Python client.
    • [PUBDEV-4151] - Missing values are filled sequentially for time series data.
    • [PUBDEV-4168] - Enabled cors option flag behind the sys.ai.h2o. prefix for debugging.
    • [PUBDEV-4266] - Added support for converting a Word2vec model to a Frame.
    • [PUBDEV-4280] - Created a Capability rest end point that gives the client an overview of registered extensions.
    • [PUBDEV-4329] - When viewing a model in Flow, a new **Download Gen Model** button is available, allowing you to save the h2o-genmodel.jar file locally.
    • [PUBDEV-4425] - Added an `h2o.flow()` function to base H2O. This allows users to open up a Flow window from within R and Python.
    • [PUBDEV-4472] - The `parse_type` parameter is now case insensitive.
    • [PUBDEV-4478] - Added automatic reduction of categorical levels for Aggregator. This can be done by setting `categorical_encoding=EnumLimited`.
    • [NA] - In GBM and DRF, added two new categorical_encoding schemas: SortByResponse and LabelEncoding. More information about these options is available here.
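
    The three accepted forms of the sort key described in PUBDEV-4134 (a single column name, a list of column names, or a list of column indices) can be sketched in plain Python. This is not the H2O client: `sort_rows` and its list-of-rows data layout are hypothetical and only mirror the documented behavior of returning a new, ascending-sorted result.

```python
def sort_rows(names, rows, by):
    """Return a new list of rows sorted ascending by the given column(s).

    `by` may be a single column name, a list of names, or a list of indices.
    """
    if isinstance(by, str):
        by = [by]
    # Resolve names to positions; integers are taken as positions already.
    positions = [names.index(c) if isinstance(c, str) else c for c in by]
    # sorted() builds a new list, so the input rows are left untouched.
    return sorted(rows, key=lambda row: tuple(row[p] for p in positions))
```

    For instance, `sort_rows(["id", "score"], rows, ["id", "score"])` orders by `id` first and breaks ties by `score`, while `sort_rows(["id", "score"], rows, [1])` orders by the second column only.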

    Story

    • [PUBDEV-3927] - Added support for Leave One Covariate Out (LOCO). This calculates row-wise variable importances by re-scoring a trained supervised model and measuring the impact of setting each variable to missing or its most central value (mean or median & mode for categoricals).
    • [PUBDEV-4049] - Removed support for Java 6.
    • [PUBDEV-4274] - Integrated XGBoost with H2O core as a separate extension module.
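
    The LOCO idea from PUBDEV-3927 can be sketched in plain Python. This is a simplified illustration, not H2O's implementation: the names `central_value` and `loco` are hypothetical, the model is any callable that scores a row dictionary, only the "most central value" variant is shown (mean for numeric columns, mode for categoricals), and the sign convention (baseline minus re-scored prediction) is an assumption.

```python
from collections import Counter
from statistics import mean

def central_value(column):
    """Mean for numeric columns, mode for everything else."""
    if all(isinstance(v, (int, float)) for v in column):
        return mean(column)
    return Counter(column).most_common(1)[0][0]

def loco(predict, rows):
    """Return one dict per row mapping feature -> change in prediction."""
    features = list(rows[0])
    centers = {f: central_value([r[f] for r in rows]) for f in features}
    result = []
    for row in rows:
        base = predict(row)  # prediction on the untouched row
        impact = {}
        for f in features:
            perturbed = dict(row)
            perturbed[f] = centers[f]  # neutralize feature f for this row
            impact[f] = base - predict(perturbed)  # re-score and compare
        result.append(impact)
    return result
```

    Each row gets its own importance per feature, which is what makes the measure row-wise: a feature whose value already equals the central value contributes zero for that row.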

    Task

    • [PUBDEV-4062] - Users can now run predictions in R using a MOJO or POJO without H2O running.
    • [PUBDEV-4087] - Created a test to verify that random grid search honors the `max_runtime_secs` parameter.
    • [PUBDEV-4193] - Removed javaMess.txt from scripts
    • [PUBDEV-4238] - A new `node()` function is available for retrieving node information from an H2O Cluster.
    • [PUBDEV-4353] - Improved the R/Py doc strings for the `sample_rate_per_class` parameter.
    • [PUBDEV-4412] - Users can now optionally build h2o.jar with a visualization data server using the following: `./gradlew -PwithVisDataServer=true -PvisDataServerVersion=3.14.0 :h2o-assemblies:main:projects`
    • [PUBDEV-4454] - Removed support for the following Hadoop platforms: CDH 5.2, CDH 5.3, and HDP 2.1.
    • [PUBDEV-4466] - Added the ability to go from String to Enum in PojoUtils.
    • [PUBDEV-4479] - Continued modularization of H2O by removing reflections utils and replacing them with SPI.
    • [PUBDEV-4481] - Removed the deprecated `h2o.importURL` function from the R API.
    • [PUBDEV-4490] - Stacked Ensembles now removes any unnecessary frames, vecs, and models that were produced when compiled.
    • [PUBDEV-4494] - Updated R and Python doc strings to indicate that users can save and load Stacked Ensemble binary models. In the User Guide, updated the FAQ that previously indicated users could not save and load stacked ensemble models.

    Improvement

    • [PUBDEV-3088] - Improved error handling when users receive the following error: `Error: lexical error: invalid char in json text.`
    • [PUBDEV-3500] - In PCA, when the user specifies a value for k that is <=0, then all principal components will automatically be calculated.
    • [PUBDEV-3908] - Exposed metalearner and base model keys in R/Py StackedEnsemble object.
    • [PUBDEV-4072] - The `h2o.download_pojo()` function now accepts a `jar_name` parameter, allowing users to create custom names for the downloaded file.
    • [PUBDEV-4103] - Added port and ip details to the error logs for h2o cloud.
    • [PUBDEV-4141] - When using Hadoop with SSL Internode Security, the `-internal_security` flag is now deprecated in favor of the `-internal_security_conf` flag.
    • [PUBDEV-4169] - Scala version of udf now serializes properly in multinode.
    • [PUBDEV-4181] - Fixed an NPM warn message.
    • [PUBDEV-4184] - Updated the documentation for using H2O with Anaconda and included an end-to-end example.
    • [PUBDEV-4190] - Arguments in h2o.naiveBayes in R are now the same as Python/Java.
    • [PUBDEV-4207] - StackedEnsembles is now stable vs. experimental.
    • [PUBDEV-4256] - Introduced latest_stable_R and latest_stable_py links, making it easy to point users to the current stable version of H2O for Python and R.
    • [PUBDEV-4267] - In the R client, the default for `nthreads` is now -1. The documentation examples have been updated to reflect this change.
    • [PUBDEV-4307] - ModelMetrics can sort models by a different Frame.
    • [PUBDEV-4331] - The application type is now reported in YARN manager, and H2O now overrides the default MapReduce type to H2O type.
    • [PUBDEV-4419] - Added a title option to PrintMOJO utility
    • [PUBDEV-4431] - Flow now uses ip:port for identifying the node as part of LogHandler.
    • [PUBDEV-4465] - Reduced the frequency of Hadoop heartbeat logging.
    • [PUBDEV-4484] - In GLM, quasibinomial models produce binomial metrics when scoring.
    • [PUBDEV-4492] - Implemented methods to get registered H2O capabilities in Python client.
    • [PUBDEV-4493] - Implemented methods to get registered H2O capabilities in R client.
    • [PUBDEV-4498] - Upgraded Flow to version 0.7.0
    • [PUBDEV-4511] - Removed the `selection_strategy` argument from Stacked Ensembles.
    • [PUBDEV-4533] - In Stacked Ensembles, added support for passing in models instead of model IDs when using the Python client.
    • [PUBDEV-4536] - Provided a file that contains a list of licenses for each H2O dependency. This can be acquired using com.github.hierynomus.license.
    • [PUBDEV-4540] - H2O now explicitly checks whether the port and baseport are within the allowed port range.

    Docs

    • [PUBDEV-2864] - Added documentation describing how to call Rapids expressions from Flow.
    • [PUBDEV-3944] - Added parameter descriptions for Naive Bayes parameters.
    • [PUBDEV-3945] - Added examples for Naive Bayes parameters.
    • [PUBDEV-4075] - Added `label_encoder` and `sort_by_response` to the list of available `categorical_encoding` options.
    • [PUBDEV-4095] - Added support for KMeans in MOJO documentation.
    • [PUBDEV-4078] - Added a topic to the Data Manipulation section describing the `group_by` function.
    • [PUBDEV-4140] - In the Productionizing H2O section of the User Guide, added an example showing how to read a MOJO as a resource from a jar file.
    • [PUBDEV-4182] - Improved the R and Python documentation for coef() and coef_norm().
    • [PUBDEV-4183] - In the GLM section of the User Guide, added a topic describing how to extract coefficient table information. This new topic includes Python and R examples.
    • [PUBDEV-4184] - Added information about Anaconda support to the User Guide. Also included an IPython Notebook example.
    • [PUBDEV-4194] - Added Word2vec to list of supported algorithms on docs.h2o.ai.
    • [PUBDEV-4201] - Uncluttered the H2O User Guide. Combined several topics on the left navigation/TOC. Some changes include the following:
      • Moved AWS, Azure, DSX, and Nimbix to a new Cloud Integration section.
      • Added a new **Getting Data into H2O** topic and moved the Supported File Formats and Data Sources topics into this.
      • Moved POJO/MOJO topic into the **Productionizing H2O** section.
    • [PUBDEV-4206] - In the Security topic of the User Guide, added a section about using H2O with PAM authentication.
    • [PUBDEV-4211] - Documentation for `h2o.download_all_logs()` now informs the user that the supplied file name must include the .zip extension.
    • [PUBDEV-4218] - Added an FAQ describing how to use third-party plotting libraries to plot metrics in the H2O Python client. This FAQ is available in the FAQ > Python topic.
    • [PUBDEV-4230] - Added an "Authentication Options" section to **Starting H2O > From the Command Line**. This section describes the options that can be set for all available supported authentication types. This section also includes flags for setting the newly supported Pluggable Authentication Module (PAM) authentication as well as Form Authentication and Session timeouts for H2O Flow.
    • [PUBDEV-4232] - Updated documentation to indicate that Word2vec is now supported for Python.
    • [PUBDEV-4253] - Added support for HDP 2.6 in the Hadoop Users section.
    • [PUBDEV-4258] - Added two FAQs within the GLM section describing why H2O's glm differs from R's glm and the steps to take to get the two to match. These FAQs are available in the GLM > FAQ section.
    • [PUBDEV-4268] - Updated R examples in the User Guide to reflect that the default value for `nthreads` is now -1.
    • [PUBDEV-4281] - Updated the POJO Quick Start markdown file and Javadoc.
    • [PUBDEV-4290] - Added the `-principal` keyword to the list of Hadoop launch parameters.
    • [PUBDEV-4294] - In the Deep Learning topic, deleted the Algorithm section. The information included in that section has been moved into the Deep Learning FAQ.
    • [PUBDEV-4297] - Documented support for using H2O with Microsoft Azure Linux Data Science VM. Note that this is currently still a BETA feature.
    • [PUBDEV-4309] - Added an FAQ describing YARN resource usage. This FAQ is available in the FAQ > Hadoop topic.
    • [PUBDEV-4336] - Added parameter descriptions for PCA parameters.
    • [PUBDEV-4337] - Added examples for PCA parameters.
    • [PUBDEV-4348] - A new h2o.sort() function is available in the H2O Python client. This returns a new Frame that is sorted by column(s) in ascending order. The column(s) to sort by can be either a single column name, a list of column names, or a list of column indices. Information about this function is available in the Python and R documentation.
    • [PUBDEV-4349] - Updated the "Using H2O with Microsoft Azure" topics.
    • [PUBDEV-4362] - Updated the "What is H2O" section in each booklet.
    • [PUBDEV-4387] - A Deep Water booklet is now available. A link to this booklet is on docs.h2o.ai.
    • [PUBDEV-4396] - Updated GLM documentation to indicate that GLM supports both multinomial and binomial handling of categorical values.
    • [PUBDEV-4397] - Added an FAQ describing the steps to take if a user encounters a "Server error - server 127.0.0.1 is unreachable at this moment" message. This FAQ is available in the FAQ > R topic.
    • [PUBDEV-4401] - Fixed documentation that described estimating in K-means.
    • [PUBDEV-4403] - Updated the documentation that described how to download a model in Flow.
    • [PUBDEV-4444] - The Data Sources topic, which describes that data can come from local file system, S3, HDFS, and JDBC, now also includes that data can be imported by specifying the URL of a file.
    • [PUBDEV-4467] - H2O now supports GPUs. Updated the FAQ that indicated we do not, and added a pointer to Deep Water.

    Ueno (3.10.4.8) - 5/21/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/8/index.html

    Bug

    • [PUBDEV-4123] - Python: Frame summary does not return Python object
    • [PUBDEV-4315] - AIOOB with GLM
    • [PUBDEV-4330] - glm : quasi binomial with link other than default causes an h2o crash

    Improvement

    • [PUBDEV-4332] - Create new /3/SteamMetrics REST API endpoint
    • [PUBDEV-4436] - Steam hadoop user impersonation

    Ueno (3.10.4.7) - 5/8/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/7/index.html

    Bug

    • [PUBDEV-4392] - h2o on yarn: H2O does not respect the cloud name in case of flatfile mode

    Ueno (3.10.4.6) - 4/26/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/6/index.html

    Bug

    • [PUBDEV-4265] - Problem with h2o.uploadFile on Windows
    • [PUBDEV-4339] - glm: get AIOOB exception on attached data
    • [PUBDEV-4341] - External cluster always reports "Timeout for confirmation exceeded!"

    Ueno (3.10.4.5) - 4/19/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/5/index.html

    Bug

    • [PUBDEV-4293] - Problem with h2o.merge in python
    • [PUBDEV-4306] - Failing SVM parse
    • [PUBDEV-4308] - Rollups computation errors sometimes get wrapped in an unhelpful exception and the original cause is hidden.

    Ueno (3.10.4.4) - 4/15/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/4/index.html

    Technical task

    • [PUBDEV-4244] - Add documentation on how to create a config file

    Bug

    • [PUBDEV-2807] - PCA Rotations not displayed in Python API
    • [PUBDEV-4081] - Sparse matrix cannot be converted to H2O
    • [PUBDEV-4229] - Flow/Schema problem, predicting on frame without response returns empty model metrics
    • [PUBDEV-4246] - Proportion of variance in GLRM for single component has a value > 1
    • [PUBDEV-4251] - HDP 2.6 add to the build
    • [PUBDEV-4252] - Set timeout for read/write confirmation in ExternalFrameWriter/ExternalFrameReader
    • [PUBDEV-4261] - GLM default solver gets AIOOB when run on dataset with 1 categorical variable and no intercept
    • [PUBDEV-4285] - Correct exit status reporting ( when running on YARN )
    • [PUBDEV-4287] - Documentation: Update GLM FAQ and missing_values_handling parameter regarding unseen categorical values

    New Feature

    Task

    • [PUBDEV-4180] - Wrap R examples in code so that they don't run on Mac OS
    • [PUBDEV-4215] - Export polygon function to fix CRAN note in h2o R package
    • [PUBDEV-4248] - Add a parameter that ignores the config file reader when h2o.init() is called

    Improvement

    • [PUBDEV-4239] - Extend Watchdog client extension so cluster is also stopped when the client doesn't connect in specified timeout
    • [PUBDEV-4288] - Set hadoop user from h2odriver

    Ueno (3.10.4.3) - 3/31/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/3/index.html

    Bug

    • [PUBDEV-3281] - ARFF parser parses attached file incorrectly
    • [PUBDEV-4097] - Proxy warning message displays proxy with username and password.
    • [PUBDEV-4165] - h2o.import_sql_table works in R but on python gives error
    • [PUBDEV-4167] - java.lang.IllegalArgumentException with PCA
    • [PUBDEV-4187] - Impute does not handle categoricals when values is specified
    • [PUBDEV-4219] - Increase number of bins in partial plots

    New Feature

    • [PUBDEV-4162] - h2o.transform can produce incorrect aggregated sentence embeddings

    Improvement

    • [PUBDEV-3858] - Errors with PCA on wide data for pca_method = Power
    • [PUBDEV-4102] - Introduce mode in which failure of H2O client ensures the whole H2O cloud goes down
    • [PUBDEV-4178] - Add support for IBM IOP 4.2
    • [PUBDEV-4186] - Placeholder for: [SW-334]
    • [PUBDEV-4191] - Remove minor version from hadoop distribution in buildinfo.json file

    Ueno (3.10.4.2) - 3/18/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/2/index.html

    Bug

    • [PUBDEV-4119] - Deep Learning: mini_batch_size >>> 1 causes OOM issues
    • [PUBDEV-4135] - head(df) and tail(df) results in R are inconsistent for datetime columns
    • [PUBDEV-4144] - GLM with family = multinomial, intercept=false, and weights or SkipMissing produces error
    • [PUBDEV-4155] - glm hot fix: fix model.score0 for multinomial

    New Feature

    • [PUBDEV-4133] - Add option to specify a port range for the Hadoop driver callback
    • [PUBDEV-4139] - Support reading MOJO from a classpath resource

    Improvement

    • [PUBDEV-4056] - Arff Parser doesn't recognize spaces in @attribute
    • [PUBDEV-4099] - How to generate Precision Recall AUC (PRAUC) from the scala code

    Docs

    • [PUBDEV-3977] - Documentation: Add documentation for word2vec
    • [PUBDEV-4118] - Documentation: Add topic for using with IBM Data Science Experience
    • [PUBDEV-4149] - Document "driverportrange" option of H2O's Hadoop driver

    Ueno (3.10.4.1) - 3/3/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/1/index.html

    Technical task

    • [PUBDEV-3943] - Documentation: Naive Bayes links to parameters section

    Bug

    • [PUBDEV-3817] - Error in predict, performance functions caused by fold_column
    • [PUBDEV-3820] - Kmeans Centroid info not Rendered through Python API
    • [PUBDEV-3827] - PCA "Importance of Components" returns "data frame with 0 columns and 0 rows"
    • [PUBDEV-3866] - Stratified sampling does not split minority class
    • [PUBDEV-3885] - R K-means user_points doesn't get used
    • [PUBDEV-3903] - Setting -context_path doesn't change REST API path
    • [PUBDEV-3932] - K-means Training Metrics do not match Prediction Metrics with same data
    • [PUBDEV-3938] - h2o-py/tests/testdir_hdfs/pyunit_INTERNAL_HDFS_timestamp_date_orc.py failing
    • [PUBDEV-4017] - gradle update broke the build
    • [PUBDEV-4019] - H2O config (~/.h2oconfig) should allow user to specify username and password
    • [PUBDEV-4032] - Flow/R/Python - H2O cloudInfo should show if cluster is secured or not
    • [PUBDEV-4039] - FLOW fails to display custom models including Word2Vec
    • [PUBDEV-4040] - Import json module as different alias in Python API
    • [PUBDEV-4041] - Stacked Ensemble docstring example is broken
    • [PUBDEV-4042] - The autogen R bindings have an incorrect definition for the y argument
    • [PUBDEV-4047] - AIOOB while training an H2OKMeansEstimator
    • [PUBDEV-4065] - Fix bug in randomgridsearch and Fix intermittent pyunit_gbm_random_grid_large.py
    • [PUBDEV-4066] - Typos in Stacked Ensemble Python H2O User Guide example code
    • [PUBDEV-4073] - StackedEnsemble: stacking fails if combined with ignore_columns
    • [PUBDEV-4083] - AIOOB in GLM

    New Feature

    • [PUBDEV-3852] - Documentation: Add Data Munging topic for file name globbing
    • [PUBDEV-4009] - Integration to add new top-level Plot menu to Flow
    • [PUBDEV-4038] - Add stddev to PDP computation

    Task

    • [PUBDEV-3685] - Update h2o-py README
    • [PUBDEV-3797] - Generate Python API tests for H2O Cluster commands
    • [PUBDEV-3914] - Add documentation for python GroupBy class
    • [PUBDEV-3915] - Document python's Assembly and ConfusionMatrix classes, add python API tests as well
    • [PUBDEV-3937] - Clean up R docs
    • [PUBDEV-3986] - Documentation: Summarize the method for estimating k in kmeans and add to docs
    • [PUBDEV-4006] - Update links to Stacking on docs.h2o.ai
    • [PUBDEV-4021] - H2O config (~/.h2oconfig) should allow user to specify username and password
    • [PUBDEV-4067] - Check if strict_version_check is TRUE when checking for config file

    Improvement

    • [PUBDEV-3781] - Documentation: Add info about sparse data support
    • [PUBDEV-3784] - h2o doc deeplearning: clarify what the (heuristics) defaults for auto are in categorical_encoding
    • [PUBDEV-3919] - Saving/serializing currently existing, detailed model information
    • [PUBDEV-3961] - Py/R: Remove unused 'cluster_id' parameter
    • [PUBDEV-3983] - Update GBM FAQ
    • [PUBDEV-3994] - Documentation: Add info about imputing data in Flow and in Data Manipulation
    • [PUBDEV-3998] - Documentation: Add instructions for running demos
    • [PUBDEV-4005] - AIOOB Exception with fold_column set with kmeans
    • [PUBDEV-4055] - Modify h2o#connect function to accept config with connect_params field
    • [PUBDEV-4059] - Change of h2o.connect(config) interface to support Steam

    Tverberg (3.10.3.5) - 2/16/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/5/index.html

    Bug

    • [PUBDEV-3848] - GLM with interaction parameter and cross-validation cause Exception
    • [PUBDEV-3916] - pca: hangs on attached data
    • [PUBDEV-3964] - StepOutOfRangeException when building GBM model
    • [PUBDEV-3976] - py unique() returns frame of integers (since epoch) instead of frame of unique dates
    • [PUBDEV-3979] - py date comparisons don't work for rows > 1
    • [PUBDEV-3980] - AstUnique drops column types
    • [PUBDEV-4013] - In R, the confusion matrix at the end doesn’t say: vertical: actual, across: predicted
    • [PUBDEV-4014] - AIOOB in GLM with hex.DataInfo.getCategoricalId(DataInfo.java:952) is the error with 2 fold cross validation
    • [PUBDEV-4036] - Parse fails when trying to parse large number of Parquet files
    • [HEXDEV-683] - POJO doesn't include Forest classes
    • [PUBDEV-4044] - moment producing wrong dates

    Tverberg (3.10.3.4) - 2/3/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/4/index.html

    Bug

    • [PUBDEV-3965] - Importing data in python returns error - TypeError: expected string or bytes-like object

    Tverberg (3.10.3.3) - 2/2/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/3/index.html

    Bug

    • [PUBDEV-3835] - Standard Errors in GLM: calculating and showing specifically when called

    Tverberg (3.10.3.2) - 1/31/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/2/index.html

    Bug

    • Hotfix: Remove StackedEnsemble from Flow UI. Training is only supported from Python and R interfaces. Viewing is supported in the Flow UI.

    Tverberg (3.10.3.1) - 1/30/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/1/index.html

    Bug

    • [PUBDEV-2464] - Using asfactor() in Python client cannot allocate to a variable
    • [PUBDEV-3111] - R API's h2o.interaction() does not use destination_frame argument
    • [PUBDEV-3694] - Errors with PCA on wide data for pca_method = GramSVD which is the default
    • [PUBDEV-3742] - StackedEnsemble should work for regression
    • [PUBDEV-3865] - h2o gbm : for an unseen categorical level, discrepancy in predictions when score using h2o vs pojo/mojo
    • [PUBDEV-3883] - Negative indexing for H2OFrame is buggy in R API
    • [PUBDEV-3894] - Relational operators don't work properly with time columns.
    • [PUBDEV-3966] - java.lang.AssertionError when using h2o.makeGLMModel

    Story

    • [PUBDEV-3739] - StackedEnsemble: put ensemble creation into the back end

    New Feature

    • [PUBDEV-2058] - Implement word2vec in h2o
    • [PUBDEV-3635] - Ability to Select Columns for PDP computation in Flow
    • [PUBDEV-3881] - Add PCA Estimator documentation to Python API Docs
    • [PUBDEV-3902] - Documentation: Add information about Azure support to H2O User Guide (Beta)

    Task

    • [PUBDEV-3336] - h2o.create_frame(): if randomize=True, `value` param cannot be used
    • [PUBDEV-3740] - REST: implement simple ensemble generation API
    • [PUBDEV-3843] - Modify R REST API to always return binary data
    • [PUBDEV-3844] - Safe GET calls for POJO/MOJO/genmodel
    • [PUBDEV-3864] - Import files by pattern
    • [PUBDEV-3884] - StackedEnsemble: Add to online documentation
    • [PUBDEV-3940] - Add Stacked Ensemble code examples to R docs

    Improvement

    • [PUBDEV-3257] - Documentation: As a K-Means user, I want to be able to better understand the parameters
    • [PUBDEV-3741] - StackedEnsemble: add tests in R and Python to ensure that a StackedEnsemble performs at least as well as the base_models
    • [PUBDEV-3857] - Clean up the generated Python docs
    • [PUBDEV-3895] - Filter H2OFrame on pandas dates and time (python)
    • [PUBDEV-3912] - Provide way to specify context_path via Python/R h2o.init methods
    • [PUBDEV-3933] - Modify gen_R.py for Stacked Ensemble
    • [PUBDEV-3972] - Add Stacked Ensemble code examples to Python docstrings

    Tutte (3.10.2.2) - 1/12/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tutte/2/index.html

    Task

    • [PUBDEV-3816] - import functions required for r-release check

    Tutte (3.10.2.1) - 12/22/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tutte/1/index.html

    Bug

    • [PUBDEV-3291] - Summary() doesn't update stats values when asfactor() is applied
    • [PUBDEV-3498] - rectangular assign to a categorical column does not work (should be possible to assign either an existing level, or a new one)
    • [PUBDEV-3618] - Numerical Column Names in H2O and R
    • [PUBDEV-3690] - pred_noise_bandwidth parameter is not reproducible with seed
    • [PUBDEV-3723] - Fix mktime() referencing from 0 base to 1 base for month and day
    • [PUBDEV-3728] - Binary loss functions return error in GLRM
    • [PUBDEV-3747] - python hist() plotted bars overlap
    • [PUBDEV-3750] - Python set_levels doesn't change other methods
    • [PUBDEV-3753] - h2o doc: glm grid search hyper parameters missing/incorrect listing. Presently glrm's is marked as glm's
    • [PUBDEV-3764] - Partial Plot incorrectly calculates for constant categorical column
    • [PUBDEV-3778] - h2o.proj_archetypes returns error if constant column is dropped in GLRM model
    • [PUBDEV-3788] - GLRM loss by col produces error if constant columns are dropped
    • [PUBDEV-3796] - isna() overwrites column names
    • [PUBDEV-3812] - NullPointerException with Quantile GBM, cross validation, & sample_rate < 1
    • [PUBDEV-3819] - R h2o.download_mojo broken - writes a 1 byte file
    • [PUBDEV-3831] - Seed definition incorrect in R API for RF, GBM, GLM, NB
    • [PUBDEV-3834] - h2o.glm: get AIOOB exception with xval and lambda search

    New Feature

    • [PUBDEV-3482] - Supporting GLM binomial model to allow two arbitrary integer values
    • [PUBDEV-3376] - Implement ISAX calculations per ISAX word
    • [PUBDEV-3377] - Optimizations and final fixes for ISAX
    • [PUBDEV-3664] - Implement GLM MOJO
    • [PUBDEV-3501] - Variance metrics are missing from GLRM that are available in PCA
    • [PUBDEV-3541] - py h2o.as_list() should not return headers
    • [PUBDEV-3715] - Modify sum() calculation to work on rows or columns
    • [PUBDEV-3737] - make sure that the generated R bindings work with StackedEnsemble
    • [PUBDEV-3833] - Add HDP 2.5 Support

    Task

    • [PUBDEV-3012] - Remove grid.sort_by method in Python API
    • [PUBDEV-3695] - Documentation: Add GLM to list of algorithms that support MOJOs
    • [PUBDEV-3791] - Documentation: Add quasibinomial family in GLM
    • [PUBDEV-3676] - Add SLURM cluster documentation
    • [PUBDEV-3692] - Add memory check for GLRM before proceeding
    • [PUBDEV-3765] - Check to make sure hinge loss works for GLRM
    • [PUBDEV-3803] - Add parameters from _upload_python_object to H2OFrame constructor
    • [PUBDEV-3804] - Refer to .h2o.jar.env when detaching R package
    • [PUBDEV-3805] - Call on proper port when exiting R/detaching package
    • [PUBDEV-3806] - Modify search for config file in R api
    • [PUBDEV-3818] - properly handle url in R docs from autogen

    Improvement

    • [PUBDEV-3256] - Documentation: As a GLM user, I want to be able to better understand the parameters
    • [PUBDEV-3758] - Fix bad/inconsistent/empty categorical (bitset) splits for DRF/GBM
    • [PUBDEV-3793] - Auto-generate R bindings

    Turnbull (3.10.1.2) - 12/14/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turnbull/2/index.html

    Bug

    • [PUBDEV-2801] - Starting h2o server from R ignores IP and port parameters
    • [PUBDEV-3484] - Treat 1-element numeric list as acceptable when numeric input required
    • [PUBDEV-3509] - h2o's cor() breaks R's native cor()
    • [PUBDEV-3592] - h2o.get_grid isn't working
    • [PUBDEV-3607] - `cor` function should properly pass arguments
    • [PUBDEV-3629] - Avoid confusing error message when column name is not found.
    • [PUBDEV-3631] - overwrite_with_best_model fails when using checkpoint
    • [PUBDEV-3633] - plot.h2oModel in R no longer supports metrics with uppercase names (e.g. AUC)
    • [PUBDEV-3642] - Fix citibike R demo
    • [PUBDEV-3697] - Create an Attribute for Number of Internal Trees in Python
    • [PUBDEV-3704] - Error with early stopping and score_tree_interval on GBM
    • [PUBDEV-3735] - Python's coef() and coef_norm() should use column name not index
    • [PUBDEV-3757] - Perfbar does not work for hierarchical path passed via -h2o_context

    New Feature

    • [PUBDEV-3474] - Show Partial Dependence Plots in Flow
    • [PUBDEV-3620] - Allow setting nthreads > 255.
    • [PUBDEV-3700] - Add RMSE, MAE, RMSLE, and lift_top_group as stopping metrics
    • [PUBDEV-3719] - Update h2o.mean in R to match Python API

    Task

    • [PUBDEV-3579] - Document Partial Dependence Plot in Flow
    • [PUBDEV-3621] - Add R endpoint for cumsum, cumprod, cummin, and cummax
    • [PUBDEV-3649] - Modify correlation matrix calculation to match R
    • [PUBDEV-3657] - Remove max_confusion_matrix_size from booklets & py doc

    Improvement

    • [HEXDEV-645] - aggregator should calculate domain for enum columns in aggregated output frames & member frames based on current output or member frame
    • [HEXDEV-658] - Naive Bayes (and maybe GLM): Drop limit on classes that can be predicted (currently 1000)
    • [PUBDEV-3625] - Speed up GBM and DRF
    • [PUBDEV-3756] - Support `-context_path` to change servlet path for REST API

    Turing (3.10.0.10) - 11/7/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/10/index.html

    Bug

    • [PUBDEV-3484] - Treat 1-element numeric list as acceptable when numeric input required
    • [PUBDEV-3675] - Cannot determine file type

    Turing (3.10.0.9) - 10/25/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/9/index.html

    Bug

    • [PUBDEV-3546] - h2o.year() method does not return year
    • [PUBDEV-3559] - Regression Training Metrics: Deviance and MAE were swapped
    • [PUBDEV-3568] - h2o.max returns NaN even when na.rm condition is set to TRUE
    • [PUBDEV-3593] - Fix display of array-valued entries in TwoDimTables such as grid search results

    Improvement

    • [PUBDEV-3585] - Optimize algorithm for automatic estimation of K for K-Means
    • [HEXDEV-646] - include flow, /3/ API accessible Aggregator model in h2o-3

    Turing (3.10.0.8) - 10/10/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/8/index.html

    Bug

    • [PUBDEV-3384] - S3 API method PersistS3#uriToKey breaks expected contract
    • [PUBDEV-3437] - GLM multinomial with defaults fails on attached dataset
    • [PUBDEV-3441] - .structure() encounters list index out of bounds when nan is encountered in column
    • [PUBDEV-3455] - max_active_predictors option in glm does not work anymore
    • [PUBDEV-3461] - Printed PCA model metrics in R is missing
    • [PUBDEV-3477] - R - Unnecessary JDK requirement on Windows
    • [PUBDEV-3505] - uuid columns with mostly missing values causes parse to fail.
    • [HEXDEV-599] - Fold Column not available in h2o.grid

    New Feature

    • [PUBDEV-1943] - Compute partial dependence data
    • [PUBDEV-3422] - Create Method to Return Columns of Specific Type
    • [PUBDEV-3491] - Find optimal number of clusters in K-Means
    • [PUBDEV-3492] - Add optional categorical encoding schemes for GBM/DRF

    Task

    • [PUBDEV-3327] - Tasks for completing MOJO support
    • [PUBDEV-3444] - Ensure functions have `h2o.*` alias in R API

    Improvement

    • [PUBDEV-3465] - Sync up functionality of download_mojo and download_pojo in R & Py
    • [PUBDEV-3499] - Improve the stopping criterion for K-Means Lloyds iterations
    • [HEXDEV-596] - Encryption of H2O communication channels
    • [HEXDEV-636] - add option to Aggregator model to show ignored columns in output frame

    Turing (3.10.0.7) - 9/19/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/7/index.html

    Bug

    • [PUBDEV-3300] - NPE during categorical encoding with cross-validation (Windows 8 runit only??)
    • [PUBDEV-3306] - H2OFrame arithmetic/statistical functions return inconsistent types
    • [PUBDEV-3315] - Multi file parse fails with NPE
    • [PUBDEV-3374] - h2o.hist() does not respect breaks
    • [PUBDEV-3401] - importFiles, with s3n, gives NullPointerException
    • [PUBDEV-3409] - Python Structure() Breaks When Applied to Entire Dataframe

    New Feature

    • [PUBDEV-2707] - Diff operation on column in H2O Frame
    • [HEXDEV-619] - calculate residuals in h2o-3 and in flow and create a new frame with a new column that contains the residuals

    Improvement

    • [PUBDEV-3296] - In R, allow x to be missing (meaning take all columns except y) for all supervised algos
    • [PUBDEV-3329] - median() should return a list of medians from an entire frame
    • [PUBDEV-3334] - Conduct rbind and cbind on multiple frames
    • [PUBDEV-3387] - Add argument to H2OFrame.print in R to specify number of rows
    • [PUBDEV-3418] - Suppress chunk summary in describe()

    Turing (3.10.0.6) - 8/25/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/6/index.html

    Bug

    • [HEXDEV-608] - Hashmap in H2OIllegalArgumentException fails to deserialize & throws FATAL
    • [PUBDEV-2879] - NPE in MetadataHandler
    • [PUBDEV-3086] - hist() fails for constant numeric columns
    • [PUBDEV-3173] - Client mode: flatfile requires list of all nodes, but a single entry node should be sufficient
    • [PUBDEV-3207] - Make CreateFrame reproducible for categorical columns.
    • [PUBDEV-3208] - Fix intermittency of categorical encoding via eigenvector.
    • [PUBDEV-3211] - isBitIdentical is returning true for two Frames with different content
    • [PUBDEV-3222] - AssertionError for DL train/valid with categorical encoding
    • [PUBDEV-3237] - Wrong MAE for observation weights other than 1.
    • [PUBDEV-3244] - H2ODriver for CDH5.7.0 does not accept memory settings
    • [PUBDEV-3276] - H2OFrame.drop() leaves the frame in inconsistent state

    New Feature

    • [PUBDEV-3007] - Implement skewness calculation for H2O Frames
    • [PUBDEV-3008] - Implement kurtosis calculation for H2O Frames
    • [PUBDEV-3128] - Add ability to do a deep copy in Python API
    • [PUBDEV-3163] - Add docs for h2o.make_metrics() for R and Python
    • [PUBDEV-3218] - Add RMSLE to model metrics
    • [PUBDEV-3264] - Return unique values of a categorical column as a Pythonic list

    Task

    • [PUBDEV-3235] - Refactor and simplify implementation of Pearson Correlation
    • [PUBDEV-3238] - Add MAE to CV Summary

    Improvement

    • [PUBDEV-2702] - Create h2o.* functions for H2O primitives
    • [PUBDEV-3098] - Add methods to get actual and default parameters of a model
    • [PUBDEV-3132] - Add ability to drop a list of columns or a subset of rows from an H2OFrame
    • [PUBDEV-3138] - Ensure all is*() functions return a list

    Turing (3.10.0.3) - 7/29/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/3/index.html

    Bug

    • [PUBDEV-2805] - Error when setting a string column to a single value in R/Py
    • [PUBDEV-2965] - R h2o.merge() ignores by.x and by.y
    • [PUBDEV-3135] - Download Logs broken URL from Flow

    New Feature

    • [PUBDEV-2958] - H2O Version Check
    • [PUBDEV-3022] - Add an h2o.concat function equivalent to pandas.concat
    • [PUBDEV-3050] - Add Huber loss function for GBM and DL (for regression)
    • [PUBDEV-3071] - Add RMSE to model metrics
    • [PUBDEV-3104] - Add Mean Absolute Error to Model Metrics
    • [PUBDEV-3108] - Add mean absolute error to scoring history and model plotting
    • [PUBDEV-3116] - Add categorical encoding schemes for DL and Aggregator
    • [PUBDEV-3155] - Compute supervised ModelMetrics from predicted and actual values in Java/R
    • [PUBDEV-3162] - Compute supervised ModelMetrics from predicted and actual values in Python

    Improvement

    • [PUBDEV-1888] - Implement gradient checking for DL
    • [PUBDEV-2627] - Add better warning message to functions of H2OModelMetrics objects
    • [PUBDEV-3021] - Add demo datasets to Python package
    • [PUBDEV-3113] - Replace "MSE" with "RMSE" in scoring history table
    • [PUBDEV-3122] - Make all TwoDimTable Headers Pythonic in R and Python API
    • [PUBDEV-3129] - Achieve consistency between DL and GBM/RF scoring history in regression case
    • [PUBDEV-3131] - Disable R^2 stopping criterion in tree model builders
    • [PUBDEV-3149] - Remove R^2 from all model output except GLM

    Turin (3.8.3.4) - 7/15/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turin/4/index.html

    Bug

    • [PUBDEV-3040] - File parse from S3 extremely slow
    • [PUBDEV-3145] - Fix Deep Learning POJO for hidden dropout other than 0.5

    Turin (3.8.3.2) - 7/1/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turin/2/index.html

    Bug

    • [PUBDEV-898] - DRF: sample_rate=1 not permitted unless validation is performed
    • [PUBDEV-2087] - create a set of tests which create large POJOs for each algo and compiles them
    • [PUBDEV-2322] - Merge (method="radix") bug1
    • [PUBDEV-2325] - Merge (method="radix") bug2
    • [PUBDEV-2565] - Fold Column not available in h2o.grid
    • [PUBDEV-2964] - h2o.merge(,method="radix") failing 15/40 runs
    • [PUBDEV-3030] - Parse: java.lang.IllegalArgumentException: 0 > -2147483648
    • [PUBDEV-3032] - Cached errors are not printed if H2O exits
    • [PUBDEV-3072] - java.lang.ClassCastException for Quantile GBM
    • [PUBDEV-3077] - model_summary number of trees is too high for multinomial DRF/GBM models
    • [PUBDEV-3079] - NPE when accessing invalid null Frame cache in a Frame's vecs()
    • [PUBDEV-3081] - TwoDimTable version of a Frame prints missing value (NA) as 0
    • [PUBDEV-3089] - Fix tree split finding logic for some cases where min_rows wasn't satisfied and the entire column was no longer considered even if there were allowed split points
    • [PUBDEV-3093] - saveModel and loadModel don't work with windows c:/ paths
    • [PUBDEV-3095] - getStackTrace fails on NumberFormatException
    • [PUBDEV-3096] - TwoDimTable for Frame Summaries doesn't always show the full precision
    • [PUBDEV-3097] - DRF OOB scoring isn't using observation weights
    • [PUBDEV-3099] - AIOOBE when calling 'getModel' in Flow while a GLM model is training

    Task

    • [PUBDEV-2681] - Properly document the addition of missing_values_handling arg to GLM

    Improvement

    • [PUBDEV-1617] - Matt's new merge (aka join) integrated into H2O
    • [PUBDEV-2822] - Improved handling of missing values in tree models (training and testing)
    • [PUBDEV-3060] - IPv6 documentation
    • [PUBDEV-3066] - Stop GBM models once the effective learning rate drops below 1e-6.
    • [PUBDEV-3094] - Log input parameters during boot of H2O

    Turchin (3.8.2.9) - 6/10/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/9/index.html

    Bug

    • [PUBDEV-2920] - Python apply() doesn't recognize % (modulo) within lambda function
    • [PUBDEV-2940] - Documentation: Add RoundRobin histogram_type to GBM/DRF
    • [PUBDEV-2957] - Add "seed" option to GLM in documentation
    • [PUBDEV-2973] - Documentation: Update supported Hadoop versions
    • [PUBDEV-2981] - Models hang when max_runtime_secs is too small
    • [PUBDEV-2982] - Default min/max_mem_size to gigabytes in h2o.init
    • [PUBDEV-2997] - Add "ignore_const_cols" argument to glm and gbm for Python API
    • [PUBDEV-2999] - AIOOBE in GBM if no nodes are split during tree building
    • [PUBDEV-3004] - Negative R^2 (now NaN) can prevent early stopping
    • [PUBDEV-3011] - Two grid sorting methods in Py API - only one works sometimes

    Task

    • [PUBDEV-3005] - Verify checkpoint argument in h2o.gbm (for R)

    Improvement

    • [PUBDEV-2040] - Sync up argument names in `h2o.init` between R and Python
    • [PUBDEV-2996] - Change `getjar` to `get_jar` in h2o.download_pojo in R
    • [PUBDEV-2998] - Change min_split_improvement default value from 0 to 1e-5 for GBM/DRF
    • [PUBDEV-3013] - Allow specification of "AUC" or "auc" or "Auc" for stopping_metrics, sorting of grids, etc.

    Turchin (3.8.2.8) - 6/2/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/8/index.html

    Bug

    • [PUBDEV-2985] - Make Random grid search consistent between clients for same parameters
    • [PUBDEV-2987] - Allow learn_rate_annealing to be passed to H2OGBMEstimator constructor in Python API
    • [PUBDEV-2989] - Fix typo in GBM/DRF Python API for col_sample_rate_change_per_level - was misnamed and couldn't be set

    New Feature

    • [PUBDEV-2979] - Add a new metric: mean misclassification error for classification models

    Improvement

    • [PUBDEV-2972] - No longer print negative R^2 values - show NaN instead
    • [PUBDEV-2984] - Add xval=True/False as an option to model_performance() in Python API

    Turchin (3.8.2.6) - 5/24/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/6/index.html

    Bug

    • [PUBDEV-1899] - Number of active predictors is off by 1 when Intercept is included
    • [PUBDEV-2942] - GLM with cross-validation AIOOBE (+ Grid-Search + Multinomial, may be related)
    • [PUBDEV-2943] - Improved accuracy for histogram_type="QuantilesGlobal" for DRF/GBM

    New Feature

    • [PUBDEV-1705] - GLM needs 'seed' argument for new (random) implementation of n-folds
    • [PUBDEV-2743] - Add seed argument to GLM

    Improvement

    • [PUBDEV-2928] - Remove _Dev from file name _DataScienceH2O-Dev
    • [PUBDEV-2945] - Clean up overly long and duplicate error message in KeyV3
    • [PUBDEV-2953] - Allow the user to pass column types of an existing H2OFrame during Parse/Upload in R and Python
    • [PUBDEV-2954] - Tweak Parser Heuristic
    • [PUBDEV-2955] - GLM improvements and fixes

    Turchin (3.8.2.5) - 5/19/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/5/index.html

    Bug

    • [PUBDEV-2282] - DRF: cannot compile pojo
    • [PUBDEV-2304] - GBM pojo compile failures
    • [PUBDEV-2878] - Bug in h2o-py H2OScaler.inverse_transform()
    • [PUBDEV-2880] - Add NAOmit() to Rapids
    • [PUBDEV-2897] - AIOOBE in Vec.factor (due to Parse bug?)
    • [PUBDEV-2903] - In grid search, max_runtime_secs without max_models hangs
    • [PUBDEV-2933] - GBM's fold_assignment = "Stratified" breaks with missing values in response column

    New Feature

    • [PUBDEV-2729] - Implement h2o.relevel, equivalent of base R's relevel function
    • [PUBDEV-2857] - Add Kerberos authentication to Flow
    • [PUBDEV-2893] - Summaries Fail in rdemo.citi.bike.small.R
    • [PUBDEV-2895] - DimReduction for EasyModelAPI
    • [PUBDEV-2915] - Make histograms truly adaptive (quantiles-based) for DRF/GBM

    Improvement

    • [PUBDEV-2905] - Improve the progress bar based on max_runtime_secs & max_models & actual work
    • [PUBDEV-2908] - Improve GBM/DRF reproducibility for fixed parameters and hardware
    • [PUBDEV-2911] - Check sanity of random grid search parameters (max_models and max_runtime_secs)
    • [PUBDEV-2912] - Add Job's remaining time to Flow
    • [PUBDEV-2919] - Add enum option 'histogram_type' to DRF/GBM (and remove random_split_points)
    • [PUBDEV-2923] - JUnit: Separate POJO namespace during junit testing

    Turchin (3.8.2.3) - 4/25/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/3/index.html

    Bug

    • [PUBDEV-2852] - Incorrect sparse chunk getDoubles() extraction

    New Feature

    • [PUBDEV-2825] - Create h2o.get_grid
    • [PUBDEV-2834] - Implement distributed Aggregator for visualization
    • [PUBDEV-2835] - Add col_sample_rate_change_per_level for GBM/DRF
    • [PUBDEV-2836] - Add learn_rate_annealing for GBM
    • [PUBDEV-2837] - Add random cut points for histograms in DRF/GBM (ExtraTreesClassifier)
    • [PUBDEV-2851] - Add limit on max. leaf node contribution for GBM

    Task

    • [PUBDEV-2848] - Add tests for early stopping logic (stopping_rounds > 0)

    Improvement

    • [PUBDEV-2877] - Make NA split decisions internally more consistent

    Turchin (3.8.2.2) - 4/8/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/2/index.html

    Bug

    • [PUBDEV-2820] - Implement max_runtime_secs to limit total runtime of building GLM models with and without cross-validation enabled

    New Feature

    • [PUBDEV-2815] - Add stratified sampling per-tree for DRF/GBM

    Turchin (3.8.2.1) - 4/7/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/1/index.html

    Bug

    • [PUBDEV-2766] - AIOOBE for quantile regression with stochastic GBM
    • [PUBDEV-2770] - Naive Bayes AIOOBE
    • [PUBDEV-2772] - AIOOBE for GBM if test set has different number of classes than training set
    • [PUBDEV-2775] - Number of CPUs incorrect in Flow when using a hypervisor
    • [PUBDEV-2796] - Grid search runtime isn't enforced for CV models
    • [PUBDEV-2819] - AIOOBE in GLM for dense rows in sparse data

    New Feature

    • [PUBDEV-2540] - Compute and display statistics of cross-validation model metrics
    • [PUBDEV-2774] - Add keep_cross_validation_fold_assignment and more CV accessors
    • [PUBDEV-2776] - Set initial weights and biases for DL models
    • [PUBDEV-2791] - Control min. relative squared error reduction for a node to split (DRF/GBM)
    • [PUBDEV-2806] - On-the-fly interactions for GLM
    • [PUBDEV-2815] - Add stratified sampling per-tree for DRF/GBM

    Task

    • [PUBDEV-2055] - Create test cases to show that POJO prediction behavior can be different than in-h2o-model prediction behavior

    Improvement

    • [PUBDEV-2620] - Populate start/end/duration time in milliseconds for all models
    • [PUBDEV-2695] - Consistent handling of missing categories in GBM/DRF (and between H2O and POJO)
    • [PUBDEV-2736] - Alert the user if columns can't be histogrammed due to numerical extremities
    • [PUBDEV-2756] - GLM should generate an error if the user enters an alpha value greater than 1
    • [PUBDEV-2763] - Create full holdout prediction frame for cross-validation predictions
    • [PUBDEV-2769] - Support Validation Frame and Cross-Validation for Naive Bayes
    • [PUBDEV-2810] - Add class_sampling_factors argument to DRF/GBM for R and Python APIs

    Turan (3.8.1.4) - 3/16/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turan/4/index.html

    Bug

    • [PUBDEV-542] - KMeans: Size of clusters in Model Output is different from the labels generated on the training set
    • [PUBDEV-1976] - GLM fails on negative alpha
    • [PUBDEV-2718] - countmatches bug
    • [PUBDEV-2727] - bug in processTables in communication.R
    • [PUBDEV-2742] - Allow strings to be set to NA

    New Feature

    • [PUBDEV-2719] - Implement Shannon entropy for a string
    • [PUBDEV-2720] - Implement proportion of substrings that are valid English words
    • [PUBDEV-2733] - Add utility function, h2o.ensemble_performance for ensemble and base learner metrics
    • [PUBDEV-2741] - Add date/time and string columns to createFrame.

    Task

    • [PUBDEV-58] - Certify sparkling water on CDH5.2

    Improvement

    • [PUBDEV-277] - Make python equivalent of as.h2o() work for numpy array and pandas arrays

    Turan (3.8.1.3) - 3/6/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turan/3/index.html

    Bug

    • [PUBDEV-2644] - Collinear columns cause NPE for P-values computation
    • [PUBDEV-2721] - Update default values in h2o.glm.wrapper from -1 and NaN to NULL
    • [PUBDEV-2722] - AIOOBE in NewChunk

    New Feature

    • [PUBDEV-2111] - Hive UDF form for Scoring Engine POJO for H2O Models

    Turan (3.8.1.2) - 3/4/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turan/2/index.html

    New Feature

    • [PUBDEV-2711] - Allow DL models to be pretrained on unlabeled data with an autoencoder

    Improvement

    • [PUBDEV-2708] - H2O Flow does not contain CodeMirror library
    • [PUBDEV-2710] - Model export fails: parent directory does not exist
    • [PUBDEV-2712] - Flow doesn't show DL AE error (MSE) plot
    • [PUBDEV-2717] - Do not compute expensive quantiles during h2o.summary call

    Turan (3.8.1.1) - 3/3/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turan/1/index.html

    Technical task

    • [PUBDEV-2705] - implement random (stochastic) hyperparameter search

    Bug

    • [PUBDEV-2639] - Parse: Incorrect assertion error caused by very large data with few columns
    • [PUBDEV-2649] - h2o::|,& operator handles NA's differently than base::|,&
    • [PUBDEV-2655] - h2o::as.logical behavior is different than base::as.logical
    • [PUBDEV-2682] - Importing CSV file is not working with "java -jar h2o.jar -nthreads -1"
    • [PUBDEV-2685] - Allow DL reproducible mode to work with user-given train_samples_per_iteration >= 0
    • [PUBDEV-2690] - Grid Search NPE during Flow display after grid was cancelled
    • [PUBDEV-2693] - NPE in initialMSE computation for GBM
    • [PUBDEV-2696] - DL checkpoint restart doesn't honor a change in stopping_rounds
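
For context on PUBDEV-2649 above: base R's `|` and `&` follow three-valued logic, where `NA & FALSE` is `FALSE`, `NA | TRUE` is `TRUE`, and other combinations involving `NA` yield `NA`. The sketch below mirrors those base R semantics in Python, with `None` standing in for `NA`. The entry does not describe H2O's behavior before or after the fix, so this only shows the reference behavior it is compared against. The function names are hypothetical.

```python
def na_and(a, b):
    """Three-valued AND: None stands in for R's NA."""
    if a is False or b is False:
        return False  # a known FALSE decides the result regardless of NA
    if a is None or b is None:
        return None
    return True


def na_or(a, b):
    """Three-valued OR: None stands in for R's NA."""
    if a is True or b is True:
        return True  # a known TRUE decides the result regardless of NA
    if a is None or b is None:
        return None
    return False


for left in (True, False, None):
    for right in (True, False, None):
        print(left, right, na_and(left, right), na_or(left, right))
```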

    New Feature

    • [PUBDEV-1883] - Add option to train with mini-batch updates for DL
    • [PUBDEV-2698] - Return leaf node assignments for DRF + GBM

    Improvement

    • [PUBDEV-2674] - Change default functionality of as_data_frame method in Py H2O
    • [PUBDEV-2697] - Add method setNames for setting column names on H2O Frame
    • [PUBDEV-2703] - NPE in Log.write during cluster shutdown

    Tukey (3.8.0.6) - 2/23/16

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tukey/6/index.html

    Enhancements

    The following changes are improvements to existing features (which includes changed default values):

    System
    • PUBDEV-2362: Handling Sparsity with Missing Values
    • PUBDEV-2683: Fix for erroneous conversion of NaNs to zeros during rebalancing
    • PUBDEV-2684: Remove bigdata test file (not available)

    Bug Fixes

    The following changes resolve incorrect software behavior:

    Algorithms
    • PUBDEV-2678: CV models during grid search get overwritten
    R
    • PUBDEV-2648: Di/trigamma handle NA
    • PUBDEV-2679: Progress bar for grid search with N-fold CV is wrong when max_models is given

    Tukey (3.8.0.1) - 2/10/16

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tukey/1/index.html

    New Features

    These changes represent features that have been added since the previous release:

    API
    • PUBDEV-1798: Ability to conduct a randomized grid search with optional limit of max. number of models or max. runtime
    • PUBDEV-1822: Add score_tree_interval to GBM to score every n'th tree
    • PUBDEV-2311: Make it easy for clients to sort by model metric of choice
    • PUBDEV-2548: Add ability to set a maximum runtime limit on all models
    • PUBDEV-2632: Return a grid search summary as a table with desired sort order and metric
    Algorithms
    • HEXDEV-495: Added ability to calculate GLM p-values for non-regularized models
    • PUBDEV-853: Implemented gain/lift computation to allow using predicted data to evaluate the model performance
    • PUBDEV-2118: Compute the lift metric for binomial classification models
    • PUBDEV-2212: Add absolute loss (Laplace distribution) to GBM and Deep Learning
    • PUBDEV-2402: Add observations weights to quantile computation
    • PUBDEV-2469: For GBM/DRF, add ability to pick columns to sample from once per tree, instead of at every level
    • PUBDEV-2594: Quantile regression for GBM and Deep Learning
    • PUBDEV-2625: Add recall and specificity to default ROC metrics
    Python
    • HEXDEV-399: Added support for Python 3.5 and better (in addition to existing support for 2.7 and better)
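
The absolute loss (PUBDEV-2212) and quantile regression (PUBDEV-2594) entries above are closely related. Quantile regression is commonly fit by minimizing the pinball loss, and at the 0.5 quantile that loss is half the absolute error. The sketch below is a plain-Python illustration under that standard definition. It is not H2O's code, and the function and argument names are hypothetical.

```python
def pinball_loss(actual, predicted, alpha):
    """Pinball (quantile) loss for one observation at quantile `alpha`."""
    diff = actual - predicted
    # Under-prediction is weighted by alpha, over-prediction by (1 - alpha).
    if diff >= 0:
        return alpha * diff
    return (alpha - 1.0) * diff


def mean_pinball_loss(actuals, predictions, alpha):
    """Average pinball loss over paired observations."""
    losses = [pinball_loss(a, p, alpha) for a, p in zip(actuals, predictions)]
    return sum(losses) / len(losses)


print(pinball_loss(3.0, 1.0, 0.5))  # symmetric case: half the absolute error
print(pinball_loss(3.0, 1.0, 0.9))  # under-prediction penalized more heavily
print(pinball_loss(1.0, 3.0, 0.9))  # over-prediction penalized less
```

Choosing `alpha` above 0.5 pushes the fitted value upward, because predictions that fall below the actual value cost more than predictions that fall above it.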

    Enhancements

    The following changes are improvements to existing features (which includes changed default values):

    Algorithms
    • PUBDEV-2233: Adjust string substitution and global string substitution to do in place updates on a string column.
    Python
    • PUBDEV-1981: Fix layout issues of Python docs.
    • PUBDEV-2335: as.numeric for a string column only converts strings to ints rather than reals
    • PUBDEV-2257: Table printout in Python doesn't warn the user about truncation
    • PUBDEV-2460: Version mismatch message directs user to get a matching download
    • HEXDEV-527: Implement secure Python h2o.init
    • PUBDEV-2504: Check and print a warning if a proxy environment variable is found
    R
    • PUBDEV-2335: as.numeric for a string column only converts strings to ints rather than reals
    • PUBDEV-2257: Table printout in R doesn't warn the user about truncation
    • PUBDEV-2430: Improve R's reporting on quantiles
    • PUBDEV-2460: Version mismatch message directs user to get a matching download
    Flow
    • PUBDEV-2407: Improve model convergence plots in Flow
    • PUBDEV-2596: Flow shows empty logloss box for regression models
    • PUBDEV-2617: Flow's histogram doesn't cover the full support
    System
    • HEXDEV-436: exportFile should be a real job and have a progress bar
    • PUBDEV-2459: Improve parse chunk size heuristic for better use of cores on small data sets
    • PUBDEV-2606: Print all columns to stdout for Hadoop jobs for easier debugging

    Bug Fixes

    The following changes resolve incorrect software behavior:

    API
    • PUBDEV-2633: Ability to extend grid searches with more models
    Algorithms
    • PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
    • PUBDEV-2114: Set GLM to give error when lower bound > upper bound in beta constraints
    • PUBDEV-2190: Set GLM to default to a value of rho = 0, if rho is not provided when beta constraints are used
    • PUBDEV-2210: Add check for epochs value when using checkpointing in deep learning
    • PUBDEV-2241: Warnings about slowness from wide column counts now come before building a model, not after
    • PUBDEV-2278: Fix docstring reporting in iPython
    • PUBDEV-2366: Fix display of scoring speed for autoencoder
    • PUBDEV-2426: GLM gives different std. dev. and means than expected
    • PUBDEV-2595: Bad (perceived) quality of DL models during cross-validation due to internal weights handling
    • PUBDEV-2626: GLM with weights gives different answer h2o vs R
    Python
    • PUBDEV-2319: sd not working inside group_by
    • PUBDEV-2403: Parser reads file of empty strings as 0 rows
    • PUBDEV-2404: Empty strings in Python objects parsed as missing
    R
    • PUBDEV-2319: sd not working inside group_by
    • PUBDEV-2231: Fix bug in summary when zero-count categoricals were present.
    • PUBDEV-1749: Fix h2o.apply to correctly handle functions (so long as functions contain only H2O supported primitives)
    System
    • PUBDEV-1872: Ability to ignore 0-byte files during parse
    • PUBDEV-2401: /Jobs fails if you build a Model and then overwrite it in the DKV with any other type
    • PUBDEV-2603: Improve progress bar for grid/hyper-param searches

    Tibshirani (3.6.0.9) - 12/7/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/9/index.html

    New Features

    These changes represent features that have been added since the previous release:

    API
    • PUBDEV-2189: H2O now allows selection of the non_negative flag in GLM for R and Python
    Algorithms
    R
    • PUBDEV-2079: R now retrieves column types for a H2O Frame more efficiently
    Python

    Enhancements

    The following changes are improvements to existing features (which includes changed default values):

    Algorithms
    • GitHub commit: Change in behavior in GLM beta constraints - when ignoring constant/bad columns, remove them from beta_constraints as well
    • GitHub commit: Added ignore_const_cols to all algos
    • PUBDEV-2311: Improved ability to sort by model metric of choice in client
    Python
    • PUBDEV-2409: H2O now checks for H2O_DISABLE_STRICT_VERSION_CHECK env variable in Python GitHub commit
    • GitHub commit: H2O now allows l/r values to be null or an empty string
    • GitHub commit: H2O now accommodates LOAD_FAST and LOAD_GLOBAL in bytecode_to_ast
    R
    • PUBDEV-1378: In R, h2o.getTimezone() previously returned a list of one, now it just returns the string
    System
    • GitHub commit: Added more tweaks to help various low-memory configurations

    Bug Fixes

    The following changes resolve incorrect software behavior:

    API
    • PUBDEV-2042: h2o.grid failed when REST API version was not default
    • PUBDEV-2401: /Jobs failed if you built a Model and then overwrote it in the DKV with any other type GitHub commit
    • PUBDEV-2392: /3/Jobs failed with exception after running /3/SplitFrame
    • GitHub commit: PUBDEV-2426 - Fixed error where sd and mean were adjusted to weights even if no observation weights were passed
    Algorithms
    • PUBDEV-2396: GLRM validation frames must have the same number of rows as the training frame
    • PUBDEV-2053: Fixed assertion failure in Deep Learning
    • PUBDEV-2315: Could not compile POJO using K-means
    • PUBDEV-2317: Could not compile POJO using PCA
    • PUBDEV-2320: Could not compile POJO using Naive Bayes
    • GitHub commit: Fixed weighted mean and standard deviation computation in GLM
    • GitHub commit: Fixed stopping criteria for lambda search and multinomial in GLM
    Python
    R
    • PUBDEV-1749: h2o.apply did not correctly handle functions
    • PUBDEV-2335: R: as.numeric for a string column only converted strings to ints rather than reals
    • PUBDEV-2319: R: sd was not working inside group_by
    • PUBDEV-2397: R: Ignore Constant Columns was not an argument in Algos in R like it is in Flow
    • PUBDEV-2134: When a dataset was sliced, the int mapping of enums was returned
    • PUBDEV-2408: Improved handling when H2O has already been shutdown in R GitHub commit
    • PUBDEV-2231: Fixed categorical levels mapping bug
    System

    Tibshirani (3.6.0.7) - 11/23/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/7/index.html

    Enhancements

    The following changes are improvements to existing features (which includes changed default values):

    Algorithms
    • GitHub commit: Added Iterations and Epochs to DL job status updates, added Iterations to scoring history
    • GitHub commit: Cleaned up iteration counter to work for checkpointing
    • GitHub commit: Cleaned up counter iteration logic

    Bug Fixes

    The following changes resolve incorrect software behavior:

    Algorithms
    • GitHub commit: Fixed scoring speed display for autoencoder, was showing 0 because wrong runtime was used (ms since 1970 instead of actual runtime)

    Tibshirani (3.6.0.2) - 11/5/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/2/index.html

    New Features

    Algorithms
    • GitHub commit: Added support for grid search
    • PUBDEV-2272: Implemented GLRM grid search in R and Python
    • GitHub commit: PUBDEV-2289: Enabled early convergence-based stopping by default for Deep Learning
    • GitHub commit: Added L1+LBFGS solver for multinomial GLM
    Python
    • GitHub commit: PUBDEV-2289: Added Python API for convergence-based stopping
    R
    • GitHub commit: Added .Last to Delete InitID
    • GitHub commit: PUBDEV-2289: Enabled convergence-based early stopping for R API of Deep Learning
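
Convergence-based early stopping, as referenced by the PUBDEV-2289 entries above, can be illustrated with a simplified rule: stop once the best metric value among the most recent scoring events no longer improves on the best earlier value by a relative tolerance. This is a sketch under assumptions. H2O's exact rule is not given in this changelog and may differ, the function name is hypothetical, the metric is assumed to be positive with lower values being better, and treating `stopping_rounds` of 0 as "disabled" is also an assumption.

```python
def should_stop(history, stopping_rounds, stopping_tolerance):
    """Return True when a lower-is-better metric has stopped improving."""
    if stopping_rounds <= 0 or len(history) <= stopping_rounds:
        return False  # disabled, or not enough scoring events yet
    recent_best = min(history[-stopping_rounds:])
    earlier_best = min(history[:-stopping_rounds])
    # Require a relative improvement of at least `stopping_tolerance`.
    return recent_best > earlier_best * (1.0 - stopping_tolerance)


plateau = [1.0, 0.8, 0.7, 0.699, 0.6995, 0.6991]
improving = [1.0, 0.8, 0.6, 0.4]
print(should_stop(plateau, stopping_rounds=3, stopping_tolerance=0.01))
print(should_stop(improving, stopping_rounds=2, stopping_tolerance=0.01))
```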

    Enhancements

    Algorithms
    • GitHub commit: Enable grid search for Deep Learning parameters overwrite_with_best_model, momentum_ramp, elastic_averaging, elastic_averaging_moving_rate, & elastic_averaging_regularization
    • GitHub commit: PUBDEV-2289: Stopping tolerance and stopping metric are no longer hidden if stopping_rounds is 0
    • GitHub commit: Added checks to verify the mean, median, nrow, var, and sd are calculated correctly in groupby
    • GitHub commit: mean and sd now return lists
    Python
    • GitHub commit: [PUBDEV-2257] H2O now gives users [row x col] of Frame in __str__
    • GitHub commit: sd/var is now sampled for group_by
    • GitHub commit: Parameter checking is now split between float and strings/unicode
    • GitHub commit: H2O now only wipes src._ex if src_in_self
    • GitHub commit: Refactored default arg handling in astfun
    • GitHub commit: Added new parameters to estimators
    • GitHub commit: Added session start/end; Python now ends the session on exit
    • GitHub commit: src and self types are now checked for None
    • GitHub commit: H2O now passes caches through all prefix ops
    • GitHub commit: H2O now pushes cached types, names, and ncols forward if possible
    R
    System
    • HEXDEV-475: Added EasyPOJO comments and improvements
    • GitHub commit: [PUBDEV-2204] Enabled Vec#toCategoricalVec to convert string columns to categorical columns
    • GitHub commit: apply now works in

    Bug Fixes

    Algorithms
    Python
    R
    • GitHub commit: [PUBDEV-2301, PUBDEV-2314] Hidden grid parameter was passed incorrectly from R
    • GitHub commit: H2O now uses deep copy when using assign from one global to another
    • GitHub commit: Fixed getFrame and directory unlink
    System

    Slotnick (3.4.0.1)

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slotnick/1/index.html

    New Features

    API
    Algorithms
    • GitHub commit: Added option in PCA to use randomized subspace iteration method for calculation
    • GitHub commit: Deep Learning: Added target_ratio_comm_to_comp to R and Python client APIs
    • GitHub commit: PUBDEV-1247: Added stochastic GBM parameters (sample_rate and col_sample_rate) to R/Py APIs
    • PUBDEV-1450: GLRM has been tested and removed from "experimental" status
    Hadoop
    Python
    R

    This software release introduces changes to the R API that may cause previously written R scripts to be inoperable. For more information, refer to the following link.

    • GitHub commit: Added h2o.getTypes() to the R wrapper
    • GitHub commit: Added ability to set col.types with a named list
    • GitHub commit: Added h2o.getId() to get the back-end distributed key/value store ID from a Frame
    • GitHub commit: Added column types to H2O frame in R, which allows R to set the correct column types when as.data.frame() is used on an H2O frame
    • GitHub commit: Added @export for exported R functions
    System
    • GitHub commit: Added string length util for Enum columns
    • GitHub commit: Added pass-through version of toCategoricalVec(), toNumericVec(), and toStringVec() to Vec.java for code simplicity and backwards compatibility
    • GitHub commit: Added string column handling to StrSplit()
    Web UI

    Enhancements

    Algorithms
    • PUBDEV-467: Show Frames for DL weights/biases in Flow
    • PUBDEV-1847: DRF/GBM: nbins_top_level is now configurable
    • GitHub commit: Deep Learning: Scoring time is now shown in the logs
    • GitHub commit: Sped up GBM split finding by dynamically switching between single and multi-threaded based on workload
    • PUBDEV-1247: Implemented Stochastic GBM
    • GitHub commit: Parallelized split finding for GBM/DRF (useful for large numbers of columns and nbins).
    • GitHub commit: Added improvements to speed up DRF (up to 35% faster) and stochastic GBM (up to 5x faster)
    • GitHub commit: Added some straight-forward optimizations for GBM histogram building
    • GitHub commit: GLRM is now deterministic between one vs. many chunks
    • GitHub commit: Input parameters are now immutable
    • GitHub commit: PUBDEV-2135: Cleaned up N-fold CV model parameter sanity checking and error message propagation; now checks all N-fold model parameters upfront and lets the main model carry the message to the user
    • GitHub commit: PUBDEV-2130: N-fold CV models are no longer deleted when the main model is deleted
    • GitHub commit: PUBDEV-2107: The title in plot.H2OBinomialMetrics is now editable
    • GitHub commit: Parse Python lambda (bytecode -> ast -> rapids)
    • GitHub commit: PUBDEV-1847: Cleaned up/refactored GBM/DRF
    • GitHub commit: Updated MeanSquare to Quadratic for DL
    • GitHub commit: PUBDEV-2133: Speed up Enum mapping between train/test from O(N^2) to O(N*log(N))
    • GitHub commit: Added GLRM scoring history with step size and average change in objective function value
    • GitHub commit: SVD now outputs the V matrix as a frame with a frame key, rather than a double array in the API
    • GitHub commit: Modified k-means++ initialization in GLRM to set X to inverse of cluster distance with sum normalized to one, for each observation in training data
    • GitHub commit: Increased GBM worker thread priority to avoid deadlock with high parallel GBM job counts
    • GitHub commit: Added input parameter svd_method to GLRM
    Python
    • GitHub commit: centers_std is now returned as a list of columns
    • GitHub commit: str(Frame) no longer returns an ID; updated ExprNode _to_string to accommodate
    • GitHub commit: Changed default setting for _isAllAscii to false
    • GitHub commit: Fixed var to return scalar/frame based on nrow
    • GitHub commit: Python now checks ncol, not nrow
    • PUBDEV-1060: Python's h2o.import_frame() now matches R's importFile() parameters where applicable
    • PUBDEV-1960: Python now uses the streaming endpoint /3/DownloadDataset.bin
    • PUBDEV-2223: Added normalization and standardization coefficients to the model output in Python
    • GitHub commit: Renamed logging to h2o_logging to avoid conflict with original logging package
    • GitHub commit: H2O now recognizes additional parameters (such as column names) for Python objects
    • GitHub commit: head and tail no longer download the entire dataset
    • GitHub commit: Truncated DF in head and tail before calling /DownloadDataset
    • GitHub commit: head() and tail() now default to pretty printing in Python
    • GitHub commit: Moved setup functionality from parse to parse setup; col_types and na_strings can now be dictionaries
    • GitHub commit: Updated H2OColSelect to supply extra argument
    • GitHub commit: PUBDEV-2174: Relative tolerance is now used for floating point comparison
    • GitHub commit: Added more cloud health output to run.py
    • GitHub commit: When Pandas frames are returned, they are now wrapped to display nicely in iPython
    R
    • GitHub commit: Added null check
    • PUBDEV-2185: When appending a vec to an existing data frame, H2O now creates a new data frame while still keeping the original frame in memory
    • PUBDEV-1959: R now uses the streaming endpoint /3/DownloadDataset.bin
    • PUBDEV-2020: h2o.splitFrame() in R/Python now uses the runif technique instead of the horizontal slice technique
    • GitHub commit: Changed T/F to TRUE/FALSE
    • GitHub commit: xml2 package is now required for rversions package
    • GitHub commit: Package dependencies are taken into account when installing R packages
    • GitHub commit: Metrics are now always computed if a dataset is provided (R h2o.performance call)
    • GitHub commit: Column names are now fetched from H2O
    • GitHub commit: PUBDEV-2150: Time columns in H2O are now imported as Date columns in R
    • GitHub commit: h2o.ls() now returns data.frame
    • GitHub commit: h2o.ls() now returns the whole frame
    • GitHub commit: Removed unnamed additional parameters (ellipses) in R algos
    • GitHub commit: Added as.character to Rapids implementation
    • GitHub commit: Updated plot.H2OModel in R
    • GitHub commit: Updated scoring history plot in R for training_frame only
    • GitHub commit: Instead of : and assign, attr is now used
    • GitHub commit: Raw strings are now used as accessors
    • GitHub commit: name.Frame and dimnames.Frame are now visible
    System
    • GitHub commit: Added vertical prefetch of all chunks' worth of data for dense rows
    • PUBDEV-1426: Scoring is now a non-blocking job with a progress bar
    • GitHub commit: EasyPojo API is now serializable
    • GitHub commit: Changed parse setup guess when encountering large NA counts to not favor numeric over dates or UUIDs
    • GitHub commit: Refactored vector type conversion methods into a class called VecUtils
    • GitHub commit: Cleaned up ASTStrList to handle frames with more than one vector during column conversion; checks types before converting; added several new column type conversions
    • GitHub commit: If the job is cancelled, scoring is now cancelled as well
    • GitHub commit: Refactored doAll_numericResult() -> doAll(nout, type, frame) where all output vecs are of the given type
    • GitHub commit: Improved hash function
    • GitHub commit: The output of _train.get() is now passed to a Frame
    • GitHub commit: Refactored binary/col ops for aesthetics and maintainability
    • GitHub commit: Added correct types for new Vecs; CategoricalWrappedVec now exports a utility for enum conversions instead of a constructor
    • GitHub commit: Mean/sigma values are now printed to the logs after parsing
    • GitHub commit: PUBDEV-2174: Added some optimizations for some chunks (mostly integers) in RollupStats
    • GitHub commit: PUBDEV-2174: Added instantiations of Rollups for dense numeric chunks
    • GitHub commit: PUBDEV-2174: Implemented single-pass variance/stddev calculation for rollups
    • GitHub commit: PUBDEV-2174: Added hasNA() for chunks
    • GitHub commit: Reordered args in sub/gsub (astid > astparameter, add string -> numeric
    • GitHub commit: Ensured all chunks get closed
    • GitHub commit: NewChunk.addString() now accepts a Java string or BufferedString, eliminating needless conversion to a BufferedString before inserting into the NewChunk buffer. Improves efficiency of several ASTStrOps as well as converting Categorical columns to String columns.
    • GitHub commit: Renamed enums to categoricals system-wide
    • GitHub commit: Renamed ValueString -> BufferedString
    • GitHub commit: Removed redundant frame creation; added Java comments to each string utility; changed RAPIDS name of gsub -> replaceall and sub -> replacefirst; added nchar utility to the R client; updated comments in Python and R client
    • GitHub commit: All NA chunks are now handled in string ops
    • GitHub commit: Added ability for string utils to handle NA chunks
    • GitHub commit: Added the ability to handle duplicate rows to merge
    • GitHub commit: countMatches utilities now only work on string columns
    • GitHub commit: Changed names of SubStr and GSubStr to ReplaceFirst and ReplaceAll; both methods now only accept string columns as input
    • GitHub commit: Changed toUpper and toLower to only work on string columns; includes an optimized version of each method as well as a UTF-safe version
    • GitHub commit: CStrChunks now track whether they are pure ASCII to allow StringUtilities to use optimized versions of the utilities that operate directly on the string buffer
    • GitHub commit: Moved frame function to ArrayUtils
    • GitHub commit: Removed categorical versions of trim() and length()
    • GitHub commit: Changed the merge defaults to match the implementation
    • GitHub commit: Merge no longer uses a by argument
    • GitHub commit: Added trim and length functionality for string columns
    • GitHub commit: HEXDEV-442: Improved POJO handling
    • GitHub commit: Config files are now transferred using a hexstring to avoid issues with Hadoop XML parsing
    • GitHub commit: HEXDEV-445: Added isNA check
    • GitHub commit: Means, mults, modes, and size now do bulk rollups
    • GitHub commit: Increased priority of model builder Driver classes to prevent deadlock when bulk-launching parallel unrelated model builds
    • GitHub commit: Renamed Currents to Rapids
    • GitHub commit: CRAN-based R clients are now set to opt-out by default
    • GitHub commit: Assembly states are now saved in the DKV
    Web UI
    • PUBDEV-1961: Flow now uses the streaming endpoint /3/DownloadDataset.bin
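
The System list above mentions a single-pass variance/stddev calculation for rollups (PUBDEV-2174). The changelog does not say which method H2O uses. As a generic illustration of the technique, the sketch below uses Welford's online algorithm, a standard and numerically stable way to get the mean and standard deviation in one pass. The function name is hypothetical, and returning the sample (n - 1) standard deviation is an assumption.

```python
import math


def single_pass_mean_std(values):
    """Return (mean, sample standard deviation) from one pass over `values`."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count < 2:
        return mean, float("nan")  # standard deviation is undefined here
    return mean, math.sqrt(m2 / (count - 1))


mean, std = single_pass_mean_std([2, 4, 4, 4, 5, 5, 7, 9])
print(mean, std)
```

A two-pass approach would first compute the mean and then the squared deviations. The single-pass form reads each value once, which matters when the data is spread over many chunks.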

    Bug Fixes

    Algorithms
    • GitHub commit: Fixed bug with CategoricalWrappedVec
    • PUBDEV-1664: Corrected math for GBM Tweedie with offsets/weights
    • PUBDEV-1665: Corrected math for GBM Poisson with offsets/weights
    • PUBDEV-2130: Deleting Deep Learning n-fold models resulted in a java.lang.AssertionError
    • GitHub commit: Fixed GLM with nfolds
    • GitHub commit: Updated GLM InitTsk to run at +1 priority level to avoid deadlock when launching hundreds of GLMs in parallel
    • GitHub commit: Column names (feature names) are now named correctly for the exported weight matrix connecting the input to the first hidden layer
    • GitHub commit: Changed isEnum to isCategorical
    • GitHub commit: Cleaned up DRF and GBM; fixed checkpoint restart logic for trees and changed which parameters are configurable
    • GitHub commit: Fixed incorrect logistic and hinge loss functions and apply to binary numeric columns in {0,1} only
    • GitHub commit: Fixed a bug where Poisson loss function was calculated incorrectly for values of 0
    • GitHub commit: Fixed DL POJO for large input columns
    Python
    R
    System
    • PUBDEV-2250: During parsing, SVMLight-formatted files failed with an NPE GitHub commit
    • PUBDEV-2213: During parsing, alphanumeric data in a column was converted to missing values and the column was assigned a type of int
    • PUBDEV-1990: Spaces are now permitted in the Flow directory name
    • PUBDEV-1037: Space in the user name was preventing H2O from starting
    • GitHub commit: Fixed VecUtils.copyOver() to accept a column type for the resulting copy
    • GitHub commit: Fixed Vec.preWriting so that it does not use an anonymous inner task which causes the entire Vec header to be passed
    • GitHub commit: Fixed parse to mark categorical references in ParseWriter as transient (enums must be node-shared during the entire multiple parse task)
    • GitHub commit: PUBDEV-2182: Fixed DL checkpoint restart with given validation set after R (currents) behavior changed; now the validation set key no longer necessarily matches the file name
    • GitHub commit: Fixed makeCon memory leak when redistribute=T
    • GitHub commit: PUBDEV-2174: Fixed sigma calculation for sparse chunks
    • GitHub commit: Restored pre-existing string manipulation utilities for categorical columns
    • GitHub commit: Fixed syncRPackages task so it doesn't run during the normal build process
    • GitHub commit: Fixed intermittent failures caused by different default timezone settings on different machines; sets needed timezone before starting test
    • GitHub commit: Fixed error message for countmatches
    • GitHub commit: PUBDEV-1443: Fixed size computation in merge
    • GitHub commit: Fixed h2o.tabulate() to work in multi-node mode
    • GitHub commit: Fixed integer overflow in printout of CM to TwoDimTable

    Slater (3.2.0.7) - 10/09/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slater/7/index.html

    Bug Fixes

    • GitHub commit: Fix Java 6 compatibility

      The Java 7 API call _rawChannel.setOption(StandardSocketOptions.TCP_NODELAY, true); has been replaced by the Java 6 API call _rawChannel.socket().setTcpNoDelay(true);

      The Java 7 API call sock.getRemoteAddress() has been replaced by the Java 6 API call sock.socket().getRemoteSocketAddress()


    Slater (3.2.0.5) - 09/24/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slater/5/index.html

    Enhancements

    Algorithms

    Slater (3.2.0.3) - 09/21/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slater/3/index.html

    New Features

    R

    Enhancements

    Algorithms
    • GitHub commit: Added back support for sparse activations in DL; currently changes results as numerical values are de-scaled only, not standardized
    Python
    • GitHub commit: Adjusted import_file in Python to accept the same parameters as import_file in R
    R

    Bug Fixes

    Algorithms
    R
    System

    Slater (3.2.0.1) - 09/12/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slater/1/index.html

    New Features

    Algorithms
    • GitHub: PUBDEV-1888: Added loss function calculation for DL.
    • GitHub: Set more parameters for GLM to be gridable.
    • GitHub: [KMeans] Enable grid search with max_iterations parameter.
    • GitHub: Add kfold column builders
    • GitHub: Add stratified kfold method
    Python
    • PUBDEV-684: Add nfolds to R/Python
    • GitHub: Improved group-by functionality
    • GitHub: Added python example for downloading glm pojo.
    • GitHub: Added countmatches to Python along with a test.
    • GitHub: Added support for getting false positive rates and true positive rates for all thresholds from binomial models; makes it easier to calculate custom metrics from ROC data (like weighted ROC)
    R
    • PUBDEV-1788: Added a factor function that will allow the user to set the levels for an enum column GitHub
    • PUBDEV-1881: Fixed bug in h2o.group_by for enumerator columns
    • GitHub: Refactor SVD method name and add svd_method option to R package to set preferred calculation method
    • PUBDEV-2071: Accept columns of type integer64 from R through as.h2o()
    Sparkling Water
    • PUBDEV-282: Support Windows OS in Sparkling Water
    System
    • HEXDEV-120: Switch from NanoHTTPD to Jetty
    • GitHub: Allow for "most" and "mode" in groupby
    • GitHub: Added NA check to checking for matches in categorical columns
    • PUBDEV-1470: Dropped UDP mode in favor of TCP
    • PUBDEV-1431: /3/DownloadDataset.bin is now a registered handler in JettyHTTPD.java. Allows streaming of large downloads from H2O. GitHub
    • PUBDEV-1865: Implemented per-row 1D, 2D and 3D DCT transformations for signal/image/volume processing
    • PUBDEV-1686: LDAP Integration
    • HEXDEV-381: LDAP Integration
    • HEXDEV-224: Added https support
    • GitHub: Added mapr5.0 version to builds
    • GitHub: Add Vec.Reader which replaces lost caching
    Web UI
    • GitHub: Disallow N-fold CV for GLM when lambda-search is on.
    • GitHub: Added typeahead for http and https.
    • PUBDEV-1821: Added Save Model and Load Model
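
The k-fold column builders and stratified k-fold method listed under Algorithms above produce a per-row fold assignment. The sketch below shows one simple way to build a stratified fold column in plain Python, by dealing the rows of each class round-robin across the folds. It is an illustration only. The function name is hypothetical, no shuffling is done, and H2O's own assignment scheme may differ.

```python
from collections import defaultdict


def stratified_kfold_column(labels, n_folds):
    """Assign each row a fold index so every class is spread evenly over folds."""
    folds = [0] * len(labels)
    seen = defaultdict(int)  # number of rows seen so far for each class
    for i, label in enumerate(labels):
        # Deal the rows of each class round-robin across the folds.
        folds[i] = seen[label] % n_folds
        seen[label] += 1
    return folds


labels = ["a"] * 6 + ["b"] * 3
fold_column = stratified_kfold_column(labels, n_folds=3)
print(fold_column)
```

In this example each of the three folds receives two rows of class "a" and one row of class "b", so the class ratio of the full data is preserved within every fold.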

    Enhancements

    Algorithms
    • GitHub: Don't allocate input dropout helper if input_dropout_ratio = 0.
    • PUBDEV-1920: Datasets : Unbalanced sparse for binomial and multinomial
    • GitHub: Major code cleanup for DL: Remove dead code, deprecate sparse/col_major.
    • PUBDEV-1942: Use prior class probabilities to break ties when making labels GitHub
    • GitHub: Update DL perf Rmd file to get the overall CM error.
    • GitHub: Enable training data shuffling if train_samples_per_iteration==0 and reproducible==true
    • GitHub: Checkpointing for DL now follows the same convention as for DRF/GBM.
    • GitHub: No longer do sampling with replacement during training with shuffle_training_data
    • GitHub: Add printout of sparsity ratio for double chunks.
    • GitHub: Check memory footprint for Gram matrix in PCA and SVD initialization
    • GitHub: Print more fill ratio debugging.
    • GitHub: Fix the RNG for createFrame to be more random (since we are setting the seed for each row).
    • PUBDEV-2010: Improve reporting of unstable DL models GitHub
    • PUBDEV-2018: Improve auto-tuning for DL on large clusters / large datasets GitHub
    • GitHub: Add input parameter to h2o.glrm indicating whether to ignore constant columns
    • GitHub: Missing enums are imputed using the majority class of the column. For other types of missing categorical, just round the mean to the nearest integer.
    • GitHub: Skip rows in training frame with missing value(s) if requested
    • GitHub: Speed up direct SVD by working with transpose directly
    • GitHub: Fix a bug in initialization of SVD and change l2 norm to sum of squared error in convergence test.
    • GitHub: Use absolute value for mean weight and bias checks.
    • GitHub: No longer leak constant chunks during AE scoring/reconstruction.
    • GitHub: No longer differentiate between DL model instabilities (weights vs biases).
    • GitHub: Make method static, where possible.
    • GitHub: Make GLRM seeding independent of number of chunks.
    API
    • GitHub: Added REST end-points for glrm,svd,pca,naive bayes algorithms.
    • GitHub: Added unicode to frame getter possibilities
    • GitHub: Added proper lookup of offset/weights/fold_column
    • GitHub: Data should be eagered before download_csv.
    • GitHub: Simplified model builder
    • GitHub: Added None as default for "on" field
    • GitHub: Removed all of the unnecessary calls to h2o.init and removed the unnecessary environment variable for version checking during testing
    • PUBDEV-2064: Rename the coordinate descent solvers in the REST API / Flow to (experimental)
    Grid Search
    • GitHub: Added check that x is not null before verifying data in unsupervised grid search algorithm
    • GitHub: Made naivebayes parameters gridable.
    • PUBDEV-1933: Called drf as randomForest in algorithm option GitHub
    • GitHub: Validation of grid parameters against algo /parameters rest endpoint.
    • PUBDEV-1979: Train N-fold CV models in parallel GitHub
    • PUBDEV-1978: grid: would be good to add to h2o.grid R help example, how to access the individual grid models
    Python
    • GitHub: Refactored into h2o.system_file so it's parallel to R client.
    • GitHub: Added h2o_deprecated decorator
    • GitHub: Use import_file in import_frame
    • GitHub: Handle a list of columns in python group-by api
    • GitHub: Use pandas if available for twodimtables and h2oframes
    • GitHub: Transform the parameters list into a dict with keys being the parameter label
    • GitHub: Added pop option which does inplace update on a frame (Frame.remove)
    • GitHub: ncol,dim,shape, and friends are now all properties
    • PUBDEV-193: Write python version of h2o.init() which knows how to start h2o
    • PUBDEV-1903: Method to get parameters of model in Python API
    • GitHub: Allow a single alpha to be specified without wrapping it in a list
    • GitHub: Updated endpoint for python client download_csv
    • GitHub: Allow for enum in scale/mean/sd (ignore or give NA)
    • GitHub: Allow for n_jobs=-1 and n_jobs > 1 for Parallel jobs
    • GitHub: Added frame_id property to frame
    • GitHub: Removed remaining splats on dicts
    • GitHub: Removed need to splat pass thru args
    • GitHub: Added get_jar flag to download_pojo
    R
    • PUBDEV-1866: Rewrote h2o.ensemble to utilize nfolds/fold_column in h2o base learners
    • GitHub: Added max_active_predictors.
    • GitHub: Updated REST call from R for model export
    • PUBDEV-1853: Removed addToNavbar from RequestServer GitHub
    • GitHub: Add "Open H2O Flow" message.
    • GitHub: Replaced additive float op by multiplication
    • GitHub: Reimplement checksum for Model.Parameters
    • GitHub: Remove debug prints.
    • PUBDEV-1857: Removed the need for String[] path_params in RequestServer.register() GitHub
    • PUBDEV-1856: Removed the writeHTML_impl methods from all the schemas
    • PUBDEV-1854: Made _doc_method optional in the Route constructors GitHub
    • PUBDEV-1858: Changed RequestServer so that only one handler instance is created for each Route
    • GitHub: Swapped out rjson for jsonlite for better handling of odd characters from dataset.
    • GitHub: Prettify R's grid output.
    • PUBDEV-1841: R now respects the TwoDimTable's column types
    • GitHub: Fixes show method for grid object when hyper_params is empty.
    • GitHub: h2o.levels returns R vector for single column
    • GitHub: Uses PredictCsv from genmodel now.
    • GitHub: Exposed stacktraces in R's summary() call.
    • GitHub: print type of failed value in $<-
    • GitHub: allow value to be integer in $<-
    • GitHub: Check for is_client being NULL since older H2O clusters may not have is_client.
    Sparkling Water
    • GitHub: Copy content of h2o-dist into target directory.
    System
    • GitHub: Rename label fields in prediction object.
    • GitHub: Uses the original Vec's domain in alignment
    • GitHub: Added columnName and unknownLevel to PredictUnknownCategoricalLevelException.
    • PUBDEV-1559: Added compression of 64-bit Reals GitHub
    • GitHub: Added time information to buildinfo.json.
    • GitHub: Put build metadata into a json file.
    • GitHub: Delete any prior main CV models of the same key if CV model building is cancelled before the main model started to build.
    • GitHub: Change loading name parameter to a String to address a Flow issue.
    • GitHub: Remove extra assertion to avoid NPEs after client call of bulk remove after done() is called but before the finally is done with updateModelOutput.
    • GitHub: Ensures that date time methods return year/month/day values in the currently set timezone.
    • GitHub: Frees memory from streamed zip reads after the chunk has been parsed.
    • GitHub: Unifies categorical strings to UTF-8 and warns the user about all conversion.
    • GitHub: add isNA checks to scale
    • GitHub: Do not start UDPReceiver thread (unless running with useUDP option)
    Web UI
    • PUBDEV-1961: Flow: use streaming endpoint /3/DownloadDataset.bin

    Bug Fixes

    Algorithms
    • PUBDEV-1785: Deadlock while running GBM
    • GitHub: Fix name for standardized_coefficient_magnitudes.
    • PUBDEV-1774: Setting gbm's balance_classes to True produces suspect models
    • PUBDEV-1849: K-Means: negative sum-of-squares after mean imputation
    • GitHub: Set the iters counter during kmeans center initialization correctly
    • GitHub: fixed parenthesis in GLM POJO generation
    • GitHub: Should be updating model each iteration with the newly fitted kmeans clusters, not the old ones!
    • PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
    • PUBDEV-1666: GBM:Math correctness for Gamma with offsets/weights
    • PUBDEV-451: Trees in GBM change for identical models GitHub
    • PUBDEV-1924: R^2 stopping criterion isn't working GitHub
    • PUBDEV-1776: GLM: cross-validation bug GitHub
    • PUBDEV-1682: GLM : Lending club dataset => build GLM model => 100% complete => click on model => null pointer exception GitHub
    • PUBDEV-1987: error returned on prediction for xval model
    • PUBDEV-1928: Properly implement Maxout/MaxoutWithDropout GitHub
    • GitHub: print actual number of columns (was just #cols) in DRF init
    • PUBDEV-2026: Fix setting the proper job state in DL models GitHub
    • PUBDEV-1950: Splitframe with rapids is not blocking
    • PUBDEV-1995: nfold: when user cancels an nfold job, fold data still remains in the cluster memory
    • PUBDEV-1994: nfold: cancel results in a java.lang.AssertionError
    • PUBDEV-1910: Canceled GBM with CV keeps lock
    • GitHub: Fix DL checkpoint restart with new data.
    API
    • PUBDEV-1955: Change Schema behavior to accept a single number in place of array GitHub
    • PUBDEV-1914: Iced deserialization fails for Enum Arrays
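
    The schema change listed above (accepting a single number where an array is expected) is essentially a normalization step applied before parsing. A minimal Python sketch follows; the actual change is in H2O's Java schema code, and `as_double_array` is a hypothetical name used only for illustration.

```python
from numbers import Real

def as_double_array(value):
    """Return a list of floats from either one number or a sequence of numbers."""
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(value, Real) and not isinstance(value, bool):
        return [float(value)]
    return [float(v) for v in value]
```

    With this kind of normalization, a caller can pass `1.0` or `[0.0, 0.5]` for the same field.
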
    Grid
    • PUBDEV-1876: Grid: progress bar not working for grid jobs
    • PUBDEV-1875: Grid: the meta info should not be dumped on the R screen, once the grid job is over
    • GitHub: [PUBDEV-1876] Fix grid update.
    • PUBDEV-1874: Grid search: observe issues with model naming/overwriting and error msg propagation GitHub
    • HEXDEV-402: R: kmeans grid search doesn't work
    • PUBDEV-1901: Grid appends new models even though models already exist.
    • PUBDEV-1940: Grid: glm grid on alpha fails with error "Expected '[' while reading a double[], but found 1.0"
    • PUBDEV-1877: Grid: if user specify the parameter value he is running the grid on, would be good to warn him/her
    • PUBDEV-1938: Grid: randomForest: unsupported grid params and wrong error msg
    Hadoop
    • PUBDEV-2036: importModel from hdfs doesn't work
    • PUBDEV-2027: Clicking shutdown in the Flow UI dropdown does not exit the Hadoop cluster
    Python
    • PUBDEV-1789: Python client h2o.remove_vecs (ExprNode) makes bad ast
    • PUBDEV-1795: Unable to read H2OFrame from Python
    • PUBDEV-1764: Python importFile does not import all files in directory, only one file GitHub
    • GitHub: parameter name is "dir" not "path"
    • PUBDEV-1693: Python: Options for handling NAs in group_by is broken
    • PUBDEV-1415: Intermittent Unimplemented rapids exception: pyunit_var.py . Also prior test got unimplemented too, but test didn't fail (client wasn't notified)
    • PUBDEV-1119: Python: Need to be able to access resource genmodel.jar
    • GitHub: Fix download of pojo in Python.
    R
    • GitHub: Fixed bug in h2o.ensemble .make_Z function
    • PUBDEV-1796: R: h2o.importFile doesn't allow user to choose column type during parse
    • PUBDEV-1768: R: Fails to return summary on subsetted frame GitHub
    • PUBDEV-1909: R: Adding column to frame changes string enums in column to numerics
    • PUBDEV-1936: R: h2o.levels return only the first factor of factor levels
    • PUBDEV-1869: R: sd function should convert enum column into numeric and calculate standard deviation GitHub
    • PUBDEV-1246: R: h2o.hist needs to run pretty function for pretty breakpoints to get same results as R's hist GitHub
    • PUBDEV-1868: R: h2o.performance returns error (not warning) when model is reloaded into H2O
    • PUBDEV-1723: h2o R : subsetting data :h2o removing wrong columns, when asked to delete more than 1 columns
    • GitHub: fix h2o.levels issue
    • PUBDEV-1972: R: setting weights_column = NULL causes unwanted variables to be used as predictors
    Sparkling Water
    • PUBDEV-1173: create conversion tasks from primitive RDD
    • GitHub: Fix return value issue in distribution script.
    System
    • HEXDEV-360: getFrame fails on Parsed Data
    • PUBDEV-366: Fix parsing for high-cardinality categorical features GitHub
    • PUBDEV-1143: Parse: Cancel parse unreliable; does not work at all times
    • PUBDEV-1872: Ability to ignore files during parse GitHub
    • PUBDEV-777: Parse : Parsing compressed files takes too long
    • PUBDEV-1916: Parse: 2 node cluster takes 49min vs 40sec on a 1 node cluster GitHub
    • PUBDEV-1431: Convert /3/DownloadDataset to streaming
    • PUBDEV-1995: nfold: when user cancels an nfold job, fold data still remains in the cluster memory
    • PUBDEV-1994: nfold: cancel results in a java.lang.AssertionError
    • PUBDEV-1910: Canceled GBM with CV keeps lock GitHub
    • PUBDEV-1992: CreateFrame isn't totally random
    • GitHub: Fixes a bug that allowed big buffers to be constantly reallocated when it wasn't needed. This saves memory and time.
    • GitHub: Fix print statement.
    • GitHub: Fixed orderly shutdown to work with flatfile.
    • PUBDEV-1998: Parse : Lending club dataset parse => cancelled by user
    • PUBDEV-2028: Shutdown => unimplemented error on curl -X POST 172.16.2.186:54321/3/Shutdown.html
    • PUBDEV-2070: Download frame brings down cluster
    • PUBDEV-2067: Cannot mix negative and positive array selection
    • PUBDEV-2024: Save model to HDFS fails
    Web UI

    Simons (3.0.1.7) - 8/11/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-simons/7/index.html

    New Features

    The following changes represent features that have been added since the previous release:

    Python
    Web UI

    Enhancements

    The following changes are improvements to existing features (which includes changed default values):

    Algorithms
    • GitHub: add seed to the model building that uses balance_classes, for determinism/repeatability
    • GitHub: Reduce the frequency at which tiny tree models are printed to stdout: Only print during the first 4 seconds if score_each_iteration is enabled.
    • GitHub: Only call the limited printout for TwoDimTables during Model.toString() that prints all TwoDimTables of the model._output.
    • GitHub: Only print up to 10 rows of TwoDimTables in ASCII logs (first/last 5).
    • GitHub: Remove some overflow/underflow checks: Let exp(x) be small and log(x) be large.
    • GitHub: Add nbins_top_level parameter to DRF/GBM. Not yet in R.
    • GitHub: Disallow N-fold CV for GLM when lambda-search is on.
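
    The "first/last 5" row limit for TwoDimTables mentioned in the list above can be illustrated with a short Python sketch. `truncate_rows` is a hypothetical helper, and the `"..."` gap marker is an assumption; the source does not say how H2O marks the omitted rows.

```python
def truncate_rows(rows, keep=5):
    """Keep the first and last `keep` rows; mark any omitted middle with '...'."""
    if len(rows) <= 2 * keep:
        return list(rows)  # short tables are printed in full
    return list(rows[:keep]) + ["..."] + list(rows[-keep:])
```

    A table with 10 or fewer rows is returned unchanged; longer tables keep 5 rows from each end.
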
    API
    • GitHub: Cleanup of public API of Schema.java. Improve its JavaDoc a lot.
    Python
    • PUBDEV-1765: Improve python online documentation
    • PUBDEV-1497: Python : Weights R tests to be ported from R for GLM/GBM/RF/DL
    • GitHub: adjust to split frame jobs result
    • GitHub: allow for update thingy to be a tuple (so rows and columns)
    • GitHub: when starting h2o jvm with h2o.init(), give h2o child process different id than parent, so it doesn't get killed on Ctrl-C
    • GitHub: add option to turn off progress bar print out
    • GitHub: add unicode to frame getter possibilities
    • GitHub: remove remaining splats on dicts
    • GitHub: no need to splat pass thru args
    • GitHub: proper lookup of offset/weights/fold_column
    • GitHub: data should be eagered before download_csv.
    • GitHub: simplify model builder
    • GitHub: use None as default for "on" field
    • GitHub: add get_jar flag to download_pojo
    • GitHub: remove all of the unnecessary calls to h2o.init and remove the unnecessary environment variable for version checking during testing
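
    The Ctrl-C item in the list above (giving the H2O child process a different id than its parent) matches a common POSIX pattern: start the child in its own session so that a terminal interrupt sent to the parent's process group does not reach it. The sketch below shows that pattern with the standard library only. Whether the H2O Python client does exactly this is an assumption, and `start_detached` and `shares_group_with_parent` are hypothetical names.

```python
import os
import subprocess

def start_detached(cmd):
    """Start `cmd` in a new session (and therefore a new process group). POSIX only."""
    return subprocess.Popen(cmd, start_new_session=True)

def shares_group_with_parent(proc):
    """True if `proc` is in the same process group as the current process."""
    return os.getpgid(proc.pid) == os.getpgid(0)
```

    A child started this way has to be stopped explicitly, since it no longer receives the terminal's interrupt signal.
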
    R
    • PUBDEV-1744: Improve help message of h2o.init function
    • GitHub: add valid expression to list of accepted R CMD check outputs.
    • GitHub: added h2o.anomaly demo to r package
    System
    • GitHub: Add -JJ command line argument to allow extra JVM arguments to be passed.
    • GitHub: Refactored CSVStream to be more understandable. Fix empty chunk bug.
    • GitHub: Add hintFlushRemoteChunk to CSVStream.
    • GitHub: Add parameterized route for frame export
    • GitHub: allow string vecs to be toEnum'd (with a sensible cap)
    • GitHub: allow lists of numbers in reducer ops
    • GitHub: Add warning message during POJO export if offset_column is specified (is not supported)
    • PUBDEV-1853: cleanup: remove addToNavbar from RequestServer GitHub
    • GitHub: Add "Open H2O Flow" message.
    • GitHub: Code refactoring to allow GBM JUnits to work with H2OApp in multi-node mode.
    • GitHub: Replace additive float op by multiplication
    • GitHub: Reimplement checksum for Model.Parameters
    • GitHub: Remove debug prints.
    • PUBDEV-1857: cleanup: remove the need for String[] path_params in RequestServer.register() GitHub
    • PUBDEV-1856: cleanup: remove the writeHTML_impl methods from all the schemas
    • PUBDEV-1854: cleanup: make _doc_method optional in the Route constructors GitHub
    • PUBDEV-1858: cleanup: change RequestServer so that only one handler instance is created for each Route

    Bug Fixes

    The following changes are to resolve incorrect software behavior:

    Algorithms
    • PUBDEV-1674: gbm w gamma: does not seem to split at all; all trees node pred=0 for attached data GitHub
    • PUBDEV-1760: GBM : Deviance testing for exp family
    • PUBDEV-1714: gbm gamma: R vs h2o same split variable, slightly different leaf predictions
    • PUBDEV-1755: DL : Math correctness for Tweedie with Offsets/Weights
    • PUBDEV-1758: DL : Deviance testing for exp family
    • PUBDEV-1756: DL : Math correctness for Poisson with Offsets/Weights
    • PUBDEV-1651: null/residual deviances don't match for various weights cases
    • PUBDEV-1757: DL : Math correctness for Gamma with Offsets/Weights
    • PUBDEV-1680: gbm gamma: seeing train set mse incs after sometime
    • PUBDEV-1724: gbm w tweedie: weird validation error behavior
    • PUBDEV-1774: setting gbm's balance_classes to True produces suspect models
    • PUBDEV-1849: K-Means: negative sum-of-squares after mean imputation
    • GitHub: Set the iters counter during kmeans center initialization correctly
    • GitHub: fixed parenthesis in GLM POJO generation
    • GitHub: Should be updating model each iteration with the newly fitted kmeans clusters, not the old ones!
    • PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
    • PUBDEV-1666: GBM:Math correctness for Gamma with offsets/weights
    Python
    • PUBDEV-1779: Fixes intermittent failure seen when Model Metrics were looked at too quickly after a cross validation run.
    • PUBDEV-1409: h2o python h2o.locate() should stop and return "Not found" rather than passing path=None to h2o? causes confusion h2o message GitHub
    • PUBDEV-1630: GBM getting intermittent assertion error on iris scoring in pyunit_weights_api.py
    • PUBDEV-1770: sigterm caught by python is killing h2o GitHub
    • HEXDEV-397: Python fold_column option requires fold column to be in the training data
    • HEXDEV-394: Python client occasionally throws attached error
    • GitHub: add missing args to kmeans
    • GitHub: add missing kmeans params in
    • GitHub: add missing checkpoint param
    • PUBDEV-1785: Deadlock while running GBM
    R
    • PUBDEV-1830: h2o.glm throws an error when fold_column and validation_frame are both specified
    • PUBDEV-1660: h2oR: when try to get a slice from pca eigenvectors get some formatting error GitHub
    • GitHub: fix broken %in% in R
    • PUBDEV-1831: Cross-validation metrics are not displayed in R (and Python?)
    • PUBDEV-1840: Autoencoder model doesn't display properly in R (training metrics) GitHub
    System
    • PUBDEV-1790: can't convert iris species column to a character column.
    • PUBDEV-1520: Kmeans pojo naming inconsistency
    • GitHub: fix parse of range ast
    • GitHub: Sets POJO file name to match the class name. Prior behavior would allow them to be different and give a compile error.
    Web UI
    • PUBDEV-1754: Export frame not working in flow : H2OKeyNotFoundArgumentException

    Simons (3.0.1.4) - 7/29/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-simons/4/index.html

    New Features

    Algorithms
    Python
    • PUBDEV-386: Expose ParseSetup to user in Python
    • PUBDEV-1239: Python: getFrame and getModel missing
    • HEXDEV-334: support rbind in python
    • PUBDEV-1215: python to have exportFile call
    • GitHub: add cross-validation parameter to metric accessors and respective pyunit
    • PUBDEV-1729: Cross-validation metrics should be shown in R and Python for all models
    R
    • PUBDEV-385: Expose ParseSetup to user in R
    • GitHub: add mean residual deviance accessor to R interface
    • GitHub: incorporate cross-validation metric access into the R client metric accessors
    • GitHub: R interface for checkpointing in RF enabled
    System
    • PUBDEV-1735: Add 24-MAR-14 06.10.48.000000000 PM style date to autodetected

    Enhancements

    API

    Algorithms
    • GitHub: Add proper deviance computation for DL regression.
    • GitHub: Print GLM model details to the logs.
    • GitHub: Disallow categorical response for GLM with non-binomial family.
    • GitHub: Disallow models with more than 1000 classes, which can lead to overly large values in DKV due to memory usage of 8*N^2 bytes (the Metrics objects in the model output)
    • GitHub: DL: Don't train too long in single node mode with auto-tuning.
    • GitHub: Use mean residual deviance to do early stopping in DL.
    • GitHub: Add an "AUTO" setting for fold_assignment (which is Random). This allows the code to reject non-default user-given values if n-fold CV is not enabled.
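
    The 8*N^2 bytes figure in the class-limit item above is easy to check by hand. The sketch below assumes it describes an N x N table of 8-byte values (for example a confusion matrix); that reading is an assumption, and `metrics_table_bytes` is a hypothetical helper.

```python
def metrics_table_bytes(n_classes, bytes_per_cell=8):
    """Memory for an n_classes x n_classes table of fixed-size cells."""
    return bytes_per_cell * n_classes ** 2
```

    At the 1000-class limit this gives 8,000,000 bytes (about 8 MB) per such table, and the cost grows quadratically beyond that.
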
    Python
    • HEXDEV-317: Python has to play nicely in a polyglot, long-running environment
    • GitHub: simplify ast in python frame slicer
    • GitHub: add cross validation metrics and mean residual deviance to model show()
    • GitHub: any to take a frame, simplify python's __contains__
    R
    • GitHub: On detaching h2o R package, only shut down H2O instance if it was started by the R client
    • GitHub: update h2o load
    System
    • GitHub: Print a handy message (Open H2O Flow in your web browser) when the cluster comes up like Sparkling Water does.
    • GitHub: Replace memory leaky RCurl getURL with curlPerform.
    • GitHub: Add -disable_web parameter.
    • GitHub: allow numerics in match
    • GitHub: More refactoring of h2o start. Includes:
      • H2OStarter - a generic class to start H2O. It does all dynamic registration
      • H2OTestStarter - a generic class to start h2o-core tests
    • GitHub: Use typed key when it is necessary. Key.make() now returns typed Key. The trick is that type T can be derived from the left side of the assignment. If the type of the Key cannot be derived, the developer has to use typed syntax: Key.<Frame>make("myframe.hex"). The change simplifies Scala code, which will be able to derive the key type.
    • PUBDEV-1793: Add Job state and start/end time to the model's output GitHub
    • GitHub: add more places to look when trying to start jar from python's h2o.init
    • GitHub: Cosmetic name changes
    • GitHub: Fetch local node differently from remote node.
    • GitHub: Don't clamp node_idx at 0 anymore.
    • GitHub: Added -log_dir option.

    Bug Fixes

    API
    • PUBDEV-776: Schema.parse() needs to be better behaved (like, not crash)
    Algorithms
    • PUBDEV-1725: pca:glrm - give bad results for attached data (bec of plus plus initialization)
    • GitHub: Fix deviance calculation, use the sanitized parameters from the model info, where Auto parameter values have been replaced with actual values
    • GitHub: Fix offset in DL for exponential family (that doesn't do standardization)
    • GitHub: Fix a bug where initial Y was set to all zeroes by kmeans++ when scaling was disabled
    • PUBDEV-1668: GBM: Math correctness for weights
    • PUBDEV-1783: dl: deviance off for large dataset GitHub
    • PUBDEV-1667: GBM: Math correctness for Offsets
    • PUBDEV-1778: drf: reporting incorrect mse on validation set GitHub
    • GitHub: Fix DRF scoring with 0 trees.
    Python
    R
    • PUBDEV-1257: R: no is.numeric method for H2O objects
    • PUBDEV-1622: NPE in water.api.RequestServer, water.util.RString.replace(RString.java:132)...got flagged as WARN in log...I would think we should have all NPE's be ERROR / fatal? or ?? GitHub
    • PUBDEV-1655: h2o.strsplit needs isNA check
    • PUBDEV-1084: h2o.setTimezone NPE
    • PUBDEV-1738: R: cloud name creation can't handle user names with spaces
    System
    • PUBDEV-1410: apply causes assert errors mentioning deadlock in runit_small_client_mode ...build never completes after hours ..deadlock?
    • PUBDEV-1195: docker build fails
    • HEXDEV-362: Bug in /parsesetup data preview GitHub
    • PUBDEV-1766: H2O xval: when delete all models: get Error evaluating future[6] :Error calling DELETE /3/Models/gbm_cv_13
    • PUBDEV-1767: H2O: when list frames after removing most frames, get: roll ups not possible vec deleted error GitHub
    Web UI
    • PUBDEV-1782: Flow: View Data fails when there is a UUID column (and maybe also a String column)
    • PUBDEV-1769: xval: cancel job does not work GitHub

    Simons (3.0.1.3) - 7/24/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-simons/3/index.html

    New Features

    Python

    Enhancements

    API
    • GitHub: Increase sleep from 2 to 3 because h2o itself does a sleep 2 on the REST API before triggering the shutdown.
    System

    Bug Fixes

    The following changes are to resolve incorrect software behavior:

    Algorithms
    • PUBDEV-1743: gbm poisson w weights: deviance off
    • PUBDEV-1736: gbm poisson with offset: seems to be giving wrong leaf predictions
    Python
    • PUBDEV-1731: Python get_frame() results in deleting a frame created by Flow
    • HEXDEV-389: Split frame from python
    • HEXDEV-388: python client H2OFrame constructor puts the header into the data (as the first row)
    R
    • PUBDEV-1504: Runit intermittent fails : runit_pub_180_ddply.R
    • PUBDEV-1678: Client mode jobs fail on runit_hex_1750_strongRules_mem.R
    System
    • GitHub: Model parameters should be always public.

    Simons (3.0.1.1) - 7/20/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-simons/1/index.html

    New Features

    Algorithms
    Python
    • PUBDEV-1437: Python needs "nlevels" operator like R
    • PUBDEV-1434: Python needs "levels" operator, like R
    • PUBDEV-1355: Python needs h2o.trim, like in R
    • PUBDEV-1354: Python needs h2o.toupper, like in R
    • PUBDEV-1352: Python needs h2o.tolower, like in R
    • PUBDEV-1350: Python needs h2o.strsplit, like in R
    • PUBDEV-1347: Python needs h2o.shutdown, like in R
    • PUBDEV-1343: Python needs h2o.rep_len, like in R
    • PUBDEV-1340: Python needs h2o.nlevels, like in R
    • PUBDEV-1338: Python needs h2o.ls, like in R
    • PUBDEV-1344: Python needs h2o.saveModel, like in R
    • PUBDEV-1337: Python needs h2o.loadModel, like in R
    • PUBDEV-1335: Python needs h2o.interaction, like in R
    • PUBDEV-1334: Python needs h2o.hist, like in R
    • PUBDEV-1351: Python needs h2o.sub, like in R
    • PUBDEV-1333: Python needs h2o.gsub, like in R
    • PUBDEV-1336: Python needs h2o.listTimezones, like in R
    • PUBDEV-1346: Python needs h2o.setTimezone, like in R
    • PUBDEV-1332: Python needs h2o.getTimezone, like in R
    • PUBDEV-1329: Python needs h2o.downloadCSV, like in R
    • PUBDEV-1328: Python needs h2o.downloadAllLogs, like in R
    • PUBDEV-1327: Python needs h2o.createFrame, like in R
    • PUBDEV-1326: Python needs h2o.clusterStatus, like in R
    • PUBDEV-1323: Python needs svd algo
    • PUBDEV-1322: Python needs prcomp algo
    • PUBDEV-1321: Python needs naiveBayes algo
    • PUBDEV-1320: Python needs model num_iterations accessor for clustering models, like R's
    • PUBDEV-1318: Python needs screeplot and plot methods, like R's. (should probably check for matplotlib)
    • PUBDEV-1317: Python needs multinomial model hit_ratio_table accessor, like R's
    • PUBDEV-1316: Python needs model scoreHistory accessor, like R's
    • PUBDEV-1315: R needs weights and biases accessors for deeplearning models
    • PUBDEV-1313: Python needs "as.Date" operator, like R's
    • PUBDEV-1312: Python needs "rbind" operator, like R's
    • PUBDEV-1345: Python needs h2o.setLevel and h2o.setLevels, like in R
    • PUBDEV-1311: Python needs "setLevel" operator, like R's
    • PUBDEV-1306: Python needs "anyFactor" operator, like R's
    • PUBDEV-1305: Python needs "table" operator, like R's
    • PUBDEV-1301: Python needs "as.numeric" operator, like R's
    • PUBDEV-1300: Python needs "as.character" operator, like R's
    • PUBDEV-1293: Python needs "signif" operator, like R's
    • PUBDEV-1292: Python needs "round" operator, like R's
    • PUBDEV-1291: Python needs a transpose operator, like R's t operator
    • PUBDEV-1289: Python needs element-wise division and multiplication operators, like %/% and %-% in R
    • PUBDEV-1330: Python needs h2o.exportHDFS, like in R
    • PUBDEV-1357: Python and R need which operator GitHub
    • PUBDEV-1356: Python and R need isnumeric and ischaracter operators
    • PUBDEV-1342: Python needs h2o.removeVecs, like in R
    • PUBDEV-1324: Python needs h2o.assign, like in R GitHub
    • PUBDEV-1296: Python and R h2o clients need "any" operator, like R's
    • PUBDEV-1295: Python and R h2o clients need "prod" operator, like R's
    • PUBDEV-1294: Python and R h2o clients need "range" operator, like R's
    • PUBDEV-1290: Python and R h2o clients need "cummax", "cummin", "cumprod", and "cumsum" operators, like R's
    • PUBDEV-1325: Python needs h2o.clearLog, like in R
    • PUBDEV-1349: Python needs h2o.startLogging and h2o.stopLogging, like in R
    • PUBDEV-1341: Python needs h2o.openLog, like in R
    • PUBDEV-1348: Python needs h2o.startGLMJob, like in R
    • PUBDEV-1331: Python needs h2o.getFutureModel, like in R
    • PUBDEV-1302: Python needs "match" operator, like R's
    • PUBDEV-1298: Python needs "%in%" operator, like R's
    • PUBDEV-1310: Python needs "scale" operator, like R's
    • PUBDEV-1297: Python needs "all" operator, like R's
    • GitHub: add start_glm_job() and get_future_model() to python client. add H2OModelFuture class. add respective pyunit
    R
    • PUBDEV-1273: Add h2oEnsemble R package to h2o-3
    • PUBDEV-1319: R needs centroid_stats accessor like Python, for clustering models
    Rapids
    • PUBDEV-1635: the equivalent of R's "any" should probably be implemented in rapids
    • PUBDEV-1634: the equivalent of R's cummin, cummax, cumprod, cumsum should probably be implemented in rapids
    • PUBDEV-1633: the equivalent of R's "range" should probably be implemented in rapids
    • PUBDEV-1632: the equivalent of R's "prod" should probably be implemented in rapids
    • PUBDEV-1699: the equivalent of R's "unique" should probably be implemented in rapids GitHub
    System
    • GitHub: changed to new AMI
    • PUBDEV-679: Create cross-validation holdout sets using the per-row weights
    • GitHub: Add user_name. Add ExtensionHandler1.
    • GitHub: Added auth options to h2o.init().
    • GitHub: Added H2O.calcNextUniqueModelId().
    • GitHub: Add ldap arg.
    Web UI
    • HEXDEV-231: Flow: Ability to change column type post-Parse

    Enhancements

    Algorithms
    • GitHub: use fixed seed to avoid bad splits with some seeds
    • GitHub: Change seed to avoid type flip from integer to double after row slicing, which leads to different split decisions
    • GitHub: Add option during kmeans scoring to return matrix of indicator columns for cluster assignment, which is necessary for initializing GLRM
    • GitHub: Output number of processed observations in PCA
    • GitHub: Add validation into PCA with GramSVD
    • GitHub: Code cleanup of distributions. Also rename _n_folds -> _nfolds for consistency
    • GitHub: Remove restriction to data frames with more than 1 column
    • GitHub: Add debugging output for DL auto-tuning.
    • PUBDEV-556: implement algo-agnostic cross-validation mechanism via a column of weights
    • GitHub: When initializing with kmeans++ set X to matrix of indicator columns corresponding to cluster assignments, unless closed form solution exists
    • GitHub: Always print DL auto-tuning info for now.
    • PUBDEV-1657: pca: would be good to remove the redundant std dev from flow pca model object
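
    The algo-agnostic cross-validation item above (PUBDEV-556) rests on a simple idea: for each fold, rows in the holdout set get weight 0 during training and all other rows get weight 1, so any algorithm that honors row weights can be cross-validated without slicing the frame. A minimal Python sketch of that idea follows. The helper names and the modulo fold assignment are assumptions for illustration, not H2O's code.

```python
def modulo_folds(n_rows, nfolds):
    """Assign each row to a fold by row index modulo nfolds."""
    return [i % nfolds for i in range(n_rows)]

def fold_weights(fold_ids, holdout_fold):
    """Per-row training weights: 0 for rows in the holdout fold, 1 otherwise."""
    return [0.0 if f == holdout_fold else 1.0 for f in fold_ids]
```

    Training fold `j` then uses `fold_weights(folds, j)` as its weights column, and scoring uses the complementary rows.
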
    API
    • GitHub: Set Content-Type: application/x-www-form-urlencoded for regular POST requests.
    • HEXDEV-272: Move response_column parameter above ignored_columns parameter GitHub
      • All of the fields of a schema are now stored in the leaf child of the class hierarchy. Changed the implementation of fields() to simply return the fields variable of a schema. The function calls H2O.fail() if it attempts to access a field from a non-leaf child. response_column is now moved above ignored_columns for every applicable schema. 'own_fields' is also now renamed to 'fields'
    • GitHub: Don't use features from servlet api 3.0 or later anymore. Instead save the response status in a thread local variable and fish it out when needed.
    Python
    • GitHub: don't use the header of the timezone table for a choice
    • GitHub: never delete models. ever.
    • GitHub: add na_rm argument
    • GitHub: add prod to python interface
    System
    • GitHub: use Key instead of Vec in refcnter
    • GitHub: protect vecs in apply
    • GitHub: Allows for more than one column to remain unnamed. The new naming will fill in the blanks.
    • GitHub: Refactoring of hadoop mapper and driver.
    • GitHub: Remove -hdfs option.
    • GitHub: Adds more checks for a parse cancel at more stages during the post ingestion file parse.
    • GitHub: Refactor method name for clarification.
    • GitHub: Cleans up and comments the freeing of chunks from a parsed file.
    • GitHub: Since more startup logic is getting added, simplify H2OClientApp as much as possible. Remove H2OClient entirely.
    • GitHub: Add dedicated AddCommonResponseHeadersHandler handler to set common response headers up-front.
    • GitHub: More refactoring of startup. Pushed a bunch of code from H2OApp into H2O. Added H2O.configureLogging().
    • GitHub: Make Progress extend Keyed.
    • GitHub: Make createServer() protected.
    • GitHub: model_id should probably be a Key, not Key.
    • GitHub: Change Jetty version from 9 to 8 to get Java 6 compatibility back.
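
    The unnamed-column item in the list above says the new naming "will fill in the blanks". One way such a fill could work is sketched below in Python. The position-based `C<index>` names and the collision handling are assumptions, and `fill_blank_names` is a hypothetical helper rather than the parser's actual logic.

```python
def fill_blank_names(names, prefix="C"):
    """Give every empty or None column name a position-based default name."""
    taken = {n for n in names if n}
    filled = []
    for position, name in enumerate(names, start=1):
        if not name:
            name = f"{prefix}{position}"
            while name in taken:  # avoid clashing with a user-supplied name
                name += "_"
            taken.add(name)
        filled.append(name)
    return filled
```
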
    Web UI
    • PUBDEV-1521: show REST API and overall UI response times for each cell in Flow
    • HEXDEV-304: Flow: Emphasize run time in job-progress output
    • PUBDEV-1522: show wall-clock start and run times in the Flow outline
    • PUBDEV-1707: Hook up "Export" button for datasets (frames) in Flow.

    Bug Fixes

    Algorithms
    • PUBDEV-1641: gbm w poisson: get java.lang.AssertionError at hex.tree.gbm.GBM$GBMDriver.buildNextKTrees on attached data
    • PUBDEV-1672: kmeans: get AIOOB with user specified centroids GitHub
      • Throw an error if the number of rows in the user-specified initial centers is not equal to k.
    • PUBDEV-1654: pca: gram-svd std dev differs for v2 vs v3 for attached data
    • GitHub: Fix DL
    • GitHub: Fix a bug in PCA utilities for k = 1
    • PUBDEV-1700: nfolds: flow-when set nfold =1 job hangs for ever; in terminal get java.lang.AssertionError
    • PUBDEV-1706: GBM/DRF: is balance_classes=TRUE and nfolds>1 valid? GitHub
    • PUBDEV-806: GLM => runit_demo_glm_uuid.R : water.exceptions.H2OIllegalArgumentException
    • PUBDEV-1696: Client (model-build) is blocked when passing illegal nfolds value. GitHub
    • PUBDEV-1690: Cross Validation: if nfolds > number of observations, should it default to leave-one-out cross-validation?
    • PUBDEV-1537: pca: on airlines get java.lang.AssertionError at hex.svd.SVD$SVDDriver.compute2(SVD.java:219) GitHub
    • PUBDEV-1603: pca: glrm giving very different std dev than R and h2o's other methods for attached data
    • GitHub: Fix a potential race condition in tree validation scoring.
    • GitHub: Fix GLM parameter schema. Clean up hasOffset() and hasWeights()
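    The k-means fix listed above (PUBDEV-1672) rejects user-specified initial centers whose row count differs from k. A minimal sketch of that kind of check follows; the function name and error text are hypothetical and are not taken from the H2O code base.

```python
def validate_user_centers(centers, k):
    """Return `centers` unchanged if it has exactly k rows, else raise.

    Hypothetical helper for illustration only; `centers` is a list of rows,
    one row per initial cluster center.
    """
    if len(centers) != k:
        raise ValueError(
            f"user-specified centers have {len(centers)} rows, expected k={k}"
        )
    return centers


# Two centers for k=2 pass the check.
validate_user_centers([[0.0, 0.0], [1.0, 1.0]], 2)
```

    Failing early with a clear message avoids an out-of-bounds access later, when the algorithm indexes into the centers by cluster number.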
    Python
    • PUBDEV-1627: column name missing (python client)
    • PUBDEV-1629: python client's tail() header incorrect GitHub
    • PUBDEV-1413: intermittent assertion errors in pyunit_citi_bike_small.py/pyunit_citi_bike_large.py. Client apparently not notified
    • PUBDEV-1590: "Trying to unlock null" assertion during pyunit_citi_bike_large.py
    • PUBDEV-1400: match operator should take numerics
    R
    Rapids
    Sparkling Water
    System
    • PUBDEV-1551: Parser: Multifile Parse fails with 0-byte files in directory GitHub
    • HEXDEV-325: Empty reply when parsing dataset with mismatching header and data column length
    • PUBDEV-1509: Split frame : Big datasets : On 186K rows 3200 Cols split frame took 40 mins => which is too long
    • PUBDEV-1438: Column naming can create duplicate column names
    • PUBDEV-1105: NPE in Rollupstats after failed parse
    • PUBDEV-1142: H2O parse: When cancel a parse job, key remains locked and hence unable to delete the file GitHub
    • GitHub: client mode deadlock issue resolution
    • PUBDEV-1670: Client mode fails consistently sometimes : GBM_offset_tweedie.R.out.txt :
    • GitHub: nbhm bug: K == TOMBSTONE not key == TOMBSTONE
    • GitHub: Pulls out a GAID from resource in jar if the GAID doesn't equal the default. Presumably the GAID has been changed by the jar baking program.
    Web UI
    • PUBDEV-872: Flows : Not able to load saved flows from hdfs/local GitHub
    • PUBDEV-554: Flow:Parse two different files simultaneously, flow should either complain or fill the additional (incompatible) rows with nas
    • PUBDEV-1527: missing .java extension when downloading pojo GitHub
    • PUBDEV-1642: Changing columns type takes column list back to first page of columns
    • PUBDEV-1508: Flow : Import file => Parse => Error compiling coffee-script Maximum call stack size exceeded
    • PUBDEV-1606: Flow :=> Cannot save flow on hdfs
    • PUBDEV-1653: Flow: the column names do not modify when user changes the dataset in model builder

    Shannon (3.0.0.26) - 7/4/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/26/index.html

    New Features

    Algorithms
    • PUBDEV-1592: Expose standardization shift/mult values in the Model output in R/Python. GitHub
    Python
    • GitHub: add h2o.shutdown to python client
    • GitHub: add h2o.hist and respective pyunit
    • GitHub: gbm weight pyunit (variable importances)
    R
    Web UI

    Enhancements

    Algorithms
    • PUBDEV-1494: GBM : Weights math correctness tests in R
    • PUBDEV-1523: GLM w tweedie: for attached data, R giving much better res dev than h2o
    • PUBDEV-1396: Offsets/Weights: Math correctness for GLM
    • PUBDEV-1496: RF : Weights Math correctness tests in R
    • HEXDEV-366: remove weights option from DRF and GBM in REST API, Python, R
    • PUBDEV-1553: Threshold in GLM is hardcoded to 0
    • GitHub: Make min_rows a double instead of int: Is now weighted number of observations (min_obs in R).
    • GitHub: Don't use sample weighted variance, but full weighted variance.
    • GitHub: Fix R^2 computation.
    • GitHub: Skip rows with missing response in weighted mean computation.
    • _binomial_double_trees disabled by default for DRF (was enabled).
    • GitHub: Relax tolerance.
    • HEXDEV-329 : Offset for GBM
    • HEXDEV-211 : Tweedie distributions for GLM
    API
    • PUBDEV-1491: generated REST API POJOS should be compiled and jar'd up as part of the build
    • GitHub: Change schema for PCA, SVD, and GLRM to version 99
    Python
    • GitHub: is factor returns TRUE/FALSE cast to scalar 1/0
    • GitHub: take a slightly different syntactic approach to dropping column
    • GitHub: better list comp in interaction call
    • GitHub: If the weights_column argument is specified, attach the column to the training and/or validation frame (if not already specified as part of x/validation_x). If weights_column is not already part of x/validation_x, a training_frame/validation_frame must be provided, and the weights column is taken from it. Respective pyunit added.
    R
    • GitHub: better ref handling in the [<- for python and R
    • GitHub: Pass binomial_double_trees in the R wrapper for DRF.
    • GitHub: carefully format NAs and non NAs
    • GitHub: for loop over the x[[j]] to format NAs properly
    • GitHub: Added example to h2o-r/ensemble/create_h2o_wrappers.R
    System
    • GitHub: allow for no y in model_builder
    • GitHub: Enable auto-flag for Java6 generation.
    • GitHub: better compression in split frame
    • PUBDEV-1594: All basic file accessors in PersistHDFS should check file permissions
    • PUBDEV-1518: getFrames should show a Parse button for raw frames
    Web UI
    • PUBDEV-1545: Flow => Build model => ignored columns table => should have column width resizing based on column names width => looks odd if column names are short
    • PUBDEV-1546: Flow: Build model => Search for 1 column => select it => build model shows list of columns instead of 1 column
    • PUBDEV-1254: Flow: Add Impute

    Bug Fixes

    Algorithms
    • PUBDEV-1554: dl with offset: when offset same as response, do not get 0 mse
    • PUBDEV-1555: h2oR: dl with offset giving : Error in args$x_ignore : object of type 'closure' is not subsettable
    • PUBDEV-1487: gbm weights: give different terminal node predictions than R for attached data
    • PUBDEV-1569: Investigate effectiveness of _binomial_double_trees (DRF) GitHub
    • PUBDEV-1574: Actually pass 'binomial_double_trees' argument given to R wrapper to DRF.
    • PUBDEV-1444: DL: h2o.saveModel cannot save metrics when a deeplearning model has a validation_frame
    • PUBDEV-1579: GBM test time predictions without weights seem off when training with weights GitHub
    • PUBDEV-1533: GLM: doubled weights should produce the same result as doubling the observations GitHub
    • PUBDEV-1531: GLM: it appears that observations with 0 weights are not ignored, as they should be.
    • GitHub: Fix a bug in PCA scoring that was handling categorical NAs inconsistently
    • PUBDEV-1581: Regression 3060 fails on GLRM in R tests
    • PUBDEV-1586: change Grid endpoints and schemas to v99 since they are still in flux
    • PUBDEV-1589: GLM : build model => airlinesbillion dataset => IRLSM/LBFGS => fails with array index out of bound exception
    • PUBDEV-1607: gbm w offset: predict seems to be wrong
    • PUBDEV-1600: Frame name creation fails when file name contains csv or zip (not as extension)
    • PUBDEV-1577: DL predictions on test set require weights if trained with weights
    • PUBDEV-1598: Flow: After running pca when call get Model/ jobs get: Failed to find schema for version: 3 and type: PCA
    • PUBDEV-1576: Test variable importances for weights for GBM/DRF/DL
    • PUBDEV-1517: With R, deep learning autoencoder using all columns in frame, not just those specified in x parameter
    • PUBDEV-1593: dl var importance: there is a .missing(NA) variable in DL variable importance even when data has no nas
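    Two of the GLM items above (PUBDEV-1533 and PUBDEV-1531) describe invariants that observation weights are expected to satisfy: doubling a row's weight should match duplicating the row, and rows with weight 0 should be ignored. The sketch below illustrates both on a plain weighted mean. It is a hypothetical stand-alone example, not H2O's implementation.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean that skips rows whose weight is exactly 0."""
    total_weight = 0.0
    total = 0.0
    for value, weight in zip(values, weights):
        if weight == 0:
            continue  # a zero-weight row contributes nothing
        total_weight += weight
        total += weight * value
    if total_weight == 0:
        # Guard chosen for this sketch; avoids dividing by zero.
        raise ValueError("all weights are zero")
    return total / total_weight


# Row 1.0 carries weight 2, row 4.0 carries weight 0 ...
with_weights = weighted_mean([1.0, 2.0, 4.0], [2, 1, 0])
# ... which should match duplicating row 1.0 and dropping row 4.0.
with_copies = weighted_mean([1.0, 1.0, 2.0], [1, 1, 1])
print(math.isclose(with_weights, with_copies))
```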
    Python
    • PUBDEV-1538: h2o.save_model fails on Windows due to path issues
    • GitHub: python leaked key check for Vecs, Chunks, and Frames
    • PUBDEV-1609: frame dimension mismatch between upload/import method
    R
    • PUBDEV-1601: h2o.loadModel() from hdfs
    • PUBDEV-1611: R CMD Check failing on : The Date field is over a month old.
    System
    • PUBDEV-1514: Large number of columns (~30000) on importFile (flow) is slow / unresponsive for long time
    • PUBDEV-841: Split frame : Flow should not show raw frames for SplitFrame dialog (water.exceptions.H2OIllegalArgumentException)
    • PUBDEV-1459: bug in GLM POJO: seems threshold for binary predictions is always 0
    • PUBDEV-1566: Cannot save model on windows since Key contains '@' (illegal character to path)
    • GitHub: Fixes the timezone lists.
    • GitHub: R CMD check fix for date
    • GitHub: add ec2 back into project
    Web UI
    • HEXDEV-54: Flow : Import file 100k.svm => Something went wrong while displaying page

    Shannon (3.0.0.25) - 6/25/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/25/index.html

    Enhancements

    API
    • PUBDEV-1452: branch 3.0.0.2 to REGRESSION_REST_API_3 and cherry-pick the /99/Rapids changes to it

    Web UI

    • PUBDEV-1545: Flow => Build model => ignored columns table => should have column width resizing based on column names width => looks odd if column names are short
    • PUBDEV-1546: Flow : Build model => Search for 1 column => select it => build model shows list of columns instead of 1 column

    Bug Fixes

    The following changes are to resolve incorrect software behavior:

    Algorithms
    • PUBDEV-1487: gbm weights: give different terminal node predictions than R for attached data
    • GitHub: Fix offset for DL.
    • GitHub: Gracefully handle 0 weight for GBM.
    Python
    • PUBDEV-1547: Weights API: weights column not found in python client
    R
    • GitHub: Fix R wrapper for DL for weights/offset.
    Web UI
    • PUBDEV-1528: Flow model builder: the na filter does not select all ignored columns; just the first 100.

    Shannon (3.0.0.24) - 6/25/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/24/index.html

    New Features

    Algorithms
    • GitHub: Allow validation for unsupervised models.
    R
    • GitHub: Added runit GBM weights
    • GitHub: Updated runit_GBM_weights.R
    Python
    • GitHub: add h2o.set_timezone h2o.get_timezone and h2o.list_timezones to python client and respective pyunit.
    • GitHub: add h2o.save_model and h2o.load_model to python client and respective pyunit

    Enhancements

    Algorithms
    • GitHub: Skip rows with weight 0.
    • GitHub: x_ignore must be set when autoencoder is TRUE
    System
    • GitHub: Fix Java bindings generator to generate code under project's location.
    • GitHub: Adds input parameter check to ParseSetup.

    Bug Fixes

    Algorithms
    • PUBDEV-1529: dl with ae: get java.lang.UnsupportedOperationException: Trying to predict with an unstable model.
    • GitHub: Bring back accidentally removed hiding of classification-related fields for unsupervised models.
    API
    • PUBDEV-1456: fix REST API POJO generation for enums, + java.util.map import

    Shannon (3.0.0.23) - 6/19/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/23/index.html

    New Features

    Algorithms
    API
    • PUBDEV-61: do back-end work to allow document navigation from one Schema to another
    • PUBDEV-133: doing summary means calling it with each columns name, index not supported?
    Python
    • GitHub: add num_iterations accessor to python client and respective pyunit
    • GitHub: add score_history accessor to python client and respective pyunit
    • GitHub: add hit ratio table accessor to python interface and respective pyunit
    • GitHub: add h2o.naivebayes and respective pyunits
    • GitHub: add h2o.prcomp and respective pyunits.
    • PUBDEV-681: Add user-given input weight parameters to Python
    • GitHub: add h2o.create_frame to python client and respective pyunit
    • GitHub: add h2o.interaction and respective pyunit
    • GitHub: add h2o.strsplit to python client and respective pyunit
    • GitHub: add h2o.toupper and h2o.tolower to python client and respective pyunit
    • GitHub: add h2o.sub and h2o.gsub to python interface and respective pyunit
    • GitHub: add h2o.trim() to python client and respective pyunit
    • GitHub: add h2o.rep_len to python client and respective pyunit
    • GitHub: add h2o.svd to python client and respective golden pyunit
    • GitHub: add scree plot functionality to python client and respective pyunit
    • GitHub: add plotting functionality to python client and respective pyunit
    R
    • GitHub: added h2o.weights and h2o.biases accessors to R client and update respective runit
    • GitHub: add h2o.centroid_stats to R client and respective runit
    • PUBDEV-680: Add user-given input weight parameters to R
    • GitHub: Add offset/weights to DRF/GBM R wrappers.
    Web UI

    Enhancements

    Algorithms
    • PUBDEV-676: Use the user-given weight Vec as observation weights for all algos
    • GitHub: Refactor the code to let the caller compute the weighted sigma.
    • GitHub: Modify prior class distribution to be computed from weighted response.
    • GitHub: Put back the defaultThreshold that's based on training/validation metrics. Was accidentally removed together with SupervisedModel.
    • GitHub: Always sample to at least #class labels when doing stratified sampling.
    • GitHub: Cutout for NAs in GLM score0(data[],...), same as for score0(Chunk[],…)
    R
    • PUBDEV-856: All h2o things in R should have an h2o.something version so it's unambiguous GitHub
    • GitHub: export clusterIsUp and clusterInfo commands
    • GitHub: update accessors in the shim
    • GitHub: gbm with async exec
    System
    • HEXDEV-361: Wide frame handling for model builders
    • GitHub: Remove application plugin from assembly to speedup build process.
    • GitHub: add byteSize to ls
    • GitHub: option to launch randomForest async
    • GitHub: Return HDFS persist manager for URIs starting with s3n and s3a
    • GitHub: quote strings when writing to disk

    Bug Fixes

    Algorithms
    • PUBDEV-1217: pca: when cancel the job the key remains locked
    • PUBDEV-1468: Error in GBM if response column is constant GitHub
    • PUBDEV-1476: dl with obs weights: nas in weights cause java.lang.AssertionError GitHub
    • PUBDEV-1458: pca: data with nas, v2 vs v3 slightly different results GitHub
    • PUBDEV-1477: dl w/obs wts: when all wts are zero, get java.lang.AssertionError GitHub
    • GitHub: Fix check for offset (allow offset for logistic regression).
    • GitHub: Gracefully handle exception when launching single-node DRF/GBM in client mode.
    • GitHub: Hack around the fact that hasWeights()/hasOffset() isn't available on remote nodes and that SharedTree is sent to remote nodes and its private internal classes need access to the above methods...
    • GitHub: Fix scoring when NAs are predicted.
    Python
    • PUBDEV-1469: pyunit_citi_bike_large.py : test failing consistently on regression jobs
    • PUBDEV-1472: Regression job : Pyunit small tests groupie and pub_444_spaces failing consistently
    • PUBDEV-1372: Regression of pyunit_small, Groupby.py
    • PUBDEV-1386: intermittent fail in pyunit_citi_bike_small.py: -Unimplemented- failed lookup on token
    • PUBDEV-1471: pyunit_citi_bike_small.py : failing consistently on regression jobs
    • PUBDEV-1466: matplotlib.pyplot import failure on MASTER jenkins pyunit small jobs GitHub
    • GitHub: minor fix to python's h2o.create_frame
    • GitHub: update the path to jar in connection.py
    R
    • PUBDEV-1475: Client mode failed tests : runit_GBM_one_node.R, runit_RF_one_node.R, runit_v_3_apply.R, runit_v_4_createfunctions.R GitHub
    • PUBDEV-1235: Split Frame causes AIOOBE on Chicago crimes data GitHub
    • PUBDEV-746: runit_demo_NOPASS_h2o_impute_R : h2o.impute() is missing. seems like we want that?
    • PUBDEV-582: H2O-R- does not give the full column summary
    • PUBDEV-1473: Regression : Runit small jobs failing on tests :
    • PUBDEV-741: runit_NOPASS_pub-668 R tests uses all() ...h2o says all is unimplemented
    • PUBDEV-1506: R: h2o.ls() needs to return data sizes
    • PUBDEV-1436: Intermittent runit fail : runit_GBM_ecology.R GitHub
    • PUBDEV-1464: R: toupper/tolower don't work GitHub
    • PUBDEV-1194: R: dataset is imported but can't return head of frame
    Sparkling Water
    • PUBDEV-975: Download page for Sparkling Water should point to the right R-client and Python client
    • PUBDEV-1428: Sparkling water => Flow => Million song/KDD Cup path issues GitHub
    Web UI
    • PUBDEV-1433: Flow UI: Change Help > FAQ link to h2o-docs/index.html#FAQ

    Shannon (3.0.0.22) - 6/13/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/22/index.html

    New Features

    API

    • PUBDEV-633: Generate Java bindings for REST API: POJOs for the entities (schemas)

    Python

    • GitHub: added h2o.anyfactor() and respective pyunit
    • GitHub: add h2o.scale and respective pyunit
    • GitHub: added levels, nlevels, setLevel and setLevels and respective pyunit (PUBDEV-1434, PUBDEV-1437, PUBDEV-1345, PUBDEV-1311)
    • GitHub: add H2OFrame.as_date and pyunit addition. H2OFrame.setLevel should return a H2OFrame not a H2OVec.

    Enhancements

    Algorithms

    • GitHub: Add _build_tree_one_node option to GBM

    API

    • HEXDEV-352: Additional attributes on /Frames and /Frames/foo/summary

    R

    • PUBDEV-706: Release h2o-dev to CRAN
    • GitHub: Add parse_type parameter to upload/import file

    Python

    • GitHub: print out where h2o jar is looked for
    • GitHub: add h2o.ls and respective pyunit

    System

    • PUBDEV-717: refactor the duplicated code in FramesV2
    • PUBDEV-1281: Add horizontal pagination of frames to Flow GitHub
    • PUBDEV-607: Add Xmx reporting to GA
    • GitHub: Added support for Freezable[][][] in serialization (added addAAA to auto buffer and DocGen, DocGen will just throw H2O.fail())
    • GitHub: No longer set yyyy-MM-dd and dd-MMM-yy dates that precede the epoch to be NA. Negative time values are fine. This unifies these two time formats with the behavior of as.Date.
    • GitHub: Reduces the verbosity of parse tracing messages.
    • GitHub: Rename AUTO->GUESS for figuring out file type.
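    The date change listed above means that yyyy-MM-dd and dd-MMM-yy values earlier than the Unix epoch are kept as negative time values instead of becoming NA. The sketch below shows what such a negative value looks like, assuming time is counted in milliseconds since 1970-01-01 UTC and that the two patterns correspond to the strptime formats %Y-%m-%d and %d-%b-%y. The helper name is hypothetical, and Python's rule for two-digit years may differ from H2O's.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(text, fmt):
    """Parse `text` with a strptime format; return milliseconds since the epoch (UTC)."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int((parsed - EPOCH).total_seconds() * 1000)


# One day before the epoch is a valid, negative time value.
print(to_epoch_millis("1969-12-31", "%Y-%m-%d"))  # -86400000
print(to_epoch_millis("31-Dec-69", "%d-%b-%y"))   # -86400000
```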

    Web UI

    • HEXDEV-276: Add frame pagination
    • PUBDEV-1405: Flow : Decision to be made on display of number of columns for wider datasets for Parse and Frame summary
    • PUBDEV-1404: Usability improvements
    • PUBDEV-244: "View Data" display may need to be modified/shortened.

    Bug Fixes

    Algorithms

    • PUBDEV-1365: GLM: Buggy when likelihood equals infinity
    • PUBDEV-1394: GLM: Some offsets hang
    • PUBDEV-1268: GLM: get java.lang.AssertionError at hex.glm.GLM$GLMSingleLambdaTsk.compute2 for attached data
    • PUBDEV-1403: pca: h2o-3 reporting incorrect proportion of variance and cum prop GitHub
    • HEXDEV-281: GLM - beta constraints with categorical variables fails with AIOOB
    • HEXDEV-280: GLM - gradient not within tolerance when specifying beta_constraints w/ and w/o prior values

    Python

    R

    System

    • PUBDEV-1423: Phantomjs : Add timeout command line option
    • PUBDEV-1401: Flow : Import file 15 M Rows 2.2K cols=> Parse these files => Change first column type => Unknown => Try to change other columns => Kind of hangs
    • PUBDEV-1406: make the ParseSetup / Parse API more efficient for high column counts GitHub

    Shannon (3.0.0.21) - 6/12/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/21/index.html

    New Features

    Python
    • HEXDEV-29: The ability to define features as categorical or continuous in the web UI and in the python API

    Enhancements

    Algorithms
    • GitHub: Made intercept option public and added it to field list in parameter schema
    • GitHub: GLM: Updated null model intercept fit.
    • GitHub: GLM: Updated null-model constant term fitting when running with offset
    • GitHub: glm update
    • GitHub: DL code refactoring to reduce file sizes
    Python
    • GitHub: add h2o.round() and h2o.signif() and additional pyunit checks
    • GitHub: add h2o.all() and respective pyunit checks
    R
    • GitHub: added intercept option to R
    System
    Web UI
    • GitHub: Add horizontal pagination of /Frames to handle UI navigation of wide datasets more efficiently.
    • GitHub: Only show the top 7 metrics for the max metrics table
    • GitHub: Make the max metrics table entries be called max f1 etc.

    Bug Fixes

    The following changes are to resolve incorrect software behavior:

    Algorithms
    • PUBDEV-1365: GLM: Buggy when likelihood equals infinity GitHub
    • PUBDEV-1394: GLM: Some offsets hang
    • PUBDEV-1268: GLM: get java.lang.AssertionError at hex.glm.GLM$GLMSingleLambdaTsk.compute2 for attached data
    • PUBDEV-1382: pca: giving wrong std dev for mentioned data
    • PUBDEV-1383: pca: std dev numbers differ for v2 and v3 for attached data GitHub
    • PUBDEV-1381: GBM, RF: get an NPE when run with a validation set with no response GitHub
    • GitHub: GLM fix - fixed fitting of null model constant term
    • GitHub: Fix remote bug
    • GitHub: Remove elastic averaging parameters from Flow.
    • PUBDEV-1398: pca: predictions on the attached data from v2 and v3 differ
    Python
    R
    • PUBDEV-761: Save model and restore model (from R)
    • PUBDEV-1236: h2o-r/tests/testdir_misc/runit_mergecat.R failure (client mode only)
    System
    • PUBDEV-1402: move Rapids to /99 since it's going to be in flux for a while GitHub
    • GitHub: Fixes an operator precedence issue, and replaces debug GA target with actual one.
    • GitHub: Fix log download bug where all nodes were getting the same zip file.

    Shannon (3.0.0.18) - 6/9/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/18/index.html

    New Features

    System
    Python
    • GitHub: Added --h2ojar option

    Enhancements

    Python
    • PUBDEV-277: Make python equivalent of as.h2o() work for numpy array and pandas arrays

    Bug Fixes

    Algorithms
    • PUBDEV-1371: pca: get java.lang.AssertionError at hex.svd.SVD$SVDDriver.compute2(SVD.java:198)
    • PUBDEV-1376: pca: predictions from h2o-3 and h2o-2 differs for attached data
    • PUBDEV-1380: DL: when try to access the training frame from the link in the dl model get: Object not found
    R

    Shannon (3.0.0.17) - 6/8/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/17/index.html

    New Features

    Algorithms
    Python
    • PUBDEV-1270: Python Interface needs H2O Cut Function GitHub
    • PUBDEV-1242: Need equivalent of as.Date feature in Python GitHub
    • PUBDEV-1165: H2O Python needs Modulus Operations
    • HEXDEV-29: The ability to define features as categorical or continuous in the web UI and in the python API
    • PUBDEV-1237: environment variable to disable the strict version check in the R and Python bindings
    Web UI
    • PUBDEV-1175: Flow: Good interactive confusion matrix for binomial
    • PUBDEV-1176: Flow: Good confusion matrix for multinomial

    Enhancements

    Algorithms
    • GitHub: GLM weights fix: regularize by sum of weights rather than number of observations
    • GitHub: GLM fix: added line search (and limited number of iterations) to constant term model fitting with offset (could enter infinite loop)
    • GitHub: No longer warn if binomial_double_trees option is enabled for _nclass!=2
    • GitHub: Fix CM table to have integer entries unless there are real-valued entries
    • GitHub: Add extra assertion for train_samples_per_iteration
    • GitHub: Update model during runtime of algorithm.
    • GitHub: Changes to glm forloop to add offsets and add NOPASS/NOFEATURE functionality back to run.py
    R
    • GitHub: month was off by one, runit test edited
    • GitHub: Comments to clarify the policy on dates in H2O.
    System
    • HEXDEV-344: Logs should include JVM launch parameters
    Web UI
    • PUBDEV-467: Show Frames for DL weights/biases in Flow
    • PUBDEV-1221: add a "I like this" style button with LinkedIn or Github (beside the Flow Assist Me button)
    • PUBDEV-1245: Flow: use new _exclude_fields query parameter to speed up REST API usage

    Bug Fixes

    Algorithms
    • PUBDEV-1353: GLM: model with weights different in R than in H2o for attached data
    • PUBDEV-1358: GLM: when run with negative weights, would be good to tell the user that negative weights are not allowed instead of throwing an exception
    • PUBDEV-1264: GLM: reporting incorrect null deviance GitHub
    • PUBDEV-1362: GLM: when run with weights and offset get wrong ans
    • PUBDEV-1263: GLM: name ordering for the coefficients is incorrect GitHub
    • PUBDEV-1261: pca: wrong std dev for data with nas rest numeric cols GitHub
    • PUBDEV-1218: pca: progress bar not showing progress just the initial and final progress status GitHub
    • PUBDEV-1204: pca: from flow when try to invoke build model, displays-ERROR FETCHING INITIAL MODEL BUILDER STATE
    • PUBDEV-1212: pca: with enum column reporting (some junk) wrong stdev/ rotation GitHub
    • PUBDEV-1228: pca: no std dev getting reported for attached data
    • PUBDEV-1233: pca: std dev for attached data differ when run on h2o-3 and h2o-2
    • PUBDEV-1258: h2o.glm with offset column: get Error in .h2o.startModelJob(conn, algo, params) : Offset column 'logInsured' not found in the training frame.
    R
    Sparkling Water
    System
    • PUBDEV-1288: Confusion Matrix: 'class java.lang.ArrayIndexOutOfBoundsException', with msg '2' java.lang.ArrayIndexOutOfBoundsException: 2 at hex.ConfusionMatrix.createConfusionMatrixHeader GitHub
    • HEXDEV-323: SVMLight Parse Bug GitHub
    • PUBDEV-1207: implement JSON field-filtering features: _exclude_fields
    • GitHub: Fix a missing field update in Job.
    • PUBDEV-65: Handling of strings columns in summary is broken
    • PUBDEV-1230: Parse: get AIOOB when parses the attached file with first two cols as enum while h2o-2 does fine
    • PUBDEV-1377: Get AIOOBE when parsing a file with fewer column names than columns GitHub
    • PUBDEV-1364: Variable importance Object
    Web UI
    • PUBDEV-1198: Flow: Selecting "Cancel" for "Load Notebook" prompt clears current notebook anyway
    • PUBDEV-1172: Model builder takes forever to load the column names in Flow, hence cannot build any models
    • PUBDEV-1248: Flow GLM: from Flow the drop down with column names does not show up and hence not able to select the offset column
    • PUBDEV-1380: DL: when try to access the training frame from the link in the dl model get: Object not found GitHub

    Shannon (3.0.0.13) - 5/30/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/13/index.html

    New Features

    Algorithms
    Python
    R

    Enhancements

    Algorithms
    API
    • PUBDEV-669: have the /Frames/{key}/summary API call Vec.startRollupStats
    R/Python
    • PUBDEV-479: Port MissingInserter to R/Python
    • PUBDEV-632: Display TwoDimTable of HitRatios in R/Python
    • github: minor change to h2o.demo()
    • github: add h2o.demo() facility to python package, along with some built-in (small) data
    • github: remove cols param

    Bug Fixes

    Algorithms
    • PUBDEV-1211: pca: descaled pca, std dev seems to be wrong for attached data github
    • PUBDEV-1213: pca: would be good to have the std dev numbered bec difficult to relate to the principal components (github)
    • PUBDEV-1201: pca: get ArrayIndexOutOfBoundsException (github)
    • PUBDEV-1203: pca: giving wrong std dev/rotation-labels for iris with species as enum (github)
    • PUBDEV-1199: DL with <1 epochs has wrong initial estimated time (github)
    • github: Fix missing AUC for training data in DL.
    • github: Add the seed back to GBM imbalanced test (was set to 0 by default before, now explicit)
    R
    • PUBDEV-1189: R: h2o.hist broken for breaks that is a list of the break intervals (github)
    • PUBDEV-1206: Frame summary from R and Python need to use the Frame summary endpoint (github)
    • PUBDEV-1177: R summary() is slow when large number of columns
    • PUBDEV-1097: R: R should be able to take a list of paths similar to how python does

    Shannon (3.0.0.11) - 5/22/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/11/index.html

    Enhancements

    Algorithms
    • PUBDEV-1179: DRF: investigate if larger seeds giving better models
    • PUBDEV-1178: Add logloss/AUC/Error to GBM/DRF Logs & ScoringHistory
    • PUBDEV-1169: Use only 1 tree for DRF binomial (github)
    • PUBDEV-1170: Wrong ROC is shown for DRF (Training ROC, even though Validation is given)
    • PUBDEV-1162: Speed up sorting of histograms with O(N log N) instead of O(N^2)
    System

    Bug Fixes

    Algorithms
    • HEXDEV-253: model output consistency
    • HEXDEV-319: DRF in h2o 3.0 is worse than in h2o 2.0 for Airline
    • PUBDEV-1180: DRF has wrong training metrics when validation is given
    API
    • PUBDEV-501: H2OPredict: does not complain when you build a model with one dataset and predict on completely different dataset
    Python
    • PUBDEV-1183: Python version check should fail hard by default
    • PUBDEV-1185: Python binding version mismatch check should fail hard and be on by default
    • HEXDEV-138: Port Python tests for Deep Learning

    R

    • PUBDEV-1160: R: h2o.hist doesn't support breaks argument
    • PUBDEV-1159: R: h2o.hist takes too long to run
    • PUBDEV-1150: R CMD Check: URLs not working
    • PUBDEV-1149: R CMD check not happy with our use of .OnAttach
    • PUBDEV-1174: R: h2o.hist FD implementation broken
    • PUBDEV-1167: R: h2o.group_by broken
    • HEXDEV-318: the fix to H2O startup for the host unreachable from R causes a security hole
    • PUBDEV-1187: FramesHandler.summary() needs to run summary on all Vecs concurrently.
    System
    • PUBDEV-862: Building a model without training file -> NPE
    • HEXDEV-315: importFile fails: Error in fromJSON(txt, ...) : unexpected character: A
    • PUBDEV-1137: Parse: upload and import gives different chunk compression on the same file
    • PUBDEV-1054: Parse: h2o parses arff file incorrectly
    • PUBDEV-1181: Rapids should queue and block on the back-end to prevent overlapping calls
    • PUBDEV-1184: importFile fails for paths containing spaces
    Web UI
    • PUBDEV-1182: Flow: when upload file fails, the control does not come back to the flow screen, and have to refresh the whole page to get it back
    • PUBDEV-1131: GBM crashes after calling getJobs in Flow

    Shannon (3.0.0.7) - 5/18/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/7/index.html

    Enhancements

    API
    • PUBDEV-711: take a final look at all REST API parameter names and help strings
    • PUBDEV-757: Rename DocsV1 + DocsHandler to MetadataV1 + MetadataHandler
    • PUBDEV-1138: Performance improvements for big data sets => getModels
    • PUBDEV-1126: Performance improvements for big data sets => Get frame summary
    System
    • HEXDEV-316: ImportFiles should not download files from HTTP
    Web UI

    Bug Fixes

    The following changes are to resolve incorrect software behavior:

    API
    • PUBDEV-501: H2OPredict: does not complain when you build a model with one dataset and predict on a completely different dataset
    • PUBDEV-1047: API : Get frames and Build model => takes long time to get frames
    • HEXDEV-149: Allow JobsV3 to return properly typed jobs, not always instances of JobV3
    • PUBDEV-1036: rename straggler V2 schemas to V3
    R
    System
    • PUBDEV-1034: Windows 7/8/2012 Multicast Error UDP
    • PUBDEV-862: Building a model without training file -> NPE
    • HEXDEV-253: model output consistency
    • PUBDEV-1135: While predicting get:class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.ArrayIndexOutOfBoundsException: 5
    • PUBDEV-1090: POJO: Models with "." in key name (ex. pros.glm) can't access pojo endpoint
    • PUBDEV-1077: Getting an IcedHashMap warning from H2O startup
    Web UI
    • PUBDEV-1133: getModels in Flow returns error
    • PUBDEV-926: Flow: When the user hits build model without specifying the training frame, it would be good if Flow guided the user. It presently shows an NPE message
    • PUBDEV-1131: GBM crashes after calling getJobs in Flow

    Shannon (3.0.0.2) - 5/15/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/2/index.html

    New Features

    ModelMetrics
    Web UI
    • PUBDEV-942: ModelMetrics by model category - Autoencoder

    Enhancements

    Algorithms
    • github: GLM update: skip lambda max during lambda search
    • github: removed higher accuracy option
    • github: Rename constant col parameter
    • github: GLM update: added stopping criteria to lbfgs, tweaked some internal constants in ADMM
    • github: Add support for ignore_const_col in DL
    Python
    • PUBDEV-852: Binomial: show per-metric-optimal CM and per-threshold CM in Python
    • github: add filterNACols to python
    • github: h2o.delete replaced with h2o.removeFrameShallow
    • github: Add distribution summary to Python
    R
    • github: add filterNACols to R
    • github: explicitly set cols=TRUE for R style str on frames
    • github: enable faster str, bulk nlevels, bulk levels, bulk is.factor
    • github: Add optional blocking parameter to h2o.uploadFile
    System
    • PUBDEV-672: HTML version of the REST API docs should be available on the website
    • PUBDEV-827: class GenModel duplicates part of code of Model
    Web UI
    • HEXDEV-181: Flow: Handle deep features prediction input and output
    • github: removed use_all_factor_levels from glm flows

    Bug Fixes

    Algorithms
    • HEXDEV-302: AIOOBE during Prediction with DL (github)
    • github: glm fix: don't force in null model for lambda search with user given list of lambdas
    • github: Fix domain in glm scoring output for binomial
    • github: GLM Fix - fix degrees of freedom when running without intercept (+/-1)
    • github: GLM fix: make valid data info be clone of train data info (needs exactly the same categorical offsets, ignore unseen levels)
    • github: Fix glm scoring, fill in default domain {0,1} for binary columns when scoring
    R
    • PUBDEV-1116: R: Parse that works from flow doesn't work from R using as.h2o
    • PUBDEV-798: R: String Munging Functions Missing
    • PUBDEV-584: R: hist() doesn't currently work for H2O objects
    • PUBDEV-820: H2oR: model objects should return the CM when run classification like h2o1
    • PUBDEV-1113: Remove Keys : Parse => Remove => doesn't complete
    • PUBDEV-1102: R: h2o.rbind fails to join two datasets together
    • PUBDEV-899: R: all doesn't work
    • PUBDEV-555: H2O-R: str does not work
    • PUBDEV-1110: H2OR: while printing a gbm model object, get invalid format '%d'; use format %f, %e, %g or %a for numeric objects
    • PUBDEV-903: R: Errors from some rapids calls seem to fail to return an error
    • HEXDEV-311: Performance bug from R with Expect: 100-continue
    • PUBDEV-1030: h2o.performance: ignores the user specified threshold
    • PUBDEV-1071: R: regression models don't show in print statement r2 but it exists in the model object
    • PUBDEV-1072: R: missing accessors for glm specific fields
    • PUBDEV-1032: After running some R and py demos when invoke a build model from flow get- rollup stats problem vec deleted error
    • PUBDEV-1069: R: missing implementation for h2o.r2
    • PUBDEV-1064: Passing sep="," to h2o.importFile() fails with '400 Bad Request'
    • PUBDEV-1092: Get NPE while predicting
    System
    • PUBDEV-1091: S3 gzip parse failure
    • PUBDEV-1081: Probably want to cleanly disable multicast (not retry) and print suggestion message, if multicast not supported on picked multicast network interface
    • PUBDEV-1112: User has no way to specify whether to drop constant columns
    • PUBDEV-1109: Change all extdata imports to uploadFile
    • PUBDEV-1104: .gz file parse exception from local filesystem
    Web UI
    • PUBDEV-1134: getPredictions in Flow returns error
    • PUBDEV-1020: Flow : Drop NA Cols enable => Should automatically populate the ignored columns
    • PUBDEV-1041: Flow GLM: formatting needed for the model parameter listing in the model object (github)
    • PUBDEV-1108: Flow: When predict on data with no response get :Error processing POST /3/Predictions/models/gbm-a179db76-ba96-420f-a643-0e166aea3af3/frames/subset_1 'undefined' is not an object (evaluating 'prediction.model')

    H2O-Dev

    Shackleford (0.2.3.6) - 5/8/15

    Download at: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-shackleford/6/index.html

    New Features

    Python

    Sparkling Water

    • Publish h2o-scala and h2o-app latest version to maven central (PUBDEV-443)

    Enhancements

    Algorithms
    • Use AUC's default threshold for label-making for binomial classifiers predict() (PUBDEV-1063) (github)
    • GLM update (github)
    • Cleanup AUC2, make incremental version (github)
    • Name change: override_with_best_model -> overwrite_with_best_model (github)
    • Couple of GLM updates (github)
    • Disable _replicate_training_data for data that's larger than 10GB (github)
    • Added replicate_training_data param for DL (github)
    • Change a few kmeans output parameters so no longer dividing by nrows or num_clusters (github)
    • GLMValidation Updated auc computation (github)
    • Do not delete model metrics at end of GBM/DRF (github)
    API
    • Clean REST api for Parse (PUBDEV-993)
    • Removes is_valid, invalid_lines, and domains from REST api (github)
    • Annotate domains output field as expert level (github)
    Python