Recent Changes

H2O

Wright (3.20.0.5) - 8/8/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/5/index.html

Bug

  • [PUBDEV-5543] - Hive smoke tests no longer time out on HDP.
  • [PUBDEV-5793] - AutoML now correctly ignores columns specified in Flow.
  • [PUBDEV-5794] - In Flow, the Import SQL Table button now works correctly.
  • [PUBDEV-5806] - XGBoost cross validation now works correctly.
  • [PUBDEV-5811] - Fixed an issue that caused AutoML to fail in Flow due to the keep_cross_validation_fold_assignment option.
  • [PUBDEV-5814] - Multinomial Stacked Ensemble no longer fails when either XGBoost or Naive Bayes is the base model.
  • [PUBDEV-5819] - Increased the client_disconnect_timeout value when ClientDisconnectCheckThread searches for connected clients.
  • [PUBDEV-5816] - Fixed an issue that caused XGBoost to generate the wrong metrics for multinomial cases.

Improvement

  • [PUBDEV-5813] - Added automated Flow test for AutoML.

Wright (3.20.0.4) - 7/31/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/4/index.html

Bug

  • [PUBDEV-5555] - In Flow, increased the height of the summary section for the column summary.
  • [PUBDEV-5720] - Cross-validation now works correctly in XGBoost.
  • [PUBDEV-5739] - Documentation for the MOJO predict functions (mojo_predict_pandas and mojo_predict_csv) is now available in the Python User Guide.
  • [PUBDEV-5744] - Regression comparison tests no longer fail between H2OXGBoost and native XGBoost.
  • [PUBDEV-5760] - GBM/DRF MOJO scoring no longer allocates unnecessary objects for each scored row.

New Feature

  • [PUBDEV-5736] - In GBM, added point estimation as a metric.

Improvement

  • [PUBDEV-5429] - The h2o.importFile([List of Directory Paths]) function will now import all the files located in the specified folders.
  • [PUBDEV-5637] - Added Standard Error of Mean (SEM) to Partial Dependence Plots.
  • [PUBDEV-5718] - Added two new formatting options to hex.genmodel.tools.PrintMojo. The --decimalplaces (or -d) option allows you to set the number of places after the decimal point. The --fontsize (or -f) option allows you to set the font size. The default font size is 14.
  • [PUBDEV-5733] - Optimized the performance of ingesting a large number of small Parquet files by using sequential parse.
  • [PUBDEV-5749] - Added support for weights in a calibration frame.
  • [PUBDEV-5752] - Added a new port_offset command. This parameter lets you specify the relationship of the API port ("web port") and the internal communication port. The previous implementation expected h2o port = api port + 1. Because there are assumptions in the code that the h2o port and API port can be derived from each other, we cannot fully decouple them. Instead, this new option lets the user specify an offset such that h2o port = api port + offset. This enables the user to move the communication port to a specific range, which can be firewalled.
  • [PUBDEV-5765] - Improved speed of ingesting data from HTTP/HTTPS data sources in standalone H2O.
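
The port relationship described for PUBDEV-5752 comes down to simple arithmetic, which the following minimal sketch illustrates. The helper name `internal_port` is hypothetical and not part of H2O, and the port numbers used with it are only examples.

```python
def internal_port(api_port: int, offset: int = 1) -> int:
    """Return the internal communication port for a given API (web) port.

    Illustrative only: mirrors the rule "h2o port = api port + offset".
    The default offset of 1 reflects the previous fixed relationship.
    """
    port = api_port + offset
    # Sanity check added for this sketch: stay within the valid TCP port range.
    if not 0 < port < 65536:
        raise ValueError(f"derived port {port} is outside the valid TCP range")
    return port
```

For example, `internal_port(54321, 100)` places the communication port 100 above the API port, which is the kind of shift that lets an administrator firewall a dedicated range.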

Docs

  • [PUBDEV-5694] - The User Guide now specifies that XLS/XLSX files must be BIFF 8 format. Other formats are not supported.
  • [PUBDEV-5731] - Added to docs that when downloading MOJOs/POJOs, users must specify the entire path and not just the relative path.
  • [PUBDEV-5774] - Added documentation for the new port_offset command when starting H2O.

Wright (3.20.0.3) - 7/10/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/3/index.html

Bug

  • [PUBDEV-5353] - The `fold_column` option now works correctly in XGBoost.
  • [PUBDEV-5560] - Calling `describe` on an empty H2O frame no longer results in an error in Python.
  • [PUBDEV-5576] - In XGBoost, when performing a grid search from Flow, the correct cross validation AUC score is now reported back.
  • [PUBDEV-5612] - Fixed an issue that caused XGBoost to fail with Tesla V100 drivers 70 and above and with CUDA 9.
  • [PUBDEV-5654] - H2O's XGBoost results no longer differ from native XGBoost when dmatrix_type="sparse".
  • [PUBDEV-5672] - In the R documentation, fixed the description for h2o.sum to state that this function indicates whether to return an H2O frame or one single aggregated sum.
  • [PUBDEV-5673] - H2O data import for Parquet files no longer fails on numeric decimalTypes.
  • [PUBDEV-5683] - Fixed an error that occurred when viewing the AutoML Leaderboard in Flow before the first model was completed.
  • [PUBDEV-5686] - When connecting to a Linux H2O Cluster from a Windows machine using Python, the `import_file()` function can now correctly locate the file on the Linux Server.
  • [PUBDEV-5692] - H2O now reports the project version in the logs.
  • [PUBDEV-5700] - In CoxPH, fixed an issue that caused training to fail to create JSON output when the dataset included too many features.
  • [PUBDEV-5707] - Users can now switch between edit and command modes on Scala cells.
  • [PUBDEV-5721] - Fixed an issue with the way that RMSE was calculated for cross-validated models.
  • [PUBDEV-5727] - In GLRM, fixed an issue that caused differences between the result of h2o.predict and MOJO predictions.

New Feature

  • [PUBDEV-5680] - Added a new `-report_hostname` flag that can be specified along with `-proxy` when starting H2O on Hadoop. When this flag is enabled, users can replace the IP address with the machine's host name when starting Flow.
  • [PUBDEV-5697] - Added support for the Amazon Redshift data warehouse.
  • [PUBDEV-5725] - Added support for CDH 5.9.

Task

  • [PUBDEV-5628] - Accessing secured (Kerberized) HDFS from a standalone H2O instance works correctly.
  • [PUBDEV-5656] - AutoML Python tests always use max models to avoid running out of time.
  • [PUBDEV-5682] - CoxPH now validates that a `stop_column` is specified. `stop_column` is a required parameter.
  • [PUBDEV-5688] - Fixed an issue that caused a GCS Exception to display when H2O was launched offline.

Improvement

  • [PUBDEV-5572] - In Flow, improved the display of the confusion matrix for multinomial cases.
  • [PUBDEV-5665] - Users will now see a Precision-Recall AUC when training binomial models.
  • [PUBDEV-5666] - Synchronous and Asynchronous Scala Cells are now allowed in H2O Flow.
  • [PUBDEV-5687] - H2O now autodetects string columns and skips them before calculating `groupby`. H2O also warns the user when this happens.
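
The behavior described for PUBDEV-5687 (detect string columns, skip them, and warn) can be sketched in plain Python. This is not H2O's implementation; the function name `group_sum`, the list-of-dicts row format, and the choice of a sum aggregate are all assumptions made for illustration.

```python
import warnings
from collections import defaultdict

def group_sum(rows, by, skip_types=(str,)):
    """Sum every non-key column per group, skipping string columns with a warning."""
    if not rows:
        return {}
    value_cols = [c for c in rows[0] if c != by]
    # A column is treated as a string column if any of its values is a string.
    skipped = [c for c in value_cols
               if any(isinstance(r[c], skip_types) for r in rows)]
    if skipped:
        warnings.warn(f"Skipping string columns in group-by: {skipped}")
    numeric_cols = [c for c in value_cols if c not in skipped]
    totals = defaultdict(lambda: dict.fromkeys(numeric_cols, 0))
    for r in rows:
        for c in numeric_cols:
            totals[r[by]][c] += r[c]
    return dict(totals)
```

The key column itself may be a string; only the columns being aggregated are checked.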

Docs

  • [PUBDEV-5424] - The h2o.mojo_predict_csv and h2o.mojo_predict_df functions now appear in the R HTML documentation.
  • [PUBDEV-5702] - In GLM, documented that the Poisson family uses the -log(maximum likelihood function) for deviance.
  • [PUBDEV-5710] - Fixed the R example in the "Replacing Values in a Frame" data munging topic. Columns and rows do not start at 0; R has a 1-based index.
  • [PUBDEV-5711] - Fixed the R example in the "Group By" data munging topic. Specify the "Month" column instead of the "NumberOfFlights" column when finding the number of flights in a given month based on origin.
  • [PUBDEV-5714] - Added the new `-report_hostname` flag to the list of Hadoop launch parameters.
  • [PUBDEV-5715] - Added Amazon Redshift to the list of supported JDBC drivers.
  • [PUBDEV-5726] - Added CDH 5.9 to the list of supported Hadoop platforms.

Wright (3.20.0.2) - 6/15/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/2/index.html

Bug

  • [PUBDEV-3950] - Fixed an issue that resulted in a null pointer exception for H2O ensembles.
  • [PUBDEV-5250] - In AutoML, ignored_columns are now passed in the API call when specifying both x and a fold_column during training.
  • [PUBDEV-5622] - Fixed a bug in documentation that incorrectly referenced 'calibrate_frame' instead of 'calibration_frame'.
  • [PUBDEV-5629] - java -jar h2o.jar no longer fails on Java 7.
  • [PUBDEV-5634] - Fixed a typo in the AutoML pydocs for sort_metric.
  • [PUBDEV-5651] - Exported CoxPH functions in R.

Task

  • [PUBDEV-5621] - Added balance_classes, class_sampling_factors, and max_after_balance_size options to AutoML in Flow.

Improvement

  • [PUBDEV-3754] - Updated the project URL, bug reports link, and list of authors in the h2o R package DESCRIPTION file.
  • [PUBDEV-5542] - Updated the description of the h2o R package in the DESCRIPTION file.
  • [PUBDEV-5570] - AutoML now produces an error message when a response column is missing.
  • [PUBDEV-5623] - Fixed intermittent test failures for AutoML.
  • [PUBDEV-5625] - Removed frame metadata calculation from AutoML.
  • [PUBDEV-5635] - Removed the keep_cross_validation_models = False argument from the AutoML User Guide examples.
  • [PUBDEV-5636] - Users can now set a MAX_CM_CLASSES parameter to set a maximum number of confusion matrix classes.

Docs

  • [PUBDEV-5619] - Updated the AutoML screenshot in Flow to show the newly added parameters.

Wright (3.20.0.1) - 6/6/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/1/index.html

Bug

  • [PUBDEV-4299] - In Scala, the `new H2OFrame()` API no longer fails when using http/https URL-based data sources.
  • [PUBDEV-4865] - Fixed an issue that caused the Java client JVM to get stuck with a latch/lock leak on the server.
  • [PUBDEV-5342] - Fixed an issue that caused intermittent NPEs in AutoML.
  • [PUBDEV-5357] - In parse, each lock now includes the owner rather than locking with null.
  • [PUBDEV-5359] - LDAP documentation now contains the correct name of the Auth module.
  • [PUBDEV-5426] - h2o.jar no longer includes a Jetty 6 dependency.
  • [PUBDEV-5462] - `model_summary` is now available when running Stacked Ensembles in R.
  • [PUBDEV-5478] - XGBoost now correctly respects the H2O `nthreads` parameter.
  • [PUBDEV-5488] - Fixed an invalid invariant in the recall calculation.
  • [PUBDEV-5497] - h2o-genmodel.jar can now be loaded into Spark's spark.executor.extraClassPath.
  • [PUBDEV-5501] - AutoML now correctly detects the leaderboard frame in H2O Flow.
  • [PUBDEV-5524] - In XGBoost, fixed an issue that resulted in a "Check failed: param.max_depth < 16 Tree depth too large" error.
  • [PUBDEV-5551] - Zero decimal values and NAs are now represented correctly in XGBoost.
  • [PUBDEV-5552] - Response variable datatype checks are now extended to include TIME datatypes.
  • [PUBDEV-5598] - The `-proxy` argument is now available as part of the h2odriver.args file.
  • [PUBDEV-5605] - Fixed `stopping_metric` values in the User Guide. Abbreviated values should be specified using uppercase characters (for example, MSE, RMSE, etc.).
  • [PUBDEV-5610] - Proxy Mode of h2odriver now supports a notification file (specified with the `-notify` argument).
  • [PUBDEV-5611] - Jetty 9 no longer fails in h2odriver proxy mode.
  • [PUBDEV-5617] - Fixed an issue that caused h2o.predict to throw an exception in H2OCoxPH models with interactions with stratum.

New Feature

  • [PUBDEV-3901] - Added MOJO support in Python (via jar file).
  • [PUBDEV-4927] - Added the `sort_metric` argument to AutoML.
  • [PUBDEV-4939] - Users now have the option to save CV predictions and CV models in AutoML.
  • [PUBDEV-4968] - Added an `h2o.H2OFrame.rename` method to rename columns in Python.
  • [PUBDEV-4991] - MOJO and POJO support are now available for AutoML.
  • [PUBDEV-5019] - Added support for the Cox Proportional Hazard (CoxPH) algorithm. Note that this is currently available in R and Flow only. It is not yet available in Python.
  • [PUBDEV-5177] - Added h2o.get_automl()/h2o.getAutoML function to R/Python APIs.
  • [PUBDEV-5377] - Added the `balance_classes`, `class_sampling_factors`, and `max_after_balance_size` arguments to AutoML.
  • [PUBDEV-5408] - When running GLM in Flow, users can now see the InteractionPairs option.
  • [PUBDEV-5424] - Added support for MOJO scoring on a CSV or data frame in R.
  • [PUBDEV-5452] - Added an "export model as MOJO" button to Flow for supported algorithms.
  • [PUBDEV-5520] - Added support for XGBoost MOJO deployment on Windows 10.
  • [PUBDEV-5529] - GBM and DRF MOJOs and POJOs now return leaf node assignments.
  • [PUBDEV-5599] - Added the `sort_metric` option to AutoML in Flow.
  • [PUBDEV-5600] - keep_cross_validation_predictions and keep_cross_validation_models are now available when running AutoML in Flow.
  • [PUBDEV-5615] - Deep Learning MOJO now extends Serializable.

Story

  • [PUBDEV-5398] - In CoxPH, when a categorical column is only used for a numerical-categorical interaction, the algorithm will enforce useAllFactorLevels for that interaction.

Task

  • [PUBDEV-4570] - When running AutoML and XGBoost, fixed an issue that caused the adapting test frame to be different than the train frame.
  • [PUBDEV-4826] - Removed Domain length check for Stacked Ensembles.
  • [PUBDEV-5058] - GLRM predict no longer generates different outputs when performing predictions on training and testing dataframes.
  • [PUBDEV-5368] - Added support for ingesting data from Hive2 using SQLManager (JDBC interface). Note that this is experimental and is not yet suitable for large datasets.

Improvement

  • [PUBDEV-4375] - Replaced the Jama SVD computation in PCA with netlib-java library MTJ.
  • [PUBDEV-4447] - Upgraded Jetty to Jetty 9.
  • [PUBDEV-4518] - Created more tests in AutoML to ensure that all fold_assignment values and fold_column work correctly.
  • [PUBDEV-4571] - Fixed an NPE that occurred when clicking the View button while running AutoML.
  • [PUBDEV-4581] - Bundled Windows XGBoost libraries.
  • [PUBDEV-4618] - Search-based models are no longer duplicated when AutoML is run again on the same dataset with the same seed.
  • [PUBDEV-4718] - When running Stacked Ensembles in R, added support for a vector of base_models in addition to a list.
  • [PUBDEV-4956] - Added support for Java 9.
  • [PUBDEV-5388] - Fixed an issue that resulted in an additional progress bar when running h2o.automl() in R.
  • [PUBDEV-5411] - Fixed an issue that resulted in an additional progress bar when running AutoML in Python.
  • [PUBDEV-5440] - The runint_automl_args.R test now always builds at least 2 models.
  • [PUBDEV-5459] - Improved XGBoost speed by not recreating DMatrix in each iteration (during training).
  • [PUBDEV-5476] - `offset_column` is now exposed in EasyPredictModelWrapper.
  • [PUBDEV-5477] - Improved single node XGBoost performance.
  • [PUBDEV-5486] - Added support for pip 10.0.0.
  • [PUBDEV-5495] - In GLM, gamma distribution with 0's in the response results in an improved message: "Response value for gamma distribution must be greater than 0."
  • [PUBDEV-5499] - Added metrics to AutoML leaderboard. Binomial models now also show mean_per_class_error, rmse, and mse. Multinomial problems now also show logloss, rmse and mse. Regression models now also show mse.
  • [PUBDEV-5533] - Exposed `model dump` in XGBoost MOJOs.
  • [PUBDEV-5538] - Improved rebalance for Frames.
  • [PUBDEV-5553] - Introduced the precise memory allocation algorithm for XGBoost sparse matrices.
  • [PUBDEV-5577] - Improved SSL documentation.
  • [PUBDEV-5601] - The Exclude Algorithms section in Flow AutoML is now always visible, even if you have not yet selected a training frame.
  • [PUBDEV-5606] - Removed unused parameters, fields, and methods from AutoML. Also exposed buildSpec in the AutoML REST API.

Docs

  • [PUBDEV-4977] - Updated documentation to indicate support for Java 9.
  • [PUBDEV-5154] - Added the new `pca_impl` parameter to PCA section of the user guide.
  • [PUBDEV-5164] - Added a Checkpointing Models section to the User Guide. This describes how checkpointing works for each supported algorithm.
  • [PUBDEV-5401] - In the "Getting Data into H2O" section, added a link to the new Hive JDBC demo.
  • [PUBDEV-5407] - The Import File example now also shows how to import from HDFS.
  • [PUBDEV-5436] - Fixed markdown headings in the example Flows.
  • [PUBDEV-5474] - All installation examples use H2O version 3.20.0.1.
  • [PUBDEV-5494] - Added a "Data Manipulation" topic for target encoding in R.
  • [PUBDEV-5496] - Added new keep_cross_validation_models and keep_cross_validation_predictions options to the AutoML documentation.
  • [PUBDEV-5509] - Added an example of using XGBoost MOJO with Maven.
  • [PUBDEV-5513] - In the XGBoost chapter, added information describing how to disable XGBoost.
  • [PUBDEV-5554] - When running XGBoost on Hadoop, added a note that users should set -extramempercent to a much higher value.
  • [PUBDEV-5579] - Added a section for the CoxPH (Cox Proportional Hazards) algorithm.
  • [PUBDEV-5581] - Added a topic describing how to install H2O-3 from the Google Cloud Platform offering.

Wolpert (3.18.0.11) - 5/24/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/11/index.html

New Feature

  • [PUBDEV-5584] - Enabled Java 10 support for CRAN release.

Wolpert (3.18.0.10) - 5/22/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/10/index.html

Bug

  • [PUBDEV-5558] - Fixed an issue for adding Double.NaN to IntAryVisitor via addValue().

Task

  • [PUBDEV-5559] - Removed all code that referenced Google Analytics.
  • [PUBDEV-5565] - Disabled version check in H2O-3.
  • [PUBDEV-5567] - Removed all Google Analytics references and code from Flow.
  • [PUBDEV-5568] - Removed all Google Analytics references and code from Documentation.

Docs

  • [PUBDEV-5545] - The Security chapter in the User Guide now describes how to enforce system-level command-line arguments in h2odriver when starting H2O.

Wolpert (3.18.0.9) - 5/11/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/9/index.html

Bug

  • [PUBDEV-5290] - Fixed an issue that caused distributed XGBoost to not be registered in the REST API.
  • [PUBDEV-5325] - Fixed an issue that caused XGBoost to crash due to "too many open files."
  • [PUBDEV-5444] - Frames are now rebalanced correctly on multinode clusters.
  • [PUBDEV-5464] - Fixed an issue that prevented H2O libraries from loading in DBC.
  • [PUBDEV-5507] - Added more robust checks for Colorama version.
  • [PUBDEV-5510] - Added more robust checks for Colorama version in H2O Python client.
  • [PUBDEV-5518] - A response column is no longer required when performing Deep Learning grid search with autoencoder enabled.
  • [PUBDEV-5527] - Fixed a KeyV3 error message that incorrectly referenced KeyV1.
  • [PUBDEV-5544] - The external backend now stores sparse vector values correctly.

New Feature

  • [PUBDEV-5456] - Added a new rank_within_group_by function in R and Python for ranking groups and storing the ranks in a new column.
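
The idea behind PUBDEV-5456, ranking rows within each group and storing the rank in a new column, can be sketched without H2O. The function below is a hypothetical stand-in, not the `rank_within_group_by` API; its tie-breaking and its handling of missing values are assumptions of this sketch.

```python
from itertools import groupby

def rank_within_group(rows, group_col, sort_col, new_col="rank"):
    """Add a 1-based rank per group, ordered by sort_col ascending.

    Assumptions (not taken from H2O): ties get consecutive ranks in
    input order, and rows whose sort_col is None receive rank None.
    """
    ranked = [dict(r) for r in rows]  # copy so the input is untouched
    for r in ranked:
        r[new_col] = None
    # Stable sort keeps input order among ties.
    keyed = sorted(
        (r for r in ranked if r[sort_col] is not None),
        key=lambda r: (r[group_col], r[sort_col]),
    )
    for _, members in groupby(keyed, key=lambda r: r[group_col]):
        for rank, r in enumerate(members, start=1):
            r[new_col] = rank
    return ranked
```

The output keeps the original row order, so the new column can be read alongside the existing ones.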

Improvement

  • [PUBDEV-5500] - Improved warning messages in AutoML.
  • [PUBDEV-5537] - System administrators can now create a configuration file with implicit arguments of h2odriver and use it to make sure the h2o cluster is started with proper security settings.

Wolpert (3.18.0.8) - 4/19/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/8/index.html

Task

Wolpert (3.18.0.7) - 4/14/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/7/index.html

Bug

  • [PUBDEV-5485] - Fixed a MOJO/POJO scoring issue caused by a serialization bug in EasyPredictModelWrapper.

Wolpert (3.18.0.6) - 4/13/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/6/index.html

Bug

  • [PUBDEV-5484] - In XGBoost, fixed a memory issue that caused training to fail even when running on small datasets.
  • [PUBDEV-5441] - Files that contain a Ctrl-M character within row data, and that also use Ctrl-M to signify the end of a line, are now parsed correctly.
  • [PUBDEV-5458] - H2O-3 no longer displays the server version in HTTP response headers.
  • [PUBDEV-5460] - Updated the Mockito library.

Task

  • [PUBDEV-5449] - Conda packages are now available on S3, enabling installation for users who cannot access anaconda.org.

Improvement

  • [PUBDEV-5473] - Added an offset to predictBinomial Easy wrapper.

Docs

  • [PUBDEV-5227] - Updated the AutoML chapter of the User Guide to include a link to H2O World AutoML Tutorials and updated code examples that do not use leaderboard_frame.
  • [PUBDEV-5457] - Fixed links to POJO/MOJO tutorials in the GBM FAQ > Scoring section.

Wolpert (3.18.0.5) - 3/28/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/5/index.html

Bug

  • [PUBDEV-4933] - AutoML no longer trains a Stacked Ensemble with only one model.
  • [PUBDEV-5028] - GBM and GLM grids no longer fail in AutoML for multinomial problems.
  • [PUBDEV-5266] - Users can now merge/sort frames that contain string columns.
  • [PUBDEV-5303] - Fixed an issue that occurred with multinomial GLM POJO/MOJO models.
  • [PUBDEV-5334] - Users can no longer specify a value of 0 for the col_sample_rate_change_per_level parameter. The value for this parameter must be greater than 0 and <= 2.0.
  • [PUBDEV-5336] - The H2O-3 Python client no longer returns an incorrect answer when running a conditional statement.
  • [PUBDEV-5365] - Added support for CDH 5.14.
  • [PUBDEV-5366] - Fixed an issue that caused XGBoost to fail when running the airlines dataset on a single-node H2O cluster.
  • [PUBDEV-5370] - The H2O-3 parser can now handle utf-8 characters that appear in the header.
  • [PUBDEV-5394] - The H2O-3 parser no longer treats the "Ctrl-M" character as an end of line on Linux.
  • [PUBDEV-5414] - H2O no longer generates a warning when predicting without a weights column.

New Feature

  • [PUBDEV-5402] - The AutoML leaderboard no longer prints NaNs for non-US locales.

Task

  • [PUBDEV-5235] - Added a demo of XGBoost in Flow.
  • [PUBDEV-5386] - Improved the ordinal regression parameter optimization by changing the implementation.

Improvement

  • [PUBDEV-3978] - In Flow, improved the vertical scrolling for training and validation metrics for thresholds.
  • [PUBDEV-5364] - Added more logging regarding the WatchDog client.
  • [PUBDEV-5383] - Replaced unknownCategoricalLevelsSeenPerColumn with ErrorConsumer events in POJO log messages.
  • [PUBDEV-5400] - Improved the logic that triggers rebalance.
  • [PUBDEV-5404] - AutoML now uses correct datatypes in the AutoML leaderboard TwoDimTable.

Docs

  • [PUBDEV-5292] - Added ``beta constraints`` and ``prior`` entries to the Parameters Appendix, along with examples in R and Python.
  • [PUBDEV-5369] - Added CDH 5.14 to the list of supported Hadoop platforms in the User Guide.
  • [PUBDEV-5413] - Updated the documentation for the Ordinal ``family`` option in GLM based on the new implementation. Also added new solvers to the documentation: GRADIENT_DESCENT_LH and GRADIENT_DESCENT_SQERR.
  • [PUBDEV-5416] - Added information about Extremely Randomized Trees (XRT) to the DRF chapter in the User Guide.
  • [PUBDEV-5421] - On the H2O-3 and Sparkling Water download pages, the link to documentation site now points to the most updated version.
  • [PUBDEV-5432] - The ``target_encode_create`` and ``target_encode_apply`` functions are now included in the R HTML documentation.

Fault

  • [PUBDEV-5367] - Fixed an issue that caused SQLManager import to break on clusters with over 100 nodes.

Wolpert (3.18.0.4) - 3/8/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/4/index.html

  • Fixed a minor release process issue that prevented the Sparkling Water release.

Wolpert (3.18.0.3) - 3/2/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/3/index.html

Bug

  • [PUBDEV-5102] - In Flow, the metalearner_fold_column option now correctly displays a drop-down of column names.
  • [PUBDEV-5282] - Fixed an issue that caused data import and model building to fail when using Flow in IE 11.1944 on Windows 10 Enterprise.
  • [PUBDEV-5323] - Stacked Ensemble no longer fails when using a grid or list of GLMs as the base models.
  • [PUBDEV-5330] - Fixed an issue that caused an error during Parquet data ingest.
  • [PUBDEV-5335] - In Random Forest, added back the distribution and offset_column options for backward compatibility. Note that these options are deprecated and will be ignored if used.
  • [PUBDEV-5339] - MOJO export to a file now works correctly.
  • [PUBDEV-5343] - Fixed an NPE that occurred when checking if a request is Xhr.

New Feature

  • [PUBDEV-5008] - Added support for ordinal regression in GLM. This is specified using the `family` option.
  • [PUBDEV-5274] - Added the exclude_algos option to AutoML in Flow.
  • [PUBDEV-5308] - Added a Leave-One-Out Target Encoding option to the R API. This can help improve supervised learning results when there are categorical predictors with high cardinality. Note that a similar function for Python will be available at a later date.
  • [PUBDEV-5324] - POJO now logs error messages for all incorrect data types and includes default values rather than NULL when a data type is unexpected.

Improvement

  • [PUBDEV-5344] - Moved AutoML to the top of the Model menu in Flow.

Docs

  • [PUBDEV-5306] - In the GLM chapter, added Ordinal to the list of `family` options. Also added Ologit, Oprobit, and Ologlog to the list of `link` options, which can be used with the Ordinal family.

Wolpert (3.18.0.2) - 2/20/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/2/index.html

Bug

  • [PUBDEV-5301] - Distributed XGBoost no longer fails silently when expanding a 4G dataset on a 1TB cluster.
  • [PUBDEV-5254] - Fixed an issue that caused GLM Multinomial to not work properly.
  • [PUBDEV-5278] - In XGBoost, when the first domain of a categorical is parseable as an Int, the remaining columns are not automatically assumed to also be parseable as an Int. As a result of this fix, the default value of categorical_encoding in XGBoost is now AUTO rather than label_encoder.
  • [PUBDEV-5294] - Fixed an issue that caused XGBoost models to fail to converge when an unknown decimal separator existed.
  • [PUBDEV-5326] - Fixed an issue in ParseTime that led to parse failing.

Docs

  • [PUBDEV-5313] - In the User Guide, the default value for categorical_encoding in XGBoost is now AUTO rather than label_encoder.

Wolpert (3.18.0.1) - 2/12/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/1/index.html

Bug

  • [PUBDEV-4585] - Fixed an issue that caused XGBoost binary save/load to fail.
  • [PUBDEV-4593] - Fixed an issue that caused a Levenshtein Distance Normalization Error. Levenshtein distance is now implemented directly in H2O.
  • [PUBDEV-5112] - The Word2Vec Python API for pretrained models no longer requires a training frame. In addition, a new `from_external` option was added, which creates a new H2OWord2vecEstimator based on an external model.
  • [PUBDEV-5128] - Fixed an issue in which the show function of the metrics base class did not check for the custom_metric_name key and raised an exception.
  • [PUBDEV-5129] - The fold column in Kmeans is no longer required to be in x.
  • [PUBDEV-5130] - The date is now parsed correctly when parsed from H2O-R.
  • [PUBDEV-5133] - In Flow, the scoring history plot is now available for GLM models.
  • [PUBDEV-5135] - The Parquet parser no longer fails if one of the files to parse has no records.
  • [PUBDEV-5145] - Added error checking and logging on all the uses of `water.util.JSONUtils.parse()`.
  • [PUBDEV-5155] - In AutoML, fixed an exception in Python binding that occurred when the leaderboard was empty.
  • [PUBDEV-5156] - In AutoML, fixed an exception in R binding that occurred when the leaderboard was empty.
  • [PUBDEV-5159] - Removed Pandas dependency for AutoML in Python.
  • [PUBDEV-5167] - In PySparkling, reading Parquet/Orc data with time type now works correctly in H2O.
  • [PUBDEV-5174] - Fixed a maximum recursion depth error when using `isin` in the H2O Python client.
  • [PUBDEV-5175] - When running getJobs in Flow, fixed a ClassNotFoundException that occurred when AutoML jobs existed.
  • [PUBDEV-5179] - Fixed an issue that caused a list of columns to be truncated in PySparkling. Light endpoint now returns all columns.
  • [PUBDEV-5186] - In AutoML, fixed a deadlock issue that occurred when two AutoML runs came in the same second, resulting in matching timestamps.
  • [PUBDEV-5191] - The offset_column and distribution parameters are no longer available in Random Forest.
  • [PUBDEV-5195] - Fixed an issue in XGBoost that caused MOJOs to fail to work without manually adding the Commons Logging dependency.
  • [PUBDEV-5203] - Fixed an issue that caused XGBoost to mangle the domain levels for datasets that have string response domains.
  • [PUBDEV-5213] - In Flow, the separator drop down now shows 3-digit decimal values instead of 2.
  • [PUBDEV-5215] - Users can now specify interactions when running GLM in Flow.
  • [PUBDEV-5228] - FrameMetadata code no longer uses hardcoded keys. Also fixed an issue that caused AutoML to fail when multiple AutoMLs are run simultaneously.
  • [PUBDEV-5229] - A frame can potentially have a null key. If there is a Frame with a null key (just a container for vecs), H2O no longer attempts to track a null key.
  • [PUBDEV-5256] - Users can now successfully build an XGBoost model as compile chain. XGBoost no longer fails to provide the compatible artifact for an Oracle Linux environment.
  • [PUBDEV-5265] - GLM no longer fails when a categorical column exists in the dataset along with an empty value on at least one row.
  • [PUBDEV-5286] - Fixed an issue that caused GBM grid to fail on some datasets when specifying `sample_rate` in the grid.
  • [PUBDEV-5287] - The x argument is no longer required when performing a grid search.
  • [PUBDEV-5297] - Fixed an issue that caused the Parquet parser to fail on Spark 2.0 (SW-707).
  • [PUBDEV-5315] - Fixed an issue that caused XGBoost OpenMP to fail on Ubuntu 14.04.

New Feature

  • [PUBDEV-4111] - Added support for INT96 timestamp to the Parquet parser.
  • [PUBDEV-4652] - Added support for XGBoost multinode training in H2O. Note that this is still a BETA feature.
  • [PUBDEV-4980] - Users can now specify a list of algorithms to exclude during an AutoML run. This is done using the new `exclude_algos` parameter.
  • [PUBDEV-5204] - In GLM, users can now specify a list of interaction terms to include when building a model instead of relying on the default action of including all interactions.
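
To illustrate the difference described in PUBDEV-5204, the sketch below enumerates the pairwise terms that an "include all interactions" default would produce and compares them with an explicit list. It is plain Python rather than the H2O API; the column names and the helper `all_pairwise_interactions` are hypothetical.

```python
from itertools import combinations


def all_pairwise_interactions(columns):
    """Return every unordered pair of distinct columns."""
    return list(combinations(columns, 2))


# Hypothetical predictor names, used only for illustration.
columns = ["age", "income", "region"]

# What "include all interactions" amounts to for three columns.
default_terms = all_pairwise_interactions(columns)

# An explicit list keeps only the pairs the user actually wants.
explicit_terms = [("age", "income")]

print(default_terms)
print(explicit_terms)
```

With many columns the number of default pairs grows quadratically, which is the practical reason to name pairs explicitly.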

Task

  • [PUBDEV-5230] - The Python PCA code examples in GitHub and in the User Guide now use the h2o.estimators.pca.H2OPrincipalComponentAnalysisEstimator method instead of the h2o.transforms.decomposition.H2OPCA method.
  • [PUBDEV-5251] - Upgraded the XGBoost version. This now supports RHEL 6.

Improvement

  • [PUBDEV-5086] - Stacked Ensemble allows you to specify the metalearning algorithm to use when training the ensemble. When an algorithm is specified, Stacked Ensemble runs with the specified algorithm's default hyperparameter values. The new ``metalearner_params`` option allows you to pass in a dictionary/list of hyperparameters to use for that algorithm instead of the defaults.
  • [PUBDEV-5224] - Users can now specify a seed parameter in Stacked Ensemble.
  • [PUBDEV-5310] - Documented clouding behavior of an H2O cluster. This is available at https://github.com/h2oai/h2o-3/blob/master/h2o-docs/devel/h2o_clouding.rst.

Docs

  • [PUBDEV-5149] - Updated the documentation to indicate that datetime parsing from R and Flow is now UTC by default.
  • [PUBDEV-5151] - R documentation on docs.h2o.ai is now available in HTML format.
  • [PUBDEV-5172] - Added a new Cloud Integration topic for using H2O with AWS.
  • [PUBDEV-5221] - In the XGBoost chapter, added that XGBoost in H2O supports multicore.
  • [PUBDEV-5242] - Added `interaction_pairs` to the list of GLM parameters.
  • [PUBDEV-5283] - Added `metalearner_algorithm` and `metalearner_params` to the Stacked Ensembles chapter.
  • [PUBDEV-5311] - The H2O-3 download site now includes a link to the HTML version of the R documentation.
  • [PUBDEV-5312] - Updated the XGBoost documentation to indicate that multinode support is now available as a Beta feature.
  • [PUBDEV-5314] - Added the seed parameter to the Stacked Ensembles section of the User Guide.

Wheeler (3.16.0.4) - 1/15/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/4/index.html

Bug

  • [PUBDEV-5206] - Fixed several client deadlock issues.
  • [PUBDEV-5212] - When verifying that a supported version of Java is available, H2O no longer checks for version 1.6.
  • [PUBDEV-5216] - The H2O-3 download site has an updated link for the Sparkling Water README.
  • [PUBDEV-5220] - In Aggregator, fixed the way that a created mapping frame is populated.

New Feature

  • [PUBDEV-5209] - XGBoost can now be used in H2O on Hadoop with a single node.

Improvement

  • [PUBDEV-5210] - Deep Water is disabled in AutoML.
  • [PUBDEV-5211] - This release of H2O includes an upgraded XGBoost version.

Wheeler (3.16.0.3) - 1/8/2018

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/3/index.html

    Technical task

    • [PUBDEV-5184] - H2O-3 now allows definition of custom function directly in Python notebooks and enables iterative updates on defined functions.

    Bug

    • [PUBDEV-4863] - When a frame name includes numbers followed by alphabetic characters (for example, "250ML"), Rapids no longer parses the frame name as two tokens.
    • [PUBDEV-4897] - Fixed an issue that caused Partial Dependence Plots to use a different order of categorical values after calling as.factor.
    • [PUBDEV-5148] - Added support for CDH 5.13.
    • [PUBDEV-5180] - Fixed an issue that caused a Python 2 timestamp to be interpreted as two tokens.
    • [PUBDEV-5196] - Aggregator supports categorical features. Fixed a discrepancy in the Aggregator documentation.
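
The frame-name problem in PUBDEV-4863 is a common lexer pitfall: if a tokenizer tries to match a number first, a name such as "250ML" splits after the digits. The toy lexers below reproduce that behavior and one possible remedy. This is a minimal sketch and not the Rapids parser itself; the patterns `NAIVE` and `FIXED` and the helper `tokenize` are hypothetical.

```python
import re

# Number is tried first, so "250ML" is cut into "250" and "ML".
NAIVE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*")

# A word run containing at least one letter or underscore is tried first,
# so "250ML" stays whole; pure numbers still match the second alternative.
FIXED = re.compile(r"\w*[A-Za-z_]\w*|\d+(?:\.\d+)?")


def tokenize(pattern, text):
    """Return all tokens the pattern finds in text, left to right."""
    return pattern.findall(text)


print(tokenize(NAIVE, "250ML"))
print(tokenize(FIXED, "250ML"))
```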

    New Feature

    • [PUBDEV-4622] - In GBM, users can now specify quasibinomial distribution.
    • [PUBDEV-4965] - H2O-3 now supports the Netezza JDBC driver.

    Improvement

    • [PUBDEV-5171] - Users can now optionally export the mapping of rows in an aggregated frame to that of the original raw data.

    Docs

    • [PUBDEV-5120] - When using S3/S3N, revised the documentation to recommend that S3 should be used for data ingestion, and S3N should be used for data export.
    • [PUBDEV-5150] - The H2O User Guide has been updated to indicate support for CDH 5.13.
    • [PUBDEV-5162] - Updated the Anaconda section with information specifically for Python 3.6 users.
    • [PUBDEV-5178] - The H2O User Guide has been updated to indicate support for the Netezza JDBC driver.
    • [PUBDEV-5190] - Added "quasibinomial" to the list of `distribution` options in GBM.
    • [PUBDEV-5192] - Added the new `save_mapping_frame` option to the Aggregator documentation.

    Wheeler (3.16.0.2) - 11/30/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/2/index.html

    Bug

  • [PUBDEV-5115] - In AutoML, fixed an issue that caused the leaderboard_frame to be ignored when nfolds > 1.
  • [PUBDEV-5117] - Improved the warning that displays when mismatched jars exist.
  • [PUBDEV-5126] - The correct H2O version now displays in setup.py for sdist.

Improvement

  • [PUBDEV-5111] - Incorporated final improvements to the Sparkling Water booklet.
  • [PUBDEV-5127] - Automated Anaconda releases.
  • [PUBDEV-5131] - This version of H2O introduces light REST endpoints for obtaining frames in the Python client.

Wheeler (3.16.0.1) - 11/24/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/1/index.html

    Technical Task

    • [PUBDEV-5087] - A backend Java API is now available for custom evaluation metrics.

    Bug

    • [PUBDEV-1465] - Users can now save models to and download models from S3.
    • [PUBDEV-3567] - When running h2o.merge in the R client, the status line indicator will no longer return quickly. Users can no longer enter new commands until the merge process is completed.
    • [PUBDEV-4172] - In the R client strings, the description of training_frame no longer states that it is an optional parameter.
    • [PUBDEV-4672] - The H2OFrame.mean method now works in Python 3.6.
    • [PUBDEV-4697] - Early stopping now works with perfectly predictive data.
    • [PUBDEV-4727] - h2o.group_by now works correctly when specifying a median() value.
    • [PUBDEV-4778] - In XGBoost, fixed an issue that caused prediction on a dataset without a response column to return an error.
    • [PUBDEV-4853] - When running AutoML in Flow, users can now specify a project name.
    • [PUBDEV-4857] - h2odriver in proxy mode now correctly forwards the authentication headers to the H2O node.
    • [PUBDEV-4900] - H2O can ingest Parquet 1.8 files created by Spark.
    • [PUBDEV-4906] - Loading models and exporting models to/from AWS S3 now works correctly.
    • [PUBDEV-4907] - Fixed an issue that caused binary model imports and exports from/to S3 to fail.
    • [PUBDEV-4930] - Users can now load data from s3n resources after setting core-site.xml correctly.
    • [PUBDEV-4953] - Fixed an error that occurred when exporting data to s3.
    • [PUBDEV-4985] - Fixed an issue that caused H2O to "forget" that a column is of factor type if it contains only NA values.
    • [PUBDEV-4996] - The download instructions for Python now indicate that version 3.6 is supported.
    • [PUBDEV-5002] - In Flow, fixed an issue with retaining logs from the client node.
    • [PUBDEV-5003] - H2O can now handle the case where the node is the client and the md5 should be ignored.
    • [PUBDEV-5005] - h2o.residual_deviance now works correctly.
    • [PUBDEV-5017] - h2o.predict no longer returns an error when the user does not specify an offset_column.
    • [PUBDEV-5033] - Fixed an issue with Spark string chunks.
    • [PUBDEV-5037] - Logs now display correctly on Hadoop, and downloaded logs no longer give an empty folder when the cluster is up.
    • [PUBDEV-5038] - Added an option for handling empty strings. If compare_empty is set to FALSE, empty strings will be handled as NaNs.
    • [PUBDEV-5040] - HTTP logs can now be obtained in Flow UI.
    • [PUBDEV-5048] - Fixed an issue with the progress bar that occurred when running PySparkling + DataBricks.
    • [PUBDEV-5067] - Fixed reporting of clients with the wrong md5.
    • [PUBDEV-5070] - In the R and Python clients, updated the strings for max_active_predictors to indicate that the default is now 5000.
    • [PUBDEV-5072] - h2o.merge now works correctly for one-to-many when all.x=TRUE.
    • [PUBDEV-5074] - Fixed an issue that caused GLM predict to fail when a weights column was not specified.
    • [PUBDEV-5081] - Reduced the number of URLs that get sent to Google Analytics.
    • [PUBDEV-5095] - When building a Stacked Ensemble model, the fold_column from AutoML is now piped through to the stacked ensemble.
    • [PUBDEV-5096] - Fixed an issue that caused GLM scoring to produce incorrect results for sparse data.

    Epic

    • [PUBDEV-4684] - This version of H2O includes support for Python 3.6.

    New Feature

    • [PUBDEV-3877] - MOJOs are now supported for Stacked Ensembles.
    • [PUBDEV-3743] - Users can now specify the metalearner algorithm type that StackedEnsemble should use. This can be AUTO, GLM, GBM, DRF, or Deep Learning.
    • [PUBDEV-3971] - Added a metalearner_folds option in Stacked Ensembles, enabling cross validation.
    • [PUBDEV-4085] - In GBM, endpoints are now exposed that allow for custom evaluation metrics.
    • [PUBDEV-4882] - When running AutoML through the Python or R clients, users can now specify the nfolds argument.
    • [PUBDEV-4891] - AutoML now builds an additional Stacked Ensemble that uses the top model for each algorithm.
    • [PUBDEV-5071] - The AutoML leaderboard now uses cross-validation metrics (new default).
    • [PUBDEV-4914] - K-Means POJOs and MOJOs now expose distances to cluster centers.
    • [PUBDEV-4957] - Multiclass stacking is now supported in AutoML. Removed the check that caused AutoML to skip stacking for multiclass.
    • [PUBDEV-5043] - Users can now specify a number of folds when running AutoML in Flow.
    • [PUBDEV-5084] - Added a metalearner_fold_column option in Stacked Ensembles, allowing for custom folds during cross validation.
    • [PUBDEV-4994] - The Aggregator Function is now exposed in the R client.
    • [PUBDEV-4995] - The Aggregator Function is now available in the Python client.

    Story

    Task

    • [PUBDEV-4803] - The current version of h2o-py is now published into PyPi.
    • [PUBDEV-4896] - Changed the behavior of auto-generation of validation and leaderboard frames in AutoML.
    • [PUBDEV-4931] - Updated the download site and the end user documentation to indicate that Python3.6 is now supported.
    • [PUBDEV-4935] - PyPi/Anaconda descriptors now indicate support for Python 3.6.

    Improvement

    • [PUBDEV-4791] - Enabled the lambda search for the GLM metalearner in Stacked Ensembles. This is set to TRUE and early_stopping is set to FALSE.
    • [PUBDEV-4831] - Running `pip install` now installs the latest version of H2O-3.
    • [PUBDEV-4963] - In EasyPredictModelWrapper, preamble(), predict(), and fillRawData() are now protected rather than private.
    • [PUBDEV-5082] - MOJOs/POJOs will not be created for unsupported categorical_encoding values.
    • [PUBDEV-5109] - An AutoML run now outputs two StackedEnsemble model IDs. These are labeled StackedEnsemble_AllModels and StackedEnsemble_BestOfFamily.

    Docs

    • [PUBDEV-4298] - In the Data Manipulation chapter, added a topic for pivoting tables.
    • [PUBDEV-4662] - Added a topic to the Data Manipulation chapter describing the h2o.fillna function.
    • [PUBDEV-4747] - Added MOJO and POJO Quick Start sections directly into the Productionizing H2O chapter. Previously, this chapter included links to quick start files.
    • [PUBDEV-4810] - In the GBM booklet when describing nbins_cat, clarified that factors rather than columns get grouped together.
    • [PUBDEV-4816] - The description for the GLM lambda_max option now states that this is the smallest lambda that drives all coefficients to zero.
    • [PUBDEV-4833] - Updated the installation instructions for PySparkling.
    • [PUBDEV-4864] - Clarified that in H2O-3, sampling is without replacement.
    • [PUBDEV-4878] - Updated documentation to state that multiclass classification is now supported in Stacked Ensembles.
    • [PUBDEV-4879] - Updated documentation to state that multiclass stacking is now supported in AutoML.
    • [PUBDEV-4895] - Added an Early Stopping section to the Algorithms > Common chapter.
    • [PUBDEV-4945] - Added a note in Word2vec stating that binary format is not supported.
    • [PUBDEV-4946] - In the Parameters Appendix, updated the description for histogram_type=random.
    • [PUBDEV-4958] - In the Using Flow > Models > Run AutoML section, updated the AutoML screenshot to show the new Project Name field.
    • [PUBDEV-4971] - Added a Sorting Columns data munging topic describing how to sort a data frame by column or columns.
    • [PUBDEV-5000] - In KMeans, updated the list of model summary statistics and training metrics that are outputted.
    • [PUBDEV-5011] - Removed SortByResponse from the list of categorical_encoding options for Aggregator and K-Means.
    • [PUBDEV-5026] - Updated the Sparkling Water links on docs.h2o.ai to point to the latest release.
    • [PUBDEV-5032] - Added a section in the Algorithms chapter for Aggregator.
    • [PUBDEV-5056] - Updated the description for Save and Loading Models to indicate that H2O binary models are not compatible across H2O versions.
    • [PUBDEV-5057] - Added ignored_columns and 'x' parameters to AutoML section. Also added the 'x' parameter to the Parameters Appendix.
    • [PUBDEV-5062] - In DRF, added FAQs describing splitting criteria.
    • [PUBDEV-5085] - Added the new metalearner_folds and metalearner_fold_assignment parameters to the Defining a Stacked Ensemble Model section in the User Guide.
    • [PUBDEV-5089] - Updated the Sparkling Water booklet. (Also PUBDEV-5004.)
    • [PUBDEV-5092] - Added the new metalearner_algorithm parameter to Defining a Stacked Ensemble Model section in the User Guide.
    • [PUBDEV-5097] - The User Guide and the POJO/MOJO Javadoc have been updated to indicate that MOJOs are supported for Stacked Ensembles.

    Weierstrass (3.14.0.7) - 10/20/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/7/index.html

    Bug

    • [PUBDEV-4987] - Fixed an issue that caused h2o.H2OFrame.any() and h2o.H2OFrame.all() to work incorrectly when the frame contains only True values.
    • [PUBDEV-4988] - H2O no longer checks the H2O client hash code.

    Task

    • [PUBDEV-4003] - Generate Python API tests for Python Module Data in H2O and Data Manipulation

    Weierstrass (3.14.0.6) - 10/9/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/6/index.html

    Bug

    • [SW-542] - Fixed an issue that prevented Sparkling Water from importing Parquet files.

    Weierstrass (3.14.0.5) - 10/9/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/5/index.html

    Bug

    • [PUBDEV-4870] - Fixed an issue that caused sorting to be done incorrectly.
    • [PUBDEV-4917] - Only relevant clients (the ones with the same cloud name) are now reported to H2O.
    • [PUBDEV-4954] - Improved error messaging in the case where H2O fails to parse a valid Parquet file.
    • [PUBDEV-4959] - Fixed an issue that allowed nodes from different clusters to kill different H2O clusters.
    • [PUBDEV-4979] - Fixed an issue that caused K-Means to improperly calculate scaled distance.

    Task

    • [PUBDEV-4925] - Nightly and stable releases will now have published sha256 hashes.
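
Published sha256 hashes (PUBDEV-4925) let you check that a downloaded archive is intact. The following is a minimal sketch using only the Python standard library; how the published hash is obtained (for example, copied from the download page) is an assumption, and the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the hex sha256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare a file's digest with a published hex string (case-insensitive)."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

A typical use is `matches_published_hash("h2o.zip", published_hex)`, where both the file name and `published_hex` are placeholders supplied by the user.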

    Improvement

    • [PUBDEV-4404] - The h2o.sort() function now includes an `ascending` parameter that allows you to specify whether a numeric column should be sorted in ascending or descending order.
    • [PUBDEV-4964] - H2O no longer terminates when an incompatible client tries to connect.

    Docs

    • [PUBDEV-4949] - Updated the list of required packages for the H2O-3 R client on the H2O Download site and in the User Guide.
    • [PUBDEV-4966] - Added an FAQ to the User Guide FAQ describing how Java 9 users can switch to a supported Java version.

    Weierstrass (3.14.0.3) - 9/18/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/3/index.html

    Technical Task

    • [PUBDEV-4873] - Introduced a Python client side AST optimization.

    Bug

    • [PUBDEV-3525] - In R, `h2o.arrange()` can now sort on a float column.
    • [PUBDEV-4723] - The `as_data_frame()` function no longer drops rows with NAs when `use_pandas` is set to TRUE.
    • [PUBDEV-4735] - In Deep Learning POJOs, fixed an issue in the sharing stage between threads.
    • [PUBDEV-4739] - Fixed an issue in R that caused `h2o.sub` to fail to retain the column names of the frame.
    • [PUBDEV-4757] - Running ifelse() on a constant column no longer results in an error.
    • [PUBDEV-4846] - Using + on string columns now works correctly.
    • [PUBDEV-4848] - Fixed an issue that caused a POJO and a MOJO to return different column names with the `getNames()` method.
    • [PUBDEV-4849] - The R and Python clients now have consistent timeout numbers.
    • [PUBDEV-4868] - Fixed an issue that resulted in an AIOOB error when predicting with GLM. NA responses are now removed prior to GLM scoring.
    • [PUBDEV-4909] - The set_name method now works correctly in the Python client.
    • [PUBDEV-4921] - Replaced the deprecated Clock class in timing.gradle.
    • [PUBDEV-4937] - The MOJO Reader now closes open files after reading.

    New Feature

    • [PUBDEV-4628] - MOJO support has been extended to include the Deep Learning algorithm.
    • [PUBDEV-4845] - Added the ability to import an encrypted (AES128) file into H2O. This can be configured globally by specifying the `-decrypt_tool` option and installing the tool in DKV.
    • [PUBDEV-4904] - The Decryption API is now exposed in the REST API and in the R client.

    Docs

    • [PUBDEV-4811] - Updated the MOJO Quick Start Guide to show separator differences between Linux/OS X and Windows. Also updated the R example to match the Python example.

    Weierstrass (3.14.0.2) - 8/21/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/2/index.html

    Bug

    • [PUBDEV-4804] - Fixed a broken link to the Hive tutorials from the Productionizing section in the User Guide.
    • [PUBDEV-4822] - Sparkling Water can now pass a data frame with a vector for conversion into H2OFrame. In prior versions, the vector was not properly expanded and resulted in a failure.

    Task

    • [PUBDEV-4802] - Added more tests to ensure that, when max_runtime_secs is set, the returned model works correctly.

    Improvement

    • [PUBDEV-4812] - This version of H2O includes an option to force toggle (on/off) a specific extension. This enables users to enable the XGBoost REST API on a system that does not support XGBoost.
    • [PUBDEV-4829] - A warning now displays when the minimal XGBoost version is used.

    Weierstrass (3.14.0.1) - 8/10/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/1/index.html

    Bug

    • [PUBDEV-2767] - In the R client, making a copy of a factor column and then changing the factor levels no longer causes the levels of the original column to change.
    • [PUBDEV-4584] - Added a **Leaderboard Frame** option in Flow when configuring an AutoML run.
    • [PUBDEV-4586] - The `h2o.performance` function now works correctly on XGBoost models.
    • [PUBDEV-4625] - In the Python client, improved the help string for `h2o_import_file`. This string now indicates that setting `(parse=False)` will return a list instead of an H2OFrame.
    • [PUBDEV-4654] - Removed the Ecko dependency. This is not needed.
    • [PUBDEV-4683] - Fixed an issue that caused the parquet parser to store numeric/float values in a string column. This issue occurred when specifying an unsupported type conversion in Parse Setup (for example, numeric -> string). Users will now encounter an error when attempting this. Additionally, users can now change Enums->Strings in parse setup.
    • [PUBDEV-4686] - Deep Learning POJOs are now thread safe.
    • [PUBDEV-4688] - Fixed the default print method for H2OFrame in Python. Now when a user types the H2OFrame name, a new line is added, and the header is pushed to the next line.
    • [PUBDEV-4702] - Fixed an issue that caused the `max_runtime_secs` parameter to fail when run through the Python client. As a result of this fix, the `max_runtime_secs` parameter was added to Word2vec.
    • [PUBDEV-4704] - Fixed an issue that caused XGBoost grid search to fail when using the Python client.
    • [PUBDEV-4724] - When running with weighted data and columns that are constant after applying weights, a GLM lambda search no longer results in an AIOOB error.
    • [PUBDEV-4730] - The XGBoost `max_bin` parameter has been renamed to `max_bins`, and its default value is now 256.
    • [PUBDEV-4731] - XGBoost Python documentation is now available.
    • [PUBDEV-4732] - In XGBoost, the `learning_rate` (alias: `eta`) parameter now has a default value of 0.3.
    • [PUBDEV-4734] - In XGBoost, the `max_depth` parameter now has a default value of 6.
    • [PUBDEV-4735] - Multi-threading is now supported by downloaded POJOs.
    • [PUBDEV-4751] - The XGBoost `min_rows` (alias: `min_child_weight`) parameter now has a default value of 1.
    • [PUBDEV-4752] - The XGBoost `max_abs_leafnode_pred` (alias: `max_delta_step`) parameter now has a default value of 0.
    • [PUBDEV-4753] - H2O XGBoost default options are now consistent with XGBoost default values. This fix involved the following changes:
      • num_leaves has been renamed max_leaves, and its default value is 0.
      • The default value for reg_lambda is 0.
    • [PUBDEV-4756] - Removed the Guava dependency from the Deep Water API.
    • [PUBDEV-4776] - In XGBoost, the default value for sample_rate and the alias subsample are now both 1.0.
    • [PUBDEV-4777] - In XGBoost, the default value for colsample_bylevel (alias: colsample_bytree) has been changed to 1.0.
    • [PUBDEV-4783] - Hidden files are now ignored when reading from HDFS.
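
The XGBoost default changes in this list (PUBDEV-4730 through PUBDEV-4777) can be collected in one place. The dictionary below only restates the values and aliases given in these notes; it is not produced by H2O, and the names `XGBOOST_DEFAULTS`, `ALIASES`, and `default_for` are illustrative.

```python
# Defaults as stated in the release notes above.
XGBOOST_DEFAULTS = {
    "max_bins": 256,             # renamed from max_bin (PUBDEV-4730)
    "learning_rate": 0.3,        # PUBDEV-4732
    "max_depth": 6,              # PUBDEV-4734
    "min_rows": 1,               # PUBDEV-4751
    "max_abs_leafnode_pred": 0,  # PUBDEV-4752
    "max_leaves": 0,             # renamed from num_leaves (PUBDEV-4753)
    "reg_lambda": 0,             # PUBDEV-4753
    "sample_rate": 1.0,          # PUBDEV-4776
    "colsample_bylevel": 1.0,    # PUBDEV-4777
}

# Alias -> H2O parameter name, as listed in the notes.
ALIASES = {
    "eta": "learning_rate",
    "min_child_weight": "min_rows",
    "max_delta_step": "max_abs_leafnode_pred",
    "subsample": "sample_rate",
    "colsample_bytree": "colsample_bylevel",
}


def default_for(name):
    """Look up a default by H2O parameter name or by its alias."""
    return XGBOOST_DEFAULTS[ALIASES.get(name, name)]
```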

    New Feature

    • [PUBDEV-4446] - Added a `verbose` option to Deep Learning, DRF, GBM, and XGBoost. When enabled, this option will display scoring histories as a model job is running.
    • [PUBDEV-4682] - Added an `extra_classpath` option, which allows users to specify a custom classpath when starting H2O from the R and Python client.
    • [PUBDEV-4685] - Users can now override the type of a Str/Cat column in a Parquet file when the parser attempts to auto detect the column type.
    • [PUBDEV-4738] - Users can now run a standalone H2O instance and read from a Kerberized cluster's HDFS.
    • [PUBDEV-4745] - Added support for CDH 5.10.
    • [PUBDEV-4750] - Added support for MapR 5.2.

    Improvement

    • [PUBDEV-3947] - Fixed an issue that caused PCA to take 39 minutes to run on a wide dataset. The wide dataset method for PCA is now only enabled if the dataset is very wide.
    • [PUBDEV-4596] - XGBoost-specific WARN messages have been converted to TRACE.
    • [PUBDEV-4624] - When printing frames via `head()` or `tail()`, the `nrows` option now allows you to specify more than 10 rows. With this change, you can print the complete frame, if desired.
    • [PUBDEV-4630] - Improved the speed of converting a sparse matrix to an H2OFrame in R.
    • [PUBDEV-4664] - Added the following parameters to the XGBoost R/Py clients:
      • categorical_encoding
      • sample_type
      • normalize_type
      • rate_drop
      • one_drop
      • skip_drop
    • [PUBDEV-4676] - H2O can now handle sparse vectors as the input of the external frame handler.
    • [PUBDEV-4692] - Added MOJO support for Spark SVM.
    • [PUBDEV-4701] - When running AutoML from within Flow, the default `stopping_tolerance` is now NULL instead of 0.001.
    • [PUBDEV-4748] - Removed dependency on Reflections.

    Docs

    Vajda (3.10.5.4) - 7/17/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/4/index.html

    Bug

    • [PUBDEV-4694] - Fixed an issue that caused tree algorithms to waste memory by storing categorical values in every tree.

    Vajda (3.10.5.3) - 6/30/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/3/index.html

    Bug

    • [PUBDEV-4026] - Fixed an issue that resulted in "Unexpected character after column id:" warnings when parsing an SVMLight file.
    • [PUBDEV-4445] - h2o.predict now displays a warning if the features (columns) in the test frame do not contain those features used by the model.
    • [PUBDEV-4572] - The XGBoost REST API is now only registered when backend lib exists.
    • [PUBDEV-4595] - H2O no longer displays an error if there is a "/" in the user-supplied model name. Instead, a message will display indicating that the "/" is replaced with "_".
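
The behavior described in PUBDEV-4595 (replace "/" with "_" and tell the user, rather than raise an error) can be sketched as follows. This is an illustration of the idea, not H2O's code; `sanitize_model_name` and the wording of the message are hypothetical.

```python
def sanitize_model_name(name):
    """Replace "/" with "_" and return (clean_name, message_or_None)."""
    cleaned = name.replace("/", "_")
    message = None
    if cleaned != name:
        # Inform the user instead of failing on the original name.
        message = 'Replaced "/" with "_" in model name: {!r} -> {!r}'.format(name, cleaned)
    return cleaned, message


print(sanitize_model_name("gbm/run1"))
```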

    Improvement

    • [PUBDEV-3941] - Added support for autoencoder POJOs in the EasyPredictModelWrapper.
    • [PUBDEV-4269] - H2O now warns Python client users about the minimum required Colorama version. Note that the current minimum version is 0.3.8.
    • [PUBDEV-4537] - Removed deprecation warnings from the H2O build.
    • [PUBDEV-4548] - Moved the initialization of XGBoost into the H2O core extension.

    Docs

    • [PUBDEV-4515] - Added a link to a paper describing balance classes in the balance_classes parameter topic.
    • [PUBDEV-4610] - Removed `laplace`, `huber`, and `quantile` from list of supported distributions in the XGBoost documentation.
    • [PUBDEV-4612] - Added heuristics to the FAQ > General Troubleshooting topic.

    Vajda (3.10.5.2) - 6/19/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/2/index.html

    Bug

    • [PUBDEV-3860] - In PCA, fixed an issue that resulted in errors when specifying `pca_method=glrm` on wide datasets. In addition, the GLRM algorithm can now be used with wide datasets.
    • [PUBDEV-4416] - Fixed issues with streamParse in ORC parser that caused a NullPointerException when parsing multifile from Hive.
    • [PUBDEV-4438] - Fixed an issue that occurred with H2O data frame indexing for large indices that resulted in off-by-one errors. Now, when indexing is set to a value greater than 1000, indexing between left and right sides is no longer inconsistent.
    • [PUBDEV-4456] - In DRF, fixed an issue that resulted in an AssertionError when run on certain datasets with weights.
    • [PUBDEV-4579] - Removed an incorrect Python example from the Sparkling Water booklet. Python users must start Spark using the H2O pysparkling egg on the Python path. Using `--package` when running the pysparkling app is not advised, as the pysparkling distribution already contains the required jar file.
    • [PUBDEV-4594] - In GLM, fixed an issue that caused a Runtime exception when specifying the quasibinomial family with `nfold = 2`.

    New Feature

    • [PUBDEV-3624] - Added top and bottom N functions, which allow users to grab the top or bottom N percent of a numerical column. The returned frame contains the original row indices of the top/bottom N percent values extracted into the second column.
    • [PUBDEV-4096] - When building Stacked Ensembles in R, the base_models parameter can accept models rather than just model IDs. Updated the documentation in the User Guide for the base_models parameter to indicate this.
    • [PUBDEV-4523] - Added the following new GBM and DRF parameters to the User Guide: `calibrate_frame` and `calibrate_model`.
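
The top and bottom N functions from PUBDEV-3624 pair each extracted value with its original row index. The plain-Python sketch below shows the idea on a list. It is not the H2O implementation: the function names are hypothetical, and rounding the row count up with `ceil` (with a minimum of one row) is an assumption.

```python
import math


def _take(values, percent, reverse):
    # Assumption: round the number of rows up and return at least one row.
    count = max(1, math.ceil(len(values) * percent / 100.0))
    ranked = sorted(enumerate(values), key=lambda pair: pair[1], reverse=reverse)
    return ranked[:count]


def top_n_percent(values, percent):
    """Return (original_index, value) pairs for the largest N percent."""
    return _take(values, percent, reverse=True)


def bottom_n_percent(values, percent):
    """Return (original_index, value) pairs for the smallest N percent."""
    return _take(values, percent, reverse=False)


column = [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]
print(top_n_percent(column, 20))
print(bottom_n_percent(column, 20))
```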

    Improvement

    • [PUBDEV-4531] - Improved PredictCsv.java as follows:
      • Enabled PredictCsv.java to accept arbitrary separator characters in the input dataset file if the user includes the optional flag `--separator` in the input arguments. If a user enters a special Java character as the separator, then H2O will add "\".
      • Enabled PredictCsv.java to perform setConvertInvalidNumbersToNa(setInvNumNA) if the optional flag `--setConvertInvalidNum` is included in the input arguments.
    • [PUBDEV-4578] - Fixed the R package so that a "browseURL" NOTE no longer appears.
    • [PUBDEV-4583] - In the R package documentation, improved the description of the GLM `alpha` parameter.

    Docs

    • [PUBDEV-4524] - In the "Using Flow - H2O’s Web UI" section of the User Guide, updated the Viewing Models topic to include that users can download the h2o-genmodel.jar file when viewing models in Flow.
    • [PUBDEV-4549] - The `group_by` function accepts a number of aggregate options, which were documented in the User Guide and in the Python package documentation. These aggregate options are now described in the R package documentation.
    • [PUBDEV-4575] - Added an initial XGBoost topic to the User Guide. Note that this is still a work in progress.

    Vajda (3.10.5.1) - 6/9/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/1/index.html

    Technical Task

    Bug

    • [PUBDEV-1457] - PCA no longer reports incorrect values when multiple eigenvectors exist.
    • [PUBDEV-1571] - Users can now specify the weights_column as a numeric index in R.
    • [PUBDEV-1578] - Fixed an issue that caused GLM models returned by h2o.glm() and h2o.getModel(..) to be different.
    • [PUBDEV-1616] - Fixed an issue that caused PCA with GLRM to display incorrect results on data.
    • [PUBDEV-2286] - Fixed an issue that caused `df.show(any_int)` to always display 10 rows.
    • [PUBDEV-2415] - Starting an H2O cloud from R no longer results in "Error in as.numeric(x["max_mem"]) : (list) object cannot be coerced to type 'double'"
    • [PUBDEV-2656] - `h2o::ifelse` now handles NA values the same way that `base::ifelse` does.
    • [PUBDEV-2715] - Fixed an issue in PCA that resulted in incorrect standard deviation and components results for non standardized data.
    • [PUBDEV-2759] - When performing a grid search with a `fold_assignment` specified and with `cross_validation` disabled, Python unit tests now display a Java error message. This is because a fold assignment is meaningless without cross validation.
    • [PUBDEV-2816] - The Python `h2o.get_grid()` function is now in the base h2o object, allowing you to use it the same way as `h2o.get_model()`, `h2o.get_frame()` etc.
    • [PUBDEV-3196] - The `.mean()` function can now be applied to a row in `H2OFrame.apply()`.
    • [PUBDEV-3350] - Fixed an issue that caused a negative value to display in the H2O cluster version.
    • [PUBDEV-3396] - GLM now checks to see if a response is encoded as a factor and warns the user if it is not.
    • [PUBDEV-3470] - Fixed an issue that resulted in an `h2o.init()` fail message even though the server had actually been started. As a result, H2O did not shutdown automatically upon exit.
    • [PUBDEV-3502] - Fixed an issue that caused PCA to hang when run on a wide dataset using the Randomized `pca_method`. Note that it is still not recommended to use Randomized with wide datasets.
    • [PUBDEV-3520] - `h2o.setLevels` now works correctly when wrapped into invisible.
    • [PUBDEV-3651] - Added a dependency for the roxygen2 package.
    • [PUBDEV-3711] - `h2o.coef` in R is now functional for multinomial models.
    • [PUBDEV-3729] - When converting a column to `type = string` with `.ascharacter()` in Python, the `structure` method now correctly recognizes the change.
    • [PUBDEV-3759] - Fixed an issue that caused GBM Grid Search to hang.
    • [PUBDEV-3777] - Subsetting an H2O frame now allows a 0-row subset, just as data.frame does.
    • [PUBDEV-3815] - Fixed an issue that caused the R `apply` method to fail to work with `h2o.var()`.
    • [PUBDEV-3859] - PCA no longer reports errors when using PCA on wide datasets with `pca_method = Randomized`. Note that it is still not recommended to use Randomized with wide datasets.
    • [PUBDEV-3900] - Jenkins builds no longer all share the same R package directory, and new H2O R libraries are installed during testing.
    • [PUBDEV-3905] - When trimming a string, H2O now checks whether the trim position has moved past the beginning of the string. This check prevents the code from reading further back in memory with negative indexes.
    • [PUBDEV-3973] - Stacked Ensembles no longer fails when the `fold_assignment` for base learners is not `Modulo`.
    • [] - Fixed an issue that caused H2O to generate invalid code in POJO for PCA/SVM.
    • [PUBDEV-4079] - Instead of using random charset for getting bytes from strings, the source code now centralizes "byte extraction" in StringUtils. This prevents different build machines from using different default encoders.
    • [PUBDEV-4090] - When performing a Random Hyperparameter Search, if the model parameter seed is set to the default value but a search_criteria seed is not, then the model parameter seed will now be set to search_criteria seed+0, 1, 2, ..., model_number. Seeding the built models makes random hyperparameter searches more repeatable.
    • [PUBDEV-4100] - Fixed a bad link that was included in the "A K/V Store for In-Memory Analytics, Part 2" blog.
    • [PUBDEV-4138] - Comments are now permitted in Content-Type header for application/json mime type. As a result, specifying content-type charset no longer results in the request body being ignored.
    • [PUBDEV-4143] - Improved the Python `group_by` option count column name to match the R client.
    • [PUBDEV-4146] - Fixed broken links in the "Hacking Algorithms into H2O" blog post.
    • [PUBDEV-4156] - The Python API now provides a method to extract parameters from `cluster_status`.
    • [PUBDEV-4171] - Fixed incorrect parsing of input parameters. Previously, system property parsing logic added the value of any system property other than "ga_opt_out" to the arguments list if a property was prefixed with "ai.h2o.". This caused an attempt to parse the value of a system property as if it were itself a system property and at times resulted in an "Unknown Argument" error.
    • [PUBDEV-4174] - Fixed intermittent pyunit_javapredict_dynamic_data_paramsDR.
    • [PUBDEV-4177] - Fixed orc parser test by setting timezone to local time.
    • [PUBDEV-4185] - H2O can now correctly handle preflight OPTIONS calls, specifically when (1) the request is a CORS request and (2) the request has a content type other than text/plain, application/x-www-form-urlencoded, or multipart/form-data.
    • [PUBDEV-4202] - In the REST API, POST of application/json requests no longer fails if requests expect required fields.
    • [PUBDEV-4216] - The R client `impute` function now checks for categorical values and returns an error if none exist.
    • [PUBDEV-4231] - Fixed a filepath issue that occurred on Windows 7 systems when specifying a network drive.
    • [PUBDEV-4234] - Added a response column to Stacked Ensembles so that it can be exposed in the Flow UI.
    • [PUBDEV-4235] - Updated the list of required packages on the H2O download page for the Python client.
    • [PUBDEV-4250] - Updated the header in the Confusion Matrix to make the list of actual vs predicted values more clear.
    • [PUBDEV-4300] - Explicit 1-hot encoding in FrameUtils no longer generates an invalid order of column names. MissingLevel is now the last column.
    • [PUBDEV-4304] - Fixed an issue that caused ModelBuilder to leak xval frames if hyperparameter errors existed.
    • [PUBDEV-4311] - Fixed an issue that caused PCA model output to fail to display the Importance of Components.
    • [PUBDEV-4314] - When using the H2O Python client, the varimp() function can now be used in PCA to retrieve the Importance of Components details.
    • [PUBDEV-4315] - Fixed an issue that caused an ArrayIndexOutOfBoundsException in GLM.
    • [PUBDEV-4316] - When a main model is cloned to create the CV models, clearValidationMessages() is now called. Messages are no longer all thrown into a single bucket, which previously caused confusion with the `error_count()`.
    • [PUBDEV-4317] - ModelBuilder.message(...) now correctly bumps the error count when the message is an error.
    • [PUBDEV-4319] - Fixed an issue with unseen categorical levels handling in GLM scoring. Prediction with "skip" missing value handling in GLM with more than one variable no longer fails.
    • [PUBDEV-4321] - ModelMetricsRegression._mean_residual_deviance is now exposed. For all algorithms except GLM, this is the mean residual deviance. For GLM, this is the total residual deviance.
    • [PUBDEV-4326] - Fixed an issue that caused the `~` operator to fail when used in the Python client. Now, all logical operators set their results as Boolean.
    • [PUBDEV-4328] - Fixed an issue that caused an assertion error in GLM.
    • [PUBDEV-4330] - In GLM, fixed an issue that caused GLM to fail when `quasibinomial` was specified with a link other than the default. Specifying an incorrect link for the quasibinomial family will now result in an error message.
    • [PUBDEV-4350] - Improved the doc strings for `sample_rate_per_class` in R and Python.
    • [PUBDEV-4351] - Fixed a bug in the cosine distance formula.
    • [PUBDEV-4352] - Fixed an issue with CBSChunk set with long argument.
    • [PUBDEV-4363] - C0DChunk with con == NaN now works with strings.
    • [PUBDEV-4378] - When retrieving a Variable Importance plot using the H2O Python client, the default number of features shown is now 10 (or all if < 10 exist). Also reduced the top and bottom margins of the Y axis.
    • [PUBDEV-4381] - When retrieving a Variable Importance plot using the H2O R client, the default number of features shown is now 10 (or all if < 10 exist).
    • [PUBDEV-4416] - Fixed an ORC stream parse.
    • [PUBDEV-4429] - Appended constant string to frame.
    • [PUBDEV-4495] - Fixed an issue with the View Log option in Flow.
    • [PUBDEV-4499] - The h2o.deepwater.available function is now working in the R API.
    • [PUBDEV-4542] - Fixed a bug with Log.info that resulted in bypassing log initialization.
    • [PUBDEV-4543] - LogsHandler now checks whether logging on specific level is enabled before accessing the particular log.
    • [PUBDEV-4546] - Fixed a logging issue that caused PID values to be set to an incorrect value. H2O now initializes PID before initializing SELF_ADDRESS. This change was necessary because initialization of SELF_ADDRESS triggers buffered log messages to be logged, and PID is part of the log header.
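
The seeding rule described in PUBDEV-4090 above can be sketched in a few lines of Python. This is an illustration only, not H2O's implementation: the function name `derive_model_seeds`, the sentinel `DEFAULT_SEED = -1`, and the behavior when both seeds are left at the default are assumptions made for the example.

```python
from typing import List

# Assumption: -1 is used here as the "seed not set" sentinel.
DEFAULT_SEED = -1


def derive_model_seeds(n_models: int,
                       search_seed: int = DEFAULT_SEED,
                       model_seed: int = DEFAULT_SEED) -> List[int]:
    """Return one seed per model built by a random hyperparameter search."""
    if model_seed != DEFAULT_SEED:
        # The user fixed the model seed explicitly, so every model keeps it.
        return [model_seed] * n_models
    if search_seed == DEFAULT_SEED:
        # Assumption: with neither seed set, models stay unseeded.
        return [DEFAULT_SEED] * n_models
    # Model seed is default and the search seed is set:
    # model i receives search_seed + i.
    return [search_seed + i for i in range(n_models)]


seeds = derive_model_seeds(3, search_seed=1234)
```

Because each model's seed is a fixed offset from the search seed, rerunning the search with the same `search_seed` rebuilds each model with the same seed.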

    Epic

    New Feature

    • [PUBDEV-47] - Generated R bindings are now available for the REST API.
    • [PUBDEV-103] - Flow: Implemented test infrastructure for Jenkins/CI.
    • [PUBDEV-525] - The R client now reports to the user when memory limits have been exceeded.
    • [PUBDEV-2022] - Added support to impute missing elements for RandomForest.
    • [PUBDEV-2348] - Added a probability calibration plot function.
    • [PUBDEV-2535] - A new h2o.pivot() function is available to allow pivoting of tables.
    • [PUBDEV-3666] - MOJO support has been extended to K-Means models.
    • [PUBDEV-3840] - Added two new options in GBM and DRF: `calibrate_model` and `calibrate_frame`. These flags allow you to retrieve calibrated probabilities for binary classification problems.
    • [PUBDEV-3850] - In Stacked Ensembles, added support for passing in models instead of model IDs when using the R client.
    • [PUBDEV-3970] - Added support for saving and loading binary Stacked Ensemble models.
    • [PUBDEV-4104] - Added support for idxmax, idxmin in Python H2OFrame to get an index of max/min values.
    • [PUBDEV-4105] - Added support for which.max, which.min in R H2OFrame to get an index of max/min values.
    • [PUBDEV-4134] - A new h2o.sort() function is available in the H2O Python client. This returns a new Frame that is sorted by column(s) in ascending order. The column(s) to sort by can be either a single column name, a list of column names, or a list of column indices.
    • [PUBDEV-4147] - Word2vec can now be used with the H2O Python client.
    • [PUBDEV-4151] - Missing values are filled sequentially for time series data.
    • [PUBDEV-4168] - Enabled cors option flag behind the sys.ai.h2o. prefix for debugging.
    • [PUBDEV-4266] - Added support for converting a Word2vec model to a Frame.
    • [PUBDEV-4280] - Created a Capability rest end point that gives the client an overview of registered extensions.
    • [PUBDEV-4329] - When viewing a model in Flow, a new **Download Gen Model** button is available, allowing you to save the h2o-genmodel.jar file locally.
    • [PUBDEV-4425] - Added an `h2o.flow()` function to base H2O. This allows users to open up a Flow window from within R and Python.
    • [PUBDEV-4472] - The `parse_type` parameter is now case insensitive.
    • [PUBDEV-4478] - Added automatic reduction of categorical levels for Aggregator. This can be done by setting `categorical_encoding=EnumLimited`.
    • [NA] - In GBM and DRF, added two new categorical_encoding schemas: SortByResponse and LabelEncoding. More information about these options is available here.
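
The three ways of naming sort columns listed for PUBDEV-4134 above (a single column name, a list of column names, or a list of column indices) can be illustrated with a plain-Python sketch. It does not call H2O; `sort_rows`, `rows`, and `column_names` are hypothetical names used only to show the ascending, multi-column ordering on a list of dictionaries.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, float]
ColumnSpec = Union[str, Sequence[Union[str, int]]]


def sort_rows(rows: List[Row], column_names: List[str], by: ColumnSpec) -> List[Row]:
    """Return a new list of rows sorted ascending by the given column(s)."""
    if isinstance(by, str):
        keys = [by]  # a single column name
    else:
        # A list of names or a list of indices; indices are mapped to names.
        keys = [column_names[c] if isinstance(c, int) else c for c in by]
    # sorted() returns a new list, leaving the input untouched.
    return sorted(rows, key=lambda r: tuple(r[k] for k in keys))


rows = [{"a": 2, "b": 1}, {"a": 1, "b": 3}, {"a": 1, "b": 2}]
by_name = sort_rows(rows, ["a", "b"], "a")
by_names = sort_rows(rows, ["a", "b"], ["a", "b"])
by_index = sort_rows(rows, ["a", "b"], [1])
```

Sorting by several columns compares rows on the first column and uses later columns only to break ties.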

    Story

    • [PUBDEV-3927] - Added support for Leave One Covariate Out (LOCO). This calculates row-wise variable importances by re-scoring a trained supervised model and measuring the impact of setting each variable to missing or its most central value (mean or median & mode for categoricals).
    • [PUBDEV-4049] - Removed support for Java 6.
    • [PUBDEV-4274] - Integrated XGBoost with H2O core as a separate extension module.
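
The Leave One Covariate Out idea from PUBDEV-3927 above can be sketched as follows. This is a minimal illustration under stated assumptions, not H2O's implementation: it handles numeric columns only, replaces each variable with its column mean, and reports the impact as baseline prediction minus re-scored prediction. The names `loco_importances` and `predict` are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List

Row = Dict[str, float]


def loco_importances(rows: List[Row],
                     predict: Callable[[Row], float]) -> List[Dict[str, float]]:
    """Row-wise impact of replacing each variable with its column mean."""
    columns = list(rows[0].keys())
    # "Most central value" for numeric columns: the mean (assumption for this sketch).
    centers = {c: mean(r[c] for r in rows) for c in columns}
    results = []
    for row in rows:
        baseline = predict(row)
        impacts = {}
        for c in columns:
            perturbed = dict(row)      # copy so the original row is untouched
            perturbed[c] = centers[c]  # neutralize one variable
            impacts[c] = baseline - predict(perturbed)
        results.append(impacts)
    return results


# Hypothetical trained model: depends on x1 only.
model = lambda r: 2 * r["x1"] + 0 * r["x2"]
data = [{"x1": 1.0, "x2": 5.0}, {"x1": 3.0, "x2": 7.0}]
importances = loco_importances(data, model)
```

A variable the model ignores (here `x2`) gets an impact of zero on every row, while `x1` gets a per-row impact whose sign shows the direction in which it moved the prediction.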

    Task

    • [PUBDEV-4062] - Users can now run predictions in R using a MOJO or POJO without H2O running.
    • [PUBDEV-4087] - Created a test to verify that random grid search honors the `max_runtime_secs` parameter.
    • [PUBDEV-4193] - Removed javaMess.txt from scripts
    • [PUBDEV-4238] - A new `node()` function is available for retrieving node information from an H2O Cluster.
    • [PUBDEV-4353] - Improved the R/Py doc strings for the `sample_rate_per_class` parameter.
    • [PUBDEV-4412] - Users can now optionally build h2o.jar with a visualization data server using the following: `./gradlew -PwithVisDataServer=true -PvisDataServerVersion=3.14.0 :h2o-assemblies:main:projects`
    • [PUBDEV-4454] - Removed support for the following Hadoop platforms: CDH 5.2, CDH 5.3, and HDP 2.1.
    • [PUBDEV-4466] - Added the ability to go from String to Enum in PojoUtils.
    • [PUBDEV-4479] - Continued modularization of H2O by removing reflection utils and replacing them with SPI.
    • [PUBDEV-4481] - Removed the deprecated `h2o.importURL` function from the R API.
    • [PUBDEV-4490] - Stacked Ensembles now removes any unnecessary frames, vecs, and models that were produced when compiled.
    • [PUBDEV-4494] - Updated R and Python doc strings to indicate that users can save and load Stacked Ensemble binary models. In the User Guide, updated the FAQ that previously indicated users could not save and load stacked ensemble models.

    Improvement

    • [PUBDEV-3088] - Improved error handling when users receive the following error: `Error: lexical error: invalid char in json text.`
    • [PUBDEV-3500] - In PCA, when the user specifies a value for k that is <=0, then all principal components will automatically be calculated.
    • [PUBDEV-3908] - Exposed metalearner and base model keys in R/Py StackedEnsemble object.
    • [PUBDEV-4072] - The `h2o.download_pojo()` function now accepts a `jar_name` parameter, allowing users to create custom names for the downloaded file.
    • [PUBDEV-4103] - Added port and ip details to the error logs for h2o cloud.
    • [PUBDEV-4141] - When using Hadoop with SSL Internode Security, the `-internal_security` flag is now deprecated in favor of the `-internal_security_conf` flag.
    • [PUBDEV-4169] - Scala version of udf now serializes properly in multinode.
    • [PUBDEV-4181] - Fixed an NPM warn message.
    • [PUBDEV-4184] - Updated the documentation for using H2O with Anaconda and included an end-to-end example.
    • [PUBDEV-4190] - Arguments in h2o.naiveBayes in R are now the same as Python/Java.
    • [PUBDEV-4207] - StackedEnsembles is now stable vs. experimental.
    • [PUBDEV-4256] - Introduced latest_stable_R and latest_stable_py links, making it easy to point users to the current stable version of H2O for Python and R.
    • [PUBDEV-4267] - In the R client, the default for `nthreads` is now -1. The documentation examples have been updated to reflect this change.
    • [PUBDEV-4307] - ModelMetrics can sort models by a different Frame.
    • [PUBDEV-4331] - The application type is now reported in YARN manager, and H2O now overrides the default MapReduce type to H2O type.
    • [PUBDEV-4419] - Added a title option to PrintMOJO utility
    • [PUBDEV-4431] - Flow now uses ip:port for identifying the node as part of LogHandler.
    • [PUBDEV-4465] - Reduced the frequency of Hadoop heartbeat logging.
    • [PUBDEV-4484] - In GLM, quasibinomial models produce binomial metrics when scoring.
    • [PUBDEV-4492] - Implemented methods to get registered H2O capabilities in Python client.
    • [PUBDEV-4493] - Implemented methods to get registered H2O capabilities in R client.
    • [PUBDEV-4498] - Upgraded Flow to version 0.7.0
    • [PUBDEV-4511] - Removed the `selection_strategy` argument from Stacked Ensembles.
    • [PUBDEV-4533] - In Stacked Ensembles, added support for passing in models instead of model IDs when using the Python client.
    • [PUBDEV-4536] - Provided a file that contains a list of licenses for each H2O dependency. This can be acquired using com.github.hierynomus.license.
    • [PUBDEV-4540] - H2O now explicitly checks if the port and baseport is within allowed port range.
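
The kind of explicit range check described in PUBDEV-4540 above can be sketched as below. The release note does not state the allowed range, so the bounds used here (1 to 65535, the valid TCP port numbers) are an assumption, and `check_port` is a hypothetical helper rather than an H2O function.

```python
# Assumption: the allowed range is the full set of valid TCP port numbers.
MIN_PORT = 1
MAX_PORT = 65535


def check_port(name: str, value: int) -> int:
    """Return the port unchanged, or raise if it falls outside the allowed range."""
    if not (MIN_PORT <= value <= MAX_PORT):
        raise ValueError(
            f"{name} must be between {MIN_PORT} and {MAX_PORT}, got {value}"
        )
    return value


port = check_port("port", 54321)
```

Validating both `port` and `baseport` up front turns a confusing bind failure later on into an immediate, descriptive error.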

    Docs

    • [PUBDEV-2864] - Added documentation describing how to call Rapids expressions from Flow.
    • [PUBDEV-3944] - Added parameter descriptions for Naive Bayes parameters.
    • [PUBDEV-3945] - Added examples for Naive Bayes parameters.
    • [PUBDEV-4075] - Added `label_encoder` and `sort_by_response` to the list of available `categorical_encoding` options.
    • [PUBDEV-4095] - Added support for KMeans in MOJO documentation.
    • [PUBDEV-4078] - Added a topic to the Data Manipulation section describing the `group_by` function.
    • [PUBDEV-4140] - In the Productionizing H2O section of the User Guide, added an example showing how to read a MOJO as a resource from a jar file.
    • [PUBDEV-4182] - Improved the R and Python documentation for coef() and coef_norm().
    • [PUBDEV-4183] - In the GLM section of the User Guide, added a topic describing how to extract coefficient table information. This new topic includes Python and R examples.
    • [PUBDEV-4184] - Added information about Anaconda support to the User Guide. Also included an IPython Notebook example.
    • [PUBDEV-4194] - Added Word2vec to list of supported algorithms on docs.h2o.ai.
    • [PUBDEV-4201] - Uncluttered the H2O User Guide. Combined several topics on the left navigation/TOC. Some changes include the following:
      • Moved AWS, Azure, DSX, and Nimbix to a new Cloud Integration section.
      • Added a new **Getting Data into H2O** topic and moved the Supported File Formats and Data Sources topics into this.
      • Moved POJO/MOJO topic into the **Productionizing H2O** section.
    • [PUBDEV-4206] - In the Security topic of the User Guide, added a section about using H2O with PAM authentication.
    • [PUBDEV-4211] - Documentation for `h2o.download_all_logs()` now informs the user that the supplied file name must include the .zip extension.
    • [PUBDEV-4218] - Added an FAQ describing how to use third-party plotting libraries to plot metrics in the H2O Python client. This FAQ is available in the FAQ > Python topic.
    • [PUBDEV-4230] - Added an "Authentication Options" section to **Starting H2O > From the Command Line**. This section describes the options that can be set for all available supported authentication types. This section also includes flags for setting the newly supported Pluggable Authentication Module (PAM) authentication as well as Form Authentication and Session timeouts for H2O Flow.
    • [PUBDEV-4232] - Updated documentation to indicate that Word2vec is now supported for Python.
    • [PUBDEV-4253] - Added support for HDP 2.6 in the Hadoop Users section.
    • [PUBDEV-4258] - Added two FAQs within the GLM section describing why H2O's glm differs from R's glm and the steps to take to get the two to match. These FAQs are available in the GLM > FAQ section.
    • [PUBDEV-4268] - Updated R examples in the User Guide to reflect that the default value for `nthreads` is now -1.
    • [PUBDEV-4281] - Updated the POJO Quick Start markdown file and Javadoc.
    • [PUBDEV-4290] - Added the `-principal` keyword to the list of Hadoop launch parameters.
    • [PUBDEV-4294] - In the Deep Learning topic, deleted the Algorithm section. The information included in that section has been moved into the Deep Learning FAQ.
    • [PUBDEV-4297] - Documented support for using H2O with Microsoft Azure Linux Data Science VM. Note that this is currently still a BETA feature.
    • [PUBDEV-4309] - Added an FAQ describing YARN resource usage. This FAQ is available in the FAQ > Hadoop topic.
    • [PUBDEV-4336] - Added parameter descriptions for PCA parameters.
    • [PUBDEV-4337] - Added examples for PCA parameters.
    • [PUBDEV-4348] - A new h2o.sort() function is available in the H2O Python client. This returns a new Frame that is sorted by column(s) in ascending order. The column(s) to sort by can be either a single column name, a list of column names, or a list of column indices. Information about this function is available in the Python and R documentation.
    • [PUBDEV-4349] - Updated the "Using H2O with Microsoft Azure" topics.
    • [PUBDEV-4362] - Updated the "What is H2O" section in each booklet.
    • [PUBDEV-4387] - A Deep Water booklet is now available. A link to this booklet is on docs.h2o.ai.
    • [PUBDEV-4396] - Updated GLM documentation to indicate that GLM supports both multinomial and binomial handling of categorical values.
    • [PUBDEV-4397] - Added an FAQ describing the steps to take if a user encounters a "Server error - server 127.0.0.1 is unreachable at this moment" message. This FAQ is available in the FAQ > R topic.
    • [PUBDEV-4401] - Fixed documentation that described estimating in K-means.
    • [PUBDEV-4403] - Updated the documentation that described how to download a model in Flow.
    • [PUBDEV-4444] - The Data Sources topic, which describes that data can come from local file system, S3, HDFS, and JDBC, now also includes that data can be imported by specifying the URL of a file.
    • [PUBDEV-4467] - H2O now supports GPUs. Updated the FAQ that indicated we do not, and added a pointer to Deep Water.

    Ueno (3.10.4.8) - 5/21/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/8/index.html

    Bug

    • [PUBDEV-4123] - Python: Frame summary does not return Python object
    • [PUBDEV-4315] - AIOOB with GLM
    • [PUBDEV-4330] - glm : quasi binomial with link other than default causes an h2o crash

    Improvement

    • [PUBDEV-4332] - Create new /3/SteamMetrics REST API endpoint
    • [PUBDEV-4436] - Steam hadoop user impersonation

    Ueno (3.10.4.7) - 5/8/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/7/index.html

    Bug

    • [PUBDEV-4392] - h2o on yarn: H2O does not respect the cloud name in case of flatfile mode

    Ueno (3.10.4.6) - 4/26/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/6/index.html

    Bug

    • [PUBDEV-4265] - Problem with h2o.uploadFile on Windows
    • [PUBDEV-4339] - glm: get AIOOB exception on attached data
    • [PUBDEV-4341] - External cluster always reports "Timeout for confirmation exceeded!"

    Ueno (3.10.4.5) - 4/19/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/5/index.html

    Bug

    • [PUBDEV-4293] - Problem with h2o.merge in python
    • [PUBDEV-4306] - Failing SVM parse
    • [PUBDEV-4308] - Rollups computation errors sometimes get wrapped in an unhelpful exception and the original cause is hidden.

    Ueno (3.10.4.4) - 4/15/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/4/index.html

    Technical task

    • [PUBDEV-4244] - Add documentation on how to create a config file

    Bug

    • [PUBDEV-2807] - PCA Rotations not displayed in Python API
    • [PUBDEV-4081] - Sparse matrix cannot be converted to H2O
    • [PUBDEV-4229] - Flow/Schema problem, predicting on frame without response returns empty model metrics
    • [PUBDEV-4246] - Proportion of variance in GLRM for single component has a value > 1
    • [PUBDEV-4251] - HDP 2.6 add to the build
    • [PUBDEV-4252] - Set timeout for read/write confirmation in ExternalFrameWriter/ExternalFrameReader
    • [PUBDEV-4261] - GLM default solver gets AIOOB when run on dataset with 1 categorical variable and no intercept
    • [PUBDEV-4285] - Correct exit status reporting ( when running on YARN )
    • [PUBDEV-4287] - Documentation: Update GLM FAQ and missing_values_handling parameter regarding unseen categorical values

    New Feature

    Task

    • [PUBDEV-4180] - Wrap R examples in code so that they don't run on Mac OS
    • [PUBDEV-4215] - Export polygon function to fix CRAN note in h2o R package
    • [PUBDEV-4248] - Add a parameter that ignores the config file reader when h2o.init() is called

    Improvement

    • [PUBDEV-4239] - Extend Watchdog client extension so cluster is also stopped when the client doesn't connect in specified timeout
    • [PUBDEV-4288] - Set hadoop user from h2odriver

    Ueno (3.10.4.3) - 3/31/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/3/index.html

    Bug

    • [PUBDEV-3281] - ARFF parser parses attached file incorrectly
    • [PUBDEV-4097] - Proxy warning message displays proxy with username and password.
    • [PUBDEV-4165] - h2o.import_sql_table works in R but on python gives error
    • [PUBDEV-4167] - java.lang.IllegalArgumentException with PCA
    • [PUBDEV-4187] - Impute does not handle categoricals when values is specified
    • [PUBDEV-4219] - Increase number of bins in partial plots

    New Feature

    • [PUBDEV-4162] - h2o.transform can produce incorrect aggregated sentence embeddings

    Improvement

    • [PUBDEV-3858] - Errors with PCA on wide data for pca_method = Power
    • [PUBDEV-4102] - Introduce mode in which failure of H2O client ensures the whole H2O cloud goes down
    • [PUBDEV-4178] - Add support for IBM IOP 4.2
    • [PUBDEV-4186] - Placeholder for: [SW-334]
    • [PUBDEV-4191] - Remove minor version from hadoop distribution in buildinfo.json file

    Ueno (3.10.4.2) - 3/18/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/2/index.html

    Bug

    • [PUBDEV-4119] - Deep Learning: mini_batch_size >>> 1 causes OOM issues
    • [PUBDEV-4135] - head(df) and tail(df) results in R are inconsistent for datetime columns
    • [PUBDEV-4144] - GLM with family = multinomial, intercept=false, and weights or SkipMissing produces error
    • [PUBDEV-4155] - glm hot fix: fix model.score0 for multinomial

    New Feature

    • [PUBDEV-4133] - Add option to specify a port range for the Hadoop driver callback
    • [PUBDEV-4139] - Support reading MOJO from a classpath resource

    Improvement

    • [PUBDEV-4056] - Arff Parser doesn't recognize spaces in @attribute
    • [PUBDEV-4099] - How to generate Precision Recall AUC (PRAUC) from the scala code

    Docs

    • [PUBDEV-3977] - Documentation: Add documentation for word2vec
    • [PUBDEV-4118] - Documentation: Add topic for using with IBM Data Science Experience
    • [PUBDEV-4149] - Document "driverportrange" option of H2O's Hadoop driver

    Ueno (3.10.4.1) - 3/3/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/1/index.html

    Technical task

    • [PUBDEV-3943] - Documentation: Naive Bayes links to parameters section

    Bug

    • [PUBDEV-3817] - Error in predict, performance functions caused by fold_column
    • [PUBDEV-3820] - Kmeans Centroid info not Rendered through Python API
    • [PUBDEV-3827] - PCA "Importance of Components" returns "data frame with 0 columns and 0 rows"
    • [PUBDEV-3866] - Stratified sampling does not split minority class
    • [PUBDEV-3885] - R Kmean's user_point doesn't get used
    • [PUBDEV-3903] - Setting -context_path doesn't change REST API path
    • [PUBDEV-3932] - K-means Training Metrics do not match Prediction Metrics with same data
    • [PUBDEV-3938] - h2o-py/tests/testdir_hdfs/pyunit_INTERNAL_HDFS_timestamp_date_orc.py failing
    • [PUBDEV-4017] - gradle update broke the build
    • [PUBDEV-4019] - H2O config (~/.h2oconfig) should allow user to specify username and password
    • [PUBDEV-4032] - Flow/R/Python - H2O cloudInfo should show if cluster is secured or not
    • [PUBDEV-4039] - FLOW fails to display custom models including Word2Vec
    • [PUBDEV-4040] - Import json module as different alias in Python API
    • [PUBDEV-4041] - Stacked Ensemble docstring example is broken
    • [PUBDEV-4042] - The autogen R bindings have an incorrect definition for the y argument
    • [PUBDEV-4047] - AIOOB while training an H2OKMeansEstimator
    • [PUBDEV-4065] - Fix bug in randomgridsearch and Fix intermittent pyunit_gbm_random_grid_large.py
    • [PUBDEV-4066] - Typos in Stacked Ensemble Python H2O User Guide example code
    • [PUBDEV-4073] - StackedEnsemble: stacking fails if combined with ignore_columns
    • [PUBDEV-4083] - AIOOB in GLM

    New Feature

    • [PUBDEV-3852] - Documentation: Add Data Munging topic for file name globbing
    • [PUBDEV-4009] - Integration to add new top-level Plot menu to Flow
    • [PUBDEV-4038] - Add stddev to PDP computation

    Task

    • [PUBDEV-3685] - Update h2o-py README
    • [PUBDEV-3797] - Generate Python API tests for H2O Cluster commands
    • [PUBDEV-3914] - Add documentation for python GroupBy class
    • [PUBDEV-3915] - Document python's Assembly and ConfusionMatrix classes, add python API tests as well
    • [PUBDEV-3937] - Clean up R docs
    • [PUBDEV-3986] - Documentation: Summarize the method for estimating k in kmeans and add to docs
    • [PUBDEV-4006] - Update links to Stacking on docs.h2o.ai
    • [PUBDEV-4021] - H2O config (~/.h2oconfig) should allow user to specify username and password
    • [PUBDEV-4067] - Check if strict_version_check is TRUE when checking for config file

    Improvement

    • [PUBDEV-3781] - Documentation: Add info about sparse data support
    • [PUBDEV-3784] - h2o doc deeplearning: clarify what the (heuristics)defaults for auto are in categorical_encoding
    • [PUBDEV-3919] - Saving/serializing currently existing, detailed model information
    • [PUBDEV-3961] - Py/R: Remove unused 'cluster_id' parameter
    • [PUBDEV-3983] - Update GBM FAQ
    • [PUBDEV-3994] - Documentation: Add info about imputing data in Flow and in Data Manipulation
    • [PUBDEV-3998] - Documentation: Add instructions for running demos
    • [PUBDEV-4005] - AIOOB Exception with fold_column set with kmeans
    • [PUBDEV-4055] - Modify h2o#connect function to accept config with connect_params field
    • [PUBDEV-4059] - Change of h2o.connect(config) interface to support Steam

    Tverberg (3.10.3.5) - 2/16/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/5/index.html

    Bug

    • [PUBDEV-3848] - GLM with interaction parameter and cross-validation cause Exception
    • [PUBDEV-3916] - pca: hangs on attached data
    • [PUBDEV-3964] - StepOutOfRangeException when building GBM model
    • [PUBDEV-3976] - py unique() returns frame of integers (since epoch) instead of frame of unique dates
    • [PUBDEV-3979] - py date comparisons don't work for rows > 1
    • [PUBDEV-3980] - AstUnique drops column types
    • [PUBDEV-4013] - In R, the confusion matrix at the end doesn’t say: vertical: actual, across: predicted
    • [PUBDEV-4014] - AIOOB in GLM with hex.DataInfo.getCategoricalId(DataInfo.java:952) is the error with 2 fold cross validation
    • [PUBDEV-4036] - Parse fails when trying to parse large number of Parquet files
    • [HEXDEV-683] - POJO doesn't include Forest classes
    • [PUBDEV-4044] - moment producing wrong dates

    Tverberg (3.10.3.4) - 2/3/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/4/index.html

    Bug

    • [PUBDEV-3965] - Importing data in python returns error - TypeError: expected string or bytes-like object

    Tverberg (3.10.3.3) - 2/2/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/3/index.html

    Bug

    • [PUBDEV-3835] - Standard Errors in GLM: calculating and showing specifically when called

    Improvement

    Tverberg (3.10.3.2) - 1/31/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/2/index.html

    Bug

    • Hotfix: Remove StackedEnsemble from Flow UI. Training is only supported from Python and R interfaces. Viewing is supported in the Flow UI.

    Tverberg (3.10.3.1) - 1/30/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/1/index.html

    Bug

    • [PUBDEV-2464] - Using asfactor() in Python client cannot allocate to a variable
    • [PUBDEV-3111] - R API's h2o.interaction() does not use destination_frame argument
    • [PUBDEV-3694] - Errors with PCA on wide data for pca_method = GramSVD which is the default
    • [PUBDEV-3742] - StackedEnsemble should work for regression
    • [PUBDEV-3865] - h2o gbm : for an unseen categorical level, discrepancy in predictions when score using h2o vs pojo/mojo
    • [PUBDEV-3883] - Negative indexing for H2OFrame is buggy in R API
    • [PUBDEV-3894] - Relational operators don't work properly with time columns.
    • [PUBDEV-3966] - java.lang.AssertionError when using h2o.makeGLMModel

    Story

    • [PUBDEV-3739] - StackedEnsemble: put ensemble creation into the back end

    New Feature

    • [PUBDEV-2058] - Implement word2vec in h2o
    • [PUBDEV-3635] - Ability to Select Columns for PDP computation in Flow
    • [PUBDEV-3881] - Add PCA Estimator documentation to Python API Docs
    • [PUBDEV-3902] - Documentation: Add information about Azure support to H2O User Guide (Beta)

    Task

    • [PUBDEV-3336] - h2o.create_frame(): if randomize=True, `value` param cannot be used
    • [PUBDEV-3740] - REST: implement simple ensemble generation API
    • [PUBDEV-3843] - Modify R REST API to always return binary data
    • [PUBDEV-3844] - Safe GET calls for POJO/MOJO/genmodel
    • [PUBDEV-3864] - Import files by pattern
    • [PUBDEV-3884] - StackedEnsemble: Add to online documentation
    • [PUBDEV-3940] - Add Stacked Ensemble code examples to R docs

    Improvement

    • [PUBDEV-3257] - Documentation: As a K-Means user, I want to be able to better understand the parameters
    • [PUBDEV-3741] - StackedEnsemble: add tests in R and Python to ensure that a StackedEnsemble performs at least as well as the base_models
    • [PUBDEV-3857] - Clean up the generated Python docs
    • [PUBDEV-3895] - Filter H2OFrame on pandas dates and time (python)
    • [PUBDEV-3912] - Provide way to specify context_path via Python/R h2o.init methods
    • [PUBDEV-3933] - Modify gen_R.py for Stacked Ensemble
    • [PUBDEV-3972] - Add Stacked Ensemble code examples to Python docstrings

    Tutte (3.10.2.2) - 1/12/2017

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tutte/2/index.html

    Bug

    Task

    • [PUBDEV-3816] - import functions required for r-release check

    Tutte (3.10.2.1) - 12/22/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tutte/1/index.html

    Bug

    • [PUBDEV-3291] - Summary() doesn't update stats values when asfactor() is applied
    • [PUBDEV-3498] - rectangular assign to a categorical column does not work (should be possible to assign either an existing level, or a new one)
    • [PUBDEV-3618] - Numerical Column Names in H2O and R
    • [PUBDEV-3690] - pred_noise_bandwidth parameter is not reproducible with seed
    • [PUBDEV-3723] - Fix mktime() referencing from 0 base to 1 base for month and day
    • [PUBDEV-3728] - Binary loss functions return error in GLRM
    • [PUBDEV-3747] - python hist() plotted bars overlap
    • [PUBDEV-3750] - Python set_levels doesn't change other methods
    • [PUBDEV-3753] - h2o doc: GLM grid search hyperparameters are missing or incorrectly listed; GLRM's are currently marked as GLM's
    • [PUBDEV-3764] - Partial Plot incorrectly calculates for constant categorical column
    • [PUBDEV-3778] - h2o.proj_archetypes returns error if constant column is dropped in GLRM model
    • [PUBDEV-3788] - GLRM loss by col produces error if constant columns are dropped
    • [PUBDEV-3796] - isna() overwrites column names
    • [PUBDEV-3812] - NullPointerException with Quantile GBM, cross validation, & sample_rate < 1
    • [PUBDEV-3819] - R h2o.download_mojo broken - writes a 1 byte file
    • [PUBDEV-3831] - Seed definition incorrect in R API for RF, GBM, GLM, NB
    • [PUBDEV-3834] - h2o.glm: get AIOOB exception with xval and lambda search

    New Feature

    • [PUBDEV-3482] - Supporting GLM binomial model to allow two arbitrary integer values
    • [PUBDEV-3376] - Implement ISAX calculations per ISAX word
    • [PUBDEV-3377] - Optimizations and final fixes for ISAX
    • [PUBDEV-3664] - Implement GLM MOJO
    • [PUBDEV-3501] - Variance metrics are missing from GLRM that are available in PCA
    • [PUBDEV-3541] - py h2o.as_list() should not return headers
    • [PUBDEV-3715] - Modify sum() calculation to work on rows or columns
    • [PUBDEV-3737] - make sure that the generated R bindings work with StackedEnsemble
    • [PUBDEV-3833] - Add HDP 2.5 Support

    Task

    • [PUBDEV-3012] - Remove grid.sort_by method in Python API
    • [PUBDEV-3695] - Documentation: Add GLM to list of algorithms that support MOJOs
    • [PUBDEV-3791] - Documentation: Add quasibinomial family in GLM
    • [PUBDEV-3676] - Add SLURM cluster documentation
    • [PUBDEV-3692] - Add memory check for GLRM before proceeding
    • [PUBDEV-3765] - Check to make sure hinge loss works for GLRM
    • [PUBDEV-3803] - Add parameters from _upload_python_object to H2OFrame constructor
    • [PUBDEV-3804] - Refer to .h2o.jar.env when detaching R package
    • [PUBDEV-3805] - Call on proper port when exiting R/detaching package
    • [PUBDEV-3806] - Modify search for config file in R api
    • [PUBDEV-3818] - properly handle url in R docs from autogen

    Improvement

    • [PUBDEV-3256] - Documentation: As a GLM user, I want to be able to better understand the parameters
    • [PUBDEV-3758] - Fix bad/inconsistent/empty categorical (bitset) splits for DRF/GBM
    • [PUBDEV-3793] - Auto-generate R bindings

    Turnbull (3.10.1.2) - 12/14/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turnbull/2/index.html

    Bug

    • [PUBDEV-2801] - Starting h2o server from R ignores IP and port parameters
    • [PUBDEV-3484] - Treat 1-element numeric list as acceptable when numeric input required
    • [PUBDEV-3509] - h2o's cor() breaks R's native cor()
    • [PUBDEV-3592] - h2o.get_grid isn't working
    • [PUBDEV-3607] - `cor` function should properly pass arguments
    • [PUBDEV-3629] - Avoid confusing error message when column name is not found.
    • [PUBDEV-3631] - overwrite_with_best_model fails when using checkpoint
    • [PUBDEV-3633] - plot.h2oModel in R no longer supports metrics with uppercase names (e.g. AUC)
    • [PUBDEV-3642] - Fix citibike R demo
    • [PUBDEV-3697] - Create an Attribute for Number of Internal Trees in Python
    • [PUBDEV-3704] - Error with early stopping and score_tree_interval on GBM
    • [PUBDEV-3735] - Python's coef() and coef_norm() should use column name not index
    • [PUBDEV-3757] - Perfbar does not work for hierarchical path passed via -h2o_context

    New Feature

    • [PUBDEV-3474] - Show Partial Dependence Plots in Flow
    • [PUBDEV-3620] - Allow setting nthreads > 255.
    • [PUBDEV-3700] - Add RMSE, MAE, RMSLE, and lift_top_group as stopping metrics
    • [PUBDEV-3719] - Update h2o.mean in R to match Python API

    Task

    • [PUBDEV-3579] - Document Partial Dependence Plot in Flow
    • [PUBDEV-3621] - Add R endpoint for cumsum, cumprod, cummin, and cummax
    • [PUBDEV-3649] - Modify correlation matrix calculation to match R
    • [PUBDEV-3657] - Remove max_confusion_matrix_size from booklets & py doc

    Improvement

    • [HEXDEV-645] - aggregator should calculate domain for enum columns in aggregated output frames & member frames based on current output or member frame
    • [HEXDEV-658] - Naive Bayes (and maybe GLM): Drop limit on classes that can be predicted (currently 1000)
    • [PUBDEV-3625] - Speed up GBM and DRF
    • [PUBDEV-3756] - Support `-context_path` to change servlet path for REST API

    Turing (3.10.0.10) - 11/7/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/10/index.html

    Bug

    • [PUBDEV-3484] - Treat 1-element numeric list as acceptable when numeric input required
    • [PUBDEV-3675] - Cannot determine file type

    Turing (3.10.0.9) - 10/25/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/9/index.html

    Bug

    • [PUBDEV-3546] - h2o.year() method does not return year
    • [PUBDEV-3559] - Regression Training Metrics: Deviance and MAE were swapped
    • [PUBDEV-3568] - h2o.max returns NaN even when na.rm condition is set to TRUE
    • [PUBDEV-3593] - Fix display of array-valued entries in TwoDimTables such as grid search results

    Improvement

    • [PUBDEV-3585] - Optimize algorithm for automatic estimation of K for K-Means
    • [HEXDEV-646] - include flow, /3/ API accessible Aggregator model in h2o-3

    Turing (3.10.0.8) - 10/10/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/8/index.html

    Bug

    • [PUBDEV-3384] - S3 API method PersistS3#uriToKey breaks expected contract
    • [PUBDEV-3437] - GLM multinomial with defaults fails on attached dataset
    • [PUBDEV-3441] - .structure() encounters list index out of bounds when nan is encountered in column
    • [PUBDEV-3455] - max_active_predictors option in glm does not work anymore
    • [PUBDEV-3461] - Printed PCA model metrics in R is missing
    • [PUBDEV-3477] - R - Unnecessary JDK requirement on Windows
    • [PUBDEV-3505] - uuid columns with mostly missing values causes parse to fail.
    • [HEXDEV-599] - Fold Column not available in h2o.grid

    New Feature

    • [PUBDEV-1943] - Compute partial dependence data
    • [PUBDEV-3422] - Create Method to Return Columns of Specific Type
    • [PUBDEV-3491] - Find optimal number of clusters in K-Means
    • [PUBDEV-3492] - Add optional categorical encoding schemes for GBM/DRF

    Task

    • [PUBDEV-3327] - Tasks for completing MOJO support
    • [PUBDEV-3444] - Ensure functions have `h2o.*` alias in R API

    Improvement

    • [PUBDEV-3465] - Sync up functionality of download_mojo and download_pojo in R & Py
    • [PUBDEV-3499] - Improve the stopping criterion for K-Means Lloyds iterations
    • [HEXDEV-596] - Encryption of H2O communication channels
    • [HEXDEV-636] - add option to Aggregator model to show ignored columns in output frame

    Turing (3.10.0.7) - 9/19/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/7/index.html

    Bug

    • [PUBDEV-3300] - NPE during categorical encoding with cross-validation (Windows 8 runit only??)
    • [PUBDEV-3306] - H2OFrame arithmetic/statistical functions return inconsistent types
    • [PUBDEV-3315] - Multi file parse fails with NPE
    • [PUBDEV-3374] - h2o.hist() does not respect breaks
    • [PUBDEV-3401] - importFiles, with s3n, gives NullPointerException
    • [PUBDEV-3409] - Python Structure() Breaks When Applied to Entire Dataframe

    New Feature

    • [PUBDEV-2707] - Diff operation on column in H2O Frame
    • [HEXDEV-619] - calculate residuals in h2o-3 and in flow and create a new frame with a new column that contains the residuals

    Improvement

    • [PUBDEV-3296] - In R, allow x to be missing (meaning take all columns except y) for all supervised algos
    • [PUBDEV-3329] - median() should return a list of medians from an entire frame
    • [PUBDEV-3334] - Conduct rbind and cbind on multiple frames
    • [PUBDEV-3387] - Add argument to H2OFrame.print in R to specify number of rows
    • [PUBDEV-3418] - Suppress chunk summary in describe()

    Turing (3.10.0.6) - 8/25/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/6/index.html

    Bug

    • [HEXDEV-608] - Hashmap in H2OIllegalArgumentException fails to deserialize & throws FATAL
    • [PUBDEV-2879] - NPE in MetadataHandler
    • [PUBDEV-3086] - hist() fails for constant numeric columns
    • [PUBDEV-3173] - Client mode: flatfile requires list of all nodes, but a single entry node should be sufficient
    • [PUBDEV-3207] - Make CreateFrame reproducible for categorical columns.
    • [PUBDEV-3208] - Fix intermittency of categorical encoding via eigenvector.
    • [PUBDEV-3211] - isBitIdentical is returning true for two Frames with different content
    • [PUBDEV-3222] - AssertionError for DL train/valid with categorical encoding
    • [PUBDEV-3237] - Wrong MAE for observation weights other than 1.
    • [PUBDEV-3244] - H2ODriver for CDH5.7.0 does not accept memory settings
    • [PUBDEV-3276] - H2OFrame.drop() leaves the frame in inconsistent state

    New Feature

    • [PUBDEV-3007] - Implement skewness calculation for H2O Frames
    • [PUBDEV-3008] - Implement kurtosis calculation for H2O Frames
    • [PUBDEV-3128] - Add ability to do a deep copy in Python API
    • [PUBDEV-3163] - Add docs for h2o.make_metrics() for R and Python
    • [PUBDEV-3218] - Add RMSLE to model metrics
    • [PUBDEV-3264] - Return unique values of a categorical column as a Pythonic list

    Task

    • [PUBDEV-3235] - Refactor and simplify implementation of Pearson Correlation
    • [PUBDEV-3238] - Add MAE to CV Summary

    Improvement

    • [PUBDEV-2702] - Create h2o.* functions for H2O primitives
    • [PUBDEV-3098] - Add methods to get actual and default parameters of a model
    • [PUBDEV-3132] - Add ability to drop a list of columns or a subset of rows from an H2OFrame
    • [PUBDEV-3138] - Ensure all is*() functions return a list

    Turing (3.10.0.3) - 7/29/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/3/index.html

    Bug

    • [PUBDEV-2805] - Error when setting a string column to a single value in R/Py
    • [PUBDEV-2965] - R h2o.merge() ignores by.x and by.y
    • [PUBDEV-3135] - Download Logs broken URL from Flow

    New Feature

    • [PUBDEV-2958] - H2O Version Check
    • [PUBDEV-3022] - Add an h2o.concat function equivalent to pandas.concat
    • [PUBDEV-3050] - Add Huber loss function for GBM and DL (for regression)
    • [PUBDEV-3071] - Add RMSE to model metrics
    • [PUBDEV-3104] - Add Mean Absolute Error to Model Metrics
    • [PUBDEV-3108] - Add mean absolute error to scoring history and model plotting
    • [PUBDEV-3116] - Add categorical encoding schemes for DL and Aggregator
    • [PUBDEV-3155] - Compute supervised ModelMetrics from predicted and actual values in Java/R
    • [PUBDEV-3162] - Compute supervised ModelMetrics from predicted and actual values in Python

    Improvement

    • [PUBDEV-1888] - Implement gradient checking for DL
    • [PUBDEV-2627] - Add better warning message to functions of H2OModelMetrics objects
    • [PUBDEV-3021] - Add demo datasets to Python package
    • [PUBDEV-3113] - Replace "MSE" with "RMSE" in scoring history table
    • [PUBDEV-3122] - Make all TwoDimTable Headers Pythonic in R and Python API
    • [PUBDEV-3129] - Achieve consistency between DL and GBM/RF scoring history in regression case
    • [PUBDEV-3131] - Disable R^2 stopping criterion in tree model builders
    • [PUBDEV-3149] - Remove R^2 from all model output except GLM

    Turin (3.8.3.4) - 7/15/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turin/4/index.html

    Bug

    • [PUBDEV-3040] - File parse from S3 extremely slow
    • [PUBDEV-3145] - Fix Deep Learning POJO for hidden dropout other than 0.5

    Turin (3.8.3.2) - 7/1/2016

    Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turin/2/index.html

    Bug

    • [PUBDEV-898] - DRF: sample_rate=1 not permitted unless validation is performed
    • [PUBDEV-2087] - create a set of tests which create large POJOs for each algo and compiles them
    • [PUBDEV-2322] - Merge (method="radix") bug1
    • [PUBDEV-2325] - Merge (method="radix") bug2
    • [PUBDEV-2565] - Fold Column not available in h2o.grid
    • [PUBDEV-2964] - h2o.merge(,method="radix") failing 15/40 runs
    • [PUBDEV-3030] - Parse: java.lang.IllegalArgumentException: 0 > -2147483648
    • [PUBDEV-3032] - Cached errors are not printed if H2O exits
    • [PUBDEV-3072] - java.lang.ClassCastException for Quantile GBM
    • [PUBDEV-3077] - model_summary number of trees is too high for multinomial DRF/GBM models
    • [PUBDEV-3079] - NPE when accessing invalid null Frame cache in a Frame's vecs()
    • [PUBDEV-3081] - TwoDimTable version of a Frame prints missing value (NA) as 0
    • [PUBDEV-3089] - Fix tree split finding logic for some cases where min_rows wasn't satisfied and the entire column was no longer considered even if there were allowed split points
    • [PUBDEV-3093] - saveModel and loadModel don't work with windows c:/ paths
    • [PUBDEV-3095] - getStackTrace fails on NumberFormatException
    • [PUBDEV-3096] - TwoDimTable for Frame Summaries doesn't always show the full precision
    • [PUBDEV-3097] - DRF OOB scoring isn't using observation weights
    • [PUBDEV-3099] - AIOOBE when calling 'getModel' in Flow while a GLM model is training

    Task

    • [PUBDEV-2681] - Properly document the addition of missing_values_handling arg to GLM

    Improvement

    • [PUBDEV-1617] - Matt's new merge (aka join) integrated into H2O
    • [