a8f2b5a Jul 17, 2017
@abal5 @tomkraljevic @ledell @angela0xdata

Recent Changes

H2O

Vajda (3.10.5.4) - 7/17/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/4/index.html

Bug

  • [PUBDEV-4694] - Tree Algos are wasting memory by storing categorical values in every tree

Vajda (3.10.5.3) - 6/30/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/3/index.html

Bug

  • [PUBDEV-4026] - Fixed an issue that resulted in "Unexpected character after column id:" warnings when parsing an SVMLight file.
  • [PUBDEV-4445] - h2o.predict now displays a warning if the features (columns) in the test frame do not contain those features used by the model.
  • [PUBDEV-4572] - The XGBoost REST API is now only registered when backend lib exists.
  • [PUBDEV-4595] - H2O no longer displays an error if there is a "/" in the user-supplied model name. Instead, a message will display indicating that the "/" is replaced with "_".

Improvement

  • [PUBDEV-3941] - Added support for autoencoder POJOs in the EasyPredictModelWrapper.
  • [PUBDEV-4269] - When using the Python client, H2O now warns the user about the minimum required Colorama version. Note that the current minimum version is 0.3.8.
  • [PUBDEV-4537] - Removed deprecation warnings from the H2O build.
  • [PUBDEV-4548] - Moved the initialization of XGBoost into the H2O core extension.

Docs

  • [PUBDEV-4515] - Added a link to a paper describing balance classes in the balance_classes parameter topic.
  • [PUBDEV-4610] - Removed `laplace`, `huber`, and `quantile` from list of supported distributions in the XGBoost documentation.
  • [PUBDEV-4612] - Added heuristics to the FAQ > General Troubleshooting topic.

Vajda (3.10.5.2) - 6/19/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/2/index.html

Bug

  • [PUBDEV-3860] - In PCA, fixed an issue that resulted in errors when specifying `pca_method=glrm` on wide datasets. In addition, the GLRM algorithm can now be used with wide datasets.
  • [PUBDEV-4416] - Fixed issues with streamParse in ORC parser that caused a NullPointerException when parsing multifile from Hive.
  • [PUBDEV-4438] - Fixed an issue that occurred with H2O data frame indexing for large indices that resulted in off-by-one errors. Now, when indexing is set to a value greater than 1000, indexing between left and right sides is no longer inconsistent.
  • [PUBDEV-4456] - In DRF, fixed an issue that resulted in an AssertionError when run on certain datasets with weights.
  • [PUBDEV-4579] - Removed an incorrect Python example from the Sparkling Water booklet. Python users must start Spark using the H2O pysparkling egg on the Python path. Using `--package` when running the pysparkling app is not advised, as the pysparkling distribution already contains the required jar file.
  • [PUBDEV-4594] - In GLM, fixed an issue that caused a runtime exception when specifying the quasibinomial family with `nfold = 2`.

New Feature

  • [PUBDEV-3624] - Added top and bottom N functions, which allow users to grab the top or bottom N percent of a numerical column. The returned frame contains the original row indices of the top/bottom N percent values extracted into the second column.
  • [PUBDEV-4096] - When building Stacked Ensembles in R, the base_models parameter can accept models rather than just model IDs. Updated the documentation in the User Guide for the base_models parameter to indicate this.
  • [PUBDEV-4523] - Added the following new GBM and DRF parameters to the User Guide: `calibrate_frame` and `calibrate_model`.
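
The top/bottom N percent selection described in PUBDEV-3624 above can be illustrated in plain Python. This is only a sketch of the idea, not H2O's implementation: the function name `top_n_percent`, the rounding rule (round up, at least one row), and the `(row_index, value)` output layout are all assumptions made for the example.

```python
import math

def top_n_percent(values, n_percent, bottom=False):
    """Return (original_row_index, value) pairs for the top (or bottom)
    n_percent of a numeric column. Hypothetical helper, not an H2O API."""
    # Keep the original row index next to each non-missing value.
    indexed = [(i, v) for i, v in enumerate(values) if v is not None]
    # Largest first for "top", smallest first for "bottom".
    indexed.sort(key=lambda pair: pair[1], reverse=not bottom)
    # Assumption: round the row count up and always return at least one row.
    count = max(1, math.ceil(len(indexed) * n_percent / 100))
    return indexed[:count]

column = [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]
top = top_n_percent(column, 20)
bottom = top_n_percent(column, 20, bottom=True)
```

Keeping the original row index alongside each value is what lets a caller join the extracted rows back to the source frame.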

Improvement

  • [PUBDEV-4531] - Improved PredictCsv.java as follows:
    • Enabled PredictCsv.java to accept arbitrary separator characters in the input dataset file if the user includes the optional flag `--separator` in the input arguments. If a user enters a special Java character as the separator, then H2O will add "\".
    • Enabled PredictCsv.java to perform setConvertInvalidNumbersToNa(setInvNumNA) if the optional flag `--setConvertInvalidNum` is included in the input arguments.
  • [PUBDEV-4578] - Fixed the R package so that a "browseURL" NOTE no longer appears.
  • [PUBDEV-4583] - In the R package documentation, improved the description of the GLM `alpha` parameter.

Docs

  • [PUBDEV-4524] - In the "Using Flow - H2O’s Web UI" section of the User Guide, updated the Viewing Models topic to include that users can download the h2o-genmodel.jar file when viewing models in Flow.
  • [PUBDEV-4549] - The `group_by` function accepts a number of aggregate options, which were documented in the User Guide and in the Python package documentation. These aggregate options are now described in the R package documentation.
  • [PUBDEV-4575] - Added an initial XGBoost topic to the User Guide. Note that this is still a work in progress.

Vajda (3.10.5.1) - 6/9/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/1/index.html

Bug

  • [PUBDEV-1457] - PCA no longer reports incorrect values when multiple eigenvectors exist.
  • [PUBDEV-1571] - Users can now specify the weights_column as a numeric index in R.
  • [PUBDEV-1578] - Fixed an issue that caused GLM models returned by h2o.glm() and h2o.getModel(..) to be different.
  • [PUBDEV-1616] - Fixed an issue that caused PCA with GLRM to display incorrect results on data.
  • [PUBDEV-2286] - Fixed an issue that caused `df.show(any_int)` to always display 10 rows.
  • [PUBDEV-2415] - Starting an H2O cloud from R no longer results in "Error in as.numeric(x["max_mem"]) : (list) object cannot be coerced to type 'double'"
  • [PUBDEV-2656] - `h2o::ifelse` now handles NA values the same way that `base::ifelse` does.
  • [PUBDEV-2715] - Fixed an issue in PCA that resulted in incorrect standard deviation and components results for non standardized data.
  • [PUBDEV-2759] - When performing a grid search with a `fold_assignment` specified and with `cross_validation` disabled, Python unit tests now display a Java error message. This is because a fold assignment is meaningless without cross validation.
  • [PUBDEV-2816] - The Python `h2o.get_grid()` function is now in the base h2o object, allowing you to use it the same way as `h2o.get_model()`, `h2o.get_frame()` etc.
  • [PUBDEV-3196] - The `.mean()` function can now be applied to a row in `H2OFrame.apply()`.
  • [PUBDEV-3350] - Fixed an issue that caused a negative value to display in the H2O cluster version.
  • [PUBDEV-3396] - GLM now checks to see if a response is encoded as a factor and warns the user if it is not.
  • [PUBDEV-3470] - Fixed an issue that resulted in an `h2o.init()` fail message even though the server had actually been started. As a result, H2O did not shutdown automatically upon exit.
  • [PUBDEV-3502] - Fixed an issue that caused PCA to hang when run on a wide dataset using the Randomized `pca_method`. Note that it is still not recommended to use Randomized with wide datasets.
  • [PUBDEV-3520] - `h2o.setLevels` now works correctly when wrapped into invisible.
  • [PUBDEV-3651] - Added a dependency for the roxygen2 package.
  • [PUBDEV-3711] - `h2o.coef` in R is now functional for multinomial models.
  • [PUBDEV-3729] - When converting a column to `type = string` with `.ascharacter()` in Python, the `structure` method now correctly recognizes the change.
  • [PUBDEV-3759] - Fixed an issue that caused GBM Grid Search to hang.
  • [PUBDEV-3777] - Subsetting an H2O frame now allows a 0-row subset, just as with data.frame.
  • [PUBDEV-3815] - Fixed an issue that caused the R `apply` method to fail to work with `h2o.var()`.
  • [PUBDEV-3859] - PCA no longer reports errors when using PCA on wide datasets with `pca_method = Randomized`. Note that it is still not recommended to use Randomized with wide datasets.
  • [PUBDEV-3900] - Jenkins builds no longer all share the same R package directory, and new H2O R libraries are installed during testing.
  • [PUBDEV-3905] - When trimming is done, H2O now checks if it passes the beginning of the string. This check prevents the code from going further down the memory with negative indexes.
  • [PUBDEV-3973] - Stacked Ensembles no longer fails when the `fold_assignment` for base learners is not `Modulo`.
  • [] - Fixed an issue that caused H2O to generate invalid code in POJO for PCA/SVM.
  • [PUBDEV-4079] - Instead of using random charset for getting bytes from strings, the source code now centralizes "byte extraction" in StringUtils. This prevents different build machines from using different default encoders.
  • [PUBDEV-4090] - When performing a Random Hyperparameter Search, if the model parameter seed is set to the default value but a search_criteria seed is not, then the model parameter seed will now be set to search_criteria seed+0, 1, 2, ..., model_number. Seeding the built models makes random hyperparameter searches more repeatable.
  • [PUBDEV-4100] - Fixed a bad link that was included in the "A K/V Store for In-Memory Analytics, Part 2" blog.
  • [PUBDEV-4138] - Comments are now permitted in Content-Type header for application/json mime type. As a result, specifying content-type charset no longer results in the request body being ignored.
  • [PUBDEV-4143] - Improved the Python `group_by` option count column name to match the R client.
  • [PUBDEV-4146] - Fixed broken links in the "Hacking Algorithms into H2O" blog post.
  • [PUBDEV-4156] - The Python API now provides a method to extract parameters from `cluster_status`.
  • [PUBDEV-4171] - Fixed incorrect parsing of input parameters. Previously, system property parsing logic added the value of any system property other than "ga_opt_out" to the arguments list if a property was prefixed with "ai.h2o.". This caused an attempt to parse the value of a system property as if it were itself a system property and at times resulted in an "Unknown Argument" error.
  • [PUBDEV-4174] - Fixed intermittent pyunit_javapredict_dynamic_data_paramsDR.
  • [PUBDEV-4177] - Fixed orc parser test by setting timezone to local time.
  • [PUBDEV-4185] - H2O can now correctly handle preflight OPTIONS calls - specifically in the event of a (1) CORS request and (2) the request has a content type other than text/plain, application/x-www-form-urlencoded, or multipart/form-data.
  • [PUBDEV-4202] - In the REST API, POST of application/json requests no longer fails if requests expect required fields.
  • [PUBDEV-4216] - The R client `impute` function now checks for categorical values and returns an error if none exist.
  • [PUBDEV-4231] - Fixed a filepath issue that occurred on Windows 7 systems when specifying a network drive.
  • [PUBDEV-4234] - Added a response column to Stacked Ensembles so that it can be exposed in the Flow UI.
  • [PUBDEV-4235] - Updated the list of required packages on the H2O download page for the Python client.
  • [PUBDEV-4250] - Updated the header in the Confusion Matrix to make the list of actual vs predicted values more clear.
  • [PUBDEV-4300] - Explicit 1-hot encoding in FrameUtils no longer generates an invalid order of column names. MissingLevel is now the last column.
  • [PUBDEV-4304] - Fixed an issue that caused ModelBuilder to leak xval frames if hyperparameter errors existed.
  • [PUBDEV-4311] - Fixed an issue that caused PCA model output to fail to display the Importance of Components.
  • [PUBDEV-4314] - When using the H2O Python client, the varimp() function can now be used in PCA to retrieve the Importance of Components details.
  • [PUBDEV-4315] - Fixed an issue that caused an ArrayIndexOutOfBoundsException in GLM.
  • [PUBDEV-4316] - When a main model is cloned to create the CV models, clearValidationMessages() is now called. Messages are no longer all thrown into a single bucket, which previously caused confusion with the `error_count()`.
  • [PUBDEV-4317] - ModelBuilder.message(...) now correctly bumps the error count when the message is an error.
  • [PUBDEV-4319] - Fixed an issue with unseen categorical levels handling in GLM scoring. Prediction with "skip" missing value handling in GLM with more than one variable no longer fails.
  • [PUBDEV-4321] - ModelMetricsRegression._mean_residual_deviance is now exposed. For all algorithms except GLM, this is the mean residual deviance. For GLM, this is the total residual deviance.
  • [PUBDEV-4326] - Fixed an issue that caused the `~` operator to fail when used in the Python client. Now, all logical operators set their results as Boolean.
  • [PUBDEV-4328] - Fixed an issue that caused an assertion error in GLM.
  • [PUBDEV-4330] - In GLM, fixed an issue that caused GLM to fail when `quasibinomial` was specified with a link other than the default. Specifying an incorrect link for the quasibinomial family will now result in an error message.
  • [PUBDEV-4350] - Improved the doc strings for `sample_rate_per_class` in R and Python.
  • [PUBDEV-4351] - Fixed a bug in the cosine distance formula.
  • [PUBDEV-4352] - Fixed an issue with CBSChunk set with long argument.
  • [PUBDEV-4363] - C0DChunk with con == NaN now works with strings.
  • [PUBDEV-4378] - When retrieving a Variable Importance plot using the H2O Python client, the default number of features shown is now 10 (or all if < 10 exist). Also reduced the top and bottom margins of the Y axis.
  • [PUBDEV-4381] - When retrieving a Variable Importance plot using the H2O R client, the default number of features shown is now 10 (or all if < 10 exist).
  • [PUBDEV-4416] - Fixed an ORC stream parse.
  • [PUBDEV-4429] - Appended constant string to frame.
  • [PUBDEV-4495] - Fixed an issue with the View Log option in Flow.
  • [PUBDEV-4499] - The h2o.deepwater.available function is now working in the R API.
  • [PUBDEV-4542] - Fixed a bug with Log.info that resulted in bypassing log initialization.
  • [PUBDEV-4543] - LogsHandler now checks whether logging on specific level is enabled before accessing the particular log.
  • [PUBDEV-4546] - Fixed a logging issue that caused PID values to be set to an incorrect value. H2O now initializes PID before initializing SELF_ADDRESS. This change was necessary because initialization of SELF_ADDRESS triggers buffered log messages to be logged, and PID is part of the log header.
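
The seeding rule described in PUBDEV-4090 above can be sketched in plain Python. This is an illustration only, not H2O's code: the function name `derive_model_seeds` and the use of `-1` as the "seed not set" sentinel are assumptions, and the fallback branch (leave the model seed untouched) is a guess at the behavior when the rule does not apply.

```python
DEFAULT_SEED = -1  # assumption: sentinel meaning "the user did not set a seed"

def derive_model_seeds(search_seed, model_seed, n_models):
    """Return one seed per model built by a random hyperparameter search.
    Hypothetical helper, not an H2O API."""
    if model_seed == DEFAULT_SEED and search_seed != DEFAULT_SEED:
        # Model seed left at its default, search seed given:
        # use search_seed + 0, + 1, + 2, ... so each model is repeatable.
        return [search_seed + i for i in range(n_models)]
    # Assumed fallback: keep whatever model seed was supplied.
    return [model_seed] * n_models

seeds = derive_model_seeds(search_seed=1234, model_seed=DEFAULT_SEED, n_models=3)
```

Deriving each model's seed from the search seed plus its position means that rerunning the same search with the same `search_criteria` seed rebuilds the same sequence of models.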

New Feature

  • [PUBDEV-47] - Generated R bindings are now available for the REST API.
  • [PUBDEV-103] - Flow: Implemented test infrastructure for Jenkins/CI.
  • [PUBDEV-525] - The R client now reports to the user when memory limits have been exceeded.
  • [PUBDEV-2022] - Added support to impute missing elements for RandomForest.
  • [PUBDEV-2348] - Added a probability calibration plot function.
  • [PUBDEV-2535] - A new h2o.pivot() function is available to allow pivoting of tables.
  • [PUBDEV-3666] - MOJO support has been extended to K-Means models.
  • [PUBDEV-3840] - Added two new options in GBM and DRF: `calibrate_model` and `calibrate_frame`. These flags allow you to retrieve calibrated probabilities for binary classification problems.
  • [PUBDEV-3850] - In Stacked Ensembles, added support for passing in models instead of model IDs when using the R client.
  • [PUBDEV-3970] - Added support for saving and loading binary Stacked Ensemble models.
  • [PUBDEV-4104] - Added support for idxmax, idxmin in Python H2OFrame to get an index of max/min values.
  • [PUBDEV-4105] - Added support for which.max, which.min in R H2OFrame to get an index of max/min values.
  • [PUBDEV-4134] - A new h2o.sort() function is available in the H2O Python client. This returns a new Frame that is sorted by column(s) in ascending order. The column(s) to sort by can be either a single column name, a list of column names, or a list of column indices.
  • [PUBDEV-4147] - Word2vec can now be used with the H2O Python client.
  • [PUBDEV-4151] - Missing values are filled sequentially for time series data.
  • [PUBDEV-4168] - Enabled cors option flag behind the sys.ai.h2o. prefix for debugging.
  • [PUBDEV-4266] - Added support for converting a Word2vec model to a Frame.
  • [PUBDEV-4280] - Created a Capability rest end point that gives the client an overview of registered extensions.
  • [PUBDEV-4329] - When viewing a model in Flow, a new **Download Gen Model** button is available, allowing you to save the h2o-genmodel.jar file locally.
  • [PUBDEV-4425] - Added an `h2o.flow()` function to base H2O. This allows users to open up a Flow window from within R and Python.
  • [PUBDEV-4472] - The `parse_type` parameter is now case insensitive.
  • [PUBDEV-4478] - Added automatic reduction of categorical levels for Aggregator. This can be done by setting `categorical_encoding=EnumLimited`.
  • [NA] - In GBM and DRF, added two new categorical_encoding schemas: SortByResponse and LabelEncoding. More information about these options is available here.

Story

  • [PUBDEV-3927] - Added support for Leave One Covariate Out (LOCO). This calculates row-wise variable importances by re-scoring a trained supervised model and measuring the impact of setting each variable to missing or its most central value (mean or median & mode for categoricals).
  • [PUBDEV-4049] - Removed support for Java 6.
  • [PUBDEV-4274] - Integrated XGBoost with H2O core as a separate extension module.
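
The Leave One Covariate Out idea from PUBDEV-3927 above can be sketched without H2O. This is a minimal illustration under stated assumptions, not the H2O implementation: `loco` and `predict` are hypothetical names, only numeric features are handled, the mean is used as the "most central value", and the sign convention (perturbed prediction minus baseline prediction) is chosen for the example.

```python
from statistics import mean

def loco(predict, rows, features):
    """Row-wise variable impact: re-score each row with one feature replaced
    by its column mean and record the change in prediction.
    Hypothetical helper; `predict` is any callable taking a dict row."""
    # Central value per feature (assumption: mean, numeric features only).
    centers = {f: mean(r[f] for r in rows) for f in features}
    result = []
    for row in rows:
        base = predict(row)
        impacts = {}
        for f in features:
            perturbed = dict(row)      # copy so the input row is not modified
            perturbed[f] = centers[f]  # "leave out" this covariate
            impacts[f] = predict(perturbed) - base
        result.append(impacts)
    return result

# Toy stand-in for a trained supervised model.
toy_model = lambda r: 2 * r["a"] + r["b"]
data = [{"a": 1, "b": 10}, {"a": 3, "b": 20}]
impacts = loco(toy_model, data, ["a", "b"])
```

Each output row holds one impact value per feature, so the result has the same number of rows as the scored data.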

Task

  • [PUBDEV-4062] - Users can now run predictions in R using a MOJO or POJO without H2O running.
  • [PUBDEV-4087] - Created a test to verify that random grid search honors the `max_runtime_secs` parameter.
  • [PUBDEV-4193] - Removed javaMess.txt from scripts
  • [PUBDEV-4238] - A new `node()` function is available for retrieving node information from an H2O Cluster.
  • [PUBDEV-4353] - Improved the R/Py doc strings for the `sample_rate_per_class` parameter.
  • [PUBDEV-4412] - Users can now optionally build h2o.jar with a visualization data server using the following: `./gradlew -PwithVisDataServer=true -PvisDataServerVersion=3.14.0 :h2o-assemblies:main:projects`
  • [PUBDEV-4454] - Removed support for the following Hadoop platforms: CDH 5.2, CDH 5.3, and HDP 2.1.
  • [PUBDEV-4466] - Added the ability to go from String to Enum in PojoUtils.
  • [PUBDEV-4479] - Continued modularization of H2O by removing reflection utils and replacing them with SPI.
  • [PUBDEV-4481] - Removed the deprecated `h2o.importURL` function from the R API.
  • [PUBDEV-4490] - Stacked Ensembles now removes any unnecessary frames, vecs, and models that were produced when compiled.
  • [PUBDEV-4494] - Updated R and Python doc strings to indicate that users can save and load Stacked Ensemble binary models. In the User Guide, updated the FAQ that previously indicated users could not save and load stacked ensemble models.

Improvement

  • [PUBDEV-3088] - Improved error handling when users receive the following error: `Error: lexical error: invalid char in json text.`
  • [PUBDEV-3500] - In PCA, when the user specifies a value for k that is <=0, then all principal components will automatically be calculated.
  • [PUBDEV-3908] - Exposed metalearner and base model keys in R/Py StackedEnsemble object.
  • [PUBDEV-4072] - The `h2o.download_pojo()` function now accepts a `jar_name` parameter, allowing users to create custom names for the downloaded file.
  • [PUBDEV-4103] - Added port and ip details to the error logs for h2o cloud.
  • [PUBDEV-4141] - When using Hadoop with SSL Internode Security, the `-internal_security` flag is now deprecated in favor of the `-internal_security_conf` flag.
  • [PUBDEV-4169] - Scala version of udf now serializes properly in multinode.
  • [PUBDEV-4181] - Fixed an NPM warn message.
  • [PUBDEV-4184] - Updated the documentation for using H2O with Anaconda and included an end-to-end example.
  • [PUBDEV-4190] - Arguments in h2o.naiveBayes in R are now the same as Python/Java.
  • [PUBDEV-4207] - StackedEnsembles is now stable vs. experimental.
  • [PUBDEV-4256] - Introduced latest_stable_R and latest_stable_py links, making it easy to point users to the current stable version of H2O for Python and R.
  • [PUBDEV-4267] - In the R client, the default for `nthreads` is now -1. The documentation examples have been updated to reflect this change.
  • [PUBDEV-4307] - ModelMetrics can sort models by a different Frame.
  • [PUBDEV-4331] - The application type is now reported in YARN manager, and H2O now overrides the default MapReduce type to H2O type.
  • [PUBDEV-4419] - Added a title option to PrintMOJO utility
  • [PUBDEV-4431] - Flow now uses ip:port for identifying the node as part of LogHandler.
  • [PUBDEV-4465] - Reduced the frequency of Hadoop heartbeat logging.
  • [PUBDEV-4484] - In GLM, quasibinomial models produce binomial metrics when scoring.
  • [PUBDEV-4492] - Implemented methods to get registered H2O capabilities in Python client.
  • [PUBDEV-4493] - Implemented methods to get registered H2O capabilities in R client.
  • [PUBDEV-4498] - Upgraded Flow to version 0.7.0
  • [PUBDEV-4511] - Removed the `selection_strategy` argument from Stacked Ensembles.
  • [PUBDEV-4533] - In Stacked Ensembles, added support for passing in models instead of model IDs when using the Python client.
  • [PUBDEV-4536] - Provided a file that contains a list of licenses for each H2O dependency. This can be acquired using com.github.hierynomus.license.
  • [PUBDEV-4540] - H2O now explicitly checks whether the port and baseport are within the allowed port range.

Docs

  • [PUBDEV-2864] - Added documentation describing how to call Rapids expressions from Flow.
  • [PUBDEV-3944] - Added parameter descriptions for Naive Bayes parameters.
  • [PUBDEV-3945] - Added examples for Naive Bayes parameters.
  • [PUBDEV-4075] - Added `label_encoder` and `sort_by_response` to the list of available `categorical_encoding` options.
  • [PUBDEV-4095] - Added support for KMeans in MOJO documentation.
  • [PUBDEV-4078] - Added a topic to the Data Manipulation section describing the `group_by` function.
  • [PUBDEV-4140] - In the Productionizing H2O section of the User Guide, added an example showing how to read a MOJO as a resource from a jar file.
  • [PUBDEV-4182] - Improved the R and Python documentation for coef() and coef_norm().
  • [PUBDEV-4183] - In the GLM section of the User Guide, added a topic describing how to extract coefficient table information. This new topic includes Python and R examples.
  • [PUBDEV-4184] - Added information about Anaconda support to the User Guide. Also included an IPython Notebook example.
  • [PUBDEV-4194] - Added Word2vec to list of supported algorithms on docs.h2o.ai.
  • [PUBDEV-4201] - Uncluttered the H2O User Guide. Combined several topics on the left navigation/TOC. Some changes include the following:
    • Moved AWS, Azure, DSX, and Nimbix to a new Cloud Integration section.
    • Added a new **Getting Data into H2O** topic and moved the Supported File Formats and Data Sources topics into this.
    • Moved POJO/MOJO topic into the **Productionizing H2O** section.
  • [PUBDEV-4206] - In the Security topic of the User Guide, added a section about using H2O with PAM authentication.
  • [PUBDEV-4211] - Documentation for `h2o.download_all_logs()` now informs the user that the supplied file name must include the .zip extension.
  • [PUBDEV-4218] - Added an FAQ describing how to use third-party plotting libraries to plot metrics in the H2O Python client. This FAQ is available in the FAQ > Python topic.
  • [PUBDEV-4230] - Added an "Authentication Options" section to **Starting H2O > From the Command Line**. This section describes the options that can be set for all available supported authentication types. This section also includes flags for setting the newly supported Pluggable Authentication Module (PAM) authentication as well as Form Authentication and Session timeouts for H2O Flow.
  • [PUBDEV-4232] - Updated documentation to indicate that Word2vec is now supported for Python.
  • [PUBDEV-4253] - Added support for HDP 2.6 in the Hadoop Users section.
  • [PUBDEV-4258] - Added two FAQs within the GLM section describing why H2O's glm differs from R's glm and the steps to take to get the two to match. These FAQs are available in the GLM > FAQ section.
  • [PUBDEV-4268] - Updated R examples in the User Guide to reflect that the default value for `nthreads` is now -1.
  • [PUBDEV-4281] - Updated the POJO Quick Start markdown file and Javadoc.
  • [PUBDEV-4290] - Added the `-principal` keyword to the list of Hadoop launch parameters.
  • [PUBDEV-4294] - In the Deep Learning topic, deleted the Algorithm section. The information included in that section has been moved into the Deep Learning FAQ.
  • [PUBDEV-4297] - Documented support for using H2O with Microsoft Azure Linux Data Science VM. Note that this is currently still a BETA feature.
  • [PUBDEV-4309] - Added an FAQ describing YARN resource usage. This FAQ is available in the FAQ > Hadoop topic.
  • [PUBDEV-4336] - Added parameter descriptions for PCA parameters.
  • [PUBDEV-4337] - Added examples for PCA parameters.
  • [PUBDEV-4348] - A new h2o.sort() function is available in the H2O Python client. This returns a new Frame that is sorted by column(s) in ascending order. The column(s) to sort by can be either a single column name, a list of column names, or a list of column indices. Information about this function is available in the Python and R documentation.
  • [PUBDEV-4349] - Updated the "Using H2O with Microsoft Azure" topics.
  • [PUBDEV-4362] - Updated the "What is H2O" section in each booklet.
  • [PUBDEV-4387] - A Deep Water booklet is now available. A link to this booklet is on docs.h2o.ai.
  • [PUBDEV-4396] - Updated GLM documentation to indicate that GLM supports both multinomial and binomial handling of categorical values.
  • [PUBDEV-4397] - Added an FAQ describing the steps to take if a user encounters a "Server error - server 127.0.0.1 is unreachable at this moment" message. This FAQ is available in the FAQ > R topic.
  • [PUBDEV-4401] - Fixed documentation that described estimating in K-means.
  • [PUBDEV-4403] - Updated the documentation that described how to download a model in Flow.
  • [PUBDEV-4444] - The Data Sources topic, which describes that data can come from local file system, S3, HDFS, and JDBC, now also includes that data can be imported by specifying the URL of a file.
  • [PUBDEV-4467] - H2O now supports GPUs. Updated the FAQ that indicated we do not, and added a pointer to Deep Water.

Ueno (3.10.4.8) - 5/21/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/8/index.html

Bug

  • [PUBDEV-4123] - Python: Frame summary does not return Python object
  • [PUBDEV-4315] - AIOOB with GLM
  • [PUBDEV-4330] - glm : quasi binomial with link other than default causes an h2o crash

Improvement

  • [PUBDEV-4332] - Create new /3/SteamMetrics REST API endpoint
  • [PUBDEV-4436] - Steam hadoop user impersonation

Ueno (3.10.4.7) - 5/8/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/7/index.html

Bug

  • [PUBDEV-4392] - h2o on yarn: H2O does not respect the cloud name in case of flatfile mode

Ueno (3.10.4.6) - 4/26/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/6/index.html

Bug

  • [PUBDEV-4265] - Problem with h2o.uploadFile on Windows
  • [PUBDEV-4339] - glm: get AIOOB exception on attached data
  • [PUBDEV-4341] - External cluster always reports "Timeout for confirmation exceeded!"

Ueno (3.10.4.5) - 4/19/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/5/index.html

Bug

  • [PUBDEV-4293] - Problem with h2o.merge in python
  • [PUBDEV-4306] - Failing SVM parse
  • [PUBDEV-4308] - Rollups computation errors sometimes get wrapped in an unhelpful exception and the original cause is hidden.

Ueno (3.10.4.4) - 4/15/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/4/index.html

Technical task

  • [PUBDEV-4244] - Add documentation on how to create a config file

Bug

  • [PUBDEV-2807] - PCA Rotations not displayed in Python API
  • [PUBDEV-4081] - Sparse matrix cannot be converted to H2O
  • [PUBDEV-4229] - Flow/Schema problem, predicting on frame without response returns empty model metrics
  • [PUBDEV-4246] - Proportion of variance in GLRM for single component has a value > 1
  • [PUBDEV-4251] - HDP 2.6 add to the build
  • [PUBDEV-4252] - Set timeout for read/write confirmation in ExternalFrameWriter/ExternalFrameReader
  • [PUBDEV-4261] - GLM default solver gets AIOOB when run on dataset with 1 categorical variable and no intercept
  • [PUBDEV-4285] - Correct exit status reporting ( when running on YARN )
  • [PUBDEV-4287] - Documentation: Update GLM FAQ and missing_values_handling parameter regarding unseen categorical values

Task

  • [PUBDEV-4180] - Wrap R examples in code so that they don't run on Mac OS
  • [PUBDEV-4215] - Export polygon function to fix CRAN note in h2o R package
  • [PUBDEV-4248] - Add a parameter that ignores the config file reader when h2o.init() is called

Improvement

  • [PUBDEV-4239] - Extend Watchdog client extension so cluster is also stopped when the client doesn't connect in specified timeout
  • [PUBDEV-4288] - Set hadoop user from h2odriver

Ueno (3.10.4.3) - 3/31/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/3/index.html

Bug

  • [PUBDEV-3281] - ARFF parser parses attached file incorrectly
  • [PUBDEV-4097] - Proxy warning message displays proxy with username and password.
  • [PUBDEV-4165] - h2o.import_sql_table works in R but on python gives error
  • [PUBDEV-4167] - java.lang.IllegalArgumentException with PCA
  • [PUBDEV-4187] - Impute does not handle categoricals when values is specified
  • [PUBDEV-4219] - Increase number of bins in partial plots

New Feature

  • [PUBDEV-4162] - h2o.transform can produce incorrect aggregated sentence embeddings

Improvement

  • [PUBDEV-3858] - Errors with PCA on wide data for pca_method = Power
  • [PUBDEV-4102] - Introduce mode in which failure of the H2O client ensures the whole H2O cloud goes down
  • [PUBDEV-4178] - Add support for IBM IOP 4.2
  • [PUBDEV-4186] - Placeholder for: [SW-334]
  • [PUBDEV-4191] - Remove minor version from hadoop distribution in buildinfo.json file

Ueno (3.10.4.2) - 3/18/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/2/index.html

Bug

  • [PUBDEV-4119] - Deep Learning: mini_batch_size >>> 1 causes OOM issues
  • [PUBDEV-4135] - head(df) and tail(df) results in R are inconsistent for datetime columns
  • [PUBDEV-4144] - GLM with family = multinomial, intercept=false, and weights or SkipMissing produces error
  • [PUBDEV-4155] - glm hot fix: fix model.score0 for multinomial

New Feature

  • [PUBDEV-4133] - Add option to specify a port range for the Hadoop driver callback
  • [PUBDEV-4139] - Support reading MOJO from a classpath resource

Improvement

  • [PUBDEV-4056] - Arff Parser doesn't recognize spaces in @attribute
  • [PUBDEV-4099] - How to generate Precision Recall AUC (PRAUC) from the scala code

Docs

  • [PUBDEV-3977] - Documentation: Add documentation for word2vec
  • [PUBDEV-4118] - Documentation: Add topic for using with IBM Data Science Experience
  • [PUBDEV-4149] - Document "driverportrange" option of H2O's Hadoop driver

Ueno (3.10.4.1) - 3/3/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/1/index.html

Technical task

  • [PUBDEV-3943] - Documentation: Naive Bayes links to parameters section

Bug

  • [PUBDEV-3817] - Error in predict, performance functions caused by fold_column
  • [PUBDEV-3820] - Kmeans Centroid info not Rendered through Python API
  • [PUBDEV-3827] - PCA "Importance of Components" returns "data frame with 0 columns and 0 rows"
  • [PUBDEV-3866] - Stratified sampling does not split minority class
  • [PUBDEV-3885] - R K-means user_point doesn't get used
  • [PUBDEV-3903] - Setting -context_path doesn't change REST API path
  • [PUBDEV-3932] - K-means Training Metrics do not match Prediction Metrics with same data
  • [PUBDEV-3938] - h2o-py/tests/testdir_hdfs/pyunit_INTERNAL_HDFS_timestamp_date_orc.py failing
  • [PUBDEV-4017] - gradle update broke the build
  • [PUBDEV-4019] - H2O config (~/.h2oconfig) should allow user to specify username and password
  • [PUBDEV-4032] - Flow/R/Python - H2O cloudInfo should show if cluster is secured or not
  • [PUBDEV-4039] - FLOW fails to display custom models including Word2Vec
  • [PUBDEV-4040] - Import json module as different alias in Python API
  • [PUBDEV-4041] - Stacked Ensemble docstring example is broken
  • [PUBDEV-4042] - The autogen R bindings have an incorrect definition for the y argument
  • [PUBDEV-4047] - AIOOB while training an H2OKMeansEstimator
  • [PUBDEV-4065] - Fix bug in randomgridsearch and Fix intermittent pyunit_gbm_random_grid_large.py
  • [PUBDEV-4066] - Typos in Stacked Ensemble Python H2O User Guide example code
  • [PUBDEV-4073] - StackedEnsemble: stacking fails if combined with ignore_columns
  • [PUBDEV-4083] - AIOOB in GLM

New Feature

  • [PUBDEV-3852] - Documentation: Add Data Munging topic for file name globbing
  • [PUBDEV-4009] - Integration to add new top-level Plot menu to Flow
  • [PUBDEV-4038] - Add stddev to PDP computation

Task

  • [PUBDEV-3685] - Update h2o-py README
  • [PUBDEV-3797] - Generate Python API tests for H2O Cluster commands
  • [PUBDEV-3914] - Add documentation for python GroupBy class
  • [PUBDEV-3915] - Document python's Assembly and ConfusionMatrix classes, add python API tests as well
  • [PUBDEV-3937] - Clean up R docs
  • [PUBDEV-3986] - Documentation: Summarize the method for estimating k in kmeans and add to docs
  • [PUBDEV-4006] - Update links to Stacking on docs.h2o.ai
  • [PUBDEV-4021] - H2O config (~/.h2oconfig) should allow user to specify username and password
  • [PUBDEV-4067] - Check if strict_version_check is TRUE when checking for config file

Improvement

  • [PUBDEV-3781] - Documentation: Add info about sparse data support
  • [PUBDEV-3784] - h2o doc deeplearning: clarify what the (heuristic) defaults for auto are in categorical_encoding
  • [PUBDEV-3919] - Saving/serializing currently existing, detailed model information
  • [PUBDEV-3961] - Py/R: Remove unused 'cluster_id' parameter
  • [PUBDEV-3983] - Update GBM FAQ
  • [PUBDEV-3994] - Documentation: Add info about imputing data in Flow and in Data Manipulation
  • [PUBDEV-3998] - Documentation: Add instructions for running demos
  • [PUBDEV-4005] - AIOOB Exception with fold_column set with kmeans
  • [PUBDEV-4055] - Modify h2o#connect function to accept config with connect_params field
  • [PUBDEV-4059] - Change of h2o.connect(config) interface to support Steam

Tverberg (3.10.3.5) - 2/16/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/5/index.html

Bug

  • [PUBDEV-3848] - GLM with interaction parameter and cross-validation cause Exception
  • [PUBDEV-3916] - pca: hangs on attached data
  • [PUBDEV-3964] - StepOutOfRangeException when building GBM model
  • [PUBDEV-3976] - py unique() returns frame of integers (since epoch) instead of frame of unique dates
  • [PUBDEV-3979] - py date comparisons don't work for rows > 1
  • [PUBDEV-3980] - AstUnique drops column types
  • [PUBDEV-4013] - In R, the confusion matrix at the end doesn’t say: vertical: actual, across: predicted
  • [PUBDEV-4014] - AIOOB in GLM with hex.DataInfo.getCategoricalId(DataInfo.java:952) is the error with 2 fold cross validation
  • [PUBDEV-4036] - Parse fails when trying to parse large number of Parquet files
  • [HEXDEV-683] - POJO doesn't include Forest classes
  • [PUBDEV-4044] - moment producing wrong dates

Tverberg (3.10.3.4) - 2/3/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/4/index.html

Bug

  • [PUBDEV-3965] - Importing data in python returns error - TypeError: expected string or bytes-like object

Tverberg (3.10.3.3) - 2/2/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/3/index.html

Bug

  • [PUBDEV-3835] - Standard Errors in GLM: calculating and showing specifically when called

Tverberg (3.10.3.2) - 1/31/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/2/index.html

Bug

  • Hotfix: Remove StackedEnsemble from Flow UI. Training is only supported from Python and R interfaces. Viewing is supported in the Flow UI.

Tverberg (3.10.3.1) - 1/30/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/1/index.html

Bug

  • [PUBDEV-2464] - Using asfactor() in Python client cannot allocate to a variable
  • [PUBDEV-3111] - R API's h2o.interaction() does not use destination_frame argument
  • [PUBDEV-3694] - Errors with PCA on wide data for pca_method = GramSVD which is the default
  • [PUBDEV-3742] - StackedEnsemble should work for regression
  • [PUBDEV-3865] - h2o gbm: for an unseen categorical level, discrepancy in predictions when scoring with h2o vs pojo/mojo
  • [PUBDEV-3883] - Negative indexing for H2OFrame is buggy in R API
  • [PUBDEV-3894] - Relational operators don't work properly with time columns.
  • [PUBDEV-3966] - java.lang.AssertionError when using h2o.makeGLMModel

Story

  • [PUBDEV-3739] - StackedEnsemble: put ensemble creation into the back end

New Feature

  • [PUBDEV-2058] - Implement word2vec in h2o
  • [PUBDEV-3635] - Ability to Select Columns for PDP computation in Flow
  • [PUBDEV-3881] - Add PCA Estimator documentation to Python API Docs
  • [PUBDEV-3902] - Documentation: Add information about Azure support to H2O User Guide (Beta)

Task

  • [PUBDEV-3336] - h2o.create_frame(): if randomize=True, `value` param cannot be used
  • [PUBDEV-3740] - REST: implement simple ensemble generation API
  • [PUBDEV-3843] - Modify R REST API to always return binary data
  • [PUBDEV-3844] - Safe GET calls for POJO/MOJO/genmodel
  • [PUBDEV-3864] - Import files by pattern
  • [PUBDEV-3884] - StackedEnsemble: Add to online documentation
  • [PUBDEV-3940] - Add Stacked Ensemble code examples to R docs

Improvement

  • [PUBDEV-3257] - Documentation: As a K-Means user, I want to be able to better understand the parameters
  • [PUBDEV-3741] - StackedEnsemble: add tests in R and Python to ensure that a StackedEnsemble performs at least as well as the base_models
  • [PUBDEV-3857] - Clean up the generated Python docs
  • [PUBDEV-3895] - Filter H2OFrame on pandas dates and time (python)
  • [PUBDEV-3912] - Provide way to specify context_path via Python/R h2o.init methods
  • [PUBDEV-3933] - Modify gen_R.py for Stacked Ensemble
  • [PUBDEV-3972] - Add Stacked Ensemble code examples to Python docstrings

Tutte (3.10.2.2) - 1/12/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tutte/2/index.html

Task

  • [PUBDEV-3816] - import functions required for r-release check

Tutte (3.10.2.1) - 12/22/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tutte/1/index.html

Bug

  • [PUBDEV-3291] - Summary() doesn't update stats values when asfactor() is applied
  • [PUBDEV-3498] - rectangular assign to a categorical column does not work (should be possible to assign either an existing level, or a new one)
  • [PUBDEV-3618] - Numerical Column Names in H2O and R
  • [PUBDEV-3690] - pred_noise_bandwidth parameter is not reproducible with seed
  • [PUBDEV-3723] - Fix mktime() referencing from 0 base to 1 base for month and day
  • [PUBDEV-3728] - Binary loss functions return error in GLRM
  • [PUBDEV-3747] - python hist() plotted bars overlap
  • [PUBDEV-3750] - Python set_levels doesn't change other methods
  • [PUBDEV-3753] - h2o doc: glm grid search hyper parameters missing/incorrect listing. Presently glrm's is marked as glm's
  • [PUBDEV-3764] - Partial Plot incorrectly calculates for constant categorical column
  • [PUBDEV-3778] - h2o.proj_archetypes returns error if constant column is dropped in GLRM model
  • [PUBDEV-3788] - GLRM loss by col produces error if constant columns are dropped
  • [PUBDEV-3796] - isna() overwrites column names
  • [PUBDEV-3812] - NullPointerException with Quantile GBM, cross validation, & sample_rate < 1
  • [PUBDEV-3819] - R h2o.download_mojo broken - writes a 1 byte file
  • [PUBDEV-3831] - Seed definition incorrect in R API for RF, GBM, GLM, NB
  • [PUBDEV-3834] - h2o.glm: get AIOOB exception with xval and lambda search

New Feature

  • [PUBDEV-3482] - Supporting GLM binomial model to allow two arbitrary integer values
  • [PUBDEV-3376] - Implement ISAX calculations per ISAX word
  • [PUBDEV-3377] - Optimizations and final fixes for ISAX
  • [PUBDEV-3664] - Implement GLM MOJO
  • [PUBDEV-3501] - Variance metrics are missing from GLRM that are available in PCA
  • [PUBDEV-3541] - py h2o.as_list() should not return headers
  • [PUBDEV-3715] - Modify sum() calculation to work on rows or columns
  • [PUBDEV-3737] - make sure that the generated R bindings work with StackedEnsemble
  • [PUBDEV-3833] - Add HDP 2.5 Support

Task

  • [PUBDEV-3012] - Remove grid.sort_by method in Python API
  • [PUBDEV-3695] - Documentation: Add GLM to list of algorithms that support MOJOs
  • [PUBDEV-3791] - Documentation: Add quasibinomial family in GLM
  • [PUBDEV-3676] - Add SLURM cluster documentation
  • [PUBDEV-3692] - Add memory check for GLRM before proceeding
  • [PUBDEV-3765] - Check to make sure hinge loss works for GLRM
  • [PUBDEV-3803] - Add parameters from _upload_python_object to H2OFrame constructor
  • [PUBDEV-3804] - Refer to .h2o.jar.env when detaching R package
  • [PUBDEV-3805] - Call on proper port when exiting R/detaching package
  • [PUBDEV-3806] - Modify search for config file in R api
  • [PUBDEV-3818] - properly handle url in R docs from autogen

Improvement

  • [PUBDEV-3256] - Documentation: As a GLM user, I want to be able to better understand the parameters
  • [PUBDEV-3758] - Fix bad/inconsistent/empty categorical (bitset) splits for DRF/GBM
  • [PUBDEV-3793] - Auto-generate R bindings

Turnbull (3.10.1.2) - 12/14/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turnbull/2/index.html

Bug

  • [PUBDEV-2801] - Starting h2o server from R ignores IP and port parameters
  • [PUBDEV-3484] - Treat 1-element numeric list as acceptable when numeric input required
  • [PUBDEV-3509] - h2o's cor() breaks R's native cor()
  • [PUBDEV-3592] - h2o.get_grid isn't working
  • [PUBDEV-3607] - `cor` function should properly pass arguments
  • [PUBDEV-3629] - Avoid confusing error message when column name is not found.
  • [PUBDEV-3631] - overwrite_with_best_model fails when using checkpoint
  • [PUBDEV-3633] - plot.h2oModel in R no longer supports metrics with uppercase names (e.g. AUC)
  • [PUBDEV-3642] - Fix citibike R demo
  • [PUBDEV-3697] - Create an Attribute for Number of Internal Trees in Python
  • [PUBDEV-3704] - Error with early stopping and score_tree_interval on GBM
  • [PUBDEV-3735] - Python's coef() and coef_norm() should use column name not index
  • [PUBDEV-3757] - Perfbar does not work for hierarchical path passed via -h2o_context

New Feature

  • [PUBDEV-3474] - Show Partial Dependence Plots in Flow
  • [PUBDEV-3620] - Allow setting nthreads > 255.
  • [PUBDEV-3700] - Add RMSE, MAE, RMSLE, and lift_top_group as stopping metrics
  • [PUBDEV-3719] - Update h2o.mean in R to match Python API

Task

  • [PUBDEV-3579] - Document Partial Dependence Plot in Flow
  • [PUBDEV-3621] - Add R endpoint for cumsum, cumprod, cummin, and cummax
  • [PUBDEV-3649] - Modify correlation matrix calculation to match R
  • [PUBDEV-3657] - Remove max_confusion_matrix_size from booklets & py doc

Improvement

  • [HEXDEV-645] - aggregator should calculate domain for enum columns in aggregated output frames & member frames based on current output or member frame
  • [HEXDEV-658] - Naive Bayes (and maybe GLM): Drop limit on classes that can be predicted (currently 1000)
  • [PUBDEV-3625] - Speed up GBM and DRF
  • [PUBDEV-3756] - Support `-context_path` to change servlet path for REST API

Turing (3.10.0.10) - 11/7/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/10/index.html

Bug

  • [PUBDEV-3484] - Treat 1-element numeric list as acceptable when numeric input required
  • [PUBDEV-3675] - Cannot determine file type

Turing (3.10.0.9) - 10/25/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/9/index.html

Bug

  • [PUBDEV-3546] - h2o.year() method does not return year
  • [PUBDEV-3559] - Regression Training Metrics: Deviance and MAE were swapped
  • [PUBDEV-3568] - h2o.max returns NaN even when na.rm condition is set to TRUE
  • [PUBDEV-3593] - Fix display of array-valued entries in TwoDimTables such as grid search results

Improvement

  • [PUBDEV-3585] - Optimize algorithm for automatic estimation of K for K-Means
  • [HEXDEV-646] - include flow, /3/ API accessible Aggregator model in h2o-3

Turing (3.10.0.8) - 10/10/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/8/index.html

Bug

  • [PUBDEV-3384] - S3 API method PersistS3#uriToKey breaks expected contract
  • [PUBDEV-3437] - GLM multinomial with defaults fails on attached dataset
  • [PUBDEV-3441] - .structure() encounters list index out of bounds when nan is encountered in column
  • [PUBDEV-3455] - max_active_predictors option in glm does not work anymore
  • [PUBDEV-3461] - Printed PCA model metrics in R are missing
  • [PUBDEV-3477] - R - Unnecessary JDK requirement on Windows
  • [PUBDEV-3505] - uuid columns with mostly missing values cause parse to fail.
  • [HEXDEV-599] - Fold Column not available in h2o.grid

New Feature

  • [PUBDEV-1943] - Compute partial dependence data
  • [PUBDEV-3422] - Create Method to Return Columns of Specific Type
  • [PUBDEV-3491] - Find optimal number of clusters in K-Means
  • [PUBDEV-3492] - Add optional categorical encoding schemes for GBM/DRF

Task

  • [PUBDEV-3327] - Tasks for completing MOJO support
  • [PUBDEV-3444] - Ensure functions have `h2o.*` alias in R API

Improvement

  • [PUBDEV-3465] - Sync up functionality of download_mojo and download_pojo in R & Py
  • [PUBDEV-3499] - Improve the stopping criterion for K-Means Lloyds iterations
  • [HEXDEV-596] - Encryption of H2O communication channels
  • [HEXDEV-636] - add option to Aggregator model to show ignored columns in output frame

Turing (3.10.0.7) - 9/19/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/7/index.html

Bug

  • [PUBDEV-3300] - NPE during categorical encoding with cross-validation (Windows 8 runit only??)
  • [PUBDEV-3306] - H2OFrame arithmetic/statistical functions return inconsistent types
  • [PUBDEV-3315] - Multi file parse fails with NPE
  • [PUBDEV-3374] - h2o.hist() does not respect breaks
  • [PUBDEV-3401] - importFiles, with s3n, gives NullPointerException
  • [PUBDEV-3409] - Python Structure() Breaks When Applied to Entire Dataframe

New Feature

  • [PUBDEV-2707] - Diff operation on column in H2O Frame
  • [HEXDEV-619] - calculate residuals in h2o-3 and in flow and create a new frame with a new column that contains the residuals

Improvement

  • [PUBDEV-3296] - In R, allow x to be missing (meaning take all columns except y) for all supervised algo's
  • [PUBDEV-3329] - median() should return a list of medians from an entire frame
  • [PUBDEV-3334] - Conduct rbind and cbind on multiple frames
  • [PUBDEV-3387] - Add argument to H2OFrame.print in R to specify number of rows
  • [PUBDEV-3418] - Suppress chunk summary in describe()

Turing (3.10.0.6) - 8/25/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/6/index.html

Bug

  • [HEXDEV-608] - Hashmap in H2OIllegalArgumentException fails to deserialize & throws FATAL
  • [PUBDEV-2879] - NPE in MetadataHandler
  • [PUBDEV-3086] - hist() fails for constant numeric columns
  • [PUBDEV-3173] - Client mode: flatfile requires list of all nodes, but a single entry node should be sufficient
  • [PUBDEV-3207] - Make CreateFrame reproducible for categorical columns.
  • [PUBDEV-3208] - Fix intermittency of categorical encoding via eigenvector.
  • [PUBDEV-3211] - isBitIdentical is returning true for two Frames with different content
  • [PUBDEV-3222] - AssertionError for DL train/valid with categorical encoding
  • [PUBDEV-3237] - Wrong MAE for observation weights other than 1.
  • [PUBDEV-3244] - H2ODriver for CDH5.7.0 does not accept memory settings
  • [PUBDEV-3276] - H2OFrame.drop() leaves the frame in inconsistent state

New Feature

  • [PUBDEV-3007] - Implement skewness calculation for H2O Frames
  • [PUBDEV-3008] - Implement kurtosis calculation for H2O Frames
  • [PUBDEV-3128] - Add ability to do a deep copy in Python API
  • [PUBDEV-3163] - Add docs for h2o.make_metrics() for R and Python
  • [PUBDEV-3218] - Add RMSLE to model metrics
  • [PUBDEV-3264] - Return unique values of a categorical column as a Pythonic list

Task

  • [PUBDEV-3235] - Refactor and simplify implementation of Pearson Correlation
  • [PUBDEV-3238] - Add MAE to CV Summary

Improvement

  • [PUBDEV-2702] - Create h2o.* functions for H2O primitives
  • [PUBDEV-3098] - Add methods to get actual and default parameters of a model
  • [PUBDEV-3132] - Add ability to drop a list of columns or a subset of rows from an H2OFrame
  • [PUBDEV-3138] - Ensure all is*() functions return a list

Turing (3.10.0.3) - 7/29/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/3/index.html

Bug

  • [PUBDEV-2805] - Error when setting a string column to a single value in R/Py
  • [PUBDEV-2965] - R h2o.merge() ignores by.x and by.y
  • [PUBDEV-3135] - Download Logs broken URL from Flow

New Feature

  • [PUBDEV-2958] - H2O Version Check
  • [PUBDEV-3022] - Add an h2o.concat function equivalent to pandas.concat
  • [PUBDEV-3050] - Add Huber loss function for GBM and DL (for regression)
  • [PUBDEV-3071] - Add RMSE to model metrics
  • [PUBDEV-3104] - Add Mean Absolute Error to Model Metrics
  • [PUBDEV-3108] - Add mean absolute error to scoring history and model plotting
  • [PUBDEV-3116] - Add categorical encoding schemes for DL and Aggregator
  • [PUBDEV-3155] - Compute supervised ModelMetrics from predicted and actual values in Java/R
  • [PUBDEV-3162] - Compute supervised ModelMetrics from predicted and actual values in Python

Improvement

  • [PUBDEV-1888] - Implement gradient checking for DL
  • [PUBDEV-2627] - Add better warning message to functions of H2OModelMetrics objects
  • [PUBDEV-3021] - Add demo datasets to Python package
  • [PUBDEV-3113] - Replace "MSE" with "RMSE" in scoring history table
  • [PUBDEV-3122] - Make all TwoDimTable Headers Pythonic in R and Python API
  • [PUBDEV-3129] - Achieve consistency between DL and GBM/RF scoring history in regression case
  • [PUBDEV-3131] - Disable R^2 stopping criterion in tree model builders
  • [PUBDEV-3149] - Remove R^2 from all model output except GLM

Turin (3.8.3.4) - 7/15/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turin/4/index.html

Bug

  • [PUBDEV-3040] - File parse from S3 extremely slow
  • [PUBDEV-3145] - Fix Deep Learning POJO for hidden dropout other than 0.5

Turin (3.8.3.2) - 7/1/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turin/2/index.html

Bug

  • [PUBDEV-898] - DRF: sample_rate=1 not permitted unless validation is performed
  • [PUBDEV-2087] - create a set of tests which create large POJOs for each algo and compiles them
  • [PUBDEV-2322] - Merge (method="radix") bug1
  • [PUBDEV-2325] - Merge (method="radix") bug2
  • [PUBDEV-2565] - Fold Column not available in h2o.grid
  • [PUBDEV-2964] - h2o.merge(,method="radix") failing 15/40 runs
  • [PUBDEV-3030] - Parse: java.lang.IllegalArgumentException: 0 > -2147483648
  • [PUBDEV-3032] - Cached errors are not printed if H2O exits
  • [PUBDEV-3072] - java.lang.ClassCastException for Quantile GBM
  • [PUBDEV-3077] - model_summary number of trees is too high for multinomial DRF/GBM models
  • [PUBDEV-3079] - NPE when accessing invalid null Frame cache in a Frame's vecs()
  • [PUBDEV-3081] - TwoDimTable version of a Frame prints missing value (NA) as 0
  • [PUBDEV-3089] - Fix tree split finding logic for some cases where min_rows wasn't satisfied and the entire column was no longer considered even if there were allowed split points
  • [PUBDEV-3093] - saveModel and loadModel don't work with windows c:/ paths
  • [PUBDEV-3095] - getStackTrace fails on NumberFormatException
  • [PUBDEV-3096] - TwoDimTable for Frame Summaries doesn't always show the full precision
  • [PUBDEV-3097] - DRF OOB scoring isn't using observation weights
  • [PUBDEV-3099] - AIOOBE when calling 'getModel' in Flow while a GLM model is training

Task

  • [PUBDEV-2681] - Properly document the addition of missing_values_handling arg to GLM

Improvement

  • [PUBDEV-1617] - Matt's new merge (aka join) integrated into H2O
  • [PUBDEV-2822] - Improved handling of missing values in tree models (training and testing)
  • [PUBDEV-3060] - IPv6 documentation
  • [PUBDEV-3066] - Stop GBM models once the effective learning rate drops below 1e-6.
  • [PUBDEV-3094] - Log input parameters during boot of H2O

Turchin (3.8.2.9) - 6/10/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/9/index.html

Bug

  • [PUBDEV-2920] - Python apply() doesn't recognize % (modulo) within lambda function
  • [PUBDEV-2940] - Documentation: Add RoundRobin histogram_type to GBM/DRF
  • [PUBDEV-2957] - Add "seed" option to GLM in documentation
  • [PUBDEV-2973] - Documentation: Update supported Hadoop versions
  • [PUBDEV-2981] - Models hang when max_runtime_secs is too small
  • [PUBDEV-2982] - Default min/max_mem_size to gigabytes in h2o.init
  • [PUBDEV-2997] - Add "ignore_const_cols" argument to glm and gbm for Python API
  • [PUBDEV-2999] - AIOOBE in GBM if no nodes are split during tree building
  • [PUBDEV-3004] - Negative R^2 (now NaN) can prevent early stopping
  • [PUBDEV-3011] - Two grid sorting methods in Py API - only one works sometimes

Task

  • [PUBDEV-3005] - Verify checkpoint argument in h2o.gbm (for R)

Improvement

  • [PUBDEV-2040] - Sync up argument names in `h2o.init` between R and Python
  • [PUBDEV-2996] - Change `getjar` to `get_jar` in h2o.download_pojo in R
  • [PUBDEV-2998] - Change min_split_improvement default value from 0 to 1e-5 for GBM/DRF
  • [PUBDEV-3013] - Allow specification of "AUC" or "auc" or "Auc" for stopping_metrics, sorting of grids, etc.

Turchin (3.8.2.8) - 6/2/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/8/index.html

Bug

  • [PUBDEV-2985] - Make Random grid search consistent between clients for same parameters
  • [PUBDEV-2987] - Allow learn_rate_annealing to be passed to H2OGBMEstimator constructor in Python API
  • [PUBDEV-2989] - Fix typo in GBM/DRF Python API for col_sample_rate_change_per_level - was misnamed and couldn't be set

New Feature

  • [PUBDEV-2979] - Add a new metric: mean misclassification error for classification models

Improvement

  • [PUBDEV-2972] - No longer print negative R^2 values - show NaN instead
  • [PUBDEV-2984] - Add xval=True/False as an option to model_performance() in Python API

Turchin (3.8.2.6) - 5/24/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/6/index.html

Bug

  • [PUBDEV-1899] - Number of active predictors is off by 1 when Intercept is included
  • [PUBDEV-2942] - GLM with cross-validation AIOOBE (+ Grid-Search + Multinomial, may be related)
  • [PUBDEV-2943] - Improved accuracy for histogram_type="QuantilesGlobal" for DRF/GBM

New Feature

  • [PUBDEV-1705] - GLM needs 'seed' argument for new (random) implementation of n-folds
  • [PUBDEV-2743] - Add seed argument to GLM

Improvement

  • [PUBDEV-2928] - Remove _Dev from file name _DataScienceH2O-Dev
  • [PUBDEV-2945] - Clean up overly long and duplicate error message in KeyV3
  • [PUBDEV-2953] - Allow the user to pass column types of an existing H2OFrame during Parse/Upload in R and Python
  • [PUBDEV-2954] - Tweak Parser Heuristic
  • [PUBDEV-2955] - GLM improvements and fixes

Turchin (3.8.2.5) - 5/19/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/5/index.html

Bug

  • [PUBDEV-2282] - DRF: cannot compile pojo
  • [PUBDEV-2304] - GBM pojo compile failures
  • [PUBDEV-2878] - Bug in h2o-py H2OScaler.inverse_transform()
  • [PUBDEV-2880] - Add NAOmit() to Rapids
  • [PUBDEV-2897] - AIOOBE in Vec.factor (due to Parse bug?)
  • [PUBDEV-2903] - In grid search, max_runtime_secs without max_models hangs
  • [PUBDEV-2933] - GBM's fold_assignment = "Stratified" breaks with missing values in response column

New Feature

  • [PUBDEV-2729] - Implement h2o.relevel, equivalent of base R's relevel function
  • [PUBDEV-2857] - Add Kerberos authentication to Flow
  • [PUBDEV-2893] - Summaries Fail in rdemo.citi.bike.small.R
  • [PUBDEV-2895] - DimReduction for EasyModelAPI
  • [PUBDEV-2915] - Make histograms truly adaptive (quantiles-based) for DRF/GBM

Improvement

  • [PUBDEV-2905] - Improve the progress bar based on max_runtime_secs & max_models & actual work
  • [PUBDEV-2908] - Improve GBM/DRF reproducibility for fixed parameters and hardware
  • [PUBDEV-2911] - Check sanity of random grid search parameters (max_models and max_runtime_secs)
  • [PUBDEV-2912] - Add Job's remaining time to Flow
  • [PUBDEV-2919] - Add enum option 'histogram_type' to DRF/GBM (and remove random_split_points)
  • [PUBDEV-2923] - JUnit: Separate POJO namespace during junit testing

Turchin (3.8.2.3) - 4/25/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/3/index.html

Bug

  • [PUBDEV-2852] - Incorrect sparse chunk getDoubles() extraction

New Feature

  • [PUBDEV-2825] - Create h2o.get_grid
  • [PUBDEV-2834] - Implement distributed Aggregator for visualization
  • [PUBDEV-2835] - Add col_sample_rate_change_per_level for GBM/DRF
  • [PUBDEV-2836] - Add learn_rate_annealing for GBM
  • [PUBDEV-2837] - Add random cut points for histograms in DRF/GBM (ExtraTreesClassifier)
  • [PUBDEV-2851] - Add limit on max. leaf node contribution for GBM

Task

  • [PUBDEV-2848] - Add tests for early stopping logic (stopping_rounds > 0)

Improvement

  • [PUBDEV-2877] - Make NA split decisions internally more consistent

Turchin (3.8.2.2) - 4/8/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/2/index.html

Bug

  • [PUBDEV-2820] - Implement max_runtime_secs to limit total runtime of building GLM models with and without cross-validation enabled

New Feature

  • [PUBDEV-2815] - Add stratified sampling per-tree for DRF/GBM

Turchin (3.8.2.1) - 4/7/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/1/index.html

Bug

  • [PUBDEV-2766] - AIOOBE for quantile regression with stochastic GBM
  • [PUBDEV-2770] - Naive Bayes AIOOBE
  • [PUBDEV-2772] - AIOOBE for GBM if test set has different number of classes than training set
  • [PUBDEV-2775] - Number of CPUs incorrect in Flow when using a hypervisor
  • [PUBDEV-2796] - Grid search runtime isn't enforced for CV models
  • [PUBDEV-2819] - AIOOBE in GLM for dense rows in sparse data

New Feature

  • [PUBDEV-2540] - Compute and display statistics of cross-validation model metrics
  • [PUBDEV-2774] - Add keep_cross_validation_fold_assignment and more CV accessors
  • [PUBDEV-2776] - Set initial weights and biases for DL models
  • [PUBDEV-2791] - Control min. relative squared error reduction for a node to split (DRF/GBM)
  • [PUBDEV-2806] - On-the-fly interactions for GLM
  • [PUBDEV-2815] - Add stratified sampling per-tree for DRF/GBM

Task

  • [PUBDEV-2055] - Create test cases to show that POJO prediction behavior can be different than in-h2o-model prediction behavior

Improvement

  • [PUBDEV-2620] - Populate start/end/duration time in milliseconds for all models
  • [PUBDEV-2695] - Consistent handling of missing categories in GBM/DRF (and between H2O and POJO)
  • [PUBDEV-2736] - Alert the user if columns can't be histogrammed due to numerical extremities
  • [PUBDEV-2756] - GLM should generate an error if the user enters an alpha value greater than 1.
  • [PUBDEV-2763] - Create full holdout prediction frame for cross-validation predictions
  • [PUBDEV-2769] - Support Validation Frame and Cross-Validation for Naive Bayes
  • [PUBDEV-2810] - Add class_sampling_factors argument to DRF/GBM for R and Python APIs

Turan (3.8.1.4) - 3/16/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turan/4/index.html

Bug

  • [PUBDEV-542] - KMeans: Size of clusters in Model Output is different from the labels generated on the training set
  • [PUBDEV-1976] - GLM fails on negative alpha
  • [PUBDEV-2718] - countmatches bug
  • [PUBDEV-2727] - bug in processTables in communication.R
  • [PUBDEV-2742] - Allow strings to be set to NA

New Feature

  • [PUBDEV-2719] - Implement Shannon entropy for a string
  • [PUBDEV-2720] - Implement proportion of substrings that are valid English words
  • [PUBDEV-2733] - Add utility function, h2o.ensemble_performance for ensemble and base learner metrics
  • [PUBDEV-2741] - Add date/time and string columns to createFrame.
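
As background for PUBDEV-2719, the Shannon entropy of a string can be computed from its character frequencies. The sketch below is a plain-Python illustration under stated assumptions, not H2O's implementation: it assumes character-level frequencies and base-2 logarithms (bits per character), and the function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of the character distribution of `text`, in bits per character."""
    if not text:
        return 0.0  # convention for this sketch: an empty string carries no information
    counts = Counter(text)
    total = len(text)
    # H = sum over distinct characters of p * log2(1 / p), where p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# A string with one repeated character has zero entropy;
# a string of equally frequent characters has entropy log2(number of distinct characters).
for sample in ["aaaa", "aabb", "abcd"]:
    print(sample, shannon_entropy(sample))
```

Low values indicate repetitive strings and high values indicate strings whose characters are close to uniformly distributed.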

Task

  • [PUBDEV-58] - Certify sparkling water on CDH5.2

Improvement

  • [PUBDEV-277] - Make python equivalent of as.h2o() work for numpy array and pandas arrays

Turan (3.8.1.3) - 3/6/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turan/3/index.html

Bug

  • [PUBDEV-2644] - Collinear columns cause NPE for P-values computation
  • [PUBDEV-2721] - Update default values in h2o.glm.wrapper from -1 and NaN to NULL
  • [PUBDEV-2722] - AIOOBE in NewChunk

New Feature

  • [PUBDEV-2111] - Hive UDF form for Scoring Engine POJO for H2O Models

Turan (3.8.1.2) - 3/4/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turan/2/index.html

New Feature

  • [PUBDEV-2711] - Allow DL models to be pretrained on unlabeled data with an autoencoder

Improvement

  • [PUBDEV-2708] - H2O Flow does not contain CodeMirror library
  • [PUBDEV-2710] - Model export fails: parent directory does not exist
  • [PUBDEV-2712] - Flow doesn't show DL AE error (MSE) plot
  • [PUBDEV-2717] - Do not compute expensive quantiles during h2o.summary call

Turan (3.8.1.1) - 3/3/16

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turan/1/index.html

Technical task

  • [PUBDEV-2705] - implement random (stochastic) hyperparameter search
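
The idea behind random (stochastic) hyperparameter search can be sketched in a few lines of plain Python: instead of enumerating every combination, draw combinations at random until a model budget is reached. This is only a sketch under assumptions; the function random_grid, its parameters, and the example hyperparameter names are hypothetical and do not reflect H2O's API or internals.

```python
import random


def random_grid(hyper_params, max_models, seed=None):
    """Yield up to `max_models` distinct random hyperparameter combinations.

    `hyper_params` maps a parameter name to a list of hashable candidate
    values. Hypothetical helper for illustration; not the H2O implementation.
    """
    rng = random.Random(seed)
    names = sorted(hyper_params)
    # Drop duplicate candidate values so the count of combinations is exact.
    values = {name: list(dict.fromkeys(hyper_params[name])) for name in names}
    total = 1
    for name in names:
        total *= len(values[name])
    seen = set()
    # Stop at the model budget or when the whole space has been visited.
    while len(seen) < min(max_models, total):
        combo = tuple(rng.choice(values[name]) for name in names)
        if combo not in seen:
            seen.add(combo)
            yield dict(zip(names, combo))
```

For example, list(random_grid({"max_depth": [3, 5, 7], "learn_rate": [0.01, 0.1]}, max_models=4, seed=1)) returns four distinct combinations out of the six possible ones.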

Bug

  • [PUBDEV-2639] - Parse: Incorrect assertion error caused by very large few column data
  • [PUBDEV-2649] - h2o::|,& operator handles NA's differently than base::|,&
  • [PUBDEV-2655] - h2o::as.logical behavior is different than base::as.logical
  • [PUBDEV-2682] - Importing CSV file is not working with "java -jar h2o.jar -nthreads -1"
  • [PUBDEV-2685] - Allow DL reproducible mode to work with user-given train_samples_per_iteration >= 0
  • [PUBDEV-2690] - Grid Search NPE during Flow display after grid was cancelled
  • [PUBDEV-2693] - NPE in initialMSE computation for GBM
  • [PUBDEV-2696] - DL checkpoint restart doesn't honor a change in stopping_rounds

New Feature

  • [PUBDEV-1883] - Add option to train with mini-batch updates for DL
  • [PUBDEV-2698] - Return leaf node assignments for DRF + GBM

Improvement

  • [PUBDEV-2674] - Change default functionality of as_data_frame method in Py H2O
  • [PUBDEV-2697] - Add method setNames for setting column names on H2O Frame
  • [PUBDEV-2703] - NPE in Log.write during cluster shutdown

Tukey (3.8.0.6) - 2/23/16

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tukey/6/index.html

Enhancements

The following changes are improvements to existing features (which includes changed default values):

System
  • PUBDEV-2362: Handling Sparsity with Missing Values
  • PUBDEV-2683: Fix for erroneous conversion of NaNs to zeros during rebalancing
  • PUBDEV-2684: Remove bigdata test file (not available)

Bug Fixes

The following changes resolve incorrect software behavior:

Algorithms
  • PUBDEV-2678: CV models during grid search get overwritten
R
  • PUBDEV-2648: Di/trigamma handle NA
  • PUBDEV-2679: Progress bar for grid search with N-fold CV is wrong when max_models is given

Tukey (3.8.0.1) - 2/10/16

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tukey/1/index.html

New Features

These changes represent features that have been added since the previous release:

API
  • PUBDEV-1798: Ability to conduct a randomized grid search with optional limit of max. number of models or max. runtime
  • PUBDEV-1822: Add score_tree_interval to GBM to score every n'th tree
  • PUBDEV-2311: Make it easy for clients to sort by model metric of choice
  • PUBDEV-2548: Add ability to set a maximum runtime limit on all models
  • PUBDEV-2632: Return a grid search summary as a table with desired sort order and metric
Algorithms
  • HEXDEV-495: Added ability to calculate GLM p-values for non-regularized models
  • PUBDEV-853: Implemented gain/lift computation to allow using predicted data to evaluate the model performance
  • PUBDEV-2118: Compute the lift metric for binomial classification models
  • PUBDEV-2212: Add absolute loss (Laplace distribution) to GBM and Deep Learning
  • PUBDEV-2402: Add observations weights to quantile computation
  • PUBDEV-2469: For GBM/DRF, add ability to pick columns to sample from once per tree, instead of at every level
  • PUBDEV-2594: Quantile regression for GBM and Deep Learning
  • PUBDEV-2625: Add recall and specificity to default ROC metrics
Python
  • HEXDEV-399: Added support for Python 3.5 and better (in addition to existing support for 2.7 and better)

Enhancements

The following changes are improvements to existing features (which includes changed default values):

Algorithms
  • PUBDEV-2233: Adjust string substitution and global string substitution to do in place updates on a string column.
Python
  • PUBDEV-1981: Fix layout issues of Python docs.
  • PUBDEV-2335: as.numeric for a string column only converts strings to ints rather than reals
  • PUBDEV-2257: Table printout in Python doesn't warn the user about truncation
  • PUBDEV-2460: Version mismatch message directs user to get a matching download
  • HEXDEV-527: Implement secure Python h2o.init
  • PUBDEV-2504: Check and print a warning if a proxy environment variable is found
R
  • PUBDEV-2335: as.numeric for a string column only converts strings to ints rather than reals
  • PUBDEV-2257: Table printout in R doesn't warn the user about truncation
  • PUBDEV-2430: Improve R's reporting on quantiles
  • PUBDEV-2460: Version mismatch message directs user to get a matching download
Flow
  • PUBDEV-2407: Improve model convergence plots in Flow
  • PUBDEV-2596: Flow shows empty logloss box for regression models
  • PUBDEV-2617: Flow's histogram doesn't cover the full support
System
  • HEXDEV-436: exportFile should be a real job and have a progress bar
  • PUBDEV-2459: Improve parse chunk size heuristic for better use of cores on small data sets
  • PUBDEV-2606: Print all columns to stdout for Hadoop jobs for easier debugging

Bug Fixes

The following changes resolve incorrect software behavior:

API
  • PUBDEV-2633: Ability to extend grid searches with more models
Algorithms
  • PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
  • PUBDEV-2114: Set GLM to give error when lower bound > upper bound in beta constraints
  • PUBDEV-2190: Set GLM to default to a value of rho = 0, if rho is not provided when beta constraints are used
  • PUBDEV-2210: Add check for epochs value when using checkpointing in deep learning
  • PUBDEV-2241: Warnings about slowness from wide column counts now come before building a model, not after
  • PUBDEV-2278: Fix docstring reporting in iPython
  • PUBDEV-2366: Fix display of scoring speed for autoencoder
  • PUBDEV-2426: GLM gives different std. dev. and means than expected
  • PUBDEV-2595: Bad (perceived) quality of DL models during cross-validation due to internal weights handling
  • PUBDEV-2626: GLM with weights gives different answer h2o vs R
Python
  • PUBDEV-2319: sd not working inside group_by
  • PUBDEV-2403: Parser reads file of empty strings as 0 rows
  • PUBDEV-2404: Empty strings in Python objects parsed as missing
R
  • PUBDEV-2319: sd not working inside group_by
  • PUBDEV-2231: Fix bug in summary when zero-count categoricals were present.
  • PUBDEV-1749: Fix h2o.apply to correctly handle functions (so long as functions contain only H2O supported primitives)
System
  • PUBDEV-1872: Ability to ignore 0-byte files during parse
  • PUBDEV-2401: /Jobs fails if you build a Model and then overwrite it in the DKV with any other type
  • PUBDEV-2603: Improve progress bar for grid/hyper-param searches

Tibshirani (3.6.0.9) - 12/7/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/9/index.html

New Features

These changes represent features that have been added since the previous release:

API
  • PUBDEV-2189: H2O now allows selection of the non_negative flag in GLM for R and Python
Algorithms
R
  • PUBDEV-2079: R now retrieves column types for a H2O Frame more efficiently
Python

Enhancements

The following changes are improvements to existing features (which includes changed default values):

Algorithms
  • GitHub commit: Change in behavior in GLM beta constraints - when ignoring constant/bad columns, remove them from beta_constraints as well
  • GitHub commit: Added ignore_const_cols to all algos
  • PUBDEV-2311: Improved ability to sort by model metric of choice in client
Python
  • PUBDEV-2409: H2O now checks for H2O_DISABLE_STRICT_VERSION_CHECK env variable in Python
  • GitHub commit: H2O now allows l/r values to be null or an empty string
  • GitHub commit: H2O now accommodates LOAD_FAST and LOAD_GLOBAL in bytecode_to_ast
R
  • PUBDEV-1378: In R, h2o.getTimezone() previously returned a list of one, now it just returns the string
System
  • GitHub commit: Added more tweaks to help various low-memory configurations

Bug Fixes

The following changes resolve incorrect software behavior:

API
  • PUBDEV-2042: h2o.grid failed when REST API version was not default
  • PUBDEV-2401: /Jobs failed if you built a Model and then overwrote it in the DKV with any other type
  • PUBDEV-2392: /3/Jobs failed with exception after running /3/SplitFrame
  • GitHub commit: PUBDEV-2426 - Fixed error where sd and mean were adjusted to weights even if no observation weights were passed
Algorithms
  • PUBDEV-2396: GLRM validation frames must have the same number of rows as the training frame
  • PUBDEV-2053: Fixed assertion failure in Deep Learning
  • PUBDEV-2315: Could not compile POJO using K-means
  • PUBDEV-2317: Could not compile POJO using PCA
  • PUBDEV-2320: Could not compile POJO using Naive Bayes
  • GitHub commit: Fixed weighted mean and standard deviation computation in GLM
  • GitHub commit: Fixed stopping criteria for lambda search and multinomial in GLM
Python
R
  • PUBDEV-1749: h2o.apply did not correctly handle functions
  • PUBDEV-2335: R: as.numeric for a string column only converted strings to ints rather than reals
  • PUBDEV-2319: R: sd was not working inside group_by
  • PUBDEV-2397: R: Ignore Constant Columns was not an argument in Algos in R like it is in Flow
  • PUBDEV-2134: When a dataset was sliced, the int mapping of enums was returned
  • PUBDEV-2408: Improved handling when H2O has already been shut down in R
  • PUBDEV-2231: Fixed categorical levels mapping bug
System

Tibshirani (3.6.0.7) - 11/23/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/7/index.html

Enhancements

The following changes are improvements to existing features (which includes changed default values):

Algorithms
  • GitHub commit: Added Iterations and Epochs to DL job status updates, added Iterations to scoring history
  • GitHub commit: Cleaned up iteration counter to work for checkpointing
  • GitHub commit: Cleaned up counter iteration logic

Bug Fixes

The following changes resolve incorrect software behavior:

Algorithms
  • GitHub commit: Fixed scoring speed display for autoencoder, was showing 0 because wrong runtime was used (ms since 1970 instead of actual runtime)

Tibshirani (3.6.0.2) - 11/5/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/2/index.html

New Features

Algorithms
  • GitHub commit: Added support for grid search
  • PUBDEV-2272: Implemented GLRM grid search in R and Python
  • GitHub commit: PUBDEV-2289: Enabled early convergence-based stopping by default for Deep Learning
  • GitHub commit: Added L1+LBFGS solver for multinomial GLM
Python
  • GitHub commit: PUBDEV-2289: Added Python API for convergence-based stopping
R
  • GitHub commit: Added .Last to Delete InitID
  • GitHub commit: PUBDEV-2289: Enabled convergence-based early stopping for R API of Deep Learning

Enhancements

Algorithms
  • GitHub commit: Enable grid search for Deep Learning parameters overwrite_with_best_model, momentum_ramp, elastic_averaging, elastic_averaging_moving_rate, & elastic_averaging_regularization
  • GitHub commit: PUBDEV-2289: Stopping tolerance and stopping metric are no longer hidden if stopping_rounds is 0
  • GitHub commit: Added checks to verify the mean, median, nrow, var, and sd are calculated correctly in groupby
  • GitHub commit: mean and sd now return lists
Python
  • GitHub commit: [PUBDEV-2257] H2O now gives users [row x col] of Frame in __str__
  • GitHub commit: sd/var is now sampled for group_by
  • GitHub commit: Parameter checking is now split between float and strings/unicode
  • GitHub commit: H2O now only wipes src._ex if src_in_self
  • GitHub commit: Refactored default arg handling in astfun
  • GitHub commit: Added new parameters to estimators
  • GitHub commit: Added session start/end; Python now ends the session on exit
  • GitHub commit: src and self types are now checked for None
  • GitHub commit: H2O now passes caches through all prefix ops
  • GitHub commit: H2O now pushes cached types, names, and ncols forward if possible
R
System
  • HEXDEV-475: Added EasyPOJO comments and improvements
  • GitHub commit: [PUBDEV-2204] Enabled Vec#toCategoricalVec to convert string columns to categorical columns
  • GitHub commit: apply now works in

Bug Fixes

Algorithms
Python
R
  • GitHub commit: [PUBDEV-2301, PUBDEV-2314] Hidden grid parameter was passed incorrectly from R
  • GitHub commit: H2O now uses deep copy when using assign from one global to another
  • GitHub commit: Fixed getFrame and directory unlink
System

Slotnick (3.4.0.1)

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slotnick/1/index.html

New Features

API
Algorithms
  • GitHub commit: Added option in PCA to use randomized subspace iteration method for calculation
  • GitHub commit: Deep Learning: Added target_ratio_comm_to_comp to R and Python client APIs
  • GitHub commit: PUBDEV-1247: Added stochastic GBM parameters (sample_rate and col_sample_rate) to R/Py APIs
  • PUBDEV-1450: GLRM has been tested and removed from "experimental" status
Hadoop
Python
R

This software release introduces changes to the R API that may cause previously written R scripts to be inoperable. For more information, refer to the following link.

  • GitHub commit: Added h2o.getTypes() to the R wrapper
  • GitHub commit: Added ability to set col.types with a named list
  • GitHub commit: Added h2o.getId() to get the back-end distributed key/value store ID from a Frame
  • GitHub commit: Added column types to H2O frame in R, which allows R to set the correct column types when as.data.frame() is used on an H2O frame
  • GitHub commit: Added @export for exported R functions
System
  • GitHub commit: Added string length util for Enum columns
  • GitHub commit: Added pass-through version of toCategoricalVec(), toNumericVec(), and toStringVec() to Vec.java for code simplicity and backwards compatibility
  • GitHub commit: Added string column handling to StrSplit()
Web UI

Enhancements

Algorithms
  • PUBDEV-467: Show Frames for DL weights/biases in Flow
  • PUBDEV-1847: DRF/GBM: nbins_top_level is now configurable
  • GitHub commit: Deep Learning: Scoring time is now shown in the logs
  • GitHub commit: Sped up GBM split finding by dynamically switching between single and multi-threaded based on workload
  • PUBDEV-1247: Implemented Stochastic GBM
  • GitHub commit: Parallelized split finding for GBM/DRF (useful for large numbers of columns and nbins).
  • GitHub commit: Added improvements to speed up DRF (up to 35% faster) and stochastic GBM (up to 5x faster)
  • GitHub commit: Added some straight-forward optimizations for GBM histogram building
  • GitHub commit: GLRM is now deterministic between one vs. many chunks
  • GitHub commit: Input parameters are now immutable
  • GitHub commit: PUBDEV-2135: Cleaned up N-fold CV model parameter sanity checking and error message propagation; now checks all N-fold model parameters upfront and lets the main model carry the message to the user
  • GitHub commit: PUBDEV-2130: N-fold CV models are no longer deleted when the main model is deleted
  • GitHub commit: PUBDEV-2107: The title in plot.H2OBinomialMetrics is now editable
  • GitHub commit: Parse Python lambda (bytecode -> ast -> rapids)
  • GitHub commit: PUBDEV-1847: Cleaned up/refactored GBM/DRF
  • GitHub commit: Updated MeanSquare to Quadratic for DL
  • GitHub commit: PUBDEV-2133: Speed up Enum mapping between train/test from O(N^2) to O(N*log(N))
  • GitHub commit: Added GLRM scoring history with step size and average change in objective function value
  • GitHub commit: SVD now outputs the V matrix as a frame with a frame key, rather than a double array in the API
  • GitHub commit: Modified k-means++ initialization in GLRM to set X to inverse of cluster distance with sum normalized to one, for each observation in training data
  • GitHub commit: Increased GBM worker thread priority to avoid deadlock with high parallel GBM job counts
  • GitHub commit: Added input parameter svd_method to GLRM
Python
  • GitHub commit: centers_std is now returned as a list of columns
  • GitHub commit: str(Frame) no longer returns an ID; updated ExprNode _to_string to accommodate
  • GitHub commit: Changed default setting for _isAllAscii to false
  • GitHub commit: Fixed var to return scalar/frame based on nrow
  • GitHub commit: Python now checks ncol, not nrow
  • PUBDEV-1060: Python's h2o.import_frame() now matches R's importFile() parameters where applicable
  • PUBDEV-1960: Python now uses the streaming endpoint /3/DownloadDataset.bin
  • PUBDEV-2223: Added normalization and standardization coefficients to the model output in Python
  • GitHub commit: Renamed logging to h2o_logging to avoid conflict with original logging package
  • GitHub commit: H2O now recognizes additional parameters (such as column names) for Python objects
  • GitHub commit: head and tail no longer download the entire dataset
  • GitHub commit: Truncated DF in head and tail before calling /DownloadDataset
  • GitHub commit: head() and tail() now default to pretty printing in Python
  • GitHub commit: Moved setup functionality from parse to parse setup; col_types and na_strings can now be dictionaries
  • GitHub commit: Updated H2OColSelect to supply extra argument
  • GitHub commit: PUBDEV-2174: Relative tolerance is now used for floating point comparison
  • GitHub commit: Added more cloud health output to run.py
  • GitHub commit: When Pandas frames are returned, they are now wrapped to display nicely in iPython
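
The relative-tolerance comparison mentioned above (PUBDEV-2174) can be illustrated with Python's standard math.isclose. This is a sketch, not H2O's code; the wrapper name approx_equal and the default tolerance of 1e-6 are assumptions chosen for the example.

```python
import math


def approx_equal(a, b, rel_tol=1e-6, abs_tol=0.0):
    """Compare two floats using a tolerance relative to their magnitude.

    Hypothetical helper for illustration; the tolerance values are assumed.
    `abs_tol` is needed when comparing against zero, where a purely
    relative tolerance can never succeed for a non-zero value.
    """
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
```

A relative tolerance scales with the values being compared: approx_equal(1e12, 1e12 + 1) is True even though the absolute difference is 1, while approx_equal(1.0, 1.001) is False.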
R
  • GitHub commit: Added null check
  • PUBDEV-2185: When appending a vec to an existing data frame, H2O now creates a new data frame while still keeping the original frame in memory
  • PUBDEV-1959: R now uses the streaming endpoint /3/DownloadDataset.bin
  • PUBDEV-2020: h2o.splitFrame() in R/Python now uses the runif technique instead of the horizontal slice technique
  • GitHub commit: Changed T/F to TRUE/FALSE
  • GitHub commit: xml2 package is now required for rversions package
  • GitHub commit: Package dependencies are taken into account when installing R packages
  • GitHub commit: Metrics are now always computed if a dataset is provided (R h2o.performance call)
  • GitHub commit: Column names are now fetched from H2O
  • GitHub commit: PUBDEV-2150: Time columns in H2O are now imported as Date columns in R
  • GitHub commit: h2o.ls() now returns data.frame
  • GitHub commit: h2o.ls() now returns the whole frame
  • GitHub commit: Removed unnamed additional parameters (ellipses) in R algos
  • GitHub commit: Added as.character to Rapids implementation
  • GitHub commit: Updated plot.H2OModel in R
  • GitHub commit: Updated scoring history plot in R for training_frame only
  • GitHub commit: Instead of : and assign, attr is now used
  • GitHub commit: Raw strings are now used as accessors
  • GitHub commit: name.Frame and dimnames.Frame are now visible
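
The "runif technique" for h2o.splitFrame() noted above (PUBDEV-2020) can be sketched as follows: draw one uniform random number per row and assign the row to a split according to cumulative ratio thresholds, rather than cutting the frame into contiguous horizontal slices. This is a plain-Python illustration under assumptions; split_rows and its parameters are hypothetical names, not the H2O API.

```python
import random


def split_rows(n_rows, ratios, seed=None):
    """Assign row indices to len(ratios) + 1 splits using uniform draws.

    `ratios` are the fractions for all splits except the last, which
    receives the remainder. Hypothetical helper for illustration only.
    """
    if any(r <= 0 for r in ratios) or sum(ratios) >= 1.0:
        raise ValueError("ratios must be positive and sum to less than 1")
    rng = random.Random(seed)
    # Cumulative thresholds, e.g. [0.6, 0.2] -> [0.6, 0.8]
    thresholds = []
    acc = 0.0
    for r in ratios:
        acc += r
        thresholds.append(acc)
    splits = [[] for _ in range(len(ratios) + 1)]
    for row in range(n_rows):
        u = rng.random()  # one uniform draw in [0, 1) per row
        index = sum(u >= t for t in thresholds)  # thresholds passed
        splits[index].append(row)
    return splits
```

Because each row is assigned independently, split sizes only approximate the requested ratios, but the rows in each split are sampled from the whole frame rather than from one contiguous block.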
System
  • GitHub commit: Added vertical prefetch of all chunks' worth of data for dense rows
  • PUBDEV-1426: Scoring is now a non-blocking job with a progress bar
  • GitHub commit: EasyPojo API is now serializable
  • GitHub commit: Changed parse setup guess when encountering large NA counts to not favor numeric over dates or UUIDs
  • GitHub commit: Refactored vector type conversion methods into a class called VecUtils
  • GitHub commit: Cleaned up ASTStrList to handle frames with more than one vector during column conversion; checks types before converting; added several new column type conversions
  • GitHub commit: If the job is cancelled, scoring is now canceled
  • GitHub commit: Refactored doAll_numericResult() -> doAll(nout, type, frame) where all output vecs are of the given type
  • GitHub commit: Improved hash function
  • GitHub commit: The output of _train.get() is now passed to a Frame
  • GitHub commit: Refactored binary/col ops for aesthetics and maintainability
  • GitHub commit: Added correct types for new Vecs; CategoricalWrappedVec now exports a utility for enum conversions instead of a constructor
  • GitHub commit: Mean/sigma values are now printed to the logs after parsing
  • GitHub commit: PUBDEV-2174: Added some optimizations for some chunks (mostly integers) in RollupStats
  • GitHub commit: PUBDEV-2174: Added instantiations of Rollups for dense numeric chunks
  • GitHub commit: PUBDEV-2174: Implemented single-pass variance/stddev calculation for rollups
  • GitHub commit: PUBDEV-2174: Added hasNA() for chunks
  • GitHub commit: Reordered args in sub/gsub (astid > astparameter, add string -> numeric)
  • GitHub commit: Ensured all chunks get closed
  • GitHub commit: NewChunk.addString() now accepts a Java string or BufferedString, eliminating needless conversion to a BufferedString before inserting into the NewChunk buffer. Improves efficiency of several ASTStrOps as well as converting Categorical columns to String columns.
  • GitHub commit: Renamed enums to categoricals system-wide
  • GitHub commit: Renamed ValueString -> BufferedString
  • GitHub commit: Removed redundant frame creation; added Java comments to each string utility; changed RAPIDS name of gsub -> replaceall and sub -> replacefirst; added nchar utility to the R client; updated comments in Python and R client
  • GitHub commit: All NA chunks are now handled in string ops
  • GitHub commit: Added ability for string utils to handle NA chunks
  • GitHub commit: Added the ability to handle duplicate rows to merge
  • GitHub commit: countMatches utilities now only work on string columns
  • GitHub commit: Changed names of SubStr and GSubStr to ReplaceFirst and ReplaceAll; both methods now only accept string columns as input
  • GitHub commit: Changed toUpper and toLower to only work on string columns; includes an optimized version of each method as well as a UTF-safe version
  • GitHub commit: CStrChunks now track whether they are pure ASCII to allow StringUtilities to use optimized versions of the utilities that operate directly on the string buffer
  • GitHub commit: Moved frame function to ArrayUtils
  • GitHub commit: Removed categorical versions of trim() and length()
  • GitHub commit: Changed the merge defaults to match the implementation
  • GitHub commit: Merge no longer uses a by argument
  • GitHub commit: Added trim and length functionality for string columns
  • GitHub commit: HEXDEV-442: Improved POJO handling
  • GitHub commit: Config files are now transferred using a hexstring to avoid issues with Hadoop XML parsing
  • GitHub commit: HEXDEV-445: Added isNA check
  • GitHub commit: Means, mults, modes, and size now do bulk rollups
  • GitHub commit: Increased priority of model builder Driver classes to prevent deadlock when bulk-launching parallel unrelated model builds
  • GitHub commit: Renamed Currents to Rapids
  • GitHub commit: CRAN-based R clients are now set to opt-out by default
  • GitHub commit: Assembly states are now saved in the DKV
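
The single-pass variance/stddev calculation for rollups listed above (PUBDEV-2174) avoids a second scan over the data. The release notes do not say which formula H2O uses; the sketch below uses Welford's online algorithm as one well-known way to do this, and the function name single_pass_stats is an assumption made for the example.

```python
def single_pass_stats(values):
    """Return (count, mean, sample variance) in one pass over `values`.

    Uses Welford's online algorithm. Hypothetical helper for illustration;
    not taken from the H2O source.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance
```

The standard deviation is the square root of the returned variance. Updating the mean incrementally also avoids the cancellation errors of the naive "sum of squares minus square of sum" formula.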
Web UI
  • PUBDEV-1961: Flow now uses the streaming endpoint /3/DownloadDataset.bin

Bug Fixes

Algorithms
  • GitHub commit: Fixed bug with CategoricalWrappedVec
  • PUBDEV-1664: Corrected math for GBM Tweedie with offsets/weights
  • PUBDEV-1665: Corrected math for GBM Poisson with offsets/weights
  • PUBDEV-2130: Deleting Deep Learning n-fold models resulted in a java.lang.AssertionError
  • GitHub commit: Fixed GLM with nfolds
  • GitHub commit: Updated GLM InitTsk to run at +1 priority level to avoid deadlock when launching hundreds of GLMs in parallel
  • GitHub commit: Column names (feature names) are now named correctly for the exported weight matrix connecting the input to the first hidden layer
  • GitHub commit: Changed isEnum to isCategorical
  • GitHub commit: Cleaned up DRF and GBM; fixed checkpoint restart logic for trees and changed which parameters are configurable
  • GitHub commit: Fixed incorrect logistic and hinge loss functions and apply to binary numeric columns in {0,1} only
  • GitHub commit: Fixed a bug where Poisson loss function was calculated incorrectly for values of 0
  • GitHub commit: Fixed DL POJO for large input columns
Python
R
System
  • PUBDEV-2250: During parsing, SVMLight-formatted files failed with an NPE
  • PUBDEV-2213: During parsing, alphanumeric data in a column was converted to missing values and the column was assigned a type of int
  • PUBDEV-1990: Spaces are now permitted in the Flow directory name
  • PUBDEV-1037: Space in the user name was preventing H2O from starting
  • GitHub commit: Fixed VecUtils.copyOver() to accept a column type for the resulting copy
  • GitHub commit: Fixed Vec.preWriting so that it does not use an anonymous inner task which causes the entire Vec header to be passed
  • GitHub commit: Fixed parse to mark categorical references in ParseWriter as transient (enums must be node-shared during the entire multiple parse task)
  • GitHub commit: PUBDEV-2182: Fixed DL checkpoint restart with given validation set after R (currents) behavior changed; now the validation set key no longer necessarily matches the file name
  • GitHub commit: Fixed makeCon memory leak when redistribute=T
  • GitHub commit: PUBDEV-2174: Fixed sigma calculation for sparse chunks
  • GitHub commit: Restored pre-existing string manipulation utilities for categorical columns
  • GitHub commit: Fixed syncRPackages task so it doesn't run during the normal build process
  • GitHub commit: Fixed intermittent failures caused by different default timezone settings on different machines; sets needed timezone before starting test
  • GitHub commit: Fixed error message for countmatches
  • GitHub commit: PUBDEV-1443: Fixed size computation in merge
  • GitHub commit: Fixed h2o.tabulate() to work in multi-node mode
  • GitHub commit: Fixed integer overflow in printout of CM to TwoDimTable

Slater (3.2.0.7) - 10/09/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slater/7/index.html

Bug Fixes

  • GitHub commit: Fix Java 6 compatibility

    The Java 7 API call _rawChannel.setOption(StandardSocketOptions.TCP_NODELAY, true); has been replaced by the Java 6 API call _rawChannel.socket().setTcpNoDelay(true);

    The Java 7 API call sock.getRemoteAddress() has been replaced by the Java 6 API call sock.socket().getRemoteSocketAddress()


Slater (3.2.0.5) - 09/24/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slater/5/index.html

Enhancements

Algorithms

Slater (3.2.0.3) - 09/21/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slater/3/index.html

New Features

R

Enhancements

Algorithms
  • GitHub commit: Added back support for sparse activations in DL; currently changes results as numerical values are de-scaled only, not standardized
Python
  • GitHub commit: Adjusted import_file in Python to accept the same parameters as import_file in R
R

Bug Fixes

Algorithms
R
System

Slater (3.2.0.1) - 09/12/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slater/1/index.html

New Features

Algorithms
  • GitHub: PUBDEV-1888: Added loss function calculation for DL.
  • GitHub: Set more parameters for GLM to be gridable.
  • GitHub: [KMeans] Enable grid search with max_iterations parameter.
  • GitHub: Add kfold column builders
  • GitHub: Add stratified kfold method
Python
  • PUBDEV-684: Add nfolds to R/Python
  • GitHub: Improved group-by functionality
  • GitHub: Added python example for downloading glm pojo.
  • GitHub: Added countmatches to Python along with a test.
  • GitHub: Added support for getting false positive rates and true positive rates for all thresholds from binomial models; makes it easier to calculate custom metrics from ROC data (like weighted ROC)
R
  • PUBDEV-1788: Added a factor function that will allow the user to set the levels for an enum column
  • PUBDEV-1881: Fixed bug in h2o.group_by for enumerator columns
  • GitHub: Refactor SVD method name and add svd_method option to R package to set preferred calculation method
  • PUBDEV-2071: Accept columns of type integer64 from R through as.h2o()
Sparkling Water
  • PUBDEV-282: Support Windows OS in Sparkling Water
System
  • HEXDEV-120: Switch from NanoHTTPD to Jetty
  • GitHub: Allow for "most" and "mode" in groupby
  • GitHub: Added NA check to checking for matches in categorical columns
  • PUBDEV-1470: Dropped UDP mode in favor of TCP
  • PUBDEV-1431: /3/DownloadDataset.bin is now a registered handler in JettyHTTPD.java. Allows streaming of large downloads from H2O.
  • PUBDEV-1865: Implemented per-row 1D, 2D and 3D DCT transformations for signal/image/volume processing
  • PUBDEV-1686: LDAP Integration
  • HEXDEV-381: LDAP Integration
  • HEXDEV-224: Added https support
  • GitHub: Added mapr5.0 version to builds
  • GitHub: Add Vec.Reader which replaces lost caching
Web UI
  • GitHub: Disallow N-fold CV for GLM when lambda-search is on.
  • GitHub: Added typeahead for http and https.
  • PUBDEV-1821: Added Save Model and Load Model

Enhancements

Algorithms
  • GitHub: Don't allocate input dropout helper if input_dropout_ratio = 0.
  • PUBDEV-1920: Datasets : Unbalanced sparse for binomial and multinomial
  • GitHub: Major code cleanup for DL: Remove dead code, deprecate sparse/col_major.
  • PUBDEV-1942: Use prior class probabilities to break ties when making labels
  • GitHub: Update DL perf Rmd file to get the overall CM error.
  • GitHub: Enable training data shuffling if train_samples_per_iteration==0 and reproducible==true
  • GitHub: Checkpointing for DL now follows the same convention as for DRF/GBM.
  • GitHub: No longer do sampling with replacement during training with shuffle_training_data
  • GitHub: Add printout of sparsity ratio for double chunks.
  • GitHub: Check memory footprint for Gram matrix in PCA and SVD initialization
  • GitHub: Print more fill ratio debugging.
  • GitHub: Fix the RNG for createFrame to be more random (since we are setting the seed for each row).
  • PUBDEV-2010: Improve reporting of unstable DL models
  • PUBDEV-2018: Improve auto-tuning for DL on large clusters / large datasets
  • GitHub: Add input parameter to h2o.glrm indicating whether to ignore constant columns
  • GitHub: Missing enums are imputed using the majority class of the column. For other types of missing categorical, just round the mean to the nearest integer.
  • GitHub: Skip rows in training frame with missing value(s) if requested
  • GitHub: Speed up direct SVD by working with transpose directly
  • GitHub: Fix a bug in initialization of SVD and change l2 norm to sum of squared error in convergence test.
  • GitHub: Use absolute value for mean weight and bias checks.
  • GitHub: No longer leak constant chunks during AE scoring/reconstruction.
  • GitHub: No longer differentiate between DL model instabilities (weights vs biases).
  • GitHub: Make method static, where possible.
  • GitHub: Make GLRM seeding independent of number of chunks.
API
  • GitHub: Added REST end-points for glrm,svd,pca,naive bayes algorithms.
  • GitHub: Added unicode to frame getter possibilities
  • GitHub: Added proper lookup of offset/weights/fold_column
  • GitHub: Data should be eagered before download_csv.
  • GitHub: Simplified model builder
  • GitHub: Added None as default for "on" field
  • GitHub: Removed all of the unnecessary calls to h2o.init and removed the unnecessary environment variable for version checking during testing
  • PUBDEV-2064: Rename the coordinate descent solvers in the REST API / Flow to (experimental)
Grid Search
  • GitHub: Added check that x is not null before verifying data in unsupervised grid search algorithm
  • GitHub: Made naivebayes parameters gridable.
  • PUBDEV-1933: Called drf as randomForest in algorithm option
  • GitHub: Validation of grid parameters against algo /parameters rest endpoint.
  • PUBDEV-1979: Train N-fold CV models in parallel
  • PUBDEV-1978: grid: would be good to add to h2o.grid R help example, how to access the individual grid models
Python
  • GitHub: Refactored into h2o.system_file so it's parallel to R client.
  • GitHub: Added h2o_deprecated decorator
  • GitHub: Use import_file in import_frame
  • GitHub: Handle a list of columns in python group-by api
  • GitHub: Use pandas if available for twodimtables and h2oframes
  • GitHub: Transform the parameters list into a dict with keys being the parameter label
  • GitHub: Added pop option which does inplace update on a frame (Frame.remove)
  • GitHub: ncol,dim,shape, and friends are now all properties
  • PUBDEV-193: Write python version of h2o.init() which knows how to start h2o
  • PUBDEV-1903: Method to get parameters of model in Python API
  • GitHub: Allow a single specified alpha not to be in a list
  • GitHub: Updated endpoint for python client download_csv
  • GitHub: Allow for enum in scale/mean/sd (ignore or give NA)
  • GitHub: Allow for n_jobs=-1 and n_jobs > 1 for Parallel jobs
  • GitHub: Added frame_id property to frame
  • GitHub: Removed remaining splats on dicts
  • GitHub: Removed need to splat pass thru args
  • GitHub: Added get_jar flag to download_pojo
R
  • PUBDEV-1866: Rewrote h2o.ensemble to utilize nfolds/fold_column in h2o base learners
  • GitHub: Added max_active_predictors.
  • GitHub: Updated REST call from R for model export
  • PUBDEV-1853: Removed addToNavbar from RequestServer GitHub
  • GitHub: Add "Open H2O Flow" message.
  • GitHub: Replaced additive float op by multiplication
  • GitHub: Reimplement checksum for Model.Parameters
  • GitHub: Remove debug prints.
  • PUBDEV-1857: Removed the need for String[] path_params in RequestServer.register() GitHub
  • PUBDEV-1856: Removed the writeHTML_impl methods from all the schemas
  • PUBDEV-1854: Made _doc_method optional in the Route constructors GitHub
  • PUBDEV-1858: Changed RequestServer so that only one handler instance is created for each Route
  • GitHub: Swapped out rjson for jsonlite for better handling of odd characters from dataset.
  • GitHub: Prettify R's grid output.
  • PUBDEV-1841: R now respects the TwoDimTable's column types
  • GitHub: Fixes show method for grid object when hyper_params is empty.
  • GitHub: h2o.levels returns R vector for single column
  • GitHub: Uses PredictCsv from genmodel now.
  • GitHub: Exposed stacktraces in R's summary() call.
  • GitHub: print type of failed value in $<-
  • GitHub: allow value to be integer in $<-
  • GitHub: Check for is_client being NULL since older H2O clusters may not have is_client.
Sparkling Water
  • GitHub: Copy content of h2o-dist into target directory.
System
  • GitHub: Rename label fields in prediction object.
  • GitHub: Uses the original Vec's domain in alignment
  • GitHub: Added columnName and unknownLevel to PredictUnknownCategoricalLevelException.
  • PUBDEV-1559: Added compression of 64-bit Reals GitHub
  • GitHub: Added time information to buildinfo.json.
  • GitHub: Put build metadata into a json file.
  • GitHub: Delete any prior main CV models of the same key if CV model building is cancelled before the main model started to build.
  • GitHub: Change loading name parameter to a String to address a Flow issue.
  • GitHub: Remove extra assertion to avoid NPEs after client call of bulk remove after done() is called but before the finally is done with updateModelOutput.
  • GitHub: Ensures that date time methods return year/month/day values in the currently set timezone.
  • GitHub: Frees memory from streamed zip reads after the chunk has been parsed.
  • GitHub: Unifies categorical strings to UTF-8 and warns the user about all conversion.
  • GitHub: add isNA checks to scale
  • GitHub: Do not start UDPReceiver thread (unless running with useUDP option)
Web UI
  • PUBDEV-1961: Flow: use streaming endpoint /3/DownloadDataset.bin

Bug Fixes

Algorithms
  • PUBDEV-1785: Deadlock while running GBM
  • GitHub: Fix name for standardized_coefficient_magnitudes.
  • PUBDEV-1774: Setting gbm's balance_classes to True produces suspect models
  • PUBDEV-1849: K-Means: negative sum-of-squares after mean imputation
  • GitHub: Set the iters counter during kmeans center initialization correctly
  • GitHub: fixed parenthesis in GLM POJO generation
  • GitHub: Should be updating model each iteration with the newly fitted kmeans clusters, not the old ones!
  • PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
  • PUBDEV-1666: GBM:Math correctness for Gamma with offsets/weights
  • PUBDEV-451: Trees in GBM change for identical models GitHub
  • PUBDEV-1924: R^2 stopping criterion isn't working GitHub
  • PUBDEV-1776: GLM: cross-validation bug GitHub
  • PUBDEV-1682: GLM : Lending club dataset => build GLM model => 100% complete => click on model => null pointer exception GitHub
  • PUBDEV-1987: error returned on prediction for xval model
  • PUBDEV-1928: Properly implement Maxout/MaxoutWithDropout GitHub
  • GitHub: print actual number of columns (was just #cols) in DRF init
  • PUBDEV-2026: Fix setting the proper job state in DL models GitHub
  • PUBDEV-1950: Splitframe with rapids is not blocking
  • PUBDEV-1995: nfold: when user cancels an nfold job, fold data still remains in the cluster memory
  • PUBDEV-1994: nfold: cancel results in a java.lang.AssertionError
  • PUBDEV-1910: Canceled GBM with CV keeps lock
  • GitHub: Fix DL checkpoint restart with new data.
API
  • PUBDEV-1955: Change Schema behavior to accept a single number in place of array GitHub
  • PUBDEV-1914: Iced deserialization fails for Enum Arrays
Grid
  • PUBDEV-1876: Grid: progress bar not working for grid jobs
  • PUBDEV-1875: Grid: the meta info should not be dumped on the R screen, once the grid job is over
  • GitHub: [PUBDEV-1876] Fix grid update.
  • PUBDEV-1874: Grid search: observe issues with model naming/overwriting and error msg propagation GitHub
  • HEXDEV-402: R: kmeans grid search doesn't work
  • PUBDEV-1901: Grid appends new models even though models already exist.
  • PUBDEV-1940: Grid: glm grid on alpha fails with error "Expected '[' while reading a double[], but found 1.0"
  • PUBDEV-1877: Grid: if user specify the parameter value he is running the grid on, would be good to warn him/her
  • PUBDEV-1938: Grid: randomForest: unsupported grid params and wrong error msg
Hadoop
  • PUBDEV-2036: importModel from hdfs doesn't work
  • PUBDEV-2027: Clicking shutdown in the Flow UI dropdown does not exit the Hadoop cluster
Python
  • PUBDEV-1789: Python client h2o.remove_vecs (ExprNode) makes bad ast
  • PUBDEV-1795: Unable to read H2OFrame from Python
  • PUBDEV-1764: Python importFile does not import all files in directory, only one file GitHub
  • GitHub: parameter name is "dir" not "path"
  • PUBDEV-1693: Python: Options for handling NAs in group_by is broken
  • PUBDEV-1415: Intermittent Unimplemented rapids exception: pyunit_var.py . Also prior test got unimplemented too, but test didn't fail (client wasn't notified)
  • PUBDEV-1119: Python: Need to be able to access resource genmodel.jar
  • GitHub: Fix download of pojo in Python.
R
  • GitHub: Fixed bug in h2o.ensemble .make_Z function
  • PUBDEV-1796: R: h2o.importFile doesn't allow user to choose column type during parse
  • PUBDEV-1768: R: Fails to return summary on subsetted frame GitHub
  • PUBDEV-1909: R: Adding column to frame changes string enums in column to numerics
  • PUBDEV-1936: R: h2o.levels return only the first factor of factor levels
  • PUBDEV-1869: R: sd function should convert enum column into numeric and calculate standard deviation GitHub
  • PUBDEV-1246: R: h2o.hist needs to run pretty function for pretty breakpoints to get same results as R's hist GitHub
  • PUBDEV-1868: R: h2o.performance returns error (not warning) when model is reloaded into H2O
  • PUBDEV-1723: h2o R : subsetting data :h2o removing wrong columns, when asked to delete more than 1 columns
  • GitHub: fix h2o.levels issue
  • PUBDEV-1972: R: setting weights_column = NULL causes unwanted variables to be used as predictors
Sparkling Water
  • PUBDEV-1173: create conversion tasks from primitive RDD
  • GitHub: Fix return value issue in distribution script.
System
  • HEXDEV-360: getFrame fails on Parsed Data
  • PUBDEV-366: Fix parsing for high-cardinality categorical features GitHub
  • PUBDEV-1143: Parse: Cancel parse unreliable; does not work at all times
  • PUBDEV-1872: Ability to ignore files during parse GitHub
  • PUBDEV-777: Parse : Parsing compressed files takes too long
  • PUBDEV-1916: Parse: 2 node cluster takes 49min vs 40sec on a 1 node cluster GitHub
  • PUBDEV-1431: Convert /3/DownloadDataset to streaming
  • PUBDEV-1995: nfold: when user cancels an nfold job, fold data still remains in the cluster memory
  • PUBDEV-1994: nfold: cancel results in a java.lang.AssertionError
  • PUBDEV-1910: Canceled GBM with CV keeps lock GitHub
  • PUBDEV-1992: CreateFrame isn't totally random
  • GitHub: Fixes a bug that allowed big buffers to be constantly reallocated when it wasn't needed. This saves memory and time.
  • GitHub: Fix print statement.
  • GitHub: Fixed orderly shutdown to work with flatfile.
  • PUBDEV-1998: Parse : Lending club dataset parse => cancelled by user
  • PUBDEV-2028: Shutdown => unimplemented error on curl -X POST 172.16.2.186:54321/3/Shutdown.html
  • PUBDEV-2070: Download frame brings down cluster
  • PUBDEV-2067: Cannot mix negative and positive array selection
  • PUBDEV-2024: Save model to HDFS fails
Web UI

Simons (3.0.1.7) - 8/11/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-simons/7/index.html

New Features

The following changes represent features that have been added since the previous release:

Python
Web UI

Enhancements

The following changes are improvements to existing features (which includes changed default values):

Algorithms
  • GitHub: add seed to the model building that uses balance_classes, for determinism/repeatability
  • GitHub: Reduce the frequency at which tiny tree models are printed to stdout: Only print during the first 4 seconds if score_each_iteration is enabled.
  • GitHub: Only call the limited printout for TwoDimTables during Model.toString() that prints all TwoDimTables of the model._output.
  • GitHub: Only print up to 10 rows of TwoDimTables in ASCII logs (first/last 5).
  • GitHub: Remove some overflow/underflow checks: Let exp(x) be small and log(x) be large.
  • GitHub: Add nbins_top_level parameter to DRF/GBM. Not yet in R.
  • GitHub: Disallow N-fold CV for GLM when lambda-search is on.
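The "first/last 5" truncation of table rows mentioned in the list above could look roughly like the following standalone sketch. It is not H2O's TwoDimTable code: `truncate_rows` and the `"..."` placeholder row are hypothetical names chosen for illustration.

```python
def truncate_rows(rows, head=5, tail=5):
    """Keep the first `head` and last `tail` rows, eliding the middle.

    Tables with at most head + tail rows are returned unchanged.
    """
    if len(rows) <= head + tail:
        return list(rows)
    return list(rows[:head]) + ["..."] + list(rows[-tail:])


print(truncate_rows(list(range(20))))
```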
API
  • GitHub: Cleanup of public API of Schema.java. Improve its JavaDoc a lot.
Python
  • PUBDEV-1765: Improve python online documentation
  • PUBDEV-1497: Python : Weights R tests to be ported from R for GLM/GBM/RF/DL
  • GitHub: adjust to split frame jobs result
  • GitHub: allow for update thingy to be a tuple (so rows and columns)
  • GitHub: when starting h2o jvm with h2o.init(), give h2o child process different id than parent, so it doesn't get killed on Ctrl-C
  • GitHub: add option to turn off progress bar print out
  • GitHub: add unicode to frame getter possibilities
  • GitHub: remove remaining splats on dicts
  • GitHub: no need to splat pass thru args
  • GitHub: proper lookup of offset/weights/fold_column
  • GitHub: data should be eagered before download_csv.
  • GitHub: simplify model builder
  • GitHub: use None as default for "on" field
  • GitHub: add get_jar flag to download_pojo
  • GitHub: remove all of the unnecessary calls to h2o.init and remove the unnecessary environment variable for version checking during testing
R
  • PUBDEV-1744: Improve help message of h2o.init function
  • GitHub: add valid expression to list of accepted R CMD check outputs.
  • GitHub: added h2o.anomaly demo to r package
System
  • GitHub: Add -JJ command line argument to allow extra JVM arguments to be passed.
  • GitHub: Refactored CSVStream to be more understandable. Fix empty chunk bug.
  • GitHub: Add hintFlushRemoteChunk to CSVStream.
  • GitHub: Add parameterized route for frame export
  • GitHub: allow string vecs to be toEnum'd (with a sensible cap)
  • GitHub: allow lists of numbers in reducer ops
  • GitHub: Add warning message during POJO export if offset_column is specified (is not supported)
  • PUBDEV-1853: cleanup: remove addToNavbar from RequestServer GitHub
  • GitHub: Add "Open H2O Flow" message.
  • GitHub: Code refactoring to allow GBM JUnits to work with H2OApp in multi-node mode.
  • GitHub: Replace additive float op by multiplication
  • GitHub: Reimplement checksum for Model.Parameters
  • GitHub: Remove debug prints.
  • PUBDEV-1857: cleanup: remove the need for String[] path_params in RequestServer.register() GitHub
  • PUBDEV-1856: cleanup: remove the writeHTML_impl methods from all the schemas
  • PUBDEV-1854: cleanup: make _doc_method optional in the Route constructors GitHub
  • PUBDEV-1858: cleanup: change RequestServer so that only one handler instance is created for each Route

Bug Fixes

The following changes are to resolve incorrect software behavior:

Algorithms
  • PUBDEV-1674: gbm w gamma: does not seem to split at all; all trees node pred=0 for attached data GitHub
  • PUBDEV-1760: GBM : Deviance testing for exp family
  • PUBDEV-1714: gbm gamma: R vs h2o same split variable, slightly different leaf predictions
  • PUBDEV-1755: DL : Math correctness for Tweedie with Offsets/Weights
  • PUBDEV-1758: DL : Deviance testing for exp family
  • PUBDEV-1756: DL : Math correctness for Poisson with Offsets/Weights
  • PUBDEV-1651: null/residual deviances don't match for various weights cases
  • PUBDEV-1757: DL : Math correctness for Gamma with Offsets/Weights
  • PUBDEV-1680: gbm gamma: seeing train set mse incs after sometime
  • PUBDEV-1724: gbm w tweedie: weird validation error behavior
  • PUBDEV-1774: setting gbm's balance_classes to True produces suspect models
  • PUBDEV-1849: K-Means: negative sum-of-squares after mean imputation
  • GitHub: Set the iters counter during kmeans center initialization correctly
  • GitHub: fixed parenthesis in GLM POJO generation
  • GitHub: Should be updating model each iteration with the newly fitted kmeans clusters, not the old ones!
  • PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
  • PUBDEV-1666: GBM:Math correctness for Gamma with offsets/weights
Python
  • PUBDEV-1779: Fixes intermittent failure seen when Model Metrics were looked at too quickly after a cross validation run.
  • PUBDEV-1409: h2o python h2o.locate() should stop and return "Not found" rather than passing path=None to h2o? causes confusion h2o message GitHub
  • PUBDEV-1630: GBM getting intermittent assertion error on iris scoring in pyunit_weights_api.py
  • PUBDEV-1770: sigterm caught by python is killing h2o GitHub
  • HEXDEV-397: Python fold_column option requires fold column to be in the training data
  • HEXDEV-394: Python client occasionally throws attached error
  • GitHub: add missing args to kmeans
  • GitHub: add missing kmeans params in
  • GitHub: add missing checkpoint param
  • PUBDEV-1785: Deadlock while running GBM
R
  • PUBDEV-1830: h2o.glm throws an error when fold_column and validation_frame are both specified
  • PUBDEV-1660: h2oR: when try to get a slice from pca eigenvectors get some formatting error GitHub
  • GitHub: fix broken %in% in R
  • PUBDEV-1831: Cross-validation metrics are not displayed in R (and Python?)
  • PUBDEV-1840: Autoencoder model doesn't display properly in R (training metrics) GitHub
System
  • PUBDEV-1790: can't convert iris species column to a character column.
  • PUBDEV-1520: Kmeans pojo naming inconsistency
  • GitHub: fix parse of range ast
  • GitHub: Sets POJO file name to match the class name. Prior behavior would allow them to be different and give a compile error.
Web UI
  • PUBDEV-1754: Export frame not working in flow : H2OKeyNotFoundArgumentException

Simons (3.0.1.4) - 7/29/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-simons/4/index.html

New Features

Algorithms
Python
  • PUBDEV-386: Expose ParseSetup to user in Python
  • PUBDEV-1239: Python: getFrame and getModel missing
  • HEXDEV-334: support rbind in python
  • PUBDEV-1215: python to have exportFile call
  • GitHub: add cross-validation parameter to metric accessors and respective pyunit
  • PUBDEV-1729: Cross-validation metrics should be shown in R and Python for all models
R
  • PUBDEV-385: Expose ParseSetup to user in R
  • GitHub: add mean residual deviance accessor to R interface
  • GitHub: incorporate cross-validation metric access into the R client metric accessors
  • GitHub: R interface for checkpointing in RF enabled
System
  • PUBDEV-1735: Add 24-MAR-14 06.10.48.000000000 PM style date to autodetected

Enhancements

API

Algorithms
  • GitHub: Add proper deviance computation for DL regression.
  • GitHub: Print GLM model details to the logs.
  • GitHub: Disallow categorical response for GLM with non-binomial family.
  • GitHub: Disallow models with more than 1000 classes, can lead to too large values in DKV due to memory usage of 8*N^2 bytes (the Metrics objects which are in the model output)
  • GitHub: DL: Don't train too long in single node mode with auto-tuning.
  • GitHub: Use mean residual deviance to do early stopping in DL.
  • GitHub: Add an "AUTO" setting for fold_assignment (which is Random). This allows the code to reject non-default user-given values if n-fold CV is not enabled.
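The 8*N^2 figure quoted in the list above for the 1000-class limit can be checked with a short calculation. This sketch assumes the memory in question is an N-by-N table of 8-byte values; the function name `metrics_table_bytes` is hypothetical.

```python
def metrics_table_bytes(n_classes, bytes_per_cell=8):
    """Bytes needed for an n_classes x n_classes table of 8-byte cells."""
    return bytes_per_cell * n_classes ** 2


for n in (10, 1000, 10000):
    print(n, metrics_table_bytes(n))
```

Under that assumption, 1000 classes need about 8 MB per table, and the cost grows quadratically with the number of classes.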
Python
  • HEXDEV-317: Python has to play nicely in a polyglot, long-running environment
  • GitHub: simplify ast in python frame slicer
  • GitHub: add cross validation metrics and mean residual deviance to model show()
  • GitHub: any to take a frame, simplify python's __contains__
R
  • GitHub: On detaching h2o R package, only shut down H2O instance if it was started by the R client
  • GitHub: update h2o load
System
  • GitHub: Print a handy message (Open H2O Flow in your web browser) when the cluster comes up like Sparkling Water does.
  • GitHub: Replace memory leaky RCurl getURL with curlPerform.
  • GitHub: Add -disable_web parameter.
  • GitHub: allow numerics in match
  • GitHub: More refactoring of h2o start. Includes:
    • H2OStarter - a generic class to start H2O. It does all dynamic registration
    • H2OTestStarter - a generic class to start h2o-core tests
  • GitHub: Use typed key when it is necessary. Key.make() now returns typed Key. The trick is that type T can be derived from the left side of the assignment. If it is not possible to derive the type of the Key, then the developer has to use typed syntax: Key.<Frame>make("myframe.hex"). The change simplifies Scala code, which will be able to derive the key type.
  • PUBDEV-1793: Add Job state and start/end time to the model's output GitHub
  • GitHub: add more places to look when trying to start jar from python's h2o.init
  • GitHub: Cosmetic name changes
  • GitHub: Fetch local node differently from remote node.
  • GitHub: Don't clamp node_idx at 0 anymore.
  • GitHub: Added -log_dir option.

Bug Fixes

API
  • PUBDEV-776: Schema.parse() needs to be better behaved (like, not crash)
Algorithms
  • PUBDEV-1725: pca:glrm - give bad results for attached data (because of plus plus initialization)
  • GitHub: Fix deviance calculation, use the sanitized parameters from the model info, where Auto parameter values have been replaced with actual values
  • GitHub: Fix offset in DL for exponential family (that doesn't do standardization)
  • GitHub: Fix a bug where initial Y was set to all zeroes by kmeans++ when scaling was disabled
  • PUBDEV-1668: GBM: Math correctness for weights
  • PUBDEV-1783: dl: deviance off for large dataset GitHub
  • PUBDEV-1667: GBM: Math correctness for Offsets
  • PUBDEV-1778: drf: reporting incorrect mse on validation set GitHub
  • GitHub: Fix DRF scoring with 0 trees.
Python
R
  • PUBDEV-1257: R: no is.numeric method for H2O objects
  • PUBDEV-1622: NPE in water.api.RequestServer, water.util.RString.replace(RString.java:132)...got flagged as WARN in log...I would think we should have all NPE's be ERROR / fatal? or ?? GitHub
  • PUBDEV-1655: h2o.strsplit needs isNA check
  • PUBDEV-1084: h2o.setTimezone NPE
  • PUBDEV-1738: R: cloud name creation can't handle user names with spaces
System
  • PUBDEV-1410: apply causes assert errors mentioning deadlock in runit_small_client_mode ...build never completes after hours ..deadlock?
  • PUBDEV-1195: docker build fails
  • HEXDEV-362: Bug in /parsesetup data preview GitHub
  • PUBDEV-1766: H2O xval: when delete all models: get Error evaluating future[6] :Error calling DELETE /3/Models/gbm_cv_13
  • PUBDEV-1767: H2O: when list frames after removing most frames, get: roll ups not possible vec deleted error GitHub
Web UI
  • PUBDEV-1782: Flow: View Data fails when there is a UUID column (and maybe also a String column)
  • PUBDEV-1769: xval: cancel job does not work GitHub

Simons (3.0.1.3) - 7/24/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-simons/3/index.html

New Features

Python

Enhancements

API
  • GitHub: Increase sleep from 2 to 3 because h2o itself does a sleep 2 on the REST API before triggering the shutdown.
System

Bug Fixes

The following changes are to resolve incorrect software behavior:

Algorithms
  • PUBDEV-1743: gbm poisson w weights: deviance off
  • PUBDEV-1736: gbm poisson with offset: seems to be giving wrong leaf predictions
Python
  • PUBDEV-1731: Python get_frame() results in deleting a frame created by Flow
  • HEXDEV-389: Split frame from python
  • HEXDEV-388: python client H2OFrame constructor puts the header into the data (as the first row)
R
  • PUBDEV-1504: Runit intermittent fails : runit_pub_180_ddply.R
  • PUBDEV-1678: Client mode jobs fail on runit_hex_1750_strongRules_mem.R
System
  • GitHub: Model parameters should be always public.

Simons (3.0.1.1) - 7/20/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-simons/1/index.html

New Features

Algorithms
Python
  • PUBDEV-1437: Python needs "nlevels" operator like R
  • PUBDEV-1434: Python needs "levels" operator, like R
  • PUBDEV-1355: Python needs h2o.trim, like in R
  • PUBDEV-1354: Python needs h2o.toupper, like in R
  • PUBDEV-1352: Python needs h2o.tolower, like in R
  • PUBDEV-1350: Python needs h2o.strsplit, like in R
  • PUBDEV-1347: Python needs h2o.shutdown, like in R
  • PUBDEV-1343: Python needs h2o.rep_len, like in R
  • PUBDEV-1340: Python needs h2o.nlevels, like in R
  • PUBDEV-1338: Python needs h2o.ls, like in R
  • PUBDEV-1344: Python needs h2o.saveModel, like in R
  • PUBDEV-1337: Python needs h2o.loadModel, like in R
  • PUBDEV-1335: Python needs h2o.interaction, like in R
  • PUBDEV-1334: Python needs h2o.hist, like in R
  • PUBDEV-1351: Python needs h2o.sub, like in R
  • PUBDEV-1333: Python needs h2o.gsub, like in R
  • PUBDEV-1336: Python needs h2o.listTimezones, like in R
  • PUBDEV-1346: Python needs h2o.setTimezone, like in R
  • PUBDEV-1332: Python needs h2o.getTimezone, like in R
  • PUBDEV-1329: Python needs h2o.downloadCSV, like in R
  • PUBDEV-1328: Python needs h2o.downloadAllLogs, like in R
  • PUBDEV-1327: Python needs h2o.createFrame, like in R
  • PUBDEV-1326: Python needs h2o.clusterStatus, like in R
  • PUBDEV-1323: Python needs svd algo
  • PUBDEV-1322: Python needs prcomp algo
  • PUBDEV-1321: Python needs naiveBayes algo
  • PUBDEV-1320: Python needs model num_iterations accessor for clustering models, like R's
  • PUBDEV-1318: Python needs screeplot and plot methods, like R's. (should probably check for matplotlib)
  • PUBDEV-1317: Python needs multinomial model hit_ratio_table accessor, like R's
  • PUBDEV-1316: Python needs model scoreHistory accessor, like R's
  • PUBDEV-1315: R needs weights and biases accessors for deeplearning models
  • PUBDEV-1313: Python needs "as.Date" operator, like R's
  • PUBDEV-1312: Python needs "rbind" operator, like R's
  • PUBDEV-1345: Python needs h2o.setLevel and h2o.setLevels, like in R
  • PUBDEV-1311: Python needs "setLevel" operator, like R's
  • PUBDEV-1306: Python needs "anyFactor" operator, like R's
  • PUBDEV-1305: Python needs "table" operator, like R's
  • PUBDEV-1301: Python needs "as.numeric" operator, like R's
  • PUBDEV-1300: Python needs "as.character" operator, like R's
  • PUBDEV-1293: Python needs "signif" operator, like R's
  • PUBDEV-1292: Python needs "round" operator, like R's
  • PUBDEV-1291: Python needs transpose operator, like R's t operator
  • PUBDEV-1289: Python needs element-wise division and multiplication operators, like %/% and %-% in R
  • PUBDEV-1330: Python needs h2o.exportHDFS, like in R
  • PUBDEV-1357: Python and R need which operator GitHub
  • PUBDEV-1356: Python and R need isnumeric and ischaracter operators
  • PUBDEV-1342: Python needs h2o.removeVecs, like in R
  • PUBDEV-1324: Python needs h2o.assign, like in R GitHub
  • PUBDEV-1296: Python and R h2o clients need "any" operator, like R's
  • PUBDEV-1295: Python and R h2o clients need "prod" operator, like R's
  • PUBDEV-1294: Python and R h2o clients need "range" operator, like R's
  • PUBDEV-1290: Python and R h2o clients need "cummax", "cummin", "cumprod", and "cumsum" operators, like R's
  • PUBDEV-1325: Python needs h2o.clearLog, like in R
  • PUBDEV-1349: Python needs h2o.startLogging and h2o.stopLogging, like in R
  • PUBDEV-1341: Python needs h2o.openLog, like in R
  • PUBDEV-1348: Python needs h2o.startGLMJob, like in R
  • PUBDEV-1331: Python needs h2o.getFutureModel, like in R
  • PUBDEV-1302: Python needs "match" operator, like R's
  • PUBDEV-1298: Python needs "%in%" operator, like R's
  • PUBDEV-1310: Python needs "scale" operator, like R's
  • PUBDEV-1297: Python needs "all" operator, like R's
  • GitHub: add start_glm_job() and get_future_model() to python client. add H2OModelFuture class. add respective pyunit
R
  • PUBDEV-1273: Add h2oEnsemble R package to h2o-3
  • PUBDEV-1319: R needs centroid_stats accessor like Python, for clustering models
Rapids
  • PUBDEV-1635: the equivalent of R's "any" should probably be implemented in rapids
  • PUBDEV-1634: the equivalent of R's cummin, cummax, cumprod, cumsum should probably be implemented in rapids
  • PUBDEV-1633: the equivalent of R's "range" should probably be implemented in rapids
  • PUBDEV-1632: the equivalent of R's "prod" should probably be implemented in rapids
  • PUBDEV-1699: the equivalent of R's "unique" should probably be implemented in rapids GitHub
System
  • GitHub: changed to new AMI
  • PUBDEV-679: Create cross-validation holdout sets using the per-row weights
  • GitHub: Add user_name. Add ExtensionHandler1.
  • GitHub: Added auth options to h2o.init().
  • GitHub: Added H2O.calcNextUniqueModelId().
  • GitHub: Add ldap arg.
Web UI
  • HEXDEV-231: Flow: Ability to change column type post-Parse

Enhancements

Algorithms
  • GitHub: use fixed seed to avoid bad splits with some seeds
  • GitHub: Change seed to avoid type flip from integer to double after row slicing, which leads to different split decisions
  • GitHub: Add option during kmeans scoring to return matrix of indicator columns for cluster assignment, which is necessary for initializing GLRM
  • GitHub: Output number of processed observations in PCA
  • GitHub: Add validation into PCA with GramSVD
  • GitHub: Code cleanup of distributions. Also rename _n_folds -> _nfolds for consistency
  • GitHub: Remove restriction to data frames with more than 1 column
  • GitHub: Add debugging output for DL auto-tuning.
  • PUBDEV-556: implement algo-agnostic cross-validation mechanism via a column of weights
  • GitHub: When initializing with kmeans++ set X to matrix of indicator columns corresponding to cluster assignments, unless closed form solution exists
  • GitHub: Always print DL auto-tuning info for now.
  • PUBDEV-1657: pca: would be good to remove the redundant std dev from flow pca model object
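The idea of doing cross-validation through a column of weights (PUBDEV-556 in the list above) can be sketched as follows. This is an illustrative standalone example, not H2O's implementation: `fold_weights` is a hypothetical name, and assigning folds by row index modulo `nfolds` is an assumption.

```python
def fold_weights(n_rows, nfolds, base_weights=None):
    """Return one (train_weights, holdout_weights) pair per fold.

    Rows in the held-out fold get weight 0 in the training column and
    keep their weight in the holdout column, so any algorithm that
    honors per-row weights can be cross-validated without slicing data.
    """
    if nfolds < 2 or nfolds > n_rows:
        raise ValueError("nfolds must be between 2 and the number of rows")
    base = [1.0] * n_rows if base_weights is None else list(base_weights)
    folds = []
    for k in range(nfolds):
        train = [0.0 if i % nfolds == k else w for i, w in enumerate(base)]
        holdout = [w if i % nfolds == k else 0.0 for i, w in enumerate(base)]
        folds.append((train, holdout))
    return folds


for train, holdout in fold_weights(4, 2):
    print(train, holdout)
```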
API
  • GitHub: Set Content-Type: application/x-www-form-urlencoded for regular POST requests.
  • HEXDEV-272: Move response_column parameter above ignored_columns parameter GitHub
    • All of the fields of a schema are now stored in the leaf child of the class hierarchy. Changed the implementation of fields() to simply return the fields variable of a schema. The function calls H2O.fail() if it attempts to access a field from a non-leaf child. response_column is now moved above ignored_columns for every applicable schema. 'own_fields' is also now renamed to 'fields'
  • GitHub: Don't use features from servlet api 3.0 or later anymore. Instead save the response status in a thread local variable and fish it out when needed.
Python
  • GitHub: don't use the header of the timezone table for a choice
  • GitHub: never delete models. ever.
  • GitHub: add na_rm argument
  • GitHub: add prod to python interface
System
  • GitHub: use Key instead of Vec in refcnter
  • GitHub: protect vecs in apply
  • GitHub: Allows for more than one column to remain unnamed. The new naming will fill in the blanks.
  • GitHub: Refactoring of hadoop mapper and driver.
  • GitHub: Remove -hdfs option.
  • GitHub: Adds more checks for a parse cancel at more stages during the post ingestion file parse.
  • GitHub: Refactor method name for clarification.
  • GitHub: Cleans up and comments the freeing of chunks from a parsed file.
  • GitHub: Since more startup logic is getting added, simplify H2OClientApp as much as possible. Remove H2OClient entirely.
  • GitHub: Add dedicated AddCommonResponseHeadersHandler handler to set common response headers up-front.
  • GitHub: More refactoring of startup. Pushed a bunch of code from H2OApp into H2O. Added H2O.configureLogging().
  • GitHub: Make Progress extend Keyed.
  • GitHub: Make createServer() protected.
  • GitHub: model_id should probably be a Key, not Key.
  • GitHub: Change Jetty version from 9 to 8 to get Java 6 compatibility back.
Web UI
  • PUBDEV-1521: show REST API and overall UI response times for each cell in Flow
  • HEXDEV-304: Flow: Emphasize run time in job-progress output
  • PUBDEV-1522: show wall-clock start and run times in the Flow outline
  • PUBDEV-1707: Hook up "Export" button for datasets (frames) in Flow.

Bug Fixes

Algorithms
  • PUBDEV-1641: gbm w poisson: get java.lang.AssertionError at hex.tree.gbm.GBM$GBMDriver.buildNextKTrees on attached data
  • PUBDEV-1672: kmeans: get AIOOB with user specified centroids GitHub
    • Throw an error if the number of rows in the user-specified initial centers is not equal to k.
  • PUBDEV-1654: pca: gram-svd std dev differs for v2 vs v3 for attached data
  • GitHub: Fix DL
  • GitHub: Fix a bug in PCA utilities for k = 1
  • PUBDEV-1700: nfolds: flow-when set nfold =1 job hangs for ever; in terminal get java.lang.AssertionError
  • PUBDEV-1706: GBM/DRF: is balance_classes=TRUE and nfolds>1 valid? GitHub
  • PUBDEV-806: GLM => runit_demo_glm_uuid.R : water.exceptions.H2OIllegalArgumentException
  • PUBDEV-1696: Client (model-build) is blocked when passing illegal nfolds value. GitHub
  • PUBDEV-1690: Cross Validation: if nfolds > number of observations, should it default to leave-one-out cross-validation?
  • PUBDEV-1537: pca: on airlines get java.lang.AssertionError at hex.svd.SVD$SVDDriver.compute2(SVD.java:219) GitHub
  • PUBDEV-1603: pca: glrm giving very different std dev than R and h2o's other methods for attached data
  • GitHub: Fix a potential race condition in tree validation scoring.
  • GitHub: Fix GLM parameter schema. Clean up hasOffset() and hasWeights()
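The k-means fix above (PUBDEV-1672) rejects user-specified initial centers whose row count differs from k. A minimal sketch of that kind of up-front check, with the hypothetical name `check_user_centers` and centers represented as a list of rows, might be:

```python
def check_user_centers(user_centers, k):
    """Fail fast when the number of initial centers does not equal k."""
    if len(user_centers) != k:
        raise ValueError(
            "expected %d initial centers, got %d" % (k, len(user_centers))
        )
    return user_centers


print(check_user_centers([[0.0, 0.0], [1.0, 1.0]], 2))
```

Validating before any indexing is what avoids the out-of-bounds access described in the ticket.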
Python
  • PUBDEV-1627: column name missing (python client)
  • PUBDEV-1629: python client's tail() header incorrect GitHub
  • PUBDEV-1413: intermittent assertion errors in pyunit_citi_bike_small.py/pyunit_citi_bike_large.py. Client apparently not notified
  • PUBDEV-1590: "Trying to unlock null" assertion during pyunit_citi_bike_large.py
  • PUBDEV-1400: match operator should take numerics
R
Rapids
Sparkling Water
System
  • PUBDEV-1551: Parser: Multifile Parse fails with 0-byte files in directory GitHub
  • HEXDEV-325: Empty reply when parsing dataset with mismatching header and data column length
  • PUBDEV-1509: Split frame : Big datasets : On 186K rows 3200 Cols split frame took 40 mins => which is too long
  • PUBDEV-1438: Column naming can create duplicate column names
  • PUBDEV-1105: NPE in Rollupstats after failed parse
  • PUBDEV-1142: H2O parse: When cancel a parse job, key remains locked and hence unable to delete the file GitHub
  • GitHub: client mode deadlock issue resolution
  • PUBDEV-1670: Client mode fails consistently sometimes : GBM_offset_tweedie.R.out.txt :
  • GitHub: nbhm bug: K == TOMBSTONE not key == TOMBSTONE
  • GitHub: Pulls out a GAID from resource in jar if the GAID doesn't equal the default. Presumably the GAID has been changed by the jar baking program.
Web UI
  • PUBDEV-872: Flows : Not able to load saved flows from hdfs/local GitHub
  • PUBDEV-554: Flow:Parse two different files simultaneously, flow should either complain or fill the additional (incompatible) rows with nas
  • PUBDEV-1527: missing .java extension when downloading pojo GitHub
  • PUBDEV-1642: Changing columns type takes column list back to first page of columns
  • PUBDEV-1508: Flow : Import file => Parse => Error compiling coffee-script Maximum call stack size exceeded
  • PUBDEV-1606: Flow :=> Cannot save flow on hdfs
  • PUBDEV-1653: Flow: the column names do not modify when user changes the dataset in model builder

Shannon (3.0.0.26) - 7/4/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/26/index.html

New Features

Algorithms
  • PUBDEV-1592: Expose standardization shift/mult values in the Model output in R/Python. GitHub
Python
  • GitHub: add h2o.shutdown to python client
  • GitHub: add h2o.hist and respective pyunit
  • GitHub: gbm weight pyunit (variable importances)
R
Web UI

Enhancements

Algorithms
  • PUBDEV-1494: GBM : Weights math correctness tests in R
  • PUBDEV-1523: GLM w tweedie: for attached data, R giving much better res dev than h2o
  • PUBDEV-1396: Offsets/Weights: Math correctness for GLM
  • PUBDEV-1496: RF : Weights Math correctness tests in R
  • HEXDEV-366: remove weights option from DRF and GBM in REST API, Python, R
  • PUBDEV-1553: Threshold in GLM is hardcoded to 0
  • GitHub: Make min_rows a double instead of int: Is now weighted number of observations (min_obs in R).
  • GitHub: Don't use sample weighted variance, but full weighted variance.
  • GitHub: Fix R^2 computation.
  • GitHub: Skip rows with missing response in weighted mean computation.
  • _binomial_double_trees disabled by default for DRF (was enabled).
  • GitHub: Relax tolerance.
  • HEXDEV-329: Offset for GBM
  • HEXDEV-211: Tweedie distributions for GLM
API
  • PUBDEV-1491: generated REST API POJOS should be compiled and jar'd up as part of the build
  • GitHub: Change schema for PCA, SVD, and GLRM to version 99
Python
  • GitHub: is factor returns TRUE/FALSE cast to scalar 1/0
  • GitHub: take a slightly different syntactic approach to dropping column
  • GitHub: better list comp in interaction call
  • GitHub: if the weights_column argument is specified, attach the column to the training and/or validation frame (if not already specified as part of x/validation_x). If weights_column is not already part of x/validation_x, a training_frame/validation_frame must be provided, and the weights column is taken from it. Respective pyunit added.
R
  • GitHub: better ref handling in the [<- for python and R
  • GitHub: Pass binomial_double_trees in the R wrapper for DRF.
  • GitHub: carefully format NAs and non NAs
  • GitHub: for loop over the x[[j]] to format NAs properly
  • GitHub: Added example to h2o-r/ensemble/create_h2o_wrappers.R
System
  • GitHub: allow for no y in model_builder
  • GitHub: Enable auto-flag for Java6 generation.
  • GitHub: better compression in split frame
  • PUBDEV-1594: All basic file accessors in PersistHDFS should check file permissions
  • PUBDEV-1518: getFrames should show a Parse button for raw frames
Web UI
  • PUBDEV-1545: Flow => Build model => ignored columns table => should have column width resizing based on column names width => looks odd if column names are short
  • PUBDEV-1546: Flow: Build model => Search for 1 column => select it => build model shows list of columns instead of 1 column
  • PUBDEV-1254: Flow: Add Impute

Bug Fixes

Algorithms
  • PUBDEV-1554: dl with offset: when offset same as response, do not get 0 mse
  • PUBDEV-1555: h2oR: dl with offset giving : Error in args$x_ignore : object of type 'closure' is not subsettable
  • PUBDEV-1487: gbm weights: give different terminal node predictions than R for attached data
  • PUBDEV-1569: Investigate effectiveness of _binomial_double_trees (DRF) GitHub
  • PUBDEV-1574: Actually pass 'binomial_double_trees' argument given to R wrapper to DRF.
  • PUBDEV-1444: DL: h2o.saveModel cannot save metrics when a deeplearning model has a validation_frame
  • PUBDEV-1579: GBM test time predictions without weights seem off when training with weights GitHub
  • PUBDEV-1533: GLM: doubled weights should produce the same result as doubling the observations GitHub
  • PUBDEV-1531: GLM: it appears that observations with 0 weights are not ignored, as they should be.
  • GitHub: Fix a bug in PCA scoring that was handling categorical NAs inconsistently
  • PUBDEV-1581: Regression 3060 fails on GLRM in R tests
  • PUBDEV-1586: change Grid endpoints and schemas to v99 since they are still in flux
  • PUBDEV-1589: GLM : build model => airlinesbillion dataset => IRLSM/LBFGS => fails with array index out of bound exception
  • PUBDEV-1607: gbm w offset: predict seems to be wrong
  • PUBDEV-1600: Frame name creation fails when file name contains csv or zip (not as extension)
  • PUBDEV-1577: DL predictions on test set require weights if trained with weights
  • PUBDEV-1598: Flow: After running pca when call get Model/ jobs get: Failed to find schema for version: 3 and type: PCA
  • PUBDEV-1576: Test variable importances for weights for GBM/DRF/DL
  • PUBDEV-1517: With R, deep learning autoencoder using all columns in frame, not just those specified in x parameter
  • PUBDEV-1593: dl var importance: there is a .missing(NA) variable in DL variable importance even when data has no NAs
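
The observation-weight behavior targeted by the GLM fixes in this list (PUBDEV-1533: doubled weights should match doubled observations; PUBDEV-1531: zero-weight observations should be ignored) can be illustrated without the H2O API. The following is only a sketch under the assumption of a single-predictor weighted least-squares fit; the helper names `weighted_ols` and `same` are hypothetical and are not part of H2O.

```python
from math import isclose

def weighted_ols(x, y, w):
    """Fit y = a + b*x by weighted least squares and return (a, b).

    Hypothetical helper for illustration only; not an H2O function.
    """
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b

def same(p, q):
    """Compare two coefficient tuples up to floating-point tolerance."""
    return all(isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(p, q))

x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]

# A weight of 2 on every row should match duplicating every row.
doubled_weights = weighted_ols(x, y, [2.0] * 4)
doubled_rows = weighted_ols(x + x, y + y, [1.0] * 8)

# A row with weight 0 should behave as if it were absent.
zero_weight = weighted_ols(x + [10.0], y + [-50.0], [1.0] * 4 + [0.0])
dropped_row = weighted_ols(x, y, [1.0] * 4)

print(same(doubled_weights, doubled_rows), same(zero_weight, dropped_row))
```

Both comparisons are expected to hold for any correct weighted fit, which is why checks of this kind are a convenient regression test for weight handling.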
Python
  • PUBDEV-1538: h2o.save_model fails on Windows due to path nonsense
  • GitHub: python leaked key check for Vecs, Chunks, and Frames
  • PUBDEV-1609: frame dimension mismatch between upload/import method
R
  • PUBDEV-1601: h2o.loadModel() from hdfs
  • PUBDEV-1611: R CMD Check failing on : The Date field is over a month old.
System
  • PUBDEV-1514: Large number of columns (~30000) on importFile (flow) is slow / unresponsive for long time
  • PUBDEV-841: Split frame : Flow should not show raw frames for SplitFrame dialog (water.exceptions.H2OIllegalArgumentException)
  • PUBDEV-1459: bug in GLM POJO: seems threshold for binary predictions is always 0
  • PUBDEV-1566: Cannot save model on windows since Key contains '@' (illegal character to path)
  • GitHub: Fixes the timezone lists.
  • GitHub: R CMD check fix for date
  • GitHub: add ec2 back into project
Web UI
  • HEXDEV-54: Flow : Import file 100k.svm => Something went wrong while displaying page

Shannon (3.0.0.25) - 6/25/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/25/index.html

Enhancements

API
  • PUBDEV-1452: branch 3.0.0.2 to REGRESSION_REST_API_3 and cherry-pick the /99/Rapids changes to it

Web UI
  • PUBDEV-1545: Flow => Build model => ignored columns table => should have column width resizing based on column names width => looks odd if column names are short
  • PUBDEV-1546: Flow : Build model => Search for 1 column => select it => build model shows list of columns instead of 1 column

Bug Fixes

The following changes are to resolve incorrect software behavior:

Algorithms
  • PUBDEV-1487: gbm weights: give different terminal node predictions than R for attached data
  • GitHub: Fix offset for DL.
  • GitHub: Gracefully handle 0 weight for GBM.
Python
  • PUBDEV-1547: Weights API: weights column not found in python client
R
  • GitHub: Fix R wrapper for DL for weights/offset.
Web UI
  • PUBDEV-1528: Flow model builder: the na filter does not select all ignored columns; just the first 100.

Shannon (3.0.0.24) - 6/25/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/24/index.html

New Features

Algorithms
  • GitHub: Allow validation for unsupervised models.
R
  • GitHub: Added runit GBM weights
  • GitHub: Updated runit_GBM_weights.R
Python
  • GitHub: add h2o.set_timezone h2o.get_timezone and h2o.list_timezones to python client and respective pyunit.
  • GitHub: add h2o.save_model and h2o.load_model to python client and respective pyunit

Enhancements

Algorithms
  • GitHub: Skip rows with weight 0.
  • GitHub: x_ignore must be set when autoencoder is TRUE
System
  • GitHub: Fix Java bindings generator to generate code under project's location.
  • GitHub: Adds input parameter check to ParseSetup.

Bug Fixes

Algorithms
  • PUBDEV-1529: dl with ae: get java.lang.UnsupportedOperationException: Trying to predict with an unstable model.
  • GitHub: Bring back accidentally removed hiding of classification-related fields for unsupervised models.
API
  • PUBDEV-1456: fix REST API POJO generation for enums, + java.util.map import

Shannon (3.0.0.23) - 6/19/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/23/index.html

New Features

Algorithms
API
  • PUBDEV-61: do back-end work to allow document navigation from one Schema to another
  • PUBDEV-133: doing summary means calling it with each columns name, index not supported?
Python
  • GitHub: add num_iterations accessor to python client and respective pyunit
  • GitHub: add score_history accessor to python client and respective pyunit
  • GitHub: add hit ratio table accessor to python interface and respective pyunit
  • GitHub: add h2o.naivebayes and respective pyunits
  • GitHub: add h2o.prcomp and respective pyunits.
  • PUBDEV-681: Add user-given input weight parameters to Python
  • GitHub: add h2o.create_frame to python client and respective pyunit
  • GitHub: add h2o.interaction and respective pyunit
  • GitHub: add h2o.strsplit to python client and respective pyunit
  • GitHub: add h2o.toupper and h2o.tolower to python client and respective pyunit
  • GitHub: add h2o.sub and h2o.gsub to python interface and respective pyunit
  • GitHub: add h2o.trim() to python client and respective pyunit
  • GitHub: add h2o.rep_len to python client and respective pyunit
  • GitHub: add h2o.svd to python client and respective golden pyunit
  • GitHub: add scree plot functionality to python client and respective pyunit
  • GitHub: add plotting functionality to python client and respective pyunit
R
  • GitHub: added h2o.weights and h2o.biases accessors to R client and update respective runit
  • GitHub: add h2o.centroid_stats to R client and respective runit
  • PUBDEV-680: Add user-given input weight parameters to R
  • GitHub: Add offset/weights to DRF/GBM R wrappers.
Web UI

Enhancements

Algorithms
  • PUBDEV-676: Use the user-given weight Vec as observation weights for all algos
  • GitHub: Refactor the code to let the caller compute the weighted sigma.
  • GitHub: Modify prior class distribution to be computed from weighted response.
  • GitHub: Put back the defaultThreshold that's based on training/validation metrics. Was accidentally removed together with SupervisedModel.
  • GitHub: Always sample to at least #class labels when doing stratified sampling.
  • GitHub: Cutout for NAs in GLM score0(data[],...), same as for score0(Chunk[],…)
R
  • PUBDEV-856: All h2o things in R should have an h2o.something version so it's unambiguous GitHub
  • GitHub: export clusterIsUp and clusterInfo commands
  • GitHub: update accessors in the shim
  • GitHub: gbm with async exec
System
  • HEXDEV-361: Wide frame handling for model builders
  • GitHub: Remove application plugin from assembly to speedup build process.
  • GitHub: add byteSize to ls
  • GitHub: option to launch randomForest async
  • GitHub: Return HDFS persist manager for URIs starting with s3n and s3a
  • GitHub: quote strings when writing to disk

Bug Fixes

Algorithms
  • PUBDEV-1217: pca: when cancel the job the key remains locked
  • PUBDEV-1468: Error in GBM if response column is constant GitHub
  • PUBDEV-1476: dl with obs weights: nas in weights cause java.lang.AssertionError GitHub
  • PUBDEV-1458: pca: data with nas, v2 vs v3 slightly different results GitHub
  • PUBDEV-1477: dl w/obs wts: when all wts are zero, get java.lang.AssertionError GitHub
  • GitHub: Fix check for offset (allow offset for logistic regression).
  • GitHub: Gracefully handle exception when launching single-node DRF/GBM in client mode.
  • GitHub: Hack around the fact that hasWeights()/hasOffset() isn't available on remote nodes and that SharedTree is sent to remote nodes and its private internal classes need access to the above methods...
  • GitHub: Fix scoring when NAs are predicted.
Python
  • PUBDEV-1469: pyunit_citi_bike_large.py : test failing consistently on regression jobs
  • PUBDEV-1472: Regression job : Pyunit small tests groupie and pub_444_spaces failing consistently
  • PUBDEV-1372: Regression of pyunit_small, Groupby.py
  • PUBDEV-1386: intermittent fail in pyunit_citi_bike_small.py: -Unimplemented- failed lookup on token
  • PUBDEV-1471: pyunit_citi_bike_small.py : failing consistently on regression jobs
  • PUBDEV-1466: matplotlib.pyplot import failure on MASTER jenkins pyunit small jobs GitHub
  • GitHub: minor fix to python's h2o.create_frame
  • GitHub: update the path to jar in connection.py
R
  • PUBDEV-1475: Client mode failed tests : runit_GBM_one_node.R, runit_RF_one_node.R, runit_v_3_apply.R, runit_v_4_createfunctions.R GitHub
  • PUBDEV-1235: Split Frame causes AIOOBE on Chicago crimes data GitHub
  • PUBDEV-746: runit_demo_NOPASS_h2o_impute_R : h2o.impute() is missing. seems like we want that?
  • PUBDEV-582: H2O-R- does not give the full column summary
  • PUBDEV-1473: Regression : Runit small jobs failing on tests :
  • PUBDEV-741: runit_NOPASS_pub-668 R tests uses all() ...h2o says all is unimplemented
  • PUBDEV-1506: R: h2o.ls() needs to return data sizes
  • PUBDEV-1436: Intermittent runit fail : runit_GBM_ecology.R GitHub
  • PUBDEV-1464: R: toupper/tolower don't work GitHub
  • PUBDEV-1194: R: dataset is imported but can't return head of frame
Sparkling Water
  • PUBDEV-975: Download page for Sparkling Water should point to the right R-client and Python client
  • PUBDEV-1428: Sparkling water => Flow => Million song/KDD Cup path issues GitHub
Web UI
  • PUBDEV-1433: Flow UI: Change Help > FAQ link to h2o-docs/index.html#FAQ

Shannon (3.0.0.22) - 6/13/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/22/index.html

New Features

API
  • PUBDEV-633: Generate Java bindings for REST API: POJOs for the entities (schemas)

Python
  • GitHub: added h2o.anyfactor() and respective pyunit
  • GitHub: add h2o.scale and respective pyunit
  • GitHub: added levels, nlevels, setLevel and setLevels and respective pyunit (PUBDEV-1434, PUBDEV-1437, PUBDEV-1345, PUBDEV-1311)
  • GitHub: add H2OFrame.as_date and pyunit addition. H2OFrame.setLevel should return a H2OFrame not a H2OVec.

Enhancements

Algorithms
  • GitHub: Add _build_tree_one_node option to GBM

API
  • HEXDEV-352: Additional attributes on /Frames and /Frames/foo/summary

R
  • PUBDEV-706: Release h2o-dev to CRAN
  • GitHub: Adding parameter parse_type to upload/import file

Python
  • GitHub: print out where h2o jar is looked for
  • GitHub: add h2o.ls and respective pyunit

System
  • PUBDEV-717: refactor the duplicated code in FramesV2
  • PUBDEV-1281: Add horizontal pagination of frames to Flow GitHub
  • PUBDEV-607: Add Xmx reporting to GA
  • GitHub: Added support for Freezable[][][] in serialization (added addAAA to auto buffer and DocGen, DocGen will just throw H2O.fail())
  • GitHub: No longer set yyyy-MM-dd and dd-MMM-yy dates that precede the epoch to be NA. Negative time values are fine. This unifies these two time formats with the behavior of as.Date.
  • GitHub: Reduces the verbosity of parse tracing messages.
  • GitHub: Rename AUTO->GUESS for figuring out file type.
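
The date change in this list (dates that precede the epoch are no longer set to NA) depends on time values being allowed to go negative. A minimal Python sketch of that arithmetic follows. It assumes times are counted in milliseconds since 1970-01-01 UTC, which is an assumption made here for illustration and is not stated in the entry above; the helper name `to_epoch_millis` is hypothetical and is not part of H2O.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(text, fmt="%Y-%m-%d"):
    """Parse a date string as UTC and return milliseconds since the epoch.

    Hypothetical helper for illustration only; not an H2O function.
    Dates before 1970-01-01 simply yield negative values.
    """
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    # Integer division of two timedeltas gives a whole number of milliseconds.
    return (parsed - EPOCH) // timedelta(milliseconds=1)

before_epoch = to_epoch_millis("1969-12-31")  # one day before the epoch
at_epoch = to_epoch_millis("1970-01-01")
after_epoch = to_epoch_millis("1970-01-02")

print(before_epoch, at_epoch, after_epoch)
```

Treating the pre-epoch value as an ordinary negative number, rather than as missing, keeps date arithmetic such as differences and ordering consistent across the epoch boundary.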

Web UI
  • HEXDEV-276: Add frame pagination
  • PUBDEV-1405: Flow : Decision to be made on display of number of columns for wider datasets for Parse and Frame summary
  • PUBDEV-1404: Usability improvements
  • PUBDEV-244: "View Data" display may need to be modified/shortened.

Bug Fixes

Algorithms
  • PUBDEV-1365: GLM: Buggy when likelihood equals infinity
  • PUBDEV-1394: GLM: Some offsets hang
  • PUBDEV-1268: GLM: get java.lang.AssertionError at hex.glm.GLM$GLMSingleLambdaTsk.compute2 for attached data
  • PUBDEV-1403: pca: h2o-3 reporting incorrect proportion of variance and cum prop GitHub
  • HEXDEV-281: GLM - beta constraints with categorical variables fails with AIOOB
  • HEXDEV-280: GLM - gradient not within tolerance when specifying beta_constraints w/ and w/o prior values

Python
R
System
  • PUBDEV-1423: Phantomjs : Add timeout command line option
  • PUBDEV-1401: Flow : Import file 15 M Rows 2.2K cols=> Parse these files => Change first column type => Unknown => Try to change other columns => Kind of hangs
  • PUBDEV-1406: make the ParseSetup / Parse API more efficient for high column counts GitHub

Shannon (3.0.0.21) - 6/12/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/21/index.html

New Features

Python
  • HEXDEV-29: The ability to define features as categorical or continuous in the web UI and in the python API

Enhancements

Algorithms
  • GitHub: Made intercept option public and added it to field list in parameter schema
  • GitHub: GLM: Updated null model intercept fit.
  • GitHub: GLM: Updated null-model constant term fitting when running with offset
  • GitHub: glm update
  • GitHub: DL code refactoring to reduce file sizes
Python
  • GitHub: add h2o.round() and h2o.signif() and additional pyunit checks
  • GitHub: add h2o.all() and respective pyunit checks
R
  • GitHub: added intercept option to R
System
Web UI
  • GitHub: Add horizontal pagination of /Frames to handle UI navigation of wide datasets more efficiently.
  • GitHub: Only show the top 7 metrics for the max metrics table
  • GitHub: Make the max metrics table entries be called max f1 etc.

Bug Fixes

The following changes are to resolve incorrect software behavior:

Algorithms
  • PUBDEV-1365: GLM: Buggy when likelihood equals infinity GitHub
  • PUBDEV-1394: GLM: Some offsets hang
  • PUBDEV-1268: GLM: get java.lang.AssertionError at hex.glm.GLM$GLMSingleLambdaTsk.compute2 for attached data
  • PUBDEV-1382: pca: giving wrong std dev for mentioned data
  • PUBDEV-1383: pca: std dev numbers differ for v2 and v3 for attached data GitHub
  • PUBDEV-1381: GBM, RF: get an NPE when run with a validation set with no response GitHub
  • GitHub: GLM fix - fixed fitting of null model constant term
  • GitHub: Fix remote bug
  • GitHub: Remove elastic averaging parameters from Flow.
  • PUBDEV-1398: pca: predictions on the attached data from v2 and v3 differ
Python
R
  • PUBDEV-761: Save model and restore model (from R)
  • PUBDEV-1236: h2o-r/tests/testdir_misc/runit_mergecat.R failure (client mode only)
System
  • PUBDEV-1402: move Rapids to /99 since it's going to be in flux for a while GitHub
  • GitHub: Fixes an operator precedence issue, and replaces debug GA target with actual one.
  • GitHub: Fix log download bug where all nodes were getting the same zip file.

Shannon (3.0.0.18) - 6/9/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/18/index.html

New Features

System
Python
  • GitHub: Added --h2ojar option

Enhancements

Python
  • PUBDEV-277: Make python equivalent of as.h2o() work for numpy array and pandas arrays

Bug Fixes

Algorithms
  • PUBDEV-1371: pca: get java.lang.AssertionError at hex.svd.SVD$SVDDriver.compute2(SVD.java:198)
  • PUBDEV-1376: pca: predictions from h2o-3 and h2o-2 differs for attached data
  • PUBDEV-1380: DL: when try to access the training frame from the link in the dl model get: Object not found
R

Shannon (3.0.0.17) - 6/8/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/17/index.html

New Features

Algorithms
Python
  • PUBDEV-1270: Python Interface needs H2O Cut Function GitHub
  • PUBDEV-1242: Need equivalent of as.Date feature in Python GitHub
  • PUBDEV-1165: H2O Python needs Modulus Operations
  • HEXDEV-29: The ability to define features as categorical or continuous in the web UI and in the python API
  • PUBDEV-1237: environment variable to disable the strict version check in the R and Python bindings
Web UI
  • PUBDEV-1175: Flow: Good interactive confusion matrix for binomial
  • PUBDEV-1176: Flow: Good confusion matrix for multinomial

Enhancements

Algorithms
  • GitHub: GLM weights fix: regularize by sum of weights rather than number of observations
  • GitHub: GLM fix: added line search (and limited number of iterations) to constant term model fitting with offset (could enter infinite loop)
  • GitHub: No longer warn if binomial_double_trees option is enabled for _nclass!=2
  • GitHub: Fix CM table to have integer entries unless there are real-valued entries
  • GitHub: Add extra assertion for train_samples_per_iteration
  • GitHub: Update model during runtime of algorithm.
  • GitHub: Changes to glm forloop to add offsets and add NOPASS/NOFEATURE functionality back to run.py
R
  • GitHub: month was off by one, runit test edited
  • GitHub: Comments to clarify the policy on dates in H2O.
System
  • HEXDEV-344: Logs should include JVM launch parameters
Web UI
  • PUBDEV-467: Show Frames for DL weights/biases in Flow
  • PUBDEV-1221: add an "I like this" style button with LinkedIn or GitHub (beside the Flow Assist Me button)
  • PUBDEV-1245: Flow: use new _exclude_fields query parameter to speed up REST API usage

Bug Fixes

Algorithms
  • PUBDEV-1353: GLM: model with weights different in R than in H2o for attached data
  • PUBDEV-1358: GLM: when run with negative weights, would be good to tell the user that negative weights are not allowed instead of throwing an exception
  • PUBDEV-1264: GLM: reporting incorrect null deviance GitHub
  • PUBDEV-1362: GLM: when run with weights and offset get wrong ans
  • PUBDEV-1263: GLM: name ordering for the coefficients is incorrect GitHub
  • PUBDEV-1261: pca: wrong std dev for data with nas rest numeric cols GitHub
  • PUBDEV-1218: pca: progress bar not showing progress just the initial and final progress status GitHub
  • PUBDEV-1204: pca: from flow when try to invoke build model, displays-ERROR FETCHING INITIAL MODEL BUILDER STATE
  • PUBDEV-1212: pca: with enum column reporting (some junk) wrong stdev/ rotation GitHub
  • PUBDEV-1228: pca: no std dev getting reported for attached data
  • PUBDEV-1233: pca: std dev for attached data differ when run on h2o-3 and h2o-2
  • PUBDEV-1258: h2o.glm with offset column: get Error in .h2o.startModelJob(conn, algo, params) : Offset column 'logInsured' not found in the training frame.
R
Sparkling Water
System
  • PUBDEV-1288: Confusion Matrix: class 'java.lang.ArrayIndexOutOfBoundsException', with msg '2' java.lang.ArrayIndexOutOfBoundsException: 2 at hex.ConfusionMatrix.createConfusionMatrixHeader GitHub
  • HEXDEV-323: SVMLight Parse Bug GitHub
  • PUBDEV-1207: implement JSON field-filtering features: _exclude_fields
  • GitHub: Fix a missing field update in Job.
  • PUBDEV-65: Handling of strings columns in summary is broken
  • PUBDEV-1230: Parse: get AIOOB when parses the attached file with first two cols as enum while h2o-2 does fine
  • PUBDEV-1377: Get AIOOBE when parsing a file with fewer column names than columns GitHub
  • PUBDEV-1364: Variable importance Object
Web UI
  • PUBDEV-1198: Flow: Selecting "Cancel" for "Load Notebook" prompt clears current notebook anyway
  • PUBDEV-1172: Model builder takes forever to load the column names in Flow, hence cannot build any models
  • PUBDEV-1248: Flow GLM: from Flow the drop down with column names does not show up and hence not able to select the offset column
  • PUBDEV-1380: DL: when try to access the training frame from the link in the dl model get: Object not found GitHub

Shannon (3.0.0.13) - 5/30/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/13/index.html

New Features

Algorithms
Python
R

Enhancements

Algorithms
API
  • PUBDEV-669: have the /Frames/{key}/summary API call Vec.startRollupStats
R/Python
  • PUBDEV-479: Port MissingInserter to R/Python
  • PUBDEV-632: Display TwoDimTable of HitRatios in R/Python
  • github: minor change to h2o.demo()
  • github: add h2o.demo() facility to python package, along with some built-in (small) data
  • github: remove cols param

Bug Fixes

Algorithms
  • PUBDEV-1211: pca: descaled pca, std dev seems to be wrong for attached data github
  • PUBDEV-1213: pca: would be good to have the std dev numbered because it is difficult to relate to the principal components (github)
  • PUBDEV-1201: pca: get ArrayIndexOutOfBoundsException (github)
  • PUBDEV-1203: pca: giving wrong std dev/rotation-labels for iris with species as enum (github)
  • PUBDEV-1199: DL with <1 epochs has wrong initial estimated time (github)
  • github: Fix missing AUC for training data in DL.
  • github: Add the seed back to GBM imbalanced test (was set to 0 by default before, now explicit)
R
  • PUBDEV-1189: R: h2o.hist broken for breaks that is a list of the break intervals (github)
  • PUBDEV-1206: Frame summary from R and Python need to use the Frame summary endpoint (github)
  • PUBDEV-1177: R summary() is slow when large number of columns
  • PUBDEV-1097: R: R should be able to take a list of paths similar to how python does

Shannon (3.0.0.11) - 5/22/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/11/index.html

Enhancements

Algorithms
  • PUBDEV-1179: DRF: investigate if larger seeds giving better models
  • PUBDEV-1178: Add logloss/AUC/Error to GBM/DRF Logs & ScoringHistory
  • PUBDEV-1169: Use only 1 tree for DRF binomial (github)
  • PUBDEV-1170: Wrong ROC is shown for DRF (Training ROC, even though Validation is given)
  • PUBDEV-1162: Speed up sorting of histograms with O(N log N) instead of O(N^2)
System

Bug Fixes

Algorithms
  • HEXDEV-253: model output consistency
  • HEXDEV-319: DRF in h2o 3.0 is worse than in h2o 2.0 for Airline
  • PUBDEV-1180: DRF has wrong training metrics when validation is given
API
  • PUBDEV-501: H2OPredict: does not complain when you build a model with one dataset and predict on completely different dataset
Python
  • PUBDEV-1183: Python version check should fail hard by default
  • PUBDEV-1185: Python binding version mismatch check should fail hard and be on by default
  • HEXDEV-138: Port Python tests for Deep Learning

R
  • PUBDEV-1160: R: h2o.hist doesn't support breaks argument
  • PUBDEV-1159: R: h2o.hist takes too long to run
  • PUBDEV-1150: R CMD Check: URLs not working
  • PUBDEV-1149: R CMD check not happy with our use of .OnAttach
  • PUBDEV-1174: R: h2o.hist FD implementation broken
  • PUBDEV-1167: R: h2o.group_by broken
  • HEXDEV-318: the fix to H2O startup for the host unreachable from R causes a security hole
  • PUBDEV-1187: FramesHandler.summary() needs to run summary on all Vecs concurrently.
System
  • PUBDEV-862: Building a model without training file -> NPE
  • HEXDEV-315: importFile fails: Error in fromJSON(txt, ...) : unexpected character: A
  • PUBDEV-1137: Parse: upload and import gives different chunk compression on the same file
  • PUBDEV-1054: Parse: h2o parses arff file incorrectly
  • PUBDEV-1181: Rapids should queue and block on the back-end to prevent overlapping calls
  • PUBDEV-1184: importFile fails for paths containing spaces
Web UI
  • PUBDEV-1182: Flow: when upload file fails, the control does not come back to the flow screen, and have to refresh the whole page to get it back
  • PUBDEV-1131: GBM crashes after calling getJobs in Flow

Shannon (3.0.0.7) - 5/18/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/7/index.html

Enhancements

API
  • PUBDEV-711: take a final look at all REST API parameter names and help strings
  • PUBDEV-757: Rename DocsV1 + DocsHandler to MetadataV1 + MetadataHandler
  • PUBDEV-1138: Performance improvements for big data sets => getModels
  • PUBDEV-1126: Performance improvements for big data sets => Get frame summary
System
  • HEXDEV-316: ImportFiles should not download files from HTTP
Web UI

Bug Fixes

The following changes are to resolve incorrect software behavior:

API
  • PUBDEV-501: H2OPredict: does not complain when you build a model with one dataset and predict on completely different dataset
  • PUBDEV-1047: API : Get frames and Build model => takes long time to get frames
  • HEXDEV-149: Allow JobsV3 to return properly typed jobs, not always instances of JobV3
  • PUBDEV-1036: rename straggler V2 schemas to V3
R
System
  • PUBDEV-1034: Windows 7/8/2012 Multicast Error UDP
  • PUBDEV-862: Building a model without training file -> NPE
  • HEXDEV-253: model output consistency
  • PUBDEV-1135: While predicting get:class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.ArrayIndexOutOfBoundsException: 5
  • PUBDEV-1090: POJO: Models with "." in key name (ex. pros.glm) can't access pojo endpoint
  • PUBDEV-1077: Getting an IcedHashMap warning from H2O startup
Web UI
  • PUBDEV-1133: getModels in Flow returns error
  • PUBDEV-926: Flow: When user hits build model without specifying the training frame, it would be good if Flow guides the user. It presently shows an NPE msg
  • PUBDEV-1131: GBM crashes after calling getJobs in Flow

Shannon (3.0.0.2) - 5/15/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/2/index.html

New Features

ModelMetrics
WebUI
  • PUBDEV-942: ModelMetrics by model category - Autoencoder

Enhancements

Algorithms
  • github: GLM update: skip lambda max during lambda search
  • github: removed higher accuracy option
  • github: Rename constant col parameter
  • github: GLM update: added stopping criteria to lbfgs, tweaked some internal constants in ADMM
  • github: Add support for ignore_const_col in DL
Python
  • PUBDEV-852: Binomial: show per-metric-optimal CM and per-threshold CM in Python
  • github: add filterNACols to python
  • github: h2o.delete replaced with h2o.removeFrameShallow
  • github: Add distribution summary to Python
R
  • github: add filterNACols to R
  • github: explicitly set cols=TRUE for R style str on frames
  • github: enable faster str, bulk nlevels, bulk levels, bulk is.factor
  • github: Add optional blocking parameter to h2o.uploadFile
System
  • PUBDEV-672: HTML version of the REST API docs should be available on the website
  • PUBDEV-827: class GenModel duplicates part of code of Model
Web UI
  • HEXDEV-181: Flow: Handle deep features prediction input and output
  • github: removed use_all_factor_levels from glm flows

Bug Fixes

Algorithms
  • HEXDEV-302: AIOOBE during Prediction with DL github
  • github: glm fix: don't force in null model for lambda search with user given list of lambdas
  • github: Fix domain in glm scoring output for binomial
  • github: GLM Fix - fix degrees of freedom when running without intercept (+/-1)
  • github: GLM fix: make valid data info be clone of train data info (needs exactly the same categorical offsets, ignore unseen levels)
  • github: Fix glm scoring, fill in default domain {0,1} for binary columns when scoring
R
  • PUBDEV-1116: R: Parse that works from flow doesn't work from R using as.h2o
  • PUBDEV-798: R: String Munging Functions Missing
  • PUBDEV-584: R: hist() doesn't currently work for H2O objects
  • PUBDEV-820: H2oR: model objects should return the CM when run classification like h2o1
  • PUBDEV-1113: Remove Keys : Parse => Remove => doesn't complete
  • PUBDEV-1102: R: h2o.rbind fails to join two dataset together
  • PUBDEV-899: R: all doesn't work
  • PUBDEV-555: H2O-R: str does not work
  • PUBDEV-1110: H2OR: while printing a gbm model object, get invalid format '%d'; use format %f, %e, %g or %a for numeric objects
  • PUBDEV-903: R: Errors from some rapids calls seem to fail to return an error
  • HEXDEV-311: Performance bug from R with Expect: 100-continue
  • PUBDEV-1030: h2o.performance: ignores the user specified threshold
  • PUBDEV-1071: R: regression models don't show in print statement r2 but it exists in the model object
  • PUBDEV-1072: R: missing accessors for glm specific fields
  • PUBDEV-1032: After running some R and py demos when invoke a build model from flow get- rollup stats problem vec deleted error
  • PUBDEV-1069: R: missing implementation for h2o.r2
  • PUBDEV-1064: Passing sep="," to h2o.importFile() fails with '400 Bad Request'
  • PUBDEV-1092: Get NPE while predicting
System
  • PUBDEV-1091: S3 gzip parse failure
  • PUBDEV-1081: Probably want to cleanly disable multicast (not retry) and print suggestion message, if multicast not supported on picked multicast network interface
  • PUBDEV-1112: User has no way to specify whether to drop constant columns
  • PUBDEV-1109: Change all extdata imports to uploadFile
  • PUBDEV-1104: .gz file parse exception from local filesystem
Web UI
  • PUBDEV-1134: getPredictions in Flow returns error
  • PUBDEV-1020: Flow : Drop NA Cols enable => Should automatically populate the ignored columns
  • PUBDEV-1041: Flow GLM: formatting needed for the model parameter listing in the model object github
  • PUBDEV-1108: Flow: When predict on data with no response get :Error processing POST /3/Predictions/models/gbm-a179db76-ba96-420f-a643-0e166aea3af3/frames/subset_1 'undefined' is not an object (evaluating 'prediction.model')

H2O-Dev

Shackleford (0.2.3.6) - 5/8/15

Download at: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-shackleford/6/index.html

New Features

Python

Sparkling Water

  • Publish h2o-scala and h2o-app latest version to maven central (PUBDEV-443)

Enhancements

Algorithms
  • Use AUC's default threshold for label-making for binomial classifiers predict() (PUBDEV-1063) (github)
  • GLM update (github)
  • Cleanup AUC2, make incremental version (github)
  • Name change: override_with_best_model -> overwrite_with_best_model (github)
  • Couple of GLM updates (github)
  • Disable _replicate_training_data for data that's larger than 10GB (github)
  • Added replicate_training_data param for DL (github)
  • Change a few kmeans output parameters so no longer dividing by nrows or num_clusters (github)
  • GLMValidation Updated auc computation (github)
  • Do not delete model metrics at end of GBM/DRF (github)
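The first Algorithms item above makes predict() for binomial classifiers use AUC's default threshold, which another entry in these notes identifies as max F1. As a rough illustration only, and not H2O's actual code, the Python sketch below picks the threshold that maximizes F1 over a set of scored examples. The function name `max_f1_threshold` and the sample data are hypothetical.

```python
def max_f1_threshold(probs, labels):
    """Return (threshold, f1) maximizing F1 when predicting 1 for prob >= threshold."""
    best_t, best_f1 = None, -1.0
    # Every distinct predicted probability is a candidate threshold.
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all.
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical scores and true labels, for illustration only.
probs = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
threshold, f1 = max_f1_threshold(probs, labels)
```

Ties are resolved in favor of the lowest threshold here; that is a choice made for this sketch, not something the release notes specify.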
API
  • Clean REST api for Parse (PUBDEV-993)
  • Removes is_valid, invalid_lines, and domains from REST api (github)
  • Annotate domains output field as expert level (github)
Python
R
  • Cleaner client POJO download for R (PUBDEV-907)
  • Implement h2o.interaction() (PUBDEV-854) (github)
  • R: h2o.impute missing (PUBDEV-796)
  • validation_frame is passed through to h2o (github)
  • Adding GBM accessor function runits (github)
  • Adding changes to h2o.hit_ratio_table to be like other accessors (i.e., no train) (github)
  • add h2o.getPOJO to R, fix impute ast build in python (github)
System
  • Change NA strings to an array in ParseSetup (PUBDEV-995)
  • Document way of passing S3 credentials for S3N (PUBDEV-947)
  • Add H2O-dev doc on docs.h2o.ai via a new structure (proposed below) (PUBDEV-355)
  • Rapids Ref Doc (PUBDEV-667)
  • Show Timestamp and Duration for all model scoring histories (PUBDEV-1018) (github)
  • Logs slow reads, mainly meant for noting slow S3 reads (github)
  • Make prediction frame column names non-integer (github)
  • Add String[] factor_columns instead of int[] factors (github)
  • change the runtime exception to a Log.info() if interface doesn't support multicast (github)
  • More robust way to copy Flow files to web root per Prithvi (github)
  • Switches na_string from a single value per column to an array per column (github)
Web UI

Bug Fixes

Algorithms
  • H2O cloud shuts down with some H2O.fail error, while building some kmeans clusters (PUBDEV-1051) (github)
  • GLM:beta constraint does not seem to be working (PUBDEV-1083)
  • GBM - random attack bug (probably because max_after_balance_size is really small) (PUBDEV-1061) (github)
  • GLM: LBFGS objval java lang assertion error (PUBDEV-1042) (github)
  • PCA Cholesky NPE (PUBDEV-921)
  • GBM: H2o returns just 5525 trees, when ask for a much larger number of trees (PUBDEV-860)
  • CM returned by AUC2 doesn't agree with manual-made labels from F1-optimal threshold (HEXDEV-263)
  • AUC: h2o reporting wrong auc on a modified covtype data (PUBDEV-891)
  • GLM: Build model => Predict => Residual deviance/Null deviance different from training/validation metrics (PUBDEV-991)
  • KMeans metrics incomplete (PUBDEV-1029)
  • GLM: Java Assertion Error (PUBDEV-1025)
  • Random forest bug (PUBDEV-1015)
  • A particular random forest model has an empty (training) metric json max_criteria_and_metric_scores (PUBDEV-1001)
  • PCA results exhibit numerical inaccuracies compared to R (PUBDEV-550)
  • DRF: reporting wrong depth for attached dataset (PUBDEV-1006)
  • added missing "names" column name to beta constraints processing (github)
  • Fix balance_classes probability correction consistency between H2O and POJO (github)
  • Fix in GLM scoring - check actual for NaNs as well (github)
Python
  • Cannot import_file path=url python interface (PUBDEV-1059)
  • head()/tail() should show labels, rather than number encoding, for enum columns (PUBDEV-1017)
  • h2o.py: for binary response printing transpose and hence wrong cm (PUBDEV-1013)
R
  • Broken Summary in R (PUBDEV-1073)
  • h2oR summary: displaying no labels in summary (PUBDEV-1008)
  • R/Python impute bugs (PUBDEV-1055)
  • R: h2o.varimp doubles the print statement (PUBDEV-1068)
  • R: h2o.varimp returns NULL when model has no variable importance (PUBDEV-1078)
  • h2oR: h2o.confusionMatrix(my_gbm, validation=F) should not show a null (PUBDEV-849)
  • h2o.impute doesn't impute (PUBDEV-1024)
  • R: as.h2o cutting entries when trying to import data.frame into H2O (HEXDEV-293)
  • The default names are too long, for an R-datafile parsed to H2O, and needs to be changed (PUBDEV-976)
  • H2o.confusionMatrix: when invoked with threshold gives error (PUBDEV-1010)
  • removing train and adding error messages for valid = TRUE when there's not validation metrics (github)
System
  • Download logs is returning the same log file bundle for every node (PUBDEV-1056)
  • ParseSetup is useless and misleading for SVMLight (PUBDEV-994)
  • Fixes bug that was short circuiting the setting of column names (github)
Web UI

Shackleford (0.2.3.5) - 5/1/15

Download at: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-shackleford/5/index.html

New Features

API
  • Need a /Log REST API to log client-side errors to H2O's log (HEXDEV-291)

Python

  • add impute to python interface (github)
System

Enhancements

Algorithms
  • GLM: Name to be changed from normalized to standardized in output to be consistent between input/output (PUBDEV-954)
  • GLM: It would be really useful if the coefficient magnitudes are reported in descending order (PUBDEV-923)
  • PUBDEV-536: Limit DL models to 100M parameters (github)
  • PUBDEV-536: Add accurate memory-based admission control for GBM/DRF (github)
  • relax the tolerance a little more...(github)
  • Tree depth correction (github)
  • Comment out duration_in_ms for now, as it's always left at 0 (github)
  • Updated min mem computation for glm (github)
  • GLM update: added lambda search info to scoring history (github)
Python
  • python .show() on model and metric objects should match R/Flow as much as possible (HEXDEV-289)
  • GLM model output, details from Python (HEXDEV-95)
  • GBM model output, details from Python (HEXDEV-102)
  • Run GBM from Python (HEXDEV-99)
  • map domain to result from /Frames if needed (github)
  • added confusion matrix to metric output (github)
  • update metrics_base_confusion_matrices() (github)
  • fetch out string_data if type is string (github)
R
System
Web UI
  • Flow: Confusion matrix: good to have consistency in the column and row name (letter) case (PUBDEV-971)
  • Run GBM Multinomial from Flow (HEXDEV-111)
  • Run GBM Regression from Flow (HEXDEV-112)
  • Sort model types in alphabetical order in Flow (PUBDEV-1011)

Bug Fixes

The following changes are to resolve incorrect software behavior:

Algorithms
  • GLM: Model output display issues (PUBDEV-956)
  • h2o.glm: ignores validation set (PUBDEV-958)
  • DRF: reports wrong number of leaves in a summary (PUBDEV-930)
  • h2o.glm: summary of a prediction frame gives na's as labels (PUBDEV-959)
  • GBM: reports wrong max depth for a binary model on german data (PUBDEV-839)
  • GLM: Confusion matrix missing in R for binomial models (PUBDEV-950) (github)
  • GLM: On airlines(40g) get ArrayIndexOutOfBoundsException (PUBDEV-967)
  • GLM: Build model => Predict => Residual deviance/Null deviance different from training/validation metrics (PUBDEV-991)
  • Domains returned by GLM for binomial classification problem are integers, but should be mapped to their label (PUBDEV-999)
  • GLM: Validation on non training data gives NaN Res Deviance and AIC (PUBDEV-1005)
  • Confusion matrix has nan's in it (PUBDEV-1000)
  • glm fix: pass model_id from R (was being dropped) (github)
Python
R
  • h2o.confusionMatrix for binary response gives not-found thresholds (PUBDEV-957)
  • GLM: model_id param is ignored in R (PUBDEV-1007)
  • h2o.confusionmatrix: mixing cases(letter) for categorical labels while printing multinomial cm (PUBDEV-996)
  • fix the dupe thresholds error (github)
  • extra arg in impute example (github)
  • fix missing param data (github)
System
  • Builds : Failing intermittently due to java.lang.StackOverflowError (PUBDEV-972)
  • Get H2O cloud hang with NPE and roll up stats problem, when click on build model glm from flow, on laptop after running a few python demos and R scripts (PUBDEV-963)
Web UI
  • Flow :=> Airlines dataset => Build models glm/gbm/dl => water.DException$DistributedException: from /172.16.2.183:54321; by class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.NullPointerException: null (PUBDEV-603)
  • Flow => Preview Pojo => collapse not working (PUBDEV-977)
  • Flow => Any algorithm => Select response => Select Add all for ignored columns => Try to unselect some from ignored columns => Build => Response column IsDepDelayed not found in frame: allyears_1987_2013.hex. (PUBDEV-978)
  • Flow => ROC curve select something on graph => Table is displayed for selection => Collapse ROC curve => Doesn't collapse table, collapses only graph (PUBDEV-1003)

Severi (0.2.2.16) - 4/29/15

New Features

Python

Enhancements

Algorithms
  • Use partial-sum version of mat-vec for DL POJO (PUBDEV-936)
  • Always store weights and biases for DLTest Junit (github)
  • Show the DL model size in the model summary (github)
  • Remove assertion in hot loop (github)
  • Rename ADMM to IRLSM (github)
  • Added no intercept option to glm (github)
  • Code cleanup. Moved ModelMetricsPCAV3 out of H2O-algos (github)
  • Improve DL model checkpoint logic (github)
  • Updated glm output (github)
  • Renamed normalized coefficients to standardized coefficients in glm output (github)
  • Use proper tie breaking for NB (github)
  • Add check that DL parameters aren't modified by model training (github)
  • Reduce tolerances (github)
  • If no observations of a response level and prediction is numeric, assume it is drawn from standard normal distribution (mean 0, standard deviation 1). Add validation test with split frame for naive Bayes (github)
Python
  • replaced H2OFrame.send_frame() calls with cbind Exprs so that lazy evaluation is enforced (github)
  • change default xmx/s behavior of h2o.init() (github)
  • better handling of single row return and print (github)
R
  • Added interpolation to quantile to match R type 7 (github)
  • Removed and tidied if's in quantile.H2OFrame since it now uses match.arg (github)
  • Connected validation dataset to glm in R (github)
  • Removing h2o.aic from seealso link (doesn't exist) and updating documentation (github)
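The first R item above mentions interpolation to match R's type 7 quantile. As a minimal sketch of that definition (linear interpolation between order statistics at position (n - 1) * p), and not the H2O implementation, the following Python function may help when comparing results by hand. The name `quantile_type7` is hypothetical.

```python
import math


def quantile_type7(values, p):
    """Type 7 quantile: interpolate linearly at 0-based position (n - 1) * p."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(values)
    h = (len(xs) - 1) * p             # fractional index into the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)     # clamp so p = 1 stays in range
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


median = quantile_type7([4, 1, 3, 2], 0.5)
```

For the four values above the position is 1.5, so the result falls halfway between the second and third order statistics.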
System
  • Add number of rows (per node) to ChunkSummary (PUBDEV-938) (github)
  • allow nrow as alias for count in groupby (github)
  • Only launches task to fill in SVM zeros if the file is SVM (github)
  • Adds more log traces to track progress of post-ingest actions (github)
  • Adds svm as a file extension to the hex name cleanup (github)
Web UI
  • Flow: Inspect data => Round decimal points to 1 to be consistent with h2o1 (PUBDEV-453)
  • Setup POJO download method for Flow (PUBDEV-909)
  • Pretty-print POJO preview in flow (PUBDEV-940)
  • Flow: It would be good if 'get predictions' also shows the data (PUBDEV-883)
  • GBM model output, details in Flow (HEXDEV-103)
  • Display a linked data table for each visualization in Flow (PUBDEV-318)
  • Run GBM binomial from Flow (needs proper CM) (PUBDEV-943)

Bug Fixes

Algorithms
  • GLM: results from model and prediction on the same dataset do not match (PUBDEV-922)
  • GLM: when select AUTO as solver, for prostate, glm gives all zero coefficients (PUBDEV-916)
  • Large (DL) models cause oversize issues during serialization (PUBDEV-941)
  • Fixed name change for ADMM (github)
API
Python
  • H2OVec.row_select(H2OVec) fails on case where only 1 row is selected (PUBDEV-948)
  • fix pyunit (github)
R
  • R: Parse of zip file fails, Summary fails on citibike data (PUBDEV-835)
  • h2o.performance reports a different Null Deviance than the model object for the same dataset (PUBDEV-816)
  • h2o.glm: no example on h2o.glm help page (PUBDEV-962)
  • H2O R: Confusion matrices from R still confused (PUBDEV-904) (github)
  • R: h2o.confusionMatrix("H2OModel", ...) extra parameters not working (PUBDEV-953) (github)
  • h2o.confusionMatrix for binomial gives not-found thresholds on S3 -airlines 43g (PUBDEV-957)
  • H2O summary quartiles outside tolerance of (max-min)/1000 (PUBDEV-671)
  • fix space headers issue from R (was not url-encoding the column strings) (github)
  • R CMD fixes (github)
  • Fixed broken R interface - make validation_frame non-mandatory (github)
Sparkling Water
  • Sparkling water : #UDP-Recv ERRR: UDP Receiver error on port 54322 java.lang.ArrayIndexOutOfBoundsException: (PUBDEV-311)
System
  • Mapr 3.1.1 : Memory is not being allocated for what is asked for instead the default is what cluster gets (PUBDEV-937)
  • GLM: AIOOB with msg '-14' at water.RPC$2.compute2(RPC.java:593) (PUBDEV-917)
  • h2o.glm: model summary listing same info twice (PUBDEV-915)
  • Parse: Detect and reject UTF-16 encoded files (HEXDEV-285)
  • DataInfo Row categorical encoding AIOOBE (HEXDEV-283)
  • Fix POJO Preview exception (github)
  • Fix NPE in ChunkSummary (github)
  • fix global name collision (github)

Severi (0.2.2.15) - 4/25/15

Download at: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-severi/15/index.html

New Features

Python
  • added min, max, sum, median for H2OVecs and respective pyunit (github)
  • added min(), max(), and sum() functionality on H2OFrames and respective pyunits (github)
Web UI

Enhancements

Algorithms
  • K means output clean up (HEXDEV-187)
  • Add FNR/TNR/FPR/TPR to threshold tables, remove recall, specificity (github)
  • Add accessor for variable importances for DL (github)
  • Relax CM error tolerance for F1-optimal threshold now that AUC2 doesn't necessarily create consistent thresholds with its own CMs. (github)
  • Added scoring history to glm (github)
  • Added model summary to glm (github)
  • Add flag to support reading data from S3N (github)
  • Added degrees of freedom to GLM metrics schemas (github)
  • Allow DL scoring_history to be unlimited in length (github)
  • add plotting for binomial models (github)
  • Ignore certain parameters that are not applicable (class balancing, max CM size, etc.) (github)
  • Updated glm scoring, fill training/validation metrics in model output (github)
  • Rename gbm loss parameter to distribution (github)
  • Fix GBM naming: loss -> distribution (github)
  • GLM LBFGS update (github)
  • na.rm for quantile is default behavior (github)
  • GLM update: enabled max_predictors in REST, updated lbfgs (github)
  • Remove keep_cross_validation_splits for now from DL (github)
  • Get rid of sigma in the model metrics, instead show r2 (github)
  • Don't show score_every_iteration for DL (github)
  • Don't print too large confusion matrices in Tree models (github)
API
Python
  • Python client should check that version number == server version number (PUBDEV-799)
  • Add asfactor for month (github)
  • in Expr.show() only show 10 or less rows. remove locate from runit test because full path used (github)
  • change nulls to () (github)
  • sigma is no longer part of ModelMetricsRegressionV3 (github)
R
System
  • Port MissingValueInserter EndPoint to h2o-dev. (PUBDEV-465)
  • Rapids: require a (put "key" %frame) (PUBDEV-868)
  • Need pojo base model jar file embedded in h2o-dev via build process (PUBDEV-780) (github)
  • Make .json the default (PUBDEV-619) (github)
  • Rename class for clarification (github)
  • Classifies all NA columns as numeric. Also improves preview sampling accuracy by trimming partial lines at end of chunk. (github)
  • Implements sampling of files within the ParseSetup preview. This prevents poor column type guesses from only sampling the beginning of a file. (github)
  • Rename fields drop_na20_col (github)
  • allow for many deletes as final statements in a block (github)
  • rename initF -> init_f, dropNA20Cols -> drop_na20_cols (github)
  • Removed tweedie param (github)
  • thresholds -> threshold (github)
  • JSON of TwoDimTable with all null values in the first column (no row headers) now doesn't have an empty column of "" or nulls. (github)
  • move H2O_Load, fix all the timezone functions (github)
  • Add extra verbose printout in case Frames don't match identically (github)
  • allow delayed column lookup (github)
  • add mixed type list (github)
  • Added WaterMeterIo to count persist info (github)
  • Remove special setChunkSize code in HDFS and NFS file vec (github)
  • add check for Frame on string parse (github)
  • Disable Memory Cleaner (github)
  • Handle '<' chars in Keys when swapping (github)
  • allow for colnames in slicing (github)
  • Adjusts parse type detection. If column is all one string value, declare it an enum (github)
Web UI

Bug Fixes

Algorithms
  • GLM: lasso i.e alpha =1 seems to be giving wrong answers (PUBDEV-769)
  • AUC: h2o reports .5 auc when actual auc is 1 (PUBDEV-879)
  • h2o.glm: No output displayed for the model (PUBDEV-858)
  • h2o.glm model object output needs a fix (PUBDEV-815)
  • h2o.glm model object says : fill me in GLMModelOutputV2; I think I'm redundant [1] FALSE (PUBDEV-765)
  • GLM : Build GLM Model => Java Assertion error (PUBDEV-686)
  • GLM :=> Progress shows -100% (PUBDEV-861)
  • GBM: Negative sign missing in initF value for ad dataset (PUBDEV-880)
  • K-Means takes a validation set but doesn't use it (PUBDEV-826)
  • Absolute_MCC is NaN (sometimes) (PUBDEV-848) (github)
  • GBM: A proper error msg should be thrown when the user sets the max depth =0 (PUBDEV-838) (github)
  • DRF Regression Assertion Error (PUBDEV-824)
  • h2o.randomForest: if h2o is not returning the mse for the 0th tree then it should not be reported in the model object (PUBDEV-811)
  • GBM: Got exception class java.lang.AssertionError with msg null java.lang.AssertionError at hex.tree.gbm.GBM$GBMDriver$GammaPass.map (PUBDEV-693)
  • GBM: Got exception class java.lang.AssertionError with msg null java.lang.AssertionError at hex.ModelMetricsMultinomial$MetricBuildMultinomial.perRow (HEXDEV-248)
  • GBM get java.lang.AssertionError: Coldata 2199.0 out of range C17:5086.0-19733.0 step=57.214844 nbins=256 isInt=1 (HEXDEV-241)
  • GLM: glmnet objective function better than h2o.glm (PUBDEV-749)
  • GLM: get AIOOB:-36 at hex.glm.GLMTask$GLMIterationTask.postGlobal(GLMTask.java:733) (PUBDEV-894) (github)
  • Fixed glm behavior in case no rows are left after filtering out NAs (github)
  • Fix memory leak in validation scoring in K-Means (github)
API
  • API unification: DataFrame should be able to accept URI referencing file on local filesystem (PUBDEV-709) (github)
Python
R
System
  • MapR FS loads are too slow (PUBDEV-927)
  • ensure that HDFS works from Windows (PUBDEV-812)
  • Summary: on a time column throws,'null' is not an object (evaluating 'column.domain[level.index]') in Flow (PUBDEV-867)
  • Parse: An enum column gets parsed as int for the attached file (PUBDEV-606)
  • Parse => 40Mx1_uniques => class java.lang.RuntimeException (PUBDEV-729)
  • if there are fewer than 5 unique values in a dataset column, mins/maxs reports e+308 values (PUBDEV-150) (github)
  • Sparkling water - DataFrame[T_UUID] to SchemaRDD[StringType] (PUBDEV-771)
  • Sparkling water - DataFrame[T_NUM(Long)] to SchemaRDD[LongType] (PUBDEV-767)
  • Sparkling water - DataFrame[T_ENUM] to SchemaRDD[StringType] (PUBDEV-766)
  • Inconsistency in row and col slicing (HEXDEV-265) (github)
  • rep_len expects literal length only (HEXDEV-268) (github)
  • cbind and = don't work within a single rapids block (HEXDEV-237)
  • Rapids response for c(value) does not have frame key (HEXDEV-252)
  • S3 parse takes forever (PUBDEV-876)
  • Parse => Enum unification fails in multi-node parse (PUBDEV-718) (github)
  • All nodes are not getting updated with latest status of each other nodes info (PUBDEV-768)
  • Cluster creation is sometimes rejecting new nodes (post jenkins-master-1128+) (PUBDEV-807)
  • Parse => Multiple files 1 zip/ 1 csv gives Array index out of bounds (PUBDEV-840)
  • Parse => failed for X5MRows6KCols ==> OOM => Cluster dies (PUBDEV-836)
  • /frame/foo pagination weirded out (HEXDEV-277) (github)
  • Removed code that flipped enums to strings (github)
Web UI
  • Flow: It would be really useful to have the mse plots back in GBM (PUBDEV-889)
  • State change in Flow is not fully validated (PUBDEV-919)
  • Flows : Not able to load saved flows from hdfs (PUBDEV-872)
  • Save Function in Flow crashes (PUBDEV-791) (github)
  • Flow: should throw a proper error msg when user supplied response have more categories than algo can handle (PUBDEV-866)
  • Flow display of a summary of a column with all missing values fails. (HEXDEV-230)
  • Split frame UI improvements (HEXDEV-275)
  • Flow : Decimal point precisions to be consistent to 4 as in h2o1 (PUBDEV-844)
  • Flow: Prediction frame is outputing junk info (PUBDEV-825)
  • EC2 => Cluster of 16 nodes => Water Meter => shows blank page (PUBDEV-831)
  • Flow: Predict - "undefined is not an object (evaluating prediction.thresholds_and_metric_scores.name)" (PUBDEV-559)
  • Flow: inspect getModel for PCA returns error (PUBDEV-610)
  • Flow, RF: Can't get Predict results; "undefined is not an object (evaluating prediction.confusion_matrices.length)" (PUBDEV-695)
  • Flow, GBM: getModel is broken -Error processing GET /3/Models.json/gbm-b1641e2dc3-4bad-9f69-a5f4b67051ba null is not an object (evaluating source.length) (PUBDEV-800)

Severi (0.2.2.1) - 4/10/15

New Features

R

Enhancements

Algorithms
  • POJO generation: GBM (PUBDEV-713)
  • POJO generation: DRF (PUBDEV-714)
  • Compute and Display Hit Ratios (PUBDEV-630) (github)
  • Add DL POJO scoring (PUBDEV-585)
  • Allow validation dataset for AutoEncoder (PUBDEV-581)
  • PUBDEV-580: Add log loss to binomial and multinomial model metric (github)
  • Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
  • increase tolerance to 2e-3 (was 1e-3; failed with 0.001647 relative difference) (github)
  • change tolerance to 1e-3 (github)
  • Add option to export weights and biases to REST API / Flow. (github)
  • Add scree plot for H2O PCA models and fix Runit test. (github)
  • Remove quantiles from the model builders list. (github)
  • GLM update: added row filtering argument to line search task, fixed issues with dfork/asyncExec (github)
  • Updated rho-setting in GLM. (github)
  • No threshold 0.5; use the default (max F1) instead (github)
  • GLM update: updated initialization, NA row filtering, default lambda is now empty, will be picked based on the fraction of lambda_max. (github)
  • Updated ADMM solver. (github)
  • Added makeGLMModel call. (github)
  • Start with classification error NaN at t=0 for DL, not with 1. (github)
  • Relax DL POJO relative tolerance to 1e-2. (github)
  • Override nfeatures() method in DLModelOutput. (github)
  • Renaming of fields in GLM (github)
  • GLM: Take out Balance Classes (PUBDEV-795)
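One entry in the list above adds log loss to the binomial and multinomial model metrics (PUBDEV-580). As a reminder of what that metric measures, here is a minimal Python sketch of the standard definition: the mean negative log of the probability assigned to the true class. This is not H2O's code; the function name `log_loss` and the clipping constant `eps` are assumptions made for this example.

```python
import math


def log_loss(labels, prob_rows, eps=1e-15):
    """Mean negative log-probability of the true class.

    labels    -- true class indices (0 or 1 for the binomial case)
    prob_rows -- one list of per-class probabilities per example
    eps       -- assumed lower clip to avoid log(0); not taken from the release notes
    """
    total = 0.0
    for y, row in zip(labels, prob_rows):
        p = min(max(row[y], eps), 1.0)   # clip the true-class probability
        total -= math.log(p)
    return total / len(labels)


# Two examples where the model is maximally uncertain between two classes.
uninformed = log_loss([0, 1], [[0.5, 0.5], [0.5, 0.5]])
```

The binomial case is the two-class instance of the same formula, so a single function covers both metrics named in the entry.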
API
  • schema metadata for Map fields should include the key and value types (PUBDEV-753) (github)
  • schema metadata should include the superclass (PUBDEV-754)
  • rest api naming convention: n_folds vs ntrees (PUBDEV-737)
  • Create REST Endpoint for exposing .java pojo models (PUBDEV-778)
Python
  • Run GLM from Python (including LBFGS) (HEXDEV-92)
  • added H2OFrame show(), as_list(), and slicing pyunits (github)
  • changed solver parameter to "L_BFGS" (github)
  • added multidimensional slicing of H2OFrames and Exprs. (github)
  • add h2o.groupby to python interface (github)
  • added H2OModel.confusionMatrix() to return confusion matrix of a prediction (github)
R
  • PUBDEV-578, PUBDEV-541, PUBDEV-566: R client now sends the data frame column names and data types to ParseSetup; R client can get column names from a parsed frame or a list; respects client request for column data types (github)
  • R: Cannot create new columns through R (PUBDEV-571)
  • H2O-R: it would be more useful if h2o.confusion matrix reports the actual class labels instead of [,1] and [,2] (PUBDEV-553)
  • Support both multinomial and binomial CM (github)
System
  • Flow: Standardize max_iters/max_iterations parameters (PUBDEV-447) (github)
  • Add ERROR logging level for too-many-retries case (PUBDEV-146) (github)
  • Simplify checking of cluster health. Just report the status immediately. (github)
  • reduce timeout (github)
  • strings can have ' or " beginning (github)
  • Throw a validation error in flow if any training data cols are non-numeric (github)
  • Add getHdfsHomeDirectory(). (github)
  • Added --verbose. (github)
Web UI
  • PUBDEV-707: nice algo names in the Flow dropdown (full word names) (github)
  • Unbreak Flow's ConfusionMatrix display. (github)
  • POJO generation: DL (PUBDEV-715)

Bug Fixes

Algorithms
  • GLM : Build GLM model with nfolds brings down the cloud => FATAL: unimplemented (PUBDEV-731) (github)
  • DL : Build DL Model => FATAL: unimplemented: n_folds >= 2 is not (yet) implemented => SHUTSDOWN CLOUD (PUBDEV-727) (github)
  • GBM => Build GBM model => No enum constant hex.tree.gbm.GBMModel.GBMParameters.Family.AUTO (PUBDEV-723)
  • GBM: When run with loss = auto with a numeric column get- error :No enum constant hex.tree.gbm.GBMModel.GBMParameters.Family.AUTO (PUBDEV-708) (github)
  • gbm: does not complain when min_row >dataset size (PUBDEV-694) (github)
  • GLM: reports wrong residual degrees of freedom (PUBDEV-668)
  • H2O dev reports less accurate aucs than H2O (PUBDEV-602)
  • GLM : Build GLM model fails => ArrayIndexOutOfBoundsException (PUBDEV-601)
  • divide by zero in modelmetrics for deep learning (PUBDEV-568)
  • GBM: reports 0th tree mse value for the validation set, different from the train set, when only a training set is provided (PUBDEV-561)
  • GBM: Initial mse in bernoulli seems to be off (PUBDEV-515)
  • GLM : Build Model fails with Array Index Out of Bound exception (PUBDEV-454) (github)
  • Custom Functions don't work in apply() in R (PUBDEV-436)
  • GLM failure: got NaNs and/or Infs in beta on airlines (PUBDEV-362)
  • MetricBuilderMultinomial.perRow AssertionError while running GBM (HEXDEV-240)
  • Problems during Train/Test adaptation between Enum/Numeric (HEXDEV-229)
  • DRF/GBM balance_classes=True throws unimplemented exception (HEXDEV-226) (github)
  • AUC reported on training data is 0, but should be 1 (HEXDEV-223) (github)
  • glm pyunit intermittent failure (HEXDEV-199)
  • Inconsistency in GBM results:Gives different results even when run with the same set of params (HEXDEV-194)
  • get rid of nfolds= param since it's not supported in GLM yet (github)
  • Fixed degrees of freedom (off by 1) in glm, added test. (github)
  • GLM fix: fix filtering of rows with NAs and fix in sparse handling. (github)
  • Fix GLM job fail path to call Job.fail(). (github)
  • Full AUC computation, bug fixes (github)
  • Fix ADMM for upper/lower bounds. (updated rho settings + update u-vector in ADMM for intercept) (github)
  • Few glm fixes (github)
  • DL : KDD Algebra data set => Build DL model => ArrayIndexOutOfBoundsException (PUBDEV-696)
  • GBM: Dev vs H2O for depth 5, minrow=10, on prostate, give different trees (PUBDEV-759)
  • GBM param min_rows doesn't throw exception for negative values (PUBDEV-697)
  • GBM : Build GBM Model => Too many levels in response column! (java.lang.IllegalArgumentException) => Should display proper error message (PUBDEV-698)
  • GBM: Got exception 'class java.lang.AssertionError', with msg 'Something is wrong with GBM trees since returned prediction is Infinity' (PUBDEV-722)
API
  • Cannot adapt numeric response to factors made from numbers (PUBDEV-620)
  • not specifying response_column gets NPE (deep learning build_model()) I think other algos might have same thing (PUBDEV-131)
  • NPE response has null msg, exception_msg and dev_msg (HEXDEV-225)
  • Flow :=> Save Flow => On Mac and Windows 8.1 => NodePersistentStorage failure while attempting to overwrite (?) a flow (HEXDEV-202) (github)
  • the can_build field in ModelBuilderSchema needs values[] to be set (PUBDEV-755)
  • value field in the field metadata isn't getting serialized as its native type (PUBDEV-756)
Python
R
System
  • key type failure should fail the request, not the cloud (PUBDEV-739) (github)
  • Parse => Import Medicare supplier file => Parse = > Illegal argument for field: column_names of schema: ParseV2: string and key arrays' values must be quoted, but the client sent: " (PUBDEV-719)
  • Overwriting a constant vector with strings fails (PUBDEV-702)
  • H2O - gets stuck while calculating quantile, no error msg, just keeps running a job that normally takes less than a sec (PUBDEV-685)
  • Summary and quantile on a column with all missing values should not throw an exception (PUBDEV-673) (github)
  • View Logs => class java.lang.RuntimeException: java.lang.IllegalArgumentException: File /home2/hdp/yarn/usercache/neeraja/appcache/application_1427144101512_0039/h2ologs/h2o_172.16.2.185_54321-3-info.log does not exist (PUBDEV-600)
  • Parse: After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)
  • Parse: Numbers completely parsed wrong (PUBDEV-574)
  • Flow: converting a column to enum while parsing does not work (PUBDEV-566)
  • Parse: Fail gracefully when asked to parse a zip file with different files in it (PUBDEV-540) (github)
  • toDataFrame doesn't support sequence format schema (array, vectorUDT) (PUBDEV-457)
  • Parse : Parsing random crap gives java.lang.ArrayIndexOutOfBoundsException: 13 (PUBDEV-428)
  • The quote stripper for column names should report when the stripped chars are not the expected quotes (PUBDEV-424)
  • import directory with large files,then Frames..really slow and disk grinds. Files are unparsed. Shouldn't be grinding (PUBDEV-98)
  • NodePersistentStorage gets wiped out when hadoop cluster is restarted (HEXDEV-185)
  • h2o.exec won't be supported (github)
  • fixed import issue (github)
  • fixed init param (github)
  • fix repeat as.factor NPE (github)
  • startH2O set to False in init (github)
  • hang on glm job removal (PUBDEV-726)
  • Flow - changed column types need to be reflected in parsed data (HEXDEV-189)
  • water.DException$DistributedException while running kmeans in multinode cluster (PUBDEV-691)
  • Frame inspection prior to file parsing, corrupts parsing (PUBDEV-425)
Web UI
  • Flow, DL: Need better fail message if "Autoencoder" and "use_all_factor_levels" are both selected (PUBDEV-724)
  • When select AUTO while building a gbm model get ERROR FETCHING INITIAL MODEL BUILDER STATE (PUBDEV-595)
  • Flow : Build h2o-dev-0.1.17.1009 : Building GLM model gives java.lang.ArrayIndexOutOfBoundsException: (PUBDEV-205) (github)
  • Flow:Summary on flow broken for a long time (PUBDEV-785)

Serre (0.2.1.1) - 3/18/15

Download at: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-serre/1/index.html

New Features

Algorithms
Python
R
System
Web UI

Enhancements

Algorithms
  • Display GLM coefficients only if available (PUBDEV-466)
  • Add random chance line to RoC chart (HEXDEV-168)
  • Speed up DLSpiral test. Ignore Neurons test (MatVec) (github)
  • Use getRNG for Dropout (github)
  • PUBDEV-598: Add tests for determinism of RNGs (github)
  • PUBDEV-598: Implement Chi-Square test for RNGs (github)
  • Add DL model output toString() (github)
  • Add LogLoss to MultiNomial ModelMetrics (PUBDEV-580)
  • Print number of categorical levels once we hit >1000 input neurons. (github)
  • Updated the loss behavior for GBM. When loss is set to AUTO, if the response is an integer with 2 levels, then bernoulli (rather than gaussian) behavior is chosen. As a result, the do_classification flag is no longer necessary in Flow, since the loss completely specifies the desired behavior, and R users no longer need to use as.factor() in their response to get the desired bernoulli behavior. The score_each_iteration flag has been removed as well. (github)
  • Fully remove _convert_to_enum in all algos (github)
  • Port MissingValueInserter EndPoint to h2o-dev. (PUBDEV-465)
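
The AUTO loss rule for GBM noted in the Algorithms list above can be illustrated with a short sketch. This is not H2O's implementation: the helper name resolve_auto_loss and the plain-list response are hypothetical, and skipping missing values is an assumption. Only the two-level integer case comes from the release note.

```python
def resolve_auto_loss(response):
    """Hypothetical helper: choose a loss when loss=AUTO, based on the response values."""
    # Assumption: missing values (None) are ignored when inspecting the response.
    values = [v for v in response if v is not None]
    # bool is a subclass of int in Python, so exclude it explicitly.
    all_ints = all(isinstance(v, int) and not isinstance(v, bool) for v in values)
    if all_ints and len(set(values)) == 2:
        return "bernoulli"  # integer response with exactly 2 levels
    return "gaussian"       # every other response in this sketch


print(resolve_auto_loss([0, 1, 1, 0]))
print(resolve_auto_loss([0.5, 1.2, 3.4]))
```

With a rule of this shape, a 0/1 integer response no longer has to be converted with as.factor() to get bernoulli behavior.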
API
Python
  • added H2OFrame.setNames(), H2OFrame.cbind(), H2OVec.cbind(), h2o.cbind(), and pyunit_cbind.py (github)
  • Make H2OVec.levels() return the levels (github)
  • H2OFrame.dim(), H2OFrame.append(), H2OVec.setName(), H2OVec.isna() additions. demo pyunit addition (github)
System
  • Customize H2O web UI port (PUBDEV-483)
  • Make parse setup interactive (PUBDEV-532)
  • Added --verbose (github)
  • Adds some H2OParseExceptions. Removes all H2O.fail in parse (no parse issues should cause a fail)(github)
  • Allows parse to specify check_headers=HAS_HEADERS, but not provide column names (github)
  • Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
Web UI
  • Add 'Clear cell' and 'Run all cells' toolbar buttons (github)
  • Add 'Clear cell' and 'Clear all cells' commands (PUBDEV-493) (github)
  • 'Run' button selects next cell after running
  • ModelMetrics by model category: Clustering (PUBDEV-416)
  • ModelMetrics by model category: Regression (PUBDEV-415)
  • ModelMetrics by model category: Multinomial (PUBDEV-414)
  • ModelMetrics by model category: Binomial (PUBDEV-413)
  • Add ability to select and delete multiple models (github)
  • Add ability to select and delete multiple frames (github)
  • Flows now stop running when an error occurs
  • Print full number of mismatches during POJO comparison check. (github)
  • Make Grid multi-node safe (github)
  • Beautify the vertical axis labels for Flow charts/visualization (more) (PUBDEV-329)

Bug Fixes

Algorithms
  • GBM only populates either MSE_train or MSE_valid but displays both (PUBDEV-350)
  • GBM: train error increases after hitting zero on prostate dataset (PUBDEV-513)
  • GBM : Variable importance displays 0's for response param => should not display response in table at all (PUBDEV-430)
  • GLM : R/Flow ==> Build GLM Model hangs at 4% (PUBDEV-456)
  • Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
  • Flow: GLM - 'model.output.coefficients_magnitude.name' not found, so can't view model (PUBDEV-466)
  • GBM predict fails without response column (PUBDEV-478)
  • GBM: When validation set is provided, gbm should report both mse_valid and mse_train (PUBDEV-499)
  • PCA Assertion Error during Model Metrics (PUBDEV-548) (github)
  • KMeans: Size of clusters in Model Output is different from the labels generated on the training set (PUBDEV-542) (github)
  • Inconsistency in GBM results:Gives different results even when run with the same set of params (HEXDEV-194)
  • PUBDEV-580: Fix some numerical edge cases (github)
  • Fix two missing float -> double conversion changes in tree scoring. (github)
  • Flow: HIDDEN_DROPOUT_RATIOS for DL does not show default value (PUBDEV-285)
  • Old GLM Parameters Missing (PUBDEV-431)
API
  • SplitFrame on String column produce C0LChunk instead of CStrChunk (PUBDEV-468)
  • Error in node$h2o$node : $ operator is invalid for atomic vectors (PUBDEV-348)
  • Response from /ModelBuilders don't conform to standard error json shape when there are errors (HEXDEV-121) (github)
Python
  • fix python syntax error (github)
  • Fixes handling of None in python for a returned na_string. (github)
R
  • R : Inconsistency - Train set name with and without quotes work but Validation set name with quotes does not work (PUBDEV-491)
  • h2o.confusionmatrices does not work (PUBDEV-547)
  • How do i convert an enum column back to integer/double from R? (PUBDEV-546)
  • Summary in R is faulty (PUBDEV-539)
  • R: as.h2o should preserve R data types (PUBDEV-578)
  • NPE in GBM Prediction with Sliced Test Data (HEXDEV-207) (github)
  • Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
  • Custom Functions don't work in apply() in R (PUBDEV-436)
  • got water.DException$DistributedException and then got java.lang.RuntimeException: Categorical renumber task (HEXDEV-195)
  • H2O-R: as.h2o parses column name as one of the row entries (PUBDEV-591)
  • R-H2O Managing Memory in a loop (PUB-1125)
  • h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
  • H2O-R not showing meaningful error msg
System
  • Flow: When balance class = F then flow should not show max_after_balance_size = 5 in the parameter listing (PUBDEV-503)
  • 3 jvms, doing ModelMetrics on prostate, class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-495)
  • Not able to start h2o on hadoop (PUBDEV-487)
  • one row (one col) dataset seems to get assertion error in parse setup request (PUBDEV-96)
  • Parse : Import file (move.com) => Parse => First row contains column names => column names not selected (HEXDEV-171) (github)
  • The NY0 parse rule, in summary. Doesn't look like it's counting the 0's as NAs like h2o (PUBDEV-154)
  • 0 / Y / N parsing (PUBDEV-229)
  • NodePersistentStorage gets wiped out when laptop is restarted. (HEXDEV-167)
  • Building a model and making a prediction accepts invalid frame types (PUBDEV-83)
  • Flow : Import file 15M rows 2.2 Cols => Parse => Error fetching job on UI =>Console : ERROR: Job was not successful Exiting with nonzero exit status (HEXDEV-55)
  • Flow : Build GLM Model => Family tweedy => class hex.glm.LSMSolver$ADMMSolver$NonSPDMatrixException', with msg 'Matrix is not SPD, can't solve without regularization (PUBDEV-211)
  • Flow : Import File : File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
  • Check reproducibility on multi-node vs single-node (PUBDEV-557)
  • Parse : After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)
Web UI
  • Flow : Build Model => Parameters => shows meta text for some params (PUBDEV-505)
  • Flow: K-Means - "None" option should not appear in "Init" parameters (PUBDEV-459)
  • Flow: PCA - "None" option appears twice in "Transform" list (HEXDEV-186)
  • GBM Model : Params in flow show two times (PUBDEV-440)
  • Flow multinomial confusion matrix visualization (HEXDEV-204)
  • Flow: It would be good if flow can report the actual distribution, instead of just reporting "Auto" in the model parameter listing (PUBDEV-509)
  • Unimplemented algos should be taken out from drop down of build model (PUBDEV-511)
  • [MapR] unable to give hdfs file name from Flow (PUBDEV-409)

Selberg (0.2.0.1) - 3/6/15

Download at: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-selberg/1/index.html

New Features

Algorithms
Python
R
System
Web UI

Enhancements

The following changes are improvements to existing features (which includes changed default values):

Algorithms
  • Display GLM coefficients only if available (PUBDEV-466)
  • Add random chance line to RoC chart (HEXDEV-168)
  • Allow validation dataset for AutoEncoder (PUBDEV-581)
  • Speed up DLSpiral test. Ignore Neurons test (MatVec) (github)
  • Use getRNG for Dropout (github)
  • PUBDEV-598: Add tests for determinism of RNGs (github)
  • PUBDEV-598: Implement Chi-Square test for RNGs (github)
  • PUBDEV-580: Add log loss to binomial and multinomial model metric (github)
  • Add DL model output toString() (github)
  • Add LogLoss to MultiNomial ModelMetrics (PUBDEV-580)
  • Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
  • Print number of categorical levels once we hit >1000 input neurons. (github)
  • Updated the loss behavior for GBM. When loss is set to AUTO, if the response is an integer with 2 levels, then bernoulli (rather than gaussian) behavior is chosen. As a result, the do_classification flag is no longer necessary in Flow, since the loss completely specifies the desired behavior, and R users no longer need to use as.factor() in their response to get the desired bernoulli behavior. The score_each_iteration flag has been removed as well. (github)
  • Fully remove _convert_to_enum in all algos (github)
  • Add DL POJO scoring (PUBDEV-585)
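
For readers unfamiliar with the log loss metric added to the binomial and multinomial model metrics above, here is a minimal sketch of the standard definition. It is not taken from the H2O source: the function name log_loss, the list-based inputs, and the clipping constant eps are assumptions made for illustration.

```python
import math


def log_loss(actual, predicted, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    actual[i] is the true class index; predicted[i] is a list of class probabilities.
    """
    total = 0.0
    for y, probs in zip(actual, predicted):
        # Assumption: clip probabilities so that log(0) never occurs.
        p = min(max(probs[y], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(actual)


# Two rows, each predicted 50/50: the loss equals ln(2).
print(log_loss([0, 1], [[0.5, 0.5], [0.5, 0.5]]))
```

The binomial case is the same formula with two class probabilities per row.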
API
Python
  • added H2OFrame.setNames(), H2OFrame.cbind(), H2OVec.cbind(), h2o.cbind(), and pyunit_cbind.py (github)
  • Make H2OVec.levels() return the levels (github)
  • H2OFrame.dim(), H2OFrame.append(), H2OVec.setName(), H2OVec.isna() additions. demo pyunit addition (github)
R
  • PUBDEV-578, PUBDEV-541, PUBDEV-566. -R client now sends the data frame column names and data types to ParseSetup. -R client can get column names from a parsed frame or a list. -Respects client request for column data types (github)
System
  • Customize H2O web UI port (PUBDEV-483)
  • Make parse setup interactive (PUBDEV-532)
  • Added --verbose (github)
  • Adds some H2OParseExceptions. Removes all H2O.fail in parse (no parse issues should cause a fail)(github)
  • Allows parse to specify check_headers=HAS_HEADERS, but not provide column names (github)
  • Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
Web UI
  • Add 'Clear cell' and 'Run all cells' toolbar buttons (github)
  • Add 'Clear cell' and 'Clear all cells' commands (PUBDEV-493) (github)
  • 'Run' button selects next cell after running
  • ModelMetrics by model category: Clustering (PUBDEV-416)
  • ModelMetrics by model category: Regression (PUBDEV-415)
  • ModelMetrics by model category: Multinomial (PUBDEV-414)
  • ModelMetrics by model category: Binomial (PUBDEV-413)
  • Add ability to select and delete multiple models (github)
  • Add ability to select and delete multiple frames (github)
  • Flows now stop running when an error occurs
  • Print full number of mismatches during POJO comparison check. (github)
  • Make Grid multi-node safe (github)
  • Beautify the vertical axis labels for Flow charts/visualization (more) (PUBDEV-329)

Bug Fixes

The following changes are to resolve incorrect software behavior:

Algorithms
  • GBM only populates either MSE_train or MSE_valid but displays both (PUBDEV-350)
  • GBM: train error increases after hitting zero on prostate dataset (PUBDEV-513)
  • GBM : Variable importance displays 0's for response param => should not display response in table at all (PUBDEV-430)
  • Inconsistency in GBM results:Gives different results even when run with the same set of params (HEXDEV-194)
  • GLM : R/Flow ==> Build GLM Model hangs at 4% (PUBDEV-456)
  • Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
  • Flow: GLM - 'model.output.coefficients_magnitude.name' not found, so can't view model (PUBDEV-466)
  • GBM predict fails without response column (PUBDEV-478)
  • GBM: When validation set is provided, gbm should report both mse_valid and mse_train (PUBDEV-499)
  • PCA Assertion Error during Model Metrics (PUBDEV-548) (github)
  • KMeans: Size of clusters in Model Output is different from the labels generated on the training set (PUBDEV-542) (github)
  • divide by zero in modelmetrics for deep learning (PUBDEV-568)
  • AUC reported on training data is 0, but should be 1 (HEXDEV-223) (github)
  • GBM: reports 0th tree mse value for the validation set, different than the train set, when only a train set is provided (PUBDEV-561)
  • PUBDEV-580: Fix some numerical edge cases (github)
  • Fix two missing float -> double conversion changes in tree scoring. (github)
  • Problems during Train/Test adaptation between Enum/Numeric (HEXDEV-229)
  • DRF/GBM balance_classes=True throws unimplemented exception (HEXDEV-226)
  • Flow: HIDDEN_DROPOUT_RATIOS for DL does not show default value (PUBDEV-285)
  • Old GLM Parameters Missing (PUBDEV-431)
  • GBM: Initial mse in bernoulli seems to be off (PUBDEV-515)
API
  • SplitFrame on String column produce C0LChunk instead of CStrChunk (PUBDEV-468)
  • Error in node$h2o$node : $ operator is invalid for atomic vectors (PUBDEV-348)
  • Response from /ModelBuilders don't conform to standard error json shape when there are errors (HEXDEV-121)
Python
  • fix python syntax error (github)
  • Fixes handling of None in python for a returned na_string. (github)
R
  • R : Inconsistency - Train set name with and without quotes work but Validation set name with quotes does not work (PUBDEV-491)
  • h2o.confusionmatrices does not work (PUBDEV-547)
  • How do i convert an enum column back to integer/double from R? (PUBDEV-546)
  • Summary in R is faulty (PUBDEV-539)
  • Custom Functions don't work in apply() in R (PUBDEV-436)
  • R: as.h2o should preserve R data types (PUBDEV-578)
  • as.h2o loses track of headers (PUBDEV-541)
  • NPE in GBM Prediction with Sliced Test Data (HEXDEV-207) (github)
  • Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
  • got water.DException$DistributedException and then got java.lang.RuntimeException: Categorical renumber task (HEXDEV-195)
  • h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
  • R: h2o.confusionMatrix should handle both models and model metric objects (PUBDEV-590)
  • H2O-R: as.h2o parses column name as one of the row entries (PUBDEV-591)
System
  • Flow: When balance class = F then flow should not show max_after_balance_size = 5 in the parameter listing (PUBDEV-503)
  • 3 jvms, doing ModelMetrics on prostate, class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-495)
  • Not able to start h2o on hadoop (PUBDEV-487)
  • one row (one col) dataset seems to get assertion error in parse setup request (PUBDEV-96)
  • Parse : Import file (move.com) => Parse => First row contains column names => column names not selected (HEXDEV-171) (github)
  • The NY0 parse rule, in summary. Doesn't look like it's counting the 0's as NAs like h2o (PUBDEV-154)
  • 0 / Y / N parsing (PUBDEV-229)
  • NodePersistentStorage gets wiped out when laptop is restarted. (HEXDEV-167)
  • Parse : Parsing random crap gives java.lang.ArrayIndexOutOfBoundsException: 13 (PUBDEV-428)
  • Flow: converting a column to enum while parsing does not work (PUBDEV-566)
  • Parse: Numbers completely parsed wrong (PUBDEV-574)
  • NodePersistentStorage gets wiped out when hadoop cluster is restarted (HEXDEV-185)
  • Parse: Fail gracefully when asked to parse a zip file with different files in it (PUBDEV-540)(github)
  • Building a model and making a prediction accepts invalid frame types (PUBDEV-83)
  • Flow : Import file 15M rows 2.2 Cols => Parse => Error fetching job on UI =>Console : ERROR: Job was not successful Exiting with nonzero exit status (HEXDEV-55)
  • Flow : Build GLM Model => Family tweedy => class hex.glm.LSMSolver$ADMMSolver$NonSPDMatrixException', with msg 'Matrix is not SPD, can't solve without regularization (PUBDEV-211)
  • Flow : Import File : File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
  • Check reproducibility on multi-node vs single-node (PUBDEV-557)
  • Parse: After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)
Web UI
  • Flow : Build Model => Parameters => shows meta text for some params (PUBDEV-505)
  • Flow: K-Means - "None" option should not appear in "Init" parameters (PUBDEV-459)
  • Flow: PCA - "None" option appears twice in "Transform" list (HEXDEV-186)
  • GBM Model : Params in flow show two times (PUBDEV-440)
  • Flow multinomial confusion matrix visualization (HEXDEV-204)
  • Flow: It would be good if flow can report the actual distribution, instead of just reporting "Auto" in the model parameter listing (PUBDEV-509)
  • Unimplemented algos should be taken out from drop down of build model (PUBDEV-511)
  • [MapR] unable to give hdfs file name from Flow (PUBDEV-409)

Selberg (0.2.0.1) - 3/6/15

Download at: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-selberg/1/index.html

New Features

Web UI
  • Flow: Delete functionality to be available for import files, jobs, models, frames (PUBDEV-241)
  • Implement "Download Flow" (PUBDEV-407)
  • Flow: Implement "Run All Cells" (PUBDEV-110)
API
System
  • Add a README.txt to the hadoop zip files (github)
  • Build a cdh5.2 version of h2o (github)

Enhancements

Web UI
Algorithms
  • Added K-Means scoring (github)
  • Flow: Implement model output for Deep Learning (PUBDEV-118)
  • Flow: Implement model output for GLM (PUBDEV-120)
  • Deep Learning model output (HEXDEV-89, Flow),(HEXDEV-88, Python),(HEXDEV-87, R)
  • Run GLM Binomial from Flow (including LBFGS) (HEXDEV-90)
  • Flow: Display confusion matrices for multinomial models (PUBDEV-397)
  • During PCA, missing values in training data will be replaced with column mean (github)
  • Update parameters for best model scan (github)
  • Change Quantiles to match h2o-1; both Quantiles and Rollups now have the same default percentiles (github)
  • Massive cleanup and removal of old PCA, replacing with quadratically regularized PCA based on alternating minimization algorithm in GLRM (github)
  • Add model run time to DL Model Output (github)
  • Don't gather Neurons/Weights/Biases statistics (github)
  • Only store best model if override_with_best_model is enabled (github)
  • beta_eps added, passing tests changed (github)
  • For GLM, default values for max_iters parameter were changed from 1000 to 50.
  • For quantiles, probabilities are displayed.
  • Run Deep Learning Multinomial from Flow (HEXDEV-108)
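
The PCA change above (missing training values replaced with the column mean) corresponds to ordinary mean imputation. The sketch below illustrates the idea only and is not H2O code: the helper name impute_column_means, the list-of-rows input, and the 0.0 fallback for an all-missing column are assumptions.

```python
def impute_column_means(rows):
    """Return a copy of rows where each None is replaced by its column's observed mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumption: a column with no observed values falls back to 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


print(impute_column_means([[1.0, None], [3.0, 4.0]]))
```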
API
  • Expose DL weights/biases to clients via REST call (PUBDEV-344)
  • Flow: Implement notification bar/API (PUBDEV-359)
  • Variable importance data in REST output for GLM (PUBDEV-359)
  • Add extra DL parameters to R API (average_activation, sparsity_beta, max_categorical_features, reproducible) (github)
  • Update GLRM API model output (github)
  • h2o.anomaly missing in R (PUBDEV-434)
  • No method to get enum levels (PUBDEV-432)
System
  • Improve memory footprint with latest version of h2o-dev (github)
  • For now, let model.delete() of DL delete its best models too. This allows R code to not leak when only calling h2o.rm() on the main model. (github)
  • Bind both TCP and UDP ports before clustering (github)
  • Round summary row#. Helps with pctiles for very small row counts. Add a test to check for getting close to the 50% percentile on small rows. (github)
  • Increase Max Value size in DKV to 256MB (github)
  • Flow: make parseRaw() do both import and parse in sequence (HEXDEV-184)
  • Remove notion of individual job/job tracking from Flow (PUBDEV-449)
  • Capability to name prediction results Frame in flow (PUBDEV-233)

Bug Fixes

Algorithms
  • GLM binomial prediction failing (PUBDEV-403)
  • DL: Predict with auto encoder enabled gives Error processing error (PUBDEV-433)
  • balance_classes in Deep Learning intermittent poor result (PUBDEV-437)
  • Flow: Building GLM model fails (PUBDEV-186)
  • summary returning incorrect 0.5 quantile for 5 row dataset (PUBDEV-95)
  • GBM missing variable importance and balance-classes (PUBDEV-309)
  • H2O Dev GBM first tree differs from H2O 1 (PUBDEV-421)
  • get glm model from flow fails to find coefficient name field (PUBDEV-394)
  • GBM/GLM build model fails on Hadoop after building 100% => Failed to find schema for version: 3 and type: GBMModel (PUBDEV-378)
  • Parsing KDD wrong (PUBDEV-393)
  • GLM AIOOBE (PUBDEV-199)
  • Flow : Build GLM Model with family poisson => java.lang.ArrayIndexOutOfBoundsException: 1 at hex.glm.GLM$GLMLambdaTask.needLineSearch(GLM.java:359) (PUBDEV-210)
  • Flow : GLM Model Error => Enum conversion only works on small integers (PUBDEV-365)
  • GLM binary response, do_classification=FALSE, family=binomial, prediction error (PUBDEV-339)
  • Epsilon missing from GLM parameters (PUBDEV-354)
  • GLM NPE (PUBDEV-395)
  • Flow: GLM bug (or incorrect output) (PUBDEV-252)
  • GLM binomial on benign.csv gets assertion error in predict (PUBDEV-132)
  • current summary default_pctiles doesn't have 0.001 and 0.999 like h2o1 (PUBDEV-94)
  • Flow: Build GBM/DL Model: java.lang.IllegalArgumentException: Enum conversion only works on integer columns (PUBDEV-213) (github)
  • ModelMetrics on cup98VAL_z dataset has response with many nulls (PUBDEV-214)
  • GBM : Predict model category output/inspect parameters shows as Regression when model is built with do classification enabled (PUBDEV-441)
  • Fix double-precision DRF bugs (github)
System
  • Null columnTypes for /smalldata/arcene/arcene_train.data (PUBDEV-406) (github)
  • Flow: Waiting for -1 responses after starting h2o on hadoop cluster of 5 nodes (PUBDEV-419)
  • Parse: airlines_all.csv => Airtime type shows as ENUM instead of Integer (PUBDEV-426) (github)
  • Flow: Typo - "Time" option displays twice in column header type menu in Parse (PUBDEV-446)
  • Duplicate validation messages in k-means output (PUBDEV-305) (github)
  • Fixes Parse so that it returns to supplying generic column names when no column names exist (github)
  • Flow: Import File: File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
  • Flow: Parse => 1m.svm hangs at 42% (HEXDEV-174)
  • Prediction NFE (PUBDEV-308)
  • NPE doing Frame to key before it's fully parsed (PUBDEV-79)
  • h2o_master_DEV_gradle_build_J8 #351 hangs for past 17 hrs (PUBDEV-239)
  • Sparkling water - container exited due to unavailable port (PUBDEV-357)
API
  • Flow: Splitframe => java.lang.ArrayIndexOutOfBoundsException (PUBDEV-410) (github)
  • Incorrect dest.type, description in /CreateFrame jobs (PUBDEV-404)
  • space in windows filename on python (PUBDEV-444) (github)
  • Python end-to-end data science example 1 runs correctly (PUBDEV-182)
  • 3/NodePersistentStorage.json/foo/id should throw 404 instead of 500 for 'not-found' (HEXDEV-163)
  • POST /3/NodePersistentStorage.json should handle Content-Type:multipart/form-data (HEXDEV-165)
  • by class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-92)
  • Sparkling water : val train:DataFrame = prostateRDD => Fails with ArrayIndexOutOfBoundsException (PUBDEV-392)
  • Flow : getModels produces error: Error calling GET /3/Models.json (PUBDEV-254)
  • ddply 'Could not find the operator' (HEXDEV-162) (github)
  • h2o.table AIOOBE during NewChunk creation (HEXDEV-161) (github)
  • Fix warning in h2o.ddply when supplying multiple grouping columns (github)

0.1.26.1051 - 2/13/15

New Features

Enhancements

System
  • Embedded H2O config can now provide flat file (needed for Hadoop) (github)
  • Don't log GET of individual jobs, to avoid filling up the logs (github)
Algorithms
  • Increase GBM/DRF factor binning back to historical levels. Had been capped accidentally at nbins (typically 20), was intended to support a much higher cap. (github)
  • Tweaked rho heuristic in glm (github)
  • Enable variable importances for autoencoders (github)
  • Removed group_split option from GBM
  • Flow: display varimp for GBM output (PUBDEV-398)
  • variable importance for GBM (github)
  • GLM in H2O-Dev may provide slightly different coefficient values when applying an L1 penalty in comparison with H2O1.

Bug Fixes

Algorithms
  • Fixed bug in GLM exception handling causing GLM jobs to hang (github)
  • Fixed a bug in kmeans input parameter schema where init was always being set to Furthest (github)
  • Fixed mean computation in GLM (github)
  • Fixed kmeans.R (github)
  • Flow: Building GBM model fails with Error executing javascript (PUBDEV-396)
System
  • DataFrame propagates absolute path to parser (github)
  • Fix flow shutdown bug (github)

0.1.26.1032 - 2/6/15

New Features

General Improvements
  • better model output
  • support for Python client
  • support for Maven
  • support for Sparkling Water
  • support for REST API schema
  • support for Hadoop CDH5 (github)
UI
  • Display summary visualizations by default in column summary output cells (PUBDEV-337)
  • Display AUC curve by default in binomial prediction output cells (PUBDEV-338)
  • Flow: Implement About H2O/Flow with version information (PUBDEV-111)
  • Add UI for CreateFrame (PUBDEV-218)
  • Flow: Add ability to cancel running jobs (PUBDEV-373)
  • Flow: warn when user navigates away while having unsaved content (PUBDEV-322)
Algorithms
API
System

Enhancements

UI
  • Added better message when h2o.init() not yet called (No active connection to an H2O cluster. Try calling "h2o.init()") (github)
Algorithms
  • Updated column-based gradient task to use sparse interface (github)
  • Updated LBFGS (added progress monitor interface, updated some default params), added progress and job support to GLM lbfgs (github)
  • Added pretty print (github)
  • Added AutoEncoder to R model categories (github)
  • Added Coefficients table to GLM model (github)
  • Updated glm lbfgs to allow for efficient lambda-search (l2 penalty only) (github)
  • Removed splitframe shuffle parameter (github)
  • Simplified model builders and added deeplearning model builder (github)
  • Add DL model outputs to Flow (PUBDEV-372)
  • Flow: Deep Learning: Expert Mode (PUBDEV-284)
  • Flow: Display multinomial and regression DL model outputs (PUBDEV-383)
  • Display varimp details for DL models (PUBDEV-381)
  • Make binomial response "0" and "1" by default (github)
  • Update R GBM demos to reflect new input parameter names (github)
  • Rename GLM variable importance to normalized coefficient magnitudes (github)
API
  • Changed key to destination_key (github)
  • Cleaned up REST API schema interface (github)
  • Changed method name, cleaned setup, added a pyunit runner (github)
System

Bug Fixes

UI
  • Flow: Parse => 1m.svm hangs at 42% (PUBDEV-345)
  • cup98 Dataset has columns that prevent validation/prediction (PUBDEV-349)
  • Flow: predict step failed to function (PUBDEV-217)
  • Flow: Arrays of numbers (ex. hidden in deeplearning) require brackets (PUBDEV-303)
  • Flow v.0.1.26.1030: StackTrace was broken (PUBDEV-371)
  • Flow: Import files -> Search -> Parse these files -> null pointer exception (PUBDEV-170)
  • Flow: "getJobs" not working (PUBDEV-320)
  • Thresholds x Metrics and Max Criteria x Metrics tables were flipped in flow (HEXDEV-155)
  • Flow v.0.1.26.1030: StackTrace is broken (PUBDEV-348)
  • flow: getJobs always shows "Your H2O cloud has no jobs" (PUBDEV-243)
  • Flow: First and last characters deleted from ignored columns (PUBDEV-300)
  • Sparkling water => Flow => Menu buttons for cell do not show up (PUBDEV-294)
Algorithms
  • Flow: Build K Means model with default K value gives error "Required field k not specified" (PUBDEV-167)
  • Slicing out a specific data point is broken (PUBDEV-280)
  • Flow: SplitFrame and grep in algorithms for flow and loops back onto itself (PUBDEV-272)
  • Fixed the predict method (github)
  • Refactor ModelMetrics into a different class for Binomial (github)
  • /Predictions.json did not cache predictions (HEXDEV-119)
  • Flow, DL: Error after changing hidden layer size (PUBDEV-323)
  • Error in node$h2o$node : $ operator is invalid for atomic vectors (PUBDEV-348)
  • Fixed K-means predict (PUBDEV-321)
  • Flow: DL build mode fails => as it's missing adding quotes to parameter (PUBDEV-301)
  • Flow: Build K means model with training/validation frames => unknown error (PUBDEV-185)
  • Flow: Build quantile mode=> Click goes in loop (PUBDEV-188)
API
System
  • guesser needs to send types to parse (PUBDEV-279)
  • Got h2o.clusterStatus function working in R. (github)
  • Parse: Using R => java.lang.NullPointerException (PUBDEV-380)
  • Flow: Jobs => click on destination key => unimplemented: Unexpected val class for Inspect: class water.fvec.DataFrame (PUBDEV-363)
  • Column assignment in R exposes NullPointerException in Rollup (PUBDEV-155)
  • import from hdfs doesn't add files (PUBDEV-260)
  • AssertionError: ERROR: got tcp resend with existing in-progress task (PUBDEV-219)
  • HDFS parse fails when H2O launched on Spark CDH5 (PUBDEV-138)
  • Flow: Parse failure => java.lang.ArrayIndexOutOfBoundsException (PUBDEV-296)
  • "predict" step is not working in flow (PUBDEV-202)
  • Flow: Frame finishes parsing but comes up as null in flow (PUBDEV-270)
  • scala >flightsToORD.first() fails with "not serializable result" (PUBDEV-304)
  • DL throws NPE for bad column names (PUBDEV-15)
  • Flow: Build model: Not able to build KMeans/Deep Learning model (PUBDEV-297)
  • Flow: Col summary for NA/Y cols breaks (PUBDEV-325)
  • Sparkling Water : util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread NanoHTTPD Session,9,main (PUBDEV-346)
  • toDataFrame doesn't support sequence format schema (array, vectorUDT) (PUBDEV-457)

0.1.20.1019 - 1/19/15

New Features

UI
  • Added various documentation links to the build page (github)
Algorithms
  • Ported matrix multiply over and connected it to rapids (github)

Enhancements

UI
  • Allow user to specify (the log of) the number of rows per chunk for a new constant chunk; use this new function in CreateFrame (github)
  • Make CreateFrame non-blocking, now displays progress bar in Flow (github)
  • Add row and column count to H2OFrame show method (github)
  • Admin watermeter page (PUBDEV-234)
  • Admin stack trace (PUBDEV-228)
  • Admin profile (PUBDEV-227)
  • Flow: Add download logs in UI (PUBDEV-204)
  • Need shutdown, minimally like h2o (PUBDEV-74)
API
  • Changed 2 to 3 for JSON requests (github)
  • Rename some more fields for consistency (max_iters changed to max_iterations, _iters to _iterations, _ncats to _categorical_column_count, _centersraw to centers_raw, _avgwithinss to tot_withinss, _withinmse to withinss) (github)
  • Changed K-Means output parameters (withinmse to within_mse, avgss to avg_ss, avgbetweenss to avg_between_ss) (github)
  • Remove default field values from DeepLearning parameters schema, since they come from the backing class (github)
  • Add @API help annotation strings to JSON model output (PUBDEV-216)
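
Client code that still reads the old field names could be adapted with a simple lookup table. The sketch below uses only the renames listed in the API items above; the helper name migrate_field_names and the flat-dict payload are hypothetical, since the actual JSON layout is not described in these notes.

```python
# Old name -> new name, as listed in the API enhancements above.
RENAMED_FIELDS = {
    "max_iters": "max_iterations",
    "_iters": "_iterations",
    "_ncats": "_categorical_column_count",
    "_centersraw": "centers_raw",
    "_avgwithinss": "tot_withinss",
    "_withinmse": "withinss",
    "withinmse": "within_mse",
    "avgss": "avg_ss",
    "avgbetweenss": "avg_between_ss",
}


def migrate_field_names(payload):
    """Hypothetical helper: rename known old keys in a flat dict, leave others untouched."""
    return {RENAMED_FIELDS.get(key, key): value for key, value in payload.items()}


print(migrate_field_names({"max_iters": 50, "k": 3}))
```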
Algorithms
  • Minor fix in rapids matrix multiplication (github)
  • Updated sparse chunk to cut off binary search for prefix/suffix zeros (github)
  • Updated L_BFGS for GLM - warm-start solutions during lambda search, correctly pass current lambda value, added column-based gradient task (github)
  • Fix model parameters' default values in the metadata (github)
  • Set default value of k (number of clusters) to 1 for K-Means (PUBDEV-251)
System
  • Reject any training data with non-numeric values from KMeans model building (github)

Bug Fixes

API
  • Fixed isSparse call for constant chunks (github)
  • Fixed sparse interface of constant chunks (no nonzero if const 1= 0) (github)
System
  • Typeahead for folder contents apparently requires trailing "/" (github)
  • Fix build and instructions for R install.packages() style of installation; Note we only support source installs now (github)
  • Fixed R test runner h2o package install issue that caused it to fail to install on dev builds (github)

0.1.18.1013 - 1/14/15

New Features

UI

Enhancements

Algorithms

0.1.20.1016 - 12/28/14

  • Added ip_port field in node json output for Cloud query (github)