#Recent Changes

##H2O-Dev

###Severi (0.2.2.15) - 4/25/15

####New Features

The following features have been added since the last release:

#####Python

  • added min, max, sum, median for H2OVecs and respective pyunit (github)
  • added min(), max(), and sum() functionality on H2OFrames and respective pyunits (github)

#####Web UI

####Enhancements

The following changes are improvements to existing features (which includes changed default values):

#####Algorithms

  • K-Means output cleanup (HEXDEV-187)
  • Add FNR/TNR/FPR/TPR to threshold tables, remove recall, specificity (github)
  • Add accessor for variable importances for DL (github)
  • Relax CM error tolerance for F1-optimal threshold now that AUC2 doesn't necessarily create consistent thresholds with its own CMs. (github)
  • Added scoring history to glm (github)
  • Added model summary to glm (github)
  • Add flag to support reading data from S3N (github)
  • Added degrees of freedom to GLM metrics schemas (github)
  • Allow DL scoring_history to be unlimited in length (github)
  • add plotting for binomial models (github)
  • Ignore certain parameters that are not applicable (class balancing, max CM size, etc.) (github)
  • Updated glm scoring, fill training/validation metrics in model output (github)
  • Rename gbm loss parameter to distribution (github)
  • Fix GBM naming: loss -> distribution (github)
  • GLM LBFGS update (github)
  • na.rm for quantile is default behavior (github)
  • GLM update: enabled max_predictors in REST, updated lbfgs (github)
  • Remove keep_cross_validation_splits for now from DL (github)
  • Get rid of sigma in the model metrics, instead show r2 (github)
  • Don't show score_every_iteration for DL (github)
  • Don't print too large confusion matrices in Tree models (github)
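
The threshold-table changes above (FNR/TNR/FPR/TPR replacing recall and specificity, and the F1-optimal threshold) rest on standard confusion-matrix rates. A minimal Python sketch of those definitions, assuming 0/1 labels, predicted class-1 probabilities, and "predict 1 when p >= threshold"; the function names are hypothetical and this is not H2O's implementation:

```python
import math
from typing import Dict, Sequence


def threshold_row(labels: Sequence[int], probs: Sequence[float], t: float) -> Dict[str, float]:
    """Confusion-matrix rates at threshold t (predict class 1 when p >= t)."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        if p >= t:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # An empty denominator leaves the rate undefined; report NaN.
        return num / den if den else float("nan")

    return {
        "threshold": t,
        "tpr": ratio(tp, tp + fn),  # true positive rate (formerly "recall")
        "fnr": ratio(fn, tp + fn),  # false negative rate = 1 - TPR
        "tnr": ratio(tn, tn + fp),  # true negative rate (formerly "specificity")
        "fpr": ratio(fp, tn + fp),  # false positive rate = 1 - TNR
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def f1_optimal_threshold(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Try every distinct predicted probability as a threshold and keep the best F1."""
    rows = [threshold_row(labels, probs, t) for t in sorted(set(probs))]
    # A NaN F1 (no positives at all) must not win the comparison.
    best = max(rows, key=lambda r: -1.0 if math.isnan(r["f1"]) else r["f1"])
    return best["threshold"]
```

For labels [0, 0, 1, 1] and probabilities [0.1, 0.4, 0.35, 0.8], the best F1 (0.8) is reached at threshold 0.35. Trying every distinct probability is the simplest exact approach; binning candidate thresholds is cheaper on large data.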

#####API

#####Python

  • Python client should check that version number == server version number (PUBDEV-799)
  • Add asfactor for month (github)
  • in Expr.show(), only show 10 or fewer rows; remove locate from runit test because the full path is used (github)
  • change nulls to () (github)
  • sigma is no longer part of ModelMetricsRegressionV3 (github)
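
PUBDEV-799 asks the Python client to refuse a server whose version differs from its own. A minimal sketch of that check; `check_versions` and `VersionMismatchError` are hypothetical names, and how the client obtains the server's version string is assumed rather than shown:

```python
class VersionMismatchError(RuntimeError):
    """Raised when the client and server versions differ."""


def check_versions(client_version: str, server_version: str) -> None:
    """Refuse to continue unless client and server report the same version.

    Surrounding whitespace is ignored; otherwise the strings must match exactly,
    mirroring the "==" requirement in PUBDEV-799.
    """
    if client_version.strip() != server_version.strip():
        raise VersionMismatchError(
            f"H2O client version {client_version!r} does not match "
            f"server version {server_version!r}; install the matching client"
        )


check_versions("0.2.2.15", "0.2.2.15")  # matching versions: no error
```

An exact string comparison is the strictest reading of the entry; a looser policy (for example, matching only the leading version components) would be a separate design decision.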

#####R

#####System

  • Port MissingValueInserter EndPoint to h2o-dev. (PUBDEV-465)
  • Rapids: require a (put "key" %frame) (PUBDEV-868)
  • Need pojo base model jar file embedded in h2o-dev via build process (PUBDEV-780) (github)
  • Make .json the default (PUBDEV-619) (github)
  • Rename class for clarification (github)
  • Classifies all NA columns as numeric. Also improves preview sampling accuracy by trimming partial lines at end of chunk. (github)
  • Implements sampling of files within the ParseSetup preview. This prevents poor column type guesses from only sampling the beginning of a file. (github)
  • Rename fields drop_na20_col (github)
  • allow for many deletes as final statements in a block (github)
  • rename initF -> init_f, dropNA20Cols -> drop_na20_cols (github)
  • Removed tweedie param (github)
  • thresholds -> threshold (github)
  • JSON of TwoDimTable with all null values in the first column (no row headers) no longer has an empty column of "" or nulls. (github)
  • move H2O_Load, fix all the timezone functions (github)
  • Add extra verbose printout in case Frames don't match identically (github)
  • allow delayed column lookup (github)
  • add mixed type list (github)
  • Added WaterMeterIo to count persist info (github)
  • Remove special setChunkSize code in HDFS and NFS file vec (github)
  • add check for Frame on string parse (github)
  • Disable Memory Cleaner (github)
  • Handle '<' chars in Keys when swapping (github)
  • allow for colnames in slicing (github)
  • Adjusts parse type detection. If column is all one string value, declare it an enum (github)
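
The parse entries above describe a few heuristics: sample windows spread through the file rather than only its beginning, trim partial lines at window edges, type all-NA columns as numeric, and type single-valued string columns as enums. A minimal Python sketch of those rules; the function names, the NA spellings, and the "string" fallback for every other case are assumptions for illustration, not H2O's ParseSetup logic:

```python
from typing import List, Optional, Sequence

NA_TOKENS = {"", "NA", "N/A", "null"}  # assumption: illustrative NA spellings


def sample_lines(data: str, n_chunks: int, chunk_size: int) -> List[str]:
    """Return only complete lines from n_chunks windows spread across data."""
    if len(data) <= n_chunks * chunk_size:
        return [line for line in data.split("\n") if line]
    step = (len(data) - chunk_size) // max(n_chunks - 1, 1)
    lines: List[str] = []
    for i in range(n_chunks):
        start = i * step
        parts = data[start:start + chunk_size].split("\n")
        if start > 0:
            parts = parts[1:]  # the first line may have begun before this window
        if start + chunk_size < len(data):
            parts = parts[:-1]  # the last line may continue past this window
        lines.extend(p for p in parts if p)
    return lines


def _is_number(token: str) -> bool:
    try:
        float(token)
        return True
    except ValueError:
        return False


def guess_column_type(sample: Sequence[Optional[str]]) -> str:
    """Guess 'numeric', 'enum', or 'string' for one column's sampled tokens."""
    values = [v.strip() for v in sample if v is not None and v.strip() not in NA_TOKENS]
    if not values:
        return "numeric"  # all-NA column: classify as numeric
    if all(_is_number(v) for v in values):
        return "numeric"
    if len(set(values)) == 1:
        return "enum"  # one repeated string value: declare it an enum
    return "string"  # assumption: placeholder for every other case
```

Dropping a window's first and last fragments is conservative: a line that happens to start exactly at a window boundary is discarded too, which costs a little sample size but never feeds a truncated token into type detection.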

#####Web UI

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####Algorithms

  • GLM: lasso (i.e., alpha = 1) seems to be giving wrong answers (PUBDEV-769)
  • AUC: h2o reports .5 auc when actual auc is 1 (PUBDEV-879)
  • h2o.glm: No output displayed for the model (PUBDEV-858)
  • h2o.glm model object output needs a fix (PUBDEV-815)
  • h2o.glm model object says : fill me in GLMModelOutputV2; I think I'm redundant [1] FALSE (PUBDEV-765)
  • GLM : Build GLM Model => Java Assertion error (PUBDEV-686)
  • GLM :=> Progress shows -100% (PUBDEV-861)
  • GBM: Negative sign missing in initF value for ad dataset (PUBDEV-880)
  • K-Means takes a validation set but doesn't use it (PUBDEV-826)
  • Absolute_MCC is NaN (sometimes) (PUBDEV-848) (github)
  • GBM: A proper error msg should be thrown when the user sets the max depth =0 (PUBDEV-838) (github)
  • DRF Regression Assertion Error (PUBDEV-824)
  • h2o.randomForest: if h2o is not returning the mse for the 0th tree then it should not be reported in the model object (PUBDEV-811)
  • GBM: Got exception class java.lang.AssertionError with msg null java.lang.AssertionError at hex.tree.gbm.GBM$GBMDriver$GammaPass.map (PUBDEV-693)
  • GBM: Got exception class java.lang.AssertionError with msg null java.lang.AssertionError at hex.ModelMetricsMultinomial$MetricBuildMultinomial.perRow (HEXDEV-248)
  • GBM get java.lang.AssertionError: Coldata 2199.0 out of range C17:5086.0-19733.0 step=57.214844 nbins=256 isInt=1 (HEXDEV-241)
  • GLM: glmnet objective function better than h2o.glm (PUBDEV-749)
  • GLM: get AIOOB:-36 at hex.glm.GLMTask$GLMIterationTask.postGlobal(GLMTask.java:733) (PUBDEV-894) (github)
  • Fixed glm behavior in case no rows are left after filtering out NAs (github)
  • Fix memory leak in validation scoring in K-Means (github)
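
For context on PUBDEV-848: the Matthews correlation coefficient divides by the square root of the product of the four confusion-matrix margins, so it is 0/0 whenever a threshold predicts every row as one class (or the data contains only one class). That is one way a NaN can appear; it is not necessarily the root cause of this bug. A minimal Python sketch; the function names and the choice to report |MCC| as 0.0 in the degenerate case are assumptions:

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; NaN when any confusion-matrix margin is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # e.g. a threshold that predicts every row as the same class: 0 / 0
        return float("nan")
    return (tp * tn - fp * fn) / denom


def absolute_mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """|MCC|, with the degenerate 0 / 0 case reported as 0.0 instead of NaN."""
    value = mcc(tp, fp, tn, fn)
    return 0.0 if math.isnan(value) else abs(value)
```

Mapping the undefined case to 0.0 keeps a threshold table sortable; leaving it as NaN is also defensible if downstream code is expected to handle missing values explicitly.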

#####API

  • API unification: DataFrame should be able to accept URI referencing file on local filesystem (PUBDEV-709) (github)

#####Python

#####R

#####System

  • MapR FS loads are too slow (PUBDEV-927)
  • ensure that HDFS works from Windows (PUBDEV-812)
  • Summary on a time column throws "'null' is not an object (evaluating 'column.domain[level.index]')" in Flow (PUBDEV-867)
  • Parse: An enum column gets parsed as int for the attached file (PUBDEV-606)
  • Parse => 40Mx1_uniques => class java.lang.RuntimeException (PUBDEV-729)
  • if there are fewer than 5 unique values in a dataset column, mins/maxs reports e+308 values (PUBDEV-150) (github)
  • Sparkling water - DataFrame[T_UUID] to SchemaRDD[StringType] (PUBDEV-771)
  • Sparkling water - DataFrame[T_NUM(Long)] to SchemaRDD[LongType] (PUBDEV-767)
  • Sparkling water - DataFrame[T_ENUM] to SchemaRDD[StringType] (PUBDEV-766)
  • Inconsistency in row and col slicing (HEXDEV-265) (github)
  • rep_len expects literal length only (HEXDEV-268) (github)
  • cbind and = don't work within a single rapids block (HEXDEV-237)
  • Rapids response for c(value) does not have frame key (HEXDEV-252)
  • S3 parse takes forever (PUBDEV-876)
  • Parse => Enum unification fails in multi-node parse (PUBDEV-718) (github)
  • Not all nodes are getting updated with the latest status of the other nodes (PUBDEV-768)
  • Cluster creation is sometimes rejecting new nodes (post jenkins-master-1128+) (PUBDEV-807)
  • Parse => Multiple files 1 zip/ 1 csv gives Array index out of bounds (PUBDEV-840)
  • Parse => failed for X5MRows6KCols ==> OOM => Cluster dies (PUBDEV-836)
  • /frame/foo pagination weirded out (HEXDEV-277) (github)
  • Removed code that flipped enums to strings (github)
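
For context on PUBDEV-150: a column summary that keeps the five smallest (or largest) values in a fixed-size array pre-filled with the largest finite double will show about 1.8e+308 in unused slots when the column has fewer than five distinct values. The sketch below assumes that mechanism, which is plausible but not confirmed by this entry, and shows the fix of returning only filled slots; the names are hypothetical:

```python
import sys
from typing import Iterable, List

SENTINEL = sys.float_info.max  # about 1.797e+308, the largest finite double


def k_smallest_distinct(values: Iterable[float], k: int = 5) -> List[float]:
    """Track the k smallest distinct values in a fixed-size, sentinel-filled array."""
    slots = [SENTINEL] * k
    for v in values:
        if v != v:
            continue  # skip NaN (missing values)
        if v in slots or v >= slots[-1]:
            continue  # duplicate, or not smaller than the current k-th smallest
        slots[-1] = v  # evict the current largest slot (possibly a sentinel)
        slots.sort()
    # Returning `slots` directly would leak ~1.8e+308 for every unused slot
    # whenever the column has fewer than k distinct values.
    return [s for s in slots if s != SENTINEL]
```

The largest values work the same way with the comparisons reversed and `-sys.float_info.max` as the sentinel.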

#####Web UI

  • Flow: It would be really useful to have the mse plots back in GBM (PUBDEV-889)
  • State change in Flow is not fully validated (PUBDEV-919)
  • Flows : Not able to load saved flows from hdfs (PUBDEV-872)
  • Save Function in Flow crashes (PUBDEV-791) (github)
  • Flow: should throw a proper error msg when the user-supplied response has more categories than the algo can handle (PUBDEV-866)
  • Flow display of a summary of a column with all missing values fails. (HEXDEV-230)
  • Split frame UI improvements (HEXDEV-275)
  • Flow : Decimal point precisions to be consistent to 4 as in h2o1 (PUBDEV-844)
  • Flow: Prediction frame is outputting junk info (PUBDEV-825)
  • EC2 => Cluster of 16 nodes => Water Meter => shows blank page (PUBDEV-831)
  • Flow: Predict - "undefined is not an object (evaluating prediction.thresholds_and_metric_scores.name)" (PUBDEV-559)
  • Flow: inspect getModel for PCA returns error (PUBDEV-610)
  • Flow, RF: Can't get Predict results; "undefined is not an object (evaluating prediction.confusion_matrices.length)" (PUBDEV-695)
  • Flow, GBM: getModel is broken -Error processing GET /3/Models.json/gbm-b1641e2dc3-4bad-9f69-a5f4b67051ba null is not an object (evaluating source.length) (PUBDEV-800)

###Severi (0.2.2.1) - 4/10/15

####New Features

#####R

####Enhancements

#####Algorithms

  • POJO generation: GBM (PUBDEV-713)
  • POJO generation: DRF (PUBDEV-714)
  • Compute and Display Hit Ratios (PUBDEV-630) (github)
  • Add DL POJO scoring (PUBDEV-585)
  • Allow validation dataset for AutoEncoder (PUBDEV-581)
  • PUBDEV-580: Add log loss to binomial and multinomial model metric (github)
  • Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
  • increase tolerance to 2e-3 (was 1e-3, which failed with a 0.001647 relative difference) (github)
  • change tolerance to 1e-3 (github)
  • Add option to export weights and biases to REST API / Flow. (github)
  • Add scree plot for H2O PCA models and fix Runit test. (github)
  • Remove quantiles from the model builders list. (github)
  • GLM update: added row filtering argument to line search task, fixed issues with dfork/asyncExec (github)
  • Updated rho-setting in GLM. (github)
  • No threshold 0.5; use the default (max F1) instead (github)
  • GLM update: updated initialization, NA row filtering; default lambda is now empty and will be picked based on the fraction of lambda_max. (github)
  • Updated ADMM solver. (github)
  • Added makeGLMModel call. (github)
  • Start with classification error NaN at t=0 for DL, not with 1. (github)
  • Relax DL POJO relative tolerance to 1e-2. (github)
  • Override nfeatures() method in DLModelOutput. (github)
  • Renaming of fields in GLM (github)
  • GLM: Take out Balance Classes (PUBDEV-795)
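
PUBDEV-580 adds log loss to the binomial and multinomial model metrics. Log loss is the mean negative log of the probability assigned to each row's true class. A minimal Python sketch; the function names and the clipping constant `eps` are assumptions, since the entry does not state how H2O guards against log(0):

```python
import math
from typing import Sequence


def multinomial_logloss(labels: Sequence[int], probs: Sequence[Sequence[float]],
                        eps: float = 1e-15) -> float:
    """Mean negative log-probability of each row's true class.

    labels[i] is the class index of row i and probs[i][c] the predicted
    probability of class c. Clipping to [eps, 1 - eps] keeps a confident
    wrong prediction large but finite instead of infinite.
    """
    if not labels:
        raise ValueError("log loss is undefined for zero rows")
    total = 0.0
    for y, row in zip(labels, probs):
        p = min(max(row[y], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(labels)


def binomial_logloss(labels: Sequence[int], p1: Sequence[float], eps: float = 1e-15) -> float:
    """Binomial case: p1[i] is the predicted probability that row i is class 1."""
    return multinomial_logloss(labels, [[1.0 - p, p] for p in p1], eps)
```

Treating the binomial case as a two-column multinomial keeps a single code path, so both metrics agree by construction.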

#####API

  • schema metadata for Map fields should include the key and value types (PUBDEV-753) (github)
  • schema metadata should include the superclass (PUBDEV-754)
  • rest api naming convention: n_folds vs ntrees (PUBDEV-737)
  • Create REST Endpoint for exposing .java pojo models (PUBDEV-778)

#####Python

  • Run GLM from Python (including LBFGS) (HEXDEV-92)
  • added H2OFrame show(), as_list(), and slicing pyunits (github)
  • changed solver parameter to "L_BFGS" (github)
  • added multidimensional slicing of H2OFrames and Exprs. (github)
  • add h2o.groupby to python interface (github)
  • added H2OModel.confusionMatrix() to return confusion matrix of a prediction (github)

#####R

  • PUBDEV-578, PUBDEV-541, PUBDEV-566: The R client now sends the data frame column names and data types to ParseSetup and can get column names from a parsed frame or a list; client requests for column data types are respected (github)
  • R: Cannot create new columns through R (PUBDEV-571)
  • H2O-R: it would be more useful if h2o.confusion matrix reports the actual class labels instead of [,1] and [,2] (PUBDEV-553)
  • Support both multinomial and binomial CM (github)

#####System

  • Flow: Standardize max_iters/max_iterations parameters (PUBDEV-447) (github)
  • Add ERROR logging level for too-many-retries case (PUBDEV-146) (github)
  • Simplify checking of cluster health. Just report the status immediately. (github)
  • reduce timeout (github)
  • strings can begin with ' or " (github)
  • Throw a validation error in flow if any training data cols are non-numeric (github)
  • Add getHdfsHomeDirectory(). (github)
  • Added --verbose. (github)

#####Web UI

  • PUBDEV-707: nice algo names in the Flow dropdown (full word names) (github)
  • Unbreak Flow's ConfusionMatrix display. (github)
  • POJO generation: DL (PUBDEV-715)

####Bug Fixes

#####Algorithms

  • GLM : Build GLM model with nfolds brings down the cloud => FATAL: unimplemented (PUBDEV-731) (github)
  • DL : Build DL Model => FATAL: unimplemented: n_folds >= 2 is not (yet) implemented => SHUTSDOWN CLOUD (PUBDEV-727) (github)
  • GBM => Build GBM model => No enum constant hex.tree.gbm.GBMModel.GBMParameters.Family.AUTO (PUBDEV-723)
  • GBM: When run with loss = auto with a numeric column, get error: No enum constant hex.tree.gbm.GBMModel.GBMParameters.Family.AUTO (PUBDEV-708) (github)
  • gbm: does not complain when min_rows > dataset size (PUBDEV-694) (github)
  • GLM: reports wrong residual degrees of freedom (PUBDEV-668)
  • H2O dev reports less accurate AUCs than H2O (PUBDEV-602)
  • GLM : Build GLM model fails => ArrayIndexOutOfBoundsException (PUBDEV-601)
  • divide by zero in modelmetrics for deep learning (PUBDEV-568)
  • GBM: reports a 0th-tree MSE value for the validation set that differs from the train set when only a train set is provided (PUBDEV-561)
  • GBM: Initial mse in bernoulli seems to be off (PUBDEV-515)
  • GLM : Build Model fails with Array Index Out of Bound exception (PUBDEV-454) (github)
  • Custom Functions don't work in apply() in R (PUBDEV-436)
  • GLM failure: got NaNs and/or Infs in beta on airlines (PUBDEV-362)
  • MetricBuilderMultinomial.perRow AssertionError while running GBM (HEXDEV-240)
  • Problems during Train/Test adaptation between Enum/Numeric (HEXDEV-229)
  • DRF/GBM balance_classes=True throws unimplemented exception (HEXDEV-226) (github)
  • AUC reported on training data is 0, but should be 1 (HEXDEV-223) (github)
  • glm pyunit intermittent failure (HEXDEV-199)
  • Inconsistency in GBM results: gives different results even when run with the same set of params (HEXDEV-194)
  • get rid of nfolds= param since it's not supported in GLM yet (github)
  • Fixed degrees of freedom (off by 1) in glm, added test. (github)
  • GLM fix: fix filtering of rows with NAs and fix in sparse handling. (github)
  • Fix GLM job fail path to call Job.fail(). (github)
  • Full AUC computation, bug fixes (github)
  • Fix ADMM for upper/lower bounds. (updated rho settings + update u-vector in ADMM for intercept) (github)
  • Few glm fixes (github)
  • DL : KDD Algebra data set => Build DL model => ArrayIndexOutOfBoundsException (PUBDEV-696)
  • GBM: Dev vs. H2O for depth 5, min_rows=10, on prostate, give different trees (PUBDEV-759)
  • GBM param min_rows doesn't throw exception for negative values (PUBDEV-697)
  • GBM : Build GBM Model => Too many levels in response column! (java.lang.IllegalArgumentException) => Should display proper error message (PUBDEV-698)
  • GBM: Got exception 'class java.lang.AssertionError' with msg 'Something is wrong with GBM trees since returned prediction is Infinity' (PUBDEV-722)
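
Several entries above concern AUC accuracy (PUBDEV-602, HEXDEV-223, and the full AUC computation fix). An exact reference value helps when checking such reports: AUC equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half, and it can be computed from average ranks (the Mann-Whitney U statistic). A minimal Python sketch; this is a standard textbook method, not necessarily how H2O computes AUC:

```python
from typing import Sequence


def exact_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive outscores random negative), ties counting 1/2.

    Uses average ranks (Mann-Whitney U), so it runs in O(n log n).
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # tied rows i+1..j (1-based) share the mean rank
        rank_sum_pos += avg_rank * sum(1 for _, y in pairs[i:j] if y == 1)
        i = j
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUC is undefined with only one class present
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

A perfectly separating score gives 1.0 (the value HEXDEV-223 expected on training data), and a constant score gives 0.5.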

#####API

  • Cannot adapt numeric response to factors made from numbers (PUBDEV-620)
  • not specifying response_column gives an NPE in deep learning build_model(); other algos may have the same issue (PUBDEV-131)
  • NPE response has null msg, exception_msg and dev_msg (HEXDEV-225)
  • Flow :=> Save Flow => On Mac and Windows 8.1 => NodePersistentStorage failure while attempting to overwrite (?) a flow (HEXDEV-202) (github)
  • the can_build field in ModelBuilderSchema needs values[] to be set (PUBDEV-755)
  • value field in the field metadata isn't getting serialized as its native type (PUBDEV-756)

#####Python

#####R

#####System

  • key type failure should fail the request, not the cloud (PUBDEV-739) (github)
  • Parse => Import Medicare supplier file => Parse => Illegal argument for field: column_names of schema: ParseV2: string and key arrays' values must be quoted, but the client sent: " (PUBDEV-719)
  • Overwriting a constant vector with strings fails (PUBDEV-702)
  • H2O gets stuck while calculating quantile with no error msg; it just keeps running a job that normally takes less than a second (PUBDEV-685)
  • Summary and quantile on a column with all missing values should not throw an exception (PUBDEV-673) (github)
  • View Logs => class java.lang.RuntimeException: java.lang.IllegalArgumentException: File /home2/hdp/yarn/usercache/neeraja/appcache/application_1427144101512_0039/h2ologs/h2o_172.16.2.185_54321-3-info.log does not exist (PUBDEV-600)
  • Parse: After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)
  • Parse: Numbers completely parsed wrong (PUBDEV-574)
  • Flow: converting a column to enum while parsing does not work (PUBDEV-566)
  • Parse: Fail gracefully when asked to parse a zip file with different files in it (PUBDEV-540) (github)
  • toDataFrame doesn't support sequence format schema (array, vectorUDT) (PUBDEV-457)
  • Parse : Parsing random crap gives java.lang.ArrayIndexOutOfBoundsException: 13 (PUBDEV-428)
  • The quote stripper for column names should report when the stripped chars are not the expected quotes (PUBDEV-424)
  • importing a directory with large files, then viewing Frames, is really slow and the disk grinds, even though the files are unparsed (PUBDEV-98)
  • NodePersistentStorage gets wiped out when hadoop cluster is restarted (HEXDEV-185)
  • h2o.exec won't be supported (github)
  • fixed import issue (github)
  • fixed init param (github)
  • fix repeat as.factor NPE (github)
  • startH2O set to False in init (github)
  • hang on glm job removal (PUBDEV-726)
  • Flow - changed column types need to be reflected in parsed data (HEXDEV-189)
  • water.DException$DistributedException while running kmeans in multinode cluster (PUBDEV-691)
  • Frame inspection prior to file parsing, corrupts parsing (PUBDEV-425)

#####Web UI

  • Flow, DL: Need better fail message if "Autoencoder" and "use_all_factor_levels" are both selected (PUBDEV-724)
  • When select AUTO while building a gbm model get ERROR FETCHING INITIAL MODEL BUILDER STATE (PUBDEV-595)
  • Flow : Build h2o-dev-0.1.17.1009 : Building GLM model gives java.lang.ArrayIndexOutOfBoundsException (PUBDEV-205) (github)
  • Flow: Summary on flow broken for a long time (PUBDEV-785)

###Serre (0.2.1.1) - 3/18/15

####New Features

#####Algorithms

#####Python

#####R

#####System

#####Web UI

####Enhancements

#####Algorithms

  • Display GLM coefficients only if available (PUBDEV-466)
  • Add random chance line to RoC chart (HEXDEV-168)
  • Speed up DLSpiral test. Ignore Neurons test (MatVec) (github)
  • Use getRNG for Dropout (github)
  • PUBDEV-598: Add tests for determinism of RNGs (github)
  • PUBDEV-598: Implement Chi-Square test for RNGs (github)
  • Add DL model output toString() (github)
  • Add LogLoss to MultiNomial ModelMetrics (PUBDEV-580)
  • Print number of categorical levels once we hit >1000 input neurons. (github)
  • Updated the loss behavior for GBM. When loss is set to AUTO, if the response is an integer with 2 levels, then bernoulli (rather than gaussian) behavior is chosen. As a result, the do_classification flag is no longer necessary in Flow, since the loss completely specifies the desired behavior, and R users no longer need to use as.factor() in their response to get the desired bernoulli behavior. The score_each_iteration flag has been removed as well. (github)
  • Fully remove _convert_to_enum in all algos (github)
  • Port MissingValueInserter EndPoint to h2o-dev. (PUBDEV-465)
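
The AUTO loss rule above can be sketched as follows. Only the numeric branch comes from the entry (an integer response with two levels selects bernoulli, and presumably anything else numeric stays gaussian); the categorical branch and the function name `resolve_auto_loss` are assumptions for illustration:

```python
from typing import Sequence, Union

Value = Union[int, float, str]


def resolve_auto_loss(response: Sequence[Value], is_categorical: bool) -> str:
    """Pick a GBM loss when the user leaves it at AUTO (illustrative only)."""
    levels = {v for v in response if v == v}  # drop NaN (missing) entries
    if is_categorical:
        # assumption: 2 classes -> bernoulli, more -> multinomial
        return "bernoulli" if len(levels) == 2 else "multinomial"
    if len(levels) == 2 and all(float(v).is_integer() for v in levels):
        # rule from the changelog: integer response with 2 levels
        return "bernoulli"
    return "gaussian"
```

This is why `as.factor()` is no longer needed for a 0/1 response: the two-level integer column alone is enough to select bernoulli.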

#####API

#####Python

  • added H2OFrame.setNames(), H2OFrame.cbind(), H2OVec.cbind(), h2o.cbind(), and pyunit_cbind.py (github)
  • Make H2OVec.levels() return the levels (github)
  • H2OFrame.dim(), H2OFrame.append(), H2OVec.setName(), H2OVec.isna() additions. demo pyunit addition (github)

#####System

  • Customize H2O web UI port (PUBDEV-483)
  • Make parse setup interactive (PUBDEV-532)
  • Added --verbose (github)
  • Adds some H2OParseExceptions. Removes all H2O.fail in parse (no parse issues should cause a fail) (github)
  • Allows parse to specify check_headers=HAS_HEADERS, but not provide column names (github)
  • Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)

#####Web UI

  • Add 'Clear cell' and 'Run all cells' toolbar buttons (github)
  • Add 'Clear cell' and 'Clear all cells' commands (PUBDEV-493) (github)
  • 'Run' button selects next cell after running
  • ModelMetrics by model category: Clustering (PUBDEV-416)
  • ModelMetrics by model category: Regression (PUBDEV-415)
  • ModelMetrics by model category: Multinomial (PUBDEV-414)
  • ModelMetrics by model category: Binomial (PUBDEV-413)
  • Add ability to select and delete multiple models (github)
  • Add ability to select and delete multiple frames (github)
  • Flows now stop running when an error occurs
  • Print full number of mismatches during POJO comparison check. (github)
  • Make Grid multi-node safe (github)
  • Beautify the vertical axis labels for Flow charts/visualization (more) (PUBDEV-329)

####Bug Fixes

#####Algorithms

  • GBM only populates either MSE_train or MSE_valid but displays both (PUBDEV-350)
  • GBM: train error increases after hitting zero on prostate dataset (PUBDEV-513)
  • GBM : Variable importance displays 0's for response param => should not display response in table at all (PUBDEV-430)
  • GLM : R/Flow ==> Build GLM Model hangs at 4% (PUBDEV-456)
  • Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
  • Flow: GLM - 'model.output.coefficients_magnitude.name' not found, so can't view model (PUBDEV-466)
  • GBM predict fails without response column (PUBDEV-478)
  • GBM: When validation set is provided, gbm should report both mse_valid and mse_train (PUBDEV-499)
  • PCA Assertion Error during Model Metrics (PUBDEV-548) (github)
  • KMeans: Size of clusters in Model Output is different from the labels generated on the training set (PUBDEV-542) (github)
  • Inconsistency in GBM results: gives different results even when run with the same set of params (HEXDEV-194)
  • PUBDEV-580: Fix some numerical edge cases (github)
  • Fix two missing float -> double conversion changes in tree scoring. (github)
  • Flow: HIDDEN_DROPOUT_RATIOS for DL does not show default value (PUBDEV-285)
  • Old GLM Parameters Missing (PUBDEV-431)

#####API

  • SplitFrame on String column produces C0LChunk instead of CStrChunk (PUBDEV-468)
  • Error in node$h2o$node : $ operator is invalid for atomic vectors (PUBDEV-348)
  • Response from /ModelBuilders doesn't conform to standard error json shape when there are errors (HEXDEV-121) (github)

#####Python

  • fix python syntax error (github)
  • Fixes handling of None in python for a returned na_string. (github)

#####R

  • R : Inconsistency - Train set name works with and without quotes, but Validation set name with quotes does not work (PUBDEV-491)
  • h2o.confusionmatrices does not work (PUBDEV-547)
  • How do I convert an enum column back to integer/double from R? (PUBDEV-546)
  • Summary in R is faulty (PUBDEV-539)
  • R: as.h2o should preserve R data types (PUBDEV-578)
  • NPE in GBM Prediction with Sliced Test Data (HEXDEV-207) (github)
  • Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
  • Custom Functions don't work in apply() in R (PUBDEV-436)
  • got water.DException$DistributedException and then got java.lang.RuntimeException: Categorical renumber task (HEXDEV-195)
  • H2O-R: as.h2o parses column name as one of the row entries (PUBDEV-591)
  • R-H2O Managing Memory in a loop (PUB-1125)
  • h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
  • H2O-R not showing meaningful error msg

#####System

  • Flow: When balance class = F then flow should not show max_after_balance_size = 5 in the parameter listing (PUBDEV-503)
  • 3 jvms, doing ModelMetrics on prostate, class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: *** Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-495)
  • Not able to start h2o on hadoop (PUBDEV-487)
  • one row (one col) dataset seems to get assertion error in parse setup request (PUBDEV-96)
  • Parse : Import file (move.com) => Parse => First row contains column names => column names not selected (HEXDEV-171) (github)
  • The NY0 parse rule in summary doesn't look like it's counting the 0's as NAs like h2o does (PUBDEV-154)
  • 0 / Y / N parsing (PUBDEV-229)
  • NodePersistentStorage gets wiped out when laptop is restarted. (HEXDEV-167)
  • Building a model and making a prediction accepts invalid frame types (PUBDEV-83)
  • Flow : Import file 15M rows 2.2 Cols => Parse => Error fetching job on UI =>Console : ERROR: Job was not successful Exiting with nonzero exit status (HEXDEV-55)
  • Flow : Build GLM Model => Family tweedie => class 'hex.glm.LSMSolver$ADMMSolver$NonSPDMatrixException' with msg "Matrix is not SPD, can't solve without regularization" (PUBDEV-211)
  • Flow : Import File : File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
  • Check reproducibility on multi-node vs single-node (PUBDEV-557)
  • Parse : After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)

#####Web UI

  • Flow : Build Model => Parameters => shows meta text for some params (PUBDEV-505)
  • Flow: K-Means - "None" option should not appear in "Init" parameters (PUBDEV-459)
  • Flow: PCA - "None" option appears twice in "Transform" list (HEXDEV-186)
  • GBM Model : Params in flow show two times (PUBDEV-440)
  • Flow multinomial confusion matrix visualization (HEXDEV-204)
  • Flow: It would be good if flow can report the actual distribution, instead of just reporting "Auto" in the model parameter listing (PUBDEV-509)
  • Unimplemented algos should be taken out from drop down of build model (PUBDEV-511)
  • [MapR] unable to give hdfs file name from Flow (PUBDEV-409)

###Selberg (0.2.0.1) - 3/6/15

####New Features

#####Algorithms

#####Python

#####R

#####System

#####Web UI

####Enhancements

The following changes are improvements to existing features (which includes changed default values):

#####Algorithms

  • Display GLM coefficients only if available (PUBDEV-466)
  • Add random chance line to RoC chart (HEXDEV-168)
  • Allow validation dataset for AutoEncoder (PUDEV-581)
  • Speed up DLSpiral test. Ignore Neurons test (MatVec) (github)
  • Use getRNG for Dropout (github)
  • PUBDEV-598: Add tests for determinism of RNGs (github)
  • PUBDEV-598: Implement Chi-Square test for RNGs (github)
  • PUBDEV-580: Add log loss to binomial and multinomial model metric (github)
  • Add DL model output toString() (github)
  • Add LogLoss to MultiNomial ModelMetrics (PUBDEV-580)
  • Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
  • Print number of categorical levels once we hit >1000 input neurons. (github)
  • Updated the loss behavior for GBM. When loss is set to AUTO, if the response is an integer with 2 levels, then bernoulli (rather than gaussian) behavior is chosen. As a result, the do_classification flag is no longer necessary in Flow, since the loss completely specifies the desired behavior, and R users no longer need to use as.factor() in their response to get the desired bernoulli behavior. The score_each_iteration flag has been removed as well. (github)
  • Fully remove _convert_to_enum in all algos (github)
  • Add DL POJO scoring (PUBDEV-585)
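
The RNG entries above (PUBDEV-598) add determinism and chi-square tests. The idea: bucket many draws from [0, 1) into equal-width bins, compare observed with expected counts, and reject uniformity if the statistic exceeds the chi-square critical value for bins - 1 degrees of freedom. A minimal Python sketch using the standard library's `random.Random` as a stand-in for H2O's generators; the bin count, sample size, and significance level are assumptions:

```python
import random
from typing import Callable, List

# Chi-square critical value for 9 degrees of freedom (10 bins - 1)
# at significance level 0.001; only valid for bins=10.
CRITICAL_DF9_P001 = 27.877


def chi_square_uniform(draw: Callable[[], float], n: int = 100_000, bins: int = 10) -> float:
    """Chi-square statistic of n draws in [0, 1) against a uniform distribution."""
    counts: List[int] = [0] * bins
    for _ in range(n):
        counts[min(int(draw() * bins), bins - 1)] += 1
    expected = n / bins
    return sum((c - expected) ** 2 / expected for c in counts)


def same_stream(seed: int, length: int = 1000) -> bool:
    """Determinism check: two generators with the same seed yield identical draws."""
    a, b = random.Random(seed), random.Random(seed)
    return [a.random() for _ in range(length)] == [b.random() for _ in range(length)]


# A good generator should usually stay below the critical value; at this
# significance level roughly 1 run in 1000 fails by chance.
stat = chi_square_uniform(random.Random(42).random)
print(f"chi-square = {stat:.2f}, passes = {stat < CRITICAL_DF9_P001}")
```

A strict significance level keeps such a test from flaking in CI while still catching grossly non-uniform generators.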

#####API

#####Python

  • added H2OFrame.setNames(), H2OFrame.cbind(), H2OVec.cbind(), h2o.cbind(), and pyunit_cbind.py (github)
  • Make H2OVec.levels() return the levels (github)
  • H2OFrame.dim(), H2OFrame.append(), H2OVec.setName(), H2OVec.isna() additions. demo pyunit addition (github)

#####R

  • PUBDEV-578, PUBDEV-541, PUBDEV-566: The R client now sends the data frame column names and data types to ParseSetup and can get column names from a parsed frame or a list; client requests for column data types are respected (github)

#####System

  • Customize H2O web UI port (PUBDEV-483)
  • Make parse setup interactive (PUBDEV-532)
  • Added --verbose (github)
  • Adds some H2OParseExceptions. Removes all H2O.fail in parse (no parse issues should cause a fail) (github)
  • Allows parse to specify check_headers=HAS_HEADERS, but not provide column names (github)
  • Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)

#####Web UI

  • Add 'Clear cell' and 'Run all cells' toolbar buttons (github)
  • Add 'Clear cell' and 'Clear all cells' commands (PUBDEV-493) (github)
  • 'Run' button selects next cell after running
  • ModelMetrics by model category: Clustering (PUBDEV-416)
  • ModelMetrics by model category: Regression (PUBDEV-415)
  • ModelMetrics by model category: Multinomial (PUBDEV-414)
  • ModelMetrics by model category: Binomial (PUBDEV-413)
  • Add ability to select and delete multiple models (github)
  • Add ability to select and delete multiple frames (github)
  • Flows now stop running when an error occurs
  • Print full number of mismatches during POJO comparison check. (github)
  • Make Grid multi-node safe (github)
  • Beautify the vertical axis labels for Flow charts/visualization (more) (PUBDEV-329)

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####Algorithms

  • GBM only populates either MSE_train or MSE_valid but displays both (PUBDEV-350)
  • GBM: train error increases after hitting zero on prostate dataset (PUBDEV-513)
  • GBM : Variable importance displays 0's for response param => should not display response in table at all (PUBDEV-430)
  • Inconsistency in GBM results: gives different results even when run with the same set of params (HEXDEV-194)
  • GLM : R/Flow ==> Build GLM Model hangs at 4% (PUBDEV-456)
  • Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
  • Flow: GLM - 'model.output.coefficients_magnitude.name' not found, so can't view model (PUBDEV-466)
  • GBM predict fails without response column (PUBDEV-478)
  • GBM: When validation set is provided, gbm should report both mse_valid and mse_train (PUBDEV-499)
  • PCA Assertion Error during Model Metrics (PUBDEV-548) (github)
  • KMeans: Size of clusters in Model Output is different from the labels generated on the training set (PUBDEV-542) (github)
  • Divide by zero in ModelMetrics for Deep Learning (PUBDEV-568)
  • AUC reported on training data is 0, but should be 1 (HEXDEV-223) (github)
  • GBM: reports the 0th-tree MSE value for the validation set, different from the training set, when only a training set is provided (PUBDEV-561)
  • PUBDEV-580: Fix some numerical edge cases (github)
  • Fix two missing float -> double conversion changes in tree scoring. (github)
  • Problems during Train/Test adaptation between Enum/Numeric (HEXDEV-229)
  • DRF/GBM balance_classes=True throws unimplemented exception (HEXDEV-226)
  • Flow: HIDDEN_DROPOUT_RATIOS for DL does not show default value (PUBDEV-285)
  • Old GLM Parameters Missing (PUBDEV-431)
  • GBM: Initial mse in bernoulli seems to be off (PUBDEV-515)

#####API

  • SplitFrame on String column produces C0LChunk instead of CStrChunk (PUBDEV-468)
  • Error in node$h2o$node : $ operator is invalid for atomic vectors (PUBDEV-348)
  • Response from /ModelBuilders doesn't conform to standard error JSON shape when there are errors (HEXDEV-121)

#####Python

  • Fix Python syntax error (github)
  • Fixes handling of None in Python for a returned na_string (github)

#####R

  • R : Inconsistency - Train set name with and without quotes work but Validation set name with quotes does not work (PUBDEV-491)
  • h2o.confusionmatrices does not work (PUBDEV-547)
  • How do i convert an enum column back to integer/double from R? (PUBDEV-546)
  • Summary in R is faulty (PUBDEV-539)
  • Custom Functions don't work in apply() in R (PUBDEV-436)
  • R: as.h2o should preserve R data types (PUBDEV-578)
  • as.h2o loses track of headers (PUBDEV-541)
  • NPE in GBM Prediction with Sliced Test Data (HEXDEV-207) (github)
  • Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
  • got water.DException$DistributedException and then got java.lang.RuntimeException: Categorical renumber task (HEXDEV-195)
  • h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
  • R: h2o.confusionMatrix should handle both models and model metric objects (PUBDEV-590)
  • H2O-R: as.h2o parses column name as one of the row entries (PUBDEV-591)

#####System

  • Flow: When balance class = F then flow should not show max_after_balance_size = 5 in the parameter listing (PUBDEV-503)
  • 3 jvms, doing ModelMetrics on prostate, class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: *** Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-495)
  • Not able to start h2o on hadoop (PUBDEV-487)
  • one row (one col) dataset seems to get assertion error in parse setup request (PUBDEV-96)
  • Parse : Import file (move.com) => Parse => First row contains column names => column names not selected (HEXDEV-171) (github)
  • The NY0 parse rule in summary doesn't appear to count the 0s as NAs the way h2o does (PUBDEV-154)
  • 0 / Y / N parsing (PUBDEV-229)
  • NodePersistentStorage gets wiped out when laptop is restarted. (HEXDEV-167)
  • Parse : Parsing random crap gives java.lang.ArrayIndexOutOfBoundsException: 13 (PUBDEV-428)
  • Flow: converting a column to enum while parsing does not work (PUBDEV-566)
  • Parse: Numbers completely parsed wrong (PUBDEV-574)
  • NodePersistentStorage gets wiped out when hadoop cluster is restarted (HEXDEV-185)
  • Parse: Fail gracefully when asked to parse a zip file with different files in it (PUBDEV-540) (github)
  • Building a model and making a prediction accepts invalid frame types (PUBDEV-83)
  • Flow : Import file 15M rows 2.2 Cols => Parse => Error fetching job on UI =>Console : ERROR: Job was not successful Exiting with nonzero exit status (HEXDEV-55)
  • Flow : Build GLM Model => Family tweedie => class hex.glm.LSMSolver$ADMMSolver$NonSPDMatrixException', with msg 'Matrix is not SPD, can't solve without regularization (PUBDEV-211)
  • Flow : Import File : File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
  • Check reproducibility on multi-node vs single-node (PUBDEV-557)
  • Parse: After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)

#####Web UI

  • Flow : Build Model => Parameters => shows meta text for some params (PUBDEV-505)
  • Flow: K-Means - "None" option should not appear in "Init" parameters (PUBDEV-459)
  • Flow: PCA - "None" option appears twice in "Transform" list (HEXDEV-186)
  • GBM Model : Params in flow show two times (PUBDEV-440)
  • Flow multinomial confusion matrix visualization (HEXDEV-204)
  • Flow: It would be good if flow can report the actual distribution, instead of just reporting "Auto" in the model parameter listing (PUBDEV-509)
  • Unimplemented algos should be taken out from drop down of build model (PUBDEV-511)
  • [MapR] unable to give hdfs file name from Flow (PUBDEV-409)

###Selberg (0.2.0.1) - 3/6/15

####New Features

#####Web UI

  • Flow: Delete functionality to be available for import files, jobs, models, frames (PUBDEV-241)
  • Implement "Download Flow" (PUBDEV-407)
  • Flow: Implement "Run All Cells" (PUBDEV-110)

#####API

#####System

  • Add a README.txt to the hadoop zip files (github)
  • Build a cdh5.2 version of h2o (github)

####Enhancements

#####Web UI

#####Algorithms

  • Added K-Means scoring (github)
  • Flow: Implement model output for Deep Learning (PUBDEV-118)
  • Flow: Implement model output for GLM (PUBDEV-120)
  • Deep Learning model output (HEXDEV-89, Flow),(HEXDEV-88, Python),(HEXDEV-87, R)
  • Run GLM Binomial from Flow (including LBFGS) (HEXDEV-90)
  • Flow: Display confusion matrices for multinomial models (PUBDEV-397)
  • During PCA, missing values in training data will be replaced with column mean (github)
  • Update parameters for best model scan (github)
  • Change Quantiles to match h2o-1; both Quantiles and Rollups now have the same default percentiles (github)
  • Massive cleanup and removal of old PCA, replacing with quadratically regularized PCA based on alternating minimization algorithm in GLRM (github)
  • Add model run time to DL Model Output (github)
  • Don't gather Neurons/Weights/Biases statistics (github)
  • Only store best model if override_with_best_model is enabled (github)
  • beta_eps added, passing tests changed (github)
  • For GLM, default values for max_iters parameter were changed from 1000 to 50.
  • For quantiles, probabilities are displayed.
  • Run Deep Learning Multinomial from Flow (HEXDEV-108)

#####API

  • Expose DL weights/biases to clients via REST call (PUBDEV-344)
  • Flow: Implement notification bar/API (PUBDEV-359)
  • Variable importance data in REST output for GLM (PUBDEV-359)
  • Add extra DL parameters to R API (average_activation, sparsity_beta, max_categorical_features, reproducible) (github)
  • Update GLRM API model output (github)
  • h2o.anomaly missing in R (PUBDEV-434)
  • No method to get enum levels (PUBDEV-432)

#####System

  • Improve memory footprint with latest version of h2o-dev (github)
  • For now, let model.delete() of DL delete its best models too. This allows R code to not leak when only calling h2o.rm() on the main model. (github)
  • Bind both TCP and UDP ports before clustering (github)
  • Round summary row number, which helps with percentiles for very small row counts. Added a test to check for getting close to the 50th percentile on small row counts. (github)
  • Increase Max Value size in DKV to 256MB (github)
  • Flow: make parseRaw() do both import and parse in sequence (HEXDEV-184)
  • Remove notion of individual job/job tracking from Flow (PUBDEV-449)
  • Capability to name prediction results Frame in flow (PUBDEV-233)

####Bug Fixes

#####Algorithms

  • GLM binomial prediction failing (PUBDEV-403)
  • DL: Predict with auto encoder enabled gives Error processing error (PUBDEV-433)
  • balance_classes in Deep Learning intermittent poor result (PUBDEV-437)
  • Flow: Building GLM model fails (PUBDEV-186)
  • summary returning incorrect 0.5 quantile for 5 row dataset (PUBDEV-95)
  • GBM missing variable importance and balance-classes (PUBDEV-309)
  • H2O Dev GBM first tree differs from H2O 1 (PUBDEV-421)
  • get glm model from flow fails to find coefficient name field (PUBDEV-394)
  • GBM/GLM build model fails on Hadoop after building 100% => Failed to find schema for version: 3 and type: GBMModel (PUBDEV-378)
  • Parsing KDD wrong (PUBDEV-393)
  • GLM AIOOBE (PUBDEV-199)
  • Flow : Build GLM Model with family poisson => java.lang.ArrayIndexOutOfBoundsException: 1 at hex.glm.GLM$GLMLambdaTask.needLineSearch(GLM.java:359) (PUBDEV-210)
  • Flow : GLM Model Error => Enum conversion only works on small integers (PUBDEV-365)
  • GLM binary response, do_classification=FALSE, family=binomial, prediction error (PUBDEV-339)
  • Epsilon missing from GLM parameters (PUBDEV-354)
  • GLM NPE (PUBDEV-395)
  • Flow: GLM bug (or incorrect output) (PUBDEV-252)
  • GLM binomial on benign.csv gets assertion error in predict (PUBDEV-132)
  • current summary default_pctiles doesn't have 0.001 and 0.999 like h2o1 (PUBDEV-94)
  • Flow: Build GBM/DL Model: java.lang.IllegalArgumentException: Enum conversion only works on integer columns (PUBDEV-213) (github)
  • ModelMetrics on cup98VAL_z dataset has response with many nulls (PUBDEV-214)
  • GBM : Predict model category output/inspect parameters shows as Regression when model is built with do classification enabled (PUBDEV-441)
  • Fix double-precision DRF bugs (github)

#####System

  • Null columnTypes for /smalldata/arcene/arcene_train.data (PUBDEV-406) (github)
  • Flow: Waiting for -1 responses after starting h2o on hadoop cluster of 5 nodes (PUBDEV-419)
  • Parse: airlines_all.csv => Airtime type shows as ENUM instead of Integer (PUBDEV-426) (github)
  • Flow: Typo - "Time" option displays twice in column header type menu in Parse (PUBDEV-446)
  • Duplicate validation messages in k-means output (PUBDEV-305) (github)
  • Fixes Parse so that it returns to supplying generic column names when no column names exist (github)
  • Flow: Import File: File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
  • Flow: Parse => 1m.svm hangs at 42% (HEXDEV-174)
  • Prediction NFE (PUBDEV-308)
  • NPE doing Frame to key before it's fully parsed (PUBDEV-79)
  • h2o_master_DEV_gradle_build_J8 #351 hangs for past 17 hrs (PUBDEV-239)
  • Sparkling water - container exited due to unavailable port (PUBDEV-357)

#####API

  • Flow: Splitframe => java.lang.ArrayIndexOutOfBoundsException (PUBDEV-410) (github)
  • Incorrect dest.type, description in /CreateFrame jobs (PUBDEV-404)
  • Space in Windows filename on Python (PUBDEV-444) (github)
  • Python end-to-end data science example 1 runs correctly (PUBDEV-182)
  • 3/NodePersistentStorage.json/foo/id should throw 404 instead of 500 for 'not-found' (HEXDEV-163)
  • POST /3/NodePersistentStorage.json should handle Content-Type:multipart/form-data (HEXDEV-165)
  • by class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: *** Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-92)
  • Sparkling water : val train:DataFrame = prostateRDD => Fails with ArrayIndexOutOfBoundsException (PUBDEV-392)
  • Flow : getModels produces error: Error calling GET /3/Models.json (PUBDEV-254)
  • ddply 'Could not find the operator' (HEXDEV-162) (github)
  • h2o.table AIOOBE during NewChunk creation (HEXDEV-161) (github)
  • Fix warning in h2o.ddply when supplying multiple grouping columns (github)

###0.1.26.1051 - 2/13/15

####New Features

####Enhancements

#####System

  • Embedded H2O config can now provide flat file (needed for Hadoop) (github)
  • Don't log GET requests for individual jobs, to avoid filling up the logs (github)

#####Algorithms

  • Increase GBM/DRF factor binning back to historical levels. Had been capped accidentally at nbins (typically 20), was intended to support a much higher cap. (github)
  • Tweaked rho heuristic in glm (github)
  • Enable variable importances for autoencoders (github)
  • Removed group_split option from GBM
  • Flow: display varimp for GBM output (PUBDEV-398)
  • variable importance for GBM (github)
  • GLM in H2O-Dev may provide slightly different coefficient values when applying an L1 penalty in comparison with H2O1.

####Bug Fixes

#####Algorithms

  • Fixed bug in GLM exception handling causing GLM jobs to hang (github)
  • Fixed a bug in kmeans input parameter schema where init was always being set to Furthest (github)
  • Fixed mean computation in GLM (github)
  • Fixed kmeans.R (github)
  • Flow: Building GBM model fails with Error executing javascript (PUBDEV-396)

#####System

  • DataFrame propagates absolute path to parser (github)
  • Fix flow shutdown bug (github)

###0.1.26.1032 - 2/6/15

####New Features

#####General Improvements

  • better model output
  • support for Python client
  • support for Maven
  • support for Sparkling Water
  • support for REST API schema
  • support for Hadoop CDH5 (github)

#####UI

  • Display summary visualizations by default in column summary output cells (PUBDEV-337)
  • Display AUC curve by default in binomial prediction output cells (PUBDEV-338)
  • Flow: Implement About H2O/Flow with version information (PUBDEV-111)
  • Add UI for CreateFrame (PUBDEV-218)
  • Flow: Add ability to cancel running jobs (PUBDEV-373)
  • Flow: warn when user navigates away while having unsaved content (PUBDEV-322)

#####Algorithms

#####API

#####System

####Enhancements

#####UI

  • Added better message when h2o.init() not yet called (No active connection to an H2O cluster. Try calling "h2o.init()") (github)

#####Algorithms

  • Updated column-based gradient task to use sparse interface (github)
  • Updated LBFGS (added progress monitor interface, updated some default params), added progress and job support to GLM lbfgs (github)
  • Added pretty print (github)
  • Added AutoEncoder to R model categories (github)
  • Added Coefficients table to GLM model (github)
  • Updated glm lbfgs to allow for efficient lambda-search (l2 penalty only) (github)
  • Removed splitframe shuffle parameter (github)
  • Simplified model builders and added deeplearning model builder (github)
  • Add DL model outputs to Flow (PUBDEV-372)
  • Flow: Deep Learning: Expert Mode (PUBDEV-284)
  • Flow: Display multinomial and regression DL model outputs (PUBDEV-383)
  • Display varimp details for DL models (PUBDEV-381)
  • Make binomial response "0" and "1" by default (github)
  • Update R GBM demos to reflect new input parameter names (github)
  • Rename GLM variable importance to normalized coefficient magnitudes (github)

#####API

  • Changed key to destination_key (github)
  • Cleaned up REST API schema interface (github)
  • Changed method name, cleaned setup, added a pyunit runner (github)

#####System

####Bug Fixes

#####UI

  • Flow: Parse => 1m.svm hangs at 42% (PUBDEV-345)
  • cup98 Dataset has columns that prevent validation/prediction (PUBDEV-349)
  • Flow: predict step failed to function (PUBDEV-217)
  • Flow: Arrays of numbers (e.g., hidden in deeplearning) require brackets (PUBDEV-303)
  • Flow v.0.1.26.1030: StackTrace was broken (PUBDEV-371)
  • Flow: Import files -> Search -> Parse these files -> null pointer exception (PUBDEV-170)
  • Flow: "getJobs" not working (PUBDEV-320)
  • Thresholds x Metrics and Max Criteria x Metrics tables were flipped in flow (HEXDEV-155)
  • Flow v.0.1.26.1030: StackTrace is broken (PUBDEV-348)
  • Flow: getJobs always shows "Your H2O cloud has no jobs" (PUBDEV-243)
  • Flow: First and last characters deleted from ignored columns (PUBDEV-300)
  • Sparkling water => Flow => Menu buttons for cell do not show up (PUBDEV-294)

#####Algorithms

  • Flow: Build K Means model with default K value gives error "Required field k not specified" (PUBDEV-167)
  • Slicing out a specific data point is broken (PUBDEV-280)
  • Flow: SplitFrame and grep in algorithms for flow and loops back onto itself (PUBDEV-272)
  • Fixed the predict method (github)
  • Refactor ModelMetrics into a different class for Binomial (github)
  • /Predictions.json did not cache predictions (HEXDEV-119)
  • Flow, DL: Error after changing hidden layer size (PUBDEV-323)
  • Error in node$h2o$node: $ operator is invalid for atomic vectors (PUBDEV-348)
  • Fixed K-means predict (PUBDEV-321)
  • Flow: DL build mode fails => as it's missing adding quotes to parameter (PUBDEV-301)
  • Flow: Build K means model with training/validation frames => unknown error (PUBDEV-185)
  • Flow: Build quantile mode=> Click goes in loop (PUBDEV-188)

#####API

#####System

  • guesser needs to send types to parse (PUBDEV-279)
  • Got h2o.clusterStatus function working in R. (github)
  • Parse: Using R => java.lang.NullPointerException (PUBDEV-380)
  • Flow: Jobs => click on destination key => unimplemented: Unexpected val class for Inspect: class water.fvec.DataFrame (PUBDEV-363)
  • Column assignment in R exposes NullPointerException in Rollup (PUBDEV-155)
  • import from hdfs doesn't add files (PUBDEV-260)
  • AssertionError: ERROR: got tcp resend with existing in-progress task (PUBDEV-219)
  • HDFS parse fails when H2O launched on Spark CDH5 (PUBDEV-138)
  • Flow: Parse failure => java.lang.ArrayIndexOutOfBoundsException (PUBDEV-296)
  • "predict" step is not working in flow (PUBDEV-202)
  • Flow: Frame finishes parsing but comes up as null in flow (PUBDEV-270)
  • scala >flightsToORD.first() fails with "not serializable result" (PUBDEV-304)
  • DL throws NPE for bad column names (PUBDEV-15)
  • Flow: Build model: Not able to build KMeans/Deep Learning model (PUBDEV-297)
  • Flow: Col summary for NA/Y cols breaks (PUBDEV-325)
  • Sparkling Water : util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread NanoHTTPD Session,9,main (PUBDEV-346)
  • toDataFrame doesn't support sequence format schema (array, vectorUDT) (PUBDEV-457)

###0.1.20.1019 - 1/19/15

####New Features

#####UI

  • Added various documentation links to the build page (github)

#####Algorithms

  • Ported matrix multiply over and connected it to rapids (github)

####Enhancements

#####UI

  • Allow user to specify (the log of) the number of rows per chunk for a new constant chunk; use this new function in CreateFrame (github)
  • Make CreateFrame non-blocking, now displays progress bar in Flow (github)
  • Add row and column count to H2OFrame show method (github)
  • Admin watermeter page (PUBDEV-234)
  • Admin stack trace (PUBDEV-228)
  • Admin profile (PUBDEV-227)
  • Flow: Add download logs in UI (PUBDEV-204)
  • Need shutdown, minimally like h2o (PUBDEV-74)

#####API

  • Changed 2 to 3 for JSON requests (github)
  • Rename some more fields per consistency (max_iters changed to max_iterations, _iters to _iterations, _ncats to _categorical_column_count, _centersraw to centers_raw, _avgwithinss to avg_within_ss, _withinmse to within_mse) (github)
  • Changed K-Means output parameters (withinmse to within_mse, avgss to avg_ss, avgbetweenss to avg_between_ss) (github)
  • Remove default field values from DeepLearning parameters schema, since they come from the backing class (github)
  • Add @API help annotation strings to JSON model output (PUBDEV-216)

#####Algorithms

  • Minor fix in rapids matrix multiplication (github)
  • Updated sparse chunk to cut off binary search for prefix/suffix zeros (github)
  • Updated L_BFGS for GLM - warm-start solutions during lambda search, correctly pass current lambda value, added column-based gradient task (github)
  • Fix model parameters' default values in the metadata (github)
  • Set the default value of k (number of clusters) to 1 for K-Means (PUBDEV-251)

#####System

  • Reject any training data with non-numeric values from KMeans model building (github)

####Bug Fixes

#####API

  • Fixed isSparse call for constant chunks (github)
  • Fixed sparse interface of constant chunks (no nonzero if const 1= 0) (github)

#####System

  • Typeahead for folder contents apparently requires trailing "/" (github)
  • Fix build and instructions for R install.packages() style of installation; Note we only support source installs now (github)
  • Fixed R test runner h2o package install issue that caused it to fail to install on dev builds (github)

###0.1.18.1013 - 1/14/15

####New Features

#####UI

####Enhancements

#####Algorithms


###0.1.20.1016 - 12/28/14

  • Added ip_port field in node json output for Cloud query (github)