#Recent Changes
##H2O-Dev
###Selberg (0.2.0.1) - 3/6/15
####New Features
The following features have been added since the last release:
#####Web UI
- Flow: Delete functionality to be available for import files, jobs, models, frames (PUBDEV-241)
- Implement "Download Flow" (PUBDEV-407)
- Flow: Implement "Run All Cells" (PUBDEV-110)
#####API
- Create python package (PUBDEV-181)
- as.h2o in Python (HEXDEV-72)
#####System
####Enhancements
The following changes are improvements to existing features (which includes changed default values):
#####Web UI
- Flow: Job view should have info on start and end time (PUBDEV-267)
- Flow: Implement 'File > Open' (PUBDEV-408)
- Display IP address in ADMIN -> Cluster Status (HEXDEV-159)
- Flow: Display alternate UI for splitFrames() (PUBDEV-399)
#####Algorithms
- Added K-Means scoring (github)
- Flow: Implement model output for Deep Learning (PUBDEV-118)
- Flow: Implement model output for GLM (PUBDEV-120)
- Deep Learning model output (HEXDEV-89, Flow), (HEXDEV-88, Python), (HEXDEV-87, R)
- Run GLM Binomial from Flow (including LBFGS) (HEXDEV-90)
- Flow: Display confusion matrices for multinomial models (PUBDEV-397)
- During PCA, missing values in training data will be replaced with column mean (github)
- Update parameters for best model scan (github)
- Change Quantiles to match h2o-1; both Quantiles and Rollups now have the same default percentiles (github)
- Massive cleanup and removal of old PCA, replacing with quadratically regularized PCA based on alternating minimization algorithm in GLRM (github)
- Add model run time to DL Model Output (github)
- Don't gather Neurons/Weights/Biases statistics (github)
- Only store best model if `override_with_best_model` is enabled (github)
- `beta_eps` added, passing tests changed (github)
- For GLM, default values for the `max_iters` parameter were changed from 1000 to 50.
- For quantiles, probabilities are displayed.
- Run Deep Learning Multinomial from Flow (HEXDEV-108)
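The PCA change above replaces missing training values with the column mean before fitting. The following is a minimal pure-Python sketch of that imputation step, not H2O's code. It makes two assumptions: `None` and NaN both count as missing, and a column with no observed values stays NaN.

```python
import math


def is_missing(x):
    """Treat None and float NaN as missing values (an assumption for this sketch)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def impute_column_means(rows):
    """Return a copy of `rows` (a list of equal-length lists) in which every missing
    entry is replaced by the mean of the observed entries in the same column."""
    if not rows:
        return []
    ncols = len(rows[0])
    means = []
    for j in range(ncols):
        observed = [row[j] for row in rows if not is_missing(row[j])]
        # An all-missing column has no mean; keep NaN there (assumption, not H2O behavior).
        means.append(sum(observed) / len(observed) if observed else math.nan)
    return [
        [means[j] if is_missing(row[j]) else row[j] for j in range(ncols)]
        for row in rows
    ]


data = [[1.0, None], [3.0, 4.0], [float("nan"), 8.0]]
print(impute_column_means(data))  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```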
#####API
- Expose DL weights/biases to clients via REST call (PUBDEV-344)
- Flow: Implement notification bar/API (PUBDEV-359)
- Variable importance data in REST output for GLM (PUBDEV-359)
- Add extra DL parameters to R API (`average_activation`, `sparsity_beta`, `max_categorical_features`, `reproducible`) (github)
- Update GLRM API model output (github)
- h2o.anomaly missing in R (PUBDEV-434)
- No method to get enum levels (PUBDEV-432)
#####System
- Improve memory footprint with latest version of h2o-dev (github)
- For now, let model.delete() of DL delete its best models too. This allows R code to not leak when only calling h2o.rm() on the main model. (github)
- Bind both TCP and UDP ports before clustering (github)
- Round the summary row number; this helps with percentiles for very small row counts. Added a test that checks for results close to the 50th percentile on small row counts. (github)
- Increase Max Value size in DKV to 256MB (github)
- Flow: make parseRaw() do both import and parse in sequence (HEXDEV-184)
- Remove notion of individual job/job tracking from Flow (PUBDEV-449)
- Capability to name prediction results Frame in flow (PUBDEV-233)
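The summary-percentile fix for very small row counts (see also PUBDEV-95) concerns how a quantile is chosen when only a handful of rows exist. The sketch below uses linear interpolation between closest ranks, the "type 7" definition that is the default in R's `quantile()` and NumPy's `quantile()`. It is an assumption that H2O's summary uses exactly this rule. With five rows, the 0.5 quantile falls exactly on the third sorted value.

```python
import math


def quantile_linear(values, p):
    """p-quantile of `values` by linear interpolation between closest ranks
    ("type 7"). Assumes `values` holds numbers with no missing entries."""
    if not values:
        raise ValueError("quantile of empty data is undefined")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(values)
    h = (len(xs) - 1) * p          # fractional, 0-based rank of the quantile
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)  # clamp so p = 1.0 stays in range
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


five_rows = [7, 1, 5, 3, 9]
print(quantile_linear(five_rows, 0.5))  # 5.0, the third of the sorted values 1, 3, 5, 7, 9
```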
####Bug Fixes
The following changes are to resolve incorrect software behavior:
#####Algorithms
- GLM binomial prediction failing (PUBDEV-403)
- DL: Predict with auto encoder enabled gives Error processing error (PUBDEV-433)
- balance_classes in Deep Learning intermittent poor result (PUBDEV-437)
- Flow: Building GLM model fails (PUBDEV-186)
- summary returning incorrect 0.5 quantile for 5 row dataset (PUBDEV-95)
- GBM missing variable importance and balance-classes (PUBDEV-309)
- H2O Dev GBM first tree differs from H2O 1 (PUBDEV-421)
- get glm model from flow fails to find coefficient name field (PUBDEV-394)
- GBM/GLM build model fails on Hadoop after building 100% => Failed to find schema for version: 3 and type: GBMModel (PUBDEV-378)
- Parsing KDD wrong (PUBDEV-393)
- GLM AIOOBE (PUBDEV-199)
- Flow : Build GLM Model with family poisson => java.lang.ArrayIndexOutOfBoundsException: 1 at hex.glm.GLM$GLMLambdaTask.needLineSearch(GLM.java:359) (PUBDEV-210)
- Flow : GLM Model Error => Enum conversion only works on small integers (PUBDEV-365)
- GLM binary response, do_classification=FALSE, family=binomial, prediction error (PUBDEV-339)
- Epsilon missing from GLM parameters (PUBDEV-354)
- GLM NPE (PUBDEV-395)
- Flow: GLM bug (or incorrect output) (PUBDEV-252)
- GLM binomial on benign.csv gets assertion error in predict (PUBDEV-132)
- current summary default_pctiles doesn't have 0.001 and 0.999 like h2o1 (PUBDEV-94)
- Flow: Build GBM/DL Model: java.lang.IllegalArgumentException: Enum conversion only works on integer columns (PUBDEV-213) (github)
- ModelMetrics on cup98VAL_z dataset has response with many nulls (PUBDEV-214)
- GBM : Predict model category output/inspect parameters shows as Regression when model is built with do classification enabled (PUBDEV-441)
#####System
- Null columnTypes for /smalldata/arcene/arcene_train.data (PUBDEV-406) (github)
- Flow: Waiting for -1 responses after starting h2o on hadoop cluster of 5 nodes (PUBDEV-419)
- Parse: airlines_all.csv => Airtime type shows as ENUM instead of Integer (PUBDEV-426) (github)
- Flow: Typo - "Time" option displays twice in column header type menu in Parse (PUBDEV-446)
- Duplicate validation messages in k-means output (PUBDEV-305) (github)
- Fixes Parse so that it returns to supplying generic column names when no column names exist (github)
- Flow: Import File: File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
- Flow: Parse => 1m.svm hangs at 42% (HEXDEV-174)
- Prediction NFE (PUBDEV-308)
- NPE doing Frame to key before it's fully parsed (PUBDEV-79)
- h2o_master_DEV_gradle_build_J8#351 hangs for past 17 hrs (PUBDEV-239)
- Sparkling water - container exited due to unavailable port (PUBDEV-357)
#####API
- Flow: Splitframe => java.lang.ArrayIndexOutOfBoundsException (PUBDEV-410) (github)
- Incorrect dest.type, description in /CreateFrame jobs (PUBDEV-404)
- space in windows filename on python (PUBDEV-444)
- Python end-to-end data science example 1 runs correctly (PUBDEV-182)
- 3/NodePersistentStorage.json/foo/id should throw 404 instead of 500 for 'not-found' (HEXDEV-163)
- POST /3/NodePersistentStorage.json should handle Content-Type:multipart/form-data (HEXDEV-165)
- by class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: *** Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-92)
- Sparkling water : val train:DataFrame = prostateRDD => Fails with ArrayIndexOutOfBoundsException (PUBDEV-392)
- Flow : getModels produces error: Error calling GET /3/Models.json (PUBDEV-254)
- ddply 'Could not find the operator' (HEXDEV-162) (github)
- h2o.table AIOOBE during NewChunk creation (HEXDEV-161) (github)
- Fix warning in h2o.ddply when supplying multiple grouping columns (github)
###0.1.26.1051 - 2/13/15
####New Features
- Flow: Display alternate UI for splitFrames() (PUBDEV-399)
####Enhancements
#####System
- Embedded H2O config can now provide flat file (needed for Hadoop) (github)
- Don't log GET requests for individual jobs, to avoid filling up the logs (github)
#####Algorithms
- Increase GBM/DRF factor binning back to historical levels. Had been capped accidentally at nbins (typically 20), was intended to support a much higher cap. (github)
- Tweaked rho heuristic in glm (github)
- Enable variable importances for autoencoders (github)
- Removed `group_split` option from GBM
- Flow: display varimp for GBM output (PUBDEV-398)
- variable importance for GBM (github)
- GLM in H2O-Dev may provide slightly different coefficient values when applying an L1 penalty in comparison with H2O1.
####Bug Fixes
#####Algorithms
- Fixed bug in GLM exception handling causing GLM jobs to hang (github)
- Fixed a bug in kmeans input parameter schema where init was always being set to Furthest (github)
- Fixed mean computation in GLM (github)
- Fixed kmeans.R (github)
- Flow: Building GBM model fails with Error executing javascript (PUBDEV-396)
#####System
###0.1.26.1032 - 2/6/15
####New Features
#####General Improvements
- better model output
- support for Python client
- support for Maven
- support for Sparkling Water
- support for REST API schema
- support for Hadoop CDH5 (github)
#####UI
- Display summary visualizations by default in column summary output cells (PUBDEV-337)
- Display AUC curve by default in binomial prediction output cells (PUBDEV-338)
- Flow: Implement About H2O/Flow with version information (PUBDEV-111)
- Add UI for CreateFrame (PUBDEV-218)
- Flow: Add ability to cancel running jobs (PUBDEV-373)
- Flow: warn when user navigates away while having unsaved content (PUBDEV-322)
#####Algorithms
- Implement splitFrame() in Flow (PUBDEV-356)
- Variable importance graph in Flow for GLM (PUBDEV-360)
- Flow: Implement model building form init and validation (PUBDEV-102)
- Added a shuffle-and-split-frame function; Use it to build a saner model on time-series data (github)
- Added binomial model metrics (github)
- Run KMeans from R (HEXDEV-105)
- Be able to create a new GLM model from an existing one with updated coefficients (HEXDEV-48)
- Run KMeans from Python (HEXDEV-106)
- Run Deep Learning Binomial from Flow (HEXDEV-83)
- Run KMeans from Flow (HEXDEV-104)
- Run Deep Learning from Python (HEXDEV-85)
- Run Deep Learning from R (HEXDEV-84)
- Run Deep Learning Multinomial from Flow (HEXDEV-108)
- Run Deep Learning Regression from Flow (HEXDEV-109)
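The shuffle-and-split-frame function mentioned above randomizes row order before cutting a frame into parts. A minimal sketch of that idea on row indices follows. `shuffle_split` is a hypothetical helper and not H2O's `splitFrame()`. The convention that `ratios` sizes every part except the last, which takes the remainder, is also an assumption.

```python
import random


def shuffle_split(n_rows, ratios, seed=None):
    """Hypothetical helper: shuffle row indices 0..n_rows-1 with a seeded RNG, then cut
    them into len(ratios) + 1 consecutive parts. Each ratio sizes one part (rounded
    down); the final part receives whatever rows remain."""
    if any(r < 0 for r in ratios) or sum(ratios) > 1.0 + 1e-9:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # a fixed seed makes the split reproducible
    parts, start = [], 0
    for r in ratios:
        end = start + int(r * n_rows)
        parts.append(indices[start:end])
        start = end
    parts.append(indices[start:])
    return parts


train, valid, test = shuffle_split(10, [0.6, 0.2], seed=42)
print(len(train), len(valid), len(test))  # 6 2 2
```

Shuffling before cutting means each part draws rows from the whole frame instead of from one contiguous block of it.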
#####API
- Flow: added REST API documentation to the web ui (PUBDEV-60)
- Flow: Implement visualization API (PUBDEV-114)
#####System
- Dataset inspection from Flow (HEXDEV-66)
- Basic data munging (Rapids) from R (HEXDEV-70)
- Implement stack operator/stacking in Lightning (HEXDEV-128)
####Enhancements
#####UI
- Added better message when `h2o.init()` has not yet been called (`No active connection to an H2O cluster. Try calling "h2o.init()"`) (github)
#####Algorithms
- Updated the loss behavior for GBM. When `loss` is set to `AUTO` and the response is an integer with 2 levels, bernoulli (rather than gaussian) behavior is chosen. As a result, the `do_classification` flag is no longer necessary in Flow, since the loss completely specifies the desired behavior, and R users no longer need to use `as.factor()` on their response to get the desired bernoulli behavior.
- Updated column-based gradient task to use sparse interface (github)
- Updated LBFGS (added progress monitor interface, updated some default params), added progress and job support to GLM lbfgs (github)
- Added pretty print (github)
- Added AutoEncoder to R model categories (github)
- Added Coefficients table to GLM model (github)
- Updated glm lbfgs to allow for efficient lambda-search (l2 penalty only) (github)
- Removed splitframe shuffle parameter (github)
- Simplified model builders and added deeplearning model builder (github)
- Add DL model outputs to Flow (PUBDEV-372)
- Flow: Deep Learning: Expert Mode (PUBDEV-284)
- Flow: Display multinomial and regression DL model outputs (PUBDEV-383)
- Display varimp details for DL models (PUBDEV-381)
- Make binomial response "0" and "1" by default (github)
- Update R GBM demos to reflect new input parameter names (github)
- Rename GLM variable importance to normalized coefficient magnitudes (github)
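The GBM `loss=AUTO` rule described above can be sketched as a small decision function. This simplified illustration is not H2O's implementation. `resolve_auto_loss` is a hypothetical name, and the function only inspects a numeric response column. It ignores categorical (enum) responses and any multinomial case, because the release note does not describe them.

```python
def resolve_auto_loss(response):
    """Hypothetical sketch of the AUTO loss choice from the release note:
    an integer-valued response with exactly 2 distinct levels -> "bernoulli",
    anything else -> "gaussian". Missing values (None) are skipped."""
    values = [v for v in response if v is not None]
    integer_valued = all(
        isinstance(v, int) or (isinstance(v, float) and v.is_integer())
        for v in values
    )
    if integer_valued and len(set(values)) == 2:
        return "bernoulli"
    return "gaussian"


print(resolve_auto_loss([0, 1, 1, 0, 1]))   # bernoulli
print(resolve_auto_loss([1.5, 2.0, 3.25]))  # gaussian
```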
#####API
- Changed `key` to `destination_key` (github)
- Cleaned up REST API schema interface (github)
- Changed method name, cleaned setup, added a pyunit runner (github)
#####System
- Allow changing column types during parse-setup (PUBDEV-376)
- Display %NAs in model builder column lists (PUBDEV-375)
- Figure out how to add H2O to PyPI (PUBDEV-178)
####Bug Fixes
#####UI
- Flow: Parse => 1m.svm hangs at 42% (PUBDEV-345)
- cup98 Dataset has columns that prevent validation/prediction (PUBDEV-349)
- Flow: predict step failed to function (PUBDEV-217)
- Flow: Arrays of numbers (ex. hidden in deeplearning) require brackets (PUBDEV-303)
- Flow v.0.1.26.1030: StackTrace was broken (PUBDEV-371)
- Flow: Import files -> Search -> Parse these files -> null pointer exception (PUBDEV-170)
- Flow: "getJobs" not working (PUBDEV-320)
- Thresholds x Metrics and Max Criteria x Metrics tables were flipped in flow (HEXDEV-155)
- Flow v.0.1.26.1030: StackTrace is broken (PUBDEV-348)
- flow: getJobs always shows "Your H2O cloud has no jobs" (PUBDEV-243)
- Flow: First and last characters deleted from ignored columns (PUBDEV-300)
- Sparkling water => Flow => Menu buttons for cell do not show up (PUBDEV-294)
#####Algorithms
- Flow: Build K Means model with default K value gives error "Required field k not specified" (PUBDEV-167)
- Slicing out a specific data point is broken (PUBDEV-280)
- Flow: SplitFrame and grep in algorithms for flow and loops back onto itself (PUBDEV-272)
- Fixed the predict method (github)
- Refactor ModelMetrics into a different class for Binomial (github)
- /Predictions.json did not cache predictions (HEXDEV-119)
- Flow, DL: Error after changing hidden layer size (PUBDEV-323)
- Error in node$h2o#node: $ operator is invalid for atomic vectors (PUBDEV-348)
- Fixed K-means predict (PUBDEV-321)
- Flow: DL build mode fails => as it's missing adding quotes to parameter (PUBDEV-301)
- Flow: Build K means model with training/validation frames => unknown error (PUBDEV-185)
- Flow: Build quantile mode=> Click goes in loop (PUBDEV-188)
#####API
- Sparkling Water/Flow: Failed to find version for schema (PUBDEV-367)
- Cloud.json returns odd node name (PUBDEV-259)
#####System
- guesser needs to send types to parse (PUBDEV-279)
- Got h2o.clusterStatus function working in R. (github)
- Parse: Using R => java.lang.NullPointerException (PUBDEV-380)
- Flow: Jobs => click on destination key => unimplemented: Unexpected val class for Inspect: class water.fvec.DataFrame (PUBDEV-363)
- Column assignment in R exposes NullPointerException in Rollup (PUBDEV-155)
- import from hdfs doesn't add files (PUBDEV-260)
- AssertionError: ERROR: got tcp resend with existing in-progress task (PUBDEV-219)
- HDFS parse fails when H2O launched on Spark CDH5 (PUBDEV-138)
- Flow: Parse failure => java.lang.ArrayIndexOutOfBoundsException (PUBDEV-296)
- "predict" step is not working in flow (PUBDEV-202)
- Flow: Frame finishes parsing but comes up as null in flow (PUBDEV-270)
- scala> flightsToORD.first() fails with "not serializable result" (PUBDEV-304)
- DL throws NPE for bad column names (PUBDEV-15)
- Flow: Build model: Not able to build KMeans/Deep Learning model (PUBDEV-297)
- Flow: Col summary for NA/Y cols breaks (PUBDEV-325)
- Sparkling Water : util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread NanoHTTPD Session,9,main (PUBDEV-346)
###0.1.20.1019 - 1/19/15
####New Features
#####UI
- Added various documentation links to the build page (github)
#####Algorithms
- Ported matrix multiply over and connected it to rapids (github)
####Enhancements
#####UI
- Allow user to specify (the log of) the number of rows per chunk for a new constant chunk; use this new function in CreateFrame (github)
- Make CreateFrame non-blocking, now displays progress bar in Flow (github)
- Add row and column count to H2OFrame show method (github)
- Admin watermeter page (PUBDEV-234)
- Admin stack trace (PUBDEV-228)
- Admin profile (PUBDEV-227)
- Flow: Add download logs in UI (PUBDEV-204)
- Need shutdown, minimally like h2o (PUBDEV-74)
#####API
- Changed 2 to 3 for JSON requests (github)
- Rename some more fields for consistency (`max_iters` changed to `max_iterations`, `_iters` to `_iterations`, `_ncats` to `_categorical_column_count`, `_centersraw` to `centers_raw`, `_avgwithinss` to `avg_within_ss`, `_withinmse` to `within_mse`) (github)
- Changed K-Means output parameters (`withinmse` to `within_mse`, `avgss` to `avg_ss`, `avgbetweenss` to `avg_between_ss`) (github)
- Remove default field values from DeepLearning parameters schema, since they come from the backing class (github)
- Add @API help annotation strings to JSON model output (PUBDEV-216)
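Scripts written against the earlier field names need updating after these renames. The following is a minimal Python sketch of a rename table. The old/new pairs are copied from the two rename entries above. `OLD_TO_NEW` and `rename_fields` are hypothetical names for illustration and are not part of any H2O client. Both `withinmse` and `_withinmse` map to `within_mse`, so the sketch rejects input that would collapse two keys onto one.

```python
# Old -> new field names, copied from the two rename entries in this release.
OLD_TO_NEW = {
    "max_iters": "max_iterations",
    "_iters": "_iterations",
    "_ncats": "_categorical_column_count",
    "_centersraw": "centers_raw",
    "_avgwithinss": "avg_within_ss",
    "_withinmse": "within_mse",
    "withinmse": "within_mse",
    "avgss": "avg_ss",
    "avgbetweenss": "avg_between_ss",
}


def rename_fields(fields):
    """Hypothetical helper: return a new dict with old field names replaced by their
    new names; names not in OLD_TO_NEW pass through unchanged."""
    renamed = {}
    for name, value in fields.items():
        new_name = OLD_TO_NEW.get(name, name)
        if new_name in renamed:
            # Two inputs (e.g. "withinmse" and "_withinmse") would land on one key.
            raise ValueError(f"field collision on {new_name!r}")
        renamed[new_name] = value
    return renamed


print(rename_fields({"max_iters": 10, "k": 3}))  # {'max_iterations': 10, 'k': 3}
```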
#####Algorithms
- Minor fix in rapids matrix multiplication (github)
- Updated sparse chunk to cut off binary search for prefix/suffix zeros (github)
- Updated L_BFGS for GLM - warm-start solutions during lambda search, correctly pass current lambda value, added column-based gradient task (github)
- Fix model parameters' default values in the metadata (github)
- Set default value of k = number of clusters to 1 for K-Means (PUBDEV-251)
#####System
- Reject any training data with non-numeric values from KMeans model building (github)
####Bug Fixes
#####API
- Fixed isSparse call for constant chunks (github)
- Fixed sparse interface of constant chunks (no nonzero if const != 0) (github)
#####System
- Typeahead for folder contents apparently requires trailing "/" (github)
- Fix build and instructions for R install.packages() style of installation; Note we only support source installs now (github)
- Fixed R test runner h2o package install issue that caused it to fail to install on dev builds (github)
###0.1.18.1013 - 1/14/15
####New Features
#####UI
- Admin timeline (PUBDEV-226)
- Admin cluster status (PUBDEV-225)
- Markdown cells should auto run when loading a saved Flow notebook (PUBDEV-87)
- Complete About page to include info about the H2O version (PUBDEV-223)
####Enhancements
#####Algorithms
- Flow: Implement model output for GBM (PUBDEV-119)
###0.1.20.1016 - 12/28/14
- Added ip_port field in node json output for Cloud query (github)