#Recent Changes
##H2O
###Shannon (3.0.0.24) - 6/25/15
####New Features
The following changes represent features that have been added since the previous release:
#####Algorithms
- GitHub: Allow validation for unsupervised models.
#####Python
- GitHub: add h2o.set_timezone, h2o.get_timezone, and h2o.list_timezones to python client and respective pyunit.
- GitHub: add h2o.save_model and h2o.load_model to python client and respective pyunit
####Enhancements
The following changes are improvements to existing features (which includes changed default values):
#####Algorithms
- GitHub: Fix weights for GBM - add weight correction to Gamma computation.
- GitHub: Skip rows with weight 0.
- GitHub: x_ignore must be set when autoencoder is TRUE
#####System
- GitHub: Fix Java bindings generator to generate code under project's location.
- GitHub: Adds input parameter check to ParseSetup.
####Bug Fixes
The following changes are to resolve incorrect software behavior:
#####Algorithms
- PUBDEV-1529: dl with ae: get java.lang.UnsupportedOperationException: Trying to predict with an unstable model.
- GitHub: Bring back accidentally removed hiding of classification-related fields for unsupervised models.
#####API
- PUBDEV-1456: fix REST API POJO generation for enums and add java.util.Map import
###Shannon (3.0.0.23) - 6/19/15
####New Features
#####Algorithms
- HEXDEV-21: Offset for GLM
- HEXDEV-208: Add observation weights to GLM (was HEXDEV-4)
- PUBDEV-677: Add observation weights to all metrics
- PUBDEV-675: Pass a weight Vec as input to all algos
- HEXDEV-6: Add observation weights to GBM
- HEXDEV-7: Add observation weights to DL
- HEXDEV-10: Add observation weights to DRF
- PUBDEV-291: Add observation weights to GLM, GBM, DRF, DL (classification)
- HEXDEV-332: Support Offsets for DL GitHub
- GitHub: Use weights/offsets in GBM.
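Observation weights are the main theme of this release. As a rough illustration (a minimal sketch assuming the usual weighted-average definition, not H2O's actual metric code), a weighted metric such as MSE divides the weighted error sum by the sum of the weights, so a weight of 2 behaves like a duplicated row and a weight of 0 removes the row entirely:

```python
from typing import Sequence


def weighted_mse(actual: Sequence[float], predicted: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Weighted mean squared error: sum(w * err**2) / sum(w)."""
    if not (len(actual) == len(predicted) == len(weights)):
        raise ValueError("actual, predicted and weights must have equal length")
    if any(w < 0 for w in weights):
        raise ValueError("negative weights are not allowed")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all weights are zero")
    # Rows with weight 0 add nothing to either sum, so they are effectively skipped.
    weighted_sq_err = sum(w * (a - p) ** 2
                          for a, p, w in zip(actual, predicted, weights))
    return weighted_sq_err / total_weight


# The third row has a large error but weight 0, so it is ignored.
print(weighted_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [1.0, 1.0, 0.0]))  # 0.0
# A weight of 2 gives the same result as duplicating the row.
print(weighted_mse([1.0, 3.0], [0.0, 3.0], [2.0, 1.0]) ==
      weighted_mse([1.0, 1.0, 3.0], [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]))  # True
```

The explicit checks for negative and all-zero weights mirror the kinds of inputs that the weight-related bug reports in these notes describe.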
#####API
- PUBDEV-61: do back-end work to allow document navigation from one Schema to another
- PUBDEV-133: doing summary means calling it with each columns name, index not supported?
#####Python
- GitHub: add num_iterations accessor to python client and respective pyunit
- GitHub: add score_history accessor to python client and respective pyunit
- GitHub: add hit ratio table accessor to python interface and respective pyunit
- GitHub: add h2o.naivebayes and respective pyunits
- GitHub: add h2o.prcomp and respective pyunits.
- PUBDEV-681: Add user-given input weight parameters to Python
- GitHub: add h2o.create_frame to python client and respective pyunit
- GitHub: add h2o.interaction and respective pyunit
- GitHub: add h2o.strsplit to python client and respective pyunit
- GitHub: add h2o.toupper and h2o.tolower to python client and respective pyunit
- GitHub: add h2o.sub and h2o.gsub to python interface and respective pyunit
- GitHub: add h2o.trim() to python client and respective pyunit
- GitHub: add h2o.rep_len to python client and respective pyunit
- GitHub: add h2o.svd to python client and respective golden pyunit
- GitHub: add scree plot functionality to python client and respective pyunit
- GitHub: add plotting functionality to python client and respective pyunit
#####R
- GitHub: added h2o.weights and h2o.biases accessors to R client and update respective runit
- GitHub: add h2o.centroid_stats to R client and respective runit
- PUBDEV-680: Add user-given input weight parameters to R
- GitHub: Add offset/weights to DRF/GBM R wrappers.
#####Web UI
- PUBDEV-1513: Add cancelJob() routine to Flow
####Enhancements
#####Algorithms
- PUBDEV-676: Use the user-given weight Vec as observation weights for all algos
- GitHub: Refactor the code to let the caller compute the weighted sigma.
- GitHub: Modify prior class distribution to be computed from weighted response.
- GitHub: Put back the defaultThreshold that's based on training/validation metrics. Was accidentally removed together with SupervisedModel.
- GitHub: Always sample to at least #class labels when doing stratified sampling.
- GitHub: Cutout for NAs in GLM score0(data[],...), same as for score0(Chunk[],…)
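The prior-distribution change above can be illustrated as follows: once observation weights are involved, a class's prior is its share of the total weight rather than its share of the rows. A minimal sketch (the function name `weighted_class_prior` is hypothetical, not part of H2O):

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def weighted_class_prior(labels: Sequence[Hashable],
                         weights: Sequence[float]) -> Dict[Hashable, float]:
    """Share of the total observation weight carried by each class label."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for label, w in zip(labels, weights):
        totals[label] += w
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total weight must be positive")
    return {label: t / grand_total for label, t in totals.items()}


labels = ["yes", "no", "no"]
# Unweighted, "no" is the majority class (2/3 of rows)...
print(weighted_class_prior(labels, [1.0, 1.0, 1.0]))
# ...but with these weights "yes" carries 2/3 of the total weight.
print(weighted_class_prior(labels, [2.0, 0.5, 0.5]))
```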
#####R
- PUBDEV-856: All h2o things in R should have an `h2o.something` version so it's unambiguous GitHub
- GitHub: export clusterIsUp and clusterInfo commands
- GitHub: update accessors in the shim
- GitHub: gbm with async exec
#####System
- HEXDEV-361: Wide frame handling for model builders
- GitHub: Remove application plugin from assembly to speedup build process.
- GitHub: add byteSize to ls
- GitHub: option to launch randomForest async
- GitHub: Return HDFS persist manager for URIs starting with s3n and s3a
- GitHub: quote strings when writing to disk
####Bug Fixes
#####Algorithms
- PUBDEV-1217: pca: when cancel the job the key remains locked
- PUBDEV-1468: Error in GBM if response column is constant GitHub
- PUBDEV-1476: dl with obs weights: nas in weights cause 'java.lang.AssertionError GitHub
- PUBDEV-1458: pca: data with nas, v2 vs v3 slightly different results GitHub
- PUBDEV-1477: dl w/obs wts: when all wts are zero, get java.lang.AssertionError GitHub
- GitHub: Fix check for offset (allow offset for logistic regression).
- GitHub: Gracefully handle exception when launching single-node DRF/GBM in client mode.
- GitHub: Hack around the fact that hasWeights()/hasOffset() isn't available on remote nodes and that SharedTree is sent to remote nodes and its private internal classes need access to the above methods...
- GitHub: Fix scoring when NAs are predicted.
#####Python
- PUBDEV-1469: pyunit_citi_bike_large.py : test failing consistently on regression jobs
- PUBDEV-1472: Regression job : Pyunit small tests groupie and pub_444_spaces failing consistently
- PUBDEV-1372: Regression of pyunit_small, Groupby.py
- PUBDEV-1386: intermittent fail in pyunit_citi_bike_small.py: -Unimplemented- failed lookup on token
- PUBDEV-1471: pyunit_citi_bike_small.py : failing consistently on regression jobs
- PUBDEV-1466: matplotlib.pyplot import failure on MASTER jenkins pyunit small jobs GitHub
- GitHub: minor fix to python's h2o.create_frame
- GitHub: update the path to jar in connection.py
#####R
- PUBDEV-1475: Client mode failed tests : runit_GBM_one_node.R, runit_RF_one_node.R, runit_v_3_apply.R, runit_v_4_createfunctions.R GitHub
- PUBDEV-1235: Split Frame causes AIOOBE on Chicago crimes data GitHub
- PUBDEV-746: runit_demo_NOPASS_h2o_impute_R : h2o.impute() is missing. seems like we want that?
- PUBDEV-582: H2O-R- does not give the full column summary
- PUBDEV-1473: Regression : Runit small jobs failing on tests :
- PUBDEV-741: runit_NOPASS_pub-668 R tests uses all() ...h2o says all is unimplemented
- PUBDEV-1506: R: h2o.ls() needs to return data sizes
- PUBDEV-1436: Intermittent runit fail : runit_GBM_ecology.R GitHub
- PUBDEV-1464: R: toupper/tolower don't work GitHub GitHub
- PUBDEV-1194: R: dataset is imported but can't return head of frame
#####Sparkling Water
- PUBDEV-975: Download page for Sparkling Water should point to the right R-client and Python client
- PUBDEV-1428: Sparkling water => Flow => Million song/KDD Cup path issues GitHub
- PUBDEV-1433: Flow UI: Change Help > FAQ link to h2o-docs/index.html#FAQ
###Shannon (3.0.0.22) - 6/13/15
####New Features
#####API
- PUBDEV-633: Generate Java bindings for REST API: POJOs for the entities (schemas)
#####Python
- GitHub: added h2o.anyfactor() and respective pyunit
- GitHub: add h2o.scale and respective pyunit
- GitHub: added levels, nlevels, setLevel and setLevels and respective pyunit...PUBDEV-1434 PUBDEV-1437 PUBDEV-1345 PUBDEV-1311
- GitHub: add H2OFrame.as_date and pyunit addition. H2OFrame.setLevel should return an H2OFrame, not an H2OVec.
####Enhancements
#####Algorithms
- GitHub: Add `_build_tree_one_node` option to GBM
- HEXDEV-352: Additional attributes on /Frames and /Frames/foo/summary
#####R
- PUBDEV-706: Release h2o-dev to CRAN
- Adding parameter `parse_type` to upload/import file (GitHub)
#####Python
#####System
- PUBDEV-717: refactor the duplicated code in FramesV2
- PUBDEV-1281: Add horizontal pagination of frames to Flow GitHub
- PUBDEV-607: Add Xmx reporting to GA
- GitHub: Added support for Freezable[][][] in serialization (added addAAA to AutoBuffer and DocGen; DocGen will just throw H2O.fail())
- GitHub: No longer set yyyy-MM-dd and dd-MMM-yy dates that precede the epoch to be NA. Negative time values are fine. This unifies these two time formats with the behavior of as.Date.
- GitHub: Reduces the verbosity of parse tracing messages.
- GitHub: Rename AUTO->GUESS for figuring out file type.
- HEXDEV-276: Add frame pagination
- PUBDEV-1405: Flow : Decision to be made on display of number of columns for wider datasets for Parse and Frame summary
- PUBDEV-1404: Usability improvements
- PUBDEV-244: "View Data" display may need to be modified/shortened.
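The date change above means a date before 1970-01-01 maps to a negative offset from the epoch instead of NA. The sketch below uses only Python's standard `datetime` module to show which values such dates correspond to, assuming UTC and millisecond resolution; it illustrates the convention and is not H2O's parser (`to_epoch_millis` is a hypothetical helper):

```python
from datetime import datetime, timezone


def to_epoch_millis(date_str: str, fmt: str) -> int:
    """Parse a date as midnight UTC; return milliseconds since 1970-01-01."""
    dt = datetime.strptime(date_str, fmt).replace(tzinfo=timezone.utc)
    return round(dt.timestamp() * 1000)


print(to_epoch_millis("1970-01-02", "%Y-%m-%d"))  # 86400000
print(to_epoch_millis("1969-12-31", "%Y-%m-%d"))  # -86400000 (before the epoch, still valid)
print(to_epoch_millis("1900-01-01", "%Y-%m-%d"))  # a large negative value, not NA
```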
####Bug Fixes
#####Algorithms
- PUBDEV-1365: GLM: Buggy when likelihood equals infinity
- PUBDEV-1394: GLM: Some offsets hang
- PUBDEV-1268: GLM: get java.lang.AssertionError at hex.glm.GLM$GLMSingleLambdaTsk.compute2 for attached data
- PUBDEV-1403: pca: h2o-3 reporting incorrect proportion of variance and cum prop GitHub
- HEXDEV-281: GLM - beta constraints with categorical variables fails with AIOOB
- HEXDEV-280: GLM - gradient not within tolerance when specifying beta_constraints w/ and w/o prior values
- PUBDEV-1425: Class Cast Exception ValStr to ValNum GitHub
- PUBDEV-1421: python client parse fail on hdfs /datasets/airlines/airlines.test.csv
- PUBDEV-1153: Demo: Airlines Demo in Python GitHub
- PUBDEV-1286: Python ifelse on H2OFrame never finishes
- PUBDEV-1435: Run.py modify to accept phantomjs timeout command line option GitHub
- PUBDEV-1154: Demo: Chicago Crime Demo in R
- PUBDEV-1240: Merge causes IllegalArgumentException
- PUBDEV-1447: R: no argument parser_type in h2o.uploadFile/h2o.importFile (GitHub)
- PUBDEV-1423: Phantomjs : Add timeout command line option
- PUBDEV-1401: Flow : Import file 15 M Rows 2.2K cols=> Parse these files => Change first column type => Unknown => Try to change other columns => Kind of hangs
- PUBDEV-1406: make the ParseSetup / Parse API more efficient for high column counts GitHub
###Shannon (3.0.0.21) - 6/12/15
####New Features
- HEXDEV-29: The ability to define features as categorical or continuous in the web UI and in the python API
####Enhancements
#####Algorithms
- GitHub Made intercept option public and added it to field list in parameter schema
- GitHub GLM: Updated null model intercept fit.
- GitHub GLM: Updated null-model constant term fitting when running with offset
- GitHub glm update
- GitHub DL code refactoring to reduce file sizes
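For intuition on fitting the null-model constant term with an offset: in the Poisson (log-link) case, the intercept-only model has a closed-form solution. This is a minimal sketch of that math with a hypothetical helper, not H2O's GLM code:

```python
import math
from typing import Sequence


def poisson_null_intercept(y: Sequence[float], offset: Sequence[float],
                           weights: Sequence[float]) -> float:
    """Intercept-only Poisson GLM (log link) with a fixed offset.

    Setting the score equation sum_i w_i * (y_i - exp(b + o_i)) = 0 and
    solving for b gives b = log(sum(w * y) / sum(w * exp(o))).
    """
    num = sum(w * yi for w, yi in zip(weights, y))
    den = sum(w * math.exp(o) for w, o in zip(weights, offset))
    if num <= 0 or den <= 0:
        raise ValueError("weighted response sum and offset term must be positive")
    return math.log(num / den)


counts = [2.0, 4.0, 6.0]
exposure = [1.0, 2.0, 3.0]
ones = [1.0, 1.0, 1.0]
# Without an offset the intercept is log(mean(y)) = log(4).
print(poisson_null_intercept(counts, [0.0] * 3, ones))
# With offset = log(exposure) it is log(total counts / total exposure) = log(2).
print(poisson_null_intercept(counts, [math.log(e) for e in exposure], ones))
```

For a binomial (logistic) null model with varying offsets there is no such closed form, so the constant term has to be found iteratively.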
#####Python
- GitHub add h2o.round() and h2o.signif() and additional pyunit checks
- GitHub add h2o.all() and respective pyunit checks
#####R
- GitHub added intercept option to R
#####System
- PUBDEV-607: Add Xmx reporting to GA GitHub
- GitHub Add horizontal pagination of /Frames to handle UI navigation of wide datasets more efficiently.
- GitHub Only show the top 7 metrics for the max metrics table
- GitHub Make the max metrics table entries be called `max f1`, etc.
####Bug Fixes
The following changes are to resolve incorrect software behavior:
- PUBDEV-1365: GLM: Buggy when likelihood equals infinity GitHub
- PUBDEV-1394: GLM: Some offsets hang
- PUBDEV-1268: GLM: get java.lang.AssertionError at hex.glm.GLM$GLMSingleLambdaTsk.compute2 for attached data
- PUBDEV-1382: pca: giving wrong std- dev for mentioned data
- PUBDEV-1383: pca: std dev numbers differ for v2 and v3 for attached data GitHub
- PUBDEV-1381: GBM, RF: get an NPE when run with a validation set with no response GitHub
- GitHub GLM fix - fixed fitting of null model constant term
- GitHub Fix remote bug
- GitHub Remove elastic averaging parameters from Flow.
- PUBDEV-1398: pca: predictions on the attached data from v2 and v3 differ
- PUBDEV-1286: Python ifelse on H2OFrame never finishes GitHub
- PUBDEV-761: Save model and restore model (from R)
- PUBDEV-1236: h2o-r/tests/testdir_misc/runit_mergecat.R failure (client mode only)
- PUBDEV-1402: move Rapids to /99 since it's going to be in flux for a while GitHub
- GitHub Fixes an operator precedence issue, and replaces debug GA target with actual one.
- GitHub Fix log download bug where all nodes were getting the same zip file.
###Shannon (3.0.0.18) - 6/9/15
####New Features
#####System
- PUBDEV-1163: implement h2o1-style model save/restore in h2o-3 GitHub
#####Python
- GitHub: Added --h2ojar option
####Enhancements
- PUBDEV-277: Make python equivalent of as.h2o() work for numpy array and pandas arrays
####Bug Fixes
#####Algorithms
- PUBDEV-1371: pca: get java.lang.AssertionError at hex.svd.SVD$SVDDriver.compute2(SVD.java:198)
- PUBDEV-1376: pca: predictions from h2o-3 and h2o-2 differs for attached data
- PUBDEV-1380: DL: when try to access the training frame from the link in the dl model get: Object not found
- PUBDEV-761: Save model and restore model (from R) GitHub
###Shannon (3.0.0.17) - 6/8/15
####New Features
- HEXDEV-209: Poisson distributions for GLM
- HEXDEV-210: Gamma distributions for GLM
- PUBDEV-1270: Python Interface needs H2O Cut Function GitHub
- PUBDEV-1242: Need equivalent of as.Date feature in Python GitHub
- PUBDEV-1165: H2O Python needs Modulus Operations
- HEXDEV-29: The ability to define features as categorical or continuous in the web UI and in the python API
- PUBDEV-1237: environment variable to disable the strict version check in the R and Python bindings
- PUBDEV-1175: Flow: Good interactive confusion matrix for binomial
- PUBDEV-1176: Flow: Good confusion matrix for multinomial
####Enhancements
#####Algorithms
- GitHub: GLM weights fix: regularize by sum of weights rather than number of observations
- GitHub: GLM fix: added line search (and limited number of iterations) to constant term model fitting with offset (could enter infinite loop)
- GitHub: No longer warn if `binomial_double_trees` option is enabled for `_nclass` != 2
- GitHub: Fix CM table to have integer entries unless there are real-valued entries
- GitHub: Add extra assertion for `train_samples_per_iteration`
- GitHub: Update model during runtime of algorithm.
- GitHub: Changes to glm forloop to add offsets and add NOPASS/NOFEATURE functionality back to run.py
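One way to read the GLM weights fix above: if the data term is averaged over the sum of the weights rather than the number of rows, rescaling every weight by the same constant no longer shifts the balance between the loss and the regularization penalty. A minimal sketch with a squared-error loss and an L2 penalty, both chosen for simplicity (this is not H2O's elastic-net objective, and `penalized_objective` is hypothetical):

```python
from typing import Sequence


def penalized_objective(y: Sequence[float], X: Sequence[Sequence[float]],
                        beta: Sequence[float], weights: Sequence[float],
                        lam: float, normalize_by_weights: bool = True) -> float:
    """Weighted squared-error loss plus an L2 penalty (illustrative only)."""
    def predict(row: Sequence[float]) -> float:
        return sum(b * xj for b, xj in zip(beta, row))

    weighted_loss = sum(w * (yi - predict(row)) ** 2 / 2
                        for yi, row, w in zip(y, X, weights))
    # Dividing by sum(weights) makes the data term a weighted average, so its
    # scale no longer depends on how the weights happen to be scaled.
    denom = sum(weights) if normalize_by_weights else len(y)
    penalty = lam * sum(b * b for b in beta) / 2
    return weighted_loss / denom + penalty


y, X, beta = [1.0, 2.0, 4.0], [[1.0], [2.0], [3.0]], [1.0]
w = [1.0, 1.0, 2.0]
w10 = [10 * wi for wi in w]
# Normalizing by sum(weights): rescaling all weights leaves the objective unchanged.
print(penalized_objective(y, X, beta, w, 0.1), penalized_objective(y, X, beta, w10, 0.1))
# Normalizing by the row count: the data term grows tenfold relative to the penalty.
print(penalized_objective(y, X, beta, w, 0.1, False),
      penalized_objective(y, X, beta, w10, 0.1, False))
```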
#####R
- GitHub: month was off by one, runit test edited
- GitHub: Comments to clarify the policy on dates in H2O.
#####System
- HEXDEV-344: Logs should include JVM launch parameters
- PUBDEV-467: Show Frames for DL weights/biases in Flow
- PUBDEV-1221: add a "I like this" style button with LinkedIn or Github (beside the Flow Assist Me button)
- PUBDEV-1245: Flow: use new `_exclude_fields` query parameter to speed up REST API usage
####Bug Fixes
#####Algorithms
- PUBDEV-1353: GLM: model with weights different in R than in H2o for attached data
- PUBDEV-1358: GLM: when run with -ive weights, would be good to tell the user that -ive weights not allowed instead of throwing exception
- PUBDEV-1264: GLM: reporting incorrect null deviance GitHub
- PUBDEV-1362: GLM: when run with weights and offset get wrong ans
- PUBDEV-1263: GLM: name ordering for the coefficients is incorrect GitHub
- PUBDEV-1261: pca: wrong std dev for data with nas rest numeric cols GitHub
- PUBDEV-1218: pca: progress bar not showing progress just the initial and final progress status GitHub
- PUBDEV-1204: pca: from flow when try to invoke build model, displays-ERROR FETCHING INITIAL MODEL BUILDER STATE
- PUBDEV-1212: pca: with enum column reporting (some junk) wrong stdev/ rotation GitHub
- PUBDEV-1228: pca: no std dev getting reported for attached data
- PUBDEV-1233: pca: std dev for attached data differ when run on h2o-3 and h2o-2
- PUBDEV-1258: h2o.glm with offset column: get Error in .h2o.startModelJob(conn, algo, params) : Offset column 'logInsured' not found in the training frame.
- PUBDEV-1234: h2o.setTimezone throwing an error GitHub
- PUBDEV-1229: R: Most GLM accessors fail GitHub
- PUBDEV-1227: R: Cannot extract an enum value using data[row,col] GitHub
- HEXDEV-339: Feature engineering: log (1+x) fails GitHub
- PUBDEV-1249: h2o.glm: no way to specify offset or weights from h2o R GitHub
- PUBDEV-1255: create_frame: hangs with following msg in the terminal, java.lang.IllegalArgumentException: n must be positive
- PUBDEV-1361: runit_hex_1841_asdate_datemanipulation.R fails intermittently GitHub
- PUBDEV-692: Upgrade SparklingWater to Spark 1.3
#####System
- PUBDEV-1288: Confusion Matrix: class java.lang.ArrayIndexOutOfBoundsException', with msg '2' java.lang.ArrayIndexOutOfBoundsException: 2 at hex.ConfusionMatrix.createConfusionMatrixHeader GitHub
- HEXDEV-323: SVMLight Parse Bug GitHub
- PUBDEV-1207: implement JSON field-filtering features: `_exclude_fields`
- GitHub: Fix a missing field update in Job.
- PUBDEV-65: Handling of strings columns in summary is broken
- PUBDEV-1230: Parse: get AIOOB when parses the attached file with first two cols as enum while h2o-2 does fine
- PUBDEV-1377: Get AIOOBE when parsing a file with fewer column names than columns GitHub
- PUBDEV-1364: Variable importance Object
#####Web UI
- PUBDEV-1198: Flow: Selecting "Cancel" for "Load Notebook" prompt clears current notebook anyway
- PUBDEV-1172: Model builder takes forever to load the column names in Flow, hence cannot build any models
- PUBDEV-1248: Flow GLM: from Flow the drop down with column names does not show up and hence not able to select the offset column
- PUBDEV-1380: DL: when try to access the training frame from the link in the dl model get: Object not found GitHub
###Shannon (3.0.0.13) - 5/30/15
####New Features
#####Algorithms
- HEXDEV-260: Add Random Forests for regression
- PUBDEV-1166: Converting H2OFrame into Python object
- PUBDEV-1165: H2O Python needs Modulus Operations
#####R
- PUBDEV-1188: Merge should handle non-numeric columns (github)
- PUBDEV-1096: R: add weekdays() function in addition to month() and year()
####Enhancements
#####Algorithms
- github: Updated weights handling, test.
- HEXDEV-324: poor GBM performance on KDD Cup 2009 competition dataset (github)
- HEXDEV-326: varImp() function for DRF and GBM (github)
- github: Change some of the defaults
#####API
- PUBDEV-669: have the /Frames/{key}/summary API call Vec.startRollupStats
#####R/Python
- PUBDEV-479: Port MissingInserter to R/Python
- PUBDEV-632: Display TwoDimTable of HitRatios in R/Python
- github: minor change to h2o.demo()
- github: add h2o.demo() facility to python package, along with some built-in (small) data
- github: remove cols param
####Bug Fixes
#####Algorithms
- PUBDEV-1211: pca: descaled pca, std dev seems to be wrong for attached data github
- PUBDEV-1213: pca: would be good to have the std dev numbered because it is difficult to relate to the principal components (github)
- PUBDEV-1201: pca: get ArrayIndexOutOfBoundsException (github)
- PUBDEV-1203: pca: giving wrong std dev/rotation-labels for iris with species as enum (github)
- PUBDEV-1199: DL with <1 epochs has wrong initial estimated time (github)
- github: Fix missing AUC for training data in DL.
- github: Add the seed back to GBM imbalanced test (was set to 0 by default before, now explicit)
#####R
- PUBDEV-1189: R: h2o.hist broken for breaks that is a list of the break intervals (github)
- PUBDEV-1206: Frame summary from R and Python need to use the Frame summary endpoint (github)
- PUBDEV-1177: R summary() is slow when large number of columns
- PUBDEV-1097: R: R should be able to take a list of paths similar to how python does
###Shannon (3.0.0.11) - 5/22/15
####Enhancements
#####Algorithms
- PUBDEV-1179: DRF: investigate if larger seeds giving better models
- PUBDEV-1178: Add logloss/AUC/Error to GBM/DRF Logs & ScoringHistory
- PUBDEV-1169: Use only 1 tree for DRF binomial (github)
- PUBDEV-1170: Wrong ROC is shown for DRF (Training ROC, even though Validation is given)
- PUBDEV-1162: Speed up sorting of histograms with O(N log N) instead of O(N^2)
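The histogram-sorting speedup above is, at heart, the difference between a quadratic ordering loop and a comparison sort. As an illustration only, assuming bins are ordered by their mean response (a common step when searching categorical splits in tree learners; the notes do not say which key H2O sorts on), here is a sketch with a hypothetical helper:

```python
from typing import List, Sequence


def order_bins_by_mean_response(sums: Sequence[float],
                                counts: Sequence[float]) -> List[int]:
    """Bin indices ordered by mean response (sum / count), empty bins last.

    sorted() is O(N log N); a naive scheme that repeatedly scans the
    remaining bins for the next smallest mean is O(N^2).
    """
    def key(i: int):
        if counts[i] == 0:
            return (1, 0.0)  # empty bins go after every non-empty bin
        return (0, sums[i] / counts[i])

    return sorted(range(len(sums)), key=key)


# Means: bin0 = 3.0, bin1 = 1.0, bin2 = empty, bin3 = 3.0
print(order_bins_by_mean_response([6.0, 1.0, 0.0, 9.0], [2.0, 1.0, 0.0, 3.0]))  # [1, 0, 3, 2]
```

Because Python's sort is stable, bins with equal means keep their original relative order.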
#####System
- PUBDEV-1152: Accept s3a URLs
- HEXDEV-316: ImportFiles should not download files from HTTP
####Bug Fixes
#####Algorithms
- HEXDEV-253: model output consistency
- HEXDEV-319: DRF in h2o 3.0 is worse than in h2o 2.0 for Airline
- PUBDEV-1180: DRF has wrong training metrics when validation is given
#####API
- PUBDEV-501: H2OPredict: does not complain when you build a model with one dataset and predict on completely different dataset
#####Python
- PUBDEV-1183: Python version check should fail hard by default
- PUBDEV-1185: Python binding version mismatch check should fail hard and be on by default
- HEXDEV-138: Port Python tests for Deep Learning
#####R
- PUBDEV-1160: R: h2o.hist doesn't support breaks argument
- PUBDEV-1159: R: h2o.hist takes too long to run
- PUBDEV-1150: R CMD Check: URLs not working
- PUBDEV-1149: R CMD check not happy with our use of .OnAttach
- PUBDEV-1174: R: h2o.hist FD implementation broken
- PUBDEV-1167: R: h2o.group_by broken
- HEXDEV-318: the fix to H2O startup for the host unreachable from R causes a security hole
- PUBDEV-1187: FramesHandler.summary() needs to run summary on all Vecs concurrently.
#####System
- PUBDEV-862: Building a model without training file -> NPE
- HEXDEV-315: importFile fails: Error in fromJSON(txt, ...) : unexpected character: A
- PUBDEV-1137: Parse: upload and import gives different chunk compression on the same file
- PUBDEV-1054: Parse: h2o parses arff file incorrectly
- PUBDEV-1181: Rapids should queue and block on the back-end to prevent overlapping calls
- PUBDEV-1184: importFile fails for paths containing spaces
#####Web UI
- PUBDEV-1182: Flow: when upload file fails, the control does not come back to the flow screen, and have to refresh the whole page to get it back
- PUBDEV-1131: GBM crashes after calling getJobs in Flow
###Shannon (3.0.0.7) - 5/18/15
####Enhancements
- PUBDEV-711: take a final look at all REST API parameter names and help strings
- PUBDEV-757: Rename DocsV1 + DocsHandler to MetadataV1 + MetadataHandler
- PUBDEV-1138: Performance improvements for big data sets => getModels
- PUBDEV-1126: Performance improvements for big data sets => Get frame summary
#####System
- HEXDEV-316: ImportFiles should not download files from HTTP
#####Web UI
- PUBDEV-1144: Update/Fix Flow API for CreateFrame
####Bug Fixes
The following changes are to resolve incorrect software behavior:
- PUBDEV-501: H2OPredict: does not complain when you build a model with one dataset and predict on completely different dataset
- PUBDEV-1047: API : Get frames and Build model => takes long time to get frames
- HEXDEV-149: Allow JobsV3 to return properly typed jobs, not always instances of JobV3
- PUBDEV-1036: rename straggler V2 schemas to V3
- PUBDEV-1159: R: h2o.hist takes too long to run
#####System
- PUBDEV-1034: Windows 7/8/2012 Multicast Error UDP
- PUBDEV-862: Building a model without training file -> NPE
- HEXDEV-253: model output consistency
- PUBDEV-1135: While predicting get:class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.ArrayIndexOutOfBoundsException: 5
- PUBDEV-1090: POJO: Models with "." in key name (ex. pros.glm) can't access pojo endpoint
- PUBDEV-1077: Getting an IcedHashMap warning from H2O startup
#####Web UI
- PUBDEV-1133: getModels in Flow returns error
- PUBDEV-926: Flow: When user hits build model without specifying the training frame, it would be good if Flow guides the user. It presently shows an NPE msg
- PUBDEV-1131: GBM crashes after calling getJobs in Flow
###Shannon (3.0.0.2) - 5/15/15
####New Features
- PUBDEV-411: ModelMetrics by model category
- PUBDEV-942: ModelMetrics by model category - Autoencoder
####Enhancements
#####Algorithms
- github: GLM update: skip lambda max during lambda search
- github: removed higher accuracy option
- github: Rename constant col parameter
- github: GLM update: added stopping criteria to lbfgs, tweaked some internal constants in ADMM
- github: Add support for `ignore_const_col` in DL
#####Python
- PUBDEV-852: Binomial: show per-metric-optimal CM and per-threshold CM in Python
- github: add filterNACols to python
- github: h2o.delete replaced with h2o.removeFrameShallow
- github: Add distribution summary to Python
#####R
- github: add filterNACols to R
- github: explicitly set cols=TRUE for R style str on frames
- github: enable faster str, bulk nlevels, bulk levels, bulk is.factor
- github: Add optional blocking parameter to h2o.uploadFile
- PUBDEV-672: HTML version of the REST API docs should be available on the website
- PUBDEV-827: class GenModel duplicates part of code of Model
#####Web UI
- HEXDEV-181: Flow: Handle deep features prediction input and output
- github: removed `use_all_factor_levels` from glm flows
####Bug Fixes
#####Algorithms
- HEXDEV-302: AIOOBE during Prediction with DL github
- github: glm fix: don't force in null model for lambda search with user given list of lambdas
- github: Fix domain in glm scoring output for binomial
- github: GLM Fix - fix degrees of freedom when running without intercept (+/-1)
- github: GLM fix: make valid data info be clone of train data info (needs exactly the same categorical offsets, ignore unseen levels)
- github: Fix glm scoring, fill in default domain {0,1} for binary columns when scoring
#####R
- PUBDEV-1116: R: Parse that works from flow doesn't work from R using as.h2o
- PUBDEV-798: R: String Munging Functions Missing
- PUBDEV-584: R: hist() doesn't currently work for H2O objects
- PUBDEV-820: H2oR: model objects should return the CM when run classification like h2o1
- PUBDEV-1113: Remove Keys : Parse => Remove => doesn't complete
- PUBDEV-1102: R: h2o.rbind fails to join two dataset together
- PUBDEV-899: R: all doesn't work
- PUBDEV-555: H2O-R: str does not work
- PUBDEV-1110: H2OR: while printing a gbm model object, get invalid format '%d'; use format %f, %e, %g or %a for numeric objects
- PUBDEV-903: R: Errors from some rapids calls seem to fail to return an error
- HEXDEV-311: Performance bug from R with Expect: 100-continue
- PUBDEV-1030: h2o.performance: ignores the user specified threshold
- PUBDEV-1071: R: regression models don't show in print statement r2 but it exists in the model object
- PUBDEV-1072: R: missing accessors for glm specific fields
- PUBDEV-1032: After running some R and py demos when invoke a build model from flow get- rollup stats problem vec deleted error
- PUBDEV-1069: R: missing implementation for h2o.r2
- PUBDEV-1064: Passing sep="," to h2o.importFile() fails with '400 Bad Request'
- PUBDEV-1092: Get NPE while predicting
#####System
- PUBDEV-1091: S3 gzip parse failure
- PUBDEV-1081: Probably want to cleanly disable multicast (not retry) and print suggestion message, if multicast not supported on picked multicast network interface
- PUBDEV-1112: User has no way to specify whether to drop constant columns
- PUBDEV-1109: Change all extdata imports to uploadFile
- PUBDEV-1104: .gz file parse exception from local filesystem
- PUBDEV-1134: getPredictions in Flow returns error
- PUBDEV-1020: Flow : Drop NA Cols enable => Should automatically populate the ignored columns
- PUBDEV-1041: Flow GLM: formatting needed for the model parameter listing in the model object github
- PUBDEV-1108: Flow: When predict on data with no response get :Error processing POST /3/Predictions/models/gbm-a179db76-ba96-420f-a643-0e166aea3af3/frames/subset_1 'undefined' is not an object (evaluating 'prediction.model')
##H2O-Dev
###Shackleford (0.2.3.6) - 5/8/15
####New Features
#####Python
- Set up POJO download for Python client (PUBDEV-908) (github)
#####Sparkling Water
- Publish h2o-scala and h2o-app latest version to maven central (PUBDEV-443)
####Enhancements
#####Algorithms
- Use AUC's default threshold for label-making for binomial classifiers predict() (PUBDEV-1063) (github)
- GLM update (github)
- Cleanup AUC2, make incremental version (github)
- Name change: `override_with_best_model` -> `overwrite_with_best_model` (github)
- Couple of GLM updates (github)
- Disable `_replicate_training_data` for data that's larger than 10GB (github)
- Added `replicate_training_data` param for DL (github)
- Change a few kmeans output parameters so no longer dividing by `nrows` or `num_clusters` (github)
- GLMValidation: Updated AUC computation (github)
- Do not delete model metrics at end of GBM/DRF (github)
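The `predict()` change above turns binomial probabilities into labels using the model's default threshold. Assuming that threshold is the F1-optimal one (as the HEXDEV-263 entry in these notes suggests), a brute-force sketch of how such a threshold can be picked is shown below; it is quadratic, purely illustrative, and not how H2O computes it:

```python
from typing import Optional, Sequence, Tuple


def f1_optimal_threshold(scores: Sequence[float],
                         labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Threshold t maximizing F1 when predicting 1 for score >= t.

    Brute force over every distinct score: O(N^2), for illustration only.
    """
    best_t: Optional[float] = None
    best_f1 = -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
threshold, f1 = f1_optimal_threshold(scores, labels)
print(threshold, f1)  # 0.35 0.8
print([1 if s >= threshold else 0 for s in scores])  # [0, 1, 1, 1]
```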
#####API
- Clean REST api for Parse (PUBDEV-993)
- Removes `is_valid`, `invalid_lines`, and domains from REST api (github)
- Annotate domains output field as expert level (github)
#####Python
- Implement h2o.interaction() (PUBDEV-854) (github)
- nice tables in ipython! (github)
- added deeplearning weights and biases accessors and respective pyunit. (github)
#####R
- Cleaner client POJO download for R (PUBDEV-907)
- Implement h2o.interaction() (PUBDEV-854) (github)
- R: h2o.impute missing (PUBDEV-796)
- `validation_frame` is passed through to h2o (github)
- Adding GBM accessor function runits (github)
- Adding changes to `h2o.hit_ratio_table` to be like other accessors (i.e., no train) (github)
- add h2o.getPOJO to R, fix impute ast build in python (github)
#####System
- Change NA strings to an array in ParseSetup (PUBDEV-995)
- Document way of passing S3 credentials for S3N (PUBDEV-947)
- Add H2O-dev doc on docs.h2o.ai via a new structure (proposed below) (PUBDEV-355)
- Rapids Ref Doc (PUBDEV-667)
- Show Timestamp and Duration for all model scoring histories (PUBDEV-1018) (github)
- Logs slow reads, mainly meant for noting slow S3 reads (github)
- Make prediction frame column names non-integer (github)
- Add String[] factor_columns instead of int[] factors (github)
- change the runtime exception to a Log.info() if interface doesn't support multicast (github)
- More robust way to copy Flow files to web root per Prithvi (github)
- Switches `na_string` from a single value per column to an array per column (github)
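With `na_string` now an array per column, each column can carry its own list of missing-value markers. A minimal sketch of the idea, using a hypothetical `apply_na_strings` helper rather than H2O's parser:

```python
from typing import List, Optional, Sequence


def apply_na_strings(row: Sequence[str],
                     na_strings: Sequence[Sequence[str]]) -> List[Optional[str]]:
    """Replace a cell with None if it matches that column's own NA strings."""
    if len(row) != len(na_strings):
        raise ValueError("need one list of NA strings per column")
    return [None if cell in col_na else cell
            for cell, col_na in zip(row, na_strings)]


# Column 0 treats "-" as missing; column 1 treats both "NA" and "?" as missing.
na_strings = [["-"], ["NA", "?"]]
print(apply_na_strings(["-", "?"], na_strings))   # [None, None]
print(apply_na_strings(["NA", "x"], na_strings))  # ['NA', 'x']
```

Note that "NA" in column 0 is kept as a real value, which a single global NA string could not express.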
#####Web UI
- Model output improvements (HEXDEV-150)
####Bug Fixes
#####Algorithms
- H2O cloud shuts down with some H2O.fail error, while building some kmeans clusters (PUBDEV-1051) (github)
- GLM:beta constraint does not seem to be working (PUBDEV-1083)
- GBM - random attack bug (probably because `max_after_balance_size` is really small) (PUBDEV-1061) (github)
- GLM: LBFGS objval java lang assertion error (PUBDEV-1042) (github)
- PCA Cholesky NPE (PUBDEV-921)
- GBM: H2o returns just 5525 trees, when ask for a much larger number of trees (PUBDEV-860)
- CM returned by AUC2 doesn't agree with manual-made labels from F1-optimal threshold (HEXDEV-263)
- AUC: h2o reporting wrong auc on a modified covtype data (PUBDEV-891)
- GLM: Build model => Predict => Residual deviance/Null deviance different from training/validation metrics (PUBDEV-991)
- KMeans metrics incomplete (PUBDEV-1029)
- GLM: Java Assertion Error (PUBDEV-1025)
- Random forest bug (PUBDEV-1015)
- A particular random forest model has an empty (training) metric json `max_criteria_and_metric_scores` (PUBDEV-1001)
- PCA results exhibit numerical inaccuracies compared to R (PUBDEV-550)
- DRF: reporting wrong depth for attached dataset (PUBDEV-1006)
- added missing "names" column name to beta constraints processing (github)
- Fix `balance_classes` probability correction consistency between H2O and POJO (github)
- Fix in GLM scoring - check actual for NaNs as well (github)
#####Python
- Cannot import_file path=url python interface (PUBDEV-1059)
- head()/tail() should show labels, rather than number encoding, for enum columns (PUBDEV-1017)
- h2o.py: for binary response printing transpose and hence wrong cm (PUBDEV-1013)
#####R
- Broken Summary in R (PUBDEV-1073)
- h2oR summary: displaying no labels in summary (PUBDEV-1008)
- R/Python impute bugs (PUBDEV-1055)
- R: h2o.varimp doubles the print statement (PUBDEV-1068)
- R: h2o.varimp returns NULL when model has no variable importance (PUBDEV-1078)
- h2oR: h2o.confusionMatrix(my_gbm, validation=F) should not show a null (PUBDEV-849)
- h2o.impute doesn't impute (PUBDEV-1024)
- R: as.h2o cutting entries when trying to import data.frame into H2O (HEXDEV-293)
- The default names for an R data file parsed into H2O are too long and need to be changed (PUBDEV-976)
- h2o.confusionMatrix: gives an error when invoked with threshold (PUBDEV-1010)
- removing train and adding error messages for valid = TRUE when there are no validation metrics (github)
#####System
- Download logs is returning the same log file bundle for every node (PUBDEV-1056)
- ParseSetup is useless and misleading for SVMLight (PUBDEV-994)
- Fixes bug that was short circuiting the setting of column names (github)
#####Web UI
- Flow: Predict should not show MSE, confusion matrix, etc. (PUBDEV-987) (github)
- Flow: Raw frames left out after importing files from directory (PUBDEV-1046)
###Shackleford (0.2.3.5) - 5/1/15
####New Features
#####API
- Need a /Log REST API to log client-side errors to H2O's log (HEXDEV-291)
#####Python
- add impute to python interface (github)
#####System
- Job admission control (PUBDEV-536) (github)
- Get Flow Exceptions/Stack Traces in H2O Logs (PUBDEV-920)
####Enhancements
#####Algorithms
- GLM: Name to be changed from normalized to standardized in output to be consistent between input/output (PUBDEV-954)
- GLM: It would be really useful if the coefficient magnitudes are reported in descending order (PUBDEV-923)
- PUBDEV-536: Limit DL models to 100M parameters (github)
- PUBDEV-536: Add accurate memory-based admission control for GBM/DRF (github)
- relax the tolerance a little more...(github)
- Tree depth correction (github)
- Comment out `duration_in_ms` for now, as it's always left at 0 (github)
- Updated min mem computation for glm (github)
- GLM update: added lambda search info to scoring history (github)
#####Python
- python .show() on model and metric objects should match R/Flow as much as possible (HEXDEV-289)
- GLM model output, details from Python (HEXDEV-95)
- GBM model output, details from Python (HEXDEV-102)
- Run GBM from Python (HEXDEV-99)
- map domain to result from /Frames if needed (github)
- added confusion matrix to metric output (github)
- update `metrics_base_confusion_matrices()` (github)
- fetch out `string_data` if type is string (github)
#####R
- GBM model output, details from R (HEXDEV-101)
- Run GBM from R (HEXDEV-98)
- check if it's a frame then check NA (github)
#####System
- Report MTU to logs (PUBDEV-614) (github)
- Make parameter changes Log.info() instead of Log.warn() (github)
#####Web UI
- Flow: Confusion matrix: good to have consistency in the column and row name (letter) case (PUBDEV-971)
- Run GBM Multinomial from Flow (HEXDEV-111)
- Run GBM Regression from Flow (HEXDEV-112)
- Sort model types in alphabetical order in Flow (PUBDEV-1011)
####Bug Fixes
The following changes are to resolve incorrect software behavior:
#####Algorithms
- GLM: Model output display issues (PUBDEV-956)
- h2o.glm: ignores validation set (PUBDEV-958)
- DRF: reports wrong number of leaves in a summary (PUBDEV-930)
- h2o.glm: summary of a prediction frame gives na's as labels (PUBDEV-959)
- GBM: reports wrong max depth for a binary model on german data (PUBDEV-839)
- GLM: Confusion matrix missing in R for binomial models (PUBDEV-950) (github)
- GLM: On airlines(40g) get ArrayIndexOutOfBoundsException (PUBDEV-967)
- GLM: Build model => Predict => Residual deviance/Null deviance different from training/validation metrics (PUBDEV-991)
- Domains returned by GLM for binomial classification problem are integers, but should be mapped to their label (PUBDEV-999)
- GLM: Validation on non training data gives NaN Res Deviance and AIC (PUBDEV-1005)
- Confusion matrix has nan's in it (PUBDEV-1000)
- glm fix: pass `model_id` from R (was being dropped) (github)
#####Python
- H2OPy: warns about version mismatch even when installed the latest from master (PUBDEV-980)
- Columns of type enum lose string label in Python H2OFrame.show() (PUBDEV-965)
- Bug in H2OFrame.show() (HEXDEV-295) (github)
#####R
- h2o.confusionMatrix for binary response gives not-found thresholds (PUBDEV-957)
- GLM: model_id param is ignored in R (PUBDEV-1007)
- h2o.confusionmatrix: mixing cases(letter) for categorical labels while printing multinomial cm (PUBDEV-996)
- fix the dupe thresholds error (github)
- extra arg in impute example (github)
- fix missing param data (github)
#####System
- Builds : Failing intermittently due to java.lang.StackOverflowError (PUBDEV-972)
- Get H2O cloud hang with NPE and roll up stats problem, when click on build model glm from flow, on laptop after running a few python demos and R scripts (PUBDEV-963)
#####Web UI
- Flow :=> Airlines dataset => Build models glm/gbm/dl => water.DException$DistributedException: from /172.16.2.183:54321; by class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.NullPointerException: null (PUBDEV-603)
- Flow => Preview Pojo => collapse not working (PUBDEV-977)
- Flow => Any algorithm => Select response => Select Add all for ignored columns => Try to unselect some from ignored columns => Build => Response column IsDepDelayed not found in frame: allyears_1987_2013.hex. (PUBDEV-978)
- Flow => ROC curve select something on graph => Table is displayed for selection => Collapse ROC curve => Doesn't collapse table, collapses only graph (PUBDEV-1003)
###Severi (0.2.2.16) - 4/29/15
####New Features
#####Python
- Release h2o-dev to PyPi (PUBDEV-762)
- Python Documentation (PUBDEV-901)
- Python docs Wrap Up (PUBDEV-966)
- add getters for res/null dev, fix kmeans,dl getters (github)
####Enhancements
#####Algorithms
- Use partial-sum version of mat-vec for DL POJO (PUBDEV-936)
- Always store weights and biases for DLTest Junit (github)
- Show the DL model size in the model summary (github)
- Remove assertion in hot loop (github)
- Rename ADMM to IRLSM (github)
- Added no intercept option to glm (github)
- Code cleanup. Moved ModelMetricsPCAV3 out of H2O-algos (github)
- Improve DL model checkpoint logic (github)
- Updated glm output (github)
- Renamed normalized coefficients to standardized coefficients in glm output (github)
- Use proper tie breaking for NB (github)
- Add check that DL parameters aren't modified by model training (github)
- Reduce tolerances (github)
- If there are no observations of a response level and the prediction is numeric, assume it is drawn from a standard normal distribution (mean 0, standard deviation 1). Add validation test with split frame for naive Bayes (github)
#####Python
- replaced H2OFrame.send_frame() calls with cbind Exprs so that lazy evaluation is enforced (github)
- change default xmx/s behavior of h2o.init() (github)
- better handling of single row return and print (github)
#####R
- Added interpolation to quantile to match R type 7 (github)
- Removed and tidied if's in quantile.H2OFrame since it now uses match.arg (github)
- Connected validation dataset to glm in R (github)
- Removing h2o.aic from seealso link (doesn't exist) and updating documentation (github)
#####System
- Add number of rows (per node) to ChunkSummary (PUBDEV-938) (github)
- allow nrow as alias for count in groupby (github)
- Only launches task to fill in SVM zeros if the file is SVM (github)
- Adds more log traces to track progress of post-ingest actions (github)
- Adds svm as a file extension to the hex name cleanup (github)
#####Web UI
- Flow: Inspect data => Round decimal points to 1 to be consistent with h2o1 (PUBDEV-453)
- Setup POJO download method for Flow (PUBDEV-909)
- Pretty-print POJO preview in flow (PUBDEV-940)
- Flow: It would be good if 'get predictions' also shows the data (PUBDEV-883)
- GBM model output, details in Flow (HEXDEV-103)
- Display a linked data table for each visualization in Flow (PUBDEV-318)
- Run GBM binomial from Flow (needs proper CM) (PUBDEV-943)
####Bug Fixes
#####Algorithms
- GLM: results from model and prediction on the same dataset do not match (PUBDEV-922)
- GLM: when select AUTO as solver, for prostate, glm gives all zero coefficients (PUBDEV-916)
- Large (DL) models cause oversize issues during serialization (PUBDEV-941)
- Fixed name change for ADMM (github)
#####API
- Fix schema warning on startup (PUBDEV-946) (github)
#####Python
- H2OVec.row_select(H2OVec) fails on case where only 1 row is selected (PUBDEV-948)
- fix pyunit (github)
#####R
- R: Parse of zip file fails, Summary fails on citibike data (PUBDEV-835)
- h2o.performance reports a different Null Deviance than the model object for the same dataset (PUBDEV-816)
- h2o.glm: no example on h2o.glm help page (PUBDEV-962)
- H2O R: Confusion matrices from R still confused (PUBDEV-904) (github)
- R: h2o.confusionMatrix("H2OModel", ...) extra parameters not working (PUBDEV-953) (github)
- h2o.confusionMatrix for binomial gives not-found thresholds on S3 -airlines 43g (PUBDEV-957)
- H2O summary quartiles outside tolerance of (max-min)/1000 (PUBDEV-671)
- fix space headers issue from R (was not url-encoding the column strings) (github)
- R CMD fixes (github)
- Fixed broken R interface - make `validation_frame` non-mandatory (github)
#####Sparkling Water
- Sparkling water: #UDP-Recv ERRR: UDP Receiver error on port 54322 java.lang.ArrayIndexOutOfBoundsException (PUBDEV-311)
#####System
- MapR 3.1.1: Memory is not allocated as requested; instead, the cluster gets the default (PUBDEV-937)
- GLM: AIOOB with msg '-14' at water.RPC$2.compute2(RPC.java:593) (PUBDEV-917)
- h2o.glm: model summary listing same info twice (PUBDEV-915)
- Parse: Detect and reject UTF-16 encoded files (HEXDEV-285)
- DataInfo Row categorical encoding AIOOBE (HEXDEV-283)
- Fix POJO Preview exception (github)
- Fix NPE in ChunkSummary (github)
- fix global name collision (github)
###Severi (0.2.2.15) - 4/25/15
####New Features
#####Python
- added min, max, sum, median for H2OVecs and respective pyunit (github)
- added min(), max(), and sum() functionality on H2OFrames and respective pyunits (github)
#####Web UI
- View POJO in Flow (PUBDEV-781)
- help > about page or add version on main page for easy bug reporting. (PUBDEV-804)
- POJO generation: GLM (PUBDEV-712) (github)
- GLM model output, details in Flow (HEXDEV-96)
####Enhancements
#####Algorithms
- K means output clean up (HEXDEV-187)
- Add FNR/TNR/FPR/TPR to threshold tables, remove recall, specificity (github)
- Add accessor for variable importances for DL (github)
- Relax CM error tolerance for F1-optimal threshold now that AUC2 doesn't necessarily create consistent thresholds with its own CMs. (github)
- Added scoring history to glm (github)
- Added model summary to glm (github)
- Add flag to support reading data from S3N (github)
- Added degrees of freedom to GLM metrics schemas (github)
- Allow DL scoring_history to be unlimited in length (github)
- add plotting for binomial models (github)
- Ignore certain parameters that are not applicable (class balancing, max CM size, etc.) (github)
- Updated glm scoring, fill training/validation metrics in model output (github)
- Rename gbm loss parameter to distribution (github)
- Fix GBM naming: loss -> distribution (github)
- GLM LBFGS update (github)
- na.rm for quantile is default behavior (github)
- GLM update: enabled `max_predictors` in REST, updated lbfgs (github)
- Remove `keep_cross_validation_splits` for now from DL (github)
- Get rid of sigma in the model metrics, instead show r2 (github)
- Don't show `score_every_iteration` for DL (github)
- Don't print too large confusion matrices in Tree models (github)
#####API
- publish h2o-model.jar via REST API (PUBDEV-779)
- move all schemas and endpoints to v3 (PUBDEV-471)
- clean up routes (remove AddToNavbar, fix /Quantiles, etc) (PUBDEV-618) (github)
- More data in chunk_homes call. Add num_chunks_per_vec. Add num_vec. (github)
- Added chunk_homes route for frames (github)
- Update to use /3 routes (github)
#####Python
- Python client should check that version number == server version number (PUBDEV-799)
- Add asfactor for month (github)
- in Expr.show() only show 10 or less rows. remove locate from runit test because full path used (github)
- change nulls to () (github)
- sigma is no longer part of ModelMetricsRegressionV3 (github)
#####R
- Fix integer -> int in R (github)
- add autoencoder show method (github)
- accessor is $ not @ (github)
- add `hit_ratio_table` and `varimp` calls to R (github)
- add h2o.predict as alternative (github)
- update model output in R (github)
#####System
- Port MissingValueInserter EndPoint to h2o-dev. (PUBDEV-465)
- Rapids: require a (put "key" %frame) (PUBDEV-868)
- Need pojo base model jar file embedded in h2o-dev via build process (PUBDEV-780) (github)
- Make .json the default (PUBDEV-619) (github)
- Rename class for clarification (github)
- Classifies all NA columns as numeric. Also improves preview sampling accuracy by trimming partial lines at end of chunk. (github)
- Implements sampling of files within the ParseSetup preview. This prevents poor column type guesses from only sampling the beginning of a file. (github).
- Rename fields `drop_na20_col` (github)
- allow for many deletes as final statements in a block (github)
- rename initF -> init_f, dropNA20Cols -> drop_na20_cols (github)
- Removed tweedie param (github)
- thresholds -> threshold (github)
- JSON of TwoDimTable with all null values in the first column (no row headers) no longer has an empty column of "" or nulls. (github)
- move H2O_Load, fix all the timezone functions (github)
- Add extra verbose printout in case Frames don't match identically (github)
- allow delayed column lookup (github)
- add mixed type list (github)
- Added WaterMeterIo to count persist info (github)
- Remove special setChunkSize code in HDFS and NFS file vec (github)
- add check for Frame on string parse (github)
- Disable Memory Cleaner (github)
- Handle '<' chars in Keys when swapping (github)
- allow for colnames in slicing (github)
- Adjusts parse type detection. If column is all one string value, declare it an enum (github)
#####Web UI
- nice algo names in the Flow dropdown (full word names) (PUBDEV-707)
- Compute and Display Hit Ratios (PUBDEV-630)
- Limit POJO preview to 1000 lines (github)
####Bug Fixes
#####Algorithms
- GLM: lasso (i.e., alpha = 1) seems to be giving wrong answers (PUBDEV-769)
- AUC: h2o reports .5 auc when actual auc is 1 (PUBDEV-879)
- h2o.glm: No output displayed for the model (PUBDEV-858)
- h2o.glm model object output needs a fix (PUBDEV-815)
- h2o.glm model object says : fill me in GLMModelOutputV2; I think I'm redundant [1] FALSE (PUBDEV-765)
- GLM : Build GLM Model => Java Assertion error (PUBDEV-686)
- GLM :=> Progress shows -100% (PUBDEV-861)
- GBM: Negative sign missing in initF value for ad dataset (PUBDEV-880)
- K-Means takes a validation set but doesn't use it (PUBDEV-826)
- Absolute_MCC is NaN (sometimes) (PUBDEV-848) (github)
- GBM: A proper error msg should be thrown when the user sets the max depth =0 (PUBDEV-838) (github)
- DRF Regression Assertion Error (PUBDEV-824)
- h2o.randomForest: if h2o is not returning the mse for the 0th tree then it should not be reported in the model object (PUBDEV-811)
- GBM: Got exception `class java.lang.AssertionError` with msg `null`: `java.lang.AssertionError at hex.tree.gbm.GBM$GBMDriver$GammaPass.map` (PUBDEV-693)
- GBM: Got exception `class java.lang.AssertionError` with msg `null`: `java.lang.AssertionError at hex.ModelMetricsMultinomial$MetricBuildMultinomial.perRow` (HEXDEV-248)
- GBM: get java.lang.AssertionError: Coldata 2199.0 out of range C17:5086.0-19733.0 step=57.214844 nbins=256 isInt=1 (HEXDEV-241)
- GLM: glmnet objective function better than h2o.glm (PUBDEV-749)
- GLM: get AIOOB:-36 at hex.glm.GLMTask$GLMIterationTask.postGlobal(GLMTask.java:733) (PUBDEV-894) (github)
- Fixed glm behavior in case no rows are left after filtering out NAs (github)
- Fix memory leak in validation scoring in K-Means (github)
#####API
- API unification: DataFrame should be able to accept URI referencing file on local filesystem (PUBDEV-709) (github)
#####Python
- Python: describe returning all zeros (PUBDEV-875)
- python/R & merge() (PUBDEV-834)
- python Expr min, max, median, sum bug (PUBDEV-845) (github)
#####R
- (R and Python) clients must not pass response to DL AutoEncoder model builder (PUBDEV-897) (github)
- h2o.varimp, h2o.hit_ratio_table missing in R (PUBDEV-842)
- GLM: No help for h2o.glm from R (PUBDEV-732)
- h2o.confusionMatrix not working for binary response (PUBDEV-782) (github)
- h2o.splitframe complains about destination keys (PUBDEV-783)
- h2o.assign does not work (PUBDEV-784) (github)
- H2oR: should display only first few entries of the variable importance in model object (PUBDEV-850)
- R: h2o.confusion matrix needs formatting (PUBDEV-764)
- R: h2o.confusionMatrix => No Confusion Matrices for H2ORegressionMetrics (PUBDEV-710)
- h2o.deeplearning: model object output needs a fix (PUBDEV-821)
- force gc more frequently (github)
#####System
- MapR FS loads are too slow (PUBDEV-927)
- ensure that HDFS works from Windows (PUBDEV-812)
- Summary: on a time column throws,'null' is not an object (evaluating 'column.domain[level.index]') in Flow (PUBDEV-867)
- Parse: An enum column gets parsed as int for the attached file (PUBDEV-606)
- Parse => 40Mx1_uniques => class java.lang.RuntimeException (PUBDEV-729)
- if there are fewer than 5 unique values in a dataset column, mins/maxs reports e+308 values (PUBDEV-150) (github)
- Sparkling water - `DataFrame[T_UUID]` to `SchemaRDD[StringType]` (PUBDEV-771)
- Sparkling water - `DataFrame[T_NUM(Long)]` to `SchemaRDD[LongType]` (PUBDEV-767)
- Sparkling water - `DataFrame[T_ENUM]` to `SchemaRDD[StringType]` (PUBDEV-766)
- Inconsistency in row and col slicing (HEXDEV-265) (github)
- rep_len expects literal length only (HEXDEV-268) (github)
- cbind and = don't work within a single rapids block (HEXDEV-237)
- Rapids response for c(value) does not have frame key (HEXDEV-252)
- S3 parse takes forever (PUBDEV-876)
- Parse => Enum unification fails in multi-node parse (PUBDEV-718) (github)
- All nodes are not getting updated with latest status of each other nodes info (PUBDEV-768)
- Cluster creation is sometimes rejecting new nodes (post jenkins-master-1128+) (PUBDEV-807)
- Parse => Multiple files 1 zip/ 1 csv gives Array index out of bounds (PUBDEV-840)
- Parse => failed for X5MRows6KCols ==> OOM => Cluster dies (PUBDEV-836)
- /frame/foo pagination weirded out (HEXDEV-277) (github)
- Removed code that flipped enums to strings (github)
#####Web UI
- Flow: It would be really useful to have the mse plots back in GBM (PUBDEV-889)
- State change in Flow is not fully validated (PUBDEV-919)
- Flows : Not able to load saved flows from hdfs (PUBDEV-872)
- Save Function in Flow crashes (PUBDEV-791) (github)
- Flow: should throw a proper error msg when user supplied response have more categories than algo can handle (PUBDEV-866)
- Flow display of a summary of a column with all missing values fails. (HEXDEV-230)
- Split frame UI improvements (HEXDEV-275)
- Flow : Decimal point precisions to be consistent to 4 as in h2o1 (PUBDEV-844)
- Flow: Prediction frame is outputting junk info (PUBDEV-825)
- EC2 => Cluster of 16 nodes => Water Meter => shows blank page (PUBDEV-831)
- Flow: Predict - "undefined is not an object (evaluating `prediction.thresholds_and_metric_scores.name`)" (PUBDEV-559)
- Flow: inspect getModel for PCA returns error (PUBDEV-610)
- Flow, RF: Can't get Predict results; "undefined is not an object (evaluating `prediction.confusion_matrices.length`)" (PUBDEV-695)
- Flow, GBM: getModel is broken - Error processing GET /3/Models.json/gbm-b1641e2dc3-4bad-9f69-a5f4b67051ba null is not an object (evaluating `source.length`) (PUBDEV-800)
###Severi (0.2.2.1) - 4/10/15
####New Features
#####R
- Implement `/3/Frames/<my_frame>/summary` (PUBDEV-6) (github)
- add allparameters slot to allow default values to be shown (github)
- add log loss accessor (github)
####Enhancements
#####Algorithms
- POJO generation: GBM (PUBDEV-713)
- POJO generation: DRF (PUBDEV-714)
- Compute and Display Hit Ratios (PUBDEV-630) (github)
- Add DL POJO scoring (PUBDEV-585)
- Allow validation dataset for AutoEncoder (PUBDEV-581)
- PUBDEV-580: Add log loss to binomial and multinomial model metric (github)
- Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
- increase tolerance to 2e-3 (was 1e-3; failed with 0.001647 relative difference) (github)
- change tolerance to 1e-3 (github)
- Add option to export weights and biases to REST API / Flow. (github)
- Add scree plot for H2O PCA models and fix Runit test. (github)
- Remove quantiles from the model builders list. (github)
- GLM update: added row filtering argument to line search task, fixed issues with dfork/asyncExec (github)
- Updated rho-setting in GLM. (github)
- No threshold 0.5; use the default (max F1) instead (github)
- GLM update: updated initialization, NA row filtering; default lambda is now empty and will be picked based on the fraction of lambda_max. (github)
- Updated ADMM solver. (github)
- Added makeGLMModel call. (github)
- Start with classification error NaN at t=0 for DL, not with 1. (github)
- Relax DL POJO relative tolerance to 1e-2. (github)
- Override nfeatures() method in DLModelOutput. (github)
- Renaming of fields in GLM (github)
- GLM: Take out Balance Classes (PUBDEV-795)
#####API
- schema metadata for Map fields should include the key and value types (PUBDEV-753) (github)
- schema metadata should include the superclass (PUBDEV-754)
- rest api naming convention: n_folds vs ntrees (PUBDEV-737)
- Create REST Endpoint for exposing .java pojo models (PUBDEV-778)
#####Python
- Run GLM from Python (including LBFGS) (HEXDEV-92)
- added H2OFrame show(), as_list(), and slicing pyunits (github)
- changed solver parameter to "L_BFGS" (github)
- added multidimensional slicing of H2OFrames and Exprs. (github)
- add h2o.groupby to python interface (github)
- added H2OModel.confusionMatrix() to return confusion matrix of a prediction (github)
#####R
- PUBDEV-578, PUBDEV-541, PUBDEV-566: R client now sends the data frame column names and data types to ParseSetup; R client can get column names from a parsed frame or a list; respects client request for column data types (github)
- R: Cannot create new columns through R (PUBDEV-571)
- H2O-R: it would be more useful if h2o.confusion matrix reports the actual class labels instead of [,1] and [,2] (PUBDEV-553)
- Support both multinomial and binomial CM (github)
#####System
- Flow: Standardize `max_iters`/`max_iterations` parameters (PUBDEV-447) (github)
- Add ERROR logging level for too-many-retries case (PUBDEV-146) (github)
- Simplify checking of cluster health. Just report the status immediately. (github)
- reduce timeout (github)
- strings can have ' or " beginning (github)
- Throw a validation error in flow if any training data cols are non-numeric (github)
- Add getHdfsHomeDirectory(). (github)
- Added --verbose. (github)
#####Web UI
- PUBDEV-707: nice algo names in the Flow dropdown (full word names) (github)
- Unbreak Flow's ConfusionMatrix display. (github)
- POJO generation: DL (PUBDEV-715)
####Bug Fixes
#####Algorithms
- GLM : Build GLM model with nfolds brings down the cloud => FATAL: unimplemented (PUBDEV-731) (github)
- DL : Build DL Model => FATAL: unimplemented: n_folds >= 2 is not (yet) implemented => SHUTSDOWN CLOUD (PUBDEV-727) (github)
- GBM => Build GBM model => No enum constant hex.tree.gbm.GBMModel.GBMParameters.Family.AUTO (PUBDEV-723)
- GBM: When run with loss = auto with a numeric column get- error :No enum constant hex.tree.gbm.GBMModel.GBMParameters.Family.AUTO (PUBDEV-708) (github)
- gbm: does not complain when min_row > dataset size (PUBDEV-694) (github)
- GLM: reports wrong residual degrees of freedom (PUBDEV-668)
- H2O dev reports less accurate aucs than H2O (PUBDEV-602)
- GLM : Build GLM model fails => ArrayIndexOutOfBoundsException (PUBDEV-601)
- divide by zero in modelmetrics for deep learning (PUBDEV-568)
- GBM: reports 0th tree MSE value for the validation set, different than the train set, when only the train set is provided (PUBDEV-561)
- GBM: Initial mse in bernoulli seems to be off (PUBDEV-515)
- GLM : Build Model fails with Array Index Out of Bound exception (PUBDEV-454) (github)
- Custom Functions don't work in apply() in R (PUBDEV-436)
- GLM failure: got NaNs and/or Infs in beta on airlines (PUBDEV-362)
- MetricBuilderMultinomial.perRow AssertionError while running GBM (HEXDEV-240)
- Problems during Train/Test adaptation between Enum/Numeric (HEXDEV-229)
- DRF/GBM balance_classes=True throws unimplemented exception (HEXDEV-226) (github)
- AUC reported on training data is 0, but should be 1 (HEXDEV-223) (github)
- glm pyunit intermittent failure (HEXDEV-199)
- Inconsistency in GBM results:Gives different results even when run with the same set of params (HEXDEV-194)
- get rid of nfolds= param since it's not supported in GLM yet (github)
- Fixed degrees of freedom (off by 1) in glm, added test. (github)
- GLM fix: fix filtering of rows with NAs and fix in sparse handling. (github)
- Fix GLM job fail path to call Job.fail(). (github)
- Full AUC computation, bug fixes (github)
- Fix ADMM for upper/lower bounds. (updated rho settings + update u-vector in ADMM for intercept) (github)
- Few glm fixes (github)
- DL : KDD Algebra data set => Build DL model => ArrayIndexOutOfBoundsException (PUBDEV-696)
- GBM: Dev vs H2O for depth 5, minrow=10, on prostate, give different trees (PUBDEV-759)
- GBM param min_rows doesn't throw exception for negative values (PUBDEV-697)
- GBM : Build GBM Model => Too many levels in response column! (java.lang.IllegalArgumentException) => Should display proper error message (PUBDEV-698)
- GBM: Got exception 'class java.lang.AssertionError', with msg 'Something is wrong with GBM trees since returned prediction is Infinity' (PUBDEV-722)
#####API
- Cannot adapt numeric response to factors made from numbers (PUBDEV-620)
- not specifying response_column gets NPE (deep learning build_model()) I think other algos might have same thing (PUBDEV-131)
- NPE response has null msg, exception_msg and dev_msg (HEXDEV-225)
- Flow :=> Save Flow => On Mac and Windows 8.1 => NodePersistentStorage failure while attempting to overwrite (?) a flow (HEXDEV-202) (github)
- the can_build field in ModelBuilderSchema needs values[] to be set (PUBDEV-755)
- value field in the field metadata isn't getting serialized as its native type (PUBDEV-756)
#####Python
- python api asfactor() on -1/1 column issue (HEXDEV-203)
#####R
- Rapids: Operations %/% and %% returns Illegal Argument Exception in R (PUBDEV-736)
- quantile: H2oR displays wrong quantile values when calling the default quantile without specifying the probs (PUBDEV-689) (github)
- as.factor: If a user reruns as.factor on an already factor column, h2o should not show an exception (PUBDEV-622)
- as.factor works only on positive integers (PUBDEV-617) (github)
- H2O-R: model detail lists three mses, the first MSE slot does not contain any info about the model and hence, should be removed from the model details (PUBDEV-605) (github)
- H2O-R: Strings: While slicing get Error From H2O: water.DException$DistributedException (PUBDEV-592)
- R: h2o.confusionMatrix should handle both models and model metric objects (PUBDEV-590)
- R: as.Date not functional with H2O objects (PUBDEV-583) (github)
- R: some apply functions don't work on H2OFrame objects (PUBDEV-579) (github)
- h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
- R: slicing issues (PUBDEV-573)
- R: length and is.factor don't work in h2o.ddply (PUBDEV-572) (github)
- R: apply(hex, c(1,2), ...) doesn't properly raise an error (PUBDEV-570) (github)
- R: Slicing negative indices to negative indices fails (PUBDEV-569) (github)
- h2o.ddply: doesn't accept anonymous functions (PUBDEV-567) (github)
- ifelse() cannot return H2OFrames in R (PUBDEV-543)
- as.h2o loses track of headers (PUBDEV-541)
- H2O-R not showing meaningful error msg (PUBDEV-502)
- H2O.fail() had better fail (PUBDEV-470) (github)
- fix issue in toEnum (github)
- fix colnames and new col creation (github)
- R: h2o.init() is posting warning messages of an unhealthy cluster when the cluster is fine. (PUBDEV-734)
- h2o.split frame is failing (PUBDEV-560)
#####System
- key type failure should fail the request, not the cloud (PUBDEV-739) (github)
- Parse => Import Medicare supplier file => Parse = > Illegal argument for field: column_names of schema: ParseV2: string and key arrays' values must be quoted, but the client sent: " (PUBDEV-719)
- Overwriting a constant vector with strings fails (PUBDEV-702)
- H2O - gets stuck while calculating quantile, no error msg, just keeps running a job that normally takes less than a sec (PUBDEV-685)
- Summary and quantile on a column with all missing values should not throw an exception (PUBDEV-673) (github)
- View Logs => class java.lang.RuntimeException: java.lang.IllegalArgumentException: File /home2/hdp/yarn/usercache/neeraja/appcache/application_1427144101512_0039/h2ologs/h2o_172.16.2.185_54321-3-info.log does not exist (PUBDEV-600)
- Parse: After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)
- Parse: Numbers completely parsed wrong (PUBDEV-574)
- Flow: converting a column to enum while parsing does not work (PUBDEV-566)
- Parse: Fail gracefully when asked to parse a zip file with different files in it (PUBDEV-540) (github)
- toDataFrame doesn't support sequence format schema (array, vectorUDT) (PUBDEV-457)
- Parse : Parsing random crap gives java.lang.ArrayIndexOutOfBoundsException: 13 (PUBDEV-428)
- The quote stripper for column names should report when the stripped chars are not the expected quotes (PUBDEV-424)
- import directory with large files,then Frames..really slow and disk grinds. Files are unparsed. Shouldn't be grinding (PUBDEV-98)
- NodePersistentStorage gets wiped out when hadoop cluster is restarted (HEXDEV-185)
- h2o.exec won't be supported (github)
- fixed import issue (github)
- fixed init param (github)
- fix repeat as.factor NPE (github)
- startH2O set to False in init (github)
- hang on glm job removal (PUBDEV-726)
- Flow - changed column types need to be reflected in parsed data (HEXDEV-189)
- water.DException$DistributedException while running kmeans in multinode cluster (PUBDEV-691)
- Frame inspection prior to file parsing, corrupts parsing (PUBDEV-425)
#####Web UI
- Flow, DL: Need better fail message if "Autoencoder" and "use_all_factor_levels" are both selected (PUBDEV-724)
- When select AUTO while building a gbm model get ERROR FETCHING INITIAL MODEL BUILDER STATE (PUBDEV-595)
- Flow: Build h2o-dev-0.1.17.1009: Building GLM model gives java.lang.ArrayIndexOutOfBoundsException (PUBDEV-205) (github)
- Flow:Summary on flow broken for a long time (PUBDEV-785)
####New Features
#####Algorithms
- Naive Bayes in H2O-dev (PUBDEV-158)
- GLM model output, details from R (HEXDEV-94)
- Run GLM Regression from Flow (including LBFGS) (HEXDEV-110)
- PCA (PUBDEV-157)
- Port Random Forest to h2o-dev (PUBDEV-455)
- Enable DRF model output (github)
- Add DRF to Flow (Model Output) (PUBDEV-533)
- Grid for GBM (github)
- Run Deep Learning Regression from Flow (HEXDEV-109)
#####Python
- Add Python wrapper for DRF (PUBDEV-534)
#####R
- Add R wrapper for DRF (PUBDEV-530)
#####System
- Include uploadFile (PUBDEV-299) (github)
- Added -flow_dir to hadoop driver (github)
#####Web UI
- Add Flow packs (HEXDEV-190) (PUBDEV-247)
- Integrate H2O Help inside Help panel (PUBDEV-108) (github)
- Add quick toggle button to show/hide the sidebar (github)
- Add New, Open toolbar buttons (github)
- Auto-refresh data preview when parse setup input parameters are changed (PUBDEV-532)
- Flow: Add playbar with Run, Continue, Pause, Progress controls (HEXDEV-192)
- You can now stop/cancel a running flow
####Enhancements
#####Algorithms
- Display GLM coefficients only if available (PUBDEV-466)
- Add random chance line to RoC chart (HEXDEV-168)
- Speed up DLSpiral test. Ignore Neurons test (MatVec) (github)
- Use getRNG for Dropout (github)
- PUBDEV-598: Add tests for determinism of RNGs (github)
- PUBDEV-598: Implement Chi-Square test for RNGs (github)
- Add DL model output toString() (github)
- Add LogLoss to MultiNomial ModelMetrics (PUBDEV-580)
- Print number of categorical levels once we hit >1000 input neurons. (github)
- Updated the loss behavior for GBM. When `loss` is set to `AUTO` and the response is an integer with 2 levels, bernoulli (rather than gaussian) behavior is chosen. As a result, the `do_classification` flag is no longer necessary in Flow, since the loss completely specifies the desired behavior, and R users no longer need to use `as.factor()` on their response to get bernoulli behavior. The `score_each_iteration` flag has been removed as well. (github)
- Fully remove `_convert_to_enum` in all algos (github)
- Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
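
The `loss=AUTO` rule described above can be expressed as a small decision function. This is a minimal sketch of the rule as stated in the release note, not H2O's implementation: the function name `choose_auto_loss`, skipping `None` values, and rejecting non-numeric responses are assumptions made for illustration.

```python
from numbers import Real


def choose_auto_loss(response):
    """Hypothetical helper illustrating the GBM loss=AUTO rule from this note.

    Not H2O's actual code. Missing values (None) are skipped when counting
    levels, which is an assumption of this sketch.
    """
    values = [v for v in response if v is not None]
    if not values or not all(isinstance(v, Real) for v in values):
        # Categorical/string responses are outside what the note describes.
        raise ValueError("sketch only covers non-empty numeric responses")
    integer_valued = all(float(v).is_integer() for v in values)
    if integer_valued and len(set(values)) == 2:
        return "bernoulli"  # 2-level integer response -> classification
    return "gaussian"  # any other numeric response -> regression
```

Under these assumptions, `choose_auto_loss([0, 1, 1, 0])` returns `"bernoulli"`, while `choose_auto_loss([0, 1, 2])` returns `"gaussian"`.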
#####API
- Display point layer for tree vs mse plots in GBM output (PUBDEV-504)
- Rename API inputs/outputs (github)
- Rename Inf to Infinity (github)
#####Python
- added H2OFrame.setNames(), H2OFrame.cbind(), H2OVec.cbind(), h2o.cbind(), and pyunit_cbind.py (github)
- Make H2OVec.levels() return the levels (github)
- H2OFrame.dim(), H2OFrame.append(), H2OVec.setName(), H2OVec.isna() additions. demo pyunit addition (github)
#####System
- Customize H2O web UI port (PUBDEV-483)
- Make parse setup interactive (PUBDEV-532)
- Added --verbose (github)
- Adds some H2OParseExceptions. Removes all H2O.fail in parse (no parse issues should cause a fail)(github)
- Allows parse to specify check_headers=HAS_HEADERS, but not provide column names (github)
- Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
#####Web UI
- Add 'Clear cell' and 'Run all cells' toolbar buttons (github)
- Add 'Clear cell' and 'Clear all cells' commands (PUBDEV-493) (github)
- 'Run' button selects next cell after running
- ModelMetrics by model category: Clustering (PUBDEV-416)
- ModelMetrics by model category: Regression (PUBDEV-415)
- ModelMetrics by model category: Multinomial (PUBDEV-414)
- ModelMetrics by model category: Binomial (PUBDEV-413)
- Add ability to select and delete multiple models (github)
- Add ability to select and delete multiple frames (github)
- Flows now stop running when an error occurs
- Print full number of mismatches during POJO comparison check. (github)
- Make Grid multi-node safe (github)
- Beautify the vertical axis labels for Flow charts/visualization (more) (PUBDEV-329)
####Bug Fixes
#####Algorithms
- GBM only populates either MSE_train or MSE_valid but displays both (PUBDEV-350)
- GBM: train error increases after hitting zero on prostate dataset (PUBDEV-513)
- GBM : Variable importance displays 0's for response param => should not display response in table at all (PUBDEV-430)
- GLM : R/Flow ==> Build GLM Model hangs at 4% (PUBDEV-456)
- Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
- Flow: GLM - 'model.output.coefficients_magnitude.name' not found, so can't view model (PUBDEV-466)
- GBM predict fails without response column (PUBDEV-478)
- GBM: When validation set is provided, gbm should report both mse_valid and mse_train (PUBDEV-499)
- PCA Assertion Error during Model Metrics (PUBDEV-548) (github)
- KMeans: Size of clusters in Model Output is different from the labels generated on the training set (PUBDEV-542) (github)
- Inconsistency in GBM results:Gives different results even when run with the same set of params (HEXDEV-194)
- PUBDEV-580: Fix some numerical edge cases (github)
- Fix two missing float -> double conversion changes in tree scoring. (github)
- Flow: HIDDEN_DROPOUT_RATIOS for DL does not show default value (PUBDEV-285)
- Old GLM Parameters Missing (PUBDEV-431)
#####API
- SplitFrame on String column produce C0LChunk instead of CStrChunk (PUBDEV-468)
- Error in node$h2o$node : $ operator is invalid for atomic vectors (PUBDEV-348)
- Response from /ModelBuilders don't conform to standard error json shape when there are errors (HEXDEV-121) (github)
#####Python
- fix python syntax error (github)
- Fixes handling of None in python for a returned na_string. (github)
#####R
- R : Inconsistency - Train set name with and without quotes work but Validation set name with quotes does not work (PUBDEV-491)
- h2o.confusionmatrices does not work (PUBDEV-547)
- How do I convert an enum column back to integer/double from R? (PUBDEV-546)
- Summary in R is faulty (PUBDEV-539)
- R: as.h2o should preserve R data types (PUBDEV-578)
- NPE in GBM Prediction with Sliced Test Data (HEXDEV-207) (github)
- Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
- Custom Functions don't work in apply() in R (PUBDEV-436)
- got water.DException$DistributedException and then got java.lang.RuntimeException: Categorical renumber task (HEXDEV-195)
- H2O-R: as.h2o parses column name as one of the row entries (PUBDEV-591)
- R-H2O Managing Memory in a loop (PUB-1125)
- h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
- H2O-R not showing meaningful error msg
#####System
- Flow: When balance class = F then flow should not show max_after_balance_size = 5 in the parameter listing (PUBDEV-503)
- 3 jvms, doing ModelMetrics on prostate, class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-495)
- Not able to start h2o on hadoop (PUBDEV-487)
- one row (one col) dataset seems to get assertion error in parse setup request (PUBDEV-96)
- Parse : Import file (move.com) => Parse => First row contains column names => column names not selected (HEXDEV-171) (github)
- The NY0 parse rule, in summary. Doesn't look like it's counting the 0's as NAs like h2o (PUBDEV-154)
- 0 / Y / N parsing (PUBDEV-229)
- NodePersistentStorage gets wiped out when laptop is restarted. (HEXDEV-167)
- Building a model and making a prediction accepts invalid frame types (PUBDEV-83)
- Flow : Import file 15M rows 2.2 Cols => Parse => Error fetching job on UI =>Console : ERROR: Job was not successful Exiting with nonzero exit status (HEXDEV-55)
- Flow : Build GLM Model => Family tweedy => class hex.glm.LSMSolver$ADMMSolver$NonSPDMatrixException', with msg 'Matrix is not SPD, can't solve without regularization (PUBDEV-211)
- Flow : Import File : File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
- Check reproducibility on multi-node vs single-node (PUBDEV-557)
- Parse : After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)
#####Web UI
- Flow : Build Model => Parameters => shows meta text for some params (PUBDEV-505)
- Flow: K-Means - "None" option should not appear in "Init" parameters (PUBDEV-459)
- Flow: PCA - "None" option appears twice in "Transform" list (HEXDEV-186)
- GBM Model : Params in flow show two times (PUBDEV-440)
- Flow multinomial confusion matrix visualization (HEXDEV-204)
- Flow: It would be good if flow can report the actual distribution, instead of just reporting "Auto" in the model parameter listing (PUBDEV-509)
- Unimplemented algos should be taken out from drop down of build model (PUBDEV-511)
- [MapR] unable to give hdfs file name from Flow (PUBDEV-409)
###Selberg (0.2.0.1) - 3/6/15
####New Features
#####Algorithms
- Naive Bayes in H2O-dev (PUBDEV-158)
- GLM model output, details from R (HEXDEV-94)
- Run GLM Regression from Flow (including LBFGS) (HEXDEV-110)
- PCA (PUBDEV-157)
- Port Random Forest to h2o-dev (PUBDEV-455)
- Enable DRF model output (github)
- Add DRF to Flow (Model Output) (PUBDEV-533)
- Grid for GBM (github)
- Run Deep Learning Regression from Flow (HEXDEV-109)
#####Python
- Add Python wrapper for DRF (PUBDEV-534)
#####R
- Add R wrapper for DRF (PUBDEV-530)
#####System
- Include uploadFile (PUBDEV-299) (github)
- Added -flow_dir to hadoop driver (github)
#####Web UI
- Add Flow packs (HEXDEV-190) (PUBDEV-247)
- Integrate H2O Help inside Help panel (PUBDEV-108) (github)
- Add quick toggle button to show/hide the sidebar (github)
- Add New, Open toolbar buttons (github)
- Auto-refresh data preview when parse setup input parameters are changed (PUBDEV-532)
- Flow: Add playbar with Run, Continue, Pause, Progress controls (HEXDEV-192)
- You can now stop/cancel a running flow
####Enhancements
The following changes are improvements to existing features (which includes changed default values):
#####Algorithms
- Display GLM coefficients only if available (PUBDEV-466)
- Add random chance line to ROC chart (HEXDEV-168)
- Allow validation dataset for AutoEncoder (PUBDEV-581)
- Speed up DLSpiral test. Ignore Neurons test (MatVec) (github)
- Use getRNG for Dropout (github)
- PUBDEV-598: Add tests for determinism of RNGs (github)
- PUBDEV-598: Implement Chi-Square test for RNGs (github)
- PUBDEV-580: Add log loss to binomial and multinomial model metric (github)
- Add DL model output toString() (github)
- Add LogLoss to MultiNomial ModelMetrics (PUBDEV-580)
- Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
- Print number of categorical levels once we hit >1000 input neurons. (github)
- Updated the loss behavior for GBM. When `loss` is set to `AUTO` and the response is an integer with 2 levels, bernoulli (rather than gaussian) behavior is chosen. As a result, the `do_classification` flag is no longer necessary in Flow, since the loss completely specifies the desired behavior, and R users no longer need to use `as.factor()` on their response to get bernoulli behavior. The `score_each_iteration` flag has been removed as well. (github)
- Fully remove `_convert_to_enum` in all algos (github)
- Add DL POJO scoring (PUBDEV-585)
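
Log loss (PUBDEV-580) is conventionally defined as the mean negative log of the probability the model assigns to each row's true class; binomial is simply the two-class case. The sketch below implements that textbook formula in plain Python. The function name `log_loss` and the clipping bound `eps` are assumptions, and H2O's own handling of numerical edge cases may differ.

```python
import math


def log_loss(actual, probs, eps=1e-15):
    """Mean negative log-likelihood of the true class (hypothetical helper).

    actual: list of integer class indices (0..K-1), one per row
    probs:  list of per-row probability lists, one entry per class
    eps:    clipping bound so log(0) never occurs (assumed default)
    """
    if len(actual) != len(probs) or not actual:
        raise ValueError("need the same non-zero number of labels and rows")
    total = 0.0
    for label, row in zip(actual, probs):
        # Clip the true-class probability into [eps, 1 - eps] before the log.
        p = min(max(row[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(actual)
```

For instance, predicting 0.5 for both classes on every row gives a log loss of ln 2 (about 0.693).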
#####API
- Display point layer for tree vs mse plots in GBM output (PUBDEV-504)
- Rename API inputs/outputs (github)
- Rename Inf to Infinity (github)
#####Python
- added H2OFrame.setNames(), H2OFrame.cbind(), H2OVec.cbind(), h2o.cbind(), and pyunit_cbind.py (github)
- Make H2OVec.levels() return the levels (github)
- H2OFrame.dim(), H2OFrame.append(), H2OVec.setName(), H2OVec.isna() additions. demo pyunit addition (github)
#####R
- PUBDEV-578, PUBDEV-541, PUBDEV-566: The R client now sends the data frame column names and data types to ParseSetup, can get column names from a parsed frame or a list, and client requests for column data types are respected. (github)
#####System
- Customize H2O web UI port (PUBDEV-483)
- Make parse setup interactive (PUBDEV-532)
- Added --verbose (github)
- Adds some H2OParseExceptions. Removes all H2O.fail in parse (no parse issues should cause a fail)(github)
- Allows parse to specify check_headers=HAS_HEADERS, but not provide column names (github)
- Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
#####Web UI
- Add 'Clear cell' and 'Run all cells' toolbar buttons (github)
- Add 'Clear cell' and 'Clear all cells' commands (PUBDEV-493) (github)
- 'Run' button selects next cell after running
- ModelMetrics by model category: Clustering (PUBDEV-416)
- ModelMetrics by model category: Regression (PUBDEV-415)
- ModelMetrics by model category: Multinomial (PUBDEV-414)
- ModelMetrics by model category: Binomial (PUBDEV-413)
- Add ability to select and delete multiple models (github)
- Add ability to select and delete multiple frames (github)
- Flows now stop running when an error occurs
- Print full number of mismatches during POJO comparison check. (github)
- Make Grid multi-node safe (github)
- Beautify the vertical axis labels for Flow charts/visualization (more) (PUBDEV-329)
####Bug Fixes
The following changes are to resolve incorrect software behavior:
#####Algorithms
- GBM only populates either MSE_train or MSE_valid but displays both (PUBDEV-350)
- GBM: train error increases after hitting zero on prostate dataset (PUBDEV-513)
- GBM : Variable importance displays 0's for response param => should not display response in table at all (PUBDEV-430)
- Inconsistency in GBM results:Gives different results even when run with the same set of params (HEXDEV-194)
- GLM : R/Flow ==> Build GLM Model hangs at 4% (PUBDEV-456)
- Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
- Flow: GLM - 'model.output.coefficients_magnitude.name' not found, so can't view model (PUBDEV-466)
- GBM predict fails without response column (PUBDEV-478)
- GBM: When validation set is provided, gbm should report both mse_valid and mse_train (PUBDEV-499)
- PCA Assertion Error during Model Metrics (PUBDEV-548) (github)
- KMeans: Size of clusters in Model Output is different from the labels generated on the training set (PUBDEV-542) (github)
- divide by zero in modelmetrics for deep learning (PUBDEV-568)
- AUC reported on training data is 0, but should be 1 (HEXDEV-223) (github)
- GBM: reports 0th tree mse value for the validation set, different than the train set, when only the train set is provided (PUBDEV-561)
- PUBDEV-580: Fix some numerical edge cases (github)
- Fix two missing float -> double conversion changes in tree scoring. (github)
- Problems during Train/Test adaptation between Enum/Numeric (HEXDEV-229)
- DRF/GBM balance_classes=True throws unimplemented exception (HEXDEV-226)
- Flow: HIDDEN_DROPOUT_RATIOS for DL does not show default value (PUBDEV-285)
- Old GLM Parameters Missing (PUBDEV-431)
- GBM: Initial mse in bernoulli seems to be off (PUBDEV-515)
#####API
- SplitFrame on String column produce C0LChunk instead of CStrChunk (PUBDEV-468)
- Error in node$h2o$node : $ operator is invalid for atomic vectors (PUBDEV-348)
- Response from /ModelBuilders don't conform to standard error json shape when there are errors (HEXDEV-121)
#####Python
- fix python syntax error (github)
- Fixes handling of None in python for a returned na_string. (github)
#####R
- R : Inconsistency - Train set name with and without quotes work but Validation set name with quotes does not work (PUBDEV-491)
- h2o.confusionmatrices does not work (PUBDEV-547)
- How do I convert an enum column back to integer/double from R? (PUBDEV-546)
- Summary in R is faulty (PUBDEV-539)
- Custom Functions don't work in apply() in R (PUBDEV-436)
- R: as.h2o should preserve R data types (PUBDEV-578)
- as.h2o loses track of headers (PUBDEV-541)
- NPE in GBM Prediction with Sliced Test Data (HEXDEV-207) (github)
- Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
- got water.DException$DistributedException and then got java.lang.RuntimeException: Categorical renumber task (HEXDEV-195)
- h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
- R: h2o.confusionMatrix should handle both models and model metric objects (PUBDEV-590)
- H2O-R: as.h2o parses column name as one of the row entries (PUBDEV-591)
#####System
- Flow: When balance class = F then flow should not show max_after_balance_size = 5 in the parameter listing (PUBDEV-503)
- 3 jvms, doing ModelMetrics on prostate, class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-495)
- Not able to start h2o on hadoop (PUBDEV-487)
- one row (one col) dataset seems to get assertion error in parse setup request (PUBDEV-96)
- Parse : Import file (move.com) => Parse => First row contains column names => column names not selected (HEXDEV-171) (github)
- The NY0 parse rule, in summary. Doesn't look like it's counting the 0's as NAs like h2o (PUBDEV-154)
- 0 / Y / N parsing (PUBDEV-229)
- NodePersistentStorage gets wiped out when laptop is restarted. (HEXDEV-167)
- Parse : Parsing random crap gives java.lang.ArrayIndexOutOfBoundsException: 13 (PUBDEV-428)
- Flow: converting a column to enum while parsing does not work (PUBDEV-566)
- Parse: Numbers completely parsed wrong (PUBDEV-574)
- NodePersistentStorage gets wiped out when hadoop cluster is restarted (HEXDEV-185)
- Parse: Fail gracefully when asked to parse a zip file with different files in it (PUBDEV-540)(github)
- Building a model and making a prediction accepts invalid frame types (PUBDEV-83)
- Flow : Import file 15M rows 2.2 Cols => Parse => Error fetching job on UI =>Console : ERROR: Job was not successful Exiting with nonzero exit status (HEXDEV-55)
- Flow : Build GLM Model => Family tweedy => class hex.glm.LSMSolver$ADMMSolver$NonSPDMatrixException', with msg 'Matrix is not SPD, can't solve without regularization (PUBDEV-211)
- Flow : Import File : File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
- Check reproducibility on multi-node vs single-node (PUBDEV-557)
- Parse: After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)
#####Web UI
- Flow : Build Model => Parameters => shows meta text for some params (PUBDEV-505)
- Flow: K-Means - "None" option should not appear in "Init" parameters (PUBDEV-459)
- Flow: PCA - "None" option appears twice in "Transform" list (HEXDEV-186)
- GBM Model : Params in flow show two times (PUBDEV-440)
- Flow multinomial confusion matrix visualization (HEXDEV-204)
- Flow: It would be good if flow can report the actual distribution, instead of just reporting "Auto" in the model parameter listing (PUBDEV-509)
- Unimplemented algos should be taken out from drop down of build model (PUBDEV-511)
- [MapR] unable to give hdfs file name from Flow (PUBDEV-409)
###Selberg (0.2.0.1) - 3/6/15
####New Features
#####Web UI
- Flow: Delete functionality to be available for import files, jobs, models, frames (PUBDEV-241)
- Implement "Download Flow" (PUBDEV-407)
- Flow: Implement "Run All Cells" (PUBDEV-110)
#####API
- Create python package (PUBDEV-181)
- as.h2o in Python (HEXDEV-72)
#####System
####Enhancements
#####Web UI
- Flow: Job view should have info on start and end time (PUBDEV-267)
- Flow: Implement 'File > Open' (PUBDEV-408)
- Display IP address in ADMIN -> Cluster Status (HEXDEV-159)
- Flow: Display alternate UI for splitFrames() (PUBDEV-399)
#####Algorithms
- Added K-Means scoring (github)
- Flow: Implement model output for Deep Learning (PUBDEV-118)
- Flow: Implement model output for GLM (PUBDEV-120)
- Deep Learning model output (HEXDEV-89, Flow), (HEXDEV-88, Python), (HEXDEV-87, R)
- Run GLM Binomial from Flow (including LBFGS) (HEXDEV-90)
- Flow: Display confusion matrices for multinomial models (PUBDEV-397)
- During PCA, missing values in training data will be replaced with column mean (github)
- Update parameters for best model scan (github)
- Change Quantiles to match h2o-1; both Quantiles and Rollups now have the same default percentiles (github)
- Massive cleanup and removal of old PCA, replacing with quadratically regularized PCA based on alternating minimization algorithm in GLRM (github)
- Add model run time to DL Model Output (github)
- Don't gather Neurons/Weights/Biases statistics (github)
- Only store best model if `override_with_best_model` is enabled (github)
- `beta_eps` added, passing tests changed (github)
- For GLM, the default value of the `max_iters` parameter was changed from 1000 to 50.
- For quantiles, probabilities are displayed.
- Run Deep Learning Multinomial from Flow (HEXDEV-108)
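
Replacing missing values with the column mean before PCA, as noted above, fills each missing entry with the mean of the observed values in its column. A minimal pure-Python sketch, assuming rows are equal-length lists of floats with `None` marking missing values; the function name `impute_column_means` is hypothetical and this is not H2O's code.

```python
def impute_column_means(rows):
    """Replace None entries with their column's mean over observed values.

    Returns a new list of rows; the input is not modified. A column with no
    observed values is left as None (an assumption of this sketch).
    """
    if not rows:
        return []
    ncols = len(rows[0])
    means = []
    for j in range(ncols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    # Build fresh rows so callers keep their original data intact.
    return [[means[j] if r[j] is None else r[j] for j in range(ncols)] for r in rows]
```

For example, `impute_column_means([[1.0, None], [3.0, 4.0], [None, 8.0]])` returns `[[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]`.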
#####API
- Expose DL weights/biases to clients via REST call (PUBDEV-344)
- Flow: Implement notification bar/API (PUBDEV-359)
- Variable importance data in REST output for GLM (PUBDEV-359)
- Add extra DL parameters to R API (`average_activation`, `sparsity_beta`, `max_categorical_features`, `reproducible`) (github)
- Update GLRM API model output (github)
- h2o.anomaly missing in R (PUBDEV-434)
- No method to get enum levels (PUBDEV-432)
#####System
- Improve memory footprint with latest version of h2o-dev (github)
- For now, let model.delete() of DL delete its best models too. This allows R code to not leak when only calling h2o.rm() on the main model. (github)
- Bind both TCP and UDP ports before clustering (github)
- Round summary row#. Helps with pctiles for very small row counts. Add a test to check for getting close to the 50% percentile on small rows. (github)
- Increase Max Value size in DKV to 256MB (github)
- Flow: make parseRaw() do both import and parse in sequence (HEXDEV-184)
- Remove notion of individual job/job tracking from Flow (PUBDEV-449)
- Capability to name prediction results Frame in flow (PUBDEV-233)
####Bug Fixes
#####Algorithms
- GLM binomial prediction failing (PUBDEV-403)
- DL: Predict with auto encoder enabled gives Error processing error (PUBDEV-433)
- balance_classes in Deep Learning intermittent poor result (PUBDEV-437)
- Flow: Building GLM model fails (PUBDEV-186)
- summary returning incorrect 0.5 quantile for 5 row dataset (PUBDEV-95)
- GBM missing variable importance and balance-classes (PUBDEV-309)
- H2O Dev GBM first tree differs from H2O 1 (PUBDEV-421)
- get glm model from flow fails to find coefficient name field (PUBDEV-394)
- GBM/GLM build model fails on Hadoop after building 100% => Failed to find schema for version: 3 and type: GBMModel (PUBDEV-378)
- Parsing KDD wrong (PUBDEV-393)
- GLM AIOOBE (PUBDEV-199)
- Flow : Build GLM Model with family poisson => java.lang.ArrayIndexOutOfBoundsException: 1 at hex.glm.GLM$GLMLambdaTask.needLineSearch(GLM.java:359) (PUBDEV-210)
- Flow : GLM Model Error => Enum conversion only works on small integers (PUBDEV-365)
- GLM binary response, do_classfication=FALSE, family=binomial, prediction error (PUBDEV-339)
- Epsilon missing from GLM parameters (PUBDEV-354)
- GLM NPE (PUBDEV-395)
- Flow: GLM bug (or incorrect output) (PUBDEV-252)
- GLM binomial on benign.csv gets assertion error in predict (PUBDEV-132)
- current summary default_pctiles doesn't have 0.001 and 0.999 like h2o1 (PUBDEV-94)
- Flow: Build GBM/DL Model: java.lang.IllegalArgumentException: Enum conversion only works on integer columns (PUBDEV-213) (github)
- ModelMetrics on cup98VAL_z dataset has response with many nulls (PUBDEV-214)
- GBM : Predict model category output/inspect parameters shows as Regression when model is built with do classification enabled (PUBDEV-441)
- Fix double-precision DRF bugs (github)
#####System
- Null columnTypes for /smalldata/arcene/arcene_train.data (PUBDEV-406) (github)
- Flow: Waiting for -1 responses after starting h2o on hadoop cluster of 5 nodes (PUBDEV-419)
- Parse: airlines_all.csv => Airtime type shows as ENUM instead of Integer (PUBDEV-426) (github)
- Flow: Typo - "Time" option displays twice in column header type menu in Parse (PUBDEV-446)
- Duplicate validation messages in k-means output (PUBDEV-305) (github)
- Fixes Parse so that it returns to supplying generic column names when no column names exist (github)
- Flow: Import File: File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
- Flow: Parse => 1m.svm hangs at 42% (HEXDEV-174)
- Prediction NFE (PUBDEV-308)
- NPE doing Frame to key before it's fully parsed (PUBDEV-79)
- `h2o_master_DEV_gradle_build_J8` #351 hangs for past 17 hrs (PUBDEV-239)
- Sparkling water - container exited due to unavailable port (PUBDEV-357)
#####API
- Flow: Splitframe => java.lang.ArrayIndexOutOfBoundsException (PUBDEV-410) (github)
- Incorrect dest.type, description in /CreateFrame jobs (PUBDEV-404)
- space in windows filename on python (PUBDEV-444) (github)
- Python end-to-end data science example 1 runs correctly (PUBDEV-182)
- 3/NodePersistentStorage.json/foo/id should throw 404 instead of 500 for 'not-found' (HEXDEV-163)
- POST /3/NodePersistentStorage.json should handle Content-Type:multipart/form-data (HEXDEV-165)
- by class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-92)
- Sparkling water : val train:DataFrame = prostateRDD => Fails with ArrayIndexOutOfBoundsException (PUBDEV-392)
- Flow : getModels produces error: Error calling GET /3/Models.json (PUBDEV-254)
- ddply 'Could not find the operator' (HEXDEV-162) (github)
- h2o.table AIOOBE during NewChunk creation (HEXDEV-161) (github)
- Fix warning in h2o.ddply when supplying multiple grouping columns (github)
###0.1.26.1051 - 2/13/15
####New Features
- Flow: Display alternate UI for splitFrames() (PUBDEV-399)
####Enhancements
#####System
- Embedded H2O config can now provide flat file (needed for Hadoop) (github)
- Don't log GETs of individual jobs, to avoid filling up the logs (github)
#####Algorithms
- Increase GBM/DRF factor binning back to historical levels. Had been capped accidentally at nbins (typically 20), was intended to support a much higher cap. (github)
- Tweaked rho heuristic in glm (github)
- Enable variable importances for autoencoders (github)
- Removed `group_split` option from GBM
- Flow: display varimp for GBM output (PUBDEV-398)
- variable importance for GBM (github)
- GLM in H2O-Dev may provide slightly different coefficient values when applying an L1 penalty in comparison with H2O1.
####Bug Fixes
#####Algorithms
- Fixed bug in GLM exception handling causing GLM jobs to hang (github)
- Fixed a bug in kmeans input parameter schema where init was always being set to Furthest (github)
- Fixed mean computation in GLM (github)
- Fixed kmeans.R (github)
- Flow: Building GBM model fails with Error executing javascript (PUBDEV-396)
#####System
###0.1.26.1032 - 2/6/15
####New Features
#####General Improvements
- better model output
- support for Python client
- support for Maven
- support for Sparkling Water
- support for REST API schema
- support for Hadoop CDH5 (github)
#####UI
- Display summary visualizations by default in column summary output cells (PUBDEV-337)
- Display AUC curve by default in binomial prediction output cells (PUBDEV-338)
- Flow: Implement About H2O/Flow with version information (PUBDEV-111)
- Add UI for CreateFrame (PUBDEV-218)
- Flow: Add ability to cancel running jobs (PUBDEV-373)
- Flow: warn when user navigates away while having unsaved content (PUBDEV-322)
#####Algorithms
- Implement splitFrame() in Flow (PUBDEV-356)
- Variable importance graph in Flow for GLM (PUBDEV-360)
- Flow: Implement model building form init and validation (PUBDEV-102)
- Added a shuffle-and-split-frame function; Use it to build a saner model on time-series data (github)
- Added binomial model metrics (github)
- Run KMeans from R (HEXDEV-105)
- Be able to create a new GLM model from an existing one with updated coefficients (HEXDEV-48)
- Run KMeans from Python (HEXDEV-106)
- Run Deep Learning Binomial from Flow (HEXDEV-83)
- Run KMeans from Flow (HEXDEV-104)
- Run Deep Learning from Python (HEXDEV-85)
- Run Deep Learning from R (HEXDEV-84)
- Run Deep Learning Multinomial from Flow (HEXDEV-108)
- Run Deep Learning Regression from Flow (HEXDEV-109)
#####API
- Flow: added REST API documentation to the web ui (PUBDEV-60)
- Flow: Implement visualization API (PUBDEV-114)
#####System
- Dataset inspection from Flow (HEXDEV-66)
- Basic data munging (Rapids) from R (HEXDEV-70)
- Implement stack operator/stacking in Lightning (HEXDEV-128)
####Enhancements
#####UI
- Added a better message when `h2o.init()` has not yet been called: `No active connection to an H2O cluster. Try calling "h2o.init()"` (github)
#####Algorithms
- Updated column-based gradient task to use sparse interface (github)
- Updated LBFGS (added progress monitor interface, updated some default params), added progress and job support to GLM lbfgs (github)
- Added pretty print (github)
- Added AutoEncoder to R model categories (github)
- Added Coefficients table to GLM model (github)
- Updated glm lbfgs to allow for efficient lambda-search (l2 penalty only) (github)
- Removed splitframe shuffle parameter (github)
- Simplified model builders and added deeplearning model builder (github)
- Add DL model outputs to Flow (PUBDEV-372)
- Flow: Deep Learning: Expert Mode (PUBDEV-284)
- Flow: Display multinomial and regression DL model outputs (PUBDEV-383)
- Display varimp details for DL models (PUBDEV-381)
- Make binomial response "0" and "1" by default (github)
- Update R GBM demos to reflect new input parameter names (github)
- Rename GLM variable importance to normalized coefficient magnitudes (github)
#####API
- Changed `key` to `destination_key` (github)
- Cleaned up REST API schema interface (github)
- Changed method name, cleaned setup, added a pyunit runner (github)
#####System
- Allow changing column types during parse-setup (PUBDEV-376)
- Display %NAs in model builder column lists (PUBDEV-375)
- Figure out how to add H2O to PyPI (PUBDEV-178)
####Bug Fixes
#####UI
- Flow: Parse => 1m.svm hangs at 42% (PUBDEV-345)
- cup98 Dataset has columns that prevent validation/prediction (PUBDEV-349)
- Flow: predict step failed to function (PUBDEV-217)
- Flow: Arrays of numbers (ex. hidden in deeplearning) require brackets (PUBDEV-303)
- Flow v.0.1.26.1030: StackTrace was broken (PUBDEV-371)
- Flow: Import files -> Search -> Parse these files -> null pointer exception (PUBDEV-170)
- Flow: "getJobs" not working (PUBDEV-320)
- Thresholds x Metrics and Max Criteria x Metrics tables were flipped in flow (HEXDEV-155)
- Flow v.0.1.26.1030: StackTrace is broken (PUBDEV-348)
- flow: getJobs always shows "Your H2O cloud has no jobs" (PUBDEV-243)
- Flow: First and last characters deleted from ignored columns (PUBDEV-300)
- Sparkling water => Flow => Menu buttons for cell do not show up (PUBDEV-294)
#####Algorithms
- Flow: Build K Means model with default K value gives error "Required field k not specified" (PUBDEV-167)
- Slicing out a specific data point is broken (PUBDEV-280)
- Flow: SplitFrame and grep in algorithms for flow and loops back onto itself (PUBDEV-272)
- Fixed the predict method (github)
- Refactor ModelMetrics into a different class for Binomial (github)
- /Predictions.json did not cache predictions (HEXDEV-119)
- Flow, DL: Error after changing hidden layer size (PUBDEV-323)
- Error in node$h2o$node: $ operator is invalid for atomic vectors (PUBDEV-348)
- Fixed K-means predict (PUBDEV-321)
- Flow: DL build mode fails => as it's missing adding quotes to parameter (PUBDEV-301)
- Flow: Build K means model with training/validation frames => unknown error (PUBDEV-185)
- Flow: Build quantile mode=> Click goes in loop (PUBDEV-188)
#####API
- Sparkling Water/Flow: Failed to find version for schema (PUBDEV-367)
- Cloud.json returns odd node name (PUBDEV-259)
#####System
- guesser needs to send types to parse (PUBDEV-279)
- Got h2o.clusterStatus function working in R. (github)
- Parse: Using R => java.lang.NullPointerException (PUBDEV-380)
- Flow: Jobs => click on destination key => unimplemented: Unexpected val class for Inspect: class water.fvec.DataFrame (PUBDEV-363)
- Column assignment in R exposes NullPointerException in Rollup (PUBDEV-155)
- import from hdfs doesn't add files (PUBDEV-260)
- AssertionError: ERROR: got tcp resend with existing in-progress task (PUBDEV-219)
- HDFS parse fails when H2O launched on Spark CDH5 (PUBDEV-138)
- Flow: Parse failure => java.lang.ArrayIndexOutOfBoundsException (PUBDEV-296)
- "predict" step is not working in flow (PUBDEV-202)
- Flow: Frame finishes parsing but comes up as null in flow (PUBDEV-270)
- scala> flightsToORD.first() fails with "not serializable result" (PUBDEV-304)
- DL throws NPE for bad column names (PUBDEV-15)
- Flow: Build model: Not able to build KMeans/Deep Learning model (PUBDEV-297)
- Flow: Col summary for NA/Y cols breaks (PUBDEV-325)
- Sparkling Water : util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread NanoHTTPD Session,9,main (PUBDEV-346)
- toDataFrame doesn't support sequence format schema (array, vectorUDT) (PUBDEV-457)
###0.1.20.1019 - 1/19/15
####New Features
#####UI
- Added various documentation links to the build page (github)
#####Algorithms
- Ported matrix multiply over and connected it to rapids (github)
####Enhancements
#####UI
- Allow user to specify (the log of) the number of rows per chunk for a new constant chunk; use this new function in CreateFrame (github)
- Make CreateFrame non-blocking, now displays progress bar in Flow (github)
- Add row and column count to H2OFrame show method (github)
- Admin watermeter page (PUBDEV-234)
- Admin stack trace (PUBDEV-228)
- Admin profile (PUBDEV-227)
- Flow: Add download logs in UI (PUBDEV-204)
- Need shutdown, minimally like h2o (PUBDEV-74)
#####API
- Changed 2 to 3 for JSON requests (github)
- Rename some more fields for consistency (`max_iters` to `max_iterations`, `_iters` to `_iterations`, `_ncats` to `_categorical_column_count`, `_centersraw` to `centers_raw`, `_avgwithinss` to `tot_withinss`, `_withinmse` to `withinss`) (github)
- Changed K-Means output parameters (`withinmse` to `within_mse`, `avgss` to `avg_ss`, `avgbetweenss` to `avg_between_ss`) (github)
- Remove default field values from DeepLearning parameters schema, since they come from the backing class (github)
- Add @API help annotation strings to JSON model output (PUBDEV-216)
#####Algorithms
- Minor fix in rapids matrix multiplication (github)
- Updated sparse chunk to cut off binary search for prefix/suffix zeros (github)
- Updated L_BFGS for GLM - warm-start solutions during lambda search, correctly pass current lambda value, added column-based gradient task (github)
- Fix model parameters' default values in the metadata (github)
- Set default value of k (number of clusters) to 1 for K-Means (PUBDEV-251)
#####System
- Reject any training data with non-numeric values from KMeans model building (github)
####Bug Fixes
#####API
- Fixed isSparse call for constant chunks (github)
- Fixed sparse interface of constant chunks (no nonzero if const != 0) (github)
#####System
- Typeahead for folder contents apparently requires trailing "/" (github)
- Fix build and instructions for R install.packages() style of installation; note that only source installs are supported now (github)
- Fixed R test runner h2o package install issue that caused it to fail to install on dev builds (github)
###0.1.18.1013 - 1/14/15
####New Features
#####UI
- Admin timeline (PUBDEV-226)
- Admin cluster status (PUBDEV-225)
- Markdown cells should auto run when loading a saved Flow notebook (PUBDEV-87)
- Complete About page to include info about the H2O version (PUBDEV-223)
####Enhancements
#####Algorithms
- Flow: Implement model output for GBM (PUBDEV-119)
###0.1.20.1016 - 12/28/14
- Added ip_port field in node JSON output for Cloud query (github)