[SW-449] Support for Sparse Vectors and Upgrade to H2O 3.14.0.2 #334

Merged (4 commits) on Aug 23, 2017


@jakubhava (Contributor) commented Jul 11, 2017

`SparkDataFrameConverter` has been refactored. We now first create a flat DataFrame in Spark, which simplifies handling on the Sparkling Water (SW) side.

The refactoring was needed so that vectors can be handled separately.
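
One plausible reading of "flat" here is that each vector column is expanded into one scalar column per element before the data reaches H2O. The sketch below models that in plain Scala, without Spark. `Vec`, `DenseVec`, `SparseVec`, and `Flatten` are hypothetical names, not Sparkling Water classes, and zero-filling positions that a vector does not store is an assumption of the sketch.

```scala
// Hypothetical, Spark-free stand-ins for the values of a vector column.
sealed trait Vec { def size: Int }

final case class DenseVec(values: Array[Double]) extends Vec {
  def size: Int = values.length
}

// Assumes `indices` are distinct and < size; unlisted positions are 0.0.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) extends Vec

object Flatten {
  // Expand one vector into exactly `width` scalar cells, zero-filled.
  def flatten(v: Vec, width: Int): Array[Double] = {
    val out = Array.fill(width)(0.0)
    v match {
      case DenseVec(values) =>
        // Copy as many dense values as fit into the flat row.
        Array.copy(values, 0, out, 0, math.min(values.length, width))
      case SparseVec(_, indices, values) =>
        // Scatter only the stored entries; every other cell stays 0.0.
        for (k <- indices.indices if indices(k) < width) out(indices(k)) = values(k)
    }
    out
  }

  // One flat row: the row id followed by `width` vector cells.
  def flattenRow(id: Long, v: Vec, width: Int): Seq[Any] =
    id +: flatten(v, width).toSeq

  // The flat width is the largest vector size found in the column.
  def columnWidth(column: Seq[Vec]): Int =
    column.map(_.size).foldLeft(0)((acc, n) => math.max(acc, n))
}
```

For example, `Flatten.flattenRow(2L, SparseVec(3, Array(2), Array(5.0)), 3).mkString(",")` yields `2,0.0,0.0,5.0`. Keeping the dense and sparse cases in separate match arms is what lets a sparse row touch only its stored entries.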

A possible improvement to this solution would be to drop the flag indicating whether a vector is sparse or dense and make sparsity part of the expected types instead.
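
To make the proposed improvement concrete: as described, a vector column carries an expected type plus a separate sparse/dense flag, whereas the suggestion is to encode sparsity in the expected type itself. A minimal sketch, assuming a hypothetical `ExpectedType` ADT (not the real `ExternalFrameUtils` constants), of folding the flag into the type:

```scala
object ExpectedTypeSketch {
  // Hypothetical type tags; illustrative only.
  sealed trait ExpectedType
  case object ExpectedDouble extends ExpectedType
  case object ExpectedString extends ExpectedType
  case object ExpectedVector extends ExpectedType       // paired with a separate isSparse flag
  case object ExpectedDenseVector extends ExpectedType  // proposed: sparsity lives in the type
  case object ExpectedSparseVector extends ExpectedType

  // Translate the (type, isSparse) pair into a single expected type.
  // The flag only matters for vectors; for any other type it is ignored.
  def fold(t: ExpectedType, isSparse: Boolean): ExpectedType = t match {
    case ExpectedVector => if (isSparse) ExpectedSparseVector else ExpectedDenseVector
    case other          => other
  }
}
```

With the folded form, one expected type per column is enough, and a sparsity flag can no longer be attached to a non-vector column where it would be meaningless.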

This change requires h2oai/h2o-3#1348

@jakubhava (Contributor, Author) commented Jul 12, 2017

Some tests are still failing on the external cluster. I will fix them tonight or tomorrow at the latest, but this is getting close to the final state. Benchmarks seem to be in favour of this change.

@jakubhava jakubhava force-pushed the jh/jira/sw-449 branch 2 times, most recently from 440547c to 2f40a54 July 13, 2017 11:05
@jakubhava (Contributor, Author)

Ready for review; this is the final state.

@jakubhava (Contributor, Author)

@mmalohlava The `continuous-integration/jenkins/pr-head` check shouldn't succeed, since this change depends on a different H2O version, even at compile time.

@jakubhava jakubhava changed the title [SW-449] Support for Sparse Vectors [SW-449] Support for Sparse Vectors and Upgrade to H2O 3.14.0.1 Aug 11, 2017
```scala
    ExternalFrameUtils.EXPECTED_STRING
  } else if (clazz == classOf[java.sql.Timestamp]) {
    ExternalFrameUtils.EXPECTED_TIMESTAMP
  } else if (clazz == classOf[org.apache.spark.mllib.linalg.Vector]) {
```
Review comment (Member):

For the future: what about `org.apache.spark.ml.linalg.Vector`?

Review comment (Member):

Oki, found it: you are doing the `Vectors.toML(..)` conversion beforehand.
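
The exchange above turns on a common pattern: convert one vector representation into the other before the class-based dispatch, so the dispatch needs only a single vector branch. A minimal Spark-free sketch, assuming two hypothetical vector types standing in for the `mllib` and `ml` classes and hypothetical type codes (not the real `ExternalFrameUtils` constants):

```scala
object DispatchSketch {
  // Hypothetical stand-ins for two vector representations (e.g. mllib vs ml).
  final case class OtherVector(values: Array[Double])
  final case class HandledVector(values: Array[Double])

  // Hypothetical type codes; not the real ExternalFrameUtils constants.
  val ExpectedString: Byte = 1
  val ExpectedTimestamp: Byte = 2
  val ExpectedVector: Byte = 3

  // Step 1: convert the alternative representation up front.
  def normalize(value: Any): Any = value match {
    case OtherVector(vs) => HandledVector(vs)
    case other           => other
  }

  // Step 2: the class-based dispatch then needs only one vector branch.
  def expectedTypeOf(clazz: Class[_]): Byte =
    if (clazz == classOf[String]) ExpectedString
    else if (clazz == classOf[java.sql.Timestamp]) ExpectedTimestamp
    else if (clazz == classOf[HandledVector]) ExpectedVector
    else throw new IllegalArgumentException(s"Unsupported class: ${clazz.getName}")
}
```

Normalizing first keeps the `if`/`else if` chain short and means a new vector representation only needs a conversion, not another dispatch branch.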

@mmalohlava (Member)

Right now blocked by: SW-516, SW-515

jakubhava and others added 3 commits August 22, 2017 10:34
Upgrade of H2O to 3.14.0.2
  - More tests for RDD[Vector]
  - Missing header
  - Add automl into assembly
  - More tests.
  - Align test data to given length
  - More testing and testing and testing
  - Enable assertions for H2O launched for external model cluster tests
@mmalohlava mmalohlava changed the title [SW-449] Support for Sparse Vectors and Upgrade to H2O 3.14.0.1 [SW-449] Support for Sparse Vectors and Upgrade to H2O 3.14.0.2 Aug 23, 2017
@mmalohlava mmalohlava merged commit 838c2d0 into master Aug 23, 2017
@mmalohlava mmalohlava deleted the jh/jira/sw-449 branch August 23, 2017 01:22

2 participants