
Use compact and compressed model json by default #375

Merged 1 commit into salesforce:master on Jul 30, 2019

Conversation

gerashegalov (Contributor)

Related issues
Fixes #374

Describe the proposed solution
Use compact JSON serialization and apply gzip compression to the output.

Describe alternatives you've considered
Make this behavior configurable.

Additional context
In a test scenario, the output size is reduced from 1.6M to 188K.
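
For context on what "compact serialization, apply gzip" means in practice, here is a minimal Scala sketch (not the actual diff; `saveModelJson` is a hypothetical helper) of rendering the model JSON without pretty-printing whitespace and writing it through Spark with a gzip codec:

```scala
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.SparkContext
import org.json4s.JValue
import org.json4s.jackson.JsonMethods.{compact, render}

// Hypothetical helper: render the JSON compactly (single line, no indentation)
// and let Spark/Hadoop gzip the output.
def saveModelJson(sc: SparkContext, json: JValue, path: String): Unit = {
  val compactJson = compact(render(json))          // compact instead of pretty rendering
  sc.parallelize(Seq(compactJson), numSlices = 1)  // one partition -> one output file
    .saveAsTextFile(path, classOf[GzipCodec])      // writes a gzip-compressed part file under `path`
}
```

Compact rendering drops the indentation whitespace and the gzip codec compresses the repetitive JSON structure; together they account for the 1.6M to 188K reduction reported above.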


codecov bot commented Jul 29, 2019

Codecov Report

Merging #375 into master will increase coverage by 0.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #375      +/-   ##
==========================================
+ Coverage   86.77%   86.79%   +0.01%     
==========================================
  Files         336      336              
  Lines       10921    10922       +1     
  Branches      342      577     +235     
==========================================
+ Hits         9477     9480       +3     
+ Misses       1444     1442       -2
Impacted Files Coverage Δ
...cala/com/salesforce/op/OpWorkflowModelWriter.scala 100% <100%> (ø) ⬆️
...es/src/main/scala/com/salesforce/op/OpParams.scala 89.79% <0%> (+4.08%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 82bb2c1...527b3a1.

tovbinm (Collaborator) left a comment:

Don’t you need to mention the gzip codec when reading the model?

gerashegalov (Contributor, Author) replied:

> Don’t you need to mention the gzip codec when reading the model?

It's handled transparently by Hadoop's TextInputFormat based on the filename extension. On the write path, the filename extension is appended based on the compression codec. On the read path, you can even mix files with different compression codecs/extensions and uncompressed files in the same directory.
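
To illustrate that read-path transparency, here is a small self-contained sketch (assuming a local Spark session; the `/tmp/model/...` paths and the literal JSON strings are placeholders). Hadoop's `TextInputFormat` picks a decompression codec per file from its extension, so a single `textFile` call reads gzip-compressed and plain files alike:

```scala
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("gzip-read-demo").getOrCreate()
val sc = spark.sparkContext

// Write the same kind of content twice: once gzip-compressed, once uncompressed.
sc.parallelize(Seq("""{"k":1}"""), 1).saveAsTextFile("/tmp/model/compressed", classOf[GzipCodec])
sc.parallelize(Seq("""{"k":2}"""), 1).saveAsTextFile("/tmp/model/plain")

// One read call covers both: the codec is chosen per file from the ".gz"
// extension, with no explicit decompression configuration on the read path.
sc.textFile("/tmp/model/*").collect().foreach(println)

spark.stop()
```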

@tovbinm tovbinm merged commit b505ff7 into salesforce:master Jul 30, 2019
@gerashegalov gerashegalov deleted the gera/model-json-out branch July 31, 2019 18:27
@gerashegalov gerashegalov mentioned this pull request Sep 8, 2019
gerashegalov added a commit that referenced this pull request Sep 11, 2019
Bug fixes:
- Ensure correct metrics despite model failures on some CV folds [#404](#404)
- Fix flaky `ModelInsight` tests [#395](#395)
- Avoid creating `SparseVector`s for LOCO [#377](#377)

New features / updates:
- Model combiner [#385](#399)
- Added new sample for HousingPrices [#365](#365)
- Test to verify that custom metrics appear in model insight metrics [#387](#387)
- Add `FeatureDistribution` to `SerializationFormat`s [#383](#383)
- Add metadata to `OpStandardScaler` to allow for descaling [#378](#378)
- Improve json serde error in `evalMetFromJson` [#380](#380)
- Track mean & standard deviation as metrics for numeric features and for text length of text features [#354](#354)
- Making model selectors robust to failing models [#372](#372)
- Use compact and compressed model json by default [#375](#375)
- Descale feature contribution for Linear Regression & Logistic Regression [#345](#345)

Dependency updates:   
- Update tika version [#382](#382)
Successfully merging this pull request may close these issues.

compact and compressed json serialization for models
2 participants