feat: Add the option to get Feature Contributions in LightGBMBooster used by LightGBMRanker #791

JoanFM · 2020-01-30T17:39:31Z

feat: This PR adds the option to get the feature contributions (SHAP values) for a given score. For now I have applied it only to LightGBMRanker; it might also be useful for other models, but I am not sure.
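For context, this sketch assumes the LightGBM layout for a single-output model such as a ranker: one contribution per feature, followed by a final bias (expected value) term, with all entries summing to the raw score. It shows how such a vector can be read. `FeatureContributions` and `fromRaw` are hypothetical names and are not the API added in this PR.

```scala
// Hypothetical helper: interprets a raw SHAP vector laid out as
// [contribution_0, ..., contribution_{n-1}, bias], the layout LightGBM uses
// for single-output models (assumption: one output, so no per-class blocks).
final case class FeatureContributions(perFeature: Array[Double], bias: Double) {
  // Additivity: per-feature contributions plus the bias reconstruct the raw score.
  def rawScore: Double = perFeature.sum + bias

  // Feature indices ordered by absolute contribution, largest first.
  def rankedFeatureIndices: Seq[Int] =
    perFeature.indices.sortBy(i => -math.abs(perFeature(i)))
}

object FeatureContributions {
  def fromRaw(raw: Array[Double]): FeatureContributions = {
    require(raw.nonEmpty, "expected at least the bias term")
    FeatureContributions(raw.init, raw.last)
  }
}

object ContributionsDemo extends App {
  val fc = FeatureContributions.fromRaw(Array(0.5, -0.25, 0.125, 1.0))
  println(fc.rawScore) // 1.375
  println(fc.rankedFeatureIndices.mkString(", "))
}
```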

test: I have added tests in VerifyLightGBMRanker to check that the feature SHAP outputs have the expected length and values.
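This check illustrates the kind of properties such a test can assert: a length of one entry per feature plus the bias term, and entries that sum to the model's raw score. The actual assertions in VerifyLightGBMRanker may differ. `ShapChecks.isConsistent` is a hypothetical helper.

```scala
// Hypothetical check mirroring the properties a SHAP test can assert;
// it is not the code in VerifyLightGBMRanker.
object ShapChecks {
  /** True if `shap` has one entry per feature plus a trailing bias term
    * and its entries sum to `rawScore` within `tol`. */
  def isConsistent(shap: Array[Double],
                   numFeatures: Int,
                   rawScore: Double,
                   tol: Double = 1e-6): Boolean =
    shap.length == numFeatures + 1 && math.abs(shap.sum - rawScore) <= tol
}

object ShapChecksDemo extends App {
  // Two features plus a bias term whose sum matches the raw score.
  println(ShapChecks.isConsistent(Array(0.2, 0.3, -0.1), numFeatures = 2, rawScore = 0.4))
}
```

A tolerance is used rather than exact equality because the native library sums contributions in floating point, and the order of that summation can differ from `Array.sum`.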

Please let me know if you have any questions or would like me to change anything in this PR. I hope you find it helpful.


imatiach-msft · 2020-01-30T18:08:01Z

/azp run

azure-pipelines · 2020-01-30T18:08:11Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-01-30T18:11:39Z

@JoanFM this is very interesting! I have worked a lot on https://github.com/slundberg/shap and https://github.com/interpretml/interpret-community lately, so I'm very excited to see this contribution in mmlspark. It would be even more amazing if we could add a Spark-based implementation of TreeExplainer that works natively with the SparkML tree-based models.
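A SHAP value is the Shapley value of a feature in a game whose payoff is the model output. This sketch computes exact Shapley values by enumerating every coalition of features, which takes exponential time. It is a brute-force definition, not the polynomial-time TreeSHAP algorithm that TreeExplainer implements. For simplicity, the value function replaces features outside a coalition with a single baseline row. That is an assumption; TreeExplainer instead uses the cover statistics of the tree paths. `BruteForceShapley` is a hypothetical name.

```scala
// Exact Shapley values by enumerating all 2^n feature coalitions.
// Assumption: features outside a coalition take values from one baseline row.
object BruteForceShapley {
  private def factorial(k: Int): Double = (1 to k).foldLeft(1.0)(_ * _)

  def shapley(f: Array[Double] => Double,
              x: Array[Double],
              baseline: Array[Double]): Array[Double] = {
    val n = x.length
    require(baseline.length == n, "x and baseline must have the same length")
    require(n < 31, "coalitions are encoded in an Int bitmask")

    // Payoff of a coalition encoded as a bitmask: features in the mask take
    // their value from x, all other features take it from the baseline.
    def value(mask: Int): Double =
      f(Array.tabulate(n)(j => if ((mask & (1 << j)) != 0) x(j) else baseline(j)))

    val nFact = factorial(n)
    Array.tabulate(n) { i =>
      var phi = 0.0
      // Sum over coalitions S that do not contain feature i.
      for (mask <- 0 until (1 << n) if (mask & (1 << i)) == 0) {
        val s = Integer.bitCount(mask)
        val weight = factorial(s) * factorial(n - s - 1) / nFact
        phi += weight * (value(mask | (1 << i)) - value(mask))
      }
      phi
    }
  }
}

object ShapleyDemo extends App {
  // For a linear model, phi_i = w_i * (x_i - baseline_i).
  val phi = BruteForceShapley.shapley(
    z => 2 * z(0) - 3 * z(1) + 0.5 * z(2),
    Array(1.0, 2.0, 3.0),
    Array(0.0, 0.0, 0.0))
  println(phi.mkString(", "))
}
```

By construction, the values satisfy the efficiency property that SHAP relies on: they sum to `f(x) - f(baseline)`. A distributed TreeExplainer would have to produce the same numbers without the exponential enumeration.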

imatiach-msft · 2020-01-30T18:18:39Z

(similar to the LIME explainer implementation that @mhamilton723 worked on in mmlspark)

src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBooster.scala

imatiach-msft · 2020-01-30T18:29:34Z

/azp run

azure-pipelines · 2020-01-30T18:29:44Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov · 2020-01-30T18:36:35Z

Codecov Report

Merging #791 into master will decrease coverage by 42.10%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master     #791       +/-   ##
===========================================
- Coverage   52.40%   10.30%   -42.11%     
===========================================
  Files         241      185       -56     
  Lines        9704     8504     -1200     
  Branches      529      525        -4     
===========================================
- Hits         5085      876     -4209     
- Misses       4619     7628     +3009

Impacted Files	Coverage Δ
...ain/scala/org/apache/spark/ml/param/UDFParam.scala	`0.00% <0.00%> (-100.00%)`	⬇️
...n/scala/org/apache/spark/ml/param/UDPyFParam.scala	`0.00% <0.00%> (-100.00%)`	⬇️
...cala/com/microsoft/ml/spark/io/binary/Binary.scala	`0.00% <0.00%> (-100.00%)`	⬇️
...cala/org/apache/spark/ml/param/DataTypeParam.scala	`0.00% <0.00%> (-100.00%)`	⬇️
...ala/org/apache/spark/ml/param/ByteArrayParam.scala	`0.00% <0.00%> (-100.00%)`	⬇️
...ala/org/apache/spark/ml/param/DataFrameParam.scala	`0.00% <0.00%> (-100.00%)`	⬇️
...ala/org/apache/spark/ml/param/EstimatorParam.scala	`0.00% <0.00%> (-100.00%)`	⬇️
...ala/org/apache/spark/ml/param/EvaluatorParam.scala	`0.00% <0.00%> (-100.00%)`	⬇️
...la/org/apache/spark/ml/param/ParamSpaceParam.scala	`0.00% <0.00%> (-100.00%)`	⬇️
...a/org/apache/spark/ml/param/TransformerParam.scala	`0.00% <0.00%> (-100.00%)`	⬇️
... and 200 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 875f89d...cf092d6.


imatiach-msft · 2020-01-31T15:57:17Z

/azp run

azure-pipelines · 2020-01-31T15:57:28Z

Azure Pipelines successfully started running 1 pipeline(s).

JoanFM · 2020-01-31T16:36:16Z

@imatiach-msft is it normal for the cognitive unit tests to fail, or is it just some system instability?

imatiach-msft · 2020-01-31T16:37:11Z

probably some system instability

imatiach-msft · 2020-01-31T16:37:22Z

/azp run

azure-pipelines · 2020-01-31T16:37:32Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-01-31T16:38:52Z

tagging @mhamilton723 for the cognitive test failures

imatiach-msft · 2020-01-31T17:20:27Z

@mhamilton723 any idea why the cognitive tests are failing? might you be able to take a look? Those test failures seem to be unrelated to this PR.

imatiach-msft · 2020-01-31T17:20:43Z

/azp run

azure-pipelines · 2020-01-31T17:20:55Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-02-10T17:47:04Z

@JoanFM could you please resolve the conflicts in src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBooster.scala? Thank you!

imatiach-msft · 2020-02-10T19:00:17Z

/azp run

azure-pipelines · 2020-02-10T19:00:28Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-02-10T21:44:42Z

@JoanFM sorry, could you again resolve the conflicts in src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBooster.scala? It's due to the new PR you had which was just merged. Thank you!

JoanFM · 2020-02-11T07:50:01Z

> @JoanFM sorry, could you again resolve the conflicts in src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBooster.scala? It's due to the new PR you had which was just merged. Thank you!

I just did; I was expecting conflicts between this set of PRs.

imatiach-msft · 2020-02-11T15:35:09Z

/azp run

azure-pipelines · 2020-02-11T15:35:30Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-02-11T15:57:29Z

@JoanFM looks like there was some compilation failure:

[error] /home/vsts/work/1/s/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBooster.scala:198:24: type mismatch;
[error] found : ThreadLocal[com.microsoft.ml.spark.lightgbm.LongLongNativePtrHandler]
[error] required: com.microsoft.ml.lightgbm.SWIGTYPE_p_long_long
[error] boosterHandler.scoredDataLengthLongPtr, boosterHandler.scoredDataOutPtr)
[error] required: com.microsoft.ml.lightgbm.SWIGTYPE_p_double
[error] boosterHandler.shapDataLengthLongPtr, boosterHandler.shapDataOutPtr)
[error] ^
[error] /home/vsts/work/1/s/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBooster.scala:222:24: type mismatch;
[error] found : ThreadLocal[com.microsoft.ml.spark.lightgbm.LongLongNativePtrHandler]
[error] required: com.microsoft.ml.lightgbm.SWIGTYPE_p_long_long
[error] boosterHandler.shapDataLengthLongPtr, boosterHandler.shapDataOutPtr)
[error] ^
[error] /home/vsts/work/1/s/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBooster.scala:222:62: type mismatch;
[error] found : ThreadLocal[com.microsoft.ml.spark.lightgbm.DoubleNativePtrHandler]
[error] required: com.microsoft.ml.lightgbm.SWIGTYPE_p_double
[error] boosterHandler.shapDataLengthLongPtr, boosterHandler.shapDataOutPtr)
[error] ^
[error] /home/vsts/work/1/s/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBooster.scala:224:32: type mismatch;
[error] found : ThreadLocal[com.microsoft.ml.spark.lightgbm.DoubleNativePtrHandler]
[error] required: com.microsoft.ml.lightgbm.SWIGTYPE_p_double
[error] shapToArray(boosterHandler.shapDataOutPtr)
[error] ^
[error] 15 errors found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 68 s, completed Feb 11, 2020 3:42:48 PM
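The errors share one pattern. A `ThreadLocal[...NativePtrHandler]` is passed where the native call expects the raw SWIG pointer type (`SWIGTYPE_p_long_long` or `SWIGTYPE_p_double`). The general remedy is to unwrap the thread-local value and then take the pointer it holds before making the call. This thread does not show the accessor names on `LongLongNativePtrHandler` and `DoubleNativePtrHandler`, so the sketch below uses hypothetical stand-in classes (`SwigLongLongPtr`, `LongLongPtrHandler`, `nativeFill`). It illustrates the type mismatch and is not the actual fix committed in this PR.

```scala
// Hypothetical stand-ins for a SWIG pointer type and the handler that wraps it;
// the real mmlspark and LightGBM SWIG classes differ.
final class SwigLongLongPtr(val address: Long)
final class LongLongPtrHandler(val ptr: SwigLongLongPtr)

object ThreadLocalUnwrap {
  // Stand-in for a native call that expects the raw pointer type.
  def nativeFill(out: SwigLongLongPtr): Long = out.address

  // One handler per thread, like the ThreadLocal fields named in the error log.
  val lengthPtr: ThreadLocal[LongLongPtrHandler] = new ThreadLocal[LongLongPtrHandler] {
    override def initialValue(): LongLongPtrHandler =
      new LongLongPtrHandler(new SwigLongLongPtr(42L))
  }

  // nativeFill(lengthPtr) does not compile:
  //   found: ThreadLocal[LongLongPtrHandler], required: SwigLongLongPtr
  // Unwrap the ThreadLocal with get(), then the handler with .ptr.
  def call(): Long = nativeFill(lengthPtr.get().ptr)
}
```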

src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBooster.scala

imatiach-msft · 2020-02-11T15:59:56Z

@JoanFM I made a suggestion for where the fix needs to be added

imatiach-msft · 2020-02-11T16:29:44Z

/azp run

azure-pipelines · 2020-02-11T16:29:55Z

Azure Pipelines successfully started running 1 pipeline(s).

…used by LightGBMRanker (microsoft#791)

* Allow LightGBMRanker to compute features shap
* Take featureShapGetter into trait that potentially can be used by other models
* Fix data used to be the one for shap and add tests for getShapFeatures in LightGBMRanker
* Fix issues with merge conflict resolution
* Refactor to share predictForMat and predictForCSR from Score, Shap and LeafIndex usage
* Fix compilation issue

joanfontanals added 2 commits January 30, 2020 17:40

Allow LightGBMRanker to compute features shap

8f09937

Take featureShapGetter into trait that potentially can be used by other models

0d9084b
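The idea of moving the SHAP getter into a trait is that any model able to produce per-feature contributions can mix it in, and callers can then request SHAP values without knowing the concrete model. The sketch below assumes that design. `FeatureShapGetter` here is a hypothetical signature, not the trait's actual definition in mmlspark. The toy linear model takes an all-zero input as its reference point, so each contribution is `weights(i) * features(i)` and the bias equals the intercept.

```scala
// Hypothetical trait: models that can explain a prediction mix this in.
trait FeatureShapGetter {
  /** Returns one contribution per feature followed by a bias term. */
  def featuresShap(features: Array[Double]): Array[Double]
}

// Toy stand-in for a booster. Assumption: contributions are measured against
// an all-zero reference input, so feature i contributes weights(i) * features(i).
final class LinearModel(weights: Array[Double], intercept: Double) extends FeatureShapGetter {
  def predict(features: Array[Double]): Double =
    weights.indices.map(i => weights(i) * features(i)).sum + intercept

  override def featuresShap(features: Array[Double]): Array[Double] =
    weights.indices.map(i => weights(i) * features(i)).toArray :+ intercept
}

object ShapTraitDemo extends App {
  val model = new LinearModel(Array(1.0, 2.0), intercept = 0.5)
  // Callers depend only on the trait, not on the concrete model type.
  val explainer: FeatureShapGetter = model
  println(explainer.featuresShap(Array(3.0, 4.0)).mkString(", "))
}
```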

JoanFM requested a review from imatiach-msft as a code owner January 30, 2020 17:39

Fix data used to be the one for shap and add tests for getShapFeatures in LightGBMRanker

a191417

JoanFM force-pushed the predict_contrib branch from b0e881a to a191417 January 30, 2020 17:40

Merge branch 'master' into predict_contrib

d5d05f8

JoanFM closed this Jan 30, 2020

JoanFM reopened this Jan 30, 2020

JoanFM force-pushed the predict_contrib branch 2 times, most recently from 51d9e5f to 148ed60 January 30, 2020 18:16

imatiach-msft reviewed Jan 30, 2020

src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBooster.scala (outdated, resolved)

Fix issues with merge conflict resolution

c7f6a3d

JoanFM force-pushed the predict_contrib branch from 148ed60 to c7f6a3d January 30, 2020 18:25

imatiach-msft reviewed Jan 30, 2020

src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBooster.scala (outdated, resolved)

Refactor to share predictForMat and predictForCSR from Score, Shap and LeafIndex usage

8cec78b
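Sharing `predictForMat` and `predictForCSR` across score, SHAP and leaf-index requests works because LightGBM's C API selects the kind of prediction with a single `predict_type` argument. The width of the output buffer per row depends on that argument. The sketch below assumes a shared routine that switches on the predict type to size its output. `PredictType`, `SharedPredict` and `outputWidth` are hypothetical names. The numeric codes are the `C_API_PREDICT_*` values defined in LightGBM's `c_api.h`.

```scala
// Hypothetical mirror of LightGBM's C API predict_type constants.
sealed abstract class PredictType(val code: Int)
object PredictType {
  case object Normal    extends PredictType(0) // C_API_PREDICT_NORMAL
  case object RawScore  extends PredictType(1) // C_API_PREDICT_RAW_SCORE
  case object LeafIndex extends PredictType(2) // C_API_PREDICT_LEAF_INDEX
  case object Contrib   extends PredictType(3) // C_API_PREDICT_CONTRIB
}

object SharedPredict {
  import PredictType._

  /** Number of doubles written per input row for a given predict type. */
  def outputWidth(pt: PredictType, numClasses: Int, numIterations: Int, numFeatures: Int): Int =
    pt match {
      case Normal | RawScore => numClasses                     // one score per class
      case LeafIndex         => numClasses * numIterations     // one leaf per tree
      case Contrib           => numClasses * (numFeatures + 1) // contributions + bias, per class
    }
}

object PredictTypeDemo extends App {
  // A ranker has a single output, so its SHAP row has numFeatures + 1 entries.
  println(SharedPredict.outputWidth(PredictType.Contrib, 1, 100, 10))
}
```

With a single routine parameterized this way, only the buffer size and the post-processing differ between scores, SHAP values and leaf indices.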

Solve merge conflicts

796f738

Merge branch 'master' into predict_contrib

acb781d

imatiach-msft reviewed Feb 11, 2020

src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBooster.scala (outdated, resolved)

Fix compilation issue

cf092d6

JoanFM force-pushed the predict_contrib branch from c752543 to cf092d6 February 11, 2020 16:27

imatiach-msft approved these changes Feb 11, 2020


imatiach-msft merged commit f702921 into microsoft:master Feb 11, 2020

JoanFM deleted the predict_contrib branch February 11, 2020 18:19

candalfigomoro mentioned this pull request Mar 4, 2020

LightGBM SHAP values #468

Closed

