Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[java][mmlspark] Fix cached predictor causing bad values for predicted probabilities #2356

Merged
merged 3 commits into from Aug 30, 2019

Conversation

imatiach-msft
Copy link
Contributor

In mmlspark, the predictied probabilities column was appearing the same as raw predicted values when both columns were computed from the model. This is due to a caching bug in the native code logic that was recently added. I added an array of predictors, one for each type, to fix the caching bug - and also added some additional checks in case the cached predictor needs to be reinitialized (not used by mmlspark directly).

@imatiach-msft
Copy link
Contributor Author

adding @eisber

src/c_api.cpp Outdated
int64_t single_row_num_pred_in_one_row_;
std::unique_ptr<Predictor> single_row_predictor_[PREDICTOR_TYPES];
bool single_row_predictor_pred_early_stop[PREDICTOR_TYPES];
int single_row_predictor_pred_early_stop_freq[PREDICTOR_TYPES];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to confirm, will these arrays be freed correctly with the class on the heap once class is deleted?

Copy link
Contributor

@eisber eisber Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those are fixed size arrays and they'll be deallocated directly.

I'd suggest to use create a

  • new class containing the relevant subset (unique_ptr, bool, int)
  • implement operator== (const Config&) and operator=(const Config&)
  • would make the assignment and comparison a bit more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eisber I already created a new class containing those variables, can you take a look at the latest commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also for comparison "implement operator== (const Config&) and operator=(const Config&)" I created a method since I needed to compare other variables besides the Config

src/c_api.cpp Outdated Show resolved Hide resolved
@guolinke
Copy link
Collaborator

I remember the single row prediction is added by @eisber
ping @eisber to review.

src/c_api.cpp Outdated Show resolved Hide resolved
src/c_api.cpp Outdated Show resolved Hide resolved
@guolinke
Copy link
Collaborator

@imatiach-msft Thanks! much cleaner now.

src/c_api.cpp Outdated Show resolved Hide resolved
@imatiach-msft imatiach-msft changed the title [mmlspark] Fix cached predictor causing bad values for predicted probabilities [java][mmlspark] Fix cached predictor causing bad values for predicted probabilities Aug 29, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants