[java][mmlspark] Fix cached predictor causing bad values for predicted probabilities #2356
adding @eisber
src/c_api.cpp
int64_t single_row_num_pred_in_one_row_;
std::unique_ptr<Predictor> single_row_predictor_[PREDICTOR_TYPES];
bool single_row_predictor_pred_early_stop[PREDICTOR_TYPES];
int single_row_predictor_pred_early_stop_freq[PREDICTOR_TYPES];
just to confirm, will these arrays be freed correctly with the class on the heap once class is deleted?
Those are fixed-size arrays, so they'll be deallocated together with the object.
I'd suggest that you:
- create a new class containing the relevant subset (unique_ptr, bool, int)
- implement operator==(const Config&) and operator=(const Config&)

That would make the assignment and comparison a bit more readable.
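A minimal sketch of what such a class could look like. All names here (`CachedPredictor`, `PredictSettings`, `EnsureFresh`, and the empty `Predictor` stand-in) are hypothetical and only illustrate the idea; they are not the code in this PR.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the real Predictor type.
struct Predictor {};

// Hypothetical stand-in for the settings a cached predictor depends on.
struct PredictSettings {
  bool early_stop;
  int early_stop_freq;
};

// Bundles one cached predictor with the settings it was built for.
class CachedPredictor {
 public:
  // True only if a predictor exists and was built with the same settings.
  bool operator==(const PredictSettings& s) const {
    return predictor_ != nullptr && early_stop_ == s.early_stop &&
           early_stop_freq_ == s.early_stop_freq;
  }
  bool operator!=(const PredictSettings& s) const { return !(*this == s); }

  // Rebuilds the predictor and records the settings it was built for.
  CachedPredictor& operator=(const PredictSettings& s) {
    predictor_.reset(new Predictor());
    early_stop_ = s.early_stop;
    early_stop_freq_ = s.early_stop_freq;
    ++rebuild_count_;
    return *this;
  }

  int rebuild_count() const { return rebuild_count_; }

 private:
  std::unique_ptr<Predictor> predictor_;  // released automatically on destruction
  bool early_stop_ = false;
  int early_stop_freq_ = 0;
  int rebuild_count_ = 0;
};

// Rebuild only when the cached entry does not match the requested settings.
inline void EnsureFresh(CachedPredictor* cache, const PredictSettings& s) {
  assert(cache != nullptr);
  if (*cache != s) *cache = s;
}
```

With this shape, a call site reduces to a single `EnsureFresh(&cache, settings)` before predicting, and the comparison logic lives in one place.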
@eisber I already created a new class containing those variables, can you take a look at the latest commit?
Also, regarding the suggestion to "implement operator== (const Config&) and operator=(const Config&)": I created a method for the comparison instead, since I needed to compare other variables besides the Config.
@imatiach-msft Thanks! Much cleaner now.
In mmlspark, the predicted probabilities column was appearing the same as the raw predicted values when both columns were computed from the model. This is due to a caching bug in the recently added native code logic. To fix it, I added an array of predictors, one for each type, and also added some additional checks in case the cached predictor needs to be reinitialized (not used by mmlspark directly).
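The idea of keeping one cached predictor per prediction type can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the `PredictType` enum, the toy `Predictor`, and `SingleRowPredictorCache` are hypothetical, and the sigmoid only stands in for whatever transformation the real model applies to raw scores.

```cpp
#include <cassert>
#include <cmath>
#include <memory>

// Hypothetical prediction types; the real values in the C API may differ.
enum PredictType { kNormal = 0, kRawScore = 1, kNumPredictTypes = 2 };

// Hypothetical stand-in for the real Predictor: it either returns the raw
// score or maps it to a probability with a sigmoid.
class Predictor {
 public:
  explicit Predictor(PredictType type) : type_(type) {}
  double Predict(double raw_score) const {
    if (type_ == kRawScore) return raw_score;
    return 1.0 / (1.0 + std::exp(-raw_score));
  }

 private:
  PredictType type_;
};

// One cache slot per prediction type, so a predictor built for raw scores
// is never reused for probabilities (or the other way around).
class SingleRowPredictorCache {
 public:
  double Predict(PredictType type, double raw_score) {
    assert(type >= 0 && type < kNumPredictTypes);
    std::unique_ptr<Predictor>& slot = predictors_[type];
    if (slot == nullptr) slot.reset(new Predictor(type));  // lazily built
    return slot->Predict(raw_score);
  }

 private:
  // Fixed-size member array: each unique_ptr is destroyed with the object.
  std::unique_ptr<Predictor> predictors_[kNumPredictTypes];
};
```

With a single shared slot, whichever predictor was created first would serve both requests, which matches the symptom of the two columns coming out identical. Indexing the cache by type keeps the two apart.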