Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PUBDEV-7204: GridSearch over TE parameters is not working #4216

Merged
merged 4 commits into from
Jan 23, 2020

Conversation

deil87
Copy link
Contributor

@deil87 deil87 commented Jan 14, 2020

This PR fixes two issues:

  1. Implementation of TargetEncoderBuilder does not take into account the way how GridSearch is working. Instead of using TargetEncoderBuilder's constructor which sets passed model parameters, GridSearch is using startupOnce constructor with default parameters so that it can later reassign them with values from the grid. Even though initialisation is being called, training frame is not available in default model parameters

  2. Wrong key was passed to TargetEncoderModel
    new TargetEncoderModel(_result, ... . Instead of _result from ModelBuilder the one from Job was used. See for more details my comment PUBDEV-7204: GridSearch over TE parameters is not working #4216 (comment)

@deil87 deil87 added WIP do not merge For PRs that are not supposed to be merged labels Jan 14, 2020
@@ -113,7 +113,7 @@ public final Futures delete(Key<Job> job_key, Futures fs, boolean cascade) {
@Override public Lockable atomic(Lockable old) {
_old = old;
if( old != null ) { // Prior Lockable exists?
assert !old.is_wlocked(_job_key) : "Key "+_key+" already locked (or deleted); lks="+Arrays.toString(old._lockers); // No double locking by same job
if(old.is_wlocked(_job_key)) return old;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michalkurka can we not fail if Lockable is already locked with the same key? This way we can probably include write_lock inside update. I don't know if there are situations when we want to update without prior locking. Maybe local updates do not require distributed locks( then it is better not to include write_lock inside update to give flexibility) .
Of course without failing developers are free to abuse this ability and use write_lock unnecessary number of times. But still it is easier than trying to understand why some execution paths have different lock story. I run into such situation : for GBM model parameters process of building a model does not release lock but for TargetEncoder model parameters it does. But it is hard to understand who and where releases the lock in TE path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is outdated. But still interesting to know.

@@ -55,9 +60,8 @@ public void computeImpl() {
disableIgnoreConstColsFeature();
TargetEncoderModel.TargetEncoderOutput output = new TargetEncoderModel.TargetEncoderOutput(TargetEncoderBuilder.this, _targetEncodingMap, priorMean);
_targetEncoderModel = new TargetEncoderModel(_job._result, _parms, output, tec);

_targetEncoderModel.write_lock(_job);
_targetEncoderModel.unlock(_job);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michalkurka What is the purpose of these two lines? Does it make sure that _targetEncoderModel is updated globally?

@deil87 deil87 force-pushed the PUBDEV-7204_grid_search_over_te branch from 7a8efb0 to 9ce5a40 Compare January 20, 2020 22:02

_targetEncoderModel.write_lock(_job);
_targetEncoderModel.unlock(_job);
_targetEncoderModel = new TargetEncoderModel(_result, _parms, output, tec);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

job._result was a wrong key to pass to a Model. It was causing key to be removed by Scope. Needed key that is stored in ModelBuilder's _result field first, matches keys that are used to retrieve models from the grid ( this way we don't get N null models in the end) and second, this key is being created outside of (before) Scope.enter() and is not being unnecessary deleted from DKV.

@deil87 deil87 added please review and removed WIP do not merge For PRs that are not supposed to be merged labels Jan 20, 2020
@deil87 deil87 requested a review from ledell January 20, 2020 22:14
@michalkurka
Copy link
Contributor

michalkurka commented Jan 21, 2020

I will review this first thing tomorrow - in progress now :)

@michalkurka
Copy link
Contributor

@deil87 tests are great, thank you

Code changes in TargetEncoderModel do not respect how initialization should be done and do not lock model while it is building, my attempt to address these issues: #4245

@deil87
Copy link
Contributor Author

deil87 commented Jan 22, 2020

@michalkurka I also noticed that parallelGridSearch() method does not wrap computations in a try-catch block as gridSearch() does. Maybe we can move this logic up an something like this:

try {
          if (_parallelism == 1) {
            gridSearch(grid);
          } else if (_parallelism > 1) {
            parallelGridSearch(grid);
          } else {
            throw new IllegalArgumentException(String.format("Grid search parallelism level must be >= 1. Give value is '%d'.",
                    _parallelism));
          }
        } finally {
          grid.unlock(_job);
        }

?
and maybe even do `attemptGridSave(grid) before unlocking as well

Michal Kurka added 2 commits January 22, 2020 13:37
PUBDEV-7204: Fix grid search for Target Encoder
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants