Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PUBDEV-7197 - Parallel GS threads should call Hyperspace iterator one… #4207

Merged
merged 3 commits into from
Jan 14, 2020

Conversation

Pscheidl
Copy link
Contributor

… at a time

https://0xdata.atlassian.net/browse/PUBDEV-7197

In the GridSearch class, there is

private MP getNextModelParams(final HyperSpaceWalker.HyperSpaceIterator<MP> hyperSpaceWalker, final Model model, final Grid grid){
      MP params = null;

      while (params == null) {
        if (hyperSpaceWalker.hasNext(model)) {
          params = hyperSpaceWalker.nextModelParameters(model);
          final Key modelKey = grid.getModelKey(params.checksum(IGNORED_FIELDS_PARAM_HASH));
          if(modelKey != null){
            params = null;
          }
        } else {
          break;
        }
      }

      return params;
    }
  }

method which calls hyperspace iterator and asks for next model params. It is perfectly possible (but highly improbable) for two threads accessing the nextModelParams method to obtain the same parameters - hyperspace iterators are not thread safe.

Solution is to use mutually exclusive access.

Currently, the worst thing that can happen is that one model parameters are used twice and thus the same model is built twice. It has no impact on the grid, as all the models are stored in the grid as key -> value, where key is has of the parameters. Therefore, the model will still occur only once in the Grid.

@deil87
Copy link
Contributor

deil87 commented Jan 13, 2020

@michalkurka I don't understand why this PR is needed. Changes from this PR are already in #4191 . Could you please consider to approve my PR?

@michalkurka
Copy link
Contributor

@deil87 will look into it, thanks

@michalkurka
Copy link
Contributor

@deil87

It seems to me your PR is more complex and introduces additional changes and refactoring. This PR is focused on fixing a specific issue.

Can we accept @Pscheidl change and include into the next fix release and then merge yours into master? It seems to me that would be the correct approach for the fix release cycle.

WDYT?

@deil87
Copy link
Contributor

deil87 commented Jan 13, 2020

@michalkurka we were discussing this issue in my PR. I think it would be right thing to do to discuss it there and with me ( or at least ask me for a review of this PR ).
Regarding my PR, I fixed all the comments. Do you have other concerns? In general I think it makes sense to let reviewers know about whether reviewer needs more time for other concerns. Otherwise it is not clear what author should wait for and just blocked.

@michalkurka
Copy link
Contributor

michalkurka commented Jan 13, 2020

@deil87 I would like to include this PR in the fix release because it is focused on just one issue.

I do need more time for reviewing your PR as there currently are more pressing matters. Thank you for understanding.

@deil87
Copy link
Contributor

deil87 commented Jan 13, 2020

@michalkurka ok, no problem. But description should reference original PR with corresponding credit.

@michalkurka
Copy link
Contributor

@deil87 thanks

@Pscheidl is this PR based off @deil87's work?

@deil87
Copy link
Contributor

deil87 commented Jan 13, 2020

@michalkurka also this comment suggests that you were not against merging it into release branch #4191 (comment) . Based on that I have changed base branch of the PR - that could have been avoided I believe.

@michalkurka
Copy link
Contributor

@deil87 I am not against anything (yet), I didn't review the PR yet and didn't make up my mind about it.

With the change in this PR I now think it might make more sense to merge this into fix release and put your change into master/dev. This is my current thinking. I am also open to including your change into rel-yu if there is a need for it. I see this PR as a no-brainer fix for the reported issue. And we can build on top of that.

@Pscheidl
Copy link
Contributor Author

No. This PR is based on observations made during the last weekend spent on benchmarking Parallel GS: https://github.com/Pscheidl/h2o-parallel-grid-search-benchmark

In the https://github.com/h2oai/h2o-3/pull/4191/files PR, early stopping logic is being removed from HyperspaceIterator's hashNext method and several other code changes are introduced, including changes breaking API. The author @deil87 seems to add some locking, but completely different than I have (he uses synchronized blocks). As a side effect, he ends up synchronizing the same block of code too, but in a different way using different means. I use my ReentrantLock mutex that has already been there for quite a while.

This PR is not based on the above-mentioned PR by @deil87 - it doesn't share any code and has much more narrow focus. I used this code to build a custom H2O jar and use it for the benchmarks and now I'm trying to make it part of our codebase.

However, I have no problem to give credit to @deil87. It surely helps the project. Therefore I did.I surely don't want anyone's credit to be forgotten. Thank you @deil87 .

@deil87
Copy link
Contributor

deil87 commented Jan 13, 2020

@Pscheidl we were discussing together synchronisation here #4191 (comment) 5 days ago . Please remove these comments from the code.

@deil87 deil87 self-requested a review January 13, 2020 19:42
Copy link
Contributor

@deil87 deil87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Please remove credits from the code.

@Pscheidl
Copy link
Contributor Author

Pscheidl commented Jan 13, 2020

I wanted to give the missing credit as you pointed out. I'll of course remove the code as you instruct.

Not sure what should I do now to correct my mistakes. I'll leave it as-is for now and await further instructions.

Not merging the PR for now !

@deil87 deil87 self-requested a review January 13, 2020 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants