New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating shogun implementations #79

Merged
merged 10 commits into from Jul 13, 2017

Conversation

Projects
None yet
4 participants
@Iron-Stark
Contributor

Iron-Stark commented Jul 10, 2017

@zoq @rcurtin

I have redone all the implementations as the implementations done previously were failing. Please review.

Merge pull request #77 from rcurtin/master
sweep support and options as dictionaries
@vigsterkr

This comment has been minimized.

Show comment
Hide comment
@vigsterkr

vigsterkr Jul 10, 2017

@Iron-Stark kudos for this one!

vigsterkr commented Jul 10, 2017

@Iron-Stark kudos for this one!

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Jul 10, 2017

Contributor

@vigsterkr

Thanks a lot :-)

Contributor

Iron-Stark commented Jul 10, 2017

@vigsterkr

Thanks a lot :-)

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Jul 10, 2017

Contributor

@zoq @rcurtin

The get_num_features is not working. Cannot figure out why. Thought it was a local issue but then it was not running here too. Can you please help with this.

Contributor

Iron-Stark commented Jul 10, 2017

@zoq @rcurtin

The get_num_features is not working. Cannot figure out why. Thought it was a local issue but then it was not running here too. Can you please help with this.

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Jul 10, 2017

Contributor

@rcurtin @zoq

The only issues with the implementations are that I am not able to reuse the self.predictions file and have to train the model again. Can you please advice me on how to reuse the self.predictions file. I went through the scikit implementations and there too we are re-training the model in RunMetrics.

Contributor

Iron-Stark commented Jul 10, 2017

@rcurtin @zoq

The only issues with the implementations are that I am not able to reuse the self.predictions file and have to train the model again. Can you please advice me on how to reuse the self.predictions file. I went through the scikit implementations and there too we are re-training the model in RunMetrics.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Jul 10, 2017

Member

You are right, the problem is the timer process does not share the parameter with the main process. A solution would be to return the modified parameter something like:

  def DTCShogun(self, options):
    def RunDTCShogun(q):
      totalTimer = Timer()

      ...

      time = totalTimer.ElapsedTime()
      q.put((time, self.predictions))

      return time

    result = timeout(RunDTCShogun, self.timeout)
    # Check for error, in this case the tuple doesn't contain extra information.
    if len(result) > 1:
       self.predictions = result[1]

    return result[0]

What do you think?

Member

zoq commented Jul 10, 2017

You are right, the problem is the timer process does not share the parameter with the main process. A solution would be to return the modified parameter something like:

  def DTCShogun(self, options):
    def RunDTCShogun(q):
      totalTimer = Timer()

      ...

      time = totalTimer.ElapsedTime()
      q.put((time, self.predictions))

      return time

    result = timeout(RunDTCShogun, self.timeout)
    # Check for error, in this case the tuple doesn't contain extra information.
    if len(result) > 1:
       self.predictions = result[1]

    return result[0]

What do you think?

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Jul 10, 2017

Contributor

@zoq

Yes I was actually implementing the same thing before but then I thought the implementations would become inconsistent with the other libraries implementations. If this is fine I will implement this one.

Contributor

Iron-Stark commented Jul 10, 2017

@zoq

Yes I was actually implementing the same thing before but then I thought the implementations would become inconsistent with the other libraries implementations. If this is fine I will implement this one.

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Jul 10, 2017

Contributor

It would be better to do the same for the scikit implementations too. I will open up a PR for that too.

Contributor

Iron-Stark commented Jul 10, 2017

It would be better to do the same for the scikit implementations too. I will open up a PR for that too.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Jul 10, 2017

Member

Right, I think it makes sense to do the same for the other scripts as well, that way we don't have to rerun the benchmarks and can save some time.

Member

zoq commented Jul 10, 2017

Right, I think it makes sense to do the same for the other scripts as well, that way we don't have to rerun the benchmarks and can save some time.

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Jul 10, 2017

Member

At a quick glance it looks good to me, but I will have to dig a bit deeper shortly. As a clarification question, I remember that with new Shogun (6.0.0 I think) we have to ensure that everything is done as a separate process, otherwise it will hang. Has this been tested against that version of Shogun? I think that Jenkins still tests against 5.0.0 because of that issue. If not I can help you set up that environment, just let me know.

Member

rcurtin commented Jul 10, 2017

At a quick glance it looks good to me, but I will have to dig a bit deeper shortly. As a clarification question, I remember that with new Shogun (6.0.0 I think) we have to ensure that everything is done as a separate process, otherwise it will hang. Has this been tested against that version of Shogun? I think that Jenkins still tests against 5.0.0 because of that issue. If not I can help you set up that environment, just let me know.

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Jul 11, 2017

Contributor

@rcurtin

The local slake account installation is shogun 6.0.0 and I have tested this PR on that account and it was working. So this should work with shogun 6.0.0

Contributor

Iron-Stark commented Jul 11, 2017

@rcurtin

The local slake account installation is shogun 6.0.0 and I have tested this PR on that account and it was working. So this should work with shogun 6.0.0

@@ -60,16 +62,15 @@ def BuildModel(self, data, labels, options):
if "k" in options:
n_neighbors = int(options.pop("k"))
else:
Log.Fatal("Required parameter 'k' not specified!")
raise Exception("missing parameter")
n_neighbors = 5

This comment has been minimized.

@rcurtin

rcurtin Jul 11, 2017

Member

Personally I would say that we should require k for the k-nearest-neighbors classifier, instead of using a default of 5. Since KNN takes a different amount of time based on the given k value, if someone forgets to put the values in their configuration file then we can end up with incorrectly different timings across libraries, especially if the defaults are not all the same (and it is a little hard to remember to keep them all in sync).

@rcurtin

rcurtin Jul 11, 2017

Member

Personally I would say that we should require k for the k-nearest-neighbors classifier, instead of using a default of 5. Since KNN takes a different amount of time based on the given k value, if someone forgets to put the values in their configuration file then we can end up with incorrectly different timings across libraries, especially if the defaults are not all the same (and it is a little hard to remember to keep them all in sync).

@rcurtin

If the tests pass locally, I think this looks good and we can go ahead and merge it and start re-running the shogun benchmarks. If you can address my one comment about KNC I think it would be good, but that may require some refactoring of KNC for some other methods too depending.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Jul 11, 2017

Member

I think, before we rerun the shogun benchmark, we should run the test against shogun 6.0 what do you think?

Member

zoq commented Jul 11, 2017

I think, before we rerun the shogun benchmark, we should run the test against shogun 6.0 what do you think?

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Jul 11, 2017

Member

Agreed, let me change the checkout script and rebuild.

Member

rcurtin commented Jul 11, 2017

Agreed, let me change the checkout script and rebuild.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Jul 11, 2017

Member

@mlpack-jenkins test this

Member

zoq commented Jul 11, 2017

@mlpack-jenkins test this

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Jul 11, 2017

Contributor

@rcurtin

The knc method is currently implemented in sklearn,shogun and mlpy. Sklearn sets the default value of k to 5 if not specified and in mlpy's implementation too we have kept the default value as 5. In the config file we havent specified the neighbors in sklearn, shogun as well as mlpy's blocks. So there are two options available either specifying the options and making value of k mandatory in all three or specifying default value as 5 here. Please let me know what needs to be done.

Contributor

Iron-Stark commented Jul 11, 2017

@rcurtin

The knc method is currently implemented in sklearn,shogun and mlpy. Sklearn sets the default value of k to 5 if not specified and in mlpy's implementation too we have kept the default value as 5. In the config file we havent specified the neighbors in sklearn, shogun as well as mlpy's blocks. So there are two options available either specifying the options and making value of k mandatory in all three or specifying default value as 5 here. Please let me know what needs to be done.

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Jul 11, 2017

Member

I would say we should make it mandatory in scikit and mlpy also and then update the config file and tests accordingly. You could do that in this PR or another one if you like, either way is fine with me.

Member

rcurtin commented Jul 11, 2017

I would say we should make it mandatory in scikit and mlpy also and then update the config file and tests accordingly. You could do that in this PR or another one if you like, either way is fine with me.

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Jul 11, 2017

Contributor

@rcurtin

Let us first complete the work on this one and then after this is merged I will open up another PR and do the same.

Contributor

Iron-Stark commented Jul 11, 2017

@rcurtin

Let us first complete the work on this one and then after this is merged I will open up another PR and do the same.

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Jul 11, 2017

Member

Agreed.

Member

rcurtin commented Jul 11, 2017

Agreed.

@zoq

zoq approved these changes Jul 12, 2017

Looks good to me.

@rcurtin rcurtin merged commit 2c61fb9 into mlpack:master Jul 13, 2017

1 check passed

Benchmarks Checks Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment