Updation of scikit libraries #60

Merged: 10 commits into mlpack:master on Jun 7, 2017

Conversation

3 participants
Iron-Stark (Contributor) commented May 30, 2017

No description provided.

(Outdated review comments on methods/scikit/allknn.py and methods/scikit/LSHForest.py)
rcurtin (Member) commented Jun 1, 2017
Hi @Iron-Stark, thanks for the PR. I think that we may want to consider other ways of providing all of these options to the config.yaml file. Right now we might write something like

options: '-c 3'

but if the goal is to allow the user to specify any setting in the config.yaml file, we may want to restructure the input to look more like

options:
  clusters: 3
  other_option: "value"
  ...

otherwise I think we will run out of option names. :) So instead of abstracting to parameters specified similarly to mlpack, we would end up abstracting to some other format. (I am not sure my idea is the best, so please feel free to disagree or improve upon it.)

A thing to keep in mind is that in order to produce benchmarks that can actually be useful for someone looking at it, we have to reduce the scope a little bit sometimes---instead of, e.g., benchmarking every possible configuration of LSH, we instead benchmark "the default configuration of each library's LSH configuration given that we can tune the number of tables and projections"; or, e.g., we benchmark "each k-means algorithm that is implemented for each library with k and the initial centroids set specifically". But if the scope is too large (e.g. "benchmark every k-NN algorithm") then there get to be so many different parameter combinations that it's impossible to reasonably do that.

So I don't think it's a problem necessarily to provide this many options to be accessed through the configuration (other than the running out of option names issue), but I do think we should keep in mind that we may not want to use all of these options in the config.yaml that we run.
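The restructuring rcurtin proposes --- an options mapping instead of a single flag string --- could be consumed by a wrapper script roughly as follows. This is a minimal sketch, not code from the benchmarks repository; the `normalize_options` helper and the flag-to-name table are hypothetical, and only the `-c 3` / `clusters: 3` examples come from the comment above.

```python
import shlex

# Hypothetical flag-to-name table, for illustration only.
FLAG_NAMES = {"-c": "clusters"}

def normalize_options(options):
    """Accept either the legacy flag string ('-c 3') or the proposed
    mapping ({'clusters': 3}) and return a single options dict."""
    if isinstance(options, dict):
        return dict(options)
    # Legacy style: split the string into (flag, value) pairs.
    tokens = shlex.split(options)
    return {FLAG_NAMES[flag]: value
            for flag, value in zip(tokens[::2], tokens[1::2])}

print(normalize_options("-c 3"))           # {'clusters': '3'}
print(normalize_options({"clusters": 3}))  # {'clusters': 3}
```

With a normalization step like this, existing config.yaml entries could keep working while new entries adopt the structured form.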


Iron-Stark (Contributor) commented Jun 3, 2017
@rcurtin

Yes, I see your point. There are certain configurations of LSH available in sklearn but not in, say, shogun. Introducing those parameters is not useful, since benchmarks for them cannot be generated when the options are unavailable in the other library.

Also I agree that the config.yaml file can be restructured the way you have suggested so that the parameters can be abstracted to some other format.

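Iron-Stark's point --- that library-specific parameters cannot be benchmarked across libraries --- could be enforced with a small guard in the wrapper layer. This is a hypothetical sketch; the supported-option sets below are illustrative and not taken from the benchmark code.

```python
# Hypothetical per-library tables of supported option names.
SUPPORTED = {
    "sklearn": {"k", "tables", "projections"},
    "shogun": {"k"},
}

def usable_options(library, options):
    """Keep only the options a given library's wrapper supports,
    so cross-library comparisons stay on common ground."""
    return {name: value for name, value in options.items()
            if name in SUPPORTED[library]}

opts = {"k": 5, "tables": 10}
print(usable_options("sklearn", opts))  # {'k': 5, 'tables': 10}
print(usable_options("shogun", opts))   # {'k': 5}
```

A guard like this makes the scope reduction rcurtin describes explicit: options outside a library's supported set are simply dropped rather than silently mis-benchmarked.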

(Further outdated review comments on methods/scikit/LSHForest.py)
rcurtin (Member) commented Jun 5, 2017
@Iron-Stark: would you like to implement the new config file support? If not, let's just open an issue for it---I don't have time to handle that myself right now, unfortunately.


Iron-Stark (Contributor) commented Jun 6, 2017
@rcurtin
Yes, I would like to implement that, but I was thinking it would be better to implement the algorithms in the various libraries first and then update the config.yaml file.


rcurtin (Member) commented Jun 6, 2017
Sounds good, I agree---we can handle the config file format somewhere else.


rcurtin (Member) commented Jun 6, 2017
I think this is ready to merge; if everyone agrees then we can go ahead and do that.


zoq (Member) commented Jun 6, 2017
I agree; just a minor issue: if you could go through the changed files and wrap the lines to no more than 80 characters, that would be great (e.g. put the comment above the parameter).


zoq (Member) commented Jun 7, 2017
Thanks for going through the files, much easier to read.


zoq merged commit 143598a into mlpack:master on Jun 7, 2017

1 check passed: Benchmarks Checks (Build finished).