
Rewrite GLM as GLMCV: a separate class for regularization path #158

Closed
themantalope opened this Issue Oct 3, 2016 · 22 comments

Contributor

themantalope commented Oct 3, 2016

Currently the GLM object does not return the proper variable types at various stages of the sklearn GridSearchCV lifecycle. I'll start with some of the issues I have run into and the solutions I have worked on so far in my forked version of the project.

  1. GLM object did not have a set_params function.

At line 180 of pyglmnet.py I added the following code:

def set_params(self, **parameters):
    """
    Method for setting class parameters, as required by scikit-learn's
    GridSearchCV. See
    http://scikit-learn.org/stable/developers/contributing.html#get-params-and-set-params
    for more details.
    """
    for parameter, value in parameters.items():
        setattr(self, parameter, value)
    return self  # scikit-learn's API expects set_params to return self

This method is required for GridSearchCV to set the current parameters within the grid search.
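To illustrate (with a toy class, not the actual GLM), here is a minimal sketch of the get_params/set_params round trip that GridSearchCV relies on when it configures an estimator for each grid point:

```python
class ToyEstimator(object):
    """Toy stand-in for GLM, showing the parameter API sklearn expects."""

    def __init__(self, reg_lambda=0.1, alpha=0.5):
        self.reg_lambda = reg_lambda
        self.alpha = alpha

    def get_params(self, deep=True):
        # GridSearchCV reads the current parameters through get_params.
        return {'reg_lambda': self.reg_lambda, 'alpha': self.alpha}

    def set_params(self, **parameters):
        # GridSearchCV writes each grid point through set_params.
        for parameter, value in parameters.items():
            setattr(self, parameter, value)
        return self  # sklearn's convention is to return self


est = ToyEstimator()
est.set_params(reg_lambda=0.01, alpha=1.0)
print(est.get_params())  # {'reg_lambda': 0.01, 'alpha': 1.0}
```

GridSearchCV reads the grid of candidate parameters, clones the estimator, and pushes each candidate configuration in through set_params, which is why the method is required.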

@themantalope themantalope changed the title from make `GLM` object play nice with `sklearn`'s `GridSearchCV` to make GLM object play nice with sklearn's GridSearchCV Oct 3, 2016

Contributor

themantalope commented Oct 3, 2016

  2. Assumption that the self.reg_lambda property will always be a list or numpy.ndarray

In the GLM object's fit method, the model is fit in a loop over the lambda parameters. However, during the GridSearchCV fitting procedure, only one parameter combination is fit and scored at a time (this allows for parallel fitting during the grid search). For example, if one were to use the GLM object with GridSearchCV, the setup would look like this:

    glm = pgn.GLM(distr="poisson", score_metric="pseudo_R2")

    # print glm.get_params().keys()

    params = {"reg_lambda": np.logspace(-3, 0, num=7),
              "alpha": np.linspace(0.05, 1.0, 7, endpoint=True),
              "max_iter": [int(1e4)]}

    gs_cv = grid_search.GridSearchCV(estimator=glm,
                                     param_grid=params,
                                     verbose=10,
                                     cv=cross_validation.ShuffleSplit(n=len(data_train_X_scaled), test_size=0.20),
                                     n_jobs=8)
    gs_cv.fit(data_train_X_scaled, data_train_y)

For each combination of reg_lambda, alpha, and max_iter, GridSearchCV will perform ShuffleSplit cross validation to determine the optimum model parameters given the data, reg_lambda, and alpha.

In order to get around this, I added in a small if statement at line 586 of the pyglmnet.py file:

if isinstance(self.reg_lambda, numbers.Number):
    temp = self.reg_lambda
    self.reg_lambda = [temp]

In addition, I imported the built-in python numbers module at the top of the file. Now the GLM object will fit one model for the reg_lambda that was just set by GridSearchCV.

In the future, it may be worthwhile to change all of the GLM object's properties to pythonic properties, and type checking could then be done at the level of property assignment. One possible implementation for the reg_lambda property would look like this:

@property
def reg_lambda(self):
    return self._reg_lambda

@reg_lambda.setter
def reg_lambda(self, rl):
    if isinstance(rl, (list, np.ndarray)):
        self._reg_lambda = rl
    elif isinstance(rl, numbers.Number):
        self._reg_lambda = [rl]
    else:
        raise TypeError("'reg_lambda' must be a list, numpy.ndarray, or a number.")
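For completeness, here is a runnable sketch of how that setter would behave (GLMSketch is a toy stand-in, not the real class):

```python
import numbers

import numpy as np


class GLMSketch(object):
    """Toy class (not the real GLM) demonstrating the proposed setter."""

    def __init__(self, reg_lambda=0.1):
        self.reg_lambda = reg_lambda  # assignment goes through the setter

    @property
    def reg_lambda(self):
        return self._reg_lambda

    @reg_lambda.setter
    def reg_lambda(self, rl):
        if isinstance(rl, (list, np.ndarray)):
            self._reg_lambda = rl
        elif isinstance(rl, numbers.Number):
            self._reg_lambda = [rl]  # a scalar is wrapped in a list
        else:
            raise TypeError("'reg_lambda' must be a list, numpy.ndarray, "
                            "or a number.")


glm = GLMSketch(reg_lambda=0.05)  # scalar, as GridSearchCV would set it
print(glm.reg_lambda)  # [0.05]
```

With this pattern, a scalar set by GridSearchCV is normalized to a one-element list at assignment time, so fit never sees a bare number.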

Contributor

themantalope commented Oct 3, 2016

  3. GLM object's score method returns an array

GridSearchCV expects the score function to return a scalar, since its general pattern is to set parameters, train the model given the parameter set and data, and return one score for model/parameter comparison. Currently, the GLM object's score function returns an array of scores (one for each lambda). A very hackish solution I had for this was the following at the end of the score function:

if isinstance(score, numbers.Number):
    return score
else:
    return np.array(score)

This still doesn't get around the problem that if one uses a scoring metric other than pseudo_R2 (based on my current understanding of the code), it's possible that this function would return something other than a scalar.
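One way to make that intent explicit is sketched below (illustrative only, not the actual pyglmnet score method): return a scalar whenever exactly one score was computed, and the full array otherwise.

```python
import numbers

import numpy as np


def reduce_score(scores):
    # Sketch only: return a scalar when exactly one score was computed,
    # as GridSearchCV expects, and the full array otherwise.
    if isinstance(scores, numbers.Number):
        return float(scores)
    scores = np.asarray(scores, dtype=float)
    if scores.size == 1:
        return float(scores.ravel()[0])
    return scores


print(reduce_score([0.42]))      # 0.42
print(reduce_score([0.1, 0.2]))  # [0.1 0.2]
```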

Collaborator

pavanramkumar commented Oct 3, 2016

@themantalope thanks a lot for looking at this carefully!

To answer your questions one by one:

  1. The set_params function looks great as you've written it. Should be ready to merge once you submit a PR.

  2. I like the idea of using @property in the future. In the interest of readability, I'd suggest for now:

if isinstance(self.reg_lambda, float):
    list_of_reg_lambda = list()
    list_of_reg_lambda.append(self.reg_lambda)
    self.reg_lambda = list_of_reg_lambda
  3. It's a known issue that glm.score() returns a vector of scores, one for each lambda. For now, I've simply noted in the docstring of score that one must only use a sliced estimator with cross_val_score or GridSearchCV. See: https://github.com/pavanramkumar/pyglmnet/blob/master/pyglmnet/pyglmnet.py#L711-L718

  4. Minor comment: since this is a convex loss function, you don't need to do a grid search on max_iter. As long as you have enough iterations, you're likely to find a solution very close to the global minimum.

Let me know if you have any more questions. And thanks again, look forward to the PR!

Contributor

themantalope commented Oct 3, 2016

@pavanramkumar

Using the sliced estimator doesn't work. I'm getting an error when running a script like this:

    glm = pgn.GLM(distr="poisson", score_metric="pseudo_R2", solver="cdfast")

    # print glm.get_params().keys()

    params = {"reg_lambda":np.logspace(-3,0, num=7),
              "alpha":np.linspace(0.05, 1.0, 7, endpoint=True),
              "max_iter":[int(10)]}

    gs_cv = grid_search.GridSearchCV(estimator=glm[0],
                                     param_grid=params,
                                     verbose=10,
                                     cv=cross_validation.ShuffleSplit(n=len(data_train_X_scaled), test_size=0.20))

#terminal output below:
Traceback (most recent call last):
  File "glm_test_script_poisson.py", line 82, in <module>
    gs_cv = grid_search.GridSearchCV(estimator=glm[0],
  File "/Users/antalek/Documents/ProgrammingProjects/PythonProjects/pyglmnet/pyglmnet/pyglmnet.py", line 209, in __getitem__
    raise ValueError('Cannot slice object if the lambdas have'
ValueError: Cannot slice object if the lambdas have not been fit.

EDIT:

As per 4: yes, in the script I wasn't iterating over ranges of max_iter; however, max_iter must be set this way if one wants to use a value different from the default.


Collaborator

pavanramkumar commented Oct 4, 2016

Did you try running GridSearchCV on an estimator with reg_lambda = [0.1]? This will create an object that will return a scalar for score() and thus GridSearchCV may not complain. But I'm not sure.

In general, it seems that GridSearchCV is built for parallel search and doesn't work well for estimators with a regularization path.

See this scikit-learn thread:
scikit-learn/scikit-learn#1674

They seem to have discussed the issue and then concluded that it's unclear how to address grid search for regularization path estimators.


Contributor

themantalope commented Oct 4, 2016

Right - basically if you set self.reg_lambda to a list of length one, then only one score is returned, which can be fixed by the solution I proposed. I think the error I was getting from the sliced estimator was because it hadn't been fit yet. Perhaps one way around that would be to allow the sliced estimator to return a copy of the GLM object, but with the i-th set of parameters.

On another note, my understanding of using something like GridSearchCV is that we don't know which combination of parameters will work best given our data, so we essentially have to test them all. My reading of the sklearn thread is that they didn't know the best way to combine grid search with warm starts, which (in the context of Elastic Net) would assume that the set of non-zero model weights stays the same across different choices of alpha and lambda. I'm not sure that's a completely safe assumption to make. I could be completely wrong about that, so please let me know what you think.


Collaborator

pavanramkumar commented Oct 4, 2016

The broad issue is that GridSearchCV is inherently parallel whereas warm restarts are inherently serial, i.e. the initialization for the next value of reg_lambda depends on the solution for the current value of reg_lambda.

I believe the scikit-learn developers have worked around this issue by breaking it up into separate use cases.

  • They don't offer warm restarts in the fit() methods of Lasso and ElasticNet. Their alpha parameter (which is reg_lambda for us) is a scalar float. This makes it compatible with GridSearchCV and cross_val_score.
  • For warm restarts, they provide a separate method called path() which fits models along a regularization path and returns the set of coefficients for each reg_lambda along the path (n. b.: in their nomenclature, reg_lambda is alpha).
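The two use cases above can be sketched with scikit-learn's actual API (using the current sklearn.model_selection import path, which post-dates this thread, and dummy data):

```python
import numpy as np
from sklearn.linear_model import Lasso, lasso_path
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(50, 5)
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.randn(50)

# Use case 1: a scalar alpha makes Lasso compatible with GridSearchCV.
gs = GridSearchCV(Lasso(), param_grid={'alpha': np.logspace(-3, 0, 4)}, cv=3)
gs.fit(X, y)

# Use case 2: lasso_path fits the whole regularization path (warm starts)
# and returns one coefficient vector per alpha.
alphas, coefs, _ = lasso_path(X, y, alphas=np.logspace(-3, 0, 4))
print(coefs.shape)  # (n_features, n_alphas) == (5, 4)
```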

I thought our slicing solution was elegant, but it appears that it's not going to fit neatly with GridSearchCV() and cross_val_score().

I think we are faced with a choice. We could copy scikit-learn's solution, which will require some significant refactoring of fit(), predict(), fit_predict(), and score(), in addition to adding a new method called path(). The alternative is to document clearly that GridSearchCV() will only work for scalar reg_lambda.

In my personal experience, warm restarts lead to significantly better solutions than parallel grid search. So I'd vote for the latter option.

@jasmainak and anyone else do you want to weigh in on this?


Member

jasmainak commented Oct 4, 2016

I skimmed through the discussion quickly, so let me know if I missed anything. I don't think we should worry about making it compatible with GridSearchCV. As long as it works with cross_val_score, I think we're fine. Scikit-learn has xxxxCV objects to deal with such situations. Like this one: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html.
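For reference, the xxxxCV pattern looks like this with the real LassoCV (dummy data; current import paths):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = X @ np.array([2.0, 0.0, -1.0, 0.0]) + 0.1 * rng.randn(60)

# LassoCV accepts a cv argument and runs the regularization path
# (with warm restarts) inside each fold, then picks the best alpha.
model = LassoCV(alphas=np.logspace(-3, 0, 10), cv=5).fit(X, y)
print(model.coef_.shape)  # (4,)
```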

@pavanramkumar pavanramkumar added the API label Oct 4, 2016

@pavanramkumar pavanramkumar added this to the version 0.1 milestone Oct 4, 2016


Collaborator

pavanramkumar commented Oct 4, 2016

@jasmainak Do you know the differences between LassoCV, Lasso.path(), and lasso_path? They seem to all do the same thing. In any case, our slicing solution already offers what LassoCV offers.

@themantalope In practice I think that GridSearchCV is never used with Lasso and ElasticNet because the warm restarts option provides better solutions. LassoCV and ElasticNetCV provide warm restarts, not grid search as far as I can see. So I propose to clearly document that GridSearchCV works for scalar values of reg_lambda and then leave it at that. I will also edit the README.md to make sure we don't mislead users.

@jasmainak cross_val_score requires glm.score() to return a scalar, which again will only work with sliced estimators or with scalar values of reg_lambda. Currently we are faced with the issue that objects cannot be sliced unless we run fit(). So either we warn the user that they must first run fit(), or we remove this restriction and allow slicing before fitting. Thoughts on this decision?


Member

jasmainak commented Oct 5, 2016

I think lasso_path is mostly for plotting, in the same spirit as validation_curve from the sklearn.learning_curve module. I'm not sure our slicing solution yet offers what LassoCV offers. What LassoCV does is accept a cv object and perform the cross validation internally with warm restarts. We don't accept cv objects as arguments yet.

I think we could in principle do slicing before fitting. But what purpose would it serve? I think we can just throw an error and ask users to run fit before ...


Contributor

themantalope commented Oct 5, 2016

Just as a reference then: if one wanted to test a large number of alpha parameters for the GLM object, how should one do that given the current implementation scheme? Would you just iterate over alpha values and record the results? Perhaps we could bake that functionality into this package somewhere.
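A minimal sketch of that outer loop, using sklearn's ElasticNetCV as a stand-in for GLM (its l1_ratio plays the role of pyglmnet's alpha; this is not pyglmnet code):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.RandomState(0)
X = rng.randn(80, 6)
y = X @ np.array([1.5, 0.0, -1.0, 0.0, 0.0, 2.0]) + 0.1 * rng.randn(80)

results = {}
for l1_ratio in np.linspace(0.05, 1.0, 5):
    # Each inner fit sweeps the regularization path with warm restarts,
    # so only the mixing parameter is searched by this outer loop.
    model = ElasticNetCV(l1_ratio=l1_ratio, cv=3).fit(X, y)
    results[float(l1_ratio)] = model.score(X, y)

best = max(results, key=results.get)
print(best)
```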


Member

jasmainak commented Oct 5, 2016

You want to test a large number of alpha values with warm restarts for different regularizations?


Collaborator

pavanramkumar commented Oct 5, 2016

I fear that currently there is no easy way to do this.

  • Doing warm restarts on two variables (reg_lambda, alpha) is going to involve nested for loops through these paths, along with decisions about how to evolve the path.
  • Doing grid search on reg_lambda and alpha will require a rewrite of fit() that forces reg_lambda to be scalar.

In my personal experience, warm restarts find better solutions than grid search.
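The nested-loop structure from the first bullet might look like the sketch below (solve() is a dummy ridge-style stand-in for a real warm-started solver; alpha and beta_init are accepted but unused in the dummy):

```python
import numpy as np


def solve(X, y, reg_lambda, alpha, beta_init):
    # Dummy ridge-style closed-form solve; a real solver would iterate
    # starting from beta_init (the warm start) and use alpha in the penalty.
    A = X.T @ X + reg_lambda * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = X @ np.array([1.0, -1.0, 0.5])

betas = {}
for alpha in [0.5, 1.0]:                  # outer path over alpha
    beta = np.zeros(X.shape[1])           # cold start for each new alpha
    for reg_lambda in [1.0, 0.1, 0.01]:   # inner path, largest lambda first
        beta = solve(X, y, reg_lambda, alpha, beta_init=beta)  # warm start
        betas[(alpha, reg_lambda)] = beta

print(len(betas))  # 6
```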


Member

jasmainak commented Oct 5, 2016

Doing grid search on reg_lambda and alpha will require a rewrite of fit() that forces reg_lambda to be scalar

isn't this already possible?


Member

jasmainak commented Oct 5, 2016

oh, my bad. We don't have set_params ...


Collaborator

pavanramkumar commented Oct 5, 2016

@jasmainak even if we have set_params, we will have some issues. in particular, it will work only if sliced estimators are passed into GridSearchCV. this is awkward because to slice, we currently need to call fit(), which is strange to do first if someone wants to run a formal grid search. see this comment further above:
https://github.com/pavanramkumar/pyglmnet/issues/158#issuecomment-251242823


Member

jasmainak commented Oct 5, 2016

in particular, will work only if we have sliced estimators passed into GridSearchCV

why slice? can't you just initialize it with a single reg_lambda?


Collaborator

pavanramkumar commented Oct 5, 2016

yes, in fact i suggested that to @themantalope https://github.com/pavanramkumar/pyglmnet/issues/158#issuecomment-251263634 but apparently there were some issues.

i'm going to take a more detailed look at all this on friday.


Member

jasmainak commented Oct 5, 2016

Sorry for missing so much discussion ...

okay, after some thinking, I fear that we should create a new object -- maybe GLMCV -- for grid search to work with warm restarts. We shouldn't rewrite fit, I think; the GLM object is for a different purpose. I think that should be in milestone 0.2 or later ... if we can make it play nicely with GridSearchCV for now, we're good, I'd say.


Contributor

themantalope commented Oct 12, 2016

I agree with @jasmainak: the ideal object would be GLMCV, similar to sklearn's ElasticNetCV object. Let's either close this issue or change it to something like 'add GLMCV object'.

EDIT:

If the latter is the case, then this thread is essentially a duplicate of #47 and should be closed.

@pavanramkumar pavanramkumar changed the title from make GLM object play nice with sklearn's GridSearchCV to Rewrite GLM as GLMCV: a separate class for regularization path Oct 13, 2016


Collaborator

pavanramkumar commented Oct 19, 2016

@themantalope GLMCV() can inherit from GLM(). In anticipation of tomorrow's meeting, I made a little inheritance demo: https://gist.github.com/pavanramkumar/cdc8b9820392b62c3921c03cd3d1a896

We just need to decide

  • which parameters / internal variables are inherited as-is,
  • which new parameters need to be in GLMCV(),
  • and which ones need to be overridden.

Then it will be straightforward to write.
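A bare-bones sketch of that inheritance pattern (the class names are from this thread; every method body here is hypothetical placeholder logic, not the real implementation):

```python
class GLM(object):
    """Toy base class: fits one scalar reg_lambda."""

    def __init__(self, distr='poisson', alpha=0.5, reg_lambda=0.1):
        self.distr = distr
        self.alpha = alpha
        self.reg_lambda = reg_lambda

    def fit(self, X, y):
        self.fitted_ = (self.reg_lambda, self.alpha)  # placeholder "model"
        return self


class GLMCV(GLM):
    """Toy subclass: sweeps a regularization path with warm restarts."""

    def __init__(self, distr='poisson', alpha=0.5,
                 reg_lambda=(1.0, 0.1, 0.01)):
        # distr and alpha are inherited as-is; reg_lambda is overridden
        # to hold a path instead of a scalar.
        super(GLMCV, self).__init__(distr=distr, alpha=alpha,
                                    reg_lambda=reg_lambda)

    def fit(self, X, y):
        self.path_ = []
        for rl in self.reg_lambda:  # serial loop enables warm restarts
            self.path_.append((rl, self.alpha))  # placeholder fit per lambda
        return self


glm_cv = GLMCV().fit(None, None)
print(len(glm_cv.path_))  # 3
```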


Member

jasmainak commented Aug 4, 2017

I think this is taken care of thanks to #200. @pavanramkumar feel free to reopen if something is not addressed

@jasmainak jasmainak closed this Aug 4, 2017
