LSTM #43

Closed · wants to merge 15 commits into from

Conversation

Withington (Collaborator)

Reference Issues/PRs

Added a simple stacked LSTM regressor for forecasting sktime/sktime#5829

Any other comments?

I've put it as a work in progress because it will need mods to work with the check_is_fitted PR #39

Classifier still to do.

Are you happy with this choice of baseline LSTM?
I'll put some suggestions for refs detailing more complex LSTMs in sktime/sktime#5829; we could add one of those too.

@mloning (Contributor) left a comment

I've left one more general comment, everything else looks good!

```python
super().__init__(
    model_name=model_name,
    model_save_directory=model_save_directory)
LSTMNetwork.__init__(
```
Contributor

We've been doing this in other places as well, but this way of handling multiple inheritance doesn't seem right to me.

model_name and model_save_directory should be part of the BaseDeepNetwork as we use them in both classifiers and regressors, so LSTMNetwork should inherit from BaseDeepNetwork and call its constructor via super(). This works without any problems for the BaseDeepRegressor, which essentially becomes a mixin class in that case, but the problem is that BaseDeepClassifier creates other attributes (classes and nb_classes), so I believe we have three options:

  • leave it as it is,
  • make classes cooperative,
  • simply don't initiate the attributes in the constructor and create them when needed as done in sklearn

Also see this stackoverflow post and linked article.
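The third option could be sketched roughly as below. This is a minimal sketch with made-up class names, not the actual sktime-dl classes: `__init__` only stores hyper-parameters verbatim, and derived attributes are created inside `fit()`.

```python
# Illustrative sketch of option 3 (the sklearn convention), with made-up
# names: __init__ stores hyper-parameters verbatim and nothing else;
# derived attributes are only created inside fit().
class BaseDeepClassifier:
    def fit(self, X, y):
        # classes/nb_classes are created here, when needed, not in __init__
        self.classes_ = sorted(set(y))
        self.nb_classes_ = len(self.classes_)
        return self


class LSTMClassifier(BaseDeepClassifier):
    def __init__(self, model_name="lstm", model_save_directory=None):
        # plain parameter assignment only, so sklearn's clone() keeps working
        self.model_name = model_name
        self.model_save_directory = model_save_directory


clf = LSTMClassifier().fit(X=None, y=["a", "b", "a"])
print(clf.classes_, clf.nb_classes_)
```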

@james-large (Contributor) commented on Mar 10, 2020

Finer details of python inheritance, and what looks good to a python developer vs what functionally works etc. are probably best for me to delegate.

An alternative overall is potentially also to use composition for the networks?

Currently, LSTMRegressor is a BaseDeepRegressor, and is a LSTMNetwork

Could reorganise so that LSTMRegressor is (still) a BaseDeepRegressor, and has a LSTMNetwork

It makes no real difference at all for this class (aside from aesthetics), but for those classifiers with bespoke training methods and/or internal parameter tuning, e.g. MCNN, maybe it makes sense to say that it is a regressor that has a network + a method of training/tuning it (which, for most architectures, is just to call the data cleaning functions and the basic keras fit()).

Ultimately though, this code is 'supposed to be' hidden from the user, aside from being open source obviously. Some janky assumptions and dependencies behind the scenes are permissible in my opinion, assuming this doesn't become a functional as well as aesthetic issue in the future
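The composition alternative could be sketched like this (illustrative names and bodies, not sktime-dl's actual classes): the regressor owns a network object and builds it during fit.

```python
# Sketch of the composition alternative: LSTMRegressor *has* a network
# instead of *being* one. Class bodies are stand-ins, not the real code.
class LSTMNetwork:
    def __init__(self, units=(50, 50)):
        self.units = units

    def build(self, input_shape):
        # stand-in for the real keras model construction
        return {"input_shape": input_shape, "units": list(self.units)}


class LSTMRegressor:  # would still be a BaseDeepRegressor in the real code
    def __init__(self, units=(50, 50)):
        self.units = units

    def fit(self, input_shape):
        # the regressor owns and builds its network
        self.network_ = LSTMNetwork(self.units)
        self.model_ = self.network_.build(input_shape)
        return self


reg = LSTMRegressor(units=(16, 32)).fit(input_shape=(100, 1))
print(reg.model_)
```

One caveat with this arrangement: sklearn's clone() only re-creates constructor parameters, so the owned network has to be built inside fit (as above) rather than passed in pre-built.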

@Withington (Collaborator, Author)

Interesting, thanks.

  • Cooperative classes might work, yes. It would be nice to see how it looks here. Do you have time to try it out @mloning ?
  • Initiate attributes when needed - yes, straightforward. Downsides: either a lot of code repetition (but, since 95% of it is assignment, it's only as much as calling super.__init__ in many places) and an increased risk of missing initiating one of the attributes somewhere, or writing an initiate_common_variables function.
  • Regressor has a network does seem appropriate and was trialed but hit problems when forecasting called into sklearn clone. The BaseDeepNetwork parameters were not cloned.

@mloning (Contributor) commented on Mar 13, 2020

  • So with cooperative classes it would look something like the code below. Technically, B's call to super isn't required, but this way the order of inheritance doesn't matter. I don't like the kwargs in the constructor.
```python
class A:  # base network

    def __init__(self, a, **kwargs):
        self.a = a
        super(A, self).__init__(**kwargs)


class B:  # base regressor/classifier

    def __init__(self, b, **kwargs):
        self.b = b
        super(B, self).__init__(**kwargs)


class C(A, B):  # concrete estimator

    def __init__(self, a, b, c):
        self.c = c
        super(C, self).__init__(a=a, b=b)


c = C(1, 2, 3)
print(c.__dict__)
# {'c': 3, 'a': 1, 'b': 2}
```
  • I prefer to avoid this type of multiple inheritance here and create attributes on the fly; we can add unit tests that check that all classifiers have the required attributes after calling fit.
  • I didn't understand why clone didn't work? I think we should ensure that clone works on all our estimators, that's essential for tuning and ensembling.

@Withington (Collaborator, Author)

Our estimators work fine with clone now. I put a link above to an old commit where clone failed; that was when BaseDeepRegressor had a BaseDeepNetwork via composition.

Have I read it right: you favour this option below?

> simply don't initiate the attributes in the constructor and create them when needed as done in sklearn

@mloning (Contributor)

Yes, but I don't have a strong preference; I'd still prefer cooperative classes over the current way, I think.

@Withington (Collaborator, Author)

I opened up issue #45 for this because it will need a bit of work, separate to LSTM.

```python
'''
:param nb_epochs: int, the number of epochs to train the model
:param batch_size: int, the number of samples per gradient update
:param units: array of int, size 2, the number of units in each LSTM layer
```
Contributor

In the future we can open this up to allow for tuning along different axes, e.g. the number of layers and the number of units in each layer, etc. Happy with this and these numbers as defaults.

@Withington (Collaborator, Author)

Yes. It could be done with units=[16, 16, 32, 16] to create 4 layers, etc.

Unless you want to be more explicit, as per CNN, and have two args like nb_conv_layers=2 and filter_sizes=[6, 12]?
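The units-list idea could be sketched in plain Python (keras left out for brevity; stacked_lstm_configs is a hypothetical helper, not part of sktime-dl). The key detail in a stacked LSTM is that every layer except the last must return its full output sequence so the next layer receives 3-D input.

```python
# Hypothetical helper: derive one config dict per LSTM layer from a
# single `units` list, as suggested above.
def stacked_lstm_configs(units):
    """Return per-layer configs for a stacked LSTM.

    All layers except the last set return_sequences=True, so each LSTM
    layer feeds 3-D (samples, timesteps, features) input to the next one.
    """
    if not units:
        raise ValueError("units must contain at least one layer size")
    return [
        {"units": u, "return_sequences": i < len(units) - 1}
        for i, u in enumerate(units)
    ]


# units=[16, 16, 32, 16] would create 4 layers:
for cfg in stacked_lstm_configs([16, 16, 32, 16]):
    print(cfg)
```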

@Withington Withington marked this pull request as ready for review March 13, 2020 19:09
@Withington (Collaborator, Author)

I've changed the initialisation of LSTM and CNN as per #45 option 3. Please take a look and let me know what you think. If it's ok I'd then go on to do the other models.

It's not perfect: sklearn.utils.estimator_checks check_estimator (ref) would still fail if we ran it, for two reasons:

  1. self.is_fitted is set in __init__
     • which would cause "AssertionError: Estimator CNNClassifier should not set any attribute apart from parameters during init. Found attributes ['is_fitted']. /sklearn/utils/estimator_checks.py:2413"
     • However, is_fitted has to be initialised here because sktime_dl/meta/_dlensemble.py:69 checks it.
     • Can you think of a better way to do this?
  2. sklearn tests to see if the estimator can handle an input X np array, but sktime fails because it requires X to be a pd DataFrame (see sktime/utils/validation/supervised.py:26). @mloning would you perhaps change this in sktime so that it can handle X as either a DataFrame or a np array?

@mloning (Contributor) commented on Apr 27, 2020

  1. In sktime, we've somehow managed to avoid running these checks until now, but I will add them as part of the ongoing refactoring. I do understand why there cannot be any input validation/logic in __init__, but I don't really understand why we cannot set additional attributes. Cloning seems to work as expected. I've asked them on gitter.
  2. There's an ongoing discussion on the data container in sktime; we've settled on pandas for now, as in the end it's easier to assume a single input type. Even if the type were the same, the input format would still be different, as we have time series. But we can just skip that test, no?

@mloning (Contributor) commented on Apr 28, 2020

I've talked to one of the sklearn core developers about the estimator check that checks that no other attributes are initialised in __init__. As long as there is no logic or interaction between other attributes, initialising additional attributes seems to be no problem. We can keep is_fitted and simply skip/ignore that test (or, even better, modify the test so that it ignores the is_fitted attribute but checks for other attributes).

@Withington (Collaborator, Author)

Great, thanks @mloning . Yes we can continue to not perform the check_estimator test, I just have it running locally.

I'll go ahead and apply this same initialisation to the other estimators.

@mloning (Contributor) commented on Apr 28, 2020

check_estimator iterates over a number of checks, I think running them is a good idea, but we can skip/ignore those that don't make sense for sktime. I think you can use it as an iterator and then filter out the checks that we don't want to run. I'll look into this myself for sktime some time later this week.
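The iterate-and-filter idea could look roughly like this (a sketch only; the check names and the shape of the (name, check) pairs are illustrative, not the sktime API):

```python
# Sketch of filtering estimator checks. Assume `all_checks` yields
# (check_name, check_callable) pairs, as sklearn's checks can be enumerated.
SKIPPED_CHECKS = {
    "check_methods_subset_invariance",   # batch-size-dependent output
    "check_no_attributes_set_in_init",   # we set is_fitted in __init__
}


def filter_checks(all_checks, skipped=SKIPPED_CHECKS):
    """Drop checks that don't make sense for sktime/sktime-dl estimators."""
    return [(name, check) for name, check in all_checks if name not in skipped]


all_checks = [
    ("check_fit2d_predict1d", lambda est: None),
    ("check_methods_subset_invariance", lambda est: None),
]
kept = filter_checks(all_checks)
print([name for name, _ in kept])
```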

@james-large (Contributor) commented on Apr 28, 2020

For reference with the estimator checks, https://github.com/scikit-learn/scikit-learn/blob/95d4f0841d57e8b5f6b2a570312e9d832e69debc/sklearn/utils/estimator_checks.py#L239

On the surface those all look useful and could be worked towards being run successfully, but e.g. check_methods_subset_invariance would break for sktime-dl as far as I can imagine (unless the minibatches in the estimator test are a multiple of the batch size of the network).

It does make sense either way to have a little util func in base sktime which prunes the list of checks that are wanted; an sktime-dl util could then take that and prune it further (and so on for other sktime extensions).

@james-large (Contributor)

> However, is_fitted has to be initialised here because sktime_dl/meta/_dlensemble.py:69 checks it.

On this point, if this is the only place in our code that checks it, a check for the attribute's existence could definitely be added instead, and is_fitted removed from the inits.
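The suggested change could be sketched like this (illustrative classes, not the actual sktime-dl code): with getattr and a default, the ensemble check tolerates the attribute not existing yet, so __init__ no longer needs to set it.

```python
# Sketch: drop is_fitted from __init__ and check for the attribute's
# existence instead. Classes here are stand-ins, not sktime-dl's.
class Estimator:
    def fit(self):
        self.is_fitted = True   # only created when fit() actually runs
        return self


def check_is_fitted(estimator):
    # getattr with a default replaces initialising is_fitted in __init__
    return getattr(estimator, "is_fitted", False)


est = Estimator()
print(check_is_fitted(est))        # False: attribute doesn't exist yet
print(check_is_fitted(est.fit()))  # True: created by fit()
```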

@mloning (Contributor) commented on Apr 29, 2020

Yes, I agree, will look into this for base sktime as part of the refactoring.

@mloning (Contributor) commented on May 3, 2020

I had a look at scikit-learn's estimator checks and I'm not sure anymore if they are very useful for us:

  • all tests run on numpy arrays, hence they wouldn't test our preferred workflow with nested pandas dataframes; as a consequence they would also only test the univariate case,
  • we would have to skip/ignore a few tests, including tests for sample weights, class weights, setting attributes in the constructor, and is-fitted state,
  • we would have to add new tests for pandas data frames and different input data types (missing data, unequal length, data types, etc).

I haven't fully decided yet, but I may start adapting/re-writing them for sktime. What do you think?

@james-large (Contributor)

For the purpose of ensuring compatibility with all (it still will be for most) of scikit-learn's functionality, changing anything means the overall checks fail. If bespoke checks are desirable enough, to test whether a model implementation meets sktime's requirements (but potentially not scikit-learn's), it could be worth adapting them, yeah.

Afaik, sktime's models etc still can work with numpy arrays as input? Though that's not the preferred workflow I understand.

On the last point, these checks are presumably to test minimal required functionality, would things like handling missing data, unequal length, etc be a part of that? In my mind, some can handle those things innately, others will need a data preprocessing step for them on the experimental level

@mloning (Contributor) commented on May 5, 2020

  • Some estimators in sktime may accept numpy arrays, but I want to find a consistent solution for all of them. I'm not sure that accepting numpy arrays is a good idea, as it only covers the simplest univariate case. Either way, as long as we use the same functions for input checks/conversions it should be easy to change later.
  • For an overview of the tests we need, see Update unit testing sktime#251; I've started to re-write some of them for sktime in this PR: Package refactoring sktime#246

@Withington (Collaborator, Author)

It seems like sktime-dl models could accept X as a numpy array (e.g. they all call check_and_clean_data and that returns X as a numpy array); it's sktime that insists on a pd DataFrame, in validation/supervised.py.

The way forward might be to change sktime to enable sktime-dl to handle simple time series datasets as numpy array, whilst still preferring X to be input as a DataFrame of Series.
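That change could look roughly like this sketch (illustrative only, not sktime's actual validation code): accept X as either a nested DataFrame or a plain 2-D numpy array, converting the simple array case instead of rejecting it.

```python
import numpy as np
import pandas as pd


def check_X(X):
    """Accept X as a nested pd.DataFrame or a plain 2-D np.ndarray.

    Illustrative sketch: a 2-D array is treated as univariate data,
    one series per row, and converted to the nested-DataFrame format.
    """
    if isinstance(X, pd.DataFrame):
        return X
    if isinstance(X, np.ndarray):
        if X.ndim == 2:
            # each row becomes one univariate pd.Series in column "dim_0"
            return pd.DataFrame({"dim_0": [pd.Series(row) for row in X]})
        raise ValueError(f"Expected a 2-D array, got {X.ndim}-D")
    raise TypeError(f"X must be a pd.DataFrame or np.ndarray, got {type(X)}")


X = np.arange(6.0).reshape(2, 3)
Xt = check_X(X)
print(type(Xt).__name__, Xt.shape)
```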

@Withington (Collaborator, Author)

This is ready to be merged now @mloning @james-large

@mloning (Contributor) left a comment

Looks good, just a few suggestions/ideas

Files with review comments:
  • sktime_dl/deeplearning/cnn/_base.py (outdated)
  • sktime_dl/deeplearning/cnn/_classifier.py
  • sktime_dl/deeplearning/fcn/_classifier.py
  • sktime_dl/deeplearning/fcn/_regressor.py
@Withington (Collaborator, Author)

Thanks for the thorough reviews @mloning

@mloning (Contributor) commented on May 8, 2020

Merging #51 introduced some conflicts ...

@Withington (Collaborator, Author)

I merged in the lint changes from dev. This branch is still showing as having conflicts though. I don't have write access so I can't hit the "resolve conflicts" button. Moreover, the "Files Changed" view on this PR seems incorrect in places, as commented on above. Can you assist?

@mloning (Contributor) commented on May 11, 2020

Sure, will look into this later today!

@mloning (Contributor) commented on May 16, 2020

Closed because it was re-opened and merged in #55.

@mloning closed this on May 16, 2020
3 participants