LSTM #43
Conversation
I've left one more general comment, everything else looks good!
```python
super().__init__(
    model_name=model_name,
    model_save_directory=model_save_directory)
LSTMNetwork.__init__(
```
We've been doing this in other places as well, but this way of handling multiple inheritance doesn't seem right to me. `model_name` and `model_save_directory` should be part of the `BaseDeepNetwork`, as we use them in both classifiers and regressors, so `LSTMNetwork` should inherit from `BaseDeepNetwork` and call its constructor via `super()`. This works without any problems for the `BaseDeepRegressor`, which essentially becomes a mixin class in that case, but the problem is that `BaseDeepClassifier` creates other attributes (`classes` and `nb_classes`), so I believe we have three options:
- leave it as it is,
- make the classes cooperative,
- simply don't initiate the attributes in the constructor and create them when needed, as done in sklearn (a rough sketch of this option follows below).

Also see this stackoverflow post and linked article.
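A minimal sketch of that third option, using hypothetical class and attribute names rather than the actual sktime-dl code: the constructor only stores hyper-parameters, and data-dependent attributes are created inside `fit`, as sklearn does with its trailing-underscore convention.

```python
import numpy as np


class ExampleDeepClassifier:  # hypothetical stand-in for a concrete classifier
    def __init__(self, nb_epochs=100):
        # Only hyper-parameters are set here; nothing data-dependent.
        self.nb_epochs = nb_epochs

    def fit(self, X, y):
        # Data-dependent attributes are created when needed, sklearn-style.
        self.classes_ = np.unique(y)
        self.nb_classes_ = len(self.classes_)
        # ... build and train the network here ...
        return self
```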
Finer details of python inheritance, and what looks good to a python developer vs what functionally works etc. are probably best for me to delegate.
An alternative overall is potentially also to use composition for the networks?
Currently, `LSTMRegressor` is a `BaseDeepRegressor` and is an `LSTMNetwork`.
We could reorganise it so that `LSTMRegressor` is (still) a `BaseDeepRegressor` and has an `LSTMNetwork`.
It makes no real difference at all for this class (aside from aesthetics), but for those classifiers with bespoke training methods and/or internal parameter tuning, e.g. MCNN, maybe it makes sense to say that it is a regressor that has a network plus a method of training/tuning it (which, for most architectures, is just to call the data cleaning functions and the basic keras fit()).
Ultimately though, this code is 'supposed to be' hidden from the user, aside from being open source obviously. Some janky assumptions and dependencies behind the scenes are permissible in my opinion, assuming this doesn't become a functional as well as aesthetic issue in the future.
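A hypothetical sketch of that composition layout (the class names and parameters below are illustrative, not the actual sktime-dl classes): the regressor is still a regressor, but holds the network as an attribute.

```python
class LSTMNetworkSketch:  # illustrative stand-in for LSTMNetwork
    def __init__(self, units=(50, 50)):
        self.units = units

    def build_network(self, input_shape):
        # ... construct and return the keras layers for this architecture ...
        raise NotImplementedError


class LSTMRegressorSketch:  # "is a" regressor that "has a" network
    def __init__(self, nb_epochs=100, units=(50, 50)):
        self.nb_epochs = nb_epochs
        # Composition: the network is an attribute rather than a base class.
        self.network = LSTMNetworkSketch(units=units)

    def fit(self, X, y):
        # ... clean the data, call self.network.build_network(...),
        # then run the basic keras fit() ...
        return self
```

As noted later in the thread, this layout was trialled and ran into trouble with sklearn's `clone`, since the nested network's parameters have to be exposed through `get_params` for cloning to pick them up.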
Interesting, thanks.
- Cooperative classes might work, yes. It would be nice to see how it looks here. Do you have time to try it out @mloning?
- Initiate attributes when needed - yes, straightforward. Downsides:
  - either a lot of code repetition (but, since 95% of it is assignment, it's only as much as calling `super().__init__` in many places) and an increased risk of missing initiating one of the attributes somewhere, or
  - writing an `initiate_common_variables` function.
- "Regressor has a network" does seem appropriate and was trialled, but it hit problems when forecasting called into sklearn's `clone`: the `BaseDeepNetwork` parameters were not cloned.
- So with cooperative classes it would look something like the code below. Technically, B's call to super isn't required, but this way the order of inheritance doesn't matter. I don't like the kwargs in the constructor.
```python
class A:  # base network
    def __init__(self, a, **kwargs):
        self.a = a
        super(A, self).__init__(**kwargs)


class B:  # base regressor/classifier
    def __init__(self, b, **kwargs):
        self.b = b
        super(B, self).__init__(**kwargs)


class C(A, B):  # concrete estimator
    def __init__(self, a, b, c):
        self.c = c
        super(C, self).__init__(a=a, b=b)


c = C(1, 2, 3)
print(c.__dict__)  # {'c': 3, 'a': 1, 'b': 2}
```
- I prefer to avoid this type of multiple inheritance here and create attributes on the fly; we can add unit tests that check that all classifiers have the required attributes after calling `fit` (see the sketch below).
- I didn't understand why `clone` didn't work. I think we should ensure that `clone` works on all our estimators; that's essential for tuning and ensembling.
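A hypothetical sketch of such a test; the registry, the required attribute names, and the toy data are all assumptions for illustration, not the actual sktime-dl test suite.

```python
import numpy as np
import pytest

# Hypothetical registry: populate with the project's classifier classes.
ALL_DEEP_CLASSIFIERS = []

# Assumed list of attributes every classifier should expose after fit.
REQUIRED_FITTED_ATTRIBUTES = ["classes_", "nb_classes_", "model_"]


@pytest.mark.parametrize("Classifier", ALL_DEEP_CLASSIFIERS)
def test_fitted_attributes_exist(Classifier):
    X = np.random.random((10, 1, 20))  # 10 instances, 1 dimension, length 20
    y = np.array([0, 1] * 5)
    clf = Classifier().fit(X, y)
    for attr in REQUIRED_FITTED_ATTRIBUTES:
        assert hasattr(clf, attr), f"{Classifier.__name__} missing {attr} after fit"
```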
Our estimators work fine with clone now. I put a link above to an old commit where clone failed; it was when `BaseDeepRegressor` had a `BaseDeepNetwork` (the composition approach).
Have I read it right: you favour this option below?
> simply don't initiate the attributes in the constructor and create them when needed as done in sklearn
Yes, but I don't have a strong preference; I'd still prefer cooperative classes over the current way, I think.
I opened up issue #45 for this because it will need a bit of work, separate to LSTM.
```python
'''
:param nb_epochs: int, the number of epochs to train the model
:param batch_size: int, the number of samples per gradient update.
:param units: int, array of size 2, the number of units in each LSTM layer
```
In the future we can open this up to allow for tuning along different axes, e.g. the number of layers and the number of units in each layer, etc. Happy with this and these numbers as defaults.
Yes. It could be done with `units=[16, 16, 32, 16]` to create 4 layers, etc. Unless you want to be more explicit, as per CNN, and have two args like `nb_conv_layers=2` and `filter_sizes=[6, 12]`?
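A rough sketch of how a `units` list could drive the number of stacked layers; `build_lstm_network` and its defaults are illustrative, not the sktime-dl API.

```python
from tensorflow import keras


def build_lstm_network(input_shape, units=(16, 16, 32, 16)):
    """Stack one LSTM layer per entry in `units` (illustrative only)."""
    input_layer = keras.layers.Input(shape=input_shape)
    x = input_layer
    for i, n_units in enumerate(units):
        # Every layer except the last must return sequences so the next
        # LSTM layer still receives a 3D input.
        return_sequences = i < len(units) - 1
        x = keras.layers.LSTM(n_units, return_sequences=return_sequences)(x)
    output_layer = keras.layers.Dense(1)(x)
    return keras.models.Model(inputs=input_layer, outputs=output_layer)
```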
I've changed the initialisation of LSTM and CNN as per #45 option 3. Please take a look and let me know what you think. If it's ok I'd then go on to do the other models. It's not perfect ...
I've talked to one of the sklearn core developers about the estimator check that checks that no other attributes are initialised in `__init__`.
Great, thanks @mloning. Yes, we can continue to not perform the check_estimator test; I just have it running locally. I'll go ahead and apply this same initialisation to the other estimators.
For reference, the estimator checks are at https://github.com/scikit-learn/scikit-learn/blob/95d4f0841d57e8b5f6b2a570312e9d832e69debc/sklearn/utils/estimator_checks.py#L239. On the surface those all look useful and could be worked towards being run successfully, but some, e.g. `check_methods_subset_invariance`, may not apply. Either way, it does make sense to have a little util func in base sktime which prunes the list of checks that are wanted; an sktime-dl util could then take that and prune it further (and so on for other sktime extensions). A rough sketch of the idea follows below.
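A hypothetical sketch of that layered pruning; the function names and the excluded checks are assumptions, not existing sktime or sktime-dl utilities.

```python
from functools import partial

# Assumed exclusion lists; the real choice of checks would need discussion.
SKTIME_EXCLUDED_CHECKS = {"check_methods_subset_invariance"}
SKTIME_DL_EXCLUDED_CHECKS = {"check_fit2d_1sample"}


def _check_name(check):
    # sklearn yields functools.partial objects for some parametrised checks.
    return check.func.__name__ if isinstance(check, partial) else check.__name__


def prune_checks(checks, excluded):
    """Drop the estimator checks whose names appear in `excluded`."""
    return [check for check in checks if _check_name(check) not in excluded]


def sktime_checks(checks):
    return prune_checks(checks, SKTIME_EXCLUDED_CHECKS)


def sktime_dl_checks(checks):
    # Layer a further pruning on top of the base sktime one.
    return prune_checks(sktime_checks(checks), SKTIME_DL_EXCLUDED_CHECKS)
```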
On this point, if this is the only place in our code that checks it, a check for the attributes' existence in the first place could definitely be added, and `is_fitted` removed from the inits.
Yes, I agree, will look into this for base sktime as part of the refactoring.
I had a look at scikit-learn's estimator checks and I'm not sure anymore if they are very useful for us.
I haven't fully decided yet, but I may start adapting/re-writing them for sktime. What do you think?
For the purpose of ensuring compatibility with all (it still will be for most) of scikit-learn's functionality, changing anything means the overall checks fail. If it's desirable enough that bespoke checks are wanted to test whether a model implementation meets sktime's requirements (but potentially not scikit-learn's), it could be worth adapting them, yeah.

AFAIK, sktime's models etc. can still work with numpy arrays as input? Though that's not the preferred workflow, I understand.

On the last point, these checks are presumably meant to test minimal required functionality; would things like handling missing data, unequal length, etc. be a part of that? In my mind, some models can handle those things innately, while others will need a data preprocessing step for them at the experimental level.
It seems like sktime-dl models could accept X as a numpy array (e.g. they all call `check_and_clean_data` and that returns X as a numpy array); it's sktime that insists on a pd.DataFrame, in validation/supervised.py. The way forward might be to change sktime to enable sktime-dl to handle simple time series datasets as numpy arrays, whilst still preferring X to be input as a DataFrame of Series. A rough sketch of that idea follows below.
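A rough sketch of that kind of dual input handling; this hypothetical helper illustrates the idea and is an assumption, not the actual `check_and_clean_data` implementation.

```python
import numpy as np
import pandas as pd


def _coerce_to_numpy(X):
    """Accept X as a 3D numpy array or as a nested pd.DataFrame of
    pd.Series (the sktime format) and return a numpy array."""
    if isinstance(X, np.ndarray):
        return X
    if isinstance(X, pd.DataFrame):
        # One column per dimension, one pd.Series per cell; stack into an
        # array of shape (n_instances, n_dimensions, series_length).
        return np.stack(
            [np.stack([np.asarray(cell) for cell in X[col]]) for col in X.columns],
            axis=1,
        )
    raise TypeError(f"Unexpected input type: {type(X)}")
```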
This is ready to be merged now @mloning @james-large
Looks good, just a few suggestions/ideas.
Thanks for the thorough reviews @mloning.
Merging #51 introduced some conflicts ...
I merged in the lint changes from dev. This branch is still showing as having conflicts though. I don't have write access so I can't hit the "resolve conflicts" button. Moreover, the "Files Changed" view on this PR seems incorrect in places, as commented on above. Can you assist?
Sure, will look into this later today!
Closed because it was re-opened and merged in #55.
Reference Issues/PRs
Added a simple stacked LSTM regressor for forecasting sktime/sktime#5829
Any other comments?
I've put it as a work in progress because it will need mods to work with the `check_is_fitted` PR #39.
Classifier still to do.
Are you happy with this choice of baseline LSTM?
I'll put some suggestions for refs detailing more complex LSTMs in sktime/sktime#5829, we could add one of those too.