I/O for time gen class #2189
So far I've just been using
So yes, an ENH would be welcome in my opinion.
Yeah - I've been doing pickling too, but I'm always worried about an API
So I was thinking about this a little bit more, and I think this would be straightforward for any timegen-specific attributes, but potentially very convoluted for the models, cv objects, etc. One option would be to use something like tables (maybe h5io?) which I think automatically pickles things that it can't easily handle. However, doing this makes me wonder if it makes more sense to have the I/O just be something the user handles themselves for their use case. It doesn't seem like sklearn will improve their own I/O for the same reason of complexity. What do you think?
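The "automatically pickle what it can't easily handle" idea could be sketched as follows: split an object's attributes into HDF5-friendly values (numbers, strings, arrays) and opaque ones (models, CV objects) that get pickled as bytes. This is a minimal sketch, not MNE or h5io API; `split_attributes` and `SIMPLE_TYPES` are hypothetical names.

```python
import pickle
import numpy as np

# Types an HDF5 writer like h5io/h5py can typically store directly.
SIMPLE_TYPES = (int, float, str, bool, type(None), list, tuple, np.ndarray)

def split_attributes(obj):
    """Return (simple, pickled) dicts of an object's attributes.

    `simple` holds values that can go into HDF5 as-is; `pickled`
    holds byte strings for everything else (estimators, CV objects, ...).
    """
    simple, pickled = {}, {}
    for name, value in vars(obj).items():
        if isinstance(value, SIMPLE_TYPES):
            simple[name] = value
        else:
            # opaque objects get pickled, mirroring the fallback behavior
            # described above
            pickled[name] = pickle.dumps(value)
    return simple, pickled
```

Both dicts could then be written to the same HDF5 file, with the pickled entries stored as raw byte datasets.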
I don't know much about I/O so I don't have any strong opinion. The large part of the gat object is 1) y_pred_, 2) scores_, and 3) the fitted estimators. ATM I personally pickle everything, but it's true that it's pretty slow.
Yep - I think all that is correct. As you mention, the challenge is that it's really hard to support arbitrary potential models that people will fit. (e.g., if they use logistic regression vs. a random forest, the attributes of interest change). In our case, we'd either have to hard-code ways to handle those specific use cases, or just pickle the whole thing anyway. And if we're pickling the bigger objects anyway, then it shouldn't be a big deal for the user to do the I/O with h5io on their own, no?
If we can incorporate the I/O, we might as well do it for them. It would
Hmm ok - let me cook something up then. The basic flow as I see it would be:
WDYT?
Instead of pickling, I would save the attributes of sklearn objects and their parameters, then re-construct them after reading. Most of these attributes should be standard types h5py can deal with.
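A rough sketch of that params-plus-attributes round trip, using a minimal stand-in class rather than a real sklearn estimator (real estimators expose the same `get_params()` convention and trailing-underscore fitted attributes). `TinyModel`, `dump_estimator`, and `load_estimator` are illustrative names, not MNE or sklearn API.

```python
class TinyModel:
    """Stand-in following sklearn's param / fitted-attribute conventions."""
    def __init__(self, C=1.0):
        self.C = C

    def get_params(self):
        return {'C': self.C}

    def fit(self, X, y):
        # fake "fitted state": one weight per input feature
        self.coef_ = [sum(col) for col in zip(*X)]
        return self

def dump_estimator(est):
    """Split an estimator into (class, init params, fitted attributes)."""
    fitted = {k: v for k, v in vars(est).items()
              if k.endswith('_') and not k.startswith('_')}
    return type(est), est.get_params(), fitted

def load_estimator(cls, params, fitted):
    """Re-construct the estimator and restore its fitted state."""
    est = cls(**params)
    for key, value in fitted.items():
        setattr(est, key, value)
    return est
```

The params and fitted dicts are plain dicts of standard types, so they are the part that could go through h5py directly.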
OK, so you vote for 'i' then? The only challenge is that it requires hard-coding for each type of sklearn object (e.g., it's coef_ if we're dealing with an SVM, but feature_importances_ if it's a RandomForest)
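That hard-coding could at least be centralized in one small lookup that tries the known attribute names in order. A sketch only, covering just the two names mentioned above; real estimators may need more cases.

```python
def get_feature_weights(est):
    """Return the per-feature weights of a fitted estimator.

    Tries the known weight attributes in order: coef_ for linear
    models (e.g. SVM, LR), feature_importances_ for tree ensembles
    (e.g. RandomForest).
    """
    for attr in ('coef_', 'feature_importances_'):
        if hasattr(est, attr):
            return getattr(est, attr)
    raise ValueError('no known weight attribute on %s'
                     % type(est).__name__)
```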
If we had a mechanism for figuring out the field holding the feature importance, we could also use that to plot model topomaps.
Yep agreed - this is why I think it's important to keep the model parameter
I guess we can start easy with a blind pickling and we'll make subsequent
FYI sklearn has never invested resources in allowing models to be saved to disk with forward compatibility. Use pickle if you want, but be ready to have I/O problems if you update mne or sklearn. There is no easy way out unless we had custom code restricted to linear models (not even pipelines).
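Given that caveat, one defensive pattern (a sketch, not an MNE recommendation) is to store the writing library versions next to the pickle and warn on mismatch at read time, so an I/O problem after upgrading is at least diagnosable. The function names here are hypothetical.

```python
import pickle
import warnings

def save_pickle_with_versions(obj, fname, versions):
    """Pickle obj together with the library versions that produced it."""
    with open(fname, 'wb') as f:
        pickle.dump({'versions': versions, 'obj': obj}, f)

def load_pickle_with_versions(fname, current_versions):
    """Unpickle and warn if the writing and reading versions differ."""
    with open(fname, 'rb') as f:
        payload = pickle.load(f)
    if payload['versions'] != current_versions:
        warnings.warn('pickle written with %s, read with %s'
                      % (payload['versions'], current_versions))
    return payload['obj']
```

In practice the `versions` dict would hold e.g. `mne.__version__` and `sklearn.__version__` at write time.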
95% of usage cases would probably be covered by just thinking about SVM and LR, right? (Jean-Rémi, you're still mainly using LR and SVMs for classification GAT too, right?)
I also regularly use Ridge, Lasso, multi-task Lasso, and LDA.
Ah okay. I've never seen Ridge or LDA give better results than LR for my usage cases ...
Ridge has been a bit better on some 32-electrode EEG datasets. And there's a
So... choose 3-4 to support and then add support for new ones on an
Interesting @kingjr, I use low density EEG a lot and I always see LR and SVM strongly dominate Ridge. @choldgraf Maybe even just start with the default (LR) to get the API going?
@jona-sassenhagen in my case there are about 5000 epochs.
ok - though IIRC sklearn may behave differently for different kinds of coefficients. I think I remember an issue about this a while back where some estimators would let you set things like coef_, while others would raise an error. @agramfort correct me if I'm wrong? One option, then, would be to assume that there will be a set of linear weights, one per input feature, that belong to a GAT object. Then we could set those weights as an array and do I/O accordingly. Though that would lose some of the provenance for the full details of the model / CV / etc.
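The "just keep the linear weights" option might look like this: collapse the fitted estimators into one plain array that is trivial to write to disk, accepting the loss of model/CV detail noted above. `collect_weights` is a hypothetical helper, not MNE API.

```python
import numpy as np

def collect_weights(estimators):
    """Stack each fitted estimator's coef_ into a 2-D array.

    Returns an (n_estimators, n_features) array, one row of linear
    weights per fitted estimator (e.g. one per training time sample).
    """
    # ravel() flattens coef_ whether it is (n_features,) or (1, n_features)
    return np.array([np.ravel(est.coef_) for est in estimators])
```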
Yes, use pickle, and we removed a lot of the old classes. Closing.
I've been running some time gen models, and a tricky problem I've come across is how to deal with I/O. The model fitting (well, the predictions actually) can take a really long time, so I'd prefer to have one script do all the model fitting, and another script that looks at the results.
However, right now I don't have a good system for reading / writing from disk. I am wondering if it'd be worth implementing something similar to AverageTFR, which basically just takes attributes, writes them as a dictionary to an h5 file, and then reads them back from disk in a similar manner.
The main challenge I see to this would be that people may have arbitrary sklearn objects in the attribute for models. I think the h5 code will serialize this, which is sub-optimal but at least shouldn't break. What do you guys think?
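The AverageTFR-style pattern described above boils down to a state-dict round trip: the object exposes its attributes as a plain dict, a classmethod rebuilds it from such a dict, and the dict itself is what goes through the h5 layer. `TimeGenResults`, `_get_state`, and `_from_state` are hypothetical names for illustration.

```python
class TimeGenResults:
    """Toy container following the AverageTFR-style state-dict pattern."""
    def __init__(self, scores=None, times=None):
        self.scores_ = scores
        self.times = times

    def _get_state(self):
        # a plain dict of attributes: easy to hand to an h5 writer
        return dict(scores_=self.scores_, times=self.times)

    @classmethod
    def _from_state(cls, state):
        # rebuild the object attribute-by-attribute from the dict
        out = cls()
        for key, value in state.items():
            setattr(out, key, value)
        return out
```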