Custom dataset in skorch when using sklearn GridSearchCV? #212
Comments
Getting pytorch My recommendation for now would be to try out a different data format that works with That being said, we could surely think about providing a helper class that wraps a Minor note: You only need to provide the
@BenjaminBossan Thanks for the clarification. Does that mean that if I want to use sklearn GridSearchCV, I most likely have to either (1) load all the data into memory and store it as numpy so that I can use GridSearchCV, or (2) hmm... [no idea]? My custom Dataset follows almost the same pattern as the one in the skorch tutorial (https://nbviewer.jupyter.org/github/dnouri/skorch/blob/master/notebooks/Advanced_Usage.ipynb):
except that xx[i] and yy[i] are pytorch cuda tensors. One of the advantages of using skorch is that it is sklearn-compatible, which is very attractive. If I use my custom dataset (like above), is there no way to work with sklearn GridSearchCV?
Maybe we could find a solution for your problem if you tell us in more detail what your use case is. For example, when loading everything into memory is an issue, there is a solution where your X is just a numpy array of indices (or file names), which will work with sklearn. Then you can write a custom Dataset that will return the data indicated in the index/name when
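To make the index-based idea above more concrete, here is a minimal sketch in plain Python. The names IndexedDataset and storage are hypothetical and not part of skorch; in a real setup the class would typically subclass torch.utils.data.Dataset and could presumably be handed to the net through its dataset argument, so that sklearn only ever slices the small array of indices.

```python
class IndexedDataset:
    """Hypothetical dataset: X holds only keys, the samples live elsewhere."""

    def __init__(self, X, y=None, storage=None):
        self.X = list(X)  # indices or file names, as a CV split would pass them
        self.y = None if y is None else list(y)
        self.storage = storage  # assumption: any mapping from key to sample

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        key = self.X[i]
        sample = self.storage[key]  # the actual data is looked up lazily here
        # Placeholder target of 0 when no labels are given (e.g. at prediction).
        target = self.y[i] if self.y is not None else 0
        return sample, target


# Stand-in for data on disk: key k maps to a small feature vector.
storage = {k: [float(k)] * 3 for k in range(10)}
X = list(range(10))
y = [k % 2 for k in X]

# A cross-validation splitter would hand over a subset of the indices.
train_idx = X[:8]
train = IndexedDataset([X[i] for i in train_idx],
                       [y[i] for i in train_idx],
                       storage)
```

The point of the design is that X and y stay cheap to copy and split, while the expensive loading happens only inside __getitem__.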
@BenjaminBossan Thanks a lot for your help! So, I basically preload all my data into memory; it is 4-D data. I wrote a custom Dataset and a custom Transform because I want to do some specific data manipulation. Do you see any problem below that would prevent me from using GridSearchCV (I did encounter an error)? I wrote my __getitem__ as follows:
Okay, since your data is completely in memory, there should be a solution. What you probably don't know is that So for your case specifically, something like this should work:
I don't know exactly what your data looks like, thus there might be some more minor adjustments you need to make, but at the end of the day this should work.

Note 1: We generally recommend to only fit with

Note 2: If you perform random image augmentation within your dataset, you should be careful since the same augmentations would also be applied during prediction, which is not always what you want.
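As an illustration of the augmentation caveat, one possible way to handle it is an explicit switch on the dataset. This is only a sketch under assumptions: AugmentingDataset and its augment flag are made-up names, the list reversal stands in for a real random image flip, and how the flag gets turned off for prediction depends on your setup.

```python
import random


class AugmentingDataset:
    """Hypothetical dataset with an explicit switch for random augmentation."""

    def __init__(self, X, y, augment=True, seed=None):
        self.X = X
        self.y = y
        self.augment = augment  # set to False for validation and prediction
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        sample = list(self.X[i])  # copy so the stored data is never modified
        if self.augment and self.rng.random() < 0.5:
            sample.reverse()  # stand-in for a random horizontal flip
        return sample, self.y[i]


data = [[1, 2, 3]]
labels = [0]

train_ds = AugmentingDataset(data, labels, augment=True, seed=0)
eval_ds = AugmentingDataset(data, labels, augment=False)
```

With augment=False the dataset always returns the stored sample unchanged, so predictions stay deterministic.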
Great, thanks @BenjaminBossan. Let me try this and see how it goes.
According to the tutorial (https://nbviewer.jupyter.org/github/dnouri/skorch/blob/master/notebooks/Advanced_Usage.ipynb), skorch supports a custom dataset object (trainset in this case) in training, which is perfect. As long as I provide y, I can train the model:
net.fit(trainset, y=trainset_label)
However, when I tried doing a grid search using sklearn, I got an inconsistent X, y dimension error. The code that I had is below:
Is a custom dataset supported in skorch when using sklearn GridSearchCV?