When I run wide and deeptab, I got stuck #68

Closed
zhang-HZAU opened this issue Dec 10, 2021 · 3 comments

@zhang-HZAU

Hi pytorch_widedeep team,
First of all, thank you for your contributions to the wide and deep field. I am currently working on a classification task that is essentially similar to the adult salary prediction example in the project README. All of my input data has already been converted to numerical values. The data contains continuous values, and some indicators are discrete values (such as 0, 1, 2, ...). In addition, the data contains missing values. My task is a five-class problem.
Question one:
I used the README code for predicting adult salaries, changed the data input part, and changed "binary" to "multiclass" to fit my task. When I execute the following code:

wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)
X_wide = wide_preprocessor.fit_transform(df_train)
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)

tab_preprocessor = TabPreprocessor(embed_cols=embed_cols, continuous_cols=cont_cols)
X_tab = tab_preprocessor.fit_transform(df_train)
deeptabular = TabMlp(
    mlp_hidden_dims=[64, 32],
    column_idx=tab_preprocessor.column_idx,
    embed_input=tab_preprocessor.embeddings_input,
    continuous_cols=cont_cols,
)

model = WideDeep(wide=wide, deeptabular=deeptabular)

I got the following error: "RuntimeWarning: invalid value encountered in true_divide". I understand that it is caused by 0 being divided by 0, but the message does not give me enough information to locate the offending code. How can I solve this problem?
Question two:
When letting the trainer split off a validation set from the training set:

trainer = Trainer(model, objective="multiclass", metrics=[Accuracy])
trainer.fit(
    X_wide=X_wide,
    X_tab=X_tab,
    target=target,
    n_epochs=15,
    batch_size=16,
    val_split=0.1,
)

I got the error "IndexError: index 648 is out of bounds for axis 0 with size 530", meaning an index exceeds the size of the training set. I have not found a solution.
I have attached the source code; see the results in the ipynb file for the detailed error information. Looking forward to your answer~
liver_predict.zip

@jrzaurin
Owner

Hey @zhang-HZAU

Thanks for the issue.

You are not the first one to find this unintuitive, so we will add a warning as soon as possible.

The "issue" is that when you have a multiclass classification problem, you need to specify the number of classes via the pred_dim param of the WideDeep class, as the model has no notion/information of the target.

see here:
https://pytorch-widedeep.readthedocs.io/en/latest/model_components.html#pytorch_widedeep.models.wide_deep.WideDeep

model = WideDeep(wide=wide, deeptabular=deeptabular, pred_dim=5)

And the class labels need to start from 0.
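
A minimal sketch of what "starting from 0" means in practice, assuming the raw labels are arbitrary integers such as 1 to 5. The helper `encode_labels` and the names `raw_target` and `n_classes` are hypothetical and only for illustration; they are not part of the library. The resulting class count is what would be passed as pred_dim.

```python
def encode_labels(labels):
    """Map arbitrary class labels to consecutive integers 0..K-1.

    Hypothetical helper, not part of pytorch_widedeep.
    """
    classes = sorted(set(labels))
    class_to_idx = {c: i for i, c in enumerate(classes)}
    encoded = [class_to_idx[c] for c in labels]
    return encoded, class_to_idx


# Hypothetical raw target whose labels run from 1 to 5 instead of 0 to 4.
raw_target = [1, 3, 5, 2, 4, 1, 5]

target, class_to_idx = encode_labels(raw_target)
n_classes = len(class_to_idx)

# The class count is then what goes into pred_dim, as in the line above:
# model = WideDeep(wide=wide, deeptabular=deeptabular, pred_dim=n_classes)
```

Keeping `class_to_idx` around makes it possible to map predicted indices back to the original labels afterwards.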

Going to leave this issue open until we add that warning for when pred_dim is not passed. We will also alias it as num_classes.

Thanks again

Regarding the second question, I will have a look at the code.

Could you please open a separate issue?

Cheers!
J.

@jrzaurin jrzaurin self-assigned this Dec 10, 2021
@jrzaurin
Owner

jrzaurin commented Dec 10, 2021

@zhang-HZAU

If you could send me the data or point me towards where I can get it, that would be great.

(Also, the library is not autoML :) so you would have to impute the NaN values before passing the data to the preprocessor... just saying 🙂)
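
As a rough illustration of that imputation step, here is a sketch that fills missing values in one continuous column with the column median, using only the standard library. The helper `impute_median` and the sample column are hypothetical, and the median is an assumed strategy rather than anything the library prescribes; for discrete indicator columns the most frequent value may be the better fill.

```python
import math
from statistics import median


def is_missing(value):
    """True for float NaN, the usual marker for a missing numeric value."""
    return isinstance(value, float) and math.isnan(value)


def impute_median(column):
    """Return a copy of `column` with NaN entries replaced by the median
    of the observed values. Hypothetical helper for illustration only."""
    observed = [v for v in column if not is_missing(v)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if is_missing(v) else v for v in column]


# Hypothetical continuous column with two missing entries.
ages = [34.0, float("nan"), 51.0, 47.0, float("nan")]
imputed = impute_median(ages)
```

With a pandas DataFrame the same idea is usually written with `DataFrame.fillna`, applied before calling `fit_transform` on the preprocessors.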

@zhang-HZAU
Author

Thank you for your answers. Following your suggestion, I have reported the second problem in a separate issue: #70

@jrzaurin jrzaurin closed this as completed Jan 4, 2022