can't predict using test data #24

Open
GriffinRidgeback opened this issue Apr 11, 2019 · 4 comments

Comments

@GriffinRidgeback

Hello - I have successfully trained my model using my training dataset. Now, when I go to predict, using this command:

python model.py -d ../testing_imputed.csv -m predict

I get this error:

ValueError: Usecols do not match columns, columns expected but not found: ['accepted']

but this is the column I'm trying to predict! Am I supposed to create the target/prediction column in the test data and it will be populated with the predictions? This is a logistic regression problem, where I am trying to predict whether or not a loan will be approved. If I need to add a column, is it:

test_data['accepted'] = ""

Or do I zero it out and the prediction will update the value with what the model should predict?

Thanks in advance to all who respond.
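The message quoted above is the one pandas raises when `read_csv` is given a `usecols` entry that is missing from the file header. Here is a minimal reproduction, assuming `model.py` reads its input this way (the thread does not confirm it). The `income` and `loan_amount` columns and the `load` helper are hypothetical names used only for illustration.

```python
import io

import pandas as pd

# Hypothetical test file: feature columns only, no 'accepted' target.
test_csv = "income,loan_amount\n52000,10000\n38000,15000\n"


def load(csv_text, columns):
    # Assumed to resemble how model.py reads its input; not taken from the repository.
    return pd.read_csv(io.StringIO(csv_text), usecols=columns)


try:
    load(test_csv, ["income", "loan_amount", "accepted"])
    error_message = None
except ValueError as exc:
    # pandas reports the requested column that the header lacks.
    error_message = str(exc)

print(error_message)

# Adding a placeholder target column lets the same read succeed.
padded_csv = pd.read_csv(io.StringIO(test_csv)).assign(accepted=0).to_csv(index=False)
padded = load(padded_csv, ["income", "loan_amount", "accepted"])
print(padded)
```

If this assumption holds, the placeholder values are never used as ground truth at prediction time. They only satisfy the column check.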

@runska

runska commented Apr 11, 2019

I'm having the exact same issue/confusion.

I added a target column and filled it with np.nan before running the predictions, but then a ValueError was thrown.

@GriffinRidgeback
Author

GriffinRidgeback commented Apr 11, 2019 via email

@GriffinRidgeback
Author

so I got around the problem using this link. I modified my code as follows:

header_list = test_data.columns
header_list = list(header_list)
header_list.append('accepted')
test_data = test_data.reindex(columns = header_list)
test_data = test_data.fillna(0)

When I run the command to generate predictions, I get a predictions.csv but the results it contains are not what I would expect for a logistic regression. It contains probabilities, not 1 or 0, which are the values in the training data and what I would expect. Why is that???
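For context, a logistic regression model estimates the probability of the positive class. The 0/1 labels are usually obtained afterwards by comparing each probability against a cutoff. A minimal sketch of that step follows. The probabilities are made-up stand-ins for the values in predictions.csv, and the 0.5 cutoff is a common default rather than anything this project specifies.

```python
import numpy as np

# Hypothetical probabilities standing in for the values in predictions.csv.
probabilities = np.array([0.91, 0.12, 0.50, 0.49])

# Assumed cutoff: 0.5 is a common default, not something the thread specifies.
threshold = 0.5

# Probabilities at or above the cutoff become 1 (accepted), the rest 0.
labels = (probabilities >= threshold).astype(int)

print(labels.tolist())  # [1, 0, 1, 0]
```

A different cutoff may suit the problem better when approving a bad loan and rejecting a good one carry different costs.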

@2655398311

I had the same problem.
It's not clear whether the data set used for prediction is supposed to follow the format of the data.csv mentioned on Git. I can't find where that data set is.
