
Regression error = 0.0 - looking for suggestions #200

Closed
jchiang4 opened this issue Dec 18, 2018 · 7 comments

@jchiang4

jchiang4 commented Dec 18, 2018

Describe the bug
After a long Spark job, the regression error comes back as 0. I have double-checked that the response is not among the predictors. How can I troubleshoot this issue?

Here's the last section of the log printout:

"BestModelName" : "OpGBTRegressor_000000000029_0",
"HoldoutEvaluation" : [ "com.salesforce.op.evaluators.MultiMetrics", "{\"metrics\":{\"regression evaluation metrics\":{\"RootMeanSquaredError\":0.0,\"MeanSquaredError\":0.0,\"R2\":\"NaN\",\"MeanAbsoluteError\":0.0}}}" ],
"ValidationParameters" : {
"parallelism" : 8,
"seed" : 112233,
"evaluator" : "root mean square error",
"stratify" : false,
"numFolds" : 3
},
"ProblemType" : "Regression",
"EvaluationMetric" : "RootMeanSquaredError",
"TrainEvaluation" : [ "com.salesforce.op.evaluators.MultiMetrics", "{\"metrics\":{\"regression evaluation metrics\":{\"RootMeanSquaredError\":0.0,\"MeanSquaredError\":0.0,\"R2\":\"NaN\",\"MeanAbsoluteError\":0.0}}}" ],
"DataPrepResults" : {
"className" : "com.salesforce.op.stages.impl.tuning.DataSplitterSummary"
},
"BestModelUID" : "OpGBTRegressor_000000000029"
}
}
18/12/13 21:43:39 INFO OpWorkflowRunner: Total run time: 3h21m42.079s

Does anyone have any suggestions on how best to troubleshoot?

Thanks,

Jim

@tovbinm
Collaborator

tovbinm commented Dec 19, 2018

This is abnormal. How large is the dataset?

@kinfaikan @Jauntbox, any ideas?

@jchiang4
Author

Not too big: 1.8M rows and 32 columns, approximately 0.3 GB. I've also tried reducing it to only 200K rows, and I get the same issue. Any ideas?

@jchiang4
Author

I also reduced the number of predictor features. Still the same issue.

@Jauntbox
Contributor

Jauntbox commented Jan 8, 2019

Hey Jim, I just got back from vacation, so I'm taking a look at this now. It's pretty hard to tell what could be going wrong without some more info.

For example, if you look at the feature contributions in the ModelInsights case class, what does the model think the most important features are?
Is something going wrong with setting the label, perhaps? E.g., I could see this happening if the labels all got set to the same value somehow.
Do other regression algorithms work on this dataset, or is this a problem with the GBTRegressor only?
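A quick way to test the label hypothesis is to check how many distinct values the label column has, and its variance, right before training. The sketch below is a minimal, self-contained illustration that uses plain Python lists in place of a Spark column; `label_summary` and `regression_metrics` are hypothetical helper names, and the metrics use the standard textbook formulas rather than TransmogrifAI's own evaluator code. A distinct count of 1 (variance 0.0) reproduces the all-zero errors and NaN R2 seen in the log above.

```python
import math
from statistics import fmean, pvariance


def label_summary(labels):
    """Return (number of distinct label values, population variance of the labels)."""
    return len(set(labels)), pvariance(labels)


def regression_metrics(labels, preds):
    """Standard regression metrics (hypothetical helper, textbook formulas)."""
    n = len(labels)
    mean_y = fmean(labels)
    ss_res = sum((y - p) ** 2 for y, p in zip(labels, preds))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in labels)            # total sum of squares
    mse = ss_res / n
    mae = sum(abs(y - p) for y, p in zip(labels, preds)) / n
    # R2 is undefined when the labels have zero variance (ss_tot == 0).
    # Python raises ZeroDivisionError on 0.0 / 0.0, so return NaN explicitly,
    # which is what 0.0 / 0.0 evaluates to on the JVM.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {
        "RootMeanSquaredError": math.sqrt(mse),
        "MeanSquaredError": mse,
        "R2": r2,
        "MeanAbsoluteError": mae,
    }


# Every label overwritten with the same value, e.g. by a buggy cleansing step.
labels = [42.0] * 5
# A model fit to constant labels simply predicts that constant.
preds = [42.0] * 5

print(label_summary(labels))               # (1, 0.0)
print(regression_metrics(labels, preds))   # errors are 0.0 and R2 is nan
```

In a Spark job the same check would be a distinct count or variance aggregation on the label column after the cleansing step; a distinct count of 1 means the model has nothing to learn.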

@jchiang4
Author

Thanks. I'll continue investigating next week and will update this thread with more info.

@jchiang4
Author

Thanks for your suggestions, Jauntbox. You were right: a data cleansing routine had set all the labels to the same value, hence the 0.0 error. Closing the issue.
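For anyone landing here with the same symptom: the pattern in the log (RMSE, MSE and MAE all 0.0, R2 NaN) is what the standard metric definitions produce for a constant label. This assumes the evaluator uses these textbook formulas, which the thread itself does not confirm. With labels $y_i$, predictions $\hat{y}_i$, mean label $\bar{y}$ and $n$ rows:

$$
\begin{aligned}
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\lvert y_i-\hat{y}_i\rvert, \qquad R^2 = 1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}.
\end{aligned}
$$

If every label equals the same constant $c$, then $\bar{y} = c$ and the denominator of $R^2$ is 0. A regressor trained on constant labels will typically predict $c$ everywhere, so the numerator and every error term are 0 as well: MSE, RMSE and MAE come out as 0.0, and $R^2 = 1 - 0/0$, which is NaN in IEEE floating-point arithmetic.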

@jchiang4
Author

Closing the issue. Thanks, Jim

@tovbinm tovbinm mentioned this issue Jul 11, 2019
3 participants