cannot fit 'intercept only' logistic regression model #1596
For that matter, fitting a perfectly cromulent model without an intercept also throws an error. With the same setup:
# this works fine
s_mod_ok <- spark_data %>%
  ml_logistic_regression(formula = is_setosa ~ Sepal_Length, fit_intercept = TRUE)
# this errors:
s_mod_bad <- spark_data %>%
  ml_logistic_regression(formula = is_setosa ~ Sepal_Length, fit_intercept = FALSE)
The error and stack trace I get are:
As this was written, I believe `coefficients` was referencing a function, not the model's coefficients. Hoping this will fix sparklyr#1596
To be sure, the PR should fix the latter case, where there is a variable and no intercept. I should have made this two separate issues. My bad.
Sorry, this should not have been closed. #1597 fixes the case of 'no intercept', which you could use to 'fake' the intercept only model, I think. However, the following are all broken:
# make some data
train_data <- iris %>% mutate(is_setosa = as.numeric(Species == 'setosa'))
# copy it into spark
spark_data <- copy_to(sc, train_data, 'like_iris', overwrite = TRUE)
# these all error.
# scala error:
s_int_mod <- spark_data %>% ml_logistic_regression(formula = is_setosa ~ 1)
# nonsensical in R:
s_int_mod <- spark_data %>% ml_logistic_regression(formula = is_setosa ~ )
# this throws a scala error:
s_int_mod <- spark_data %>% ml_logistic_regression(formula = "is_setosa ~ ")
# this might work when I get fix for #1597, but is against the spirit, really.
s_int_mod <- spark_data %>% mutate(one = 1.0) %>% ml_logistic_regression(formula = is_setosa ~ one, fit_intercept = FALSE)
The first one, which is of interest here, gives the stack trace:
From what I can tell,
So perhaps the underlying logistic regression really wants to have features in it. I will submit some tests that would catch this one, but I have no fix in mind, so the tests will only break the build.
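For anyone writing those tests, a reference value may help. The following is a minimal base R sketch (no Spark involved, and not taken from the sparklyr test suite) of what an intercept-only logistic fit on this data should produce. The names `r_int_mod`, `r_fake_mod` and `expected_intercept` are made up for illustration. It assumes the same `is_setosa` column as above, built here with `transform()` instead of `mutate()` to avoid the dplyr dependency.

```r
# Base R only; no Spark connection is needed for this reference fit.
train_data <- transform(iris, is_setosa = as.numeric(Species == "setosa"))

# Intercept-only logistic regression in plain R.
r_int_mod <- glm(is_setosa ~ 1, data = train_data, family = binomial())

# For an intercept-only logistic model, the fitted intercept is the
# log-odds of the overall positive rate.
expected_intercept <- qlogis(mean(train_data$is_setosa))

# The 'fake' variant from the comment above: a constant column with the
# intercept switched off should give the same single coefficient.
train_data$one <- 1.0
r_fake_mod <- glm(is_setosa ~ 0 + one, data = train_data, family = binomial())

print(unname(coef(r_int_mod)))
print(expected_intercept)
print(unname(coef(r_fake_mod)))
```

Since one third of the `iris` rows are setosa, all three printed values should be about log(0.5), roughly -0.693. A working `ml_logistic_regression(formula = is_setosa ~ 1)` would presumably be expected to report an intercept close to that, up to optimizer tolerance.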
The 'pipeline API' gives the same error, of course:
# try the pipeline model
pipeline <- ml_pipeline(sc) %>%
  ft_r_formula(is_setosa ~ 1) %>%
  ml_logistic_regression()
s_int_mod <- pipeline %>%
  ml_fit(spark_data)
adding test to catch sparklyr#1596. fix not in place yet, this will break the build.
OK I think
So the parser isn't understanding that
Thanks for the update and thanks for all the work. Should I submit an issue upstream in
I have not.
I looked to see if it had already been filed and found only this issue, spark-19400, which seems to suggest that
Thanks, this is interesting, taking a look now...
Failing in Spark 2.3.0...
In Spark 2.2.0 this works in sparklyr:
Looks like issue has been fixed in
It is possible to build an 'intercept only' glm in `R`, but not via `ml_logistic_regression`:
Depending on the size of the data (I was trying this on 'real' data, not `iris`), it may take a while, then throw an error, which suggests the problem is somewhere later in the processing, rather than earlier. The error message is the mystifying:
with this awesome stack trace: