PERF Support converting 32-bit matrices directly to liblinear format … #14296
Conversation
Nice! Could you please double check that this code path has test coverage?
T->value = *x;
T->index = j;
++ T;
if (double_precision) {
Is the optimiser likely to compile this out of the loop?
Yes, but even if it doesn't, the CPU's branch predictor will reduce the cost of the if statement to zero.
Otherwise lgtm
Thanks @alexhenrie, I have not reviewed it in detail, but the results look nice!
Could you please double check that this code path has test coverage?
Yes, for such refactorings I think we need to be sure that this doesn't change the obtained coefficients and predictions. There are existing tests that check coef_ equality between solvers, but we need to ensure:
- that the absolute tolerance used is tight enough to detect possible issues
- that those tests indeed run on sparse and also 32-bit input. Maybe adding test_dtype_match could be useful, or adding/changing the pytest parametrization of existing tests.
liblinear already treats the features as 32-bit. Our current code was casting to 64-bit in order to put the data into our helper containers, then casting it back to 32-bit for liblinear's structs. So unless the C cast and the numpy cast behave differently, this should give identical results.
What file do you want the new tests in?
Ah, I see now.
The new tests exposed a bug in the 'saga' solver where using sparse input instead of dense input changes the output significantly. I have not figured out how to fix this bug.
Sorry, my mistake: liblinear processes in float64 (double). But we are still doing the same cast as before, so I don't think we've changed the fit here, have we?
Sometimes the fit will be different, but not erroneous. I assume you mean that the sparse handling is wrong?
@jnothman Oh, I see what you mean now. If you have 32-bit inputs, the output before and after this patch is necessarily the same because the only difference is when the conversion to 64-bit happens. I think there's a bug because according to the new tests, the output of the 'saga' solver varies much more than I would expect. I worked around the problem by increasing the tolerance when using 'saga'. Can you think of a better solution?
Strangely, when I set |
Aha! If the input is sparse, intercept_decay is set to 0.01 instead of 1.0. The new tests pass if I bump intercept_decay up to at least 0.045, but now some of the old tests are failing :-(
It doesn't look like there's any easy way to solve this, so I think that for now it's reasonable to make the new tests less stringent for the 'saga' solver. What do you think?
Yeah, I don't know how well saga converges on a dataset with 3 samples. Comparing with lbfgs as far as this PR is concerned should be enough (without removing existing tests)...
It seems that saga does not behave properly if X is sparse and |
@jeremiedbb Thanks for the info! I have worked around the problem in the tests I wrote. If in the future saga is fixed to work on sparse input and |
I think the fix is to raise an error. It's ongoing work in the PR I linked.
@jnothman Could you merge these commits now please? :-)
lr_32 = clone(lr_templ)
lr_32.fit(X_32, y_32)
assert lr_32.coef_.dtype == X_32.dtype
assert lr_32.coef_.dtype in [np.float32, np.float64]
Why this change? The whole point of adding 32-bit support in the other solvers was to ensure that coef_ is of the same dtype as X. We can't remove that constraint for the other solvers, at least.
As I explained in #14296 (comment), there is no reason to prevent any solver from using 32-bit input to produce 64-bit output.
lr_32_sparse = clone(lr_templ)
lr_32_sparse.fit(X_sparse_32, y_32)
assert lr_32_sparse.coef_.dtype == X_sparse_32.dtype
assert lr_32_sparse.coef_.dtype in [np.float32, np.float64]
Same as above
That's what we are trying to prevent explicitly as part of #8769, and this would remove the associated tests. The goal is to reduce memory usage and allow faster BLAS operations (for implementations that support AVX etc.) when float32 input is provided. So I think tests covering the other solvers should not be modified in this respect; if those tests don't work for liblinear, you can create a separate test.
Fine, I just changed the tests back for all solvers except liblinear. Whatever it takes to get this pull request accepted. |
liblinear uses double precision. You want to avoid a copy by directly converting float to double during the conversion to liblinear format. Did I understand correctly?
if solver == 'liblinear' and multi_class == 'multinomial':
    return
Please use pytest.skip(some informative message) instead of return.
Done.
assert_allclose(lr_32.coef_, lr_64.coef_.astype(np.float32), atol=atol)

if solver == 'saga' and fit_intercept:
    # FIXME
Maybe explain the reason for the FIXME by adding a comment saying that saga does not correctly fit the intercept on sparse data with the default tol and max_iter parameters.
Done.
*/
static struct feature_node **csr_to_sparse(double *values, int *indices,
    int *indptr, int n_samples, int n_features, int n_nonzero, double bias)
static struct feature_node **csr_to_sparse(void *x, int double_precision,
Maybe use a char* for consistency. Also, since (np.array).data gives a char*, we can be specific.
This is exactly the kind of thing void* is for: in no case do we want to read the array data as chars, because that would produce gibberish, and it would be a mistake to do so. void* prevents data access until it has been determined whether the array is an array of floats or an array of doubles.

Nonetheless, the machine code generated is the same whether x is defined as void* or char*. So if you want to give up the protection against accidentally accessing the array data as chars, it won't hurt anything to use char*. Let me know and I'll make the change.
Why did you move |
Yes, that is correct. The space wasted by having to represent the input datapoints as doubles instead of floats puts us over our server's memory limit.
I wanted all of the variables describing X to be next to each other. I'll change it back if you prefer it the other way, although it won't make the diff much more readable.
# and that the output is approximately the same no matter the input format.

if solver == 'liblinear' and multi_class == 'multinomial':
    pytest.skip('liblinear does not support multinomial classes')
pytest.skip('liblinear does not support multinomial classes')
pytest.skip('liblinear does not support multinomial logistic')
Done.
@jeremiedbb Have I answered your concerns to your satisfaction?
Sorry for the delay. You're right that |
@jeremiedbb Okay, I just changed the two |
Thanks @alexhenrie!
Thank you so much! This is an enormous help to me and my colleagues :)
This decreases the memory required for regression by about 33% on 32-bit dense matrices and by about 42% on 32-bit CSR matrices while not noticeably affecting the running time in any 32-bit or 64-bit case. Direct support for 32-bit matrices is the ideal solution for my group because we only need 32-bit precision and cutting the memory requirement by a third will get us inside our servers' memory limits.