Hi! Thanks so much for this project, it is very cool!
In the util code for kmodes, here, there is a numpy array initialized with np.zeros and then immediately converted to the int dtype. However, the default dtype for np.zeros is float64, which uses 8 bytes per element and is far more than these encoded values need.
I'm wondering if it would be possible to change the line from its current form:
Xenc = np.zeros(X.shape).astype('int')
to something like
Xenc = np.zeros(X.shape, dtype='int32')
This would set the dtype from the start and avoid allocating a float64 intermediate when building the new array; int32 takes up half the space of float64 but should still have plenty of capacity for the values being stored here.
If this is possible, my team would be willing to help get this incorporated -- this fix would help us avoid numpy memory errors we're getting when trying to allocate that array!
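For anyone curious about the difference, here is a minimal sketch of the two allocation patterns (the shape is made up for illustration; in the actual util code the shape comes from the input feature matrix X):

import numpy as np

shape = (1_000_000, 10)  # illustrative only

# Current pattern: allocates a float64 array (8 bytes/element) first,
# then copies it into a separate integer array, so both exist at once.
Xenc_old = np.zeros(shape).astype('int')

# Proposed pattern: allocate the int32 array (4 bytes/element) directly,
# with no float64 intermediate and no extra copy.
Xenc_new = np.zeros(shape, dtype='int32')

print(Xenc_old.nbytes, Xenc_new.nbytes)  # platform-default int vs. int32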
That sounds like a good improvement, @rggelles. Please submit a PR and I will review ASAP.
For reference, elsewhere I use uint16 for the labels (more than 65k labels seems unreasonable). Since these are feature matrices, uint32 seems like a good choice (note to use unsigned here).
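Concretely, the suggestion above would amount to something like the following (variable names taken from the snippet earlier in the thread; the exact line in the util code is assumed):

Xenc = np.zeros(X.shape, dtype='uint32')

Since the encoded feature values are non-negative category indices, an unsigned dtype doubles the representable range for the same 4 bytes per element.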