Reduce memory usage in array initialization #165

Closed
rggelles opened this issue Oct 5, 2021 · 2 comments

Comments

@rggelles
Contributor

rggelles commented Oct 5, 2021

Hi! Thanks so much for this project; it is very cool!

In the util code for kmodes, a NumPy array is initialized with np.zeros and then immediately converted to the int dtype. However, the default dtype for np.zeros is float64, which uses 8 bytes per element, and astype then allocates a separate integer copy, so both arrays are held in memory during the conversion.
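A quick check of the two NumPy behaviors this relies on: np.zeros without a dtype gives float64, and astype returns a new array rather than converting in place. The shape is an arbitrary example, not taken from kmodes.

```python
import numpy as np

shape = (1000, 50)  # arbitrary example shape, not from kmodes

tmp = np.zeros(shape)           # no dtype given, so NumPy defaults to float64
print(tmp.dtype, tmp.itemsize)  # float64 8

# astype returns a new array (copy=True by default), so the float64 buffer
# and the integer buffer coexist until tmp is released.
Xenc = tmp.astype('int')
print(np.shares_memory(tmp, Xenc))  # False
```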

I'm wondering whether the line could be changed from its current form:

Xenc = np.zeros(X.shape).astype('int')

to something like

Xenc = np.zeros(X.shape, dtype='int32')

This would set the dtype from the start and use less memory while building the new array: int32 takes 4 bytes per element instead of the 8 used by float64, and there is no temporary float array to convert. The int32 range should still have plenty of capacity for the values stored here.
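A rough size comparison of the current and proposed lines, using an arbitrary 1000 x 50 shape for illustration. The byte counts follow from 8 bytes per float64 element and 4 per int32 element; the size of the 'int' copy depends on the platform, so for the current approach only the comparison is printed.

```python
import numpy as np

shape = (1000, 50)  # arbitrary example shape: 50,000 elements

# Current: a float64 temporary, then an integer copy; both exist at the peak.
float_tmp = np.zeros(shape)
old = float_tmp.astype('int')
old_peak = float_tmp.nbytes + old.nbytes

# Proposed: a single int32 allocation, no temporary.
new = np.zeros(shape, dtype='int32')

print(float_tmp.nbytes)        # 400000 (8 bytes per element)
print(new.nbytes)              # 200000 (4 bytes per element)
print(old_peak > new.nbytes)   # True
print(np.iinfo(np.int32).max)  # 2147483647, ample for category codes
```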

If this is possible, my team would be willing to help get it incorporated. This fix would help us avoid the NumPy memory errors we're getting when trying to allocate that array!

@nicodv
Owner

nicodv commented Oct 5, 2021

That sounds like a good improvement, @rggelles. Please submit a PR and I will review ASAP.

For reference, elsewhere I use uint16 for the labels (more than 65k labels seems unreasonable). Since these are feature matrices, uint32 seems like a good choice (note: use an unsigned type here).
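The ranges behind this choice can be checked with np.iinfo. The encoder below is a hypothetical sketch, not the kmodes implementation: encode_columns_uint32 is an illustrative name, and it assumes each column's categories are simply mapped to codes 0..k-1 via np.unique(..., return_inverse=True), written straight into a uint32 array so no float64 temporary is created.

```python
import numpy as np

print(np.iinfo(np.uint16).max)  # 65535, the ceiling for uint16 labels
print(np.iinfo(np.uint32).max)  # 4294967295, the ceiling for uint32 codes

def encode_columns_uint32(X):
    """Hypothetical per-column categorical encoder (not the kmodes API).

    Each column's distinct values are mapped to codes 0..k-1. These codes
    are never negative, so an unsigned dtype fits.
    """
    X = np.asarray(X)
    Xenc = np.zeros(X.shape, dtype='uint32')  # final dtype from the start
    for j in range(X.shape[1]):
        # inverse[i] is the index of X[i, j] among the sorted unique values
        _, inverse = np.unique(X[:, j], return_inverse=True)
        Xenc[:, j] = inverse
    return Xenc

X = np.array([['a', 'x'], ['b', 'y'], ['a', 'x']])
Xenc = encode_columns_uint32(X)
print(Xenc.dtype)     # uint32
print(Xenc.tolist())  # [[0, 0], [1, 1], [0, 0]]
```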

@nicodv
Owner

nicodv commented Oct 8, 2021

Fixed with #166

Thanks for the contribution, @rggelles!

This has been included in the new 0.11.1 release: https://github.com/nicodv/kmodes/releases/tag/0.11.1

nicodv closed this as completed Oct 8, 2021