Hi! Thanks so much for this project, it is very cool!
In the util code for kmodes, here, there is a numpy array initialized with np.zeros and then immediately converted to the int dtype. However, the default dtype for np.zeros is float64, which uses 8 bytes per element and is far more than these encoded values need.
I'm wondering if it would be possible to change the line from its current form:
Xenc = np.zeros(X.shape).astype('int')
to something like
Xenc = np.zeros(X.shape, dtype='int32')
This would set the dtype from the start and avoid allocating a float64 intermediate when building the new array; int32 takes up half the space of float64 but should still have plenty of capacity for the values being stored here.
If this is possible, my team would be willing to help get this incorporated -- this fix would help us avoid numpy memory errors we're getting when trying to allocate that array!
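For anyone curious about the difference, here is a minimal sketch of the two allocation patterns (the shape is made up for illustration; in the actual util code the shape comes from the input feature matrix X):

import numpy as np

shape = (1_000_000, 10)  # illustrative only

# Current pattern: allocates a float64 array (8 bytes/element) first,
# then copies it into a separate integer array, so both exist at once.
Xenc_old = np.zeros(shape).astype('int')

# Proposed pattern: allocate the int32 array (4 bytes/element) directly,
# with no float64 intermediate and no extra copy.
Xenc_new = np.zeros(shape, dtype='int32')

print(Xenc_old.nbytes, Xenc_new.nbytes)  # platform-default int vs. int32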
That sounds like a good improvement, @rggelles. Please submit a PR and I will review ASAP.
For reference, elsewhere I use uint16 for the labels (more than 65k labels seems unreasonable). Since these are feature matrices, uint32 seems like a good choice (note to use unsigned here).
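Concretely, the suggestion above would amount to something like the following (variable names taken from the snippet earlier in the thread; the exact line in the util code is assumed):

Xenc = np.zeros(X.shape, dtype='uint32')

Since the encoded feature values are non-negative category indices, an unsigned dtype doubles the representable range for the same 4 bytes per element.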