Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Make neighbor tree module supports fused types #7153

Open
wants to merge 3 commits into
base: master
from

Conversation

@yenchenlin
Copy link
Contributor

yenchenlin commented Aug 5, 2016

Ball Tree and KD Tree currently only support np.float64, which is a waste of memory as same quality results should be obtainable with np.float32.

Note: self.new_data and self.new_data_arr are just for development usage, I try to first use them to repeat what self.data and self.data_arr do. With this manner, I can still build scikit-learn successfully and I'll change self.data and self.data_arr in the end.

cdef float f = 0

if tree.is_float:
return _allocate_data(tree, n_nodes, n_features, f)

This comment has been minimized.

Copy link
@yenchenlin

yenchenlin Aug 5, 2016

Author Contributor

Why I'm passing local variable d or f to the underlying function instead of self.data_arr is because it's a cdef function.

self.data_arr only gets its value assigned during __init__(), which is after the compilation of the cdef function. Therefore, I thought current implementation is a reasonable workaround.

ping @jnothman

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 6, 2016

Member

Hmm... Okay. You should be able to cast a constant rather than have two local variables.

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 6, 2016

Member

Why can't allocate_data receive a floating type_indicator?

@yenchenlin yenchenlin force-pushed the yenchenlin:neighbor-tree-fused-types branch from ce5ba41 to 8fba702 Aug 6, 2016
@yenchenlin yenchenlin force-pushed the yenchenlin:neighbor-tree-fused-types branch from 8fba702 to cefa237 Aug 6, 2016
@@ -150,8 +150,9 @@ import numpy as np
import warnings
from ..utils import check_array

from typedefs cimport DTYPE_t, ITYPE_t, DITYPE_t
from typedefs cimport DTYPE_t, ITYPE_t, DITYPE_t, floating_ITYPE_t

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 6, 2016

Member

why floating_ITYPE_t? I is for integers...

This comment has been minimized.

Copy link
@yenchenlin

yenchenlin Aug 6, 2016

Author Contributor

Originally it's DITYPE_t. (also include integers)
What I did here is to add np.float32_t and define a new type.

@@ -1054,6 +1071,16 @@ cdef class BinaryTree:
self.data_arr = np.asarray(data, dtype=DTYPE, order='C')
self.data = get_memview_DTYPE_2D(self.data_arr)

self.new_data_arr = check_array(data, order='C', dtype=[np.float64])#, np.float32])

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 6, 2016

Member

Why do we need this as well as data_arr? And why is the option of float32 commented out?

This comment has been minimized.

Copy link
@yenchenlin

yenchenlin Aug 6, 2016

Author Contributor

I try to first use self.new_data_arr to repeat all what self.data does and then finally replace self.data with self.new_data_arr, so that in the development process I can still build successfully.

But I found out this is extremely confused for the reviewer. I will modify it. Sorry!

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 6, 2016

Member

No that's okay. If it helps, do it. Just leave a note!

This comment has been minimized.

Copy link
@yenchenlin

yenchenlin Aug 6, 2016

Author Contributor

Added! thanks!

cdef np.ndarray idx_array_arr
cdef np.ndarray node_data_arr
cdef np.ndarray node_bounds_arr

cdef readonly DTYPE_t[:, ::1] data
cdef void *new_data
cdef int n_samples

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 6, 2016

Member

These should provide little benefit over self.data_arr.shape[x], I think, as that is defined in the pxd as a field of ndarray.

This comment has been minimized.

Copy link
@yenchenlin

yenchenlin Aug 6, 2016

Author Contributor

Oh yeah, I forget, thanks!

n_features, n_points)

if self.is_float:
find_node_split_dim(<float*>self.new_data, idx_array,

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 6, 2016

Member

You're not capturing the return value here in i_max?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.