[WIP] Make neighbor tree module supports fused types #7153

Open
wants to merge 3 commits into
from

Conversation

Projects
None yet
2 participants
@yenchenlin
Contributor

yenchenlin commented Aug 5, 2016

Ball Tree and KD Tree currently only support np.float64, which is a waste of memory as same quality results should be obtainable with np.float32.

Note: self.new_data and self.new_data_arr are just for development usage, I try to first use them to repeat what self.data and self.data_arr do. With this manner, I can still build scikit-learn successfully and I'll change self.data and self.data_arr in the end.

+ cdef float f = 0
+
+ if tree.is_float:
+ return _allocate_data(tree, n_nodes, n_features, f)

This comment has been minimized.

@yenchenlin

yenchenlin Aug 5, 2016

Contributor

Why I'm passing local variable d or f to the underlying function instead of self.data_arr is because it's a cdef function.

self.data_arr only gets its value assigned during __init__(), which is after the compilation of the cdef function. Therefore, I thought current implementation is a reasonable workaround.

ping @jnothman

@yenchenlin

yenchenlin Aug 5, 2016

Contributor

Why I'm passing local variable d or f to the underlying function instead of self.data_arr is because it's a cdef function.

self.data_arr only gets its value assigned during __init__(), which is after the compilation of the cdef function. Therefore, I thought current implementation is a reasonable workaround.

ping @jnothman

This comment has been minimized.

@jnothman

jnothman Aug 6, 2016

Member

Hmm... Okay. You should be able to cast a constant rather than have two local variables.

@jnothman

jnothman Aug 6, 2016

Member

Hmm... Okay. You should be able to cast a constant rather than have two local variables.

This comment has been minimized.

@jnothman

jnothman Aug 6, 2016

Member

Why can't allocate_data receive a floating type_indicator?

@jnothman

jnothman Aug 6, 2016

Member

Why can't allocate_data receive a floating type_indicator?

@@ -150,8 +150,9 @@ import numpy as np
import warnings
from ..utils import check_array
-from typedefs cimport DTYPE_t, ITYPE_t, DITYPE_t
+from typedefs cimport DTYPE_t, ITYPE_t, DITYPE_t, floating_ITYPE_t

This comment has been minimized.

@jnothman

jnothman Aug 6, 2016

Member

why floating_ITYPE_t? I is for integers...

@jnothman

jnothman Aug 6, 2016

Member

why floating_ITYPE_t? I is for integers...

This comment has been minimized.

@yenchenlin

yenchenlin Aug 6, 2016

Contributor

Originally it's DITYPE_t. (also include integers)
What I did here is to add np.float32_t and define a new type.

@yenchenlin

yenchenlin Aug 6, 2016

Contributor

Originally it's DITYPE_t. (also include integers)
What I did here is to add np.float32_t and define a new type.

@@ -1054,6 +1071,16 @@ cdef class BinaryTree:
self.data_arr = np.asarray(data, dtype=DTYPE, order='C')
self.data = get_memview_DTYPE_2D(self.data_arr)
+ self.new_data_arr = check_array(data, order='C', dtype=[np.float64])#, np.float32])

This comment has been minimized.

@jnothman

jnothman Aug 6, 2016

Member

Why do we need this as well as data_arr? And why is the option of float32 commented out?

@jnothman

jnothman Aug 6, 2016

Member

Why do we need this as well as data_arr? And why is the option of float32 commented out?

This comment has been minimized.

@yenchenlin

yenchenlin Aug 6, 2016

Contributor

I try to first use self.new_data_arr to repeat all what self.data does and then finally replace self.data with self.new_data_arr, so that in the development process I can still build successfully.

But I found out this is extremely confused for the reviewer. I will modify it. Sorry!

@yenchenlin

yenchenlin Aug 6, 2016

Contributor

I try to first use self.new_data_arr to repeat all what self.data does and then finally replace self.data with self.new_data_arr, so that in the development process I can still build successfully.

But I found out this is extremely confused for the reviewer. I will modify it. Sorry!

This comment has been minimized.

@jnothman

jnothman Aug 6, 2016

Member

No that's okay. If it helps, do it. Just leave a note!

@jnothman

jnothman Aug 6, 2016

Member

No that's okay. If it helps, do it. Just leave a note!

This comment has been minimized.

@yenchenlin

yenchenlin Aug 6, 2016

Contributor

Added! thanks!

@yenchenlin

yenchenlin Aug 6, 2016

Contributor

Added! thanks!

cdef np.ndarray idx_array_arr
cdef np.ndarray node_data_arr
cdef np.ndarray node_bounds_arr
cdef readonly DTYPE_t[:, ::1] data
+ cdef void *new_data
+ cdef int n_samples

This comment has been minimized.

@jnothman

jnothman Aug 6, 2016

Member

These should provide little benefit over self.data_arr.shape[x], I think, as that is defined in the pxd as a field of ndarray.

@jnothman

jnothman Aug 6, 2016

Member

These should provide little benefit over self.data_arr.shape[x], I think, as that is defined in the pxd as a field of ndarray.

This comment has been minimized.

@yenchenlin

yenchenlin Aug 6, 2016

Contributor

Oh yeah, I forget, thanks!

@yenchenlin

yenchenlin Aug 6, 2016

Contributor

Oh yeah, I forget, thanks!

+ n_features, n_points)
+
+ if self.is_float:
+ find_node_split_dim(<float*>self.new_data, idx_array,

This comment has been minimized.

@jnothman

jnothman Aug 6, 2016

Member

You're not capturing the return value here in i_max?

@jnothman

jnothman Aug 6, 2016

Member

You're not capturing the return value here in i_max?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment