Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOC] Adding user guide #61

Merged
merged 11 commits into from
Apr 12, 2023
Merged

[DOC] Adding user guide #61

merged 11 commits into from
Apr 12, 2023

Conversation

adam2392
Copy link
Collaborator

@adam2392 adam2392 commented Apr 6, 2023

Changes proposed in this pull request:

  • adds a preliminary user-guide for different type of tree models
  • The goal is to have a similar user-guide to scikit-learn and have it be an "extension" of what already exists in scikit-learn

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

Signed-off-by: Adam Li <adam2392@gmail.com>
@codecov
Copy link

codecov bot commented Apr 7, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.30 🎉

Comparison is base (ac52350) 91.39% compared to head (a94c026) 91.70%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #61      +/-   ##
==========================================
+ Coverage   91.39%   91.70%   +0.30%     
==========================================
  Files           9        9              
  Lines         779      808      +29     
==========================================
+ Hits          712      741      +29     
  Misses         67       67              
Impacted Files Coverage Δ
sktree/__init__.py 77.27% <100.00%> (ø)
sktree/tests/test_supervised_forest.py 99.30% <100.00%> (+0.17%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392 adam2392 requested review from sampan501 and jovo April 8, 2023 21:12
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Collaborator Author

There is a bug in

cdef void sample_proj_mat(
self,
vector[vector[DTYPE_t]]& proj_mat_weights,
vector[vector[SIZE_t]]& proj_mat_indices
) noexcept nogil:
"""Sample projection matrix using a contiguous patch.
Randomly sample patches with weight of 1.
"""
cdef SIZE_t max_features = self.max_features
cdef UINT32_t* random_state = &self.rand_r_state
cdef int feat_i, proj_i
# weights are default to 1
cdef DTYPE_t weight = 1.
cdef SIZE_t data_width = self.data_width
cdef SIZE_t data_height = self.data_height
cdef SIZE_t min_patch_height = self.min_patch_height
cdef SIZE_t max_patch_height = self.max_patch_height
cdef SIZE_t min_patch_width = self.min_patch_width
cdef SIZE_t max_patch_width = self.max_patch_width
# define parameters for the random patch
cdef SIZE_t delta_width
cdef SIZE_t delta_height
cdef SIZE_t patch_height
cdef SIZE_t patch_width
# define parameters for vectorized points in the original data shape
# and top-left seed
cdef SIZE_t vectorized_point
cdef SIZE_t top_left_seed
cdef SIZE_t patch_end_seed
for proj_i in range(0, max_features):
# compute random patch width and height
# Note: By constraining max patch height/width to be at least the min patch
# height/width this ensures that the minimum value of patch_height and
# patch_width is 1
patch_height = rand_int(min_patch_height, max_patch_height + 1,
random_state)
patch_width = rand_int(min_patch_width, max_patch_width + 1, random_state)
# compute the difference between the image dimensions and the current random
# patch dimensions for sampling
delta_width = data_width - patch_width + 1
delta_height = data_height - patch_height + 1
# now get the top-left seed that is used to then determine the top-left
# position in patch
top_left_seed = rand_int(0, delta_width * delta_height, random_state)
# Get the end-point of the patch
# Note: The end-point of the dataset might be less than the patch.
# This occurs if we sample a seed point at the edge of the "image".
# Therefore, we take the minimum between the end-point, or the last
# index in the vectorized image.
patch_end_seed = min(
top_left_seed + delta_width * delta_height,
self.n_features
)
for feat_i in range(top_left_seed, patch_end_seed):
# now compute top-left point
vectorized_point = (feat_i % delta_width) + \
(data_width * <SIZE_t>floor(feat_i / delta_width))
# store the non-zero indices and non-zero weights of the data
proj_mat_indices[proj_i].push_back(vectorized_point)
proj_mat_weights[proj_i].push_back(weight)
that results in the issue you saw @sampan501

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@sampan501 sampan501 self-requested a review April 12, 2023 15:47
@adam2392 adam2392 merged commit c6aeca7 into main Apr 12, 2023
22 checks passed
@adam2392 adam2392 deleted the userguide branch April 12, 2023 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants