[FEA] Implement nan_euclidean distance metric #4783

Open
ChrisJar opened this issue Jun 23, 2022 · 2 comments
Labels
? - Needs Triage (Need team to review and classify), feature request (New feature or request), inactive-30d, inactive-90d

Comments


ChrisJar commented Jun 23, 2022

Is your feature request related to a problem? Please describe.
I wish I could use cuML to calculate Euclidean distance on data with missing values in the same way scikit-learn can with nan_euclidean:

Compute the euclidean distance between each pair of samples in X and Y,
where Y=X is assumed if Y=None. When calculating the distance between a
pair of samples, this formulation ignores feature coordinates with a
missing value in either sample and scales up the weight of the remaining
coordinates
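
To make the quoted description concrete, here is a minimal pure-Python sketch of the distance for a single pair of samples. It assumes the weighting documented by scikit-learn (total number of coordinates divided by the number of coordinates present in both samples) and returns NaN when no coordinate is present in both. The helper name `nan_euclidean` is hypothetical and is not part of cuML or scikit-learn.

```python
import math


def nan_euclidean(x, y):
    """Euclidean distance that skips coordinates missing (NaN) in either sample.

    Hypothetical helper for illustration only.
    """
    if len(x) != len(y):
        raise ValueError("samples must have the same number of coordinates")
    # Keep only the coordinate pairs where both values are present.
    present = [(a, b) for a, b in zip(x, y)
               if not (math.isnan(a) or math.isnan(b))]
    if not present:
        # No shared coordinates: the distance is undefined.
        return math.nan
    squared = sum((a - b) ** 2 for a, b in present)
    # Scale up the remaining coordinates to compensate for the skipped ones.
    weight = len(x) / len(present)
    return math.sqrt(weight * squared)


nan = math.nan
# Two of the four coordinates are present in both samples, so weight = 4 / 2.
d = nan_euclidean([3.0, nan, nan, 6.0], [1.0, nan, 4.0, 5.0])
```

With these inputs the present coordinates contribute (3 - 1)^2 + (6 - 5)^2 = 5, so the result is sqrt(2 * 5) = sqrt(10).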

Describe the solution you'd like
I'd like a function that calculates euclidean distance on data with missing values like the scikit-learn function: nan_euclidean_distances. I'd also like to be able to use this as a metric for pairwise_distances.
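
For reference, the two scikit-learn entry points this request mirrors can be exercised as in the sketch below. The toy array is illustrative only; the calls shown are scikit-learn's CPU implementations, not cuML's.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import nan_euclidean_distances

# Toy data with one missing value (illustrative only).
X = np.array([[0.0, 1.0],
              [1.0, np.nan]])

# Dedicated function; Y defaults to X when omitted.
direct = nan_euclidean_distances(X)

# The same metric selected by name through the generic pairwise API.
via_metric = pairwise_distances(X, metric="nan_euclidean")
```

Both calls should produce the same 2 x 2 matrix, which is the parity a cuML version would be expected to match.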

ChrisJar added the "? - Needs Triage" and "feature request" labels Jun 23, 2022
github-actions bot added this to Needs prioritizing in Feature Planning Jun 23, 2022
SreekiranprasadV pushed a commit to SreekiranprasadV/cuml that referenced this issue Jul 1, 2022
SreekiranprasadV pushed a commit to SreekiranprasadV/cuml that referenced this issue Jul 18, 2022
@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this issue Aug 30, 2022
Added nan_euclidean distance metric to pairwise_distances to calculate euclidean distance on data with missing values.

- Added test cases for nan_euclidean_distance functions

Time taken to calculate:
# Data Points | Sklearn | Cuml
10000         | 402 us  | 2.54 ms
100k          | 23 ms   | 3.8 ms
1M            | 760 ms  | 16 ms
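
As a reading aid, the short sketch below turns the timings reported above into ratios. It uses only the numbers from the table; the variable names are illustrative, and nothing here re-runs the benchmark.

```python
# Reported timings converted to seconds: (scikit-learn, cuML).
timings = {
    "10000": (402e-6, 2.54e-3),
    "100k": (23e-3, 3.8e-3),
    "1M": (760e-3, 16e-3),
}

# Ratio > 1 means cuML was faster; ratio < 1 means scikit-learn was faster.
speedups = {n: sk / cu for n, (sk, cu) in timings.items()}

for n, ratio in speedups.items():
    print(f"{n:>6} points: scikit-learn time / cuML time = {ratio:.2f}")
```

On these figures cuML is slower at 10000 points (ratio below 1) and faster at 100k and 1M points, where the ratio is roughly 6 and 47.5 respectively.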

GPU specifications:
- Tesla T4 15109MiB

CPU specifications:
- 11th Gen Intel i7, 8 cores, 16 logical processors, 32 GB memory
- scikit-learn n_jobs left at its default

Authors:
  - https://github.com/Sreekiran096

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #4797
@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

jakirkham pushed a commit to jakirkham/cuml that referenced this issue Feb 27, 2023
…es (rapidsai#4797)
