-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: spatial: Rewriting distance metrics to _distance_pybind
#14611
base: main
Are you sure you want to change the base?
Conversation
b0a5a91
to
4d73301
Compare
Could you please fill up a bit more the description of your PRs? 😅 We try to make it clear enough so that we don't need to go through the code to know what is the intent (it takes quite some time otherwise to do this for all PRs). Thanks a lot 🙏 |
For this PR, there is nothing more info to provide other than the title at the moment. Once ready and things get clear enough, I will provide the benchmark code and results in the description. Thanks. |
From the current code it seems that this just addresses Hamming, and not the whole issue which is unclear from your description. If indeed you want to add all, then it should read |
First I added, Correlation distance and it seems like adding it would require a bunch of changes in the framework itself (or may be not). So, I changed to adding Hamming. I would be pushing more commits to add Cosine and Correlation. This PR may not be able to close the referenced issue at once (to avoid large code diff making it hard to review). In order to avoid confusion, I kept it in WIP mode. Once it would be ready for review I will explain my changes in the description, add benchmark code and results. |
In the future, please still add a description. In this case you can just list what you intend to do with a list for instance. It's not a problem if you don't tick everything and modify it latter. It's just clearer. If your work is very experimental though, I would argue that you should not open yet a PR and work on your fork first. This simply helps our CI consumption. |
Sure. I didn't think that adding Edit - Updated the description for clarity. |
bbe6a7f
to
5ad3c2a
Compare
4daae57
to
99e31eb
Compare
f900293
to
e5e5a5e
Compare
I am unable to run benchmarks with following command. Command
Error · No `environment_type` specified in asv.conf.json. This will be required in the future.
· Creating environments....................
·· Error running /home/czgdp1807ssd/anaconda3/bin/conda env create -f /tmp/tmpnufhm_31.yml -p /home/czgdp1807ssd/scipy_project/scipy/benchmarks/env/9e6b5bad4b437052513fdb7dfc5533cc --force (exit status 1)
STDOUT -------->
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working...
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
Solving environment: ...working...
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
STDERR -------->
Examining @/linux-64::__cuda==11.2=0: 23%|██▎ | 3/13 [00:00<00:00, 23.68iExamining @/linux-64::__cuda==11.2=0: 31%|███ | 4/13 [00:00<00:00, 31.55iExamining @/linux-64::__glibc==2.27=0: 31%|███ | 4/13 [00:00<00:00, 31.55Examining @/linux-64::__unix==0=0: 38%|███▊ | 5/13 [00:00<00:00, 31.55it/sExamining pybind11: 46%|████▌ | 6/13 [00:00<00:00, 31.55it/s] Examining @/linux-64::__archspec==1=x86_64: 54%|█████▍ | 7/13 [00:00<00:00, Examining @/linux-64::__archspec==1=x86_64: 62%|██████▏ | 8/13 [00:00<00:00, Examining pytest: 62%|██████▏ | 8/13 [00:00<00:00, 33.21it/s] Examining conflict for pythran cython pybind11 pytest numpy python pip wheel: Examining conflict for numpy pythran: 8%|▊ | 1/13 [00:01<00:13, 1.12sExamining conflict for numpy pythran: 15%|█▌ | 2/13 [00:01<00:06, 1.79iExamining conflict for pythran cython pytest python pip wheel: 15%|█▌ | Examining conflict for numpy pythran pytest: 23%|██▎ | 3/13 [00:01<00:05,Examining conflict for numpy pythran pytest: 31%|███ | 4/13 [00:01<00:02,Examining conflict for pythran cython pybind11 numpy python: 31%|███ | 4/Examining conflict for pythran cython pybind11 numpy python: 38%|███▊ | 5/Examining conflict for pythran pybind11 pytest numpy python pip wheel: 38%|███▊Examining conflict for pythran pybind11 pytest numpy python pip wheel: 46%|████Examining conflict for pythran cython __glibc pybind11 numpy python: 46%|████▌ Examining conflict for pythran cython __glibc pybind11 numpy python: 54%|█████▍Examining conflict for pytest pip cython wheel: 54%|█████▍ | 7/13 [00:05<00:Examining conflict for pytest pip cython wheel: 62%|██████▏ | 8/13 [00:05<00:Examining conflict for pybind11 pytest numpy python pip wheel: 62%|██████▏ | Examining conflict for pybind11 pytest numpy python pip wheel: 69%|██████▉ | Examining conflict for python pip wheel: 69%|██████▉ | 9/13 [00:06<00:02, 1.Examining conflict for python pip wheel: 77%|███████▋ | 10/13 [00:06<00:02, 1 Examining @/linux-64::__cuda==11.2=0: 23%|██▎ | 3/13 [00:00<00:00, 22.65iExamining @/linux-64::__cuda==11.2=0: 31%|███ | 4/13 [00:00<00:00, 30.18iExamining @/linux-64::__glibc==2.27=0: 31%|███ | 4/13 [00:00<00:00, 30.18Examining @/linux-64::__unix==0=0: 38%|███▊ | 5/13 [00:00<00:00, 30.18it/sExamining pybind11: 46%|████▌ | 6/13 [00:00<00:00, 30.18it/s] Examining @/linux-64::__archspec==1=x86_64: 54%|█████▍ | 7/13 [00:00<00:00, Examining @/linux-64::__archspec==1=x86_64: 62%|██████▏ | 8/13 [00:00<00:00, Examining pytest: 62%|██████▏ | 8/13 [00:00<00:00, 31.69it/s] Examining conflict for pythran cython pybind11 pytest numpy python pip wheel: Examining conflict for numpy pythran: 8%|▊ | 1/13 [00:01<00:14, 1.24sExamining conflict for numpy pythran: 15%|█▌ | 2/13 [00:01<00:06, 1.61iExamining conflict for pythran cython pytest python pip wheel: 15%|█▌ | Examining conflict for numpy pythran pytest: 23%|██▎ | 3/13 [00:01<00:06,Examining conflict for numpy pythran pytest: 31%|███ | 4/13 [00:01<00:03,Examining conflict for pythran cython pybind11 numpy python: 31%|███ | 4/Examining conflict for pythran cython pybind11 numpy python: 38%|███▊ | 5/Examining conflict for pythran pybind11 pytest numpy python pip wheel: 38%|███▊Examining conflict for pythran pybind11 pytest numpy python pip wheel: 46%|████Examining conflict for pythran cython __glibc pybind11 numpy python: 46%|████▌ Examining conflict for pythran cython __glibc pybind11 numpy python: 54%|█████▍Examining conflict for pytest pip cython wheel: 54%|█████▍ | 7/13 [00:06<00:Examining conflict for pytest pip cython wheel: 62%|██████▏ | 8/13 [00:06<00:Examining conflict for pybind11 pytest numpy python pip wheel: 62%|██████▏ | Examining conflict for pybind11 pytest numpy python pip wheel: 69%|██████▉ | Examining conflict for python pip wheel: 69%|██████▉ | 9/13 [00:07<00:02, 1.Examining conflict for python pip wheel: 77%|███████▋ | 10/13 [00:07<00:02, 1
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Output in format: Requested package -> Available versions
Package ca-certificates conflicts for:
wheel -> python -> ca-certificates
pip -> python[version='>=2.7,<2.8.0a0'] -> ca-certificates
python=3.10 -> openssl[version='>=1.1.1l,<1.1.2a'] -> ca-certificates
numpy -> python[version='>=2.7,<2.8.0a0'] -> ca-certificates
pybind11 -> python[version='>=2.7,<2.8.0a0'] -> ca-certificates
pytest -> python[version='>=2.7,<2.8.0a0'] -> ca-certificates
Package python conflicts for:
cython=0.29.21 -> python[version='>=3.6,<3.7.0a0|>=3.8,<3.9.0a0|>=3.9,<3.10.0a0|>=3.7,<3.8.0a0']
pythran -> beniget=0.3 -> python[version='>=2.7,<2.8.0a0|>=3.5|>=3.5,<3.6.0a0|>=3.7|>=3.6']
pytest -> attrs[version='>=19.2.0'] -> python[version='>=2.7|>=3.5|>=3.6|>=3|>=3.4']
pybind11 -> python[version='>=2.7,<2.8.0a0|>=3.10,<3.11.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0|>=3.9,<3.10.0a0|>=3.6,<3.7.0a0|>=3.5,<3.6.0a0']
pytest -> python[version='>=2.7,<2.8.0a0|>=3.10,<3.11.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0|>=3.9,<3.10.0a0|>=3.6,<3.7.0a0|>=3.5,<3.6.0a0']
numpy -> python[version='>=2.7,<2.8.0a0|>=3.10,<3.11.0a0|>=3.8,<3.9.0a0|>=3.7,<3.8.0a0|>=3.9,<3.10.0a0|>=3.6,<3.7.0a0|>=3.5,<3.6.0a0']
pip -> wheel -> python
wheel -> setuptools -> python[version='>=3.10,<3.11.0a0|>=3.9,<3.10.0a0']
cython=0.29.21 -> setuptools -> python[version='>=2.7,<2.8.0a0|>=3.10,<3.11.0a0|>=3.5,<3.6.0a0']
wheel -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.8,<3.9.0a0|>=3.7,<3.8.0a0|>=3.5,<3.6.0a0']
pip -> python[version='>=2.7,<2.8.0a0|>=3.10,<3.11.0a0|>=3.9,<3.10.0a0|>=3.8,<3.9.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.5,<3.6.0a0']
pythran -> python[version='>=3.10,<3.11.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0|>=3.9,<3.10.0a0']
Package six conflicts for:
numpy -> mkl-service[version='>=2.3.0,<3.0a0'] -> six
pytest -> six[version='>=1.10.0']
pytest -> more-itertools[version='>=4.0.0'] -> six[version='>=1.0.0,<2.0.0']
pythran -> six
Package numpy conflicts for:
pythran -> numpy[version='>=1.16.6,<2.0a0|>=1.21.2,<2.0a0']
pythran -> networkx[version='>=2'] -> numpy[version='>=1.19']
numpy
Package bzip2 conflicts for:
numpy -> python[version='>=3.10,<3.11.0a0'] -> bzip2[version='>=1.0.8,<2.0a0']
python=3.10 -> bzip2[version='>=1.0.8,<2.0a0']
pip -> python[version='>=3.10,<3.11.0a0'] -> bzip2[version='>=1.0.8,<2.0a0']
pybind11 -> python[version='>=3.10,<3.11.0a0'] -> bzip2[version='>=1.0.8,<2.0a0']
wheel -> python -> bzip2[version='>=1.0.8,<2.0a0']
pythran -> python[version='>=3.10,<3.11.0a0'] -> bzip2[version='>=1.0.8,<2.0a0']
pytest -> python[version='>=3.10,<3.11.0a0'] -> bzip2[version='>=1.0.8,<2.0a0']
Package setuptools conflicts for:
cython=0.29.21 -> setuptools
wheel -> setuptools
pip -> setuptools
pythran -> networkx[version='>=2'] -> setuptools
pytest -> setuptools[version='>=40.0']
python=3.10 -> pip -> setuptools
Package certifi conflicts for:
cython=0.29.21 -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']
pip -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']
wheel -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']
pytest -> setuptools[version='>=40.0'] -> certifi[version='>=2016.09|>=2016.9.26']
Package _openmp_mutex conflicts for:
python=3.10 -> libgcc-ng[version='>=7.5.0'] -> _openmp_mutex[version='>=4.5']
cython=0.29.21 -> libgcc-ng[version='>=7.3.0'] -> _openmp_mutex[version='>=4.5']
numpy -> libgcc-ng[version='>=7.5.0'] -> _openmp_mutex[version='>=4.5']
pybind11 -> libgcc-ng[version='>=7.5.0'] -> _openmp_mutex[version='>=4.5']
pythran -> libgcc-ng[version='>=7.5.0'] -> _openmp_mutex[version='>=4.5']
Package wheel conflicts for:
pip -> wheel
wheelThe following specifications were found to be incompatible with your system:
- feature:/linux-64::__glibc==2.27=0
- feature:|@/linux-64::__glibc==2.27=0
- cython=0.29.21 -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
- numpy -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
- pybind11 -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
- pythran -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
Your installed version is: 2.27
·· Failure creating environment for conda-py3.10-Cython0.29.21-numpy-pybind11-pytest-pythran
Traceback (most recent call last):
File "/home/czgdp1807ssd/anaconda3/envs/scipy-dev/bin/asv", line 10, in <module>
sys.exit(main())
File "/home/czgdp1807ssd/anaconda3/envs/scipy-dev/lib/python3.10/site-packages/asv/main.py", line 38, in main
result = args.func(args)
File "/home/czgdp1807ssd/anaconda3/envs/scipy-dev/lib/python3.10/site-packages/asv/commands/__init__.py", line 49, in run_from_args
return cls.run_from_conf_args(conf, args)
File "/home/czgdp1807ssd/anaconda3/envs/scipy-dev/lib/python3.10/site-packages/asv/commands/continuous.py", line 75, in run_from_conf_args
return cls.run(
File "/home/czgdp1807ssd/anaconda3/envs/scipy-dev/lib/python3.10/site-packages/asv/commands/continuous.py", line 114, in run
result = Run.run(
File "/home/czgdp1807ssd/anaconda3/envs/scipy-dev/lib/python3.10/site-packages/asv/commands/run.py", line 294, in run
Setup.perform_setup(environments, parallel=parallel)
File "/home/czgdp1807ssd/anaconda3/envs/scipy-dev/lib/python3.10/site-packages/asv/commands/setup.py", line 89, in perform_setup
list(map(_create, environments))
File "/home/czgdp1807ssd/anaconda3/envs/scipy-dev/lib/python3.10/site-packages/asv/commands/setup.py", line 21, in _create
env.create()
File "/home/czgdp1807ssd/anaconda3/envs/scipy-dev/lib/python3.10/site-packages/asv/environment.py", line 704, in create
self._setup()
File "/home/czgdp1807ssd/anaconda3/envs/scipy-dev/lib/python3.10/site-packages/asv/plugins/conda.py", line 174, in _setup
self._run_conda(['env', 'create', '-f', env_file_name,
File "/home/czgdp1807ssd/anaconda3/envs/scipy-dev/lib/python3.10/site-packages/asv/plugins/conda.py", line 227, in _run_conda
return util.check_output([conda] + args, env=env)
File "/home/czgdp1807ssd/anaconda3/envs/scipy-dev/lib/python3.10/site-packages/asv/util.py", line 754, in check_output
raise ProcessError(args, retcode, stdout, stderr)
asv.util.ProcessError: Command '/home/czgdp1807ssd/anaconda3/bin/conda env create -f /tmp/tmpnufhm_31.yml -p /home/czgdp1807ssd/scipy_project/scipy/benchmarks/env/9e6b5bad4b437052513fdb7dfc5533cc --force' returned non-zero exit status 1 cc: @rgommers @tupui Any ideas why the above might be happening? |
Well I made the benchmarks work by removing |
Work on |
f19da64
to
ba8ff7a
Compare
This PR is ready now. I have added cc: @rgommers @peterbell10 @tupui Also, note that #16023 is open which discusses some important aspects related to the new framework for distance metrics. I am not sure why tests on Windows fail. The log is shown below, During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "runtests.py", line 612, in <module>
main(argv=sys.argv[1:])
File "runtests.py", line 307, in main
test, version, mod_path = import_module()
File "runtests.py", line 356, in import_module
__import__(PROJECT_MODULE)
ModuleNotFoundError: No module named 'scipy'
##[error]PowerShell exited with code '1'.
Finishing: Run SciPy Test Suite |
On these jobs when this happens it's always because the previous stage failed. I know, it's confusing. You have build errors there. |
So, should I rebase on top of |
Feel free to rebase. When there are no review there is no problem with that 😃 When you rebase on GH, you should just be careful about which part of the history you are rewriting. GH is using the hash of the commits to links comments (seemingly) and if the hash change, comments on specific lines become dead links. I often force push my last commit because I have a typo, and this is fine until a new comment is made. |
I see. Probably that's why its safe to rebase here because currently there are on comments on the diff. Am I right? |
Yes, exactly. And during a review which already started, you can also ask about rebasing. Most of the time people will be fine with it. They just need to know before it happens and they are not surprise about their reviews disappearing. |
…g distance metrics, 1. Correlation 2. Cosine 3. Mahalanobis 4. Seucledian 5. Jensenshannon
afd4352
to
789ea3c
Compare
I rebased on top of |
Well, C++ on Windows doesn't allow creating variable sized arrays in stack memory. Will fix this. |
Tests pass now I think. |
Probably I am doing something wrong but I cannot compile the branch: git clone https://github.com/czgdp1807/scipy.git
cd scipy
git checkout pybind11_dist
pip install .
[snip]
/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/numpy/distutils/system_info.py:933: UserWarning: Specified path /usr/local/include/python3.10 is invalid.
return self.get_paths(self.section, key)
/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/numpy/distutils/system_info.py:933: UserWarning: Specified path /home/patacca/software/scipy/venv/include/python3.10 is invalid.
return self.get_paths(self.section, key)
non-existing path in 'scipy/linalg': 'src/lapack_deprecations/LICENSE'
Traceback (most recent call last):
File "/home/patacca/software/scipy/venv/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
main()
File "/home/patacca/software/scipy/venv/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/home/patacca/software/scipy/venv/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 164, in prepare_metadata_for_build_wheel
return hook(metadata_directory, config_settings)
File "/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 174, in prepare_metadata_for_build_wheel
self.run_setup()
File "/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 267, in run_setup
super(_BuildMetaLegacyBackend,
File "/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 158, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 532, in <module>
setup_package()
File "setup.py", line 528, in setup_package
setup(**metadata)
File "/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/numpy/distutils/core.py", line 135, in setup
config = configuration()
File "setup.py", line 438, in configuration
config.add_subpackage('scipy')
File "/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/numpy/distutils/misc_util.py", line 1016, in add_subpackage
config_list = self.get_subpackage(subpackage_name, subpackage_path,
File "/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/numpy/distutils/misc_util.py", line 982, in get_subpackage
config = self._get_configuration_from_setup_py(
File "/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/numpy/distutils/misc_util.py", line 924, in _get_configuration_from_setup_py
config = setup_module.configuration(*args)
File "scipy/setup.py", line 18, in configuration
config.add_subpackage('optimize')
File "/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/numpy/distutils/misc_util.py", line 1016, in add_subpackage
config_list = self.get_subpackage(subpackage_name, subpackage_path,
File "/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/numpy/distutils/misc_util.py", line 982, in get_subpackage
config = self._get_configuration_from_setup_py(
File "/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/numpy/distutils/misc_util.py", line 924, in _get_configuration_from_setup_py
config = setup_module.configuration(*args)
File "scipy/optimize/setup.py", line 130, in configuration
config.add_subpackage('_highs')
File "/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/numpy/distutils/misc_util.py", line 1016, in add_subpackage
config_list = self.get_subpackage(subpackage_name, subpackage_path,
File "/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/numpy/distutils/misc_util.py", line 982, in get_subpackage
config = self._get_configuration_from_setup_py(
File "/tmp/pip-build-env-z3vckssd/overlay/lib/python3.10/site-packages/numpy/distutils/misc_util.py", line 924, in _get_configuration_from_setup_py
config = setup_module.configuration(*args)
File "scipy/optimize/_highs/setup.py", line 63, in configuration
_major_dot_minor = _get_version(
File "scipy/optimize/_highs/setup.py", line 50, in _get_version
with open(CMakeLists, 'r', encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/patacca/software/scipy/scipy/_lib/highs/CMakeLists.txt'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
|
Hi @patacca. This is not how you would install a development version. Please refer to our contributors' documentation. If you have further issues after strictly following the steps, I invite you to open an issue to not mix things with this PR. |
Is there an ETA for this PR? It seems to pass all the checks |
It's just waiting for someone knowledgeable to review 😃 Any help is appreciated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the short summary here is that we basically have a few more functions migrated from Python -> C++ for the distrance metrics, that's something we've recently been +1 on. So, if the performance has generally improved, it seems "ok" in principle at least.
Are we reasonably confident in the state of the testing for these distance functions?
// Under NumPy's relaxed stride checking, dimensions with | ||
// 1 or fewer elements are ignored. | ||
desc.strides[i] = 0; | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diff is a bit hard to read because of its size--did you actually delete any of Peter's old infrastructure, or is just getting relocated because of the number of changes here?
I'd say this should get a review from @peterbell10. I'll bump the milestone - if it still gets merged in time that's nice to include, otherwise it's for the next release. |
|
||
StridedView1D<T> rownorms_view; | ||
rownorms_view.shape = {x.shape[0]}; | ||
rownorms_view.data = new T[x.shape[0]]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid raw new
and delete
:
rownorms_view.data = new T[x.shape[0]]; | |
auto rownorms_data = std::make_unique<T[]>(x.shape[0]); | |
rownorms_view.data = rownorm_data.get(); |
T x_ij = in_data[i*x.strides[0] + j*x.strides[1]]; | ||
rownorm += x_ij * x_ij; | ||
} | ||
rownorms_view.data[i] = sqrt(rownorm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always use std::
because the C versions names don't necessarily have float
overloads
rownorms_view.data[i] = sqrt(rownorm); | |
rownorms_view.data[i] = std::sqrt(rownorm); |
|
||
StridedView1D<T> rownorms_view; | ||
rownorms_view.shape = {x.shape[0]}; | ||
rownorms_view.data = new T[x.shape[0]]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid new
T w_ij = w_data[j*w.strides[0]]; | ||
rownorm += w_ij * x_ij * x_ij; | ||
} | ||
rownorms_view.data[i] = sqrt(rownorm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rownorms_view.data[i] = sqrt(rownorm); | |
rownorms_view.data[i] = std::sqrt(rownorm); |
|
||
StridedView1D<T> xrownorms_view; | ||
xrownorms_view.shape = {x.shape[0]}; | ||
xrownorms_view.data = new T[x.shape[0]]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid new
} | ||
|
||
ForceUnroll<ilp_factor>{}([&](int k) { | ||
out(i + k, 0) = 1.0 - project(dist[k])/(xrownorms(ix.shape[0]) * yrownorms(i + k)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having completely new templates for cosine
, seclidean
and jensenshannon
, what if we have just one new template that passes the indices i
and j
to map
, and i
to project
. Then the metrics can use or ignore the indices as they require. If that performs the same then it would save a lot of duplication.
@@ -1856,8 +1895,8 @@ class MetricInfo: | |||
canonical_name='jensenshannon', | |||
aka={'jensenshannon', 'js'}, | |||
dist_func=jensenshannon, | |||
cdist_func=CDistMetricWrapper('jensenshannon'), | |||
pdist_func=PDistMetricWrapper('jensenshannon'), | |||
cdist_func=_distance_pybind.cdist_jensenshannon, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run the benchmarks I'm seeing ~50% slow down in unweighted jensenshannon from this PR.
auto x_ratio = x_i / m_i + error_x; | ||
auto y_ratio = y_i / m_i + error_y; | ||
auto log_e = log2(exp(1)); | ||
auto s_x = ((T) x_i > 0.0) * x_i * (log2(x_ratio)/log_e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't log2(x)/log_e
equal to log(x)
? Also need to use std::
here as well.
template <typename T> | ||
struct Acc { | ||
Acc(): js_diverg(0) {} | ||
T js_diverg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll point out that the existing metrics that have an Acc
struct only use it because there are multiple values being accumulated, not as a requirement. If you just wanted to name the value then that's fine though.
T error_y = (y_i == 0.0) * 1e-6; | ||
m_i += (x_i == 0.0 && y_i == 0.0) * 1e-6; | ||
auto x_ratio = x_i / m_i + error_x; | ||
auto y_ratio = y_i / m_i + error_y; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did this come from, I can't see it in the existing C code?
There are unaddressed review comments, so I will clear the milestone from this PR. It'd be great to see this land if you find time at some point @czgdp1807 |
_distance_pybind
_distance_pybind
_distance_pybind
Reference issue
#14077
What does this implement/fix?
This PR aims to re-write distance metrics in
scipy.spatial.distance
to_distance_pybind
. As of now this is a work in progress. Following metrics will be added here,Refer this comment for more details.
Please feel free to leave your reviews meanwhile this gets completed. Thanks.
Additional information
Benchmarks result.
This PR is ready for now. Merged main into this branch after my latest commit.