According to the documentation, the metric argument accepts two kinds of input:
string: one of the keys of PAIRWISE_KERNEL_FUNCTIONS. The pairwise kernel is then computed directly with the corresponding vectorized function.
callable: the function is evaluated on every pair of rows from the two input arrays, with joblib used for parallelism. Note that this is considerably slower.
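To make the difference between the two options concrete, here is a minimal sketch that computes the same linear kernel both ways. The helper linear_pair and the random test data are made up for illustration; only pairwise_kernels itself comes from scikit-learn.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = rng.normal(size=(4, 3))

# String metric: dispatches to the built-in vectorized linear kernel.
K_string = pairwise_kernels(X, Y, metric="linear")


# Callable metric (hypothetical helper): receives one row of X and one
# row of Y at a time and returns a single scalar.
def linear_pair(x, y):
    return float(np.dot(x, y))


K_callable = pairwise_kernels(X, Y, metric=linear_pair)
```

Both calls should give the same (5, 4) matrix; the callable version simply gets there through one Python-level call per pair of rows.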
We would ideally like to support both options to give the user maximum flexibility. However, we also want to add kernels of our own, such as the following delta kernel, which handles categorical input.
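import numpy as np
from numpy.typing import ArrayLike  # assumed source of the ArrayLike alias
from sklearn.metrics.pairwise import check_pairwise_arrays


def delta_kernel(X: ArrayLike, Y=None) -> ArrayLike:
    """Delta kernel for categorical values.

    That is, the similarity is 1 if the values are equal and 0 otherwise.

    Parameters
    ----------
    X : ArrayLike of shape (n_samples_x, n_dimensions)
        Input data.
    Y : ArrayLike of shape (n_samples_y, n_dimensions), optional
        Second input, by default None. If None, X is compared with itself.

    Returns
    -------
    result : ArrayLike of shape (n_samples_x, n_samples_y)
        The resulting kernel matrix after applying the delta kernel.
    """
    # Validates the inputs; Y falls back to X when it is None.
    X, Y = check_pairwise_arrays(X, Y)
    # Broadcast to (n_samples_x, n_samples_y, n_dimensions), then require
    # every dimension of a row pair to match.
    return np.equal(X[:, np.newaxis], Y).all(axis=-1).astype(int)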
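As a small worked example of the broadcasting step in the return statement, here is the same expression applied to integer-coded categorical data. The arrays are invented for illustration, and the sketch uses NumPy only, so it skips the input validation done by check_pairwise_arrays.

```python
import numpy as np

# Toy integer-coded categorical data (made up for illustration).
X = np.array([[0, 1],
              [2, 1],
              [0, 1]])
Y = np.array([[0, 1],
              [2, 2]])

# X[:, np.newaxis] has shape (3, 1, 2); comparing it with Y of shape (2, 2)
# broadcasts to an elementwise result of shape (3, 2, 2).
elementwise = np.equal(X[:, np.newaxis], Y)

# A pair of rows counts as equal only if every feature matches.
K = elementwise.all(axis=-1).astype(int)
print(K)
# [[1 0]
#  [0 0]
#  [1 0]]
```

Rows 0 and 2 of X are identical to row 0 of Y, so they get a 1 in the first column; no row of X matches row 1 of Y.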
To use the delta kernel by name through the pairwise_kernels API, and so benefit from its being vectorized rather than passing it in as a callable, we currently have to augment the PAIRWISE_KERNEL_FUNCTIONS variable as follows.
from sklearn.metrics.pairwise import PAIRWISE_KERNEL_FUNCTIONS

# Note that this is added to the list of possible kernels for
# :func:`~sklearn.metrics.pairwise.pairwise_kernels`, because it is more
# efficient to compute the kernel over the entire matrices at once using
# numpy's vectorized operations.
PAIRWISE_KERNEL_FUNCTIONS["delta"] = delta_kernel
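For comparison, one possible alternative that leaves scikit-learn's module-level dictionary untouched is a thin wrapper in our own package. This is only a sketch: CUSTOM_KERNEL_FUNCTIONS and compute_kernel are hypothetical names, and custom kernels handled this way are called directly on the full arrays rather than being split across n_jobs.

```python
import numpy as np
from sklearn.metrics.pairwise import check_pairwise_arrays, pairwise_kernels


def delta_kernel(X, Y=None):
    """Vectorized delta kernel: 1 where two rows are identical, else 0."""
    X, Y = check_pairwise_arrays(X, Y)
    return np.equal(X[:, np.newaxis], Y).all(axis=-1).astype(int)


# Hypothetical package-level registry of additional vectorized kernels.
CUSTOM_KERNEL_FUNCTIONS = {"delta": delta_kernel}


def compute_kernel(X, Y=None, metric="linear", **kwds):
    """Hypothetical wrapper: use a custom kernel if the name is registered,
    otherwise defer to scikit-learn's pairwise_kernels."""
    if isinstance(metric, str) and metric in CUSTOM_KERNEL_FUNCTIONS:
        return CUSTOM_KERNEL_FUNCTIONS[metric](X, Y, **kwds)
    return pairwise_kernels(X, Y, metric=metric, **kwds)


X = np.array([[0, 1], [2, 1], [0, 1]])
K_delta = compute_kernel(X, metric="delta")    # handled by our registry
K_linear = compute_kernel(X, metric="linear")  # handled by scikit-learn
```

Built-in string names and callables still go through pairwise_kernels unchanged, so both documented options remain available to the user.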
Our questions are:
Is this approach recommended, or at least acceptable?
If not, is there a recommended way to support this kind of usage?
We are developing a package where we want to leverage the API that scikit-learn has for computing pairwise kernels among arrays: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_kernels.html.