ValueError: buffer source array is read-only in check_estimator #28026

Closed
jilljenn opened this issue Dec 27, 2023 · 4 comments · Fixed by #28111

@jilljenn
Contributor

Describe the bug

I am trying to make a scikit-learn estimator, FMClassifier, based on pywFM, the Python wrapper for the C++ library libFM (yes 😅). Running check_estimator on it fails:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jj/code/fare/scikit-learn/sklearn/utils/estimator_checks.py", line 627, in check_estimator
    check(estimator)
  File "/home/jj/code/fare/scikit-learn/sklearn/utils/_testing.py", line 318, in wrapper
    return fn(*args, **kwargs)
  File "/home/jj/code/fare/scikit-learn/sklearn/utils/estimator_checks.py", line 2603, in check_estimators_fit_returns_self
    assert estimator.fit(X, y) is estimator
  File "/home/jj/code/ktm/fm.py", line 40, in fit
    model = fm.run(X, y, X, y)
  File "/home/jj/.local/lib/python3.10/site-packages/pywFM/__init__.py", line 149, in run
    dump_svmlight_file(x_train, y_train, train_path)
  File "/home/jj/code/fare/scikit-learn/sklearn/datasets/_svmlight_format_io.py", line 513, in dump_svmlight_file
    _dump_svmlight(X, y, f, multilabel, one_based, comment, query_id)
  File "/home/jj/code/fare/scikit-learn/sklearn/datasets/_svmlight_format_io.py", line 386, in _dump_svmlight
    _dump_svmlight_file(
  File "sklearn/datasets/_svmlight_format_fast.pyx", line 222, in sklearn.datasets._svmlight_format_fast._dump_svmlight_file
  File "sklearn/datasets/_svmlight_format_fast.pyx", line 133, in sklearn.datasets._svmlight_format_fast.get_dense_row_string
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only

Possibly related issues:

Steps/Code to Reproduce

from sklearn.datasets import dump_svmlight_file
import sklearn
import numpy as np


class FMClassifier(sklearn.base.BaseEstimator):
    def __init__(self):
        super().__init__()
    def fit(self, X, y):
        with open('tmp.txt', 'wb') as f:
            dump_svmlight_file(X, y, f)
        return self
    def predict_proba(self, X):
        return np.zeros(len(X))


from sklearn.utils.estimator_checks import check_estimator
check_estimator(FMClassifier())

Expected Results

Well, I should at least get to the next error, shouldn't I? If writing into this memory is not allowed (which makes sense), could that be stated in the documentation somewhere?
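
Some estimator checks feed the estimator memory-mapped, read-only data, so writing into X is indeed off limits, but reading it is fine. The NumPy sketch below (file name, values, and shape are arbitrary) opens a memmap with mode "r" and shows both behaviours. The catch in this issue is that a Cython typed memoryview declared without const requests a writable buffer even when the function only reads, which is what produces "buffer source array is read-only".

```python
import os
import tempfile

import numpy as np

# Save a small dense array to disk and map it back read-only (mode="r"),
# similar in spirit to the memmap-backed inputs used by some estimator checks.
X = np.arange(6, dtype=np.float64).reshape(3, 2)
path = os.path.join(tempfile.mkdtemp(), "X.dat")
X.tofile(path)
X_ro = np.memmap(path, dtype=np.float64, mode="r", shape=(3, 2))

print(X_ro.flags.writeable)  # False

# Reading from a read-only array works as usual...
row_sums = X_ro.sum(axis=1)

# ...but writing into it raises ValueError.
try:
    X_ro[0, 0] = 42.0
    write_failed = False
except ValueError as exc:
    write_failed = True
    print(exc)
```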

Actual Results

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jj/code/fare/scikit-learn/sklearn/utils/estimator_checks.py", line 627, in check_estimator
    check(estimator)
  File "/home/jj/code/fare/scikit-learn/sklearn/utils/_testing.py", line 318, in wrapper
    return fn(*args, **kwargs)
  File "/home/jj/code/fare/scikit-learn/sklearn/utils/estimator_checks.py", line 2603, in check_estimators_fit_returns_self
    assert estimator.fit(X, y) is estimator
  File "<stdin>", line 6, in fit
  File "/home/jj/code/fare/scikit-learn/sklearn/datasets/_svmlight_format_io.py", line 510, in dump_svmlight_file
    _dump_svmlight(X, y, f, multilabel, one_based, comment, query_id)
  File "/home/jj/code/fare/scikit-learn/sklearn/datasets/_svmlight_format_io.py", line 386, in _dump_svmlight
    _dump_svmlight_file(
  File "sklearn/datasets/_svmlight_format_fast.pyx", line 222, in sklearn.datasets._svmlight_format_fast._dump_svmlight_file
  File "sklearn/datasets/_svmlight_format_fast.pyx", line 133, in sklearn.datasets._svmlight_format_fast.get_dense_row_string
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only

Versions

System:
    python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
executable: /usr/bin/python
   machine: Linux-6.2.0-39-generic-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.2.dev0
          pip: 22.0.2
   setuptools: 59.6.0
        numpy: 1.23.5
        scipy: 1.9.3
       Cython: 0.29.28
       pandas: 1.3.3
   matplotlib: 3.6.0
       joblib: 1.3.2
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/jj/.local/lib/python3.10/site-packages/numpy.libs/libopenblas64_p-r0-742d56dc.3.20.so
        version: 0.3.20
threading_layer: pthreads
   architecture: SkylakeX
    num_threads: 8

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/jj/.local/lib/python3.10/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: SkylakeX
    num_threads: 8

       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0
        version: None
    num_threads: 8
@glemaitre
Member

It looks like you have an old scikit-learn version. Your Cython version is also pretty old: nowadays we require Cython >= 0.29.33.

I think that we solved this issue in main and with a newer Cython version.

@glemaitre
Member

Uhm, actually I can reproduce this with main.

@Charlie-XIAO
Contributor

Charlie-XIAO commented Jan 12, 2024

It seems that one of the checks (check_estimators_fit_returns_self) verifies that fit returns self when X is memmap-backed data, which is not writeable. I think that changing int_or_float[:, :] X to const int_or_float[:, :] X may solve the issue. I've opened #28111 to give it a try.
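
On a scikit-learn version without that fix, one possible workaround (not proposed in this thread, so treat it as an assumption) is to hand dump_svmlight_file a writable copy of dense inputs. The sketch uses NumPy's np.require with requirements=["W"], which returns the array unchanged when it is already writeable and a copy otherwise. The path parameter, the output file name, and the usage demo at the end are hypothetical.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator
from sklearn.datasets import dump_svmlight_file


class FMClassifier(BaseEstimator):
    """Sketch: copy read-only dense inputs before dumping them (hypothetical)."""

    def __init__(self, path="tmp.txt"):
        # `path` is a hypothetical parameter so the output file is configurable.
        self.path = path

    def fit(self, X, y):
        # Read-only dense arrays (e.g. memmap-backed ones) are copied;
        # writeable arrays pass through unchanged. Sparse inputs are left alone.
        if not sp.issparse(X):
            X = np.require(X, requirements=["W"])
        if not sp.issparse(y):
            y = np.require(y, requirements=["W"])
        with open(self.path, "wb") as f:
            dump_svmlight_file(X, y, f)
        return self


# Usage: fit on explicitly read-only arrays.
X = np.array([[0.0, 1.5], [2.0, 0.0]])
y = np.array([0, 1])
X.setflags(write=False)
y.setflags(write=False)

out_path = os.path.join(tempfile.mkdtemp(), "train.svmlight")
est = FMClassifier(path=out_path)
fitted = est.fit(X, y)

# Keep only data lines (dump_svmlight_file writes "#" lines only for comments).
with open(out_path) as f:
    data_lines = [line for line in f.read().splitlines() if not line.startswith("#")]
```

The copy costs extra memory for large read-only inputs; once the const fix is available, the np.require calls can simply be dropped.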

@Charlie-XIAO
Contributor

@jilljenn Apart from the read-only error, which seems to be a scikit-learn bug, your estimator still fails other checks. The following modifications make it pass them.

import numpy as np
import scipy.sparse as sp

from sklearn.base import BaseEstimator
from sklearn.datasets import dump_svmlight_file
from sklearn.utils.estimator_checks import check_estimator
from sklearn.utils.validation import check_is_fitted


class FMClassifier(BaseEstimator):

    def __init__(self):
        super().__init__()

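    def fit(self, X, y):
        with open("tmp.txt", "wb") as f:
            dump_svmlight_file(X, y, f)
        # Private flag; exposed to check_is_fitted via __sklearn_is_fitted__.
        self._is_fitted = True
        return self

    def predict_proba(self, X):
        # Raise NotFittedError if called before fit.
        check_is_fitted(self)
        # This estimator does not support sparse input.
        if sp.issparse(X):
            raise TypeError("Sparse data is not accepted")
        return np.zeros(len(X))

    def __sklearn_is_fitted__(self):
        # Needed because _is_fitted has no trailing underscore.
        return hasattr(self, "_is_fitted") and self._is_fitted

    def _more_tags(self):
        # Declare that this estimator skips input validation.
        return {"no_validation": True}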


if __name__ == "__main__":
    check_estimator(FMClassifier())
