ValueError: buffer source array is read-only in check_estimator #28026

jilljenn · 2023-12-27T15:18:08Z

Describe the bug

I am trying to build a scikit-learn estimator, FMClassifier, based on pywFM, a Python wrapper for the C++ library libFM (yes 😅). Running check_estimator on it fails with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jj/code/fare/scikit-learn/sklearn/utils/estimator_checks.py", line 627, in check_estimator
    check(estimator)
  File "/home/jj/code/fare/scikit-learn/sklearn/utils/_testing.py", line 318, in wrapper
    return fn(*args, **kwargs)
  File "/home/jj/code/fare/scikit-learn/sklearn/utils/estimator_checks.py", line 2603, in check_estimators_fit_returns_self
    assert estimator.fit(X, y) is estimator
  File "/home/jj/code/ktm/fm.py", line 40, in fit
    model = fm.run(X, y, X, y)
  File "/home/jj/.local/lib/python3.10/site-packages/pywFM/__init__.py", line 149, in run
    dump_svmlight_file(x_train, y_train, train_path)
  File "/home/jj/code/fare/scikit-learn/sklearn/datasets/_svmlight_format_io.py", line 513, in dump_svmlight_file
    _dump_svmlight(X, y, f, multilabel, one_based, comment, query_id)
  File "/home/jj/code/fare/scikit-learn/sklearn/datasets/_svmlight_format_io.py", line 386, in _dump_svmlight
    _dump_svmlight_file(
  File "sklearn/datasets/_svmlight_format_fast.pyx", line 222, in sklearn.datasets._svmlight_format_fast._dump_svmlight_file
  File "sklearn/datasets/_svmlight_format_fast.pyx", line 133, in sklearn.datasets._svmlight_format_fast.get_dense_row_string
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only

Possibly related issues:

Steps/Code to Reproduce

from sklearn.datasets import dump_svmlight_file
import sklearn
import numpy as np


class FMClassifier(sklearn.base.BaseEstimator):
    def __init__(self):
        super().__init__()
    def fit(self, X, y):
        with open('tmp.txt', 'wb') as f:
            dump_svmlight_file(X, y, f)
        return self
    def predict_proba(self, X):
        return np.zeros(len(X))


from sklearn.utils.estimator_checks import check_estimator
check_estimator(FMClassifier())

Expected Results

Well, I should get to the next error, shouldn't I? If it's illegal to write into that memory (which makes sense), could this be stated in the documentation somewhere?

Actual Results

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jj/code/fare/scikit-learn/sklearn/utils/estimator_checks.py", line 627, in check_estimator
    check(estimator)
  File "/home/jj/code/fare/scikit-learn/sklearn/utils/_testing.py", line 318, in wrapper
    return fn(*args, **kwargs)
  File "/home/jj/code/fare/scikit-learn/sklearn/utils/estimator_checks.py", line 2603, in check_estimators_fit_returns_self
    assert estimator.fit(X, y) is estimator
  File "<stdin>", line 6, in fit
  File "/home/jj/code/fare/scikit-learn/sklearn/datasets/_svmlight_format_io.py", line 510, in dump_svmlight_file
    _dump_svmlight(X, y, f, multilabel, one_based, comment, query_id)
  File "/home/jj/code/fare/scikit-learn/sklearn/datasets/_svmlight_format_io.py", line 386, in _dump_svmlight
    _dump_svmlight_file(
  File "sklearn/datasets/_svmlight_format_fast.pyx", line 222, in sklearn.datasets._svmlight_format_fast._dump_svmlight_file
  File "sklearn/datasets/_svmlight_format_fast.pyx", line 133, in sklearn.datasets._svmlight_format_fast.get_dense_row_string
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only

Versions

System:
    python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
executable: /usr/bin/python
   machine: Linux-6.2.0-39-generic-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.2.dev0
          pip: 22.0.2
   setuptools: 59.6.0
        numpy: 1.23.5
        scipy: 1.9.3
       Cython: 0.29.28
       pandas: 1.3.3
   matplotlib: 3.6.0
       joblib: 1.3.2
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/jj/.local/lib/python3.10/site-packages/numpy.libs/libopenblas64_p-r0-742d56dc.3.20.so
        version: 0.3.20
threading_layer: pthreads
   architecture: SkylakeX
    num_threads: 8

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/jj/.local/lib/python3.10/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: SkylakeX
    num_threads: 8

       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0
        version: None
    num_threads: 8


glemaitre · 2023-12-27T18:00:59Z

It looks like you have an old scikit-learn version. The Cython version is also pretty old: nowadays we require Cython >= 0.29.33.

I think we solved this issue in main, together with the newer Cython version.

glemaitre · 2023-12-27T18:02:39Z

Uhm, actually I can reproduce this on main.

Charlie-XIAO · 2024-01-12T15:48:42Z

It seems that one of the checks verifies that fit returns self when X is memmap-backed data, which is not writeable. I think that changing int_or_float[:, :] X to const int_or_float[:, :] X may solve the issue. I've opened #28111 to give it a try.
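
A minimal sketch of the condition described above, assuming only NumPy: an array opened with np.memmap in mode "r" is not writeable, which is the kind of input a non-const Cython memoryview refuses. The helper as_writeable is hypothetical (it is not part of scikit-learn) and only illustrates one possible workaround on affected versions: copying dense input before handing it to dump_svmlight_file. It does not handle sparse matrices.

```python
import os
import tempfile

import numpy as np


def as_writeable(X):
    """Hypothetical helper: return a writeable dense array, copying only if needed."""
    X = np.asarray(X)
    return X if X.flags.writeable else np.array(X, copy=True)


# Write a small array to disk, then reopen it read-only,
# similar to the memmap-backed data used by the estimator check.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "X.dat")
np.arange(6, dtype=np.float64).tofile(path)

X_ro = np.memmap(path, dtype=np.float64, mode="r", shape=(3, 2))
X_rw = as_writeable(X_ro)

print("memmap writeable:", X_ro.flags.writeable)
print("copy writeable:", X_rw.flags.writeable)
```

The copy costs extra memory, so this is only a stopgap; declaring the memoryview as const on the Cython side avoids the copy entirely.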

Charlie-XIAO · 2024-01-12T16:09:19Z

@jilljenn Apart from the read-only error, which seems to be a scikit-learn bug, your estimator still fails other checks. The following modification will make it pass the checks.

import numpy as np
import scipy.sparse as sp

from sklearn.base import BaseEstimator
from sklearn.datasets import dump_svmlight_file
from sklearn.utils.estimator_checks import check_estimator
from sklearn.utils.validation import check_is_fitted


class FMClassifier(BaseEstimator):

    def __init__(self):
        super().__init__()

    def fit(self, X, y):
        with open("tmp.txt", "wb") as f:
            dump_svmlight_file(X, y, f)
        self._is_fitted = True
        return self

    def predict_proba(self, X):
        check_is_fitted(self)
        if sp.issparse(X):
            raise TypeError("Sparse data is not accepted")
        return np.zeros(len(X))

    def __sklearn_is_fitted__(self):
        return hasattr(self, "_is_fitted") and self._is_fitted

    def _more_tags(self):
        return {"no_validation": True}


if __name__ == "__main__":
    check_estimator(FMClassifier())

jilljenn added the Bug and Needs Triage labels Dec 27, 2023

jilljenn mentioned this issue Dec 27, 2023

Can FMClassifier pass scikit-learn's check_estimator tests? jilljenn/ktm#13

Open

Charlie-XIAO mentioned this issue Jan 12, 2024

FIX dump svmlight when data is read-only #28111

Merged

glemaitre removed the Needs Triage label Jan 12, 2024

jjerphan closed this as completed in #28111 Jan 13, 2024
