Running the test module tests/data/test_mm.py raises OSError. Every failing test case fails with the same error.
$ nosetests ./data/test_mm.py -v
test0_get_default_option (data.test_mm.TestMatrixMarket) ... ok
test1_is_valid_option (data.test_mm.TestMatrixMarket) ... ok
test2_create (data.test_mm.TestMatrixMarket) ... [INFO ] 2023-12-19 04:03:30 [mm.py:247] Create the database from matrix market file.
[DEBUG ] 2023-12-19 04:03:30 [mm.py:252] Building meta part...
[PROGRESS] 0.00% 0.0/0.0secs 0.00it/s
[INFO ] 2023-12-19 04:03:30 [base.py:179] File ./mm.h5py exists. To build new database, existing file ./mm.h5py will be deleted.
[ERROR ] 2023-12-19 04:03:30 [mm.py:162] Cannot create db: Can't write data (no appropriate function for conversion path)
[ERROR ] 2023-12-19 04:03:30 [mm.py:163] Traceback (most recent call last):
  File "/home/bc-user/.local/lib/python3.10/site-packages/buffalo/data/mm.py", line 141, in _create
    idmap["rows"][:] = np.loadtxt(fin, dtype=f"S{uid_max_col}")
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/bc-user/.local/lib/python3.10/site-packages/h5py/_hl/dataset.py", line 999, in __setitem__
    self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 283, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 114, in h5py._proxy.dset_rw
OSError: Can't write data (no appropriate function for conversion path)
......(skip the middle lines)
MatrixMarketDataReader: DEBUG: creating temporary matrix-market data from numpy-kind array
MatrixMarket: INFO: Create the database from matrix market file.
MatrixMarket: DEBUG: Building meta part...
[PROGRESS] 0.00% 0.0/0.0secs 0.00it/s
MatrixMarket: INFO: File ./mm.h5py exists. To build new database, existing file ./mm.h5py will be deleted.
MatrixMarket: ERROR: Cannot create db: Can't write data (no appropriate function for conversion path)
MatrixMarket: ERROR: Traceback (most recent call last):
  File "/home/bc-user/.local/lib/python3.10/site-packages/buffalo/data/mm.py", line 141, in _create
    idmap["rows"][:] = np.loadtxt(fin, dtype=f"S{uid_max_col}")
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/bc-user/.local/lib/python3.10/site-packages/h5py/_hl/dataset.py", line 999, in __setitem__
    self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 283, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 114, in h5py._proxy.dset_rw
OSError: Can't write data (no appropriate function for conversion path)
[PROGRESS] 100.00% 0.0/0.0secs 1,137.96it/s
--------------------- >> end captured logging << ---------------------
----------------------------------------------------------------------
Ran 10 tests in 0.041s
FAILED (errors=5)
The cause is a mismatch between the string type of the HDF5 dataset and the dtype of the numpy array being written, as the traceback above shows. The current version only supports "utf-8" encoding when creating idmap, while mm.py reads the IDs as fixed-width byte strings (dtype "S"), so the MatrixMarket object fails to store both the user and item ID lists. Changing the encoding from "utf-8" to "ascii" is one possible fix. I tested this with a local patch to buffalo/data/base.py; the results are as follows:
test0_get_default_option (data.test_mm.TestMatrixMarket) ... ok
test1_is_valid_option (data.test_mm.TestMatrixMarket) ... ok
test2_create (data.test_mm.TestMatrixMarket) ...
[INFO ] 2023-12-19 04:54:58 [mm.py:247] Create the database from matrix market file.
[DEBUG ] 2023-12-19 04:54:58 [mm.py:252] Building meta part...
[PROGRESS] 0.00% 0.0/0.0secs 0.00it/s
[INFO ] 2023-12-19 04:54:58 [base.py:179] File ./mm.h5py exists. To build new database, existing file ./mm.h5py will be deleted.
[PROGRESS] 100.00% 0.0/0.0secs 742.35it/s
[INFO ] 2023-12-19 04:54:58 [mm.py:260] Creating working data...
[PROGRESS] 0.00% 0.0/0.0secs 0.00it/s
[PROGRESS] 100.00% 0.0/0.0secs 168,937.24it/s
[DEBUG ] 2023-12-19 04:54:58 [mm.py:264] Working data is created on /tmp/tmpr5a6iwrk
[INFO ] 2023-12-19 04:54:58 [mm.py:265] Building data part...
[INFO ] 2023-12-19 04:54:58 [base.py:417] Building compressed triplets for rowwise...
[INFO ] 2023-12-19 04:54:58 [base.py:418] Preprocessing...
[INFO ] 2023-12-19 04:54:58 [base.py:421] In-memory Compressing ...
[INFO ] 2023-12-19 04:54:59 [base.py:301] Load triplet files. Total job files: 73
[INFO ] 2023-12-19 04:54:59 [base.py:451] Finished
[INFO ] 2023-12-19 04:54:59 [base.py:417] Building compressed triplets for colwise...
[INFO ] 2023-12-19 04:54:59 [base.py:418] Preprocessing...
[INFO ] 2023-12-19 04:54:59 [base.py:421] In-memory Compressing ...
[INFO ] 2023-12-19 04:54:59 [base.py:301] Load triplet files. Total job files: 73
[INFO ] 2023-12-19 04:54:59 [base.py:451] Finished
[INFO ] 2023-12-19 04:54:59 [mm.py:279] DB built on ./mm.h5py
ok
......(skip the middle lines)
test3_list (data.test_mm.TestMatrixMarketReader) ... [DEBUG ] 2023-12-19 04:55:01 [mm.py:70] creating temporary matrix-market data from numpy-kind array
ok
----------------------------------------------------------------------
Ran 10 tests in 3.166s
OK
However, the "ascii" patch breaks w2v training (PR), which relies on "utf-8" characters to train on Korean words. To reconcile the two, one feasible option is to let each loader use its own encoding: one for matrix-market files and another for stream data files.
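
To illustrate the conflict between the two encodings, here is a minimal standard-library sketch of choosing the ID encoding per data source. It is not buffalo's code: the names `ID_ENCODINGS` and `encode_ids`, and the source keys `"matrix_market"` and `"stream"`, are hypothetical and only show the idea. The sketch also returns the encoded byte width, since a fixed-length string column has to be sized in bytes, not characters.

```python
# Hypothetical mapping from data source to ID encoding; not buffalo's API.
ID_ENCODINGS = {"matrix_market": "ascii", "stream": "utf-8"}


def encode_ids(ids, source):
    """Encode string IDs for `source` and return (encoded_ids, byte_width).

    byte_width is the length in bytes of the longest encoded ID, i.e. the
    width a fixed-length string column would need to hold every ID.
    """
    encoding = ID_ENCODINGS[source]
    try:
        encoded = [s.encode(encoding) for s in ids]
    except UnicodeEncodeError as exc:
        # Non-ASCII IDs (for example Korean words) cannot use the ascii path.
        raise ValueError(f"IDs for {source!r} are not valid {encoding}") from exc
    width = max((len(b) for b in encoded), default=0)
    return encoded, width


if __name__ == "__main__":
    print(encode_ids(["u1", "user42"], "matrix_market"))
    print(encode_ids(["한글", "word"], "stream"))
```

With this split, Korean words pass through the stream path unchanged, while the same input sent through the matrix-market path fails early with a clear error instead of failing inside the HDF5 write.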