Issue reading timestamp attributes #121

Open
glemaitre opened this issue Apr 21, 2021 · 3 comments


@glemaitre
Contributor

I am forwarding a limitation reported by @zuliani99 in scikit-learn: scikit-learn/scikit-learn#19944
It is more appropriate to fix this upstream than in the vendored copy in scikit-learn.

Describe the bug

I am trying to fetch a dataset with the fetch_openml API, and I noticed that it cannot handle date-type features such as timestamps.
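
To make the limitation concrete, here is a minimal sketch of a header check that flags attribute declarations the decoder is likely to reject. It assumes the decoder accepts only NUMERIC, REAL, INTEGER, STRING and nominal (`{...}`) types; that set, the helper name `find_unsupported_attributes`, and the example header are all assumptions for illustration, not the real content of dataset 41889 or part of any library API.

```python
import re

# Assumption: the types the decoder accepts; anything else (e.g. DATE) is flagged.
SUPPORTED_TYPES = {"NUMERIC", "REAL", "INTEGER", "STRING"}

# Matches "@ATTRIBUTE <name> <type>", where the name may be quoted.
_ATTRIBUTE_RE = re.compile(
    r"""^\s*@attribute\s+("[^"]*"|'[^']*'|\S+)\s+(.+?)\s*$""",
    re.IGNORECASE,
)


def find_unsupported_attributes(arff_text):
    """Return (line_number, name, declared_type) for each flagged attribute."""
    flagged = []
    for line_number, line in enumerate(arff_text.splitlines(), start=1):
        if line.strip().upper().startswith("@DATA"):
            break  # the header ends where the data section starts
        match = _ATTRIBUTE_RE.match(line)
        if match is None:
            continue
        name, declared_type = match.group(1), match.group(2)
        if declared_type.startswith("{"):
            continue  # nominal attribute, e.g. {yes, no}
        if declared_type.upper() not in SUPPORTED_TYPES:
            flagged.append((line_number, name.strip("\"'"), declared_type))
    return flagged


# Hypothetical header, NOT the real content of dataset 41889.
EXAMPLE_HEADER = """@RELATION example
@ATTRIBUTE ts DATE "yyyy-MM-dd HH:mm:ss"
@ATTRIBUTE value NUMERIC
@ATTRIBUTE label {a, b}
@DATA
"""

print(find_unsupported_attributes(EXAMPLE_HEADER))
```

The check is purely textual and uses only the standard library, so it can be run on a downloaded ARFF file before handing it to a parser.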

Steps/Code to Reproduce

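from sklearn.datasets import fetch_openml

id = 41889  # OpenML dataset ID that triggers the error
X, y = fetch_openml(data_id=id, as_frame=True, return_X_y=True, cache=False)
# Append the target to the feature frame as an extra column
y = y.to_frame()
X[y.columns[0]] = y
df = X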

Expected Results

I expected it to return the usual X and y.

Actual Results

Traceback (most recent call last):
  File "start.py", line 29, in <module>
    main()
  File "start.py", line 25, in main
    test()
  File "/home/riccardo/Desktop/AutoML-Benchmark/functions/test.py", line 10, in test
    X, y = fetch_openml(data_id=id, as_frame=True, return_X_y=True, cache=False)
  File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/datasets/_openml.py", line 915, in fetch_openml
    bunch = _download_data_to_bunch(url, return_sparse, data_home,
  File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/datasets/_openml.py", line 633, in _download_data_to_bunch
    out = _retry_with_clean_cache(url, data_home)(
  File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/datasets/_openml.py", line 59, in wrapper
    return f(*args, **kw)
  File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/datasets/_openml.py", line 514, in _load_arff_response
    arff = _arff.load(stream,
  File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/externals/_arff.py", line 1078, in load
    return decoder.decode(fp, encode_nominal=encode_nominal,
  File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/externals/_arff.py", line 915, in decode
    raise e
  File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/externals/_arff.py", line 911, in decode
    return self._decode(s, encode_nominal=encode_nominal,
  File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/externals/_arff.py", line 842, in _decode
    attr = self._decode_attribute(row)
  File "/home/riccardo/.local/lib/python3.8/site-packages/sklearn/externals/_arff.py", line 784, in _decode_attribute
    raise BadAttributeType()
sklearn.externals._arff.BadAttributeType: Bad @ATTRIBUTE type, at line 2.
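
The traceback ends in the attribute decoder, so the failure comes from the header rather than the data rows. Until date types are supported, one possible stopgap (not a fix) is to rewrite unsupported attribute types as STRING before decoding and parse the values afterwards. The sketch below assumes you hold the raw ARFF text yourself, that the supported set is NUMERIC, REAL, INTEGER, STRING and nominal, and that the timestamp values are quoted; `relax_unsupported_types` and the example file are hypothetical.

```python
import re
from datetime import datetime

# Assumption: the types the decoder accepts.
SUPPORTED_TYPES = {"NUMERIC", "REAL", "INTEGER", "STRING"}

# Group 1: "@ATTRIBUTE <name> " including trailing whitespace; group 2: the type.
_ATTRIBUTE_RE = re.compile(
    r"""^(\s*@attribute\s+(?:"[^"]*"|'[^']*'|\S+)\s+)(.+?)\s*$""",
    re.IGNORECASE,
)


def relax_unsupported_types(arff_text):
    """Return arff_text with unsupported attribute types declared as STRING."""
    out = []
    in_header = True
    for line in arff_text.splitlines():
        if in_header and line.strip().upper().startswith("@DATA"):
            in_header = False  # leave data rows untouched
        match = _ATTRIBUTE_RE.match(line) if in_header else None
        if match is not None:
            prefix, declared_type = match.group(1), match.group(2)
            is_nominal = declared_type.startswith("{")
            if not is_nominal and declared_type.upper() not in SUPPORTED_TYPES:
                line = prefix + "STRING"
        out.append(line)
    return "\n".join(out)


# Hypothetical file, NOT the real content of dataset 41889.
EXAMPLE_ARFF = """@RELATION example
@ATTRIBUTE ts DATE "yyyy-MM-dd HH:mm:ss"
@ATTRIBUTE value NUMERIC
@DATA
"2021-04-21 10:00:00",1.5
"""

relaxed = relax_unsupported_types(EXAMPLE_ARFF)
print(relaxed)

# After decoding, the string values can be converted back to datetimes.
parsed = datetime.strptime("2021-04-21 10:00:00", "%Y-%m-%d %H:%M:%S")
print(parsed)
```

This loses the declared date format, so the format string passed to `strptime` has to be supplied by hand.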

Versions

System:
python: 3.8.5 (default, Jan 27 2021, 15:41:15) [GCC 9.3.0]
executable: /usr/bin/python3
machine: Linux-5.8.0-50-generic-x86_64-with-glibc2.29

Python dependencies:
pip: 21.0.1
setuptools: 56.0.0
sklearn: 0.24.1
numpy: 1.19.5
scipy: 1.5.4
Cython: 0.29.22
pandas: 1.1.4
matplotlib: 3.4.1
joblib: 1.0.1
threadpoolctl: 2.1.0

Built with OpenMP: True

@jnothman
Contributor

See also #44

@zuliani99

I also found that sometimes, after fetching a dataset from OpenML, the y variable (the target) is None. How can I figure out which column is the target? Is y perhaps always the last column of the dataset?

@mfeurer
Collaborator

mfeurer commented Apr 26, 2021

> I also found that sometimes, after fetching a dataset from OpenML, the y variable (the target) is None. How can I figure out which column is the target? Is y perhaps always the last column of the dataset?

Please open an issue with the library you are using to download the data; liac-arff does not know about OpenML.
