[BUG] load_solar failing on main, with pandas._libs.tslibs.parsing.DateParseError #5527

Closed
yarnabrina opened this issue Nov 3, 2023 · 5 comments
Labels
bug Something isn't working module:datasets&loaders data sets and data loaders

@yarnabrina
Collaborator

MCVE

>>> from sktime.datasets import load_solar
>>> load_solar()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/anirban/sktime-fork/sktime/datasets/_single_problem_loaders.py", line 1272, in load_solar
    return _load_solar(
  File "/home/anirban/sktime-fork/sktime/datasets/_single_problem_loaders.py", line 1257, in _load_solar
    df = df.asfreq("30T")
  File "/home/anirban/conda-environments/sktime/lib/python3.10/site-packages/pandas/core/frame.py", line 10971, in asfreq
    return super().asfreq(
  File "/home/anirban/conda-environments/sktime/lib/python3.10/site-packages/pandas/core/generic.py", line 8347, in asfreq
    return asfreq(
  File "/home/anirban/conda-environments/sktime/lib/python3.10/site-packages/pandas/core/resample.py", line 2232, in asfreq
    dti = date_range(obj.index.min(), obj.index.max(), freq=freq)
  File "/home/anirban/conda-environments/sktime/lib/python3.10/site-packages/pandas/core/indexes/datetimes.py", line 945, in date_range
    dtarr = DatetimeArray._generate_range(
  File "/home/anirban/conda-environments/sktime/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py", line 401, in _generate_range
    start = Timestamp(start)
  File "pandas/_libs/tslibs/timestamps.pyx", line 1667, in pandas._libs.tslibs.timestamps.Timestamp.__new__
  File "pandas/_libs/tslibs/conversion.pyx", line 280, in pandas._libs.tslibs.conversion.convert_to_tsobject
  File "pandas/_libs/tslibs/conversion.pyx", line 557, in pandas._libs.tslibs.conversion.convert_str_to_tsobject
  File "pandas/_libs/tslibs/parsing.pyx", line 329, in pandas._libs.tslibs.parsing.parse_datetime_string
  File "pandas/_libs/tslibs/parsing.pyx", line 658, in pandas._libs.tslibs.parsing.dateutil_parse
pandas._libs.tslibs.parsing.DateParseError: Unknown datetime string format, unable to parse: %Y-%m-%dT%H:%i:%SZ
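
One reading of this traceback: `Timestamp(start)` goes through `convert_str_to_tsobject`, so `obj.index.min()` returned a string, which suggests the index still held strings when `asfreq` ran. The string itself is a format template rather than a date, and `%i` is not a directive that Python's `strptime` accepts. Because `%` sorts before every digit, such a stray label would become the minimum of a string-typed index. The sketch below illustrates this with the standard library only; the sample labels and the helper names are hypothetical, and the expected timestamp layout is an assumption, not taken from the loader.

```python
from datetime import datetime

# The string pandas reported as unparseable (copied from the traceback).
BAD_VALUE = "%Y-%m-%dT%H:%i:%SZ"


def is_iso_utc_timestamp(value: str) -> bool:
    """Return True if value looks like YYYY-MM-DDTHH:MM:SSZ (assumed layout)."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return False
    return True


def split_valid_index(labels):
    """Separate labels that parse as timestamps from those that do not."""
    good = [v for v in labels if is_iso_utc_timestamp(v)]
    bad = [v for v in labels if not is_iso_utc_timestamp(v)]
    return good, bad


# Hypothetical index labels: two real timestamps plus the stray template.
labels = ["2023-11-03T00:30:00Z", BAD_VALUE, "2023-11-03T00:00:00Z"]

# String comparison puts '%' before the digits, so the template is the minimum.
smallest = min(labels)
good, bad = split_valid_index(labels)
```

A check of this kind, run before resampling, would turn the opaque parser error into a message that names the offending label.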

Version

>>> from sktime import show_versions; show_versions()

System:
    python: 3.10.12 (main, Jul  5 2023, 18:54:27) [GCC 11.2.0]
executable: /home/anirban/conda-environments/sktime/bin/python
   machine: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.35

Python dependencies:
          pip: 23.3.1
       sktime: 0.24.0
      sklearn: 1.3.0
       skbase: 0.4.6
        numpy: 1.24.4
        scipy: 1.11.1
       pandas: 2.0.3
   matplotlib: 3.7.2
       joblib: 1.3.1
        numba: 0.57.1
  statsmodels: 0.14.0
     pmdarima: 2.0.3
statsforecast: 1.6.0
      tsfresh: None
      tslearn: None
        torch: None
   tensorflow: None
tensorflow_probability: None
@yarnabrina yarnabrina added the bug Something isn't working label Nov 3, 2023
@yarnabrina
Collaborator Author

@fkiraly I don't think it's coming from #5437, as I just replicated #5004 changes. But if I missed something, please let me know.

@fkiraly
Collaborator

fkiraly commented Nov 4, 2023

FYI @ciaran-g

@fkiraly
Collaborator

fkiraly commented Nov 4, 2023

We see the same failure in the PeakTimeFeature doctest, see below.

My first suspicion was pandas, but there has been no new release, and none for numpy either.

133     Examples
134     --------
135     >>> from sktime.transformations.series.peak import PeakTimeFeature
136     >>> from sktime.datasets import load_solar
137     >>> y = load_solar()
UNEXPECTED EXCEPTION: ValueError('could not convert string to Timestamp')
Traceback (most recent call last):
  File "pandas/_libs/tslibs/conversion.pyx", line 530, in pandas._libs.tslibs.conversion._convert_str_to_tsobject
  File "pandas/_libs/tslibs/parsing.pyx", line 318, in pandas._libs.tslibs.parsing.parse_datetime_string
  File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/dateutil/parser/_parser.py", line 1368, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/dateutil/parser/_parser.py", line 643, in parse
    raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: %Y-%m-%dT%H:%i:%SZ
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/doctest.py", line 1350, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest sktime.transformations.series.peak.PeakTimeFeature[2]>", line 1, in <module>
  File "/home/runner/work/sktime/sktime/sktime/datasets/_single_problem_loaders.py", line 1277, in load_solar
    return _load_solar(
  File "/home/runner/work/sktime/sktime/sktime/datasets/_single_problem_loaders.py", line 1262, in _load_solar
    df = df.asfreq("30T")
  File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/pandas/core/frame.py", line 11367, in asfreq
    return super().asfreq(
  File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/pandas/core/generic.py", line 8235, in asfreq
    return asfreq(
  File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/pandas/core/resample.py", line 2229, in asfreq
    dti = date_range(obj.index.min(), obj.index.max(), freq=freq)
  File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/pandas/core/indexes/datetimes.py", line 1125, in date_range
    dtarr = DatetimeArray._generate_range(
  File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py", line 361, in _generate_range
    start = Timestamp(start)
  File "pandas/_libs/tslibs/timestamps.pyx", line 1698, in pandas._libs.tslibs.timestamps.Timestamp.__new__
  File "pandas/_libs/tslibs/conversion.pyx", line 249, in pandas._libs.tslibs.conversion.convert_to_tsobject
  File "pandas/_libs/tslibs/conversion.pyx", line 533, in pandas._libs.tslibs.conversion._convert_str_to_tsobject
ValueError: could not convert string to Timestamp
/home/runner/work/sktime/sktime/sktime/transformations/series/peak.py:137: UnexpectedException

@fkiraly fkiraly changed the title [BUG] load_solar is failing on main [BUG] pandas._libs.tslibs.parsing.DateParseError failures on main, in load_solar and PeakTimeFeatures Nov 4, 2023
@fkiraly fkiraly changed the title [BUG] pandas._libs.tslibs.parsing.DateParseError failures on main, in load_solar and PeakTimeFeatures [BUG] pandas._libs.tslibs.parsing.DateParseError failures on main, in load_solar Nov 4, 2023
@fkiraly
Collaborator

fkiraly commented Nov 4, 2023

This is actually also coming from load_solar.

A data download should not run in a docstring; that should be easier to fix.
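
A minimal sketch of how a downloading call can be kept in a docstring without being executed, using the standard `# doctest: +SKIP` directive. The functions `fetch_remote_data` and `peak_indices` are hypothetical stand-ins, not sktime code, and this is not necessarily how the actual fix was written.

```python
import doctest

DOWNLOAD_CALLS = []  # records whether the stand-in downloader was ever invoked


def fetch_remote_data():
    """Hypothetical stand-in for a loader that needs network access."""
    DOWNLOAD_CALLS.append("download")
    raise ConnectionError("network unavailable")


def peak_indices(values):
    """Return the positions of the maximum value.

    Examples
    --------
    >>> y = fetch_remote_data()  # doctest: +SKIP
    >>> peak_indices([1, 3, 2, 3])
    [1, 3]
    """
    top = max(values)
    return [i for i, v in enumerate(values) if v == top]


def run_docstring_examples():
    """Run the examples in peak_indices' docstring and return the results."""
    globs = {"fetch_remote_data": fetch_remote_data, "peak_indices": peak_indices}
    test = doctest.DocTestParser().get_doctest(
        peak_indices.__doc__, globs, "peak_indices", "<sketch>", 0
    )
    return doctest.DocTestRunner(verbose=False).run(test)
```

The skipped line stays visible to readers as documentation, while the doctest run only executes the example that works offline.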

@fkiraly fkiraly changed the title [BUG] pandas._libs.tslibs.parsing.DateParseError failures on main, in load_solar [BUG] load_solar failing on main, with pandas._libs.tslibs.parsing.DateParseError Nov 4, 2023
@fkiraly fkiraly added the module:datasets&loaders data sets and data loaders label Nov 4, 2023
@fkiraly fkiraly added this to Needs triage & validation in Bugfixing via automation Nov 4, 2023
fkiraly added a commit that referenced this issue Nov 4, 2023
This PR skips the downloader `load_solar` in doctests to avoid
downloading data in doctests and keep download tests localized to the
respective CI element.

See #5527 (comment)
fkiraly added a commit that referenced this issue Nov 4, 2023
This PR excludes download tests from the "no soft dependencies" CI element,
as downloads are now tested in a regular cron.

Also prevents the current failure arising from
#5527
in PR CI.
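
The idea of running download tests only in a scheduled job can be sketched with the standard library as follows. The `run_downloads` switch and the test names are hypothetical; sktime's own CI configuration is not shown in this thread, so this only illustrates the general pattern.

```python
import io
import unittest


def build_loader_tests(run_downloads: bool):
    """Build a test case whose download test runs only when opted in."""

    class LoaderTests(unittest.TestCase):
        def test_bundled_data(self):
            # Stands in for a test on data shipped with the package.
            self.assertEqual(sum([1, 2, 3]), 6)

        @unittest.skipUnless(run_downloads, "download tests run only in the scheduled job")
        def test_remote_data(self):
            # Stands in for a test that would call a downloading loader.
            self.assertTrue(True)

    return LoaderTests


def run_suite(run_downloads: bool):
    """Run the test case quietly and return the unittest result object."""
    case = build_loader_tests(run_downloads)
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```

With `run_downloads=False`, as in a pull request job, the remote test is reported as skipped, so an upstream data problem cannot fail unrelated PRs; the scheduled job would pass `True`.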
@yarnabrina
Collaborator Author

I cannot reproduce the originally reported issue on current main, so I am closing this. It must have been fixed sometime in the last 2 months; I don't know when.

Bugfixing automation moved this from Needs triage & validation to Fixed/resolved Jan 25, 2024