ValueError: Got more bytes so far than requested #160

Closed
weiji14 opened this issue Oct 18, 2019 · 2 comments · Fixed by #161

Comments

weiji14 commented Oct 18, 2019

There seems to be an issue with using fsspec to fetch large-ish files over HTTP:

import fsspec
import xarray as xr
with fsspec.open("http://test.opendap.org/opendap/data/nc/coads_climatology2.nc") as f:
    ds = xr.open_dataset(f)

Full error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-4b2387e53223> in <module>
      2 import xarray as xr
      3 with fsspec.open("http://test.opendap.org/opendap/data/nc/coads_climatology2.nc") as f:
----> 4     ds = xr.open_dataset(f)

~/.local/share/virtualenvs/condaenv-AbcDeF1z/lib/python3.7/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime)
    524                 "with engine='scipy' or 'h5netcdf'"
    525             )
--> 526         engine = _get_engine_from_magic_number(filename_or_obj)
    527         if engine == "scipy":
    528             store = backends.ScipyDataStore(filename_or_obj, **backend_kwargs)

~/.local/share/virtualenvs/condaenv-AbcDeF1z/lib/python3.7/site-packages/xarray/backends/api.py in _get_engine_from_magic_number(filename_or_obj)
    118                 "manager"
    119             )
--> 120         magic_number = filename_or_obj.read(8)
    121         filename_or_obj.seek(0)
    122 

~/.local/share/virtualenvs/condaenv-AbcDeF1z/src/fsspec/fsspec/implementations/http.py in read(self, length)
    257         else:
    258             length = min(self.size - self.loc, length)
--> 259         return super().read(length)
    260 
    261     def _fetch_all(self):

~/.local/share/virtualenvs/condaenv-AbcDeF1z/src/fsspec/fsspec/spec.py in read(self, length)
   1045             # don't even bother calling fetch
   1046             return b""
-> 1047         out = self.cache._fetch(self.loc, self.loc + length)
   1048         self.loc += len(out)
   1049         return out

~/.local/share/virtualenvs/condaenv-AbcDeF1z/src/fsspec/fsspec/core.py in _fetch(self, start, end)
    596         ):
    597             # First read, or extending both before and after
--> 598             self.cache = self.fetcher(start, bend)
    599             self.start = start
    600         elif start < self.start:

~/.local/share/virtualenvs/condaenv-AbcDeF1z/src/fsspec/fsspec/implementations/http.py in _fetch_range(self, start, end)
    311                         raise ValueError(
    312                             "Got more bytes so far (>%i) than requested (%i)"
--> 313                             % (cl, end - start)
    314                         )
    315                 else:

ValueError: Got more bytes so far (>5242929) than requested (5242888)
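
The byte counts in this error message can be reproduced by hand. This is a minimal sketch, assuming fsspec's HTTP file uses a default block size of 5 MiB (not stated in this thread) and that its read-ahead cache asks for the bytes wanted plus one block. The constant names are made up for illustration.

```python
# Assumption: default HTTP block size in fsspec is 5 MiB.
BLOCK_SIZE = 5 * 2**20

# From the traceback: xarray sniffs the format with filename_or_obj.read(8).
MAGIC_NUMBER_LENGTH = 8

# Assumption: the first fetch covers the 8 bytes plus one read-ahead block.
requested = MAGIC_NUMBER_LENGTH + BLOCK_SIZE

# Lower bound on bytes received, copied from the error message.
received_at_least = 5242929

# How far past the requested range the server had already gone.
overshoot = received_at_least - requested

print(requested)  # 5242888
print(overshoot)  # 41
```

Under those assumptions the request was for 5242888 bytes, which matches the number in the message, and the server had already sent at least 41 bytes more than that when the check fired.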

The example test NetCDF file above is only 5.2 MB in size. However, trying a smaller file of just 3.0 MB works:

import fsspec
import xarray as xr
with fsspec.open("http://test.opendap.org/opendap/data/nc/coads_climatology.nc") as f:
    ds = xr.open_dataset(f, decode_times=False)
    print(ds)
<xarray.Dataset>
Dimensions:  (COADSX: 180, COADSY: 90, TIME: 12)
Coordinates:
  * COADSX   (COADSX) float64 21.0 23.0 25.0 27.0 ... 373.0 375.0 377.0 379.0
  * COADSY   (COADSY) float64 -89.0 -87.0 -85.0 -83.0 ... 83.0 85.0 87.0 89.0
  * TIME     (TIME) float64 366.0 1.096e+03 1.827e+03 ... 7.671e+03 8.401e+03
Data variables:
    SST      (TIME, COADSY, COADSX) float32 ...
    AIRT     (TIME, COADSY, COADSX) float32 ...
    UWND     (TIME, COADSY, COADSX) float32 ...
    VWND     (TIME, COADSY, COADSX) float32 ...
Attributes:
    history:  FERRET V4.30 (debug/no GUI) 15-Aug-96

This was tested using the latest fsspec from master (commit e69b679) and xarray==0.14.0. Carried forward from intake/intake-xarray#56.

@martindurant
Member

Yep, you are hitting a known problem: fsspec wants random access to the file via HTTP Range requests, but this server doesn't respect them and sends back more data than was asked for. Typical remedies would be:

  • set block_size=0 in open, which allows you to stream through the file but doesn't allow random access (so won't work here)
  • set block_size to a number larger than the file, which will then be held all in memory
  • use filecache: to save a local copy, ideally with target_options={"block_size": 0}. This actually needs a small fix (coming)
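
Before picking one of these remedies, it can help to confirm that the server really ignores Range requests. Below is a rough sketch using only the Python standard library. The helper names `is_partial_response` and `server_honors_range` are hypothetical and not part of fsspec.

```python
import urllib.request


def is_partial_response(status: int, n_bytes: int, requested: int) -> bool:
    """Return True if a reply looks like a proper answer to a Range request."""
    # 206 Partial Content is the status for a served byte range; a 200 with
    # a larger body means the Range header was ignored.
    return status == 206 and n_bytes <= requested


def server_honors_range(url: str, nbytes: int = 8, timeout: float = 10.0) -> bool:
    """Ask for the first `nbytes` bytes of `url` and inspect the reply."""
    request = urllib.request.Request(
        url, headers={"Range": f"bytes=0-{nbytes - 1}"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Read one byte more than requested: enough to detect an oversized
        # reply without downloading the whole file.
        body = response.read(nbytes + 1)
        return is_partial_response(response.status, len(body), nbytes)
```

If `server_honors_range(url)` returns False for the file in question, random access with a small block_size is unlikely to work, and one of the remedies above applies.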

@weiji14
Author

weiji14 commented Oct 18, 2019

Brilliant! I can confirm that the below works with fsspec master at commit e6b7e1c.

import fsspec
import xarray as xr
with fsspec.open(
    urlpath="filecache://test.opendap.org/opendap/data/nc/coads_climatology2.nc",
    target_protocol="http",
    target_options={"block_size": 0}
) as f:
    ds = xr.open_dataset(f, decode_times=False)
    print(ds)
<xarray.Dataset>
Dimensions:  (COADSX: 180, COADSY: 90, TIME: 12)
Coordinates:
  * COADSX   (COADSX) float64 21.0 23.0 25.0 27.0 ... 373.0 375.0 377.0 379.0
  * COADSY   (COADSY) float64 -89.0 -87.0 -85.0 -83.0 ... 83.0 85.0 87.0 89.0
  * TIME     (TIME) float64 366.0 1.096e+03 1.827e+03 ... 7.671e+03 8.401e+03
Data variables:
    SST      (TIME, COADSY, COADSX) float32 ...
    AIRT     (TIME, COADSY, COADSX) float32 ...
    SPEH     (TIME, COADSY, COADSX) float32 ...
    WSPD     (TIME, COADSY, COADSX) float32 ...
    UWND     (TIME, COADSY, COADSX) float32 ...
    VWND     (TIME, COADSY, COADSX) float32 ...
    SLP      (TIME, COADSY, COADSX) float32 ...
Attributes:
    history:  FERRET V5.22   11-Jan-01
