Getting 'GS' key error when reading a csv from GCS using gcsfs #162

Open
ohashmi1 opened this issue Jul 17, 2019 · 10 comments

Comments

@ohashmi1

Hi I upgraded gcsfs and now I get the following error:

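My code is pretty simple:

```
import dask.dataframe as dd

# Body of load_parse_file (see the traceback); the def line was not included in the post.
data = dd.read_csv(file_path, parse_dates=[date_column])\
        .compute()
return data
```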

It used to work but all of a sudden it stopped working.
file_path = gs://mybuck/res.csv

```
  File "main.py", line 51, in run
    data = load_parse_file(file_path=args.input_file)
  File "/FbProphet/prophet_gcp/utils.py", line 15, in load_parse_file
    data = dd.read_csv(file_path, parse_dates=[date_column])\
  File "/work/miniconda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 578, in read
    **kwargs
  File "/work/miniconda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 405, in read_pandas
    **(storage_options or {})
  File "/work/miniconda/lib/python3.7/site-packages/dask/bytes/core.py", line 93, in read_bytes
    fs, fs_token, paths = get_fs_token_paths(urlpath, mode="rb", storage_options=kwargs)
  File "/work/miniconda/lib/python3.7/site-packages/dask/bytes/core.py", line 425, in get_fs_token_paths
    fs, fs_token = get_fs(protocol, options)
  File "/work/miniconda/lib/python3.7/site-packages/dask/bytes/core.py", line 571, in get_fs
    cls = _filesystems[protocol]
KeyError: 'gs'
```

@martindurant
Member

Ah yes, sorry - my fault. For now, you can replace "gs" with "gcs".
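
A minimal sketch of that substitution, assuming the path is a plain string. The helper name `with_gcs_scheme` is hypothetical and not part of dask or gcsfs, and later replies in this thread report that the swap alone did not resolve the error.

```python
def with_gcs_scheme(url):
    """Rewrite a 'gs://' URL to 'gcs://'; any other string is returned unchanged."""
    prefix = "gs://"
    if url.startswith(prefix):
        return "gcs://" + url[len(prefix):]
    return url
```

For the path in the report, `with_gcs_scheme("gs://mybuck/res.csv")` gives `"gcs://mybuck/res.csv"`, which can then be passed to `dd.read_csv`.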

@ohashmi1
Author

I have tried both, still get the same issue

@martindurant
Member

Hm, actually on second thoughts, you are not using the new code at all.

I don't know why you are seeing this; there has been no change in dask (master) or gcsfs (release) yet. Can you show the contents of `dask.bytes.core._filesystems`, try `import gcsfs` explicitly, or run `dask.bytes.core.get_fs('gs')`?
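
The checks requested above can be gathered into one small script. This is only a sketch: `installed_versions` and `registered_protocols` are hypothetical helper names, and `dask.bytes.core._filesystems` is a private attribute that appears in the dask release shown in the tracebacks here but may be absent in other versions, so the lookup is guarded.

```python
import importlib


def installed_versions(names=("dask", "gcsfs", "fsspec")):
    """Map each package name to its __version__, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


def registered_protocols():
    """Return the protocols in dask's private registry, or None if it is unavailable."""
    try:
        core = importlib.import_module("dask.bytes.core")
    except ImportError:
        return None
    # Private attribute (assumption: it is not present in every dask version).
    registry = getattr(core, "_filesystems", None)
    return None if registry is None else sorted(registry)
```

Calling `registered_protocols()` before and after an explicit `import gcsfs` shows whether the import adds a `'gs'` or `'gcs'` entry. In the output posted further down this thread, the registry held only `'file'`.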

@martindurant
Member

Of course, the workaround for you may be simply to downgrade gcsfs until we have completed the transition to fsspec (which is the reason for a little turbulence right now).

@bnaul
Contributor

bnaul commented Jul 19, 2019

seeing the same behavior w/ 0.3.0:

[ins] In [2]: dd.read_csv('gs://gcp-public-data-landsat/index.csv.gz', compression=
         ...: 'gzip')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-2c32b0849045> in <module>
----> 1 dd.read_csv('gs://gcp-public-data-landsat/index.csv.gz', compression='gzip')

~/venvs/model/lib/python3.7/site-packages/dask/dataframe/io/csv.py in read(urlpath, blocksize, collection, lineterminator, compression, sample, enforce, assume_missing, storage_options, include_path_column, **kwargs)
    576             storage_options=storage_options,
    577             include_path_column=include_path_column,
--> 578             **kwargs
    579         )
    580

~/venvs/model/lib/python3.7/site-packages/dask/dataframe/io/csv.py in read_pandas(reader, urlpath, blocksize, collection, lineterminator, compression, sample, enforce, assume_missing, storage_options, include_path_column, **kwargs)
    403         compression=compression,
    404         include_path=include_path_column,
--> 405         **(storage_options or {})
    406     )
    407

~/venvs/model/lib/python3.7/site-packages/dask/bytes/core.py in read_bytes(urlpath, delimiter, not_zero, blocksize, sample, compression, include_path, **kwargs)
     91
     92     """
---> 93     fs, fs_token, paths = get_fs_token_paths(urlpath, mode="rb", storage_options=kwargs)
     94
     95     if len(paths) == 0:

~/venvs/model/lib/python3.7/site-packages/dask/bytes/core.py in get_fs_token_paths(urlpath, mode, num, name_function, storage_options)
    423         update_storage_options(options, storage_options)
    424
--> 425         fs, fs_token = get_fs(protocol, options)
    426
    427         if "w" in mode:

~/venvs/model/lib/python3.7/site-packages/dask/bytes/core.py in get_fs(protocol, storage_options)
    569             "    pip install gcsfs",
    570         )
--> 571         cls = _filesystems[protocol]
    572
    573     elif protocol in ["adl", "adlfs"]:

KeyError: 'gs'

re: the q's you asked above

[nav] In [7]: import dask.bytes.core
         ...: dask.bytes.core._filesystems
         ...:
Out[7]: {'file': dask.bytes.local.LocalFileSystem}


[nav] In [9]: import dask.bytes.core
         ...: dask.bytes.core.get_fs('gs')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-9-855d81a61db6> in <module>
      1 import dask.bytes.core
----> 2 dask.bytes.core.get_fs('gs')

~/venvs/model/lib/python3.7/site-packages/dask/bytes/core.py in get_fs(protocol, storage_options)
    569             "    pip install gcsfs",
    570         )
--> 571         cls = _filesystems[protocol]
    572
    573     elif protocol in ["adl", "adlfs"]:

KeyError: 'gs'

@martindurant
Member

I'm afraid you need to use the master version of dask to pick this up, following dask/dask#5064

@bnaul
Contributor

bnaul commented Jul 19, 2019

Sounds good, seems like this is resolved then?

@PoradaKev

PoradaKev commented Jul 25, 2019

C:\Anaconda3\lib\site-packages\dask\dataframe\io\csv.py in read(urlpath, blocksize, collection, lineterminator, compression, sample, enforce, assume_missing, storage_options, include_path_column, **kwargs)
    576             storage_options=storage_options,
    577             include_path_column=include_path_column,
--> 578             **kwargs
    579         )
    580 

C:\Anaconda3\lib\site-packages\dask\dataframe\io\csv.py in read_pandas(reader, urlpath, blocksize, collection, lineterminator, compression, sample, enforce, assume_missing, storage_options, include_path_column, **kwargs)
    403         compression=compression,
    404         include_path=include_path_column,
--> 405         **(storage_options or {})
    406     )
    407 

C:\Anaconda3\lib\site-packages\dask\bytes\core.py in read_bytes(urlpath, delimiter, not_zero, blocksize, sample, compression, include_path, **kwargs)
     91 
     92     """
---> 93     fs, fs_token, paths = get_fs_token_paths(urlpath, mode="rb", storage_options=kwargs)
     94 
     95     if len(paths) == 0:

C:\Anaconda3\lib\site-packages\dask\bytes\core.py in get_fs_token_paths(urlpath, mode, num, name_function, storage_options)
    423         update_storage_options(options, storage_options)
    424 
--> 425         fs, fs_token = get_fs(protocol, options)
    426 
    427         if "w" in mode:

C:\Anaconda3\lib\site-packages\dask\bytes\core.py in get_fs(protocol, storage_options)
    569             "    pip install gcsfs",
    570         )
--> 571         cls = _filesystems[protocol]
    572 
    573     elif protocol in ["adl", "adlfs"]:

KeyError: 'gcs'

Have the same issue now
dask 2.1.0 py_0
dask-core 2.1.0 py_0
gcsfs 0.3.0 py_0 conda-forge

@martindurant
Member

@PoradaKev - you have a version of gcsfs that is too new for your Dask release. Either downgrade gcsfs, or install Dask from master.

@chwonghk01

I tried that: dask==2.1.0 with gcsfs==0.2.3 works.
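
The version reports in this thread can be turned into a quick check. The sketch below uses hypothetical helpers (`parse_version`, `likely_affected`) with thresholds taken only from the combinations reported here (dask 2.1.0 with gcsfs 0.3.0 fails, dask 2.1.0 with gcsfs 0.2.3 works). Treating every dask release up to 2.1.0 and every gcsfs release from 0.3.0 as affected is an assumption, not something the thread verifies.

```python
import re


def parse_version(text):
    """Turn a string such as '0.2.3' into (0, 2, 3), reading at most three numeric parts."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def likely_affected(dask_version, gcsfs_version):
    """True when the pair resembles the failing combination reported in this thread."""
    # Assumed thresholds: dask <= 2.1.0 together with gcsfs >= 0.3.0.
    old_dask = parse_version(dask_version) <= (2, 1, 0)
    new_gcsfs = parse_version(gcsfs_version) >= (0, 3, 0)
    return old_dask and new_gcsfs
```

Passing in `dask.__version__` and `gcsfs.__version__` gives a rough indication of whether downgrading gcsfs or installing Dask from master is worth trying.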
