
WIP: add az #91

Closed
wants to merge 1 commit into from

Conversation

@raybellwaves (Contributor) commented on Aug 22, 2020

Closes #78

TODO:

  • Wait for next fsspec release
  • Run test again
  • Add info to README

@raybellwaves marked this pull request as draft on August 22, 2020 at 01:50
@raybellwaves (Contributor, Author) commented:

Currently the test is failing at:

```
=============================================== FAILURES ===============================================
__________________________________________ test_dask_parquet ___________________________________________

storage = <azure.storage.blob._blob_service_client.BlobServiceClient object at 0x1170b40d0>

    def test_dask_parquet(storage):
        fs = adlfs.AzureBlobFileSystem(
            account_name=storage.account_name, connection_string=CONN_STR
        )
        fs.mkdir("test")
        STORAGE_OPTIONS = {
            "account_name": "devstoreaccount1",
            "connection_string": CONN_STR,
        }
        df = pd.DataFrame(
            {
                "col1": [1, 2, 3, 4],
                "col2": [2, 4, 6, 8],
                "index_key": [1, 1, 2, 2],
                "partition_key": [1, 1, 2, 2],
            }
        )

        dask_dataframe = dd.from_pandas(df, npartitions=1)
        for protocol in ["az", "abfs"]:
>           dask_dataframe.to_parquet(
                "{}://test/test_group.parquet".format(protocol),
                storage_options=STORAGE_OPTIONS,
                engine="pyarrow",
            )

adlfs/tests/test_core.py:488: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/anaconda3/envs/adlfs-dev/lib/python3.8/site-packages/dask/dataframe/core.py:3947: in to_parquet
    return to_parquet(self, path, *args, **kwargs)
/opt/anaconda3/envs/adlfs-dev/lib/python3.8/site-packages/dask/dataframe/io/parquet/core.py:384: in to_parquet
    fs, _, _ = get_fs_token_paths(path, mode="wb", storage_options=storage_options)
/opt/anaconda3/envs/adlfs-dev/lib/python3.8/site-packages/fsspec/core.py:576: in get_fs_token_paths
    cls = get_filesystem_class(protocol)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

protocol = 'az'

    def get_filesystem_class(protocol):
        """Fetch named protocol implementation from the registry

        The dict ``known_implementations`` maps protocol names to the locations
        of classes implementing the corresponding file-system. When used for the
        first time, appropriate imports will happen and the class will be placed in
        the registry. All subsequent calls will fetch directly from the registry.

        Some protocol implementations require additional dependencies, and so the
        import may fail. In this case, the string in the "err" field of the
        ``known_implementations`` will be given as the error message.
        """
        if not protocol:
            protocol = default

        if protocol not in registry:
            if protocol not in known_implementations:
>               raise ValueError("Protocol not known: %s" % protocol)
E               ValueError: Protocol not known: az

/opt/anaconda3/envs/adlfs-dev/lib/python3.8/site-packages/fsspec/registry.py:183: ValueError
--------------------------------------- Captured stdout teardown ---------------------------------------
Teardown azurite docker container
======================================= short test summary info ========================================
FAILED adlfs/tests/test_core.py::test_dask_parquet - ValueError: Protocol not known: az
```

I see the error is coming from `az` not being a known protocol at
https://github.com/intake/filesystem_spec/blob/master/fsspec/registry.py#L88

Do I need to make a PR there as well?
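For context, the lookup that raises this error can be sketched in a few lines of plain Python. The dict and function names below mirror `fsspec/registry.py`, but this is a simplified stand-in, not fsspec's actual code; the class paths are stored as strings here rather than imported:

```python
# Simplified sketch of fsspec's protocol registry (stand-in, not the real code).
# known_implementations maps protocol names to where the filesystem class lives.
known_implementations = {
    "abfs": {"class": "adlfs.AzureBlobFileSystem"},
    # Note: no "az" entry -- this is exactly what the failing test trips over.
}
registry = {}  # cache of already-resolved protocols


def get_filesystem_class(protocol):
    """Resolve a protocol name, caching it in the registry (as fsspec does)."""
    if protocol not in registry:
        if protocol not in known_implementations:
            raise ValueError("Protocol not known: %s" % protocol)
        registry[protocol] = known_implementations[protocol]["class"]
    return registry[protocol]


def register_implementation(name, cls):
    """Sketch of the fix: teach the registry a new protocol name."""
    known_implementations[name] = {"class": cls}


# Reproduces the test failure:
try:
    get_filesystem_class("az")
except ValueError as e:
    print(e)  # Protocol not known: az

# After registering "az", the lookup succeeds:
register_implementation("az", "adlfs.AzureBlobFileSystem")
print(get_filesystem_class("az"))  # adlfs.AzureBlobFileSystem
```

The upstream fix would have the same shape: either an `az` entry added to fsspec's `known_implementations`, or a call to fsspec's real `register_implementation` at import time; which route the project took is not settled in this thread.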

@raybellwaves (Contributor, Author) commented:

Closing in favor of #111

Successfully merging this pull request may close these issues.

use of 'az' as a shorter version of 'abfs'