Warning: this code is experimental and untested.
This package is on PyPI and can be installed using:
pip install adlfs
To use the Gen1 filesystem:
import dask.dataframe as dd
from fsspec.registry import known_implementations
known_implementations['adl'] = {'class': 'adlfs.AzureDatalakeFileSystem'}
STORAGE_OPTIONS={'tenant_id': TENANT_ID, 'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET}
ddf = dd.read_csv('adl://{STORE_NAME}/{FOLDER}/*.csv', storage_options=STORAGE_OPTIONS)
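After the known_implementations registration above, the Gen1 filesystem can also be used outside of Dask through fsspec directly. The following is an untested sketch: TENANT_ID, CLIENT_ID, CLIENT_SECRET, and STORE_NAME are placeholders, and the store_name keyword is assumed to be accepted by the Gen1 constructor.
import fsspec
# Instantiate the Gen1 filesystem directly (placeholder credentials).
fs = fsspec.filesystem(
    'adl',
    tenant_id=TENANT_ID,
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    store_name=STORE_NAME,  # assumed keyword; the store name can also come from the adl:// URL
)
# List the CSV files that the dd.read_csv call above would read.
print(fs.glob('{FOLDER}/*.csv'))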
To use the Gen2 filesystem:
import dask.dataframe as dd
from fsspec.registry import known_implementations
known_implementations['abfs'] = {'class': 'adlfs.AzureBlobFileSystem'}
STORAGE_OPTIONS={'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY}
ddf = dd.read_csv('abfs://{CONTAINER}/{FOLDER}/*.csv', storage_options=STORAGE_OPTIONS)
ddf = dd.read_parquet('abfs://{CONTAINER}/folder.parquet', storage_options=STORAGE_OPTIONS)
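The Gen2 filesystem can likewise be used directly through fsspec, for example to browse a container before handing paths to Dask. This is a minimal, untested sketch reusing the ACCOUNT_NAME and ACCOUNT_KEY placeholders above; the file path is hypothetical.
import fsspec
# Instantiate the Gen2 (Blob-backed) filesystem directly (placeholder credentials).
fs = fsspec.filesystem('abfs', account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY)
# List the contents of a container and open a single file.
print(fs.ls('{CONTAINER}'))
with fs.open('{CONTAINER}/{FOLDER}/example.csv', 'rb') as f:  # hypothetical file path
    first_line = f.readline()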
The package includes pythonic filesystem implementations for both Azure Datalake Gen1 and Azure Datalake Gen2, allowing either to be used with Dask. This is done by leveraging the intake/filesystem_spec base class and the Azure Python SDKs.
Operations against the Gen1 Datalake currently only work with an Azure Service Principal that has suitable credentials to perform operations on the resources of choice.
Operations against the Gen2 Datalake are implemented by leveraging multi-protocol access, using the Azure Blob Storage Python SDK. Authentication is currently implemented only via a storage account name and account key.
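For reference, the two authentication models map onto the two filesystem classes roughly as follows. This is a sketch only; the keyword names are taken from the examples above, and store_name is an assumed argument for the Gen1 constructor.
from adlfs import AzureDatalakeFileSystem, AzureBlobFileSystem
# Gen1: authenticate with an Azure Service Principal (placeholder credentials).
gen1_fs = AzureDatalakeFileSystem(
    tenant_id=TENANT_ID,
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    store_name=STORE_NAME,  # assumed keyword, as in the Gen1 sketch above
)
# Gen2: authenticate with the storage account name and account key only.
gen2_fs = AzureBlobFileSystem(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY)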