AzureDatalakeStore

Overview

This package is used to access the Azure Datalake Store itself. According to the API reference, the store's REST API is based on WebHDFS.

Note that the goal of this package is to get things working for Azure Datalake Store. If it also ends up supporting WebHDFS in general, that will be a bonus.

WebHDFS seems to use a lot of redirects. However, a Microsoft blog post suggests that we may not have to handle them ourselves. If so, I don't know whether that would be a "feature" of Azure, of httr, or of curl.

What we will need

  • adls S3 object to hold the base-url and the token (done)

  • Make a directory (done)

  • List a directory (done)

  • Rename a file/folder in a directory (done)

  • Delete a file/directory (done)

  • Upload a file to a directory (done)

  • Read a file from a directory (done)

  • Append to a file in a directory (done)

  • Get file status (file/directory) (consider sending back as data-frame and reusing format from list_status)

  • Get contents summary (directory)

  • Concatenate files
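As a rough illustration of the first item and of how the remaining items map onto WebHDFS, here is a minimal base-R sketch. The function names `adls()` and `adls_url()`, the table `webhdfs_ops`, and the example account URL are assumptions made for illustration, not this package's actual API; the operation names and HTTP verbs are the standard WebHDFS ones.

```r
# Hypothetical constructor: an S3 object of class "adls" that holds the
# base URL and the token. Illustrative only, not the package's real API.
adls <- function(base_url, token) {
  stopifnot(is.character(base_url), length(base_url) == 1)
  structure(
    list(base_url = sub("/+$", "", base_url), token = token),
    class = "adls"
  )
}

# Standard WebHDFS operation names and HTTP verbs for the tasks listed above.
webhdfs_ops <- data.frame(
  task = c("make directory", "list directory", "rename", "delete",
           "upload", "read", "append", "file status",
           "content summary", "concatenate"),
  op = c("MKDIRS", "LISTSTATUS", "RENAME", "DELETE",
         "CREATE", "OPEN", "APPEND", "GETFILESTATUS",
         "GETCONTENTSUMMARY", "CONCAT"),
  verb = c("PUT", "GET", "PUT", "DELETE",
           "PUT", "GET", "POST", "GET",
           "GET", "POST"),
  stringsAsFactors = FALSE
)

# Build the request URL for one operation on a path.
# The path is not URL-encoded here; a real client would need to do that.
adls_url <- function(x, path, op) {
  stopifnot(inherits(x, "adls"), op %in% webhdfs_ops$op)
  path <- sub("^/+", "", path)
  paste0(x$base_url, "/", path, "?op=", op)
}

# Usage, with a made-up account name and a placeholder token.
store <- adls("https://example.azuredatalakestore.net/webhdfs/v1/",
              token = "not-a-real-token")
adls_url(store, "/tmp/data", "LISTSTATUS")
```

Keeping the base URL and token together in one object means every operation can take that object as its first argument, so credentials never have to be passed separately.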

For the file uploads, it may be easiest to provide a filepath (even a temporary one). This makes us dependent on a filesystem, whereas I would rather not be (shinyapps.io, for example).

Following readr, file can be:

  • form_file (upload_file())

Near future:

  • single string
      • no newlines: interpret as path, guess type
      • one-or-more newlines: interpret as text
  • raw vector: literal data

Farther future: handle connections and files that begin with http://, etc.
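The rules above for interpreting the file argument could be sketched as a small dispatcher. The name `classify_file()` is hypothetical, and the sketch only decides how the input would be treated; it does not read or upload anything. It relies on the fact that httr's `upload_file()` returns an object of class `form_file`.

```r
# Hypothetical helper: decide how a `file` argument would be interpreted.
classify_file <- function(file) {
  # An object created by httr::upload_file() has class "form_file".
  if (inherits(file, "form_file")) {
    return("form_file")
  }
  # A raw vector is taken as literal data.
  if (is.raw(file)) {
    return("raw")
  }
  # A single string is a path unless it contains at least one newline.
  if (is.character(file) && length(file) == 1) {
    if (grepl("\n", file, fixed = TRUE)) "text" else "path"
  } else {
    stop("`file` must be a form_file, a single string, or a raw vector")
  }
}

classify_file("data/a.csv")      # treated as a path
classify_file("a,b\n1,2\n")      # treated as literal text
classify_file(charToRaw("abc"))  # treated as literal data
```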

How about adls_is_empty() to see if a path returns anything?

Type-stability for functions that return file-status objects (thinking of list_status and get_file_status): when there is nothing to return, should they return an empty data-frame or NULL? Decision: NULL.
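A minimal sketch of the NULL convention, together with an emptiness check along the lines of the adls_is_empty() idea. The helper names `status_to_df()` and `is_empty_listing()` are hypothetical; the input is assumed to be a list of already-parsed WebHDFS FileStatus records, of which only the `pathSuffix`, `type`, and `length` fields are used here.

```r
# Hypothetical helper: turn parsed FileStatus records into a data frame,
# returning NULL (not an empty data frame) when there are no records.
status_to_df <- function(statuses) {
  if (length(statuses) == 0) {
    return(NULL)
  }
  rows <- lapply(statuses, function(s) {
    data.frame(
      pathSuffix = s$pathSuffix,
      type = s$type,
      length = s$length,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Hypothetical emptiness check built on the same convention.
is_empty_listing <- function(statuses) {
  is.null(status_to_df(statuses))
}

listing <- list(
  list(pathSuffix = "a.csv", type = "FILE", length = 10),
  list(pathSuffix = "sub", type = "DIRECTORY", length = 0)
)
status_to_df(listing)
is_empty_listing(list())
```

With this convention, callers test the result with `is.null()` rather than checking for zero rows.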

Installation

AzureDatalakeStore is not yet available on CRAN. You may install from GitHub:

# install.packages("devtools")
devtools::install_github("ijlyttle/AzureDatalakeStore")

If you encounter a clear bug, please file a minimal reproducible example on GitHub.

Acknowledgements

This package draws inspiration from Forest Fang's rwebhdfs package.

Code of conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

About

REST API Client for Microsoft Azure Data Lake Store
