Justin Fu edited this page Sep 16, 2018 · 3 revisions

V2 Features

This document contains experimental features for a future release of doodad.

DFile

The DFile library (based on GFile) provides a single file interface for both local and remote files. The dfile.open function detects whether a file is stored locally or on a remote service (for example via SSH or S3) and returns the appropriate file object. In most cases, this means you can write your code as if all files were local, without worrying about where the data is stored.

DFile detects the location of the file from the prefix of the filename. For some of these schemes (i.e., GCS, AWS S3, and SSH), you will need to configure the appropriate credentials first.

  • s3://<path> will map to a remote file via AWS S3.
  • gs://<path> will map to a remote file via Google Cloud Storage.
  • ssh://<username>@<hostname>/<path> will map to a remote file via SSH.
  • http://<path> or https://<path> will map to a web resource. Only reading is allowed for this type of file.
  • docker://<container-id>/<path> will map to a file inside a locally running docker container.
  • All other filenames will be mapped to the local filesystem.
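The prefix rules above amount to dispatching on the URL scheme of the filename. The following is a hypothetical sketch using Python's standard urllib.parse, not doodad's actual implementation. The helper names detect_backend and split_ssh and the backend labels they return are made up for illustration.

```python
# Hypothetical sketch of prefix-based dispatch; not doodad's actual code.
from urllib.parse import urlparse

# Illustrative mapping from URL scheme to a backend label.
_SCHEMES = {
    "s3": "aws_s3",
    "gs": "google_cloud_storage",
    "ssh": "ssh",
    "http": "http",
    "https": "http",
    "docker": "docker",
}


def detect_backend(filename: str) -> str:
    """Return the backend label a filename would map to under the rules above."""
    scheme = urlparse(filename).scheme
    # Anything without a recognized scheme falls back to the local filesystem.
    return _SCHEMES.get(scheme, "local")


def split_ssh(filename: str):
    """Split ssh://<username>@<hostname>/<path> into its three parts."""
    parsed = urlparse(filename)
    return parsed.username, parsed.hostname, parsed.path


print(detect_backend("s3://my.bucket/my_file.txt"))
print(split_ssh("ssh://user@hostname.com/tmp/my_file.txt"))
```

Parsing the scheme rather than matching raw string prefixes keeps the lookup in one table and lets urlparse handle the username, hostname, and path of the SSH form.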

Here is some example usage:

import doodad.dfile as dfile

with dfile.open('s3://my.bucket/my_file.txt', mode='r') as f:
    # This will read a file from S3
    contents = f.read()

with dfile.open('ssh://user@hostname.com/tmp/my_file.txt', mode='w') as f:
    # This will write a file and copy it to user@hostname.com via SSH
    f.write('hello world')
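The SSH example above describes writing a file and then copying it to the remote host. One plausible way to get that behavior is to buffer writes in a local temporary file and transfer it when the file is closed. This is a hypothetical sketch, not doodad's implementation: the class UploadOnCloseFile and the upload callback (a stand-in for an scp or S3 transfer) are assumptions made for illustration.

```python
# Hypothetical sketch: buffer writes locally, then "upload" on close.
import os
import tempfile
from typing import Callable


class UploadOnCloseFile:
    """Writes go to a local temp file; upload(local_path) runs on close.

    `upload` is an assumed stand-in for a real transfer (e.g. scp or S3).
    """

    def __init__(self, upload: Callable[[str], None], mode: str = "w"):
        fd, self._path = tempfile.mkstemp()
        self._file = os.fdopen(fd, mode)
        self._upload = upload
        self.closed = False

    def write(self, data):
        return self._file.write(data)

    def close(self, upload: bool = True):
        if self.closed:
            return
        self.closed = True
        self._file.close()  # flush everything to disk before transferring
        try:
            if upload:
                self._upload(self._path)
        finally:
            os.remove(self._path)  # remove the local staging copy

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Only upload if the with-block finished without an exception.
        self.close(upload=exc_type is None)
        return False


# Usage: a fake upload that just records what would have been sent.
received = []


def fake_upload(path):
    with open(path) as src:
        received.append(src.read())


with UploadOnCloseFile(fake_upload) as f:
    f.write("hello world")

print(received)  # ['hello world']
```

With this design the remote copy only appears after the with block exits, and a block that raises leaves the remote side untouched.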

Some external libraries are hardcoded to use Python's built-in open function. In that case, you can override the open built-in with dfile.open to force the external library to use DFile:

import doodad.dfile as dfile

with dfile.override_builtin():
    # While this block is active, the open built-in is replaced by dfile.open
    import external_library

Credentials

The credentials library manages credentials for remote services such as SSH and AWS.

doodad.credentials.aws

doodad.credentials.ssh

TODO: Explain how to configure credentials.
