DM-24475: Rewrite of Butler URI and major clean up of S3 usage #336

Merged on Jul 30, 2020 (45 commits)

Commits on Jul 21, 2020

  1. Add more transfer modes to posix export

    We really need to generalize the code that transfers one
    file to another location so that ingest and export use
    the same code and preferably can be shared amongst different
    datastores. You can imagine ingest from S3 to local file or
    export from S3 to local file or export from local file
    to S3.  It is not scalable to put this code in multiple
    places in multiple datastores.
    timj committed Jul 21, 2020
    f02ff92
  2. Move ButlerURI to separate file

    timj committed Jul 21, 2020
    1cd2aff
  3. Refactor ButlerURI to start creating subclasses

    Now you get different subclasses for schemeless, file
    and generic URIs.  This simplifies some of the
    logic for handling path components.
    timj committed Jul 21, 2020
    54208da
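
The subclass split described in the refactor above can be sketched as a scheme-based dispatch inside `__new__`. This is a minimal sketch, not the actual ButlerURI code: `ButlerURISketch` and its three subclasses are hypothetical stand-ins, and only `urllib.parse` from the standard library is assumed.

```python
from urllib.parse import urlparse


class ButlerURISketch:
    """Hypothetical stand-in for ButlerURI that picks a subclass by scheme."""

    def __new__(cls, uri: str) -> "ButlerURISketch":
        subclass = cls
        if cls is ButlerURISketch:
            # Only dispatch when the base class itself is called.
            scheme = urlparse(uri).scheme
            if not scheme:
                subclass = SchemelessURISketch
            elif scheme == "file":
                subclass = FileURISketch
            else:
                subclass = GenericURISketch
        return object.__new__(subclass)

    def __init__(self, uri: str) -> None:
        self._parsed = urlparse(uri)

    @property
    def path(self) -> str:
        return self._parsed.path


class SchemelessURISketch(ButlerURISketch):
    """A bare path such as "data/file.yaml" with no scheme at all."""


class FileURISketch(ButlerURISketch):
    """An explicit "file://" URI."""


class GenericURISketch(ButlerURISketch):
    """Any other scheme, for example "s3://" or "https://"."""
```

Because `__new__` returns an instance of the class being called (here via one of its subclasses), Python still runs `__init__` on it, so callers write `ButlerURISketch("s3://bucket/key.yaml")` and get a `GenericURISketch` back.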

Commits on Jul 22, 2020

  1. Refactor the URI fix up code for subclasses

    This now leaves us with a tiny bit of duplication
    between schemeless and file fixups.
    timj committed Jul 22, 2020
    8287312
  2. Simplify schemeless fix-up code

    Defer converting to posix form until the end, thereby letting
    us use os.sep everywhere.
    timj committed Jul 22, 2020
    f926cb7
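
One way to read "defer converting to posix form until the end" is sketched below: every intermediate step uses native `os.path` functions and `os.sep`, and a single final conversion produces the POSIX-style path a URI needs. The function name, its parameters, and the use of `os.path.normpath` are assumptions for illustration, not the actual fix-up code.

```python
import os
import pathlib


def fixup_schemeless_sketch(path: str, root: str, force_directory: bool = False) -> str:
    """Hypothetical fix-up for a schemeless path, returning a POSIX-style path.

    Intermediate steps use native os.path functions and os.sep; conversion
    to POSIX form happens once, at the very end.
    """
    if not os.path.isabs(path):
        path = os.path.join(root, path)
    path = os.path.normpath(path)
    if force_directory and not path.endswith(os.sep):
        path += os.sep
    # PurePath.as_posix() drops a trailing separator, so restore it if needed.
    posix = pathlib.PurePath(path).as_posix()
    if path.endswith(os.sep) and not posix.endswith("/"):
        posix += "/"
    return posix
```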

Commits on Jul 23, 2020

  1. Add concrete methods for transfer and exists of URI resources

    ButlerURI.transfer_from(URI) now works for file, http, and S3.
    timj committed Jul 23, 2020
    0641357
  2. Change ButlerURI.dirname so that it does not force absolute path

    It is a surprise to see a schemeless relative path become
    a schemeless absolute path when asking for the directory
    component. Changing this required a couple of tests to
    be updated that were assuming absolute paths.
    timj committed Jul 23, 2020
    b2ce3f8
  3. Remove abstract method

    mypy gets confused because __new__ does not return a ButlerURI,
    which leads it to think that we are calling abstract methods.
    timj committed Jul 23, 2020
    b4c8cce
  4. Add ButlerURI.join

    timj committed Jul 23, 2020
    12a4ddd
  5. a8f873e
  6. f2c0f62
  7. dfb206e
  8. Add ButlerURI.read method

    timj committed Jul 23, 2020
    446f779
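
The dirname change for schemeless paths can be sketched with `posixpath`: the directory of a relative path stays relative instead of being anchored to the current working directory. `dirname_sketch` is a hypothetical helper, and returning `"./"` for a bare filename is an assumption about how an empty directory component might be represented.

```python
import posixpath


def dirname_sketch(path: str) -> str:
    """Hypothetical dirname for a schemeless path that keeps relative paths relative.

    The result ends in "/" so that it can be treated as a directory-like URI.
    """
    head, _ = posixpath.split(path)
    if not head:
        # A bare filename lives in the current directory; do not resolve it
        # against os.getcwd(), which would silently make it absolute.
        return "./"
    return head if head.endswith("/") else head + "/"
```

With this approach `dirname_sketch("a/b/c.txt")` gives `"a/b/"` rather than an absolute path rooted at the current directory, while absolute inputs such as `"/x/y.txt"` still give `"/x/"`.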

Commits on Jul 24, 2020

  1. Add ButlerURI.isabs

    Simple routine that special cases schemeless URIs.
    timj committed Jul 24, 2020
    640f65d
  2. ff38951
  3. Fix join dirLike property

    timj committed Jul 24, 2020
    ad31c1c
  4. Add pkg resource URI support

    timj committed Jul 24, 2020
    dcbc0cd
  5. Replace S3 and Resource calls with ButlerURI

    Resource access is now unified for files, S3, HTTP, and
    pkg_resources, which greatly simplifies I/O.

    Still need to sort out dumpToFile.
    timj committed Jul 24, 2020
    bbff6db
  6. c2fe7ab
  7. 4ca5e3a
  8. For local files check if URI is a directory

    This helps in cases such as tempfile.mkdtemp, where the returned
    path has no trailing slash and is passed around as a plain string
    rather than as a ButlerURI.
    timj committed Jul 24, 2020
    86f9917
  9. Reimplement Config.dumpToUri to use ButlerURI.write

    This removes all the special casing for dumpToFile and
    dumpToS3 in Config.
    timj committed Jul 24, 2020
    a20c8a4
  10. 6164272
  11. Add ButlerURI.mkdir

    timj committed Jul 24, 2020
    4c9227e
  12. 38d9e32
  13. 8894565
  14. Rewrite S3 ingest to use ButlerURI

    Removes a lot of code.
    timj committed Jul 24, 2020
    f9bef58
  15. Fix ButlerURI.relative_to for schemeless vs file URIs

    Absolute paths with a relative path should now work.
    timj committed Jul 24, 2020
    4602a74
  16. Use uri relative_to method in posix datastore

    At some point we should require that datastore.root be a URI
    rather than allowing it to be a plain file path.
    timj committed Jul 24, 2020
    94190dd
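
The directory check for local files can be sketched as a cheap syntactic test followed by a filesystem fallback. `is_dir_like_sketch` is a hypothetical helper; where exactly ButlerURI applies this check is not shown here and may differ.

```python
import os


def is_dir_like_sketch(path: str) -> bool:
    """Hypothetical check for whether a local path should be directory-like.

    A trailing separator is the cheap, purely syntactic signal. Paths such
    as the ones returned by tempfile.mkdtemp() have no trailing separator,
    so fall back to asking the local filesystem.
    """
    if path.endswith("/") or path.endswith(os.sep):
        return True
    return os.path.isdir(path)
```

The trade-off is a filesystem call for paths without a trailing slash, which only makes sense for local files; remote schemes such as S3 would have to rely on the syntactic signal.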

Commits on Jul 25, 2020

  1. 53acda9
  2. 15a4ca2
  3. Fix mistaken realpath read

    Calling realpath on the destination of a transfer is never the
    right thing to do. It was harmless when the destination did not
    exist, but when the destination existed and was a symlink pointing
    somewhere else, it moved the output to a completely different
    location.
    timj committed Jul 25, 2020
    d77ca4e
  4. Use local tempdir to work around relsymlink problem

    On macOS the /var folder is a link to /private/var, which
    means that if you create a temp directory in a test, put a file
    in it, and then call realpath, you end up in a completely
    different location that is nowhere near the test file you are
    trying to use with relsymlink. Using a local tempdir fixes
    this anomaly.
    timj committed Jul 25, 2020
    209137b
  5. Add transaction and move support to ButlerURI

    This enables posixDatastore to use it.
    timj committed Jul 25, 2020
    c456d35
  6. e0b62c3
  7. Add comment on full path for S3

    timj committed Jul 25, 2020
    e6a2a62
  8. Improve logging on rollback

    timj committed Jul 25, 2020
    0f5e3ca
  9. c9181fb
  10. f6b8cb5
  11. Add URI quoting and unquoting

    Files containing ? were breaking because urllib.parse would
    treat the name as a URI with query parameters. We therefore need
    to quote such paths before parsing in some cases, and unquote
    them when referring to local file resources.
    timj committed Jul 25, 2020
    dba3007
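
The quoting problem is easy to reproduce with the standard library: `urlparse("file:///data/file?.fits").path` is just `"/data/file"`, with `".fits"` ending up in the query component. A minimal sketch of the quote-then-unquote round trip follows; `local_path_from_file_uri_sketch` is a hypothetical helper, and it assumes an absolute POSIX path.

```python
from urllib.parse import quote, unquote, urlparse


def local_path_from_file_uri_sketch(path: str) -> str:
    """Hypothetical round trip of an absolute POSIX path through a file URI.

    Quoting first turns "?" into "%3F" (and "#" into "%23"), so urlparse
    cannot mistake them for a query or fragment; unquoting restores the
    original characters for local filesystem access.
    """
    if not path.startswith("/"):
        # "file://" + a relative path would put its first component in netloc.
        raise ValueError(f"Expected an absolute POSIX path, got {path!r}")
    parsed = urlparse("file://" + quote(path))
    return unquote(parsed.path)
```

`quote` leaves `/` alone by default, so the directory structure survives, and it also escapes a literal `%` as `%25`, which keeps the unquote step lossless.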

Commits on Jul 29, 2020

  1. e8054f8
  2. Make it explicit that symlink and relsymlink are being tested

    This is clearer than looking for a substring, which is easy
    to miss at a glance.
    timj committed Jul 29, 2020
    edec089
  3. da50958
  4. e8cbcc4
  5. Fail if we do not recognize the URI scheme.

    This required that we also add mem:// support for in-memory
    datastores.
    timj committed Jul 29, 2020
    6463d05
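
Failing fast on an unrecognized scheme can be as simple as checking the parsed scheme against an allow-list. This is a sketch under assumptions: `check_scheme_sketch` and the allow-list are hypothetical, and apart from `mem`, which the commit above names, the scheme names are inferred from the file, S3, and HTTP support mentioned earlier in this PR.

```python
from urllib.parse import urlparse

# Illustrative allow-list: "" means a schemeless local path and "mem" covers
# the in-memory datastore. The real set of schemes may differ.
KNOWN_SCHEMES_SKETCH = frozenset({"", "file", "s3", "http", "https", "mem"})


def check_scheme_sketch(uri: str) -> str:
    """Return the scheme of uri, raising ValueError if it is not recognized."""
    scheme = urlparse(uri).scheme
    if scheme not in KNOWN_SCHEMES_SKETCH:
        raise ValueError(f"Unrecognized URI scheme {scheme!r} in {uri!r}")
    return scheme
```

Raising at construction time means a typo such as `s4://` surfaces immediately instead of being treated as a generic remote URI and failing later during I/O.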