Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align and test handling of directories in S3_File #9704

Open
12 tasks
radeusgd opened this issue Apr 15, 2024 · 1 comment
Open
12 tasks

Align and test handling of directories in S3_File #9704

radeusgd opened this issue Apr 15, 2024 · 1 comment
Assignees
Labels
-libs Libraries: New libraries to be implemented l-writedata x-new-feature Type: new feature request

Comments

@radeusgd
Copy link
Member

radeusgd commented Apr 15, 2024

All of our file APIs have a create_directory method, except for S3_File.

This is not surprising, because S3 does not really have a concept of directories like most filesystems. Just creating a file with path a/b/c/f.txt makes directories a/, a/b/ and a/b/c/ existent.

The AWS Console is using a 'trick' to be able to create an empty directory. To create a directory a/b/c/ it creates an empty object with key a/b/c/ (with the trailing slash) to indicate the directory exists. We can likely replicate this behaviour as well.

All operations should still work without having to explicitly create the parent directories, like one would need to in regular filesystems. The user should not need to care about these with S3, but at the same time if the user wants to explicitly create an empty directory - that should also be possible with the trick mentioned above.

  • Add create_directory method to S3_File.
    • It should only work if self.is_directory is true. For path a/b/c/ it should only create one key a/b/c/, it should not create additional keys along the path - e.g. a/b/ - these directories will start to exist 'virtually' by the fact of the child key existing.
  • Add tests for handling directory logic.
    • After explicitly creating a directory with create_directory, it should yield exists == True.
    • It still should be possible to write a nested file without explicitly creating all parent directories.
    • Such parent directories should also yield exists == True even if not explicitly created.
    • Test that both explicitly and implicitly created directories show up in the same way in S3_File.list.
    • Test that if the directory was created explicitly, it still exists after all files created inside of it are deleted.
    • Test that if the directory was created implicitly, it automatically disappears once all files inside of it are deleted.
    • Add tests for delete with recursive flag, consistent with local and Cloud variants.
      • delete recursive=False should fail if there are any files or subdirectories inside of a given directory, but should succeed if the directory is empty and exists only because it was explicitly created with create_directory.
      • delete recursive=True should always delete the directory entry (if it exists) and any files inside of it.
  • Check how this integrates with S3's "directory" buckets.
    • Will this work out of the box? Do we need something else to improve the integration there? How can we detect the bucket type?
    • If needed, create a followup ticket.
@radeusgd radeusgd self-assigned this Apr 15, 2024
@radeusgd radeusgd added x-new-feature Type: new feature request -libs Libraries: New libraries to be implemented l-writedata labels Apr 15, 2024
@radeusgd
Copy link
Member Author

As reference, a worthwhile discussion of the edge cases around S3 virtual 'directory' can be found here: fsspec/s3fs#401

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-libs Libraries: New libraries to be implemented l-writedata x-new-feature Type: new feature request
Projects
Status: 📤 Backlog
Development

No branches or pull requests

1 participant