feat(utils): implement LakeFileSystem for data lake operations and add documentation #64
Merged
TordAreStromsnes merged 6 commits into `main` on Feb 4, 2026.
… streamline LakeFileSystem operations
leefw (Contributor) reviewed on Feb 4, 2026 and left a comment:
There is a lot of string manipulation for path-specific functionality. Consider using pathlib to help.
Author (Contributor) replied:
@leefw I used both pathlib (with `PurePosixPath`) and an internal method in fsspec, which let me remove the `get_parent` method.
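The pathlib approach discussed in this thread can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `parent_path`, `join_path`, and `_split_protocol` are hypothetical helpers, and the example paths are made up. One caveat worth knowing: `PurePosixPath` collapses the `//` in URL-style paths such as `abfss://...`, so the sketch splits off any protocol prefix before handing the rest to pathlib.

```python
from pathlib import PurePosixPath


def _split_protocol(path: str) -> tuple[str, str]:
    """Split 'abfss://rest' into ('abfss://', 'rest'); plain paths get ''."""
    # rpartition returns ('', '', path) when '://' is absent.
    protocol, sep, rest = path.rpartition("://")
    return protocol + sep, rest


def parent_path(path: str) -> str:
    """Hypothetical helper: parent directory of a lake path."""
    prefix, rest = _split_protocol(path)
    # PurePosixPath strips the last segment, replacing manual rsplit("/")
    # logic and its edge-case checks.
    return prefix + str(PurePosixPath(rest).parent)


def join_path(base: str, *parts: str) -> str:
    """Hypothetical helper: join segments without hand-written '/' handling."""
    prefix, rest = _split_protocol(base)
    return prefix + str(PurePosixPath(rest).joinpath(*parts))


if __name__ == "__main__":
    url = "abfss://lake@account.dfs.core.windows.net/raw/2026/data.json"
    print(parent_path(url))  # abfss://lake@account.dfs.core.windows.net/raw/2026
    print(join_path("container/raw", "2026", "data.json"))  # container/raw/2026/data.json
    # Without splitting the protocol first, pathlib collapses the '//':
    print(PurePosixPath("abfss://x/y"))  # abfss:/x/y
```

Keeping the protocol handling in one small function means pathlib only ever sees the path part, where its POSIX semantics are correct.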
leefw approved these changes on Feb 4, 2026.
This was referenced Feb 4, 2026
TordAreStromsnes pushed a commit that referenced this pull request on Feb 4, 2026:
🤖 I have created a release *beep* *boop*

---

## [0.5.0](dataorc-utils-v0.4.0...dataorc-utils-v0.5.0) (2026-02-04)

### Features

* introduce dictionary functionality for environment variables access ([#57](#57)) ([b6291fa](b6291fa))
* **utils:** implement LakeFileSystem for data lake operations and add documentation ([#64](#64)) ([be9e738](be9e738))
* **utils:** support optional revision suffix in version format and update tests ([#59](#59)) ([8ea0b60](8ea0b60))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
TordAreStromsnes pushed a commit that referenced this pull request on Feb 5, 2026:
🤖 I have created a release *beep* *boop*

---

## [0.2.0](dataorc-v0.1.1...dataorc-v0.2.0) (2026-02-05)

### Features

* add Azure Key Vault support and documentation ([#42](#42)) ([abc42a0](abc42a0))
* create config tool for pipeline setup ([#22](#22)) ([21a8a84](21a8a84))
* introduce dictionary functionality for environment variables access ([#57](#57)) ([b6291fa](b6291fa))
* mount data lake ([#31](#31)) ([0bb3e51](0bb3e51))
* **utils:** add argument parsing helper for Databricks wheel tasks ([#43](#43)) ([393c6a2](393c6a2))
* **utils:** add retry logic and customizable parameters for get_keyvault_secret ([#63](#63)) ([acbc2b7](acbc2b7))
* **utils:** implement LakeFileSystem for data lake operations and add documentation ([#64](#64)) ([be9e738](be9e738))
* **utils:** support optional revision suffix in version format and update tests ([#59](#59)) ([8ea0b60](8ea0b60))
* **utils:** treat env as plain string and default to "dev" ([#50](#50)) ([65473a8](65473a8))

### Documentation

* add changelog tab ([#20](#20)) ([2ec4271](2ec4271))
* add CI status badge ([#9](#9)) ([8de41fe](8de41fe))
* add contributing guidelines ([#15](#15)) ([434cf31](434cf31))
* add developing instructions ([#33](#33)) ([835a35e](835a35e))
* add early development phase warning ([#39](#39)) ([406746d](406746d))
* bootstrap package ([#6](#6)) ([afbb765](afbb765))
* build docs using uv ([#36](#36)) ([15a1125](15a1125))
* initialize documentation structure ([#8](#8)) ([0adb45d](0adb45d))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This PR creates a unified, path-agnostic interface for file operations on Azure Data Lake Storage, designed for use in Databricks pipelines. The implementation abstracts away environment differences (local vs. Databricks), supports text and JSON file operations, and includes robust error handling. The module is documented and covered by tests, and `fsspec` is added as a required dependency.

New Lake Module Implementation
* `LakeFileSystem` class, offering unified methods for reading, writing, and deleting text and JSON files, as well as directory operations, without assuming any path normalization or mount conventions.
* … (`backend.py`), directory operations (`directory.py`), and text/JSON I/O (`text_io.py`), using `fsspec` for filesystem abstraction.

Documentation and Testing
* Provided comprehensive documentation for the new module, including usage examples, API reference, and integration notes for pipeline configuration.
* Added a test suite covering all major `LakeFileSystem` operations.

Project Configuration
* Declared `fsspec` as a required dependency in `pyproject.toml`.
* Updated documentation navigation to include the new Lake module overview.
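The unified interface described in this PR can be approximated with a thin wrapper over `fsspec`. The sketch below is hypothetical: the class name `SketchLakeFileSystem` and its method names are assumptions for illustration, not the API merged here. It passes paths to `fsspec` unchanged (no normalization or mount conventions), selects the backend by protocol, and re-raises missing-file and invalid-JSON errors with the offending path. Reaching ADLS through a protocol such as `abfs` would additionally require the separate `adlfs` package, which the PR does not mention.

```python
import json
import posixpath
from typing import Any

import fsspec


class SketchLakeFileSystem:
    """Hypothetical, simplified stand-in for LakeFileSystem (not the PR's code).

    Paths are handed to fsspec as-is: no normalization, no mount prefixes.
    """

    def __init__(self, protocol: str = "file", **storage_options: Any) -> None:
        # "file" for local runs; an ADLS protocol such as "abfs" would need the
        # separate adlfs package installed (assumption, not stated in the PR).
        self.fs = fsspec.filesystem(protocol, **storage_options)

    def exists(self, path: str) -> bool:
        return self.fs.exists(path)

    def read_text(self, path: str, encoding: str = "utf-8") -> str:
        try:
            with self.fs.open(path, "r", encoding=encoding) as f:
                return f.read()
        except FileNotFoundError as exc:
            # Re-raise with the lake path so pipeline logs show what was missing.
            raise FileNotFoundError(f"Lake file not found: {path}") from exc

    def write_text(self, path: str, content: str, encoding: str = "utf-8") -> None:
        parent = posixpath.dirname(path)
        if parent:
            # Local filesystems need the directory to exist before writing.
            self.fs.makedirs(parent, exist_ok=True)
        with self.fs.open(path, "w", encoding=encoding) as f:
            f.write(content)

    def read_json(self, path: str) -> Any:
        text = self.read_text(path)
        try:
            return json.loads(text)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Invalid JSON in lake file {path}: {exc}") from exc

    def write_json(self, path: str, data: Any, indent: int = 2) -> None:
        self.write_text(path, json.dumps(data, indent=indent))

    def list_dir(self, path: str) -> list[str]:
        # detail=False returns plain path strings instead of info dicts.
        return sorted(self.fs.ls(path, detail=False))

    def delete(self, path: str, recursive: bool = False) -> None:
        if not self.fs.exists(path):
            raise FileNotFoundError(f"Lake path not found: {path}")
        self.fs.rm(path, recursive=recursive)


if __name__ == "__main__":
    # The in-memory backend needs no storage account, which suits tests.
    lake = SketchLakeFileSystem("memory")
    lake.write_json("/lake/raw/config.json", {"env": "dev"})
    print(lake.read_json("/lake/raw/config.json"))  # {'env': 'dev'}
    print(lake.list_dir("/lake/raw"))
```

Because the backend is chosen only by protocol, the same calls can run against fsspec's in-memory filesystem in a test suite and against real lake storage in a pipeline.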