feat(utils): implement LakeFileSystem for data lake operations and add documentation#64

Merged
TordAreStromsnes merged 6 commits into main from feat/lakefilesystem on Feb 4, 2026

@TordAreStromsnes
Contributor

This PR creates a unified, path-agnostic interface for file operations on Azure Data Lake Storage, designed for use in Databricks pipelines. The implementation abstracts away environment differences (local vs. Databricks), supports text and JSON file operations, and includes error handling. The module is documented and covered by tests, and fsspec is added as a required dependency.

New Lake Module Implementation

  • Added the LakeFileSystem class, offering unified methods for reading, writing, and deleting text and JSON files, as well as directory operations, without assuming any path normalization or mount conventions.
  • Implemented supporting modules for backend detection (backend.py), directory operations (directory.py), and text/JSON I/O (text_io.py), using fsspec for filesystem abstraction.
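
As a rough illustration of the kind of interface described above, the sketch below wraps an fsspec filesystem behind a few text and JSON helpers. The class name `TinyLakeFS`, its method names, and the use of fsspec's in-memory backend are assumptions made for this example; the actual LakeFileSystem API in this PR may differ.

```python
import json
from pathlib import PurePosixPath
from typing import Any

import fsspec


class TinyLakeFS:
    """Hypothetical stand-in for illustration; not the LakeFileSystem from this PR."""

    def __init__(self, protocol: str = "memory") -> None:
        # "memory" keeps the example self-contained; a real pipeline would
        # pick a backend that matches its environment.
        self.fs = fsspec.filesystem(protocol)

    def write_text(self, path: str, content: str) -> None:
        # Create the parent directory first so local-style backends do not fail.
        self.fs.makedirs(str(PurePosixPath(path).parent), exist_ok=True)
        with self.fs.open(path, "w", encoding="utf-8") as fh:
            fh.write(content)

    def read_text(self, path: str) -> str:
        with self.fs.open(path, "r", encoding="utf-8") as fh:
            return fh.read()

    def write_json(self, path: str, data: Any) -> None:
        self.write_text(path, json.dumps(data))

    def read_json(self, path: str) -> Any:
        return json.loads(self.read_text(path))

    def exists(self, path: str) -> bool:
        return self.fs.exists(path)

    def delete(self, path: str) -> None:
        self.fs.rm(path)


# Round trip against the in-memory backend.
lake = TinyLakeFS()
lake.write_json("/lake/config/state.json", {"last_run": "2026-02-04"})
print(lake.read_json("/lake/config/state.json"))
lake.delete("/lake/config/state.json")
print(lake.exists("/lake/config/state.json"))
```

Keeping the backend behind a single `fsspec.filesystem(...)` call is what lets the same methods run locally and in a pipeline; only the protocol argument would change.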

Documentation and Testing

  • Provided comprehensive documentation for the new module, including usage examples, API reference, and integration notes for pipeline configuration.

  • Added a test suite covering all major LakeFileSystem operations.

Project Configuration

  • Declared fsspec as a required dependency in pyproject.toml.

  • Updated documentation navigation to include the new Lake module overview.

@TordAreStromsnes TordAreStromsnes changed the title from "feat(lake): implement LakeFileSystem for data lake operations and add documentation" to "feat(utils): implement LakeFileSystem for data lake operations and add documentation" Feb 2, 2026
@TordAreStromsnes TordAreStromsnes marked this pull request as draft February 2, 2026 14:10
@TordAreStromsnes TordAreStromsnes marked this pull request as ready for review February 4, 2026 08:21
Contributor

@leefw leefw left a comment

A lot of string manipulation for path specific functionality. Should consider pathlib to help.

@TordAreStromsnes
Contributor Author

@leefw I now use pathlib's PurePosixPath, and I also used an internal fsspec method to remove the get_parent method.
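
For readers unfamiliar with the approach mentioned here, the following is a minimal sketch of deriving a parent directory with `PurePosixPath` instead of manual string splitting. The helper name `parent_of` is hypothetical and is not taken from the PR, and the internal fsspec method referred to in the comment is not shown.

```python
from pathlib import PurePosixPath


def parent_of(path: str) -> str:
    """Return the parent directory of a POSIX-style path (hypothetical helper)."""
    return str(PurePosixPath(path).parent)


# Repeated separators and trailing slashes are normalized by pathlib,
# which removes the need for hand-written rstrip/rsplit logic.
print(parent_of("/mnt/lake/bronze/orders/part-0.json"))
print(parent_of("/mnt/lake//bronze/orders/"))
```

One caveat with this approach: `PurePosixPath` also collapses the `//` in URL-style paths such as `abfss://...`, so the sketch assumes plain POSIX paths.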

@TordAreStromsnes TordAreStromsnes merged commit be9e738 into main Feb 4, 2026
7 checks passed
@TordAreStromsnes TordAreStromsnes deleted the feat/lakefilesystem branch February 4, 2026 10:37
TordAreStromsnes pushed a commit that referenced this pull request Feb 4, 2026
🤖 I have created a release *beep* *boop*
---


## [0.5.0](dataorc-utils-v0.4.0...dataorc-utils-v0.5.0) (2026-02-04)


### Features

* introduce dictionary functionality for environment variables access
([#57](#57))
([b6291fa](b6291fa))
* **utils:** implement LakeFileSystem for data lake operations and add
documentation ([#64](#64))
([be9e738](be9e738))
* **utils:** support optional revision suffix in version format and
update tests ([#59](#59))
([8ea0b60](8ea0b60))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
TordAreStromsnes pushed a commit that referenced this pull request Feb 5, 2026
🤖 I have created a release *beep* *boop*
---


## [0.2.0](dataorc-v0.1.1...dataorc-v0.2.0) (2026-02-05)


### Features

* add Azure Key Vault support and documentation
([#42](#42))
([abc42a0](abc42a0))
* create config tool for pipeline setup
([#22](#22))
([21a8a84](21a8a84))
* introduce dictionary functionality for environment variables access
([#57](#57))
([b6291fa](b6291fa))
* mount data lake ([#31](#31))
([0bb3e51](0bb3e51))
* **utils:** add argument parsing helper for Databricks wheel tasks
([#43](#43))
([393c6a2](393c6a2))
* **utils:** add retry logic and customizable parameters for
get_keyvault_secret
([#63](#63))
([acbc2b7](acbc2b7))
* **utils:** implement LakeFileSystem for data lake operations and add
documentation ([#64](#64))
([be9e738](be9e738))
* **utils:** support optional revision suffix in version format and
update tests ([#59](#59))
([8ea0b60](8ea0b60))
* **utils:** treat env as plain string and default to "dev"
([#50](#50))
([65473a8](65473a8))


### Documentation

* add changelog tab
([#20](#20))
([2ec4271](2ec4271))
* add CI status badge
([#9](#9))
([8de41fe](8de41fe))
* add contributing guidelines
([#15](#15))
([434cf31](434cf31))
* add developing instructions
([#33](#33))
([835a35e](835a35e))
* add early development phase warning
([#39](#39))
([406746d](406746d))
* bootstrap package ([#6](#6))
([afbb765](afbb765))
* build docs using uv
([#36](#36))
([15a1125](15a1125))
* initialize documentation structure
([#8](#8))
([0adb45d](0adb45d))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>