
adamrtalbot (Collaborator) commented Sep 25, 2025

  • Refactor AzFileAttributes to use the hdi_isfolder metadata key as the primary indicator for directories.
  • If the path has a trailing slash, treat it as a directory.
  • If the hdi_isfolder metadata is not present or the blob does not exist, list the blobs at the path. If there are "sub-blobs" under it, treat it as a directory.
  • Add and update tests to cover edge cases for directory detection.
  • Improves compatibility with Azure Data Lake Storage Gen2 and hierarchical namespace scenarios.
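
As a rough illustration of the detection rules listed above, here is a minimal Java sketch. It does not use the Azure SDK and is not the code from this PR: the class `DirectoryCheckSketch`, the method `looksLikeDirectory`, and the `hasChildBlobs` supplier (standing in for a blob listing call) are hypothetical names. It assumes the trailing-slash check runs first and that a folder marker carries the value "true", compared case-insensitively.

```java
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the decision order; not the actual AzFileAttributes code.
final class DirectoryCheckSketch {

    // Metadata key named in the PR description.
    static final String FOLDER_KEY = "hdi_isfolder";

    /**
     * @param path          blob path as given by the caller
     * @param metadata      blob metadata, or null if the blob does not exist (assumption)
     * @param hasChildBlobs stand-in for listing blobs under the path; only called as a fallback
     */
    static boolean looksLikeDirectory(String path,
                                      Map<String, String> metadata,
                                      BooleanSupplier hasChildBlobs) {
        // A trailing slash is treated as a directory without any further lookup.
        if (path.endsWith("/")) {
            return true;
        }
        // If the marker is present, trust its value (assumed to be "true" for folders).
        if (metadata != null && metadata.containsKey(FOLDER_KEY)) {
            return "true".equalsIgnoreCase(metadata.get(FOLDER_KEY));
        }
        // No marker or no blob: fall back to checking for blobs under the path.
        return hasChildBlobs.getAsBoolean();
    }
}
```

For example, `DirectoryCheckSketch.looksLikeDirectory("data/run1", Map.of(), () -> true)` returns `true` through the listing fallback, while the same call with `() -> false` returns `false`.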

Fixes #6427

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>

Hi! Thanks for contributing to Nextflow.

When submitting a Pull Request, please sign-off the DCO [1] to certify that you are the author of the contribution and you adhere to Nextflow's open source license [2] by adding a Signed-off-by line to the contribution commit message. See [3] for more details.

  1. https://developercertificate.org/
  2. https://github.com/nextflow-io/nextflow/blob/master/COPYING
  3. https://github.com/apps/dco

Note

Cursor Bugbot is generating a summary for commit 4dfec66.

…roperties

- Refactor AzFileAttributes to use the `hdi_isfolder` metadata key as the primary indicator for directories.
- Fallback to blob name ending with '/' only if metadata is not present.
- Add and update tests to cover edge cases for directory detection.
- Improves compatibility with Azure Data Lake Storage Gen2 and hierarchical namespace scenarios.

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>

netlify bot commented Sep 25, 2025

Deploy Preview for nextflow-docs-staging canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | b90fab7 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/nextflow-docs-staging/deploys/68dac4f55e26bd000818eb0b |

cursor[bot]

This comment was marked as outdated.

- Improved logic to determine if a blob is a directory by checking for the "hdi_isfolder" metadata key and its value.
- If the blob is a directory, set size to 0 and mark as directory.
- If not, treat as file and set size and timestamps from blob properties.
- This ensures compatibility with Azure's explicit directory markers and avoids misclassification.
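
The commit message above could be pictured roughly as follows. This is a minimal Java sketch under stated assumptions, not the PR's Groovy code: `BlobAttributesSketch` and its `from` factory are hypothetical, the blob's metadata, size, and last-modified time are passed in as plain values instead of being read from the Azure SDK, and leaving the timestamp null for directories is a choice made only for this sketch.

```java
import java.time.Instant;
import java.util.Map;

// Hypothetical sketch: derive file attributes from blob metadata and properties.
final class BlobAttributesSketch {
    final boolean directory;
    final long size;
    final Instant lastModified; // left null for directories in this sketch (assumption)

    private BlobAttributesSketch(boolean directory, long size, Instant lastModified) {
        this.directory = directory;
        this.size = size;
        this.lastModified = lastModified;
    }

    static BlobAttributesSketch from(Map<String, String> metadata,
                                     long blobSize,
                                     Instant blobLastModified) {
        // Check both the key and its value; the value "true" is an assumption.
        String flag = (metadata == null) ? null : metadata.get("hdi_isfolder");
        if ("true".equalsIgnoreCase(flag)) {
            // Explicit directory marker: report size 0 and mark as directory.
            return new BlobAttributesSketch(true, 0L, null);
        }
        // Otherwise treat it as a file and copy size and timestamp from the blob.
        return new BlobAttributesSketch(false, blobSize, blobLastModified);
    }
}
```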

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
cursor[bot]

This comment was marked as outdated.

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
cursor[bot]

This comment was marked as outdated.

…adata

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>

AzFileAttributes(String containerName, BlobItem item) {
    objectId = "/${containerName}/${item.name}"
    directory = item.name.endsWith('/')
Collaborator Author

@bentsherman what will break if we incorrectly classify something as a file when it's a pseudo-directory? What's the impact of getting it wrong?

cursor[bot]

This comment was marked as outdated.

…tory bug (#6427)

Add a test to document and demonstrate the bug in AzFileAttributes
where any blobName ending with '/' is treated as a directory, regardless
of actual Azure semantics. This test will help verify and prevent
regressions when fixing the directory detection logic.

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
…directory

- Only treat paths ending with a trailing slash as directories (e.g. `/container/dir/`).
- Paths without a trailing slash (even if they exist as directories in Azure) are treated as files unless Azure metadata explicitly marks them as folders.
- This fixes inconsistent behavior in Nextflow pipelines where `file("az://container/dir")` and `file("az://container/dir/")` were detected differently.
- Updates `AzFileAttributes` logic and adds/updates tests to ensure correct directory detection for all path formats.

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
}
}

return false

Bug: Azure API Calls Cause Performance Issues

The isDirectory() method now makes multiple uncached Azure API calls (exists(), getProperties(), listBlobs()) on every invocation when the directory field is false. This causes a significant performance regression. It also has a logic flaw that can lead to incorrect directory detection and lacks exception handling for these network calls.
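
One possible way to address the repeated lookups and missing exception handling raised here is to memoize the result of the remote check. The following Java sketch is only an illustration, not what this PR implements: `CachedDirectoryFlag` is a hypothetical class, `remoteCheck` stands in for whatever Azure calls the real method makes, and treating a failed lookup as "not a directory" without caching it is an assumption.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: cache the outcome of an expensive directory check.
final class CachedDirectoryFlag {
    private final BooleanSupplier remoteCheck; // stand-in for the remote Azure lookups
    private Boolean cached;                    // null until the first successful lookup

    CachedDirectoryFlag(BooleanSupplier remoteCheck) {
        this.remoteCheck = remoteCheck;
    }

    synchronized boolean isDirectory() {
        if (cached == null) {
            try {
                cached = remoteCheck.getAsBoolean();
            } catch (RuntimeException e) {
                // Assumption: on failure, answer "not a directory" and do not
                // cache, so that a later call can retry the lookup.
                return false;
            }
        }
        return cached;
    }
}
```

With this shape, calling `isDirectory()` many times on the same instance triggers the remote check at most once after it first succeeds.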

}
}

return false

Bug: Trailing Slash Bug Affects Directory Detection

The AzPath constructor's handling of trailing slashes causes isDirectory() to return early, bypassing its new Azure-querying logic for directory detection. This also creates unreachable code in the AzFileAttributes constructor and leads to conflicting test expectations regarding paths ending with a slash.

Additional Locations (4)

Enhance the logic for determining if a blob represents a directory in
Azure. Now considers both the `hdi_isfolder` metadata and the presence
of child blobs, providing more accurate directory status detection.
Updates related tests to reflect the improved behavior.

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
@adamrtalbot
Collaborator Author

I have a branch that breaks these functions out into a separate AzDirectoryUtils class to keep things DRY. Is that too much complexity?

Development

Successfully merging this pull request may close these issues.

Azure path incorrectly determined to be directory by trailing slash
2 participants