From 17079d24f680a86ae317ea61a8be52dff7d3743b Mon Sep 17 00:00:00 2001 From: jorgee Date: Thu, 23 Apr 2026 15:25:34 +0200 Subject: [PATCH 1/6] initial implementation of datalinks in seqera fs Signed-off-by: jorgee --- adr/20260422-seqera-datalinks-filesystem.md | 203 ++ .../datalink/PagedDataLinkContent.groovy | 104 + .../datalink/SeqeraDataLinkClient.groovy | 257 ++ .../plugin/fs/ResourceTypeHandler.groovy | 56 + .../plugin/fs/SeqeraFileAttributes.groovy | 66 +- .../tower/plugin/fs/SeqeraFileSystem.groovy | 109 +- .../plugin/fs/SeqeraFileSystemProvider.groovy | 213 +- .../seqera/tower/plugin/fs/SeqeraPath.groovy | 326 +-- .../handler/DataLinksResourceHandler.groovy | 211 ++ .../fs/handler/DatasetsResourceHandler.groovy | 165 ++ .../datalink/SeqeraDataLinkClientTest.groovy | 243 ++ .../fs/ResourceTypeAbstractionTest.groovy | 89 + .../fs/SeqeraFileSystemProviderTest.groovy | 54 +- .../plugin/fs/SeqeraFileSystemTest.groovy | 64 +- .../tower/plugin/fs/SeqeraPathTest.groovy | 203 +- .../DataLinksResourceHandlerTest.groovy | 338 +++ .../DatasetsResourceHandlerTest.groovy | 201 ++ specs/260422-seqera-datalinks-fs/plan.md | 259 ++ specs/260422-seqera-datalinks-fs/spec.md | 182 ++ specs/260422-seqera-datalinks-fs/tasks.md | 2466 +++++++++++++++++ 20 files changed, 5270 insertions(+), 539 deletions(-) create mode 100644 adr/20260422-seqera-datalinks-filesystem.md create mode 100644 plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy create mode 100644 plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy create mode 100644 plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy create mode 100644 plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy create mode 100644 plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy create mode 100644 
plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy
 create mode 100644 plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/ResourceTypeAbstractionTest.groovy
 create mode 100644 plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy
 create mode 100644 plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy
 create mode 100644 specs/260422-seqera-datalinks-fs/plan.md
 create mode 100644 specs/260422-seqera-datalinks-fs/spec.md
 create mode 100644 specs/260422-seqera-datalinks-fs/tasks.md

diff --git a/adr/20260422-seqera-datalinks-filesystem.md b/adr/20260422-seqera-datalinks-filesystem.md
new file mode 100644
index 0000000000..30590a183f
--- /dev/null
+++ b/adr/20260422-seqera-datalinks-filesystem.md
@@ -0,0 +1,203 @@
+# NIO Filesystem Support for Seqera Platform Data-Links
+
+- Authors: Jorge Ejarque
+- Status: draft
+- Date: 2026-04-22
+- Tags: nio, filesystem, seqera, data-links, nf-tower
+
+Technical Story: Extend the `seqera://` NIO filesystem (introduced by [20260310-seqera-dataset-filesystem](20260310-seqera-dataset-filesystem.md)) to address files and directories inside Seqera Platform data-links without requiring cloud-provider credentials or SDK integration.
+
+## Summary
+
+Add a second resource type (`data-links`) to the existing `seqera://` filesystem in the `nf-tower` plugin. Paths of the form `seqera://<org>/<workspace>/data-links/<provider>/<data-link-name>/<sub-path>` resolve to files and directories inside a Platform-managed data-link. Listings and attribute queries are served by the Platform's `/data-links/{id}/content` endpoint; byte reads go through pre-signed URLs returned by `/data-links/{id}/download`. Only the Seqera access token is required — no AWS/GCP/Azure credentials, no cloud SDK dependency.
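The path shape in the Summary decomposes into the hierarchy levels the ADR describes (organization, workspace, resource type, two identity segments, sub-path). A minimal sketch of that decomposition — all concrete names are invented for illustration, and the real plugin parses paths via `SeqeraPath`, not a helper like this:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

class DataLinkPathDemo {

    // Split a seqera:// data-link URI into the hierarchy levels described
    // above. The URI authority is the organization; everything after the
    // identity segments (provider + data-link name) is the sub-path.
    static String parse(String uri) {
        URI u = URI.create(uri);
        String org = u.getHost();                      // depth 1: organization
        List<String> seg = Arrays.asList(u.getPath().substring(1).split("/"));
        String workspace = seg.get(0);                 // depth 2: workspace
        String resourceType = seg.get(1);              // depth 3: "datasets" / "data-links"
        String provider = seg.get(2);                  // depth 4: identity segment 1
        String name = seg.get(3);                      // depth 5: identity segment 2
        String subPath = String.join("/", seg.subList(4, seg.size())); // depth 6+: content
        return String.join("|", org, workspace, resourceType, provider, name, subPath);
    }

    public static void main(String[] args) {
        // Hypothetical names, chosen only to show the segment layout.
        System.out.println(parse("seqera://my-org/my-workspace/data-links/aws/genomes/ref/hg38.fa"));
        // → my-org|my-workspace|data-links|aws|genomes|ref/hg38.fa
    }
}
```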
+ +As part of this change, the existing dataset-specific logic in `SeqeraFileSystemProvider`, `SeqeraFileSystem`, and `SeqeraPath` is extracted into a `ResourceTypeHandler` abstraction, so the two resource types coexist behind a common contract. + +## Problem Statement + +The dataset filesystem ships `seqera://` URI support for Platform datasets, but datasets are only one of several file-like resources users manage on the Platform. Data-links are the most common — they reference a cloud bucket or prefix (S3, GCS, Azure Blob) with potentially large, nested content. Today, a pipeline that needs to read a file inside a data-link must: + +1. Look up the data-link's underlying URI outside Nextflow. +2. Configure cloud credentials in the compute environment (AWS access keys, GCP service account, Azure SAS, etc.). +3. Reference the object by its cloud URI. + +This is friction the Platform already solves: data-links are scoped, ACL-controlled entities, and the Platform knows how to broker access to their content. A `seqera://` URI for a path inside a data-link would let pipelines consume Platform-managed data with only the Seqera access token — no cloud SDK, no credential sprawl. + +## Goals or Decision Drivers + +- Native `seqera://` access to files and directories inside Platform data-links, at arbitrary depth. +- Zero cloud-provider credential configuration — the Seqera access token is the only auth surface. +- No new runtime dependency on cloud SDKs (`aws-sdk`, `google-cloud-storage`, `azure-*`). +- Reuse of existing nf-tower plugin infrastructure — `TowerClient` for HTTP + auth + retry, tower-api DTOs for wire types. +- Introduce a `ResourceTypeHandler` abstraction so the dataset and data-link behaviors share one filesystem without leaking into each other. +- Preserve Platform-side access control for listings and metadata (not just reads). + +## Non-goals + +- Write operations to data-links (upload). 
The Platform's `POST /data-links/{id}/multipart-upload` is a future hook; it is not implemented in this iteration. +- Data-link management (create/update/delete the data-link entity itself). +- Transparent pre-signed URL renewal when a URL expires mid-stream — failures surface as `IOException` and Nextflow task retry handles them. +- Browse-result caching within a run. +- Fusion integration — Fusion has its own data-link access path. + +## Considered Options + +### Option 1: Platform-brokered credentials + cloud SDK delegation + +For each read, call the Platform to obtain short-lived AWS/GCP/Azure credentials scoped to the data-link, then use the existing `nf-amazon` / `nf-google` / `nf-azure` providers for the actual I/O. + +- Good, because cloud providers handle streaming, range reads, multi-part efficiently. +- Bad, because it requires `nf-tower` to depend on (or coordinate with) three cloud plugins. +- Bad, because credential plumbing across plugin boundaries is complex — each cloud plugin has its own credential object model. +- Bad, because it adds failure modes around credential refresh windows crossing long reads. + +### Option 2: Pre-signed URL + direct HTTPS fetch + +Call the Platform's `GET /data-links/{id}/download?path=` endpoint to obtain a pre-signed URL; stream bytes through the existing `TowerClient` / `HxClient` HTTPS path. + +- Good, because there is no cloud SDK dependency — all I/O is generic HTTPS. +- Good, because the Platform is the only credential surface (user token goes in, signed URL comes out; credentials never cross our process boundary as a distinct object). +- Good, because it uniformly supports every provider the Platform supports — now and in the future — with no per-provider code. +- Good, because `TowerClient.sendStreamingRequest()` already exists and has the retry/backoff semantics we want. +- Bad, because pre-signed URLs have time windows; a very long read can outlive its URL. Acceptable: Nextflow task retry handles the failure. 
+- Bad, because range reads / multi-part reads are not implemented in this iteration. Acceptable: datasets are already single-shot reads and the pattern matches.
+
+### Option 3: Proxy all bytes through the Platform
+
+Route all reads through a Platform endpoint that streams content back to the client from the underlying cloud.
+
+- Good, because the Platform sees and can log every byte.
+- Bad, because it imposes Platform bandwidth/egress cost on every pipeline byte.
+- Bad, because no such primary endpoint is offered — `/download` returns a URL, not bytes.
+
+## Pros and Cons of the Options
+
+See above.
+
+## Solution or decision outcome
+
+Option 2 — pre-signed URL + direct HTTPS fetch. All data-link byte I/O goes through the Platform's `/download` endpoint and `TowerClient.sendStreamingRequest()`. The plugin never touches a cloud SDK and never holds a long-lived cloud credential.
+
+Extend the `fs/` package with a real `ResourceTypeHandler` abstraction. Extract the existing dataset logic into a `DatasetsResourceHandler`. Add `DataLinksResourceHandler` as the second implementation.
+
+## Rationale & discussion
+
+### Path Hierarchy
+
+The `seqera://` path gains a second resource-type branch:
+
+```
+seqera://                                     → ROOT (directory, depth 0)
+ └── <org-name>/                              → ORGANIZATION (directory, depth 1)
+     └── <workspace-name>/                    → WORKSPACE (directory, depth 2)
+         ├── datasets/                        → RESOURCE TYPE (directory, depth 3)
+         │   └── <dataset-name>[@<version>]   → DATASET (file, depth 4)
+         └── data-links/                      → RESOURCE TYPE (directory, depth 3)
+             └── <provider>/                  → PROVIDER (directory, depth 4)
+                 └── <data-link-name>/        → DATA-LINK (directory, depth 5)
+                     └── <sub-path>/…/        → CONTENT (directory or file, depth 6+)
+```
+
+Three structural differences from datasets:
+
+1. **Two identity segments** (`<provider>/<data-link-name>`) instead of one (`<dataset-name>`). Provider disambiguation is required because a workspace can host two data-links with the same name on different clouds.
+2. **Arbitrary sub-path depth** below the data-link root. Each segment is a folder or file inside the underlying bucket.
+3.
**No version pinning** — data-link content is not versioned by the Platform. Content is always "current". + +`ResourceTypeHandler.getIdentitySegmentCount()` encodes the difference: 1 for datasets, 2 for data-links. `SeqeraPath` treats everything after the identity segments as the handler-owned sub-path. + +### Component Structure + +``` +plugins/nf-tower/src/main/io/seqera/tower/plugin/ +├── fs/ ← generic NIO layer (refactored) +│ ├── SeqeraFileSystemProvider ← dispatches by resourceType to handler +│ ├── SeqeraFileSystem ← org/ws cache + handler registry +│ ├── SeqeraPath ← generic segment list (identity + sub-path) +│ ├── SeqeraFileAttributes ← plain (isDir, size, lastModified) holder +│ ├── SeqeraPathFactory ← unchanged +│ ├── DatasetInputStream ← unchanged +│ ├── ResourceTypeHandler ← NEW interface +│ └── handler/ +│ ├── DatasetsResourceHandler ← NEW — dataset logic extracted here +│ └── DataLinksResourceHandler ← NEW +├── dataset/ +│ └── SeqeraDatasetClient ← unchanged +└── datalink/ ← NEW + └── SeqeraDataLinkClient ← typed client over TowerClient + returns io.seqera.tower.model.* directly +``` + +No plugin-local DTO classes are introduced. `DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkDownloadUrlResponse`, `DataLinkProvider` and related types are reused from `io.seqera:tower-api:1.121.0`. + +### `ResourceTypeHandler` contract + +``` +interface ResourceTypeHandler { + String getResourceType() // "datasets" / "data-links" + int getIdentitySegmentCount() // 1 / 2 + List list(SeqeraPath dir) throws IOException + SeqeraFileAttributes readAttributes(SeqeraPath p) throws IOException + InputStream newInputStream(SeqeraPath p) throws IOException + void checkAccess(SeqeraPath p, AccessMode... modes) throws IOException +} +``` + +`SeqeraFileSystemProvider` owns dispatch at depth ≥ 3. Depth 0–2 (root/org/workspace) remains in `SeqeraFileSystem`, shared across all handlers. 
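The depth-based dispatch can be sketched as follows. This is a simplified illustration, not the plugin's real API: `Handler` and `HandlerDispatchDemo` are invented stand-ins for `ResourceTypeHandler` and `SeqeraFileSystemProvider`, and the unchecked exceptions stand in for the real code's `java.nio.file.NoSuchFileException`:

```java
import java.util.List;
import java.util.Map;

// Invented stand-in for ResourceTypeHandler, reduced to the dispatch-relevant parts.
interface Handler {
    String resourceType();                   // "datasets" or "data-links"
    List<String> list(List<String> segments);
}

class HandlerDispatchDemo {

    private final Map<String, Handler> registry;

    HandlerDispatchDemo(Map<String, Handler> registry) {
        this.registry = registry;
    }

    // Depth 0-2 (root / org / workspace) stays generic; at depth >= 3 the
    // third path segment selects the handler from the registry.
    List<String> list(List<String> segments) {
        if (segments.size() < 3)
            throw new IllegalStateException("depth 0-2 is served by the filesystem itself");
        Handler h = registry.get(segments.get(2));
        if (h == null)
            throw new IllegalArgumentException("unknown resource type: " + segments.get(2));
        return h.list(segments);
    }

    public static void main(String[] args) {
        // Fake handler whose listing is hard-coded, just to show the routing.
        Handler dataLinks = new Handler() {
            public String resourceType() { return "data-links"; }
            public List<String> list(List<String> s) { return List.of("aws", "azure"); }
        };
        HandlerDispatchDemo fs = new HandlerDispatchDemo(Map.of("data-links", dataLinks));
        System.out.println(fs.list(List.of("my-org", "my-ws", "data-links")));
        // → [aws, azure]
    }
}
```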
At depth 3 (the workspace listing returns the resource-type children), the handler registry is enumerated — `datasets` and `data-links` are the two entries today, added automatically by the provider at `newFileSystem()` time. + +### API Usage Summary (Data-Links) + +| NIO operation | Platform endpoint | Notes | +| -------------------------------------------------------------- | ------------------------------------------ | ------------------------------------------------------------------------------- | +| list data-links in workspace (resolution + depth-4/5 listings) | `GET /data-links?workspaceId=X` | cached per workspace; pagination exhausted | +| `newDirectoryStream(dir)` at depth ≥ 5 | `GET /data-links/{id}/content?path=` | items array → child paths | +| `readAttributes(path)` inside a data-link | `GET /data-links/{id}/content?path=` | single targeted call; file vs folder from response | +| `newInputStream(file)` | `GET /data-links/{id}/download?path=` | parse `DataLinkDownloadUrlResponse.url`; fetch with plain JDK `HttpClient` (no Seqera auth header — the URL is signed for the cloud backend) | + +### Key Design Decisions + +1. **TowerClient delegation for Platform calls**: `SeqeraDataLinkClient` routes all Seqera API calls (list, content, download-URL) through `TowerClient.sendApiRequest()`, sharing authentication state with the dataset client. The pre-signed URL itself is fetched directly with a plain JDK `HttpClient` — no Seqera headers are sent to the cloud backend. + +2. **Pre-signed URLs, not credential brokering**: the Platform returns a URL that already has the auth embedded. No AWS/GCP/Azure SDK is imported; no credential object crosses the plugin boundary. This is the single biggest simplification relative to a "get creds, hand to cloud plugin" approach. + +3. **No per-stream URL renewal**: if a signed URL expires mid-read, the HTTP connection errors and the `InputStream` surfaces an `IOException`. 
Nextflow task retry handles the failure as it does for any other transient read failure. The plugin does not implement transparent re-issuance. + +4. **Provider disambiguation in the path**: the data-link identity is `(workspace, provider, name)` on the Platform side. The path segment layout mirrors this to avoid ambiguity when names collide across providers. + +5. **Reuse tower-api DTOs**: every wire type is an `io.seqera.tower.model.*` class already on the plugin's classpath via `tower-api:1.121.0`. No parallel plugin-local DTOs. + +6. **Handler registry at construction, not via PF4J**: handlers are instantiated in `SeqeraFileSystemProvider.newFileSystem()`. Adding a third resource type is a code change to this plugin, identical in shape to the dataset/data-link pair. No extension-point protocol is introduced — YAGNI. + +7. **`readAttributes` is single-target**: because `GET /data-links/{id}/content?path=` accepts both directory and file paths, a file-level `readAttributes` is one API call — not a parent browse plus filter. No N+1 problem; no browse cache needed. + +8. **Read-only stance preserved**: `SeqeraFileSystem.isReadOnly()` remains `true`. Write operations on data-links raise `UnsupportedOperationException`. The `/data-links/{id}/multipart-upload` endpoint is a future extension point. + +9. **Minimal cache**: only the workspace-level data-link list is cached. No browse-result cache. No URL cache. Simpler to reason about; consistent with the dataset handler's cache shape. + +### Refactor Delivered by This Change + +Adding a second resource type requires a shared abstraction in the `fs/` package so the two behaviors do not collide: + +- The `ResourceTypeHandler` interface is introduced. +- All dataset-specific logic previously inlined in `SeqeraFileSystemProvider`, `SeqeraFileSystem`, and `SeqeraPath` is moved to a new `DatasetsResourceHandler`. +- `DataLinksResourceHandler` is added alongside it, implementing the same interface. 
+- The generic classes (`SeqeraFileSystemProvider`, `SeqeraFileSystem`, `SeqeraPath`) become resource-type-agnostic for depth ≥ 3 — they dispatch to handlers and carry no knowledge of either resource's semantics. + +The existing dataset test suite continues to pass unchanged; every dataset code path is routed through `DatasetsResourceHandler` without behavioral change. + +### Limitations + +- **No write support for data-links in this iteration.** Upload paths must continue to use Fusion or direct cloud-SDK access until a follow-up adds the multipart-upload handler. +- **Signed URL expiration is not handled transparently.** Very long reads may outlive the URL's validity window. +- **No transparent pagination of data-link entries inside a single directory.** The browse API's pagination (if any) must be exhausted; for very large directories this may be slower than direct cloud listings. +- **Single endpoint per JVM** (unchanged from dataset ADR): concurrent access to multiple Platform endpoints in one JVM is not supported. + +## Links + +- [Spec](../specs/260422-seqera-datalinks-fs/spec.md) +- Extends [20260310-seqera-dataset-filesystem](20260310-seqera-dataset-filesystem.md) + +## More information + +- [Seqera Platform OpenAPI spec](https://cloud.seqera.io/openapi/seqera-api-latest.yml) — `/data-links` endpoints. 
+- [What is an ADR and why should you use them](https://github.com/thomvaill/log4brains/tree/master#-what-is-an-adr-and-why-should-you-use-them) diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy new file mode 100644 index 0000000000..32c10559cd --- /dev/null +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy @@ -0,0 +1,104 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package io.seqera.tower.plugin.datalink + +import groovy.transform.CompileStatic +import io.seqera.tower.model.DataLinkItem + +/** + * Lazy, paginated view over a data-link's content. + * + * The first page is fetched eagerly by the producer so callers can inspect + * {@link #getOriginalPath()} and {@link #getFirstPage()} without triggering + * additional HTTP calls. Iterating yields items from the first page followed + * by subsequent pages fetched on demand via the injected page fetcher. + */ +@CompileStatic +class PagedDataLinkContent implements Iterable { + + /** + * Opaque page fetcher. Given a {@code nextPageToken}, returns the next page + * as a map with keys {@code objects} ({@code List}) and + * {@code nextPageToken} ({@code String}, null if no more pages). 
+ */ + static interface PageFetcher { + Map fetch(String nextPageToken) throws IOException + } + + private final String originalPath + private final List firstPage + private final String firstPageNextToken + private final PageFetcher pageFetcher + + PagedDataLinkContent(String originalPath, + List firstPage, + String firstPageNextToken, + PageFetcher pageFetcher) { + this.originalPath = originalPath + this.firstPage = firstPage ?: Collections.emptyList() + this.firstPageNextToken = firstPageNextToken + this.pageFetcher = pageFetcher + } + + String getOriginalPath() { originalPath } + + /** First page, loaded eagerly — bounded in size by the server's page size. */ + List getFirstPage() { Collections.unmodifiableList(firstPage) } + + boolean isEmpty() { firstPage.isEmpty() && !firstPageNextToken } + + @Override + Iterator iterator() { + return new PagedIterator(firstPage, firstPageNextToken, pageFetcher) + } + + /** Lazy iterator that paginates on demand. */ + @CompileStatic + private static class PagedIterator implements Iterator { + private Iterator current + private String nextToken + private final PageFetcher fetcher + + PagedIterator(List firstPage, String firstPageNextToken, PageFetcher fetcher) { + this.current = firstPage.iterator() + this.nextToken = firstPageNextToken + this.fetcher = fetcher + } + + @Override + boolean hasNext() { + while (!current.hasNext()) { + if (!nextToken) return false + try { + final page = fetcher.fetch(nextToken) + final items = (page?.objects ?: []) as List + current = items.iterator() + nextToken = page?.nextPageToken as String + } catch (IOException e) { + throw new RuntimeException(e) + } + } + return true + } + + @Override + DataLinkItem next() { + if (!hasNext()) throw new NoSuchElementException() + return current.next() + } + } +} diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy new file 
mode 100644 index 0000000000..b96a010388 --- /dev/null +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy @@ -0,0 +1,257 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package io.seqera.tower.plugin.datalink + +import groovy.transform.Memoized + +import java.nio.file.AccessDeniedException +import java.nio.file.NoSuchFileException + +import groovy.json.JsonSlurper +import groovy.transform.CompileStatic +import groovy.util.logging.Slf4j +import io.seqera.tower.model.DataLinkDownloadUrlResponse +import io.seqera.tower.model.DataLinkDto +import io.seqera.tower.model.DataLinkItem +import io.seqera.tower.model.DataLinkItemType +import io.seqera.tower.model.DataLinkProvider +import io.seqera.tower.plugin.TowerClient +import nextflow.exception.AbortOperationException + +/** + * Typed client for Seqera Platform data-link API endpoints. + * + * Paginated endpoints return lazy iterators so callers don't materialize the + * full result set in memory — only the current page is buffered at any time. + */ +@Slf4j +@CompileStatic +class SeqeraDataLinkClient { + + private static final int LIST_PAGE_SIZE = 100 + + private final TowerClient towerClient + + SeqeraDataLinkClient(TowerClient towerClient) { + this.towerClient = towerClient + } + + private String getEndpoint() { towerClient.endpoint } + + /** + * Lazy iterator over every data-link in the workspace. 
+ * Pages are fetched from {@code GET /data-links?workspaceId=&max=&offset=} + * on demand as the iterator advances. + */ + Iterator listDataLinks(long workspaceId) { + return new DataLinkListIterator(towerClient, endpoint, workspaceId, LIST_PAGE_SIZE) + } + + /** + * Resolve a data-link by {@code (provider, name)} in the given workspace. + * Iterates the API's list endpoint lazily and short-circuits on first match. + */ + @Memoized + DataLinkDto getDataLink(long workspaceId, String provider, String name) { + final Iterator it = new DataLinkListIterator(towerClient, endpoint, workspaceId, LIST_PAGE_SIZE, name) + while( it.hasNext() ) { + final dl = it.next() + if( dl.provider?.toString() == provider ) + return dl + } + throw new NoSuchFileException( + "seqera://.../data-links/${provider}/${name}", + null, + "Data-link '${name}' not found for provider '${provider}' in workspace '$workspaceId'") + } + + /** + * Browse the content of a data-link. + * The first page is fetched eagerly to populate metadata ({@code originalPath}, + * first-page items). Subsequent pages are fetched on demand as the returned + * {@link PagedDataLinkContent} is iterated. + * + * Endpoints: {@code GET /data-links/{id}/browse} (root) and + * {@code GET /data-links/{id}/browse/{path}} (sub-path). + */ + PagedDataLinkContent getContent(String dataLinkId, String subPath, long workspaceId) { + final pathSegment = subPath ? 
'/' + encodePath(subPath) : '' + final baseUrl = "${endpoint}/data-links/${encodePath(dataLinkId)}/browse${pathSegment}" + final page = fetchBrowsePage(baseUrl, workspaceId, null) + final firstItems = page.objects + final firstToken = page.nextPageToken + final originalPath = page.originalPath + final fetcher = new PagedDataLinkContent.PageFetcher() { + @Override + Map fetch(String token) throws IOException { + final next = fetchBrowsePage(baseUrl, workspaceId, token) + return [objects: next.objects, nextPageToken: next.nextPageToken] as Map + } + } + return new PagedDataLinkContent(originalPath, firstItems, firstToken, fetcher) + } + + /** {@code GET /data-links/{id}/generate-download-url?workspaceId=&filePath=} */ + DataLinkDownloadUrlResponse getDownloadUrl(String dataLinkId, String subPath, long workspaceId) { + final url = "${endpoint}/data-links/${encodePath(dataLinkId)}/generate-download-url?workspaceId=${workspaceId}&filePath=${encodeQuery(subPath ?: '')}" + log.debug "SeqeraDataLinkClient GET $url" + final resp = towerClient.sendApiRequest(url) + checkFsResponse(resp, url) + final json = new JsonSlurper().parseText(resp.message) as Map + final out = new DataLinkDownloadUrlResponse() + out.url = json.url as String + return out + } + + // ---- page-fetching helpers ---- + + /** Fetch one browse page and normalize it into a {@link BrowsePage}. 
*/ + private BrowsePage fetchBrowsePage(String baseUrl, long workspaceId, String nextPageToken) { + String url = "${baseUrl}?workspaceId=${workspaceId}" + if (nextPageToken) url += "&nextPageToken=${encodeQuery(nextPageToken)}" + log.debug "SeqeraDataLinkClient GET $url" + final resp = towerClient.sendApiRequest(url) + checkFsResponse(resp, url) + final json = new JsonSlurper().parseText(resp.message) as Map + final items = (json.objects as List)?.collect { Map m -> mapItem(m) } ?: Collections.emptyList() + return new BrowsePage(json.originalPath as String, items, json.nextPageToken as String) + } + + @CompileStatic + private static class BrowsePage { + final String originalPath + final List objects + final String nextPageToken + + BrowsePage(String originalPath, List objects, String nextPageToken) { + this.originalPath = originalPath + this.objects = objects + this.nextPageToken = nextPageToken + } + } + + /** Lazy iterator for the {@code /data-links} list endpoint (offset pagination). 
*/ + @CompileStatic + private static class DataLinkListIterator implements Iterator { + private final TowerClient towerClient + private final String endpoint + private final long workspaceId + private final int pageSize + private final String search + + private Iterator current = Collections.emptyIterator() + private int offset = 0 + private long total = -1L // unknown until first fetch + private boolean exhausted = false + + DataLinkListIterator(TowerClient towerClient, String endpoint, long workspaceId, int pageSize, String search = null) { + this.towerClient = towerClient + this.endpoint = endpoint + this.workspaceId = workspaceId + this.pageSize = pageSize + this.search = search + } + + @Override + boolean hasNext() { + while (!current.hasNext()) { + if (exhausted) return false + fetchNextPage() + } + return true + } + + @Override + DataLinkDto next() { + if (!hasNext()) throw new NoSuchElementException() + return current.next() + } + + private void fetchNextPage() { + final url = "${endpoint}/data-links?status=AVAILABLE&workspaceId=${workspaceId}&max=${pageSize}&offset=${offset}${search ? '&search='+ encodeQuery(search) :''}" + log.debug "Fetching next list of datalinks: GET $url" + final resp = towerClient.sendApiRequest(url) + checkFsResponse(resp, url) + final json = new JsonSlurper().parseText(resp.message) as Map + final items = (json.dataLinks as List)?.collect { Map m -> mapDataLink(m) } ?: Collections.emptyList() + current = items.iterator() + offset += items.size() + if (total < 0) total = (json.totalSize as Long) ?: 0L + if (items.isEmpty() || offset >= total) exhausted = true + } + } + + // ---- encoding / error mapping ---- + + /** URL-encode a path value while preserving {@code /} as path separators. */ + private static String encodePath(String s) { + new URI(null, null, s ?: '', null).rawPath ?: '' + } + + /** URL-encode a value intended for use as a query-string value. 
*/ + private static String encodeQuery(String s) { + URLEncoder.encode(s ?: '', 'UTF-8') + } + + private static void checkFsResponse(TowerClient.Response resp, String url) { + if (!resp.error) return + final code = resp.code + if (code == 401) + throw new AbortOperationException("Seqera authentication failed — check tower.accessToken or TOWER_ACCESS_TOKEN") + if (code == 403) + throw new AccessDeniedException(url, null, "Forbidden — check workspace permissions") + if (code == 404) + throw new NoSuchFileException(url) + throw new IOException("Seqera API error: HTTP ${code} for ${url}") + } + + private static DataLinkDto mapDataLink(Map m) { + final dto = new DataLinkDto() + dto.id = m.id as String + dto.name = m.name as String + dto.description = m.description as String + dto.resourceRef = m.resourceRef as String + if (m.provider) dto.provider = parseProvider(m.provider as String) + dto.region = m.region as String + return dto + } + + private static DataLinkItem mapItem(Map m) { + final it = new DataLinkItem() + it.name = m.name as String + if (m.type) it.type = parseItemType(m.type as String) + it.size = (m.size as Long) ?: 0L + it.mimeType = m.mimeType as String + return it + } + + private static DataLinkProvider parseProvider(String value) { + try { + return DataLinkProvider.fromValue(value) + } catch (Throwable ignored) { + return DataLinkProvider.values().find { DataLinkProvider p -> p.toString() == value } + } + } + + private static DataLinkItemType parseItemType(String value) { + try { + return DataLinkItemType.fromValue(value) + } catch (Throwable ignored) { + return DataLinkItemType.values().find { DataLinkItemType t -> t.toString() == value } + } + } +} diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy new file mode 100644 index 0000000000..55f0f35a64 --- /dev/null +++ 
b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy @@ -0,0 +1,56 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package io.seqera.tower.plugin.fs + +import java.nio.file.AccessMode +import java.nio.file.Path + +/** + * Strategy owning the semantics of one depth-3 path segment under {@code seqera://}. + * Registered in {@link SeqeraFileSystem} at filesystem construction. + * + * Implementations own their resource's API client, caches, and interpretation of + * trail segments beyond {@code resourceType}. The generic NIO layer does not look + * inside the trail. + */ +interface ResourceTypeHandler { + + /** e.g. {@code "datasets"} or {@code "data-links"}. Must match the depth-3 path segment. */ + String getResourceType() + + /** + * List entries at the given directory path. Caller has verified depth ≥ 3. + * Returning an {@link Iterable} lets implementations stream large listings + * without materializing them in memory. + */ + Iterable list(SeqeraPath dir) throws IOException + + /** Return attributes for any path at depth ≥ 3 owned by this handler. */ + SeqeraFileAttributes readAttributes(SeqeraPath path) throws IOException + + /** + * Open a read stream for a leaf path. Throw {@link IllegalArgumentException} + * if the path is a directory or not otherwise addressable as a file. 
+ */ + InputStream newInputStream(SeqeraPath path) throws IOException + + /** + * Verify the path exists and requested modes are satisfiable. READ is allowed; + * WRITE/EXECUTE throw {@link java.nio.file.AccessDeniedException}. + */ + void checkAccess(SeqeraPath path, AccessMode... modes) throws IOException +} diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileAttributes.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileAttributes.groovy index 3246bd35e2..abc35d044b 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileAttributes.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileAttributes.groovy @@ -21,67 +21,57 @@ import java.nio.file.attribute.FileTime import java.time.Instant import groovy.transform.CompileStatic -import io.seqera.tower.model.DatasetDto /** * {@link BasicFileAttributes} for {@code seqera://} paths. - * For depth < 4 (directory paths): {@code isDirectory=true}, {@code size=0}. - * For depth 4 (dataset file paths): {@code isRegularFile=true}, timestamps from {@link DatasetDto}. * - * @author Seqera Labs + * Resource-type agnostic: virtual directories use the {@code (boolean isDir)} + * constructor; file-like entries use the explicit {@code (size, lastMod, created, key)} + * constructor. Handlers build instances using whatever metadata the underlying + * resource exposes. */ @CompileStatic class SeqeraFileAttributes implements BasicFileAttributes { private final boolean directory - private final DatasetDto dataset + private final long size + private final Instant lastModified + private final Instant created + private final Object fileKey - /** Construct attributes for a virtual directory (depth 0–3). */ + /** Construct attributes for a virtual directory. 
*/ SeqeraFileAttributes(boolean isDir) { this.directory = isDir - this.dataset = null + this.size = 0L + this.lastModified = Instant.EPOCH + this.created = Instant.EPOCH + this.fileKey = null } - /** Construct attributes for a dataset file (depth 4). */ - SeqeraFileAttributes(DatasetDto dataset) { + /** Construct attributes for a regular file with explicit metadata. */ + SeqeraFileAttributes(long size, Instant lastModified, Instant created, Object fileKey) { this.directory = false - this.dataset = dataset + this.size = size >= 0 ? size : 0L + this.lastModified = lastModified ?: Instant.EPOCH + this.created = created ?: Instant.EPOCH + this.fileKey = fileKey } - @Override - FileTime lastModifiedTime() { - if (dataset?.lastUpdated) { - return FileTime.from(dataset.lastUpdated.toInstant()) - } - return FileTime.from(Instant.EPOCH) - } + @Override FileTime lastModifiedTime() { FileTime.from(lastModified) } - @Override - FileTime lastAccessTime() { lastModifiedTime() } + @Override FileTime lastAccessTime() { FileTime.from(lastModified) } - @Override - FileTime creationTime() { - if (dataset?.dateCreated) { - return FileTime.from(dataset.dateCreated.toInstant()) - } - return FileTime.from(Instant.EPOCH) - } + @Override FileTime creationTime() { FileTime.from(created) } - @Override - boolean isRegularFile() { !directory } + @Override boolean isRegularFile() { !directory } - @Override - boolean isDirectory() { directory } + @Override boolean isDirectory() { directory } - @Override - boolean isSymbolicLink() { false } + @Override boolean isSymbolicLink() { false } - @Override - boolean isOther() { false } + @Override boolean isOther() { false } - @Override - long size() { 0L } + @Override long size() { size } - @Override - Object fileKey() { dataset?.id } + @Override Object fileKey() { fileKey } } diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy index 
4f639facd6..3ee200f425 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy @@ -27,48 +27,52 @@ import java.nio.file.spi.FileSystemProvider import groovy.transform.CompileStatic import groovy.util.logging.Slf4j -import io.seqera.tower.model.DatasetDto -import io.seqera.tower.model.DatasetVersionDto import io.seqera.tower.model.OrgAndWorkspaceDto import io.seqera.tower.plugin.dataset.SeqeraDatasetClient /** * FileSystem instance for the {@code seqera://} scheme. - * One instance per (endpoint + credentials) pair, cached by {@link SeqeraFileSystemProvider}. + * One instance per {@link SeqeraFileSystemProvider}. * - * Lazily populates org/workspace/dataset caches on first access. - * Cache is invalidated on dataset write operations. - * - * @author Seqera Labs + * Resource-type-agnostic: the filesystem owns the org/workspace cache (shared across + * resource types) and a registry of {@link ResourceTypeHandler}s. Each handler owns + * its own API client and resource-specific caches. 
*/ @Slf4j @CompileStatic class SeqeraFileSystem extends FileSystem { private final SeqeraFileSystemProvider provider0 - final SeqeraDatasetClient client + private SeqeraDatasetClient orgWorkspaceClient /** orgName → orgId */ private final Map orgCache = new LinkedHashMap<>() /** "orgName/workspaceName" → workspaceId */ private final Map workspaceCache = new LinkedHashMap<>() - /** workspaceId → list of DatasetDto */ - private final Map> datasetCache = new LinkedHashMap<>() - /** datasetId → list of DatasetVersionDto */ - private final Map> versionCache = new LinkedHashMap<>() + + /** resourceType → handler */ + private final Map handlers = new LinkedHashMap<>() private volatile boolean orgWorkspaceCacheLoaded = false - SeqeraFileSystem(SeqeraFileSystemProvider provider, SeqeraDatasetClient client) { + SeqeraFileSystem(SeqeraFileSystemProvider provider) { this.provider0 = provider - this.client = client + } + + /** + * Attach the dataset client used for user-info / workspaces lookup. The org/workspace + * listing uses dataset endpoints today ({@code GET /user-info}, {@code GET /user/{id}/workspaces}); + * keeping the client on the filesystem avoids duplicating it across handlers. + */ + void setOrgWorkspaceClient(SeqeraDatasetClient client) { + this.orgWorkspaceClient = client } @Override FileSystemProvider provider() { provider0 } @Override - void close() { /* no-op: platform API connection is stateless */ } + void close() { /* no-op */ } @Override boolean isOpen() { true } @@ -111,16 +115,14 @@ class SeqeraFileSystem extends FileSystem { throw new UnsupportedOperationException("WatchService not supported by seqera:// filesystem") } - // ---- cache management ---- + // ---- org/workspace cache (shared across handlers) ---- - /** - * Ensure the org/workspace cache is populated. Thread-safe: loads at most once. - * Calls GET /user-info then GET /user/{userId}/workspaces. 
- */ synchronized void loadOrgWorkspaceCache() { if (orgWorkspaceCacheLoaded) return + if (!orgWorkspaceClient) + throw new IllegalStateException("SeqeraFileSystem has no orgWorkspaceClient attached") log.debug "Loading Seqera org/workspace cache" - final entries = client.listUserWorkspacesAndOrgs(client.getUserId()) + final entries = orgWorkspaceClient.listUserWorkspacesAndOrgs(orgWorkspaceClient.getUserId()) for (OrgAndWorkspaceDto entry : entries) { if (entry.orgName) orgCache.put(entry.orgName, entry.orgId) @@ -130,28 +132,18 @@ class SeqeraFileSystem extends FileSystem { orgWorkspaceCacheLoaded = true } - /** - * @return distinct org names visible to the authenticated user - */ synchronized Set listOrgNames() { loadOrgWorkspaceCache() return Collections.unmodifiableSet(orgCache.keySet()) } - /** - * @return workspace names for the given org - */ synchronized List listWorkspaceNames(String org) { loadOrgWorkspaceCache() return workspaceCache.keySet() - .findAll { String k -> k.startsWith("${org}/") } - .collect { String k -> k.substring(org.length() + 1) } + .findAll { String k -> k.startsWith("${org}/") } + .collect { String k -> k.substring(org.length() + 1) } } - /** - * Resolve a workspace ID by org and workspace name. - * @throws NoSuchFileException if the org or workspace is not in the cache - */ synchronized long resolveWorkspaceId(String org, String workspace) throws NoSuchFileException { loadOrgWorkspaceCache() final key = "${org}/${workspace}" as String @@ -161,54 +153,17 @@ class SeqeraFileSystem extends FileSystem { return id } - /** - * Return datasets for the given workspace, populating the cache on first access. 
- */ - synchronized List resolveDatasets(long workspaceId) { - List cached = datasetCache.get(workspaceId) - if (cached == null) { - cached = client.listDatasets(workspaceId) - datasetCache.put(workspaceId, cached) - } - return cached - } + // ---- handler registry ---- - /** - * Invalidate the dataset and version caches for a workspace (call after a write operation). - */ - synchronized void invalidateDatasetCache(long workspaceId) { - // Remove version caches for all datasets in this workspace - final datasets = datasetCache.get(workspaceId) - if (datasets) { - for (DatasetDto ds : datasets) { - versionCache.remove(ds.id) - } - } - datasetCache.remove(workspaceId) + synchronized void registerHandler(ResourceTypeHandler handler) { + handlers.put(handler.resourceType, handler) } - /** - * Resolve a DatasetDto by name within a workspace. - * @throws NoSuchFileException if no dataset with the given name exists - */ - synchronized DatasetDto resolveDataset(long workspaceId, String name) throws NoSuchFileException { - final datasets = resolveDatasets(workspaceId) - return datasets.find { DatasetDto d -> d.name == name } + synchronized ResourceTypeHandler getHandler(String resourceType) { + handlers.get(resourceType) } - /** - * Return versions for the given dataset, populating the cache on first access. - * Note: the version cache is only invalidated when the workspace dataset cache is invalidated - * (e.g. after a write operation). Versions published externally during a pipeline run will not - * be visible until the cache is cleared. 
- */ - synchronized List resolveVersions(String datasetId, long workspaceId) { - List cached = versionCache.get(datasetId) - if (cached == null) { - cached = client.listVersions(datasetId, workspaceId) - versionCache.put(datasetId, cached) - } - return cached + synchronized Set getResourceTypes() { + Collections.unmodifiableSet(new LinkedHashSet(handlers.keySet())) } - } diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy index bed6667dd3..68cb9b556b 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy @@ -20,6 +20,7 @@ import java.nio.channels.SeekableByteChannel import java.nio.file.AccessDeniedException import java.nio.file.AccessMode import java.nio.file.CopyOption +import java.nio.file.DirectoryIteratorException import java.nio.file.DirectoryStream import java.nio.file.FileStore import java.nio.file.FileSystem @@ -27,9 +28,7 @@ import java.nio.file.FileSystemAlreadyExistsException import java.nio.file.FileSystemNotFoundException import java.nio.file.Files import java.nio.file.LinkOption -import java.nio.file.DirectoryIteratorException import java.nio.file.NoSuchFileException -import java.nio.file.NotDirectoryException import java.nio.file.OpenOption import java.nio.file.Path import java.nio.file.ProviderMismatchException @@ -41,23 +40,19 @@ import java.nio.file.spi.FileSystemProvider import groovy.transform.CompileStatic import groovy.util.logging.Slf4j -import io.seqera.tower.model.DatasetDto -import io.seqera.tower.model.DatasetVersionDto import io.seqera.tower.plugin.TowerClient import io.seqera.tower.plugin.TowerFactory +import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient import io.seqera.tower.plugin.dataset.SeqeraDatasetClient +import 
io.seqera.tower.plugin.fs.handler.DataLinksResourceHandler +import io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler /** - * NIO {@link FileSystemProvider} for the {@code seqera://} scheme. - * Registered via {@code META-INF/services/java.nio.file.spi.FileSystemProvider}. - * - * Enables Nextflow pipelines to read Seqera Platform datasets as ordinary file paths: - * {@code seqera:////datasets/} - * - * Follows the {@code LinFileSystemProvider} pattern for structure. - * Write support follows the {@code AzFileSystemProvider} buffered-upload pattern. + * NIO {@link FileSystemProvider} for the {@code seqera://} scheme. Registered via + * {@code META-INF/services/java.nio.file.spi.FileSystemProvider}. * - * @author Seqera Labs + * Generic for depth ≥ 3: dispatches to a {@link ResourceTypeHandler} selected by + * {@code SeqeraPath.resourceType}. The handlers own all resource-specific logic. */ @Slf4j @CompileStatic @@ -65,24 +60,26 @@ class SeqeraFileSystemProvider extends FileSystemProvider { public static final String SCHEME = 'seqera' - /** Single filesystem instance — TowerClient is a singleton per session */ private volatile SeqeraFileSystem fileSystem @Override String getScheme() { SCHEME } - // ---- FileSystem lifecycle ---- + // ---- lifecycle ---- @Override synchronized FileSystem newFileSystem(URI uri, Map env) throws IOException { checkScheme(uri) if (fileSystem) throw new FileSystemAlreadyExistsException("File system `seqera://` already exists") - final TowerClient towerClient = TowerFactory.client() - if (!towerClient) - throw new IllegalStateException("File system `seqera://` requires the Seqera Platform access token to be provided - use `tower.accessToken` config option or TOWER_ACCESS_TOKEN env variable") - final client = new SeqeraDatasetClient(towerClient) - fileSystem = new SeqeraFileSystem(this, client) + final TowerClient tc = TowerFactory.client() + if (!tc) + throw new IllegalStateException("File system `seqera://` requires the Seqera 
Platform access token — use `tower.accessToken` config option or TOWER_ACCESS_TOKEN env variable") + final datasetClient = new SeqeraDatasetClient(tc) + fileSystem = new SeqeraFileSystem(this) + fileSystem.setOrgWorkspaceClient(datasetClient) + fileSystem.registerHandler(new DatasetsResourceHandler(fileSystem, datasetClient)) + fileSystem.registerHandler(new DataLinksResourceHandler(fileSystem, new SeqeraDataLinkClient(tc))) return fileSystem } @@ -95,10 +92,7 @@ class SeqeraFileSystemProvider extends FileSystemProvider { synchronized SeqeraFileSystem getOrCreateFileSystem(URI uri, Map env) { checkScheme(uri) - if (!fileSystem) { - final envMap = env ?: Collections.emptyMap() - newFileSystem(uri, envMap as Map) - } + if (!fileSystem) newFileSystem(uri, env ?: Collections.emptyMap()) return fileSystem } @@ -108,32 +102,28 @@ class SeqeraFileSystemProvider extends FileSystemProvider { return new SeqeraPath(fs, uri.toString()) } - // ---- Read operations ---- + // ---- read ---- @Override InputStream newInputStream(Path path, OpenOption... 
options) throws IOException { final sp = toSeqeraPath(path) - if (sp.depth() != 4) - throw new IllegalArgumentException("Operation `newInputStream` requires a dataset path (depth 4): $path") + if (sp.depth() < 3) + throw new IllegalArgumentException("newInputStream requires a leaf path: $path") final fs = sp.getFileSystem() as SeqeraFileSystem - final workspaceId = fs.resolveWorkspaceId(sp.org, sp.workspace) - final dataset = fs.resolveDataset(workspaceId, sp.datasetName) - if (!dataset) - throw new NoSuchFileException(sp.toString(), null, "Dataset '${sp.datasetName}' not found in workspace $sp.workspace") - final version = resolveVersion(fs, dataset, sp) - log.debug "Downloading dataset '${sp.datasetName}' version ${version.version} (${version.fileName}) from workspace $workspaceId" - return fs.client.downloadDataset(dataset.id, String.valueOf(version.version), version.fileName, dataset.workspaceId) + final h = fs.getHandler(sp.resourceType) + if (!h) + throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}") + return h.newInputStream(sp) } @Override SeekableByteChannel newByteChannel(Path path, Set options, FileAttribute... attrs) throws IOException { if (options?.contains(StandardOpenOption.WRITE) || options?.contains(StandardOpenOption.APPEND)) - throw new UnsupportedOperationException("File system `seqera://` is read-only") - final inputStream = newInputStream(path) - return new DatasetInputStream(inputStream) + throw new UnsupportedOperationException("seqera:// filesystem is read-only") + return new DatasetInputStream(newInputStream(path)) } - // ---- Metadata ---- + // ---- attributes ---- @Override A readAttributes(Path path, Class type, LinkOption... 
options) throws IOException { @@ -142,17 +132,14 @@ class SeqeraFileSystemProvider extends FileSystemProvider { final sp = toSeqeraPath(path) final fs = sp.getFileSystem() as SeqeraFileSystem final d = sp.depth() - if (d < 4) { - // Virtual directory — validate the path exists (throws NoSuchFileException if not) - validateDirectoryExists(fs, sp) + if (d < 3) { + validateSharedDirectoryExists(fs, sp) return (A) new SeqeraFileAttributes(true) } - // Dataset file - final workspaceId = fs.resolveWorkspaceId(sp.org, sp.workspace) - final dataset = fs.resolveDataset(workspaceId, sp.datasetName) - if (!dataset) - throw new NoSuchFileException(sp.toString(), null, "Dataset '${sp.datasetName}' not found in workspace $sp.workspace") - return (A) new SeqeraFileAttributes(dataset) + final h = fs.getHandler(sp.resourceType) + if (!h) + throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}") + return (A) h.readAttributes(sp) } @Override @@ -160,7 +147,7 @@ class SeqeraFileSystemProvider extends FileSystemProvider { throw new UnsupportedOperationException("Operation `readAttributes(String)` not supported by `seqera://` file system") } - // ---- Access check ---- + // ---- access ---- @Override void checkAccess(Path path, AccessMode... 
modes) throws IOException { @@ -169,73 +156,107 @@ class SeqeraFileSystemProvider extends FileSystemProvider { if (m == AccessMode.WRITE || m == AccessMode.EXECUTE) throw new AccessDeniedException(path.toString(), null, "seqera:// filesystem is read-only") } - // For READ, verify the path resolves without throwing NoSuchFileException - if (sp.depth() >= 1) { - final fs = sp.getFileSystem() as SeqeraFileSystem - if (sp.depth() == 1) { - fs.loadOrgWorkspaceCache() - if (!fs.listOrgNames().contains(sp.org)) - throw new NoSuchFileException(path.toString(), null, "Organisation not found") - } else { - fs.resolveWorkspaceId(sp.org, sp.workspace) - } + final d = sp.depth() + if (d == 0) return + if (d < 3) { + validateSharedDirectoryExists(sp.getFileSystem() as SeqeraFileSystem, sp) + return } + final fs = sp.getFileSystem() as SeqeraFileSystem + final h = fs.getHandler(sp.resourceType) + if (!h) + throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}") + h.checkAccess(sp, modes) } - // ---- Directory stream ---- + // ---- directory stream ---- @Override DirectoryStream newDirectoryStream(Path dir, DirectoryStream.Filter filter) throws IOException { final sp = toSeqeraPath(dir) final fs = sp.getFileSystem() as SeqeraFileSystem final d = sp.depth() - List entries + Iterable entries if (d == 0) { - // Root: list distinct org names fs.loadOrgWorkspaceCache() entries = fs.listOrgNames().collect { String org -> sp.resolve(org) as Path } } else if (d == 1) { - // Org: list workspace names fs.loadOrgWorkspaceCache() entries = fs.listWorkspaceNames(sp.org).collect { String ws -> sp.resolve(ws) as Path } } else if (d == 2) { - // Workspace: static resource types - entries = ['datasets'].collect { String rt -> sp.resolve(rt) as Path } - } else if (d == 3) { - // Resource type directory: list dataset names - final workspaceId = fs.resolveWorkspaceId(sp.org, sp.workspace) - entries = fs.resolveDatasets(workspaceId).collect { DatasetDto ds 
-> - sp.resolve(ds.name) as Path - } + fs.resolveWorkspaceId(sp.org, sp.workspace) + entries = fs.getResourceTypes().collect { String rt -> sp.resolve(rt) as Path } } else { - throw new NotDirectoryException(dir.toString()) + final h = fs.getHandler(sp.resourceType) + if (!h) + throw new NoSuchFileException(dir.toString(), null, "Unsupported resource type: ${sp.resourceType}") + entries = h.list(sp) } - final filtered = filter ? entries.findAll { Path p -> - try { filter.accept(p) } - catch (IOException e) { throw new DirectoryIteratorException(e) } - } : entries - + final source = entries return new DirectoryStream() { - @Override Iterator iterator() { filtered.iterator() } + @Override + Iterator iterator() { + final inner = source.iterator() + if (!filter) return inner + return new FilteredIterator(inner, filter) + } @Override void close() {} } } - // ---- Copy ---- + /** Lazy filtering iterator: calls the filter as each element is consumed. */ + @CompileStatic + private static class FilteredIterator implements Iterator { + private final Iterator inner + private final DirectoryStream.Filter filter + private T buffered + private boolean hasBuffered = false + + FilteredIterator(Iterator inner, DirectoryStream.Filter filter) { + this.inner = inner + this.filter = filter + } + + @Override + boolean hasNext() { + while (!hasBuffered && inner.hasNext()) { + final candidate = inner.next() + try { + if (filter.accept(candidate)) { + buffered = candidate + hasBuffered = true + } + } catch (IOException e) { + throw new DirectoryIteratorException(e) + } + } + return hasBuffered + } + + @Override + T next() { + if (!hasNext()) throw new NoSuchElementException() + final out = buffered + buffered = null + hasBuffered = false + return out + } + } + + // ---- copy ---- @Override void copy(Path source, Path target, CopyOption... 
options) throws IOException { toSeqeraPath(source) if (target instanceof SeqeraPath) throw new UnsupportedOperationException("seqera:// filesystem is read-only") - // cross-provider (seqera → local): stream to target try (final InputStream is = newInputStream(source)) { Files.copy(is, target, options) } } - // ---- Unsupported mutations ---- + // ---- unsupported mutations ---- @Override void move(Path source, Path target, CopyOption... options) { @@ -252,7 +273,7 @@ class SeqeraFileSystemProvider extends FileSystemProvider { throw new UnsupportedOperationException("createDirectory() not supported by seqera:// filesystem") } - // ---- Misc ---- + // ---- misc ---- @Override boolean isSameFile(Path path, Path path2) throws IOException { @@ -277,11 +298,10 @@ class SeqeraFileSystemProvider extends FileSystemProvider { throw new UnsupportedOperationException("setAttribute() not supported by seqera:// filesystem") } - // ---- private helpers ---- + // ---- helpers ---- private static SeqeraPath toSeqeraPath(Path path) { - if (path !instanceof SeqeraPath) - throw new ProviderMismatchException() + if (path !instanceof SeqeraPath) throw new ProviderMismatchException() return (SeqeraPath) path } @@ -290,36 +310,13 @@ class SeqeraFileSystemProvider extends FileSystemProvider { throw new IllegalArgumentException("Not a seqera:// URI: $uri") } - private static void validateDirectoryExists(SeqeraFileSystem fs, SeqeraPath sp) throws NoSuchFileException { + private static void validateSharedDirectoryExists(SeqeraFileSystem fs, SeqeraPath sp) throws NoSuchFileException { final d = sp.depth() if (d == 0) return - // Depth 1+: ensure org/workspace cache is loaded fs.loadOrgWorkspaceCache() if (d >= 1 && !fs.listOrgNames().contains(sp.org)) throw new NoSuchFileException("seqera://${sp.org}", null, "Organisation not found") if (d >= 2) fs.resolveWorkspaceId(sp.org, sp.workspace) - if (d >= 3 && sp.resourceType != 'datasets') - throw new 
NoSuchFileException("seqera://${sp.org}/${sp.workspace}/${sp.resourceType}", null, "Unsupported resource type") - } - - private static DatasetVersionDto resolveVersion(SeqeraFileSystem fs, DatasetDto dataset, SeqeraPath sp) throws IOException { - final pinnedVersion = sp.version - final versions = fs.resolveVersions(dataset.id, dataset.workspaceId) - if (versions.isEmpty()) - throw new NoSuchFileException(sp.toString(), null, "No versions available for dataset '${dataset.name}'") - if (pinnedVersion) { - final found = versions.find { DatasetVersionDto v -> String.valueOf(v.version) == pinnedVersion } - if (!found) - throw new NoSuchFileException(sp.toString(), null, "Version '${pinnedVersion}' not found for dataset '${dataset.name}'") - return found - } - // Latest non-disabled version - final latest = versions.findAll { DatasetVersionDto v -> !v.disabled } - .max { DatasetVersionDto v -> v.version } - if (!latest) - throw new NoSuchFileException(sp.toString(), null, "No enabled versions for dataset '${dataset.name}'") - return latest } - } diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy index af2b39165c..f3c5ea77fd 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy @@ -30,211 +30,164 @@ import groovy.transform.CompileStatic /** * {@link Path} implementation for the {@code seqera://} scheme. * - * Path hierarchy: + * Path shape: *
- *   depth 0  seqera://                                   (root — directory)
- *   depth 1  seqera://<org>                              (org — directory)
- *   depth 2  seqera://<org>/<workspace>                  (workspace — directory)
- *   depth 3  seqera://<org>/<workspace>/datasets          (resource type — directory)
- *   depth 4  seqera://<org>/<workspace>/datasets/<name>  (dataset file)
- *            seqera://<org>/<workspace>/datasets/<name@ver>  (pinned version)
+ *   seqera://                              depth 0 — root
+ *   seqera://<org>                         depth 1
+ *   seqera://<org>/<ws>                    depth 2
+ *   seqera://<org>/<ws>/<type>             depth 3 — resource type
+ *   seqera://<org>/<ws>/<type>/...         depth 4+ — handler-owned trail
  * </pre>
* - * @author Seqera Labs + * Resource-type-agnostic for depth ≥ 3: segments after {@code resourceType} are + * exposed as {@link #getTrail()} and interpreted by the matching + * {@link ResourceTypeHandler}. */ @CompileStatic class SeqeraPath implements Path { - /** URI scheme */ public static final String SCHEME = 'seqera' public static final String PROTOCOL = "${SCHEME}://" public static final String SEPARATOR = '/' private final SeqeraFileSystem fs - /** path segments in order: [org, workspace, resourceType, datasetName] — null for missing levels */ private final String org private final String workspace private final String resourceType - private final String datasetName - /** version string extracted from {@code @version} suffix; null when not pinned */ - private final String version - /** - * Raw relative path string — non-null only for relative {@code SeqeraPath} instances - * created by {@link #relativize(Path)}. When non-null, {@link #fs} is {@code null} - * and all segment fields are {@code null}. - */ + private final List trail + /** non-null only for relative paths produced by {@link #relativize(Path)} */ private final String relPath - /** - * Parse a {@code seqera://} URI string into a SeqeraPath. - * The URI authority is the org; path segments are workspace, resourceType, datasetName. - * The last segment may contain a {@code @version} suffix. - */ + /** Parse a {@code seqera://} URI string. 
*/ SeqeraPath(SeqeraFileSystem fs, String uriString) { this.fs = fs this.relPath = null - if (!uriString.startsWith("${SCHEME}://")) + if (!uriString.startsWith(PROTOCOL)) throw new InvalidPathException(uriString, "Not a seqera:// URI") - // strip scheme: seqera://rest - final withoutScheme = uriString.substring("${SCHEME}://".length()) - // split on '/' - final parts = withoutScheme.split('/', -1).toList().findAll { it != null } as List - // parts[0]=org, parts[1]=workspace, parts[2]=resourceType, parts[3]=datasetName[@version] - this.org = parts.size() > 0 && parts[0] ? parts[0] : null - this.workspace = parts.size() > 1 && parts[1] ? parts[1] : null + final withoutScheme = uriString.substring(PROTOCOL.length()) + final parts = withoutScheme.split('/', -1).toList().findAll { String s -> s != null } as List + this.org = parts.size() > 0 && parts[0] ? parts[0] : null + this.workspace = parts.size() > 1 && parts[1] ? parts[1] : null this.resourceType = parts.size() > 2 && parts[2] ? parts[2] : null - if (parts.size() > 3 && parts[3]) { - final last = parts[3] - final atIdx = last.lastIndexOf('@') - if (atIdx > 0) { - this.datasetName = last.substring(0, atIdx) - this.version = last.substring(atIdx + 1) - } else { - this.datasetName = last - this.version = null - } - } else { - this.datasetName = null - this.version = null - } + // Trail: drop any empty segments (trailing slash, accidental double-slashes) + final List tail = parts.size() > 3 + ? parts.subList(3, parts.size()).findAll { String s -> s } as List + : new ArrayList() + this.trail = Collections.unmodifiableList(tail) validatePath(uriString) } - /** Internal constructor for programmatic absolute path creation */ - SeqeraPath(SeqeraFileSystem fs, String org, String workspace, String resourceType, String datasetName, String version) { + /** Programmatic absolute-path constructor. 
*/ + SeqeraPath(SeqeraFileSystem fs, String org, String workspace, String resourceType, List trail) { this.fs = fs this.relPath = null this.org = org this.workspace = workspace this.resourceType = resourceType - this.datasetName = datasetName - this.version = version + this.trail = trail != null + ? Collections.unmodifiableList(new ArrayList(trail)) + : Collections.emptyList() validatePath(null) } - /** - * Constructor for relative paths produced by {@link #relativize(Path)}. - * The {@code relPath} is a slash-separated string of the differing path segments. - * All segment fields are {@code null}; {@link #isAbsolute()} returns {@code false}. - */ + /** Relative path produced only by {@link #relativize(Path)}. */ SeqeraPath(String relPath) { this.fs = null this.relPath = relPath ?: '' this.org = null this.workspace = null this.resourceType = null - this.datasetName = null - this.version = null + this.trail = Collections.emptyList() } - /** - * Validate structural integrity: deeper segments require all shallower ones, - * and no segment may contain {@code /}. 
-     *
-     * @param original original URI string used in error messages (null → derive from fields)
-     * @throws InvalidPathException if the path is malformed
-     */
     private void validatePath(String original) {
         final label = original ?: rawPath()
-        if (datasetName && !workspace)
-            throw new InvalidPathException(label, "Dataset path requires a workspace segment")
+        if (trail && !resourceType)
+            throw new InvalidPathException(label, "Trail segments require a resource-type segment")
         if (resourceType && !workspace)
             throw new InvalidPathException(label, "Resource type requires a workspace segment")
         if (workspace && !org)
             throw new InvalidPathException(label, "Workspace requires an org segment")
-        // Segments from URI parsing never contain '/', but guard the internal constructor too
         if (org?.contains('/'))
             throw new InvalidPathException(label, "Org name cannot contain '/'")
         if (workspace?.contains('/'))
             throw new InvalidPathException(label, "Workspace name cannot contain '/'")
         if (resourceType?.contains('/'))
             throw new InvalidPathException(label, "Resource type cannot contain '/'")
-        if (datasetName?.contains('/'))
-            throw new InvalidPathException(label, "Dataset name cannot contain '/'")
+        for (String t : trail) {
+            if (t == null || t.isEmpty())
+                throw new InvalidPathException(label, "Path segments cannot be empty")
+            if (t.contains('/'))
+                throw new InvalidPathException(label, "Path segments cannot contain '/'")
+        }
+    }
+
+    private String rawPath() {
+        final sb = new StringBuilder(PROTOCOL)
+        if (org) sb.append(org)
+        if (workspace) sb.append('/').append(workspace)
+        if (resourceType) sb.append('/').append(resourceType)
+        for (String t : trail) sb.append('/').append(t)
+        return sb.toString()
     }

-    /** Return a list of name component strings (works for both absolute and relative paths). */
     private List<String> nameComponents() {
         if (isAbsolute()) {
             final d = depth()
-            final result = new ArrayList<String>(d)
+            final out = new ArrayList<String>(d)
             for (int i = 0; i < d; i++)
-                result.add(getName(i).toString())
-            return result
+                out.add(getName(i).toString())
+            return out
         }
         if (!relPath)
             return Collections.emptyList()
         return relPath.split('/').toList().findAll { String s -> s } as List<String>
     }

-    /** Build a raw path string from the current fields, for use in exception messages. */
-    private String rawPath() {
-        final sb = new StringBuilder("${SCHEME}://")
-        if (org) sb.append(org)
-        if (workspace) sb.append('/').append(workspace)
-        if (resourceType) sb.append('/').append(resourceType)
-        if (datasetName) {
-            sb.append('/').append(datasetName)
-            if (version) sb.append('@').append(version)
-        }
-        return sb.toString()
-    }
-
-    // ---- path component accessors ----
+    // ---- accessors ----

     String getOrg() { org }
     String getWorkspace() { workspace }
     String getResourceType() { resourceType }
-    String getDatasetName() { datasetName }
-    String getVersion() { version }
+    List<String> getTrail() { trail }

-    /**
-     * Path depth: 0=root, 1=org, 2=workspace, 3=resourceType, 4=dataset file.
-     */
     int depth() {
-        if (datasetName) return 4
-        if (resourceType) return 3
+        if (resourceType) return 3 + trail.size()
         if (workspace) return 2
         if (org) return 1
         return 0
     }

-    boolean isDirectory() { depth() < 4 }
-    boolean isRegularFile() { depth() == 4 }
-
     // ---- Path API ----

-    @Override
-    FileSystem getFileSystem() { fs }
+    @Override FileSystem getFileSystem() { fs }
+    @Override boolean isAbsolute() { fs != null }

     @Override
-    boolean isAbsolute() { fs != null }
-
-    @Override
-    Path getRoot() { new SeqeraPath(fs, null, null, null, null, null) }
+    Path getRoot() { new SeqeraPath(fs, null, null, null, null) }

     @Override
     Path getFileName() {
         final d = depth()
         if (d == 0) return null
-        final name = d == 4 ? (version ? "${datasetName}@${version}" : datasetName)
-                : d == 3 ? resourceType
-                : d == 2 ? workspace
-                : org
-        return new SeqeraPath(name as String)
+        if (d >= 4) return new SeqeraPath(trail[trail.size() - 1])
+        if (d == 3) return new SeqeraPath(resourceType)
+        if (d == 2) return new SeqeraPath(workspace)
+        return new SeqeraPath(org)
     }

     @Override
     Path getParent() {
         final d = depth()
         if (d == 0) return null
-        if (d == 1) return new SeqeraPath(fs, null, null, null, null, null)
-        if (d == 2) return new SeqeraPath(fs, org, null, null, null, null)
-        if (d == 3) return new SeqeraPath(fs, org, workspace, null, null, null)
-        return new SeqeraPath(fs, org, workspace, resourceType, null, null)
+        if (d == 1) return new SeqeraPath(fs, null, null, null, null)
+        if (d == 2) return new SeqeraPath(fs, org, null, null, null)
+        if (d == 3) return new SeqeraPath(fs, org, workspace, null, null)
+        // d >= 4: drop last trail segment
+        final newTrail = trail.subList(0, trail.size() - 1)
+        return new SeqeraPath(fs, org, workspace, resourceType, newTrail)
     }

-    @Override
-    int getNameCount() { depth() }
+    @Override int getNameCount() { depth() }

     @Override
     Path getName(int index) {
@@ -244,7 +197,7 @@ class SeqeraPath implements Path {
         if (index == 0) return new SeqeraPath(org)
         if (index == 1) return new SeqeraPath(workspace)
         if (index == 2) return new SeqeraPath(resourceType)
-        return new SeqeraPath((version ? "${datasetName}@${version}" : datasetName) as String)
+        return new SeqeraPath(trail[index - 3])
     }

     @Override
@@ -254,19 +207,14 @@ class SeqeraPath implements Path {
     @Override
     boolean startsWith(Path other) {
-        if (other !instanceof SeqeraPath)
-            return false
+        if (other !instanceof SeqeraPath) return false
         final that = (SeqeraPath) other
-        if (this.isAbsolute() != that.isAbsolute())
-            return false
-        final thisNames = this.nameComponents()
-        final thatNames = that.nameComponents()
-        if (thatNames.size() > thisNames.size())
-            return false
-        for (int i = 0; i < thatNames.size(); i++) {
-            if (thisNames[i] != thatNames[i])
-                return false
-        }
+        if (this.isAbsolute() != that.isAbsolute()) return false
+        final mine = nameComponents()
+        final theirs = that.nameComponents()
+        if (theirs.size() > mine.size()) return false
+        for (int i = 0; i < theirs.size(); i++)
+            if (mine[i] != theirs[i]) return false
         return true
     }
@@ -276,27 +224,20 @@ class SeqeraPath implements Path {
         try {
             final Path p = SeqeraPath.isSeqeraUri(other) ? new SeqeraPath(fs, other) : new SeqeraPath(other)
             return startsWith(p)
-        } catch (Exception ignored) {
-            return false
-        }
+        } catch (Exception ignored) { return false }
     }

     @Override
     boolean endsWith(Path other) {
-        if (other !instanceof SeqeraPath)
-            return false
+        if (other !instanceof SeqeraPath) return false
         final that = (SeqeraPath) other
-        if (that.isAbsolute())
-            return this.equals(that)
-        final thisNames = this.nameComponents()
-        final thatNames = that.nameComponents()
-        if (thatNames.isEmpty() || thatNames.size() > thisNames.size())
-            return false
-        final offset = thisNames.size() - thatNames.size()
-        for (int i = 0; i < thatNames.size(); i++) {
-            if (thisNames[offset + i] != thatNames[i])
-                return false
-        }
+        if (that.isAbsolute()) return this.equals(that)
+        final mine = nameComponents()
+        final theirs = that.nameComponents()
+        if (theirs.isEmpty() || theirs.size() > mine.size()) return false
+        final offset = mine.size() - theirs.size()
+        for (int i = 0; i < theirs.size(); i++)
+            if (mine[offset + i] != theirs[i]) return false
         return true
     }
@@ -306,20 +247,16 @@ class SeqeraPath implements Path {
         try {
             final Path p = SeqeraPath.isSeqeraUri(other) ? new SeqeraPath(fs, other) : new SeqeraPath(other)
             return endsWith(p)
-        } catch (Exception ignored) {
-            return false
-        }
+        } catch (Exception ignored) { return false }
     }

-    @Override
-    Path normalize() { this }
+    @Override Path normalize() { this }

     @Override
     Path resolve(Path other) {
         if (other instanceof SeqeraPath) {
             final that = (SeqeraPath) other
             if (that.isAbsolute())
                 return that
-            // Relative SeqeraPath: resolve each segment of relPath against this
             return resolve(that.relPath)
         }
         return resolve(other.toString())
@@ -328,34 +265,24 @@ class SeqeraPath implements Path {
     @Override
     Path resolve(String segment) {
         if (!segment) return this
-        // Absolute seqera:// URI — parse and return directly
         if (segment.startsWith(PROTOCOL))
             return new SeqeraPath(fs, segment)
-        // Strip a single leading slash
         final stripped = segment.startsWith(SEPARATOR) ? segment.substring(1) : segment
         if (!stripped) return this
-        // Multi-segment: split and resolve one segment at a time
         final segs = stripped.split(SEPARATOR, -1).findAll { String s -> s } as List<String>
         SeqeraPath result = this
-        for (String seg : segs) {
-            result = result.resolveOne(seg)
-        }
+        for (String seg : segs) result = result.resolveOne(seg)
         return result
     }

-    /** Resolve a single (non-empty, slash-free) segment against this path. */
     private SeqeraPath resolveOne(String seg) {
         final d = depth()
-        if (d == 0) return new SeqeraPath(fs, seg, null, null, null, null)
-        if (d == 1) return new SeqeraPath(fs, org, seg, null, null, null)
-        if (d == 2) return new SeqeraPath(fs, org, workspace, seg, null, null)
-        if (d == 3) {
-            final atIdx = seg.lastIndexOf('@')
-            if (atIdx > 0)
-                return new SeqeraPath(fs, org, workspace, resourceType, seg.substring(0, atIdx), seg.substring(atIdx + 1))
-            return new SeqeraPath(fs, org, workspace, resourceType, seg, null)
-        }
-        throw new IllegalStateException("Cannot resolve a path segment on a depth-4 path: $this")
+        if (d == 0) return new SeqeraPath(fs, seg, null, null, null)
+        if (d == 1) return new SeqeraPath(fs, org, seg, null, null)
+        if (d == 2) return new SeqeraPath(fs, org, workspace, seg, null)
+        final newTrail = new ArrayList<String>(trail)
+        newTrail.add(seg)
+        return new SeqeraPath(fs, org, workspace, resourceType, newTrail)
     }

     @Override
@@ -372,47 +299,36 @@ class SeqeraPath implements Path {
     @Override
     Path relativize(Path other) {
-        if (other !instanceof SeqeraPath)
-            throw new ProviderMismatchException()
+        if (other !instanceof SeqeraPath) throw new ProviderMismatchException()
         final that = (SeqeraPath) other
         if (!this.isAbsolute() || !that.isAbsolute())
             throw new IllegalArgumentException("Both paths must be absolute to relativize: ${this} vs ${other}")
-        final thisNames = this.nameComponents()
-        final thatNames = that.nameComponents()
-        // Find common prefix length
+        final mine = this.nameComponents()
+        final theirs = that.nameComponents()
         int common = 0
-        while (common < thisNames.size() && common < thatNames.size()
-                && thisNames[common] == thatNames[common])
-            common++
-        // Build ".." for each remaining segment in this, then append remaining segments of other
+        while (common < mine.size() && common < theirs.size() && mine[common] == theirs[common]) common++
         final parts = new ArrayList<String>()
-        for (int i = common; i < thisNames.size(); i++)
-            parts.add('..')
-        for (int i = common; i < thatNames.size(); i++)
-            parts.add(thatNames[i])
+        for (int i = common; i < mine.size(); i++) parts.add('..')
+        for (int i = common; i < theirs.size(); i++) parts.add(theirs[i])
         return new SeqeraPath(parts.join(SEPARATOR))
     }

     @Override
     URI toUri() {
-        // Build path component for depth >= 2
         String uriPath = null
         if (workspace) {
             final segments = [workspace]
             if (resourceType) segments.add(resourceType)
-            if (datasetName) segments.add(version ? "${datasetName}@${version}" as String : datasetName)
+            for (String t : trail) segments.add(t)
             uriPath = '/' + segments.join('/')
         }
-        // new URI(scheme, authority, path, query, fragment) avoids URI.create() pitfalls for edge cases
         return new URI(SCHEME, org ?: '', uriPath, null, null)
     }

     @Override
     String toString() {
         if (!isAbsolute())
             return relPath
-        // Return the canonical human-readable representation
-        final d = depth()
-        if (d == 0) return "${SCHEME}://"
+        if (depth() == 0) return PROTOCOL
         return toUri().toString()
     }
@@ -423,38 +339,30 @@ class SeqeraPath implements Path {
         return this
     }

-    @Override
-    Path toRealPath(LinkOption... options) { this }
+    @Override Path toRealPath(LinkOption... options) { this }

     @Override
-    File toFile() {
-        throw new UnsupportedOperationException("toFile() not supported for seqera:// paths")
-    }
+    File toFile() { throw new UnsupportedOperationException("toFile() not supported for seqera:// paths") }

     @Override
-    WatchKey register(WatchService watcher, WatchEvent.Kind[] events, WatchEvent.Modifier... modifiers) {
+    WatchKey register(WatchService w, WatchEvent.Kind[] e, WatchEvent.Modifier... m) {
         throw new UnsupportedOperationException("WatchService not supported by seqera:// paths")
     }

     @Override
-    WatchKey register(WatchService watcher, WatchEvent.Kind... events) {
+    WatchKey register(WatchService w, WatchEvent.Kind... e) {
         throw new UnsupportedOperationException("WatchService not supported by seqera:// paths")
     }

     @Override
     Iterator<Path> iterator() {
         final d = depth()
-        final List<Path> parts = new ArrayList<>(d)
-        for (int i = 0; i < d; i++) {
-            parts.add(getName(i))
-        }
-        return parts.iterator()
+        final out = new ArrayList<Path>(d)
+        for (int i = 0; i < d; i++) out.add(getName(i))
+        return out.iterator()
     }

-    @Override
-    int compareTo(Path other) {
-        return toString().compareTo(other.toString())
-    }
+    @Override int compareTo(Path other) { toString().compareTo(other.toString()) }

     @Override
     boolean equals(Object obj) {
@@ -463,23 +371,17 @@ class SeqeraPath implements Path {
         return toString() == obj.toString()
     }

-    @Override
-    int hashCode() { toString().hashCode() }
+    @Override int hashCode() { toString().hashCode() }

     static URI asUri(String path) {
-        if( !path )
-            throw new IllegalArgumentException("Missing 'path' argument")
-        if( !path.startsWith(PROTOCOL) )
+        if (!path) throw new IllegalArgumentException("Missing 'path' argument")
+        if (!path.startsWith(PROTOCOL))
             throw new IllegalArgumentException("Invalid Seqera file system path URI - it must start with '${PROTOCOL}' prefix - offending value: $path")
-        if( path.startsWith(PROTOCOL + SEPARATOR) && path.length() > PROTOCOL.length() + 1 )
+        if (path.startsWith(PROTOCOL + SEPARATOR) && path.length() > PROTOCOL.length() + 1)
             throw new IllegalArgumentException("Invalid Seqera file system path URI - make sure the scheme prefix does not contain more than two slash characters or a query in the root '/' - offending value: $path")
-
-        //URI strings like seqera://./something are converted to seqera://something
-        if( path.startsWith(PROTOCOL + './') ) {
+        if (path.startsWith(PROTOCOL + './'))
             path = PROTOCOL + path.substring(PROTOCOL.length() + 2)
-        }
-
-        if( path == PROTOCOL || path == PROTOCOL + '.') //Empty path case
+        if (path == PROTOCOL || path == PROTOCOL + '.')
             return new URI(PROTOCOL + '/')
         return new URI(path)
     }
diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy
new file mode 100644
index 0000000000..5a5715e86c
--- /dev/null
+++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy
@@ -0,0 +1,211 @@
+/*
+ * Copyright 2013-2026, Seqera Labs
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package io.seqera.tower.plugin.fs.handler
+
+import java.net.http.HttpClient
+import java.net.http.HttpRequest
+import java.net.http.HttpResponse
+import java.nio.file.AccessDeniedException
+import java.nio.file.AccessMode
+import java.nio.file.NoSuchFileException
+import java.nio.file.Path
+import java.time.Duration
+import java.time.Instant
+
+import groovy.transform.CompileStatic
+import groovy.util.logging.Slf4j
+import io.seqera.tower.model.DataLinkDto
+import io.seqera.tower.model.DataLinkItem
+import io.seqera.tower.model.DataLinkItemType
+import io.seqera.tower.plugin.datalink.PagedDataLinkContent
+import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient
+import io.seqera.tower.plugin.fs.ResourceTypeHandler
+import io.seqera.tower.plugin.fs.SeqeraFileAttributes
+import io.seqera.tower.plugin.fs.SeqeraFileSystem
+import io.seqera.tower.plugin.fs.SeqeraPath
+
+/**
+ * {@link ResourceTypeHandler} for the {@code data-links} resource type.
+ *
+ * Listings and attribute queries go through the Seqera Platform API; file reads
+ * use a pre-signed URL obtained from {@code /generate-download-url} and fetched
+ * with a plain JDK {@link HttpClient} — the Seqera {@code Authorization} header
+ * must not be sent to the cloud-backed URL.
+ *
+ * Data-link list and directory content are streamed lazily to avoid materializing
+ * potentially large result sets in memory.
+ */
+@Slf4j
+@CompileStatic
+class DataLinksResourceHandler implements ResourceTypeHandler {
+
+    public static final String TYPE = 'data-links'
+
+    private final SeqeraFileSystem fs
+    private final SeqeraDataLinkClient client
+    private final HttpClient httpClient
+
+    DataLinksResourceHandler(SeqeraFileSystem fs, SeqeraDataLinkClient client) {
+        this(fs, client, HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(10)).build())
+    }
+
+    /** Test-only constructor to inject a mock {@link HttpClient}. */
+    DataLinksResourceHandler(SeqeraFileSystem fs, SeqeraDataLinkClient client, HttpClient httpClient) {
+        this.fs = fs
+        this.client = client
+        this.httpClient = httpClient
+    }
+
+    @Override
+    String getResourceType() { TYPE }
+
+    @Override
+    Iterable<Path> list(SeqeraPath dir) throws IOException {
+        final workspaceId = fs.resolveWorkspaceId(dir.org, dir.workspace)
+        final trail = dir.trail
+        if (trail.isEmpty()) {
+            // data-links/ → distinct providers in use (sorted). Iterate the stream,
+            // collect distinct provider names — small output.
+            final providers = new TreeSet<String>()
+            final Iterator<DataLinkDto> it = client.listDataLinks(workspaceId)
+            while (it.hasNext()) {
+                final p = it.next().provider?.toString()
+                if (p) providers.add(p)
+            }
+            return providers.collect { String p -> dir.resolve(p) as Path }
+        }
+        if (trail.size() == 1) {
+            // data-links/<provider>/ → sorted data-link names for that provider
+            final prov = trail[0]
+            final names = new TreeSet<String>()
+            final Iterator<DataLinkDto> it = client.listDataLinks(workspaceId)
+            while (it.hasNext()) {
+                final dl = it.next()
+                if (dl.provider?.toString() == prov) names.add(dl.name)
+            }
+            if (names.isEmpty())
+                throw new NoSuchFileException(dir.toString(), null, "No data-links for provider '$prov' in workspace '${dir.workspace}'")
+            return names.collect { String n -> dir.resolve(n) as Path }
+        }
+        // trail.size() >= 2 — browse inside a specific data-link.
+        // Content can be very large, so we stream it lazily.
+        final dl = client.getDataLink(workspaceId, trail[0], trail[1])
+        final subPath = trail.size() > 2 ? trail.subList(2, trail.size()).join('/') : ''
+        final content = client.getContent(dl.id, subPath, workspaceId)
+        return new PathMappingIterable(content, dir)
+    }
+
+    @Override
+    SeqeraFileAttributes readAttributes(SeqeraPath p) throws IOException {
+        final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace)
+        final trail = p.trail
+        if (trail.size() < 2) {
+            // data-links/ or data-links/<provider> — always directory
+            return new SeqeraFileAttributes(true)
+        }
+        final dl = client.getDataLink(workspaceId, trail[0], trail[1])
+        if (trail.size() == 2) return new SeqeraFileAttributes(true) // data-link root
+        final subPath = trail.subList(2, trail.size()).join('/')
+        final content = client.getContent(dl.id, subPath, workspaceId)
+        return attributesFor(content, subPath, p)
+    }
+
+    @Override
+    InputStream newInputStream(SeqeraPath p) throws IOException {
+        if (p.trail.size() < 3)
+            throw new IllegalArgumentException("newInputStream requires a file path inside a data-link: $p")
+        final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace)
+        final dl = client.getDataLink(workspaceId, p.trail[0], p.trail[1])
+        final subPath = p.trail.subList(2, p.trail.size()).join('/')
+        final urlResp = client.getDownloadUrl(dl.id, subPath, workspaceId)
+        if (!urlResp.url)
+            throw new NoSuchFileException(p.toString(), null, "Platform returned no download URL")
+        return fetchSignedUrl(urlResp.url)
+    }
+
+    @Override
+    void checkAccess(SeqeraPath p, AccessMode... modes) throws IOException {
+        for (AccessMode m : modes) {
+            if (m == AccessMode.WRITE || m == AccessMode.EXECUTE)
+                throw new AccessDeniedException(p.toString(), null, "seqera:// data-links are read-only")
+        }
+        readAttributes(p)
+    }
+
+    // ---- private helpers ----
+
+    /**
+     * Decide whether {@code subPath} refers to a file or a directory by inspecting
+     * only the first page of the content response — never paginates further.
+     */
+    private static SeqeraFileAttributes attributesFor(PagedDataLinkContent content, String subPath, SeqeraPath pathForErrors) throws NoSuchFileException {
+        final firstPage = content.firstPage
+        final lastSeg = subPath.contains('/') ? subPath.substring(subPath.lastIndexOf('/') + 1) : subPath
+        // Single-file response: one FILE item whose name matches the final segment
+        final single = firstPage.find { DataLinkItem it -> it.name == lastSeg && it.type == DataLinkItemType.FILE }
+        if (single)
+            return new SeqeraFileAttributes(single.size ?: 0L, Instant.EPOCH, Instant.EPOCH, pathForErrors.toString())
+        // If there are children, this is a directory listing
+        if (!firstPage.isEmpty()) return new SeqeraFileAttributes(true)
+        // No items AND no originalPath → path does not exist
+        if (!content.originalPath)
+            throw new NoSuchFileException(pathForErrors.toString(), null, "Path not found inside data-link")
+        return new SeqeraFileAttributes(true)
+    }
+
+    private InputStream fetchSignedUrl(String url) throws IOException {
+        final req = HttpRequest.newBuilder(URI.create(url))
+            .timeout(Duration.ofMinutes(5))
+            .GET()
+            .build()
+        try {
+            final HttpResponse<InputStream> resp = httpClient.send(req, HttpResponse.BodyHandlers.ofInputStream())
+            final status = resp.statusCode()
+            if (status >= 200 && status < 300) return resp.body()
+            try { resp.body()?.close() } catch (Throwable ignored) {}
+            throw new IOException("Signed URL fetch failed: HTTP $status for $url")
+        } catch (InterruptedException e) {
+            Thread.currentThread().interrupt()
+            throw new IOException("Interrupted while fetching signed URL", e)
+        }
+    }
+
+    /**
+     * Lazy {@link Iterable} that maps each {@link DataLinkItem} from a
+     * {@link PagedDataLinkContent} to a child {@link SeqeraPath} under
+     * {@code parent}. Pages are fetched on demand as the iterator advances.
+     */
+    @CompileStatic
+    private static class PathMappingIterable implements Iterable<Path> {
+        private final PagedDataLinkContent content
+        private final SeqeraPath parent
+
+        PathMappingIterable(PagedDataLinkContent content, SeqeraPath parent) {
+            this.content = content
+            this.parent = parent
+        }
+
+        @Override
+        Iterator<Path> iterator() {
+            final Iterator<DataLinkItem> inner = content.iterator()
+            return new Iterator<Path>() {
+                @Override boolean hasNext() { inner.hasNext() }
+                @Override Path next() { parent.resolve(inner.next().name) as Path }
+            }
+        }
+    }
+}
diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy
new file mode 100644
index 0000000000..a04b1bc154
--- /dev/null
+++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy
@@ -0,0 +1,165 @@
+/*
+ * Copyright 2013-2026, Seqera Labs
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package io.seqera.tower.plugin.fs.handler
+
+import java.nio.file.AccessDeniedException
+import java.nio.file.AccessMode
+import java.nio.file.NoSuchFileException
+import java.nio.file.Path
+import java.time.Instant
+
+import groovy.transform.CompileStatic
+import groovy.util.logging.Slf4j
+import io.seqera.tower.model.DatasetDto
+import io.seqera.tower.model.DatasetVersionDto
+import io.seqera.tower.plugin.dataset.SeqeraDatasetClient
+import io.seqera.tower.plugin.fs.ResourceTypeHandler
+import io.seqera.tower.plugin.fs.SeqeraFileAttributes
+import io.seqera.tower.plugin.fs.SeqeraFileSystem
+import io.seqera.tower.plugin.fs.SeqeraPath
+
+/**
+ * {@link ResourceTypeHandler} for the {@code datasets} resource type.
+ * Owns its own dataset/version caches and {@code @version} suffix parsing.
+ */
+@Slf4j
+@CompileStatic
+class DatasetsResourceHandler implements ResourceTypeHandler {
+
+    public static final String TYPE = 'datasets'
+
+    private final SeqeraFileSystem fs
+    private final SeqeraDatasetClient client
+
+    /** workspaceId → list of DatasetDto */
+    private final Map<Long, List<DatasetDto>> datasetCache = new LinkedHashMap<>()
+    /** datasetId → list of DatasetVersionDto */
+    private final Map<String, List<DatasetVersionDto>> versionCache = new LinkedHashMap<>()
+
+    DatasetsResourceHandler(SeqeraFileSystem fs, SeqeraDatasetClient client) {
+        this.fs = fs
+        this.client = client
+    }
+
+    @Override
+    String getResourceType() { TYPE }
+
+    @Override
+    Iterable<Path> list(SeqeraPath dir) throws IOException {
+        final d = dir.depth()
+        if (d == 3) {
+            final workspaceId = fs.resolveWorkspaceId(dir.org, dir.workspace)
+            return resolveDatasets(workspaceId).collect { DatasetDto ds -> dir.resolve(ds.name) as Path }
+        }
+        throw new IllegalArgumentException("datasets handler does not list depth $d paths: $dir")
+    }
+
+    @Override
+    SeqeraFileAttributes readAttributes(SeqeraPath p) throws IOException {
+        final d = p.depth()
+        if (d == 3) {
+            fs.resolveWorkspaceId(p.org, p.workspace) // validates
+            return new SeqeraFileAttributes(true)
+        }
+        if (d != 4)
+            throw new NoSuchFileException(p.toString(), null, "Invalid dataset path depth: $d")
+        final names = parseNameAndVersion(p.trail[0])
+        final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace)
+        final dataset = resolveDataset(workspaceId, names[0])
+        if (!dataset)
+            throw new NoSuchFileException(p.toString(), null, "Dataset '${names[0]}' not found in workspace ${p.workspace}")
+        return new SeqeraFileAttributes(
+            0L,
+            dataset.lastUpdated?.toInstant() ?: Instant.EPOCH,
+            dataset.dateCreated?.toInstant() ?: Instant.EPOCH,
+            dataset.id)
+    }
+
+    @Override
+    InputStream newInputStream(SeqeraPath p) throws IOException {
+        if (p.depth() != 4)
+            throw new IllegalArgumentException("Operation `newInputStream` requires a dataset path (depth 4): $p")
+        final names = parseNameAndVersion(p.trail[0])
+        final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace)
+        final dataset = resolveDataset(workspaceId, names[0])
+        if (!dataset)
+            throw new NoSuchFileException(p.toString(), null, "Dataset '${names[0]}' not found in workspace ${p.workspace}")
+        final version = resolveVersion(dataset, names[1], p)
+        log.debug "Downloading dataset '${names[0]}' version ${version.version} (${version.fileName}) from workspace $workspaceId"
+        return client.downloadDataset(dataset.id, String.valueOf(version.version), version.fileName, dataset.workspaceId)
+    }
+
+    @Override
+    void checkAccess(SeqeraPath p, AccessMode... modes) throws IOException {
+        for (AccessMode m : modes) {
+            if (m == AccessMode.WRITE || m == AccessMode.EXECUTE)
+                throw new AccessDeniedException(p.toString(), null, "seqera:// datasets are read-only")
+        }
+        readAttributes(p)
+    }
+
+    // ---- helpers ----
+
+    /**
+     * Split a trail segment into {@code [name, version]}. Version is {@code null} when
+     * the segment does not contain an {@code @}.
+     */
+    private static String[] parseNameAndVersion(String segment) {
+        final atIdx = segment.lastIndexOf('@')
+        if (atIdx > 0)
+            return [segment.substring(0, atIdx), segment.substring(atIdx + 1)] as String[]
+        return [segment, null] as String[]
+    }
+
+    private synchronized List<DatasetDto> resolveDatasets(long workspaceId) {
+        def cached = datasetCache.get(workspaceId)
+        if (cached == null) {
+            cached = client.listDatasets(workspaceId)
+            datasetCache.put(workspaceId, cached)
+        }
+        return cached
+    }
+
+    private synchronized DatasetDto resolveDataset(long workspaceId, String name) {
+        return resolveDatasets(workspaceId).find { DatasetDto d -> d.name == name }
+    }
+
+    private synchronized List<DatasetVersionDto> resolveVersions(String datasetId, long workspaceId) {
+        def cached = versionCache.get(datasetId)
+        if (cached == null) {
+            cached = client.listVersions(datasetId, workspaceId)
+            versionCache.put(datasetId, cached)
+        }
+        return cached
+    }
+
+    private DatasetVersionDto resolveVersion(DatasetDto dataset, String pinnedVersion, SeqeraPath p) throws IOException {
+        final versions = resolveVersions(dataset.id, dataset.workspaceId)
+        if (versions.isEmpty())
+            throw new NoSuchFileException(p.toString(), null, "No versions available for dataset '${dataset.name}'")
+        if (pinnedVersion) {
+            final found = versions.find { DatasetVersionDto v -> String.valueOf(v.version) == pinnedVersion }
+            if (!found)
+                throw new NoSuchFileException(p.toString(), null, "Version '$pinnedVersion' not found for dataset '${dataset.name}'")
+            return found
+        }
+        final latest = versions.findAll { DatasetVersionDto v -> !v.disabled }.max { DatasetVersionDto v -> v.version }
+        if (!latest)
+            throw new NoSuchFileException(p.toString(), null, "No enabled versions for dataset '${dataset.name}'")
+        return latest
+    }
+}
diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy
new file mode 100644
index 0000000000..e57759b150
--- /dev/null
+++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy
@@ -0,0 +1,243 @@
+/*
+ * Copyright 2013-2026, Seqera Labs
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package io.seqera.tower.plugin.datalink
+
+import java.nio.file.AccessDeniedException
+import java.nio.file.NoSuchFileException
+
+import groovy.json.JsonOutput
+import io.seqera.tower.model.DataLinkDto
+import io.seqera.tower.plugin.TowerClient
+import nextflow.exception.AbortOperationException
+import spock.lang.Specification
+
+/**
+ * Tests for {@link SeqeraDataLinkClient} using a spy {@link TowerClient}.
+ */
+class SeqeraDataLinkClientTest extends Specification {
+
+    private static final String EP = 'https://api.example.com'
+
+    private TowerClient tower() {
+        def tc = Spy(TowerClient)
+        tc.@endpoint = EP
+        return tc
+    }
+
+    private static TowerClient.Response ok(String body) { new TowerClient.Response(200, body) }
+    private static TowerClient.Response err(int code) { new TowerClient.Response(code, "error $code") }
+
+    private static List<DataLinkDto> drain(Iterator<DataLinkDto> it) {
+        final out = new ArrayList<DataLinkDto>()
+        while (it.hasNext()) out.add(it.next())
+        return out
+    }
+
+    // ---- listDataLinks ----
+
+    def "listDataLinks yields DTOs lazily for a single page"() {
+        given:
+        def body = JsonOutput.toJson([dataLinks: [
+            [id: 'dl-1', name: 'inputs', provider: 'aws', resourceRef: 's3://bucket/'],
+            [id: 'dl-2', name: 'archive', provider: 'google', resourceRef: 'gs://bucket/']
+        ], totalSize: 2])
+        def tc = tower()
+        tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(body)
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        def list = drain(client.listDataLinks(10L))
+
+        then:
+        list.size() == 2
+        list[0].id == 'dl-1'
+        list[1].provider.toString() == 'google'
+    }
+
+    def "listDataLinks exhausts pagination across multiple pages"() {
+        given:
+        def p1 = JsonOutput.toJson([dataLinks: [[id: 'dl-1', name: 'a', provider: 'aws']], totalSize: 3])
+        def p2 = JsonOutput.toJson([dataLinks: [[id: 'dl-2', name: 'b', provider: 'aws']], totalSize: 3])
+        def p3 = JsonOutput.toJson([dataLinks: [[id: 'dl-3', name: 'c', provider: 'aws']], totalSize: 3])
+        def tc = tower()
+        tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(p1)
+        tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=1") >> ok(p2)
+        tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=2") >> ok(p3)
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        def list = drain(client.listDataLinks(10L))
+
+        then:
+        list*.id == ['dl-1', 'dl-2', 'dl-3']
+    }
+
+    def "listDataLinks short-circuits — only fetches enough pages to satisfy the consumer"() {
+        given:
+        def p1 = JsonOutput.toJson([dataLinks: [[id: 'dl-1', name: 'a', provider: 'aws']], totalSize: 5])
+        def tc = tower()
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        def it = client.listDataLinks(10L)
+        def first = it.next()
+
+        then:
+        1 * tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(p1)
+        0 * tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=1")
+        first.id == 'dl-1'
+    }
+
+    def "listDataLinks returns empty iterator when workspace has none"() {
+        given:
+        def body = JsonOutput.toJson([dataLinks: [], totalSize: 0])
+        def tc = tower()
+        tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(body)
+        def client = new SeqeraDataLinkClient(tc)
+
+        expect:
+        !client.listDataLinks(10L).hasNext()
+    }
+
+    // ---- getContent ----
+
+    def "getContent on a sub-path uses /browse/{path}"() {
+        given:
+        def body = JsonOutput.toJson([
+            originalPath: 'reads/',
+            objects: [
+                [name: 'a.fq', type: 'FILE', size: 123, mimeType: 'application/gzip'],
+                [name: 'b.fq', type: 'FILE', size: 456, mimeType: 'application/gzip']
+            ]])
+        def tc = tower()
+        tc.sendApiRequest("${EP}/data-links/dl-1/browse/reads/?workspaceId=10") >> ok(body)
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        def resp = client.getContent('dl-1', 'reads/', 10L)
+
+        then:
+        resp.originalPath == 'reads/'
+        resp.firstPage.size() == 2
+        resp.firstPage[0].name == 'a.fq'
+        resp.firstPage[0].size == 123L
+    }
+
+    def "getContent at the data-link root uses /browse (no path)"() {
+        given:
+        def body = JsonOutput.toJson([originalPath: '', objects: [[name: 'a', type: 'FILE', size: 1]]])
+        def tc = tower()
+        tc.sendApiRequest("${EP}/data-links/dl-1/browse?workspaceId=10") >> ok(body)
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        def resp = client.getContent('dl-1', null, 10L)
+
+        then:
+        resp.firstPage*.name == ['a']
+    }
+
+    def "getContent iterator lazily paginates across pages"() {
+        given:
+        def p1 = JsonOutput.toJson([originalPath: '', objects: [[name: 'a', type: 'FILE', size: 1]], nextPageToken: 'T2'])
+        def p2 = JsonOutput.toJson([originalPath: '', objects: [[name: 'b', type: 'FILE', size: 2]]])
+        def tc = tower()
+        def client = new SeqeraDataLinkClient(tc)
+
+        when: 'the caller iterates the full stream'
+        def resp = client.getContent('dl-1', null, 10L)
+        def names = resp.collect { it.name }
+
+        then: 'first page fetched eagerly; second page fetched only when iterator advances past page 1'
+        1 * tc.sendApiRequest("${EP}/data-links/dl-1/browse?workspaceId=10") >> ok(p1)
+        1 * tc.sendApiRequest("${EP}/data-links/dl-1/browse?workspaceId=10&nextPageToken=T2") >> ok(p2)
+        names == ['a', 'b']
+    }
+
+    def "getContent does not fetch page 2 if the caller only consumes the first page"() {
+        given:
+        def p1 = JsonOutput.toJson([originalPath: '', objects: [[name: 'a', type: 'FILE', size: 1]], nextPageToken: 'T2'])
+        def tc = tower()
+        def client = new SeqeraDataLinkClient(tc)
+
+        when: 'caller only reads firstPage metadata without iterating'
+        def resp = client.getContent('dl-1', null, 10L)
+        def first = resp.firstPage
+
+        then:
+        1 * tc.sendApiRequest("${EP}/data-links/dl-1/browse?workspaceId=10") >> ok(p1)
+        0 * tc.sendApiRequest("${EP}/data-links/dl-1/browse?workspaceId=10&nextPageToken=T2")
+        first*.name == ['a']
+    }
+
+    // ---- getDownloadUrl ----
+
+    def "getDownloadUrl returns the signed URL from /generate-download-url"() {
+        given:
+        def tc = tower()
+        def expectedUrl = "${EP}/data-links/dl-1/generate-download-url?workspaceId=10&filePath=" + URLEncoder.encode('reads/a.fq', 'UTF-8')
+        tc.sendApiRequest(expectedUrl) >> ok(JsonOutput.toJson([url: 'https://signed']))
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        def resp = client.getDownloadUrl('dl-1', 'reads/a.fq', 10L)
+
+        then:
+        resp.url == 'https://signed'
+    }
+
+    // ---- error mapping ----
+
+    def "401 throws AbortOperationException"() {
+ given: + def tc = tower() + tc.sendApiRequest(_) >> err(401) + def client = new SeqeraDataLinkClient(tc) + + when: + drain(client.listDataLinks(10L)) + + then: + thrown(AbortOperationException) + } + + def "403 throws AccessDeniedException"() { + given: + def tc = tower() + tc.sendApiRequest(_) >> err(403) + def client = new SeqeraDataLinkClient(tc) + + when: + client.getContent('dl-1', '', 10L) + + then: + thrown(AccessDeniedException) + } + + def "404 throws NoSuchFileException"() { + given: + def tc = tower() + tc.sendApiRequest(_) >> err(404) + def client = new SeqeraDataLinkClient(tc) + + when: + client.getDownloadUrl('dl-1', 'missing', 10L) + + then: + thrown(NoSuchFileException) + } +} diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/ResourceTypeAbstractionTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/ResourceTypeAbstractionTest.groovy new file mode 100644 index 0000000000..ed1958b52b --- /dev/null +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/ResourceTypeAbstractionTest.groovy @@ -0,0 +1,89 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package io.seqera.tower.plugin.fs + +import spock.lang.Specification + +/** + * Guards that the generic NIO layer does not reach into resource-type-specific packages. 
+ * + * {@link SeqeraPath}, {@link SeqeraFileSystem}, {@link SeqeraFileAttributes} must not + * depend on {@code dataset/}, {@code datalink/}, or {@code fs/handler/}. Dispatch goes + * through the {@link ResourceTypeHandler} interface. + * + * {@link SeqeraFileSystemProvider} is the dispatch point: it wires handlers and routes + * calls to them, so it is *expected* to import the handler packages. The guard only + * applies to the generic classes above. + */ +class ResourceTypeAbstractionTest extends Specification { + + static final Class[] GENERIC_CLASSES = [SeqeraPath, SeqeraFileSystem, SeqeraFileAttributes] + + private static File srcRoot() { + // Gradle test cwd may be the plugin module dir or the repo root. + final candidates = [ + 'src/main/io/seqera/tower/plugin/fs', + 'plugins/nf-tower/src/main/io/seqera/tower/plugin/fs' + ] + for (String c : candidates) { + final f = new File(c) + if (f.isDirectory()) return f + } + throw new IllegalStateException("Cannot locate plugin source directory from ${new File('.').absolutePath}") + } + + static final File SRC_ROOT = srcRoot() + + def "generic fs classes do not import resource-type-specific packages"() { + expect: + GENERIC_CLASSES.each { Class c -> + final src = new File(SRC_ROOT, "${c.simpleName}.groovy").text + assert !src.contains('io.seqera.tower.plugin.datalink.'), "${c.simpleName} must not import datalink package" + assert !src.contains('io.seqera.tower.plugin.fs.handler.'), "${c.simpleName} must not import handler package" + assert !src.contains('DataLink'), "${c.simpleName} must not reference data-link types" + assert !src.contains('DatasetDto'), "${c.simpleName} must not reference DatasetDto" + assert !src.contains('DatasetVersionDto'), "${c.simpleName} must not reference DatasetVersionDto" + } + } + + def "generic fs classes do not carry resource-type string literals"() { + expect: + GENERIC_CLASSES.each { Class c -> + final src = new File(SRC_ROOT, "${c.simpleName}.groovy").text + assert 
!src.contains("'datasets'"), "${c.simpleName} must not hard-code the 'datasets' resource type" + assert !src.contains('"datasets"'), "${c.simpleName} must not hard-code the 'datasets' resource type" + assert !src.contains("'data-links'"), "${c.simpleName} must not hard-code the 'data-links' resource type" + assert !src.contains('"data-links"'), "${c.simpleName} must not hard-code the 'data-links' resource type" + } + } + + def "both handlers implement the ResourceTypeHandler interface"() { + expect: + ResourceTypeHandler.isAssignableFrom(io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler) + ResourceTypeHandler.isAssignableFrom(io.seqera.tower.plugin.fs.handler.DataLinksResourceHandler) + } + + def "handlers do not reference each other's resource type"() { + expect: + final datasetSrc = new File(SRC_ROOT, 'handler/DatasetsResourceHandler.groovy').text + final dataLinkSrc = new File(SRC_ROOT, 'handler/DataLinksResourceHandler.groovy').text + !datasetSrc.contains('DataLink') + !datasetSrc.contains('datalink') + !dataLinkSrc.contains('DatasetDto') + !dataLinkSrc.contains('DatasetVersionDto') + } +} diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy index 7a698c1d5a..a7d3cacd33 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy @@ -53,7 +53,10 @@ class SeqeraFileSystemProviderTest extends Specification { private SeqeraFileSystem buildFs(TowerClient tc) { final client = new SeqeraDatasetClient(tc) final provider = new SeqeraFileSystemProvider() - return new SeqeraFileSystem(provider, client) + final fs = new SeqeraFileSystem(provider) + fs.setOrgWorkspaceClient(client) + fs.registerHandler(new io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler(fs, client)) + return fs } private 
static String userInfoJson() { @@ -240,16 +243,18 @@ class SeqeraFileSystemProviderTest extends Specification { entries[0].toString() == 'seqera://acme/research' } - def "newDirectoryStream on workspace returns datasets resource type"() { + def "newDirectoryStream on workspace returns registered resource types"() { given: def tc = spyTower() + tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) + tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) final fs = buildFs(tc) final wsPath = new SeqeraPath(fs, 'seqera://acme/research') when: def entries = fs.provider().newDirectoryStream(wsPath, null).toList() - then: + then: 'only datasets is registered by this test helper; data-links registration happens in the production provider' entries.size() == 1 entries[0].toString() == 'seqera://acme/research/datasets' } @@ -397,9 +402,8 @@ class SeqeraFileSystemProviderTest extends Specification { def "newFileSystem throws FileSystemAlreadyExistsException when filesystem exists"() { given: 'a provider with an existing filesystem' - def tc = spyTower() def provider = new SeqeraFileSystemProvider() - def fs = new SeqeraFileSystem(provider, new SeqeraDatasetClient(tc)) + def fs = new SeqeraFileSystem(provider) provider.@fileSystem = fs when: @@ -408,4 +412,44 @@ class SeqeraFileSystemProviderTest extends Specification { then: thrown(FileSystemAlreadyExistsException) } + + // ---- handler dispatch ---- + + def "newDirectoryStream at workspace enumerates registered handlers (datasets + data-links)"() { + given: + def tc = spyTower() + tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) + tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) + def datasetClient = new SeqeraDatasetClient(tc) + def fs = new SeqeraFileSystem(new SeqeraFileSystemProvider()) + fs.setOrgWorkspaceClient(datasetClient) + fs.registerHandler(new io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler(fs, datasetClient)) + 
fs.registerHandler(new io.seqera.tower.plugin.fs.handler.DataLinksResourceHandler(fs, new io.seqera.tower.plugin.datalink.SeqeraDataLinkClient(tc))) + def wsPath = new SeqeraPath(fs, 'seqera://acme/research') + + when: + def entries = fs.provider().newDirectoryStream(wsPath, null).toList() + + then: + entries*.toString().sort() == [ + 'seqera://acme/research/data-links', + 'seqera://acme/research/datasets' + ] + } + + def "newInputStream on an unsupported resource type throws NoSuchFileException"() { + given: + def tc = spyTower() + tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) + tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) + def fs = buildFs(tc) + def path = new SeqeraPath(fs, 'seqera://acme/research/unknown-type/foo') + + when: + fs.provider().newInputStream(path) + + then: + def ex = thrown(NoSuchFileException) + ex.reason?.contains('Unsupported resource type') + } } diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemTest.groovy index c7db27a534..b600177e69 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemTest.groovy @@ -24,7 +24,8 @@ import io.seqera.tower.plugin.dataset.SeqeraDatasetClient import spock.lang.Specification /** - * Tests for {@link SeqeraFileSystem} caching and workspace resolution using a mock {@link TowerClient}. + * Tests for {@link SeqeraFileSystem} org/workspace cache and handler registry. + * Resource-specific caches (datasets, data-links) are tested against their handlers. 
*/ class SeqeraFileSystemTest extends Specification { @@ -53,7 +54,9 @@ class SeqeraFileSystemTest extends Specification { } private SeqeraFileSystem buildFs(TowerClient tc) { - new SeqeraFileSystem(new SeqeraFileSystemProvider(), new SeqeraDatasetClient(tc)) + final fs = new SeqeraFileSystem(new SeqeraFileSystemProvider()) + fs.setOrgWorkspaceClient(new SeqeraDatasetClient(tc)) + return fs } // ---- cache loading ---- @@ -147,58 +150,21 @@ class SeqeraFileSystemTest extends Specification { thrown(NoSuchFileException) } - // ---- dataset cache ---- + // ---- handler registry ---- - def "resolveDatasets populates cache and returns datasets"() { + def "registerHandler stores and looks up by resource type"() { given: - def tc = spyTower() - tc.sendApiRequest("${ENDPOINT}/datasets?workspaceId=10") >> - ok(JsonOutput.toJson([datasets: [ - [id: 'ds-1', name: 'samples', version: 1L, mediaType: 'text/csv', - dateCreated: '2024-01-01T00:00:00Z', lastUpdated: '2024-01-02T00:00:00Z'] - ], totalSize: 1])) - final fs = buildFs(tc) - - when: - def datasets = fs.resolveDatasets(10L) - - then: - datasets.size() == 1 - datasets[0].name == 'samples' - } - - def "resolveDatasets returns cached result on second call without extra API request"() { - given: - def tc = spyTower() - final datasetsJson = JsonOutput.toJson([datasets: [ - [id: 'ds-1', name: 'samples', version: 1L, mediaType: 'text/csv', - dateCreated: '2024-01-01T00:00:00Z', lastUpdated: '2024-01-02T00:00:00Z'] - ], totalSize: 1]) - final fs = buildFs(tc) - - when: - fs.resolveDatasets(10L) - fs.resolveDatasets(10L) - - then: - 1 * tc.sendApiRequest("${ENDPOINT}/datasets?workspaceId=10") >> ok(datasetsJson) - } - - def "invalidateDatasetCache forces re-fetch on next resolveDatasets call"() { - given: - def tc = spyTower() - final datasetsJson = JsonOutput.toJson([datasets: [ - [id: 'ds-1', name: 'samples', version: 1L, mediaType: 'text/csv', - dateCreated: '2024-01-01T00:00:00Z', lastUpdated: '2024-01-02T00:00:00Z'] - ], 
totalSize: 1]) - final fs = buildFs(tc) + def fs = new SeqeraFileSystem(new SeqeraFileSystemProvider()) + def handler = Mock(ResourceTypeHandler) { + getResourceType() >> 'datasets' + } when: - fs.resolveDatasets(10L) - fs.invalidateDatasetCache(10L) - fs.resolveDatasets(10L) + fs.registerHandler(handler) then: - 2 * tc.sendApiRequest("${ENDPOINT}/datasets?workspaceId=10") >> ok(datasetsJson) + fs.getHandler('datasets') === handler + fs.getHandler('unknown') == null + fs.getResourceTypes() == ['datasets'] as Set } } diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy index 69a5e26915..22afcc1031 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy @@ -16,7 +16,6 @@ package io.seqera.tower.plugin.fs -import io.seqera.tower.plugin.dataset.SeqeraDatasetClient import spock.lang.Specification /** @@ -26,10 +25,11 @@ class SeqeraPathTest extends Specification { private SeqeraFileSystem mockFs() { def provider = new SeqeraFileSystemProvider() - def client = Mock(SeqeraDatasetClient) - return new SeqeraFileSystem(provider, client) + return new SeqeraFileSystem(provider) } + // ---- depth / segment accessors ---- + def "depth 0 - root path"() { given: def fs = mockFs() @@ -37,10 +37,10 @@ class SeqeraPathTest extends Specification { expect: path.depth() == 0 - path.isDirectory() - !path.isRegularFile() path.org == null path.workspace == null + path.resourceType == null + path.trail == [] } def "depth 1 - org path"() { @@ -50,8 +50,6 @@ class SeqeraPathTest extends Specification { expect: path.depth() == 1 - path.isDirectory() - !path.isRegularFile() path.org == 'acme' path.workspace == null } @@ -63,7 +61,6 @@ class SeqeraPathTest extends Specification { expect: path.depth() == 2 - path.isDirectory() path.org == 'acme' path.workspace == 'research' 
path.resourceType == null @@ -76,40 +73,58 @@ class SeqeraPathTest extends Specification { expect: path.depth() == 3 - path.isDirectory() path.org == 'acme' path.workspace == 'research' path.resourceType == 'datasets' - path.datasetName == null + path.trail == [] } - def "depth 4 - dataset file path"() { + def "depth 4 - dataset trail segment is raw (handler parses @version)"() { given: def fs = mockFs() def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') expect: path.depth() == 4 - !path.isDirectory() - path.isRegularFile() - path.org == 'acme' - path.workspace == 'research' path.resourceType == 'datasets' - path.datasetName == 'samples' - path.version == null + path.trail == ['samples'] } - def "depth 4 - dataset with pinned version"() { + def "dataset with @version suffix stays raw in trail"() { given: def fs = mockFs() def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples@2') + expect: + // Path is resource-type-agnostic — no @version parsing here. 
+ path.depth() == 4 + path.trail == ['samples@2'] + } + + def "data-link path with provider, name, and sub-path"() { + given: + def fs = mockFs() + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/sample.fq.gz') + + expect: + path.depth() == 7 + path.resourceType == 'data-links' + path.trail == ['AWS', 'inputs', 'reads', 'sample.fq.gz'] + } + + def "data-link path at provider level"() { + given: + def fs = mockFs() + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS') + expect: path.depth() == 4 - path.datasetName == 'samples' - path.version == '2' + path.resourceType == 'data-links' + path.trail == ['AWS'] } + // ---- toUri / toString ---- + def "toUri round-trip - no version"() { given: def fs = mockFs() @@ -121,7 +136,7 @@ class SeqeraPathTest extends Specification { path.toString() == uri } - def "toUri round-trip - with version"() { + def "toUri round-trip - dataset with @version"() { given: def fs = mockFs() def uri = 'seqera://acme/research/datasets/samples@2' @@ -131,6 +146,18 @@ class SeqeraPathTest extends Specification { path.toUri().toString() == uri } + def "toUri round-trip - deep data-link path"() { + given: + def fs = mockFs() + def uri = 'seqera://acme/research/data-links/AWS/inputs/reads/sample.fq.gz' + def path = new SeqeraPath(fs, uri) + + expect: + path.toUri().toString() == uri + } + + // ---- getParent ---- + def "getParent - depth 4 returns depth 3"() { given: def fs = mockFs() @@ -144,6 +171,19 @@ class SeqeraPathTest extends Specification { (parent as SeqeraPath).depth() == 3 } + def "getParent - depth 7 returns depth 6 (drops one trail segment)"() { + given: + def fs = mockFs() + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/s.fq.gz') + + when: + def parent = path.getParent() as SeqeraPath + + then: + parent.trail == ['AWS', 'inputs', 'reads'] + parent.depth() == 6 + } + def "getParent - depth 3 returns depth 2"() { given: def fs = mockFs() @@ -171,6 
+211,8 @@ class SeqeraPathTest extends Specification { path.getParent() == null } + // ---- resolve ---- + def "resolve - appends segment to workspace"() { given: def fs = mockFs() @@ -196,19 +238,52 @@ class SeqeraPathTest extends Specification { resolved.toString() == 'seqera://acme/research/datasets/my-dataset' } - def "resolve - dataset name with version"() { + def "resolve - dataset name with @version preserved as raw trail segment"() { given: def fs = mockFs() def path = new SeqeraPath(fs, 'seqera://acme/research/datasets') when: - def resolved = path.resolve('samples@3') + def resolved = path.resolve('samples@3') as SeqeraPath then: - (resolved as SeqeraPath).datasetName == 'samples' - (resolved as SeqeraPath).version == '3' + resolved.trail == ['samples@3'] + } + + def "resolve - appends nested data-link path segment"() { + given: + def fs = mockFs() + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs') + + when: + def child = path.resolve('reads') as SeqeraPath + + then: + child.trail == ['AWS', 'inputs', 'reads'] + } + + def "resolve with multi-segment string builds correct path"() { + given: + def fs = mockFs() + def base = new SeqeraPath(fs, 'seqera://acme/research') + + expect: + base.resolve('datasets/samples').toString() == 'seqera://acme/research/datasets/samples' + base.resolve('datasets').toString() == 'seqera://acme/research/datasets' + } + + def "resolve with absolute seqera URI returns that URI"() { + given: + def fs = mockFs() + def base = new SeqeraPath(fs, 'seqera://acme/research') + def absolute = 'seqera://other/ws/datasets/report' + + expect: + base.resolve(absolute).toString() == absolute } + // ---- equality / hashCode ---- + def "equality and hashCode"() { given: def fs = mockFs() @@ -222,7 +297,7 @@ class SeqeraPathTest extends Specification { p1 != p3 } - def "isAbsolute always true"() { + def "isAbsolute true when fs attached"() { given: def fs = mockFs() @@ -239,6 +314,7 @@ class SeqeraPathTest extends 
Specification { new SeqeraPath(fs, 'seqera://').nameCount == 0 new SeqeraPath(fs, 'seqera://acme').nameCount == 1 new SeqeraPath(fs, 'seqera://acme/research/datasets/samples').nameCount == 4 + new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/a.fq').nameCount == 7 } // ---- relativize ---- @@ -291,28 +367,6 @@ class SeqeraPathTest extends Specification { 'seqera://acme/research/datasets/samples' | 'seqera://acme/research/datasets/other' | '../other' } - // ---- multi-segment resolve ---- - - def "resolve with multi-segment string builds correct path"() { - given: - def fs = mockFs() - def base = new SeqeraPath(fs, 'seqera://acme/research') - - expect: - base.resolve('datasets/samples').toString() == 'seqera://acme/research/datasets/samples' - base.resolve('datasets').toString() == 'seqera://acme/research/datasets' - } - - def "resolve with absolute seqera URI returns that URI"() { - given: - def fs = mockFs() - def base = new SeqeraPath(fs, 'seqera://acme/research') - def absolute = 'seqera://other/ws/datasets/report' - - expect: - base.resolve(absolute).toString() == absolute - } - def "isAbsolute is false for relative paths produced by relativize"() { given: def fs = mockFs() @@ -337,6 +391,7 @@ class SeqeraPathTest extends Specification { new SeqeraPath(fs, 'seqera://acme/research/datasets').getFileName().toString() == 'datasets' new SeqeraPath(fs, 'seqera://acme/research/datasets/samples').getFileName().toString() == 'samples' new SeqeraPath(fs, 'seqera://acme/research/datasets/samples@2').getFileName().toString() == 'samples@2' + new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/a.fq').getFileName().toString() == 'a.fq' } def "getFileName is not absolute (uses relative constructor)"() { @@ -365,9 +420,7 @@ class SeqeraPathTest extends Specification { def "asUri - path starting with dot has dot stripped"() { expect: - // seqera://. 
→ strips dot → seqera:// → hits empty-path case → seqera:/// SeqeraPath.asUri('seqera://.').toString() == 'seqera:///' - // seqera://./foo/bar → strips dot only (substring from index 10) → seqera:///foo/bar SeqeraPath.asUri('seqera://./foo/bar').toString() == 'seqera://foo/bar' } @@ -518,4 +571,54 @@ class SeqeraPathTest extends Specification { then: parts == ['acme'] } + + // ---- trailing slash / accidental double-slash tolerance ---- + + def "trailing slash on resource-type directory is ignored"() { + given: + def fs = mockFs() + + when: + def p = new SeqeraPath(fs, 'seqera://acme/research/datasets/') + + then: + p.depth() == 3 + p.resourceType == 'datasets' + p.trail == [] + } + + def "trailing slash on data-link directory is ignored"() { + given: + def fs = mockFs() + + when: + def p = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/') + + then: + p.depth() == 5 + p.trail == ['aws', 'inputs'] + } + + def "accidental double-slash inside the trail is collapsed"() { + given: + def fs = mockFs() + + when: + def p = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs//reads/a.fq') + + then: + p.trail == ['aws', 'inputs', 'reads', 'a.fq'] + } + + def "iterator on deep data-link path returns all segments"() { + given: + def fs = mockFs() + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/a.fq') + + when: + def parts = path.iterator().collect { it.toString() } + + then: + parts == ['acme', 'research', 'data-links', 'AWS', 'inputs', 'reads', 'a.fq'] + } } diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy new file mode 100644 index 0000000000..71fa4e10ba --- /dev/null +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy @@ -0,0 +1,338 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the 
Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package io.seqera.tower.plugin.fs.handler + +import java.net.http.HttpClient +import java.net.http.HttpResponse +import java.nio.file.AccessMode +import java.nio.file.NoSuchFileException +import java.nio.file.Path + +import io.seqera.tower.model.DataLinkDto +import io.seqera.tower.model.DataLinkItem +import io.seqera.tower.model.DataLinkItemType +import io.seqera.tower.model.DataLinkProvider +import io.seqera.tower.model.DataLinkDownloadUrlResponse +import io.seqera.tower.plugin.datalink.PagedDataLinkContent +import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath +import spock.lang.Specification + +class DataLinksResourceHandlerTest extends Specification { + + private SeqeraFileSystem fs = Mock(SeqeraFileSystem) + private SeqeraDataLinkClient client = Mock(SeqeraDataLinkClient) + private HttpClient http = Mock(HttpClient) + private DataLinksResourceHandler handler = new DataLinksResourceHandler(fs, client, http) + + private static DataLinkDto dl(String id, String name, DataLinkProvider p) { + def d = new DataLinkDto(); d.id = id; d.name = name; d.provider = p; return d + } + + private static DataLinkItem item(String name, DataLinkItemType t, long size) { + def i = new DataLinkItem(); i.name = name; i.type = t; i.size = size; return i + } + + /** Single-page {@link PagedDataLinkContent} for tests. 
*/ + private static PagedDataLinkContent pagedContent(List items, String originalPath = null) { + return new PagedDataLinkContent(originalPath, items, null, new PagedDataLinkContent.PageFetcher() { + Map fetch(String t) { throw new IllegalStateException('no more pages') } + }) + } + + /** Iterator over a fixed list — what {@code client.listDataLinks(...)} is expected to return. */ + private static Iterator iter(List list) { list.iterator() } + + private static List asList(Iterable iterable) { + final out = new ArrayList() + for (Path p : iterable) out.add(p) + return out + } + + // ===================================================== + // newInputStream — MVP + // ===================================================== + + def "newInputStream resolves (provider,name,subPath) and streams the signed URL"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads/a.fq') + def signedBody = new ByteArrayInputStream('data'.bytes) + def httpResp = Mock(HttpResponse) { + statusCode() >> 200 + body() >> signedBody + } + def urlResp = new DataLinkDownloadUrlResponse(); urlResp.url = 'https://signed/a' + + when: + def stream = handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) + 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L) >> urlResp + 1 * http.send(_, _) >> httpResp + stream === signedBody + } + + def "newInputStream throws NoSuchFileException when data-link unknown"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/unknown/reads/a.fq') + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) + thrown(NoSuchFileException) + } + + def "newInputStream requires trail.size >= 3 (file path, not the data-link root itself)"() { + given: + def 
path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs') + + when: + handler.newInputStream(path) + + then: + thrown(IllegalArgumentException) + } + + def "newInputStream surfaces signed-URL HTTP failure as IOException"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads/a.fq') + def urlResp = new DataLinkDownloadUrlResponse(); urlResp.url = 'https://signed/a' + def httpResp = Mock(HttpResponse) { + statusCode() >> 403 + body() >> new ByteArrayInputStream(new byte[0]) + } + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) + 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L) >> urlResp + 1 * http.send(_, _) >> httpResp + thrown(IOException) + } + + // ===================================================== + // list — US2 browse + // ===================================================== + + def "list at data-links/ returns distinct providers in use"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links') + + when: + def paths = asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDataLinks(10L) >> iter([ + dl('dl-1', 'a', DataLinkProvider.AWS), + dl('dl-2', 'b', DataLinkProvider.GOOGLE), + dl('dl-3', 'c', DataLinkProvider.AWS) + ]) + paths*.toString().sort() == [ + 'seqera://acme/research/data-links/aws', + 'seqera://acme/research/data-links/google' + ] + } + + def "list at data-links/{provider}/ returns data-link names for that provider"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws') + + when: + def paths = asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> iter([ + dl('dl-1', 'inputs', DataLinkProvider.AWS), + dl('dl-2', 'archive', DataLinkProvider.AWS), + dl('dl-3', 'onGcs', DataLinkProvider.GOOGLE) + ]) +
paths*.toString() == [ + 'seqera://acme/research/data-links/aws/archive', + 'seqera://acme/research/data-links/aws/inputs' + ] + } + + def "list at data-link root returns top-level objects"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs') + + when: + def paths = asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) + 1 * client.getContent('dl-1', '', 10L) >> pagedContent([ + item('reads', DataLinkItemType.FOLDER, 0), + item('samplesheet.csv', DataLinkItemType.FILE, 42) + ]) + paths*.toString() == [ + 'seqera://acme/research/data-links/aws/inputs/reads', + 'seqera://acme/research/data-links/aws/inputs/samplesheet.csv' + ] + } + + def "list at deep sub-path browses the correct sub-path"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads') + + when: + def paths = asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) + 1 * client.getContent('dl-1', 'reads', 10L) >> pagedContent([ + item('a.fq', DataLinkItemType.FILE, 1), + item('b.fq', DataLinkItemType.FILE, 2) + ]) + paths*.toString() == [ + 'seqera://acme/research/data-links/aws/inputs/reads/a.fq', + 'seqera://acme/research/data-links/aws/inputs/reads/b.fq' + ] + } + + // ===================================================== + // readAttributes + // ===================================================== + + def "readAttributes at data-links/ resource-type dir reports directory"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + attr.directory + !attr.regularFile + } + + def "readAttributes at data-link root reports directory"() { + given: + def path = new SeqeraPath(fs, 
'seqera://acme/research/data-links/aws/inputs') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) + attr.directory + } + + def "readAttributes on a file sub-path reports file with size"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads/a.fq') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) + 1 * client.getContent('dl-1', 'reads/a.fq', 10L) >> pagedContent([ + item('a.fq', DataLinkItemType.FILE, 123) + ]) + attr.regularFile + attr.size() == 123L + } + + def "readAttributes on a directory sub-path reports directory"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) + 1 * client.getContent('dl-1', 'reads', 10L) >> pagedContent( + [item('a.fq', DataLinkItemType.FILE, 1), item('b.fq', DataLinkItemType.FILE, 2)], + 'reads/') + attr.directory + } + + // ===================================================== + // error paths — US3 + // ===================================================== + + def "list at data-links// throws when no data-links for that provider"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/azure') + + when: + asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'x', DataLinkProvider.AWS)]) + def ex = thrown(NoSuchFileException) + ex.reason?.toLowerCase()?.contains('no data-links') + } + + def "unknown data-link under a known provider throws"() { + given: + def path = new SeqeraPath(fs, 
'seqera://acme/research/data-links/aws/ghost/a.fq') + + when: + asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) + thrown(NoSuchFileException) + } + + def "missing sub-path inside a data-link surfaces as NoSuchFileException"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/does/not/exist') + + when: + handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) + 1 * client.getContent('dl-1', 'does/not/exist', 10L) >> pagedContent([]) + thrown(NoSuchFileException) + } + + def "checkAccess with WRITE is rejected"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/a.fq') + + when: + handler.checkAccess(path, AccessMode.WRITE) + + then: + thrown(java.nio.file.AccessDeniedException) + } +} diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy new file mode 100644 index 0000000000..702d34f322 --- /dev/null +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy @@ -0,0 +1,201 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package io.seqera.tower.plugin.fs.handler + +import java.nio.file.AccessMode +import java.nio.file.NoSuchFileException + +import io.seqera.tower.model.DatasetDto +import io.seqera.tower.model.DatasetVersionDto +import io.seqera.tower.plugin.dataset.SeqeraDatasetClient +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath +import spock.lang.Specification + +class DatasetsResourceHandlerTest extends Specification { + + def fs = Mock(SeqeraFileSystem) + def client = Mock(SeqeraDatasetClient) + def handler = new DatasetsResourceHandler(fs, client) + + private static DatasetDto ds(String id, String name, long wsId = 10L) { + def d = new DatasetDto() + d.id = id; d.name = name; d.workspaceId = wsId + return d + } + + private static DatasetVersionDto ver(String dsId, long v, String file, boolean disabled = false) { + def dv = new DatasetVersionDto() + dv.datasetId = dsId; dv.version = v; dv.fileName = file; dv.disabled = disabled + return dv + } + + def "getResourceType returns 'datasets'"() { + expect: + handler.resourceType == 'datasets' + } + + def "list at depth 3 returns one path per dataset"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets') + + when: + def paths = handler.list(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDatasets(10L) >> [ds('d1', 'one'), ds('d2', 'two')] + paths*.toString() == [ + 'seqera://acme/research/datasets/one', + 'seqera://acme/research/datasets/two' + ] + } + + def "list result is cached across calls"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets') + + when: + handler.list(path) + handler.list(path) + + then: + 2 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDatasets(10L) >> [ds('d1', 'one')] + } + + def "newInputStream resolves latest non-disabled version when no pin"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') + def 
dataset = ds('d1', 'samples')
+ def v1 = ver('d1', 1, 'a.csv')
+ def v2 = ver('d1', 2, 'b.csv')
+ def content = new ByteArrayInputStream('x'.bytes)
+
+ when:
+ def stream = handler.newInputStream(path)
+
+ then:
+ 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L
+ 1 * client.listDatasets(10L) >> [dataset]
+ 1 * client.listVersions('d1', 10L) >> [v1, v2]
+ 1 * client.downloadDataset('d1', '2', 'b.csv', 10L) >> content
+ stream === content
+ }
+
+ def "newInputStream honors @version pin"() {
+ given:
+ def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples@1')
+ def dataset = ds('d1', 'samples')
+ def v1 = ver('d1', 1, 'a.csv')
+ def v2 = ver('d1', 2, 'b.csv')
+
+ when:
+ handler.newInputStream(path)
+
+ then:
+ 1 * fs.resolveWorkspaceId(_, _) >> 10L
+ 1 * client.listDatasets(10L) >> [dataset]
+ 1 * client.listVersions('d1', 10L) >> [v1, v2]
+ 1 * client.downloadDataset('d1', '1', 'a.csv', 10L) >> new ByteArrayInputStream('x'.bytes)
+ }
+
+ def "newInputStream throws NoSuchFileException when dataset is missing"() {
+ given:
+ def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/ghost')
+
+ when:
+ handler.newInputStream(path)
+
+ then:
+ 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L
+ 1 * client.listDatasets(10L) >> [ds('d1', 'samples')]
+ thrown(NoSuchFileException)
+ }
+
+ def "newInputStream throws when pinned version is unknown"() {
+ given:
+ def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples@99')
+ def dataset = ds('d1', 'samples')
+ def v1 = ver('d1', 1, 'a.csv')
+
+ when:
+ handler.newInputStream(path)
+
+ then:
+ 1 * fs.resolveWorkspaceId(_, _) >> 10L
+ 1 * client.listDatasets(10L) >> [dataset]
+ 1 * client.listVersions('d1', 10L) >> [v1]
+ thrown(NoSuchFileException)
+ }
+
+ def "newInputStream skips disabled versions when resolving latest"() {
+ given:
+ def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples')
+ def dataset = ds('d1', 'samples')
+ def enabled = ver('d1', 3,
'c.csv', false) + def disabled = ver('d1', 4, 'd.csv', true) + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDatasets(10L) >> [dataset] + 1 * client.listVersions('d1', 10L) >> [enabled, disabled] + 1 * client.downloadDataset('d1', '3', 'c.csv', 10L) >> new ByteArrayInputStream('x'.bytes) + } + + def "readAttributes at depth 3 reports directory"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + attr.directory + !attr.regularFile + } + + def "readAttributes at depth 4 returns file attributes"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDatasets(10L) >> [ds('d1', 'samples')] + attr.regularFile + !attr.directory + attr.fileKey() == 'd1' + } + + def "checkAccess rejects WRITE"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') + + when: + handler.checkAccess(path, AccessMode.WRITE) + + then: + thrown(java.nio.file.AccessDeniedException) + } +} diff --git a/specs/260422-seqera-datalinks-fs/plan.md b/specs/260422-seqera-datalinks-fs/plan.md new file mode 100644 index 0000000000..eb1f4ea617 --- /dev/null +++ b/specs/260422-seqera-datalinks-fs/plan.md @@ -0,0 +1,259 @@ +# Implementation Plan: Seqera NIO Filesystem Support for Platform Data-Links + +**Branch**: `260422-seqera-datalinks-fs` | **Date**: 2026-04-23 | **Spec**: [spec.md](spec.md) | **ADR**: [20260422-seqera-datalinks-filesystem](../../adr/20260422-seqera-datalinks-filesystem.md) + +--- + +## Summary + +Extend the `nf-tower` plugin's `seqera://` NIO filesystem (shipped in [260310-seqera-dataset-fs](../260310-seqera-dataset-fs/spec.md)) with a second resource type, `data-links`. 
Paths of the form `seqera://<org>/<workspace>/data-links/<provider>/<data-link-name>/<sub-path>` resolve to entries inside Platform-managed data-links (S3/GCS/Azure buckets or prefixes). Listings and attribute queries hit the Platform's `/data-links/{id}/content` endpoint; byte reads go through a pre-signed URL returned by `/data-links/{id}/download` and fetched with a plain JDK HTTP client — no cloud SDK dependency. + +As part of this change, extract a real `ResourceTypeHandler` abstraction from the existing dataset logic. `DatasetsResourceHandler` and `DataLinksResourceHandler` are parallel implementations; the generic classes (`SeqeraFileSystemProvider`, `SeqeraFileSystem`, `SeqeraPath`, `SeqeraFileAttributes`) become resource-type-agnostic for depth ≥ 3. + +--- + +## Technical Context + +**Language/Version**: Groovy 4.0.29, targeting Java 17 runtime +**Primary Dependencies**: `io.seqera:tower-api:1.121.0` (existing), `io.seqera:lib-httpx:2.1.0` (existing via TowerClient) +**Storage**: None (in-memory org/workspace + data-link list caches on `SeqeraFileSystem`) +**Testing**: Spock Framework with `Mock(TowerClient)` + `JsonOutput` fixtures — matches the dataset test style +**Target Platform**: nf-tower plugin; runs wherever Nextflow runs +**Performance Goals**: Listing any hierarchy level in ≤ 5s for workspaces with up to 500 data-links (SC-003) +**Constraints**: Read-only in this iteration; no cloud SDK on classpath (SC-006) +**Scale/Scope**: Single plugin change; no new plugin, no core module changes + +--- + +## Constitution Check + +| Principle | Status | Notes | +|-----------|--------|-------| +| I. Modular Architecture | ✅ PASS | Feature lives in `nf-tower`; no core module changes | +| II. Test-Driven Quality | ✅ PASS | Spock unit tests for client, handler, refactored path; reuses existing `Mock(TowerClient)` pattern | +| III. Dataflow Programming Model | ✅ PASS | NIO `InputStream`; no dataflow model changes | +| IV. Apache 2.0 License | ✅ PASS | All new files must include Apache 2.0 header | +| V.
DCO Sign-off | ✅ PASS | All commits use `git commit -s` | +| VI. Semantic Versioning | ✅ PASS | VERSION bump and changelog entry both deferred to release time | +| VII. Groovy Idioms | ✅ PASS | `@CompileStatic`, follow existing `fs/` patterns | + +No violations. + +--- + +## Project Structure + +### Documentation (this feature) + +```text +specs/260422-seqera-datalinks-fs/ +├── spec.md ← feature spec (exists) +├── plan.md ← this file +└── tasks.md ← task checklist +``` + +### Source Code (nf-tower plugin) + +```text +plugins/nf-tower/ +└── src/ (VERSION and changelog.txt updated at release time, not in this feature) + ├── main/io/seqera/tower/plugin/ + │ ├── fs/ + │ │ ├── ResourceTypeHandler.groovy ← NEW (interface) + │ │ ├── SeqeraFileSystemProvider.groovy ← refactored (dispatch by handler) + │ │ ├── SeqeraFileSystem.groovy ← refactored (handler registry) + │ │ ├── SeqeraPath.groovy ← refactored (generic trail segments) + │ │ ├── SeqeraFileAttributes.groovy ← refactored (isDir, size, lastMod) + │ │ ├── SeqeraPathFactory.groovy ← unchanged + │ │ ├── DatasetInputStream.groovy ← unchanged + │ │ └── handler/ + │ │ ├── DatasetsResourceHandler.groovy ← NEW (extracted) + │ │ └── DataLinksResourceHandler.groovy ← NEW + │ ├── dataset/ + │ │ └── SeqeraDatasetClient.groovy ← unchanged + │ └── datalink/ ← NEW package + │ └── SeqeraDataLinkClient.groovy ← NEW + └── test/io/seqera/tower/plugin/ + ├── fs/ + │ ├── SeqeraPathTest.groovy ← extended (sub-path cases) + │ ├── SeqeraFileSystemTest.groovy ← extended (handler registry) + │ ├── SeqeraFileSystemProviderTest.groovy ← extended (data-link dispatch specs) + │ └── handler/ + │ ├── DatasetsResourceHandlerTest.groovy ← NEW + │ └── DataLinksResourceHandlerTest.groovy ← NEW + └── datalink/ + └── SeqeraDataLinkClientTest.groovy ← NEW +``` + +**Structure decision**: Parallel `datalink/` package mirrors the existing `dataset/` package. Handlers live in `fs/handler/` so the generic NIO classes in `fs/` remain resource-type-agnostic. 
All wire DTOs are reused from `io.seqera.tower.model.*` — no plugin-local DTO classes. + +--- + +## Phase 0: Research Notes + +### Tower-API DTOs (confirmed via `javap`) + +All reused from `io.seqera:tower-api:1.121.0` (already on the classpath): + +| DTO | Fields used here | +|---|---| +| `DataLinkDto` | `id: String`, `name: String`, `provider: DataLinkProvider`, `resourceRef: String` | +| `DataLinkProvider` (enum) | `AWS`, `GOOGLE`, `AZURE`, `AZURE_ENTRA`, `AZURE_CLOUD`, `SEQERACOMPUTE`, `S3` — exposes a `String value` via `toString()` | +| `DataLinksListResponse` | `dataLinks: List<DataLinkDto>`, `totalSize: Long` | +| `DataLinkContentResponse` | `originalPath: String`, `objects: List<DataLinkItem>`, `nextPageToken: String` | +| `DataLinkItem` | `type: DataLinkItemType`, `name: String`, `size: Long`, `mimeType: String` — no last-modified field | +| `DataLinkItemType` (enum) | `FOLDER`, `FILE` | +| `DataLinkDownloadUrlResponse` | `url: String` | + +**Attribute consequence**: `DataLinkItem` does not expose a last-modified timestamp. `SeqeraFileAttributes.lastModifiedTime()` for data-link paths returns `FileTime.from(Instant.EPOCH)`. Spec assumption 2 and FR-005 remain satisfied — we return a valid `FileTime`; the absence of real data is a Platform-API limitation. + +### Platform endpoints (confirmed from OpenAPI) + +- `GET /data-links?workspaceId=&max=&offset=` → `DataLinksListResponse`. Pagination: `totalSize` is the full count; keep fetching with an increasing offset until the sum of received `dataLinks` equals `totalSize`. When `max` is omitted the server default applies; the plugin requests `max=100` per page. +- `GET /data-links/{id}/content?workspaceId=&path=&nextPageToken=` → `DataLinkContentResponse`. Works for directory and file paths. Pagination: follow `nextPageToken` until null/empty. +- `GET /data-links/{id}/download?workspaceId=&path=` → `DataLinkDownloadUrlResponse` with a cloud-signed URL.
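The offset-based pagination described for `/data-links` can be sketched as follows. This is an illustrative Java stand-in, not plugin code — `OffsetPager` and `fetchAll` are hypothetical names, and the real client works with `DataLinksListResponse` objects rather than a bare list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Sketch of offset pagination: request pages of `max` items until the
// number of collected entries reaches the server-reported totalSize.
public class OffsetPager {

    /** fetch.apply(offset, max) returns one page; totalSize is the full count. */
    static <T> List<T> fetchAll(BiFunction<Integer, Integer, List<T>> fetch,
                                int totalSize, int max) {
        List<T> all = new ArrayList<>();
        int offset = 0;
        while (all.size() < totalSize) {
            List<T> page = fetch.apply(offset, max);
            if (page.isEmpty())
                break;                // defensive: avoid looping forever on a short server
            all.addAll(page);
            offset += page.size();
        }
        return all;
    }

    public static void main(String[] args) {
        // simulate a workspace with 7 data-link names served in pages of 3
        List<String> server = List.of("a", "b", "c", "d", "e", "f", "g");
        List<String> got = fetchAll(
            (off, max) -> server.subList(off, Math.min(off + max, server.size())),
            server.size(), 3);
        System.out.println(got.size());          // 7
        System.out.println(got.equals(server));  // true
    }
}
```

The same exhaust-until-complete shape applies to the `nextPageToken` cursor of the content endpoint, with the loop condition keyed on a non-empty token instead of a count.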
+ +### Signed-URL fetch + +The signed URL is **not** a Seqera endpoint; it points at S3/GCS/Azure with auth baked into the query string. It must be fetched **without** the Seqera `Authorization` header (AWS SigV4 will reject unknown `Authorization` headers). Use a standalone `java.net.http.HttpClient` inside `DataLinksResourceHandler` for this fetch. Do **not** use `TowerClient.sendStreamingRequest`, which adds Seqera auth headers. + +### SeqeraPath refactor shape + +Replace the six typed fields (`org`, `workspace`, `resourceType`, `datasetName`, `version`, `relPath`) with: + +- `org: String` (or null) +- `workspace: String` (or null) +- `resourceType: String` (or null) +- `trail: List<String>` (possibly empty) — the segments after `resourceType` +- `relPath: String` (for relative paths; mutually exclusive with absolute segments) + +The `trail` is opaque to `SeqeraPath` — handlers interpret it. Concrete interpretations: + +- **Dataset** (`resourceType = "datasets"`): `trail.size() == 0` → resource-type dir; `trail.size() == 1` → dataset file, with optional `@version` suffix on the single element; `trail.size() > 1` → invalid. +- **Data-link** (`resourceType = "data-links"`): `trail.size() == 0` → resource-type dir; `trail.size() == 1` → provider dir; `trail.size() == 2` → data-link root dir; `trail.size() ≥ 3` → entry inside the data-link (directory or file, per `readAttributes`). + +`depth()` becomes `3 + trail.size()` when `resourceType` is set, else the count of non-null identity fields. + +### Existing tests to preserve + +Running `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.*'` and `... --tests 'io.seqera.tower.plugin.dataset.*'` must continue to pass throughout the refactor. The dataset behavior does not change; only the class that implements it does. + +--- + +## Phase 1: Design & Contracts + +### `ResourceTypeHandler` interface + +```groovy +interface ResourceTypeHandler { + /** the depth-3 segment this handler owns, e.g.
"datasets" or "data-links" */ + String getResourceType() + + /** list entries at the given directory path owned by this handler; caller verified depth >= 3 and isDirectory */ + List list(SeqeraPath dir) throws IOException + + /** return BasicFileAttributes for any path at depth >= 3 owned by this handler */ + SeqeraFileAttributes readAttributes(SeqeraPath path) throws IOException + + /** open a read stream for a leaf path; throw if the path is a directory */ + InputStream newInputStream(SeqeraPath path) throws IOException + + /** verify the path exists and modes are satisfiable; READ allowed, WRITE/EXECUTE rejected */ + void checkAccess(SeqeraPath path, AccessMode... modes) throws IOException +} +``` + +### `SeqeraDataLinkClient` contract + +```groovy +class SeqeraDataLinkClient { + SeqeraDataLinkClient(TowerClient towerClient) + + /** exhaust pagination; return all data-links in the workspace */ + List listDataLinks(long workspaceId) + + /** GET /data-links/{id}/content?path= — exhausts nextPageToken pagination */ + DataLinkContentResponse getContent(String dataLinkId, String subPath, long workspaceId) + + /** GET /data-links/{id}/download?path= */ + DataLinkDownloadUrlResponse getDownloadUrl(String dataLinkId, String subPath, long workspaceId) +} +``` + +All three translate 401/403/404/5xx through the same `checkFsResponse` pattern used in `SeqeraDatasetClient`. 
+ +### `SeqeraFileSystem` handler registry + +```groovy +class SeqeraFileSystem extends FileSystem { + // existing org/workspace state (unchanged) + private final Map<String, ResourceTypeHandler> handlers = new LinkedHashMap<>() + + void registerHandler(ResourceTypeHandler h) { handlers.put(h.resourceType, h) } + ResourceTypeHandler getHandler(String type) { handlers.get(type) } + Set<String> getResourceTypes() { Collections.unmodifiableSet(handlers.keySet()) } +} +``` + +`SeqeraFileSystemProvider.newFileSystem()` registers both handlers after constructing the filesystem: + +```groovy +fs.registerHandler(new DatasetsResourceHandler(fs, new SeqeraDatasetClient(towerClient))) +fs.registerHandler(new DataLinksResourceHandler(fs, new SeqeraDataLinkClient(towerClient))) +``` + +### Dispatch in `SeqeraFileSystemProvider` + +- Depth 0–2: handled directly (root/org/workspace) — uses `SeqeraFileSystem`'s org/workspace cache as before. +- Depth 2 listing: returns **`fs.getResourceTypes()`** as child paths (replaces the hard-coded `['datasets']`). +- Depth ≥ 3: dispatch to `fs.getHandler(sp.resourceType)`; unknown type → `NoSuchFileException("Unsupported resource type: ${sp.resourceType}")`. + +### `SeqeraFileAttributes` refactor + +Replace the `DatasetDto`-coupled constructor with two constructors: + +```groovy +/** directory */ +SeqeraFileAttributes(boolean isDir) +/** file with explicit metadata */ +SeqeraFileAttributes(long size, Instant lastModified, Instant created, Object fileKey) +``` + +Internal fields become `(directory, size, lastModified, created, fileKey)`. `DatasetsResourceHandler` constructs the file variant from `DatasetDto`; `DataLinksResourceHandler` constructs it from `DataLinkItem`. The previous `SeqeraFileAttributes(DatasetDto)` and `SeqeraFileAttributes(boolean)` call sites are updated.
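The registry-plus-dispatch behavior can be illustrated with a minimal Java stand-in (`HandlerRegistry` and its nested `Handler` are placeholder names for `SeqeraFileSystem` and `ResourceTypeHandler`; only the lookup logic is shown):

```java
import java.nio.file.NoSuchFileException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch: resource-type segment -> handler, unknown segment -> NIO error.
public class HandlerRegistry {

    interface Handler { String resourceType(); }

    private final Map<String, Handler> handlers = new LinkedHashMap<>();

    void register(Handler h) { handlers.put(h.resourceType(), h); }

    /** feeds the depth-2 directory listing */
    Set<String> resourceTypes() { return handlers.keySet(); }

    /** depth >= 3 dispatch: an unknown resource-type segment is a missing path */
    Handler dispatch(String resourceType) throws NoSuchFileException {
        Handler h = handlers.get(resourceType);
        if (h == null)
            throw new NoSuchFileException("Unsupported resource type: " + resourceType);
        return h;
    }

    public static void main(String[] args) throws NoSuchFileException {
        HandlerRegistry r = new HandlerRegistry();
        r.register(() -> "datasets");
        r.register(() -> "data-links");
        System.out.println(r.resourceTypes());  // [datasets, data-links]
        System.out.println(r.dispatch("datasets").resourceType());
    }
}
```

Because the map is keyed by the segment each handler reports, registering a hypothetical third handler changes nothing in the dispatch code — which is exactly the US4 extensibility claim.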
+ +### `DataLinksResourceHandler` behaviors + +| Path shape | Method | Implementation | +|---|---|---| +| `data-links/` (trail=[]) | `list` | enumerate distinct `DataLinkDto.provider` (via `toString()` / enum value) in cached list; return paths `data-links/<provider>` | +| `data-links/<provider>` (trail=[p]) | `list` | filter cached list where provider matches; return paths `data-links/<provider>/<name>` | +| `data-links/<provider>/<name>` (trail=[p,n]) | `list` | resolve to `dataLinkId`; call `getContent(id, "", wsId)`; map `objects` to child paths | +| `data-links/<provider>/<name>/<sub>/…` (trail ≥ 3) | `list` | call `getContent(id, "<sub>/…", wsId)`; map `objects` | +| any depth ≥ 3 | `readAttributes` | data-link-root path → directory; below that → `getContent(id, sub, ws)`; if response `objects` has one item matching the last segment with `type = FILE`, return file attrs (size from item); otherwise → directory | +| leaf file | `newInputStream` | `getDownloadUrl(id, sub, ws)`; open plain `HttpClient.send(..., BodyHandlers.ofInputStream())` against `response.url`; return body stream | + +Provider segment canonicalization: the path segment is the `DataLinkProvider` enum's `toString()`. A path with an unknown provider segment maps to `NoSuchFileException`. + +### Data-link identity resolution + +`DataLinksResourceHandler.resolveDataLinkId(provider, name, workspaceId)`: + +1. Ensure the workspace's data-link list is loaded (cached `Map<Long, List<DataLinkDto>>` inside the handler). +2. Find `DataLinkDto` where `provider.toString() == providerSegment && name == nameSegment`. +3. Return `id`; throw `NoSuchFileException` with a clear message if not found. + +--- + +## Phase 2: Deliverables + +The detailed task list is in [tasks.md](tasks.md). Phases in execution order: + +1. **Refactor (foundational)** — extract `ResourceTypeHandler`, `DatasetsResourceHandler`; generalize `SeqeraPath`, `SeqeraFileSystem`, `SeqeraFileSystemProvider`, `SeqeraFileAttributes`. Existing dataset tests must pass unchanged. +2.
**Data-link API client** — implement `SeqeraDataLinkClient` with pagination and error mapping. Unit tests with `Mock(TowerClient)`. +3. **US1 — Read file inside a data-link** — implement `DataLinksResourceHandler.newInputStream` and register the handler. +4. **US2 — Browse hierarchy** — implement `list` and `readAttributes`; workspace listing enumerates handlers. +5. **US3 — Error paths** — explicit tests for unknown provider, unknown data-link, missing sub-path, 401/403 mapping. +6. **US4 — Extensibility validation** — architectural check that generic classes stay resource-type-agnostic; end-to-end spec exercising both handlers in one fs. +7. **Final verification** — full test suite green; no cloud SDK added to classpath. VERSION bump and changelog entry happen at release time, not in this feature. + +Each task in `tasks.md` specifies exact file paths, exact code changes, exact test commands, and a commit step. diff --git a/specs/260422-seqera-datalinks-fs/spec.md b/specs/260422-seqera-datalinks-fs/spec.md new file mode 100644 index 0000000000..fcb31759b4 --- /dev/null +++ b/specs/260422-seqera-datalinks-fs/spec.md @@ -0,0 +1,182 @@ +# Feature Specification: Seqera NIO Filesystem Support for Platform Data-Links + +**Feature Branch**: `260422-seqera-datalinks-fs` +**Created**: 2026-04-22 +**Status**: Draft +**Depends on**: [260310-seqera-dataset-fs](../260310-seqera-dataset-fs/spec.md) +**Input**: User description: "I want to extend the seqera NIO filesystem to include the seqera platform data-links using this url template `seqera://<org>/<workspace>/data-links/...`. The seqera platform API is https://cloud.seqera.io/openapi/seqera-api-latest.yml" + +## Clarifications + +### Session 2026-04-22 + +- Q: Scope of data-link support — list-only, full Platform-driven traversal, or hybrid (Platform-driven listing + cloud-driven I/O)? → A: Hybrid.
Listing and attributes go through the Seqera Platform; byte-level I/O is delegated, but via pre-signed URLs the Platform returns, so no cloud provider SDK integration is required. +- Q: How are credentials handled for the cloud object storage behind a data-link? → A: Platform-brokered. The user provides only the `tower.accessToken`. The Platform returns a short-lived pre-signed URL for each read; no cloud credentials cross the plugin boundary. +- Q: Read-only, or read + write for data-links in this iteration? → A: Read-only. Write (upload) support may be added later; the architecture does not preclude it but the feature is not in scope. +- Q: Path hierarchy — does the data-link identity segment include the provider? → A: Yes. `data-links/<provider>/<data-link-name>/...`. Names are not globally unique within a workspace (the same name may exist on two different providers), so the provider segment is required to disambiguate and mirrors the Platform UI's provider-grouped data explorer. +- Q: How deep can a data-link path go? → A: Arbitrary depth below the data-link root. Each segment after `<data-link-name>` is an entry inside the underlying bucket/prefix — a directory or file, resolved via the Platform browse API. +- Q: How should the existing dataset filesystem code be extended to accommodate data-links? → A: Introduce a true resource-type abstraction (`ResourceTypeHandler`). The current dataset-specific logic in `SeqeraFileSystemProvider`, `SeqeraFileSystem`, and `SeqeraPath` is extracted into a `DatasetsResourceHandler`; data-links are added as a parallel `DataLinksResourceHandler`. The core path/filesystem/provider classes become resource-type-agnostic. +- Q: How should the listing vs I/O boundary work? → A: Listing (`newDirectoryStream`) and attributes (`readAttributes`) are resolved via `GET /data-links/{id}/content?path=`.
Downloads (`newInputStream`) go through `GET /data-links/{id}/download?path=`, which returns a pre-signed URL that is then fetched with a plain JDK `java.net.http.HttpClient` and no Seqera `Authorization` header (the signed URL carries its own credentials in the query string). No cloud SDK is used. +- Q: Which DTOs are introduced by this feature? → A: None. All types are reused from the `io.seqera:tower-api:1.121.0` dependency (`DataLinkDto`, `DataLinkItem`, `DataLinkProvider`, `DataLinkContentResponse`, `DataLinkDownloadUrlResponse`, etc.). +- Q: Is browse-per-file supported by the Platform API? → A: Yes. `GET /data-links/{id}/content?path=` works for both directories and files, so `readAttributes` on any path is a single targeted call — no parent-browse-and-filter, no N+1 problem. +- Q: What happens if the pre-signed URL expires during a long read? → A: The underlying HTTP connection errors out with an `IOException`. The plugin does not transparently re-issue URLs; Nextflow's task retry handles the failure as it already does for other transient I/O errors. + +## User Scenarios & Testing *(mandatory)* + +### User Story 1 — Use a File Inside a Data-Link as Pipeline Input (Priority: P1) + +A Nextflow pipeline developer has registered an S3 bucket or a GCS prefix as a Seqera Platform data-link (e.g. `inputs` on AWS). They want to reference a file inside that data-link as a pipeline input using a `seqera://` path, without configuring cloud credentials separately and without any manual pre-download step. + +**Why this priority**: This is the core value proposition of the feature. All other stories build on the ability to resolve and read a file inside a data-link by path. + +**Independent Test**: Write a pipeline that sets an input channel to `seqera://<org>/<workspace>/data-links/<provider>/<data-link-name>/<file-path>` and verify the pipeline task receives the correct file content, using only the Seqera access token for authentication. + +**Acceptance Scenarios**: + +1.
**Given** a data-link named `inputs` registered with provider `aws` in workspace `acme/research`, pointing at `s3://my-bucket/data/`, **When** a pipeline references `seqera://acme/research/data-links/aws/inputs/reads/sample1.fq.gz`, **Then** the pipeline task receives the byte content of `s3://my-bucket/data/reads/sample1.fq.gz` transparently. +2. **Given** a data-link on `google` and one on `azure` in the same workspace, **When** both are referenced from the same pipeline, **Then** each path resolves independently and content is streamed correctly from the respective provider. +3. **Given** a Seqera access token is configured (`tower.accessToken` or `TOWER_ACCESS_TOKEN`), **When** a data-link path is accessed, **Then** no additional cloud credentials (AWS, GCP, Azure) are required from the user. + +--- + +### User Story 2 — Browse the Data-Link Hierarchy (Priority: P2) + +A pipeline developer wants to navigate the data-link namespace at any level — list providers in a workspace, list data-links within a provider, and browse into the content tree of a specific data-link — using ordinary directory listing operations. + +**Why this priority**: Hierarchical listing supports discoverability and dynamic pipeline construction. Without it, users must know the exact path in advance. + +**Independent Test**: List each level of the hierarchy and verify the correct child entries appear. + +**Acceptance Scenarios**: + +1. **Given** a workspace `acme/research` has data-links on both AWS and GCS, **When** a user lists `seqera://acme/research/data-links/`, **Then** `aws` and `google` are returned as directory entries (only providers in use appear). +2. **Given** the `aws` provider has two data-links `inputs` and `archive`, **When** a user lists `seqera://acme/research/data-links/aws/`, **Then** both names are returned as directory entries. +3. 
**Given** a data-link `inputs` contains a folder `reads/` with files `a.fq.gz` and `b.fq.gz`, **When** a user lists `seqera://acme/research/data-links/aws/inputs/reads/`, **Then** both files appear as file entries with correct size and last-modified metadata. +4. **Given** a data-link root is listed, **When** the data-link is empty, **Then** an empty result is returned without errors. +5. **Given** a user lacks access to a workspace, **When** they attempt to list any `data-links/` path within it, **Then** a clear access-denied error is returned without leaking internal details. +6. **Given** `readAttributes` is called directly on a file path (no prior listing), **When** the path exists, **Then** a single Platform API call returns the file attributes (no parent-directory scan). + +--- + +### User Story 3 — Receive Meaningful Errors for Invalid or Inaccessible Paths (Priority: P3) + +A pipeline developer mistypes a data-link name, uses a provider segment that has no registered data-links, or references a path that no longer exists inside a data-link. They receive a clear, actionable error that helps them fix the problem. + +**Why this priority**: Error handling is essential for usability but delivers no new functionality on its own. Good errors prevent support escalations. + +**Independent Test**: Reference invalid data-link paths and verify the error messages identify the problem (unknown provider, unknown data-link name, path not found inside data-link, authentication failure) without generic or cryptic failures. + +**Acceptance Scenarios**: + +1. **Given** a workspace has no data-links on the `azure` provider, **When** a pipeline references `seqera://.../data-links/azure/anything`, **Then** a `NoSuchFileException` is raised with a message indicating no data-links exist for provider `azure` in the workspace. +2. 
**Given** a data-link name that does not exist under a provider, **When** a pipeline attempts to read any path within it, **Then** the error identifies the missing data-link by name and path.
+3. **Given** a valid data-link but a path that does not exist inside it, **When** a pipeline attempts to read it, **Then** a `NoSuchFileException` is raised with a message including the sub-path.
+4. **Given** invalid or expired credentials, **When** any data-link path is accessed, **Then** an authentication error is reported with guidance to reconfigure `tower.accessToken` / `TOWER_ACCESS_TOKEN`.
+5. **Given** a resource-type segment that is neither `datasets` nor `data-links`, **When** used, **Then** a clear "unsupported resource type" error is returned (unchanged from the dataset feature).
+
+---
+
+### User Story 4 — Extensible Resource-Type Architecture (Priority: P4)
+
+A Nextflow or Seqera engineer wants the filesystem's resource-type abstraction to be real and exercised by more than one resource type, so that adding future Seqera-managed resources requires isolated, scoped changes rather than cross-cutting refactors.
+
+**Why this priority**: Shipping a second resource type is the right moment to introduce the shared abstraction — with two concrete consumers in place, the interface is validated in practice. This story captures the refactor as a first-class deliverable alongside the new data-link functionality.
+
+**Independent Test**: A code review confirms that (a) `SeqeraFileSystemProvider`, `SeqeraFileSystem`, and `SeqeraPath` contain no dataset- or data-link-specific branching for depth ≥ 3; (b) adding a hypothetical third resource type is a new `ResourceTypeHandler` implementation with no changes to the generic path/provider/filesystem classes; (c) both `DatasetsResourceHandler` and `DataLinksResourceHandler` implement the same interface without leakage of each other's concepts.
+
+**Acceptance Scenarios**:
+
+1.
**Given** the refactored filesystem, **When** `seqera://<org>/<workspace>/` is listed, **Then** the resource-type entries (`datasets`, `data-links`) are enumerated from the handler registry rather than a hard-coded list.
+2. **Given** the refactored `SeqeraPath`, **When** any path shape valid for either resource type is parsed, **Then** parsing succeeds without requiring the path class to know which resource type owns the depth-4+ segments.
+3. **Given** a new handler is registered in `SeqeraFileSystem`, **When** paths with its resource-type segment are resolved, **Then** dispatch reaches it without modifying existing handlers.
+4. **Given** both existing resource types, **When** a path from one is accessed, **Then** the other handler's code is never executed.
+
+---
+
+### Edge Cases
+
+- What happens when a data-link's underlying bucket/prefix has been revoked on the provider side but the data-link still exists in Seqera? The Platform surfaces an error which is propagated as `IOException`.
+- What happens when a data-link has thousands of entries at one level? The Platform browse endpoint's pagination (if any) must be exhausted; the initial implementation pages through using whatever cursor token the response exposes.
+- What happens when a provider name contains characters the path class rejects? Provider identifiers come from the Platform API verbatim (`DataLinkProvider` enum values); they are valid path segments by construction.
+- What happens when the same data-link name exists for two providers (e.g. `inputs` on AWS and `inputs` on GCS)? Both are addressable at distinct paths: `data-links/aws/inputs/...` and `data-links/google/inputs/...`. The provider segment disambiguates.
+- What happens when a pre-signed URL expires mid-read? The HTTP read fails with `IOException`. The plugin does not transparently re-issue URLs; Nextflow task retry handles the failure.
+- What happens when the Platform API is transiently unavailable?
Same as the dataset feature — `TowerClient`'s retry/backoff is reused; exhaustion raises `IOException`.
+- What happens when a data-link's listing contains entries whose names include `/`? The browse response is expected to return name segments, not paths; any entry whose name contains `/` is rejected with a descriptive error (it indicates a provider data issue).
+- What happens when a data-link is accessed concurrently from many pipeline tasks? All reads are independent signed-URL fetches; no shared state beyond the (read-only) cached data-link list.
+- How is pagination of `GET /data-links` handled if a workspace has more data-links than fit in a single response page? The implementation must exhaust pages before caching.
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: System MUST accept paths in the format `seqera://<org>/<workspace>/data-links/<provider>/<name>/<path>` where `<path>` is zero or more segments addressing a directory or file inside the data-link.
+- **FR-002**: System MUST read file content addressed by a data-link path transparently, requiring only the existing `tower.accessToken` / `TOWER_ACCESS_TOKEN` configuration — no cloud-provider credentials.
+- **FR-003**: System MUST perform listing and attribute queries via the Seqera Platform API (`GET /data-links/{id}/content?path=<path>`) and stream file content via pre-signed URLs returned from `GET /data-links/{id}/download?path=<path>`.
+- **FR-004**: System MUST support hierarchical directory listing:
+  - `seqera://<org>/<workspace>/` → directory; entries include `datasets` and `data-links` (enumerated from the handler registry).
+  - `seqera://<org>/<workspace>/data-links/` → directory; entries are distinct provider identifiers present in the workspace.
+  - `seqera://<org>/<workspace>/data-links/<provider>/` → directory; entries are data-link names under that provider.
+  - `seqera://<org>/<workspace>/data-links/<provider>/<name>/` → directory; entries are the top-level items in the data-link.
+  - `seqera://<org>/<workspace>/data-links/<provider>/<name>/<sub-path>/` → directory; entries are the children at that sub-path.
+  - `seqera://<org>/<workspace>/data-links/<provider>/<name>/<sub-path>/<file>` → file.
+- **FR-005**: System MUST return correct `BasicFileAttributes` — `isDirectory`, `isRegularFile`, `size`, `lastModifiedTime`, `creationTime` — for any path inside a data-link, sourced from the Platform's content response for that path.
+- **FR-006**: System MUST treat data-link paths as read-only in this iteration. Any write-like operation (`newByteChannel` with `WRITE`/`APPEND`, `copy` with a data-link as target, `delete`, `createDirectory`, `move`) MUST fail with `UnsupportedOperationException` or `AccessDeniedException`, consistent with the dataset feature's read-only stance.
+- **FR-007**: System MUST produce clear, actionable error messages distinguishing: unknown org/workspace, unknown provider, unknown data-link name, missing sub-path, unsupported resource type, authentication failure, and transient Platform errors.
+- **FR-008**: System MUST NOT depend on `nf-amazon`, `nf-google`, or `nf-azure`. All cloud I/O is reduced to a single HTTPS fetch of a pre-signed URL via the existing `TowerClient` / `HxClient` streaming path.
+- **FR-009**: System MUST reuse DTOs from `io.seqera:tower-api:1.121.0` (`DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkDownloadUrlResponse`, `DataLinkProvider`, etc.) without introducing parallel plugin-local classes.
+- **FR-010**: System MUST refactor the existing `fs/` package to introduce a `ResourceTypeHandler` interface. `DatasetsResourceHandler` MUST encapsulate all dataset-specific behavior previously inlined in `SeqeraFileSystemProvider` / `SeqeraFileSystem` / `SeqeraPath`. `DataLinksResourceHandler` MUST implement the same interface.
+- **FR-011**: After the refactor, the classes `SeqeraPath`, `SeqeraFileSystem`, and `SeqeraFileSystemProvider` MUST contain no dataset- or data-link-specific logic for depth ≥ 3; all such logic MUST live in the respective handler.
+- **FR-012**: `SeqeraPath` MUST parse and represent arbitrary sub-paths beyond depth 4 for resource types that support them (data-links). Datasets continue to reject sub-paths beyond depth 4.
+- **FR-013**: The filesystem MUST reuse the existing `TowerClient` retry/backoff for all Platform API calls. No new retry logic is introduced.
+- **FR-014**: Transient failure of a pre-signed URL fetch mid-stream MUST surface as `IOException`; Nextflow task retry handles the recovery. The plugin MUST NOT re-issue URLs transparently within a single `InputStream`.
+- **FR-015**: Data-link list results (per workspace) MUST be cached for the lifetime of a single `SeqeraFileSystem` instance. No browse-result or URL cache is maintained.
+- **FR-016**: `GET /data-links?workspaceId=X` pagination MUST be exhausted before the workspace data-link list cache is considered populated.
+
+### Key Entities
+
+- **Data-Link**: A Seqera Platform entity referencing a bucket or prefix on a cloud provider (S3, GCS, Azure Blob, etc.). Addressed by `(workspaceId, provider, name)`; content is browsed and read through Platform API calls. Represented in the path as `data-links/<provider>/<name>/`.
+- **Data-Link Provider**: A Platform-defined identifier for the cloud backend (`DataLinkProvider` enum values, e.g. `aws`, `google`, `azure`). Used as a path segment to disambiguate data-links with the same name on different providers.
+- **Data-Link Entry**: An item inside a data-link — a file or folder — returned by the content API. Has a name, type (`FILE`/`FOLDER`), size, and last-modified timestamp.
+- **Resource-Type Handler**: A pluggable strategy that owns the semantics of one depth-3 path segment (`datasets`, `data-links`, …). Exposes listing, attribute, read, and access-check operations to the generic filesystem.
+- **Seqera Path (data-link variant)**: The URI `seqera://<org>/<workspace>/data-links/<provider>/<name>[/<path>/…]`. All segments up to and including `<name>` form the data-link identity; the remainder is the sub-path within the data-link.
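The identity/sub-path split these entities describe can be pictured in a few lines. The following plain-Java sketch is illustrative only — the `parse` helper and `Parsed` record are hypothetical stand-ins, not part of the plugin — and shows how a data-link URI decomposes into `(org, workspace, resourceType, trail)`, with depth `3 + trail.size()`:

```java
import java.util.Arrays;
import java.util.List;

public class DataLinkPathDemo {
    // Hypothetical decomposition of a seqera:// data-link URI into its identity parts.
    record Parsed(String org, String workspace, String resourceType, List<String> trail) {
        int depth() {
            if (resourceType != null) return 3 + trail.size();
            if (workspace != null) return 2;
            return org != null ? 1 : 0;
        }
    }

    static Parsed parse(String uri) {
        String rest = uri.substring("seqera://".length());
        String[] parts = rest.split("/");
        return new Parsed(
                parts.length > 0 ? parts[0] : null,
                parts.length > 1 ? parts[1] : null,
                parts.length > 2 ? parts[2] : null,
                parts.length > 3 ? Arrays.asList(parts).subList(3, parts.length) : List.of());
    }

    public static void main(String[] args) {
        Parsed p = parse("seqera://acme/research/data-links/aws/inputs/reads/sample1.fq.gz");
        // trail = provider, data-link name, then the sub-path inside the data-link
        System.out.println(p.trail());
        System.out.println(p.depth());
    }
}
```

Here the first two trail segments (`aws`, `inputs`) identify the data-link and everything after them is the sub-path — matching the "all segments up to and including `<name>` form the data-link identity" rule above.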
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: A pipeline developer can reference a file inside a Seqera data-link via a `seqera://` path, and the pipeline runs successfully using only the Seqera access token — no cloud credentials, no manual pre-download step.
+- **SC-002**: 100% of existing Nextflow file operations that work on cloud-hosted files (read, iterate lines, pass as channel input) work identically when the file is referenced via a `seqera://` data-link path.
+- **SC-003**: Listing any level of the data-link hierarchy completes in under 5 seconds for workspaces with up to 500 data-links and data-links containing up to 1,000 entries at a single level.
+- **SC-004**: Invalid or inaccessible data-link paths produce error messages that allow a developer to identify and fix the problem without consulting external documentation in 90% of cases (measured by user testing or code review of error-text coverage).
+- **SC-005**: The refactored `fs/` package passes a code review confirming that (a) no resource-type-specific logic remains in `SeqeraPath`, `SeqeraFileSystem`, or `SeqeraFileSystemProvider`; (b) the two handlers share no code paths for depth ≥ 3; (c) the dataset tests pass unchanged after routing through `DatasetsResourceHandler`.
+- **SC-006**: The plugin's runtime classpath for this feature gains no new cloud-SDK dependency (no `aws-sdk`, no `google-cloud-storage`, no `azure-*` artifacts introduced by this change).
+
+## Assumptions
+
+- Authentication reuses the existing nf-tower plugin credential mechanism (Seqera access token); no new auth configuration is required from users.
+- The `GET /data-links/{id}/content?path=<path>` endpoint works for both directory and file paths. When `path` points at a file, the response describes just that file; when it points at a directory, the response's `items` array enumerates children.
+- The `GET /data-links/{id}/download?path=<path>` endpoint returns a pre-signed URL valid for long enough to complete a typical file read. The plugin does not extend this window.
+- Data-link provider identifiers returned by the Platform (`DataLinkProvider`) are safe as path segments — they contain no `/` and no characters that collide with URI reserved characters.
+- The tower-api artifact (`io.seqera:tower-api:1.121.0`) already available on the plugin classpath exposes all required DTOs (`DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkDownloadUrlResponse`, `DataLinkProvider`, etc.).
+- Data-link writes, renames, deletes, and management operations (create, update, delete the data-link entity itself) are **out of scope** for this iteration.
+- Streaming a pre-signed URL reuses `TowerClient.sendStreamingRequest()` and therefore inherits the same retry/backoff behavior as dataset downloads.
+- Data-link listings may be paginated; the plugin exhausts pagination before caching.
+- No local caching across pipeline runs. Nextflow's standard task staging handles intra-run caching.
+- Paths are case-sensitive — this matches the Platform API and the dataset filesystem.
+- The dataset feature's read-only filesystem stance (`isReadOnly()=true`) is preserved; data-link writes are deferred to a future iteration.
+
+## Dependencies
+
+- The Seqera Platform API (data-links endpoints: `/data-links`, `/data-links/{id}/content`, `/data-links/{id}/download`) must be accessible from the compute environment where the pipeline runs.
+- The nf-tower plugin must be enabled and configured with a valid `tower.accessToken` / `TOWER_ACCESS_TOKEN`.
+- The Seqera account must have at least read access to the target workspace and data-link.
+- The existing dataset filesystem (`260310-seqera-dataset-fs`) must be merged — this feature builds on its classes and refactors them.
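The pagination assumption above (and FR-016: exhaust `GET /data-links` pages before the workspace cache counts as populated) comes down to a plain cursor-exhaustion loop. A sketch in plain Java — `Page` and `PageFetcher` are hypothetical stand-ins for the Platform's paged response, not real tower-api types:

```java
import java.util.ArrayList;
import java.util.List;

public class PaginationDemo {
    // Hypothetical page shape: a batch of items plus an opaque cursor (null on the last page).
    record Page(List<String> items, String nextCursor) {}

    interface PageFetcher { Page fetch(String cursor); }

    /** Exhaust every page before returning — the list is only cached once complete. */
    static List<String> fetchAll(PageFetcher api) {
        List<String> all = new ArrayList<>();
        String cursor = null;
        do {
            Page page = api.fetch(cursor);
            all.addAll(page.items());
            cursor = page.nextCursor();
        } while (cursor != null);
        return all;
    }

    public static void main(String[] args) {
        // Fake two-page response standing in for GET /data-links?workspaceId=X
        PageFetcher fake = c -> c == null
                ? new Page(List.of("inputs", "archive"), "p2")
                : new Page(List.of("scratch"), null);
        System.out.println(fetchAll(fake)); // [inputs, archive, scratch]
    }
}
```

Only after the loop terminates is the result eligible for the per-filesystem cache described in FR-015.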
+
+## Out of Scope
+
+- Write operations (upload) to data-links — the Platform's `POST /data-links/{id}/multipart-upload` endpoint is a natural future hook but is not implemented here.
+- Data-link management operations (create, update, delete the data-link entity itself).
+- Transparent pre-signed URL renewal mid-stream.
+- Local caching across pipeline runs.
+- Browse-result caching within a run.
+- Fusion integration (Fusion has its own data-link access path; this feature is for direct NIO access).
diff --git a/specs/260422-seqera-datalinks-fs/tasks.md b/specs/260422-seqera-datalinks-fs/tasks.md
new file mode 100644
index 0000000000..8f32a9d939
--- /dev/null
+++ b/specs/260422-seqera-datalinks-fs/tasks.md
@@ -0,0 +1,2466 @@
+# Tasks: Seqera NIO Filesystem Support for Platform Data-Links
+
+**Branch**: `260422-seqera-datalinks-fs` | **Spec**: [spec.md](spec.md) | **Plan**: [plan.md](plan.md)
+
+> **For agentic workers**: execute tasks in order. Each task is self-contained and ends with a commit step. Do not skip TDD steps — write the test first, watch it fail, then make it pass. All commits use `git commit -s`.
+
+Tests use Spock with `Mock(TowerClient)` + `groovy.json.JsonOutput` fixtures — matching the style of `SeqeraDatasetClientTest` and `SeqeraFileSystemProviderTest`. No WireMock, no real HTTP.
+
+Legend:
+- **[P]**: can be done in parallel with the previous task (different files, no dependency)
+- **[Story]**: which user story from the spec
+- Exact file paths are relative to repo root
+
+---
+
+## Phase 1: Foundational Refactor (blocks all US)
+
+**Purpose**: Extract `ResourceTypeHandler`, move dataset-specific logic out of the generic classes. Existing dataset tests must pass unchanged at the end of this phase.
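The registry dispatch this phase builds toward is keyed strategy lookup. A simplified plain-Java sketch — `Handler` below is a stand-in for the real `ResourceTypeHandler`, and the hard-coded listings are fake data, not plugin behavior:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class HandlerRegistryDemo {
    // Simplified stand-in for the plugin's ResourceTypeHandler strategy.
    interface Handler {
        String resourceType();
        List<String> list(String subPath);
    }

    static final Map<String, Handler> REGISTRY = Map.of(
            "datasets", new Handler() {
                public String resourceType() { return "datasets"; }
                public List<String> list(String p) { return List.of("samples", "metadata"); }
            },
            "data-links", new Handler() {
                public String resourceType() { return "data-links"; }
                public List<String> list(String p) { return List.of("aws", "google"); }
            });

    /** Depth-3 dispatch: the generic filesystem never branches on the resource type itself. */
    static List<String> list(String resourceType, String subPath) {
        Handler h = REGISTRY.get(resourceType);
        if (h == null)
            throw new IllegalArgumentException("Unsupported resource type: " + resourceType);
        return h.list(subPath);
    }

    public static void main(String[] args) {
        System.out.println(list("data-links", ""));          // [aws, google]
        System.out.println(new TreeSet<>(REGISTRY.keySet())); // enumerated, not hard-coded
    }
}
```

A third resource type is then a new map entry plus a new `Handler` implementation — no change to the dispatch code, which is the property US4 and FR-010 ask the review to confirm.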
+
+### T001 — Generalize `SeqeraFileAttributes`
+
+**Files:**
+- Modify: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileAttributes.groovy`
+
+- [ ] **Step 1: Replace the DatasetDto-coupled class with a generic one**
+
+Replace the entire class definition (from the `@CompileStatic` annotation through the final `}`) with:
+
+```groovy
+@CompileStatic
+class SeqeraFileAttributes implements BasicFileAttributes {
+
+    private final boolean directory
+    private final long size
+    private final Instant lastModified
+    private final Instant created
+    private final Object fileKey
+
+    /** Construct attributes for a virtual directory (any depth). */
+    SeqeraFileAttributes(boolean isDir) {
+        this.directory = isDir
+        this.size = 0L
+        this.lastModified = Instant.EPOCH
+        this.created = Instant.EPOCH
+        this.fileKey = null
+    }
+
+    /** Construct attributes for a regular file. */
+    SeqeraFileAttributes(long size, Instant lastModified, Instant created, Object fileKey) {
+        this.directory = false
+        this.size = size ?: 0L
+        this.lastModified = lastModified ?: Instant.EPOCH
+        this.created = created ?: Instant.EPOCH
+        this.fileKey = fileKey
+    }
+
+    @Override FileTime lastModifiedTime() { FileTime.from(lastModified) }
+    @Override FileTime lastAccessTime() { FileTime.from(lastModified) }
+    @Override FileTime creationTime() { FileTime.from(created) }
+    @Override boolean isRegularFile() { !directory }
+    @Override boolean isDirectory() { directory }
+    @Override boolean isSymbolicLink() { false }
+    @Override boolean isOther() { false }
+    @Override long size() { size }
+    @Override Object fileKey() { fileKey }
+}
+```
+
+- [ ] **Step 2: Drop the now-unused `DatasetDto` import**
+
+In the same file, remove `import io.seqera.tower.model.DatasetDto`. Keep `import java.time.Instant`.
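As a cross-check of the two-constructor split above (directory flag vs. file metadata), here is a self-contained plain-Java analogue; the `Attrs` class is illustrative only — the real change is the Groovy class shown in Step 1:

```java
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.time.Instant;

public class AttrsDemo {
    // Minimal analogue of the generalized attributes: a directory flag plus file metadata.
    static class Attrs implements BasicFileAttributes {
        private final boolean directory;
        private final long size;
        private final Instant lastModified;

        /** Virtual directory: zero size, EPOCH timestamps. */
        Attrs(boolean isDir) { this(isDir, 0L, Instant.EPOCH); }
        /** Regular file with real metadata. */
        Attrs(long size, Instant lastModified) { this(false, size, lastModified); }
        private Attrs(boolean d, long s, Instant m) { directory = d; size = s; lastModified = m; }

        public FileTime lastModifiedTime() { return FileTime.from(lastModified); }
        public FileTime lastAccessTime() { return FileTime.from(lastModified); }
        public FileTime creationTime() { return FileTime.from(lastModified); }
        public boolean isRegularFile() { return !directory; }
        public boolean isDirectory() { return directory; }
        public boolean isSymbolicLink() { return false; }
        public boolean isOther() { return false; }
        public long size() { return size; }
        public Object fileKey() { return null; }
    }

    public static void main(String[] args) {
        BasicFileAttributes dir = new Attrs(true);
        BasicFileAttributes file = new Attrs(42L, Instant.parse("2026-04-22T00:00:00Z"));
        System.out.println(dir.isDirectory() + " " + dir.size());   // true 0
        System.out.println(file.isRegularFile() + " " + file.size()); // true 42
    }
}
```

The point of the split: every non-leaf path level (workspace, provider, data-link, sub-folder) can share the directory constructor, while only real files need metadata from the Platform's content response.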
+
+- [ ] **Step 3: Compile the plugin**
+
+Run: `./gradlew :plugins:nf-tower:compileGroovy`
+Expected: `BUILD FAILED` with errors in `SeqeraFileSystemProvider.groovy` (it calls `new SeqeraFileAttributes(dataset)`) — we will fix that when we extract the dataset handler. Compile errors here are expected at this step only; leave them for T004.
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileAttributes.groovy
+git commit -s -m "refactor(nf-tower): generalize SeqeraFileAttributes (no DatasetDto coupling)"
+```
+
+### T002 — Add `ResourceTypeHandler` interface
+
+**Files:**
+- Create: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy`
+
+- [ ] **Step 1: Write the interface**
+
+```groovy
+/*
+ * Copyright 2013-2026, Seqera Labs
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package io.seqera.tower.plugin.fs
+
+import java.nio.file.AccessMode
+import java.nio.file.Path
+
+/**
+ * Strategy owning the semantics of one depth-3 path segment under {@code seqera://}.
+ * Registered in {@link SeqeraFileSystem} at filesystem construction.
+ */
+interface ResourceTypeHandler {
+
+    /** e.g. {@code "datasets"} or {@code "data-links"}. Must match the depth-3 path segment. */
+    String getResourceType()
+
+    /** List entries at the given directory path. Caller has verified depth ≥ 3 and {@code dir.isDirectory()}. */
+    List<Path> list(SeqeraPath dir) throws IOException
+
+    /** Return attributes for any path at depth ≥ 3 owned by this handler. */
+    SeqeraFileAttributes readAttributes(SeqeraPath path) throws IOException
+
+    /** Open a read stream for a leaf path. Throw {@link IllegalArgumentException} if the path is a directory. */
+    InputStream newInputStream(SeqeraPath path) throws IOException
+
+    /** Verify the path exists and requested modes are satisfiable. READ is allowed; WRITE/EXECUTE throw {@link java.nio.file.AccessDeniedException}. */
+    void checkAccess(SeqeraPath path, AccessMode... modes) throws IOException
+}
+```
+
+- [ ] **Step 2: Compile (the existing errors from T001 still stand — expected)**
+
+Run: `./gradlew :plugins:nf-tower:compileGroovy`
+Expected: same errors as T001 (in `SeqeraFileSystemProvider.groovy`), no new errors from the new file.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy
+git commit -s -m "refactor(nf-tower): add ResourceTypeHandler interface"
+```
+
+### T003 — Refactor `SeqeraPath` to use generic `trail` segments
+
+**Files:**
+- Modify: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy`
+- Modify: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy`
+
+- [ ] **Step 1: Read the current `SeqeraPath` tests to understand coverage**
+
+Open `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy`. Note the existing cases you must continue to pass (URI parsing, `toUri()` round-trip, `getParent`, `resolve`, `@version`, `equals`/`hashCode`).
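The `name@version` pinning that T003 must preserve reduces to a `lastIndexOf('@')` split in which an `@` at index 0 does not count as a version marker. A plain-Java sketch of that rule, with a hypothetical helper name:

```java
public class VersionSuffixDemo {
    // Split a "name@version" dataset segment; '@' at position 0 is not a version marker.
    static String[] splitVersion(String segment) {
        int at = segment.lastIndexOf('@');
        if (at > 0)
            return new String[] { segment.substring(0, at), segment.substring(at + 1) };
        return new String[] { segment, null };
    }

    public static void main(String[] args) {
        String[] r = splitVersion("samples@3");
        System.out.println(r[0] + " / " + r[1]);          // samples / 3
        System.out.println(splitVersion("samples")[1]);    // null (no pinned version)
        System.out.println(splitVersion("@hidden")[0]);    // @hidden (leading '@' kept in the name)
    }
}
```

Using `lastIndexOf` also means a name containing `@` keeps everything before the last one, so only the final suffix is treated as the pinned version.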
+
+- [ ] **Step 2: Write failing tests for the new depth-≥5 cases**
+
+Append to the end of the class body (before the final `}`):
+
+```groovy
+    def "parse data-link path with provider and name"() {
+        when:
+        def p = new SeqeraPath(Mock(SeqeraFileSystem), 'seqera://acme/research/data-links/aws/inputs')
+
+        then:
+        p.org == 'acme'
+        p.workspace == 'research'
+        p.resourceType == 'data-links'
+        p.trail == ['aws', 'inputs']
+        p.depth() == 5
+    }
+
+    def "parse data-link path with nested sub-path"() {
+        when:
+        def p = new SeqeraPath(Mock(SeqeraFileSystem), 'seqera://acme/research/data-links/aws/inputs/reads/sample1.fq.gz')
+
+        then:
+        p.trail == ['aws', 'inputs', 'reads', 'sample1.fq.gz']
+        p.depth() == 7
+        p.isRegularFile() == false // handler decides; the generic class reports directory-by-default except for the depth-4 dataset leaf
+    }
+
+    def "getParent walks up one trail segment for deep data-link paths"() {
+        given:
+        def fs = Mock(SeqeraFileSystem)
+        def p = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads/s.fq.gz')
+
+        when:
+        def parent = p.getParent() as SeqeraPath
+
+        then:
+        parent.trail == ['aws', 'inputs', 'reads']
+        parent.depth() == 6
+    }
+
+    def "resolve appends one segment to trail"() {
+        given:
+        def fs = Mock(SeqeraFileSystem)
+        def p = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs')
+
+        when:
+        def child = p.resolve('reads') as SeqeraPath
+
+        then:
+        child.trail == ['aws', 'inputs', 'reads']
+    }
+
+    def "toUri round-trip for deep data-link path"() {
+        given:
+        def fs = Mock(SeqeraFileSystem)
+        def original = 'seqera://acme/research/data-links/aws/inputs/reads/sample.fq.gz'
+
+        when:
+        def p = new SeqeraPath(fs, original)
+
+        then:
+        p.toUri().toString() == original
+    }
+
+    def "dataset version pinning preserved after refactor"() {
+        given:
+        def fs = Mock(SeqeraFileSystem)
+
+        when:
+        def p = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples@3')
+
+        then:
+        p.resourceType == 'datasets'
+        p.trail == ['samples']
+        p.version == '3'
+        p.datasetName == 'samples'
+        p.depth() == 4
+    }
+```
+
+Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.SeqeraPathTest' -i`
+Expected: these new tests fail (the methods `trail`, `version`, and `datasetName` don't behave correctly yet, or `depth()` returns 4 for data-link paths because the current parse truncates).
+
+- [ ] **Step 3: Rewrite `SeqeraPath` to use generic trail segments**
+
+Replace `SeqeraPath.groovy` entirely with:
+
+```groovy
+/*
+ * Copyright 2013-2026, Seqera Labs
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package io.seqera.tower.plugin.fs
+
+import java.nio.file.FileSystem
+import java.nio.file.InvalidPathException
+import java.nio.file.LinkOption
+import java.nio.file.Path
+import java.nio.file.ProviderMismatchException
+import java.nio.file.WatchEvent
+import java.nio.file.WatchKey
+import java.nio.file.WatchService
+
+import groovy.transform.CompileStatic
+
+/**
+ * {@link Path} implementation for the {@code seqera://} scheme.
+ *
+ * Path shape:
+ * <pre>
+ *   seqera://                               depth 0 — root
+ *   seqera://<org>                       depth 1
+ *   seqera://<org>/<ws>               depth 2
+ *   seqera://<org>/<ws>/<type>      depth 3 — resource type
+ *   seqera://<org>/<ws>/<type>/...   depth 4+ — handler-owned trail
+ * 
+ * </pre>
+ *
+ * The generic class is resource-type-agnostic for depth ≥ 3: segments after
+ * {@code resourceType} are exposed as {@link #getTrail()} for the matching
+ * {@link ResourceTypeHandler} to interpret.
+ *
+ * The dataset convention (single trail segment, optional {@code @version} suffix)
+ * is preserved via {@link #getDatasetName()} and {@link #getVersion()} accessors.
+ */
+@CompileStatic
+class SeqeraPath implements Path {
+
+    public static final String SCHEME = 'seqera'
+    public static final String PROTOCOL = "${SCHEME}://"
+    public static final String SEPARATOR = '/'
+
+    private final SeqeraFileSystem fs
+    private final String org
+    private final String workspace
+    private final String resourceType
+    private final List<String> trail
+    private final String version
+    private final String relPath
+
+    SeqeraPath(SeqeraFileSystem fs, String uriString) {
+        this.fs = fs
+        this.relPath = null
+        if (!uriString.startsWith(PROTOCOL))
+            throw new InvalidPathException(uriString, "Not a seqera:// URI")
+        final withoutScheme = uriString.substring(PROTOCOL.length())
+        final parts = withoutScheme.split('/', -1).toList().findAll { String s -> s != null } as List<String>
+        this.org = parts.size() > 0 && parts[0] ? parts[0] : null
+        this.workspace = parts.size() > 1 && parts[1] ? parts[1] : null
+        this.resourceType = parts.size() > 2 && parts[2] ? parts[2] : null
+        final List<String> tail = parts.size() > 3 ? new ArrayList<String>(parts.subList(3, parts.size())) : new ArrayList<String>()
+        // For datasets: strip "@version" from the last trail segment if present.
+        if (this.resourceType == 'datasets' && tail.size() == 1) {
+            final last = tail[0]
+            final atIdx = last.lastIndexOf('@')
+            if (atIdx > 0) {
+                tail[0] = last.substring(0, atIdx)
+                this.version = last.substring(atIdx + 1)
+            } else {
+                this.version = null
+            }
+        } else {
+            this.version = null
+        }
+        this.trail = Collections.unmodifiableList(tail)
+        validatePath(uriString)
+    }
+
+    /** Programmatic absolute-path constructor. */
+    SeqeraPath(SeqeraFileSystem fs, String org, String workspace, String resourceType, List<String> trail, String version) {
+        this.fs = fs
+        this.relPath = null
+        this.org = org
+        this.workspace = workspace
+        this.resourceType = resourceType
+        this.trail = trail != null ? Collections.unmodifiableList(new ArrayList<String>(trail)) : Collections.emptyList()
+        this.version = version
+        validatePath(null)
+    }
+
+    /** Relative path, produced only by {@link #relativize(Path)}. */
+    SeqeraPath(String relPath) {
+        this.fs = null
+        this.relPath = relPath ?: ''
+        this.org = null
+        this.workspace = null
+        this.resourceType = null
+        this.trail = Collections.emptyList()
+        this.version = null
+    }
+
+    private void validatePath(String original) {
+        final label = original ?: rawPath()
+        if (trail && !resourceType)
+            throw new InvalidPathException(label, "Trail segments require a resource-type segment")
+        if (resourceType && !workspace)
+            throw new InvalidPathException(label, "Resource type requires a workspace segment")
+        if (workspace && !org)
+            throw new InvalidPathException(label, "Workspace requires an org segment")
+        if (org?.contains('/'))
+            throw new InvalidPathException(label, "Org name cannot contain '/'")
+        if (workspace?.contains('/'))
+            throw new InvalidPathException(label, "Workspace name cannot contain '/'")
+        if (resourceType?.contains('/'))
+            throw new InvalidPathException(label, "Resource type cannot contain '/'")
+        for (String t : trail) {
+            if (t == null || t.isEmpty())
+                throw new InvalidPathException(label, "Path segments cannot be empty")
+            if (t.contains('/'))
+                throw new InvalidPathException(label, "Path segments cannot contain '/'")
+        }
+        // Datasets accept at most one trail segment
+        if (resourceType == 'datasets' && trail.size() > 1)
+            throw new InvalidPathException(label, "Dataset paths cannot have sub-paths beyond the dataset name")
+    }
+
+    private String rawPath() {
+        final sb = new StringBuilder(PROTOCOL)
+        if (org) sb.append(org)
+        if (workspace) sb.append('/').append(workspace)
+        if (resourceType) sb.append('/').append(resourceType)
+        for (int i = 0; i < trail.size(); i++) {
+            sb.append('/')
+            if (i == trail.size() - 1 && version)
+                sb.append(trail[i]).append('@').append(version)
+            else
+                sb.append(trail[i])
+        }
+        return sb.toString()
+    }
+
+    private List<String> nameComponents() {
+        if (isAbsolute()) {
+            final d = depth()
+            final out = new ArrayList<String>(d)
+            for (int i = 0; i < d; i++)
+                out.add(getName(i).toString())
+            return out
+        }
+        if (!relPath) return Collections.emptyList()
+        return relPath.split('/').toList().findAll { String s -> s } as List<String>
+    }
+
+    // ---- accessors ----
+
+    String getOrg() { org }
+    String getWorkspace() { workspace }
+    String getResourceType() { resourceType }
+    List<String> getTrail() { trail }
+    String getVersion() { version }
+
+    /** Backwards-compat: dataset name is the single trail segment when resourceType=='datasets'. */
+    String getDatasetName() {
+        (resourceType == 'datasets' && trail.size() == 1) ? trail[0] : null
+    }
+
+    int depth() {
+        if (resourceType) return 3 + trail.size()
+        if (workspace) return 2
+        if (org) return 1
+        return 0
+    }
+
+    boolean isDirectory() {
+        // Dataset leaf at depth 4 is a file; all other shapes are directory-by-default.
+        // Handlers override this interpretation for data-link sub-paths via readAttributes.
+        !(resourceType == 'datasets' && trail.size() == 1)
+    }
+
+    boolean isRegularFile() { !isDirectory() }
+
+    // ---- Path API ----
+
+    @Override FileSystem getFileSystem() { fs }
+    @Override boolean isAbsolute() { fs != null }
+
+    @Override
+    Path getRoot() { new SeqeraPath(fs, null, null, null, null, null) }
+
+    @Override
+    Path getFileName() {
+        final d = depth()
+        if (d == 0) return null
+        if (d >= 4) {
+            final last = trail[trail.size() - 1]
+            return new SeqeraPath((d == 4 && version) ? "${last}@${version}" as String : last)
+        }
+        if (d == 3) return new SeqeraPath(resourceType)
+        if (d == 2) return new SeqeraPath(workspace)
+        return new SeqeraPath(org)
+    }
+
+    @Override
+    Path getParent() {
+        final d = depth()
+        if (d == 0) return null
+        if (d == 1) return new SeqeraPath(fs, null, null, null, null, null)
+        if (d == 2) return new SeqeraPath(fs, org, null, null, null, null)
+        if (d == 3) return new SeqeraPath(fs, org, workspace, null, null, null)
+        // d >= 4: drop last trail segment
+        final newTrail = trail.subList(0, trail.size() - 1)
+        return new SeqeraPath(fs, org, workspace, resourceType, newTrail, null)
+    }
+
+    @Override int getNameCount() { depth() }
+
+    @Override
+    Path getName(int index) {
+        final d = depth()
+        if (index < 0 || index >= d)
+            throw new IllegalArgumentException("Index out of range: $index")
+        if (index == 0) return new SeqeraPath(org)
+        if (index == 1) return new SeqeraPath(workspace)
+        if (index == 2) return new SeqeraPath(resourceType)
+        final trailIdx = index - 3
+        final seg = trail[trailIdx]
+        // Only the last segment of a depth-4 dataset path carries the version suffix
+        if (trailIdx == trail.size() - 1 && version && resourceType == 'datasets')
+            return new SeqeraPath("${seg}@${version}" as String)
+        return new SeqeraPath(seg)
+    }
+
+    @Override
+    Path subpath(int beginIndex, int endIndex) {
+        throw new UnsupportedOperationException("subpath not supported by seqera:// paths")
+    }
+
+    @Override
+    boolean startsWith(Path other) {
+        if (other !instanceof SeqeraPath) return false
+        final that = (SeqeraPath) other
+        if (this.isAbsolute() != that.isAbsolute()) return false
+        final mine = nameComponents()
+        final theirs = that.nameComponents()
+        if (theirs.size() > mine.size()) return false
+        for (int i = 0; i < theirs.size(); i++)
+            if (mine[i] != theirs[i]) return false
+        return true
+    }
+
+    @Override
+    boolean startsWith(String other) {
+        if (!other) return false
+        try {
+            final p = SeqeraPath.isSeqeraUri(other) ? new SeqeraPath(fs, other) : new SeqeraPath(other)
+            return startsWith(p)
+        } catch (Exception ignored) { return false }
+    }
+
+    @Override
+    boolean endsWith(Path other) {
+        if (other !instanceof SeqeraPath) return false
+        final that = (SeqeraPath) other
+        if (that.isAbsolute()) return this.equals(that)
+        final mine = nameComponents()
+        final theirs = that.nameComponents()
+        if (theirs.isEmpty() || theirs.size() > mine.size()) return false
+        final offset = mine.size() - theirs.size()
+        for (int i = 0; i < theirs.size(); i++)
+            if (mine[offset + i] != theirs[i]) return false
+        return true
+    }
+
+    @Override
+    boolean endsWith(String other) {
+        if (!other) return false
+        try {
+            final p = SeqeraPath.isSeqeraUri(other) ? new SeqeraPath(fs, other) : new SeqeraPath(other)
+            return endsWith(p)
+        } catch (Exception ignored) { return false }
+    }
+
+    @Override Path normalize() { this }
+
+    @Override
+    Path resolve(Path other) {
+        if (other instanceof SeqeraPath) {
+            final that = (SeqeraPath) other
+            if (that.isAbsolute()) return that
+            return resolve(that.relPath)
+        }
+        return resolve(other.toString())
+    }
+
+    @Override
+    Path resolve(String segment) {
+        if (!segment) return this
+        if (segment.startsWith(PROTOCOL))
+            return new SeqeraPath(fs, segment)
+        final stripped = segment.startsWith(SEPARATOR) ? segment.substring(1) : segment
+        if (!stripped) return this
+        final segs = stripped.split(SEPARATOR, -1).findAll { String s -> s } as List<String>
+        SeqeraPath result = this
+        for (String seg : segs) result = result.resolveOne(seg)
+        return result
+    }
+
+    private SeqeraPath resolveOne(String seg) {
+        final d = depth()
+        if (d == 0) return new SeqeraPath(fs, seg, null, null, null, null)
+        if (d == 1) return new SeqeraPath(fs, org, seg, null, null, null)
+        if (d == 2) return new SeqeraPath(fs, org, workspace, seg, null, null)
+        // d >= 3: append to trail (with @version parsing only for dataset-shaped paths at depth 3→4)
+        if (d == 3 && resourceType == 'datasets') {
+            final atIdx = seg.lastIndexOf('@')
+            if (atIdx > 0)
+                return new SeqeraPath(fs, org, workspace, resourceType, [seg.substring(0, atIdx)], seg.substring(atIdx + 1))
+        }
+        final newTrail = new ArrayList<String>(trail)
+        newTrail.add(seg)
+        return new SeqeraPath(fs, org, workspace, resourceType, newTrail, null)
+    }
+
+    @Override
+    Path resolveSibling(Path other) {
+        final parent = getParent()
+        return parent != null ? parent.resolve(other) : other
+    }
+
+    @Override
+    Path resolveSibling(String other) {
+        final parent = getParent()
+        return parent != null ?
parent.resolve(other) : new SeqeraPath(fs, other) + } + + @Override + Path relativize(Path other) { + if (other !instanceof SeqeraPath) throw new ProviderMismatchException() + final that = (SeqeraPath) other + if (!this.isAbsolute() || !that.isAbsolute()) + throw new IllegalArgumentException("Both paths must be absolute to relativize: ${this} vs ${other}") + final mine = this.nameComponents() + final theirs = that.nameComponents() + int common = 0 + while (common < mine.size() && common < theirs.size() && mine[common] == theirs[common]) common++ + final parts = new ArrayList() + for (int i = common; i < mine.size(); i++) parts.add('..') + for (int i = common; i < theirs.size(); i++) parts.add(theirs[i]) + return new SeqeraPath(parts.join(SEPARATOR)) + } + + @Override + URI toUri() { + String uriPath = null + if (workspace) { + final segments = [workspace] + if (resourceType) segments.add(resourceType) + for (int i = 0; i < trail.size(); i++) { + final t = trail[i] + if (i == trail.size() - 1 && version && resourceType == 'datasets') + segments.add("${t}@${version}" as String) + else + segments.add(t) + } + uriPath = '/' + segments.join('/') + } + return new URI(SCHEME, org ?: '', uriPath, null, null) + } + + @Override + String toString() { + if (!isAbsolute()) return relPath + if (depth() == 0) return PROTOCOL + return toUri().toString() + } + + @Override + Path toAbsolutePath() { + if (!isAbsolute()) + throw new IllegalStateException("Cannot convert relative SeqeraPath to absolute — no default directory context") + return this + } + + @Override Path toRealPath(LinkOption... options) { this } + + @Override + File toFile() { throw new UnsupportedOperationException("toFile() not supported for seqera:// paths") } + + @Override + WatchKey register(WatchService w, WatchEvent.Kind[] e, WatchEvent.Modifier... 
m) { + throw new UnsupportedOperationException("WatchService not supported by seqera:// paths") + } + + @Override + WatchKey register(WatchService w, WatchEvent.Kind... e) { + throw new UnsupportedOperationException("WatchService not supported by seqera:// paths") + } + + @Override + Iterator iterator() { + final d = depth() + final out = new ArrayList(d) + for (int i = 0; i < d; i++) out.add(getName(i)) + return out.iterator() + } + + @Override int compareTo(Path other) { toString().compareTo(other.toString()) } + + @Override + boolean equals(Object obj) { + if (obj == this) return true + if (obj !instanceof SeqeraPath) return false + return toString() == obj.toString() + } + + @Override int hashCode() { toString().hashCode() } + + static URI asUri(String path) { + if (!path) throw new IllegalArgumentException("Missing 'path' argument") + if (!path.startsWith(PROTOCOL)) + throw new IllegalArgumentException("Invalid Seqera file system path URI - it must start with '${PROTOCOL}' prefix - offending value: $path") + if (path.startsWith(PROTOCOL + SEPARATOR) && path.length() > PROTOCOL.length() + 1) + throw new IllegalArgumentException("Invalid Seqera file system path URI - make sure the scheme prefix does not contain more than two slash characters or a query in the root '/' - offending value: $path") + if (path.startsWith(PROTOCOL + './')) + path = PROTOCOL + path.substring(PROTOCOL.length() + 2) + if (path == PROTOCOL || path == PROTOCOL + '.') + return new URI(PROTOCOL + '/') + return new URI(path) + } + + static boolean isSeqeraUri(String path) { + return path && path.startsWith(PROTOCOL) + } +} +``` + +- [ ] **Step 4: Run the `SeqeraPathTest` — new and existing cases must pass** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.SeqeraPathTest' -i` +Expected: all tests pass. If any existing test fails, it signals a refactor regression — fix before continuing. 
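As an informal cross-check of the path-shape rules the refactored `SeqeraPath` encodes (depth = 3 + trail length, `@version` parsed only on the dataset leaf at depth 4, data-link sub-paths appended verbatim), here is a small Python model of the same parsing contract. This is not plugin code — names and structure are illustrative only:

```python
# Illustrative model of the seqera:// path grammar — NOT the plugin's Groovy code.
# Mirrors SeqeraPath.resolveOne/depth: org -> workspace -> resourceType -> trail,
# with @version split only for dataset-shaped paths at the depth 3 -> 4 step.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PathModel:
    org: Optional[str] = None
    workspace: Optional[str] = None
    resource_type: Optional[str] = None
    trail: List[str] = field(default_factory=list)
    version: Optional[str] = None

    @property
    def depth(self) -> int:
        if self.resource_type:
            return 3 + len(self.trail)
        if self.workspace:
            return 2
        if self.org:
            return 1
        return 0

def parse(uri: str) -> PathModel:
    assert uri.startswith('seqera://')
    parts = [s for s in uri[len('seqera://'):].split('/') if s]
    p = PathModel()
    for seg in parts:
        if p.depth == 0:
            p.org = seg
        elif p.depth == 1:
            p.workspace = seg
        elif p.depth == 2:
            p.resource_type = seg
        elif p.depth == 3 and p.resource_type == 'datasets' and '@' in seg[1:]:
            # version pin applies only to the dataset leaf (lastIndexOf('@') > 0)
            name, _, ver = seg.rpartition('@')
            p.trail.append(name)
            p.version = ver
        else:
            p.trail.append(seg)
    return p

p = parse('seqera://acme/research/datasets/samples@2')
print(p.depth, p.trail, p.version)   # 4 ['samples'] 2
```

Note how a `@` inside a data-link sub-path is left untouched, because version parsing is gated on `resource_type == 'datasets'` at depth 3 — the same gate `resolveOne` applies.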
+ +- [ ] **Step 5: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy \ + plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy +git commit -s -m "refactor(nf-tower): generalize SeqeraPath with trail segments for multi-resource support" +``` + +### T004 — Extract `DatasetsResourceHandler` + +**Files:** +- Create: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy` +- Create: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy` + +- [ ] **Step 1: Write the handler — move logic from current `SeqeraFileSystemProvider`** + +The source of the logic is the current `newInputStream`, `readAttributes`, `newDirectoryStream` (depth 3 only), and `checkAccess` branches in `SeqeraFileSystemProvider.groovy` that handle `datasets`. Preserve behavior byte-for-byte. + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package io.seqera.tower.plugin.fs.handler + +import java.nio.file.AccessDeniedException +import java.nio.file.AccessMode +import java.nio.file.NoSuchFileException +import java.nio.file.Path +import java.time.Instant + +import groovy.transform.CompileStatic +import groovy.util.logging.Slf4j +import io.seqera.tower.model.DatasetDto +import io.seqera.tower.model.DatasetVersionDto +import io.seqera.tower.plugin.dataset.SeqeraDatasetClient +import io.seqera.tower.plugin.fs.ResourceTypeHandler +import io.seqera.tower.plugin.fs.SeqeraFileAttributes +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath + +/** + * {@link ResourceTypeHandler} for {@code datasets} resource type. + * All logic previously inlined in {@link SeqeraFileSystemProvider} for dataset paths lives here. + */ +@Slf4j +@CompileStatic +class DatasetsResourceHandler implements ResourceTypeHandler { + + public static final String TYPE = 'datasets' + + private final SeqeraFileSystem fs + private final SeqeraDatasetClient client + + DatasetsResourceHandler(SeqeraFileSystem fs, SeqeraDatasetClient client) { + this.fs = fs + this.client = client + } + + @Override + String getResourceType() { TYPE } + + @Override + List list(SeqeraPath dir) throws IOException { + final d = dir.depth() + if (d == 3) { + final workspaceId = fs.resolveWorkspaceId(dir.org, dir.workspace) + final datasets = fs.resolveDatasets(workspaceId) + return datasets.collect { DatasetDto ds -> dir.resolve(ds.name) as Path } + } + throw new IllegalArgumentException("datasets handler cannot list depth $d paths: $dir") + } + + @Override + SeqeraFileAttributes readAttributes(SeqeraPath p) throws IOException { + final d = p.depth() + if (d == 3) { + // resource-type dir — validate workspace + fs.resolveWorkspaceId(p.org, p.workspace) + return new SeqeraFileAttributes(true) + } + if (d != 4) + throw new NoSuchFileException(p.toString(), null, "Invalid dataset path depth: $d") + final workspaceId = 
fs.resolveWorkspaceId(p.org, p.workspace) + final dataset = fs.resolveDataset(workspaceId, p.datasetName) + if (!dataset) + throw new NoSuchFileException(p.toString(), null, "Dataset '${p.datasetName}' not found in workspace ${p.workspace}") + return new SeqeraFileAttributes( + 0L, + dataset.lastUpdated?.toInstant() ?: Instant.EPOCH, + dataset.dateCreated?.toInstant() ?: Instant.EPOCH, + dataset.id ) + } + + @Override + InputStream newInputStream(SeqeraPath p) throws IOException { + if (p.depth() != 4) + throw new IllegalArgumentException("Operation `newInputStream` requires a dataset path (depth 4): $p") + final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace) + final dataset = fs.resolveDataset(workspaceId, p.datasetName) + if (!dataset) + throw new NoSuchFileException(p.toString(), null, "Dataset '${p.datasetName}' not found in workspace ${p.workspace}") + final version = resolveVersion(dataset, p) + log.debug "Downloading dataset '${p.datasetName}' version ${version.version} (${version.fileName}) from workspace $workspaceId" + return client.downloadDataset(dataset.id, String.valueOf(version.version), version.fileName, dataset.workspaceId) + } + + @Override + void checkAccess(SeqeraPath p, AccessMode... 
modes) throws IOException { + for (AccessMode m : modes) { + if (m == AccessMode.WRITE || m == AccessMode.EXECUTE) + throw new AccessDeniedException(p.toString(), null, "seqera:// datasets are read-only") + } + // READ: make sure the dataset resolves + readAttributes(p) + } + + private DatasetVersionDto resolveVersion(DatasetDto dataset, SeqeraPath p) throws IOException { + final pinned = p.version + final versions = fs.resolveVersions(dataset.id, dataset.workspaceId) + if (versions.isEmpty()) + throw new NoSuchFileException(p.toString(), null, "No versions available for dataset '${dataset.name}'") + if (pinned) { + final found = versions.find { DatasetVersionDto v -> String.valueOf(v.version) == pinned } + if (!found) + throw new NoSuchFileException(p.toString(), null, "Version '$pinned' not found for dataset '${dataset.name}'") + return found + } + final latest = versions.findAll { DatasetVersionDto v -> !v.disabled }.max { DatasetVersionDto v -> v.version } + if (!latest) + throw new NoSuchFileException(p.toString(), null, "No enabled versions for dataset '${dataset.name}'") + return latest + } +} +``` + +- [ ] **Step 2: Write Spock tests for the handler** + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * (Apache 2.0) + */ +package io.seqera.tower.plugin.fs.handler + +import java.nio.file.AccessMode +import java.nio.file.NoSuchFileException + +import io.seqera.tower.model.DatasetDto +import io.seqera.tower.model.DatasetVersionDto +import io.seqera.tower.plugin.dataset.SeqeraDatasetClient +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath +import spock.lang.Specification + +class DatasetsResourceHandlerTest extends Specification { + + def fs = Mock(SeqeraFileSystem) + def client = Mock(SeqeraDatasetClient) + def handler = new DatasetsResourceHandler(fs, client) + + def "getResourceType returns 'datasets'"() { + expect: + handler.resourceType == 'datasets' + } + + def "list at depth 3 returns one path per 
dataset"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets') + def ds1 = new DatasetDto(id: 'd1', name: 'one') + def ds2 = new DatasetDto(id: 'd2', name: 'two') + + when: + def paths = handler.list(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * fs.resolveDatasets(10L) >> [ds1, ds2] + paths*.toString() == ['seqera://acme/research/datasets/one', 'seqera://acme/research/datasets/two'] + } + + def "newInputStream resolves latest non-disabled version when no pin"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') + def ds = new DatasetDto(id: 'd1', name: 'samples', workspaceId: 10L) + def v1 = new DatasetVersionDto(datasetId: 'd1', version: 1L, fileName: 'a.csv', disabled: false) + def v2 = new DatasetVersionDto(datasetId: 'd1', version: 2L, fileName: 'b.csv', disabled: false) + def content = new ByteArrayInputStream('x'.bytes) + + when: + def stream = handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * fs.resolveDataset(10L, 'samples') >> ds + 1 * fs.resolveVersions('d1', 10L) >> [v1, v2] + 1 * client.downloadDataset('d1', '2', 'b.csv', 10L) >> content + stream === content + } + + def "newInputStream honors @version pin"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples@1') + def ds = new DatasetDto(id: 'd1', name: 'samples', workspaceId: 10L) + def v1 = new DatasetVersionDto(datasetId: 'd1', version: 1L, fileName: 'a.csv', disabled: false) + def v2 = new DatasetVersionDto(datasetId: 'd1', version: 2L, fileName: 'b.csv', disabled: false) + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * fs.resolveDataset(10L, 'samples') >> ds + 1 * fs.resolveVersions('d1', 10L) >> [v1, v2] + 1 * client.downloadDataset('d1', '1', 'a.csv', 10L) >> new ByteArrayInputStream('x'.bytes) + } + + def "newInputStream throws NoSuchFileException when dataset is missing"() { 
+ given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/ghost') + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * fs.resolveDataset(10L, 'ghost') >> null + thrown(NoSuchFileException) + } + + def "checkAccess rejects WRITE"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') + + when: + handler.checkAccess(path, AccessMode.WRITE) + + then: + thrown(java.nio.file.AccessDeniedException) + } +} +``` + +- [ ] **Step 3: Run the handler tests** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.handler.DatasetsResourceHandlerTest' -i` +Expected: all pass (after T005–T006 make the provider compile; until then the top-level compile failure blocks this — continue to T005). + +- [ ] **Step 4: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy \ + plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy +git commit -s -m "refactor(nf-tower): extract DatasetsResourceHandler from SeqeraFileSystemProvider" +``` + +### T005 — Add handler registry to `SeqeraFileSystem` + +**Files:** +- Modify: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy` + +- [ ] **Step 1: Add the registry field and accessors** + +Insert after the existing cache field declarations (e.g. 
after the `versionCache` line): + +```groovy + private final Map handlers = new LinkedHashMap<>() +``` + +Then insert immediately before the closing `}` of the class: + +```groovy + // ---- handler registry ---- + + synchronized void registerHandler(ResourceTypeHandler handler) { + handlers.put(handler.resourceType, handler) + } + + synchronized ResourceTypeHandler getHandler(String resourceType) { + handlers.get(resourceType) + } + + synchronized Set getResourceTypes() { + Collections.unmodifiableSet(new LinkedHashSet(handlers.keySet())) + } +``` + +- [ ] **Step 2: Compile** + +Run: `./gradlew :plugins:nf-tower:compileGroovy` +Expected: compile errors in `SeqeraFileSystemProvider.groovy` remain; no new errors from `SeqeraFileSystem`. + +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy +git commit -s -m "refactor(nf-tower): add ResourceTypeHandler registry to SeqeraFileSystem" +``` + +### T006 — Refactor `SeqeraFileSystemProvider` to dispatch via handlers + +**Files:** +- Modify: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy` + +- [ ] **Step 1: Replace the provider body** + +Replace the class body with the following (the class shell, package, imports are adjusted): + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * (Apache 2.0 header unchanged) + */ + +package io.seqera.tower.plugin.fs + +import java.nio.channels.SeekableByteChannel +import java.nio.file.AccessDeniedException +import java.nio.file.AccessMode +import java.nio.file.CopyOption +import java.nio.file.DirectoryIteratorException +import java.nio.file.DirectoryStream +import java.nio.file.FileStore +import java.nio.file.FileSystem +import java.nio.file.FileSystemAlreadyExistsException +import java.nio.file.FileSystemNotFoundException +import java.nio.file.Files +import java.nio.file.LinkOption +import java.nio.file.NoSuchFileException +import java.nio.file.NotDirectoryException +import 
java.nio.file.OpenOption +import java.nio.file.Path +import java.nio.file.ProviderMismatchException +import java.nio.file.StandardOpenOption +import java.nio.file.attribute.BasicFileAttributes +import java.nio.file.attribute.FileAttribute +import java.nio.file.attribute.FileAttributeView +import java.nio.file.spi.FileSystemProvider + +import groovy.transform.CompileStatic +import groovy.util.logging.Slf4j +import io.seqera.tower.plugin.TowerClient +import io.seqera.tower.plugin.TowerFactory +import io.seqera.tower.plugin.dataset.SeqeraDatasetClient +import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient +import io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler +import io.seqera.tower.plugin.fs.handler.DataLinksResourceHandler + +@Slf4j +@CompileStatic +class SeqeraFileSystemProvider extends FileSystemProvider { + + public static final String SCHEME = 'seqera' + + private volatile SeqeraFileSystem fileSystem + + @Override String getScheme() { SCHEME } + + @Override + synchronized FileSystem newFileSystem(URI uri, Map env) throws IOException { + checkScheme(uri) + if (fileSystem) + throw new FileSystemAlreadyExistsException("File system `seqera://` already exists") + final TowerClient tc = TowerFactory.client() + if (!tc) + throw new IllegalStateException("File system `seqera://` requires the Seqera Platform access token — use `tower.accessToken` or TOWER_ACCESS_TOKEN") + final datasetClient = new SeqeraDatasetClient(tc) + fileSystem = new SeqeraFileSystem(this, datasetClient) + fileSystem.registerHandler(new DatasetsResourceHandler(fileSystem, datasetClient)) + fileSystem.registerHandler(new DataLinksResourceHandler(fileSystem, new SeqeraDataLinkClient(tc))) + return fileSystem + } + + @Override + synchronized FileSystem getFileSystem(URI uri) { + checkScheme(uri) + if (!fileSystem) throw new FileSystemNotFoundException("No seqera:// filesystem has been created yet") + return fileSystem + } + + synchronized SeqeraFileSystem getOrCreateFileSystem(URI 
uri, Map env) { + checkScheme(uri) + if (!fileSystem) newFileSystem(uri, env ?: Collections.emptyMap()) + return fileSystem + } + + @Override + SeqeraPath getPath(URI uri) { + final fs = getOrCreateFileSystem(uri, Collections.emptyMap()) + return new SeqeraPath(fs, uri.toString()) + } + + // ---- read ---- + + @Override + InputStream newInputStream(Path path, OpenOption... options) throws IOException { + final sp = toSeqeraPath(path) + if (sp.depth() < 3) + throw new IllegalArgumentException("newInputStream requires a leaf path: $path") + final fs = sp.getFileSystem() as SeqeraFileSystem + final h = fs.getHandler(sp.resourceType) + if (!h) throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}") + return h.newInputStream(sp) + } + + @Override + SeekableByteChannel newByteChannel(Path path, Set options, FileAttribute... attrs) throws IOException { + if (options?.contains(StandardOpenOption.WRITE) || options?.contains(StandardOpenOption.APPEND)) + throw new UnsupportedOperationException("seqera:// filesystem is read-only") + return new DatasetInputStream(newInputStream(path)) + } + + // ---- attributes ---- + + @Override +
A readAttributes(Path path, Class type, LinkOption... options) throws IOException { + if (!BasicFileAttributes.isAssignableFrom(type)) + throw new UnsupportedOperationException("Attribute type not supported: $type") + final sp = toSeqeraPath(path) + final fs = sp.getFileSystem() as SeqeraFileSystem + final d = sp.depth() + if (d < 3) { + validateSharedDirectoryExists(fs, sp) + return (A) new SeqeraFileAttributes(true) + } + final h = fs.getHandler(sp.resourceType) + if (!h) throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}") + return (A) h.readAttributes(sp) + } + + @Override + Map readAttributes(Path path, String attributes, LinkOption... options) throws IOException { + throw new UnsupportedOperationException("readAttributes(String) not supported by `seqera://` filesystem") + } + + // ---- access ---- + + @Override + void checkAccess(Path path, AccessMode... modes) throws IOException { + final sp = toSeqeraPath(path) + for (AccessMode m : modes) { + if (m == AccessMode.WRITE || m == AccessMode.EXECUTE) + throw new AccessDeniedException(path.toString(), null, "seqera:// filesystem is read-only") + } + final d = sp.depth() + if (d == 0) return + if (d < 3) { + validateSharedDirectoryExists(sp.getFileSystem() as SeqeraFileSystem, sp) + return + } + final fs = sp.getFileSystem() as SeqeraFileSystem + final h = fs.getHandler(sp.resourceType) + if (!h) throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}") + h.checkAccess(sp, modes) + } + + // ---- directory stream ---- + + @Override + DirectoryStream newDirectoryStream(Path dir, DirectoryStream.Filter filter) throws IOException { + final sp = toSeqeraPath(dir) + final fs = sp.getFileSystem() as SeqeraFileSystem + final d = sp.depth() + List entries + if (d == 0) { + fs.loadOrgWorkspaceCache() + entries = fs.listOrgNames().collect { String org -> sp.resolve(org) as Path } + } else if (d == 1) { + fs.loadOrgWorkspaceCache() + 
entries = fs.listWorkspaceNames(sp.org).collect { String ws -> sp.resolve(ws) as Path } + } else if (d == 2) { + fs.resolveWorkspaceId(sp.org, sp.workspace) // validates existence + entries = fs.getResourceTypes().collect { String rt -> sp.resolve(rt) as Path } + } else { + final h = fs.getHandler(sp.resourceType) + if (!h) throw new NoSuchFileException(dir.toString(), null, "Unsupported resource type: ${sp.resourceType}") + entries = h.list(sp) + } + + final filtered = filter ? entries.findAll { Path p -> + try { filter.accept(p) } + catch (IOException e) { throw new DirectoryIteratorException(e) } + } : entries + + return new DirectoryStream() { + @Override Iterator iterator() { filtered.iterator() } + @Override void close() {} + } + } + + // ---- copy ---- + + @Override + void copy(Path source, Path target, CopyOption... options) throws IOException { + toSeqeraPath(source) + if (target instanceof SeqeraPath) + throw new UnsupportedOperationException("seqera:// filesystem is read-only") + try (final InputStream is = newInputStream(source)) { + Files.copy(is, target, options) + } + } + + // ---- unsupported mutations ---- + + @Override void move(Path s, Path t, CopyOption... o) { throw new UnsupportedOperationException("move() not supported") } + @Override void delete(Path p) { throw new UnsupportedOperationException("delete() not supported") } + @Override void createDirectory(Path d, FileAttribute... a) { throw new UnsupportedOperationException("createDirectory() not supported") } + @Override boolean isSameFile(Path a, Path b) { a == b } + @Override boolean isHidden(Path p) { false } + @Override FileStore getFileStore(Path p) { throw new UnsupportedOperationException("getFileStore() not supported") } + @Override V getFileAttributeView(Path p, Class t, LinkOption... o) { null } + @Override void setAttribute(Path p, String a, Object v, LinkOption... 
o) { throw new UnsupportedOperationException("setAttribute() not supported") }
+
+    // ---- helpers ----
+
+    private static SeqeraPath toSeqeraPath(Path path) {
+        if (path !instanceof SeqeraPath) throw new ProviderMismatchException()
+        return (SeqeraPath) path
+    }
+
+    private static void checkScheme(URI uri) {
+        if (uri.scheme?.toLowerCase() != SCHEME)
+            throw new IllegalArgumentException("Not a seqera:// URI: $uri")
+    }
+
+    private static void validateSharedDirectoryExists(SeqeraFileSystem fs, SeqeraPath sp) throws NoSuchFileException {
+        final d = sp.depth()
+        if (d == 0) return
+        fs.loadOrgWorkspaceCache()
+        if (d >= 1 && !fs.listOrgNames().contains(sp.org))
+            throw new NoSuchFileException("seqera://${sp.org}", null, "Organisation not found")
+        if (d >= 2) fs.resolveWorkspaceId(sp.org, sp.workspace)
+    }
+}
+```
+
+- [ ] **Step 2: Compile**
+
+Run: `./gradlew :plugins:nf-tower:compileGroovy`
+Expected: **one** remaining compile failure — `import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient` and `import io.seqera.tower.plugin.fs.handler.DataLinksResourceHandler` are not yet defined. The real implementations arrive in T007–T010; continue without committing yet.
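The dispatch pattern the provider now follows — resolve the path's resource-type segment to a registered handler, fail with a file-not-found error for unknown types — can be sketched independently of the plugin. Everything below is illustrative (handler names, methods, and return values are made up for the example), not the plugin's actual API:

```python
# Minimal sketch of the resource-type handler registry and dispatch.
# All names here are illustrative stand-ins, not the plugin's Groovy classes.
class DatasetsHandler:
    resource_type = 'datasets'
    def list(self, path):
        return [f'{path}/samples']          # fake listing for the sketch

class DataLinksHandler:
    resource_type = 'data-links'
    def list(self, path):
        return [f'{path}/raw-reads']        # fake listing for the sketch

class Registry:
    def __init__(self):
        self._handlers = {}

    def register(self, handler):
        # keyed by resource type, mirroring SeqeraFileSystem.registerHandler
        self._handlers[handler.resource_type] = handler

    def dispatch(self, resource_type):
        handler = self._handlers.get(resource_type)
        if handler is None:
            # mirrors the provider's NoSuchFileException for unsupported types
            raise FileNotFoundError(f'Unsupported resource type: {resource_type}')
        return handler

reg = Registry()
reg.register(DatasetsHandler())
reg.register(DataLinksHandler())
print(reg.dispatch('datasets').list('seqera://acme/research/datasets'))
```

The key property this buys is that the provider's NIO methods stay resource-type agnostic: adding a third resource type later means registering one more handler, with no change to the dispatch sites.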
+ +- [ ] **Step 3: Temporarily stub the missing imports so compile passes and existing tests can run** + +Create a minimal stub at `plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy`: + +```groovy +package io.seqera.tower.plugin.datalink + +import groovy.transform.CompileStatic +import io.seqera.tower.plugin.TowerClient + +@CompileStatic +class SeqeraDataLinkClient { + SeqeraDataLinkClient(TowerClient tc) {} +} +``` + +Create a minimal stub at `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy`: + +```groovy +package io.seqera.tower.plugin.fs.handler + +import java.nio.file.AccessMode +import java.nio.file.Path +import groovy.transform.CompileStatic +import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient +import io.seqera.tower.plugin.fs.ResourceTypeHandler +import io.seqera.tower.plugin.fs.SeqeraFileAttributes +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath + +@CompileStatic +class DataLinksResourceHandler implements ResourceTypeHandler { + DataLinksResourceHandler(SeqeraFileSystem fs, SeqeraDataLinkClient client) {} + @Override String getResourceType() { 'data-links' } + @Override List list(SeqeraPath dir) { throw new UnsupportedOperationException('stub') } + @Override SeqeraFileAttributes readAttributes(SeqeraPath path) { throw new UnsupportedOperationException('stub') } + @Override InputStream newInputStream(SeqeraPath path) { throw new UnsupportedOperationException('stub') } + @Override void checkAccess(SeqeraPath path, AccessMode... modes) { throw new UnsupportedOperationException('stub') } +} +``` + +- [ ] **Step 4: Compile and run the existing nf-tower tests** + +Run: `./gradlew :plugins:nf-tower:test -i` +Expected: all existing tests pass — including `SeqeraDatasetClientTest`, `SeqeraFileSystemTest`, `SeqeraPathTest`, `SeqeraFileSystemProviderTest`, `DatasetsResourceHandlerTest` from T004. 
Any failure is a refactor regression and must be fixed before committing. + +- [ ] **Step 5: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy \ + plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy \ + plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy +git commit -s -m "refactor(nf-tower): SeqeraFileSystemProvider dispatches to ResourceTypeHandler; add data-link stubs" +``` + +**Checkpoint**: Refactor done. Dataset behavior is unchanged, routed through `DatasetsResourceHandler`. Data-link stubs exist but throw `UnsupportedOperationException`. + +--- + +## Phase 2: Data-Link API Client + +### T007 — Implement `SeqeraDataLinkClient` + +**Files:** +- Modify: `plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy` + +- [ ] **Step 1: Replace the stub with the real client** + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * (Apache 2.0) + */ + +package io.seqera.tower.plugin.datalink + +import java.nio.file.AccessDeniedException +import java.nio.file.NoSuchFileException + +import groovy.json.JsonSlurper +import groovy.transform.CompileStatic +import groovy.util.logging.Slf4j +import io.seqera.tower.model.DataLinkContentResponse +import io.seqera.tower.model.DataLinkDownloadUrlResponse +import io.seqera.tower.model.DataLinkDto +import io.seqera.tower.model.DataLinkItem +import io.seqera.tower.model.DataLinkItemType +import io.seqera.tower.model.DataLinkProvider +import io.seqera.tower.plugin.TowerClient +import io.seqera.tower.plugin.exception.ForbiddenException +import io.seqera.tower.plugin.exception.NotFoundException +import io.seqera.tower.plugin.exception.UnauthorizedException +import nextflow.exception.AbortOperationException + +/** + * Typed client for Seqera Platform data-link API endpoints. + * Delegates HTTP execution to {@link TowerClient#sendApiRequest}. 
+ */ +@Slf4j +@CompileStatic +class SeqeraDataLinkClient { + + private static final int LIST_PAGE_SIZE = 100 + + private final TowerClient towerClient + + SeqeraDataLinkClient(TowerClient tc) { this.towerClient = tc } + + private String getEndpoint() { towerClient.endpoint } + + /** + * GET /data-links?workspaceId={ws}&max={n}&offset={o} + * Exhausts pagination and returns all data-links in the workspace. + */ + List listDataLinks(long workspaceId) { + final out = new ArrayList() + int offset = 0 + while (true) { + final url = "${endpoint}/data-links?workspaceId=${workspaceId}&max=${LIST_PAGE_SIZE}&offset=${offset}" + log.debug "SeqeraDataLinkClient GET $url" + final resp = towerClient.sendApiRequest(url) + checkFsResponse(resp, url) + final json = new JsonSlurper().parseText(resp.message) as Map + final items = json.dataLinks as List + if (items) { + for (Map m : items) out.add(mapDataLink(m)) + offset += items.size() + } + final total = (json.totalSize as Long) ?: 0L + if (!items || offset >= total) break + } + return out + } + + /** + * GET /data-links/{id}/content?workspaceId={ws}&path={sub}&nextPageToken={tok} + * Works for directories and single files. Exhausts {@code nextPageToken}. + * Returns a synthesised {@link DataLinkContentResponse} with concatenated objects. 
+ */ + DataLinkContentResponse getContent(String dataLinkId, String subPath, long workspaceId) { + final out = new DataLinkContentResponse() + out.objects = new ArrayList() + String token = null + while (true) { + String url = "${endpoint}/data-links/${dataLinkId}/content?workspaceId=${workspaceId}" + if (subPath) url += "&path=${encode(subPath)}" + if (token) url += "&nextPageToken=${encode(token)}" + log.debug "SeqeraDataLinkClient GET $url" + final resp = towerClient.sendApiRequest(url) + checkFsResponse(resp, url) + final json = new JsonSlurper().parseText(resp.message) as Map + if (out.originalPath == null) out.originalPath = json.originalPath as String + final items = json.objects as List + if (items) for (Map m : items) out.objects.add(mapItem(m)) + token = json.nextPageToken as String + if (!token) break + } + return out + } + + /** GET /data-links/{id}/download?workspaceId={ws}&path={sub} */ + DataLinkDownloadUrlResponse getDownloadUrl(String dataLinkId, String subPath, long workspaceId) { + final url = "${endpoint}/data-links/${dataLinkId}/download?workspaceId=${workspaceId}&path=${encode(subPath ?: '')}" + log.debug "SeqeraDataLinkClient GET $url" + final resp = towerClient.sendApiRequest(url) + checkFsResponse(resp, url) + final json = new JsonSlurper().parseText(resp.message) as Map + final out = new DataLinkDownloadUrlResponse() + out.url = json.url as String + return out + } + + // ---- helpers ---- + + private static String encode(String s) { + new URI(null, null, s, null).rawPath + } + + private static void checkFsResponse(TowerClient.Response resp, String url) { + if (!resp.error) return + final code = resp.code + if (code == 401) + throw new AbortOperationException("Seqera authentication failed — check tower.accessToken or TOWER_ACCESS_TOKEN") + if (code == 403) + throw new AccessDeniedException(url, null, "Forbidden — check workspace permissions") + if (code == 404) + throw new NoSuchFileException(url) + throw new IOException("Seqera API error: 
HTTP ${code} for ${url}") + } + + private static DataLinkDto mapDataLink(Map m) { + final dto = new DataLinkDto() + dto.id = m.id as String + dto.name = m.name as String + dto.description = m.description as String + dto.resourceRef = m.resourceRef as String + if (m.provider) dto.provider = DataLinkProvider.fromValue(m.provider as String) + dto.region = m.region as String + return dto + } + + private static DataLinkItem mapItem(Map m) { + final it = new DataLinkItem() + it.name = m.name as String + if (m.type) it.type = DataLinkItemType.fromValue(m.type as String) + it.size = (m.size as Long) ?: 0L + it.mimeType = m.mimeType as String + return it + } +} +``` + +- [ ] **Step 2: Verify `DataLinkProvider.fromValue` and `DataLinkItemType.fromValue` exist** + +Run: `javap -p /home/jorgee/IdeaProjects/nextflow/plugins/nf-tower/build/target/libs/tower-api-1.121.0.jar | grep -A2 'class io.seqera.tower.model.DataLinkProvider' | head -20` — or just proceed. These `fromValue` methods are standard on generated Micronaut enums; if not present the compile step will tell us. + +- [ ] **Step 3: Compile** + +Run: `./gradlew :plugins:nf-tower:compileGroovy` +Expected: success. If `fromValue` is missing, fall back to `DataLinkProvider.values().find { it.toString() == m.provider }`. 
+ +- [ ] **Step 4: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy +git commit -s -m "feat(nf-tower): add SeqeraDataLinkClient with pagination and error mapping" +``` + +### T008 — Unit tests for `SeqeraDataLinkClient` + +**Files:** +- Create: `plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy` + +- [ ] **Step 1: Write the Spock spec** + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * (Apache 2.0) + */ +package io.seqera.tower.plugin.datalink + +import java.nio.file.AccessDeniedException +import java.nio.file.NoSuchFileException + +import groovy.json.JsonOutput +import io.seqera.tower.plugin.TowerClient +import nextflow.exception.AbortOperationException +import spock.lang.Specification + +class SeqeraDataLinkClientTest extends Specification { + + private static final String EP = 'https://api.example.com' + + private TowerClient tower() { + def tc = Spy(TowerClient) + tc.@endpoint = EP + return tc + } + + private static TowerClient.Response ok(String body) { new TowerClient.Response(200, body) } + private static TowerClient.Response err(int code) { new TowerClient.Response(code, "error $code") } + + // ---- listDataLinks ---- + + def "listDataLinks returns parsed DTOs for a single page"() { + given: + def body = JsonOutput.toJson([dataLinks: [ + [id: 'dl-1', name: 'inputs', provider: 'aws', resourceRef: 's3://bucket/'], + [id: 'dl-2', name: 'archive', provider: 'google', resourceRef: 'gs://bucket/'] + ], totalSize: 2]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(body) + def client = new SeqeraDataLinkClient(tc) + + when: + def list = client.listDataLinks(10L) + + then: + list.size() == 2 + list[0].id == 'dl-1' + list[0].name == 'inputs' + list[1].provider.toString() == 'google' + } + + def "listDataLinks exhausts pagination"() { + given: + def page1 = JsonOutput.toJson([dataLinks: [[id: 'dl-1', name: 
'a', provider: 'aws']], totalSize: 3]) + def page2 = JsonOutput.toJson([dataLinks: [[id: 'dl-2', name: 'b', provider: 'aws']], totalSize: 3]) + def page3 = JsonOutput.toJson([dataLinks: [[id: 'dl-3', name: 'c', provider: 'aws']], totalSize: 3]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(page1) + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=1") >> ok(page2) + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=2") >> ok(page3) + def client = new SeqeraDataLinkClient(tc) + + when: + def list = client.listDataLinks(10L) + + then: + list*.id == ['dl-1', 'dl-2', 'dl-3'] + } + + // ---- getContent ---- + + def "getContent single page returns parsed items"() { + given: + def body = JsonOutput.toJson([ + originalPath: 'reads/', + objects: [ + [name: 'a.fq', type: 'FILE', size: 123, mimeType: 'application/gzip'], + [name: 'b.fq', type: 'FILE', size: 456, mimeType: 'application/gzip'] + ]]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links/dl-1/content?workspaceId=10&path=reads/") >> ok(body) + def client = new SeqeraDataLinkClient(tc) + + when: + def resp = client.getContent('dl-1', 'reads/', 10L) + + then: + resp.objects.size() == 2 + resp.objects[0].name == 'a.fq' + resp.objects[0].size == 123L + resp.objects[0].type.toString() == 'FILE' + } + + def "getContent follows nextPageToken"() { + given: + def p1 = JsonOutput.toJson([originalPath: '', objects: [[name: 'a', type: 'FILE', size: 1]], nextPageToken: 'T2']) + def p2 = JsonOutput.toJson([originalPath: '', objects: [[name: 'b', type: 'FILE', size: 2]]]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links/dl-1/content?workspaceId=10") >> ok(p1) + tc.sendApiRequest("${EP}/data-links/dl-1/content?workspaceId=10&nextPageToken=T2") >> ok(p2) + def client = new SeqeraDataLinkClient(tc) + + when: + def resp = client.getContent('dl-1', null, 10L) + + then: + resp.objects*.name == ['a', 'b'] + } + + // ---- getDownloadUrl ---- + 
+    def "getDownloadUrl returns the signed URL"() { + given: + def tc = tower() + tc.sendApiRequest("${EP}/data-links/dl-1/download?workspaceId=10&path=reads/a.fq") >> ok(JsonOutput.toJson([url: 'https://signed'])) + def client = new SeqeraDataLinkClient(tc) + + when: + def resp = client.getDownloadUrl('dl-1', 'reads/a.fq', 10L) + + then: + resp.url == 'https://signed' + } + + // ---- error mapping ---- + + def "401 throws AbortOperationException"() { + given: + def tc = tower() + tc.sendApiRequest(_) >> err(401) + def client = new SeqeraDataLinkClient(tc) + + when: + client.listDataLinks(10L) + + then: + thrown(AbortOperationException) + } + + def "403 throws AccessDeniedException"() { + given: + def tc = tower() + tc.sendApiRequest(_) >> err(403) + def client = new SeqeraDataLinkClient(tc) + + when: + client.getContent('dl-1', '', 10L) + + then: + thrown(AccessDeniedException) + } + + def "404 throws NoSuchFileException"() { + given: + def tc = tower() + tc.sendApiRequest(_) >> err(404) + def client = new SeqeraDataLinkClient(tc) + + when: + client.getDownloadUrl('dl-1', 'missing', 10L) + + then: + thrown(NoSuchFileException) + } +} +``` + +- [ ] **Step 2: Run the tests** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.datalink.SeqeraDataLinkClientTest' -i` +Expected: all pass. + +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy +git commit -s -m "test(nf-tower): SeqeraDataLinkClient unit tests — pagination, parse, error mapping" +``` + +--- + +## Phase 3: User Story 1 — Read a File Inside a Data-Link (P1) 🎯 MVP + +**Goal**: `file('seqera://<org>/<workspace>/data-links/<provider>/<data-link>/<file-path>')` returns an `InputStream` over the file content.
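The T009 handler below indexes the path's `trail` segments (0 = provider, 1 = data-link name, 2+ = the sub-path inside the link). That shape decomposes with plain `java.net.URI`; a standalone sketch using the same illustrative `acme/research` names as the specs (`SeqeraPath` does the equivalent parsing internally):

```java
import java.net.URI;
import java.util.Arrays;

// Standalone decomposition of a data-link path into the pieces the handler
// consumes; the org/workspace/file names are the illustrative ones used in the specs.
public class PathShapeDemo {
    public static void main(String[] args) {
        URI u = URI.create("seqera://acme/research/data-links/AWS/inputs/reads/a.fq");
        String org = u.getHost();                                  // acme
        String[] seg = u.getPath().substring(1).split("/");        // drop the leading '/'
        String workspace = seg[0];                                 // research
        String resourceType = seg[1];                              // data-links
        String[] trail = Arrays.copyOfRange(seg, 2, seg.length);   // [AWS, inputs, reads, a.fq]
        String provider = trail[0];
        String dataLink = trail[1];
        String subPath = String.join("/", Arrays.copyOfRange(trail, 2, trail.length));
        System.out.println(provider + " " + dataLink + " " + subPath);  // AWS inputs reads/a.fq
    }
}
```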
+ +### T009 [US1] — Implement `DataLinksResourceHandler` (real, replacing the stub) + +**Files:** +- Modify: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy` + +- [ ] **Step 1: Replace the stub with the full handler** + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * (Apache 2.0) + */ + +package io.seqera.tower.plugin.fs.handler + +import java.net.http.HttpClient +import java.net.http.HttpRequest +import java.net.http.HttpResponse +import java.nio.file.AccessDeniedException +import java.nio.file.AccessMode +import java.nio.file.NoSuchFileException +import java.nio.file.Path +import java.time.Duration +import java.time.Instant + +import groovy.transform.CompileStatic +import groovy.util.logging.Slf4j +import io.seqera.tower.model.DataLinkContentResponse +import io.seqera.tower.model.DataLinkDto +import io.seqera.tower.model.DataLinkItem +import io.seqera.tower.model.DataLinkItemType +import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient +import io.seqera.tower.plugin.fs.ResourceTypeHandler +import io.seqera.tower.plugin.fs.SeqeraFileAttributes +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath + +/** + * {@link ResourceTypeHandler} for {@code data-links} resource type. + * Listings and attributes go through the Seqera Platform API; file reads go through + * a pre-signed URL fetched with a plain JDK {@link HttpClient} — the signed URL is + * addressed to the cloud backend and the Seqera {@code Authorization} header must not be sent. 
+ */ +@Slf4j +@CompileStatic +class DataLinksResourceHandler implements ResourceTypeHandler { + + public static final String TYPE = 'data-links' + + private final SeqeraFileSystem fs + private final SeqeraDataLinkClient client + private final HttpClient httpClient + + /** workspaceId → data-link list */ + private final Map<Long, List<DataLinkDto>> dataLinkCache = new LinkedHashMap<>() + + DataLinksResourceHandler(SeqeraFileSystem fs, SeqeraDataLinkClient client) { + this(fs, client, HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(10)).build()) + } + + /** Test-only constructor to inject a mock {@link HttpClient}. */ + DataLinksResourceHandler(SeqeraFileSystem fs, SeqeraDataLinkClient client, HttpClient httpClient) { + this.fs = fs + this.client = client + this.httpClient = httpClient + } + + @Override String getResourceType() { TYPE } + + @Override + List<Path> list(SeqeraPath dir) throws IOException { + final workspaceId = fs.resolveWorkspaceId(dir.org, dir.workspace) + final trail = dir.trail + if (trail.isEmpty()) { + // List distinct providers in use + final providers = resolveDataLinks(workspaceId) + .collect { DataLinkDto dl -> dl.provider?.toString() } + .findAll { String p -> p } + .toSet() + return providers.toList().sort().collect { String p -> dir.resolve(p) as Path } + } + if (trail.size() == 1) { + // List data-link names under the given provider + final prov = trail[0] + final names = resolveDataLinks(workspaceId) + .findAll { DataLinkDto dl -> dl.provider?.toString() == prov } + .collect { DataLinkDto dl -> dl.name } + .sort() + if (names.isEmpty()) + throw new NoSuchFileException(dir.toString(), null, "No data-links for provider '$prov' in workspace '${dir.workspace}'") + return names.collect { String n -> dir.resolve(n) as Path } + } + // trail.size() >= 2 — browse inside the data-link + final dl = resolveDataLink(workspaceId, trail[0], trail[1]) + final subPath = trail.size() > 2 ?
trail.subList(2, trail.size()).join('/') : '' + final resp = client.getContent(dl.id, subPath, workspaceId) + return (resp.objects ?: []).collect { DataLinkItem it -> dir.resolve(it.name) as Path } + } + + @Override + SeqeraFileAttributes readAttributes(SeqeraPath p) throws IOException { + final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace) + final trail = p.trail + // data-links/ dir itself, provider dir, data-link root — all directories + if (trail.size() < 2) return new SeqeraFileAttributes(true) + final dl = resolveDataLink(workspaceId, trail[0], trail[1]) + if (trail.size() == 2) return new SeqeraFileAttributes(true) // data-link root + final subPath = trail.subList(2, trail.size()).join('/') + final resp = client.getContent(dl.id, subPath, workspaceId) + return attributesFor(resp, subPath, p) + } + + @Override + InputStream newInputStream(SeqeraPath p) throws IOException { + if (p.trail.size() < 3) + throw new IllegalArgumentException("newInputStream requires a file path inside a data-link: $p") + final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace) + final dl = resolveDataLink(workspaceId, p.trail[0], p.trail[1]) + final subPath = p.trail.subList(2, p.trail.size()).join('/') + final urlResp = client.getDownloadUrl(dl.id, subPath, workspaceId) + if (!urlResp.url) + throw new NoSuchFileException(p.toString(), null, "Platform returned no download URL") + return fetchSignedUrl(urlResp.url) + } + + @Override + void checkAccess(SeqeraPath p, AccessMode... 
modes) throws IOException { + for (AccessMode m : modes) { + if (m == AccessMode.WRITE || m == AccessMode.EXECUTE) + throw new AccessDeniedException(p.toString(), null, "seqera:// data-links are read-only") + } + // READ: rely on readAttributes to validate existence + readAttributes(p) + } + + // ---- private ---- + + private synchronized List<DataLinkDto> resolveDataLinks(long workspaceId) { + def cached = dataLinkCache.get(workspaceId) + if (cached == null) { + cached = client.listDataLinks(workspaceId) + dataLinkCache.put(workspaceId, cached) + } + return cached + } + + private DataLinkDto resolveDataLink(long workspaceId, String provider, String name) throws NoSuchFileException { + final found = resolveDataLinks(workspaceId).find { DataLinkDto dl -> + dl.provider?.toString() == provider && dl.name == name + } + if (!found) + throw new NoSuchFileException( + "seqera://.../data-links/${provider}/${name}", + null, + "Data-link '${name}' not found for provider '${provider}' in workspace '$workspaceId'") + return found + } + + private SeqeraFileAttributes attributesFor(DataLinkContentResponse resp, String subPath, SeqeraPath pathForErrors) throws NoSuchFileException { + final items = resp.objects ?: [] + // Single-file content response: one object whose name matches the final segment + final lastSeg = subPath.contains('/') ?
subPath.substring(subPath.lastIndexOf('/') + 1) : subPath + final single = items.find { DataLinkItem it -> it.name == lastSeg && it.type == DataLinkItemType.FILE } + if (single) + return new SeqeraFileAttributes(single.size ?: 0L, Instant.EPOCH, Instant.EPOCH, pathForErrors.toString()) + // Otherwise treat as a directory (content response with multiple children or zero) + // If there are no children AND no originalPath, the path does not exist + if (items.isEmpty() && !resp.originalPath) + throw new NoSuchFileException(pathForErrors.toString(), null, "Path not found inside data-link") + return new SeqeraFileAttributes(true) + } + + private InputStream fetchSignedUrl(String url) throws IOException { + final req = HttpRequest.newBuilder(URI.create(url)) + .timeout(Duration.ofMinutes(5)) + .GET() + .build() + try { + final HttpResponse<InputStream> resp = httpClient.send(req, HttpResponse.BodyHandlers.ofInputStream()) + final status = resp.statusCode() + if (status >= 200 && status < 300) return resp.body() + resp.body()?.close() + throw new IOException("Signed URL fetch failed: HTTP $status for $url") + } catch (InterruptedException e) { + Thread.currentThread().interrupt() + throw new IOException("Interrupted while fetching signed URL", e) + } + } +} +``` + +- [ ] **Step 2: Compile** + +Run: `./gradlew :plugins:nf-tower:compileGroovy` +Expected: success.
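Note the security property baked into `fetchSignedUrl`: a JDK `HttpRequest` carries only the headers set explicitly on its builder, so the request sent to the signed URL is header-free and the Seqera token never reaches the cloud backend. A standalone sketch (the URL below is made up, not a real signed link):

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Standalone check that a builder with no header() calls produces a request
// with no explicit headers; the credential is the signature in the URL itself.
public class SignedUrlRequestDemo {
    public static void main(String[] args) {
        HttpRequest req = HttpRequest.newBuilder(URI.create("https://bucket.example/a.fq?X-Amz-Signature=abc"))
                .timeout(Duration.ofMinutes(5))
                .GET()
                .build();
        // Transport headers (Host, etc.) are added by the client at send time;
        // nothing here ever attaches an Authorization header.
        System.out.println(req.headers().map());  // {}
    }
}
```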
+ +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy +git commit -s -m "feat(nf-tower): implement DataLinksResourceHandler (list, readAttributes, newInputStream)" +``` + +### T010 [US1] — Unit tests for `DataLinksResourceHandler.newInputStream` + +**Files:** +- Create: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy` + +- [ ] **Step 1: Write the spec — MVP scenarios for newInputStream** + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * (Apache 2.0) + */ +package io.seqera.tower.plugin.fs.handler + +import java.net.http.HttpClient +import java.net.http.HttpResponse +import java.nio.file.NoSuchFileException + +import io.seqera.tower.model.DataLinkContentResponse +import io.seqera.tower.model.DataLinkDownloadUrlResponse +import io.seqera.tower.model.DataLinkDto +import io.seqera.tower.model.DataLinkItem +import io.seqera.tower.model.DataLinkItemType +import io.seqera.tower.model.DataLinkProvider +import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath +import spock.lang.Specification + +class DataLinksResourceHandlerTest extends Specification { + + private SeqeraFileSystem fs = Mock(SeqeraFileSystem) + private SeqeraDataLinkClient client = Mock(SeqeraDataLinkClient) + private HttpClient http = Mock(HttpClient) + private DataLinksResourceHandler handler = new DataLinksResourceHandler(fs, client, http) + + private DataLinkDto dl(String id, String name, DataLinkProvider p) { + def d = new DataLinkDto(); d.id = id; d.name = name; d.provider = p; return d + } + private DataLinkItem item(String name, DataLinkItemType t, long size) { + def i = new DataLinkItem(); i.name = name; i.type = t; i.size = size; return i + } + + // ---- newInputStream ---- + + def "newInputStream resolves (provider,name,subPath) and streams the signed URL"() { 
+ given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/a.fq') + def signedBody = new ByteArrayInputStream('data'.bytes) + def httpResp = Mock(HttpResponse) { + statusCode() >> 200 + body() >> signedBody + } + def urlResp = new DataLinkDownloadUrlResponse(); urlResp.url = 'https://signed/a' + + when: + def stream = handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L) >> urlResp + 1 * http.send(_, _) >> httpResp + stream === signedBody + } + + def "newInputStream throws NoSuchFileException when data-link unknown"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/unknown/reads/a.fq') + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + thrown(NoSuchFileException) + } + + def "newInputStream requires trail.size >= 3 (file path, not the data-link root itself)"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs') + + when: + handler.newInputStream(path) + + then: + thrown(IllegalArgumentException) + } + + def "newInputStream wraps signed-URL HTTP 403 as IOException"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/a.fq') + def urlResp = new DataLinkDownloadUrlResponse(); urlResp.url = 'https://signed/a' + def httpResp = Mock(HttpResponse) { + statusCode() >> 403 + body() >> new ByteArrayInputStream(new byte[0]) + } + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L) >> urlResp + 1 * http.send(_, _) >> httpResp + thrown(IOException) + } 
+} +``` + +- [ ] **Step 2: Run the tests** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.handler.DataLinksResourceHandlerTest' -i` +Expected: all pass. + +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy +git commit -s -m "test(nf-tower): DataLinksResourceHandler.newInputStream unit tests" +``` + +**Checkpoint**: US1 complete — file reads through data-link paths work end to end in unit tests. + +--- + +## Phase 4: User Story 2 — Browse Data-Link Hierarchy (P2) + +### T011 [US2] — List & readAttributes tests for DataLinksResourceHandler + +**Files:** +- Modify: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy` + +- [ ] **Step 1: Append list / readAttributes tests** + +Add the following specs to `DataLinksResourceHandlerTest`: + +```groovy + // ---- list: depth 3 (data-links/) ---- + + def "list at data-links/ returns distinct providers in use"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links') + + when: + def paths = handler.list(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDataLinks(10L) >> [ + dl('dl-1', 'a', DataLinkProvider.AWS), + dl('dl-2', 'b', DataLinkProvider.GOOGLE), + dl('dl-3', 'c', DataLinkProvider.AWS) + ] + paths*.toString().sort() == [ + 'seqera://acme/research/data-links/AWS', + 'seqera://acme/research/data-links/GOOGLE' + ] + } + + // ---- list: depth 4 (data-links/<provider>/) ---- + + def "list at data-links/<provider>/ returns data-link names for that provider"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS') + + when: + def paths = handler.list(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [ + dl('dl-1', 'inputs', DataLinkProvider.AWS), + dl('dl-2', 'archive', DataLinkProvider.AWS), + dl('dl-3', 'onGcs', DataLinkProvider.GOOGLE) + ] +
paths*.toString() == [ + 'seqera://acme/research/data-links/AWS/archive', + 'seqera://acme/research/data-links/AWS/inputs' + ] + } + + def "list at data-links/<provider>/ throws when no data-links for that provider"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AZURE') + + when: + handler.list(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'x', DataLinkProvider.AWS)] + thrown(NoSuchFileException) + } + + // ---- list: depth 5 (data-link root) ---- + + def "list at data-link root returns top-level objects"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs') + def content = new DataLinkContentResponse() + content.objects = [item('reads', DataLinkItemType.FOLDER, 0), item('samplesheet.csv', DataLinkItemType.FILE, 42)] + + when: + def paths = handler.list(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + 1 * client.getContent('dl-1', '', 10L) >> content + paths*.toString() == [ + 'seqera://acme/research/data-links/AWS/inputs/reads', + 'seqera://acme/research/data-links/AWS/inputs/samplesheet.csv' + ] + } + + // ---- list: depth 6+ (nested sub-path) ---- + + def "list at deep sub-path browses the correct sub-path"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads') + def content = new DataLinkContentResponse() + content.objects = [item('a.fq', DataLinkItemType.FILE, 1), item('b.fq', DataLinkItemType.FILE, 2)] + + when: + def paths = handler.list(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + 1 * client.getContent('dl-1', 'reads', 10L) >> content + paths*.toString() == [ + 'seqera://acme/research/data-links/AWS/inputs/reads/a.fq', + 'seqera://acme/research/data-links/AWS/inputs/reads/b.fq' + ] + } + + // ---- readAttributes
---- + + def "readAttributes at data-links/ resource-type dir reports directory"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + attr.directory + !attr.regularFile + } + + def "readAttributes at data-link root reports directory"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + attr.directory + } + + def "readAttributes on a file sub-path reports file with size"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/a.fq') + def content = new DataLinkContentResponse() + content.objects = [item('a.fq', DataLinkItemType.FILE, 123)] + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + 1 * client.getContent('dl-1', 'reads/a.fq', 10L) >> content + attr.regularFile + attr.size() == 123L + } + + def "readAttributes on a directory sub-path reports directory"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads') + def content = new DataLinkContentResponse() + content.originalPath = 'reads/' + content.objects = [item('a.fq', DataLinkItemType.FILE, 1), item('b.fq', DataLinkItemType.FILE, 2)] + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + 1 * client.getContent('dl-1', 'reads', 10L) >> content + attr.directory + } +``` + +- [ ] **Step 2: Run the tests** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.handler.DataLinksResourceHandlerTest' 
-i` +Expected: all pass. + +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy +git commit -s -m "test(nf-tower): DataLinksResourceHandler list & readAttributes unit tests" +``` + +### T012 [US2] — Provider-level browsing test (workspace listing enumerates handlers) + +**Files:** +- Modify: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy` + +- [ ] **Step 1: Add a spec confirming `datasets` AND `data-links` appear when listing a workspace** + +Append to the test class (before the final `}`): + +```groovy + def "listing a workspace enumerates both registered resource types"() { + given: + def tc = spyTower() + tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) + tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) + + final provider = new SeqeraFileSystemProvider() + provider.newFileSystem(URI.create('seqera://'), [:]) + final fs = provider.getFileSystem(URI.create('seqera://')) as SeqeraFileSystem + final wsPath = new SeqeraPath(fs, 'seqera://acme/research') + + when: + final List entries = [] + provider.newDirectoryStream(wsPath, null).withCloseable { s -> s.iterator().each { entries.add(it) } } + + then: + entries*.toString().sort() == [ + 'seqera://acme/research/data-links', + 'seqera://acme/research/datasets' + ] + } +``` + +- [ ] **Step 2: Run the test** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.SeqeraFileSystemProviderTest' -i` +Expected: all pass — includes both the new spec and the existing dataset-related specs. + +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy +git commit -s -m "test(nf-tower): workspace listing enumerates both datasets and data-links" +``` + +**Checkpoint**: US2 complete — browsing works at every depth. 
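With US2 in place a consumer never touches the handler directly; it issues ordinary NIO calls and the provider dispatches behind the scenes. The calling pattern looks like the sketch below, demonstrated against a temp directory on the default filesystem since this standalone snippet has no Platform to reach (a `seqera://` path goes through the same `Files.newDirectoryStream` entry point):

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// The NIO calls a consumer would issue against a seqera:// directory, demoed
// on the default filesystem; the provider serves the identical API surface.
public class BrowseDemo {
    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("datalink-demo");
        Files.createFile(dir.resolve("a.fq"));
        Files.createFile(dir.resolve("b.fq"));
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream)
                System.out.println(p.getFileName());  // a.fq and b.fq, order not guaranteed
        }
    }
}
```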
+ +--- + +## Phase 5: User Story 3 — Meaningful Errors (P3) + +### T013 [US3] — Error-mapping tests for data-link paths + +**Files:** +- Modify: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy` + +- [ ] **Step 1: Append error-path specs** + +```groovy + def "unknown data-link under a known provider throws with clear message"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/ghost/a.fq') + + when: + handler.list(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + def ex = thrown(NoSuchFileException) + ex.message.toLowerCase().contains('not found') || ex.reason?.toLowerCase()?.contains('not found') + } + + def "missing sub-path inside a data-link surfaces as NoSuchFileException"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/does/not/exist') + def empty = new DataLinkContentResponse() + empty.originalPath = null + empty.objects = [] + + when: + handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + 1 * client.getContent('dl-1', 'does/not/exist', 10L) >> empty + thrown(NoSuchFileException) + } + + def "checkAccess with WRITE is rejected"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/a.fq') + + when: + handler.checkAccess(path, java.nio.file.AccessMode.WRITE) + + then: + thrown(java.nio.file.AccessDeniedException) + } +``` + +- [ ] **Step 2: Run the tests** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.handler.DataLinksResourceHandlerTest' -i` +Expected: all pass. 
+ +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy +git commit -s -m "test(nf-tower): data-link error paths — unknown link, missing sub-path, write-rejected" +``` + +### T014 [US3] — Unsupported-resource-type error via the provider + +**Files:** +- Modify: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy` + +- [ ] **Step 1: Add a provider-level dispatch test** + +Append: + +```groovy + def "newInputStream on an unsupported resource type throws NoSuchFileException"() { + given: + def tc = spyTower() + tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) + tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) + def provider = new SeqeraFileSystemProvider() + provider.newFileSystem(URI.create('seqera://'), [:]) + def fs = provider.getFileSystem(URI.create('seqera://')) as SeqeraFileSystem + def path = new SeqeraPath(fs, 'seqera://acme/research/unknown-type/foo') + + when: + provider.newInputStream(path) + + then: + def ex = thrown(NoSuchFileException) + ex.message.contains('Unsupported resource type') || ex.reason?.contains('Unsupported resource type') + } +``` + +- [ ] **Step 2: Run the tests** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.SeqeraFileSystemProviderTest' -i` +Expected: all pass. + +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy +git commit -s -m "test(nf-tower): unsupported resource type surfaces NoSuchFileException" +``` + +**Checkpoint**: US3 complete — all error paths produce clear, type-specific exceptions. 
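A note on why the error specs probe both `message` and `reason`: `NoSuchFileException` extends `FileSystemException`, which stores the offending file and the human-readable reason as separate fields and joins them in `getMessage()`. A standalone sketch with an illustrative path:

```java
import java.nio.file.NoSuchFileException;

// How the exception carries the detail the specs assert on: file and reason
// are separate fields, and getMessage() renders them joined together.
public class FsExceptionDemo {
    public static void main(String[] args) {
        NoSuchFileException ex = new NoSuchFileException(
                "seqera://acme/research/data-links/AWS/ghost",        // illustrative path
                null,                                                 // no 'other' file involved
                "Data-link 'ghost' not found for provider 'AWS'");
        System.out.println(ex.getFile());
        System.out.println(ex.getReason());
        System.out.println(ex.getMessage());
    }
}
```

Asserting on `reason` (or on the combined `message`) keeps the specs robust to how a given JDK renders the join.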
+ +--- + +## Phase 6: User Story 4 — Extensibility Validation (P4) + +### T015 [US4] — Architectural guard test + +**Files:** +- Create: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/ResourceTypeAbstractionTest.groovy` + +- [ ] **Step 1: Write a guard test** + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * (Apache 2.0) + */ +package io.seqera.tower.plugin.fs + +import spock.lang.Specification + +/** + * Guards that the generic NIO layer does not reach into resource-type-specific packages. + * {@link SeqeraPath}, {@link SeqeraFileSystem} and {@link SeqeraFileAttributes} must not + * depend on {@code dataset/}, {@code datalink/}, or {@code fs/handler/}. The provider does + * wire the concrete handlers, but only behind the {@link ResourceTypeHandler} interface. + */ +class ResourceTypeAbstractionTest extends Specification { + + static final Class[] GENERIC_CLASSES = [SeqeraPath, SeqeraFileSystem, SeqeraFileAttributes] + + def "generic fs classes do not import resource-type-specific packages"() { + expect: + GENERIC_CLASSES.each { Class c -> + final src = new File("plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/${c.simpleName}.groovy").text + assert !src.contains('io.seqera.tower.plugin.dataset.'), "${c.simpleName} must not import dataset package" + assert !src.contains('io.seqera.tower.plugin.datalink.'), "${c.simpleName} must not import datalink package" + assert !src.contains('io.seqera.tower.plugin.fs.handler.'), "${c.simpleName} must not import handler package" + assert !src.contains('DataLink'), "${c.simpleName} must not reference data-link types" + assert !src.contains('DatasetDto'), "${c.simpleName} must not reference DatasetDto" + } + } + + def "both handlers implement the ResourceTypeHandler interface"() { + expect: + ResourceTypeHandler.isAssignableFrom(io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler) + ResourceTypeHandler.isAssignableFrom(io.seqera.tower.plugin.fs.handler.DataLinksResourceHandler) + } +} +``` + +- [ ] **Step 2: Run** +
+
+Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.ResourceTypeAbstractionTest' -i`
+Expected: all pass. If any import lingers from the refactor (e.g. `SeqeraFileSystem.groovy` still references `DatasetDto` via a cache field type), fix it before proceeding. The dataset-specific caches belong in `DatasetsResourceHandler` long-term; keeping the `DatasetDto`-typed cache in `SeqeraFileSystem` would be acceptable in principle as long as the import is `io.seqera.tower.model.DatasetDto` (the tower-api DTO, not the plugin's `dataset` package), but the test's plain-text check against `'DatasetDto'` guards the generic classes and will still trip on that field. If it does, refactor the cache into the handler (move `datasetCache`, `versionCache`, `resolveDatasets`, `resolveDataset`, `resolveVersions`, `invalidateDatasetCache` from `SeqeraFileSystem` into `DatasetsResourceHandler`).
+
+**Important refactor note**: if the guard test fails on `SeqeraFileSystem.groovy`, perform this sub-step:
+
+- Remove `datasetCache`, `versionCache`, `resolveDatasets`, `resolveDataset`, `resolveVersions`, and `invalidateDatasetCache` from `SeqeraFileSystem.groovy`, along with their imports (`io.seqera.tower.model.DatasetDto`, `io.seqera.tower.model.DatasetVersionDto`).
+- Move the same caches and methods into `DatasetsResourceHandler.groovy` as private fields and synchronized methods.
+- Update `DatasetsResourceHandler` to use its own cache methods instead of calling `fs.resolveDatasets(...)` etc.
+- Re-run the guard test to confirm.
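+
+If that sub-step is needed, the relocated cache might look roughly like the sketch below. This is an illustration only: it assumes a per-workspace map guarded by `synchronized` methods, and `fetchDatasets` is a hypothetical stand-in for the existing Platform API call.
+
+```groovy
+// Sketch of the dataset cache after relocation into the handler
+// (fetchDatasets() is a hypothetical stand-in for the existing API call).
+class DatasetsResourceHandler implements ResourceTypeHandler {
+
+    private final Map<Long, List<DatasetDto>> datasetCache = [:]
+
+    private synchronized List<DatasetDto> resolveDatasets(long workspaceId) {
+        // fetch once per workspace; later calls hit the cache
+        List<DatasetDto> cached = datasetCache[workspaceId]
+        if (cached == null) {
+            cached = fetchDatasets(workspaceId)
+            datasetCache[workspaceId] = cached
+        }
+        return cached
+    }
+
+    private synchronized void invalidateDatasetCache(long workspaceId) {
+        datasetCache.remove(workspaceId)
+    }
+}
+```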
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/ResourceTypeAbstractionTest.groovy \
+        plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy \
+        plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy
+git commit -s -m "test(nf-tower): enforce resource-type-agnostic boundary in generic fs classes"
+```
+
+**Checkpoint**: US4 complete — the abstraction is validated by an automated guard.
+
+---
+
+## Phase 7: Final Verification
+
+**Note**: both the `plugins/nf-tower/VERSION` bump and `changelog.txt` entries are handled at release time by the repo's release process (see `CLAUDE.md § Release process`), not per-feature. This phase only verifies the build is green.
+
+### T016 — Final full test run
+
+- [ ] **Step 1: Run the full nf-tower test suite**
+
+Run: `./gradlew :plugins:nf-tower:test -i`
+Expected: all tests pass — dataset, data-link, path, provider, filesystem, abstraction guard.
+
+- [ ] **Step 2: Run the full plugin build**
+
+Run: `./gradlew :plugins:nf-tower:build`
+Expected: success.
+
+- [ ] **Step 3: Confirm no cloud-SDK dependencies were introduced**
+
+Run:
+```
+./gradlew :plugins:nf-tower:dependencies --configuration runtimeClasspath | grep -iE 'aws-sdk|google-cloud|azure-' || echo 'OK: no cloud SDKs on classpath'
+```
+Expected output ends with `OK: no cloud SDKs on classpath`.
(SC-006) + +- [ ] **Step 4: Summary — nothing to commit; just confirm build-green at HEAD.** + +--- + +## Appendix — Task Dependency Graph + +``` +T001 ─┐ +T002 ─┼──► T003 ──► T004 ──► T005 ──► T006 ──► T007 ──► T008 + │ │ + │ └──► T009 ──► T010 + │ │ + │ ├──► T011 + │ ├──► T012 + │ ├──► T013 + │ └──► T014 + │ │ + │ └──► T015 + │ │ + └──────────────────────────────────────────────────────────────────────────────────┴──► T016 +``` From 91554f4604e6b89cee84a7e2d3dc0761b38db720 Mon Sep 17 00:00:00 2001 From: jorgee Date: Fri, 24 Apr 2026 12:55:28 +0200 Subject: [PATCH 2/6] Fix data-link API endpoints and short-circuit readAttributes via listing cache Signed-off-by: jorgee --- adr/20260422-seqera-datalinks-filesystem.md | 38 +++-- .../datalink/SeqeraDataLinkClient.groovy | 52 +++++-- .../plugin/fs/SeqeraFileSystemProvider.groovy | 2 + .../seqera/tower/plugin/fs/SeqeraPath.groovy | 26 ++++ .../handler/DataLinksResourceHandler.groovy | 38 +++-- .../fs/handler/DatasetsResourceHandler.groovy | 16 +- .../tower/plugin/fs/SeqeraPathTest.groovy | 29 ++++ .../DataLinksResourceHandlerTest.groovy | 109 ++++++++++---- .../DatasetsResourceHandlerTest.groovy | 32 ++++ specs/260422-seqera-datalinks-fs/plan.md | 142 ++++++++++++------ specs/260422-seqera-datalinks-fs/spec.md | 40 ++--- specs/260422-seqera-datalinks-fs/tasks.md | 9 ++ 12 files changed, 404 insertions(+), 129 deletions(-) diff --git a/adr/20260422-seqera-datalinks-filesystem.md b/adr/20260422-seqera-datalinks-filesystem.md index 30590a183f..dc290f26b6 100644 --- a/adr/20260422-seqera-datalinks-filesystem.md +++ b/adr/20260422-seqera-datalinks-filesystem.md @@ -9,7 +9,7 @@ Technical Story: Extend the `seqera://` NIO filesystem (introduced by [20260310- ## Summary -Add a second resource type (`data-links`) to the existing `seqera://` filesystem in the `nf-tower` plugin. Paths of the form `seqera:////data-links///` resolve to files and directories inside a Platform-managed data-link. 
Listings and attribute queries are served by the Platform's `/data-links/{id}/content` endpoint; byte reads go through pre-signed URLs returned by `/data-links/{id}/download`. Only the Seqera access token is required — no AWS/GCP/Azure credentials, no cloud SDK dependency.
+Add a second resource type (`data-links`) to the existing `seqera://` filesystem in the `nf-tower` plugin. Paths of the form `seqera://<org>/<workspace>/data-links/<provider>/<data-link-name>/<path>` resolve to files and directories inside a Platform-managed data-link. Listings and attribute queries are served by the Platform's `/data-links/{id}/browse[/path]` endpoints; byte reads go through pre-signed URLs returned by `/data-links/{id}/generate-download-url` and fetched with a plain JDK `HttpClient`. Only the Seqera access token is required — no AWS/GCP/Azure credentials, no cloud SDK dependency.
 
 As part of this change, the existing dataset-specific logic in `SeqeraFileSystemProvider`, `SeqeraFileSystem`, and `SeqeraPath` is extracted into a `ResourceTypeHandler` abstraction, so the two resource types coexist behind a common contract.
 
@@ -53,12 +53,12 @@ For each read, call the Platform to obtain short-lived AWS/GCP/Azure credentials
 
 ### Option 2: Pre-signed URL + direct HTTPS fetch
 
-Call the Platform's `GET /data-links/{id}/download?path=<path>` endpoint to obtain a pre-signed URL; stream bytes through the existing `TowerClient` / `HxClient` HTTPS path.
+Call the Platform's `GET /data-links/{id}/generate-download-url?filePath=<path>` endpoint to obtain a pre-signed URL; stream bytes through a standalone HTTPS client.
 
 - Good, because there is no cloud SDK dependency — all I/O is generic HTTPS.
 - Good, because the Platform is the only credential surface (user token goes in, signed URL comes out; credentials never cross our process boundary as a distinct object).
 - Good, because it uniformly supports every provider the Platform supports — now and in the future — with no per-provider code.
-- Good, because `TowerClient.sendStreamingRequest()` already exists and has the retry/backoff semantics we want. +- Good, because the existing `TowerClient` handles the Platform-side call (`/generate-download-url`) with retry/backoff, and the cloud-side fetch is a one-shot HTTPS GET through a standalone `java.net.http.HttpClient`. - Bad, because pre-signed URLs have time windows; a very long read can outlive its URL. Acceptable: Nextflow task retry handles the failure. - Bad, because range reads / multi-part reads are not implemented in this iteration. Acceptable: datasets are already single-shot reads and the pattern matches. @@ -76,7 +76,7 @@ See above. ## Solution or decision outcome -Option 2 — pre-signed URL + direct HTTPS fetch. All data-link byte I/O goes through the Platform's `/download` endpoint and `TowerClient.sendStreamingRequest()`. The plugin never touches a cloud SDK and never holds a long-lived cloud credential. +Option 2 — pre-signed URL + direct HTTPS fetch. The plugin calls the Platform's `/generate-download-url` endpoint through `TowerClient.sendApiRequest()` to obtain a pre-signed URL, then fetches that URL with a plain JDK `HttpClient` (no Seqera `Authorization` header). The plugin never touches a cloud SDK and never holds a long-lived cloud credential. Extend the `fs/` package with a real `ResourceTypeHandler` abstraction. Extract the existing dataset logic into a `DatasetsResourceHandler`. Add `DataLinksResourceHandler` as the second implementation. 
@@ -147,12 +147,16 @@ interface ResourceTypeHandler {
 
 ### API Usage Summary (Data-Links)
 
-| NIO operation | Platform endpoint | Notes |
-| -------------------------------------------------------------- | ------------------------------------------ | ------------------------------------------------------------------------------- |
-| list data-links in workspace (resolution + depth-4/5 listings) | `GET /data-links?workspaceId=X` | cached per workspace; pagination exhausted |
-| `newDirectoryStream(dir)` at depth ≥ 5 | `GET /data-links/{id}/content?path=<path>` | items array → child paths |
-| `readAttributes(path)` inside a data-link | `GET /data-links/{id}/content?path=<path>` | single targeted call; file vs folder from response |
-| `newInputStream(file)` | `GET /data-links/{id}/download?path=<path>` | parse `DataLinkDownloadUrlResponse.url`; fetch with plain JDK `HttpClient` (no Seqera auth header — the URL is signed for the cloud backend) |
+| NIO operation | Platform endpoint | Notes |
+| -------------------------------------------------------------- | -------------------------------------------------------------- | ------------------------------------------------------------------------------- |
+| enumerate providers in workspace (depth-3 listing) | `GET /data-links?workspaceId=X&max=100&offset=O` | offset pagination via lazy `Iterator` |
+| resolve one data-link by (provider, name) | `GET /data-links?workspaceId=X&search=<name>&max=100&offset=O` | server-side filter by name; short-circuit on first provider match; `@Memoized` |
+| `newDirectoryStream(dir)` at data-link root | `GET /data-links/{id}/browse?workspaceId=X[&credentialsId=C]` | lazy `PagedDataLinkContent` — token pagination via `nextPageToken` |
+| `newDirectoryStream(dir)` at a sub-path | `GET /data-links/{id}/browse/{path}?workspaceId=X[&credentialsId=C]` | same; slashes in `{path}` are preserved |
+| `readAttributes(path)` inside a data-link | same as above (first page only) | short-circuited when
`path.cachedAttributes` was set by a prior listing |
+| `newInputStream(file)` | `GET /data-links/{id}/generate-download-url?workspaceId=X&filePath=<path>[&credentialsId=C]` | parse `DataLinkDownloadUrlResponse.url`; fetch with plain JDK `HttpClient` (no Seqera auth header — the URL is signed for the cloud backend) |
+
+`credentialsId` is forwarded when `DataLinkDto.credentials` is non-empty (using the first entry's `id`); omitted otherwise.
 
 ### Key Design Decisions
 
@@ -168,11 +172,15 @@ interface ResourceTypeHandler {
 
 6. **Handler registry at construction, not via PF4J**: handlers are instantiated in `SeqeraFileSystemProvider.newFileSystem()`. Adding a third resource type is a code change to this plugin, identical in shape to the dataset/data-link pair. No extension-point protocol is introduced — YAGNI.
 
-7. **`readAttributes` is single-target**: because `GET /data-links/{id}/content?path=<path>` accepts both directory and file paths, a file-level `readAttributes` is one API call — not a parent browse plus filter. No N+1 problem; no browse cache needed.
+7. **`readAttributes` is single-target**: because `GET /data-links/{id}/browse/{path}` accepts both directory and file paths, a file-level `readAttributes` is one API call — not a parent browse plus filter. No N+1 problem; no browse cache needed.
+
+8. **Read-only stance preserved**: `SeqeraFileSystem.isReadOnly()` remains `true`. Write operations on data-links raise `UnsupportedOperationException`. The `/data-links/{id}/upload` endpoints are a future extension point.
+
+9. **Listings stream lazily**: paginated Platform responses are exposed as lazy iterators rather than eagerly-materialized lists. `listDataLinks` is an `Iterator` that fetches offsets on demand. `getContent` returns a `PagedDataLinkContent` that loads the first page eagerly (for `readAttributes`) and paginates further only as the iterator advances. Handler `list()` returns `Iterable`, flowed through `DirectoryStream` without full materialization.
 
-8. 
**Read-only stance preserved**: `SeqeraFileSystem.isReadOnly()` remains `true`. Write operations on data-links raise `UnsupportedOperationException`. The `/data-links/{id}/multipart-upload` endpoint is a future extension point. +10. **Per-path attribute cache, not a global cache**: listings attach `SeqeraFileAttributes` to each emitted `SeqeraPath` via `resolveWithAttributes(name, attrs)`. A follow-up `readAttributes(child)` returns the cached value with zero API calls. Paths parsed from raw URIs (no prior listing) fall back to the live browse endpoint. No global browse-result or URL cache is maintained. -9. **Minimal cache**: only the workspace-level data-link list is cached. No browse-result cache. No URL cache. Simpler to reason about; consistent with the dataset handler's cache shape. +11. **`credentialsId` forwarding**: when a data-link exposes credentials in its `DataLinkDto.credentials` list, the plugin forwards the first credential's `id` as the `credentialsId` query parameter on browse and download-URL calls. When the list is empty, the parameter is omitted and the Platform falls back to its default resolution. ### Refactor Delivered by This Change @@ -187,9 +195,9 @@ The existing dataset test suite continues to pass unchanged; every dataset code ### Limitations -- **No write support for data-links in this iteration.** Upload paths must continue to use Fusion or direct cloud-SDK access until a follow-up adds the multipart-upload handler. +- **No write support for data-links in this iteration.** Upload paths must continue to use Fusion or direct cloud-SDK access until a follow-up adds the `/data-links/{id}/upload` handler. - **Signed URL expiration is not handled transparently.** Very long reads may outlive the URL's validity window. -- **No transparent pagination of data-link entries inside a single directory.** The browse API's pagination (if any) must be exhausted; for very large directories this may be slower than direct cloud listings. 
+
+- **Per-item last-modified is not exposed by the Platform browse API.** `SeqeraFileAttributes.lastModifiedTime()` reports `Instant.EPOCH` for data-link entries until the Platform surfaces this metadata.
 - **Single endpoint per JVM** (unchanged from dataset ADR): concurrent access to multiple Platform endpoints in one JVM is not supported.
 
 ## Links
diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy
index b96a010388..47b1cc7fd3 100644
--- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy
+++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy
@@ -24,6 +24,7 @@ import java.nio.file.NoSuchFileException
 import groovy.json.JsonSlurper
 import groovy.transform.CompileStatic
 import groovy.util.logging.Slf4j
+import io.seqera.tower.model.DataLinkCredentials
 import io.seqera.tower.model.DataLinkDownloadUrlResponse
 import io.seqera.tower.model.DataLinkDto
 import io.seqera.tower.model.DataLinkItem
@@ -32,6 +33,8 @@ import io.seqera.tower.model.DataLinkProvider
 import io.seqera.tower.plugin.TowerClient
 import nextflow.exception.AbortOperationException
 
+import java.nio.file.Path
+
 /**
  * Typed client for Seqera Platform data-link API endpoints.
  *
@@ -61,6 +64,20 @@ class SeqeraDataLinkClient {
         return new DataLinkListIterator(towerClient, endpoint, workspaceId, LIST_PAGE_SIZE)
     }
 
+    /**
+     * Resolve the set of data-link provider names in use in the given workspace.
+     */
+    @Memoized
+    Set<String> getDataLinkProviders(long workspaceId) {
+        final providers = new TreeSet<String>()
+        final Iterator<DataLinkDto> it = listDataLinks(workspaceId)
+        while (it.hasNext()) {
+            final p = it.next().provider?.toString()
+            if (p) providers.add(p)
+        }
+        return providers
+    }
+
     /**
      * Resolve a data-link by {@code (provider, name)} in the given workspace.
     * Iterates the API's list endpoint lazily and short-circuits on first match.
@@ -87,28 +104,32 @@ class SeqeraDataLinkClient { * * Endpoints: {@code GET /data-links/{id}/browse} (root) and * {@code GET /data-links/{id}/browse/{path}} (sub-path). + * + * @param credentialsId optional data-link credentials identifier (from + * {@code DataLinkDto.credentials[0].id}); forwarded as a query param when set. */ - PagedDataLinkContent getContent(String dataLinkId, String subPath, long workspaceId) { + PagedDataLinkContent getContent(String dataLinkId, String subPath, long workspaceId, String credentialsId = null) { final pathSegment = subPath ? '/' + encodePath(subPath) : '' final baseUrl = "${endpoint}/data-links/${encodePath(dataLinkId)}/browse${pathSegment}" - final page = fetchBrowsePage(baseUrl, workspaceId, null) + final page = fetchBrowsePage(baseUrl, workspaceId, credentialsId, null) final firstItems = page.objects final firstToken = page.nextPageToken final originalPath = page.originalPath final fetcher = new PagedDataLinkContent.PageFetcher() { @Override Map fetch(String token) throws IOException { - final next = fetchBrowsePage(baseUrl, workspaceId, token) + final next = fetchBrowsePage(baseUrl, workspaceId, credentialsId, token) return [objects: next.objects, nextPageToken: next.nextPageToken] as Map } } return new PagedDataLinkContent(originalPath, firstItems, firstToken, fetcher) } - /** {@code GET /data-links/{id}/generate-download-url?workspaceId=&filePath=} */ - DataLinkDownloadUrlResponse getDownloadUrl(String dataLinkId, String subPath, long workspaceId) { - final url = "${endpoint}/data-links/${encodePath(dataLinkId)}/generate-download-url?workspaceId=${workspaceId}&filePath=${encodeQuery(subPath ?: '')}" - log.debug "SeqeraDataLinkClient GET $url" + /** {@code GET /data-links/{id}/generate-download-url?workspaceId=&filePath=[&credentialsId=]} */ + DataLinkDownloadUrlResponse getDownloadUrl(String dataLinkId, String subPath, long workspaceId, String credentialsId = null) { + String url = 
"${endpoint}/data-links/${encodePath(dataLinkId)}/generate-download-url?workspaceId=${workspaceId}&filePath=${encodeQuery(subPath ?: '')}" + if (credentialsId) url += "&credentialsId=${encodeQuery(credentialsId)}" + log.debug "Getting downloadURL: GET $url" final resp = towerClient.sendApiRequest(url) checkFsResponse(resp, url) final json = new JsonSlurper().parseText(resp.message) as Map @@ -120,10 +141,11 @@ class SeqeraDataLinkClient { // ---- page-fetching helpers ---- /** Fetch one browse page and normalize it into a {@link BrowsePage}. */ - private BrowsePage fetchBrowsePage(String baseUrl, long workspaceId, String nextPageToken) { + private BrowsePage fetchBrowsePage(String baseUrl, long workspaceId, String credentialsId, String nextPageToken) { String url = "${baseUrl}?workspaceId=${workspaceId}" + if (credentialsId) url += "&credentialsId=${encodeQuery(credentialsId)}" if (nextPageToken) url += "&nextPageToken=${encodeQuery(nextPageToken)}" - log.debug "SeqeraDataLinkClient GET $url" + log.debug "Fetching Browse page GET $url" final resp = towerClient.sendApiRequest(url) checkFsResponse(resp, url) final json = new JsonSlurper().parseText(resp.message) as Map @@ -182,7 +204,7 @@ class SeqeraDataLinkClient { } private void fetchNextPage() { - final url = "${endpoint}/data-links?status=AVAILABLE&workspaceId=${workspaceId}&max=${pageSize}&offset=${offset}${search ? '&search='+ encodeQuery(search) :''}" + final url = "${endpoint}/data-links?workspaceId=${workspaceId}&max=${pageSize}&offset=${offset}${search ? 
'&search='+ encodeQuery(search) :''}" log.debug "Fetching next list of datalinks: GET $url" final resp = towerClient.sendApiRequest(url) checkFsResponse(resp, url) @@ -227,9 +249,19 @@ class SeqeraDataLinkClient { dto.resourceRef = m.resourceRef as String if (m.provider) dto.provider = parseProvider(m.provider as String) dto.region = m.region as String + final credList = m.credentials as List + if (credList) dto.credentials = credList.collect { Map c -> mapCredentials(c) } return dto } + private static DataLinkCredentials mapCredentials(Map m) { + final c = new DataLinkCredentials() + c.id = m.id as String + c.name = m.name as String + if (m.provider) c.provider = parseProvider(m.provider as String) + return c + } + private static DataLinkItem mapItem(Map m) { final it = new DataLinkItem() it.name = m.name as String diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy index 68cb9b556b..63fc0bfb13 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy @@ -130,6 +130,8 @@ class SeqeraFileSystemProvider extends FileSystemProvider { if (!BasicFileAttributes.isAssignableFrom(type)) throw new UnsupportedOperationException("Attribute type not supported: $type") final sp = toSeqeraPath(path) + if (sp.cachedAttributes) + return (A) sp.cachedAttributes final fs = sp.getFileSystem() as SeqeraFileSystem final d = sp.depth() if (d < 3) { diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy index f3c5ea77fd..2da5dcfdde 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy @@ -57,11 +57,19 @@ class SeqeraPath implements 
Path { private final List trail /** non-null only for relative paths produced by {@link #relativize(Path)} */ private final String relPath + /** + * Optional attributes attached when this path was produced by a directory listing, + * so {@code readAttributes()} can return them without a follow-up API call. + * Not part of the URI — does not affect {@link #equals}, {@link #hashCode}, + * {@link #toString}, {@link #toUri}, or propagation via {@link #resolve} / {@link #getParent}. + */ + private final SeqeraFileAttributes cachedAttributes /** Parse a {@code seqera://} URI string. */ SeqeraPath(SeqeraFileSystem fs, String uriString) { this.fs = fs this.relPath = null + this.cachedAttributes = null if (!uriString.startsWith(PROTOCOL)) throw new InvalidPathException(uriString, "Not a seqera:// URI") final withoutScheme = uriString.substring(PROTOCOL.length()) @@ -79,6 +87,11 @@ class SeqeraPath implements Path { /** Programmatic absolute-path constructor. */ SeqeraPath(SeqeraFileSystem fs, String org, String workspace, String resourceType, List trail) { + this(fs, org, workspace, resourceType, trail, null) + } + + /** Programmatic absolute-path constructor with pre-resolved attributes. */ + SeqeraPath(SeqeraFileSystem fs, String org, String workspace, String resourceType, List trail, SeqeraFileAttributes cachedAttributes) { this.fs = fs this.relPath = null this.org = org @@ -87,6 +100,7 @@ class SeqeraPath implements Path { this.trail = trail != null ? 
Collections.unmodifiableList(new ArrayList(trail)) : Collections.emptyList() + this.cachedAttributes = cachedAttributes validatePath(null) } @@ -98,6 +112,7 @@ class SeqeraPath implements Path { this.workspace = null this.resourceType = null this.trail = Collections.emptyList() + this.cachedAttributes = null } private void validatePath(String original) { @@ -149,6 +164,17 @@ class SeqeraPath implements Path { String getWorkspace() { workspace } String getResourceType() { resourceType } List getTrail() { trail } + SeqeraFileAttributes getCachedAttributes() { cachedAttributes } + + /** + * Resolve a child segment and attach the given attributes to the resulting path. + * Used by directory-listing code paths so follow-up {@code readAttributes()} calls + * don't re-fetch information that was already available from the listing response. + */ + SeqeraPath resolveWithAttributes(String segment, SeqeraFileAttributes attrs) { + final child = resolve(segment) as SeqeraPath + return new SeqeraPath(child.fs, child.org, child.workspace, child.resourceType, child.trail, attrs) + } int depth() { if (resourceType) return 3 + trail.size() diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy index 5a5715e86c..f868ba11ee 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy @@ -80,12 +80,7 @@ class DataLinksResourceHandler implements ResourceTypeHandler { if (trail.isEmpty()) { // data-links/ → distinct providers in use (sorted). Iterate the stream, // collect distinct provider names — small output. 
- final providers = new TreeSet() - final Iterator it = client.listDataLinks(workspaceId) - while (it.hasNext()) { - final p = it.next().provider?.toString() - if (p) providers.add(p) - } + final providers = client.getDataLinkProviders(workspaceId) return providers.collect { String p -> dir.resolve(p) as Path } } if (trail.size() == 1) { @@ -105,12 +100,15 @@ class DataLinksResourceHandler implements ResourceTypeHandler { // Content can be very large, so we stream it lazily. final dl = client.getDataLink(workspaceId, trail[0], trail[1]) final subPath = trail.size() > 2 ? trail.subList(2, trail.size()).join('/') : '' - final content = client.getContent(dl.id, subPath, workspaceId) + log.debug("Listing files for $dl.name path $subPath") + final content = client.getContent(dl.id, subPath, workspaceId, credentialsIdOf(dl)) return new PathMappingIterable(content, dir) } @Override SeqeraFileAttributes readAttributes(SeqeraPath p) throws IOException { + // Short-circuit: attributes attached when this path was produced by a listing + if (p.cachedAttributes) return p.cachedAttributes final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace) final trail = p.trail if (trail.size() < 2) { @@ -120,7 +118,8 @@ class DataLinksResourceHandler implements ResourceTypeHandler { final dl = client.getDataLink(workspaceId, trail[0], trail[1]) if (trail.size() == 2) return new SeqeraFileAttributes(true) // data-link root final subPath = trail.subList(2, trail.size()).join('/') - final content = client.getContent(dl.id, subPath, workspaceId) + log.debug("Reading attributes for $p") + final content = client.getContent(dl.id, subPath, workspaceId, credentialsIdOf(dl)) return attributesFor(content, subPath, p) } @@ -131,12 +130,18 @@ class DataLinksResourceHandler implements ResourceTypeHandler { final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace) final dl = client.getDataLink(workspaceId, p.trail[0], p.trail[1]) final subPath = p.trail.subList(2, p.trail.size()).join('/') - 
final urlResp = client.getDownloadUrl(dl.id, subPath, workspaceId) + final urlResp = client.getDownloadUrl(dl.id, subPath, workspaceId, credentialsIdOf(dl)) if (!urlResp.url) throw new NoSuchFileException(p.toString(), null, "Platform returned no download URL") return fetchSignedUrl(urlResp.url) } + /** First credentials entry on the data-link (or null if none). */ + private static String credentialsIdOf(DataLinkDto dl) { + final creds = dl?.credentials + return (creds && !creds.isEmpty()) ? creds[0].id : null + } + @Override void checkAccess(SeqeraPath p, AccessMode... modes) throws IOException { for (AccessMode m : modes) { @@ -187,7 +192,9 @@ class DataLinksResourceHandler implements ResourceTypeHandler { /** * Lazy {@link Iterable} that maps each {@link DataLinkItem} from a * {@link PagedDataLinkContent} to a child {@link SeqeraPath} under - * {@code parent}. Pages are fetched on demand as the iterator advances. + * {@code parent}. Each produced path carries cached attributes built from the + * item, so a follow-up {@code readAttributes()} call does not re-browse the + * Platform. Pages are fetched on demand as the iterator advances. 
     */
     @CompileStatic
     private static class PathMappingIterable implements Iterable<Path> {
@@ -204,8 +211,17 @@ class DataLinksResourceHandler implements ResourceTypeHandler {
             final Iterator<DataLinkItem> inner = content.iterator()
             return new Iterator<Path>() {
                 @Override boolean hasNext() { inner.hasNext() }
-                @Override Path next() { parent.resolve(inner.next().name) as Path }
+                @Override Path next() {
+                    final item = inner.next()
+                    return parent.resolveWithAttributes(item.name, attributesFor(item)) as Path
+                }
             }
         }
+
+        private static SeqeraFileAttributes attributesFor(DataLinkItem item) {
+            if (item.type == DataLinkItemType.FILE)
+                return new SeqeraFileAttributes(item.size ?: 0L, Instant.EPOCH, Instant.EPOCH, item.name)
+            return new SeqeraFileAttributes(true)
+        }
     }
 }
diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy
index a04b1bc154..ef9c50785e 100644
--- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy
+++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy
@@ -63,13 +63,17 @@ class DatasetsResourceHandler implements ResourceTypeHandler {
         final d = dir.depth()
         if (d == 3) {
             final workspaceId = fs.resolveWorkspaceId(dir.org, dir.workspace)
-            return resolveDatasets(workspaceId).collect { DatasetDto ds -> dir.resolve(ds.name) as Path }
+            return resolveDatasets(workspaceId).collect { DatasetDto ds ->
+                dir.resolveWithAttributes(ds.name, attributesFor(ds)) as Path
+            }
         }
         throw new IllegalArgumentException("datasets handler does not list depth $d paths: $dir")
     }
 
     @Override
     SeqeraFileAttributes readAttributes(SeqeraPath p) throws IOException {
+        // Short-circuit: attributes attached when this path was produced by a listing
+        if (p.cachedAttributes) return p.cachedAttributes
         final d = p.depth()
         if (d == 3) {
             fs.resolveWorkspaceId(p.org, p.workspace) // validates
@@ -82,11
+86,15 @@ class DatasetsResourceHandler implements ResourceTypeHandler { final dataset = resolveDataset(workspaceId, names[0]) if (!dataset) throw new NoSuchFileException(p.toString(), null, "Dataset '${names[0]}' not found in workspace ${p.workspace}") + return attributesFor(dataset) + } + + private static SeqeraFileAttributes attributesFor(DatasetDto ds) { return new SeqeraFileAttributes( 0L, - dataset.lastUpdated?.toInstant() ?: Instant.EPOCH, - dataset.dateCreated?.toInstant() ?: Instant.EPOCH, - dataset.id) + ds.lastUpdated?.toInstant() ?: Instant.EPOCH, + ds.dateCreated?.toInstant() ?: Instant.EPOCH, + ds.id) } @Override diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy index 22afcc1031..6183465f7d 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy @@ -610,6 +610,35 @@ class SeqeraPathTest extends Specification { p.trail == ['aws', 'inputs', 'reads', 'a.fq'] } + // ---- cached attributes ---- + + def "cachedAttributes is null by default and preserved by resolveWithAttributes"() { + given: + def fs = mockFs() + def parent = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs') + def attrs = new SeqeraFileAttributes(42L, java.time.Instant.EPOCH, java.time.Instant.EPOCH, 'k') + + when: + def child = parent.resolveWithAttributes('reads', attrs) + + then: + parent.cachedAttributes == null + child.cachedAttributes === attrs + child.toString() == 'seqera://acme/research/data-links/aws/inputs/reads' + } + + def "cachedAttributes does not affect equals/hashCode"() { + given: + def fs = mockFs() + def attrs = new SeqeraFileAttributes(true) + def withAttrs = new SeqeraPath(fs, 'seqera://acme/research/datasets').resolveWithAttributes('samples', attrs) + def withoutAttrs = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') 
+ + expect: + withAttrs == withoutAttrs + withAttrs.hashCode() == withoutAttrs.hashCode() + } + def "iterator on deep data-link path returns all segments"() { given: def fs = mockFs() diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy index 71fa4e10ba..1dc25a2398 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy @@ -22,6 +22,7 @@ import java.nio.file.AccessMode import java.nio.file.NoSuchFileException import java.nio.file.Path +import io.seqera.tower.model.DataLinkCredentials import io.seqera.tower.model.DataLinkDto import io.seqera.tower.model.DataLinkItem import io.seqera.tower.model.DataLinkItemType @@ -40,22 +41,26 @@ class DataLinksResourceHandlerTest extends Specification { private HttpClient http = Mock(HttpClient) private DataLinksResourceHandler handler = new DataLinksResourceHandler(fs, client, http) - private static DataLinkDto dl(String id, String name, DataLinkProvider p) { - def d = new DataLinkDto(); d.id = id; d.name = name; d.provider = p; return d + private static DataLinkDto dl(String id, String name, DataLinkProvider p, String credId = null) { + def d = new DataLinkDto() + d.id = id; d.name = name; d.provider = p + if (credId) { + def c = new DataLinkCredentials(); c.id = credId + d.credentials = [c] + } + return d } private static DataLinkItem item(String name, DataLinkItemType t, long size) { def i = new DataLinkItem(); i.name = name; i.type = t; i.size = size; return i } - /** Single-page {@link PagedDataLinkContent} for tests. 
*/ private static PagedDataLinkContent pagedContent(List items, String originalPath = null) { return new PagedDataLinkContent(originalPath, items, null, new PagedDataLinkContent.PageFetcher() { Map fetch(String t) { throw new IllegalStateException('no more pages') } }) } - /** Iterator over a fixed list — what {@code client.listDataLinks(...)} is expected to return. */ private static Iterator iter(List list) { list.iterator() } private static List asList(Iterable iterable) { @@ -83,12 +88,31 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L - 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) - 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L) >> urlResp + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L, null) >> urlResp 1 * http.send(_, _) >> httpResp stream === signedBody } + def "newInputStream forwards credentialsId from the data-link's credentials"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads/a.fq') + def urlResp = new DataLinkDownloadUrlResponse(); urlResp.url = 'https://signed/a' + def httpResp = Mock(HttpResponse) { + statusCode() >> 200 + body() >> new ByteArrayInputStream('x'.bytes) + } + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS, 'cred-42') + 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L, 'cred-42') >> urlResp + 1 * http.send(_, _) >> httpResp + } + def "newInputStream throws NoSuchFileException when data-link unknown"() { given: def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/unknown/reads/a.fq') @@ -98,7 +122,7 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L - 1 * 
client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) + 1 * client.getDataLink(10L, 'aws', 'unknown') >> { throw new NoSuchFileException("data-link not found") } thrown(NoSuchFileException) } @@ -127,8 +151,8 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId(_, _) >> 10L - 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) - 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L) >> urlResp + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L, null) >> urlResp 1 * http.send(_, _) >> httpResp thrown(IOException) } @@ -146,12 +170,8 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L - 1 * client.listDataLinks(10L) >> iter([ - dl('dl-1', 'a', DataLinkProvider.AWS), - dl('dl-2', 'b', DataLinkProvider.GOOGLE), - dl('dl-3', 'c', DataLinkProvider.AWS) - ]) - paths*.toString().sort() == [ + 1 * client.getDataLinkProviders(10L) >> (['aws', 'google'] as Set) + paths*.toString() == [ 'seqera://acme/research/data-links/aws', 'seqera://acme/research/data-links/google' ] @@ -177,7 +197,7 @@ class DataLinksResourceHandlerTest extends Specification { ] } - def "list at data-link root returns top-level objects"() { + def "list at data-link root returns top-level objects with cached attributes"() { given: def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs') @@ -186,8 +206,8 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId(_, _) >> 10L - 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) - 1 * client.getContent('dl-1', '', 10L) >> pagedContent([ + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getContent('dl-1', '', 10L, null) >> pagedContent([ item('reads', 
DataLinkItemType.FOLDER, 0), item('samplesheet.csv', DataLinkItemType.FILE, 42) ]) @@ -195,6 +215,23 @@ class DataLinksResourceHandlerTest extends Specification { 'seqera://acme/research/data-links/aws/inputs/reads', 'seqera://acme/research/data-links/aws/inputs/samplesheet.csv' ] + // Attributes attached without follow-up API calls + (paths[0] as SeqeraPath).cachedAttributes.directory + (paths[1] as SeqeraPath).cachedAttributes.regularFile + (paths[1] as SeqeraPath).cachedAttributes.size() == 42L + } + + def "list forwards credentialsId to getContent when data-link has credentials"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs') + + when: + asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS, 'cred-42') + 1 * client.getContent('dl-1', '', 10L, 'cred-42') >> pagedContent([]) } def "list at deep sub-path browses the correct sub-path"() { @@ -206,8 +243,8 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId(_, _) >> 10L - 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) - 1 * client.getContent('dl-1', 'reads', 10L) >> pagedContent([ + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getContent('dl-1', 'reads', 10L, null) >> pagedContent([ item('a.fq', DataLinkItemType.FILE, 1), item('b.fq', DataLinkItemType.FILE, 2) ]) @@ -243,7 +280,7 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId(_, _) >> 10L - 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) attr.directory } @@ -256,8 +293,8 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId(_, _) >> 10L - 1 * 
client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) - 1 * client.getContent('dl-1', 'reads/a.fq', 10L) >> pagedContent([ + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getContent('dl-1', 'reads/a.fq', 10L, null) >> pagedContent([ item('a.fq', DataLinkItemType.FILE, 123) ]) attr.regularFile @@ -273,13 +310,29 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId(_, _) >> 10L - 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) - 1 * client.getContent('dl-1', 'reads', 10L) >> pagedContent( + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getContent('dl-1', 'reads', 10L, null) >> pagedContent( [item('a.fq', DataLinkItemType.FILE, 1), item('b.fq', DataLinkItemType.FILE, 2)], 'reads/') attr.directory } + def "readAttributes short-circuits when path has cached attributes (no API call)"() { + given: + def attrs = new io.seqera.tower.plugin.fs.SeqeraFileAttributes(99L, java.time.Instant.EPOCH, java.time.Instant.EPOCH, 'key') + def parent = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads') + def path = parent.resolveWithAttributes('a.fq', attrs) + + when: + def got = handler.readAttributes(path) + + then: + 0 * fs.resolveWorkspaceId(_, _) + 0 * client.getDataLink(_, _, _) + 0 * client.getContent(_, _, _, _) + got === attrs + } + // ===================================================== // error paths — US3 // ===================================================== @@ -307,7 +360,7 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId(_, _) >> 10L - 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) + 1 * client.getDataLink(10L, 'aws', 'ghost') >> { throw new NoSuchFileException("not found") } thrown(NoSuchFileException) } @@ -320,8 +373,8 @@ class 
DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId(_, _) >> 10L - 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'inputs', DataLinkProvider.AWS)]) - 1 * client.getContent('dl-1', 'does/not/exist', 10L) >> pagedContent([]) + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getContent('dl-1', 'does/not/exist', 10L, null) >> pagedContent([]) thrown(NoSuchFileException) } diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy index 702d34f322..752091c2d6 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy @@ -188,6 +188,38 @@ class DatasetsResourceHandlerTest extends Specification { attr.fileKey() == 'd1' } + def "list attaches cached attributes to every child path"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets') + def now = java.time.OffsetDateTime.parse('2026-03-01T12:00:00Z') + def d = ds('d1', 'samples'); d.dateCreated = now; d.lastUpdated = now + + when: + def paths = handler.list(path).toList() + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDatasets(10L) >> [d] + def cached = (paths[0] as SeqeraPath).cachedAttributes + cached != null + cached.regularFile + cached.fileKey() == 'd1' + } + + def "readAttributes short-circuits when the path has cached attributes"() { + given: + def attrs = new io.seqera.tower.plugin.fs.SeqeraFileAttributes(0L, java.time.Instant.EPOCH, java.time.Instant.EPOCH, 'key') + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets').resolveWithAttributes('samples', attrs) + + when: + def got = handler.readAttributes(path) + + then: + 0 * fs.resolveWorkspaceId(_, _) + 0 * 
client.listDatasets(_) + got === attrs + } + def "checkAccess rejects WRITE"() { given: def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') diff --git a/specs/260422-seqera-datalinks-fs/plan.md b/specs/260422-seqera-datalinks-fs/plan.md index eb1f4ea617..3f76fc89d2 100644 --- a/specs/260422-seqera-datalinks-fs/plan.md +++ b/specs/260422-seqera-datalinks-fs/plan.md @@ -6,7 +6,7 @@ ## Summary -Extend the `nf-tower` plugin's `seqera://` NIO filesystem (shipped in [260310-seqera-dataset-fs](../260310-seqera-dataset-fs/spec.md)) with a second resource type, `data-links`. Paths of the form `seqera://<org>/<workspace>/data-links/<provider>/<name>/<path>` resolve to entries inside Platform-managed data-links (S3/GCS/Azure buckets or prefixes). Listings and attribute queries hit the Platform's `/data-links/{id}/content` endpoint; byte reads go through a pre-signed URL returned by `/data-links/{id}/download` and fetched with a plain JDK HTTP client — no cloud SDK dependency. +Extend the `nf-tower` plugin's `seqera://` NIO filesystem (shipped in [260310-seqera-dataset-fs](../260310-seqera-dataset-fs/spec.md)) with a second resource type, `data-links`. Paths of the form `seqera://<org>/<workspace>/data-links/<provider>/<name>/<path>` resolve to entries inside Platform-managed data-links (S3/GCS/Azure buckets or prefixes). Listings and attribute queries hit the Platform's `/data-links/{id}/browse[/path]` endpoints; byte reads go through a pre-signed URL returned by `/data-links/{id}/generate-download-url` and fetched with a plain JDK HTTP client — no cloud SDK dependency. As part of this change, extract a real `ResourceTypeHandler` abstraction from the existing dataset logic. `DatasetsResourceHandler` and `DataLinksResourceHandler` are parallel implementations; the generic classes (`SeqeraFileSystemProvider`, `SeqeraFileSystem`, `SeqeraPath`, `SeqeraFileAttributes`) become resource-type-agnostic for depth ≥ 3.
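The path-shape-to-handler dispatch described in the summary can be sketched as follows. This is a minimal, dependency-free illustration of the parsing rule (org / workspace / resource-type / trail, with empty segments filtered); `SeqeraUriSketch` and its members are illustrative names, not the plugin's actual API:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: parse seqera://<org>/<workspace>/<resource-type>/<trail...>
// into the fields the handlers dispatch on. Not the plugin's real SeqeraPath.
class SeqeraUriSketch {
    final String org;
    final String workspace;
    final String resourceType;   // null for depth <= 2 paths
    final List<String> trail;    // segments after the resource type, possibly empty

    SeqeraUriSketch(String uri) {
        String rest = uri.substring("seqera://".length());
        // empty segments (trailing or doubled slashes) are filtered at parse time
        List<String> parts = Arrays.stream(rest.split("/"))
                .filter(s -> !s.isEmpty())
                .toList();
        org = parts.get(0);
        workspace = parts.get(1);
        resourceType = parts.size() > 2 ? parts.get(2) : null;
        trail = parts.size() > 3 ? parts.subList(3, parts.size()) : List.of();
    }

    // depth = 3 + trail size once a resource type is present
    int depth() {
        return resourceType == null ? 2 : 3 + trail.size();
    }
}
```

A provider/filesystem layer would route any operation with `depth() >= 3` to the handler registered for `resourceType`, keeping the generic classes agnostic of datasets vs data-links.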
@@ -59,33 +59,35 @@ plugins/nf-tower/ └── src/ (VERSION and changelog.txt updated at release time, not in this feature) ├── main/io/seqera/tower/plugin/ │ ├── fs/ - │ │ ├── ResourceTypeHandler.groovy ← NEW (interface) - │ │ ├── SeqeraFileSystemProvider.groovy ← refactored (dispatch by handler) - │ │ ├── SeqeraFileSystem.groovy ← refactored (handler registry) - │ │ ├── SeqeraPath.groovy ← refactored (generic trail segments) - │ │ ├── SeqeraFileAttributes.groovy ← refactored (isDir, size, lastMod) + │ │ ├── ResourceTypeHandler.groovy ← NEW (interface; list returns Iterable) + │ │ ├── SeqeraFileSystemProvider.groovy ← refactored (dispatch by handler; lazy filter iterator) + │ │ ├── SeqeraFileSystem.groovy ← refactored (handler registry; no dataset caches) + │ │ ├── SeqeraPath.groovy ← refactored (trail segments, cachedAttributes, resolveWithAttributes) + │ │ ├── SeqeraFileAttributes.groovy ← refactored (isDir, size, lastMod, created, fileKey) │ │ ├── SeqeraPathFactory.groovy ← unchanged │ │ ├── DatasetInputStream.groovy ← unchanged │ │ └── handler/ - │ │ ├── DatasetsResourceHandler.groovy ← NEW (extracted) + │ │ ├── DatasetsResourceHandler.groovy ← NEW (extracted; owns dataset caches; parses @version) │ │ └── DataLinksResourceHandler.groovy ← NEW │ ├── dataset/ │ │ └── SeqeraDatasetClient.groovy ← unchanged │ └── datalink/ ← NEW package - │ └── SeqeraDataLinkClient.groovy ← NEW + │ ├── SeqeraDataLinkClient.groovy ← NEW (typed client; returns iterators and PagedDataLinkContent) + │ └── PagedDataLinkContent.groovy ← NEW (lazy pagination view over data-link content) └── test/io/seqera/tower/plugin/ ├── fs/ - │ ├── SeqeraPathTest.groovy ← extended (sub-path cases) + │ ├── SeqeraPathTest.groovy ← extended (sub-path cases, cachedAttributes, trailing slash) │ ├── SeqeraFileSystemTest.groovy ← extended (handler registry) │ ├── SeqeraFileSystemProviderTest.groovy ← extended (data-link dispatch specs) + │ ├── ResourceTypeAbstractionTest.groovy ← NEW (architectural guard) │ └── 
handler/ - │ ├── DatasetsResourceHandlerTest.groovy ← NEW - │ └── DataLinksResourceHandlerTest.groovy ← NEW + │ ├── DatasetsResourceHandlerTest.groovy ← NEW (caches, attr short-circuit) + │ └── DataLinksResourceHandlerTest.groovy ← NEW (cache, credentialsId, paged listings) └── datalink/ - └── SeqeraDataLinkClientTest.groovy ← NEW + └── SeqeraDataLinkClientTest.groovy ← NEW (pagination, endpoint URLs, error mapping) ``` -**Structure decision**: Parallel `datalink/` package mirrors the existing `dataset/` package. Handlers live in `fs/handler/` so the generic NIO classes in `fs/` remain resource-type-agnostic. All wire DTOs are reused from `io.seqera.tower.model.*` — no plugin-local DTO classes. +**Structure decision**: Parallel `datalink/` package mirrors the existing `dataset/` package. Handlers live in `fs/handler/` so the generic NIO classes in `fs/` remain resource-type-agnostic. All wire DTOs are reused from `io.seqera.tower.model.*` — no plugin-local DTO classes. `PagedDataLinkContent` is a plugin-local service type (not a DTO) that wraps lazy pagination over `DataLinkItem` streams. --- @@ -97,25 +99,31 @@ All reused from `io.seqera:tower-api:1.121.0` (already on the classpath): | DTO | Fields used here | |---|---| -| `DataLinkDto` | `id: String`, `name: String`, `provider: DataLinkProvider`, `resourceRef: String` | -| `DataLinkProvider` (enum) | `AWS`, `GOOGLE`, `AZURE`, `AZURE_ENTRA`, `AZURE_CLOUD`, `SEQERACOMPUTE`, `S3` — exposes a `String value` via `toString()` | +| `DataLinkDto` | `id: String`, `name: String`, `provider: DataLinkProvider`, `resourceRef: String`, `credentials: List` | +| `DataLinkCredentials` | `id: String`, `name: String`, `provider: DataLinkProvider` | +| `DataLinkProvider` (enum) | `AWS`, `GOOGLE`, `AZURE`, `AZURE_ENTRA`, `AZURE_CLOUD`, `SEQERACOMPUTE`, `S3`. `toString()` returns the **lowercase** enum value (e.g. `"aws"`, `"google"`) — this is what appears in user-visible paths. 
| | `DataLinksListResponse` | `dataLinks: List`, `totalSize: Long` | | `DataLinkContentResponse` | `originalPath: String`, `objects: List`, `nextPageToken: String` | | `DataLinkItem` | `type: DataLinkItemType`, `name: String`, `size: Long`, `mimeType: String` — no last-modified field | | `DataLinkItemType` (enum) | `FOLDER`, `FILE` | | `DataLinkDownloadUrlResponse` | `url: String` | -**Attribute consequence**: `DataLinkItem` does not expose a last-modified timestamp. `SeqeraFileAttributes.lastModifiedTime()` for data-link paths returns `FileTime.from(Instant.EPOCH)`. Spec assumption 2 and FR-005 remain satisfied — we return a valid `FileTime`; the absence of real data is a Platform-API limitation. +**Attribute consequence**: `DataLinkItem` does not expose a last-modified timestamp. `SeqeraFileAttributes.lastModifiedTime()` for data-link paths returns `FileTime.from(Instant.EPOCH)`. Spec assumption and FR-005 remain satisfied — we return a valid `FileTime`; the absence of real data is a Platform-API limitation. ### Platform endpoints (confirmed from OpenAPI) -- `GET /data-links?workspaceId=&max=&offset=` → `DataLinksListResponse`. Pagination: `totalSize` is the full count; keep fetching with offset until sum of received `dataLinks` equals `totalSize`. Default `max` is the server default; plugin uses `max=100` per page. -- `GET /data-links/{id}/content?workspaceId=&path=&nextPageToken=` → `DataLinkContentResponse`. Works for directory and file paths. Pagination: follow `nextPageToken` until null/empty. -- `GET /data-links/{id}/download?workspaceId=&path=` → `DataLinkDownloadUrlResponse` with a cloud-signed URL. +| Operation | Endpoint | Notes | +|---|---|---| +| List data-links in workspace | `GET /data-links?workspaceId=&max=&offset=` | Offset pagination. `totalSize` = full count; `max=100` per page. Optional `&search=` used by `getDataLink` for server-side pre-filter. 
| +| Browse root of a data-link | `GET /data-links/{id}/browse?workspaceId=` | Token pagination via `nextPageToken`. Optional `credentialsId`. | +| Browse a sub-path | `GET /data-links/{id}/browse/{path}?workspaceId=` | Same response and pagination as the root variant. The `{path}` segment preserves `/` as path separators. | +| Pre-signed download URL | `GET /data-links/{id}/generate-download-url?workspaceId=&filePath=` | Returns `DataLinkDownloadUrlResponse.url`. Optional `credentialsId`. | + +`credentialsId` is taken from `DataLinkDto.credentials[0].id` when the list is non-empty, otherwise the query parameter is omitted. ### Signed-URL fetch -The signed URL is **not** a Seqera endpoint; it points at S3/GCS/Azure with auth baked into the query string. It must be fetched **without** the Seqera `Authorization` header (AWS SigV4 will reject unknown `Authorization` headers). Use a standalone `java.net.http.HttpClient` inside `DataLinksResourceHandler` for this fetch. Do **not** use `TowerClient.sendStreamingRequest`, which adds Seqera auth headers. +The signed URL is **not** a Seqera endpoint; it points at S3/GCS/Azure with auth baked into the query string. It must be fetched **without** the Seqera `Authorization` header (AWS SigV4 will reject unknown `Authorization` headers). Use a standalone `java.net.http.HttpClient` inside `DataLinksResourceHandler` for this fetch. Do **not** use `TowerClient.sendStreamingRequest`, which would add Seqera auth headers. 
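A minimal sketch of that signed-URL fetch, assuming nothing beyond the JDK: a plain `java.net.http.HttpClient`, no `Authorization` header, non-200 mapped to `IOException`. The class and method names are illustrative, not the plugin's API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch: fetch a pre-signed cloud URL with a bare JDK client.
// No Seqera Authorization header is added — the auth lives in the URL's
// query string, and SigV4 rejects unexpected Authorization headers.
class SignedUrlFetchSketch {

    static final HttpClient HTTP = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    static InputStream fetchSignedUrl(String signedUrl) throws IOException, InterruptedException {
        HttpRequest req = HttpRequest.newBuilder(URI.create(signedUrl)).GET().build();
        HttpResponse<InputStream> resp = HTTP.send(req, HttpResponse.BodyHandlers.ofInputStream());
        if (resp.statusCode() != 200)
            throw new IOException("Signed URL fetch failed with HTTP " + resp.statusCode());
        return resp.body();  // streamed, not buffered
    }
}
```

The returned stream is what `newInputStream` would hand back to the NIO caller; an expired URL surfaces as the `IOException` above or as a connection error mid-read.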
### SeqeraPath refactor shape @@ -126,14 +134,17 @@ Replace the six typed fields (`org`, `workspace`, `resourceType`, `datasetName`, - `resourceType: String` (or null) - `trail: List` (possibly empty) — the segments after `resourceType` - `relPath: String` (for relative paths; mutually exclusive with absolute segments) +- `cachedAttributes: SeqeraFileAttributes` (nullable) — set only by handlers when this path is produced by a listing, so subsequent `readAttributes` calls skip the API -The `trail` is opaque to `SeqeraPath` — handlers interpret it. Concrete interpretations: +`trail` is opaque to `SeqeraPath` — handlers interpret it. Trail segments are stored verbatim (including any `@version` suffix for datasets); interpretation is the handler's responsibility. Concrete interpretations: -- **Dataset** (`resourceType = "datasets"`): `trail.size() == 0` → resource-type dir; `trail.size() == 1` → dataset file, with optional `@version` suffix on the single element; `trail.size() > 1` → invalid. +- **Dataset** (`resourceType = "datasets"`): `trail.size() == 0` → resource-type dir; `trail.size() == 1` → dataset file. The trail segment may carry an `@version` suffix (e.g. `samples@2`); `DatasetsResourceHandler.parseNameAndVersion` splits it internally. - **Data-link** (`resourceType = "data-links"`): `trail.size() == 0` → resource-type dir; `trail.size() == 1` → provider dir; `trail.size() == 2` → data-link root dir; `trail.size() ≥ 3` → entry inside the data-link (directory or file, per `readAttributes`). `depth()` becomes `3 + trail.size()` when `resourceType` is set, else the count of non-null identity fields. +`SeqeraPath` tolerates trailing slashes and accidental double-slashes in the URI (empty trail segments are filtered at parse time). `cachedAttributes` is ignored by `equals`/`hashCode`/`toString`/`toUri`/`resolve`/`getParent`; a new method `resolveWithAttributes(String segment, SeqeraFileAttributes attrs)` produces a child path carrying the given attrs. 
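The `cachedAttributes` contract above — attrs ride along on listing-produced children but never participate in path identity — can be sketched generically. `PathSketch` and `Attrs` are stand-ins for `SeqeraPath` / `SeqeraFileAttributes`, not the plugin's classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a path whose identity is its segment list only.
// cachedAttributes is an optional hint excluded from equals/hashCode/toString.
class PathSketch {
    record Attrs(boolean directory, long size) {}

    final List<String> segments;     // identity
    final Attrs cachedAttributes;    // nullable hint, NOT part of identity

    PathSketch(List<String> segments, Attrs cachedAttributes) {
        this.segments = List.copyOf(segments);
        this.cachedAttributes = cachedAttributes;
    }

    // child path carrying the attrs a listing just observed
    PathSketch resolveWithAttributes(String child, Attrs attrs) {
        List<String> next = new ArrayList<>(segments);
        next.add(child);
        return new PathSketch(next, attrs);
    }

    @Override public boolean equals(Object o) {
        return o instanceof PathSketch p && segments.equals(p.segments);
    }
    @Override public int hashCode() { return segments.hashCode(); }
    @Override public String toString() { return String.join("/", segments); }
}
```

A `readAttributes` implementation would first check `cachedAttributes` and return it without an API call; paths parsed directly from URIs carry `null` and fall back to the live endpoint.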
+ ### Existing tests to preserve Running `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.*'` and `... --tests 'io.seqera.tower.plugin.dataset.*'` must continue to pass throughout the refactor. The dataset behavior does not change; only the class that implements it does. @@ -149,8 +160,12 @@ interface ResourceTypeHandler { /** the depth-3 segment this handler owns, e.g. "datasets" or "data-links" */ String getResourceType() - /** list entries at the given directory path owned by this handler; caller verified depth >= 3 and isDirectory */ - List list(SeqeraPath dir) throws IOException + /** + * List entries at the given directory path. Caller has verified depth >= 3. + * Returning Iterable lets implementations stream large listings without + * materializing them in memory. + */ + Iterable list(SeqeraPath dir) throws IOException /** return BasicFileAttributes for any path at depth >= 3 owned by this handler */ SeqeraFileAttributes readAttributes(SeqeraPath path) throws IOException @@ -163,25 +178,64 @@ interface ResourceTypeHandler { } ``` +Handlers build each child path via `parent.resolveWithAttributes(segmentName, attrs)` so subsequent `readAttributes` calls short-circuit when the same path is used. + ### `SeqeraDataLinkClient` contract ```groovy class SeqeraDataLinkClient { SeqeraDataLinkClient(TowerClient towerClient) - /** exhaust pagination; return all data-links in the workspace */ - List listDataLinks(long workspaceId) + /** + * Lazy iterator over every data-link in the workspace. Pages fetched on demand + * via GET /data-links?workspaceId=&max=100&offset=. + */ + Iterator listDataLinks(long workspaceId) + + /** + * Server-side-filtered resolution of a single data-link by (provider, name). + * Iterates /data-links with &search=, short-circuits on first match; + * result is @Memoized per (workspaceId, provider, name). + * Throws NoSuchFileException if not found. 
+ */ + DataLinkDto getDataLink(long workspaceId, String provider, String name) + + /** Distinct provider identifiers present in the workspace (sorted). */ + Set getDataLinkProviders(long workspaceId) + + /** + * Lazy paginated view over /data-links/{id}/browse[/{path}]. + * The returned PagedDataLinkContent loads the first page eagerly and paginates + * subsequent pages as its iterator advances. + */ + PagedDataLinkContent getContent(String dataLinkId, String subPath, long workspaceId, String credentialsId = null) + + /** GET /data-links/{id}/generate-download-url?filePath=[&credentialsId=] */ + DataLinkDownloadUrlResponse getDownloadUrl(String dataLinkId, String subPath, long workspaceId, String credentialsId = null) +} +``` + +All endpoints translate 401/403/404/5xx through the same `checkFsResponse` pattern used in `SeqeraDatasetClient`. The `credentialsId` parameter is forwarded as a query-string value when non-null; the handler sources it from `DataLinkDto.credentials[0].id`. - /** GET /data-links/{id}/content?path= — exhausts nextPageToken pagination */ - DataLinkContentResponse getContent(String dataLinkId, String subPath, long workspaceId) +### `PagedDataLinkContent` contract - /** GET /data-links/{id}/download?path= */ - DataLinkDownloadUrlResponse getDownloadUrl(String dataLinkId, String subPath, long workspaceId) +```groovy +class PagedDataLinkContent implements Iterable { + /** Page fetcher: fetch(null) -> first page; fetch(token) -> next page. 
*/ + static interface PageFetcher { + Map fetch(String nextPageToken) throws IOException + // returns: {objects: List, nextPageToken: String, originalPath: String (first page only)} + } + + PagedDataLinkContent(String originalPath, List firstPage, String firstPageNextToken, PageFetcher pageFetcher) + + String getOriginalPath() + List getFirstPage() // eager, already loaded at construction + boolean isEmpty() + Iterator iterator() // yields first-page items, then paginates lazily } ``` -All three translate 401/403/404/5xx through the same `checkFsResponse` pattern used in `SeqeraDatasetClient`. - ### `SeqeraFileSystem` handler registry ```groovy @@ -225,22 +279,22 @@ Internal fields become `(directory, size, lastModified, created, fileKey)`. `Dat | Path shape | Method | Implementation | |---|---|---| -| `data-links/` (trail=[]) | `list` | enumerate distinct `DataLinkDto.provider` (via `toString()` / enum value) in cached list; return paths `data-links/` | -| `data-links/` (trail=[p]) | `list` | filter cached list where provider matches; return paths `data-links//` | -| `data-links//` (trail=[p,n]) | `list` | resolve to `dataLinkId`; call `getContent(id, "", wsId)`; map `objects` to child paths | -| `data-links////…` (trail ≥ 3) | `list` | call `getContent(id, "/…", wsId)`; map `objects` | -| any depth ≥ 3 | `readAttributes` | data-link-root path → directory; below that → `getContent(id, sub, ws)`; if response `objects` has one item matching the last segment with `type = FILE`, return file attrs (size from item); otherwise → directory | -| leaf file | `newInputStream` | `getDownloadUrl(id, sub, ws)`; open plain `HttpClient.send(..., BodyHandlers.ofInputStream())` against `response.url`; return body stream | +| `data-links/` (trail=[]) | `list` | `client.getDataLinkProviders(ws)` → distinct providers (sorted); emit child paths `data-links/` | +| `data-links/` (trail=[p]) | `list` | stream `client.listDataLinks(ws)`; collect names where provider matches; emit child 
paths; `NoSuchFileException` if none match |
+| `data-links/<provider>/<name>` (trail=[p,n]) | `list` | `client.getDataLink(ws, p, n)` → `dl`; `client.getContent(dl.id, "", ws, credentialsIdOf(dl))` → wrap items as `Iterable<Path>` carrying cached `SeqeraFileAttributes` |
+| `data-links/<provider>/<name>/<path>/…` (trail ≥ 3) | `list` | same as above with `subPath = trail[2..].join('/')` |
+| any depth ≥ 3 | `readAttributes` | short-circuit if `p.cachedAttributes` is set; else: data-link-root → directory; deeper → `getContent(id, sub, ws, credentialsIdOf(dl)).firstPage`; if a single item matches the last segment with `type = FILE`, return file attrs (size from item); otherwise → directory |
+| leaf file | `newInputStream` | `client.getDataLink(ws, p, n)` → `dl`; `client.getDownloadUrl(dl.id, sub, ws, credentialsIdOf(dl))`; open a plain JDK `HttpClient.send(..., BodyHandlers.ofInputStream())` against `response.url`; return body stream |
-Provider segment canonicalization: the path segment is the `DataLinkProvider` enum's `toString()`. A path with an unknown provider segment maps to `NoSuchFileException`.
+`credentialsIdOf(dl)` returns `dl.credentials[0].id` when non-empty, else `null` (query parameter omitted).
-### Data-link identity resolution
+Provider segment canonicalization: the path segment is the `DataLinkProvider` enum's `toString()` — lowercase (e.g. `aws`, `google`, `azure`). A path with an unknown provider segment fails via `client.getDataLink(...)` → `NoSuchFileException`.
+
+Listings populate cached attributes on each emitted `SeqeraPath` (via `parent.resolveWithAttributes(name, attrs)`) so a follow-up `readAttributes(child)` returns immediately with zero API calls. Attributes come directly from each `DataLinkItem`: file → `(size, Instant.EPOCH, Instant.EPOCH, item.name)`; folder → `SeqeraFileAttributes(true)`.
-`DataLinksResourceHandler.resolveDataLinkId(provider, name, workspaceId)`:
+### Data-link identity resolution
-1. Ensure the workspace's data-link list is loaded (cached `Map<Long, List<DataLinkDto>>` inside the handler).
-2. Find `DataLinkDto` where `provider.toString() == providerSegment && name == nameSegment`.
-3. Return `id`; throw `NoSuchFileException` with a clear message if not found.
+`client.getDataLink(workspaceId, provider, name)` iterates `/data-links?search=<name>` (server-side pre-filter) and returns the first entry whose `provider.toString() == providerSegment`. Memoized via `@Memoized` per `(workspaceId, provider, name)` — repeated handler calls within a run hit the memoization cache. The handler does NOT maintain its own `Map<Long, List<DataLinkDto>>` cache — the client-level streaming iterator plus memoized lookup replaces it.
 ---
diff --git a/specs/260422-seqera-datalinks-fs/spec.md b/specs/260422-seqera-datalinks-fs/spec.md
index fcb31759b4..129fa82661 100644
--- a/specs/260422-seqera-datalinks-fs/spec.md
+++ b/specs/260422-seqera-datalinks-fs/spec.md
@@ -16,9 +16,13 @@
 - Q: Path hierarchy — does the data-link identity segment include the provider? → A: Yes. `data-links/<provider>/<name>/...`. Names are not globally unique within a workspace (the same name may exist on two different providers), so the provider segment is required to disambiguate and mirrors the Platform UI's provider-grouped data explorer.
 - Q: How deep can a data-link path go? → A: Arbitrary depth below the data-link root. Each segment after `<name>` is an entry inside the underlying bucket/prefix — a directory or file, resolved via the Platform browse API.
 - Q: How should the existing dataset filesystem code be extended to accommodate data-links? → A: Introduce a true resource-type abstraction (`ResourceTypeHandler`). The current dataset-specific logic in `SeqeraFileSystemProvider`, `SeqeraFileSystem`, and `SeqeraPath` is extracted into a `DatasetsResourceHandler`; data-links are added as a parallel `DataLinksResourceHandler`. The core path/filesystem/provider classes become resource-type-agnostic.
-- Q: How should the listing vs I/O boundary work? → A: Listing (`newDirectoryStream`) and attributes (`readAttributes`) are resolved via `GET /data-links/{id}/content?path=`. Downloads (`newInputStream`) go through `GET /data-links/{id}/download?path=`, which returns a pre-signed URL that is then fetched with the existing `TowerClient.sendStreamingRequest()` streaming helper. No cloud SDK is used.
-- Q: Which DTOs are introduced by this feature? → A: None. All types are reused from the `io.seqera:tower-api:1.121.0` dependency (`DataLinkDto`, `DataLinkItem`, `DataLinkProvider`, `DataLinkContentResponse`, `DataLinkDownloadUrlResponse`, etc.).
-- Q: Is browse-per-file supported by the Platform API? → A: Yes. `GET /data-links/{id}/content?path=` works for both directories and files, so `readAttributes` on any path is a single targeted call — no parent-browse-and-filter, no N+1 problem.
+- Q: How should the listing vs I/O boundary work? → A: Listing (`newDirectoryStream`) and attributes (`readAttributes`) are resolved via the Platform's browse endpoints (`GET /data-links/{id}/browse` for the data-link root and `GET /data-links/{id}/browse/{path}` for sub-paths). Downloads (`newInputStream`) go through `GET /data-links/{id}/generate-download-url?filePath=` to obtain a pre-signed URL, which is then fetched with a plain JDK `HttpClient` (no Seqera auth header on the cloud-backed URL). No cloud SDK is used.
+- Q: Which DTOs are introduced by this feature? → A: None. All types are reused from the `io.seqera:tower-api:1.121.0` dependency (`DataLinkDto`, `DataLinkItem`, `DataLinkProvider`, `DataLinkCredentials`, `DataLinkContentResponse`, `DataLinkDownloadUrlResponse`, etc.). A plugin-local `PagedDataLinkContent` holder class wraps the eager-first-page + lazy-pagination behavior but holds only tower-api types.
+- Q: Is browse-per-file supported by the Platform API? → A: Yes. `GET /data-links/{id}/browse/{path}` works for both directories and files, so `readAttributes` on any path is a single targeted call — no parent-browse-and-filter, no N+1 problem.
+- Q: How are paginated Platform responses returned to callers? → A: Streaming. The workspace data-link list (`GET /data-links`) returns an `Iterator<DataLinkDto>` that fetches offsets on demand. The browse endpoint returns a `PagedDataLinkContent` that loads the first page eagerly (so `readAttributes` can inspect it without iterating) and fetches subsequent pages lazily as the iterator advances. The handler layer exposes `Iterable<Path>` to the NIO `DirectoryStream`; no full materialization of listings in memory.
+- Q: How are attributes discovered after a listing? → A: When `newDirectoryStream` yields a child path, the handler attaches the per-item attributes (size for files, directory marker for folders) to the `SeqeraPath` via an optional cache field. A subsequent `readAttributes` on that path returns the cached value without any additional Platform API call. Paths parsed from URIs (no prior listing) fall back to the live browse endpoint.
+- Q: How are cloud credentials for the underlying bucket/prefix selected? → A: The Platform's `DataLinkDto.credentials` list associates one or more credential records with a data-link. The plugin forwards the first credential's ID as the `credentialsId` query parameter on browse and download-URL requests, when present. If the data-link has no associated credentials, the parameter is omitted and the Platform uses its default resolution.
+- Q: Which provider-segment value appears in user-visible paths? → A: The lowercase value of the `DataLinkProvider` enum, as exposed by its `toString()` (e.g. `aws`, `google`, `azure`). This matches the Platform UI.
 - Q: What happens if the pre-signed URL expires during a long read? → A: The underlying HTTP connection errors out with an `IOException`.
The plugin does not transparently re-issue URLs; Nextflow's task retry handles the failure as it already does for other transient I/O errors.
 ## User Scenarios & Testing *(mandatory)*
@@ -111,7 +115,7 @@ A Nextflow or Seqera engineer wants the filesystem's resource-type abstraction t
 - **FR-001**: System MUST accept paths in the format `seqera://<org>/<workspace>/data-links/<provider>/<name>/<path>` where `<path>` is zero or more segments addressing a directory or file inside the data-link.
 - **FR-002**: System MUST read file content addressed by a data-link path transparently, requiring only the existing `tower.accessToken` / `TOWER_ACCESS_TOKEN` configuration — no cloud-provider credentials.
-- **FR-003**: System MUST perform listing and attribute queries via the Seqera Platform API (`GET /data-links/{id}/content?path=`) and stream file content via pre-signed URLs returned from `GET /data-links/{id}/download?path=`.
+- **FR-003**: System MUST perform listing and attribute queries via the Seqera Platform browse endpoints (`GET /data-links/{id}/browse` for the data-link root and `GET /data-links/{id}/browse/{path}` for sub-paths), and stream file content via pre-signed URLs returned from `GET /data-links/{id}/generate-download-url?filePath=`.
 - **FR-004**: System MUST support hierarchical directory listing:
   - `seqera://<org>/<workspace>/` → directory; entries include `datasets` and `data-links` (enumerated from the handler registry).
   - `seqera://<org>/<workspace>/data-links/` → directory; entries are distinct provider identifiers present in the workspace.
@@ -119,24 +123,25 @@ A Nextflow or Seqera engineer wants the filesystem's resource-type abstraction t
   - `seqera://<org>/<workspace>/data-links/<provider>/<name>/` → directory; entries are the top-level items in the data-link.
   - `seqera://<org>/<workspace>/data-links/<provider>/<name>/<path>/` → directory; entries are the children at that sub-path.
   - `seqera://<org>/<workspace>/data-links/<provider>/<name>/<path>` → file.
-- **FR-005**: System MUST return correct `BasicFileAttributes` — `isDirectory`, `isRegularFile`, `size`, `lastModifiedTime`, `creationTime` — for any path inside a data-link, sourced from the Platform's content response for that path.
+- **FR-005**: System MUST return correct `BasicFileAttributes` — `isDirectory`, `isRegularFile`, `size`, `lastModifiedTime`, `creationTime` — for any path inside a data-link. When a path was produced by a prior `newDirectoryStream` listing, its attributes MUST be returned from the listing response without a follow-up API call. Paths parsed from a URI (no prior listing) MUST source attributes from the Platform's browse endpoint for that specific path.
 - **FR-006**: System MUST treat data-link paths as read-only in this iteration. Any write-like operation (`newByteChannel` with `WRITE`/`APPEND`, `copy` with a data-link as target, `delete`, `createDirectory`, `move`) MUST fail with `UnsupportedOperationException` or `AccessDeniedException`, consistent with the dataset feature's read-only stance.
 - **FR-007**: System MUST produce clear, actionable error messages distinguishing: unknown org/workspace, unknown provider, unknown data-link name, missing sub-path, unsupported resource type, authentication failure, and transient Platform errors.
-- **FR-008**: System MUST NOT depend on `nf-amazon`, `nf-google`, or `nf-azure`. All cloud I/O is reduced to a single HTTPS fetch of a pre-signed URL via the existing `TowerClient` / `HxClient` streaming path.
-- **FR-009**: System MUST reuse DTOs from `io.seqera:tower-api:1.121.0` (`DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkDownloadUrlResponse`, `DataLinkProvider`, etc.) without introducing parallel plugin-local classes.
+- **FR-008**: System MUST NOT depend on `nf-amazon`, `nf-google`, or `nf-azure`. All cloud I/O is reduced to a single HTTPS fetch of a pre-signed URL.
The signed URL is fetched with a plain JDK `HttpClient` — NOT through `TowerClient`, since the URL is addressed to the cloud backend and must not carry the Seqera `Authorization` header. +- **FR-009**: System MUST reuse DTOs from `io.seqera:tower-api:1.121.0` (`DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkDownloadUrlResponse`, `DataLinkCredentials`, `DataLinkProvider`, etc.) without introducing parallel plugin-local DTOs. A plugin-local `PagedDataLinkContent` service type is permitted as a lazy-pagination wrapper around tower-api types. - **FR-010**: System MUST refactor the existing `fs/` package to introduce a `ResourceTypeHandler` interface. `DatasetsResourceHandler` MUST encapsulate all dataset-specific behavior previously inlined in `SeqeraFileSystemProvider` / `SeqeraFileSystem` / `SeqeraPath`. `DataLinksResourceHandler` MUST implement the same interface. - **FR-011**: After the refactor, the classes `SeqeraPath`, `SeqeraFileSystem`, and `SeqeraFileSystemProvider` MUST contain no dataset- or data-link-specific logic for depth ≥ 3; all such logic MUST live in the respective handler. - **FR-012**: `SeqeraPath` MUST parse and represent arbitrary sub-paths below depth 4 for resource types that support them (data-links). Datasets continue to reject sub-paths beyond depth 4. - **FR-013**: The filesystem MUST reuse the existing `TowerClient` retry/backoff for all Platform API calls. No new retry logic is introduced. - **FR-014**: Transient failure of a pre-signed URL fetch mid-stream MUST surface as `IOException`; Nextflow task retry handles the recovery. The plugin MUST NOT re-issue URLs transparently within a single `InputStream`. -- **FR-015**: Data-link list results (per workspace) MUST be cached for the lifetime of a single `SeqeraFileSystem` instance. No browse-result or URL cache is maintained. -- **FR-016**: `GET /data-links?workspaceId=X` pagination MUST be exhausted before the workspace data-link list cache is considered populated. 
+- **FR-015**: System MUST NOT maintain a global or per-run cache of browse-result pages or pre-signed URLs. A cheap per-path attribute cache lives on each `SeqeraPath` instance returned by a listing (file size / directory flag captured from the listing item); this cache is scoped to the lifetime of that path object and is not shared across paths.
+- **FR-016**: Paginated Platform responses MUST be exposed to callers as lazy iterators — callers consume pages only as elements are requested. The workspace data-link list (`GET /data-links?workspaceId=X&max=&offset=`) MUST be returned as an `Iterator<DataLinkDto>`; the data-link content endpoint (`GET /data-links/{id}/browse[/path]`) MUST be returned as a `PagedDataLinkContent` view backed by a lazy iterator of `DataLinkItem`.
+- **FR-017**: System MUST forward the data-link's associated credentials identifier to the Platform when one is available. When `DataLinkDto.credentials` is non-empty, the first entry's `id` MUST be passed as the `credentialsId` query parameter on browse (`GET /data-links/{id}/browse[/path]`) and download-URL (`GET /data-links/{id}/generate-download-url`) requests. When the list is empty, the parameter MUST be omitted so the Platform applies its default resolution.
 ### Key Entities
 - **Data-Link**: A Seqera Platform entity referencing a bucket or prefix on a cloud provider (S3, GCS, Azure Blob, etc.). Addressed by `(workspaceId, provider, name)`; content is browsed and read through Platform API calls. Represented in the path as `data-links/<provider>/<name>`.
 - **Data-Link Provider**: A Platform-defined identifier for the cloud backend (`DataLinkProvider` enum values, e.g. `aws`, `google`, `azure`). Used as a path segment to disambiguate data-links with the same name on different providers.
-- **Data-Link Entry**: An item inside a data-link — a file or folder — returned by the content API. Has a name, type (`FILE`/`FOLDER`), size, and last-modified timestamp.
+- **Data-Link Entry**: An item inside a data-link — a file or folder — returned by the browse API. Has a name, type (`FILE`/`FOLDER`), size, and MIME type. The Platform's browse response does not currently expose a per-item last-modified timestamp, so that attribute is reported as epoch.
 - **Resource-Type Handler**: A pluggable strategy that owns the semantics of one depth-3 path segment (`datasets`, `data-links`, …). Exposes listing, attribute, read, and access-check operations to the generic filesystem.
 - **Seqera Path (data-link variant)**: The URI `seqera://<org>/<workspace>/data-links/<provider>/<name>[/<path>/…]`. All segments up to and including `<name>` form the data-link identity; the remainder is the sub-path within the data-link.
@@ -154,20 +159,21 @@ A Nextflow or Seqera engineer wants the filesystem's resource-type abstraction t
 ## Assumptions
 - Authentication reuses the existing nf-tower plugin credential mechanism (Seqera access token); no new auth configuration is required from users.
-- The `GET /data-links/{id}/content?path=` endpoint works for both directory and file paths. When `path` points at a file, the response describes just that file; when it points at a directory, the response's `items` array enumerates children.
-- The `GET /data-links/{id}/download?path=` endpoint returns a pre-signed URL valid for long enough to complete a typical file read. The plugin does not extend this window.
-- Data-link provider identifiers returned by the Platform (`DataLinkProvider`) are safe as path segments — they contain no `/` and no characters that collide with URI reserved characters.
-- The tower-api artifact (`io.seqera:tower-api:1.121.0`) already available on the plugin classpath exposes all DTOs required (`DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkDownloadUrlResponse`, `DataLinkProvider`, etc.).
+- The `GET /data-links/{id}/browse` endpoint (root) and `GET /data-links/{id}/browse/{path}` endpoint (sub-path) work for both directory and file paths.
When the path points at a file, the response's `objects` array contains the single file entry; when it points at a directory, the array enumerates children. Both endpoints page via `nextPageToken`. +- The `GET /data-links/{id}/generate-download-url?filePath=` endpoint returns a pre-signed URL valid for long enough to complete a typical file read. The plugin does not extend this window. +- The signed URL points at the underlying cloud object (S3 / GCS / Azure). Fetching it does NOT go through `TowerClient`; it uses a plain JDK `HttpClient` so the Seqera `Authorization` header is not sent to the cloud backend (which would be rejected by AWS SigV4 and similar schemes). +- Data-link provider identifiers returned by the Platform (`DataLinkProvider`) are safe as path segments and are emitted in lowercase by `toString()` (e.g. `aws`, `google`, `azure`). User-visible paths use this lowercase form. +- The tower-api artifact (`io.seqera:tower-api:1.121.0`) already available on the plugin classpath exposes all DTOs required (`DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkDownloadUrlResponse`, `DataLinkCredentials`, `DataLinkProvider`, etc.). - Data-link writes, renames, deletes, and management operations (create, update, delete the data-link entity itself) are **out of scope** for this iteration. -- Streaming a pre-signed URL reuses `TowerClient.sendStreamingRequest()` and therefore inherits the same retry/backoff behavior as dataset downloads. -- Data-link listings may be paginated; the plugin exhausts pagination before caching. +- Browse and download-URL Platform API calls reuse `TowerClient.sendApiRequest`, inheriting the existing retry/backoff policy. The cloud-side signed-URL fetch is a one-shot JDK HTTP GET with no additional retry layer beyond Nextflow task retry. +- Data-link listings may be paginated; the plugin exposes them as lazy iterators and only fetches the pages the caller consumes. 
A caller that reads just the first page of a browse response pays exactly one HTTP call. - No local caching across pipeline runs. Nextflow's standard task staging handles intra-run caching. - Paths are case-sensitive — matches the Platform API and the dataset filesystem. - The dataset feature's read-only filesystem stance (`isReadOnly()=true`) is preserved; data-link writes are deferred to a future iteration. ## Dependencies -- Seqera platform API (data-links endpoints: `/data-links`, `/data-links/{id}/content`, `/data-links/{id}/download`) must be accessible from the compute environment where the pipeline runs. +- Seqera platform API (data-links endpoints: `/data-links`, `/data-links/{id}`, `/data-links/{id}/browse`, `/data-links/{id}/browse/{path}`, `/data-links/{id}/generate-download-url`) must be accessible from the compute environment where the pipeline runs. - nf-tower plugin must be enabled and configured with a valid `tower.accessToken` / `TOWER_ACCESS_TOKEN`. - The Seqera account must have at least read access to the target workspace and data-link. - The existing dataset filesystem (`260310-seqera-dataset-fs`) must be merged — this feature builds on its classes and refactors them. diff --git a/specs/260422-seqera-datalinks-fs/tasks.md b/specs/260422-seqera-datalinks-fs/tasks.md index 8f32a9d939..3acf79fbec 100644 --- a/specs/260422-seqera-datalinks-fs/tasks.md +++ b/specs/260422-seqera-datalinks-fs/tasks.md @@ -2,6 +2,15 @@ **Branch**: `260422-seqera-datalinks-fs` | **Spec**: [spec.md](spec.md) | **Plan**: [plan.md](plan.md) +> **Status**: This task checklist was the initial implementation recipe. The shipped code diverges in several specifics discovered during integration — the canonical design now lives in [spec.md](spec.md) and [plan.md](plan.md). 
Notable refinements vs the tasks below:
+>
+> - Endpoints: browse uses `/data-links/{id}/browse` and `/data-links/{id}/browse/{path}` (not `/content`); signed URLs come from `/data-links/{id}/generate-download-url?filePath=…` (not `/download`).
+> - Pagination is exposed as lazy iterators: `listDataLinks` → `Iterator<DataLinkDto>`; `getContent` → `PagedDataLinkContent` (eager first page, lazy successors).
+> - `ResourceTypeHandler.list` returns `Iterable<Path>` (not `List<Path>`) so directory streams never materialize.
+> - `SeqeraPath` gained `cachedAttributes` and `resolveWithAttributes(name, attrs)` so `readAttributes` on a child of a listing short-circuits without an API call. Dataset `@version` parsing moved into `DatasetsResourceHandler`.
+> - Data-link `credentialsId` is forwarded on browse and download-URL calls from `DataLinkDto.credentials[0].id`.
+> - `SeqeraDataLinkClient` adds `getDataLink(ws, provider, name)` (memoized, server-side `search=` filter) and `getDataLinkProviders(ws)`.
+
 > **For agentic workers**: execute tasks in order. Each task is self-contained and ends with a commit step. Do not skip TDD steps — write the test first, watch it fail, then make it pass. All commits use `git commit -s`. Tests use Spock with `Mock(TowerClient)` + `groovy.json.JsonOutput` fixtures — matching the style of `SeqeraDatasetClientTest` and `SeqeraFileSystemProviderTest`. No WireMock, no real HTTP.
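The lazy, offset-based pagination that the notes above describe (one HTTP call per page consumed; stop on an empty page, or once a known server-reported total is reached) can be sketched in isolation. This is an illustrative Java sketch, not the plugin's code: `LazyPagedIterator`, its `fetchPage` function, and the `knownTotal` parameter are assumed stand-ins for the Groovy `DataLinkListIterator`.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Illustrative stand-in for the plugin's DataLinkListIterator (names are assumptions).
// fetchPage maps an offset to the next page of items; pages are fetched only as the
// caller consumes elements, so an abandoned iterator pays for no further pages.
class LazyPagedIterator<T> implements Iterator<T> {
    private final IntFunction<List<T>> fetchPage;
    private final long knownTotal;               // -1 = server did not report a total
    private Iterator<T> current = Collections.emptyIterator();
    private int offset = 0;
    private boolean exhausted = false;

    LazyPagedIterator(IntFunction<List<T>> fetchPage, long knownTotal) {
        this.fetchPage = fetchPage;
        this.knownTotal = knownTotal;
    }

    @Override
    public boolean hasNext() {
        while (!current.hasNext() && !exhausted) {
            List<T> page = fetchPage.apply(offset);  // one HTTP call per page consumed
            offset += page.size();
            // exhausted on an empty page, or once a known total has been reached
            if (page.isEmpty() || (knownTotal >= 0 && offset >= knownTotal))
                exhausted = true;
            current = page.iterator();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }
}
```

When a total is reported, the iterator stops as soon as the page that reaches it is drained; when the total is absent, it pays one extra call for a terminating empty page, which is the trade-off the `totalSize`-handling fix in the review patch below addresses.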
From 6a4c7f661fc92675d20793738b77cbcde08e12d3 Mon Sep 17 00:00:00 2001 From: jorgee Date: Fri, 24 Apr 2026 14:44:37 +0200 Subject: [PATCH 3/6] review fixes Signed-off-by: jorgee --- .../datalink/PagedDataLinkContent.groovy | 2 +- .../datalink/SeqeraDataLinkClient.groovy | 23 ++-- .../plugin/fs/SeqeraFileSystemProvider.groovy | 7 + .../handler/DataLinksResourceHandler.groovy | 16 ++- .../datalink/SeqeraDataLinkClientTest.groovy | 129 ++++++++++++++++++ .../fs/SeqeraFileSystemProviderTest.groovy | 32 +++++ .../DataLinksResourceHandlerTest.groovy | 27 ++++ specs/260422-seqera-datalinks-fs/spec.md | 1 + 8 files changed, 225 insertions(+), 12 deletions(-) diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy index 32c10559cd..e5c5359740 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy @@ -89,7 +89,7 @@ class PagedDataLinkContent implements Iterable { current = items.iterator() nextToken = page?.nextPageToken as String } catch (IOException e) { - throw new RuntimeException(e) + throw new UncheckedIOException(e) } } return true diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy index 47b1cc7fd3..5f9041b726 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy @@ -33,8 +33,6 @@ import io.seqera.tower.model.DataLinkProvider import io.seqera.tower.plugin.TowerClient import nextflow.exception.AbortOperationException -import java.nio.file.Path - /** * Typed client for Seqera Platform data-link API endpoints. 
* @@ -65,7 +63,8 @@ class SeqeraDataLinkClient { } /** - * Resolve a data-link providers in the given workspace. + * Distinct provider identifiers present in the workspace, sorted. + * The returned set is unmodifiable; memoized per workspace. */ @Memoized Set getDataLinkProviders(long workspaceId) { @@ -75,12 +74,17 @@ class SeqeraDataLinkClient { final p = it.next().provider?.toString() if (p) providers.add(p) } - return providers + return Collections.unmodifiableSet(providers) } /** * Resolve a data-link by {@code (provider, name)} in the given workspace. - * Iterates the API's list endpoint lazily and short-circuits on first match. + * Iterates the API's list endpoint lazily (server-side filtered by {@code name}) + * and short-circuits on first match. + * + * Memoized per {@code (workspaceId, provider, name)} tuple. Note: Groovy's + * {@code @Memoized} caches successful returns only — a path that repeatedly + * references a non-existent data-link re-runs the search each time. */ @Memoized DataLinkDto getDataLink(long workspaceId, String provider, String name) { @@ -177,7 +181,7 @@ class SeqeraDataLinkClient { private Iterator current = Collections.emptyIterator() private int offset = 0 - private long total = -1L // unknown until first fetch + private long total = -1L // -1 = unknown; set only when the server reports totalSize private boolean exhausted = false DataLinkListIterator(TowerClient towerClient, String endpoint, long workspaceId, int pageSize, String search = null) { @@ -212,8 +216,11 @@ class SeqeraDataLinkClient { final items = (json.dataLinks as List)?.collect { Map m -> mapDataLink(m) } ?: Collections.emptyList() current = items.iterator() offset += items.size() - if (total < 0) total = (json.totalSize as Long) ?: 0L - if (items.isEmpty() || offset >= total) exhausted = true + // Record the server-reported total only if present (null/missing → leave as -1 and + // rely on an empty-page response to signal exhaustion) + if (total < 0 && json.totalSize 
!= null) total = json.totalSize as Long + // Exhausted when: this page is empty, OR we've reached the known total + if (items.isEmpty() || (total >= 0 && offset >= total)) exhausted = true } } diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy index 63fc0bfb13..19d965633d 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy @@ -197,8 +197,15 @@ class SeqeraFileSystemProvider extends FileSystemProvider { final source = entries return new DirectoryStream() { + private boolean iteratorCalled = false @Override Iterator iterator() { + // NIO contract: DirectoryStream.iterator() may be called at most once. + // For data-link listings a second iteration would also re-fetch pages 2+ + // (needlessly doubling API calls), so enforcing the contract is a win. 
+ if (iteratorCalled) + throw new IllegalStateException("DirectoryStream.iterator() may be called at most once") + iteratorCalled = true final inner = source.iterator() if (!filter) return inner return new FilteredIterator(inner, filter) diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy index f868ba11ee..8917b0740a 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy @@ -60,7 +60,10 @@ class DataLinksResourceHandler implements ResourceTypeHandler { private final HttpClient httpClient DataLinksResourceHandler(SeqeraFileSystem fs, SeqeraDataLinkClient client) { - this(fs, client, HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(10)).build()) + this(fs, client, HttpClient.newBuilder() + .connectTimeout(Duration.ofSeconds(10)) + .followRedirects(HttpClient.Redirect.NORMAL) + .build()) } /** Test-only constructor to inject a mock {@link HttpClient}. 
*/ @@ -111,8 +114,15 @@ class DataLinksResourceHandler implements ResourceTypeHandler { if (p.cachedAttributes) return p.cachedAttributes final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace) final trail = p.trail - if (trail.size() < 2) { - // data-links/ or data-links/ — always directory + if (trail.isEmpty()) { + // data-links/ — always a directory + return new SeqeraFileAttributes(true) + } + if (trail.size() == 1) { + // data-links/ — validate the provider has at least one data-link + final providers = client.getDataLinkProviders(workspaceId) + if (!providers.contains(trail[0])) + throw new NoSuchFileException(p.toString(), null, "No data-links for provider '${trail[0]}' in workspace '${p.workspace}'") return new SeqeraFileAttributes(true) } final dl = client.getDataLink(workspaceId, trail[0], trail[1]) diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy index e57759b150..3b5f8ef798 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy @@ -113,6 +113,135 @@ class SeqeraDataLinkClientTest extends Specification { !client.listDataLinks(10L).hasNext() } + // ---- getDataLink ---- + + def "getDataLink uses server-side search filter and returns first matching provider"() { + given: + def body = JsonOutput.toJson([dataLinks: [ + [id: 'dl-1', name: 'inputs', provider: 'google'], + [id: 'dl-2', name: 'inputs', provider: 'aws'] + ], totalSize: 2]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0&search=inputs") >> ok(body) + def client = new SeqeraDataLinkClient(tc) + + when: + def dl = client.getDataLink(10L, 'aws', 'inputs') + + then: + dl.id == 'dl-2' + dl.provider.toString() == 'aws' + } + + def "getDataLink throws NoSuchFileException 
when no matching (provider, name) is found"() { + given: + def body = JsonOutput.toJson([dataLinks: [ + [id: 'dl-1', name: 'inputs', provider: 'google'] + ], totalSize: 1]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0&search=inputs") >> ok(body) + def client = new SeqeraDataLinkClient(tc) + + when: + client.getDataLink(10L, 'aws', 'inputs') + + then: + thrown(NoSuchFileException) + } + + def "getDataLink memoizes successful lookups"() { + given: + def body = JsonOutput.toJson([dataLinks: [ + [id: 'dl-1', name: 'inputs', provider: 'aws'] + ], totalSize: 1]) + def tc = tower() + def client = new SeqeraDataLinkClient(tc) + + when: + def a = client.getDataLink(10L, 'aws', 'inputs') + def b = client.getDataLink(10L, 'aws', 'inputs') + + then: + 1 * tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0&search=inputs") >> ok(body) + a.is(b) + } + + // ---- getDataLinkProviders ---- + + def "getDataLinkProviders returns distinct sorted providers across all pages"() { + given: + def p1 = JsonOutput.toJson([dataLinks: [ + [id: 'dl-1', name: 'a', provider: 'aws'], + [id: 'dl-2', name: 'b', provider: 'google'] + ], totalSize: 3]) + def p2 = JsonOutput.toJson([dataLinks: [ + [id: 'dl-3', name: 'c', provider: 'aws'] + ], totalSize: 3]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(p1) + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=2") >> ok(p2) + def client = new SeqeraDataLinkClient(tc) + + when: + def providers = client.getDataLinkProviders(10L) + + then: + providers as List == ['aws', 'google'] + } + + def "getDataLinkProviders memoizes the result"() { + given: + def body = JsonOutput.toJson([dataLinks: [ + [id: 'dl-1', name: 'a', provider: 'aws'] + ], totalSize: 1]) + def tc = tower() + def client = new SeqeraDataLinkClient(tc) + + when: + def a = client.getDataLinkProviders(10L) + def b = client.getDataLinkProviders(10L) + + then: + 1 * 
tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(body) + a.is(b) + } + + def "getDataLinkProviders returns an unmodifiable Set"() { + given: + def body = JsonOutput.toJson([dataLinks: [ + [id: 'dl-1', name: 'a', provider: 'aws'] + ], totalSize: 1]) + def tc = tower() + tc.sendApiRequest(_) >> ok(body) + def client = new SeqeraDataLinkClient(tc) + + when: + client.getDataLinkProviders(10L).add('gcs') + + then: + thrown(UnsupportedOperationException) + } + + // ---- listDataLinks pagination robustness ---- + + def "listDataLinks keeps paginating when totalSize is absent until an empty page"() { + given: + def p1 = JsonOutput.toJson([dataLinks: [[id: 'dl-1', name: 'a', provider: 'aws']]]) // no totalSize + def p2 = JsonOutput.toJson([dataLinks: [[id: 'dl-2', name: 'b', provider: 'aws']]]) // no totalSize + def p3 = JsonOutput.toJson([dataLinks: []]) // empty page → exhausted + def tc = tower() + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(p1) + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=1") >> ok(p2) + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=2") >> ok(p3) + def client = new SeqeraDataLinkClient(tc) + + when: + def list = drain(client.listDataLinks(10L)) + + then: + list*.id == ['dl-1', 'dl-2'] + } + // ---- getContent ---- def "getContent on a sub-path uses /browse/{path}"() { diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy index a7d3cacd33..83dd83f5be 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy @@ -452,4 +452,36 @@ class SeqeraFileSystemProviderTest extends Specification { def ex = thrown(NoSuchFileException) ex.reason?.contains('Unsupported resource type') } + + def 
"readAttributes short-circuits when the SeqeraPath carries cachedAttributes (no API call)"() { + given: 'a provider with a fresh filesystem and a path carrying pre-resolved attrs' + def tc = spyTower() + def fs = buildFs(tc) + def attrs = new SeqeraFileAttributes(999L, java.time.Instant.EPOCH, java.time.Instant.EPOCH, 'cached-key') + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples').resolveWithAttributes('nested', attrs) + + when: + def got = fs.provider().readAttributes(path, java.nio.file.attribute.BasicFileAttributes) + + then: 'no workspace-cache load and no dataset/browse API calls were issued' + 0 * tc.sendApiRequest(_) + got === attrs + } + + def "newDirectoryStream.iterator() throws IllegalStateException on a second call"() { + given: + def tc = spyTower() + tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) + tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) + def fs = buildFs(tc) + def wsPath = new SeqeraPath(fs, 'seqera://acme/research') + def stream = fs.provider().newDirectoryStream(wsPath, null) + + when: + stream.iterator() + stream.iterator() + + then: + thrown(IllegalStateException) + } } diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy index 1dc25a2398..dab9b880f1 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy @@ -271,6 +271,33 @@ class DataLinksResourceHandlerTest extends Specification { !attr.regularFile } + def "readAttributes at data-links// reports directory when provider exists"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L 
+ 1 * client.getDataLinkProviders(10L) >> (['aws', 'google'] as Set) + attr.directory + } + + def "readAttributes at data-links// throws when the provider has no data-links"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/azure') + + when: + handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLinkProviders(10L) >> (['aws'] as Set) + def ex = thrown(NoSuchFileException) + ex.reason?.contains("No data-links for provider 'azure'") + } + def "readAttributes at data-link root reports directory"() { given: def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs') diff --git a/specs/260422-seqera-datalinks-fs/spec.md b/specs/260422-seqera-datalinks-fs/spec.md index 129fa82661..43b782d69a 100644 --- a/specs/260422-seqera-datalinks-fs/spec.md +++ b/specs/260422-seqera-datalinks-fs/spec.md @@ -20,6 +20,7 @@ - Q: Which DTOs are introduced by this feature? → A: None. All types are reused from the `io.seqera:tower-api:1.121.0` dependency (`DataLinkDto`, `DataLinkItem`, `DataLinkProvider`, `DataLinkCredentials`, `DataLinkContentResponse`, `DataLinkDownloadUrlResponse`, etc.). A plugin-local `PagedDataLinkContent` holder class wraps the eager-first-page + lazy-pagination behavior but holds only tower-api types. - Q: Is browse-per-file supported by the Platform API? → A: Yes. `GET /data-links/{id}/browse/{path}` works for both directories and files, so `readAttributes` on any path is a single targeted call — no parent-browse-and-filter, no N+1 problem. - Q: How are paginated Platform responses returned to callers? → A: Streaming. The workspace data-link list (`GET /data-links`) returns an `Iterator` that fetches offsets on demand. The browse endpoint returns a `PagedDataLinkContent` that loads the first page eagerly (so `readAttributes` can inspect it without iterating) and fetches subsequent pages lazily as the iterator advances. 
The handler layer exposes `Iterable` to the NIO `DirectoryStream`; no full materialization of listings in memory. +- Q: What convenience methods does the client expose on top of the raw list endpoint? → A: Two memoized helpers — `getDataLink(ws, provider, name)` uses the server-side `&search=` filter and returns the first match (throws `NoSuchFileException` on miss); `getDataLinkProviders(ws)` returns the sorted set of distinct providers present in the workspace. Both are memoized per-arguments within a single `SeqeraDataLinkClient` instance. - Q: How are attributes discovered after a listing? → A: When `newDirectoryStream` yields a child path, the handler attaches the per-item attributes (size for files, directory marker for folders) to the `SeqeraPath` via an optional cache field. A subsequent `readAttributes` on that path returns the cached value without any additional Platform API call. Paths parsed from URIs (no prior listing) fall back to the live browse endpoint. - Q: How are cloud credentials for the underlying bucket/prefix selected? → A: The Platform's `DataLinkDto.credentials` list associates one or more credential records with a data-link. The plugin forwards the first credential's ID as the `credentialsId` query parameter on browse and download-URL requests, when present. If the data-link has no associated credentials, the parameter is omitted and the Platform uses its default resolution. - Q: Which provider-segment value appears in user-visible paths? → A: The lowercase value of the `DataLinkProvider` enum, as exposed by its `toString()` (e.g. `aws`, `google`, `azure`). This matches the Platform UI. 
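The eager-first-page / lazy-pagination contract described in the spec answers above can be sketched as follows. This is an illustrative Java sketch of the pattern only — the names `PagedView`, `Page`, and `start` are hypothetical and do not match the plugin's actual Groovy `PagedIterable` API; the cursor state lives inside the supplier, mirroring how the real fetchers keep their offset/token in instance fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical sketch (not the plugin's API): a paged view whose first page is
// fetched eagerly at construction, with later pages pulled on demand.
class PagedView<T> implements Iterable<T> {

    /** One page of results plus a flag marking the final page. */
    static class Page<T> {
        final List<T> items;
        final boolean last;
        Page(List<T> items, boolean last) {
            this.items = items == null ? Collections.<T>emptyList() : items;
            this.last = last;
        }
    }

    private final Page<T> first;
    private final Supplier<Page<T>> fetchNext; // cursor state lives inside the supplier

    private PagedView(Page<T> first, Supplier<Page<T>> fetchNext) {
        this.first = first;
        this.fetchNext = fetchNext;
    }

    /** Fetch the first page eagerly: failures surface here, not at the first hasNext(). */
    static <T> PagedView<T> start(Supplier<Page<T>> fetcher) {
        return new PagedView<>(fetcher.get(), fetcher);
    }

    /** First page, inspectable without triggering further fetches. */
    List<T> firstPage() { return first.items; }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Iterator<T> current = first.items.iterator();
            private boolean exhausted = first.last;

            @Override public boolean hasNext() {
                while (!current.hasNext()) {
                    if (exhausted) return false;
                    Page<T> p = fetchNext.get(); // next page fetched only on demand
                    current = p.items.iterator();
                    exhausted = p.last;
                }
                return true;
            }

            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }
}

public class PagingDemo {
    public static void main(String[] args) {
        // Fake backend serving three pages of two items each.
        List<List<Integer>> pages = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3, 4), Arrays.asList(5, 6));
        int[] cursor = {0};
        PagedView<Integer> view = PagedView.start(() -> {
            List<Integer> items = pages.get(cursor[0]);
            cursor[0]++;
            return new PagedView.Page<>(items, cursor[0] >= pages.size());
        });

        System.out.println(view.firstPage()); // available with no extra fetches
        List<Integer> all = new ArrayList<>();
        for (int v : view) all.add(v);        // remaining pages fetched lazily here
        System.out.println(all);
    }
}
```

The key design point, matching the spec's rationale: because the first fetch happens in `start`, an `IOException`-style failure surfaces at the call site (e.g. inside `readAttributes`), while the `Iterator` contract — which cannot declare checked exceptions — only ever wraps failures from subsequent pages.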
From 412d885ba3eb49aefb45f7c84e08bec5d3889ed7 Mon Sep 17 00:00:00 2001 From: jorgee Date: Wed, 29 Apr 2026 13:40:44 +0200 Subject: [PATCH 4/6] fix read attributes, add caching and refactor iterators Signed-off-by: jorgee --- .../datalink/PagedDataLinkContent.groovy | 104 ----------- .../plugin/datalink/PagedIterable.groovy | 113 ++++++++++++ .../datalink/SeqeraDataLinkClient.groovy | 166 ++++++++---------- .../plugin/dataset/SeqeraDatasetClient.groovy | 47 ----- .../plugin/fs/ResourceTypeHandler.groovy | 5 - .../tower/plugin/fs/SeqeraFileSystem.groovy | 82 +++++++-- .../plugin/fs/SeqeraFileSystemProvider.groovy | 14 +- .../seqera/tower/plugin/fs/SeqeraPath.groovy | 19 +- .../handler/DataLinksResourceHandler.groovy | 76 ++++---- .../fs/handler/DatasetsResourceHandler.groovy | 9 - .../datalink/SeqeraDataLinkClientTest.groovy | 38 ++-- .../dataset/SeqeraDatasetClientTest.groovy | 21 --- .../fs/SeqeraFileSystemProviderTest.groovy | 8 +- .../plugin/fs/SeqeraFileSystemTest.groovy | 22 ++- .../tower/plugin/fs/SeqeraPathTest.groovy | 3 +- .../DataLinksResourceHandlerTest.groovy | 125 ++++++++++--- .../DatasetsResourceHandlerTest.groovy | 13 +- 17 files changed, 466 insertions(+), 399 deletions(-) delete mode 100644 plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy create mode 100644 plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedIterable.groovy diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy deleted file mode 100644 index e5c5359740..0000000000 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy +++ /dev/null @@ -1,104 +0,0 @@ -/* - * Copyright 2013-2026, Seqera Labs - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. 
- * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package io.seqera.tower.plugin.datalink - -import groovy.transform.CompileStatic -import io.seqera.tower.model.DataLinkItem - -/** - * Lazy, paginated view over a data-link's content. - * - * The first page is fetched eagerly by the producer so callers can inspect - * {@link #getOriginalPath()} and {@link #getFirstPage()} without triggering - * additional HTTP calls. Iterating yields items from the first page followed - * by subsequent pages fetched on demand via the injected page fetcher. - */ -@CompileStatic -class PagedDataLinkContent implements Iterable { - - /** - * Opaque page fetcher. Given a {@code nextPageToken}, returns the next page - * as a map with keys {@code objects} ({@code List}) and - * {@code nextPageToken} ({@code String}, null if no more pages). - */ - static interface PageFetcher { - Map fetch(String nextPageToken) throws IOException - } - - private final String originalPath - private final List firstPage - private final String firstPageNextToken - private final PageFetcher pageFetcher - - PagedDataLinkContent(String originalPath, - List firstPage, - String firstPageNextToken, - PageFetcher pageFetcher) { - this.originalPath = originalPath - this.firstPage = firstPage ?: Collections.emptyList() - this.firstPageNextToken = firstPageNextToken - this.pageFetcher = pageFetcher - } - - String getOriginalPath() { originalPath } - - /** First page, loaded eagerly — bounded in size by the server's page size. 
*/ - List getFirstPage() { Collections.unmodifiableList(firstPage) } - - boolean isEmpty() { firstPage.isEmpty() && !firstPageNextToken } - - @Override - Iterator iterator() { - return new PagedIterator(firstPage, firstPageNextToken, pageFetcher) - } - - /** Lazy iterator that paginates on demand. */ - @CompileStatic - private static class PagedIterator implements Iterator { - private Iterator current - private String nextToken - private final PageFetcher fetcher - - PagedIterator(List firstPage, String firstPageNextToken, PageFetcher fetcher) { - this.current = firstPage.iterator() - this.nextToken = firstPageNextToken - this.fetcher = fetcher - } - - @Override - boolean hasNext() { - while (!current.hasNext()) { - if (!nextToken) return false - try { - final page = fetcher.fetch(nextToken) - final items = (page?.objects ?: []) as List - current = items.iterator() - nextToken = page?.nextPageToken as String - } catch (IOException e) { - throw new UncheckedIOException(e) - } - } - return true - } - - @Override - DataLinkItem next() { - if (!hasNext()) throw new NoSuchElementException() - return current.next() - } - } -} diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedIterable.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedIterable.groovy new file mode 100644 index 0000000000..1cd5bca15d --- /dev/null +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedIterable.groovy @@ -0,0 +1,113 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package io.seqera.tower.plugin.datalink + +import groovy.transform.CompileStatic + +/** + * Generic lazy paginated view over a sequence of {@code T}. + * + * The first page is fetched eagerly (via {@link #start}) so any {@link IOException} from + * the underlying source surfaces at the construction call site, not later at the first + * {@link Iterator#hasNext()} call. Subsequent pages are fetched on demand as the + * iterator advances. + * + * Pagination cursor state is captured by the {@link NextPage} implementation (closure + * variables / fields) — this base class only knows "fetch returns more" vs "exhausted". + */ +@CompileStatic +class PagedIterable implements Iterable { + + /** + * Stateful "give me the next page" callback. Implementations track their own + * cursor (offset, token, etc.) across invocations. + */ + static interface NextPageFetcher { + /** @return next page, never {@code null} (use empty items + {@code isLast=true} for end) */ + Page fetch() throws IOException + } + + @CompileStatic + static class Page { + final List items + final boolean isLast + Page(List items, boolean isLast) { + this.items = items ?: Collections.emptyList() + this.isLast = isLast + } + } + + protected final List firstPage + protected final boolean firstPageIsLast + protected final NextPageFetcher fetcher + + PagedIterable(List firstPage, boolean firstPageIsLast, NextPageFetcher fetcher) { + this.firstPage = firstPage ?: Collections.emptyList() + this.firstPageIsLast = firstPageIsLast + this.fetcher = fetcher + } + + /** Eagerly fetch the first page; later pages on demand. Throws IOException at the call site on failure. 
*/ + static PagedIterable start(NextPageFetcher fetcher) throws IOException { + final p = fetcher.fetch() + if (p == null) return new PagedIterable(Collections.emptyList(), true, fetcher) + return new PagedIterable(p.items, p.isLast, fetcher) + } + + /** First page (eagerly loaded). */ + List getFirstPage() { Collections.unmodifiableList(firstPage) } + + boolean isEmpty() { firstPage.isEmpty() && firstPageIsLast } + + @Override + Iterator iterator() { + return new PagedIterator() + } + + /** + * Lazy iterator that yields first-page items, then advances pages on demand. + * Fetch failures are wrapped in {@link UncheckedIOException} (the {@link Iterator} + * contract does not declare {@link IOException}). + */ + @CompileStatic + private class PagedIterator implements Iterator { + private Iterator current = firstPage.iterator() + private boolean exhausted = firstPageIsLast + + @Override + boolean hasNext() { + while (!current.hasNext()) { + if (exhausted) return false + try { + final p = fetcher.fetch() + final items = p?.items ?: Collections.emptyList() + current = items.iterator() + if (p == null || p.isLast) exhausted = true + } catch (IOException e) { + throw new UncheckedIOException(e) + } + } + return true + } + + @Override + T next() { + if (!hasNext()) throw new NoSuchElementException() + return current.next() + } + } +} diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy index 5f9041b726..bcb73f1cca 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy @@ -55,11 +55,12 @@ class SeqeraDataLinkClient { /** * Lazy iterator over every data-link in the workspace. - * Pages are fetched from {@code GET /data-links?workspaceId=&max=&offset=} - * on demand as the iterator advances. 
+ * The first page is fetched eagerly from {@code GET /data-links?workspaceId=&max=&offset=} + * (so any IOException surfaces here, not on the first {@code hasNext()}); subsequent + * pages are fetched on demand as the iterator advances. */ - Iterator listDataLinks(long workspaceId) { - return new DataLinkListIterator(towerClient, endpoint, workspaceId, LIST_PAGE_SIZE) + Iterator listDataLinks(long workspaceId) throws IOException { + return PagedIterable.start(new DataLinkListFetcher(towerClient, endpoint, workspaceId, LIST_PAGE_SIZE, null)).iterator() } /** @@ -79,54 +80,41 @@ class SeqeraDataLinkClient { /** * Resolve a data-link by {@code (provider, name)} in the given workspace. - * Iterates the API's list endpoint lazily (server-side filtered by {@code name}) - * and short-circuits on first match. + * Filters server-side using the Platform's keyword-search syntax + * ({@code search= provider:}) so the response contains at most + * the matching data-link; this method returns the first result or {@code null}. * - * Memoized per {@code (workspaceId, provider, name)} tuple. Note: Groovy's - * {@code @Memoized} caches successful returns only — a path that repeatedly - * references a non-existent data-link re-runs the search each time. + * Memoized per {@code (workspaceId, provider, name)} tuple, including {@code null} + * misses — a path that repeatedly references a non-existent data-link does not + * re-issue the search. Caveat: a data-link created on the Platform after a miss is + * cached will not be visible until a new {@link SeqeraDataLinkClient} is constructed + * (i.e. for the lifetime of a pipeline run, misses are sticky). 
*/ @Memoized - DataLinkDto getDataLink(long workspaceId, String provider, String name) { - final Iterator it = new DataLinkListIterator(towerClient, endpoint, workspaceId, LIST_PAGE_SIZE, name) - while( it.hasNext() ) { - final dl = it.next() - if( dl.provider?.toString() == provider ) - return dl - } - throw new NoSuchFileException( - "seqera://.../data-links/${provider}/${name}", - null, - "Data-link '${name}' not found for provider '${provider}' in workspace '$workspaceId'") + DataLinkDto getDataLink(long workspaceId, String provider, String name) throws IOException { + final search = "${name} provider:${provider}".toString() + final it = PagedIterable.start(new DataLinkListFetcher(towerClient, endpoint, workspaceId, LIST_PAGE_SIZE, search)).iterator() + return it.hasNext() ? it.next() : null } /** * Browse the content of a data-link. - * The first page is fetched eagerly to populate metadata ({@code originalPath}, - * first-page items). Subsequent pages are fetched on demand as the returned - * {@link PagedDataLinkContent} is iterated. + * The first page is fetched eagerly (so any IOException surfaces here, not at the + * first iterator call); subsequent pages are fetched on demand as the returned + * {@link PagedIterable} is iterated. * * Endpoints: {@code GET /data-links/{id}/browse} (root) and * {@code GET /data-links/{id}/browse/{path}} (sub-path). * * @param credentialsId optional data-link credentials identifier (from * {@code DataLinkDto.credentials[0].id}); forwarded as a query param when set. 
+ * @param search optional server-side prefix filter on entry names */ - PagedDataLinkContent getContent(String dataLinkId, String subPath, long workspaceId, String credentialsId = null) { + PagedIterable getContent(String dataLinkId, String subPath, long workspaceId, String credentialsId = null, String search = null) throws IOException { + log.debug("Getting content for data-link: $dataLinkId, path: $subPath, workspace: $workspaceId, credentialsId: $credentialsId") final pathSegment = subPath ? '/' + encodePath(subPath) : '' final baseUrl = "${endpoint}/data-links/${encodePath(dataLinkId)}/browse${pathSegment}" - final page = fetchBrowsePage(baseUrl, workspaceId, credentialsId, null) - final firstItems = page.objects - final firstToken = page.nextPageToken - final originalPath = page.originalPath - final fetcher = new PagedDataLinkContent.PageFetcher() { - @Override - Map fetch(String token) throws IOException { - final next = fetchBrowsePage(baseUrl, workspaceId, credentialsId, token) - return [objects: next.objects, nextPageToken: next.nextPageToken] as Map - } - } - return new PagedDataLinkContent(originalPath, firstItems, firstToken, fetcher) + return PagedIterable.start(new DataLinkContentFetcher(towerClient, baseUrl, workspaceId, credentialsId, search)) } /** {@code GET /data-links/{id}/generate-download-url?workspaceId=&filePath=[&credentialsId=]} */ @@ -142,49 +130,25 @@ class SeqeraDataLinkClient { return out } - // ---- page-fetching helpers ---- - - /** Fetch one browse page and normalize it into a {@link BrowsePage}. 
*/ - private BrowsePage fetchBrowsePage(String baseUrl, long workspaceId, String credentialsId, String nextPageToken) { - String url = "${baseUrl}?workspaceId=${workspaceId}" - if (credentialsId) url += "&credentialsId=${encodeQuery(credentialsId)}" - if (nextPageToken) url += "&nextPageToken=${encodeQuery(nextPageToken)}" - log.debug "Fetching Browse page GET $url" - final resp = towerClient.sendApiRequest(url) - checkFsResponse(resp, url) - final json = new JsonSlurper().parseText(resp.message) as Map - final items = (json.objects as List)?.collect { Map m -> mapItem(m) } ?: Collections.emptyList() - return new BrowsePage(json.originalPath as String, items, json.nextPageToken as String) - } - - @CompileStatic - private static class BrowsePage { - final String originalPath - final List objects - final String nextPageToken - - BrowsePage(String originalPath, List objects, String nextPageToken) { - this.originalPath = originalPath - this.objects = objects - this.nextPageToken = nextPageToken - } - } + // ---- page fetchers ---- - /** Lazy iterator for the {@code /data-links} list endpoint (offset pagination). */ + /** + * {@link io.seqera.tower.plugin.datalink.PagedIterable.NextPageFetcher} for the + * {@code /data-links} list endpoint (offset pagination). + * Cursor state (offset + server-reported total) lives in instance fields. 
+ */ @CompileStatic - private static class DataLinkListIterator implements Iterator { + private static class DataLinkListFetcher implements PagedIterable.NextPageFetcher { private final TowerClient towerClient private final String endpoint private final long workspaceId private final int pageSize private final String search - private Iterator current = Collections.emptyIterator() private int offset = 0 - private long total = -1L // -1 = unknown; set only when the server reports totalSize - private boolean exhausted = false + private long total = -1L // unknown until the server reports totalSize - DataLinkListIterator(TowerClient towerClient, String endpoint, long workspaceId, int pageSize, String search = null) { + DataLinkListFetcher(TowerClient towerClient, String endpoint, long workspaceId, int pageSize, String search) { this.towerClient = towerClient this.endpoint = endpoint this.workspaceId = workspaceId @@ -193,34 +157,56 @@ class SeqeraDataLinkClient { } @Override - boolean hasNext() { - while (!current.hasNext()) { - if (exhausted) return false - fetchNextPage() - } - return true - } - - @Override - DataLinkDto next() { - if (!hasNext()) throw new NoSuchElementException() - return current.next() - } - - private void fetchNextPage() { - final url = "${endpoint}/data-links?workspaceId=${workspaceId}&max=${pageSize}&offset=${offset}${search ? '&search='+ encodeQuery(search) :''}" + PagedIterable.Page fetch() throws IOException { + final url = "${endpoint}/data-links?workspaceId=${workspaceId}&max=${pageSize}&offset=${offset}${search ? 
'&search=' + encodeQuery(search) : ''}" log.debug "Fetching next list of datalinks: GET $url" final resp = towerClient.sendApiRequest(url) checkFsResponse(resp, url) final json = new JsonSlurper().parseText(resp.message) as Map final items = (json.dataLinks as List)?.collect { Map m -> mapDataLink(m) } ?: Collections.emptyList() - current = items.iterator() offset += items.size() - // Record the server-reported total only if present (null/missing → leave as -1 and - // rely on an empty-page response to signal exhaustion) if (total < 0 && json.totalSize != null) total = json.totalSize as Long - // Exhausted when: this page is empty, OR we've reached the known total - if (items.isEmpty() || (total >= 0 && offset >= total)) exhausted = true + final isLast = items.isEmpty() || (total >= 0 && offset >= total) + return new PagedIterable.Page(items, isLast) + } + } + + /** + * {@link io.seqera.tower.plugin.datalink.PagedIterable.NextPageFetcher} for a + * data-link's {@code /browse[/path]} endpoint (token pagination). + * The next-page cursor lives in the {@code nextToken} instance field. 
+ */ + @CompileStatic + private static class DataLinkContentFetcher implements PagedIterable.NextPageFetcher { + private final TowerClient towerClient + private final String baseUrl + private final long workspaceId + private final String credentialsId + private final String search + + private String nextToken = null + + DataLinkContentFetcher(TowerClient towerClient, String baseUrl, long workspaceId, String credentialsId, String search) { + this.towerClient = towerClient + this.baseUrl = baseUrl + this.workspaceId = workspaceId + this.credentialsId = credentialsId + this.search = search + } + + @Override + PagedIterable.Page fetch() throws IOException { + String url = "${baseUrl}?workspaceId=${workspaceId}" + if (search) url += "&search=${encodeQuery(search)}" + if (credentialsId) url += "&credentialsId=${encodeQuery(credentialsId)}" + if (nextToken) url += "&nextPageToken=${encodeQuery(nextToken)}" + log.debug "Fetching Browse page GET $url" + final resp = towerClient.sendApiRequest(url) + checkFsResponse(resp, url) + final json = new JsonSlurper().parseText(resp.message) as Map + final items = (json.objects as List)?.collect { Map m -> mapItem(m) } ?: Collections.emptyList() + nextToken = json.nextPageToken as String + return new PagedIterable.Page(items, nextToken == null) } } diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/dataset/SeqeraDatasetClient.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/dataset/SeqeraDatasetClient.groovy index 143b193c23..50e0346d55 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/dataset/SeqeraDatasetClient.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/dataset/SeqeraDatasetClient.groovy @@ -29,7 +29,6 @@ import groovy.transform.CompileStatic import groovy.util.logging.Slf4j import io.seqera.tower.model.DatasetDto import io.seqera.tower.model.DatasetVersionDto -import io.seqera.tower.model.OrgAndWorkspaceDto import io.seqera.tower.plugin.TowerClient import 
nextflow.exception.AbortOperationException @@ -53,42 +52,6 @@ class SeqeraDatasetClient { private String getEndpoint() { towerClient.endpoint } - /** - * @return current user info (id, userName, etc.) from GET /user-info - */ - Long getUserId() { - try { - final info = towerClient.getUserInfo() - if( info?.id == null ) - throw new AbortOperationException("Unable to retrieve user ID from Seqera Platform — check your access token") - return info.id as long - }catch( UnauthorizedException e ){ - throw new AbortOperationException(e.getMessage()) - }catch( ForbiddenException e){ - throw new AccessDeniedException("${endpoint}/user-info", null, e.message) - }catch(NotFoundException e){ - throw new NoSuchFileException("${endpoint}/user-info") - } - } - - /** - * @return all orgs and workspaces accessible to the given user from GET /user/{userId}/workspaces - */ - List listUserWorkspacesAndOrgs(long userId) { - try { - final list = towerClient.listUserWorkspacesAndOrgs(userId as String) - return list.collect { m -> mapOrgAndWorkspace(m) } - } catch( UnauthorizedException e ){ - throw new AbortOperationException(e.getMessage()) - } catch( ForbiddenException e){ - throw new AccessDeniedException("${endpoint}/user/$userId/workspaces", null, e.message) - } catch(NotFoundException e){ - throw new NoSuchFileException("${endpoint}/user/$userId/workspaces") - } - } - - - /** * @return all datasets in the given workspace from GET /datasets?workspaceId={workspaceId} */ @@ -165,16 +128,6 @@ class SeqeraDatasetClient { throw new IOException("Seqera API error: HTTP ${code} for ${url}") } - private static OrgAndWorkspaceDto mapOrgAndWorkspace(Map m) { - final dto = new OrgAndWorkspaceDto() - dto.orgId = (m.orgId as Long) ?: 0L - dto.orgName = m.orgName as String - dto.workspaceId = (m.workspaceId as Long) ?: 0L - dto.workspaceName = m.workspaceName as String - dto.workspaceFullName = m.workspaceFullName as String - return dto - } - private static DatasetDto mapDataset(Map m) { final dto 
= new DatasetDto() dto.id = m.id as String diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy index 55f0f35a64..13902b8fe7 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy @@ -48,9 +48,4 @@ interface ResourceTypeHandler { */ InputStream newInputStream(SeqeraPath path) throws IOException - /** - * Verify the path exists and requested modes are satisfiable. READ is allowed; - * WRITE/EXECUTE throw {@link java.nio.file.AccessDeniedException}. - */ - void checkAccess(SeqeraPath path, AccessMode... modes) throws IOException } diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy index 3ee200f425..6ca252c633 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy @@ -16,6 +16,7 @@ package io.seqera.tower.plugin.fs +import java.nio.file.AccessDeniedException import java.nio.file.FileStore import java.nio.file.FileSystem import java.nio.file.NoSuchFileException @@ -28,22 +29,29 @@ import java.nio.file.spi.FileSystemProvider import groovy.transform.CompileStatic import groovy.util.logging.Slf4j import io.seqera.tower.model.OrgAndWorkspaceDto -import io.seqera.tower.plugin.dataset.SeqeraDatasetClient +import io.seqera.tower.plugin.TowerClient +import io.seqera.tower.plugin.exception.ForbiddenException +import io.seqera.tower.plugin.exception.NotFoundException +import io.seqera.tower.plugin.exception.UnauthorizedException +import nextflow.exception.AbortOperationException /** * FileSystem instance for the {@code seqera://} scheme. * One instance per {@link SeqeraFileSystemProvider}. 
* - * Resource-type-agnostic: the filesystem owns the org/workspace cache (shared across - * resource types) and a registry of {@link ResourceTypeHandler}s. Each handler owns - * its own API client and resource-specific caches. + * Resource-type-agnostic: the filesystem owns the user-id and org/workspace caches + * (shared across resource types) and a registry of {@link ResourceTypeHandler}s. + * Each handler owns its own API client and resource-specific caches. */ @Slf4j @CompileStatic class SeqeraFileSystem extends FileSystem { private final SeqeraFileSystemProvider provider0 - private SeqeraDatasetClient orgWorkspaceClient + private final TowerClient towerClient + + /** Cached current-user id; the user is fixed by the {@code TowerClient}'s access token. */ + private volatile Long cachedUserId /** orgName → orgId */ private final Map orgCache = new LinkedHashMap<>() @@ -55,17 +63,9 @@ class SeqeraFileSystem extends FileSystem { private volatile boolean orgWorkspaceCacheLoaded = false - SeqeraFileSystem(SeqeraFileSystemProvider provider) { + SeqeraFileSystem(SeqeraFileSystemProvider provider, TowerClient towerClient) { this.provider0 = provider - } - - /** - * Attach the dataset client used for user-info / workspaces lookup. The org/workspace - * listing uses dataset endpoints today ({@code GET /user-info}, {@code GET /user/{id}/workspaces}); - * keeping the client on the filesystem avoids duplicating it across handlers. - */ - void setOrgWorkspaceClient(SeqeraDatasetClient client) { - this.orgWorkspaceClient = client + this.towerClient = towerClient } @Override @@ -115,14 +115,58 @@ class SeqeraFileSystem extends FileSystem { throw new UnsupportedOperationException("WatchService not supported by seqera:// filesystem") } - // ---- org/workspace cache (shared across handlers) ---- + // ---- user-id / org / workspace caches (shared across handlers) ---- + + /** + * Resolve the current user's numeric ID via {@code GET /user-info}. 
+ * Cached for the lifetime of this filesystem — the token does not change + * during a pipeline run, so neither does the resolved user. + */ + synchronized long getUserId() throws IOException { + if (cachedUserId != null) return cachedUserId + try { + final info = towerClient.getUserInfo() + if (info?.id == null) + throw new AbortOperationException("Unable to retrieve user ID from Seqera Platform — check your access token") + cachedUserId = info.id as Long + return cachedUserId + } catch (UnauthorizedException e) { + throw new AbortOperationException(e.getMessage()) + } catch (ForbiddenException e) { + throw new AccessDeniedException("${towerClient.endpoint}/user-info", null, e.message) + } catch (NotFoundException e) { + throw new NoSuchFileException("${towerClient.endpoint}/user-info") + } + } + + /** {@code GET /user/{userId}/workspaces} — reachable orgs and workspaces. */ + private List fetchUserWorkspacesAndOrgs(long userId) throws IOException { + try { + final list = towerClient.listUserWorkspacesAndOrgs(userId as String) + return list.collect { Map m -> mapOrgAndWorkspace(m) } + } catch (UnauthorizedException e) { + throw new AbortOperationException(e.getMessage()) + } catch (ForbiddenException e) { + throw new AccessDeniedException("${towerClient.endpoint}/user/$userId/workspaces", null, e.message) + } catch (NotFoundException e) { + throw new NoSuchFileException("${towerClient.endpoint}/user/$userId/workspaces") + } + } + + private static OrgAndWorkspaceDto mapOrgAndWorkspace(Map m) { + final dto = new OrgAndWorkspaceDto() + dto.orgId = (m.orgId as Long) ?: 0L + dto.orgName = m.orgName as String + dto.workspaceId = (m.workspaceId as Long) ?: 0L + dto.workspaceName = m.workspaceName as String + dto.workspaceFullName = m.workspaceFullName as String + return dto + } synchronized void loadOrgWorkspaceCache() { if (orgWorkspaceCacheLoaded) return - if (!orgWorkspaceClient) - throw new IllegalStateException("SeqeraFileSystem has no orgWorkspaceClient attached") 
log.debug "Loading Seqera org/workspace cache" - final entries = orgWorkspaceClient.listUserWorkspacesAndOrgs(orgWorkspaceClient.getUserId()) + final entries = fetchUserWorkspacesAndOrgs(getUserId()) for (OrgAndWorkspaceDto entry : entries) { if (entry.orgName) orgCache.put(entry.orgName, entry.orgId) diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy index 19d965633d..a13f3379b5 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy @@ -75,10 +75,8 @@ class SeqeraFileSystemProvider extends FileSystemProvider { final TowerClient tc = TowerFactory.client() if (!tc) throw new IllegalStateException("File system `seqera://` requires the Seqera Platform access token — use `tower.accessToken` config option or TOWER_ACCESS_TOKEN env variable") - final datasetClient = new SeqeraDatasetClient(tc) - fileSystem = new SeqeraFileSystem(this) - fileSystem.setOrgWorkspaceClient(datasetClient) - fileSystem.registerHandler(new DatasetsResourceHandler(fileSystem, datasetClient)) + fileSystem = new SeqeraFileSystem(this, tc) + fileSystem.registerHandler(new DatasetsResourceHandler(fileSystem, new SeqeraDatasetClient(tc))) fileSystem.registerHandler(new DataLinksResourceHandler(fileSystem, new SeqeraDataLinkClient(tc))) return fileSystem } @@ -136,12 +134,14 @@ class SeqeraFileSystemProvider extends FileSystemProvider { final d = sp.depth() if (d < 3) { validateSharedDirectoryExists(fs, sp) - return (A) new SeqeraFileAttributes(true) + sp.cachedAttributes = new SeqeraFileAttributes(true) + return (A) sp.cachedAttributes } final h = fs.getHandler(sp.resourceType) if (!h) throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}") - return (A) h.readAttributes(sp) + sp.cachedAttributes 
= h.readAttributes(sp) + return (A) sp.cachedAttributes } @Override @@ -168,7 +168,7 @@ class SeqeraFileSystemProvider extends FileSystemProvider { final h = fs.getHandler(sp.resourceType) if (!h) throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}") - h.checkAccess(sp, modes) + h.readAttributes(sp) } // ---- directory stream ---- diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy index 2da5dcfdde..f2f71f0f29 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy @@ -16,6 +16,8 @@ package io.seqera.tower.plugin.fs +import groovy.transform.PackageScope + import java.nio.file.FileSystem import java.nio.file.InvalidPathException import java.nio.file.LinkOption @@ -59,11 +61,15 @@ class SeqeraPath implements Path { private final String relPath /** * Optional attributes attached when this path was produced by a directory listing, - * so {@code readAttributes()} can return them without a follow-up API call. - * Not part of the URI — does not affect {@link #equals}, {@link #hashCode}, - * {@link #toString}, {@link #toUri}, or propagation via {@link #resolve} / {@link #getParent}. + * or set by {@link SeqeraFileSystemProvider#readAttributes} after a fresh resolution + * so subsequent reads on the same path instance hit the cache. Not part of the URI — + * does not affect {@link #equals}, {@link #hashCode}, {@link #toString}, {@link #toUri}, + * or propagation via {@link #resolve} / {@link #getParent}. + * + * Marked {@code volatile} so the publication of a freshly-resolved value is visible + * to other threads that call {@link #getCachedAttributes()} on the same path. */ - private final SeqeraFileAttributes cachedAttributes + private volatile SeqeraFileAttributes cachedAttributes /** Parse a {@code seqera://} URI string. 
*/ SeqeraPath(SeqeraFileSystem fs, String uriString) { @@ -166,6 +172,11 @@ class SeqeraPath implements Path { List getTrail() { trail } SeqeraFileAttributes getCachedAttributes() { cachedAttributes } + /** Package-scope: only same-package callers (e.g. {@link SeqeraFileSystemProvider}) cache attrs. */ + @PackageScope + void setCachedAttributes(SeqeraFileAttributes attributes) { + this.cachedAttributes = attributes + } /** * Resolve a child segment and attach the given attributes to the resulting path. * Used by directory-listing code paths so follow-up {@code readAttributes()} calls diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy index 8917b0740a..af4626433a 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy @@ -31,7 +31,7 @@ import groovy.util.logging.Slf4j import io.seqera.tower.model.DataLinkDto import io.seqera.tower.model.DataLinkItem import io.seqera.tower.model.DataLinkItemType -import io.seqera.tower.plugin.datalink.PagedDataLinkContent +import io.seqera.tower.plugin.datalink.PagedIterable import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient import io.seqera.tower.plugin.fs.ResourceTypeHandler import io.seqera.tower.plugin.fs.SeqeraFileAttributes @@ -101,7 +101,7 @@ class DataLinksResourceHandler implements ResourceTypeHandler { } // trail.size() >= 2 — browse inside a specific data-link. // Content can be very large, so we stream it lazily. - final dl = client.getDataLink(workspaceId, trail[0], trail[1]) + final dl = requireDataLink(workspaceId, trail[0], trail[1], dir) final subPath = trail.size() > 2 ? 
trail.subList(2, trail.size()).join('/') : '' log.debug("Listing files for $dl.name path $subPath") final content = client.getContent(dl.id, subPath, workspaceId, credentialsIdOf(dl)) @@ -125,12 +125,11 @@ class DataLinksResourceHandler implements ResourceTypeHandler { throw new NoSuchFileException(p.toString(), null, "No data-links for provider '${trail[0]}' in workspace '${p.workspace}'") return new SeqeraFileAttributes(true) } - final dl = client.getDataLink(workspaceId, trail[0], trail[1]) + final dl = requireDataLink(workspaceId, trail[0], trail[1], p) if (trail.size() == 2) return new SeqeraFileAttributes(true) // data-link root final subPath = trail.subList(2, trail.size()).join('/') log.debug("Reading attributes for $p") - final content = client.getContent(dl.id, subPath, workspaceId, credentialsIdOf(dl)) - return attributesFor(content, subPath, p) + return resolveAttrsViaParent(dl, subPath, workspaceId, p) } @Override @@ -138,7 +137,7 @@ class DataLinksResourceHandler implements ResourceTypeHandler { if (p.trail.size() < 3) throw new IllegalArgumentException("newInputStream requires a file path inside a data-link: $p") final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace) - final dl = client.getDataLink(workspaceId, p.trail[0], p.trail[1]) + final dl = requireDataLink(workspaceId, p.trail[0], p.trail[1], p) final subPath = p.trail.subList(2, p.trail.size()).join('/') final urlResp = client.getDownloadUrl(dl.id, subPath, workspaceId, credentialsIdOf(dl)) if (!urlResp.url) @@ -152,33 +151,50 @@ class DataLinksResourceHandler implements ResourceTypeHandler { return (creds && !creds.isEmpty()) ? creds[0].id : null } - @Override - void checkAccess(SeqeraPath p, AccessMode... 
modes) throws IOException { - for (AccessMode m : modes) { - if (m == AccessMode.WRITE || m == AccessMode.EXECUTE) - throw new AccessDeniedException(p.toString(), null, "seqera:// data-links are read-only") - } - readAttributes(p) + /** + * Resolve a data-link by (provider, name) and throw {@link NoSuchFileException} + * with a uniform error message when missing. Wraps {@link SeqeraDataLinkClient#getDataLink} + * (which returns {@code null} on miss). + */ + private DataLinkDto requireDataLink(long workspaceId, String provider, String name, SeqeraPath pathForErrors) throws NoSuchFileException { + final dl = client.getDataLink(workspaceId, provider, name) + if (!dl) + throw new NoSuchFileException(pathForErrors.toString(), null, "Data-link '${name}' not found for provider '${provider}' in workspace '$workspaceId'") + return dl } // ---- private helpers ---- /** - * Decide whether {@code subPath} refers to a file or a directory by inspecting - * only the first page of the content response — never paginates further. + * Determine attributes for a path inside a data-link by listing the path's parent + * directory and finding the entry by name. The entry's {@code type} (FILE/FOLDER) + * is the authoritative file-vs-directory signal; absence of the entry means the + * path does not exist. + * + * The Platform's {@code /browse/{path}} response does not reliably distinguish + * "file path", "directory path", and "missing path" by itself ({@code originalPath} + * is populated in all three), so we always query the parent. + * + * Cost: one API call (or more if the parent listing paginates and the entry isn't + * on the first page). Iteration is short-circuited as soon as the entry is found. */ - private static SeqeraFileAttributes attributesFor(PagedDataLinkContent content, String subPath, SeqeraPath pathForErrors) throws NoSuchFileException { - final firstPage = content.firstPage - final lastSeg = subPath.contains('/') ? 
subPath.substring(subPath.lastIndexOf('/') + 1) : subPath - // Single-file response: one FILE item whose name matches the final segment - final single = firstPage.find { DataLinkItem it -> it.name == lastSeg && it.type == DataLinkItemType.FILE } - if (single) - return new SeqeraFileAttributes(single.size ?: 0L, Instant.EPOCH, Instant.EPOCH, pathForErrors.toString()) - // If there are children, this is a directory listing - if (!firstPage.isEmpty()) return new SeqeraFileAttributes(true) - // No items AND no originalPath → path does not exist - if (!content.originalPath) - throw new NoSuchFileException(pathForErrors.toString(), null, "Path not found inside data-link") + private SeqeraFileAttributes resolveAttrsViaParent(DataLinkDto dl, String subPath, long workspaceId, SeqeraPath pathForErrors) throws IOException { + final lastSlash = subPath.lastIndexOf('/') + final parentSubPath = lastSlash > 0 ? subPath.substring(0, lastSlash) : '' + final lastSeg = lastSlash > 0 ? subPath.substring(lastSlash + 1) : subPath + log.debug("Looking for $lastSeg in data-link: ${dl.id} path: $parentSubPath") + final parent = client.getContent(dl.id, parentSubPath, workspaceId, credentialsIdOf(dl), lastSeg) + DataLinkItem found = null + for (DataLinkItem it : parent) { + log.trace("Item: $it") + if (it.name == lastSeg || it.name == lastSeg + '/') { found = it; break } + } + if (found == null) + throw new NoSuchFileException(pathForErrors.toString(), null, "Path '${subPath}' not found inside data-link '${dl.name}'") + if (found.type == DataLinkItemType.FILE) { + final long size = (found.size != null) ? 
found.size.longValue() : 0L + return new SeqeraFileAttributes(size, Instant.EPOCH, Instant.EPOCH, pathForErrors.toString()) + } return new SeqeraFileAttributes(true) } @@ -201,17 +217,17 @@ class DataLinksResourceHandler implements ResourceTypeHandler { /** * Lazy {@link Iterable} that maps each {@link DataLinkItem} from a - * {@link PagedDataLinkContent} to a child {@link SeqeraPath} under + * {@link PagedIterable} to a child {@link SeqeraPath} under * {@code parent}. Each produced path carries cached attributes built from the * item, so a follow-up {@code readAttributes()} call does not re-browse the * Platform. Pages are fetched on demand as the iterator advances. */ @CompileStatic private static class PathMappingIterable implements Iterable { - private final PagedDataLinkContent content + private final PagedIterable content private final SeqeraPath parent - PathMappingIterable(PagedDataLinkContent content, SeqeraPath parent) { + PathMappingIterable(PagedIterable content, SeqeraPath parent) { this.content = content this.parent = parent } diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy index ef9c50785e..9fbe07b104 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy @@ -111,15 +111,6 @@ class DatasetsResourceHandler implements ResourceTypeHandler { return client.downloadDataset(dataset.id, String.valueOf(version.version), version.fileName, dataset.workspaceId) } - @Override - void checkAccess(SeqeraPath p, AccessMode... 
modes) throws IOException { - for (AccessMode m : modes) { - if (m == AccessMode.WRITE || m == AccessMode.EXECUTE) - throw new AccessDeniedException(p.toString(), null, "seqera:// datasets are read-only") - } - readAttributes(p) - } - // ---- helpers ---- /** diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy index 3b5f8ef798..c0eac0164b 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy @@ -115,14 +115,14 @@ class SeqeraDataLinkClientTest extends Specification { // ---- getDataLink ---- - def "getDataLink uses server-side search filter and returns first matching provider"() { - given: + def "getDataLink uses server-side keyword search (name + provider:

) and returns the first match"() { + given: 'the server returns the single data-link matching both keywords' def body = JsonOutput.toJson([dataLinks: [ - [id: 'dl-1', name: 'inputs', provider: 'google'], [id: 'dl-2', name: 'inputs', provider: 'aws'] - ], totalSize: 2]) + ], totalSize: 1]) def tc = tower() - tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0&search=inputs") >> ok(body) + // URL-encoded: ' ' → '+', ':' → '%3A' + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0&search=inputs+provider%3Aaws") >> ok(body) def client = new SeqeraDataLinkClient(tc) when: @@ -133,20 +133,30 @@ class SeqeraDataLinkClientTest extends Specification { dl.provider.toString() == 'aws' } - def "getDataLink throws NoSuchFileException when no matching (provider, name) is found"() { + def "getDataLink returns null when no matching (provider, name) is found"() { given: - def body = JsonOutput.toJson([dataLinks: [ - [id: 'dl-1', name: 'inputs', provider: 'google'] - ], totalSize: 1]) + def body = JsonOutput.toJson([dataLinks: [], totalSize: 0]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0&search=inputs+provider%3Aaws") >> ok(body) + def client = new SeqeraDataLinkClient(tc) + + expect: + client.getDataLink(10L, 'aws', 'inputs') == null + } + + def "getDataLink memoizes null misses (no second API call)"() { + given: + def body = JsonOutput.toJson([dataLinks: [], totalSize: 0]) def tc = tower() - tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0&search=inputs") >> ok(body) def client = new SeqeraDataLinkClient(tc) when: - client.getDataLink(10L, 'aws', 'inputs') + def a = client.getDataLink(10L, 'aws', 'ghost') + def b = client.getDataLink(10L, 'aws', 'ghost') then: - thrown(NoSuchFileException) + 1 * tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0&search=ghost+provider%3Aaws") >> ok(body) + a == null && b == null } def "getDataLink memoizes successful lookups"() { @@ 
-162,7 +172,7 @@ class SeqeraDataLinkClientTest extends Specification { def b = client.getDataLink(10L, 'aws', 'inputs') then: - 1 * tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0&search=inputs") >> ok(body) + 1 * tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0&search=inputs+provider%3Aaws") >> ok(body) a.is(b) } @@ -247,7 +257,6 @@ class SeqeraDataLinkClientTest extends Specification { def "getContent on a sub-path uses /browse/{path}"() { given: def body = JsonOutput.toJson([ - originalPath: 'reads/', objects: [ [name: 'a.fq', type: 'FILE', size: 123, mimeType: 'application/gzip'], [name: 'b.fq', type: 'FILE', size: 456, mimeType: 'application/gzip'] @@ -260,7 +269,6 @@ class SeqeraDataLinkClientTest extends Specification { def resp = client.getContent('dl-1', 'reads/', 10L) then: - resp.originalPath == 'reads/' resp.firstPage.size() == 2 resp.firstPage[0].name == 'a.fq' resp.firstPage[0].size == 123L diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/dataset/SeqeraDatasetClientTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/dataset/SeqeraDatasetClientTest.groovy index 25884bf3fa..4447344841 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/dataset/SeqeraDatasetClientTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/dataset/SeqeraDatasetClientTest.groovy @@ -51,27 +51,6 @@ class SeqeraDatasetClientTest extends Specification { new TowerClient.Response(code, "error $code") } - // ---- listUserWorkspacesAndOrgs ---- - - def "listUserWorkspacesAndOrgs returns parsed DTOs"() { - given: - def body = JsonOutput.toJson([orgsAndWorkspaces: [ - [orgId: 1, orgName: 'acme', workspaceId: 10, workspaceName: 'research', workspaceFullName: 'acme/research'] - ]]) - def tc = spyTower() - tc.sendApiRequest('https://api.example.com/user/42/workspaces') >> ok(body) - def client = new SeqeraDatasetClient(tc) - - when: - def list = client.listUserWorkspacesAndOrgs(42L) - - then: - 
list.size() == 1 - list[0].orgName == 'acme' - list[0].workspaceId == 10L - list[0].workspaceName == 'research' - } - // ---- listDatasets ---- def "listDatasets returns parsed DatasetDto list"() { diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy index 83dd83f5be..82ebadfc3d 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy @@ -53,8 +53,7 @@ class SeqeraFileSystemProviderTest extends Specification { private SeqeraFileSystem buildFs(TowerClient tc) { final client = new SeqeraDatasetClient(tc) final provider = new SeqeraFileSystemProvider() - final fs = new SeqeraFileSystem(provider) - fs.setOrgWorkspaceClient(client) + final fs = new SeqeraFileSystem(provider, tc) fs.registerHandler(new io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler(fs, client)) return fs } @@ -403,7 +402,7 @@ class SeqeraFileSystemProviderTest extends Specification { def "newFileSystem throws FileSystemAlreadyExistsException when filesystem exists"() { given: 'a provider with an existing filesystem' def provider = new SeqeraFileSystemProvider() - def fs = new SeqeraFileSystem(provider) + def fs = new SeqeraFileSystem(provider, Mock(TowerClient)) provider.@fileSystem = fs when: @@ -421,8 +420,7 @@ class SeqeraFileSystemProviderTest extends Specification { tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) def datasetClient = new SeqeraDatasetClient(tc) - def fs = new SeqeraFileSystem(new SeqeraFileSystemProvider()) - fs.setOrgWorkspaceClient(datasetClient) + def fs = new SeqeraFileSystem(new SeqeraFileSystemProvider(), tc) fs.registerHandler(new io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler(fs, 
datasetClient)) fs.registerHandler(new io.seqera.tower.plugin.fs.handler.DataLinksResourceHandler(fs, new io.seqera.tower.plugin.datalink.SeqeraDataLinkClient(tc))) def wsPath = new SeqeraPath(fs, 'seqera://acme/research') diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemTest.groovy index b600177e69..a320ca96e7 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemTest.groovy @@ -20,7 +20,6 @@ import java.nio.file.NoSuchFileException import groovy.json.JsonOutput import io.seqera.tower.plugin.TowerClient -import io.seqera.tower.plugin.dataset.SeqeraDatasetClient import spock.lang.Specification /** @@ -54,9 +53,7 @@ class SeqeraFileSystemTest extends Specification { } private SeqeraFileSystem buildFs(TowerClient tc) { - final fs = new SeqeraFileSystem(new SeqeraFileSystemProvider()) - fs.setOrgWorkspaceClient(new SeqeraDatasetClient(tc)) - return fs + return new SeqeraFileSystem(new SeqeraFileSystemProvider(), tc) } // ---- cache loading ---- @@ -75,6 +72,21 @@ class SeqeraFileSystemTest extends Specification { 1 * tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) } + def "getUserId is cached across multiple calls (single /user-info request)"() { + given: + def tc = spyTower() + final fs = buildFs(tc) + + when: + def first = fs.getUserId() + def second = fs.getUserId() + + then: + 1 * tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) + first == 42L + second == 42L + } + def "listOrgNames returns distinct org names from cache"() { given: def tc = spyTower() @@ -154,7 +166,7 @@ class SeqeraFileSystemTest extends Specification { def "registerHandler stores and looks up by resource type"() { given: - def fs = new SeqeraFileSystem(new SeqeraFileSystemProvider()) + def fs = new SeqeraFileSystem(new 
SeqeraFileSystemProvider(), Mock(TowerClient)) def handler = Mock(ResourceTypeHandler) { getResourceType() >> 'datasets' } diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy index 6183465f7d..fc217bbb7a 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy @@ -16,6 +16,7 @@ package io.seqera.tower.plugin.fs +import io.seqera.tower.plugin.TowerClient import spock.lang.Specification /** @@ -25,7 +26,7 @@ class SeqeraPathTest extends Specification { private SeqeraFileSystem mockFs() { def provider = new SeqeraFileSystemProvider() - return new SeqeraFileSystem(provider) + return new SeqeraFileSystem(provider, Mock(TowerClient)) } // ---- depth / segment accessors ---- diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy index dab9b880f1..f94e905eec 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy @@ -18,7 +18,6 @@ package io.seqera.tower.plugin.fs.handler import java.net.http.HttpClient import java.net.http.HttpResponse -import java.nio.file.AccessMode import java.nio.file.NoSuchFileException import java.nio.file.Path @@ -28,7 +27,7 @@ import io.seqera.tower.model.DataLinkItem import io.seqera.tower.model.DataLinkItemType import io.seqera.tower.model.DataLinkProvider import io.seqera.tower.model.DataLinkDownloadUrlResponse -import io.seqera.tower.plugin.datalink.PagedDataLinkContent +import io.seqera.tower.plugin.datalink.PagedIterable import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient import 
io.seqera.tower.plugin.fs.SeqeraFileSystem import io.seqera.tower.plugin.fs.SeqeraPath @@ -55,9 +54,12 @@ class DataLinksResourceHandlerTest extends Specification { def i = new DataLinkItem(); i.name = name; i.type = t; i.size = size; return i } - private static PagedDataLinkContent pagedContent(List items, String originalPath = null) { - return new PagedDataLinkContent(originalPath, items, null, new PagedDataLinkContent.PageFetcher() { - Map fetch(String t) { throw new IllegalStateException('no more pages') } + private static PagedIterable pagedContent(List items) { + // Single-page test fixture: firstPage already loaded, no further pages. + return new PagedIterable(items, true, new PagedIterable.NextPageFetcher() { + PagedIterable.Page fetch() { + throw new IllegalStateException('no more pages') + } }) } @@ -122,7 +124,7 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L - 1 * client.getDataLink(10L, 'aws', 'unknown') >> { throw new NoSuchFileException("data-link not found") } + 1 * client.getDataLink(10L, 'aws', 'unknown') >> null thrown(NoSuchFileException) } @@ -311,7 +313,7 @@ class DataLinksResourceHandlerTest extends Specification { attr.directory } - def "readAttributes on a file sub-path reports file with size"() { + def "readAttributes on a deep file sub-path queries the parent dir and finds the entry"() { given: def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads/a.fq') @@ -321,14 +323,33 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId(_, _) >> 10L 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) - 1 * client.getContent('dl-1', 'reads/a.fq', 10L, null) >> pagedContent([ - item('a.fq', DataLinkItemType.FILE, 123) + 1 * client.getContent('dl-1', 'reads', 10L, null, 'a.fq') >> pagedContent([ + item('a.fq', DataLinkItemType.FILE, 123), + item('b.fq', DataLinkItemType.FILE, 
456) ]) attr.regularFile attr.size() == 123L } - def "readAttributes on a directory sub-path reports directory"() { + def "readAttributes on a top-level file queries the data-link root and finds the entry"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/samplesheet.csv') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getContent('dl-1', '', 10L, null, 'samplesheet.csv') >> pagedContent([ + item('reads', DataLinkItemType.FOLDER, 0), + item('samplesheet.csv', DataLinkItemType.FILE, 999) + ]) + attr.regularFile + attr.size() == 999L + } + + def "readAttributes on a directory sub-path queries the parent dir and reports directory"() { given: def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads') @@ -338,10 +359,72 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId(_, _) >> 10L 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) - 1 * client.getContent('dl-1', 'reads', 10L, null) >> pagedContent( - [item('a.fq', DataLinkItemType.FILE, 1), item('b.fq', DataLinkItemType.FILE, 2)], - 'reads/') + 1 * client.getContent('dl-1', '', 10L, null, 'reads') >> pagedContent([ + item('reads', DataLinkItemType.FOLDER, 0), + item('samplesheet.csv', DataLinkItemType.FILE, 1) + ]) + attr.directory + } + + def "readAttributes on a non-existent sub-path throws NoSuchFileException"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads/missing.fq') + + when: + handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getContent('dl-1', 'reads', 10L, null, 'missing.fq') >> pagedContent([ + item('a.fq', 
DataLinkItemType.FILE, 1) + ]) + thrown(NoSuchFileException) + } + + def "readAttributes distinguishes a directory and a same-named file inside it (deferred #1 case)"() { + // Directory 'foo' contains a file 'foo'. Path .../bucket/foo refers to the dir. + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/bucket/foo') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'bucket') >> dl('dl-1', 'bucket', DataLinkProvider.AWS) + // root listing shows 'foo' as a folder + 1 * client.getContent('dl-1', '', 10L, null, 'foo') >> pagedContent([ + item('foo', DataLinkItemType.FOLDER, 0) + ]) attr.directory + + when: 'now query the inner file at .../bucket/foo/foo' + def innerPath = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/bucket/foo/foo') + def innerAttr = handler.readAttributes(innerPath) + + then: 'parent dir is foo, lastSeg is foo, found as FILE' + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'bucket') >> dl('dl-1', 'bucket', DataLinkProvider.AWS) + 1 * client.getContent('dl-1', 'foo', 10L, null, 'foo') >> pagedContent([ + item('foo', DataLinkItemType.FILE, 42) + ]) + innerAttr.regularFile + innerAttr.size() == 42L + } + + def "readAttributes returns NoSuchFileException when the data-link is missing"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/ghost/a.fq') + + when: + handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'ghost') >> null + def ex = thrown(NoSuchFileException) + ex.reason?.contains("Data-link 'ghost' not found") } def "readAttributes short-circuits when path has cached attributes (no API call)"() { @@ -387,7 +470,7 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId(_, _) >> 10L - 1 * client.getDataLink(10L, 'aws', 'ghost') >> { throw new 
NoSuchFileException("not found") } + 1 * client.getDataLink(10L, 'aws', 'ghost') >> null thrown(NoSuchFileException) } @@ -401,18 +484,8 @@ class DataLinksResourceHandlerTest extends Specification { then: 1 * fs.resolveWorkspaceId(_, _) >> 10L 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) - 1 * client.getContent('dl-1', 'does/not/exist', 10L, null) >> pagedContent([]) + // parent listing of 'does/not' has no 'exist' entry + 1 * client.getContent('dl-1', 'does/not', 10L, null, 'exist') >> pagedContent([]) thrown(NoSuchFileException) } - - def "checkAccess with WRITE is rejected"() { - given: - def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/a.fq') - - when: - handler.checkAccess(path, AccessMode.WRITE) - - then: - thrown(java.nio.file.AccessDeniedException) - } } diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy index 752091c2d6..2c9cb425eb 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy @@ -16,7 +16,6 @@ package io.seqera.tower.plugin.fs.handler -import java.nio.file.AccessMode import java.nio.file.NoSuchFileException import io.seqera.tower.model.DatasetDto @@ -220,14 +219,6 @@ class DatasetsResourceHandlerTest extends Specification { got === attrs } - def "checkAccess rejects WRITE"() { - given: - def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') - - when: - handler.checkAccess(path, AccessMode.WRITE) - - then: - thrown(java.nio.file.AccessDeniedException) - } + // WRITE/EXECUTE rejection is now enforced at the SeqeraFileSystemProvider level + // (handler.checkAccess was removed from ResourceTypeHandler). See SeqeraFileSystemProviderTest. 
} From 4d8f87a62208547a09e975e6c5504b571ad0d954 Mon Sep 17 00:00:00 2001 From: jorgee Date: Wed, 29 Apr 2026 13:48:36 +0200 Subject: [PATCH 5/6] add integration test for data-links Signed-off-by: jorgee --- tests/checks/seqera-data-links.nf/.checks | 16 ++++++++++ tests/seqera-data-links.nf | 36 +++++++++++++++++++++++ 2 files changed, 52 insertions(+) create mode 100644 tests/checks/seqera-data-links.nf/.checks create mode 100644 tests/seqera-data-links.nf diff --git a/tests/checks/seqera-data-links.nf/.checks b/tests/checks/seqera-data-links.nf/.checks new file mode 100644 index 0000000000..81432b0d79 --- /dev/null +++ b/tests/checks/seqera-data-links.nf/.checks @@ -0,0 +1,16 @@ +set -e +export NXF_PLUGINS_DEFAULT=nf-tower + +# Skip test if Seqera Platform token is missing +if [[ ! $TOWER_ACCESS_TOKEN ]]; then + echo "Skip seqera-data-links test since TOWER_ACCESS_TOKEN is not available" + exit 0 +fi + +# +# run normal mode +# +$NXF_RUN | tee stdout + +[[ `grep INFO .nextflow.log | grep -c 'Submitted process > TEST'` == 3 ]] || false + diff --git a/tests/seqera-data-links.nf b/tests/seqera-data-links.nf new file mode 100644 index 0000000000..6ce9c0538e --- /dev/null +++ b/tests/seqera-data-links.nf @@ -0,0 +1,36 @@ +#!/usr/bin/env nextflow +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +params.data_link_aws = 'seqera://seqeralabs/automated-testing/data-links/aws/1000genomes/README.alignment_data' +params.data_link_google = 'seqera://seqeralabs/automated-testing/data-links/google/nf-core-gcpmegatests/test-data/rnaseq/README' +params.data_link_azure = 'seqera://seqeralabs/automated-testing/data-links/azure/seqeralabs-showcase/results/pipeline_info/nf_core_rnaseq_software_mqc_versions.yml' + +process TEST { + input: + path(file) + output: + stdout + script: + """ + ls $file + """ +} + +workflow { + ch = channel.fromList([file(params.data_link_aws), file(params.data_link_google), file(params.data_link_azure)]) + TEST(ch).view() +} From 3d7e515d9e988de2c493d3d4a1d307e933fe3c4d Mon Sep 17 00:00:00 2001 From: jorgee Date: Wed, 29 Apr 2026 14:03:36 +0200 Subject: [PATCH 6/6] update adr and specs [ci skip] Signed-off-by: jorgee --- adr/20260422-seqera-datalinks-filesystem.md | 54 ++++---- specs/260422-seqera-datalinks-fs/plan.md | 134 ++++++++++++-------- specs/260422-seqera-datalinks-fs/spec.md | 19 +-- 3 files changed, 122 insertions(+), 85 deletions(-) diff --git a/adr/20260422-seqera-datalinks-filesystem.md b/adr/20260422-seqera-datalinks-filesystem.md index dc290f26b6..b77e80503d 100644 --- a/adr/20260422-seqera-datalinks-filesystem.md +++ b/adr/20260422-seqera-datalinks-filesystem.md @@ -104,7 +104,7 @@ Three structural differences from datasets: 2. **Arbitrary sub-path depth** below the data-link root. Each segment is a folder or file inside the underlying bucket. 3. **No version pinning** — data-link content is not versioned by the Platform. Content is always "current". -`ResourceTypeHandler.getIdentitySegmentCount()` encodes the difference: 1 for datasets, 2 for data-links. `SeqeraPath` treats everything after the identity segments as the handler-owned sub-path. +`SeqeraPath` stores trail segments verbatim (everything after `resourceType`); each handler interprets them. 
Datasets accept exactly one trail segment (the dataset name, optionally with an `@version` suffix); data-links accept two identity segments (`<provider>/<name>`) plus arbitrary further sub-path. ### Component Structure @@ -112,49 +112,53 @@ Three structural differences from datasets: plugins/nf-tower/src/main/io/seqera/tower/plugin/ ├── fs/ ← generic NIO layer (refactored) │ ├── SeqeraFileSystemProvider ← dispatches by resourceType to handler -│ ├── SeqeraFileSystem ← org/ws cache + handler registry -│ ├── SeqeraPath ← generic segment list (identity + sub-path) -│ ├── SeqeraFileAttributes ← plain (isDir, size, lastModified) holder +│ ├── SeqeraFileSystem ← holds TowerClient; cached getUserId; org/ws cache; handler registry +│ ├── SeqeraPath ← generic segment list; volatile cachedAttributes; resolveWithAttributes +│ ├── SeqeraFileAttributes ← plain (isDir, size, lastModified, created, fileKey) holder │ ├── SeqeraPathFactory ← unchanged │ ├── DatasetInputStream ← unchanged -│ ├── ResourceTypeHandler ← NEW interface +│ ├── ResourceTypeHandler ← NEW interface (no checkAccess — provider enforces it) │ └── handler/ -│ ├── DatasetsResourceHandler ← NEW — dataset logic extracted here -│ └── DataLinksResourceHandler ← NEW +│ ├── DatasetsResourceHandler ← NEW — dataset logic extracted; owns dataset caches; parses @version +│ └── DataLinksResourceHandler ← NEW — parent-browse readAttributes; credentialsId forwarding ├── dataset/ -│ └── SeqeraDatasetClient ← unchanged +│ └── SeqeraDatasetClient ← cleanup — only dataset endpoints; user/workspace lookup moved out └── datalink/ ← NEW - └── SeqeraDataLinkClient ← typed client over TowerClient - returns io.seqera.tower.model.* directly + ├── SeqeraDataLinkClient ← typed client; getDataLink uses combined keyword search + └── PagedIterable ← generic lazy-pagination abstraction (eager 1st page, lazy rest) ``` -No plugin-local DTO classes are introduced.
`DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkDownloadUrlResponse`, `DataLinkProvider` and related types are reused from `io.seqera:tower-api:1.121.0`. +No plugin-local DTO classes are introduced. `DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkCredentials`, `DataLinkDownloadUrlResponse`, `DataLinkProvider` and related types are reused from `io.seqera:tower-api:1.121.0`. `PagedIterable` is a plugin-local **service** type (not a DTO) — a generic eager-first-page + lazy-subsequent paginated iterable used by both list and browse endpoints. + +User-id and workspace lookup live on `SeqeraFileSystem` (shared infrastructure), not on a resource-type client. The filesystem holds a `TowerClient` directly and exposes a cached `getUserId()`. ### `ResourceTypeHandler` contract ``` interface ResourceTypeHandler { String getResourceType() // "datasets" / "data-links" - int getIdentitySegmentCount() // 1 / 2 - List<SeqeraPath> list(SeqeraPath dir) throws IOException + Iterable<SeqeraPath> list(SeqeraPath dir) throws IOException SeqeraFileAttributes readAttributes(SeqeraPath p) throws IOException InputStream newInputStream(SeqeraPath p) throws IOException - void checkAccess(SeqeraPath p, AccessMode... modes) throws IOException } ``` +`checkAccess` is **not** on this interface — `SeqeraFileSystemProvider.checkAccess` rejects WRITE/EXECUTE upfront and delegates existence-check to `h.readAttributes(sp)`. + `SeqeraFileSystemProvider` owns dispatch at depth ≥ 3. Depth 0–2 (root/org/workspace) remains in `SeqeraFileSystem`, shared across all handlers. At depth 3 (the workspace listing returns the resource-type children), the handler registry is enumerated — `datasets` and `data-links` are the two entries today, added automatically by the provider at `newFileSystem()` time.
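To make the dispatch shape concrete, here is a minimal plain-Java sketch of the registry-then-dispatch pattern described above. `HandlerRegistrySketch` and its inner `Handler` interface are simplified illustrative stand-ins for the provider-side registry and `ResourceTypeHandler`, not the plugin's actual classes:

```java
import java.nio.file.NoSuchFileException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for the ResourceTypeHandler registry held by the provider.
class HandlerRegistrySketch {
    interface Handler {
        String resourceType();                  // "datasets" / "data-links"
        List<String> list(List<String> trail) throws NoSuchFileException;
    }

    private final Map<String, Handler> handlers = new LinkedHashMap<>();

    void register(Handler h) { handlers.put(h.resourceType(), h); }

    // Depth-3 (workspace) listing: enumerate the registered resource types.
    Set<String> resourceTypes() { return Collections.unmodifiableSet(handlers.keySet()); }

    // Depth >= 3: dispatch by the resource-type segment.
    List<String> list(String resourceType, List<String> trail) throws NoSuchFileException {
        Handler h = handlers.get(resourceType);
        if (h == null) throw new NoSuchFileException("unknown resource type: " + resourceType);
        return h.list(trail);
    }
}
```

An unknown resource-type segment fails the same way a missing path does (`NoSuchFileException`), which matches the behavior the provider relies on at depth ≥ 3.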
### API Usage Summary (Data-Links) | NIO operation | Platform endpoint | Notes | | -------------------------------------------------------------- | -------------------------------------------------------------- | ------------------------------------------------------------------------------- | -| enumerate providers in workspace (depth-3 listing) | `GET /data-links?workspaceId=X&max=100&offset=O` | offset pagination via lazy `Iterator` | -| resolve one data-link by (provider, name) | `GET /data-links?workspaceId=X&search=&max=100&offset=O` | server-side filter by name; short-circuit on first provider match; `@Memoized` | -| `newDirectoryStream(dir)` at data-link root | `GET /data-links/{id}/browse?workspaceId=X[&credentialsId=C]` | lazy `PagedDataLinkContent` — token pagination via `nextPageToken` | -| `newDirectoryStream(dir)` at a sub-path | `GET /data-links/{id}/browse/{path}?workspaceId=X[&credentialsId=C]` | same; slashes in `{path}` are preserved | -| `readAttributes(path)` inside a data-link | same as above (first page only) | short-circuited when `path.cachedAttributes` was set by a prior listing | -| `newInputStream(file)` | `GET /data-links/{id}/generate-download-url?workspaceId=X&filePath=[&credentialsId=C]` | parse `DataLinkDownloadUrlResponse.url`; fetch with plain JDK `HttpClient` (no Seqera auth header — the URL is signed for the cloud backend) | +| user-id lookup | `GET /user-info` | called once per `SeqeraFileSystem`; result cached on the FS. | +| workspaces for user | `GET /user/{userId}/workspaces` | drives `SeqeraFileSystem.loadOrgWorkspaceCache()`. | +| enumerate providers in workspace (depth-3 listing) | `GET /data-links?workspaceId=X&max=100&offset=O` | offset pagination via lazy `PagedIterable`-backed `Iterator`. | +| resolve one data-link by (provider, name) | `GET /data-links?workspaceId=X&search=+provider:
&max=100&offset=O` | combined keyword search (URL-encoded); single result; `@Memoized` including `null` misses. | +| `newDirectoryStream(dir)` at data-link root | `GET /data-links/{id}/browse?workspaceId=X[&credentialsId=C]` | lazy `PagedIterable` — token pagination via `nextPageToken`. | +| `newDirectoryStream(dir)` at a sub-path | `GET /data-links/{id}/browse/{path}?workspaceId=X[&credentialsId=C]` | same; slashes in `{path}` are preserved. | +| `readAttributes(path)` inside a data-link | parent-browse: `GET /data-links/{id}/browse[/parent]?...&search=` | lists the parent directory and finds the entry by name. Entry's `type` (FILE/FOLDER) is the authoritative signal. Short-circuited when `path.cachedAttributes` was set by a prior listing or by a prior `readAttributes` resolution. | +| `newInputStream(file)` | `GET /data-links/{id}/generate-download-url?workspaceId=X&filePath=[&credentialsId=C]` | parse `DataLinkDownloadUrlResponse.url`; fetch with plain JDK `HttpClient` (no Seqera auth header — the URL is signed for the cloud backend). | `credentialsId` is forwarded when `DataLinkDto.credentials` is non-empty (using the first entry's `id`); omitted otherwise. @@ -172,16 +176,20 @@ interface ResourceTypeHandler { 6. **Handler registry at construction, not via PF4J**: handlers are instantiated in `SeqeraFileSystemProvider.newFileSystem()`. Adding a third resource type is a code change to this plugin, identical in shape to the dataset/data-link pair. No extension-point protocol is introduced — YAGNI. -7. **`readAttributes` is single-target**: because `GET /data-links/{id}/browse/{path}` accepts both directory and file paths, a file-level `readAttributes` is one API call — not a parent browse plus filter. No N+1 problem; no browse cache needed. +7. **`readAttributes` uses parent-browse**: the Platform's `/browse/{path}` response shape (with `originalPath` populated in all three cases) does NOT reliably distinguish "file path", "directory path", and "missing path". 
`readAttributes` therefore lists the **parent** directory and finds the entry by name; the entry's `type` (FILE/FOLDER) is the authoritative signal. A missing entry surfaces as `NoSuchFileException`. Cost is one API call — the same as a single-path browse — and the server-side `&search=` keyword filter narrows the parent listing. 8. **Read-only stance preserved**: `SeqeraFileSystem.isReadOnly()` remains `true`. Write operations on data-links raise `UnsupportedOperationException`. The `/data-links/{id}/upload` endpoints are a future extension point. -9. **Listings stream lazily**: paginated Platform responses are exposed as lazy iterators rather than eagerly-materialized lists. `listDataLinks` is an `Iterator` that fetches offsets on demand. `getContent` returns a `PagedDataLinkContent` that loads the first page eagerly (for `readAttributes`) and paginates further only as the iterator advances. Handler `list()` returns `Iterable`, flowed through `DirectoryStream` without full materialization. +9. **Generic `PagedIterable` for all paginated endpoints**: paginated Platform responses are exposed as lazy iterables rather than eagerly-materialized lists. `PagedIterable` is a single generic abstraction that captures the eager-first-page + lazy-subsequent-page contract; the workspace data-link list and the data-link content browse share the same state machine. Two named static fetchers (`DataLinkListFetcher` for offset pagination; `DataLinkContentFetcher` for token pagination) own their own cursor state. Eager first-page fetching means `IOException` surfaces at the call site rather than at the first `Iterator.hasNext()`. -10. **Per-path attribute cache, not a global cache**: listings attach `SeqeraFileAttributes` to each emitted `SeqeraPath` via `resolveWithAttributes(name, attrs)`. A follow-up `readAttributes(child)` returns the cached value with zero API calls. Paths parsed from raw URIs (no prior listing) fall back to the live browse endpoint. 
No global browse-result or URL cache is maintained. +10. **Per-path attribute cache, not a global cache**: listings attach `SeqeraFileAttributes` to each emitted `SeqeraPath` via `resolveWithAttributes(name, attrs)`. The provider also writes back resolved attributes onto a path after a fresh `readAttributes`. A follow-up read on the same path instance returns the cached value with zero API calls. Paths parsed from raw URIs (no prior listing) trigger one parent-browse on first read. No global browse-result or URL cache is maintained. 11. **`credentialsId` forwarding**: when a data-link exposes credentials in its `DataLinkDto.credentials` list, the plugin forwards the first credential's `id` as the `credentialsId` query parameter on browse and download-URL calls. When the list is empty, the parameter is omitted and the Platform falls back to its default resolution. +12. **Combined keyword search for data-link resolution**: `getDataLink(ws, provider, name)` issues `&search=<name> provider:<provider>` (URL-encoded) — the Platform returns at most the matching data-link, eliminating the need for client-side iterate-and-filter. `@Memoized` per `(ws, provider, name)`, including `null` misses, makes repeated lookups for non-existent data-links free. + +13. **Cached user-id on the filesystem**: `SeqeraFileSystem` holds a `TowerClient` directly and exposes `getUserId()` whose result is cached for the lifetime of the filesystem. The token doesn't change during a pipeline run, so neither does the resolved user. This consolidates shared infrastructure (used by the workspace cache for every resource type) onto the filesystem instead of stuffing it into the dataset client.
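Decision 9's eager-first-page, lazy-subsequent contract can be sketched in a few lines of plain Java. This is illustrative only: the names mirror the planned `PagedIterable`, `NextPageFetcher`, and `Page`, but it is not the plugin source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Eager first page (IOException surfaces at the call site); later pages on demand.
class PagedIterableSketch<T> implements Iterable<T> {
    interface NextPageFetcher<T> { Page<T> fetch() throws IOException; } // cursor lives in the impl

    static final class Page<T> {
        final List<T> items; final boolean isLast;
        Page(List<T> items, boolean isLast) { this.items = items; this.isLast = isLast; }
    }

    private final NextPageFetcher<T> fetcher;
    private final Page<T> firstPage;

    private PagedIterableSketch(NextPageFetcher<T> f, Page<T> first) { fetcher = f; firstPage = first; }

    static <T> PagedIterableSketch<T> start(NextPageFetcher<T> f) throws IOException {
        return new PagedIterableSketch<>(f, f.fetch());   // eager: fail here, not in hasNext()
    }

    List<T> getFirstPage() { return firstPage.items; }    // already loaded at construction
    boolean isEmpty() { return firstPage.items.isEmpty() && firstPage.isLast; }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            Page<T> page = firstPage;
            Iterator<T> it = firstPage.items.iterator();
            public boolean hasNext() {
                while (!it.hasNext() && !page.isLast) {
                    try { page = fetcher.fetch(); }        // lazy: subsequent pages
                    catch (IOException e) { throw new UncheckedIOException(e); }
                    it = page.items.iterator();
                }
                return it.hasNext();
            }
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return it.next();
            }
        };
    }
}
```

The same state machine serves both endpoints: an offset-paginated fetcher and a token-paginated fetcher differ only in the cursor state they keep between `fetch()` calls.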
+ ### Refactor Delivered by This Change Adding a second resource type requires a shared abstraction in the `fs/` package so the two behaviors do not collide: diff --git a/specs/260422-seqera-datalinks-fs/plan.md b/specs/260422-seqera-datalinks-fs/plan.md index 3f76fc89d2..3fe3c5ea40 100644 --- a/specs/260422-seqera-datalinks-fs/plan.md +++ b/specs/260422-seqera-datalinks-fs/plan.md @@ -60,34 +60,34 @@ plugins/nf-tower/ ├── main/io/seqera/tower/plugin/ │ ├── fs/ │ │ ├── ResourceTypeHandler.groovy ← NEW (interface; list returns Iterable) - │ │ ├── SeqeraFileSystemProvider.groovy ← refactored (dispatch by handler; lazy filter iterator) - │ │ ├── SeqeraFileSystem.groovy ← refactored (handler registry; no dataset caches) - │ │ ├── SeqeraPath.groovy ← refactored (trail segments, cachedAttributes, resolveWithAttributes) + │ │ ├── SeqeraFileSystemProvider.groovy ← refactored (dispatch by handler; lazy filter iterator; one-shot DirectoryStream) + │ │ ├── SeqeraFileSystem.groovy ← refactored (holds TowerClient; cached getUserId; handler registry) + │ │ ├── SeqeraPath.groovy ← refactored (trail segments, volatile cachedAttributes, resolveWithAttributes) │ │ ├── SeqeraFileAttributes.groovy ← refactored (isDir, size, lastMod, created, fileKey) │ │ ├── SeqeraPathFactory.groovy ← unchanged │ │ ├── DatasetInputStream.groovy ← unchanged │ │ └── handler/ │ │ ├── DatasetsResourceHandler.groovy ← NEW (extracted; owns dataset caches; parses @version) - │ │ └── DataLinksResourceHandler.groovy ← NEW + │ │ └── DataLinksResourceHandler.groovy ← NEW (parent-browse readAttributes; credentialsId forwarding; requireDataLink helper) │ ├── dataset/ - │ │ └── SeqeraDatasetClient.groovy ← unchanged + │ │ └── SeqeraDatasetClient.groovy ← cleanup (only dataset endpoints; user/workspace lookup moved to SeqeraFileSystem) │ └── datalink/ ← NEW package - │ ├── SeqeraDataLinkClient.groovy ← NEW (typed client; returns iterators and PagedDataLinkContent) - │ └── PagedDataLinkContent.groovy ← NEW (lazy 
pagination view over data-link content) + │ ├── SeqeraDataLinkClient.groovy ← NEW (typed client; getDataLink uses combined keyword search; named static fetchers) + │ └── PagedIterable.groovy ← NEW (generic lazy pagination — eager first page, lazy subsequent) └── test/io/seqera/tower/plugin/ ├── fs/ │ ├── SeqeraPathTest.groovy ← extended (sub-path cases, cachedAttributes, trailing slash) - │ ├── SeqeraFileSystemTest.groovy ← extended (handler registry) - │ ├── SeqeraFileSystemProviderTest.groovy ← extended (data-link dispatch specs) + │ ├── SeqeraFileSystemTest.groovy ← extended (TowerClient ctor, getUserId caching, handler registry) + │ ├── SeqeraFileSystemProviderTest.groovy ← extended (data-link dispatch specs, DirectoryStream one-shot, cachedAttributes propagation) │ ├── ResourceTypeAbstractionTest.groovy ← NEW (architectural guard) │ └── handler/ │ ├── DatasetsResourceHandlerTest.groovy ← NEW (caches, attr short-circuit) - │ └── DataLinksResourceHandlerTest.groovy ← NEW (cache, credentialsId, paged listings) + │ └── DataLinksResourceHandlerTest.groovy ← NEW (parent-browse, cache, credentialsId, paged listings) └── datalink/ - └── SeqeraDataLinkClientTest.groovy ← NEW (pagination, endpoint URLs, error mapping) + └── SeqeraDataLinkClientTest.groovy ← NEW (pagination, getDataLink combined search, endpoint URLs, error mapping) ``` -**Structure decision**: Parallel `datalink/` package mirrors the existing `dataset/` package. Handlers live in `fs/handler/` so the generic NIO classes in `fs/` remain resource-type-agnostic. All wire DTOs are reused from `io.seqera.tower.model.*` — no plugin-local DTO classes. `PagedDataLinkContent` is a plugin-local service type (not a DTO) that wraps lazy pagination over `DataLinkItem` streams. +**Structure decision**: Parallel `datalink/` package mirrors the existing `dataset/` package. Handlers live in `fs/handler/` so the generic NIO classes in `fs/` remain resource-type-agnostic. 
User/workspace lookup lives on `SeqeraFileSystem` (shared infrastructure across resource types), not on a resource-type client. All wire DTOs are reused from `io.seqera.tower.model.*` — no plugin-local DTO classes. `PagedIterable` is a plugin-local generic service type (not a DTO) that captures the eager-first-page + lazy-subsequent-page contract used by all paginated endpoints. --- @@ -114,7 +114,9 @@ All reused from `io.seqera:tower-api:1.121.0` (already on the classpath): | Operation | Endpoint | Notes | |---|---|---| -| List data-links in workspace | `GET /data-links?workspaceId=&max=&offset=` | Offset pagination. `totalSize` = full count; `max=100` per page. Optional `&search=` used by `getDataLink` for server-side pre-filter. | +| User-id lookup | `GET /user-info` | Called from `SeqeraFileSystem.getUserId()`; result cached for the lifetime of the filesystem. | +| Workspaces for user | `GET /user/{userId}/workspaces` | Called from `SeqeraFileSystem.loadOrgWorkspaceCache()`. | +| List data-links in workspace | `GET /data-links?workspaceId=&max=&offset=` | Offset pagination. `totalSize` = full count; `max=100` per page. Optional `&search=` keyword filter; `getDataLink` uses `&search=+provider:` (URL-encoded) for a single server-side resolution. | | Browse root of a data-link | `GET /data-links/{id}/browse?workspaceId=` | Token pagination via `nextPageToken`. Optional `credentialsId`. | | Browse a sub-path | `GET /data-links/{id}/browse/{path}?workspaceId=` | Same response and pagination as the root variant. The `{path}` segment preserves `/` as path separators. | | Pre-signed download URL | `GET /data-links/{id}/generate-download-url?workspaceId=&filePath=` | Returns `DataLinkDownloadUrlResponse.url`. Optional `credentialsId`. 
| @@ -172,12 +174,11 @@ interface ResourceTypeHandler { /** open a read stream for a leaf path; throw if the path is a directory */ InputStream newInputStream(SeqeraPath path) throws IOException - - /** verify the path exists and modes are satisfiable; READ allowed, WRITE/EXECUTE rejected */ - void checkAccess(SeqeraPath path, AccessMode... modes) throws IOException } ``` +`checkAccess` is **not** on this interface — `SeqeraFileSystemProvider.checkAccess` rejects WRITE/EXECUTE upfront and delegates existence-check to `h.readAttributes(sp)`. + Handlers build each child path via `parent.resolveWithAttributes(segmentName, attrs)` so subsequent `readAttributes` calls short-circuit when the same path is used. ### `SeqeraDataLinkClient` contract @@ -187,28 +188,30 @@ class SeqeraDataLinkClient { SeqeraDataLinkClient(TowerClient towerClient) /** - * Lazy iterator over every data-link in the workspace. Pages fetched on demand - * via GET /data-links?workspaceId=&max=100&offset=. + * Iterator over every data-link in the workspace, backed by PagedIterable. + * The first page is fetched eagerly (early-fail at the call site); later pages + * are fetched on demand. Endpoint: GET /data-links?workspaceId=&max=100&offset=. */ - Iterator listDataLinks(long workspaceId) + Iterator listDataLinks(long workspaceId) throws IOException /** - * Server-side-filtered resolution of a single data-link by (provider, name). - * Iterates /data-links with &search=, short-circuits on first match; - * result is @Memoized per (workspaceId, provider, name). - * Throws NoSuchFileException if not found. + * Resolve a single data-link by (provider, name) using the Platform's combined + * keyword search: `search= provider:`. Server returns at most + * one matching data-link; this method takes the first result and returns null + * on miss. @Memoized per (ws, provider, name) — both successful resolutions + * AND null misses are cached for the lifetime of the client. 
*/ - DataLinkDto getDataLink(long workspaceId, String provider, String name) + DataLinkDto getDataLink(long workspaceId, String provider, String name) throws IOException - /** Distinct provider identifiers present in the workspace (sorted). */ + /** Distinct provider identifiers present in the workspace (sorted, unmodifiable). @Memoized. */ Set getDataLinkProviders(long workspaceId) /** - * Lazy paginated view over /data-links/{id}/browse[/{path}]. - * The returned PagedDataLinkContent loads the first page eagerly and paginates - * subsequent pages as its iterator advances. + * Lazy paginated view over /data-links/{id}/browse[/{path}]. PagedIterable + * loads the first page eagerly and paginates subsequent pages on demand. + * Optional &search= for server-side prefix filter on entry names. */ - PagedDataLinkContent getContent(String dataLinkId, String subPath, long workspaceId, String credentialsId = null) + PagedIterable getContent(String dataLinkId, String subPath, long workspaceId, String credentialsId = null, String search = null) throws IOException /** GET /data-links/{id}/generate-download-url?filePath=[&credentialsId=] */ DataLinkDownloadUrlResponse getDownloadUrl(String dataLinkId, String subPath, long workspaceId, String credentialsId = null) @@ -217,43 +220,63 @@ class SeqeraDataLinkClient { All endpoints translate 401/403/404/5xx through the same `checkFsResponse` pattern used in `SeqeraDatasetClient`. The `credentialsId` parameter is forwarded as a query-string value when non-null; the handler sources it from `DataLinkDto.credentials[0].id`. -### `PagedDataLinkContent` contract +Pagination is delegated to two named static fetchers nested inside the client: +- `DataLinkListFetcher` — implements `PagedIterable.NextPageFetcher`; offset-paginated; cursor state is `(offset, total)` instance fields. +- `DataLinkContentFetcher` — implements `PagedIterable.NextPageFetcher`; token-paginated; cursor state is `nextToken` instance field. 
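The offset-cursor shape described for `DataLinkListFetcher` can be sketched as follows. The `api` callback is a hypothetical stand-in for the `GET /data-links` call, and the minimal `Page` class mirrors the planned `PagedIterable.Page`; none of these names are the plugin's actual source.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Offset-paginated fetcher: cursor state (offset, total) lives in the fetcher instance.
class OffsetFetcherSketch {
    static final class Page {
        final List<String> items; final boolean isLast;
        Page(List<String> items, boolean isLast) { this.items = items; this.isLast = isLast; }
    }

    private final int max;
    private int offset = 0;
    private int total = -1;   // unknown until the first response reports totalSize
    private final BiFunction<Integer, Integer, Map<String, Object>> api; // stand-in for the HTTP call

    OffsetFetcherSketch(int max, BiFunction<Integer, Integer, Map<String, Object>> api) {
        this.max = max; this.api = api;
    }

    @SuppressWarnings("unchecked")
    Page fetch() throws IOException {
        Map<String, Object> resp = api.apply(offset, max);    // {items: [...], totalSize: N}
        List<String> items = (List<String>) resp.get("items");
        total = (Integer) resp.get("totalSize");
        offset += items.size();                               // advance the cursor
        return new Page(items, offset >= total);              // last page once cursor reaches totalSize
    }
}
```

A token-paginated fetcher is structurally identical; its only cursor state is the `nextPageToken` returned by the previous response.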
+ +### `PagedIterable` contract + +Generic lazy-pagination abstraction shared by all paginated endpoints. Eager first page (so `IOException` surfaces at the call site, not at the first `Iterator.hasNext()`); later pages on demand. Fetch failures during iteration wrap in `UncheckedIOException`. ```groovy -class PagedDataLinkContent implements Iterable { - /** Page fetcher: fetch(null) -> first page; fetch(token) -> next page. */ - static interface PageFetcher { - Map fetch(String nextPageToken) throws IOException - // returns: {objects: List, nextPageToken: String, originalPath: String (first page only)} +class PagedIterable implements Iterable { + /** Stateful "give me the next page" callback. Cursor lives in the implementation. */ + static interface NextPageFetcher { + Page fetch() throws IOException + } + + static class Page { + final List items + final boolean isLast // true → no more pages after this one } - PagedDataLinkContent(String originalPath, List firstPage, String firstPageNextToken, PageFetcher pageFetcher) + /** Eagerly fetch the first page; later pages on demand. */ + static PagedIterable start(NextPageFetcher fetcher) throws IOException - String getOriginalPath() - List getFirstPage() // eager, already loaded at construction + List getFirstPage() // eager, already loaded boolean isEmpty() - Iterator iterator() // yields first-page items, then paginates lazily + Iterator iterator() // yields first-page items, then paginates lazily } ``` -### `SeqeraFileSystem` handler registry +### `SeqeraFileSystem` shape + +Holds a `TowerClient` directly and exposes shared (across resource types) infrastructure: ```groovy class SeqeraFileSystem extends FileSystem { - // existing org/workspace state (unchanged) - private final Map handlers = new LinkedHashMap<>() + SeqeraFileSystem(SeqeraFileSystemProvider provider, TowerClient towerClient) + + /** Cached for the lifetime of the FS — token doesn't change during a run. 
*/ + long getUserId() throws IOException - void registerHandler(ResourceTypeHandler h) { handlers.put(h.resourceType, h) } - ResourceTypeHandler getHandler(String type) { handlers.get(type) } - Set getResourceTypes() { Collections.unmodifiableSet(handlers.keySet()) } + void loadOrgWorkspaceCache() + Set listOrgNames() + List listWorkspaceNames(String org) + long resolveWorkspaceId(String org, String workspace) throws NoSuchFileException + + void registerHandler(ResourceTypeHandler h) + ResourceTypeHandler getHandler(String type) + Set getResourceTypes() } ``` -`SeqeraFileSystemProvider.newFileSystem()` registers both handlers after constructing the filesystem: +`SeqeraFileSystemProvider.newFileSystem()` constructs the filesystem with the `TowerClient` and registers both handlers: ```groovy -fs.registerHandler(new DatasetsResourceHandler(fs, new SeqeraDatasetClient(towerClient))) -fs.registerHandler(new DataLinksResourceHandler(fs, new SeqeraDataLinkClient(towerClient))) +fileSystem = new SeqeraFileSystem(this, towerClient) +fileSystem.registerHandler(new DatasetsResourceHandler(fileSystem, new SeqeraDatasetClient(towerClient))) +fileSystem.registerHandler(new DataLinksResourceHandler(fileSystem, new SeqeraDataLinkClient(towerClient))) ``` ### Dispatch in `SeqeraFileSystemProvider` @@ -281,20 +304,23 @@ Internal fields become `(directory, size, lastModified, created, fileKey)`. 
`Dat |---|---|---| | `data-links/` (trail=[]) | `list` | `client.getDataLinkProviders(ws)` → distinct providers (sorted); emit child paths `data-links/` | | `data-links/` (trail=[p]) | `list` | stream `client.listDataLinks(ws)`; collect names where provider matches; emit child paths; `NoSuchFileException` if none match | -| `data-links//` (trail=[p,n]) | `list` | `client.getDataLink(ws, p, n)` → `dl`; `client.getContent(dl.id, "", ws, credentialsIdOf(dl))` → wrap items as `Iterable` carrying cached `SeqeraFileAttributes` | +| `data-links//` (trail=[p,n]) | `list` | `requireDataLink(ws, p, n, dir)` → `dl`; `client.getContent(dl.id, "", ws, credentialsIdOf(dl))` → wrap items as `Iterable` carrying cached `SeqeraFileAttributes` | | `data-links////…` (trail ≥ 3) | `list` | same as above with `subPath = trail[2..].join('/')` | -| any depth ≥ 3 | `readAttributes` | short-circuit if `p.cachedAttributes` is set; else: data-link-root → directory; deeper → `getContent(id, sub, ws, credentialsIdOf(dl)).firstPage`; if one item matching the last segment with `type = FILE`, return file attrs (size from item); otherwise → directory | -| leaf file | `newInputStream` | `client.getDataLink(ws, p, n)` → `dl`; `client.getDownloadUrl(dl.id, sub, ws, credentialsIdOf(dl))`; open a plain JDK `HttpClient.send(..., BodyHandlers.ofInputStream())` against `response.url`; return body stream | +| `data-links/` or `data-links/` | `readAttributes` | trail=[] → directory; trail=[p] → validate via `client.getDataLinkProviders(ws)`; throw if unknown | +| `data-links///` | `readAttributes` | short-circuit if `p.cachedAttributes` is set; else `requireDataLink(...)`; trail.size==2 → directory (data-link root); trail.size≥3 → **parent-browse**: list the parent directory and find the entry by name. Entry's `type` (FILE/FOLDER) is the authoritative signal; file → `(size, EPOCH, EPOCH, path)`; folder → directory marker; missing entry → `NoSuchFileException`. 
Server-side `&search=` narrows the parent listing. | +| leaf file | `newInputStream` | `requireDataLink(...)`; `client.getDownloadUrl(dl.id, sub, ws, credentialsIdOf(dl))`; open a plain JDK `HttpClient.send(..., BodyHandlers.ofInputStream())` against `response.url`; return body stream | `credentialsIdOf(dl)` returns `dl.credentials[0].id` when non-empty, else `null` (query parameter omitted). -Provider segment canonicalization: the path segment is the `DataLinkProvider` enum's `toString()` — lowercase (e.g. `aws`, `google`, `azure`). A path with an unknown provider segment fails via `client.getDataLink(...)` → `NoSuchFileException`. +`requireDataLink(ws, provider, name, pathForErrors)` calls `client.getDataLink(...)` and throws `NoSuchFileException` if the result is `null` — uniform error message across the three call sites. + +Provider segment canonicalization: the path segment is the `DataLinkProvider` enum's `toString()` — lowercase (e.g. `aws`, `google`, `azure`). A path with an unknown provider segment fails via `client.getDataLink(...) == null` → `requireDataLink` → `NoSuchFileException`. -Listings populate cached attributes on each emitted `SeqeraPath` (via `parent.resolveWithAttributes(name, attrs)`) so a follow-up `readAttributes(child)` returns immediately with zero API calls. Attributes come directly from each `DataLinkItem`: file → `(size, Instant.EPOCH, Instant.EPOCH, item.name)`; folder → `SeqeraFileAttributes(true)`. +Listings populate cached attributes on each emitted `SeqeraPath` (via `parent.resolveWithAttributes(name, attrs)`) so a follow-up `readAttributes(child)` returns immediately with zero API calls. Attributes come directly from each `DataLinkItem`: file → `(size, Instant.EPOCH, Instant.EPOCH, item.name)`; folder → `SeqeraFileAttributes(true)`. The provider also writes back resolved attributes onto the path after a fresh `readAttributes`, so subsequent reads on the same instance also hit the cache. 
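The parent-browse `readAttributes` row above can be sketched as a small resolution function. `Item` and `Attrs` are illustrative stand-ins for `DataLinkItem` and `SeqeraFileAttributes`, and `parentListing` stands in for the parent-directory browse response; this is a sketch of the decision, not the handler's code.

```java
import java.nio.file.NoSuchFileException;
import java.util.List;

// Parent-browse attribute resolution: the entry's type (FILE/FOLDER) is the
// authoritative file-vs-directory signal; a missing entry -> NoSuchFileException.
class ParentBrowseSketch {
    static final class Item {                    // stand-in for DataLinkItem
        final String name; final String type; final long size;
        Item(String name, String type, long size) { this.name = name; this.type = type; this.size = size; }
    }

    static final class Attrs {                   // stand-in for SeqeraFileAttributes
        final boolean directory; final long size;
        Attrs(boolean directory, long size) { this.directory = directory; this.size = size; }
    }

    static Attrs readAttributes(List<Item> parentListing, String name) throws NoSuchFileException {
        for (Item it : parentListing) {
            if (!it.name.equals(name)) continue;
            // FILE -> regular file with the reported size; FOLDER -> directory marker.
            return "FILE".equals(it.type) ? new Attrs(false, it.size) : new Attrs(true, 0);
        }
        throw new NoSuchFileException(name);     // entry absent from the parent directory
    }
}
```

Because the listing already carries each entry's type and size, this is one API call per uncached path, and any entry emitted by a prior `newDirectoryStream` never reaches this function at all (its attributes are already cached on the path).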
### Data-link identity resolution -`client.getDataLink(workspaceId, provider, name)` iterates `/data-links?search=` (server-side pre-filter) and returns the first entry whose `provider.toString() == providerSegment`. Memoized via `@Memoized` per `(workspaceId, provider, name)` — repeated handler calls within a run hit the memoization cache. The handler does NOT maintain its own `Map>` cache — the client-level streaming iterator plus memoized-lookup replaces it. +`client.getDataLink(workspaceId, provider, name)` issues a single combined keyword search (`&search=+provider:`, URL-encoded) — the Platform returns at most the matching data-link, and the method returns the first result or `null`. `@Memoized` per `(workspaceId, provider, name)` — both successful resolutions AND `null` misses are cached for the lifetime of the client. The handler does NOT maintain its own data-link cache; the client-level memoization replaces it. Why no client-side iterate-and-filter: the server-side `provider:` keyword filter eliminates the need. --- diff --git a/specs/260422-seqera-datalinks-fs/spec.md b/specs/260422-seqera-datalinks-fs/spec.md index 43b782d69a..80b5050438 100644 --- a/specs/260422-seqera-datalinks-fs/spec.md +++ b/specs/260422-seqera-datalinks-fs/spec.md @@ -17,12 +17,13 @@ - Q: How deep can a data-link path go? → A: Arbitrary depth below the data-link root. Each segment after `` is an entry inside the underlying bucket/prefix — a directory or file, resolved via the Platform browse API. - Q: How should the existing dataset filesystem code be extended to accommodate data-links? → A: Introduce a true resource-type abstraction (`ResourceTypeHandler`). The current dataset-specific logic in `SeqeraFileSystemProvider`, `SeqeraFileSystem`, and `SeqeraPath` is extracted into a `DatasetsResourceHandler`; data-links are added as a parallel `DataLinksResourceHandler`. The core path/filesystem/provider classes become resource-type-agnostic. 
- Q: How should the listing vs I/O boundary work? → A: Listing (`newDirectoryStream`) and attributes (`readAttributes`) are resolved via the Platform's browse endpoints (`GET /data-links/{id}/browse` for the data-link root and `GET /data-links/{id}/browse/{path}` for sub-paths). Downloads (`newInputStream`) go through `GET /data-links/{id}/generate-download-url?filePath=` to obtain a pre-signed URL, which is then fetched with a plain JDK `HttpClient` (no Seqera auth header on the cloud-backed URL). No cloud SDK is used.
-- Q: Which DTOs are introduced by this feature? → A: None. All types are reused from the `io.seqera:tower-api:1.121.0` dependency (`DataLinkDto`, `DataLinkItem`, `DataLinkProvider`, `DataLinkCredentials`, `DataLinkContentResponse`, `DataLinkDownloadUrlResponse`, etc.). A plugin-local `PagedDataLinkContent` holder class wraps the eager-first-page + lazy-pagination behavior but holds only tower-api types.
-- Q: Is browse-per-file supported by the Platform API? → A: Yes. `GET /data-links/{id}/browse/{path}` works for both directories and files, so `readAttributes` on any path is a single targeted call — no parent-browse-and-filter, no N+1 problem.
-- Q: How are paginated Platform responses returned to callers? → A: Streaming. The workspace data-link list (`GET /data-links`) returns an `Iterator` that fetches offsets on demand. The browse endpoint returns a `PagedDataLinkContent` that loads the first page eagerly (so `readAttributes` can inspect it without iterating) and fetches subsequent pages lazily as the iterator advances. The handler layer exposes `Iterable` to the NIO `DirectoryStream`; no full materialization of listings in memory.
-- Q: What convenience methods does the client expose on top of the raw list endpoint? → A: Two memoized helpers — `getDataLink(ws, provider, name)` uses the server-side `&search=` filter and returns the first match (throws `NoSuchFileException` on miss); `getDataLinkProviders(ws)` returns the sorted set of distinct providers present in the workspace. Both are memoized per-arguments within a single `SeqeraDataLinkClient` instance.
-- Q: How are attributes discovered after a listing? → A: When `newDirectoryStream` yields a child path, the handler attaches the per-item attributes (size for files, directory marker for folders) to the `SeqeraPath` via an optional cache field. A subsequent `readAttributes` on that path returns the cached value without any additional Platform API call. Paths parsed from URIs (no prior listing) fall back to the live browse endpoint.
+- Q: Which DTOs are introduced by this feature? → A: None. All types are reused from the `io.seqera:tower-api:1.121.0` dependency (`DataLinkDto`, `DataLinkItem`, `DataLinkProvider`, `DataLinkCredentials`, `DataLinkContentResponse`, `DataLinkDownloadUrlResponse`, etc.). A plugin-local generic `PagedIterable` (with inner `NextPageFetcher` and `Page`) provides the lazy-pagination abstraction shared by all paginated endpoints — it is a service type, not a DTO.
+- Q: Is browse-per-file reliable for distinguishing file vs directory? → A: No — the Platform's `/browse/{path}` response shape (with a populated `originalPath`) does not reliably distinguish "file path", "directory path", and "missing path". `readAttributes` on a sub-path therefore lists the **parent** directory and finds the entry by name; the entry's `type` (FILE/FOLDER) is the authoritative signal, and a missing entry → `NoSuchFileException`. This also resolves the directory-vs-file ambiguity for buckets containing a same-named file inside.
+- Q: How are paginated Platform responses returned to callers? → A: Streaming, via a single generic `PagedIterable` abstraction. The first page is fetched eagerly when the iterable is constructed (so any `IOException` surfaces at the call site, not at the first `hasNext()`); subsequent pages are fetched on demand as the iterator advances. The workspace data-link list (`GET /data-links`) is exposed as `Iterator<DataLinkDto>`; data-link content is exposed as `PagedIterable<DataLinkItem>`. Two named static fetchers — `DataLinkListFetcher` (offset-paginated) and `DataLinkContentFetcher` (token-paginated) — own their cursor state. The handler layer exposes `Iterable<Path>` to the NIO `DirectoryStream`; no full materialization of listings in memory.
+- Q: What convenience methods does the client expose on top of the raw list endpoint? → A: Two memoized helpers — `getDataLink(ws, provider, name)` issues the combined keyword search `<name> provider:<provider>` so the server returns at most the matching data-link; the method takes the first result and returns `null` on miss (memoized, including null misses). `getDataLinkProviders(ws)` returns the sorted, unmodifiable set of distinct providers present in the workspace.
+- Q: How are attributes discovered after a listing? → A: When `newDirectoryStream` yields a child path, the handler attaches the per-item attributes (size for files, directory marker for folders) to the `SeqeraPath` via an optional cache field. The provider's `readAttributes` also writes resolved attributes back onto the path, so subsequent reads on the same path instance hit the cache. Paths parsed from URIs (no prior listing) trigger a parent-browse on first read.
- Q: How are cloud credentials for the underlying bucket/prefix selected? → A: The Platform's `DataLinkDto.credentials` list associates one or more credential records with a data-link. The plugin forwards the first credential's ID as the `credentialsId` query parameter on browse and download-URL requests, when present. If the data-link has no associated credentials, the parameter is omitted and the Platform uses its default resolution.
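The eager-first-page / lazy-next-page contract described above can be sketched as follows. This is a minimal illustrative model, not the plugin's implementation: the class and inner-type names (`PagedIterable`, `NextPageFetcher`, `Page`) follow the spec, but the bodies, the element type, and the single-use-iterable simplification are assumptions.

```java
import java.util.Iterator;
import java.util.List;

// Sketch: a page of results plus a flag telling the iterable whether
// another page exists. The real fetchers wrap tower-api DTO pages.
class Page<T> {
    final List<T> items;
    final boolean last;                       // no further pages after this one
    Page(List<T> items, boolean last) { this.items = items; this.last = last; }
}

// The fetcher owns its cursor state (offset or continuation token).
interface NextPageFetcher<T> {
    Page<T> fetchNext();
}

// Single-use iterable: first page fetched eagerly in the constructor so
// failures surface at the call site, not at the first hasNext().
class PagedIterable<T> implements Iterable<T> {
    private final NextPageFetcher<T> fetcher;
    private final Page<T> first;

    PagedIterable(NextPageFetcher<T> fetcher) {
        this.fetcher = fetcher;
        this.first = fetcher.fetchNext();     // eager first page
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Page<T> page = first;
            private int pos = 0;

            @Override
            public boolean hasNext() {
                // Lazily pull the next page (skipping empty ones) only
                // when the current page is exhausted.
                while (pos >= page.items.size() && !page.last) {
                    page = fetcher.fetchNext();
                    pos = 0;
                }
                return pos < page.items.size();
            }

            @Override
            public T next() { return page.items.get(pos++); }
        };
    }
}
```

A handler can hand this straight to a `DirectoryStream` without materializing the listing; only the pages actually consumed are ever requested.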
+- Q: Where does the workspace user-id resolution live? → A: On `SeqeraFileSystem`. The filesystem holds a `TowerClient` directly; `getUserId()` is exposed and cached for the lifetime of the filesystem instance (the access token does not change during a pipeline run, so neither does the resolved user). The dataset and data-link clients no longer carry user/workspace lookup methods — those are shared infrastructure used by every resource type.
- Q: Which provider-segment value appears in user-visible paths? → A: The lowercase value of the `DataLinkProvider` enum, as exposed by its `toString()` (e.g. `aws`, `google`, `azure`). This matches the Platform UI.
- Q: What happens if the pre-signed URL expires during a long read? → A: The underlying HTTP connection errors out with an `IOException`. The plugin does not transparently re-issue URLs; Nextflow's task retry handles the failure as it already does for other transient I/O errors.
@@ -124,19 +125,21 @@ A Nextflow or Seqera engineer wants the filesystem's resource-type abstraction t
- `seqera://<org>/<workspace>/data-links/<provider>/<name>/` → directory; entries are the top-level items in the data-link.
- `seqera://<org>/<workspace>/data-links/<provider>/<name>/<sub-path>/` → directory; entries are the children at that sub-path.
- `seqera://<org>/<workspace>/data-links/<provider>/<name>/<sub-path>/<file>` → file.
-- **FR-005**: System MUST return correct `BasicFileAttributes` — `isDirectory`, `isRegularFile`, `size`, `lastModifiedTime`, `creationTime` — for any path inside a data-link. When a path was produced by a prior `newDirectoryStream` listing, its attributes MUST be returned from the listing response without a follow-up API call. Paths parsed from a URI (no prior listing) MUST source attributes from the Platform's browse endpoint for that specific path.
+- **FR-005**: System MUST return correct `BasicFileAttributes` — `isDirectory`, `isRegularFile`, `size`, `lastModifiedTime`, `creationTime` — for any path inside a data-link. When a path was produced by a prior `newDirectoryStream` listing, its attributes MUST be returned from the listing response without a follow-up API call. Paths parsed from a URI (no prior listing) MUST source attributes by listing the **parent** directory and finding the entry by name — the entry's `type` (FILE/FOLDER) is the authoritative file-vs-directory signal; a missing entry MUST surface as `NoSuchFileException`. The Platform's `/browse/{path}` response shape alone does not reliably distinguish file paths, directory paths, and missing paths, so a parent-browse strategy is required.
- **FR-006**: System MUST treat data-link paths as read-only in this iteration. Any write-like operation (`newByteChannel` with `WRITE`/`APPEND`, `copy` with a data-link as target, `delete`, `createDirectory`, `move`) MUST fail with `UnsupportedOperationException` or `AccessDeniedException`, consistent with the dataset feature's read-only stance.
- **FR-007**: System MUST produce clear, actionable error messages distinguishing: unknown org/workspace, unknown provider, unknown data-link name, missing sub-path, unsupported resource type, authentication failure, and transient Platform errors.
- **FR-008**: System MUST NOT depend on `nf-amazon`, `nf-google`, or `nf-azure`. All cloud I/O is reduced to a single HTTPS fetch of a pre-signed URL. The signed URL is fetched with a plain JDK `HttpClient` — NOT through `TowerClient`, since the URL is addressed to the cloud backend and must not carry the Seqera `Authorization` header.
-- **FR-009**: System MUST reuse DTOs from `io.seqera:tower-api:1.121.0` (`DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkDownloadUrlResponse`, `DataLinkCredentials`, `DataLinkProvider`, etc.) without introducing parallel plugin-local DTOs. A plugin-local `PagedDataLinkContent` service type is permitted as a lazy-pagination wrapper around tower-api types.
+- **FR-009**: System MUST reuse DTOs from `io.seqera:tower-api:1.121.0` (`DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkDownloadUrlResponse`, `DataLinkCredentials`, `DataLinkProvider`, etc.) without introducing parallel plugin-local DTOs. A plugin-local generic `PagedIterable` service type (with inner `NextPageFetcher` and `Page`) is permitted as a shared lazy-pagination abstraction.
- **FR-010**: System MUST refactor the existing `fs/` package to introduce a `ResourceTypeHandler` interface. `DatasetsResourceHandler` MUST encapsulate all dataset-specific behavior previously inlined in `SeqeraFileSystemProvider` / `SeqeraFileSystem` / `SeqeraPath`. `DataLinksResourceHandler` MUST implement the same interface.
- **FR-011**: After the refactor, the classes `SeqeraPath`, `SeqeraFileSystem`, and `SeqeraFileSystemProvider` MUST contain no dataset- or data-link-specific logic for depth ≥ 3; all such logic MUST live in the respective handler.
- **FR-012**: `SeqeraPath` MUST parse and represent arbitrary sub-paths beyond depth 4 for resource types that support them (data-links). Datasets continue to reject sub-paths beyond depth 4.
- **FR-013**: The filesystem MUST reuse the existing `TowerClient` retry/backoff for all Platform API calls. No new retry logic is introduced.
- **FR-014**: Transient failure of a pre-signed URL fetch mid-stream MUST surface as `IOException`; Nextflow task retry handles the recovery. The plugin MUST NOT re-issue URLs transparently within a single `InputStream`.
- **FR-015**: System MUST NOT maintain a global or per-run cache of browse-result pages or pre-signed URLs. A cheap per-path attribute cache lives on each `SeqeraPath` instance returned by a listing (file size / directory flag captured from the listing item); this cache is scoped to the lifetime of that path object and is not shared across paths.
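The per-path attribute cache from FR-015, combined with the parent-browse strategy from FR-005, can be sketched as below. All names here (`CachingPath`, `ItemAttrs`, `attachAttributes`) are hypothetical stand-ins; the real `SeqeraPath` implements `java.nio.file.Path` and uses tower-api item types and `NoSuchFileException`.

```java
import java.util.List;
import java.util.Objects;

// Sketch: the size/directory pair a listing item carries.
class ItemAttrs {
    final boolean directory;
    final long size;
    ItemAttrs(boolean directory, long size) { this.directory = directory; this.size = size; }
}

// Sketch of a path with an optional per-instance attribute cache.
class CachingPath {
    final String name;
    private ItemAttrs cached;                 // null until a listing or a read resolves it

    CachingPath(String name) { this.name = name; }

    // Called by the directory-stream handler for each path it yields.
    void attachAttributes(ItemAttrs attrs) { this.cached = attrs; }

    // readAttributes: serve the cache when present; otherwise browse the
    // parent listing and find this entry by name (the FR-005 strategy),
    // writing the result back so later reads on this instance are free.
    ItemAttrs readAttributes(List<CachingPath> parentListing) {
        if (cached != null)
            return cached;                    // no extra Platform call
        for (CachingPath sibling : parentListing) {
            if (Objects.equals(sibling.name, name) && sibling.cached != null) {
                cached = sibling.cached;      // write-back cache
                return cached;
            }
        }
        // Stand-in for java.nio.file.NoSuchFileException.
        throw new RuntimeException("NoSuchFile: " + name);
    }
}
```

Because the cache lives on the path instance, nothing is shared across paths or runs, which is exactly the scope FR-015 permits.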
-- **FR-016**: Paginated Platform responses MUST be exposed to callers as lazy iterators — callers consume pages only as elements are requested. The workspace data-link list (`GET /data-links?workspaceId=X&max=&offset=`) MUST be returned as an `Iterator`; the data-link content endpoint (`GET /data-links/{id}/browse[/path]`) MUST be returned as a `PagedDataLinkContent` view backed by a lazy iterator of `DataLinkItem`.
+- **FR-016**: Paginated Platform responses MUST be exposed to callers via a single generic `PagedIterable` abstraction. The first page MUST be fetched eagerly so any `IOException` surfaces at the call site rather than at the first `Iterator.hasNext()`; subsequent pages MUST be fetched on demand as the iterator advances. The workspace data-link list (`GET /data-links?workspaceId=X&max=&offset=`) MUST be returned as an `Iterator<DataLinkDto>` backed by a `PagedIterable<DataLinkDto>`; data-link content (`GET /data-links/{id}/browse[/path]`) MUST be returned as a `PagedIterable<DataLinkItem>`.
- **FR-017**: System MUST forward the data-link's associated credentials identifier to the Platform when one is available. When `DataLinkDto.credentials` is non-empty, the first entry's `id` MUST be passed as the `credentialsId` query parameter on browse (`GET /data-links/{id}/browse[/path]`) and download-URL (`GET /data-links/{id}/generate-download-url`) requests. When the list is empty, the parameter MUST be omitted so the Platform applies its default resolution.
+- **FR-018**: User-id and workspace lookup MUST live on `SeqeraFileSystem` (not on a resource-type client) since they are shared infrastructure. `SeqeraFileSystem` MUST hold a `TowerClient` directly and expose a `getUserId()` method whose result is cached for the lifetime of the filesystem instance — the Seqera access token does not change during a pipeline run, and the resolved user is therefore stable.
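The two caching behaviors the spec requires — a user id resolved once per filesystem lifetime, and a data-link lookup memoized including `null` misses — can be sketched together. This is an assumed simplification: `MemoizingClient`, `fetchUserId`, and `search` are hypothetical stand-ins for the `TowerClient`-backed calls, and `Optional.empty()` is one way (not necessarily the plugin's) to record a cached miss.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of lifetime caching (user id) plus per-arguments memoization
// that also remembers misses, so an unknown data-link is queried once.
class MemoizingClient {
    private volatile Long userId;                       // resolved once; token is stable for the run
    private final Map<String, Optional<String>> links = new ConcurrentHashMap<>();
    private int remoteCalls = 0;                        // visible for the sketch only

    Long getUserId() {
        if (userId == null)
            userId = fetchUserId();                     // first call hits the Platform; later calls do not
        return userId;
    }

    // Optional.empty() records a miss, so repeated lookups of an unknown
    // data-link never re-query the Platform within this client's lifetime.
    String getDataLink(String workspace, String provider, String name) {
        String key = workspace + '|' + provider + '|' + name;
        return links.computeIfAbsent(key, k -> Optional.ofNullable(search(provider, name)))
                    .orElse(null);
    }

    private Long fetchUserId() { remoteCalls++; return 7L; }   // stand-in for the user-info call
    private String search(String provider, String name) {       // stand-in for the combined keyword search
        remoteCalls++;
        return "bucket".equals(name) ? "dl-123" : null;
    }

    int remoteCalls() { return remoteCalls; }
}
```

Returning `null` (rather than throwing) on a miss keeps the memoization table simple: both outcomes are plain values and cache identically.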
+- **FR-019**: `SeqeraDataLinkClient.getDataLink(workspaceId, provider, name)` MUST issue a single combined keyword search (`<name> provider:<provider>`) and take the first result, rather than fetching by name and filtering provider client-side. Both successful resolutions and `null` misses MUST be memoized for the lifetime of the client.

### Key Entities