Skip to content

storage cleanup & add S3 Storage for datasets (so far)#1256

Merged
alexcos20 merged 6 commits intomainfrom
feature/storage_cleanup
Mar 5, 2026
Merged

storage cleanup & add S3 Storage for datasets (so far)#1256
alexcos20 merged 6 commits intomainfrom
feature/storage_cleanup

Conversation

@alexcos20
Copy link
Member

@alexcos20 alexcos20 commented Mar 5, 2026

PR: Storage cleanup, S3 support, and documentation

Summary

Refactors the storage component into one file per class, adds S3-compatible storage support (AWS, Ceph, MinIO, etc.), adds documentation for all supported storage types (URL, IPFS, Arweave, S3), and improves type consistency for file objects.

Changes

1. Storage component refactor (src/components/storage/)

  • Split the former single-file implementation in index.ts into:
    • Storage.ts – Abstract base class Storage (validation, getFile, getReadableStream, getFileInfo, isEncrypted, canDecrypt, etc.). Uses StorageObject union (includes S3).
    • UrlStorage.tsUrlStorage (URL-backed files, GET/POST, optional headers, unsafeURLs).
    • ArweaveStorage.tsArweaveStorage (Arweave transaction ID, gateway URL).
    • IpfsStorage.tsIpfsStorage (IPFS CID, gateway URL).
    • getStorageClass.ts – Factory getStorageClass(file, config) that returns the appropriate concrete class; uses CORE_LOGGER.error for errors.
  • index.ts now only imports the above and re-exports Storage, UrlStorage, ArweaveStorage, IpfsStorage, S3Storage, and attaches getStorageClass to Storage.getStorageClass so existing call sites remain valid.

API: Unchanged for existing types. New type: 's3' file objects are supported via Storage.getStorageClass(...).

2. S3 storage (src/components/storage/S3Storage.ts)

  • New S3Storage class for S3-compatible backends (Amazon S3, Ceph, MinIO, DigitalOcean Spaces, etc.):
    • s3Access on the file object: endpoint, bucket, objectKey, accessKeyId, secretAccessKey; region optional (defaults to us-east-1 when omitted).
    • validate() – Requires s3Access and non-empty bucket, objectKey, endpoint, accessKeyId, secretAccessKey.
    • getReadableStream() – Uses AWS SDK v3 GetObjectCommand; builds client from s3Access (endpoint normalized to https:// when needed).
    • fetchSpecificFileMetadata() – Uses HeadObjectCommand for metadata (content length, type, name, optional ETag checksum).
    • getDownloadUrl() – Returns s3://bucket/objectKey (no credentials).
  • Wiring: FileObjectType.S3 in fileObject.ts; getStorageClass and index export S3Storage.

3. Storage documentation

  • Added / updated docs/Storage.md describing:
    • Supported storage types: URL, IPFS, Arweave, S3 (four backends).
    • For each type: file object shape (with JSON examples and field tables), validation rules, and required/optional node configuration.
    • S3: s3Access shape, optional region, no node-level S3 config (credentials and endpoint on the file object).
  • Updated README.md – Added a link to Storage Types in the Additional Resources section.

4. Type consistency (src/@types/fileObject.ts)

  • FileObjectType – Added S3 = 's3'.
  • S3Object and S3FileObject – New interfaces for S3 file objects; StorageObject union includes S3FileObject.
  • FileInfoHttpRequest.type – Uses FileObjectType enum instead of string union (includes s3).

Files touched

Path Change
src/components/storage/index.ts Re-exports only; added S3Storage
src/components/storage/Storage.ts New – base class (StorageObject)
src/components/storage/UrlStorage.ts New – URL storage
src/components/storage/ArweaveStorage.ts New – Arweave storage
src/components/storage/IpfsStorage.ts New – IPFS storage
src/components/storage/S3Storage.ts New – S3-compatible storage
src/components/storage/getStorageClass.ts New – factory (+ S3 case, CORE_LOGGER)
docs/Storage.md New – storage types doc (incl. S3)
README.md Link to Storage.md
src/@types/fileObject.ts FileObjectType.S3, S3Object, S3FileObject, StorageObject, FileInfoHttpRequest.type

Testing

  • npm run build – passes.
  • npm run lint – passes (no new errors).
  • Existing call sites unchanged; run nvm use then full test suite as needed.

Checklist

  • Storage logic preserved for URL, IPFS, Arweave; only file layout and exports refactored.
  • Public API preserved (Storage.getStorageClass, class exports).
  • S3 storage added with optional region; works with AWS and generic S3 (Ceph, MinIO, etc.).
  • Documentation added for all four types and linked from README.
  • Type usage aligned with FileObjectType; S3 types in fileObject.ts.

@alexcos20
Copy link
Member Author

/run-security-scan

Copy link
Member Author

@alexcos20 alexcos20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI automated code review (Gemini 3).

Overall risk: low

Summary:
This pull request introduces a significant refactoring of the storage component within the Ocean Node, implementing a new object-oriented design. It establishes an abstract Storage class with concrete implementations for UrlStorage, IpfsStorage, and ArweaveStorage. This greatly improves modularity, maintainability, and extensibility for handling different asset storage backends. Additionally, a comprehensive docs/Storage.md file has been added, clearly documenting each storage type, its configuration, and validation rules. The PR also includes a .cursor rule for nvm use to ensure consistent Node.js versions during testing.

Comments:
• [WARNING][style] The getFileInfo method in the abstract Storage class returns Promise<FileInfoResponse[]> but currently only processes this.getFile() (a single file) and pushes one item to the response array. If FileInfoRequest is meant to support multiple files, the getFile() method might need to return an array, or this method should be refactored to better reflect its actual single-file processing, perhaps returning Promise<FileInfoResponse> instead.
• [INFO][style] The eslint-disable comment // eslint-disable-next-line no-use-before-define -- static factory return type references this class on the static getStorageClass declaration is now less accurate as the implementation has moved to getStorageClass.ts. While the type declaration still refers to Storage, the no-use-before-define is mainly circumvented by the external assignment in index.ts. Consider updating or removing this comment if it's no longer strictly relevant to the current file.
• [INFO][style] The catch block uses console.error for logging. For consistency, consider using CORE_LOGGER.error here, similar to how it's used in Storage.ts line 99.
• [INFO][other] This new rule is a good addition for development consistency. Consider if it's possible to automate nvm use within the package.json test scripts (e.g., "test": "nvm use && npm run mocha") for a more seamless developer experience, preventing issues even if someone forgets to run nvm use manually.
• [INFO][style] For name: '', the comment in ArweaveStorage.ts explicitly states // Never send the file name for Arweave as it may leak the transaction ID. A similar comment here for IPFS would be useful for consistency and clarity, explaining why the name is intentionally empty.

@alexcos20
Copy link
Member Author

Added S3 Storage

@alexcos20
Copy link
Member Author

/run-security-scan

Copy link
Member Author

@alexcos20 alexcos20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI automated code review (Gemini 3).

Overall risk: medium

Summary:
This pull request introduces significant architectural improvements to the storage component of Ocean Node, adding support for S3-compatible object storage and refactoring existing storage handlers (URL, IPFS, Arweave) into a more modular, class-based structure. Key changes include: new dedicated classes for each storage type extending an abstract Storage class, a factory function (getStorageClass) for instantiation, updated @types/fileObject.ts to support S3, and comprehensive documentation in docs/Storage.md. Dependency updates in package.json and package-lock.json reflect the inclusion of the AWS S3 SDK. The changes enhance maintainability and extensibility of the storage layer.

Comments:
• [WARNING][style] The pattern of declaring static getStorageClass as a type on the abstract Storage class and then assigning the actual implementation at runtime in index.ts is unconventional for a factory. While it works, it bypasses direct static inheritance and can be less explicit. A more idiomatic approach would be to export getStorageClass directly from its module (getStorageClass.ts) and import it wherever needed, keeping the factory separate from the abstract base class it instantiates. This improves modularity and clarity.
• [INFO][other] The forcePathStyle: false setting in createS3Client configures the S3 client for virtual host addressing (e.g., bucket.s3.amazonaws.com). This is standard for AWS S3. However, some self-hosted S3-compatible services (like MinIO) often require forcePathStyle: true (e.g., s3.example.com/bucket). It might be beneficial to document this nuance, or consider making forcePathStyle configurable via OceanNodeConfig if broader S3-compatible deployments are expected, to prevent future compatibility issues.
• [WARNING][bug] The headers field in StorageReadable is defined as [any], which is an array of any. Here, [response.ContentType] (an array containing a single string) is assigned to it. However, HTTP headers are typically represented as key-value pairs (e.g., Record<string, string>). The current type definition and usage are inconsistent with standard HTTP header representation and might lead to issues if the consumer of StorageReadable.headers expects a key-value object rather than an array of values. Consider updating StorageReadable.headers to be Record<string, string | string[]> | undefined and populating it with key-value pairs ({ 'Content-Type': response.ContentType }).
• [ERROR][bug] The getFileInfo method wraps the call to fetchSpecificFileMetadata in a try...catch block. If fetchSpecificFileMetadata throws an error (e.g., due to a validation failure or network issue), the error is logged using CORE_LOGGER.error but then silently swallowed, and the method returns an empty array ([]). This behavior masks the underlying problem from the caller, making it difficult to understand why file info couldn't be retrieved. It's recommended to re-throw the error after logging it, or modify the FileInfoResponse structure to explicitly include error details when valid is false.
• [INFO][security] The isFilePath() method in UrlStorage.ts explicitly returns false if the URL starts with http:// or https://. This is generally correct for preventing misinterpretation of HTTP URLs as local paths. However, it implicitly relies on unsafeURLs configuration to catch any potentially malicious HTTP URLs. Ensure that the unsafeURLs regex patterns are robust enough to cover all relevant security considerations for URL access.

@alexcos20 alexcos20 marked this pull request as draft March 5, 2026 09:46
@alexcos20
Copy link
Member Author

/run-security-scan

@alexcos20 alexcos20 changed the title storage cleanup storage cleanup & add S3 Storage for datasets (so far) Mar 5, 2026
Copy link
Member Author

@alexcos20 alexcos20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI automated code review (Gemini 3).

Overall risk: medium

Summary:
This pull request introduces support for S3 as a new storage backend, significantly enhancing the flexibility of asset storage options. The core storage logic has been refactored into a more modular and extensible abstract Storage class with concrete implementations for URL, IPFS, Arweave, and S3. Comprehensive documentation for the new storage types has been added, along with updates to the package.json to include the AWS S3 SDK. The architectural changes improve maintainability and type safety across the storage components. Overall, this is a positive and impactful update that expands the capabilities of Ocean Node.

Comments:
• [WARNING][other] The newly added @aws-sdk/client-s3 dependency (version ^3.1002.0) in package-lock.json specifies an engines.node requirement of >=20.0.0. However, the project's package.json currently indicates "node": ">=18". Please verify that ocean-node officially supports Node.js >= 20, and if so, update the engines field in package.json to "node": ">=20" to reflect this new minimum requirement. If Node.js 18/19 support is still intended, this dependency might be incompatible and could cause runtime issues for users on those versions.
• [WARNING][security] The S3 storage type requires accessKeyId and secretAccessKey to be provided directly in the file object's s3Access configuration. While this enables flexible per-asset S3 access, it's crucial to highlight the security implications. If the DDO containing this information becomes publicly accessible (e.g., via IPFS or a misconfigured metadata store), these credentials could be exposed. It would be beneficial to add a prominent note in this documentation section (docs/Storage.md, specifically near the 'S3 storage' section) advising users on best practices for managing these credentials, such as using AWS IAM roles with temporary credentials, or other secure secrets management solutions, especially for public assets or sensitive data. This would guide users towards more secure configurations.
• [INFO][style] Minor formatting. Consider adding a space to align Arweave under Type in the table, e.g., | Arweave | arweave |.
• [INFO][style] The line credentials (see below). on line 139, followed by **s3Access fields:** on line 142 is a bit redundant. You could rephrase line 139 to credentials (see 's3Access fields below).for more direct guidance. • [INFO][other] Thenamefield forArweaveFileObjectis explicitly set to an empty string with the comment// Never send the file name for Arweave as it may leak the transaction ID. This is a good security measure to prevent exposure of sensitive transaction IDs. It would be valuable to add a similar brief explanation in the docs/Storage.md` under the Arweave section, so users understand this design choice and its implications for the file name.

@alexcos20 alexcos20 marked this pull request as ready for review March 5, 2026 10:13
@alexcos20 alexcos20 merged commit 23b18dd into main Mar 5, 2026
18 of 19 checks passed
@alexcos20 alexcos20 deleted the feature/storage_cleanup branch March 5, 2026 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants