storage cleanup & add S3 Storage for datasets (so far)#1256
Conversation
|
/run-security-scan |
alexcos20
left a comment
There was a problem hiding this comment.
AI automated code review (Gemini 3).
Overall risk: low
Summary:
This pull request introduces a significant refactoring of the storage component within the Ocean Node, implementing a new object-oriented design. It establishes an abstract Storage class with concrete implementations for UrlStorage, IpfsStorage, and ArweaveStorage. This greatly improves modularity, maintainability, and extensibility for handling different asset storage backends. Additionally, a comprehensive docs/Storage.md file has been added, clearly documenting each storage type, its configuration, and validation rules. The PR also includes a .cursor rule for nvm use to ensure consistent Node.js versions during testing.
Comments:
• [WARNING][style] The getFileInfo method in the abstract Storage class returns Promise<FileInfoResponse[]> but currently only processes this.getFile() (a single file) and pushes one item to the response array. If FileInfoRequest is meant to support multiple files, the getFile() method might need to return an array, or this method should be refactored to better reflect its actual single-file processing, perhaps returning Promise<FileInfoResponse> instead.
• [INFO][style] The eslint-disable comment // eslint-disable-next-line no-use-before-define -- static factory return type references this class on the static getStorageClass declaration is now less accurate as the implementation has moved to getStorageClass.ts. While the type declaration still refers to Storage, the no-use-before-define is mainly circumvented by the external assignment in index.ts. Consider updating or removing this comment if it's no longer strictly relevant to the current file.
• [INFO][style] The catch block uses console.error for logging. For consistency, consider using CORE_LOGGER.error here, similar to how it's used in Storage.ts line 99.
• [INFO][other] This new rule is a good addition for development consistency. Consider if it's possible to automate nvm use within the package.json test scripts (e.g., "test": "nvm use && npm run mocha") for a more seamless developer experience, preventing issues even if someone forgets to run nvm use manually.
• [INFO][style] For name: '', the comment in ArweaveStorage.ts explicitly states // Never send the file name for Arweave as it may leak the transaction ID. A similar comment here for IPFS would be useful for consistency and clarity, explaining why the name is intentionally empty.
|
Added S3 Storage |
|
/run-security-scan |
alexcos20
left a comment
There was a problem hiding this comment.
AI automated code review (Gemini 3).
Overall risk: medium
Summary:
This pull request introduces significant architectural improvements to the storage component of Ocean Node, adding support for S3-compatible object storage and refactoring existing storage handlers (URL, IPFS, Arweave) into a more modular, class-based structure. Key changes include: new dedicated classes for each storage type extending an abstract Storage class, a factory function (getStorageClass) for instantiation, updated @types/fileObject.ts to support S3, and comprehensive documentation in docs/Storage.md. Dependency updates in package.json and package-lock.json reflect the inclusion of the AWS S3 SDK. The changes enhance maintainability and extensibility of the storage layer.
Comments:
• [WARNING][style] The pattern of declaring static getStorageClass as a type on the abstract Storage class and then assigning the actual implementation at runtime in index.ts is unconventional for a factory. While it works, it bypasses direct static inheritance and can be less explicit. A more idiomatic approach would be to export getStorageClass directly from its module (getStorageClass.ts) and import it wherever needed, keeping the factory separate from the abstract base class it instantiates. This improves modularity and clarity.
• [INFO][other] The forcePathStyle: false setting in createS3Client configures the S3 client for virtual host addressing (e.g., bucket.s3.amazonaws.com). This is standard for AWS S3. However, some self-hosted S3-compatible services (like MinIO) often require forcePathStyle: true (e.g., s3.example.com/bucket). It might be beneficial to document this nuance, or consider making forcePathStyle configurable via OceanNodeConfig if broader S3-compatible deployments are expected, to prevent future compatibility issues.
• [WARNING][bug] The headers field in StorageReadable is defined as [any], which is an array of any. Here, [response.ContentType] (an array containing a single string) is assigned to it. However, HTTP headers are typically represented as key-value pairs (e.g., Record<string, string>). The current type definition and usage are inconsistent with standard HTTP header representation and might lead to issues if the consumer of StorageReadable.headers expects a key-value object rather than an array of values. Consider updating StorageReadable.headers to be Record<string, string | string[]> | undefined and populating it with key-value pairs ({ 'Content-Type': response.ContentType }).
• [ERROR][bug] The getFileInfo method wraps the call to fetchSpecificFileMetadata in a try...catch block. If fetchSpecificFileMetadata throws an error (e.g., due to a validation failure or network issue), the error is logged using CORE_LOGGER.error but then silently swallowed, and the method returns an empty array ([]). This behavior masks the underlying problem from the caller, making it difficult to understand why file info couldn't be retrieved. It's recommended to re-throw the error after logging it, or modify the FileInfoResponse structure to explicitly include error details when valid is false.
• [INFO][security] The isFilePath() method in UrlStorage.ts explicitly returns false if the URL starts with http:// or https://. This is generally correct for preventing misinterpretation of HTTP URLs as local paths. However, it implicitly relies on unsafeURLs configuration to catch any potentially malicious HTTP URLs. Ensure that the unsafeURLs regex patterns are robust enough to cover all relevant security considerations for URL access.
|
/run-security-scan |
alexcos20
left a comment
There was a problem hiding this comment.
AI automated code review (Gemini 3).
Overall risk: medium
Summary:
This pull request introduces support for S3 as a new storage backend, significantly enhancing the flexibility of asset storage options. The core storage logic has been refactored into a more modular and extensible abstract Storage class with concrete implementations for URL, IPFS, Arweave, and S3. Comprehensive documentation for the new storage types has been added, along with updates to the package.json to include the AWS S3 SDK. The architectural changes improve maintainability and type safety across the storage components. Overall, this is a positive and impactful update that expands the capabilities of Ocean Node.
Comments:
• [WARNING][other] The newly added @aws-sdk/client-s3 dependency (version ^3.1002.0) in package-lock.json specifies an engines.node requirement of >=20.0.0. However, the project's package.json currently indicates "node": ">=18". Please verify that ocean-node officially supports Node.js >= 20, and if so, update the engines field in package.json to "node": ">=20" to reflect this new minimum requirement. If Node.js 18/19 support is still intended, this dependency might be incompatible and could cause runtime issues for users on those versions.
• [WARNING][security] The S3 storage type requires accessKeyId and secretAccessKey to be provided directly in the file object's s3Access configuration. While this enables flexible per-asset S3 access, it's crucial to highlight the security implications. If the DDO containing this information becomes publicly accessible (e.g., via IPFS or a misconfigured metadata store), these credentials could be exposed. It would be beneficial to add a prominent note in this documentation section (docs/Storage.md, specifically near the 'S3 storage' section) advising users on best practices for managing these credentials, such as using AWS IAM roles with temporary credentials, or other secure secrets management solutions, especially for public assets or sensitive data. This would guide users towards more secure configurations.
• [INFO][style] Minor formatting. Consider adding a space to align Arweave under Type in the table, e.g., | Arweave | arweave |.
• [INFO][style] The line credentials (see below). on line 139, followed by **s3Access fields:** on line 142 is a bit redundant. You could rephrase line 139 to credentials (see 's3Access fields below).for more direct guidance. • [INFO][other] Thenamefield forArweaveFileObjectis explicitly set to an empty string with the comment// Never send the file name for Arweave as it may leak the transaction ID. This is a good security measure to prevent exposure of sensitive transaction IDs. It would be valuable to add a similar brief explanation in the docs/Storage.md` under the Arweave section, so users understand this design choice and its implications for the file name.
PR: Storage cleanup, S3 support, and documentation
Summary
Refactors the storage component into one file per class, adds S3-compatible storage support (AWS, Ceph, MinIO, etc.), adds documentation for all supported storage types (URL, IPFS, Arweave, S3), and improves type consistency for file objects.
Changes
1. Storage component refactor (
src/components/storage/)index.tsinto:Storage.ts– Abstract base classStorage(validation,getFile,getReadableStream,getFileInfo,isEncrypted,canDecrypt, etc.). UsesStorageObjectunion (includes S3).UrlStorage.ts–UrlStorage(URL-backed files, GET/POST, optional headers,unsafeURLs).ArweaveStorage.ts–ArweaveStorage(Arweave transaction ID, gateway URL).IpfsStorage.ts–IpfsStorage(IPFS CID, gateway URL).getStorageClass.ts– FactorygetStorageClass(file, config)that returns the appropriate concrete class; usesCORE_LOGGER.errorfor errors.index.tsnow only imports the above and re-exportsStorage,UrlStorage,ArweaveStorage,IpfsStorage,S3Storage, and attachesgetStorageClasstoStorage.getStorageClassso existing call sites remain valid.API: Unchanged for existing types. New
type: 's3'file objects are supported viaStorage.getStorageClass(...).2. S3 storage (
src/components/storage/S3Storage.ts)S3Storageclass for S3-compatible backends (Amazon S3, Ceph, MinIO, DigitalOcean Spaces, etc.):s3Accesson the file object:endpoint,bucket,objectKey,accessKeyId,secretAccessKey;regionoptional (defaults tous-east-1when omitted).validate()– Requiress3Accessand non-empty bucket, objectKey, endpoint, accessKeyId, secretAccessKey.getReadableStream()– Uses AWS SDK v3GetObjectCommand; builds client froms3Access(endpoint normalized tohttps://when needed).fetchSpecificFileMetadata()– UsesHeadObjectCommandfor metadata (content length, type, name, optional ETag checksum).getDownloadUrl()– Returnss3://bucket/objectKey(no credentials).FileObjectType.S3infileObject.ts;getStorageClassandindexexportS3Storage.3. Storage documentation
docs/Storage.mddescribing:s3Accessshape, optionalregion, no node-level S3 config (credentials and endpoint on the file object).README.md– Added a link to Storage Types in the Additional Resources section.4. Type consistency (
src/@types/fileObject.ts)FileObjectType– AddedS3 = 's3'.S3ObjectandS3FileObject– New interfaces for S3 file objects;StorageObjectunion includesS3FileObject.FileInfoHttpRequest.type– UsesFileObjectTypeenum instead of string union (includess3).Files touched
src/components/storage/index.tssrc/components/storage/Storage.tssrc/components/storage/UrlStorage.tssrc/components/storage/ArweaveStorage.tssrc/components/storage/IpfsStorage.tssrc/components/storage/S3Storage.tssrc/components/storage/getStorageClass.tsdocs/Storage.mdREADME.mdsrc/@types/fileObject.tsTesting
npm run build– passes.npm run lint– passes (no new errors).nvm usethen full test suite as needed.Checklist
Storage.getStorageClass, class exports).region; works with AWS and generic S3 (Ceph, MinIO, etc.).FileObjectType; S3 types infileObject.ts.