feat: add zstd compression to s3 chunk store#863
Conversation
Summary of Changes
Hello @kalbasit, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the S3 chunk storage mechanism by integrating Zstandard (zstd) compression. The change is designed to reduce storage costs and improve network performance by minimizing the amount of data transferred to and from S3. Data is compressed before being stored and automatically decompressed upon retrieval, maintaining data integrity while optimizing resource usage.
Documentation Updates: 1 document was updated by changes in this PR: S3-Compatible Storage Backend
## S3-Compatible Storage Backend: Design & Implementation
The S3-compatible storage backend for ncps provides a scalable, cloud-native alternative to local filesystem storage. It is implemented in the `pkg/storage/s3` package and is a drop-in replacement for local storage, supporting AWS S3, MinIO, and other S3-compatible services. The backend implements the same storage interface, allowing seamless integration and migration between storage types. [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/README.md#L3-L183)
### Configuration Options
Configuration is managed via CLI flags, environment variables, or configuration files. The core options are:
| Option | Env Variable | Required | Description |
|--------|--------------|----------|-------------|
The endpoint should be provided **without** the URL scheme in code (e.g., `"localhost:9000"`), but CLI flags may include the scheme (`http://` or `https://`). The scheme takes precedence over the `use-ssl` flag. [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/README.md#L46-L782)
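The precedence rule above can be sketched in Go; `splitEndpoint` is a hypothetical helper shown for illustration only (it is not the actual ncps implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// splitEndpoint strips an optional http:// or https:// scheme from the
// endpoint flag value. When a scheme is present it overrides the useSSL
// flag, mirroring the precedence rule described above. Hypothetical helper.
func splitEndpoint(endpoint string, useSSL bool) (host string, ssl bool) {
	switch {
	case strings.HasPrefix(endpoint, "https://"):
		return strings.TrimPrefix(endpoint, "https://"), true
	case strings.HasPrefix(endpoint, "http://"):
		return strings.TrimPrefix(endpoint, "http://"), false
	default:
		return endpoint, useSSL
	}
}

func main() {
	host, ssl := splitEndpoint("http://localhost:9000", true)
	fmt.Println(host, ssl) // the http:// scheme wins over use-ssl=true
}
```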
#### Example: AWS S3
```bash
ncps serve \
--cache-hostname=ncps.example.com \
  # ...
```
#### Example: MinIO
```bash
ncps serve \
--cache-hostname=ncps.example.com \
  # ...
```
### Supported Providers
The backend supports AWS S3, MinIO, and any S3-compatible service (such as Ceph). Provider-specific notes:
- **AWS S3**: Requires region and SSL; endpoint is typically `s3.<region>.amazonaws.com`.
- **Other S3-Compatible**: Use the appropriate endpoint and credentials for your provider.
### Integration with the Storage Interface
The S3 backend implements the `storage.Store` interface, providing methods for CRUD operations on secret keys, narinfo files, and nar objects. This allows it to be used interchangeably with the local storage backend. Initialization is performed via:
```go
// ...
```
[Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/example.go#L16-L81)
### Path Sharding
To prevent performance bottlenecks from too many files in a single directory, the backend shards objects using the first one and two characters of the object's hash. For example, a narinfo file with hash `abc123` is stored at `store/narinfo/a/ab/abc123.narinfo`. This is implemented via helper functions:
```go
// ...
```
[Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/helper/filepath.go#L5-L38)
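As an illustration, the sharding rule can be sketched as follows; `narinfoPath` is a hypothetical name (the real helpers live in `pkg/helper/filepath.go`), but the path layout matches the `abc123` example above:

```go
package main

import (
	"fmt"
	"path"
)

// narinfoPath shards a narinfo object under its first one and two hash
// characters, e.g. abc123 -> store/narinfo/a/ab/abc123.narinfo.
// Hypothetical sketch of the helpers in pkg/helper/filepath.go.
func narinfoPath(hash string) string {
	return path.Join("store", "narinfo", hash[:1], hash[:2], hash+".narinfo")
}

func main() {
	fmt.Println(narinfoPath("abc123")) // store/narinfo/a/ab/abc123.narinfo
}
```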
#### Storage Structure
```mermaid
graph TD
A["bucket"]
```
### Error Handling
The backend translates S3-specific errors to project-level errors. For example, S3 `NoSuchKey` errors become `storage.ErrNotFound`, and attempts to overwrite existing objects return `storage.ErrAlreadyExists`. Configuration is validated before initialization, and bucket existence is checked. All operations include OpenTelemetry tracing for observability. [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/s3.go#L1-L554)
### Migration from Local Storage
To migrate from local storage to S3:
1. Create an S3 bucket or MinIO instance.
### Usage Examples
#### AWS S3 (Go)
```go
cfg := s3.Config{
Bucket: "my-nix-cache",
	// ...
}
```
#### MinIO (Go)
```go
cfg := s3.Config{
Bucket: "my-nix-cache",
	// ...
}
```
#### Store Operations
```go
// Store a secret key
secretKey, err := signature.LoadSecretKey("your-secret-key-content")
// ...
```
[Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/example.go#L16-L81)
### Development Setup with MinIO
To develop and test with MinIO:
1. Start MinIO locally (the dev script manages this for you):
The script auto-restarts on code changes and configures MinIO for local development. The S3 backend will use the MinIO instance at `localhost:9000` with default credentials (`minioadmin`/`minioadmin`). [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/README.md#L46-L782)
### Kubernetes Development Environment (Kind)
You can also develop and test with MinIO running inside a local Kubernetes cluster using [Kind](https://kind.sigs.k8s.io/) and the provided setup script:

1. Start the Kubernetes development environment:

   ```bash
   ./dev-scripts/k8s-cluster.sh
   ```

   This script will:
   - Create a Kind cluster (if not already running)
   - Deploy MinIO with a pre-created bucket (`ncps-bucket`) and access keys
   - Deploy PostgreSQL, MariaDB, and Redis instances
   - Output connection details for all services

2. To access MinIO from your host, port-forward the service:

   ```bash
   kubectl port-forward -n minio svc/minio 9000:9000
   ```

3. Use the following credentials and endpoint for your S3 backend configuration:

   | Option | Value |
   |------------|-----------------------|
   | Endpoint | http://localhost:9000 |
   | Bucket | ncps-bucket |
   | Access Key | ncps-access-key |
   | Secret Key | ncps-secret-key |
   | Use SSL | false |

   Example CLI usage:

   ```bash
   ncps serve \
     --cache-storage-s3-bucket=ncps-bucket \
     --cache-storage-s3-endpoint=http://localhost:9000 \
     --cache-storage-s3-access-key-id=ncps-access-key \
     --cache-storage-s3-secret-access-key=ncps-secret-key \
     --cache-storage-s3-use-ssl=false
   ```

The script ensures MinIO is ready and pre-configured for development, so you do not need to manually create buckets or credentials. This environment is ideal for testing ncps against S3-compatible storage and database backends in a reproducible local cluster.
+
### Security Considerations
Store credentials securely (prefer environment variables or IAM roles for AWS). Use SSL/TLS in production. For AWS, restrict access with IAM policies and consider temporary credentials. Enable bucket versioning and set appropriate bucket policies for access control. [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/README.md#L3-L183)
---
Code Review
This pull request introduces zstd compression for S3 chunk storage, aiming to optimize storage costs and network performance. However, a critical Denial of Service (DoS) vulnerability has been identified in the GetChunk method, where unconstrained zstd decoders could lead to memory exhaustion under high concurrency or with malicious decompression bombs. Additionally, a critical race condition exists due to the concurrent use of a non-thread-safe zstd.Encoder, and there is some unused code. It is recommended to use sync.Pool for both encoders and decoders, set memory limits for decoders, and add an additional test case to prevent regressions. A note has been added regarding loop variable capture in Go 1.22+ for the suggested test.
Chunks stored in S3 can be quite large, and many of them are highly compressible. Adding compression reduces storage costs and improves network performance by transferring fewer bytes. This change:

- Implements zstd compression in `PutChunk` for S3 storage.
- Returns the compressed size from `PutChunk` to allow tracking actual storage usage.
- Automatically decompresses chunks in `GetChunk` using a wrapped `ReadCloser`.
- Adds integration tests to verify both compression effectiveness and round-trip correctness.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##            main    #863   +/- ##
===================================
  Coverage   3.96%   3.96%
===================================
  Files          6       6
  Lines        429     429
===================================
  Hits          17      17
  Misses       409     409
  Partials       3       3
```