
feat: add zstd compression to s3 chunk store #863

Merged
kalbasit merged 1 commit into main from 02-11-feat_add_zstd_compression_to_s3_chunk_store
Feb 12, 2026
Conversation

@kalbasit (Owner)

Chunks stored in S3 can be quite large, and many of them are highly compressible.
Adding compression reduces storage costs and improves network performance by
transferring fewer bytes.

This change:

  • Implements zstd compression in PutChunk for S3 storage.
  • Returns the compressed size from PutChunk to allow tracking actual storage usage.
  • Automatically decompresses chunks in GetChunk using a wrapped ReadCloser.
  • Adds integration tests to verify both compression effectiveness and round-trip correctness.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @kalbasit, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the S3 chunk storage mechanism by integrating Zstandard compression. This change is designed to significantly reduce storage costs and improve network performance by minimizing the amount of data transferred to and from S3. It ensures that data is compressed before being stored and automatically decompressed upon retrieval, maintaining data integrity while optimizing resource usage.

Highlights

  • Zstandard Compression: Implemented zstd compression for chunks stored in S3 during the PutChunk operation.
  • Storage Usage Tracking: Modified PutChunk to return the compressed size, enabling accurate tracking of actual storage usage.
  • Automatic Decompression: Enabled automatic decompression of chunks in GetChunk using a custom s3ReadCloser that wraps the zstd decoder.
  • Integration Tests: Added comprehensive integration tests to verify both the effectiveness of compression and the correctness of the round-trip (compress and decompress) process.
Changelog
  • pkg/storage/chunk/s3.go
    • Integrated the github.com/klauspost/compress/zstd library for compression.
    • Defined a new s3ReadCloser struct to manage zstd decoding and proper closing of the underlying io.ReadCloser.
    • Added encoder and decoder fields to the s3Store struct to hold zstd writer and reader instances.
    • Initialized zstd.NewWriter and zstd.NewReader in NewS3Store for compression and decompression operations.
    • Modified GetChunk to create a new zstd decoder for each object and return it wrapped in s3ReadCloser.
    • Updated PutChunk to compress data using the zstd encoder before uploading to S3 and to return the compressed data size.
  • pkg/storage/chunk/s3_test.go
    • Added bytes import for test data manipulation.
    • Introduced a new test case stored chunk is zstd-compressed in S3 to assert that chunks are indeed compressed and result in a smaller size.
    • Added another test case compressed chunk round-trips correctly via S3 to verify that data compressed and stored in S3 can be retrieved and decompressed without corruption.
Activity
  • No human activity has occurred on this pull request yet.

@dosubot dosubot bot added enhancement New feature or request go Pull requests that update go code labels Feb 12, 2026
dosubot (bot) commented Feb 12, 2026

Documentation Updates

1 document was updated by changes in this PR:

S3-Compatible Storage Backend
@@ -1,9 +1,7 @@
 ## S3-Compatible Storage Backend: Design & Implementation
-
 The S3-compatible storage backend for ncps provides a scalable, cloud-native alternative to local filesystem storage. It is implemented in the `pkg/storage/s3` package and is a drop-in replacement for local storage, supporting AWS S3, MinIO, and other S3-compatible services. The backend implements the same storage interface, allowing seamless integration and migration between storage types. [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/README.md#L3-L183)
 
 ### Configuration Options
-
 Configuration is managed via CLI flags, environment variables, or configuration files. The core options are:
 
 | Option                        | Env Variable                  | Required | Description                                 |
@@ -18,7 +16,6 @@
 The endpoint should be provided **without** the URL scheme in code (e.g., `"localhost:9000"`), but CLI flags may include the scheme (`http://` or `https://`). The scheme takes precedence over the `use-ssl` flag. [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/README.md#L46-L782)
 
 #### Example: AWS S3
-
 ```bash
 ncps serve \
   --cache-hostname=ncps.example.com \
@@ -33,7 +30,6 @@
 ```
 
 #### Example: MinIO
-
 ```bash
 ncps serve \
   --cache-hostname=ncps.example.com \
@@ -48,7 +44,6 @@
 ```
 
 ### Supported Providers
-
 The backend supports AWS S3, MinIO, and any S3-compatible service (such as Ceph). Provider-specific notes:
 
 - **AWS S3**: Requires region and SSL; endpoint is typically `s3.<region>.amazonaws.com`.
@@ -56,7 +51,6 @@
 - **Other S3-Compatible**: Use the appropriate endpoint and credentials for your provider.
 
 ### Integration with the Storage Interface
-
 The S3 backend implements the `storage.Store` interface, providing methods for CRUD operations on secret keys, narinfo files, and nar objects. This allows it to be used interchangeably with the local storage backend. Initialization is performed via:
 
 ```go
@@ -81,7 +75,6 @@
 [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/example.go#L16-L81)
 
 ### Path Sharding
-
 To prevent performance bottlenecks from too many files in a single directory, the backend shards objects using the first one and two characters of the object's hash. For example, a narinfo file with hash `abc123` is stored at `store/narinfo/a/ab/abc123.narinfo`. This is implemented via helper functions:
 
 ```go
@@ -94,7 +87,6 @@
 [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/helper/filepath.go#L5-L38)
 
 #### Storage Structure
-
 ```mermaid
 graph TD
     A["bucket"]
@@ -107,11 +99,9 @@
 ```
 
 ### Error Handling
-
 The backend translates S3-specific errors to project-level errors. For example, S3 `NoSuchKey` errors become `storage.ErrNotFound`, and attempts to overwrite existing objects return `storage.ErrAlreadyExists`. Configuration is validated before initialization, and bucket existence is checked. All operations include OpenTelemetry tracing for observability. [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/s3.go#L1-L554)
 
 ### Migration from Local Storage
-
 To migrate from local storage to S3:
 
 1. Create an S3 bucket or MinIO instance.
@@ -139,7 +129,6 @@
 ### Usage Examples
 
 #### AWS S3 (Go)
-
 ```go
 cfg := s3.Config{
     Bucket:          "my-nix-cache",
@@ -153,7 +142,6 @@
 ```
 
 #### MinIO (Go)
-
 ```go
 cfg := s3.Config{
     Bucket:          "my-nix-cache",
@@ -166,7 +154,6 @@
 ```
 
 #### Store Operations
-
 ```go
 // Store a secret key
 secretKey, err := signature.LoadSecretKey("your-secret-key-content")
@@ -181,7 +168,6 @@
 [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/example.go#L16-L81)
 
 ### Development Setup with MinIO
-
 To develop and test with MinIO:
 
 1. Start MinIO locally (the dev script manages this for you):
@@ -198,8 +184,51 @@
 
 The script auto-restarts on code changes and configures MinIO for local development. The S3 backend will use the MinIO instance at `localhost:9000` with default credentials (`minioadmin`/`minioadmin`). [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/README.md#L46-L782)
 
+### Kubernetes Development Environment (Kind)
+You can also develop and test with MinIO running inside a local Kubernetes cluster using [Kind](https://kind.sigs.k8s.io/) and the provided setup script:
+
+1. Start the Kubernetes development environment:
+
+   ```bash
+   ./dev-scripts/k8s-cluster.sh
+   ```
+
+   This script will:
+   - Create a Kind cluster (if not already running)
+   - Deploy MinIO with a pre-created bucket (`ncps-bucket`) and access keys
+   - Deploy PostgreSQL, MariaDB, and Redis instances
+   - Output connection details for all services
+
+2. To access MinIO from your host, port-forward the service:
+
+   ```bash
+   kubectl port-forward -n minio svc/minio 9000:9000
+   ```
+
+3. Use the following credentials and endpoint for your S3 backend configuration:
+
+   | Option        | Value             |
+   |--------------|-------------------|
+   | Endpoint     | http://localhost:9000 |
+   | Bucket       | ncps-bucket       |
+   | Access Key   | ncps-access-key   |
+   | Secret Key   | ncps-secret-key   |
+   | Use SSL      | false             |
+
+   Example CLI usage:
+
+   ```bash
+   ncps serve \
+     --cache-storage-s3-bucket=ncps-bucket \
+     --cache-storage-s3-endpoint=http://localhost:9000 \
+     --cache-storage-s3-access-key-id=ncps-access-key \
+     --cache-storage-s3-secret-access-key=ncps-secret-key \
+     --cache-storage-s3-use-ssl=false
+   ```
+
+The script ensures MinIO is ready and pre-configured for development, so you do not need to manually create buckets or credentials. This environment is ideal for testing NCPS against S3-compatible storage and database backends in a reproducible local cluster.
+
 ### Security Considerations
-
 Store credentials securely (prefer environment variables or IAM roles for AWS). Use SSL/TLS in production. For AWS, restrict access with IAM policies and consider temporary credentials. Enable bucket versioning and set appropriate bucket policies for access control. [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/README.md#L3-L183)
 
 ---


@gemini-code-assist (Contributor) left a comment
Code Review

This pull request introduces zstd compression for S3 chunk storage, aiming to optimize storage costs and network performance. However, a critical Denial of Service (DoS) vulnerability has been identified in the GetChunk method, where unconstrained zstd decoders could lead to memory exhaustion under high concurrency or with malicious decompression bombs. Additionally, a critical race condition exists due to the concurrent use of a non-thread-safe zstd.Encoder, and there is some unused code. It is recommended to use sync.Pool for both encoders and decoders, set memory limits for decoders, and add an additional test case to prevent regressions. A note has been added regarding loop variable capture in Go 1.22+ for the suggested test.

@kalbasit kalbasit force-pushed the 02-11-feat_add_zstd_compression_to_local_chunk_store branch from 0efd7d8 to 971925c Compare February 12, 2026 05:37
@kalbasit kalbasit force-pushed the 02-11-feat_add_zstd_compression_to_s3_chunk_store branch from a467841 to 1005205 Compare February 12, 2026 05:37
@kalbasit kalbasit force-pushed the 02-11-feat_add_zstd_compression_to_s3_chunk_store branch from 1005205 to 3da748c Compare February 12, 2026 06:15
@kalbasit kalbasit force-pushed the 02-11-feat_add_zstd_compression_to_local_chunk_store branch from 971925c to 117d7b0 Compare February 12, 2026 06:15
@kalbasit kalbasit force-pushed the 02-11-feat_add_zstd_compression_to_s3_chunk_store branch from 3da748c to 8c899c5 Compare February 12, 2026 06:37
@kalbasit kalbasit force-pushed the 02-11-feat_add_zstd_compression_to_local_chunk_store branch from 117d7b0 to 6518e24 Compare February 12, 2026 06:37
@kalbasit kalbasit force-pushed the 02-11-feat_add_zstd_compression_to_s3_chunk_store branch from 8c899c5 to 0afaa04 Compare February 12, 2026 06:56
@kalbasit kalbasit force-pushed the 02-11-feat_add_zstd_compression_to_local_chunk_store branch 2 times, most recently from 24a34db to c933ddf Compare February 12, 2026 07:22
@kalbasit kalbasit force-pushed the 02-11-feat_add_zstd_compression_to_s3_chunk_store branch from 0afaa04 to 68c548b Compare February 12, 2026 07:22
Base automatically changed from 02-11-feat_add_zstd_compression_to_local_chunk_store to main February 12, 2026 07:34
@kalbasit kalbasit force-pushed the 02-11-feat_add_zstd_compression_to_s3_chunk_store branch from 68c548b to 520d156 Compare February 12, 2026 07:39
@kalbasit kalbasit enabled auto-merge (squash) February 12, 2026 07:39
@kalbasit kalbasit force-pushed the 02-11-feat_add_zstd_compression_to_s3_chunk_store branch from 520d156 to eb568cf Compare February 12, 2026 08:32
@codecov
Copy link

codecov bot commented Feb 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 3.96%. Comparing base (152eeb6) to head (eb568cf).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main    #863   +/-   ##
=====================================
  Coverage   3.96%   3.96%           
=====================================
  Files          6       6           
  Lines        429     429           
=====================================
  Hits          17      17           
  Misses       409     409           
  Partials       3       3           


@kalbasit kalbasit merged commit 7c5de21 into main Feb 12, 2026
16 checks passed
@kalbasit kalbasit deleted the 02-11-feat_add_zstd_compression_to_s3_chunk_store branch February 12, 2026 08:42

Labels

enhancement New feature or request go Pull requests that update go code size:L This PR changes 100-499 lines, ignoring generated files.
