Merged
30 changes: 30 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

4 changes: 4 additions & 0 deletions Cargo.toml
@@ -33,3 +33,7 @@ anyhow = "1.0.100"
parking_lot = { version = "0.12.4", features = ["arc_lock", "send_guard"] }
async-trait = "0.1.77"
futures-util = "0.3.31"
walkdir = "2.5.0"

[dev-dependencies]
tempfile = "3.15.0"
81 changes: 73 additions & 8 deletions README.md
@@ -73,7 +73,8 @@ docker run -d \
| `CLEANER_INTERVAL` | `3600` | Cleaner run interval (seconds) |
| `CLEANER_BATCH_SIZE` | `1000` | Cleaner batch size |
| `CLEANER_MAX_DELETES` | `10000` | Max deletions per cleaner run |
| `FILETRACKER_URL` | - | Old Filetracker URL for live migration |
| `FILETRACKER_URL` | - | Old Filetracker URL for live migration (HTTP fallback) |
| `FILETRACKER_V1_DIR` | - | V1 Filetracker directory for filesystem-based migration |

For PostgreSQL, use:
```
@@ -103,13 +104,15 @@ Environment variables override config file values.

## Migration

> **📖 Complete Migration Guide**: See [docs/migration.md](docs/migration.md) for comprehensive migration instructions from Filetracker v2.1+
>
> _Note: Migration from Filetracker v1.x will be supported in a future release._
> **📖 Complete Migration Guide**: See [docs/migration.md](docs/migration.md) for comprehensive migration instructions

### Quick Start: Offline Migration
s3dedup supports migration from both Filetracker V1 (filesystem-based) and V2 (HTTP-based) servers.

Migrate all files from old Filetracker while the proxy is offline:
### V2 Migration (Filetracker 2.1+)

#### Offline Migration

Migrate all files from Filetracker V2 via HTTP while the proxy is offline:

```bash
docker run --rm \
@@ -121,7 +124,7 @@ docker run --rm \
--max-concurrency 10
```

### Quick Start: Live Migration (Zero Downtime)
#### Live Migration (Zero Downtime)

Run the proxy while migrating in the background:

@@ -139,11 +142,73 @@ docker run -d \
live-migrate --env --max-concurrency 10
```

During live migration:
During V2 live migration:
- **GET**: Falls back to old Filetracker if file not found, migrates on-the-fly
- **PUT**: Writes to both s3dedup and old Filetracker
- **DELETE**: Deletes from both systems
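
The three rules above can be modelled in a few lines of Rust. This is a minimal in-memory sketch and not s3dedup's implementation: `LiveMigrationModel`, its two maps, and all method names are hypothetical stand-ins for the new store and the old Filetracker.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of the V2 live-migration rules.
/// `new_store` stands in for s3dedup, `old_store` for the old Filetracker.
#[derive(Default)]
pub struct LiveMigrationModel {
    new_store: HashMap<String, Vec<u8>>,
    old_store: HashMap<String, Vec<u8>>,
}

impl LiveMigrationModel {
    /// Test helper: place a file only in the old store.
    pub fn seed_old(&mut self, path: &str, data: &[u8]) {
        self.old_store.insert(path.to_string(), data.to_vec());
    }

    /// GET: serve from the new store; on a miss, fall back to the old
    /// store and copy the file over so the next read is local.
    pub fn get(&mut self, path: &str) -> Option<Vec<u8>> {
        if let Some(data) = self.new_store.get(path) {
            return Some(data.clone());
        }
        let data = self.old_store.get(path)?.clone();
        self.new_store.insert(path.to_string(), data.clone());
        Some(data)
    }

    /// PUT: write to both stores.
    pub fn put(&mut self, path: &str, data: &[u8]) {
        self.new_store.insert(path.to_string(), data.to_vec());
        self.old_store.insert(path.to_string(), data.to_vec());
    }

    /// DELETE: remove from both stores; true if either held the file.
    pub fn delete(&mut self, path: &str) -> bool {
        let in_new = self.new_store.remove(path).is_some();
        let in_old = self.old_store.remove(path).is_some();
        in_new || in_old
    }

    /// True once the file is present in the new store.
    pub fn is_migrated(&self, path: &str) -> bool {
        self.new_store.contains_key(path)
    }
}
```

In this model a file that exists only in the old store becomes visible in the new store after its first `get`, which is the "migrates on-the-fly" behaviour described in the list.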

### V1 Migration (Legacy Filetracker)

V1 Filetracker stores files directly on the filesystem and serves them via a simple HTTP protocol.
The key difference from V2 is that V1 doesn't have a `/list/` endpoint for file discovery, so migration uses
filesystem walking.

**Performance**: V1 migration uses chunked processing to handle millions of files efficiently without loading
all file paths into memory. The filesystem is scanned in chunks of 10,000 files, keeping memory usage constant
regardless of total file count.
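
To illustrate what chunked scanning means in practice, here is a rough sketch using only the standard library. It is not the code from this PR (which adds the `walkdir` crate); `walk_in_chunks` and its callback signature are hypothetical. Only one batch of file paths is held at a time, although the stack of pending directories still grows with the number of directories.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: walks `root` and hands file paths to `on_chunk`
/// in batches of at most `chunk_size`. Returns the number of files seen.
/// Symlinks are skipped because `DirEntry::file_type` does not follow them.
pub fn walk_in_chunks<F>(root: &Path, chunk_size: usize, mut on_chunk: F) -> io::Result<usize>
where
    F: FnMut(&[PathBuf]),
{
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut pending_dirs = vec![root.to_path_buf()];
    let mut chunk: Vec<PathBuf> = Vec::with_capacity(chunk_size);
    let mut total = 0;

    while let Some(dir) = pending_dirs.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                pending_dirs.push(entry.path());
            } else if file_type.is_file() {
                chunk.push(entry.path());
                total += 1;
                // Flush a full batch before reading more entries.
                if chunk.len() == chunk_size {
                    on_chunk(&chunk[..]);
                    chunk.clear();
                }
            }
        }
    }
    // Flush the final, possibly partial batch.
    if !chunk.is_empty() {
        on_chunk(&chunk[..]);
    }
    Ok(total)
}
```

A caller would pass a closure that migrates each batch, for example with a chunk size of 10,000 to match the figure quoted above.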

#### Offline Migration

Migrate from V1 filesystem (requires access to `$FILETRACKER_DIR`):

```bash
docker run --rm \
--env-file .env \
-v s3dedup-data:/app/data \
-v /path/to/filetracker:/filetracker:ro \
ghcr.io/sio2project/s3dedup:latest \
migrate-v1 --env \
--v1-directory /filetracker \
--max-concurrency 10
```

#### Live Migration

Run the proxy while migrating from V1 in the background:

```bash
# With both filesystem access and HTTP fallback
docker run -d \
--name s3dedup \
-p 8080:8080 \
-v s3dedup-data:/app/data \
-v /path/to/filetracker:/filetracker:ro \
--env-file .env \
ghcr.io/sio2project/s3dedup:latest \
live-migrate-v1 --env \
--v1-directory /filetracker \
--filetracker-url http://old-filetracker-v1:8000 \
--max-concurrency 10

# Or with HTTP fallback only (no filesystem access)
docker run -d \
--name s3dedup \
-p 8080:8080 \
-v s3dedup-data:/app/data \
--env-file .env \
ghcr.io/sio2project/s3dedup:latest \
live-migrate-v1 --env \
--filetracker-url http://old-filetracker-v1:8000 \
--max-concurrency 10
```

During V1 live migration:
- **Background filesystem migration**: If `--v1-directory` is provided, the filesystem is scanned in chunks to migrate all files
  - Chunked processing handles millions of files with constant memory usage
- **HTTP fallback**: If `--filetracker-url` is provided, GET requests fall back to the V1 server if the file is not found
  - Files are migrated automatically on first access
- **New requests**: The server accepts PUT/GET/DELETE requests normally during migration

For detailed migration strategies, performance tuning, troubleshooting, and rollback procedures, see the [Migration Guide](docs/migration.md).

## API Endpoints
5 changes: 5 additions & 0 deletions src/config.rs
@@ -43,6 +43,10 @@ pub struct BucketConfig {
/// Optional filetracker URL for live migration mode
#[serde(default)]
pub filetracker_url: Option<String>,

/// Optional V1 filetracker directory for filesystem-based migration
#[serde(default)]
pub filetracker_v1_dir: Option<String>,
}

impl Config {
@@ -251,6 +255,7 @@ impl BucketConfig {
.unwrap_or(10000),
},
filetracker_url: std::env::var("FILETRACKER_URL").ok(),
filetracker_v1_dir: std::env::var("FILETRACKER_V1_DIR").ok(),
})
}
}
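
As a rough illustration of how the two optional settings could combine, the sketch below maps them to a migration mode, following the README's description of `--v1-directory` and `--filetracker-url`. `V1LiveMode` and `v1_live_mode` are hypothetical names that do not appear in this PR, and treating an empty value as unset is an assumption.

```rust
/// Hypothetical summary of which V1 live-migration paths are active.
#[derive(Debug, PartialEq, Eq)]
pub enum V1LiveMode {
    /// Background filesystem scan plus HTTP fallback on GET misses.
    FilesystemAndHttp { dir: String, url: String },
    /// Background filesystem scan only.
    FilesystemOnly { dir: String },
    /// HTTP fallback only (no filesystem access).
    HttpOnly { url: String },
    /// Neither setting present: no V1 migration.
    Disabled,
}

/// Derives the mode from the two optional settings.
/// Assumption: blank strings are treated the same as missing values.
pub fn v1_live_mode(v1_dir: Option<String>, url: Option<String>) -> V1LiveMode {
    let v1_dir = v1_dir.filter(|s| !s.trim().is_empty());
    let url = url.filter(|s| !s.trim().is_empty());
    match (v1_dir, url) {
        (Some(dir), Some(url)) => V1LiveMode::FilesystemAndHttp { dir, url },
        (Some(dir), None) => V1LiveMode::FilesystemOnly { dir },
        (None, Some(url)) => V1LiveMode::HttpOnly { url },
        (None, None) => V1LiveMode::Disabled,
    }
}

/// Reads the same environment variables as `BucketConfig` in the diff above.
pub fn v1_live_mode_from_env() -> V1LiveMode {
    v1_live_mode(
        std::env::var("FILETRACKER_V1_DIR").ok(),
        std::env::var("FILETRACKER_URL").ok(),
    )
}
```

Keeping the decision in a pure function separate from the environment lookup makes the four combinations easy to unit test.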
11 changes: 7 additions & 4 deletions src/kvstorage/mod.rs
@@ -28,12 +28,14 @@ pub(crate) trait KVStorageTrait {
self.set_ref_count(bucket, hash, cnt + 1).await
}

async fn decrement_ref_count(&mut self, bucket: &str, hash: &str) -> Result<()> {
async fn decrement_ref_count(&mut self, bucket: &str, hash: &str) -> Result<i64> {
let cnt = self.get_ref_count(bucket, hash).await?;
if cnt == 0 {
return Ok(());
return Ok(0);
}
self.set_ref_count(bucket, hash, cnt - 1).await
let new_count = cnt - 1;
self.set_ref_count(bucket, hash, new_count).await?;
Ok(new_count as i64)
}

async fn get_modified(&mut self, bucket: &str, path: &str) -> Result<i64>;
@@ -181,8 +183,9 @@ impl KVStorage {
/**
* Decrement the reference count for a hash.
* If the reference count is already 0, do nothing.
* Returns the new reference count after decrementing.
*/
pub async fn decrement_ref_count(&mut self, bucket: &str, hash: &str) -> Result<()> {
pub async fn decrement_ref_count(&mut self, bucket: &str, hash: &str) -> Result<i64> {
debug!(
"Decrementing ref count for bucket: {}, hash: {}",
bucket, hash
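
The changed return type lets a caller act on the new count without a second lookup. The sketch below shows one possible use with an in-memory map. `MemoryRefCounts` and `release_and_check` are hypothetical, and using the returned count to decide whether a blob can be deleted is an assumption about intent, not something this diff shows.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the KV storage's reference counts.
#[derive(Default)]
pub struct MemoryRefCounts {
    counts: HashMap<(String, String), i64>,
}

impl MemoryRefCounts {
    fn key(bucket: &str, hash: &str) -> (String, String) {
        (bucket.to_string(), hash.to_string())
    }

    /// Current count; a missing entry counts as 0.
    pub fn get(&self, bucket: &str, hash: &str) -> i64 {
        *self.counts.get(&Self::key(bucket, hash)).unwrap_or(&0)
    }

    /// Adds one reference and returns the new count.
    pub fn increment(&mut self, bucket: &str, hash: &str) -> i64 {
        let new_count = self.get(bucket, hash) + 1;
        self.counts.insert(Self::key(bucket, hash), new_count);
        new_count
    }

    /// Mirrors the diff: do nothing at 0, otherwise store and return cnt - 1.
    pub fn decrement(&mut self, bucket: &str, hash: &str) -> i64 {
        let cnt = self.get(bucket, hash);
        if cnt == 0 {
            return 0;
        }
        let new_count = cnt - 1;
        self.counts.insert(Self::key(bucket, hash), new_count);
        new_count
    }
}

/// Hypothetical caller: drops one reference and reports whether no
/// references remain. Note it also returns true for a hash that was
/// never referenced, because decrementing at 0 returns 0.
pub fn release_and_check(store: &mut MemoryRefCounts, bucket: &str, hash: &str) -> bool {
    store.decrement(bucket, hash) == 0
}
```

With two references to the same hash, the first release reports `false` and the second reports `true`.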