A Spring Boot 3.5.6 middleware service for secure batch file uploads with AWS S3 integration, designed for multi-tenant environments.
Data Forge Middleware provides a RESTful API for managing batch file uploads from client sites to AWS S3 storage. It features:
- Multi-tenant Architecture: Account-based isolation with site-level authentication
- Batch Upload Management: Track upload sessions with lifecycle states and metadata
- S3 Integration: Secure file storage with automatic retry and checksum validation
- Admin Portal: Keycloak-secured endpoints for account and site management
- Observability: Structured JSON logging, metrics, and health checks
- PostgreSQL 16: Partitioned error logs and optimized queries
- Java 21 (Temurin/Corretto recommended)
- PostgreSQL 16+ with partitioning support
- AWS S3 or LocalStack for development
- Keycloak (optional, for admin endpoints)
- Gradle 9.0+ (wrapper included)
Run the infrastructure services in Docker and DFM from the IDE for debugging:
# Start infrastructure (PostgreSQL, Keycloak, LocalStack)
./scripts/docker-dev.sh start
# Or manually
docker-compose -f docker-compose.dev.yml up -d
Then in IntelliJ IDEA:
- Open Run Configuration
- Set Active profiles: dev
- Run or Debug the application
Infrastructure services:
- PostgreSQL: localhost:5432 (user: postgres, password: postgres, database: dfm)
- Keycloak: http://localhost:8081 (admin/admin, realm: dfm)
- LocalStack S3: http://localhost:4566 (bucket: dfm-uploads)
Stop infrastructure:
./scripts/docker-dev.sh stop
See docker-compose.dev.yml for configuration details.
The easiest way to run the complete stack:
# Start all services (PostgreSQL, Keycloak, LocalStack S3, DFM Backend)
docker-compose up -d
# Check services are healthy
docker-compose ps
# View logs
docker-compose logs -f dfm-backend
Services will be available at:
- Application API: http://localhost:8080
- Swagger UI: http://localhost:8080/swagger-ui.html
- Keycloak Admin: http://localhost:8081 (admin/admin)
- PostgreSQL: localhost:5432 (databases: dfm, keycloak)
- LocalStack S3: http://localhost:4566
Pre-configured Keycloak users:
- Admin: admin/admin (ROLE_ADMIN)
- User: user/user (ROLE_USER)
See docker/README.md for detailed Docker configuration.
git clone <repository-url>
cd data-forge-middleware
./gradlew build
Create PostgreSQL database:
CREATE DATABASE dataforge;
CREATE USER dataforge_user WITH PASSWORD 'your-password';
GRANT ALL PRIVILEGES ON DATABASE dataforge TO dataforge_user;
Copy example configuration:
cp src/main/resources/application-dev.yml.example src/main/resources/application-dev.yml
Edit application-dev.yml:
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/dataforge
    username: dataforge_user
    password: your-password
s3:
  bucket:
    name: dataforge-uploads
  endpoint: http://localhost:4566 # LocalStack
  region: us-east-1
  access-key: test
  secret-key: test
jwt:
  secret: your-256-bit-secret-key-here-minimum-32-chars
  expiration-minutes: 60
batch:
  timeout-minutes: 60
docker run -d -p 4566:4566 -p 4571:4571 \
--name localstack \
-e SERVICES=s3 \
-e DEBUG=1 \
localstack/localstack:latest
# Create S3 bucket
aws --endpoint-url=http://localhost:4566 s3 mb s3://dataforge-uploads
./gradlew flywayMigrate
./gradlew bootRun --args='--spring.profiles.active=dev'
Application starts on http://localhost:8080
Access interactive API documentation:
http://localhost:8080/swagger-ui.html
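The token request in the curl example that follows sends the site domain and client secret as HTTP Basic credentials. For Java clients, a minimal sketch of building the same request with the JDK's java.net.http API might look like this. The class and method names (TokenRequestSketch, basicAuth, tokenRequest) are hypothetical and not part of DFM; the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical client-side helper; not part of the DFM codebase.
public class TokenRequestSketch {

    /** Encodes "domain:clientSecret" as an HTTP Basic Authorization header value. */
    static String basicAuth(String domain, String clientSecret) {
        String raw = domain + ":" + clientSecret;
        return "Basic " + Base64.getEncoder()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds (but does not send) the POST /api/v1/auth/token request. */
    static HttpRequest tokenRequest(String baseUrl, String domain, String clientSecret) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/auth/token"))
                .header("Authorization", basicAuth(domain, clientSecret))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request =
                tokenRequest("http://localhost:8080", "example.com", "client-secret");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Authorization").orElseThrow());
    }
}
```

To actually call the service, the request could be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), and the token field of the JSON response reused as the Bearer token in the batch calls.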
curl -X POST http://localhost:8080/api/v1/auth/token \
-H "Authorization: Basic $(echo -n 'example.com:client-secret' | base64)" \
-H "Content-Type: application/json"
Response:
{
"token": "eyJhbGciOiJIUzI1NiIs...",
"expiresAt": "2024-01-01T13:00:00",
"tokenType": "Bearer"
}
curl -X POST http://localhost:8080/api/v1/batch/start \
-H "Authorization: Bearer <jwt-token>" \
-H "Content-Type: application/json"
Response:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"siteId": "123e4567-e89b-12d3-a456-426614174000",
"status": "IN_PROGRESS",
"s3Path": "account-id/example.com/2024-01-01/12-00/",
"uploadedFilesCount": 0,
"totalSize": 0,
"hasErrors": false,
"startedAt": "2024-01-01T12:00:00"
}
curl -X POST http://localhost:8080/api/v1/batch/{batchId}/upload \
-H "Authorization: Bearer <jwt-token>" \
-F "file=@/path/to/file.csv"
curl -X POST http://localhost:8080/api/v1/batch/{batchId}/complete \
-H "Authorization: Bearer <jwt-token>"
curl -X POST http://localhost:8080/admin/accounts \
-H "Authorization: Bearer <keycloak-token>" \
-H "Content-Type: application/json" \
-d '{
"email": "user@example.com",
"name": "Example User"
}'
curl -X POST http://localhost:8080/admin/accounts/{accountId}/sites \
-H "Authorization: Bearer <keycloak-token>" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"displayName": "Example Site"
}'
src/main/java/com/bitbi/dfm/
├── account/
│   ├── domain/          # Account aggregate
│   ├── application/     # AccountService, statistics
│   ├── infrastructure/  # JpaAccountRepository
│   └── presentation/    # AccountAdminController
├── site/
│   ├── domain/          # Site aggregate
│   ├── application/     # SiteService, event handlers
│   ├── infrastructure/  # JpaSiteRepository
│   └── presentation/    # SiteAdminController
├── batch/
│   ├── domain/          # Batch aggregate, BatchStatus
│   ├── application/     # BatchLifecycleService, timeout scheduler
│   ├── infrastructure/  # JpaBatchRepository
│   └── presentation/    # BatchController
├── upload/
│   ├── domain/          # UploadedFile, FileChecksum
│   ├── application/     # FileUploadService
│   ├── infrastructure/  # S3FileStorageService, config
│   └── presentation/    # FileUploadController
├── error/
│   ├── domain/          # ErrorLog (partitioned)
│   ├── application/     # ErrorLoggingService, export
│   ├── infrastructure/  # JpaErrorLogRepository, partition scheduler
│   └── presentation/    # ErrorLogController
├── auth/
│   ├── domain/          # JwtToken value object
│   ├── application/     # TokenService
│   ├── infrastructure/  # JwtTokenProvider, security config
│   └── presentation/    # AuthController
└── shared/
    ├── config/          # OpenAPI, Actuator, Metrics
    ├── exception/       # GlobalExceptionHandler, ErrorResponse
    └── health/          # S3HealthIndicator
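The upload module lists a FileChecksum type, and the feature list mentions checksum validation, but this README does not say which algorithm is used. As an illustration only, the sketch below assumes SHA-256 and shows how a client or service could compute and compare a file's checksum with the JDK alone. FileChecksumSketch and its methods are hypothetical names, not the actual DFM classes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper; the real FileChecksum type may differ.
public class FileChecksumSketch {

    /** Streams the file through a SHA-256 digest (assumed algorithm) and returns lowercase hex. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), digest)) {
            // Reading the stream updates the digest; the bytes themselves are discarded.
            in.transferTo(OutputStream.nullOutputStream());
        }
        return HexFormat.of().formatHex(digest.digest());
    }

    /** Returns true when the file's checksum equals the expected hex string (case-insensitive). */
    static boolean matches(Path file, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(file).equalsIgnoreCase(expectedHex);
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("dfm-checksum", ".csv");
        Files.writeString(tmp, "id,name\n1,example\n");
        System.out.println(sha256Hex(tmp));
        Files.delete(tmp);
    }
}
```

Streaming through DigestInputStream keeps memory use flat, which matters for uploads near the 500MB file size limit.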
- accounts: User accounts with soft delete
- sites: Client sites with domain-based authentication
- batches: Upload sessions with lifecycle tracking
- uploaded_files: File metadata with S3 keys and checksums
- error_logs: Partitioned by month with JSONB metadata
- One Active Batch Per Site: Only one IN_PROGRESS batch allowed per site
- Concurrent Batch Limit: Maximum 5 active batches per account
- Batch Timeout: Batches auto-expire after 60 minutes (configurable)
- Cascade Deactivation: Deactivating account deactivates all sites
- File Size Limit: 500MB per file upload
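The batch rules in this list can be sketched as a small in-memory check. This is not the actual BatchLifecycleService logic; BatchRulesSketch and its members are hypothetical, the 500MB limit is assumed to mean 500 MiB, and a batch is assumed to expire once it is at least 60 minutes old.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the documented batch rules.
public class BatchRulesSketch {

    static final int MAX_ACTIVE_PER_ACCOUNT = 5;
    static final Duration BATCH_TIMEOUT = Duration.ofMinutes(60); // configurable in DFM
    static final long MAX_FILE_BYTES = 500L * 1024 * 1024;        // assumes 500MB = 500 MiB

    record ActiveBatch(String accountId, String siteId, Instant startedAt) {}

    private final List<ActiveBatch> active = new ArrayList<>();

    /** Starts a batch if both the per-site and per-account rules allow it. */
    boolean start(String accountId, String siteId, Instant now) {
        boolean siteBusy = active.stream().anyMatch(b -> b.siteId().equals(siteId));
        if (siteBusy) {
            return false; // one IN_PROGRESS batch per site
        }
        long forAccount = active.stream()
                .filter(b -> b.accountId().equals(accountId))
                .count();
        if (forAccount >= MAX_ACTIVE_PER_ACCOUNT) {
            return false; // at most 5 active batches per account
        }
        active.add(new ActiveBatch(accountId, siteId, now));
        return true;
    }

    /** Drops batches that have reached the timeout and returns how many were expired. */
    int expire(Instant now) {
        int before = active.size();
        active.removeIf(b ->
                Duration.between(b.startedAt(), now).compareTo(BATCH_TIMEOUT) >= 0);
        return before - active.size();
    }

    /** Checks a single file against the upload size limit. */
    static boolean fileSizeAllowed(long bytes) {
        return bytes <= MAX_FILE_BYTES;
    }

    public static void main(String[] args) {
        BatchRulesSketch rules = new BatchRulesSketch();
        Instant t0 = Instant.parse("2024-01-01T12:00:00Z");
        System.out.println(rules.start("account-1", "example.com", t0));
        System.out.println(rules.start("account-1", "example.com", t0));
        System.out.println(rules.expire(t0.plus(BATCH_TIMEOUT)));
    }
}
```

In the real service these checks would presumably run against the batches table inside a transaction, so that two concurrent start requests for the same site cannot both succeed.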
./gradlew test
./gradlew integrationTest
./gradlew contractTest
./gradlew jacocoTestReport
open build/reports/jacoco/test/html/index.html
curl http://localhost:8080/actuator/health
Response:
{
"status": "UP",
"components": {
"db": { "status": "UP" },
"s3": { "status": "UP", "details": { "bucket": "dataforge-uploads" } },
"diskSpace": { "status": "UP" }
}
}
curl http://localhost:8080/actuator/metrics
Custom metrics:
- batch.started - Total batches started
- batch.completed - Total batches completed
- batch.failed - Total batches failed
- files.uploaded - Total files uploaded
- error.logged - Total errors logged
Structured JSON logging in production:
{
"@timestamp": "2024-01-01T12:00:00.000Z",
"level": "INFO",
"logger": "com.bitbi.dfm.batch.application.BatchLifecycleService",
"message": "Starting new batch",
"batchId": "550e8400-e29b-41d4-a716-446655440000",
"siteId": "123e4567-e89b-12d3-a456-426614174000",
"application": "data-forge-middleware"
}
spring:
  profiles:
    active: prod
  datasource:
    url: jdbc:postgresql://<rds-endpoint>:5432/dataforge
    hikari:
      maximum-pool-size: 20
      minimum-idle: 5
s3:
  bucket:
    name: prod-dataforge-uploads
  region: us-east-1
  # Uses IAM role credentials in production
logging:
  level:
    root: INFO
    com.bitbi.dfm: INFO
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
COPY build/libs/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
Build and run:
./gradlew bootJar
docker build -t dataforge-middleware .
docker run -p 8080:8080 \
-e SPRING_PROFILES_ACTIVE=prod \
-e SPRING_DATASOURCE_URL=jdbc:postgresql://db:5432/dataforge \
dataforge-middleware
- Follow Java 21 conventions
- Use Lombok for boilerplate reduction
- Domain-driven design principles
- Package by layered feature (PbLF)
- Create feature branch: git checkout -b feature/my-feature
- Write tests for new functionality
- Ensure all tests pass: ./gradlew test
- Update documentation as needed
- Submit PR with clear description
Proprietary - Bit BI
For issues or questions:
- Email: support@bitbi.com
- Documentation: https://docs.dataforge.bitbi.com