A high-performance distributed system for processing large CSV files (e.g., millions of rows) using gRPC streaming, a Django REST gateway, and a Next.js frontend.
The app demonstrates memory-efficient streaming, asynchronous background job handling, and live upload and download management.
Frontend (Next.js) → REST Gateway (Django) → (async job) → gRPC CSV Processor → Storage Directory → Aggregated CSV Output
- Frontend (Next.js) – Allows users to upload CSV files and track processing jobs.
- Backend (Django) – Receives uploads, starts a background process, and sends the CSV to the gRPC service.
- gRPC Service (Python) – Streams the CSV bytes, aggregates data efficiently, and returns summary metrics.
- Storage – Stores input files and the aggregated CSV output.
- Frontend (Next.js) – Allows CSV upload, progress tracking, and downloading results.
- Backend Gateway (Django + DRF) – Handles file uploads asynchronously, dispatches background gRPC jobs, and serves processed results.
- gRPC Service (Python) – Streams CSV chunks and aggregates totals per department efficiently.
Core Goal: Aggregate the number of sales per department from huge CSVs while keeping memory usage constant.
Steps:
1. The client (Django) sends CSV chunks (64 KB each) to the gRPC server via a streaming RPC.
2. The server's `StreamingCSVAggregator` processes each line immediately — it never loads the full file into memory:
   `totals[department] += number_of_sales`
3. After all chunks are received, the totals dictionary is written out as a new CSV.
4. The gRPC server returns the total number of departments, the number of rows processed, a download URL, and metrics (processing time, memory usage).
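The aggregation step above can be sketched in plain Python. This is a hypothetical illustration of the idea behind `StreamingCSVAggregator` (the names `aggregate_chunks`, `department`, and `number_of_sales` are assumptions, not the project's actual API): only complete lines are parsed as they arrive, and only the running totals dictionary stays in memory.

```python
from collections import defaultdict

def aggregate_chunks(chunks):
    """Aggregate sales per department from an iterable of CSV byte chunks.

    Only complete lines are parsed; a trailing partial line is buffered
    until the next chunk completes it, so memory stays O(departments).
    """
    totals = defaultdict(int)
    buffer = b""
    header = None
    for chunk in chunks:
        buffer += chunk
        # Split off complete lines; keep the trailing partial line buffered.
        *lines, buffer = buffer.split(b"\n")
        for raw in lines:
            line = raw.decode("utf-8").strip()
            if not line:
                continue
            if header is None:
                header = line.split(",")  # first complete line is the header
                continue
            row = dict(zip(header, line.split(",")))
            totals[row["department"]] += int(row["number_of_sales"])
    if buffer.strip() and header is not None:
        # Flush a final row that had no trailing newline.
        row = dict(zip(header, buffer.decode("utf-8").strip().split(",")))
        totals[row["department"]] += int(row["number_of_sales"])
    return dict(totals)
```

Because each chunk is discarded after its lines are consumed, peak memory is bounded by the totals dictionary plus one chunk, regardless of file size.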
- Large CSVs (hundreds of MB or millions of rows): processed incrementally via streamed bytes.
- Memory spikes: only the aggregated totals dictionary (e.g., `{department → total_sales}`) is kept in memory.
- Monitoring: `tracemalloc` is used to measure peak memory during background processing.
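A minimal sketch of the `tracemalloc` monitoring mentioned above (the helper name `measure_peak` is an assumption; the project's actual instrumentation may differ):

```python
import tracemalloc

def measure_peak(fn, *args):
    """Run fn(*args) and return (result, peak_bytes) measured by tracemalloc."""
    tracemalloc.start()
    try:
        result = fn(*args)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, peak

# Example: summing a generator keeps the traced peak small,
# just as streaming aggregation keeps the job's footprint constant.
result, peak_bytes = measure_peak(sum, (i for i in range(100_000)))
print(f"result={result}, peak={peak_bytes / 1024:.1f} KiB")
```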
| Step | Complexity | Notes |
|---|---|---|
| Read + parse CSV | O(N) | Each row is processed once. |
| Aggregation (hash map) | O(1) per row | Dictionary insert/update for department totals. |
| Total memory | O(D) | Only department totals are kept, not all rows. |

Overall: O(N) time, O(D) space, where N is the number of rows and D the number of departments.
- Python 3.9+
- Node.js 18+
- `pip install grpcio grpcio-tools djangorestframework drf-yasg python-dotenv` (python-dotenv is optional)
- `npm install` for frontend dependencies
- GRPC_SERVER_ADDR=localhost:50051
- BASE_URL=http://127.0.0.1:8002/api
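One way the gateway might consume these variables (a sketch using only the standard library, falling back to the documented defaults; the project's actual settings module may read them differently, e.g. via python-dotenv):

```python
import os

# Address of the gRPC CSV processor and the gateway's public API base URL.
# Defaults match the values documented above.
GRPC_SERVER_ADDR = os.environ.get("GRPC_SERVER_ADDR", "localhost:50051")
BASE_URL = os.environ.get("BASE_URL", "http://127.0.0.1:8002/api")
```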
cd backend
python -m venv .venv
.venv\Scripts\activate   # Windows (macOS/Linux: source .venv/bin/activate)
pip install -r requirements.txt
cd backend/grpc_service
# (Re)generate protobuf files
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. csv_upload.proto
# Run the async gRPC server
python server.py
### This service
- Receives CSV chunks via streaming.
- Uses a streaming aggregator to process rows without loading the full file into memory.
- Writes summarized results (totals per department) to `/grpc_service/storage`.
cd backend/gateway
python manage.py migrate
python manage.py runserver
cd frontend
npm install
npm run dev
Then open http://localhost:3000