
CSV Processing System (gRPC + Django Gateway + Next.js Frontend)

A high-performance distributed system for processing large CSV files (e.g., millions of rows) using gRPC streaming, a Django REST gateway, and a Next.js frontend.
The app demonstrates memory-efficient streaming, asynchronous background job handling, and live upload and download management.


Project Architecture

Frontend (Next.js) → REST Gateway (Django) → (async job) → gRPC CSV Processor → Storage Directory → Aggregated CSV Output

  • Frontend (Next.js) lets users upload CSV files and track processing jobs.
  • Backend (Django) receives uploads, starts a background process, and streams the CSV to the gRPC service.
  • gRPC Service (Python) streams the CSV bytes, aggregates data efficiently, and returns summary metrics.
  • Storage holds the input files and the aggregated CSV output.

Components

  • Frontend (Next.js) – Allows CSV upload, progress tracking, and downloading results.
  • Backend Gateway (Django + DRF) – Handles file uploads asynchronously, dispatches background gRPC jobs, and serves processed results.
  • gRPC Service (Python) – Streams CSV chunks and aggregates totals per department efficiently.
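The client-side half of this pipeline can be sketched as a simple chunk generator. The 64 KB chunk size comes from the algorithm description below; the function name and the in-memory file are illustrative, not the repo's actual API:

```python
import io

CHUNK_SIZE = 64 * 1024  # 64 KB per streamed message, as described below

def read_chunks(fileobj):
    """Yield fixed-size byte chunks so the whole file never sits in memory."""
    while True:
        chunk = fileobj.read(CHUNK_SIZE)
        if not chunk:
            break
        yield chunk

# Example: stream a small in-memory CSV instead of an uploaded file
data = io.BytesIO(b"department,number_of_sales\nToys,3\nBooks,5\n")
chunks = list(read_chunks(data))
```

In the real system each yielded chunk would be wrapped in a protobuf message and sent over the client-streaming RPC.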

Algorithm Explanation

Core goal: aggregate the number of sales per department from huge CSVs while keeping memory usage constant (independent of file size).

Steps:

1. The client (Django) sends CSV chunks (64 KB each) to the gRPC server via a client-streaming RPC.
2. The server's StreamingCSVAggregator processes each line as it arrives — it never loads the full file into memory:

   totals[department] += number_of_sales

3. After the last chunk, the totals dictionary is written out as a new CSV.
4. The gRPC server returns the number of departments, rows processed, a download URL, and metrics (processing time, memory usage).
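The steps above can be sketched as a streaming aggregator. The class and column names mirror the README's description but are assumptions, not the repo's actual StreamingCSVAggregator API; the key detail is buffering the partial line that a chunk boundary may split:

```python
from collections import defaultdict

class StreamingCSVAggregator:
    def __init__(self):
        self.totals = defaultdict(int)  # department -> total sales, O(D) memory
        self._buffer = b""              # holds a partial line between chunks
        self._header_seen = False

    def feed(self, chunk: bytes):
        """Process one streamed chunk; only complete lines are consumed."""
        self._buffer += chunk
        *lines, self._buffer = self._buffer.split(b"\n")
        for line in lines:
            if not line.strip():
                continue
            if not self._header_seen:   # skip the CSV header row
                self._header_seen = True
                continue
            department, sales = line.decode().rsplit(",", 1)
            self.totals[department] += int(sales)

    def finish(self):
        """Flush a trailing row that had no final newline, then return totals."""
        if self._buffer.strip():
            self.feed(b"\n")
        return dict(self.totals)

# Example: two chunks whose boundary splits the "Books,5" row
agg = StreamingCSVAggregator()
agg.feed(b"department,number_of_sales\nToys,3\nBo")
agg.feed(b"oks,5\nToys,2")
totals = agg.finish()  # {"Toys": 5, "Books": 5}
```

Because only complete lines are parsed and only the totals dictionary persists, memory use stays flat no matter how many chunks arrive.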

Memory-Efficiency Strategy

  • Large CSVs (hundreds of MB or millions of rows): processed incrementally by streaming bytes.
  • Memory spikes: avoided by storing only the aggregated totals dictionary (e.g. {Department → total_sales}).
  • Monitoring: tracemalloc measures peak memory during background processing.
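The tracemalloc monitoring mentioned above can be sketched as follows; the aggregation loop here is a stand-in, and the exact call sites in the repo's background worker are assumptions:

```python
import tracemalloc

tracemalloc.start()

# Stand-in for the streaming aggregation; only the totals dict persists.
totals = {}
for i in range(100_000):
    dept = f"dept-{i % 10}"
    totals[dept] = totals.get(dept, 0) + 1

current, peak = tracemalloc.get_traced_memory()  # both values in bytes
tracemalloc.stop()

peak_mb = peak / (1024 * 1024)  # peak memory, suitable for the metrics response
```

Reporting `peak` rather than `current` captures any transient spike during processing, which is what the returned metrics describe.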

Complexity Analysis

Operation                Complexity     Notes
Read + parse CSV         O(N) time      Each row is processed once.
Aggregation (hash map)   O(1) per row   Dictionary insert/update for department totals.
Total memory             O(D)           Only department totals are kept, not all rows.

Overall: O(N) time, O(D) space, where N is the number of rows and D the number of distinct departments.
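A tiny numerical check of the O(D)-space claim, using made-up row data in the README's department/number_of_sales shape: after aggregating N rows, the state held in memory is bounded by the D distinct departments, not by N.

```python
import random

N, D = 100_000, 25
departments = [f"dept-{i}" for i in range(D)]

totals = {}
for _ in range(N):                          # O(N) time: each row touched once
    dept = random.choice(departments)
    totals[dept] = totals.get(dept, 0) + 1  # O(1) expected dict update

# totals now has at most D entries regardless of N.
```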

How to Run the System

1. Prerequisites

  • Python 3.9+
  • Node.js 18+
  • pip install grpcio grpcio-tools djangorestframework drf-yasg python-dotenv
  • (Optional) npm install for frontend dependencies

2. Environment variables (from .env):

3. Setup gRPC Service

cd backend
python -m venv .venv
.venv\Scripts\activate   # Windows; use `source .venv/bin/activate` on macOS/Linux
pip install -r requirements.txt
cd grpc_service

# (Re)generate protobuf files
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. csv_upload.proto

# Run the async gRPC server
python server.py

This service receives CSV chunks via streaming, uses a streaming aggregator to process rows without loading the full file into memory, and writes summarized results (totals per department) to /grpc_service/storage.

4. Setup Django Gateway

cd backend/gateway
python manage.py migrate
python manage.py runserver

5. Run Frontend (NextJS)

cd frontend
npm install
npm run dev

Then open http://localhost:3000
