
feat(api): implement bulk data insertion endpoint with proper validat… #111

Open

ayash911 wants to merge 2 commits into geturbackend:main from ayash911:feat/bulk-operations

Conversation


@ayash911 ayash911 commented Apr 16, 2026

Pull Request Description

Fixes #101
[NSoC'26]

This PR introduces a dedicated bulk operations endpoint (POST /api/data/:collectionName/bulk) to support high-throughput applications and data migrations. It reduces HTTP round-trips by allowing 100+ document inserts in a single request and correctly enforces project database size quotas mid-flight. Mongoose's insertMany with ordered: false ensures valid records are preserved even if some records fail due to constraints, responding accurately with a 207 Multi-Status object that identifies failing records by index.
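The partial-success behavior described above can be sketched as a small status-selection helper. This is illustrative only; the function and field names are assumptions, not the PR's actual code:

```javascript
// Hypothetical sketch of the status logic described above. With
// insertMany(..., { ordered: false }), valid documents are kept even when
// others fail, so the response status depends on how many made it in.
function buildBulkResponse(totalCount, insertedCount, errors) {
  if (insertedCount === 0) {
    // No document was inserted: a plain client error.
    return { status: 400, body: { insertedCount: 0, errors } };
  }
  if (insertedCount < totalCount) {
    // Mixed outcome: 207 Multi-Status, with each error entry carrying the
    // index of the failing document in the original request array.
    return { status: 207, body: { insertedCount, errors } };
  }
  // Every document was inserted.
  return { status: 201, body: { insertedCount, errors: [] } };
}
```

For example, a batch of 3 documents where 1 fails a unique-index constraint would yield `buildBulkResponse(3, 2, [{ index: 1 }])`, i.e. a 207 response identifying the failing record by index.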

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • UI/UX improvement (Frontend only)
  • Refactor / Chore

Testing & Validation

Backend Verification:

  • I have run npm test in the backend/ directory and all tests passed.
  • I have verified the API endpoints using Postman/Thunder Client.
  • New unit tests have been added (if applicable).

Frontend Verification:

  • I have run npm run lint in the frontend/ directory.
  • Verified the UI changes on different screen sizes (Responsive).
  • Checked for any console errors in the browser dev tools.

Screenshots / Recordings (Optional)

Checklist

  • My code follows the code style of this project.
  • I have performed a self-review of my code.
  • I have commented my code, particularly in hard-to-understand areas.
  • My changes generate no new warnings or errors.
  • I have updated the documentation (README/Docs) accordingly.

Built with love for urBackend.

Summary by CodeRabbit

  • New Features

    • Added bulk insert endpoint supporting multiple-record insertion with per-item validation and partial success handling.
    • Enforces database size limits at batch level.
    • Dispatches webhook notifications for successfully inserted items.
  • Tests

    • Added comprehensive test suite covering bulk insert scenarios including validation failures, partial successes, and edge cases.


coderabbitai Bot commented Apr 16, 2026

📝 Walkthrough

A new bulk insert endpoint (POST /:collectionName/bulk) has been implemented in the public API controller and routes. The feature validates array payloads, enforces database size limits, performs ordered-false insertions via MongoDB, handles partial failures with HTTP 207 responses, updates project database usage, and dispatches webhooks for successfully inserted documents.

Changes

  • Bulk Insert Implementation (apps/public-api/src/controllers/data.controller.js, apps/public-api/src/routes/data.js): Added insertBulkData controller handler that validates array payloads, enforces database limits, performs bulk inserts with insertMany({ ordered: false }), captures and maps bulk-write errors to original indices, updates project database usage, and dispatches webhooks. Wired new POST /:collectionName/bulk route with existing middleware chain (verifyApiKey, blockUsersCollectionDataAccess, resolvePublicAuthContext, projectRateLimiter, authorizeWriteOperation). Also adjusted error logging in aggregateData to skip console output for ZodError and test environments.
  • Bulk Insert Tests (apps/public-api/src/__tests__/data.controller.bulk.test.js): Added comprehensive Jest test suite for insertBulkData covering: non-array rejection (HTTP 400), empty array rejection (HTTP 400), full success (HTTP 201), partial validation failure (HTTP 207 with per-item error indexing), all-items-fail validation (HTTP 400), database limit exceeded (HTTP 403), and MongoBulkWriteError handling with inserted count derivation and webhook dispatching.
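The per-item error indexing mentioned above hinges on one subtlety: insertMany receives only the documents that passed validation, so a MongoBulkWriteError's write-error index refers to the filtered array, not the caller's payload. A sketch of that translation (names are illustrative, not the repository's actual code):

```javascript
// Hypothetical helper mapping bulk-write error indices back to the
// caller's original array positions. validIndicesMap[i] holds the
// original payload index of the i-th item that passed validation.
function mapBulkErrorsToOriginalIndices(writeErrors, validIndicesMap) {
  return writeErrors.map((writeErr) => ({
    // Position in the original request payload, not the filtered batch.
    index: validIndicesMap[writeErr.index],
    // Sanitized message; no raw driver errmsg is exposed.
    message: "Insert failed for this document",
  }));
}

// Example: items 0 and 2 passed validation (item 1 failed earlier), so a
// write error at filtered index 1 maps back to original index 2.
const mapped = mapBulkErrorsToOriginalIndices([{ index: 1 }], [0, 2]);
// → mapped[0].index === 2
```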

Sequence Diagram

sequenceDiagram
    actor Client
    participant API as Public API
    participant Validation as Validator
    participant DB as MongoDB
    participant Project as Project Model
    participant Webhooks as Webhook Dispatcher

    Client->>API: POST /bulk (array of items)
    API->>Validation: Validate each item schema
    Validation-->>API: Per-item validation results
    
    alt Any items fail validation
        API->>API: Filter valid items only
    end
    
    API->>Project: Check database limit
    alt Limit exceeded
        API-->>Client: HTTP 403
    else Within limit
        API->>DB: insertMany(validItems, {ordered: false})
        alt Partial/full success
            DB-->>API: Insertion results + errors
            API->>API: Map bulk errors to original indices
            API->>Project: Update databaseUsed
            Project-->>API: Acknowledgment
            loop For each inserted item
                API->>Webhooks: Dispatch webhook
            end
            API-->>Client: HTTP 207 (partial success)
        else All items fail
            API-->>Client: HTTP 400
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • yash-pouranik

Poem

🐰 Hops bounce with glee, bulk inserts flow,
Many items in one request we go,
Validation springs forth, limits we check,
MongoDB batches with ordered-false deck,
Webhooks dance as each document finds home!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): The title directly and specifically describes the main change: implementing a bulk data insertion endpoint with validation.
  • Linked Issues check (✅ Passed): All technical requirements from issue #101 are met: POST /api/data/:collectionName/bulk endpoint created, validation for arrays implemented, 100+ records supported, per-index error reporting implemented, and database quotas enforced.
  • Out of Scope Changes check (✅ Passed): All changes are directly scoped to bulk operations implementation. Minor adjustments to error logging in aggregateData are incidental to the feature scope.
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/public-api/src/controllers/data.controller.js`:
- Around line 160-165: The quota check and the later increment for
project.databaseUsed have a race condition under concurrent bulk writes: instead
of relying on an in-memory check against project.databaseUsed and only
incrementing after insert (variables/functions: project.resources.db.isExternal,
batchDocSize, project.databaseUsed, project.databaseLimit, and the later
increment block), make the reservation atomic by moving the quota enforcement
into the backing store (e.g., perform an atomic conditional update like a DB
transaction or a findOneAndUpdate with $inc and a conditional that ensures
(databaseUsed + batchDocSize) <= databaseLimit), or wrap the insert and usage
increment in a single DB transaction/row lock; if atomic DB updates/transactions
aren’t available, introduce a server-side locking mechanism to serialize quota
checks and increments around this code path.
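The atomic reservation suggested in the comment above could take the following shape. The field names (databaseUsed, databaseLimit) come from the review comment itself; the helper and query structure are assumptions, not existing code:

```javascript
// Hypothetical builder for an atomic quota reservation: the filter uses
// $expr so that the quota check and the $inc happen in a single atomic
// document update, closing the check-then-increment race.
function buildQuotaReservation(projectId, batchDocSize) {
  return {
    filter: {
      _id: projectId,
      // Match only when the reservation still fits within the limit.
      $expr: {
        $lte: [{ $add: ["$databaseUsed", batchDocSize] }, "$databaseLimit"],
      },
    },
    update: { $inc: { databaseUsed: batchDocSize } },
  };
}

// Usage sketch: a null result means the quota would be exceeded → HTTP 403.
// const q = buildQuotaReservation(project._id, batchDocSize);
// const reserved = await Project.findOneAndUpdate(q.filter, q.update, { new: true });
// if (!reserved) return res.status(403).json({ error: "Database limit exceeded" });
```

Because the conditional and the increment execute in one document-level operation, two concurrent bulk requests cannot both pass the check and jointly exceed the limit; a failed insert would then need a compensating decrement.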
- Around line 179-195: The controller currently exposes raw Mongo errors and
rethrows raw Error objects; replace those exposures with AppError usage: where
the code references writeErr.errmsg, err.message, and the rethrow (inside the
if/else handling around insertedData, validIndicesMap, errors), map/translate DB
error details to sanitized, user-friendly messages and push new AppError
instances (or convert them via AppError.wrap/construct) into your error flow
instead of including raw errmsg or message, and instead of "throw err" throw a
new AppError with a safe message and appropriate status/code; update handling
around variables insertedData, writeErr, err, validIndicesMap, and errors to
ensure no Mongo internals are returned to the client.
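One possible shape for the sanitization requested above. The AppError class here is a minimal stand-in; the project's own error type may differ:

```javascript
// Minimal illustrative AppError: an operational error with a safe,
// client-facing message and an HTTP status code.
class AppError extends Error {
  constructor(message, statusCode) {
    super(message);
    this.statusCode = statusCode;
    this.isOperational = true;
  }
}

// Translate raw driver errors into sanitized AppErrors so fields like
// writeErr.errmsg or err.message never reach the client.
function sanitizeMongoError(err) {
  // 11000 is MongoDB's duplicate-key error code.
  if (err && err.code === 11000) {
    return new AppError("A document violated a unique constraint.", 409);
  }
  // Generic fallback instead of leaking err.message.
  return new AppError("Bulk insert failed due to a database error.", 500);
}
```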
- Around line 119-125: The endpoint currently returns mixed top-level response
shapes (e.g., {error}, {insertedCount}, {errors}, {insertedData}) when
validating incomingDataArray and after processing; change all responses in this
controller that use res.status(...).json(...) to follow the API envelope {
success: boolean, data: {...}, message: string }. For validation failures (e.g.,
when !Array.isArray(incomingDataArray) or incomingDataArray.length === 0 and
other error paths), return res.status(400).json({ success: false, data: {},
message: "Payload must be an array of objects." }) (or the appropriate message).
For success/bulk results (places that currently send
insertedCount/insertedData/errors at top-level), return res.status(200).json({
success: true, data: { insertedCount, insertedData, errors }, message: "Bulk
insert completed" }) and for partial failures set success to false but still
embed details under data. Update every occurrence that references
incomingDataArray validation and the bulk processing response helpers so the
shape is consistent (search for incomingDataArray, insertedCount, insertedData,
errors in this file to apply changes).
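The uniform envelope proposed above can be sketched as a pair of small helpers. The names and messages are taken from the review comment where given; the helpers themselves are hypothetical:

```javascript
// Hypothetical helper producing the single response shape the review
// proposes: { success, data, message } at the top level of every reply.
function envelope(success, data, message) {
  return { success, data, message };
}

// Example validation path using the envelope: every error response now
// shares one top-level shape instead of ad-hoc { error } objects.
function validateBulkPayload(incomingDataArray) {
  if (!Array.isArray(incomingDataArray) || incomingDataArray.length === 0) {
    return {
      status: 400,
      body: envelope(false, {}, "Payload must be an array of objects."),
    };
  }
  return null; // payload is acceptable; continue with bulk processing
}
```

Success and partial-failure paths would then embed insertedCount, insertedData, and errors under `data` rather than at the top level.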

📥 Commits

Reviewing files that changed from the base of the PR and between 4e6b3c7 and 35bfa7b.

📒 Files selected for processing (3)
  • apps/public-api/src/__tests__/data.controller.bulk.test.js
  • apps/public-api/src/controllers/data.controller.js
  • apps/public-api/src/routes/data.js

@yash-pouranik

Will review this by today.

@yash-pouranik

@coderabbitai Ok to merge??

@yash-pouranik added the labels enhancement (New feature or request), backend, NSOC'26, level-2, and 5 - points on Apr 18, 2026
Contributor

coderabbitai Bot commented Apr 18, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get PR details, reviews, and comments
gh pr view 111 --repo geturbackend/urBackend --json state,reviews,reviewDecision,statusCheckRollup,comments,mergeable

echo "---REVIEW_COMMENTS---"
gh pr view 111 --repo geturbackend/urBackend --json reviewThreads --jq '.reviewThreads[] | {isResolved: .isResolved, body: .comments[0].body[:200]}'

Length of output: 23133


🏁 Script executed:

#!/bin/bash
# Get inline review comments via REST API to check if they are resolved
gh api repos/geturbackend/urBackend/pulls/111/comments --jq '[.[] | {id: .id, path: .path, line: .line, body: (.body[:150]), resolved: .resolved}]'

Length of output: 812


@yash-pouranik Not quite yet — while all CI checks are green ✅ and there are no merge conflicts, there are 3 unresolved Major review comments that should be addressed before merging:


❌ Open Issues (all Major severity)

  1. Inconsistent response shapes (data.controller.js, Line 134)
    The endpoint returns mixed top-level shapes ({ error }, { insertedCount }, { errors }, etc.) depending on the code path. These should be normalized to a standard API envelope like { success: boolean, data: {...}, message: string } for consistency with the rest of the API.

  2. Race condition in quota enforcement (data.controller.js, Line 198)
    The quota check reads project.databaseUsed in memory and increments it only after the insert. Under concurrent bulk writes, two requests can both pass the quota check simultaneously and together exceed the limit. The fix is to make the reservation atomic — e.g., via a findOneAndUpdate with a conditional $inc, or a DB-level transaction/lock.

  3. Raw MongoDB errors exposed to clients (data.controller.js)
    Fields like writeErr.errmsg and raw err.message are returned directly in API responses, leaking internal DB details. These should be sanitized and wrapped in AppError (or equivalent) before being sent to the client.


✅ What's Good

  • All CI checks pass (lint, unit tests, CodeQL, public-api-tests)
  • Issue #101 acceptance criteria are all met per pre-merge checks
  • PR scope is clean and well-tested

I'd recommend addressing those 3 issues — particularly the race condition (#2) which is a correctness/security concern — before merging. Let me know if you'd like help implementing any of the fixes!

@yash-pouranik

@ayash911 please address those 3 issues and commit.

@yash-pouranik

@ayash911 bro please fix this, then will merge na


Labels

backend, enhancement (New feature or request), level-2, 5 - points, NSOC'26


Development

Successfully merging this pull request may close these issues.

[Backend] Bulk Operations Endpoint **Difficulty**: Intermediate | **Area**: apps/public-api

2 participants