Fix for concurrency problem in Elasticsearch and fix for excluding invalid mountnames. Related issues are 1472, 901 and 1455 #1495

Merged
PrathibaJee merged 3 commits into v2.0.0_impl from KarthikeyanV/issue1472 on Oct 9, 2025

Conversation


@KarthikeyanV-TechM (Collaborator) commented Oct 1, 2025

Details of changes:

  1. Added a per-mountname lock directly inside regardDeviceAlarm. This way, if Kafka delivers hundreds of updates for the same device at once, they’ll queue up and be processed sequentially per device (no version conflicts in Elasticsearch).
  2. All messages for the same mountname will be queued in sequence.
  3. Different devices (mountname values) are processed in parallel (no global bottleneck).
  4. Completely prevents version_conflict_engine_exception errors and data loss in ES.
  5. Added a new utility, alarmLogTracker.js, which tracks and logs all alarm-notification-related updates into a log file, alarm-notification.log.
  6. This log file retains up to 200 MB of logs for 10 days.
  7. Fix for excluding invalid mountnames from /v1/provide-list-of-cached-devices API.
  8. Related issues are 1472, 901 and 1455
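The queuing behaviour described above can be sketched as a small per-key async lock. This is an illustrative sketch only (runExclusive and the demo key names are hypothetical, not the PR's actual code): calls sharing a mountname run strictly in order, while different mountnames proceed in parallel, and the map entry is dropped once a key's queue drains.

```javascript
// Per-key lock: tasks with the same key run sequentially,
// different keys run concurrently.
const locks = new Map();

function runExclusive(key, fn) {
  const previous = locks.get(key) || Promise.resolve();
  // Chain this task after whatever is already queued for the key.
  const current = previous.then(fn);
  // Keep the chain alive even if fn rejects, so one failure
  // does not block later updates for the same device.
  const tail = current.catch(() => {});
  locks.set(key, tail);
  // Drop the map entry once this key's queue drains,
  // so the map does not grow without bound.
  tail.then(() => {
    if (locks.get(key) === tail) locks.delete(key);
  });
  return current;
}
```

With this shape, hundreds of Kafka updates for one device chain onto a single promise tail, so Elasticsearch never sees two concurrent writes for that document.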

@KarthikeyanV-TechM changed the title from "Fix for concurrency problem in Elasticsearch. Related issues are 1472 and 901" to "Fix for concurrency problem in Elasticsearch and fix for excluding invalid mountnames. Related issues are 1472, 901 and 1455" on Oct 2, 2025

@PrathibaJee (Collaborator) left a comment


Hi Karthikeyan,
Thanks for working this out in detail.
The proposed solution looks good. After going through the changes, and considering that the prod environment produces millions of alarms for the MWDI consumer, I have some doubts regarding:

  1. memory leaks &
  2. error isolation

Feel free to either fix or comment further. Many thanks.


return result;
} finally {
release();
Collaborator


Good to release the lock in the finally block.

Collaborator Author


Done.

let alarmTypeQualifier = currentJSON['alarm-type-qualifier'];
let problemSeverity = currentJSON['problem-severity'];
let mountname = decodeMountName(resource, false);
// 🔹 Wrap the critical section with a per-mountname lock
Collaborator


Kindly remove the diamond symbol. Thanks.

Collaborator Author


Done.

.gitignore Outdated
@@ -1 +1 @@

.idea/
Collaborator


If this is specific to your local environment, please remove it. Thanks.

Collaborator Author


Done.

return result;
} finally {
release();
queueCounts.set(id, queueCounts.get(id) - 1);
Collaborator


We're storing a Promise and a queue count for each mountname (id) in locks and queueCounts, but never removing them after processing completes.
As MWDI already has memory consumption problems, I suggest cleaning up both maps when the queue count drops to zero. Thanks.

finally {
  release();
  const newCount = queueCounts.get(id) - 1;
  if (newCount <= 0) {
    queueCounts.delete(id);
    locks.delete(id);
  } else {
    queueCounts.set(id, newCount);
  }
}

Collaborator Author


Done.


try {
const start = Date.now();
const result = await fn();
Collaborator


If fn() hangs or throws an error, it could block the queue for that id.
To cover such scenarios, add a timeout wrapper or error isolation, something like this:

const timeout = (ms) => new Promise((_, reject) => setTimeout(() => reject(new Error("Timeout")), ms));
const result = await Promise.race([fn(), timeout(10000)]); // 10s timeout
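Putting the reviewer's two suggestions together, here is a hedged sketch of an error-isolated, timed-out task runner. The names withTimeout and processIsolated are illustrative, not the PR's actual code, and the swallow-and-log policy is one possible choice, assuming a failed update is safe to skip:

```javascript
// Reject if the given promise does not settle within ms milliseconds.
const withTimeout = (promise, ms) =>
  Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error(`Timeout after ${ms}ms`)), ms)),
  ]);

// Run fn with a timeout; log and swallow any failure so the
// per-id queue keeps draining instead of stalling on one bad update.
async function processIsolated(fn, ms = 10000) {
  try {
    return await withTimeout(fn(), ms);
  } catch (err) {
    console.error('alarm update failed:', err.message);
    return undefined;
  }
}
```

One design note: Promise.race does not cancel the losing task, so a genuinely hung fn() keeps running in the background; the race only unblocks the queue.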

Collaborator Author


Done.

@PrathibaJee PrathibaJee merged commit 45c6c69 into v2.0.0_impl Oct 9, 2025
@kmohr-soprasteria kmohr-soprasteria deleted the KarthikeyanV/issue1472 branch March 13, 2026 13:45