Design consideration to use MQ for Notifications and Reporting Request.

Introduction

This wiki page proposes using a message queue (MQ) instead of a worker process for notification and reporting requests. An MQ removes the need for dedicated polling threads (the notification thread) and so reduces memory consumption. The bottom line is that notification and reporting requests are REST based, so it makes sense to adopt a solution that is horizontally scalable, reliable, and fault tolerant.

Why MQ?

ActiveMQ offers proven scalability, availability, and performance that can grow with the customer's requirements. By contrast, a worker-based design may bring down the server if one or more notification threads end up in a deadlock.
The ActiveMQ message broker is written in Java and is very portable.
ActiveMQ can facilitate a wide range of general-purpose solutions via multiple messaging patterns.
ActiveMQ is open source software (OSS) suited to Service Oriented Architecture (SOA) deployments.

Features

ActiveMQ is a fast, feature-rich open source JMS message broker aimed primarily at loosely coupled, distributed application environments. It provides persistence and once-and-only-once message delivery assurance, and can be made highly scalable through clustering and peer-to-peer federated networks.
ActiveMQ supports both the point-to-point and publish-subscribe messaging models.
ActiveMQ provides high-availability, high-performance clustering and general-purpose asynchronous messaging.
ActiveMQ supports a variety of transport protocols, such as TCP, SSL, UDP, and multicast.
Embedded and standalone broker modes are one of the distinguishing features of ActiveMQ.
One advantage of using an embedded broker is that if the network fails, its embedded clients can still use the services of the broker.
A structured architecture provides a more tightly bound and reliable message queueing mechanism.
Well suited to large resource management architectures.
Authentication via JNDI/LDAP and SSL.
ActiveMQ also provides a scheduling mechanism, so messages can be delivered on a schedule as well.

Design

A notification or report request comes into HMIS as a REST call and is pushed onto a queue. We could have two queues, one for notifications and one for reports, as part of a queue cluster that can be scaled horizontally via a configuration change. We shall also have two error queues, one for notifications and one for reports. ActiveMQ has a loggingInterceptor, which will provide us with enough logging for messages. The message consumers shall perform the notification and reporting tasks using the Notification/Reports Builder process. Messages on the error queues can be handled by a separate process that performs a configurable number of retries before notifying the system administrator.
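
The queue layout described above can be sketched roughly as follows. This is only an illustration under stated assumptions: plain in-memory Java queues stand in for ActiveMQ destinations, and the names `QueueDesignSketch`, `submit`, `drain`, `retryErrors`, and the `builder` predicate (standing in for the Notification/Reports Builder process) are hypothetical, not ActiveMQ or HMIS APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

/** In-memory stand-in for the proposed queue layout; this is not ActiveMQ code. */
class QueueDesignSketch {
    enum Kind { NOTIFICATION, REPORT }

    /** A request as it might arrive from the REST layer (hypothetical shape). */
    static final class Request {
        final Kind kind;
        final String payload;
        Request(Kind kind, String payload) { this.kind = kind; this.payload = payload; }
    }

    // One work queue and one error queue per request kind, as in the design.
    final Queue<Request> notificationQueue = new ArrayDeque<>();
    final Queue<Request> reportQueue = new ArrayDeque<>();
    final Queue<Request> notificationErrors = new ArrayDeque<>();
    final Queue<Request> reportErrors = new ArrayDeque<>();
    final List<Request> completed = new ArrayList<>();
    final List<Request> escalatedToAdmin = new ArrayList<>();

    private Queue<Request> workQueue(Kind k) {
        return k == Kind.NOTIFICATION ? notificationQueue : reportQueue;
    }

    private Queue<Request> errorQueue(Kind k) {
        return k == Kind.NOTIFICATION ? notificationErrors : reportErrors;
    }

    /** Producer side: the REST handler only enqueues the request and returns. */
    void submit(Request r) {
        workQueue(r.kind).add(r);
    }

    /** Consumer side: run the builder; failed requests are parked on the error queue. */
    void drain(Kind kind, Predicate<Request> builder) {
        Request r;
        while ((r = workQueue(kind).poll()) != null) {
            if (attempt(r, builder)) completed.add(r); else errorQueue(kind).add(r);
        }
    }

    /** Error-queue processor: retry up to maxRetries times, then escalate to the admin. */
    void retryErrors(Kind kind, Predicate<Request> builder, int maxRetries) {
        Request r;
        while ((r = errorQueue(kind).poll()) != null) {
            boolean ok = false;
            for (int i = 0; i < maxRetries && !ok; i++) ok = attempt(r, builder);
            if (ok) completed.add(r); else escalatedToAdmin.add(r);
        }
    }

    /** A builder that throws is treated the same as one that reports failure. */
    private static boolean attempt(Request r, Predicate<Request> builder) {
        try {
            return builder.test(r);
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        QueueDesignSketch s = new QueueDesignSketch();
        s.submit(new Request(Kind.NOTIFICATION, "welcome-email"));
        s.submit(new Request(Kind.REPORT, "broken-report"));
        // Hypothetical builder: fails for payloads marked as broken.
        Predicate<Request> builder = r -> !r.payload.startsWith("broken");
        s.drain(Kind.NOTIFICATION, builder);
        s.drain(Kind.REPORT, builder);
        s.retryErrors(Kind.REPORT, builder, 3);
        System.out.println("completed=" + s.completed.size()
                + " escalated=" + s.escalatedToAdmin.size());
    }
}
```

With a real broker, `submit` would correspond to a producer sending to a queue and `drain` to one of possibly many consumers reading from it; adding consumers against the same queue is what the configuration-driven horizontal scaling would amount to.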

Conclusion

The bottom line here is that MQs are a more scalable, reliable, fault-tolerant, and portable solution than a bunch of threads waiting to act on something inserted into a DB.

Text (somewhat edited) of a Google Hangouts conversation on whether to use MQs now or later.

" Surya Yadavalli Hi Eric, made good progress on creating a worker service and setting up a scheduler for that. Basically, this acts as a MQ at at DB level. For instance, when we have a notification to be sent out, we will keep that notification in this Worker Table, and a scheduler picks this up and send it to a NotificationWorker who can send out notifications. Like wise, if there is a request for report, that request will simply get into this table and a scheuler picks up this Reporting request and sends it to ReportWorker. Surya • Wed, 5:37 PM Surya Yadavalli So, essentially, this acts as our central place for handling asynchronous requests like Notifications and Reports. So, given the type of the Request, Appropriate Work will handle the Job. Surya • Wed, 5:38 PM Surya Yadavalli Also, I have considered the scenarios, where notifications fail , reports fails so, I am working on retries, and maximum no of retires.. so, We know that something failed, so will try again, after N number of attempts, if we still fail, then we will give up and look into what happend. So, system just wont keep working on something that will never work. One thing I need to know is, what kind of reporting formats that we need ? PDF is one for sure. other than that, do we need reports in XL format as well ? Surya • Wed, 5:42 PM There are XML reports. Wed, 5:43 PM Eric Jahn Surya Yadavalli I am working on Notification for now, but like i mentioned I have reporting service also in mind, so some components are overlapped and used for both. Surya • Wed, 5:43 PM One is the AHAR. It is submitted directly to HUD, not back to the requestor. (Maybe a copy to the requestor). Another is the SSVF, which is CSV. That goes to the veteran's administration for homeless vets. Wed, 5:44 PM Eric Jahn Surya Yadavalli so, the response for a report request is either XML and CSV ? either XML or CSV ? 
Surya • Wed, 5:44 PM "Supportive Services for Veteran's Families" = SSVF No, each report has specific combinations of formats. SSVF = CSV only (maybe XML some day) AHAR = PDF and XML almost all the other reports are PDF only PDF Wed, 5:46 PM Eric Jahn Surya Yadavalli ok, so, PDF, XML , CSV that we should be able to support for now. Surya • Wed, 5:46 PM yes Wed, 5:46 PM Eric Jahn Eric/Surya, I hope this design is not final and we are still contemplating some additions to this design. Well First up I really liked your diagram Surya. I like the way you want to use templates for notification typical of a notification or a report layout. The only thing I want to add/change is to use MQs instead of workers. MQs are scalable and portable compared to worker roles. Plus if something goes wrong you can always throw things into an Error Queue. Sandeep • Thu, 12:43 AM Sandeep Dolia And re-run the report/notification from the error Queue. Sandeep • Thu, 12:54 AM Sandeep Dolia Let me know if you want me to draw out stuff. But I feel my point is very clear. MQ is a better option than running a scheduled job based on a row inserted into a table. ... Sandeep • Thu, 1:11 AM Surya Yadavalli MQ and scheduling are two different things. They are NOT alternatives to each other. Before this design i spent decent time on how this should work but, for notification and reporting purposes, we don't need MQ. ... Surya • Thu, 2:29 AM Sandeep Dolia Do we need scheduling for notification/reports ? I believe both of them are events based. Once we get a REST call for notification we perform notification and same is the case with reports MQ will provide necessary scalability for this generic process. I can write things down in a wiki. Sandeep • Thu, 2:35 AM Sandeep Dolia Here's the Wiki page. Design consideration to use MQ for Notifications and Reporting Request. https://github.com/servinglynk/OpenHMISDataWarehouse/wiki/Design-consideration-to-use-MQ-for-Notifications-and-Reporting-Request. ... 
Sandeep • Thu, 4:47 AM So it sounds like both the standalone MQ and the Worker Table are performing the same functionality of asynchronously queuing/dequeuing message tasks, since there can be long report generation times for most of the HUD reports. The MQ will be more robust, but will use more resources. Then there is the issue of needing a scheduler abstraction to act on the queue, as opposed to just using First In/First Out on the queue? Thu, 8:25 AM Eric Jahn Do I have the dimensions of the discussion fairly summarized, our am I missing/misstating a dimension? Thu, 8:37 AM Eric Jahn Sandeep Dolia MQ uses less resources compared to a bunch of threads waiting for something to be inserted into a table. Eric, You are correct here. Sandeep • Thu, 10:44 AM Sandeep • Thu, 11:00 AM So the MQ versus a simple table is definitely a robustness issue. Now onto the scheduler part. Sandeep, if we were to use a MQ, wouldn't you still need some scheduler code to pull of the Queue, and hand off the task to a thread? err pull off Thu, 11:16 AM Eric Jahn Sandeep Dolia Well, First of all we do have any requirement related to scheduling a notificaiton or report ? Secondly Active MQ has a scheduler. http://activemq.apache.org/delay-and-schedule-message-delivery.html which can run CRON jobs based of a schedule provided. MQ Consumers could be part of any JVMs They are scalable and if we find the Queue dept rising we could always add more consumers to process that request through a config change. Sandeep • Thu, 11:22 AM No, there are no scheduling requirements, per se. I think it only comes in as an abstraction from needing to have some code watching for resources to free up, at which point the scheduler pops a task off the queue stack and tells a specific worker/thread to run against it now. So it's not really a scheduler, but an event dispatcher we need, working on a FIFO basis? Thu, 11:25 AM Eric Jahn Sandeep Dolia Yes Exactly. 
MQ are the best when it comes to dispatching events of an asynchronous nature. Sandeep • Thu, 11:26 AM Okay, let's wait for Surya's response to this. I don't want to reinvent the queue wheel, but installing something like ActiveMQ is a pretty big, monolithic dependency. But I'm not sure a simple table (even if it's replicated in a data store) is good enough to be our queue for all report/export requests coming in. Thu, 11:30 AM Eric Jahn Sandeep Dolia Active MQs with a cluster is distributed and it is very simple to install an MQ in both Windows and Unix. It is simple and platform independent http://activemq.apache.org/getting-started.html So technically you could install it in your local machine too. And we could get fancy in out live environment by making it part of a cluster. http://activemq.apache.org/clustering.html Sandeep • Thu, 11:32 AM Okay, let's hear back form Surya. err from In the meantime, I'm going to do another iteration of time frames for this project, so I'll have lots of questions about estimates time/effort on stuff like the reports, and other business logic like unduplication. I like that "competing consumer" pattern in your link.
Thu, 11:44 AM Eric Jahn Surya, looking at this: https://raw.githubusercontent.com/servinglynk/OpenHMISDataWarehouse/master/doc/Notification_Service.png?token=ADs 7Kh1jXspHyxcE4vXgwYw39jkSi4xt5ks5VJt-SwA%3D%3D , what is the difference between the "Notification Worker" #5 and the "Notification Engine" #6. ? Thu, 4:40 PM Eric Jahn Sandeep Dolia Here's a quick and dirty diagram with MQs https://drive.draw.io/#G0Bxy6mznLsgoDSDk2NWxMTUs3OGs Sandeep • Thu, 4:51 PM Sandeep Dolia It is completely distributed and the Producers and the consumers may be part of a different JVM. Sandeep • Thu, 5:04 PM Surya Yadavalli ... Why do we need to a DB and polling, when we can use MQ. I am assuming this is the concern. What we did here is not an alternative approach for MQ, but we didn sorry. What we did here is not an alternative approach for MQ, but we didn't simpy use MQ. When you have a notification to be sent out, or when you receive a report request, We always should have a track in our DB, regardless of how we process (either through MQ or poll that data). Surya • Thu, 5:20 PM Sandeep Dolia And we can also have that using the MQ approach Sandeep • Thu, 5:20 PM When you have a notification to be sent out, or when you receive a report request, We always should have a track in our DB, regardless of how we process (either through MQ or poll that data).
Assuming that when a notification request is received, and when a report request is received , we keep in a INIT status (so we know this is not processed yet)., and once that is processed, we keep that status as SUCCESS or FAILURE and if its a FAILURE, we want to process again for N number of time. Regardless of the technology we choose, I believe thsi is what we want to acheive. Given this, coming to the need of MQ , we are not developing a system to process emails, ours is not a SMTP server. Notification service is a supporting system that we have to support our actual busienss functions. Do we have a Million notifications to send out in an hour on in a day ? or some X number, that we think a need of MQ. My take is NO. However, as I mentioned , if we see that need, down the line, assumign that we really need to process 100s of thousands of notifications or 100s of thousands of report requests, instead of processing the notification messages or reprotign messages directly (as you can currenlty all the messages are handed over to the Engine), we will just put them into a Queue (MQ implemenation). That is an extension to the system, but not a change to the system. so, esstentially MQ will pick up the message and a consumer (notification consumer or report request consumer ) will hand it over the engine. Questio is, if we dont' see millions or 100s thousands of notifications or Report request in an hour, why MQ ?, when we can plug in any time we want Secondly, coming to the disturbution . If you look at our Micro services, approach, this Notiication service is independent so, that issue will not arise as it can completely run on its own server / clustered servers. When notifications and report requests are asynchronous in nature, my initial thought was MQ , but then just becuase MQ serves asynchronous prupose, we dont' have to use that, if that is not needed. 
I will include, how MQ can be plugged in any time in this design and publish in sometime, if you are okay with this Sandeep, sorry for interpputing but please go ahead Surya • Thu, 5:34 PM Sandeep Dolia Surya, I hear your thought about the Engine and I'm in favor of that. Typically the Consumers would carry the engine code. I'm also in favor of persisting these requests in a DB and maintaining their state.(Had that in my design too). But we cannot design applications thinking we will not get enought messages in the future. Sandeep • Thu, 5:36 PM Sandeep, not at all, I am saying, we are not talking about an Alternative appraoch to MQ here. Surya • Thu, 5:37 PM Sandeep Dolia MQ is proven to be scalable, portable and easily configurable verses a bunch to worker threads. Processes like these when failed can bring down the server. Sandeep • Thu, 5:39 PM Surya Yadavalli give me an example ? I mean a scenario. Surya • Thu, 5:39 PM Sandeep Dolia When a worker thread goes into a an infinite loop or some unhandled exception it may effect other threads bringing the entire system out with memory issues. Sandeep • Thu, 5:40 PM Surya Yadavalli if worker thread goes into a infinite loop, that is code issue. And if there is a memory issue, that is a code issue too. Surya • Thu, 5:41 PM Sandeep Dolia It could be a data issue issue too which the code could not handle. Sandeep • Thu, 5:41 PM Surya Yadavalli meaning ? Surya • Thu, 5:42 PM Sandeep Dolia Either ways. Surya, a worker thread can only be a short term POC thing. But MQs would make this approach more portable and scalable and fault tolerant. Sorry I was tying my sentence and did not see you typed "meaning". All I'm saying is there may be multiple possible reasons for a thread failure, code issue, data issue or DB down your worker role would be stuck. Do you agree what MQ is a long term preferred approach for this situation ? Did you get a chance to look into my wiki page ? 
Sandeep • Thu, 5:46 PM Surya Yadavalli I am totally with you, if you say, we will have a 100K notifications to receive an hour. or 100K reports to generate an hour. Just that we need to have a balance on if this is requried or just doing more than needed. As long as this is something we can always plug in, you dont have to worry about , how this is scalable in the future. I have worked on one system earlier that receives 75 Million medical device messages a day and we used MQ because that totally makes sense for that scenario. Surya • Thu, 5:49 PM Sandeep Dolia The best part is we could always scale up and scale down whenever we want. If the worker thread is down then we might have to restart the server etc, Sandeep • Thu, 5:50 PM Surya Yadavalli What do you mean by worker thread is down ? Worker thread is a spring bean, What exactly do you mean by that being down ? I dont understand that part Surya • Thu, 5:51 PM Sandeep Dolia And with MQ you could send things to the Error Queue which may be processed with re-try logic later. Sandeep • Thu, 5:51 PM Surya Yadavalli When you say, worker thread is down, you are essentially saying our application is down We have more bigger issues to deal with then, not notifications Surya • Thu, 5:51 PM Sandeep Dolia The worker thread is in a deadlock situation becasuse of the code or data issues. Sandeep • Thu, 5:52 PM Surya Yadavalli Worker therad will be in a dead lock , because of code or data issues ? Surya • Thu, 5:53 PM Sandeep Dolia If you know Worker thread and don't know about different situations through which a Worker thread could cause memory issues and bring the server down then this is a bigger issue with that design. Sandeep • Thu, 6:00 PM Surya Yadavalli Lets not be generic in saying we might get into memory issues, we might get in deadlocks, worker thread can take down the servers..As much as those statements make me feel worried, I really want you come up with a real use case that will pose an issue to the system. 
Surya • Thu, 6:02 PM Sandeep Dolia Issues usually occur on a case by case basis and a very simple one could be where your code has an UnhandledException causing the the WorkerThread to consume all the memory. This was a very simple one. Sandeep • Thu, 6:04 PM Surya Yadavalli Okay, what do you mean by worker threm consuming memory ? elaborate that part of the issue Surya • Thu, 6:04 PM Sandeep Dolia The Worker Role would run in an infinite loop unable to process other reuqest. Sandeep • Thu, 6:05 PM Surya Yadavalli I am not against MQ, but I want to know if we have real concerns to be worried about , so dont get me wrong. Surya • Thu, 6:05 PM Sandeep Dolia I hear you. Worker role just didn't seam right from a scalabliliy, portability perspective. Sandeep • Thu, 6:06 PM Surya Yadavalli So, you are saying worker thread consume all the memory, if it runs infinite loop ? Surya • Thu, 6:06 PM Sandeep Dolia There may be multiple worker role blocked because of the same issue and leaving GC in a vulnerable state. Rather than pointing out a lot of defects with the Worker role process I also want to show the bright side of MQ. Look into my Wiki page and you shall get more idea. A bunch of threads to act upon depending on a DB change freaks me out. MQ process has an execellent re-try logic and will not effect existing process environment. Sandeep • Thu, 6:11 PM Surya Yadavalli ... We just don't have bunch of threads runing against DB. if you have worked with Spring batch , or quartz ? We are just doing that. MQ, We can get it set up , get it up and running and have notification service hooked up with it in no time. Surya • Thu, 6:13 PM Sandeep Dolia Ok. ... why do we want to run a batch process based on something persisted inside a DB. MQ is more generic in that context. I know Spring Batch is an easier implementation. Than setting up MQ and using it. But MQ has long term benefits. Your produces and consumers are Platform, OS independent. 
Sandeep • Thu, 6:17 PM Surya Yadavalli Anything is Java component. You dont have to worry about OS indepence here. I guess. Surya • Thu, 6:17 PM Sandeep Dolia Just specifying an advantage of MQ approach. Sandeep • Thu, 6:18 PM Surya Yadavalli You use something, when you need. Just becuase you see the word "asynchronous" you don't have to use MQ 😃 Surya • Thu, 6:18 PM Sandeep Dolia In short Spring batch can scale vertically and MQs can scale horizontally. Sandeep • Thu, 6:19 PM Surya Yadavalli if we should implement MQ, it's not difficult for us anyway. Question here is MQ can be plugged in any time, do we need it now , given the amount of requests we aniticipate in next 2 years, 3 years, 5 years down the line. Surya • Thu, 6:22 PM Sandeep Dolia That's exactly I'm saying. 2, 3,5 years down the line we could just add more nodes to consumers and they are good to go. I don't know if we will be there in next 3,5,7 years. Sandeep • Thu, 6:24 PM Surya Yadavalli We can just run notification service on another node. You are not seeing the micro service advantage though Surya • Thu, 6:24 PM Sandeep Dolia Ah.. I see that possibility too. Sandeep • Thu, 6:25 PM Surya Yadavalli The way you are designign these services has some pupose to it. That is scalability. Surya • Thu, 6:26 PM Sandeep Dolia What is your thought on the re-try logic. Sandeep • Thu, 6:26 PM Surya Yadavalli Just configure, how many times you want. it will be taken care of.

MQ doesnt offer benefits just like that. It comes with a cost. MQ also persists data in DB. You have to deal with those systems where MQ is running, and the DB where you are peristeing data , (if not file based). Surya • Thu, 6:28 PM Sandeep Dolia Just want to see what will be the cost of adding a consumer verses adding a new instance of notification service is. Sandeep • Thu, 6:28 PM Surya Yadavalli That system itslef simply won't run just like that. You require maintainnce. There are resources who just deal with MQ systems Surya • Thu, 6:29 PM Sandeep Dolia In MQs it is configurable. Sandeep • Thu, 6:29 PM Surya Yadavalli This is configurable here as well. Surya • Thu, 6:29 PM MQs with Spring has an auto clean option so you don't have to maintain a DB The messages will be deleted. Sandeep • Thu, 6:34 PM Sandeep Dolia The bottom line from my side is MicroService do provide the scalibility however we need to think about if we want to consider using Spring Batch which looks up a DB an triggers an event or an MQ approach with a cluster with logging interceptor. Sandeep • Thu, 6:38 PM I think at this point we should let Surya continue with his implementation as he's conceived it. I can't see us getting >1000 report requests per day for the foreseeable future with this system. ... You can still edit the points you've made on that page afterward. Is that okay? Also, Surya, could you add to your Notification diagram a labeled box overlay indicating where an MQ could be substituted in, for when the day comes that we'll need it? I'd like to just get something working for notification/reporting, but not dwell too much on it, since we really need to get the API and report logic implemented now. Thanks guys for going through this exercise, and it was very worthwhile, so that we are consciously accepting the consequences of our decision. 
Thu, 11:09 PM Eric Jahn Sandeep Dolia On a side note the only trivia to Surya's design will be what will happen to the REST call when we run into an unrecoverable state like database is down etc... We need to think about this because notification module is a silo. If we are not able to persist into the DB due to some reason then the batch process will not be called. Food for thought... . Sandeep • Thu, 11:17 PM So if the database is down, the REST call for a report just returns the error code. Is that sufficient? Thu, 11:26 PM Eric Jahn Sandeep Dolia Yes for reports. But notifications can be as a result of a new client added. And is performed on a backend. Should we consider having an emailSent field in the client table. ? New client is just a scenario, but there may be several scenarios like that. I just want to contribute in making the process fault tolerant from both recoverable and unrecoverable state. Sandeep • Thu, 11:32 PM I'm not understanding why we would send an email out for when a new client is added. Thu, 11:32 PM Eric Jahn Sandeep Dolia Sorry if this is not part of the requirements. I thought since we take users information we could notify them that they are part of this system somehow.... Sandeep • Thu, 11:34 PM I see generally where you're going though. If the message db is the same as the model transactional db, when things break, both go down. Thu, 11:34 PM Eric Jahn Sandeep Dolia Yes. Sandeep • Thu, 11:35 PM Oh, like system users? Thu, 11:35 PM Eric Jahn Sandeep Dolia Yes. Sandeep • Thu, 11:35 PM I'd like to just use email for really latent/asynchronous processes, like report completion and such. But I'm sure administrative functions will crop up where we have to email users. Thu, 11:37 PM Eric Jahn Sandeep Dolia Since the message Db and HMIS Db are separate Dbs it could be possible that something passes in the HMIS Db but fails in the message Db. Sandeep • Thu, 11:39 PM Yes, something we'll have to keep in mind. 
The good thing is that we're using distributed replicated data stores, so we have some resilience. But on some level, if the database goes down, I think the whole system is eventually down. Thu, 11:41 PM Eric Jahn Sandeep Dolia And that's why we need an indicator with in our app that when we make a rest call to the notification service we need to re-process those requests based on a HMIS table or a field (notificationSent) with in an existing table. We can deal with these scenarios once Surya builds basic features. Sandeep • Thu, 11:47 PM Agreed Thu, 11:47 PM Eric Jahn "
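
Both sides of the conversation above agree that each request should be tracked in the DB with an INIT, SUCCESS, or FAILURE status and retried up to N times, regardless of whether a worker or an MQ consumer does the processing. A minimal sketch of that tracking, assuming an in-memory map in place of the real table; the class name `RequestTracker` and its methods are hypothetical and only illustrate the state transitions:

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the request-tracking table; all names are hypothetical. */
class RequestTracker {
    enum Status { INIT, SUCCESS, FAILURE }

    private static final class Row {
        Status status = Status.INIT;
        int attempts = 0;
    }

    private final Map<String, Row> rows = new HashMap<>();
    private final int maxAttempts;

    RequestTracker(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Called when the REST request arrives, before any processing happens. */
    void register(String requestId) {
        rows.putIfAbsent(requestId, new Row());
    }

    /** Called by whichever component processed the request (worker or MQ consumer). */
    void record(String requestId, boolean succeeded) {
        Row row = rows.get(requestId);
        if (row == null) throw new IllegalArgumentException("unknown request: " + requestId);
        row.attempts++;
        row.status = succeeded ? Status.SUCCESS : Status.FAILURE;
    }

    /** True for unprocessed requests and for failures that still have attempts left. */
    boolean shouldProcess(String requestId) {
        Row row = rows.get(requestId);
        if (row == null) return false;
        if (row.status == Status.INIT) return true;
        return row.status == Status.FAILURE && row.attempts < maxAttempts;
    }

    /** Returns the current status, or null if the request was never registered. */
    Status statusOf(String requestId) {
        Row row = rows.get(requestId);
        return row == null ? null : row.status;
    }

    public static void main(String[] args) {
        RequestTracker tracker = new RequestTracker(2);
        tracker.register("report-1");
        tracker.record("report-1", false);
        System.out.println(tracker.statusOf("report-1") + " retry=" + tracker.shouldProcess("report-1"));
    }
}
```

Because the tracker only records outcomes, the same table could sit behind either the scheduler-driven worker or an MQ consumer, which is the "extension, not a change" point made in the discussion.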
