Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NGAS subscription does not check if subscriber ID already taken before creating new entry #50

Closed
smclay opened this issue May 5, 2021 · 9 comments

Comments

@smclay
Copy link
Contributor

smclay commented May 5, 2021

It would appear that when NGAS receives a request for a new subscription it does not check if the subsriber ID is already in use before attempting to create a new entry in the NGAS_SUBSCRIBERS table...

2021-04-21T16:34:41.083 [ 1221] [ R-1758806] [  INFO] ngamsServer.ngamsServer#handleHttpRequest:1806 Handling HTTP request: client_address=('134.171.18.32', 32850) - method=GET - path=|SUBSCRIBE?subscr_id=aat-ngas-6%3A8002&priority=1&url=http%3A%2F%2Faat-ngas-6.hq.eso.org%3A8002%2FQARCHIVE&start_date=2021-04-21T18%3A34%3A41.078|
2021-04-21T16:34:41.084 [ 1221] [ R-1758806] [  INFO] ngamsServer.ngamsCmdHandling#_get_module:74 Received command: SUBSCRIBE
2021-04-21T16:34:41.084 [ 1221] [ R-1758806] [  INFO] ngamsServer.commands.subscribe#handleCmd:110 Creating subscription for files >= 2021-04-21T16:34:41.078
2021-04-21T16:34:41.093 [ 1221] [ R-1758806] [ ERROR] ngamsServer.ngamsServer#reqCallBack:1743 Error while serving request
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ngamsServer/ngamsServer.py", line 1727, in reqCallBack
    method, path, headers)
  File "/usr/lib/python2.7/site-packages/ngamsServer/ngamsServer.py", line 1822, in handleHttpRequest
    ngamsCmdHandling.handle_cmd(self, reqPropsObj, httpRef)
  File "/usr/lib/python2.7/site-packages/ngamsServer/ngamsCmdHandling.py", line 63, in handle_cmd
    msg = _get_module(srvObj, reqPropsObj).handleCmd(srvObj, reqPropsObj, httpRef)
  File "/usr/lib/python2.7/site-packages/ngamsServer/commands/subscribe.py", line 133, in handleCmd
    addSubscriber(srvObj, subscrObj)
  File "/usr/lib/python2.7/site-packages/ngamsServer/commands/subscribe.py", line 55, in addSubscriber
    srvObj.getDb().insertSubscriberEntry(subscrObj)
  File "/usr/lib/python2.7/site-packages/ngamsLib/ngamsDbNgasSubscribers.py", line 156, in insertSubscriberEntry
    self.query2(sql, args = vals)
  File "/usr/lib/python2.7/site-packages/ngamsLib/ngamsDbCore.py", line 837, in query2
    return t.execute(sqlQuery, args)
  File "/usr/lib/python2.7/site-packages/ngamsLib/ngamsDbCore.py", line 837, in query2
    return t.execute(sqlQuery, args)
  File "/usr/lib/python2.7/site-packages/ngamsLib/ngamsDbCore.py", line 572, in execute
    cursor.execute(sql, args)
  File "/usr/lib/python2.7/site-packages/DBUtils/SteadyDB.py", line 608, in tough_method
    result = method(*args, **kwargs)  # try to execute
IntegrityError: ORA-00001: unique constraint (NGAS05_AAT.IDX_NGAS_SUBSCRIBERS_SUBSCR_ID) violated
2021-04-21T16:34:41.094 [ 1221] [ R-1758806] [  INFO] ngamsServer.ngamsServer#send_status:350 Returning status FAILURE with message ORA-00001: unique constraint (NGAS05_AAT.IDX_NGAS_SUBSCRIBERS_SUBSCR_ID) violated and HTTP code 400
2021-04-21T16:34:41.094 [ 1221] [ R-1758806] [  INFO] ngamsServer.ngamsServer#send_status:350 Returning status FAILURE with message ORA-00001: unique constraint (NGAS05_AAT.IDX_NGAS_SUBSCRIBERS_SUBSCR_ID) violated and HTTP code 400
@rtobar
Copy link
Contributor

rtobar commented May 13, 2021

@smclay thanks for reporting this.

In the email exchanged we had about this I mentioned that simply we should check for the presence of the subscriber ID on the database and act differently depending on that result (i.e., succeed?). But now that I think about it a bit more, we should probably have the full picture before deciding how to proceed with this one.

The deeper problem I remember you are having is that your logfiles are getting flooded with these messages. This means there is a counterpart to this server that is trying to create this subscription without ever giving up. That counterpart is probably this:

if stat.getStatus() != NGAMS_SUCCESS:
msg = "Unsuccessful NGAS XML response. Status: %s, message: %s. Will try again later"
logger.warning(msg, stat.getStatus(), stat.getMessage())
continue

This check sits inside a loop that won't finish until all subscriptions are successfully made.

Subscription creation is currently successful only if a new subscription is effectively added to the database. The question is then what to do if a subscription with the same ID comes in. Here are some options:

  • Assume that because the ID is the same as an existing subscription, the subscription parameters must be the same, and simply return successfully (200).
  • Like above, but double-check that the subscription parameters are exactly the same. If they are, declare a success, if they are not declare an error (400).
  • Like above, but return a specific success code different from 200 when parameters are the same.
  • If a subscription ID is already taken just fail, but with a different error (some 4xx specific code).

I don't find options 1 and 2 too attractive because they declare a success even when a new subscription was not created, which I see as a bit of a contradiction in the semantics of the command; option 1 in particular is too simplistic, as it assumes too much. And if we go for option 3 or 4 we'd need to change the code I quoted above to take into account the new return code and adjust itself.

Thoughts? There are probably other alternatives I haven't thought of.

@smclay
Copy link
Contributor Author

smclay commented May 18, 2021

@rtobar I tried to look at the subscriber side logs but unfortunately they have already been deleted.

I am unsure but I believe when the subscriber shuts down it sends an UNSUBSCRIBE command to the subscription publisher. Is that correct? Therefore in the event of a crash or messy shutdown the subscription entry will probably not be removed. On restarting the subscriber it will send a new SUBSCRIBE command which should result in an identical request to the entry already stored in the database. I think this is probably the most likely scenario for duplicate requests which should be quite harmless.

Therefore I think I agree that options 3 and 4 are better solutions. The subscriber can log a warning when option 3 occurs. In the event of option 4, I would suggest a notification email is sent to alert the NGAS administrator. If the administrator is expecting data to flow but the subscription is rejected it is probably better to inform them so they can act on it otherwise they may not notice and the problem could go undetected for hours or days.

rtobar added a commit that referenced this issue May 19, 2021
This change is the core of #50: it checks if a subscription for the
given ID exists or not before attempting insertion. If one exists, the
server replies to the client with different HTTP codes depending on
whether the subscription is identical to the one the client is trying to
add or not. In both cases the NGAS status object still reports SUCCESS
though, but we might want to change that.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
rtobar added a commit that referenced this issue May 21, 2021
This change is the core of #50: it checks if a subscription for the
given ID exists or not before attempting insertion. If one exists, the
server replies to the client with different HTTP codes depending on
whether the subscription is identical to the one the client is trying to
add or not. In both cases the NGAS status object still reports SUCCESS
though, but we might want to change that.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
rtobar added a commit that referenced this issue May 21, 2021
This change is the core of #50: it checks if a subscription for the
given ID exists or not before attempting insertion. If one exists, the
server replies to the client with different HTTP codes depending on
whether the subscription is identical to the one the client is trying to
add or not.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
@rtobar
Copy link
Contributor

rtobar commented May 21, 2021

@smclay I've pushed a few changes to the issue-50 branch. I've implemented both the changes in the SUBSCRIBE command (returning different HTTP codes when a subscription already exists) and in the automatic-subscription-on-server-startup logic that should look into these HTTP codes and react accordingly. In particular, an error is logged, an email is sent, and no further attempts are done to create the faulty subscription.

Could you give these changes a go? I've added unit tests for the changes in SUBSCRIBE, but haven't yet tried automatically testing the rest (it will require some more effort), so if you could try to reproduce the situation and check that things are working it would be great.

@smclay
Copy link
Contributor Author

smclay commented May 21, 2021

@rtobar thanks for the update. I will build new packages and deploy over the weekend. Hopefully I will have some test results next week.

@smclay
Copy link
Contributor Author

smclay commented May 31, 2021

@rtobar I have carried out tests. For the most part it is working well. If the subscription already exists I get the following log messages from the subscriber host...

2021-05-31T12:05:49.721 [ 7901] [SUBSCRIBER] [ ERROR] ngamsServer.ngamsSrvUtils#_create_remote_subscriptions:151 Different subscription with ID '%s' already exists, giving up
2021-05-31T12:05:49.723 [ 7901] [SUBSCRIBER] [  INFO] ngamsLib.ngamsNotification#_sendNotifMsg:121 Sending Notification Message to: smclay@eso.org. Subject: aat-ngas-6:8001: Automatic subscription cannot be created
2021-05-31T12:05:49.760 [ 7901] [SUBSCRIBER] [  INFO] ngamsServer.ngamsSrvUtils#_create_remote_subscriptions:164 Successfully subscribed to aat-ngas-5:8001 with url=http://aat-ngas-6.hq.eso.org:8001/QARCHIVE
2021-05-31T12:05:49.761 [ 7901] [SUBSCRIBER] [ ERROR] ngamsServer.ngamsSrvUtils#_create_remote_subscriptions:171 Error while adding subscription, will try later
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ngamsServer/ngamsSrvUtils.py", line 168, in _create_remote_subscriptions
    subscriptions.remove(subscrObj)
ValueError: list.remove(x): x not in list

There appears to be two bugs. The log message "151 Different subscription with ID '%s' already exists, giving up" has not been formatted properly and there is a ValueError exception that could probably be caught and handled.

On the publisher side we get the following log messages...

2021-05-31T12:05:49.656 [11060] [       R-0] [  INFO] ngamsServer.ngamsServer#handleHttpRequest:1806 Handling HTTP request: client_address=('134.171.18.32', 57492) - method=GET - path=|SUBSCRIBE?subscr_id=aat-ngas-6%3A8001&priority=1&url=http%3A%2F%2Faat-ngas-6.hq.eso.org%3A8001%2FQARCHIVE&start_date=2021-05-31T12%3A05%3A49.646|
2021-05-31T12:05:49.658 [11060] [       R-0] [  INFO] ngamsServer.ngamsCmdHandling#_get_module:81 Received command: SUBSCRIBE
2021-05-31T12:05:49.662 [11060] [       R-0] [  INFO] ngamsServer.commands.subscribe#handleCmd:117 Creating subscription for files >= 2021-05-31T12:05:49.646
2021-05-31T12:05:49.701 [11060] [       R-0] [  INFO] ngamsServer.ngamsServer#send_status:350 Returning status FAILURE with message Different subscription with ID 'aat-ngas-6:8001' existed and HTTP code 409
2021-05-31T12:05:49.705 [11060] [       R-0] [  INFO] ngamsServer.ngamsServer#send_data:320 Sending 369 bytes of data of type text/xml and headers {}
2021-05-31T12:05:49.705 [11060] [       R-0] [  INFO] ngamsServer.ngamsServer#handleHttpRequest:1833 Total time for handling request: (GET, SUBSCRIBE ,, ): 0.050 [s]

I also receive an email from the subscriber...

Notification Message:

NGAS attempted to create an automatic subscription with ID=aat-ngas-6:8003 to obtain data from aat-ngas-5:8003, but the remote server already has a subscription registered with the same ID, but different details.

Instead of retrying to create this subscription over and over, this server will give up now. To fit this either remove the remote subscription, or change the ID of the subscription to be created in the local server configuration.


Note: This is an automatically generated message

There is a typo 'fit' -> 'fix' in the email message.

@smclay
Copy link
Contributor Author

smclay commented May 31, 2021

@rtobar I noticed that shutting down the subscriber removes the subscriptions from the ngas_subscribers table. This happens also when the subscriber ID was already taken. I think this bahviour is probably the best solution.

@rtobar
Copy link
Contributor

rtobar commented Jun 1, 2021

@smclay thanks for the detailed report. I've addressed all the points you mentioned (typo in email, unformatted log statement, ValueError issue) while maintaining the behavior at shutdown. If you pull the issue-50 branch you should see the new commit with the fixes. Let me know if things are correctly for you. Hopefully this time I got things right, and if so I'll merge back to the master branch.

@smclay
Copy link
Contributor Author

smclay commented Jun 7, 2021

@rtobar I tested your latest changes today. Everything looks good...

2021-06-07T13:12:31.566 [31652] [SUBSCRIBER] [ ERROR] ngamsServer.ngamsSrvUtils#_create_remote_subscriptions:158 Different subscription with ID 'aat-ngas-6:8001' already exists, giving up
2021-06-07T13:12:31.568 [31652] [SUBSCRIBER] [  INFO] ngamsLib.ngamsNotification#_sendNotifMsg:121 Sending Notification Message to: smclay@eso.org. Subject: aat-ngas-6:8001: Automatic subscription cannot be created
...
2021-06-07T13:12:41.601 [31652] [SUBSCRIBER] [  INFO] ngamsServer.ngamsSrvUtils#_create_remote_subscriptions:184 No Subscriptions established

I think we can now close this issue. Thank you for the improvements.

@rtobar
Copy link
Contributor

rtobar commented Jun 8, 2021

Thanks @smclay for double-checking the new changes, I've just merged the new issue-50 branch into master.

@rtobar rtobar closed this as completed Jun 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants