NGSI-LD subscription constant crashes: "std::bad_alloc" #60

Closed
tanktoo opened this issue May 8, 2019 · 18 comments
Labels: bug (Something isn't working), Fixed - needs validation

Comments

@tanktoo

tanktoo commented May 8, 2019

Hi,
I am trying to get a subscription working for newly added data sources. For this I add the following subscription:
{
  "id": "urn:ngsi-ld:Subscription:testsubscription",
  "type": "Subscription",
  "entities": [ { "type": ".*" } ],
  "notification": {
    "format": "keyValues",
    "endpoint": { "uri": "http://callback_computer:8080/callback" }
  },
  "@context": "https://fiware.github.io/NGSI-LD_TestSuite/ldContext/testFullContext.jsonld"
}

From time to time I get a 201 response back with the subscription id in the Location header. Besides the fact that I don't get any callback for new context (I don't know why), the orion-ld server, running from the latest docker image, constantly crashes.
When I try to add the subscription, the connection seems to hang. After a while (about 1 minute) the broker has crashed with a "std::bad_alloc" in its log.
Even if I am doing something wrong with the subscription, a crash should not happen, should it? Is there any help or tutorial showing how to get a subscription working with NGSI-LD? I only found the test suite, from which I copied the subscription example and adapted it to mine (as shown above).
Thanks and kind regards,
tank

@kzangeli
Collaborator

kzangeli commented May 8, 2019

The ngsi-ld implementation is very much work-in-progress and likely to fail or crash.
We expect to have a stable release candidate before winter, with most of the specification supported.
So crashes are nothing to be surprised by, I'm afraid.

That said, I will look into this problem right now and see if I can see anything strange.

Could you please tell me the response to a GET /ngsi-ld/ex/v1/version?

% curl localhost:1026/ngsi-ld/ex/v1/version

Just to see the exact version you are using.

About the subscription, type ".*" will not give you what you want (which I assume is "any type").
In orion, "typePattern": ".*" would do it but orionld doesn't support type patterns.
Instead, use "idPattern": ".*" and whatever type your entity is of.

I will do my tests with your exact request though to try to catch the error that you have seen.
Because yes, the broker should never ever crash. It should only report errors in a controlled manner.
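
As a rough sketch of the suggestion above (match any entity id with "idPattern" and name the concrete entity type), the subscription body could be built as shown below. This is illustrative Python, not part of orionld. The helper name make_subscription and the entity type "Vehicle" are made up for the example; the other values are copied from the subscription in the first comment.

```python
import json


def make_subscription(sub_id, entity_type, callback_uri, context_url):
    """Build a subscription body that matches every entity id of one type."""
    return {
        "id": sub_id,
        "type": "Subscription",
        # "idPattern" replaces the unsupported type pattern; the type itself
        # must be spelled out ("Vehicle" below is a hypothetical example).
        "entities": [{"idPattern": ".*", "type": entity_type}],
        "notification": {
            "format": "keyValues",
            "endpoint": {"uri": callback_uri},
        },
        "@context": context_url,
    }


subscription = make_subscription(
    "urn:ngsi-ld:Subscription:testsubscription",
    "Vehicle",
    "http://callback_computer:8080/callback",
    "https://fiware.github.io/NGSI-LD_TestSuite/ldContext/testFullContext.jsonld",
)
print(json.dumps(subscription, indent=2))
```

Note that this still needs one subscription per entity type, so it does not cover the "any type" case.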

@kzangeli kzangeli self-assigned this May 8, 2019
@tanktoo
Author

tanktoo commented May 8, 2019

Hi,
the curl output for version is:
{ "branch": "bug/133.orionld_issue_0019", "kbase version": "0.2", "kalloc version": "0.2", "kjson version": "0.2" }
Regarding the type: I tested with NGSIv2 last week, where I had switched to typePattern. I then saw the example in the NGSI-LD test suite and changed it to type. What I want to achieve is to get a notification for every entity created within orion-ld without knowing its type in advance.
Thanks for your help and the answers.

@kzangeli
Collaborator

kzangeli commented May 8, 2019

So, you have a very new version of orionld, that's great.

About the crashes, "std::bad_alloc" indicates you are out of memory. Nothing we can do about a problem like that.
Without memory, the broker can of course not run. It seems you have a serious memory problem.

I created a simple functional test that creates the subscription and receives the 201 Created. I ran it 100 times and every run was OK. But then again, I have 16 GB in my laptop ...

About typePattern ...
This field isn't a part of ngsi-ld, see spec: https://www.etsi.org/deliver/etsi_gs/CIM/001_099/009/01.01.01_60/gs_CIM009v010101p.pdf
So, what you want to do is not possible, sorry.

If this is a big problem for you (and others) we might open up an issue about it and eventually implement typePatterns.

I don't know why this kind of pattern wasn't included in the spec for ngsi-ld, but I will find out and let you know.

kzangeli added a commit that referenced this issue May 8, 2019
@kzangeli
Collaborator

kzangeli commented May 8, 2019

I added the functest, in case you want to use it:

% cd test/functionalTest/
% ./testHarness.sh ngsild_issue_0060.test
mié may  8 17:30:34 CEST 2019
0001/1: 0000_ngsild/ngsild_issue_0060.test ...................  03 seconds
Total test time: 3.17 seconds

I'm closing this issue.
The memory problem is all you, I'm afraid.
If you wish to ask for typePattern to be implemented, open a new issue that talks only about typePatterns (then we'll see if the public opinion is with you :))

@kzangeli kzangeli closed this as completed May 8, 2019
@tanktoo
Author

tanktoo commented May 8, 2019

Hm, I had a look at the memory consumption as this was one of our first ideas. At the moment the server (VM) has only 4GB but I can easily increase it. Will test it tomorrow and give you an answer.
For the typePattern stuff, I will think about it tomorrow. Thanks for your help!

@tanktoo
Author

tanktoo commented May 9, 2019

Hm, I don't exactly know how to run the functional test. I can execute the script, as I have the git repo checked out locally, but the broker is running in a docker container. So I am not sure if the script is doing anything (I get 3 outputs, all with failure 10).
Besides this, I have increased the memory of our server to 32GB and can easily reproduce the std::bad_alloc. I just send the subscription, exactly the one from your test file for issue 60, twice. The first attempt works (at least I get a 201 back). If I send the same subscription again, I can watch the consumed memory increase. At about 20GB of RAM consumption the broker crashes with std::bad_alloc.
So I don't know if this is somehow related to docker. Are you running the broker without docker?
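
The reproduction described above (sending the same subscription twice) could be scripted roughly as follows. This is only a sketch under assumptions: the broker listens on localhost:1026 as in the earlier curl command, subscriptions are created with POST /ngsi-ld/v1/subscriptions, the body is sent as application/ld+json, and the payload is the subscription from the first comment rather than the one in the test file, which is not shown in this thread. The helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint: host/port from the curl command earlier in the thread.
BROKER_URL = "http://localhost:1026/ngsi-ld/v1/subscriptions"

# Stand-in payload: the subscription from the first comment.
SUBSCRIPTION = {
    "id": "urn:ngsi-ld:Subscription:testsubscription",
    "type": "Subscription",
    "entities": [{"type": ".*"}],
    "notification": {
        "format": "keyValues",
        "endpoint": {"uri": "http://callback_computer:8080/callback"},
    },
    "@context": "https://fiware.github.io/NGSI-LD_TestSuite/ldContext/testFullContext.jsonld",
}


def build_request(payload, url=BROKER_URL):
    """Create the POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/ld+json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send_and_get_status(request):
    """Send one request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(request, timeout=120) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers arrive as exceptions; the code is what matters here.
        return err.code


def create_twice(payload, send=send_and_get_status):
    """Send the identical subscription two times and collect both statuses."""
    return [send(build_request(payload)) for _ in range(2)]
```

Calling create_twice(SUBSCRIPTION) against a running broker returns the two status codes. A broker that hangs on the second request would surface as a timeout error instead of a status code.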

@kzangeli
Collaborator

kzangeli commented May 9, 2019

"Create the 'same' subscription twice": that's very important input.
That's not what I tried yesterday. Will try it now. Might be a bug, an infinite loop somewhere in the broker that allocates a block over and over until there is no RAM left.

Testing it NOW ! :)

@kzangeli
Collaborator

kzangeli commented May 9, 2019

So, there is definitely a bug.
When I test a second creation of a subscription with the same ID, there is no crash in my case, but the broker responds with a 201 Created, which is no good. It should be a 400 Bad Request. I will check the documentation to see if anything is specified about the desired behaviour. I doubt it, as this is a pretty strange case: trying to overwrite an existing subscription (PATCH /ngsi-ld/v1/subscriptions/{subscriptionId} would be used for this purpose, but that operation isn't implemented).

I will fix it so that it returns a 400 Bad Request (unless the spec says otherwise).

Now, why are you trying to create the same subscription again?
Perhaps your problems disappear if you avoid doing that ...

@kzangeli kzangeli reopened this May 9, 2019
@kzangeli kzangeli added the bug Something isn't working label May 9, 2019
@kzangeli
Collaborator

kzangeli commented May 9, 2019

The broker now looks up a subscription before creating it.
If it already exists, an error is returned (400 Bad Request).
See PR #62.
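
The lookup-before-create behaviour described above can be illustrated with a small in-memory model. This is not the broker's actual implementation; the class SubscriptionRegistry and its create method are hypothetical and only show the control flow: look the id up first, return 400 if it is already there, otherwise store it and return 201.

```python
class SubscriptionRegistry:
    """Toy stand-in for the broker's subscription store (hypothetical)."""

    def __init__(self):
        self._subscriptions = {}

    def create(self, subscription):
        """Return (status, detail) the way the fixed create operation answers."""
        sub_id = subscription["id"]
        # Lookup first: a second create with the same id must not overwrite.
        if sub_id in self._subscriptions:
            return 400, "subscription already exists"
        self._subscriptions[sub_id] = subscription
        return 201, sub_id


registry = SubscriptionRegistry()
sub = {"id": "urn:ngsi-ld:Subscription:testsubscription", "type": "Subscription"}
first = registry.create(sub)   # new id: created
second = registry.create(sub)  # same id again: rejected
print(first, second)
```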

Now you don't have any other choice than to avoid trying to recreate the subscription ...
Pity I couldn't reproduce your memory problem ...

Perhaps you could send me the logfile (still using the version of the broker that fails), starting the broker with all traces:

% orionld -t 0-255 -logLevel DEBUG

Like that, with some luck, I might be able to find the problem you are facing.

[ Also, in case you are familiar with valgrind, you could run the broker under valgrind and send me the report. That would be extremely valuable. ]

@tanktoo
Author

tanktoo commented May 9, 2019

I changed to debug level with all traces. As the broker now doesn't crash immediately (there is a high CPU load; I think it will crash after some days, once it has written all the logs), I have attached a log from the start until the point where the lines begin to repeat. Seems like an infinite loop.

orion_halfhalf.log.tar.gz
The log file contains the first 500k lines of the orion log. I hope this helps.

Edit: I only sent the subscription again for testing. I did not receive anything and just played a bit too much with Postman. In the real use case we won't have to update the subscription constantly, and if we need to update it we can DELETE and recreate it (or PATCH it, if that gets implemented at some point).

@kzangeli
Collaborator

kzangeli commented May 9, 2019

It helps, of course it does :)
Seems now that the broker has entered a recursive function that it never comes out of.
Strange that the same thing doesn't happen to me ... Pity!

However, now that I have an idea what to look for I will do my best to reproduce the error and fix it.

Thank you very much for helping me to find an ugly bug !

@tanktoo
Author

tanktoo commented May 10, 2019

> The broker now looks up a subscription before creating it.
> If it already exists, an error is returned (400 Bad Request)
> See PR #62.

Is this already implemented in the latest docker image? I still get the error. As the "GET" for active subscriptions is not implemented, it is more or less impossible to check for active subscriptions, and my broker is still crashing instead of returning a Bad Request.

@kzangeli
Collaborator

kzangeli commented May 10, 2019

It is implemented; your broker just seems to crash before it gets there ...
I am about to use the docker image to run the test.
Once I'm able to reproduce the bug, it will most probably be easy to fix.
Patience ... :)

@kzangeli
Collaborator

Meanwhile, the output from valgrind for this crash would be really useful.

@tanktoo
Author

tanktoo commented May 10, 2019

If you can provide some information on how to use it with the docker image I can test it.

@kzangeli
Collaborator

valgrind is being added to the docker image; hopefully this will be finished today.
Also, the contexts have changed from http to https, and this gave me problems in CentOS (not in Ubuntu, probably due to a newer libcurl).

So, I'd propose to get a new docker image and try again. We'd be able to do some serious debugging with valgrind.

I will let you know when the new docker image is ready.

@kzangeli
Collaborator

The docker image should be OK now.

@kzangeli kzangeli removed the bug Something isn't working label Jul 3, 2019
@kzangeli kzangeli added bug Something isn't working Fixed - needs validation labels Jul 15, 2019
@jmcanterafonseca

It works for me. Closing.
