libmicrohttpd crash #212

Closed
benwtrent opened this issue Apr 20, 2015 · 32 comments

Comments

@benwtrent
Contributor

I am experiencing an intermittent crash when using the REST API for Janus.

This happens after numerous connections to and disconnections from the Janus server with any plugin (I have been using the video room plugin, but since the crash is not plugin specific, I doubt it has to do with the plugin I am using).

It may be due to the previous request against the REST API being invalid, but a single invalid request should not dork up the entire system...

GDB Backtrace...
Some janus DB output at level 5

@benwtrent
Contributor Author

Nvm on the invalid inputs being the cause. It just happened again, and it seems that there is a lockup in the REST interface.

Now, my code does try to take advantage of a previously connected Janus session/attached plugin if it has been less than 20 seconds since it was attached. This is an effort to avoid spamming numerous connections/disconnections when multiple requests have to be made within a short time.
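For readers following along, the reuse logic described in this comment could look roughly like the sketch below. It is not the reporter's actual client: the class name `SessionCache`, its methods, and the injectable clock are all hypothetical, and only the 20-second window comes from the comment. The point it illustrates is that a cached session/handle pair should be dropped as soon as the window expires or the server stops acknowledging it, rather than being reused blindly.

```python
import time

REUSE_WINDOW_SECONDS = 20.0  # the 20-second window mentioned in the comment


class SessionCache:
    """Hypothetical helper: remembers one attached session/handle pair."""

    def __init__(self, reuse_window=REUSE_WINDOW_SECONDS, clock=time.monotonic):
        self._reuse_window = reuse_window
        self._clock = clock  # injectable so the logic can be tested
        self._session = None
        self._attached_at = None

    def store(self, session_id, handle_id):
        """Record a freshly attached session and the time it was attached."""
        self._session = (session_id, handle_id)
        self._attached_at = self._clock()

    def invalidate(self):
        """Forget the session, e.g. after a missed keepalive ack or an error."""
        self._session = None
        self._attached_at = None

    def get_reusable(self):
        """Return (session_id, handle_id) if still inside the window, else None."""
        if self._session is None:
            return None
        if self._clock() - self._attached_at >= self._reuse_window:
            self.invalidate()
            return None
        return self._session
```

A caller would ask `get_reusable()` before each request and create a new session whenever it returns `None`, calling `invalidate()` whenever a request on the cached session goes unanswered.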

@benwtrent
Contributor Author

Same error but higher Janus debug level = 5
Same error but debug level 7 (I removed all the libnice garbage to make it something worth reading)

@lminiero
Member

Currently abroad, will look into this when I get back.

@lminiero
Member

Have you checked what happens at line 1907 of libmicrohttpd's daemon.c? A "close failed" error can apparently be triggered by several different causes, and my sources differ from yours, as I have something different at that line. By the way, are you using a different threading model than the default "one-per-connection"?

@benwtrent
Contributor Author

I am not sure what you mean by a different threading model. I have a REST connection that could make multiple POSTs before closing itself. Should there be only one REST connection per POST?

@lminiero
Member

In janus.cfg you can choose how the web server must behave, that is, one thread per connection or a thread pool of limited size that the web server can use in rotation to handle requests.

About your question, if I got it right I don't think you're limited in any way: HTTP/1.1 definitely supports multiple requests over the same connection and supports pipelining as well. Web servers usually handle this automatically, and I suppose libmicrohttpd does as well. The Janus core has no view of this, which is handled transparently by the library: we're only notified when a request has been received, and then we can act on it.

@benwtrent
Contributor Author

OK, libmicrohttpd ver 1.9.30-1, and I have unlimited threads (one per connection).

I will see if I can find that source code for the daemon.

@benwtrent
Contributor Author

Just so there is further info, this seems to happen while microhttpd is trying to accept the connection (when I connect with my management node to close or create a room).

@lminiero
Member

Looks like an internal issue within libmicrohttpd, rather than something in Janus. Is this happening regularly? Have you found a way to replicate this on a consistent basis? It might be worthwhile to also report this on their mailing list (https://lists.gnu.org/mailman/listinfo/libmicrohttpd) in case it turns out the problem's there.

By the way, you may know this already, but if you have several sockets open already and you didn't increase the ulimit for file descriptors, it may be crashing for this reason.
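One way to check the file descriptor point above on a Linux host is sketched below. It assumes a Linux `/proc` filesystem and permission to read the target process's entries; the function names `parse_max_open_files` and `fd_report` are made up for this example and are not part of Janus or libmicrohttpd.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) as strings from the text of /proc/<pid>/limits.

    Values are left as strings because a limit may read "unlimited".
    Returns None if the "Max open files" row is missing.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            soft, hard = line[len(prefix):].split()[:2]
            return soft, hard
    return None


def fd_report(pid):
    """Return (open_fd_count, (soft, hard)) for a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as fh:
        limits = parse_max_open_files(fh.read())
    # Each entry in /proc/<pid>/fd is one open descriptor of that process.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, limits
```

Calling `fd_report` with the PID of the running Janus process shows how close the descriptor count is to the soft limit; if the count sits well below the limit when the crash occurs, the ulimit is unlikely to be the cause.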

@benwtrent
Contributor Author

I will take a gander at the ulimit. What happens with WebSockets is that it does not crash; it just ignores my request, and my client spins until it times out.

@benwtrent
Contributor Author

Checked ulimit and this has nothing to do with it. I will try pulling down an updated version of the code.

@benwtrent
Contributor Author

The updated version did not work; trying a new connection on the REST interface caused the "close failed" error.

@benwtrent
Contributor Author

Well, I feel foolish, I was still linking against the old version. I have been testing with the latest release of libmicrohttpd and it has been working better.

@lminiero
Member

lminiero commented May 7, 2015

I'm afraid the scenario you're describing still isn't clear to me. Are you writing a client from scratch, including transports, yourself? Are you experiencing issues in getting your HTTP/WebSockets to work as expected with Janus? We did several tests with HTTP and WebSockets connections coming and going and we never experienced those issues: in our experience both libraries (libmicrohttpd and libwebsockets) are quite stable, so I'm not sure where to start looking.

@benwtrent
Contributor Author

This is on me again. I was sending null data to Janus and that freaked out the REST interface.

@lminiero
Member

What do you mean by null data? If it's anything we can fix in Janus we might want to do that, as otherwise you might have found some kind of "exploit" to crash it. Same thing if it's a bug that needs to be fixed in libmicrohttpd, since in that case it affects other projects besides this.

@benwtrent
Contributor Author

Ugh, I worded that wrong: not NULL data (that was in reference to a personal bug in my code, nothing to do with Janus :)). I am handling connection state poorly on my side and it causes libmicrohttpd to crash on a close. For some reason, my client delayed handling the requests as they came in and handled them out of order. This caused some weird connection logic on my client side, which created a false connection.

It seems that my REST client kept its TCP connection to libmicrohttpd open (the TCP connection underlying the HTTP REST exchange), and this causes an issue when my session times out (as no messaging is occurring). So Janus closes my session, and when the TCP connection is finally let go, it has already been invalidated by Janus, which crashes it.

This is ONLY a theory. I am not an expert in this particular area.

@benwtrent
Contributor Author

This is still happening... now on the close found at daemon.c:2048.

Janus seems to prematurely time out my session, and that consequently causes an issue. I am now using 0.9.41 of microhttpd on a headless Debian server.

Below is additional info, but the skinny is that Janus (or just libmicrohttpd) stops responding to my keepalive requests. It recognizes that there is a new POST on the correct URL (the Janus printout says as much), but does not handle it. This causes my connection to time out and be cleaned up, and then, when trying to close the session, libmicrohttpd crashes.

Here is a sample transaction (this is all over the REST API, not WebSockets):

2015-05-20 13:16:08,073 [19] DEBUG JanusApi Sending: 
{"janus":"keepalive","transaction":"1d9b9anf9b3c"}
 To url {SessionToken}/4056330232 with SessionID 3343615003
2015-05-20 13:16:08,104 [19] DEBUG JanusApi Recv: 
{
   "janus": "ack",
   "session_id": 3343615003,
   "transaction": "1d9b9anf9b3c"
}
2015-05-20 13:16:12,129 [5] DEBUG JanusApi Sending: 
{"janus":"message","transaction":"ci7yn8l8of15","body":{"request":"destroy","room":49}}
 To url {SessionToken}/4056330232 with SessionID 3343615003
2015-05-20 13:16:12,160 [5] DEBUG JanusApi Recv: 
    Content: {
   "janus": "success",
   "session_id": 3343615003,
   "sender": 4056330232,
   "transaction": "ci7yn8l8of15",
   "plugindata": {
      "plugin": "janus.plugin.videoroom",
      "data": {
         "videoroom": "destroyed",
         "room": 49
      }
   }
}, 

2015-05-20 13:16:15,904 [9] DEBUG JanusApi Sending: 
{"janus":"message","transaction":"xlv8zng1rwbt","body":{"request":"create","room":50,"record":false,"is_private":false,"bitrate":2048000,"fir_freq":25}}
 To url {SessionToken}/4056330232 with SessionID 3343615003
2015-05-20 13:16:15,936 [9] DEBUG JanusApi Recv: 
    Content: {
   "janus": "success",
   "session_id": 3343615003,
   "sender": 4056330232,
   "transaction": "xlv8zng1rwbt",
   "plugindata": {
      "plugin": "janus.plugin.videoroom",
      "data": {
         "videoroom": "created",
         "room": 50
      }
   }
}, 

2015-05-20 13:16:35,482 [19] DEBUG JanusApi Sending: 
{"janus":"keepalive","transaction":"rc513qczpuif"}
 To url {SessionToken}/4056330232 with SessionID 3343615003
2015-05-20 13:16:35,482 [19] DEBUG JanusApi Recv: 

2015-05-20 13:16:55,749 [19] DEBUG JanusApi Sending: 
{"janus":"keepalive","transaction":"90d3rcxuneow"}
 To url {SessionToken}/4056330232 with SessionID 3343615003
2015-05-20 13:17:25,781 [19] DEBUG JanusApi Recv: 

You can see that after that last command to the video room to create room 50, the Janus gateway stops responding to my keepalive requests. Consequently, my connection times out and Janus tries to close my session. Janus does receive my POST: I see the printout later (after that room is created) that a POST request _is received_ from me on that URI.

  • _Please see this pastebin for janus debug_ pastebin.
  • _Core dump backtrace_ pastebin

For now, this connection essentially only creates and destroys video rooms and keeps itself alive. Consistently, after a handful of rooms are created and destroyed (and joined and left by other parties as well), Janus will crash with this error.

benwtrent reopened this May 20, 2015

@lminiero
Member

It may be a deadlock somewhere, but as far as Janus is concerned, I don't think we do any lock on keepalives: we just answer and update a timestamp. I'll look into this ASAP.

@benwtrent
Contributor Author

The connections that don't respond have "New connection on REST API" printed out in the log before "Got a HTTP POST request on /janus/3343615003/4056330232", whereas none of the other keepalives were treated as a new REST API connection.

@lminiero
Member

The "New connection on REST API" message only appears when a new TCP connection is accepted; it doesn't refer to HTTP requests. If a connection is reused for an HTTP transaction (as often happens with HTTP/1.1), then you won't get that message.

To check if it's a locking issue, you may want to add some debug lines before the janus_session_find at line 1030 of janus.c. In fact, I think that's the only synchronized call that is made for a keepalive: a lock is used to retrieve the session object, and nothing else is locked after that since, as I anticipated, a keepalive only updates a timestamp and immediately returns a success.
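The difference between accepted TCP connections and HTTP requests described in this comment can be reproduced with a small, self-contained experiment. The sketch below uses Python's standard library as a stand-in for both sides; it does not involve Janus or libmicrohttpd, and `CountingServer`, `AckHandler`, and `count_accepts` are hypothetical names. The server counts TCP accepts, and the client sends the same number of keepalive-style POSTs either over one persistent connection or over a fresh connection each time.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingServer(ThreadingHTTPServer):
    """Stand-in server that counts accepted TCP connections."""

    accepted = 0

    def get_request(self):
        # Runs once per accepted TCP connection, not once per HTTP request.
        request = super().get_request()
        self.accepted += 1
        return request


class AckHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables persistent connections

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        body = json.dumps(
            {"janus": "ack", "transaction": request["transaction"]}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_accepts(reuse, requests=3):
    """Send `requests` POSTs and return how many TCP connections were accepted."""
    server = CountingServer(("127.0.0.1", 0), AckHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    conn = None
    try:
        for i in range(requests):
            if conn is None or not reuse:
                if conn is not None:
                    conn.close()
                conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            payload = json.dumps({"janus": "keepalive", "transaction": f"t{i}"})
            conn.request("POST", "/janus/1/2", body=payload.encode(),
                         headers={"Content-Type": "application/json"})
            reply = json.loads(conn.getresponse().read())
            assert reply["janus"] == "ack"
        return server.accepted
    finally:
        if conn is not None:
            conn.close()
        server.shutdown()
        server.server_close()
```

With `reuse=True` the three requests share a single accepted connection, while `reuse=False` produces one accept per request, which is the situation in which a "new connection" log line would show up before every request.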

@benwtrent
Contributor Author

This issue may stem from using a different TCP socket connection for the same session ID and plugin handle ID. For some reason, the socket is closed either on my side or on Janus's side and after the new socket is made, the system ignores the request being made on that socket.

@benwtrent
Contributor Author

Also, from what I can read in the logging, it never parses the URL to grab the session id from the request. It seems that only the firstround of the janus_ws_handler is called when this issue arises.

@lminiero
Member

The firstround only parses the request line, headers and the like, and doesn't process the payload. Is there any payload to process when that happens? Is the HTTP request correct? Also make sure that MHD_YES is returned from that first round, as otherwise libmicrohttpd will assume that preprocessing the request failed and will stop there (not sure if this also results in the TCP connection being closed).

@benwtrent
Contributor Author

Does libmicrohttpd return an error if the request is malformed? Should I be able to check the status code of the HTTP response sent back to my client?

@lminiero
Member

Not sure what happens when you send an MHD_NO back, that is, whether it returns an error code or just closes the connection. I think that if we return an MHD_NO so soon it's probably the latter.

@lminiero
Member

lminiero commented Jul 8, 2015

Can you share any code that can replicate the issue? Especially if it's the way you were constructing requests that was crashing the library. I'd like to figure out if it's a problem in the library itself or the way we use it.

@benwtrent
Contributor Author

I have not been able to reproduce the issue since I changed my logic for the REST interface, but I think it had to do with my REST client not handling the underlying TCP connection very well.

@lminiero
Member

lminiero commented Jul 9, 2015

If you still have a version of it that caused the issue, it could help me debug, even just to figure out if there's any measure I can take to keep this from being a potential problem. Otherwise, I guess we can close the issue, if you're ok with that.

@benwtrent
Contributor Author

I do not have that old version any longer. The issue should probably be closed.

@Bhlowe

Bhlowe commented Oct 31, 2017

Just woke up to a similar error. It's possible I sent something bad via the API. I was just working with the streaming plugin and token management.

Creating new session: 1791654788786794
Creating new handle in session 1791654788786794: 905821567368574
Detaching handle from JANUS Streaming plugin
Fatal error in GNU libmicrohttpd daemon.c:1826: close failed
Aborted (core dumped)

@crowdoptic

I am also experiencing a similar issue. While I see nothing of note in the Janus log, I do get this in syslog right before Janus crashes and is restarted by systemd:
Fatal error in GNU libmicrohttpd daemon.c:3092: Close socket failed.

I cannot tie it to a particular request made by our app, as it seems to happen on socket close rather than when the request is made.

I am running the latest (Feb 6th 2018) trunk and libmicrohttpd-0.9.59.

I'm happy to post with full logs if that helps, but it seems as if the issue is still present.
