Skip to content

Fixed pvr.hts ReadMessage data timeout to user settings timeout. #181

Closed
wants to merge 1 commit into from
View
27 addons/pvr.hts/src/HTSPConnection.cpp
@@ -269,6 +269,8 @@ htsmsg_t* CHTSPConnection::ReadMessage(int iInitialTimeout /* = 10000 */, int iD
{
void* buf;
uint32_t l;
+ uint8_t lb[4];
+ ssize_t lread;
// get the first queued message if any
if(m_queue.size())
@@ -288,19 +290,32 @@ htsmsg_t* CHTSPConnection::ReadMessage(int iInitialTimeout /* = 10000 */, int iD
}
// read the size
- if (m_socket->Read(&l, 4, iInitialTimeout) != 4)
+ if ((lread = m_socket->Read(lb, 4, iInitialTimeout)) != 4)
{
- // timed out
- if(m_socket->GetErrorNumber() == ETIMEDOUT)
- return NULL;
+ if (lread > 0)
+ {
+ // we have read "some" bytes before timeout, read the rest or fail.
+ XBMC->Log(LOG_NOTICE, "%s Size preread %d", __FUNCTION__, lread);
+ if (m_socket->Read(lb + lread, 4 - lread, 0) == 4 - lread)
@FernetMenta
Collaborator
FernetMenta added a note Apr 1, 2013

That would block forever which is not the desired behavior.

@dezi
dezi added a note Apr 1, 2013

No, this would leave 0-3 bytes unprocessed, same problem. 1-3 bytes unprocessed make the HTSP stream unusable. If the initial result is not 4 bytes, we exspect a timeout or worse. If it was timeout, the stream is still intact and we MUST process, what we have read. Therefore we MUST know, what has been read.
(Sorry, i was confused for a moment, see next comment.)

@dezi
dezi added a note Apr 1, 2013

Yes it will block, i did this on purpose to make sure to read the remaing few bytes in one retry. It will return immedeately, if the original problem was a broken pipe something like that. In this case, a new session will be requested.

@FernetMenta
Collaborator
FernetMenta added a note Apr 1, 2013

Yes, I understand. The same problem exist with vnsi but we can't wait forever. If we don't get the remaining bytes withing a given timeout, we lost sync and need to terminate connection. For vnsi I do this already.
Waiting here for longer does harm because it blocks player thread. You won't make player hang in case of a nasty behaving backend, do you?

FernetMenta@2d93e7a

@dezi
dezi added a note Apr 1, 2013

No, i would certainly not. But the chance of tvheadend failing in spooling the 4 size bytes is low. It would be more obvious, if the stream has broken pipe or something like this, which does not hang, but returns immedeately.

But we can set the second timeout to user timeout, as in the data read call. I believe, setting it to 5 ms again will fail again, because Raspi ist doing some housekeeping here, whatever it is, therefor is said: "Timeout unpredictable but response for sure" => 0.

The bigger issue would be, to inspect the remaining tcp platforms for the same issue and other PVR.XXX modules how they react on the TcpSocketRead modification.

@FernetMenta
Collaborator
FernetMenta added a note Apr 1, 2013

I agree, having 2 timeouts makes sense. Why not changing socket::Read by adding this second timeout with a default of 100ms. This should not have any negative impact on the other clients. They even benefit from correcting the error in the read method.

@dezi
dezi added a note Apr 1, 2013

No, we have already two timeouts in pvr.hts. That is fine. I just need to know, what has been read or not from the socket, then we are fine. I do not want to alter the TcpSocketRead API, just make a required and hopefully harmless for other pvrs tune-up.

@FernetMenta
Collaborator
FernetMenta added a note Apr 1, 2013

Your current code returns bytes read even if the function hit timeout. This changes usage of this function as applications needs to compare return value with desired value. If those 2 doesn't match means a "bad" timeout. Doesn't this alter the API?
If you put the additional code into socket::read and added an addition timeout with a default value to the function's signature, I don't have to change anything for vnsi. And I think the other addons are fine as well.

@dezi
dezi added a note Apr 2, 2013

VNSISession.cpp has the very same flaw in dropping a valid connection if a timeout with the initialTimeout occurs. The algo looks similar to what is in VPR.HTS.

@FernetMenta
Collaborator
FernetMenta added a note Apr 2, 2013

Yes I know. Interestingly vnsi does not behave that nasty as tvh.

@dezi
dezi added a note Apr 2, 2013

I think, that is because You allways call that code with the default timeout, which is 1000 ms (correct me if i am wrong) and never with 5 ms, like tvh. I could "avoid" this behaviour in pvr.ths by setting the timeout from 5 to 500 ms. Setting it to 50 ms still gave the fault, but with lower frequency.

@FernetMenta
Collaborator
FernetMenta added a note Apr 2, 2013

Yes, you are corect. VNSI calls it with 1000ms timeout. Some users test this pr with regard on failing channel switching. This could very likely be one of the reasons, right?

@dezi
dezi added a note Apr 3, 2013

I think we should fix that anyway. If a timeout with accidential size read occurs where we do not expect it, the following happens:

  • The next size read returns a random size.
  • If the buffer cannot be allocated, we read the remaining bytes from stream into null pointer.
  • If the buffer can be allocated, it will be very large, screwing VM.
  • The remaining bytes are read, and because size is much larger than what can be read, it will timeout in any case.
  • A timeout after 1 - 5 seconds (user setting) leads to loss of frames, the live stream is broken.
  • The stream will be reconnected.
  • If we wrote to null pointer buffer, we will crash later on anyway.

I consider this as a bug.

@FernetMenta
Collaborator
FernetMenta added a note Apr 3, 2013

This is definitely a bug and needs to be fixed. We have to wait for opdenkamp, afaik he uses the platform code in other applications too.

@dezi
dezi added a note Apr 3, 2013

Yes, good idea. I just wanted to "stimulate" the problem here with my patches and will happily accept any solution for this problem. In the mean time, i can just work on other stuff regarding omxplayer and tvh with my patches. Maybe he can also check the other platforms (win32) as well.

@opdenkamp
Owner
opdenkamp added a note Apr 16, 2013

i'm not using the socket code in other projects yet (i do use the rest of it in libCEC indeed). this was something for libCEC 3.0+, so can be changed safely when all add-ons that use it are updated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
+ lread = 4;
+ }
+ else
+ {
+ // timed out
+ if(m_socket->GetErrorNumber() == ETIMEDOUT)
+ return NULL;
+ }
+ }
+ if (lread != 4)
+ {
// read error, close the connection
XBMC->Log(LOG_ERROR, "%s - failed to read packet size (%s)", __FUNCTION__, m_socket->GetError().c_str());
TriggerReconnect();
return NULL;
}
- l = ntohl(l);
+ l = (lb[0] << 24) + (lb[1] << 16) + (lb[2] << 8) + lb[3];
@FernetMenta
Collaborator
FernetMenta added a note Apr 1, 2013

why? this breaks the code for other platforms.

@dezi
dezi added a note Apr 1, 2013

As far as I understood, network byte order is always Big-Endian. Please explain to me.

@FernetMenta
Collaborator
FernetMenta added a note Apr 1, 2013

but the byte order on the host differs, that's why such functions like ntohl exists.

@FernetMenta
Collaborator
FernetMenta added a note Apr 1, 2013

ok, you add byte by byte. that's really ugly, isn't it?

@dezi
dezi added a note Apr 1, 2013

Better than an union or memcpy i thought...

@FernetMenta
Collaborator
FernetMenta added a note Apr 1, 2013

The problem is that you move details from a lower layer to the application level which make the design brittle and confusing. socket::read is designed to read a given number of bytes with timeout. I see no problem with this interface.
What is the actual problem? poll times out though there are a couple of byes to read?

@dezi
dezi added a note Apr 1, 2013

Yes, i have only experienced this behaviour on my RaspPi. Even if You have a timeout of 50 ms, it would eventually return only 1-3 bytes and then timeout again. It is obvious, that if You read bytes off the length from HTSP Stream, this HTSP Connections is rendered useless. Moreof, the next read of 4 bytes will read a wrong size. If this wrong size cannot be allocated, it is never checked anywhere. If this can be allocated, it is likely to screw up VM. Anyway, if have experienced the message "Lost Connections to TVHeadEnd" in XBMC quite a few times even on my OSX-XBMC. I believe, that tvheadend is rock solid in serving HTSP, so its always a problem in the frontend.

The TCPSocket routines give no gurantee, that nothing has been read in a case of timeout. So they cannot be used for this purpose as they are right now (finding out if something is available or if it would block). If we wanted it "clean", we had to use the result of the "poll" routine in PVR's read thread rather than try a message read with a delay of 5 ms.

@FernetMenta
Collaborator
FernetMenta added a note Apr 1, 2013

Then the method TcpSocketRead has to be fixed. In case of poll times out do a non blocking recv. That would fix it for other backends too as they suffer from the same problem.

@elupus
Collaborator
elupus added a note Apr 1, 2013

To me this looks right. The manual combination of bytes is correct imho. This is an "invented" byte order that tvheadend writes to the stream. It has nothing to do with ntohl.

@FernetMenta
Collaborator
FernetMenta added a note Apr 1, 2013

This is an "invented" byte

I don't think so. hts works pretty much the same way as vnsi and vnsi writes the length with htonl.

@dezi
dezi added a note Apr 1, 2013

I would not do a non-block read in TcpSocketRead if something partial times out. Just do not cloak the bytes actually read before timeout with the error code.

In my point of view a call to

Read(buf,256,1000 /* timeout */)

means, read at most 256 bytes and return number of bytes read and NOT read all or some and return -ETIMEOUT as the result if it finally timed out.

@FernetMenta
Collaborator
FernetMenta added a note Apr 1, 2013

If poll times out, verify by doing a non-blocking read. This does not cloak any bytes read earlier. You can even do in addition to what you have already added.
The higher level code can decide how it would like to handle the results. In this case the changes to HTSPConnection.cpp would be obsolete.

@elupus
Collaborator
elupus added a note Apr 1, 2013

https://github.com/tvheadend/tvheadend/blob/master/src/htsmsg_binary.c#L224 is how it's written to the stream. No htonl there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
// empty message
if(l == 0)
@@ -629,7 +644,7 @@ void* CHTSPConnection::Process(void)
{
{
CLockObject lock(m_mutex);
- msg = ReadMessage(5);
+ msg = ReadMessage(5,g_iConnectTimeout * 1000);
}
if(msg == NULL || msg->hm_data == NULL)
{
View
8 lib/platform/posix/os-socket.h
@@ -224,7 +224,7 @@ namespace PLATFORM
if (iPollResult == 0)
{
*iError = ETIMEDOUT;
- return -ETIMEDOUT;
+ return (iBytesRead > 0) ? iBytesRead : -ETIMEDOUT;
@FernetMenta
Collaborator
FernetMenta added a note Apr 1, 2013

This looks correct. I am wondering if we should not subtract timeout by the time already spent in this function. Makes no sense starting a read of a large packet with almost no time left.

@dezi
dezi added a note Apr 1, 2013

The problem is here with the timeout a such. Even if the time left is still large, the bytes could just arrive a "nanosecond" before it is due. We have to be able to deal with a partially filled buffer anyway, no matter what size was requested. The decision, to get the remaining/missing bytes in blocking or non-blocking mode must be left to the application which requested the bytes.

@FernetMenta
Collaborator
FernetMenta added a note Apr 1, 2013

You have to set a limit for timeout, even a nonosecond is late and can considered an error. This abstraction layer becomes useless if you wanted to rebuild the interface of the read function for higher levels.

@dezi
dezi added a note Apr 1, 2013

No, that is not what i intend. I just want to know, if a timeout occurs, what has been read already or what has not been read. The timeout, if the app layer requests bytes with timeout, is not an error as such but something exspected.

@opdenkamp
Owner
opdenkamp added a note Apr 16, 2013

agreed, and the state can be read from iError so you would know whether a timeout has occured

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
}
}
@@ -236,12 +236,12 @@ namespace PLATFORM
if (errno == EAGAIN && iTimeoutMs > 0)
continue;
*iError = errno;
- return -errno;
+ return (iBytesRead > 0) ? iBytesRead : -errno;
}
- else if (iReadResult == 0 || (iReadResult != (ssize_t)len && iTimeoutMs == 0))
+ else if (iReadResult == 0)
{
*iError = ECONNRESET;
- return -ECONNRESET;
+ return (iBytesRead > 0) ? iBytesRead : -ECONNRESET;
}
iBytesRead += iReadResult;
Something went wrong with that request. Please try again.