Fixed pvr.hts ReadMessage data timeout to user settings timeout. #181

Closed
wants to merge 1 commit into
from
Jump to file or symbol
Failed to load files and symbols.
+25 −10
Diff settings

Always

Just for now

@@ -269,6 +269,8 @@ htsmsg_t* CHTSPConnection::ReadMessage(int iInitialTimeout /* = 10000 */, int iD
{
void* buf;
uint32_t l;
+ uint8_t lb[4];
+ ssize_t lread;
// get the first queued message if any
if(m_queue.size())
@@ -288,19 +290,32 @@ htsmsg_t* CHTSPConnection::ReadMessage(int iInitialTimeout /* = 10000 */, int iD
}
// read the size
- if (m_socket->Read(&l, 4, iInitialTimeout) != 4)
+ if ((lread = m_socket->Read(lb, 4, iInitialTimeout)) != 4)
{
- // timed out
- if(m_socket->GetErrorNumber() == ETIMEDOUT)
- return NULL;
+ if (lread > 0)
+ {
+ // we have read "some" bytes before timeout, read the rest or fail.
+ XBMC->Log(LOG_NOTICE, "%s Size preread %d", __FUNCTION__, lread);
+ if (m_socket->Read(lb + lread, 4 - lread, 0) == 4 - lread)

This comment has been minimized.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

That would block forever which is not the desired behavior.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

That would block forever which is not the desired behavior.

This comment has been minimized.

@dezi

dezi Apr 1, 2013

No, this would leave 0-3 bytes unprocessed, same problem. 1-3 bytes unprocessed make the HTSP stream unusable. If the initial result is not 4 bytes, we exspect a timeout or worse. If it was timeout, the stream is still intact and we MUST process, what we have read. Therefore we MUST know, what has been read.
(Sorry, i was confused for a moment, see next comment.)

@dezi

dezi Apr 1, 2013

No, this would leave 0-3 bytes unprocessed, same problem. 1-3 bytes unprocessed make the HTSP stream unusable. If the initial result is not 4 bytes, we exspect a timeout or worse. If it was timeout, the stream is still intact and we MUST process, what we have read. Therefore we MUST know, what has been read.
(Sorry, i was confused for a moment, see next comment.)

This comment has been minimized.

@dezi

dezi Apr 1, 2013

Yes it will block, i did this on purpose to make sure to read the remaing few bytes in one retry. It will return immedeately, if the original problem was a broken pipe something like that. In this case, a new session will be requested.

@dezi

dezi Apr 1, 2013

Yes it will block, i did this on purpose to make sure to read the remaing few bytes in one retry. It will return immedeately, if the original problem was a broken pipe something like that. In this case, a new session will be requested.

This comment has been minimized.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

Yes, I understand. The same problem exist with vnsi but we can't wait forever. If we don't get the remaining bytes withing a given timeout, we lost sync and need to terminate connection. For vnsi I do this already.
Waiting here for longer does harm because it blocks player thread. You won't make player hang in case of a nasty behaving backend, do you?

FernetMenta/xbmc-pvr-addons@2d93e7a

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

Yes, I understand. The same problem exist with vnsi but we can't wait forever. If we don't get the remaining bytes withing a given timeout, we lost sync and need to terminate connection. For vnsi I do this already.
Waiting here for longer does harm because it blocks player thread. You won't make player hang in case of a nasty behaving backend, do you?

FernetMenta/xbmc-pvr-addons@2d93e7a

This comment has been minimized.

@dezi

dezi Apr 1, 2013

No, i would certainly not. But the chance of tvheadend failing in spooling the 4 size bytes is low. It would be more obvious, if the stream has broken pipe or something like this, which does not hang, but returns immedeately.

But we can set the second timeout to user timeout, as in the data read call. I believe, setting it to 5 ms again will fail again, because Raspi ist doing some housekeeping here, whatever it is, therefor is said: "Timeout unpredictable but response for sure" => 0.

The bigger issue would be, to inspect the remaining tcp platforms for the same issue and other PVR.XXX modules how they react on the TcpSocketRead modification.

@dezi

dezi Apr 1, 2013

No, i would certainly not. But the chance of tvheadend failing in spooling the 4 size bytes is low. It would be more obvious, if the stream has broken pipe or something like this, which does not hang, but returns immedeately.

But we can set the second timeout to user timeout, as in the data read call. I believe, setting it to 5 ms again will fail again, because Raspi ist doing some housekeeping here, whatever it is, therefor is said: "Timeout unpredictable but response for sure" => 0.

The bigger issue would be, to inspect the remaining tcp platforms for the same issue and other PVR.XXX modules how they react on the TcpSocketRead modification.

This comment has been minimized.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

I agree, having 2 timeouts makes sense. Why not changing socket::Read by adding this second timeout with a default of 100ms. This should not have any negative impact on the other clients. They even benefit from correcting the error in the read method.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

I agree, having 2 timeouts makes sense. Why not changing socket::Read by adding this second timeout with a default of 100ms. This should not have any negative impact on the other clients. They even benefit from correcting the error in the read method.

This comment has been minimized.

@dezi

dezi Apr 1, 2013

No, we have already two timeouts in pvr.hts. That is fine. I just need to know, what has been read or not from the socket, then we are fine. I do not want to alter the TcpSocketRead API, just make a required and hopefully harmless for other pvrs tune-up.

@dezi

dezi Apr 1, 2013

No, we have already two timeouts in pvr.hts. That is fine. I just need to know, what has been read or not from the socket, then we are fine. I do not want to alter the TcpSocketRead API, just make a required and hopefully harmless for other pvrs tune-up.

This comment has been minimized.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

Your current code returns bytes read even if the function hit timeout. This changes usage of this function as applications needs to compare return value with desired value. If those 2 doesn't match means a "bad" timeout. Doesn't this alter the API?
If you put the additional code into socket::read and added an addition timeout with a default value to the function's signature, I don't have to change anything for vnsi. And I think the other addons are fine as well.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

Your current code returns bytes read even if the function hit timeout. This changes usage of this function as applications needs to compare return value with desired value. If those 2 doesn't match means a "bad" timeout. Doesn't this alter the API?
If you put the additional code into socket::read and added an addition timeout with a default value to the function's signature, I don't have to change anything for vnsi. And I think the other addons are fine as well.

This comment has been minimized.

@dezi

dezi Apr 2, 2013

VNSISession.cpp has the very same flaw in dropping a valid connection if a timeout with the initialTimeout occurs. The algo looks similar to what is in VPR.HTS.

@dezi

dezi Apr 2, 2013

VNSISession.cpp has the very same flaw in dropping a valid connection if a timeout with the initialTimeout occurs. The algo looks similar to what is in VPR.HTS.

This comment has been minimized.

@FernetMenta

FernetMenta Apr 2, 2013

Collaborator

Yes I know. Interestingly vnsi does not behave that nasty as tvh.

@FernetMenta

FernetMenta Apr 2, 2013

Collaborator

Yes I know. Interestingly vnsi does not behave that nasty as tvh.

This comment has been minimized.

@dezi

dezi Apr 2, 2013

I think, that is because You allways call that code with the default timeout, which is 1000 ms (correct me if i am wrong) and never with 5 ms, like tvh. I could "avoid" this behaviour in pvr.ths by setting the timeout from 5 to 500 ms. Setting it to 50 ms still gave the fault, but with lower frequency.

@dezi

dezi Apr 2, 2013

I think, that is because You allways call that code with the default timeout, which is 1000 ms (correct me if i am wrong) and never with 5 ms, like tvh. I could "avoid" this behaviour in pvr.ths by setting the timeout from 5 to 500 ms. Setting it to 50 ms still gave the fault, but with lower frequency.

This comment has been minimized.

@FernetMenta

FernetMenta Apr 2, 2013

Collaborator

Yes, you are corect. VNSI calls it with 1000ms timeout. Some users test this pr with regard on failing channel switching. This could very likely be one of the reasons, right?

@FernetMenta

FernetMenta Apr 2, 2013

Collaborator

Yes, you are corect. VNSI calls it with 1000ms timeout. Some users test this pr with regard on failing channel switching. This could very likely be one of the reasons, right?

This comment has been minimized.

@dezi

dezi Apr 3, 2013

I think we should fix that anyway. If a timeout with accidential size read occurs where we do not expect it, the following happens:

  • The next size read returns a random size.
  • If the buffer cannot be allocated, we read the remaining bytes from stream into null pointer.
  • If the buffer can be allocated, it will be very large, screwing VM.
  • The remaining bytes are read, and because size is much larger than what can be read, it will timeout in any case.
  • A timeout after 1 - 5 seconds (user setting) leads to loss of frames, the live stream is broken.
  • The stream will be reconnected.
  • If we wrote to null pointer buffer, we will crash later on anyway.

I consider this as a bug.

@dezi

dezi Apr 3, 2013

I think we should fix that anyway. If a timeout with accidential size read occurs where we do not expect it, the following happens:

  • The next size read returns a random size.
  • If the buffer cannot be allocated, we read the remaining bytes from stream into null pointer.
  • If the buffer can be allocated, it will be very large, screwing VM.
  • The remaining bytes are read, and because size is much larger than what can be read, it will timeout in any case.
  • A timeout after 1 - 5 seconds (user setting) leads to loss of frames, the live stream is broken.
  • The stream will be reconnected.
  • If we wrote to null pointer buffer, we will crash later on anyway.

I consider this as a bug.

This comment has been minimized.

@FernetMenta

FernetMenta Apr 3, 2013

Collaborator

This is definitely a bug and needs to be fixed. We have to wait for opdenkamp, afaik he uses the platform code in other applications too.

@FernetMenta

FernetMenta Apr 3, 2013

Collaborator

This is definitely a bug and needs to be fixed. We have to wait for opdenkamp, afaik he uses the platform code in other applications too.

This comment has been minimized.

@dezi

dezi Apr 3, 2013

Yes, good idea. I just wanted to "stimulate" the problem here with my patches and will happily accept any solution for this problem. In the mean time, i can just work on other stuff regarding omxplayer and tvh with my patches. Maybe he can also check the other platforms (win32) as well.

@dezi

dezi Apr 3, 2013

Yes, good idea. I just wanted to "stimulate" the problem here with my patches and will happily accept any solution for this problem. In the mean time, i can just work on other stuff regarding omxplayer and tvh with my patches. Maybe he can also check the other platforms (win32) as well.

This comment has been minimized.

@opdenkamp

opdenkamp Apr 16, 2013

Owner

i'm not using the socket code in other projects yet (i do use the rest of it in libCEC indeed). this was something for libCEC 3.0+, so can be changed safely when all add-ons that use it are updated

@opdenkamp

opdenkamp Apr 16, 2013

Owner

i'm not using the socket code in other projects yet (i do use the rest of it in libCEC indeed). this was something for libCEC 3.0+, so can be changed safely when all add-ons that use it are updated

+ lread = 4;
+ }
+ else
+ {
+ // timed out
+ if(m_socket->GetErrorNumber() == ETIMEDOUT)
+ return NULL;
+ }
+ }
+ if (lread != 4)
+ {
// read error, close the connection
XBMC->Log(LOG_ERROR, "%s - failed to read packet size (%s)", __FUNCTION__, m_socket->GetError().c_str());
TriggerReconnect();
return NULL;
}
- l = ntohl(l);
+ l = (lb[0] << 24) + (lb[1] << 16) + (lb[2] << 8) + lb[3];

This comment has been minimized.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

why? this breaks the code for other platforms.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

why? this breaks the code for other platforms.

This comment has been minimized.

@dezi

dezi Apr 1, 2013

As far as I understood, network byte order is always Big-Endian. Please explain to me.

@dezi

dezi Apr 1, 2013

As far as I understood, network byte order is always Big-Endian. Please explain to me.

This comment has been minimized.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

but the byte order on the host differs, that's why such functions like ntohl exists.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

but the byte order on the host differs, that's why such functions like ntohl exists.

This comment has been minimized.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

ok, you add byte by byte. that's really ugly, isn't it?

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

ok, you add byte by byte. that's really ugly, isn't it?

This comment has been minimized.

@dezi

dezi Apr 1, 2013

Better than an union or memcpy i thought...

@dezi

dezi Apr 1, 2013

Better than an union or memcpy i thought...

This comment has been minimized.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

The problem is that you move details from a lower layer to the application level which make the design brittle and confusing. socket::read is designed to read a given number of bytes with timeout. I see no problem with this interface.
What is the actual problem? poll times out though there are a couple of byes to read?

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

The problem is that you move details from a lower layer to the application level which make the design brittle and confusing. socket::read is designed to read a given number of bytes with timeout. I see no problem with this interface.
What is the actual problem? poll times out though there are a couple of byes to read?

This comment has been minimized.

@dezi

dezi Apr 1, 2013

Yes, i have only experienced this behaviour on my RaspPi. Even if You have a timeout of 50 ms, it would eventually return only 1-3 bytes and then timeout again. It is obvious, that if You read bytes off the length from HTSP Stream, this HTSP Connections is rendered useless. Moreof, the next read of 4 bytes will read a wrong size. If this wrong size cannot be allocated, it is never checked anywhere. If this can be allocated, it is likely to screw up VM. Anyway, if have experienced the message "Lost Connections to TVHeadEnd" in XBMC quite a few times even on my OSX-XBMC. I believe, that tvheadend is rock solid in serving HTSP, so its always a problem in the frontend.

The TCPSocket routines give no gurantee, that nothing has been read in a case of timeout. So they cannot be used for this purpose as they are right now (finding out if something is available or if it would block). If we wanted it "clean", we had to use the result of the "poll" routine in PVR's read thread rather than try a message read with a delay of 5 ms.

@dezi

dezi Apr 1, 2013

Yes, i have only experienced this behaviour on my RaspPi. Even if You have a timeout of 50 ms, it would eventually return only 1-3 bytes and then timeout again. It is obvious, that if You read bytes off the length from HTSP Stream, this HTSP Connections is rendered useless. Moreof, the next read of 4 bytes will read a wrong size. If this wrong size cannot be allocated, it is never checked anywhere. If this can be allocated, it is likely to screw up VM. Anyway, if have experienced the message "Lost Connections to TVHeadEnd" in XBMC quite a few times even on my OSX-XBMC. I believe, that tvheadend is rock solid in serving HTSP, so its always a problem in the frontend.

The TCPSocket routines give no gurantee, that nothing has been read in a case of timeout. So they cannot be used for this purpose as they are right now (finding out if something is available or if it would block). If we wanted it "clean", we had to use the result of the "poll" routine in PVR's read thread rather than try a message read with a delay of 5 ms.

This comment has been minimized.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

Then the method TcpSocketRead has to be fixed. In case of poll times out do a non blocking recv. That would fix it for other backends too as they suffer from the same problem.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

Then the method TcpSocketRead has to be fixed. In case of poll times out do a non blocking recv. That would fix it for other backends too as they suffer from the same problem.

This comment has been minimized.

@elupus

elupus Apr 1, 2013

Collaborator

To me this looks right. The manual combination of bytes is correct imho. This is an "invented" byte order that tvheadend writes to the stream. It has nothing to do with ntohl.

@elupus

elupus Apr 1, 2013

Collaborator

To me this looks right. The manual combination of bytes is correct imho. This is an "invented" byte order that tvheadend writes to the stream. It has nothing to do with ntohl.

This comment has been minimized.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

This is an "invented" byte

I don't think so. hts works pretty much the same way as vnsi and vnsi writes the length with htonl.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

This is an "invented" byte

I don't think so. hts works pretty much the same way as vnsi and vnsi writes the length with htonl.

This comment has been minimized.

@dezi

dezi Apr 1, 2013

I would not do a non-block read in TcpSocketRead if something partial times out. Just do not cloak the bytes actually read before timeout with the error code.

In my point of view a call to

Read(buf,256,1000 /* timeout */)

means, read at most 256 bytes and return number of bytes read and NOT read all or some and return -ETIMEOUT as the result if it finally timed out.

@dezi

dezi Apr 1, 2013

I would not do a non-block read in TcpSocketRead if something partial times out. Just do not cloak the bytes actually read before timeout with the error code.

In my point of view a call to

Read(buf,256,1000 /* timeout */)

means, read at most 256 bytes and return number of bytes read and NOT read all or some and return -ETIMEOUT as the result if it finally timed out.

This comment has been minimized.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

If poll times out, verify by doing a non-blocking read. This does not cloak any bytes read earlier. You can even do in addition to what you have already added.
The higher level code can decide how it would like to handle the results. In this case the changes to HTSPConnection.cpp would be obsolete.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

If poll times out, verify by doing a non-blocking read. This does not cloak any bytes read earlier. You can even do in addition to what you have already added.
The higher level code can decide how it would like to handle the results. In this case the changes to HTSPConnection.cpp would be obsolete.

This comment has been minimized.

@elupus

elupus Apr 1, 2013

Collaborator

https://github.com/tvheadend/tvheadend/blob/master/src/htsmsg_binary.c#L224 is how it's written to the stream. No htonl there.

@elupus

elupus Apr 1, 2013

Collaborator

https://github.com/tvheadend/tvheadend/blob/master/src/htsmsg_binary.c#L224 is how it's written to the stream. No htonl there.

// empty message
if(l == 0)
@@ -629,7 +644,7 @@ void* CHTSPConnection::Process(void)
{
{
CLockObject lock(m_mutex);
- msg = ReadMessage(5);
+ msg = ReadMessage(5,g_iConnectTimeout * 1000);
}
if(msg == NULL || msg->hm_data == NULL)
{
@@ -224,7 +224,7 @@ namespace PLATFORM
if (iPollResult == 0)
{
*iError = ETIMEDOUT;
- return -ETIMEDOUT;
+ return (iBytesRead > 0) ? iBytesRead : -ETIMEDOUT;

This comment has been minimized.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

This looks correct. I am wondering if we should not subtract timeout by the time already spent in this function. Makes no sense starting a read of a large packet with almost no time left.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

This looks correct. I am wondering if we should not subtract timeout by the time already spent in this function. Makes no sense starting a read of a large packet with almost no time left.

This comment has been minimized.

@dezi

dezi Apr 1, 2013

The problem is here with the timeout a such. Even if the time left is still large, the bytes could just arrive a "nanosecond" before it is due. We have to be able to deal with a partially filled buffer anyway, no matter what size was requested. The decision, to get the remaining/missing bytes in blocking or non-blocking mode must be left to the application which requested the bytes.

@dezi

dezi Apr 1, 2013

The problem is here with the timeout a such. Even if the time left is still large, the bytes could just arrive a "nanosecond" before it is due. We have to be able to deal with a partially filled buffer anyway, no matter what size was requested. The decision, to get the remaining/missing bytes in blocking or non-blocking mode must be left to the application which requested the bytes.

This comment has been minimized.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

You have to set a limit for timeout, even a nonosecond is late and can considered an error. This abstraction layer becomes useless if you wanted to rebuild the interface of the read function for higher levels.

@FernetMenta

FernetMenta Apr 1, 2013

Collaborator

You have to set a limit for timeout, even a nonosecond is late and can considered an error. This abstraction layer becomes useless if you wanted to rebuild the interface of the read function for higher levels.

This comment has been minimized.

@dezi

dezi Apr 1, 2013

No, that is not what i intend. I just want to know, if a timeout occurs, what has been read already or what has not been read. The timeout, if the app layer requests bytes with timeout, is not an error as such but something exspected.

@dezi

dezi Apr 1, 2013

No, that is not what i intend. I just want to know, if a timeout occurs, what has been read already or what has not been read. The timeout, if the app layer requests bytes with timeout, is not an error as such but something exspected.

This comment has been minimized.

@opdenkamp

opdenkamp Apr 16, 2013

Owner

agreed, and the state can be read from iError so you would know whether a timeout has occured

@opdenkamp

opdenkamp Apr 16, 2013

Owner

agreed, and the state can be read from iError so you would know whether a timeout has occured

}
}
@@ -236,12 +236,12 @@ namespace PLATFORM
if (errno == EAGAIN && iTimeoutMs > 0)
continue;
*iError = errno;
- return -errno;
+ return (iBytesRead > 0) ? iBytesRead : -errno;
}
- else if (iReadResult == 0 || (iReadResult != (ssize_t)len && iTimeoutMs == 0))
+ else if (iReadResult == 0)
{
*iError = ECONNRESET;
- return -ECONNRESET;
+ return (iBytesRead > 0) ? iBytesRead : -ECONNRESET;
}
iBytesRead += iReadResult;