New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MiniSEED with no blockette 1000 #1544

Merged
merged 13 commits into from Oct 6, 2016

Conversation

Projects
None yet
5 participants
@krischer
Member

krischer commented Oct 3, 2016

This PR teaches ObsPy how to read MiniSEED files with no blockette 1000.

Test file is from @ThomasLecocq - I assume its okay to use two records of it here.

Don't merge yet as one change should probably be merged upstream into libmseed.

@chad-iris: This PR changes 1 thing in libmseed and works around an other problem:

  • libmseed cannot correctly detect the record length for the last record if blockette 1000 is missing. fe8e803 contains a proposed fix. I am not sure it is optimal but it passes all our tests.
  • If msr_parse() cannot detect the record length it returns MINRECLEN as an error code. I think it should return a specialized error code.
      /* Found record but could not determine length */
      if ( detlen == 0 )
    {
      return MINRECLEN;
    }

krischer added some commits Oct 3, 2016

@krischer krischer added the .io.mseed label Oct 3, 2016

@chad-iris

This comment has been minimized.

Show comment
Hide comment
@chad-iris

chad-iris Oct 3, 2016

Contributor

ms_detect() determines if a record is detected and, if possible, what the record length is. With these changes the routine assumes too much.

It is the caller of this routine that knows the context of the buffer. Knowing that there is no more data, therefore implying a record length, is a job for the caller. This is easily done when the return from ms_detect() is 0.

Contributor

chad-iris commented on fe8e803 Oct 3, 2016

ms_detect() determines if a record is detected and, if possible, what the record length is. With these changes the routine assumes too much.

It is the caller of this routine that knows the context of the buffer. Knowing that there is no more data, therefore implying a record length, is a job for the caller. This is easily done when the return from ms_detect() is 0.

@chad-iris

This comment has been minimized.

Show comment
Hide comment
@chad-iris

chad-iris Oct 4, 2016

Contributor

These changes allow scenarios where record lengths are mis-identified just because the buffer is a valid length. A 512-byte buffer containing the first 512 bytes of a 4096-byte record that does not have a blockette 1000 in the first 512-bytes (or at all) is identified incorrectly. Not good.

I suggest changing the ObsPy reader to use the pattern supported by libmseed's API.

Here are the return values for msr_parse():

 * Return values:
 *   0 : Success, populates the supplied MSRecord.
 *  >0 : Data record detected but not enough data is present, the
 *       return value is a hint of how many more bytes are needed.
 *  <0 : libmseed error code (listed in libmseed.h) is returned.

Here is the supported sequence (used by other parts of libmseed and likely beyond):

Call msr_parse() to parse a record from the buffer, then

  1. If a positive value is returned, aka there is definitely a record but it cannot be parsed, either:
    1. Get more data and call msr_parse() again
    2. If you know the record length, because you know it's the end of a buffer or file, call msr_parse() again and specify the length explicitly
  2. If zero is returned, handle the record and move on
  3. If negative is returned, handle the error

Note the case of calling msr_parse() and needing to get more data, which is quite common, especially when reading chunked data. Once that pattern is supported, handling the implied record length case is a small addition. You can see this pattern in ms_readmsr_main() for reference.

Regarding the lower level functions, It is not the job of msr_parse() nor ms_detect() to determine the implied record length that is at the end of a buffer, only the caller has the context to be able to determine that scenario with any certainty.

Contributor

chad-iris commented Oct 4, 2016

These changes allow scenarios where record lengths are mis-identified just because the buffer is a valid length. A 512-byte buffer containing the first 512 bytes of a 4096-byte record that does not have a blockette 1000 in the first 512-bytes (or at all) is identified incorrectly. Not good.

I suggest changing the ObsPy reader to use the pattern supported by libmseed's API.

Here are the return values for msr_parse():

 * Return values:
 *   0 : Success, populates the supplied MSRecord.
 *  >0 : Data record detected but not enough data is present, the
 *       return value is a hint of how many more bytes are needed.
 *  <0 : libmseed error code (listed in libmseed.h) is returned.

Here is the supported sequence (used by other parts of libmseed and likely beyond):

Call msr_parse() to parse a record from the buffer, then

  1. If a positive value is returned, aka there is definitely a record but it cannot be parsed, either:
    1. Get more data and call msr_parse() again
    2. If you know the record length, because you know it's the end of a buffer or file, call msr_parse() again and specify the length explicitly
  2. If zero is returned, handle the record and move on
  3. If negative is returned, handle the error

Note the case of calling msr_parse() and needing to get more data, which is quite common, especially when reading chunked data. Once that pattern is supported, handling the implied record length case is a small addition. You can see this pattern in ms_readmsr_main() for reference.

Regarding the lower level functions, It is not the job of msr_parse() nor ms_detect() to determine the implied record length that is at the end of a buffer, only the caller has the context to be able to determine that scenario with any certainty.

krischer added some commits Oct 4, 2016

@chad-iris

This comment has been minimized.

Show comment
Hide comment
@chad-iris

chad-iris Oct 4, 2016

Contributor

Something like this might be enough, adding a second call to msr_parse() after the existing first call:

        retcode = msr_parse ( (mseed+offset), buflen - offset, &msr, reclen, dataflag, verbose);

        // If record is detected but length is unknown, check if remaining buffer is a valid
        // length parse using the remaining buffer length as the record length.
        if (retcode > 0) {
          datasize = buflen - offset;
          if (datasize == 128 || datasize == 256 || datasize == 512 || datasize == 1024 ||
              datasize == 2048 || datasize == 4096 || datasize == 8192)
            retcode = msr_parse ( (mseed+offset), buflen - offset, &msr, datasize, dataflag, verbose);
        }
Contributor

chad-iris commented Oct 4, 2016

Something like this might be enough, adding a second call to msr_parse() after the existing first call:

        retcode = msr_parse ( (mseed+offset), buflen - offset, &msr, reclen, dataflag, verbose);

        // If record is detected but length is unknown, check if remaining buffer is a valid
        // length parse using the remaining buffer length as the record length.
        if (retcode > 0) {
          datasize = buflen - offset;
          if (datasize == 128 || datasize == 256 || datasize == 512 || datasize == 1024 ||
              datasize == 2048 || datasize == 4096 || datasize == 8192)
            retcode = msr_parse ( (mseed+offset), buflen - offset, &msr, datasize, dataflag, verbose);
        }
@krischer

This comment has been minimized.

Show comment
Hide comment
@krischer

krischer Oct 4, 2016

Member

Looks almost exactly like what I just pushed ;-)

Member

krischer commented Oct 4, 2016

Looks almost exactly like what I just pushed ;-)

@krischer

This comment has been minimized.

Show comment
Hide comment
@krischer

krischer Oct 4, 2016

Member

Thanks a lot for the detailed explanations! RTFM for me I guess ;-)

This PR now follows your advised sequence and everything seems to work nicely.

Member

krischer commented Oct 4, 2016

Thanks a lot for the detailed explanations! RTFM for me I guess ;-)

This PR now follows your advised sequence and everything seems to work nicely.

@ThomasLecocq

This comment has been minimized.

Show comment
Hide comment
@ThomasLecocq

ThomasLecocq Oct 4, 2016

Contributor

👍 me happy :)

Contributor

ThomasLecocq commented Oct 4, 2016

👍 me happy :)

@krischer

This comment has been minimized.

Show comment
Hide comment
@krischer

krischer Oct 4, 2016

Member

One more question: If no blockette 1000 is present ObsPy now assumes big endian for the data. Is that safe?

Member

krischer commented Oct 4, 2016

One more question: If no blockette 1000 is present ObsPy now assumes big endian for the data. Is that safe?

Show outdated Hide outdated obspy/io/mseed/src/obspy-readbuffer.c
else if ( retcode > 0 && retcode < (buflen - offset)) {
int r_bytes = buflen - offset;
if ((r_bytes == 128) || (r_bytes == 256) || (r_bytes == 512) || (r_bytes == 1024) ||
(r_bytes == 2048) || (r_bytes == 4096) || (r_bytes == 8192) || (r_bytes == 16384)) {

This comment has been minimized.

@Jollyfant

Jollyfant Oct 4, 2016

Contributor

I thought the record length can theoretically be anywhere between 256 and 2^256 bytes.

@Jollyfant

Jollyfant Oct 4, 2016

Contributor

I thought the record length can theoretically be anywhere between 256 and 2^256 bytes.

This comment has been minimized.

@krischer

krischer Oct 4, 2016

Member

I guess its unlikely to happen but the latest version now has a more generic test allowing for all possible record lengths.

@krischer

krischer Oct 4, 2016

Member

I guess its unlikely to happen but the latest version now has a more generic test allowing for all possible record lengths.

This comment has been minimized.

@chad-iris

chad-iris Oct 4, 2016

Contributor

miniSEED support enormous record lengths, but libmseed only allows up to MAXRECLEN (currently 1048576, aka 2^20 bytes) so it's not worth going beyond that.

What libmseed does internally for this edge case is to increment by MINRECLEN bytes (currently 128) until it hits MAXRECLEN or the end of the buffer. It then checks that the value is a power of 2 and determines the record length to be as implied.

A more efficient search would be to increment by powers of 2 instead of MINRECLEN, but in reality it doesn't take many cycles to find the <= 4096 record lengths.

@chad-iris

chad-iris Oct 4, 2016

Contributor

miniSEED support enormous record lengths, but libmseed only allows up to MAXRECLEN (currently 1048576, aka 2^20 bytes) so it's not worth going beyond that.

What libmseed does internally for this edge case is to increment by MINRECLEN bytes (currently 128) until it hits MAXRECLEN or the end of the buffer. It then checks that the value is a power of 2 and determines the record length to be as implied.

A more efficient search would be to increment by powers of 2 instead of MINRECLEN, but in reality it doesn't take many cycles to find the <= 4096 record lengths.

krischer added some commits Oct 4, 2016

@krischer

This comment has been minimized.

Show comment
Hide comment
@krischer

krischer Oct 4, 2016

Member

AppVeyor error:

obspy-readbuffer.obj : error LNK2019: unresolved external symbol roundf referenced in function readMSEEDBuffer

Does windows not have roundf()?

Member

krischer commented Oct 4, 2016

AppVeyor error:

obspy-readbuffer.obj : error LNK2019: unresolved external symbol roundf referenced in function readMSEEDBuffer

Does windows not have roundf()?

@krischer

This comment has been minimized.

Show comment
Hide comment
@krischer

krischer Oct 4, 2016

Member

All CI pass. No clue what coveralls problem is. IMHO ready to merge. Can someone please review?

Member

krischer commented Oct 4, 2016

All CI pass. No clue what coveralls problem is. IMHO ready to merge. Can someone please review?

@megies

This comment has been minimized.

Show comment
Hide comment
@megies

megies Oct 4, 2016

Member

No clue what coveralls problem is.

off topic..

  • no clue why coveralls is still running, we disabled it IIRC
  • no clue why codecov is not showing up in many PRs, i thought we enabled it properly.. -_-
Member

megies commented Oct 4, 2016

No clue what coveralls problem is.

off topic..

  • no clue why coveralls is still running, we disabled it IIRC
  • no clue why codecov is not showing up in many PRs, i thought we enabled it properly.. -_-
@krischer

This comment has been minimized.

Show comment
Hide comment
@krischer

krischer Oct 4, 2016

Member

off topic..

no clue why coveralls is still running, we disabled it IIRC
no clue why codecov is not showing up in many PRs, i thought we enabled it properly.. -_-

This PR is to the maintenance branch - we still use the old coverage testing there.

Member

krischer commented Oct 4, 2016

off topic..

no clue why coveralls is still running, we disabled it IIRC
no clue why codecov is not showing up in many PRs, i thought we enabled it properly.. -_-

This PR is to the maintenance branch - we still use the old coverage testing there.

@ThomasLecocq

This comment has been minimized.

Show comment
Hide comment
@ThomasLecocq

ThomasLecocq Oct 4, 2016

Contributor

Dunno how to review .. but if it read the test file I sent you without breaking anything else, I'd say it's good to go :-)

Contributor

ThomasLecocq commented Oct 4, 2016

Dunno how to review .. but if it read the test file I sent you without breaking anything else, I'd say it's good to go :-)

@chad-iris

This comment has been minimized.

Show comment
Hide comment
@chad-iris

chad-iris Oct 4, 2016

Contributor

One more question: If no blockette 1000 is present ObsPy now assumes big endian for the data. Is that safe?

Hhmm, probably, since most data without 1000's are pretty old and little-endian was not common then.

But I'm not sure why you have to assume, libmseed's automatic byte order determination will work whether 1000s are included or not.

Contributor

chad-iris commented Oct 4, 2016

One more question: If no blockette 1000 is present ObsPy now assumes big endian for the data. Is that safe?

Hhmm, probably, since most data without 1000's are pretty old and little-endian was not common then.

But I'm not sure why you have to assume, libmseed's automatic byte order determination will work whether 1000s are included or not.

@krischer

This comment has been minimized.

Show comment
Hide comment
@krischer

krischer Oct 4, 2016

Member

One more question: If no blockette 1000 is present ObsPy now assumes big endian for the data. Is that safe?

Hhmm, probably, since most data without 1000's are pretty old and little-endian was not common then.

But I'm not sure why you have to assume, libmseed's automatic byte order determination will work whether 1000s are included or not.

But that just determines the byte order of the header, right? And, if I remember correctly, msr->byteorder is only set if Blockette 1000 is present or the byteorder is forced through one of the environment variables.

I just checked the documentation and libmseed apparently always assumes identical header and data byte orders if Blockette 1000 is not present. I'll make some changes, so ObsPy does the same.

Member

krischer commented Oct 4, 2016

One more question: If no blockette 1000 is present ObsPy now assumes big endian for the data. Is that safe?

Hhmm, probably, since most data without 1000's are pretty old and little-endian was not common then.

But I'm not sure why you have to assume, libmseed's automatic byte order determination will work whether 1000s are included or not.

But that just determines the byte order of the header, right? And, if I remember correctly, msr->byteorder is only set if Blockette 1000 is present or the byteorder is forced through one of the environment variables.

I just checked the documentation and libmseed apparently always assumes identical header and data byte orders if Blockette 1000 is not present. I'll make some changes, so ObsPy does the same.

@krischer krischer merged commit 208f71f into obspy:maintenance_1.0.x Oct 6, 2016

3 of 4 checks passed

coverage/coveralls Coverage decreased (-0.01%) to 86.393%
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
docker-testbot Docker tests succeeded
Details

@krischer krischer referenced this pull request Oct 6, 2016

Merged

Update to libmseed v2.18 #1540

@megies megies deleted the krischer:primordial-mseed branch Oct 6, 2016

@megies megies added this to the 1.0.3 milestone Oct 6, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment