New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read fcnt format #2265

Merged
merged 29 commits into from Dec 20, 2018

Conversation

Projects
None yet
4 participants
@thefroid
Copy link
Contributor

thefroid commented Nov 16, 2018

What does this PR do?

Read waveforms and headers of fcnt format. More informations can be extracted from the fcnt file compared to the previous code.

Why was it initiated? Any relevant Issues?

It was initiated because i needed informations which were not extracted by the previous code.

PR Checklist

  • Correct base branch selected? master for new features, maintenance_... for bug fixes
  • This PR is not directly related to an existing issue (which has no PR yet).
  • If the PR is making changes to documentation, docs pages can be built automatically.
    Just remove the space in the following string after the + sign: "+DOCS"
  • If any network modules should be tested for the PR, add them as a comma separated list
    (e.g. clients.fdsn,clients.arclink) after the colon in the following magic string: "+TESTS:"
    (you can also add "ALL" to just simply run all tests across all modules)
  • All tests still pass.
  • Any new features or fixed regressions are be covered via new tests.
  • Any new or changed features have are fully documented.
  • Significant changes have been added to CHANGELOG.txt .
  • First time contributors have added your name to CONTRIBUTORS.txt .
@d-chambers

This comment has been minimized.

Copy link
Member

d-chambers commented Nov 17, 2018

Hi @thefroid,

Thanks for contributing to ObsPy! It's great to see new contributors, and we look forward to working on ObsPy with you more.

I certainly see where you are coming from. There is a lot of information contained in the rg16 file format that users might want to extract. The current code only gets enough information to fill out the basics in each traces' stats. However, there are a few issues, which I will list later on, but in general, it is better/easier to take the existing code base and add the features you need rather than replace it with a completely different implementation. This is especially true if you change the signature (the names/number of arguments) of any of the public functions/methods because by doing so you are likely to break existing codes. Also, deleting tests is rarely the right thing to do; each test is probably there for a reason. For example, it may document expected behavior users rely on or ensure a bug does not get reintroduced down the road.

I contributed the original code, and in order to understand some of the specific requirements and discussion of the implementation I suggest you read through the original PR.

A few issues:

  1. The signature of read_rg16 has changed. I know several users (including myself) for which this will cause problems.

  2. read_rg16 should be named _read_rg16 to be consistent with all the other read functions from the io module.

  3. The automatic mapping from in-line and cross-line to principal components is not desirable. See this discussion.

  4. The efficient merge function is important (at least to me), and should not be removed without discussion.

  5. The three_component flag is not necessary, and no other parser requires users to specify how many components are contained in the trace before reading the file.

  6. It is probably better to add format-specific information (when details is set to True) to an attribute dict in the Stats object named "rg16". For example, take a look how the sac parser does this for sac-specific info.

  7. The read function can no longer accept a file buffer (most parsers should be able to accept both a string specifying path and a file buffer).

  8. The file format document included in the docs section is marked as proprietary. I believe (but I am not an IP lawyer) that if we include it in the ObsPy code base it would imply a) that we have the right to distribute it, b) the LGPL license applies to it, and c) the ObsPy copyrights apply to it. I don't think any of these is true. If the document owner provided explicit (written) consent to include it we could probably do it, otherwise I do not feel comfortable including it. We don't want to land ourselves in any legal trouble.

Moving forward, it would probably be easier to address the issues by modifying the existing code-base (to pull out the information you need) rather than completely replacing it. However, if there is a good reason to replace it, the issues are addressed, and the new code will not break existing codes I am fine with it.

@ThomasLecocq

This comment has been minimized.

Copy link
Contributor

ThomasLecocq commented Nov 18, 2018

Hey,

good to see this coming :) While you're busy, you can also add @d-chambers and myself as original authors of the rg16 support (I wrote the sources that were forked by "iceseismic").

@megies megies added this to the 1.2.0 milestone Nov 19, 2018

@thefroid

This comment has been minimized.

Copy link
Contributor

thefroid commented Nov 20, 2018

Hi @d-chambers,

Thank you for the code reviewing. I took into account your suggestions and I corrected the points: 1,2,3,4,5 and 7. I am working now on point 6.
For point 8, I sent an e-mail to a people from FairField to ask if it was possible to put the fcnt documentation in the github repository of ObsPy. He replied to me : "Yes that is ok. That information is public for users to be able to read our data format". That's why i felt free to share the fcnt documentation. But if it is a problem, I can remove it.
Concerning point 3, in my mind, the normal process to deploy the nodes is the gold contact toward the north, that's why i automatically mapped in Z,N,E the components. Maybe, It could be automatically mapped in Z, 1, 2 because generally the nodes are deployed fastly on the field, so the orientation error could be more than 5 degrees. It will have the advantage to remove one argument of the function.

@ThomasLecocq

This comment has been minimized.

Copy link
Contributor

ThomasLecocq commented Nov 20, 2018

Concerning point 3, in my mind, the normal process to deploy the nodes is the gold contact toward the north, that's why i automatically mapped in Z,N,E the components. Maybe, It could be automatically mapped in Z, 1, 2 because generally the nodes are deployed fastly on the field, so the orientation error could be more than 5 degrees. It will have the advantage to remove one argument of the function.

hi @thefroid indeed, it's also what is recommended by SISMOB France (they even drew an arrow on top of the nodes :-) )

image

About your fast-merge code, it's very nice and I think it should actually be somehow brought to another level, where it could be useful to any Stream object, @krischer, @megies: what do you think?

@thefroid

This comment has been minimized.

Copy link
Contributor

thefroid commented Nov 20, 2018

Hi Thomas Lecocq,
The fast-merge code was implemented by Derrick Chambers. I just re-used its function.

@d-chambers

This comment has been minimized.

Copy link
Member

d-chambers commented Nov 20, 2018

Hey @thefroid,

These changes look promising. Thanks for working on it. I am quite busy atm but I should be able to do a proper review within a week or so. Stay tuned.

@megies

This comment has been minimized.

Copy link
Member

megies commented Nov 21, 2018

For point 8, I sent an e-mail to a people from FairField to ask if it was possible to put the fcnt documentation in the github repository of ObsPy. He replied to me : "Yes that is ok. That information is public for users to be able to read our data format". That's why i felt free to share the fcnt documentation. But if it is a problem, I can remove it.

@thefroid It would be good to have the pdf file in the repository. Ideally you should contact FairField again, and be precise, i.e. ask them if we can include the file under LGPL license in obspy. Seems like they have no problem with it anyway.

@megies

This comment has been minimized.

Copy link
Member

megies commented Nov 21, 2018

(they even drew an arrow on top of the nodes :-) )

Really funny how a seismic instrument would not sport a "North" arrow when comin from the factory by default.. ♐️

@d-chambers d-chambers self-requested a review Nov 22, 2018

:copyright:
The ObsPy Development Team (devs@obspy.org)
the ObsPy Development Team (devs@obspy.org)

This comment has been minimized.

@d-chambers

d-chambers Nov 22, 2018

Member

capitalize "the"

components to Z, N, and E (if 3 component) as well as correct the polarity
for the vertical component.
* `details`: If this parameter is set to True, all the informations contained
in the headers are extracted. These informations are stored in an attribute

This comment has been minimized.

@d-chambers
"""
from __future__ import (absolute_import, division, print_function,

This comment has been minimized.

@d-chambers

d-chambers Nov 22, 2018

Member

I think these lines need to be here, even for empty files.

This comment has been minimized.

@megies

megies Dec 13, 2018

Member

Yes, @thefroid these lines have to be in (almost) every .py file. There's also a test for it, but I guess that fail is lost in all the test fail noise we have right now.. :(

@@ -1,233 +1,845 @@
# -*- coding: utf-8 -*-
from __future__ import (absolute_import, division, print_function,

This comment has been minimized.

@d-chambers

d-chambers Nov 22, 2018

Member

Unfortunately we need to keep these lines until we drop python 2 support. They should be at the top (after the docstring) of each file.

continuous data files having 100,000+ traces this will create
more manageable streams.
:param merge: If True merge contiguous data blocks as they are found. For
continuous data files having 100,000+ traces this will create

This comment has been minimized.

@d-chambers

d-chambers Nov 22, 2018

Member

multi-line parameter text should be indented an additional level after the first line. See the old code for an example.

if endtime is None:
local_time = time.localtime()
endtime = UTCDateTime(local_time[0], local_time[1], local_time[2],
local_time[3], local_time[4], local_time[5])

This comment has been minimized.

@d-chambers

d-chambers Nov 22, 2018

Member

It would be easier to just init a far out time eg:

endtime = UTCDateTime('3000-01-01').timestamp

one line rather than 3.

This comment has been minimized.

@d-chambers

d-chambers Nov 22, 2018

Member

infact, combining these to one line might be a good idea:

starttime = starttime or UTCDateTime(1970, 1, 1)
endtime = endtime or UTCDateTime("3000-01-01")
# return max(num_traces1, num_traces2)


if __name__ == '__main__':

This comment has been minimized.

@d-chambers

d-chambers Nov 22, 2018

Member

These lines should be at the end of each file. They run the doctests (when applicable)

@d-chambers

This comment has been minimized.

Copy link
Member

d-chambers commented Nov 22, 2018

I am looking this over more and there are still some general issues.

  1. Make sure to run the linter to catch style violations. You can just install flake8 then run python obspy/core/tests/test_code_formatting.py and if the test passes you don't have any issues. The future imports should be at the top of every file.

  2. Reading three component data now takes ~80% longer than the previous code. This is a problem because often rg16 files can be very large.

  3. obspy.read no longer works on rg16 files with or without the format keyword specified.

  4. In testing this code on some large rg16 files from work I am running into some difficult to debug issues. This is bit frustrating because the existing implementation works fine.

There are a few potential paths forward on this one:

  1. We keep working through these issues until the new code passes
  2. We just modify the old code to pull the additional data out of the headers you need
  3. We use the new code as a header parser (e.g. to create the "rg16" attribute dict) then combine it with the old code for when the details parameter is set to True.

Honestly, my preferred path forward on this is 2 or 3. I am having a hard time justifying ripping out the entirety of the old (working/tested) implementation just to add support to read additional data from the headers. However, since I wrote the original implementation I may be a bit biased so perhaps I should "recuse" myself. What do you think @ThomasLecocq or @megies ?

:param left_part: If True, start the reading from the first half part
of the byte position. If False, start the reading from the second
half part of the byte position.
:type left_part: boolean

This comment has been minimized.

@megies

megies Dec 11, 2018

Member

Not sure if boolean works (probably does), we usually have bool

if dtype == 'binary':
return _read_binary(fi, length, left_part)
if dtype == 'IEEE':
data = np.frombuffer(fi.read(int(length)), '>f4')
return data[0] if len(data) == 1 else data

This comment has been minimized.

@megies

megies Dec 11, 2018

Member

this is just cosmetics, but logically this should be elifs.

return data[0] if len(data) == 1 else data


@_register_read_func('bcd')
def _read_bcd(fi, length):
def _read_bcd(fi, length, left_part):

This comment has been minimized.

@megies

megies Dec 11, 2018

Member

we have some BCD code in io.reftek already I think. Just leave as is for now, but ideally we want to avoid duplicated code..

This comment has been minimized.

@d-chambers

d-chambers Dec 11, 2018

Member

There was some talk on refactoring all of our parsers to use common utilities functions. It would be worthwhile but a huge effort best left for another PR.

This comment has been minimized.

@megies

megies Dec 19, 2018

Member

I agree, should be done in another PR. Mostly mentioned it for future reference and FYI 👍

@@ -36,6 +36,8 @@ master:
- obspy.io.reftek:
* Implement reading reftek encodings '16' and '32' (uncompressed data,
16/32bit integers, see #2058 and #2059)
- obspy.io.rg16:
* implement module to read waveforms and headers from fcnt format.

This comment has been minimized.

@megies

megies Dec 11, 2018

Member

please add ticket number

@megies

This comment has been minimized.

Copy link
Member

megies commented Dec 11, 2018

I skimmed it and made some nit-picky comments.. aside from that if @d-chambers is happy, I'm happy. Thanks for all the work @thefroid 🎉

@d-chambers

This comment has been minimized.

Copy link
Member

d-chambers commented Dec 11, 2018

I try to always be happy, it's easier to live life that way 😄

thefroid added some commits Dec 12, 2018

`Farfield Nodal <fairfieldnodal.com>`_'s
`Zland <http://fairfieldnodal.com/equipment/zland>`_ product line.
``Farfield Nodal <fairfieldnodal.com>``_'s
``Zland <http://fairfieldnodal.com/equipment/zland>``_ product line.

This comment has been minimized.

@megies

megies Dec 13, 2018

Member

I'm sorry if I confused you earlier @thefroid.. these links should stay as before. Please see the autogenerated PR docs here: http://docs.obspy.org/pull-requests/2265/packages/obspy.io.rg16.html?highlight=fairfield

Also see this page for reference: http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html

2. Using the :mod:`obspy.io.rg16` specific function
:func:`obspy.io.rg16.core._read_rg16`.
2. Using the :mod:``obspy.io.rg16`` specific function
:func:``obspy.io.rg16.core._read_rg16``.

This comment has been minimized.

@megies

megies Dec 13, 2018

Member

same here @thefroid these are internal sphinx links, see http://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html#python-roles (although we just use :func: and not :py:func:), so they need exactly one backtick.

from __future__ import (absolute_import, division, print_function,
unicode_literals)
from future.builtins import * # NOQA
from future.builtins import list

This comment has been minimized.

@megies
2. Using the :mod:`obspy.io.rg16` specific function
:func:`obspy.io.rg16.core._read_rg16`.
2. Using the specific function :func:`obspy.io.rg16.core._read_rg16` in the
module :mod:`obspy.io.rg16`.

This comment has been minimized.

@megies

megies Dec 19, 2018

Member

Using the private function should not be advertized as a legit standard use case. I know this has been in there before your PR @thefroid but would you be so kind to remove the ad for the private function? You can have a look at e.g. the mseed landing page: http://docs.obspy.org/master/packages/obspy.io.mseed.html?highlight=_read_mseed

The RG16 landing page should clearly advertize using read() and should only mention the private method _read_rg16() by stating that additional kwargs to pass in to read() should be looked up there.

CC @d-chambers

@megies

This comment has been minimized.

Copy link
Member

megies commented Dec 19, 2018

Last minor nit picks:

  • don't advertize direct use of private function _read_rg16 on RG16 API landing page
  • some files are still made executable by this PR, please make them non-executable again like mentioned before (e.g. obspy/io/rg16/core.py 100644 → 100755 see e.g. this page if anything is unclear: https://stackoverflow.com/a/40979016)

thefroid added some commits Dec 19, 2018

:mod:`obspy.io.rg16`
>>> from obspy.io.rg16.core import _read_initial_headers
>>> initial_headers = _read_initial_headers(filename)

This comment has been minimized.

@megies

megies Dec 19, 2018

Member

Hmm. @thefroid this should maybe stay in, as this is really functionality that the user can not achieve with core high level routines.

This comment has been minimized.

@thefroid

thefroid Dec 19, 2018

Contributor

In fact, the information contained in the initial headers can be retrieved calling the read function with parameter details set to True. It creates a rg16 object containing information from initial headers and trace headers.

This comment has been minimized.

@megies

megies Dec 20, 2018

Member

Ah I see, alright then! Is unpacking these "details" very costly? Otherwise it might be worth a thought to unpack those details by default and use that switch to opt out for speed critical scenarios?

This comment has been minimized.

@thefroid

thefroid Dec 20, 2018

Contributor

Yes setting "details" to True in the read function is quite costly.

This comment has been minimized.

@megies

megies Dec 20, 2018

Member

Ok, then it's fine as it is now. 👍

@megies

This comment has been minimized.

Copy link
Member

megies commented Dec 19, 2018

@d-chambers and @thefroid, if I read this correctly..

trace_starttime = _read(filename, trace_block_start + 20 + 2*32, 8,
'binary') / 1e6
if starttime.timestamp <= trace_starttime < endtime.timestamp:
trace = _make_trace(filename, trace_block_start, headonly,
contacts_north, details)

..then the user could end up missing data that is actually in between user-specified starttime and endtime, as this check only respects the start time of a given time series. Is the length of the time series known at this time? If so, this should really be changed to include segments that partially overlap given start/endtime and a stream.trim(...) added at the very end to cut off any unwanted parts.

@thefroid

This comment has been minimized.

Copy link
Contributor

thefroid commented Dec 20, 2018

@megies Yes you are right, it can miss data. For example when the start of the data block is before the starttime and the end of the data block is after the starttime set by the user, the data block is not read. I corrected this bug in my before last commit. A stream._ltrim and stream._rtrim is applied in the read function if options starttime and endtime are used, so that's why i didn't do a stream.trim() in the read_rg16 function.

@megies

This comment has been minimized.

Copy link
Member

megies commented Dec 20, 2018

A stream._ltrim and stream._rtrim is applied in the read function if options starttime and endtime are used, so that's why i didn't do a stream.trim() in the read_rg16 function.

Ah OK, yeah makes sense that we do that in read at the end.. didn't remember. ^^
Just saw this minor flaw about selection of preliminary rg16 traces in the rg16 read method. Thanks a bunch for fixing that oversight @thefroid! 🎉

@megies

This comment has been minimized.

Copy link
Member

megies commented Dec 20, 2018

One last thing @thefroid (hopefully).. we need a tiny bit of numpy backwards compatibility fixed, then we can merge.

See http://tests.obspy.org/?pr=2265 and especially http://tests.obspy.org/100219/#1

np.isclose is not available for numpy < 1.7 (I know that's ancient.. but we still support some Linux distros that have numpy 1.6), so I guess you need to simulate it with sort of a try: np.testing.assert_allclose / except AssertionError: ... combination..

@megies

This comment has been minimized.

Copy link
Member

megies commented Dec 20, 2018

Looks good, thanks again @thefroid! Merging! 🎉

@megies megies merged commit 4ff18a2 into obspy:master Dec 20, 2018

1 of 5 checks passed

ci/circleci Your tests failed on CircleCI
Details
continuous-integration/travis-ci/pr The Travis CI build failed
Details
docs-buildbot Build succeeded, but there are warnings/errors:
Details
docker-testbot docker testbot results not available yet
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment