New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing Encoding Flag for Reading/writing SAC files. #1789

Merged
merged 7 commits into from Jun 5, 2017

Conversation

Projects
None yet
3 participants
@lijunzh
Contributor

lijunzh commented May 16, 2017

This PR tries to implementing 'encoding' flag for read/write functions in io.sac module. It intends to offer a complete solution to fix the problem for #1768 which only has a band-aid solution in #1773.

@jkmacc-LANL, this is just a place holder pull request to start the conversion. Let's work together on the changes. Let me know how you want to implement this.

PR Checklist

  • All tests still pass.
  • Any new features or fixed regressions are be covered via new tests.
  • Any new or changed features have are fully documented.
  • Significant changes have been added to CHANGELOG.txt .

@lijunzh lijunzh requested review from krischer, megies and jkmacc-LANL May 16, 2017

@lijunzh lijunzh added the .io.sac label May 16, 2017

@lijunzh lijunzh added this to the 1.2.0 milestone May 16, 2017

@megies

This comment has been minimized.

Show comment
Hide comment
@megies

megies May 16, 2017

Member

@jkmacc-LANL, this is just a place holder pull request to start the conversion. Let's work together on the changes. Let me know how you want to implement this

👍 for opening this as a stub PR with a branch in obspy/obspy directly, to make it easy for other people to interact

Member

megies commented May 16, 2017

@jkmacc-LANL, this is just a place holder pull request to start the conversion. Let's work together on the changes. Let me know how you want to implement this

👍 for opening this as a stub PR with a branch in obspy/obspy directly, to make it easy for other people to interact

@lijunzh

This comment has been minimized.

Show comment
Hide comment
@lijunzh

lijunzh May 19, 2017

Contributor

Here is my understanding of the SAC reading process. @jkmacc-LANL, correct me if I am wrong.

  1. The obspy.read calls some methods to determine the file type and will eventually decide to use obspy.io.sac.core._read_sac to read a SAC file.
  2. obspy.io.sac.core._read_sac calls obspy.io.sac.core._internal_read_sac to read SAC file (by calling obspy.io.sac.sactrace.SACTrace.read) and feed to obspy.trace, then obspy.stream
  3. obspy.io.sac.sactrace.SACTrace.read then calls two functions obspy.io.sac.arrayio.read_sac or obspy.io.sac.arrayio.read_sac_ascii to actually read the data from SAC depends on SAC file format.

My thoughts on the current code are following:

  • Do we need that many layers of calling? I check that obspy.io.sac.core._internal_read_sac was only called by obspy.io.sac.core._read_sac Inside core module. Will it be cleaner and faster (with less function overhead) that we implement _internal_read_sac as a code block instead of a function?
  • Assuming that we want to keep the multiple layers of calling functions, which level should we put the 'encoding' flag and let it propagate to the lowest level? I assume that users that use obspy.read may not bother to assign an 'encoding' flag. Maybe, we should start at obspy.io.sac.core._read_sac?
  • I am not entirely sure why we have a obspy.io.sac.arrayio.read_sac_ascii in addition to obspy.io.sac.arrayio.read_sac. Is it related to encoding problem? Should we do something there as well?

Let me know what you guys think.

Contributor

lijunzh commented May 19, 2017

Here is my understanding of the SAC reading process. @jkmacc-LANL, correct me if I am wrong.

  1. The obspy.read calls some methods to determine the file type and will eventually decide to use obspy.io.sac.core._read_sac to read a SAC file.
  2. obspy.io.sac.core._read_sac calls obspy.io.sac.core._internal_read_sac to read SAC file (by calling obspy.io.sac.sactrace.SACTrace.read) and feed to obspy.trace, then obspy.stream
  3. obspy.io.sac.sactrace.SACTrace.read then calls two functions obspy.io.sac.arrayio.read_sac or obspy.io.sac.arrayio.read_sac_ascii to actually read the data from SAC depends on SAC file format.

My thoughts on the current code are following:

  • Do we need that many layers of calling? I check that obspy.io.sac.core._internal_read_sac was only called by obspy.io.sac.core._read_sac Inside core module. Will it be cleaner and faster (with less function overhead) that we implement _internal_read_sac as a code block instead of a function?
  • Assuming that we want to keep the multiple layers of calling functions, which level should we put the 'encoding' flag and let it propagate to the lowest level? I assume that users that use obspy.read may not bother to assign an 'encoding' flag. Maybe, we should start at obspy.io.sac.core._read_sac?
  • I am not entirely sure why we have a obspy.io.sac.arrayio.read_sac_ascii in addition to obspy.io.sac.arrayio.read_sac. Is it related to encoding problem? Should we do something there as well?

Let me know what you guys think.

@jkmacc-LANL

This comment has been minimized.

Show comment
Hide comment
@jkmacc-LANL

jkmacc-LANL May 19, 2017

Contributor

Thanks for starting this, @lijunzh.

  • I haven't paid much attention to the stuff in obspy.io.sac.core, so if you think there should be fewer calling layers, I don't have an issue with that a priori. It looks like there's an _internal_<function> for every _<function> (which are part of the plugin API), and that the reason for having the _internal_<function> versions is to reduce code duplication. The API functions handle either streams or file names, and just hand off to the _internal_<function> which handles just streams. That being the case, I don't actually see a strong case for code duplication to reduce the number of function calls.
  • Yes, I'd make encoding available in _read_sac by pulling it out of **kwargs and propagating it down the stack to arrayio.read_sac. I can't remember, did we decide it needed to be documented at every layer?
  • arrayio.read_sac_ascii is just a low-level function for the SAC alphanumeric format, as opposed to the binary format. It may also benefit from the encoding flag, though the logic around this line may need to include some decoding. I don't consider this a high priority.
Contributor

jkmacc-LANL commented May 19, 2017

Thanks for starting this, @lijunzh.

  • I haven't paid much attention to the stuff in obspy.io.sac.core, so if you think there should be fewer calling layers, I don't have an issue with that a priori. It looks like there's an _internal_<function> for every _<function> (which are part of the plugin API), and that the reason for having the _internal_<function> versions is to reduce code duplication. The API functions handle either streams or file names, and just hand off to the _internal_<function> which handles just streams. That being the case, I don't actually see a strong case for code duplication to reduce the number of function calls.
  • Yes, I'd make encoding available in _read_sac by pulling it out of **kwargs and propagating it down the stack to arrayio.read_sac. I can't remember, did we decide it needed to be documented at every layer?
  • arrayio.read_sac_ascii is just a low-level function for the SAC alphanumeric format, as opposed to the binary format. It may also benefit from the encoding flag, though the logic around this line may need to include some decoding. I don't consider this a high priority.
@megies

This comment has been minimized.

Show comment
Hide comment
@megies

megies May 22, 2017

Member

I assume that users that use obspy.read may not bother to assign an 'encoding' flag.

I would allow passing encoding down from read(.. **kwargs) when it's encountered in kwargs but I agree that we probably don't want to document it with read but rather on the SAC API start page (io/sac/__init__.py) and with _read_sac and consorts.

Member

megies commented May 22, 2017

I assume that users that use obspy.read may not bother to assign an 'encoding' flag.

I would allow passing encoding down from read(.. **kwargs) when it's encountered in kwargs but I agree that we probably don't want to document it with read but rather on the SAC API start page (io/sac/__init__.py) and with _read_sac and consorts.

@lijunzh

This comment has been minimized.

Show comment
Hide comment
@lijunzh

lijunzh May 22, 2017

Contributor

@megies @jkmacc-LANL Great, glad that we agree on the code structure. I will start to work on the implementation/documentation and keep you guys posted.

Contributor

lijunzh commented May 22, 2017

@megies @jkmacc-LANL Great, glad that we agree on the code structure. I will start to work on the implementation/documentation and keep you guys posted.

"""
tr0 = read(self.file_encode, encoding='cp1252')[0]
self.assertEqual(tr0.stats.get('channel'), 'ÇÏÿÿÇÏÿÿ')

This comment has been minimized.

@lijunzh

lijunzh May 29, 2017

Contributor

Somehow, this test did not run during pytest. I need some help here.

@lijunzh

lijunzh May 29, 2017

Contributor

Somehow, this test did not run during pytest. I need some help here.

This comment has been minimized.

@jkmacc-LANL

jkmacc-LANL May 30, 2017

Contributor

I'm not sure about this. Maybe @megies knows what's happening.

@jkmacc-LANL

jkmacc-LANL May 30, 2017

Contributor

I'm not sure about this. Maybe @megies knows what's happening.

@lijunzh

This change passes all the old tests. The new test I wrote did not run. Can someone take a look and let me know what went wrong with it? Thanks.

@jkmacc-LANL

This all makes sense to me; my suggestions are really just to move the new keyword to the end of calling signatures and calls.

This PR has made me realize just how spread-out the string decoding/encoding is in the SAC subpackage. I think a separate PR that pushes string handling into arrayio.py and changes how the string headers are stored (i.e. not in a NumPy byte array) is warranted.

Thanks for your work on this, @lijunzh !

Show outdated Hide outdated obspy/io/sac/sactrace.py
Show outdated Hide outdated obspy/io/sac/core.py
Show outdated Hide outdated obspy/io/sac/core.py
Show outdated Hide outdated obspy/io/sac/sactrace.py
"""
tr0 = read(self.file_encode, encoding='cp1252')[0]
self.assertEqual(tr0.stats.get('channel'), 'ÇÏÿÿÇÏÿÿ')

This comment has been minimized.

@jkmacc-LANL

jkmacc-LANL May 30, 2017

Contributor

I'm not sure about this. Maybe @megies knows what's happening.

@jkmacc-LANL

jkmacc-LANL May 30, 2017

Contributor

I'm not sure about this. Maybe @megies knows what's happening.

@lijunzh

This comment has been minimized.

Show comment
Hide comment
@lijunzh

lijunzh May 30, 2017

Contributor

@jkmacc-LANL Thanks for your review. I put the encoding parameter at the beginning at first and tried to use it as a required parameter but realized that it will fail a bunch of tests. I changed it back to default at "ASCII" but forgot to move its position. Good catch. I will change them now.

Contributor

lijunzh commented May 30, 2017

@jkmacc-LANL Thanks for your review. I put the encoding parameter at the beginning at first and tried to use it as a required parameter but realized that it will fail a bunch of tests. I changed it back to default at "ASCII" but forgot to move its position. Good catch. I will change them now.

@lijunzh

This comment has been minimized.

Show comment
Hide comment
@lijunzh

lijunzh May 30, 2017

Contributor

@jkmacc-LANL The order has been changed for those place as you requested.

Contributor

lijunzh commented May 30, 2017

@jkmacc-LANL The order has been changed for those place as you requested.

@@ -1056,6 +1056,9 @@ def read(cls, source, headonly=False, ascii=False, byteorder=None,
beginning with '-12345' are considered unset. If True, they
are instead passed without modification. Good for debugging.
:type debug_strings: bool
:param encoding: Encoding string that passes the user specified
encoding scheme.
:type encoding: str

This comment has been minimized.

@jkmacc-LANL

jkmacc-LANL May 30, 2017

Contributor

(Looks like there's extra whitespace here.)

@jkmacc-LANL

jkmacc-LANL May 30, 2017

Contributor

(Looks like there's extra whitespace here.)

This comment has been minimized.

@lijunzh

lijunzh May 30, 2017

Contributor

@jkmacc-LANL Thanks for catching it. Yeah, my IDE keeps adding white space in comments. I need to fix this problem every time I changed docstring.

@lijunzh

lijunzh May 30, 2017

Contributor

@jkmacc-LANL Thanks for catching it. Yeah, my IDE keeps adding white space in comments. I need to fix this problem every time I changed docstring.

@lijunzh

This comment has been minimized.

Show comment
Hide comment
@lijunzh

lijunzh May 30, 2017

Contributor

@megies Could you please take a look at the new test I wrote and let me know if what went wrong with it? Everything else is ready for merge.

Contributor

lijunzh commented May 30, 2017

@megies Could you please take a look at the new test I wrote and let me know if what went wrong with it? Everything else is ready for merge.

@lijunzh

This comment has been minimized.

Show comment
Hide comment
@lijunzh

lijunzh Jun 5, 2017

Contributor

@megies @jkmacc-LANL Here is the pytest results I got.

lijun@lawn-128-61-45-214:~/git/obspy/obspy/io/sac$ pytest -v tests/test_core.py ============================= test session starts ==============================
platform darwin -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0 -- /Users/lijun/anaconda3/bin/python
cachedir: ../../../.cache
rootdir: /Users/lijun/git/obspy, inifile:
collected 51 items

tests/test_core.py::CoreTestCase::test_always_sac_reftime PASSED
tests/test_core.py::CoreTestCase::test_convert_to_sac PASSED
tests/test_core.py::CoreTestCase::test_decimate_resample PASSED
tests/test_core.py::CoreTestCase::test_default_values PASSED
tests/test_core.py::CoreTestCase::test_encoding_flag PASSED
tests/test_core.py::CoreTestCase::test_invalid_header_field PASSED
tests/test_core.py::CoreTestCase::test_is_sac_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_is_sac_open_file PASSED
tests/test_core.py::CoreTestCase::test_is_sac_string_io_raises PASSED
tests/test_core.py::CoreTestCase::test_is_sacxy_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_is_sacxy_open_file_binary_mode PASSED
tests/test_core.py::CoreTestCase::test_is_sacxy_open_file_text_mode_fails PASSED
tests/test_core.py::CoreTestCase::test_is_sacxy_string_io_raises PASSED
tests/test_core.py::CoreTestCase::test_issue390 PASSED
tests/test_core.py::CoreTestCase::test_issue_156 PASSED
tests/test_core.py::CoreTestCase::test_iztype11 PASSED
tests/test_core.py::CoreTestCase::test_merge_sac_obspy_headers PASSED
tests/test_core.py::CoreTestCase::test_not_ascii PASSED
tests/test_core.py::CoreTestCase::test_not_used_but_given_headers PASSED
tests/test_core.py::CoreTestCase::test_null_terminated_strings PASSED
tests/test_core.py::CoreTestCase::test_read_and_write_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_read_big_endian_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_read_head_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_read_sac_from_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_read_sac_from_open_file PASSED
tests/test_core.py::CoreTestCase::test_read_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_read_via_obspy_from_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_read_with_fsize PASSED
tests/test_core.py::CoreTestCase::test_read_write_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_read_write_open_file PASSED
tests/test_core.py::CoreTestCase::test_read_write_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_read_write_xy_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_read_xy_write_xy_from_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_read_xy_write_xy_from_open_file_binary_mode PASSED
tests/test_core.py::CoreTestCase::test_read_xy_write_xy_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_reference_time PASSED
tests/test_core.py::CoreTestCase::test_sac_booleans_from_trace PASSED
tests/test_core.py::CoreTestCase::test_sac_file_from_new_header PASSED
tests/test_core.py::CoreTestCase::test_set_version PASSED
tests/test_core.py::CoreTestCase::test_swap_bytes_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_undefined_b PASSED
tests/test_core.py::CoreTestCase::test_valid_sac_from_minimal_existing_sac_header PASSED
tests/test_core.py::CoreTestCase::test_write_sac_xy_with_minimum_stats PASSED
tests/test_core.py::CoreTestCase::test_write_small_trace PASSED
tests/test_core.py::CoreTestCase::test_write_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_write_via_obspy_to_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_writing_micro_seconds PASSED
tests/test_core.py::CoreTestCase::test_writing_to_file_like_objects_with_obspy PASSED
tests/test_core.py::CoreTestCase::test_writing_to_io_string_io_fails PASSED
tests/test_core.py::CoreTestCase::test_writing_to_obj_with_multiple_traces_fails PASSED
tests/test_core.py::CoreTestCase::test_wrong_encoding PASSED

========================== 51 passed in 2.42 seconds ===========================

The new test I wrote did not run at all. Here is the code I added.

    def test_encoding_flag(self):
        """
        Test passing encoding flag through obspy.read
        """
        tr0 = read(self.file_encode, encoding='cp1252')[0]
        self.assertEqual(tr0.stats.get('channel'), 'ÇÏÿÿÇÏÿÿ')
Contributor

lijunzh commented Jun 5, 2017

@megies @jkmacc-LANL Here is the pytest results I got.

lijun@lawn-128-61-45-214:~/git/obspy/obspy/io/sac$ pytest -v tests/test_core.py ============================= test session starts ==============================
platform darwin -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0 -- /Users/lijun/anaconda3/bin/python
cachedir: ../../../.cache
rootdir: /Users/lijun/git/obspy, inifile:
collected 51 items

tests/test_core.py::CoreTestCase::test_always_sac_reftime PASSED
tests/test_core.py::CoreTestCase::test_convert_to_sac PASSED
tests/test_core.py::CoreTestCase::test_decimate_resample PASSED
tests/test_core.py::CoreTestCase::test_default_values PASSED
tests/test_core.py::CoreTestCase::test_encoding_flag PASSED
tests/test_core.py::CoreTestCase::test_invalid_header_field PASSED
tests/test_core.py::CoreTestCase::test_is_sac_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_is_sac_open_file PASSED
tests/test_core.py::CoreTestCase::test_is_sac_string_io_raises PASSED
tests/test_core.py::CoreTestCase::test_is_sacxy_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_is_sacxy_open_file_binary_mode PASSED
tests/test_core.py::CoreTestCase::test_is_sacxy_open_file_text_mode_fails PASSED
tests/test_core.py::CoreTestCase::test_is_sacxy_string_io_raises PASSED
tests/test_core.py::CoreTestCase::test_issue390 PASSED
tests/test_core.py::CoreTestCase::test_issue_156 PASSED
tests/test_core.py::CoreTestCase::test_iztype11 PASSED
tests/test_core.py::CoreTestCase::test_merge_sac_obspy_headers PASSED
tests/test_core.py::CoreTestCase::test_not_ascii PASSED
tests/test_core.py::CoreTestCase::test_not_used_but_given_headers PASSED
tests/test_core.py::CoreTestCase::test_null_terminated_strings PASSED
tests/test_core.py::CoreTestCase::test_read_and_write_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_read_big_endian_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_read_head_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_read_sac_from_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_read_sac_from_open_file PASSED
tests/test_core.py::CoreTestCase::test_read_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_read_via_obspy_from_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_read_with_fsize PASSED
tests/test_core.py::CoreTestCase::test_read_write_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_read_write_open_file PASSED
tests/test_core.py::CoreTestCase::test_read_write_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_read_write_xy_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_read_xy_write_xy_from_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_read_xy_write_xy_from_open_file_binary_mode PASSED
tests/test_core.py::CoreTestCase::test_read_xy_write_xy_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_reference_time PASSED
tests/test_core.py::CoreTestCase::test_sac_booleans_from_trace PASSED
tests/test_core.py::CoreTestCase::test_sac_file_from_new_header PASSED
tests/test_core.py::CoreTestCase::test_set_version PASSED
tests/test_core.py::CoreTestCase::test_swap_bytes_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_undefined_b PASSED
tests/test_core.py::CoreTestCase::test_valid_sac_from_minimal_existing_sac_header PASSED
tests/test_core.py::CoreTestCase::test_write_sac_xy_with_minimum_stats PASSED
tests/test_core.py::CoreTestCase::test_write_small_trace PASSED
tests/test_core.py::CoreTestCase::test_write_via_obspy PASSED
tests/test_core.py::CoreTestCase::test_write_via_obspy_to_bytes_io PASSED
tests/test_core.py::CoreTestCase::test_writing_micro_seconds PASSED
tests/test_core.py::CoreTestCase::test_writing_to_file_like_objects_with_obspy PASSED
tests/test_core.py::CoreTestCase::test_writing_to_io_string_io_fails PASSED
tests/test_core.py::CoreTestCase::test_writing_to_obj_with_multiple_traces_fails PASSED
tests/test_core.py::CoreTestCase::test_wrong_encoding PASSED

========================== 51 passed in 2.42 seconds ===========================

The new test I wrote did not run at all. Here is the code I added.

    def test_encoding_flag(self):
        """
        Test passing encoding flag through obspy.read
        """
        tr0 = read(self.file_encode, encoding='cp1252')[0]
        self.assertEqual(tr0.stats.get('channel'), 'ÇÏÿÿÇÏÿÿ')
@jkmacc-LANL

This comment has been minimized.

Show comment
Hide comment
@jkmacc-LANL

jkmacc-LANL Jun 5, 2017

Contributor

The tests are listed alphabetically, and it looks like I see test_encoding_flag PASSED at the top. I think you're good!

Contributor

jkmacc-LANL commented Jun 5, 2017

The tests are listed alphabetically, and it looks like I see test_encoding_flag PASSED at the top. I think you're good!

@lijunzh

This comment has been minimized.

Show comment
Hide comment
@lijunzh

lijunzh Jun 5, 2017

Contributor
Contributor

lijunzh commented Jun 5, 2017

@lijunzh lijunzh merged commit 9f9e65b into master Jun 5, 2017

5 of 6 checks passed

docker-testbot docker testbot results not available yet
ci/circleci Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 90%)
Details
codecov/project 87.67% (+1.41%) compared to 44ca7bb
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@lijunzh lijunzh deleted the SAC_encoding_flag branch Jun 5, 2017

@lijunzh lijunzh modified the milestones: 1.1.0, 1.2.0 Jun 8, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment