Variable string encoding for SAC files #1773

Merged
merged 1 commit into from
May 12, 2017
2 changes: 2 additions & 0 deletions CHANGELOG.txt
@@ -188,6 +188,8 @@ master: (doi: 10.5281/zenodo.165135)
inheritance. (see #1507)
* Reference time not written to SAC file when made from scratch
(see #1575)
* Reinforce ASCII encoding in reading non-ASCII SAC files regardless of
default encoding setting. (see #1768)
- obspy.io.sh:
* Fix writing of long headers for Python 3 (see #1526)
* Whitespace in header fields is not ignored anymore (see #1552)
1 change: 1 addition & 0 deletions obspy/CONTRIBUTORS.txt
@@ -78,3 +78,4 @@ Wassermann, Joachim
Williams, Mark C.
Winkelman, Andrew
Zad, Seyed Kasra Hosseini
Zhu, Lijun
2 changes: 1 addition & 1 deletion obspy/io/sac/sactrace.py
@@ -1092,7 +1092,7 @@ def read(cls, source, headonly=False, ascii=False, byteorder=None,
 val = _ut._clean_str(val, strip_whitespace=False)
 if val.startswith(native_str('-12345')):
     val = HD.SNULL
-hs[i] = val
+hs[i] = val.encode('ASCII', 'replace')
Member

This doesn't make sense to me; when reading, why would you want bytes?

Contributor Author

Thanks for your comment. hs is an array of bytes strings, so the previous statement performs an implicit conversion that is equivalent to hs[i] = val.encode('ASCII', 'strict'). When the header string contains non-ASCII characters, that conversion fails and raises an exception. This PR lets non-ASCII characters pass through the read function as '?' for now, and leaves the implementation of an 'encoding' flag for a later version, as we discussed before.
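
The difference between the two error handlers described above can be seen with plain Python strings. This is a minimal sketch using only the standard library; the header value `"BH\u00e9"` is an illustrative assumption, not taken from the PR or its test file.

```python
# Illustrative header value; the non-ASCII character is an assumption.
value = "BH\u00e9"

# 'strict' is the default error handler: a character without an ASCII
# encoding makes the call raise UnicodeEncodeError.
try:
    value.encode("ascii", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'replace' substitutes '?' for each character that cannot be encoded.
replaced = value.encode("ascii", "replace")

print(strict_failed)  # True
print(replaced)       # b'BH?'
```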

Member

_ut._clean_str produces a str; encode goes from str to bytes; I think you might have something backwards. Are you testing on Python 3?

Member

OK, upon further investigation, this works because SAC I/O is not very good at defining boundaries and does encode/decode a bit too much.

Member

The other thing I didn't know is that NumPy does an implicit encode here when assigning to hs[i] (even on Python 3, probably for backwards compatibility reasons, but it's kind of unfortunate.)
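
The implicit encode mentioned here can be reproduced in isolation. The sketch below assumes NumPy is installed; the array length, the `S8` dtype, and the sample values are illustrative assumptions and are not copied from `sactrace.py`.

```python
import numpy as np

# Fixed-width bytes array, similar in spirit to a SAC string header array.
# The dtype and length are illustrative assumptions.
hs = np.zeros(2, dtype="S8")

# Assigning an ASCII-only str works: NumPy encodes it to bytes implicitly.
hs[0] = "BHZ"

# Assigning a str with a non-ASCII character fails inside that implicit
# encode. UnicodeError is the parent class of UnicodeEncodeError.
try:
    hs[1] = "BH\u00e9"
    implicit_failed = False
except UnicodeError:
    implicit_failed = True

# Encoding explicitly with 'replace' hands NumPy bytes, so no implicit
# ASCII encode is needed and the assignment succeeds.
hs[1] = "BH\u00e9".encode("ascii", "replace")

print(implicit_failed)
print(hs)
```

Passing bytes rather than str is what sidesteps the implicit conversion, which is why the one-line change in the diff is enough to stop the exception.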

Contributor Author

Totally agree with you. This has given us a lot of problems in the past.

Member

@QuLogic Yep.. it's a bit strange that we store it internally as bytes again (in numpy) after first decoding it. But we kind of decided to make this PR a minimal fix and postpone the more major cleanup for later..


sac = cls._from_arrays(hf, hi, hs, data)
if sac.dist is None:
Binary file added obspy/io/sac/tests/data/test_encode.sac
Binary file not shown.
8 changes: 8 additions & 0 deletions obspy/io/sac/tests/test_core.py
@@ -36,6 +36,7 @@ def setUp(self):
        self.fileseis = os.path.join(self.path, "data", "seism.sac")
        self.file_notascii = os.path.join(self.path, "data",
                                          "non_ascii.sac")
        self.file_encode = os.path.join(self.path, "data", "test_encode.sac")
        self.testdata = np.array(
            [-8.74227766e-08, -3.09016973e-01,
             -5.87785363e-01, -8.09017122e-01, -9.51056600e-01,
@@ -915,6 +916,13 @@ def test_always_sac_reftime(self):
        self.assertAlmostEqual(tr1.stats.sac.a, a, places=5)
        self.assertEqual(tr1.stats.sac.b, b)

    def test_wrong_encoding(self):
        """
        Read SAC file with wrong encoding
        """
        tr0 = read(self.file_encode)[0]
        self.assertEqual(tr0.stats.get('channel'), '????????')


def suite():
    return unittest.makeSuite(CoreTestCase, 'test')
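
One way to see why the new test can expect eight question marks is to follow a fixed-width field through a decode and re-encode that both use the `'replace'` handler. The contents of `test_encode.sac` are not shown in this PR, so the raw bytes below are a hypothetical stand-in, and the decode step is an assumption about how the field reaches the changed line.

```python
# Hypothetical 8-byte header field in which every byte is outside the
# ASCII range (0xC0..0xC7). The real file contents are not shown in the PR.
raw = bytes(range(0xC0, 0xC8))

# Decoding as ASCII with 'replace' turns each undecodable byte into U+FFFD.
decoded = raw.decode("ascii", "replace")

# Encoding back to ASCII with 'replace' turns each U+FFFD into '?'.
stored = decoded.encode("ascii", "replace")

print(len(decoded))            # 8
print(stored.decode("ascii"))  # ????????
```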