Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mini seed output of int64 data #2356 #2373

Open

Conversation

@LMurrayBergquist
Copy link
Contributor

commented Apr 4, 2019

This pull request changes obspy/io/mseed/core.py to allow int64 data to be written to MiniSEED as long as it can be safely downcast to int32. If downcasting the data from int64 to int32 changes the array it is not allowed and results in an error message. A test of writing int64 data to mseed has been added to obspy/io/mseed/tests/test_mseed_reading_and_writing.py.

This pull request was initiated to fix issue: MiniSEED output of int64 data #2356.

PR Checklist

  • Correct base branch selected? master for new features, maintenance_... for bug fixes
  • This PR is not directly related to an existing issue (which has no PR yet).
  • If the PR is making changes to documentation, docs pages can be built automatically.
    Just remove the space in the following string after the + sign: "+ DOCS"
  • If any network modules should be tested for the PR, add them as a comma separated list
    (e.g. clients.fdsn,clients.arclink) after the colon in the following magic string: "+TESTS:"
    (you can also add "ALL" to just simply run all tests across all modules)
  • All tests still pass.
  • Any new features or fixed regressions are be covered via new tests.
  • Any new or changed features have are fully documented.
  • Significant changes have been added to CHANGELOG.txt .
  • First time contributors have added your name to CONTRIBUTORS.txt .

@megies megies changed the base branch from maintenance_1.1.x to master Apr 4, 2019

@megies

This comment has been minimized.

Copy link
Member

commented Apr 4, 2019

I changed base branch for the PR, you inadvertently selected the maintenance branch, but at this point everything goes into master (and you properly based your branch on master it seems, so all's good)

Show resolved Hide resolved obspy/io/mseed/core.py Outdated
Show resolved Hide resolved obspy/io/mseed/core.py Outdated
Show resolved Hide resolved obspy/io/mseed/core.py Outdated
@megies

This comment has been minimized.

Copy link
Member

commented Apr 4, 2019

@LMurrayBergquist I just saw in line 777 there is a check if not trace_attr['encoding']:. This looks dangerous to me @krischer as it could evaluate to False when a string encoding was set earlier encoding = 0. This should be if trace_attr['encoding'] is None:

Please change @LMurrayBergquist

Show resolved Hide resolved obspy/io/mseed/core.py Outdated
@megies

This comment has been minimized.

Copy link
Member

commented Apr 4, 2019

Please add a changelog one-liner as well (also referencing issue number).

Also please add test cases, ideally writing to BufferIO objects (you can find examples in the test code e.g. in io/mseed/tests/test_mseed_util.py, it should be with blocks like with io.BytesIO() as buf:).

  • test that succesfully downcasting and writing to output works and re-read the data to see that it is the same as the input data
  • test that the expected error message is raised when the data can not be downcast (look in test code for examples that use with self.assertRaises(...):

Aside from these things the code looks good to me so far.. keep at it :-)

@LMurrayBergquist

This comment has been minimized.

Copy link
Contributor Author

commented Apr 4, 2019

Please add a changelog one-liner as well (also referencing issue number).

Also please add test cases, ideally writing to BufferIO objects (you can find examples in the test code e.g. in io/mseed/tests/test_mseed_util.py, it should be with blocks like with io.BytesIO() as buf:).

  • test that successfully downcasting and writing to output works and re-read the data to see that it is the same as the input data
    -> here should I just edit the test that I made so that it uses 'with' blocks?
  • test that the expected error message is raised when the data can not be downcast (look in test code for examples that use with self.assertRaises(...):

Ok, I will add the changes you suggest and make these test. But what is the 'changelog', and what should I add to that?

@megies

This comment has been minimized.

Copy link
Member

commented Apr 4, 2019

Check this: https://github.com/obspy/obspy/blob/master/CHANGELOG.txt sitting in the root dir of the git repo

@@ -495,6 +495,12 @@ def allocate_data(samplecount, sampletype):
del lil # NOQA
return Stream(traces=traces)

# Define helper function to copy data, replaces use of: trace.data.copy().astype(dtype)

This comment has been minimized.

Copy link
@megies

megies Apr 4, 2019

Member

PEP8: 2 blank lines before and after function definitions (see CircleCI log)

Also, this comment should instead be a docstring..

def ...():
    """
    .....  <- here
    """
# Define helper function to copy data, replaces use of: trace.data.copy().astype(dtype)
def _np_copy_astype(data, dtype):
try:
return data.astype(dtype, _np_copy_astype=True)

This comment has been minimized.

Copy link
@megies

megies Apr 4, 2019

Member

this needs to be ..., copy=True)

This comment has been minimized.

Copy link
@LMurrayBergquist

LMurrayBergquist Apr 4, 2019

Author Contributor

oops, that must have auto-filled in and I didn't notice, thanks!

This comment has been minimized.

Copy link
@megies

megies Apr 5, 2019

Member

well, we'd have seen it in failing tests anyway.. that's what they're there for anyway ;-)

# make sure the data can be written to mseed -> write to io.BytesIO() object in test
file = io.BytesIO()
st.write(file, format="mseed")
st2 = read(file)

This comment has been minimized.

Copy link
@megies

megies Apr 4, 2019

Member

Please use with construct, that's automatically closing the buffer at the end..

with io.BytesIO() as buf:
    st.write(buf, format="mseed")
    st2 = read(buf)

Also file is a Python builtin, try to avoid (locally) overwriting builtins at all cost.

This comment has been minimized.

Copy link
@megies

megies Apr 4, 2019

Member

Please als otest the failing case as mentioned with self.assertRaises(...) but I guess you're still working on this ... :-)

@megies

This comment has been minimized.

Copy link
Member

commented Apr 5, 2019

You've based this on an outdated master branch.. that's why all the Circle CI fails etc. Please rebase your branch on a current master and force-push. Get in touch with @DominikStr if you need help.

@megies megies added the .io.mseed label Apr 5, 2019

Show resolved Hide resolved obspy/io/mseed/core.py Outdated

# Test that error message is indeed raised when data cannot be downcast
# Create dummy stream that cannot be properly downcast to int64
x_int64 = np.arange(2147483648, 9223372036854775807, 460000000000000000, dtype=np.int64)

This comment has been minimized.

Copy link
@megies

megies Apr 9, 2019

Member

Ooof that's pretty long numbers to look at. I am assuming this arange will only contain a handful of numbers (it should only use a few so tests stay slim and fast)? But nobody will be able to tell from looking at this, so you should rather hard code it with like 3-4 numbers in it..

x_int64 = np.array([2147483648, ..., 9223372036854775807], dtype=np.int64)

Also, a good test should cover as many apects / potential inputs as possible. In this case, you should definitely add a test that has large negative numbers, as buggy code might not take that into account properly (I know yours does, but still -- tests should be agnostic of the code). Also, it seems the test data contains only numbers that can't be int32-represented. I think a better test data set would be to have like 4-5 small numbers and then just one large (incompatible) number among them.

Also, come to think of it, the check if data can besafely downcast to int32 could probably be replaced by checking the min/max values of the data (and checking against np.iinfo(np.int32).min/max). That might be faster, but we have it super safe I guess, so we'll leave as it for now I think.

This comment has been minimized.

Copy link
@LMurrayBergquist

LMurrayBergquist Apr 9, 2019

Author Contributor

Good point, I've updated the test for negative numbers and a mix of mainly small numbers with one too big one. Also a good point that checking that the min and max are in the right range should be just as good, I suppose it would take about the same space, but maybe it's faster to compute?

This comment has been minimized.

Copy link
@QuLogic

QuLogic Apr 10, 2019

Member

These seem like they'd be a bit clearer written as powers of 2/10: 2147483648, 9223372036854775807, 460000000000000000 == 2**31, 2**63 - 1, 460 * 10**15 (though maybe if the exact numbers don't matter too much, the last can be another power of 2 as well.)

This comment has been minimized.

Copy link
@megies

megies Apr 10, 2019

Member

Also a good point that checking that the min and max are in the right range should be just as good, I suppose it would take about the same space, but maybe it's faster to compute?

Just leave as is for now.. it's safe as it's implemented as is right now and that's most important for such an edge case.

This comment has been minimized.

Copy link
@megies

megies Apr 10, 2019

Member

the exact numbers don't matter too much, the last can be another power of 2 as well

Yep maybe clearer to just use powers of 2. And yes the exact numbers don't matter at all.

@@ -785,6 +795,16 @@ def _write_mseed(stream, filename, encoding=None, reclen=None, byteorder=None,
trace_attr['encoding'] = 1
elif trace.data.dtype.type == np.dtype(native_str('|S1')).type:
trace_attr['encoding'] = 0
# int64 data not supported; if possible, downcast to int32, otherwise create error message

This comment has been minimized.

Copy link
@megies

megies Apr 9, 2019

Member

Please add a note here that this could be switched to numpy.can_cast() whenever we bump to numpy 1.9.0 (see https://docs.scipy.org/doc/numpy/reference/generated/numpy.can_cast.html?highlight=1.9#numpy-can-cast) and maybe also add it in setup.py like before

@@ -28,6 +28,9 @@ master:
have been deselected due to the default location priorities setting. This
is a pure usability improvement as it has been confusing users
(see #2159).
- obspy.io.mseed:
* Add add ability to write int64 data to mseed if it can safely be downcast

This comment has been minimized.

Copy link
@megies

megies Apr 10, 2019

Member

Add add ;-)

This comment has been minimized.

Copy link
@LMurrayBergquist

LMurrayBergquist Apr 10, 2019

Author Contributor

oops! Thanks!

@megies

This comment has been minimized.

Copy link
Member

commented Apr 10, 2019

Restarting Circle CI, it glitched out somehow..

@krischer
Copy link
Member

left a comment

Just two small in-line comments.

In general I'm not too convinced that we want auto-dtype conversions to happen but I can see that it makes a lot of things easier in this particular case.

elif trace.data.dtype.type == np.int64:
# check if data can be safely downcast to int32
trace_data1 = _np_copy_astype(trace.data, np.int32)
if np.array_equal(trace.data, trace_data1):

This comment has been minimized.

Copy link
@krischer

krischer Apr 10, 2019

Member

How about just checking if the values in the array are in bounds with np.iinfo(np.int32) before casting instead of doing the cast + equality check? Feels a bit cleaner and should also be faster.

This comment has been minimized.

Copy link
@LMurrayBergquist

LMurrayBergquist Apr 10, 2019

Author Contributor

True, that does look smoother, I'll change to that!


# make sure the data can be written to mseed -> write to io.BytesIO() object in test
with io.BytesIO() as buf:
st.write(buf, format="mseed")

This comment has been minimized.

Copy link
@krischer

krischer Apr 10, 2019

Member

Please also add a check that the original data in the traces is still of dtype int64.

This comment has been minimized.

Copy link
@LMurrayBergquist

LMurrayBergquist Apr 10, 2019

Author Contributor

Do you mean after st.write() add a check that st is still of dtype int64?

This comment has been minimized.

Copy link
@krischer

krischer Apr 11, 2019

Member

exactly

@megies

This comment has been minimized.

Copy link
Member

commented Apr 12, 2019

Still needs a rebase on current master branch (see above comment)

LMurrayBergquist added some commits Apr 4, 2019

Fix mseed output of int64 data #2356
Changes obspy/io/mseed/core.py so that int64 data is accepted if it can be safely downcast to type int32 data, otherwise it creates an error message. A test for this has been added to test_mseed_reading_and_writing.py to check that an input array of int64 data is the same after being written to mseed.
Update obspy/io/mseed/core.py
Edits to obspy/io/mseed/core.py so that format complies with PEP8 and to make the code more robust.
Update to io/mseed/core.py and tests
Updates io/mseed/core.py to fix some typos and errors. Adds a test to test_mseed_reading_and_writing.py to check that the error message is raised for data that cannot be downcast. Also changes tests to use with blocks and BufferIO().
Update CHANGELOG.txt
Added lines:
 "- obspy.io.mseed:
   * Add add ability to write int64 data to mseed if it can safely be downcast to int32 data, otherwise raises ObsPyMSEEDError. (see #2356)"
Fix mseed output of int64 data #2356
Changes obspy/io/mseed/core.py so that int64 data is accepted if it can be safely downcast to type int32 data, otherwise it creates an error message. A test for this has been added to test_mseed_reading_and_writing.py to check that an input array of int64 data is the same after being written to mseed.
Update obspy/io/mseed/core.py
Edits to obspy/io/mseed/core.py so that format complies with PEP8 and to make the code more robust.
Update to io/mseed/core.py and tests
Updates io/mseed/core.py to fix some typos and errors. Adds a test to test_mseed_reading_and_writing.py to check that the error message is raised for data that cannot be downcast. Also changes tests to use with blocks and BufferIO().

LMurrayBergquist and others added some commits Apr 9, 2019

Updates for writing int64 data to mseed
test_downcast_int64_to_int32() was updated to include negative numbers and some data that could safely be downcast to int32. Notes on what could be updated in future once numpy is updated have been added to obspy/setup.py and obspy/io/mseed/core.py
Update CHANGELOG.txt
Fix typo in master section io.mseed
Update of test_downcast_int64_to_int32
Added a check that smaller negative numbers also pass, and too large negative numbers do fail. Also changed from writing long numbers to powers of 2.
Update core.py and test
Check in io/mseed/core.py changed to check that data is in bounds of int32 before converting, test also changed to include check that original stream dtype was not changed.

@megies megies force-pushed the LMurrayBergquist:MiniSEED-Output-of-int64-data-#2356 branch from d46e9e4 to d05f2bf May 15, 2019

@megies

This comment has been minimized.

Copy link
Member

commented May 15, 2019

@LMurrayBergquist I've properly rebased, fixed code formatting and force-pushed to your branch. There were quite a few rebase conflicts due to the messy merging of master branch, looks OK to me, but please double check if the final code state now is what it should be like.

If you think it's OK, @krischer can merge after a final review I guess.

@megies megies added this to the 1.2.0 milestone May 15, 2019

@megies megies added this to Waiting for final manual validation by Core Dev in Release 1.2.0 May 15, 2019

@LMurrayBergquist

This comment has been minimized.

Copy link
Contributor Author

commented May 15, 2019

@LMurrayBergquist I've properly rebased, fixed code formatting and force-pushed to your branch. There were quite a few rebase conflicts due to the messy merging of master branch, looks OK to me, but please double check if the final code state now is what it should be like.

If you think it's OK, @krischer can merge after a final review I guess.

That looks good to me! And the test still passes, so I think it is good now, thanks for doing all that!

@megies

This comment has been minimized.

Copy link
Member

commented May 17, 2019

@LMurrayBergquist your last merge commit introduced a sloppy local branch merge again, after I cleanly rebased your PR. I'll force-push back to the previous state.

@megies megies force-pushed the LMurrayBergquist:MiniSEED-Output-of-int64-data-#2356 branch from 47ca7a7 to d05f2bf May 17, 2019

@LMurrayBergquist

This comment has been minimized.

Copy link
Contributor Author

commented May 18, 2019

@LMurrayBergquist your last merge commit introduced a sloppy local branch merge again, after I cleanly rebased your PR. I'll force-push back to the previous state.

Oh dear, sorry about that! When I opened the Github desktop app it said there were some conflicts, so I went through and fixed those, then pushed it back. I guess now that you've cleaned it up I should not push anything, I should just leave it, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.