Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reftek mass position channels #2678

Merged
merged 13 commits into from Jul 16, 2020
Merged

Reftek mass position channels #2678

merged 13 commits into from Jul 16, 2020

Conversation

MaevaP
Copy link
Contributor

@MaevaP MaevaP commented Jul 8, 2020

What does this PR do?

Reftek mass position channel data are '16' encoded and have a sampling rate of 0.1 Hz.
This PR modifies section of the obspy.io.reftek.core routines that deal with upacking data with '16' and '32' encodings to account for number of samples in each data packet.
This PR also changes the type of EH_PAYLOAD['sampling_rate'] from int to float in the obspy.io.reftek.packet module.

Why was it initiated? Any relevant Issues?

This PR was initiated to provide read support for Reftek 130 mass position channel data.

PR Checklist

  • Correct base branch selected? master for new features, maintenance_... for bug fixes
  • This PR is not directly related to an existing issue (which has no PR yet).
  • If the PR is making changes to documentation, docs pages can be built automatically.
    Just remove the space in the following string after the + sign: "+ DOCS"
  • If any network modules should be tested for the PR, add them as a comma separated list
    (e.g. clients.fdsn,clients.arclink) after the colon in the following magic string: "+TESTS:"
    (you can also add "ALL" to just simply run all tests across all modules)
  • All tests still pass.
  • Any new features or fixed regressions are be covered via new tests.
  • Any new or changed features have are fully documented.
  • Significant changes have been added to CHANGELOG.txt .
  • First time contributors have added your name to CONTRIBUTORS.txt .

obspy/io/reftek/packet.py Outdated Show resolved Hide resolved
obspy/io/reftek/core.py Outdated Show resolved Hide resolved
@megies
Copy link
Member

megies commented Jul 10, 2020

So, if I understand correctly this actually does two very different things

  • store samples per second as a float. the official manual states that field is ASCII integer but I guess since reftek itself is writing those floating point ASCII values, we'll have to adapt. I'm always feeling slightly uneasy storing integers as float, but as far as I understand, there should be no floating point accuracy issues for any integer in the range +/-2**24 when storing as a 32 bit float, so I guess we're good
  • allow for packets that are not filling the fixed amount of data space in the packet. I never thought of that case, since the reftek plugin it started out with STEIM encodings unpacked by libmseed which takes care of any empty space at the end of packets, I guess

Otherwise this looks good to me, especially since existing tests pass and you have a test for the new case.

todos:

  • one liner in changelog
  • add yourself to contributors (unless you dont want to)
  • would be good if we could make the string representation have a guaranteed fixed width again (maybe just use str(...)[:4], since the original value's string representation also used to fit in 4 ASCII chars?)
  • that other inline comment about setting the itemsize in bytes

The part about taking into account the number of samples in each packet / empty space at end could be considered a bug fix, I guess. So we could think about putting it into maintenance_1.2.x. On the other hand not sure if we will have a 1.2.3 release even, so , I guess it can go into master as proposed and if need be it can be backported.

@megies megies self-assigned this Jul 10, 2020
@megies megies added this to the 2.0.0 milestone Jul 10, 2020
@megies megies added the .io issues generally related to our read/write plugins label Jul 10, 2020
@MaevaP
Copy link
Contributor Author

MaevaP commented Jul 10, 2020

Hi Tobias,
You are correct the updates do two different things.
Regarding the sample rate, the manual is unfortunately a bit misleading (or out-of-date) because it implies that the sample rate is always an integer which is not the case for the auxiliary data streams. Most PASSCAL experiments end up configuring the sample rate of the auxiliary channels to 0.1 Hz.
I just addressed all the items in your to-do list. It looks like, however, that I can't update the CHANGELOG.txt file without creating a conflict and making the checks fail.
Thank you for all the feedback and positively considering my PR.

@dsentinel
Copy link

dsentinel commented Jul 10, 2020

Thanks @MaevaP !
This looks good to me as well, with Tobias's suggestions.

I just addressed all the items in your to-do list. It looks like, however, that I can't update the CHANGELOG.txt file without creating a conflict and making the checks fail.

Rebase from the current master and push.... I could have sworn there used to be a button for this
OR
hit the resolve conflicts button above

@megies ;

there should be no floating point accuracy issues for any integer in the range +/-2**24 when storing as a 32 bit float, so I guess we're good

Ugh, yes, floats are such a pain. Sometimes decimal comes to the rescue.
I think this shouldn't be an issue either given the parameters from rt's such as as SR's are not values like 20.1.

Not a clean way to solve the printing problem as str(SR)[:4] -> 500. with a dangling dot, but only a little hard on the eyes.

@MaevaP
Copy link
Contributor Author

MaevaP commented Jul 10, 2020

Thank you for the feedback @dsentinel.
The conflict has been resolved.

@megies
Copy link
Member

megies commented Jul 11, 2020

Not a clean way to solve the printing problem as str(SR)[:4] -> 500. with a dangling dot, but only a little hard on the eyes.

if we wanna be super clean, we could check int(samprate) == samprate then %d (like before) otherwise the construct we have right now. but i dont care, you can leave as is as far as i'm concerned, its just a print representation that will only very rarely be used anyway.

obspy/CONTRIBUTORS.txt Outdated Show resolved Hide resolved
@megies
Copy link
Member

megies commented Jul 11, 2020

Just the ordering in contributors, and if you're comfortable with that please git rebase -i and just go through with it without edits to get rid of that merge commit. I can do it if its unclear

@MaevaP
Copy link
Contributor Author

MaevaP commented Jul 13, 2020

Hi Tobias,
I changed the ordering in the CONTRIBUTORS file and pushed the updates after running git pull --rebase origin reftek_soh. I am not sure whether this is what you were asking for. I am not very familiar with git rebase.
Thank you for the feedback and help.

obspy/io/reftek/core.py Outdated Show resolved Hide resolved
CHANGELOG.txt Outdated Show resolved Hide resolved
obspy/io/reftek/core.py Outdated Show resolved Hide resolved
obspy/io/reftek/core.py Outdated Show resolved Hide resolved
@megies
Copy link
Member

megies commented Jul 14, 2020

I am not very familiar with git rebase.
Thank you for the feedback and help.

For git rebase you want to specify the base branch, so e.g. git rebase origin/master (unless you git fetch origin before it might not do anything though). It basically is an alternative to git merge origin/master. Rebasing will disrupt history, so other people working on the same branch will have to rewrite their history too, if they want to keep working on the same PR. But rebasing gives a clean history, that could even be merged into the base branch with just a fast-forward. Merging will keep current history intact, but on the other hand it introduces ugly merge commits and make history have lots of branching and merging. In obspy we clean up PRs by doing rebases, which is usually fine since mostly one person is usually in charge of a PR.

@megies
Copy link
Member

megies commented Jul 15, 2020

I took over for the (hopefully) last iteration of changes, since I don't want this to stay dangling for too long, as it's hard to come back to stuff like this after a while.

If CI goes green I'd appreciate a final review @dsentinel

@MaevaP
Copy link
Contributor Author

MaevaP commented Jul 15, 2020

Awesome! Thanks for taking over and addressing that quickly. All these changes make sense and are an improvement.
Thanks again to both for the help and feedback.

Copy link

@dsentinel dsentinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Merge as is, or minor changes.

@@ -314,15 +310,38 @@ def to_stream(self, network="", location="", component_codes=None,
sample_data = _unpack_C0_C2_data(packets_,
encoding)
elif encoding in ('16', '32'):
dtype = {'16': np.int16, '32': np.int32}[encoding]
# rt130 stores in big endian
dtype = {'16': '>i2', '32': '>i4'}[encoding]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

sample_data = sample_data.flatten().view(dtype)
# switch endianness, rt130 stores in big endian
sample_data = sample_data.byteswap()
sample_data = sample_data.view(dtype)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit strange to take a view of a variable that is being overwritten by itself. I think astype would be cleaner, but there's a chance of a copy, the conditions which are vague from the docs so this is fine with me.

obspy/io/reftek/core.py Outdated Show resolved Hide resolved
Co-authored-by: dsentinel <lloyd@passcal.nmt.edu>
@megies megies merged commit 18013bf into obspy:master Jul 16, 2020
@megies
Copy link
Member

megies commented Jul 16, 2020

thanks for the PR @MaevaP good stuff! 🎉

@megies
Copy link
Member

megies commented Jul 16, 2020

@MaevaP if you can be asked, it would be nice to test if it also works if the half-full packet is in the middle of the contiguous packet list. You happen to have real data with a half-full packet that is contiguously followed by some other packets? I noticed in the test, that the half-full packet is at the end, so the test case is lacking a bit

@MaevaP
Copy link
Contributor Author

MaevaP commented Jul 16, 2020

@megies That's a good point.
With all the test data sets I have worked with so far, I have never seen an incomplete packet in the middle of the contiguous packet list. But that being said, I have only recently started working with rt130 data.
I can manipulate the packets in the test data set I provided to create a data set that would fit the bill if you think that it would still be useful for testing purposes.

@megies
Copy link
Member

megies commented Jul 17, 2020

if you think that it would still be useful for testing purposes.

I think it'd be worth it, yes. I'm not using these numpy function on a daily basis, and we are cutting out some parts of a contiguous array and just return a view, so I'm wondering if that cuts it, especially if we run something that is working on the C pointers from numpy afterwards.. or if we need to make a copy after the cutting out middle parts to be safe.. I have things in mind like #192, #193 or #1704

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
.io issues generally related to our read/write plugins
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants