Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Stream.stack method #2440

Merged
merged 17 commits into from Oct 10, 2019

Conversation

@trichter
Copy link
Member

trichter commented Aug 22, 2019

What does this PR do?

I added a stack method to the Stream object. Any discussion about the implementation is welcome before adding some tests.

Why was it initiated? Any relevant Issues?

See #1741, and maybe #841, #1002

PR Checklist

  • Correct base branch selected? master for new features, maintenance_... for bug fixes
  • This PR is not directly related to an existing issue (which has no PR yet).
  • If the PR is making changes to documentation, docs pages can be built automatically.
    Just remove the space in the following string after the + sign: + DOCS
  • If any network modules should be tested for the PR, add them as a comma separated list
    (e.g. clients.fdsn,clients.arclink) after the colon in the following magic string: "+TESTS:"
    (you can also add "ALL" to just simply run all tests across all modules)
  • All tests still pass.
  • Any new features or fixed regressions are be covered via new tests.
  • Any new or changed features have are fully documented.
  • Significant changes have been added to CHANGELOG.txt .
  • First time contributors have added your name to CONTRIBUTORS.txt .
@trichter trichter added the .core label Aug 22, 2019
@trichter trichter added this to the 1.2.0 milestone Aug 22, 2019
@trichter trichter force-pushed the stack branch from 5c49eec to d03c2f9 Aug 22, 2019
seedid and `'all'` (default) which stacks together
all traces in the stream.
:param type: Type of stack, one of the following:
`'normal'`: normal stack (default),

This comment has been minimized.

Copy link
@QuLogic

QuLogic Aug 23, 2019

Member

The 'normal' stack is normal, is a bit of a tautology.

This comment has been minimized.

Copy link
@ThomasLecocq

ThomasLecocq Aug 23, 2019

Contributor

'normal' should be 'linear' or 'average', maybe?

@trichter

This comment has been minimized.

Copy link
Member Author

trichter commented Aug 23, 2019

Thanks for the comments, +DOCS

stack = np.mean(data, axis=0) * phase_stack
elif type == 'root':
r = np.mean(np.sign(data) * np.abs(data)
** (1 / order), axis=0)

This comment has been minimized.

Copy link
@ThomasLecocq

ThomasLecocq Aug 23, 2019

Contributor

is 1/order safe on all python versions?

This comment has been minimized.

Copy link
@trichter

trichter Aug 23, 2019

Author Member

I think its not. I am used to py3, now.

This comment has been minimized.

Copy link
@ThomasLecocq

ThomasLecocq Aug 23, 2019

Contributor

me too and 1.2.0 will be "py3 only", but in case of...

This comment has been minimized.

Copy link
@trichter

trichter Aug 23, 2019

Author Member

Probably ;)

This comment has been minimized.

Copy link
@trichter

trichter Aug 29, 2019

Author Member

Actually, there is this: from __future__ import division

@trichter

This comment has been minimized.

Copy link
Member Author

trichter commented Aug 28, 2019

From my point of view, this PR is ready. Here is a link to the relevant doc: [l]

:param group: Stack waveforms together which have the same metadata
given by this parameter. The parameter should name the
corresponding keys of the stats object,
e.g. `'{network}.{station}'` for stacking all

This comment has been minimized.

Copy link
@QuLogic

QuLogic Aug 31, 2019

Member

Single backticks are a cross-reference in reST; this (and all others below) should be doubled for code style.

:returns: New stream object with stacked traces. The metadata of each
trace corresponds to the metadata of the original traces
if those are the same. Additionaly, the entries

This comment has been minimized.

Copy link
@QuLogic

QuLogic Aug 31, 2019

Member
Suggested change
if those are the same. Additionaly, the entries
if those are the same. Additionally, the entries
if group == 'seedid':
group = '{network}.{station}.{location}.{channel}'
if len(type) == 2:
type, order = type

This comment has been minimized.

Copy link
@QuLogic

QuLogic Aug 31, 2019

Member

This needs a bit more validation; if you pass type = 'root', it will raise an undefined variable error on order later.

You could also pass type = 'pw' which gets split into p and w.

for tr in self:
groups[group.format(**tr.stats)].append(tr)
stacks = []
for groupid, grouptrcs in groups.items():

This comment has been minimized.

Copy link
@QuLogic

QuLogic Aug 31, 2019

Member

Don't think you need to shorten grouptrcs; maybe just traces will do?

header['stack'] = groupid
header['stack_count'] = len(grouptrcs)
npts_all = [len(tr) for tr in grouptrcs]
npts_dif = max(npts_all) - min(npts_all)

This comment has been minimized.

Copy link
@QuLogic

QuLogic Aug 31, 2019

Member
Suggested change
npts_dif = max(npts_all) - min(npts_all)
npts_dif = np.ptp(npts_all)

This comment has been minimized.

Copy link
@d-chambers

d-chambers Aug 31, 2019

Member

nice trick, I didn't know there was a ptp in numpy!

grouptrcs = [copy.copy(tr) for tr in grouptrcs]
for tr in grouptrcs:
tr.data = tr.data[:npts]
data = np.array([tr.data for tr in grouptrcs])

This comment has been minimized.

Copy link
@QuLogic

QuLogic Aug 31, 2019

Member

You could

Suggested change
data = np.array([tr.data for tr in grouptrcs])
data = np.array([tr.data[:npts] for tr in grouptrcs])

and not bother with copying above.

Copy link
Member

d-chambers left a comment

This looks great, I have done stacking "by hand" before and this will make it much easier.

One major question I have is how this handles temporal alignment. It seems the Stack method does not consider the starttime / endtime of each trace. This is fine, as often one may wish to stack data from different times, but I think this point should be mentioned in the doc string. If the user wants to stack data only from overlapping times it is up to the user to align the traces.

Also, as @QuLogic mentioned, a few more validations on inputs/helpful error messages would be useful.

locations and channels of the stations and returning a stream
consisting of one stacked trace for each station.
This parameter can take two special values,
`'seedid'` which stacks the waveforms by seedid and

This comment has been minimized.

Copy link
@d-chambers

d-chambers Aug 31, 2019

Member

I would consider chaning this to 'id' to be consistently named with the Trace.id property. Also, describe it as "seed id" and not "seedid".

"""
Return stream with traces stacked by the same selected metadata.
:param group: Stack waveforms together which have the same metadata

This comment has been minimized.

Copy link
@d-chambers

d-chambers Aug 31, 2019

Member

I recommend also adding :type group: str so it is clear this must be a string. Same for the other parameters.

if those are the same. Additionaly, the entries
`stack` (result of the format operation on the group parameter) and
`stack_count` (number of stacked traces)
are written to the stats object.

This comment has been minimized.

Copy link
@d-chambers

d-chambers Aug 31, 2019

Member

maybe "stats object(s)" as the resulting stream could return more than one trace.

@@ -2656,6 +2656,59 @@ def test_write_empty_stream(self):
self.assertEqual(e.exception.args[0],
'Can not write empty stream to file.')

def test_stack(self):

This comment has been minimized.

Copy link
@d-chambers

d-chambers Aug 31, 2019

Member

There is a lot going on here, do you mind breaking this up into a few different tests? Or maybe just add a few more comments about what each subtest is doing if you prefer.

st += st.copy()
st2 = st.stack('seedid')
self.assertEqual(len(st2), 3)
self.assertEqual({tr.stats.stack for tr in st2}, {tr.id for tr in st})

This comment has been minimized.

Copy link
@d-chambers

d-chambers Aug 31, 2019

Member

Should this preserve order? If so use a list or tuple rather than set. Same for other set asserts below.

This comment has been minimized.

Copy link
@trichter

trichter Sep 1, 2019

Author Member

No, the order is not important. It will only be preserved on python>3.4. Also, the number of traces of the stack and the original stream is different, so list or tuple would not be sufficient. The keys of an OrderedDict would be.

:param npts_tol: Tolerate traces with different number of points
with a difference up to this value. Surplus samples are discarded.
:returns: New stream object with stacked traces. The metadata of each

This comment has been minimized.

Copy link
@d-chambers

d-chambers Aug 31, 2019

Member

I think returning a new Stream instance is the best approach, but would it make sense to perform this operation in place in order to be consistent with most other Stream methods?

This comment has been minimized.

Copy link
@trichter

trichter Sep 1, 2019

Author Member

I also think returning a new Stream is the best approach here, because the number of traces changes and a lot of metadata changes, which is not the case for the other, in-place methods.

This comment has been minimized.

Copy link
@krischer

krischer Oct 9, 2019

Member

I see two solutions here:

(1) Actually change the existing stream object in-place. This is what a few other method, e.g. .merge() already do and it would fit right in with most other stream methods.
(2) Rename the method to .create_stack() which would return a new object. Then its IHMO a bit clearer that this does not modify in-place.

I slightly tend towards the first choice but I'm fine with either.

This comment has been minimized.

Copy link
@trichter

trichter Oct 9, 2019

Author Member

I prefer (2) with the name of (1) ;).
But I am also OK with (1) and will change the behavior.

group = '{network}.{station}.{location}.{channel}'
if len(type) == 2:
type, order = type
groups = collections.defaultdict(list)

This comment has been minimized.

Copy link
@d-chambers

d-chambers Aug 31, 2019

Member

Nice use of collections!

@trichter

This comment has been minimized.

Copy link
Member Author

trichter commented Sep 1, 2019

@QuLogic @d-chambers thanks for the stimulant comments, I will take care of them in a week or so.

@trichter

This comment has been minimized.

Copy link
Member Author

trichter commented Sep 1, 2019

One major question I have is how this handles temporal alignment. It seems the Stack method does not consider the starttime / endtime of each trace. This is fine, as often one may wish to stack data from different times, but I think this point should be mentioned in the doc string. If the user wants to stack data only from overlapping times it is up to the user to align the traces.

This is done by these lines

            """
            :returns: New stream object with stacked traces. The metadata of each
                 trace corresponds to the metadata of the original traces
                 if those are the same.
            """
            ...
            header = {k: v for k, v in grouptrcs[0].stats.items()
                      if all(tr.stats.get(k) == v for tr in grouptrcs)}

This means the starttime is preserved if it is the same for all traces in one group. In all other cases the user has to take care of it. I think I will add a test for this and maybe make it more clear in the doc.

@trichter trichter force-pushed the stack branch from 1a9b118 to b65a56c Sep 12, 2019
@trichter

This comment has been minimized.

Copy link
Member Author

trichter commented Sep 12, 2019

I implemented the review and rebased on current master.

@trichter trichter added this to Waiting for final manual validation by Core Dev in Release 1.2.0 Sep 16, 2019
"""
Return stream with traces stacked by the same selected metadata.
:param str group: Stack waveforms together which have the same metadata

This comment has been minimized.

Copy link
@krischer

krischer Oct 9, 2019

Member
Suggested change
:param str group: Stack waveforms together which have the same metadata
:param group: Stack waveforms together which have the same metadata

This comment has been minimized.

Copy link
@trichter

trichter Oct 9, 2019

Author Member

Just added this, because @d-chambers asked to document the types. For me, it does not matter.

This comment has been minimized.

Copy link
@krischer

krischer Oct 9, 2019

Member

Then you have to use :type group: str as you already did for one of the other parameters. I'm not sure if sphinx can use the syntax you originally used but we probably stick to a consistent docstyle syntax.

This comment has been minimized.

Copy link
@trichter

trichter Oct 9, 2019

Author Member

@krischer can you clarify a bit. IMHO it is not necessary to document the basic types, but it also does not hurt. Do you also want me to remove the int from int npts_tol?

This comment has been minimized.

Copy link
@krischer

krischer Oct 9, 2019

Member

For the group parameter I would document the types as it is not obvious and it can take a string or a tuple. In general I also think its nice to be consistent within a single docstring and if a single type is documented, all should be documented.

But its all a bit moot as we'll hopefully drop Python 2 support very soon and can then just use type hints for documentation so we'll have some work to do in any case.

This comment has been minimized.

Copy link
@trichter

trichter Oct 9, 2019

Author Member

Yeah sphinx is OK with this syntax, I use it all the time, because the other form is a bit verbose, but I can of course change it.

:param npts_tol: Tolerate traces with different number of points
with a difference up to this value. Surplus samples are discarded.
:returns: New stream object with stacked traces. The metadata of each

This comment has been minimized.

Copy link
@krischer

krischer Oct 9, 2019

Member

I see two solutions here:

(1) Actually change the existing stream object in-place. This is what a few other method, e.g. .merge() already do and it would fit right in with most other stream methods.
(2) Rename the method to .create_stack() which would return a new object. Then its IHMO a bit clearer that this does not modify in-place.

I slightly tend towards the first choice but I'm fine with either.

@@ -3118,6 +3119,80 @@ def remove_sensitivity(self, *args, **kwargs):
tr.remove_sensitivity(*args, **kwargs)
return self

def stack(self, group='all', type='linear', npts_tol=0):

This comment has been minimized.

Copy link
@krischer

krischer Oct 9, 2019

Member

Could you rename type to stack_type to not clash with the reserved type word?

This comment has been minimized.

Copy link
@krischer

krischer Oct 9, 2019

Member

Also maybe rename group to group_by which is what most other packages like pandas would probably call an operation like this.

header = {k: v for k, v in traces[0].stats.items()
if all(tr.stats.get(k) == v for tr in traces)}
header['stack'] = groupid
header['stack_count'] = len(traces)

This comment has been minimized.

Copy link
@krischer

krischer Oct 9, 2019

Member

We don't really have a good concept for this type of meta-data for the stack so I also don't really know what's the best way here. But maybe move these two to a new stack group in the stats dictionary? Really just a suggestion though.

This comment has been minimized.

Copy link
@trichter

trichter Oct 9, 2019

Author Member

So, you would prefer stats.stack.id/group and stats.stack.count? I don't know, I think I would prefer slightly the flat variant, but I am also OK with this one.

This comment has been minimized.

Copy link
@trichter

trichter Oct 9, 2019

Author Member

Hmm, one could also add 'stats.stack.type'.

This comment has been minimized.

Copy link
@krischer

krischer Oct 9, 2019

Member

Yea that might work. I slightly prefer to group all the stack meta parameters which I think would look a bit nicer.

obspy/core/stream.py Show resolved Hide resolved
@trichter trichter force-pushed the stack branch from 0e44190 to 1c7d2c1 Oct 9, 2019
@trichter

This comment has been minimized.

Copy link
Member Author

trichter commented Oct 9, 2019

I implemented @krischer's review and rebased on current master.

@trichter

This comment has been minimized.

Copy link
Member Author

trichter commented Oct 10, 2019

This is again ready for the next step, be it another review or merging.

@krischer

This comment has been minimized.

Copy link
Member

krischer commented Oct 10, 2019

I think this looks good now. Thanks a lot!

@krischer krischer merged commit 79a8385 into master Oct 10, 2019
1 of 4 checks passed
1 of 4 checks passed
continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
continuous-integration/appveyor/pr AppVeyor build failed
Details
docs-buildbot Build succeeded, but there are warnings/errors:
Details
ci/circleci Your tests passed on CircleCI!
Details
@krischer krischer deleted the stack branch Oct 10, 2019
@krischer krischer moved this from Waiting for final manual validation by Core Dev to Done in Release 1.2.0 Oct 10, 2019
@trichter

This comment has been minimized.

Copy link
Member Author

trichter commented Oct 11, 2019

It was a pleasure, I think I am going to use this method a lot.

@trichter trichter referenced this pull request Oct 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
5 participants
You can’t perform that action at this time.