Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Inventory.merge() method #2346

Open
wants to merge 6 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@megies
Copy link
Member

commented Mar 29, 2019

I think it would be great to have a .merge() method for Inventory.

Since all the Inventory / Network etc objects have more fields in them that might collide we probably need more than one merge strategy to offer, like with merging waveforms. I'm thinking..

  • very defensive method that only merges when all the fields match (e.g. end date of networks etc)
  • very drastic method that merges e.g. all networks with same network code together and resets end time and comments etc
  • maybe one/some in between strategies

@DominikStr I think this is a good project for you, please discuss any unclear points directly in this ticket so that other people can chime in.

@megies megies added this to the 2.0.0 milestone Mar 6, 2019

@megies

This comment has been minimized.

Copy link
Member Author

commented Mar 6, 2019

This is also connected very loosely to #2087

@krischer

This comment has been minimized.

Copy link
Member

commented Mar 6, 2019

I've once written something along this line here: https://github.com/SeismicData/pyasdf/blob/master/pyasdf/inventory_utils.py

I'm not entirely sure how solid this is in the grand scheme of things but please feel free to use it as a starting point!

First prototype for issue #2346
First try on a inventory.merge() method. Not yet thoroughly tested. If this going in the right direction I will make the descriptions more clear, write test cases and look if I can get the code completly consistent.
@DominikStr

This comment has been minimized.

Copy link
Contributor

commented Mar 27, 2019

I committed the first prototype for this method. Maybe you could have a look if this is going in the right direction.

I tried to make it as modularizable as possible. This also has the advantage to easily change and add different strategies on how to deal with conflicting entries.

At the moment it is not tested thoroughly and the description is not yet finished. As soon as it is clear how this method works this will follow.

@megies

This comment has been minimized.

Copy link
Member Author

commented Mar 28, 2019

Same thing here, please avoid brackets in branch names @DominikStr. Maybe even avoid dots. Please push to a new branch with a plain name.

@DominikStr

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

Renamed this one as well. Hopefully, it is working.

@megies

This comment has been minimized.

Copy link
Member Author

commented Mar 29, 2019

Converted this into a PR as well.

@dsentinel

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2019

I've once written something along this line here: https://github.com/SeismicData/pyasdf/blob/master/pyasdf/inventory_utils.py

I'm not entirely sure how solid this is in the grand scheme of things but please feel free to use it as a starting point!

I also wrote something similar for needs I had, as well as cascading things such as epochs from a channel up to station and network.
If interested I'll dig it up.

@megies

This comment has been minimized.

Copy link
Member Author

commented Apr 4, 2019

I've once written something along this line here: https://github.com/SeismicData/pyasdf/blob/master/pyasdf/inventory_utils.py
I'm not entirely sure how solid this is in the grand scheme of things but please feel free to use it as a starting point!

I also wrote something similar for needs I had, as well as cascading things such as epochs from a channel up to station and network.
If interested I'll dig it up.

Have you guys all switched to backroom development now? What happened to adding to obspy directly so others can use it?

@megies

This comment has been minimized.

Copy link
Member Author

commented Apr 4, 2019

@DominikStr, looking at your prototype, some thoughts came up..

  • we should think about whether it might be possible to catch all useful scenarios in just two modes of operation. we could then avoid a key word controlled choice like merge_type="standard" and use a simpler bool switch like strict=False. Maybe in normal mode of operation we merge pretty much everything and set everything to None that's not the same on two merged objects? I think that is the most common use case. If user switches to strict=True we only merge objects that agree in all the metadata (which likely will rarely happen out in the wild, probably. i think far less people might need that option. Can we come up with just two options that make sense throughout the full hierarchy (I guess we can, we just have to think it through properly because extending it to other options would be messy then in the future)
  • maybe we could somehow combine all the inventory_merge=None etc keywords into a single one? something like what level='station' does for FDSN client get_stations(). right now the call syntax is pretty longish and cluttered, i'd prefer to have as few kwargs as possible while only sacrificing very odd edge-case use cases.
  • I think we don't need the first/second switch. We can always let self get the preference and users can then simply switch inv.merge(inv2) with inv2.merge(inv) if they need it the other way round. Although, I think that this is gonna be useful not as much for putting together de-facto data center metadata but rather in production codes that just have the merged inventory for processing uses and care less about the more "informative/verbose only" fields. Also, the first/second switch is only useful ifto-be-merged objects come from self/other but if both come from self or both come from other it might not make sense anyway, or am I missing something?

@DominikStr what do you think on these things? Maybe we can also get some other opinions (@krischer? @d-chambers?)

@d-chambers

This comment has been minimized.

Copy link
Member

commented Apr 4, 2019

Maybe we can also get some other opinions

I have never come across a need for this in my day to day and consequently haven't given it much thought. It would be good to checkout @dsentinel 's implementation, I know he is an inventory guru.Unfortunately, I am falling behind on a few projects due in early May so I haven't been able to commit too much time to ObsPy lately, but things should start to clear up in a month or so.

@DominikStr

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2019

@dsentinel it would be great if you could share it, as I am not using obspy in practise. I would be glad to see what features are important in a method like this.

I think it is definitely a good idea to try and reduce everything to strict or not strict. I'm at the moment not sure how to do it exactly but I will find a way. If we only have a switch we also would get rid of all the kwargs and would have a cleaner function call.

And you are right the distinction between first and second doesn't make much sense. I'm not sure if we should keep these options at all if we want to condense the function to strict or not strict.

"equipment", "response", "description",
"comments", "start_date", "end_date",
"restricted_status", "alternate_code",
"historical_code", "data_availability"]

This comment has been minimized.

Copy link
@krischer

krischer Apr 11, 2019

Member

This should not be necessary as the individual objects implement __eq__ methods so they can be compared for equality that way.

This comment has been minimized.

Copy link
@DominikStr

DominikStr Apr 27, 2019

Contributor

As I understand it __eq__ compares all properties. For channels, it would not be necessary indeed. For stations and networks, I want to exclude for example the selected number, so I think the lists are necessary ?!?

This comment has been minimized.

Copy link
@krischer

krischer Apr 29, 2019

Member

Yea that is a good point. I did not see that.

@krischer

This comment has been minimized.

Copy link
Member

commented Apr 11, 2019

I also wrote something similar for needs I had, as well as cascading things such as epochs from a channel up to station and network.
If interested I'll dig it up.

Definitely!

Have you guys all switched to backroom development now? What happened to adding to obspy directly so others can use it?

Haha ;-) Sometimes there is just no time to do the extra effort to get things into ObsPy because they might not be general enough.

@krischer

This comment has been minimized.

Copy link
Member

commented Apr 11, 2019

@DominikStr what do you think on these things? Maybe we can also get some other opinions (@krischer? @d-chambers?)

I think the default merge method should operate in a sense that it always just updates all date times at the station and network levels, e.g. take smallest start and largest end from all the original objects. Additionally if one object has a key (e.g. equipment) and the other not it should just use the one. If non-date key's differ I think we should raise a nice error message and let the user's deal with it as I'm not sure we can find a general way to deal with this. Also if channels from overlapping epoch times differ we raise - no idea what else to do.

I not sure we actually need a strict kwarg as its IMHO pretty clear what the "right" way to do it is even though its tedious and tricky to implement. But I'm happy to hear of any counterexamples :-)

DominikStr added some commits May 3, 2019

Now try on the merge method. It is now way simpler and takes no user …
…input. Not tested thoroughly yet and description is not final as well.
Now try on the merge method. It is now way simpler and takes no user …
…input. Not tested thoroughly yet and description is not final as well.
Merge remote-tracking branch 'origin/add_2346_Inventory_merge_method'…
… into add_2346_Inventory_merge_method

# Conflicts:
#	obspy/core/inventory/inventory.py
@DominikStr

This comment has been minimized.

Copy link
Contributor

commented May 3, 2019

I tried to make the merging now as easy as possible while still preserving the most important features. The decision flow I tried to follow is below.

Overlapping and equal = Merge all to one object
Overlapping and not equal = Raise Error
Not Overlapping = Keep always

I also implemented the filling of None if possible.

Merge Scheme

Maybe you could comment if you think this is sufficient as I have no idea of which inventories are normally merged.

@dsentinel

This comment has been minimized.

Copy link
Contributor

commented May 22, 2019

Here's a gist of what I've done. It's a bit different as I'm merging into an Inventory from a Stream or records, but there's some useful examples of how to match a node, get it's parent, extend an epoch, etc. I think some kind of API like this is a good starting point. I think some of these like get_parent should be properties, as well as len() which can return odd results.

Once there's an API for matching, navigating, and modifying and Inventory, putting merge logic on top will be cleaner, and easier to approach.
Some of this is to work around oddities in the underlying Inventory like how select works. Perhaps its a good time to address those as they are found.

Don't judge my hacky code :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.