Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganization of ObsPy submodule structure #842

Merged
merged 55 commits into from Mar 24, 2015
Merged

Reorganization of ObsPy submodule structure #842

merged 55 commits into from Mar 24, 2015

Conversation

krischer
Copy link
Member

As mentioned a lot of times (recently in #841 (comment)), we should think about reorganizing the internal submodule structure.

  • as always, this would have to handled via DeprectionWarnings and redirecting imports for one full major release cycle (like we did for obspy.signal.psd when renaming to obspy.signal.spectral_estimation; e.g. if we make the switch for 0.10.0 all imports need to be redirected until at 0.11.0).
  • move everything that has no routines intended to be used directly to ...
    • obspy.plugins?
    • obspy.formats?
    • obspy.io?
  • things that should be moved..
    • obspy.station to obspy.core.station ??
    • obspy.station.stationxml to ... ?
  • ...

more comments...?

current plan:

obspy
├── core/
│   ├── trace.py
│   ├── stream.py
│   ├── utcdatetime.py
│   ├── event.py
│   ├── station/
│   │   ├── inventory.py
│   │   ├── network.py
│   │   └── ... 
│   └── util/
│       ├── geodetics
│       └── ...
├── io/
│   ├── ah
│   ├── cnv 
│   ├── css 
│   ├── datamark
│   ├── gse2
│   ├── json
│   ├── kinemetrics
│   ├── mseed
│   ├── ndk 
│   ├── nlloc
│   ├── pdas
│   ├── pde 
│   ├── quakeml
│   ├── sac 
│   ├── seg2
│   ├── segy
│   ├── seisan
│   ├── sh
│   ├── stationxml
│   ├── wav 
│   ├── y
│   ├── xseed
│   └── zmap
├── clients
│   ├── arclink
│   ├── earthworm
│   ├── fdsn
│   ├── iris
│   ├── neic
│   ├── neries
│   ├── seedlink
│   └── seishub
├── db
├── imaging
├── lib
├── realtime
├── signal
└── taup (rename?)

@ThomasLecocq
Copy link
Contributor

While thinking about this, we should also try to find a way for users to provide "external" plugins, without needing to fork the whole obspy stuff. Imagine I produce a new "acme" plugin, and I want it to be in my own python path, but I want to be able to declare it to obspy without modifying obspy's core. Something like a .obspy/enabled-plugins (ala apache2) in the user folder, where users could add extension points to obspy?

@barsch
Copy link
Member

barsch commented Jul 15, 2014

maybe obspy.io for all file formats - I like short module names

@ThomasLecocq - ObsPy already has a plugin system - there is no need to fork master at all

@ThomasLecocq
Copy link
Contributor

@barsch woops... ok... sorry 'bout that.

+1 for obspy.io

@krischer
Copy link
Member

@ThomasLecocq Here is an example for an external waveform plugins:

The same can be done for many other things where we use the plugin system.

Also +1 for obspy.io

Furthermore I would suggest obspy.clients for all downloading clients and obspy.toolbox for things like the array processing and the noise processing as discussed in #841.

While we are at it we could refactor obspy.core a bit. Right now the Stream and Trace classes are defined in obspy.core, as are all event related classes. The station related classes on the other hand are defined in obspy.station. I think we should either move all to obspy.core, or have obspy.event, obspy.waveform, and obspy.station as three "core" submodules dedicated to handling the different classes. I tend to prefer the later variant.

In any case, I think we should wait until after the release of 0.10 which I think we should release fairly soon. The Python 3 stuff seems stable enough and we will never catch the remaining issues if we don't release.

@ThomasLecocq
Copy link
Contributor

Why not using some QuakeML-style obspy.inventory or obspy.objects ? (for .station, .event)

edit: or obspy.resources ?

@krischer
Copy link
Member

obspy.objects is pretty nice. obspy.inventory would be confusing as we already have the obspy.station.Inventory object.

@megies
Copy link
Member Author

megies commented Jul 16, 2014

I don't know.. I like obspy.core.. I think the name makes a good distinction against anything else that is in the root module structure. I would just move obspy.station to obspy.core (and maybe think about merging all the station classes into one file obspy/core/station.py, like we have for event.py).

@barsch
Copy link
Member

barsch commented Jul 16, 2014

I don't like obspy.objects to generic IMHO

I don't have an issue with introducing obspy.event and obspy.inventory and obspy.waveform but merging those three into obspy.core would be fine too although I would introduce submodules rather creating one huge file per entity

how about obspy.net for network related stuff

@megies
Copy link
Member Author

megies commented Jul 16, 2014

how about obspy.net for network related stuff

too easy to misinterpret as related to "network" as in "seismic network" IMHO

here's one pretty conservative reorganization possibility:

-- obspy/
   |
   |-- core/
       |
       |-- trace.py
       |-- stream.py
       |-- utcdatetime.py
       |-- event.py
       |-- station/
           |
           |-- inventory.py
           |-- network.py
           |-- ...
   |
   |-- plugins/
       |
       |-- gse2
       |-- mseed
       |-- sac
       |-- ...
       |-- quakeml
       |-- stationxml
   |
   |-- clients/
       |
       |-- arclink
       |-- earthworm
       |-- fdsn
       |-- iris
       |-- neic
       |-- neries
       |-- seedlink
       |-- seishub
   |
   |-- db/
   |-- imaging/
   |-- realtime/
   |-- signal/
   |-- taup/
   |-- xseed/

@QuLogic
Copy link
Member

QuLogic commented Jul 16, 2014

I like obspy.io as well; it's like scipy.io with its Matlab, wave, etc. formats or pandas.io.

You could use ext instead of plugins, like Sphinx or SQLAlchemy. BTW, is xseed not similar to mseed?

Oh, and could we please clarify the situation with the functions for calculating GC distance; there are no less than 3 of them (depending on how you count the imports).

@barsch
Copy link
Member

barsch commented Jul 16, 2014

BTW, is xseed not similar to mseed?

mseed == MiniSEED-> data only
xseed == Dataless SEED/ RESP files -> meta data describing data
seed = Dataless SEED + MiniSEED

obspy.ext would be fine for me

@krischer
Copy link
Member

I strongly prefer obspy.io for anything that deals with fileformats. scipy, pandas, astropy, ... all have an io submodule for that kind of stuff. Other than that our plugin entry points are hooking into obspy.plugin and we cannot change that without breaking code.

+1 for cleaning up the geodetic functions!

@megies
Copy link
Member Author

megies commented Jul 17, 2014

Updated tree, more comments, votes for changes?

-- obspy/
   |
   |-- core/
       |
       |-- trace.py
       |-- stream.py
       |-- utcdatetime.py
       |-- event.py
       |-- station/
           |
           |-- inventory.py
           |-- network.py
           |-- ...
   |
   |-- io/
       |
       |-- gse2
       |-- mseed
       |-- sac
       |-- ...
       |-- quakeml
       |-- stationxml
   |
   |-- clients/
       |
       |-- arclink
       |-- earthworm
       |-- fdsn
       |-- iris
       |-- neic
       |-- neries
       |-- seedlink
       |-- seishub
   |
   |-- db/
   |-- imaging/
   |-- realtime/
   |-- signal/
   |-- taup/
   |-- xseed/

BTW, is xseed not similar to mseed?

It's pretty different, the most important functionality in xseed is included in the Parser class that can read Dataless/Full SEED and XMLSEED and then provides access to the station metadata. Ideally all of these formats should be parsed by read_inventory but we couldnt come up with a good approach so far. The parsing SEED is extremely complicated and the only good conversion tool (https://seiscode.iris.washington.edu/projects/stationxml-converter maintained by IRIS) is Java so we found no way to reuse that one internally..

@petrrr
Copy link
Contributor

petrrr commented Jul 17, 2014

I would like to propose that modules mseed and xseed are merged into one module seed (probably: obspy.io.seed) where all SEED related functionally is collected.

This should also help to provide (nearly) full support for SEED. One would need to implement the few missing ASCII blockettes. The idea is to provide a fullseed reader, by dispatching ASCII blockets to the existing parser as implemented in xseed and the binary blockettes to libmseed, provide several objects, but read the whole file only once.

@megies
Copy link
Member Author

megies commented Jul 17, 2014

but read the whole file only once.

mseed is pure waveform I/O, xseed is pure metadata I/O. Right now we have no object really that contains both.

@krischer
Copy link
Member

xseed should go to io.

In the very long run we should probably change the obspy.xseed.Parser class to use the station related classes (this is no fun task; any takers? ;-)) which would take care of the issue mentioned by @megies. Or we just consider SEED a legacy format and leave support as it is right now...good enough in my book but I can also see the other side of that coin.

@petrrr I don't mind merging obspy.mseed and obspy.xseed but I don't think it is required either. They are very different modules and really don't have anything do to with each other except being specified in the same manual.

In regards to a full SEED reader, the following would do the trick:

from obspy import read
from obspy.xseed import Parser

def read_seed(filename):
    try:
        st = read(filename)
    except:
        st = None
    try:
        p = Parser(filename)
    except:
        p = None
    return st, p

The overhead is minimal compared to your approach and thus not really worth the effort. Both reading methods will just jump around the records in the files until it finds some it can parse so the file is not really read twice . The dataless SEED part is usually only a couple records at the beginning of a file.

If one really cares about performance, one should not really use full SEED in any case.

We furthermore currently lack an internal data structure to deal with mixed data types, e.g. waveforms and station metadata but this is something we've been thinking about for a while.

@megies
Copy link
Member Author

megies commented Jul 17, 2014

this is no fun task

And also a monster task. Only if it would somehow work to use the IRIS Java tool internally this would be feasible IMO. But we've seen only possibilities that still rely on the Java VM at runtime which is out of the question..

@megies
Copy link
Member Author

megies commented Jul 17, 2014

I've put the module tree sketch at the top comment, modifications can go there now..

@krischer
Copy link
Member

So should we start this after we release 0.10 or before? I would prefer to do it afterwards, at least we should have merged most open PRs before we do this and maybe do it really quick in a sprint so we avoid merge hell as much as possible

@claudiodsf
Copy link
Member

Another point I would like to rise again here is the dependence of obspy.signal on matplotlib (see #576).
It would be nice, as also requested by @petrrr several times, to have a matplotlib dependence only for obspy.imaging.

@petrrr
Copy link
Contributor

petrrr commented Jul 17, 2014

@krischer: Considering SEED a legacy format or not, might be less relevant from a practical point of view. Even if we might consider it technically outdated, I do not see it disappearing any soon. It is just too established -- maybe even for good reasons as it solved already may problems with different formats which existed before -- while there is a lack of a well-established alternatives which could be picked up right away. (Yes, I know about proposals). This may change over time, but such processes take time and we are not there yet.

I agree, that we may not need to consider some features of SEED which are rarely used. Maybe you are right that traversing file several times is not such a big issue. Still I would consider the current implementation incomplete (the proposed solution is not fully equivalent to full SEED support) and some task are difficult to deal with (both in Obspy as well as in general).

But I accept that as long as it is only me to have this issue, we can live without addressing this.

@krischer
Copy link
Member

What do you mean by rarely used features of SEED?

One of the few things that is currently impossible with ObsPy is writing full SEED files. But that would actually be fairly easy to add in case such a thing is desired. Otherwise we also cannot really extract information from exotic data only blockettes but that again would be fairly simple to add. Regarding dataless blockettes I think we can deal with all of them, otherwise obspy.xseed would raise an error.

@barsch
Copy link
Member

barsch commented Jul 17, 2014

if we completely change the structure anyway we could use the opportunity to rename taup to a more fitting name

also I would wait for the next release and start work on it at a dedicated dev sprint

@barsch barsch added this to the 0.11.0 milestone Jul 18, 2014
@megies
Copy link
Member Author

megies commented Jul 18, 2014

Another point I would like to rise again here is the dependence of obspy.signal on matplotlib (see #576).
It would be nice, as also requested by @petrrr several times, to have a matplotlib dependence only for obspy.imaging.

What for? matplotlib is a hard dependency. On top of that, matplotlib is only imported in signal inside the respective routines that use it.

@claudiodsf
Copy link
Member

@megies It's effectively better now, respect to #576, since matplotlib is now 'lazy' imported.

And apparently, obspy.signal will run also without matplotlib: e.g., https://github.com/obspy/obspy/blob/master/obspy/signal/spectral_estimation.py#L42.

So, for the moment, that's ok for me, unless @petrrr has other comments.

@ThomasLecocq
Copy link
Contributor

I find this part

-- obspy/
   |
   |-- core/
       |
       |-- ...
       |-- station/
           |
           |-- inventory.py
           |-- network.py
           |-- ...
   |

counter intuitive... station-> then inventory & network...

@petrrr petrrr modified the milestones: 0.10.0, 0.11.0 Jul 25, 2014
@krischer krischer removed this from the 0.10.0 milestone Jul 29, 2014
@QuLogic
Copy link
Member

QuLogic commented Mar 23, 2015

I finished up the network tests and docs, though I didn't change the tutorial yet. If we go the deprecation route, those would probably be handy for testing. The docs were changed rather bluntly, so they might require some new top-level pages or the like.

@markcwill
Copy link
Contributor

@megies, @krischer, @QuLogic sorry, brain fart, I think I forgot you guys are merging master -> releases, not develop -> master.

My point was, I don't quite see a major version bump as "just a number". Probably not a big deal for this one, but IMO there should be a clearly stated policy if say, bug fixes will not be ported to a 0.x release once a 1.0.0 happens, and the delineation should be made as clearly and loudly as possible.

@CTrabant
Copy link

My point was, I don't quite see a major version bump as "just a number". Probably not a big deal for this one, but IMO there should be a clearly stated policy if say, bug fixes will not be ported to a 0.x release once a 1.0.0 happens, and the delineation should be made as clearly and loudly as possible.

I second this notion. While there should not be too much meaning packed into version numbers, changing major version numbers at major changes can be very useful. Fast forward a few years, keeping in mind production environments that do not change quickly, and it will be very easy to remember when ObsPy change to Python3 only based solely on major version number. Imagine if what we know as Python 3 was versioned Python 0.12.45.3, it would be basically useless.

@QuLogic
Copy link
Member

QuLogic commented Mar 23, 2015

I think you may have misread my "just a number" comment; I meant we shouldn't be afraid to make the jump to 1.0 if the changes are major enough to warrant it (which I think means we both agree.)

Of course, there are also some connotations with a "1.0" that aren't there with a "2.0", namely stability, etc., but I think (except for this proposed change), ObsPy really is quite stable at the moment.

@mbyt
Copy link
Member

mbyt commented Mar 23, 2015

We divagated from the actual topic into a discussion of version numbers ;o).

Back to the topic: I'd vote for merging this, let's resolve the remainder in master --- and the next release is not in the near future --- so there remains lot's of time to discuss....

@krischer
Copy link
Member

Give me like one more day and then we can merge this. I really want the import deprecations to work and I think I can get it working.

I want that my code that depends on ObsPy runs on the current master and the latest stable. Otherwise it just makes my (and I assume others as well) life harder and honestly we should not merge anything that makes things more cumbersome in the long run for us.

@krischer
Copy link
Member

Turned out easier (and cleaner) than expected :-) Sometimes Python is quite nice to use. Feel free to merge as soon as CI passes.

Essentially the imports of the previous version now still work but raise a deprecation warning.

krischer added a commit that referenced this pull request Mar 24, 2015
Reorganization of ObsPy submodule structure
@krischer krischer merged commit d5213da into master Mar 24, 2015
@megies
Copy link
Member Author

megies commented May 18, 2015

I just noticed that obspy/core/scripts has been moved to obspy/scripts. However all other scripts folders remain as subfolders of the respective submodules, namely:

obspy/imaging/scripts
obspy/db/scripts
obspy/io/mseed/scripts
obspy/io/xseed/scripts

@krischer, what's the rationale here? Isn't that introducing kind of an inconsistency?

@QuLogic QuLogic deleted the reorganization branch March 3, 2016 20:42
@QuLogic QuLogic added the cleanup code refactoring etc. label Mar 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup code refactoring etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants