Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for the IRIS Federated Catalog Service #1779

Closed
wants to merge 87 commits into from

Conversation

CelsoReyes
Copy link
Contributor

@CelsoReyes CelsoReyes commented May 5, 2017

IRIS has a Federated Catalog Webservice that I have been tasked with integrating into ObsPy.

From the webpage:

The irisws-fedcatalog web service returns selected time series channels from across multiple FDSN data centers. The combined catalog of channels is updated daily from each contributing data center.

The use of this module is similar to the existing fdsn Client.

from obspy.clients.fdsn import FederatedClient
client = FederatedClient()
data = client.get_waveforms( ... ) #  just like normal
data = client.get_waveforms_bulk( ... ) #  just like normal
inv = client.get_stations( ... ) #  just like normal
inv = client.get_stations_bulk( ... ) #  just like normal

These would behave much like the normal versions, except that they would retrieve data from any number of data centers (providers), depending on the response of the fedcatalog service

one can also get the routing information without downloading the data

routes = client.get_routing( ... )
routes = client.get_routing_bulk( ... )

I have attached a diagram that shows how the parts interact:
FederatedClient.pdf

@CelsoReyes
Copy link
Contributor Author

apveyor seems to befailing on test_future_imports_in_every_file, even though the imports exist...

@megies megies changed the title Irisfederator Add support for the IRIS Federated Catalog Service May 8, 2017
@megies megies added .clients.fdsn enhancement feature request labels May 8, 2017
@megies
Copy link
Member

megies commented May 8, 2017

@megies
Copy link
Member

megies commented May 8, 2017

apveyor seems to be failing on test_future_imports_in_every_file, even though the imports exist...

Those imports need to be at top of files and grouped together otherwise our regex checks dont pick it up properly. I've added some commits that should fix the future imports, @CelsoReyes please remember to pull in changes to your local branch.

@megies megies requested a review from krischer May 8, 2017 09:40
@megies
Copy link
Member

megies commented May 8, 2017

Looks like this is might be ready-ish already, so I'll add it to 1.1.0 for now..

@megies megies added this to the 1.1.0 milestone May 8, 2017
@krischer
Copy link
Member

krischer commented May 8, 2017

Something just went wrong here ;-) Did you merge master into the branch? You'll probably want to rebase fcf5736eaef9c50fb0de1a077c17515f886f9cbb on top of master. I can do it you want but it would result in a force push to your fork so I'd better ask beforehand.

@CelsoReyes
Copy link
Contributor Author

Hi Tobias, Lion,
Thanks for the review. This is where I guess I'm getting tripped up now. I had attempted to update my local branches to get them synced up before making any changes. But I guess now my version and this are intimately tied together? This is a part of Git I've not had much experience in.

I haven't made any changes other than trying to get things synced, so feel free to do whatever you need to to untangle me (sorry!). Then, once we're all in a good place, I will make any still-needed changes to my branch

@megies megies force-pushed the irisfederator branch 2 times, most recently from c543fb2 to 7257edf Compare May 8, 2017 12:44
@megies
Copy link
Member

megies commented May 8, 2017

Rebased on current master and force-pushed..

btw @CelsoReyes if you add your ETH email address to your list of mail addresses in github those commits should be properly connected to your user account.

@CelsoReyes
Copy link
Contributor Author

Hi, Just checking on the status. It looks like it's tripping over some core stuff? I'm not making too much sense out of a couple of these reports. Moving forward, issues regarding this feature will be handled by Chad Trabant and the guys at IRIS. Of course, I'm still here and at your service for questions

@krischer
Copy link
Member

I'm still working on one other thing - once that is done I'll have a look at this PR. Sorry for taking so long but this is a big PR and I need to find a big chunk of time to do it.

@megies
Copy link
Member

megies commented May 24, 2017

The files that were removed in this commit (0f39d1a) need to be removed from history for good, just removing them with another commit doesn't cut it, this will need some history cleanup/squashing.

@megies
Copy link
Member

megies commented Sep 3, 2017

Why is SSL verification disabled in 'requests' calls?

From what I'm reading here this would mean additional dependencies? Dunno..

@markcwill
Copy link
Contributor

@megies, you are right for urllib3, but requests is a wrapper and does this for you, certifi is a hard dep for requests, so all the deps should be met? The only reason I can think of for doing this is that the FDSN certs aren't valid, or are self-signed, the solution for this would be to change the endpoint to not be https if it isn't actually supported, or offer their self-signed cert publicly so users can add it to their trusted bundle somehow. This looks like a security issue to me -- ignoring invalid certs deep in the code and swallowing all the user warnings?

I might be missing something. Curious as to the reason @CelsoReyes or @nick-iris or @adam-iris have for doing this.

@megies
Copy link
Member

megies commented Sep 4, 2017

This looks like a security issue to me -- ignoring invalid certs deep in the code and swallowing all the user warnings?

I agree, going to https and ignoring warnings is not really an option. If https doesn't work properly we should either figure out why or fall back to http.

@nick-falco
Copy link
Member

nick-falco commented Sep 7, 2017

I tend to disagree. The user is making a request to the obspy client and specifies start and end time, so to the user it doesnt matter where the data is cut to these times, but it should be in any case.

The normal (non-federated) obspy.clients.fdsn.Client does not trim samples outside of the requested start and end times. I think this is also an issue with the normal Client and not the FederatedClient exclusively.

For example, the following dataselect request for 10 seconds of data returned a result containing data before and after the requested start and end times.

GET Version:
http://eida.ipgp.fr/fdsnws/dataselect/1/query?net=G&sta=PEL&cha=LHZ&starttime=2016-11-01T00:00:00&endtime=2016-11-01T00:00:10

   Source                Start sample             End sample        Gap  Hz  Samples
G_PEL_00_LHZ      2016,305,23:43:16.000000 2016,306,00:00:27.000000  ==  1   1032
Total: 1 trace(s) with 1 segment(s)

ObsPy Client Version:
i.e. running

from obspy import UTCDateTime
from obspy.clients import fdsn
c = fdsn.client.Client(
                        service_mappings={
                            'dataselect': \
                                'http://eida.ipgp.fr/fdsnws/dataselect/1'
                        },
                        debug=True
                    )

stream = c.get_waveforms('G', 'PEL',
                         '*', 'LHZ',
                         UTCDateTime('2016-11-01T00:00:00'),
                         UTCDateTime('2016-11-01T00:00:10'))

print stream

produces this output

1 Trace(s) in Stream:
G.PEL.00.LHZ | 2016-10-31T23:43:16.000000Z - 2016-11-01T00:00:27.000000Z | 1.0 Hz, 1032 samples

…llib3. FederatedClient now follows the same design as Client.

- Simplify parallel request test in test_federatedclient to speed things up.
- Update FederatedClient to use CustomRedirectHandler the class defined in the ~obspy.clients.fdsn.client module.
@markcwill
Copy link
Contributor

Per #1254, I think we are trying to move toward requests rather than away. @nick-iris your existing code looked good (other than disabling SSL). I was kind of hoping it would be model for fixing fdsn.client, not the other way around... Was there a test failure related to requests?

@adam-iris
Copy link
Contributor

I think moving to requests is a good idea, but I feel like this isn't the right place to spearhead that work. (Mainly, it seems like a very large change in scope.) The latest changes have FederatedClient now mostly reusing components from fdsn.client; as a result, if fdsn.client does move to use requests I think it should be relatively simple for FederatedClient to follow.

@megies
Copy link
Member

megies commented Sep 13, 2017

We're trying to finally get to a 1.1.0 release, so I think it's OK for now if it's not using requests exclusively. (CC @krischer)

For example, the following dataselect request for 10 seconds of data returned a result containing data before and after the requested start and end times.

Interesting. This should indeed then also be fixed in the regular FDSN client. At least for the normal request (not for the direct output request using filename=... which directly saves the MSEED data to file without modification).
I guess it's OK for now in here as well and we can handle it in a follow up issue for both clients.

----------------------------------

The
:mod:`FDSN fedcatalog_client <obspy.clients.fdsn.routers.fedcatalog_client>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link seems broken in the docs build, I think the reason is that the new submodules are not added to the sphinx package skeleton: https://github.com/obspy/obspy/blob/master/misc/docs/source/packages/obspy.clients.fdsn.rst

See also the build log for respective warnings/errors: http://docs.obspy.org/pull-requests/1779/log.txt

`FDSN Station and Dataselect web service definitions
<https://www.fdsn.org/webservices/>`_.

The first step is always to initialize a :class:`FederatedClient` object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link sees broken as well.. this should work, hopefully: :class:`~obspy.clients.fdsn.FederatedClient` if not you have to specify full submodule path..

>>> client = FederatedClient()

(1) :meth:`~obspy.clients.fdsn.routers.fedcatalog_client.
FederatedClient.get_waveforms()`: The following
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

broken link here too, likely due to aforementioned missing submodule entries in sphinx skeleton document (misc/docs/source/packages/obspy.clients.fdsn.rst)


This FederatedClient first queries the FedCatalog service to determine where
the data of interest reside. It then queries the individual web services from
each provider using the FDSN Client routines to retrieve the resulting data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove leading spaces for all these lines, it's interpreted as a block quote by sphix: http://docs.obspy.org/pull-requests/1779/packages/autogen/obspy.clients.fdsn.routers.html#module-obspy.clients.fdsn.routers

client = Client()
t = UTCDateTime("2010-02-27T06:45:00.000")
st = client.get_waveforms("IU", "ANMO", "00", "LHZ", t, t + 60 * 60)
st.plot()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is broken as it's importing FedClient and then trying to use normal client. Also, I think it's not necessary in this example to do a plot.. I would just have it print the st after fetching the data in the doctest example above.


Retrieving Waveform Metadata
---------------------------
Submitting a GET request to the federated catalog service. The service
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of times I'm seeing these notes about doing GET or POST requests (FDSN API front page, fdsn/routers API front page etc)... this should be removed from the docs I think, that's a low level implementation aspect that's not interesting for the user.

recognizes not only the parameters normally accepted by the bulkdataselect web
service, but also the parameters accepted by the station service. This
includes geographic parameters. For more details, see the help for
obspy.clients.fdsn
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take the time to properly do sphinx links, i.e. :mod:`obspy.clients.fdsn` . I know it's annoying to write up in the code base, but high quality across the documentation is not a cosmetics aspect, but it's vital for the usefulness of the docs. And once we start letting sloppy docs sections get through, it's all downhill from there.


caveats
----
* duplicated metadata remains duplicated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could use a bit more information

the duplications. example:
>>> INV2 = fclient.get_stations(network="I?", station="AN*",
... level="network", includeoverlaps="true")
... #doctest: +SKIP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (deduplication) sounds like a (valid) feature request for Inventory class. I'm not sure though if this is the right place to add this notice..

and then starttime by date.
A contains B : This takes wildcards and time ranges into account to denote
whether a request for A would include B's data too.
A == B : all fields are the same (wildcards only match wildcards, for example)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole section renders pretty awkward in sphinx, can you please try to fix it? http://docs.obspy.org/pull-requests/1779/packages/autogen/obspy.clients.fdsn.routers.html#module-obspy.clients.fdsn.routers

screenshot from 2017-10-04 17-43-14

@nick-falco
Copy link
Member

@megies Is there a script in place for creating the Sphinx docs on my local computer? Thanks.

@jkmacc-LANL
Copy link
Contributor

@nick-iris There's a Makefile in misc/docs. You can type make html, as long as you have Sphinx and any other needed extensions installed. There's a dependencies file next to the Makefile that lists doc dependencies.

@jkmacc-LANL
Copy link
Contributor

Here's a relevant link in the Wiki.

@megies
Copy link
Member

megies commented Oct 4, 2017

Also see https://gist.github.com/megies/f3f639f575e0ca97fdfd9b41348da802 for the exact conda env currently used on our docs buildbot. I would recommend to start out with the same sphinx at least. Also, I fear building docs takes a long time, unfortunately.. :-/

@krischer
Copy link
Member

krischer commented Oct 7, 2017

Just a quick heads up so we don't do duplicate work: I'm currently working on finalizing this and I'll push my things early next week.

@krischer krischer mentioned this pull request Oct 10, 2017
@krischer
Copy link
Member

Has been merged as part of #1919. I don't know why it did not automatically close it.

@krischer krischer closed this Oct 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants