adding format argument to get_stations_bulk (and what else?) #3173

Closed

Conversation

filefolder
Contributor

@filefolder filefolder commented Oct 11, 2022

Added format argument to get_stations_bulk as currently there is no way to save the file as text, only xml (or whatever the server default is). There may be more edits to make here but I don't think so?

Fixes #3140

@megies
Member

megies commented Oct 11, 2022

👍

Hmm.. to be honest, just after a brief look at it, I think all parameters that get urlencoded in the GET request should in principle be included here too..!?! So all the latitude/maxradius/... parameters should be there too, I think.

Screenshot from 2022-10-11 15-48-58

Also, while you are at it, if you want you could also handle #3138, i.e. for station request, do not use autodiscovery of file format, but rather use STATIONXML if format is None or xml and use STATIONTXT if format is text.

@megies megies modified the milestones: 1.3.1, 1.4.0 Oct 11, 2022
@megies
Member

megies commented Oct 11, 2022

Ah, I guess it's OK to base this on master, since it changes the call syntax

@filefolder
Contributor Author

Started looking into the ampersand bug... if I understand correctly, the solution is to make format='STATIONXML' the default in the function header and add something like

    if format.lower() in ("txt", "text"):
        format = "STATIONTXT"
    elif format.lower() == "scxml":
        format = "SC3ML"

in read_inventory? That then gives a, I guess, more detailed error:

  File "testbad.xml", line 21
    <Name>ampersand & test & !!</Name>
                    ^
XMLSyntaxError: xmlParseEntityRef: no name, line 21, column 26

(also adding a seiscomp scXml option as this was always bugging me but perhaps overstepping a smidge)
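
For readers who have not hit this before, here is a minimal sketch of why a bare ampersand breaks XML parsing and how escaping avoids it. It uses the standard-library parser rather than lxml (which produced the traceback above), so the exact error text will differ; the variable and helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_name = "ampersand & test & !!"

# A bare "&" starts an entity reference, so this document is not well-formed.
bad_xml = f"<Name>{raw_name}</Name>"
# escape() rewrites "&", "<" and ">" as entities, giving well-formed XML.
good_xml = f"<Name>{escape(raw_name)}</Name>"


def try_parse(text):
    """Return the element text, or the parser's error message on failure."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError as exc:
        return f"ParseError: {exc}"


bad_result = try_parse(bad_xml)
good_result = try_parse(good_xml)
print(bad_result)
print(good_result)
```

The escaped version round-trips back to the original string, which is the behaviour a server would need to provide when writing site names into StationXML.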

@megies
Member

megies commented Oct 18, 2022

@filefolder yes, kind of. I am not sure whether adding anything besides the formats officially specified by FDSN, StationXML (format="xml") and StationText (format="text"), makes a lot of sense, because we would have to know a priori how to map anything the user might later provide for a custom web service, and then know which internal obspy format to match it to. Actually, thinking about it right now, I can really only see this working with automatic file format detection, so maybe something like this:

if format is None or format == 'xml':
    read_inventory(data_stream, format='STATIONXML')
elif format == 'text':
    read_inventory(data_stream, format='STATIONTXT')
else:
    read_inventory(data_stream)
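
The same dispatch can be sketched as a small pure function, which makes the mapping easy to check without a network request. The helper name `inventory_format_for` is hypothetical and not part of ObsPy; the sketch uses `STATIONTXT`, the reader name mentioned earlier in this thread, and returns `None` to mean "fall back to automatic format detection".

```python
def inventory_format_for(fdsn_format):
    """Map an FDSN ``format`` value to a reader format name.

    Returns None when the value is not one of the two FDSN-specified
    formats, meaning the caller should rely on auto-detection.
    """
    if fdsn_format is None or fdsn_format == "xml":
        return "STATIONXML"
    if fdsn_format == "text":
        return "STATIONTXT"
    return None


# Intended use (assumes ObsPy is installed and data_stream is a file-like
# object holding the server response):
#     read_inventory(data_stream, format=inventory_format_for(format))
for value in (None, "xml", "text", "custom"):
    print(repr(value), "->", inventory_format_for(value))
```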

@filefolder
Contributor Author

were we doing a separate pr for this?

line 750 clients/fdsn/client.py


        if filename:
            self._write_to_file_object(filename, data_stream)
            data_stream.close()
        else:
            # This works with XML and StationXML data.
            # OLD: inventory = read_inventory(data_stream)
            if format is None or format == 'xml':
                inventory = read_inventory(data_stream, format='STATIONXML')
            elif format == 'text':
                inventory = read_inventory(data_stream, format='STATIONTXT')
            else:
                inventory = read_inventory(data_stream)
            data_stream.close()
            return inventory

In [5]: inv = obspy.read_inventory('testbad.xml')
TypeError: Unknown format for file testbad.xml

I think I find this less informative vs before where it literally points to the problem (py 3.8 anyway). Also maybe relevant that seiscomp will catch and fix these types of errors at inventory import, and with the issue fixed at IRIS, this should now be a rare bug for end users.

@megies
Member

megies commented Oct 20, 2022

> In [5]: inv = obspy.read_inventory('testbad.xml')
> TypeError: Unknown format for file testbad.xml
>
> I think I find this less informative vs before where it literally points to the problem (py 3.8 anyway). Also maybe relevant that seiscomp will catch and fix these types of errors at inventory import, and with the issue fixed at IRIS, this should now be a rare bug for end users.

This isn't what would happen in the originally described case.

On requesting an "xml" download (format not specified or "xml"), your file will hit this branch:

            if format is None or format == 'xml':
                inventory = read_inventory(data_stream, format='STATIONXML')

> were we doing a separate pr for this?

I don't care either way; I'd say just do it in one PR.

@filefolder
Contributor Author

I think that's it (also put it in get_stations_bulk), but not sure how to test exactly as seiscomp won't allow it and the original example is fixed I think

@megies
Member

megies commented Oct 20, 2022

> I think that's it (also put it in get_stations_bulk), but not sure how to test exactly as seiscomp won't allow it and the original example is fixed I think

I don't think it needs testing; that was just a garbled file, and we wanted a better error message, which we should have now.
Otherwise, testing could be done with mock, mocking in a garbled StationXML file.
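
A rough sketch of what such a mock-based test might look like. To stay runnable without ObsPy or network access, it uses a hypothetical `ToyClient` with a `_download` method and the standard-library XML parser; a real test would patch the corresponding download step of the FDSN client and expect whatever exception `read_inventory` raises.

```python
import io
import unittest
import xml.etree.ElementTree as ET
from unittest import mock

# Garbled response body: the bare "&" makes it ill-formed XML.
GARBLED = b"<Network><Name>ampersand & test & !!</Name></Network>"


class ToyClient:
    """Hypothetical stand-in for a web service client; not ObsPy's Client."""

    def _download(self, url):
        raise RuntimeError("network access is not expected in tests")

    def get_stations(self):
        data_stream = self._download("http://example.invalid/query")
        try:
            return ET.parse(data_stream)
        finally:
            data_stream.close()


class GarbledResponseTest(unittest.TestCase):
    def test_garbled_xml_raises_parse_error(self):
        # Replace the download step so it returns the garbled payload.
        with mock.patch.object(ToyClient, "_download",
                               return_value=io.BytesIO(GARBLED)):
            with self.assertRaises(ET.ParseError):
                ToyClient().get_stations()


suite = unittest.defaultTestLoader.loadTestsFromTestCase(GarbledResponseTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```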

@megies
Member

megies commented Oct 20, 2022

@filefolder as mentioned above, I believe most (if not all) parameters used in the regular request should also be available to the bulk request, sent via the POST data payload like the format option. #3173 (comment)

The lat/lon circular/box constraints, updatedafter, and matchtimeseries need to be accommodated. The time-specific constraints don't really make much sense, since start/end time is specified on each line. I also just checked IRIS: they refuse requests with any of the time constraints as parameters at the top of the list in the POST body.

Can you add those too?

Screenshot from 2022-10-20 15-23-05

level=station
format=text
TA 2* * * 2010-01-01 2020-01-01
#Network | Station | Latitude | Longitude | Elevation | SiteName | StartTime | EndTime 
TA|214A|31.9559|-112.811501|543.0|Organ Pipe National Monument, Ajo, AZ, USA|2007-05-07T00:00:00.0000|2018-09-07T23:59:59.0000
TA|221A|32.009399|-107.778198|1277.0|Mesquite Ranch, Deming, NM, USA|2008-02-11T00:00:00.0000|2010-01-18T23:59:59.0000
TA|222A|32.104599|-107.101303|1324.0|Williams Family Ranch, Las Cruces, NM, USA|2008-02-14T00:00:00.0000|2010-01-17T23:59:59.0000
TA|223A|32.006199|-106.427597|1232.0|Chaparral, Anthony, NM, USA|2008-03-28T00:00:00.0000|2010-01-16T23:59:59.0000
TA|224A|32.076|-105.522598|1487.0|Cornudas Mountain, Dell City, TX, USA|2008-02-28T00:00:00.0000|2010-01-16T23:59:59.0000
TA|225A|32.1101|-104.822899|1703.0|Deer Hill, Carlsbad, NM, USA|2008-03-26T00:00:00.0000|2010-02-19T23:59:59.0000
TA|226B|32.077801|-104.165398|981.0|Tecolote Peak, Malaga, NM, USA|2009-02-21T00:00:00.0000|2010-02-17T23:59:59.0000
TA|227A|32.012001|-103.292397|879.0|Bennet, Jal, NM, USA|2008-03-21T00:00:00.0000|2010-02-15T23:59:59.0000
TA|228A|32.118|-102.591797|954.0|UT Block 9, Goldsmith, TX, USA|2009-02-10T00:00:00.0000|2010-12-08T23:59:59.0000
TA|229A|31.9671|-101.810699|804.0|Bryant Ranch, Stanton, TX, USA|2009-02-09T00:00:00.0000|2010-12-09T23:59:59.0000
TA|230A|31.8878|-101.112396|742.0|Sterling City, TX, USA|2009-03-03T00:00:00.0000|2011-02-05T23:59:59.0000
TA|231A|31.935301|-100.316299|574.0|Bronte, TX, USA|2009-03-01T00:00:00.0000|2011-02-10T23:59:59.0000
TA|232A|31.888|-99.646896|621.0|Coleman, TX, USA|2009-03-07T00:00:00.0000|2011-02-17T23:59:59.0000
TA|233A|32.017899|-98.899803|539.0|Rising Star, TX, USA|2009-11-18T00:00:00.0000|2011-10-01T23:59:59.0000
TA|234A|32.004002|-98.136803|358.0|Clairette, TX, USA|2009-11-17T00:00:00.0000|2011-10-01T23:59:59.0000
TA|236A|31.999701|-96.530998|118.0|Katherine and Luke Keathley, Corsicana, TX, USA|2010-04-25T00:00:00.0000|2012-01-20T23:59:59.0000
TA|237A|32.001499|-95.808403|126.0|Washetta, Montalba, TX, USA|2010-02-20T00:00:00.0000|2012-01-20T23:59:59.0000
TA|238A|32.003399|-95.1203|126.0|Jacksonville, TX, USA|2010-02-22T00:00:00.0000|2011-12-11T23:59:59.0000
TA|239A|32.017899|-94.470703|100.0|Gary, TX, USA|2010-02-21T00:00:00.0000|2011-12-11T23:59:59.0000
[...]
TA|255A|31.9263|-82.4758|45.0|Hazlehurst, GA, USA|2012-04-04T00:00:00.0000|2014-02-03T23:59:59.0000
TA|256A|31.9799|-81.887802|46.0|Glennville, GA, USA|2012-03-16T00:00:00.0000|2014-01-22T23:59:59.0000
TA|257A|31.9746|-81.0261|9.0|Skidaway Island, Savannah, GA, USA|2012-03-16T00:00:00.0000|2014-01-21T23:59:59.0000

level=station
format=text
latitude=32.649899
longitude=-83.831596
maxradius=1
TA 2* * * 2010-01-01 2020-01-01
#Network | Station | Latitude | Longitude | Elevation | SiteName | StartTime | EndTime 
TA|253A|32.061199|-84.129402|136.0|Americus, GA, USA|2012-03-10T00:00:00.0000|2014-01-26T23:59:59.0000
TA|254A|31.9457|-83.290497|78.0|Abbeville, GA, USA|2012-03-14T00:00:00.0000|2014-02-02T23:59:59.0000

level=station
format=text
latitude=32.649899
longitude=-83.831596
maxradius=2
startbefore=2012-03-12
TA 2* * * 2010-01-01 2020-01-01
Error 400 BAD_REQUEST: Key (startbefore) not allowed here: [6]: startbefore=2012-03-12

Request:
http://service.iris.edu/fdsnws/station/1/query

Request Submitted:
2022-10-20T13:27:46

Service version:1.1.52
          build:1.2.3
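
To make the POST body layout concrete, here is a small sketch that assembles key=value parameter lines followed by the per-line bulk requests, and refuses time constraints up front. The function name `build_bulk_body` and the exact set of rejected keys are assumptions for illustration: the examples above only show IRIS rejecting `startbefore`, and this is not how ObsPy builds its payload.

```python
# Assumption: the full set of time-constraint keys a server refuses in a bulk
# body; the thread only shows "startbefore" being rejected by IRIS.
TIME_KEYS = {"starttime", "endtime", "startbefore", "startafter",
             "endbefore", "endafter"}


def build_bulk_body(bulk_lines, **params):
    """Build a POST body: key=value lines first, then one line per request."""
    rejected = sorted(TIME_KEYS.intersection(params))
    if rejected:
        raise ValueError(
            f"time constraints not allowed in bulk body: {rejected}")
    # Skip parameters left at None so only explicit choices are sent.
    header = [f"{key}={value}" for key, value in params.items()
              if value is not None]
    return "\n".join(header + list(bulk_lines)) + "\n"


body = build_bulk_body(
    ["TA 2* * * 2010-01-01 2020-01-01"],
    level="station", format="text",
    latitude=32.649899, longitude=-83.831596, maxradius=1,
)
print(body)
```

Raising locally gives the caller an immediate error instead of a round trip that ends in a 400 response.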

@megies
Member

megies commented Oct 20, 2022

Also, please add a changelog entry, that checklist in the PR template is not completely useless ;-P

Member

@megies megies left a comment

  • add other parameters to bulk POST payload
  • changelog

filefolder added a commit to filefolder/obspy that referenced this pull request Oct 26, 2022
minradius=None, maxradius=None, level=None,
includerestricted=None, includeavailability=None,
updatedafter=None, matchtimeseries=None, filename=None,
format=None, **kwargs)
Member

please add new kwargs after preexisting kwargs, to prevent breaking existing user code in case of using these kwargs as args (which isn't what you would usually do, but still)
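
A toy illustration of the concern, using made-up functions with far fewer parameters than the real method: inserting a new keyword ahead of existing ones silently re-binds positional arguments, while appending it leaves old calls unchanged.

```python
def get_bulk_v1(bulk, level=None, filename=None):
    """Original toy signature."""
    return {"level": level, "filename": filename}


def get_bulk_inserted(bulk, minlatitude=None, level=None, filename=None):
    """Hypothetical change A: new keyword inserted before existing ones."""
    return {"minlatitude": minlatitude, "level": level, "filename": filename}


def get_bulk_appended(bulk, level=None, filename=None, minlatitude=None):
    """Hypothetical change B: new keyword appended after existing ones."""
    return {"minlatitude": minlatitude, "level": level, "filename": filename}


# An existing caller that (unusually) passes level positionally.
old_call = get_bulk_v1("TA * * *", "station")
broken = get_bulk_inserted("TA * * *", "station")  # "station" lands in minlatitude
safe = get_bulk_appended("TA * * *", "station")    # "station" is still level

print(old_call, broken, safe, sep="\n")
```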

Contributor Author

OK will do if you say so.. I did consider that initially but decided it may be better to make a small leap to maintain the same general structure as get_stations since this jumbled order with format and filename etc in the middle will be permanent after this.

Member

I agree, it might be slightly ugly, but I think being safe to not break code is more important.

I think we could consider to eventually rectify this by making a leap to introducing , *, in call syntax, making it def get_stations_bulk(self, bulk, *, minlatitude=None, maxlatitude=None, ...) to force using these kwargs by their name, which kind of is the only sane way to use them and what I would expect most people do anyway. When/if we do that as an announced potentially breaking change, we would then be free to reorder as we please.
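
For reference, a minimal sketch of what the bare `*` does in a signature. This is a toy module-level function with a shortened parameter list, not the actual ObsPy method: everything after `*` can only be passed by name, so reordering those parameters later cannot change the meaning of existing calls.

```python
def get_stations_bulk(bulk, *, minlatitude=None, maxlatitude=None, level=None):
    """Toy stand-in; everything after ``*`` must be passed by name."""
    return {"bulk": bulk, "minlatitude": minlatitude,
            "maxlatitude": maxlatitude, "level": level}


# Passing by name works as usual.
by_name = get_stations_bulk("TA * * *", level="station")

# Passing a keyword-only parameter positionally fails loudly.
try:
    get_stations_bulk("TA * * *", 30.0)
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)

print(by_name)
print(positional_error)
```

The trade-off is the one described above: introducing `*` is itself a breaking change for anyone currently passing these arguments positionally, so it would need to be announced.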

Successfully merging this pull request may close these issues.

fdsn.client: add query parameters to get_station_bulk()
2 participants