Avoid duplication in request body. #15

Closed
kura-okubo opened this issue Jun 10, 2019 · 4 comments
Labels
server-side issue (e.g., non-standard implementation of XML, non-standard or missing FDSN keywords)

Comments

@kura-okubo

Hello jpjones,

I tried downloading data from the New Zealand data server GEONET and found duplicated entries in the data query strings, as shown below.


julia> get_data("FDSN", "NZ.NAAS.*.BN?", s="2016-05-20T01:14:25.07", t="2016-05-20T01:24:25.07", v=2, src="GEONET")
[ Info: 2019-06-10T22:12:23.160: Querying FDSN stations
Most compact request form = ["NZ" "NAAS" "*" "BN?" ""]
request url:http://service.geonet.org.nz/fdsnws/station/1/query
request body:
level=response
format=xml
 NZ NAAS * BN? 2016-05-20T01:14:25.07 2016-05-20T01:24:25.07

[ Info: 2019-06-10T22:12:24.644: Building list of channels
data query strings:
NZ NAAS 20 BN2
NZ NAAS 20 BN1
NZ NAAS 20 BNZ
NZ NAAS 20 BN2
NZ NAAS 20 BN1
NZ NAAS 20 BNZ
[ Info: 2019-06-10T22:12:24.652: Data query begins
request url: http://service.geonet.org.nz/fdsnws/dataselect/1/query
request body:
format=miniseed
NZ NAAS 20 BN2 2016-05-20T01:14:25.070000 2016-05-20T01:24:25.070000
NZ NAAS 20 BN1 2016-05-20T01:14:25.070000 2016-05-20T01:24:25.070000
NZ NAAS 20 BNZ 2016-05-20T01:14:25.070000 2016-05-20T01:24:25.070000
NZ NAAS 20 BN2 2016-05-20T01:14:25.070000 2016-05-20T01:24:25.070000
NZ NAAS 20 BN1 2016-05-20T01:14:25.070000 2016-05-20T01:24:25.070000
NZ NAAS 20 BNZ 2016-05-20T01:14:25.070000 2016-05-20T01:24:25.070000
NZ.NAAS.20.BN2: resized from length 0 to length 360000
NZ.NAAS.20.BN1: resized from length 0 to length 360000
NZ.NAAS.20.BNZ: resized from length 0 to length 360000
[ Info: 2019-06-10T22:12:37.211: Done FDSNget query.
[ Info: 2019-06-10T22:12:37.211: Removing empty channels.
SeisData with 3 channels (3 shown)
    ID: NZ.NAAS.20.BN2                     NZ.NAAS.20.BN1                     NZ.NAAS.20.BNZ
  NAME: Napier Airport                     Napier Airport                     Napier Airport
   LOC: -39.4687 N, 176.872 E, 2.0 m       -39.4687 N, 176.872 E, 2.0 m       -39.4687 N, 176.872 E, 2.0 m
    FS: 50.0                               50.0                               50.0
  GAIN: 1.01972e5                          1.01972e5                          1.01972e5
  RESP: c = 1.0, 0 zeros, 0 poles          c = 1.0, 0 zeros, 0 poles          c = 1.0, 0 zeros, 0 poles
 UNITS: m/s2                               m/s2                               m/s2
   SRC: http://service.geonet.org.nz/fdsn… http://service.geonet.org.nz/fdsn… http://service.geonet.org.nz/fdsn…
  MISC: 2 entries                          2 entries                          2 entries
 NOTES: 0 entries                          0 entries                          0 entries
     T: 2016-05-20T01:14:24.034 (0 gaps)   2016-05-20T01:14:20.871 (0 gaps)   2016-05-20T01:14:24.515 (0 gaps)
     X: -2.057e+03                         -2.413e+03                         -3.780e+02
        -2.057e+03                         -2.415e+03                         -3.710e+02
            ...                                ...                                ...
        -2.036e+03                         -2.375e+03                         -3.520e+02
        (nx = 61112)                       (nx = 60702)                       (nx = 60444)
     C: 0 open, 0 total

julia>

I suspect this is caused by duplicated channel entries in the StationXML downloaded from GEONET. The duplication causes an error when using a lat-lon box request, because of the limit on the number of channels per request: the number of requested channels becomes much larger than the actual number.
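As a minimal sketch of the kind of deduplication being asked for here, the repeated query strings from the log above can be collapsed before the POST body is built. The helper name `dedup_requests` is hypothetical and is not part of SeisIO; this only illustrates the idea and does not address the underlying StationXML duplication.

```julia
# Hypothetical helper, not a SeisIO function: drop repeated request lines,
# keeping the first occurrence of each in its original order.
dedup_requests(lines::AbstractVector{<:AbstractString}) = unique(lines)

# Query strings copied from the log above (each channel appears twice).
query_lines = [
    "NZ NAAS 20 BN2",
    "NZ NAAS 20 BN1",
    "NZ NAAS 20 BNZ",
    "NZ NAAS 20 BN2",
    "NZ NAAS 20 BN1",
    "NZ NAAS 20 BNZ",
]

deduped = dedup_requests(query_lines)

# Rebuild a request body in the same layout the log shows:
# a format line, then one line per channel with start and end times.
ts = "2016-05-20T01:14:25.070000"
te = "2016-05-20T01:24:25.070000"
body = join(vcat("format=miniseed", [string(l, " ", ts, " ", te) for l in deduped]), "\n")
println(body)
```

With the duplicates removed, the body carries three channel lines instead of six, which matters most for box requests where the channel count is close to the limit.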

In addition, it sometimes causes an error as below:

┌ Warning: Error thrown:
│ URL: http://service.geonet.org.nz/fdsnws/dataselect/1/query
│ POST BODY:
│ format=miniseed
│ NZ CCCC 20 BN1 2018-05-20T00:42:04.570000 2018-05-20T00:52:04.570000
│ NZ CCCC 20 BN2 2018-05-20T00:42:04.570000 2018-05-20T00:52:04.570000
│ NZ CCCC 20 BNZ 2018-05-20T00:42:04.570000 2018-05-20T00:52:04.570000
│ NZ CCCC 20 BNZ 2018-05-20T00:42:04.570000 2018-05-20T00:52:04.570000
│ NZ CCCC 20 BN2 2018-05-20T00:42:04.570000 2018-05-20T00:52:04.570000
│ NZ CCCC 20 BN1 2018-05-20T00:42:04.570000 2018-05-20T00:52:04.570000
│
│ ERROR TYPE: HTTP.IOExtras.IOError
└ @ SeisIO ~/.julia/packages/SeisIO/mMCC6/src/Web/0_essentials.jl:58

So I would like to ask you to modify the get_data function to avoid these duplicated requests.

Best,

@jpjones76
Owner

Thank you for the report. I'm looking into this now.

@jpjones76
Owner

jpjones76 commented Jun 11, 2019

OK, I found the problem. The bad news is that it's definitely an issue with GeoNet. Marine replicated this bug in ObsPy earlier today. The good news is that I know how to fix it. Amazingly enough, my blind guess about the cause was correct:

When GeoNet changes a channel's parameters, they record a startDate attribute for the new XML element, but no endDate attribute is added to the old element. However, it seems that a channel element with no endDate is considered valid in the time range -∞:+∞; for example, your query for 2016 returns some channel elements with a startDate of November 2017. I did an identical query through their webpage and got exactly the same results.

Workaround: I can add a control loop to SeisIO that retains one unique entry per channel, based on startDate. This might be messy because I need to test each channel ID for uniqueness, then loop over each group of IDs to create an array of endDate values, then retain the element that's correct for the query window.
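The loop described above could look roughly like the following sketch. It is not the SeisIO implementation: `ChannelEpoch` and `select_epochs` are hypothetical names, the 2014 start date in the example is invented for illustration, and the choice to keep the first overlapping epoch per channel is an assumption about what "correct for the query window" means.

```julia
using Dates

# Hypothetical record for one channel element of station XML; not a SeisIO type.
struct ChannelEpoch
    id::String
    start::DateTime
    stop::Union{DateTime,Nothing}   # nothing means the element has no endDate
end

# For each channel ID: sort its epochs by startDate, infer a missing endDate
# from the next epoch's startDate, and keep the first epoch overlapping [s, t].
function select_epochs(epochs::Vector{ChannelEpoch}, s::DateTime, t::DateTime)
    kept = ChannelEpoch[]
    for id in unique(e.id for e in epochs)
        group = sort([e for e in epochs if e.id == id], by = e -> e.start)
        for (i, e) in enumerate(group)
            stop = if e.stop !== nothing
                e.stop                  # explicit endDate
            elseif i < length(group)
                group[i+1].start        # inferred from the next epoch
            else
                nothing                 # latest epoch: still open-ended
            end
            if e.start <= t && (stop === nothing || stop > s)
                push!(kept, ChannelEpoch(e.id, e.start, stop))
                break                   # retain one entry per channel
            end
        end
    end
    return kept
end

# Illustrative data: two epochs for one channel, neither with an endDate.
# The 2014 date is made up; the November 2017 one mirrors the case above.
epochs = [
    ChannelEpoch("NZ.NAAS.20.BN1", DateTime(2017, 11, 1), nothing),
    ChannelEpoch("NZ.NAAS.20.BN1", DateTime(2014, 1, 1), nothing),
]
s = DateTime(2016, 5, 20, 1, 14, 25, 70)
t = DateTime(2016, 5, 20, 1, 24, 25, 70)
kept = select_epochs(epochs, s, t)
```

For this 2016 query window, only the epoch starting in 2014 is retained, with its end inferred as the November 2017 start of the following epoch.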

Do you know anyone at GeoNet? Could you encourage them to add endDate values to their station XML? I ask because it's easy to imagine a "use case" where this breaks research: suppose a program reads station XML until the first match of each channel. That's OK for normal station XML, but would yield GeoNet parameters that are outdated and therefore wrong. Now suppose one's research requires correcting to true ground velocity, and the "wrong" parameters include a gain...

(I thought of this because I encountered a very similar "use case" with Win32 data in 2016: JMA, Nagoya University, and HiNet each had their own parameter file for the two JMA stations on Mt. Ontake. No two parameter files agreed. The gain of each seismic channel varied from file to file by ~50%; the gain of each infrasound channel varied by 3-4 orders of magnitude. No one knew which parameters were current.)

I'll add a fix to SeisIO in a few days. At the moment I'm trying to learn why the Julia ecosystem didn't update SeisIO to v0.3.0.

jpjones76 added the bug (verified bug in SeisIO code) label Jun 13, 2019
jpjones76 added the server-side issue (e.g., non-standard implementation of XML, non-standard or missing FDSN keywords) label and removed the bug (verified bug in SeisIO code) label Jul 11, 2019
@jpjones76
Owner

Hi, I implemented a rewrite of FDSN_sta_xml tonight that should include a very clean workaround for this problem. Are you still having this issue, or is it now fixed?

@kura-okubo
Author

kura-okubo commented Jul 17, 2019 via email
