Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid resource identifier for QuakeML produced by SC3 #2090

Closed
claudiodsf opened this issue Mar 19, 2018 · 8 comments · Fixed by #2104
Closed

Invalid resource identifier for QuakeML produced by SC3 #2090

claudiodsf opened this issue Mar 19, 2018 · 8 comments · Fixed by #2104
Milestone

Comments

@claudiodsf
Copy link
Member

This is a follow-up of the discussion in the mailing list, which I report here:

The attached QuakeML file (ipgp2016afqtcu.xml.gz) has been produced by SeisComP3.

It is correctly read by read_events(), but when trying to write it back to disk, I get the following error:

ValueError: The id 'smi:org.gfz-potsdam.de/geofon/RMHP(60)>>ITAPER(3)>>BW(4,5,15)' is not a valid QuakeML resource identifier.

This is related to the presence of the sign > which is encoded in the original xml file as > but gets decoded as >.

This symbol is part of the SeisComP3 filter grammar: https://www.seiscomp3.org/doc/jakarta/2015.040/base/filter-grammar.html

I think that, as long as this symbol is not decoded from its HTML representation, it should be acceptable.

So my impression is that this is an ObsPy bug.

@claudiodsf
Copy link
Member Author

The same file is not valid according to jing, but due to a different error:

ipgp2016afqtcu.xml:61:127: error: value of attribute "publicID" is invalid; must be a URI

That's because an URI cannot have more than one # sign.

By manually fixing this problem (see ipgp2016afqtcu_fix.xml.gz), the resulting file is valid XML, according to jing, even with those > in it.

@claudiodsf
Copy link
Member Author

claudiodsf commented Mar 19, 2018

This has been discussed in the mailing list with @krischer, but I'm not sure whether it is an ObsPy bug or a SeisComP3 bug, so I call here @andres-h and @gempa-jabe as well.
Thanks!

Edit: also ping @jmsaurel

@megies
Copy link
Member

megies commented Mar 20, 2018

See also #1872

@krischer
Copy link
Member

@andres-h @gempa-jabe pinging you again so we can discuss this. Its getting a bit lengthy on the mailing list.

The question boils down whether or not resource ids like smi:org.gfz-potsdam.de/geofon/RMHP(60)>>ITAPER(3)>>BW(4,5,15) are valid in QuakeML or not. In the raw XML it is encoded as smi:org.gfz-potsdam.de/geofon/RMHP(60)>>ITAPER(3)>>BW(4,5,15).

Validation tools don't complain but I personally think that the regular expression for the allowable characters has to be applied at the decoded representation which would mean that these ids are not valid.

@claudiodsf
Copy link
Member Author

Pinging also @feuchner and @kaestli in case they want/can weight in.

@andres-h
Copy link

I think a clean solution would be using filter ID without any HTML special characters, but at the end it's @gempa-jabe who decides.

In any case, there will be QuakeML generated with old SC3 versions, so it would be good if this error can be changed to a warning at least.

@gempa-jabe
Copy link

Using filterID without any XML specific characters would help in this particular case but it is unfortunately not a generic solution. We need to find a way to safely transform any string to an smi and back. I don't have a perfect solution to that right now. We also can't change the filters as we are using them in SC3, so the simplest approach is to ignore filterID while converting to QuakeML. But that does not prevent anyone from configuring invalid publicIDs that do not fit into the smi scheme. I could imagine to encode the IDs as base64 but there is no notion that this has been done, so the way back is not clear. I agree with @andres-h that this won't help with documents created with older SC3 version. Relaxing the smi restrictions or issue a warning would help here.

@claudiodsf
Copy link
Member Author

Thanks @andres-h and @gempa-jabe for the replies! I will continue the discussion on #2093 on how to implement a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants