Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python module validation of observation_id field - too stringent? #12

Closed
alaity47 opened this issue Feb 9, 2023 · 6 comments
Closed

Comments

@alaity47
Copy link

alaity47 commented Feb 9, 2023

Several of our collections at IRSA have observationid values that include spaces or slashes, and we're running into the following error trying to export them with the python caom2 library:

invalid observation_id: may not contain space ( ), slash (/), escape (\), or percent (%)

I reached out to @pdowler, who said this is to ensure that observation_id does not contain any characters aren't URI-safe, and suggested I open an issue here for discussion.

We don't use that field for URI population and haven't been able to find any documentation indicating that the observation_id string has any restrictions.

Is the caom2 module possibly being too stringent in how it validates this column?

(FYI we use CAOM 2.3, and the python module caom2 2.3.8.4)

@pdowler
Copy link
Member

pdowler commented Feb 9, 2023

see https://www.opencadc.org/caom2/#ObservationURI and https://www.opencadc.org/caom2/#PlaneURI

Those two types specify the structure of those URIs, but since they do so in reference to other fields they only loosely imply that (i) those field values must be valid to use in URIs and (ii) additional path separators (/) are not allowed.

This is admittedly a poorly documented part of the model. Since URIs are used internally and also services are expected to be able to provide ivoa publisher ID values based on these same strings, the restriction on chars that cannot be used in URIs (eg space) is pretty much impossible to remove.

@pdowler
Copy link
Member

pdowler commented Feb 9, 2023

Since ObservationURI contains {collection} and {observationID} (one /) and PlaneURI contains that plus {productID}, the URIs can currently be validated based on number of path components to make sure s/w is using the right one in the right place... we could think about allowing additional separators (/) but that would weaken the validation of those two types of URI.

@pdowler
Copy link
Member

pdowler commented Feb 9, 2023

@alaity47 Can you provide some concrete use case/example of observationID and/or productID values? Where they come from, who decides what those strings are, etc.

@alaity47
Copy link
Author

alaity47 commented Feb 9, 2023

The format/content of observationid for us varies by mission. We only have a couple of collections where a space or a slash can be in there, and it's because we're either using part or all of a filename path, or some kind of target/object name.

Some examples:

allsky/970607n/s043
BD+62 1644
HR 7042

I completely understand enforcing these restrictions on URIs, but the way we've implemented things here there's no direct relationship between observation_id and any URIs. To me the documentation reads as if that field can be any string (https://www.opencadc.org/caom2/2.3/#Observation.observationID)

@dr-rodriguez
Copy link

At MAST, the observation_id is typically the "fileroot" of the primary science file. For example, if the file is ipppssoot_drz.fits, the observation_id would be ipppssoot. I know we use URI parsing logic for the artifact URI and would not be surprised if we also have some for the observation URI. Many times that parsing logic is very simple and just splits by "/" so having additional ones would likely break things on our end.
Regardless, because our observation IDs are derived from filenames, we generally have the restriction that they would be valid filenames in the first place and so "/" would not be allowed. Spaces in principle should have no issues, but for us we use underscore and dashes to split up parts of a fileroot (eg, hlsp_k2sc_k2_llc_211804118-c05_kepler_v2_lc).
Could you consider replacing / or spaces with _ or - to work around these limitations?

@pdowler
Copy link
Member

pdowler commented Jul 26, 2024

closing in favour of #3

@pdowler pdowler closed this as completed Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants