Fix/schema identifier inconsistencies: allow identifier strings in cross-reference properties #54
dgbroeder wants to merge 4 commits into skg-if:main
- cf.search.keyword
- cf.search.org_name
- relevant_organisations.name, relevant_organisations.indentifiers.scheme, relevant_organisations.indentifiers.value
- srv_has_hosting_organisation.name, srv_has_hosting_organisation.scheme, srv_has_hosting_organisation.identifier
- cf.search.org_name
- pageQueryParam, pageSizeQueryParam

Added syntax examples for all of these.
…ence properties

Several cross-reference properties (references to other entities) are documented as accepting plain identifier strings (e.g. "org_1", "ven1"), but the schema defined them as $ref object types only, causing validators to reject data the docs explicitly show as valid.

Changes from bare $ref / oneOf to anyOf: [string | $ref] for:
- Product: topics[].term, relevant_organisations[], funding[], contributions[].declared_affiliations[], contributions[].by, manifestations[].biblio.in, manifestations[].biblio.hosting_data_source
- Person/Agent: affiliations[].affiliation
- Grant: beneficiaries[], contributions[].declared_affiliations[], contributions[].by, funding_agency

API responses may still return expanded objects.
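To make the anyOf: [string | $ref] change concrete, here is a minimal Python sketch of the intended behaviour. It is not the actual SKG-IF schema: the `$ref` target `#/components/schemas/Organisation` and the helper names are assumptions for illustration, and the object branch is only checked for being a JSON object, not validated against the referenced schema.

```python
# Hypothetical schema fragment: a cross-reference may be either a plain
# identifier string or an embedded object. The $ref target name is assumed,
# not copied from the SKG-IF schema.
CROSS_REFERENCE_SCHEMA = {
    "anyOf": [
        {"type": "string"},
        {"$ref": "#/components/schemas/Organisation"},
    ]
}


def matches_cross_reference(value):
    """Return True if value is an identifier string or a JSON object.

    This mirrors the two anyOf branches in the simplest possible way.
    It does not resolve the $ref or check the object's properties.
    """
    return isinstance(value, (str, dict))


def invalid_entries(values):
    """Return the entries of a cross-reference array matching neither branch."""
    return [v for v in values if not matches_cross_reference(v)]
```

With this reading, `invalid_entries(["org_1", {"local_identifier": "org_2"}])` is empty, whereas a schema that only allows the `$ref` branch would reject `"org_1"`.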
|
Hi Daan, @dgbroeder

PR Topic 1: meta section page

The idea was to reuse the semantic logic of the ActivityStreams vocabulary, with page navigation managed by the result identifiers previous and next (the "page" is a string and can have any value).

In the first commit, your request is to add the following fields to the meta section:

Note: if we want a total count for the whole search (not the current page count), we could add a partOf section. |
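As an illustration of navigation driven by opaque page identifiers, the following Python sketch walks a result set by following a "next" identifier until none is left. The response shape is an assumption: the keys `results`, `meta` and `next`, and the `fetch_page` callable, are hypothetical and not taken from the SKG-IF OpenAPI.

```python
def iter_results(fetch_page, first_page):
    """Yield every result, following the assumed meta["next"] identifier.

    fetch_page is any callable that takes a page identifier (an opaque
    string) and returns a dict with the hypothetical keys "results" and "meta".
    """
    page_id = first_page
    while page_id is not None:
        page = fetch_page(page_id)
        yield from page["results"]
        # The client never computes page numbers; it only follows identifiers.
        page_id = page["meta"].get("next")
```

Because the page identifier is treated as an opaque string, a server remains free to use numbers, cursors, or any other value.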
|
PR Topic 2: identifiers

Current status

This is the current format of local_identifiers we agreed on for the SKG-IF OpenAPI. See: https://elements-demo.stoplight.io/?spec=https://w3id.org/skg-if/api/skg-if-openapi.yaml
All examples have this unique generic format in the OpenAPI documentation.
Example:

In the SKG-IF OpenAPI, the local_identifier is there to loop back to the entities through OpenAPI-compatible URLs; that was a deep discussion we had with Menzo @menzowindhouwer.

Discussion

At the moment, https://github.com/skg-if/examples/blob/main/OpenCitations/oc_1.jsonld is compatible with the model, but it is not SKG-IF OpenAPI output. It is just a static file on GitHub, not a response from any SKG-IF-specific endpoint (an OpenAPI endpoint defines a URL endpoint + query params + output format). We have endpoints as defined above for each entity type (products, grants, etc.).

"local_identifier": "https://w3id.org/oc/meta/br/062501777134" is not compatible with the OpenAPI: it resolves to an HTML page, and you don't have a generic process to get its data via:

I am not sure allowing/suggesting external URIs makes sense. A "non-RDF" client won't be able to ingest/interpret them. As explained before, it is allowed in pure RDF, but it gets very confusing in the REST OpenAPI for client applications. In other words: the OpenAPI relies only on local SKG entities that can be exposed by the OpenAPI itself. It removes the RDF Open World Assumption. In RDF you are free to put anything in an id without any guarantee that it resolves, which is really uncommon for most REST API developers. |
1. Full URL: any URL, used as-is as the entity identifier (e.g. a server API URL, a DOI, a ROR URL, or an SKG-IF sandbox URL)
2. Plain string: resolved by prepending the `@base` from the JSON-LD preamble; the framework prescribes `https://w3id.org/skg-if/sandbox/<provider-acronym>/` as the `@base` for entities that have no independently dereferenceable identifier of their own, producing a *sandbox URL*, but any `@base` value is valid
3. On-the-fly: plain string using the template `otf___<session-id>___<identifier-string>`, also resolved via `@base`, for identifiers created on-the-fly during document generation
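The three forms above can be sketched in Python as follows. This is a simplified illustration, not the framework's resolver: the function names are hypothetical, full URLs are detected only by an http(s) scheme, and `@base` resolution is reduced to string concatenation, which assumes the `@base` ends with `/` and the identifier is a simple relative reference.

```python
from urllib.parse import urlparse

OTF_PREFIX = "otf"
OTF_SEPARATOR = "___"


def is_full_url(value):
    """Treat a value as a full URL if it has an http(s) scheme and a host."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def parse_otf(value):
    """Split 'otf___<session-id>___<identifier-string>' into (session, id).

    Returns None when the value does not follow the on-the-fly template.
    """
    parts = value.split(OTF_SEPARATOR, 2)
    if len(parts) == 3 and parts[0] == OTF_PREFIX and parts[1] and parts[2]:
        return parts[1], parts[2]
    return None


def resolve_identifier(value, base):
    """Return the absolute identifier for a local_identifier value.

    Form 1 (full URL) is returned as-is; forms 2 and 3 (plain string and
    on-the-fly string) are both resolved by prepending the @base.
    """
    if is_full_url(value):
        return value
    return base + value
```

For example, with the hypothetical provider acronym `xyz`, `resolve_identifier("org_1", "https://w3id.org/skg-if/sandbox/xyz/")` yields a sandbox URL, while a ROR or DOI URL passes through unchanged.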
|
As we discussed today, these proposals make the API align with what is currently in the specs and examples. The current restriction of the local_identifier format by the API is, I think, not needed. We should, however, try to avoid using very long identifiers, i.e. concatenating server URL + sandbox-type identifier URL. Let's think of a few workflow examples where this would occur. |
|
OK, as discussed this morning with Daan. What would be the options to resolve them?
Option 3 could work. |
|
Hi @rduyme,

My two cents here. My point, in short, is that SKG-IF and its API must be aligned. Thus, I believe that the real answer is option 1. The SKG-IF says explicitly that:

Thus, I think that requiring the source to use a URL template that differs from what they may have already used is a major breaking point, because it forces a source to change its entire LOD-oriented logic (already implemented and shared) to make it compatible with SKG-IF. In addition, just to mention, Crossref also uses the doi.org URL for referring to its resources internally, and thus it is not compatible with the constrained local identifier currently proposed in the API. The same goes for ORCID. And this problem will apply to any source that already exposes data as LOD.

I do not think forcing anything here adds value; it may put adoption at risk. In addition, being part of the same specification, there is a strong need for the SKG-IF API to follow the SKG-IF (data model, ontology, etc.); otherwise, there is a huge risk of exposing SKG-IF-compliant data in dumps that differ from those returned by the SKG-IF API, and I think this is not ideal, honestly. Of course, if a source wants, the URL can be constructed as currently suggested, but that should not be mandatory for all adopters.

Have a nice day :-) S. |
|
Thanks Silvio @essepuntato

Option 1 review

If we want to be clean on option 1, it would mean changing our approach a bit, forcing users to use w3id.org if they don't have reliable ids as URLs, i.e. no local_identifier like:

No use of the /resolve endpoint then. We see this URL-as-id compatibility approach:
|
|
Thanks for responding, Renaud, Silvio.

Option 1 (Full URL) is included in the pull request, so I am fine with that. But should it be a Full URL to the exclusion of other possibilities? Maybe not.

I am not shocked by "https://example.com/skg-if/api/products/https://w3id.org/oc/meta/br/062501777134". Whether external URIs make sense is a matter for the source and its intended clients; having an agreed common resolving method (options 2 and 3) is an additional constraint, and could be a recommendation instead of a hard requirement. |
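One practical detail when a full URL is placed after `/products/` is how its `:` and `/` characters travel in the path. The sketch below percent-encodes the identifier with Python's standard library; whether an SKG-IF endpoint expects the raw or the encoded form is not settled in this thread, so treat the encoding step and the `product_url` helper as assumptions.

```python
from urllib.parse import quote, unquote


def product_url(api_root, local_identifier):
    """Build a products URL whose last path segment is the identifier.

    safe="" makes quote() encode "/" and ":" as well, so a full URL used as
    an identifier stays within a single path segment.
    """
    return api_root.rstrip("/") + "/products/" + quote(local_identifier, safe="")


def identifier_from_segment(segment):
    """Recover the original identifier from an encoded path segment."""
    return unquote(segment)
```

A plain identifier such as `org_1` is left untouched by this encoding, so the same helper covers both short local identifiers and full URLs.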
|
OK, let's go for option 1. I will include your changes. I will also:
Other remarks:
|
|
Option 1 sounds good to me as it seems to align with what I've done in CESSDA's staging endpoint so far for
It just takes the actual identifier at the end of the

I've been wondering how to do it for entities that don't always have any identifier but I guess some id has to be generated then, e.g. for

Currently the staging endpoint contains a lot of otf identifiers for other entities referenced in

Edit: After reading the other issue (#39) I think I understand better what the intention with |
- Added a number of new example files for testing cross-entity referencing vs. embedded (light) entities
- Modified app.py (FastAPI) to support full-URL local_identifier handling, and to handle expansion of cross-referenced entities when the expand=true flag is set
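The expansion behaviour described above could look roughly like the following Python sketch. It is not the code from app.py: the function name, the in-memory `lookup` dict, and the choice to leave unknown identifiers as strings are all assumptions made for illustration.

```python
def expand_references(entity, field, lookup, expand=False):
    """Return a copy of entity with string references in `field` expanded.

    lookup maps identifier strings to full entity objects (hypothetical
    in-memory store). Entries that are already objects, or whose identifier
    is not in lookup, are kept unchanged.
    """
    if not expand or field not in entity:
        return dict(entity)
    expanded = []
    for ref in entity[field]:
        if isinstance(ref, str) and ref in lookup:
            expanded.append(lookup[ref])
        else:
            expanded.append(ref)
    result = dict(entity)
    result[field] = expanded
    return result
```

Called with `expand=False`, the entity keeps its plain identifier strings, which matches the string branch of the schema; with `expand=True`, resolvable references are returned as objects.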

