API / CSV export / Add support for custom export. #7132

Merged
merged 3 commits into from Aug 24, 2023

fxprunayre
Member

The current CSV export is based on XSL transformations and can be hard to use when the user is interested in elements with multiple values (e.g. online sources, contacts).

The default CSV export remains the same, but this PR adds the possibility to customize the export with 2 additional parameters:

  • loopElementXpath: element to loop on, e.g.

    • use . for the metadata
    • .//gmd:CI_ResponsibleParty for all contacts in ISO19139,
    • .//gmd:transferOptions/*/gmd:onLine/* for all online resources in ISO19139.
  • propertiesXpath: columns to extract, e.g.

    • gmd:identificationInfo/*/gmd:citation/*/gmd:title//text() for the title.

Two further parameters define the separators:

  • sep: the column separator
  • internalSep: the separator used when a field holds multiple values

Users can then build custom reports from the API:
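How the four parameters interact can be sketched outside GeoNetwork with the JDK's built-in XPath 1.0 engine. This is a minimal illustration under stated assumptions, not the code merged in this PR: the class XpathCsvSketch and its method toCsvLines are hypothetical, only the gmd and gco prefixes are bound, only expressions that return node sets are handled (so functions such as count are out of scope here), no CSV quoting is applied, and the UUID and permalink columns are not emitted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of the loop/properties idea; not the GeoNetwork implementation. */
final class XpathCsvSketch {
    private XpathCsvSketch() {}

    /** Binds the two ISO19139 prefixes used in the examples of this PR. */
    private static final NamespaceContext ISO19139 = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            if ("gmd".equals(prefix)) return "http://www.isotc211.org/2005/gmd";
            if ("gco".equals(prefix)) return "http://www.isotc211.org/2005/gco";
            return XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) { return null; }
        @Override public Iterator<String> getPrefixes(String uri) { return null; }
    };

    /** Returns one CSV line per node matched by loopElementXpath. */
    static List<String> toCsvLines(String xml, String loopElementXpath,
                                   List<String> propertiesXpath,
                                   String sep, String internalSep) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(ISO19139);

            // The loop expression is evaluated relative to the record's root element.
            NodeList loopNodes = (NodeList) xpath.evaluate(
                loopElementXpath, doc.getDocumentElement(), XPathConstants.NODESET);

            List<String> lines = new ArrayList<>();
            for (int i = 0; i < loopNodes.getLength(); i++) {
                List<String> cells = new ArrayList<>();
                for (String property : propertiesXpath) {
                    // Each property expression is evaluated relative to the loop node.
                    NodeList values = (NodeList) xpath.evaluate(
                        property, loopNodes.item(i), XPathConstants.NODESET);
                    List<String> cellValues = new ArrayList<>();
                    for (int j = 0; j < values.getLength(); j++) {
                        cellValues.add(values.item(j).getTextContent());
                    }
                    // Multiple matches share one cell, joined with internalSep.
                    cells.add(String.join(internalSep, cellValues));
                }
                // Cells of one row are joined with sep.
                lines.add(String.join(sep, cells));
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate XPath on record", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<gmd:MD_Metadata xmlns:gmd='http://www.isotc211.org/2005/gmd'"
            + " xmlns:gco='http://www.isotc211.org/2005/gco'>"
            + "<gmd:contact><gmd:CI_ResponsibleParty>"
            + "<gmd:organisationName><gco:CharacterString>Org A</gco:CharacterString></gmd:organisationName>"
            + "<gmd:role><gmd:CI_RoleCode codeListValue='pointOfContact'/></gmd:role>"
            + "</gmd:CI_ResponsibleParty></gmd:contact>"
            + "</gmd:MD_Metadata>";
        toCsvLines(xml, ".//gmd:CI_ResponsibleParty",
            List.of("gmd:role/*/@codeListValue", "gmd:organisationName/*/text()"),
            ",", "|").forEach(System.out::println);
    }
}
```

With loopElementXpath set to . the sketch produces a single row per record, and a property that matches several nodes ends up as one cell whose values are joined with internalSep.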

e.g. Export contacts with role, org, email
http://localhost:8080/srv/api/records/csv?bucket=s101&loopElementXpath=.//gmd:CI_ResponsibleParty&propertiesXpath=gmd:role/*/@codeListValue&propertiesXpath=gmd:organisationName/*/text()&propertiesXpath=.//gmd:electronicMailAddress/*/text()

e.g. Export online sources with protocol, url, name, desc
http://localhost:8080/srv/api/records/csv?bucket=s101&loopElementXpath=.//gmd:transferOptions/*/gmd:onLine/*&propertiesXpath=gmd:protocol/*/text()&propertiesXpath=gmd:linkage/*/text()&propertiesXpath=gmd:name/*/text()&propertiesXpath=gmd:description/*/text()

e.g. Export metadata with title, alternateTitle, status, maintenanceFreq, ...
http://localhost:8080/srv/api/records/csv?bucket=s101&loopElementXpath=.&propertiesXpath=gmd:identificationInfo/*/gmd:citation/*/gmd:title//text()&propertiesXpath=gmd:identificationInfo/*/gmd:citation/*/gmd:alternateTitle//text()&propertiesXpath=gmd:identificationInfo/*/gmd:status/*/@codeListValue&propertiesXpath=gmd:identificationInfo/*//gmd:maintenanceAndUpdateFrequency/*/@codeListValue&propertiesXpath=gmd:identificationInfo/*//gmd:otherConstraints//text()&propertiesXpath=gmd:identificationInfo/*/gmd:topicCategory//text()&propertiesXpath=gmd:identificationInfo/*/gmd:language/*/@codeListValue&propertiesXpath=gmd:identificationInfo/*/gmd:graphicOverview/*/gmd:fileName/*/text()

Functions from XPath v1 can also be used, e.g. count
http://localhost:8080/srv/api/records/csv?bucket=s101&loopElementXpath=.&propertiesXpath=count(gmd:identificationInfo/*/gmd:descriptiveKeywords)
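The example URLs are written unencoded for readability. A client that assembles them programmatically may prefer to percent-encode each XPath value, since the expressions contain characters such as /, :, @ and parentheses. The sketch below assumes the endpoint and parameter names shown in the examples; the class CsvExportUrl and its methods are hypothetical helpers, and this PR does not state whether the server requires encoded values.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical client-side helper; not part of GeoNetwork. */
final class CsvExportUrl {
    private CsvExportUrl() {}

    /** Percent-encodes one query parameter value. */
    static String encode(String value) {
        // URLEncoder produces form encoding, where a space becomes '+'.
        // A literal '+' in the input is already emitted as %2B, so any
        // remaining '+' stands for a space and can be rewritten as %20.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds a custom export URL with one propertiesXpath parameter per column. */
    static String build(String baseUrl, String bucket,
                        String loopElementXpath, List<String> propertiesXpath) {
        StringBuilder url = new StringBuilder(baseUrl)
            .append("?bucket=").append(encode(bucket))
            .append("&loopElementXpath=").append(encode(loopElementXpath));
        for (String property : propertiesXpath) {
            url.append("&propertiesXpath=").append(encode(property));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(
            "http://localhost:8080/srv/api/records/csv",
            "s101",
            ".//gmd:CI_ResponsibleParty",
            List.of("gmd:role/*/@codeListValue",
                    "gmd:organisationName/*/text()",
                    ".//gmd:electronicMailAddress/*/text()")));
    }
}
```

Repeating propertiesXpath once per column keeps the column order identical to the order of the list passed to build.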

For all exports, 2 columns are added first:

  • UUID
  • permalink

When using XPath, it is recommended to export records that share the same schema (or at least the same base schema). Otherwise, XPath error messages are returned in the cells, unless the XPath expressions provided do not require namespaces. Users have to configure a proper selection to avoid mixing schemas.
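The difference between a prefixed expression and one that does not require namespaces can be illustrated with the JDK's XPath engine. This is only a sketch under assumptions: the class MixedSchemaSketch is hypothetical, the namespace urn:example:other-schema and the mdb prefix stand in for a record from a different schema, and returning the error text as the cell value merely imitates the behaviour described above rather than reproducing GeoNetwork's code.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical sketch; not the GeoNetwork implementation. */
final class MixedSchemaSketch {
    private MixedSchemaSketch() {}

    /** Binds only the ISO19139 gmd prefix; every other prefix is left unbound. */
    private static final NamespaceContext GMD_ONLY = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "gmd".equals(prefix)
                ? "http://www.isotc211.org/2005/gmd"
                : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) { return null; }
        @Override public Iterator<String> getPrefixes(String uri) { return null; }
    };

    /** Evaluates one expression against the record root and returns the cell text. */
    static String cell(String xml, String expression) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(GMD_ONLY);
        try {
            // Returns the string value of the first matching node, or "" if none.
            return xpath.evaluate(expression, doc.getDocumentElement());
        } catch (XPathExpressionException e) {
            // An expression the engine rejects (for instance, one using a prefix
            // it cannot resolve) ends up here instead of yielding a value.
            return "XPath error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String other = "<mdb:MD_Metadata xmlns:mdb='urn:example:other-schema'>"
            + "<mdb:title>My title</mdb:title></mdb:MD_Metadata>";
        System.out.println(cell(other, ".//mdb:title/text()"));
        System.out.println(cell(other, ".//*[local-name()='title']/text()"));
    }
}
```

Matching on local-name() trades precision for portability: it ignores namespaces entirely, so it also matches same-named elements from unrelated vocabularies.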

@fxprunayre fxprunayre added this to the 4.2.5 milestone Jun 5, 2023
@sonarcloud

sonarcloud bot commented Jun 5, 2023

SonarCloud Quality Gate failed.

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 1 (rating A)

Coverage: 0.0%
Duplication: 0.0%

@bernhardreiter
Contributor

bernhardreiter commented Jun 7, 2023

The following patch fixes the code smell found by sonarlint (and improves the comment http -> 'https').

diff --git a/services/src/main/java/org/fao/geonet/api/records/CatalogApi.java b/services/src/main/java/org/fao/geonet/api/records/CatalogApi.java
index e935046aad..4034cae6a8 100644
--- a/services/src/main/java/org/fao/geonet/api/records/CatalogApi.java
+++ b/services/src/main/java/org/fao/geonet/api/records/CatalogApi.java
@@ -125,6 +125,7 @@ public class CatalogApi {
             .add("resourceTitleObject.default") // TODOES multilingual
             .add("resourceAbstractObject.default").build();
     }
+    private static final String AMP_ENT = "&amp;";
 
     @Autowired
     DefaultLanguage defaultLanguage;
@@ -780,14 +781,14 @@ public class CatalogApi {
         String previousPage = canonicalURL + "?" + paramsAsString(allRequestParams) + "&from=" + prevFrom + "&to=" + prevTo;
         String nextPage = canonicalURL + "?" + paramsAsString(allRequestParams) + "&from=" + nextFrom + "&to=" + nextTo;
 
-        // Hydra Paging information (see also: http://www.hydra-cg.com/spec/latest/core/)
-        String hydraPagedCollection = "<hydra:PagedCollection xmlns:hydra=\"http://www.w3.org/ns/hydra/core#\" rdf:about=\"" + currentPage.replace("&", "&amp;") + "\">\n" +
+        // Hydra Paging information (see also: https://www.hydra-cg.com/spec/latest/core/)
+        String hydraPagedCollection = "<hydra:PagedCollection xmlns:hydra=\"http://www.w3.org/ns/hydra/core#\" rdf:about=\"" + currentPage.replace("&", AMP_ENT) + "\">\n" +
             "<rdf:type rdf:resource=\"hydra:PartialCollectionView\"/>" +
-            "<hydra:lastPage>" + lastPage.replace("&", "&amp;") + "</hydra:lastPage>\n" +
+            "<hydra:lastPage>" + lastPage.replace("&", AMP_ENT) + "</hydra:lastPage>\n" +
             "<hydra:totalItems rdf:datatype=\"http://www.w3.org/2001/XMLSchema#integer\">" + numberMatched + "</hydra:totalItems>\n" +
-            ((prevFrom <= prevTo && prevFrom < from && prevTo < to) ? "<hydra:previousPage>" + previousPage.replace("&", "&amp;") + "</hydra:previousPage>\n" : "") +
-            ((nextFrom <= nextTo && from < nextFrom && to < nextTo) ? "<hydra:nextPage>" + nextPage.replace("&", "&amp;") + "</hydra:nextPage>\n" : "") +
-            "<hydra:firstPage>" + firstPage.replace("&", "&amp;") + "</hydra:firstPage>\n" +
+            ((prevFrom <= prevTo && prevFrom < from && prevTo < to) ? "<hydra:previousPage>" + previousPage.replace("&", AMP_ENT) + "</hydra:previousPage>\n" : "") +
+            ((nextFrom <= nextTo && from < nextFrom && to < nextTo) ? "<hydra:nextPage>" + nextPage.replace("&", AMP_ENT) + "</hydra:nextPage>\n" : "") +
+            "<hydra:firstPage>" + firstPage.replace("&", AMP_ENT) + "</hydra:firstPage>\n" +
             "<hydra:itemsPerPage rdf:datatype=\"http://www.w3.org/2001/XMLSchema#integer\">" + hitsPerPage + "</hydra:itemsPerPage>\n" +
             "</hydra:PagedCollection>";
         // Construct the RDF output

@fxprunayre fxprunayre modified the milestones: 4.2.5, 4.2.6 Jul 5, 2023
@sonarcloud

sonarcloud bot commented Aug 16, 2023

SonarCloud Quality Gate failed.

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 1 (rating A)

Coverage: 0.0%
Duplication: 0.0%

Warning: The version of Java (11.0.20) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.

Collaborator

@benoitregamey benoitregamey left a comment

Tested it out, works fine. Very useful for many users !

@josegar74 josegar74 merged commit 51de002 into main Aug 24, 2023
9 of 10 checks passed
@fxprunayre fxprunayre deleted the 425-csvapibyxpath branch September 19, 2023 14:47
4 participants