CSW mappings to capture persistent URL and licences for GA #2026

Merged
merged 15 commits into from Feb 12, 2019
1 change: 1 addition & 0 deletions CHANGES.md
@@ -39,6 +39,7 @@ Connectors:

 - Made the CSV connector put the description column in the distribution description, not just the dataset one
 - Fixed: CSW connector does not capture all distributions for some datasources (e.g. TERN)
+- Fixed: CSW connector does not capture persistent URL or licences for some sources (e.g. Geoscience Australia)

 Interoperability:

3 changes: 2 additions & 1 deletion deploy/helm/magda-dev.yml
@@ -195,7 +195,8 @@ connectors:
 name: magda-csw-connector
 id: ga
 name: Geoscience Australia
-sourceUrl: http://www.ga.gov.au/geonetwork/srv/en/csw
+sourceUrl: https://ecat.ga.gov.au/geonetwork/srv/eng/csw
+outputSchema: http://standards.iso.org/iso/19115/-3/mdb/1.0
 pageSize: 100
 - image:
 name: magda-project-open-data-connector
12 changes: 12 additions & 0 deletions docs/docs/connector-howto.md
@@ -0,0 +1,12 @@
+# How to design and maintain connectors for external systems
+
+Connectors are responsible for fetching metadata from external systems and converting their attributes
+into those represented by "aspects" in the MAGDA system.
+
+Most of the existing connectors are written in JavaScript and inherit from the [JSON Connector](https://github.com/magda-io/magda/blob/master/magda-typescript-common/src/JsonConnector.ts)
+and [JSON Transformer](https://github.com/magda-io/magda/blob/master/magda-typescript-common/src/Transformer.ts) base implementations.
+
+If the system you are working with does not use JSON (e.g. it serves XML), it is typical to convert its responses to a JSON representation first.
+
+When developing a new connector, it is useful to save some sample records from the source system and implement a [connector test](https://github.com/magda-io/magda/blob/master/magda-typescript-common/src/test/connectors/runConnectorTest.ts).
+Each aspect-template can then be tested and debugged using the [`debugger;` JavaScript statement for inline/eval script debugging](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/debugger).
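
Note on the `debugger;` tip: the aspect templates in this PR end in a top-level `return`, which suggests they are evaluated as function bodies. The sketch below is a hypothetical stand-in for that evaluation, not Magda's actual loader; `runTemplate`, the template text, and the sample record are made up for illustration. It shows where a `debugger;` statement would sit inside such a template.

```javascript
// Hypothetical stand-in for how a connector might evaluate an aspect template.
// The context keys become the variable names visible inside the template body.
function runTemplate(templateSource, context) {
    const names = Object.keys(context);
    const values = names.map(name => context[name]);
    const template = new Function(...names, templateSource);
    return template(...values);
}

// A made-up template body; real templates live in aspect-templates/*.js.
const templateSource = `
    // debugger;  // uncomment to pause here when a debugger is attached
    const title = dataset.json.title[0].CharacterString[0]._;
    return { title: title };
`;

// Sample record in the array-heavy shape used by the templates in this PR.
const result = runTemplate(templateSource, {
    dataset: { json: { title: [{ CharacterString: [{ _: "Sample record" }] }] } }
});
console.log(result.title);
```

With the `debugger;` line uncommented, running the file under `node inspect` pauses inside the template body, where `dataset` can be examined.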
1 change: 1 addition & 0 deletions docs/docs/index.md
@@ -3,6 +3,7 @@
 - [How to use APIs](/docs/using-api)
 - [How to document APIs](/docs/api-documentation-howto)
 - [How to build and run](/docs/building-and-running)
+- [How to design and maintain connectors for external systems](/docs/connector-howto)
 - [How to deploy an HA, production deployment on GKE](/docs/deploying-for-production-on-gke)
 - [Ports used when running locally](/docs/local-ports)
 - [Regression test](/docs/regression-test)
2 changes: 1 addition & 1 deletion magda-ckan-connector/Dockerfile
@@ -1,4 +1,4 @@
-FROM node:6
+FROM node:8

 RUN mkdir -p /usr/src/app
 COPY . /usr/src/app
2 changes: 1 addition & 1 deletion magda-csv-connector/Dockerfile
@@ -1,4 +1,4 @@
-FROM node:6
+FROM node:8

 RUN mkdir -p /usr/src/app
 COPY . /usr/src/app
2 changes: 1 addition & 1 deletion magda-csw-connector/Dockerfile
@@ -1,4 +1,4 @@
-FROM node:6
+FROM node:8

 RUN mkdir -p /usr/src/app
 COPY . /usr/src/app
20 changes: 18 additions & 2 deletions magda-csw-connector/aspect-templates/dataset-source.js
@@ -1,14 +1,30 @@
 const csw = libraries.csw;
 const jsonpath = libraries.jsonpath;

-const identifier = jsonpath.value(
+const urnIdentifier = jsonpath.value(
     dataset.json,
     "$..MD_Identifier[?(@.codeSpace[0].CharacterString[0]._=='urn:uuid')].code.._"
 );
+
+const gaDataSetURI = jsonpath.value(
+    jsonpath.nodes(
+        dataset.json,
+        "$..MD_Identifier[?(@.codeSpace[0].CharacterString[0]._=='ga-dataSetURI')]"
+    ),
+    "$.._"
+);
+
+const fileIdentifier = jsonpath.value(
+    dataset.json,
+    "$.fileIdentifier[*].CharacterString[*]._"
+);

 return {
     type: "csw-dataset",
-    url: csw.getRecordByIdUrl(identifier),
+    url:
+        gaDataSetURI ||
+        csw.getRecordByIdUrl(fileIdentifier) ||
+        csw.getRecordByIdUrl(urnIdentifier),
     id: csw.id,
     name: csw.name
 };
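
The `url` fallback order in this template can be exercised in isolation. This is a minimal sketch, assuming a hypothetical `getRecordByIdUrl` that returns `undefined` when no id is given; the real `csw.getRecordByIdUrl` may instead always return a string, in which case the last fallback would never be reached. The endpoint and query parameters are placeholders.

```javascript
// Hypothetical stand-in for csw.getRecordByIdUrl; the real helper may behave differently.
function getRecordByIdUrl(baseUrl, id) {
    if (!id) {
        return undefined; // assumption: no id means no URL
    }
    return (
        baseUrl +
        "?service=CSW&request=GetRecordById&id=" +
        encodeURIComponent(id)
    );
}

// Same preference order as the template: persistent URI, then fileIdentifier, then urn:uuid.
function pickSourceUrl(baseUrl, ids) {
    return (
        ids.gaDataSetURI ||
        getRecordByIdUrl(baseUrl, ids.fileIdentifier) ||
        getRecordByIdUrl(baseUrl, ids.urnIdentifier)
    );
}

const base = "https://example.org/csw"; // placeholder endpoint
console.log(
    pickSourceUrl(base, {
        gaDataSetURI: "http://pid.example.org/dataset/1",
        fileIdentifier: "abc"
    })
);
console.log(pickSourceUrl(base, { urnIdentifier: "u-1" }));
```

The `||` chain picks the first truthy value, so each helper in the chain needs to return something falsy when it has nothing to offer.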
54 changes: 35 additions & 19 deletions magda-csw-connector/aspect-templates/dcat-dataset-strings.js
@@ -40,8 +40,18 @@ const modifiedDate =

 const extent = jsonpath.query(identification, "$[*].extent[*].EX_Extent[*]");

+const responsibleParties = libraries.cswFuncs.getResponsibleParties(dataset);
+
 const datasetContactPoint = getContactPoint(
-    jsonpath.query(dataset.json, "$.contact[*].CI_ResponsibleParty[*]"),
+    jsonpath
+        .nodes(dataset.json, "$..CI_ResponsibleParty[*]")
+        .concat(
+            jsonpath.nodes(
+                dataset.json,
+                "$..CI_Responsibility[?(@.role[0].CI_RoleCode)]"
+            )
+        )
+        .map(x => x.value),
     true
 );
 const identificationContactPoint = getContactPoint(
@@ -67,16 +77,26 @@ const pointOfTruth = distNodes.filter(
         "Point of truth URL of this metadata record"
 );

-const responsibleParties = jsonpath.query(
-    dataset.json,
-    "$..CI_ResponsibleParty[*]"
+const publisher = libraries.cswFuncs.getOrganisationNameFromResponsibleParties(
+    libraries.cswFuncs.getPublishersFromResponsibleParties(responsibleParties)
 );
-const byRole = libraries.lodash.groupBy(responsibleParties, party =>
-    jsonpath.value(party, '$.role[*].CI_RoleCode[*]["$"].codeListValue.value')
+
+const urnIdentifier = jsonpath.value(
+    dataset.json,
+    "$..MD_Identifier[?(@.codeSpace[0].CharacterString[0]._=='urn:uuid')].code.._"
 );
-const datasetOrgs = byRole.publisher || byRole.owner || byRole.custodian || [];
-const publisher = getContactPoint(datasetOrgs, false);

+const gaDataSetURI = jsonpath.value(
+    jsonpath.nodes(
+        dataset.json,
+        "$..MD_Identifier[?(@.codeSpace[0].CharacterString[0]._=='ga-dataSetURI')]"
+    ),
+    "$.._"
+);
+const fileIdentifier = jsonpath.value(
+    dataset.json,
+    "$.fileIdentifier[*].CharacterString[*]._"
+);
 return {
     title: jsonpath.value(citation, "$[*].title[*].CharacterString[*]._"),
     description: jsonpath.value(
@@ -94,7 +114,7 @@ return {
             )
         )
         .filter((item, index, array) => array.indexOf(item) === index),
-    publisher: publisher,
+    publisher: publisher ? publisher : "",
     accrualPeriodicity: jsonpath.value(
         identification,
         '$[*].resourceMaintenance[*].MD_MaintenanceInformation[*].maintenanceAndUpdateFrequency[*].MD_MaintenanceFrequencyCode[*]["$"].codeListValue.value'
@@ -120,7 +140,8 @@ return {
         "$[*].descriptiveKeywords[*].MD_Keywords[*].keyword[*].CharacterString[*]._"
     ),
     contactPoint: contactPoint,
-    landingPage: jsonpath.value(pointOfTruth, "$[*].linkage[*].URL[*]._")
+    landingPage:
+        jsonpath.value(pointOfTruth, "$[*].linkage[*].URL[*]._") || gaDataSetURI
 };

 function findDatesWithType(dates, type) {
@@ -211,29 +232,24 @@ function getContactPoint(responsibleParties, preferIndividual) {

     const contactInfo = jsonpath.query(
         responsibleParties,
-        "$[*].contactInfo[*].CI_Contact[*]"
+        "$..contactInfo[*].CI_Contact[*]"
     );
     const individual = jsonpath.value(
         responsibleParties,
         "$[*].individualName[*].CharacterString[*]._"
     );
-    const organisation = jsonpath.value(
-        responsibleParties,
-        "$[*].organisationName[*].CharacterString[*]._"
+    const organisation = libraries.cswFuncs.getOrganisationNameFromResponsibleParties(
+        responsibleParties
     );
     const homepage = jsonpath.value(
         contactInfo,
         "$[*].onlineResource[*].CI_OnlineResource[*].linkage[*].URL[*]._"
     );
-    const address = jsonpath.query(
-        contactInfo,
-        "$[*].address[*].CI_Address[*]"
-    );
+    const address = jsonpath.query(contactInfo, "$..address[*].CI_Address[*]");
     const emailAddress = jsonpath.value(
         address,
         "$[*].electronicMailAddress[*].CharacterString[*]._"
     );
-
     const name = preferIndividual
         ? individual || organisation
         : organisation || individual;
52 changes: 50 additions & 2 deletions magda-csw-connector/aspect-templates/dcat-distribution-strings.js
@@ -20,10 +20,55 @@ const constraints = jsonpath.query(
 );
 const licenseName = jsonpath.value(constraints, "$[*].licenseName[*]._");
 const licenseUrl = jsonpath.value(constraints, "$[*].licenseLink[*]._");
-const license =
+let license =
     licenseName || licenseUrl
         ? [licenseName, licenseUrl].filter(item => item !== undefined).join(" ")
         : undefined;
+if (!license) {
+    const legalConstraints = jsonpath
+        .nodes(dataset.json, "$..MD_LegalConstraints[*]")
+        .map(node => {
+            return {
+                ...node,
+                title:
+                    jsonpath.value(node, "$..title[*].CharacterString[*]._") ||
+                    jsonpath.value(
+                        node,
+                        "$..otherConstraints[*].CharacterString[*]._"
+                    ),
+                codeListValue: jsonpath.value(node, "$..MD_RestrictionCode[0]")
+                    ? jsonpath.value(node, "$..MD_RestrictionCode[0]").$
+                          .codeListValue.value
+                    : undefined
+            };
+        });
+    // try looking for just creative commons licences
+    license = legalConstraints
+        .filter(
+            lc =>
+                lc.codeListValue == "license" &&
+                lc.title &&
+                lc.title.search(
+                    /Creative Commons|CC |BY|Attribution|creativecommons/
+                )
+        )
+        .map(lc => {
+            return lc.title;
+        })
+        .join(" ");
+
+    if (!license) {
+        license = legalConstraints
+            .filter(lc => lc.codeListValue == "license" && lc.title)
+            .map(lc => {
+                return lc.title;
+            })
+            .join(" ");
+    }
+    if (license.length === 0) {
+        license = undefined;
+    }
+}
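
A caution on the Creative Commons filter above: `String.prototype.search` returns a match index, and the no-match value `-1` is truthy while a match at index `0` is falsy, so it is an unreliable filter predicate. The sketch below shows the same two-stage fallback (Creative Commons titles first, then any licence title) using `RegExp.prototype.test` instead. `pickLicense` and the flattened `{ codeListValue, title }` objects are hypothetical simplifications, not the connector's code.

```javascript
const CC_PATTERN = /Creative Commons|CC |BY|Attribution|creativecommons/;

// legalConstraints: hypothetical, already-flattened { codeListValue, title } objects.
function pickLicense(legalConstraints) {
    const licences = legalConstraints.filter(
        lc => lc.codeListValue === "license" && lc.title
    );
    // Prefer Creative Commons titles; RegExp.test returns a real boolean.
    const creativeCommons = licences.filter(lc => CC_PATTERN.test(lc.title));
    const chosen = creativeCommons.length > 0 ? creativeCommons : licences;
    const joined = chosen.map(lc => lc.title).join(" ");
    return joined.length > 0 ? joined : undefined;
}

// Made-up sample data for illustration only.
const sample = [
    { codeListValue: "license", title: "Creative Commons Attribution 4.0" },
    { codeListValue: "license", title: "Internal use only" },
    { codeListValue: "copyright", title: "Commonwealth of Australia" }
];
console.log(pickLicense(sample));

// The pitfall: search() reports "no match" as -1, which a filter treats as true.
console.log("Internal use only".search(CC_PATTERN));
```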
 const rights = jsonpath.value(
     constraints,
     "$[*].MD_LegalConstraints[*].useLimitation[*].CharacterString[*]._"
@@ -34,7 +79,10 @@ const description = jsonpath.value(
     distribution,
     "$.description[*].CharacterString[*]._"
 );
-const url = jsonpath.value(distribution, "$.linkage[*].URL[*]._");
+const url =
+    jsonpath.value(distribution, "$.linkage[" + "*].URL[*]._") ||
+    jsonpath.value(distribution, "$.linkage[" + "*].CharacterString[*]._");
+
 let format = jsonpath.value(distribution, "$.protocol[*].CharacterString[*]._");

 if (!format) {
20 changes: 18 additions & 2 deletions magda-csw-connector/aspect-templates/distribution-source.js
@@ -1,14 +1,30 @@
 const csw = libraries.csw;
 const jsonpath = libraries.jsonpath;

-const identifier = jsonpath.value(
+const urnIdentifier = jsonpath.value(
     dataset.json,
     "$..MD_Identifier[?(@.codeSpace[0].CharacterString[0]._=='urn:uuid')].code.._"
 );
+
+const gaDataSetURI = jsonpath.value(
+    jsonpath.nodes(
+        dataset.json,
+        "$..MD_Identifier[?(@.codeSpace[0].CharacterString[0]._=='ga-dataSetURI')]"
+    ),
+    "$.._"
+);
+
+const fileIdentifier = jsonpath.value(
+    dataset.json,
+    "$.fileIdentifier[*].CharacterString[*]._"
+);

 return {
     type: "csw-distribution",
-    url: csw.getRecordByIdUrl(identifier),
+    url:
+        gaDataSetURI ||
+        csw.getRecordByIdUrl(fileIdentifier) ||
+        csw.getRecordByIdUrl(urnIdentifier),
     id: csw.id,
     name: csw.name
 };
30 changes: 18 additions & 12 deletions magda-csw-connector/aspect-templates/organization-details.js
@@ -2,42 +2,48 @@ const cleanOrgTitle = libraries.cleanOrgTitle;

 const name = transformer.getNameFromJsonOrganization(organization);
 const jsonpath = libraries.jsonpath;
-const phone = jsonpath.value(
-    organization,
-    "$.contactInfo[*].CI_Contact[*].phone[*].CI_Telephone[*].voice[0].CharacterString[0]._"
-);
+const phone =
+    jsonpath.value(
+        organization,
+        "$..contactInfo[*].CI_Contact[*].phone[*].CI_Telephone[*].voice[0].CharacterString[0]._"
+    ) ||
+    jsonpath.value(
+        organization,
+        "$..contactInfo[*].CI_Contact[*].phone[*].CI_Telephone[?(@.numberType[0].CI_TelephoneTypeCode[0][\"$\"].codeListValue.value=='voice')].number[0].CharacterString[0]._"
+    );
+
 const website = jsonpath.value(
     organization,
-    "$.contactInfo[*].CI_Contact[*].onlineResource[*].CI_OnlineResource[*].linkage[*].URL[0]._"
+    "$..contactInfo[*].CI_Contact[*].onlineResource[*].CI_OnlineResource[*].linkage[*].URL[0]._"
 );
 const email = jsonpath.value(
     organization,
-    "$.contactInfo[*].CI_Contact[*].address[*].CI_Address[*].electronicMailAddress[*].CharacterString[0]._"
+    "$..contactInfo[*].CI_Contact[*].address[*].CI_Address[*].electronicMailAddress[*].CharacterString[0]._"
 );
 const addrStreet = jsonpath.value(
     organization,
-    "$.contactInfo[*].CI_Contact[*].address[*].CI_Address[*].deliveryPoint[*].CharacterString[0]._"
+    "$..contactInfo[*].CI_Contact[*].address[*].CI_Address[*].deliveryPoint[*].CharacterString[0]._"
 );
 const addrSuburb = jsonpath.value(
     organization,
-    "$.contactInfo[*].CI_Contact[*].address[*].CI_Address[*].city[*].CharacterString[0]._"
+    "$..contactInfo[*].CI_Contact[*].address[*].CI_Address[*].city[*].CharacterString[0]._"
 );
 const addrState = jsonpath.value(
     organization,
-    "$.contactInfo[*].CI_Contact[*].address[*].CI_Address[*].administrativeArea[*].CharacterString[0]._"
+    "$..contactInfo[*].CI_Contact[*].address[*].CI_Address[*].administrativeArea[*].CharacterString[0]._"
 );
 const addrPostCode = jsonpath.value(
     organization,
-    "$.contactInfo[*].CI_Contact[*].address[*].CI_Address[*].postalCode[*].CharacterString[0]._"
+    "$..contactInfo[*].CI_Contact[*].address[*].CI_Address[*].postalCode[*].CharacterString[0]._"
 );
 let addrCountry = jsonpath.value(
     organization,
-    "$.contactInfo[*].CI_Contact[*].address[*].CI_Address[*].country[*].CharacterString[0]._"
+    "$..contactInfo[*].CI_Contact[*].address[*].CI_Address[*].country[*].CharacterString[0]._"
 );
 if (!addrCountry) {
     addrCountry = jsonpath.value(
         organization,
-        "$.contactInfo[*].CI_Contact[*].address[*].CI_Address[*].country[*].Country[0]._"
+        "$..contactInfo[*].CI_Contact[*].address[*].CI_Address[*].country[*].Country[0]._"
     );
 }
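
The switch from `$.contactInfo` to `$..contactInfo` in this file relies on JSONPath recursive descent, which matches a key at any depth rather than only at the root. The plain-JavaScript sketch below imitates that behaviour without the `jsonpath` library; `findAll` and both sample records are hypothetical and heavily trimmed, so the exact nesting in real records is an assumption.

```javascript
// Collect every value stored under `key`, at any depth (roughly what "$..key" selects).
function findAll(node, key, found = []) {
    if (Array.isArray(node)) {
        node.forEach(item => findAll(item, key, found));
    } else if (node !== null && typeof node === "object") {
        Object.keys(node).forEach(k => {
            if (k === key) {
                found.push(node[k]);
            }
            findAll(node[k], key, found);
        });
    }
    return found;
}

// Hypothetical record with contactInfo directly on the party.
const flatParty = { contactInfo: [{ CI_Contact: [{ phone: "a" }] }] };

// Hypothetical record with contactInfo nested further down.
const nestedParty = {
    party: [
        { CI_Organisation: [{ contactInfo: [{ CI_Contact: [{ phone: "b" }] }] }] }
    ]
};

// A root-level lookup only works for the flat shape.
console.log(flatParty.contactInfo !== undefined);
console.log(nestedParty.contactInfo !== undefined);

// A recursive lookup finds contactInfo in both shapes.
console.log(findAll(flatParty, "contactInfo").length);
console.log(findAll(nestedParty, "contactInfo").length);
```

The trade-off is that recursive descent can also match unrelated keys of the same name deeper in the record, so the rest of the path still needs to be specific.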
