connect MIMS catalogue as ODIS Node #250

Open
jmckenna opened this issue Jun 19, 2023 · 14 comments

@jmckenna
Contributor

jmckenna commented Jun 19, 2023

@marksparkza
Contributor

An initial implementation is now complete and scheduled to go live on 8 Nov.

The sitemap.xml index file will be located at:
https://data.ocean.gov.za/mims/catalog/sitemap.xml
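
For anyone wanting to check the sitemap before a harvest, here is a minimal sketch of listing its `<loc>` entries with the Python standard library. It assumes the file follows the sitemaps.org 0.9 schema; the helper name `sitemap_locs` and the sample XML are illustrative only and are not taken from the MIMS deployment.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text: str) -> list:
    """Return every <loc> URL in a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    # <loc> sits under <url> in a urlset and under <sitemap> in an index,
    # so searching all descendants covers both layouts.
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Illustrative sample only; a real check would read the published file.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://data.ocean.gov.za/mims/catalog/10.15493/DEA.MIMS.01232023</loc>
  </url>
</urlset>"""

print(sitemap_locs(SAMPLE))
```

Comparing `len(sitemap_locs(...))` with the number of records a harvester reports is a quick way to spot records that were skipped.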

Following is an example of a JSON-LD record that will be embedded in the record detail view:

{
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "url": "https://data.ocean.gov.za/mims/catalog/10.15493/DEA.MIMS.01232023",
    "name": "Processed underway Thermosalinograph (TSG) observations from the Integrated Ecosystem Programme: Southern Benguela (IEP:SB) on the Algoa Voyage 279, February 2022",
    "identifier": "doi:10.15493/DEA.MIMS.01232023",
    "license": "https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode",
    "keywords": [
        "Algoa",
        "Algoa 279",
        "SOUTH ATLANTIC OCEAN",
        "THERMOSALINOGRAPH",
        "TSG",
        "physical oceanography"
    ],
    "description": "Here we present the 6-second resolution processed Thermosalinograph (TSG) data collected during the Integrated Ecosystem Programme: Southern Benguela (IEP:SB) cruise on the Algoa Voyage 279 between 04 February and 12 February 2022. A SeaBird SBE45 Thermosalinograph (TSG) is used to opportunistically collect underway near-surface temperature and conductivity measurements during research and monitoring cruises. Water is continuously pumped to the TSG from an intake located in the hull of the vessel, and the observations are continuously interfaced with navigational information. A temperature sensor close to the intake provides temperature measurements of the incoming water (T1). The temperature of the water inside the conductivity cell (T2) is used to accurately compute salinity (S) from the conductivity measurements (C). The IEP:SB in 2013 consolidated a long-term, multi-decadal time-series (from 1951 onward) of information for this important region and has continued monitoring in the form of the IEP:SB. The programme is a multi-disciplinary, collaborative and capacity building platform undertaking relevant science, including updating technology, with the aim to develop ecosystem indicators that can be used to effectively monitor and understand the Southern Benguela. These include physical, chemical, planktonic, microbial, seabird, marine mammal, benthic and pollution (plastic) ecosystem indicators as required by ecosystem-based management regarding the following priorities: ocean warming, ocean acidification, trophic functioning, pollution and water quality. It is on-going monitoring programme."
}
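
How the record is embedded in the page is not shown in this thread. A common convention is a `<script type="application/ld+json">` element in the HTML; the sketch below assumes that convention, and the helper name `jsonld_script_tag` is hypothetical.

```python
import json


def jsonld_script_tag(record: dict) -> str:
    """Serialize a JSON-LD record into a script element for an HTML page."""
    payload = json.dumps(record, indent=2, ensure_ascii=False)
    # Escape "</" so that a string value cannot close the script element
    # early; "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Abbreviated copy of the example record above.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "url": "https://data.ocean.gov.za/mims/catalog/10.15493/DEA.MIMS.01232023",
    "identifier": "doi:10.15493/DEA.MIMS.01232023",
}
print(jsonld_script_tag(record))
```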

@jmckenna
Contributor Author

jmckenna commented Nov 6, 2023

@marksparkza looks good, thanks for this update. Comments:

  • the top-level @id property is also very important to include. So if your new record pages embed the JSON-LD (as @id needs to resolve to the actual JSON-LD), then an example could be:
      "@context": "https://schema.org/",
      "@type": "Dataset",
      "@id": "https://data.ocean.gov.za/mims/catalog/10.15493/DEA.MIMS.01232023",
    
  • For spatial, we recommend using the spatialCoverage property, such as:
    "spatialCoverage": {
        "@type": "Place",
        "geo": {
            "@type": "GeoShape",
            "polygon": "142.014 10.161667,142.014 18.033833,147.997833 18.033833,147.997833 10.161667,142.014 10.161667"
        }
    },

You can use box (bounding box) instead of polygon if you wish.

thanks.

@marksparkza
Contributor

@jmckenna Thanks for the feedback and additional info. I'll add in @id and spatialCoverage. Would it be beneficial to also include temporalCoverage?

Should @id be included instead of, or in addition to, url - seeing as they would have the same value?

@jmckenna
Contributor Author

jmckenna commented Nov 6, 2023

@marksparkza here are some more comments:

  • please include both @id and url, even if same value
  • strongly encouraged to include the license property for each record (oops you already have it sorry!)
  • if it exists for the dataset, including distribution is very useful to end-users & search engines (the download url for the data), such as
      "distribution": {
          "@type": "DataDownload",
          "contentUrl": "http://urlToDirectDownloadOfThisDataset.org/",
          "encodingFormat": "text/csv"
      },
    
  • temporalCoverage is optional, but see an example here

@marksparkza
Contributor

@jmckenna I've included @id in tomorrow's update. spatialCoverage and temporalCoverage will be added in the near future. I'm not sure if we can include distribution. There are some terms-of-use considerations which I will need to discuss with our data curation team. I'll let you know once I have a verdict on this.

@jmckenna
Contributor Author

jmckenna commented Nov 7, 2023

@marksparkza ok thanks, will re-harvest tomorrow or Thursday. (on our side, spatialCoverage is very important, as then we can discover your records through a spatial search)

@marksparkza
Contributor

@jmckenna The changes are now live 🎉 Please let me know if you encounter any issues when harvesting.

Noted re spatialCoverage. I just wanted to take a bit more time to assess how best to implement this. All our spatial extents are represented as bounding boxes, so at first glance box would be the obvious choice. However, the schema.org definition of box is rather vague, so polygon might be the better choice. Either way, I want to make sure our implementation is consistent with ODIS and Google expectations.

@marksparkza
Contributor

The spatialCoverage example above (from OIH documentation) gives points in lon-lat order.

However, Google says that points must be in lat-lon order.

Science on Schema also says that points must be in lat-lon order.

There is an open issue on schema.org for the ordering of lats and lons in points.

@marksparkza
Contributor

On closer inspection, I think the OIH polygon example might be invalid.

Here is schema.org's description of GeoShape, which does suggest lat-lon ordering:

The geographic shape of a place. A GeoShape can be described using several properties whose values are based on latitude/longitude pairs. Either whitespace or commas can be used to separate latitude and longitude; whitespace should be used when writing a list of several such points.

In the OIH example, if the commas are taken to be lat-lon separators as described for GeoShape, and the spaces are taken to be point delimiters, then the first and last terms (142.014 and 10.161667) are invalid. According to Google and Science on Schema, this polygon (if those terms are dropped/ignored) describes a diagonal line between 10N,142E and 18N,148E.
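
The reading described above can be checked with a short sketch. It assumes exactly that interpretation (whitespace delimits points, a comma separates latitude from longitude) and is not a reference parser for schema.org; `parse_geoshape_points` is a hypothetical helper.

```python
def parse_geoshape_points(text: str):
    """Split a GeoShape value into (lat, lon) points and leftover tokens.

    Whitespace delimits points; a comma separates latitude from longitude.
    Tokens that are not a "lat,lon" pair are returned separately.
    """
    points, invalid = [], []
    for token in text.split():
        parts = token.split(",")
        if len(parts) == 2:
            try:
                points.append((float(parts[0]), float(parts[1])))
                continue
            except ValueError:
                pass  # fall through and record the token as invalid
        invalid.append(token)
    return points, invalid


# The polygon string from the OIH example quoted earlier in this issue.
OIH_POLYGON = (
    "142.014 10.161667,142.014 18.033833,147.997833 "
    "18.033833,147.997833 10.161667,142.014 10.161667"
)

points, invalid = parse_geoshape_points(OIH_POLYGON)
print(invalid)      # tokens that do not form a lat,lon pair
print(set(points))  # distinct points that remain
```

Under this reading the first and last tokens are left over, and only two distinct points remain, which is the diagonal line described above.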

@jmckenna
Contributor Author

jmckenna commented Nov 8, 2023

@marksparkza Yes I am very familiar with all of the links and discussion that you are referring to, as I have had to examine and explain this issue to so many OIH partners. Here is some clarification:

  • schema.org expects lat,long coordinate pairs (y,x)
    • this is opposite of most geo software (Shapely, etc.) which expects long,lat (x,y)
  • GeoShape box is recommended, to avoid polygon issues like you mentioned, with a syntax of "box": "miny minx maxy maxx"
                "@type": "GeoShape",
                "box": "46.37 -147.525 54.5617 -125.4467"
    
    • even though, as you know, in all geo software bounding box is always minx, miny, maxx, maxy

It may not answer all of your questions, but these are the guidelines (which now work for both ODIS and Google Dataset Search) that I share with partners (even though, as you said, the schema.org documentation is not clear).
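
To make the reordering concrete, here is a minimal sketch that turns a conventional (minx, miny, maxx, maxy) bounding box into the "miny minx maxy maxx" string described above. The function names are hypothetical, and the latitude check is an added assumption rather than something ODIS or schema.org prescribes.

```python
def to_schema_org_box(minx: float, miny: float, maxx: float, maxy: float) -> str:
    """Reorder (minx, miny, maxx, maxy) into "miny minx maxy maxx"."""
    # Assumption: latitudes are in degrees and given south to north.
    if not -90 <= miny <= maxy <= 90:
        raise ValueError("expected -90 <= miny <= maxy <= 90")
    return f"{miny} {minx} {maxy} {maxx}"


def spatial_coverage(minx: float, miny: float, maxx: float, maxy: float) -> dict:
    """Build a spatialCoverage value that uses a GeoShape box."""
    return {
        "@type": "Place",
        "geo": {
            "@type": "GeoShape",
            "box": to_schema_org_box(minx, miny, maxx, maxy),
        },
    }


# The same extent as the box example above, given in x,y order.
print(spatial_coverage(-147.525, 46.37, -125.4467, 54.5617))
```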

@marksparkza
Contributor

@jmckenna Thanks for clarifying the box format. This probably is the better option for us as it maps directly from the bounding boxes in our metadata records.

I've created a PR #364 to address the incorrect formatting of polygon in the OIH examples.

@marksparkza
Contributor

@jmckenna Feedback on distribution: our curators suggested that users should rather be redirected back to our catalogue than provided with direct download links.

spatialCoverage and temporalCoverage will be included in the next update on Wednesday.

@marksparkza
Contributor

@jmckenna I'm following up to inquire about the status of connecting the MIMS catalogue as an ODIS node. Our MIMS datasets don't seem to be available as yet on OIH, so I was wondering whether anything is still needed from our side in terms of the sitemap or JSON-LD that we are publishing?

jmckenna added a commit that referenced this issue Jun 11, 2024
@jmckenna
Contributor Author

@marksparkza did some more testing on your endpoint inside ODIS, here is some feedback:

  • found 1981 records in your sitemap
  • 1407 records were indexed in ODIS as type Dataset
  • records appear well on a dev instance of the ODIS front-end search
    • temporarily you can see them here
    • a Spatial Search shows the geometry correctly

[Image: mims-spatial-records]

  • missing an ODISCat entry for this endpoint, so I will contact Tshikana, Bubele now

jmckenna added a commit that referenced this issue Jun 11, 2024