
connect ZMT catalogue as ODIS node #276

Open

jmckenna opened this issue Jul 11, 2023 · 26 comments

@jmckenna
Contributor

jmckenna commented Jul 11, 2023

Summary:

  • existing PANGAEA catalogue
  • ZMT team are mapping metadata properties through the ODIS Book examples
  • possible JSON-LD templates to also use:
  • ZMT team will inform OIH team when sitemap.xml or JSON-LD is ready
  • ZMT team should also create an entry inside the ODIS Catalogue
    • important fields are the Startpoint URL for ODIS-Arch (the URL to your sitemap) and the Type of the ODIS-Arch URL (choose "sitemap"); a minimal sitemap sketch follows this list
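For reference, a sitemap of the kind ODIS harvests could be as minimal as the following (a sketch only; the URL is a hypothetical placeholder):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://dataportal.example.org/oih/dataset_1.html</loc>
    <lastmod>2023-07-11</lastmod>
  </url>
</urlset>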

This issue will allow questions, updates, and discussions by both teams.

cc @fils @pbuttigieg

@acwittmann

Hi @jmckenna @fils @pbuttigieg @fspreck
my colleague @uschindler from PANGAEA tested implementing "Event" in the JSON-LD of PANGAEA datasets, as in the ODIS Book example; see
https://doi.pangaea.de/10.1594/PANGAEA.948712?format=metadata_jsonld&incubation=true
It seems Google Search is quite particular when it comes to the term "Event": he promptly received the following error message. We may have to stick with temporal and spatial coverage, unless we (ZMT & OIH) decide we do not need to worry about Google.

Problems of type "Structured Data Events" detected on doi.pangaea.de

To the owner of doi.pangaea.de:

The Search Console has identified that your website is affected by 13 problem(s) of type "Structured Data Events". The following problems have been found on your website. We recommend addressing these issues, if possible, to ensure optimal functioning and high visibility in Google search results.
Most common critical issues*

Missing "startDate" field

Missing "location" field

*Critical issues prevent a page or feature from appearing in search results.
Most common non-critical issues‡

Missing "offers" field

Missing "performer" field

Missing "eventAttendanceMode" field

Missing "eventStatus" field

Missing "image" field

‡Non-critical issues are suggestions for improvement. They do not prevent a page or feature from appearing in Google search results. Some non-critical issues may negatively impact the display in search results, while others may be escalated to critical issues later on.

@uschindler

To add more information: the problem comes from "Event" having more than one meaning in English. Schema.org uses it in the sense of the German word "Veranstaltung" (a staged, artistic event), not the abstract "Ereignis" (a generic event, as in PANGAEA).

The problem with Google interpreting the "subjectOf" relation is that the dataset is then linked to an artistic event. Google extracts multiple events from each dataset and wants to publish them separately from the dataset as "artistic events", so in the end it works like "user searches for a movie name and Google presents events related to it". Google extracts all events from a given page (in our case a dataset) because cinema homepages usually list the events for a specific hall, so for datasets it likewise expects multiple events as separate entities.

Since PANGAEA wants to prevent its events from being shown as artistic events in Google search, we have to stop adding events to our schema.org markup; it is the wrong entity type.
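For illustration, the problematic pattern looks roughly like this (a sketch, not PANGAEA's exact markup): a Dataset whose subjectOf points at a schema:Event, which Google then treats as a standalone staged event and flags for missing startDate, location, and so on.

{
 "@context": "https://schema.org/",
 "@type": "Dataset",
 "name": "Example PANGAEA-style dataset",
 "subjectOf": {
  "@type": "Event",
  "name": "Sampling campaign (a generic occurrence, not a staged event)"
 }
}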

P.S.: I am in contact with Natasha Noy regarding this.

@TimmFitschen

TimmFitschen commented Sep 20, 2023

@jmckenna Hi, I am preparing the sitemap and JSON-LD resources. The documentation says that the crawlers expect a script tag inside an HTML document.

<script type="application/ld+json">JSON_LD content</script>

Is it possible to point the crawler at a JSON-LD file directly? I know how to do this in the sitemap; the question is rather whether the crawler will accept that as well, or whether we need the "detour" via the HTML document.
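To make it concrete (hypothetical URLs), the sitemap entry could point either straight at the JSON-LD file or at an HTML landing page that embeds it:

<url><loc>https://dataportal.example.org/oih/dataset_123.json</loc></url>
<url><loc>https://dataportal.example.org/oih/dataset_123.html</loc></url>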

@jmckenna
Contributor Author

@TimmFitschen if you're asking just about Google and other search engines, they expect the JSON-LD to be inline only (see the related StackExchange thread). But I believe ODIS itself will accept it (@fils can you confirm?).

@uschindler

Hi,
Yes, the source must be inside the script tag and therefore in the HTML. Technically it would be correct to add an href attribute linking to a URL; this would be better for mobile browsers, as the transfer size gets smaller, but according to the documentation it is not allowed.

I am in contact with Google; maybe there will be a change. An easy way to find out is simply to test it: after setting it up with an href link, you can run the Google structured data analyzer.

P.S.: PANGAEA also delivers the schema.org JSON-LD when you do content negotiation on the landing page using the Accept header (see signposting.org). The F-UJI FAIR checker also uses content negotiation, if available.
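For example, such a signposting-style request against a landing page might look like this (a sketch; the exact Accept media type honoured may differ):

GET /10.1594/PANGAEA.948712 HTTP/1.1
Host: doi.pangaea.de
Accept: application/ld+json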

@jmckenna
Contributor Author

@uschindler interesting; I'm curious about Google's updated view on this, keep us posted.

@fspreck-indiscale

Hi,

we now have the sitemap with all public datasets online. Is it possible to do a crawler test run before we enter it into the ODIS Catalogue?

@fspreck-indiscale

Hi @jmckenna, we updated our JSONs according to what we discussed last time. Can you run your tests again against the sitemap and check whether

  • @id looks fine
  • publisher is now the correct property and is located correctly within the JSON
  • spatialCoverage now works as an array of Places, each specified by a GeoCoordinates object, instead of the boxes we had before (see the sketch after this list)
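As a sketch of that form (coordinates invented for illustration):

 "spatialCoverage": [
  {
   "@type": "Place",
   "geo": {
    "@type": "GeoCoordinates",
    "latitude": -5.45,
    "longitude": 39.18
   }
  },
  {
   "@type": "Place",
   "geo": {
    "@type": "GeoCoordinates",
    "latitude": -5.47,
    "longitude": 39.20
   }
  }
 ]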

Thank you!

@jmckenna
Contributor Author

updates since today's meeting:

  • we now handle the type:GeoCoordinates as points in the ODIS front-end spatial search (see screen capture below of the ZMT spatial records)

[screen capture: zmt-geocoordinates — ZMT records shown as points in the ODIS spatial search]

@uschindler

Hi,

This PANGAEA one has no spatial coverage. That's not an issue in your portal.

@jmckenna
Contributor Author

jmckenna commented Nov 27, 2023

@uschindler today in the meeting I had mentioned that some records in the ZMT sitemap do not have spatialCoverage and the reaction from the ZMT team I believe was "all records should have spatialCoverage", so, I am not understanding your response.

@uschindler

uschindler commented Nov 27, 2023

> @uschindler today in the meeting I had mentioned that some records in the ZMT sitemap do not have spatialCoverage and the reaction from the ZMT team I believe was "all records should have spatialCoverage", so, I am not understanding your response.

The problem is that the link posted is about data harvested from PANGAEA: https://dataportal.leibniz-zmt.de/Entity/19378; this entry refers to this PANGAEA dataset: https://doi.org/10.1594/PANGAEA.890177

This one has no spatial coverage and never will; that is correct. If you harvest PANGAEA, you have to live with the fact that datasets may not have a coverage. I won't explain here why no coverage is available, but in short: it is not mandatory, and for this dataset there is no way to provide one. It simply has none.

@fspreck-indiscale

@jmckenna @uschindler, sorry, that was too bold a claim, then. And it will become even less true in the future, unfortunately, once we include more non-PANGAEA datasets in the portal -- they will most probably have no geo information at all.

@fspreck-indiscale

@jmckenna How does your frontend treat entries like https://dataportal.leibniz-zmt.de/Entity/18288 (see view-source:https://dataportal.leibniz-zmt.de/oih/dataset_18288.html for the JSON) where we have an array of places in the spatial coverage? You should see far more points than datasets if you show the full array on your map (~900 locations vs ~150 datasets).

@jmckenna
Contributor Author

@fspreck-indiscale good point: we don't handle a list of GeoCoordinates yet (we only use the first point), but we should. Thanks for reporting this.

@fspreck-indiscale

@jmckenna Hi, I just updated the JSONs again; they now have sdPublisher and creditText.
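Roughly like this (a sketch; the organization name matches the portal, but the citation text is a placeholder):

 "sdPublisher": {
  "@type": "Organization",
  "name": "Leibniz Centre for Tropical Marine Research (ZMT)"
 },
 "creditText": "Example et al. (2023): Example dataset. ZMT data portal."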

@jmckenna
Contributor Author

jmckenna commented Dec 1, 2023

thanks @fspreck-indiscale, will do another harvest here...

@fspreck-indiscale

@jmckenna We added keywords (simple array of strings for now) to some of the datasets; do they look good after harvesting?

The schema.org validator passes.

@jmckenna
Contributor Author

updates from meeting on 2024-01-15:

@jmckenna
Contributor Author

jmckenna commented Apr 5, 2024

@fspreck-indiscale thanks for updating the keywords syntax. I notice that some have odd characters inside the JSON-LD, such as this record:

 "keywords": [
  "coral climatology",
  "oxygen isotope",
  "trace elements ratio",
  "\u03b418Oseawater"
 ],

@fspreck-indiscale

Hi @jmckenna, good point, we had not considered these characters so far. Escaping non-ASCII is the exporter's safe default, but it is by no means necessary (on the landing page it is UTF-8 "δ"). May we use UTF-8 strings in the JSON-LD?
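Both spellings parse to the same string: JSON allows raw UTF-8 as long as the document is encoded as UTF-8, so the following are equivalent:

 "keywords": ["\u03b418Oseawater"]
 "keywords": ["δ18Oseawater"]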

@jmckenna
Contributor Author

jmckenna commented Apr 5, 2024

Hi @fspreck-indiscale, in fact on the ODIS search front-end it appears as follows, so I think it is OK to use these Unicode characters. (Does that keyword look OK to you in this screen capture?)

[screen capture: unicode — keyword "δ18Oseawater" rendered on the ODIS search front-end]

@jmckenna
Contributor Author

jmckenna commented Apr 5, 2024

@fspreck-indiscale the ZMT records (201) are now on the production server ( https://oceaninfohub.org/ ).

There is an issue on our side, however: the "Provider" facet lists 2 different providers for your records, "Leibniz Centre for Tropical Marine Research, Bremen, Germany" and "Leibniz Center for Tropical Marine Research (ZMT)" (the second one comes from the name in the ODIS config). It seems the provider name in the JSON-LD and the prov:wasAttributedTo name are both being used here for some reason (again, this is a problem on our front-end/indexing side).

Here is the harvested JSON-LD example: https://api.search.oceaninfohub.org/source?id=https%3A%2F%2Fdataportal.leibniz-zmt.de%2Foih%2Fdataset_19754.html

[screen capture: zmt1 — Provider facet showing both provider names]

@pbuttigieg @fils can you see the source of the problem here?

@jmckenna
Contributor Author

jmckenna commented Apr 5, 2024

More info: the records harvested inside Solr (search index) contain only one provider:

"txt_provider":["Leibniz Centre for Tropical Marine Research, Bremen, Germany"]

This is puzzling.

@jmckenna
Contributor Author

jmckenna commented Apr 5, 2024

Ah, it could be that no other partner sets "provider" to themselves. CIOOS uses "provider" to point to the regional partner who 'provides' the catalogue (such as CIOOS-Atlantic or CIOOS-Pacific).

Example JSON-LD for CIOOS record: https://api.search.oceaninfohub.org/source?id=https%3A%2F%2Fcatalogue.cioos.ca%2Fdataset%2F777530f0-adaf-4ddb-86bb-6f1269dcb259.jsonld

I'd need @pbuttigieg @fils to clarify what the correct use of "provider" should be.

Maybe we should set up another ZMT-ODIS technical meeting in the next two weeks to examine this together.

@pbuttigieg
Collaborator

From the node's perspective, the provider should be the entity that provided the node with the data that the JSON-LD record is about.

They have sdPublisher for identifying the entity that created the JSON-LD.
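Under that reading, a ZMT record describing data originally harvested from PANGAEA might carry both properties, along these lines (a sketch only, not a prescribed template):

 "provider": {
  "@type": "Organization",
  "name": "PANGAEA"
 },
 "sdPublisher": {
  "@type": "Organization",
  "name": "Leibniz Centre for Tropical Marine Research (ZMT)"
 }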
