connect ZMT catalogue as ODIS node #276
Comments
Hi @jmckenna @fils @pbuttigieg @fspreck

Problems of type "Structured Data Events" detected on doi.pangaea.de

To the owner of doi.pangaea.de: the Search Console has identified that your website is affected by 13 problem(s) of type "Structured Data Events". The following problems have been found on your website. We recommend addressing these issues, if possible, to ensure optimal functioning and high visibility in Google search results.

Critical issues*:
- Missing "startDate" field
- Missing "location" field

*Critical issues prevent a page or feature from appearing in search results.

Non-critical issues‡:
- Missing "offers" field
- Missing "performer" field
- Missing "eventAttendanceMode" field
- Missing "eventStatus" field
- Missing "image" field

‡Non-critical issues are suggestions for improvement. They do not prevent a page or feature from appearing in Google search results. Some non-critical issues may negatively impact the display in search results, while others may be escalated to critical issues later on.
To add more information: the problem comes from the English word "Event" having more than one meaning. In Schema.org it is used in the sense of the German "Veranstaltung" (an artistic or staged event), not the abstract "Ereignis" (a generic occurrence, as in PANGAEA). Because Google interprets the "subjectOf" relation this way, the dataset is now linked to an artistic event. Google extracts the events from each dataset and wants to publish them separately from the dataset as "artistic events", so in the end it works like "user searches for a movie name and Google presents events related to it". They extract all events from a given page (in our case a dataset) because in most cases cinema homepages list the events for a specific cinema hall, so for datasets they likewise expect multiple events as separate entities. As PANGAEA wants to prevent its events from being shown as artistic events in Google search, we have to stop adding events to our schema.org markup; it is the wrong entity type.

P.S.: I am in contact with Natasha Noy regarding this.
@jmckenna Hi, I am preparing the sitemap and JSON-LD resources. The documentation says that the crawlers expect a script tag inside an HTML document: <script type="application/ld+json">JSON_LD content</script>. Is it possible to direct the crawler to a JSON-LD file directly? I know how to do this in the sitemap; the question is rather whether the crawler will accept that as well, or whether we need the "detour" via the HTML document.
@TimmFitschen if you're asking just about Google and other search engines, they expect the JSON-LD to be inline only (see the related StackExchange thread). But I believe ODIS itself will accept it (@fils can you confirm?).
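For reference, the inline form that search engines expect can be sketched as follows; the record fields here are illustrative placeholders, not actual ZMT or PANGAEA metadata:

```python
import json

# Hypothetical schema.org Dataset record (illustrative values only).
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example coral dataset",
    "description": "Illustrative record for inline JSON-LD embedding.",
}

# Search engines expect the JSON-LD embedded inline in the HTML page,
# not linked as a separate file:
script_tag = (
    '<script type="application/ld+json">'
    + json.dumps(record)
    + "</script>"
)
print(script_tag)
```

The same serialized record could also be served as a standalone `.jsonld` file for harvesters that accept it, which is the open question above.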
Hi, I am in contact with Google, maybe there's a change. An easy way is to simply test it: after setting it up with an href link you can run the Google structured data analyzer. P.S.: PANGAEA also delivers the schema.org markup when you do content negotiation on the landing page using the Accept header (see signposting.org). The F-UJI FAIR checker also uses content negotiation, if available.
@uschindler interesting, I'm curious about Google's updated view on this, keep us posted.
Hi, we now have the sitemap with all public datasets online. Is it possible to do a crawler test run before we enter it into the ODIS catalogue?
Hi @jmckenna, we updated our JSONs according to the things we discussed last time. Can you run your tests again against the sitemap and check whether

Thank you!
updates since today's meeting:
Hi,

This PANGAEA one has no spatial coverage. That's not an issue in your portal.
@uschindler today in the meeting I had mentioned that some records in the ZMT sitemap do not have spatialCoverage, and the reaction from the ZMT team, I believe, was "all records should have spatialCoverage", so I am not understanding your response.
The problem is that the link posted is about data harvested from PANGAEA: https://dataportal.leibniz-zmt.de/Entity/19378; this entry refers to this PANGAEA dataset: https://doi.org/10.1594/PANGAEA.890177. This one has no spatial coverage and will never have one. That is correct. If you harvest PANGAEA, you have to live with the fact that datasets may not have a coverage. I won't try to explain here why no coverage is available, but in short: it is not mandatory, and for this dataset there is no way to provide a coverage. It has none.
@jmckenna @uschindler, sorry, that was too bold a claim, then. And it will be even more so in the future, unfortunately, once we include more non-PANGAEA datasets in the portal -- they will most probably not have geo information at all.
@jmckenna How does your frontend treat entries like https://dataportal.leibniz-zmt.de/Entity/18288 (view-source:https://dataportal.leibniz-zmt.de/oih/dataset_18288.html for the JSON, respectively) where we have an array of places in the spatial coverage? There should be a lot more points than datasets if you show the full array on your map (~900 locations vs ~150 datasets).
@fspreck-indiscale good point, we don't handle a list of geocoordinates yet (we only use the first point), but we should. Thanks for reporting this.
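Handling both the single-place and the array-of-places forms can be sketched like this; the helper name and the sample record are illustrative, not taken from the ODIS or ZMT code:

```python
def coverage_points(spatial_coverage):
    """Yield (lat, lon) for every schema.org Place in spatialCoverage,
    whether it is a single object or a list of objects."""
    places = spatial_coverage
    if isinstance(places, dict):
        places = [places]
    for place in places:
        geo = place.get("geo", {})
        if geo.get("@type") == "GeoCoordinates":
            yield (float(geo["latitude"]), float(geo["longitude"]))

# Illustrative record with an array of places:
record = {
    "spatialCoverage": [
        {"@type": "Place", "geo": {"@type": "GeoCoordinates",
                                   "latitude": "4.9", "longitude": "113.9"}},
        {"@type": "Place", "geo": {"@type": "GeoCoordinates",
                                   "latitude": "5.1", "longitude": "114.2"}},
    ]
}
print(list(coverage_points(record["spatialCoverage"])))
```

Normalizing the single-object case to a one-element list keeps the map rendering code free of special cases.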
@jmckenna Hi, I just updated the JSONs again; they now have
thanks @fspreck-indiscale, will do another harvest here...
@jmckenna We added keywords (a simple array of strings for now) to some of the datasets; do they look good after harvesting? The schema.org validator passes.
updates from meeting on 2024-01-15:
@fspreck-indiscale thanks for updating the keywords syntax. I notice that some have odd characters inside the JSON-LD, such as this record:
"keywords": [
"coral climatology",
"oxygen isotope",
"trace elements ratio",
"\u03b418Oseawater"
],
Hi @jmckenna, good point, we've not considered these characters so far. Escaping non-ASCII is the safe default of the exporter, but it is by no means necessary (on the landing page, it's UTF-8 δ). May we use UTF-8 strings in the JSON-LD?
Hi @fspreck-indiscale, in fact on the ODIS search front-end it appears as follows, so I think it is OK to use these Unicode characters. (Does that keyword look OK to you here in this screen capture?)
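Both forms above are the same JSON document; the difference is only the exporter's serialization default. A minimal sketch with Python's json module:

```python
import json

keywords = ["coral climatology", "\u03b418Oseawater"]

# json.dumps escapes non-ASCII by default (the exporter's safe default)...
escaped = json.dumps(keywords)
# ...while ensure_ascii=False emits the raw UTF-8 characters, which is
# equally valid JSON and matches what the landing page shows (δ18Oseawater):
raw = json.dumps(keywords, ensure_ascii=False)
print(escaped)
print(raw)
```

Any conforming JSON parser decodes both strings to identical values, so the choice only affects human readability of the raw JSON-LD.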
@fspreck-indiscale the ZMT records (201) are now on the production server ( https://oceaninfohub.org/ ). There is, however, an issue on our side: the "Provider" facet lists 2 different providers for your records. Here is the harvested JSON-LD example: https://api.search.oceaninfohub.org/source?id=https%3A%2F%2Fdataportal.leibniz-zmt.de%2Foih%2Fdataset_19754.html&_gl=1*1qbvbqk*_ga*NjkyMjg3NDkwLjE3MTIzNDMxMjM.*_ga_MQDK6BB0YQ*MTcxMjM0MzEyMy4xLjEuMTcxMjM0NzExNC4wLjAuMA..*_ga_QJ5XJMZFXW*MTcxMjM0MzEzNi4xLjEuMTcxMjM0NzExNC4wLjAuMA.. @pbuttigieg @fils can you see the source of the problem here?
More info: the records harvested inside Solr (search index) contain only one provider:
This is puzzling.
Ah, it could be that no other partner is setting

Example JSON-LD for CIOOS record: https://api.search.oceaninfohub.org/source?id=https%3A%2F%2Fcatalogue.cioos.ca%2Fdataset%2F777530f0-adaf-4ddb-86bb-6f1269dcb259.jsonld&_gl=1*15ety5n*_ga*MTQ3NjY4NzQyNy4xNzEyMzIxMTE5*_ga_MQDK6BB0YQ*MTcxMjM1MTk2Mi4zLjEuMTcxMjM1MjI2Ni4wLjAuMA..*_ga_QJ5XJMZFXW*MTcxMjM1MTk3MS4zLjEuMTcxMjM1MjI2Ni4wLjAuMA..

I'd need @pbuttigieg @fils to clarify how we should assume the correct use of

Maybe we should set up another ZMT-ODIS technical meeting in the next 2 weeks, to examine this together.
From the node perspective, the provider should be the entity that provided them with the data that the JSON-LD record is about; schema.org has sdPublisher for identifying the entity that created the JSON-LD.
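A sketch of that distinction in a record; the organization names are placeholders, not the actual ZMT values:

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example dataset",
  "provider": {
    "@type": "Organization",
    "name": "Organization that provided the underlying data"
  },
  "sdPublisher": {
    "@type": "Organization",
    "name": "Organization that created this JSON-LD record"
  }
}
```

With both fields set explicitly, a harvester's "Provider" facet has an unambiguous field to read instead of falling back to a default.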
Summary:

- Startpoint URL for ODIS-Arch (the URL to your sitemap)
- Type of the ODIS-Arch URL (choose "sitemap")

This issue will allow questions, updates, and discussions by both teams.

cc @fils @pbuttigieg