
Make Dataverse produce valid DDI codebook 2.5 XML #3648

Closed
jomtov opened this issue Feb 26, 2017 · 54 comments · Fixed by #9484

jomtov commented Feb 26, 2017

Forwarded from the ticket:
https://help.hmdc.harvard.edu/Ticket/Display.html?id=245607


Hello,
I tried to validate two items exported to DDI from dataverse.harvard.edu against codebook.xsd (2.5) and got the same types of validation errors, described below for Item 1 (the excerpts below the line should work as well-formed XML):

Item 1: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BAMCSI

Item 2: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/P4JTOD

What could be done about it (other than meddling with the schema)?

Best regards,

Joakim Philipson
Research Data Analyst, Ph.D., MLIS
Stockholm University Library

Stockholm University
SE-106 91 Stockholm
Sweden

Tel: +46-8-16 29 50
Mobile: +46-72-1464702
E-mail: joakim.philipson@sub.su.se
http://orcid.org/0000-0001-5699-994X

<docDscr>
<citation>
    <titlStmt>
        <titl>What’s in a name? : Sense and Reference in biodiversity information </titl>
        <IDNo agency="DOI">doi:10.7910/DVN/BAMCSI</IDNo>
    </titlStmt>
    <distStmt>
        <distrbtr>Harvard Dataverse</distrbtr>
        <distDate>2017-01-12</distDate>
    </distStmt>
    <verStmt source="DVN">
        <version date="2017-01-12" type="RELEASED">1</version>
    </verStmt>
    <biblCit>Philipson, Joakim, 2017, "What’s in a name? : Sense and Reference in
        biodiversity information", doi:10.7910/DVN/BAMCSI, Harvard Dataverse, V1</biblCit>
</citation>

<xs:attribute name="source" default="producer">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="archive"/>
<xs:enumeration value="producer"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>

<stdyInfo>
    <subject>
        <keyword>Medicine, Health and Life Sciences</keyword>
        <keyword>Computer and Information Science</keyword>
        <keyword vocab="casrai" URI="http://dictionary.casrai.org/Metadata"
            >Metadata</keyword>
        <keyword vocab="casrai" URI="http://dictionary.casrai.org/PID_system">PID
            system</keyword>
        <keyword vocab="wikipedia" URI="https://en.wikipedia.org/wiki/Biodiversity"
            >Biodiversity</keyword>
        <keyword vocab="smw-rda" URI="http://smw-rda.esc.rzg.mpg.de/index.php/Taxonomy"
            >Taxonomy</keyword>
    </subject>
    <abstract>"That which we call a rose by any other name would smell as sweet.”
        Shakespeare has Juliet tell her Romeo that a name is just a convention without
        meaning, what counts is the reference, the 'thing itself', to which the property of
        smelling sweet pertains alone. Frege in his classical paper “Über Sinn und
        Bedeutung” was not so sure, he assumed names can be inherently meaningful, even
        without a known reference. And Wittgenstein later in Philosophical Investigations
        (PI) seems to deny the sheer arbitrariness of names and reject looking for meaning
        out of context, by pointing to our inability to just utter some random sounds and by
        that really implying e.g. the door. The word cannot simply be separated from its
        meaning, in the same way as the money from the cow that could be bought for them (PI
        120). Scientific names of biota, in particular, are often descriptive of properties
        pertaining to the organism or species itself. On the other hand, in semantic web
        technology and Linked Open Data (LOD) there is an overall effort to replace names by
        their references, in the form of web links or Uniform Resource Identifiers (URIs).
        “Things, not strings” is the motto. But, even in view of the many "challenges with
        using names to link digital biodiversity information" that were extensively
        described in a recent paper, would it at all be possible or even desirable to
        replace scientific names of biota with URIs? Or would it be sufficient to just
        identify equivalence relationships between different variants of names of the same
        biota, having the same reference, and then just link them to the same “thing”, by
        means of a property sameAs(URI)? The Global Names Architecture (GNA) has a resolver
        of scientific names that is already doing that kind of work, linking names of biota
        such as Pinus thunbergii to global identifiers and URIs from other data sources,
        such as Encyclopedia of Life (EOL) and uBio Namebank. But there may be other
        challenges with going from a “natural language”, even from a not entirely coherent
        system of scientific names, to a semantic web ontology, a solution to some of which
        have been proposed recently by means of so called 'lexical bridges'.</abstract>
    <sumDscr/>
    <contact affiliation="Stockholm University" email="joakim.philipson@sub.su.se"
        >Philipson, Joakim</contact>
    <depositr>Philipson, Joakim</depositr>
    <depDate>2017-01-12</depDate>
</stdyInfo>    
<xs:complexType name="keywordType" mixed="true">
    <xs:complexContent>
        <xs:extension base="simpleTextType">
            <xs:attribute name="vocab" type="xs:string"/>
            <xs:attribute name="vocabURI" type="xs:string"/>
        </xs:extension>
    </xs:complexContent>
</xs:complexType>
<sumDscr/>
<contact affiliation="Stockholm University" email="joakim.philipson@sub.su.se"
    >Philipson, Joakim</contact>

<!-- In codebook: -->

<xs:complexType name="sumDscrType">
    <xs:complexContent>
        <xs:extension base="baseElementType">
            <xs:sequence>
                <xs:element ref="timePrd" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="collDate" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="nation" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="geogCover" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="geogUnit" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="geoBndBox" minOccurs="0"/>
                <xs:element ref="boundPoly" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="anlyUnit" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="universe" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="dataKind" minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:extension>
    </xs:complexContent>
</xs:complexType>

<xs:element name="sumDscr" type="sumDscrType">
    <xs:annotation>
        <xs:documentation>
            <xhtml:div>
                <xhtml:h1 class="element_title">Summary Data Description</xhtml:h1>
                <xhtml:div>
                    <xhtml:h2 class="section_header">Description</xhtml:h2>
                    <xhtml:div class="description">Information about the and geographic coverage of the study and unit of analysis.</xhtml:div>
                </xhtml:div>
            </xhtml:div>
        </xs:documentation>
    </xs:annotation>
</xs:element>
<useStmt>CC0 Waiver</useStmt>

dataverse_1062_philipsonErrorTypes.txt

jggautier commented Feb 28, 2017

Thanks @jomtov for moving this issue from our support system!

I thought it might be helpful to give some background on the issue, list what might need to change when the DDI xml is made valid, and describe the errors.

As background for anyone else interested, the DDI xml that Dataverse generates for each dataset (and datafile) needs to follow DDI's schema, so that other repositories and applications using DDI xml can use it (e.g. during harvesting).

To answer jomtov's question, I think Dataverse's xml would need to be corrected. After fixing the errors and making sure the XML is valid, these are what I imagine will need to be adjusted:

  • the scripts used to pull Dataverse metadata into the DDI xml
  • possibly other applications that use parts of the DDI xml that Dataverse produces (Scholars Portal's Data Explorer, for example, uses the variable-level parts of the DDI, which are valid)

There are five errors here, described in the dataverse_1062_philipsonErrorTypes.txt file in jomtov's post:

1. DDI schema doesn't like "DVN" as a value for source in <verStmt source="DVN">
Only "archive" and "producer" are allowed as values.

2. DDI schema doesn't like the URI attribute being called "URI":

_Attribute 'URI' is not allowed to appear in element 'keyword'._

As jomtov points out, the keyword URI is called vocabURI in Dataverse. Unless there's a reason why it's called URI in the DDI XML, I think this is as easy as changing "URI" to "vocabURI", which is okay with the schema.

<keyword vocab="term" vocabURI="http://vocabulary.org/">Metadata</keyword>

3. DDI schema doesn't like where "contact" info is placed:

<sumDscr/>
  <contact affiliation="A University" email="email@domain.com">Name</contact>

_Invalid content was found starting with element '{"ddi:codebook:2_5":contact}'. One of '{"ddi:codebook:2_5":sumDscr, "ddi:codebook:2_5":qualityStatement, "ddi:codebook:2_5":notes, "ddi:codebook:2_5":exPostEvaluation}' is expected._

The DDI schema says that sumDscr shouldn't hold things like contact info. The contact element should be under useStmt:

<useStmt>
...
    <contact affiliation="A University" email="email@domain.com">Name</contact>
...
</useStmt>

4 and 5. DDI schema doesn't like <useStmt> holding a text value, here the value being the license:
<useStmt>CC0 Waiver</useStmt>

Two of the elements that can be nested under <useStmt> are <restrctn> and <conditions>. Either element seems appropriate to me for holding license info. The schema's descriptions make <conditions> sound like a catchall and <restrctn> sound like the primary element to use. However, ICPSR uses <conditions> for license-like info.

Lastly, this isn't one of the five errors reported, but DDI wants <useStmt> nested a level under <dataAccs>. (Right now <useStmt> sits directly under <stdyDscr>, where the schema expects <dataAccs>, per the validation errors below.) So the following change should fix these errors:

<dataAccs>
  <useStmt>
    <conditions>CC0 Waiver</conditions>
    <contact>...</contact>
  </useStmt>
</dataAccs>

jggautier commented Mar 2, 2017

There may be more validation errors, since these two datasets have only some of the possible metadata. @raprasad and I talked yesterday about trying to validate all (or a greater number?) of Harvard Dataverse's DDI XML to find additional errors and make sure the DDI XML is always valid.

There was also some discussion about when and how Dataverse validates the DDI it generates, and making sure that process is working.

@raprasad raprasad self-assigned this Mar 3, 2017
pdurbin commented Mar 7, 2017

@jomtov would you be able to tell us what tools you're using to validate against a DDI 2.5 schema? I documented how to validate against DDI 2.0 using MSV (Multi Schema Validator) at http://guides.dataverse.org/en/4.6/developers/tools.html#msv but I seem to recall that DDI 2.5 is more complicated and requires multiple schema files or something. I don't think I ever figured out how to use MSV to validate DDI 2.5. Do you use some other tool? Any tips for me? Thanks!

jomtov commented Mar 7, 2017

@pdurbin, I used the schema found in the schemaLocation of the exported XML files of the item examples above:
<codeBook xmlns="ddi:codebook:2_5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd"
version="2.5">
in the oXygen XML editor 18 with the Xerces validation engine.
I don't think you need to invoke multiple schemas here; the error types are clearly described and have corresponding entries in the codebook.xsd 2.5 schema.
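
For anyone without oXygen, here is a minimal sketch of the same check using the JDK's built-in javax.xml.validation API (backed by Xerces); the file names are placeholders, not actual Dataverse paths:

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;
    import org.xml.sax.SAXException;

    public class DdiValidate {
        public static void main(String[] args) throws Exception {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // codebook.xsd imports xml.xsd via a relative schemaLocation, so keep a
            // local copy of xml.xsd next to it to avoid xml:lang resolution errors.
            Schema schema = factory.newSchema(new File("codebook.xsd"));
            Validator validator = schema.newValidator();
            try {
                // Validate one exported DDI record against the compiled schema.
                validator.validate(new StreamSource(new File("dataset-export.xml")));
                System.out.println("valid");
            } catch (SAXException e) {
                // Xerces reports the same cvc-* messages quoted in this thread.
                System.out.println("invalid: " + e.getMessage());
            }
        }
    }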

@jomtov jomtov closed this as completed Mar 7, 2017
@jomtov jomtov reopened this Mar 7, 2017
pdurbin added a commit that referenced this issue Mar 7, 2017
The DDI 2.5 test fails with this:

`src-resolve: Cannot resolve the name 'xml:lang' to a(n) 'attribute declaration' component.`

We should be exporting valid DDI 2.5.
pdurbin commented Mar 7, 2017

Ah, thanks @jomtov. Judging from its Wikipedia page, the Oxygen XML Editor is not free and open source. Bummer.

In a491cd9 I just pushed some code to demonstrate the difficulty I've seen in validating against that codebook.xsd file you mentioned, which I checked into the code base long ago when I first attempted (and failed) to get Dataverse to validate the DDI 2.5 it exports.

The failing Travis build from that commit demonstrates the error I'm seeing:

Tests in error:

testValidateXml(edu.harvard.iq.dataverse.util.xml.XmlValidatorTest): src-resolve: Cannot resolve the name 'xml:lang' to a(n) 'attribute declaration' component.

That's from https://travis-ci.org/IQSS/dataverse/builds/208627544#L3805

Does anyone have any idea how to fix this test? Here's the line that's failing:

assertEquals(true, XmlValidator.validateXml(dir + "dataset-finch1.xml", dir + "codebook.xsd"));
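
For what it's worth, that src-resolve failure usually means the parser couldn't resolve codebook.xsd's import of the XML namespace schema (xml.xsd), which codebook.xsd references via a relative schemaLocation. Here is a sketch of one possible fix, assuming that diagnosis holds (the paths are hypothetical): compile xml.xsd together with codebook.xsd so the xml:lang declaration is already available.

    // Sketch only; order matters: xml.xsd first, so its attribute
    // declarations exist by the time codebook.xsd is compiled.
    SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    Schema schema = factory.newSchema(new Source[] {
        new StreamSource(new File(dir + "xml.xsd")),      // local copy of http://www.w3.org/2001/xml.xsd
        new StreamSource(new File(dir + "codebook.xsd"))
    });
    schema.newValidator().validate(new StreamSource(new File(dir + "dataset-finch1.xml")));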

jomtov commented Mar 8, 2017

Well, @pdurbin, https://www.corefiling.com/opensource/schemaValidate.html (also on GitHub) is a free online XML validator that seems to work anyway. I uploaded the codebook.xsd and one of the erroneous export items from above and validated them - attached here as .txt files, since .xsd and .xml are not supported by GitHub, to be 'reconverted' before use:
codebook.txt
dataverse_1062_Philipson_newexp2DDIcb.txt

True, the validator did not find some of the other referenced schemas, but they are not relevant here, and all the specific codebook.xsd validation errors seem to be identified anyway (scrolling down in the results):

Validation 1, 504 cvc-enumeration-valid: Value 'DVN' is not facet-valid with respect to enumeration '[archive, producer]'. It must be a value from the enumeration.
Validation 1, 504 cvc-attribute.3: The value 'DVN' of attribute 'source' on element 'verStmt' is not valid with respect to its type, '#AnonType_sourceGLOBALS'.
Validation 1, 1314 cvc-complex-type.3.2.2: Attribute 'URI' is not allowed to appear in element 'keyword'.
Validation 1, 1402 cvc-complex-type.3.2.2: Attribute 'URI' is not allowed to appear in element 'keyword'.
Validation 1, 1498 cvc-complex-type.3.2.2: Attribute 'URI' is not allowed to appear in element 'keyword'.
Validation 1, 1600 cvc-complex-type.3.2.2: Attribute 'URI' is not allowed to appear in element 'keyword'.
Validation 1, 3918 cvc-complex-type.2.4.a: Invalid content was found starting with element 'contact'. One of '{"ddi:codebook:2_5":sumDscr, "ddi:codebook:2_5":qualityStatement, "ddi:codebook:2_5":notes, "ddi:codebook:2_5":exPostEvaluation}' is expected.
Validation 1, 4071 cvc-complex-type.2.4.a: Invalid content was found starting with element 'useStmt'. One of '{"ddi:codebook:2_5":method, "ddi:codebook:2_5":dataAccs, "ddi:codebook:2_5":othrStdyMat, "ddi:codebook:2_5":notes}' is expected.
Validation 1, 4091 cvc-complex-type.2.3: Element 'useStmt' cannot have character [children], because the type's content type is element-only.

Maybe this could be useful?

pdurbin commented Mar 10, 2017

@jomtov thanks for the pointer to https://www.corefiling.com/opensource/schemaValidate.html which I just tried. It seems to work great. It's perfect for one-off validation of an XML file against a schema. To be clear, what I was trying to say in #3648 (comment) is that I'd like to teach Dataverse itself to validate XML against a schema. It works for DDI 2.0 but not DDI 2.5, and I still don't understand why. For the Java developers reading this, a491cd9 is the commit I made the other day.

jggautier commented May 27, 2017

Hello,
I tried to validate two items exported to DDI from dataverse.harvard.edu against codebook.xsd (2.5) and got the same types of validation errors, described below for Item 1 (the excerpts below the line should work as well-formed XML):

Item 1: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BAMCSI (direct link to the dataset's DDI xml)

Hi @jomtov. Here's the corrected DDI xml for the first dataset: valid_DDIXMLforItem1.zip. At first I misinterpreted the errors you posted, but I've got it down now. It's valid as far as I can tell. The online tool you mentioned keeps timing out for me. When you get the chance, could you check to see if the corrected DDI xml is valid with the tool you use?

A while back @pdurbin posted a DDI xml file for a dataset with most of the metadata fields that Dataverse exports. That file and the corrected file (validated with "topic classification" included) are here: invalid_and_valid_DDIxml.zip. Most of the corrections were just moving elements around in the xml, but some involved changing which elements the fields go into, or how many times an element can be repeated. For example, CC0 (or whatever is entered into Terms of Use if CC0 isn't chosen) can't go into useStmt: that element doesn't take a text value, only other elements, and license metadata doesn't fit any of those subelements, so I moved it to the copyright element, where ICPSR and ADA put their license metadata. These changes mean:

  • Dataverse's metadata crosswalk will need to be updated.
  • The dataset metadata entry form UI will need adjustments. For example, in the current geospatial metadata block, a user can enter multiple sets of Geographic Bounding Boxes (longitudes and latitudes), but the way that info is captured in Dataverse's DDI is invalid. The schema requires all four fields (westBL, eastBL, southBL and northBL) to have values if any of them does; if fewer than four have values, the DDI will be invalid (and the metadata probably wouldn't be very useful as a bounding box), so the form should require that if any of the fields has a value, all four must. (A sketch of a valid bounding box follows this list.) Another UI question: will a depositor need to enter multiple sets of geographic bounding boxes, and should she be able to? If so, should each set be tied to the other geographic metadata (name of nation, state, city, etc.)? In the linked validDDI.xml file, I assume a connection between the coordinates and the nation, state, city, etc., but the metadata entry form would need to be adjusted so that depositors are aware of that connection.
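
For reference, a sketch of a schema-valid bounding box in codebook 2.5, with all four cardinal elements present in the order the quoted sumDscrType expects (the coordinates are made-up values):

    <sumDscr>
        <geogCover>Sweden</geogCover>
        <geoBndBox>
            <westBL>10.95</westBL>
            <eastBL>24.15</eastBL>
            <southBL>55.33</southBL>
            <northBL>69.06</northBL>
        </geoBndBox>
    </sumDscr>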

I'd like to rename this issue to something like "Make Dataverse produce valid DDI codebook 2.5 xml", which would involve "teaching Dataverse itself to validate" DDI xml against the codebook 2.5 schema.

pdurbin commented Jun 23, 2017

@jomtov are you ok with renaming this issue as @jggautier suggests?

jomtov commented Jul 1, 2017

@pdurbin and @jggautier, yes, I am OK with the suggested renaming. (Sorry for the belated answer; I've been on vacation, offline for a while.) Keep up the good work!

@pdurbin pdurbin changed the title Validation errors for export to DDI codebook Make Dataverse produce valid DDI codebook 2.5 xml Jul 1, 2017
@pdurbin pdurbin added the "Type: Suggestion" and "User Role: Sysadmin" labels Jul 1, 2017
jggautier commented Jul 22, 2017

The xml files in my earlier comment (ZIP file) don't have most of the metadata in the Terms tab, so the corrections don't take that metadata into account. The DDI that Dataverse currently exports has most of the Terms metadata in the right DDI elements, just in the wrong place in the xml.

The exception is the Terms of Access metadata field: whatever's entered there is exported to DDI's dataAccs element, which shouldn't take a value (like the useStmt problem in my earlier comment). The Terms of Access field deals with file-level restrictions, which may be handled differently with the upcoming work on DataTags integration, so work may need to be done to map file-level terms and access metadata to DDI.

@pdurbin pdurbin changed the title Make Dataverse produce valid DDI codebook 2.5 xml Dataverse doesn't always produce valid DDI codebook 2.5 XML Aug 9, 2017
@pdurbin pdurbin added the "Type: Bug" label and removed the "Type: Suggestion" label Aug 9, 2017

jggautier commented:

I wrote a doc describing what I think are most of the mapping changes needed: https://drive.google.com/open?id=1ICXRL8DP5fCGYiRyRphh_3OotNaWOOak1VmnyufBNsM

I'm pointing our ADA friends to this issue and doc, especially the part about the Terms metadata, since I think the invalid mapping has complicated their own work mapping ADA's DDI to Dataverse's for their planned migration.

pdurbin commented Aug 14, 2017

I rewrote the XML validator in Dataverse and now have a test to validate the XML we send to DataCite (it operates on a static file), and I added a FIXME to use the validator with DDI as well:

// assertTrue(XmlValidator.validateXml("src/test/java/edu/harvard/iq/dataverse/export/ddi/dataset-finch1.xml", new URL("http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd")));

landreev commented Mar 23, 2023

Ouch. It's insane that we've only gotten around to fixing it now. Many of these are simply matters of our code writing elements in random order where the schema defines an <xs:sequence> - not that difficult to fix.
Though there are a couple of non-trivial things where decisions need to be made (what to do with the bounding boxes, for example).
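
For readers unfamiliar with XSD: an <xs:sequence> fixes the order of child elements, so an exporter that emits them in any other order produces schema-invalid XML even when every element is individually correct. A hypothetical fragment (not the actual codebook.xsd declaration):

    <xs:complexType name="exampleCitationType">
        <xs:sequence>
            <!-- titlStmt must come first, then distStmt, then verStmt -->
            <xs:element ref="titlStmt"/>
            <xs:element ref="distStmt" minOccurs="0"/>
            <xs:element ref="verStmt" minOccurs="0"/>
        </xs:sequence>
    </xs:complexType>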

landreev added a commit that referenced this issue Mar 24, 2023
…a violations *for the control dataset we are using in our tests*. There is almost certainly more that needs to be done. #3648
landreev added a commit that referenced this issue Mar 28, 2023
landreev added a commit that referenced this issue Mar 28, 2023
landreev added a commit that referenced this issue Mar 28, 2023
landreev added a commit that referenced this issue Mar 28, 2023

landreev commented:

Just to clarify a couple of things from an earlier discussion:

sizing:

* We will address the immediate issue of the bad ddi xml exports by looking specifically at what has been reported.
...
* If we find that the validator needs work, we will create a new separate issue when this is complete

"Looking specifically at what has been reported" may not easily apply. This is a very old issue, with a lot of back-and-forth (that's very hard to read), and many of the things reported earlier have already been fixed in other PRs. So I assumed that the goal of the PR was "make Dataverse produce valid DDI". (i.e., if something not explicitly mentioned here is obviously failing validation, it needed to be fixed too - it did not make sense to make a PR that would fix some things, but still produce ddi records that fail validation; especially since people have been waiting for it to be fixed since 2017).

The previously discussed automatic validation - adding code to the exporter that would validate every ddi record it produces in real time, and only cache a record if it passes validation - does make sense to leave as a separate sprint-sized task. (The validation itself is not hard to add, but we'll need to figure out how to report the errors.) I have enabled the validation test in DDIExporterTest.testExportDataset(), however, so in the meantime, after we merge this PR, any developer working on the ddi exporter will be alerted if they break it by introducing something invalid, because they won't be able to build their branch.
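
A minimal sketch of what that validate-before-caching flow could look like (the exporter, cache, and schema names here are hypothetical, not the actual Dataverse classes):

    // Hypothetical names throughout; this only illustrates the flow.
    String ddiXml = ddiExporter.exportDataset(dataset);          // produce the record
    Validator validator = codebookSchema.newValidator();         // codebook.xsd, compiled once
    try {
        validator.validate(new StreamSource(new StringReader(ddiXml)));
        exportCache.put(dataset.getGlobalId(), ddiXml);          // cache only if valid
    } catch (SAXException e) {
        logger.warning("DDI export failed schema validation: " + e.getMessage());
    }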

To clarify: in its current state, the exporter in my branch produces valid ddi xml for our control "all fields" dataset, plus all the other datasets used in our tests, and whatever else I could think of to test. It does NOT guarantee that there is no possible scenario where it can still output something invalid! So, yes, it is important to add auto-validation. And if and when somebody finds another such scenario, we will treat it as a new issue.

A couple of arbitrary decisions had to be made; I will spell them out in the PR description. My general approach was: if something does not translate from our metadata to the ddi format 1:1, just drop it and move on. We don't assume that it's a goal to preserve all of our metadata when exporting DC; it's obvious that only a subset of our block fields can be exported in that format. And it's not a possibility with ddi either, now that we have multiple blocks and the application is no longer centered on quantitative social science. So, no need to sweat a lost individual field here and there.

@landreev landreev changed the title Dataverse doesn't always produce valid DDI codebook 2.5 XML Make Dataverse produce valid DDI codebook 2.5 XML Mar 29, 2023
landreev added a commit that referenced this issue Mar 29, 2023

kaczmirek commented:

To check compatibility I use the following two validators:

  1. BASE http://oval.base-search.net/ (this shows the new error "No incremental harvesting" in 12.1; I suggest adding this validator to the validation pipeline)
  2. CESSDA https://cmv.cessda.eu/#!validation
    with the settings validation Gate = BASIC and Profile = CESSDA DATA CATALOGUE (CDC) DDI2.5 PROFILE - MONOLINGUAL: 1.0.4.
    This gives both schema violations and constraint violations (the latter are probably not relevant for Dataverse, because the profile's constraints can differ from what the Dataverse project wants to see, although it would be good to add the attributes and tags recommended at Gate = STANDARD).
    It is important to pass these two validators because passing them can result in being included and findable in a lot of aggregators like OpenAIRE, ELIXIR, and B2FIND (https://b2find.eudat.eu/), which are all important players in Europe and with respect to the European Open Science Cloud (EOSC).
    Currently, we have local fixes at several Dataverse installations to pass the validators (I only looked at the ones participating in CESSDA in Europe).

landreev commented Apr 5, 2023

@kaczmirek
CESSDA (https://cmv.cessda.eu/#!validation) is my favorite validator tool as well.
I made a pull request the other week (#9484, linked to this issue) that fixes the numerous schema violations in our DDI export. I recommend the CESSDA validator under "how to test" there, with the same profile you mentioned ("CESSDA DATA CATALOGUE (CDC) DDI2.5 PROFILE - MONOLINGUAL: 1.0.4").

landreev added a commit that referenced this issue Apr 11, 2023
…he API), and the corresponding control ddi export. #3648
landreev added a commit that referenced this issue Apr 17, 2023
landreev added a commit that referenced this issue Apr 19, 2023
…made multiple in PR #9254; it would be great to put together a process for developers who need to make changes to fields in metadata blocks that would help them know all the places where changes like this need to be made. (Not the first time something breaks, in ddi export specifically, after a field is made multiple.) #3648
landreev added a commit that referenced this issue Apr 20, 2023
@pdurbin pdurbin added this to the 5.14 milestone May 10, 2023
@BPeuch BPeuch moved this from Pretty please to Solved (thank you!) in Dataverse SODHA (Belgium) Jul 10, 2023