The change to using a schema_dv_mdb_copies.xml file breaks the ability to search on most metadata block fields beyond the most common few (see below for details). For example, https://demo.dataverse.org/dataverse/demo/?q=outerspace returns nothing, whereas https://demo.dataverse.org/dataverse/demo/?q=originOfSources:outerspace returns my test dataset, even though originOfSources is one of the fields included in schema_dv_mdb_copies.xml, which is intended to make the contents of those fields searchable as plain text, without specifying the field name.
The problem is that 1) the schema_dv_mdb_copies.xml file encloses the elements in a parent element, 2) the include statement copies that parent element into the schema.xml file, and 3) Solr evidently doesn't read elements within a sub-element.
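To illustrate the failure mode described above, here is a minimal Python sketch. It is not Solr's parsing code. It only shows how a reader that looks at direct children of the root misses elements trapped inside a wrapper. The element name `wrapper`, the field name `directField`, and the inline schema are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema after the include has been resolved. One copyField sits
# directly under the root; the other is nested inside a wrapper element.
# "wrapper" and "directField" are placeholder names, not taken from the real files.
RESOLVED_SCHEMA = """
<schema name="example">
  <copyField source="directField" dest="text"/>
  <wrapper>
    <copyField source="originOfSources" dest="text"/>
  </wrapper>
</schema>
"""

def direct_copy_sources(schema_xml):
    """Sources of copyField elements that are direct children of the root."""
    root = ET.fromstring(schema_xml)
    return [el.get("source") for el in root.findall("copyField")]

def all_copy_sources(schema_xml):
    """Sources of copyField elements at any depth in the document."""
    root = ET.fromstring(schema_xml)
    return [el.get("source") for el in root.iter("copyField")]

# What a "direct children only" reader sees versus what the file contains.
seen = direct_copy_sources(RESOLVED_SCHEMA)
present = all_copy_sources(RESOLVED_SCHEMA)
print("seen:", seen)
print("present:", present)
```

The nested `originOfSources` entry is present in the document but absent from `seen`, which matches the symptom: the field is indexed and searchable by name, but its content never reaches the text field.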
This is unlike the schema_dv_mdb_fields.xml file, which also includes a parent element, but in that case Solr is able to parse the additional elements within that element after schema_dv_mdb_fields.xml is included. (So including the fields themselves into the schema works as intended.)
The problem has been somewhat hidden by the use of the 'edismax' search handler, which automatically searches several fields (including title, subject, keyword, authorAffiliation) in addition to the main text field. As a result, searches for words in titles work, even though the title field is not being copied to the text field as it is supposed to be. Metadata block fields that are not directly in schema.xml and not in the edismax list are affected: many citation block fields and all fields in other metadata blocks. (Sites not using the relatively recent schema_dv_mdb_copies.xml aren't affected.)
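The masking effect can be sketched with a toy model in Python. This is not how Solr or edismax is implemented. The `matches` function, the sample document, and the whitespace tokenization are all assumptions made for illustration; only the field names come from the description above.

```python
# Toy model: a term matches if it appears in the catch-all text field
# (built from whichever fields were copied into it) or in any field that
# the search handler queries directly.
def matches(doc, term, copied_to_text, query_fields):
    text = " ".join(doc[f] for f in copied_to_text if f in doc)
    searched = [text] + [doc.get(f, "") for f in query_fields]
    return any(term.lower() in value.lower().split() for value in searched)

# Hypothetical dataset metadata.
DOC = {"title": "Mars survey", "originOfSources": "outerspace"}

# Fields the edismax handler is described as searching directly.
EDISMAX_FIELDS = ["title", "subject", "keyword", "authorAffiliation"]

# Broken copies: nothing reaches the text field.
title_hit_with_edismax = matches(DOC, "mars", [], EDISMAX_FIELDS)
other_hit_with_edismax = matches(DOC, "outerspace", [], EDISMAX_FIELDS)
title_hit_without_edismax = matches(DOC, "mars", [], [])

# Working copies: both fields are copied into the text field.
other_hit_when_fixed = matches(DOC, "outerspace", ["title", "originOfSources"], [])
```

In this model a title search still succeeds with the edismax field list even though nothing is copied, so the breakage only shows up for fields outside that list, or for every field once the list is dropped.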
The problem was discovered at QDR where the edismax search handler isn't used and therefore more fields are affected. (FWIW: https://guides.dataverse.org/en/latest/installation/prerequisites.html says that using this search handler with the same fields as Harvard is optional and the guide describes how to disable it, so other sites could also have more fields affected.)
The quickest solution I can see is to simply cut/paste the elements from schema_dv_mdb_copies.xml manually into schema.xml. Changing to dynamic fields, which has been suggested elsewhere, should also resolve the issue (and would be preferable, since that would avoid the manual update to schema.xml). (Note that the issue is independent of whether schema_dv_mdb_copies.xml is generated by the update script or managed manually. It's the inclusion of that file with its enclosing element that is the problem. One also can't just remove the enclosing element from schema_dv_mdb_copies.xml, because XML requires a single root element.)
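As a rough sketch of what the cut/paste amounts to, the Python below moves the children of a wrapper element directly under the schema root. It works on small inline strings with hypothetical element and field names (`wrapper`, the sample `field`). It is not meant to be run against a real schema.xml, since ElementTree drops comments and any DOCTYPE or include declarations when it re-serializes.

```python
import xml.etree.ElementTree as ET

def inline_copies(schema_xml, copies_xml):
    """Append every child of the copies file's root directly under the schema root."""
    schema_root = ET.fromstring(schema_xml)
    copies_root = ET.fromstring(copies_xml)
    for child in list(copies_root):
        schema_root.append(child)
    return ET.tostring(schema_root, encoding="unicode")

# Hypothetical stand-ins for schema.xml and schema_dv_mdb_copies.xml.
SCHEMA = '<schema name="example"><field name="text" type="text_general"/></schema>'
COPIES = (
    "<wrapper>"
    '<copyField source="originOfSources" dest="text"/>'
    '<copyField source="title" dest="text"/>'
    "</wrapper>"
)

merged = inline_copies(SCHEMA, COPIES)
merged_root = ET.fromstring(merged)

# After merging, the copy elements are direct children of the schema root.
sources = [el.get("source") for el in merged_root.findall("copyField")]
print(sources)
```

The wrapper itself is never copied, so the single-root requirement on the separate file no longer matters once its contents live in schema.xml.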
I can submit a quick PR to resolve this by putting the elements in schema.xml directly. We may also want to let people know that the same fix works on prior versions: if you use schema_dv_mdb_copies.xml, copy the elements into schema.xml, restart Solr, and do an incremental re-index.
qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this issue on May 10, 2021.