Simple search of most metadata fields broken #7864

qqmyers · 2021-05-10T22:23:57Z

The change to using a schema_dv_mdb_copies.xml file breaks the ability to search for most metadata fields in the metadata blocks beyond the most common few (see below for details). For example, https://demo.dataverse.org/dataverse/demo/?q=outerspace returns nothing, whereas https://demo.dataverse.org/dataverse/demo/?q=originOfSources:outerspace returns my test dataset. This happens even though originOfSources is one of the fields included in schema_dv_mdb_copies.xml, which is intended to make the contents of those fields searchable as simple text, without specifying the field name.

The problem is that 1) the schema_dv_mdb_copies.xml file encloses the elements in a parent element, 2) the include statement copies the parent element into the schema.xml file, and 3) solr evidently doesn't read elements within a sub-element.

This is unlike the schema_dv_mdb_fields.xml file, which also includes a parent element, but where solr is able to parse the additional elements from within that element after the schema_dv_mdb_fields.xml file is included. (So including the fields themselves in the schema actually works as intended.)
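To make the nesting concrete, here is a minimal sketch in Python of what an XInclude-style include does to the document tree. It assumes the include statement is an XInclude; the wrapper element name `wrapper`, the `_text_` destination field, and the sample field list are illustrative assumptions, not copied from the real files. It only shows where the included elements end up; how solr treats them from there is as described above.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI = "http://www.w3.org/2001/XInclude"

# Hypothetical stand-in for schema_dv_mdb_copies.xml: XML needs a single
# root, so the copyField entries sit inside a wrapper element.
COPIES_XML = """
<wrapper>
  <copyField source="originOfSources" dest="_text_"/>
  <copyField source="title" dest="_text_"/>
</wrapper>
"""

# Hypothetical stand-in for schema.xml with one copyField declared directly.
SCHEMA_XML = f"""
<schema name="demo" xmlns:xi="{XI}">
  <copyField source="keyword" dest="_text_"/>
  <xi:include href="schema_dv_mdb_copies.xml"/>
</schema>
"""


def load(href, parse, encoding=None):
    # In-memory loader so the sketch runs without any files on disk.
    return ET.fromstring(COPIES_XML)


def expand(schema_xml):
    # Parse the schema and resolve its xi:include elements in place.
    root = ET.fromstring(schema_xml)
    ElementInclude.include(root, loader=load)
    return root


def top_level_copy_sources(schema_xml):
    # copyField elements that are direct children of the schema root.
    return [e.get("source") for e in expand(schema_xml).findall("copyField")]


def nested_copy_sources(schema_xml):
    # copyField elements that ended up one level down, inside the wrapper.
    return [e.get("source") for e in expand(schema_xml).findall("*/copyField")]


if __name__ == "__main__":
    print("top level:", top_level_copy_sources(SCHEMA_XML))
    print("nested:   ", nested_copy_sources(SCHEMA_XML))
```

The included copyField entries land inside the wrapper rather than next to the one declared directly in the schema, which is the situation points 1) to 3) above describe.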

The problem has been somewhat hidden by the use of the 'edismax' search handler, which automatically searches several fields (including title, subject, keyword, authorAffiliation) in addition to the main text field. This has the effect that searches for words in titles work, even though the title field is not being copied to the text field as it is supposed to be. Metadata block fields that are not directly in schema.xml now and are not in the edismax list are affected: many citation block fields and all fields in other metadata blocks. (Sites not using the relatively recent schema_dv_mdb_copies.xml aren't affected.)

The problem was discovered at QDR where the edismax search handler isn't used and therefore more fields are affected. (FWIW: https://guides.dataverse.org/en/latest/installation/prerequisites.html says that using this search handler with the same fields as Harvard is optional and the guide describes how to disable it, so other sites could also have more fields affected.)

The quickest solution I can see is to simply cut/paste the elements from schema_dv_mdb_copies.xml manually into schema.xml. Changing to dynamic fields, which has been suggested elsewhere, should also resolve the issue (and would be preferable since that would avoid the manual update to schema.xml). (Note that the issue is independent of whether schema_dv_mdb_copies.xml is being generated by the update script or managed manually. It's the inclusion of that file with its enclosing element that is the problem. One also can't just remove the enclosing element from schema_dv_mdb_copies.xml because XML requires a single root element.)

I can submit a quick PR to resolve this by putting the elements in schema.xml directly. We may also want to let people know that this works on prior versions as well: if you use schema_dv_mdb_copies.xml, copy the elements into schema.xml, restart solr and do an incremental re-index.
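For anyone who would rather script the cut/paste than do it by hand, the following is a rough sketch of the idea, not the change made in the PR. It assumes the include is an XInclude element; the function name `inline_children`, the `wrapper` element, the `_text_` destination, and the sample content are all hypothetical. It replaces the include with the children of the included file's root so they become direct children of the schema.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical stand-ins for schema.xml and schema_dv_mdb_copies.xml.
SCHEMA_XML = """
<schema name="demo" xmlns:xi="http://www.w3.org/2001/XInclude">
  <copyField source="keyword" dest="_text_"/>
  <xi:include href="schema_dv_mdb_copies.xml"/>
</schema>
"""

COPIES_XML = """
<wrapper>
  <copyField source="originOfSources" dest="_text_"/>
  <copyField source="title" dest="_text_"/>
</wrapper>
"""


def inline_children(schema_root, href, included_root):
    """Swap the include for `href` with the children of included_root.

    Returns True if a matching include was found and replaced.
    """
    for index, child in enumerate(list(schema_root)):
        if child.tag == XI_INCLUDE and child.get("href") == href:
            schema_root.remove(child)
            # Insert the wrapper's children where the include used to be,
            # keeping their original order.
            for offset, element in enumerate(list(included_root)):
                schema_root.insert(index + offset, element)
            return True
    return False


if __name__ == "__main__":
    schema = ET.fromstring(SCHEMA_XML)
    copies = ET.fromstring(COPIES_XML)
    inline_children(schema, "schema_dv_mdb_copies.xml", copies)
    print(ET.tostring(schema, encoding="unicode"))
```

As noted above, after changing schema.xml you would still need to restart solr and re-index.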

qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this issue May 10, 2021

IQSS#7864 - fix simple search missing fields

249296c

qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this issue May 10, 2021

IQSS#7864 - fix simple search missing fields

bcca1bf

qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this issue May 10, 2021

Quick fix for IQSS#7864

465ffe5

qqmyers mentioned this issue May 10, 2021

Quick fix for #7864 (Simple search of most metadata fields broken) #7865

Merged

kcondon closed this as completed in #7865 May 14, 2021

kcondon added a commit that referenced this issue May 14, 2021

Merge pull request #7865 from QualitativeDataRepository/IQSS/7864_sim…

2dbf9b7

…ple_search_misses_fields Quick fix for #7864 (Simple search of most metadata fields broken)

This was referenced May 26, 2021

Update schema.xml in-place #7903

Closed

Fix for #7903 update schema.xml in place #7904

Closed
