Simple search of most metadata fields broken #7864

Closed
qqmyers opened this issue May 10, 2021 · 0 comments · Fixed by #7865

qqmyers commented May 10, 2021

The change to using a schema_dv_mdb_copies.xml file breaks simple search for most metadata fields in the metadata blocks beyond the most common few (see below for details). For example, https://demo.dataverse.org/dataverse/demo/?q=outerspace returns nothing, whereas https://demo.dataverse.org/dataverse/demo/?q=originOfSources:outerspace returns my test dataset. This happens even though originOfSources is one of the fields included in schema_dv_mdb_copies.xml, which is intended to make the contents of those fields searchable as simple text, without specifying the field name.
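For anyone who wants to repeat the comparison, here is a minimal sketch that builds the two query URLs from the example above. The helper name `search_url` is made up for illustration. It only constructs the URLs and does not contact the server.

```python
from urllib.parse import urlencode

# Base URL taken from the example in this issue.
BASE = "https://demo.dataverse.org/dataverse/demo/"


def search_url(term, field=None):
    """Build a search URL; with `field`, produce a fielded query (field:term)."""
    q = f"{field}:{term}" if field else term
    # Keep ':' unescaped so the URL matches the form shown in the issue.
    return f"{BASE}?{urlencode({'q': q}, safe=':')}"


simple = search_url("outerspace")
fielded = search_url("outerspace", field="originOfSources")
print(simple)
print(fielded)
```

If the copy to the text field were working, both URLs would be expected to return the test dataset.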

The problem is that 1) the schema_dv_mdb_copies.xml file encloses its elements in a parent element, 2) the include statement copies that parent element, together with its children, into the schema.xml file, and 3) Solr evidently doesn't read those elements when they sit inside a sub-element.

This is unlike the schema_dv_mdb_fields.xml file, which also has an enclosing parent element: there, Solr is able to parse the additional elements inside that parent after schema_dv_mdb_fields.xml is included. (So including the fields themselves into the schema works as intended.)
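The symptom can be illustrated with a small sketch. This is not Solr's parser, only an analogy for a reader that looks at direct children of the schema root. It assumes the nested elements are Solr copyField entries; the element name `wrapper`, the source `directField`, and the destination `text` are placeholders, not the names used in the real files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified schema as it might look after the include is resolved.
SCHEMA_AFTER_INCLUDE = """
<schema name="example" version="1.5">
  <copyField source="directField" dest="text"/>
  <wrapper>
    <copyField source="originOfSources" dest="text"/>
  </wrapper>
</schema>
""".strip()


def top_level_copy_sources(xml_text):
    """Sources of copyField elements that are direct children of the root."""
    root = ET.fromstring(xml_text)
    return [cf.get("source") for cf in root.findall("copyField")]


def all_copy_sources(xml_text):
    """Sources of every copyField element, at any depth."""
    root = ET.fromstring(xml_text)
    return [cf.get("source") for cf in root.iter("copyField")]


print(top_level_copy_sources(SCHEMA_AFTER_INCLUDE))
print(all_copy_sources(SCHEMA_AFTER_INCLUDE))
```

A reader that behaves like `top_level_copy_sources` never sees the nested entry, which matches the behavior described above: the field is indexed, but its contents are not copied to the text field.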

The problem has been somewhat hidden by the use of the 'edismax' search handler, which automatically searches several fields (including title, subject, keyword, authorAffiliation) in addition to the main text field. As a result, searches for words in titles work even though the title field is not being copied to the text field as it is supposed to be. The affected fields are the metadata block fields that are no longer defined directly in schema.xml and are not in the edismax list: many citation block fields and all fields in the other metadata blocks. (Sites not using the relatively recent schema_dv_mdb_copies.xml aren't affected.)

The problem was discovered at QDR where the edismax search handler isn't used and therefore more fields are affected. (FWIW: https://guides.dataverse.org/en/latest/installation/prerequisites.html says that using this search handler with the same fields as Harvard is optional and the guide describes how to disable it, so other sites could also have more fields affected.)

The quickest solution I can see is to simply cut and paste the elements from schema_dv_mdb_copies.xml manually into schema.xml. Changing to dynamic fields, which has been suggested elsewhere, should also resolve the issue (and would be preferable, since it would avoid the manual update to schema.xml). Note that the issue is independent of whether schema_dv_mdb_copies.xml is generated by the update script or managed manually: it's the inclusion of that file with its enclosing element that is the problem. One also can't just remove the enclosing element from schema_dv_mdb_copies.xml, because XML requires a single root element.

I can submit a quick PR to resolve this by putting the elements in schema.xml directly. We may also want to let people know that this works on prior versions as well: if you use schema_dv_mdb_copies.xml, copy the elements into schema.xml, restart Solr, and do an incremental re-index.
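The manual copy could also be scripted. The sketch below is one possible way to do it, not the fix in the PR. It assumes the include statement is an XInclude element (`xi:include`) that is a direct child of the schema root and whose `href` is the copies file name. The function name `inline_copies` is hypothetical. Note that ElementTree drops XML comments on a round trip, so review the output before replacing a real schema.xml.

```python
import xml.etree.ElementTree as ET

# Assumption: the include in schema.xml uses the standard XInclude namespace.
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def inline_copies(schema_xml, copies_xml, href="schema_dv_mdb_copies.xml"):
    """Replace the include of `href` with the children of the copies file's root."""
    schema = ET.fromstring(schema_xml)
    copies_root = ET.fromstring(copies_xml)
    for index, child in enumerate(list(schema)):
        if child.tag == XI_INCLUDE and child.get("href") == href:
            schema.remove(child)
            # Insert the children directly under the schema root,
            # leaving the enclosing parent element behind.
            for offset, elem in enumerate(list(copies_root)):
                schema.insert(index + offset, elem)
            break
    return ET.tostring(schema, encoding="unicode")
```

As described above, Solr would still need to be restarted and an incremental re-index run after the schema changes.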

qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this issue May 10, 2021
kcondon added a commit that referenced this issue May 14, 2021
…ple_search_misses_fields

Quick fix for #7864 (Simple search of most metadata fields broken)