New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle multilingual tokenized fields #621
Conversation
4a43a81
to
ea537e6
Compare
|
||
if source[key].is_a?(Hash) | ||
source[key].each do |language_code, values| | ||
sink["#{key}.#{language_code}_tsim"] = values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This brings symmetry between the lang-aware _ssim
fields and the lang-aware _tsim
fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, and where is the test for THIS change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe lines 62-64 test this code.
ea537e6
to
4cb621e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
I'm a little muzzy on
- whether we have the suffix right in the comments
- whether there should be a method comment for the new sortable_blah method
- whether the change for _tsim fields is tested
- whether the tests are as clear as we would like. It seems like this may be tested, but ...
I guess in a perfect world, i would like the tests to better clarify what is going on in the code for multilingual values.
@@ -31,6 +31,6 @@ | |||
"cho_is_part_of": "http://pudl.princeton.edu/collections/pudl0100", | |||
"cho_language": "Arabic", | |||
"cho_spatial": ["Egypt"], | |||
"cho_title": "افواه و ارانب", | |||
"cho_title": {"ar-Arab": ["افواه و ارانب"]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming that we have some test data with the language and some without?? 'Cause we want that, yeah?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ndushay If I understand you correctly, yes, this fixture does already have a mix of multilingual fields and lang-ignorant fields.
@@ -33,7 +33,7 @@ | |||
'cho_is_part_of_ssim' => 'http://pudl.princeton.edu/collections/pudl0100', | |||
'cho_language_ssim' => 'Arabic', | |||
'cho_spatial_ssim' => ['Egypt'], | |||
'cho_title_ssim' => 'افواه و ارانب', | |||
'cho_title.ar-Arab_ssim' => ['افواه و ارانب'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it appears there is a test for sortable_cho_title_ssi ... below? correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 75-78 test the sortable logic, and touches on both lang-aware and lang-ignorant fields.
|
||
if source[key].is_a?(Hash) | ||
source[key].each do |language_code, values| | ||
sink["#{key}.#{language_code}_tsim"] = values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, and where is the test for THIS change?
4cb621e
to
41208ca
Compare
An additional wrinkle uncovered when dealing with multilingual fields: sending a hash along in a tokenized field makes Solr unhappy. This PR adds awareness of multilingual fields (using the IIIF-y language hash), and reaches into these hashes to construct lang-specific Solr fields with arrays as values. Also, make sure the derived sortable field logic can deal with these same fields.
41208ca
to
f9645b0
Compare
Why was this change made?
An additional wrinkle uncovered when dealing with multilingual fields: sending a hash along in a tokenized field makes Solr unhappy. This PR adds awareness of multilingual fields (using the IIIF-y language hash), and reaches into these hashes to construct lang-specific Solr fields with arrays as values. Also, make sure the derived sortable field logic can deal with these same fields.
Was the documentation (README, API, wiki, ...) updated?
no.