New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong ordervalues for subjects in multilang repository #172
Comments
|
After further analysis I believe it is a regression from 2010 (and certainly from 3.0.5 in which sorting of subjects select field works as expected). More specifically EPrints::DataObj::Subject::get_all got broken by this commit: http://github.com/eprints/eprints/commit/c3c8e1 which is not a bad idea as such, but uncovers a more fundamental bug. It seems to me that the generation of ordervalues for subjects has never worked as intended and therefore my $results = $ds->search( custom_order => "name_name" ); can't do its job either. It won't be easy to fix, given that the metafield ordervalue method, which is supposed to generate a language-specific string, is called on the low-level text metafield name_name, which doesn't know anything about its sibling name_lang. I guess Multilang.pm is the place which is supposed to bring together both of them, but it is out-of-context in EPrints::Index::_do_ordervalues where ordervalue generation occurs for each individual db storable subfield separately (i.e. field names derived from keys of $subject->get_data). I'm at a loss how it should be fixed properly. As a workaround I'm going to hack local EPrints::Index::_do_ordervalues to include special handling for name_name. |
|
Shouldn't Multi-lang::lang_value handle this (via Repository::best_language)? I'm currently looking into replacing some of these concepts with a xapian back-end for searching/retrieving/ordering. |
|
The trouble is that Multilang is not at all invoked from _do_ordervalues. But I tried to hack it into _do_ordervalues as an experiment - something along these lines: instead of using the (text) metafield for "name_name" use the multilang metafield for "name". It didn't work because Multilang.pm expects the argument to ordervalue to be in some other format than Text.pm. In fact I have suspicions that Multilang.pm might be buggy in its own way in 3.3.12 - I can't quite see how value_to_langhash can produce data in the required format for its later consumer ordervalue_single (the lack of type specifications really makes it hard to grasp). |
|
Here's my current crude workaround: |
|
OK I see - I guess Multilang is not invoked cos it's a Compound and Compound fields are "virtual" (they don't exist as such in the DB) so the following line skips it: next if $field->is_virtual; I guess there could be something on those lines (though I don't like the ->isa test): my $field = $dataset->field( $fieldname ) or next;
if( $field->isa( 'EPrints::MetaField::Multilang' ) || !$field->is_virtual)
{
# get the ordervalues
my $ov = $field->ordervalue( ..... );
# etc.
}
|
|
It's not just about is_virtual, it's worse than that. The loop iterates over keys from %$changed, which in case of insert_ordervalues equals keys from $subject->get_data. This hash doesn't even contain keys for virtual fields - it only contains 'name_name' and 'name_lang' (but not 'name'). |
|
OK - I see $changed is passed to _do_ordervalues when a data-obj is updated to avoid unnecessary DB queries. I'm wondering if that small optimisation is really useful and whether data-obj should be fully re-indexed/re-ordered when they're updated. This way we can iterate over $dataob->dataset->fields rather than keys %$changed. |
|
Sounds good under assumption that values provided by get_data item can be processed by the fields. For example I had problems passing $value to Multilang ordervalue function - which value do I pass? The one stored under the name_name key? Or the one under the name_lang key? Or both, somehow combined? But perhaps it would work to iterate over fields and call get_value on each of them and passing that to ordervalue of the same field (I haven't checked). What I also find suspicious is that currently in the ordervalues table there are separate attributes for "name_name" and "name_lang". Logically, I would expect just "name" to be stored there. So I'm worried that either I don't understand the ordervalues design or maybe there's something inherently contradictory within it. This is certainly territory where one must be careful not to introduce new bugs by "optimistic" fixes. |
|
Jan, That issue had been overlooked by the former dev's - Multilang fields should - in most cases - give a n-to-1 mapping: the DB may store N values (one for GE, one for EN, FR ....) but in most cases only one value should be selected/displayed. Multilang titles are common in multilingual repositories and when enabling them they break most Import/Export plug-ins which expects a single text value for the title field (that's been flagged up on github already). This is a similar issue for the ordervalues: when generating the GE ordervalues only the GE value should be selected (as you said, it makes no sense to have the EN value in GE ordering....). So we need some kind of wrapper function to get_value which gives either a "selected" value (via langid in this scenario) or which gives the entire data-structure ({name,lang}) back. This is one of the issues I want to have fixed for the next (minor) release. (ok - end of my monologue :)) I agree with you that for Multilang fields only the "main field name" (so 'name' here) should be generated - again because of that unusual n-to-1 mapping. For some other Compound fields, it may make sense to have the sub-fields used for the ordervalues (e.g. creators/creators_name). About your first question: you need to pass the actual value which represents what is stored in the DB (the "raw" value if you like), for multi-lang something like: $rawvalue = [
{ name => 'Wie geht es dir?', lang => 'ge' },
{ name => 'How are you?', lang => 'en' },
...
];
$multilang->ordervalue will then select the most appropriate value given $langid. So a potential pseudo-patch is: foreach $langid ( $repo->config( 'languages' ) )
foreach $field ( $dataobj->dataset->fields )
if( multilang )
do the funky ordervalue generation
if( virtual )
skip
else
process as normal
That is, if we all agree that a data-obj should always be fully re-indexed and re-ordered when it has been modified. |
|
We should have had that conversation on the forum! |
|
Hi everyone, is there any news on this issue? |
|
presumably some added functions of 3.4.2 could help!? |
I'm having trouble with incorrect sorting of the subjects input field in a 3.3.12 repository supporting two languages (German and English). EPrints::DataObj::Subject::get_all returns labels in wrong order. I suspect that the content of subject__ordervalues_de and subject__ordervalues_en is generated incorrectly, but it's hard to tell what it's supposed to be lacking proper db schema documentation.
For example, given a subjectid='dfd' I have
subject__ordervalues_de.name_name = 'Deutsches
Fernerkundungsdatenzentrum:German Remote Sensing Data Center'
which seems correct, but at the same time:
subject__ordervalues_en.name_name = 'Deutsches Fernerkundungsdatenzentrum:German Remote Sensing Data Center'
That is, exactly the same value, whereas I would expect the German and English strings to be at least swapped for ORDER BY to work correctly. (I don't understand why both values are written separated by a colon to each specific language table, though - does it make sense to write a German version into ordervalues_en at all?).
When I swap the pos attribute value for 'de' and 'en' in subject_name_name and subject_name_lang tables, the ordering in the other language gets broken. I can't get it to order correctly in both languages.
Truncating and regenerating the ordervalues tables as in epadmin/upgrade_3_3_3_to_3_3_4 does not help, so it doesn't seem to be a with corrupted data.
The text was updated successfully, but these errors were encountered: