Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong ordervalues for subjects in multilang repository #172

Open
jploski opened this issue Dec 2, 2013 · 12 comments
Open

Wrong ordervalues for subjects in multilang repository #172

jploski opened this issue Dec 2, 2013 · 12 comments

Comments

@jploski
Copy link

@jploski jploski commented Dec 2, 2013

I'm having trouble with incorrect sorting of the subjects input field in a 3.3.12 repository supporting two languages (German and English). EPrints::DataObj::Subject::get_all returns labels in wrong order. I suspect that the content of subject__ordervalues_de and subject__ordervalues_en is generated incorrectly, but it's hard to tell what it's supposed to be lacking proper db schema documentation.

For example, given a subjectid='dfd' I have

subject__ordervalues_de.name_name = 'Deutsches
Fernerkundungsdatenzentrum:German Remote Sensing Data Center'

which seems correct, but at the same time:

subject__ordervalues_en.name_name = 'Deutsches Fernerkundungsdatenzentrum:German Remote Sensing Data Center'

That is, exactly the same value, whereas I would expect the German and English strings to be at least swapped for ORDER BY to work correctly. (I don't understand why both values are written separated by a colon to each specific language table, though - does it make sense to write a German version into ordervalues_en at all?).

When I swap the pos attribute value for 'de' and 'en' in subject_name_name and subject_name_lang tables, the ordering in the other language gets broken. I can't get it to order correctly in both languages.

Truncating and regenerating the ordervalues tables as in epadmin/upgrade_3_3_3_to_3_3_4 does not help, so it doesn't seem to be a with corrupted data.

@jploski
Copy link
Author

@jploski jploski commented Dec 2, 2013

After further analysis I believe it is a regression from 2010 (and certainly from 3.0.5 in which sorting of subjects select field works as expected). More specifically EPrints::DataObj::Subject::get_all got broken by this commit: http://github.com/eprints/eprints/commit/c3c8e1 which is not a bad idea as such, but uncovers a more fundamental bug.

It seems to me that the generation of ordervalues for subjects has never worked as intended and therefore my $results = $ds->search( custom_order => "name_name" ); can't do its job either.

It won't be easy to fix, given that the metafield ordervalue method, which is supposed to generate a language-specific string, is called on the low-level text metafield name_name, which doesn't know anything about its sibling name_lang. I guess Multilang.pm is the place which is supposed to bring together both of them, but it is out-of-context in EPrints::Index::_do_ordervalues where ordervalue generation occurs for each individual db storable subfield separately (i.e. field names derived from keys of $subject->get_data).

I'm at a loss how it should be fixed properly. As a workaround I'm going to hack local EPrints::Index::_do_ordervalues to include special handling for name_name.

@sebastfr
Copy link

@sebastfr sebastfr commented Dec 2, 2013

Shouldn't Multi-lang::lang_value handle this (via Repository::best_language)?

I'm currently looking into replacing some of these concepts with a xapian back-end for searching/retrieving/ordering.

@jploski
Copy link
Author

@jploski jploski commented Dec 2, 2013

The trouble is that Multilang is not at all invoked from _do_ordervalues.

But I tried to hack it into _do_ordervalues as an experiment - something along these lines: instead of using the (text) metafield for "name_name" use the multilang metafield for "name". It didn't work because Multilang.pm expects the argument to ordervalue to be in some other format than Text.pm. In fact I have suspicions that Multilang.pm might be buggy in its own way in 3.3.12 - I can't quite see how value_to_langhash can produce data in the required format for its later consumer ordervalue_single (the lack of type specifications really makes it hard to grasp).

@jploski
Copy link
Author

@jploski jploski commented Dec 3, 2013

Here's my current crude workaround:

--- eprints-3.3.12-src/perl_lib/EPrints/Index.pm    2013-07-30 14:07:46.000000000 +0200
+++ eprints/perl_lib/EPrints/Index.pm   2013-12-03 00:13:26.000000000 +0100
@@ -258,11 +258,19 @@
            my $field = $dataset->field( $fieldname );
            next if $field->is_virtual;

-           my $ov = $field->ordervalue( 
-                   $data->{$fieldname},
-                   $session,
-                   $langid,
-                   $dataset );
+            my $ov;
+            if ($fieldname eq 'name_name') {
+                my $i = 0;
+                my $lang_i = { map { $_ => $i++ } @{$data->{name_lang}} }->{$langid};
+                $ov = $data->{name_name}->[$lang_i];
+            }
+            else {
+               $ov = $field->ordervalue( 
+                       $data->{$fieldname},
+                       $session,
+                       $langid,
+                       $dataset );
+            }

            push @fnames, $field->get_sql_name();
            push @fvals, $ov;

@sebastfr
Copy link

@sebastfr sebastfr commented Dec 3, 2013

OK I see - I guess Multilang is not invoked cos it's a Compound and Compound fields are "virtual" (they don't exist as such in the DB) so the following line skips it:

next if $field->is_virtual;

I guess there could be something on those lines (though I don't like the ->isa test):

my $field = $dataset->field( $fieldname ) or next;
if( $field->isa( 'EPrints::MetaField::Multilang' ) || !$field->is_virtual)
{
 # get the ordervalues
 my $ov = $field->ordervalue( ..... );
 # etc.
}

@jploski
Copy link
Author

@jploski jploski commented Dec 3, 2013

It's not just about is_virtual, it's worse than that. The loop iterates over keys from %$changed, which in case of insert_ordervalues equals keys from $subject->get_data. This hash doesn't even contain keys for virtual fields - it only contains 'name_name' and 'name_lang' (but not 'name').

@sebastfr
Copy link

@sebastfr sebastfr commented Dec 3, 2013

OK - I see $changed is passed to _do_ordervalues when a data-obj is updated to avoid unnecessary DB queries. I'm wondering if that small optimisation is really useful and whether data-obj should be fully re-indexed/re-ordered when they're updated. This way we can iterate over $dataob->dataset->fields rather than keys %$changed.

@jploski
Copy link
Author

@jploski jploski commented Dec 3, 2013

Sounds good under assumption that values provided by get_data item can be processed by the fields. For example I had problems passing $value to Multilang ordervalue function - which value do I pass? The one stored under the name_name key? Or the one under the name_lang key? Or both, somehow combined? But perhaps it would work to iterate over fields and call get_value on each of them and passing that to ordervalue of the same field (I haven't checked).

What I also find suspicious is that currently in the ordervalues table there are separate attributes for "name_name" and "name_lang". Logically, I would expect just "name" to be stored there. So I'm worried that either I don't understand the ordervalues design or maybe there's something inherently contradictory within it. This is certainly territory where one must be careful not to introduce new bugs by "optimistic" fixes.

@sebastfr
Copy link

@sebastfr sebastfr commented Dec 3, 2013

Jan,

That issue had been overlooked by the former dev's - Multilang fields should - in most cases - give a n-to-1 mapping: the DB may store N values (one for GE, one for EN, FR ....) but in most cases only one value should be selected/displayed. Multilang titles are common in multilingual repositories and when enabling them they break most Import/Export plug-ins which expects a single text value for the title field (that's been flagged up on github already).

This is a similar issue for the ordervalues: when generating the GE ordervalues only the GE value should be selected (as you said, it makes no sense to have the EN value in GE ordering....). So we need some kind of wrapper function to get_value which gives either a "selected" value (via langid in this scenario) or which gives the entire data-structure ({name,lang}) back. This is one of the issues I want to have fixed for the next (minor) release. (ok - end of my monologue :))

I agree with you that for Multilang fields only the "main field name" (so 'name' here) should be generated - again because of that unusual n-to-1 mapping. For some other Compound fields, it may make sense to have the sub-fields used for the ordervalues (e.g. creators/creators_name).

About your first question: you need to pass the actual value which represents what is stored in the DB (the "raw" value if you like), for multi-lang something like:

$rawvalue = [ 
 { name => 'Wie geht es dir?', lang => 'ge' },
 { name => 'How are you?', lang => 'en' },
 ...
];

$multilang->ordervalue will then select the most appropriate value given $langid.

So a potential pseudo-patch is:

foreach $langid ( $repo->config( 'languages' ) )
  foreach $field ( $dataobj->dataset->fields )
    if( multilang )
       do the funky ordervalue generation
    if( virtual )
        skip
    else
         process as normal

That is, if we all agree that a data-obj should always be fully re-indexed and re-ordered when it has been modified.

@sebastfr
Copy link

@sebastfr sebastfr commented Dec 3, 2013

We should have had that conversation on the forum!

@dennmuel
Copy link

@dennmuel dennmuel commented Nov 26, 2018

Hi everyone, is there any news on this issue?

@Pfiffikus
Copy link

@Pfiffikus Pfiffikus commented Aug 12, 2020

presumably some added functions of 3.4.2 could help!?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants