Use Unicode char ranges when processing names
Fixes #13

Use \p{...} Unicode char ranges when processing names, which fixes names being
incorrectly split when they contain a non-[A-Z].

Moved the [A-Z] regexp that splits initials (e.g. "AJ Bower") above the line
that lowercases the name. It previously ran after lowercasing, so initials were
never split (a bug more than 6 years old).
Tim Brody committed Jan 17, 2013
1 parent 5837839 commit 8108c2ff55255c0e300cfcb4b64ef3331c710da5
Showing with 4 additions and 4 deletions.
  1. +4 −4 perl_lib/EPrints/MetaField/Name.pm
@@ -210,6 +210,9 @@ sub get_search_conditions
 $indexmode = "index_start";
 }

+# split up initials
+$v2 =~ s/([\p{Uppercase}])/ $1/g;
+
 # name searches are case sensitive
 $v2 = "\L$v2";

@@ -223,11 +226,8 @@ sub get_search_conditions
 }


-# split up initials
-$v2 =~ s/([A-Z])/ $1/g;
-
 # remove not a-z characters (except ,)
-$v2 =~ s/[^a-z,]/ /ig;
+$v2 =~ s/[^\p{Lowercase},]/ /ig;

 my( $family, $given ) = split /\s*,\s*/, $v2;
 my @freetexts = ();
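
To illustrate why the order of the two steps matters, here is a minimal standalone sketch. The helper names `split_then_lowercase` and `lowercase_then_split` are hypothetical and not part of EPrints; only the two regexps and the `\L` lowercasing are taken from the diff.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (not EPrints code): split first, then lowercase,
# which is the order this commit introduces.
sub split_then_lowercase {
    my ($name) = @_;
    $name =~ s/([\p{Uppercase}])/ $1/g;   # put a space before each capital
    return "\L$name";                     # then fold to lowercase
}

# Hypothetical helper: the pre-commit order. After lowercasing there are
# no capitals left, so the [A-Z] substitution never matches.
sub lowercase_then_split {
    my ($name) = @_;
    $name = "\L$name";
    $name =~ s/([A-Z])/ $1/g;
    return $name;
}

binmode STDOUT, ':encoding(UTF-8)';
print '"', split_then_lowercase('Bower,AJ'), "\"\n";   # prints " bower, a j"
print '"', lowercase_then_split('Bower,AJ'), "\"\n";   # prints "bower,aj"
print '"', split_then_lowercase('Šimić,ŽK'), "\"\n";   # prints " šimić, ž k"
```

The last call shows the effect of `\p{Uppercase}`: capitals outside A-Z are split as well, which a plain `[A-Z]` class would miss.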

3 comments on commit 8108c2f

@phluid61 replied May 9, 2016

Is that the right regex at line 230? /\p{Lowercase}/i becomes /\p{Cased}/. I believe \p{Letter} would make more sense.

Especially since I doubt there'd be many names with U+2170 Small Roman Numeral One in (which matches \p{Lowercase})
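
The difference between the properties mentioned above can be checked with a small sketch. The `classify` helper and the sample list are assumptions made for illustration; the `/i` flag is deliberately left out so that each property is tested on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: report which of the three properties a single
# character matches. No /i flag, so each property is tested as written.
sub classify {
    my ($ch) = @_;
    return {
        lowercase => ( $ch =~ /\p{Lowercase}/ ? 1 : 0 ),
        cased     => ( $ch =~ /\p{Cased}/     ? 1 : 0 ),
        letter    => ( $ch =~ /\p{Letter}/    ? 1 : 0 ),
    };
}

my @samples = (
    [ 'U+0151 o with double acute',     "\x{151}"  ],
    [ 'U+0041 capital A',               'A'        ],
    [ 'U+2170 small roman numeral one', "\x{2170}" ],
    [ 'U+05D0 Hebrew alef',             "\x{5D0}"  ],
);

for my $sample (@samples) {
    my ( $label, $ch ) = @$sample;
    my $c = classify($ch);
    printf "%-32s Lowercase=%d Cased=%d Letter=%d\n",
        $label, @{$c}{qw(lowercase cased letter)};
}
```

U+2170 has general category Nl (letter number), so it matches `\p{Lowercase}` but not `\p{Letter}`, while an uncased letter such as Hebrew alef matches only `\p{Letter}`.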

@phluid61 replied May 9, 2016

Don't forget the apostrophe.
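
A rough sketch of the apostrophe problem, for illustration only. `filter_committed` mirrors the committed substitution without its `/i` flag (the value is already lowercase at that point); `filter_keep_apostrophe` is a hypothetical variant, not a proposed patch, and its inclusion of U+2019 (right single quotation mark) is an assumption.

```perl
use strict;
use warnings;

# Mirrors the committed filter, minus /i: everything that is not a
# lowercase character or a comma becomes a space.
sub filter_committed {
    my ($v) = @_;
    $v =~ s/[^\p{Lowercase},]/ /g;
    return $v;
}

# Hypothetical variant: keep any letter, the comma, the ASCII apostrophe
# and U+2019.
sub filter_keep_apostrophe {
    my ($v) = @_;
    $v =~ s/[^\p{Letter}',\x{2019}]/ /g;
    return $v;
}

print filter_committed("o'brien,c"), "\n";         # prints "o brien,c"
print filter_keep_apostrophe("o'brien,c"), "\n";   # prints "o'brien,c"
```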

@mpbraendle replied May 9, 2016

Is author search with wildcards possible? I don't think so.