
fix initial match problem

grobie committed Oct 8, 2009 · commit 6bcca11792e45de0ea9c93c7af41b00e8eaa8afc · 1 parent 280b251
Showing with 6 additions and 8 deletions.
  1. +0 −3 TODO
  2. +6 −5 distance.rb
3 TODO
@@ -1,5 +1,2 @@
-* Data cleaning: Herr and Frau (Mr. and Mrs.) ...
-
* Similarity comparison against reference data
-* First-name matching (Ä. <=> Ämil)
* House-number ranges
11 distance.rb
@@ -1,4 +1,5 @@
require 'amatch'
+require 'unicode'
module Distance
include Amatch
@@ -9,14 +10,14 @@ def self.edit_distance(s1, s2)
end
def self.edit_distance_initial(s1, s2)
- s1,s2 = s1.downcase, s2.downcase
- if s1 =~ /^[a-zäöüÄÖÜ]\./ || s2 =~ /^[a-zäöüÄÖÜ]\./
- i1 = s1 =~ /^[äöüÄÖÜ]/ ? s1[0,2] : s1[0,1]
- i2 = s2 =~ /^[äöüÄÖÜ]/ ? s2[0,2] : s2[0,1]
+ s1,s2 = Unicode.downcase(s1), Unicode.downcase(s2)
+ if s1 =~ /^[a-zäöüÄÖÜ]\.*/u || s2 =~ /^[a-zäöüÄÖÜ]\.*/u
+ i1 = s1 =~ /^[äöüÄÖÜ]/u ? s1[0,2] : s1[0,1]
+ i2 = s2 =~ /^[äöüÄÖÜ]/u ? s2[0,2] : s2[0,1]
i1 == i2 ? 0 : 1
else
edit_distance(s1,s2)
end
end
-end
+end
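For comparison, here is a minimal sketch of the same initial-matching idea in present-day Ruby (2.4 or later). Since Ruby 2.4, `String#downcase` handles non-ASCII letters, and since 1.9 strings are indexed by character rather than by byte. The `unicode` gem and the two-byte `s1[0,2]` slice for umlauts, which appear to target Ruby 1.8's byte-oriented strings (an umlaut is two bytes in UTF-8), are therefore not needed here. The module name `NameDistance`, the `levenshtein` helper (a stand-in for the Amatch distance used by `edit_distance`), and the `INITIAL` pattern are assumptions for illustration, not part of this repository. The pattern treats a single letter, optionally followed by a dot, as an initial. By contrast, the committed `\.*` makes the dot optional without anchoring the end, so it matches any string that begins with one of the listed letters.

```ruby
# Hypothetical sketch, not the repository's code. Requires Ruby 2.4+
# for Unicode-aware String#downcase and String#match?.
module NameDistance
  # Assumption: an initial is exactly one letter, optionally followed by a dot,
  # e.g. "ä." or "ä". \p{L} accepts any Unicode letter, not just a-z and umlauts.
  INITIAL = /\A\p{L}\.?\z/

  # Plain Levenshtein distance over characters (stand-in for Amatch).
  def self.levenshtein(a, b)
    prev = (0..b.length).to_a
    a.each_char.with_index(1) do |ca, i|
      curr = [i]
      b.each_char.with_index(1) do |cb, j|
        cost = ca == cb ? 0 : 1
        # deletion, insertion, substitution
        curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
      end
      prev = curr
    end
    prev.last
  end

  # Returns 0 or 1 when either name is an initial (comparing first characters),
  # otherwise falls back to the full edit distance.
  def self.edit_distance_initial(s1, s2)
    s1, s2 = s1.downcase, s2.downcase
    if s1.match?(INITIAL) || s2.match?(INITIAL)
      s1[0] == s2[0] ? 0 : 1
    else
      levenshtein(s1, s2)
    end
  end
end
```

With this sketch, `NameDistance.edit_distance_initial("Ä.", "Ämil")` returns 0, `NameDistance.edit_distance_initial("E.", "Ämil")` returns 1, and two full names such as `"Emil"` and `"Erik"` fall through to the edit distance (2).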
