
Test extended

javacook committed Sep 28, 2018
2 parents 87d1360 + f1da0ae commit 85510a4c79bbf53df15e6c5d8d05328441c1f11b
@@ -1,25 +1,43 @@
This library is abel to divide (German) a concatenated street names
consisting of the pure street name the house number and its affix
into its single parts. Examples:

Some streets contain a number as suffix or infix so that it is
impossible to decide whether this number is a house number or
a part of the street name itselfs. An example is "Straße 101".
This could be the street "Straße" with house number "101" but
in reality this the number "101" is part of the street name
(in Berlin). To make this decision unambiguous the street
"Straße 101" was added to a list of "special streets". This list
can be found in the file "specialstreets.txt".
This library splits a concatenated (German) street address, consisting
of the bare street name, the house number, and its affix, into its
individual parts.

Some street names themselves contain a number as a suffix or infix, so
it is impossible to decide from the text alone whether this number is
a house number or belongs to the street name. An example is
"Straße 101" in Berlin. At first glance this seems to be a street
"Straße" with house number "101", but in reality the number "101" is
part of the street name. To make this decision unambiguous, the street
"Straße 101" is added to a list of "special streets". This list
can be found in the file "specialstreets.txt" and must be updated
periodically.
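
The special-street lookup described above can be sketched as a longest-prefix search. This is only an illustration under assumptions: the set `specialStreets`, the helper `normalize`, and the function `longestSpecialPrefix` are hypothetical names, the set stands in for the contents of "specialstreets.txt", and the library's own name standardization is probably more elaborate than lowercasing.

```kotlin
// Hypothetical stand-in for the entries of specialstreets.txt.
val specialStreets = setOf("straße 101", "c 3", "in den 30 morgen")

// Simplified normalization (assumption): the library's real
// standardization of street names may do more than lowercasing.
fun normalize(name: String): String = name.lowercase()

// Returns the longest prefix of the trimmed input that is a
// special street name, or null if no prefix matches.
fun longestSpecialPrefix(input: String): String? {
    val trimmed = input.trim()
    for (end in trimmed.length downTo 1) {
        val candidate = trimmed.substring(0, end)
        if (normalize(candidate) in specialStreets) return candidate
    }
    return null
}

fun main() {
    val input = "Straße 101 Nr. 12"
    val street = longestSpecialPrefix(input)
    // Everything after the special street is the house number part.
    val rest = if (street != null) input.trim().substring(street.length) else input
    println("street=$street, rest=${rest.trim()}")
}
```

The sketch deliberately omits the extra check that keeps an input such as "B54" from being split into a special street "B5" and house number 4.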

More Examples:

input | street | house no | affix
-------------------- | ---------------- | -------- | -------
Gartenstr. 25a | Gartenstr. | 25 | a
Brückenstr. 12a-13c | Brückenstr. | 12 | a-13c
Straße 101 Nr. 12 | Straße 101 | 12 | null
Straße 101 Nr. 12 | Straße 101 | 12 |
C 3 54 | C 3 | 54 |
In den 30 Morgen 34b | In den 30 Morgen | 34 | b
C 3 54 | C 3 | 54 | null


In the examples above you can see the very unusual street names of the
city of Mannheim. The city center is organized as a grid with names
like A4 or C3.

Usage
-----
val streetDivider = StreetDivider()
println(streetDivider.parse("Gartenstr. 25a"))

-> Location(street=Gartenstr., houseNumber=25, houseNoAffix=a)
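
The printed result suggests a Kotlin data class with three properties. A minimal sketch of such a result type follows; the nullable types and default values are assumptions inferred from the output above and from the tests, not the library's confirmed definition.

```kotlin
// Assumed shape of the result type, inferred from the printed output;
// the real class in the library may differ.
data class Location(
    val street: String,
    val houseNumber: Int? = null,
    val houseNoAffix: String? = null
)

fun main() {
    val location = Location("Gartenstr.", 25, "a")
    // Data classes generate toString(), so this prints all three fields.
    println(location)

    // Destructuring gives direct access to the individual parts.
    val (street, houseNumber, affix) = location
    println("$street | $houseNumber | ${affix ?: ""}")
}
```

With such a type, an input without a house number would simply leave `houseNumber` and `houseNoAffix` as null.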

Try it out
----------

You can download (downloads/streetdivider.jar) a small executable jar with UI and test
the street decomposition with your own street names...

![streetdivider.png](streetdivider.png)
@@ -1,8 +1,8 @@
group 'de.kotlincook.textmining'
version '1.4'
version '1.6-SNASHOT'

buildscript {
ext.kotlin_version = '1.2.21'
ext.kotlin_version = '1.2.71'

repositories {
mavenCentral()
Binary file not shown.
@@ -38,12 +38,14 @@ open class StreetDivider(private val dictionary: Dictionary) {
val inputTrimmed = input.trim()
var street1 = inputTrimmed

// Search for the longest prefix of inputTrimmed which is a special street name
for (ch in inputTrimmed.reversed()) {
if (dictionary.contains(street1.standardizeStreetName())) break
street1 = street1.removeSuffix(ch.toString())
}

if (street1.isNotEmpty()) {
// inputTrimmed starts with a special street...
val houseNoWithAffix = inputTrimmed.substring(street1.length)
// The following "if" prevents B54 from being divided into B5 and house no 4
// if B5 is a special street:
@@ -60,6 +62,7 @@ open class StreetDivider(private val dictionary: Dictionary) {
}
}
}
// inputTrimmed does not start with a special street name or is of the "B54 case" (see above)
val (street2, houseNoWithAffix) = divideIntoStreetAndHouseNoWihAffixDueToNumber(inputTrimmed)
if (street2 == "" || houseNoWithAffix == null) {
return Location(inputTrimmed.removeTrailingSpecialChars())
@@ -20,6 +20,7 @@ class StreetDividerTest extends Specification {
def "division of compound street names into their parts works correct"() {
expect:
streetDivider.parse(input) == new Location(street, houseNo, affix)
streetDivider.parse(" " + input + " ") == new Location(street, houseNo, affix)

where:
input | street | houseNo | affix
@@ -28,13 +29,25 @@ class StreetDividerTest extends Specification {
"A" | "A" | null | null
"5" | "5" | null | null
"143" | "143" | null | null
"5001 44" | "5001" | 44 | null
"374, 4" | "374" | 4 | null
"B 4 10–10a" | "B 4" | 10 | "–10a"
"B45" | "B" | 45 | null
"B4 5" | "B4" | 5 | null
"D4" | "D4" | null | null
"D 4" | "D 4" | null | null
"D 4 3" | "D 4" | 3 | null
"D 4 3 8" | "D 4" | 3 | "8"
"D 4 3b" | "D 4" | 3 | "b"
"D 4 3 8b" | "D 4" | 3 | "8b"
"D 4, 3" | "D 4" | 3 | null
"D4" | "D4" | null | null
"D4 31" | "D4" | 31 | null
"D43 1" | "D" | 43 | "1"
"D431" | "D" | 431 | null
"D,431" | "D" | 431 | null
"D+431" | "D" | 431 | null
"D-431" | "D" | 431 | null
"D.431" | "D." | 431 | null
"D4, 3" | "D4" | 3 | null
"D4, Nr. 3" | "D4" | 3 | null
"D4, Nr.3" | "D4" | 3 | null
@@ -49,7 +62,7 @@ class StreetDividerTest extends Specification {
"Bundesstr. 2 Nr.0" | "Bundesstr. 2" | 0 | null
"Bundesstr. 2 Nr.O" | "Bundesstr. 2 Nr.O" | null | null
"Straße 73 5a" | "Straße 73" | 5 | "a"
" Straße 73" | "Straße 73" | null | null
"Straße 73" | "Straße 73" | null | null
"Stra ße 73 5a" | "Stra ße" | 73 | "5a"
"Str. 73 5a" | "Str. 73" | 5 | "a"
"Strasse73 5a" | "Strasse" | 73 | "5a"
@@ -99,6 +112,7 @@ class StreetDividerTest extends Specification {
"Dünè 5" | "Dünè 5" | null | null
"Duené5" | "Duené5" | null | null
"Dunê5" | "Dunê" | 5 | null
"Åbjerg 17423 ZZ" | "Åbjerg" | 17423 | "ZZ"
}

}
BIN +17.8 KB streetdivider.png
Binary file not shown.
