Updates to GB phone number data and tests => 1.7.9 #55

Merged
merged 5 commits into from Aug 20, 2012

g1smd commented Aug 19, 2012

Additional GB phone number ranges defined.

Add more GB test numbers and ensure all GB test numbers are in valid prefixes and ranges.

Tidy various RegEx patterns and fix an obvious typo.

travisbot Aug 19, 2012

This pull request fails (merged dc8cc17 into 501ecc8).

g1smd Aug 19, 2012

Contributor

This might also be useful... https://github.com/g1smd/Drupal-CCK-Phone-GB/blob/master/phone.gb.inc

It's a PHP routine that validates GB numbers entered in multiple formats, extracts the NSN and checks it is a valid length and in a valid range, and then formats the number according to both the length and initial digits of the NSN.

floere Aug 19, 2012

Owner

Hi @g1smd – can you help me with this failing spec http://travis-ci.org/#!/floere/phony/jobs/2166646 ? How should it be?

g1smd Aug 19, 2012

Contributor

Neither the 0 trunk prefix nor the +44 country code is part of the digit count.

Numbers with 9 digits and beginning 500 should format as 0500 557799.

Numbers with 10 digits and beginning 800 should format as 0800 555 7799.
Numbers with 9 digits and beginning 800 should format as 0800 557799.

Numbers with 10 digits and beginning 808 should format as 0808 555 7799.

travisbot Aug 19, 2012

This pull request fails (merged 13b2649 into 501ecc8).

travisbot Aug 19, 2012

This pull request fails (merged aaff357 into 501ecc8).

travisbot Aug 19, 2012

This pull request fails (merged 3e1f4c5 into 501ecc8).

travisbot Aug 19, 2012

This pull request passes (merged cd9cda9 into 501ecc8).

g1smd Aug 19, 2012

Contributor

Although this now passes the tests, it doesn't yet correctly handle 0500 + 6 digit, 0800 + 6 digit, or 016977 + 4 digit numbers.

The full list of valid UK telephone number formats is here: http://www.aa-asterisk.org.uk/index.php/Number_format

floere Aug 20, 2012

Owner

Thanks! Let's not worry about perfection – in the end, it's more important that the number is shown, than shown perfectly formatted.

floere added a commit that referenced this pull request Aug 20, 2012

Merge pull request #55 from g1smd/master
Updates to GB phone number data and tests.

@floere floere merged commit 23e8a91 into floere:master Aug 20, 2012

1 check passed

default The Travis build passed

g1smd Aug 20, 2012

Contributor

If I knew more about the way the code works I would attempt to add handling for 3+6 and 5+4 numbers.

floere Aug 20, 2012

Owner

No worries – I'll rewrite the UK file a bit and show you. Btw: https://github.com/floere/phony/wiki/Contributors

floere Aug 20, 2012

Owner

@g1smd Can you help me with this one? Which should it be?

Phony.split('44800557788').should  == ['44', '800', '557788']  # Freefone
Phony.split('448005878323').should == ['44', '800', '587', '8323'] # Freefone, regression

floere Aug 20, 2012

Owner

@g1smd See https://github.com/floere/phony/blob/master/lib/phony/countries/united_kingdom.rb#L188-190, or https://github.com/floere/phony/blob/master/lib/phony/countries.rb for the DSL used to define formatting/splitting.

It goes through the rules in order, eg. Belgium:

country '32', match(/^(70|800|90\d)\d+$/) >> split(3,3) | # Service
              match(/^(4[789]\d)\d{6}$/)  >> split(6)   | # Mobile
              one_of('2','3','4','9')     >> split(3,5) | # Short NDCs
              fixed(2)                    >> split(3,5)   # 2-digit NDCs

Service numbers are regexp matched first, followed by regexp matched mobile numbers, then a list of NDCs, finally the catchall rule which just splits off the first 2 digits.

Interesting here is that there are mobile numbers starting with 4 (second rule), but also an NDC starting with 4 (Brügge, rule 3) – since the mobile number is more specific, we put its rule before the short NDC rule.
If we switched the rules, any number starting with 4 would not make it to the mobile rule since it would be caught by the one_of short NDC rule first.
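
The effect of rule order can be sketched in plain Ruby. This is only an illustration of first-match-wins ordering: `RULES` and `classify` are hypothetical names, not part of Phony, and the lambdas merely imitate the four Belgian rules quoted above.

```ruby
# Hypothetical stand-in for an ordered rule list; this is NOT Phony's DSL.
# Each rule is [name, lambda]; the lambda returns the NDC to split off,
# or nil when the rule does not apply to the national number.
RULES = [
  [:service,   ->(n) { n[/\A(?:70|800|90\d)(?=\d+\z)/] }], # service numbers
  [:mobile,    ->(n) { n[/\A4[789]\d(?=\d{6}\z)/] }],      # mobile numbers
  [:short_ndc, ->(n) { n[/\A[2349]/] }],                   # 1-digit NDCs
  [:fixed,     ->(n) { n[0, 2] }]                          # catch-all: 2-digit NDC
].freeze

# Walk the rules in order and stop at the first one that applies.
def classify(national, rules = RULES)
  rules.each do |name, ndc_for|
    ndc = ndc_for.call(national)
    return [name, ndc, national[ndc.length..-1]] if ndc
  end
  nil
end

p classify('475123456')          # => [:mobile, "475", "123456"]

# Swapping the mobile and short-NDC rules hides the mobile rule entirely.
swapped = [RULES[0], RULES[2], RULES[1], RULES[3]]
p classify('475123456', swapped) # => [:short_ndc, "4", "75123456"]
```

With the rules swapped, the mobile number is caught by the broader one-digit rule, which is the situation described above.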

I hope that helps :)

g1smd Aug 20, 2012

Contributor

0800 numbers can have either 9 or 10 digits (not including the +44 or 0). Unless you want to make a list of the 1000 number allocation blocks (each one has either 1000 or 10000 numbers within) it is just way easier to allow the whole of 0800 to be either 9 or 10 digits long and show the format depending on the length of whatever was entered by the user.

0800 numbers with 10 digits are formatted as 0800 xxx xxxx.
0800 numbers with  9 digits are formatted as 0800 xxxxxx.

For 0500 numbers, all numbers have a total of 9 digits and are formatted as 0500 xxxxxx.

The UK number plan is quite complicated. It's not easy to cater for everything in a simple way.


In ELNS areas you have the situation where:

(01423) 5xxxxx with 10 digits is Harrogate [4+6]
(01423) 4xxxxx with 10 digits is Boroughbridge [4+6]

ELNS areas have more than one name within the area code with the first digit of the local number indicating which name applies:

http://www.aa-asterisk.org.uk/index.php/ELNS_areas

(01423) xxxxx  with  9 digits is always invalid for this area code (number too short).

Some area codes have different number lengths within the same area code:

(01884) 34xxx  with  9 digits is Tiverton [4+5]
(01884) 35xxx  with  9 digits is Tiverton [4+5]
(01884) 36xxxx with 10 digits is Tiverton [4+6]
(01884) 37xxxx with 10 digits is Tiverton [4+6]
(01884) 38xxx  with  9 digits is Tiverton [4+5]

There are 41 areas with a mix such as this:

http://www.aa-asterisk.org.uk/index.php/Local_dialling_rules_for_GB_telephone_numbers#Detailed_version

(01884) 36xxx  with  9 digits is invalid (number too short).
(01884) 37xxx  with  9 digits is invalid (number too short).

Mixed areas have a mix of different length area codes sharing the same initial digits.

In this case, all the numbers have the same total length:

(01539) 2xxxxx with 10 digits is Kendal [4+6]
(01539) 3xxxxx with 10 digits is Kendal [4+6]
(015394) xxxxx with 10 digits is Hawkshead [5+5]
(015395) xxxxx with 10 digits is Grange-over-Sands [5+5]
(015396) xxxxx with 10 digits is Sedbergh [5+5]
(01539) 7xxxxx with 10 digits is Kendal [4+6]

There are a number of areas like this:

http://www.aa-asterisk.org.uk/index.php/Mixed_areas

(01539) 3xxxx  with  9 digits is invalid (number too short).
(015396) xxxx  with  9 digits is invalid (number too short).

Things can get complicated. Some areas have both mixed length area codes and a mix of number lengths within one or more of the area codes.

In this case, the number length mix is in the parent area code:

(01946) 4xxxxx with 10 digits is Whitehaven [4+6]
(01946) 5xxxxx with 10 digits is Whitehaven [4+6]
(01946) 61xxx  with  9 digits is Whitehaven [4+5]
(01946) 62xxx  with  9 digits is Whitehaven [4+5]
(01946) 63xxx  with  9 digits is Whitehaven [4+5]
(019467) xxxxx with 10 digits is Gosforth [5+5]
(01946) 8xxxxx with 10 digits is Whitehaven [4+6]

There's a mix of 4 and 5 digit area codes, and the 4 digit area code has a mix of 9 and 10 digit numbers as 4+5 and 4+6 format.

(01946) 5xxxx  with  9 digits is invalid (number too short).
(019467) xxxx  with  9 digits is invalid (number too short).

The number length mix could be in the longer area code:

(01697) 2xxxxx with 10 digits is Brampton [4+6]
(016973) xxxxx with 10 digits is Wigton [5+5]
(016974) xxxxx with 10 digits is Raughton Head [5+5]
(016977) 2xxx  with  9 digits is Brampton [5+4]
(016977) 3xxx  with  9 digits is Brampton [5+4]
(016977) 4xxxx with 10 digits is Brampton [5+5]
(016977) 5xxxx with 10 digits is Brampton [5+5]

The 016977 area code has both 9 and 10 digit numbers, in 5+4 and 5+5 format. The 01697 area code has only 4+6 format numbers.

(01697) 2xxxx  with  9 digits is invalid (number too short).
(016973) xxxx  with  9 digits is invalid (number too short).
(016977) 4xxx  with  9 digits is invalid (number too short).
(01697) 8xxxx  with  9 digits is invalid (number too short).

There are several ways to validate and format UK numbers, but I have given up on the method where the number is first split into area code (NDC) and local number, with an attempt to then validate each part. It became way too complicated.

The old way was to try to validate these:

01697  + 6 digits beginning 2
016973 + 5 digits beginning 0 to 9
016974 + 5 digits beginning 0 to 9
01697  + 6 digits beginning 5 or 6
016977 + 4 digits beginning 2 or 3
016977 + 5 digits beginning 4 to 9
01697  + 6 digits beginning 8 or 9

and then format the number using similar rules.

I found that way too complicated to program.

With validation by initial digits and total number length as one process, and formatting as a separate process, the job is much simpler. It works something like this...

For validation there are just two rules:

0169772 and 0169773 numbers with 9 digits are valid.
01697 numbers with 10 digits are valid (at this stage, don't care how they will eventually format).

For formatting there's only a small number of rules:

0169772 and 0169773 numbers with 9 digits format as 5+4
016973, 016974 and 016977 numbers with 10 digits are formatted as 5+5.
01697 numbers not mentioned above are formatted as 4+6.
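
As a rough Ruby sketch of that two-step approach for the 01697 range only: the names `VALID_01697`, `valid_01697?` and `format_01697` are made up for illustration, and the input is assumed to be the NSN (no leading 0 or +44).

```ruby
# Step 1: validate by initial digits and total length only.
# 16977 2xxx / 16977 3xxx with 9 digits, or any 1697 number with 10 digits.
VALID_01697 = /\A(?:16977[23]\d{3}|1697\d{6})\z/

def valid_01697?(nsn)
  nsn.match?(VALID_01697)
end

# Step 2: format separately, again by length and leading digits.
def format_01697(nsn)
  return nil unless valid_01697?(nsn)

  ndc_length =
    if nsn.length == 9                 # 5+4 (only 016977 2/3 pass validation)
      5
    elsif nsn.match?(/\A1697[347]/)    # 5+5 for 016973, 016974, 016977
      5
    else                               # everything else in 01697 is 4+6
      4
    end

  "(0#{nsn[0, ndc_length]}) #{nsn[ndc_length..-1]}"
end

puts format_01697('169772123')   # (016977) 2123
puts format_01697('1697351234')  # (016973) 51234
puts format_01697('1697212345')  # (01697) 212345
p    format_01697('169774123')   # nil, 9 digits is too short for 016977 4
```

Validation never needs to know where the area code ends, which is why it collapses to a single pattern here.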

I took this approach in the code over here: https://github.com/g1smd/Drupal-CCK-Phone-GB/blob/master/phone.gb.inc

Just a few RegEx patterns validate every number range in use in the UK, and a short list of rulesets format those numbers.

For example, Freefone numbers are validated with:

^(?:80(?:0(?:1111|\d{6,7})|8\d{7})|500\d{6})$

Formatting is done with:

Ranges 80d (including 800) with 10 digits

Leading Digits: ^80[08]
Pattern: ^(80\d)(\d{3})(\d{4})$
Format: $1 $2 $3 

Ranges 500, 800 with 9 digits

Leading Digits: ^[58]00
Pattern: ^([58]00)(\d{6})$
Format: $1 $2 
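
For comparison, here is how those Freefone patterns might look in Ruby. It is a sketch, not the code from phone.gb.inc: `FREEPHONE`, `FREEPHONE_FORMATS` and `format_freephone` are hypothetical names, the validation pattern is written with the alternation grouped so that both anchors apply to every branch, and the fallback for ranges without a quoted ruleset (such as 0800 1111) is my own assumption.

```ruby
# Validation: NSN only (no leading 0 or +44).
FREEPHONE = /\A(?:80(?:0(?:1111|\d{6,7})|8\d{7})|500\d{6})\z/

# Formatting rulesets as [leading digits, pattern, replacement], tried in order.
FREEPHONE_FORMATS = [
  [/\A80[08]/, /\A(80\d)(\d{3})(\d{4})\z/, '\1 \2 \3'], # 80x with 10 digits
  [/\A[58]00/, /\A([58]00)(\d{6})\z/,      '\1 \2']     # 500/800 with 9 digits
].freeze

def format_freephone(nsn)
  return nil unless nsn.match?(FREEPHONE)

  FREEPHONE_FORMATS.each do |leading, pattern, template|
    next unless nsn.match?(leading) && nsn.match?(pattern)
    return '0' + nsn.sub(pattern, template)
  end

  # Assumption: valid numbers with no ruleset above are returned unsplit.
  "0#{nsn}"
end

puts format_freephone('8005557799') # 0800 555 7799
puts format_freephone('800557799')  # 0800 557799
puts format_freephone('500557799')  # 0500 557799
puts format_freephone('8085557799') # 0808 555 7799
```

The length of what was entered decides the format, so no list of allocation blocks is needed.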

I'll take a look to see if any of these patterns can be used here.

[Had to rewrite this post as half of it vanished minutes after original posting. In the new post I might not have included all the points originally made.]

floere Aug 21, 2012

Owner

For some strange reason, my post also vanished. Ugh.

In short:
First of all, many thanks for the detailed writeup. The UK is fast becoming the most complex numbering system in Phony ;)

Since Phony doesn't do validations (yet!), just plausibility tests, I am looking at implementing

0800 numbers with 10 digits are formatted as 0800 xxx xxxx.
0800 numbers with  9 digits are formatted as 0800 xxxxxx.

probably using match. Shall I or do you want to have a go?

g1smd Aug 21, 2012

Contributor

When I say "validation" I might not be using exactly the right word.

The routines I have written elsewhere will let you know that a number has the right number of digits to be valid and is in a number range that is allocated.

Maybe that's what you call "plausible"?

How detailed do you want to go? These patterns cover all GB number ranges:
http://www.aa-asterisk.org.uk/index.php/Regular_Expressions_for_Validating_and_Formatting_GB_Telephone_Numbers
http://www.aa-asterisk.org.uk/index.php/Local_dialling_rules_for_GB_telephone_numbers

floere Aug 21, 2012

Owner

Hi @g1smd,

Phony has two different functions, and goals:

  • Formatting/Splitting: Format a number, best effort, but display a number even if it is e.g. too long (sometimes, you get in-house phone systems with extensions). The main goal is that somebody will be able to call the number, even if it is formatted a bit wonkily.
  • Plausibility check: Do we have an E.164 number? It might not be valid, but it could be a number of a given country, even if it does not actually exist. If it is not plausible, it is 100% not plausible (Phony bugs excepted). If it is plausible, it might not be valid.

Next up:

  • Validation: Do we have an actual existing number? If it is valid, it is 100% valid. If it is invalid, it is 100% invalid. This is definitely the hardest to define and implement, and also to maintain. I didn't want to go there yet, as we are far from formatting >80% of the countries, still.

Anyway, looking at your link… It appears as if we were already formatting the numbers quite well – with the exception of the 800 range, as you note (the 2 cases I mention above). Am I right, or am I wrong?

Cheers,
Florian

g1smd Aug 22, 2012

Contributor

Yes. That seems a good assessment. Does Phony reject numbers in area codes that don't exist, and numbers with too few digits for their area code?

There are over 500 area codes that use the 4+6 format. You have a partial list to which I added a note "... and 500 others". Should this section of the function be listing all such valid area codes? That list should be the last list in the function. The 4+5 stuff should be before it.

Alternatively, what do you think about running the numbers through the RegEx patterns listed in valid_gb_phone_range before starting any splitting? Those patterns currently test for an exact number of digits but can easily be changed to "or more" by leaving off the end anchor on those where an extension is possible, or perhaps by adding an optional (\d+)?$ group to the end of each pattern to capture the extra digits. What the RegEx patterns can do for sure is say that a number does not have enough digits to be valid or begins with digits that are never allocated in GB.

Once you know the number is in a valid range and has at least the right number of digits, the rules for splitting are much easier to define. Instead of listing the 20, 23, 24, 28, 29 area codes for 2+8 splitting and more than 500 area codes for 4+6 splitting, you only need to say "number starts with '2' and has a total of 10 or more digits: split as 2+8+extn" or "all other numbers beginning '1' with a total of 10 digits, and not already split some other way, use the 4+6 format (+extn)".
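
A minimal Ruby sketch of the optional extension group, covering just the two quoted rules: `SPLIT_RULES` and `split_with_extension` are hypothetical names, the input is assumed to be an NSN already known to be in a valid range, and the more specific rules for the 1 range are assumed to run before the 4+6 catch-all in a real list.

```ruby
# Length-based splitting with any digits past the tenth captured as an
# optional extension, using a trailing (\d+)? group.
SPLIT_RULES = [
  /\A(2\d)(\d{8})(\d+)?\z/,    # starts with 2: split as 2+8 (+extn)
  /\A(1\d{3})(\d{6})(\d+)?\z/  # remaining 1 range: split as 4+6 (+extn)
].freeze

def split_with_extension(nsn)
  SPLIT_RULES.each do |pattern|
    m = pattern.match(nsn)
    # compact drops the nil left by an absent extension group
    return m.captures.compact if m
  end
  nil
end

p split_with_extension('2079460000')     # => ["20", "79460000"]
p split_with_extension('20794600001234') # => ["20", "79460000", "1234"]
p split_with_extension('1423512345')     # => ["1423", "512345"]
p split_with_extension('142351234')      # => nil (too short for 4+6)
```

Note that the optional group cannot tell a genuine extension from a mistyped extra digit; it only keeps the extra digits visible.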

Formatting can be very exact; the GB rules are fairly simple to define. I use those listed in format_gb_nsn. The order is important, for similar reasons to your Belgium example.

The 0800 issue should be simple, as long as number length is also taken into account - but it appears that extensions might cause an issue. For extensions, I often use the # or x separator when displaying the number.

If you want to see such splitting in action, slightly simplified versions (some numbers one digit short still display an area code name) of the functions I wrote are in the demo at: http://www.agilebase.co.uk/test [U:demo/P:demo] Try entering some GB numbers in the 'edit' tab, then look at the 'view' tab for the results. This is a public demo and the test data is regenerated every day (site is sometimes offline for a few hours when private testing is being done).
