length_validation #63

wants to merge 27 commits into from

3 participants


add a simple length_validator that will be used in "plausible?"

Defined for those countries that have simple phone number schemes (i.e. where the length of the phone number is fixed)

jcfischer added some commits Oct 30, 2012
@jcfischer jcfischer + added a simple "length_validator"
this will validate phone numbers which country have fixed length numbers
@jcfischer jcfischer + updated some countries with length validator ea3d0ff
@jcfischer jcfischer + documentation for length_validator 0129ce8
@jcfischer jcfischer + lenght validator now can take a hash for different ndc lengths

validate_lengths({ 2 => 6, 3 => 7, 4 => [8,9] })

phone numbers with a two digit ndc are 6 digits long, numbers with a 3 digit
ndc are 7 digits long, 4 digit ndcs can be 8 or 9 digits long
@jcfischer jcfischer + added a bit more documentation a168bb2

Hi @jcfischer – thanks for the pull request! I am currently thinking about the API for version 2 and where it should go.

This pull request puts a bit of pressure on to speed up the thinking.

The current line of thinking is to have a normalizable? and a plausible? method. The first would be less strict, and the second one would be far more strict. Musings:
#48 (comment)

I am currently wondering where the length checking should be. There are levels of strictness and information:

  • loose length checking (currently in, plausible E164 length?)
  • loose number checking for country (Do the numbers for a country make sense, roughly? – won't pick up unused ranges)
  • strict length checking (your pull request, plausible length for country? Might prohibit numbers with office extensions)
  • strict number checking (will pick up unallocated number ranges, might prohibit numbers with office extensions)

We are in the loose area (plausible) – we know 100% whether a number is NOT a correct E164 number.

Length checking can be split into 2 checks. Let's assume it is a fixed length (say, 10) – we run into two possible issues:

  • minimum length: Will this prohibit short numbers that are reachable externally via country code?
  • maximum length: Will this be a problem for office number extensions that are usually attached to a normal number?

My line of thinking is therefore to have two methods: One for normalizing (loose, all good numbers are "good", some bad numbers might be "good", but no good number is "bad"), and one for validity? (strict, all good numbers are "good", all bad numbers are "bad").

The problem with this is: If we go for strict checking, then that method is expected to strictly check all of the countries – making it pricey to implement.

So, where do we put the length checking? I am unsure.

(I'll add some notes on the pull request code – in the chance you might find it interesting :) )


This could be optimized a little. The problem is that we always traverse the whole array and check all. However, as soon as one validator is not plausible?, we can stop.

So a bit faster code would be:

validators.each do |validator|
   return unless validator.plausible? ndc, rest

This will instantly return falsy, if eg. the first plausibility check fails.


Since we get an ndc String and a rest Array, we could avoid allocating a string by using

ndc.size + rest.inject(0) { |sum, part| sum + part.size }

for the length.


This will not work, because ndc can be "false" (for example in the case of Denmark). Maybe rewritten as

rest_size = rest.inject(0) { |sum, part| sum + part.size }
size = ndc ? nds.size + rest_size : rest_size

You are absolutely right. I was too far ahead in 2.0 thinking (where we might remove it for the API or replace it with nil – with nil your fix is still important)

How about this?

ndc.size + rest.inject(0) { |sum, part| part ? sum + part.size : sum }

false doesn't have a size method...


Only just now saw your new commits. I will look at them on Friday, thanks!

If you need this released rather soonish – I am considering going into 2.0.0.pre mode. Let me know.


Agree on the normalizable? and plausible? methods.

While going through the countries list and adding length validators, I thought that the current splitting definitions already provide enough information to extract the expected lengths, and the size of the ndcs for the various splittings. Maybe it would be more DRY (and generally easier) to implement an "expected length" attribute/method?

Regarding short / long numbers: what about a strict parameter:

Phony.plausible?("41791234567", :strict => true)  # => true
Phony.plausible?("41791234567#2", :strict => false) # => true
Phony.plausible?("41-118", :strict => true)  # false
Phony.plausible?("41-118", :strict => false)  # true

My use-case for this behaviour is to determine if a user enters his correct phone number into a smartphone app so that we can send him an SMS (you would be amazed at the amount of crap we get)


Also - regarding the latest commits: I'm not totally happy with the form I have used:

validate_lengths({ 2 => 9, 3 => [9, 10] })

where the key is the length of the NDC

I couldn't come up with a better solution however...


Spinning off your idea of :strict => true – the plausible? method already implements "hints" where you can restrict the cc or the ndc.

Perhaps we should depart from strict/non-strict and let people be more specific. Something like:

Phony.plausible?("41791234567", :length => true)  # => true
Phony.plausible?("41791234567#2", :length => false) # => true
Phony.plausible?("41-118", :length => true)  # false
Phony.plausible?("41-118", :length => false)  # true

Not too happy about :length => true. Sounds a bit wonky, "length is true". But check_length is a bit wordy.



Regarding validate_lengths – it is very specific to a certain style of numbers, namely ones where length is dependent on ndc length. Not sure if this holds.

I agree with your ideas about DRY – it would be great to have the splitting rules also define lengths already. This pushes us back into using regexps for everything territory. Not sure if I want to go there yet…

Regarding your solution, perhaps

length 2 => 9, 3 => (9..10)

would be the nicest.


Ok, so in summary, there are two things to decide on:

  • plausible? vs. plausible? and normalizable?
    I am now leaning towards using just plausible? with another hint added. In any case, this also contains the decision what to return when there is no length restriction. Always plausible if no other check fails?

  • length check definition - how?
    There are two ways to go about this – explicit and implicit, from the splitting definition. Implicit would be great, but I am unsure whether it can be done easily. If we do implicit, we have to make sure that formatting numbers that are too long still works as before.


I would say that plausible? should err on the positive case (as it does now) - the more hints or validation functions there are, the stricter it becomes, if nothing fails - hey - it must be plausible.

As for the length check, I would suggest we go the explicit route for now. I have tried to look into the implicit route - and I fear, that is might get quite hairy

I will change the syntax of the lenght validation for now


What were the problems with the implicit route?


Me not understanding what to do, basically - (and not taking the time to really grok the code) - shame on me


No problem.


Flöre, I have changed the syntax of the length validation (as agreed) and added a bunch of new countries.

I'm not sure, there's a way to extract those commits from this pull request though...



@jcfischer Maybe I am being dense – can you explain me each of the length definition, especially in regard to how it is split, please?


huh - I can't parse your question... But I try

length(10)   # =>  all numbers have fixed length of 10
length( 2 => 8..9,  3 => 10)    # 2 digit ndc number are 8 or 9 digits long, 3 digit ndc is 10

does that help?


I understand now – we are talking about the total length, of course. Cheers!
The lengths seem to correspond to the ndc length + split total length pretty well. Are there many cases where this is not the case?


Germany comes to mind immediately, but IIRC there are some other countries that are't working - I'd have to go through the country list and see. Maybe a new flag to the country definition?

country '49', one_of(...) ... spilt(3,10), :irregular => true

which disables the automatic length validation

jcfischer added some commits Dec 27, 2012
@jcfischer jcfischer + changed italy mobile ndc to 3 digits bf0e241
@jcfischer jcfischer + Length validations for Germany and Italy c138017
@jcfischer jcfischer + Updated Argentina, added El Salvador e399a78
@jcfischer jcfischer + updated README 3ed583f
@jcfischer jcfischer + change length validation for german 3 digit ndcs to be more strict
this could lead to valid german landline numbers not being considered valid.
However, for our use case, we are only interested in mobile numbers...
@jcfischer jcfischer + handle special trunk codes when normalizing
Some countries have special trunk codes (instead of 0)

Hungary - 06
Russia - 8
Lithuania - 8
Belarus - 8 (not implemented)

Thanks! Let's get a move on. Can you rebase this on https://github.com/floere/phony/tree/twodotoh please? :)


Hi guys, this functionality would also help me out quite a bit. Any update on where it is now? Thanks!


Hi @rjhaveri – currently it's in this pull request, or in @jcfischer's fork. You can point Gemfile entries to a specific Github repo commit. Maybe try that for now as I try to find time for a 2.0 design/release. Thanks for your patience :)


I'm going ahead and doing it myself – I cannot guarantee that the length validation will look the same at all.


I pulled this in 6 days ago – please resubmit mobino@25fd22d.

@floere floere closed this Mar 24, 2013

Thanks again for pushing – the new solution will not require the length method (your work is of course still much appreciated!).


Thanks for your work on this! It will be very useful. Two questions that I'm unclear on:

1) Regarding your comment above about the length method, is it still useful to add the length parameter in countries.rb? I have a few more countries that I can add the length parameter to.

2) I wasn't able to see a way to set a max_length, due to the possibility of extensions. Is there a way to add a max_length for when we know there will be no extension?

Thank you!


Hi @rjhaveri!

1) The hot branch is currently the v2.0.0-beta1 one. That one does not use the length parameter anymore, but instead bases itself on the fixed method.

2) This is still very much unclear – I am unsure whether we should consider extensions for the plausible? method. The plausible method considers numbers longer than the E164 format max length (15) as implausible.

This is very much a library in development, even though it already does many things well – some things aren't as clear cut. So if you think you have good, well reasoned input, please tell us :)

I think we will move towards a "normalizable?" method that simply checks whether there is enough information for the number to be normalizable and still be a correct number (eg. 1-555-5555555 or, but not 1-555) – this method can be used for Phony.normalize before storing a number into a data store (eg. DB). Then we'll have plausible? which will be more strict (we can say 100% whether something is NOT a valid E164 number), and valid? (we can say 100% whether something IS a valid E164 number). Unsure about whether the last one will ever be implemented.


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment