length_validation #63

Closed
wants to merge 27 commits into from

3 participants

@jcfischer
Collaborator

add a simple length_validator that will be used in "plausible?"

Defined for those countries that have simple phone number schemes (i.e. where the length of the phone number is fixed)

jcfischer added some commits Oct 30, 2012
@jcfischer jcfischer + added a simple "length_validator"
this will validate phone numbers which country have fixed length numbers
d275a4e
@jcfischer jcfischer + updated some countries with length validator ea3d0ff
@jcfischer jcfischer + documentation for length_validator 0129ce8
@jcfischer jcfischer + lenght validator now can take a hash for different ndc lengths
Example:

validate_lengths({ 2 => 6, 3 => 7, 4 => [8,9] })

phone numbers with a two digit ndc are 6 digits long, numbers with a 3 digit
ndc are 7 digits long, 4 digit ndcs can be 8 or 9 digits long
8afd4a7
@jcfischer jcfischer + added a bit more documentation a168bb2
@floere
Owner

Hi @jcfischer – thanks for the pull request! I am currently thinking about the API for version 2 and where it should go.

This pull request puts a bit of pressure on to speed up the thinking.

The current line of thinking is to have a normalizable? and a plausible? method. The first would be less strict, and the second one would be far more strict. Musings:
#48 (comment)

I am currently wondering where the length checking should be. There are levels of strictness and information:

  • loose length checking (currently in, plausible E164 length?)
  • loose number checking for country (Do the numbers for a country make sense, roughly? – won't pick up unused ranges)
  • strict length checking (your pull request, plausible length for country? Might prohibit numbers with office extensions)
  • strict number checking (will pick up unallocated number ranges, might prohibit numbers with office extensions)

We are in the loose area (plausible) – we know 100% whether a number is NOT a correct E164 number.

Length checking can be split into 2 checks. Let's assume it is a fixed length (say, 10) – we run into two possible issues:

  • minimum length: Will this prohibit short numbers that are reachable externally via country code?
  • maximum length: Will this be a problem for office number extensions that are usually attached to a normal number?

My line of thinking is therefore to have two methods: One for normalizing (loose, all good numbers are "good", some bad numbers might be "good", but no good number is "bad"), and one for validity? (strict, all good numbers are "good", all bad numbers are "bad").

The problem with this is: If we go for strict checking, then that method is expected to strictly check all of the countries – making it pricey to implement.

So, where do we put the length checking? I am unsure.

(I'll add some notes on the pull request code – in the chance you might find it interesting :) )

@floere

This could be optimized a little. The problem is that we always traverse the whole array and check all. However, as soon as one validator is not plausible?, we can stop.

So a bit faster code would be:

validators.each do |validator|
   return unless validator.plausible? ndc, rest
end

This will instantly return falsy, if eg. the first plausibility check fails.

@floere

Since we get an ndc String and a rest Array, we could avoid allocating a string by using

ndc.size + rest.inject(0) { |sum, part| sum + part.size }

for the length.

Collaborator

This will not work, because ndc can be "false" (for example in the case of Denmark). Maybe rewritten as

rest_size = rest.inject(0) { |sum, part| sum + part.size }
size = ndc ? nds.size + rest_size : rest_size
Owner

You are absolutely right. I was too far ahead in 2.0 thinking (where we might remove it for the API or replace it with nil – with nil your fix is still important)

How about this?

ndc.size + rest.inject(0) { |sum, part| part ? sum + part.size : sum }
Collaborator

false doesn't have a size method...

@floere
Owner

Only just now saw your new commits. I will look at them on Friday, thanks!

If you need this released rather soonish – I am considering going into 2.0.0.pre mode. Let me know.

@jcfischer
Collaborator

Agree on the normalizable? and plausible? methods.

While going through the countries list and adding length validators, I thought that the current splitting definitions already provide enough information to extract the expected lengths, and the size of the ndcs for the various splittings. Maybe it would be more DRY (and generally easier) to implement an "expected length" attribute/method?

Regarding short / long numbers: what about a strict parameter:

Phony.plausible?("41791234567", :strict => true)  # => true
Phony.plausible?("41791234567#2", :strict => false) # => true
Phony.plausible?("41-118", :strict => true)  # false
Phony.plausible?("41-118", :strict => false)  # true

My use-case for this behaviour is to determine if a user enters his correct phone number into a smartphone app so that we can send him an SMS (you would be amazed at the amount of crap we get)

@jcfischer
Collaborator

Also - regarding the latest commits: I'm not totally happy with the form I have used:

validate_lengths({ 2 => 9, 3 => [9, 10] })

where the key is the length of the NDC

I couldn't come up with a better solution however...

@floere
Owner

Spinning off your idea of :strict => true – the plausible? method already implements "hints" where you can restrict the cc or the ndc.

Perhaps we should depart from strict/non-strict and let people be more specific. Something like:

Phony.plausible?("41791234567", :length => true)  # => true
Phony.plausible?("41791234567#2", :length => false) # => true
Phony.plausible?("41-118", :length => true)  # false
Phony.plausible?("41-118", :length => false)  # true

Not too happy about :length => true. Sounds a bit wonky, "length is true". But check_length is a bit wordy.

Ideas?

@floere
Owner

Regarding validate_lengths – it is very specific to a certain style of numbers, namely ones where length is dependent on ndc length. Not sure if this holds.

I agree with your ideas about DRY – it would be great to have the splitting rules also define lengths already. This pushes us back into using regexps for everything territory. Not sure if I want to go there yet…

Regarding your solution, perhaps

length 2 => 9, 3 => (9..10)

would be the nicest.

@floere
Owner

Ok, so in summary, there are two things to decide on:

  • plausible? vs. plausible? and normalizable?
    I am now leaning towards using just plausible? with another hint added. In any case, this also contains the decision what to return when there is no length restriction. Always plausible if no other check fails?

  • length check definition - how?
    There are two ways to go about this – explicit and implicit, from the splitting definition. Implicit would be great, but I am unsure whether it can be done easily. If we do implicit, we have to make sure that formatting numbers that are too long still works as before.

@jcfischer
Collaborator

I would say that plausible? should err on the positive case (as it does now) - the more hints or validation functions there are, the stricter it becomes, if nothing fails - hey - it must be plausible.

As for the length check, I would suggest we go the explicit route for now. I have tried to look into the implicit route - and I fear, that is might get quite hairy

I will change the syntax of the lenght validation for now

@floere
Owner

What were the problems with the implicit route?

@jcfischer
Collaborator

Me not understanding what to do, basically - (and not taking the time to really grok the code) - shame on me

@floere
Owner

No problem.

@jcfischer
Collaborator

Flöre, I have changed the syntax of the length validation (as agreed) and added a bunch of new countries.

I'm not sure, there's a way to extract those commits from this pull request though...

cheers

@floere

@jcfischer Maybe I am being dense – can you explain me each of the length definition, especially in regard to how it is split, please?

@jcfischer
Collaborator

huh - I can't parse your question... But I try

length(10)   # =>  all numbers have fixed length of 10
length( 2 => 8..9,  3 => 10)    # 2 digit ndc number are 8 or 9 digits long, 3 digit ndc is 10

does that help?

@floere
Owner

I understand now – we are talking about the total length, of course. Cheers!
The lengths seem to correspond to the ndc length + split total length pretty well. Are there many cases where this is not the case?

@jcfischer
Collaborator

Germany comes to mind immediately, but IIRC there are some other countries that are't working - I'd have to go through the country list and see. Maybe a new flag to the country definition?

country '49', one_of(...) ... spilt(3,10), :irregular => true

which disables the automatic length validation

jcfischer added some commits Dec 27, 2012
@jcfischer jcfischer + changed italy mobile ndc to 3 digits bf0e241
@jcfischer jcfischer + Length validations for Germany and Italy c138017
@jcfischer jcfischer + Updated Argentina, added El Salvador e399a78
@jcfischer jcfischer + updated README 3ed583f
@jcfischer jcfischer + change length validation for german 3 digit ndcs to be more strict
this could lead to valid german landline numbers not being considered valid.
However, for our use case, we are only interested in mobile numbers...
cdd2c9e
@jcfischer jcfischer + handle special trunk codes when normalizing
Some countries have special trunk codes (instead of 0)

Hungary - 06
Russia - 8
Lithuania - 8
Belarus - 8 (not implemented)
eec5800
@floere
Owner

Thanks! Let's get a move on. Can you rebase this on https://github.com/floere/phony/tree/twodotoh please? :)

@rjhaveri

Hi guys, this functionality would also help me out quite a bit. Any update on where it is now? Thanks!

@floere
Owner

Hi @rjhaveri – currently it's in this pull request, or in @jcfischer's fork. You can point Gemfile entries to a specific Github repo commit. Maybe try that for now as I try to find time for a 2.0 design/release. Thanks for your patience :)

@jcfischer
Collaborator
@floere
Owner

I'm going ahead and doing it myself – I cannot guarantee that the length validation will look the same at all.

@floere
Owner

I pulled this in 6 days ago – please resubmit mobino@25fd22d.

@floere floere closed this Mar 24, 2013
@floere
Owner

Thanks again for pushing – the new solution will not require the length method (your work is of course still much appreciated!).

@rjhaveri

Thanks for your work on this! It will be very useful. Two questions that I'm unclear on:

1) Regarding your comment above about the length method, is it still useful to add the length parameter in countries.rb? I have a few more countries that I can add the length parameter to.

2) I wasn't able to see a way to set a max_length, due to the possibility of extensions. Is there a way to add a max_length for when we know there will be no extension?

Thank you!

@floere
Owner

Hi @rjhaveri!

1) The hot branch is currently the v2.0.0-beta1 one. That one does not use the length parameter anymore, but instead bases itself on the fixed method.

2) This is still very much unclear – I am unsure whether we should consider extensions for the plausible? method. The plausible method considers numbers longer than the E164 format max length (15) as implausible.

This is very much a library in development, even though it already does many things well – some things aren't as clear cut. So if you think you have good, well reasoned input, please tell us :)

I think we will move towards a "normalizable?" method that simply checks whether there is enough information for the number to be normalizable and still be a correct number (eg. 1-555-5555555 or 41.44.123.12.12, but not 1-555) – this method can be used for Phony.normalize before storing a number into a data store (eg. DB). Then we'll have plausible? which will be more strict (we can say 100% whether something is NOT a valid E164 number), and valid? (we can say 100% whether something IS a valid E164 number). Unsure about whether the last one will ever be implemented.

Cheers,
Florian

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment