Help creating a rule for a last name #34

Closed
rejeep opened this Issue · 4 comments

2 participants

@rejeep

Hi,

I'm trying to create a rule for a last name. This is what I have come up with:

rule last_name
  [A-Za-z]+ space [iI]+ | [iI]+ &[^iI]+ | [^iI] [A-Za-z]+
end

So:

  • Match a "name", followed by a space, followed by one or more i's. Or
  • If the first characters are one or more i's, then something that is not an i must follow. Or
  • If the first character is not an i, then any "name" can follow.

If correct, this should be able to parse:

  • Rule 1) Love III
  • Rule 2) Immelman
  • Rule 3) Donald

But it fails on Immelman. It would also fail on, for example, Love IIIx.

I guess my second rule is wrong? But why?
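A likely explanation, sketched with Ruby's built-in regular expressions as an analogy rather than with Citrus itself: a PEG and-predicate such as &[^iI]+ only looks ahead and consumes nothing, much like the regexp lookahead (?=...). On Immelman, the first alternative fails because [A-Za-z]+ swallows the whole word (PEG repetition never gives characters back), so [iI]+ has nothing left. The second alternative consumes only the leading I, and the third cannot start at an I at all. Assuming, as with Citrus's default behaviour, that a parse must consume the entire input, the parse fails. The variable names below are illustrative only.

```ruby
# Ruby regexps are used here only as an analogy for the PEG predicate &e.
# (?=...) is a zero-width lookahead: it checks what follows but consumes nothing.
# \A anchors each pattern at the start of the string, as a PEG rule is anchored.
with_lookahead = /\A[iI]+(?=[^iI]+)/ # like the alternative [iI]+ &[^iI]+
consuming      = /\A[iI]+[^iI]+/     # like [iI]+ [^iI]+ (no predicate)

name = "Immelman"

# Only the leading "I" is consumed; "mmelman" is merely checked.
puts with_lookahead.match(name)[0] # prints "I"

# Without the predicate, the rest of the word is consumed as well.
puts consuming.match(name)[0] # prints "Immelman"
```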

@mjackson
Owner

I don't follow the logic that you're using. Why not just try a simpler pattern, like [A-Za-z]+ (" "* [iI]+)?? Here's what I get when I use this pattern in irb:

irb> require 'citrus'
=> true
irb> rule = Citrus.rule '[A-Za-z]+ (" "* [iI]+)?'
=> /[A-Za-z]/+ (" "* /[iI]/+)?
irb> rule.test 'Love III'
=> 8
irb> rule.test 'Immelman'
=> 8
irb> rule.test 'Donald'
=> 6
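For readers without Citrus at hand, the session above can be approximated with a plain Ruby regexp anchored at the start of the string. This is only an analogy (a regexp may backtrack, whereas PEG repetition does not), and match_length is a hypothetical helper that, like rule.test in the session, reports the length of the matched prefix.

```ruby
# Regexp analogue of the Citrus expression [A-Za-z]+ (" "* [iI]+)?
PATTERN = /\A[A-Za-z]+(?: *[iI]+)?/

# Hypothetical helper: length of the matched prefix, or nil if there is no match.
def match_length(pattern, input)
  m = pattern.match(input)
  m && m[0].length
end

["Love III", "Immelman", "Donald"].each do |name|
  puts "#{name}: #{match_length(PATTERN, name)}"
end
# Love III: 8
# Immelman: 8
# Donald: 6
```

Immelman matches in full because [A-Za-z]+ takes the whole word and the optional suffix group simply matches nothing.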
@rejeep

Because I have another rule that would conflict with it. If I do it your way, the name David Love III would parse as first name David, middle name Love and last name III. But the first name should be David and the last name Love III. What I'm trying to do with my rule is make sure that the last name cannot consist only of I's.
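The conflict can be sketched with a Ruby regexp that mirrors only the first alternative of the name rule (first_name space middle_name space last_name), with the simpler last-name pattern plugged in. It is an illustrative analogy, not Citrus code: because that alternative is tried first and already covers the whole input, the trailing III ends up as the last name.

```ruby
# Analogy for: first_name space middle_name space last_name, where
# last_name is the simpler [A-Za-z]+ (" "* [iI]+)? pattern.
FIRST_MIDDLE_LAST =
  /\A(?<first>[A-Za-z]+)[ \t]*(?<middle>[A-Za-z]+)[ \t]*(?<last>[A-Za-z]+(?: *[iI]+)?)\z/

m = FIRST_MIDDLE_LAST.match("David Love III")
puts m[:first]  # David
puts m[:middle] # Love
puts m[:last]   # III
```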

Maybe it's simpler if I give you the whole grammar:

grammar Name
  rule name
    first_name space middle_name space last_name |
    first_name space last_name |
    first_name
  end

  rule first_name
    [A-Za-z]+
  end

  rule last_name
    [A-Za-z]+ space [iI]+ | [iI]+ &[^iI]+ | [^iI] [A-Za-z]+
  end

  rule middle_name
    ([A-Za-z] '.') {
      delete('.')
    }
    | [A-Za-z]+
  end

  rule space
    [ \t]*
  end
end
@mjackson
Owner

Why don't you try something like this:

require 'citrus'

Citrus.eval(<<CITRUS)
grammar Name
  rule name
    first_name space middle_name space last_name space suffix? |
    first_name space last_name space suffix? |
    first_name
  end

  rule first_name
    [A-Za-z]+
  end

  rule middle_name
    ([A-Za-z] '.') {
      delete('.')
    }
    | [A-Za-z]+
  end

  rule last_name
    !suffix [A-Za-z]+
  end

  rule suffix
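    # ![A-Za-z] keeps a name that merely starts with an I (e.g. Immelman)
    # from being treated as a suffix by the !suffix check in last_name.
    ([iI]+ | `jr` '.'?) ![A-Za-z]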
  end

  rule space
    [ \t]*
  end
end
CITRUS

puts Name.parse("David Love III").dump

This grammar separates the suffix of the name from the last name (I've allowed for "jr." as well, just to demonstrate). The dump of the match shows how the input is broken up into the various tokens.
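The last_name rule relies on the PEG not-predicate: !suffix succeeds only where suffix does not match, and consumes nothing. Ruby's negative lookahead (?!...) behaves the same way, so the idea can be tried with plain regexps. This is an analogy; SUFFIX, LAST_NAME and their naive variants are illustrative names, not part of Citrus. The (?![A-Za-z]) at the end of SUFFIX makes the suffix check apply to a whole word, so a last name that merely starts with an I, such as Immelman, is not mistaken for a suffix.

```ruby
# (?!...) is a zero-width negative lookahead, like the PEG not-predicate !e.
# (?i:jr) is case-insensitive, like the backtick string `jr` in the grammar.
SUFFIX = /(?:[iI]+|(?i:jr)\.?)(?![A-Za-z])/

# Analogue of: !suffix [A-Za-z]+, anchored to the whole word.
LAST_NAME = /\A(?!#{SUFFIX})[A-Za-z]+\z/

# Same check without the word-end guard on the suffix.
NAIVE_SUFFIX    = /(?:[iI]+|(?i:jr)\.?)/
NAIVE_LAST_NAME = /\A(?!#{NAIVE_SUFFIX})[A-Za-z]+\z/

%w[Love III Immelman Jr Jones].each do |word|
  puts "#{word}: #{LAST_NAME.match?(word)} (naive: #{NAIVE_LAST_NAME.match?(word)})"
end
# Love: true (naive: true)
# III: false (naive: false)
# Immelman: true (naive: false)
# Jr: false (naive: false)
# Jones: true (naive: true)
```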

@mjackson mjackson closed this
@rejeep

Ahh, nice. Thanks!
