Candidate tokens: mechanism for bare token to fullname token mapping? #8

Closed
robla opened this issue Jun 16, 2021 · 6 comments

@robla
Contributor

robla commented Jun 16, 2021

As of this writing (in June 2021) the testcases all propose the following way to map between fullname tokens and bare tokens (proposal "a"):

[Doña García Márquez]: DGM
[Steven B. Jensen]:    SBJ
[Sue Ye (蘇業)]:        SY
[Adam Muñoz]:          AM

I chose that ordering because it makes the square bracket the first character of the line, which makes it possible to determine the line type from the first character. However, over in issue #5, @brainbuz suggested reversing the order and including an explicit section header at the top (proposal "b"):

=choices
DGM: Doña García Márquez
SY: Sue Ye (蘇業)
AM: Adam Muñoz

I believe that we should make it possible to infer the section that a line is in from the first character of the line, and have a convention (rather than a requirement) of using comments to delimit sections for readability for now. We may want to make sections more explicit in the near future, but my hunch is that having line-based section identification will force us to make the line formats we design more robust (and human readable) and will also encourage more robust implementations without too much burden. We should discuss the generalities of my hunch over in issue #6.

On the subject of mapping bare tokens to fullname tokens, I would like to propose a hybrid of the two proposals above (proposal "c"):

=DGM:[Doña García Márquez]
=SBJ:[Steven B. Jensen]
=SY:[Sue Ye (蘇業)]
=AM:[Adam Muñoz]

My new proposal "c" has a distinct character ("=") at the beginning of each line, and implies a sort of prefix notation for the "=" operator (that is, using the word "operator" really loosely). It still seems best to require that arbitrary strings are enclosed in square brackets, so that we have the option to add things to the line without too much hassle, and so that it's possible to put comments on the end of each line:

=DGM:[Doña García Márquez] # see marquez2024.com for candidate website
=SBJ:[Steven B. Jensen]    # dropped out of race three days prior to election
=SY:[Sue Ye (蘇業)]         #  see sueye.org/2024 for more
=AM:[Adam Muñoz]           #  see munozftw.org for more
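
For illustration, here is a minimal Python sketch of how a reader might split a proposal "c" line into its bare token, fullname token, and optional trailing comment. The `MAPPING_RE` pattern and the `parse_mapping` helper are hypothetical names, not part of any existing ABIF implementation, and the sketch assumes that a fullname token never contains a closing square bracket.

```python
import re

# Assumed shape of a proposal "c" line:  =TOKEN:[Full Name]  # optional comment
# ASSUMPTION: the name inside the brackets never contains "]".
MAPPING_RE = re.compile(
    r"^=(?P<token>[^:\s]+)\s*:\s*\[(?P<name>[^\]]*)\]\s*(?:#\s*(?P<comment>.*))?$"
)


def parse_mapping(line: str):
    """Return (bare_token, full_name, comment_or_None) for one mapping line."""
    match = MAPPING_RE.match(line.rstrip())
    if match is None:
        raise ValueError(f"not a proposal 'c' mapping line: {line!r}")
    return match.group("token"), match.group("name"), match.group("comment")
```

Because the name is bracketed, everything after the closing bracket is free for comments or later extensions, which is the point made above.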

My question: what should our mechanism be for mapping bare candidate tokens to fullname candidate tokens?

  • Proposal "a": the current mechanism implied by the testcases (e.g. test case 5)
  • Proposal "b": an explicit section header (e.g. "=choices"), followed by lines that each hold a bare token and then a bare freeform string
  • Proposal "c": equal-prefix ("="), followed by bare token, then colon (":"), then fullname token
  • Option "d": remove bare token to fullname token mapping from ABIF to keep it simple
  • Option "e": something else

I'm leaning toward my new proposal "c", but I'm open to new suggestions and/or defense of the suggestions outlined above.

@brainbuz
Contributor

I made suggestion b, but I think c is also a good choice. b could be amended to work for comments at the end or for more than one data item per key.

There are several cases where I would expect to see lists or key-value lists in the metadata: the choices list, the division list, and withdrawn choices (choices included in the data but for which votes will not be counted, possibly SBJ in the example, depending on the rules). Every list type will need either its own marker or a section header, which is why I prefer b over c: we can define as many list types as are needed without worrying about what symbols are available.
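
As a rough Python sketch of the section-header approach: each "=name" line opens a section, and the "KEY: value" lines that follow are filed under it. The `parse_sections` helper and the "=withdrawn" section name in the usage example are hypothetical, and the sketch handles metadata lines only, not ballot lines.

```python
def parse_sections(lines):
    """Group 'KEY: value' lines under the most recent '=section' header."""
    sections = {}
    current = None  # parser state carried from one line to the next
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("="):
            current = line[1:].strip()
            sections.setdefault(current, {})
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            sections[current][key.strip()] = value.strip()
        else:
            raise ValueError(f"line outside any section: {raw!r}")
    return sections


example = [
    "=choices",
    "DGM: Doña García Márquez",
    "SY: Sue Ye (蘇業)",
    "=withdrawn",  # hypothetical section name
    "SBJ: Steven B. Jensen",
]
print(parse_sections(example))
```

New list types need only a new header name, at the cost of the reader tracking which section it is currently in.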

@simberaj

I am indifferent between "a" and "c", having currently implemented "a" in votelib (but "c" would be easy as well). Both "a" and "c" have the advantages of not having to maintain parser state across lines, and I also find both of them readable, which should IMHO be an important concern.
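
To make the "no parser state" point concrete, here is a small Python sketch in which the kind of each line is decided from its first character alone. The `classify_line` helper and the kind names are hypothetical; this is not how votelib or any other implementation is actually written.

```python
def classify_line(line: str) -> str:
    """Return a line kind using only the first non-blank character."""
    stripped = line.lstrip()
    if not stripped:
        return "blank"
    first = stripped[0]
    if first == "#":
        return "comment"
    if first == "=":
        return "mapping_c"  # proposal "c": =DGM:[Doña García Márquez]
    if first == "[":
        return "mapping_a"  # proposal "a": [Doña García Márquez]: DGM
    if first in "0123456789":
        return "ballots"    # e.g. 27:DGM>SBJ>[蘇業]>AM
    return "unknown"
```

Each call looks at a single line, so lines can be classified in any order without remembering what came before.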

robla added a commit that referenced this issue Jun 18, 2021
This commit provides replacement files for test005 through test009,
using new files:

* test010.abif
* test011.abif
* test012.abif
* test013.abif
* test014.abif

If we decide on "option C" in ABIF issue #8, then tests 005-009 will
be deprecated (and should probably return errors)

ABIF issue #8 (<https://github.com/electorama/abif/issues/8>)
@robla
Contributor Author

robla commented Aug 29, 2021

Over on issue #1 (in #issuecomment-900871094 on Aug 18), @simberaj noted:

  • The candidate token definition formats differ: yours seems to accept =A: [Vít Rakušan] while votelib expects [Vít Rakušan]: A, which is IMHO simpler while retaining the benefit of being able to determine the line type from the first byte

Yup, the EBNF-based implementation that I built in July accepts both Proposals "a" and "c" above:

  • Proposal "a": the current mechanism implied by the testcases (e.g. test case 5)

Illustrated by this example: "[Vít Rakušan]: A"

  • Proposal "c": equal-prefix ("="), followed by bare token, then colon (":"), then fullname token

Illustrated by this example: "=A: [Vít Rakušan]"

Given @brainbuz's implicit preference for Proposal "c" over "a" from his earlier comments, and given the thinking I've been doing about this, I like "c" better. The equals sign ("=") at the start of the line declares "this line will be a bare token mapping line". Moreover, it makes it much easier for authors to line up the columns of text if they choose a fixed identifier length (e.g. the one-character "A" or "B", or the five-character "C0001" or "C0002"), with the arbitrary-length candidate name following the identifier on the line. Also, when people write numbered or lettered lists of people's names (in English, at least), the number or letter usually precedes each person's full name.

Proposal "a" ("[Vít Rakušan]: A") is deprecated in my mind. I left it in the EBNF that I wrote, but perhaps it's early enough in the life of ABIF to eliminate that option altogether. I still need to write that "test015.abif" that I promised over in issue #1. I'll start writing it below now:

# Case 15 - Mixed candidate token mapping syntax
#
# 2021-08-28 - This case is described in ABIF issue #8 , and should almost
# 	       certainly fail.  See ABIF issue #8:
# 	       <https://github.com/electorama/abif/issues/8>

=DGM:[Doña García Márquez]
[Steven B. Jensen]:SBJ
=SY:[Sue Ye (蘇業)]
[Adam Muñoz]:AM

27:DGM>SBJ>[蘇業]>AM
26:SBJ>DGM=[蘇業]>AM
24:[蘇業]>DGM=AM>SBJ
23:AM>[蘇業]>DGM>SBJ

The example above alternates between Proposals "a" and "c", which probably shouldn't work in any self-respecting ABIF implementation.

Here's the error returned by the EBNF-based implementation that I wrote:

ERROR (UnexpectedCharacters): testfiles/test015.abif
No terminal defined for '[' at line 8 col 1

[Steven B. Jensen]:SBJ
^

Expecting: {'HASHMARK', 'OPENCURLY', 'INTEGER', 'EQUAL'}

testfiles/test015.abif -- Count: None

My hunch is that it experienced an accidental failure, but I'm not sure. Regardless, this case (or some variation thereof) is almost certainly going to be the next test case in the test suite.

@robla
Contributor Author

robla commented Aug 29, 2021

Ooops, I figured out that my abif.py implementation is working exactly as designed, and failing on Proposal "a" above. The next test ("test015.abif") is probably going to relate to some other portion of my response on issue #1.
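
For readers without an EBNF toolchain, the same rejection can be approximated with a plain string check. The Python sketch below is not how abif.py works (that implementation is grammar-based); the `find_non_c_mapping` helper and the shortened `SAMPLE` file are hypothetical, written only to show a proposal "a" line being flagged with its line number.

```python
def find_non_c_mapping(lines):
    """Return (line_number, text) of the first proposal "a" style line, or None."""
    for number, raw in enumerate(lines, start=1):
        # Under proposal "c" alone, no line may begin with "[".
        if raw.lstrip().startswith("["):
            return number, raw.rstrip("\n")
    return None


# Shortened version of the mixed-syntax file discussed above.
SAMPLE = """\
# Case 15 - Mixed candidate token mapping syntax
=DGM:[Doña García Márquez]
[Steven B. Jensen]:SBJ
=SY:[Sue Ye (蘇業)]
[Adam Muñoz]:AM
"""

print(find_non_c_mapping(SAMPLE.splitlines()))  # (3, '[Steven B. Jensen]:SBJ')
```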

@simberaj

I'm convinced by your arguments. I would not introduce deprecated features at the very start, so let's stick to Proposal "c" as the only variant. I am only tempted to give a chance to a variant that makes the initial equals-sign optional...

@robla
Contributor Author

robla commented Aug 29, 2021

Thanks for the prompt reply, @simberaj . Let's close this issue with proposal "c":

  • Proposal "c": equal-prefix ("="), followed by bare token, then colon (":"), then fullname token

Thus this will be valid ABIF:

=DGM:[Doña García Márquez]
=SBJ:[Steven B. Jensen]
=SY:[Sue Ye (蘇業)]
=AM:[Adam Muñoz]

Implementations that assign each candidate a numbered, fixed-length identifier (or "bare token") can use a letter of their choosing, followed by digits, and then the candidate name, as follows:

=C001:[Doña García Márquez]
=C002:[Steven B. Jensen]
=C003:[Sue Ye (蘇業)]
=C004:[Adam Muñoz]

...or if they really want to be verbose, they can also spell it out:

=CANDIDATE_NUMBER_001:[Doña García Márquez]
=CANDIDATE_NUMBER_002:[Steven B. Jensen]
=CANDIDATE_NUMBER_003:[Sue Ye (蘇業)]
=CANDIDATE_NUMBER_004:[Adam Muñoz]

I don't remember for sure if underscore (_) or digits ([0-9]) are valid in the current BNF for ABIF, but I'm pretty sure I want them both to be valid. Regardless, I'm going to close out this issue as settled on proposal "c".
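
As a sketch of what that could look like in practice, the Python below generates numbered proposal "c" lines and checks each bare token against a pattern. The pattern is an assumption made for this example (a letter followed by letters, digits, or underscores); the actual ABIF grammar may allow more or less. The `BARE_TOKEN_RE` and `numbered_mappings` names are hypothetical.

```python
import re

# ASSUMPTION: a bare token is a letter followed by letters, digits, or
# underscores. The actual ABIF grammar may differ.
BARE_TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def numbered_mappings(names, prefix="C", width=3):
    """Return '=TOKEN:[name]' lines with zero-padded sequential tokens."""
    lines = []
    for index, name in enumerate(names, start=1):
        token = f"{prefix}{index:0{width}d}"
        if BARE_TOKEN_RE.fullmatch(token) is None:
            raise ValueError(f"token {token!r} does not fit the assumed pattern")
        lines.append(f"={token}:[{name}]")
    return lines


for line in numbered_mappings(["Doña García Márquez", "Steven B. Jensen"]):
    print(line)
```

Passing `prefix="CANDIDATE_NUMBER_"` produces the verbose form shown above.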

p.s. regarding @simberaj 's comment "I am only tempted to give a chance to a variant that makes the initial equals-sign optional...": I'm tempted to drop the equals-sign (or make it optional) as well, but I'm afraid of making that change yet. One of us should reopen this issue (or create a new issue) if we really want to press that case.

@robla robla closed this as completed Aug 29, 2021