Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HEX Numeric notation not supported #62

Closed
hipitihop opened this issue Oct 28, 2019 · 7 comments · Fixed by #63
Closed

HEX Numeric notation not supported #62

hipitihop opened this issue Oct 28, 2019 · 7 comments · Fixed by #63

Comments

@hipitihop
Copy link

Hex notation e.g. 0xFFDC73 causes traceback:

Sample EDN:

{"AMBER" {:id "AMBER" :name "Amber 80%" :color 0xFFDC73}}

pip details:

"edn-format": {
            "hashes": [
                "sha256:aa3cc3041497b7e22386f96c44658e589597d9d26df39c4a77bbac0c8091ea88"
            ],
            "index": "pypi",
            "version": "==0.6.3"
        }

  return edn_format.loads(edn_str)
File "/usr/local/lib/python3.6/site-packages/edn_format/edn_parse.py", line 161, in parse
  return p.parse(text, lexer=lex())
File "/usr/local/lib/python3.6/site-packages/ply/yacc.py", line 333, in parse
   return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
File "/usr/local/lib/python3.6/site-packages/ply/yacc.py", line 1120, in parseopt_notrack
   p.callable(pslice)
File "/usr/local/lib/python3.6/site-packages/edn_format/edn_parse.py", line 78, in p_map
   raise EDNDecodeError('Even number of terms required for map')
edn_format.exceptions.EDNDecodeError: Even number of terms required for map
@bfontaine
Copy link
Collaborator

Those aren’t valid EDN: https://github.com/edn-format/edn#integers

@c00kiemon5ter
Copy link

c00kiemon5ter commented Oct 28, 2019

Yes, HEX notation is not defined by the spec - only "plain" integers. There are edn parsers that work with hex, though. Being liberal like that is good, but unless the communicating parties have agreed to that, they cannot claim that they talk edn. One should properly support hex through an extension.

Also see, edn-format/edn#47

@bfontaine
Copy link
Collaborator

Yeah, this is why I dislike the EDN format from a parser implementation perspective: there’s no unique spec for it.

Anyway, do you want to try and make a pull-request about this?
Integers are parsed here:

def t_INTEGER(t):
r"""[+-]?\d+N?"""
if t.value.endswith('N'):
t.value = t.value[:-1]
t.value = int(t.value)
return t

You can add tests here:

edn_format/tests.py

Lines 93 to 95 in 0d4c213

def test_parser_single_expressions(self):
for expected, edn_string in (
(1, "1"),

@c00kiemon5ter
Copy link

c00kiemon5ter commented Oct 28, 2019

Yeah, this is why I dislike the EDN format from a parser implementation perspective: there’s no unique spec for it.

There is one unique spec. It is defined here:
https://github.com/edn-format/edn

What implementations allow for, doesn't make the spec less of a spec - it makes the implementation more liberal. This is fine technically, but it does confuse reasoning about what is "actually" supported. This is solved very easily by reading the spec (which is short and simple). Technically, this could be solved with "strict" (or "lax") flag on the parser.

If we want to make edn_format more liberal, to accept integers in hex format, then this should be a separate token type (to easily use it in "lax" mode, or ignore it in "strict" mode), ie t_INTEGER_HEX. This will also make it easier to reason about the regex.

btw, edn_format already supports t_RATIO which is not defined in the spec.

@bfontaine
Copy link
Collaborator

By "spec" I meant something strict like a BNF grammar. The EDN document starts with:

Currently this specification is casual, as we gather feedback from implementors. A more rigorous e.g. BNF will follow.

We’re left with either that document or Clojure’s reference implementation. However, Clojure doesn’t respect that spec: it supports things that aren’t spec’d (like ratios edn-format/edn#64 or NaN/Inf edn-format/edn#2) and rejects things that are (edn-format/edn#68).

With no update to that document since 2014, parser implementers are left alone.

@bfontaine
Copy link
Collaborator

@swaroopch It’s your call, your’re the owner 😃 What do you want to do here? Should edn_format support that notation?

The flag solution may be interesting, but it significantly complicates the code because it means dynamically changing the grammar.

I think it makes sense to support t_RATIO because it allows to represent some data that has no other representation (think 1/3). It’s a native type in Clojure and as such is dumped in its own representation when one uses pr-str. On the other hand, hexadecimal notation never occurs in dumped data because it’s just a different representation for an integer:

(pr-str 1/3) ; => "1/3"
(pr-str 0xFF) ; => "255"

@swaroopch
Copy link
Owner

@hipitihop @c00kiemon5ter @bfontaine Thanks for the great discussion on this topic 👍🏼

My inputs are:

  • Since the spec does not explicitly mention this, it is not a requirement to have this feature.
  • Let us not add a flag for this behavior. (1) that would not be a good API; (2) I believe it is not a major feature to have a feature flag.
  • If there is the possibility of transparently supporting this, I would welcome pull requests for the same.
    • By transparently, I mean, that the functionality of this package should be the same with or without the presence of hexadecimal integers in the data.
    • If this means, we can only deserialize hexadecimal into integer but serializing will yield integer, I'm okay with that. However, I'm open to inputs on this.

bfontaine added a commit to bfontaine/edn_format that referenced this issue Nov 1, 2019
bfontaine added a commit that referenced this issue Nov 2, 2019
gabrielferreira95 added a commit to nubank/edn_format that referenced this issue Nov 25, 2019
* Convert requirements from exact to minimum

Fixes swaroopch#40

* Bump version to 0.5.13

* LICENSE: bump year

* Fix ambiguous variable name (swaroopch#44)

* Add a Changelog

* Fix parsing of exact-precision floats with a negative exposant (swaroopch#47)

* Accept missing printable ASCII chars (swaroopch#45)

* Add a sort_sets optional keyword argument to dump (swaroopch#42)

* add failing unit test

* add ImmutableList

* use ImmutableList on code

* fix latin

* conform to flake8

* add test

* add placeholder file

* workaround for limitation on the image downstream

* remove conda.txt file

* add docstring

* descend from Sequence instead of MutableSequence

* copy explicitly

* remove MutableSequence tests

* bump version

* fix formatting

* use copy.copy()

as .copy() is not available for lists on python2

* remove redundant tests

* simplify condition

* Raise custom exceptions on syntax errors (swaroopch#46)

* Add CONTRIBUTING.md (swaroopch#48)

* Handle fractions

edn-format/edn#64

* Handle #_

Fixes #4.

Note it doesn’t support top-level #_ usage such as in:

    foo #_ bar

* Add tests for #_

* Tweak README regarding caveats + mention @bfontaine contributor

* Bump version to 0.6.0

Thanks @bfontaine and @konr!

* Add v0.6.0 to the Changelog

* Run Travis tests on latest Python 3.7

* Use dist: xenial

* Use mutable data structures to improve parsing time

Previously a list was used to hold the intermediate results during
parsing of "expressions", but each expression was prepended by
copying, Fixing this takes parsing from quadratic to linear time.

* Bump version to 0.6.1

* Update setup.py to 0.6.1

* Various code simplifications (swaroopch#53)

* Parse nil and booleans as symbols (swaroopch#54)

* Add Vagrantfile for local dev

* Use collections.abc instead of collections

Fixes swaroopch#55

* Fix build issues

https://travis-ci.org/swaroopch/edn_format/jobs/475411817

* Fix Python 2.x compatibility

https://travis-ci.org/swaroopch/edn_format/jobs/475413453

* Bump version to 0.6.2

* DRY the requirements

Move from setup.py to only requirements.txt

* Add badge for PyPI version

* add support for unicode char literals

* fixed indentation in test_sort_keys

* combined version_info checks in edn_format/edn_lex.py

Co-Authored-By: LeXofLeviafan <lexofleviafan@gmail.com>

* fixed comment in char token definition

* made changes according to flake8 demands

* Bump version to 0.6.3

* Travis : Upgrade Ubuntu version to latest LTS, bionic

* Use tag signing

* Add an edn_parse.tag decorator, + docs and tests on tags (swaroopch#59)

* Contributor Notes: replace the deprecated commands

* Version 0.6.4

* Move contributor notes in CONTRIBUTING.md (swaroopch#61)

* Add edn_format.loads_all to parse all expressions in a string (swaroopch#60)

* Parse hexadecimal notation

Fixes swaroopch#62.

* Disallow 0-prefixed integers

From the spec:

> No integer other than 0 may begin with 0.

* Release v0.6.5

* Bump version to 0.6.5-nu
gcbeltramini pushed a commit to nubank/edn_format that referenced this issue Nov 27, 2019
* Convert requirements from exact to minimum

Fixes swaroopch#40

* Bump version to 0.5.13

* LICENSE: bump year

* Fix ambiguous variable name (swaroopch#44)

* Add a Changelog

* Fix parsing of exact-precision floats with a negative exposant (swaroopch#47)

* Accept missing printable ASCII chars (swaroopch#45)

* Add a sort_sets optional keyword argument to dump (swaroopch#42)

* add failing unit test

* add ImmutableList

* use ImmutableList on code

* fix latin

* conform to flake8

* add test

* add placeholder file

* workaround for limitation on the image downstream

* remove conda.txt file

* add docstring

* descend from Sequence instead of MutableSequence

* copy explicitly

* remove MutableSequence tests

* bump version

* fix formatting

* use copy.copy()

as .copy() is not available for lists on python2

* remove redundant tests

* simplify condition

* Raise custom exceptions on syntax errors (swaroopch#46)

* Add CONTRIBUTING.md (swaroopch#48)

* Handle fractions

edn-format/edn#64

* Handle #_

Fixes #4.

Note it doesn’t support top-level #_ usage such as in:

    foo #_ bar

* Add tests for #_

* Tweak README regarding caveats + mention @bfontaine contributor

* Bump version to 0.6.0

Thanks @bfontaine and @konr!

* Add v0.6.0 to the Changelog

* Run Travis tests on latest Python 3.7

* Use dist: xenial

* Use mutable data structures to improve parsing time

Previously a list was used to hold the intermediate results during
parsing of "expressions", but each expression was prepended by
copying, Fixing this takes parsing from quadratic to linear time.

* Bump version to 0.6.1

* Update setup.py to 0.6.1

* Various code simplifications (swaroopch#53)

* Parse nil and booleans as symbols (swaroopch#54)

* Add Vagrantfile for local dev

* Use collections.abc instead of collections

Fixes swaroopch#55

* Fix build issues

https://travis-ci.org/swaroopch/edn_format/jobs/475411817

* Fix Python 2.x compatibility

https://travis-ci.org/swaroopch/edn_format/jobs/475413453

* Bump version to 0.6.2

* DRY the requirements

Move from setup.py to only requirements.txt

* Add badge for PyPI version

* add support for unicode char literals

* fixed indentation in test_sort_keys

* combined version_info checks in edn_format/edn_lex.py

Co-Authored-By: LeXofLeviafan <lexofleviafan@gmail.com>

* fixed comment in char token definition

* made changes according to flake8 demands

* Bump version to 0.6.3

* Travis : Upgrade Ubuntu version to latest LTS, bionic

* Use tag signing

* Add an edn_parse.tag decorator, + docs and tests on tags (swaroopch#59)

* Contributor Notes: replace the deprecated commands

* Version 0.6.4

* Move contributor notes in CONTRIBUTING.md (swaroopch#61)

* Add edn_format.loads_all to parse all expressions in a string (swaroopch#60)

* Parse hexadecimal notation

Fixes swaroopch#62.

* Disallow 0-prefixed integers

From the spec:

> No integer other than 0 may begin with 0.

* Release v0.6.5

* Merge with upstream

* Remove repeated tests

* Undo some changes after the merge

* Fix regex for t_FLOAT

* Add more test cases

* Refactor t_FLOAT regex
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants