parse: Implement hacky way to handle full range of small ints. #313

Closed

@pfalcon (Contributor) commented Feb 21, 2014

So, I proceeded to write testcases for variable-length encoded small ints, and that's when I noticed that the parser has its own "small ints" which don't match the normal "small ints". So, while my aim was to let people write a billion as a number in uPy, that still can't be done. OMG, let's work around it then, but see the TODO below.

The parser has its own "small small ints", which unfortunately means that the
full range of small ints is not available in lexical form. This issue is
usually masked by the presence of long ints, but without them it's not
possible to write biggish numbers in a Python program.

To work around that, the CONST_INT node is overloaded with a special encoding
for full-range small ints, which is eventually propagated to bytecode
properly as LOAD_CONST_SMALL_INT.

TODO: Ideally, the parser should not have its own "small int" type; instead,
all parts of the interpreter should use a single consistent type (even if
that requires a fatter representation on the parser side) - dealing with
needlessly fragmented types is a chore.

@dpgeorge (Member)

How about we fix it properly now? The proposal is to change the encoding of parse node leaves to the following (shown for 16 bits; it easily generalises to 32 and 64 bits):

0000 0000 0000 0000  no node
xxxx xxxx xxxx xx00  pointer to mp_parse_node_struct_t (x != 0, will always be 4 byte aligned)
xxxx xxxx xxxx xxx1  small int, x being the value
xxxx xxxx xxxt tt10  other, t=type, x=value

@pfalcon (Contributor, Author) commented Feb 21, 2014

Well, that's a nice encoding scheme! I didn't think it would be easy to change the tagging away from the current 4-bit scheme, so my thinking revolved around having another buffer to store small ints and putting an index into mp_parse_node_t. But indeed, if a node is neither a compound node nor a small int, then its payload is either a qstr id or a token id, so the tags for such nodes can easily be extended from 4 to 5 bits.

So, let me know if you'll have time for this refactor; otherwise I can give it a try.

@dpgeorge (Member)

Give it a shot!

pfalcon added a commit to pfalcon/pycopy that referenced this pull request Feb 22, 2014
@pfalcon (Contributor, Author) commented Feb 22, 2014

Superseded by #314.

@pfalcon pfalcon closed this Feb 22, 2014
@pfalcon pfalcon deleted the deal-with-two-small-ints branch February 22, 2014 21:30