GSoC: JSON module, generators and context managers #483

Open
abonie opened this Issue Apr 1, 2017 · 15 comments

abonie commented Apr 1, 2017

Intro

For my GSoC project I would like to complete several tasks: finishing some of the partially implemented data types, implementing modules from the Python standard library that are not yet present in Batavia, and adding support for some language constructs.
I will present my proposed schedule broken down into 3 phases, each roughly a month long. Each task will be described in detail in its phase.

Summarized deliverables:

  • Dicts and lists in Batavia fully conform to Python's dicts and lists
  • Native implementation of json module (with two minor caveats explained below)
  • Support for yield from statement
  • Support for with statement
  • Implementation of contextlib module (reused from CPython)

Phase 1 (June)

Tasks:

  • Complete implementation of dict and list data types
  • Implement JSON module from Python standard library

Full disclosure: June is the last month of the summer semester at my university, so I will be able to commit 18+ hours per week in Phase 1, but definitely not full-time. That is why I am reaching for relatively low-hanging fruit here and proposing more challenging tasks in phases 2 & 3.

Why:

Dicts and lists are the two most commonly used containers in Python. As it stands, their Batavia implementation is functional but incomplete. The following methods are missing or not properly implemented:

  • dict is missing pop, popitem, setdefault and fromkeys; the return types of keys, values and items are inconsistent with Python's reference documentation; and __len__ is buggy
  • list is missing insert and remove, and does not handle arguments passed to pop
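
As a point of reference, the CPython behaviour these methods would need to match can be sketched as follows. This is ordinary Python meant to be run on CPython, not Batavia code, and the variable names are purely illustrative.

```python
# Reference behaviour on CPython for the dict and list methods listed above.
d = {"a": 1, "b": 2}

popped = d.pop("a")                  # removes the key and returns its value
missing = d.pop("zzz", None)         # a default suppresses the KeyError
default = d.setdefault("c", 3)       # inserts the key only if it is absent
fresh = dict.fromkeys(["x", "y"], 0)

# keys(), values() and items() return live views, not lists.
keys_view = d.keys()
d["e"] = 5
view_sees_new_key = "e" in keys_view

last_item = d.popitem()              # removes and returns a (key, value) pair

items = [10, 20, 30]
items.insert(1, 15)                  # items is now [10, 15, 20, 30]
items.remove(20)                     # items is now [10, 15, 30]
first = items.pop(0)                 # pop accepts an index
```

The view check is the part that a list-returning keys() cannot satisfy: a list taken before the insertion would not contain the new key.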

JSON is the prevailing data format for browser/server communication. Batavia will need to support encoding and decoding between the JSON representation and Python objects, which is what the json module from the standard library does.

Together these two changes will greatly increase convenience of manipulating and sending data.

How:

I will make use of JavaScript's built-in JSON object and its parse and stringify methods to provide a native implementation conforming to the API of Python's json module. This comes with a few caveats:

  • In JavaScript's stringify it is not possible to disable circularity check while encoding an object to JSON. Therefore this implementation will ignore check_circular argument passed to corresponding methods.
  • Similarly there is no way to use stringify as a generator producing encoding in chunks "on the fly". As a result iterencode method of JSONEncoder will always return one whole chunk.
    This is not ideal; however, it will still be consistent with Python's API.
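
To make the two caveats concrete, here is a short sketch of what CPython itself does with check_circular and iterencode. It runs against the standard json module on CPython and says nothing about how Batavia's version is written; the sample data is made up.

```python
import json

# check_circular=True (the default) makes CPython raise ValueError on cycles.
cycle = []
cycle.append(cycle)
try:
    json.dumps(cycle)
    circular_error = None
except ValueError as exc:
    circular_error = exc

# iterencode returns an iterable of string chunks; joining them gives the
# same text as encode(). A single-chunk result still satisfies that contract.
encoder = json.JSONEncoder()
data = {"name": "batavia", "tags": ["python", "js"]}
chunks = list(encoder.iterencode(data))
joined = "".join(chunks)
```

Because callers are only promised that the chunks concatenate to the full encoding, an implementation that yields one large chunk remains API-compatible, just less memory-friendly for large objects.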

Schedule:

Week 1: Implement dict and list methods listed above
Week 2: json.dump, json.dumps
Week 3: json.JSONEncoder
Week 4: json.load, json.loads
Week 5: json.JSONDecoder, json.JSONDecodeError

Phase 2 (July)

Tasks:

  • Complete/fix implementation of generators
  • Support yield from statement
  • Start working on adding support for with statement

Why:

Context managers are objects that create the runtime context in which the body of a with statement is executed. In my view they are a great feature of the Python language: they encourage writing safe programs and are in many respects easier to work with than try/finally statements. Currently there is no support for the with statement in Batavia.
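
The relationship between the with statement and try/finally can be sketched as below. The Resource class and both functions are hypothetical names used only for illustration, and the try/finally form is a simplification: the real expansion also passes exception details to __exit__ when the body raises.

```python
class Resource:
    """Toy context manager that records the order of events."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("exit")
        return False  # do not swallow exceptions


def use_with(log):
    with Resource(log):
        log.append("body")


def use_try_finally(log):
    # Roughly what the with statement above does when no exception occurs.
    manager = Resource(log)
    manager.__enter__()
    try:
        log.append("body")
    finally:
        manager.__exit__(None, None, None)


with_log, manual_log = [], []
use_with(with_log)
use_try_finally(manual_log)
```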

Generators, in turn, are a subtle concept in Python that has evolved into fully fledged coroutines. Complete support for generators is important both for future work on asyncio and for full support of context managers. At this point Batavia does not support the yield from statement, which greatly simplifies working with generators, and the current implementation of generators needs fixing (in particular, close and throw do not work properly).
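
The generator features mentioned here (send, throw, close and yield from) interact as in the following sketch, which shows CPython's behaviour. The inner/outer generators and the events list are invented for the example.

```python
events = []


def inner():
    try:
        received = yield "ready"
        yield received * 2
        return "inner result"
    except ValueError:
        yield "recovered"
    finally:
        events.append("inner finished")


def outer():
    # yield from forwards send(), throw() and close() to inner() and
    # evaluates to inner()'s return value.
    result = yield from inner()
    events.append(result)


# send() passes through the delegating generator.
gen = outer()
first = next(gen)
doubled = gen.send(21)
remaining = list(gen)  # runs both generators to completion

# throw() raises inside inner(); close() then unwinds both generators.
gen2 = outer()
next(gen2)
recovered = gen2.throw(ValueError("boom"))
gen2.close()
```

A virtual machine that implements YIELD_FROM has to reproduce exactly this forwarding, which is why broken close and throw need to be fixed first.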

Schedule:

Week 6: Fix bugs in generators
Week 7 & 8: Handle YIELD_FROM opcode in Batavia's Python VM
Week 9 & 10: Handle opcodes SETUP_WITH and WITH_CLEANUP in VM

Phase 3 (August)

Tasks:

  • Finish working on with statement
  • Provide implementation of contextlib module from Python standard library

Why:

I have already described the benefits of with statements in the previous phase. contextlib provides utilities for working with them, e.g. support for combining multiple context managers and decorators for creating context managers.

CPython's contextlib is written in pure Python, so it should be relatively straightforward to reuse that implementation. It does make a few imports:

  • sys.stdout/in, which are already implemented in Batavia
  • sys.exc_info, which will require a native implementation
  • functools.wraps, which again has a Python implementation that could be reused
  • collections.deque, which is used only as a simple stack and so could be replaced by a Python list
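
To show how little these dependencies are asked to do, here is a much-simplified sketch of a generator-based context manager decorator. It is not CPython's contextlib source; GeneratorContextManager, contextmanager, tracked and the callbacks list are stand-in names, and several edge cases handled by the real module are left out.

```python
import functools


class GeneratorContextManager:
    """Simplified stand-in for contextlib's generator-based manager."""

    def __init__(self, gen):
        self.gen = gen

    def __enter__(self):
        return next(self.gen)

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            # Normal exit: the generator should finish after its single yield.
            try:
                next(self.gen)
            except StopIteration:
                return False
            raise RuntimeError("generator didn't stop")
        # Error exit: re-raise the exception inside the generator.
        try:
            self.gen.throw(exc_value)
        except StopIteration:
            return True  # the generator handled the exception
        except BaseException as exc:
            if exc is exc_value:
                return False  # let the original exception propagate
            raise
        raise RuntimeError("generator didn't stop after throw()")


def contextmanager(func):
    @functools.wraps(func)  # keeps func.__name__ and func.__doc__
    def helper(*args, **kwargs):
        return GeneratorContextManager(func(*args, **kwargs))
    return helper


log = []


@contextmanager
def tracked(name):
    """Record entry and exit of a block."""
    log.append("enter " + name)
    try:
        yield name
    finally:
        log.append("exit " + name)


with tracked("a") as value:
    log.append("body " + value)

# A plain list can play the role of the stack that is kept in a deque.
callbacks = []
callbacks.append(lambda: log.append("first registered"))
callbacks.append(lambda: log.append("second registered"))
while callbacks:
    callbacks.pop()()  # last in, first out
```

The sketch depends on working generator throw() and on the with statement itself, which matches the ordering of phases 2 and 3 in the proposal.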

Schedule:

Week 11: Complete work on SETUP_WITH/WITH_CLEANUP
Week 12: For Python 3.5+ break down WITH_CLEANUP into WITH_CLEANUP_START and _FINISH, provide implementation that will work for async with in the future
Week 13: Implement "dependencies" for contextlib.py
Week 14: Incorporate CPython's contextlib into Batavia

@freakboy3742 freakboy3742 added the GSoC label May 31, 2017

freakboy3742 May 31, 2017

Member

This project was selected as part of the 2017 GSoC.

abonie Jun 7, 2017

Contributor

Week 1 status update:
I have already dealt with lists and most of the dict work (see #518 and #560) and am currently working on JSON encoding.

I was under the impression that I could easily get JSON.stringify to do the heavy lifting. Unfortunately, I now realize that is not the case. Since I am adjusting the approach, I also decided to do JSONEncoder first and dump/dumps after that.

You can check out my current work on the json branch of my fork: https://github.com/abonie/batavia/commits/json
Essentially, a very basic implementation of JSONEncoder is done. I am trying to figure out one bug and will then move on to more advanced features of that class.

abonie Jun 13, 2017

Contributor

Week 2:
I tracked down what I thought was a bug in my code (mentioned last week); it turned out to be some unexpected output cleansing, which I fixed along with a few bugs: #566

Then I spent quite some time trying to work out how to properly pass keyword arguments to a class constructor implemented in native code. For example, for the Python code JSONEncoder(foo=4, bar=2), where JSONEncoder is implemented in JavaScript, I would like to access foo and bar with their respective values in the JSONEncoder JS constructor. I could not figure it out and did the next best thing IMO: exposed a JSONEncoder function that creates instances of JSONEncoder (much like dict is currently implemented in Batavia). That helps, because I know how to access keyword arguments in a regular function.
This is not ideal, so if anyone has a better solution, please let me know.

Finally, I fixed a few minor bugs in my code and covered a number of JSONEncoder init arguments:
skipkeys, allow_nan, indent and separators.

Again, you can see it all here: https://github.com/abonie/batavia/commits/json

Edit: One issue I forgot to mention is with sort_keys. I was going to use sorted for this, but Batavia's implementation is incomplete, i.e. it does not work for a list of tuples and does not support the key param. I will either fix sorted or find a workaround, unless someone wants to tackle the sorted issue on their own 😃
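
For readers unfamiliar with these encoder arguments, the CPython behaviour being targeted looks roughly like this. The snippet uses only the standard json module and built-in sorted on CPython; the sample values are invented.

```python
import json

# skipkeys drops keys that are not str, int, float, bool or None
# instead of raising TypeError.
skipped = json.dumps({"ok": 1, (1, 2): 2}, skipkeys=True)

# allow_nan=False turns NaN and Infinity into a ValueError.
try:
    json.dumps(float("nan"), allow_nan=False)
    nan_error = None
except ValueError as exc:
    nan_error = exc

compact = json.dumps({"a": [1, 2]}, separators=(",", ":"))
pretty = json.dumps({"a": [1, 2]}, indent=2)
ordered = json.dumps({"b": 1, "a": 2}, sort_keys=True)

# The two sorted() features mentioned above: ordering a list of tuples
# and ordering by a key function.
pairs = sorted([("b", 1), ("a", 2)])
by_value = sorted([("b", 1), ("a", 2)], key=lambda pair: pair[1])
```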

abonie Jun 20, 2017

Contributor

Week 3:
This week I completed the encoding part of the json module. The PR is ready for review: #577.
As mentioned, I ended up NOT relying on JSON.stringify as heavily, which means there is no reason not to implement JSONEncoder.iterencode as a proper generator, other than lack of time. I will also be working on generators next month, so it may be better to wait anyway.
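
What a generator-based encoder could look like is sketched below in Python. This is only an illustration of the idea, not the code from #577: the iterencode function here is hypothetical, handles only dicts with string keys, lists, tuples and scalars, and delegates scalar formatting to the standard json module.

```python
import json


def iterencode(obj):
    """Yield the JSON text for obj in small chunks."""
    if isinstance(obj, dict):
        yield "{"
        for index, (key, value) in enumerate(obj.items()):
            if index:
                yield ", "
            yield json.dumps(str(key)) + ": "
            yield from iterencode(value)
        yield "}"
    elif isinstance(obj, (list, tuple)):
        yield "["
        for index, item in enumerate(obj):
            if index:
                yield ", "
            yield from iterencode(item)
        yield "]"
    else:
        # Scalars (str, numbers, booleans, None) become a single chunk.
        yield json.dumps(obj)


data = {"a": [1, 2.5, None, True], "b": {"c": "x"}}
chunks = list(iterencode(data))
```

The recursion relies on yield from, which is one practical reason to postpone this until generator support is complete.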

With some pointers from @freakboy3742 I investigated the kwargs issue from last week. I think it requires a wider debate, for now it is still unresolved.

I have also started working towards decoding json, here is relevant branch of my fork: https://github.com/abonie/batavia/tree/json_decoding.

Unfortunately, next week I may be too busy with exams to achieve the week 4 goals. My first priority will be responding to any comments on PR #577 and then, when time permits, moving forward with decoding.

abonie Jun 28, 2017

Contributor

Week 4:
As I mentioned, this week I had to suspend my activity in Batavia. Last week's PR is merged now, but no other progress has been made. My last exam is on Friday, and after that I will start to catch up with my schedule. The plan for next week is to do as much JSON decoding as possible.

abonie Jul 4, 2017

Contributor

Week 5:
This week I had less time than expected, but I managed to put together a functional implementation of JSONDecoder. I realized that some of the decoder's features, such as the parse_int, parse_float and object_pairs_hook arguments, cannot be implemented in a sensible way with JSON.parse. Given the limited time this week, I decided to provide something that can be used right away for less sophisticated use cases. Here is the PR: #586. It is still WIP but mostly ready, and it will be done by tomorrow. After that, I will post some sort of recap of the first 5 weeks.
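
The hooks in question behave as follows on CPython. They operate on the original number text and on key/value pairs in document order, which is presumably what makes them awkward to layer on top of an already-parsed JavaScript object; the sample document is invented.

```python
import json
from collections import OrderedDict
from decimal import Decimal

text = '{"count": 3, "price": 1.10, "ratio": NaN}'

decoded = json.loads(
    text,
    parse_int=str,                     # receives the literal text "3"
    parse_float=Decimal,               # receives the literal text "1.10"
    parse_constant=lambda name: None,  # receives "NaN", "Infinity" or "-Infinity"
    object_pairs_hook=OrderedDict,     # receives a list of (key, value) pairs
)
```

Note that Decimal("1.10") keeps the trailing zero, which is only possible because the hook sees the source text rather than a float.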

abonie Jul 11, 2017

Contributor

Week 6:
I finished up #586 (it is ready for review now) and moved on to generators. I have fixed a number of issues so far, but there is still at least one problem with the throw method of generators that I have not figured out yet. PR: #590

Also, to sum up phase one:
The main goals have been achieved. A few complications arose when I realized that JSON.parse and JSON.stringify cannot handle all of the json module's functionality. In the end there are some minor unfinished issues:

  • dict methods: values, keys and items were left as is (they return lists instead of special view objects).
  • decoding JSON is missing a few secondary features: parse_int, parse_float, parse_constant and object_pairs_hook.
  • JSONEncoder and JSONDecoder are exposed as factory functions instead of actual classes (because Batavia does not support keyword arguments for class constructors).
  • (edit) I also skipped JSONDecodeError, as I realized it is not present in Python 3.4.
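
On the last point, code that needs to work across these Python versions can catch ValueError: that is what json.loads raises on Python 3.4, and JSONDecodeError (added in 3.5) is a subclass of it. A small sketch, with parse_or_none being a hypothetical helper name:

```python
import json


def parse_or_none(text):
    """Return the decoded value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # also catches json.JSONDecodeError on Python 3.5+
        return None


good = parse_or_none('{"a": 1}')
bad = parse_or_none("{bad")
```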

abonie Jul 18, 2017

Contributor

Week 7:
PR with fixes for various generator bugs and missing features is ready for review: #590

I looked into implementing yield from. It does not seem overly complicated, some initial work is on my fork: abonie@9bc4dcc and I will follow up this week.

abonie Jul 25, 2017

Contributor

Week 8:
yield from is now up as a PR: #592, and I have already started working on the with statement: https://github.com/abonie/batavia/commits/with_statement.

I feel like it may be a good idea to add even more tests for generators, so I'll try to provide some if I get any extra time.

abonie Aug 1, 2017

Contributor

Week 9:
There is a WIP PR for the with statement: #594. It only needs some minor cleanup and will then be ready for review.

I also have a quick, but working solution for splitting WITH_CLEANUP into WITH_CLEANUP_START and WITH_CLEANUP_FINISH here: https://github.com/abonie/batavia/tree/with_cleanup_python35. I will double check that this is all that is required here and submit a PR soon.

abonie Aug 8, 2017

Contributor

Week 10:
I finished up the with statement (#594) and, once that was merged, submitted a PR for Python 3.5+ compatibility (#620). Both of those were more or less ready last week, so I spent most of this week working on parsing Python 3.6 bytecode. It is not ready yet, but you can check out the current state here: #621

BTW I should explain: Russell suggested that Python 3.6 compatibility would be more worthwhile than the contextlib work I had planned for the last weeks of GSoC in my proposal, so we decided to give up on contextlib for now. That is why I focused on support for the 3.6 bytecode changes.

abonie Aug 15, 2017

Contributor

Week 11:
Still working on #621 (Python 3.6 bytecode compatibility). The CALL_FUNCTION_* opcodes are done (and in the meantime I fixed the splat operator, #623), but I am having some trouble with EXTENDED_ARG right now.

I think it would be good to create separate branches for different Python versions sooner rather than later.

edit: regarding the EXTENDED_ARG problem, if someone could explain why the current implementation adds 4 NOPs here https://github.com/pybee/batavia/blob/master/batavia/VirtualMachine.js#L775 that would be most helpful :)
edit: got an answer - a NOP is necessary there so that the next_pos property of the previous opcode does not point to "void". And it turns out one NOP is enough.
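
For context, CPython 3.6 encodes every instruction as two bytes (opcode, argument), and EXTENDED_ARG is a prefix instruction that supplies the higher bits of the following instruction's argument. The decoder below is a minimal Python sketch of that scheme, not Batavia's JavaScript; decode_wordcode is a hypothetical name and the opcode numbers are hard-coded CPython 3.6 values used for illustration.

```python
# Opcode numbers as used by CPython 3.6; hard-coded here for illustration.
LOAD_CONST = 100
EXTENDED_ARG = 144


def decode_wordcode(code):
    """Yield (offset, opcode, argument) from two-byte instructions."""
    extended = 0
    for offset in range(0, len(code), 2):
        opcode = code[offset]
        arg = code[offset + 1] | extended
        if opcode == EXTENDED_ARG:
            # Carry the accumulated bits into the next instruction.
            extended = arg << 8
        else:
            extended = 0
        yield offset, opcode, arg


# EXTENDED_ARG 1 followed by LOAD_CONST 2 loads constant (1 << 8) | 2.
sample = bytes([EXTENDED_ARG, 1, LOAD_CONST, 2])
decoded = list(decode_wordcode(sample))
```

Because the prefix still occupies an instruction slot, a decoder that folds it into the next instruction has to leave something at that offset, which is consistent with the NOP explanation above.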

abonie Aug 22, 2017

Contributor

Week 12:
Python 3.6 compatibility #621 done and merged.

I fixed multiple discrepancies in error messages between Python 3.4, 3.5 and 3.6 in #643, which is ready for review. Now most of the tests pass for each version. BTW, I also discovered and fixed a bug in yield from: #646

abonie Aug 30, 2017

Contributor

Wrap up:
#643 is merged. I promised a cleanup PR to make runtime version checks less awkward, but did not manage to deliver it this week. I still intend to do it next week.

Other than that, GSoC 2017 is over and I am pretty happy with the work I have accomplished over the summer. A few things from the original proposal are still missing:

  • dict view objects
  • a few features of JSON decoding (I expect that the CPython implementation of the json module could be reused in Batavia pretty soon, though)
  • contextlib (with my mentor, we decided to skip this in favor of Python 3.6 bytecode support)

Everything else, and a little bit more, is done and merged :)

Thank you all very much, it's been a blast 👍

wolfv Aug 30, 2017

Contributor

Congrats!
