Beam prototype new dofn #1

Closed
wants to merge 1 commit into
from

Projects

None yet

2 participants

@sb2nov
Owner
sb2nov commented Jan 10, 2017 edited

Prototype for the newDoFn in python

Pending tasks:

  • Enable type checking back
  • Make sure all unittests pass
  • Clean up some of the code in the DoFnRunner
  • Fix TODOs
sdks/python/apache_beam/dataflow_test.py
"""A custom DoFn for a FlatMap transform."""
- def process(self, context, prefix, suffix):
- return ['%s-%s-%s' % (prefix, context.element, suffix)]
+ def process(self, prefix, element=NewDoFn.ElementParam,
@robertwb
robertwb Jan 12, 2017

We should have a NewDoFn.SideInputParam which makes it an optional value without requiring it to come before the other arguments.

@sb2nov
sb2nov Jan 12, 2017 Owner

Should be differentiate between the deferred vs non-deferred side inputs ? Here I had only added the default value to the deferred one as the other just comes from args

@robertwb
robertwb Jan 12, 2017

also, NewDoFn.ElementParam should be default on first argument. Thus this signature (and most others) wouldn't change but for context -> element.

Probably makes sense to nail API before changing all examples (in a separate commit).

@sb2nov
sb2nov Jan 12, 2017 Owner

Makes sense. Two follow up questions then:

  1. Should we change functions like https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/iobase.py#L752 to have the second argument tagged as SideInputParam ?

  2. Do we want to differentiate between deferred and non-deferred side inputs or it should not matter. Eg. https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/iobase.py#L776

@robertwb
robertwb Jan 13, 2017
  1. No need. The "defaults" should be (element, side_input, side_input, ...). Actually, I wouldn't be opposed to requiring all side inputs to come last to be in the same order as the ParDo call (the interleaving may be odd). This can be relaxed, but not imposed, in the future based on more experience.

  2. No, we should not differentiate between these cases.

@sb2nov
Owner
sb2nov commented Jan 12, 2017 edited

@robertwb:

One more question about typechecking, in case of the new DoFn should we just ignore the arguments that are tagged as Context/Window/Timestamp and only check the ones that are either the Element or SideInput.

Also incase of the NewDoFn does that mean that all arguments should always be tagged with something.

@sb2nov sb2nov closed this Jan 16, 2017
@sb2nov sb2nov reopened this Jan 16, 2017
@sb2nov sb2nov closed this Jan 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment