Fix #62 Optional path argument in JoinMapper #65

wants to merge 5 commits into



To get the source path from within the mapper routine, just add **kwargs to the argument list. Here are some examples.

import sys

def map_primary(key, value, **kwargs):
  key, value = value.strip().split('\t')
  # the source path arrives as a keyword argument
  print >> sys.stderr, key, value, kwargs['path']
  yield key, value

Or you can name the desired argument directly:

def map_primary(key, value, path, **kwargs):
  key, value = value.strip().split('\t')
  print >> sys.stderr, key, value, path
  yield key, value

Callable instances are also supported:

class MapSecondary(object):
  def __call__(self, key, value, path, **kwargs):
    key, value = value.strip().split(' ')
    print >> sys.stderr, value, path
    yield key, value

And the previous mapper interface still works as well:

def map_primary(key, value):
  key, value = value.strip().split('\t')
  yield key, value

This approach makes it easy to extend the interface to pass other arguments in the future.
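For illustration, here is a minimal sketch of how a caller could support all four mapper shapes above. It is not the code in this PR: `adapt_mapper` is a hypothetical helper, and it uses Python 3's `inspect.signature`, whereas the examples above are Python 2. The idea is to inspect the mapper once and forward only the keyword arguments it can accept.

```python
import inspect


def adapt_mapper(mapper, **extras):
    """Return a callable(key, value) that forwards only the extras
    the mapper can accept. Hypothetical helper, not part of the PR."""
    params = inspect.signature(mapper).parameters
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_var_kw:
        # mapper declares **kwargs, so every extra can be passed
        passed = dict(extras)
    else:
        # otherwise pass only the extras the mapper names explicitly
        passed = {k: v for k, v in extras.items() if k in params}
    if not passed:
        # old-style mapper(key, value): return it unwrapped
        return mapper

    def wrapped(key, value):
        return mapper(key, value, **passed)

    return wrapped


def old_style(key, value):
    yield key, value


def with_path(key, value, path, **kwargs):
    yield key, (value, path)


class WithCall(object):
    def __call__(self, key, value, path, **kwargs):
        yield key, (value, path)
```

Because the signature is inspected once, when the mapper is set up, old-style mappers are returned as-is and are called exactly as before.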


klbostee commented Jan 8, 2013

Sounds good! Will try to find some time to review and merge this soonish.

Did you measure the impact of this on performance in any way? I'm a bit worried that doing an additional (lambda) function call whenever the mapper gets called could make things substantially slower...
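One rough way to check this would be a microbenchmark of the bare call overhead. The sketch below is only an assumption about what such a measurement might look like: the function names and the path string are made up, it uses plain functions rather than generators, and it targets Python 3 (`timeit.repeat` with `globals`). It reports no results here; real numbers would have to come from running it, ideally against an actual job.

```python
import timeit


def plain_mapper(key, value):
    return key, value


def kwargs_mapper(key, value, **kwargs):
    return key, value


# extra lambda layer that injects the path on every call (hypothetical path)
wrapped_mapper = lambda key, value: kwargs_mapper(
    key, value, path="/hypothetical/input/path"
)


def measure(func, number=200000, repeat=3):
    """Best-of-`repeat` time in seconds for `number` calls of func."""
    return min(
        timeit.repeat(
            "f(1, 'a\\tb')", globals={"f": func}, number=number, repeat=repeat
        )
    )


if __name__ == "__main__":
    print("direct call :", measure(plain_mapper))
    print("via lambda  :", measure(wrapped_mapper))
```

This isolates only the cost of the extra call and keyword passing; in a real mapper, the work done per record (splitting, I/O) may well dominate it.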
