
Consider making side_effect(file_obj=) arg more generic #114

Closed
erikrose opened this issue Feb 23, 2017 · 15 comments

Comments

@erikrose
Collaborator

Closing a file after yielding its last line is all well and good, but I wonder: can we be more general and add power without losing much brevity?

side_effect(log, some_file, last=lambda: some_file.close())
side_effect(ingest, listdir(some_dir), last=lambda: rmtree(some_dir))

@bbayles? @yardsale8?
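For concreteness, here is a minimal sketch of how such a `last=` hook could behave (a hypothetical signature; the library's `side_effect` does not take this argument):

```python
def side_effect(func, iterable, last=None):
    """Yield items from *iterable*, calling *func* on each one as it
    passes through. If *last* is given, call it once iteration ends."""
    try:
        for item in iterable:
            func(item)
            yield item
    finally:
        # Runs on normal exhaustion, on an exception, and on early close.
        if last is not None:
            last()
```

With that shape, `side_effect(log, some_file, last=some_file.close)` would close the file whether the consumer finishes or bails out early.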

@bbayles
Collaborator

bbayles commented Feb 23, 2017

Hmm, interesting.

(1) This makes me think that at a minimum we should change the documentation (and maybe the argument name) to be more generic - it works for anything that needs to be closed, not just file objects.

(2) It does seem like files are a justifiable special case for side_effect, since writing to a file is the quintessential "impure" activity.

(3) Adding a generic "do this thing after you're done side_effecting the iterable items" opens the door to "do this thing before you start side_effecting the iterable items," which seems like it might be a bridge too far? The goal here is to make it easy to do something common (close a file) that's awkward to do currently (you'd have to wrap your whole pipeline in a with block), not to let you put more things on a line (this was my initial resistance to #112 ).

I'll look at the language for (1) and noodle around with (3).
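For comparison, the currently-awkward `with`-wrapped pipeline looks something like this (with minimal stand-ins for `consume` and `side_effect` defined inline so the sketch is self-contained):

```python
import os
import tempfile
from collections import deque

def consume(iterator):
    # Exhaust an iterator, discarding its items.
    deque(iterator, maxlen=0)

def side_effect(func, iterable):
    # Call *func* on each item, then yield the item unchanged.
    for item in iterable:
        func(item)
        yield item

fd, path = tempfile.mkstemp()
os.close(fd)

# The whole pipeline has to live inside the `with` block so the file
# stays open while the generator is being consumed.
with open(path, 'w') as f:
    consume(side_effect(lambda x: print(x, file=f), range(3)))
```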

@bbayles
Collaborator

bbayles commented Feb 23, 2017

I thought the "well if you want to run a generic function after, why not before" was a reductio ad absurdum, but now that I've implemented it I kind of like it? See #115.

@erikrose
Collaborator Author

I came to the same conclusion moments after my head hit the pillow: why not do before also? It took another 30 seconds from there to "Why not just have it take a context manager?" From there, I landed back at with_iter. In fact, I'm still failing to see why #112 couldn't have just been written as…

consume(print(x, file=output) for output in with_iter(open("temp.tmp", "w")) for x in range(3))

@yardsale8

yardsale8 commented Feb 24, 2017

I am with Erik on the fact that we are basically reinventing context management. Erik's suggestion was one of the first I tried, but it fails.

In [1]:  [print(x, file=out) for out in with_iter(open('temp.tmp', 'w')) for x in range(3)]
(snip)
    342     with context_manager as iterable:
--> 343         for item in iterable:
    344             yield item
    345

UnsupportedOperation: not readable

To make this work we need to yield a reference to the context manager from inside a with statement. Can either of you see a way to do this more explicitly from inside side_effect?

@bbayles
Collaborator

bbayles commented Feb 25, 2017

Right, with_iter() would be fine for reading from a file that's the source for our iterable, but in Erik's example the range() is the source.

contextlib.closing() almost does what we want, but not quite.

At this point I like the before and after things, so I'm inclined to merge that. But I'd be interested if there's something more we might stumble onto in this discussion.

@bbayles
Collaborator

bbayles commented Feb 25, 2017

I could see divorcing this from side_effect() and re-introducing with_file() with a different name and explanation. Instead of re-inventing context management we'd just be using it.

def contextualize(obj):
    """Wrap *obj*, an object that supports the context manager protocol,
    in a ``with`` statement and then yield the resultant object.

    The object's ``__enter__()`` method runs before this function yields, and
    its ``__exit__()`` method runs after control returns to this function.

    This can be used to operate on objects that can close automatically when
    using a ``with`` statement, like IO objects:

        >>> from io import StringIO
        >>> from more_itertools import consume
        >>> it = [u'1', u'2', u'3']
        >>> file_obj = StringIO()
        >>> consume(
        ...     print(x, file=f) for f in contextualize(file_obj) for x in it
        ... )
        >>> file_obj.closed
        True

    """
    with obj as context_obj:
        yield context_obj

@yardsale8

Here's a thought. side_effect could be generalized to take 1 or 2 (maybe more) sequences and a function with a matching number of arguments. Then you have a clean composition with contextualize defined above.

consume(side_effect(lambda file, x: print(x, file=file), contextualize(open('temp.tmp', 'w')), xs))

To make this work, the iteration would need to be a double loop (Cartesian product), equivalent to

(func(x,y) for x in xs for y in ys)

I am not sure how you would generalize the chunking in side_effect. Perhaps a list of chunk sizes when there is more than one sequence?

@bbayles
Collaborator

bbayles commented Feb 26, 2017

Hmm, I see what you mean, but I'm not sure we should go there... 🐲

I think we should pick between #115 (before and after functions for side_effect()) and a re-worded #112 (with_file(), but with a different name and explanation). I'll let Erik say which one he likes best?


#115 Pros: before and after functions work nicely for setup and shutdown tasks. This is more flexible than what's currently in master (file_obj keyword), and it's easy to imagine how even the before function could be useful (see the write-a-header examples).

#115 Cons: Basically re-invents context management - before is __enter__; after is __exit__.

#112 Pros: Could be used places other than side_effect(). Fills what I think is a gap in contextlib - a function that makes sure __enter__ gets called, yields, and then makes sure __exit__ gets called.

#112 Cons: Not all that itertools-y?

@yardsale8

yardsale8 commented Feb 27, 2017

I just realized that I can get the desired effect from my last example without rewriting side_effect using product from itertools (of course there is an itertool for this ... duh).

consume(side_effect(lambda f, x: print(x, file=f), product(contextualize(open('tmp.tmp', 'w')), xs)))

Does this example make contextualize feel more like an itertools tool?

I personally like that I can do this without holding a reference to the file handle, allowing the side effect to be defined and consumed in one expression. This would be my one knock on the current solution: before and after require that I hold an external reference to the file, which might as well come from a with statement.

Unfortunately, the expression using product and contextualize is pretty complicated and obtuse, not the sort of thing I would like to present to my students.

What if with_file/contextualize acted like product(contextualize(f), xs), sort of a context zip?

def contextzip(context, seq):
    with context as c:
        for item in seq:
            yield (c, item)

This allows the user to bundle a context manager with the items of the sequence. They could do whatever is required with the pair, assured that the context manager will be processed correctly behind the scenes.

Then the log to a file example becomes

print_to_file = lambda *args: print(args[1], file=args[0])
consume(side_effect(print_to_file, contextzip(open('temp.tmp','w'), xs)))

@erikrose
Collaborator Author

Thank you both for thinking through all this in such depth! I really like contextualize. If we rename it context, look how nicely call sites read:

    @contextmanager
    def fake_file():
        """Pretend we're opening and closing a file."""
        print('open')
        yield 'some_file'
        print('close')

--> [print(x, c) for c in context(fake_file()) for x in range(3)]
open
0 some_file
1 some_file
2 some_file
close

The simpler context() + genexprs should be able to fill the role of contextzip, plus you can use arbitrarily many context managers by adding more in clauses.
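A quick sketch of the multiple-context-manager point, using a minimal `context()` written the same way as the contextualize generator above (the `resource` helper is made up for illustration):

```python
from contextlib import contextmanager

def context(obj):
    # Enter *obj*, yield the managed object, and exit once iteration resumes.
    with obj as managed:
        yield managed

events = []

@contextmanager
def resource(name):
    events.append(('open', name))
    yield name
    events.append(('close', name))

# Two context managers, one `for` clause each: both are entered exactly
# once, and they exit in reverse order, just like nested `with` blocks.
pairs = [(a, b) for a in context(resource('a'))
                for b in context(resource('b'))
                for _ in range(2)]
```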

I am still thinking about whether before= and after= kwargs are a good idea on side_effect().

@erikrose
Collaborator Author

side_effect()'s before and after are expressible using context(), assuming they are already conveniently encapsulated in a context manager:

[x for c in context(fake_file())
   for x in side_effect(lambda x: print('logging: ', x * 2, c), range(3))]

That works even with Python 3's tighter list comprehension scoping. (As before, you have to be careful to put your context manager in the first in clause, or it opens and closes repeatedly.)
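The ordering caveat is easy to demonstrate: the inner iterable expression of a comprehension is re-evaluated for every outer item, so a context manager in a later `for` clause is re-entered on each pass (minimal `context()` plus a made-up `fake_file` for illustration):

```python
from contextlib import contextmanager

def context(obj):
    # Enter *obj*, yield the managed object, and exit once iteration resumes.
    with obj as managed:
        yield managed

opens = []

@contextmanager
def fake_file():
    opens.append('open')
    yield 'some_file'

# Context manager in the FIRST clause: entered exactly once.
first = [x for c in context(fake_file()) for x in range(3)]
first_opens = len(opens)   # 1

opens.clear()

# Context manager in a LATER clause: re-entered for every outer item.
later = [x for x in range(3) for c in context(fake_file())]
later_opens = len(opens)   # 3
```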

If the open/close behavior you're looking for is not in a context manager, it gets wordy. In that case, before/after are certainly a win on conciseness and arguably on clarity as well. Given that the original use case was logging to a file, I think we should merge context() and then wait to see if a common need for non-context-manager before/after hooks on side_effect() emerges. If it does, we can merge #115 also as a convenience.

@yardsale8

I think you're right, context is simple and provides the most flexibility.

@bbayles
Collaborator

bbayles commented Mar 21, 2017

I'm good with this outcome; I'll back off the file_obj addition to side_effect() and put out a PR for context(). I'll also close #115 pending evidence of its usefulness.

@bbayles
Collaborator

bbayles commented Mar 21, 2017

That's #118. I think we're ready for a release after that? I have a branch with some cleanup that's mostly ready.

@erikrose
Collaborator Author

Great! Thanks, @bbayles and @yardsale8!
