Consider stripping out only IMAGES as option #58

Closed
rsvp opened this issue Jun 1, 2017 · 14 comments


rsvp commented Jun 1, 2017

Frequently the output is useful as informal "testing" of results,
and there is very little overhead in keeping non-graphical results.
Images, however, add considerable unwanted bulk to a commit
for any version control system (unless those images are very
expensive to reproduce, or are historical for some reason).

Proposal: provide an option to strip out only output cells with images.
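A minimal sketch of what the proposal amounts to, assuming the notebook has already been loaded as an nbformat v4 style dictionary. The function name `strip_image_outputs` is hypothetical and not part of nbstripout; it only illustrates dropping outputs that carry an `image/*` MIME entry while keeping textual results.

```python
import copy


def strip_image_outputs(notebook: dict) -> dict:
    """Return a copy of the notebook without outputs that contain image data.

    Hypothetical helper for illustration; assumes the v4 layout where each
    code cell has an "outputs" list and rich outputs keep a MIME bundle
    under the "data" key.
    """
    nb = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["outputs"] = [
            output
            for output in cell.get("outputs", [])
            # Stream and error outputs have no "data" key, so they are kept.
            if not any(mime.startswith("image/") for mime in output.get("data", {}))
        ]
    return nb
```

Text output such as printed numbers survives, so the notebook can still serve as the informal "testing" record described above.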

@kynan
Owner

kynan commented Jun 9, 2017

Interesting, that sounds like a useful option. How do you see the "UI" of this option?

I suppose this could easily be generalized to keep or strip out any output data type.

@rsvp
Author

rsvp commented Jun 10, 2017

As an example, how about along these lines?

$ nbstripout --only=images [notebook(s)]
#  where images is just an alias for *.png, *.jpg, etc.

$ nbstripout --only='tmp-*.png' [notebook(s)]
#  _wildcard support_, say, for inconsequential images labeled as such by user.

@jpeacock29
Contributor

I thought about this as I was implementing --keep-count and --keep-output and considered an options structure:

--keep-count, -c
--keep-output, -o
    --keep-text-output, -t
    --keep-image-output, -i
--keep-metadata, -m

Thus nbstripout -o would be equivalent to nbstripout -ti. Stripping only images would use nbstripout -ctm. For more fine-grained control at the cell level, keep_output-type metadata would suffice. That said, this doesn't scale well as maybe there are more output types we'd consider in the future. (Widgets?)
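The options structure above could be sketched with `argparse` roughly as follows. This is only an illustration of the proposed hierarchy: the short flags and the `-t`, `-i`, and `-m` options are taken from this comment and should be treated as hypothetical, not as nbstripout's actual command line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the structure proposed in this comment.
    parser = argparse.ArgumentParser(prog="nbstripout-sketch")
    parser.add_argument("-c", "--keep-count", action="store_true")
    parser.add_argument("-o", "--keep-output", action="store_true")
    parser.add_argument("-t", "--keep-text-output", action="store_true")
    parser.add_argument("-i", "--keep-image-output", action="store_true")
    parser.add_argument("-m", "--keep-metadata", action="store_true")
    return parser


def parse(argv: list) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # -o is shorthand for keeping every output type, i.e. the same as -ti.
    if args.keep_output:
        args.keep_text_output = True
        args.keep_image_output = True
    return args
```

With this layout, `parse(["-ctm"])` keeps counts, text output, and metadata but leaves `keep_image_output` false, which is the "strip only images" case. Each new output type would need its own flag, which is the scaling concern raised above.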

@rsvp
Author

rsvp commented Jun 10, 2017

@jpeacock29 Re: more output types. Consider a flag like --regex, -r
where a user can implement a custom stripout by giving a regular expression.
This is what I did with sed, and then realized that an edited notebook
must be left in a trusted state. Going from regex to images would
then be a one-liner.

@kynan
Owner

kynan commented Jun 10, 2017

Sounds quite sensible! Let's start with images since this seems to be the most important case.

One of you happy to have a go at this @rsvp @jpeacock29 ?

@jpeacock29
Contributor

What would the regex flag be matched against? Every key in the ipynb?

@rsvp
Author

rsvp commented Jun 11, 2017

@jpeacock29 here's a mock example for PNG images: $ nbstripout --regex='png": "i'

That should get rid of the super long lines encoding PNG images for now.
I question the permanence of that regular expression since it is subject
to the formatting whims upstream. Historically, both "image/png":
and "png": have been used as keys, so my example would fortunately
work for both cases.

But --regex would be a handy tool in any case.
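To make the mock example concrete, here is a small sketch that checks the pattern against the serialized `"key": "value"` form of each entry in an output's MIME bundle. The helper `matching_keys` is hypothetical, and matching against the serialized text is an assumption about how such a flag might work, not something nbstripout implements.

```python
import json
import re

# The pattern from the mock example: a key ending in png, then a value
# starting with "i" (base64-encoded PNG data begins with "iVBOR").
PATTERN = re.compile(r'png": "i')


def matching_keys(data: dict, pattern: re.Pattern = PATTERN) -> list:
    """Return the keys whose serialized '"key": "value"' text matches."""
    hits = []
    for key, value in data.items():
        line = json.dumps({key: value})  # e.g. {"image/png": "iVBORw0KGgo="}
        if pattern.search(line):
            hits.append(key)
    return hits
```

Both `"image/png"` and the older `"png"` key match, as described above. The match depends on the `": "` separator in the serialized text, so a notebook written with compact separators would slip through, which illustrates the concern about upstream formatting.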

@kynan
Owner

kynan commented Jun 13, 2017

Feel free to have a go at this, I'm not sure when I might find time to work on it myself.

@kynan
Owner

kynan commented Jul 30, 2017

One of you interested in working on this @rsvp @jpeacock29 ?

kynan added the state:waiting label (Waiting for response for reporter) Jul 30, 2017
kynan added this to the 0.3.2 milestone Jul 30, 2017
kynan removed this from the 0.3.2 milestone Jul 8, 2018
@kynan
Owner

kynan commented Jul 9, 2018

Are you still interested in this @rsvp @jpeacock29 ?

@rsvp
Author

rsvp commented Jul 10, 2018

hi @kynan my spare cycles are going to refactoring https://git.io/fecon235
so realistically maybe later this year.

Interestingly, one of the reasons leading to the spin-off of the source code
to another repository https://git.io/fecon236 was to leave behind all
the archival bulky images preserved in the .git for notebooks.

So this issue is still pertinent.

@kynan
Owner

kynan commented Jun 28, 2020

@rsvp do you still have this use case?

@kynan
Owner

kynan commented Apr 11, 2021

There's an in-flight pull request (#135) that's somewhat related: it only strips outputs that are larger than a certain size. Would that fit the bill?

@kynan
Owner

kynan commented Sep 24, 2022

Given #135 has been released in nbstripout 0.5.0 and is arguably even more flexible than what's requested here, I'll close this as fixed.
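For readers wondering why a size threshold covers the image case, here is a rough sketch of the idea. It is not the code from #135: the function name `strip_large_outputs` is hypothetical, and measuring size as the length of the serialized JSON is an assumption made for illustration.

```python
import copy
import json


def strip_large_outputs(notebook: dict, max_size: int) -> dict:
    """Return a copy of the notebook keeping only outputs up to max_size.

    Hypothetical helper; size is taken as the character length of the
    output serialized to JSON, which is an assumption of this sketch.
    """
    nb = copy.deepcopy(notebook)  # do not mutate the caller's notebook
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["outputs"] = [
            output
            for output in cell.get("outputs", [])
            if len(json.dumps(output)) <= max_size
        ]
    return nb
```

Base64-encoded images are usually far larger than printed text, so a modest threshold removes them while small textual results stay in the commit.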

kynan closed this as completed Sep 24, 2022
kynan added the resolution:fixed label and removed the help wanted and state:waiting (Waiting for response for reporter) labels Sep 24, 2022
kynan modified the milestones: Backlog, 0.5.0 Sep 24, 2022