Apply retroactively? #28

Closed
craffel opened this issue Apr 18, 2016 · 11 comments

craffel commented Apr 18, 2016

Hi, sorry if this is already possible, but it would be great to apply this retroactively to a repository where I have already committed notebook files with output. I would think this would involve some kind of filter-branch wizardry. Is it already possible? Is this a feature you would want added?

Owner

kynan commented Apr 18, 2016

Yes, you can do this already by calling the nbstripout binary in a git filter-branch pipeline. I don't have a command line handy, but if you're happy to craft one, I'd be happy to add it to the README!

Author

craffel commented Apr 18, 2016

Here's how I accomplished it:

git filter-branch --tree-filter 'if [ -f foo.ipynb ]; then nbstripout foo.ipynb; fi'

The if guard is there because, if foo.ipynb was not checked in from the first commit, nbstripout fails on the commits that lack it and the filter-branch operation aborts. Another way to avoid that would be to add an --ignore-unmatch flag to nbstripout, like git rm has, which would be pretty straightforward.

This script is great, by the way. Now, if only there was a way to get git diff to ignore cell output too...
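
A sketch of how the guard above might extend to more than one notebook, assuming POSIX sh (which filter-branch uses to evaluate the filter). `bar.ipynb`, the `NB_FILTER` variable, and the `strip_listed_notebooks` wrapper are hypothetical names for illustration, not part of nbstripout:

```shell
# Filter body evaluated once per commit by git filter-branch.
# foo.ipynb comes from the command above; bar.ipynb is a hypothetical second notebook.
NB_FILTER='for nb in foo.ipynb bar.ipynb; do
    # Skip notebooks that do not exist in the commit being rewritten.
    if [ -f "$nb" ]; then nbstripout "$nb"; fi
done'

# Hypothetical wrapper: rewrite the current branch, stripping the listed notebooks.
strip_listed_notebooks() {
    git filter-branch --tree-filter "$NB_FILTER"
}
```

Calling `strip_listed_notebooks` from the repository root rewrites every commit reachable from the current branch, so it is probably worth trying on a fresh clone first.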

Owner

kynan commented Apr 19, 2016

Thanks, that was about the command I would have imagined. Have you tried with --index-filter too?

Not sure what you mean about diff. If you install the nbstripout filter, it will also be used by git diff...

Author

craffel commented Apr 19, 2016

> Thanks, that was about the command I would have imagined. Have you tried with --index-filter too?

No, I only tried --tree-filter, is there an advantage to one or the other?

> If you install the nbstripout filter it will also be used by git diff...

Hah, you're totally right, forgot to install it after running the filter-branch. Ok, this script is awesome!
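
For reference, the install step mentioned above could look like the sketch below. `nbstripout --install` registers the filter for the current repository; the wrapper name `install_nbstripout_filter` is hypothetical, and reading back `filter.nbstripout.clean` is just one way to confirm the filter is set:

```shell
# Hypothetical wrapper; run it from inside the repository.
install_nbstripout_filter() {
    # Register nbstripout as a git filter for this repository.
    nbstripout --install &&
    # Print the clean filter command that git now runs on notebooks
    # when staging them and when diffing the working tree.
    git config --get filter.nbstripout.clean
}
```

Once the filter is installed, git diff on a notebook whose only changes are cell outputs should show nothing.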

Owner

kynan commented Apr 20, 2016

The --index-filter is faster since it does not need to check out the work tree but only works with the index directly. But I guess in your case speed was not a concern!

Author

craffel commented Apr 20, 2016

Yes, took about 10 seconds :) thanks for your help

Owner

kynan commented Apr 20, 2016

Pleasure. Close the issue if you're satisfied! :)

belteshassar commented Jun 30, 2017

Thanks for this. It worked great. Any reason it's not been added to the README yet? Here's an update that uses --index-filter and operates on all ipynb-files in the repo:

git filter-branch -f --index-filter '
    git checkout -- :*.ipynb
    find . -name "*.ipynb" -exec nbstripout "{}" +
    git add . --ignore-removal
'

If the repo is large and the notebooks are in a subdirectory, it will run faster with git checkout -- :<subdir>/*.ipynb. You'll get a warning from git for commits where there is no notebook. If you're really annoyed by this, you can redirect stderr to /dev/null.

Naturally, if your repo contains mainly notebooks, you might as well use --tree-filter:

git filter-branch -f --tree-filter 'find . -name "*.ipynb" -exec nbstripout "{}" +'
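
The two suggestions above (limiting the checkout to a subdirectory and silencing git's warning) might be combined as in this sketch. The directory name `notebooks` and the wrapper name `strip_notebooks_in_subdir` are assumptions for illustration; only the warning from `git checkout` is redirected, so errors from nbstripout itself still show up:

```shell
# Hypothetical wrapper; "notebooks" stands in for <subdir>.
strip_notebooks_in_subdir() {
    git filter-branch -f --index-filter '
        # Check out only the notebooks; discard the warning git prints
        # for commits that contain no matching file.
        git checkout -- ":notebooks/*.ipynb" 2>/dev/null
        find . -name "*.ipynb" -exec nbstripout "{}" +
        git add . --ignore-removal
    '
}
```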

Owner

kynan commented Jul 4, 2017

@belteshassar Good point adding it to the README, will do that.

ooiM added a commit to ooiM/nbstripout that referenced this issue Jun 2, 2020

AntoineGlacet commented Jul 25, 2022

Hi, I am a bit late to the party, but can we update this to use git filter-repo instead of git filter-branch?
It is now the git-recommended best practice to rewrite history (https://github.com/newren/git-filter-repo/).

I am not very familiar with it though so can't really tell you the exact command.

Owner

kynan commented Jul 26, 2022

Thanks, good suggestion! I hadn't heard of git filter-repo before, and it was new to me that it's now recommended over git filter-branch. I'm reluctant to make this change though, since

  1. I'm not familiar with git filter-repo and have never used it
  2. It's not a built-in command for git and needs to be installed separately afaict

If anyone has experience and wants to send a PR I'm happy to consider!
