Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong paired paths when syncing with the --pre-commit flag #506

Open
slideclimb opened this issue May 16, 2020 · 16 comments
Open

Wrong paired paths when syncing with the --pre-commit flag #506

slideclimb opened this issue May 16, 2020 · 16 comments
Milestone

Comments

@slideclimb
Copy link

slideclimb commented May 16, 2020

I have the following directory structure, where the nb/test.ipynb is paired with the py/test.py.

├── nb
│   └── test.ipynb
└── py
    └── test.py

The notebook and Python file are paired with "jupytext": {"formats": "py//py,nb//ipynb"}. (using jupytext --set-formats py//py,nb//ipynb nb/test.ipynb)
This works fine when syncing with jupytext --sync nb/test.ipynb, but when syncing with jupytext --sync --pre-commit I get the following exception:

[jupytext] Notebooks in git index are:
py/test.py
[jupytext] Reading py/test.py
Traceback (most recent call last):
  File "PATH/jupytext", line 8, in <module>
    sys.exit(jupytext())
  File "PATH/jupytext/cli.py", line 292, in jupytext
    exit_code += jupytext_single_file(nb_file, args, log)
  File "PATH/jupytext/cli.py", line 389, in jupytext_single_file
    notebook, inputs_nb_file, outputs_nb_file = load_paired_notebook(notebook, fmt, nb_file, log)
  File "PATH/jupytext/cli.py", line 596, in load_paired_notebook
    for alt_path, alt_fmt in paired_paths(nb_file, fmt, formats):
  File "PATH/jupytext/paired_paths.py", line 128, in paired_paths
    raise InconsistentPath(u"Paired paths '{}' do not include the current notebook path '{}'. "
jupytext.paired_paths.InconsistentPath: Paired paths 'py/py/test.py','py/nb/test.ipynb' do not include the current notebook path 'py/test.py'. Current format is 'py:light', and paired formats are 'py//py,nb//ipynb'.

I have tried all sorts of pairings (taken from/inspired by the comments in #180), among them:

  • ..//py,ipynb (and ipynb,..//py`)
  • nb//ipynb,..//py
  • nb//ipynb,py//py
    but they all throw a similar exception (and work fine with a regular sync).

Addition
As a current workaround I'm using jupytext --sync nb/*.ipynb in my pre-commit hook, which seems to work fine.

@mwouts
Copy link
Owner

mwouts commented May 17, 2020

Hello @slideclimb , thank you for reporting this. I will have a look soon.

@mwouts
Copy link
Owner

mwouts commented May 18, 2020

Hi @slideclimb , I have extracted a test from your report (thanks again), see ff5b466 .

Here at least, the test runs OK, so I am sorry I have not yet reproduced your issue.

Do you think there is remaining difference, either

  • in the notebook metadata (but, according to the error message, we do seem to have the same jupytext.formats metadata (i.e. py//py,nb//ipynb)
  • or in the way we call jupytext --sync --pre-commit - as you will see in my test, that command is expected to belong to the pre-commit hook. Is that the way you call it as well?

Also, it you can, let me know if the same test works for you, or not, and/or if you can tweak it to reproduce the issue.

@slideclimb
Copy link
Author

Thanks for taking the time to create a detailed example!

The only difference I see is that you have the notebook file tracked in git, whereas I have the Python file tracked. This seems to be the problem, tracking only the Python file throws an exception, but tracking (only) the notebook file works fine. We don't have any difference in the other points you mentioned. The metadata is the same and jupytext --sync --pre-commit is called from the pre-commit hook.

Now I'm starting to doubt if I'm using jupytext the way it is intended... I use jupytext to create a paired Python file that I put in git, so I don't have to put the notebook file in git to avoid cumbersome and useless diffs/commits. Does that make sense?

@mwouts
Copy link
Owner

mwouts commented May 19, 2020

Oh I see... So, indeed now I can reproduce what you see (and I'll try to fix it).

Now, the question is what you want to do with the pre-commit. If you have the Python file tracked, then the .ipynb notebook will be updated when you change the .py file and commit. And if you edit the .ipynb file, then Jupyter will update both files, so your setting looks fine indeed! (and is probably a better one than the one we have in the documentation 😃 ).

@mwouts
Copy link
Owner

mwouts commented May 22, 2020

I have added the test above to the collection of Jupytext test. Following the changes made at #508 , the test now works, so I'll close this issue, and let you know (soon) when I have published a release candidate so that you can test on your side. Thanks again for your report!

@mwouts mwouts closed this as completed May 22, 2020
@mwouts
Copy link
Owner

mwouts commented May 22, 2020

@slideclimb , would you like to give a try to the latest RC, and hopefully confirm that the problem is gone? Thanks!

pip install jupytext==1.5.0rc1

@slideclimb
Copy link
Author

Thanks for the quick fix! I can confirm that the exception is gone 😄... But the pre-commit hook does nothing when I change the notebook file. (The list of notebooks in git index is empty.) Syncing manually (with jupytext --sync nb/test.ipynb) then updates the Python file, but when committing after that, git complains that it can't add the notebook file to git because it's in the .gitignore (as it should be).

Just to be sure we're on the same page, what I've just done is:

  • Create notebook nb/test.ipynb
  • Set up a jupytext pair with a python file py/test.py (this creates the python file)
  • Add the Python file to git (and add the notebook to the .gitignore)
  • Commit the Python file
  • Edit the notebook file
  • Commit (possibly add a dummy Python file to edit so the diff isn't empty before committing)

If I understand your test correctly, you first create the Python file and set it up with a paired notebook file? I think that's the only difference I see, as I create a notebook file and set it up paired Python file.


When I have only notebook files in a repo, the git diff is always empty and therefore I can't commit. But usually I have other python files living next to it that have been edited, or I manually update one of the notebooks before committing which then takes care of the rest.

@mwouts
Copy link
Owner

mwouts commented May 22, 2020

Thanks @slideclimb for the follow-up.

I'll adjust the test to match your use case, i.e. add *.ipynb to .gitignore.

One more question: which program do you use to edit the .ipynb file? If it's Jupyter notebook or Lab, and if you have Jupytext installed in the same environment, then .py file should be updated when you save the .ipynb file. But if you use other editors like PyCharm of VS Code for which we don't have a plugin, indeed, changing the .ipynb file is not going to update the .py version...

@mwouts mwouts reopened this May 22, 2020
@slideclimb
Copy link
Author

Thanks, I really hope we get it right this time :p

I use PyCharm.

@mwouts
Copy link
Owner

mwouts commented May 23, 2020

@slideclimb, that's good to know. We don't have a plugin yet for PyCharm (cf. #147), but I was wondering if one couldn't use the File Watchers in PyCharm. Personnally, I do this for black, and it works well. So I tried the following:

Screenshot from 2020-05-23 22-44-06

and it works partially, in the sense that, if you edit the .ipynb file, it does creates and updates the .py file.

Please pay attention however: with this settings, any change to the .py file that you don't commit will be lost! So maybe we should have another file watcher - the same, but on .py files (here it seems to work). And then, when you have the two file watchers, probably... you don't need the pre-commit hook, right?

Let me know what you think!

@slideclimb
Copy link
Author

That's a great solution! Didn't know about file watchers. I suppose this works, and I guess the pre-commit hook then indeed is unnecessary.

Nonetheless, I still think the pre-commit hook should work...


About a plugin, I have some experience with JetBrains plugins, so if you want I could try to cook up a prototype some time? I guess it should be possible to register file watchers from within that plugin, and add something that does a --set-formats whenever a notebook file is created. I'm probably missing a lot of jupytext details/features, but I'm curious to see if something this simple would work and it could possibly be a starting point for a full-fledged plugin.

@slideclimb
Copy link
Author

I tried setting up the file watchers in an existing project. So the notebook and python files are already paired, and currently synced. I observed the following:

  • When using only a watcher on the notebook, things seem to work pretty well. I'm not too convinced, as the first few times I executed a cell the entire notebook disappeared (which I could recover thanks to git), and later all the output disappeared, but I haven't been able to reproduce either. After that it magically worked fine.
  • When also using a watcher on the python file, a cell has to be executed twice before the output also changes. I have no idea why, the python file updates correctly after the first cell executes (and that does not store any output, right?).

I'll edit this comment whenever I have been able to reproduce these issues. So far I have not been successful in reproducing either. Just putting it here so you know it exists and I will get reminded of it whenever I get back here :)

@mwouts
Copy link
Owner

mwouts commented May 26, 2020

Thank you @slideclimb for the update. Yes using Jupytext in a file watcher is super new and maybe not so safe... Maybe this is because jupytext --sync is 100% based on file timestamps. And maybe we can improve this based on the experience with the two other plugins that exist at the moment:

a) In Jupyter, the plugin

  • will always load the input cells from the text version
  • but has an additional safety: it will not open a notebook when the script is older than the notebook

b) In VIM, the jupytext-vim plugin does not use --sync, but instead does something like jupytext --to py:percent notebook.ipynb and jupytext --to ipynb --update script.py. Maybe you cannot really use that in your setting, because you don't want to compute yourself the location of paired files, but maybe it would help to have an additional argument in jupytext --sync to let it know about which file is the most up-to-date, to avoid defaulting to timestamps when a more precise information is available.

Also, I did not follow-up on the pre-commit. I agree that it should work. We left it at the point where it was not acting on changes on the .ipynb file, because the .ipynb file was ignored. Do you think that, instead of ignoring the .ipynb file, you could unstage it in the pre-commit hook as we do here ?

@benlindsay
Copy link

I'm interested in the what the solution of unstaging the .ipynb file instead of .gitignore-ing it looks like, but i didn't see that in the link you shared. Can you point me to an example of that? Just starting in on jupytext and loving it already, so thanks for building it!

@mwouts
Copy link
Owner

mwouts commented Mar 7, 2021

Hi @benlindsay , I think in the comment above I was refering to the pre-commit script example at https://github.com/mwouts/jupytext/blob/v1.9.1/docs/using-pre-commit.md. We've removed it from the current version of the documentation as we think that pre-commit hooks are the way to go.

By the way, our current documentation at https://jupytext.readthedocs.io/en/stable/using-pre-commit.html is a bit short. We plan to post a longer article about pre-commit hooks for notebooks in the near future. Until then the best documentation are the tests test_pre_commit_(...).py in https://github.com/mwouts/jupytext/tree/master/tests.

We have not included an example where we unstage the .ipynb file as we were not sure that was a good practice, but I'll be happy to further discuss that.

In you own use case, do you use Jupyter to edit notebooks? If so, and if you only commit the .py files, then I'd be tempted to say that you don't need a hook, as Jupyter itself (with the Jupytext plugin) will take care of always taking the .py file as the most up-to-date version of the notebook. What do you think?

@benlindsay
Copy link

Thanks for the links! I plan to edit notebooks in jupyter and sometimes editing the .py files directly. The solution I'm trying out now is leaving .ipynb files .gitignored, then adding a pre-commit script that loops through all the notebooks in my repo, running jupytext --sync --pipe black on each file. Seems like it works fine for now. We'll see how it holds up when I actually start using it regularly in my workflow. Thanks again for this package! I'm excited to get a cleaner workflow set up around notebooks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants