Wrong paired paths when syncing with the --pre-commit flag #506

slideclimb · 2020-05-16T12:22:25Z

I have the following directory structure, where the nb/test.ipynb is paired with the py/test.py.

├── nb
│   └── test.ipynb
└── py
    └── test.py

The notebook and Python file are paired with "jupytext": {"formats": "py//py,nb//ipynb"}. (using jupytext --set-formats py//py,nb//ipynb nb/test.ipynb)
This works fine when syncing with jupytext --sync nb/test.ipynb, but when syncing with jupytext --sync --pre-commit I get the following exception:

[jupytext] Notebooks in git index are:
py/test.py
[jupytext] Reading py/test.py
Traceback (most recent call last):
  File "PATH/jupytext", line 8, in <module>
    sys.exit(jupytext())
  File "PATH/jupytext/cli.py", line 292, in jupytext
    exit_code += jupytext_single_file(nb_file, args, log)
  File "PATH/jupytext/cli.py", line 389, in jupytext_single_file
    notebook, inputs_nb_file, outputs_nb_file = load_paired_notebook(notebook, fmt, nb_file, log)
  File "PATH/jupytext/cli.py", line 596, in load_paired_notebook
    for alt_path, alt_fmt in paired_paths(nb_file, fmt, formats):
  File "PATH/jupytext/paired_paths.py", line 128, in paired_paths
    raise InconsistentPath(u"Paired paths '{}' do not include the current notebook path '{}'. "
jupytext.paired_paths.InconsistentPath: Paired paths 'py/py/test.py','py/nb/test.ipynb' do not include the current notebook path 'py/test.py'. Current format is 'py:light', and paired formats are 'py//py,nb//ipynb'.

I have tried all sorts of pairings (taken from/inspired by the comments in #180), among them:

..//py,ipynb (and ipynb,..//py`)
nb//ipynb,..//py
nb//ipynb,py//py
but they all throw a similar exception (and work fine with a regular sync).

Addition
As a current workaround I'm using jupytext --sync nb/*.ipynb in my pre-commit hook, which seems to work fine.

The text was updated successfully, but these errors were encountered:

mwouts · 2020-05-17T22:22:39Z

Hello @slideclimb , thank you for reporting this. I will have a look soon.

mwouts · 2020-05-18T22:42:44Z

Hi @slideclimb , I have extracted a test from your report (thanks again), see ff5b466 .

Here at least, the test runs OK, so I am sorry I have not yet reproduced your issue.

Do you think there is remaining difference, either

in the notebook metadata (but, according to the error message, we do seem to have the same jupytext.formats metadata (i.e. py//py,nb//ipynb)
or in the way we call jupytext --sync --pre-commit - as you will see in my test, that command is expected to belong to the pre-commit hook. Is that the way you call it as well?

Also, it you can, let me know if the same test works for you, or not, and/or if you can tweak it to reproduce the issue.

slideclimb · 2020-05-19T19:08:34Z

Thanks for taking the time to create a detailed example!

The only difference I see is that you have the notebook file tracked in git, whereas I have the Python file tracked. This seems to be the problem, tracking only the Python file throws an exception, but tracking (only) the notebook file works fine. We don't have any difference in the other points you mentioned. The metadata is the same and jupytext --sync --pre-commit is called from the pre-commit hook.

Now I'm starting to doubt if I'm using jupytext the way it is intended... I use jupytext to create a paired Python file that I put in git, so I don't have to put the notebook file in git to avoid cumbersome and useless diffs/commits. Does that make sense?

mwouts · 2020-05-19T20:46:05Z

Oh I see... So, indeed now I can reproduce what you see (and I'll try to fix it).

Now, the question is what you want to do with the pre-commit. If you have the Python file tracked, then the .ipynb notebook will be updated when you change the .py file and commit. And if you edit the .ipynb file, then Jupyter will update both files, so your setting looks fine indeed! (and is probably a better one than the one we have in the documentation 😃 ).

mwouts · 2020-05-22T13:59:21Z

I have added the test above to the collection of Jupytext test. Following the changes made at #508 , the test now works, so I'll close this issue, and let you know (soon) when I have published a release candidate so that you can test on your side. Thanks again for your report!

mwouts · 2020-05-22T17:00:20Z

@slideclimb , would you like to give a try to the latest RC, and hopefully confirm that the problem is gone? Thanks!

pip install jupytext==1.5.0rc1

slideclimb · 2020-05-22T18:02:43Z

Thanks for the quick fix! I can confirm that the exception is gone 😄... But the pre-commit hook does nothing when I change the notebook file. (The list of notebooks in git index is empty.) Syncing manually (with jupytext --sync nb/test.ipynb) then updates the Python file, but when committing after that, git complains that it can't add the notebook file to git because it's in the .gitignore (as it should be).

Just to be sure we're on the same page, what I've just done is:

Create notebook nb/test.ipynb
Set up a jupytext pair with a python file py/test.py (this creates the python file)
Add the Python file to git (and add the notebook to the .gitignore)
Commit the Python file
Edit the notebook file
Commit (possibly add a dummy Python file to edit so the diff isn't empty before committing)

If I understand your test correctly, you first create the Python file and set it up with a paired notebook file? I think that's the only difference I see, as I create a notebook file and set it up paired Python file.

When I have only notebook files in a repo, the git diff is always empty and therefore I can't commit. But usually I have other python files living next to it that have been edited, or I manually update one of the notebooks before committing which then takes care of the rest.

mwouts · 2020-05-22T19:36:53Z

Thanks @slideclimb for the follow-up.

I'll adjust the test to match your use case, i.e. add *.ipynb to .gitignore.

One more question: which program do you use to edit the .ipynb file? If it's Jupyter notebook or Lab, and if you have Jupytext installed in the same environment, then .py file should be updated when you save the .ipynb file. But if you use other editors like PyCharm of VS Code for which we don't have a plugin, indeed, changing the .ipynb file is not going to update the .py version...

slideclimb · 2020-05-22T20:22:44Z

Thanks, I really hope we get it right this time :p

I use PyCharm.

mwouts · 2020-05-23T20:58:15Z

@slideclimb, that's good to know. We don't have a plugin yet for PyCharm (cf. #147), but I was wondering if one couldn't use the File Watchers in PyCharm. Personnally, I do this for black, and it works well. So I tried the following:

and it works partially, in the sense that, if you edit the .ipynb file, it does creates and updates the .py file.

Please pay attention however: with this settings, any change to the .py file that you don't commit will be lost! So maybe we should have another file watcher - the same, but on .py files (here it seems to work). And then, when you have the two file watchers, probably... you don't need the pre-commit hook, right?

Let me know what you think!

slideclimb · 2020-05-24T08:18:28Z

That's a great solution! Didn't know about file watchers. I suppose this works, and I guess the pre-commit hook then indeed is unnecessary.

Nonetheless, I still think the pre-commit hook should work...

About a plugin, I have some experience with JetBrains plugins, so if you want I could try to cook up a prototype some time? I guess it should be possible to register file watchers from within that plugin, and add something that does a --set-formats whenever a notebook file is created. I'm probably missing a lot of jupytext details/features, but I'm curious to see if something this simple would work and it could possibly be a starting point for a full-fledged plugin.

slideclimb · 2020-05-25T18:35:59Z

I tried setting up the file watchers in an existing project. So the notebook and python files are already paired, and currently synced. I observed the following:

When using only a watcher on the notebook, things seem to work pretty well. I'm not too convinced, as the first few times I executed a cell the entire notebook disappeared (which I could recover thanks to git), and later all the output disappeared, but I haven't been able to reproduce either. After that it magically worked fine.
When also using a watcher on the python file, a cell has to be executed twice before the output also changes. I have no idea why, the python file updates correctly after the first cell executes (and that does not store any output, right?).

I'll edit this comment whenever I have been able to reproduce these issues. So far I have not been successful in reproducing either. Just putting it here so you know it exists and I will get reminded of it whenever I get back here :)

mwouts · 2020-05-26T03:16:05Z

Thank you @slideclimb for the update. Yes using Jupytext in a file watcher is super new and maybe not so safe... Maybe this is because jupytext --sync is 100% based on file timestamps. And maybe we can improve this based on the experience with the two other plugins that exist at the moment:

a) In Jupyter, the plugin

will always load the input cells from the text version
but has an additional safety: it will not open a notebook when the script is older than the notebook

b) In VIM, the jupytext-vim plugin does not use --sync, but instead does something like jupytext --to py:percent notebook.ipynb and jupytext --to ipynb --update script.py. Maybe you cannot really use that in your setting, because you don't want to compute yourself the location of paired files, but maybe it would help to have an additional argument in jupytext --sync to let it know about which file is the most up-to-date, to avoid defaulting to timestamps when a more precise information is available.

Also, I did not follow-up on the pre-commit. I agree that it should work. We left it at the point where it was not acting on changes on the .ipynb file, because the .ipynb file was ignored. Do you think that, instead of ignoring the .ipynb file, you could unstage it in the pre-commit hook as we do here ?

benlindsay · 2021-03-06T03:07:15Z

I'm interested in the what the solution of unstaging the .ipynb file instead of .gitignore-ing it looks like, but i didn't see that in the link you shared. Can you point me to an example of that? Just starting in on jupytext and loving it already, so thanks for building it!

mwouts · 2021-03-07T19:01:28Z

Hi @benlindsay , I think in the comment above I was refering to the pre-commit script example at https://github.com/mwouts/jupytext/blob/v1.9.1/docs/using-pre-commit.md. We've removed it from the current version of the documentation as we think that pre-commit hooks are the way to go.

By the way, our current documentation at https://jupytext.readthedocs.io/en/stable/using-pre-commit.html is a bit short. We plan to post a longer article about pre-commit hooks for notebooks in the near future. Until then the best documentation are the tests test_pre_commit_(...).py in https://github.com/mwouts/jupytext/tree/master/tests.

We have not included an example where we unstage the .ipynb file as we were not sure that was a good practice, but I'll be happy to further discuss that.

In you own use case, do you use Jupyter to edit notebooks? If so, and if you only commit the .py files, then I'd be tempted to say that you don't need a hook, as Jupyter itself (with the Jupytext plugin) will take care of always taking the .py file as the most up-to-date version of the notebook. What do you think?

benlindsay · 2021-03-07T19:56:54Z

Thanks for the links! I plan to edit notebooks in jupyter and sometimes editing the .py files directly. The solution I'm trying out now is leaving .ipynb files .gitignored, then adding a pre-commit script that loops through all the notebooks in my repo, running jupytext --sync --pipe black on each file. Seems like it works fine for now. We'll see how it holds up when I actually start using it regularly in my workflow. Thanks again for this package! I'm excited to get a cleaner workflow set up around notebooks :)

mwouts added a commit that referenced this issue May 18, 2020

Reproduce the conditions for #506 in a test

354f3e4

mwouts added a commit that referenced this issue May 18, 2020

Reproduce the conditions for #506 in a test

ff5b466

mwouts added a commit that referenced this issue May 19, 2020

Commit .py file only - and reproduce #506

adb8cd6

mwouts added this to the 1.5.0 milestone May 21, 2020

mwouts added a commit that referenced this issue May 22, 2020

Reproduce the conditions for #506 in a test

da81638

mwouts added a commit that referenced this issue May 22, 2020

Commit .py file only - and reproduce #506

bf5140c

mwouts mentioned this issue May 22, 2020

Reproduce wrong paired paths when syncing with the --pre-commit flag #509

Merged

mwouts added a commit that referenced this issue May 22, 2020

Reproduce the conditions for #506 in a test

7e4676f

mwouts added a commit that referenced this issue May 22, 2020

Commit .py file only - and reproduce #506

3bf081c

mwouts closed this as completed May 22, 2020

mwouts reopened this May 22, 2020

mwouts mentioned this issue Jun 8, 2020

Sync issues with path//Rmd #541

Closed

mwouts mentioned this issue Mar 7, 2021

jupytext in pre-commit hook #292

Closed

mwouts mentioned this issue Oct 10, 2021

PyCharm plugin for Jupytext #147

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong paired paths when syncing with the --pre-commit flag #506

Wrong paired paths when syncing with the --pre-commit flag #506

slideclimb commented May 16, 2020 •

edited

mwouts commented May 17, 2020

mwouts commented May 18, 2020

slideclimb commented May 19, 2020

mwouts commented May 19, 2020

mwouts commented May 22, 2020

mwouts commented May 22, 2020 •

edited

slideclimb commented May 22, 2020

mwouts commented May 22, 2020

slideclimb commented May 22, 2020

mwouts commented May 23, 2020

slideclimb commented May 24, 2020

slideclimb commented May 25, 2020

mwouts commented May 26, 2020

benlindsay commented Mar 6, 2021

mwouts commented Mar 7, 2021

benlindsay commented Mar 7, 2021

Wrong paired paths when syncing with the --pre-commit flag #506

Wrong paired paths when syncing with the --pre-commit flag #506

Comments

slideclimb commented May 16, 2020 • edited

mwouts commented May 17, 2020

mwouts commented May 18, 2020

slideclimb commented May 19, 2020

mwouts commented May 19, 2020

mwouts commented May 22, 2020

mwouts commented May 22, 2020 • edited

slideclimb commented May 22, 2020

mwouts commented May 22, 2020

slideclimb commented May 22, 2020

mwouts commented May 23, 2020

slideclimb commented May 24, 2020

slideclimb commented May 25, 2020

mwouts commented May 26, 2020

benlindsay commented Mar 6, 2021

mwouts commented Mar 7, 2021

benlindsay commented Mar 7, 2021

slideclimb commented May 16, 2020 •

edited

mwouts commented May 22, 2020 •

edited